core_epa__assn_eia_epacamd
Association table providing connections between EPA units and EIA plants, boilers, and generators.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EPA -- Mix of multiple EPA sources
- Primary key:
This table has no primary key.
Additional Details
This crosswalk table comes from the PUDL fork of the EPA camd-eia-crosswalk GitHub repository: https://github.com/catalyst-cooperative/camd-eia-crosswalk-latest.
The camd-eia-crosswalk README and our Data Source documentation page on ../data_sources/epacems describe the complicated relationship between EIA and EPA data, specifically the nature of EPA vs. EIA "units" and the level of granularity at which the two sources can be connected.
The original EPA crosswalk runs on 2018 EIA data. We adapted the crosswalk code to run on each new year of EIA data, capturing changes in plant information over time.
Our version of the crosswalk clarifies some of the column names and removes unmatched rows. The docstrings of the pudl.dagster.assets.core.glue.core_epa__assn_eia_epacamd function explain how our version differs from the EPA's.
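Because the crosswalk is built separately for each year of EIA data, a typical use is to join EPA unit-level records to it on the report year, the EPA plant ID, and the EPA emissions unit ID. The sketch below shows that join with plain Python. The sample rows, the co2_mass_tons field, and the function link_epa_units_to_eia are all hypothetical and only illustrate the shape of the data; real analyses would read the PUDL tables instead.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like core_epa__assn_eia_epacamd.
# Values are illustrative only, not real plant data.
CROSSWALK = [
    {"report_year": 2024, "plant_id_epa": 3, "emissions_unit_id_epa": "1",
     "plant_id_eia": 3, "boiler_id": "1", "generator_id": "1"},
    {"report_year": 2024, "plant_id_epa": 3, "emissions_unit_id_epa": "1",
     "plant_id_eia": 3, "boiler_id": "1", "generator_id": "2"},
    {"report_year": 2024, "plant_id_epa": 3, "emissions_unit_id_epa": "CS1",
     "plant_id_eia": 3, "boiler_id": "4", "generator_id": "4"},
]

# Hypothetical EPA unit-level records (co2_mass_tons is illustrative).
EPA_UNITS = [
    {"report_year": 2024, "plant_id_epa": 3, "emissions_unit_id_epa": "1", "co2_mass_tons": 100.0},
    {"report_year": 2024, "plant_id_epa": 3, "emissions_unit_id_epa": "CS1", "co2_mass_tons": 50.0},
    {"report_year": 2024, "plant_id_epa": 3, "emissions_unit_id_epa": "X9", "co2_mass_tons": 10.0},
]


def link_epa_units_to_eia(epa_units, crosswalk):
    """Inner-join EPA unit records to EIA generators on (year, EPA plant, EPA unit).

    Units without a crosswalk match are dropped. One EPA unit can map to
    several generators, so a single input row may yield several output rows.
    """
    index = defaultdict(list)
    for row in crosswalk:
        key = (row["report_year"], row["plant_id_epa"], row["emissions_unit_id_epa"])
        index[key].append(row)

    linked = []
    for unit in epa_units:
        key = (unit["report_year"], unit["plant_id_epa"], unit["emissions_unit_id_epa"])
        for match in index.get(key, []):
            linked.append({**unit, "plant_id_eia": match["plant_id_eia"],
                           "generator_id": match["generator_id"]})
    return linked


linked = link_epa_units_to_eia(EPA_UNITS, CROSSWALK)
```

Note that EPA unit "1" appears twice in the result, once per generator, so summing co2_mass_tons over the joined rows would double count its emissions. This many-to-many behavior is one reason the subplant-level table exists.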
Columns
Four-digit year in which the data was reported.
The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia.
Emissions (smokestack) unit monitored by EPA CEMS.
Generator ID used by the EPA.
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
Alphanumeric boiler ID.
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
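To see why generator_id must be treated as a string, consider IDs such as "01" and "1", which are distinct strings but collapse to the same integer. The sketch below uses a small, made-up CSV extract (the data and both helper functions are hypothetical) to compare reading the column as text with a naive numeric conversion.

```python
import csv
import io

# Hypothetical CSV extract; "01" and "1" are distinct generator IDs.
RAW = "plant_id_eia,generator_id\n3,01\n3,1\n3,GT2\n"


def read_generator_ids(text):
    """Keep generator_id as a string (csv.DictReader never casts values)."""
    return [(int(r["plant_id_eia"]), r["generator_id"])
            for r in csv.DictReader(io.StringIO(text))]


def naive_numeric_ids(text):
    """Anti-pattern: convert generator_id to int whenever it looks numeric."""
    out = []
    for r in csv.DictReader(io.StringIO(text)):
        gid = r["generator_id"]
        out.append(int(gid) if gid.isdigit() else gid)
    return out


as_strings = read_generator_ids(RAW)  # three distinct IDs: "01", "1", "GT2"
as_numbers = naive_numeric_ids(RAW)   # "01" and "1" both become 1
```

With pandas, the equivalent safeguard is passing dtype={"generator_id": str} to read_csv so the column is never inferred as numeric.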
core_epa__assn_eia_epacamd_subplant_ids
Association table providing connections between EPA units and EIA units/generators, at the subplant level.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EPA -- Mix of multiple EPA sources
- Primary key:
This table has no primary key. The primary key would have been plant_id_eia, generator_id, subplant_id, and emissions_unit_id_epa, but some records have a null generator_id: about 2 percent of all EPA CAMD records are not successfully mapped to EIA generators.
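A primary key must be both unique and free of nulls, so a null generator_id rules out the four-column key even when every combination is unique. The following sketch checks both conditions on a handful of hypothetical rows; the sample data and the helper check_candidate_key are illustrative assumptions, not part of PUDL.

```python
from collections import Counter

KEY = ("plant_id_eia", "generator_id", "subplant_id", "emissions_unit_id_epa")

# Hypothetical rows shaped like core_epa__assn_eia_epacamd_subplant_ids.
ROWS = [
    {"plant_id_eia": 3, "generator_id": "1", "subplant_id": 0, "emissions_unit_id_epa": "1"},
    {"plant_id_eia": 3, "generator_id": "2", "subplant_id": 0, "emissions_unit_id_epa": "1"},
    # An EPA unit that could not be mapped to an EIA generator.
    {"plant_id_eia": 3, "generator_id": None, "subplant_id": 1, "emissions_unit_id_epa": "CS1"},
]


def check_candidate_key(rows, key=KEY):
    """Report whether `key` could serve as a primary key for `rows`."""
    null_rows = [r for r in rows if any(r[c] is None for c in key)]
    counts = Counter(tuple(r[c] for c in key) for r in rows)
    duplicates = [k for k, n in counts.items() if n > 1]
    return {
        "null_share": len(null_rows) / len(rows) if rows else 0.0,
        "duplicate_keys": duplicates,
        # Usable only if no key column is null and no key value repeats.
        "usable": not null_rows and not duplicates,
    }


report = check_candidate_key(ROWS)
```

Here every key combination is unique, yet the key is still unusable because one row has a null generator_id.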
Additional Details
This table is an augmented version of the core_epa__assn_eia_epacamd crosswalk table, which initially comes from the EPA's GitHub repository camd-eia-crosswalk: https://github.com/USEPA/camd-eia-crosswalk.
This table identifies subplants within each plant_id; subplants are the smallest coherent units for aggregation. A plant_id refers to a legal entity that often contains multiple distinct power plants, sometimes even of different technology or fuel types.
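Once each generator carries a subplant_id, generator-level quantities can be rolled up to (plant_id_eia, subplant_id) instead of to the whole plant_id. The sketch below assumes a hypothetical mapping and hypothetical net generation values in MWh, and reports generators missing from the mapping rather than silently dropping them.

```python
from collections import defaultdict

# Hypothetical mapping: (plant_id_eia, generator_id) -> subplant_id.
SUBPLANT_MAP = {
    (3, "1"): 0,
    (3, "2"): 0,
    (3, "4"): 1,
}

# Hypothetical generator-level net generation in MWh.
GENERATION = [
    (3, "1", 120.0),
    (3, "2", 80.0),
    (3, "4", 40.0),
    (3, "5", 5.0),  # not present in the mapping
]


def aggregate_by_subplant(generation, subplant_map):
    """Sum generation per (plant_id_eia, subplant_id); collect unmapped generators."""
    totals = defaultdict(float)
    unmapped = []
    for plant, gen, mwh in generation:
        sub = subplant_map.get((plant, gen))
        if sub is None:
            unmapped.append((plant, gen))
            continue
        totals[(plant, sub)] += mwh
    return dict(totals), unmapped


totals, unmapped = aggregate_by_subplant(GENERATION, SUBPLANT_MAP)
```

Keeping the unmapped generators visible makes coverage gaps explicit instead of letting them disappear from the totals.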
EPA CEMS data combines information from several parts of a power plant:
- emissions from smokestacks
- fuel use from combustors
- electricity production from generators
But smokestacks, combustors, and generators can be connected in complex, many-to-many relationships. This complexity makes attribution difficult, for example when allocating pollution to energy producers. Furthermore, heterogeneity within a plant_id makes aggregation to the parent entity difficult or inappropriate. By analyzing the relationships between combustors and generators, as provided in the core_epa__assn_eia_epacamd crosswalk, we can identify the distinct power plants, or subplants, within each plant_id.
This table combines four inputs: the EPA's crosswalk, the IDs in the EPA CAMD core_epacems__hourly_emissions table itself, the core_eia860__assn_boiler_generator table, and the core_eia860__scd_generators table. The core_epa__assn_eia_epacamd crosswalk is the backbone of the table; the EPA CAMD IDs ensure complete coverage of EPA CAMD reporting units, and the EIA-860 tables ensure complete coverage of EIA boilers and generators as well.
For more information about how subplant_id is made, see the documentation for pudl.dagster.assets.core.glue.make_subplant_ids and pudl.dagster.assets.core.glue.update_subplant_ids.
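One way to turn many-to-many unit and generator links into distinct groups is to treat them as a graph and take its connected components: any EPA unit and EIA generator linked directly or indirectly end up in the same group. The sketch below does this with a small union-find structure. It is an illustration of that general idea under stated assumptions, not PUDL's make_subplant_ids code, which may differ in details such as how boilers are handled and how IDs are numbered. The edge list and all function names are hypothetical.

```python
from collections import defaultdict


def find(parent, x):
    """Return the root of x, halving the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the components containing a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def assign_subplant_ids(edges):
    """Group EPA units and EIA generators into connected components per plant.

    edges: iterable of (plant_id_eia, emissions_unit_id_epa, generator_id).
    Returns {(plant_id_eia, kind, id): subplant_id}, numbering components
    0, 1, 2, ... within each plant in sorted order of their members.
    """
    parent = {}
    for plant, unit, gen in edges:
        u = (plant, "unit", unit)
        g = (plant, "gen", gen)
        parent.setdefault(u, u)
        parent.setdefault(g, g)
        union(parent, u, g)

    # Collect the members of each component.
    components = defaultdict(list)
    for node in parent:
        components[find(parent, node)].append(node)

    # Number components within each plant in a deterministic order.
    by_plant = defaultdict(list)
    for members in components.values():
        by_plant[members[0][0]].append(sorted(members))
    subplant_of = {}
    for plant, groups in by_plant.items():
        for sub_id, members in enumerate(sorted(groups)):
            for node in members:
                subplant_of[node] = sub_id
    return subplant_of


# Hypothetical links: stack "1" feeds generators 1 and 2, stack "2" also
# feeds generator 2, and stack "CS1" feeds only generator 4.
EDGES = [(3, "1", "1"), (3, "1", "2"), (3, "2", "2"), (3, "CS1", "4")]
subplants = assign_subplant_ids(EDGES)
```

Stacks "1" and "2" share no direct link, but both connect to generator 2, so they land in the same group (subplant 0), while stack "CS1" and generator 4 form subplant 1.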
Columns
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia.
Sub-plant ID links EPA CEMS emissions units to EIA units.
Dynamically assigned PUDL unit ID. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
Emissions (smokestack) unit monitored by EPA CEMS.
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!