core_epa__assn_eia_epacamd

package: pudl

Association table providing connections between EPA units and EIA plants, boilers, and generators.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EPA -- Mix of multiple EPA sources

Primary key:

This table has no primary key.

Additional Details

This crosswalk table comes from the PUDL fork of the EPA camd-eia-crosswalk GitHub repo: https://github.com/catalyst-cooperative/camd-eia-crosswalk-latest.

The camd-eia-crosswalk README and our Data Source documentation page for EPA CEMS (../data_sources/epacems) describe the complicated relationship between EIA and EPA data, specifically the nature of EPA vs. EIA "units" and the level of granularity at which the two sources can be connected.

The original EPA crosswalk runs on 2018 EIA data. We adapted the crosswalk code to run on each new year of EIA data, capturing changes in plant information over time.

Our version of the crosswalk clarifies some of the column names and removes unmatched rows. The docstring of the pudl.dagster.assets.core.glue.core_epa__assn_eia_epacamd function explains what changes are made from the EPA's version.

Columns
report_year

Four-digit year in which the data was reported.

plant_id_epa

The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia.

emissions_unit_id_epa

Emissions (smokestack) unit monitored by EPA CEMS.

generator_id_epa

Generator ID used by the EPA.

plant_id_eia

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

boiler_id

Alphanumeric boiler ID.

generator_id

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
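As a minimal sketch of how the columns above fit together, the following builds a lookup from an EPA emissions unit to the EIA generators associated with it, keeping every ID as a string. The sample rows are invented for illustration, and the build_unit_lookup helper is hypothetical, not part of PUDL.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows using the column names documented above.
SAMPLE_CSV = """report_year,plant_id_epa,emissions_unit_id_epa,generator_id_epa,plant_id_eia,boiler_id,generator_id
2024,3,1,1,3,1,1
2024,3,1,1,3,1,01
2024,3,2,2A,3,2,2A
"""


def build_unit_lookup(rows):
    """Map (plant_id_epa, emissions_unit_id_epa) to a set of (plant_id_eia, generator_id)."""
    lookup = defaultdict(set)
    for row in rows:
        epa_key = (row["plant_id_epa"], row["emissions_unit_id_epa"])
        # csv yields strings, so generator_id "01" stays distinct from "1".
        lookup[epa_key].add((row["plant_id_eia"], row["generator_id"]))
    return dict(lookup)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
unit_lookup = build_unit_lookup(rows)
```

Because one emissions unit can map to several generators, the lookup stores a set per unit rather than a single value. Casting generator_id to an integer would merge "1" and "01" in this invented sample, which is the failure the column description warns about.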

core_epa__assn_eia_epacamd_subplant_ids

package: pudl

Association table providing connections between EPA units and EIA units/generators, at the subplant level.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EPA -- Mix of multiple EPA sources

Primary key:

This table has no primary key. The primary key would have been plant_id_eia, generator_id, subplant_id, and emissions_unit_id_epa, but the generator_id column contains some null values. About 2 percent of all EPA CAMD records are not successfully mapped to EIA generators.
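The check described above can be sketched as follows: measure the share of null generator_id values, then test whether the remaining rows are unique on the would-be key. The records are invented, None stands in for a null value, and both helper functions are hypothetical.

```python
from collections import Counter

KEY_COLUMNS = ("plant_id_eia", "generator_id", "subplant_id", "emissions_unit_id_epa")

# Invented records; None stands in for a null value.
records = [
    {"plant_id_eia": 3, "generator_id": "1", "subplant_id": 0, "emissions_unit_id_epa": "1"},
    {"plant_id_eia": 3, "generator_id": "2A", "subplant_id": 1, "emissions_unit_id_epa": "2"},
    {"plant_id_eia": 10, "generator_id": None, "subplant_id": 0, "emissions_unit_id_epa": "CT1"},
    {"plant_id_eia": 10, "generator_id": "GT2", "subplant_id": 1, "emissions_unit_id_epa": "CT2"},
]


def null_share(rows, column):
    """Fraction of rows whose value in `column` is null."""
    return sum(row[column] is None for row in rows) / len(rows)


def duplicated_keys(rows, columns):
    """Key tuples that occur more than once, ignoring rows with a null key part."""
    counts = Counter(
        tuple(row[c] for c in columns)
        for row in rows
        if all(row[c] is not None for c in columns)
    )
    return [key for key, n in counts.items() if n > 1]


generator_null_share = null_share(records, "generator_id")
duplicates = duplicated_keys(records, KEY_COLUMNS)
```

In this invented sample one of four rows lacks a generator_id. The share in the real table is the roughly 2 percent stated above, not the value computed here.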

Additional Details

This table is an augmented version of the core_epa__assn_eia_epacamd crosswalk table, which originally comes from the EPA's GitHub repo camd-eia-crosswalk: https://github.com/USEPA/camd-eia-crosswalk.

This table identifies subplants, the smallest coherent units for aggregation, within each plant_id. A plant_id refers to a legal entity that often contains multiple distinct power plants, even of different technology or fuel types.

EPA CEMS data combines information from several parts of a power plant:

  • emissions from smokestacks

  • fuel use from combustors

  • electricity production from generators

But smokestacks, combustors, and generators can be connected in complex, many-to-many relationships. This complexity makes attribution difficult, for example when allocating pollution to energy producers. Furthermore, heterogeneity within plant_ids makes aggregation to the parent entity difficult or inappropriate.

This table draws on the EPA's crosswalk, the IDs from the EPA CAMD core_epacems__hourly_emissions table itself, the core_eia860__assn_boiler_generator table, and the core_eia860__scd_generators table. While the core_epa__assn_eia_epacamd table is the backbone of this table, the EPA CAMD IDs ensure complete coverage of EPA CAMD reporting units. Adding the EIA 860 tables likewise ensures complete coverage of the units they report.

For more information about how this subplant_id is made, see the documentation for pudl.dagster.assets.core.glue.make_subplant_ids and pudl.dagster.assets.core.glue.update_subplant_ids.

By analyzing the relationships between combustors and generators, as provided in the core_epa__assn_eia_epacamd crosswalk, we can identify distinct power plants. These are the smallest coherent units of aggregation.
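One way to picture this grouping is as the connected components of a graph whose nodes are emissions units and generators and whose edges are crosswalk rows. The sketch below illustrates that idea for a single plant with a small union-find. It is not the logic of make_subplant_ids, and the group_subplants helper and the sample pairs are hypothetical.

```python
def group_subplants(pairs):
    """Assign a component number to each (emissions_unit, generator) pair."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag nodes by kind so a unit "1" and a generator "1" stay separate nodes.
    for unit, gen in pairs:
        union(("unit", unit), ("gen", gen))

    # Number components in order of first appearance.
    labels = {}
    result = {}
    for unit, gen in pairs:
        root = find(("unit", unit))
        result[(unit, gen)] = labels.setdefault(root, len(labels))
    return result


# Invented links for one plant: units 1 and 2 share generator A.
sample_pairs = [("1", "A"), ("2", "A"), ("2", "B"), ("3", "C")]
components = group_subplants(sample_pairs)
```

In this sample, units 1 and 2 and generators A and B fall into one component because they are linked through shared equipment, while unit 3 and generator C form a second component.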

Columns
plant_id_eia

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_epa

The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia.

subplant_id

Sub-plant ID links EPA CEMS emissions units to EIA units.

unit_id_pudl

Dynamically assigned PUDL unit ID. WARNING: This ID is not guaranteed to be static over the long term, as the input data and algorithm may evolve over time.

emissions_unit_id_epa

Emissions (smokestack) unit monitored by EPA CEMS.

generator_id

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!