core_eia__entity_plants

package: pudl

Entity table containing static information about plants, compiled from across all EIA-860 and EIA-923 data.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA -- Mix of multiple EIA Forms

Primary key:

plant_id_eia

Usage Warnings

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Resolution Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details

This is one of two tables where canonical values for plants are set. It contains values which are expected to remain fixed, while core_eia860__scd_plants contains those which may vary from year to year. EIA reports many of the same attributes in multiple tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See /methodology/entity_resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
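The selection rule described above can be illustrated with a minimal sketch. This is not PUDL's implementation: the function name choose_canonical and the sample data are hypothetical, and the sketch assumes that null reports are ignored when computing each value's share.

```python
from collections import Counter


def choose_canonical(values, threshold=0.7):
    """Return the most common non-null value if its share meets the threshold.

    Hypothetical helper for illustration only. Assumption: nulls (None) are
    dropped before shares are computed.
    """
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    # Keep the value only if it is reported consistently enough.
    return value if count / len(reported) >= threshold else None


# Hypothetical reports of one plant's state across several tables and years.
consistent = ["CO", "CO", "CO", "CO", "WY"]    # "CO" share is 4/5
inconsistent = ["CO", "CO", "WY", "WY", "NM"]  # top share is only 2/5

canonical_consistent = choose_canonical(consistent)
canonical_inconsistent = choose_canonical(inconsistent)
```

In this sketch the first attribute resolves to "CO", while the second falls below the 70% threshold and resolves to a null (None), mirroring the null values described above.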

Columns
plant_id_eia

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_name_eia

Plant name.

city

Name of the city.

county

County name.

latitude

Latitude of the plant's location, in degrees.

longitude

Longitude of the plant's location, in degrees.

state

Two-letter US state abbreviation.

street_address

Physical street address.

zip_code

Five-digit US ZIP Code.

timezone

IANA timezone name.

_core_eia__forensics_entity_resolution_plants

package: pudl

Forensic table of the statistics determining how we choose a single consistent value during entity resolution for plants.

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

EIA -- Mix of multiple EIA Forms

Primary key:

This table has no primary key.

Usage Warnings

  • This table is meant for forensic purposes only. It contains all values which were used to choose the canonical (or golden-record) values. See Entity Resolution Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details

This is a forensic table containing the input values used to choose canonical values during entity resolution. It is not a cleaned-up table; it is meant for forensic purposes only. If you have a question about why a value is reported in an scd, entity, or out table, you can look up all of the inputs that were used as ingredients to find the canonical value. Filter by column_name and the entity ID (here, plant_id_eia) to find all of the possible input values.
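As a rough illustration of that lookup, the sketch below filters a few hand-written rows by plant_id_eia and column_name. The rows, their values, and the helper name inputs_for are hypothetical and only a subset of the table's columns is shown; with the real table loaded into a dataframe or SQL engine, the same two-condition filter would apply.

```python
# Hypothetical rows shaped like a subset of the forensic table's columns.
forensics = [
    {"plant_id_eia": 1, "column_name": "city", "record_value": "Springfield", "record_occurrences": 8},
    {"plant_id_eia": 1, "column_name": "city", "record_value": "Sprngfield", "record_occurrences": 2},
    {"plant_id_eia": 1, "column_name": "county", "record_value": "Greene", "record_occurrences": 10},
    {"plant_id_eia": 2, "column_name": "city", "record_value": "Shelbyville", "record_occurrences": 5},
]


def inputs_for(rows, plant_id_eia, column_name):
    """Return every input row recorded for one plant and one column."""
    return [
        row
        for row in rows
        if row["plant_id_eia"] == plant_id_eia and row["column_name"] == column_name
    ]


city_inputs = inputs_for(forensics, plant_id_eia=1, column_name="city")
city_values = [row["record_value"] for row in city_inputs]
```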

Columns
plant_id_eia

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

Date reported.

valid_until_date

The record in the changelog is valid until this date. The record is valid from the report_date up until but not including the valid_until_date.

column_name

The name of the column.

record_value

The original values found in PUDL _core table records that were used as ingredients to the entity resolution process.

entity_occurrences

The number of times this entity (that is, this particular utility, plant, etc.) occurs across the pre-entity resolution tables.

record_occurrences

The number of times this particular record_value occurs across the pre-entity resolution tables in association with this particular entity.

consistent_rate

The fraction of the entity's records that were reported with this particular record_value. This is calculated by dividing record_occurrences by entity_occurrences.

is_candidate

Is this record a candidate for being the canonical value? This is based on consistent_rate. By default, PUDL requires a value to be at least 70 percent consistent to pass this consistency check. There are exceptions to the default 70 percent check for columns like plant or utility names, where a value is always wanted. In those cases PUDL chooses the most frequently occurring value regardless of how consistently it was reported.
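The relationship between entity_occurrences, record_occurrences, consistent_rate, and is_candidate can be sketched as follows. This is an illustration under stated assumptions, not PUDL's code: the function name score_values and the sample counts are hypothetical, and the sketch applies only the default 70 percent rule, not the exceptions for name columns.

```python
def score_values(record_counts, entity_occurrences, threshold=0.7):
    """Build forensic-style rows for one entity and one column.

    record_counts maps each reported value to how often it was reported.
    Hypothetical helper; only the default threshold rule is modeled.
    """
    rows = []
    for record_value, record_occurrences in record_counts.items():
        # consistent_rate = record_occurrences / entity_occurrences
        consistent_rate = record_occurrences / entity_occurrences
        rows.append(
            {
                "record_value": record_value,
                "entity_occurrences": entity_occurrences,
                "record_occurrences": record_occurrences,
                "consistent_rate": consistent_rate,
                "is_candidate": consistent_rate >= threshold,
            }
        )
    return rows


# Hypothetical example: a plant appears 10 times, with two reported ZIP codes.
rows = score_values({"80202": 9, "80203": 1}, entity_occurrences=10)
candidates = [row["record_value"] for row in rows if row["is_candidate"]]
```

Because the default threshold is above 50 percent, at most one value per entity and column can pass it in this sketch.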