_core_eia__forensics_entity_resolution_generators
Forensic table of the statistics determining how we choose a single consistent value during entity resolution for generators.
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
EIA -- Mix of multiple EIA Forms
- Primary key:
This table has no primary key.
Usage Warnings
This table is meant for forensic purposes only. It contains all of the values that were used to choose the canonical (golden-record) value. See Entity Resolution Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details
This is a forensic table containing the input values used to choose canonical values during entity resolution. It is not a cleaned-up table; it is meant for forensic purposes only. If you have a question about why a value is reported in an scd, entity, or out table, this table lists all of the inputs that were used as ingredients to find the canonical value. Filter by column_name and the entity ID to find all of the possible input values.
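The filtering described above can be sketched in plain Python. This is only an illustration: the rows are invented, the column names col_a and col_b are placeholders, and the key names plant_id_eia and generator_id are assumptions about how the entity ID columns are named. The helper input_values is hypothetical and not part of PUDL.

```python
# Toy rows shaped like this table. All values are invented, and the key names
# plant_id_eia and generator_id are assumptions about the entity ID columns.
ROWS = [
    {"plant_id_eia": 1, "generator_id": "1A", "column_name": "col_a",
     "record_value": "X", "record_occurrences": 8},
    {"plant_id_eia": 1, "generator_id": "1A", "column_name": "col_a",
     "record_value": "Y", "record_occurrences": 2},
    {"plant_id_eia": 1, "generator_id": "1A", "column_name": "col_b",
     "record_value": "Z", "record_occurrences": 10},
    {"plant_id_eia": 2, "generator_id": "1", "column_name": "col_a",
     "record_value": "Y", "record_occurrences": 5},
]


def input_values(rows, column_name, plant_id_eia, generator_id):
    """Return every input row for one column of one generator (hypothetical helper)."""
    # Generator IDs are strings, so normalize before comparing.
    generator_id = str(generator_id)
    return [
        row
        for row in rows
        if row["column_name"] == column_name
        and row["plant_id_eia"] == plant_id_eia
        and row["generator_id"] == generator_id
    ]


# All values that were reported for col_a on generator 1A at plant 1.
matches = input_values(ROWS, "col_a", 1, "1A")
for row in matches:
    print(row["record_value"], row["record_occurrences"])
```

With the real table loaded into a dataframe, the same three-way filter would likely be a single boolean mask over the corresponding columns.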
Columns
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
Date reported.
The record in the changelog is valid until this date. The record is valid from the report_date up until but not including the valid_until_date.
The name of the column.
The original values found in PUDL _core table records that were used as ingredients to the entity resolution process.
The number of times this entity - aka this particular utility, plant, etc - occurs across the pre-entity resolution tables.
The number of times this particular record_value occurs across the pre-entity resolution tables in association with this particular entity.
What portion of the entity's records were reported with this particular record_value. This is calculated by dividing the record_occurrences by the entity_occurrences.
Is this record a candidate for being the canonical value? This is based on consistent_rate. By default PUDL requires values to be at least 70 percent consistent to pass this consistency check. There are exceptions to the default 70 percent consistency check for columns like plant or utility names when we always want a value - for those instances we choose the most frequently occurring value regardless of how consistently it was reported.
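The relationship between the last four columns can be sketched as follows. This is a minimal illustration of the arithmetic described above, not PUDL's implementation: the function names, the always_pick flag, returning None when nothing passes the check, and the tie-breaking behavior are all assumptions made for the example.

```python
DEFAULT_THRESHOLD = 0.7  # the default 70 percent consistency requirement


def consistent_rate(record_occurrences, entity_occurrences):
    """Share of the entity's records that were reported with this record_value."""
    return record_occurrences / entity_occurrences


def is_candidate(rate, threshold=DEFAULT_THRESHOLD):
    """A value is a candidate when it is at least `threshold` consistent."""
    return rate >= threshold


def pick_canonical(value_counts, entity_occurrences,
                   threshold=DEFAULT_THRESHOLD, always_pick=False):
    """Pick a canonical value from {record_value: record_occurrences}.

    always_pick (hypothetical flag) models columns such as names, where the
    most frequent value is chosen regardless of its consistency. Returning
    None when no value qualifies is an assumption of this sketch. On a tie,
    max() keeps the first value encountered.
    """
    if not value_counts:
        return None
    best_value, best_count = max(value_counts.items(), key=lambda kv: kv[1])
    rate = consistent_rate(best_count, entity_occurrences)
    if always_pick or is_candidate(rate, threshold):
        return best_value
    return None


# Invented example: an entity seen 10 times, with two competing values.
print(consistent_rate(8, 10))                                    # 0.8
print(pick_canonical({"X": 8, "Y": 2}, 10))                      # X
print(pick_canonical({"X": 6, "Y": 4}, 10))                      # None
print(pick_canonical({"X": 6, "Y": 4}, 10, always_pick=True))    # X
```

In the second case the most common value is only 60 percent consistent, so it fails the default check; the always_pick case shows the exception for columns where a value is always wanted.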