Technical Reconciliation¶
Technical reconciliation validates data movement from source to target. As part of Echelon, flow data (or record movement) is captured from point-to-point. Technical reconciliation uses this data in conjunction with a formula to validate if records were all accounted for.
Flow Labels¶
Storing the flow metrics in the flow table facilitates technical reconciliation.
In order to ensure technical reconciliation can be performed, the label column within the flow table must match the following standards:
| Description | Label |
|---|---|
| Number of records present in the source entity that were selected. |
source_count |
| Number of records inserted into the target entity. | insert_count |
| Number of records that were found to be identical in the source and the target entity. |
match_count |
| Number of records that were not inserted into the target entity because of an error. |
reject_count |
| Number of records inserted into the target entity as part of SCD Type 2. |
update_insert_count |
| Number of records updated in the target entity as part of SCD Type 2. |
end_date_count |
| Number of records that are not accounted for. If this value is greater than or less than zero technical reconciliation has failed. |
technical_reconciliation_variance |
Reconciliation Formulas¶
The following formulas have been set up in the constant table to enable technical reconciliation through the Echelon. These formulas live in the constant table and are retrieved during the technical reconciliation job.
Labelling of reconciliation job depends on the target entity type. The results of reconciliation are stored in the flow table.
File¶
All
Formula
{source_count} - {insert_count}
Labels
| Job | Label used in flow table |
|---|---|
| Retrieve the number of records in the header (control file) or the file. | source_count |
| Count the number of records loaded into the staging table. | insert_count |
Data Vault¶
Hub
Formula
{source_count} - {insert_count} - {match_count} - {filter_count}
Labels
| Job | Label used in flow table |
|---|---|
| Count the number of records in the staging table. | source_count |
Count the new records loaded into the hub (i.e., the new inserted because new business key identified). Technically a new hash_key has been identified. |
insert_count |
Count the number of records not loaded into the hub because the business key already exists in the hub. Technically the hash_key already exists in the hub. |
match_count |
| Count the number of records not loaded into hub due to the hard rule being broken. | reject_count |
Satellite
Formula
{source_count} - {insert_count} - ({update_insert_count} + {end_date_count})/2 - {match_count}
Labels
| Job | Label used in flow table |
|---|---|
| Count the number of records in the staging table. | source_count |
Count new records loaded into satellite ( i.e., the new is inserted because new business key identified). Technically a new hash_key identified. |
insert_count |
Count the number of records where the business key already exists in satellite but new descriptive information needs to be loaded. Technically hash_key already exists in the satellite but the new hash_diff_key needs to be inserted. |
update_insert_count |
| Count the number of records where business key already exists but needs to be end dated because new descriptive information has been identified and loaded. | end_date_count |
Count the number of records that already exist in satellite and have no update. Technically hash_key and hash_diff_key are the same. |
match_count |
| Count the number of records rejected due to the hard rule being broken. | reject_count |
Link
Formula
{source_count} - {insert_count} - {match_count} - {filter_count}
Labels
| Job | Label used in flow table |
|---|---|
| In source, count the number of records with a business key combination that defines the link. | source_count |
Count the number of new records loaded into the link ( i.e., the new inserted because the new business key combination was identified). Technically a new hash_key has been identified. |
insert_count |
Count the number of records not loaded into the link because the business key combination already exists in the link. Technically the hash_key already exists in the link. |
match_count |
| Count the number of records not loaded into the link due to the hard rule being broken. | reject_count |
Dimensional Layer¶
Dimension Type 1
Formula
{source_count} - {insert_count} - {match_count} - {update_count}
Labels
| Job | Label used in flow table |
|---|---|
| Count the number of records in the source query. | source_count |
Count the number of new records loaded into the dimension ( i.e., the new is inserted because new business key identified). Technically a new hash_key identified. |
insert_count |
Count the number of records where the business key already exists in dimension but new descriptive information needs to be loaded. Technically hash_key already exists in dimension but new hash_diff_key needs to be inserted. |
update_count |
Count the number of records not loaded into the dimension because the business key combination already exists in the dimension. Technically the hash_key already exists in the dimension. |
match_count |
Dimension Type 2
Formula
{source_count} - {insert_count} - ({update_insert_count} + {end_date_count})/2 - {match_count}
Labels
| Job | Label used in flow table |
|---|---|
| Count the number of records in the source query. | source_count |
Count new records loaded into dimension (i.e.the new is inserted because new business key identified). Technically a new hash_key identified. |
insert_count |
Count the number of records where the business key already exists in the dimension but new descriptive information is needed to be loaded. Technically hash_key already exists in dimension but new hash_diff_key needs to be inserted. |
update_insert_count |
| Count the number of records where the business key already exists but needs to be end dated because new descriptive information has been identified and loaded. | end_date_count |
Count the number of records that already exist in dimension and have no update. Technically hash_key and hash_diff_key are the same. |
match_count |
| Count the number of records rejected due to business. | reject_count |
Accumulating Snapshot Fact
Formula
{source_count} - {insert_count} - {update_count}
Labels
| Job | Label used in flow table |
|---|---|
| Count the number of records in the source query. | source_count |
Count the number of records where the business key already exists in the fact but new descriptive information is needed to be loaded. Technically hash_key already exists in dimension but new hash_diff_key needs to be updated. |
update_insert_count |
Count the number of new records loaded into the fact (i.e., the new is inserted because new business key identified). Technically a new hash_key identified. |
insert_count |
Snapshot Fact
Formula
{source_count} - {insert_count} - {update_count}
Labels
| Job | Label used in flow table |
|---|---|
| Count the number of records in the source query. | source_count |
Count the number of records where the business key already exists in the fact but new descriptive information is needed to be loaded. Technically hash_key already exists in dimension but new hash_diff_key needs to be updated. |
update_insert_count |
Count the number of new records loaded into the fact (i.e., the new is inserted because new business key identified). Technically a new hash_key identified. |
insert_count |
Transaction Fact
Formula
{source_count} - {insert_count}
Labels
| Job | Label used in flow table |
|---|---|
| Count the number of records in the source query. | source_count |
Count the number of new records loaded into the fact (i.e., the new is inserted because new business key identified). Technically a new hash_key identified. |
insert_count |