| blank_as_null |
boolean |
false |
Loads blank fields, which consist of only white space characters, as NULL. |
| ceiling_guardrail |
integer |
false |
Largest acceptable value after deviation. |
| column_delimiter |
char(5) |
false |
The character(s) used to delimit the file. |
| column_quote_character |
char(1) |
false |
The character(s) used to quote values. |
| control_file_path |
text |
false |
Path to the control file. |
| control_file_pattern |
text |
false |
Regular expression pattern of the control file used to find the files in the above path. |
| control_file_server |
text |
false |
Server where the control file is stored. For AWS this is the S3 bucket, for GCP this is the Cloud Storage bucket, for on-premises this is the DNS or IP address of the file server. |
| data_file_path |
text |
false |
Path to the data file. |
| data_file_pattern |
text |
false |
Regular expression pattern of the data file used to find the files in the above path. |
| data_file_server |
text |
false |
Server where the data file is stored. For AWS this is the S3 bucket, for GCP this is the Cloud Storage bucket, for on-premises this is the DNS or IP address of the file server. |
| database_name |
text |
false |
Database name used within connection string. |
| database_port |
integer |
false |
Database port number used within connection string. |
| database_server |
text |
false |
Database host/IP used within connection string. |
| date_format |
text |
false |
Format of the date used to convert the extract date into a date object. The format should match the one expected by the ETL tool. |
| date_pattern |
text |
false |
Regular expression pattern of the data file used to find the files in the above path. |
| empty_as_null |
boolean |
false |
Loads empty fields, which do not consist of any characters, as NULL. |
| end_of_transmision_file_path |
text |
false |
Path to the end of transmission file. |
| end_of_transmision_file_pattern |
text |
false |
Regular expression pattern of the end of transmission file used to find the files in the above path. |
| end_of_transmision_file_server |
text |
false |
Server where the end of transmission file is stored. For AWS this is the S3 bucket, for GCP this is the Cloud Storage bucket, for on-premises this is the DNS or IP address of the file server. |
| extract_job |
text |
false |
Fully qualified name of the job that will extract the data from the source entity. |
| floor_guardrail |
integer |
false |
Smallest acceptable value after deviation. |
| hash_diff_column_name |
text |
false |
Name of the column that stores the MD5 checksum of the concatenated non-primary key columns. |
| hash_function |
text |
false |
Hash function used to create a hash of the original value for checksum purposes. |
| hash_key_column_name |
text |
false |
Name of the column that stores the MD5 checksum of the concatenated primary key columns. |
| header_count |
integer |
false |
Number of rows at the beginning of the file that should be ignored. |
| header_count_end_range |
integer |
false |
End index of the file count header field. |
| header_count_start_range |
integer |
false |
Start index of the file count header field. |
| header_date_end_range |
text |
false |
End index of the file date header field. |
| header_date_start_range |
text |
false |
Start index of the file date header field. |
| html_file_path |
text |
false |
Fully qualified name of the HTML file that will be produced by the data profiling operation. |
| html_file_server |
text |
false |
Server where the control file is stored. For AWS this is the S3 bucket, for GCP this is the Cloud Storage bucket, for on-premises this is the DNS or IP address of the file server. |
| json_schema |
text |
false |
JSON schema to be used by the ETL tool to parse the file. If not present json_schema_file_path is required. |
| json_schema_file_path |
text |
false |
Path to the JSON schema file. If not present json_schema is required. |
| load_job |
text |
false |
Fully qualified name of the job that will load the data into the target entity. |
| max_diviation |
integer |
false |
Numeric value of the maximum deviation required. |
| min_diviation |
integer |
false |
Numeric value of the minimum deviation required. |
| notebook_file_path |
text |
false |
Fully qualified name of the Jupyter Notebook file that will be produced by the data profiling operation. |
| notebook_file_server |
text |
false |
Server where the control file is stored. For AWS this is the S3 bucket, for GCP this is the Cloud Storage bucket, for on-premises this is the DNS or IP address of the file server. |
| orchestration_job |
text |
false |
Fully qualified name of the job that will orchestrate the source file load. |
| read_view |
text |
false |
Default database view used for reading from the database table. |
| replace_pattern |
text |
false |
Pattern used to replace characters selected by the select_pattern. |
| schema_name |
text |
false |
Database schema name of the database table. |
| seed_provider |
text |
false |
Provider name based on your selected fake data library. |
| select_pattern |
text |
false |
Regular expression pattern used to select the text to mask (e.g., ^d{12} to select the first 12 numbers of a credit card number). |
| surrogate_key_column_name |
text |
false |
Database table column name of the surrogate key column. |
| table_name |
text |
false |
Database table name. |
| table_view |
text |
false |
Database table name. |
| token_pattern |
text |
false |
Regular expression pattern used to generate a unique token whos pattern matches the original value (e.g., d{16} for credit card numbers). |
| trailer_count |
integer |
false |
Number of rows at the end of the file that should be ignored. |
| transform_job |
text |
false |
Fully qualified name of the job that will transform the extracted data from the source entity. |
| transform_sql |
text |
false |
For instances where a view cannot be created, SQL transformation logic for the job can be stored here. |
| transform_view |
text |
false |
View that contains transformation logic for the job. |
| trim_blanks |
boolean |
false |
Removes the trailing white space characters. |
| watermark_field |
text |
false |
Field used for pipeline checkpointing. |
| xml_schema |
text |
false |
|
| xml_schema_file_path |
text |
false |
|