Skip to content

Entity Constant Name

value cast required comment
blank_as_null boolean false Loads blank fields, which consist of only white space characters, as NULL.
ceiling_guardrail integer false Largest acceptable value after deviation.
column_delimiter char(5) false The character(s) used to delimit the file.
column_quote_character char(1) false The character(s) used to quote values.
control_file_path text false Path to the control file.
control_file_pattern text false Regular expression pattern of the control file used to find the files in the above path.
control_file_server text false Server where the control file is stored. For AWS this is the S3 bucket, for GCP this is the Cloud Storage bucket, for on-premises this is the DNS or IP address of the file server.
data_file_path text false Path to the data file.
data_file_pattern text false Regular expression pattern of the data file used to find the files in the above path.
data_file_server text false Server where the data file is stored. For AWS this is the S3 bucket, for GCP this is the Cloud Storage bucket, for on-premises this is the DNS or IP address of the file server.
database_name text false Database name used within connection string.
database_port integer false Database port number used within connection string.
database_server text false Database host/IP used within connection string.
date_format text false Format of the date used to convert the extract date into a date object. The format should match the one expected by the ETL tool.
date_pattern text false Regular expression pattern of the data file used to find the files in the above path.
empty_as_null boolean false Loads empty fields, which do not consist of any characters, as NULL.
end_of_transmision_file_path text false Path to the end of transmission file.
end_of_transmision_file_pattern text false Regular expression pattern of the end of transmission file used to find the files in the above path.
end_of_transmision_file_server text false Server where the end of transmission file is stored. For AWS this is the S3 bucket, for GCP this is the Cloud Storage bucket, for on-premises this is the DNS or IP address of the file server.
extract_job text false Fully qualified name of the job that will extract the data from the source entity.
floor_guardrail integer false Smallest acceptable value after deviation.
hash_diff_column_name text false Name of the column that stores the MD5 checksum of the concatenated non-primary key columns.
hash_function text false Hash function used to create a hash of the original value for checksum purposes.
hash_key_column_name text false Name of the column that stores the MD5 checksum of the concatenated primary key columns.
header_count integer false Number of rows at the beginning of the file that should be ignored.
header_count_end_range integer false End index of the file count header field.
header_count_start_range integer false Start index of the file count header field.
header_date_end_range text false End index of the file date header field.
header_date_start_range text false Start index of the file date header field.
html_file_path text false Fully qualified name of the HTML file that will be produced by the data profiling operation.
html_file_server text false Server where the control file is stored. For AWS this is the S3 bucket, for GCP this is the Cloud Storage bucket, for on-premises this is the DNS or IP address of the file server.
json_schema text false JSON schema to be used by the ETL tool to parse the file. If not present json_schema_file_path is required.
json_schema_file_path text false Path to the JSON schema file. If not present json_schema is required.
load_job text false Fully qualified name of the job that will load the data into the target entity.
max_diviation integer false Numeric value of the maximum deviation required.
min_diviation integer false Numeric value of the minimum deviation required.
notebook_file_path text false Fully qualified name of the Jupyter Notebook file that will be produced by the data profiling operation.
notebook_file_server text false Server where the control file is stored. For AWS this is the S3 bucket, for GCP this is the Cloud Storage bucket, for on-premises this is the DNS or IP address of the file server.
orchestration_job text false Fully qualified name of the job that will orchestrate the source file load.
read_view text false Default database view used for reading from the database table.
replace_pattern text false Pattern used to replace characters selected by the select_pattern.
schema_name text false Database schema name of the database table.
seed_provider text false Provider name based on your selected fake data library.
select_pattern text false Regular expression pattern used to select the text to mask (e.g., ^d{12} to select the first 12 numbers of a credit card number).
surrogate_key_column_name text false Database table column name of the surrogate key column.
table_name text false Database table name.
table_view text false Database table name.
token_pattern text false Regular expression pattern used to generate a unique token whos pattern matches the original value (e.g., d{16} for credit card numbers).
trailer_count integer false Number of rows at the end of the file that should be ignored.
transform_job text false Fully qualified name of the job that will transform the extracted data from the source entity.
transform_sql text false For instances where a view cannot be created, SQL transformation logic for the job can be stored here.
transform_view text false View that contains transformation logic for the job.
trim_blanks boolean false Removes the trailing white space characters.
watermark_field text false Field used for pipeline checkpointing.
xml_schema text false
xml_schema_file_path text false