Options
This page describes the available configuration settings for Artie Transfer.
Below are the various options that can be specified in a configuration file. Once the file has been created, you can run Artie Transfer like this:
```bash
/transfer -c /path/to/config.yaml
```
Note: Keys here are written in dot notation for readability; please ensure the proper nesting is used when writing them into your configuration file. To see sample configuration files, visit the Examples page.
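For example, a key written below as `reporting.sentry.dsn` corresponds to the following nesting in YAML (the DSN shown is a placeholder):

```yaml
# reporting.sentry.dsn in dot notation maps to this nested structure:
reporting:
  sentry:
    dsn: "https://public@sentry.example.com/1"  # placeholder DSN
```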
| Key | Optional | Description |
| --- | --- | --- |
| outputSource | N | The destination. Supported values are currently the destinations documented below: `snowflake`, `bigquery`, `redshift`, `s3`. |
| queue | Y | Defaults to `kafka`. Other valid options are `kafka` and `pubsub`. Please check the respective sections below for what else is required. |
| reporting.sentry.dsn | Y | DSN for Sentry alerts. If blank, errors will just go to standard out. |
| flushIntervalSeconds | Y | Defaults to `10`. Valid range is between 5 seconds and 6 hours. |
| bufferRows | Y | Defaults to `15000`. When using the BigQuery and Snowflake stages, there is no limit. For Snowflake, the valid range is between 5 and 15000. |
| flushSizeKb | Y | Defaults to `25mb`. When the in-memory database is greater than this value, it will trigger a flush cycle. |
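Putting the keys above together, a minimal top-level section could look like the sketch below; the values shown are illustrative only, not recommendations:

```yaml
outputSource: snowflake      # destination; see the destination sections below
queue: kafka                 # or pubsub
flushIntervalSeconds: 10     # valid range: 5 seconds to 6 hours
bufferRows: 15000            # the default
flushSizeKb: 25000           # illustrative; flush triggers once the in-memory database exceeds this size
reporting:
  sentry:
    dsn: ""                  # leave blank to log to standard out
```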
Kafka
| Key | Optional | Description |
| --- | --- | --- |
| kafka.bootstrapServer | N | Comma-separated list of bootstrap servers, following the same spec as Kafka. Examples: `localhost:9092` or `host1:port1,host2:port2`. |
| kafka.groupID | N | Consumer group ID. |
| kafka.username | Y | Username (Transfer currently only supports plain SASL or no authentication). |
| kafka.password | Y | Password. |
| kafka.enableAWSMSKIAM | Y | Defaults to `false`. Turn this on if you want to use IAM authentication for communicating with Amazon MSK. Make sure to unset `username` and `password` and provide `AWS_REGION`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY`. |
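As a rough illustration of the keys above, a Kafka section might look like the following sketch; the server addresses and credentials are made up:

```yaml
kafka:
  bootstrapServer: "host1:9092,host2:9092"  # comma-separated list
  groupID: transfer-group                   # example consumer group ID
  username: artie                           # plain SASL; omit both for no auth
  password: "********"
  # enableAWSMSKIAM: true                   # alternatively, use IAM auth for Amazon MSK
  #                                         # (unset username/password, provide AWS_* env vars)
```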
TopicConfigs
`topicConfigs` are used at the table level and store configurations such as:
- The destination's database, schema, and table name.
- What the data format looks like, and whether there is an idempotent key.
- Whether it should do row-based soft deletion or not.
- Whether it should drop deleted columns or not.
```yaml
kafka:
  topicConfigs:
    - { }
    - { }
# OR as
pubsub:
  topicConfigs:
    - { }
    - { }
```
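For reference, a filled-in entry might look like the sketch below. The database, schema, and topic names are made up, and the `cdcFormat` value is just one example of a Debezium-based format; the individual keys are described in the table that follows:

```yaml
kafka:
  topicConfigs:
    - db: production                      # destination database (example name)
      schema: public                      # required for Snowflake; not needed for BigQuery
      topic: dbserver1.public.customers   # Kafka topic (example name)
      cdcFormat: debezium.postgres        # example CDC format
      cdcKeyFormat: org.apache.kafka.connect.storage.StringConverter
      idempotentKey: updated_at           # example idempotency column
      softDelete: true
```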
| Key | Optional | Description |
| --- | --- | --- |
| *.topicConfigs[0].db | N | Name of the database in the destination. |
| *.topicConfigs[0].tableName | Y | Name of the table in the destination. If not provided, we'll use the table name from the event. If provided, `tableName` acts as an override. |
| *.topicConfigs[0].schema | N | Name of the schema in Snowflake (required). Not needed for BigQuery. |
| *.topicConfigs[0].topic | N | Name of the Kafka topic. |
| *.topicConfigs[0].idempotentKey | N | Name of the column that is used for idempotency. This field is highly recommended. For example: `updated_at` or another timestamp column. |
| *.topicConfigs[0].cdcFormat | N | Name of the CDC connector (and thus the format) that Transfer should expect to parse against, for example a Debezium connector format. |
| *.topicConfigs[0].cdcKeyFormat | N | Format that Kafka Connect will use for the key. This is called `key.converter` in the Kafka Connect properties file. The supported values are `org.apache.kafka.connect.storage.StringConverter` and `org.apache.kafka.connect.json.JsonConverter`. If not provided, the default value is `org.apache.kafka.connect.storage.StringConverter`. |
| *.topicConfigs[0].dropDeletedColumns | Y | Defaults to `false`. When set to `true`, Transfer will drop columns in the destination when it detects that the source has dropped them. This setting should be turned on if your organization follows standard practice around database migrations. Available starting `transfer:1.4.4`. |
| *.topicConfigs[0].softDelete | Y | Defaults to `false`. When set to `true`, Transfer will add an additional column called `__artie_delete` and set it to `true` instead of issuing a hard deletion. Available starting `transfer:1.4.4`. |
| *.topicConfigs[0].skipDelete | Y | Defaults to `false`. When set to `true`, Transfer will skip delete events. Available starting `transfer:2.0.48`. |
| *.topicConfigs[0].includeArtieUpdatedAt | Y | Defaults to `false`. When set to `true`, Transfer will emit an additional timestamp column named `__artie_updated_at` which signifies when this row was processed. Available starting `transfer:2.0.17`. |
| *.topicConfigs[0].bigQueryPartitionSettings | Y | Enable this to turn on BigQuery table partitioning. Available starting `transfer:2.0.24`. See the example below. |
Example:
```yaml
bigQueryPartitionSettings:
  partitionType: time
  partitionField: ts
  partitionBy: daily
```
| Key | Optional | Description |
| --- | --- | --- |
| partitionType | N | Type of partitioning. Currently, we support only time-based partitioning; the only valid value right now is `time`. |
| partitionField | N | Which field or column is being partitioned on. |
| partitionBy | N | The time granularity used for time partitioning. The only valid value right now is `daily`. |
Pub/Sub
| Key | Optional | Description |
| --- | --- | --- |
| pubsub.projectID | N | Google Cloud project ID. |
| pubsub.pathToCredentials | N | Path to the credentials file for Google. Note: Transfer can use different credentials for BigQuery and Pub/Sub, so you can consume from one project and write to BigQuery in another. |
| pubsub.topicConfigs | N | The topicConfigs here follow the same convention as `kafka.topicConfigs`. Please see above. |
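A Pub/Sub section could be sketched as follows; the project ID, credentials path, and topic are placeholders:

```yaml
pubsub:
  projectID: my-gcp-project                 # placeholder project ID
  pathToCredentials: /path/to/pubsub.json   # placeholder path
  topicConfigs:                             # same shape as kafka.topicConfigs above
    - db: production
      schema: public
      topic: dbserver1.public.customers
      cdcFormat: debezium.postgres          # example CDC format
```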
BigQuery
| Key | Optional | Description |
| --- | --- | --- |
| bigquery.pathToCredentials | Y | Path to the credentials file for Google. You can also set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable directly; otherwise, Transfer will set it for you based on the value provided here. |
| bigquery.projectID | N | Google Cloud project ID. |
| bigquery.location | Y | Location of the BigQuery dataset. Defaults to `us`. |
| bigquery.defaultDataset | N | The default dataset used. This just allows us to connect to BigQuery using data source name (DSN) notation. |
| bigquery.batchSize | Y | Batch size used to chunk requests to BigQuery's Storage API to avoid the 10 MB limit. Defaults to `1000` if not provided. |
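A BigQuery section might look like the sketch below; the project ID, dataset, and credentials path are placeholders:

```yaml
bigquery:
  pathToCredentials: /path/to/bq.json   # placeholder; or set GOOGLE_APPLICATION_CREDENTIALS yourself
  projectID: my-gcp-project             # placeholder project ID
  location: us                          # the default
  defaultDataset: artie                 # placeholder dataset name
  batchSize: 1000                       # the default
```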
sharedTransferConfig
| Key | Optional | Description |
| --- | --- | --- |
| sharedTransferConfig.additionalDateFormats | Y | Additional date formats that Transfer will use when parsing dates, e.g. `02/01/06` (DD/MM/YY) and `02/01/2006` (DD/MM/YYYY). See the sketch below. |
| sharedTransferConfig.createAllColumnsIfAvailable | Y | Boolean field. If set to `true`, Transfer will create columns even if the value is `NULL`. |
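Combining both keys, the block could be written as the sketch below; the date formats are the ones from the example above:

```yaml
sharedTransferConfig:
  additionalDateFormats:
    - 02/01/06     # DD/MM/YY
    - 02/01/2006   # DD/MM/YYYY
  createAllColumnsIfAvailable: true
```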
sharedDestinationConfig
| Key | Optional | Description |
| --- | --- | --- |
| sharedDestinationConfig.uppercaseEscapedNames | Y | Defaults to `false`. When enabled, escaped table and column names will be in upper case. |
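In YAML form, this single key looks like the following (shown enabled here as an example):

```yaml
sharedDestinationConfig:
  uppercaseEscapedNames: true   # default is false
```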
Snowflake
| Key | Optional | Description |
| --- | --- | --- |
| snowflake.account | N | Snowflake account identifier. |
| snowflake.username | N | Snowflake username. |
| snowflake.password | N | Snowflake password. |
| snowflake.warehouse | N | Snowflake virtual warehouse name. |
| snowflake.region | N | Snowflake region. |
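A Snowflake section might look like the sketch below; every value shown is a placeholder:

```yaml
snowflake:
  account: ab12345              # placeholder account identifier
  username: artie               # placeholder username
  password: "********"
  warehouse: transfer_wh        # placeholder warehouse name
  region: us-east-1             # placeholder region
```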
Redshift
| Key | Optional | Description |
| --- | --- | --- |
| redshift.host | N | Host URL, e.g. `test-cluster.us-east-1.redshift.amazonaws.com`. |
| redshift.port | N | Redshift port. |
| redshift.database | N | Namespace / database in Redshift. |
| redshift.username | N | Redshift username. |
| redshift.password | N | Redshift password. |
| redshift.bucket | N | Bucket where staging files will be stored. See the guide on setting up an S3 bucket and having it automatically purged based on expiration. |
| redshift.optionalS3Prefix | Y | The prefix for S3. For example, if the bucket is `foo` and the prefix is `bar`, the path becomes `s3://foo/bar/file.txt`. |
| redshift.credentialsClause | N | |
| redshift.skipLgCols | Y | Defaults to `false`. When enabled, Artie Transfer will mask the column value: if the value is a string, with `__artie_exceeded_value`; if the value is a struct / SUPER, with `{"key":"__artie_exceeded_value"}`. |
S3
| Key | Optional | Description |
| --- | --- | --- |
| s3.optionalPrefix | Y | Prefix after the bucket name. |
| s3.bucket | N | S3 bucket name. |
| s3.awsAccessKeyID | N | The `AWS_ACCESS_KEY_ID` for the service account. |
| s3.awsSecretAccessKey | N | The `AWS_SECRET_ACCESS_KEY` for the service account. |
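An S3 section might be sketched as follows; the bucket name and keys are placeholders:

```yaml
s3:
  optionalPrefix: artie                       # placeholder prefix
  bucket: my-output-bucket                    # placeholder bucket name
  awsAccessKeyID: "<access-key-id>"           # placeholder
  awsSecretAccessKey: "<secret-access-key>"   # placeholder
```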
Telemetry
| Key | Type | Optional | Description |
| --- | --- | --- | --- |
| telemetry.metrics | Object | Y | Parent object. See below. |
| telemetry.metrics.provider | String | Y | Provider to export metrics to. Transfer currently only supports `datadog`. |
| telemetry.metrics.settings | Object | Y | Additional settings block. See below. |
| telemetry.metrics.settings.tags | Array | Y | Tags that will appear on every metric, e.g. `env:production`, `company:foo`. |
| telemetry.metrics.settings.namespace | String | Y | Optional namespace prefix for metrics. Defaults to `transfer.` if none is provided. |
| telemetry.metrics.settings.addr | String | Y | Address where the StatsD agent is running. Defaults to `127.0.0.1:8125` if none is provided. |
| telemetry.metrics.settings.sampling | Number | Y | Percentage of data to send, as a number between 0 and 1. Defaults to `1` if none is provided. |
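Putting the telemetry keys together, a Datadog-backed metrics block could look like the sketch below; the tags are examples and the remaining values are the documented defaults:

```yaml
telemetry:
  metrics:
    provider: datadog
    settings:
      tags:
        - "env:production"      # example tag
        - "company:foo"         # example tag
      namespace: "transfer."    # the default
      addr: "127.0.0.1:8125"    # the default StatsD address
      sampling: 1               # the default (send 100% of data)
```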