Artie Transfer

Overview

In this section, we will go over the metrics that Artie Transfer emits and the future roadmap.

Today

Artie Transfer currently supports only metrics, and integrates with Datadog. We are committed to being vendor neutral, but not at the cost of stability and reliability. As such, we will adopt OpenTelemetry once the library is stable.
We also plan to support application tracing so that Transfer can plug directly into your APM provider.

Metrics

You can specify additional tags and a namespace in the configuration file, and they will apply to every metric that Transfer emits. See Options for more details.
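As a rough illustration of how a namespace and global tags end up on every metric, the sketch below formats a datagram in the DogStatsD wire format (`<name>:<value>|<type>|#<tag>,<tag>`). The `build_datagram` helper, the `transfer.` namespace value, and the `env:production` tag are assumptions made for this example; they are not taken from Transfer's code or configuration.

```python
def build_datagram(name, value, metric_type, tags, namespace="", global_tags=()):
    """Format one DogStatsD datagram: <name>:<value>|<type>|#<tag>,<tag>."""
    # Global tags from the configuration are attached to every metric,
    # followed by the tags specific to this metric.
    all_tags = list(global_tags) + list(tags)
    line = f"{namespace}{name}:{value}|{metric_type}"
    if all_tags:
        line += "|#" + ",".join(all_tags)
    return line


# Hypothetical configuration values, for illustration only.
NAMESPACE = "transfer."
GLOBAL_TAGS = ["env:production"]

# "ms" is the DogStatsD timer type.
datagram = build_datagram("flush", 12, "ms", ["table:orders"], NAMESPACE, GLOBAL_TAGS)
print(datagram)
```

Because the namespace and global tags are applied in one place, individual call sites only need to supply the metric-specific tags.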
| Name | Description | Unit | Tags |
| --- | --- | --- | --- |
| transfer.ingestion.lag.95percentile | p95 of the time lag between when a Kafka message was published and when it was received. Since Transfer 1.4.6. | ms | groupid, topic, partition, table |
| transfer.ingestion.lag.avg | Average time lag between when a Kafka message was published and when it was received. Since Transfer 1.4.6. If you are self-hosting Transfer, this is a good metric to set a monitor on. | ms | groupid, topic, partition, table |
| transfer.ingestion.lag.max | Maximum time lag between when a Kafka message was published and when it was received. Since Transfer 1.4.6. | ms | groupid, topic, partition, table |
| transfer.process.message.count | Number of rows that Transfer has processed. | Count | database, schema, table, groupid, op, skipped |
| transfer.process.message.95percentile | p95 of how long it takes to process each row. | ms | database, schema, table, groupid, op, skipped |
| transfer.process.message.avg | Average of how long it takes to process each row. | ms | database, schema, table, groupid, op, skipped |
| transfer.process.message.max | Maximum of how long it takes to process each row. | ms | database, schema, table, groupid, op, skipped |
| transfer.process.message.median | Median of how long it takes to process each row. | ms | database, schema, table, groupid, skipped |
| transfer.flush.count | Number of flush operations that have been performed. | Count | database, schema, table, what, reason |
| transfer.flush.95percentile | p95 of how long each flush takes. | ms | database, schema, table, what, reason |
| transfer.flush.avg | Average of how long each flush takes. | ms | database, schema, table, what, reason |
| transfer.flush.max | Maximum of how long each flush takes. | ms | database, schema, table, what, reason |
| transfer.flush.median | Median of how long each flush takes. | ms | database, schema, table, what, reason |
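The `.count`, `.avg`, `.median`, `.max`, and `.95percentile` suffixes above read as aggregations over individual timing samples. The sketch below shows one way such values could be derived from raw samples, and how an ingestion lag could be computed from two timestamps. The helper names are hypothetical, and the nearest-rank percentile method is an assumption; the metrics backend may compute percentiles differently.

```python
import math
import statistics
from datetime import datetime, timedelta


def lag_ms(published_at, received_at):
    """Lag in milliseconds between publish time and receive time."""
    return (received_at - published_at).total_seconds() * 1000


def summarize(samples_ms):
    """Aggregate timing samples into count, avg, median, max, and p95."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Nearest-rank p95: the smallest sample with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * len(ordered))
    return {
        "count": len(ordered),
        "avg": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "max": ordered[-1],
        "95percentile": ordered[rank - 1],
    }


published = datetime(2024, 1, 1, 0, 0, 0)
received = published + timedelta(milliseconds=250)
example_lag = lag_ms(published, received)

summary = summarize([10, 20, 30, 40, 100])
print(example_lag, summary)
```

With a single slow sample in a small window, the max and p95 move sharply while the median barely changes, which is why the average and p95 are often more useful to monitor than the median alone.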

The what tag explained

The what tag shows at a glance whether an attempt succeeded. If it did not, the tag identifies which particular operation failed, rather than reporting a generic error state.
Transfer will provide what:success if the attempt succeeded, and a different value that reflects the error state if it failed. This way, our monitors and responses to failures can be more actionable, and we can jump straight to the offending code block.
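The tagging scheme described above can be sketched as follows. This is a minimal illustration, not Transfer's implementation: the `run_flush` helper, the step names, and the `<step>_err` naming for failures are all hypothetical.

```python
def run_flush(steps):
    """Run named steps in order and return the value for the `what` tag.

    Returns "success" if every step completes, otherwise a value naming
    the first step that failed (hypothetical "<step>_err" convention).
    """
    for name, step in steps:
        try:
            step()
        except Exception:
            return f"{name}_err"
    return "success"


def ok():
    """A step that completes without error."""


def boom():
    """A step that always fails, to exercise the error path."""
    raise RuntimeError("simulated failure")


# Hypothetical step names, for illustration only.
good = run_flush([("create_table", ok), ("merge", ok)])
bad = run_flush([("create_table", ok), ("merge", boom)])

print(f"what:{good}")
print(f"what:{bad}")
```

Tagging the failing step this way lets a monitor alert on anything other than what:success and group failures by the operation that caused them.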
Here's a visualization of what this looks like