Azure Data Lake Storage Gen2 Sink Connector

Overview

The Azure Data Lake Storage Gen2 Sink Connector can be used to load data from Kafka topics to a container in an ADLS Gen2 storage account.
The records are combined and stored in a file conforming to the Avro Object Container File specification.
All records in a container file are read from the same topic partition.

Features

  • Offload data from Kafka topics to Azure Storage

  • Configurable staging and target directories

  • Target directory supports timestamp patterns, so files are grouped into directories according to timestamps

  • Use of record timestamps or processing time for pattern resolution

  • Multiple file rotation triggers

    • Timestamp resolves into a different target directory path

    • Key or value data type changes (including a change of Avro schema)

    • Number of records in the container file

    • Size of the container file

    • Inactivity on the partition

  • Kafka topic offset commits can be restricted to the offsets of records that have already been rotated to the target directory
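
The timestamp pattern and rotation triggers listed above can be illustrated with a small sketch. This is not the connector's implementation or configuration: the strftime-style pattern syntax, the function names (resolve_target_dir, rotation_reason) and the limit fields (max_records, max_bytes, max_idle_ms) are all hypothetical and chosen only for illustration. It shows one plausible way a writer could resolve a target directory from a timestamp and decide when the current container file should be rotated.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def resolve_target_dir(pattern: str, timestamp_ms: int) -> str:
    """Expand strftime-style fields in `pattern` (assumed syntax) from a UTC epoch-millisecond timestamp."""
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return moment.strftime(pattern)


@dataclass
class OpenFile:
    """State of the container file currently being written for one partition."""
    target_dir: str
    schema_id: str
    record_count: int = 0
    size_bytes: int = 0
    last_write_ms: int = 0


@dataclass
class Limits:
    """Hypothetical rotation thresholds; the real connector's settings may differ."""
    max_records: int
    max_bytes: int
    max_idle_ms: int


def rotation_reason(current: OpenFile, limits: Limits, next_dir: str,
                    next_schema_id: str, now_ms: int) -> Optional[str]:
    """Return the name of the first trigger that fires, or None to keep writing."""
    if next_dir != current.target_dir:
        return "target-directory"   # timestamp resolves into a different path
    if next_schema_id != current.schema_id:
        return "schema-change"      # key or value type/schema changed
    if current.record_count >= limits.max_records:
        return "record-count"
    if current.size_bytes >= limits.max_bytes:
        return "file-size"
    if now_ms - current.last_write_ms >= limits.max_idle_ms:
        return "inactivity"
    return None
```

In this sketch the timestamp passed to resolve_target_dir would be either the record timestamp or the current processing time, depending on which mode is chosen. A real writer would also need to evaluate the inactivity check on a timer, since no incoming record is available to trigger it on an idle partition.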

License

The Azure Data Lake Storage Gen2 Sink Connector is licensed under the Apache License, Version 2.0.

Source code

The source code for the connector is available in the Axual public repository.