Note

This technote is not yet published.

This technote describes the EFD Aggregator, a Kafka aggregator based on the Faust Python Stream Processing library.

1 Introduction

Figure 1 Components of the EFD at the LSST Data Facility (LDF) showing the Aggregator, the Replicator and other connectors to write data from Kafka into InfluxDB, PostgreSQL and Parquet.

The EFD aggregator is a Faust-based application. Faust agents consume topics from Kafka and perform window aggregation on predefined topic fields; the window size is configurable (DM-24403).
A new aggregated topic is produced, and its schema is registered in the secondary Schema Registry to avoid schema ID collisions (DM-24856).
We plan to run a dedicated two-node Kafka Connect cluster to consume the full and/or aggregated streams and write them to different formats using open-source Kafka connectors. The InfluxDB Sink connector we currently use comes from the stream-reactor project, and both the JDBC and Amazon S3 connectors are distributed under the Confluent Community License.
The system is flexible enough to record either the full or aggregated stream in multiple formats.
We have successfully tested data replication using the Confluent Replicator connector (DM-23973), but we plan to replace it with an open-source alternative. The best option for now is Mirus (DM-24774).
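
To make the window aggregation step concrete, here is a minimal pure-Python sketch of a tumbling-window aggregation over one topic field. It does not use Faust and is not the aggregator's actual code. The message layout, the `timestamp` and `temperature` field names, the `WINDOW_SIZE` default, and the choice of summary statistics (count, min, mean, max) are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical window size in seconds; the real aggregator makes this configurable.
WINDOW_SIZE = 1.0


def window_start(timestamp, size=WINDOW_SIZE):
    """Return the start of the tumbling window that contains `timestamp`."""
    return math.floor(timestamp / size) * size


def aggregate(messages, field, size=WINDOW_SIZE):
    """Group messages into fixed-size windows and summarize one field.

    `messages` is an iterable of dicts with a "timestamp" key (seconds)
    and the field to aggregate. Both names are illustrative assumptions.
    """
    windows = defaultdict(list)
    for msg in messages:
        windows[window_start(msg["timestamp"], size)].append(msg[field])

    # One aggregated record per window, ordered by window start time.
    return [
        {
            "window_start": start,
            "count": len(values),
            "min": min(values),
            "mean": mean(values),
            "max": max(values),
        }
        for start, values in sorted(windows.items())
    ]


# Example input: three samples of a hypothetical "temperature" field.
samples = [
    {"timestamp": 0.1, "temperature": 10.0},
    {"timestamp": 0.4, "temperature": 12.0},
    {"timestamp": 1.2, "temperature": 20.0},
]
aggregated = aggregate(samples, "temperature")
```

With a one-second window, the first two samples fall in the window starting at 0.0 and the third in the window starting at 1.0, so `aggregated` holds two records. In a streaming setting the same grouping would be applied incrementally as messages arrive rather than over a complete list.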

SQR-040: The EFD Aggregator
