SQR-040: The EFD Aggregator

  • Angelo Fausti

Latest Revision: 2020-05-11

Note

This technote is not yet published.

This technote describes the EFD Aggregator, a Kafka aggregator based on the Faust Python Stream Processing library.

1   Introduction

_images/efd_aggregator.png

Figure 1 Components of the EFD at the LSST Data Facility (LDF) showing the Aggregator, the Replicator and other connectors to write data from Kafka into InfluxDB, PostgreSQL and Parquet.

  • The EFD Aggregator is a Faust-based application. Faust agents consume topics from Kafka and perform windowed aggregation on predefined topic fields; the window size is configurable (DM-24403).
  • A new aggregated topic is produced, and its schema is registered in a secondary Schema Registry to avoid schema ID collisions (DM-24856).
  • We plan to run a dedicated Kafka Connect cluster (2 nodes) to consume the full and/or aggregated streams and write them to different formats using open source Kafka connectors. The InfluxDB Sink connector we currently use comes from the stream-reactor project, and the JDBC and Amazon S3 connectors are both distributed under the Confluent Community License.
  • The system is flexible enough to record either the full or aggregated stream in multiple formats.
  • We have successfully tested data replication using the Confluent Replicator connector (DM-23973), but we plan to replace it with an open source alternative. The best option for now is Mirus (DM-24774).
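
To make the window aggregation in the first bullet concrete, the following is a minimal sketch of a tumbling (fixed, non-overlapping) window aggregation in plain Python. It does not use Faust or Kafka and is not the Aggregator's implementation. The function name, the record layout, the "temperature" field and the choice of min, mean and max as summary statistics are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def tumbling_window_aggregate(records, window_size, fields):
    """Group records into fixed, non-overlapping time windows and
    compute min, mean and max for each selected field.

    records: iterable of dicts with a numeric "timestamp" key (seconds).
    window_size: window length in seconds (the configurable parameter).
    fields: names of the numeric fields to aggregate.
    """
    windows = defaultdict(list)
    for record in records:
        # Align each record to the start of the window it falls into.
        window_start = (record["timestamp"] // window_size) * window_size
        windows[window_start].append(record)

    aggregated = []
    for window_start in sorted(windows):
        batch = windows[window_start]
        row = {"window_start": window_start, "count": len(batch)}
        for field in fields:
            values = [r[field] for r in batch]
            # Summary statistics are an assumption, not the real schema.
            row[f"min_{field}"] = min(values)
            row[f"mean_{field}"] = mean(values)
            row[f"max_{field}"] = max(values)
        aggregated.append(row)
    return aggregated


# Hypothetical telemetry samples, one per second.
samples = [
    {"timestamp": 0, "temperature": 10.0},
    {"timestamp": 1, "temperature": 12.0},
    {"timestamp": 2, "temperature": 14.0},
    {"timestamp": 3, "temperature": 20.0},
]
result = tumbling_window_aggregate(samples, window_size=2, fields=["temperature"])
```

With a window size of 2 seconds, the four samples collapse into two aggregated records, one per window. In a streaming setting the same grouping would be applied incrementally as messages arrive, and each aggregated record would be produced to the new aggregated topic rather than collected in a list.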