Client-Side De-Duplicating
While duplicate events should be rare on the BitBrew platform, our at-least-once delivery guarantee means business applications need a robust way to handle duplicates.
Comparing incoming data to the last received data won't properly de-duplicate when data arrives out of order. Instead, the platform assigns a unique event ID to every received event. Use this UUID to de-duplicate:
1. Store the eventId located in the header of every event in a table with good lookup performance. Suggestions: Redis, Cassandra, or DynamoDB.
2. Upon receipt of a new event, read the eventId field and perform a lookup in your table.
3. If the UUID exists, discard the duplicate message. If the UUID doesn't exist, add it to the table and process the record.
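The steps above can be sketched roughly as follows. This is a minimal illustration, not BitBrew client code: the event shape (a dict with a "header" key containing "eventId"), the InMemoryEventIdStore class, and the handle_event function are all hypothetical names chosen for the example. The in-memory set stands in for Redis, Cassandra, or DynamoDB.

```python
from typing import Any, Callable, Dict, Set


class InMemoryEventIdStore:
    """Hypothetical stand-in for a fast lookup table (Redis, Cassandra, DynamoDB)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def add_if_absent(self, event_id: str) -> bool:
        """Record event_id and return True if it is new; return False if already seen."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


def handle_event(
    event: Dict[str, Any],
    store: InMemoryEventIdStore,
    process: Callable[[Dict[str, Any]], None],
) -> bool:
    """De-duplicate first, then process. Returns True if the event was processed."""
    # Assumed event layout: the ID lives at event["header"]["eventId"].
    event_id = event["header"]["eventId"]

    # The lookup happens before any business logic runs.
    if not store.add_if_absent(event_id):
        return False  # duplicate: discard

    process(event)
    return True
```

With a real shared store, the check and the insert would ideally be a single atomic operation so that two consumers receiving the same event at the same time cannot both treat it as new.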
It should be safe to begin overwriting event IDs in the table after one year.
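One way to apply the one-year retention is to store the time each event ID was first seen and treat older entries as expired. The sketch below assumes an in-memory dict and a hypothetical ExpiringEventIdStore class; Redis, Cassandra, and DynamoDB each offer their own time-to-live mechanisms that could serve the same purpose.

```python
import time
from typing import Callable, Dict

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


class ExpiringEventIdStore:
    """Hypothetical store that forgets event IDs after a retention period."""

    def __init__(
        self,
        ttl_seconds: float = ONE_YEAR_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._first_seen: Dict[str, float] = {}

    def is_duplicate(self, event_id: str) -> bool:
        """Return True if event_id was recorded within the retention period.

        Otherwise record (or overwrite) it with the current time and return False.
        """
        now = self._clock()
        seen_at = self._first_seen.get(event_id)
        if seen_at is not None and now - seen_at < self._ttl:
            return True
        self._first_seen[event_id] = now
        return False
```

This version only overwrites an expired entry when the same ID is looked up again; a production table would also need some form of periodic cleanup or native TTL so that old IDs do not accumulate indefinitely.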
Event ID lookup should be done before any other processing to prevent duplicates from interfering with business logic and becoming visible to your customers.