
Cross-region state sync without a central database


SecurityScorecard runs in multiple AWS regions. Users in different regions interact with the same observations: accepting risk, flagging issues, requesting remediation. When a user in us-east-1 accepts a risk, a user in eu-west-1 should see that action reflected. Without a sync mechanism, they won't.

The naive solution is a centralized database that all regions write to and read from. I didn't build that. Here's why, and what I built instead.

Why not a central database

A central database for observation actions creates three problems. First, it's a single point of failure: an outage in that one database affects every region. Second, it adds latency: every user action now involves a cross-region write, adding 50–150ms of RTT to an operation that should feel instant. Third, it's a data residency problem: many of SecurityScorecard's enterprise customers have requirements about where their data lives, and routing all writes through a central store potentially violates those requirements.

The right architecture keeps data local to each region and synchronizes asynchronously.

The architecture

Kafka as the replication backbone

Every observation action (accepting a risk, flagging, requesting remediation, updating status, saving a filter) is published as a Kafka event. Kafka gives us the durability guarantee we need: once the broker acknowledges a publish, the event will eventually be delivered to every consumer. If a region is temporarily unreachable, it catches up when it comes back online by replaying from its last consumed offset.
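
Concretely, the publish path can look like the sketch below, using the confluent-kafka Python client. The topic name and payload fields are illustrative, not the production schema; the originating_region header is explained in the next section.

```python
import json
from confluent_kafka import Producer

LOCAL_REGION = "us-east-1"  # this deployment's home region

producer = Producer({"bootstrap.servers": "localhost:9092"})

def publish_action(action: dict) -> None:
    """Publish an observation action, tagged with its originating region."""
    producer.produce(
        topic="observation-actions",      # illustrative topic name
        key=action["observation_id"],     # keeps one observation's actions ordered
        value=json.dumps(action).encode(),
        headers=[("originating_region", LOCAL_REGION.encode())],
    )
    producer.flush()  # wait for broker acknowledgment (fine for a sketch; batch in production)

publish_action({
    "observation_id": "obs-123",
    "type": "accept_risk",
    "actor": "user-42",
})
```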

originating_region in message headers

The subtle part: how do you prevent circular replication? If region A publishes an event and region B consumes it and publishes it again, you get an infinite loop.

The fix is simple: every event carries its originating region in a Kafka message header. Each region's Flink consumer filters out events it originated and only absorbs events from other regions. Region A (us-east-1) publishes; region B's consumer sees originating_region=us-east-1 and absorbs the event; region A's own consumer sees the same header, recognizes itself, and skips it. No coordination needed, no central authority.
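
The production consumers are Flink jobs; the sketch below shows the same origin-filtering logic with a plain confluent-kafka consumer. Topic and group names are illustrative, and apply_remote_action is a hypothetical stub:

```python
import json
from confluent_kafka import Consumer

LOCAL_REGION = "us-east-1"

def apply_remote_action(action: dict) -> None:
    """Write a remote region's action into local storage (hypothetical stub)."""
    ...

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": f"observation-sync-{LOCAL_REGION}",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["observation-actions"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    headers = dict(msg.headers() or [])
    origin = headers.get("originating_region", b"").decode()
    if origin == LOCAL_REGION:
        continue  # our own event, already applied locally: skip to break the loop
    apply_remote_action(json.loads(msg.value()))
```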

ReplicatedReplacingMergeTree in ClickHouse

On the storage side, I used ClickHouse's ReplicatedReplacingMergeTree engine. This combines two properties:

Replicated - ClickHouse replicates data across nodes within the cluster for high availability. A node failure doesn't lose data.

Replacing - during background merges, ClickHouse keeps only the latest version of rows that share the same sorting key. If the same action arrives twice due to at-least-once Kafka delivery, only one row survives. The system is idempotent.

Together, these give you effectively exactly-once semantics at the storage layer without distributed transactions, which would be far more complex and far slower.
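
A sketch of what the table definition can look like, issued through the clickhouse-connect Python client. The table name, columns, and coordination path are illustrative assumptions, not the production schema:

```python
# Hypothetical schema: dedup on (observation_id, action_type); the row with
# the newest updated_at survives when duplicates collapse during a merge.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command("""
CREATE TABLE IF NOT EXISTS observation_actions
(
    observation_id String,
    action_type    String,
    actor          String,
    updated_at     DateTime64(3)
)
ENGINE = ReplicatedReplacingMergeTree(
    '/clickhouse/tables/{shard}/observation_actions',  -- replication coordination path
    '{replica}',
    updated_at  -- version column: the row with the highest value wins
)
ORDER BY (observation_id, action_type)  -- rows sharing this key get collapsed
""")
```

One caveat worth knowing: ReplacingMergeTree deduplicates during background merges, not at insert time, so a read that must never see a duplicate needs SELECT ... FINAL or an argMax-style aggregation.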

The tradeoff I made explicitly

This is an AP system in CAP terms: it stays available during a partition at the cost of immediate cross-region consistency. There is a brief window, typically a few seconds, where different regions see different states. A user who accepts a risk in us-east-1 will see the action reflected immediately in their region. A user checking from eu-west-1 in the next few seconds might not see it yet.

I documented this tradeoff and chose it deliberately. The alternative, synchronous cross-region writes, would have meant that every user action blocks on a cross-region round trip. That's a 50–150ms hit on every interaction, plus a reliability dependency on inter-region connectivity. For observation actions, eventual consistency on the order of seconds is an acceptable tradeoff. These aren't financial transactions.

The system is eventually consistent, idempotent, and partition-tolerant. It degrades gracefully: if a region is isolated, it keeps working with its local state and catches up when connectivity is restored.

What I'd do differently

I'd add a replication lag metric visible to the team: how far behind is each region from the latest event? Right now you'd have to query Kafka consumer group offsets and do the math yourself. A dashboard showing per-region replication lag would make it immediately obvious if something is falling behind.
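
The math is simple enough to sketch: compare each partition's committed offset for the sync consumer group against the broker's latest offset. Group, topic, and partition count below are illustrative:

```python
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "observation-sync-eu-west-1",  # the group whose lag we measure
})

partitions = [TopicPartition("observation-actions", p) for p in range(3)]

for tp in consumer.committed(partitions, timeout=10):
    _low, high = consumer.get_watermark_offsets(tp, timeout=10)
    # tp.offset is negative (OFFSET_INVALID) if the group has never committed
    lag = high - tp.offset if tp.offset >= 0 else high
    print(f"partition {tp.partition}: {lag} events behind")
```

In practice this would be exported as a per-region gauge metric rather than printed.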
