Event-Driven Architecture: Building Resilient, Scalable Systems with EDA

In a traditional microservices architecture, Service A calls Service B, which calls Service C. If Service C is slow, the entire chain hangs. If Service C is down, the request fails. This is tight coupling—and it creates fragile, hard-to-scale systems.
Event-Driven Architecture (EDA) inverts this model. Instead of "do this," services say "this happened." The difference is profound.
The Problem with Request-Response
Traditional Request-Response (Synchronous)
──────────────────────────────────────────────────────────────────
User ──► Order Service ──► Inventory Service ──► Payment Service
               │                   │                    │
               └────── WAITS ──────┴────── WAITS ───────┘
If Payment Service is slow (5s):
├── Inventory Service blocks for 5s
├── Order Service blocks for 5s
└── User waits 5+ seconds total (the waits are nested, and every hop adds its own latency)
If Inventory Service is DOWN:
├── Order Service gets timeout/error
├── User sees failure
└── Order is lost
Problems with synchronous architecture:
| Issue | Impact |
|---|---|
| Cascading Failures | One slow service brings down the chain |
| Tight Coupling | Services must know each other's APIs |
| Scaling Bottlenecks | Downstream services limit throughput |
| Lost Transactions | Failures lose in-flight requests |
| Hard to Extend | Adding consumers requires code changes |
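To make the blocking behaviour concrete, here is a minimal sketch that simulates the three-hop chain with timers. The function names and delay values are hypothetical; the only point is that each caller's latency includes everything downstream of it.

```typescript
// Simulated synchronous call chain. All names and delays are illustrative.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function paymentService(delayMs: number): Promise<string> {
  await sleep(delayMs); // pretend the payment provider is slow
  return "paid";
}

async function inventoryService(paymentDelayMs: number): Promise<string> {
  // Inventory cannot answer until Payment has answered.
  const payment = await paymentService(paymentDelayMs);
  return `reserved+${payment}`;
}

async function orderService(paymentDelayMs: number): Promise<string> {
  // Order cannot answer until Inventory (and therefore Payment) has answered.
  const inventory = await inventoryService(paymentDelayMs);
  return `ordered+${inventory}`;
}

// Measures what the user experiences at the top of the chain.
async function timeRequest(
  paymentDelayMs: number
): Promise<{ result: string; elapsedMs: number }> {
  const start = Date.now();
  const result = await orderService(paymentDelayMs);
  return { result, elapsedMs: Date.now() - start };
}
```

Calling `timeRequest(5000)` would take at least about five seconds to resolve, even though only the innermost service is slow.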
Event-Driven Architecture: A Different Paradigm
With EDA, services communicate through events—immutable facts about what happened:
Event-Driven Architecture
──────────────────────────────────────────────────────────────────
                        ┌─────────────────┐
                        │ Message Broker  │
                        │ (Kafka/RabbitMQ)│
                        └────────┬────────┘
                                 │
        ┌────────────────────────┼────────────────────────┐
        │                        │                        │
        ▼                        ▼                        ▼
┌───────────────┐        ┌───────────────┐        ┌───────────────┐
│   Inventory   │        │    Payment    │        │     Email     │
│    Service    │        │    Service    │        │    Service    │
│               │        │               │        │               │
│  "Reserve     │        │  "Process     │        │ "Send order   │
│   Stock"      │        │   Payment"    │        │  confirmation"│
└───────────────┘        └───────────────┘        └───────────────┘
Order Service publishes: OrderPlaced { orderId, items, customerId }
├── Order Service doesn't wait
├── Order Service doesn't know who listens
└── Each consumer processes independently
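The publish side of this picture can be sketched with a tiny in-memory bus. This is only a stand-in for a real broker: `InMemoryBus` and the handler messages are hypothetical, and delivery here is synchronous, whereas Kafka or RabbitMQ would also decouple publisher and consumers in time.

```typescript
// Minimal in-memory stand-in for a broker (illustrative only).
type Handler<T> = (event: T) => void;

class InMemoryBus<T> {
  private handlers = new Map<string, Handler<T>[]>();

  subscribe(eventType: string, handler: Handler<T>): void {
    const existing = this.handlers.get(eventType) ?? [];
    this.handlers.set(eventType, [...existing, handler]);
  }

  // The publisher names only the event type; it never references a consumer.
  // Returns how many handlers received the event.
  publish(eventType: string, event: T): number {
    const handlers = this.handlers.get(eventType) ?? [];
    for (const handler of handlers) handler(event);
    return handlers.length;
  }
}

interface OrderPlaced {
  orderId: string;
  items: string[];
  customerId: string;
}

const bus = new InMemoryBus<OrderPlaced>();
const log: string[] = [];

// Three independent consumers register for the same fact.
bus.subscribe("OrderPlaced", (e) => log.push(`inventory: reserve stock for ${e.orderId}`));
bus.subscribe("OrderPlaced", (e) => log.push(`payment: charge ${e.customerId}`));
bus.subscribe("OrderPlaced", (e) => log.push(`email: confirm ${e.orderId}`));

const delivered = bus.publish("OrderPlaced", {
  orderId: "o-1",
  items: ["sku-1"],
  customerId: "c-9",
});
```

Adding a fourth consumer is one more `subscribe` call; the publishing code does not change.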
Choreography vs. Orchestration
There are two patterns for coordinating events across services:
Choreography: Decentralized Dance
Each service reacts to events and publishes its own events. No central coordinator.
Choreography Pattern
──────────────────────────────────────────────────────────────────
OrderPlaced ──► Inventory Service ──► StockReserved
│
StockReserved ──► Payment Service ──► PaymentProcessed
│
PaymentProcessed ──► Shipping Service ──► ShipmentCreated
│
ShipmentCreated ──► Email Service ──► NotificationSent
Each service:
├── Listens for events it cares about
├── Does its work
└── Publishes outcome events
Pros:
- Loose coupling—services are truly independent
- Easy to add new consumers
- No single point of failure
Cons:
- Hard to see the full workflow
- Debugging requires tracing across services
- Saga patterns needed for failures
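The choreographed flow can be sketched as a chain of independent listeners, each reacting to one event and publishing the next. The `on` and `emit` helpers are hypothetical in-process stand-ins for broker subscriptions, and every service is reduced to a single line.

```typescript
// In-process stand-in for broker topics (illustrative only).
type ChoreoEvent = { type: string; orderId: string };
type Listener = (event: ChoreoEvent) => void;

const listeners = new Map<string, Listener[]>();
const trace: string[] = [];

function on(type: string, listener: Listener): void {
  listeners.set(type, [...(listeners.get(type) ?? []), listener]);
}

function emit(event: ChoreoEvent): void {
  trace.push(event.type); // record the order in which events were published
  for (const listener of listeners.get(event.type) ?? []) listener(event);
}

// Each service knows only the event it consumes and the event it produces.
on("OrderPlaced", (e) => emit({ type: "StockReserved", orderId: e.orderId })); // Inventory
on("StockReserved", (e) => emit({ type: "PaymentProcessed", orderId: e.orderId })); // Payment
on("PaymentProcessed", (e) => emit({ type: "ShipmentCreated", orderId: e.orderId })); // Shipping
on("ShipmentCreated", (e) => emit({ type: "NotificationSent", orderId: e.orderId })); // Email

emit({ type: "OrderPlaced", orderId: "o-1" });
```

Notice that the overall workflow appears nowhere in the code; it exists only as the sequence recorded in `trace`, which is the visibility cost listed above.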
Orchestration: Central Conductor
A central orchestrator coordinates the workflow:
Orchestration Pattern
──────────────────────────────────────────────────────────────────
                  ┌──────────────────┐
                  │    Order Saga    │
                  │   Orchestrator   │
                  └────────┬─────────┘
                           │
       ┌───────────────────┼───────────────────┐
       ▼                   ▼                   ▼
 Reserve Stock      Process Payment     Create Shipment
       │                   │                   │
       └───────────────────┴───────────────────┘
                           │
             Results back to orchestrator
                           │
            Orchestrator decides next step
Pros:
- Workflow is visible in one place
- Easier to understand and debug
- Compensating actions are clear
Cons:
- Orchestrator is a coupling point
- Single point of failure risk
- Can become a bottleneck
When to Use Each
| Use Case | Pattern |
|---|---|
| Simple notification flows | Choreography |
| Many consumers for same event | Choreography |
| Complex business transactions | Orchestration |
| Strict ordering requirements | Orchestration |
| Regulatory audit trails | Orchestration |
Implementing EDA with Apache Kafka
Kafka is a widely used platform for high-throughput event streaming:
Kafka Architecture
Kafka Cluster Architecture
──────────────────────────────────────────────────────────────────
Producer A ──┐ ┌── Consumer Group 1
Producer B ──┼──► Topic: orders │ ├── Consumer 1a
Producer C ──┘ ├── Partition 0 ──────────┼── Consumer 1b
├── Partition 1 ──────────┤
└── Partition 2 ──────────┘
└── Consumer Group 2
├── Consumer 2a
└── Consumer 2b
Key Concepts:
├── Topics: Named streams of events
├── Partitions: Enable parallelism (events ordered within partition)
├── Consumer Groups: Enable scaling consumers
└── Retention: Events persisted for replay
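Why a message key gives per-key ordering can be sketched with a toy partitioner. This is not Kafka's actual algorithm (the Java client's default partitioner hashes the key bytes with murmur2); `toyPartition` and `assign` are hypothetical helpers that only illustrate the principle that equal keys always map to the same partition.

```typescript
// Toy partitioner: a simple string hash, NOT the murmur2 hash Kafka uses.
function toyPartition(key: string, partitionCount: number): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return hash % partitionCount;
}

interface KeyedEvent {
  key: string;
  type: string;
}

// Appends each event to the partition chosen by its key, preserving arrival order.
function assign(events: KeyedEvent[], partitionCount: number): Map<number, KeyedEvent[]> {
  const partitions = new Map<number, KeyedEvent[]>();
  for (const e of events) {
    const p = toyPartition(e.key, partitionCount);
    partitions.set(p, [...(partitions.get(p) ?? []), e]);
  }
  return partitions;
}
```

Because every event keyed `order-1` lands in one partition, a single consumer reads them in the order they were produced. Events with different keys may land on different partitions and have no ordering guarantee relative to each other.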
Kafka Producer Example (TypeScript)
import { randomUUID } from "node:crypto";
import { Kafka, Partitioners } from "kafkajs";

// `Order` is the service's own domain type (id, customerId, items, total);
// its definition is omitted here.

const kafka = new Kafka({
  clientId: "order-service",
  brokers: ["kafka1:9092", "kafka2:9092"],
});

const producer = kafka.producer({
  createPartitioner: Partitioners.DefaultPartitioner,
});

// Call once at service startup: send() requires a connected producer.
async function connectProducer(): Promise<void> {
  await producer.connect();
}

interface OrderPlacedEvent {
  eventId: string;
  eventType: "OrderPlaced";
  timestamp: string;
  payload: {
    orderId: string;
    customerId: string;
    items: Array<{ productId: string; quantity: number; price: number }>;
    total: number;
  };
}

async function publishOrderPlaced(order: Order): Promise<void> {
  const event: OrderPlacedEvent = {
    eventId: randomUUID(),
    eventType: "OrderPlaced",
    timestamp: new Date().toISOString(),
    payload: {
      orderId: order.id,
      customerId: order.customerId,
      items: order.items,
      total: order.total,
    },
  };

  await producer.send({
    topic: "orders",
    messages: [
      {
        key: order.id, // Ensures all events for same order go to same partition
        value: JSON.stringify(event),
        headers: {
          "content-type": "application/json",
          "event-type": "OrderPlaced",
        },
      },
    ],
  });

  console.log(`Published OrderPlaced event for order ${order.id}`);
}
Kafka Consumer Example (TypeScript)
import { Kafka, EachMessagePayload } from "kafkajs";

// OrderPlacedEvent is the interface from the producer example.
// reserveStock() and publishEvent() are the service's own helpers, omitted here.

const kafka = new Kafka({
  clientId: "inventory-service",
  brokers: ["kafka1:9092", "kafka2:9092"],
});

const consumer = kafka.consumer({
  groupId: "inventory-service-group",
});

async function startConsumer(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({
    topic: "orders",
    fromBeginning: false,
  });
  await consumer.run({
    eachMessage: async ({ message }: EachMessagePayload) => {
      const eventType = message.headers?.["event-type"]?.toString();
      // Skip messages without a body instead of asserting it is present
      if (eventType === "OrderPlaced" && message.value) {
        const event: OrderPlacedEvent = JSON.parse(message.value.toString());
        await handleOrderPlaced(event);
      }
    },
  });
}

async function handleOrderPlaced(event: OrderPlacedEvent): Promise<void> {
  const { orderId, items } = event.payload;
  try {
    // Reserve inventory
    for (const item of items) {
      await reserveStock(item.productId, item.quantity);
    }
    // Publish success event
    await publishEvent("StockReserved", { orderId, items });
  } catch (error) {
    // Publish failure event. Stock already reserved for earlier items in the
    // loop still has to be released, here or by a compensating consumer.
    await publishEvent("StockReservationFailed", {
      orderId,
      reason: error instanceof Error ? error.message : String(error),
    });
  }
}
Event Schemas and Versioning
As systems evolve, event schemas change. Handle this gracefully:
Schema Design Best Practices
// Event envelope pattern
interface EventEnvelope<T> {
  // Metadata
  eventId: string;
  eventType: string;
  schemaVersion: number;
  timestamp: string;
  correlationId?: string;
  causationId?: string;
  // Payload
  payload: T;
}

// Strongly-typed events
interface OrderPlacedV1 {
  orderId: string;
  customerId: string;
  total: number;
}

interface OrderPlacedV2 {
  orderId: string;
  customerId: string;
  total: number;
  currency: string; // Added in V2
  shippingAddress: Address; // Added in V2 (Address is a domain type, omitted here)
}

// Consumer that handles both versions
function handleOrderPlaced(envelope: EventEnvelope<unknown>): void {
  switch (envelope.schemaVersion) {
    // Braces give each case its own scope for the const declaration
    case 1: {
      const v1 = envelope.payload as OrderPlacedV1;
      processOrderV1(v1);
      break;
    }
    case 2: {
      const v2 = envelope.payload as OrderPlacedV2;
      processOrderV2(v2);
      break;
    }
    default:
      throw new Error(`Unknown schema version: ${envelope.schemaVersion}`);
  }
}
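An alternative to branching on the version in every handler is to upcast old payloads to the newest shape at the edge, so downstream code only ever sees one type. The sketch below is one way to do that; the type names, the `"USD"` default and the `null` address are assumptions chosen for illustration, not rules from the schema above.

```typescript
// Hypothetical types for the sketch; they mirror the V1/V2 shapes loosely.
interface PostalAddress {
  line1: string;
  city: string;
  country: string;
}

interface OrderV1 {
  orderId: string;
  customerId: string;
  total: number;
}

interface OrderV2 extends OrderV1 {
  currency: string;
  shippingAddress: PostalAddress | null; // null when the old event had none
}

interface Versioned {
  schemaVersion: number;
  payload: unknown;
}

// Converts any known version to the V2 shape.
function upcastToV2(envelope: Versioned): OrderV2 {
  switch (envelope.schemaVersion) {
    case 1: {
      const v1 = envelope.payload as OrderV1;
      // Assumed defaults for fields that did not exist in V1
      return { ...v1, currency: "USD", shippingAddress: null };
    }
    case 2:
      return envelope.payload as OrderV2;
    default:
      throw new Error(`Unknown schema version: ${envelope.schemaVersion}`);
  }
}
```

With this in place, supporting a future V3 means adding one more upcast step rather than touching every consumer function.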
Schema Registry (Avro/JSON Schema)
# Using Confluent Schema Registry
# (Avro schemas are normally written as JSON; YAML is used here for readability)
version: 1
namespace: com.example.orders
type: record
name: OrderPlaced
fields:
  - name: orderId
    type: string
  - name: customerId
    type: string
  - name: total
    type: double
  - name: currency
    type: string
    default: "USD" # Backward compatible - new field with default
Saga Pattern: Handling Distributed Transactions
When a business process spans multiple services, you need sagas:
Order Saga - Happy Path
──────────────────────────────────────────────────────────────────
OrderPlaced
│
├──► Reserve Inventory ──► StockReserved
│ │
└──────────────────────────────┼──► Process Payment ──► PaymentProcessed
│ │
└────────────────────────────┼──► Ship Order
│
OrderCompleted
Order Saga - Compensating Actions (Payment Failed)
──────────────────────────────────────────────────────────────────
OrderPlaced
│
├──► Reserve Inventory ──► StockReserved
│ │
└──────────────────────────────┼──► Process Payment ──► PaymentFailed
│ │
│ ◄─────┘
│
└──► Release Inventory (Compensation)
│
OrderCancelled
Saga Implementation
// Saga orchestrator example.
// The step methods (reserveStock, processPayment, createShipment) and the
// compensations (refundPayment, releaseStock, cancelOrder) call the respective
// services; their bodies are omitted.
class OrderSaga {
  private state:
    | "PENDING"
    | "STOCK_RESERVED"
    | "PAYMENT_PROCESSED"
    | "COMPLETED"
    | "CANCELLED" = "PENDING";

  async execute(order: Order): Promise<void> {
    try {
      // Step 1: Reserve Stock
      this.state = "PENDING";
      await this.reserveStock(order);
      this.state = "STOCK_RESERVED";

      // Step 2: Process Payment
      await this.processPayment(order);
      this.state = "PAYMENT_PROCESSED";

      // Step 3: Create Shipment
      await this.createShipment(order);
      this.state = "COMPLETED";
    } catch (error) {
      // Caught values are not guaranteed to be Error instances
      const cause = error instanceof Error ? error : new Error(String(error));
      await this.compensate(order, cause);
    }
  }

  private async compensate(order: Order, error: Error): Promise<void> {
    console.log(`Saga failed at state ${this.state}: ${error.message}`);
    // Compensate in reverse order
    switch (this.state) {
      case "PAYMENT_PROCESSED":
        await this.refundPayment(order);
      // Fall through
      case "STOCK_RESERVED":
        await this.releaseStock(order);
      // Fall through
      case "PENDING":
        await this.cancelOrder(order, error.message);
    }
    this.state = "CANCELLED";
  }
}
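The same compensation logic can be expressed without a hand-written state machine by pairing each step with its undo action. The following is a minimal sketch under that assumption; `SagaStep`, `runSaga` and the demo steps are hypothetical names, and the `journal` array exists only so the execution order can be inspected.

```typescript
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

// Runs steps in order; on the first failure, undoes completed steps newest-first.
async function runSaga(
  steps: SagaStep[],
  journal: string[]
): Promise<"COMPLETED" | "CANCELLED"> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      journal.push(`done:${step.name}`);
      completed.push(step);
    } catch {
      journal.push(`failed:${step.name}`);
      for (const done of completed.reverse()) {
        await done.compensate();
        journal.push(`undone:${done.name}`);
      }
      return "CANCELLED";
    }
  }
  return "COMPLETED";
}

// Demo steps: payment fails, so only the stock reservation needs undoing.
const ok = async (): Promise<void> => {};
const fail = async (): Promise<void> => {
  throw new Error("card declined");
};

const demoSteps: SagaStep[] = [
  { name: "reserveStock", action: ok, compensate: ok },
  { name: "processPayment", action: fail, compensate: ok },
  { name: "createShipment", action: ok, compensate: ok },
];
```

Running `runSaga(demoSteps, journal)` resolves to `"CANCELLED"` and never starts `createShipment`. A production saga would also persist its progress so that a crashed orchestrator can resume or compensate after a restart; this sketch keeps everything in memory.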
CQRS: Command Query Responsibility Segregation
EDA pairs naturally with CQRS—separate models for writes and reads:
CQRS Architecture
──────────────────────────────────────────────────────────────────
Commands (Writes)                     Queries (Reads)
─────────────────                     ───────────────
┌──────────────┐                     ┌───────────────┐
│    Create    │                     │   Get User    │
│    Update    │                     │  List Orders  │
│    Delete    │                     │    Search     │
└──────┬───────┘                     └───────┬───────┘
       │                                     │
       ▼                                     ▼
┌──────────────┐                     ┌───────────────┐
│   Command    │                     │     Query     │
│   Handler    │                     │    Handler    │
└──────┬───────┘                     └───────┬───────┘
       │                                     │
       ▼                                     ▼
┌──────────────┐       Events        ┌───────────────┐
│ Write Model  │ ──────────────────► │  Read Model   │
│  (Postgres)  │                     │(Elasticsearch)│
└──────────────┘                     └───────────────┘
Benefits:
├── Optimize read model for specific queries
├── Scale reads and writes independently
├── Different storage engines per use case
└── Event sourcing naturally fits
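The arrow from write model to read model is a projection: a function that folds events into a query-friendly view. Here is a minimal in-memory sketch; the event and view shapes are hypothetical, and a real read model would live in a store such as Elasticsearch rather than a `Map`.

```typescript
type OrderEvent =
  | { type: "OrderPlaced"; orderId: string; customerId: string; total: number }
  | { type: "OrderShipped"; orderId: string }
  | { type: "OrderCancelled"; orderId: string };

interface OrderSummary {
  orderId: string;
  customerId: string;
  total: number;
  status: string;
}

// Read model: a denormalized view keyed by orderId, built purely from events.
function project(events: OrderEvent[]): Map<string, OrderSummary> {
  const view = new Map<string, OrderSummary>();
  for (const e of events) {
    if (e.type === "OrderPlaced") {
      view.set(e.orderId, {
        orderId: e.orderId,
        customerId: e.customerId,
        total: e.total,
        status: "placed",
      });
      continue;
    }
    const current = view.get(e.orderId);
    if (!current) continue; // ignore events for orders this view has not seen
    const status = e.type === "OrderShipped" ? "shipped" : "cancelled";
    view.set(e.orderId, { ...current, status });
  }
  return view;
}
```

Because the view is derived only from events, it can be dropped and rebuilt by replaying the log, which is what makes adding a new read model later relatively cheap.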
Challenges and Solutions
Challenge 1: Eventual Consistency
Problem: The user sees "Order Placed" before inventory is actually reserved.
Solutions:
// Option 1: Optimistic UI with status updates
interface Order {
  id: string;
  status: "pending" | "confirmed" | "processing" | "shipped";
  lastUpdated: Date;
}
// UI shows: "Order received! We'll confirm availability shortly."
// Subsequent events update status

// Option 2: Wait for critical confirmations
async function placeOrder(order: Order): Promise<OrderResponse> {
  // Publish event
  await publishEvent("OrderPlaced", order);

  // Wait for stock confirmation (with timeout)
  const confirmation = await waitForEvent(
    "StockReserved",
    { orderId: order.id },
    { timeout: 5000 }
  );

  if (confirmation.success) {
    return { status: "confirmed", orderId: order.id };
  } else {
    return { status: "failed", reason: "Out of stock" };
  }
}
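The waiting step in Option 2 needs some way to park a request until the matching event arrives or a deadline passes. One possible shape, as a sketch: a map of pending promises keyed by order ID. `waitForConfirmation` and `onStockEvent` are hypothetical names, and this version only works when the confirming event is consumed by the same process instance that is waiting.

```typescript
type Waiter = { resolve: (value: { success: boolean }) => void };

// Requests currently waiting for a stock decision, keyed by orderId.
const pending = new Map<string, Waiter>();

function waitForConfirmation(
  orderId: string,
  timeoutMs: number
): Promise<{ success: boolean }> {
  return new Promise((resolve) => {
    // Give up after the deadline and report failure to the caller.
    const timer = setTimeout(() => {
      pending.delete(orderId);
      resolve({ success: false });
    }, timeoutMs);

    pending.set(orderId, {
      resolve: (value) => {
        clearTimeout(timer);
        pending.delete(orderId);
        resolve(value);
      },
    });
  });
}

// Called by the event consumer when a stock event for this order arrives.
function onStockEvent(orderId: string, success: boolean): void {
  pending.get(orderId)?.resolve({ success });
}
```

A timeout here means "not confirmed yet", not "out of stock", so callers may prefer to fall back to the optimistic status from Option 1 rather than report a hard failure.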
Challenge 2: Debugging Distributed Flows
Problem: Tracing a transaction that hops through 5 queues is hard.
Solution: Distributed tracing with correlation IDs:
// Correlation ID propagation
// `tracer` is an OpenTelemetry Tracer; `producer` and getTopicForEvent()
// come from the service setup.
interface EventContext {
  correlationId: string; // Same across entire flow
  causationId: string; // ID of event that caused this one
  spanId: string; // For OpenTelemetry integration
}

async function publishEvent<T>(
  type: string,
  payload: T,
  context: EventContext
): Promise<void> {
  const event = {
    eventId: crypto.randomUUID(),
    eventType: type,
    correlationId: context.correlationId,
    causationId: context.causationId,
    timestamp: new Date().toISOString(),
    payload,
  };

  // Wrap the send in an OpenTelemetry span and always end it
  const span = tracer.startSpan(type, {
    attributes: { correlationId: context.correlationId },
  });
  try {
    await producer.send({
      topic: getTopicForEvent(type),
      messages: [{ value: JSON.stringify(event) }],
    });
  } finally {
    span.end();
  }
}
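How the two IDs evolve along a flow is easy to get wrong, so here is a small sketch of the propagation rule on its own. It assumes one common convention: the first event's ID doubles as the correlation ID. `startFlow`, `follow` and the counter-based `newId` are hypothetical helpers; real code would generate UUIDs.

```typescript
interface TracedEvent {
  eventId: string;
  eventType: string;
  correlationId: string; // constant for the whole flow
  causationId: string | null; // the event that directly triggered this one
}

let nextId = 0;
const newId = (): string => `evt-${++nextId}`; // stand-in for a UUID generator

// First event of a flow: it has no cause and defines the correlation ID.
function startFlow(eventType: string): TracedEvent {
  const eventId = newId();
  return { eventId, eventType, correlationId: eventId, causationId: null };
}

// Any later event: inherit the correlation ID, point causation at the parent.
function follow(parent: TracedEvent, eventType: string): TracedEvent {
  return {
    eventId: newId(),
    eventType,
    correlationId: parent.correlationId,
    causationId: parent.eventId,
  };
}
```

Filtering logs by `correlationId` returns every event in the transaction, while walking `causationId` links backwards reconstructs the exact chain that led to a given event.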
Challenge 3: Message Ordering
Problem: Events arrive out of order, or more than once.
Solution: Partition keys keep the events for one key in order; idempotent consumers make redelivered duplicates harmless:
// Same key = same partition = same relative order
await producer.send({
  topic: "orders",
  messages: [
    {
      key: order.id, // All events for this order go to same partition
      value: JSON.stringify(event),
    },
  ],
});

// Idempotent consumer
async function handleEvent(event: Event): Promise<void> {
  // Check if already processed
  const processed = await redis.get(`processed:${event.eventId}`);
  if (processed) {
    console.log(`Skipping duplicate event ${event.eventId}`);
    return;
  }

  // Process event
  await processEvent(event);

  // Mark as processed (with TTL for cleanup).
  // The check and the mark are not atomic: two consumers racing on the same
  // event can both process it, so processEvent should tolerate a repeat.
  await redis.setex(`processed:${event.eventId}`, 86400, "true");
}
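Partition keys handle ordering inside the broker, but retries and replays can still hand a consumer a stale or repeated event. One hedge is a per-order sequence check on the consumer side. The sketch below assumes producers stamp each event with a monotonically increasing `sequence` per order, which is not part of the event shapes shown earlier; `OrderedApplier` is a hypothetical name.

```typescript
interface SequencedEvent {
  orderId: string;
  sequence: number; // assumed: 1, 2, 3, ... per order, assigned by the producer
  type: string;
}

class OrderedApplier {
  private lastApplied = new Map<string, number>();
  readonly applied: string[] = [];

  // Applies the event only if it is exactly the next one expected for its order.
  handle(e: SequencedEvent): "applied" | "duplicate" | "gap" {
    const last = this.lastApplied.get(e.orderId) ?? 0;
    if (e.sequence <= last) return "duplicate"; // already seen: safe to skip
    if (e.sequence > last + 1) return "gap"; // something is missing: retry later
    this.lastApplied.set(e.orderId, e.sequence);
    this.applied.push(`${e.orderId}:${e.type}`);
    return "applied";
  }
}
```

What to do on `"gap"` is a design choice: parking the event on a retry topic is common, while failing fast makes ordering bugs visible sooner.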
Challenge 4: Dead Letter Queues
Problem: What happens when events can't be processed?
Solution: Dead letter queues with retry logic:
// Kafka consumer with DLQ
await consumer.run({
  eachMessage: async ({ message }) => {
    try {
      await handleMessage(message);
    } catch (error) {
      const retryCount = getRetryCount(message);
      if (retryCount < MAX_RETRIES) {
        // Retry with backoff
        await publishToRetryTopic(message, retryCount + 1);
      } else {
        // Move to dead letter queue
        await publishToDLQ(message, error);
        alertOps(`Message failed after ${MAX_RETRIES} retries`, message);
      }
    }
  },
});
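The retry decision itself can be kept as a small pure function, which makes it easy to test apart from Kafka. The sketch below assumes the retry count travels in a message header called `x-retry-count`; that header name, the helper names and the delay values are all hypothetical.

```typescript
// Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelayMs(retryCount: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * 2 ** retryCount);
}

type RetryHeaders = Record<string, string | undefined>;

// Reads the retry count from a header, treating missing or invalid values as 0.
function readRetryCount(headers: RetryHeaders): number {
  const parsed = Number.parseInt(headers["x-retry-count"] ?? "0", 10);
  return Number.isNaN(parsed) || parsed < 0 ? 0 : parsed;
}

type RetryDecision =
  | { kind: "retry"; retryCount: number; delayMs: number }
  | { kind: "dead-letter" };

// Decides whether a failed message is retried (and when) or dead-lettered.
function nextAction(headers: RetryHeaders, maxRetries: number): RetryDecision {
  const retryCount = readRetryCount(headers);
  if (retryCount >= maxRetries) return { kind: "dead-letter" };
  return {
    kind: "retry",
    retryCount: retryCount + 1,
    delayMs: backoffDelayMs(retryCount),
  };
}
```

Capping the delay keeps a poison message from being parked for hours, and adding random jitter to `delayMs` is a common refinement when many messages fail at once.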
When to Use EDA (and When Not To)
| Scenario | EDA? | Why |
|---|---|---|
| High-volume event processing | ✅ Yes | Scales horizontally, handles bursts |
| Loose coupling needed | ✅ Yes | Services don't need to know each other |
| Real-time notifications | ✅ Yes | Push-based, low latency |
| Simple CRUD operations | ❌ No | Overkill, adds complexity |
| Strong consistency required | ⚠️ Maybe | Sagas add complexity |
| Audit/replay requirements | ✅ Yes | Event log is natural audit trail |
| Team autonomy | ✅ Yes | Teams can deploy independently |
Key Takeaways
- EDA decouples services through asynchronous event communication
- Choreography is simpler but harder to trace; orchestration is more visible but creates coupling
- Kafka is ideal for high-throughput, replayable event streams
- Schema versioning is critical for evolving systems
- Saga patterns handle distributed transactions with compensating actions
- CQRS pairs naturally with EDA for optimized read/write models
- Eventual consistency requires UI/UX patterns to handle uncertainty
- Distributed tracing with correlation IDs is essential for debugging
Event-Driven Architecture isn't a silver bullet—it's a powerful tool that requires a shift in mindset from "Request/Response" to "Publish/Subscribe." When applied correctly, it enables systems that are resilient, scalable, and extensible.
Planning to adopt Event-Driven Architecture? Contact EGI Consulting for architecture reviews, technology selection, and hands-on implementation guidance for Kafka, RabbitMQ, and cloud-native messaging systems.