Skip to main content

Event-Driven Architecture: Building Resilient, Scalable Systems with EDA

Michael Ross
14 min read
Event-Driven Architecture: Building Resilient, Scalable Systems with EDA

In a traditional microservices architecture, Service A calls Service B, which calls Service C. If Service C is slow, the entire chain hangs. If Service C is down, the request fails. This is tight coupling—and it creates fragile, hard-to-scale systems.

Event-Driven Architecture (EDA) inverts this model. Instead of "do this," services say "this happened." The difference is profound.

The Problem with Request-Response

Traditional Request-Response (Synchronous)
──────────────────────────────────────────────────────────────────

User ──► Order Service ──► Inventory Service ──► Payment Service
                    │              │                    │
                    └──── WAITS ───┴────── WAITS ──────┘

If Payment Service is slow (5s):
├── Inventory Service blocks for 5s
├── Order Service blocks for 5s
└── User waits 10+ seconds total

If Inventory Service is DOWN:
├── Order Service gets timeout/error
├── User sees failure
└── Order is lost

Problems with synchronous architecture:

IssueImpact
Cascading FailuresOne slow service brings down the chain
Tight CouplingServices must know each other's APIs
Scaling BottlenecksDownstream services limit throughput
Lost TransactionsFailures lose in-flight requests
Hard to ExtendAdding consumers requires code changes

Event-Driven Architecture: A Different Paradigm

With EDA, services communicate through events—immutable facts about what happened:

Event-Driven Architecture
──────────────────────────────────────────────────────────────────

                        ┌─────────────────┐
                        │  Message Broker │
                        │ (Kafka/RabbitMQ)│
                        └────────┬────────┘
                                 │
        ┌────────────────────────┼────────────────────────┐
        │                        │                        │
        ▼                        ▼                        ▼
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Inventory   │       │    Payment    │       │     Email     │
│   Service     │       │    Service    │       │    Service    │
│               │       │               │       │               │
│ "Reserve      │       │ "Process      │       │ "Send order   │
│  Stock"       │       │  Payment"     │       │  confirmation"│
└───────────────┘       └───────────────┘       └───────────────┘

Order Service publishes: OrderPlaced { orderId, items, customerId }
├── Order Service doesn't wait
├── Order Service doesn't know who listens
└── Each consumer processes independently

Choreography vs. Orchestration

There are two patterns for coordinating events across services:

Choreography: Decentralized Dance

Each service reacts to events and publishes its own events. No central coordinator.

Choreography Pattern
──────────────────────────────────────────────────────────────────

OrderPlaced ──► Inventory Service ──► StockReserved
                                            │
StockReserved ──► Payment Service ──► PaymentProcessed
                                            │
PaymentProcessed ──► Shipping Service ──► ShipmentCreated
                                            │
ShipmentCreated ──► Email Service ──► NotificationSent

Each service:
├── Listens for events it cares about
├── Does its work
└── Publishes outcome events

Pros:

  • Loose coupling—services are truly independent
  • Easy to add new consumers
  • No single point of failure

Cons:

  • Hard to see the full workflow
  • Debugging requires tracing across services
  • Saga patterns needed for failures

Orchestration: Central Conductor

A central orchestrator coordinates the workflow:

Orchestration Pattern
──────────────────────────────────────────────────────────────────

                    ┌──────────────────┐
                    │   Order Saga     │
                    │  Orchestrator    │
                    └────────┬─────────┘
                             │
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
   Reserve Stock      Process Payment     Create Shipment
         │                   │                   │
         └───────────────────┴───────────────────┘
                             │
                    Results back to orchestrator
                             │
                    Orchestrator decides next step

Pros:

  • Workflow is visible in one place
  • Easier to understand and debug
  • Compensating actions are clear

Cons:

  • Orchestrator is a coupling point
  • Single point of failure risk
  • Can become a bottleneck

When to Use Each

Use CasePattern
Simple notification flowsChoreography
Many consumers for same eventChoreography
Complex business transactionsOrchestration
Strict ordering requirementsOrchestration
Regulatory audit trailsOrchestration

Implementing EDA with Apache Kafka

Kafka is the industry standard for high-throughput event streaming:

Kafka Architecture

Kafka Cluster Architecture
──────────────────────────────────────────────────────────────────

Producer A ──┐                              ┌── Consumer Group 1
Producer B ──┼──► Topic: orders             │   ├── Consumer 1a
Producer C ──┘    ├── Partition 0 ──────────┼── Consumer 1b
                  ├── Partition 1 ──────────┤
                  └── Partition 2 ──────────┘
                                            └── Consumer Group 2
                                                ├── Consumer 2a
                                                └── Consumer 2b

Key Concepts:
├── Topics: Named streams of events
├── Partitions: Enable parallelism (events ordered within partition)
├── Consumer Groups: Enable scaling consumers
└── Retention: Events persisted for replay

Kafka Producer Example (TypeScript)

import { Kafka, Partitioners } from "kafkajs";

const kafka = new Kafka({
  clientId: "order-service",
  brokers: ["kafka1:9092", "kafka2:9092"],
});

const producer = kafka.producer({
  createPartitioner: Partitioners.DefaultPartitioner,
});

interface OrderPlacedEvent {
  eventId: string;
  eventType: "OrderPlaced";
  timestamp: string;
  payload: {
    orderId: string;
    customerId: string;
    items: Array<{ productId: string; quantity: number; price: number }>;
    total: number;
  };
}

async function publishOrderPlaced(order: Order): Promise<void> {
  const event: OrderPlacedEvent = {
    eventId: crypto.randomUUID(),
    eventType: "OrderPlaced",
    timestamp: new Date().toISOString(),
    payload: {
      orderId: order.id,
      customerId: order.customerId,
      items: order.items,
      total: order.total,
    },
  };

  await producer.send({
    topic: "orders",
    messages: [
      {
        key: order.id, // Ensures all events for same order go to same partition
        value: JSON.stringify(event),
        headers: {
          "content-type": "application/json",
          "event-type": "OrderPlaced",
        },
      },
    ],
  });

  console.log(`Published OrderPlaced event for order ${order.id}`);
}

Kafka Consumer Example (TypeScript)

import { Kafka, EachMessagePayload } from "kafkajs";

const kafka = new Kafka({
  clientId: "inventory-service",
  brokers: ["kafka1:9092", "kafka2:9092"],
});

const consumer = kafka.consumer({
  groupId: "inventory-service-group",
});

async function startConsumer(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({
    topic: "orders",
    fromBeginning: false,
  });

  await consumer.run({
    eachMessage: async ({ topic, partition, message }: EachMessagePayload) => {
      const eventType = message.headers?.["event-type"]?.toString();

      if (eventType === "OrderPlaced") {
        const event: OrderPlacedEvent = JSON.parse(message.value!.toString());
        await handleOrderPlaced(event);
      }
    },
  });
}

async function handleOrderPlaced(event: OrderPlacedEvent): Promise<void> {
  const { orderId, items } = event.payload;

  try {
    // Reserve inventory
    for (const item of items) {
      await reserveStock(item.productId, item.quantity);
    }

    // Publish success event
    await publishEvent("StockReserved", { orderId, items });
  } catch (error) {
    // Publish failure event
    await publishEvent("StockReservationFailed", {
      orderId,
      reason: error.message,
    });
  }
}

Event Schemas and Versioning

As systems evolve, event schemas change. Handle this gracefully:

Schema Design Best Practices

// Event envelope pattern
interface EventEnvelope<T> {
  // Metadata
  eventId: string;
  eventType: string;
  schemaVersion: number;
  timestamp: string;
  correlationId?: string;
  causationId?: string;

  // Payload
  payload: T;
}

// Strongly-typed events
interface OrderPlacedV1 {
  orderId: string;
  customerId: string;
  total: number;
}

interface OrderPlacedV2 {
  orderId: string;
  customerId: string;
  total: number;
  currency: string; // Added in V2
  shippingAddress: Address; // Added in V2
}

// Consumer that handles both versions
function handleOrderPlaced(envelope: EventEnvelope<unknown>): void {
  switch (envelope.schemaVersion) {
    case 1:
      const v1 = envelope.payload as OrderPlacedV1;
      processOrderV1(v1);
      break;
    case 2:
      const v2 = envelope.payload as OrderPlacedV2;
      processOrderV2(v2);
      break;
    default:
      throw new Error(`Unknown schema version: ${envelope.schemaVersion}`);
  }
}

Schema Registry (Avro/JSON Schema)

# Using Confluent Schema Registry
version: 1
namespace: com.example.orders
type: record
name: OrderPlaced
fields:
  - name: orderId
    type: string
  - name: customerId
    type: string
  - name: total
    type: double
  - name: currency
    type: string
    default: "USD" # Backward compatible - new field with default

Saga Pattern: Handling Distributed Transactions

When a business process spans multiple services, you need sagas:

Order Saga - Happy Path
──────────────────────────────────────────────────────────────────

OrderPlaced
    │
    ├──► Reserve Inventory ──► StockReserved
    │                              │
    └──────────────────────────────┼──► Process Payment ──► PaymentProcessed
                                   │                            │
                                   └────────────────────────────┼──► Ship Order
                                                                │
                                                           OrderCompleted

Order Saga - Compensating Actions (Payment Failed)
──────────────────────────────────────────────────────────────────

OrderPlaced
    │
    ├──► Reserve Inventory ──► StockReserved
    │                              │
    └──────────────────────────────┼──► Process Payment ──► PaymentFailed
                                   │                            │
                                   │                      ◄─────┘
                                   │
                                   └──► Release Inventory (Compensation)
                                               │
                                         OrderCancelled

Saga Implementation

// Saga orchestrator example
class OrderSaga {
  private state:
    | "PENDING"
    | "STOCK_RESERVED"
    | "PAYMENT_PROCESSED"
    | "COMPLETED"
    | "CANCELLED";

  async execute(order: Order): Promise<void> {
    try {
      // Step 1: Reserve Stock
      this.state = "PENDING";
      await this.reserveStock(order);
      this.state = "STOCK_RESERVED";

      // Step 2: Process Payment
      await this.processPayment(order);
      this.state = "PAYMENT_PROCESSED";

      // Step 3: Create Shipment
      await this.createShipment(order);
      this.state = "COMPLETED";
    } catch (error) {
      await this.compensate(order, error);
    }
  }

  private async compensate(order: Order, error: Error): Promise<void> {
    console.log(`Saga failed at state ${this.state}: ${error.message}`);

    // Compensate in reverse order
    switch (this.state) {
      case "PAYMENT_PROCESSED":
        await this.refundPayment(order);
      // Fall through
      case "STOCK_RESERVED":
        await this.releaseStock(order);
      // Fall through
      case "PENDING":
        await this.cancelOrder(order, error.message);
    }

    this.state = "CANCELLED";
  }
}

CQRS: Command Query Responsibility Segregation

EDA pairs naturally with CQRS—separate models for writes and reads:

CQRS Architecture
──────────────────────────────────────────────────────────────────

Commands (Writes)                      Queries (Reads)
─────────────────                      ────────────────

┌──────────────┐                      ┌──────────────┐
│  Create      │                      │  Get User    │
│  Update      │                      │  List Orders │
│  Delete      │                      │  Search      │
└──────┬───────┘                      └──────┬───────┘
       │                                     │
       ▼                                     ▼
┌──────────────┐                      ┌──────────────┐
│   Command    │                      │   Query      │
│   Handler    │                      │   Handler    │
└──────┬───────┘                      └──────┬───────┘
       │                                     │
       ▼                                     ▼
┌──────────────┐   Events            ┌──────────────┐
│  Write Model │ ──────────────────► │  Read Model  │
│  (Postgres)  │                      │(Elasticsearch)│
└──────────────┘                      └──────────────┘

Benefits:
├── Optimize read model for specific queries
├── Scale reads and writes independently
├── Different storage engines per use case
└── Event sourcing naturally fits

Challenges and Solutions

Challenge 1: Eventual Consistency

Problem: The user sees "Order Placed" before inventory is actually reserved.

Solutions:

// Option 1: Optimistic UI with status updates
interface Order {
  id: string;
  status: "pending" | "confirmed" | "processing" | "shipped";
  lastUpdated: Date;
}

// UI shows: "Order received! We'll confirm availability shortly."
// Subsequent events update status

// Option 2: Wait for critical confirmations
async function placeOrder(order: Order): Promise<OrderResponse> {
  // Publish event
  await publishEvent("OrderPlaced", order);

  // Wait for stock confirmation (with timeout)
  const confirmation = await waitForEvent(
    "StockReserved",
    { orderId: order.id },
    { timeout: 5000 }
  );

  if (confirmation.success) {
    return { status: "confirmed", orderId: order.id };
  } else {
    return { status: "failed", reason: "Out of stock" };
  }
}

Challenge 2: Debugging Distributed Flows

Problem: Tracing a transaction that hops through 5 queues is hard.

Solution: Distributed tracing with correlation IDs:

// Correlation ID propagation
interface EventContext {
  correlationId: string; // Same across entire flow
  causationId: string; // ID of event that caused this one
  spanId: string; // For OpenTelemetry integration
}

async function publishEvent<T>(
  type: string,
  payload: T,
  context: EventContext
): Promise<void> {
  const event = {
    eventId: crypto.randomUUID(),
    eventType: type,
    correlationId: context.correlationId,
    causationId: context.causationId,
    timestamp: new Date().toISOString(),
    payload,
  };

  // Also emit to OpenTelemetry
  tracer.startSpan(type, {
    attributes: { correlationId: context.correlationId },
  });

  await producer.send({
    topic: getTopicForEvent(type),
    messages: [{ value: JSON.stringify(event) }],
  });
}

Challenge 3: Message Ordering

Problem: Events arrive out of order.

Solution: Partition keys and idempotent consumers:

// Same partition = same order
await producer.send({
  topic: "orders",
  messages: [
    {
      key: order.id, // All events for this order go to same partition
      value: JSON.stringify(event),
    },
  ],
});

// Idempotent consumer
async function handleEvent(event: Event): Promise<void> {
  // Check if already processed
  const processed = await redis.get(`processed:${event.eventId}`);
  if (processed) {
    console.log(`Skipping duplicate event ${event.eventId}`);
    return;
  }

  // Process event
  await processEvent(event);

  // Mark as processed (with TTL for cleanup)
  await redis.setex(`processed:${event.eventId}`, 86400, "true");
}

Challenge 4: Dead Letter Queues

Problem: What happens when events can't be processed?

Solution: Dead letter queues with retry logic:

// Kafka consumer with DLQ
await consumer.run({
  eachMessage: async ({ message }) => {
    try {
      await handleMessage(message);
    } catch (error) {
      const retryCount = getRetryCount(message);

      if (retryCount < MAX_RETRIES) {
        // Retry with backoff
        await publishToRetryTopic(message, retryCount + 1);
      } else {
        // Move to dead letter queue
        await publishToDLQ(message, error);
        alertOps(`Message failed after ${MAX_RETRIES} retries`, message);
      }
    }
  },
});

When to Use EDA (and When Not To)

ScenarioEDA?Why
High-volume event processing✅ YesScales horizontally, handles bursts
Loose coupling needed✅ YesServices don't need to know each other
Real-time notifications✅ YesPush-based, low latency
Simple CRUD operations❌ NoOverkill, adds complexity
Strong consistency required⚠️ MaybeSagas add complexity
Audit/replay requirements✅ YesEvent log is natural audit trail
Team autonomy✅ YesTeams can deploy independently

Key Takeaways

  1. EDA decouples services through asynchronous event communication
  2. Choreography is simpler but harder to trace; orchestration is more visible but creates coupling
  3. Kafka is ideal for high-throughput, replayable event streams
  4. Schema versioning is critical for evolving systems
  5. Saga patterns handle distributed transactions with compensating actions
  6. CQRS pairs naturally with EDA for optimized read/write models
  7. Eventual consistency requires UI/UX patterns to handle uncertainty
  8. Distributed tracing with correlation IDs is essential for debugging

Event-Driven Architecture isn't a silver bullet—it's a powerful tool that requires a shift in mindset from "Request/Response" to "Publish/Subscribe." When applied correctly, it enables systems that are resilient, scalable, and extensible.


Planning to adopt Event-Driven Architecture? Contact EGI Consulting for architecture reviews, technology selection, and hands-on implementation guidance for Kafka, RabbitMQ, and cloud-native messaging systems.

Related articles

Keep reading with a few hand-picked posts based on similar topics.

Posted in Blog & Insights