StackSaga-Kafka Framework (Asynchronous)
Overview
StackSaga-Kafka Framework (stacksaga-kafka-spring-boot-starter) is the asynchronous transport implementation of the StackSaga ecosystem.
It replaces synchronous inter-service calls with Kafka as the messaging backbone, enabling fully event-driven saga execution for long-running transactions (LRT) in a microservices system.
Unlike a simple choreography approach where each service publishes and reacts to domain events independently, StackSaga-Kafka follows a centralized orchestration model: a single orchestrator service owns the full saga lifecycle, emits commands to target worker services via dedicated Kafka topics, and collects responses through a framework-managed per-domain reply topic. This separation gives you the scalability and decoupling of Kafka while retaining the deterministic control flow and auditability of the saga orchestration pattern.
The framework is built on top of Spring Boot and integrates with StackSaga’s core engine for event sourcing, state management, retry scheduling, and cluster coordination.
Each saga execution is fully persisted — every state transition of the SagaDomainEntity (the saga’s aggregate root and payload carrier) is written to the event store through stacksaga-database-support — so that any in-progress transaction can be recovered, retried, or inspected at any point in time.
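The command and reply flow described above can be pictured with a small in-memory sketch. It is not StackSaga code: `ReplyCorrelator`, its methods, and the `BiConsumer` standing in for a Kafka producer are hypothetical names chosen for illustration, and a real orchestrator would keep pending state in the event store rather than in a map. The sketch only shows how a saga can advance when a reply arrives, without a thread blocking in between.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical illustration only: none of these types belong to StackSaga.
class ReplyCorrelator {
    // Commands that were sent and are still waiting for a worker reply.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    // Stand-in for a Kafka producer: receives (correlationId, payload).
    private final BiConsumer<String, String> commandSender;

    ReplyCorrelator(BiConsumer<String, String> commandSender) {
        this.commandSender = commandSender;
    }

    // Sends a command and returns immediately; no thread blocks while waiting.
    CompletableFuture<String> send(String payload) {
        String correlationId = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(correlationId, reply);
        commandSender.accept(correlationId, payload);
        return reply;
    }

    // Called by the reply-topic listener; returns false for unknown or duplicate replies.
    boolean onReply(String correlationId, String result) {
        CompletableFuture<String> reply = pending.remove(correlationId);
        return reply != null && reply.complete(result);
    }

    int pendingCount() {
        return pending.size();
    }
}

class ReplyCorrelatorDemo {
    public static void main(String[] args) {
        String[] lastId = new String[1];
        ReplyCorrelator correlator = new ReplyCorrelator((id, payload) -> lastId[0] = id);

        CompletableFuture<String> reply = correlator.send("reserve-stock");
        reply.thenAccept(result -> System.out.println("advance saga with: " + result));

        // Simulates the worker's reply arriving on the reply topic.
        correlator.onReply(lastId[0], "STOCK_RESERVED"); // prints "advance saga with: STOCK_RESERVED"
    }
}
```

A second reply with the same correlation id is ignored by `onReply`, which is one simple way to keep a duplicated reply from advancing the saga twice.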
Key capabilities:
- Asynchronous saga orchestration: the orchestrator issues commands to worker services via Kafka and advances the saga state only upon receiving the worker’s reply, with no blocking thread held during the wait.
- Forward and backward recovery: built-in support for compensating transactions. When a saga step fails, the framework invokes onNextRevert() on the SagaEventNavigator in reverse step order, triggering rollback actions on each previously completed service.
- Per-domain reply topology: outbound command topics are created per worker intent; inbound reply topics are created per saga domain by the framework. A system with three saga domains maintains exactly three reply topics, keeping Kafka partition management predictable.
- Durable state via event sourcing: the full SagaDomainEntity payload and status (STARTED → IN_PROGRESS → COMPLETED | FAILED → COMPENSATING → COMPENSATED) is persisted on every state transition, enabling point-in-time recovery and complete audit trails.
- Distributed retry with token ring partitioning: failed or stalled transactions are retried by the owning orchestrator instance as determined by Murmur3-based token ring partitioning, coordinated through the stacksaga-ring-coordinator.
- Idempotency guarantees: duplicate Kafka message delivery is handled safely through the StackSaga Trace Window, preventing unintended re-execution of already-completed saga steps.
- Reactive and imperative support: the framework’s worker-side KafkaWorkerEndpoints abstraction supports both reactive (Mono/Flux) and blocking handler implementations, giving application developers freedom to choose their programming model without affecting saga coordination.
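The status sequence listed above can be read as a small state machine. The sketch below encodes one plausible reading of it: a saga in progress either completes or fails, and a failed saga moves through compensation. The `SagaStatus` enum and its methods are written for this illustration only and are an assumption; the framework's own types and transition rules may differ.

```java
import java.util.EnumSet;
import java.util.Set;

// Assumed reading of the lifecycle string in the text; not the framework's own enum.
enum SagaStatus {
    STARTED, IN_PROGRESS, COMPLETED, FAILED, COMPENSATING, COMPENSATED;

    // Statuses that may directly follow this one.
    Set<SagaStatus> next() {
        switch (this) {
            case STARTED:      return EnumSet.of(IN_PROGRESS);
            case IN_PROGRESS:  return EnumSet.of(COMPLETED, FAILED);
            case FAILED:       return EnumSet.of(COMPENSATING);
            case COMPENSATING: return EnumSet.of(COMPENSATED);
            default:           return EnumSet.noneOf(SagaStatus.class); // terminal states
        }
    }

    boolean canTransitionTo(SagaStatus target) {
        return next().contains(target);
    }

    boolean isTerminal() {
        return next().isEmpty();
    }
}

class SagaStatusDemo {
    public static void main(String[] args) {
        System.out.println(SagaStatus.IN_PROGRESS.canTransitionTo(SagaStatus.FAILED));    // true
        System.out.println(SagaStatus.COMPLETED.canTransitionTo(SagaStatus.COMPENSATING)); // false
    }
}
```

Rejecting transitions that are not in the table is what makes persisted state trustworthy during recovery: a replayed or late message cannot move a completed saga back into compensation.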
Glossary
The following terms are used throughout this documentation. Familiarising yourself with them before reading the technical sections will significantly reduce the learning curve.
| Term | Definition |
|---|---|
| LRT (Long-Running Transaction) | A business transaction that spans multiple services and may take seconds, minutes, or longer to complete. LRTs require durable state management, distributed coordination, and compensation support for partial failures. |
| Saga Domain | A specific type of LRT identified by its |
| Span | A single atomic execution step within a saga. Each span targets one worker service via a dedicated Kafka topic. A saga is composed of one or more sequential spans in the primary flow, each with an optional corresponding compensation span. |
| SEC (Saga Execution Coordinator) | The internal engine component responsible for driving a saga forward: evaluating the |
| Domain Entity | The aggregate root for a single saga instance. Carries the full accumulated business payload and current execution state throughout the saga lifecycle. Every state transition is persisted as a snapshot to the event store. |
| EventManager | The orchestrator-side component that defines the routing logic for a saga domain. Determines which topic to trigger next after each successful span ( |
| Executor | A worker-side handler that executes the business logic for a specific saga step. Implemented as a |
| Compensation | The reverse process that undoes previously completed steps when a saga fails. Executed in reverse order via |
| Event Store | The persistent storage layer for all saga state transitions and domain entity snapshots. Provided by |
| Revert Hint Store | A key-value store that carries metadata forward through the compensation sequence. Values written during one |
| Topic Key | A unique |
| Ring Coordinator | A standalone service ( |
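The Compensation and Revert Hint Store entries can be illustrated with a minimal sketch: steps run in order, and when one fails, the steps that already completed are undone in reverse order while a shared key-value map is handed from one compensation to the next. Everything here (`CompensationSketch`, `Step`, the `last-undone` key) is hypothetical and self-contained; it shows the general pattern under those assumptions, not the framework's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration; they are not StackSaga classes.
class CompensationSketch {
    interface Step {
        String name();
        boolean execute();                          // forward action; false means failure
        void compensate(Map<String, String> hints); // undo; may read and write hints
    }

    // Runs steps in order; on the first failure, undoes completed steps in reverse.
    // Returns the names of the compensated steps in the order they were undone.
    static List<String> run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            if (step.execute()) {
                completed.push(step); // most recently completed step ends up on top
                continue;
            }
            Map<String, String> hints = new HashMap<>(); // shared across all compensations
            List<String> undone = new ArrayList<>();
            while (!completed.isEmpty()) {
                Step previous = completed.pop();
                previous.compensate(hints);
                undone.add(previous.name());
            }
            return undone;
        }
        return List.of(); // every step succeeded, nothing to compensate
    }

    static Step step(String name, boolean succeeds) {
        return new Step() {
            public String name() { return name; }
            public boolean execute() { return succeeds; }
            public void compensate(Map<String, String> hints) {
                hints.put("last-undone", name); // visible to the next compensation
            }
        };
    }

    public static void main(String[] args) {
        List<Step> steps = List.of(
                step("order", true), step("payment", true), step("shipping", false));
        System.out.println(run(steps)); // prints [payment, order]
    }
}
```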
Why StackSaga-Kafka Over Raw Kafka?
Using Kafka alone for service-to-service communication in a saga does not automatically give you saga orchestration. A raw Kafka topology requires every participating service to understand the broader transaction context: which step it is in, what has already succeeded, what to roll back if something goes wrong, and how to track overall saga progress. This logic inevitably leaks into every service, making the system fragile and difficult to evolve.
The following comparison explains the specific gaps StackSaga-Kafka closes.
Centralized Transaction State Management
| Concern | Detail |
|---|---|
| Raw Kafka | Kafka delivers messages durably but has no concept of a business transaction. There is no built-in mechanism to know that "step 2 of 5 has completed" or that a transaction is in a compensating state. |
| StackSaga-Kafka | The orchestrator tracks the full saga lifecycle through well-defined states: |
| Advantage | No saga coordination logic needs to be implemented in individual worker services. The orchestrator is the sole authority on saga state. |
Resilience and Fault Tolerance
| Concern | Detail |
|---|---|
| Raw Kafka | Kafka provides delivery durability and at-least-once guarantees, but it does not know what to do when the message is consumed and the downstream operation fails. Retry logic, timeouts, and compensation decisions must be built separately. |
| StackSaga-Kafka | The framework implements configurable retry windows, boundary guard thresholds, and automatic compensation invocation. If a worker service returns a failure response, the saga engine immediately begins the reverse traversal of the |
| Advantage | Data consistency is maintained across services even under partial failures, without embedding retry or rollback logic in each service. |
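Because delivery is at-least-once, the same command or reply can arrive more than once, and retries make that more likely. One common way to guard a handler is to remember a bounded window of recently processed message keys and skip redeliveries. The sketch below shows that generic technique only. It is an assumption made for illustration and is not the StackSaga Trace Window, whose internals are not described here; `SeenMessageWindow` and the key format are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic duplicate filter, shown as an assumption; not the framework's Trace Window.
class SeenMessageWindow {
    private final Map<String, Boolean> seen;

    SeenMessageWindow(final int capacity) {
        // Insertion-ordered map that evicts its oldest key once capacity is exceeded.
        this.seen = new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true the first time a key is seen, false for a redelivery
    // that still falls inside the window.
    synchronized boolean firstDelivery(String messageKey) {
        return seen.putIfAbsent(messageKey, Boolean.TRUE) == null;
    }
}

class SeenMessageWindowDemo {
    public static void main(String[] args) {
        SeenMessageWindow window = new SeenMessageWindow(1000);
        System.out.println(window.firstDelivery("tx-1:step-2")); // true
        System.out.println(window.firstDelivery("tx-1:step-2")); // false
    }
}
```

The window is bounded on purpose: a key that has been evicted is treated as new again, so the capacity has to cover the longest redelivery delay you expect.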
Simplified Development
| Concern | Detail |
|---|---|
| Raw Kafka | Developers must implement state machine logic, idempotency checks, saga step routing, and compensation orchestration in every service that participates in a business transaction. |
| StackSaga-Kafka | The orchestrator-side abstraction ( |
| Advantage | Reduces boilerplate significantly and confines distributed transaction concerns to a small, testable layer. |
Operational Observability
| Concern | Detail |
|---|---|
| Raw Kafka | Consumer lag metrics and broker-level monitoring are available, but there is no concept of "transaction X is stuck at step 3" or "compensation for order Y failed". |
| StackSaga-Kafka | The |
| Advantage | Operations teams can diagnose stalled or failed sagas at the business-transaction level, not just at the Kafka consumer level. |
Components
The StackSaga-Kafka Framework is split into two artifacts with distinct responsibilities. They communicate with each other exclusively through Kafka: the orchestrator sends outbound command messages to worker-designated topics, and workers send reply messages back to the framework-managed per-domain reply topic.
stacksaga-kafka-orchestrator
stacksaga-kafka-orchestrator is the core runtime dependency added to the orchestrator service — the single service responsible for initiating and driving the saga lifecycle.
It exposes:
- StackSagaKafkaTemplate: the entry point for starting a new saga execution or resuming a recovered one. Accepts an initialized SagaDomainEntity and hands it off to the saga engine.
- SagaEventNavigator: defines the ordered steps and compensation steps of a saga. The engine calls onNext() sequentially for forward progress and onNextRevert() in reverse order during compensation.
- SagaDomainEntity: the aggregate root for a saga instance. Carries the accumulated business payload and the current execution state. It is serialized and persisted to the event store on every state transition.
To run your application as an orchestrator service, first add the stacksaga-kafka-orchestrator-starter dependency to your project.
<dependencyManagement>
    <dependencies>
        <dependency> <!-- Only for StackSaga dependency version management -->
            <groupId>org.stacksaga</groupId>
            <artifactId>stacksaga-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.stacksaga</groupId>
        <artifactId>stacksaga-kafka-orchestrator-starter</artifactId>
    </dependency>
</dependencies>
The framework is built internally on reactive programming principles for high scalability and performance, and it primarily uses Reactor Kafka to interact with Kafka. Although it operates reactively under the hood, it fully supports both reactive and non-reactive (imperative) application models.
If your orchestrator service also acts as a worker for another saga domain owned by a different orchestrator, you do not need to add the worker dependency (stacksaga-kafka-worker-starter) separately, because the orchestrator dependency already includes it transitively.
Additionally, the orchestrator service must include:
- stacksaga-database-support: provides the event store integration (MySQL, Cassandra, or ScyllaDB depending on configuration) for persisting SagaDomainEntity snapshots and state transitions.
- stacksaga-ring-coordinator-connector (optional, required for retry): connects the orchestrator instance to the Ring Coordinator Service, registers it as a retry node, and receives its assigned Murmur3 token sub-range. Enables the retry subsystem to re-invoke failed transactions owned by this instance.
- stacksaga-environment-support (optional): provides environment metadata to the framework for service discovery and instance identity resolution in environments such as Eureka or Kubernetes.
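The idea of a token sub-range per orchestrator instance can be sketched as follows: hash a transaction id to a token, divide the token space into equal contiguous sub-ranges, and let the instance whose sub-range contains the token own the retry. The sketch makes several assumptions that the text does not confirm: it uses the 32-bit Murmur3 variant with seed 0 for brevity, splits the space evenly, and hashes a plain string id. `TokenRingSketch` and `ownerOf` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: the variant, seed, and range layout are assumptions.
class TokenRingSketch {
    // MurmurHash3, x86 32-bit variant.
    static int murmur3(byte[] data, int seed) {
        int h = seed;
        int i = 0;
        // Body: mix four bytes at a time, read as a little-endian int.
        for (; i + 4 <= data.length; i += 4) {
            int k = (data[i] & 0xff) | (data[i + 1] & 0xff) << 8
                    | (data[i + 2] & 0xff) << 16 | (data[i + 3] & 0xff) << 24;
            k *= 0xcc9e2d51;
            k = Integer.rotateLeft(k, 15);
            k *= 0x1b873593;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        // Tail: the remaining one to three bytes (intentional fall-through).
        int k = 0;
        switch (data.length - i) {
            case 3: k ^= (data[i + 2] & 0xff) << 16;
            case 2: k ^= (data[i + 1] & 0xff) << 8;
            case 1: k ^= (data[i] & 0xff);
                    k *= 0xcc9e2d51;
                    k = Integer.rotateLeft(k, 15);
                    k *= 0x1b873593;
                    h ^= k;
        }
        // Finalization: force the bits to avalanche.
        h ^= data.length;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Maps a transaction id to the index of the node whose sub-range holds its token.
    static int ownerOf(String transactionId, int nodeCount) {
        int token = murmur3(transactionId.getBytes(StandardCharsets.UTF_8), 0);
        long offset = (long) token - Integer.MIN_VALUE; // shift into 0 .. 2^32 - 1
        return (int) (offset * nodeCount / (1L << 32));
    }

    public static void main(String[] args) {
        for (String id : new String[] {"tx-1001", "tx-1002", "tx-1003"}) {
            System.out.println(id + " -> node " + ownerOf(id, 3));
        }
    }
}
```

Because ownership depends only on the id and the node count, every instance can work out for itself whether a stalled transaction belongs to it, provided all instances agree on the node count. Distributing that shared view is the kind of job a coordinator service performs.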
stacksaga-kafka-worker
stacksaga-kafka-worker is the dependency added to every worker service (also called a target service or utility service) that participates in sagas initiated by an orchestrator.
The worker dependency provides:
- A Kafka listener infrastructure that subscribes to the topic(s) designated for that service by the framework’s topic configuration.
- KafkaWorkerEndpoints: the abstraction through which application developers implement the forward processing logic and the compensation logic for each saga step handled by this service.
- Automatic response dispatch: after the endpoint handler completes (successfully or with a failure result), the worker sends the outcome back to the framework via the orchestrator’s per-domain reply topic. The application developer does not manage this reply flow directly.
Worker services do not interact with the event store or the ring coordinator. They are stateless with respect to the saga: they receive a command, execute the relevant business operation, and return a result.
A single worker service can handle steps for multiple orchestrator domains (i.e., multiple saga types), each routed to the correct KafkaWorkerEndpoints implementation by the framework.
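The division of labour described above, where the developer writes the forward and compensation logic and the infrastructure turns the outcome into a reply, can be sketched without any Kafka dependency. The shapes below (`WorkerSketch`, `StepHandler`, `Reply`, `Outcome`) are hypothetical and exist only for this example; the real KafkaWorkerEndpoints API is likely to look different.

```java
// Hypothetical shapes for illustration; the real KafkaWorkerEndpoints API may differ.
class WorkerSketch {
    enum Outcome { SUCCESS, FAILURE }

    // What would be published to the reply topic.
    static final class Reply {
        final String transactionId;
        final Outcome outcome;
        final String payload;

        Reply(String transactionId, Outcome outcome, String payload) {
            this.transactionId = transactionId;
            this.outcome = outcome;
            this.payload = payload;
        }
    }

    // Business logic for one saga step: a forward action and its compensation.
    interface StepHandler {
        String process(String payload) throws Exception;
        String revert(String payload) throws Exception;
    }

    // Stand-in for the worker infrastructure: invokes the handler and builds the
    // reply, so the handler itself never deals with the reply topic.
    static Reply handle(String transactionId, String payload,
                        boolean compensating, StepHandler handler) {
        try {
            String result = compensating ? handler.revert(payload) : handler.process(payload);
            return new Reply(transactionId, Outcome.SUCCESS, result);
        } catch (Exception e) {
            return new Reply(transactionId, Outcome.FAILURE, e.getMessage());
        }
    }

    public static void main(String[] args) {
        StepHandler reserveStock = new StepHandler() {
            public String process(String payload) {
                if (payload.isEmpty()) {
                    throw new IllegalArgumentException("nothing to reserve");
                }
                return "reserved:" + payload;
            }
            public String revert(String payload) {
                return "released:" + payload;
            }
        };
        Reply reply = handle("tx-1", "sku-42", false, reserveStock);
        System.out.println(reply.outcome + " " + reply.payload); // prints SUCCESS reserved:sku-42
    }
}
```

The handler holds no saga state: it receives a payload and returns a result or throws, which matches the description of workers as stateless with respect to the saga.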
To add the worker dependency, include stacksaga-kafka-worker-starter in your worker project as shown below.
<dependencyManagement>
    <dependencies>
        <dependency> <!-- Only for StackSaga dependency version management -->
            <groupId>org.stacksaga</groupId>
            <artifactId>stacksaga-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.stacksaga</groupId>
        <artifactId>stacksaga-kafka-worker-starter</artifactId>
    </dependency>
</dependencies>