StackSaga Agent

The StackSaga agent is the application responsible for retrying transactions. As you already know, a transaction that fails with a network exception (Resource-Unavailable) can be replayed. That retrying is done by the StackSaga agent service.

Why an agent service?

You might wonder why the instance that executed a transaction cannot simply re-invoke it by itself, without an agent.
The short answer is that this cannot be done reliably, because instances in a microservice architecture are ephemeral.

For instance, imagine that 3 instances are running and that together they have saved 7 transactions for retrying because of network issues. If each instance took care of its own transactions, the retry scheduler would fire on each instance after a while. By then, the number of instances may have changed, because the service scales up or down with the traffic at that particular time. Suppose one of the instances that saved some of those transactions, order-service-3400001, is no longer running when the scheduler fires. The other instances do not touch the transactions made by that instance, so those transactions are never picked up for retrying, as the diagram below shows.

Figure: Why an instance does not retry its own transactions directly in StackSaga
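The following Java sketch models the rejected alternative, in which each instance retries only the transactions it saved itself, and lists the transactions that would never be retried. The PendingTx record, the orphaned method, the instance names other than order-service-3400001, and the 3/2/2 split of the 7 transactions are hypothetical, chosen only to mirror the scenario; this is not StackSaga code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the rejected "retry your own transactions" policy (not StackSaga code).
final class SelfRetrySketch {

    // A transaction saved for retrying, tagged with the instance that saved it.
    record PendingTx(String txId, String ownerInstance) {}

    // Returns the transactions that no running instance will ever retry: under this policy
    // only the owner may retry a transaction, so a missing owner means the transaction is stuck.
    static List<PendingTx> orphaned(List<PendingTx> pending, Set<String> runningInstances) {
        List<PendingTx> stuck = new ArrayList<>();
        for (PendingTx tx : pending) {
            if (!runningInstances.contains(tx.ownerInstance())) {
                stuck.add(tx);
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        // 7 transactions saved by 3 instances (hypothetical 3/2/2 split).
        List<PendingTx> pending = List.of(
                new PendingTx("tx-1", "order-service-3400001"),
                new PendingTx("tx-2", "order-service-3400001"),
                new PendingTx("tx-3", "order-service-3400001"),
                new PendingTx("tx-4", "order-service-3400002"),
                new PendingTx("tx-5", "order-service-3400002"),
                new PendingTx("tx-6", "order-service-3400003"),
                new PendingTx("tx-7", "order-service-3400003"));

        // order-service-3400001 was scaled down before the retry scheduler fired.
        Set<String> running = Set.of("order-service-3400002", "order-service-3400003");

        for (PendingTx tx : orphaned(pending, running)) {
            System.out.println("never retried: " + tx.txId());
        }
    }
}
```

Running it prints the three transactions saved by order-service-3400001 (tx-1, tx-2 and tx-3). With a separate agent, ownership no longer matters, because any available instance can be handed any pending transaction.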

That is why a separate component has to take responsibility for retrying transactions. Next, let’s see how the StackSaga agent manages it.

Retrying Transactions with StackSaga Agent

Transactions are executed by the orchestrator service with the help of the StackSaga framework. If a transaction cannot be processed because of a Resource-Unavailable exception, it is kept in the event-store for retrying. For instance, during the make-order process, an exception occurs in the MakePaymentExecutor because the payment-service is unavailable for some reason. The order-service then uses the StackSaga framework to save the transaction in the event-store, to be retried at a configured interval. At that point, the orchestrator service's duty ends temporarily.

When the interval is reached, the StackSaga agent service triggers its schedulers to gather the transactions that should be retried. After collecting them, the agent distributes the transactions to the available order-service (orchestrator service) instances for retrying.

Figure: How the StackSaga agent distributes transactions

Note that it does not matter which instance initiated a transaction. The agent scans all the transactions that should be retried and distributes them among the available order-service instances.

The agent service communicates with the order-service instances through built-in HTTP endpoints that stacksaga-spring-boot-starter provides in the orchestrator service.
The load is balanced by the service-discovery mechanism of the system, such as Eureka or a Kubernetes Service.
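A minimal sketch of the distribution step, assuming a simple round-robin policy: the agent hands the i-th collected transaction to instance i mod n, without looking at which instance created it. The RoundRobinDistribution class and the instance names are hypothetical. In StackSaga the transactions travel over the starter's HTTP endpoints and the actual balancing is performed by service discovery, which this in-memory sketch does not model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of spreading collected transactions over the available orchestrator
// instances. Round-robin is an assumed policy; in StackSaga the actual balancing is left
// to the service-discovery layer, such as Eureka or a Kubernetes Service.
final class RoundRobinDistribution {

    // Maps each available instance to the transaction ids it should retry.
    static Map<String, List<String>> distribute(List<String> txIds, List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("no orchestrator instance is available");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String instance : instances) {
            plan.put(instance, new ArrayList<>());
        }
        for (int i = 0; i < txIds.size(); i++) {
            // The i-th transaction goes to instance i mod n, regardless of which instance created it.
            plan.get(instances.get(i % instances.size())).add(txIds.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> txIds = List.of("tx-1", "tx-2", "tx-3", "tx-4", "tx-5", "tx-6", "tx-7");
        List<String> available = List.of("order-service-a", "order-service-b", "order-service-c");
        distribute(txIds, available)
                .forEach((instance, assigned) -> System.out.println(instance + " -> " + assigned));
    }
}
```

With seven transactions and three instances, main prints order-service-a -> [tx-1, tx-4, tx-7], order-service-b -> [tx-2, tx-5] and order-service-c -> [tx-3, tx-6].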

Ephemeral behavior of the instances

In the context of microservices, ephemeral refers to the principle that a microservice instance can be created, destroyed, and replenished on demand, easily, quickly, and with no side effects.

How does the agent filter transactions from the event-store?

To retry transactions, the agent must first select them from the event-store. When filtering, an agent node mainly considers two factors:

The region of the transaction
The token of the transaction (checked against the token quota given to the agent)

Region consideration by the agent

The region in which the transaction was initiated is the main factor that decides whether a particular agent should retry it: an agent only retries transactions from its own region. Therefore, if the system is deployed in multiple regions, each region needs at least one agent node to get its transactions retried.

Transaction’s token consideration by the agent

Each transaction is assigned a token derived from its transaction-id. The token is generated by the Murmur3 hashing algorithm and is a value in the range -2^63 to +2^63-1. Tokens help avoid collisions when all the agent nodes filter transactions from the event-store, because each agent node has its own token quota.

For instance, suppose you have four agent nodes in a region. At the same scheduled time, all of them start fetching transactions from the same event-store. If a node does not know which transactions belong to it, the same transaction can be picked up for retrying by multiple nodes at once, with every node doing the same work. That leads to data inconsistency in the system. To avoid it, each node must know its own transaction quota.
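The two filtering factors can be sketched as a single predicate: a node keeps a transaction only if it was initiated in the node's region and its token falls inside the node's quota. The StoredTx and TokenRange records, the selectForRetry method, and the region names are hypothetical; a real agent would presumably push this filter down into its event-store query instead of scanning in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two filtering factors (not StackSaga's actual API).
final class AgentFilter {

    // A transaction waiting in the event-store: where it started and its 64-bit token.
    record StoredTx(String txId, String region, long token) {}

    // The token quota owned by one agent node, inclusive on both ends.
    record TokenRange(long start, long end) {
        boolean contains(long token) {
            return token >= start && token <= end;
        }
    }

    // Keeps only the transactions this node is responsible for: same region AND token in its quota.
    static List<StoredTx> selectForRetry(List<StoredTx> store, String agentRegion, TokenRange quota) {
        List<StoredTx> selected = new ArrayList<>();
        for (StoredTx tx : store) {
            if (tx.region().equals(agentRegion) && quota.contains(tx.token())) {
                selected.add(tx);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<StoredTx> store = List.of(
                new StoredTx("tx-1", "us-east-1", -5_000_000_000_000_000_000L),
                new StoredTx("tx-2", "us-east-1", -42L),
                new StoredTx("tx-3", "us-east-1", 123L),
                new StoredTx("tx-4", "eu-west-1", -7L));
        // The quota that Table 1 assigns to order-service-agent-1.
        TokenRange agent1 = new TokenRange(-4_611_686_018_427_387_904L, -1L);
        System.out.println(selectForRetry(store, "us-east-1", agent1)); // only tx-2 qualifies
    }
}
```

Here tx-4 falls inside the quota but belongs to another region, while tx-1 and tx-3 are in the right region but fall into other nodes' quotas, so only tx-2 is selected.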

How the transaction quota is allocated depends on the environment the agent is deployed in, such as Eureka or Kubernetes.
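The document does not describe how a node learns which quota is its own. One possible approach on Kubernetes, shown here purely as an assumption, is to run the agents as a StatefulSet, whose pods get stable names ending in an ordinal (order-service-agent-0, order-service-agent-1, and so on, like the node names in Table 1), and to combine that ordinal with the number of running nodes. NodeOrdinal and ordinalOf are hypothetical names, and StackSaga's actual mechanism, particularly on Eureka, may be entirely different.

```java
// Hypothetical sketch: derive an agent node's index from a StatefulSet-style pod name.
// This is an assumption for illustration; StackSaga's real allocation mechanism may differ.
final class NodeOrdinal {

    // "order-service-agent-2" -> 2. Throws if the name has no numeric suffix.
    static int ordinalOf(String nodeName) {
        int dash = nodeName.lastIndexOf('-');
        if (dash < 0 || dash == nodeName.length() - 1) {
            throw new IllegalArgumentException("no ordinal suffix in " + nodeName);
        }
        // Integer.parseInt throws NumberFormatException for a non-numeric suffix.
        return Integer.parseInt(nodeName.substring(dash + 1));
    }

    public static void main(String[] args) {
        // On Kubernetes the pod name is usually visible to the container as its hostname.
        String nodeName = "order-service-agent-2";
        System.out.println(nodeName + " is node #" + ordinalOf(nodeName) + " in its region");
    }
}
```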

Because the tokens are generated by the Murmur3 hashing algorithm, they are spread across the whole 64-bit range, so the range can be divided according to the number of running nodes in the region.

If you have 4 agent instances in the region, the entire range is divided evenly into 4 ranges, as follows:

Table 1. Token Range For Each Node
Node (Agent Instance)	Token Range
order-service-agent-0	-9223372036854775808 TO -4611686018427387905
order-service-agent-1	-4611686018427387904 TO -1
order-service-agent-2	0 TO 4611686018427387903
order-service-agent-3	4611686018427387904 TO 9223372036854775807

Figure: Transaction token ranges in cluster mode
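The arithmetic behind Table 1 can be reproduced with a short sketch that splits the 2^64 tokens into N contiguous, equally sized ranges. The TokenRangeSplitter class is hypothetical. BigInteger is used so that intermediate values such as i × 2^64 cannot overflow, and rounding the range starts up when 2^64 is not divisible by N (for example N = 3) is an assumed convention, since the document only covers N = 4 and N = 1.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch of an even split of the Murmur3 token space [-2^63, 2^63 - 1] among N agent nodes.
// It reproduces Table 1 for N = 4; StackSaga's internal computation is not shown in this document.
final class TokenRangeSplitter {

    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE); // -2^63
    private static final BigInteger SPAN = BigInteger.ONE.shiftLeft(64);      // 2^64 tokens in total

    // First token of node i: MIN + ceil(i * 2^64 / n). BigInteger avoids 64-bit overflow.
    private static BigInteger startOf(int i, int nodeCount) {
        BigInteger n = BigInteger.valueOf(nodeCount);
        BigInteger scaled = SPAN.multiply(BigInteger.valueOf(i));
        return MIN.add(scaled.add(n).subtract(BigInteger.ONE).divide(n));
    }

    // Returns one inclusive {start, end} pair per node; together they cover every token exactly once.
    static List<long[]> split(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be at least 1");
        }
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            BigInteger start = startOf(i, nodeCount);
            // Each range ends right before the next one starts; the last one ends at 2^63 - 1.
            BigInteger end = startOf(i + 1, nodeCount).subtract(BigInteger.ONE);
            ranges.add(new long[] {start.longValueExact(), end.longValueExact()});
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<long[]> ranges = split(4);
        for (int i = 0; i < ranges.size(); i++) {
            System.out.println("order-service-agent-" + i + "\t" + ranges.get(i)[0] + " TO " + ranges.get(i)[1]);
        }
    }
}
```

For N = 4, main prints the four rows of Table 1, and split(1) yields the single range -9223372036854775808 to 9223372036854775807 that a lone node acquires.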

One node per region

Figure: A single StackSaga service agent node per region in a multi-region deployment

If only one agent instance is running in the region, that node acquires the entire token range (-9223372036854775808 to 9223372036854775807).

1	Fetch the transactions from the event-store for the respective region. Because only one node is running in the region, it runs in one of the region's available zones and acquires the entire token range.
2	Send the collected transactions to the available instances. It does not matter which zone the service agent is running in: it shares all the collected transactions among the available instances in the entire region.
3	Each orchestrator service receives its share of the transactions and executes them, connecting to the event-store.

Multiple nodes per region

Figure: Multiple StackSaga service agent nodes per region in a multi-region deployment

1	Fetch the transactions from the event-store for the respective region and the node's own token range. Any number of service agent nodes can run in any zone of the region, and each has its own token range.
2	Send the collected transactions to the available instances in the region. Each agent shares its collected transactions among all the available instances in the region, not only those in the zone it is running in, because service agents serve the entire region.
3	Each orchestrator service receives its share of the transactions and executes them, connecting to the event-store.
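Putting region and token together, the following sketch shows which agent node in a region fetches which transaction. Instead of scanning ranges, it computes the owning node directly as floor((token + 2^63) × N / 2^64), which agrees with the even split of Table 1. Because every token maps to exactly one index, no transaction is fetched by two nodes and none is skipped, whether the region runs one node or several. The RegionAssignment class, the Tx record, and the region names are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: which agent node in a region fetches which transaction,
// assuming the region's token space is split evenly among nodeCount nodes.
final class RegionAssignment {

    private static final BigInteger OFFSET = BigInteger.ONE.shiftLeft(63); // maps -2^63..2^63-1 onto 0..2^64-1
    private static final BigInteger SPAN = BigInteger.ONE.shiftLeft(64);

    record Tx(String txId, String region, long token) {}

    // Index (0..nodeCount-1) of the node whose even token range contains the token:
    // floor((token + 2^63) * nodeCount / 2^64).
    static int ownerIndex(long token, int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be at least 1");
        }
        BigInteger shifted = BigInteger.valueOf(token).add(OFFSET);
        return shifted.multiply(BigInteger.valueOf(nodeCount)).divide(SPAN).intValueExact();
    }

    // Groups one region's pending transactions by the node that fetches them.
    // Every transaction lands in exactly one group, so no two nodes retry the same one.
    static Map<Integer, List<String>> assign(List<Tx> store, String region, int nodeCount) {
        Map<Integer, List<String>> byNode = new TreeMap<>();
        for (int i = 0; i < nodeCount; i++) {
            byNode.put(i, new ArrayList<>());
        }
        for (Tx tx : store) {
            if (tx.region().equals(region)) {
                byNode.get(ownerIndex(tx.token(), nodeCount)).add(tx.txId());
            }
        }
        return byNode;
    }

    public static void main(String[] args) {
        List<Tx> store = List.of(
                new Tx("tx-1", "us-east-1", Long.MIN_VALUE),
                new Tx("tx-2", "us-east-1", -42L),
                new Tx("tx-3", "us-east-1", 0L),
                new Tx("tx-4", "us-east-1", Long.MAX_VALUE),
                new Tx("tx-5", "eu-west-1", 7L));
        System.out.println(assign(store, "us-east-1", 4)); // {0=[tx-1], 1=[tx-2], 2=[tx-3], 3=[tx-4]}
        System.out.println(assign(store, "us-east-1", 1)); // {0=[tx-1, tx-2, tx-3, tx-4]}
    }
}
```

With four nodes each us-east-1 transaction goes to a different node; with a single node all four go to node 0, and tx-5 from eu-west-1 is never touched by this region's agents.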