Failover
Failover is the redirection of transactions away from unhealthy provider streams and towards healthy ones. For failover to be possible, the downstream originator must be integrated (via Electrum) with two or more service providers that offer the same product or service. Failover can occur in several ways.
Load Balancer Failover (Endpoint Health Failover)
The Electrum Switch identifies unhealthy streams by performing periodic health checks. If an upstream provider is unavailable, the load balancer redirects all transactions to the remaining healthy provider streams, based on preferential routing.
Any transactions sent through the unhealthy stream before failover occurred may be lost, and must be explicitly restarted at the originating system.
The load balancer continues to perform health checks. When the unhealthy provider stream becomes healthy again, the load balancer switches back to the usual routing strategy.
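The health-check behaviour described above can be illustrated with a minimal sketch. This is not Electrum's implementation: the names ProviderStream, run_health_checks and pick_stream, the probe callback, and the convention that a lower preference number means a more preferred stream are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProviderStream:
    """Hypothetical model of one upstream provider stream."""
    name: str
    preference: int              # assumption: lower number = more preferred
    probe: Callable[[], bool]    # hypothetical health probe, True when healthy
    healthy: bool = True


def run_health_checks(streams: List[ProviderStream]) -> None:
    """Refresh every stream's health flag; a probe that raises counts as unhealthy."""
    for stream in streams:
        try:
            stream.healthy = bool(stream.probe())
        except Exception:
            stream.healthy = False


def pick_stream(streams: List[ProviderStream]) -> Optional[ProviderStream]:
    """Return the most preferred healthy stream, or None if no stream is healthy."""
    healthy = [s for s in streams if s.healthy]
    return min(healthy, key=lambda s: s.preference) if healthy else None
```

Because pick_stream always re-evaluates the full list, traffic returns to the preferred stream as soon as a later health check marks it healthy again, which mirrors the switch back to the usual routing strategy.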
This type of failover occurs as a normal part of the routing/load balancing process. The load balancer can also facilitate managed failover, which prevents the system from suddenly shifting all transactions unpredictably from one upstream to another (and thereby putting strain on the system).
Traffic Failure Ratio Failover
This type of failover occurs when the percentage of transactions declined within a specified time period (the traffic failure ratio) exceeds a specified threshold, provided that the traffic (the number of transactions being routed to a service provider) is above a minimum threshold. Failover then prevents transactions from being routed to that service provider.
A small number of transactions will still be allowed through to the service provider from time to time, to determine whether the declines are still occurring. If the transactions are still declined, then that service provider will remain offline.
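The following sketch shows one way the failure-ratio logic could work, using a sliding time window. It is illustrative only: the class name FailureRatioTracker, its parameters and default values, the rule that every Nth transaction is let through as a probe, and the rule that a single approved probe brings the provider back online are all assumptions, not documented Electrum behaviour.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class FailureRatioTracker:
    """Hypothetical per-provider tracker of declines within a sliding window."""

    def __init__(self, window_seconds: float = 60.0, min_transactions: int = 20,
                 max_failure_ratio: float = 0.5, probe_every: int = 10,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self.min_transactions = min_transactions    # minimum traffic threshold
        self.max_failure_ratio = max_failure_ratio  # failure ratio threshold
        self.probe_every = probe_every              # assumption: every Nth request probes
        self._clock = clock
        self._outcomes: Deque[Tuple[float, bool]] = deque()  # (timestamp, declined)
        self._blocked = False
        self._skipped = 0

    def allow(self) -> bool:
        """Decide whether the next transaction may be routed to this provider."""
        if not self._blocked:
            return True
        self._skipped += 1
        if self._skipped >= self.probe_every:
            self._skipped = 0
            return True  # let one probe transaction through
        return False

    def record(self, declined: bool) -> None:
        """Record the outcome of a transaction that was sent to this provider."""
        if self._blocked:
            # Outcome of a probe: an approval brings the provider back (assumption).
            if not declined:
                self._blocked = False
                self._outcomes.clear()
            return
        now = self._clock()
        self._outcomes.append((now, declined))
        # Drop outcomes that have fallen out of the time window.
        while self._outcomes and now - self._outcomes[0][0] > self.window_seconds:
            self._outcomes.popleft()
        total = len(self._outcomes)
        declines = sum(1 for _, was_declined in self._outcomes if was_declined)
        if total >= self.min_transactions and declines / total > self.max_failure_ratio:
            self._blocked = True
            self._skipped = 0
```

A router would call allow() before sending each transaction and record() once the provider responds. The minimum traffic check keeps a handful of declines during a quiet period from taking a provider offline.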
Error Type Failover (Failover Orchestrator)
Electrum's failover orchestrator is triggered when a service provider returns certain error types, such as OUT_OF_STOCK or INVALID_PRODUCT, in response to a general purchase request or other financially impacting transaction. When one of the configured error types is returned, the failover orchestrator flags the stream in question so that new incoming transactions are not sent there, and starts routing incoming transactions to the next preferred service provider. It also resends any transactions that originally returned the error to the next preferred service provider, to try to salvage them.
In the example above, the service provider is blocked only from receiving transactions for the specific product that returned an error. Electrum can still route other transactions (relating to products that do not return errors) to that service provider.
Electrum's failover orchestrator therefore allows originating systems to attempt to salvage failed transactions, without having to redo them from the start.
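The error-type behaviour can be sketched as follows. This is a simplified illustration under stated assumptions, not Electrum's API: the FailoverOrchestrator class shown here, the Sender callback, the dictionary response shape, and the NO_PROVIDER_AVAILABLE error are all hypothetical. Only the error names OUT_OF_STOCK and INVALID_PRODUCT come from the description above.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

# Error types that trigger failover; in practice the integrator chooses these.
FAILOVER_ERRORS: FrozenSet[str] = frozenset({"OUT_OF_STOCK", "INVALID_PRODUCT"})

Response = Dict[str, str]           # hypothetical shape, e.g. {"status": "OK"}
Sender = Callable[[str], Response]  # hypothetical: takes a product id


class FailoverOrchestrator:
    """Hypothetical orchestrator that flags (provider, product) pairs on error."""

    def __init__(self, providers: List[Tuple[str, Sender]],
                 failover_errors: FrozenSet[str] = FAILOVER_ERRORS) -> None:
        self.providers = providers            # listed in order of preference
        self.failover_errors = failover_errors
        self.flagged: Set[Tuple[str, str]] = set()

    def purchase(self, product: str) -> Tuple[Optional[str], Response]:
        """Try providers in preference order, skipping any flagged for this product."""
        # NO_PROVIDER_AVAILABLE is a made-up error for this sketch.
        last: Response = {"status": "ERROR", "error": "NO_PROVIDER_AVAILABLE"}
        for name, send in self.providers:
            if (name, product) in self.flagged:
                continue
            response = send(product)
            if response.get("error") in self.failover_errors:
                # Flag this provider for this product only, then retry the same
                # transaction on the next preferred provider.
                self.flagged.add((name, product))
                last = response
                continue
            return name, response
        return None, last
```

Flagging the (provider, product) pair rather than the whole provider keeps other products flowing to that provider, as described above. How and when a flag is cleared is outside the scope of this sketch.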
As a downstream integrator you must decide whether you want to implement the Electrum Failover Orchestrator as an extra feature. If you do, you must decide which error types will be used to flag unhealthy provider streams.