Failure Safety and Load Balancing
This section describes how nevisAuth implements failure safety and load balancing. It also briefly discusses the global session store.
Failure safety
nevisAuth manages a global session, so the server is stateful. Failure safety therefore requires that the corresponding session cache is synchronized, so that failover-aware clients can access the session at a different physical location.
The failure safety pattern in nevisAuth is based on a vertical line concept with horizontally paired failure safety. The figure below shows this pattern:
This design has the following advantages:
- The cost of failure safety does not increase dramatically with the number of nevisAuth server instances, because synchronization overhead multiplies only when more than two nodes are synchronized with each other.
- Configuration remains maintainable.
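The cost argument above can be illustrated with a small calculation. The sketch below assumes a deliberately simple model in which every session write is copied once to every other node in the same synchronization group. The function name and the model are illustrative assumptions, not a measurement of nevisAuth.

```python
def replication_messages(writes: int, group_size: int) -> int:
    """Count copy operations if each write is sent to every other node in its group.

    Hypothetical model for illustration only.
    """
    if writes < 0 or group_size < 1:
        raise ValueError("writes must be >= 0 and group_size must be >= 1")
    return writes * (group_size - 1)


# Paired instances: each write is copied to exactly one partner.
paired = replication_messages(writes=1000, group_size=2)

# Four fully synchronized instances: each write is copied to three partners.
full_mesh = replication_messages(writes=1000, group_size=4)
```

Under this model, adding more pairs adds capacity without changing the per-write replication cost, whereas enlarging a single synchronization group raises it for every write.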
The restrictions include:
Session stickiness between clients and nevisAuth:
For optimal performance and to prevent session loss, an established session context should always be handled over the same communication channel. A session context is established, for example, once the user is authenticated and the corresponding channel is associated with a client, that is, a reverse proxy instance. If a nevisAuth instance fails, the failover-aware client (e.g., the reverse proxy) should address the nevisAuth instance that holds the backup of the session state (the slave instance). Failover to any other nevisAuth instance results in an error (unknown session exception).
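The failover rule can be sketched as a toy model. Everything in it (`AuthInstance`, `pair`, `call_with_failover`, `UnknownSessionError`) is a hypothetical name chosen for illustration. It does not reflect the nevisAuth API or the configuration of a real reverse proxy.

```python
from dataclasses import dataclass, field
from typing import Optional


class UnknownSessionError(Exception):
    """Raised when an instance holds no state for the requested session."""


@dataclass
class AuthInstance:
    name: str
    up: bool = True
    sessions: dict = field(default_factory=dict)
    # The horizontally paired backup instance (assumed: exactly one peer).
    peer: Optional["AuthInstance"] = field(default=None, repr=False, compare=False)

    def store(self, session_id, state):
        # Write locally and copy the state to the paired instance if it is up.
        self.sessions[session_id] = state
        if self.peer is not None and self.peer.up:
            self.peer.sessions[session_id] = state

    def lookup(self, session_id):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        if session_id not in self.sessions:
            raise UnknownSessionError(session_id)
        return self.sessions[session_id]


def pair(a, b):
    """Link two instances as mutual backups."""
    a.peer, b.peer = b, a


def call_with_failover(primary, session_id):
    """Use the sticky instance first; on failure, fall back only to its peer."""
    try:
        return primary.lookup(session_id)
    except ConnectionError:
        return primary.peer.lookup(session_id)
```

In this model, a session stored on one instance of a pair remains reachable through `call_with_failover` after that instance goes down, while a lookup on an instance outside the pair raises `UnknownSessionError`.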
Session stickiness between clients and reverse proxy:
Session loss occurs when the load balancer in front of the reverse proxy does not route the user's established channel to the same reverse proxy instance. Stickiness can be based, for example, on the SSL session ID, cookies, or the source IP address.
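One way to picture sticky routing is a deterministic mapping from a stickiness key to a reverse proxy. The sketch below hashes the key (a cookie value or a source IP address, for example) to pick a proxy. The function name and host names are made up, and real load balancers provide their own persistence mechanisms, so this is only an illustration of the principle.

```python
import hashlib


def pick_proxy(sticky_key: str, proxies: list[str]) -> str:
    """Map a stickiness key to one proxy so repeated requests land on the same one."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    digest = hashlib.sha256(sticky_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer index into the pool.
    index = int.from_bytes(digest[:8], "big") % len(proxies)
    return proxies[index]


proxies = ["proxy-1.example.org", "proxy-2.example.org"]  # hypothetical hosts
first = pick_proxy("203.0.113.7", proxies)
second = pick_proxy("203.0.113.7", proxies)  # same key, same proxy
```

Note that this simple modulo scheme remaps keys when the proxy pool changes, which would itself cause session loss for the affected users.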
Load balancing
nevisAuth receives only about two to ten requests per user session, for example for a login, a step-up, and a logout. Failure safety is therefore critical (nobody can work when login is not possible), while load balancing is far less important. Given the session synchronization described above, a client may distribute calls across the two nevisAuth instances. If every (distributed) client does this, synchronization delays can lead to a session being modified on both instances. In this case, synchronization fails because sessions are protected by an optimistic locking scheme. Synchronous synchronization (setting the synchronization delay to 0 seconds) solves this problem, but it increases overhead and forces callers to wait a little longer.
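The conflict described here can be reproduced with a generic optimistic locking model based on a version counter. The classes below (`Instance`, `SyncMessage`, `SyncConflict`) are assumptions made for illustration. The source does not describe how nevisAuth implements its locking scheme internally.

```python
from dataclasses import dataclass


class SyncConflict(Exception):
    """Raised when a replicated update is based on an outdated session version."""


@dataclass
class SyncMessage:
    session_id: str
    base_version: int  # version the sender started from
    new_version: int   # version after the sender's change
    attributes: dict


class Instance:
    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> (version, attributes)

    def modify(self, session_id, **changes):
        """Apply a local change and return the message to replicate later."""
        version, attributes = self.sessions.get(session_id, (0, {}))
        updated = {**attributes, **changes}
        self.sessions[session_id] = (version + 1, updated)
        return SyncMessage(session_id, version, version + 1, dict(updated))

    def receive(self, message):
        """Accept a replicated change only if the local copy is unchanged."""
        version, _ = self.sessions.get(message.session_id, (0, {}))
        if version != message.base_version:
            raise SyncConflict(
                f"{self.name}: local version {version}, "
                f"expected {message.base_version}"
            )
        self.sessions[message.session_id] = (
            message.new_version,
            dict(message.attributes),
        )
```

If both instances call `modify` on the same session before either message is delivered, the later `receive` raises `SyncConflict`. Delivering each message immediately after the change, which corresponds to a synchronization delay of 0, avoids the conflict in this model at the cost of an extra step per request.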
Global session store
To be able to synchronize sessions between nevisAuth instances, nevisAuth must store sessions in a JDBC database. This allows session sharing within nevisAuth clusters of arbitrary sizes. See Session management for details on how to configure the JDBC global session store.