Management Service
As of nevisIDM 2.73.x, nevisIDM provides a management service that allows operators to monitor the health of a nevisIDM instance. The service also simplifies integrating and operating nevisIDM in Kubernetes deployments using nevisAdmin.
By default, the management service is exposed on port 8998. For security reasons, it only listens on localhost. You can adjust this with the management.server.* parameters of the nevisidm-prod.properties configuration file. For more information, see Configuration files.
Securing the management service
Note that the management service is only accessible over HTTP. Additional protection is not required, as no sensitive information is propagated in the management service. Additional authentication or tokens are not required either. However, you can modify the network interface and the port the service listens on. It is recommended that you limit access as tightly as possible using firewalls.
Liveness and readiness probes
The goal of the management service is to monitor the health of a nevisIDM instance. nevisIDM provides two separate endpoints, which can be used to check:
- the liveness of the instance, to indicate if the process should be restarted; and
- the readiness of the instance, to indicate whether the instance is ready to accept requests.
You can use these endpoints with any monitoring tool that supports HTTP calls. They also make it easier to integrate nevisIDM in a Kubernetes orchestration environment.
Kubernetes: Liveness and Readiness Probes
For more information on Kubernetes probes, see the Kubernetes documentation: http://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
Liveness /liveness
The liveness probe indicates whether the process should be restarted. If the liveness probe fails, the application is broken and will not recover by itself.
curl --max-time 1 http://localhost:8998/liveness
If the application is alive, the probe will return a JSON response with status code 200:
{"status": "UP"}
If the application is broken, the probe will time out or return a non-200 response.
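The same check can be scripted for monitoring tools that do not call curl directly. The following is a minimal sketch, assuming the response body is valid JSON with a top-level status key; the helper names is_alive and check_liveness are illustrative and not part of nevisIDM. Note that the timeout of urllib applies to each socket operation, so it only approximates curl's --max-time.

```python
import json
import urllib.error
import urllib.request

# Default address from this page; adjust if management.server.* was changed.
LIVENESS_URL = "http://localhost:8998/liveness"


def is_alive(status_code: int, body: str) -> bool:
    """Return True only for HTTP 200 with a JSON body whose status is "UP"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


def check_liveness(url: str = LIVENESS_URL, timeout: float = 1.0) -> bool:
    """Call the liveness endpoint; errors and timeouts count as a failed probe."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_alive(response.status, response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        # Covers non-200 responses (HTTPError), refused connections and timeouts.
        return False
```

Keeping the decision logic in is_alive, separate from the HTTP call, makes it possible to test the rule without a running instance.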
Readiness /health
The readiness probe indicates whether the nevisIDM application is up and running and in a good state. If the readiness probe fails, the application is temporarily not working. However, the application may still be able to recover, because the cause of the failure might be temporary, for example a network problem between nevisIDM and the database.
The readiness probe monitors disk space, database connectivity and application server state. If one of them fails, the probe will fail.
curl --max-time 1 http://localhost:8998/health
If the application is up and running, the probe will return status code 200 and more details in the response.
{"status": "UP", "details": {
"databaseConnection": { "status": "UP", "details": { "max": 100, "total": 30, "inactive": 20, "active": 10, "threshold": 10 } },
"databaseVersion": { "status": "UP", "details": { "version": "3.1" } },
"diskSpace": { "status": "UP", "details": { ... } },
"appServer": { "status": "UP", "details": { ... } }
} }
// JSON output is reformatted.
If the application is not up and running, the probe will time out or return a non-200 response with additional details on the failing probes (see the Keys in JSON section).
{"status": "DOWN", "details": {
"databaseConnection": { "status": "DOWN", "details": { "max": 100, "total": 1, "inactive": 1, "active": 0, "threshold": 5 } },
"databaseVersion": { "status": "DOWN", "details": { "error": ... } },
"diskSpace": { "status": "UP", "details": { ... } },
"appServer": { "status": "DOWN", "details": { ... } }
} }
// JSON output is reformatted.
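When the readiness probe fails, it is often useful to extract only the components that are down. The following sketch assumes the response layout shown in the samples on this page (a top-level details object whose entries each carry a status); the function name failing_components is illustrative. In the sample body, the elided details are replaced by empty objects and the databaseVersion entry is left out, because its error value is not shown here.

```python
import json
from typing import List


def failing_components(body: str) -> List[str]:
    """Return the names of all components whose status is not "UP"."""
    payload = json.loads(body)
    details = payload.get("details", {})
    return [
        name
        for name, component in details.items()
        if component.get("status") != "UP"
    ]


# Shortened version of the DOWN sample above; elided parts are left empty.
sample_body = """
{"status": "DOWN", "details": {
  "databaseConnection": {"status": "DOWN", "details":
    {"max": 100, "total": 1, "inactive": 1, "active": 0, "threshold": 5}},
  "diskSpace": {"status": "UP", "details": {}},
  "appServer": {"status": "DOWN", "details": {}}
}}
"""

print(failing_components(sample_body))
```

For the shortened sample, the function reports databaseConnection and appServer, in the order in which they appear in the response.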
The following section lists the information that is additionally available on the readiness endpoint.
Keys in JSON
databaseConnection.details
- .max (Number): Maximum number of connections in the database connection pool. This value does not vary at runtime; it comes from the configuration.
- .total (Number): Total number of connections made from the connection pool to the database. The sum of active and inactive connections.
- .inactive (Number): Number of currently inactive database connections in the pool.
- .active (Number): Number of currently active database connections in the pool, that is, total minus inactive connections. This number is expected to be equal to or greater than the threshold (the number set in .threshold).
- .threshold (Number): Minimum pool size in the database connection pool. If the total number of connections is smaller than this threshold, the database connection is considered down. The active connections are reported in .active.
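The relationships between these values can be checked with a few lines of code. This is a sketch of one reading of the descriptions above, not the check nevisIDM performs internally; the function name evaluate_pool and the keys of the returned dictionary are illustrative.

```python
def evaluate_pool(details: dict) -> dict:
    """Evaluate the documented relationships between the pool values."""
    return {
        # .active is described as total minus inactive connections.
        "active_is_consistent": details["active"]
        == details["total"] - details["inactive"],
        # The pool is described as down when total is below the threshold.
        "total_meets_threshold": details["total"] >= details["threshold"],
        # The pool can never hold more connections than the configured maximum.
        "within_max": details["total"] <= details["max"],
    }


# Values taken from the UP and DOWN readiness samples on this page.
up_sample = {"max": 100, "total": 30, "inactive": 20, "active": 10, "threshold": 10}
down_sample = {"max": 100, "total": 1, "inactive": 1, "active": 0, "threshold": 5}

print(evaluate_pool(up_sample))
print(evaluate_pool(down_sample))
```

With the sample values, both pools are internally consistent, but only the UP sample has a total that reaches its threshold.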
databaseVersion.details
- .version (Number): The version of the nevisIDM database schema.
diskSpace.details
- .total (Number of bytes): The total number of bytes of the disk where server.log.dir resides.
- .threshold (Number of bytes): When the number of free bytes is below this threshold, the disk space is considered insufficient. The threshold is fixed at 10 MB.
- .free (Number of bytes): The number of free bytes on the disk where server.log.dir resides. At least 10 MB of free disk space is expected.
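The disk space rule can be expressed as a single comparison. The sketch below reads the threshold from the response instead of hard-coding 10 MB, so it makes no assumption about how the service converts megabytes to bytes; the function name disk_space_ok and the byte values in the example are hypothetical.

```python
def disk_space_ok(details: dict) -> bool:
    """Return True if the free bytes are not below the reported threshold."""
    return details["free"] >= details["threshold"]


# Hypothetical values for illustration only, not taken from a real instance.
example = {"total": 50_000_000_000, "free": 5_000_000, "threshold": 10_000_000}

print(disk_space_ok(example))
```

In this hypothetical example, the free space is below the threshold, so the check reports insufficient disk space.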
appServer.details
- .state (String): The state of the application server. If the application server is not ready to process requests, it is considered down.