Interpreting FusionAuth's Prometheus metrics
-
Hello everyone,
I'm setting up a Grafana dashboard for our FusionAuth instances and I'm confused about how to interpret several key metrics, particularly those exported as Dropwizard Histograms/Prometheus Summaries.
I would be grateful if anyone who has successfully instrumented and analyzed these metrics could share their insights, especially concerning the units and the meaning of the quantile values.
One group of metrics that confuses me is
Database_primary_pool_*
One example scrape showed
Database_primary_pool_MaxConnections: 20.0
but
Database_primary_pool_Usage{quantile="0.999"}: 45.0
and at some point I saw a peak of 1300.
Can I assume that in this case the quantile value is a time, specifically the milliseconds spent using a primary pool connection?
Is there any documentation on how to interpret all the metrics exposed by
/api/prometheus/metrics/?
I found this page https://fusionauth.io/docs/operate/monitor/prometheus but there's no reference or documentation for the metrics there.
Thank you in advance for any hint.
_
Fabio -
@fabio-venturi I am not familiar with Prometheus, but I asked the AI on the FusionAuth site and it came back with this:
Database_primary_pool_Usage is a Prometheus metric exposed by FusionAuth which reports how much of the primary database connection pool is currently in use. It lets you see whether your HikariCP pool is close to exhaustion and is useful for capacity and health monitoring. [Monitor Prometheus]
In the Prometheus UI you can graph it by entering
Database_primary_pool_Usage in the expression box and executing the query. [Monitor Prometheus]
It said it based the answer on the page you found, but I don't know enough to say for certain. Does this make sense to you?
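One way to sanity-check the "how much of the pool is in use" reading against the numbers in the question is to parse a scrape and flag any Usage quantile that is larger than MaxConnections. The following is only a minimal sketch: the sample text is the two quoted values rewritten in the standard Prometheus text exposition format (the exact layout of a real FusionAuth scrape is an assumption), and `parse_scrape` and `exceeds_pool_size` are hypothetical helper names, not part of FusionAuth or any Prometheus library.

```python
import re

# Sample lines in the Prometheus text exposition format.
# The two values are the ones quoted in the question; the surrounding
# layout (TYPE comments, spacing) is assumed, not copied from a real scrape.
SAMPLE_SCRAPE = """\
# TYPE Database_primary_pool_MaxConnections gauge
Database_primary_pool_MaxConnections 20.0
# TYPE Database_primary_pool_Usage summary
Database_primary_pool_Usage{quantile="0.999"} 45.0
"""

# metric_name, optional {labels}, then the sample value
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_scrape(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def exceeds_pool_size(samples, prefix="Database_primary_pool"):
    """List the Usage quantiles whose value is larger than MaxConnections."""
    max_conn = next(v for n, _, v in samples if n == prefix + "_MaxConnections")
    return [
        (labels["quantile"], value)
        for name, labels, value in samples
        if name == prefix + "_Usage" and "quantile" in labels and value > max_conn
    ]


samples = parse_scrape(SAMPLE_SCRAPE)
print(exceeds_pool_size(samples))  # [('0.999', 45.0)]
```

A Usage value of 45 (or 1300) with a maximum of 20 connections cannot be a count of connections in use, so the check at least suggests that the summary measures something else, such as a duration. It does not establish the unit by itself.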