java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available
-
Recently we've been seeing this error pop up and cause some of our API calls to fail. We're on an Essentials Hosted plan, currently on v1.46.0.
2023-10-20 08:07:06.942 PM ERROR io.fusionauth.api.service.system.DefaultAsyncTaskManager - An exception occurred while managing an async task.
org.apache.ibatis.exceptions.PersistenceException:
### Error querying database.  Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2000ms.
### The error may exist in io/fusionauth/api/domain/LockMapper.java (best guess)
### The error may involve io.fusionauth.api.domain.LockMapper.lock
### The error occurred while executing a query
### Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2000ms.
    at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
I saw a related issue that mentioned an IOPS limit, but I haven't seen that metric anywhere. Is this something about how we're using FusionAuth, or just a system issue? We've been ramping up more users onto the system recently, along with more API calls, so I'm not sure whether it's something we're doing, but I wanted to make sure this doesn't become a larger issue as we continue to scale.
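For context, the "Connection is not available, request timed out after 2000ms" message means a caller waited the full 2000 ms for a pooled database connection and none was released in time, so anything that keeps connections checked out longer makes this more likely under load. The toy model below illustrates that with a fixed-size pool built on a java.util.concurrent.Semaphore. It is not HikariCP's or FusionAuth's implementation; the PoolExhaustionDemo class, the pool size, the 300 ms wait and the hold times are all assumptions chosen to make the effect visible.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a fixed-size connection pool (hypothetical; NOT HikariCP or FusionAuth code).
class PoolExhaustionDemo {
    private final Semaphore connections;
    private final long acquireTimeoutMs;

    PoolExhaustionDemo(int poolSize, long acquireTimeoutMs) {
        this.connections = new Semaphore(poolSize, true); // fair: waiters are served in order
        this.acquireTimeoutMs = acquireTimeoutMs;
    }

    // Holds one "connection" while `work` runs; returns false if none was freed in time,
    // the analogue of "Connection is not available, request timed out after 2000ms".
    boolean withConnection(Runnable work) throws InterruptedException {
        if (!connections.tryAcquire(acquireTimeoutMs, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            work.run();
            return true;
        } finally {
            connections.release();
        }
    }

    // Fires `requests` concurrent requests that each keep a connection for `holdMs`
    // and returns how many of them timed out waiting for the pool.
    static int countTimeouts(int poolSize, long acquireTimeoutMs, int requests, long holdMs) {
        PoolExhaustionDemo pool = new PoolExhaustionDemo(poolSize, acquireTimeoutMs);
        AtomicInteger timeouts = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        ExecutorService executor = Executors.newFixedThreadPool(requests);
        try {
            for (int i = 0; i < requests; i++) {
                executor.submit(() -> {
                    try {
                        if (!pool.withConnection(() -> sleepQuietly(holdMs))) {
                            timeouts.incrementAndGet();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for requests", e);
        } finally {
            executor.shutdown();
        }
        return timeouts.get();
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Each request holds its connection briefly (20 ms): all 6 get served.
        System.out.println("fast: " + countTimeouts(2, 300, 6, 20) + " timeouts");
        // Each request holds its connection for 1 s (e.g. waiting on a slow external call):
        // the 4 requests queued behind the first 2 give up after 300 ms.
        System.out.println("slow: " + countTimeouts(2, 300, 6, 1000) + " timeouts");
    }
}
```

With short holds every request should be served; with one-second holds, four of the six requests should give up, which is the same shape of failure as the log above even though total traffic is identical.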
-
@mark-shapiro, just wanted you to be aware that since you have the Essentials plan, you can get support directly through the FusionAuth Account Portal if this is time-sensitive for you.
-
@mark-robustelli It's not critical today, though I logged into the console and I don't see a support option (I might have to bug our admin to go that route).
I also see Essentials listed under Self-Hosted, so now I'm questioning life. I'll check with our DevOps team as well on Monday. If there's anything to look at on our side, let me know.
-
@mark-shapiro Please do check with your team. There are two different things: the licensing and the hosting. It is possible to have either Business Hosting or Self-Hosting with the Essentials plan.
-
@mark-robustelli I verified using the domain name itself:

nslookup login.mycompany.com
Server:   8.8.8.8
Address:  8.8.8.8#53

Non-authoritative answer:
login.mycompany.com  canonical name = mycompany-prod.fusionauth.io.

So that should confirm we're using FusionAuth hosting (which is what I had thought).
-
@mark-shapiro, thanks for clearing up the hosting. It sounds like you have the Business Hosting plan. Now, which FusionAuth license (this is different from the hosting) do you have? In an earlier post it sounded like you had the Essentials plan. Can you please confirm that with your admin? (It should be one of the following: Community, Starter, Essentials, or Enterprise.)
-
@mark-robustelli I opened a support ticket (via email) and got this response:
If you have external calls in your integration, you will want to ensure they respond quickly.

Lambdas: If you are calling FusionAuth APIs in a lambda, ensure a connection over port 9012 (as opposed to port 9011). Any other external HTTP calls (to your own endpoints) made with HTTPConnect should also return quickly to ensure optimal performance.

Connectors: If you have a connector, FusionAuth will hold things in flight (database connections, in-memory information, etc.) while we wait for your connector to return an authentication response and log the user in. Ideally, your connector would have a read timeout of 1ms and a connect timeout of 2ms or less. Higher values mean FusionAuth will have fewer resources (database connections, etc.) available to service incoming login requests, as older login requests are still in flight waiting for the connector to return.

Webhooks: If you have any webhooks, these should also return quickly to optimize performance. Any SocketTimeoutException (read or connect) in the Event Log would indicate a slower webhook integration.

Adjust Periodic Tasks: If you have any periodic tasks running against your deployment (perhaps to synchronize user data, application data, etc. by calling our APIs), you will want to write back-off logic for when the system shows a heavy load (monitoring documentation).

Turn off logging in production: This change will be less impactful but will help nevertheless. Logging should be used when there is an issue with SMTP/email, lambdas, connectors, users, etc., but turned off in production when not needed. Logs are output to System > Event Log. The debug enabled toggle throughout the UI (and API) indicates whether this logging will occur.
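The back-off logic suggested for periodic tasks could look something like the minimal sketch below. It assumes the sync job wraps each FusionAuth API call so that an overloaded response (for example a 5xx or a timeout) surfaces as an unchecked exception. The Backoff class, its method names and the delay values are hypothetical illustrations, not part of any FusionAuth client library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical retry helper with exponential back-off for periodic sync jobs.
class Backoff {
    // Delay before retry number `attempt` (1-based): base * 2^(attempt-1), capped at maxDelayMs.
    static long delayFor(int attempt, long baseDelayMs, long maxDelayMs) {
        long delay = baseDelayMs << Math.min(attempt - 1, 30); // clamp the shift to avoid overflow
        return Math.min(delay, maxDelayMs);
    }

    // Runs `call`, retrying on RuntimeException up to maxAttempts in total.
    // `sleeper` performs the wait, which lets tests record delays instead of sleeping.
    static <T> T withBackoff(Supplier<T> call, int maxAttempts, long baseDelayMs,
                             long maxDelayMs, LongConsumer sleeper) {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last failure to the scheduler
                }
                sleeper.accept(delayFor(attempt, baseDelayMs, maxDelayMs));
            }
        }
    }

    // Real sleeper for production use: Backoff::sleep.
    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during back-off", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        List<Long> waits = new ArrayList<>();
        // Simulated API call that reports overload twice, then succeeds.
        String result = withBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated overload");
            }
            return "synced";
        }, 5, 500, 30_000, waits::add);
        System.out.println(result + " after " + calls[0] + " calls, waits " + waits);
    }
}
```

Running main should print "synced after 3 calls, waits [500, 1000]". Real jobs often add random jitter to each delay so that many workers don't retry in lockstep; it is left out here to keep the delays deterministic.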
The big ones were that we were making calls to the API in a lambda, so we switched the port as mentioned, and that we had left debug logging on in a lot of places, which we have now disabled. We also validated that our webhooks are all running pretty fast (median time is 1-2ms) and that our connector is also fairly fast, though slower, at a 20-30ms median.
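One common way to keep webhook response times low as load grows is to acknowledge the request immediately and do the real work asynchronously. A minimal sketch using only the JDK's built-in HTTP server follows. The FastWebhookReceiver class, the /fusionauth-webhook path and the in-memory queue are assumptions for illustration; an in-memory queue loses events if the process dies, so a production setup would likely hand off to a durable queue instead.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical webhook receiver: acknowledge fast, process later.
class FastWebhookReceiver {
    final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final HttpServer server;

    FastWebhookReceiver(int port) {
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/fusionauth-webhook", exchange -> {
            try (InputStream body = exchange.getRequestBody()) {
                // Only read and enqueue here: no database writes or outbound HTTP calls.
                pending.offer(new String(body.readAllBytes(), StandardCharsets.UTF_8));
            }
            exchange.sendResponseHeaders(200, -1); // 200 with an empty body
            exchange.close();
        });
        server.start();
    }

    int port() { return server.getAddress().getPort(); }

    void stop() { server.stop(0); }

    // Drains queued events on a background daemon thread, off the request path.
    void startWorker(Consumer<String> processor) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    processor.accept(pending.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly when interrupted
            }
        }, "webhook-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Sends a JSON POST the way a webhook sender would and returns the HTTP status code.
    static int postJson(int port, String json) {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + port + "/fusionauth-webhook"))
            .version(HttpClient.Version.HTTP_1_1)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
        try {
            return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding())
                .statusCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FastWebhookReceiver receiver = new FastWebhookReceiver(0); // 0 = any free port
        receiver.startWorker(event -> System.out.println("processing " + event));
        int status = postJson(receiver.port(), "{\"event\":{\"type\":\"user.create\"}}");
        System.out.println("status " + status);
        Thread.sleep(200); // demo only: give the worker a moment before shutting down
        receiver.stop();
    }
}
```

The handler's cost is just reading the body and one queue insert, so the sender's connection is released almost immediately regardless of how long the downstream processing takes.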