Connection closed errors with managed Postgresql on DigitalOcean
-
Hi
We're using FA with a managed PostgreSQL instance on DigitalOcean. From time to time we get errors 500 from FA and then in the FA logs we can see that the problem is caused by the closed database connections (full traceback below).
We've tried database configuration with pgbouncer and without pgbouncer (seems to be better), decreased maxLifetime to 60000, made sure we have enough connections available to the managed database (only about 20 from 47 are used) but the issue still appears.
Is there anyone using DigitalOcean having/not having such issues? Any clues what else I can do?
Sample exception:
2021-08-02 2:46:09.245 PM WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@22d846c4 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2021-08-02 2:46:09.245 PM ERROR com.inversoft.scheduler.LogAndRetainFailureHandler - The scheduled service [class io.fusionauth.api.service.cache.LambdaCacheLoader] failed but will be re-run. org.apache.ibatis.exceptions.PersistenceException: ### Error querying database. Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 3672ms. ### The error may exist in io/fusionauth/api/domain/LambdaMapper.java (best guess) ### The error may involve io.fusionauth.api.domain.LambdaMapper.retrieveAll ### The error occurred while executing a query ### Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 3672ms. at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:149) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:140) at jdk.internal.reflect.GeneratedMethodAccessor42.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at org.apache.ibatis.session.SqlSessionManager$SqlSessionInterceptor.invoke(SqlSessionManager.java:357) at com.sun.proxy.$Proxy51.selectList(Unknown Source) at org.apache.ibatis.session.SqlSessionManager.selectList(SqlSessionManager.java:206) at org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:147) at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:80) at org.apache.ibatis.binding.MapperProxy$PlainMethodInvoker.invoke(MapperProxy.java:152) at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:85) at com.sun.proxy.$Proxy75.retrieveAll(Unknown Source) at io.fusionauth.api.service.system.DefaultLambdaService.retrieveAll(DefaultLambdaService.java:81) at io.fusionauth.api.service.cache.LambdaCacheLoader.load(LambdaCacheLoader.java:46) at io.fusionauth.api.service.cache.LambdaCacheLoader.run(LambdaCacheLoader.java:56) at com.inversoft.scheduler.InjectableRunnable.run(InjectableRunnable.java:37) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) at java.base/java.lang.Thread.run(Thread.java:832) Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 3672ms. at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:697) at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:196) at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:161) at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100) at org.apache.ibatis.transaction.jdbc.JdbcTransaction.openConnection(JdbcTransaction.java:139) at org.apache.ibatis.transaction.jdbc.JdbcTransaction.getConnection(JdbcTransaction.java:61) at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:337) at org.apache.ibatis.executor.SimpleExecutor.prepareStatement(SimpleExecutor.java:86) at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:62) at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:325) at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156) at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109) at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:89) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:147) ... 22 common frames omitted Caused by: org.postgresql.util.PSQLException: This connection has been closed. at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:865) at org.postgresql.jdbc.PgConnection.setNetworkTimeout(PgConnection.java:1647) at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:552) at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:168) at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:185) ... 34 common frames omitted
-
This might be related to this open, unreproducible issue.
https://github.com/FusionAuth/fusionauth-issues/issues/1266Essentially the managed database must communicate well with Hikari (used by FusionAuth). If it closes connections without properly informing Hikari, unexpected behavior can result. Feel free to update the issue with any comments specific to your use case as well. You can try additional DB tuning to see if you can overcome this error mode.
More information on HikariCP below:
https://github.com/brettwooldridge/HikariCPIf we discover anything else on our end, we will be sure to update and report back to this issue.
Thanks,
Josh -
Hi!
We have the same problem FA work with AWS Proxy PosgreSQL manage DB. The last case was with the version FA 1.29.2
Please advise how to debug this problem from the FA side.
Sample exception:
-
@joshua thanks for pointing to these issues. I'm not sure if this is related but who knows. According to what aivanow said it seems this might be issue with all managed databases.
I've tried tuning some values, changed maxlifetime for HikariCP to 30000, changed minimum pool size to 2 connections, but the errors are still appearing.
My FA is 1.27.2
-
@maciej-wisniowski said in Connection closed errors with managed Postgresql on DigitalOcean:
it seems this might be issue with all managed databases.
Essentially the managed database must communicate well with Hikari (used by FusionAuth). If the managed DB closes connections without informing Hikari, then errors will result.
Thanks,
Josh -
@joshua @maciej-wisniowski
Hi! We continue to have these errors ( Error querying database. Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2000ms.)
Please, can you recommend something for debugging this? -
My advice would be further database tuning and configuration management for any managed database that is not performing as expected. You might want to try and shorten the max life even more on the connection pools in FusionAuth (HikariCP) (Think 5 seconds for a connection pool to be active).
For instance, you can review the Hikari Connection Pool Timeouts (FusionAuth side) and set them to a shorter duration than the "timeouts" on DigitalOcean's managed database (in other words, if Hikiri keeps DB connections open for 10 mins and DigitalOcean's managed DB kills DB connections after 8mins, then you will want to adjust Hikari connections to 7mins. You will incur a performance cost, however (restarting and killing connection pools is costly from a performance perspective).
Related documentation is below.
https://fusionauth.io/docs/v1/tech/reference/configuration/
database.connection-timeout [Integer] Defaults to 2,000
The number of milliseconds that FusionAuth will attempt to connect to the database before failing.database.idle-timeout [Integer] Defaults to 120,000
The number of milliseconds that FusionAuth will leave an idle database connection in its connection pool before recreating it.Also, as referenced above -
made sure we have enough connections available to the managed database (only about 20 from 47 are used) but the issue still appears.
My understanding is that more connections does not guarantee a better outcome.
If anything else occurs to me, I will post back. I hope this helps!
Thanks,
Josh -
@joshua Hello!
Thanks for advice!
We use aws rds service, not digital ocean.
Our fusionauth insnacess works with the rds base through a rds proxy. Rds proxy has a config parametr: Idle client connection timeout and it 30 min now.
I understand correctly that we need to set it less than the value DATABASE_MAX_LIFETIME (now is 240 sec)?
-
An RDS proxy adds a layer of complexity that I do not have much experience with personally, but I understand the purpose of why you would add it to a DB service.
I would recommend that you review the HikariCP documentation as it outlines the different configuration options available and their impact. Some of the configuration options within HikariCP are interconnected (ie - changing one impacts others). The configuration options here roughly translate to the configuration options that are presented to you in the
fusionauth.properties
file.https://github.com/brettwooldridge/HikariCP#gear-configuration-knobs-baby
What are you are suggesting seems reasonable ( at quick glance ), but as the architecture for DB's/applications can be complex, there may be other points to consider. You are welcome to experiment. The most important paradigm to keep in mind is that if HikariCP is managing your DB connection pool, but you have a layer (proxy, DB, or otherwise) that is closing connections and not telling HikariCP, you may run into technical issues or performance impacts.
Thanks,
Josh -
Thanks @joshua !
Could you tell how to monitor the HikariCP connection pool? Fusionauth launched in docker. -
@aivanov To my understanding, this configuration is not directly exposed via API.
You can however listen in on the status and Prometheus endpoints of FusionAuth.
https://fusionauth.io/docs/v1/tech/installation-guide/monitor/
Thanks,
Josh
FusionAuth -
Thanks @joshua !
-