FusionAuth keeps restarting after upgrade from version 1.36.6 -> 1.38.1 even after successful migrations
-
My application usually starts up with no issues, but when I use the admin console I notice performance issues that eventually lead to the deployment crashing and restarting. Along with this, it uses up the CPU of the Postgres instance.
The application has been deployed via K8s and these are the configurations/resources given to it
resources:
  limits:
    cpu: 1000m
    memory: 3Gi
  requests:
    cpu: 500m
    memory: 1Gi
The app's memory and CPU usage do spike, but they always stay within the limits, even right up to the time of the restart/crash.
I initially thought the issue was due to an incomplete migration in 1.37 run through silentMode, but even after running the migration manually the issue persists.
In my container logs I can see
2025-03-04 11:50:26.118 AM INFO com.inversoft.jdbc.hikari.DataSourceProvider - Connecting to PostgreSQL database at [jdbc:postgresql://{postgres_db_url}]
2025-03-04 11:50:26.120 AM WARN com.zaxxer.hikari.HikariConfig - HikariPool-1 - idleTimeout has been set but has no effect because the pool is operating as a fixed size pool.
2025-03-04 11:50:26.122 AM INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Starting...
2025-03-04 11:50:26.851 AM INFO com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Added connection org.postgresql.jdbc.PgConnection@557b6a37
2025-03-04 11:50:38.245 AM INFO io.fusionauth.api.service.system.NodeService - Node [27af95a4-74ca-4d4f-92dc-a06306e7a597] promoted to master at [2025-03-04T11:50:38.245353677Z], the previous master Node [80222721-88f3-4b24-85f4-3af56b57732d] has been shutdown or removed
2025-03-04 11:50:39.164 AM INFO io.fusionauth.app.primeframework.FusionHTTPContextAuthSetup - Initializing the FusionAuth HTTP Context.
2025-03-04 11:50:39.428 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Starting FusionAuth HTTP server on port [9011]
2025-03-04 11:50:39.748 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Starting FusionAuth HTTP loopback server on port [9012]
2025-03-04 11:57:51.749 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Shutting down the Prime HTTP server [/0.0.0.0:9011]
2025-03-04 11:57:51.750 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Shutting down the Prime HTTP server [/0.0.0.0:9012]
2025-03-04 11:57:51.750 AM INFO org.primeframework.mvc.PrimeMVCRequestHandler - Shutting down Prime MVC
2025-03-04 11:57:51.750 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Gracefully closing the server resources
2025-03-04 11:57:51.751 AM ERROR org.primeframework.mvc.guice.GuiceBootstrap - Unable to shutdown Closeable [Key[type=org.apache.ibatis.session.SqlSessionManager, annotation=[none]]]
-
@prithwhat It looks like the resources are well above the minimum, so I'm not sure it has to do with that. I will search around a bit to see if I can learn anything from the logs you provided.
I assume you have seen our docs on Deploying FusionAuth to Kubernetes, but it may make sense to take some time to review and see if anything jumps out at you.
-
@prithwhat The line
2025-03-04 11:50:38.245 AM INFO io.fusionauth.api.service.system.NodeService - Node [27af95a4-74ca-4d4f-92dc-a06306e7a597] promoted to master at [2025-03-04T11:50:38.245353677Z], the previous master Node [80222721-88f3-4b24-85f4-3af56b57732d] has been shutdown or removed
is interesting. Do you have any sort of health checks that restart a node if it can't be accessed? I did see some info about someone raising the timeout to 60 seconds instead of 30, and that helped with their issue.
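For reference, a more lenient Kubernetes liveness probe might look roughly like the sketch below. This is only an illustration under stated assumptions: the thread does not say which timeout was raised, so the sketch assumes it is the probe's `timeoutSeconds`. The `/api/status` path and the other values are placeholders rather than the chart's defaults, and port 9011 is taken from the HTTP server line in the log above.

```yaml
# Illustrative sketch only; these values are assumptions, not chart defaults.
livenessProbe:
  httpGet:
    path: /api/status    # assumption: FusionAuth status endpoint; your chart may probe a different path
    port: 9011           # HTTP port reported in the startup log
  periodSeconds: 60      # how often the kubelet runs the probe
  timeoutSeconds: 60     # assumption: the timeout raised from 30 to 60 seconds
  failureThreshold: 3    # consecutive failures before the container is restarted
```

With settings like these, a slow response while the database is under load is less likely to be counted as a failed probe, so the kubelet is less likely to restart the pod while it is merely busy.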
-
@mark-robustelli Hey, first of all, thanks for taking the time to check this.
The chart does have a default liveness, readiness and startup probe specified
livenessProbe:
  httpGet:
    path: /
    port: http
  failureThreshold: 3
  periodSeconds: 30
  timeoutSeconds: 5
# readinessProbe -- Configures a readinessProbe to ensure fusionauth is ready for requests
readinessProbe:
  httpGet:
    path: /
    port: http
  failureThreshold: 5
  timeoutSeconds: 5
# startupProbe -- Configures a startupProbe to ensure fusionauth has finished starting up
startupProbe:
  httpGet:
    path: /
    port: http
  failureThreshold: 20
  periodSeconds: 10
  timeoutSeconds: 5
But I'm not sure this is the issue, because in that case the app would not start in the first place. If left unused it stays healthy; only when I start using the admin console does it start to shut down.
-
@prithwhat OK, so in the log you posted above
2025-03-04 11:50:39.748 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Starting FusionAuth HTTP loopback server on port [9012]
2025-03-04 11:57:51.749 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Shutting down the Prime HTTP server [/0.0.0.0:9011]
Was it about 7 minutes between the time you spun up FusionAuth and when you tried to log in to the admin UI?