FusionAuth keeps restarting after upgrade from version 1.36.6 -> 1.38.1 even after successful migrations

prithwhat

My application usually starts up with no issues, but when I use the admin console I notice performance issues that eventually lead to the deployment crashing and restarting. Along with this, it uses up the CPU of the Postgres instance.

The application has been deployed via K8s and these are the configurations/resources given to it

    resources:
      limits:
        cpu: 1000m
        memory: 3Gi
      requests:
        cpu: 500m
        memory: 1Gi

The app's memory and CPU do spike, but they always stay within the limits, even right before a restart/crash.

I initially thought the issue was due to an incomplete migration in 1.37 run through silentMode, but the issue persists even after running the migration manually.

In my container logs I can see:

2025-03-04 11:50:26.118 AM INFO com.inversoft.jdbc.hikari.DataSourceProvider - Connecting to PostgreSQL database at [jdbc:postgresql://{postgres_db_url}]

2025-03-04 11:50:26.120 AM WARN com.zaxxer.hikari.HikariConfig - HikariPool-1 - idleTimeout has been set but has no effect because the pool is operating as a fixed size pool.

2025-03-04 11:50:26.122 AM INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Starting...

2025-03-04 11:50:26.851 AM INFO com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Added connection org.postgresql.jdbc.PgConnection@557b6a37

2025-03-04 11:50:38.245 AM INFO io.fusionauth.api.service.system.NodeService - Node [27af95a4-74ca-4d4f-92dc-a06306e7a597] promoted to master at [2025-03-04T11:50:38.245353677Z], the previous master Node [80222721-88f3-4b24-85f4-3af56b57732d] has been shutdown or removed

2025-03-04 11:50:39.164 AM INFO io.fusionauth.app.primeframework.FusionHTTPContextAuthSetup - Initializing the FusionAuth HTTP Context.

2025-03-04 11:50:39.428 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Starting FusionAuth HTTP server on port [9011]

2025-03-04 11:50:39.748 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Starting FusionAuth HTTP loopback server on port [9012]

2025-03-04 11:57:51.749 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Shutting down the Prime HTTP server [/0.0.0.0:9011]

2025-03-04 11:57:51.750 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Shutting down the Prime HTTP server [/0.0.0.0:9012]

2025-03-04 11:57:51.750 AM INFO org.primeframework.mvc.PrimeMVCRequestHandler - Shutting down Prime MVC

2025-03-04 11:57:51.750 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Gracefully closing the server resources

2025-03-04 11:57:51.751 AM ERROR org.primeframework.mvc.guice.GuiceBootstrap - Unable to shutdown Closeable [Key[type=org.apache.ibatis.session.SqlSessionManager, annotation=[none]]]

mark.robustelli

@prithwhat It looks like the resources are well above the minimum, so I'm not sure it has to do with that. I will search around a bit to see if I can learn anything from the logs you provided.

I assume you have seen our docs on Deploying FusionAuth to Kubernetes, but it may make sense to take some time to review and see if anything jumps out at you.

mark.robustelli

@prithwhat The line

2025-03-04 11:50:38.245 AM INFO io.fusionauth.api.service.system.NodeService - Node [27af95a4-74ca-4d4f-92dc-a06306e7a597] promoted to master at [2025-03-04T11:50:38.245353677Z], the previous master Node [80222721-88f3-4b24-85f4-3af56b57732d] has been shutdown or removed

is interesting. Do you have any sort of health checks that restart a node if it can't be accessed? I did see a report where raising the timeout from 30 to 60 seconds helped with a similar issue.

prithwhat

@mark-robustelli Hey, first of all, thanks for taking the time to check this.

The chart does have default liveness, readiness, and startup probes specified:

    livenessProbe:
      httpGet:
        path: /
        port: http
      failureThreshold: 3
      periodSeconds: 30
      timeoutSeconds: 5

    # readinessProbe -- Configures a readinessProbe to ensure fusionauth is ready for requests
    readinessProbe:
      httpGet:
        path: /
        port: http
      failureThreshold: 5
      timeoutSeconds: 5

    # startupProbe -- Configures a startupProbe to ensure fusionauth has finished starting up
    startupProbe:
      httpGet:
        path: /
        port: http
      failureThreshold: 20
      periodSeconds: 10
      timeoutSeconds: 5

But I'm not sure this is the issue, because in that case the app would not start in the first place. If left unused it stays healthy; it only starts to shut down once I start using the admin console.
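For reference, here is a rough sketch of how long each of the probes above tolerates consecutive failures before Kubernetes acts on it. It approximates the window as `failureThreshold * periodSeconds` and assumes the Kubernetes default of 10 seconds for `periodSeconds` where the chart does not set it (the readinessProbe). The `Probe` class and its method name are made up for this illustration and are not part of the chart.

```python
from dataclasses import dataclass

# Kubernetes uses periodSeconds = 10 when a probe does not set it.
DEFAULT_PERIOD_SECONDS = 10


@dataclass
class Probe:
    """Hypothetical helper holding the probe values quoted from the chart."""
    name: str
    failure_threshold: int
    timeout_seconds: int
    period_seconds: int = DEFAULT_PERIOD_SECONDS

    def failure_window_seconds(self) -> int:
        # Approximation: consecutive failures allowed times the probe interval.
        return self.failure_threshold * self.period_seconds


probes = [
    Probe("livenessProbe", failure_threshold=3, timeout_seconds=5, period_seconds=30),
    Probe("readinessProbe", failure_threshold=5, timeout_seconds=5),
    Probe("startupProbe", failure_threshold=20, timeout_seconds=5, period_seconds=10),
]

for probe in probes:
    print(f"{probe.name}: roughly {probe.failure_window_seconds()}s of consecutive failures tolerated")
```

Under these assumptions the liveness probe allows about 90 seconds of failed or slow (over 5 seconds) responses on `/` before the container is restarted. The liveness probe keeps running after the startup probe has succeeded, so it applies for as long as the pod is up, not only during startup.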

mark.robustelli

@prithwhat OK, so in the log you posted above

2025-03-04 11:50:39.748 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Starting FusionAuth HTTP loopback server on port [9012]

2025-03-04 11:57:51.749 AM INFO org.primeframework.mvc.netty.PrimeHTTPServer - Shutting down the Prime HTTP server [/0.0.0.0:9011]

Was it about 7 minutes between the time you spun up FusionAuth and when you tried to log in to the admin UI?
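The 7 minute figure comes from the two timestamps above. A small sketch to compute the gap, assuming the log timestamps follow the `YYYY-MM-DD hh:mm:ss.SSS AM/PM` layout seen in the quoted lines (the helper name `seconds_between` is made up for this example):

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted FusionAuth log lines (assumed).
LOG_TIME_FORMAT = "%Y-%m-%d %I:%M:%S.%f %p"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


# Loopback server start vs. first "Shutting down" line.
uptime = seconds_between("2025-03-04 11:50:39.748 AM", "2025-03-04 11:57:51.749 AM")
print(f"{uptime:.3f} seconds (~{uptime / 60:.1f} minutes)")
```

For the two lines quoted here this gives 432.001 seconds, so a little over 7 minutes between the HTTP server starting and shutting down.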

prithwhat

@mark-robustelli

It's possible, I didn't really keep track of that per se.

We eventually moved on from this version and jumped to 1.40.x, and surprisingly there were no issues in that version, so this post can be closed.

That being said, I did face a similar issue in a newer version, for which I might post another thread.