FusionAuth /oauth2/* requests performance
-
Hi!
I have some questions about FusionAuth performance.
At the moment I am load testing our system, which uses FusionAuth as its IdP. I have noticed that response times for requests such as /oauth2/authorize, /oauth2/token and /oauth2/introspect grow dramatically with the number of concurrent users trying to log in, and beyond a certain number of users FusionAuth hangs.
How can I improve login performance, and especially the performance of /oauth2/introspect? Could I use /api/jwt/validate to validate the access token? Would it be faster? How can I prevent FusionAuth from hanging?
Here are some details:
Our users' typical activity looks like this:
- Log in (triggers /oauth2/authorize and /oauth2/token)
- Do some actions in our system (/oauth2/introspect is called on every action to validate access token).
We expect more than 1M users, with several thousand online at the same time, so we expect several hundred requests per second.
My load test emulates several concurrent users, each user performs the following steps sequentially:
- Send /oauth2/authorize request
- Send /oauth2/token request
- Send GET /api/user request
...
- Validate access token with /oauth2/introspect request
- Do some actions in the system
...
- Validate access token with /oauth2/introspect request
- Do some actions in the system
...
I get the following numbers (avg per request):
- with 32 concurrent users
  - /oauth2/authorize - 500 ms
  - /oauth2/token - 350 ms
  - /oauth2/introspect - 200 ms
- with 64 concurrent users
  - /oauth2/authorize - 1.5 s
  - /oauth2/token - 1 s
  - /oauth2/introspect - 500 ms
- with 128 concurrent users
  - /oauth2/authorize - 3 s
  - /oauth2/token - 2 s
  - /oauth2/introspect - 800 ms
- with 256 concurrent users - some errors occur and for success responses timings are
  - /oauth2/authorize - 6 s
  - /oauth2/token - 3.5 s
  - /oauth2/introspect - 2 s
- with 512 concurrent users - FusionAuth hangs, I could not even open the admin panel, any further requests fail
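As a rough cross-check of the timings listed above, Little's law (throughput = concurrency / time per cycle) gives an upper bound on the login rate. The sketch below assumes each simulated user does nothing but authorize + token back to back, which overstates the real rate because the test also calls /api/user and /oauth2/introspect. The function name is made up for illustration.

```python
# Upper-bound estimate of logins per second from the averages reported above.
# Assumption: a "login" is one /oauth2/authorize plus one /oauth2/token call,
# and every concurrent user is always busy with one of those two requests.

def max_logins_per_second(concurrent_users: int, authorize_s: float, token_s: float) -> float:
    """Little's law: throughput = concurrency / time spent per login."""
    return concurrent_users / (authorize_s + token_s)

# (authorize seconds, token seconds) per concurrency level, copied from the list above
measurements = {
    32: (0.5, 0.35),
    64: (1.5, 1.0),
    128: (3.0, 2.0),
    256: (6.0, 3.5),
}

for users, (authorize_s, token_s) in measurements.items():
    rate = max_logins_per_second(users, authorize_s, token_s)
    print(f"{users:>3} users: at most {rate:.1f} logins/s")
```

If the estimate stays roughly flat while concurrency doubles, the extra users are only adding queueing time, which suggests the system is already saturated at the lower concurrency level.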
Test environment:
- 16 CPUs (Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz), 64 GB RAM
- 2 FusionAuth instances in Docker
- 2 GB per instance
- Both target one DB (Postgres) on the same machine
- DB connection pool size - 32
- HAProxy as load balancer
-
Hiya,
These kinds of issues are hard to debug because they are context dependent. But here are some places to look:
Are you seeing any error messages in the logs? That might give a clue as to where things are breaking down.
What version of FusionAuth are you running?
Are you CPU bound (probably), IO bound or network bound?
If you add more FusionAuth containers, do you see performance improve? FusionAuth as of 1.19 is 100% stateless, so you should be able to scale linearly by adding FusionAuth containers (at least until PostgreSQL or memory limits kick in).
You could also validate the access token by using a JWT library (JWTs are designed to be validated without calling the introspect endpoint, decoupling the consumer of the JWT from the producer), which would reduce traffic and load on FusionAuth. Is that an option? Or do you need to call introspect?
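To make the local-validation idea concrete, here is a minimal sketch using only the Python standard library. It assumes the access token is signed with HS256 and that the consumer holds the shared secret; both are assumptions about your setup, not something stated in this thread. For RS256 or other algorithms you would use a maintained JWT library instead. The helper names and the demo secret are hypothetical. Note that local validation cannot see server-side revocation, which is one reason some systems still call introspect.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    # Demo helper only: stands in for the identity provider that issues tokens.
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(signature)}"


def validate_hs256(token: str, secret: bytes, now=None):
    """Return the claims if signature and expiry check out, otherwise None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        claims = json.loads(b64url_decode(payload_b64))
        signature = b64url_decode(signature_b64)
    except ValueError:
        return None  # wrong number of segments, bad base64, or bad JSON
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None  # never accept an algorithm other than the one you expect
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or current >= exp:
        return None  # missing or expired "exp" claim
    return claims
```

A resource server would call validate_hs256 on each request and fall back to rejecting the request when it returns None, so no network round trip to the identity provider is needed per action.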
This thread may be worth reading, if you haven't seen it yet: https://fusionauth.io/community/forum/topic/370/performance-issues-even-with-a-8-core-32-gigs They ended up horizontally scaling their FusionAuth instances.
-
One other thing to note.
For performance issues, since they can be so complex and system dependent, we recommend purchasing a paid edition so you can get direct access and support from the engineering team.
You can learn more about pricing and support options here.
-
Hi @dan,
I do see errors in the logs about DB connection timeout. They are posted below. FusionAuth is configured with max connection pool size = 32.
FusionAuth version is 1.19.8
As for the bounds: I see that all 16 CPU cores are at 100% utilization when I run the load test. Still, I could not get better performance than 25 logins per second. No problems with the network so far.
Thank you for the link to the thread. I will try to deploy several FusionAuth instances across multiple nodes and run tests again.
Validating the JWT without calling FusionAuth could be an option. But users will still need to log in at least once and refresh their tokens periodically, and that is not possible without calls to FusionAuth. With 1M users we expect this to happen rather often.
### Error querying database. Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2002ms.
### The error may exist in io/fusionauth/api/domain/SystemConfigurationMapper.java (best guess)
### The error may involve io.fusionauth.api.domain.SystemConfigurationMapper.retrieve
### The error occurred while executing a query
### Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2002ms.
    at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:150)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:141)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:77)
    at jdk.internal.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.apache.ibatis.session.SqlSessionManager$SqlSessionInterceptor.invoke(SqlSessionManager.java:357)
    at com.sun.proxy.$Proxy47.selectOne(Unknown Source)
    at org.apache.ibatis.session.SqlSessionManager.selectOne(SqlSessionManager.java:166)
    at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:83)
    at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:59)
    at com.sun.proxy.$Proxy49.retrieve(Unknown Source)
    at io.fusionauth.api.service.system.DefaultSystemConfigurationService.retrieve(DefaultSystemConfigurationService.java:66)
    at io.fusionauth.app.action.BaseAction.<init>(BaseAction.java:109)
    at io.fusionauth.app.action.oauth2.TokenAction.<init>(TokenAction.java:86)
    at io.fusionauth.app.action.oauth2.TokenAction$$FastClassByGuice$$688d5ead.newInstance(<generated>)
    at com.google.inject.internal.DefaultConstructionProxyFactory$FastClassProxy.newInstance(DefaultConstructionProxyFactory.java:89)
    at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:114)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:91)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:306)
    at com.google.inject.internal.InjectorImpl$1.get(InjectorImpl.java:1094)
    ... 40 common frames omitted
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2002ms.
    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:697)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:196)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:161)
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
    at org.apache.ibatis.transaction.jdbc.JdbcTransaction.openConnection(JdbcTransaction.java:139)
    at org.apache.ibatis.transaction.jdbc.JdbcTransaction.getConnection(JdbcTransaction.java:61)
    at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:338)
    at org.apache.ibatis.executor.SimpleExecutor.prepareStatement(SimpleExecutor.java:84)
    at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:62)
    at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:326)
    at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
    at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
    at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:83)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:148)
    ... 60 common frames omitted
-
Great, thanks! Please let us know how the additional troubleshooting goes. I'd look at the error messages and see if anything funny sticks out.
-
As a side note, this may or may not be related. In version 1.20.0 we updated our Docker image base from Alpine, which used OpenJDK compiled with the musl C library, to Ubuntu Focal, which uses OpenJDK compiled with glibc.
See the release notes for additional details, but it would be interesting to know whether you see any changes in your performance metrics due to this change.
https://fusionauth.io/docs/v1/tech/release-notes/
-
Hi!
I know that some time has passed since this topic was created, but the problem described above is still relevant, and I would like to raise the issue again with some additional information.
To recap the problem: while load testing my company's system, which uses FusionAuth as its IdP, I noticed that it is not possible to process more than a few dozen (not even hundreds of) logins per second using FusionAuth. By "login" I mean sequential calls to /oauth2/authorize and /oauth2/token. So I am looking for a way to increase the performance of the system.
Here are some details about the environment:
FusionAuth version is 1.22.2.
6 instances are deployed in Kubernetes, each on its own node, with no resource limits on the Kubernetes side.
FusionAuth is launched with the following config options:
FUSIONAUTH_APP_MEMORY: 2G
DATABASE_MAXIMUM_POOL_SIZE: 16
FUSIONAUTH_APP_RUNTIME_MODE: production
SEARCH_TYPE: database
The database is MySQL Percona Server 8.0.22-13, deployed on a separate virtual machine with the following params:
Model: Intel Core Processor (Broadwell, IBRS)
CPU Cores: 4
Clock: 2199 MHz
Load test emulates several concurrent users, each user performs the following steps sequentially:
- Send /oauth2/authorize request
- Send /oauth2/token request (exchange the auth code for an access token)
I ran several tests with various numbers of threads (users) and requests and could hardly reach a throughput of 50 logins per second with 64 concurrent users (each login attempt consists of two requests, /oauth2/authorize and /oauth2/token, so the total number of requests is doubled). Increasing the number of concurrent users lowers the overall throughput and causes more errors.
I could see errors in the logs reporting database connection problems:
2021-03-26 1:10:18.956 PM ERROR io.fusionauth.app.primeframework.error.ExceptionExceptionHandler - An unhandled exception was thrown
org.apache.ibatis.exceptions.PersistenceException:
### Error querying database. Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2090ms.
### The error may exist in io/fusionauth/api/domain/InstanceMapper.java (best guess)
### The error may involve io.fusionauth.api.domain.InstanceMapper.retrieve
### The error occurred while executing a query
### Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2090ms.
...
...
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2090ms.
...
... 76 common frames omitted
I've investigated MySQL statistics and found out that even though the number of requests per second to FusionAuth is not high there are a lot of requests to MySQL.
As far as I can see, MySQL could be the bottleneck, but I wonder why the number of database requests is so high and whether there is a way to optimise FusionAuth's communication with the database. Besides increasing MySQL's CPU/memory, are there any other options?
-
Generally speaking the primary bottleneck for logins per second is CPU. Hashing the password is intentionally slow and FusionAuth will not be able to perform more logins per second than your CPU can handle.
One way to identify if the password hashing is the bottleneck in load tests is to reduce the hash strength. See Tenants > Edit > Password > Cryptographic hash settings. Set this to Salted MD5 with a factor of 1 and then enable Re-hash on login. This will cause each user to have their password re-hashed to use MD5 the next time they log in.
If you can still only get 50 logins per second with this config, then the database is likely the bottleneck. If this config allows you to achieve a much higher logins per second, then the CPU is your bottleneck. If you are CPU bound, the only way to get more logins per second is to horizontally scale or throw larger CPUs at each node.
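To get a feel for how hash cost caps logins per second, the sketch below times a deliberately slow hash and turns the result into a per-machine ceiling. PBKDF2-HMAC-SHA256 from Python's hashlib is used purely as a stand-in; the scheme and factor configured on your tenant may differ, and the iteration counts and function names here are illustrative assumptions, not FusionAuth settings.

```python
import hashlib
import os
import time


def seconds_per_hash(iterations: int, samples: int = 3) -> float:
    """Average wall-clock time of one PBKDF2-HMAC-SHA256 hash on this machine."""
    salt = os.urandom(16)
    start = time.perf_counter()
    for _ in range(samples):
        hashlib.pbkdf2_hmac("sha256", b"example-password", salt, iterations)
    return (time.perf_counter() - start) / samples


def login_ceiling(cores: int, hash_seconds: float) -> float:
    """Best case: every core does nothing but hash passwords."""
    return cores / hash_seconds


# Illustrative iteration counts only; compare a weak and a strong setting.
for iterations in (1_000, 100_000):
    cost = seconds_per_hash(iterations)
    print(f"{iterations:>7} iterations: {cost * 1000:.2f} ms per hash, "
          f"at most {login_ceiling(16, cost):.0f} logins/s on 16 cores")
```

The ceiling is optimistic because real logins also spend CPU on TLS, JSON handling and database access, but it shows why lowering the hash factor during a load test separates a CPU limit from a database limit.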
-
Hi!
I've investigated further and found that, for some reason, the FusionAuth instances deployed in my k8s cluster don't utilize all the CPUs available on the nodes. There are 6 nodes with 12 CPUs each, but a single FusionAuth instance hardly ever utilizes even 1 CPU.
I've tried scaling horizontally, deploying 50, 60, 75 and more FusionAuth instances, and got much better results during my load testing: up to 250 logins per second (each login is two requests, /oauth2/authorize and /oauth2/token).
I wonder why this happens, and whether there are any settings for FusionAuth, Java or k8s that could help solve the issue?
Thanks!