    FusionAuth /oauth2/* requests performance

    General Discussion
    performance
    • Marat
      last edited by dan

      Hi!

      I have some questions about FusionAuth performance.

      I am currently load testing our system, which uses FusionAuth as its IdP. I have noticed that response times for requests such as /oauth2/authorize, /oauth2/token, and /oauth2/introspect grow dramatically with the number of concurrent users trying to log in, and beyond a certain number of users FusionAuth hangs.

      How can I improve login performance, and especially /oauth2/introspect performance? Could I use /api/jwt/validate to validate the access token instead? Would it be faster? And how can I prevent FusionAuth from hanging?

      Here are some details:
      Our users' typical activity looks like this:

      • Log in (triggers /oauth2/authorize and /oauth2/token)
      • Perform some actions in our system (/oauth2/introspect is called on every action to validate the access token).

      We expect more than 1M users, with several thousand online at the same time, so we expect several hundred requests per second.

      My load test emulates several concurrent users, each user performs the following steps sequentially:

      • Send /oauth2/authorize request
      • Send /oauth2/token request
      • Send GET /api/user request
        ...
      • Validate access token with /oauth2/introspect request
      • Do some actions in the system
        ....
      • Validate access token with /oauth2/introspect request
      • Do some actions in the system
        ....

      I get the following numbers (avg per request):

      • with 32 concurrent users
        • /oauth2/authorize - 500 ms
        • /oauth2/token - 350 ms
        • /oauth2/introspect - 200 ms
      • with 64 concurrent users
        • /oauth2/authorize - 1.5 s
        • /oauth2/token - 1 s
        • /oauth2/introspect - 500 ms
      • with 128 concurrent users
        • /oauth2/authorize - 3 s
        • /oauth2/token - 2 s
        • /oauth2/introspect - 800 ms
      • with 256 concurrent users (some errors occur; timings are for successful responses)
        • /oauth2/authorize - 6 s
        • /oauth2/token - 3.5 s
        • /oauth2/introspect - 2 s
      • with 512 concurrent users - FusionAuth hangs; I could not even open the admin panel, and all further requests fail
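
The averages above can be turned into a rough throughput estimate with Little's law (throughput ≈ concurrent users / time per iteration). The sketch below is only an approximation: it assumes a "login" is one /oauth2/authorize call plus one /oauth2/token call and ignores the other requests and any think time in the test loop, so the results are upper bounds, not measurements. The function and variable names are illustrative.

```python
# Rough login-throughput estimate from the reported averages, using
# Little's law: throughput ≈ concurrent users / time per login.
# Assumption: a login is one /oauth2/authorize plus one /oauth2/token call;
# all other requests and think time are ignored, so these are upper bounds.

# concurrent users -> (authorize seconds, token seconds), copied from the post
MEASURED = {
    32: (0.5, 0.35),
    64: (1.5, 1.0),
    128: (3.0, 2.0),
    256: (6.0, 3.5),
}


def logins_per_second(users: int, authorize_s: float, token_s: float) -> float:
    """Upper bound on login throughput for a closed-loop load test."""
    return users / (authorize_s + token_s)


if __name__ == "__main__":
    for users, (authorize_s, token_s) in MEASURED.items():
        rate = logins_per_second(users, authorize_s, token_s)
        print(f"{users:>3} users: at most {rate:.1f} logins/s")
```

Under these assumptions the estimate stays at roughly 25 to 40 logins per second at every load level. That pattern, where adding users raises latency but not throughput, usually indicates a saturated resource rather than a per-request slowdown.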

      Test environment:

      • 16 CPUs (Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz), 64 GB RAM
      • 2 FusionAuth instances in Docker
        • 2 GB per instance
        • Both use one database (Postgres) on the same machine
        • DB connection pool size: 32
      • HAProxy as the load balancer
      • dan
        last edited by dan

        Hiya,

        These kinds of issues are hard to debug because they are context dependent. But here are some places to look:

        Are you seeing any error messages in the logs? That might give a clue as to where things are breaking down.

        What version of FusionAuth are you running?

        Are you CPU bound (probably), IO bound or network bound?

        If you add more FusionAuth containers, do you see performance improve? FusionAuth as of 1.19 is 100% stateless, so you should be able to scale linearly by adding FusionAuth containers (at least until PostgreSQL or memory limits kick in).

        You could also validate the access token by using a JWT library (JWTs are designed to be validated without calling the introspect endpoint, decoupling the consumer of the JWT from the producer), which would reduce traffic and load on FusionAuth. Is that an option? Or do you need to call introspect?
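
As an illustration of validating a token locally rather than calling /oauth2/introspect, here is a minimal sketch using only the Python standard library. It assumes the access token is a JWT signed with HS256 and a shared secret, which may not match how a given FusionAuth application is configured (asymmetric keys are also common). The function names are hypothetical, and a real service would normally use a maintained JWT library instead of hand-rolled checks.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build an HS256 token. Only used here to produce sample input."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError as exc:  # wrong segment count, bad base64, or bad JSON
        raise ValueError("malformed token") from exc

    # Pin the algorithm instead of trusting whatever the header claims.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")

    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")

    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims


if __name__ == "__main__":
    secret = b"demo-secret"  # placeholder value for the example
    token = sign_hs256({"sub": "user-1", "exp": int(time.time()) + 60}, secret)
    print(verify_hs256(token, secret))
```

A check like this costs one HMAC per request and no network round trip, which is why it takes load off the identity provider. The trade-off is that a locally validated token stays valid until it expires, whatever has changed on the server since it was issued.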

        This thread may be worth reading, if you haven't seen it yet: https://fusionauth.io/community/forum/topic/370/performance-issues-even-with-a-8-core-32-gigs They ended up horizontally scaling their FusionAuth instances.

        --
        FusionAuth - Auth so modern you can download it.
        https://fusionauth.io

        • dan

          One other thing to note.

          For performance issues, since they can be so complex and system dependent, we recommend purchasing a paid edition so you can get direct access and support from the engineering team.

          You can learn more about pricing and support options here.

          • Marat

            Hi @dan,

            I do see errors in the logs about DB connection timeout. They are posted below. FusionAuth is configured with max connection pool size = 32.

            FusionAuth version is 1.19.8

            As for the bottleneck: all 16 CPU cores are at 100% utilization when I run the load test. Still, I could not get better than 25 logins per second. No network problems so far.

            Thank you for the link to the thread. I will try to deploy several FusionAuth instances across multiple nodes and run tests again.

            Validating the JWT without calling FusionAuth could be an option. But users will still need to log in at least once and refresh their tokens periodically, and that is not possible without calls to FusionAuth. With 1M users we expect this to happen rather often.

            ### Error querying database.  Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2002ms.
            ### The error may exist in io/fusionauth/api/domain/SystemConfigurationMapper.java (best guess)
            ### The error may involve io.fusionauth.api.domain.SystemConfigurationMapper.retrieve
            ### The error occurred while executing a query
            ### Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2002ms.
                    at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
                    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:150)
                    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:141)
                    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:77)
                    at jdk.internal.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
                    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
                    at org.apache.ibatis.session.SqlSessionManager$SqlSessionInterceptor.invoke(SqlSessionManager.java:357)
                    at com.sun.proxy.$Proxy47.selectOne(Unknown Source)
                    at org.apache.ibatis.session.SqlSessionManager.selectOne(SqlSessionManager.java:166)
                    at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:83)
                    at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:59)
                    at com.sun.proxy.$Proxy49.retrieve(Unknown Source)
                    at io.fusionauth.api.service.system.DefaultSystemConfigurationService.retrieve(DefaultSystemConfigurationService.java:66)
                    at io.fusionauth.app.action.BaseAction.<init>(BaseAction.java:109)
                    at io.fusionauth.app.action.oauth2.TokenAction.<init>(TokenAction.java:86)
                    at io.fusionauth.app.action.oauth2.TokenAction$$FastClassByGuice$$688d5ead.newInstance(<generated>)
                    at com.google.inject.internal.DefaultConstructionProxyFactory$FastClassProxy.newInstance(DefaultConstructionProxyFactory.java:89)
                    at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:114)
                    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:91)
                    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:306)
                    at com.google.inject.internal.InjectorImpl$1.get(InjectorImpl.java:1094)
                    ... 40 common frames omitted
            Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2002ms.
                    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:697)
                    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:196)
                    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:161)
                    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
                    at org.apache.ibatis.transaction.jdbc.JdbcTransaction.openConnection(JdbcTransaction.java:139)
                    at org.apache.ibatis.transaction.jdbc.JdbcTransaction.getConnection(JdbcTransaction.java:61)
                    at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:338)
                    at org.apache.ibatis.executor.SimpleExecutor.prepareStatement(SimpleExecutor.java:84)
                    at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:62)
                    at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:326)
                    at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
                    at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
                    at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:83)
                    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:148)
                    ... 60 common frames omitted
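
The "Connection is not available, request timed out" message in the trace is what a bounded connection pool reports when every connection stays checked out for longer than the acquire timeout. The toy model below is not HikariCP and does not reproduce its internals; it is a sketch, with made-up class and function names, of why a fixed pool puts a ceiling on database work per instance. The 50 ms hold time in the usage example is a hypothetical figure, not something measured in this thread.

```python
import threading


class BoundedPool:
    """Toy connection pool: a fixed number of slots and an acquire timeout."""

    def __init__(self, size: int, acquire_timeout_s: float) -> None:
        self._slots = threading.BoundedSemaphore(size)
        self._timeout = acquire_timeout_s

    def acquire(self) -> None:
        # Wait for a free slot and give up after the timeout.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError(
                f"connection not available, timed out after {self._timeout * 1000:.0f}ms"
            )

    def release(self) -> None:
        self._slots.release()


def max_checkouts_per_second(pool_size: int, avg_hold_s: float) -> float:
    """Each connection can serve at most 1 / avg_hold_s checkouts per second."""
    return pool_size / avg_hold_s


if __name__ == "__main__":
    pool = BoundedPool(size=2, acquire_timeout_s=0.05)
    pool.acquire()
    pool.acquire()
    try:
        pool.acquire()  # the third caller finds no free slot
    except TimeoutError as exc:
        print(exc)

    # Hypothetical: 32 connections, each held for 50 ms per checkout.
    print(max_checkouts_per_second(32, 0.05))
```

If each request needs several checkouts, or the database slows down under load so that connections are held longer, the ceiling drops and callers start timing out even though the pool size has not changed.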
            
            • dan

              Great, thanks! Please let us know how the additional troubleshooting goes. I'd look at the error messages and see if anything funny sticks out.

              • robotdan

                As a side note, which may or may not be related: in version 1.20.0 we changed our Docker image base from Alpine, which used OpenJDK compiled with the musl C library, to Ubuntu Focal, which uses OpenJDK compiled with glibc.

                See the release notes for additional details, but it would be interesting to know whether you see any changes in your performance metrics due to this change.
                https://fusionauth.io/docs/v1/tech/release-notes/

                • Marat

                  Hi!

                  I know some time has passed since this topic was created, but the problem described above is still relevant, and I would like to raise it again with some additional information.

                  To recap the problem: while load testing my company's system, which uses FusionAuth as its IdP, I noticed that FusionAuth cannot process more than a few dozen (not even hundreds of) logins per second. By "login" I mean sequential calls to /oauth2/authorize and /oauth2/token. So I am looking for a way to increase the performance of the system.


                  Here are some details about the environment:

                  FusionAuth version is 1.22.2.
                  6 instances are deployed in Kubernetes, each on its own node, with no resource limits on the Kubernetes side.
                  FusionAuth is launched with the following config options:

                  FUSIONAUTH_APP_MEMORY:        2G
                  DATABASE_MAXIMUM_POOL_SIZE:   16
                  FUSIONAUTH_APP_RUNTIME_MODE:  production
                  SEARCH_TYPE:                  database
                  

                  The database is MySQL Percona Server 8.0.22-13, deployed on a separate virtual machine with the following params:

                  Model: Intel Core Processor (Broadwell, IBRS)
                  CPU Cores: 4
                  Clock: 2199 MHz
                  

                  Load test emulates several concurrent users, each user performs the following steps sequentially:

                  • Send /oauth2/authorize request
                  • Send /oauth2/token request (exchange the auth code for an access token)

                  I ran several tests with various numbers of threads (users) and requests and could barely reach a throughput of 50 logins per second with 64 concurrent users (each login attempt consists of two requests, /oauth2/authorize and /oauth2/token, so the total number of requests is twice that). Increasing the number of concurrent users lowers the overall throughput and causes more errors.

                  [Attached image: 276d8e5f-5584-436b-960b-66247794a03e-image.png]

                  I can see errors in the logs reporting database connection problems:

                  2021-03-26 1:10:18.956 PM ERROR io.fusionauth.app.primeframework.error.ExceptionExceptionHandler - An unhandled exception was thrown
                  org.apache.ibatis.exceptions.PersistenceException: 
                  ### Error querying database.  Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2090ms.
                  ### The error may exist in io/fusionauth/api/domain/InstanceMapper.java (best guess)
                  ### The error may involve io.fusionauth.api.domain.InstanceMapper.retrieve
                  ### The error occurred while executing a query
                  ### Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2090ms.
                  	...
                          ...
                  Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 2090ms.
                  	...
                  	... 76 common frames omitted
                  
                  

                  I looked at the MySQL statistics and found that, even though the number of requests per second to FusionAuth is not high, there are a lot of requests to MySQL.

                  [Attached image: cb4dc055-3b9a-478e-bb76-26026914c97d-image.png]

                  [Attached image: e53433c8-6a41-4cd2-b92b-c89dd789c742-image.png]

                  As far as I can see, MySQL could be the bottleneck, but I wonder why the number of requests is so high and whether there is a way to optimise FusionAuth's communication with the database. Besides increasing MySQL's CPU and memory, are there any other options?

                  • robotdan

                    Generally speaking the primary bottleneck for logins per second is CPU. Hashing the password is intentionally slow and FusionAuth will not be able to perform more logins per second than your CPU can handle.

                    One way to identify whether password hashing is the bottleneck in load tests is to reduce the hash strength. See Tenants > Edit > Password > Cryptographic hash settings. Set this to Salted MD5 with a factor of 1 and then enable Re-hash on login. This will cause each user's password to be re-hashed with MD5 the next time they log in.

                    If you can still only get 50 logins per second with this config, then the database is likely the bottleneck. If this config lets you achieve a much higher rate of logins per second, then the CPU is your bottleneck. If you are CPU bound, the only way to get more logins per second is to scale horizontally or throw larger CPUs at each node.
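
To get a feel for how much a deliberately slow hash limits logins per second, the sketch below times PBKDF2-HMAC-SHA256 at a few work factors and converts the cost into a CPU-only ceiling. PBKDF2 is used purely as a stand-in for a slow password hash; the thread does not say which scheme or factor is configured, and the iteration counts and the 16-core figure in the usage example are illustrative assumptions.

```python
import hashlib
import os
import time


def seconds_per_hash(iterations: int, samples: int = 3) -> float:
    """Average wall-clock time of one PBKDF2-HMAC-SHA256 hash."""
    salt = os.urandom(16)
    start = time.perf_counter()
    for _ in range(samples):
        hashlib.pbkdf2_hmac("sha256", b"correct horse battery staple", salt, iterations)
    return (time.perf_counter() - start) / samples


def max_logins_per_second(seconds_per_login_hash: float, cores: int) -> float:
    """CPU-only ceiling: assumes every core does nothing but hash passwords."""
    return cores / seconds_per_login_hash


if __name__ == "__main__":
    for iterations in (1, 10_000, 100_000):  # illustrative work factors
        cost = seconds_per_hash(iterations)
        ceiling = max_logins_per_second(cost, cores=16)
        print(f"{iterations:>7} iterations: {cost * 1000:.2f} ms/hash, "
              f"at most {ceiling:.0f} logins/s on 16 cores")
```

The real ceiling is lower, because request handling and database access also need CPU time. The comparison still mirrors the experiment described above: if throughput barely moves when the hash becomes cheap, hashing was not the limiting factor.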

                    • Marat

                      Hi!

                      I investigated further and found that, for some reason, the FusionAuth instances deployed in my k8s cluster don't utilize all the CPUs available on the nodes. There are 6 nodes with 12 CPUs each, but a single FusionAuth instance rarely utilized even 1 CPU.

                      I tried scaling horizontally, deploying 50, 60, 75 and more FusionAuth instances, and got much better results in my load tests: up to 250 logins per second (each login is two requests, /oauth2/authorize and /oauth2/token).

                      I wonder why this happens, and whether there are any settings for FusionAuth, Java, or k8s that could help solve the issue?

                      Thanks!
