No downtime upgrades?
-
Is it possible, either in general or between specific versions, to upgrade FusionAuth without downtime? I haven't been able to find any documentation specifically on production upgrades.
For instance, I currently deploy FA on AWS ECS, which attempts to deploy the new version and wait for a passing load-balancer health-check before draining and stopping the old version. How likely is it that the old version will start failing requests due to an incompatible schema change in the SQL or search database?
Would upgrading through consecutive versions help? In other words, is any care taken to avoid breaking changes between adjacent versions?
-
Hi,
There is currently no zero-downtime upgrade process for FusionAuth.
In general, the process for upgrading is:
- Take down all nodes
- Upgrade schema
- Upgrade FA
- Start all nodes
The recommendation is that you automate the process and minimize downtime (for our hosted solutions we use a configuration management tool and see downtime on the order of seconds).
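As a rough illustration of automating the four steps above, here is a minimal Python sketch. It is not FusionAuth tooling: `run_upgrade` and the four step callables are hypothetical placeholders that you would wire to your own deployment commands (ECS, a configuration management tool, and so on). The sketch only enforces the ordering and measures the downtime window.

```python
import time
from typing import Callable, Sequence


def run_upgrade(
    nodes: Sequence[str],
    stop_node: Callable[[str], None],
    migrate_schema: Callable[[], None],
    install_version: Callable[[str], None],
    start_node: Callable[[str], None],
) -> float:
    """Run the four steps in order; return the downtime window in seconds.

    Every callable is a hypothetical hook supplied by the caller.
    """
    began = time.monotonic()
    for node in nodes:        # 1. take down all nodes
        stop_node(node)
    migrate_schema()          # 2. upgrade the schema while nothing is running
    for node in nodes:        # 3. upgrade FusionAuth on every node
        install_version(node)
    for node in nodes:        # 4. start all nodes
        start_node(node)
    return time.monotonic() - began


# Example wiring with stand-in steps that only record what they were asked to do.
log = []
downtime = run_upgrade(
    ["node-1", "node-2"],
    stop_node=lambda n: log.append(f"stop {n}"),
    migrate_schema=lambda: log.append("migrate schema"),
    install_version=lambda n: log.append(f"install {n}"),
    start_node=lambda n: log.append(f"start {n}"),
)
```

Measuring the window this way makes it easier to see which step dominates the downtime before tuning it.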
-
Thanks for the clarification. Is zero-downtime somewhere on the roadmap? It seems like it could be accomplished with a strict release process on both the FA/development and user/deployment sides. For example, schema changes are always compatible with the immediately previous version and users install every version consecutively; FA vN runs on schema vN+1, but not necessarily vN+2.
I ask because, while seconds aren't critical for most cases, there are cases where either they do matter or getting down to seconds requires non-trivial, hard-to-get-right automation. For instance, the untuned automation I'm using with ECS and ELB seems to run on the order of a minute or two, due to delays in starting containers and ELB health checks, during which I see 502 Bad Gateway errors.
Given that, I'm inclined to take the risk of not shutting down the old instance, and hoping that either the changed schema isn't used during that time window, or the affected features fail without data corruption. Of course, if you know specific reasons that this is dangerous, please let me know.
(It occurs to me that I'm also assuming the schema migration is transactional/atomic. That's at least possible in MySQL 8+ and Postgres, though I don't know if FA does it that way.)
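To make the proposed release rule concrete, here is a small Python sketch. It models only the hypothetical policy described above (FA vN tolerates schema vN and vN+1), not anything FusionAuth guarantees today. Versions are simplified to plain integers, and the function names are made up for illustration.

```python
def can_run_on(app_version: int, schema_version: int) -> bool:
    """Hypothetical policy: FA vN works on schema vN or vN+1, nothing newer."""
    return schema_version in (app_version, app_version + 1)


def rolling_upgrade_is_safe(current: int, target: int) -> bool:
    """During a rolling upgrade the schema is already at `target` while
    old nodes at `current` are still serving requests."""
    return can_run_on(current, target)


def consecutive_path(current: int, target: int) -> list:
    """Versions to install one at a time so that every hop is a safe one."""
    return list(range(current + 1, target + 1))


# Jumping two versions at once would break the old nodes under this policy,
# so the upgrade is split into single-version hops instead.
direct_jump_ok = rolling_upgrade_is_safe(5, 7)
hops = consecutive_path(5, 7)
every_hop_ok = all(rolling_upgrade_is_safe(v - 1, v) for v in hops)
```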
-
Zero downtime isn't on the roadmap currently, though of course we'd love it if you filed a GitHub issue to see if the community desires this: https://github.com/fusionauth/fusionauth-issues
> Of course, if you know specific reasons that this is dangerous, please let me know
Well, you're running a version of the code against a schema that it doesn't expect. I'd consider that a bit worrisome. While some upgrades, especially those that don't modify the schema, should go just fine, I'd recommend following the procedure laid out above.
> It occurs to me that I'm also assuming the schema migration is transactional/atomic. That's at least possible in MySQL 8+ and Postgres, though I don't know if FA does it that way.
I'll ask about transactional migrations.
-
Here is the response from the engineering team about transactions:
The migrations are just SQL scripts.
- If you run them manually and use a TX, then they are in a transaction (if your database supports that).
- If you use maintenance mode, I believe each script is run inside its own TX, but the entire upgrade from version x to y is not transactional.
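To show what "each script in its own TX, but not the whole upgrade" means in practice, here is a minimal Python sketch. It uses an in-memory SQLite database purely as a stand-in for MySQL or Postgres, and the script names, tables, and the `apply_scripts` helper are invented for illustration. This is not FusionAuth's migration code.

```python
import sqlite3


def run_script_in_tx(conn, statements):
    """Run one migration script atomically: all of it applies or none of it."""
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


def apply_scripts(conn, scripts):
    """Apply scripts in order, one transaction each.

    Returns (applied_names, failed_name). Scripts committed before a
    failure stay committed, so the upgrade as a whole is not atomic.
    """
    applied = []
    for name, statements in scripts:
        try:
            run_script_in_tx(conn, statements)
        except sqlite3.Error:
            return applied, name
        applied.append(name)
    return applied, None


# isolation_level=None disables the module's implicit transaction handling,
# so the explicit BEGIN/COMMIT/ROLLBACK above are the only transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
scripts = [
    ("step-1", ["CREATE TABLE a (id INTEGER)"]),
    # The second statement fails, so this whole script is rolled back.
    ("step-2", ["CREATE TABLE b (id INTEGER)",
                "INSERT INTO missing_table VALUES (1)"]),
]
applied, failed = apply_scripts(conn, scripts)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

After this runs, table `a` exists and table `b` does not: the failed script left no partial changes, but the database sits between versions, which is the state a still-running old node would have to cope with.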
Hope this helps.
-
We've added some documentation about no downtime upgrades in FusionAuth cloud: https://fusionauth.io/docs/v1/tech/installation-guide/cloud#upgrade-duration
If you are self-hosting, we recommend running in a cluster; you should then see upgrade downtime similar to that of FusionAuth cloud.