Search Limitations and Bypassing them.
-
I am trying to get all the phone numbers of users for a group, to send them notifications. The number of users are in the range of 1-2 Million. I have a couple of questions,
- Is FA the best place to have such time of queries or should I create a pipeline and publish updates elsewhere?
- Is there an API planned which can allow me to get users data with specific fields - name, phone number or just the data field?
- Is it advisable to modify elastic search to index only certain properties and get results directly from there without fetching the database - Bypassing FA APIs? (I have updated the index.max_result_window to the number of results that I am expecting). Is there is way to make those queries through FA?
-
Also there is an issue of the query that FA uses to get the results from PSQL. So there is limit of ~32k even if elastic search is not the limiting factor. https://stackoverflow.com/questions/1009706/postgresql-max-number-of-parameters-in-in-clause.
-
Is FA the best place to have such time of queries or should I create a pipeline and publish updates elsewhere?
I think you have a couple of choices:
- build a pipeline based on webhooks, possibly publishing each user to s3 or another datastore
- querying elasticsearch directly
- querying fusionauth
- changing to the database engine
Given your other requirements (the ability to pick off just the phone number and name), I'd test out querying elasticsearch directly, or, if your queries are all simple, switching to the database search engine.
I'd pursue the pipeline approach only if the direct queries didn't perform well, because it's more moving pieces to break.
Is there an API planned which can allow me to get users data with specific fields - name, phone number or just the data field?
This is not currently planned, but feel free to open a feature request with more details. Feel free to reference this post and your previous github comment. I could see that being a useful API change, similar to how you can specify sort fields.
Is it advisable to modify elastic search to index only certain properties and get results directly from there without fetching the database - Bypassing FA APIs? (I have updated the index.max_result_window to the number of results that I am expecting). Is there is way to make those queries through FA?
We pretty much pass through the elasticsearch queries, letting ES do what it is good at. Modifying the elasticsearch index settings shouldn't cause any issues, but is not something we test.
You'll also want to keep an eye on https://github.com/FusionAuth/fusionauth-issues/issues/494 which should make it possible to query directly from fusionauth.
Also there is an issue of the query that FA uses to get the results from PSQL. So there is limit of ~32k even if elastic search is not the limiting factor. https://stackoverflow.com/questions/1009706/postgresql-max-number-of-parameters-in-in-clause.
I don't believe that this limit applies. Have you seen this limit?