Bulk deletion by tenantId throws error
-
I'm trying to execute the following using `@fusionauth/typescript-client`:

```typescript
try {
  const result = await fusionAuthClient.deleteUsers({
    hardDelete: true,
    queryString: 'tenantId:886090a4-dd49-44bd-b6ba-c2cbdc3e7d21'
  })
  total = result.response.total
} catch (e) {
  console.error(e)
  debugger
}
```
When I run the above, I get a `statusCode: 503` with the following error in fusionauth-search.log:
```
[Aug 08, 2020 10:22:56.873 AM][DEBUG][o.e.a.s.TransportSearchAction] [auth] [fusionauth_user][0], node[FAaAggPGSieF4t__1kgM5Q], [P], s[STARTED], a[id=fH-Q-YNpSWCbYywRSli7gQ]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[fusionauth_user], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"from":9698,"size":500,"query":{"query_string":{"query":"tenantId:886090a4-dd49-44bd-b6ba-c2cbdc3e7d21","fields":[],"type":"best_fields","default_operator":"or","max_determinized_states":10000,"enable_position_increments":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"phrase_slop":0,"escape":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_transpositions":true,"boost":1.0}},"sort":[{"_score":{"order":"desc"}},{"fullName":{"order":"desc"}},{"email":{"order":"desc"}}]}}] lastShard [true]
org.elasticsearch.transport.RemoteTransportException: [auth][127.0.0.1:9020][indices:data/read/search[phase/query]]
Caused by: java.lang.IllegalArgumentException: Result window is too large, from + size must be less than or equal to: [10000] but was [10198]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.
    at org.elasticsearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:211) ~[elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.search.query.QueryPhase.preProcess(QueryPhase.java:113) ~[elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:603) ~[elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:550) ~[elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:351) ~[elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$1(SearchService.java:343) ~[elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:146) ~[elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58) [elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) [elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.1.jar:7.6.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.1.jar:7.6.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    at java.lang.Thread.run(Thread.java:832) [?:?]
```
How do I efficiently delete all users of a tenant?
-
@twosevenxyz said in Bulk deletion by tenantId throws error:
> Caused by: java.lang.IllegalArgumentException: Result window is too large, from + size must be less than or equal to: [10000] but was [10198]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window]
Looks like the above may be the issue.
For your immediate concern, I'd just add another query parameter (the application, maybe) to take the number of matched users below the 10,000 limit.
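For example, here's a hedged sketch of what that could look like with the typescript client; the API key, the host, and `registrations.applicationId` as a searchable field are my assumptions, so verify the field name against your index before relying on it:

```typescript
import { FusionAuthClient } from '@fusionauth/typescript-client'

// Hypothetical sketch: narrow the delete so it matches fewer than 10,000
// users by also filtering on an application. 'registrations.applicationId'
// is an assumed field name; check it against your search index.
const client = new FusionAuthClient('YOUR_API_KEY', 'http://localhost:9011')

const result = await client.deleteUsers({
  hardDelete: true,
  queryString:
    'tenantId:886090a4-dd49-44bd-b6ba-c2cbdc3e7d21 ' +
    'AND registrations.applicationId:YOUR_APPLICATION_ID',
})
console.log(`deleted ${result.response.total} users`)
```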
For the longer term, if you are going to be doing this often, I'd consider increasing the size of your `max_result_window`. Please test this as I'm not sure of the memory or performance ramifications, but I'd expect Elasticsearch to need more memory if more results were pulled. Here's a Stack Overflow post on how to make this change: https://stackoverflow.com/questions/40184200/how-to-increase-the-max-result-window-in-elasticsearch-using-a-python-script

This issue may also be of interest: https://github.com/FusionAuth/fusionauth-issues/issues/494
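For what it's worth, here is a minimal sketch of that settings change as a direct call to the search engine's REST API, assuming FusionAuth's embedded Elasticsearch listens on the default HTTP port 9021, a Node 18+ runtime with global fetch, and 50000 as a purely illustrative value:

```typescript
// Hedged sketch: raise index.max_result_window on the fusionauth_user index.
// Port 9021 (FusionAuth's default search HTTP port) and the 50000 value are
// assumptions; test the memory impact before relying on this in production.
const res = await fetch('http://localhost:9021/fusionauth_user/_settings', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ index: { max_result_window: 50000 } }),
})
console.log(res.status, await res.json())
```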
-
Forgive me if I sound entitled, but I'm a bit perplexed by this.
In a standard FusionAuth deployment, as documented on fusionauth.io, is this API unusable if the search result exceeds 10,000 users? That would mean nearly all of the APIs documented here are unusable without tweaking Elasticsearch in one way or another.
Additionally, could you recommend what I should do if my total user count simply will not fit in memory?
-
No worries! I'm not aware of folks doing large bulk deletes. We love to hear about people's use cases and I'd suggest filing an issue explaining what you're trying to do, if you run into issues. https://github.com/fusionauth/fusionauth-issues/issues
I confess, I'm curious as to why you're deleting all your users? Is it CI/CD? Tests? Time limited accounts? Something else?
If your user count won't fit in memory, I'd try the following:
- limit the query by adding an additional parameter (perhaps just the first letter of the email address or something simple) and do multiple bulk deletes, as sketched below
- increase the amount of memory available to Elasticsearch via `fusionauth-search.memory` (https://fusionauth.io/docs/v1/tech/reference/configuration)
- if you can get by with less sophisticated searching, try the database search engine. Here's how to switch: https://fusionauth.io/docs/v1/tech/tutorials/switch-search-engines
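To make the first bullet concrete, here's a rough sketch that slices one oversized delete into many smaller ones by email prefix; the `[a-z0-9]` prefix scheme and the assumption that each slice stays under 10,000 users are mine, not anything FusionAuth prescribes:

```typescript
import { FusionAuthClient } from '@fusionauth/typescript-client'

// Hypothetical sketch: run one bulk delete per leading email character so no
// single query exceeds the 10,000-result window. Assumes every email starts
// with [a-z0-9]; extend the character set if yours don't.
const client = new FusionAuthClient('YOUR_API_KEY', 'http://localhost:9011')
const tenantId = '886090a4-dd49-44bd-b6ba-c2cbdc3e7d21'

for (const prefix of 'abcdefghijklmnopqrstuvwxyz0123456789') {
  const result = await client.deleteUsers({
    hardDelete: true,
    queryString: `tenantId:${tenantId} AND email:${prefix}*`,
  })
  console.log(`${prefix}: deleted ${result.response.total ?? 0} users`)
}
```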
To be clear, I don't know whether it will work with the default settings and an increased `max_result_window`, as I haven't tried it; I just wanted to caution you to test (which you probably don't need me to remind you of).

-
The reason for the bulk deletion ties into another question I asked recently. I imported 850K+ users, and since I had not registered them against all applications, I decided to delete all users and re-import.
-
Ah.
You may want to consider dropping the database. That will remove all your users (and everything else) and you'll start with a clean slate. I do this often:
- stop fusionauth
- drop the database
- drop the database user
- start fusionauth
FusionAuth then recreates the database and you start over with a fresh install.
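If it helps, here's a hedged sketch of those drop steps against PostgreSQL using the `pg` package; the `fusionauth` database/user names and the superuser credentials are assumptions based on a typical install, so substitute your own:

```typescript
import { Client } from 'pg'

// Hypothetical sketch: with FusionAuth stopped, drop its database and user.
// Assumes PostgreSQL with the default 'fusionauth' names; adjust credentials.
const admin = new Client({
  host: 'localhost',
  user: 'postgres',
  password: 'YOUR_SUPERUSER_PASSWORD',
  database: 'postgres', // connect somewhere other than the database being dropped
})

await admin.connect()
await admin.query('DROP DATABASE IF EXISTS fusionauth')
await admin.query('DROP USER IF EXISTS fusionauth')
await admin.end()
// Start FusionAuth again and it will recreate the schema from scratch.
```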
That may or may not work for you, just wanted to offer another path.
-
Sadly, that isn't a viable option for me. I spent a considerable amount of time setting up email templates, and I don't want to lose those.
It would be great if we could export these non-user settings and later re-import them. This might be a good case for a GitHub issue/feature request. I will go ahead and create that.
-
Ah, makes sense.
You should be able to export the email templates via the retrieve email template API and then re-import them using the same API. It may take a bit of fiddling, but it should be possible. In fact, you may want to capture the email templates in a Kickstart file for easier future deployment/dev environment setup: https://fusionauth.io/docs/v1/tech/installation-guide/kickstart
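A rough sketch of that round trip with the typescript client might look like this; the method names follow the client's conventions for the /api/email/template endpoints, but treat them (and the hosts/keys) as assumptions to verify against your client version:

```typescript
import { FusionAuthClient } from '@fusionauth/typescript-client'

// Hedged sketch: export every email template from one instance and re-create
// each on another. Hosts and API keys are placeholders.
const source = new FusionAuthClient('SOURCE_API_KEY', 'http://localhost:9011')
const target = new FusionAuthClient('TARGET_API_KEY', 'http://new-host:9011')

const { response } = await source.retrieveEmailTemplates()
for (const template of response.emailTemplates ?? []) {
  // Reusing the original id keeps references to the template stable.
  await target.createEmailTemplate(template.id, { emailTemplate: template })
}
```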
I think we already have some issues about configuration migration, so you may want to check them out and upvote them if they convey what you'd like: