elasticsearch - Updating a field of a million documents from a worker
I currently have to update a field in 1 million documents indexed in Elasticsearch. It is a complex task because the field contains metadata generated from XML files by evaluating XPath expressions. I have to loop over the documents in the index and update the field, so, in order to avoid overloading the system, I decided to use the IronWorker platform.
I have read several posts about how to update millions of docs in Elasticsearch, like this one, but given that I am going to use IronWorker there are restrictions: a task can run for at most 60 minutes.
My question: how can I loop over the documents and update the fields, considering the 60-minute restriction?
I thought of opening a scroll and passing the scroll_id to the next worker, but I have no idea how long it would take for the next task to start executing; the scroll could expire and I would have to start over.
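For reference, this is roughly what I had in mind, using the elasticsearch-py client; the index name, batch size, and 10-minute keep-alive below are placeholders, not my real setup:

```python
# Rough sketch of the scroll idea (elasticsearch-py client); index name,
# batch size, and keep-alive window are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Open a scroll over the whole index; the scroll_id is what would have to be
# passed to the next worker before the keep-alive window expires.
resp = es.search(
    index="documents",
    scroll="10m",
    size=500,
    body={"query": {"match_all": {}}},
)
scroll_id = resp["_scroll_id"]
hits = resp["hits"]["hits"]
# ...update the field on each hit, then hand scroll_id to the next task...
```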
It sounds from your description like you want to chain IronWorker tasks, which is easy. If you have an idea of how long it takes to update a single item, you can extrapolate how long you'll need. Let's say it takes 100 ms to update an item: that's 10 per second, or 600 per minute, so maybe 6,000 per task (which should take about 10 minutes), then queue the next task from code. Queuing the next task is as easy as queuing the first task: http://dev.iron.io/worker/reference/api/#queue_a_task (you can use a client library for your language too).
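Something roughly like this could queue the follow-up task over HTTP; this is only a sketch, and the project ID, token, worker code name, and payload shape are placeholders, so check the API docs linked above for the exact request format:

```python
# Sketch of queuing the next task through the IronWorker HTTP API.
# PROJECT_ID, TOKEN, and the worker code name are placeholders.
import json
import requests

IRON_API = "https://worker-aws-us-east-1.iron.io/2"
PROJECT_ID = "YOUR_PROJECT_ID"
TOKEN = "YOUR_TOKEN"

def queue_next_task(resume_after_id):
    """Queue the next worker, telling it where the previous one stopped."""
    resp = requests.post(
        f"{IRON_API}/projects/{PROJECT_ID}/tasks",
        headers={"Authorization": f"OAuth {TOKEN}"},
        json={"tasks": [{
            "code_name": "update-metadata-worker",              # your worker's name
            "payload": json.dumps({"resume_after": resume_after_id}),
        }]},
    )
    resp.raise_for_status()

# In the worker body: update ~6,000 documents, then hand off, e.g.
# queue_next_task(last_updated_doc_id)
```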
Or stop after X minutes and queue the next worker.
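In code, that time-based variant could look something like this; the 50-minute budget and the helper functions are hypothetical stand-ins for your own logic:

```python
# Sketch of the time-based cutoff: work until a safety margin before the
# 60-minute limit, then chain to a new worker.
import time

def fetch_next_batch():
    """Hypothetical helper: return the next slice of documents, or [] when done."""
    return []

def update_document(doc):
    """Hypothetical helper: re-evaluate the XPath metadata and update the doc."""
    pass

def queue_next_task():
    """Hypothetical helper: queue the follow-up IronWorker task (see above)."""
    pass

TIME_BUDGET_SECONDS = 50 * 60        # leave a margin before the 60-minute cap
start = time.time()
finished = False

while time.time() - start < TIME_BUDGET_SECONDS:
    batch = fetch_next_batch()
    if not batch:
        finished = True              # nothing left to update
        break
    for doc in batch:
        update_document(doc)

if not finished:
    queue_next_task()                # hand the remaining work to a new worker
```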
Or, if you want to make things faster, how about queuing 26 workers at the same time, one for each letter of the alphabet? Each one can query the items starting with the letter it's assigned (a prefix query), as in the sketch below.
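A sketch of what each of those 26 workers could run, assuming the documents have some keyword field to match on; the index and field names here are made up:

```python
# Sketch of the fan-out idea: each worker handles the documents whose field
# value starts with its assigned letter, via a prefix query.
# The index name and the "title" field are placeholders.
import string
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def documents_for_letter(letter):
    """Return the documents assigned to the worker handling one letter."""
    resp = es.search(
        index="documents",
        size=1000,
        body={"query": {"prefix": {"title": letter}}},
    )
    return resp["hits"]["hits"]

# The coordinator would queue one task per letter, e.g.:
# for letter in string.ascii_lowercase:
#     queue_task(payload={"letter": letter})   # queue_task as sketched above
```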
There are many ways to slice this problem.