scala - How can I speed up scalaz-stream text processing?
How can I speed up the following scalaz-stream code? It takes 5 minutes to process 70 MB of text, so I must be doing something quite wrong, since a plain Scala equivalent would take a few seconds.
(Follow-up to another question.)
import scalaz.concurrent.Task
import scalaz.stream._

val converter2: Task[Unit] = {
  val docsep = "~~~"
  io.linesR("myinput.txt")
    // A separator line becomes the separator plus the rest of that line.
    .flatMap(line => {
      val words = line.split(" ")
      if (words.length == 0 || words(0) != docsep) Process(line)
      else Process(docsep, words.tail.mkString(" "))
    })
    // Group the lines between separators into one document each.
    .split(_ == docsep)
    .filter(_ != Vector())
    .map(lines => lines.head + ": " + lines.tail.mkString(" "))
    .intersperse("\n")
    .pipe(text.utf8Encode)
    .to(io.fileChunkW("correctbutslowoutput.txt"))
    .run
}
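I think you could use one of the process1 chunk methods to do the chunking. If you want a lot of parallel processing on the merge of the lines into your output format, decide whether ordered output is important and use a channel combined with merge or tee. That would also make it reusable. Because you are doing only a small amount of processing per line, you are probably swamped by overhead, so you have to work harder to make your unit of work large enough not to be swamped.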
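To make the "larger unit of work" idea concrete, here is a minimal sketch in plain Scala. It uses only the standard library, not the scalaz-stream API, so it shows the batching idea and is not a drop-in fix for the pipeline in the question. `Iterator.grouped` stands in for the role the answer gives to the process1 chunk methods. The names `ChunkedDocs`, `tokens`, `format`, `convert` and `batchSize` are hypothetical, and the default batch size of 4096 is an arbitrary assumption, not a tuned value.

```scala
object ChunkedDocs {
  val DocSep = "~~~" // same separator as in the question

  // Turn one batch of raw lines into tokens: a separator line becomes the
  // separator itself followed by the remainder of that line.
  def tokens(batch: Seq[String]): Seq[String] =
    batch.flatMap { line =>
      val words = line.split(" ")
      if (words.isEmpty || words(0) != DocSep) Seq(line)
      else Seq(DocSep, words.tail.mkString(" "))
    }

  // Format one document as its first token, then ": ", then the remaining
  // tokens joined by spaces (the same output shape as in the question).
  def format(doc: Vector[String]): String =
    doc.head + ": " + doc.tail.mkString(" ")

  // Process the input one batch at a time. A document that is still open at
  // a batch boundary is carried over in `pending`.
  def convert(lines: Iterator[String], batchSize: Int = 4096): Iterator[String] = {
    var pending = Vector.empty[String]
    val finished = lines.grouped(batchSize).flatMap { batch =>
      val done = Vector.newBuilder[String]
      tokens(batch).foreach { t =>
        if (t == DocSep) {
          // A separator closes the current document, if there is one.
          if (pending.nonEmpty) done += format(pending)
          pending = Vector.empty
        } else pending :+= t
      }
      done.result()
    }
    // The argument of ++ is by-name, so it is evaluated only after
    // `finished` is exhausted; this flushes the last open document.
    finished ++ (if (pending.nonEmpty) Iterator(format(pending)) else Iterator.empty)
  }
}
```

The formatted documents can then be joined with "\n" and written through a buffered writer. Whether batching alone closes the gap to the plain Scala timing depends on where the overhead actually sits, so it is worth measuring a few batch sizes.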