scala - Tokenizing strings in a "String RDD" returning another RDD


I have a Spark RDD of individual string values, each string formed out of words separated by | symbols.

This RDD was generated by a SparkSQL query, not by a .textFile(...) load operation.

I can't (unless I'm misunderstanding something fundamental) use the .flatMap(_.split("|")) operation, because it flattens each string into individual characters before applying .split().

However, I need .flatMap() because I need a one-to-many mapping. The data set is potentially large and I need the operation to parallelize, hence the use of RDDs and related operations.

Interestingly, when processing strings from RDDs loaded using .textFile(...), the .flatMap(...) operation does what I want! I'm guessing there must be a way...

Any help or suggestions appreciated!

Thanks!

Well, I'm not sure I understand the problem, but I'll try to help.

In .flatMap(_.split("|")), split breaks up the words of each line, and at the end the results are flattened. If you don't need to flatten the result, perhaps you can use .map(_.split("|")).
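One possible cause of the per-character behaviour, which neither the question nor the answer above mentions, is that String.split(String) treats its argument as a regular expression, and an unescaped | is the regex alternation operator rather than a literal pipe. The sketch below uses a plain Scala Seq with made-up sample rows instead of a real RDD, on the assumption that the function passed to flatMap behaves the same way per element in both cases. The object name PipeSplit and the sample data are hypothetical.

```scala
object PipeSplit {
  // Hypothetical sample rows standing in for the RDD's string values.
  val rows: Seq[String] = Seq("alpha|beta", "gamma|delta|epsilon")

  // split(String) takes a regex. An unescaped "|" is an alternation of two
  // empty patterns, so it matches between characters instead of on the pipe.
  val unescaped: Seq[String] = rows.flatMap(_.split("|"))

  // Escaping the pipe makes the regex match a literal '|'.
  val escaped: Seq[String] = rows.flatMap(_.split("\\|"))

  // Scala's Char overload of split is not interpreted as a regex.
  val byChar: Seq[String] = rows.flatMap(_.split('|'))

  def main(args: Array[String]): Unit = {
    println(unescaped.mkString(" "))
    println(escaped.mkString(" "))
    println(byChar.mkString(" "))
  }
}
```

If this is indeed the cause, passing either `_.split("\\|")` or `_.split('|')` to the RDD's .flatMap should yield one element per word, while .map with the same function would keep one array per input string.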

