scala - Tokenizing strings in a "String RDD" returning another RDD
I have a Spark RDD of individual string values, each string formed out of words separated by | symbols.
This RDD was generated by a SparkSQL query, not by a .textFile(...) load operation.
I can't (unless I'm misunderstanding something fundamental) use a .flatMap(_.split("|")) operation, because it flattens each string into individual characters before applying the .split().
However, I need the .flatMap() because I need a one-to-many mapping. The data set is potentially large and I need the operation to parallelize, hence the use of RDDs and related operations.
Interestingly, when processing strings from RDDs loaded using .textFile(...), the .flatMap(...) operation does what I want! I'm guessing there must be a way...
Any help or suggestions appreciated!
Thanks!
Well, I'm not sure I understand the problem, but I'll try to help.
In .flatMap(_.split("|")) the split breaks up the words of each line, and at the end the result is flattened. If you don't need to flatten the result, perhaps you can use .map(_.split("|")).
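One thing that may explain the character-by-character result: String.split with a String argument treats that argument as a regular expression, and an unescaped "|" is an empty alternation that matches between every character. The thread does not confirm this is the cause, so take the following as a minimal sketch rather than a diagnosis. It uses plain Scala collections in place of the RDD so that it runs without Spark, and the object name and sample rows are made up for illustration.

```scala
// Plain Scala collections stand in for the RDD here (an assumption, made so the
// sketch runs without a Spark installation). RDD.map and RDD.flatMap have the
// same one-to-one and one-to-many semantics as the collection methods below.
object SplitDemo {
  // Hypothetical sample rows, standing in for the strings returned by the query.
  val rows: Seq[String] = Seq("alpha|beta", "gamma|delta|epsilon")

  // split(String) interprets its argument as a regex, so the pipe must be
  // escaped to be matched literally.
  val byRegex: Seq[String] = rows.flatMap(_.split("\\|"))

  // The Char overload treats the separator as a literal character.
  val byChar: Seq[String] = rows.flatMap(_.split('|'))

  // map keeps one list of tokens per input row instead of flattening them.
  val perRow: Seq[List[String]] = rows.map(_.split('|').toList)

  def main(args: Array[String]): Unit = {
    println(byChar.mkString(", "))
    println(perRow.map(_.mkString("[", ", ", "]")).mkString(", "))
  }
}
```

If the one-to-many mapping is what you need, the flatMap variants give a single flat collection of tokens; the map variant keeps the tokens grouped by the row they came from.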