R - RHadoop - wordcount using rmr


I am trying to run a simple rmr job using the RHadoop package, but it is not working. Here is my R script:

    print("initializing variable.....")
    Sys.setenv(HADOOP_HOME = "/usr/hdp/2.2.4.2-2/hadoop")
    Sys.setenv(HADOOP_CMD = "/usr/hdp/2.2.4.2-2/hadoop/bin/hadoop")
    print("invoking functions.......")

    # reference taken from Revolution Analytics
    wordcount = function(input, output = NULL, pattern = " ") {
      mapreduce(
        input = input,
        output = output,
        input.format = "text",
        map = wc.map,
        reduce = wc.reduce,
        combine = TRUE)
    }

    wc.map = function(., lines) {
      keyval(unlist(strsplit(x = lines, split = pattern)), 1)
    }

    wc.reduce = function(word, counts) {
      keyval(word, sum(counts))
    }

    # function invoke
    wordcount('/user/hduser/rmr/wcinput.txt')

I am running the above script with:

    Rscript wordcount.r

I am getting the error below.

    [1] "initializing variable....."
    [1] "invoking functions......."
    Error in wordcount("/user/hduser/rmr/wcinput.txt") :
      could not find function "mapreduce"
    Execution halted

Kindly let me know what the issue is.

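Firstly, you'll have to set the HADOOP_STREAMING environment variable in your code. The "could not find function" error also means that the rmr2 package was never loaded in your script, so it needs a library(rmr2) call before mapreduce() is used.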
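Before submitting a job, it can help to confirm that both preconditions hold. The following is only a minimal sketch in base R: `check_hadoop_env` is a hypothetical helper written for this illustration, not part of rmr2, and it only checks that each variable is set and points at an existing path.

```r
# Hypothetical helper (not part of rmr2): for each Hadoop environment
# variable, report whether it is set and points at an existing path.
check_hadoop_env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  values <- Sys.getenv(vars, unset = "")
  ok <- nzchar(values) & file.exists(values)
  names(ok) <- vars
  ok
}

status <- check_hadoop_env()
print(status)

# requireNamespace() returns FALSE instead of stopping when a package is
# missing, so it is a safe way to test whether rmr2 is installed.
if (!requireNamespace("rmr2", quietly = TRUE)) {
  message("rmr2 is not installed, so mapreduce() cannot be found")
}
```

If either entry in `status` is `FALSE`, the corresponding `Sys.setenv()` call is missing or points at the wrong location for your installation.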

Try the code below, and note that the code assumes you have copied your text file to the HDFS folder example/wordcount/data.

R code:

    Sys.setenv("HADOOP_CMD" = "/usr/local/hadoop/bin/hadoop")
    Sys.setenv("HADOOP_STREAMING" = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")

    # load libraries
    library(rmr2)
    library(rhdfs)

    # initiate rhdfs package
    hdfs.init()

    # map: split each line on whitespace and emit a (word, 1) pair per word
    map <- function(k, lines) {
      words.list <- strsplit(lines, '\\s')
      words <- unlist(words.list)
      return(keyval(words, 1))
    }

    # reduce: sum the counts emitted for each word
    reduce <- function(word, counts) {
      keyval(word, sum(counts))
    }

    wordcount <- function(input, output = NULL) {
      mapreduce(input = input, output = output, input.format = "text",
                map = map, reduce = reduce)
    }

    ## read text files from folder example/wordcount/data
    hdfs.root <- 'example/wordcount'
    hdfs.data <- file.path(hdfs.root, 'data')

    ## save result in folder example/wordcount/out
    hdfs.out <- file.path(hdfs.root, 'out')

    ## submit job
    out <- wordcount(hdfs.data, hdfs.out)

    ## fetch results from HDFS
    results <- from.dfs(out)
    results.df <- as.data.frame(results, stringsAsFactors = FALSE)
    colnames(results.df) <- c('word', 'count')

    head(results.df)

Output:

word count      16       5   b.     1      13      23       7 
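To see what the map and reduce steps compute without a cluster, the same logic can be sketched in plain R. This is an illustration only, not how rmr2 executes the job: the input lines are made up, `tapply()` stands in for the shuffle and reduce phases, and the split pattern is `\\s+` rather than the `\\s` used above so that runs of whitespace do not produce empty words.

```r
# Toy input standing in for the lines of the HDFS text file (made up).
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map step: split each line on whitespace, one (word, 1) pair per word.
words <- unlist(strsplit(lines, "\\s+"))
words <- words[nzchar(words)]  # drop empty tokens from leading whitespace
ones <- rep(1, length(words))

# Reduce step: sum the counts for each distinct word.
counts <- tapply(ones, words, sum)

results.df <- data.frame(word = names(counts),
                         count = as.vector(counts),
                         stringsAsFactors = FALSE)
print(results.df)
```

The resulting data frame has the same two columns, word and count, as the one built from `from.dfs(out)` in the job above.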

For reference, here is another example of running an R word count MapReduce program.

Hope this helps.

