R - RHadoop - wordcount using rmr
I am trying to run a simple rmr job using the RHadoop package, but it is not working. Here is my R script:
    print("initializing variable.....")
    Sys.setenv(HADOOP_HOME = "/usr/hdp/2.2.4.2-2/hadoop")
    Sys.setenv(HADOOP_CMD = "/usr/hdp/2.2.4.2-2/hadoop/bin/hadoop")
    print("invoking functions.......")

    # Reference taken from Revolution Analytics
    wordcount = function(input, output = NULL, pattern = " ") {
      mapreduce(
        input = input,
        output = output,
        input.format = "text",
        map = wc.map,
        reduce = wc.reduce,
        combine = TRUE)
    }

    wc.map = function(., lines) {
      keyval(unlist(strsplit(x = lines, split = pattern)), 1)
    }

    wc.reduce = function(word, counts) {
      keyval(word, sum(counts))
    }

    # Function invoke
    wordcount('/user/hduser/rmr/wcinput.txt')
I am running the above script with:

    Rscript wordcount.r

I am getting the error below.
    [1] "initializing variable....."
    [1] "invoking functions......."
    Error in wordcount("/user/hduser/rmr/wcinput.txt") :
      could not find function "mapreduce"
    Execution halted
Kindly let me know what the issue is.
Firstly, you'll have to set the HADOOP_STREAMING environment variable in your code.
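Before submitting a job, it can help to confirm that these variables are actually set and point at something that exists on disk. The following is a minimal sketch in base R; `check_hadoop_env` is a hypothetical helper name, not part of rmr2 or rhdfs, and it only checks that each path exists, not that it is a valid Hadoop binary or streaming jar.

```r
# Hypothetical helper (not part of rmr2/rhdfs): for each environment
# variable, report whether it is set and whether its path exists locally.
check_hadoop_env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  # unset = NA lets us tell "not set" apart from "set to an empty string"
  values <- Sys.getenv(vars, unset = NA, names = FALSE)
  is_set <- !is.na(values) & nzchar(values)
  # file.exists("") is FALSE, so unset variables are reported as missing
  path_exists <- file.exists(ifelse(is_set, values, ""))
  data.frame(
    variable = vars,
    is_set = is_set,
    exists = path_exists,
    stringsAsFactors = FALSE
  )
}

print(check_hadoop_env())
```

If either row shows `FALSE`, the `Sys.setenv()` calls probably need different paths for your installation.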
Try the code below, and note that it assumes you have copied your text file to the HDFS folder example/wordcount/data.
R code:

    Sys.setenv("HADOOP_CMD" = "/usr/local/hadoop/bin/hadoop")
    Sys.setenv("HADOOP_STREAMING" = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")

    # load libraries
    library(rmr2)
    library(rhdfs)

    # initiate rhdfs package
    hdfs.init()

    map <- function(k, lines) {
      words.list <- strsplit(lines, '\\s')
      words <- unlist(words.list)
      return( keyval(words, 1) )
    }

    reduce <- function(word, counts) {
      keyval(word, sum(counts))
    }

    wordcount <- function(input, output = NULL) {
      mapreduce(input = input, output = output, input.format = "text",
                map = map, reduce = reduce)
    }

    ## read text files from folder example/wordcount/data
    hdfs.root <- 'example/wordcount'
    hdfs.data <- file.path(hdfs.root, 'data')

    ## save result in folder example/wordcount/out
    hdfs.out <- file.path(hdfs.root, 'out')

    ## submit job
    out <- wordcount(hdfs.data, hdfs.out)

    ## fetch results from HDFS
    results <- from.dfs(out)
    results.df <- as.data.frame(results, stringsAsFactors = FALSE)
    colnames(results.df) <- c('word', 'count')

    head(results.df)
Output:
word count 16 5 b. 1 13 23 7
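To see what the map and reduce functions compute without a Hadoop cluster, the same logic can be imitated in base R. This is only a sketch of the idea: `local_wordcount` is a hypothetical name, it does not use rmr2, and dropping empty strings is an added assumption rather than something the job above does.

```r
# Hypothetical base-R imitation of the word count job (no Hadoop, no rmr2).
local_wordcount <- function(lines) {
  # "Map" step: split every line on whitespace, one word per element.
  words <- unlist(strsplit(lines, '\\s'))
  # Assumption: drop empty strings left by leading or repeated whitespace.
  words <- words[nzchar(words)]
  # "Reduce" step: emit a 1 per occurrence and sum the 1s for each word.
  counts <- tapply(rep(1, length(words)), words, sum)
  data.frame(
    word = names(counts),
    count = as.numeric(counts),
    stringsAsFactors = FALSE
  )
}

sample_lines <- c("the quick brown fox", "the lazy dog", "the end")
print(local_wordcount(sample_lines))
```

Running this on a few lines of your own input is a quick way to check the splitting pattern before submitting the real job.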
For reference, here is an example of running an R word count MapReduce program.
Hope this helps.