Text Mining Cleanup with Ruby & Regex
I have a word-count hash like the following:

    words = {
      "love" => 10, "hate" => 12, "lovely" => 3, "loving" => 2, "loved" => 1,
      "peace" => 14, "thanks" => 3, "wonderful" => 10, "grateful" => 10
      # there are more, but you get the idea
    }

I want "love", "loved" and "loving" all to be counted as "love". Their counts should be added to the count for "love", and the other variations of "love" removed from the hash. At the same time, I don't want "lovely" counted as "love"; it should be preserved as is.

So in the end I'd like to have:

    words = {
      "love" => 13, "hate" => 12, "lovely" => 3,
      "peace" => 14, "thanks" => 3, "wonderful" => 10, "grateful" => 10
      # there are more, but you get the idea
    }

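If the list of variants is known up front, one option is to skip the regex entirely. The following is a minimal sketch of that approach: an explicit lookup table maps each variant to the word it should be counted as, and any word not in the table (such as "lovely") keeps its own key. CANONICAL and merge_variants are hypothetical names, and the table has to list every variant you care about.

```ruby
# Hypothetical lookup table: each known variant points at the word it should be counted as.
# Any word that is not listed here (e.g. "lovely") keeps its own key.
CANONICAL = { "loved" => "love", "loving" => "love" }.freeze

# Builds a new hash; Hash.new(0) lets us add to a key that has not been seen yet.
def merge_variants(counts, canonical)
  counts.each_with_object(Hash.new(0)) do |(word, count), merged|
    merged[canonical.fetch(word, word)] += count
  end
end

words = {
  "love" => 10, "hate" => 12, "lovely" => 3, "loving" => 2, "loved" => 1,
  "peace" => 14, "thanks" => 3, "wonderful" => 10, "grateful" => 10
}

result = merge_variants(words, CANONICAL)
p result
```

Here "love" ends up with 10 + 2 + 1 = 13, and "lovely" stays at 3. Compared with a pattern like /\Alov/, the table never sweeps in unexpected words such as "lovelorn", at the cost of maintaining the list by hand.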
I have some code that sort of works, but I think the logic of the last line is wrong. Can anyone help me fix it or suggest a better way of doing this?

    purgedwordcount = words.select { |k, _| /\Alov[a-z]*/.match(k) }
    words["love"] = purgedwordcount.select { |k, _| /\Alov[a-z]*/.match(k) }.map(&:last).reduce(:+) - 1 # the 1 is for "lovely"

I tried not to hard-code that number by using words["lovely"] instead, but that messed things up completely, so I ended up with this. Then I remove the other variations:

    words.delete_if { |k, _| /\Alov[a-z]*/.match(k) && k != "love" && k != "lovely" }

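Summing every key that matches /\Alov[a-z]*/ also picks up "lovely", so whatever gets subtracted afterwards has to equal words["lovely"] (3 in this sample). The following sketch keeps the same select/delete idea but avoids the hard-coded subtraction. It first collects only the variant keys, then adds just their counts to "love" and deletes them. KEEP and variants are illustrative names. The sketch assumes Ruby 2.4+ for String#match? and Array#sum with a block.

```ruby
words = {
  "love" => 10, "hate" => 12, "lovely" => 3, "loving" => 2, "loved" => 1,
  "peace" => 14, "thanks" => 3, "wonderful" => 10, "grateful" => 10
}

# Words that match the pattern but must keep their own entry (illustrative list).
KEEP = ["love", "lovely"].freeze

# Keys that start with "lov" and are not in KEEP are treated as variants of "love".
variants = words.keys.select { |k| k.match?(/\Alov/) && !KEEP.include?(k) }

# Add only the variant counts to "love" (fetch defaults to 0 if "love" is missing),
# then drop the variant keys from the hash.
words["love"] = words.fetch("love", 0) + variants.sum { |k| words[k] }
variants.each { |k| words.delete(k) }

p words
```

Because "lovely" is never part of the sum, nothing has to be subtracted. The sketch modifies words in place, as the original code does.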
Thanks!
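You can build a new hash with inject. Map every word that starts with "lov" (except "lovely") to "love", and add up the counts:

    words = {
      "love" => 10, "hate" => 12, "lovely" => 3, "loving" => 2, "loved" => 1,
      "peace" => 14, "thanks" => 3, "wonderful" => 10, "grateful" => 10
      # there are more, but you get the idea
    }

    aggregated_words = words.inject({}) do |memo, (word, count)|
      # Count every "lov..." word except "lovely" under "love".
      key = word =~ /\Alov.+/ && word != "lovely" ? "love" : word
      memo[key] = memo[key].to_i + count # nil.to_i is 0 for a key not seen yet
      memo
    end
    # => {"love"=>13, "hate"=>12, "lovely"=>3, "peace"=>14, "thanks"=>3, "wonderful"=>10, "grateful"=>10}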
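The next sketch keeps the same rule as the answer but moves it into a named lambda. It also replaces inject with group_by plus transform_values (Ruby 2.4+). canonical_key is a hypothetical helper name.

```ruby
words = {
  "love" => 10, "hate" => 12, "lovely" => 3, "loving" => 2, "loved" => 1,
  "peace" => 14, "thanks" => 3, "wonderful" => 10, "grateful" => 10
}

# Hypothetical helper: decide which key a word should be counted under.
canonical_key = ->(word) { word.match?(/\Alov.+/) && word != "lovely" ? "love" : word }

# group_by collects the [word, count] pairs under their canonical key;
# transform_values then adds up the counts inside each group.
aggregated = words.group_by { |word, _count| canonical_key.(word) }
                  .transform_values { |pairs| pairs.sum { |_word, count| count } }

p aggregated
```

Like the inject version, this leaves words unchanged. A plain words.transform_keys(&canonical_key) would not work here. When several keys map to "love", the later entries overwrite the earlier ones instead of being added together.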