pattern matching - Clustering Using MapReduce


I have unstructured Twitter data that was retrieved with Apache Flume and stored in HDFS. I want to convert this unstructured data into structured data using MapReduce. The tasks I want to do with MapReduce are:
1. Convert the unstructured data into structured data.
2. Keep only the text part of each tweet.
3. Identify the tweets on a particular topic and group them by sub-topic. E.g. I have tweets about Samsung handsets and want to group them by handset, so there are groups for Samsung Note 4, Samsung Galaxy, etc.

It is a college project, and my guide suggested that I use the k-means algorithm. I searched a lot on k-means, but I failed to understand how it identifies the centroid and how to apply k-means to this situation in MapReduce.

Please guide me if I am doing something wrong, as I am new to this concept.

K-means is a clustering algorithm. It clusters, or groups, similar data and calculates a common centroid for each group. You can create a time series for the questions you mention above, and it will group the tweets according to topic.
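On how the centroid is identified: in the standard (Lloyd's) form of k-means, the centroid of a cluster is simply the mean of the points currently assigned to it, and the algorithm alternates between assigning points to the nearest centroid and recomputing those means. Below is a minimal single-machine sketch in Python, assuming each tweet has already been turned into a numeric vector. The function names (`kmeans`, `assign`, `mean_point`) and the random initialisation are illustrative choices, not something taken from the question.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Centroid of a non-empty list of points: the coordinate-wise mean."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]

def kmeans(points, k, iterations=20, seed=0):
    """Return k centroids after at most `iterations` assign/update rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random points
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[i])
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers = kmeans(data, k=2)
    print(centers)
    print(assign(data, centers))
```

The value of `k` has to be chosen up front. For the handset example it would be the number of handset groups you expect.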

A k-means implementation in MapReduce: https://github.com/himank/k-means
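To show how k-means maps onto MapReduce, here is a rough sketch of one iteration written as a map function and a reduce function and run locally in Python. It is not the code of the repository linked above, and the names (`kmeans_map`, `kmeans_reduce`, `run_iteration`) are hypothetical. In a real Hadoop job the current centroids would have to be shipped to every mapper, for example through a side file, and the driver would resubmit the job until the centroids stop changing.

```python
from collections import defaultdict

def kmeans_map(point, centroids):
    """Mapper: emit (index of nearest centroid, (point, 1))."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )
    return nearest, (point, 1)

def kmeans_reduce(values):
    """Reducer: sum the vectors and counts for one cluster, return the mean."""
    total = None
    count = 0
    for vector, n in values:
        if total is None:
            total = list(vector)
        else:
            total = [t + v for t, v in zip(total, vector)]
        count += n
    return tuple(t / count for t in total)

def run_iteration(points, centroids):
    """One k-means iteration; the dict stands in for the shuffle phase."""
    groups = defaultdict(list)
    for point in points:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)
    new_centroids = list(centroids)  # clusters with no points stay unchanged
    for key, values in groups.items():
        new_centroids[key] = kmeans_reduce(values)
    return new_centroids

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(run_iteration(data, [(0, 0), (10, 10)]))
```

Emitting a count of 1 with each point means the same summing logic could also be used as a combiner, which reduces the data sent to the reducers.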

For using k-means on Twitter datasets, you can check the following links:

https://github.com/julianhill/r-tutorials/blob/master/r_twitter_cluster.r

http://www.r-bloggers.com/cluster-your-twitter-data-with-r-and-k-means/

http://rstudio-pubs-static.s3.amazonaws.com/5983_af66eca6775f4528a72b8e243a6ecf2d.html
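For the first two tasks in the question (getting structured data and keeping only the tweet text), a small Python sketch follows. It assumes that each line stored in HDFS is one JSON object with a `text` field, as in the standard Twitter API format. How your Flume source actually writes the data may differ, so treat the input format as an assumption. The helper names are hypothetical. The word-count vectors it produces are one simple way to get the numeric input that k-means needs.

```python
import json
import re
from collections import Counter

def extract_text(line):
    """Return the tweet text from one JSON line, or None if it is unusable."""
    try:
        tweet = json.loads(line)
    except ValueError:  # malformed line: skip it
        return None
    if not isinstance(tweet, dict):
        return None
    text = tweet.get("text")  # assumption: the tweet body is under "text"
    return text if isinstance(text, str) else None

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(token_lists):
    """Sorted list of every distinct token seen across all tweets."""
    return sorted({token for tokens in token_lists for token in tokens})

def vectorize(tokens, vocabulary):
    """Bag-of-words vector: count of each vocabulary word in the tweet."""
    counts = Counter(tokens)
    return tuple(counts[word] for word in vocabulary)

if __name__ == "__main__":
    lines = [
        '{"id": 1, "text": "Samsung Note 4 is great"}',
        '{"id": 2, "text": "Samsung Galaxy battery"}',
        "not json",
    ]
    texts = [t for t in map(extract_text, lines) if t is not None]
    token_lists = [tokenize(t) for t in texts]
    vocabulary = build_vocabulary(token_lists)
    for tokens in token_lists:
        print(vectorize(tokens, vocabulary))
```

In a MapReduce job, `extract_text` would sit in the mapper. The vocabulary needs a separate pass over the data before the vectors can be built.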

