pattern matching - Clustering Using MapReduce
I have unstructured Twitter data that was retrieved with Apache Flume and stored in HDFS. I want to convert this unstructured data into structured data using MapReduce. The tasks I want to do with MapReduce are:
1. Convert the unstructured data into a structured form.
2. Keep only the text part, which contains the tweet itself.
3. Identify the tweets about a particular topic and group them by sub-topic. For example, if I have tweets about Samsung handsets, I want to group them by handset, such as a Samsung Note 4 group, a Samsung Galaxy group, etc.
This is a college project, and my guide suggested that I use the K-means algorithm. I have searched a lot on K-means, but I failed to understand how it identifies the centroids and how to apply K-means to this situation in MapReduce.
Please guide me if I am doing something wrong, as I am new to this concept.
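K-means is a clustering algorithm. It clusters (groups) similar data points and calculates a common centroid for each group; the centroid is simply the mean of the points currently assigned to that group. You can create a time series for the questions you mentioned above and group the tweets according to topic.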
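To make the centroid idea concrete, here is a minimal sketch in plain Python (no Hadoop involved). It assumes each tweet is turned into a simple word-count vector; the function names `vectorize`, `distance`, `mean` and `kmeans` are hypothetical helpers made up for this illustration, and picking the initial centroids at random from the data is just one common choice, not the only one.

```python
import math
import random
from collections import Counter

def vectorize(tweets):
    """Turn each tweet into a word-count vector over a shared vocabulary."""
    vocab = sorted({w for t in tweets for w in t.lower().split()})
    vectors = []
    for t in tweets:
        counts = Counter(t.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors

def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors: the centroid."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, k, iterations=10, seed=0):
    """Return (assignment, centroids) after a fixed number of iterations."""
    rng = random.Random(seed)
    # Assumption: start from k points picked at random from the data.
    centroids = rng.sample(points, k)
    assignment = []
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: distance(p, centroids[c]))
                      for p in points]
        # Step 2: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignment, centroids

tweets = [
    "samsung note 4 battery",
    "samsung note 4 screen",
    "samsung galaxy s5 camera",
    "samsung galaxy s5 price",
]
assignment, centroids = kmeans(vectorize(tweets), k=2)
print(assignment)  # one cluster id per tweet
```

Raw word counts are a very rough representation; for real tweets you would probably want to drop stopwords and weight the terms, but the two steps (assign, then recompute the mean) stay the same.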
A K-means implementation in MapReduce: https://github.com/himank/k-means
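As a rough idea of how K-means usually maps onto MapReduce, here is a small simulation in plain Python. This is not the code from the repository above and it does not use the Hadoop API; `map_phase`, `shuffle`, `reduce_phase` and `run` are hypothetical names for this sketch. The assumption is the common formulation where one iteration is one job: the mapper emits (nearest centroid id, point), the reducer averages the points of each id, and a driver repeats the job until the centroids stop changing.

```python
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((x - y) ** 2
                                 for x, y in zip(point, centroids[c])))

def map_phase(points, centroids):
    """Mapper: emit (cluster id, point) for every input point."""
    for p in points:
        yield nearest(p, centroids), p

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, centroids):
    """Reducer: the new centroid of each cluster is the mean of its points."""
    new_centroids = list(centroids)  # clusters with no points keep their centroid
    for cid, members in groups.items():
        new_centroids[cid] = [sum(col) / len(members) for col in zip(*members)]
    return new_centroids

def run(points, centroids, max_iterations=20):
    """Driver: repeat map + reduce until the centroids stop moving."""
    for _ in range(max_iterations):
        updated = reduce_phase(shuffle(map_phase(points, centroids)), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(run(points, [[0.0, 0.0], [0.0, 1.0]]))
# prints [[0.0, 0.5], [10.0, 10.5]]
```

In a real Hadoop job the current centroids would have to be made available to every mapper (for example through a shared file), and the driver would read the reducer output to decide whether to launch another iteration.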
For using K-means on Twitter datasets, you can check the following links:
https://github.com/julianhill/r-tutorials/blob/master/r_twitter_cluster.r
http://www.r-bloggers.com/cluster-your-twitter-data-with-r-and-k-means/
http://rstudio-pubs-static.s3.amazonaws.com/5983_af66eca6775f4528a72b8e243a6ecf2d.html