Codersarts is a top-rated website for students looking for online Data Analytics assignment help, homework help, and coursework help in Apache Spark, PySpark, MLlib, tweepy, and other libraries and tools. We help students at all levels, whether it is school, college, or university coursework or a real-time project. Hire us and get your projects done by a Data Analytics expert.
There are two common kinds of data analytics over social media data:
Machine Learning Algorithms: Apply classification to Tweets
Real time analysis of Tweets: Spark Streaming Library
Data Analysis
Recommendation Models:
Content-based filtering
Collaborative filtering
Matrix factorization
Alternating least squares
Classification Models:
Linear models
Logistic regression
Support vector machines (SVM)
Decision trees
Naïve Bayes
Clustering Models:
k-means clustering
Hierarchical clustering
Kohonen networks (self-organizing maps)
Classification
Classification is a form of supervised learning in which we train a model on labeled training examples.
It can be used for:
Predicting the probability of Internet users clicking on an online advert; here, the classes are binary in nature (that is, click or no click)
Classifying images, video or sounds
Assigning categories or tags to news articles, web pages, tweets (multiclass)
Discovering e-mail and web spam (binary)
Ranking customers or users in order of probability that they might purchase a product or use a service
Predicting customers or users who might stop using a product, service or provider (called churn)
And other cases
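As a rough illustration of the tweet-classification use case above, here is a minimal sketch of a Naïve Bayes sentiment classifier in plain Python. It is not the Spark MLlib implementation; the function names (`tokenize`, `train_naive_bayes`, `predict`), the whitespace tokenizer, and the four toy training tweets are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return {"label_counts": label_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    total = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, count in model["label_counts"].items():
        score = math.log(count / total)  # log prior of the class
        n_words = sum(model["word_counts"][label].values())
        for word in tokenize(text):
            # Laplace (add-one) smoothing so unseen words do not zero the score
            seen = model["word_counts"][label][word]
            score += math.log((seen + 1) / (n_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy labeled tweets, invented for this sketch
training = [
    ("i love this phone", "positive"),
    ("what a great day", "positive"),
    ("i hate waiting in line", "negative"),
    ("this is a terrible service", "negative"),
]
model = train_naive_bayes(training)
prediction = predict(model, "i love this great day")
print(prediction)
```

In a real project the same idea would normally be applied to many more labeled tweets and a proper text-feature pipeline; the sketch only shows how labeled examples turn into a model that assigns a class to new text.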
Clustering
Clustering is a form of unsupervised learning in which each training example is assigned to a segment called a cluster.
It can be used for:
Segmenting users or customers into different groups based on behavior characteristics and metadata
Grouping content on a website or products in a retail business
Segmenting communities in social media networks
Topic clustering of Tweets
K-means clustering approach
Clustering is the process of grouping a set of objects into
classes of similar objects:
Documents within a cluster should be similar.
Documents from different clusters should be dissimilar.
In principle, the optimal partition is achieved by minimising the sum of squared distances from each object to the “representative object” (the centroid) of its cluster.
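The k-means idea described above can be sketched in a few lines of plain Python. This is only an illustration on toy 2-D points, not the Spark MLlib implementation: the function names, the fixed iteration cap, and the choice to seed the centroids with the first k points (so the run is deterministic) are assumptions for this example. For tweets, the text would first have to be turned into numeric vectors.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=20):
    """Alternate assignment and centroid-update steps until nothing changes."""
    # Assumption: seed with the first k points to keep the example deterministic
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments


def cost(points, centroids, assignments):
    """Sum of squared distances of each point to its cluster centroid."""
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))


# Toy data invented for this sketch: two visually obvious groups
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments, cost(points, centroids, assignments))
```

The `cost` function is the quantity k-means tries to minimise. The algorithm only guarantees a local minimum, so the result can depend on how the centroids are seeded.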
Historical Data Analysis with MLlib (MLDataAnalysis.scala):
Data Representation
Clustering tweets by text
Classification of tweets by sentiment (negative, positive, etc.)
Result visualization in Zeppelin
Streaming Data Analysis (CollectingTweetsToFile.scala, CollectingTweetsToMongoDB.scala, witterStreamingAnalyzer.scala):
Stream tweets to a JSON file
Stream tweets to MongoDB
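To give a feel for the "stream tweets to a JSON file" step, here is a minimal Python sketch that writes tweets to a file as one JSON object per line. It does not reproduce the Scala programs named above and does not connect to Twitter: `fake_tweet_stream` is a hypothetical stand-in for a live source such as a tweepy or Spark Streaming client, and `collect_tweets_to_file` and its `keep` filter are names invented for this example.

```python
import json
import os
import tempfile


def fake_tweet_stream():
    """Hypothetical stand-in for a live tweet source; yields tweet-like dicts."""
    yield {"id": 1, "text": "Spark Streaming is fun", "lang": "en"}
    yield {"id": 2, "text": "Learning MLlib today", "lang": "en"}
    yield {"id": 3, "text": "Bonjour tout le monde", "lang": "fr"}


def collect_tweets_to_file(stream, path, keep=lambda tweet: True):
    """Append each tweet accepted by `keep` to `path`, one JSON object per line."""
    written = 0
    with open(path, "a", encoding="utf-8") as out:
        for tweet in stream:
            if keep(tweet):
                out.write(json.dumps(tweet, ensure_ascii=False) + "\n")
                written += 1
    return written


# Write only English tweets to a temporary file, then read them back
path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")
count = collect_tweets_to_file(
    fake_tweet_stream(), path, keep=lambda t: t["lang"] == "en"
)

with open(path, encoding="utf-8") as f:
    saved = [json.loads(line) for line in f]
print(count, [t["id"] for t in saved])
```

Writing one object per line keeps the file appendable while the stream is still running, and the same records could later be loaded for historical analysis or inserted into a database such as MongoDB.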