With the advancement and growth of web technology, a huge volume of data is
available on the web to internet users, and a lot more is generated all the time.
Social networking sites such as Twitter, Facebook, and Google+ are rapidly
gaining popularity because they allow people to share and express their views
about topics, hold discussions with different communities, or post messages
across the world. There has been a lot of work in the field of sentiment
analysis of Twitter data.
This survey focuses mainly on sentiment analysis of Twitter data, which helps
to analyze the information in tweets, where opinions are highly unstructured
and heterogeneous and are positive, negative, or in some cases neutral. In this
paper, we provide a survey and a comparative analysis of existing techniques
for opinion mining, including machine learning and lexicon-based approaches,
together with evaluation metrics. We review research on Twitter data streams
that uses machine learning algorithms such as Naive Bayes, Maximum Entropy, and
Support Vector Machine. We also discuss general challenges and applications of
sentiment analysis on Twitter.
Geetika Gautam and Divakar Yadav (2014) [1] proposed sentiment analysis for
customer review classification, which helps to analyze information in the form
of a number of tweets where opinions are highly unstructured and are either
positive or negative, or somewhere in between. They first pre-processed the
dataset, then extracted the meaningful adjectives from it to form the feature
vector, then selected the feature vector list, and thereafter applied machine
learning based classification algorithms, namely Naive Bayes, Maximum Entropy,
and SVM, along with Semantic Orientation based on WordNet, which extracts
synonyms and similarity for the content feature.
Seyed-Ali Bahrainian and Andreas Dengel (2013) [2] noted that Sentiment
Analysis (SA) and summarization have recently become the focus of many
researchers, because analysis of online text is beneficial and in demand in
many different applications. One such application is product-based sentiment
summarization of multiple documents, with the purpose of informing users about
the pros and cons of various products. Their paper introduces a novel solution
to target-oriented (i.e. aspect-based) sentiment summarization and SA of short
informal texts, with a main focus on Twitter posts known as “tweets”. It also
compares different algorithms and methods for SA polarity detection and
sentiment summarization.
Go and L. Huang (2009) [3] proposed a solution for sentiment analysis of
Twitter data using distant supervision, in which the training data consisted of
tweets with emoticons that served as noisy labels. They built models using
Naive Bayes, MaxEnt (Maximum Entropy), and Support Vector Machines (SVM). Their
feature space consisted of unigrams, bigrams, and part-of-speech (POS) tags.
They concluded that SVM outperformed the other models and that unigrams were
more effective as features.
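The distant-supervision idea can be illustrated with a minimal Python sketch.
It is not the authors' implementation: the emoticon lists, the tokenizer, and
the function names noisy_label and ngram_features are assumptions made here for
illustration. Emoticons are used only to assign a noisy label and are removed
before unigram and bigram features are extracted.

```python
import re

# Hypothetical emoticon lists; the survey does not enumerate the ones used.
POSITIVE_EMOTICONS = (":)", ":-)", ":D", "=)")
NEGATIVE_EMOTICONS = (":(", ":-(", "=(")


def noisy_label(tweet):
    """Return 'positive' or 'negative' from emoticons, or None if absent or mixed."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:  # no emoticon at all, or conflicting emoticons
        return None
    return "positive" if has_pos else "negative"


def strip_emoticons(tweet):
    """Remove the label-bearing emoticons so they cannot leak into the features."""
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        tweet = tweet.replace(emoticon, " ")
    return tweet


def ngram_features(tweet, n_values=(1, 2)):
    """Return unigram and bigram features as space-joined strings."""
    tokens = re.findall(r"[a-z0-9']+", strip_emoticons(tweet).lower())
    features = []
    for n in n_values:
        features += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return features
```

Tweets for which noisy_label returns None would simply be left out of the
training set in this sketch; POS features are omitted.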
Barbosa et al. (2010) [4] designed a two-phase automatic sentiment analysis
method for classifying tweets. In the first phase, tweets were classified as
objective or subjective; in the second phase, the subjective tweets were
classified as positive or negative. The feature space included re-tweets,
hashtags, links, punctuation, and exclamation marks, in conjunction with
features such as the prior polarity of words and POS tags.
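A rough Python sketch of the two-phase structure and of the tweet-specific
features is given below. The regular expressions, the feature names, and the
two classifier callables (is_subjective and polarity) are hypothetical
placeholders, not the models used in [4]; prior polarity and POS features are
left out.

```python
import re


def tweet_meta_features(tweet):
    """Count simple tweet-syntax cues: re-tweet marker, hashtags, links, punctuation."""
    return {
        "is_retweet": bool(re.match(r"\s*RT\b", tweet)),
        "hashtags": len(re.findall(r"#\w+", tweet)),
        "links": len(re.findall(r"https?://\S+", tweet)),
        "exclamations": tweet.count("!"),
        "question_marks": tweet.count("?"),
    }


def two_phase_classify(tweet, is_subjective, polarity):
    """Phase 1 filters out objective tweets; phase 2 assigns polarity to the rest.

    is_subjective: callable returning True/False (assumed to be a trained model).
    polarity: callable returning 'positive' or 'negative' (assumed to be a trained model).
    """
    if not is_subjective(tweet):
        return "objective"
    return polarity(tweet)
```

Keeping the two phases as separate callables means each classifier can be
trained and evaluated on its own labeled data.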
Bifet and Frank (2010) [5] used Twitter streaming data provided by the Firehose
API, which delivers all publicly available messages from every user in real
time. They experimented with multinomial Naive Bayes, stochastic gradient
descent (SGD), and the Hoeffding tree. They concluded that the SGD-based model,
when used with an appropriate learning rate, performed better than the others.
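To make the role of the learning rate concrete, the following is a minimal
sketch of an SGD-trained classifier that updates on one tweet at a time, as a
stream would require. The choice of logistic loss, the class name
OnlineSGDClassifier, and the default learning rate are assumptions for
illustration; they are not taken from [5].

```python
import math


class OnlineSGDClassifier:
    """Bag-of-words logistic regression updated one example at a time with SGD."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}  # token -> weight
        self.bias = 0.0

    def predict_proba(self, tokens):
        """Probability that the tweet is positive (label 1)."""
        z = self.bias + sum(self.weights.get(t, 0.0) for t in tokens)
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well within range
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, tokens, label):
        """One SGD step on the logistic loss; label is 1 (positive) or 0 (negative)."""
        error = label - self.predict_proba(tokens)
        self.bias += self.learning_rate * error
        for t in tokens:
            self.weights[t] = self.weights.get(t, 0.0) + self.learning_rate * error
```

In a streaming setting each incoming tweet would be passed to update once and
then discarded, so memory use depends on the vocabulary rather than on the
number of tweets seen.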
Mitali Desai (2016) [6] described sentiment analysis as the problem of mining
sentiments from data available online and categorizing the opinion expressed by
an author towards a particular entity into at most three preset categories:
positive, negative, and neutral. The paper first presents the sentiment
analysis process for classifying highly unstructured data on Twitter, and then
discusses in detail various techniques for carrying out sentiment analysis on
Twitter data.
Davidov et al. (2010) [7] proposed an approach that uses Twitter user-defined
hashtags in tweets as sentiment-type labels, with punctuation, single words,
n-grams, and patterns as different feature types, which are then combined into
a single feature vector for sentiment classification. They used a K-Nearest
Neighbor strategy to assign sentiment labels, constructing a feature vector for
each example in the training and test sets.
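The combination of several feature types into one vector, followed by a
nearest-neighbor vote, can be sketched in Python as follows. The feature
prefixes, the Euclidean distance, and the value of k are assumptions made for
this illustration, and pattern features are omitted; the sketch is not the
procedure of [7].

```python
import math
import re
from collections import Counter


def feature_vector(tweet):
    """Combine word, bigram, and punctuation counts into one sparse vector."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    features = Counter("w:" + t for t in tokens)                      # single words
    features.update("b:" + a + " " + b for a, b in zip(tokens, tokens[1:]))  # bigrams
    features["p:!"] = tweet.count("!")                                # punctuation
    features["p:?"] = tweet.count("?")
    return features


def euclidean(u, v):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys))


def knn_label(tweet, training, k=3):
    """Majority label among the k training tweets closest to the query.

    training: list of (tweet_text, label) pairs, e.g. with hashtags as labels.
    """
    query = feature_vector(tweet)
    nearest = sorted(training, key=lambda item: euclidean(query, feature_vector(item[0])))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The sketch recomputes the training vectors on every query for brevity; a
practical version would build them once.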
Po-Wei Liang et al. (2014) [8] used the Twitter API to collect Twitter data.
Their training data falls into three different categories (camera, movie,
mobile). The data is labeled as positive, negative, and non-opinion. Tweets
containing opinions were filtered. A unigram Naive Bayes model was implemented,
employing the simplifying independence assumption of Naive Bayes. They also
eliminated useless features using the Mutual Information and Chi-square feature
extraction methods. Finally, the orientation of a tweet is predicted, i.e.
positive or negative.
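As an illustration of the two ingredients named above, the sketch below shows a
unigram Naive Bayes classifier and a chi-square score that could be used to rank
features for elimination. The add-one smoothing, the function names, and the
data layout are assumptions made here; the sketch does not reproduce the system
of [8], and Mutual Information is not shown.

```python
import math
from collections import Counter, defaultdict


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a term/class 2x2 table of document counts.

    n11: in class, has term      n10: not in class, has term
    n01: in class, lacks term    n00: not in class, lacks term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def term_chi_square(term, label, docs):
    """Score one term against one class; docs is a list of (tokens, label)."""
    n11 = n10 = n01 = n00 = 0
    for tokens, doc_label in docs:
        has_term, in_class = term in tokens, doc_label == label
        if has_term and in_class:
            n11 += 1
        elif has_term:
            n10 += 1
        elif in_class:
            n01 += 1
        else:
            n00 += 1
    return chi_square(n11, n10, n01, n00)


def train_naive_bayes(docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(tokens, model):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for t in tokens:
            if t in vocab:  # unseen words are ignored
                score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Terms whose chi-square score falls below a chosen threshold would be dropped
from the vocabulary before training; the threshold itself is a tuning choice
that the survey does not specify.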