Introduction
State-of-the-art technologies in NLP allow us to analyze natural languages on different layers: from simple segmentation of textual information to more sophisticated methods of sentiment categorization.
However, this does not necessarily mean that you need to be an advanced programmer to implement high-level tasks such as sentiment analysis in Python.
Sentiment Analysis
Sentiment analysis algorithms mostly focus on identifying opinions, attitudes, and even emotions in a corpus of texts. The range of sentiments they can detect varies significantly from one method to another. While a standard analyzer distinguishes up to three basic polarity classes (positive, negative, neutral), more advanced models cover a broader range.
They can look beyond polarity and detect the six "universal" emotions (anger, disgust, fear, happiness, sadness, and surprise):
Source: Spectrum Mental Health
Moreover, depending on the task you're working on, it's also possible to collect extra information from the context, such as the author or the topic, which in further analysis can present a more complex problem than common polarity classification, namely subjectivity/objectivity identification.
For example, this sentence from Business Insider: "In March, Elon Musk described concern over the coronavirus outbreak as a 'panic' and 'dumb,' and he's since tweeted incorrect information, such as his theory that children are 'essentially immune' to the virus." expresses subjectivity through the personal opinions of both Elon Musk and the author of the text.
Sentiment Analysis in Python with TextBlob
The approach that the TextBlob package applies to sentiment analysis differs in that it's rule-based and therefore requires a pre-defined set of categorized words. These words can, for example, be loaded from the NLTK database. Moreover, sentiments are determined from the semantic relations and the frequency of each word in an input sentence, which yields a more precise output.
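To make the rule-based idea more concrete, here is a minimal sketch of lexicon-based polarity scoring in plain Python. It is not TextBlob's actual implementation: the tiny `LEXICON` and the `lexicon_polarity` function are hypothetical, and the score is simply the average polarity of the words found in the lexicon, with 0.0 when none of them match.

```python
import re

# Hypothetical mini-lexicon mapping words to polarity scores in [-1.0, 1.0];
# a real rule-based analyzer relies on a much larger, curated word list.
LEXICON = {
    "best": 1.0,
    "good": 0.5,
    "bad": -0.5,
    "worst": -1.0,
}


def lexicon_polarity(text: str) -> float:
    """Average the polarity of every lexicon word found in `text`."""
    # Lowercase the text and split it into simple word tokens
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[token] for token in tokens if token in LEXICON]
    # No known words means a neutral score of 0.0
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


print(lexicon_polarity("A good course with the best teacher"))  # 0.75
print(lexicon_polarity("The platform offers courses online"))   # 0.0
```

Averaging keeps the result inside [-1.0, 1.0], which matches the range TextBlob reports for polarity.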
Once this step is done and the model has been fed the necessary input data, the user can obtain sentiment scores in the form of polarity and subjectivity, which were discussed in the previous section. This paper by Forum Kapadia shows how the process works:
TextBlob's output for a polarity task is a float in the range [-1.0, 1.0], where -1.0 is a negative polarity and 1.0 is positive. The score can also be 0, which stands for a neutral evaluation of a statement, for example when it doesn't contain any words from the analyzer's pre-defined word set.
A subjectivity/objectivity identification task, in turn, reports a float in the range [0.0, 1.0], where 0.0 is a very objective sentence and 1.0 is very subjective.
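The two ranges above can be turned into readable labels with a small helper. This is a sketch: the `describe_sentiment` function is hypothetical, and treating a subjectivity of 0.5 or more as "subjective" is an illustrative assumption rather than a cut-off defined by TextBlob.

```python
def describe_sentiment(polarity: float, subjectivity: float) -> str:
    """Turn TextBlob-style scores into a short human-readable label.

    The 0.5 subjectivity cut-off is an illustrative assumption,
    not a value defined by TextBlob.
    """
    # Validate the documented ranges
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be in [-1.0, 1.0]")
    if not 0.0 <= subjectivity <= 1.0:
        raise ValueError("subjectivity must be in [0.0, 1.0]")

    # Sign of the polarity decides the tone; exactly 0 is neutral
    if polarity > 0:
        tone = "positive"
    elif polarity < 0:
        tone = "negative"
    else:
        tone = "neutral"

    stance = "subjective" if subjectivity >= 0.5 else "objective"
    return f"{tone}, {stance}"


# Scores TextBlob reports for the example sentence analyzed in this article
print(describe_sentiment(0.5, 0.26666666666666666))  # positive, objective
```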
There are various examples of using the TextBlob sentiment analyzer from Python, ranging from models based on Kaggle datasets (e.g. movie reviews) to calculating tweet sentiments through the Twitter API.
Let's look at a simple analyzer that we can apply to a particular sentence or a short text. We start by importing the TextBlob library:
# Importing TextBlob
from textblob import TextBlob
Once it's imported, we'll load a sentence for analysis, instantiate a TextBlob object, and assign its sentiment property to our own analysis variable:
# Preparing an input sentence
sentence = '''The platform provides universal access to the world's best education, partnering with top universities and organizations to offer courses online.'''
# Creating a textblob object and assigning the sentiment property
analysis = TextBlob(sentence).sentiment
print(analysis)
The sentiment property is a namedtuple of the form Sentiment(polarity, subjectivity).
The expected output of the analysis is:
Sentiment(polarity=0.5, subjectivity=0.26666666666666666)
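Because the result is a namedtuple, its fields can be read by name, by position, or through unpacking. The sketch below illustrates this with a stand-in `Sentiment` namedtuple that has the same field names, defined here only so the example runs without TextBlob installed; the values are copied from the output above.

```python
from collections import namedtuple

# Stand-in with the same field names as TextBlob's Sentiment result
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

# Values copied from the output above
analysis = Sentiment(polarity=0.5, subjectivity=0.26666666666666666)

# Fields can be read by name...
print(analysis.polarity)       # 0.5
# ...by position...
print(analysis[1])             # 0.26666666666666666
# ...or unpacked like any other tuple
polarity, subjectivity = analysis
print(round(subjectivity, 2))  # 0.27
```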
It's also possible to get the polarity and subjectivity results separately by running the following:
from textblob import TextBlob
# Preparing an input sentence
sentence = '''The platform provides universal access to the world's best education, partnering with top universities and organizations to offer courses online.'''
# Creating the TextBlob object once and reading each score from it
blob = TextBlob(sentence)
analysisPol = blob.polarity
analysisSub = blob.subjectivity
print(analysisPol)
print(analysisSub)
This gives us the output:
0.5
0.26666666666666666
One of the great things about TextBlob is that it lets the user choose the algorithm used for high-level NLP tasks such as sentiment analysis:
PatternAnalyzer - the default classifier, built on the pattern library
NaiveBayesAnalyzer - an NLTK model trained on a movie reviews corpus
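To illustrate the idea behind a Naive Bayes sentiment classifier, here is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing in plain Python. This is a simplified sketch, not NLTK's or TextBlob's code: the four training sentences, `train`, and `classify` are hypothetical. It does return a namedtuple with the same field names as the NaiveBayesAnalyzer result, Sentiment(classification, p_pos, p_neg).

```python
import math
from collections import Counter, namedtuple

# Same field names as the result of TextBlob's NaiveBayesAnalyzer
Sentiment = namedtuple("Sentiment", ["classification", "p_pos", "p_neg"])

# Hypothetical toy training data; the real analyzer uses a movie reviews corpus
TRAINING = [
    ("a great and moving film", "pos"),
    ("wonderful acting and a great story", "pos"),
    ("a dull and boring film", "neg"),
    ("terrible story and boring acting", "neg"),
]


def train(examples):
    """Count word frequencies per class and how often each class occurs."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, class_counts, vocabulary


def classify(text, model):
    """Return Sentiment(classification, p_pos, p_neg) for `text`."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in ("pos", "neg"):
        # Log prior: share of training examples with this label
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothed log likelihood of the word given the label
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        log_scores[label] = score
    # Convert log scores into probabilities that sum to 1
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    p_pos = exp_scores["pos"] / norm
    p_neg = exp_scores["neg"] / norm
    return Sentiment("pos" if p_pos >= p_neg else "neg", p_pos, p_neg)


model = train(TRAINING)
print(classify("a great story", model))
print(classify("boring acting", model))
```

Working in log space avoids numeric underflow when a text contains many words; subtracting the largest log score before exponentiating keeps the normalization stable.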
To change the default, we simply specify a NaiveBayesAnalyzer in the code. Let's run sentiment analysis on tweets pulled directly from Twitter:
from textblob import TextBlob
# For parsing tweets
import tweepy
# Importing TextBlob's NaiveBayesAnalyzer, which wraps an NLTK classifier
from textblob.sentiments import NaiveBayesAnalyzer
After that, we need to establish a connection with the Twitter API using API keys, which you can get through a Twitter developer account:
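# Twitter API keys and access tokens (placeholders)
api_key = 'XXXXXXXXXXXXXXX'
api_secret = 'XXXXXXXXXXXXXXX'
access_token = 'XXXXXXXXXXXXXXX'
access_secret = 'XXXXXXXXXXXXXXX'
# Establishing the connection
twitter = tweepy.OAuthHandler(api_key, api_secret)
# Attaching the access token so requests are made on behalf of your account
twitter.set_access_token(access_token, access_secret)
api = tweepy.API(twitter)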
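Hard-coding keys in a script makes them easy to leak. A common alternative is to read them from environment variables. The sketch below assumes hypothetical variable names (`TWITTER_API_KEY` and so on) and a hypothetical `load_twitter_credentials` helper; adjust both to your own setup.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup
CREDENTIAL_VARS = {
    "api_key": "TWITTER_API_KEY",
    "api_secret": "TWITTER_API_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_twitter_credentials():
    """Read Twitter API credentials from environment variables.

    Raises RuntimeError listing every variable that is missing or empty.
    """
    missing = [env for env in CREDENTIAL_VARS.values() if not os.environ.get(env)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: os.environ[env] for key, env in CREDENTIAL_VARS.items()}
```

The returned values can then be passed to the same calls as before, e.g. `creds = load_twitter_credentials()` followed by `tweepy.OAuthHandler(creds["api_key"], creds["api_secret"])`.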
Now we can analyze tweets on any topic. The search query (e.g. lockdown) can be a single word or several words. Because a query can match a huge number of tweets, this task can be time-consuming, so it's recommended to limit the output:
# Fetching up to 5 tweets about the "lockdown" topic
# (in tweepy 4.x and later, this method is named api.search_tweets)
corpus_tweets = api.search("lockdown", count=5)
for tweet in corpus_tweets:
    print(tweet.text)
This prints up to five tweets that mention your search term, in the following form:
RT@DhwaniPandya: How Asia's densest slum contained the virus and the economic catastrophe that stares at the hardworking slum population...
The last step in this example is switching from the default model to the NLTK analyzer, which returns its results as a namedtuple of the form Sentiment(classification, p_pos, p_neg):
# Applying the NaiveBayesAnalyzer (created once, since it is trained on first use)
analyzer = NaiveBayesAnalyzer()
for tweet in corpus_tweets:
    blob_object = TextBlob(tweet.text, analyzer=analyzer)
    # Running sentiment analysis
    analysis = blob_object.sentiment
    print(analysis)
Finally, our Python model returns a sentiment evaluation like the following:
Sentiment(classification='pos', p_pos=0.5057908299783777, p_neg=0.49420917002162196)
Here, the tweet is classified as positive, but with p_pos and p_neg both around 0.5, the classifier is barely more confident in the positive label than in the negative one.
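When p_pos and p_neg are this close, it can be useful to flag the result as uncertain instead of trusting the label. The helper below is a hypothetical sketch, and its default margin of 0.1 is an illustrative assumption, not a TextBlob setting.

```python
def is_uncertain(p_pos: float, p_neg: float, margin: float = 0.1) -> bool:
    """Return True when the two class probabilities are too close to call.

    The default margin of 0.1 is an illustrative assumption.
    """
    return abs(p_pos - p_neg) < margin


# Probabilities from the tweet analyzed above (difference of about 0.012)
print(is_uncertain(0.5057908299783777, 0.49420917002162196))  # True
print(is_uncertain(0.9, 0.1))                                  # False
```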
Conclusion
In this article, we've covered what sentiment analysis is, and then used the TextBlob library to perform sentiment analysis on input sentences as well as tweets.
from Planet Python