Daily Python: Stack Abuse: Sentiment Analysis in Python With TextBlob

Introduction

State-of-the-art technologies in NLP allow us to analyze natural languages on different layers: from simple segmentation of textual information to more sophisticated methods of sentiment categorizations.

However, that doesn't mean you need advanced programming skills to implement high-level tasks such as sentiment analysis in Python.

Sentiment Analysis

Sentiment analysis algorithms mostly focus on identifying opinions, attitudes, and even emoticons in a corpus of texts. The range of sentiments they recognize varies significantly from one method to another. While a standard analyzer distinguishes up to three basic polarities (positive, negative, neutral), more advanced models cover a broader range.

Consequently, they can look beyond polarity and determine six "universal" emotions (anger, disgust, fear, happiness, sadness, and surprise):

Scale of Emotions
Source: Spectrum Mental Health

Moreover, depending on the task you're working on, it's also possible to collect extra information from the context, such as the author or the topic, that further analysis can use to address a more complex problem than plain polarity classification, namely subjectivity/objectivity identification.

For example, this sentence from Business Insider: "In March, Elon Musk described concern over the coronavirus outbreak as a 'panic' and 'dumb,' and he's since tweeted incorrect information, such as his theory that children are 'essentially immune' to the virus." expresses subjectivity through the personal opinions of both Elon Musk and the author of the text.

Sentiment Analysis in Python with TextBlob

TextBlob's approach to sentiment analysis differs in that it's rule-based and therefore requires a predefined set of categorized words. These words can, for example, be loaded from the NLTK database. Sentiments are then determined from the semantic relations and the frequency of each word in the input sentence, which yields a more precise result.
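To make the rule-based idea concrete, here is a minimal sketch of a lexicon lookup. The tiny LEXICON dictionary and the score_polarity function are hypothetical and written only for illustration; this is not TextBlob's actual lexicon or scoring algorithm, just the general shape of a rule-based scorer that averages the polarity of known words.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1.0, 1.0]
LEXICON = {"best": 1.0, "good": 0.7, "bad": -0.7, "worst": -1.0}


def score_polarity(text):
    """Average the polarity of the words found in LEXICON.

    Returns 0.0 when the text contains no known words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[word] for word in words if word in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(score_polarity("The best course, but a bad website"))
print(score_polarity("The course starts on Monday"))
```

A sentence without any word from the lexicon falls back to 0.0, which is how a lexicon-based scorer ends up reporting neutral text.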

Once the first step is complete and the Python model has been fed the necessary input data, the user can obtain sentiment scores in the form of polarity and subjectivity, which were discussed in the previous section. We can see how this process works in this paper by Forum Kapadia:

TextBlob stream

TextBlob's output for a polarity task is a float within the range [-1.0, 1.0], where -1.0 is a negative polarity and 1.0 is a positive one. The score can also be equal to 0, which stands for a neutral evaluation, typically because the statement contains none of the words in the predefined set.

A subjectivity/objectivity identification task, in turn, reports a float within the range [0.0, 1.0], where 0.0 is a very objective sentence and 1.0 is a very subjective one.
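If you need labels rather than raw floats, the two ranges above can be mapped to categories with a small helper. This is only a sketch: the function names, the 0.05 neutrality threshold, and the 0.5 subjectivity cutoff are assumptions chosen for illustration, not values defined by TextBlob.

```python
def label_polarity(polarity, threshold=0.05):
    """Map a polarity in [-1.0, 1.0] to a coarse label.

    Scores within +/- threshold of zero are treated as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be within [-1.0, 1.0]")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def label_subjectivity(subjectivity, cutoff=0.5):
    """Map a subjectivity in [0.0, 1.0] to 'objective' or 'subjective'."""
    if not 0.0 <= subjectivity <= 1.0:
        raise ValueError("subjectivity must be within [0.0, 1.0]")
    return "subjective" if subjectivity >= cutoff else "objective"


print(label_polarity(0.5), label_subjectivity(0.27))
```

The thresholds are worth tuning against your own data, since the right boundary between "neutral" and "positive" depends on the kind of text you analyze.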

There are many examples of using TextBlob's sentiment analyzer in Python, from models built on various Kaggle datasets (e.g. movie reviews) to calculating tweet sentiments through the Twitter API.

But let's look at a simple analyzer that we can apply to a single sentence or a short text. We start by importing the TextBlob library:

# Importing TextBlob
from textblob import TextBlob

Once it's imported, we'll load a sentence for analysis, instantiate a TextBlob object, and assign its sentiment property to our own analysis variable:

# Preparing an input sentence
sentence = '''The platform provides universal access to the world's best education, partnering with top universities and organizations to offer courses online.'''

# Creating a textblob object and assigning the sentiment property
analysis = TextBlob(sentence).sentiment
print(analysis)

The sentiment property is a namedtuple of the form Sentiment(polarity, subjectivity).

The expected output of the analysis is:

Sentiment(polarity=0.5, subjectivity=0.26666666666666666)
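Because the result is a namedtuple, its fields can be read by name, by index, or by unpacking. The sketch below uses a stand-in namedtuple built with the standard library, so it runs without TextBlob installed; the field names and values are copied from the output above.

```python
from collections import namedtuple

# Stand-in with the same field names as the result printed above
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])
analysis = Sentiment(polarity=0.5, subjectivity=0.26666666666666666)

# Access by field name
print(analysis.polarity)

# Access by position
print(analysis[1])

# Unpacking into two variables
polarity, subjectivity = analysis
print(round(subjectivity, 2))
```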

It's also possible to retrieve the polarity or subjectivity results separately by running the following:

from textblob import TextBlob

# Preparing an input sentence
sentence = '''The platform provides universal access to the world's best education, partnering with top universities and organizations to offer courses online.'''

# Creating the TextBlob object once and reading each score from it
blob = TextBlob(sentence)
analysis_pol = blob.polarity
analysis_sub = blob.subjectivity

print(analysis_pol)
print(analysis_sub)

This gives us the output:

0.5
0.26666666666666666

One of the great things about TextBlob is that it allows the user to choose an algorithm for implementation of the high-level NLP tasks:

PatternAnalyzer - a default classifier that is built on the pattern library
NaiveBayesAnalyzer - an NLTK model trained on a movie reviews corpus

To change the default settings, we'll simply specify the NaiveBayesAnalyzer in the code. Let's run sentiment analysis on tweets directly from Twitter:

from textblob import TextBlob
# For fetching tweets through the Twitter API
import tweepy

# Importing TextBlob's NaiveBayesAnalyzer, which is built on an NLTK classifier
from textblob.sentiments import NaiveBayesAnalyzer

After that, we need to establish a connection with the Twitter API via API keys (that you can get through a developer account):

# Setting the API keys and tokens
api_key = 'XXXXXXXXXXXXXXX'
api_secret = 'XXXXXXXXXXXXXXX'
access_token = 'XXXXXXXXXXXXXXX'
access_secret = 'XXXXXXXXXXXXXXX'

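# Establishing the connection
twitter = tweepy.OAuthHandler(api_key, api_secret)
# Attaching the access token and secret defined above
twitter.set_access_token(access_token, access_secret)
api = tweepy.API(twitter)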

Now we can analyze tweets on any topic. The search query (e.g. lockdown) can consist of one word or several. Because the number of matching tweets can be enormous, this task can be time-consuming, so it's recommended to limit the output:

# This call returns 5 tweets matching the "lockdown" query
# (in Tweepy 4.x and later, this method is named api.search_tweets)
corpus_tweets = api.search("lockdown", count=5)
for tweet in corpus_tweets:
    print(tweet.text)

This last piece of code prints five tweets that mention the search term, in the following form:

RT@DhwaniPandya: How Asia's densest slum contained the virus and the economic catastrophe that stares at the hardworking slum population...
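Raw tweets like this one carry retweet markers and often links, which you may want to strip before analysis. The helper below is a hypothetical sketch using only the standard library; clean_tweet and its regular expressions are assumptions for illustration, and the link in the sample string is made up.

```python
import re


def clean_tweet(text):
    """Strip a leading retweet marker, links, and extra whitespace."""
    # Remove a leading "RT @user: " or "RT@user: " marker
    text = re.sub(r"^RT\s*@\w+:\s*", "", text)
    # Remove http/https links
    text = re.sub(r"https?://\S+", "", text)
    # Collapse runs of whitespace and trim the ends
    return re.sub(r"\s+", " ", text).strip()


# Sample text based on the tweet above, with a made-up link appended
raw = "RT@DhwaniPandya: How Asia's densest slum contained the virus https://example.com/story"
print(clean_tweet(raw))
```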

The last step in this example is switching from the default model to the NLTK-based analyzer, which returns its results as a namedtuple of the form Sentiment(classification, p_pos, p_neg):

# Applying the NaiveBayesAnalyzer to the last tweet from the loop above
blob_object = TextBlob(tweet.text, analyzer=NaiveBayesAnalyzer())
# Running sentiment analysis
analysis = blob_object.sentiment
print(analysis)
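The snippet above scores a single tweet. To score every tweet in the batch, one option is to apply the analysis to each text in turn. The sketch below shows that pattern with a hypothetical analyze_each helper and a toy stand-in analyzer, so that it runs without TextBlob or Twitter credentials; the sample texts are made up.

```python
def analyze_each(texts, analyze):
    """Pair every text with the result of calling `analyze` on it."""
    return [(text, analyze(text)) for text in texts]


def toy_analyze(text):
    """Hypothetical stand-in for a real analyzer, for demonstration only."""
    return "pos" if "good" in text.lower() else "neg"


# Made-up sample texts standing in for tweet.text values
tweets = ["Lockdown gave me a good excuse to read", "Lockdown is dragging on"]

for text, result in analyze_each(tweets, toy_analyze):
    print(result, "|", text)
```

With TextBlob installed, analyze could be something like `lambda text: TextBlob(text, analyzer=nb).sentiment`, where `nb = NaiveBayesAnalyzer()` is created once beforehand. Reusing a single analyzer instance should avoid retraining the classifier for every tweet.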

Finally, our Python model will get us the following sentiment evaluation:

Sentiment(classification='pos', p_pos=0.5057908299783777, p_neg=0.49420917002162196)

Here, the tweet is classified as positive, but only marginally: the p_pos and p_neg values are ~0.5 each.
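When the two probabilities are this close, it can be safer to treat the classification as inconclusive. The following sketch uses a stand-in namedtuple with the same field names as the result above; the confident_label helper and its 0.2 margin are assumptions for illustration, not part of TextBlob.

```python
from collections import namedtuple

# Stand-in with the same field names as the NaiveBayesAnalyzer result above
Sentiment = namedtuple("Sentiment", ["classification", "p_pos", "p_neg"])


def confident_label(result, margin=0.2):
    """Return the classification only if p_pos and p_neg differ by at least margin."""
    if abs(result.p_pos - result.p_neg) < margin:
        return "uncertain"
    return result.classification


# Values copied from the output above
result = Sentiment("pos", 0.5057908299783777, 0.49420917002162196)
print(confident_label(result))
```

For the tweet above, the gap between the probabilities is about 0.01, so this helper reports it as uncertain rather than positive.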

Conclusion

In this article, we've covered what Sentiment Analysis is, after which we've used the TextBlob library to perform Sentiment Analysis on imported sentences as well as tweets.

from Planet Python

Friday, October 23, 2020
