Saturday, November 6, 2021

Tweet Sentiment Analysis Using LSTM With PyTorch

We will work through a common case study, sentiment analysis, to explore several techniques and patterns used in Natural Language Processing.

Overview:

  • Imports and Data Loading
  • Data Preprocessing
    • Null Value Removal
    • Class Balance
  • Tokenization
  • Embeddings
  • LSTM Model Building
  • Setup and Training
  • Evaluation
Imports and Data Loading
In [81]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


import numpy as np
import pandas as pd

import re

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import nltk
from nltk.tokenize import word_tokenize

import matplotlib.pyplot as plt
In [4]:
nltk.download('punkt')
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Out[4]:
True

This dataset can be found on GitHub in this repo: https://github.com/ajayshewale/Sentiment-Analysis-of-Text-Data-Tweets-

It is a sentiment analysis dataset consisting of two files:

  • train.csv, 5971 tweets
  • test.csv, 4000 tweets

The tweets are labeled as:

  • Positive
  • Neutral
  • Negative

Other datasets may use different labels or more of them, but the same preprocessing and training concepts apply. Download both files and store them locally.
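Since a classifier is trained on integer class ids rather than label strings, the three labels above need a fixed mapping at some point. Here is a minimal sketch of one way to do that. The lowercase label spellings, the ordering, and the helper name `encode_labels` are assumptions made for illustration, so check them against the actual files.

```python
# Assumed label strings; the casing in the real CSV files may differ.
LABELS = ["negative", "neutral", "positive"]

# Forward and reverse lookups between label strings and integer class ids.
label_to_id = {label: idx for idx, label in enumerate(LABELS)}
id_to_label = {idx: label for label, idx in label_to_id.items()}


def encode_labels(labels):
    """Map label strings to integer class ids, ignoring case and surrounding whitespace."""
    return [label_to_id[label.strip().lower()] for label in labels]


# Hypothetical usage with mixed casing and stray whitespace.
encoded = encode_labels(["Positive", "neutral", "Negative "])
print(encoded)
```

Keeping the reverse lookup around makes it easy to turn predicted class ids back into readable labels during evaluation.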

In [7]:
train_path = "train.csv"
test_path = "test.csv"
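With the paths defined, the files can be read into DataFrames. Below is a rough sketch of loading, null removal, and a quick class balance check. The column names `Tweet` and `Category` and the helper `load_tweets` are assumptions, not confirmed by the text above, and a small in-memory CSV stands in for train.csv so the snippet runs without the download.

```python
import io

import pandas as pd

# Hypothetical column names; check the header of the real train.csv first.
TEXT_COL = "Tweet"
LABEL_COL = "Category"


def load_tweets(source):
    """Read a tweet CSV from a path or file-like object and drop rows with missing values."""
    df = pd.read_csv(source)
    return df.dropna(subset=[TEXT_COL, LABEL_COL]).reset_index(drop=True)


# Tiny in-memory stand-in for train.csv; row 2 has no tweet text.
sample_csv = io.StringIO(
    "Id,Category,Tweet\n"
    "1,positive,Loving the new release\n"
    "2,negative,\n"
    "3,neutral,Meeting moved to Monday\n"
    "4,positive,Great game last night\n"
)

sample_df = load_tweets(sample_csv)

# Count rows per label to get a first look at class balance.
class_counts = sample_df[LABEL_COL].value_counts()
print(class_counts)
```

With the real data, the same helper would be called as `load_tweets(train_path)` and `load_tweets(test_path)`, provided the assumed column names match.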

Before working with PyTorch, make sure to set the

(continued...)

from Planet SciPy