We will go through a common case study (sentiment analysis) to explore many techniques and patterns in Natural Language Processing.
Overview:
- Imports and Data Loading
- Data Preprocessing
- Null Value Removal
- Class Balance
- Tokenization
- Embeddings
- LSTM Model Building
- Setup and Training
- Evaluation
In [81]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
import pandas as pd
import re
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import nltk
from nltk.tokenize import word_tokenize
import matplotlib.pyplot as plt
In [4]:
nltk.download('punkt')
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
Out[4]:
True
This dataset can be found on GitHub in this repo: https://github.com/ajayshewale/Sentiment-Analysis-of-Text-Data-Tweets-
It is a sentiment analysis dataset consisting of two files:
- train.csv, 5971 tweets
- test.csv, 4000 tweets
The tweets are labeled as:
- Positive
- Neutral
- Negative
Other datasets have different or more labels, but the same concepts apply to preprocessing and training. Download the files and store them locally.
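A classifier works with integer class indices rather than strings, so the three labels above eventually need to be encoded. Here is a minimal sketch, assuming the labels are spelled as listed above (possibly with different casing). The `LABEL_TO_ID` mapping and the `encode_labels` helper are hypothetical names for illustration, not part of the dataset:

```python
# Hypothetical mapping; which id goes with which label is an arbitrary choice.
LABEL_TO_ID = {"Negative": 0, "Neutral": 1, "Positive": 2}

def encode_labels(labels):
    """Map label strings to integer class ids, ignoring case and surrounding whitespace."""
    return [LABEL_TO_ID[label.strip().capitalize()] for label in labels]

print(encode_labels(["Positive", "neutral", "Negative"]))
```

A dataset with different or additional labels would only need a different mapping dictionary.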
In [7]:
train_path = "train.csv"
test_path = "test.csv"
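With the paths defined, the files can be read into DataFrames. The sketch below shows one way to do that and to get a first look at the null counts and class balance listed in the overview. The column names `text` and `label` and the `summarize` helper are assumptions for illustration, so check the actual CSV headers. A small in-memory CSV stands in for `train.csv` so the snippet runs on its own:

```python
import io
import pandas as pd

def summarize(df, label_col):
    """Return (null counts per column, share of each label)."""
    null_counts = df.isnull().sum()
    label_share = df[label_col].value_counts(normalize=True)
    return null_counts, label_share

# Stand-in for pd.read_csv(train_path); the columns are hypothetical.
sample_csv = io.StringIO(
    "text,label\n"
    "I love this,Positive\n"
    "It is okay,Neutral\n"
    ",Negative\n"
    "Great day,Positive\n"
)
sample_df = pd.read_csv(sample_csv)

null_counts, label_share = summarize(sample_df, "label")
print(null_counts)   # missing values per column
print(label_share)   # fraction of rows per label
```

On the real data, `pd.read_csv(train_path)` would replace the in-memory sample, and the label column name would be whatever the file actually uses.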
Before working with PyTorch, make sure to set the
from Planet SciPy