Machine Learning ::

📦 Import Libraries⌗

import numpy as np
import pandas as pd
import sklearn as skl
import matplotlib.pyplot as plt

numpy, pandas: Data manipulation
sklearn: ML tools like vectorizers and train/test split
matplotlib: Visualization (optional)

📚 NLP Preprocessing Setup⌗

import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

re: Regex for text cleaning
nltk: Tokenization, stopwords, and stemming

🚫 Custom Stopwords⌗

negatives = ['no', 'nor', 'not', "don't", ...]
all_stopwords = [w for w in stopwords.words('english') if w not in negatives]

Retains negative words like “not”, which are important for sentiment.

📄 Load Dataset⌗

df = pd.read_csv('../Data/review_dataset.tsv', sep="\t")
X = df['Review'].values
Y = df['Liked'].values

X = text reviews
Y = labels (0 = negative, 1 = positive)

🧹 Text Preprocessing Function⌗

def preprocessing(data):
    corpus = []
    for sen in data:
        clean = re.sub('[^a-zA-Z ]', '', sen.lower())
        tokens = [w for w in clean.split() if w not in all_stopwords]
        stemmer = PorterStemmer()
        stemmed = [stemmer.stem(w) for w in tokens]
        corpus.append(" ".join(stemmed))
    return corpus

Lowercase
Remove punctuation
Remove stopwords (except negatives)
Stem each word

✍️ Vectorization (Bag of Words)⌗

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
X_cv_data = cv.fit_transform(corpus).toarray()

Converts text to numeric format
Each word is a feature (column)

🔁 Convert to PyTorch Tensors⌗

X = torch.tensor(X_cv_data, dtype=torch.float32)
Y = torch.tensor(Y, dtype=torch.long)

🎲 Train-Test Split⌗

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

🧠 Define the Model⌗

class SentimentAnalysis(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.l1 = nn.Linear(input_dim, 60)
        self.l2 = nn.Linear(60, 10)
        self.l3 = nn.Linear(10, 2)

    def forward(self, x):
        x = torch.relu(self.l1(x))
        x = torch.relu(self.l2(x))
        return self.l3(x)

⚙️ Model Setup⌗

model = SentimentAnalysis(X_train.shape[1])
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

🏋️ Training the Model⌗

for epoch in range(50):
    model.train()
    total_loss = 0
    for i in range(0, len(X_train), 20):
        batch_X = X_train[i:i+20]
        batch_Y = Y_train[i:i+20]

        optimizer.zero_grad()
        outputs = model(batch_X)
        loss = criterion(outputs, batch_Y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    print(f"Epoch {epoch+1}, Loss: {total_loss:.4f}")

📊 Model Evaluation⌗

model.eval()
with torch.no_grad():
    val_output = model(X_test)
    _, predicted = torch.max(val_output.data, 1)
    accuracy = (predicted == Y_test).float().mean()
    print(f'Validation Accuracy: {accuracy:.4f}')

🔍 Predict New Sentences⌗

def call_model(sen):
    print("Sentence:", sen)
    clean = preprocessing([sen])
    sent = cv.transform(clean).toarray()
    tensor_input = torch.tensor(sent, dtype=torch.float32)
    pred = model(tensor_input)
    predicted_class = torch.argmax(pred.data, dim=1)
    print("Predicted Class:", "positive" if predicted_class == 1 else "negative")

Use this function to classify new reviews with the trained model.

Machine Learning

Table of Contents

📦 Import Libraries⌗

📚 NLP Preprocessing Setup⌗

🚫 Custom Stopwords⌗

📄 Load Dataset⌗

🧹 Text Preprocessing Function⌗

✍️ Vectorization (Bag of Words)⌗

🔁 Convert to PyTorch Tensors⌗

🎲 Train-Test Split⌗

🧠 Define the Model⌗

⚙️ Model Setup⌗

🏋️ Training the Model⌗

📊 Model Evaluation⌗

🔍 Predict New Sentences⌗