NLP Analysis on Subreddit Polarization

Sean McHale
Team: 4 members
Status: Complete
30 Apr 2024
5 min read
Tags:
Python
Fine-tuning
Big Data
LangChain
NLTK
scikit-learn

This project analyzes online polarization surrounding the Israel-Palestine conflict using data from Reddit. Posts were first grouped with LDA and BERTopic models into key topics such as conflict violence and geopolitical discourse. A Hugging Face XLNet model was then fine-tuned to classify polarization.