
About the project:

Team project (4 members). I led NLP modeling and the inference API, trained and fine-tuned an LSTM, and shipped a full-stack demo. I owned part of the data cleaning; built the React UI (logo, input flow, animations); implemented the Flask inference APIs (model loading and tokenization); and deployed on Heroku (backend) and Vercel (frontend), with cost-efficient external model storage via Google Drive.
Features
Users can submit any political social media post; predictions are not limited to the training dataset.
Classifies posts as Neutral vs. Partisan using a fine-tuned BERT model.
Predicts the post's message category, one of: Attack, Constituency, Information, Media, Mobilization, Personal, Policy, Support.
Removes URLs, emojis, HTML tags, and special characters via regex & NLP; includes tokenization and TF-IDF.
Trained and fine-tuned LSTM, GPT-2, and BERT; the production demo uses BERT for best performance.
Trained on the 2015 Crowdflower Political Social Media Posts dataset and generalizes to unseen inputs.
React frontend integrated with Flask APIs for real-time inference and results visualization.
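The cleaning step listed above (removing URLs, emojis, HTML tags, and special characters) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function name `clean_post`, the specific patterns, and the choice to lowercase and keep only ASCII letters and digits are all assumptions.

```python
import html
import re

# Hypothetical patterns; the project's real rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
# Anything that is not a lowercase ASCII letter, digit, or whitespace.
# This also drops emojis and punctuation.
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
WS_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Lowercase a post and strip URLs, HTML tags, emojis, and special characters."""
    text = html.unescape(text)                   # turn entities like "&amp;" into "&"
    text = URL_RE.sub(" ", text)                 # drop links
    text = TAG_RE.sub(" ", text)                 # drop HTML tags
    text = NON_ALNUM_RE.sub(" ", text.lower())   # drop emojis and punctuation
    return WS_RE.sub(" ", text).strip()          # collapse repeated whitespace


if __name__ == "__main__":
    print(clean_post("Vote <b>today</b>! Details: https://example.com/x?y=1 &amp; more"))
```

Running the cleaning before tokenization keeps the same normalization at training time and at inference time, which matters when users submit arbitrary posts.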
What I Built
- ▹ Preprocessed textual data by removing extraneous characters, tokenizing, and generating sequences for model training
- ▹ Developed and trained a bidirectional Long Short-Term Memory (Bi-LSTM) model using TensorFlow and the Keras API, tuning for F1-score and saving the best model weights
- ▹ Built a full-stack web application with React.js (frontend) and Flask (backend) to classify political media posts
- ▹ Designed a responsive and interactive user interface using JavaScript, CSS, and Bootstrap, allowing users to input political posts and view classification results dynamically
- ▹ Implemented API integration to send user input to the backend for processing and retrieve real-time classification results
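The "tokenizing and generating sequences" step from the list above can be illustrated with a small plain-Python stand-in. It is a sketch under stated assumptions, not the project's pipeline (which used TensorFlow and Keras): the helper names `build_vocab` and `to_padded_sequences`, the reserved ids, whitespace tokenization, and right-padding are all hypothetical choices made for the example.

```python
from collections import Counter

PAD, OOV = 0, 1  # assumed reserved ids: padding and out-of-vocabulary


def build_vocab(texts, max_words=10000):
    """Map the most frequent words to integer ids starting at 2."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def to_padded_sequences(texts, vocab, max_len=8):
    """Encode each text as a fixed-length id list, truncating or right-padding."""
    sequences = []
    for text in texts:
        # Unknown words map to OOV; long posts are cut at max_len.
        ids = [vocab.get(word, OOV) for word in text.split()][:max_len]
        sequences.append(ids + [PAD] * (max_len - len(ids)))
    return sequences


if __name__ == "__main__":
    posts = ["support the bill", "oppose the bill now"]
    vocab = build_vocab(posts)
    for row in to_padded_sequences(posts + ["unseen words here"], vocab, max_len=5):
        print(row)
```

Fixed-length integer sequences like these are what an embedding layer followed by a Bi-LSTM would typically consume. The same vocabulary has to be reused at inference time so that user input is encoded consistently with the training data.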
Tech Stack
The Google Drive API enables developers to create applications that interact with Google Drive's cloud storage. This API allows for programmatic access to manage files and folders within a user's Google Drive.
Figma is a cloud-based design and prototyping tool for creating user interfaces for digital products like websites and apps, emphasizing real-time collaboration for teams. Key features include design, prototyping, and design system management.
Python is a high-level, versatile programming language known for its simplicity and readability, widely used in data science, AI, web development, and beyond.
Scikit-learn is a popular open-source Python library that provides simple and efficient tools for machine learning, including classification, regression, clustering, and model evaluation.
Google Colab is a free cloud-based platform that lets you write and run Python code in Jupyter notebooks, with built-in support for machine learning libraries and free GPU/TPU access.
HuggingFace Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer vision, audio, video, and multimodal tasks, for both inference and training.
spaCy is an open-source library for advanced Natural Language Processing (NLP) in Python.
React is a JavaScript library for building user interfaces with reusable components.
Flask is a lightweight WSGI web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications.
HTML (HyperText Markup Language) is the markup language for creating the structure of web pages.
CSS (Cascading Style Sheets) is the language for styling the presentation of HTML documents.
Vercel is a cloud platform that provides the tools and infrastructure for developers to build, deploy, and scale modern web applications, focusing on speed, developer experience, and global distribution.
Heroku is a cloud Platform as a Service (PaaS) that enables developers to build, run, and manage modern applications in the cloud without needing to manage the underlying infrastructure.