
About the project:

This project predicts arousal — a measure of emotional intensity — from audio-derived features using machine learning models. It further explains the predictions through SHAP (SHapley Additive exPlanations), highlighting the relative importance of different features. The goal is to build an AI model targeted at arousal detection and to provide transparent insights into how feature contributions affect model decisions.
Features
Explores feature relationships with correlation heatmaps and scatter plots, and selects the top 10 features based on Pearson correlation with the arousal target.
Predicts emotional arousal levels using regression models (Linear Regression, Random Forest, XGBoost, MLP Regressor).
Applies GridSearchCV for hyperparameter tuning to improve predictive performance.
Uses SHAP to provide feature importance and explain model decisions.
Visualizes prediction outputs and feature contributions for transparent interpretation.
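The correlation-based feature selection listed above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual notebook code: the column names (`feat_0`, ...), the helper `top_k_by_pearson`, and the choice to rank by absolute correlation are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def top_k_by_pearson(features: pd.DataFrame, target: pd.Series, k: int = 10) -> pd.Series:
    """Return the k features with the largest absolute Pearson correlation to the target."""
    # corrwith computes the Pearson correlation of every column with the target
    correlations = features.corrwith(target)
    # Rank by magnitude so that strong negative correlations are kept as well
    order = correlations.abs().sort_values(ascending=False).index[:k]
    return correlations.loc[order]


# Synthetic stand-in data (hypothetical feature names, not the project's dataset)
rng = np.random.default_rng(0)
n_samples = 200
features = pd.DataFrame(
    rng.normal(size=(n_samples, 15)),
    columns=[f"feat_{i}" for i in range(15)],
)
# Hypothetical arousal target driven mainly by feat_0 (positive) and feat_1 (negative)
arousal = (
    2.0 * features["feat_0"]
    - 1.5 * features["feat_1"]
    + rng.normal(scale=0.5, size=n_samples)
)

selected = top_k_by_pearson(features, arousal, k=10)
print(selected)
```

Ranking by absolute value is one reasonable reading of "top 10 by Pearson correlation"; ranking by the signed value instead would drop features that are strongly but negatively related to arousal.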
What I Built
- ▹ Conducted exploratory data analysis (EDA) on audio-derived features using correlation heatmaps and scatter plots to identify the 10 features most strongly correlated with emotional arousal
- ▹ Built and compared four regression models (Linear Regression, Random Forest, XGBoost, MLP); selected the MLP as the final model based on its R² score
- ▹ Tuned hyperparameters using GridSearchCV; observed limited gains on the small dataset, which highlights the importance of model simplicity and data size
- ▹ Applied SHAP (SHapley Additive exPlanations) to interpret the MLP's predictions and identify the features with the most influence on arousal scores
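The model comparison and tuning steps above might look something like the sketch below. It runs on synthetic data from `make_regression`, so the scores say nothing about the project's results. The hyperparameter grid, the train/test split, and the scaling pipeline are assumptions for illustration, and XGBoost is left out so that the example depends on scikit-learn only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 10 selected features (not the project's data)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
# Standardize the synthetic target so the MLP trains stably
y = (y - y.mean()) / y.std()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # MLPs are sensitive to feature scale, so scale inputs inside a pipeline
    "mlp": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    ),
}

# Fit each model and record its R² on the held-out split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Hypothetical grid; keys use the step name that make_pipeline assigns to MLPRegressor
param_grid = {
    "mlpregressor__hidden_layer_sizes": [(16,), (32,)],
    "mlpregressor__alpha": [1e-4, 1e-2],
}
search = GridSearchCV(models["mlp"], param_grid, scoring="r2", cv=3)
search.fit(X_train, y_train)
scores["mlp_tuned"] = r2_score(y_test, search.predict(X_test))

for name, score in scores.items():
    print(f"{name}: R² = {score:.3f}")
print("best MLP parameters:", search.best_params_)
```

Comparing the `mlp` and `mlp_tuned` entries is one simple way to check whether tuning actually helps. On a small dataset the difference is often within the noise of the split.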
Tech Stack
Python is a high-level, versatile programming language known for its simplicity and readability, widely used in data science, AI, web development, and beyond.
Scikit-learn is a popular open-source Python library that provides simple and efficient tools for machine learning, including classification, regression, clustering, and model evaluation.
Google Colab is a free cloud-based platform that lets you write and run Python code in Jupyter notebooks, with built-in support for machine learning libraries and free GPU/TPU access.
XGBoost is an open-source software library that implements optimized distributed gradient boosting machine learning algorithms under the Gradient Boosting framework.
SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.
The Google Drive API enables developers to create applications that interact with Google Drive's cloud storage. This API allows for programmatic access to manage files and folders within a user's Google Drive.
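To make the Shapley-value idea behind SHAP (described above) more concrete, here is a small from-scratch computation of exact Shapley values for a single prediction. It deliberately does not use the SHAP library, which the project itself relies on. The toy model, its feature names (`loudness`, `tempo`, `pitch`), and the all-zero baseline are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i when added to this coalition
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_model(features):
    """Hypothetical arousal model with one interaction term."""
    loudness, tempo, pitch = features
    return 0.5 * loudness + 0.2 * tempo + 0.1 * loudness * pitch


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # approximately [1.3, 0.2, 0.3]

# The contributions add up to the gap between the prediction and the baseline output
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The interaction term `0.1 * loudness * pitch` contributes 0.6 here and is split evenly between the two features involved. This is the "credit allocation" behaviour that SHAP approximates efficiently for larger models.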