Smart-Home Audio Keyword-Spotting

Sep 2025 - Dec 2025
Python · PyTorch · TensorFlow · Raspberry Pi · Embedded Systems · Machine Learning · Audio Processing · MFCC · CNN · LSTM · GPIO · I²C · librosa
View on GitHub

Overview

An embedded keyword-spotting system that controls peripherals based on voice commands. This project implements a multi-class audio classifier on a Raspberry Pi Zero 2 W single-board computer, creating a prototype smart-home device that recognizes spoken commands and executes them through peripheral components.

The system recognizes 9 classes: "Red", "Green", "Blue", "White", and "Off" for RGB LED control; "Time" and "Temperature" for the LCD display, which reads from an RTC chip and a temperature sensor; plus "Noise" and "Unknown Command" for robustness.
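
As a rough illustration of the keyword-to-peripheral dispatch, the sketch below uses gpiozero; the GPIO pin numbers and the lcd_show/read_temp helpers are hypothetical stand-ins for the project's actual wiring and drivers, not its real code.

```python
from datetime import datetime
from gpiozero import RGBLED  # runs on the Pi; needs an RGB LED wired to GPIO

# Hypothetical pin assignments -- the real wiring may differ.
led = RGBLED(red=17, green=27, blue=22)

COLOR_MAP = {
    "red":   (1, 0, 0),
    "green": (0, 1, 0),
    "blue":  (0, 0, 1),
    "white": (1, 1, 1),
    "off":   (0, 0, 0),
}

def lcd_show(text: str) -> None:
    # Placeholder: the real system writes to an I2C character LCD.
    print(f"[LCD] {text}")

def read_temp() -> float:
    # Placeholder: the real system reads an I2C temperature sensor.
    return 21.5

def handle_keyword(label: str) -> None:
    """Dispatch one recognized keyword to the matching peripheral action."""
    if label in COLOR_MAP:
        led.color = COLOR_MAP[label]                       # drive the RGB LED
    elif label == "time":
        lcd_show(datetime.now().strftime("%H:%M:%S"))      # RTC-backed in the real system
    elif label == "temperature":
        lcd_show(f"{read_temp():.1f} C")
    # "noise" and "unknown" deliberately trigger no action
```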

The project involved a complete machine learning pipeline: recording audio from multiple speakers, chopping utterances, applying data augmentation (low-pass, high-pass, and band-pass filtering; pitch-shifting; noise addition; dynamic range compression), extracting Mel-Frequency Cepstral Coefficients (MFCCs) as features, training CNN and LSTM models in PyTorch and TensorFlow, and compressing the model for deployment on the Raspberry Pi.
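
A minimal sketch of the augmentation and MFCC-extraction step, assuming librosa, 16 kHz mono clips, and 13 MFCCs; only pitch-shifting and noise addition are shown, and the parameters and the utterance.wav filename are illustrative rather than the project's actual settings.

```python
import numpy as np
import librosa

SR = 16_000       # assumed sample rate
N_MFCC = 13       # illustrative MFCC count

def augment(y: np.ndarray, sr: int = SR) -> list[np.ndarray]:
    """Generate a few augmented variants of one utterance."""
    variants = [y]
    variants.append(librosa.effects.pitch_shift(y, sr=sr, n_steps=2))  # shift pitch up
    variants.append(y + 0.005 * np.random.randn(len(y)))               # additive noise
    return variants

def extract_mfcc(y: np.ndarray, sr: int = SR) -> np.ndarray:
    """MFCC matrix of shape (n_mfcc, frames) for one clip."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)

y, sr = librosa.load("utterance.wav", sr=SR, mono=True)
features = [extract_mfcc(v) for v in augment(y)]
```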

The inference system captures 5-second audio buffers, trims them to the model's ~1.8 s input window, extracts MFCCs, and feeds the features to the model for real-time classification. The CNN model achieved 99.79% validation accuracy and ~98.5% test accuracy.
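
A sketch of that capture-trim-classify loop, assuming the sounddevice library for recording and a saved TorchScript model; the label order, the kws_cnn.pt path, and the silence-trimming threshold are assumptions, not details from the project.

```python
import numpy as np
import sounddevice as sd
import librosa
import torch

SR = 16_000
BUFFER_S, WINDOW_S = 5.0, 1.8
LABELS = ["red", "green", "blue", "white", "off",
          "time", "temperature", "noise", "unknown"]

model = torch.jit.load("kws_cnn.pt")   # hypothetical compressed model file
model.eval()

def capture_and_classify() -> str:
    # Record a 5 s mono buffer from the default microphone.
    buf = sd.rec(int(BUFFER_S * SR), samplerate=SR, channels=1, dtype="float32")
    sd.wait()
    y = buf.squeeze()

    # Trim leading/trailing silence, then crop or pad to the ~1.8 s window.
    y, _ = librosa.effects.trim(y, top_db=20)
    n = int(WINDOW_S * SR)
    y = np.pad(y, (0, max(0, n - len(y))))[:n]

    # MFCCs -> (batch, channel, n_mfcc, frames) tensor -> class prediction.
    mfcc = librosa.feature.mfcc(y=y, sr=SR, n_mfcc=13)
    x = torch.from_numpy(mfcc).unsqueeze(0).unsqueeze(0)
    with torch.no_grad():
        pred = model(x).argmax(dim=1).item()
    return LABELS[pred]
```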

Project Context

This project was developed as part of a Machine Learning course (EE 475) at Northwestern University. The goal was to create an embedded keyword-spotting system that could recognize voice commands and control hardware peripherals in real-time.

Key Features

  • Real-time audio keyword recognition on Raspberry Pi Zero 2 W
  • 9-class classification: RGB LED commands (Red, Green, Blue, White, Off), LCD commands (Time, Temperature), plus Noise and Unknown classes for robustness
  • Hardware integration: RGB LED control, LCD display, RTC for time, temperature sensor
  • Complete ML pipeline: data collection, augmentation, feature extraction (MFCCs), model training, and deployment (a CNN sketch follows this list)
  • Data augmentation techniques: LP/HP/BP filters, pitch-shifting, noise addition, dynamic compression
  • High accuracy: 99.79% validation accuracy, ~98.5% test accuracy with CNN model
  • Real-time inference with 5s buffer capture, trimmed to ~1.8s for optimal model input
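
The trained model's exact architecture isn't listed above, so the following is only a plausible PyTorch sketch of a small CNN over the MFCC matrix; the layer widths and the (13 × 57) input shape are guesses for illustration.

```python
import torch
import torch.nn as nn

class KeywordCNN(nn.Module):
    """Small 2-D CNN over an (n_mfcc x frames) MFCC matrix; 9 output classes."""
    def __init__(self, n_classes: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # shape-agnostic pooling keeps the head small
            nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

# Dummy batch: (batch, channel, n_mfcc, frames)
logits = KeywordCNN()(torch.randn(1, 1, 13, 57))
print(logits.shape)  # torch.Size([1, 9])
```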

Challenges

  • Optimizing model size for embedded deployment on the Raspberry Pi Zero 2 W (see the quantization sketch after this list)
  • Handling real-time audio processing with latency constraints
  • Managing data collection and augmentation across multiple speakers
  • Integrating hardware peripherals (LED, LCD, sensors) with the ML inference pipeline
  • Balancing model accuracy with computational efficiency for edge deployment
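
The writeup doesn't specify how the model was compressed, but post-training dynamic quantization is one common option for a CPU-only board like the Pi Zero 2 W; the stand-in model and output filename below are purely illustrative.

```python
import torch
import torch.nn as nn

# Stand-in for the trained keyword model (any nn.Module with Linear layers works).
model = nn.Sequential(nn.Flatten(), nn.Linear(13 * 57, 64), nn.ReLU(), nn.Linear(64, 9))
model.eval()

# Dynamic quantization stores Linear weights as int8, shrinking the saved
# file and speeding up CPU inference on the Pi.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
torch.save(quantized.state_dict(), "kws_int8.pt")  # hypothetical filename
```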

Results

  • Successfully deployed CNN model achieving 99.79% validation accuracy
  • Real-time keyword recognition with sub-second latency
  • Robust system handling noise and unknown commands
  • Complete end-to-end pipeline from data collection to hardware control