Final Year Project by Abdul Rafay Athar (Software Engineering)
Advanced Audio Processing Architecture
A deep dive into the machine learning pipeline that transforms raw audio signals into precise musical notation through multi-stage analysis.
Core Technology Stack
Supervised Learning Model
Custom-trained classification model that detects instrument types and validates guitar audio input before processing begins.
DeMucs Transformer
State-of-the-art source separation architecture using deep learning to isolate guitar tracks from vocals, drums, and bass in mixed audio.
Librosa & DSP
Advanced Digital Signal Processing (DSP) pipeline for the Short-Time Fourier Transform (STFT), Constant-Q Transform (CQT), and spectral analysis.
Music21 Framework
Computer-aided musicology library for symbolic music representation, handling complex rhythm quantization and music theory rules.
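Music21 handles rhythm quantization through its own stream and duration objects; as a minimal, dependency-free illustration of the underlying idea (the function name and parameters below are our own, not Music21 API), onset times can be snapped to a metrical grid like this:

```python
def quantize_onsets(onset_times, bpm=100, divisions=4):
    """Snap onset times (in seconds) to the nearest grid position on a
    quarter-note beat subdivided `divisions` times (4 = sixteenth notes)."""
    step = 60.0 / bpm / divisions
    return [round(t / step) * step for t in onset_times]

# Detected onsets at 0 s, 0.31 s and 0.58 s snap to an eighth-note
# grid at 120 BPM (grid step = 0.25 s).
grid = quantize_onsets([0.0, 0.31, 0.58], bpm=120, divisions=2)
# → [0.0, 0.25, 0.5]
```

Real quantizers also weigh tuplets and note durations, which is where Music21's theory rules come in.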
Processing Pipeline Architecture
Input Analysis & Classification
The system first acts as a gatekeeper, analyzing the spectral characteristics of the uploaded audio.
Technical Process
- Supervised Learning Classifier checks for guitar timbre
- Validates file integrity and format (WAV/MP3)
- Rejects non-musical or purely vocal inputs to ensure quality
- Standardizes the sample rate to 22.05 kHz for consistent processing
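The standardization step can be sketched as follows. A production pipeline would use band-limited resampling (e.g. librosa.load, which resamples to 22,050 Hz by default); the linear-interpolation helper below is our own simplification to keep the sketch dependency-free:

```python
import numpy as np

TARGET_SR = 22050  # sample rate assumed by the rest of the pipeline

def standardize_sample_rate(audio: np.ndarray, sr: int) -> np.ndarray:
    """Resample a mono signal to TARGET_SR via linear interpolation."""
    if sr == TARGET_SR:
        return audio
    n_out = int(round(len(audio) / sr * TARGET_SR))
    t_in = np.arange(len(audio)) / sr
    t_out = np.arange(n_out) / TARGET_SR
    return np.interp(t_out, t_in, audio)

# One second of a 440 Hz sine at 44.1 kHz becomes 22,050 samples.
x = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
y = standardize_sample_rate(x, 44100)
```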
Source Separation (DeMucs)
If the audio contains multiple instruments, we deploy the DeMucs Hybrid Transformer model.
Technical Process
- Separates audio into 4 stems: Drums, Bass, Vocals, and Other (which contains the guitar)
- Uses a U-Net-style encoder/decoder with cross-domain Transformer layers for temporal consistency
- Frequency-domain masking to cleanly isolate the guitar track
- Reconstructs the isolated guitar signal for pure analysis
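The masking bullet can be shown in isolation. In the real system the per-stem magnitude estimates come from the DeMucs network; the toy below uses known magnitudes purely to demonstrate how a Wiener-style soft mask pulls one source out of a mixture (all names here are ours):

```python
import numpy as np

def apply_soft_mask(mix_stft, target_mag, other_mag):
    """Wiener-style soft mask: weight each frequency bin by the fraction
    of its energy attributed to the target, keeping the mixture phase."""
    mask = target_mag / (target_mag + other_mag + 1e-10)
    return mask * mix_stft

# Toy mixture of two bin-aligned tones ("guitar" at bin 41, "bass" at
# bin 10) so the mask is exact and leakage-free.
n = 2048
t = np.arange(n)
guitar = np.sin(2 * np.pi * 41 * t / n)
bass = np.sin(2 * np.pi * 10 * t / n)
mix = np.fft.rfft(guitar + bass)

# In DeMucs these magnitude estimates would come from the network.
isolated = apply_soft_mask(mix, np.abs(np.fft.rfft(guitar)),
                           np.abs(np.fft.rfft(bass)))
recovered = np.fft.irfft(isolated)  # ≈ the guitar tone alone
```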
Spectral Feature Extraction
The isolated signal undergoes rigorous mathematical transformation to reveal its musical properties.
Technical Process
- Constant-Q Transform (CQT) maps frequencies to musical notes
- Onset Strength Envelope detection identifies note attacks
- Peak picking algorithms locate exact timing of each note
- Harmonic-Percussive Source Separation (HPSS) refines note clarity
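A minimal version of the peak-picking step (a simplified stand-in for librosa.util.peak_pick; the envelope, threshold, and gap values are illustrative only):

```python
import numpy as np

def pick_peaks(envelope, threshold=0.3, min_gap=5):
    """Indices of local maxima above `threshold`, at least `min_gap`
    frames apart."""
    peaks = []
    for i in range(1, len(envelope) - 1):
        if envelope[i] <= threshold:
            continue
        if envelope[i] < envelope[i - 1] or envelope[i] <= envelope[i + 1]:
            continue
        if peaks and i - peaks[-1] < min_gap:
            continue
        peaks.append(i)
    return peaks

# Toy onset-strength envelope with three note attacks and decay tails.
env = np.zeros(60)
for onset in (10, 25, 48):
    env[onset] = 1.0
    env[onset + 1:onset + 6] = np.linspace(0.6, 0.1, 5)

peaks = pick_peaks(env)  # → [10, 25, 48]
```

The decay tails stay above the threshold but are rejected because they are not local maxima, which is why peak picking is applied to the onset-strength envelope rather than raw energy.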
Pitch Tracking & Transcription
Converting raw frequency data into symbolic musical notation.
Technical Process
- Viterbi algorithm smooths pitch estimation paths
- Chroma feature analysis determines chord structures
- Rhythm quantization aligns notes to the nearest musical beat
- Dynamics processing estimates velocity and emphasis
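A toy version of the Viterbi smoothing step (our own simplification of what librosa.sequence.viterbi provides): each frame scores every candidate pitch, and a fixed penalty for switching pitch between frames suppresses single-frame glitches.

```python
import numpy as np

def viterbi_smooth(obs_logprob, switch_penalty=2.0):
    """Most likely pitch path given per-frame log-scores for each
    candidate pitch, with a fixed penalty for changing pitch."""
    n_frames, n_pitches = obs_logprob.shape
    score = obs_logprob[0].copy()
    back = np.zeros((n_frames, n_pitches), dtype=int)
    for t in range(1, n_frames):
        # trans[i, j]: score of being on pitch i and moving to pitch j
        trans = score[:, None] - switch_penalty * (1 - np.eye(n_pitches))
        back[t] = np.argmax(trans, axis=0)
        score = trans[back[t], np.arange(n_pitches)] + obs_logprob[t]
    path = [int(np.argmax(score))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Five frames over three candidate pitches; frame 2 is a noisy glitch
# that slightly favours pitch 2, but the smoothed path stays on pitch 1.
obs = np.log(np.array([
    [0.05, 0.90, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.40, 0.55],  # glitch frame
    [0.05, 0.90, 0.05],
    [0.05, 0.90, 0.05],
]))
path = viterbi_smooth(obs)  # → [1, 1, 1, 1, 1]
```

Switching pitches at the glitch frame would gain a little observation score but pay the transition penalty twice (out and back), so the stable path wins.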
Tablature Optimization & Rendering
The final stage maps musical notes to the physical constraints of a guitar fretboard.
Technical Process
- Pathfinding algorithm minimizes hand movement distance
- String preference logic avoids impossible fingerings
- ReportLab engine draws vector-based PDF tablature
- Embeds metadata and formatting for professional output
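The pathfinding idea can be condensed into a short dynamic program (standard tuning assumed; the cost here is total fret travel only, whereas the full optimizer also weighs string preference and fingering feasibility; all names are ours):

```python
# Open-string MIDI pitches in standard tuning, low E (40) to high E (64).
OPEN_STRINGS = [40, 45, 50, 55, 59, 64]
MAX_FRET = 15

def candidate_positions(midi_note):
    """All (string, fret) pairs that sound the given MIDI note."""
    return [(s, midi_note - open_pitch)
            for s, open_pitch in enumerate(OPEN_STRINGS)
            if 0 <= midi_note - open_pitch <= MAX_FRET]

def optimize_fingering(midi_notes):
    """Pick one position per note, minimizing total fret distance
    travelled between consecutive notes (dynamic programming)."""
    prev = {pos: (0, [pos]) for pos in candidate_positions(midi_notes[0])}
    for note in midi_notes[1:]:
        nxt = {}
        for pos in candidate_positions(note):
            cost, path = min((c + abs(pos[1] - p[1]), path)
                             for p, (c, path) in prev.items())
            nxt[pos] = (cost, path + [pos])
        prev = nxt
    return min(prev.values())[1]

# E4, F4, G4: several fingerings exist; the DP returns one with
# minimal total fret travel.
tab = optimize_fingering([64, 65, 67])
```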
Performance Metrics
Our models are trained on thousands of hours of guitar data, but transcription accuracy still varies with the acoustic complexity of the input.
System Accuracy & Constraints
Optimal Conditions
- Clean Electric/Acoustic Guitar
- Standard Tuning (EADGBE)
- Moderate Tempo (60-120 BPM)
Known Challenges
- Heavy Distortion / Fuzz effects
- Complex Jazz Chords (>4 notes)
- Extreme reverb or delay