Final Year Project by Abdul Rafay Athar (Software Engineering)
Advanced Audio Processing Architecture
A deep dive into the machine learning pipeline that transforms raw audio signals into precise musical notation through multi-stage analysis.
Core Technology Stack
Supervised Learning Model
Custom-trained classification model that detects instrument types and validates guitar audio input before processing begins.
DeMucs Transformer
State-of-the-art source separation architecture using deep learning to isolate guitar tracks from vocals, drums, and bass in mixed audio.
Librosa & DSP
Advanced Digital Signal Processing (DSP) pipeline for the Short-Time Fourier Transform (STFT), Constant-Q Transform (CQT), and spectral analysis.
Music21 Framework
Computer-aided musicology library for symbolic music representation, handling complex rhythm quantization and music theory rules.
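Music21 handles rhythm quantization through its own stream and duration objects; as a minimal, dependency-free illustration of the underlying idea (the function name and parameters below are our own, not Music21 API), onset times can be snapped to a metrical grid like this:

```python
def quantize_onsets(onset_times, bpm=100, divisions=4):
    """Snap onset times (in seconds) to the nearest grid position on a
    quarter-note beat subdivided `divisions` times (4 = sixteenth notes)."""
    step = 60.0 / bpm / divisions
    return [round(t / step) * step for t in onset_times]

# Detected onsets at 0 s, 0.31 s and 0.58 s snap to an eighth-note
# grid at 120 BPM (grid step = 0.25 s).
grid = quantize_onsets([0.0, 0.31, 0.58], bpm=120, divisions=2)
# → [0.0, 0.25, 0.5]
```

Real quantizers also weigh tuplets and note durations, which is where Music21's theory rules come in.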
Processing Pipeline Architecture
Input Analysis & Classification
The system first acts as a gatekeeper, analyzing the spectral characteristics of the uploaded audio.
Technical Process
- Supervised Learning Classifier checks for guitar timbre
- Validates file integrity and format (WAV/MP3)
- Rejects non-musical or purely vocal inputs to ensure quality
- Standardizes the sample rate to 22.05 kHz for consistent processing
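The standardization step can be sketched as follows. A production pipeline would use band-limited resampling (e.g. librosa.load, which resamples to 22,050 Hz by default); the linear-interpolation helper below is our own simplification to keep the sketch dependency-free:

```python
import numpy as np

TARGET_SR = 22050  # sample rate assumed by the rest of the pipeline

def standardize_sample_rate(audio: np.ndarray, sr: int) -> np.ndarray:
    """Resample a mono signal to TARGET_SR via linear interpolation."""
    if sr == TARGET_SR:
        return audio
    n_out = int(round(len(audio) / sr * TARGET_SR))
    t_in = np.arange(len(audio)) / sr
    t_out = np.arange(n_out) / TARGET_SR
    return np.interp(t_out, t_in, audio)

# One second of a 440 Hz sine at 44.1 kHz becomes 22,050 samples.
x = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
y = standardize_sample_rate(x, 44100)
```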
Source Separation (DeMucs)
If the audio contains multiple instruments, we deploy the DeMucs Hybrid Transformer model.
Technical Process
- Separates audio into 4 stems: Drums, Bass, Vocals, and Other (which contains the guitar)
- Uses a U-Net-style encoder/decoder with cross-domain Transformer layers for temporal consistency
- Frequency-domain masking to cleanly isolate the guitar track
- Reconstructs the isolated guitar signal for pure analysis
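The masking bullet can be shown in isolation. In the real system the per-stem magnitude estimates come from the DeMucs network; the toy below uses known magnitudes purely to demonstrate how a Wiener-style soft mask pulls one source out of a mixture (all names here are ours):

```python
import numpy as np

def apply_soft_mask(mix_stft, target_mag, other_mag):
    """Wiener-style soft mask: weight each frequency bin by the fraction
    of its energy attributed to the target, keeping the mixture phase."""
    mask = target_mag / (target_mag + other_mag + 1e-10)
    return mask * mix_stft

# Toy mixture of two bin-aligned tones ("guitar" at bin 41, "bass" at
# bin 10) so the mask is exact and leakage-free.
n = 2048
t = np.arange(n)
guitar = np.sin(2 * np.pi * 41 * t / n)
bass = np.sin(2 * np.pi * 10 * t / n)
mix = np.fft.rfft(guitar + bass)

# In DeMucs these magnitude estimates would come from the network.
isolated = apply_soft_mask(mix, np.abs(np.fft.rfft(guitar)),
                           np.abs(np.fft.rfft(bass)))
recovered = np.fft.irfft(isolated)  # ≈ the guitar tone alone
```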
Spectral Feature Extraction
The isolated signal undergoes rigorous mathematical transformation to reveal its musical properties.
Technical Process
- Constant-Q Transform (CQT) maps frequencies to musical notes
- Onset Strength Envelope detection identifies note attacks
- Peak picking algorithms locate exact timing of each note
- Harmonic-Percussive Source Separation (HPSS) refines note clarity
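A minimal version of the peak-picking step (a simplified stand-in for librosa.util.peak_pick; the envelope, threshold, and gap values are illustrative only):

```python
import numpy as np

def pick_peaks(envelope, threshold=0.3, min_gap=5):
    """Indices of local maxima above `threshold`, at least `min_gap`
    frames apart."""
    peaks = []
    for i in range(1, len(envelope) - 1):
        if envelope[i] <= threshold:
            continue
        if envelope[i] < envelope[i - 1] or envelope[i] <= envelope[i + 1]:
            continue
        if peaks and i - peaks[-1] < min_gap:
            continue
        peaks.append(i)
    return peaks

# Toy onset-strength envelope with three note attacks and decay tails.
env = np.zeros(60)
for onset in (10, 25, 48):
    env[onset] = 1.0
    env[onset + 1:onset + 6] = np.linspace(0.6, 0.1, 5)

peaks = pick_peaks(env)  # → [10, 25, 48]
```

The decay tails stay above the threshold but are rejected because they are not local maxima, which is why peak picking is applied to the onset-strength envelope rather than raw energy.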
Pitch Tracking & Transcription
Converting raw frequency data into symbolic musical notation.
Technical Process
- Viterbi algorithm smooths pitch estimation paths
- Chroma feature analysis determines chord structures
- Rhythm quantization aligns notes to the nearest musical beat
- Dynamics processing estimates velocity and emphasis
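A toy version of the Viterbi smoothing step (our own simplification of what librosa.sequence.viterbi provides): each frame scores every candidate pitch, and a fixed penalty for switching pitch between frames suppresses single-frame glitches.

```python
import numpy as np

def viterbi_smooth(obs_logprob, switch_penalty=2.0):
    """Most likely pitch path given per-frame log-scores for each
    candidate pitch, with a fixed penalty for changing pitch."""
    n_frames, n_pitches = obs_logprob.shape
    score = obs_logprob[0].copy()
    back = np.zeros((n_frames, n_pitches), dtype=int)
    for t in range(1, n_frames):
        # trans[i, j]: score of being on pitch i and moving to pitch j
        trans = score[:, None] - switch_penalty * (1 - np.eye(n_pitches))
        back[t] = np.argmax(trans, axis=0)
        score = trans[back[t], np.arange(n_pitches)] + obs_logprob[t]
    path = [int(np.argmax(score))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Five frames over three candidate pitches; frame 2 is a noisy glitch
# that slightly favours pitch 2, but the smoothed path stays on pitch 1.
obs = np.log(np.array([
    [0.05, 0.90, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.40, 0.55],  # glitch frame
    [0.05, 0.90, 0.05],
    [0.05, 0.90, 0.05],
]))
path = viterbi_smooth(obs)  # → [1, 1, 1, 1, 1]
```

Switching pitches at the glitch frame would gain a little observation score but pay the transition penalty twice (out and back), so the stable path wins.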
Tablature Optimization & Rendering
The final stage maps musical notes to the physical constraints of a guitar fretboard.
Technical Process
- Pathfinding algorithm minimizes hand movement distance
- String preference logic avoids impossible fingerings
- ReportLab engine draws vector-based PDF tablature
- Embeds metadata and formatting for professional output
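The pathfinding idea can be condensed into a short dynamic program (standard tuning assumed; the cost here is total fret travel only, whereas the full optimizer also weighs string preference and fingering feasibility; all names are ours):

```python
# Open-string MIDI pitches in standard tuning, low E (40) to high E (64).
OPEN_STRINGS = [40, 45, 50, 55, 59, 64]
MAX_FRET = 15

def candidate_positions(midi_note):
    """All (string, fret) pairs that sound the given MIDI note."""
    return [(s, midi_note - open_pitch)
            for s, open_pitch in enumerate(OPEN_STRINGS)
            if 0 <= midi_note - open_pitch <= MAX_FRET]

def optimize_fingering(midi_notes):
    """Pick one position per note, minimizing total fret distance
    travelled between consecutive notes (dynamic programming)."""
    prev = {pos: (0, [pos]) for pos in candidate_positions(midi_notes[0])}
    for note in midi_notes[1:]:
        nxt = {}
        for pos in candidate_positions(note):
            cost, path = min((c + abs(pos[1] - p[1]), path)
                             for p, (c, path) in prev.items())
            nxt[pos] = (cost, path + [pos])
        prev = nxt
    return min(prev.values())[1]

# E4, F4, G4: several fingerings exist; the DP returns one with
# minimal total fret travel.
tab = optimize_fingering([64, 65, 67])
```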
Performance Metrics
Our models are trained on thousands of hours of guitar data, but transcription accuracy still varies with the acoustic complexity of the input.
System Accuracy & Constraints
Optimal Conditions
- Clean Electric/Acoustic Guitar
- Standard Tuning (EADGBE)
- Moderate Tempo (60-120 BPM)
Known Challenges
- Heavy Distortion / Fuzz effects
- Complex Jazz Chords (>4 notes)
- Extreme reverb or delay