SoundTab


Final Year Project by Abdul Rafay Athar (Software Engineering)

Advanced Audio Processing Architecture

A deep dive into the machine learning pipeline that transforms raw audio signals into precise musical notation through multi-stage analysis.

Core Technology Stack

Supervised Learning Model

Custom-trained classification model that detects instrument types and validates guitar audio input before processing begins.

DeMucs Transformer

State-of-the-art source separation architecture using deep learning to isolate guitar tracks from vocals, drums, and bass in mixed audio.

Librosa & DSP

Advanced Digital Signal Processing (DSP) pipeline for the Short-Time Fourier Transform (STFT), the Constant-Q Transform (CQT), and spectral analysis.

Music21 Framework

Computer-aided musicology library for symbolic music representation, handling complex rhythm quantization and music theory rules.

Processing Pipeline Architecture

1. Input Analysis & Classification

The system first acts as a gatekeeper, analyzing the spectral characteristics of the uploaded audio.

Technical Process

  • Supervised Learning Classifier checks for guitar timbre
  • Validates file integrity and format (WAV/MP3)
  • Rejects non-musical or purely vocal inputs to ensure quality
  • Standardizes the sample rate to 22.05 kHz for consistent processing
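As a sketch, the gatekeeping logic above might look like the following. The function names and the 0.5 threshold are illustrative; in the real system the probability would come from the trained timbre classifier:

```python
from pathlib import Path

TARGET_SR = 22_050                    # standard rate used throughout the pipeline
ALLOWED_FORMATS = {".wav", ".mp3"}    # accepted upload formats

def validate_upload(path: str, guitar_probability: float,
                    threshold: float = 0.5) -> bool:
    """Gatekeeper check: accept only WAV/MP3 files that the
    supervised classifier scored as likely guitar audio."""
    if Path(path).suffix.lower() not in ALLOWED_FORMATS:
        return False
    return guitar_probability >= threshold
```

An accepted file would then be loaded and resampled in one step, e.g. `librosa.load(path, sr=22050)`, which both decodes the audio and standardizes its sample rate.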

2. Source Separation (DeMucs)

If the audio contains multiple instruments, we deploy the DeMucs Hybrid Transformer model.

Technical Process

  • Separates the audio into four stems: Drums, Bass, Vocals, and Other (which contains the guitar)
  • Uses a U-Net-style encoder-decoder with a cross-domain Transformer bottleneck for temporal consistency
  • Processes both the waveform and spectrogram domains to cleanly isolate the guitar track
  • Reconstructs the isolated guitar signal for pure analysis
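One way to drive this stage is through the Demucs command-line interface. The sketch below builds that invocation (`htdemucs` is the Hybrid Transformer model name; the helper function names are illustrative):

```python
import shutil
import subprocess

def build_demucs_command(audio_path: str, out_dir: str = "separated") -> list[str]:
    """Build the Demucs CLI call that splits a mixed track into
    drums / bass / vocals / other stems; the guitar ends up in
    the 'other' stem, which downstream stages analyze."""
    return [
        "demucs",
        "-n", "htdemucs",   # Hybrid Transformer Demucs model
        "-o", out_dir,      # where the stem folders are written
        audio_path,
    ]

def separate(audio_path: str) -> None:
    """Run the separation, failing clearly if Demucs is not installed."""
    if shutil.which("demucs") is None:
        raise RuntimeError("demucs CLI not found; install with `pip install demucs`")
    subprocess.run(build_demucs_command(audio_path), check=True)
```

Calling Demucs as a subprocess keeps the heavyweight PyTorch model out of the main application process; the separated guitar stem is then read back from `out_dir` for analysis.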

3. Spectral Feature Extraction

The isolated signal undergoes rigorous mathematical transformation to reveal its musical properties.

Technical Process

  • Constant-Q Transform (CQT) maps frequencies to musical notes
  • Onset Strength Envelope detection identifies note attacks
  • Peak picking algorithms locate exact timing of each note
  • Harmonic-Percussive Source Separation (HPSS) refines note clarity
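The onset-strength idea can be illustrated with a minimal spectral-flux implementation in plain NumPy. This is a simplified stand-in for librosa's onset utilities; the thresholding and peak rule are illustrative:

```python
import numpy as np

def onset_strength(stft_mag: np.ndarray) -> np.ndarray:
    """Spectral-flux onset envelope: for each pair of consecutive
    frames (columns), sum only the positive magnitude increases,
    which spike when a new note attacks."""
    flux = np.diff(stft_mag, axis=1)
    return np.maximum(flux, 0.0).sum(axis=0)

def pick_peaks(env: np.ndarray, threshold: float) -> list[int]:
    """Local maxima above a threshold mark candidate note onsets."""
    peaks = []
    for t in range(1, len(env) - 1):
        if env[t] > threshold and env[t] >= env[t - 1] and env[t] > env[t + 1]:
            peaks.append(t)
    return peaks
```

The same envelope-then-peak-pick pattern applies whether the input magnitudes come from an STFT or a CQT; the CQT's geometrically spaced bins additionally align each bin with a semitone, which is what makes the frequency-to-note mapping direct.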

4. Pitch Tracking & Transcription

Converting raw frequency data into symbolic musical notation.

Technical Process

  • Viterbi algorithm smooths pitch estimation paths
  • Chroma feature analysis determines chord structures
  • Rhythm quantization aligns notes to the nearest musical beat
  • Dynamics processing estimates velocity and emphasis
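Rhythm quantization, for instance, reduces to snapping each detected onset time to the nearest point on a beat grid. A minimal sketch, assuming a known fixed tempo:

```python
def quantize_onsets(onset_times: list[float], tempo_bpm: float = 120.0,
                    subdivision: int = 4) -> list[float]:
    """Snap onset times (in seconds) to the nearest grid point,
    where each beat is divided into `subdivision` slots
    (4 = sixteenth-note resolution at the given tempo)."""
    beat = 60.0 / tempo_bpm          # seconds per beat
    grid = beat / subdivision        # seconds per grid slot
    return [round(t / grid) * grid for t in onset_times]
```

The real system also has to estimate the tempo and tolerate drift, but the core alignment step is this rounding onto a metrical grid, after which music21 can express the result as note values.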

5. Tablature Optimization & Rendering

The final stage maps musical notes to the physical constraints of a guitar fretboard.

Technical Process

  • Pathfinding algorithm minimizes hand movement distance
  • String preference logic avoids impossible fingerings
  • ReportLab engine draws vector-based PDF tablature
  • Embeds metadata and formatting for professional output
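The string/fret mapping can be sketched as follows, using a greedy version of the pathfinding idea (a full optimizer would run dynamic programming over the whole note sequence; the names here are illustrative):

```python
# Standard tuning, low E to high E, as MIDI note numbers (E2 = 40).
OPEN_STRINGS = [40, 45, 50, 55, 59, 64]

def candidate_positions(midi_note: int, max_fret: int = 24) -> list[tuple[int, int]]:
    """All (string, fret) pairs that produce the given pitch."""
    out = []
    for string, open_pitch in enumerate(OPEN_STRINGS):
        fret = midi_note - open_pitch
        if 0 <= fret <= max_fret:
            out.append((string, fret))
    return out

def choose_position(midi_note: int, prev_fret: int) -> tuple[int, int]:
    """Greedy step of the pathfinding idea: among playable positions,
    pick the one requiring the least hand movement."""
    return min(candidate_positions(midi_note),
               key=lambda sf: abs(sf[1] - prev_fret))
```

For example, E4 (MIDI 64) is playable on all six strings, and the chosen fret depends on where the hand currently sits; chord fingerings additionally need the string-preference constraints to rule out physically impossible shapes before ReportLab renders the final PDF.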

Performance Metrics

Our models are trained on thousands of hours of guitar data, but accuracy still depends on the acoustic complexity of the input.

System Accuracy & Constraints

Optimal Conditions
  • Clean Electric/Acoustic Guitar
  • Standard Tuning (EADGBE)
  • Moderate Tempo (60-120 BPM)

Known Challenges
  • Heavy Distortion / Fuzz effects
  • Complex Jazz Chords (>4 notes)
  • Extreme reverb or delay