Input Audio (.wav)
        │
Mel-Spectrogram (128 mel bins × ~1292 frames)
        │
┌─────────────────────────────┐
│  CNN Block × 3              │  Conv2d → BatchNorm → ReLU → MaxPool
│  Channels: [1→32→64→128]    │  Extracts local ...
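
To make the CNN stage of the diagram concrete, the following is a minimal shape trace through the three blocks, not the project's actual model code. It assumes details the diagram does not state: each Conv2d preserves height and width (for example a 3×3 kernel with padding 1) and each MaxPool uses a 2×2 window with stride 2. The function name `trace_cnn_shapes` is hypothetical, and 1292 is used as the nominal frame count for the "~1292" in the diagram.

```python
# Shape trace for the three CNN blocks in the diagram.
# Assumptions (not stated in the diagram): each Conv2d keeps height and width
# unchanged (e.g. 3x3 kernel, padding 1), and each MaxPool uses a 2x2 window
# with stride 2, which rounds odd sizes down.

def trace_cnn_shapes(n_mels=128, n_frames=1292, channels=(1, 32, 64, 128), pool=2):
    """Return (channels, height, width) for the input and after each block."""
    height, width = n_mels, n_frames
    shapes = [(channels[0], height, width)]  # stage 0 is the input spectrogram
    for out_channels in channels[1:]:
        # Conv2d -> BatchNorm -> ReLU leave height and width as they are
        # (under the padding assumption); MaxPool divides both by `pool`.
        height, width = height // pool, width // pool
        shapes.append((out_channels, height, width))
    return shapes


if __name__ == "__main__":
    for stage, shape in enumerate(trace_cnn_shapes()):
        print(f"stage {stage}: {shape}")
```

Under these assumptions the feature map goes from (1, 128, 1292) to (128, 16, 161) after the third block. A different pooling size or unpadded convolutions would give different numbers, so treat the trace as a way to check a configuration rather than as a statement of the real one.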