🎙️ Mega-ASR — robust ASR in your browser

An INT4-quantized ONNX export of Mega-ASR (1.7B parameters) that runs entirely on your device via onnxruntime-web and WebGPU. The first load fetches ~2 GB of model weights, which the browser caches for subsequent runs. The models are hosted at Reza2kn/mega-asr-onnx.
Try a noisy example
Load the model, pick an audio clip, and hit Transcribe.
How agreement is computed

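Hypothesis and reference are lowercased and stripped of punctuation. WER is the word-level Levenshtein distance divided by the number of reference words; because WER can exceed 1 when the hypothesis contains many extra words, agreement is clamped: agreement = max(0, 1 − WER) × 100%. Bands: ≥70%, 50–70%, 25–50%, <25%.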
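
The computation described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the demo's actual source: the function names (normalize, wordEditDistance, agreementPercent, agreementBand) are hypothetical, the punctuation set is an assumed ASCII subset, and the handling of an empty reference and of scores that fall exactly on a band boundary are assumptions.

```typescript
// Lowercase, strip punctuation, and split into words.
// Assumption: only this ASCII punctuation set is removed.
function normalize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"'()\[\]{}\-]/g, "")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Word-level Levenshtein distance (substitutions, insertions, deletions),
// computed with two rolling rows of the dynamic-programming table.
function wordEditDistance(ref: string[], hyp: string[]): number {
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// agreement = max(0, 1 - WER) * 100, with WER = distance / reference length.
function agreementPercent(reference: string, hypothesis: string): number {
  const ref = normalize(reference);
  const hyp = normalize(hypothesis);
  // Assumption: an empty reference agrees only with an empty hypothesis.
  if (ref.length === 0) return hyp.length === 0 ? 100 : 0;
  const wer = wordEditDistance(ref, hyp) / ref.length;
  return Math.max(0, 1 - wer) * 100;
}

// Map a score to one of the four bands.
// Assumption: the lower bound of each band is inclusive.
function agreementBand(percent: number): string {
  if (percent >= 70) return ">=70%";
  if (percent >= 50) return "50-70%";
  if (percent >= 25) return "25-50%";
  return "<25%";
}
```

For example, the reference "The quick brown fox." against the hypothesis "the quick brown box" has one substitution in four reference words, so WER is 0.25 and the agreement is 75%.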

About this demo
  • Loads three ONNX files (audio encoder, decoder prefill, and decoder step), the Qwen3 tokenizer, and an embedding table, all directly from the HF Hub.
  • Audio is resampled to 16 kHz via the Web Audio API, then log-mel features (128 bins, Whisper-style) are extracted in pure JS.
  • Inference runs on WebGPU where available and falls back to WASM on the CPU otherwise.
  • First load downloads ~2 GB. Subsequent transcriptions reuse the browser cache.
  • Max audio per pass: 30 seconds (longer audio is truncated to the first 30 s).
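
The 30-second limit can be illustrated with a short TypeScript sketch that operates on audio already resampled to 16 kHz. The names here (truncateToMaxDuration, melFrameCount) are hypothetical and not taken from the demo's code. The 160-sample hop (10 ms) is an assumption based on the usual Whisper front end; the page only says the features are "Whisper-style".

```typescript
const SAMPLE_RATE = 16000; // Hz, the rate the audio is resampled to
const MAX_SECONDS = 30;    // maximum audio per pass
const HOP_LENGTH = 160;    // assumption: 10 ms hop, as in Whisper's front end

// Keep only the first 30 s of mono PCM samples; shorter clips pass through.
function truncateToMaxDuration(samples: Float32Array): Float32Array {
  const maxSamples = SAMPLE_RATE * MAX_SECONDS; // 480,000 samples
  return samples.length > maxSamples
    ? samples.subarray(0, maxSamples)
    : samples;
}

// Approximate number of log-mel frames for a clip, one frame per hop.
function melFrameCount(numSamples: number): number {
  return Math.floor(numSamples / HOP_LENGTH);
}
```

Under these assumptions a full 30-second pass holds 480,000 samples and yields 3,000 frames of 128 mel bins each.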