🎙️ Mega-ASR — robust ASR in your browser

INT4 ONNX of Mega-ASR (1.7B params) running entirely on your device via onnxruntime-web + WebGPU. First load fetches ~2 GB of model weights (cached by the browser for subsequent runs). Models hosted at Reza2kn/mega-asr-onnx.


Audio (any format)

Force language (auto-detect can fail at INT4)

Reference transcript (optional)

Try a noisy example

Result

Load the model, pick an audio clip, and hit Transcribe.

How agreement is computed

Hypothesis and reference are lowercased and stripped of punctuation. The word-level Levenshtein distance, divided by the number of reference words, gives WER; agreement = max(0, 1 − WER) × 100%. Bands: ≥70%, 50-70%, 25-50%, <25%.
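The agreement score described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual source: the function names (`normalizeWords`, `wordEditDistance`, `agreementPercent`, `agreementBand`) are hypothetical, the punctuation set is an assumed ASCII subset, and the handling of an empty reference is a guess.

```typescript
// Lowercase, remove punctuation, split on whitespace.
// Assumption: only this ASCII punctuation set is stripped.
function normalizeWords(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"'()\[\]{}\-]/g, "")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Word-level Levenshtein distance (substitutions, insertions, deletions),
// computed with two rolling rows of the dynamic-programming table.
function wordEditDistance(ref: string[], hyp: string[]): number {
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// agreement = max(0, 1 - WER) * 100, with WER = distance / reference length.
function agreementPercent(reference: string, hypothesis: string): number {
  const ref = normalizeWords(reference);
  const hyp = normalizeWords(hypothesis);
  // Assumption: an empty reference only "agrees" with an empty hypothesis.
  if (ref.length === 0) return hyp.length === 0 ? 100 : 0;
  const wer = wordEditDistance(ref, hyp) / ref.length;
  return Math.max(0, 1 - wer) * 100;
}

// Map a percentage onto the four bands listed above.
function agreementBand(percent: number): string {
  if (percent >= 70) return "≥70%";
  if (percent >= 50) return "50-70%";
  if (percent >= 25) return "25-50%";
  return "<25%";
}
```

For example, with the reference "one two three four" and the hypothesis "one too three", there is one substitution and one deletion against four reference words, so WER is 0.5 and agreement is 50%, which lands in the 50-70% band. Because WER can exceed 1 when the hypothesis is much longer than the reference, the `max(0, …)` clamp keeps the score from going negative.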

About this demo

Loads three ONNX files (audio encoder, decoder prefill, and decoder step), the Qwen3 tokenizer, and an embedding table, all directly from the HF Hub.
Audio is resampled to 16 kHz via the Web Audio API, then log-mel features (128 bins, Whisper-style) are extracted in pure JS.
WebGPU inference where available; falls back to WASM CPU.
First load downloads ~2 GB. Subsequent transcriptions reuse the browser cache.
Max audio per pass: 30 seconds (longer audio is truncated to the first 30 s).
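
Two of the points above, the 30-second cap on 16 kHz audio and the WebGPU-then-WASM preference, can be sketched as below. This is an illustrative sketch under assumptions rather than the demo's code: `truncateToMaxDuration` and `preferredExecutionProviders` are hypothetical helper names, and the audio is assumed to be already resampled to mono 16 kHz samples in a `Float32Array`.

```typescript
const TARGET_SAMPLE_RATE = 16000; // Hz, after resampling
const MAX_SECONDS = 30;           // max audio per pass

// Keep only the first `maxSeconds` of audio; shorter clips pass through unchanged.
// subarray returns a view on the same buffer, so no samples are copied.
function truncateToMaxDuration(
  samples: Float32Array,
  sampleRate: number = TARGET_SAMPLE_RATE,
  maxSeconds: number = MAX_SECONDS
): Float32Array {
  const maxSamples = Math.floor(sampleRate * maxSeconds);
  return samples.length > maxSamples ? samples.subarray(0, maxSamples) : samples;
}

// Ordered backend preference: WebGPU first when present, WASM (CPU) as fallback.
// The caller decides how WebGPU availability is detected.
function preferredExecutionProviders(hasWebGPU: boolean): string[] {
  return hasWebGPU ? ["webgpu", "wasm"] : ["wasm"];
}
```

At 16 kHz, 30 seconds corresponds to 480,000 samples, so a 45-second clip would be cut to that length. In a browser, `hasWebGPU` could plausibly be derived from a check such as `"gpu" in navigator`, and the resulting list passed as the `executionProviders` session option when creating an onnxruntime-web `InferenceSession`; whether the demo does exactly this is an assumption.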