Browser-Local Piper TTS: Smaller, Faster, Built for Mobile (with a Romanian-accent demo)

red-reddington · May 17, 2026, 12:12am

[Following up on my Kokoro TTS demo. Same idea, different engine.]

Kokoro is fantastic on desktop, but not so much on phones. I needed something small and fast enough to run anywhere, and Piper fits that brief. Quality isn’t Kokoro-tier, but when latency matters more than fidelity, it’s the better tool. Piper also has much broader language coverage, with ~80 voices across ~30 languages.

A project I’m working on has Eastern European characters who need to sound the part, and that turned out to be surprisingly hard with current browser TTS. Voices are either fully native or fully neutral, with nothing in between. You can’t tell an English voice “read this with a Romanian accent”…

My workaround: run the English text through a phonemizer to get IPA, map the IPA onto Romanian spelling, then feed the result to Piper’s Romanian voice. The output is English with a thick Romanian accent, essentially what a voice actor would do, and exactly what I needed.
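A minimal sketch of the IPA-to-Romanian mapping step in TypeScript, assuming `phonemize` emits plain IPA symbols (no tie bars such as t͡ʃ). Only ʃ → ș and tʃ → ci come from this post; every other table entry is an illustrative guess, and Romanian context rules (e.g. c and g changing sound before e/i) are ignored. `IPA_TO_RO` and `ipaToRomanian` are hypothetical names, not the demo’s code.

```typescript
// Illustrative IPA -> Romanian spelling table. Only "ʃ" -> "ș" and "tʃ" -> "ci"
// come from the post; all other entries are hypothetical guesses.
const IPA_TO_RO = new Map<string, string>([
  // Affricates and diphthongs (multi-symbol keys)
  ["tʃ", "ci"],
  ["dʒ", "gi"],
  ["aɪ", "ai"],
  ["eɪ", "ei"],
  ["oʊ", "ou"],
  ["aʊ", "au"],
  ["ɔɪ", "oi"],
  // Consonants
  ["ʃ", "ș"],
  ["ʒ", "j"],
  ["θ", "t"],
  ["ð", "d"],
  ["ŋ", "ng"],
  ["ɹ", "r"],
  ["j", "i"],
  ["w", "u"],
  ["ɡ", "g"],
  // Vowels
  ["æ", "e"],
  ["ɛ", "e"],
  ["ɪ", "i"],
  ["ʊ", "u"],
  ["ʌ", "a"],
  ["ɑ", "a"],
  ["ɔ", "o"],
  ["ə", "ă"],
  ["ɜ", "ă"],
  ["ɚ", "ăr"],
  // Stress and length marks produce no letters
  ["ˈ", ""],
  ["ˌ", ""],
  ["ː", ""],
]);

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One alternation with the longest keys first, so "tʃ" matches before "ʃ" alone.
// A single pass also means output letters (e.g. the "j" produced for "ʒ")
// are never re-mapped by a later entry.
const IPA_PATTERN = new RegExp(
  Array.from(IPA_TO_RO.keys())
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|"),
  "gu",
);

function ipaToRomanian(ipa: string): string {
  // Symbols without an entry (p, t, k, s, l, m, n, h, spaces, ...) pass through unchanged.
  return ipa.replace(IPA_PATTERN, (sym) => IPA_TO_RO.get(sym) ?? sym);
}
```

Building one alternation and replacing in a single pass avoids an easy bug with a chain of separate `.replace()` calls, where an earlier rule’s output (the j for ʒ) gets rewritten by a later rule (j → i).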

The demo also speaks the original English with an English voice.

In both cases, the active word is highlighted in sync across both textboxes. That’s handy for debugging the IPA mapping and makes the whole thing easier to follow.

Stack:

phonemize (npm) converts English words to IPA in the browser
A short custom function (about 40 lines of regex) maps each IPA symbol to its closest Romanian spelling, e.g., ʃ becomes ș and tʃ becomes ci.
@diffusionstudio/vits-web runs the Piper voice model
Word-by-word highlighting needs to know when each word is spoken, but Piper just hands back a finished WAV file with no timing data, so I estimate the word timings heuristically. It’s not perfect, but close enough that the highlight feels in sync.
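The post doesn’t say which heuristic the demo uses. One common approach, sketched here as an assumption rather than the demo’s actual code, is to give each word a share of the clip’s duration proportional to its letter count, with extra weight after punctuation to cover pauses. `estimateWordTimings`, `activeWordIndex`, and the `PAUSE_WEIGHT` values are hypothetical.

```typescript
// Hypothetical word-timing heuristic; not the demo's actual code.
interface WordTiming {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

// Extra weight for trailing punctuation, standing in for the pause after it.
// These numbers are made-up tuning values.
const PAUSE_WEIGHT: Record<string, number> = {
  ",": 3, ";": 3, ":": 3, ".": 6, "!": 6, "?": 6,
};

function estimateWordTimings(text: string, durationSec: number): WordTiming[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  // Weight = letters/digits in the word (at least 1) + pause bonus.
  const weights = words.map((w) => {
    const letters = w.replace(/[^\p{L}\p{N}]/gu, "").length;
    const trailing = w.charAt(w.length - 1);
    return Math.max(letters, 1) + (PAUSE_WEIGHT[trailing] ?? 0);
  });
  const total = weights.reduce((sum, x) => sum + x, 0);

  // Lay the words end to end across the clip.
  const timings: WordTiming[] = [];
  let t = 0;
  for (let i = 0; i < words.length; i++) {
    const span = (weights[i] / total) * durationSec;
    timings.push({ word: words[i], start: t, end: t + span });
    t += span;
  }
  return timings;
}

// Binary search for the word being spoken at `timeSec`; -1 if outside all spans.
function activeWordIndex(timings: WordTiming[], timeSec: number): number {
  let lo = 0;
  let hi = timings.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (timeSec < timings[mid].start) hi = mid - 1;
    else if (timeSec >= timings[mid].end) lo = mid + 1;
    else return mid;
  }
  return -1;
}
```

In a browser, `durationSec` would come from the `<audio>` element’s `duration` once metadata has loaded, and `activeWordIndex` would run on each `timeupdate` event or animation frame. If the IPA mapping keeps one Romanian token per English word, the same index could drive the highlight in both textboxes.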

Gotchas if you go down this road:

I deliberately constrained myself to a single HTML file. No build step, no bundler, no npm install, just open it in a browser. That’s how the Kokoro demo works, and I wanted to keep things consistent. Most of the friction below comes from that constraint: when you load packages straight from a CDN, you’re at the mercy of how that CDN happens to bundle them.

The default esm.sh and @mintplex-labs/piper-tts-web paths both break in different ways (unenv polyfill errors, missing WASM files). Loading @diffusionstudio/vits-web via jsdelivr’s +esm endpoint sidesteps both.
iOS needs the standard audio-unlock dance on the first user gesture, and it’ll kill the tab if you try to synthesize a long passage in one go. The fix is to split the text into sentences, synthesize each one separately, and reuse a single <audio> element for playback (a fresh Audio() per chunk loses iOS’s gesture grace, and .play() gets refused).
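A minimal sketch of the chunked approach, assuming a naive sentence splitter (it will also split after abbreviations such as “Dr.”) and a hypothetical `maxChars` cap for run-on sentences. Synthesis and playback are passed in as callbacks so the sketch doesn’t depend on any particular vits-web or DOM API; it synthesizes the next chunk while the current one plays. `splitSentences` and `speakChunked` are hypothetical names.

```typescript
// Naive sentence splitter: breaks after ., ! or ? followed by whitespace.
// Abbreviations such as "Dr." will also trigger a split.
function splitSentences(text: string, maxChars = 200): string[] {
  const sentences = text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  for (const sentence of sentences) {
    if (sentence.length <= maxChars) {
      chunks.push(sentence);
      continue;
    }
    // Run-on sentence: pack words greedily so no chunk exceeds maxChars
    // (a single word longer than maxChars still becomes its own chunk).
    let current = "";
    for (const word of sentence.split(/\s+/)) {
      if (current && current.length + 1 + word.length > maxChars) {
        chunks.push(current);
        current = word;
      } else {
        current = current ? `${current} ${word}` : word;
      }
    }
    if (current) chunks.push(current);
  }
  return chunks;
}

// Synthesizes chunk i+1 while chunk i plays, and plays chunks strictly in order.
// In the browser, A would be the WAV Blob returned by the TTS call.
async function speakChunked<A>(
  text: string,
  synthesize: (chunk: string) => Promise<A>,
  play: (audio: A) => Promise<void>,
): Promise<void> {
  const chunks = splitSentences(text);
  if (chunks.length === 0) return;

  const start = (i: number): Promise<A> => {
    const p = synthesize(chunks[i]);
    // Mark a prefetch failure as handled for now; awaiting `p` below still throws.
    p.catch(() => {});
    return p;
  };

  let pending = start(0);
  for (let i = 0; i < chunks.length; i++) {
    const audio = await pending;
    if (i + 1 < chunks.length) pending = start(i + 1);
    await play(audio);
  }
}
```

In the browser, `play` is where the single reused `<audio>` element would live: point its `src` at an object URL for the chunk’s WAV, call `.play()`, and resolve on `ended`. Keeping one element alive is what preserves the unlock granted on the first gesture.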

CodePen: https://codepen.io/the-red-reddington/full/VYmmdGp

GitHub Pages: https://red-reddington.github.io/web-demos/browser-local-tts-piper

Source: https://github.com/red-reddington/web-demos/blob/67601f3a02c20fe90ec8c509563cf6971cc9b51f/browser-local-tts-piper/index.html
