Hey folks, for anyone who found this useful, I just posted a follow-up using Piper instead of Kokoro:
Same single-file, fully local idea. Piper's voices are lower quality than Kokoro's, but it's small and fast enough to actually work on phones, and it has ~80 voices across ~30 languages. Different niche: Kokoro for hero characters, Piper for “the guard says hello” five hundred times a level.