Supertonic 2: Lightning-Fast, On-Device, Multilingual TTS


Supertonic is a lightning-fast, on-device text-to-speech system designed for extreme performance with minimal computational overhead. Powered by ONNX Runtime, it runs entirely on your device—no cloud, no API calls, no privacy concerns.
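
Below is a minimal sketch of what on-device inference with ONNX Runtime looks like. The model file name, input/output tensor names, text preprocessing, and sample rate are illustrative assumptions, not the actual Supertonic 2 interface; check the released ONNX files and demo code for the real details.

```python
# Minimal sketch: fully on-device TTS inference with ONNX Runtime.
# File name, tensor names, tokenization, and sample rate are assumptions.
import numpy as np
import onnxruntime as ort
import soundfile as sf

# Load the model locally; no network access is needed at inference time.
session = ort.InferenceSession("supertonic2.onnx", providers=["CPUExecutionProvider"])

# Hypothetical preprocessing: map the input text to integer token IDs.
token_ids = np.array([[12, 57, 3, 88, 41]], dtype=np.int64)

# Hypothetical I/O names; inspect session.get_inputs()/get_outputs() for the real ones.
(audio,) = session.run(["audio"], {"tokens": token_ids})

sf.write("output.wav", audio.squeeze(), 44100)  # assumed sample rate
```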

What's New in Supertonic 2

Supertonic 2 extends multilingual capabilities while maintaining the same inference speed and efficiency as the original.

🌍 Multilingual Support

| Language   | Code |
|------------|------|
| English    | en   |
| Korean     | ko   |
| Spanish    | es   |
| Portuguese | pt   |
| French     | fr   |

⚡ Same Speed, More Languages

  • No speed degradation: Supertonic 2 delivers the same ultra-fast inference speed as the original—up to 167× faster than real-time
  • Efficient architecture: Only 66M parameters, optimized for on-device deployment
  • Cross-language consistency: All supported languages share the same model architecture and inference pipeline

Performance

We evaluated Supertonic's performance (with 2 inference steps) using two key metrics across input texts of varying lengths: Short (59 chars), Mid (152 chars), and Long (266 chars).

Metrics:

  • Characters per Second: Measures throughput by dividing the number of input characters by the time required to generate audio. Higher is better.
  • Real-time Factor (RTF): Measures the time taken to synthesize audio relative to its duration. Lower is better (e.g., RTF of 0.1 means it takes 0.1 seconds to generate one second of audio).
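
To make both metrics concrete, here is a minimal sketch of how they can be computed from a single synthesis call; the `synthesize` callable is a placeholder for whatever TTS front end is being timed.

```python
import time

def benchmark_tts(text: str, synthesize) -> tuple[float, float]:
    """Return (characters per second, real-time factor) for one synthesis call.

    `synthesize` is any callable that returns (num_samples, sample_rate).
    """
    start = time.perf_counter()
    num_samples, sample_rate = synthesize(text)
    elapsed = time.perf_counter() - start        # seconds spent generating audio
    audio_duration = num_samples / sample_rate   # seconds of audio produced

    chars_per_second = len(text) / elapsed       # higher is better
    real_time_factor = elapsed / audio_duration  # lower is better
    return chars_per_second, real_time_factor
```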

Characters per Second

| System                       | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|------------------------------|-----------------:|----------------:|-----------------:|
| Supertonic (M4 Pro - CPU)    | 912              | 1048            | 1263             |
| Supertonic (M4 Pro - WebGPU) | 996              | 1801            | 2509             |
| Supertonic (RTX 4090)        | 2615             | 6548            | 12164            |
| API ElevenLabs Flash v2.5    | 144              | 209             | 287              |
| API OpenAI TTS-1             | 37               | 55              | 82               |
| API Gemini 2.5 Flash TTS     | 12               | 18              | 24               |
| API Supertone Sona speech 1  | 38               | 64              | 92               |
| Open Kokoro                  | 104              | 107             | 117              |
| Open NeuTTS Air              | 37               | 42              | 47               |

Notes:
  • API = Cloud-based API services (measured from Seoul)
  • Open = Open-source models
  • Supertonic (M4 Pro - CPU) and (M4 Pro - WebGPU): Tested with ONNX
  • Supertonic (RTX 4090): Tested with PyTorch model
  • Kokoro: Tested on M4 Pro CPU with ONNX
  • NeuTTS Air: Tested on M4 Pro CPU with Q8-GGUF

Real-time Factor

| System                       | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|------------------------------|-----------------:|----------------:|-----------------:|
| Supertonic (M4 Pro - CPU)    | 0.015            | 0.013           | 0.012            |
| Supertonic (M4 Pro - WebGPU) | 0.014            | 0.007           | 0.006            |
| Supertonic (RTX 4090)        | 0.005            | 0.002           | 0.001            |
| API ElevenLabs Flash v2.5    | 0.133            | 0.077           | 0.057            |
| API OpenAI TTS-1             | 0.471            | 0.302           | 0.201            |
| API Gemini 2.5 Flash TTS     | 1.060            | 0.673           | 0.541            |
| API Supertone Sona speech 1  | 0.372            | 0.206           | 0.163            |
| Open Kokoro                  | 0.144            | 0.124           | 0.126            |
| Open NeuTTS Air              | 0.390            | 0.338           | 0.343            |
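
Note that RTF and speed relative to real time are reciprocals: the 0.006 RTF measured for Supertonic (M4 Pro - WebGPU) on long inputs corresponds to about 1 / 0.006 ≈ 167× faster than real time, consistent with the headline figure above.
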
Additional Performance Data (5-step inference)

Characters per Second (5-step)

| System                       | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|------------------------------|-----------------:|----------------:|-----------------:|
| Supertonic (M4 Pro - CPU)    | 596              | 691             | 850              |
| Supertonic (M4 Pro - WebGPU) | 570              | 1118            | 1546             |
| Supertonic (RTX 4090)        | 1286             | 3757            | 6242             |

Real-time Factor (5-step)

| System                       | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|------------------------------|-----------------:|----------------:|-----------------:|
| Supertonic (M4 Pro - CPU)    | 0.023            | 0.019           | 0.018            |
| Supertonic (M4 Pro - WebGPU) | 0.024            | 0.012           | 0.010            |
| Supertonic (RTX 4090)        | 0.011            | 0.004           | 0.002            |

License

This project's sample code is released under the MIT License; see the LICENSE file for details.

The accompanying model is released under the OpenRAIL-M License; see the LICENSE file for details.

This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project; see the LICENSE file for details.

Copyright (c) 2026 Supertone Inc.
