Week 05 Progress Report by Mebin J Thattil

Interesting demos this week!
These blog posts are also available on the Sugar Labs website. The content is the same in both places.

Project: Speak Activity
Mentors: Chihurumnaya Ibiam, Kshitij Shah
Assisting Mentors: Walter Bender, Devin Ulibarri
Reporting Period: 2025-06-29 - 2025-07-06

Goals for This Week

This Week’s Progress

1. Hey Kokoro, you sound different today

This week, I tested out different voices of Kokoro in two different ways:

  1. I tested them inside Speak, running within Sugar, and they worked. Playback still relies on a workaround: the audio is written to a temporary WAV file and then played through GStreamer. It is hacky, but it works. Streaming will be introduced soon.

    Under-the-hood changes:

    Text → Kokoro → handle phonemes via G2P engine → Misaki (primary G2P) → fallback → espeak-ng

  2. I deployed a web app that lets you generate and mix audio. You can try it out here.

    [Image: UI of the web app]
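The G2P chain listed under the first item (Misaki as the primary engine, with espeak-ng as the fallback) can be pictured as a simple "try each backend in order" pattern. The sketch below is only an illustration of that idea: `phonemize_with_fallback`, `primary_g2p`, and `fallback_g2p` are hypothetical names, the phoneme strings are placeholders, and none of this is the real Misaki or espeak-ng API. In the actual libraries the fallback is handled internally and may work at a finer granularity than whole inputs.

```python
from typing import Callable, Optional, Sequence

# A G2P backend maps text to a phoneme string, or returns None if it cannot.
G2P = Callable[[str], Optional[str]]


def phonemize_with_fallback(text: str, backends: Sequence[G2P]) -> str:
    """Try each backend in order and return the first non-empty result."""
    for backend in backends:
        try:
            phonemes = backend(text)
        except Exception:
            continue  # a failing backend counts as "no answer"; try the next one
        if phonemes:
            return phonemes
    raise ValueError(f"no G2P backend could handle {text!r}")


# Hypothetical stand-in for the primary engine: a tiny lexicon lookup.
def primary_g2p(text: str) -> Optional[str]:
    lexicon = {"hello": "HH AH L OW"}  # placeholder phonemes, not real output
    return lexicon.get(text.lower())


# Hypothetical stand-in for the fallback engine: always produces something.
def fallback_g2p(text: str) -> Optional[str]:
    return "/" + text.lower() + "/"


known = phonemize_with_fallback("hello", [primary_g2p, fallback_g2p])
unknown = phonemize_with_fallback("sugar", [primary_g2p, fallback_g2p])
```

Here `known` comes from the primary backend, while `unknown` is not in the lexicon and so falls through to the second backend.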

from openai import OpenAI

# Point the OpenAI client at the Kokoro backend; the hostname is a placeholder.
client = OpenAI(
    base_url="http://my_kokoro_backend:8880/v1", api_key="not-needed"
)

# Request speech and write the audio to a file as it streams in.
with client.audio.speech.with_streaming_response.create(
    model="kokoro",
    voice="af_sky+af_bella",  # single or multiple voicepack combo
    input="Hello world!"
) as response:
    response.stream_to_file("output.mp3")
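For comparison, the temporary-file approach that Speak currently uses (write a WAV file, then hand it to GStreamer) might look roughly like the sketch below. This is not Speak's actual code: `write_temp_wav` and `play_with_gstreamer` are hypothetical helpers, the 24 kHz mono format is an assumption, and a sine tone stands in for synthesized speech. The playback function needs PyGObject and GStreamer installed, so it is defined here but not called.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def write_temp_wav(samples, sample_rate=24000):
    """Write mono float samples in [-1, 1] to a temporary 16-bit WAV file."""
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tmp.close()  # keep the file on disk; reopen it through the wave module
    with wave.open(tmp.name, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wav.writeframes(frames)
    return Path(tmp.name)


def play_with_gstreamer(path):
    """Play a file through a GStreamer playbin and block until it ends."""
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    player = Gst.ElementFactory.make("playbin", None)
    player.set_property("uri", Path(path).resolve().as_uri())
    player.set_state(Gst.State.PLAYING)
    # Wait for end-of-stream or an error before tearing the pipeline down.
    player.get_bus().timed_pop_filtered(
        Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR
    )
    player.set_state(Gst.State.NULL)


# One second of a 440 Hz tone as a stand-in for synthesized audio.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
wav_path = write_temp_wav(tone)
```

Calling `play_with_gstreamer(wav_path)` would then play the file. The cost of this design is that playback cannot start until the whole file has been written, which is the delay that streaming is meant to remove.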

Understanding and playing with Kokoro:

Links:

2. New brains for Speak

But...

Next Week’s Roadmap

Acknowledgments

Thank you to my mentors, the Sugar Labs community, and fellow GSoC contributors for their ongoing support.



