Vani Sankalan

From Seeta


Vani Sankalan - Speech Synthesis System

"Vani" means "speech" in Hindi and "Sankalan" means "synthesis". Speech synthesis can be used effectively and creatively in the Sugar environment for applications such as language learning activities, story book readers, listening and spelling games, and improved accessibility of the XO for low-vision and blind students. To make this possible, a system-wide speech synthesis framework needs to be integrated into Sugar, so that Sugarized Activities can request speech synthesis from it. This integration will open up many areas of activity development that would further improve the learning experience of students who use the XO.

Objectives

The objectives of this project are twofold:

For the End User:

  • The ability to highlight text in the Sugar User Interface and have it converted into speech by pressing a button provided in the User Interface (the "Magic Speech Synthesis" button).
  • The ability to control speech synthesis parameters such as gender, voice, speaking rate, volume, and pitch from the Sugar Control Panel, and to store these settings.

For the Activity Developer:

  • The ability to use a Sugar service which converts text to speech on request. The speech synthesis parameters (e.g. gender, voice) can be controlled individually from within the activity.
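
As a rough illustration of per-activity control, an activity could layer its own overrides on top of the system-wide settings. This is only a sketch: `SYSTEM_DEFAULTS`, `effective_settings`, and the parameter names and values are hypothetical and are not part of any existing Sugar API.

```python
from collections import ChainMap

# Hypothetical system-wide defaults; names and values are illustrative only.
SYSTEM_DEFAULTS = {
    "language": "en",
    "voice": "female",
    "rate": 0,
    "pitch": 0,
    "volume": 0,
}


def effective_settings(overrides=None):
    """Return the system defaults with per-activity overrides layered on top."""
    overrides = overrides or {}
    unknown = set(overrides) - set(SYSTEM_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown speech parameter(s): {sorted(unknown)}")
    # ChainMap looks keys up in `overrides` first, then in the defaults.
    return dict(ChainMap(overrides, SYSTEM_DEFAULTS))
```

An activity that wants faster speech would call `effective_settings({"rate": 40})`; the system defaults themselves are left untouched.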

Existing tools such as eSpeak, speech-dispatcher, and Festival will be reused wherever possible in this project.

Deliverables

Speech Configuration Management for Sugar

  • Provision of a control panel section for modifying speech synthesis parameters;
  • Storing and retrieving speech synthesis parameters;
  • Parameters to expose:
     a. Language - this should default to the language settings on the Sugar environment.
     b. Voice Selection - Male/Female, Child/Adult, Age.
     c. Rate
     d. Pitch
     e. Volume
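
The parameters listed above could be held in a small record that is validated and serialised for storage. The sketch below is an assumption about how that might look, not the project's design: the class name `SpeechSettings`, the field defaults, and the voice labels are hypothetical. The range of -100 to 100 for rate, pitch, and volume follows the range speech-dispatcher uses for these values.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SpeechSettings:
    language: str = "en"    # would default to the Sugar language setting
    voice: str = "female"   # illustrative label, e.g. "male" or "child"
    rate: int = 0
    pitch: int = 0
    volume: int = 0

    def __post_init__(self):
        # Clamp numeric parameters to the -100..100 range.
        for name in ("rate", "pitch", "volume"):
            value = int(getattr(self, name))
            setattr(self, name, max(-100, min(100, value)))

    def to_json(self):
        """Serialise the settings for storage."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild settings from text produced by to_json()."""
        return cls(**json.loads(text))
```

Out-of-range values are clamped rather than rejected, so a stored file with a bad value still yields usable settings.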

Graphical User Interface Considerations

  • A Speech Synthesis Button for the Sugar tool bar;
  • Providing speech synthesis parameter control in the Sugar Control Panel;
  • Using GTK Selections to capture highlighted text from anywhere in Sugar and performing speech synthesis for it.

In Discussion

  • Karaoke-style coloring of text while it is being spoken in the Sugar environment;
  • Patching Sugar to enable a Speech Synthesis keyboard shortcut;
  • Consider rewriting the speech-dispatcher Python API as a sugar.speech service if that proves to be a better fit with the Sugar roadmap.

Development Overview

Sugar Configuration Management - A Python script will be responsible for maintaining the Sugar speech synthesis parameters. The parameters will be stored in a file that is suitably protected and hidden from the end user, and will be applied automatically to speech-dispatcher when the machine boots up. The Sugar Control Panel will be modified so that the user can view and change these parameters.
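
One possible shape for this script is sketched below. It is an illustration under stated assumptions: the file location `DEFAULT_PATH`, the JSON format, and the function names are hypothetical. `apply_settings` only assumes a client object offering `set_language`, `set_rate`, `set_pitch`, and `set_volume`, which are the setter names found on the speech-dispatcher Python client (`speechd.SSIPClient`).

```python
import json
import os

# Hypothetical location; the actual path inside the Sugar profile is not specified.
DEFAULT_PATH = os.path.expanduser("~/.sugar/default/speech.json")


def save_settings(settings, path=DEFAULT_PATH):
    """Write the settings dict to a file readable only by its owner."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as handle:
        json.dump(settings, handle)
    os.chmod(tmp, 0o600)
    os.replace(tmp, path)  # replace the old file in one step


def load_settings(path=DEFAULT_PATH):
    """Return the stored settings, or an empty dict if none were saved."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def apply_settings(client, settings):
    """Push stored settings to a speech client; return the keys applied."""
    applied = []
    for key in ("language", "rate", "pitch", "volume"):
        if key in settings:
            getattr(client, "set_" + key)(settings[key])
            applied.append(key)
    return applied
```

At startup this could be used as `apply_settings(speechd.SSIPClient("sugar-speech"), load_settings())`, where the client name "sugar-speech" is again only an example.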

Highlight and Speak - Text can be highlighted anywhere in the Sugar User Interface. A function (as a patch for Sugar) will be written that is invoked each time the user highlights text and clicks the "Speak" button. A Python script in the background will fetch the highlighted text using GTK selections and then send it to speech-dispatcher for synthesis.
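
The flow described above might be sketched as follows. This is a minimal sketch under assumptions: it uses the PyGObject bindings for GTK 3 to read the PRIMARY selection (the text currently highlighted), and the function names `read_primary_selection` and `speak_selection` are hypothetical. The speaking step is passed in as a callable so that any backend can be used, for example the `speak` method of a speech-dispatcher client.

```python
def read_primary_selection():
    """Return the currently highlighted text (PRIMARY selection), or ""."""
    # Imported lazily so the rest of the module works without GTK installed.
    import gi
    gi.require_version("Gtk", "3.0")
    from gi.repository import Gdk, Gtk

    clipboard = Gtk.Clipboard.get(Gdk.SELECTION_PRIMARY)
    return clipboard.wait_for_text() or ""


def speak_selection(get_text, speak):
    """Fetch the selected text and hand it to `speak`.

    Returns True if something was sent for synthesis, False if the
    selection was empty.
    """
    # Collapse line breaks and runs of spaces into single spaces.
    text = " ".join(get_text().split())
    if not text:
        return False
    speak(text)
    return True
```

A handler for the "Speak" button could then call `speak_selection(read_primary_selection, client.speak)`, where `client` is a connected speech client.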

A suitable mechanism will be found to enable the user to pause and resume the speech synthesis process.
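
One candidate mechanism is a small toggle around the `pause()` and `resume()` calls of the speech client. The class below is a hypothetical sketch, not a chosen design; it assumes only that the client object provides those two methods, as the speech-dispatcher Python client does.

```python
class PlaybackToggle:
    """Track whether speech is paused and flip the state on request."""

    def __init__(self, client):
        self._client = client
        self.paused = False

    def toggle(self):
        """Pause if speaking, resume if paused; return the new paused state."""
        if self.paused:
            self._client.resume()
        else:
            self._client.pause()
        self.paused = not self.paused
        return self.paused
```

A single button in the user interface could call `toggle()` and update its icon from the returned state.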

The primary programming language will be Python. Depending on performance requirements, parts of the code may be rewritten in C.

Feedback

Please share your feedback here.