Text to Speech

Converting text into natural-sounding speech

Speech synthesis, also known as Text-to-Speech (TTS), is the automatic generation of speech from arbitrary text input. Philips has developed a state-of-the-art TTS engine that combines highly natural speech quality and customizable voice and emotional expression with low implementation complexity.

The growing need for natural interfaces, together with advances in linguistics, speech technology, and IC technology, makes it possible to introduce speech synthesis into everyday life.

Anticipating the trend of people interacting with complex, multimodal, and personalized systems, we expect TTS to play an important role in the user interface of many applications.

Applications

Natural user interface for complex consumer products
Hands-free and eyes-free operation when on the move
User interface for products with no display or a small display, e.g.:
- personal healthcare devices
- spoken artist and song title for mp3 players
Mobile gaming
Spoken e-books
Car navigation systems
Communication aid for disabled and dyslexic people
Context-aware, spoken user manual/installation guide

How it works

Our algorithm produces highly natural speech. It uses diphone synthesis: the concatenation of prerecorded speech segments (diphones) from a database. A diphone is the transition from one basic sound (phoneme) to the next.
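
The idea of diphone concatenation can be illustrated with a minimal Python sketch. It is not the Philips implementation: the silence symbol, the phoneme labels, the toy database, and the function names `to_diphones` and `synthesize` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

SILENCE = "_"  # hypothetical symbol marking silence at utterance boundaries


def to_diphones(phonemes: List[str]) -> List[Tuple[str, str]]:
    """Turn a phoneme sequence into the adjacent pairs (diphones) that span it."""
    padded = [SILENCE] + phonemes + [SILENCE]
    return list(zip(padded, padded[1:]))


def synthesize(phonemes: List[str],
               database: Dict[Tuple[str, str], List[float]]) -> List[float]:
    """Concatenate the prerecorded samples of each diphone, in order."""
    samples: List[float] = []
    for diphone in to_diphones(phonemes):
        # Raises KeyError if this diphone was never recorded.
        samples.extend(database[diphone])
    return samples


# Toy database: each diphone maps to a few made-up sample values.
toy_db = {
    ("_", "h"): [0.0, 0.1],
    ("h", "ai"): [0.2, 0.3, 0.4],
    ("ai", "_"): [0.3, 0.0],
}

print(to_diphones(["h", "ai"]))
waveform = synthesize(["h", "ai"], toy_db)
print(waveform)
```

In this sketch a word with N phonemes needs N + 1 diphones, because the transitions from and to silence are counted as well.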

Traditionally, diphone synthesis suffers from artifacts. These come mainly from mismatched joins between recorded diphones and from the modifications applied to the synthesized speech to meet prosodic requirements. Our unique IP enables us to generate artifact-free, very natural speech.
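
To make the mismatch at segment boundaries concrete, here is a minimal sketch of a generic linear crossfade between two segments. This is a textbook smoothing technique, not the proprietary method referred to above, and production systems typically use more elaborate, pitch-synchronous processing. The function name `crossfade_join` and the sample values are assumptions.

```python
from typing import List


def crossfade_join(left: List[float], right: List[float], overlap: int) -> List[float]:
    """Join two segments, linearly blending the last `overlap` samples of
    `left` with the first `overlap` samples of `right`.

    The result is `overlap` samples shorter than plain concatenation.
    """
    overlap = min(overlap, len(left), len(right))
    if overlap == 0:
        # Plain concatenation: any level mismatch remains as an audible jump.
        return left + right
    head = left[:len(left) - overlap]
    tail = right[overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `right`, rising towards 1
        blended.append((1 - w) * left[len(left) - overlap + i] + w * right[i])
    return head + blended + tail


# Two segments whose levels do not match at the boundary.
hard = crossfade_join([1.0] * 4, [0.0] * 4, overlap=0)
smooth = crossfade_join([1.0] * 4, [0.0] * 4, overlap=3)
print(hard)
print(smooth)
```

With no overlap the signal drops from 1.0 to 0.0 in a single step, while the crossfaded version ramps down gradually across the boundary.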

Users can define their own personalized voice from a single database, and an advanced recording tool makes it possible to add new voices rapidly.

There is a set of predefined characters: man, old man, old woman, boy, young girl, robot, giant, dwarf, and alien.

There is also a set of predefined emotions: friendly, angry, furious, drill, scared, emotional, weepy, excited, surprised, sad, disgusted and whisper.

The currently supported languages are American English, British English, French, German, Dutch, Italian, Castilian Spanish, Brazilian Portuguese, Russian, Turkish, and Mandarin Chinese.

The compact TTS engine suits embedded systems: on a low-cost ARM7 processor, the CPU load is only 20-60 MHz, with 10-30 KB of RAM and 450-3000 KB of ROM. It runs on Windows (PC), Windows CE (PDA), ARM, TriMedia, and Linux.

Natural speech quality

Flexible emotion control and personalization from a single database.

The voice can be precisely customized in pitch, speed, spectral shape, formant sharpening, etc.
Set of predefined emotions (friendly, angry, furious, drill, scared, emotional, weepy, excited, surprised, sad, disgusted, whisper) and characters (man, old man, old woman, boy, young girl, robot, giant, dwarf, alien)
Speech Synthesis Markup Language (SSML) support
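
As an illustration of what SSML input can look like, the following Python sketch builds a small SSML document with the standard library. Which SSML elements and attribute values this engine accepts is not stated here, so the `prosody` settings are assumptions taken from the W3C SSML 1.0 specification, and `build_ssml` is a hypothetical helper rather than part of any Philips API.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"   # SSML 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang


def build_ssml(text: str, pitch: str = "medium", rate: str = "medium",
               lang: str = "en-US") -> str:
    """Wrap `text` in a <speak> document with one <prosody> element."""
    ET.register_namespace("", SSML_NS)  # serialize SSML as the default namespace
    speak = ET.Element(f"{{{SSML_NS}}}speak",
                       {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    prosody = ET.SubElement(speak, f"{{{SSML_NS}}}prosody",
                            {"pitch": pitch, "rate": rate})
    prosody.text = text  # ElementTree escapes special characters on output
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Turn left in two hundred metres.", pitch="high", rate="slow")
print(ssml)
```

The resulting string would then be passed to the engine in place of plain text; how that is done depends on the engine's interface, which is not described here.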

Compact TTS engine, ideal for embedded systems

Low complexity, e.g. on ARM7, for a quality level ranging from 3.5 kHz bandwidth narrowband speech to 15 kHz bandwidth ultra-wideband speech:
- CPU load: 20-60 MHz
- RAM usage: 10-30 KB
- ROM usage: 450-3000 KB
Highly scalable: trade-offs can be made between speech quality, memory size, and processing power
Generic C++ code, portable to various embedded processors
Supported platforms: Windows (PC), Windows CE (PDA), ARM, TriMedia, Linux

Language support

Available languages: American English, British English, French, German, Dutch, Italian, Castilian Spanish, Brazilian Portuguese, Russian, Turkish, and Mandarin Chinese
Easy to add new application-specific dictionaries
Cross-language speaker support, i.e. a voice can also speak in other (non-native) languages
Advanced recording tool to rapidly add new voices
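
One way to picture an application-specific dictionary is as a set of pronunciation entries that take precedence over a general lexicon. The sketch below is a hypothetical illustration only: the phoneme notation, the entries, and the `pronounce` helper are assumptions and do not describe the engine's actual dictionary format.

```python
from collections import ChainMap
from typing import Optional

# Hypothetical general lexicon: word -> phoneme string (made-up notation).
base_lexicon = {
    "reading": "r ii d i ng",   # the activity
    "lead": "l ii d",
}

# Hypothetical dictionary for a navigation product.
navigation_dict = {
    "reading": "r e d i ng",    # the English town
}

# Earlier maps win, so application entries override the general lexicon.
lexicon = ChainMap(navigation_dict, base_lexicon)


def pronounce(word: str) -> Optional[str]:
    """Look up a word, preferring application-specific entries."""
    return lexicon.get(word.lower())


print(pronounce("Reading"))
print(pronounce("lead"))
```

Words missing from both dictionaries return `None` in this sketch; a real engine would presumably fall back to rule-based pronunciation.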
