eSpeak: Adding a Language

6. ADDING OR IMPROVING A LANGUAGE


Most of the work doesn't need any programming knowledge, just an understanding of the language, an awareness of its features, patience, and attention to detail. Wikipedia is a good source of basic phonetic information, eg. http://en.wikipedia.org/wiki/Vowel

In many cases it should be fairly easy to add a rough implementation of a new language, hopefully enough to be intelligible.
After that it's a gradual process of improvement to:

  • Make the spelling-to-phoneme translation rules more accurate, including the position of stressed syllables within words. Some languages are easier than others. I expect most are easier than English.

  • Improve the sounds of the phonemes. It may be that a phoneme should sound different depending on adjacent sounds, or whether it's at the start or the end of a word, between vowels, etc. This may consist of making small adjustments to vowel and diphthong quality, or adjusting the strength of consonants. Bigger changes may be recording new or replacement consonant sounds, or even writing program code to implement new types of sounds.

  • Some common words should be added to the dictionary (the *_list file for the language) with an "unstressed" attribute (eg. in English, words such as "the", "is", "had", "my", "she", "of", "in", "some"), or should be preceded by a short pause (such as "and", "but", "which"), or have other attributes, in order to make the speech flow better.

  • Improve the rhythm of the speech by adjusting the relative lengths of vowels in different contexts, eg. stressed/unstressed syllable, or depending on the following phonemes. This is important for making the speech sound good for the language.

  • Identify or implement new functions in the program to improve the speech, or to deal with characteristics of the language which are not currently implemented, for example a different intonation module.

If you are interested in working on a language, please contact me so that I can set up the initial data and discuss the features of the language.

For most of the eSpeak voices, I do not speak or understand the language, and I do not know how it should sound. I can only make improvements as a result of feedback from speakers of that language. If you want to help improve a language, listen carefully and try to identify individual errors: in the spelling-to-phoneme translation, in the position of stressed syllables within words, in the sound of phonemes, or in rhythm and vowel lengths.


6.1 Language Code

Generally, the language's international ISO 639-1 code is used to identify the language. It is used in the names of the files which contain the language's data. The examples below use the code "en" (English); replace this with the code of your language.

It is possible to have different variants of a language, for example where the sounds of some phonemes are changed, or where some of the pronunciation rules differ.


6.2 Phoneme File

You must first decide on the set of phonemes to be used for the language. These should be listed and defined in a phonemes file such as ph_english. A reference to this file is then included at the end of the phonemes file (the master phoneme file), eg:

   phonemetable  en  base
   include  ph_english

This example defines a phoneme table "en" which inherits the contents of phoneme table "base". Its contents are found in the file ph_english.
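The inheritance rule can be pictured as a chain of lookups: a phoneme is looked up in the language's own table first, then in the table it inherits from, and so on down to base. The Python sketch below is only a model of that behaviour, not eSpeak's implementation; the class PhonemeTable, its lookup method, and the placeholder definitions are hypothetical.

```python
# Hypothetical model of phoneme-table inheritance (not eSpeak's own code).
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PhonemeTable:
    name: str
    parent: Optional[PhonemeTable] = None
    # phoneme mnemonic -> placeholder for its definition
    phonemes: Dict[str, str] = field(default_factory=dict)

    def lookup(self, mnemonic: str) -> Optional[str]:
        """Search this table first, then each inherited table in turn."""
        table: Optional[PhonemeTable] = self
        while table is not None:
            if mnemonic in table.phonemes:
                return table.phonemes[mnemonic]
            table = table.parent
        return None


# "base" holds shared consonants and control phonemes; "en" adds a vowel
# and redefines one consonant, mirroring "phonemetable en base".
base = PhonemeTable("base", phonemes={"p": "base p", "t": "base t", "_": "short pause"})
en = PhonemeTable("en", parent=base, phonemes={"a": "en vowel a", "t": "en t"})

print(en.lookup("t"))  # en's redefinition shadows base
print(en.lookup("p"))  # inherited from base
```

Here en's own t shadows the base definition, while p falls through to base unchanged.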

The base phoneme table contains definitions of a basic set of consonants, and also some "control" phonemes such as stress marks and pauses. The phoneme table for a language will generally inherit this, or alternatively it may inherit the phoneme table of another language which in turn inherits the base phoneme table.

The phonemes file for the language defines those additional phonemes which are not inherited (generally the vowels and diphthongs, plus any additional consonants), or phonemes whose definitions differ from the inherited version (eg. the redefinition of a consonant).

Details of the contents of phonemes files are given in phontab.html.

The Compile phoneme data function of the espeakedit program compiles the phonemes files to produce the files espeak-data/phontab, phonindex, and phondata.

For information on how to analyse recorded sounds of the language and to prepare the corresponding phoneme data, see espeakedit and analysis.

For an initial draft a language will often be able to use vowels and consonants which have already been set up for another language.


6.3 Dictionary Files

Once the language's phonemes have been defined, pronunciation dictionary data can be produced in order to translate the language's source text into phonemes. This consists of two source files: en_rules (the spelling-to-phoneme rules) and en_list (an exceptions list, and attributes of certain words). The corresponding compiled data file is espeak-data/en_dict, which is produced from the en_rules and en_list sources by the command:

   speak --compile=en
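The division of work between the two source files can be modelled roughly like this: a word found in the exceptions list takes its pronunciation from there, and any other word is translated by the spelling rules. This Python sketch assumes a much simpler rule format than the real en_rules, which also takes the surrounding letters into account; here it is plain greedy longest-match on letter groups. The function translate_word, the sample rules, and the phoneme symbols are all hypothetical.

```python
# Toy model: exceptions list first, then spelling rules (hypothetical data).
from typing import Dict

# en_list-style exceptions: whole word -> phonemes (illustrative symbols)
EXCEPTIONS: Dict[str, str] = {"the": "D@"}

# en_rules-style rules reduced to letter group -> phonemes, with no context
RULES: Dict[str, str] = {
    "sh": "S", "th": "T", "s": "s", "i": "I", "p": "p", "t": "t", "e": "E",
}


def translate_word(word: str, exceptions: Dict[str, str], rules: Dict[str, str]) -> str:
    word = word.lower()
    if word in exceptions:           # an exception overrides the rules entirely
        return exceptions[word]
    longest = max((len(k) for k in rules), default=1)
    out = []
    i = 0
    while i < len(word):
        # try the longest letter group that has a rule at this position
        for n in range(min(longest, len(word) - i), 0, -1):
            group = word[i:i + n]
            if group in rules:
                out.append(rules[group])
                i += n
                break
        else:
            i += 1                   # no rule matches: skip the letter (toy behaviour)
    return "".join(out)


print(translate_word("ship", EXCEPTIONS, RULES))  # via the rules
print(translate_word("The", EXCEPTIONS, RULES))   # via the exceptions list
```

Checking the exceptions list first is what lets irregular words be fixed one at a time without disturbing the general rules.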

Details of the contents of the dictionary files are given in dictionary.html.

The en_list file contains not only pronunciation exceptions, but also gives attributes to specific words. The most notable of these are:

$u Some common words should be marked as "unstressed" in order to make the speech flow better. These words generally include articles (eg. a, the, this, that), auxiliary verbs (eg. is, have, will, can, may), pronouns and possessive adjectives (eg. he, his), some common prepositions (eg. of, to, in), some common conjunctions (eg. and, or, if), and some common adverbs and adjectives (eg. any, already).

$pause Some words should be marked to have a short pause before them, in order to produce natural pauses in long sentences. These include conjunctions (eg. and, or, but, however, which) and perhaps some prepositions.
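The effect of the two attributes can be sketched by reading a few list entries and tagging the words of a sentence. This Python sketch assumes a simplified line format (word, optional phoneme string, then $-attributes, with // starting a comment); dictionary.html has the real syntax. The names parse_list and annotate, the /u tag, and the _ pause token are hypothetical.

```python
# Toy reader for en_list-style attributes (simplified, assumed line format).
from typing import Dict, FrozenSet, List, NamedTuple, Optional

SAMPLE_LIST = """\
the   D@   $u
of         $u
and        $pause
but        $pause   // short pause before
"""


class ListEntry(NamedTuple):
    phonemes: Optional[str]        # None when the line only sets attributes
    attributes: FrozenSet[str]     # e.g. {"$u"} or {"$pause"}


def parse_list(text: str) -> Dict[str, ListEntry]:
    entries: Dict[str, ListEntry] = {}
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()   # drop assumed // comments
        if not line:
            continue
        word, *rest = line.split()
        attrs = frozenset(f for f in rest if f.startswith("$"))
        phon = [f for f in rest if not f.startswith("$")]
        entries[word] = ListEntry(phon[0] if phon else None, attrs)
    return entries


def annotate(sentence: str, entries: Dict[str, ListEntry]) -> List[str]:
    """Insert "_" before $pause words (except sentence-initially) and tag $u words with "/u"."""
    tokens: List[str] = []
    for k, word in enumerate(sentence.lower().split()):
        attrs = entries[word].attributes if word in entries else frozenset()
        if "$pause" in attrs and k > 0:
            tokens.append("_")
        tokens.append(word + "/u" if "$u" in attrs else word)
    return tokens


entries = parse_list(SAMPLE_LIST)
print(annotate("The cat and the dog", entries))
```

A pause is skipped at the start of a sentence, since there is nothing before the word to separate it from.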


6.4 Voice File

Each language should have one or more voice files in espeak-data/voices. The filename of the default voice for a language should be the same as the language code.

Details of the contents of voice files are given in voices.html.

The simplest voice file would contain just a single line to give the language code, eg:

   language en

This language code specifies the phoneme table (i.e. phonemetable en) and the dictionary (i.e. espeak-data/en_dict) to be used. If needed, these can be overridden by phonemes and dictionary attributes in the voice file.
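The defaulting rule can be written out as a small resolver: the phoneme table and the dictionary both default to the language code unless a phonemes or dictionary attribute overrides them. This Python sketch assumes that each attribute takes a single name as its first argument, that the first occurrence of an attribute wins, and that a dictionary name xx maps to espeak-data/xx_dict; voices.html has the authoritative rules. The function resolve_voice is hypothetical.

```python
# Toy resolver for the language/phonemes/dictionary defaults (assumed semantics).
from typing import Dict


def resolve_voice(text: str) -> Dict[str, str]:
    attrs: Dict[str, str] = {}
    for raw in text.splitlines():
        fields = raw.split("//", 1)[0].split()   # ignore assumed // comments
        if len(fields) >= 2 and fields[0] not in attrs:
            attrs[fields[0]] = fields[1]          # first occurrence wins (assumption)
    code = attrs["language"]
    table = attrs.get("phonemes", code)           # default: phoneme table named after the code
    dict_name = attrs.get("dictionary", code)     # default: dictionary named after the code
    return {
        "language": code,
        "phonemetable": table,
        "dictionary": f"espeak-data/{dict_name}_dict",
    }


print(resolve_voice("language en\n"))
print(resolve_voice("language xx\nphonemes en\n"))
```

The second call shows a hypothetical new language "xx" borrowing the English phoneme table during early development while keeping its own dictionary.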


6.5 Program Code

The behaviour of the speak program is controlled by various options (eg. whether words are stressed on the first, last, or penultimate syllable). The function SetTranslator() at the start of the tr_languages.cpp file recognizes the language code and sets the appropriate set of options.

For a new language, you would add its language code and the required options in SetTranslator(). However, this may not be necessary during testing because most of the options can also be set from the voice file in espeak-data/voices.
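As an illustration of the kind of option SetTranslator() selects, the Python sketch below picks the stressed syllable of an already-syllabified word according to a first, last, or penultimate stress rule and marks it with '. It is a toy model only: StressRule, stressed_index, and mark_stress are hypothetical names, not eSpeak's options or functions, and real stress placement can also be affected by the dictionary rules and exceptions.

```python
# Toy model of a per-language "which syllable is stressed" option.
from enum import Enum
from typing import List


class StressRule(Enum):          # hypothetical names, not eSpeak identifiers
    FIRST = "first"
    PENULTIMATE = "penultimate"
    LAST = "last"


def stressed_index(syllables: List[str], rule: StressRule) -> int:
    n = len(syllables)
    if n == 0:
        raise ValueError("word has no syllables")
    if rule is StressRule.FIRST:
        return 0
    if rule is StressRule.LAST:
        return n - 1
    return max(n - 2, 0)         # penultimate; a one-syllable word stresses its only syllable


def mark_stress(syllables: List[str], rule: StressRule) -> str:
    """Join the syllables with '-' and put ' before the stressed one."""
    i = stressed_index(syllables, rule)
    return "-".join("'" + s if k == i else s for k, s in enumerate(syllables))


word = ["ca", "sa", "blan", "ca"]
for rule in StressRule:
    print(rule.value, mark_stress(word, rule))
```

A penultimate-stress language would stress "blan" in this example, while a first-syllable rule would stress the initial "ca".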

