4. TEXT TO PHONEME TRANSLATION

4.1 Translation Files

There is a separate set of pronunciation files for each language, their names starting with the language name.

There are two separate methods for translating words into phonemes:
4.2 Phoneme names

Each of the language's phonemes is represented by a mnemonic of 1, 2, 3, or 4 characters. Together with a number of utility codes (e.g. stress marks and pauses), these are defined in the phoneme data file (see *spec not yet available*).

The utility 'phonemes' are:
The phonemes which are used to represent a language's sounds are based loosely on the Kirshenbaum ASCII character representation of the International Phonetic Alphabet: www.kirshenbaum.net/IPA/ascii-ipa.pdf
4.3 Pronunciation Rules

The rules in the <language>_rules file specify the phonemes which are used to pronounce each letter, or sequence of letters. Some rules only apply when the letter or letters are preceded by, or followed by, other specified letters.

To find the pronunciation of a word, the rules are searched and any which match the letters at the current position in the word are given a score depending on how many letters are matched. The pronunciation from the best matching rule is chosen. The pointer into the source word is then advanced past those letters which have been matched and the process is repeated until all the letters of the word have been processed.
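The search-score-advance loop described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not eSpeak's implementation: rules are held as hypothetical (pre, match, post, phonemes) tuples, the score is simply the number of letters matched across all three parts, and none of the special characters are handled.

```python
# Hypothetical in-memory rule table: (pre, match, post, phonemes).
# The consonant rules are added only so that whole words can be translated.
RULES = [
    ("", "b", "", "b"),
    ("", "k", "", "k"),
    ("", "t", "", "t"),
    ("", "o", "", "0"),
    ("", "oo", "", "u:"),
    ("b", "oo", "k", "U"),
]


def translate(word, rules):
    """Translate a word by repeatedly choosing the best-scoring rule."""
    phonemes = []
    pos = 0
    while pos < len(word):
        best = None  # (score, letters consumed, phoneme string)
        for pre, match, post, ph in rules:
            if not word.startswith(match, pos):
                continue
            if pre and not word[:pos].endswith(pre):
                continue
            if post and not word.startswith(post, pos + len(match)):
                continue
            # Assumed scoring: one point per letter matched.
            score = len(pre) + len(match) + len(post)
            if best is None or score > best[0]:
                best = (score, len(match), ph)
        if best is None:
            pos += 1  # no rule matches this letter: skip it
            continue
        phonemes.append(best[2])
        pos += best[1]  # advance past the letters in <match> only
    return "".join(phonemes)


print(translate("book", RULES))  # bUk
print(translate("boot", RULES))  # bu:t
```

Note that the pointer advances only past the <match> letters; the <pre> and <post> letters are context and are translated by their own rules.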
4.3.1 Rule Groups

The rules are organized in groups, each starting with a ".group" line:
4.3.2 Rules

Each rule is on a separate line and has the syntax:
   .group o
     o        0     // "o" is pronounced as [0]
     oo       u:    // but "oo" is pronounced as [u:]
     b) oo (k U

"oo" is pronounced as [u:], but when also preceded by "b" and followed by "k", it is pronounced [U].

In the case of a single-letter group, the first character of <match> must be the group letter. In the case of a 2-letter group, the first two characters of <match> must be the group letters. The second and third rules above may be in either .group o or .group oo.

Alphabetic characters in the <pre>, <match>, and <post> parts must be lower case, and matching is case-insensitive. Some upper case letters are used in <pre> and <post> with special meanings.
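As an illustration of how a rule line breaks down into its parts, here is a minimal Python sketch. It assumes the layout seen in the examples: an optional <pre> ending in ")", then <match>, then an optional <post> starting with "(", then the phoneme string, with "//" starting a comment. The function name parse_rule is hypothetical, and special characters are not interpreted.

```python
def parse_rule(line):
    """Split one rule line into (pre, match, post, phonemes).

    Returns None for blank or comment-only lines.
    """
    line = line.split("//", 1)[0]  # drop a trailing comment
    tokens = line.split()
    if not tokens:
        return None

    pre = ""
    post = ""
    i = 0
    if tokens[i].endswith(")"):  # optional <pre> part, e.g. "b)"
        pre = tokens[i][:-1]
        i += 1
    if i >= len(tokens):
        raise ValueError("rule has no <match> part: %r" % line)
    match = tokens[i]
    i += 1
    if i < len(tokens) and tokens[i].startswith("("):  # optional <post>
        post = tokens[i][1:]
        i += 1
    phonemes = " ".join(tokens[i:])
    return pre, match, post, phonemes


print(parse_rule('b) oo (k U'))  # ('b', 'oo', 'k', 'U')
print(parse_rule('oo u: // but "oo" is pronounced as [u:]'))  # ('', 'oo', '', 'u:')
```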
4.3.3 Special characters in <phoneme string>:
   th (_   _^_EN

indicates that a word which ends in "th" is translated using the English translation rules and spoken with English phonemes.

4.3.4 Special Characters in both <pre> and <post>:
Examples of rules:

   _) a         // "a" at the start of a word
   a (CC        // "a" followed by two consonants
   a (C%        // "a" followed by a double consonant (the same letter twice)
   a (/%        // "a" followed by a percent sign
   %C) a        // "a" preceded by a double consonant

4.3.5 Special characters only in <pre>:
   @@) bi     // "bi" preceded by at least two syllables
   @@a) bi    // "bi" preceded by at least 2 syllables and following 'a'

Note that matching characters in the <pre> part do not affect the syllable counting.
4.3.6 Special characters only in <post>:
   @) ly (_$2  lI    // "ly", at end of a word with at least one other
                     // syllable, is a suffix pronounced [lI]. Remove
                     // it and retranslate the word.

   _) un (@P2  ¬Vn   // "un" at the start of a word is an unstressed
                     // prefix pronounced [Vn]
   _) un (i    ju:   // ... except in words starting "uni"
   _) un (inP2 ,Vn   // ... but it is for words starting "unin"

S and P must be at the end of the <post> string.

S<number> may be followed by additional letters (e.g. S2ei). Some of these are probably specific to English, but similar functions could be used for other languages.
P<number> may be followed by additional letters (e.g. P3v).
4.4 Pronunciation Dictionary List

The <language>_list file contains a list of words whose pronunciations are given explicitly, rather than determined by the Pronunciation Rules. The <language>_extra file, if present, is also used and its contents are taken as coming after those in <language>_list.

The list can also be used to specify the stress pattern, or other properties, of a word.

If the Pronunciation Rules are applied to a word and indicate a standard prefix or suffix, then the word is again looked up in the Pronunciation Dictionary List after the prefix or suffix has been removed.

Lines in the dictionary list have the form:
   book      bUk

Rather than a full pronunciation, just the stress may be given, to change where it would otherwise be placed by the Pronunciation Rules:

   berlin       $2      // stress on second syllable
   absolutely   $3      // stress on third syllable
   for          $u      // an unstressed word

4.4.1 Multiple Words

A pronunciation may also be specified for a group of words, when these appear together. Up to four words may be given, enclosed in brackets. This may be used to change the pronunciation or stress pattern when these words occur together,

   (de jure)    deI||dZ'U@rI2    // note || used as a word break in the phoneme string

or to run them together, pronounced as a single word

   (of a)       @v@

or to give them a flag when they occur together

   (such as)    sVtS||a2z $pause    // precede with a pause

Hyphenated words in the <language>_list file must also be enclosed within brackets, because the two parts are considered as separate words.

4.4.2 Special characters in <phoneme string>:
4.4.3 Flags

A word (or group of words) may be given one or more flags, either instead of, or as well as, the phonetic translation.
The dictionary list is searched from bottom to top. The first match that satisfies any conditions is used (i.e. the one lowest down the list). So if we have:

   to    t@              // unstressed version
   to    tu:   $atend    // stressed version

then if "to" is at the end of the clause, we get [tu:], if not then we get [t@].
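The bottom-to-top search can be illustrated with a short Python sketch. It is a simplified model only: entries are hypothetical (word, phonemes, flags) tuples kept in file order, and $atend is the only condition checked.

```python
# Entries in file order, top to bottom (hypothetical representation).
ENTRIES = [
    ("to", "t@", ()),            # unstressed version
    ("to", "tu:", ("$atend",)),  # stressed version
]


def lookup(word, entries, at_clause_end):
    """Return the phonemes of the lowest entry whose conditions hold."""
    for entry_word, phonemes, flags in reversed(entries):
        if entry_word != word:
            continue
        if "$atend" in flags and not at_clause_end:
            continue  # condition not satisfied, keep searching upwards
        return phonemes
    return None  # not listed: fall back to the Pronunciation Rules


print(lookup("to", ENTRIES, at_clause_end=True))   # tu:
print(lookup("to", ENTRIES, at_clause_end=False))  # t@
```

Because the search starts at the bottom, the more specific (conditional) entry should be placed below the general one, as in the example above.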
4.4.4 Translating a Word to another Word

Rather than specifying the pronunciation of a word by a phoneme string, you can specify another "sounds like" word.

Use the attribute $text, e.g.
   cough    coff   $text

Alternatively, use the command $textmode on a line by itself to turn this on for all subsequent entries in the file, until it's turned off by $phonememode, e.g.
   $textmode
   cough     coff
   through   threw
   $phonememode

This feature cannot be used for the special entries in the _list files which start with an underscore, such as numbers.

Currently "textmode" entries are only recognized for complete words, and not for stems from which a prefix or suffix has been removed (e.g. the word "coughs" would not match the example above).
4.5 Conditional Rules

Rules in a _rules file and entries in a _list file can be made conditional. They apply only to some voices. This can be useful to specify different pronunciations for different variants of a language (dialects or accents).

Conditional rules have ? and a condition number at the start of the line in the _rules or _list file. This means that the rule only applies if that condition number is specified in a dictrules line in the voice file.

If the rule starts with ?! then the rule only applies if the condition number is not specified in the voice file, e.g.
   ?3   can't    kant     // only use this if the voice has: dictrules 3
   ?!3  rather   rA:D3    // only use if the voice doesn't have: dictrules 3
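The ? and ?! prefixes amount to a simple membership test against the voice's dictrules numbers. A minimal Python sketch follows, assuming the condition prefix is separated from the rest of the line by whitespace and that the voice's condition numbers are available as a set; the function name rule_applies is hypothetical.

```python
def rule_applies(line, dictrules):
    """Return (applies, rest_of_line) for one _rules or _list line.

    dictrules is the set of condition numbers given in the voice file.
    """
    stripped = line.strip()
    if not stripped.startswith("?"):
        return True, stripped  # unconditional line

    parts = stripped.split(None, 1)
    condition = parts[0]
    rest = parts[1] if len(parts) > 1 else ""

    negated = condition.startswith("?!")
    number = int(condition[2:] if negated else condition[1:])
    present = number in dictrules
    return (not present if negated else present), rest


print(rule_applies("?3 can't kant", {3}))       # (True, "can't kant")
print(rule_applies("?!3 rather rA:D3", {3}))    # (False, 'rather rA:D3')
```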
4.6 Numbers and Character Names

4.6.1 Letter names

The names of individual letters can be given either in the _rules or _list file. Sometimes an individual letter is also used as a word in the language and its pronunciation as a word differs from its letter name. If so, it should be listed in the _list file, preceded by an underscore, to give the letter name (as distinct from its pronunciation as a word), e.g. in English:

   _a   eI

4.6.2 Numbers

The operation of the TranslateNumber() function is controlled by the language's langopts.numbers option. This constructs spoken numbers from fragments according to various options which can be set for each language. The number fragments are given in the _list file.
4.7 Character Substitution

Character substitutions can be specified by using a .replace section at the start of the _rules file. Each line specifies either one or two alphabetic characters to be replaced by another one or two alphabetic characters. This substitution is done to a word before it is translated using the spelling-to-phoneme rules. Only the lower-case version of the characters needs to be specified, e.g.
   .replace
     cx   ĉ    // (Esperanto) allow "cx" as an alternative to c-circumflex
     ﬁ    fi   // replace a single character ligature by two characters
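A substitution pass of this kind might look like the following Python sketch. It is an illustration under assumptions, not eSpeak's code: the .replace section is taken to end at the next line starting with ".", two-character sequences are tried before single characters, and the word is lower-cased first so that only lower-case entries are needed.

```python
def parse_replace_section(lines):
    """Collect the {from: to} pairs listed in a .replace section."""
    table = {}
    in_section = False
    for raw in lines:
        line = raw.split("//", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith("."):
            in_section = (line == ".replace")
            continue
        if in_section:
            parts = line.split()
            if len(parts) >= 2:
                table[parts[0]] = parts[1]
    return table


def apply_replacements(word, table):
    """Apply the substitutions to a word before rule translation."""
    word = word.lower()  # assumption: substitution works on lower case
    out = []
    i = 0
    while i < len(word):
        for length in (2, 1):  # prefer the longer sequence
            chunk = word[i:i + length]
            if len(chunk) == length and chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


RULES_FILE = [
    ".replace",
    "  cx   \u0109    // c-circumflex",
    "  \ufb01    fi   // U+FB01 is the single-character 'fi' ligature",
    ".group a",
]

table = parse_replace_section(RULES_FILE)
print(apply_replacements("ecxo", table))  # eĉo
```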