Universal Phone Set (UPS)

The Speech API (SAPI) Universal Phone Set (UPS) is a machine-readable phone set that is based on the International Phonetic Alphabet (IPA).

UPS Design Principles

The following design principles were used to select SAPI labels and phone IDs for UPS.

  1. UPS covers the IPA 1993 Unicode character set, plus some additional SAPI phones, including suprasegmental labels that are used in speech synthesis markup but are not found in the IPA.

    • In most cases there is a one-to-one mapping between UPS and IPA. In these cases, the SAPI Phone ID can simply use the IPA hex Unicode value.

    • In some cases UPS is a superset of IPA. For example, UPS includes some unique phone labels for commonly used sounds such as diphthongs and nasalized vowels, which the IPA treats as compounds. These compounds are represented using the compounding symbol “+”. This symbol is treated just like any other phone, and so must be space delimited in a lexicon: the compound of A and B is written A + B, not AB or A+B. The SAPI Phone IDs of such compounds are formed from the sequence of phone IDs making up the compound.

  2. A UPS pronunciation is a string of UPS phones separated by whitespace. SAPI does not distinguish segmental phones from diacritics, suprasegmentals, or tones, so these must also be whitespace-delimited like any other SAPI phone.

  3. UPS phone labels are not restricted to the SAPI phone labels used for US English, Spanish, French, German, Japanese, or Chinese; however, existing SAPI labels were reused where possible. Two guidelines were used in creating the phone label names:

    • The UPS phone set reflects orthography where possible but is not biased too heavily towards English.

    • Phone labels adhere to labeling conventions that apply across phonetic classes (e.g., lax vowels use the character ‘H’ as in ‘IH’, ‘UH’ and ‘AH’).

  4. The UPS phone set is case-sensitive. UPS phone labels are all defined using ASCII character strings.

    • Segmental phones (Consonants, Vowels, Clicks and Ejectives) are represented by upper-case alphabetic symbols. These may vary in length from one to three characters. In general, shorter symbols are used for more frequent phones.

    • Diacritics are represented by three-character, lower-case, alphabetic symbols.

    • Suprasegmentals and Tones may vary in length and contain non-alphabetic characters.
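
The whitespace and casing rules in principles 1, 2, and 4 can be illustrated with a small Python sketch. This is not part of SAPI: the function names `classify` and `tokenize` are hypothetical, and the classification relies only on the surface shape of each label, so a real implementation would presumably look labels up in the UPS phone table instead. In particular, a suprasegmental made up solely of one to three upper-case letters would be misreported as a segmental phone here.

```python
import re

# Shape rules taken from design principle 4 (a heuristic, not a UPS table lookup).
SEGMENTAL = re.compile(r"[A-Z]{1,3}")   # one to three upper-case letters
DIACRITIC = re.compile(r"[a-z]{3}")     # exactly three lower-case letters


def classify(label):
    """Guess the class of a single UPS label from its shape alone."""
    if label == "+":
        return "compound"               # the compounding symbol is a phone of its own
    if SEGMENTAL.fullmatch(label):
        return "segmental"
    if DIACRITIC.fullmatch(label):
        return "diacritic"
    # Anything else (mixed case, digits, punctuation) falls through to here.
    return "suprasegmental_or_tone"


def tokenize(pronunciation):
    """Split a pronunciation on whitespace and pair each label with its guessed class."""
    return [(label, classify(label)) for label in pronunciation.split()]


# 'AH' and 'IH' are labels mentioned in this document; the pairing is illustrative only.
print(tokenize("AH + IH"))
```

Because splitting is purely on whitespace, an unspaced compound such as `A+B` stays a single token and is never recognized as a compound, which is why the compounding symbol must be space-delimited in a lexicon.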