Programming Assignment 1

For programming students, the first assignment is to develop two programs related to the task of converting phonemes into words.

w2p: words to phonemes

We will begin by converting words into phonemes. Name your program w2p (for Words To Phonemes) and have it accept a filename on the command line. It should open the file and read it. The file consists of words on one or more lines. For each word, convert it into phonemic form and print the result, one phoneme per line.

You are allowed and encouraged to build additional functionality into your w2p program, controlled by additional command-line arguments. You may wish, for example, to also print the word that is being translated. Or you may wish to specify the phoneme set to be used for output. Whatever you want. Here is an example of running your program:

w2p whatever.txt > file1
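
As a starting point, here is a minimal sketch of w2p in Python. The assignment does not prescribe a language, so Python is only one possible choice. The names TOY_LEXICON and words_to_phonemes are hypothetical, the two-word lexicon is a stand-in for a real pronunciation dictionary, and the decisions to lowercase words, strip surrounding punctuation, and skip unknown words are assumptions rather than requirements.

```python
import sys

# Hypothetical toy lexicon for illustration only; a real w2p would load
# a full pronunciation dictionary instead.
TOY_LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
}


def words_to_phonemes(text, lexicon):
    """Return the phonemes for every whitespace-separated word in text."""
    phonemes = []
    for word in text.split():
        # Assumption: lookups ignore case and surrounding punctuation.
        key = word.lower().strip(".,;:!?\"'()")
        if not key:
            continue
        if key in lexicon:
            phonemes.extend(lexicon[key])
        else:
            # Assumption: unknown words are reported on stderr and skipped,
            # so that stdout contains phonemes only.
            print(f"w2p: no entry for {word!r}", file=sys.stderr)
    return phonemes


def main(argv):
    if len(argv) != 2:
        print("usage: w2p FILE", file=sys.stderr)
        return 2
    with open(argv[1], encoding="utf-8") as handle:
        for phoneme in words_to_phonemes(handle.read(), TOY_LEXICON):
            print(phoneme)  # one phoneme per line
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the conversion in a function that takes the lexicon as a parameter makes it easy to swap in a real dictionary or a different phoneme set later.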

p2p: phonemes to phonemes

The second program reads a speech label file consisting of timing marks and phonemes. Strip away everything that is not needed and print out the phonemes one per line. Here is an example of running your program:

p2p whatever.phn > file2
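
A minimal sketch of p2p in Python follows. The label file format is not specified here, so the sketch assumes three whitespace-separated fields per line (start time, end time, phoneme label), as in TIMIT-style .phn files; adjust the parsing if your label files differ. The names labels_to_phonemes and is_number and the optional skip parameter are hypothetical.

```python
import sys


def is_number(text):
    """Return True if text parses as a number (a timing mark)."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def labels_to_phonemes(lines, skip=()):
    """Extract phoneme labels from lines of the form 'start end label'."""
    phonemes = []
    for line in lines:
        fields = line.split()
        # Assumption: anything that is not 'number number label' is a
        # header or blank line and can be dropped.
        if len(fields) != 3:
            continue
        start, end, label = fields
        if not (is_number(start) and is_number(end)):
            continue
        # Optionally drop labels such as silence markers.
        if label in skip:
            continue
        phonemes.append(label)
    return phonemes


def main(argv):
    if len(argv) != 2:
        print("usage: p2p FILE", file=sys.stderr)
        return 2
    with open(argv[1], encoding="utf-8") as handle:
        for phoneme in labels_to_phonemes(handle):
            print(phoneme)  # one phoneme per line
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

The skip parameter is there because label files often mark silence or pauses, which w2p has no way to predict from text; whether to drop such labels is a design decision left to you.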

Comparison

The outputs of your two programs should be compatible. Use the same phoneme set for both programs. We will compare the two outputs as follows. First, run w2p and capture the output into file1. Next, run p2p and capture the output into file2. Finally, run the following diff command and capture the output into file3.

diff -y -W 10 file1 file2 > file3

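The -y switch specifies that diff should use side-by-side format for its output. The -W 10 switch and parameter limits each output line to at most 10 print columns in total, which leaves only a few characters for each of the two side-by-side columns, so long phoneme symbols may be truncated.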

To evaluate the goodness of your programs, you can use the following command:

grep "[<|>]" file3 | wc

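This will look for all the insertions, substitutions, and deletions mentioned in file3: in side-by-side output, diff marks a line found only in file1 with <, a line found only in file2 with >, and a pair of differing lines with |. The wc command then counts the matching lines; the first number it prints is the line count. The fewer you have, the better your match. Note that a phoneme symbol that itself contains one of these three characters will also be counted.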
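
If you would like to compute a similar score without diff, the following Python sketch counts mismatched positions between the two phoneme files using the standard difflib module. The names read_phonemes and count_mismatches are hypothetical. difflib may align the two sequences differently than diff does, so treat the number as an approximation of the grep and wc count, not a replacement for it.

```python
import sys
from difflib import SequenceMatcher


def read_phonemes(path):
    """Read one phoneme per line, ignoring blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def count_mismatches(first, second):
    """Count insertions, deletions, and substitutions between two lists."""
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced block of unequal lengths counts once per line
            # on its longer side.
            total += max(i2 - i1, j2 - j1)
    return total


def main(argv):
    if len(argv) != 3:
        print("usage: compare FILE1 FILE2", file=sys.stderr)
        return 2
    print(count_mismatches(read_phonemes(argv[1]), read_phonemes(argv[2])))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```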

Phoneme Set

Note that your phonemes must match between file1 and file2. I have not required a particular phonetic alphabet, but I recommend WorldBet or some variant that you may develop. You are also free to invent your own phonetic alphabet, but I encourage you to make it easy for others to read and understand.

Dictionary

To convert from words to phonemes, the easiest way is probably to use a dictionary that has phonetic transcriptions of words. Two such dictionaries are listed on my web pages, both distributed by Moby. One is the moby dictionary. The other is the cmudict. They do not use the same phonetic alphabet.
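
A minimal sketch of loading a pronunciation dictionary in Python follows. It assumes a cmudict-style text format: comment lines starting with ";;;", and entries consisting of a word followed by its phonemes, with alternate pronunciations marked by a suffix such as "(1)" and stress marked by a trailing digit on vowels. Check these assumptions against the copy you download. The Moby dictionary uses a different layout and is not handled here. The name load_lexicon and the strip_stress parameter are hypothetical.

```python
def load_lexicon(lines, strip_stress=True):
    """Build a word -> phoneme list mapping from cmudict-style lines."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        # Assumption: ';;;' introduces a comment line.
        if not line or line.startswith(";;;"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        head, phones = fields[0], fields[1:]
        # Drop an alternate-pronunciation suffix such as "(1)".
        word = head.split("(", 1)[0].lower()
        if not word:
            continue
        if strip_stress:
            # Assumption: stress is a trailing digit, e.g. "AH0" -> "AH".
            phones = [p.rstrip("0123456789") for p in phones]
        # Keep only the first pronunciation listed for each word.
        if word not in lexicon:
            lexicon[word] = phones
    return lexicon
```

A typical use would be to open the dictionary file and pass the file object to load_lexicon, then look up each lowercased input word in the result. Stripping stress digits is optional; whether you keep them depends on whether your label files carry stress marks.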