Programming Assignment 1
For programming students, the first task is to develop two programs
related to converting phonemes into words.
w2p: words to phonemes
We will begin by converting words into phonemes. Name your program
w2p (for Words To Phonemes) and have it accept a filename on the
command line. It should open the file and read it. The file consists
of words on one or more lines. For each word, convert it into
phonemic form and print the result, one phoneme per line.
You are allowed and encouraged to build additional functionality into
your w2p program and to control it with additional command line
arguments. You may wish, for example, to also print the word that is
being translated, or to specify the phoneme set to be used for
output. Here is an example of running your program:
w2p whatever.txt > file1
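As a starting point, here is a minimal sketch of w2p in Python. It is only one possible shape for the program, not a required design. The tiny TOY_DICT table and its phoneme symbols are invented for illustration, and skipping unknown words with a warning is an assumption; you would replace the table with a real pronouncing dictionary and decide for yourself how to handle words it does not contain.

```python
import sys

# Toy pronunciation table, invented for illustration only; a real w2p
# would load a full pronouncing dictionary instead.
TOY_DICT = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
}


def words_to_phonemes(text, pron_dict):
    """Return a flat list of phonemes for every known word in text."""
    phonemes = []
    for word in text.split():
        # Normalize case and trim surrounding punctuation before lookup.
        key = word.lower().strip(".,;:!?\"'()")
        if key in pron_dict:
            phonemes.extend(pron_dict[key])
        else:
            # Assumption: unknown words are reported on stderr and skipped.
            print(f"w2p: no pronunciation for {word!r}", file=sys.stderr)
    return phonemes


def main(argv):
    """Read the file named on the command line; print one phoneme per line."""
    if len(argv) != 2:
        print("usage: w2p filename", file=sys.stderr)
        return 1
    with open(argv[1], encoding="utf-8") as f:
        for phoneme in words_to_phonemes(f.read(), TOY_DICT):
            print(phoneme)
    return 0

# To use this as a script, call sys.exit(main(sys.argv)) at the bottom.
```

Warnings go to stderr so that redirecting the output into file1 captures only phonemes.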
p2p: phonemes to phonemes
The second program reads a speech label file consisting of timing
marks and phonemes. Strip away everything that is not needed and
print out the phonemes one per line. Here is an example of running
your program:
p2p whatever.phn > file2
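The label file format is not spelled out here, so the following Python sketch rests on an assumption: each useful line holds a start time, an end time, and a phoneme label, separated by whitespace. Lines that do not begin with two numbers (blank lines or header lines, for example) are skipped. Check your own label files before relying on this layout.

```python
import sys


def labels_to_phonemes(lines):
    """Keep only the phoneme label from each 'start end label' line."""
    phonemes = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # too short to hold two timing marks and a label
        try:
            # Assumption: the first two fields are numeric timing marks.
            float(fields[0])
            float(fields[1])
        except ValueError:
            continue  # not a label line (a header line, for example)
        phonemes.append(fields[2])
    return phonemes


def main(argv):
    """Read the label file named on the command line; print the phonemes."""
    if len(argv) != 2:
        print("usage: p2p filename", file=sys.stderr)
        return 1
    with open(argv[1], encoding="utf-8") as f:
        for phoneme in labels_to_phonemes(f):
            print(phoneme)
    return 0

# To use this as a script, call sys.exit(main(sys.argv)) at the bottom.
```

If your label files mark silence or pauses with their own labels, you may also want to drop those, since w2p has no way to predict them.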
Comparison
The outputs of your two programs should be compatible. Use the same
phoneme set for both programs. We will compare the two outputs as
follows. First, run w2p and capture the output into file1. Next, run
p2p and capture the output into file2. Finally, run the following
diff command and capture the output into file3.
diff -y -W 10 file1 file2 > file3
The -y switch specifies that diff should use side-by-side
format for its output. The -W 10 switch and parameter
limit each output line to 10 print columns in total, which
keeps the two phoneme columns narrow.
To evaluate the goodness of your programs, you can use the following
command:
grep "[<|>]" file3 | wc
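This will look for all the insertions, substitutions, and deletions
marked in file3: diff flags a line found only in file1 with <, a line
found only in file2 with >, and a pair of lines that differ with |.
The first number that wc prints is the count of such lines. The fewer
you have, the better your match.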
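If you would like the three kinds of mismatch counted separately, the following Python sketch is one way to do it. It uses difflib.SequenceMatcher from the standard library rather than diff itself, so it is a different alignment algorithm and its totals may not agree exactly with the grep count. The function name count_edits is my own choice for illustration.

```python
import difflib


def count_edits(expected, actual):
    """Count substituted, deleted, and inserted phonemes between two lists."""
    counts = {"substitutions": 0, "deletions": 0, "insertions": 0}
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # A replaced span may be longer on one side than the other;
            # treat the overlap as substitutions and the rest as
            # deletions or insertions.
            common = min(i2 - i1, j2 - j1)
            counts["substitutions"] += common
            counts["deletions"] += (i2 - i1) - common
            counts["insertions"] += (j2 - j1) - common
        elif tag == "delete":
            counts["deletions"] += i2 - i1
        elif tag == "insert":
            counts["insertions"] += j2 - j1
    return counts
```

Here expected would be the list of lines read from file1 and actual the list read from file2, each with the trailing newline removed.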
Phoneme Set
Note that your phonemes must match between file1 and file2. I have
not required a particular phonetic alphabet, but I recommend WorldBet
or some variant that you may develop. You are also free to invent
your own phonetic alphabet, but I encourage you to make it easy for
others to read and understand.
Dictionary
To convert from words to phonemes, the easiest way is probably to use
a dictionary that has phonetic transcriptions of words. Two such
dictionaries are listed on my web pages. One is the Moby dictionary,
distributed by the Moby project. The other is cmudict, the Carnegie
Mellon University Pronouncing Dictionary. They do not use the same
phonetic alphabet.
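To show what loading such a dictionary might look like, here is a Python sketch for cmudict-style text. It assumes each entry is a line of the form "WORD  PH PH PH", that comment lines begin with ";;;", and that alternate pronunciations are written as WORD(1), WORD(2), and so on; check the copy you download, since versions differ in detail. Keeping only the first pronunciation, lowercasing, and stripping the stress digits from vowels (AH0 becomes ah) are my own choices for illustration, made so the output is easier to line up with a label file. The Moby dictionary uses a different layout and would need its own reader.

```python
import re


def parse_pron_dict(lines):
    """Build {word: [phoneme, ...]} from 'WORD  PH PH ...' lines."""
    pron = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # blank line or comment
        head, *phones = line.split()
        # Alternate pronunciations appear as WORD(1), WORD(2), ...
        word = re.sub(r"\(\d+\)$", "", head).lower()
        # Keep the first pronunciation seen; strip stress digits (AH0 -> ah).
        if word not in pron and phones:
            pron[word] = [p.rstrip("012").lower() for p in phones]
    return pron
```

You could call this with an open file object, for example parse_pron_dict(open(path, encoding="latin-1")), where path and the encoding are placeholders to adjust for your copy of the dictionary.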