From don Mon May 26 06:35:05 2003
Date: Mon, 26 May 2003 06:35:05 -1000
Subject: cs441 Tuesday
From: Don Colton

Aloha,

This is a reminder that on Tuesday (tomorrow, May 27), the CS students are expected to have their programs converted into w2p and p2p forms, ready to run in class. The term is zooming past. We need to come to closure on this part of the course.

The linguistics students have been a little less lucky to date, with not as clear an assignment. We have had the ongoing task of developing rules that account for the difference between human-labeled speech and dictionary speech.

I want to move forward in a new direction now. Let's grab the bull by the horns, finally, and get on with our big task: converting a stream of phonemes into words.

I want the Linguistics students to invent a way to convert a stream of phonemes into a stream of "reliable events" for a word. By reliable, I mean that there are not 20 ways to say "seven." I mean there is probably one, maybe two. Feb-u-ary, Feb-ru-ary. Not Feb-u-ary, Feb-ya-wary, Fib-i-wary, etc. What are the features that are most reliably realized in human speech? I need rules to convert a phoneme stream into this new form. This will be the "w2p" and "p2p" activity for you. What does a human listen for? What does a human hear in the speech it takes in?

The phonemes-to-words process will be like this (unless someone thinks of a better idea): Starting at the front of the utterance, we look at the first phoneme and do a reverse lookup to see if it is in our dictionary. If so, we report it. We repeat with the first two phonemes, then three, four, five, etc. (We need some way to know when to stop.) We report all the words that "could" start at the beginning of the utterance. Then we move forward to the second phoneme and repeat the whole process. We will need some way to evaluate the reliability of the match.

I want us to make substantial progress on this task over the next week. We will talk about it in class.

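
To make the lookup process described above concrete, here is a rough sketch in Python. It is only one possible reading of the idea, not a finished design. The names find_candidates and REVERSE_DICT, the phoneme symbols, and the dictionary entries are all made up for illustration. The stopping rule (never try more phonemes than the longest dictionary entry) is an assumption, and the sketch does not attempt to score the reliability of a match.

```python
# Hypothetical reverse dictionary: phoneme sequence -> words it could spell.
# The symbols and entries are illustrative, not taken from any real dictionary.
REVERSE_DICT = {
    ("ay",): ["I", "eye"],
    ("ay", "s"): ["ice"],
    ("s", "iy"): ["see", "sea"],
    ("s", "eh", "v", "ah", "n"): ["seven"],
}


def find_candidates(phonemes, reverse_dict):
    """Return (start, length, word) for every dictionary word that could
    begin at each position of the phoneme stream."""
    # Assumed stopping rule: no window longer than the longest entry.
    max_len = max((len(key) for key in reverse_dict), default=0)
    matches = []
    for start in range(len(phonemes)):
        # Do not run past the end of the utterance.
        longest = min(max_len, len(phonemes) - start)
        # Try the first phoneme, then the first two, three, and so on.
        for length in range(1, longest + 1):
            key = tuple(phonemes[start:start + length])
            for word in reverse_dict.get(key, []):
                matches.append((start, length, word))
    return matches


# Example: report every candidate word for a short phoneme stream.
utterance = "ay s iy s eh v ah n".split()
for start, length, word in find_candidates(utterance, REVERSE_DICT):
    print(start, length, word)
```

Each reported triple gives the starting position, the number of phonemes consumed, and the candidate word. Choosing among overlapping candidates (for example "I" versus "ice" at the front) is exactly where a reliability measure would be needed.
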
Finally, I am planning a little quiz for tomorrow. I will give you a stream of phonemes and ask you to translate it into words. It shouldn't be too hard, but I guess we will find out when I do it.

See you tomorrow.

Bro Colton