Date of review: 24-April-2000
Type of review: Articles/Editorials
He looked the fatherly academic sort. Graying hair; calm, consistent, systematic. Engrossed in the basic concepts of artificial intelligence. Just a basic intro. To him, it would be trivial stuff, and he'd just be passing his time fulfilling his teaching duties. Maybe find a promising student or two.
Definitely not me, the guy who consistently came in 15 to 25 minutes late each lecture. The T-shirt guy who never failed to wear denim shorts and synthetic sandals -- unmistakable hostellite attire (I don't remember if they were in irreversible stench mode yet; my advice: always buy leather).
Round the corner, déjà vu awaited me.
Months before, I had been fiddling with the info hypertext reader so I could read the Emacs lisp hypertext documentation. I was struggling to learn very basic lisp commands so I could spiff up my Emacs config file.
And here the old prof was, spending two lectures to give us a proper lisp primer. (Or was it one...)
I had my ears wide open, comparing what he was introducing to us, Golden Common Lisp, to the elisp that I was still a neophyte at. To me, one lisp looked promisingly similar to the other. (The non-initiate would say "Of course!", but many variants of lisp exist.)
Our first assignment: the AI "hello world" equivalent of writing a unification function in lisp. Not any old lisp. Golden Common Lisp.
Which didn't run on my big iron.
(Not that I didn't have Windows. But booting into Windows meant that I couldn't explore UNIX until another reboot, and what's the point of rebooting so many times?)
So after browsing the elisp documentation for a few more hours, I set about doing it in Emacs lisp. For my purposes, Emacs lisp was equivalent to GCLisp, with one minor exception: the Emacs lisp `if' form could take more than three arguments if need be (everything after the "then" form becomes the else branch), while the GCLisp version took at most three.
(To be precise, elisp is a big superset, with some very nice I/O, networking, process management and other miscellaneous predicates -- more than just a plain equivalent.)
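That multi-argument `if' is easy to demonstrate; here's a tiny sketch of my own (not from the assignment):

```elisp
;; In Emacs Lisp, every form after the "then" form belongs to the
;; else branch (an implicit progn), so `if' takes any number of args:
(let ((n 5))
  (if (> n 3)
      'big
    (message "n was small: %d" n)   ; else form 1
    'small))                        ; else form 2
;; => big
;; Common Lisp's `if' is (if TEST THEN [ELSE]) -- a multi-form else
;; branch there needs an explicit (progn ...).
```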
I was pretty confident that some of the other students would have been busy trying to figure out how to debug their program. In my case, I was busy learning how to use the interactive Emacs lisp debugger, stepping through my program and examining the function arguments at run time, in the very editor that I used to write the code.
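The unification exercise itself is small enough to sketch. Here's a minimal Emacs Lisp version -- my reconstruction, not the original assignment code; the `$'-prefix variable convention and the missing occurs check are my shortcuts:

```elisp
;; Variables are symbols whose names start with "$" (my convention).
(defun unify-var-p (x)
  (and (symbolp x) (eq (aref (symbol-name x) 0) ?$)))

;; Unify terms A and B under BINDINGS (an alist); return the extended
;; alist on success, or the symbol `fail'.  Bound variables are handled
;; by re-unifying against their stored values; no occurs check.
(defun unify (a b &optional bindings)
  (cond ((eq bindings 'fail) 'fail)
        ((equal a b) bindings)
        ((unify-var-p a) (unify-bind a b bindings))
        ((unify-var-p b) (unify-bind b a bindings))
        ((and (consp a) (consp b))
         (unify (cdr a) (cdr b) (unify (car a) (car b) bindings)))
        (t 'fail)))

(defun unify-bind (var val bindings)
  (let ((cell (assq var bindings)))
    (if cell
        (unify (cdr cell) val bindings)
      (cons (cons var val) bindings))))

(unify '(f $x b) '(f a $y))
;; => (($y . b) ($x . a))
```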
I kept using the `command-apropos' feature to discover new commands as I went along, resorting to the info reader whenever I needed more detail. And on my 16MB 486DX4-100, I remember telling myself that maybe Emacs is bloatware, but it sure is a cool piece of programmable bloatware.
That was June '96, during the Special Term. I never did understand why we were asked to use GCLisp when Emacs was free (in both senses of the word) and available everywhere, including Windows.
But this is about déjà vu, not Emacs or GCLisp. There would be many more déjà vus waiting for me over the next few years. One of the more interesting ones has to do with eye strain.
"What the heck does eye strain have to do with it?" I hear you say.
It's pretty obvious, isn't it? Online documentation is great, with all its hypertext, searchability and evolutionary development (e.g. HOWTOs are seldom left unmaintained unto obsolescence; the online versions either find a way to stay up to date or get superseded by better ones), but when it comes to the practical biological limits of the human eye, it just doesn't cut it.
At any one time, I had loads of man pages, HOWTOs, books, tutorials and hostel activities to contend with. There just wasn't enough biological eye-time and brain-time to read and do everything. The sports activities helped the eyes, but they also took away precious reading time. I needed a multimodal, multitasking approach to information intake.
Or at least a non-visual input method. Not that auditory input is so much more efficient than visual, but there is only so much bashing your eyes can take before you go blind.
E-paper would be great; I just can't wait for either Xerox or MIT to solve the manufacturing blues. But E-paper is only a partial solution to the eye strain problem. So it wasn't long before I started looking for a text-to-speech system.
There were numerous little proggies floating around SunSite that claimed varying degrees of efficacy, but the real mama at the time was a combination of two: the Festival text-to-speech system from Edinburgh University, and the MBROLA speech synthesizer from the Faculté Polytechnique de Mons in Belgium. You can still use them today; just visit the nearest BLIND Linux mirror (e.g. ftp.leb.net).
Festival was cool. It was architected after Emacs: a core C++ engine driven by an interpreter for a Turing-complete extension language (Scheme, in Festival's case). That made it amazingly pluggable, extensible and programmable.
So I went about configuring Festival to use MBROLA for the speech synthesis and writing some minor Scheme code to tailor the speed and the pitch mean and variance the way I liked. Rendering a piece of text or a web page (via lynx) incurred some processing delay, but the end result was understandable. Just what I needed. (This was after I upgraded to a 32MB P133 with a comfortable swap partition.)
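For flavour, the Festival side of that setup looked roughly like the following Scheme fragment -- reconstructed from memory, so the voice and parameter names (`voice_us1_mbrola', `Duration_Stretch', `int_lr_params') may differ across Festival versions and installed voices:

```scheme
;; ~/.festivalrc -- a sketch; names vary by version and voice.
(voice_us1_mbrola)                     ; an MBROLA-backed voice
(Parameter.set 'Duration_Stretch 0.9)  ; slightly faster speech
;; Pitch mean/variance targets for the intonation model:
(set! int_lr_params
      '((target_f0_mean 105)
        (target_f0_std 16)))
(SayText "Just checking the new settings.")
```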
Many a night passed with my ears plugged into the hi-fi and a HOWTO attempting to blast its way into my brain. (Unfortunately, the HOWTOs kept missing their mark. There is no substitute for conscious study. But you get the idea...)
Many months later, in January 1998, I enrolled in Prof Lua's Natural Language Processing class.
Well, it was kind of a semi-déjà vu. While Festival + MBROLA was a text-to-speech system, the module was an introduction to signal processing, text-to-speech, speech-to-text and everything in between (with a bit of PSOLA, though its MBROLA extension was non-examinable, which meant I had to pick it up on my own).
It was interesting to see the theoretical foundations of the methods that had worked in the field of NLP. I was especially impressed by how different angles (syntax, grammar, semantics, pronunciation, frequency spectra, etc) had to be tied together in any NLP system to achieve any kind of success.
And the amazing thing is, there was no big secret, no magic algorithm. The hidden Markov models and prosody generators and what have you -- at the end of the day they were all nothing more than statistical methods. It had taken statistics, lots of RAM, huge databases and tons of hard work to slay the NLP dragon. Not to mention the numerous recording studios. (How else do you build your diphone databases?)
But as with any undergraduate module, this was simply an introduction to the field, so I soon found myself having to read Thierry Dutoit's book about speech synthesis in order to understand MBROLA.
(Actually, I do not do justice to the NLP community. They have developed some very clever ways of putting the methods together, and though some of their methods look simple, it is because a lot of effort has already been put into distilling them to the essence.)
That's not to say anything about the source code. And so, for the NLP project, I went over to Dutoit's site and learned all I could about Object-Oriented Blocks Programming, and attempted to actually use it.
Prof Lua didn't think much of the usefulness of doing that, though. He left it to his masters students to decide whether the approach I had used for the term project was useful. Yes, the same masters students who were writing "pseudo-C++" code.
And you know what the funniest thing is? I still didn't know how Festival worked at the end of it. I concluded I'd need to read papers and browse source code and soak in formulae to do that, so I quietly put NLP back in my inbox and went on to more pressing things, like the exam...
(The desire to peek under Festival's hood is still floating around the back of my head. Hopefully I'll get to that some day.)
Be sure to check out the previous episodes of the How I Learned Linux series.