Saturday, November 18, 2006

Scrabble: Man vs. Machine


The Seattle Times today has a story about recent man-machine match in Seattle. Jim Kramer is the 2006 U.S. Scrabble Open champion, and played a best-of-3 match against RealNetwork's new Scrabble game. Jim managed to beat the computer 2-1, and walked away with the $10,000 prize.

Jim said:




...it was a victory that he wouldn't expect five years from now. The computer programs will be much more sophisticated, he said. "They'll just have much more brute force."


I'm not sure how much stronger Scrabble games will be in 5 years - Scrabble is a fairly simple game for computers to play (unlike, say, Go). The mechanics of the game have documented algorithms that can be used (Appel & Jacobson's "The world's fastest Scrabble program", for example). The tricky bit is the strategy - often playing the highest-scoring move will not ensure a win against a strong human opponent.

This is the algorithm I implemented for fun many years back when I was in university (as a hobby). The project was really fun and had some interesting challenges, the first of which was filling in the few gaps in the above paper so that the algorithm was clear. Other fun things (at least they seemed fun to me at the time):




  • Where to get words from? The most essential part of a Scrabble game is having a good list of words - they should be legal Scrabble words of course. I found a few free word lists online and used them, but had to try to prune out invalid words (proper nouns, acronyms). Eventually I found a nice OWL/OSW word list (I think it was 2-8 letter words) and augmented that with some longer words from my other source. I think in the end I had 130,000+ words.


  • Storing the dictionary in a file. The naive solution is to use a big text file. The program would then read this in and build a DAWG ("directed acyclic word graph" incase you care) for use when searching for words. However this was slow, and the file was huge. So, I started playing with ways to build the DAWG once, and save it as a binary file that was smaller and could be loaded quickly.


  • Compressing the dictionary. It turns out that there is a lot of redundancy in English words, and the DAWG has lots of sub-graphs that are redundant. This means wasted space on the disk (not such a big deal) and wasted memory (more of a big deal). I was coding all this in C/C++ and DJGPP (a free 32-bit compiler and port of the GNU C libraries). This meant I could use more than 640kb in DOS/Windows, but my home machine was not that beefy. (It was a 486SX with 4MB of RAM if I remember right). So, the huge dictionary I was using actually had problems fitting into memory unless I reduced it's size somehow. The solution was to look for the redundant parts of the DAWG and "fold them" onto themselves. Finding matches for sub-graphs was fun, and I had many iterations of my "dictionary builder and packer" tool. Initially it would take an overnight run for it to produce the final output file, but in the end it did its job in an hour or so! :)


  • Graphics. Since the DJGPP compiler and libraries I was using were pretty basic, I didn't have fancy Windows-style graphics I could use (no menus, windows, etc.) I used the fantastic free graphics library Allegro instead. This still meant I had to cobble together my own menus and code to handle the placement of Scrabble tiles. I also had a feature that would let you ask the computer for "hints". It would basically figure out all the possible moves you could make, and sort them in descending score. You could then drag a scrollbar and flip through all the possible words - really quickly.


  • Strategy. I left this to last and never really did anything wonderful (by this stage I was ready to move on to something else...) My game simply chose the highest-scoring move it could find and played that... It was still pretty hard to beat.


Sadly, the code to my Scrabble game disappeared when I upgraded to a new machine (stupid me for not having enough backups)!

No comments: