I’ve recently been listening to an excellent podcast on language from Bob Garfield and Mike Vuolo at Slate, called Lexicon Valley. You may remember that back in March I pointed out that my name is supervocalic, i.e. it contains each vowel exactly once; in an early episode they pose a similar puzzle: find celebrities (Charlie Daniels is one example) who have the same vowels in both names.
In March they did an episode about Scrabble, a game which I’ve taken a renewed interest in because my girlfriend is much better at it than I am. But a large part of this is simply that she knows more obscure words than I do. Stefan Fatsis is the author of the book Word Freak: Heartbreak, Triumph, Genius, and Obsession in the World of Competitive Scrabble Players and a competitive Scrabble player himself, and was interviewed for the Scrabble episode of Lexicon Valley. Apparently the reliance of Scrabble on obscure words is seen as something of a problem in competitive Scrabble as well. North American players use a different word list than the rest of the world, and the North American list is shorter; some players don’t want to move to the longer list because they feel it contains too many obscure words.
One idea that occurs to me, although I don’t know how one would implement it, would be to multiply the score that a word receives by some function of the frequency with which the word is used. (I wouldn’t use the frequency itself as the multiplier; then Scrabble would reduce to seeing who can play THE the most.) But this would make scoring much harder: you’d have to pause to consult lookup tables after every word. Computers, however, can handle this. More importantly, it would make scoring much less transparent. This seems especially a flaw at the end of the game; against well-matched opponents, games can come down to the final few moves, and as things stand I know exactly how many points my words will receive.
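As a rough illustration of what "some function of the frequency" might look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the frequency figures are invented, and the log-scaled, clamped multiplier is just one possible shape, chosen so that very common words like THE saturate instead of dominating.

```python
import math

# Hypothetical frequencies in occurrences per million words (invented values).
FREQ_PER_MILLION = {"the": 50000.0, "quiz": 1.0, "za": 0.001}


def multiplier(freq_per_million, lo=0.5, hi=1.5):
    """Map a frequency to a multiplier in [lo, hi] on a clamped log scale."""
    # log10 ranges over [-2, 2] after clamping, i.e. 0.01 to 100 per million.
    x = math.log10(max(freq_per_million, 1e-9))
    x = max(-2.0, min(2.0, x))
    return lo + (hi - lo) * (x + 2.0) / 4.0


def adjusted_score(word, base_score):
    """Scale the ordinary Scrabble score by the frequency multiplier."""
    # Assumption: words missing from the table are treated as very rare.
    freq = FREQ_PER_MILLION.get(word.lower(), 0.001)
    return round(base_score * multiplier(freq))
```

With these made-up numbers, THE is capped at 1.5 times its base score no matter how common it is, while ZA is worth half its base score. The lookup is trivial for a computer, but the transparency concern still applies: a player cannot easily work out the multiplier at the table.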
(And in case you’re wondering: if I had to name a baby I would lean towards first names that contain the vowels A, E, and I exactly once each, and no O or U.)
One complaint I’ve heard about Scrabble points is that the rare letters (Q, X, Z, etc.) are now overvalued, because the semi-invented words that have entered the Scrabble word list since its original creation, things like “za”, use those letters disproportionately often. As a result, the letters aren’t as difficult to use as they were, and thus should be valued lower.
About twenty years ago I wrote a computer Boggle program. In Boggle, you have three minutes to find all the words in a random grid of 16 letters, and you score points for the words you find that are not found by the other player. Since it is quite feasible for the computer to exhaustively search the grid and find *every* word in three minutes, I had to do something else.
After some tinkering, I settled on the following, which worked well: at the beginning of each game, the computer would select a vocabulary of words that it would notice in the grid. Unselected words would be missed even if they appeared in the grid.
Each word in its core dictionary was selected with a probability that scaled upwards with how familiar the word was believed to be. The computer was very unlikely to miss the word “set”, but fairly likely to miss “pique”. I used polysemy as a proxy for familiarity; there was some research available at the time suggesting that this was a reasonable proxy.
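The selection step described above might look something like the following Python sketch. This is not the original program: the sense counts are invented, and the linear, capped probability function with its `floor`, `ceiling`, and `scale` parameters is an assumption chosen only to show the idea of "more senses, more likely to be noticed".

```python
import random

# Hypothetical polysemy (number of senses) per word; invented values.
SENSE_COUNTS = {"set": 40, "run": 30, "pique": 2, "tor": 1}


def selection_probability(senses, floor=0.2, ceiling=0.98, scale=10.0):
    """Probability of noticing a word; rises with its sense count, then caps."""
    return min(ceiling, floor + (1.0 - floor) * senses / scale)


def choose_vocabulary(sense_counts, rng):
    """At the start of a game, pick the words the computer will notice."""
    return {
        word
        for word, senses in sense_counts.items()
        if rng.random() < selection_probability(senses)
    }


def computer_finds(words_in_grid, vocabulary):
    """Words present in the grid but outside the vocabulary are missed."""
    return [w for w in words_in_grid if w in vocabulary]


# One game: draw a fresh vocabulary, then filter the exhaustive search result.
vocabulary = choose_vocabulary(SENSE_COUNTS, random.Random())
found = computer_finds(["pique", "set", "tor"], vocabulary)
```

Under these assumed numbers, "set" is selected about 98% of the time and "pique" about 36% of the time, so the computer plays like an opponent with a patchy vocabulary that changes from game to game.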
I also had the computer update the familiarity scores after each game: if a human player found a certain word, its estimated familiarity was adjusted slightly upwards.
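The update could be sketched as below. The step size and the choice to move a fixed fraction of the way toward a ceiling are assumptions for illustration; the only thing taken from the description is that a word found by a human gets nudged slightly upwards.

```python
def update_familiarity(familiarity, words_found_by_human, step=0.05, ceiling=1.0):
    """Nudge the familiarity of each human-found word slightly upwards.

    `familiarity` maps word -> estimated familiarity in [0, ceiling].
    Each found word moves a fraction `step` of the remaining distance
    to the ceiling, so scores rise quickly at first and never overshoot.
    """
    for word in words_found_by_human:
        current = familiarity.get(word, 0.0)  # assumption: unseen words start at 0
        familiarity[word] = min(ceiling, current + step * (ceiling - current))
    return familiarity


# Hypothetical scores before a game in which the human found "pique".
scores = {"set": 0.98, "pique": 0.5}
update_familiarity(scores, ["pique"])
```

Words the human never finds keep their old score, so over many games the computer's vocabulary drifts toward the words real players actually know.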
Perhaps something in this could be adapted to Scrabble rules.
The idea of a bonus is intriguing. I’m pretty sure that linguists can link a word to an average grade level. A simple bonus rule could be binary: you get a bonus if the word is “college level” (grade 13 or above) and no bonus for high-school or lower-level words. An additive bonus (+10?) would be more fair than a multiplicative bonus, since the bonus is for the word itself, not for its placement on the board.
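A small Python sketch of the binary, additive rule follows. The grade levels are invented for the example, and treating unknown words as earning no bonus is an assumption.

```python
# Hypothetical grade levels per word; invented values.
GRADE_LEVEL = {"cat": 1, "house": 2, "quixotic": 13}


def score_with_bonus(word, placement_score, bonus=10, threshold=13):
    """Add a flat bonus for "college level" words (grade >= threshold).

    `placement_score` is the ordinary Scrabble score, already including
    any double or triple squares. The bonus is added afterwards, so it
    depends only on the word and not on where it was played.
    """
    grade = GRADE_LEVEL.get(word.lower(), 0)  # assumption: unknown words get no bonus
    return placement_score + (bonus if grade >= threshold else 0)


# The same word earns the same +10 on a plain square or a triple-word square.
plain = score_with_bonus("quixotic", 26)
tripled = score_with_bonus("quixotic", 78)
```

Because the bonus is added after board multipliers are applied, a triple-word square does not triple the bonus, which is the sense in which the additive version rewards the word and not its placement.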