Courses using Alpaydin’s machine learning textbook

I’m taking the machine learning course being taught by Andrew Ng at Coursera. At times it’s a bit light on the theory for my tastes, which is understandable, so I’ve been looking to other sources. One that I’d come across previously that I ended up buying is Ethem Alpaydin’s Introduction to Machine Learning.

But Alpaydin’s book has its own problem: a relatively small number of exercises, and no data. So it seems useful to find more exercises and people who have written data-based exercises to go with the book. The obvious place to do this is to find courses that have been taught using this book, so I decided to compile a list of such courses. I make no claim that this list is anywhere near a complete list; it was compiled by half an hour of Googling. But if I was going to make such a list, it seemed good to make it available.

Incidentally, I think in general it would be good to have lists of web pages of “courses taught using book X” available, both for learners (who might want to see supplementary resources, get a sense of which sections of a book are more or less important, and so on) and for teachers (to see how others have organized their courses).

Here’s the list:

Alan Yuille, Introduction to machine learning, UCLA Stat 161/261, Spring 2010.
Ahmed Elgammal, Machine learning, Rutgers 198:536 (CS), spring 2007.
Joakim Nivre, Machine learning for NLP, Uppsala fall 2011.
Thorsten Joachims, Machine learning, Cornell CS 478, spring 2008.
Dan Lizotte, Introduction to machine learning, Reykjavik, spring 2007.
Alexander Partzin, Machine learning, Tel Aviv, fall 2007.
Berrin Yanikoglu, Machine learning, Sabanci University (Turkey) CS512, fall 2011.
Andrea Danyluk, Machine learning, Williams CS 374, spring 2011.
Shan-Hung Wu, Machine learning, National Tsing Hua University, spring 2012.
Zheng-Hua Tan, Machine learning, Aalborg (Denmark), spring 2011.
Kevin Murphy, Machine learning, British Columbia CS 340, fall 2006.
Jugal Kalita, Machine learning, University of Colorado at Colorado Springs CS 586, spring 2010.

Incidentally, a lot of courses in this area seem to recommend more than one text, because it’s a rapidly growing area. Others that seemed to be mentioned a lot in the same breath as Alpaydin are Bishop, Pattern Recognition and Machine Learning and Mitchell, Machine Learning.

(Thanks to Brent Yorgey for a correction.)

OKCupid on gay and straight

OKCupid has dug into their dataset and has looked at gay sex vs. straight sex. (Safe for work, unless charts aren’t safe for work.) It turns out that at least in their data, men and women are equally promiscuous.

Back in 2007, and in more mainstream data sets, the numbers were different. The numbers seemed to vary from population to population, but one thing was consistent: men reported having had twice as many female sexual partners as women reported having male sexual partners. The obvious explanation, of course, is that people lie about how many sexual partners they’ve had, and that men and women lie in different directions (men adjust their number upwards, women downwards).

But this doesn’t show up in the OKCupid data set: the median number of sexual partners for both straight men and straight women, in their data set, is six. This is also the median number of sexual partners for gay men, and for gay women – OKCupid actually points this out to make the point that gay people are no less or more promiscuous than straight people. If you object to comparing medians, they actually give the whole distribution curve; the distributions of number of sexual partners for OKCupid-using straight people and OKCupid-using gay people are substantially the same. (Not having the raw data, I can’t say if the difference is statistically significant, but who cares?)

Of course this only says something about the self-selecting pool of OKCupid users. But it seemed worth calling out.

A half-baked idea for modifying Scrabble scores

I’ve recently been listening to an excellent podcast on language from Bob Garfield and Mike Vuolo at Slate, called Lexicon Valley. You may remember that back in March I pointed out that my name is supervocalic, i.e., it contains each vowel exactly once; in an early episode they ask a similar question, to find celebrities (Charlie Daniels is one example) who have the same vowels in both names.

In March they did an episode about Scrabble, a game which I’ve taken a renewed interest in because my girlfriend is much better at it than I am. But a large part of this is simply that she knows more obscure words than I do. Stefan Fatsis is the author of the book Word Freak: Heartbreak, Triumph, Genius, and Obsession in the World of Competitive Scrabble Players and a competitive Scrabble player himself, and was interviewed for the Scrabble episode of Lexicon Valley. Apparently the reliance of Scrabble on obscure words is seen as something of a problem in competitive Scrabble as well. North American players use a different word list than the rest of the world, and the North American list is shorter; some players don’t want to move to the longer list because they feel it contains too many obscure words.

One idea that occurs to me, although I don’t know how one would implement this, would be to modify the score that a word receives with some multiplier, a function of the frequency with which the word is used. (I wouldn’t use the frequency of the word itself; then Scrabble would reduce to seeing who can play THE the most.) But this would make scoring much harder: you’d have to pause to use lookup tables after every word. Computers, however, can handle this. More importantly, it would make scoring much less transparent. This seems especially a flaw at the end of the game; against well-matched opponents, games can come down to the final few moves, and as things stand I know exactly how many points my words will receive.
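To make the multiplier idea concrete, here is a minimal sketch in Python. Everything about the multiplier is an assumption made up for illustration: the logarithmic form, the clamp to the range 0.5 to 2.0, and the choice of "occurrences per million words" as the frequency unit are not from any real scoring proposal. Only the tile values are the standard English-language ones.

```python
import math

# Standard English-language Scrabble tile values.
TILE_VALUES = {
    **dict.fromkeys("AEILNORSTU", 1),
    **dict.fromkeys("DG", 2),
    **dict.fromkeys("BCMP", 3),
    **dict.fromkeys("FHVWY", 4),
    "K": 5,
    **dict.fromkeys("JX", 8),
    **dict.fromkeys("QZ", 10),
}


def base_score(word):
    """Sum of tile values, ignoring premium squares and blanks."""
    return sum(TILE_VALUES[ch] for ch in word.upper())


def multiplier(freq_per_million):
    """Hypothetical multiplier: grows with the log of the word's frequency.

    Assumed form, chosen only for illustration. Taking the logarithm and
    clamping to [0.5, 2.0] keeps very common words like THE from dominating.
    """
    raw = 0.5 + 0.5 * math.log10(1.0 + freq_per_million)
    return max(0.5, min(2.0, raw))


def adjusted_score(word, freq_per_million):
    """Base score scaled by the hypothetical frequency multiplier."""
    return base_score(word) * multiplier(freq_per_million)
```

A real version would need an agreed-upon frequency table for every word on the list, which is exactly the lookup step that makes the scheme less transparent at the table.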

(And in case you’re wondering: if I had to name a baby I would lean towards first names that contain the vowels A, E, and I exactly once each, and no O or U.)

Stanford interview with Reviel Netz

Stanford’s in-house news site has an interesting interview with Reviel Netz on mathematical proofs as literature. Netz is also the author of the fascinating book The Archimedes Codex: How a Medieval Prayer Book Is Revealing the True Genius of Antiquity’s Greatest Scientist. (By way of explaining the subtitle: the earliest known manuscript of Archimedes is a palimpsest.)

Weekly links for May 6

Catherine Ulitsky’s paintings, including some like this one that are basically Delaunay triangulations of the positions of birds in a flock. (via Radiolab)

John Cook on Traveling Salesman art, based on a traveling salesman app, which is a companion to Bill Cook’s book In Pursuit of the Traveling Salesman. (I haven’t read it. I also don’t know if the two Cooks are related.)

Kevin Carey argues that everyone should learn statistics because everyone has to serve on juries.

Julian Champkin writes for Significance Magazine about the data journalism handbook.

John Allen Paulos on screening the screening tests.

Brian Hayes: Statistical mechanics of magnet balls.

Shankar Vedantam, NPR: Most of us aren’t average – the usual about how many things follow power laws. (From the title I was hoping this would be “most of us are not average at everything”.)

Pete Casazza, A mathematician’s survival guide. (via the AMS grad student blog.)

What’s a number, by Tom Christiansen. (via John Cook)

John Kerl’s Tips for mathematical handwriting.

R tutorial videos

People who want to learn the very basics of R may find these videos made by some Berkeley grad students useful.

Devlin’s Coursera transitions course

Keith Devlin writes about MOOCs, or “massive open online courses”, such as those offered by udacity and coursera¹. In particular he’s going to be offering a five-week “math transitions” course in October, via Coursera. Devlin writes:

Such courses typically comprise a mix of some elementary mathematical logic, proof techniques, some set theory through to an analysis of relations and functions, with a bit of elementary number theory and introductory real analysis thrown in to provide examples.

I’m a bit skeptical about this, because the coursera platform involves automated grading. This is fine for courses where problems have numerical answers, or for courses where the assignments are to program and whether a program works can be validated by an automated system. But the transition course is in some way the course where students learn how to prove things; I almost want to say it’s fundamentally a writing course. This is a problem that one runs into even when teaching in-person courses, if the course is large enough that the grading is outsourced to an inexperienced graduate student or even an undergraduate, as is common in some places; sometimes the grader is simply not experienced enough to give really high-quality feedback. Of course in many situations the grader could give such feedback but doesn’t have time to do so, which is really a different issue. But it seems like it will only be worse in the online format. I’m sure Devlin is aware of this, though, and I’ll be interested to see what he and his TAs do.

1. Why is udacity a .com and coursera a .org? In both cases it looks like the company registered the “other” domain at the same time, so it’s not a question of availability of domains.

A forgotten pseudorandom permutation on 26 letters

I’m reasonably sure that long ago and totally by accident, I discovered a permutation of the alphabet a, b, …, z that somehow naturally arose from the order of letters on the QWERTY keyboard and had order 630. One permutation of order 630 would be (abcdefg)(hijklmnop)(qrstuvwxyz), which has cycles of length 7, 9, and 10 and therefore has order the least common multiple of 7, 9, and 10, which is 630. But of course this doesn’t naturally arise from the keyboard. 630 is interesting here because it’s ~~the largest order of a permutation of 26 elements~~ fairly large for the order of a permutation of 26 elements; the maximum is twice this, 1260, as pointed out by several commenters.

I had thought that this permutation was the one that, in the two-line notation, is written

abcdefghijklmnopqrstuvwxyz
qwertyuiopasdfghjklzxcvbnm

which takes a to q, b to w, and so on. But I checked during an idle moment earlier today; rewriting this in the cycle notation gives

(aqjphioguxbwvcetzmdrk)(fyn)(ls)

which has cycles of length 21, 3, and 2 and therefore has order lcm(21, 3, 2) = 42. So what was I thinking of?

Answer, added Wednesday, May 2: instead of going horizontally, go vertically: the second line is qazwsxedcrfvtgbyhnujmikolp, which gives the 7-9-10 cycle type.
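Both claims are easy to check mechanically. Here is a minimal sketch in Python; the function names `cycles` and `order` are just illustrative. It takes the second line of the two-line notation, follows each letter around until it returns to its starting point, and takes the least common multiple of the cycle lengths.

```python
import string
from functools import reduce
from math import gcd


def cycles(second_line):
    """Cycle decomposition of the permutation sending a->second_line[0], b->second_line[1], ..."""
    alphabet = string.ascii_lowercase
    image = dict(zip(alphabet, second_line))
    seen = set()
    result = []
    for start in alphabet:
        if start in seen:
            continue
        cycle = []
        letter = start
        # Follow the permutation until we come back to a visited letter.
        while letter not in seen:
            seen.add(letter)
            cycle.append(letter)
            letter = image[letter]
        result.append("".join(cycle))
    return result


def order(second_line):
    """Order of the permutation: the lcm of its cycle lengths."""
    lengths = [len(c) for c in cycles(second_line)]
    return reduce(lambda a, b: a * b // gcd(a, b), lengths, 1)


horizontal = "qwertyuiopasdfghjklzxcvbnm"  # keyboard rows, left to right
vertical = "qazwsxedcrfvtgbyhnujmikolp"    # keyboard columns, top to bottom

print(cycles(horizontal), order(horizontal))
print(sorted(len(c) for c in cycles(vertical)), order(vertical))
```

The horizontal reading gives the 21-3-2 cycle type and order 42; the vertical reading gives cycle lengths 7, 9, and 10 and order 630.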

Weekly links for April 29

What does non-Euclidean geometry look like? (using a raytracer)

galton.org collects the papers of Francis Galton.

From Jim Pagels at Slate: which pro league has the most meaningful regular-season games? A “meaningful game” is defined as one that’s played before a team clinches a playoff spot or is eliminated from playoff contention.

Probability of lots of twins

Dave Radcliffe asked on Twitter: “A class of 380 students has 16 sets of twins. How likely is this to happen purely by chance?” and he links to this article. The school in question is Staples High School in Westport, Connecticut.

Let’s assume for the sake of argument that twins will always be in the same grade at the same school. Then we may as well treat each pair of twins as a single person for school-enrollment purposes, and so we’re asking: out of 364 entities (380 students minus 16, since each pair counts once), each of which is either a singleton or a pair of twins, what’s the probability that at least 16 are pairs of twins?

This refers to the class of 2014 at this school, so let’s figure they were born in 1996. This data brief from the CDC gives the rate of twin births in 1996 at about 27 per 1000. But that’s counted per-child. If you count per-pregnancy you should get just over half that; let’s call it 14 per 1000. The probability that a binomial(364, 0.014) random variable is at least 16 is about one in 14,000.
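The tail probability can be checked with a few lines of Python. This is a minimal sketch using the figures above (364 entities, a per-pregnancy twinning rate of 14 per 1000); the function name `binomial_tail` is just for illustration, and the model assumes each entity is independently a twin pair with the same probability.

```python
from math import comb


def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term over the upper tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


prob = binomial_tail(364, 0.014, 16)
print(f"P(at least 16 twin pairs) = {prob:.2e}, about 1 in {1 / prob:,.0f}")
```

The result is sensitive to the assumed rate: the expected number of twin pairs is only about 5, so 16 is far out in the tail, and a small change in the rate moves the answer a lot.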

(The data brief, however, points out one interesting fact – Connecticut has the highest rate of twinning. Different states have quite different rates of twinning, which appear to be explained at least partially by different distributions of age and race of mothers giving birth.)