Open Syllabus Project

The Open Syllabus Project has collected syllabuses¹ from about a million college classes. From this they’ve created the Open Syllabus Explorer.

It’s generally focused on listing which texts occur in syllabuses. This ends up meaning that it’s mostly useful for humanities classes, since they tend to assign a larger number of texts. For my grad school oral exams I was studying from perhaps four books; friends of mine in the humanities would have had, say, 50.

That being said, it’s still interesting to know that people who assign Stanley’s Enumerative Combinatorics also assign Wilf’s generatingfunctionology. And, somewhat more confusingly, Rudin’s Real and Complex Analysis – which makes me suspect that perhaps some lists of canonical math texts in a variety of subjects got caught in the dragnet.

An analogue of this for more technical subjects, where there are many books covering similar material and fewer texts per course, would examine the portions of the books which are covered. Which subjects are seen as essential, and which ones are not? Many textbooks include a bit in the introduction that can be paraphrased as “my book is too long for one course, I know, so here are some one-course subsets of it” – but which of those subsets actually get used? This would require parsing the syllabuses, and is perhaps feasible (and maybe the folks at the OSP are thinking about it). This is very much the sort of thing I would have liked to see when I was planning new courses, and on a hard drive that no longer works I’m pretty sure I have some files with lists of “professor X, in semester Y, covered chapters A, B, and C of the textbook”.
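
To give a sense of what that parsing might involve, here’s a minimal Python sketch; the regex and the function are my inventions, not anything the OSP actually does, and real syllabuses are messier than this.

```python
import re

# Hypothetical: pull chapter mentions out of a plain-text syllabus,
# e.g. "Weeks 2-4: Chapters 5-7" -> chapters 5, 6, 7.
CHAPTER_PATTERN = re.compile(r"chapters?\s+(\d+)(?:\s*[-–]\s*(\d+))?", re.IGNORECASE)

def chapters_covered(syllabus_text):
    """Return the set of chapter numbers a syllabus mentions."""
    covered = set()
    for match in CHAPTER_PATTERN.finditer(syllabus_text):
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        covered.update(range(start, end + 1))
    return covered

print(chapters_covered("Week 1: Chapter 1. Weeks 2-4: Chapters 5-7."))
# {1, 5, 6, 7}
```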

(Via NYT.)

1. My wife, a classics major, informs me that that’s the right plural, because “syllabus” is not of the declension that pluralizes by taking “-us” to “-i”. Interestingly, the NYT article uses “syllabuses” but the OSP’s page uses “syllabi”.

Snow day?

The National Weather Service’s Weather Prediction Center now has probabilistic winter precipitation guidance.  You can get maps showing

  • the nth percentile of the predicted distribution of accumulation for n = 5, 10, 25, 50, 75, 90, 95;
  • dually to that, contours on which the 10th, 20th, …, 90th percentile of the predicted distribution is a selected constant

for the next 72 hours, or for 24- or 48-hour subsets of that time period.  Here’s a more in-depth description.
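
The two views are the same forecast read in different directions: fix a probability level and ask what accumulation it corresponds to, or fix an accumulation and ask how likely it is to be exceeded. A toy numpy illustration of the duality (the ensemble here is fake, and the WPC’s actual method is surely more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
# A fake 50-member ensemble of snowfall accumulations (inches) at one point.
ensemble = rng.gamma(shape=2.0, scale=2.0, size=50)

# View 1: fix the percentile, read off the accumulation.
for p in (5, 10, 25, 50, 75, 90, 95):
    print(f"{p}th percentile: {np.percentile(ensemble, p):.1f} in")

# View 2, the dual: fix an accumulation, read off the exceedance probability.
threshold = 4.0
print(f"P(more than {threshold} in) = {np.mean(ensemble > threshold):.0%}")
```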

Looks like the mid-Atlantic is in for a rough ride. Stay home. Have some French toast.  If you do go outside, ignore the wind chill.

We’re number one!

Glassdoor has one of those lists of the “best jobs in America”. “Data scientist” is #1.

Seeing this got my data scientist spidey sense tingling – it’s kind of ironic how many holes a data scientist could poke in this…

Glassdoor describes their methodology as follows:

Methodology: Glassdoor’s 25 Best Jobs in America report identifies specific jobs with the highest overall Glassdoor Job Score. The Glassdoor Job Score is determined by weighting three factors equally: earning potential (median annual base salary), career opportunities rating, and number of job openings. Results represent job titles that rate highly among all three categories. The Glassdoor Job Score is based on a 5-point scale (5.0=best job, 1.0=bad job). For a job title to be considered, it must receive at least 75 salary reports and at least 75 career opportunities ratings shared by U.S.-based employees over the past year (1/8/15-1/7/16). The number of job openings per job title represents active job listings on Glassdoor as of 1/8/16. This report takes into account job title normalization that groups similar job titles.

The “career opportunities rating” appears to be aggregated from ratings left in Glassdoor reviews, and would seem to correspond more to the companies that data scientists work for than to data science as a career. Earning potential I can agree with as an important variable (who doesn’t like money?). Looking at “number of job openings” seems a bit off to me – I’m sure there are a lot of job openings for hotel housekeepers – but perhaps they’re doing some normalization and this is actually some measure of the number of openings relative to the number of qualified candidates. (“Software engineer” has 49,270 openings and nothing else on the list breaks five figures, but it’s not at the top of the list.)
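
Glassdoor doesn’t say how the three factors get mapped onto the 5-point scale before averaging. Here’s one guess at the shape of the computation – the linear min-max rescaling and all of the numbers below are my inventions, purely for illustration:

```python
def rescale(value, lo, hi):
    """Map value from [lo, hi] linearly onto a 1-to-5 scale."""
    return 1 + 4 * (value - lo) / (hi - lo)

def job_score(salary, opportunities_rating, openings,
              salary_range=(40_000, 140_000), openings_range=(75, 50_000)):
    """Equal-weight average of three factors, each on a 1-5 scale.

    opportunities_rating is already on Glassdoor's 1-5 scale; the
    two ranges are invented for illustration.
    """
    return (rescale(salary, *salary_range)
            + opportunities_rating
            + rescale(openings, *openings_range)) / 3

# Toy numbers, not Glassdoor's actual figures for "data scientist":
print(round(job_score(salary=115_000, opportunities_rating=4.0, openings=1_700), 2))
```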

Data scientist has gone up since last year’s list, when it was #9, driven by an increase in salary and in career opportunities rating. Back when I would have called myself a mathematician, that was the #1 job (in a different study, but who cares?). Clearly this is all my fault.

Links for January 18

Siobhan Roberts interviews John Conway: Genius Behind the Numbers

Aaron Fisher, G. Brooke Anderson, Roger Peng, and Jeff Leek on applying randomized trials to figure out if students in MOOCs can learn what statistical significance looks like.

How FiveThirtyEight is forecasting presidential primaries.

Secrets of the MIT Poker Course by Nick Greene; the course referred to is at MIT OpenCourseWare.

Richard V. Reeves and Nathan Joo on How much social mobility do people really want? – it turns out that they want upward mobility without the corresponding amount of downward mobility that it implies.

Ana Swanson at the Washington Post’s Wonkblog on the mistake you’re making with your Fitbit. (The mistake, it turns out, is not wearing it – but there’s some interesting work on sampling to determine how accurate the Fitbit and other wearables are.)

Jordan Ellenberg was messing around with word2vec and found some interesting things.

Links for January 11

A lot of links this week for some reason – to be honest some of this is clearing out an old backlog, but it may reflect the beginning of the new year as well.

Andy T. Woods, Charles Michel, and Charles Spence give a scientific study of the ‘rules’ of plating. This follows an earlier “plating manifesto” in two parts, by Charles Spence, Betina Piqueras-Fiszman, Charles Michel, and Ophelia Deroy: The plating manifesto (I): from decoration to creation and Plating manifesto (II): the art and science of plating.

Alan J. Bishop, Western mathematics: the secret weapon of cultural imperialism (1990; via Hacker News).

Machine Learning, Uncertain Information, and the Inevitability of Negative “Probabilities” (video lecture, David Lowe, 2007)

John Horgan on Bayes’ Theorem: What’s the big deal? at Scientific American.

Juan Maldacena, The symmetry and simplicity of the laws of physics and the Higgs boson

Bill Gosper on continued fraction arithmetic (1972). At some point I want to sit down and digest this. See also Mark Jason Dominus’ talk on the material.

Aaron Clauset is teaching a course on The History and Future of Computing which has an interesting reading list.

Nick Berry on the Koch snowflake.

Trevor Hastie, Robert Tibshirani, and Martin Wainwright have a new book, Statistical Learning with Sparsity: The Lasso and Generalizations, which you can download.

Christie Aschwanden at FiveThirtyEight wrote You Can’t Trust What You Read About Nutrition – because collecting data about nutrition is hard, and also because there are so many studies that data mining is easy.

Udemy and Priceonomics on How William Cleveland Turned Data Visualization Into a Science.

Too good to be true: when overwhelming evidence fails to convince (via phys.org).

Nicky Case is simulating the world (in emoji).

Cathy O’Neil gave a talk on “Weapons of Math Destruction” and she’s finishing up a book of the same title.

Frank Wilczek (whose classes I slept through freshman year of college) on how people’s preferences in recreations show that they like math without realizing it.

Logarithmic approximations for Collatz

A question from Republic of Math:

[Embedded tweet from Republic of Math: a plot of average Collatz stopping times with a square-root fit.]

The fit in this plot looked a bit off to me – it seems like it should be a log, not a square root. (But of course a log is just a zeroth power…) For those who aren’t familiar, the Collatz conjecture is as follows: given an arbitrary positive integer, if it’s even, divide it by 2, and if it’s odd, multiply by 3 and add 1. Repeat. As far as we know, you always end up at 1. For example, say we start with 7; then we get

7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1
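
In code (a minimal Python sketch):

```python
def collatz_trajectory(n):
    """Iterate n -> n/2 (n even) or n -> 3n+1 (n odd) until reaching 1."""
    trajectory = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        trajectory.append(n)
    return trajectory

print(collatz_trajectory(7))
# [7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```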

So why should this be true? Consider only the odd numbers in the sequence. In the previous sequence those are

7, 11, 17, 13, 5, 1

where we get each number by taking three times the previous one, plus one, and then dividing by 2 as many times as we can. We can figure that half the time we’ll divide by 2 once, a quarter of the time we’ll divide by 2 twice, and so on; on average we’ll divide by 2 twice. So each number in this sequence should be, on average, about three-quarters of the previous one, and each step in this sequence corresponds to three steps in the previous sequence (one tripling and two halvings). Thus each step on average multiplies by (3/4)^{1/3} and it should take (\log n)/(\log (4/3)^{1/3}) steps to get from n down to 1. Call this C \log n where C = 1/(\log (4/3)^{1/3}) \approx 10.42.
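
Both pieces of the heuristic are easy to check numerically – the constant is just arithmetic, and the “divide by 2 twice on average” claim can be tested empirically (a quick Python sketch; the cutoff of 10,000 is arbitrary):

```python
import math

# The heuristic constant: each step shrinks n by (3/4)^(1/3) on average,
# so getting from n down to 1 should take about C log n steps.
C = 3 / math.log(4 / 3)
print(f"C = {C:.4f}")  # 10.4281, the 10.42... above

# Count halvings per 3x+1 step over many trajectories.
halvings = odd_steps = 0
for start in range(1, 10_000, 2):
    n = start
    while n != 1:
        if n % 2:
            n = 3 * n + 1
            odd_steps += 1
        else:
            n //= 2
            halvings += 1
print(f"halvings per odd step: {halvings / odd_steps:.2f}")  # close to 2
```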

This heuristic argument is in, for example, the review paper of Lagarias; this online HTML version of the same paper gives the additional remark that the stopping time of the sequence is conjectured to be asymptotically normally distributed with mean and variance both on the order of log n. (The constant is different there; that’s because Lagarias goes directly from x to (3x+1)/2 if x is odd.)

Let’s define a function t(n) to be the number of iterations in this process needed to get to 1. For example, look at the trajectory for 7 above; it takes 16 iterations to get to 1, so t(7) = 16. Similarly t(4) = 2 (the trajectory is 4, 2, 1) and t(6) = 8 (the trajectory is 6, 3, 10, 5, 16, 8, 4, 2, 1). This is the sequence A006577 in the OEIS. Then the original question is about the function m(n) = {1 \over n} (t(1) + t(2) + \cdots + t(n)). But since “most” numbers less than n have their logarithms near \log n, that sum can be approximated as just (1/n) (n \times C \log n), or C \log n.
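
Here’s the same thing as a Python sketch (slow but straightforward; memoization would speed it up):

```python
def t(n):
    """Number of Collatz steps needed to get from n to 1 (OEIS A006577)."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

def m(n):
    """Mean stopping time over 1, 2, ..., n."""
    return sum(t(k) for k in range(1, n + 1)) / n

print(t(7), t(4), t(6))  # 16 2 8
print(m(5000))           # the quantity plotted below
```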

And indeed a logarithmic model is the better fit (red in the plot below; blue is a power law – in both cases I’ve just fit to the range 100 \le n \le 5000).

[Plot: running means of the Collatz stopping time, with the logarithmic fit in red and the power-law fit in blue.]

Sadly, we don’t recover that constant 10.42… the red curve is m = -16.87 + 11.07 \log n and the blue curve is m = 16.31 \times n^{0.185}.  But it’s not like you’d expect asymptotic behavior to kick in by 5000 anyway.
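
For the record, here’s one way to reproduce the two fits with scipy – I don’t guarantee this is exactly what produced the plot above, but it’s the same idea, and it reuses t from the sketch earlier:

```python
import numpy as np
from scipy.optimize import curve_fit

ns = np.arange(1, 5001)
ms = np.cumsum([t(int(k)) for k in ns]) / ns   # running means m(n)
mask = ns >= 100                               # fit only 100 <= n <= 5000

def log_model(n, a, b):
    return a + b * np.log(n)

def power_model(n, a, b):
    return a * n ** b

log_params, _ = curve_fit(log_model, ns[mask], ms[mask])
pow_params, _ = curve_fit(power_model, ns[mask], ms[mask], p0=(10.0, 0.2))
print(log_params, pow_params)
```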

Powerball

A pre-Powerball roundup of links, in advance of tonight’s $900 million drawing:
Is it rational for an economist to play Powerball? Probably not – the expected value of a Powerball ticket tonight might well be negative, since so many people buy tickets that the jackpot is likely to be split. Sales apparently go up faster than the jackpot. And Britain’s national lottery had a big jackpot today as well.
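
The splitting effect is easy to quantify: if t other tickets are sold and each hits the jackpot with probability p, the number of other winners is roughly Poisson with mean tp, and your expected share of the jackpot, if you win, works out to (1 - e^{-tp})/(tp). A Python sketch – the 1-in-292,201,338 odds are the current Powerball rules, but the ticket count is a made-up round number, and this ignores the smaller prizes, taxes, and the lump-sum discount:

```python
import math

def expected_jackpot_winnings(jackpot, tickets_sold, p_win=1 / 292_201_338):
    """Expected jackpot winnings per ticket, accounting for splits.

    The number of *other* winners is modeled as Poisson with mean
    tickets_sold * p_win; E[1/(1+K)] for K ~ Poisson(lam) is (1-e^-lam)/lam.
    """
    lam = tickets_sold * p_win
    expected_share = (1 - math.exp(-lam)) / lam
    return p_win * jackpot * expected_share

# $900 million jackpot, 500 million other tickets sold (a guess):
print(f"${expected_jackpot_winnings(900e6, 500e6):.2f} per $2 ticket")  # ~$1.47
```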

Here’s some advice on how to win and what to do if you win. I’d steer clear of thinking that certain numbers are luckier than others, although the standard advice – try to play numbers that other people aren’t playing, so you don’t have to share the jackpot if you win – definitely applies.

Relatively Prime season 2

Samuel Hansen, the force behind Strongly Connected Components and other excellent mathematical podcasts, is releasing the second season of his longform series Relatively Prime.  The first episode, “The Lexicon”, focuses on the language (both words and symbols) used in mathematics.

In this episode, Samuel revisits his old feud with Peter Rowlett on whether it’s “math” or “maths”.  It turns out that, according to Google’s word2vec model (which RaRe Technologies has a web tool for), math is to maths as sports is to sport.
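
If you’d rather run the analogy locally than use the web tool, recent versions of gensim can do it against Google’s pretrained News vectors (the file path below is a placeholder for wherever you’ve downloaded them):

```python
from gensim.models import KeyedVectors

# Placeholder path to Google's pretrained 300-dimensional News vectors.
model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# "math is to maths as sports is to ?" -- i.e. maths - math + sports
print(model.most_similar(positive=["maths", "sports"], negative=["math"], topn=3))
```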

Links for January 4

Evelyn Lamb on Shinichi Mochizuki’s proof of abc and Piper Harron’s thesis, and how they exemplify opposite ends of a continuum.

Alycia Zimmerman on teaching mathematics to schoolchildren with LEGO.

Andrew McKenzie lists nine paradoxes with a statistical theme.

The best charts of the year from FiveThirtyEight.

For the knitters out there, you can buy patterns to make mathematical objects from Woolly Thoughts.  (I’m a bit sad I can’t buy the actual objects.)

Network visualizations of Shakespearean tragedies by Martin Grandjean.

The math-class paradox by Jo Boaler at the Atlantic.

Buy a 3-D printed digital sundial or hear how it works.

In blog posts that I don’t have to write because someone else did, Michael Quinn, Philip Keller, and Tyler Barron have all written solutions to FiveThirtyEight’s Riddler #2 (the one about geysers). The “official” solution by Brian Galebach (who proposed the problem) was published along with the next puzzle.

David R. Hagen figures out why the 11th in the month is mentioned less frequently than other days, answering a question from xkcd.

Alex Albright on the demographics of PhD students in the sciences.

Some properties of 2016

Happy 2016!

I noticed something a couple days ago – I’m 32 now, and it’s 2016, and 32 is a factor of 2016 = 2^5 \times 3^2 \times 7.

Last year, though, had the same property – 31 is a factor of 2015 = 5 \times 13 \times 31.

So in which years will my age on New Year’s Day, x, divide the current year, 1984 + x?  Exactly when x divides 1984, so these are just the factors of 1984 = 2^6 \times 31.

(This seems like a nice problem – which numbers have two consecutive factors?  But in fact that’s just a characterization of the even numbers.)

Some other properties of 2016:

  • it has a nice binary representation as 11111100000; alternatively, it’s the sum of consecutive powers of 2, 2^{10} + 2^9 + 2^8 + 2^7 + 2^6 + 2^5
  • it’s a triangular number, 1 + 2 + 3 + \cdots + 63 = (64 \times 63)/2
  • it’s the number of ways to arrange two same-color pawns on a chessboard, {8^2 \choose 2}
  • it’s divisible by every single-digit number except 5 (all four of these are checked in the sketch below)
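
All four bullets are one-line checks in Python (3.8+ for math.comb):

```python
from math import comb

n = 2016
assert bin(n) == "0b11111100000"
assert n == sum(2**k for k in range(5, 11))               # 2^5 + ... + 2^10
assert n == 63 * 64 // 2                                  # 1 + 2 + ... + 63
assert n == comb(64, 2)                                   # two pawns, 64 squares
assert all(n % d == 0 for d in (1, 2, 3, 4, 6, 7, 8, 9)) and n % 5 != 0
print("all checks pass")
```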

2016 appears to be a relatively interesting number, appearing in 784 sequences in the OEIS – compare 179 sequences containing 2015 and 428 containing 2017.  This is probably traceable to the fact that it has many prime factors – all of the above are essentially divisibility properties – and this is the big idea in the paper on “Sloane’s gap” by Nicolas Gauvrit, Jean-Paul Delahaye and Hector Zenil.    (Don’t like reading?  Check out the Numberphile video instead.)

In fact, back in 2009 Philippe Guglielmetti put together a spreadsheet giving the number of appearances in OEIS of each number up to 2^{16}, which you can download from this post (click on “feuille Excel”); at the time 2016 appeared in the OEIS more often than any other number between 1729 = 12^3 + 1 and 2047 = 2^{12}-1 inclusive.