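John Cook posted about the “baseball inequality”. In his formulation, if you have two lists of k positive numbers each, $n_1, \ldots, n_k$ and $d_1, \ldots, d_k$, then

$$\min_i \frac{n_i}{d_i} \le \frac{n_1 + \cdots + n_k}{d_1 + \cdots + d_k} \le \max_i \frac{n_i}{d_i}.$$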

This has the interpretation that the batting average of a team is between the batting average of the best player and that of the worst player.
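The batting-average reading is easy to check numerically. Here is a minimal sketch; the hits and at-bats are made-up numbers and `team_average` is a hypothetical helper, not anything from Cook's post.

```python
from fractions import Fraction

# Hypothetical (hits, at-bats) pairs for three players; the numbers are made up.
players = [(30, 100), (45, 180), (12, 60)]

def team_average(players):
    """Total hits divided by total at-bats, as an exact fraction."""
    hits = sum(n for n, d in players)
    at_bats = sum(d for n, d in players)
    return Fraction(hits, at_bats)

individual = [Fraction(n, d) for n, d in players]
team = team_average(players)

# The team average lies between the worst and the best individual average.
assert min(individual) <= team <= max(individual)
print(float(min(individual)), float(team), float(max(individual)))
```

Exact fractions are used so the comparison is not muddied by floating-point rounding.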

This is not what I expected from the headline “baseball inequality” and the list of numbers. What I was expecting was the following. If the two lists are in numerical order, $x_1 \le x_2 \le \cdots \le x_k$ and $y_1 \le y_2 \le \cdots \le y_k$, then

$$x_1 y_{\sigma(1)} + x_2 y_{\sigma(2)} + \cdots + x_k y_{\sigma(k)} \le x_1 y_1 + x_2 y_2 + \cdots + x_k y_k$$

for any permutation $\sigma$ of 1, 2, …, k. This is actually what’s called the rearrangement inequality. Its use in baseball is in setting the batting order. If you want to arrange the batting order so your team gets the most hits, then you want the players with the best batting averages (the highest y) to be earlier in the batting order and therefore get the most at-bats (the highest x). (One-line idea of proof: if any of the $y_{\sigma(i)}$ are out of order, you can increase the sum by putting them in order; swapping an out-of-order pair at positions $i < j$ changes the sum by $(x_j - x_i)(y_{\sigma(i)} - y_{\sigma(j)}) \ge 0$.)
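For small k the rearrangement inequality can be checked by brute force over all permutations. The sketch below uses made-up numbers: `x` stands in for at-bats per lineup slot and `y` for batting averages in points (thousandths), both sorted in increasing order; `paired_sum` is a hypothetical helper.

```python
from itertools import permutations

# Hypothetical data, both lists sorted in increasing order.
x = [3, 4, 5, 6]            # at-bats per lineup slot (made up)
y = [200, 250, 280, 310]    # batting averages in points, i.e. thousandths (made up)

def paired_sum(x, y_order):
    """Sum of x[i] * y_order[i] over all positions."""
    return sum(a * b for a, b in zip(x, y_order))

sorted_sum = paired_sum(x, y)

# No rearrangement of y beats pairing the lists in the same order.
assert all(paired_sum(x, p) <= sorted_sum for p in permutations(y))

best = max(permutations(y), key=lambda p: paired_sum(x, p))
worst = min(permutations(y), key=lambda p: paired_sum(x, p))
print(best, worst)
```

Integers are used instead of decimal averages so that every comparison is exact.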

Reality is a bit more complicated, because:

first, the goal of baseball is of course to score runs, not to get hits, so you want to count walks and extra-base hits as well; hence sorting players by OPS (on-base percentage plus slugging percentage) should be better than sorting by batting average. (OPS doesn’t make dimensional sense because you’re adding two fractions with different denominators, but let’s ignore that.)

second, there are interactions between the players – in order to score runs you usually need to get multiple hits in close succession. Since batting orders are cyclic, you perhaps don’t want to have your worst hitter going immediately before your best hitter, and indeed some teams have tried batting the pitcher eighth. (I’m a National League fan; don’t talk to me about designated hitters.)

These are probably problems that are best solved by simulation, and I’ve got a day job.
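For what it's worth, a toy version of such a simulation fits in a few lines. This is only a sketch under loud assumptions: every plate appearance is either a single that advances each runner exactly one base or an out, so walks, extra-base hits, steals and double plays are all ignored. The function names and the on-base probabilities are hypothetical, and each probability must be below 1 or an inning would never end.

```python
import random

def simulate_game(on_base_probs, rng, innings=9):
    """Runs scored in one game under the toy singles-or-outs model."""
    runs = 0
    batter = 0  # lineup position; carries over from inning to inning (cyclic order)
    for _ in range(innings):
        outs = 0
        runners = 0  # runners on base; with singles only, bases fill first to third
        while outs < 3:
            if rng.random() < on_base_probs[batter]:
                if runners == 3:
                    runs += 1       # bases loaded: the runner on third scores
                else:
                    runners += 1
            else:
                outs += 1
            batter = (batter + 1) % len(on_base_probs)
    return runs

def average_runs(lineup, games=2000, seed=0):
    """Mean runs per game over many simulated games, with a fixed seed."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games

# Hypothetical on-base probabilities: best hitter first vs. the reverse order.
best_first = [0.40, 0.38, 0.36, 0.34, 0.32, 0.30, 0.28, 0.26, 0.15]
worst_first = best_first[::-1]
print(average_runs(best_first), average_runs(worst_first))
```

Any difference between two orders in a model this crude may be smaller than the simulation noise, so the number of games matters; a more realistic version would need to track which bases are occupied and allow more outcomes per plate appearance.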

Under the direction of statistician Carl Morris (now at Harvard, then at UT/Austin), we did some analysis of batting order circa 1982. We set up a Markov chain and computed the run-production of the various possible batting orders. Surprisingly, or possibly unsurprisingly, managers were nearly always coming very close to the optimum batting order. As you note, there’s more to it than just putting the highest batting averages first. There’s a reason fast guys who get on base a lot and can steal are at the top of the order, and power hitters are placed after that. Pitchers are almost always put last in the order because they are generally the worst hitters, but occasionally, we found one could eke out a fraction of a run by moving the pitcher up a slot or two. However, managers we talked to were more interested in protecting their star pitchers from injury, even if it cost a tenth of a run.
