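John Cook posted about the “baseball inequality”. In his formulation, if you have two lists of k positive numbers each, $n_1, \ldots, n_k$ and $d_1, \ldots, d_k$, then

$$\min_i \frac{n_i}{d_i} \le \frac{n_1 + \cdots + n_k}{d_1 + \cdots + d_k} \le \max_i \frac{n_i}{d_i}.$$
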
This has the interpretation that the batting average of a team is between the batting average of the best player and that of the worst player.
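A quick numerical check of that interpretation, taking hits as the numerators and at-bats as the denominators. The season totals below are made up for illustration, and the helper names `team_average` and `player_averages` are just labels chosen for this sketch.

```python
from fractions import Fraction

def player_averages(hits, at_bats):
    """Individual batting averages, kept as exact fractions."""
    return [Fraction(h, ab) for h, ab in zip(hits, at_bats)]

def team_average(hits, at_bats):
    """Team batting average: total hits over total at-bats."""
    return Fraction(sum(hits), sum(at_bats))

# Hypothetical season totals for three players (not real data).
hits = [150, 120, 30]
at_bats = [500, 480, 150]

averages = player_averages(hits, at_bats)
team = team_average(hits, at_bats)

# The team average sits between the worst and best individual averages.
assert min(averages) <= team <= max(averages)
print(float(min(averages)), float(team), float(max(averages)))
```

Note that the team average is a weighted mean of the individual averages, weighted by at-bats, which is why it cannot escape the range they span.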
This is not what I expected from the headline “baseball inequality” and the list of numbers. What I was expecting was the following. If the two lists are in numerical order, $x_1 \le x_2 \le \cdots \le x_k$ and $y_1 \le y_2 \le \cdots \le y_k$, then

$$x_1 y_1 + x_2 y_2 + \cdots + x_k y_k \ge x_1 y_{\sigma(1)} + x_2 y_{\sigma(2)} + \cdots + x_k y_{\sigma(k)}$$

for any permutation $\sigma$ of 1, 2, …, k. This is actually what’s called the rearrangement inequality. Its use in baseball is in setting the batting order. If you want to arrange the batting order so your team gets the most hits, then you want the players with the best batting averages (the highest y) to be earlier in the batting order and therefore get the most at-bats (the highest x). (One-line idea of proof: if any of the $y_{\sigma(i)}$ are out of order, you can increase the sum by putting them in order.)
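For a lineup small enough to enumerate, the rearrangement inequality can be checked by brute force. The sketch below uses made-up numbers: `xs` is a hypothetical count of at-bats per lineup slot and `ys` is a batting average in thousandths (so 310 means .310), both sorted ascending. Integers are used so the comparison is exact.

```python
from itertools import permutations

def weighted_sum(xs, ys):
    """Sum of x_i * y_i: expected hits (times 1000) for this pairing."""
    return sum(x * y for x, y in zip(xs, ys))

# Hypothetical data, both lists in ascending order.
xs = [550, 600, 650, 700]   # at-bats per lineup slot
ys = [220, 250, 280, 310]   # batting averages in thousandths

# Pairing largest with largest, second largest with second largest, etc.
sorted_total = weighted_sum(xs, ys)

# Try every assignment of hitters to slots and keep the best total.
best_total = max(weighted_sum(xs, perm) for perm in permutations(ys))

# No rearrangement beats the sorted pairing.
assert best_total == sorted_total
print(sorted_total / 1000, "expected hits with the sorted pairing")
```

Enumeration grows as k!, so this is only a sanity check; for a nine-player lineup that is 362,880 orders, which is still quick, but the inequality itself makes the search unnecessary.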
Reality is a bit more complicated, because:
- first, the goal of baseball is of course to get runs, not hits, so you want to count walks and extra-base hits as well; hence sorting players by OPS (on-base percentage plus slugging percentage) should be better than sorting by batting average. (OPS doesn’t make dimensional sense because you’re adding two fractions with different denominators, but let’s ignore that.)
- second, there are interactions between the players – in order to score runs you usually need to get multiple hits in close succession. Since batting orders are cyclic, you perhaps don’t want to have your worst hitter going immediately before your best hitter, and indeed some teams have tried batting the pitcher eighth. (I’m a National League fan; don’t talk to me about designated hitters.)
These are probably problems that are best solved by simulation, and I’ve got a day job.