Rolling the dice

From the March 11, 2022 “Riddler”:
We’re playing a game where you have to pick four whole numbers. Then I will roll four fair dice. If any two of the dice add up to any one of the numbers you picked, then you win! Otherwise, you lose.

For example, suppose you picked the numbers 2, 3, 4 and 12, and the four dice came up 1, 2, 4 and 5. Then you’d win, because two of the dice (1 and 2) add up to at least one of the numbers you picked (3).

To maximize your chances of winning, which four numbers should you pick? And what are your chances of winning?

Some first thoughts:

You want numbers that are common as the sums of two dice – middling numbers, numbers near seven.
The problem has a reflection symmetry. The dice values $x_1, x_2, x_3, x_4$ win with the target sums $y_1, y_2, y_3, y_4$ if and only if the dice values $7-x_1, 7-x_2, 7-x_3, 7-x_4$ win with the target sums $14-y_1, 14-y_2, 14-y_3, 14-y_4$ .

Putting these together, a symmetric set of middling numbers seems likely to be the best target set – something like 5, 6, 8, 9. This is a nasty case analysis, but it’s easy to do by brute force in R.

library(tidyverse)


dice_and_targets = expand.grid(d1 = 1:6, d2 = 1:6, d3 = 1:6, d4 = 1:6,
                   t1 = 2:12, t2 = 2:12, t3 = 2:12, t4 = 2:12) %>% filter(t1 < t2 & t2 < t3 & t3 < t4)  %>% 
  mutate(s12 = d1 + d2, s13 = d1 + d3, s14 = d1 + d4,
          s23 = d2 + d3, s24 = d2 + d4, s34 = d3 + d4)

The data frame `dice_and_targets` has a row for every possible combination of dice results (d1 … d4) and targets (t1 … t4), and the sums of the dice (s12 … s34). It’s a big data frame, with $6^4 \times {11 \choose 4} = 1296 \times 330 = 427680$ rows, one for each of the 1296 possible dice rolls and 330 choices of targets.

Let’s take a look at a sample of this data frame, consisting of 10 randomly selected rows:

set.seed(1)
dice_and_targets$idx = sample(nrow(dice_and_targets))
dice_and_targets %>% filter(idx <= 10) %>% select(-idx)

   d1 d2 d3 d4 t1 t2 t3 t4 s12 s13 s14 s23 s24 s34
1   1  2  3  5  2  3  5  8   3   4   6   5   7   8
2   4  1  6  5  2  3  4  9   5  10   9   7   6  11
3   2  3  4  5  3  6  7  9   5   6   7   7   8   9
4   2  1  5  6  3  5  8  9   3   7   8   6   7  11
5   1  6  4  6  2  6  9 10   7   5   7  10  12  10
6   3  4  4  6  3  6  7 11   7   7   9   8  10  10
7   2  2  4  2  4  8 10 11   4   6   4   6   4   6
8   1  2  3  5  2  3  4 12   3   4   6   5   7   8
9   2  6  6  1  2  6  8 12   8   8   3  12   7   7
10  1  6  3  6  6  8 11 12   7   4   7   9  12   9

Consider, for example, the first row. In this case we roll 1, 2, 3, and 5; the targets are 2, 3, 5, and 8; the pairwise sums are 1+2 = 3, 1+3 = 4, 1+5 = 6, 2+3 = 5, 2+5 = 7, and 2+6 = 8; and we win the game, in fact three times over, since the pairwise sums include three of the targets, namely 3, 5, and 8.

Next we can work out which rows win, leveraging some bitwise operations because how often do I get a chance to use these?

dice_and_targets = dice_and_targets %>% mutate(target_bits = bitwOr(bitwOr(2^t1, 2^t2), bitwOr(2^t3, 2^t4)))
dice_and_targets = dice_and_targets %>% 
  mutate(sum_bits = bitwOr(bitwOr(bitwOr(2^s12, 2^s13), bitwOr(2^s14, 2^s23)), bitwOr(2^s24, 2^s34)))
dice_and_targets = dice_and_targets %>% 
  mutate(win = bitwAnd(target_bits, sum_bits) > 0)

In this case target_bits has the bit corresponding to $2^t$ set if $t$ is one of the targets; sum_bits has the bit corresponding to $2^s$ set if $s$ is one of the pairwise sums. Then bitwAnd(target_bits, sum_bits) has a nonzero bit if and only if we have a winning combination.

Let’s look at those randomly selected rows, now with the wins figured out:

dice_and_targets %>% filter(idx <= 10) %>% select(-idx)

   d1 d2 d3 d4 t1 t2 t3 t4 s12 s13 s14 s23 s24 s34 target_bits sum_bits  win
1   1  2  3  5  2  3  5  8   3   4   6   5   7   8         300      504 TRUE
2   4  1  6  5  2  3  4  9   5  10   9   7   6  11         540     3808 TRUE
3   2  3  4  5  3  6  7  9   5   6   7   7   8   9         712      992 TRUE
4   2  1  5  6  3  5  8  9   3   7   8   6   7  11         808     2504 TRUE
5   1  6  4  6  2  6  9 10   7   5   7  10  12  10        1604     5280 TRUE
6   3  4  4  6  3  6  7 11   7   7   9   8  10  10        2248     1920 TRUE
7   2  2  4  2  4  8 10 11   4   6   4   6   4   6        3344       80 TRUE
8   1  2  3  5  2  3  4 12   3   4   6   5   7   8        4124      504 TRUE
9   2  6  6  1  2  6  8 12   8   8   3  12   7   7        4420     4488 TRUE
10  1  6  3  6  6  8 11 12   7   4   7   9  12   9        6464     4752 TRUE

In the first row, here, target_bits is $2^8+2^5+2^3+2^2 = 100101100_2 = 300$ and sum_bits is $2^8+2^7+2^6+2^5+2^4+2^3 = 111111000_2 = 504$ . And bitwAnd(target_bits, sum_bits) (not shown) is $2^8 + 2^5 + 2^3 = 100101000_2 = 296$ . Since it’s greater than zero, that counts as a win.

You might get the idea that it’s impossible to lose from this sample. We got a little lucky here: mean(dice_and_targets$win) returns 0.874018. If you pick a random roll of four dice and four random targets out of 2, 3, …, 12, one of the pairwise sums will be in the target set 87% of the time.

But we want to know which target set makes a win most likely.

win_counts_by_target = dice_and_targets %>% group_by(t1, t2, t3, t4) %>% summarize(wins = sum(win)) %>% arrange(desc(wins))

head(win_counts_by_target)
> head(win_counts_by_target)

# A tibble: 6 x 5
# Groups:   t1, t2, t3 [5]
     t1    t2    t3    t4  wins
  <int> <int> <int> <int> <int>
1     4     6     8    10  1264
2     2     6     8    10  1246
3     4     6     8    12  1246
4     4     6     7     9  1238
5     5     7     8    10  1238
6     4     7     8     9  1236

There we go! And not a loop in sight.

Once I have the 4, 6, 8, 10 target set it’s easy to come up with that number 1264. Consider just the dice that show even numbers – you win if there are at least two of these and they’re not all showing 6. Similarly you win if there are at least two odd dice and they’re not all showing 1. So the losing combinations are

$(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 4), (1, 1, 1, 6), (1, 1, 6, 6), (1, 6, 6, 6), (3, 6, 6, 6), (5, 6, 6, 6), (6, 6, 6, 6)$

and their rearrangements, of which there are 32.

Incidentally, my initial guess (5, 6, 8, 9) wasn’t bad – it wins 1228 times out of 1296, good enough for 14th place out of the 330 possible target sets. And the very worst target set? It’s (2, 3, 11, 12). No surprise there, although even that one wins 776 times out of 1296, nearly 60% of the time.

Products equaling sums

From The Riddler, June 25, 2021, a puzzle from Matt Enlow (which was originally published in Math Horizons, although I can’t find the original):

A bag contains 100 marbles, and each marble is one of three different colors. If you were to draw three marbles at random, the probability that you would get one of each color is exactly 20 percent.

How many marbles of each color are in the bag?

It doesn’t seem like there’s enough information, but there is. I recommend not trying to solve this puzzle in your head while navigating the day care parking lot, though.

Let’s say the numbers of the three colors are x, y and z. Then the number of sets of three marbles which contain one of each color is just xyz. And the number of total sets of three marbles is ${100 \choose 3}$ . So we need to find positive integers x, y, z such that x + y + z = 100 and $5xyz = {100 \choose 3}.$

It’ll help to find the prime factorization of ${100 \choose 3}$ , so we expand it:

${100 \choose 3} = {100 \times 99 \times 98 \over 3 \times 2 \times 1} = {2^2 \times 5^2 \times 3^2 \times 11 \times 2 \times 7^2 \over 3 \times 2} = 2^2 \times 3 \times 5^2 \times 7^2 \times 11.$

Dividing out the 5, we need $xyz = 2^2 \times 3 \times 5 \times 7^2 \times 11.$ . The three big factors – the two 7s and the 11 – look like they’ll be worrisome. So let’s spread them out: let x and y both be multiples of 7, and let z be a multiple of 11. To get them to add up to 100, we need to find a multiple of 7 and a multiple of 11 that add up to 100; it’s easy to find 44 and 56. So z = 44.

That leaves $xy = 3 \times 5 \times 7^2$ , so x and y must be 21 and 35.

Alternately, we can write some code to solve it, but what fun is that?

n = 100

target = (n * (n-1) * (n-2))/6
for x in range(1, n+1):
    for y in range(x, int((n-x)/2)+1):
        z = n-x-y;
        if target == 5*x*y*z:
            print(x, y, z)

which prints out

21 35 44

as expected.

But now that we have some code, is there something special about 100? Let’s iterate over the first 1000 integers:

for n in range(1, 1000):
    target = (n * (n-1) * (n-2))/6
    for x in range(1, n+1):
        for y in range(x, int((n-x)/2)+1):
            z = n-x-y;
            if target == 5*x*y*z:
                print(n, x, y, z)

returns

6 1 1 4
10 2 2 6
22 4 7 11
26 5 8 13
27 5 9 13
35 7 11 17
36 7 12 17
40 8 13 19
46 11 12 23
56 11 21 24
65 13 24 28
76 19 20 37
100 21 35 44
126 31 35 60
330 82 94 154
345 86 98 161
352 78 120 154
352 91 96 165
406 101 116 189
436 109 124 203
497 124 142 231
512 128 146 238
737 161 268 308

The sequence 6, 10, 22, 26, 27, 35… isn’t in the OEIS as of this writing, although some other Riddler puzzles have made it; it may be there later. These are all numbers where not just n has a lot of factors, but also n-1 and n-2 – for example 345 = 3 x 5 x 23, 344 = 2 x 2 x 2 x 43, 343 = 7 x 7 x 7. Thus ${n \choose 3}/5$ will have many factors, and many ways to be written as a product of three integers, making it more likely that one of those will have the right sum.

I had been expecting an algebraically cleaner solution, because this reminded me of another puzzle: a bag has r red and b blue balls in it. Find r, b such that the probability that two balls picked at random have different colors is exactly one-half. In this case similar code returns the solution

from which we can guess that the solutions are $r = (n^2-n)/2, b = (n^2+n)/2$ .

Balancing the centrifuge – a Riddler puzzle

From the riddler:

Quoc’s lab has a microcentrifuge, a piece of equipment that can separate components of a liquid by spinning around very rapidly. Liquid samples are pipetted into small tubes, which are then placed in one of the microcentrifuge’s 12 slots evenly spaced in a circle.
For the microcentrifuge to work properly, each tube must hold the same amount of liquid. Also, importantly, the center of mass of the samples must be at the very center of the circle — otherwise, the microcentrifuge will not be balanced and may break.
Quoc notices that there is no way to place exactly one tube in the microcentrifuge so that it will be balanced, but he can place two tubes (e.g., in slots 1 and 7).
Now Quoc needs to spin exactly seven samples. In which slots (numbered 1 through 12, as in the diagram above) should he place them so that the centrifuge will be balanced? Extra credit: Assuming the 12 slots are distinct, how many different balanced arrangements of seven samples are there?
https://fivethirtyeight.com/features/can-you-break-a-very-expensive-centrifuge/

I’ve seen this one before, but I’ll take it on as a programming challenge.

I’ll say that hole k is at coordinates $(\sin 2\pi k/12, \cos 2\pi k/12)$ . This is nonstandard, but the result looks like a clock, so it’s easier to visualize:

Now, there are ${12 \choose 7} = 792$ ways that we could pick 7 out of the 12 holes. To generate a list of these I can use the following R function:

combinations = function(n, k){
  if (k == 0 & n == 0){return(list())} else {
    if (k == 0){return(list(c()))} else {
      if (n == k){return(list(1:n))} else {
        without_n = combinations(n-1, k)
        with_n = lapply(combinations(n-1, k-1), function(x){c(x, n)})
        return(c(without_n, with_n))
      }
    }
  }
}

To generate the subsets of [n] of size k, I just take:

the subsets that don’t contain n (which are therefore subsets of [n-1] of size k), and
the subsets that do contain n (which are therefore subsets of [n-1] of size k-1, with n adjoined)
(I learned this a good two decades ago, from section 3.6 of Herb Wilf’s notes East Side, West Side.)

Then iterate over the subsets to find their centers of mass:

our_comb = combinations(12, 7)
com_x = rep(NA, length(our_comb))
com_y = rep(NA, length(our_comb))
for (i in 1:length(our_comb)){
  com_x[i] = mean(x[our_comb[[i]]])
  com_y[i] = mean(y[our_comb[[i]]])
}

So our_comb runs through the combinations, and com_x and com_y are the x- and y-coordinates of the center of mass. Plotting the centers of mass gives we get an attractive symmetric pattern. Here I’ve plotted transparent points, so the darker points are points that arise from more distinct combinations. The outermost points represent the most imbalanced centrifuges – for example the one closest to the top represents loading slots 9, 10, 11, 12, 1, 2, and 3. The main question is: how many points are on top of each other at the very center.

Centers of mass of all subsets of 7 of the 12 centrifuge positions.

I can extract those subsets from the matrix our_comb and build them into a matrix for looking at. There’s a small problem in that we used floating-point arithmetic, so the coordinates don’t come out exactly at zero, but it’s enough to ake the ones that are “close enough”:

epsilon = 10^(-6)
balanced_indices = which(abs(com_x) < epsilon & abs(com_y) < epsilon)
balanced_combs =  t(matrix(unlist(our_comb[balanced_indices]), nrow = k))

and the matrix balanced_combs, which has one row for each combination that balances the centrifuge, is

      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]    1    2    3    6    7    9   10
 [2,]    1    2    4    5    8    9   10
 [3,]    1    2    5    6    7   10   11
 [4,]    2    3    4    7    8   10   11
 [5,]    2    3    5    6    9   10   11
 [6,]    1    2    5    6    8    9   12
 [7,]    1    3    4    7    8    9   12
 [8,]    1    4    5    6    9   10   12
 [9,]    1    4    5    7    8   11   12
[10,]    2    3    6    7    8   11   12
[11,]    3    4    5    8    9   11   12
[12,]    3    4    6    7   10   11   12

(The order here is lexicographic order, but you have to read the rows backwards.)

If you can make sense of this pile of numbers without making some pictures, good for you! But I am human, though a mathematician, so let’s plot. Here big black dots represent the loaded spots, and small blue dots represent the non-loaded spots.

(And yes, it’s in base R graphics. It’s been a while since I’ve used those – at my day job it’s all ggplot all the time.)


par(mfrow = c(4, 3), mar = c(1, 1, 1, 1))
for (i in 1:nrow(balanced_combs)){
  full = balanced_combs[i,]
  empty = setdiff(1:n, balanced_combs[i,])
  plot(x[full], y[full], pch = 19,cex = 2,
       asp = 1, axes = FALSE, xlab = '', ylab = '', 
       xlim = c(-1.25, 1.25), ylim = c(-1.25, 1.25))
  points(x[empty], y[empty], pch = 19, cex = 1, col = 'lightblue')
}

So there are 12 ways to load the centrifuge… but we can clearly see that all of them are really just rotations of a single pattern. To me the visually most convenient representative of that pattern is the third one in the third row, which has loadings at 12, 1, 4, 5, 7, 8, and 11.

We can play a little game of connect-the-dots to see why this pattern is balanced. Connect 12, 4, and 8 to form an equilateral triangle. Connect 1 and 7; connect 5 and 11. Like this:

12-hole centrifuge loaded with 7; symmetric subsets specified

Each of those three subsets has its center of mass at the center of the circle, so the whole arrangement does too.

That decomposition turns out to be the key to the general problem of which centrifuge sizes can be loaded with which number of tubes and remain balanced. Stay tuned.