We present and describe Fubini's principle. We also provide short (and hopefully sweet)
examples/illustrations of the method.
Target Audience:
High school students, olympiad enthusiasts.
What:
Many of us might have seen the principle at work without being aware that this has a name.
The essential idea is to count objects in different ways, to yield surprising results.
The principle of Fubini referred to
in this post, corresponds to double integrations where
the order of integrating over $x$ or over $y$ does not matter. There are certain
conditions under which such integrations can be conducted; however that is not the
focus of this short note. (The interested reader may check out the
wiki page for details
as to when we can perform the double integration in this manner). The essential
method underlying Fubini's theorem is commonly used in combinatorics
as Double Counting.
In this short note, we illustrate Fubini's principle in combinatorics and counting,
via the following examples:
We will describe the basic idea behind this counting technique.
We will demonstrate the use of this via some examples.
The icing: we will show how one can create new problems from old, by
tweaking the various assumptions involved.
About the technique.
The basic idea behind the method is simplicity itself. The technique deals with
different ways of counting some combinatorial entities/objects; since we are
essentially counting the same collection of objects, the counts should be the same.
Here is a short summary of the technique: suppose you have a $2\mathbf{D}$ matrix of
$0$s and $1$s, then counting the $1$'s by the rows or counting the $1$'s along
the columns should yield the same sum: the total number of $1$'s in the matrix!
Here, the "combinatorial object" that we are counting are the $1$'s in the matrix.
The two ways of counting (via rows, or via columns) is tantamount to two "views" on the
same table/matrix.
While this may sound rather trivial, it is amazing how such a simple idea can have
really far-reaching consequences. But this is quite common in Mathematics in general,
a rather simple kernel of an idea can have numerous ramifications.
Examples:
As promised, we are going to demonstrate the effectiveness of this technique via
examples.
In a contest, there are $m$ candidates and $n$ judges, where $n \geqslant 3$ is an odd integer.
Each candidate is evaluated by each judge as either pass or fail.
Suppose that each pair of judges agrees on at most $k$ candidates.
Prove that
$$
\frac{k}{m} \geqslant \frac{n-1}{2n}
$$
First Thoughts:
The interactions indicated in the problem are primarily between candidates and judges.
Thus if we were to model the candidates and judges are vertices in a graph, this would
correspond to a bipartite graph.
At the outset, it is always a good idea to start fixing some notation/indexing.
We will label the Judges by the letter $J$ and the candidates by $C$.
Typical indexing considerations go as follows: since the word
judge starts with the letter j, while we would like to denote the judges as
$J_{\star}$, it would be preferable not to index them by $j$ itself (imagine
how $J_j$ would sound/look!).
However, while $J_k$ sounds nice, we cannot use $k$ for indexing, since
$k$ already means something in this problem! So running through the alphabet,
let's choose another letter to index the judges. For this post, our choice
is $r$. So the judges are $J_1, J_2, \cdots, J_r, \cdots, J_n$. For fun, we can use
this list
of judges for names.
For the candidates, we are free to use $i$, so $C_1, \cdots, C_i, \cdots, C_m$ are the
candidates.
À priori, it is not clear why we have the restriction that $n$ is an odd
integer. However, let us call this out here, so that this is a question that the
reader carries in mind, throughout the following to see whether we need this extra
condition at all. We should also note that there are at least two places where
the number $2$ figures prominently in this problem - one is in the parity of $n$ (that
$n$ is odd) and also in the fraction $\frac{n-1}{2n}$ (which is really close to
$\frac{1}{2}$). In fact there is one more implicit way in which the number $2$ figures
in this problem: can the reader figure out where?
Some basic sanity checks.
Note the wording of the problem: every pair of judges
agrees on at most $k$ candidates. So for a specific scenario if a number $k_0$
satisfies this restriction, so also does $k_0 + 1$. Thus, the direction of the
inequality $\frac{k}{m} \geqslant \frac{n-1}{2n}$ is correct. If $k=k_0$ satisfies
this inequality so also does $k_0 + 1$. This may seem almost facetious right now -
having realised the above flavor of the problem, we may rephrase the problem as considering the following quantity:
$$\min_{\text{all scenarios}} \max_{\text{judge pairs}} (\text{agreements over candidates})$$
Thus we recognize the problem as a minimax
problem!
Start off:
Armed with the above, we turn our minds to modeling the problem. Clearly since there is no
direct interaction between judges (or between candidates), we may model this
as a bipartite graph. And of course, there is no better way to represent
(the edges of ) a bipartite graph by its bipartite adjacency matrix, where we have
(say) rows corresponding to the candidates and the columns corresponding to the judges.
Here is an illustration for $n = 3$ judges and $m = 2$ candidates. Call this matrix
$\mathbf{M}$.
$J_1$
$J_2$
$J_3$
$C_1$
P
F
P
$C_2$
F
P
P
Given this specific setting, we may check that the right $k$ for this matrix is
$k = 1$ (any pair of judges agree on at most $1$ candidate). Also note that the
inequality in the problem does hold: $1/2 \geqslant 2/6$ ($k = 1$, $m = 2$, $n = 3$).
Transform:
While the input is essentially the matrix above, we should rather be looking at the matrix
that directly ties up with the minimax statement that we formulated. Look at the minimax
statement again: it considers judge pairs instead of judges. This indicates that
while the matrix representation above considers
(candidate, judge) tuples, we should instead consider tuples of (candidate, judge pairs).
How do we go from the matrix $\mathbf{M}$ above to this latter matrix? Also what should the
entries of the transformed matrix be? Well, given a candidate and
a pair of judges, the entry should be just "Agree" (A in short) or "Disagree" (D in short).
Let us demonstrate
the transformed matrix $\mathbf{T}$ corresponding to the above:
$(J_1, J_2)$
$(J_1, J_3)$
$(J_2, J_3)$
$C_1$
D
A
D
$C_2$
D
D
A
Note that if the original matrix had $n$ columns corresponding to the judges, this
new matrix $\mathbf{T}$ has $n\choose 2$ columns corresponding to judge pairs. The problem
statement - in terms of this new matrix - translates to: every column has
at most $k$ A's.
Enter Fubini
Now the stage is set to make progress. We will think Fubini with the matrix $\mathbf{T}$.
We have some handle of the counts (of A's) as per the columns - how about the rows?
Consider candidate row $C_i$. This candidate gets evaluated by all the $n$ judges
and gets $P$ or $F$ from each of them - let's give numbers here - suppose $C_i$
gets $x_i$ $P$'s and $(n - x_i)$ $F$'s (we are keeping a subscript $i$ on the $x$
here since this is happening for the candidate $C_i$). How many $A$'s and $D$'s
does $C_i$ get in its row in the transformed matrix?
Well this is easy to calculate. Every pair of $P$'s in $\mathbf{M}$
gives rise to an $A$ in $\mathbf{T}$, as also
every pair of $F$'s. Thus the total number of $A$'s in $C_i$'s row in
$\mathbf{T}$ is ${x_i \choose 2} +{{n-x_i} \choose 2}$.
Thus the number of $A$'s in the matrix $\mathbf{T}$ (summing over all
rows) is:
$$
\sum_{i=1}^{i=m} {x_i \choose 2} + {{n-x_i} \choose 2}
$$
The number of $A$'s (summing over all columns now) is:
$$
\leqslant {n \choose 2}\times k
$$
Finally, matching up, we get:
$$
\sum_{i=1}^{i=m} {x_i \choose 2} + {{n-x_i} \choose 2} \leqslant {n \choose 2} k
$$
Dénouement
Now this is just calculation, but before that happens we need to figure out the
final step - how do we take care of the $x_i$'s appearing in this last inequality?
Here is where the $\min_{\text{all scenarios}}$ comes into play: we need this
inequality to be true not just for a specific matrix $\mathbf{T}$ but for
all such matrices.
So we will have to consider the maximum that the quantity
${x_i \choose 2} + {{n-x_i} \choose 2}$ can take (where $x_i$ has to
be an integer). Also, notice that
this final step is the only place where we are invoking that
$n$ is odd!
This maximum (over choices of $x_i$) is easily derived since the
expression ${x_i \choose 2} + {{n-x_i} \choose 2}$ is essentially a
quadratic. (The reader is invited to check the following Fact
by oneself, or to click-expand below).
Let $n = 2t + 1$ (since $n$ is odd).
The expression ${x_i \choose 2} + {{n-x_i} \choose 2}$ simplifies as
$\frac{x_i^2 + (n-x_i)^2 - n}{2}$. We can forget about constants for the
purpose of this calculation (such as $n$, $2$), so we are essentially
maximizing $x_i^2 + (n-x_i)^2$.
Now this is easy. By differentiation or appealing to AM-GM, we can see that
this maximum is achieved when $x_i$ is (roughly) half of $n$. However, since
$n$ is odd and $x_i$ has to be an integer, it cannot happen that $x_i$ is
exactly half of $n$. So the roughly translates to $n = 2t + 1$,
$x_i = t$ (or $t+1$).
Calculating the expression for $x_i = t = (n-1)/2$ gives the result.
Finally, we have the overall inequality:
$$
m \frac{n^2 - 1}{4} \leqslant k {n \choose 2}
$$
which yields the desired inequality: $k/m \geqslant (n-1)/2n$.
Takeaways
Can the reader see where $2$ entered the problem? The answer: because there are two
categories in which the entries of the original matrix $\mathbf{M}$
are classified (Pass or Fail). I invite the reader to
construct his/her own problem for when there are three possibilities
instead of $2$ (say, each Judge labels each Candidate one of {Good, Moderate, Bad}).
For even $n$: the reader is invited to check that in this case, the
resulting inequality is
$$
\frac{k}{m} \geqslant \frac{n-2}{2(n-1)}
$$
i.e. the same as for the case of the odd number $(n-1)$. This was surprising
to me - would appreciate comments as to why this happened!
We note that this problem has a minimax objective. What if we were to
consider other objectives - such as the average number of agreements
over judge pairs?
Example 2:
Sperner's Theorem:
Let $\cal{F}$ be a family of subsets of $N = \{1, 2, \cdots, n\}$, and
suppose there are no $A, B \in \cal{F}$ satisfying $A \subset B$.
Prove that $| \cal{F} | \leqslant {n \choose \lfloor{n/2\rfloor}}$.
First Thoughts:
This is a popular result in extremal set theory. The
extremal set system (i.e. the set family for which the upper bound
is attained) is given by all the sets of size $\lfloor{n/2\rfloor}$
(or alternatively, by all the sets of size $\lceil{n/2\rceil}$).
Chains, Antichains:
In fact, if we recall some concepts from posets, we realise that
this is a result about chains and antichains. Given a
partial order, an antichain is a subcollection of sets such that no
pair of them are comparable according to the partial order
(I leave it to the reader to guess the definition of a chain). Thus
the problem refers to the size of the maximal antichain under the
partial order of set inclusion.
The canonical chain $\cal{C}$ consists of the sets
$$
\phi \subset \{1\} \subset \{1, 2\} \subset \{1, 2, \cdots, i\}\} \subset \cdots \subset \{1, 2, \cdots, n\}
$$
We are giving this a special name $\cal{C}$ because this will figure in the
discussion later.
Clearly, given any chain $C$, and a permutation $\sigma$, the image
$\sigma(C)$ is also a chain.
Notation:
We denote $S_n$ as the set of all the permutations of $n$ letters.
We will let $\sigma, \tau$ denote permutations (and as
given in the problem, $A, B$ will denote sets in the family $\cal{F}$).
As we saw in the example above, the crux of the matter is to
choose the right kind of objects to count,
in two different ways. This is what we focus on, next.
While this is a problem easily found in many books, I recently
chanced upon this as an exercise for the chapter on
Linearity of Expectation in Alon and Spencer's
book on the Probabilistic Method.
It is only fair that I also include
the hint
that came along with the question statement:
Let $\sigma \in S_n$ be a random permutation of the elements
of $N$ and consider the random variable
$$
X = |\{ i : \{\sigma(1), \sigma(2), \cdots, \sigma(i)\} \in \cal{F}\}|
$$
and consider the expectation of the random variable $X$.
Start off:
We will need to play with what we are trying to prove:
$$
{|\cal{F}|} \leqslant {n \choose \lfloor{ n/2 \rfloor}}
$$
or that is
$$
{|\cal{F}|} \lfloor{ n/2 \rfloor}! {\lceil{ n/2 \rceil}!} \leqslant n!
$$
It may not seem overly non-intuitive to define a matrix, where
the rows correspond to sets $A \in \cal{F}$, and the columns correspond
to the permutations $\sigma \in S_n$. So let's do that and call the matrix
$M$. Also, while $\cal{F}$ and $S_n$ index the rows and the columns
respectively, the entries of the matrix $M$ will be numbers (so that
we will count these numbers according to the rows, then according
to the columns to get our conclusion).
Big Question: What should the entries of the matrix $M$ be?
At this point we may scratch our head, since this is not entirely obvious
as to how we map a $(A, \sigma)$ pair to a number. Stretching our minds
a bit, we realise that $\sigma \in S_n$ is actually a function that
maps the elements of $A$, and so $\sigma(A)$ makes sense. Here $\sigma(A)$
may be thought of as another set ($A$'s elements with names
changed according to $\sigma$). Note that
$\sigma(A) \subseteq \sigma(B)$ iff $A \subseteq B$.
Based on the above, we may now define the matrix entry $M(A, \sigma)$
as follows:
$$
M(A, \sigma) = 1\, if\, \sigma(A) \in \cal{C}, 0\, otherwise
$$
where $\cal{C}$ is the canonical chain as defined earlier.
Elaborating on the condition $\sigma(A) \in \cal{C}$
further: we are essentially checking
if under the permutation $\sigma$, the set $A$ "looks like" the set
$\{1, 2, \cdots, |A|\}$.
Clearly, this is the magic step in the whole argument.
Enter Fubini:
Now, as is necessary for an application of Fubini, we check the
number of $1$'s that are present in a row or a column of the matrix $M$.
Columns:Every column corresponds to a $\sigma$.
If $A, B \in \cal{F}$ are such that $M(A, \sigma)= M(B, \sigma) = 1$, then
this means that $\sigma(A) \in \cal{C}$, as also
$\sigma(B) \in \cal{C}$. This cannot happen, since $\cal{C}$ is a chain
and $\cal{F}$ avoids chains (i.e. $A \not\subseteq B$).
Rows:Every row corresponds to a set $A$. Given a
set $A$, how many $\sigma$'s are such that $\sigma(A) \in \cal{C}$?
This is an easy calculation: the elements of $A$ are to be mapped to
$\{1, 2, \cdots, |A|\} \in \cal{C}$, whereas the other elements are
to be permuted amongst themselves. So there are $|A|!(n-|A|)!$ such
permutations $\sigma$.
Note that $|A|!(n-|A|)! \geqslant \lfloor{n/2\rfloor}! \lceil{n/2\rceil}!$,
so each row has at least $\lfloor{n/2\rfloor}! \lceil{n/2\rceil}!$ $1$'s.
The finish: Compare the counts gotten from rows
and columns. There are in total $n!$ columns, each containing
at most one $1$. Each row contains at least $\lfloor{n/2\rfloor}! \lceil{n/2\rceil}!$
$1$'s (and there are $|\cal{F}|$ rows). Thus, we have that
$$
{|\cal{F}|}\lfloor{n/2\rfloor}! \lceil{n/2\rceil}! \leqslant n!
$$
and we are done.
Takeaways:
Notice that we may redefine the entries of the matrix $M$ as
follows $M(A, \sigma) = 1$ if $\sigma(A) \in C$ where $C$ is
any fixed chain: "fixed" meaning that we use this chain
for all $\sigma$, all $A$.
Of course it is easier to define the matrix
in terms of the canonical chain ${\cal{C}}$ which is what
we do here.
We may notice that this double-counting proof really gives us something
more, in fact the
LYM Inequality.
In the book on the Probabilistic Method, the exercise appears in the chapter
on Linearity of Expectation. Can the reader formulate a proof in that
framework?
The underlying deeper thread to Fubini is duality. In the case of
Sperner's Theorem above, this refers to the duality between
chains and antichains. The reader is invited to explore Linear
Programming duality in the context of the above problems.