Musings on mathematics and teaching.

Primes of the form 6k+1

Euclid’s proof of the infinitude of primes is justly famous. Here is one version of this proof:

Let P be a finite set of primes, and let N be the product of the numbers in P. Then N+1 is not divisible by any number in P, since it leaves a remainder of 1. But N+1 must be divisible by at least one prime, so P cannot contain all of the primes. Therefore the set of primes is infinite.

A variation of this argument shows that there are infinitely many primes of the form 6k - 1.

Let P be a finite set of primes of the form 6k - 1, and let N be the product of the primes in P. Consider the number 6N - 1. It is not divisible by 2 or 3, nor is it divisible by any number in P. But it is not possible that all prime factors of 6N - 1 have the form 6k + 1, because the product would have the form 6k + 1 as well.

Therefore, 6N - 1 has at least one prime factor of the form 6k - 1 that does not belong to P. Therefore, P cannot contain every prime of the form 6k - 1, so the set of primes of this form is infinite.

It takes a little bit more work to prove that there are infinitely many primes of the form 6k + 1. Here is a proof of that fact.

Let P be a finite set of primes of the form 6k + 1, and let N be a number that is divisible by every number in P. Assume that N is also divisible by 6. Let p be a prime divisor of N^2-N+1.

Note that (N^2-N+1)(N+1)=N^3+1. so p divides N^3+1, or in other words N^3 \equiv -1 \pmod{p} and so N^6 \equiv 1 \pmod{p}.

Recall that the order of N modulo p is the least positive k so that N^k \equiv 1 \pmod{p}. The order must divide 6. so k = 1, 2, 3, or 6. But N^3 \equiv -1 \pmod{p}, so the order cannot be 1 or 3.

Can the order be 2? If N^2 \equiv 1 \pmod{p} and N^3 \equiv -1 \pmod{p} then N \equiv -1 \pmod{p}. This would be bad, because then p would divide both N+1 and N^2-N+1; but \gcd(N+1,N^2-N+1) = \gcd(N+1,3) < p, contradiction.

Thus N has order 6 mod p, and the group of units mod p has order p-1, so 6 divides p-1, which means that p has the form 6k+1. Therefore, P does not contain all primes of the form 6k+1, so the set of primes of this form is infinite.


Cancellation in finite groups

In this note, I will state a proof of the following theorem: If A,\ B,\ C are finite groups, and if A \times B is isomorphic to A \times C, then B is isomorphic to C.

Let h(X,Y) be the number of homomorphisms from X to Y, and let m(X,Y) be the number of monomorphisms from X to Y.

If D is a finite group then

h(D,A\times B)=h(D,A)\cdot h(D,B) and
h(D,A\times C)=h(D,A)\cdot h(D,C).

But h(D,A\times B)=h(D,A\times C) since A\times B and A\times C are isomorphic. Therefore h(D,A)\cdot h(D,B) = h(D,A)\cdot h(D,C). Dividing both sides by h(D,A) yields h(D,B)=h(D,C).

I claim that m(D,B)=m(D,C) for every finite group D. The proof is by induction on the order of D. If D has order 1 then m(D,B)=1 and m(D,C)=1, so the equality holds.

Suppose that m(E,B)=m(E,C) for every group E of order less than n, and let D be a group of order n.


h(D,B) = m(D,B) + \sum_K m(D/K,B) and

h(D,C) = m(D,C) + \sum_K m(D/K,C),

where K ranges over all nontrivial normal subgroups of D. But h(D,B)=h(D,C) by the preceding argument, and m(D/K,B)=m(D/K,C) by the induction hypothesis. Therefore m(D,B)=m(D,C).

Setting D=B and D=C in the above yields m(B,C)=m(B,B)\geq 1 and m(C,B)=m(C,C)\geq 1. Thus there exist monomorphisms B\rightarrow C and C\rightarrow B. Since B and C are finite, this implies that B and C are isomorphic. QED.

I am very fond of this theorem because it was a key lemma in my doctoral dissertation. I learned of this argument from Fred Galvin, but I do not know if it was original to him.

The theorem would be false if the word ‘finite’ were omitted. For example, if A is a free abelian group of infinite rank, B is an infinite cyclic group, and C is the trivial group, then A\times B and A \times C are both isomorphic to A, but B and C are not isomorphic to each other.

How much water is in the oceans?

Nat Banting posed the following problem on Twitter.

We could just look up the answer on Wikipedia, but to estimate the answer, we need to know the following quantities:

r = Radius of Earth
f = Fraction of Earth’s surface covered by the oceans
d = Mean depth of the oceans

Recall that the surface area of a sphere of radius r is

S = 4\pi r^2

so the total area covered by the oceans is

A = 4\pi r^2 f.

To approximate the total volume of the oceans, we multiply the area by the mean depth.

V \approx 4\pi r^2 fd

Now, let’s plug in some numbers. The radius of Earth is approximately 6400 km, and about 71% of Earth’s surface is covered by the oceans. The mean depth of the oceans is 3.7 km. So the total volume of the oceans is

V \approx 4\pi \cdot (6400 \text{ km})^2 \cdot 0.71 \cdot (3.7 \text{ km})

which is about 1.35 billion cubic kilometers.

Let’s express this in more familiar units. Recall that a liter is the volume of a cube that is 10 cm on each side. A cubic meter is 1000 liters (since 10 x 10 x 10 = 1000), and a cubic kilometer is 1,000,000,000 cubic meters (since 1000 x 1000 x 1000 = 1,000,000,000). So a cubic kilometer is equal to one trillion liters. This implies that the volume of the oceans is about 1.35 billion trillion liters, or about 350 million trillion US gallons.

A Generalization of the Birthday Problem

In a group of N people, chosen at random, what is the probability that two or more share the same birthday? We assume that birthdays are distributed equally among the 365 days of the year, ignoring leap days. This question is known as the Birthday Problem

The probability is higher than one might expect. In a group of 23 people, there is a 51% chance that two or more people have the same birthday. If there are 40 people, the probability increases to 89%.

There are many sites that explain the Birthday Problem, so I won’t discuss the details. An excellent non-technical explanation was given by Steven Strogatz in the article It’s My Birthday Too, Yeah, which was published in the New York Times. A more advanced treatment can be found in Wikipedia and Mathworld.

Instead, I would like to challenge the reader to solve a more difficult problem. In a group of 500 people, what is the probability that six or more have the same birthday?

My answer is 65.4%, which I computed using the attached Python code. However, I am not absolutely certain that this answer is correct, because the calculations are subject to rounding error, so I am hoping that someone will verify this result independently.

Python code:

Gelfand’s Question

Gary Davis (@republicofmath) has brought my attention to a question that has been attributed to Israel Gelfand: What are the possible sequences of leftmost digits of 2n, 3n, 4n, 5n, 6n, 7n, 8n, 9n?

For example, when n = 2 the sequence of powers is 4, 9, 16, 25, 36, 49, 64, 81
and the sequence of leftmost digits is 4, 9, 1, 2, 3, 4, 6, 8.

The leading digit of kn is determined by the fractional part of n*log(k), where log denotes the base 10 logarithm. There are dependencies between the columns, resulting from the following identities.

log(4) = 2 * log(2)
log(5) = 1 – log(2)
log(6) = log(2) + log(3)
log(8) = 3 * log(2)
log(9) = 2 * log(3)

Therefore, the pattern of leftmost digits is completely determined by
the fractional parts of n*log(2), n*log(3), and n*log(7).

I claim that the set {1, log(2), log(3), log(7)} is linearly independent over the rational numbers. This means that if a, b, c, d are rational numbers such that

a + b log(2) + c log(3) + d log(7) = 0,

then a = b = c = d = 0.

This is easy to prove. By clearing fractions, we can assume that a, b, c, and d are integers. Exponentiating both sides with a base of 10 yields

2^b 3^c 7^d = 10^{-a}.

By unique factorization, we conclude that a = b = c = d = 0.

Since {1, log(2), log(3), log(7)} are linearly independent over the rational numbers, Kronecker’s theorem implies that the set of points

\{ (n \log 2, n \log 3, n \log 7) \}

is dense modulo 1. (This theorem is proved in Hardy & Wright.)

Therefore, it suffices to find the patterns of leading digits as the fractional parts of n*log(2), n*log(3), and n*log(7) range independently over the interval [0,1). We can reduce this to a finite computation by subdividing the unit cube into appropriate regions. The result of this computation is that there are 18,226 distinct patterns of leading digits (see attached file).

Data on Gelfand’s question

Which powers of 2 do not contain the digit 7?

Which powers of 2 do not contain the digit 7? A quick search reveals the following examples.

2^0 = 1 
2^1 = 2 
2^2 = 4 
2^3 = 8 
2^4 = 16 
2^5 = 32 
2^6 = 64 
2^7 = 128 
2^8 = 256 
2^9 = 512 
2^10 = 1024 
2^11 = 2048 
2^12 = 4096 
2^13 = 8192 
2^14 = 16384 
2^16 = 65536 
2^18 = 262144 
2^19 = 524288 
2^22 = 4194304 
2^23 = 8388608 
2^25 = 33554432 
2^28 = 268435456 
2^33 = 8589934592 
2^41 = 2199023255552 
2^42 = 4398046511104 
2^49 = 562949953421312 
2^50 = 1125899906842624 
2^54 = 18014398509481984 
2^61 = 2305843009213693952 
2^71 = 2361183241434822606848

These exponents are listed in the On-line Encyclopedia of Integer Sequences as A035062.

I have continued the search for exponents up to 1010, and found no other powers of 2 that do not contain the digit 7. If we suppose that the decimal digits of powers of 2 are random, then it is extremely unlikely that other examples exist. Observe that 2n has approximately n\log_{10} 2 decimal digits, so the probability that 2n does not contain the digit 7 is approximately rn where

r = (0.9)^{\log_{10} 2} \approx 0.96878.

Given that no examples exist for 72 \le n \le 10^{10}, we can estimate the probability of an additional example via the infinite sum \sum_{n=10^{10}}^{\infty} r^n = \frac{r^{10^{10}}}{1-r} \approx 5 \times 10^{-137743771}.

This probability is so small that it almost defies comprehension. Imagine that every person in the State of New York picked a random combination for the Powerball lottery, that they all picked the same numbers purely by chance, and they all won the jackpot. Absurd as it is, this event is more probable than finding another power of 2 which does not contain the digit 7, unless there is some hidden pattern in the digits which has escaped our attention.

I wrote a Python program to conduct my search. Since powers of 2 grow very rapidly, it is not efficient to generate all of their digits, so I only kept track of the last 70 digits. If a 7 is not found in the last 70 digits, then the last 400 digits are checked using the three-argument pow function. If a 7 is not found in the last 400 digits, then n is printed, but no further checks are made. Here is my code:

n = 0
N = 1
while 1:
    if not('7' in str(N)) and not('7' in str(pow(2, n, 10**400))):
        print "2^" + str(n)                   
    n += 1
    N = (N * 2) % 10**70

My interest in this question was inspired by a blog post by John D. Cook.

Bayes’ Theorem and the Fake Facebook Lottery Winner

After the recent $587.5 million Powerball jackpot, the following picture was posted to Facebook, where it was shared over 2 million times.

Fake Powerball Ticket

It is not too hard to figure out that the picture is a fake. The most obvious clue is that the numbers are out of order. The numbers on an authentic Powerball ticket are always in ascending order, except for the last number. But I was doubtful for another reason — Bayes’ theorem.

There are two scenarios that must be considered. The first scenario is that Nolan actually did win the jackpot, and he wishes to share his good fortune with a random stranger. The second scenario is that Nolan did not win the lottery, and the picture is a fake. (Other scenarios are theoretically possible, but we will disregard them.)

Let us define some events. Let A be the event that a player selected at random wins the Powerball jackpot, and let B be the event that a player selected at random would make an offer similar to the one shown above. We wish to estimate the conditional probability of A given B. According to Bayes’ theorem, the probability can be computed as follows:

P(A|B) = \displaystyle \frac{P(B|A) P(A)}{P(B|A) P(A) + P(B|\neg A) P(\neg A)}

The probabilities in this formula are difficult to estimate, but let’s make an attempt. We know that there were two winners in the drawing, and I will guesstimate that about 20 million people bought tickets, so P(A) = 1/10,000,000.

The other probabilities are more difficult to predict. It certainly seems unlikely that a lottery winner would offer to give $1 million to a complete stranger. Perhaps it’s even more unlikely that a (randomly selected) non-winner would pretend to win the lottery and offer to share it with a stranger. But it does not seem reasonable to suppose that the first event is 10 million times more likely than the second. So we must conclude that the denominator is dominated by its second term, hence P(A|B) must be close to zero. In other words, the picture is (probably) fake.

A function that is surjective on every interval

The intermediate value theorem states that if f is a continuous real-valued function on the closed interval [a, b], and if c is any real number between f(a) and f(b), then there exists c in [a, b] such that f(x) = c. A function that satisfies the conclusion of this theorem is called a Darboux function.

Although every continuous function is a Darboux function, it is not true that every Darboux function is continuous. Perhaps the simplest example is f(x) = sin(1/x) for x not 0, f(0) = 0. The graph of this function is known as the topologist’s sine curve. The importance of this curve lies in the fact that it is connected but not path-connected.

This function is only discontinuous at 0, but the British mathematician John H. Conway constructed a Darboux function that is discontinuous at every point. In fact, it has the stronger property that it is surjective on every nonempty open interval. That is, if a, b, and y are real numbers with a < b, then there exists a real number x such that f(x) = y. This function is called Conway’s base 13 function, because it is defined in terms of the base 13 digits of the argument.

I wish to propose another example of a function that is surjective on every interval. The function is defined as follows:

f(x) = \displaystyle \lim_{n\to\infty} \tan(n!\, \pi x) if the limit exists,

f(x) = 0 otherwise.

In addition to being surjective on every interval, my function has a number of other appealing properties. It is defined using a simple formula instead of arcane digit manipulations. The function is equal to zero almost everywhere. Most remarkably, it is periodic and every positive rational number is a period.

I challenge the reader to verify these claims for herself or himself. My proof is available here.

What is a tangent line?

In this post I will give my answer to a question from David Wees.

Recall that a secant line to the graph of y = f(x) is a line that intersects the graph in at least one point (p, f(p)). This figure shows a secant line (in blue) to the curve y = x^2 (in red) at the point (1,1).

This particular secant line crosses from above to below, because the secant line lies above the curve when we are immediately left of the intersection point, and it lies below the curve when we are immediately to the right of the intersection point.

We could also draw another secant line that crosses from below to above, as shown here. Note that this new secant line has a greater slope than the previous secant line.

Now we can give a mathematically precise definition of a tangent line. A secant line to the curve y = f(x) at the point (p, f(p)) is said to be a tangent line if the following two conditions are satisfied.

  1. Every secant line with lesser slope crosses from above to below at (p, f(p)).
  2. Every secant line with greater slope crosses from below to above at (p, f(p)).

The following picture shows a tangent line to the graph of y = x^2. Notice that if the slope were increased then the line would cross from below to above, and if the slope were decreased then it would cross from above to below.

It is tempting to suppose that a tangent line cannot cross the curve, but this is not the case.

In this case, the tangent line crosses from above to below. If the slope were decreased, then it would still cross from above to below. However, if the slope were increased, then it would cross from below to above. This is consistent with our definition of the tangent line.

Credit: The idea for this definition of tangent line was inspired by the book Calculus Unlimited by Jerrold Marsden and Alan Weinstein, which develops calculus without the concept of limit.

Exploring patterns in Pascal’s triangle using Excel

There are many interesting patterns hidden within Pascal’s triangle. For example, if we color every odd number, we obtain a pattern resembling the Sierpinski triangle. (Image credit: Wikimedia commons)

Sierpinski Triangle

Other patterns can be generated by coloring the numbers according to their remainder after dividing by 3, 4, 5, etc. Many of these patterns are explored in the book Visual Patterns in Pascal’s Triangle by Dale Seymour, and they are analyzed more deeply in a scholarly article by Andrew Granville with the improbable title Zaphod Beeblebrox’s Brain and the Fifty-ninth Row of Pascal’s Triangle.

It is a good activity for students to create these patterns themselves by calculating Pascal’s triangles and coloring the squares. But it is also possible to generate these patterns quickly in Excel using conditional formatting.

Pascal’s triangle can be generated quickly by copying a simple formula. Just enter the number 1 in D3, enter the formula =C3+D3 in D4, and then copy this formula to a square block of cells starting with D4 in the upper left corner.

Since the numbers grow very quickly, it is convenient to use the MOD function to reduce the numbers. I entered a modulus (initially 2) in B3, and changed the formula in D4 to =mod(C3+D3,$B$3). Finally, I created a conditional formatting rule to highlight all cells whose values are greater than 0.

Below is a screenshot of the result. Note that I have zoomed out as far as possible. My Excel spreadsheet is available here. (Warning: 2 MB)

Sierpinski triangle in ExcelUpdate: Patrick Honner brought to my attention a nice video by Debra Borkovitz which demonstrates how to create these patterns in Excel.