Thursday, May 15, 2014

Guessing a PIN

A few posts ago I was musing about the security of a PIN-protected smartphone. We all know that if you choose a four digit PIN (0000 - 9999) then on average it would take 5,000 guesses to break-in. But how many attempts would give a thief, say, a 5% chance of breaking-in (i.e. a 95% chance of you being safe)?

This matters because you want to set a thief a lengthy guessing-task, which still has a low probability of success.

So we want to know the probability of bad-guy success (guessing the PIN) in X attempts from N possibilities (where N = 10,000 for a four-digit PIN) .. as X varies. The distribution we want is one where we keep taking samples (i.e. guessing four digits) without replacement until we sample the one success, then the thief stops and prepares to ransack your account.

My first intuition was that the curve would be cumulative-binomial. The first few trials you have no information so the probability of an early guess working is rather low. About half way through the process, you have rejected about half the possible values so now your chances of getting the PIN are rising, while at the end you've rejected so many wrong choices that your chances of the next guess being right is almost certain. But I was wrong (about the binomial).

Cumulative Binomial Distribution Function

After some googling I came upon this. It turns out that the probability of taking exactly X attempts to correctly guess the PIN is exactly the same, 1/N, for any sequence-length of guesses. So in the case of the four-digit PIN, your chances of guessing right the first time are 1/10,000 .. as are your chances of guessing right on the second, third and indeed ten-thousandth attempt.

(Note: this means that all sequence lengths are equally likely before the thief has started his guessing game. Obviously the conditional probabilities change as the thief is going along - if the first 9,999 guesses are wrong, the final guess will be correct with probability one. This is explained in the link.)

The actual distribution is rectangular - what is called a uniform distribution. If the PIN can take N different values, then the x-axis registers the length of possible guess-sequences going from 1 to N, and the Y-axis shows the probability of guessing correctly in that sequence-length. The distribution is a horizontal line of height (i.e. y-intercept) 1/N. For absolute clarity, we are considering the space of all possible guessing-sequences from length 1 (first guess gets it) to N (last 'guess' gets it). The thief will actually execute only one sequence but doesn't know in advance the length of the one which will work. We assume he doesn't give up. Since the sequences are mutually-exclusive (you get it in one, or you get it in two, or you ...) we can add the various probabilities in what follows.

Example: uniform distribution for N=6 possible choices

If your PIN is four digits then N=10,000 and the chances of correctly guessing your PIN in guess-sequences ranging in length from 1 to 500 attempts is 500/10,000 = 1/20 = 5%. If an attempt takes 6 seconds, this is 50 minutes. Note that searching the whole space of 10,000 choices takes twenty times as long = 1,000 minutes = nearly 17 hours.

If your PIN is six digits then N=1,000,000 and the chances of correctly guessing your PIN in up to 50,000 attempts is 50,000/1,000,000 = 1/20 = 5%. If an attempt takes 6 seconds, this is 5,000 minutes = around three and a half days.

The security services advise nothing shorter than six digits for secure material.