Perfect security
So far, we have just defined the basic syntax and correctness requirements of a Shannon cipher.
Next, we address the question: what is a “secure” cipher? Intuitively, the answer is that a secure
cipher is one for which an encrypted message remains “well hidden,” even after seeing its encryption.
However, turning this intuitive answer into one that is both mathematically meaningful and
practically relevant is a real challenge. Indeed, although ciphers have been used for centuries, it
is only in the last few decades that mathematically acceptable definitions of security have been
developed.
In this section, we develop the mathematical notion of perfect security — this is the “gold
standard” for security (at least, when we are only worried about encrypting a single message and
do not care about integrity). We will also see that it is possible to achieve this level of security;
indeed, we will show that the one-time pad satisfies the definition. However, the one-time pad is
not very practical, in the sense that the keys must be as long as the messages: if Alice wants to
send a 1GB file to Bob, they must already share a 1GB key! Unfortunately, this cannot be avoided:
we will also prove that any perfectly secure cipher must have a key space at least as large as its
message space. This fact provides the motivation for developing a definition of security that is
weaker, but that is acceptable from a practical point of view, and which allows one to encrypt long
messages using short keys.
If Alice encrypts a message m under a key k, and an eavesdropping adversary obtains the
ciphertext c, Alice only has a hope of keeping m secret if the key k is hard to guess, and that
means, at the very least, that the key k should be chosen at random from a large key space. To
say that m is “well hidden” must at least mean that it is hard to completely determine m from
c, without knowledge of k; however, this is not really enough. Even though the adversary may
not know k, we assume that he does know the encryption algorithm and the distribution of k. In
fact, we will assume that when a message is encrypted, the key k is always chosen at random,
uniformly from among all keys in the key space. The adversary may also have some knowledge of
the message encrypted — because of circumstances, he may know that the set of possible messages
is quite small, and he may know something about how likely each possible message is. For example,
suppose he knows the message m is either m0 = "ATTACK AT DAWN" or m1 = "ATTACK AT DUSK",
and that based on the adversary’s available intelligence, Alice is equally likely to choose either one
of these two messages. This, without seeing the ciphertext c, the adversary would only have a
50% chance of guessing which message Alice sent. But we are assuming the adversary does know
c. Even with this knowledge, both messages may be possible; that is, there may exist keys k0
and k1 such that E(k0,m0) = c and E(k1,m1) = c, so he cannot be sure if m = m0 or m = m1.
However, he can still guess. Perhaps it is a property of the cipher that there are 800 keys k0 such
that E(k0,m0) = c, and 600 keys k1 such that E(k1,m1) = c. If that is the case, the adversary’s
best guess would be that m = m0. Indeed, the probability that this guess is correct is equal to
800/(800 + 600) ⇡ 57%, which is better than the 50% chance he would have without knowledge
of the ciphertext. Our formal definition of perfect security expressly rules out the possibility that
knowledge of the ciphertext increases the probability of guessing the encrypted message, or for that
matter, determining any property of the message whatsoever.
Without further ado, we formally define perfect security. In this definition, we will consider
a probabilistic experiment in which is key is drawn uniformly from the key space. We write k to
denote the random variable representing this random key. For a message m, E(k,m) is another
random variable, which represents the application of the encryption function to our random key
and the message m. Thus, every message m gives rise to a di↵erent random variable E(k,m).
Definition 2.1 (perfect security). Let E = (E,D) be a Shannon cipher defined over (K,M, C).
Consider a probabilistic experiment in which the random variable k is uniformly distributed over
K. If for all m0,m1 2M, and all c 2 C, we have
Pr[E(k,m0) = c] = Pr[E(k,m1) = c],
22
then we say that E is a perfectly secure Shannon cipher.
There are a number of equivalent formulations of perfect security that we shall explore. We
state a couple of these here.
Theorem 2.1. Let E = (E,D) be a Shannon cipher defined over (K,M, C). The following are
equivalent:
(i) E is perfectly secure.
(ii) For every c 2 C, there exists N (possibly depending on c) such that for all m 2M, we have
|{k 2 K : E(k,m) = c}| = N.
(iii) If the random variable k is uniformly distributed over K, then each of the random variables
E(k,m), for m 2M, has the same distribution.
The proof of this is a simple calculation that we leave to the reader.
As promised, we give a proof that the one-time pad (see Example 2.1) is perfectly secure.
Theorem 2.2. The one-time pad is a perfectly secure Shannon cipher.
Proof. Suppose that the Shannon cipher E = (E,D) is a one-time pad, and is defined over (K,M, C),
where K := M := C := {0, 1}L. For any fixed message m 2 {0, 1}L and ciphertext c 2 {0, 1}L,
there is a unique key k 2 {0, 1}L satisfying the equation
k $ m = c,
namely, k := m$c. Therefore, E satisfies condition (ii) in Theorem 2.1 (with N = 1 for each c). 2
Example 2.5. Consider again the variable length one-time pad, defined in Example 2.2. This
does not satisfy our definition of perfect security, since a ciphertext has the same length as the
corresponding plaintext. Indeed, let us choose an arbitrary string of length 1, call it m0, and an
arbitrary string of length 2, call it m1. In addition, suppose that and c is an arbitrary length 1
string, and that k is a random variable that is uniformly distributed over the key space. Then we
have
Pr[E(k,m0) = c] = 1/2 and Pr[E(k,m1) = c] = 0,
which provides a direct counter-example to Definition 2.1.
Intuitively, the variable length one-time pad cannot satisfy our definition of perfect security
simply because any ciphertext leaks the length of the corresponding plaintext. However, in some
sense (which we do not make precise right now), this is the only information leaked. It is perhaps
not clear whether this should be viewed as a problem with the cipher or with our definition of
perfect security. On the one hand, one can imagine scenarios where the length of a message may
vary greatly, and while we could always “pad” short messages to e↵ectively make all messages
equally long, this may be unacceptable from a practical point of view, as it is a terrible waste of
bandwidth. On the other hand, one must be aware of the fact that in certain applications, leaking
just the length of a message may be dangerous: if you are encrypting a “yes” or “no” answer to
a question, just the length of the obvious ASCII encoding of these strings leaks everything, so you
better pad “no” out to three characters. 2