Randomness extractor

A randomness extractor, often simply called an “extractor”, is a function, which being applied to output from a weakly random source, together with a short, uniformly random seed, generates a highly output that appears from the source and . Examples of weakly random sources include or ; the only restriction on possible sources is that there is no way they can be fully controlled, calculated or predicted, and that a lower bound on their entropy rate can be established. For a given source, a randomness extractor can even be considered to be a true random number generator (); but there is no single extractor that has been proven to produce truly random output from any type of weakly random source.

Sometimes the term “bias” is used to denote a weakly random source’s departure from uniformity, and in older literature, some extractors are called unbiasing algorithms, as they take the randomness from a so-called “biased” source and output a distribution that appears unbiased. The weakly random source will always be longer than the extractor’s output, but an efficient extractor is one that lowers this ratio of lengths as much as possible, while simultaneously keeping the seed length low. Intuitively, this means that as much randomness as possible has been “extracted” from the source.

Note that an extractor has some conceptual similarities with a (PRG), but the two concepts are not identical. Both are functions that take as input a small, uniformly random seed and produce a longer output that “looks” uniformly random. Some pseudorandom generators are, in fact, also extractors. (When a PRG is based on the existence of , one can think of the weakly random source as a set of truth tables of such predicates and prove that the output is statistically close to uniform.) However, the general PRG definition does not specify that a weakly random source must be used, and while in the case of an extractor, the output should be to uniform, in a PRG it is only required to be from uniform, a somewhat weaker concept.

Special Publication 800-90B (draft) recommends several extractors, including the hash family and states that if the amount of entropy input is twice the number of bits output from them, that output can be considered essentially fully random.


Formal definition of extractors

The of a distribution X (denoted H_{infty}(X)), is the largest real number k such that Pr[X =x] leq 2^{-k} for every x in the range of X. In essence, this measures how likely X is to take its most likely value, giving a worst-case bound on how random X appears. Letting U_{ell} denote the uniform distribution over {0, 1 }^{ell}, clearly  H_{infty}(U_{ell}) = ell.

For an n-bit distribution X with min-entropy k, we say that X is an (n, k) distribution.

Definition (Extractor): (kε)-extractor

Let text{Ext}: {0,1}^n times {0,1}^d to {0,1}^m be a function that takes as input a sample from an (n, k) distribution X and a d-bit seed from U_d, and outputs an m-bit string. text{Ext} is a (kε)-extractor, if for all (n, k) distributions X, the output distribution of text{Ext} is ε-close to U_m.

In the above definition, ε-close refers to .

Intuitively, an extractor takes a weakly random n-bit input and a short, uniformly random seed and produces an m-bit output that looks uniformly random. The aim is to have a low d (i.e. to use as little uniform randomness as possible) and as high an m as possible (i.e. to get out as many close-to-random bits of output as we can).

Strong extractors

An extractor is strong if the seed with the extractor’s output yields a distribution that is still close to uniform.

Definition (Strong Extractor): A (k, epsilon)-strong extractor is a function

 text{Ext}: {0,1}^n times {0,1}^d rightarrow {0,1}^m ,

such that for every (n, k) distribution X the distribution U_d circ text{Ext}(X, U_d) (the two copies of U_d denote the same random variable) is epsilon-close to the uniform distribution on {0,1}^{m+d}.

Explicit extractors

Using the , it can be shown that there exists a (kε)-extractor, i.e. that the construction is possible. However, it is usually not enough merely to show that an extractor exists. An explicit construction is needed, which is given as follows:

Definition (Explicit Extractor): For functions k(n), ε(n), d(n), m(n) a family Ext = {Extn} of functions

text{Ext}_n : {0,1}^n times {0,1}^{d(n)} rightarrow {0,1}^{m(n)}

is an explicit (kε)-extractor, if Ext(xy) can be computed in (in its input length) and for every n, Extn is a (k(n), ε(n))-extractor.

By the probabilistic method, it can be shown that there exists a (kε)-extractor with seed length

d = log{(n-k)}+2log left(frac{1}{varepsilon}right) +O(1)

and output length

m = k +d-2log left(frac{1}{varepsilon}right) - O(1).


A variant of the randomness extractor with weaker properties is the .

Randomness extractors in cryptography

One of the most important aspects of cryptography is random . It is often necessary to generate secret and random keys from sources that are semi-secret or which may be compromised to some degree. By taking a single, short (and secret) random key as a source, an extractor can be used to generate a longer pseudo-random key, which then can be used for public key encryption. More specifically, when a strong extractor is used its output will appear be uniformly random, even to someone who sees part (but not all) of the source. For example, if the source is known but the seed is not known (or vice versa). This property of extractors is particularly useful in what is commonly called Exposure-Resilient cryptography in which the desired extractor is used as an Exposure-Resilient Function (ERF). Exposure-Resilient cryptography takes into account that the fact that it is difficult to keep secret the initial exchange of data which often takes place during the initialization of an application e.g., the sender of encrypted information has to provide the receivers with information which is required for decryption.

The following paragraphs define and establish an important relationship between two kinds of ERF–k-ERF and k-APRF–which are useful in Exposure-Resilient cryptography.

Definition (k-ERF): An adaptive k-ERF is a function fwhere, for a random input r, when a computationally unbounded adversary Acan adaptively read all of rexcept for kbits, |Pr{A^{r}(f(r)) = 1} - Pr{A^{r}(R) = 1}| leq epsilon(n)for some negligible function epsilon(n) (defined below).

The goal is to construct an adaptive ERF whose output is highly random and uniformly distributed. But a stronger condition is often needed in which every output occurs with almost uniform probability. For this purpose Almost-Perfect Resilient Functions (APRF) are used. The definition of an APRF is as follows:

Definition (k-APRF): A k = k(n)APRF is a function fwhere, for any setting of n-kbits of the input rto any fixed values, the probability vector pof the output f(r)over the random choices for the kremaining bits satisfies |p_{i}-2^{-m}| < 2^{-m} epsilon(n)for all iand for some negligible function epsilon(n).

Kamp and Zuckerman have proved a theorem stating that if a function f is a k-APRF, then f is also a k-ERF. More specifically, any extractor having sufficiently small error and taking as input an oblivious, bit-fixing source is also an APRF and therefore also a k-ERF. A more specific extractor is expressed in this lemma:

Lemma: Any 2^{-m} epsilon(n)-extractor f: {0,1}^{n} rightarrow {0,1}^mfor the set of (n,k)oblivious bit-fixing sources, where epsilon(n)is negligible, is also a k-APRF.

This lemma is proved by Kamp and Zuckerman.

Thus, it takes as input a with p not necessarily equal to 1/2, and outputs a Bernoulli sequence with p = 1/2. More generally, it applies to any – it only relies on the fact that for any pair, 01 and 10 are equally likely: for independent trials, these have probabilities pcdot q = qcdot p, while for an exchangeable sequence the probability may be more complicated, but both are equally likely.

Chaos machine

Another approach is to use the output of a applied to the input stream. This approach generally relies on properties of . Input bits are pushed to the machine, evolving orbits and trajectories in multiple dynamical systems. Thus, small differences in the input produce very different outputs. Such a machine has a uniform output even if the distribution of input bits is not uniform or has serious flaws, and can therefore use weak . Additionally, this scheme allows for increased complexity, quality, and security of the output stream, controlled by specifying three parameters: time cost, memory required, and secret key.

Cryptographic hash function

It is also possible to use a cryptographic hash function as a randomness extractor. However, not every hashing algorithm is suitable for this purpose.


Randomness extractors are used widely in cryptographic applications, whereby a function is applied to a high-entropy, but non-uniform source, such as disk drive timing information or keyboard delays, to yield a uniformly random result.

Randomness extractors have played a part in recent developments in , where photons are used by the randomness extractor to generate secure random bits.[1]

Randomness extraction is also used in some branches of .

Random extraction is also used to convert data to a simple random sample, which is normally distributed, and independent, which is desired by statistics.

See Also on BitcoinWiki