Locally decodable code
A locally decodable code (LDC) is an that allows a single bit of the original message to be decoded with high probability by only examining (or querying) a small number of bits of a possibly corrupted .
This property could be useful, say, in a context where information is being transmitted over a noisy channel, and only a small subset of the data is required at a particular time and there is no need to decode the entire message at once. Note that locally decodable codes are not a subset of locally testable codes, though there is some overlap between the two.
Codewords are generated from the original message using an algorithm that introduces a certain amount of redundancy into the codeword; thus, the codeword is always longer than the original message. This redundancy is distributed across the codeword and allows the original message to be recovered with good probability even in the presence of errors. The more redundant the codeword, the more resilient it is against errors, and the fewer queries required to recover a bit of the original message.
Contents
Overview
More formally, a -locally decodable code encodes an
-bit message
to an
-bit codeword
such that any bit
of the message can be recovered with probability
by using a randomized decoding algorithm that queries only
bits of the codeword
, even if up to
locations of the codeword have been corrupted.
Furthermore, a perfectly smooth local decoder is a decoder such that, in addition to always generating the correct output given access to an uncorrupted codeword, for every and
the
query to recover the
bit is uniform over
. (The notation
denotes the set
). Informally, this means that the set of queries required to decode any given bit are uniformly distributed over the codeword.
Local list decoders are another interesting subset of local decoders. List decoding is useful when a codeword is corrupted in more than places, where
is the minimum between two codewords. In this case, it is no longer possible to identify exactly which original message has been encoded, since there could be multiple codewords within
distance of the corrupted codeword. However, given a radius
, it is possible to identify the set of messages that encode to codewords that are within
of the corrupted codeword. An upper bound on the size of the set of messages can be determined by
and
.
Locally decodable codes can also be concatenated, where a message is encoded first using one scheme, and the resulting codeword is encoded again using a different scheme. (Note that, in this context, is the term used by scholars to refer to what is usually called ; see
Length of Codeword and Query Complexity
The rate of a code refers to the ratio between its message length and codeword length: , and the number of queries required to recover 1 bit of the message is called the query complexity of a code.
The rate of a code is inversely related to the query complexity, but the exact shape of this tradeoff is a major open problem. It is known that there are no LDCs that query the codeword in only one position, and that the optimal codeword size for query complexity 2 is exponential in the size of the original message.
LDCs are also useful for data storage, where the medium may become partially corrupted over time, or the reading device is subject to errors. In both cases, an LDC will allow for the recovery of information despite errors, provided that there are relatively few. In addition, LDCs do not require that the entire original message be decoded; a user can decode a specific portion of the original message without needing to decode the entire thing.
Complexity Theory
One of the applications of locally decodable codes in is hardness amplification. Using LDCs with polynomial codeword length and polylogarithmic query complexity, one can take a function that is hard to solve on worst case inputs and design a function
that is hard to compute on average case inputs.
Consider limited to only length
inputs. Then we can see
as a binary string of length
, where each bit is
for each
. We can use a polynomial length locally decodable code
with polylogarithmic query complexity that tolerates some constant fraction of errors to encode the string that represents
to create a new string of length
. We think of this new string as defining a new problem
on length
inputs. If
is easy to solve on average, that is, we can solve
correctly on a large fraction
of inputs, then by the properties of the LDC used to encode it, we can use
to probabilistically compute
on all inputs. Thus, a solution to
for most inputs would allow us to solve
on all inputs, contradicting our assumption that
is hard on worst case inputs.
Private Information Retrieval Schemes
A scheme allows a user to retrieve an item from a server in possession of a database without revealing which item is retrieved. One common way of ensuring privacy is to have separate, non-communicating servers, each with a copy of the database. Given an appropriate scheme, the user can make queries to each server that individually do not reveal which bit the user is looking for, but which together provide enough information that the user can determine the particular bit of interest in the database.
The Reed–Muller code
The main idea behind local decoding of is . The key concept behind a Reed-Muller code is a multivariate polynomial of degree on
variables. The message is treated as the evaluation of a polynomial at a set of predefined points. To encode these values, a polynomial is extrapolated from them, and the codeword is the evaluation of that polynomial on all possible points. At a high level, to decode a point of this polynomial, the decoding algorithm chooses a set
of points on a line that passes through the point of interest
. It then queries the codeword for the evaluation of the polynomial on points in
and interpolates that polynomial. Then it is simple to evaluate the polynomial at the point that will yield
. This roundabout way of evaluating
is useful because (a) the algorithm can be repeated using different lines through the same point to improve the probability of correctness, and (b) the queries are uniformly distributed over the codeword.
More formally, let be a finite field, and let
be numbers with
. The Reed-Muller code with parameters
is the function RM :
that maps every
-variable polynomial
over
of total degree
to the values of
on all the inputs in
. That is, the input is a polynomial of the form
specified by the interpolation of the
values of the predefined points and the output is the sequence
for every
.
To recover the value of a degree polynomial at a point
, the local decoder shoots a random line through
. Then it picks
points on that line, which it uses to interpolate the polynomial and then evaluate it at the point where the result is
. To do so, the algorithm picks a vector
uniformly at random and considers the line
through
. The algorithm picks an arbitrary subset
of
, where
, and queries coordinates of the codeword that correspond to points
for all
and obtains values
. Then it uses polynomial interpolation to recover the unique univariate polynomial
with degree less than or equal to
such that
for all
. Then, to get the value of
, it just evaluates
. To recover a single value of the original message, one chooses
to be one of the points that defines the polynomial.
Each individual query is distributed uniformly at random over the codeword. Thus, if the codeword is corrupted in at most a fraction of locations, by the union bound, the probability that the algorithm samples only uncorrupted coordinates (and thus correctly recovers the bit) is at least
. For other decoding algorithms, see.