Maximal Maximum-Entropy Sets

Consider a probability distribution {p(y)} on a space {\mathcal{Y}}. Suppose we want to construct a set {\mathcal{P}} of probability distributions on {\mathcal{Y}} such that {p(y)} is the maximum-entropy distribution over {\mathcal{P}}:

\displaystyle H(p) = \max_{q \in \mathcal{P}} H(q),

where {H(p) = \mathbb{E}_{p}[-\log p(y)]} is the entropy. We call such a set a maximum-entropy set for {p}. Furthermore, we would like {\mathcal{P}} to be as large as possible, subject to the constraint that {\mathcal{P}} is convex.

Does such a maximal convex maximum-entropy set {\mathcal{P}} exist? That is, is there some convex set {\mathcal{P}} such that {p} is the maximum-entropy distribution in {\mathcal{P}}, and for any {\mathcal{Q}} satisfying the same property, {\mathcal{Q} \subseteq \mathcal{P}}? It turns out that the answer is yes, and there is even a simple characterization of {\mathcal{P}}:

Proposition 1 For any distribution {p} on {\mathcal{Y}}, the set

\displaystyle \mathcal{P} = \{q \mid \mathbb{E}_{q}[-\log p(y)] \leq H(p)\}

is the maximal convex maximum-entropy set for {p}.
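Before the proof, here is a quick numerical sanity check of the easy direction of the proposition (my own sketch, not part of the argument; the helper names `entropy` and `cross_entropy` are mine): on a three-outcome space, every randomly sampled distribution {q} that lands in {\mathcal{P}} should satisfy {H(q) \leq H(p)}.

```python
# Sanity check of Proposition 1 on a three-outcome space (illustrative sketch).
import math
import random

def entropy(q):
    """Shannon entropy H(q) = E_q[-log q(y)], in nats."""
    return -sum(x * math.log(x) for x in q if x > 0)

def cross_entropy(q, p):
    """Cross-entropy E_q[-log p(y)], assuming p has full support."""
    return -sum(qy * math.log(py) for qy, py in zip(q, p))

random.seed(0)
p = (0.5, 0.3, 0.2)
Hp = entropy(p)

# Sample random distributions q; whenever q lands in
# P = {q : E_q[-log p] <= H(p)}, its entropy must not exceed H(p).
for _ in range(100_000):
    raw = [random.random() for _ in p]
    s = sum(raw)
    q = tuple(x / s for x in raw)
    if cross_entropy(q, p) <= Hp:
        assert entropy(q) <= Hp + 1e-9
```

The check passes because {H(q) \leq \mathbb{E}_q[-\log p(y)] \leq H(p)} for every {q \in \mathcal{P}}, which is exactly the chain of inequalities established in the proof below.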

To see why, first note that clearly {p \in \mathcal{P}} (since {\mathbb{E}_{p}[-\log p(y)] = H(p)}), and for any {q \in \mathcal{P}} we have

\displaystyle \begin{array}{rcl} H(q) &=& \mathbb{E}_{q}[-\log q(y)] \\ &\leq& \mathbb{E}_{q}[-\log p(y)] \\ &\leq& H(p), \end{array}

(the first inequality is Gibbs' inequality, i.e. nonnegativity of the KL divergence {\mathrm{KL}(q \,\|\, p)}, and the second holds because {q \in \mathcal{P}}), so {p} is indeed the maximum-entropy distribution in {\mathcal{P}}. Conversely, let {\mathcal{Q}} be any convex set whose maximum-entropy distribution is {p}. Since {\mathcal{Q}} is convex, {(1-\epsilon)p + \epsilon q \in \mathcal{Q}} for any {q \in \mathcal{Q}} and {\epsilon \in [0,1]}, so in particular we must have {H((1-\epsilon)p + \epsilon q) \leq H(p)}. Let us suppose for the sake of contradiction that {q \not\in \mathcal{P}}, so that {\mathbb{E}_{q}[-\log p(y)] > H(p)}. Then we have

\displaystyle \begin{array}{rcl} H((1-\epsilon)p + \epsilon q) &=& \mathbb{E}_{(1-\epsilon)p+\epsilon q}[-\log((1-\epsilon)p(y)+\epsilon q(y))] \\ &=& \mathbb{E}_{(1-\epsilon)p+\epsilon q}[-\log(p(y) + \epsilon (q(y)-p(y)))] \\ &=& \mathbb{E}_{(1-\epsilon)p+\epsilon q}\left[-\log(p(y)) - \epsilon \frac{q(y)-p(y)}{p(y)} + \mathcal{O}(\epsilon^2)\right] \\ &=& H(p) + \epsilon(\mathbb{E}_{q}[-\log p(y)]-H(p)) - \epsilon \mathbb{E}_{(1-\epsilon)p+\epsilon q}\left[\frac{q(y)-p(y)}{p(y)}\right] + \mathcal{O}(\epsilon^2) \\ &=& H(p) + \epsilon(\mathbb{E}_{q}[-\log p(y)]-H(p)) - \epsilon^2 \mathbb{E}_{q}\left[\frac{q(y)-p(y)}{p(y)}\right] + \mathcal{O}(\epsilon^2) \\ &=& H(p) + \epsilon(\mathbb{E}_{q}[-\log p(y)]-H(p)) + \mathcal{O}(\epsilon^2). \end{array}

(In the penultimate step we used {\mathbb{E}_{p}\left[\frac{q(y)-p(y)}{p(y)}\right] = \sum_y (q(y)-p(y)) = 0}, so that the expectation against the mixture contributes only at order {\epsilon}.) Since {\mathbb{E}_{q}[-\log p(y)] - H(p) > 0}, for sufficiently small {\epsilon} the right-hand side exceeds {H(p)}, which is a contradiction. Therefore we must have {q \in \mathcal{P}} for all {q \in \mathcal{Q}}, and hence {\mathcal{Q} \subseteq \mathcal{P}}, so that {\mathcal{P}} is indeed the maximal convex maximum-entropy set for {p}.
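The perturbation step can also be checked numerically (again my own sketch, with illustrative names): take a {q} outside {\mathcal{P}}, i.e. with {\mathbb{E}_q[-\log p(y)] > H(p)}, and verify that mixing a small amount of {q} into {p} strictly increases the entropy, with the increase matching the first-order term {\epsilon(\mathbb{E}_q[-\log p(y)] - H(p))}.

```python
# Check the perturbation argument: for q outside P, the mixture
# (1-eps)*p + eps*q has entropy exceeding H(p) for small eps.
import math

def entropy(q):
    """Shannon entropy H(q) = E_q[-log q(y)], in nats."""
    return -sum(x * math.log(x) for x in q if x > 0)

p = (0.5, 0.3, 0.2)
q = (0.0, 0.0, 1.0)        # E_q[-log p] = -log 0.2 > H(p), so q is outside P
Hp = entropy(p)
gap = -math.log(0.2) - Hp  # coefficient of the first-order epsilon term
assert gap > 0

for eps in (1e-2, 1e-3, 1e-4):
    mix = tuple((1 - eps) * py + eps * qy for py, qy in zip(p, q))
    actual = entropy(mix) - Hp
    predicted = eps * gap  # first-order prediction from the expansion
    assert actual > 0                              # entropy strictly increases
    assert abs(actual - predicted) < 0.1 * predicted  # O(eps^2) correction is small
```

As {\epsilon} shrinks, the observed increase agrees ever more closely with the first-order prediction, which is exactly why no convex set containing such a {q} can have {p} as its maximum-entropy distribution.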