Setup
Let $X_1, \dots, X_n$ be a sample from a distribution indexed by an unknown parameter $\theta \in \Theta$. Write $P_\theta$ for the probability measure under parameter $\theta$ and $E_\theta$ for the corresponding expectation.
An estimator $T = T(X_1, \dots, X_n)$ is a measurable function of the sample. It is an unbiased estimator of $\theta$ when
E_{\htmlClass{maisight-sym-theta}{\theta}}[\htmlClass{maisight-sym-T}{T}] = \htmlClass{maisight-sym-theta}{\theta} \quad \text{for every } \htmlClass{maisight-sym-theta}{\theta} \in \Theta. \tag{1}
Here $T$ denotes the estimator, $\theta$ denotes the unknown parameter, and the expectation is taken under the law $P_\theta$.
A statistic $S = S(X_1, \dots, X_n)$ is a sufficient statistic for $\theta$ when the conditional distribution of the sample given $S$ does not depend on $\theta$. Equivalently:
P_\theta(X_1, \dots, X_n \mid S = s) = P(X_1, \dots, X_n \mid S = s) \quad \text{for all } \theta, s. \tag{2}
Here $S$ denotes the sufficient statistic, $s$ ranges over its support, and the independence of the right-hand side from $\theta$ is the defining property.
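The defining property in (2) can be checked by direct enumeration in a small case. The sketch below assumes an i.i.d. Poisson model with $S$ equal to the sample sum (an illustrative choice, not part of the definition above); the helper names `compositions` and `conditional_law` are hypothetical. It computes the conditional law of the sample given $S = s$ under two different parameter values and compares them.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def compositions(s, n):
    """Yield every n-tuple of non-negative integers that sums to s."""
    if n == 1:
        yield (s,)
        return
    for first in range(s + 1):
        for rest in compositions(s - first, n - 1):
            yield (first,) + rest

def conditional_law(lam, n, s):
    """Return {x: P_lam(X = x | S = s)} over all samples x with sum s."""
    joint = {}
    for x in compositions(s, n):
        p = 1.0
        for k in x:
            p *= poisson_pmf(k, lam)
        joint[x] = p
    total = sum(joint.values())  # this is P_lam(S = s)
    return {x: p / total for x, p in joint.items()}

# Same n and s, two different parameter values (assumed for illustration).
law_small = conditional_law(lam=0.5, n=3, s=4)
law_large = conditional_law(lam=3.0, n=3, s=4)

# Largest pointwise difference between the two conditional laws.
max_gap = max(abs(law_small[x] - law_large[x]) for x in law_small)
print(max_gap)
```

Under this model the two conditional laws should agree up to floating-point error, which is what sufficiency of the sum asserts.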
The theorem
Rao-Blackwell. Let $T$ be an unbiased estimator of $\theta$ with finite variance, and let $S$ be a sufficient statistic for $\theta$. Define
\htmlClass{maisight-sym-Tstar}{T^*} := E[\htmlClass{maisight-sym-T}{T} \mid \htmlClass{maisight-sym-S}{S}]. \tag{3}
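Here $T^*$ denotes the Rao-Blackwellised estimator obtained by conditioning $T$ on $S$. Because $S$ is sufficient, this conditional expectation does not depend on $\theta$, so $T^*$ is a genuine statistic. Then $T^*$ is an unbiased estimator of $\theta$, and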
\mathrm{Var}_{\htmlClass{maisight-sym-theta}{\theta}}(\htmlClass{maisight-sym-Tstar}{T^*}) \leq \mathrm{Var}_{\htmlClass{maisight-sym-theta}{\theta}}(\htmlClass{maisight-sym-T}{T}) \quad \text{for every } \htmlClass{maisight-sym-theta}{\theta}, \tag{4}
with equality iff $T$ is already a function of $S$ (up to a $P_\theta$-null set).
Proof sketch
Two ingredients suffice.
Unbiasedness. By the tower property of conditional expectation,
E_\theta[T^*] = E_\theta[E[T \mid S]] = E_\theta[T] = \theta. \tag{5}
Here the inner expectation is the conditional expectation of $T$ given $S$, and the outer expectation is taken under $P_\theta$.
Variance. Apply the variance decomposition with conditioning σ-algebra $\sigma(S)$:
\mathrm{Var}_\theta(T) = E_\theta[\mathrm{Var}(T \mid S)] + \mathrm{Var}_\theta(E[T \mid S]). \tag{6}
Here $\sigma(S)$ denotes the σ-algebra generated by $S$, $\mathrm{Var}(T \mid S)$ denotes the conditional variance of $T$ given $S$, and the two right-hand terms are both non-negative.
Substituting $T^* = E[T \mid S]$ from (3):
\mathrm{Var}_\theta(T) = E_\theta[\mathrm{Var}(T \mid S)] + \mathrm{Var}_\theta(T^*). \tag{7}
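Since $E_\theta[\mathrm{Var}(T \mid S)] \geq 0$, we conclude $\mathrm{Var}_\theta(T^*) \leq \mathrm{Var}_\theta(T)$, with equality iff $\mathrm{Var}(T \mid S) = 0$ almost surely, i.e. iff $T$ is determined by $S$.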
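The decomposition in (7) can be verified numerically in a concrete case. The sketch below borrows the Poisson setting of the worked example: $T = \mathbf{1}\{X_1 = 0\}$, $S \sim \mathrm{Poisson}(n\lambda)$, and $E[T \mid S] = (1 - 1/n)^S$. The function name `variance_terms`, the truncation point `terms`, and the values of `lam` and `n` are assumptions chosen for illustration.

```python
from math import exp

def variance_terms(lam, n, terms=400):
    """Return (Var(T), E[Var(T | S)], Var(T*)) for the Poisson example.

    Expectations over S ~ Poisson(n * lam) are truncated series; `terms`
    is an assumed cut-off, ample for small n * lam.
    """
    theta = exp(-lam)
    var_t = theta * (1 - theta)  # T is an indicator with P(T = 1) = theta
    q = 1 - 1 / n
    weight = exp(-n * lam)       # P(S = 0)
    mean_cond_var = first_moment = second_moment = 0.0
    for s in range(terms):
        t_star = q**s                                    # E[T | S = s]
        mean_cond_var += weight * t_star * (1 - t_star)  # Var(T | S = s)
        first_moment += weight * t_star
        second_moment += weight * t_star**2
        weight *= n * lam / (s + 1)                      # advance to P(S = s + 1)
    return var_t, mean_cond_var, second_moment - first_moment**2

var_t, mean_cond_var, var_t_star = variance_terms(lam=1.0, n=5)
print(var_t, mean_cond_var + var_t_star)
```

The two printed numbers should agree up to rounding, and the gap between $\mathrm{Var}_\theta(T)$ and $\mathrm{Var}_\theta(T^*)$ is exactly the averaged conditional variance.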
The same conclusion holds for any loss $L(t, \theta)$ that is convex in $t$, by the conditional form of Jensen's inequality:
E_\theta\!\left[L(T^*, \theta)\right] \leq E_\theta\!\left[L(T, \theta)\right]. \tag{8}
Here $L$ denotes a loss function that is convex in its first argument, and the variance case in (4) is the special case $L(t, \theta) = (t - \theta)^2$.
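To see (8) at work with a loss other than squared error, the sketch below uses absolute error, $L(t, \theta) = |t - \theta|$, again in the Poisson setting of the worked example with $T = \mathbf{1}\{X_1 = 0\}$ and $T^* = (1 - 1/n)^S$. The choice of loss, the function name `absolute_risks`, and the parameter values are assumptions made for illustration.

```python
from math import exp

def absolute_risks(lam, n, terms=400):
    """Return (E|T - theta|, E|T* - theta|) with theta = exp(-lam)."""
    theta = exp(-lam)
    # T takes the value 1 with probability theta and 0 otherwise.
    risk_crude = theta * (1 - theta) + (1 - theta) * theta
    q = 1 - 1 / n
    weight = exp(-n * lam)  # P(S = 0), with S ~ Poisson(n * lam)
    risk_star = 0.0
    for s in range(terms):  # truncated series; `terms` is an assumed cut-off
        risk_star += weight * abs(q**s - theta)
        weight *= n * lam / (s + 1)  # advance to P(S = s + 1)
    return risk_crude, risk_star

risk_crude, risk_star = absolute_risks(lam=1.0, n=5)
print(risk_crude, risk_star)
```

Jensen's inequality guarantees that the second number is no larger than the first; how much smaller it is depends on $n$ and $\lambda$.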
Worked example
Let $X_1, \dots, X_n$ be i.i.d. Poisson with mean $\lambda$, and let $\theta = e^{-\lambda}$.
A trivial unbiased estimator:
T = \mathbf{1}\{X_1 = 0\}. \tag{9}
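Here $\mathbf{1}\{\cdot\}$ denotes the indicator. Then $E_\lambda[T] = P_\lambda(X_1 = 0) = e^{-\lambda}$, so $T$ is unbiased for $e^{-\lambda}$.
The sample sum $S = \sum_{i=1}^n X_i$ is sufficient for $\lambda$. Given $S = s$, the conditional distribution of $X_1$ is $\mathrm{Binomial}(s, 1/n)$, which does not depend on $\lambda$. Therefore: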
T^* = E[T \mid S] = P(X_1 = 0 \mid S) = \left(1 - \tfrac{1}{n}\right)^S. \tag{10}
Here $T^*$ is the Rao-Blackwellised estimator. By (4), $\mathrm{Var}_\lambda(T^*) \leq \mathrm{Var}_\lambda(T)$, with strict inequality for $n \geq 2$ since $T$ is not a function of $S$ alone.
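A small Monte Carlo run makes the variance reduction visible. The sketch below is one possible simulation, not a prescribed procedure: the sampler `sample_poisson` uses Knuth's multiplication method so that only the standard library is needed, and the values of `lam`, `n`, `reps`, and `seed` are arbitrary assumptions.

```python
import math
import random
import statistics

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate(lam, n, reps, seed=0):
    """Return paired draws of T = 1{X_1 = 0} and T* = (1 - 1/n)^S."""
    rng = random.Random(seed)
    crude, improved = [], []
    for _ in range(reps):
        xs = [sample_poisson(lam, rng) for _ in range(n)]
        crude.append(1.0 if xs[0] == 0 else 0.0)  # T from (9)
        improved.append((1 - 1 / n) ** sum(xs))   # T* from (10)
    return crude, improved

crude, improved = simulate(lam=1.0, n=10, reps=20000)
target = math.exp(-1.0)

print("target       ", target)
print("mean of T    ", statistics.fmean(crude))
print("mean of T*   ", statistics.fmean(improved))
print("var of T     ", statistics.variance(crude))
print("var of T*    ", statistics.variance(improved))
```

Both sample means should sit close to $e^{-\lambda}$, while the sample variance of $T^*$ should be far smaller than that of $T$. Exact figures depend on the seed and the number of replications.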
Why this matters
The recipe is constructive: take any unbiased estimator $T$, project onto the σ-algebra of a sufficient statistic $S$, and you get $T^*$ with no greater variance. Combined with completeness of $S$ (Lehmann-Scheffé, 1950), this gives the minimum-variance unbiased estimator uniquely: when $S$ is complete, every unbiased estimator Rao-Blackwellises to the same $T^*$ up to almost-sure equality, and the variance of this $T^*$ is no larger than that of any other unbiased estimator, for every $\theta$ simultaneously.
References
C. R. Rao, "Information and accuracy attainable in the estimation of statistical parameters," Bulletin of the Calcutta Mathematical Society, 37:81-91, 1945.
D. Blackwell, "Conditional expectation and unbiased sequential estimation," Annals of Mathematical Statistics, 18(1):105-110, 1947.
E. L. Lehmann and H. Scheffé, "Completeness, similar regions, and unbiased estimation — Part I," Sankhyā, 10:305-340, 1950.