Abstract:
We prove a central limit theorem for a class of additive processes that arise naturally in the
theory of finite-horizon Markov decision problems. The main theorem generalizes a classic
result of Dobrushin (1956) for temporally non-homogeneous Markov chains, and the principal
innovation is that here the summands are permitted to depend on both the current state and
a bounded number of future states of the chain. We show through several examples that this
added flexibility gives one a direct path to asymptotic normality of the optimal total reward of
finite-horizon Markov decision problems. The same examples also explain why such results
are not easily obtained by alternative Markovian techniques such as enlargement of the state
space.
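
To fix ideas, here is a minimal sketch of the two settings in illustrative notation (the symbols $X_t$, $f_{n,t}$, and the window length $m$ are ours, not taken from the paper). Dobrushin's theorem concerns sums of functions of the current state of a temporally non-homogeneous Markov chain $(X_t)$,
\[
  S_n = \sum_{t=1}^{n} f_{n,t}(X_t),
\]
whereas the class of additive processes described above would allow each summand to look a bounded number $m$ of steps into the future,
\[
  S_n = \sum_{t=1}^{n} f_{n,t}\bigl(X_t, X_{t+1}, \ldots, X_{t+m}\bigr).
\]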