Automatic Differentiation of Programs with Discrete Randomness

Automatic differentiation (AD) has become ubiquitous throughout scientific computing and deep learning. However, AD systems have been restricted to the subset of programs that have a continuous dependence on parameters. Programs that have discrete stochastic behaviors governed by distribution parameters, such as flipping a coin with probability p of being heads, pose a challenge to these systems. In this work we develop a new AD methodology for programs with discrete randomness. We demonstrate how this method gives an unbiased and low-variance estimator.