Motivation
The reparameterization trick allows for the efficient computation of gradients through random variables, enabling the optimization of parametric probability models using stochastic gradient descent.
Consider a function of the form:

$$y = \mathbb{E}_{x \sim p_\theta(x)}\left[ f(x) \right]$$

where $p_\theta(x)$ is a probability distribution parameterized by $\theta$, and $f$ is a deterministic function.
Our goal is to answer the question: How can we compute $\nabla_\theta y$?
Before calculating $\nabla_\theta y$, let's discuss how to calculate $y$ itself. We can use Monte Carlo methods to estimate the expectation by repeatedly sampling $x_i \sim p_\theta(x)$, calculating $f(x_i)$, and then averaging all of the $f(x_i)$:

$$y \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i), \qquad x_i \sim p_\theta(x)$$
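To make this concrete, here is a minimal PyTorch sketch. The Gaussian $p_\theta(x) = \mathcal{N}(\theta, 1)$ and $f(x) = x^2$ are illustrative choices, not specified by the setup above:

```python
import torch

# Monte Carlo estimate of y = E_{x ~ p_theta(x)}[f(x)].
# Assumed for illustration: p_theta = N(theta, 1) and f(x) = x^2.
theta = torch.tensor(1.0)
f = lambda x: x ** 2

x = torch.distributions.Normal(theta, 1.0).sample((10_000,))  # x_i ~ p_theta(x)
y_hat = f(x).mean()                                           # (1/N) * sum_i f(x_i)
print(y_hat)  # ~2.0, since E[x^2] = theta^2 + 1 for x ~ N(theta, 1)
```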
The following computational graph describes how to generate a single sample $y$ from the parameters $\theta$.
```mermaid
graph TD;
    θ{θ} --> P(P);
    P --> X{X};
    X --> F[F];
    F --> Y{Y};
    P -.-> θ;
    X -.-> P;
    F -.-> X;
    Y -.-> F;
```
The shape and style of nodes/edges define their semantics:
- Diamond nodes represent data ($\theta$, $X$, $Y$), and act as the inputs/outputs of functions.
- Rounded boxes represent stochastic functions ($P$).
- Rectangular boxes represent deterministic functions ($F$).
- Solid arrows show the direction data flows when generating a sample based on the parameters $\theta$.
- Dashed arrows show the direction gradients flow when differentiating the output with respect to the parameters $\theta$.
The computational graph makes it clear how to calculate $\nabla_\theta y$: we simply accumulate the gradients along the dashed arrows. However, an issue arises: we need to differentiate through the stochastic function $P$, and differentiating through a sampling operation is not well-defined.
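The sketch below shows the problem concretely (same illustrative Gaussian as above): in PyTorch, `.sample()` produces a tensor that is detached from $\theta$, so autograd has no path along the dashed arrows:

```python
import torch

theta = torch.tensor(1.0, requires_grad=True)
x = torch.distributions.Normal(theta, 1.0).sample((10,))  # the stochastic function P
y = (x ** 2).mean()                                       # the deterministic function F

print(x.requires_grad)  # False: sampling severed the connection to theta
# y.backward() would raise a RuntimeError here, because autograd has no
# path from y back through the sampling operation to theta.
```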
We discuss two options for handling this: the REINFORCE estimator and the reparameterization trick.
REINFORCE Estimator
The REINFORCE estimator uses the log-derivative trick, $\nabla_\theta p_\theta(x) = p_\theta(x)\, \nabla_\theta \log p_\theta(x)$, to rewrite $\nabla_\theta y$ as an expectation that we can estimate directly:

$$
\begin{aligned}
\nabla_\theta y &= \nabla_\theta \, \mathbb{E}_{x \sim p_\theta(x)}\left[ f(x) \right] \\
&= \nabla_\theta \int p_\theta(x)\, f(x)\, dx \\
&= \int \nabla_\theta p_\theta(x)\, f(x)\, dx \\
&= \int p_\theta(x)\, \nabla_\theta \log p_\theta(x)\, f(x)\, dx \\
&= \mathbb{E}_{x \sim p_\theta(x)}\left[ f(x)\, \nabla_\theta \log p_\theta(x) \right]
\end{aligned}
$$
This final expression is called the REINFORCE estimator for $\nabla_\theta y$. Similar to how we approximated $y$, we can approximate this expectation by sampling $x_i \sim p_\theta(x)$ and averaging $f(x_i)\, \nabla_\theta \log p_\theta(x_i)$. One issue with this estimator is that it has high variance, since it multiplies $f(x_i)$ by $\nabla_\theta \log p_\theta(x_i)$, and fluctuations in either factor are amplified by the other.
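Here is a sketch of the estimator in PyTorch, again under the assumed Gaussian $p_\theta = \mathcal{N}(\theta, 1)$ and $f(x) = x^2$ (the helper name `reinforce_grad` is ours). Since `log_prob` depends on $\theta$ while the samples do not, differentiating the surrogate $\frac{1}{N}\sum_i f(x_i)\, \log p_\theta(x_i)$ yields exactly the averaged REINFORCE estimate:

```python
import torch

def reinforce_grad(theta, f, n_samples=10_000):
    """REINFORCE estimate of grad_theta E_{x ~ N(theta, 1)}[f(x)] (illustrative)."""
    dist = torch.distributions.Normal(theta, 1.0)
    x = dist.sample((n_samples,))                 # x_i ~ p_theta(x), detached from theta
    surrogate = (f(x) * dist.log_prob(x)).mean()  # (1/N) sum_i f(x_i) log p_theta(x_i)
    surrogate.backward()                          # grad = (1/N) sum_i f(x_i) grad log p
    return theta.grad

theta = torch.tensor(1.0, requires_grad=True)
print(reinforce_grad(theta, lambda x: x ** 2))    # ~2.0 = d/dtheta (theta^2 + 1)
```

The high variance is easy to observe empirically: rerunning the estimate with a small `n_samples` produces gradients that scatter widely around the true value $2\theta = 2$:

```python
theta = torch.tensor(1.0, requires_grad=True)
grads = []
for _ in range(100):
    theta.grad = None  # reset the accumulated gradient before each estimate
    grads.append(reinforce_grad(theta, lambda x: x ** 2, n_samples=100).item())

mean = sum(grads) / len(grads)
std = (sum((g - mean) ** 2 for g in grads) / (len(grads) - 1)) ** 0.5
print(f"mean={mean:.2f}, std={std:.2f}")  # mean near 2.0, but individual estimates are noisy
```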
Instead of using the high-variance REINFORCE estimator, we can use the reparameterization trick.