References
- The LibreTexts writeup on change of variables is excellent. A lot of this note is lifted from it, but rewritten based on Michael Fishman's more "direct" writing style.
Summary
Consider a random variable $X$ that takes on real values in $\mathbb{R}$, and let $p_X(x)$ be the probability density function (pdf) for the random variable $X$. Then we can define the cumulative distribution function (cdf) $F_X(x)$ for the random variable as the integral of the pdf from $-\infty$ to $x$:

$$F_X(x) = \int_{-\infty}^{x} p_X(t)\,dt$$
Now let $g$ be a smooth, invertible function that induces a new random variable $Y = g(X)$. We can calculate its pdf $p_Y(y)$ by first expressing its cdf $F_Y(y)$ in terms of $X$'s cdf, which we know, and then taking the derivative to get the pdf. We present the final form that this note derives.
Key takeaways
For a single variable:

$$p_Y(y) = p_X\big(g^{-1}(y)\big) \left| \frac{d}{dy} g^{-1}(y) \right|$$

For the multivariate setting:

$$p_Y(\mathbf{y}) = p_X\big(g^{-1}(\mathbf{y})\big) \left| \det J_{g^{-1}}(\mathbf{y}) \right|$$
Monotonically increasing derivation
If $g$ is monotonically increasing, then $g(X) \le y$ exactly when $X \le g^{-1}(y)$, so

$$F_Y(y) = P(Y \le y) = P\big(g(X) \le y\big) = P\big(X \le g^{-1}(y)\big) = F_X\big(g^{-1}(y)\big)$$

Differentiating with respect to $y$ gives

$$p_Y(y) = \frac{d}{dy} F_X\big(g^{-1}(y)\big) = p_X\big(g^{-1}(y)\big) \, \frac{d}{dy} g^{-1}(y)$$
Monotonically decreasing derivation
If $g$ is monotonically decreasing, the inequality flips: $g(X) \le y$ exactly when $X \ge g^{-1}(y)$, so

$$F_Y(y) = P\big(g(X) \le y\big) = P\big(X \ge g^{-1}(y)\big) = 1 - F_X\big(g^{-1}(y)\big)$$

Differentiating with respect to $y$ gives

$$p_Y(y) = -\,p_X\big(g^{-1}(y)\big) \, \frac{d}{dy} g^{-1}(y)$$
The one-to-one assumption for $g$ was required so that the inverse $g^{-1}$ is well defined. The main difference between the monotonically increasing/decreasing settings is that $P\big(g(X) \le y\big) = P\big(X \le g^{-1}(y)\big)$ for increasing $g$, and $P\big(g(X) \le y\big) = P\big(X \ge g^{-1}(y)\big)$ for decreasing $g$.
Combined change of variables formula
$p_Y(y)$ looks similar for both monotonically increasing and decreasing $g$, except for a sign. However, since $g^{-1}$ is monotonically decreasing whenever $g$ is, we know its derivative is also negative, so there is a negative sign that always cancels out. This means we can write $p_Y(y)$ in a single form, regardless of whether $g$ is increasing or decreasing, by taking the absolute value of the derivative:

$$p_Y(y) = p_X\big(g^{-1}(y)\big) \left| \frac{d}{dy} g^{-1}(y) \right| = \frac{p_X\big(g^{-1}(y)\big)}{\left| g'\big(g^{-1}(y)\big) \right|}$$
The last equality is true because the derivatives of a smooth, one-to-one function and its inverse are reciprocals, $\frac{d}{dy} g^{-1}(y) = 1 / g'\big(g^{-1}(y)\big)$, so we can divide by the derivative of $g$ instead of multiplying by the derivative of its inverse.
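As a sanity check, we can verify the combined formula numerically for a monotonically decreasing map. This is a minimal sketch (not part of the original derivation), assuming $X \sim \mathcal{N}(0,1)$ and the arbitrarily chosen $g(x) = e^{-x}$: the pdf from the change of variables formula should match a finite-difference derivative of the cdf $F_Y(y) = 1 - \Phi(-\log y)$.

```python
import math

def phi(x):
    # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    # standard normal cdf, via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Decreasing map g(x) = exp(-x), so Y = exp(-X) with X ~ N(0, 1).
# g^{-1}(y) = -log y and (d/dy) g^{-1}(y) = -1/y; the absolute value gives 1/y.
def p_Y(y):
    return phi(-math.log(y)) * (1.0 / y)

# Independent check: F_Y(y) = P(exp(-X) <= y) = P(X >= -log y) = 1 - Phi(-log y).
# A centered finite difference of this cdf should match the formula's pdf.
def F_Y(y):
    return 1 - Phi(-math.log(y))

y, h = 1.7, 1e-5
fd = (F_Y(y + h) - F_Y(y - h)) / (2 * h)
assert abs(fd - p_Y(y)) < 1e-8
```

Note that without the absolute value, the decreasing case would produce a negative "density", which is why the sign cancellation matters.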
Multivariable change of variables
Let $g$ be a one-to-one, differentiable function from $\mathbb{R}^n$ onto itself that is monotonically increasing or decreasing in each output dimension. The Jacobian of the inverse function $g^{-1}$, writing $\mathbf{x} = g^{-1}(\mathbf{y})$, is the matrix of first partial derivatives:

$$J_{g^{-1}}(\mathbf{y}) = \begin{bmatrix} \frac{\partial x_1}{\partial y_1} & \cdots & \frac{\partial x_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial x_n}{\partial y_1} & \cdots & \frac{\partial x_n}{\partial y_n} \end{bmatrix}$$
We can use the determinant of the Jacobian to state the multivariate change of variables formula (more detailed geometric proof here):

$$p_Y(\mathbf{y}) = p_X\big(g^{-1}(\mathbf{y})\big) \left| \det J_{g^{-1}}(\mathbf{y}) \right|$$
The determinant of the Jacobian represents how much volume gets stretched by the transformation when going from $\mathbf{y}$ back to $\mathbf{x}$, so we need to account for it to correct for the volume expansion (or contraction) of the transformation.
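The multivariate formula can also be checked numerically. The sketch below (my own example, not from the original note) uses a linear map $\mathbf{y} = A\mathbf{x}$ applied to a standard bivariate normal, with an arbitrarily chosen invertible $A$; here $J_{g^{-1}} = A^{-1}$, and the change-of-variables density should equal the known $\mathcal{N}(\mathbf{0}, AA^\top)$ density.

```python
import numpy as np

# X ~ N(0, I_2); Y = A X for an invertible A (entries chosen arbitrarily).
A = np.array([[2.0, 0.5],
              [0.3, 1.5]])
A_inv = np.linalg.inv(A)

def p_X(x):
    # standard bivariate normal density
    return np.exp(-0.5 * x @ x) / (2 * np.pi)

def p_Y_change_of_vars(y):
    # p_Y(y) = p_X(g^{-1}(y)) |det J_{g^{-1}}|, where J_{g^{-1}} = A^{-1}
    return p_X(A_inv @ y) * abs(np.linalg.det(A_inv))

def p_Y_direct(y):
    # Y is exactly N(0, Sigma) with Sigma = A A^T; evaluate that density directly
    Sigma = A @ A.T
    quad = y @ np.linalg.solve(Sigma, y)
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

y = np.array([0.8, -1.2])
assert abs(p_Y_change_of_vars(y) - p_Y_direct(y)) < 1e-12
```

The two agree because $|\det A^{-1}| = 1/|\det A|$ and $\det(AA^\top) = (\det A)^2$, so the Jacobian correction exactly reproduces the normalizer of the transformed Gaussian.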
Example: Gaussians
Consider a standard normal random variable:

$$X \sim \mathcal{N}(0, 1)$$
Note that for a Gaussian distribution, the probability density function can be written as:

$$p(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$
Then if we have a new random variable $Y = \mu + \sigma X$, its pdf can be calculated using the change of variables formula with $g^{-1}(y) = \frac{y - \mu}{\sigma}$ and $\frac{d}{dy} g^{-1}(y) = \frac{1}{\sigma}$, and we get the following:

$$p_Y(y) = p_X\!\left(\frac{y - \mu}{\sigma}\right) \cdot \frac{1}{\sigma} = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(y - \mu)^2}{2\sigma^2} \right)$$

which is exactly the pdf of $\mathcal{N}(\mu, \sigma^2)$.
The importance of this result is that we see two ways to calculate the probability density function $p_Y(y)$:
The first approach first calculates $g^{-1}(y) = \frac{y - \mu}{\sigma}$, then gets the probability density value of that transformed value according to the original standard normal pdf $p_X$, and then performs an additional correction (divides by $\sigma$). This is the form used in the reparameterization trick.
The second approach simply evaluates the probability density value of the input $y$ for a Gaussian parameterized by mean $\mu$ and standard deviation $\sigma$.
As we have shown, these are equivalent ways of calculating the pdf.
References
- https://www.cs.ubc.ca/~murphyk/Teaching/Stat406-Spring08/homework/changeOfVariablesHandout.pdf
- https://tutorial.math.lamar.edu/classes/calciii/changeofvariables.aspx
- https://tutorial.math.lamar.edu/classes/calci/substitutionruleindefinite.aspx
- https://tutorial.math.lamar.edu/classes/calci/Differentials.aspx
- https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/03%3A_Distributions/3.07%3A_Transformations_of_Random_Variables