References

  • The LibreTexts writeup on change of variables is excellent. Much of this note is lifted from it, but rewritten in the style of Michael Fishman's more "direct" writing.

Summary

Consider a random variable $X$ that takes on real values in $\mathbb{R}$, and let $p_X(x)$ be the probability density function (pdf) for the random variable $X$. Then we can define the cumulative distribution function (cdf) $F_X(x)$ for the random variable $X$ as the integration of the pdf from $-\infty$ to $x$:

$$F_X(x) = \int_{-\infty}^{x} p_X(t)\, dt$$
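As a quick numerical sanity check of this definition, we can approximate the cdf of a standard normal by integrating its pdf with a midpoint Riemann sum and compare against the closed form $\Phi(x) = \tfrac{1}{2}\big(1 + \operatorname{erf}(x/\sqrt{2})\big)$ (the choice of a standard normal here is just an illustrative assumption):

```python
import math

# pdf of a standard normal random variable X
def p_X(x):
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

# cdf F_X(x) = integral of the pdf from -inf to x,
# approximated with a midpoint Riemann sum starting at -10
# (the tail mass below -10 is negligible for a standard normal)
def F_X(x, lo=-10.0, n=100_000):
    dt = (x - lo) / n
    return sum(p_X(lo + (i + 0.5) * dt) for i in range(n)) * dt

# closed form for comparison: Phi(x) = (1 + erf(x / sqrt(2))) / 2
phi = 0.5 * (1 + math.erf(1.0 / math.sqrt(2)))
assert abs(F_X(1.0) - phi) < 1e-6
```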

Now let $g$ be a smooth, invertible function that induces a new random variable $Y = g(X)$. We can calculate its pdf $p_Y(y)$ by first calculating its cdf $F_Y(y)$ in terms of $X$'s cdf, which we know, and then taking the derivative to get the pdf. We present the final form that this note derives.

Key takeaways

For a single variable:

$$p_Y(y) = p_X\big(g^{-1}(y)\big)\left|\frac{d}{dy} g^{-1}(y)\right|$$

For the multivariate setting:

$$p_Y(\mathbf{y}) = p_X\big(g^{-1}(\mathbf{y})\big)\left|\det J_{g^{-1}}(\mathbf{y})\right|$$

Monotonically increasing derivation
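For increasing $g$, the standard argument applies the definition of the cdf, uses monotonicity to move $g$ across the inequality, and then differentiates:

$$
\begin{aligned}
F_Y(y) &= P(Y \le y) = P\big(g(X) \le y\big) \\
       &= P\big(X \le g^{-1}(y)\big) && \text{($g$ increasing)} \\
       &= F_X\big(g^{-1}(y)\big) \\
p_Y(y) &= \frac{d}{dy} F_Y(y) = p_X\big(g^{-1}(y)\big)\,\frac{d}{dy} g^{-1}(y)
\end{aligned}
$$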

Monotonically decreasing derivation
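For decreasing $g$, the same argument applies, except that applying $g^{-1}$ flips the inequality, which is where the sign difference comes from:

$$
\begin{aligned}
F_Y(y) &= P\big(g(X) \le y\big) \\
       &= P\big(X \ge g^{-1}(y)\big) && \text{($g$ decreasing flips the inequality)} \\
       &= 1 - F_X\big(g^{-1}(y)\big) \\
p_Y(y) &= \frac{d}{dy} F_Y(y) = -\,p_X\big(g^{-1}(y)\big)\,\frac{d}{dy} g^{-1}(y)
\end{aligned}
$$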

The one-to-one assumption for $g$ was required so that the inverse $g^{-1}$ is well defined. The main difference between the monotonically increasing and decreasing settings is that $g(X) \le y \iff X \le g^{-1}(y)$ for increasing $g$, and $g(X) \le y \iff X \ge g^{-1}(y)$ for decreasing $g$.

Combined change of variables formula

$p_Y(y)$ looks similar for both monotonically increasing and decreasing $g$, except for a sign. However, since $g^{-1}$ is monotonically decreasing whenever $g$ is, we know its derivative is also negative, so there is a negative component that will always cancel out. This means we can write $p_Y(y)$ in a single form, regardless of whether $g$ is increasing or decreasing, by just taking the absolute value of the derivative:

$$p_Y(y) = p_X\big(g^{-1}(y)\big)\left|\frac{d}{dy} g^{-1}(y)\right| = \frac{p_X\big(g^{-1}(y)\big)}{\left|g'\big(g^{-1}(y)\big)\right|}$$

The last equality holds because the derivatives of a smooth, one-to-one function and its inverse are reciprocals, $\frac{d}{dy} g^{-1}(y) = 1 / g'\big(g^{-1}(y)\big)$, so we can divide by the derivative of $g$ instead of multiplying by the derivative of $g^{-1}$.
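Both forms of the single-variable formula can be checked numerically. The sketch below assumes $X \sim \mathcal{N}(0,1)$ and $g(x) = e^x$ (an arbitrary smooth, one-to-one choice for illustration), and confirms that multiplying by $|\tfrac{d}{dy}g^{-1}(y)|$ agrees with dividing by $|g'(g^{-1}(y))|$:

```python
import math

def p_X(x):  # standard normal pdf
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

# g(x) = exp(x) is smooth and one-to-one; g^{-1}(y) = log(y)
def p_Y(y):
    x = math.log(y)             # g^{-1}(y)
    d_inv = 1.0 / y             # d/dy g^{-1}(y) = 1/y
    return p_X(x) * abs(d_inv)  # multiply by |derivative of the inverse|

# equivalent form: divide by |g'(g^{-1}(y))| instead
def p_Y_alt(y):
    x = math.log(y)
    return p_X(x) / abs(math.exp(x))  # g'(x) = exp(x)

for y in (0.5, 1.0, 2.0):
    assert abs(p_Y(y) - p_Y_alt(y)) < 1e-12
```

This `p_Y` is exactly the log-normal density, which is what the change of variables formula predicts for $Y = e^X$.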

Multivariable change of variables

Let $g: \mathbb{R}^n \to \mathbb{R}^n$ be a one-to-one, differentiable function from $\mathbb{R}^n$ onto itself that is monotonically increasing or decreasing in each output dimension. The Jacobian of the inverse function $g^{-1}$ is the matrix of first partial derivatives:

$$J_{g^{-1}}(\mathbf{y}) = \begin{bmatrix} \dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x_n}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{bmatrix}, \qquad \mathbf{x} = g^{-1}(\mathbf{y})$$

We can use the determinant of the Jacobian to calculate the multivariate change of variables formula (more detailed geometric proof here):

$$p_Y(\mathbf{y}) = p_X\big(g^{-1}(\mathbf{y})\big)\left|\det J_{g^{-1}}(\mathbf{y})\right|$$

The determinant of the Jacobian represents how much volume is stretched by the transformation when going from $\mathbf{x}$ to $\mathbf{y}$, so we need to account for it to correct for the volume expansion (or contraction) of the transformation.
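A minimal numerical sketch of the multivariate formula, assuming a 2-D standard normal $X$ and a hypothetical invertible linear map $g(\mathbf{x}) = A\mathbf{x}$ (so $J_{g^{-1}} = A^{-1}$ everywhere): the change-of-variables density should match the closed-form density of $\mathcal{N}(\mathbf{0}, AA^\top)$.

```python
import numpy as np

# standard 2-D normal pdf for X
def p_X(x):
    return np.exp(-0.5 * x @ x) / (2 * np.pi)

# invertible linear map g(x) = A x, so g^{-1}(y) = A^{-1} y
A = np.array([[2.0, 1.0],
              [0.0, 0.5]])
A_inv = np.linalg.inv(A)

# multivariate change of variables:
# p_Y(y) = p_X(g^{-1}(y)) * |det J_{g^{-1}}(y)|, and here J_{g^{-1}} = A^{-1}
def p_Y(y):
    return p_X(A_inv @ y) * abs(np.linalg.det(A_inv))

# closed-form check: Y = A X is N(0, Sigma) with Sigma = A A^T
Sigma = A @ A.T
Sigma_inv = np.linalg.inv(Sigma)

def p_Y_closed(y):
    norm = 2 * np.pi * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * y @ Sigma_inv @ y) / norm

y = np.array([1.0, -0.5])
assert abs(p_Y(y) - p_Y_closed(y)) < 1e-12
```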

Example: Gaussians

Consider:

$$X \sim \mathcal{N}(0, 1)$$

Note that for a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$, the probability density function can be written as:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Then if we have a new random variable $Y = g(X) = \mu + \sigma X$, its pdf can be calculated using the change of variables formula. With $g^{-1}(y) = \frac{y-\mu}{\sigma}$ and $\frac{d}{dy} g^{-1}(y) = \frac{1}{\sigma}$, we get the following:

$$p_Y(y) = p_X\!\left(\frac{y-\mu}{\sigma}\right)\frac{1}{\sigma} = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)$$

so $Y \sim \mathcal{N}(\mu, \sigma^2)$.

The importance of this result is that we see two ways to calculate the probability density function $p_Y(y)$:

The first approach first calculates $x = g^{-1}(y) = \frac{y-\mu}{\sigma}$, then gets the probability density of that transformed value under the original pdf $p_X$, and then performs an additional correction (dividing by $\sigma$). This is used in the reparameterization trick.

The second approach simply evaluates the probability density of the input $y$ under a Gaussian parameterized by mean $\mu$ and standard deviation $\sigma$.

As we have shown, these are equivalent ways of calculating the pdf.
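The equivalence is easy to verify numerically; the sketch below uses arbitrary example values for $\mu$, $\sigma$, and $y$:

```python
import math

def std_normal_pdf(x):
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

mu, sigma = 1.5, 0.7  # example parameters (arbitrary choice)
y = 2.0

# approach 1: invert the transform, evaluate the standard normal pdf,
# then correct by dividing by sigma (as in the reparameterization trick)
x = (y - mu) / sigma
p1 = std_normal_pdf(x) / sigma

# approach 2: evaluate the N(mu, sigma^2) pdf at y directly
p2 = math.exp(-((y - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

assert abs(p1 - p2) < 1e-12
```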
