References
- The LibreTexts writeup on change of variables is excellent. A lot of this note is lifted from it, but rewritten based on Michael Fishman's more "direct" writing style.
Summary
Consider a random variable $X$ that takes on real values in $\mathbb{R}$, and let $p_X(x)$ be the probability density function (pdf) for the random variable $X$. Then we can define the cumulative distribution function (cdf) $F_X(x)$ for the random variable as the integral of the pdf from $-\infty$ to $x$:

$$F_X(x) = \int_{-\infty}^{x} p_X(t)\,dt$$
Now let $g$ be a smooth, invertible function that induces a new random variable $Y = g(X)$. We can calculate its pdf $p_Y(y)$ by first expressing its cdf $F_Y(y)$ in terms of $X$'s cdf, which we know, and then taking the derivative to get the pdf. We present the final form that this note derives.
Key takeaways
For a single variable:

$$p_Y(y) = p_X\big(g^{-1}(y)\big) \left| \frac{d}{dy} g^{-1}(y) \right|$$

For the multivariate setting:

$$p_Y(\mathbf{y}) = p_X\big(g^{-1}(\mathbf{y})\big) \left| \det J_{g^{-1}}(\mathbf{y}) \right|$$
Monotonically increasing derivation
If $g$ is monotonically increasing, then $g(X) \le y$ exactly when $X \le g^{-1}(y)$, so

$$F_Y(y) = P(Y \le y) = P\big(g(X) \le y\big) = P\big(X \le g^{-1}(y)\big) = F_X\big(g^{-1}(y)\big)$$

Differentiating with respect to $y$ gives

$$p_Y(y) = \frac{d}{dy} F_X\big(g^{-1}(y)\big) = p_X\big(g^{-1}(y)\big) \, \frac{d}{dy} g^{-1}(y)$$
Monotonically decreasing derivation
If $g$ is monotonically decreasing, the inequality flips: $g(X) \le y$ exactly when $X \ge g^{-1}(y)$, so

$$F_Y(y) = P\big(g(X) \le y\big) = P\big(X \ge g^{-1}(y)\big) = 1 - F_X\big(g^{-1}(y)\big)$$

Differentiating with respect to $y$ gives

$$p_Y(y) = -\,p_X\big(g^{-1}(y)\big) \, \frac{d}{dy} g^{-1}(y)$$
The one-to-one assumption for $g$ was required so that the inverse $g^{-1}$ is well defined. The main difference between the monotonically increasing/decreasing settings is that $P\big(g(X) \le y\big) = P\big(X \le g^{-1}(y)\big)$ for increasing $g$, and $P\big(g(X) \le y\big) = P\big(X \ge g^{-1}(y)\big)$ for decreasing $g$.
Combined change of variables formula
$p_Y(y)$ looks similar for both monotonically increasing and decreasing $g$, except for a sign. However, since $g^{-1}$ is monotonically decreasing whenever $g$ is, we know its derivative is also negative, so there is a negative sign that always cancels out. This means we can write $p_Y(y)$ in a single form, regardless of whether $g$ is increasing or decreasing, by taking the absolute value of the derivative:

$$p_Y(y) = p_X\big(g^{-1}(y)\big) \left| \frac{d}{dy} g^{-1}(y) \right| = \frac{p_X\big(g^{-1}(y)\big)}{\left| g'\big(g^{-1}(y)\big) \right|}$$
The last equality is true because the derivatives of a smooth, one-to-one function and its inverse are reciprocals, $\frac{d}{dy} g^{-1}(y) = 1 / g'\big(g^{-1}(y)\big)$, so we can divide by the derivative of $g$ instead of multiplying by the derivative of its inverse.
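As a sanity check, we can verify the combined formula numerically for a monotonically decreasing map. This is a minimal sketch (not part of the original derivation), assuming $X \sim \mathcal{N}(0,1)$ and the arbitrarily chosen $g(x) = e^{-x}$: the pdf from the change of variables formula should match a finite-difference derivative of the cdf $F_Y(y) = 1 - \Phi(-\log y)$.

```python
import math

def phi(x):
    # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    # standard normal cdf, via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Decreasing map g(x) = exp(-x), so Y = exp(-X) with X ~ N(0, 1).
# g^{-1}(y) = -log y and (d/dy) g^{-1}(y) = -1/y; the absolute value gives 1/y.
def p_Y(y):
    return phi(-math.log(y)) * (1.0 / y)

# Independent check: F_Y(y) = P(exp(-X) <= y) = P(X >= -log y) = 1 - Phi(-log y).
# A centered finite difference of this cdf should match the formula's pdf.
def F_Y(y):
    return 1 - Phi(-math.log(y))

y, h = 1.7, 1e-5
fd = (F_Y(y + h) - F_Y(y - h)) / (2 * h)
assert abs(fd - p_Y(y)) < 1e-8
```

Note that without the absolute value, the decreasing case would produce a negative "density", which is why the sign cancellation matters.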
Multivariable change of variables
Let $g$ be a one-to-one, differentiable function from $\mathbb{R}^n$ onto itself that is monotonically increasing or decreasing in each output dimension. The Jacobian of the inverse function $g^{-1}$, writing $\mathbf{x} = g^{-1}(\mathbf{y})$, is the matrix of first partial derivatives:

$$J_{g^{-1}}(\mathbf{y}) = \begin{bmatrix} \frac{\partial x_1}{\partial y_1} & \cdots & \frac{\partial x_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial x_n}{\partial y_1} & \cdots & \frac{\partial x_n}{\partial y_n} \end{bmatrix}$$
We can use the determinant of the Jacobian to state the multivariate change of variables formula (more detailed geometric proof here):

$$p_Y(\mathbf{y}) = p_X\big(g^{-1}(\mathbf{y})\big) \left| \det J_{g^{-1}}(\mathbf{y}) \right|$$
The determinant of the Jacobian represents how much volume gets stretched by the transformation when going from $\mathbf{y}$ back to $\mathbf{x}$, so we need to account for it to correct for the volume expansion (or contraction) of the transformation.
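The multivariate formula can also be checked numerically. The sketch below (my own example, not from the original note) uses a linear map $\mathbf{y} = A\mathbf{x}$ applied to a standard bivariate normal, with an arbitrarily chosen invertible $A$; here $J_{g^{-1}} = A^{-1}$, and the change-of-variables density should equal the known $\mathcal{N}(\mathbf{0}, AA^\top)$ density.

```python
import numpy as np

# X ~ N(0, I_2); Y = A X for an invertible A (entries chosen arbitrarily).
A = np.array([[2.0, 0.5],
              [0.3, 1.5]])
A_inv = np.linalg.inv(A)

def p_X(x):
    # standard bivariate normal density
    return np.exp(-0.5 * x @ x) / (2 * np.pi)

def p_Y_change_of_vars(y):
    # p_Y(y) = p_X(g^{-1}(y)) |det J_{g^{-1}}|, where J_{g^{-1}} = A^{-1}
    return p_X(A_inv @ y) * abs(np.linalg.det(A_inv))

def p_Y_direct(y):
    # Y is exactly N(0, Sigma) with Sigma = A A^T; evaluate that density directly
    Sigma = A @ A.T
    quad = y @ np.linalg.solve(Sigma, y)
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

y = np.array([0.8, -1.2])
assert abs(p_Y_change_of_vars(y) - p_Y_direct(y)) < 1e-12
```

The two agree because $|\det A^{-1}| = 1/|\det A|$ and $\det(AA^\top) = (\det A)^2$, so the Jacobian correction exactly reproduces the normalizer of the transformed Gaussian.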
Example: Gaussians
Consider a standard normal random variable:

$$X \sim \mathcal{N}(0, 1)$$
Note that for a Gaussian distribution, the probability density function can be written as:

$$p(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$
Then if we have a new random variable $Y = \mu + \sigma X$, its pdf can be calculated using the change of variables formula with $g^{-1}(y) = \frac{y - \mu}{\sigma}$ and $\frac{d}{dy} g^{-1}(y) = \frac{1}{\sigma}$, and we get the following:

$$p_Y(y) = p_X\!\left(\frac{y - \mu}{\sigma}\right) \cdot \frac{1}{\sigma} = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(y - \mu)^2}{2\sigma^2} \right)$$

which is exactly the pdf of $\mathcal{N}(\mu, \sigma^2)$.
The importance of this result is that we see two ways to calculate the probability density function $p_Y(y)$:
The first approach first calculates $g^{-1}(y) = \frac{y - \mu}{\sigma}$, then gets the probability density value of that transformed value according to the original standard normal pdf $p_X$, and then performs an additional correction (divides by $\sigma$). This is the form used in the reparameterization trick.
The second approach simply evaluates the probability density value of the input $y$ for a Gaussian parameterized by mean $\mu$ and standard deviation $\sigma$.
As we have shown, these are equivalent ways of calculating the pdf.
References
- https://www.cs.ubc.ca/~murphyk/Teaching/Stat406-Spring08/homework/changeOfVariablesHandout.pdf
- https://tutorial.math.lamar.edu/classes/calciii/changeofvariables.aspx
- https://tutorial.math.lamar.edu/classes/calci/substitutionruleindefinite.aspx
- https://tutorial.math.lamar.edu/classes/calci/Differentials.aspx
- https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/03%3A_Distributions/3.07%3A_Transformations_of_Random_Variables