Monday, May 25, 2020

PSF GSoC students blogs: [Community Bonding Period] What is Automatic Differentiation?

The optimization of deep learning models is based on gradient descent. Deep learning frameworks such as PyTorch and TensorFlow can be divided into three parts: the model API, gradient computation, and GPU acceleration. Gradient computation plays an important role, and the core technology behind it is automatic differentiation.

1 Differentiation methods

There are four differentiation methods:

  • Manual differentiation

  • Numerical differentiation

  • Symbolic differentiation

  • Automatic differentiation

 

Manual differentiation means working out the derivative formula by hand and then coding it directly. This method is accurate and efficient; its only disadvantage is that it takes effort.
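As a toy illustration (the function below is my own example, not from the post), manual differentiation of f(x) = x² + sin(x) simply means deriving 2x + cos(x) by hand and writing it down as code:

```python
import math

def f(x):
    return x ** 2 + math.sin(x)

# Derivative worked out by hand: d/dx (x^2 + sin x) = 2x + cos x
def df(x):
    return 2 * x + math.cos(x)

print(df(1.0))  # 2.5403...
```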

Numerical differentiation uses the definition of the derivative:

f'(x) = lim_{h→0} (f(x + h) − f(x)) / h

and approximates it by evaluating the difference quotient with a small finite step h.

This method is simple to implement, but it suffers from two serious problems: truncation error and round-off error. Nevertheless, it is a good way to check whether a gradient computed by some other method is accurate.
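A minimal sketch of such a gradient check (my own example; the step size and tolerance are arbitrary choices), using a central difference:

```python
import math

def f(x):
    return x ** 2 + math.sin(x)

def df_analytic(x):
    return 2 * x + math.cos(x)

def df_numeric(x, h=1e-5):
    # Central difference; smaller truncation error than the one-sided quotient
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3
print(abs(df_analytic(x) - df_numeric(x)) < 1e-6)  # True if the gradient is correct
```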

Another method is symbolic differentiation, which hands the work we did in manual differentiation over to the computer. The problem with this method is that the expression must be closed-form, that is, it cannot contain loops or conditional expressions, so that the whole task can be turned into a purely symbolic problem and solved with computer algebra software. However, when expressions are complex, the problem of "expression swell" is prone to occur.
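For example, a small sketch with SymPy (SymPy is my choice of tool; the post does not name one):

```python
import sympy as sp

x = sp.symbols('x')
expr = sp.sin(x) * sp.exp(x ** 2)

# The product and chain rules are applied symbolically by the library
derivative = sp.diff(expr, x)
print(derivative)  # e.g. 2*x*exp(x**2)*sin(x) + exp(x**2)*cos(x)
```

Differentiating deeply nested expressions this way quickly produces results much larger than the input, which is the "expression swell" mentioned above.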

The last one is our protagonist: automatic differentiation. It is also the most widely used differentiation method in programs.

2 Automatic differentiation

Automatic differentiation exploits the essence of derivative computation: any derivative calculation can be decomposed into a finite sequence of elementary differentiable operators, each with a known derivative.

We can regard a formula as a computation graph (what's more, it can also be viewed as a tree structure). During the forward computation, we obtain the value of each node.

The derivation of the formula can then be expressed as a chain of these elementary operator derivatives.

It can be seen that the whole derivation splits into a sequence of combined differential operators. The computation can be organized in two ways: propagating derivatives from the inputs towards the output is called forward mode, and propagating them from the output back towards the inputs is called reverse mode; a small sketch of forward mode follows below.
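Here is a minimal forward-mode sketch using dual numbers (the toy formula f(x1, x2) = x1*x2 + x1 is my own choice, not the post's example):

```python
class Dual:
    """Forward mode: carry a value and its derivative through every operation."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u*v)' = u'*v + u*v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def f(x1, x2):
    return x1 * x2 + x1

# df/dx1 at (3, 5): seed x1 with derivative 1 and x2 with derivative 0
out = f(Dual(3.0, 1.0), Dual(5.0, 0.0))
print(out.val, out.dot)  # 18.0 6.0   (df/dx1 = x2 + 1 = 6)
```

Reverse mode, in contrast, records the operations during the forward pass and then sweeps once from the output back to all inputs, which is what y.backward() does in PyTorch.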

The gradient values calculated by the two modes are the same, but because the order of computation differs, the speed differs. Generally, if the Jacobian matrix is tall (many outputs, few inputs), forward mode is more efficient; if the Jacobian matrix is wide (many inputs, few outputs), reverse mode is more efficient.
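As an illustration (using JAX, which the post only mentions later), jax.jacfwd builds the Jacobian with forward mode and jax.jacrev with reverse mode; for a tall Jacobian the forward version needs far fewer passes:

```python
import jax
import jax.numpy as jnp

def f(x):
    # f: R^2 -> R^1000, so the Jacobian is tall (1000 x 2)
    return jnp.sin(jnp.outer(jnp.arange(1000.0), x)).sum(axis=1)

x = jnp.array([0.1, 0.2])
J_fwd = jax.jacfwd(f)(x)   # forward mode: one JVP per input column (2 here)
J_rev = jax.jacrev(f)(x)   # reverse mode: one VJP per output row (1000 here)
print(J_fwd.shape, jnp.allclose(J_fwd, J_rev))  # (1000, 2) True
```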

3 JVP, VJP and vmap

If you have used PyTorch, you will have noticed that when y is a tensor instead of a scalar, you are asked to pass a grad_variables argument to y.backward(), and the resulting derivative x.grad has the same shape as x. So where is the Jacobian matrix?

The reason is that deep learning frameworks such as TensorFlow and PyTorch do not compute tensor-by-tensor derivatives directly; they only provide scalar-by-tensor derivatives. When we call y.backward() and pass a grad_variables vector v, the framework in effect converts y into a weighted sum l = torch.sum(y * v), where l is a scalar, so the gradient x.grad naturally has the same shape as x. The reason for this design is that the loss in deep learning is always a scalar, and gradient descent requires the gradient to have the same shape as x.
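A small PyTorch sketch (my own illustration) showing that y.backward(v) produces the same x.grad as building the weighted sum explicitly:

```python
import torch

x = torch.randn(3, requires_grad=True)
v = torch.tensor([1.0, 2.0, 3.0])

y = x ** 2                  # y is a tensor, not a scalar
y.backward(v)               # pass the weighting vector (grad_variables / gradient)
grad_direct = x.grad.clone()

x.grad = None               # reset the accumulated gradient
l = torch.sum(x ** 2 * v)   # the equivalent weighted-sum scalar
l.backward()
print(torch.allclose(grad_direct, x.grad))  # True
```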

But what if we want to obtain the Jacobian matrix?

The answer is to back-propagate from each element of y in turn, taking the gradient with respect to x once per output. In addition, Google's new deep learning framework JAX uses a more advanced approach: the vectorized map vmap, which batches these per-output passes to speed up the calculation.
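For instance (a rough sketch, not from the post), the full Jacobian can be recovered by applying one vector-Jacobian product per standard basis vector of y, and vmap vectorizes exactly this loop:

```python
import jax
import jax.numpy as jnp

def f(x):
    return x ** 3                       # y has the same shape as x

x = jnp.arange(1.0, 5.0)
y, vjp_fn = jax.vjp(f, x)               # reverse-mode VJP for this input

# One VJP per standard basis vector of y ("derive x for each value of y")
rows = [vjp_fn(e)[0] for e in jnp.eye(y.size)]
J_loop = jnp.stack(rows)

# vmap batches those per-row VJPs into a single vectorized call
J_vmap = jax.vmap(lambda e: vjp_fn(e)[0])(jnp.eye(y.size))

print(jnp.allclose(J_loop, J_vmap))     # True; both equal diag(3 * x**2)
```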





