PyTorch is one of the leading frameworks for deep learning. Its core data structure is Tensor, a multi-dimensional array implementation with many advanced features like auto-differentiation. PyTorch is a massive codebase (approx. a million lines of C++, Python and CUDA code), and having a method for iterating over tensors in a very efficient manner that is independent of data type, dimension, striding and hardware is a critical feature that can lead to a very massive simplification of the codebase and make distributed development much faster and smoother. The TensorIterator C++ class within PyTorch is a complex yet useful class that is used for iterating over the elements of a tensor over any dimension and implicitly parallelizing various operations in a device independent manner.
It does this through a C++ API that is independent of type and device of the tensor, freeing the programmer of having to worry about the datatype or device when writing iteration logic for PyTorch tensors. For those coming from the NumPy universe, NpyIter is a close cousin of TensorIterator.
This post is a deep dive into how TensorIterator works, and is an essential part of learning to contribute to the PyTorch codebase since iterations over tensors in the C++ codebase are extremely commonplace. This post is aimed at someone who wants to contribute to PyTorch, and you should at least be familiar with some of the basic terminologies of the PyTorch codebase that can be found in Edward Yang's excellent blog post on PyTorch internals. Although TensorIterator can be used for both CPUs and accelerators, this post has been written keeping in mind usage on the CPU. Although there can be some dissimilarities between the two, the overall concepts are the same.
Read more… (7 min remaining to read)
from Planet Python
via read more
No comments:
Post a Comment