The legacy of Torch and the limits of static graphs
Deep learning in 2016 is dominated by TensorFlow, released by Google the previous year. TensorFlow adopts a static graph model: the developer first defines the entire neural network structure as a computational graph, then compiles it and finally executes it with data. This approach is efficient for production, but makes debugging complex — the graph is an opaque object, not inspectable with normal Python tools — and experimentation slow, because every structural change requires recompiling the entire graph.
PyTorch, released by Facebook AI Research (FAIR), inherits the mathematical foundations of Torch, a scientific computing framework built on the Lua language, but rethinks them entirely in Python, adopting a radically different approach: dynamic computational graphs.
Define-by-run
The central concept of PyTorch is define-by-run: the computational graph is not defined in advance but is built automatically during code execution. Every operation on a tensor creates a graph node on the fly. This means the developer can use normal Python constructs — if, for, while — to control the computation flow, and the graph adapts dynamically to each execution.
The advantage for researchers is immediate: they can inspect every step with print(), use the standard Python debugger, modify the network architecture at runtime, and experiment with structures whose shape changes from one batch of data to the next, such as recurrent networks processing variable-length sequences.
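A toy sketch can make define-by-run concrete. This is not PyTorch's actual implementation (the Tensor class and trace list here are invented for illustration); it only shows how a graph trace can be recorded while ordinary Python control flow runs:

```python
# Toy illustration of define-by-run (not PyTorch itself): each arithmetic
# operation appends a node to a trace *while* ordinary Python executes.
# The names Tensor and trace are hypothetical.

class Tensor:
    def __init__(self, value, op=None):
        self.value = value
        self.op = op              # operation that produced this node, if any

    def __mul__(self, other):
        trace.append("mul")       # graph node recorded at execution time
        return Tensor(self.value * other.value, "mul")

    def __add__(self, other):
        trace.append("add")
        return Tensor(self.value + other.value, "add")

trace = []                        # the "graph", built on the fly
x = Tensor(3.0)

# Ordinary Python control flow decides the graph structure at runtime:
y = x * x if x.value > 0 else x + x
for _ in range(2):                # a loop simply adds two more nodes
    y = y + x

print(trace)                      # ['mul', 'add', 'add']
```

The trace mirrors exactly the branch and loop iterations that actually ran, which is why a different input can yield a differently shaped graph on the next execution.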
Autograd
The autograd (automatic differentiation) system is the engine that makes dynamic graphs possible. Every operation on a tensor with requires_grad=True is tracked: PyTorch records the sequence of operations and, when .backward() is called, automatically computes gradients by traversing the graph in reverse. The researcher defines only the forward pass; gradient computation for optimisation is automatic.
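The mechanics can be illustrated with a minimal scalar reverse-mode sketch. This is a hypothetical toy, not PyTorch's real autograd: each node remembers its parents and the local derivative of the operation that created it, and backward() walks the recorded operations in reverse, applying the chain rule:

```python
# Minimal scalar reverse-mode autodiff sketch (hypothetical, not PyTorch).
# Each Var stores (parent, local_gradient) pairs; backward() propagates
# the upstream gradient to every parent, accumulating d(output)/d(input).

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents    # (parent, local_gradient) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # chain rule: accumulate the upstream gradient, then pass it on
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

# The user writes only the forward pass:
x = Var(2.0)
y = Var(3.0)
z = x * y + x          # z = xy + x
z.backward()           # gradients computed automatically, in reverse

print(x.grad, y.grad)  # 4.0 2.0  (dz/dx = y + 1, dz/dy = x)
```

Real autograd engines additionally sort the graph topologically and handle tensors rather than scalars, but the principle is the same: record during the forward pass, traverse in reverse for gradients.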
Pythonic API
PyTorch does not hide the model behind declarative APIs: tensors are Python objects, neural networks are Python classes, and training is an explicit for loop. Anyone who knows NumPy will recognise the interface, with the difference that PyTorch tensors support GPU execution and automatic differentiation. The project is released under the BSD licence and rapidly gains traction in academia as an alternative to TensorFlow for research.
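The explicit training-loop style described above can be sketched in plain Python. No PyTorch is used here; the data, learning rate, and the one-weight model y = w * x are invented for illustration, with the gradient derived by hand rather than by autograd:

```python
# Sketch of an explicit training loop (plain Python, illustrative values):
# forward pass, loss, hand-derived gradient, parameter update -- every
# step is an ordinary statement you can print or step through.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # targets follow y = 2x
w = 0.0                                        # single learnable weight
lr = 0.05                                      # learning rate (made up)

for epoch in range(200):                       # training is just a for loop
    for x, y in data:
        pred = w * x                           # forward pass
        loss = (pred - y) ** 2                 # squared error
        grad = 2 * (pred - y) * x              # d(loss)/dw, by hand here
        w -= lr * grad                         # gradient-descent step

print(round(w, 3))                             # 2.0 after convergence
```

In actual PyTorch code the shape is the same, except that the gradient line is replaced by a call to backward() and the update is delegated to an optimiser; the loop itself remains ordinary, debuggable Python.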
Link: pytorch.org
