3.3. Nodes and Tensors
3.3.1. Nodes
primitiv has two different classes to calculate neural networks: Node and
Tensor. Although the basic usage of these classes is identical, their inner
behavior is essentially different.
A Node object behaves as a reference to an intermediate result of the
network. Each Node object corresponds to a Device object that represents
the physical location of the calculated data, and to a Graph object that
the intermediate result belongs to. Node objects contain only a small
amount of information identifying the corresponding intermediate result in
the Graph object, and have interfaces to communicate with the Graph object
to obtain the actual data. Copying Node objects is typically a light
operation.
3.3.1.1. Lazy Evaluation
Arithmetic operators between Node objects and functions defined in the
primitiv::functions namespace register a new operation to the Graph
object, and return a new Node object representing the result of that
operation. The actual calculation of each operation is postponed until its
values are actually required. Once an operation is performed, the
resulting values are cached in the Graph object to prevent duplicated
calculation.
The following example shows how Node objects work:
using namespace primitiv;
namespace F = primitiv::functions;
// Creating a `Node` object with no information: it does not point to any
// existing data.
const Node n0;
// Creating a `Device` and a `Graph` and setting them as the defaults.
devices::Naive dev;
Device::set_default(dev);
Graph g;
Graph::set_default(g);
// Creating two `Node` objects as the data sources of the computation graph.
const Node n1 = F::input<Node>({3}, {1, 2, 3});
const Node n2 = F::input<Node>({3}, {1, 1, 1});
// Creating a new `Node` object representing the result of some operations.
const Node n3 = n1 + n2;
const Node n33 = F::tanh(n1);
// Copying a `Node` object.
// This operation does not copy the physical results.
const Node n4 = n3;
// Obtaining the actual results corresponding to a `Node` object.
// The `n1 + n2` operation will be actually performed here.
// `n33` is not calculated here because it is not needed to compute `n4`.
const std::vector<float> values4 = n4.to_vector(); // {2, 3, 4}
// Defining an additional operation.
const Node n5 = n4 + F::input<Node>({3}, {3, 2, 1});
// Obtaining the result.
// The value represents `(n1 + n2) + {3, 2, 1}`, but the actual calculation
// will skip the `n1 + n2` operation and use the cached values of `n4`.
const std::vector<float> values5 = n5.to_vector(); // {5, 5, 5}
3.3.1.2. Executing Backpropagation
Node objects can also perform backpropagation. Unlike the results of the
forward operations described above, the results of backpropagation
(gradients corresponding to Node objects) are discarded once they are no
longer used. To execute backpropagation from a specific Node object
(typically the Node representing the sum of loss values), users should
call the Node::backward() function:
using namespace primitiv;
namespace F = primitiv::functions;
devices::Naive dev;
Device::set_default(dev);
Graph g;
Graph::set_default(g);
// Creating the graph with a `Parameter`.
Parameter p({3}, {0, 0, 0});
const Node w = F::parameter<Node>(p);
const Node x = F::input<Node>({3}, {1, 2, 3});
const Node y = w * x; // Elementwise multiplication
// Initializing the gradient of the parameter.
p.reset_gradient();
const std::vector<float> grad1 = p.gradient().to_vector(); // {0, 0, 0}
// Executing the backpropagation.
y.backward();
// Gradients of intermediate `Node` objects are disposed before arriving
// here, but the gradient accumulated in the parameter `p` remains.
const std::vector<float> grad2 = p.gradient().to_vector(); // {1, 2, 3}
3.3.2. Tensor
The Tensor class is another interface to calculate networks, with a usage
similar to Node's. Unlike Node objects, Tensor objects hold the actual
resulting values of the corresponding operations, and the calculation is
performed at the same time as each new Tensor object is created.
Additionally, Tensor objects cannot perform backpropagation because they
do not record the history of the calculation. In exchange for these
disadvantages, Tensor objects do not consume more memory than the Tensor
objects actually existing at the time, and do not incur any overhead for
constructing computation graphs. Users can use Tensor instead of Node when
they do not need gradient information (e.g., when testing trained models).
The following example shows how Tensor objects work:
using namespace primitiv;
namespace F = primitiv::functions;
// Creating a `Tensor` object with no information: it does not point to any
// existing data.
const Tensor t0;
// Creating a `Device` and setting it as the default.
// `Tensor` objects do not require the `Graph` object.
devices::Naive dev;
Device::set_default(dev);
// Creating two `Tensor` objects with their own data.
const Tensor t1 = F::input<Tensor>({3}, {1, 2, 3});
const Tensor t2 = F::input<Tensor>({3}, {1, 1, 1});
// Creating a new `Tensor` object representing the result of some operations.
// The operations will be performed as soon as these statements are evaluated.
// And `t3` and `t33` hold their own values internally.
const Tensor t3 = t1 + t2;
const Tensor t33 = F::tanh(t1);
// Copying a `Tensor` object.
// This operation basically does not yield a large overhead.
// `t3` and `t4` share the inner memory while they refer to the same values.
const Tensor t4 = t3;
// Obtaining the inner values from a `Tensor` object.
const std::vector<float> values4 = t4.to_vector(); // {2, 3, 4}