Tensor Class¶
tensor
is a class template for doing arithmetic on small,
statically-sized vectors, matrices, and tensors. To create one, specify
the underlying type in the first template argument, followed by a list
of integers for the shape. For example, tensor<float,3,2,4>
is
conceptually similar to the type float[3][2][4]
.
Here are some examples and features:
tensor
has value semantics (in contrast to C++ multidimensional arrays):// c-style arrays are okay for storage, but can't do much else { double u[3] = {1.0, 2.0, 3.0}; double v[3] = u; // does not compile double * w = u; // does compile, but size information is lost and w is a shallow copy } // tensors store their own data and can be copied, assigned, referenced, and transformed { tensor < double, 3 > u = {1.0, 2.0, 3.0}; tensor < double, 3 > v = u; // make a copy of u tensor < double, 3 > & w = u; // make a reference to u }
tensor
supports operator overloading:tensor < double, 3 > u = {1.0, 2.0, 3.0}; tensor < double, 3 > v = {4.0, 5.0, 6.0}; tensor < double, 3 > sum = u + v; tensor < double, 3 > weighted_sum = 3.0 * u - v / 7.0; weighted_sum += sum;
tensor
supports many common mathematical functions:tensor < double, 3, 3 > I = Identity<3>(); tensor < double, 3, 3 > grad_u = {...}; tensor < double, 3, 3 > strain = 0.5 * (grad_u + transpose(grad_u)); // or, equivalently, sym(grad_u) tensor < double, 3, 3 > stress = lambda * tr(strain) * I + 2.0 * mu * strain; tensor < double, 3 > traction = dot(stress, normal); tensor < double, 2, 3 > J = {...}; double dA = sqrt(det(dot(J, transpose(J)))); tensor < double, 3 > force = traction * dA;
tensor
supports useful shortcuts (like class template argument deduction):// class template argument deduction: A is deduced to have type tensor<double,2,2> tensor A = {{{1.0, 2.0}, {2.0, 3.0}}}; // create tensors from index notation expressions, B has type tensor<double,3,3,3,3> tensor B = make_tensor<3,3,3,3>([](int i, int j, int k, int l){ return 0.5 * ((i == j) * (k == l) + (i == l) * (k == j)); }); // slicing: get and set specific subtensors tensor< double, 3, 3 > C = B[0][2]; B[2][0][0] = C[1]; // access by operator() or operator[] C[1][1] = 2.0; C(2, 2) = 3.0;
tensor
arithmetic supportsconstexpr
evaluation:
constexpr tensor change_of_basis_matrix = {{ { 3.0, -2.0, 1.7}, { 2.0, 8.0, 1.7}, {-1.0, 4.0, 6.7} }}; // express a quantity in a new basis tensor v = dot(change_of_basis_matrix, u); // modify the components in the new basis v = f(v); // precompute the inverse basis transformation at compile time constexpr tensor inverse_change_of_basis_matrix = inv(change_of_basis_matrix); // convert the modified values back to the original basis u = dot(inverse_change_of_basis_matrix, v);
tensor
only allows operations between operands of appropriate shapestensor< double, 3, 2 > A{}; tensor< double, 3 > u{}; tensor< double, 2 > v{}; auto uA = dot(u, A); // works, returns tensor< double, 2 > auto Av = dot(A, v); // works, returns tensor< double, 3 > auto Au = dot(A, u); // compile error: incompatible dimensions for dot product auto vA = dot(v, A); // compile error: incompatible dimensions for dot product auto w = u + v; // compile error: can't add tensors of different shapes A[0] = v; // works, assign a new value to the first row of A A[1] = u; // compile error: can't assign a vector with 3 components to a vector of 2 components
Dual Number Class¶
dual
is a class template that behaves like a floating point value,
but also stores information about derivatives. For example, say we have
a function, \(f(x) = \frac{x \sin(\exp(x) - 2)}{1 + x^2}\). In C++,
one might implement this function as:
auto f = [](auto x){ return (x * sin(exp(x) - 2.0)) / (1 + x*x); };
If \(f(x)\) is used in a larger optimization or root-finding problem, we will likely also need to be able to evaluate \(f\;'(x)\). Historically, the two most common ways to get this derivative information were
Finite Difference Stencil:
static constexpr double epsilon = 1.0e-9; auto dfdx = [](double x) { return (f(x + epsilon) - f(x - epsilon)) / (2.0 * epsilon); }
This approach is simple, but requires multiple function invocations and the accuracy suffers due to catastrophic cancellation in floating point arithmetic.
Derive the expression for \(f\;'(x)\), either by hand or with a computer algebra system, and manually implement the result. For example, using Mathematica we get
\[f\;'(x) = \frac{\exp(x) (x + x^3) \cos(2 - \exp(x)) - (x^2 - 1) \sin(2 - \exp(x))}{(1 + x^2)^2},\]which must then be manually implemented in C++ code:
auto dfdx = [](double x) { return (exp(x) * (x + x*x*x) * cos(2 - exp(x)) - (x*x - 1) * exp(2 - sin(x))) / ((1 + x*x) * (1 + x*x)); };
This approach can give very accurate results, and allows the derivative implementations to be individually optimized for performance. The downside is that the symbolic differentiation and manual implementation steps can be error prone: mistakes in transcription, differentiation, or implementation can be hard to notice.
To emphasize this point, the expression for \(f\;'(x)\) given above is actually incorrect, and the subsequent C++ implementation of that incorrect expression for \(f \; '(x)\) is itself incorrect. But if you only skimmed the content above, you likely didn't notice.
The dual
class template provides a 3rd option that improves on the
accuracy and performance of finite difference stencil, without
sacrificing accuracy. In addition, it doesn't require the developer to
manually differentiate and write new code that might contain errors. An
example:
double answer = f(x); // evaluate f at x
dual< double > answer_and_derivative = f(make_dual(x)); // evaluate f and f' at x
double just_the_answer = answer.value;
double just_the_gradient = answer.gradient;
Internally, the implementation is remarkably simple:
template <typename gradient_type>
struct dual {
double value;
gradient_type gradient;
};
That is, dual
just stores a double
value and a specified type
for the gradient term. Then, the basic rules of differentiation are
encoded in the corresponding operator overloads:
template <typename gradient_type_a, typename gradient_type_b>
constexpr auto operator+(dual<gradient_type_a> a, dual<gradient_type_b> b)
{
return dual{a.value + b.value, a.gradient + b.gradient};
}
template <typename gradient_type_a, typename gradient_type_b>
constexpr auto operator*(dual<gradient_type_a> a, dual<gradient_type_b> b)
{
return dual{a.value * b.value, a.gradient * b.value + a.value * b.gradient};
}
and so on. In this way, when a dual number is passed in to a function, each of the intermediate values keep track of gradient information as well. The downside to this approach is that doing that arithmetic to track the gradients of intermediate values is more expensive than manually writing code for the derivatives.
However, by supporting both manually-written derivatives and dual
numbers, users can choose to calculate derivatives in whatever manner is
appropriate for their problem: manually-written gradients for
performance-critical codepaths, and automatic differentiation for
iterating quickly on prototypes and research.
Some additional resources on the theory and implementation of automatic differentiation are given below:
Article demonstrating how AD applies to a computational graph
Using tensor
and dual
together¶
In the previous example, \(f\) was a function with a scalar input and scalar output. In practice, most of the functions we care about are more interesting. For example, an isotropic linear elastic material in solid mechanics has the following stress-strain relationship:
or, in C++:
double lambda = 2.0;
double mu = 1.0;
static constexpr auto I = Identity<3>();
auto stress = [=](auto strain){ return lambda * tr(strain) * I + 2 * mu * strain; };
That is, stress()
takes a tensor<double,3,3>
as input, and
outputs a tensor<double, 3, 3>
:
tensor< double, 3, 3 > epsilon = {...};
tensor< double, 3, 3 > sigma = stress(epsilon);
In general, each part of a function's output can depend on each part of its inputs. So, in this example the gradient could potentially have up to 81 components:
If we promote the input argument to a tensor of dual numbers, we can compute these derivatives automatically:
tensor< double, 3, 3 > epsilon = {...};
tensor< dual< tensor< double, 3, 3 > >, 3, 3 > sigma = stress(make_dual(epsilon));
Now, sigma
contains value and gradient information that can be
understood in the following way:
There are also convenience routines to extract all the values and gradient terms into their own tensors of the appropriate shape:
// as before
tensor< dual< tensor< double, 3, 3 > >, 3, 3 > sigma = stress(make_dual(epsilon));
// extract the values
tensor< double, 3, 3 > sigma_values = get_value(sigma);
// extract the gradient
tensor< double, 3, 3, 3, 3 > sigma_gradients = get_gradient(sigma);
Differentiating Functions with Multiple Inputs and Outputs¶
Now let's consider a function that has multiple inputs and multiple outputs:
double mu = 1.0;
double rho = 2.0;
static constexpr auto I = Identity<3>();
auto f = [=](auto p, auto v, auto L){
auto strain_rate = 0.5 * (L + transpose(L));
auto stress = - p * I + 2 * mu * strain_rate;
auto kinetic_energy_density = 0.5 * rho * dot(v, v);
return serac::tuple{stress, kinetic_energy_density};
};
Here, f
calculates the stress, \(\sigma\), and local kinetic energy density, \(q\), of a fluid in terms of
the pressure p
(scalar), velocity v
(3-vector), and velocity gradient L
(3x3 matrix).
So, there are 2 outputs and 3 inputs, resulting in potentially 6 derivatives with different order tensors:
All of these derivatives can be calculated in a single function invocation by following the same pattern as before:
double p = ...;
tensor<double,3> v = ...;
tensor<double,3,3> L = ...;
// promote the arguments to dual numbers with make_dual()
tuple dual_args = make_dual(serac::tuple{p, v, L});
// then call the function with the dual arguments
auto outputs = apply(f, dual_args);
// note: serac::apply is a way to pass an n-tuple to a function that expects n arguments
//
// i.e. the two following lines have the same effect
// f(p, v, L);
// serac::apply(f, serac::tuple{p, v, L});
Like before, outputs
will now contain the actual output values, but also all gradient terms (6, in this case).
To get the gradient tensors, we call the same get_gradient()
function:
auto gradients = get_gradient(outputs);
The 6 gradient terms for this example can be thought of in a "matrix" where the \(i,j\) entry is the derivative of the \(i^{th}\) output with respect to the \(j^{th}\) input:
The type returned by get_gradient()
reflects this structure: returning a serac::tuple
of serac::tuple
.
So for this example, the return type will be of the form:
serac::tuple<
serac::tuple< df1_dx1_type, df1_dx2_type, df1_dx2_type >,
serac::tuple< df2_dx1_type, df2_dx2_type, df2_dx2_type >
>;
The individual blocks can be accessed by using serac::get()
.
Finally, if we look at the actual types contained in get_gradient(output)
we see a few interesting details:
serac::tuple<
serac::tuple<tensor<double, 3, 3>, zero, tensor<double, 3, 3, 3, 3> >,
serac::tuple<zero, tensor<double, 3>, zero >
> gradients = get_gradient(outputs);
First, the tensor shapes of the individual blocks are are in agreement with what we expect (e.g. \(\frac{\partial \sigma}{\partial p}\) is 3x3, \(\frac{\partial \sigma}{\partial L}\) is 3x3x3x3, etc).
Second, some of the derivative blocks seem to be missing!
Instead of actual tensors, a mysterious type zero
appears in three of the blocks
of our derivative. What does that mean?
It means that if we look back at the original definition of our function, we see that the stress tensor does not depend on v
at all.
Similarly, the kinetic energy density only depends on v
, while having no dependence on p
or L
. The implementation of the
tensor
and dual
class templates automatically detects and optimizes away unnecessary storage and calculations associated with
these derivative blocks that are identically zero.