Introduction
This starts in chapter 3 of the deep learning book, where we begin to look at TensorFlow and Keras. So, what do we know so far? We’ve seen the fundamental ways in which we train a neural network:
- First, low-level tensor manipulation - the infrastructure that underlies all modern-day machine learning. This translates to TensorFlow APIs:
    - Tensors, including special tensors that store the network’s state (variables)
    - Tensor operations, such as addition, relu, matmul
    - Backpropagation, a way to compute the gradient of mathematical expressions (handled in TensorFlow via the GradientTape object)
- Second, high-level deep learning concepts. This translates to Keras APIs:
    - Layers, which are combined into a model
    - A loss function, which defines the feedback signal used for learning
    - An optimizer, which determines how learning proceeds
    - Metrics to evaluate model performance, such as accuracy
    - A training loop that performs mini-batch stochastic gradient descent
Constant tensors and variables
TensorFlow tensors can be thought of as multidimensional arrays. So, can we go in and manually adjust the values inside a TensorFlow tensor? No. Unlike NumPy arrays, TensorFlow tensors are constant: they are not assignable after creation.
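As a small sketch of this (the tensor values here are just for illustration), creating a constant tensor and then trying to assign to one of its entries fails:

```python
import tensorflow as tf

x = tf.ones(shape=(2, 2))                 # all-ones constant tensor
c = tf.constant([[1., 2.], [3., 4.]])     # constant tensor from explicit values

# Constant tensors are not assignable; unlike a NumPy array,
# the following would raise a TypeError:
# x[0, 0] = 0.
```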
To train a model, we’ll need to update its state, which is a set of tensors. Since these tensors need to be mutable, we use variables. To create a variable, you need to provide an initial value, such as a random tensor.
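A minimal sketch of creating a variable and updating its state (the assign methods shown are part of the standard tf.Variable API):

```python
import tensorflow as tf

# Create a variable from a random initial value
v = tf.Variable(initial_value=tf.random.normal(shape=(3, 1)))

# Unlike constant tensors, a variable's state can be modified in place
v.assign(tf.ones((3, 1)))        # overwrite the whole variable
v[0, 0].assign(3.)               # overwrite a single entry
v.assign_add(tf.ones((3, 1)))    # equivalent of +=
```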
Also, just like NumPy, TensorFlow is capable of performing math operations on tensors:
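For example, a few common operations (a small sketch; the tensors here are arbitrary):

```python
import tensorflow as tf

a = tf.ones((2, 2))
b = tf.square(a)       # element-wise square
c = tf.sqrt(a)         # element-wise square root
d = b + c              # element-wise addition
e = tf.matmul(a, b)    # matrix product
f = e * d              # element-wise multiplication
```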
So what makes TensorFlow different from NumPy? Here’s something NumPy can’t do: retrieve the gradient of any differentiable expression with respect to any of its inputs.
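A minimal sketch of this, using tf.GradientTape with a trainable variable as the input:

```python
import tensorflow as tf

input_var = tf.Variable(initial_value=3.)
with tf.GradientTape() as tape:
    result = tf.square(input_var)
gradient = tape.gradient(result, input_var)   # d(x^2)/dx at x = 3, i.e. 6.0
```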
It is actually possible for these inputs to be any arbitrary tensor. However, only trainable variables are tracked by default; with a constant tensor, you have to manually mark it as tracked by calling tape.watch() on it.
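The same computation over a constant tensor then looks roughly like this:

```python
import tensorflow as tf

input_const = tf.constant(3.)
with tf.GradientTape() as tape:
    tape.watch(input_const)            # constants aren't tracked by default
    result = tf.square(input_const)
gradient = tape.gradient(result, input_const)  # 6.0
```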
The gradient tape is a powerful utility, even capable of computing second-order gradients:
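For instance, a sketch using nested tapes and the usual free-fall example (position = 4.9 * t**2, so the second derivative with respect to time is the constant acceleration 9.8):

```python
import tensorflow as tf

time = tf.Variable(0.)
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        position = 4.9 * time ** 2                   # position as a function of time
    speed = inner_tape.gradient(position, time)      # first-order gradient (speed)
acceleration = outer_tape.gradient(speed, time)      # second-order gradient: 9.8
```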