Introduction
These notes start at chapter 3 of the deep learning book, where we begin to look at TensorFlow and Keras. So, what have we learned so far? We’ve seen the fundamental pieces involved in training a neural network:
- First, low-level tensor manipulation: the infrastructure that underlies all modern machine learning. This translates to TensorFlow APIs:
  - Tensors, including special tensors that store the network’s state (variables)
  - Tensor operations, such as addition, relu, matmul
  - Backpropagation, a way to compute the gradient of mathematical expressions (handled in TensorFlow via the GradientTape object)
- Second, high-level deep learning concepts. This translates to Keras APIs (see the sketch after this list):
  - Layers, which are combined into a model
  - A loss function, which defines the feedback signal used for learning
  - An optimizer, which determines how learning proceeds
  - Metrics to evaluate model performance, such as accuracy
  - A training loop that performs mini-batch stochastic gradient descent
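To make the Keras-level concepts concrete, here is a minimal sketch of how layers, a loss, an optimizer, metrics, and a training loop fit together. The layer sizes, optimizer choice, and the made-up x_train/y_train arrays are placeholders for illustration, not part of the original notes.
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
# layers combined into a model
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
# loss function, optimizer, and metrics wired together
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
# a training loop performing mini-batch stochastic gradient descent
x_train = np.random.random((100, 4))            # placeholder inputs
y_train = np.random.randint(0, 2, size=(100,))  # placeholder binary labels
model.fit(x_train, y_train, epochs=3, batch_size=32)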
Constant tensors and variables
import tensorflow as tf
# all-ones tensor (a 2x1 matrix in this case)
x = tf.ones(shape=(2, 1))
print(x)
# random tensor, drawn from a normal distribution with mean 0. and stddev 1.
x = tf.random.normal(shape=(3, 1), mean=0., stddev=1.)
TensorFlow tensors can be thought of as multidimensional arrays. So, can we go in and manually adjust the values inside a TensorFlow tensor? No. Unlike NumPy arrays, TensorFlow tensors are immutable: you cannot assign to them in place.
import numpy as np
x = np.ones(shape=(2, 2))
x[0, 0] = 0.  # this is fine; the "." after 0 indicates we are working with floating point
# trying the same thing with a TensorFlow tensor raises an error
x = tf.ones(shape=(2, 2))
x[0, 0] = 0.  # error: tensors do not support item assignment
To train a model, we’ll need to update its state, which is a set of tensors. Since plain tensors are immutable, we need something mutable: variables (tf.Variable). To create a variable, you provide an initial value, such as a random tensor.
v = tf.Variable(initial_value=tf.random.normal(shape=(3, 1)))
# v now holds something like:
# array([[-0.75133973],
#        [-0.4872893 ],
#        [ 1.6626885 ]], dtype=float32)
v.assign(tf.ones((3, 1)))  # overwrite the whole variable
# array([[1.],
#        [1.],
#        [1.]], dtype=float32)
v[0, 0].assign(3.)  # overwrite a single entry
# array([[3.],
#        [1.],
#        [1.]], dtype=float32)
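Variables also support in-place increments and decrements, which is exactly what a gradient-descent weight update needs. Below is a minimal sketch of such an update on the variable v from above; the learning_rate value and the grad tensor are made up for illustration.
learning_rate = 0.1
grad = tf.ones((3, 1))              # stand-in for a gradient computed elsewhere
v.assign_sub(learning_rate * grad)  # v <- v - learning_rate * grad
v.assign_add(tf.ones((3, 1)))       # v <- v + 1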
TensorFlow can also perform math operations on tensors, much like NumPy:
a = tf.ones((2, 2))
b = tf.square(a)     # square of a tensor, element-wise
c = tf.sqrt(a)       # square root of a tensor, element-wise
d = b + c            # element-wise addition
e = tf.matmul(a, b)  # tensor (matrix) product
e *= d               # element-wise multiplication
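These operations run eagerly, so results can be inspected immediately; calling .numpy() on a tensor gives back a plain NumPy array. This check is just an illustration, not part of the original notes.
print(e)          # a tf.Tensor containing the computed values
print(e.numpy())  # the same values as a NumPy array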
So what makes TensorFlow different from NumPy? Here’s something NumPy can’t do: retrieve the gradient of any differentiable expression with respect to any of its inputs.
input_var = tf.Variable(initial_value=3.)
with tf.GradientTape() as tape:
    result = tf.square(input_var)
gradient = tape.gradient(result, input_var)
## here is an easier example to understand
import tensorflow as tf
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = tf.square(x)
dy_dx = tape.gradient(y, x)  # derivative of y = x^2 at x = 3 is 2*x = 6
These inputs can actually be arbitrary tensors, not just variables. However, only trainable variables are tracked by default. With a constant tensor, you have to manually mark it as tracked by calling tape.watch() on it.
input_const = tf.constant(3.)
with tf.GradientTape() as tape:
    tape.watch(input_const)
    result = tf.square(input_const)
gradient = tape.gradient(result, input_const)
The gradient tape is a powerful utility, even capable of computing second-order gradients:
time = tf.Variable(0.)
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        position = 4.9 * time ** 2
    speed = inner_tape.gradient(position, time)
acceleration = outer_tape.gradient(speed, time)
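Here position = 4.9 * t², so the inner tape gives speed = 9.8 * t and the outer tape gives acceleration = 9.8, a constant. At the initial value t = 0, this evaluates to a speed of 0.0 and an acceleration of 9.8.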