When designing machine learning models, it is essential to receive feedback on their performance. To date, deep learning models largely remain black boxes, and their internals are hard to peer into. However, it is still possible to obtain some insight, which is crucial for developing your model.
Why is it necessary to receive feedback on your model's performance during training? The answer is that you need to know in which direction to tune your hyperparameters and whether your tuning actually boosts performance.
A naive way to monitor a model's training performance is via command-line output. However, it is easy to miss the finer details this way. For example, it is easy to print the loss after each training epoch, but it is much trickier to visualise how the weights change during training.
As with any task, using a dedicated tool lets you gain more insight with less effort. Meet Tensorboard, the visualisation framework that comes with Tensorflow.
Installation
You can install Tensorboard via Anaconda or Pip by running the following command:
pip install tensorboard
Alternatively, use the following command for Anaconda:
conda install tensorboard
Usage: Overview
Tensorboard runs as server software. The server is started locally and continually monitors a user-specified directory that contains the machine learning model's logs. The logs need to be written in a specific format for Tensorboard to understand, but the major ML libraries, such as Tensorflow and Keras, support this output out of the box.
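In practice, the workflow boils down to three steps: train your model while it writes its summaries under a log directory, point the Tensorboard server at that directory, and open the web interface in a browser. For example (the run directory name here is illustrative):

# the training script writes its summaries to, say, ./log/my-run/
tensorboard --logdir ./log
# then open http://localhost:6006 in a browser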
Let us have a look at how Tensorboard works on an example.
Example
Task Specification
Let us assume we need to model the function f(x) = x * x with machine learning. Specifically, we have the following training data to fit:
import numpy as np

# Training data
samples_num = 30
train_X = np.arange(0, samples_num).reshape((samples_num, 1))
train_Y = (train_X ** 2).reshape((samples_num, 1))
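As a quick sanity check (this snippet is an addition, not part of the original example), you can print the shapes and the first few samples:

print(train_X.shape, train_Y.shape)  # (30, 1) (30, 1)
print(train_X[:3].ravel())           # [0 1 2]
print(train_Y[:3].ravel())           # [0 1 4]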
We will attempt to model the function with a neural network that has one hidden layer. We will implement the model in both Tensorflow and Keras to see how they interoperate with Tensorboard.
The Model
First, we will define the model in Tensorflow:
import tensorflow as tf

# Parameters
learning_rate = 0.000001
training_epochs = 500
display_step = 10
hidden_size = 128

g = tf.Graph()
with g.as_default():
    # Inputs
    X = tf.placeholder(np.float32, (samples_num, 1))  # *, 1
    Y = tf.placeholder(np.float32, (samples_num, 1))  # *, 1

    # Model
    W_1 = tf.get_variable("W_1", (1, hidden_size), np.float32, initializer=tf.random_uniform_initializer)
    b_1 = tf.get_variable("b_1", (1, hidden_size), np.float32, initializer=tf.random_uniform_initializer)
    W_2 = tf.get_variable("W_2", (hidden_size, 1), np.float32, initializer=tf.random_uniform_initializer)
    b_2 = tf.get_variable("b_2", (1,), np.float32, initializer=tf.random_uniform_initializer)

    hidden = tf.nn.relu(tf.matmul(X, W_1) + b_1)
    pred = tf.matmul(hidden, W_2) + b_2
The above model has the hidden layer defined as relu(X * W_1 + b_1). That is, we are using a standard fully-connected layer with relu (rectified linear unit) as the activation function. The output of the hidden layer is then used to produce the predictions: hidden * W_2 + b_2.
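For illustration, here is the same forward pass written in plain NumPy. This is only a sketch: the weights below are random placeholders rather than the trained Tensorflow variables.

# Illustrative NumPy equivalent of the forward pass (not part of the model)
W_1_np = np.random.uniform(size=(1, hidden_size))
b_1_np = np.random.uniform(size=(1, hidden_size))
W_2_np = np.random.uniform(size=(hidden_size, 1))
b_2_np = np.random.uniform(size=(1,))

hidden_np = np.maximum(train_X @ W_1_np + b_1_np, 0)  # relu(X * W_1 + b_1)
pred_np = hidden_np @ W_2_np + b_2_np                 # hidden * W_2 + b_2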
Afterwards, we define the variable initialiser and the objective function:
init = tf.global_variables_initializer()

# Objective
cost = tf.losses.mean_squared_error(Y, pred)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
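Concretely, the mean squared error is just the average of the squared differences between the targets and the predictions. In NumPy terms (a sketch with placeholder arrays y_true and y_pred, not part of the graph):

mse = np.mean((y_true - y_pred) ** 2)  # the quantity that the cost node computes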
The Summaries
We would like to monitor the performance of our model as we train it with the help of Tensorboard. To do so, we will use the “summaries” framework from Tensorflow:
# Summaries
tf.summary.scalar('loss', cost)
tf.summary.histogram('W_1', W_1)
tf.summary.histogram('b_1', b_1)
tf.summary.histogram('W_2', W_2)
tf.summary.histogram('b_2', b_2)
Using the functions from Tensorflow's summary package, one can log the behaviour of selected nodes of the graph. In our case, we used a scalar summary to log the output of the cost node, which computes the mean squared error of our model, and histogram summaries to log the nodes that output matrices. We will see how the logs look in Tensorboard in a moment.
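These are not the only nodes that can be monitored. For instance, you could also track the distribution of the model's predictions by adding one more histogram summary alongside the ones above (this line is an illustrative addition, not part of the original example):

tf.summary.histogram('predictions', pred)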
Once we have defined the summaries, we need to merge them into a single node:
summaries = tf.summary.merge_all()
log_writer = tf.summary.FileWriter(log_dir, graph = g)
The merge_all() function creates a runnable graph node that converts the logs into the Tensorboard format. We can then use the log_writer to write the logs to a specified directory, log_dir. The log_dir is created as follows:
import os

def new_run_log_dir(base_dir):
    log_dir = os.path.join('./log', base_dir)
    if not os.path.exists(log_dir):
        os.makedirs(log_dir)
    run_id = len([name for name in os.listdir(log_dir)])
    run_log_dir = os.path.join(log_dir, str(run_id))
    return run_log_dir
log_dir = new_run_log_dir('tensorflow-demo')
The new_run_log_dir function creates a separate directory under the base directory ./log/tensorflow-demo for each run, with the directory names numbered sequentially from 0. In other words, the first run's logs will be available under ./log/tensorflow-demo/0/, the second run's logs under ./log/tensorflow-demo/1/, and so on. new_run_log_dir infers the number of the current run by counting the directories that previous runs have already created under ./log/tensorflow-demo/. By having the summaries organised in separate, identifiable directories, you can view the summary visualisations side by side in Tensorboard.
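After a couple of runs of the Tensorflow example and one run of the Keras example (introduced later), the log directory would look roughly like this (an illustrative layout):

./log/
    tensorflow-demo/
        0/    (summaries of the first run)
        1/    (summaries of the second run)
    keras-demo/
        0/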
Training and Writing the Summaries
With the model defined, we can train it and write the summaries simultaneously:
# Training
with tf.Session(graph = g) as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        sess.run(optimizer, feed_dict = {X: train_X, Y: train_Y})

        # Log summaries
        if epoch % display_step == 0:
            summary, c = sess.run([summaries, cost], feed_dict = {X: train_X, Y: train_Y})
            print('Epoch: {};\tCost: {}'.format(epoch, c))
            log_writer.add_summary(summary, epoch)
            log_writer.flush()

    predicted = sess.run(pred, feed_dict = {X: train_X})
    print_summary(train_Y, predicted)  # print_summary is a small helper defined in the full example on GitHub
We first initialise the weights via sess.run(init) (remember that init was the node representing the graph initialiser in the model above). Then we perform the optimisation steps in the for loop.
Notice the if statement in the loop: the summary writing happens every display_step epochs. Let us see what is going on here:
- The summaries are computed for the current epoch into the summary variable: summary, c = sess.run([summaries, cost], feed_dict = {X: train_X, Y: train_Y}).
- The summaries are added to the summary writer, to be written to the output directory: log_writer.add_summary(summary, epoch).
- log_writer is instructed to write the logs to the log directory: log_writer.flush().
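A small addition worth noting (not in the original code): once training is finished and the writer is no longer needed, it can also be closed, which flushes any remaining events to disk:

log_writer.close()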
Visualising the Summaries with Tensorboard
As already mentioned, Tensorboard is server software that monitors a specified directory for summaries and visualises them. Let us start Tensorboard and specify ./log as our target directory:
tensorboard --logdir log
You should get output like the following:
TensorBoard 1.5.1 at http://localhost:6006 (Press CTRL+C to quit)
In order to use Tensorboard, navigate to http://localhost:6006 and you should see the following:
The three most important areas are highlighted in the screenshot above and should be self-explanatory. Notice how Tensorboard can display metrics from several runs at once; this is useful for comparing the performance of different runs. In the screenshot, I also have a few metrics from the Keras example, which we will get to soon.
Histograms
In the screenshot above, we have an example of a scalar summary (the loss function on the training dataset). The plot should be self-explanatory, with the epochs on the horizontal axis and the value of the loss function on the vertical axis. Let us now see how Tensorboard renders the output of the histogram summaries: remember that we used them to log the weights of the model.
Above you can see the Distributions view of the matrices. For each epoch, the frequencies of the values in the matrix are specified by colour intensity. Also, notice how we can observe the distributions of the same variable across different runs, the first run being orange and the second one pink.
Here is how the same matrix is viewed as a histogram:
Graphs
Another feature of Tensorboard is visualising the model’s graph:
You can see the Tensorflow model that we have programmed displayed as a graph above.
Keras integration with Tensorboard
Let us now see how you can implement the same example in Keras while integrating with Tensorboard. Our model is defined as follows:
import os
os.environ['KERAS_BACKEND' ] = 'tensorflow'
os.environ['MKL_THREADING_LAYER'] = 'GNU'
import keras as ks
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import TensorBoard
# Parameters
learning_rate = 0.000001
training_epochs = 500
display_step = 10
hidden_size = 128
# Model
model = Sequential()
model.add(Dense(hidden_size, activation='relu', input_shape=[1]))
model.add(Dense(1))
model.compile(loss = ks.losses.mean_squared_error,
optimizer = ks.optimizers.Adadelta())
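At this point you can optionally inspect the architecture to double-check the layer shapes and parameter counts (an extra step, not part of the original example):

model.summary()  # prints the two Dense layers; 385 trainable parameters in total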
The model is trained as follows:
log_dir = new_run_log_dir('keras-demo')
model.fit(train_X, train_Y,
epochs = training_epochs,
verbose = 1,
validation_data = (train_X, train_Y),
callbacks = [TensorBoard(log_dir = log_dir,
histogram_freq = 50)])
Notice how the training algorithm is instructed to produce Tensorboard output via the callbacks argument:
callbacks = [TensorBoard(log_dir = log_dir,
                         histogram_freq = 50)]
In Keras, you can control the fitting process via callbacks, one of which is TensorBoard.
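Because callbacks is a list, Tensorboard logging composes naturally with other callbacks. As an illustrative extension (not part of the original example), you could add early stopping next to the Tensorboard callback:

from keras.callbacks import TensorBoard, EarlyStopping

model.fit(train_X, train_Y,
          epochs = training_epochs,
          verbose = 1,
          validation_data = (train_X, train_Y),
          callbacks = [TensorBoard(log_dir = log_dir, histogram_freq = 50),
                       EarlyStopping(monitor = 'val_loss', patience = 20)])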
Summary
Tensorboard is a powerful tool that allows you to visualise the internals of your model while you train it:
- Scalar values as plots
- Matrices as histograms and probability distributions
- Support for image and audio summaries
Tensorboard integrates with both Tensorflow and Keras. For debugging, you should opt for Tensorboard over plain console output, since it provides more information and is easier to use.
The examples presented in the article are available on GitHub: https://github.com/anatoliykmetyuk/tensorboard-demos.