How to Control Deep Networks in Real-Time
In a recent stream of publications, many deep learning research teams, including Uber's, have explored the so-called Plug & Play Generative Models (PPM).
PPM is a technique to modify a generative deep network in real time, that is, while the network is generating a sequence.
Basics of Deep Generative Models
A generative network is a deep learning model that has been trained to predict the next element of a sequence, given the elements that precede it. Let's look at an example.
Say we have a sentence made of 10 words. We want to complete the sentence with 2 more words, and we are given a deep generative model that processes inputs of fixed length (10 words).
1. We give the first 10 words to the network, and it predicts the most likely 11th word.
2. We now have a sentence made of 11 words: the initial 10 plus the generated one. Since the network only accepts inputs of length 10, we take the last 10 words of the current sentence (i.e., from the 2nd to the 11th) and use them as input for a new generation step.
Steps 1 and 2 can be repeated in a loop until the sentence reaches the desired length. This process is called the generative loop.
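At its core, the generative loop is just a sliding window. Here's a minimal Python sketch; the `model` callable and all names are placeholders of mine, not code from any paper:

```python
import numpy as np

def generative_loop(model, seed, target_length, window=10):
    """Extend `seed` until it holds `target_length` elements.

    `model` must map a window of `window` elements to the next one;
    `seed` must hold at least `window` elements.
    """
    sequence = list(seed)
    while len(sequence) < target_length:
        # Step 1: feed the model the last `window` elements only,
        # since it accepts inputs of fixed length.
        next_element = model(np.array(sequence[-window:]))
        # Step 2: append the prediction and slide the window.
        sequence.append(next_element)
    return sequence
```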
Text Generation and PPM
Natural Language Processing is a very hot topic in deep learning these days. The pace of new publications about NLP is really impressive.
The Uber article I mentioned at the beginning is also about NLP. I really like the motivation the authors lay out at the start of the paper:
- Good deep models have become incredibly large; some have over 2 billion parameters. That makes training freaking expensive, even for a custom application that is only slightly different from what the original model was meant for.
- Fine-tuning (training only the final layer(s)) used to be the solution to this problem, saving time and money, but that's hardly true anymore. Large models are so huge that fine-tuning has become expensive too. Sometimes the model is so big that simply loading it into memory requires an expensive GPU!
Plug & Play may be a solution to these problems.
The key idea is that instead of re-training, fine-tuning, or modifying the base model in any way before using it, we use it as-is.
Additionally, and here's the key: we design another model, much simpler (and smaller), that at each generation step processes information about the whole system and determines a "steering action".
What is a steering action, then? It's a change we make inside the large model while it runs. For example, we could change the weights of some layer, or the bias vector, or the activation function (see the sketch after the list below).
Whatever the action, the key points are:
- The steering model must be efficient. The most efficient option is a closed-form algebraic formula, though it can also be a (very) small deep learning model.
- The steering action must be applied inside the large model. To do this, you'll need hands-on practice with a deep learning framework such as TensorFlow or PyTorch.
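To make "applying an action inside the model" concrete, here is a minimal PyTorch sketch that perturbs the activations of one layer at every generation step via a forward hook. The layer and the steering function are hypothetical, purely for illustration:

```python
import torch.nn as nn

def attach_steering(layer: nn.Module, steering_fn):
    """Register a hook so `steering_fn` can modify the
    layer's output at every generation step."""
    def hook(module, inputs, output):
        # Whatever the hook returns replaces the layer's output.
        return steering_fn(output)
    return layer.register_forward_hook(hook)

# Purely illustrative steering action: damp the activations a bit.
# handle = attach_steering(big_model.some_layer, lambda out: 0.9 * out)
# ... run the generative loop as usual ...
# handle.remove()  # detach the steering action when done
```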
A Simple Experiment
I really recommend reading (and studying) Uber's paper; see the references at the end. They use GPT-2, one of the best NLP generative models out there, as the base model. And they propose a few approaches for the steering model, including a Bag-of-Words approach that works quite well.
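The paper has the full details of the Bag-of-Words approach; roughly, the steering signal rewards the model for putting probability mass on a set of topic words. A simplified sketch of such a loss (the names are mine, and this glosses over the paper's exact formulation):

```python
import torch

def bow_loss(logits: torch.Tensor, bag_ids: torch.Tensor) -> torch.Tensor:
    """Low when the next-token distribution favors the topic words.

    logits:  unnormalized next-token scores over the vocabulary.
    bag_ids: token ids of the words describing the desired topic.
    """
    probs = torch.softmax(logits, dim=-1)
    return -torch.log(probs[bag_ids].sum())
```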
Overall, they achieve great results. However, I ran into one problem when reading their paper: the implementation of the PPM idea is quite complicated, because text generation is a very complex application and GPT-2 a very complex model.
As a result, truly understanding the details well enough to apply them to my own projects wasn't easy at all.
That's why I set out to reproduce Plug & Play with a much simpler experiment. It allowed me to clarify the nuts and bolts of the approach, and I learned how to apply it to (almost) any network. Let me walk you through this experiment of mine, and then I'll share the code with you.
The objective is to generate a perfect sinusoidal wave with a custom model. Here is such a wave, generated numerically with numpy.
In the figure above, the frequency is 2 cycles per second, the maximum amplitude is 1, and the sampling rate is 10 samples per second.
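If you want to reproduce it, a numpy snippet along these lines does the job (variable names are mine; the parameters match the ones above, and the duration is an arbitrary choice):

```python
import numpy as np

frequency = 2.0        # cycles per second
amplitude = 1.0        # maximum amplitude
sampling_rate = 10     # samples per second
duration = 20          # seconds (my choice, just for illustration)

t = np.arange(0, duration, 1 / sampling_rate)
wave = amplitude * np.sin(2 * np.pi * frequency * t)
```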
As I said, I used the numpy.sin(...) function to generate it. Would it be possible to generate the same wave with a deep network? For sure it is, but finding the right network is not easy.
I started with a very simple, 3-layer, densely connected network, and trained it on sequences of 10 samples to predict the 11th sample.
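The exact architecture is in the code linked at the end; a rough PyTorch sketch, with illustrative layer widths, reusing the `wave` array from the snippet above, looks like this:

```python
import numpy as np
import torch
import torch.nn as nn

# Sliding windows of 10 samples, each labeled with the sample that follows.
window = 10
windows = np.stack([wave[i:i + window] for i in range(len(wave) - window)])
X = torch.tensor(windows, dtype=torch.float32)
y = torch.tensor(wave[window:], dtype=torch.float32).unsqueeze(1)

# Three densely connected layers; the widths here are my guess.
model = nn.Sequential(
    nn.Linear(window, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()
for _ in range(500):
    optimizer.zero_grad()
    loss_fn(model(X), y).backward()
    optimizer.step()
```

And here's the result of the generative loop.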
The approximation is not too bad, but it is somewhat off. It looks like the generated wave loses one cycle every 5 seconds.
Now, I could have gone back to the architecture design: added a few more layers, or even used an LSTM (which, I admit, would make sense).
For the sake of the experiment, I decided to use PPM instead.
I designed a very simple steering model that is basically an implementation of real-time gradient descent. Gradient descent is normally used during training only; here, instead, I use it again while generation happens, in order to steer the "large" model.
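Here is a minimal sketch of what I mean, under my own simplifying assumptions: at each generation step, the steering model compares the prediction with the reference sine value and takes a single gradient step on the last layer before emitting the sample (function and variable names are mine):

```python
import torch

def steered_step(model, window_input, reference, lr=0.01):
    """One generation step with a gradient-descent steering action.

    model:        the "large" generative model (nn.Sequential here).
    window_input: tensor holding the last 10 samples of the sequence.
    reference:    the value the perfect sine takes at this time step.
    """
    prediction = model(window_input)
    loss = ((prediction - reference) ** 2).sum()
    # Steering signal: gradient of the error w.r.t. the last layer only.
    params = tuple(model[-1].parameters())
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for param, grad in zip(params, grads):
            param -= lr * grad              # the steering action
        # Re-run the steered model to emit the sample.
        return model(window_input).item()
```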
More details later and in the code. For now, let's look at the result.
Note that the above figure shows only a subset of the time horizon.
It's impressive how closely the generated wave with control follows the real wave. If you zoom in enough, you will also notice that the controlled wave is much more "nervous" than the other two. That's due to the steering action taking place at each time step.
References and Code
The first well-known work on PPM, in the context of image generation, was published by researchers at the University of Wyoming and the University of Freiburg, together with Jason Yosinski of Uber AI Labs and none other than Yoshua Bengio. It's very technical and is available as a paper on arXiv.
If you want to get started with PPM, I recommend Uber's publication, which deals with NLP. The same authors have published a technical paper on arXiv as well as a blog post; I recommend starting with the latter.
As for my illustrative example, you can find it in my GitHub repository, or even in a shared Colab notebook.
Also worth knowing: I have published an expanded version of this article, with many more technical details. You can read it for free here.