Why my first hello world program was based on machine learning
I started learning programming when I was 21. At the time I was in university, studying something completely different. But I had an idea. An idea so big and so obsessive that it completely engulfed my existence and wouldn't let me think about anything else. I wanted to implement machine-learning-based aim assistance for a game.
So I got to work and learned how loops and if statements worked. I'm unfortunately not kidding. Dunning-Kruger was in full effect, I suppose, not knowing how little I knew. Thinking a machine learning project was a good idea for someone who didn't even know what control flow was.
That was okay though; my drive, dedication and motivation were what kept me going. Day after day I worked on something without any real idea of how it worked. Learning the basics of programming was easy enough, I thought, only taking me a few months, so I moved on to the machine learning part.
I started out with an evolutionary algorithm that evolves the topology of a neural network. It didn't work. I learned about the difference between discrete and continuous domains; I needed a continuous output from my neural network. After some adjusting, and approximately a month later, I gave up on trying to evolve a neural network.
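(For anyone wondering what that distinction means in practice, here's a minimal sketch, using PyTorch and made-up layer sizes rather than my original network: a discrete head picks one of a few fixed actions, while a continuous head outputs how far to move.)

```python
import torch
import torch.nn as nn

# Hypothetical example: one shared backbone, two different kinds of output head.
backbone = nn.Sequential(nn.Linear(64, 32), nn.ReLU())

# Discrete: pick one of a few fixed actions (e.g. "nudge left", "nudge right").
discrete_head = nn.Linear(32, 4)                              # logits over 4 actions
# Continuous: output a smooth value, e.g. how far to move the mouse on each axis.
continuous_head = nn.Sequential(nn.Linear(32, 2), nn.Tanh())  # (dx, dy) in [-1, 1]

features = backbone(torch.randn(1, 64))
action_index = torch.argmax(discrete_head(features), dim=-1)  # e.g. tensor([2])
mouse_delta = continuous_head(features) * 100                 # e.g. up to ±100 pixels
```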
So my next hunch was reinforcement learning. Since I had no way of accessing data from the game, I had to rely on visual data, which I processed with computer vision. The model performed random actions, and whenever it hit a target, I gave it some reward. Technically not a dumb idea; unfortunately, the chance of hitting a target was way too low, and reinforcement learning usually needs a lot of data to improve.
I had to rely on something other than chance. The answer was curiosity. “Curiosity-driven Exploration by Self-supervised Prediction” (Pathak et al., 2017) had exactly what I was looking for. Instead of randomly hoping to get a reward, the Intrinsic Curiosity Module (ICM), according to Pathak, “[…] helps to discover the environment out of curiosity when extrinsic rewards are sparse or not present at all.” The best part was that the authors had published an implementation of the paper. The only thing I had to do was grab the parts I needed from their code base and put them into my code. A little easier said than done, since I had never done this before. I had to refactor a lot and tried to encapsulate the ICM. After a little fiddling it worked out fine and the code ran wonderfully.
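The gist of the ICM, at least as I understood it: a forward model tries to predict the features of the next frame from the current frame and the action, and the prediction error becomes an intrinsic “curiosity” reward. Here's a rough sketch of that idea, not the authors' actual code (it omits their inverse model, and all the shapes are made up):

```python
import torch
import torch.nn as nn

feature_dim, action_dim = 128, 4   # made-up sizes

encoder = nn.Sequential(nn.Linear(512, feature_dim), nn.ReLU())   # frame -> features
forward_model = nn.Linear(feature_dim + action_dim, feature_dim)  # (features, action) -> predicted next features

def intrinsic_reward(state, action_onehot, next_state, eta=0.5):
    """Curiosity bonus: how badly the forward model predicted the next frame's features."""
    phi, phi_next = encoder(state), encoder(next_state)
    phi_next_pred = forward_model(torch.cat([phi, action_onehot], dim=-1))
    # The more surprising the transition, the bigger the reward.
    return eta * (phi_next_pred - phi_next).pow(2).mean(dim=-1)
```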
The only problem was, it didn't help with getting data to backpropagate through the neural network. I'm certain I did something wrong back then, but I was fed up after many months of not even getting close to progress. I took a few weeks off to clear my mind and hoped for some inspiration from the outside world, which I hadn't seen in a while (both the inspiration and the outside world).
During that time I reflected on what worked and what didn't, in a way backpropagating through my own neural network. I realized there were a few things that had worked without many hiccups, like gathering data by detecting certain objects on screen. Detecting objects? I wondered: why hadn't I tried using object detection? Seemed pretty reasonable, I thought. I then got back to work.
I started by gathering data again. What I needed were pictures of the targets. Lots of 'em. Each data point consisted of a picture and a corresponding file that stored the bounding box for that image. But where would I get the bounding boxes from? I could have hand-labelled them, and there are tools for that. But labelling a few hundred thousand images by hand doesn't sound like an activity a true programmer would do. No, we spend hours automating a task that can be done by hand in 10 minutes. Only this task would take a little more than 10 minutes, so it seemed worth it.
By using an existing cheat, I could draw literal bounding boxes on screen around the targets. Then, with some visual computing magic, I extracted the bounding boxes' positions relative to the image. Now that I had all the data I needed, I just needed a neural network that excels at image processing: a convolutional neural network (CNN). I used one by Google, piped all my data through it and let it train overnight.
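The “visual computing magic” boiled down to the fact that the overlay draws its boxes in one solid colour, so you can mask that colour and read the rectangles back out. Something roughly like this, where the colour range is made up and depends entirely on the overlay:

```python
import cv2
import numpy as np

def extract_boxes(screenshot_bgr, lower=(0, 0, 200), upper=(60, 60, 255)):
    """Find boxes drawn by the overlay in a (roughly red) colour, as relative coords."""
    mask = cv2.inRange(screenshot_bgr, np.array(lower), np.array(upper))
    # OpenCV 4 returns (contours, hierarchy)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    h, w = screenshot_bgr.shape[:2]
    boxes = []
    for contour in contours:
        x, y, bw, bh = cv2.boundingRect(contour)
        if bw > 5 and bh > 5:  # ignore stray pixels
            boxes.append((x / w, y / h, (x + bw) / w, (y + bh) / h))
    return boxes
```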
The next day I woke up feeling pretty neutral, because I had been at this point many times before: anticipating my program finally working and getting disappointed. I had made it a habit not to anticipate anymore.
So there I was, sitting in front of my screen, starting this (by my current standards) horrendous-looking script and pressing the key that would move the mouse to the center of the closest detected bounding box. And it moved. In the opposite direction. I had mixed up a sign.
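The step that bit me was about as simple as it gets: pick the closest detected box and compute the offset from the crosshair to its centre. A sketch of it (not my original script):

```python
def aim_offset(boxes, crosshair):
    """boxes: list of (x1, y1, x2, y2) in pixels; crosshair: (cx, cy)."""
    centres = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]
    tx, ty = min(centres, key=lambda c: (c[0] - crosshair[0]) ** 2 + (c[1] - crosshair[1]) ** 2)
    # Target minus crosshair moves the mouse toward the target;
    # flip the sign and it runs away from it instead.
    return tx - crosshair[0], ty - crosshair[1]
```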
Changing the plus to a minus, I witnessed something truly magnificent. After a year of failures and letdowns, there I was, experiencing my first big breakthrough in this entire project and in my programming career. The feeling of satisfaction was unfathomable. I finally had something to show, something to say: Hello World! I'm a programmer now! Not only did I learn programming, which is already valuable on its own; the actual lesson, in my opinion, is knowing that I can accomplish anything I put my mind to. That I'm not a failure. I just had never found something I was passionate about. But now I have.
Getting ideas from my head into code and having them work is one of the things I'm most passionate about. The satisfaction of pressing the run button after a long coding session and having the code just work is unreal.
Of course there are bugs that, more often than not, need to be fixed before your code runs as intended. But sometimes there aren't, and it just works. And that gives me a feeling that's hard to put into words.
Now, before I keep rambling and getting off topic, I'd like to say this: no matter who you are or where you stand in life, it's never wrong to keep looking for your passion until you've found it. Even if it's not coding. But it probably will be coding.
Thank you for reading and have a great rest of your day!
Ray