
Understanding OpenAI Gym

Published Mar 23, 2018. Last updated Sep 19, 2018.


OpenAI Gym Logo

OpenAI is a non-profit research company focused on developing AI in a way that is good for everybody. Its founders include Elon Musk and Sam Altman. OpenAI’s mission, as stated on their website, is to “build safe AGI, and ensure AGI’s benefits are as widely and evenly distributed as possible”.

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball. Gym is an open-source interface to reinforcement learning tasks: it provides the environments, and it is up to the developer to implement the reinforcement learning algorithm. Developers can write agents using an existing numerical computation library, such as TensorFlow or Theano.
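To make that division of labour concrete, here is a minimal sketch of what the developer's side might look like. The `RandomAgent` class and its `act` and `learn` methods are hypothetical names chosen for illustration; Gym does not prescribe any agent interface.

```python
import random


class RandomAgent:
    """Hypothetical agent skeleton; this class is not part of Gym."""

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        self._rng = random.Random(seed)

    def act(self, observation):
        # A real agent would use the observation; this one ignores it
        # and picks one of the n_actions at random.
        return self._rng.randrange(self.n_actions)

    def learn(self, observation, action, reward, next_observation, done):
        # A learning agent would update its parameters here,
        # for example with TensorFlow or Theano.
        pass


agent = RandomAgent(n_actions=2, seed=0)
print(agent.act(observation=None))
```

The environment only ever sees the action the agent returns, so the random choice can later be swapped for a learned policy without touching the environment code.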


An OpenAI Gym environment (Ant-v0): a 3D four-legged robot walking

Gym Sample Code

Let us look at sample code that creates an environment named ‘Taxi-v1’.

import gym  
env = gym.make("Taxi-v1")  # environment page: https://gym.openai.com/envs/Taxi-v1
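Once an environment has been created, it can help to check which actions and observations it uses. The sketch below assumes the `action_space` and `observation_space` attributes that Gym environments expose; the helper names `describe_env` and `describe_by_id` are hypothetical.

```python
def describe_env(env):
    """Collect the action and observation spaces of a Gym-style environment."""
    return {
        "action_space": env.action_space,
        "observation_space": env.observation_space,
    }


def describe_by_id(env_id):
    """Create the environment named env_id, describe it, then close it."""
    import gym  # imported here so describe_env() itself does not need gym

    env = gym.make(env_id)
    try:
        return describe_env(env)
    finally:
        env.close()
```

Calling `describe_by_id("Taxi-v1")` should return both spaces for the Taxi environment, assuming that environment ID is registered in the installed Gym release; version suffixes change between releases.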

The code below runs an instance of the ‘CartPole-v0’ environment for 1000 timesteps, rendering the environment at each step. When it executes, a window pops up rendering the classic cart-pole problem:

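import gym  
env = gym.make('CartPole-v0')  
obs = env.reset()  # start the episode and get the initial observation  
for _ in range(1000):  
    env.render()  # draw the current state in the popup window  
    env.step(env.action_space.sample())  # take a random action  
env.close()  # close the rendering window when finished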

The following is the output we see in the terminal on executing the above code.

The following is the popup window that shows the cart pole environment being executed.

CartPole is a classic reinforcement learning task: a pole is attached to a cart, and the agent has to keep the pole balanced upright by pushing the cart left or right.

Let us try to understand this code:

  1. gym.make("ENVIRONMENT NAME"): returns the environment whose name was passed as the parameter. At https://gym.openai.com/envs/#classic_control you can see a list of the different environments that have been added by the community. Another list of all environments can be found at https://github.com/openai/gym/wiki/Table-of-environments
  2. env.reset(): resets the environment to its starting state and returns an initial observation.
  3. for _ in range(1000): this Python loop runs the ‘CartPole-v0’ environment for 1000 timesteps.
  4. env.render(): displays a popup window. Since it is called inside the loop, the window is redrawn after every action taken.
  5. env.step(): takes one action per timestep. The action is passed as its parameter. The env.step function returns four values: observation, reward, done and info. These four are explained below:

a) observation : an environment-specific object representing your observation of the environment.

b) reward : the amount of reward achieved by the previous action. It is a floating-point value. The scale varies between environments.

c) done : a boolean value stating whether it’s time to reset the environment again, that is, whether the current episode has ended.

d) info (dict): diagnostic information useful for debugging.

At each timestep, the agent chooses an action, and the environment returns an observation and a reward.
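The interaction described above can be sketched as a loop that uses all four return values and stops when done is true. To keep the example runnable without a display, it uses `ToyCorridorEnv`, a hypothetical stand-in that only mimics the reset/step interface described in this article; it is not part of Gym. `run_episode` and the policy function are likewise illustrative names.

```python
import random


class ToyCorridorEnv:
    """Hypothetical stand-in for a Gym environment (not part of gym).

    The agent starts at position 0 and must reach position `length`.
    Action 1 moves right, action 0 moves left (never below 0).
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Like env.reset(): start a new episode, return the first observation.
        self.position = 0
        return self.position

    def step(self, action):
        # Like env.step(): apply the action and return the four values.
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        done = self.position >= self.length
        reward = 1.0 if done else -0.1  # small cost per step, bonus at goal
        info = {"position": self.position}
        return self.position, reward, done, info


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return (total_reward, steps_taken)."""
    observation = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
        if done:  # the episode is over, so stop instead of stepping further
            break
    return total_reward, steps


if __name__ == "__main__":
    rng = random.Random(0)
    total, steps = run_episode(ToyCorridorEnv(), lambda obs: rng.choice([0, 1]))
    print(f"random policy: reward={total:.1f}, steps={steps}")
```

An environment created with gym.make could be passed to `run_episode` in place of the toy class, assuming it follows the same four-value step interface, with a policy such as `lambda obs: env.action_space.sample()`.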

Hope this article gives you an idea about what OpenAI Gym is and how you can use it to implement your reinforcement learning algorithms.

Discover and read more posts from Ashish