
Docker 101

Published Oct 31, 2017

NOTE: This is a repost from my blog

When working in a multi-node environment like a Spark/Hadoop cluster, Docker lowers the barrier to entry. By barrier to entry, I mean the need to keep an EMR cluster running while you are still in the development phase. With Docker, you can quickly set up a 4-5 node cluster on a single machine and start coding your Spark job. You can read about what Docker is and why you would use it at these links.

Benefits

  • You can very easily version control your environment (see the sketch after this list)
  • The barrier to entry for working with clusters (Spark/Hadoop etc.) drops considerably. You no longer need access to an EMR cluster, which has a cost associated with it.
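As a quick sketch of the version-control point: your whole environment lives in a Dockerfile that you commit to git like any other source file. The base image and packages below are hypothetical placeholders, not a recommendation:

echo 'FROM ubuntu:16.04
RUN apt-get update && apt-get install -y openjdk-8-jdk python3
' > Dockerfile
git add Dockerfile
git commit -m 'Pin the cluster environment'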

Installing Docker

Follow this official guide

Manual

For Ubuntu, the quick steps are:

sudo apt-get update
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"
sudo apt-get update
sudo apt-get install docker-ce
sudo docker run hello-world
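If the hello-world container prints its greeting, the installation works. Optionally, you can add your user to the docker group so that docker commands work without sudo (you need to log out and back in for this to take effect):

sudo usermod -aG docker $USER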

With Ansible

You can use the following commands to have Ansible install Docker for you:

sudo python2.7 -m pip install ansible
sudo ansible-galaxy install --force angstwad.docker_ubuntu
echo '- hosts: all
  roles:
    - angstwad.docker_ubuntu
' > /tmp/docker_ubuntu.yml
sudo ansible-playbook /tmp/docker_ubuntu.yml -c local -i 'localhost,'
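As with the manual route, a quick sanity check is to print the version and run the hello-world image:

sudo docker --version
sudo docker run hello-world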

Setting up a cluster

Follow this post.

You will be able to run a local Spark cluster with 4 commands.

Quick overview:

mkdir spark_cluster; cd spark_cluster
echo 'version: "2"

services:
  master:
    image: singularities/spark
    command: start-spark master
    hostname: master
    ports:
      - "6066:6066"
      - "7070:7070"
      - "8080:8080"
      - "50070:50070"
  worker:
    image: singularities/spark
    command: start-spark worker master
    environment:
      SPARK_WORKER_CORES: 1
      SPARK_WORKER_MEMORY: 2g
    links:
      - master
' > docker-compose.yml
sudo docker-compose up -d
# sudo docker-compose scale worker=2
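Once the containers are up, you can scale out the workers and poke around the cluster. A small sketch follows; the master container name below is an assumption, so check sudo docker ps for the exact name Compose generated:

sudo docker-compose scale worker=4                     # 1 master + 4 workers
sudo docker-compose ps                                 # List the running containers
sudo docker exec -it sparkcluster_master_1 /bin/bash   # Shell into the master (name is an assumption)

The master's web UI should then be reachable at http://localhost:8080, since the compose file maps port 8080.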

Extending other images

With Docker, you can build on top of someone else's image. For example, here I will extend the singularities/spark image, make my custom Spark configuration changes, and push the final version to my own Docker Hub repo.
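The same extension can also be written declaratively as a Dockerfile instead of the interactive commit flow shown below. A minimal sketch, assuming the base image keeps its Spark config under /usr/local/spark/conf (an assumption about the image layout, so adjust the path as needed):

echo 'FROM singularities/spark
# Assumed config location in the base image
COPY spark-defaults.conf /usr/local/spark/conf/spark-defaults.conf
' > Dockerfile
sudo docker build -t chaudhary/my-repo-name .

The commit-based flow below achieves the same result interactively.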

Pushing your changes to Docker hub

To create a fork of a base image (singularities/spark), these are the steps:

sudo docker run -it singularities/spark # Run the base image. This will open a shell
# Make your changes to the image in this container
sudo docker login --username=chaudhary --password=lol
sudo docker commit <container ID from docker ps> chaudhary/my-repo-name # Commit changes
sudo docker tag <image ID from docker images> chaudhary/my-repo-name # Tag so that pull works properly
sudo docker push chaudhary/my-repo-name

Now that you have pushed this image, you can start a new container from this image as shown below:

sudo docker run -it chaudhary/my-repo-name
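From any other machine logged in to Docker Hub (or anonymously, if the repo is public), the image can also be pulled directly:

sudo docker pull chaudhary/my-repo-name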

Resources

For more information, read the official getting started guide.
