Tim Van Wassenhove

Passionate geek, interested in Technology. Proud father of two

27 Aug 2015

Notes on running spark-notebook

These days Docker makes it extremely easy to get started with virtually any application you like. At first I was a bit skeptical, but over the last couple of months I have changed my mind. Now I strongly believe this is a game changer, even more so when it comes to Windows. Anyway, Kitematic (a GUI to manage Docker images) allows you to simply pick the spark-notebook image by Andy Petrella.

When running your Docker host in VirtualBox, you still need to set up port forwarding for port 9000 (the notebook) and ports 4040 to 4050 (spark-ui). Assuming your Docker host VM is named default:

VBoxManage modifyvm "default" --natpf1 "tcp-port9000,tcp,,9000,,9000"
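The command above only covers port 9000. For the spark-ui range, a loop saves typing the same rule eleven times. This is a minimal sketch, not something from the original setup: `forward_ports` and the `VBOXMANAGE` override are hypothetical names introduced here, and the rule names simply follow the `tcp-port9000` pattern used above.

```shell
# Sketch: forward a range of TCP ports from the host to a VirtualBox VM.
# forward_ports and VBOXMANAGE are illustrative names, not part of VirtualBox.
forward_ports() {
  vm="$1"; first="$2"; last="$3"
  port="$first"
  while [ "$port" -le "$last" ]; do
    # Rule format: name,protocol,host-ip,host-port,guest-ip,guest-port
    "${VBOXMANAGE:-VBoxManage}" modifyvm "$vm" --natpf1 "tcp-port${port},tcp,,${port},,${port}"
    port=$((port + 1))
  done
}
```

Running `forward_ports default 4040 4050` should then add one rule per port. Note that `modifyvm` expects the VM to be powered off; setting `VBOXMANAGE=echo` first prints the commands instead of running them, which is handy for a dry run.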

Now you can browse to http://localhost:9000 and start using your new notebook.

You may want to copy the default set of notebooks to a local directory:

docker cp $containerName:/opt/docker/notebooks /Users/timvw/notebooks

Using that local copy is just a few clicks away with Kitematic.

Of course, you will want to use additional packages such as spark-csv. This can be achieved by editing your notebook metadata: you simply need to add an entry to customDeps.

When your container did not shut down correctly, you may end up in the awkward situation where the container believes that it is still running. The following commands fix that:

docker start $containerName && docker exec -t -i $containerName /bin/rm /opt/docker/RUNNING_PID