Setting up Apache Spark

Set up an Apache Spark development sandbox to run in an LXC container.
I have Ubuntu Desktop 16.04 LTS (Xenial release) installed on my machine.

Installation and Setup Steps – Sandbox

Install LXC
sudo apt install -y lxc

The system now has all the LXC commands available, all of its templates, and the Python3 bindings for scripting LXC.

See: https://help.ubuntu.com/lts/serverguide/lxc.html#lxc-installation

Create a Container

This creates a privileged container called sparksandbox from the Ubuntu distribution, Xenial release, for amd64 architecture:

sudo lxc-create -n sparksandbox -t ubuntu -- -r xenial

##
# The default user is 'ubuntu' with password 'ubuntu'!
# Use the 'sudo' command to run tasks as root in the container.
##

List containers:

sudo lxc-ls -f

NAME          STATE    AUTOSTART  GROUPS  IPV4  IPV6
sparksandbox  STOPPED  0          -       -     -

Start Container
sudo lxc-start -d -n sparksandbox

Get detailed container information. Take note of the container’s IP address shown below:

sudo lxc-info -n sparksandbox

Name:           sparksandbox
State:          RUNNING
PID:            28605
IP:             10.0.3.129
CPU use:        1.01 seconds
BlkIO use:      60.66 MiB
Memory use:     80.02 MiB
KMem use:       5.84 MiB
Link:           vethQRIHXO
 TX bytes:      1.59 KiB
 RX bytes:      12.54 KiB
 Total bytes:   14.13 KiB

Configure Sandbox

Log in to the new container with ssh. The password for the default user ubuntu is ubuntu.
At this point, ssh will warn that it cannot forward X11, because xauth is not yet installed in the container.

ssh -X ubuntu@10.0.3.129

To connect using ssh with X11 forwarding, install the xauth package in the container.
First, update the package lists so apt knows about available upgrades and any newly published packages:

sudo apt-get update

Install xauth package:

sudo apt-get install xauth

Exit the container:

exit

...and re-enter:

ssh -X ubuntu@10.0.3.129

Install Firefox:

sudo apt-get install -y firefox
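
Launching it from the same ssh -X session doubles as a check that X11 forwarding now works; the browser window should open on the host's desktop:

firefox &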

Install handy tools/utils. software-properties-common goes first, since it provides the add-apt-repository command needed for the Sunflower PPA:

sudo apt-get install -y software-properties-common
sudo apt-get install -y tree
sudo apt-get install -y git
sudo apt-get install -y unzip
sudo add-apt-repository ppa:atareao/sunflower
sudo apt-get update
sudo apt-get install -y sunflower

Installation and Setup Steps – Oracle JDK8, Apache Spark and IntelliJ IDEA

Install Oracle JDK8

Add Oracle’s PPA, then update your package repository:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update

Install JDK8

sudo apt-get install -y oracle-java8-installer

Add export JAVA_HOME=/usr/lib/jvm/java-8-oracle to .bashrc in the home directory (the PPA installer places the JDK under /usr/lib/jvm/java-8-oracle).
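
A minimal way to do this from the shell, with a check that the variable and the JDK line up (the path is the one the PPA installer is expected to use; adjust it if your layout differs):

echo 'export JAVA_HOME=/usr/lib/jvm/java-8-oracle' >> ~/.bashrc
source ~/.bashrc
java -version      # expect: java version "1.8.0_..."
echo "$JAVA_HOME"  # expect: /usr/lib/jvm/java-8-oracle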

Install Spark

Install Spark 2.2.1 (pre-built for Apache Hadoop 2.7 and later).
Download archive from: https://spark.apache.org/downloads.html
Unpack archive

tar -xvf spark-2.2.1-bin-hadoop2.7.tgz

Move the resulting folder and create a symbolic link so that you can have multiple versions of Spark installed.

sudo mv spark-2.2.1-bin-hadoop2.7 /usr/local/
sudo ln -s /usr/local/spark-2.2.1-bin-hadoop2.7/ /usr/local/spark
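
The symlink is what makes side-by-side versions practical: a later release is unpacked next to the current one and the link is repointed, while PATH and SPARK_HOME keep referring to /usr/local/spark. A sketch, using a hypothetical spark-2.3.0 build as the example target:

# Hypothetical newer release already unpacked to /usr/local/spark-2.3.0-bin-hadoop2.7;
# -sfn replaces the existing symlink in place without touching either install.
sudo ln -sfn /usr/local/spark-2.3.0-bin-hadoop2.7 /usr/local/spark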

Append /usr/local/spark/bin to the PATH variable in /etc/environment.
Also add SPARK_HOME to your environment in ~/.bashrc:

export SPARK_HOME=/usr/local/spark
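
The resulting PATH line in /etc/environment would look something like this (keep whatever entries are already there; only the trailing /usr/local/spark/bin is new, and it takes effect at the next login):

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/spark/bin"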

Adjust the default log level for Spark: rename log4j.properties.template in /usr/local/spark/conf/ to log4j.properties, then change the line log4j.rootCategory=INFO, console to log4j.rootCategory=ERROR, console.

mv /usr/local/spark/conf/log4j.properties.template /usr/local/spark/conf/log4j.properties
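
The same edit can be scripted; a minimal sketch, assuming the renamed file still contains the template's stock log4j.rootCategory=INFO, console line:

# Flip the root log level from INFO to ERROR in place
sed -i 's/^log4j.rootCategory=INFO, console/log4j.rootCategory=ERROR, console/' /usr/local/spark/conf/log4j.properties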

Test!

spark-shell
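
Beyond the shell simply starting, a quick smoke test confirms Spark actually computes. spark-shell reads statements from standard input, so an expression can be piped in; summing 1 to 100 should print 5050.0:

echo 'println(sc.parallelize(1 to 100).sum())' | spark-shell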


Install IntelliJ IDEA

Download archive from: https://www.jetbrains.com/idea/download/#section=linux
Unpack archive

tar xvf ideaIU-2017.3.5.tar.gz
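
The archive unpacks into a build-numbered directory (idea-IU-<build>), so a glob is the easiest way to reach the launcher script in its bin folder:

cd idea-IU-*/bin
./idea.sh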
