Getting Started
Installation
pip install lofn
Dependencies
lofn runs on Python 2 or 3; Python 2.7+ or 3.6+ is recommended.
Running a script on this framework requires Spark and Docker.
Install Docker
See the Docker Docs for installation instructions.
Install Spark
See the Downloading Spark instructions to get started.
Spark requires Java, which must either be on your PATH or be located through
JAVA_HOME. Download the Spark jar files, then set SPARK_HOME to the directory
that contains them and add its bin directory to your PATH as well.
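The environment setup described above might look like the following in a shell profile. This is only a sketch: both install locations are placeholders rather than values taken from the Spark or lofn documentation, so substitute the directories where Java and Spark actually live on your machine.

```shell
# Hypothetical install locations; adjust to match your system.
export JAVA_HOME="/usr/lib/jvm/default-java"
export SPARK_HOME="/opt/spark"

# Make the java and spark-submit executables available on PATH.
export PATH="$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH"
```

After opening a new shell, running spark-submit --version should print the Spark version if both variables point to the right places.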
Running on Standalone
lofn can run on Spark in standalone mode, on either a cluster or a single node. Use spark-submit to submit your
application to Spark.
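A single-node submission could be sketched as follows. The script name my_lofn_app.py is a hypothetical stand-in for your own PySpark application, and the check for spark-submit is an added convenience rather than something lofn requires.

```shell
# my_lofn_app.py is a placeholder for your own PySpark script that uses lofn.
APP="my_lofn_app.py"
# local[*] runs Spark on this machine with one worker thread per core.
MASTER="local[*]"

if command -v spark-submit >/dev/null 2>&1; then
    spark-submit --master "$MASTER" "$APP"
else
    echo "spark-submit not found; check SPARK_HOME and PATH" >&2
fi
```

To target a standalone cluster instead, set MASTER to the master's URL in the form spark://host:port (7077 is Spark's default standalone master port).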
Running on YARN
Some configurations are required for lofn to work on YARN.
Configure the Cluster
Beyond having Spark set up on a YARN cluster that is ready to accept jobs, follow these steps for lofn to work:
- install lofn on each node
- install Docker on each node
- create a Docker group
- add $USER and the yarn user to the Docker group
- restart the YARN daemons and your shell for the changes to take effect
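The group-related steps above could be scripted roughly as shown here, to be run on every node. The sketch assumes a Linux system with sudo, the conventional group name docker, and a YARN service account named yarn; the function name setup_docker_group is made up for this example, and account names may differ on your distribution.

```shell
# Run on every node of the cluster. Requires root privileges via sudo.
setup_docker_group() {
    # Create the group only if it does not exist yet.
    getent group docker >/dev/null || sudo groupadd docker

    # Add each named account to the group, keeping existing memberships (-a).
    for account in "$@"; do
        sudo usermod -aG docker "$account"
    done
}
```

It would be called as setup_docker_group "$USER" yarn. How the YARN daemons are restarted depends on how Hadoop was installed, so that step is left out of the sketch.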
See the next page, ‘Using lofn on AWS’, for instructions on how to set up an EMR cluster for lofn automatically.
Submission
- User volumes must be in HDFS, and your volumes dictionary should provide the absolute path to the directory on HDFS
- use spark-submit to submit the application to Spark
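The two points above could be combined into a small helper like the one below. All three values are placeholders and the function name submit_on_yarn is invented for illustration. The sketch also assumes that the hdfs and spark-submit commands are on PATH and that HADOOP_CONF_DIR or YARN_CONF_DIR points at the cluster configuration, which Spark needs in order to locate YARN.

```shell
# Placeholders: replace with your own data directory, HDFS target, and script.
LOCAL_DIR="./input_data"
HDFS_DIR="/data/lofn/input_data"
APP="my_lofn_app.py"

submit_on_yarn() {
    # Copy the local directory into HDFS so that every node can reach it.
    hdfs dfs -mkdir -p "$HDFS_DIR"
    hdfs dfs -put -f "$LOCAL_DIR"/* "$HDFS_DIR"

    # Submit the application with YARN as the cluster manager.
    spark-submit --master yarn "$APP"
}
```

Call submit_on_yarn once the variables are set. Inside the application, the volumes dictionary would then refer to the absolute HDFS path held in HDFS_DIR.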