Getting Started¶
Installation¶
pip install lofn
Dependencies¶
You can use python 2 or 3, 2.7+ and 3.6+ preferably.
Running a script on this framework requires Spark and Docker.
Install Docker¶
See the Docker Docs on how to install.
Install Spark¶
See the Downloading Spark instructions to get started.
It will require Java be installed and in your PATH
or set JAVA_HOME
and
downloading the jar files. Then set SPARK_HOME
as the path to this directory
and add its bin
directory to PATH
as well.
Running on Standalone¶
lofn can be run on Spark standalone on a cluster or a single node. Use spark-submit
to submit your application
to Spark.
Running on YARN¶
Some configurations are required for lofn to work on YARN.
Configure the Cluster¶
Beyond having Spark setup on a YARN cluster ready to submit jobs, follow these steps for lofn to work:
- install lofn on each node
- install Docker on each node
- create a Docker group
- add $USER and
yarn
user to Docker group - restart yarn daemons and your shell for changes to take effect
See the next page ‘Using lofn on AWS’ for instructions on how to setup an EMR cluster automatically for lofn
Submission¶
- User volumes must be in HDFS and your
volumes
dictionary should provide the absolute path to the directory on HDFS - use
spark-submit
to submit the application to Spark