Advanced Example: Spark Action with Oozie
In this post, we will look at running a Spark job using Apache Oozie.
For background, you should read up on my earlier post on Patent Citation.
Spark Action on Oozie
As of Oozie 4.2, there is a dedicated action for running Spark jobs.
The workflow.xml looks as follows:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="oozie-java-spark-wf">
    <start to="java-spark"/>
    <action name="java-spark">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${jobOutput}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>local</master>
            <name>Spark Patent Citation</name>
            <class>spark.PatentCitation</class>
            <jar>${nameNode}/user/root/oozie/spark-patent/lib/patentcitation_spark.jar</jar>
            <spark-opts>--executor-memory 1G --num-executors 10</spark-opts>
            <arg>${nameNode}/user/captain/input/cite75_99.txt</arg>
            <arg>${nameNode}/user/captain/output</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark Java PatentCitation failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Oozie Workflow Properties
The job.properties is as follows:
nameNode=hdfs://sandbox:8020
jobTracker=sandbox:8050
master=local[*]
appRoot=spark-patent
jobOutput=/user/captain/output
oozie.wf.application.path=/user/root/oozie/spark-patent
oozie.use.system.libpath=true
Make sure you save the workflow.xml in HDFS under the location
/user/root/oozie/spark-patent
The patentcitation_spark.jar goes into the lib directory under that location in HDFS.
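The upload can be sketched with the standard hdfs dfs commands. This is only a minimal sketch: it assumes that workflow.xml and patentcitation_spark.jar sit in your current local directory and that the hdfs client is on your PATH. The APP_DIR variable is just a name picked for this example.

```shell
# Assumption: workflow.xml and patentcitation_spark.jar are in the current local directory.
# APP_DIR is a name chosen for this sketch; it must match oozie.wf.application.path.
APP_DIR=/user/root/oozie/spark-patent

if command -v hdfs >/dev/null 2>&1; then
  # Create the application directory together with its lib subdirectory
  hdfs dfs -mkdir -p "$APP_DIR/lib"
  # -f overwrites a copy left over from an earlier deployment
  hdfs dfs -put -f workflow.xml "$APP_DIR/"
  hdfs dfs -put -f patentcitation_spark.jar "$APP_DIR/lib/"
  # List what ended up in HDFS
  hdfs dfs -ls -R "$APP_DIR"
else
  echo "hdfs client not found; run this on a node with Hadoop installed" >&2
fi
```

Oozie adds the jars it finds in the lib directory of the application path to the classpath of the action, which is why the jar goes there.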
How do you start the Oozie job?
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
Note: job.properties should be in your local directory, not on HDFS.
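Once the job is submitted, the oozie client prints a job ID, which you can use to follow the run. Here is a small sketch, assuming the Oozie server is at the same URL as in the submit command. JOB_ID is a placeholder that you replace with the ID printed for your run.

```shell
# Assumption: the Oozie server listens at the URL used in the submit command.
# With OOZIE_URL exported, the -oozie option can be left out.
export OOZIE_URL=http://localhost:11000/oozie

# Placeholder: replace with the ID printed by the -run command
JOB_ID="<job-id>"

if command -v oozie >/dev/null 2>&1; then
  # Show the workflow status and the state of each action
  oozie job -info "$JOB_ID"
  # Fetch the workflow log, useful when the action ends up in the kill node
  oozie job -log "$JOB_ID"
else
  echo "oozie client not found; run this on a node with the Oozie client installed" >&2
fi
```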
The cite75_99.txt input file goes in the HDFS directory /user/captain/input
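Copying the input file could look like this. It is a sketch that assumes cite75_99.txt is in your current local directory; INPUT_DIR is a variable name made up for the example.

```shell
# Assumption: cite75_99.txt is in the current local directory.
INPUT_DIR=/user/captain/input

if command -v hdfs >/dev/null 2>&1; then
  # Create the input directory if it does not exist yet
  hdfs dfs -mkdir -p "$INPUT_DIR"
  # Copy the citation data set into HDFS, replacing any older copy
  hdfs dfs -put -f cite75_99.txt "$INPUT_DIR/"
else
  echo "hdfs client not found; run this on a node with Hadoop installed" >&2
fi
```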
Once the Oozie job has finished, you will find the output in
/user/captain/output/part-00000
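To take a quick look at the result without copying it out of HDFS, something like the following should do. It is a sketch that assumes the hdfs client is on your PATH; OUTPUT_DIR is a variable name chosen for the example.

```shell
# OUTPUT_DIR matches the second <arg> in workflow.xml and jobOutput in job.properties.
OUTPUT_DIR=/user/captain/output

if command -v hdfs >/dev/null 2>&1; then
  # List the part files written by the job
  hdfs dfs -ls "$OUTPUT_DIR"
  # Print the first 20 lines; the quoted glob is expanded by HDFS, not by the local shell
  hdfs dfs -cat "$OUTPUT_DIR/part-*" | head -n 20
else
  echo "hdfs client not found; run this on a node with Hadoop installed" >&2
fi
```

Keep in mind that the prepare block in workflow.xml deletes ${jobOutput} before each run, so copy the results elsewhere if you want to keep them.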
If you have any questions, do not hesitate to ask.