Move the EDP examples from the sahara-extra repo to sahara
https://blueprints.launchpad.net/sahara/+spec/edp-move-examples
Moving the Sahara EDP examples from the sahara-extra repo to the sahara repo accomplishes several things:
It eliminates code duplication, since copies of the examples are also used in the integration tests
It removes an element from the sahara-extra repo, thereby moving us closer to retiring that repo and simplifying our repo structure
It puts the examples where developers are more likely to find them, and makes it simpler to bundle the examples with a Sahara distribution if desired
Problem description
The goal is to create one unified set of EDP jobs that can both teach users and developers how to create and run jobs and serve as the jobs submitted during integration testing.
Proposed change
Under the sahara root directory, we should create a new directory:
sahara/edp-examples
The directory structure should follow a standard pattern (names are not important per se, this is just an illustration):
subdirectory_for_each_example/
    README.rst (what it is, how to compile, etc.)
    script_and_jar_files
    src_for_jars/
    how_to_run_from_node_command_line/ (optional)
    expected_input_and_output/ (optional)
hadoop_1_specific_examples/
    subdirectory_for_each_example
hadoop_2_specific_examples/
    subdirectory_for_each_example
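The layout could be enforced mechanically. The sketch below lists example directories that lack the README.rst called for above. It assumes the illustrative directory names shown here are used literally, and that every top-level subdirectory other than the version-specific ones is a single example. The function name find_examples_missing_readme is hypothetical and is not part of Sahara.

```python
from pathlib import Path

# Hypothetical: these names mirror the illustrative layout above and are
# not fixed by this spec.
VERSION_SPECIFIC_DIRS = ("hadoop_1_specific_examples", "hadoop_2_specific_examples")


def find_examples_missing_readme(root):
    """Return example directories under root that lack a README.rst.

    Top-level subdirectories are treated as examples, except the
    version-specific directories, whose own subdirectories are examples.
    """
    root = Path(root)
    examples = []
    for child in sorted(root.iterdir()):
        if not child.is_dir():
            continue  # loose files at the top level are not examples
        if child.name in VERSION_SPECIFIC_DIRS:
            examples.extend(sorted(p for p in child.iterdir() if p.is_dir()))
        else:
            examples.append(child)
    return [p for p in examples if not (p / "README.rst").is_file()]
```

A check like this could run in a unit test so that new examples cannot be added without documentation.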
The integration tests should be modified to pull job files from the sahara/edp-examples directory.
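Here is a minimal sketch of how an integration test might load a job file from the new location. It makes two assumptions: that sahara/edp-examples means an edp-examples directory at the top of the sahara repository checkout, and that the test knows the path to that checkout. The names edp_example_path and read_job_binary are hypothetical helpers, not existing Sahara functions.

```python
import os


# Hypothetical helpers for the integration tests; the names and the
# "<repo_root>/edp-examples" location are illustrative assumptions.
def edp_example_path(repo_root, *parts):
    """Build an absolute path to a file under <repo_root>/edp-examples."""
    path = os.path.join(os.path.abspath(repo_root), "edp-examples", *parts)
    if not os.path.exists(path):
        # Fail early with the full path so a moved example is easy to spot.
        raise FileNotFoundError(path)
    return path


def read_job_binary(repo_root, *parts):
    """Return the raw bytes of a job file (script or jar) for upload."""
    with open(edp_example_path(repo_root, *parts), "rb") as f:
        return f.read()
```

For example, read_job_binary(repo_root, "pig-job", "example.pig") would return the Pig script's contents, ready to be stored as a job binary.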
Here are some notes on equivalence between the current script and jar files in sahara-extra/edp-examples and those in sahara/tests/integration/tests/resources:
pig-job/example.pig == resources/edp-job.pig
pig-job/udf.jar == resources/edp-lib.jar
wordcount/edp-java.jar == resources/edp-java/edp-java.jar
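The pairs above could be checked mechanically before the duplicates in sahara/tests/integration/tests/resources are dropped. This sketch assumes that "==" in the notes means byte-identical files. If the files are only functionally equivalent, a reported difference calls for manual review rather than automatic deletion. The function names are hypothetical.

```python
import hashlib
import os

# Equivalences from the notes above:
# path under sahara-extra/edp-examples -> path under .../tests/resources
EQUIVALENT_FILES = {
    "pig-job/example.pig": "edp-job.pig",
    "pig-job/udf.jar": "edp-lib.jar",
    "wordcount/edp-java.jar": "edp-java/edp-java.jar",
}


def sha256_of(path):
    """Hash a file in 64 KiB chunks so large jars are not read at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def differing_pairs(extra_examples_dir, resources_dir, pairs=EQUIVALENT_FILES):
    """Return (example, resource) pairs whose contents are not byte-identical."""
    return [
        (ex, res)
        for ex, res in sorted(pairs.items())
        if sha256_of(os.path.join(extra_examples_dir, ex))
        != sha256_of(os.path.join(resources_dir, res))
    ]
```

An empty result would suggest the resources copies can simply be replaced by the examples.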
Alternatives
None
Data model impact
None
REST API impact
None
Other end user impact
Examples will no longer be found in the sahara-extra repo. We should consider leaving a README file there for one release cycle that says “We have moved” and points to the new location in the sahara repo.
Deployer impact
None
Developer impact
None
Sahara-image-elements impact
None
Sahara-dashboard / Horizon impact
None
Implementation
Assignee(s)
None as yet
Work Items
The work has several components:
Move the examples to the sahara repository
Merge any jobs used by the integration tests into the new examples directory to create one comprehensive set
Provide source code and compilation instructions for any examples that currently lack them
Make the integration tests reference the new directory structure
Identify which examples, if any, work only with specific Hadoop versions. Most examples work on both Hadoop 1 and Hadoop 2, but some do not. Version-specific examples should be placed in a subdirectory named for the version.
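Splitting examples by version lets the integration tests choose jobs by Hadoop version. This sketch assumes the hadoop_1_specific_examples/ and hadoop_2_specific_examples/ names from the illustrative layout in "Proposed change". examples_for_version is a hypothetical name, and an unsupported major version raises KeyError.

```python
from pathlib import Path

# Hypothetical: maps a Hadoop major version to the version-specific
# subdirectory from the illustrative layout.
VERSION_DIRS = {1: "hadoop_1_specific_examples", 2: "hadoop_2_specific_examples"}


def examples_for_version(root, hadoop_major):
    """List example directories usable with the given Hadoop major version.

    Version-neutral examples (top-level subdirectories that are not version
    directories) are always included, followed by the examples found in the
    matching version directory. Raises KeyError for an unknown version.
    """
    root = Path(root)
    specific = root / VERSION_DIRS[hadoop_major]
    version_dir_names = set(VERSION_DIRS.values())
    selected = [
        p for p in sorted(root.iterdir())
        if p.is_dir() and p.name not in version_dir_names
    ]
    if specific.is_dir():
        selected.extend(sorted(p for p in specific.iterdir() if p.is_dir()))
    return selected
```

Keeping version-neutral examples at the top level means most examples are shared, and only the exceptions need to live in a version directory.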
Dependencies
None
Testing
Testing will be inherent in the integration tests. The change will be deemed successful if the integration tests pass after the EDP examples and the integration test jobs have been merged.
Documentation Impact
If our current docs reference the EDP examples, those references should be updated to point to the new location. If they do not, a reference should be added to the developer and/or user guide.
References
None