Hadoop FileSystem

From Ceph wiki


Ceph has a Hadoop FileSystem module, which has been submitted to the Apache Hadoop Jira. It consists of a C++ JNI interface (src/client/hadoop/CephFSInterface.cc and CephFSInterface.h) and several Java classes that fit into the standard Hadoop FileSystem architecture. All of the code is included in the Ceph repository, but using the Java modules requires several minor changes to the Hadoop configuration classes -- you can make these yourself or simply apply the patch (either from Apache's Jira or from our repository).


Apply the patch

Apply the patch using your standard patch application method. This makes the necessary changes to the Hadoop codebase and adds the ceph package to the Hadoop fs package. At the time of writing, this patch and the code in our repository are the same, but you may want to check, and copy the code from our repository, to make sure you're completely up to date.
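As a rough sketch of one way to do this with the standard patch tool: the function name apply_patch_checked, the example paths, and the default strip level of 0 are assumptions for illustration, not details from this page, so adjust them to match the patch you downloaded. The dry run checks that the patch applies cleanly before any file is modified.

```shell
#!/bin/sh
# Sketch: apply a patch to a source tree, but only if a dry run succeeds.
# Usage: apply_patch_checked <source-dir> <patch-file> [strip-level]
apply_patch_checked() {
    src_dir=$1
    patch_file=$2
    strip=${3:-0}   # assumed default; use 1 for patches with a/ and b/ prefixes

    # We cd into the source tree below, so make the patch path absolute first.
    case $patch_file in
        /*) ;;
        *) patch_file=$PWD/$patch_file ;;
    esac

    # --dry-run reports what would happen without touching any file.
    if (cd "$src_dir" && patch -p"$strip" --dry-run < "$patch_file") >/dev/null; then
        (cd "$src_dir" && patch -p"$strip" < "$patch_file")
    else
        echo "patch does not apply cleanly to $src_dir; nothing was changed" >&2
        return 1
    fi
}

# Example with hypothetical paths:
#   apply_patch_checked "$HOME/hadoop" "$HOME/ceph-hadoop.patch" 0
```

If the dry run fails, the Hadoop tree is left untouched, which makes it easier to retry with a different strip level or a newer copy of the patch.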

Install ceph

Install Ceph using the instructions available elsewhere. Make sure that libhadoopceph is built and installed -- this requires that your compiler can find the Java JNI headers, which has been an issue on some of our machines. (Check the output of ./configure to make sure the headers are found.)


Here's what worked for me on Ubuntu Karmic (9.10), to get configure to see the JNI headers:

./configure CPPFLAGS="-I/usr/lib/jvm/java-6-sun/include -I/usr/lib/jvm/java-6-sun/include/linux"

--Eestolan 11:47, 22 February 2010 (PST)
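The same flags can be derived for other JDK locations. The following is a minimal sketch, assuming a Linux JDK layout with jni.h under include/ and jni_md.h under include/linux/; the helper name jni_cppflags is hypothetical, and the fallback path is simply the Karmic one from the note above.

```shell
#!/bin/sh
# Sketch: build the CPPFLAGS that point configure at the JNI headers.
# Usage: jni_cppflags <jdk-root>
jni_cppflags() {
    jdk=$1
    # jni.h lives in include/, its platform-specific half jni_md.h in include/linux/.
    if [ -f "$jdk/include/jni.h" ] && [ -f "$jdk/include/linux/jni_md.h" ]; then
        printf '%s\n' "-I$jdk/include -I$jdk/include/linux"
    else
        echo "JNI headers not found under $jdk" >&2
        return 1
    fi
}

# Example: use JAVA_HOME if set, otherwise fall back to the Karmic path.
if flags=$(jni_cppflags "${JAVA_HOME:-/usr/lib/jvm/java-6-sun}"); then
    echo "./configure CPPFLAGS=\"$flags\""
fi
```

Checking for the headers before running configure gives a clearer error than discovering later that libhadoopceph was silently skipped.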

Configure Hadoop

Instructions are available in the ceph package Javadoc.

Start up Ceph

Make sure your storage cluster is running!

Start up Hadoop mapred

Do not run start-all.sh, since that also starts the HDFS daemons. Make sure that you have created your mapred directory on Ceph (bin/hadoop fs -mkdir mapred) before running bin/start-mapred.sh!
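The ordering can be sketched as a small script. This is only an illustration: the function names run and start_mapred_on_ceph and the DRY_RUN switch are hypothetical, and the script name start-mapred.sh is assumed to be the one shipped in a stock Hadoop bin/ directory.

```shell
#!/bin/sh
# Sketch: bring up MapReduce on Ceph without starting the HDFS daemons.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: start_mapred_on_ceph <hadoop-install-dir>
start_mapred_on_ceph() {
    hadoop_home=$1
    # Create the mapred directory on Ceph first; stop if that fails.
    run "$hadoop_home/bin/hadoop" fs -mkdir mapred || return 1
    # Start only the MapReduce daemons (deliberately not start-all.sh).
    run "$hadoop_home/bin/start-mapred.sh"
}

# Example (prints the two commands without running them):
#   DRY_RUN=1; start_mapred_on_ceph "$HADOOP_HOME"
```

If the mapred directory already exists, the mkdir step may report an error; in that case run the start script directly.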

Hurrah!

You've done it. Use the standard Hadoop commands for looking at data and running jobs.
