When developing Hadoop-based projects, it is always a big challenge to integrate task testing into Bamboo.

One idea is to start a single-node Hadoop cluster each time a new Bamboo Elastic instance is started.

The plan is as follows (maybe this is not the best solution, but it seems to work):

* create an image with everything required (Hadoop, configuration, …)
* on each start, call start_hadoop.sh, which brings Hadoop up

I used the following commands to get the image ready for Hadoop:

cd /mnt/bamboo-ebs/
wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u0.tar.gz
tar -xvf hadoop-0.20.2-cdh3u0.tar.gz
ln -s hadoop-0.20.2-cdh3u0 hadoop
cd hadoop/conf
vi core-site.xml

Use the following configuration for Hadoop:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost/</value>
        <final>true</final>
    </property>
</configuration>

We also need to set the MapReduce configuration:

vi mapred-site.xml

I’m only setting the JobTracker host:port here; everything else stays at the defaults (storage under /tmp/, …), same as for the core configuration.

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:8021</value>
        <final>true</final>
    </property>
</configuration>
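Because the image is prepared once and then reused, it can help to write these two files from a script instead of editing them in vi. Below is a minimal sketch, assuming a POSIX shell; the helper name write_site_xml is hypothetical and not part of Hadoop, and it does no XML escaping of the values it is given.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): write a site file that holds
# exactly one property, marked final as in the snippets above.
# Usage: write_site_xml <file> <property-name> <property-value>
# Values are written as-is, without XML escaping.
write_site_xml() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
    <property>
        <name>$2</name>
        <value>$3</value>
        <final>true</final>
    </property>
</configuration>
EOF
}
```

Under the same assumptions, the two files above would be produced from inside hadoop/conf with:

write_site_xml core-site.xml fs.default.name hdfs://localhost/
write_site_xml mapred-site.xml mapred.job.tracker localhost:8021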

Next, we take a snapshot:

cd /
generateSnapshot.sh

I assume that the image is mounted at /mnt/bamboo-ebs. generateSnapshot.sh creates and saves a new snapshot of the image. After the image has been created successfully, a new image ID is returned. Copy this ID into the Bamboo settings to make the new image the startup image for the next instance start.

start_hadoop.sh looks like this:

export JAVA_HOME=/opt/jdk-6
export HADOOP_HOME=/mnt/bamboo-ebs/hadoop
export HADOOP_CONF_DIR=/mnt/bamboo-ebs/hadoop/conf

/usr/sbin/groupadd hadoop
/usr/sbin/useradd -g hadoop hdfs
/usr/sbin/useradd -g hadoop mapred

chmod 777 -R /tmp/

cd "$HADOOP_HOME"
bin/hadoop namenode -format

The NameNode and DataNode are started with:

su hdfs

java -Dproc_namenode -Xmx1000m -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/tmp/logs -Dhadoop.log.file=hadoop-hadoop-namenode-ec2.log -Dhadoop.home.dir=/mnt/bamboo-ebs/hadoop/ -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Dhadoop.policy.file=hadoop-policy.xml -classpath /mnt/bamboo-ebs/hadoop/conf:/opt/jdk-6/lib/tools.jar:/mnt/bamboo-ebs/hadoop:/mnt/bamboo-ebs/hadoop/hadoop-core-0.20.2-cdh3u0.jar:/mnt/bamboo-ebs/hadoop/lib/ant-contrib-1.0b3.jar:/mnt/bamboo-ebs/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/bamboo-ebs/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/bamboo-ebs/hadoop/lib/commons-cli-1.2.jar:/mnt/bamboo-ebs/hadoop/lib/commons-codec-1.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/bamboo-ebs/hadoop/lib/commons-el-1.0.jar:/mnt/bamboo-ebs/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/bamboo-ebs/hadoop/lib/commons-logging-1.0.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-net-1.4.1.jar:/mnt/bamboo-ebs/hadoop/lib/core-3.1.1.jar:/mnt/bamboo-ebs/hadoop/lib/hadoop-fairscheduler-0.20.2-cdh3u0.jar:/mnt/bamboo-ebs/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/bamboo-ebs/hadoop/lib/jackson-core-asl-1.5.2.jar:/mnt/bamboo-ebs/hadoop/lib/jackson-mapper-asl-1.5.2.jar:/mnt/bamboo-ebs/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/bamboo-ebs/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/bamboo-ebs/hadoop/lib/jets3t-0.6.1.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-servlet-tester-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-util-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jsch-0.1.42.jar:/mnt/bamboo-ebs/hadoop/lib/junit-4.5.jar:/mnt/bamboo-ebs/hadoop/lib/kfs-0.2.2.jar:/mnt/bamboo-ebs/hadoop/lib/log4j-1.2.15.jar:/mnt/bamboo-ebs/hadoop/lib/mockito-all-1.8.2.jar:/mnt/bamboo-ebs/hadoop/lib/oro-2.0.8.jar:/mnt/bamboo-ebs/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/bamboo-ebs/hadoop/lib/servlet-api-2.5-6.1.14.jar:/mnt/bamboo-ebs/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/bamboo-ebs/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/bamboo-ebs/hadoop/lib/xmlenc-0.52.jar:/mnt/bamboo-ebs/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/bamboo-ebs/hadoop/lib/jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.hdfs.server.namenode.NameNode &

java -Dproc_datanode -Xmx1000m -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/tmp/logs -Dhadoop.log.file=hadoop-hadoop-datanode-ec2.log -Dhadoop.home.dir=/mnt/bamboo-ebs/hadoop/ -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Dhadoop.policy.file=hadoop-policy.xml -classpath /mnt/bamboo-ebs/hadoop/conf:/opt/jdk-6/lib/tools.jar:/mnt/bamboo-ebs/hadoop:/mnt/bamboo-ebs/hadoop/hadoop-core-0.20.2-cdh3u0.jar:/mnt/bamboo-ebs/hadoop/lib/ant-contrib-1.0b3.jar:/mnt/bamboo-ebs/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/bamboo-ebs/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/bamboo-ebs/hadoop/lib/commons-cli-1.2.jar:/mnt/bamboo-ebs/hadoop/lib/commons-codec-1.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/bamboo-ebs/hadoop/lib/commons-el-1.0.jar:/mnt/bamboo-ebs/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/bamboo-ebs/hadoop/lib/commons-logging-1.0.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-net-1.4.1.jar:/mnt/bamboo-ebs/hadoop/lib/core-3.1.1.jar:/mnt/bamboo-ebs/hadoop/lib/hadoop-fairscheduler-0.20.2-cdh3u0.jar:/mnt/bamboo-ebs/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/bamboo-ebs/hadoop/lib/jackson-core-asl-1.5.2.jar:/mnt/bamboo-ebs/hadoop/lib/jackson-mapper-asl-1.5.2.jar:/mnt/bamboo-ebs/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/bamboo-ebs/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/bamboo-ebs/hadoop/lib/jets3t-0.6.1.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-servlet-tester-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-util-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jsch-0.1.42.jar:/mnt/bamboo-ebs/hadoop/lib/junit-4.5.jar:/mnt/bamboo-ebs/hadoop/lib/kfs-0.2.2.jar:/mnt/bamboo-ebs/hadoop/lib/log4j-1.2.15.jar:/mnt/bamboo-ebs/hadoop/lib/mockito-all-1.8.2.jar:/mnt/bamboo-ebs/hadoop/lib/oro-2.0.8.jar:/mnt/bamboo-ebs/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/bamboo-ebs/hadoop/lib/servlet-api-2.5-6.1.14.jar:/mnt/bamboo-ebs/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/bamboo-ebs/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/bamboo-ebs/hadoop/lib/xmlenc-0.52.jar:/mnt/bamboo-ebs/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/bamboo-ebs/hadoop/lib/jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.hdfs.server.datanode.DataNode &
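The hard-coded -classpath value is easy to break when it is copied around, so it may be worth building it from the directory contents instead. This is only a sketch, assuming a POSIX shell and the directory layout used above (jars directly under the Hadoop home, in lib/ and in lib/jsp-2.1/); build_classpath is a hypothetical helper name, not something shipped with Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated classpath made of
# <home>/conf, <home>, and every jar found in <home>, <home>/lib and
# <home>/lib/jsp-2.1 (in the shell's alphabetical glob order).
# Usage: build_classpath <hadoop-home>
build_classpath() {
    hadoop_home=$1
    classpath="$hadoop_home/conf:$hadoop_home"
    for jar in "$hadoop_home"/*.jar "$hadoop_home"/lib/*.jar "$hadoop_home"/lib/jsp-2.1/*.jar; do
        # A pattern that matches nothing stays literal, so keep real files only.
        if [ -f "$jar" ]; then
            classpath="$classpath:$jar"
        fi
    done
    printf '%s\n' "$classpath"
}
```

The result could then be passed as -classpath "$(build_classpath /mnt/bamboo-ebs/hadoop):/opt/jdk-6/lib/tools.jar". The jar order differs from the list above, which should only matter if two jars contain the same class.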

Check that everything is OK with:

bin/hadoop dfsadmin -report
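Since the daemons are started in the background, a check run right after them may fail simply because the NameNode is not up yet. One way around this in a startup script is a small retry loop. The sketch below assumes a POSIX shell; retry is a hypothetical helper, and it relies on the checked command exiting non-zero while the service is unreachable.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the attempts
# run out. Returns 0 on the first success, 1 if every attempt failed.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}
```

For example, retry 30 2 bin/hadoop dfsadmin -report would try for up to about a minute before giving up; the numbers are arbitrary.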

We have permissions enabled, so we must create the HDFS directory for MapReduce as the hdfs user:

bin/hadoop dfs -mkdir hdfs://localhost/tmp/hadoop-mapred/mapred/system
bin/hadoop dfs -chown mapred hdfs://localhost/tmp/hadoop-mapred/mapred/system

Now we are ready to start the JobTracker and TaskTracker:

su mapred

java -Dproc_jobtracker -Xmx1000m -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/tmp/logs -Dhadoop.log.file=hadoop-hadoop-jobtracker-ec2.log -Dhadoop.home.dir=/mnt/bamboo-ebs/hadoop/ -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Dhadoop.policy.file=hadoop-policy.xml -classpath /mnt/bamboo-ebs/hadoop/conf:/opt/jdk-6/lib/tools.jar:/mnt/bamboo-ebs/hadoop:/mnt/bamboo-ebs/hadoop/hadoop-core-0.20.2-cdh3u0.jar:/mnt/bamboo-ebs/hadoop/lib/ant-contrib-1.0b3.jar:/mnt/bamboo-ebs/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/bamboo-ebs/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/bamboo-ebs/hadoop/lib/commons-cli-1.2.jar:/mnt/bamboo-ebs/hadoop/lib/commons-codec-1.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/bamboo-ebs/hadoop/lib/commons-el-1.0.jar:/mnt/bamboo-ebs/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/bamboo-ebs/hadoop/lib/commons-logging-1.0.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-net-1.4.1.jar:/mnt/bamboo-ebs/hadoop/lib/core-3.1.1.jar:/mnt/bamboo-ebs/hadoop/lib/hadoop-fairscheduler-0.20.2-cdh3u0.jar:/mnt/bamboo-ebs/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/bamboo-ebs/hadoop/lib/jackson-core-asl-1.5.2.jar:/mnt/bamboo-ebs/hadoop/lib/jackson-mapper-asl-1.5.2.jar:/mnt/bamboo-ebs/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/bamboo-ebs/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/bamboo-ebs/hadoop/lib/jets3t-0.6.1.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-servlet-tester-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-util-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jsch-0.1.42.jar:/mnt/bamboo-ebs/hadoop/lib/junit-4.5.jar:/mnt/bamboo-ebs/hadoop/lib/kfs-0.2.2.jar:/mnt/bamboo-ebs/hadoop/lib/log4j-1.2.15.jar:/mnt/bamboo-ebs/hadoop/lib/mockito-all-1.8.2.jar:/mnt/bamboo-ebs/hadoop/lib/oro-2.0.8.jar:/mnt/bamboo-ebs/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/bamboo-ebs/hadoop/lib/servlet-api-2.5-6.1.14.jar:/mnt/bamboo-ebs/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/bamboo-ebs/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/bamboo-ebs/hadoop/lib/xmlenc-0.52.jar:/mnt/bamboo-ebs/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/bamboo-ebs/hadoop/lib/jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.mapred.JobTracker &

java -Dproc_tasktracker -Xmx1000m -Dhadoop.log.dir=/tmp/logs -Dhadoop.log.file=hadoop-hadoop-tasktracker-ec2.log -Dhadoop.home.dir=/mnt/bamboo-ebs/hadoop/ -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Dhadoop.policy.file=hadoop-policy.xml -classpath /mnt/bamboo-ebs/hadoop/conf:/opt/jdk-6/lib/tools.jar:/mnt/bamboo-ebs/hadoop:/mnt/bamboo-ebs/hadoop/hadoop-core-0.20.2-cdh3u0.jar:/mnt/bamboo-ebs/hadoop/lib/ant-contrib-1.0b3.jar:/mnt/bamboo-ebs/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/bamboo-ebs/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/bamboo-ebs/hadoop/lib/commons-cli-1.2.jar:/mnt/bamboo-ebs/hadoop/lib/commons-codec-1.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/bamboo-ebs/hadoop/lib/commons-el-1.0.jar:/mnt/bamboo-ebs/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/bamboo-ebs/hadoop/lib/commons-logging-1.0.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/bamboo-ebs/hadoop/lib/commons-net-1.4.1.jar:/mnt/bamboo-ebs/hadoop/lib/core-3.1.1.jar:/mnt/bamboo-ebs/hadoop/lib/hadoop-fairscheduler-0.20.2-cdh3u0.jar:/mnt/bamboo-ebs/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/bamboo-ebs/hadoop/lib/jackson-core-asl-1.5.2.jar:/mnt/bamboo-ebs/hadoop/lib/jackson-mapper-asl-1.5.2.jar:/mnt/bamboo-ebs/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/bamboo-ebs/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/bamboo-ebs/hadoop/lib/jets3t-0.6.1.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-servlet-tester-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jetty-util-6.1.26.jar:/mnt/bamboo-ebs/hadoop/lib/jsch-0.1.42.jar:/mnt/bamboo-ebs/hadoop/lib/junit-4.5.jar:/mnt/bamboo-ebs/hadoop/lib/kfs-0.2.2.jar:/mnt/bamboo-ebs/hadoop/lib/log4j-1.2.15.jar:/mnt/bamboo-ebs/hadoop/lib/mockito-all-1.8.2.jar:/mnt/bamboo-ebs/hadoop/lib/oro-2.0.8.jar:/mnt/bamboo-ebs/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/bamboo-ebs/hadoop/lib/servlet-api-2.5-6.1.14.jar:/mnt/bamboo-ebs/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/bamboo-ebs/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/bamboo-ebs/hadoop/lib/xmlenc-0.52.jar:/mnt/bamboo-ebs/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/bamboo-ebs/hadoop/lib/jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.mapred.TaskTracker &

The JobTracker can be tested with:

bin/hadoop job -list

Feel free to ping me for more information.
