Recently I was building alerting infrastructure for Hadoop and needed a way to track the active jobs in a cluster.

To start, you need an instance of JobClient. JobClient is a wrapper around the JobTracker RPC interface;
under the hood, it creates a JobSubmissionProtocol instance:

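    public void init() throws IOException {
        String tracker = conf.get("mapred.job.tracker", "local");
        if ("local".equals(tracker)) {
          // No JobTracker configured: jobs run in-process.
          this.jobSubmitClient = new LocalJobRunner(conf);
        } else {
          // Otherwise, create an RPC proxy to the remote JobTracker.
          this.jobSubmitClient = (JobSubmissionProtocol)
            RPC.getProxy(JobSubmissionProtocol.class,
                         JobSubmissionProtocol.versionID,
                         JobTracker.getAddress(conf), conf);
        }
    }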

Let’s code:

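    import java.net.InetSocketAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobStatus;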

Initialize a JobClient instance:

    JobClient jobClient = new JobClient(
        new InetSocketAddress(jobTrackerHost, jobTrackerPort), new Configuration());

where jobTrackerHost and jobTrackerPort are the host name and port on which the JobTracker is running.

To get the list of currently active jobs in the cluster, all you have to do is:

    JobStatus[] activeJobs = jobClient.jobsToComplete();

This gives you an array of all jobs that have not yet completed. JobStatus is pretty useful: you can get the job ID, the username that was used to submit the job, the start time…

I created sample code that prints the number of active jobs and their details every n seconds; have a look at it on my GitHub.
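For readers who just want the shape of such a polling loop, here is a rough sketch. It is not the code from the repository. To keep it runnable without Hadoop on the classpath, it uses a hypothetical `ActiveJob` class as a stand-in for the JobStatus fields mentioned above (job ID, username, start time), and a `Supplier` as a stand-in for the call to `jobClient.jobsToComplete()`. The names `ActiveJobMonitor`, `formatReport` and `poll` are also made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class ActiveJobMonitor {

    // Hypothetical holder for the JobStatus fields used in the report.
    static final class ActiveJob {
        final String jobId;
        final String username;
        final long startTimeMillis;

        ActiveJob(String jobId, String username, long startTimeMillis) {
            this.jobId = jobId;
            this.username = username;
            this.startTimeMillis = startTimeMillis;
        }
    }

    // Builds one report: a count line followed by one line per active job.
    static String formatReport(List<ActiveJob> jobs, long nowMillis) {
        StringBuilder sb = new StringBuilder();
        sb.append("Active jobs: ").append(jobs.size()).append('\n');
        for (ActiveJob job : jobs) {
            long runningSeconds = (nowMillis - job.startTimeMillis) / 1000;
            sb.append(String.format("  %s user=%s running=%ds",
                    job.jobId, job.username, runningSeconds));
            sb.append('\n');
        }
        return sb.toString();
    }

    // Polls the source a fixed number of times, sleeping between polls.
    static void poll(Supplier<List<ActiveJob>> source, int intervalSeconds, int rounds)
            throws InterruptedException {
        for (int i = 0; i < rounds; i++) {
            System.out.print(formatReport(source.get(), System.currentTimeMillis()));
            if (i < rounds - 1) {
                Thread.sleep(intervalSeconds * 1000L);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long now = System.currentTimeMillis();
        // Canned data for the demo. A real source would call
        // jobClient.jobsToComplete() and copy each JobStatus into an ActiveJob.
        List<ActiveJob> sample = Arrays.asList(
                new ActiveJob("job_0001", "alice", now - 90_000L),
                new ActiveJob("job_0002", "bob", now - 5_000L));
        poll(() -> sample, 1, 2);
    }
}
```

Keeping the formatting separate from the polling makes the report easy to check on its own, and swapping the canned supplier for one backed by JobClient should not require touching the loop.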