Yesterday I was setting up Apache Mahout for work related stuff. After downloading, unzipping and untarring I installed using –
(version 0.7, 0.8 and 0.9 [trunk] all gave the same issue which I am addressing in this post)
mvn -DskipTests clean install
After setting up some environment variables like MAHOUT_HOME I was ready to go for running some machine learning algorithms on some data. I chose Quick Start Guide on the Mahout site. I downloaded the sample data, copied it to Hadoop (cloudera distribution 4.3.0 cdh4) HDFS as instructed on the quick start page. After this, I ran kMeans algorithm on the sample data copied in HDFS:
$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
It failed with following error:
Exception in thread “main” java.lang.NoSuchMethodError:
org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:175)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:192)*
I searched on Google and found many many many results. All pointing to some kind of version mismatch between Hadoop and Mahout OR some issue with CLASSPATH or some thing else. I could not really resolve the issue since was not flexible to change the version of Hadoop and was receiving this error for Mahout v0.7, 0.8 and svn trunk (0.9).
Finally, I ran the algorithm like this (replacing `bin/mahout` with `hadoop -jar apache-mahout-version-job.jar.. `)
$HADOOP_HOME/bin/hadoop -jar $MAHOUT_HOME/examples/target/apache-mahout-0.7-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
And this worked.
Looks like a classpath issue. Hadoop was not able to get access to Mahout 0.7 job JAR file and hence was throwing NoSuchMethodError. I didn’t have much progress resolving the CLASSPATH issue but was happy to get the map-reduce job running and getting the output which I finally copied back to file system.
BTW – this alternative is equivalent of running mahout shell script – so not that you miss anything.
Also note that if you are using Mahout version 0.7 – there is a bug in the script $MAHOUT_HOME/bin/mahout at line number 224 (around that) – read this:
> bin/mahout throws NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
> I think this is due to line 224
> CLASSPATH=”${CLASSPATH}:${MAHOUT_HOME/lib/hadoop/*}”
> I fix the issue by moving the closing brace
> CLASSPATH=”${CLASSPATH}:${MAHOUT_HOME}/lib/hadoop/*”
You can fix it locally. This issue is fixed in 0.8.