java.lang.NoSuchMethodError while running Mahout example

Yesterday I was setting up Apache Mahout for work. After downloading and unpacking the archive, I built and installed it with the command below. (Versions 0.7, 0.8 and 0.9 [trunk] all gave the same issue, which I am addressing in this post.)

mvn -DskipTests clean install

After setting up a few environment variables such as MAHOUT_HOME, I was ready to run some machine learning algorithms on some data. I followed the Quick Start Guide on the Mahout site: I downloaded the sample data and copied it to HDFS on Hadoop (Cloudera distribution, CDH 4.3.0) as instructed on the quick start page. After this, I ran the k-means algorithm on the sample data in HDFS:

$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

It failed with the following error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:175)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:192)

I searched on Google and found many, many results, all pointing to some kind of version mismatch between Hadoop and Mahout, a CLASSPATH problem, or something else. I could not really resolve the issue, since I was not free to change the Hadoop version and was getting this error with Mahout 0.7, 0.8 and svn trunk (0.9).

Finally, I ran the algorithm like this (replacing `bin/mahout` with `hadoop -jar apache-mahout-version-job.jar.. `)

$HADOOP_HOME/bin/hadoop -jar $MAHOUT_HOME/examples/target/apache-mahout-0.7-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

And this worked.

It looks like a classpath issue: Hadoop was apparently unable to access the Mahout 0.7 job JAR and hence threw NoSuchMethodError. I did not make much progress on resolving the CLASSPATH issue, but I was happy to get the map-reduce job running and producing output, which I finally copied back to the local file system.

BTW, this alternative is equivalent to running the mahout shell script, so you are not missing anything.

Also note that if you are using Mahout 0.7, there is a bug in the script $MAHOUT_HOME/bin/mahout at around line 224. Read this:

> bin/mahout throws NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
> I think this is due to line 224
> CLASSPATH="${CLASSPATH}:${MAHOUT_HOME/lib/hadoop/*}"
> I fix the issue by moving the closing brace
> CLASSPATH="${CLASSPATH}:${MAHOUT_HOME}/lib/hadoop/*"

You can fix it locally. This issue is fixed in 0.8.

Ruby require error in loading gems

It has been a long time since I coded in Ruby, so I thought, let's rekindle the love.

I started reading about Facebook's Graph API to get an idea of the capabilities it provides. The Graph API is immensely powerful in the kind of data it lets developers access; no wonder such a powerful ecosystem grew around it. Graph API responses are in JSON (JavaScript Object Notation). I have little familiarity with this representation of data; coming from the old school, I have mostly played with XML. So I thought, let's first practice JSON in Ruby. First step:

$ gem install json

It was easily installed.

Then I wrote a script that loads the JSON gem, i.e.

require 'json'

and, just to test that this was fine, I executed it (being almost sure that it would work and I would move on). The Ruby interpreter yelled!

$ ruby temp.rb
temp.rb:1:in `require': no such file to load -- json (LoadError)
from temp.rb:1

Hmm. Some googling helped (and refreshed my memory).

There are three ways to handle this:

1. Put the following line at the start of the script

require 'rubygems'

2. Execute it as:

$ ruby -rubygems script.rb

3. Add rubygems to RUBYOPT

$ export RUBYOPT="rubygems"

Well, just now I read this article on GitHub, and it clearly says:

You should never do this in a source file included with your library,
app, or tests:

require 'rubygems'

Why You Shouldn’t Force Rubygems On People!

So guys, (2) and (3) are the way to go and (1) is to be avoided if you plan to share your Ruby script.

Facebook open sources its C++ library

Facebook has open-sourced its C++ library, named folly, and made it available on GitHub.

The library is a collection of C++11 components and is claimed to be highly usable and fast. In fact, the introduction page focuses particularly on the performance of the library. Folly has been tested with gcc 4.6 on 64-bit installations.

I just downloaded the library for a quick look and it should be interesting to peek into the code.

You can also have a look here: https://github.com/facebook/folly/blob/master/folly/docs/Overview.md

Is it possible to call a function on a NULL object without segfaulting?

Oh yes it is!

While debugging my project at work I came across this interesting case. The inspector (Watch or Expressions in Visual Studio) showed that the variable on which the function was being called was NULL (0x0), i.e. invalid, and still the function call went through fine. The variable was initialized only later. I thought this must be one of the weird things that compilers and IDEs sometimes do, but it was not: I observed the same behavior after restarting the IDE and after changing the compiler.

If your class has no data associated with it (no member variables), its non-virtual member functions can be called without allocating any memory for an object. Strictly speaking, the C++ standard treats calling a member function through a null pointer as undefined behavior, but with common compilers it works in practice because such a function never needs to dereference the object pointer.

For example, consider the following class and program:

#include <cstddef>
#include <iostream>
using namespace std;

class A
{
public:
    void printA() { cout << "A\n"; }
    void printB() { cout << "B\n"; }
};

int main ()
{
    A * a = NULL;   // no object is ever created
    a->printA();
    a->printB();
}

With the MS compiler cl.exe, you can also call a function on a NULL object when the class does define a member variable, as long as the function does not access it.

#include <cstddef>
#include <iostream>
using namespace std;

class A
{
    int i;          // defined but never accessed
public:
    void printA() { cout << "A\n"; }
    void printB() { cout << "B\n"; }
};

int main ()
{
    A * a = NULL;
    a->printA();
    a->printB();
}

The following, however, will SEGFAULT, because printA() reads the member variable i through the NULL pointer:

#include <cstddef>
#include <iostream>
using namespace std;

class A
{
    int i;
public:
    void printA() { cout << "A\n" << i; }   // reads i through this
    void printB() { cout << "B\n"; }
};

int main ()
{
    A * a = NULL;
    a->printA();
    a->printB();
}

And yes, you can do this in C++ but not in Java.

Blank Page on install page of wordpress & resetting the forgotten root password for MySQL

WordPress installation has always been painless for me, but today it got me into trouble. Well, it was not WordPress, it was MySQL. Well, not even MySQL: it was me. I forgot the root password. The whole story:

I was setting up WordPress. It showed me a blank page when I accessed wp-admin/install.php for installation. I was left wondering!

Then I added the following line to wp-config.php:

define('WP_DEBUG', true); // debugging mode: 'true' = enable; 'false' = disable

This enabled debug output and got me past the "blank page" problem: it showed the error behind the blank page. WordPress could not connect to the database.

This got me to look into phpMyAdmin, which reported the issue:

#1045 - Access denied for user 'root'@'localhost' (using password: YES)

I had put a blank password in the configuration file, hence the error, and I had forgotten the root password. The following steps solved the issue:

1. Stop the mysqld service.

2. Start the mysqld service in safe mode and without reading the grant tables

mysqld --safe-mode --skip-grant-tables &

3. Now use the mysql client to connect to the daemon and change the password

mysql> UPDATE mysql.user SET Password=PASSWORD('MyNewPass') WHERE User='root';

mysql> FLUSH PRIVILEGES;

4. Stop the safe mode daemon

5. Start the service normally.

Orthogonality and its importance in software development

Lately I have been reading The Pragmatic Programmer by Andrew Hunt & David Thomas. Having reached a chapter about the need for decoupling in software development, I thought of putting a few lines on the weblog. The term orthogonality comes from geometry, where it describes two lines that meet at right angles and are hence independent: moving along one does not change your position along the other. In software, orthogonality refers to independence between the modules of the software, e.g. the user interface should not depend on the database schema. Poor decoupling in a design can lead to disaster in code maintenance. Decoupled code is easier to maintain for several reasons:

1. Changes are localized and hence development and testing time (and cost) are reduced. Quality also improves since better division of work is possible.

2. Problems are also localized. An issue in one module does not affect other modules, and hence the fix needs to be made there only (or the whole module can be replaced by another implementation altogether).

3. Smaller independent teams become more feasible (which is ideal for better coordination).

An interesting angle on orthogonality is Aspect-Oriented Programming (AOP), which began as a research project at Xerox PARC. Where object-oriented programming focuses on objects and their interaction, aspect-oriented programming focuses on aspects (concerns). AOP lets you express in one place a behavior that would otherwise be distributed throughout the source code. The most obvious example is logging. Log messages are normally generated by sprinkling explicit calls to some log function throughout the code. With AOP, you implement logging orthogonally to the things being logged. Using AOP for Java, you could write a log message on entering any method of class Fred by coding the aspect:

aspect Trace {
    advise * Fred.*(..) {
        static before {
            Log.write("-> Entering " + thisJoinPoint.methodName);
        }
    }
}

If you weave this aspect in your code then log messages will be generated and if you don’t, they won’t. Either way, your original source is unchanged.

Towards the end of the discussion there is a challenge: consider the large GUI-oriented tools typically available on Windows and the small but combinable command-line tools used at shell prompts. Which do you think are more orthogonal in design?

What do you think?

failed to retrieve keys – Automatix !

After installing Automatix on Ubuntu, when I started it, it seemed to sleep for some time and then said: failed to retrieve keys.

Looking at the `ps -aef` output, I could see that it had been trying to get the keys with a timeout value, and it showed the error when the timeout occurred.

So it had something to do with the firewall or the proxy settings! I tried shutting the firewall down. That did not help.

Then I went to System -> Preferences -> Network Proxy and entered the proxy information there. Worked!!

atomicity and alignment of data in memory

A data item is aligned in memory when its address is a multiple of its size in bytes. For instance, the address of an aligned short integer must be a multiple of two while the address of an aligned integer must be a multiple of four.

Why is it important to know about alignment?

Assembly language instructions that make zero or one aligned memory access are atomic. Generally, an unaligned memory access is not atomic.

fork and vfork

quick question: what's the difference between the fork() and vfork() system calls?

quick answer: the vfork() system call creates a process that shares the memory address space of its parent, and the parent is suspended until the child exits or executes a new program.

details:

fork() is implemented by Linux as a clone() system call whose flags parameter specifies the SIGCHLD signal with all the clone flags cleared, and whose child_stack parameter is 0.

vfork() is implemented by Linux as a clone() system call whose flags parameter specifies the SIGCHLD signal together with the flags CLONE_VM and CLONE_VFORK, and whose child_stack parameter is 0.

[ discussion: copy on write ]

This is a technique for making process creation with fork() efficient: instead of copying the parent's address space at process creation, the parent and child share it, and as soon as either of them writes to a page, the kernel allocates a new page and assigns it to the writing process.

Most of the time, forking is required just to run a new program, in which case it is a waste to copy the whole parent address space.