Modifying built-in classes

Inheritance is a powerful idea in object oriented programming. This lets you add new enhancements to existing classes. Well, not exactly. You have to create a new data type derived from the existing classes and use these for additional enhancements.

For example, if we want to add a function palindrome to String class. Solution existing in all (most?) object oriented programming languages is to create a class say Word derived from String and add the function to this class. In Ruby for example it would look like:

Note – replace ‘LT’ with corresponding symbol for less than. HTML is messing it up and I feeling lazy to find the &; version for it.

class Word 'LT' String 
def palindrome? 
   self == self.reverse 
irb> w ="level")
irb> w.palindrome? 
irb> w1 ="simple")
irb> w1.palindrome? 

What we lose here is that we CANNOT call the function on String objects e.g., following is invalid:

irb> "level".palindrome? 
NoMethodError: undefined method `palindrome?' for "level":String

Ruby, quite surprisingly (and did i mention it never fails to amaze!) provides this. It lets you modify the built-in classes. So we can add a function simply by following:

class String 
def palindrome? 
   self == self.reverse 

and it gets your String more powerful!

irb> "level".palindrome? 

Isn’t that coooooool??

Facebook open sources its C++ library

Facebook has announced to open source its C++ library named folly and made it available via Github.

Library is C++11 Components and is claimed to be highly usable and fast. In fact, their introduction page particularly focuses on the performance part of the library. Folly has been tested with gcc 4.6 on 64 bit installations.

I just downloaded the library for a quick look and it should be interesting to peek into the code.

You can also have a look here:

to switch-case or not to !

A friend (and colleague) in an attempt to have a long discussion asked me about my preference between cascaded if-else and switch-case statements and which one is better. I could get away (because was caught into some work I didn’t want to lose attention from) by saying that I will certainly be getting back with facts you want to know and my personal preference is switch-case.

Later in the evening (now) I am in front of firefox with almost seventeen tabs open with
links relating to the comparison between if-else and switch-case statements. I think I should share some statements here (sources are documents and forums available on internet).

The switch-case can only used with integral types, and the case values need to be compile-time constants. An if/else cascade is likely to be slower than switch/case.

switch/case is often implemented using a jump table with the case values as index into the table. The if/else is usually implemented using a cascade of conditional jumps. Hence switch/case often will win on time efficiency.

The best construct to use is the one that is the easiest to understand to the reader (and not the more efficient) (obviously, if it is not killing the program).

For the curious,

A jump table is either a table of addresses (pointers) or jump instructions. An index is used to access the appropriate location, then an action is taken. This can be implemented in C++ using an std::map of <key, function_pointer> or an array of similar structures.

__cdecl and __stdcall

Q. Difference between __cdecl and __stdcall ?

A. Responsibility of cleaning up of arguments on stack is caller’s in case of __cdecl and is callee’s in case of __stdcall. Default call is __cdecl.

some more points to keep in mind:

> Since __stdcall does stack cleanup, the (very tiny) code to perform this task is found in only one place, rather than being duplicated in every caller as it is in __cdecl. This makes the code very slightly smaller, though the size impact is only visible in large programs.

> Variadic functions like printf() are almost impossible to get right with __stdcall, because only the caller really knows how many arguments were passed in order to clean them up. The callee can make some good guesses (say, by looking at a format string), but the stack cleanup would have to be determined by the actual logic of the function, not the calling-convention mechanism itself. Hence only __cdecl supports variadic functions so that the caller can do the cleanup.

> There isn’t really a “right or wrong” with respect to which one is best, but it is positively fatal to “mix and match”. The general principle is “the stack-cleanup must match the arg-pushing”, and this only happens when caller and callee know what the other is doing. Calling a function with the “wrong” convention will destroy the stack.

Details are available here .

return and exit from main: difference

What is the difference between returning from main with some exit code and calling exit giving the exit code as parameter ?

Basically the difference between following programs !

int main()
return 43;

int main()

well, there are three ways for the processes to exit: –

1. Voluntary exit (implicit)
2. Voluntary exit (explicit)
3. Involunatary exit

Voluntary exit can be implicit e.g. in case of return and explicit e.g.
in case of a call to exit(). Involuntary exit is like being killed by a
third process, receiving a SIGSTP signal (or other signals whose default ot set
behavior results in terminating the process).

+ ‘return’ is an implicit voluntary termination of process
+ ‘exit’ is an explicit voluntary termination.

Note: both calls actually execute code in the function ‘do_exit’ ultimately.

I also tried writing above programs and inspecting the assembly just to know if
there is any difference in the code ultimately. Here are they:

[ used ‘objdump -d’ function and showing here only the function ‘main’ disassembly]

//ret.o – main
080482f4 :
80482f4: 55 push %ebp
80482f5: 89 e5 mov %esp,%ebp
80482f7: 83 ec 08 sub $0x8,%esp
80482fa: 83 e4 f0 and $0xfffffff0,%esp
80482fd: b8 00 00 00 00 mov $0x0,%eax
8048302: 29 c4 sub %eax,%esp
8048304: b8 2b 00 00 00 mov $0x2b,%eax
8048309: c9 leave
804830a: c3 ret
804830b: 90 nop

//exit.o – main
08048324 :
8048324: 55 push %ebp
8048325: 89 e5 mov %esp,%ebp
8048327: 83 ec 08 sub $0x8,%esp
804832a: 83 e4 f0 and $0xfffffff0,%esp
804832d: b8 00 00 00 00 mov $0x0,%eax
8048332: 29 c4 sub %eax,%esp
8048334: 83 ec 0c sub $0xc,%esp
8048337: 6a 2b push $0x2b
8048339: e8 26 ff ff ff call 8048264
804833e: 90 nop
804833f: 90 nop

So there is some difference and looking closely it is that in case of ‘exit’
and explicit ‘call’ is made using call instruction (and hence pushing of
arguments to the stack) while this overhead is avoided in case of a return.

Ok. so that was a short dip in the code for two empty programs – result of a free mind.

how to enforce the instantiation of objects on heap ?

A nice question, I came through while being in an interview. Well, so how do you make sure that the object can be instantiated only on heap and not on stack ? so, if you are of the kind who don’t think without being given the use and purpose .. well the purpose is that your class may be a heavy one so storing objects on stack may be a killer.

Answer popped up my mind quite easily. Let’s suppose we have a class A,

class A {
int a[10000];
void print_arr();

Use the C++ access specifiers and make the constructor itself private. Now there is no way to instantiate the class. Pretty useless.

class A {
int a[10000];
void print_arr();

Now let’s make it useful by adding a factory function (or method) to it. This will act as the only way to instantiate an object. As with any factory method, it has to be static to be actually useful. Otherwise its as dumb a class as the one above.

class A {
int a[10000];
static A* getAnObject() { return new A(); }
void print_arr();

well that was it. problem solved. I am such a happy go lucky. But my interviewer friend had yet another idea (probably) in his mind and he asked me for some other way.

I am still puzzled.. some one there for help ?

Sequence Points in C

From the confusions in Expressions in C, as to what is legal and defined, and what is not, I read following from the C FAQs.

A sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete. The sequence points listed in the C standard are:

  • at the end of the evaluation of a full expression (a full expression is an expression statement, or any other expression which is not a subexpression within any larger expression);
  • at the ||, &&, ?:, and comma operators; and
  • at a function call (after the evaluation of all the arguments, and just before the actual call).

The Standard states that

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.

If there are sequence points in the expression, you can be sure that it is defined and legal. e.g. following (stupid) expression

i = i++ && i++;

and following is undefined and illegal since sequence points are missing or if you can understand the languages of common sense – it is confusing if what happens earlier –

a[i] = i++;

You can find the above information here.

This page discusses the ways to avoid confusion and be sure about these.

post/pre increment code – some digging !!

I came across the following C problem this morning. This problems are so similar in looks that i felt a deja vu. I used to think earlier that these problems are just a result of obscure thinking or for the satisfaction of an interviewer’s ego but now my view is different – they also prove as an insight into what happens inside.

What is the output of following program ?

#include <stdio.h>
int main()
int i = 2;
i = i++ * ++i;
printf(“%d\n”, i);

You can try yourself before reading further. If your answer is 10 then you are correct and need not read further, else read on.  To confess, I could not get it correct the first time. Tried my hands on modifying the problem and then reading the disassembled code to get into what’s happening. Finally, I could understand what is going on. All this effort is really worth since it saves from making false assumptions like increment is right associative (which it is not) or these are execptions (which it is not).

Disassembled code for the above C program is. [ only relevant part ]

0x08048395 <main+17>:   movl   $0x2,0xfffffff8(%ebp)
0x0804839c <main+24>:   addl   $0x1,0xfffffff8(%ebp)
0x080483a0 <main+28>:   mov    0xfffffff8(%ebp),%eax
0x080483a3 <main+31>:   imul   0xfffffff8(%ebp),%eax
0x080483a7 <main+35>:   mov    %eax,0xfffffff8(%ebp)
0x080483aa <main+38>:   addl   $0x1,0xfffffff8(%ebp)
0x080483ae <main+42>:   mov    0xfffffff8(%ebp),%eax

This shows the process clearly. It is left associative. First, i is stored on the stack. No increment is done for the expression i++ (post increment). It is incremented for the expression ++i [see addl above ]. Then this is multiplied with itself to get the value of i * ++i. This value is assigned to the i. Finally the increment pending for the first post increment is done.

Program to reverse the words in a string

A friend of mine was asked to write the following program in an interview. He asked me how to do it?

Write a program to reverse the order of the words in a given statement (string). e.g. if input string is “India is a great country”, Output should be “country great a is India”.

This is one of the pet string questions in the interviews. I tried the implementation of a neat way of solving the problem. Here it is. Its not noval way. Algorithm uses iteration. Can you write a recursive one ?

Program is simple and self explanatory about the algorithm.

Update: I just tried writing it in shell – Bash.


for i in $INSTR; do
TMP=`echo $i | rev`

echo IN : $INSTR