Beware of Collection.retainAll in Java!

Yesterday I got a nasty bug! Basically, I observed that a map – which was populated with about 20 values (no significance of particular number here) on first access (in a static manner) – was missing most of those values when accessed later. Begun the debugging.. and finally the culprit was caught.

Culprit happened to be the function in Java collections:

boolean retainAll(Collection<?> c)

This function retains only the elements contains in the original collection. Basically this gets called over a collection (let’s call it original) and takes as an argument another collection (let’s call it smallset). I will reproduce a simplified version of my code here to explain the issue and how to resolve it:

class A {
 public static void main(String[] args) throws Exception {
 Map<String, Integer> original = new HashMap();
 original.put("Pen", 1);
 original.put("Color", 2);
 original.put("Paper", 3);
 original.put("Envelope", 4);
 original.put("Eraser", 5);
 original.put("Crayon", 6);
 System.out.println("BEFORE: original map size: " 
              + original.size());

 Set<String> smallset = new HashSet();
 smallset.add("Color");
 smallset.add("Crayon");
 System.out.println("BEFORE: smallset set size: " 
             + smallset.size());


 Set<String> originalKeys = original.keySet();
 originalKeys.retainAll(smallset);
 System.out.println("AFTER: original map size: " 
    + original.size());
 System.out.println("AFTER: smallset set size: " 
   + smallset.size());
 }
}

In short, there is an original HashMap which contains six different keys and there is a small set of values with which we want the intersection from original Map. We get the keySet from the original map and call retainAll on the same. Idea is to get the intersection of the map keys and the small set. Output of this program is:

mawasthi@mawasthi-1:~/scratch$ java -cp . A
BEFORE: original map size: 6
BEFORE: smallset set size: 2
AFTER: original map size: 2
AFTER: smallset set size: 2

So you get it – right? Since we used keySet() on Map, since Java returns references and since retainAll has a side-effect of modifying the original map and hence our original map becomes as short as passed small set.

How to fix this?

1) Clone the Map.keySet() output. OR
2) Create a new HashSet and do a addAll of Map.keySet to it.

Idea in (1) and (2) is to create a copy.

May be this is because I’m coming from the C/C++ background (who is in love with the concepts of functional programming aka no side-effects) that I hate the arguments getting modified in any manner within a function.

Understanding unicode in python and writing text in devanagri script

This Unicode HOWTO by Python Software Foundation is a short but informative read about Unicode handling in python. Understanding the brief history of characters standardization from ASCII to Unicode is important not only when you’re working on the stuff that needs it but also since more and more services interact with each other over Web and internationalization (or localization or l10n) is omnipresent.

In python (v2.7) playing around with unicode is super easy.

A normal string in python:

$ s = "abcde"
$ type(s)
str 

A unicode string in python:

$ s = unicode("abcde")
$ type(s)
unicode 

A character is the smallest possible component of a text. ‘A’, ‘B’, ‘C’, etc., are all different characters. So are ‘È’ and ‘Í’. The Unicode standard describes how characters are represented by code points. A code point is an integer value, usually denoted in base 16. In the standard, a code point is written using the notation U+12ca to mean the character with value 0x12ca (4810 decimal). Strictly, these definitions imply that it’s meaningless to say ‘this is character U+12ca’. U+12ca is a code point, which represents some particular character; in this case, it represents the character ‘ETHIOPIC SYLLABLE WI’. In informal contexts, this distinction between code points and characters will sometimes be forgotten.

So, a Unicode string is a sequence of code points, which are numbers from 0 to 0x10ffff.

The rules for translating a Unicode string into a sequence of bytes are called an encoding. UTF-8 is one of the most commonly used encodings. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit numbers are used in the encoding. (There’s also a UTF-16 encoding, but it’s less frequently used than UTF-8.) UTF-8 uses the following rules:

  • If the code point is <128, it’s represented by the corresponding byte value.
  • If the code point is between 128 and 0x7ff, it’s turned into two byte values between 128 and 255.
  • Code points >0x7ff are turned into three- or four-byte sequences, where each byte of the sequence is between 128 and 255.

UTF-8 has several convenient properties:

  • It can handle any Unicode code point.
  • A Unicode string is turned into a string of bytes containing no embedded zero bytes. This avoids byte-ordering issues, and means UTF-8 strings can be processed by C functions such as strcpy() and sent through protocols that can’t handle zero bytes.
  • A string of ASCII text is also valid UTF-8 text.
  • UTF-8 is fairly compact; the majority of code points are turned into two bytes, and values less than 128 occupy only a single byte.

That was some important and interesting theory.

Now out of curiosity I wanted to print Devanagari script characters (Hindi characters to be precise). So I searched over google for Devanagari characters unicode list and found this page from unicode.org comprising of all devanagari script unicode values.

first_name = u'\u092e' + u'\u0928' + u'\u094b' + u'\u091c'
last_name = u'\u0905' + u'\u0935' + u'\u0938' + u'\u094d' \
+ u'\u0925' + u'\u0940'

print first_name + ' ' + last_name

This prints the devanagari characters (basically my name). Output on a ipython notebook is clear and on terminal bit unclear. Looks beautiful.

मनोज अवस्थी

Python Error “TypeError ‘float’ object not callable”

If you see this error in python: 

TypeError ‘float’ object not callable

It seems that you are trying to call a method or function and a property or variable with the same name is available in the script. Check for the same. If it was a mistake to use the variable with same name as method – fix it by renaming the variable OR if this property is what you want to access and not call any function then remove parentheses `()’ after property. 

 

 

 

Tower of Hanoi in Haskell

Before moving into Types in Haskell which is one big chapter in this book I am following I thought of having some more familiarization with the language.

How about solving Tower of Hanoi problem through haskell?

hanoi (0, _, _, _) = []
hanoi (n,from,to,using) =
         hanoi(n-1,from,using,to) ++
         [(from, to)] ++
         hanoi(n-1,using,to,from)

Running the program for n = 3 (3 discs) and 1, 2, 3 as the labels for from, using and to towers –

*Main> dohanoi (3,1,3,2)
[(1,3),(1,2),(3,2),(1,3),(2,1),(2,3),(1,3)]

If you just want the number of moves in the problem, take length of the list

length $ dohanoi (3,1,3,2)

This is equal to 2 ^ n – 1.

Quick Sort – Poster boy for Haskell’s Elegance

Finally I reached to write Quick Sort code in Haskell. This code generally requires ~ 10-12 LOC in imperative languages (I implemented last in C & Perl). In Haskell (and not surprisingly) it is much lesser length and much intuitive than the pivot – if – else code. That beauty is functional programming language’s – You don’t tell “how to do” but “what to do”.

Quick Sort Algorithm is basically:

A sorted list is a list that has all values smaller than the head of the list (sorted values), then comes the head and then come all the values larger than the list (sorted values).

First take care of the edge case i.e. Empty list.

Then handle the list. Use Set comprehensions and get the list of elements < the head and >= the head and then concatenate. We have two smaller list on which the function can be applied recursively. Recursion is such a beauty.

quickSort’ :: (Ord a) => [a] -> [a]
quickSort’ [] = []
quickSort’ (x:xs) = quickSort’ [ y | y <- xs, y < x]
                        ++ [x]
                        ++ quickSort’ [y | y <- xs, y >= x]

Set comprehensions in Haskell

Remember “Set comprehensions” in Mathematics?

Being of mathematical nature Haskell provides the same kind of programming touch feel. It has a notion of list comprehensions i.e. you define a set of answers in a set notation form. For example to convey that you need a list of all integers greater than or equal to 1 and less than or equal to 100 you can write: 

[ x | x <- [1..100] ]

Here we used Range in Haskell to represent the list [1,2,3,4,..100].

A simple problem: get all right triangles whose perimeter is 24 and each of whose sides is less than 10

There is a concept of Tuples in Haskell which is very close to structures in C/C++. We can use a tuple that contains three sides and then apply the transformations (restrictions given above). First is that each side is less 10, second that it should be a right triangle and third that perimeter should be 24,

[ (x,y,z) | x <- [1..10], y <- [1..10], z <- [1..10], x^2 + y^2 == z^2, x+y+z == 24 ]

This one liner looks so cute.

Starting with Haskell

Few days back I started learning Haskell, a purely functional programming language. A functional programming language has a different paradigm altogether from the so called imperative languages (C, C++, Java, Python, Ruby, Perl etc.). Haskell is not a new language and was originated in 1987 undergoing huge changes during the course of time.

As the book “Learn You a Haskell for Great Good” mentions:

In purely functional programming you don’t tell the computer what to do as such but rather you tell it what stuff is. The factorial of a number is the product of all the numbers from 1 to that number, the sum of a list of numbers is the first number plus the sum of all the other numbers, and so on.

Haskell is –

  • Purely functional programming language
  • Lazy
  • Statically Typed
  • Elegant & Concise

I will keep writing further on Haskell and my programming experiences with it.

Flex: Storing custom user strings in controls

Few days back while I was programming on Flex this cool problem came through. Problem was that I created buttons dynamically in the code based on some parameters passed to the function. These parameters would decide the label, id and color of the button. In addition to this there was an information called category which had a contextual meaning to the project I was working on. So it went something like this – 

private function InitializeAndDisplayButton(id:String, label:String, category:String)
{
var b:Button = new Button(); 
b.id = id; 
b.label = label; 
b.addEventListener(MouseEvent.CLICK, buttonHandler);
}

private function buttonHandler(event:Event):void
{
var id:String = event.currentTarget.id; 
/* how do I access “category” here */

Now the problem, how do you store the category for each button so that in the button handler you access it through the event? 

Solution I

A naive solution came to mind that I can keep a private map (flash.utils.Dictionary) between id and the category and add an entry corresponding to each id in the function InitializeAndDisplayButton and access it in buttonHandler after extracting “id”. 

Solution II – (topic of this post) 

Another solution is a cool one certainly. Function “SetStyle” takes two parameters and stores them as key-value pairs. We can use this to store any custom user strings (data) in controls. e.g. we can add following line in InitializeAndDisplayButton – 

b.setStyle(“ButtonCategory”, category); 

and access it with following code in function buttonHandler

var id:String = event.currentTarget.getStyle(“ButtonCategory”); 

This looks cool to me! 😀 

_