Thursday, February 16, 2012

Java Bridge Methods

Before diving into the details of bridge methods, let's swim a little and get familiar with one of the features introduced in Java 5: covariant return types. This feature allows an overriding method to return a more specialized type than the method it overrides. For example:
class Node { }

class TextNode extends Node { }

class NodeFactory {
    public Node makeNode() {
        return new Node();
    }
}

class TextNodeFactory extends NodeFactory {
    public TextNode makeNode() {
        return new TextNode();
    }
}

The TextNodeFactory class overrides the makeNode method and is allowed to return a TextNode because TextNode is a specialization of Node. So far everything looks good. But let's see what the compiled TextNodeFactory looks like (courtesy of javap -c TextNodeFactory):
class TextNodeFactory extends NodeFactory {
TextNodeFactory();
  Code:
   0: aload_0
   1: invokespecial #1; //Method NodeFactory."<init>":()V
   4: return

public TextNode makeNode();
  Code:
   0: new #2; //class TextNode
   3: dup
   4: invokespecial #3; //Method TextNode."<init>":()V
   7: areturn

public Node makeNode();
  Code:
   0: aload_0
   1: invokevirtual #4; //Method makeNode:()LTextNode;
   4: areturn
}
It seems that the compiler generated another method, public Node makeNode(), that simply forwards the call to our public TextNode makeNode() method. Using reflection we find out that this method is a synthetic method, which is OK because the Java Language Specification states that "any constructs introduced by the compiler that do not have a corresponding construct in the source code must be marked as synthetic, except for default constructors and the class initialization method." We also find out that the method is a bridge method.
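
You can verify this with a few lines of reflection. Here's a minimal sketch (the BridgeInspector class name is mine; it assumes the classes above are on the classpath) that prints the synthetic and bridge flags of both makeNode methods:
import java.lang.reflect.Method;

class BridgeInspector {
    public static void main(String[] args) {
        // List both makeNode() methods declared on TextNodeFactory.
        for (Method m : TextNodeFactory.class.getDeclaredMethods()) {
            System.out.println(m.getName()
                    + " returns " + m.getReturnType().getSimpleName()
                    + ", synthetic=" + m.isSynthetic()
                    + ", bridge=" + m.isBridge());
        }
    }
}
The Node makeNode() variant should be reported with both flags set to true.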

What are bridge methods?

Bridge methods are methods generated by the compiler in order to ensure correct overriding. They are needed because at the JVM level the return type is part of the method's internal signature. This means that the bridge method is the one that actually overrides NodeFactory's public Node makeNode() method.
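
This is easy to see when makeNode is called through a NodeFactory reference. A minimal sketch (the CovariantCall class name is mine):
class CovariantCall {
    public static void main(String[] args) {
        NodeFactory factory = new TextNodeFactory();
        // The call site refers to NodeFactory.makeNode(), which returns Node.
        // At runtime the call lands on the generated bridge in TextNodeFactory,
        // which forwards to the TextNode-returning implementation.
        Node node = factory.makeNode();
        System.out.println(node.getClass().getSimpleName()); // prints TextNode
    }
}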

Now let's modify our initial example and introduce generics.
interface Node { }

class TextNode implements Node { }

class NodeFactory<T extends Node> {

    public T makeNode() {
        return null;
    }

    public void releaseNode(T node) {
    }
}

class TextNodeFactory extends NodeFactory<TextNode> {

    public TextNode makeNode() {
        return new TextNode();
    }

    public void releaseNode(TextNode node) {
    }
}
After compilation NodeFactory will look like this:
class NodeFactory extends java.lang.Object{
NodeFactory();
  Code:
   0: aload_0
   1: invokespecial #1; //Method java/lang/Object."<init>":()V
   4: return

public Node makeNode();
  Code:
   0: aconst_null
   1: areturn

public void releaseNode(Node);
  Code:
   0: return

}
You can clearly see the effects of type erasure. The type parameter information is lost and the erasure of T, its bound Node, is used instead. Now let's take a look at TextNodeFactory.
class TextNodeFactory extends NodeFactory{
TextNodeFactory();
  Code:
   0: aload_0
   1: invokespecial #1; //Method NodeFactory."<init>":()V
   4: return

public TextNode makeNode();
  Code:
   0: new #2; //class TextNode
   3: dup
   4: invokespecial #3; //Method TextNode."<init>":()V
   7: areturn

public void releaseNode(TextNode);
  Code:
   0: return

public void releaseNode(Node);
  Code:
   0: aload_0
   1: aload_1
   2: checkcast #2; //class TextNode
   5: invokevirtual #4; //Method releaseNode:(LTextNode;)V
   8: return

public Node makeNode();
  Code:
   0: aload_0
   1: invokevirtual #5; //Method makeNode:()LTextNode;
   4: areturn

}
In this case the compiler generates two bridge methods. The first one, public Node makeNode(), is just another example of covariant return types. The second one, public void releaseNode(Node), overrides the method with the same internal signature from the NodeFactory base class. It also makes sure its argument has the correct type (note the checkcast) before forwarding the call to our method. This is a safeguard for the case when someone makes an unsafe call, like this:
((NodeFactory)textNodeFactory).releaseNode(new Node() { });
and the result will be a java.lang.ClassCastException, thrown by the checkcast inside the bridge method.
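
For completeness, here is the snippet above wrapped in a runnable sketch (the UnsafeRelease class name is mine; the raw-type call compiles with an unchecked warning):
class UnsafeRelease {
    public static void main(String[] args) {
        TextNodeFactory textNodeFactory = new TextNodeFactory();
        // The raw NodeFactory reference accepts any Node at compile time;
        // the releaseNode(Node) bridge then fails its checkcast at runtime.
        ((NodeFactory) textNodeFactory).releaseNode(new Node() { });
    }
}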

There are probably other cases where bridge methods are generated. Besides the fact that Method#isBridge() is part of the public API and that the bridge flag is defined in the class file format, there is no other "official" source of information related to bridge methods, which leads me to the conclusion that bridge methods are a JVM implementation detail that has leaked.

Monday, February 6, 2012

Git: Rebasing Merge Commits

There are many articles related to rebasing, and some of them also describe rebasing merge commits. Long story short: you just completed a merge and somebody pushed a commit before you were able to push yours. The solution is to make git aware of the merge you did.

git rebase --preserve-merges <upstream>
or
git rebase -p <upstream>

This way git will be aware of your merge commit and it will try to redo it when rebasing. Here's what man git-rebase has to say about this:

-p, --preserve-merges
           Instead of ignoring merges, try to recreate them.

           This uses the --interactive machinery internally but
           combining it with the --interactive option explicitly
           is generally not a good idea unless you know what you
           are doing (see BUGS below).

But there's a problem: if your merge had conflicts that you resolved, the resolutions won't be picked up by the rebase machinery, and you will end up resolving the conflicts again ... at least this is the case with git version 1.7.5.4.

But there's hope. We can make our lives easier by using another cool git feature, rerere (Reuse recorded resolution of conflicted merges). Here's what man git-rerere says:

In a workflow employing relatively long lived topic branches, the developer sometimes needs to resolve the same conflicts over and over again until the topic branches are done (either merged to the "release" branch, or sent out and accepted upstream).

This command assists the developer in this process by recording conflicted automerge results and corresponding hand resolve results on the initial manual merge, and applying previously recorded hand resolutions to their corresponding automerge results.

OK, sounds cool, but how can we use rerere now? Let's see... Assuming your last commit was the merge commit, do a git log -n1 and "remember" the sha1 of the commit. And then:

  1. Enable the rerere mechanism.

    git config --global rerere.enabled 1

  2. Go back one commit.

    git reset --hard HEAD^

  3. Redo the merge.  Rerere will record the conflicts.

    git merge --no-ff <topic>

  4. "resolve" conflicts

    git checkout <commit> -- ./
    git add -u

  5. Complete the merge. Rerere will record the conflict resolution.

    git commit

  6. Go back to where we started.

    git reset --hard <commit>

  7. Do the rebase again. 

    git rebase -p <upstream>

where <commit> is the sha1 of the merge commit and <upstream> is the remote branch. This time the rebase will automatically reuse the manual resolutions previously recorded by the rerere mechanism.

Now relax, there's no hurry in doing the push ;)

Friday, February 3, 2012

Numbers Everyone Should Know

When designing efficient systems, an important skill is being able to quickly estimate the performance of a design without actually building it. In his 2007 presentation Software Engineering Advice from Building Large-Scale Distributed Systems, Jeff Dean gives a few examples of the technique. One of the key elements is being familiar with the costs of the various operations used in your design. Let's have a look at the list of latency numbers from the presentation:

L1 cache reference                                     0.5 ns
Branch mispredict                                      5 ns
L2 cache reference                                     7 ns
Mutex lock/unlock                                    100 ns
Main memory reference                                100 ns
Compress 1K bytes with Zippy                      10,000 ns
Send 2K bytes over 1 Gbps network                 20,000 ns
Read 1 MB sequentially from memory               250,000 ns
Round trip within same datacenter                500,000 ns
Disk seek                                     10,000,000 ns
Read 1 MB sequentially from network           10,000,000 ns
Read 1 MB sequentially from disk              30,000,000 ns
Send packet CA->Netherlands->CA              150,000,000 ns 
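
As a quick, hypothetical illustration of the estimation technique, here is a small sketch (the BackOfEnvelope class name and the 100 MB workload are mine) that uses the numbers above to compare reading 100 MB sequentially from memory versus from disk:
class BackOfEnvelope {
    // Latency numbers from the 2007 presentation, in nanoseconds.
    static final long READ_1MB_FROM_MEMORY_NS = 250000L;
    static final long READ_1MB_FROM_DISK_NS = 30000000L;
    static final long DISK_SEEK_NS = 10000000L;

    public static void main(String[] args) {
        long megabytes = 100;
        // Sequential read from memory: 100 * 0.25 ms = 25 ms.
        long memoryMs = megabytes * READ_1MB_FROM_MEMORY_NS / 1000000;
        // One seek plus a sequential read from disk: 10 ms + 100 * 30 ms, roughly 3 s.
        long diskMs = (DISK_SEEK_NS + megabytes * READ_1MB_FROM_DISK_NS) / 1000000;
        System.out.println("memory: ~" + memoryMs + " ms, disk: ~" + diskMs + " ms");
    }
}
The exact figures matter less than the orders of magnitude: roughly a factor of 100 between memory and disk for this workload.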

While these numbers depend on the computer and infrastructure hardware being used, they usually improve with newer generations of hardware. Check out a newer presentation from 2009 here, which shows different figures.

L1 cache reference                                     0.5 ns
Branch mispredict                                      5 ns
L2 cache reference                                     7 ns
Mutex lock/unlock                                     25 ns
Main memory reference                                100 ns
Compress 1K bytes with Zippy                       3,000 ns
Send 2K bytes over 1 Gbps network                 20,000 ns
Read 1 MB sequentially from memory               250,000 ns
Round trip within same datacenter                500,000 ns
Disk seek                                     10,000,000 ns
Read 1 MB sequentially from network           10,000,000 ns
Read 1 MB sequentially from disk              20,000,000 ns
Send packet CA->Netherlands->CA              150,000,000 ns

[Edit]

As technology evolves and hardware is getting faster (more GHz for CPUs, faster DRAM, higher throughput NICs & networking gear) some of these numbers are getting smaller. Here is a nice visualization that captures this evolution: https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html

One of the early efforts to visualize latency is slide #23, "How far away is the data?", from the NetStore '99 keynote. But maybe the most famous visual aid is Grace Hopper's nanosecond, which makes you aware of the fundamental limit imposed by the speed of light on technology in general and, indirectly, on these latency numbers.
