Thursday, February 16, 2012

Java Bridge Methods

Before diving into the details of bridge methods, let's swim a little and get familiar with one of the features introduced in Java 5: covariant return types. This feature allows an overriding method to return a more specialized type than the method it overrides. For example:
class Node { }

class TextNode extends Node { }

class NodeFactory {
    public Node makeNode() {
        return new Node();
    }
}

class TextNodeFactory extends NodeFactory {
    public TextNode makeNode() {
        return new TextNode();
    }
}

The TextNodeFactory class overrides the makeNode method and is allowed to return a TextNode because TextNode is a specialization of Node. So far everything looks good. But let's see what the compiled TextNodeFactory looks like (courtesy of javap -c TextNodeFactory):
class TextNodeFactory extends NodeFactory {
TextNodeFactory();
  Code:
   0: aload_0
   1: invokespecial #1; //Method NodeFactory."<init>":()V
   4: return

public TextNode makeNode();
  Code:
   0: new #2; //class TextNode
   3: dup
   4: invokespecial #3; //Method TextNode."<init>":()V
   7: areturn

public Node makeNode();
  Code:
   0: aload_0
   1: invokevirtual #4; //Method makeNode:()LTextNode;
   4: areturn
}
It seems that the compiler generated another method, public Node makeNode(), that simply forwards the call to our public TextNode makeNode(). Using reflection we find out that the generated method is a synthetic method, which is fine because the Java Language Specification states: "Any constructs introduced by the compiler that do not have a corresponding construct in the source code must be marked as synthetic, except for default constructors and the class initialization method." We also find out that the method is a bridge method.
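We can confirm this with a small reflection check. Here is a self-contained sketch (the entity classes match the example above; the driver class BridgeCheck is a made-up name):

```java
// Inspect the declared methods of TextNodeFactory via reflection and
// report which of them are bridge/synthetic methods.
import java.lang.reflect.Method;

class Node { }
class TextNode extends Node { }

class NodeFactory {
    public Node makeNode() {
        return new Node();
    }
}

class TextNodeFactory extends NodeFactory {
    public TextNode makeNode() {
        return new TextNode();
    }
}

public class BridgeCheck {
    public static void main(String[] args) {
        // Two methods show up: our TextNode makeNode() and the
        // compiler-generated Node makeNode() bridge.
        for (Method m : TextNodeFactory.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName()
                    + " makeNode(): bridge=" + m.isBridge()
                    + ", synthetic=" + m.isSynthetic());
        }
    }
}
```

Running this prints one line per declared method; the Node-returning one reports bridge=true and synthetic=true.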

What are bridge methods?

Bridge methods are methods generated by the compiler to ensure correct overriding. They are needed because, at the JVM level, the return type is part of a method's internal signature. This means that our public TextNode makeNode() does not, as far as the JVM is concerned, override public Node makeNode(); the bridge method is the one that actually overrides NodeFactory's public Node makeNode() method.

Now let's modify our initial example and introduce generics.
interface Node { }

class TextNode implements Node { }

class NodeFactory<T extends Node> {

    public T makeNode() {
        return null;
    }

    public void releaseNode(T node) {
    }
}

class TextNodeFactory extends NodeFactory<TextNode> {

    public TextNode makeNode() {
        return new TextNode();
    }

    public void releaseNode(TextNode node) {
    }
}
After compilation NodeFactory will look like this:
class NodeFactory extends java.lang.Object{
NodeFactory();
  Code:
   0: aload_0
   1: invokespecial #1; //Method java/lang/Object."<init>":()V
   4: return

public Node makeNode();
  Code:
   0: aconst_null
   1: areturn

public void releaseNode(Node);
  Code:
   0: return

}
You can clearly see the effects of type erasure. The type information is lost and the upper bound of the type parameter (i.e. Node) is used instead. Now let's take a look at TextNodeFactory.
class TextNodeFactory extends NodeFactory{
TextNodeFactory();
  Code:
   0: aload_0
   1: invokespecial #1; //Method NodeFactory."<init>":()V
   4: return

public TextNode makeNode();
  Code:
   0: new #2; //class TextNode
   3: dup
   4: invokespecial #3; //Method TextNode."<init>":()V
   7: areturn

public void releaseNode(TextNode);
  Code:
   0: return

public void releaseNode(Node);
  Code:
   0: aload_0
   1: aload_1
   2: checkcast #2; //class TextNode
   5: invokevirtual #4; //Method releaseNode:(LTextNode;)V
   8: return

public Node makeNode();
  Code:
   0: aload_0
   1: invokevirtual #5; //Method makeNode:()LTextNode;
   4: areturn

}
In this case the compiler generates two bridge methods. The first one, public Node makeNode(), is just another instance of covariant return types. The second one, public void releaseNode(Node), overrides the method with the same erased signature from the NodeFactory base class. It also makes sure its argument has the correct type before forwarding the call to our method. This is a safeguard for the case when someone makes an unsafe call, like this:
((NodeFactory)textNodeFactory).releaseNode(new Node() { });
and the result is a java.lang.ClassCastException, thrown by the checkcast instruction inside the bridge method.
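Here is a self-contained sketch of that failure (the entity classes match the generic example above; the driver class BridgeCast is a made-up name):

```java
// Calling releaseNode through a raw NodeFactory reference dispatches
// to the bridge method, whose checkcast to TextNode then fails.
interface Node { }
class TextNode implements Node { }

class NodeFactory<T extends Node> {
    public T makeNode() { return null; }
    public void releaseNode(T node) { }
}

class TextNodeFactory extends NodeFactory<TextNode> {
    public TextNode makeNode() { return new TextNode(); }
    public void releaseNode(TextNode node) { }
}

public class BridgeCast {
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static void main(String[] args) {
        NodeFactory factory = new TextNodeFactory(); // raw type: unchecked call
        try {
            factory.releaseNode(new Node() { });     // dispatches to the bridge
            System.out.println("no exception");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException from the bridge's checkcast");
        }
    }
}
```

The compiler only emits an unchecked warning for the raw call; it is the bridge method that catches the mistake at run time.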

There are probably other cases where bridge methods are generated. Besides the fact that Method#isBridge() is part of the public API and that the ACC_BRIDGE flag is defined in the class file format, there is no other "official" source of information related to bridge methods, which leads me to the conclusion that bridge methods are a JVM implementation detail that has leaked.

Monday, February 6, 2012

Git: Rebasing Merge Commits

There are many articles related to rebasing, and some of them also describe rebasing merge commits. Long story short: you just completed a merge, and somebody pushed a commit before you were able to push yours. The solution is to make git aware of the merge you did.

git rebase --preserve-merges <upstream>
or
git rebase -p <upstream>

This way git will be aware of your merge commit and it will try to redo it when rebasing. Here's what man git-rebase has to say about this:

-p, --preserve-merges
           Instead of ignoring merges, try to recreate them.

           This uses the --interactive machinery internally but
           combining it with the --interactive option explicitly
           is generally not a good idea unless you know what you
           are doing (see BUGS below).

But there's a problem: if your merge had conflicts that you resolved, the resolutions won't be picked up by the rebase machinery, and you will end up resolving the same conflicts again... at least this is the case with git version 1.7.5.4.

But there's hope. We can make our lives easier by using another cool git feature, rerere (Reuse recorded resolution of conflicted merges). Here's what man git-rerere says:

In a workflow employing relatively long lived topic branches, the developer sometimes needs to resolve the same conflicts over and over again until the topic branches are done (either merged to the "release" branch, or sent out and accepted upstream).

This command assists the developer in this process by recording conflicted automerge results and corresponding hand resolve results on the initial manual merge, and applying previously recorded hand resolutions to their corresponding automerge results.

OK, sounds cool, but how do we use rerere now? Let's see... Assuming your last commit was the merge commit, run git log -n1 and "remember" the sha1 of the commit. And then:

  1. Enable rerere mechanism.

    git config --global rerere.enabled 1

  2. Go back one commit 

    git reset --hard HEAD^

  3. Redo the merge.  Rerere will record the conflicts.

    git merge --no-ff <topic>

  4. "Resolve" the conflicts by restoring the files from the merge commit.

    git checkout <commit> -- ./
    git add -u

  5. Complete the merge. Rerere will record conflict resolution.

    git commit

  6. Go back where we started.

    git reset --hard <commit>

  7. Do the rebase again. 

    git rebase -p <upstream>

where <commit> is the sha1 of the merge commit and <upstream> is the remote branch. This will automatically reuse the manual resolution previously recorded by the rerere mechanism.
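The rerere mechanism itself can be demonstrated in a minimal, self-contained shell session (this shows just the record-and-replay part, not the full rebase workflow; the repository, branch, and file names are all made up, and rerere keeps its records under .git/rr-cache):

```shell
# Build a throwaway repo with a conflicting topic branch.
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name Demo
git config rerere.enabled true

echo base > file.txt
git add file.txt && git commit -qm base

git checkout -qb topic
echo topic > file.txt && git commit -qam topic

git checkout -q -                          # back to the original branch
echo other > file.txt && git commit -qam other

git merge topic >/dev/null 2>&1 || true    # conflict; rerere records the preimage
echo resolved > file.txt                   # resolve by hand
git add file.txt && git commit -qm merge   # rerere records the resolution

git reset -q --hard HEAD^                  # throw the merge commit away
msg=$(git merge topic 2>&1 || true)        # conflict again, but...
echo "$msg" | grep "previous resolution"   # rerere replayed the old resolution
```

At any point, git rerere status and git rerere diff show what rerere has recorded so far.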

Now relax, there's no hurry in doing the push ;)

Friday, February 3, 2012

Numbers Everyone Should Know

When designing efficient systems, an important skill is being able to quickly estimate the performance of a design without actually building it. Jeff Dean, in his 2007 presentation Software Engineering Advice from Building Large-Scale Distributed Systems, gives a few examples of the technique. One of the key elements is being familiar with the costs of the various operations used in your design. Let's have a look at the list of latency numbers from the presentation:

L1 cache reference                                     0.5 ns
Branch mispredict                                      5 ns
L2 cache reference                                     7 ns
Mutex lock/unlock                                    100 ns
Main memory reference                                100 ns
Compress 1K bytes with Zippy                      10,000 ns
Send 2K bytes over 1 Gbps network                 20,000 ns
Read 1 MB sequentially from memory               250,000 ns
Round trip within same datacenter                500,000 ns
Disk seek                                     10,000,000 ns
Read 1 MB sequentially from network           10,000,000 ns
Read 1 MB sequentially from disk              30,000,000 ns
Send packet CA->Netherlands->CA              150,000,000 ns 

While these numbers depend on the computer and infrastructure hardware being used, they usually improve with each new hardware generation. A newer presentation from 2009 shows slightly different figures:

L1 cache reference                                     0.5 ns
Branch mispredict                                      5 ns
L2 cache reference                                     7 ns
Mutex lock/unlock                                     25 ns
Main memory reference                                100 ns
Compress 1K bytes with Zippy                       3,000 ns
Send 2K bytes over 1 Gbps network                 20,000 ns
Read 1 MB sequentially from memory               250,000 ns
Round trip within same datacenter                500,000 ns
Disk seek                                     10,000,000 ns
Read 1 MB sequentially from network           10,000,000 ns
Read 1 MB sequentially from disk              20,000,000 ns
Send packet CA->Netherlands->CA              150,000,000 ns
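As a quick worked example of the estimation technique (the 30-image scenario is made up for illustration; the constants are the disk figures from the 2009 table above), compare fetching 30 one-megabyte images scattered across a disk with reading them from a single contiguous 30 MB file:

```java
// Back-of-envelope estimate using the 2009 latency numbers above.
public class LatencyEstimate {
    static final double DISK_SEEK_NS = 10_000_000;     // disk seek
    static final double DISK_READ_1MB_NS = 20_000_000; // read 1 MB sequentially

    public static void main(String[] args) {
        // Scattered: one seek per image, plus the transfer for each.
        double scatteredMs = 30 * (DISK_SEEK_NS + DISK_READ_1MB_NS) / 1_000_000;
        // Contiguous: a single seek, then one sequential 30 MB read.
        double contiguousMs = (DISK_SEEK_NS + 30 * DISK_READ_1MB_NS) / 1_000_000;
        System.out.println("scattered:  " + scatteredMs + " ms");  // 900.0 ms
        System.out.println("contiguous: " + contiguousMs + " ms"); // 610.0 ms
    }
}
```

Storing the images contiguously eliminates 29 of the 30 seeks, which is exactly the kind of conclusion these numbers let you reach without writing a benchmark.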

[Edit]

As technology evolves and hardware is getting faster (more GHz for CPUs, faster DRAM, higher throughput NICs & networking gear) some of these numbers are getting smaller. Here is a nice visualization that captures this evolution: https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html

One of the early efforts to visualize latency is slide #23, "How far away is the data?", from the NetStore '99 keynote. But maybe the most famous visual aid is Grace Hopper's nanosecond: it makes you aware of the fundamental limit that the speed of light imposes on technology in general and, indirectly, on these latency numbers.

Friday, January 27, 2012

A Hibernate Proxy and 2 Open Sessions

If you use Spring's implementation of the OpenSessionInView pattern to support lazy loading in the view layer and you've set singleSession=false, you'll most probably hit the infamous:

org.hibernate.HibernateException: illegally attempted to associate a proxy with two open Sessions
So what does this mean? It complains about a proxy being associated with two open sessions. How can that happen? Well, it happens when you try to update an entity which directly or indirectly contains lazy-loading proxies that are associated with another session. For example:


The user is loaded from the database using session #1. Along with the user, the object graph consisting of all eager-loading fields reachable from the user is also loaded. If any of these objects contain lazy-loading fields or collections, those are associated with session #1. The method userDao.update(user) then tries to update the user but uses a new session, #2. From Hibernate's point of view the user entity is a detached one, so it tries to reattach it to the current session, #2. But there's a problem: the lazy-loading fields and collections are still associated with session #1, hence we get the exception.

One possible solution would be to close session #1 before doing the update. We can do this by calling SessionFactoryUtils.processDeferredClose/initDeferredClose. This will definitely work for us, but it defeats the purpose of using OpenSessionInView with multiple sessions.

Another solution would be to detach the entity from the session, as it would be if session #1 had been closed. But how do we do this? We iterate over the graph of entities rooted at the entity and evict all lazy-loading proxies and collections from their associated sessions:
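The original snippet is not reproduced here, but the core of the idea can be sketched roughly against the Hibernate 3 API (an untested sketch, not the original code; detachLazyState is a made-up name, and the graph traversal is elided):

```java
// Rough, untested sketch: dissociate one value's lazy state from a
// session other than the current one. A complete implementation must
// also walk the entity graph and apply this to every reachable field.
static void detachLazyState(Object value, SessionImplementor current) {
    if (value instanceof HibernateProxy) {
        LazyInitializer li =
                ((HibernateProxy) value).getHibernateLazyInitializer();
        if (li.getSession() != null && li.getSession() != current) {
            li.unsetSession(); // detach the proxy from session #1
        }
    }
    // Lazy collections (org.hibernate.collection.PersistentCollection)
    // need the analogous treatment via their unsetSession(...) method.
}
```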


I'm not fully confident that this code covers all the angles but you get the idea. Let me know if you have any thoughts on this.