Friday, February 3, 2012

Numbers Everyone Should Know

When designing efficient systems, an important skill is being able to quickly estimate the performance of a design without actually building it. In his 2007 presentation "Software Engineering Advice from Building Large-Scale Distributed Systems", Jeff Dean gives a few examples of this technique. A key prerequisite is knowing the costs of the various operations your design relies on. Let's have a look at the list of latency numbers from the presentation:

L1 cache reference                                     0.5 ns
Branch mispredict                                      5 ns
L2 cache reference                                     7 ns
Mutex lock/unlock                                    100 ns
Main memory reference                                100 ns
Compress 1K bytes with Zippy                      10,000 ns
Send 2K bytes over 1 Gbps network                 20,000 ns
Read 1 MB sequentially from memory               250,000 ns
Round trip within same datacenter                500,000 ns
Disk seek                                     10,000,000 ns
Read 1 MB sequentially from network           10,000,000 ns
Read 1 MB sequentially from disk              30,000,000 ns
Send packet CA->Netherlands->CA              150,000,000 ns 
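To make the estimation technique concrete, here is a minimal Python sketch that stores the 2007 figures as constants and uses them for a back-of-the-envelope estimate. The workload (30 files of 256 KB each, read one after another from a single disk) and the helper names `disk_read_ms` and `memory_read_ms` are hypothetical and exist only for illustration. The model adds one seek per file to the sequential transfer time and ignores caching, queuing and parallelism.

```python
# 2007 latencies from the list above, in nanoseconds.
LATENCY_NS_2007 = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 100,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 10_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from network": 10_000_000,
    "Read 1 MB sequentially from disk": 30_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}

NS_PER_MS = 1_000_000


def disk_read_ms(num_files: int, file_mb: float) -> float:
    """Hypothetical helper: read num_files files of file_mb MB each, one after
    another from a single disk, paying one seek per file plus the sequential read."""
    per_file_ns = (LATENCY_NS_2007["Disk seek"]
                   + file_mb * LATENCY_NS_2007["Read 1 MB sequentially from disk"])
    return num_files * per_file_ns / NS_PER_MS


def memory_read_ms(total_mb: float) -> float:
    """Hypothetical helper: read total_mb MB sequentially from main memory."""
    return total_mb * LATENCY_NS_2007["Read 1 MB sequentially from memory"] / NS_PER_MS


# Hypothetical workload: 30 files of 256 KB (0.25 MB) each.
print(f"disk:   {disk_read_ms(30, 0.25):.1f} ms")
print(f"memory: {memory_read_ms(30 * 0.25):.3f} ms")
```

Under these assumptions, the disk read comes to about 525 ms, while reading the same 7.5 MB sequentially from memory takes under 2 ms. This is the kind of gap the list makes obvious before any code is written.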

These numbers depend on the computer and infrastructure hardware being used, and they tend to improve with each new hardware generation. A newer presentation from 2009 shows different figures:

L1 cache reference                                     0.5 ns
Branch mispredict                                      5 ns
L2 cache reference                                     7 ns
Mutex lock/unlock                                     25 ns
Main memory reference                                100 ns
Compress 1K bytes with Zippy                       3,000 ns
Send 2K bytes over 1 Gbps network                 20,000 ns
Read 1 MB sequentially from memory               250,000 ns
Round trip within same datacenter                500,000 ns
Disk seek                                     10,000,000 ns
Read 1 MB sequentially from network           10,000,000 ns
Read 1 MB sequentially from disk              20,000,000 ns
Send packet CA->Netherlands->CA              150,000,000 ns
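To see which entries actually changed between the two lists, the following Python sketch stores both sets of figures side by side and computes the old/new ratio for every entry that got faster. The `speedups` helper is a hypothetical name used only here.

```python
# (2007 value, 2009 value) in nanoseconds, copied from the two lists above.
LATENCY_NS = {
    "L1 cache reference": (0.5, 0.5),
    "Branch mispredict": (5, 5),
    "L2 cache reference": (7, 7),
    "Mutex lock/unlock": (100, 25),
    "Main memory reference": (100, 100),
    "Compress 1K bytes with Zippy": (10_000, 3_000),
    "Send 2K bytes over 1 Gbps network": (20_000, 20_000),
    "Read 1 MB sequentially from memory": (250_000, 250_000),
    "Round trip within same datacenter": (500_000, 500_000),
    "Disk seek": (10_000_000, 10_000_000),
    "Read 1 MB sequentially from network": (10_000_000, 10_000_000),
    "Read 1 MB sequentially from disk": (30_000_000, 20_000_000),
    "Send packet CA->Netherlands->CA": (150_000_000, 150_000_000),
}


def speedups(table: dict) -> dict:
    """Return {operation: old / new} for every entry whose latency went down."""
    return {op: old / new for op, (old, new) in table.items() if new < old}


# Print the improved entries, largest improvement first.
for op, factor in sorted(speedups(LATENCY_NS).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{op:<35} {factor:.2f}x faster")
```

Only three entries differ: mutex lock/unlock (4x), Zippy compression (about 3.3x) and sequential disk reads (1.5x). Everything else is unchanged between the two lists.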

[Edit]

As technology evolves and hardware gets faster (more GHz for CPUs, faster DRAM, higher-throughput NICs and networking gear), some of these numbers keep getting smaller. Here is a nice visualization that captures this evolution: https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html

One of the early efforts to visualize latency is slide #23 of the NetStore '99 keynote, "How far away is the data?". But perhaps the most famous visual aid is Grace Hopper's nanosecond: a piece of wire about 30 cm (11.8 inches) long, the distance light travels in one nanosecond. It makes you aware of the fundamental limit that the speed of light imposes on technology in general, and indirectly on the latency numbers above.
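The speed-of-light limit can be checked with a little arithmetic. The Python sketch below computes how far light travels in one nanosecond and a lower bound on the CA->Netherlands->CA round trip. The one-way distance of roughly 8,800 km (about the great-circle distance from San Francisco to Amsterdam) and a propagation speed in optical fiber of about two thirds of c are assumptions made for illustration, not figures from the presentation.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact value in vacuum, by definition of the metre

# Assumptions for illustration only, not from the presentation:
ONE_WAY_KM_CA_NL = 8_800      # rough great-circle distance, San Francisco to Amsterdam
FIBER_FRACTION_OF_C = 2 / 3   # light in optical fiber travels at roughly 2/3 of c


def distance_per_ns_cm(fraction_of_c: float = 1.0) -> float:
    """Distance a signal covers in one nanosecond, in centimetres."""
    return SPEED_OF_LIGHT_M_PER_S * fraction_of_c * 1e-9 * 100


def min_round_trip_ms(one_way_km: float, fraction_of_c: float = 1.0) -> float:
    """Lower bound on a round trip: twice the distance at the given signal speed."""
    meters = 2 * one_way_km * 1_000
    seconds = meters / (SPEED_OF_LIGHT_M_PER_S * fraction_of_c)
    return seconds * 1_000


print(f"Hopper's nanosecond:      {distance_per_ns_cm():.1f} cm")
print(f"CA->NL->CA bound, vacuum: {min_round_trip_ms(ONE_WAY_KM_CA_NL):.0f} ms")
print(f"CA->NL->CA bound, fiber:  {min_round_trip_ms(ONE_WAY_KM_CA_NL, FIBER_FRACTION_OF_C):.0f} ms")
```

With these assumptions the bound is roughly 59 ms in vacuum and 88 ms in fiber, well under the 150 ms in the table. The rest presumably comes from cable routes that are longer than the great circle, plus routing and switching equipment along the way.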
