DAILY NEWS AND INFORMATION FOR THE GLOBAL GRID COMMUNITY / JULY 8, 2002: VOL. 1 NO. 4

Special Features:

LBNL MAKING 10-GIGABIT ETHERNET DATA TRANSFER A REALITY
by Alan Beck, editor-in-chief

With the IEEE's recent adoption of Standard 802.3ae for 10-Gigabit Ethernet equipment, the speed of Ethernet operations has increased by an order of magnitude - at least on paper. But achieving that 10-fold increase in actual Ethernet performance remains a challenge that can be met only with leading-edge equipment and expertise.

Lawrence Berkeley National Laboratory, which operates some of the world's most powerful computing, data storage and networking resources for the U.S. Department of Energy, has teamed with Force10 Networks (switches), SysKonnect (network interfaces), FineTec Computers (clusters), Quartet Network Storage (on-line storage) and Ixia (line rate monitors) to assemble a demonstration system that runs a true scientific application to produce data on one 11-processor cluster, then sends the resulting data across a 10-Gigabit Ethernet connection to another cluster, where it is rendered for visualization. The system was built as a prelude to Berkeley Lab's entry into the High-Performance Bandwidth Challenge at the SC2002 conference. The system's 10-Gigabit Ethernet capability was showcased in a demonstration held Tuesday, July 2. Using line monitoring equipment from Ixia, the demonstration actually posted a peak performance line rate of 10.6 Gigabits per second. The total amount of real data transferred during 12 hours of trial runs and the actual demonstration was nearly 60 terabytes.

When it comes to moving huge amounts of scientific data quickly across networks, the team from Berkeley Lab has been the undisputed champion in the high-performance computing and networking world for two years running. At the SC2001 conference on high-performance computing and networking held last November, the LBNL team took top honors in the High-Performance Bandwidth Challenge, moving data across the network at a sustained rate of 3.3 Gigabits per second in a live computational steering/visualization demonstration involving the Albert Einstein Institute's "Cactus" simulation code ( http://www.cactuscode.org ) and Berkeley Lab's Visapult parallel visualization system ( vis.lbl.gov/RDProjects/visapult/index.html ). That team used a hardware system consisting of equipment provided by Force10, SysKonnect and FineTec. The 10-Gigabit Ethernet demo was assembled with help from the same vendors as a test run for this year's competition at the SC2002 conference, at which the LBNL team is seeking its third straight win.

The primary team responsible for assembling the 10-Gigabit demonstration system consists of four Berkeley Lab staffers - network engineers Mike Bennett and John Christman and computer systems engineers John Shalf and George "Chip" Smith. Between preparations for the demo, we had a chance to talk with some of the team members about the effort.

GRIDtoday: First of all, why is Lawrence Berkeley National Laboratory leading a demonstration project like this?

BENNETT: We had been asked in January to serve as a technical advisor to a conference planned for March. The goal of the conference was to highlight the new IEEE standard for 10-Gig E. For various reasons, the conference was pushed back until June to coincide with adoption of the standard. We were then asked what kind of demo we could put together that would show the difference that having 10-Gig capability would make. I immediately thought of the Lab group that won the Bandwidth Challenge at SC2001 - they had a real scientific application that was bandwidth intensive.

We put the demo system together for the conference, which was again delayed. Since we had a room full of equipment, we decided to salvage our effort and do a demo run here. It turned out to be really successful. Force10 loaned us the switches, FineTec donated enough computers to make it interesting and Chip Smith worked with SysKonnect to get very high performance from their network interfaces. Quartet provided the network storage for storing the data to be visualized.

The result is we proved that 10-Gig E is a reality, not just a bunch of back-of-the-envelope calculations.

SMITH: Also, Berkeley Lab has a long history of being at the forefront of networking, from putting the first supercomputer on ARPANET to helping develop the TCP and IP protocols to posting one of the earliest sites at the dawn of the World Wide Web. We're carrying on that work by extension to keep the Lab at the forefront of technology - and to continue to push the capabilities of that technology.

GRIDtoday: In lay terms, what does 10-Gigabit Ethernet represent?

BENNETT: In order to put 10-Gigabit Ethernet in perspective, consider that the average desktop machine connects at 100 megabits per second. In essence, the higher-speed technology is 100 times faster. Here's an example of the advantage of faster data transfer: the file size of a raw digital version of "The Matrix" (AVI format) is approximately 236 gigabits. With 10-Gigabit Ethernet, transferring the entire movie file takes 23.6 seconds. In contrast, the same transfer from an average desktop machine using Fast Ethernet takes 2,360 seconds, or roughly 39 minutes. The same transfer over a DSL line takes 66 hours. Still, the full benefit of 10-Gigabit Ethernet has yet to be appreciated.
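The arithmetic behind these figures is simply file size divided by link rate. As a minimal sketch in Python: the 236-gigabit file size and the two Ethernet rates come from the interview, while the DSL rate of about 1 megabit per second is an assumption, chosen only because it is consistent with the quoted 66 hours. The function name is illustrative.

```python
# Transfer time = file size (bits) / link rate (bits per second).
FILE_SIZE_GBIT = 236  # file size quoted in the interview, in gigabits

# Link rates in bits per second. The DSL rate is an assumption
# (about 1 Mbit/s); the interview does not state it.
LINK_RATES_BPS = {
    "10-Gigabit Ethernet": 10e9,
    "Fast Ethernet": 100e6,
    "DSL (assumed 1 Mbit/s)": 1e6,
}


def transfer_seconds(size_gbit, rate_bps):
    """Return the ideal transfer time in seconds, ignoring protocol overhead."""
    return size_gbit * 1e9 / rate_bps


for name, rate in LINK_RATES_BPS.items():
    secs = transfer_seconds(FILE_SIZE_GBIT, rate)
    print(f"{name}: {secs:,.1f} s ({secs / 3600:.2f} h)")
```

These are ideal figures that ignore protocol overhead, so real transfers would take somewhat longer.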

GRIDtoday: Is this the first real-world demonstration of 10-Gigabit Ethernet capability?

BENNETT: As far as I know. A lot of the tests that have been publicized have been interoperability-based, to show that a product from Vendor A can interoperate with equipment from Vendor B, which is the aim of the IEEE standard. What the interoperability standard doesn't address is whether you can take one vendor's equipment and plug it into a cluster connected to a network and get that 10-Gig level of performance.

What we are demonstrating is that it does work in the real world. And it has real-world benefits. From a network engineering perspective, 10-Gig E makes building a network much easier. You have one point-to-point connection, rather than 10 1-Gig E connections to install and maintain.

SHALF: From the computing side, there's also a real-world need and benefit. The source of data for our demonstration was the Cactus simulation code developed by the Numerical Relativity group led by Ed Seidel at the Albert Einstein Institute/Max Planck Institute in Potsdam, Germany. Cactus is a modular framework capable of supporting many different simulation applications, such as general relativity, binary neutron stars, magneto-hydrodynamics, and chemistry, but in this case we were interested in binary black hole mergers. These simulations will help us better understand what wave signatures we should be looking for in gravitational wave observatories like LIGO and VIRGO.

Codes like Cactus can easily consume an entire supercomputer, like the 3,328-processor IBM SP at NERSC. The Cactus team ran the code at NERSC for 1 million CPU-hours, or about 114 CPU-years, performing the first-ever simulations of the inspiraling coalescence of two black holes. When you make these big heroic runs, you don't want to find out after a week that one parameter was wrong and the simulation fell apart after a few days. You need high bandwidth to keep up with the enormous data production rate of these simulations (one terabyte per timestep) - with 10-Gig E you can get an accurate look at how the code is running. Otherwise, you can only get low-resolution snapshots that are of limited usefulness.
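To give a sense of scale, CPU-hours convert to CPU-years by dividing by the number of hours in a year. The sketch below, in Python, also estimates the wall-clock time under the idealized assumption that all 3,328 processors were busy for the entire run; the interview does not say how the run was actually scheduled, and the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def cpu_hours_to_cpu_years(cpu_hours):
    """Convert CPU-hours to CPU-years."""
    return cpu_hours / HOURS_PER_YEAR


def wall_clock_days(cpu_hours, processors):
    """Idealized wall-clock days, assuming every processor is busy throughout."""
    return cpu_hours / processors / 24


cpu_years = cpu_hours_to_cpu_years(1_000_000)
days = wall_clock_days(1_000_000, 3328)
print(f"{cpu_years:.1f} CPU-years, about {days:.1f} days on 3,328 processors")
```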

Remote monitoring and visualization require a system that can provide visualization capability over wide area network connections without compromising interactivity or the simulation performance. We used Visapult, developed by Wes Bethel of LBNL's Visualization Group for DOE's Next Generation Internet/Combustion Corridor project several years ago. Visapult allows you to use your desktop workstation to perform interactive volume visualization of remotely computed datasets without downsampling of the original data. It does so by employing the same massively parallel distributed memory computational model employed by the simulation code in order to keep up with the data production rate of the simulation. It also uses high performance networking in order to distribute its computational pipeline across a WAN so as to provide a remote visualization capability that is decoupled from the cycle time of the simulation code itself.

GRIDtoday: What about other applications for this capability?

BENNETT: Initially, I think the major interest will come from the research and university communities, until the cost comes down - although right now we have found 10-Gig E to cost about the same as aggregating 10 1-Gig E connections. One area that could benefit would be health care. Having 10-Gig E capability will allow streaming video at motion picture quality, which could be useful in performing surgery and teaching. It will also make it easier to transmit high-res medical images.

Also, services that rely on bandwidth can benefit. Data centers operating Web servers or providing bandwidth on demand for commercial clients would be able to offer better service, as would Metropolitan Area Ethernet service providers. Basically, any place now running 1-Gig E stands to benefit from this. Farther down the road, I think the financial services industry will find this capability useful.

SMITH: A couple of colleagues who work at Pixar came by to view the demo. Their computer animations are a good candidate to benefit from higher-bandwidth connections. They said they were getting a new cluster in the coming weeks and this gave them some good ideas, especially since it is going to be a Linux cluster, as are ours.

GRIDtoday: What were the obstacles to achieving true 10-Gigabit Ethernet performance?

BENNETT: The first one is getting the 1-Gig network interfaces to run as close to that line rate as possible. Many of them only run at 600-700 megabits per second. Chip Smith worked with SysKonnect to get up to the gigabit level.

SMITH: The speedbump was with Linux. When you run Linux with the SysKonnect card, the libraries in the kernel for the SysKonnect cards have a default behavior that would have the cards run with an average line rate of 600-700 megabits per second. Working with SysKonnect, I was able to change one of the libraries in the kernel and, using a recent virtual Ethernet interface module, I was able to get 950 to 1000 megabits per second off the single interfaces. This enabled us to run this demonstration with one-third fewer machines than it would have required without the work on the kernel. In the long run, getting this to work also saves money on machines and on the per-port price that is factored in when purchasing new machines for those who want to set up similar systems. It also shows that 1-Gig E is viable in a cluster setting.
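The "one-third fewer machines" figure can be checked with a rough calculation. The Python sketch below assumes one 1-Gig interface per machine, a target of 10 gigabits per second in aggregate, and per-interface rates taken as the midpoints of the ranges quoted above (650 and 975 megabits per second). None of these assumptions is stated in the interview, and the function name is illustrative.

```python
import math


def machines_needed(target_mbps, per_nic_mbps):
    """Machines required to fill the link, assuming one interface per machine."""
    return math.ceil(target_mbps / per_nic_mbps)


TARGET_MBPS = 10_000  # assumed aggregate target: one 10-Gigabit link

before = machines_needed(TARGET_MBPS, 650)  # default driver behavior (midpoint)
after = machines_needed(TARGET_MBPS, 975)   # tuned driver (midpoint)
saving = 1 - after / before

print(f"before tuning: {before} machines, after tuning: {after} machines")
print(f"reduction: {saving:.0%}")
```

Under these assumptions the tuned interfaces need 11 machines rather than 16, a reduction of roughly one-third.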

BENNETT: The second obstacle was getting network equipment that can deliver at that rate. Force10 was able to provide the network equipment that could handle it. Because of all the contributing vendors, the demo was a success.

But the most work involved building the cluster and getting the applications to run on it, which are John's and Chip's areas of expertise.

SHALF: And certainly it's a non-trivial feat to design an application like Wes's Visapult that can fully overlap its computation with pulling data off of the network at full line rate. This requires considerable performance tuning at the application level as well as novel visualization algorithms like the Rogers and Crawfis Image Based Rendering (IBR) method on which Visapult is loosely based.

GRIDtoday: Any other challenges?

BENNETT: Well, it's definitely an exciting process. When you're working with new technology like this, you almost hope you'll run into a new and interesting bug - something you haven't seen before. It's also exciting to be able to offer this to users of our network here at the Lab.

GRIDtoday: Did you notice any similarities to previous increases in bandwidth?

SHALF: At SC95 we were asked, "With these OC-3 lines, how are you going to deal with this infinite bandwidth?" Our demonstration shows that 10-Gig E is not, in fact, "infinite bandwidth." We are quite capable of consuming this and more using an existing production simulation/visualization application. So our excitement over the possibilities that this new technology unlocks is tempered by the fact that we remain such a long distance away from anything approximating "infinite bandwidth."

BENNETT: I saw the same cycle when 1-Gig E was rolled out in 1998. People thought it was too expensive and that no one would use all that bandwidth right away. But as the cost came down, demand and usage went up. Here at the Lab, we have 1-Gig E network distribution connections to the buildings. As those fill up, we're going to be looking at upgrading to 10-Gig E.

Contact Jon Bashor, Berkeley Lab Computing Sciences Communications, jbashor@lbl.gov , 510-486-5849, Lawrence Berkeley National Laboratory.
