Special Features:
LBNL MAKING 10-GIGABIT ETHERNET
DATA TRANSFER A REALITY
by Alan Beck, editor-in-chief
With the IEEE's recent adoption of Standard 802.3ae for 10-Gigabit Ethernet
equipment, the speed of Ethernet operations has increased by an order of
magnitude - at least on paper. But achieving that 10-fold increase in actual
Ethernet performance remains a challenge that can be met only with
leading-edge equipment and expertise.
Lawrence Berkeley National Laboratory, which operates some of the world's most
powerful computing, data storage and networking resources for the U.S.
Department of Energy, has teamed with Force10 Networks (switches), SysKonnect
(network interfaces), FineTec Computers (clusters), Quartet Network Storage
(on-line storage) and Ixia (line rate monitors) to assemble a demonstration
system that runs a true scientific application to produce data on one
11-processor cluster, then sends the resulting data across a 10-Gigabit
Ethernet connection to another cluster, where it is rendered for
visualization. The system was built as a prelude to Berkeley Lab's entry into
the High-Performance Bandwidth Challenge at the SC2002 conference. The
system's 10-Gigabit Ethernet capability was showcased in a demonstration held
Tuesday, July 2. Using line monitoring equipment from Ixia, the demonstration
actually posted a peak performance line rate of 10.6 Gigabits per second. The
total amount of real data transferred during 12 hours of trial runs and the
actual demonstration was nearly 60 terabytes.
When it comes to moving huge amounts of scientific data quickly across
networks, the team from Berkeley Lab is the undisputed champion in the
high-performance computing and networking world for two years running. At the
SC2001 conference of high-performance computing and networking held last
November, the LBNL team took top honors in the High-Performance Bandwidth
Challenge, moving data across the network at a sustained rate of 3.3 Gigabits
per second in a live computational steering/visualization demonstration
involving the
Albert Einstein Institute's "Cactus" simulation code (
http://www.cactuscode.org ) and Berkeley Lab's Visapult parallel visualization
system ( vis.lbl.gov/RDProjects/visapult/index.html ). That team used a
hardware system consisting of equipment provided by Force10, SysKonnect and
FineTec. The 10-Gigabit Ethernet demo was assembled with help from the
same vendors as a test run for this year's competition at the SC2002
conference, at which the LBNL team is seeking its third straight win.
The primary team responsible for assembling the 10-Gigabit demonstration
system consists of four Berkeley Lab staffers - network engineers Mike Bennett
and John Christman and computer systems engineers John Shalf and George "Chip"
Smith. Between preparations for the demo, we had a chance to talk with some of
the team members about the effort.
GRIDtoday: First of all, why is Lawrence Berkeley National Laboratory leading
a demonstration project like this?
BENNETT: We had been asked in January to serve as a technical advisor to a
conference planned for March. The goal of the conference was to highlight the
new IEEE standard for 10-Gig E. For various reasons, the conference was pushed
back until June to coincide with adoption of the standard. We were then asked
what kind of demo we could put together that would show the difference that
having 10-Gig capability would make. I immediately thought of the Lab group
that won the Bandwidth Challenge at SC2001 - they had a real scientific
application that was bandwidth intensive.
We put the demo system together for the conference, which was again delayed.
Since we had a room full of equipment, we decided to salvage our effort and do
a demo run here. It turned out to be really successful. Force10 loaned us the
switches, FineTec donated enough computers to make it interesting and Chip
Smith worked with SysKonnect to get very high performance from their network
interfaces. Quartet provided the network storage for storing the data to be
visualized.
The result is we proved that 10-Gig E is a reality, not just a bunch of
back-of-the-envelope calculations.
SMITH: Also, Berkeley Lab has a long history of being on the forefront of
networking, from putting the first supercomputer on ARPANET to helping develop
TCP and IP protocols to posting one of the earliest sites at the dawn of the
World Wide Web. We're carrying on that work by extension to keep the Lab at
the forefront of technology - and to continue to push the capabilities of that
technology.
GRIDtoday: In lay terms, what does 10-Gigabit Ethernet represent?
BENNETT: In order to put 10 Gigabit Ethernet in perspective, consider that the
average desktop machine connects at 100 megabits per second. In essence, the
higher-speed technology is 100 times faster. Here's an example of the
advantage of faster data transfer: the file size of a raw digital version of
"The Matrix" (AVI format) is approximately 236 gigabits. With 10-Gigabit
Ethernet, transferring the entire movie file takes 23.6 seconds. In contrast,
the average desktop machine transfer using Fast Ethernet takes 2360 seconds,
or roughly 39 minutes. The same transfer over a DSL line takes 66 hours.
Still, the full benefit of 10-Gigabit Ethernet has yet to be appreciated.
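For readers who want to check the arithmetic, the transfer times quoted above follow from dividing file size by link rate. The short Python sketch below reproduces them under idealized conditions (full line rate, no protocol overhead). The function and constant names are illustrative only, and the DSL rate of about 1 megabit per second is an assumption inferred from the 66-hour figure, not a number given in the interview.

```python
# Idealized transfer-time arithmetic: time = size / rate.
# Ignores protocol overhead, so real transfers would take somewhat longer.

FILE_SIZE_GIGABITS = 236.0  # file size quoted in the interview

# Link rates in megabits per second. The DSL value is an assumption
# (roughly 1 Mbit/s), chosen because it matches the quoted 66 hours.
LINK_RATES_MBPS = {
    "10-Gigabit Ethernet": 10_000.0,
    "Fast Ethernet": 100.0,
    "DSL (assumed ~1 Mbit/s)": 1.0,
}


def transfer_seconds(size_gigabits: float, rate_mbps: float) -> float:
    """Return the ideal transfer time in seconds for a given size and rate."""
    size_megabits = size_gigabits * 1000.0
    return size_megabits / rate_mbps


if __name__ == "__main__":
    for name, rate in LINK_RATES_MBPS.items():
        secs = transfer_seconds(FILE_SIZE_GIGABITS, rate)
        print(f"{name}: {secs:,.1f} s = {secs / 60:,.1f} min = {secs / 3600:,.2f} h")
```

Under these assumptions the sketch gives 23.6 seconds for 10-Gigabit Ethernet, 2,360 seconds (about 39 minutes) for Fast Ethernet, and about 65.6 hours for the assumed DSL rate, in line with the figures quoted.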
GRIDtoday: Is this the first real-world demonstration of 10-Gigabit Ethernet
capability?
BENNETT: As far as I know. A lot of the tests that have been publicized have
been interoperability-based, to show that a product from Vendor A can
interoperate with equipment from Vendor B, which is the aim of the IEEE
standard. What the interoperability standard doesn't address is whether you
can take one vendor's equipment and plug it into a cluster connected to a
network and get that 10-Gig level of performance.
What we are demonstrating is that it does work in the real world. And it has
real-world benefits. From a network engineering perspective, 10-Gig E makes
building a network much easier. You have one point-to-point connection,
rather than 10 1-Gig E connections to install and maintain.
SHALF: From the computing side, there's also a real-world need and benefit.
The source of data for our demonstration was the Cactus simulation code
developed by the Numerical Relativity group led by Ed Seidel at the Albert
Einstein Institute/Max Planck Institute in Potsdam, Germany. Cactus is a
modular framework capable of supporting many different simulation
applications, such as general relativity, binary neutron stars,
magneto-hydrodynamics, and chemistry, but in this case we were interested in
binary black hole mergers. These simulations will help us better understand
what wave signatures we should be looking for in gravitational wave
observatories like LIGO and VIRGO.
Codes like Cactus can easily consume an entire supercomputer, like the
3,328-processor IBM SP at NERSC. The Cactus team ran the code at NERSC for 1
million CPU-hours, or roughly 114 CPU-years, performing the first-ever
simulations of
the inspiraling coalescence of two black holes. When you make these big heroic
runs, you don't want to find out after a week that one parameter was wrong and
the simulation fell apart after a few days. You need high bandwidth to keep up
with the enormous data production rate of these simulations (one terabyte per
timestep) - with 10-Gig E you can get an accurate look at how the code is
running. Otherwise, you can only get low-resolution snapshots that are of
limited usefulness.
Remote monitoring and visualization require a system that can provide
visualization capability over wide area network connections without
compromising interactivity or the simulation performance. We used Visapult,
developed by Wes Bethel of LBNL's Visualization Group for DOE's Next
Generation Internet/Combustion Corridor project several years ago. Visapult
allows you to use your desktop workstation to perform interactive volume
visualization of remotely computed datasets without downsampling of the
original data. It does so by employing the same massively parallel distributed
memory computational model employed by the simulation code in order to keep up
with the data production rate of the simulation. It also uses high
performance networking in order to distribute its computational pipeline
across a WAN so as to provide a remote visualization capability that is
decoupled from the cycle time of the simulation code itself.
GRIDtoday: What about other applications for this capability?
BENNETT: Initially, I think the major interest will come from the research and
university communities, until the cost comes down - although right now we have
found 10-Gig E to cost about the same as aggregating 10 1-Gig E connections.
One area that could benefit would be health care. Having 10-Gig E capability
will allow streaming video at motion picture quality, which could be useful in
performing surgery and teaching. It will also make it easier to transmit
high-res medical images.
Also, services that rely on bandwidth can benefit. Data centers operating Web
servers or providing bandwidth on demand for commercial clients would be able
to offer better service, as would Metropolitan Area Ethernet service
providers. Basically, any place now running 1-Gig E stands to benefit from
this. Farther down the road, I think the financial services industry will find
this capability useful.
SMITH: A couple of colleagues who work at Pixar came by to view the demo.
Their computer animations are a good candidate to benefit from
higher-bandwidth connections. They said they were getting a new cluster in the
coming weeks and this gave them some good ideas, especially since it is going
to be a Linux cluster, as are ours.
GRIDtoday: What were the obstacles to achieving true 10-Gigabit Ethernet
performance?
BENNETT: The first one is getting the 1-Gig network interfaces to run as close
to that line rate as possible. Many of them only run at 600-700 megabits. Chip
Smith worked with SysKonnect to get up to the gigabit level.
SMITH: The speedbump was with Linux. When you run Linux with the SysKonnect
card, the libraries in the kernel for the SysKonnect cards have a default
behavior that would have the cards run with an average line rate of 600-700
megabits per second. Working with SysKonnect, I was able to change one of the
libraries in the kernel and, using a recent virtual Ethernet interface module,
get 950 to 1000 megabits per second off the single interfaces. This enabled us
to run this demonstration with one-third fewer machines than it would have
required without the work on the kernel. In the long run, getting this to work
also saves money for those who want to set up similar systems, both on the
machines themselves and on the per-port price that is factored in when
purchasing them. It also shows that 1-Gig E is viable in a cluster setting.
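The "one-third fewer machines" figure can be sanity-checked with a rough calculation. The Python sketch below is an illustration rather than the team's actual sizing method. It assumes one gigabit interface per machine, an aggregate target of 10,000 megabits per second, and per-interface rates at the midpoints of the quoted ranges (650 and 975 megabits per second). None of those assumptions is stated in the interview, and the function name is hypothetical.

```python
import math

# Assumption: the sending cluster must fill a 10-Gigabit link in aggregate.
TARGET_MBPS = 10_000

# Assumption: midpoints of the per-interface ranges quoted in the interview.
UNTUNED_MBPS = 650  # default driver behavior, quoted as 600-700 Mbit/s
TUNED_MBPS = 975    # after the kernel work, quoted as 950-1000 Mbit/s


def machines_needed(target_mbps: float, per_machine_mbps: float) -> int:
    """Machines required to reach the target, assuming one interface each."""
    return math.ceil(target_mbps / per_machine_mbps)


if __name__ == "__main__":
    before = machines_needed(TARGET_MBPS, UNTUNED_MBPS)
    after = machines_needed(TARGET_MBPS, TUNED_MBPS)
    saved = 1 - after / before
    print(f"untuned: {before} machines, tuned: {after} machines")
    print(f"reduction: {saved:.0%}")
```

With these assumed inputs the count drops from 16 machines to 11, a reduction of about 31 percent, which is consistent with the roughly one-third saving described.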
BENNETT: The second obstacle was getting network equipment that can deliver at
that rate. Force10 was able to provide the network equipment that could handle
it. Because of all the contributing vendors, the demo was a success.
But the most work involved building the cluster and getting the applications
to run on it, which are John's and Chip's areas of expertise.
SHALF: And certainly it's a non-trivial feat to design an application like
Wes's Visapult that can fully overlap its computation with pulling data off of
the network at full line rate. This requires considerable performance tuning
at the application level as well as novel visualization algorithms like the
Rogers and Crawfis Image Based Rendering (IBR) method on which Visapult is
loosely based.
GRIDtoday: Any other challenges?
BENNETT: Well, it's definitely an exciting process. When you're working with
new technology like this, you almost hope you'll run into a new and
interesting bug - something you haven't seen before. It's also exciting to be
able to offer this to users of our network here at the Lab.
GRIDtoday: Did you notice any similarities to previous increases in bandwidth?
SHALF: At SC95 we were asked, "With these OC-3 lines, how are you going to
deal with this infinite bandwidth?" Our demonstration shows that 10-Gig E is
indeed not "Infinite Bandwidth." We are quite capable of consuming this and
more using an existing production simulation/visualization application. So our
excitement over the possibilities that this new technology unlocks is tempered
by the fact that we remain such a long distance away from anything
approximating "infinite bandwidth".
BENNETT: I saw the same cycle when 1-Gig E was rolled out in 1998. People
thought it was too expensive and that no one would use all that bandwidth
right away. But as the cost came down, demand and usage went up. Here at the
Lab, we have 1-Gig E network distribution connections to the buildings. As
that fills up, we're going to be looking at upgrading to 10-Gig E.
Contact Jon Bashor, Berkeley Lab Computing Sciences Communications,
jbashor@lbl.gov , 510-486-5849, Lawrence Berkeley National Laboratory.