Special Features
ENVISIONING THE GRID
By Paul Preuss
At last year's SC2002 conference in Baltimore, Berkeley Lab racked up its
third straight win in supercomputing's annual Bandwidth Challenge with a
data-gobbling visualization of colliding black holes. When it comes to remote
scientific visualization, says Wes Bethel with a smile, "we're the kings."
Now Bethel and John Shalf of the Computational Research Division's
Visualization Group have followed up their SC2002 success by writing a good
chunk -- including the guest editorial -- of the March/April, 2003, issue of
IEEE Computer Graphics and Applications, which is devoted to graphics on the
Grid.
"The Grid integrates all sorts of devices, services, and resources, not
just computers," Shalf says. Operating inside a specialized world of research,
the Grid hopes to do for the tools of science, from computers and data-storage
systems to instruments like telescopes, electron microscopes, seismographs,
synchrotron beam lines -- even ocean-going floats that report via satellite --
what the World Wide Web did for desktop PCs.
But Berkeley Lab's bandwidth champs are far from complacent about the
future of visualization on the Grid. In fact, Bethel and Shalf argue, there is
a "wide gulf between current visualization technologies and the vision of
global, Grid-enabled visualization capabilities."
Their editorial focuses on the gulf between enthusiasm bordering on hype,
on the one hand, and on the other, the tools that can actually be used by
scientific researchers in their day-to-day research activities. In the
process, the editorial highlights several of the most important technical
challenges facing the Grid visualization community.
Bethel and Shalf describe what many envision, a future in which "large,
multidisciplinary teams scattered around the world" can work with
sophisticated visualizations powered by a huge inflow of information to their
individual desktop machines. They sketch a scenario of an imaginary
geophysical and materials-science team using real-time, interactive models
that integrate input from experiments on the molecular scale, seismograms of
natural and induced earthquakes, phone calls from the field, and all kinds of
other data collected by a "vast network of sensors."
"The vision is a noble one," says Bethel, "but there is a huge gap between
it and what can be done at present." One issue is what Bethel calls the "Tower
of Babel" problem: "A major objective of the Grid is a uniform means of
communication. But in order for Grid components to be able to communicate,
they must all speak the same language, using the same conventions. In the
visualization world, there are many different data file formats and grid
types, and no widespread agreement about how to go about having disparate
software components interact with one another."
A closely related concern is security. "The Grid couldn't work if a user
had to log into all these sites separately," Shalf remarks. Yet in working
toward secure sign-ons "the Grid community has spent too much time on getting
different components to talk to each other" -- at least from the standpoint of
effective visualization systems.
Lossy Versus Bossy
Part of the problem is that "historically, network specialists have a fear
of lost data." In a major article in the same issue of Computer Graphics and
Applications, which draws on their experience with the SC2002 Bandwidth
Challenge, Bethel and Shalf characterize the data-loss issue as one of
balancing "the competing interests of interactivity and fidelity" --
determining when absolute accuracy is needed and when it is not.
"The visualization community has long worked with missing data," Bethel
notes. "So John and I asked whether it is hypocritical to insist that a
visualization system preserve every single bit in the datastream, without
loss. After all, MPEG movies and JPEG images are lossy, yet are widely
accepted within the scientific community. The challenge is to have predictable
behavior with loss in the data used to create the visualization, not just with
lossy compression of images resulting from the visualization process."
In one of the two broad approaches that characterize present systems, the
visualization is first performed on a single server, then sent to the client --
an approach that can handle large datasets but stonewalls interactivity. The
other approach is to transfer subsets of data that are assembled on the
client's desktop -- which is fine for interactivity but can't keep up with the
ever-increasing size of scientific data sets or the limitations of finite
network bandwidth.
Both these approaches preserve the integrity of the data as it travels the
internet. Neither works if large datasets and interactivity are needed
simultaneously. Volume rendering in full 3-D uses up a lot of computing power
and bandwidth; some systems take hours to render a single frame.
Yet, says Shalf, "Loss of data may not have much of an impact if it doesn't
lead to misinterpretation." The volume-rendering program named Visapult, whose
development was spearheaded by Bethel, was designed to work quickly over the
network using a combination of parallelism, pipelining, and novel
"latency-tolerant" visualization and graphics algorithms.
Berkeley Lab won the SC2002 Bandwidth Challenge by running the Visapult
program to view the results of black-hole collision simulations. The simulated
collisions were produced by the Cactus program, developed by the General
Relativity group at the Albert Einstein Institute in Potsdam, Germany. Up to
16.8 billion bits a second streamed from a Cactus simulation to the Visapult
application in Baltimore through an intricate network of high-speed
connections in Europe and the U.S.
It takes a capacious transmission line -- a "fat pipe" -- to carry 17
billion bits a second, and moving lots of data fast is an essential feature of
the science Grid. "The scientists all want fatter pipes," Shalf remarks, "but
if they can't fill them, it will be hard for anyone to demand network
improvement."
The data-transfer protocol favored by network specialists is Transmission
Control Protocol (TCP) -- emphasis on "control" -- which identifies individual
data packets at the source, then reassembles them in precise order at the
receiving end before the user ever sees the result. Data packets can spend a
lot of time on hold, waiting to be assembled in the right order or awaiting
the retransmission of a lost packet.
TCP includes a congestion-avoidance algorithm that forces it to cut back
its speed by half for each packet-loss event, then return to full speed only
very slowly. Consequently TCP network utilization efficiency is only about 25
percent.
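The halve-on-loss, recover-slowly behavior described above can be sketched as a toy model. This is an illustration only, not the actual TCP implementation: the function name, the fixed loss interval, and the rate units are all assumptions made for the example, and it does not attempt to reproduce the 25 percent figure.

```python
def aimd_utilization(capacity, increase, loss_every, rounds):
    """Toy model of halve-on-loss, slow-recovery sending (hypothetical parameters).

    capacity   -- the most the link can carry per round trip
    increase   -- how much the sender speeds up per loss-free round trip
    loss_every -- a loss event is assumed every this many round trips
    rounds     -- number of round trips to simulate
    Returns the fraction of the link's capacity actually used.
    """
    rate = float(capacity)
    sent = 0.0
    for step in range(1, rounds + 1):
        sent += rate
        if step % loss_every == 0:
            rate /= 2.0                      # cut speed in half on a loss event
        else:
            rate = min(float(capacity), rate + increase)  # creep back up
    return sent / (capacity * rounds)


if __name__ == "__main__":
    slow = aimd_utilization(capacity=1000, increase=1, loss_every=100, rounds=5000)
    fast = aimd_utilization(capacity=1000, increase=50, loss_every=100, rounds=5000)
    print(f"slow recovery: {slow:.0%} of the pipe used")
    print(f"fast recovery: {fast:.0%} of the pipe used")
```

In this model, the smaller the recovery step is relative to the size of the pipe, the more capacity goes unused between loss events, which is why fat, long-distance pipes are the hardest to fill.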
One of the tricks Visapult used to gulp from Cactus's data firehose was a
different transport protocol from the Internet Protocol suite, the User
Datagram Protocol (UDP) -- emphasis on "user." Unlike TCP, which rearranges
out-of-order packets and requests retransmission of dropped packets, UDP merely
sends packets from one machine to another; it's up to the application to detect
when they arrive out of order or when one has been lost.
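What "up to the application" means can be sketched in a few lines. The example below is a minimal illustration, not Visapult's actual wire format: the eight-byte header layout and the names make_datagram and Receiver are hypothetical. In a real program the datagrams would travel over a UDP socket; here they are passed directly so the bookkeeping is easy to follow.

```python
import struct

# Hypothetical header: sequence number and total packet count,
# each an unsigned 32-bit integer in network byte order.
HEADER = struct.Struct("!II")


def make_datagram(seq, total, payload):
    """Prefix the payload with the header the receiver needs for bookkeeping."""
    return HEADER.pack(seq, total) + payload


class Receiver:
    """Tracks which packets arrived, which came late, and which are missing."""

    def __init__(self):
        self.seen = set()
        self.highest = -1
        self.out_of_order = 0
        self.total = None

    def accept(self, datagram):
        seq, total = HEADER.unpack_from(datagram)
        payload = datagram[HEADER.size:]
        self.total = total
        if seq in self.seen:
            return None                      # duplicate: ignore it
        if seq < self.highest:
            self.out_of_order += 1           # arrived after a later packet
        self.highest = max(self.highest, seq)
        self.seen.add(seq)
        return seq, payload

    def missing(self):
        """Sequence numbers not yet received (lost, or still in flight)."""
        if self.total is None:
            return []
        return [s for s in range(self.total) if s not in self.seen]


if __name__ == "__main__":
    receiver = Receiver()
    for seq in (0, 2, 1):                    # packet 1 arrives late; 3 and 4 never do
        receiver.accept(make_datagram(seq, 5, b"data"))
    print("out of order:", receiver.out_of_order)
    print("missing:", receiver.missing())
```

Unlike TCP, nothing here waits: the application decides for itself whether a missing packet is worth asking for again.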
TCP and UDP are both transport protocols layered on the more fundamental
substrate of the Internet Protocol (IP). UDP adds little beyond IP's basic
packet delivery, while TCP adds ordering, retransmission, and congestion
control. By building on UDP, the Berkeley-led Bandwidth Challenge team was able
to reengineer the fundamental behavior of the transport, particularly its
response to loss, an issue Visapult addresses by providing a manual throttle in
place of TCP's automatic congestion-avoidance behavior. The Cactus/Visapult
combination's custom UDP-based protocol was able to use better than 90 percent
of the available network bandwidth -- and deliver the win.
"At SC2002 we filled the pipe using a custom UDP protocol," says Bethel.
"But to do it, we had to confront the issue of potential loss of data, as well
as transmission-induced data reordering."
A full-scale Cactus simulation requires three to five trillion bytes of
data, far more than real-time visualization can handle. The Bandwidth
Challenge showed that loss is not only tolerable if its effects can be
managed, it's essential if the goal is interactivity in the presence of huge
amounts of data.
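A back-of-the-envelope calculation shows the scale of the mismatch. It assumes, optimistically, that the Bandwidth Challenge peak of 16.8 billion bits a second could be sustained for an entire transfer, and that a trillion bytes means 10^12; the function name is made up for the example.

```python
PEAK_BITS_PER_SECOND = 16.8e9   # peak rate reported for the SC2002 run


def transfer_minutes(n_bytes, bits_per_second=PEAK_BITS_PER_SECOND):
    """Minutes needed to move n_bytes at a constant bit rate (8 bits per byte)."""
    return n_bytes * 8 / bits_per_second / 60


if __name__ == "__main__":
    for trillions in (3, 5):
        minutes = transfer_minutes(trillions * 1e12)
        print(f"{trillions} trillion bytes: about {minutes:.0f} minutes")
```

Even at the record-setting peak rate, simply moving the full dataset would take roughly 24 to 40 minutes, well outside the bounds of interactive viewing.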
Visapult's basic design solves some of the problem through an architecture
known as pipelining: one of the components in Visapult's pipeline is housed on
a multiprocessor supercomputer and imports and reads all the data. During
first-stage processing, the data size is effectively reduced by an order of
magnitude. The result, a partial visualization, is then transferred to the
Visapult viewer. As the second component in Visapult's pipeline, the viewer
runs on a desktop workstation or a laptop computer.
With a customized UDP protocol controlling the stream between Cactus (the
data source) and Visapult's back end (the data consumer), not all the data
need even reach the Visapult back end before visualization and rendering
begin. The Visapult viewer component produces usable results without delay.
When more packets arrive they are included in the visualization process and
produce an increasingly detailed result. As Bethel and Shalf phrase it,
Visapult "tolerates loss gracefully" -- and by design.
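The idea of drawing with whatever has arrived so far can be sketched as follows. This is a simplified illustration under stated assumptions, not Visapult's code: the class name ProgressiveVolume, the fixed block size, and the use of a placeholder value for blocks that have not arrived are all hypothetical.

```python
class ProgressiveVolume:
    """Holds a dataset split into fixed-size blocks that may arrive in any order."""

    def __init__(self, n_blocks, block_size):
        self.block_size = block_size
        self.blocks = [None] * n_blocks      # None marks a block not yet received

    def add_block(self, index, values):
        if len(values) != self.block_size:
            raise ValueError("block has the wrong size")
        self.blocks[index] = list(values)

    def coverage(self):
        """Fraction of blocks received so far."""
        return sum(b is not None for b in self.blocks) / len(self.blocks)

    def snapshot(self, placeholder=0.0):
        """Data to render right now, with missing blocks filled by a placeholder."""
        flat = []
        for block in self.blocks:
            if block is None:
                flat.extend([placeholder] * self.block_size)
            else:
                flat.extend(block)
        return flat


if __name__ == "__main__":
    volume = ProgressiveVolume(n_blocks=4, block_size=2)
    volume.add_block(2, [5.0, 6.0])          # blocks can arrive in any order
    print(volume.coverage(), volume.snapshot())
    volume.add_block(0, [1.0, 2.0])          # later arrivals refine the picture
    print(volume.coverage(), volume.snapshot())
```

A viewer built this way never stalls on a lost packet; each call to snapshot returns something drawable, and the picture sharpens as coverage rises.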
In this way the evolving visualization can keep up with the evolving
simulation. If the scientist sees that a run is going awry, he or she can cut
it short, or even adjust the code or the problem's parameters on the fly.
Getting What You Want to See
In visualization, one of the most challenging tasks is to allow scientists to
find interesting things in data. "Finding interesting things can be a
difficult objective to achieve," says Bethel. "Often, scientists are not sure
exactly what they mean by interesting."
Worse yet, given the large number of control parameters needed to produce
different types of visual results, using visualization software can be a
complex task in itself. In a second major article in Computer Graphics and
Applications, Bethel and Shalf team with colleagues in the Visualization and
Graphics Research Group at the University of California at Davis to tackle the
problem of making Grid-based visualization friendly for nonexpert users.
"Web browsers are familiar interfaces for many users, but with all these
widely distributed, heterogeneous machines -- including the special graphics
machines employed in complex visualization applications -- it's hard to deploy
a system that everyone can use," says Shalf. "In this article we're proposing
a web-based portal which hides the complexity of launching complex,
multi-component visualization tools from the user." Portals are well known in
the e-commerce world, Shalf remarks, naming familiar examples like amazon.com
and E*TRADE.com.
The portal and browser combination addresses some user interface issues.
The authors also describe a new kind of visualization application specially
designed for the web environment, which excels at facilitating visual
exploration of data. They describe a web-based visualization tool using a
"spreadsheet-like" interface to present images resulting from variations in
visualization parameters "designed to assist exploration by providing context
for where a user is in their exploration, where they have been, and suggesting
where they may go next."
Visualization is one of the primary tools the Grid promises for furthering
highly interactive, widely-distributed, multidisciplinary approaches to major
scientific problems. But much needs to be done.
The use of Grid portals is one still-evolving approach to the challenge of
access to resources. To meet the challenge of data transfer, Bethel and Shalf
emphasize that "dynamic environments require continuous adjustment of the data
rates" -- an area where the Visualization Group is hard at work on a range of
network solutions "so these Grid applications don't stomp all over each other
in practice," as Shalf puts it. Plenty of other challenges lie ahead in
repairing what Bethel calls "the disconnect between the research community and
science's practical tools."
Or, as Shalf says of prospects for improving Grid-based visualization,
"It's a work-rich environment."
Additional Information
"How the Grid will affect the architecture of future visualization systems"
and "Cactus and Visapult: an ultra-high performance Grid-distributed
visualization architecture using connectionless protocols," by E. Wes Bethel
and John Shalf, and "Deploying web-based visual exploration tools on the
Grid," by T. J. Jankun-Kelly, Oliver Kreylos, John Shalf, Kwan-Liu Ma, Bernd
Hamann, Kenneth I. Joy, and E. Wes Bethel, will appear in the March/April,
2003, issue of IEEE Computer Graphics and Applications.
http://www-vis.lbl.gov
http://www-vis.lbl.gov/RDProjects/visapult
http://www.cactuscode.org/index.html
http://graphics.cs.ucdavis.edu