Special Features:
CAN GRIDS/CLUSTERS OVERTAKE SUPERCOMPUTERS?
Until a few years ago, the Top 500 list of supercomputers was dominated by
proprietary solutions with prices that put high-performance computing (HPC)
out of reach for all but the most well-funded institutions and companies.
Then came Linux. The open source OS has gate-crashed the Top 500 list in a
big way over the past few years, clocking in with more than 50 entries in the
November 2002 survey. These Linux supercomputers are clusters, meaning they
are composed of a number of relatively low-end machines working together over
a network. But clusters also have some drawbacks, including slower access
to data than proprietary solutions.
Can clusters conquer their shortcomings and overtake proprietary
supercomputers?
SGI, for one, thinks so. The company recently unveiled an innovation
that represents a giant step toward that goal. As a result, the world of
supercomputing may soon have a new middleweight, or at least mid-price,
champion.
Typically, Linux supercomputing clusters link a number of commodity
machines that work on problems in parallel, with one "master" node directing
the work and many "slave" nodes working together. These clusters can be
composed of just a few machines that handle Web serving, or of hundreds or
thousands of nodes that perform high-performance computing tasks that once
would have required a supercomputer.
Most Linux clusters use off-the-shelf hardware, typically white-box
computers that have one or two processors per machine. Because they use
standard hardware, the costs of clusters are far below those of
proprietary supercomputers, though those costs are still significant
when deploying hundreds of nodes.
On the flip side, clusters lack some of the features that make the big
iron attractive. For example, clustered systems are limited by the amount of
data that can be processed by each node. Data sets must be broken down into
small chunks that can be handled by an individual machine, and the
communication latency between machines is much greater than that within a
supercomputer --
making it difficult or impossible to run tasks that are dependent on fast
access to data.
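
The master/worker pattern and the chunking of data sets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a thread pool stands in for the cluster's nodes, and the function names, the chunk size and the sum-of-squares task are all hypothetical, not taken from any real cluster software.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Break a data set into pieces small enough for one node to handle."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def worker_sum_of_squares(chunk):
    """Hypothetical task run independently on one node's chunk."""
    return sum(x * x for x in chunk)


def master_run(data, chunk_size, num_workers):
    """The master hands out chunks and combines the partial results."""
    chunks = split_into_chunks(data, chunk_size)
    # Assumption: a thread pool stands in for networked worker nodes.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(master_run(list(range(1000)), chunk_size=100, num_workers=4))
```

The sketch works because each chunk can be processed without looking at any other chunk. Tasks that need fast access to the whole data set at once do not split this cleanly, which is the limitation the article describes.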
Don Becker, who pioneered Beowulf clustering, wrote many of the earliest
network device drivers, and is founder and chief technical officer of
clustering company Scyld Computing, confirmed that today's commodity Linux
clusters are up to most, but not all, computing tasks.
But a new day in clustering history may be dawning. SGI claims its latest
offering, called the Altix, can tackle the tasks that commodity Linux clusters
cannot handle. The Altix is a new approach to Linux clustering that puts up to
64 processors in each node -- a far cry from the one or two processors per
node in typical cluster systems.
The Altix also uses a different approach to handling memory, called
non-uniform memory access (NUMA). This means Altix machines can have much larger
shared memory spaces -- up to 512 GB of memory per 64-processor node -- than
commodity Linux clusters, and can even share memory among nodes.
All of this sounds pretty impressive, but how will the Altix be used?
"Some kinds of weather codes can't be done well on clusters today, and might
be appropriate for [the SGI Altix] solution," Becker told NewsFactor. "There
are certain math-oriented problems that require a large shared-memory
model."
Jason Pettit, SGI's Altix 3000 product manager, told NewsFactor that the
company made quite a few changes in the Linux kernel and surrounding tools to
accommodate the new approach to clustering. Some of these changes have
focused on base-level hardware support. Pettit noted that SGI has been working
on Altix support for NUMA platforms produced by other vendors (such as
IBM).
According to Pettit, the SGI Altix uses a patched Linux kernel in the 2.4
series (2.4.19) and version 2.2.4 of the GNU C Library (glibc). He said the company
also has worked on "tools for managing jobs on NUMA systems. You don't want
your memory over by processor number 64 when your job is on processor number
one."
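
Pettit's point about memory placement can be illustrated with a toy placement rule in Python. This is a sketch only, not SGI's tooling: the function name is hypothetical, and it assumes that the "distance" between two nodes is simply the difference of their index numbers, which a real NUMA machine would replace with its actual topology.

```python
def pick_memory_node(job_node, needed_gb, free_gb_by_node):
    """Return the node nearest to job_node that has enough free memory.

    free_gb_by_node maps a node number to its free memory in GB.
    Returns None if no single node can satisfy the request.
    """
    candidates = [
        node for node, free_gb in free_gb_by_node.items()
        if free_gb >= needed_gb
    ]
    if not candidates:
        return None
    # Assumption: distance is the gap between node numbers; ties go to
    # the lower-numbered node.
    return min(candidates, key=lambda node: (abs(node - job_node), node))


if __name__ == "__main__":
    free = {0: 2, 1: 8, 15: 64}
    # A job on node 0 that needs 4 GB is placed on node 1, not node 15.
    print(pick_memory_node(0, 4, free))
```

A job running on node 0 gets memory on node 0 when it fits, and otherwise on the nearest node that has room, rather than on whichever node happens to have the most free memory.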
According to Becker, this is difficult stuff. "If you want a single kernel
to run all 64 processors, you must change the scheduler and memory management.
Locking on the network stack needs to be significantly updated and perhaps the
locking on the file system." In addition, he said, the changes required to
make the Linux kernel run efficiently on a 64-processor machine create
challenges for machines with fewer CPUs. "Adding finer-grained locks makes the
kernel less suitable [and less efficient] for one- or two-processor
machines."
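
The trade-off Becker describes can be shown with a small Python sketch that contrasts one coarse lock with many fine-grained locks. It is an illustration of the general idea, not kernel code: the class names and the counter table are hypothetical, and Python threads will not reproduce the performance effects seen on a 64-processor machine.

```python
import threading


class CoarseTable:
    """One lock guards every counter: simple, but all threads contend."""

    def __init__(self, size):
        self._lock = threading.Lock()
        self._counts = [0] * size

    def increment(self, index):
        with self._lock:
            self._counts[index] += 1

    def total(self):
        with self._lock:
            return sum(self._counts)


class FineTable:
    """One lock per counter: less contention, more locks to manage."""

    def __init__(self, size):
        self._locks = [threading.Lock() for _ in range(size)]
        self._counts = [0] * size

    def increment(self, index):
        with self._locks[index]:
            self._counts[index] += 1

    def total(self):
        # A consistent snapshot needs every lock, taken in a fixed order.
        for lock in self._locks:
            lock.acquire()
        try:
            return sum(self._counts)
        finally:
            for lock in self._locks:
                lock.release()


def run_workers(table, size, num_threads, increments_per_thread):
    """Have several threads increment counters, then return the total."""
    def work():
        for i in range(increments_per_thread):
            table.increment(i % size)

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return table.total()


if __name__ == "__main__":
    print(run_workers(CoarseTable(8), 8, 4, 1000))
    print(run_workers(FineTable(8), 8, 4, 1000))
```

Both tables give the same answer. The fine-grained version lets threads that touch different counters proceed without waiting for each other, but it needs more locks and more bookkeeping, and that overhead buys nothing when only one or two processors are competing.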
Andy Fenselau, SGI's Altix 3000 product line manager, told NewsFactor that
SGI's patched kernel is "an interim solution until the standard kernel
solution is adequate." Additionally, Fenselau noted, SGI plans to play nice
with the Linux community, with some provisions. "We're going to be good
community citizens and share the technology but we also need to keep some of
the 'special sauce' exclusive to our system."
Aside from addressing high numbers of processors, the choice of CPU itself
is not trivial. Becker put it pretty simply, saying, "It's a difficult
decision to pick a processor today." The SGI Altix uses Intel's Itanium 2, a
64-bit processor that can address much more memory than the 32-bit Intel and
AMD chips that are typically used in commodity clusters.
Fenselau said SGI chose Intel over other chips because of the support Intel
has given to the new processor family. "Intel is investing close to US$200
million in creating the ecosystem [compilers and other tools] for the Itanium
processor family. AMD isn't doing that."
According to Fenselau, SGI is going after companies and institutions that
want to avoid proprietary solutions while still getting high performance.
"[Many] of our target markets at the end of the day are pretty frustrated with
proprietary platforms that have been the mainstay of those markets. It's
hostage technology."
Until now, Fenselau said, "the options have been to switch to another
proprietary platform, or to go to a standards-based Linux cluster but with
some very real compromises in functionality. They've grown quite concerned
over the compromises."
Fenselau said SGI's Altix represents a middle road between high-priced
proprietary platforms and low-cost commodity Linux clusters. "We're looking at
pricing that is one-third to half of proprietary high-performance computing,
and looking for a price premium over the commodity clusters," he said.
Although the price of the Altix will be somewhat higher than that of an
average Linux cluster, Fenselau told NewsFactor that the SGI solution
will be worth the cost. "The delivered performance and productivity will more
than compensate for the premium that we're looking for over these [commodity]
systems."
The Altix has already performed well in benchmarks. For example, it
achieved the highest score on record for Standard Performance Evaluation
Corporation (SPEC) tests, according to SGI. The question now, according to
Becker, is whether SGI can "translate the technical prowess into a market
advantage.… It does come down to [the] price/performance ratio of the
resulting machine."