
DAILY NEWS AND INFORMATION FOR THE GLOBAL GRID COMMUNITY / MARCH 3, 2003: VOL. 2 NO. 9


Special Features:

CAN GRIDS/CLUSTERS OVERTAKE SUPERCOMPUTERS?

Until a few years ago, the Top 500 list of supercomputers was dominated by proprietary solutions with prices that put high-performance computing (HPC) out of reach for all but the most well-funded institutions and companies.

Then came Linux. The open source OS has gate-crashed the Top 500 list in a big way over the past few years, clocking in with more than 50 entries in the November 2002 survey. These Linux supercomputers are clusters, meaning they are composed of a number of relatively low-end machines working together over a network. But clusters also have some drawbacks, including slower access to data than proprietary solutions.

Can clusters conquer their shortcomings and overtake proprietary supercomputers?

SGI, for one, thinks so. The company recently unveiled a system that it says represents a giant step toward that goal. As a result, the world of supercomputing may soon have a new middleweight, or at least mid-price, champion.

Typically, Linux supercomputing clusters link a number of commodity machines that work on problems in parallel, with one "master" node directing the work and many "slave" nodes working together. These clusters can be composed of just a few machines that handle Web serving, or of hundreds or thousands of nodes that perform high-performance computing tasks that once would have required a supercomputer.

Most Linux clusters use off-the-shelf hardware, typically white-box computers that have one or two processors per machine. Because they use standard hardware, the costs of clusters are far below those of proprietary supercomputers, though those costs still are not insignificant when deploying hundreds of nodes.

On the flip side, clusters lack some of the features that make the big iron attractive. For example, clustered systems are limited by the amount of data that can be processed by each node. Data sets must be broken down into small chunks that can be handled by an individual machine, and communication between machines has much higher latency and lower bandwidth than the interconnect inside a supercomputer -- making it difficult or impossible to run tasks that are dependent on fast access to data.
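The chunking pattern described above can be illustrated with a small sketch. It is not code from any cluster product: threads on a single machine stand in for cluster nodes, and the names split_into_chunks, worker_task and master are hypothetical, chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Break a data set into pieces small enough for one node to handle."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def worker_task(chunk):
    """Work done by one worker node: here, just a partial sum of its chunk."""
    return sum(chunk)


def master(data, chunk_size, num_workers):
    """The master hands out chunks and combines the partial results."""
    chunks = split_into_chunks(data, chunk_size)
    # Threads stand in for cluster nodes (an assumption made for illustration).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(worker_task, chunks))
    return sum(partial_sums)


total = master(list(range(1000)), chunk_size=100, num_workers=4)
print(total)  # 499500
```

A sum splits cleanly because each chunk can be processed without looking at any other chunk. Problems in which every step needs data held by other nodes do not decompose this way, which is where inter-node latency starts to dominate.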

Don Becker, who pioneered Beowulf clustering, wrote many of the earliest network device drivers, and is founder and chief technical officer of clustering company Scyld Computing, confirmed that today's commodity Linux clusters are up to most, but not all, computing tasks.

But a new day in clustering history may be dawning. SGI claims its latest offering, called the Altix, can tackle the tasks that commodity Linux clusters cannot handle. The Altix is a new approach to Linux clustering that puts up to 64 processors in each node -- a far cry from the one or two processors per node in typical cluster systems.

The Altix also uses a different approach to handling memory, called non-uniform memory access (NUMA). This means Altix machines can have much larger shared memory spaces -- up to 512 GB of memory per 64-processor node -- than commodity Linux clusters, and can even share memory among nodes.

All of this sounds pretty impressive, but how will the Altix be used?

"Some kinds of weather codes can't be done well on clusters today, and might be appropriate for [the SGI Altix] solution," Becker told NewsFactor. "There are certain math-oriented problems that require a large shared-memory model."

Jason Pettit, SGI's Altix 3000 product manager, told NewsFactor that the company made quite a few changes in the Linux kernel and surrounding tools to accommodate the new approach to clustering. Some of these changes have focused on base-level hardware support. Pettit noted that SGI has been working on Altix support for NUMA platforms produced by other vendors (such as IBM).

According to Pettit, the SGI Altix uses a patched Linux kernel in the 2.4 series (2.4.19) and version 2.2.4 of the GNU C Library (glibc). He said the company also has worked on "tools for managing jobs on NUMA systems. You don't want your memory over by processor number 64 when your job is on processor number one."

According to Becker, this is difficult stuff. "If you want a single kernel to run all 64 processors, you must change the scheduler and memory management. Locking on the network stack needs to be significantly updated and perhaps the locking on the file system." In addition, he said, the changes required to make the Linux kernel run efficiently on a 64-processor machine create challenges for machines with fewer CPUs. "Adding finer-grained locks makes the kernel less suitable [and less efficient] for one- or two-processor machines."
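The locking trade-off Becker describes can be sketched in miniature. The example below is not kernel code and is not taken from SGI's patches; CoarseCounterTable, FineCounterTable and hammer are hypothetical names, and Python threads are used only to show the structure of the two approaches, not their real performance.

```python
import threading


class CoarseCounterTable:
    """One lock guards every slot: simple and cheap when few threads compete."""

    def __init__(self, num_slots):
        self.slots = [0] * num_slots
        self.lock = threading.Lock()

    def increment(self, slot):
        with self.lock:  # every caller waits on the same lock
            self.slots[slot] += 1


class FineCounterTable:
    """One lock per slot: callers touching different slots never wait on
    each other, at the cost of more locks to create and manage."""

    def __init__(self, num_slots):
        self.slots = [0] * num_slots
        self.locks = [threading.Lock() for _ in range(num_slots)]

    def increment(self, slot):
        with self.locks[slot]:  # only callers of this slot contend
            self.slots[slot] += 1


def hammer(table, num_threads, increments_per_thread):
    """Run several threads against a table and return the total count."""
    def run(thread_index):
        slot = thread_index % len(table.slots)
        for _ in range(increments_per_thread):
            table.increment(slot)

    threads = [threading.Thread(target=run, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(table.slots)


print(hammer(CoarseCounterTable(8), 8, 1000))  # 8000
print(hammer(FineCounterTable(8), 8, 1000))    # 8000
```

Both tables give the same answer. The difference is in who has to wait: with many processors the single lock becomes a bottleneck, while on a one- or two-processor machine the extra locks in the fine-grained version are mostly bookkeeping overhead.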

Andy Fenselau, SGI's Altix 3000 product line manager, told NewsFactor that SGI's patched kernel is "an interim solution until the standard kernel solution is adequate." Additionally, Fenselau noted, SGI plans to play nice with the Linux community, with some provisions. "We're going to be good community citizens and share the technology but we also need to keep some of the 'special sauce' exclusive to our system."

Aside from addressing high numbers of processors, the choice of CPU itself is not trivial. Becker put it pretty simply, saying, "It's a difficult decision to pick a processor today." The SGI Altix uses Intel's Itanium 2, a 64-bit processor that can address much more memory than the 32-bit Intel and AMD chips that are typically used in commodity clusters.
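A little arithmetic shows why pointer width matters here. The sketch below assumes a flat address space and treats the article's "512 GB" as 512 binary gigabytes; the helper names are hypothetical, and real processors implement fewer address bits than their nominal pointer width.

```python
GIB = 2 ** 30  # one binary gigabyte, in bytes


def addressable_bytes(pointer_bits):
    """Upper bound on bytes reachable with a flat pointer of this width."""
    return 2 ** pointer_bits


def fits_in_address_space(memory_bytes, pointer_bits):
    """True if the memory could be covered by a pointer of this width."""
    return memory_bytes <= addressable_bytes(pointer_bits)


# Memory of one 64-processor Altix node, per the figure quoted in the article.
altix_node_memory = 512 * GIB

print(addressable_bytes(32) // GIB)                  # 4
print(fits_in_address_space(altix_node_memory, 32))  # False
print(fits_in_address_space(altix_node_memory, 64))  # True
print(altix_node_memory // 64 // GIB)                # 8 (GiB per processor)
```

A 32-bit pointer tops out at 4 GiB, far short of the shared memory space quoted for a single Altix node, which works out to roughly 8 GiB per processor.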

Fenselau said SGI chose Intel over other chips because of the support Intel has given to the new processor family. "Intel is investing close to US$200 million in creating the ecosystem [compilers and other tools] for the Itanium processor family. AMD isn't doing that."

According to Fenselau, SGI is going after companies and institutions that want to avoid proprietary solutions while still getting high performance. "[Many] of our target markets at the end of the day are pretty frustrated with proprietary platforms that have been the mainstay of those markets. It's hostage technology."

Until now, Fenselau said, "the options have been to switch to another proprietary platform, or to go to a standards-based Linux cluster but with some very real compromises in functionality. They've grown quite concerned over the compromises."

Fenselau said SGI's Altix represents a middle road between high-priced proprietary platforms and low-cost commodity Linux clusters. "We're looking at pricing that is one-third to half of proprietary high-performance computing, and looking for a price premium over the commodity clusters," he said.

Although the price of the Altix will be somewhat higher than that of an average Linux cluster, Fenselau told NewsFactor that the SGI solution will be worth the cost. "The delivered performance and productivity will more than compensate for the premium that we're looking for over these [commodity] systems."

The Altix has already performed well in benchmarks. For example, it achieved the highest score on record for Standard Performance Evaluation Corporation (SPEC) tests, according to SGI. The question now, according to Becker, is whether SGI can "translate the technical prowess into a market advantage.… It does come down to [the] price/performance ratio of the resulting machine."
