June 18, 2007
Evergrid Inc., a provider
of global resource management software for next-generation datacenters, announced the Evergrid Cluster Availability Management
Suite (CAMS), a new continuous availability and resource management
software solution for high-productivity computing grid environments and
the utility enterprise datacenter. CAMS manages server clusters from
power-on through operating system provisioning and application
scheduling to load management. CAMS is integrated with Evergrid's
Availability Management Service (AvS) to provide checkpoint/resume
capabilities for applications, including massively parallel distributed
applications. With CAMS, batch applications run at near 100 percent
reliability.
Evergrid provides transparent fault tolerance using an OS
abstraction layer that loads between the operating system (OS) and the
application. Without modifying either the application or the operating
system, CAMS/AvS periodically captures the collective state of the
application across the entire infrastructure while the application
continues processing. By recording the state of an application together
with the associated OS and system state, Evergrid can checkpoint and resume
from failures or interruptions rapidly and with minimal overhead. Even
failure of multiple servers or of software systems does not stop an
application from being able to resume processing from a checkpoint.
Evergrid's recovery is aimed especially at long-running, multi-server
batch jobs whose runtime is limited by the inherent
reliability characteristics of software and hardware. The patented
checkpoint/resume technology also allows transparent stateful job
preemption and application migration of batch workloads on multiple
servers. Moreover, recovery and preemptive scheduling of
applications can be done globally, scaling across geographically
dispersed datacenters.
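For readers unfamiliar with checkpoint/resume, the general idea can be illustrated with a minimal Python sketch. This is not Evergrid's implementation: CAMS/AvS works transparently beneath the application, whereas this sketch checkpoints explicitly at the application level. The function names, the checkpoint file name, and the `fail_at` parameter are all hypothetical and exist only for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename, so a crash never leaves a partial checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"next_step": 0, "total": 0}


def run_job(path, steps, interval, fail_at=None):
    """Sum 0..steps-1, saving a checkpoint every `interval` completed steps.

    `fail_at` is a hypothetical knob that simulates a crash at that step.
    """
    state = load_checkpoint(path)
    for step in range(state["next_step"], steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated failure at step {step}")
        state["total"] += step
        state["next_step"] = step + 1
        if state["next_step"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]


def demo():
    """Crash a job partway through, then resume it from the last checkpoint."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "job.ckpt")
        try:
            run_job(path, steps=100, interval=10, fail_at=57)
        except RuntimeError:
            pass  # the job "crashed"; only work since the last checkpoint is lost
        resumed_from = load_checkpoint(path)["next_step"]
        total = run_job(path, steps=100, interval=10)
        return resumed_from, total
```

Calling `demo()` simulates a failure at step 57, restarts the job from the checkpoint taken after step 49, and still produces the same total as an uninterrupted run. The checkpoint interval sets the trade-off between overhead and the amount of work that must be repeated after a failure.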
"What differentiates Evergrid from other solutions that attempt to
solve the checkpoint problem is our ability to scale up to thousands of
nodes, with less than five percent performance overhead and without OS
or application changes," said Dave Anderson, CEO of Evergrid. "You
can't get this capability anywhere else."
The Evergrid Cluster Availability Management Suite (CAMS) consists
of two products, Evergrid Availability Services (AvS-Batch) and
Evergrid Resource Manager (RM-Batch). Evergrid AvS-Batch captures the
collective state of single or multiple nodes running distributed
applications and prevents downtime by performing checkpoint, migration
and recovery of the application, thus providing automatic failover
across multiple nodes and tiers. Evergrid RM-Batch allows efficient
allocation of resources and stateful preemptive scheduling of jobs.
CAMS ensures that no compute cycle is lost by recovering, migrating or
preempting jobs. This translates to greater flexibility, reliability
and utilization of computing resources.
"Software solutions that minimize downtime for compute-intensive
applications, improve job execution, and minimize job preemption while
maximizing utilization of servers will fundamentally change how we
serve our user community," said Henry Neeman, director of the OU
Supercomputing Center for Education & Research (OSCER) at the
University of Oklahoma.
Evergrid's software is designed for demanding, compute-intensive
sectors such as manufacturing, financial services, and pharmaceutical
and petrochemical research. Currently, Evergrid solutions target
high-performance technical computing (HPTC) applications that are
computationally intensive and use high-speed interconnects. In the near
future, Evergrid will also provide solutions for the high-performance
enterprise computing (HPEC) and online transaction processing (OLTP)
database and enterprise application markets.
Evergrid licenses its Cluster Availability Management Suite software
on a per-socket, annual subscription basis, with substantial discounts
for large deployments. Evergrid's Availability Management Service can
be licensed separately for integration with other resource managers.
Currently, CAMS and AvS are implemented on multiple versions of Linux.
Both the Cluster Availability Management Suite and the Availability Management
Service are available immediately from Evergrid.
About Evergrid Inc.
Evergrid, a provider of global resource management software for
next-generation datacenters, lets massively parallelized, distributed
applications run properly on high-performance cluster grids, at near
100 percent reliability. Evergrid's fault-tolerant application
virtualization software prevents downtime, automates checkpoint,
migration, and recovery of applications, and scales to thousands of
nodes, with less than 5 percent performance overhead.
Evergrid's leadership team brings extensive management and technology expertise from IBM, Amdahl, VERITAS, Motorola, Tandem Computers and the Virginia Polytechnic Institute and State University. Evergrid is a private company that is funded, in part, by Menlo Ventures and the Acartha Group. For more information, visit www.evergrid.com/.
Source: Evergrid Inc.