Scientific Applications:
SDSC RELEASES NEW VERSION OF DATA-GRID SOFTWARE
The San Diego Supercomputer Center (SDSC) at UCSD has released a new version
of its popular software that enables scientists to create, manage, and
collaborate with data collections located on heterogeneous resources
distributed across a network. Version 2.0.0 of SDSC's Storage Resource Broker
(SRB) preserves existing capabilities of the earlier version for current
users, while also providing a greater number of faster, more powerful services
in an easier-to-use interface. The software, user manual, and release notes
are available at www.npaci.edu/dice/srb/.
"These growing capabilities are enabling new science to be done in new ways,"
said Reagan Moore, co-director of SDSC's Data and Knowledge Systems (DAKS)
program and lead of the Data-Intensive Computing Environments (DICE) group
thrust in the National Partnership for Advanced Computational Infrastructure
(NPACI). "Working closely with the computational science community, we have
incorporated user requests over the last two years into version 2.0 so that
the SRB addresses a growing array of scientific needs."
Research groups in a variety of fields are using or planning to use the SDSC
SRB software to integrate, manage, and access explosively growing data
collections. Developed by Moore, Arcot Rajasekar, Michael Wan, and colleagues
in SDSC's DAKS program, the SRB is being used in projects ranging from
helping astronomers integrate multi-terabyte image collections in the National
Science Foundation's National Virtual Observatory to enabling NIH-funded
neuroscientists to share brain data across the country in the Biomedical
Informatics Research Network. The National Archives and Records Administration
is using the software to develop persistent archives, NASA is using it to
merge massive sets of satellite data, and other groups are employing the SRB
to bring together diverse types of environmental data.
In general, the SRB offers many advantages over traditional file systems. What
appears as a single collection to the user of the software is actually a
virtual collection consisting of digital entities scattered across
distributed, heterogeneous storage resources, including file systems,
archives, and databases. The SRB makes all these differences transparent to
users. It negotiates all protocols and access permissions across the multiple
sites so that users can access data based on familiar, user-defined names of
data attributes. This frees them from having to keep track of such
complexities as file names, physical locations, protocols, and security
arrangements. The SRB not only supports more efficient science at the
researcher level, but it also enables rapid collaborations never before
possible.
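
The idea of a virtual collection can be illustrated with a minimal sketch. This is not SRB code or its API; the class names, resource names, and paths below are hypothetical, chosen only to show how a catalog might map one logical name and its attributes to replicas held on different kinds of storage.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # hypothetical storage resource name
    protocol: str       # hypothetical access method, e.g. "unix-fs" or "archive"
    physical_path: str  # where the bytes actually live on that resource


@dataclass
class VirtualCollection:
    # logical name -> (attribute dict, list of replicas)
    entries: dict = field(default_factory=dict)

    def register(self, logical_name, attributes, replica):
        """Record a replica (and any attributes) under a logical name."""
        attrs, replicas = self.entries.setdefault(logical_name, ({}, []))
        attrs.update(attributes)
        replicas.append(replica)

    def locate(self, logical_name):
        """Return every physical replica behind a logical name."""
        return self.entries[logical_name][1]

    def find(self, **wanted):
        """Return logical names whose attributes match all given values."""
        return sorted(
            name for name, (attrs, _) in self.entries.items()
            if all(attrs.get(key) == value for key, value in wanted.items())
        )


# Illustrative data only: one object with two replicas, one with a single replica.
catalog = VirtualCollection()
catalog.register("sky/m31.fits", {"instrument": "survey-A"},
                 Replica("disk1", "unix-fs", "/data/a/0001"))
catalog.register("sky/m31.fits", {},
                 Replica("tape1", "archive", "vol7/0001"))
catalog.register("brain/scan42", {"instrument": "mri"},
                 Replica("db1", "database", "scans:42"))
```

In this sketch a user asks for "sky/m31.fits" or searches by attribute, and only the catalog knows that the data sit on a disk, a tape archive, or a database.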
SRB collections are highly scalable, both in size and in distribution across
remote sites. For example, SRB collections at SDSC support more than 6.5
million files and 40 terabytes of data. There are currently more than 200
registered users of the SDSC SRB at more than 50 sites.
NEW FEATURES
The principal new features in version 2.0 of the SRB (the previous version was
1.1.8) include:
- Server-initiated, multi-threaded parallel data transfers, which give the
new version faster and more robust transfers of very large data sets.
- A revamped SRB Administration GUI, now an easy-to-use Java-based client-side
tool that assists in the management of the SRB.
- Ports of the MCAT to Sybase and Postgres databases.
- Improved MCAT metadata catalog functions for tasks such as creating and
deleting users and resources, plus parallel bulk loading of metadata into the
MCAT at more than 400 files per second, a factor of 50 faster for collections
that contain large numbers of small files.
- Its own Mass Storage System (MSS), which uses a new type of "compound
resource" to manage connectivity to tape silos and tape devices, using the SRB
to provide caching and other functionality, without requiring a proprietary
tape management system. The MSS enables users to economically build their own
mass storage system in which data migrate automatically between cache and
tape.
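
To put the bulk-loading figures in perspective, the short calculation below works through the arithmetic. It uses only numbers quoted in this release (400 files per second, a factor of 50, and the 6.5 million files held in SDSC collections) and assumes, purely for illustration, that the rates stay constant over a whole load; the variable names are not part of the SRB.

```python
NEW_RATE = 400      # files per second, from the release ("more than 400")
SPEEDUP = 50        # "a factor of 50 faster"
FILES = 6_500_000   # file count of the SDSC collections cited in the release

# Implied earlier rate if the factor of 50 applies uniformly (an assumption).
old_rate = NEW_RATE / SPEEDUP


def load_hours(files, rate):
    """Hours needed to load `files` metadata entries at `rate` files per second."""
    return files / rate / 3600


new_hours = load_hours(FILES, NEW_RATE)
old_hours = load_hours(FILES, old_rate)

print(f"implied earlier rate: {old_rate:.0f} files/s")
print(f"new load time: {new_hours:.1f} hours")
print(f"earlier load time: {old_hours / 24:.1f} days")
```

Under these assumptions, registering a collection of that size drops from roughly nine days to about four and a half hours.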
"One of the most important new features is the server-driven parallel data
transfers," said Rajasekar, leader of the DAKS SRB development team. "By
incorporating automatic parallel data transfers with up to five threads in a
way that is transparent to users, the software optimizes and matches the
transfer to the network and server export rates, resulting in transfers that
are more robust and two or three times faster." Early tests have already shown
transfer rates at 85 percent of network capability.
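
The general technique of a multi-threaded parallel transfer can be sketched as follows. This is not the SRB implementation: it copies an in-memory buffer rather than moving data over a network, the function names are hypothetical, and the only detail taken from the release is the limit of five threads.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 5  # the release cites "up to five threads"


def split_ranges(size, parts):
    """Divide [0, size) into at most `parts` contiguous (start, end) ranges."""
    parts = max(1, min(parts, size)) if size else 1
    step, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        # The first `extra` ranges take one additional byte each.
        end = start + step + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def parallel_copy(source: bytes, threads: int = MAX_THREADS) -> bytes:
    """Copy `source` by handing disjoint byte ranges to a pool of threads."""
    dest = bytearray(len(source))

    def copy_range(bounds):
        start, end = bounds
        # Each worker writes only to its own slice, so ranges never overlap.
        dest[start:end] = source[start:end]

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(copy_range, split_ranges(len(source), threads)))
    return bytes(dest)


data = bytes(range(256)) * 40
copied = parallel_copy(data)
```

In a real transfer each range would travel over its own connection, which is where the gain in throughput comes from; the sketch shows only how the work is divided and reassembled.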
DAKS researchers on the SDSC SRB project led by Rajasekar include Wan,
Sheau-Yen Chen, Charles Cowart, Lucas Gilbert, Arun Jagatheesan, George Kremenek,
Roman Olschanowsky, Vicky Rowley, Wayne Schroeder, and Bing Zhu.