GRIDtoday

DAILY NEWS AND INFORMATION FOR THE GLOBAL GRID COMMUNITY / JUNE 9, 2003: VOL. 2 NO. 23

Special Features:

NSF REPORT ON CYBERINFRASTRUCTURE AND GRIDS

Revolutionizing Science And Engineering Through Cyberinfrastructure

Report of the National Science Foundation
Blue Ribbon Advisory Panel on Cyberinfrastructure

Executive Summary

This is the final report of a Blue Ribbon Advisory Panel on Cyberinfrastructure, a panel of experts formed and charged by the National Science Foundation (NSF) Assistant Director for the Computer and Information Science and Engineering (CISE) Directorate to evaluate current major investments in cyberinfrastructure and its use, to recommend new areas of emphasis relevant to cyberinfrastructure, and to propose an implementation plan for pursuing them. We carried out this charge through individual interactions with researchers, surveys, testimony, review of prior relevant reports, requests for comments, participation in workshops, and extensive deliberation.

The Panel's overarching finding is that a new age has dawned in scientific and engineering research, pushed by continuing progress in computing, information, and communication technology, and pulled by the expanding complexity, scope, and scale of today's challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive "cyberinfrastructure" on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy.

Such environments and organizations, enabled by cyberinfrastructure, are increasingly required to address national and global priorities, such as understanding global climate change, protecting our natural environment, applying genomics-proteomics to human health, maintaining national security, mastering the world of nanotechnology, and predicting and protecting against natural and human disasters, as well as to address some of our most fundamental intellectual questions such as the formation of the universe and the fundamental character of matter.

The Panel's overarching recommendation is that the National Science Foundation should establish and lead a large-scale, interagency, and internationally coordinated Advanced Cyberinfrastructure Program (ACP) to create, deploy, and apply cyberinfrastructure in ways that radically empower all scientific and engineering research and allied education. We estimate that sustained new NSF funding of $1 billion per year is needed to achieve critical mass and to leverage the coordinated coinvestment from other federal agencies, universities, industry, and international sources necessary to empower a revolution.

The cost of not acting quickly or at a subcritical level could be high, both in opportunities lost and in increased fragmentation and balkanization of the research communities.

The amounts of calculation and the quantities of information that can be stored, transmitted, and used are exploding at a stunning, almost disruptive rate. Vast improvements in raw computing power, storage capacity, algorithms, and networking capabilities have led to fundamental scientific discoveries inspired by a new generation of computational models that approach scientific and engineering problems from a broader and deeper systems perspective. Scientists in many disciplines have begun revolutionizing their fields by using computers, digital data, and networks to extend and even replace traditional techniques. Online digital instruments and wide-area arrays of sensors are providing more comprehensive, immediate, and higher-resolution measurement of physical phenomena. Powerful "data mining" techniques operating across huge sets of multidimensional data open new approaches to discovery. Global networks can link all these together and support more interactivity and broader collaboration.

A central goal of ACP is to define and build cyberinfrastructure that facilitates the development of new applications, allows applications to interoperate across institutions and disciplines, ensures that data and software acquired at great expense are preserved and easily available, and empowers enhanced collaboration over distance, time, and disciplines. The individual disciplines must take the lead in defining specialized software and hardware environments for their fields based on common cyberinfrastructure, but in a way that encourages them to give back results for the general good of the research enterprise.

The emerging vision is to use cyberinfrastructure to build more ubiquitous, comprehensive digital environments that become interactive and functionally complete for research communities in terms of people, data, information, tools, and instruments and that operate at unprecedented levels of computational, storage, and data transfer capacity. Increasingly, new types of scientific organizations and support environments for science are essential, not optional, to the aspirations of research communities and to broadening participation in those communities. They can serve individuals, teams, and organizations in ways that revolutionize what they can do, how they do it, and who participates. This vision also has profound broader implications for education, commerce, and social good.

Our findings are supported by substantial grass roots activity in research communities. The Internet, World Wide Web, and supercomputing have already provided new tools for science, but glimpses of much more powerful and comprehensive environments for discovery and learning can be seen in a landscape of projects focusing on creating advanced cyberinfrastructure and/or using it to create new knowledge environments for specific fields of science.

Included in this landscape are the NSF Partnerships for Advanced Computational Infrastructure (PACI), the Pittsburgh Terascale Computing System (TCS), and the Distributed Terascale Facility (DTF) that "grids" together resources at all these centers plus others. Also included are a series of NSF networking, digital library, scientific database, advanced interface, and middleware research initiatives. Through the NSF Information Technology Research (ITR) initiative and other NSF programs, projects have emerged from many disciplines involving computer science and engineering researchers working to develop and use cyberinfrastructure in specific projects.

Testimony from research communities indicates that many contemporary projects require effective federation of both distributed resources (data and facilities) and distributed, multidisciplinary expertise, and that cyberinfrastructure is a key to making this possible.

There is no standard term for such environments enabled by cyberinfrastructure; some of the names in use are collaboratory, co-laboratory, grid community/network, virtual science community, and e-science community. A few examples are the Network for Earthquake Engineering Simulation (NEES), the Space Physics and Aeronomy Research Collaboratory (SPARC), the National Ecological Observatory Network (NEON), the Grid Physics Network (GriPhyN), the International Virtual Data Grid Laboratory (iVDGL), and the High Energy Physics Collaboratory for the ATLAS project. Research mission agencies are also initiating similar projects, for example, the NIH Biomedical Informatics Research Network (BIRN), the Department of Energy (DOE) National Collaboratories Program, and the DOE program in Scientific Discovery through Advanced Computing (SciDAC). Relevant international activities include the UK E-science program, parts of the European Union 6th Framework Project, and the Japanese Earth Simulator. Because of the extent of cyberinfrastructure investment under way in other countries and the intrinsic global nature of science, an effective response to our primary finding should be interagency and international in scope.

Achieving this vision will challenge our fundamental understanding of computer and information science and engineering as well as parts of social science, and it will motivate and drive basic research in these areas. We envision radical improvements in cyberinfrastructure and its impact on all science and engineering over time, as work ripens at the intersection of fundamental social and technical research about cyberinfrastructure and its application to advance discovery and learning.

This vision of science and engineering research involves significant educational dimensions. The research community needs more broadly trained personnel with blended expertise in disciplinary science or engineering, mathematical and computational modeling, numerical methods, visualization, and the sociotechnical understanding about working in new grid or collaboratory organizations. Grid and collaboratory environments built on cyberinfrastructure can enable people to work routinely with colleagues at distant institutions, even ones that are not traditionally considered research universities, and with junior scientists and students as genuine peers, despite differences in age, experience, race, or physical ability. These new environments can contribute to science and engineering education by providing interesting resources, exciting experiences, and expert mentoring to students, faculty, and teachers anywhere there is access to the Web. The new tools, resources, human capacity building, and organizational structures emerging from these activities will also eventually have even broader beneficial impact on the future of education at all levels and likely on all types of educational institutions.

A vast opportunity exists for creating new research environments based upon cyberinfrastructure, but there are also significant risks and costs if we do not act quickly and at a sufficient level of investment. The dangers, all increasing with the passage of time, include adoption of incompatible data formats in different fields; permanent loss of observational data due to lack of well-curated, long-term archives; increased technological ("not invented here") balkanizations rather than interoperability among disciplines; wasteful redundant system-building activities among science fields or between science fields and industry; lack of synergy among information technology research, the IT industry, and domain science users resulting in under- or overestimating technological futures; lost opportunity from not driving basic computer science research with advanced applications; loss of leadership to other countries and a falloff of research and economic vigor; lack of understanding of social/cultural barriers to new ways of doing research; inadequate supporting or supported educational activities; and an inadequate, piecemeal cyberinfrastructure program.

We propose a large, long-term, and concerted new effort, not just a linear extension of the current investment level and resources. NSF must recognize that the scope of shared cyberinfrastructure is far broader than in the past – it includes computing cycles, higher capacity networking, massive storage, and managed information. NSF must ensure that the exponentially growing amounts of data are collected, curated, managed, and stored for broad, long-term access by scientists everywhere. The new effort must create and continually renovate a new "high end," so that selected research projects can use centralized resources 100-1000 times faster and bigger than are available locally.

But even this is not sufficient. There must be high-level leadership on shared standards, middleware, and building advanced scientific tools that enable scientists to follow new paths, try new techniques, build better models, and test them in new ways, and that facilitate innovative interdisciplinary activities. The program must also have a component to empower more people and more disciplines to benefit from the use of cyberinfrastructure. It must especially encourage science and engineering communities to exploit the new opportunities that cyberinfrastructure brings for including people who, because of physical capabilities, location, or history, have been excluded from the frontiers of scientific and engineering research and education.

NSF's prior investments provide a sound foundation for the ACP. In particular, the two NSF Partnerships for Advanced Computational Infrastructure (PACI) established in 1997 have been pioneers in activities closely related to the ACP. They have provided high-end computing cycles; developed software tools for helping people to use architecturally diverse machines; supported education, outreach, and training with a special focus on underrepresented groups; and nurtured specific testbed projects for science-driven collaboratories or grids.

Much of the experience and expertise represented in the PACIs is highly relevant to the ACP, and we believe that subject to appropriate review they should be competitive for expanded missions and continuing or expanding resources within ACP.

NSF has both a unique breadth of scientific scope and a mandate for the health of the scientific research enterprise in the U.S., and therefore NSF should lead the ACP for the federal government. We estimate that sustained new funding for NSF of $1 billion per year is required to achieve the critical mass necessary for revolutionary changes, reusable assets and experiences, and to be a true partner with other agencies. Only then will it be able to leverage the coordinated co-investment from other federal agencies, universities, industry, and international sources required to empower a revolution. An NSF-led ACP can be catalytic and provide over-the-horizon views for other agencies, research labs, and education at large.

We estimate that the new funding will be distributed into four coordinated areas: fundamental research to create advanced cyberinfrastructure ($60M); research on the application of cyberinfrastructure to specific fields of science and engineering research ($100M); acquisition and development of production quality software for cyberinfrastructure and supported applications ($200M); and provisioning and operations (including computational centers, data repositories, digital libraries, networking, and application support) ($660M). These are recurring annual figures.
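
As a quick arithmetic check on the four figures above, the following Python sketch sums the areas and computes each area's share of the total. The dictionary keys are shorthand labels chosen here for illustration and are not terms used in the report. The four amounts add up to $1,020 million per year, slightly above the rounded $1 billion headline figure.

```python
# Annual allocations in millions of US dollars, as listed in the report.
# The keys are illustrative shorthand labels, not the report's wording.
allocations_musd = {
    "fundamental research": 60,
    "application research": 100,
    "production software": 200,
    "provisioning and operations": 660,
}

# Sum of the four areas, in millions of dollars per year.
total_musd = sum(allocations_musd.values())

# Fraction of the total assigned to each area.
shares = {area: amount / total_musd for area, amount in allocations_musd.items()}

print(f"total: ${total_musd}M per year")
for area, share in shares.items():
    print(f"{area}: {share:.1%}")
```

On these figures, provisioning and operations accounts for roughly two thirds of the proposed annual funding.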

The opportunity is enormous, but also enormously complex and must be approached in a long-term, comprehensive way with great attention to a management structure that can identify and act on the common interests of a large and varied set of stakeholders. Some of the most critical challenges to this ambitious program are to 1) build real synergy between computer and information science research and development, and its use in science and engineering research and education; 2) capture the cyberinfrastructure commonalities across science and engineering disciplines; 3) use cyberinfrastructure to empower and enable, not impede, collaboration across science and engineering disciplines; 4) exploit technologies being developed commercially and apply them to research applications, as well as feed back new approaches from the scientific realm into the larger world; and 5) engage social scientists to work constructively with other scientists and technologists.

We recommend that the organization of the ACP be overlaid in a matrix fashion on the existing organizational structures of NSF with the addition of a single new coordinating ACP Office (ACPO). Achieving sufficient coordination within the proposed matrix management structure will be formidable; the roles of the ACPO are to provide overall vision and guidance and to exercise budgetary planning and responsibility. Wherever it is administratively placed within NSF, the ACPO must have significant autonomy. Its leader must have fundamental responsibility for achieving the goals of the ACP, with sufficient credibility, power, resources, and authority to succeed in working with all NSF directorates and other domestic and international agencies. Domain science and engineering directorates must take the lead in revolutionizing their respective fields through new research organization and processes, supported by new applications of information technology. CISE must be deeply involved as a technology user and as a technology leader for the overall program. It should also benefit from advanced scientific applications informing and validating its own research.

The ACP requires an organization for internal NSF coordination, as well as a central point of coordination in its external implementation. Several development centers should be devoted to activities at the core of the program. These core activities include the planning, acquisition, integration, and support of the major software platforms and components at the foundation of cyberinfrastructure, as well as the management of consistency and sharing across the program. Human resources are critical to making cyberinfrastructure and applications work, keeping them working, and providing user support. In the interest of funding more grants, NSF has arguably undersupported the recurring costs of permanent staff, preferring to focus resources on direct research costs and "hard" or "tangible" assets. In the ACP, human resources are the primary requirement in both development and operations, and success is clearly dependent on adequate funding both in centers and in the end-user research groups. To be successful, the ACP will require committed champions and leaders from the research community, long-term focus and commitment, innovative organizational structures, and a sustained high level of support and commitment from the upper levels of NSF, other federal agencies, and Congress.

This Panel believes that the National Science Foundation has a once-in-a-generation opportunity to lead the revolution in science and engineering through coordinated development and expansive use of cyberinfrastructure.

This report can be viewed in its entirety at: www.cise.nsf.gov/evnt/reports/toc.htm

Reprinted Courtesy: National Science Foundation
