December 03, 2007
SuperComputing 2007 in Reno, Nev., saw the second “interoperability fest” involving the OGF’s HPC Basic Profile (HPCBP) specification. Groups from academia, research and industry came together to demonstrate the interoperability of the core specification and to prototype extensions that support more advanced capabilities, such as file staging.
High Performance Computing Basic Profile (HPCBP) Background
The HPCBP specification has been developed by the OGF’s HPC Profile Working Group over the last 18 months to define how to submit, monitor and manage jobs using standard mechanisms across different job schedulers or grid middleware from different software providers. A significant milestone was passed in August 2007 when the HPC Profile Working Group’s first specification, the HPC Basic Profile 1.0, was published as a proposed recommendation. As part of the process of moving the HPCBP from a proposed to a full recommendation, it is necessary to gather experience of implementing the specification and to demonstrate interoperability between two or more independent implementations. The working group, comprising individuals from industry, academia and research, decided to use SC07 as the venue for an “interoperability fest” of the published specifications.
The HPCBP is focused just on managing job submission, yet it provides sufficient core functionality to enable its use as a basis for integration into applications (both desktop and Web-based), its use as a basis for meta-scheduling and its use from within workflow engines. It leverages standards developed within the OGF (the Basic Execution Service and the Job Submission Description Language specification) and those from the broader Web services community (WS-Security, WSDL and SOAP).
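To make the role of the Job Submission Description Language concrete, the following is a minimal sketch of how a client might build a job description before handing it to an HPCBP-compatible service. It is an illustration only: the function name `build_job_definition` is hypothetical, and the namespace URIs and element names are written from memory of the JSDL 1.0 and HPC Profile Application extension documents, so they should be checked against the published specifications before use.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URIs as recalled from the JSDL 1.0 and HPC Profile
# Application extension documents; verify against the published specifications.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
HPCPA_NS = "http://schemas.ggf.org/jsdl/2006/07/jsdl-hpcpa"


def build_job_definition(job_name, executable, arguments):
    """Return a minimal JSDL JobDefinition as an XML string (hypothetical helper)."""
    # Register readable prefixes so the serialized XML is easier to inspect.
    ET.register_namespace("jsdl", JSDL_NS)
    ET.register_namespace("jsdl-hpcpa", HPCPA_NS)

    job_def = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    job_desc = ET.SubElement(job_def, f"{{{JSDL_NS}}}JobDescription")

    # Optional human-readable name for the job.
    ident = ET.SubElement(job_desc, f"{{{JSDL_NS}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL_NS}}}JobName").text = job_name

    # The application to run, expressed with the HPC Profile Application extension.
    app = ET.SubElement(job_desc, f"{{{JSDL_NS}}}Application")
    hpc_app = ET.SubElement(app, f"{{{HPCPA_NS}}}HPCProfileApplication")
    ET.SubElement(hpc_app, f"{{{HPCPA_NS}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(hpc_app, f"{{{HPCPA_NS}}}Argument").text = arg

    return ET.tostring(job_def, encoding="unicode")


if __name__ == "__main__":
    print(build_job_definition("demo", "/bin/echo", ["hello", "world"]))
```

In a real deployment, a document like this would travel inside a SOAP message to the service, with WS-Security handling authentication; that transport layer is deliberately left out of the sketch.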
At SC06, more than a dozen groups drawn from industry, academia and research demonstrated interoperability between their prototype implementations of this Web service on their existing job submission infrastructures. This year, with the HPCBP specification now a proposed OGF standards recommendation, many of these groups returned to show how their longer-term plans for integrating the HPCBP were proceeding. Several commercial organizations were using prototypes developed by their engineering teams as part of their plans for inclusion in their products. Teams from academia and research were showing how their implementations could be deployed and used with large-scale grid infrastructure deployments.
SC07 Demonstration Results
Participants in this year’s activity included Altair Engineering, Microsoft, Platform Computing, the London e-Science Centre at Imperial College London on behalf of OMII-UK, the University of Virginia e-Science group, and representatives from the EGEE and NIC/FZJ groups within the OMII-Europe project.
All of the participants were demonstrating their implementations of the HPCBP, revised from last year’s versions due to the final changes made in the specification during the public comment period, and verifying their continued interoperability with other participating implementations. This interoperability work had been greatly accelerated through the use of a Web-based compatibility tester developed by the University of Virginia e-Science group. The portal allows users to run a series of tests, derived from the HPCBP specification, to verify that their endpoint is compliant with the standard by seeing how it accepts and generates the XML messages and responds to failures or incorrect messages.
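The kind of checking described above can be sketched in a few lines. This is not the University of Virginia tester, whose design is not described here; it is a minimal illustration, under stated assumptions, of a harness that sends valid and deliberately broken messages to an endpoint and checks whether the response is well-formed XML and whether a fault is returned when one is expected. The names `run_checks` and `stub_endpoint`, and the simplified un-namespaced envelopes, are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_fault(text):
    """Return True if the response parses and contains an element named Fault."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # Compare local names so the check works with or without a namespace.
    return any(el.tag.rsplit("}", 1)[-1] == "Fault" for el in root.iter())


def run_checks(send, cases):
    """Run each (name, request, expect_fault) case; return {name: passed}."""
    results = {}
    for name, request, expect_fault in cases:
        response = send(request)
        results[name] = is_well_formed(response) and is_fault(response) == expect_fault
    return results


def stub_endpoint(request):
    """Hypothetical stand-in for a real service: faults on unparseable input."""
    if is_well_formed(request):
        return "<Envelope><Body><CreateActivityResponse/></Body></Envelope>"
    return "<Envelope><Body><Fault><faultstring>bad request</faultstring></Fault></Body></Envelope>"


if __name__ == "__main__":
    cases = [
        ("valid request", "<CreateActivity/>", False),
        ("malformed request", "<CreateActivity>", True),
    ]
    for name, passed in run_checks(stub_endpoint, cases).items():
        print(name, "PASS" if passed else "FAIL")
```

Passing the transport in as the `send` callable keeps the checks independent of any particular Web service stack, so the same cases could in principle be pointed at any endpoint.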
From this groundwork, the interoperability testing that took place in the run-up to SC07 and at the meeting was broadly very successful. Jobs passed from clients to services running on different operating systems (different versions of Linux and Windows, for example), using different Web service stacks (including Windows Communication Foundation, gSOAP, Axis and XFire) and interfacing to different job scheduling systems (including PBS, LSF, Windows Compute Cluster Server v1 & v2, SGE, Torque, ARC, CREAM and Globus). The minor issues that continue to be found relate to the interoperability of different Web service hosting environments (for example, how WS-Security is handled) rather than to any fundamental problem with the HPCBP specification.
Commercial Adoption Plans
Several commercial organizations also announced plans to provide HPCBP-compatible Web services in their products.
Platform Computing has contributed its implementation to an open source project (BES++ hosted on SourceForge) that uses the gSOAP toolkit to submit jobs through the HPCBP into LSF. Work is ongoing at the University of Virginia to extend this software to submit jobs into other schedulers, such as PBS. This software will be integrated as part of Platform’s product line in 2008.
Microsoft was demonstrating a prototype of an HPCBP-compatible Web service running on Windows HPC Server 2008, the next version of Microsoft’s HPC product due out in the second half of 2008. It uses the Windows Communication Foundation as its Web service stack and it is currently planned for inclusion in the second beta of HPC Server 2008.
Altair Engineering also had a prototype integration of an HPCBP-compatible client in its PBS Professional product that demonstrated the use of this specification for meta-scheduling. Jobs submitted into PBS using the conventional command line utilities could be transferred to another HPCBP-compliant resource. At SC07, jobs were being submitted into a PBS instance running on Linux on the Altair stand and being executed through the HPCBP Web service on a Windows machine on the Microsoft stand.
Next Steps
After SC07, work is continuing within the working group to capture our interoperability experiences in order to support the migration of the HPCBP specification from a proposed to a full recommendation. These experiences are also being used to refine the implementations that will emerge as products in 2008. The practical experience gained at SC07 with the Data Staging and Activity Credential extensions will be used to develop a single extension focused exclusively on integrating File Staging operations that need different credentials with the current JSDL specification already used in the HPCBP. It is hoped that this specification will be submitted into the OGF process in early 2008.