GRIDtoday Hewlett-Packard

DAILY NEWS AND INFORMATION FOR THE GLOBAL GRID COMMUNITY /
Systems/Enterprise:

WEB SERVICES RELIABILITY BECOMES OASIS STANDARD
By Alan J. Weissberger, NEC Corp voting member of OASIS WS-RM TC

Summary

At its Nov. 10-11 meeting, the OASIS WS-RM (Reliable Messaging) TC voted to send the WS Reliability specification version 1.1 to OASIS for publication as a standard. An OASIS Standard signifies the highest level of ratification for a specification developed by an OASIS TC. Developed through an open process, WS-Reliability enables companies to conduct reliable business-to-business trading or collaboration using Web services. Three protocol capabilities are provided by this standard: guaranteed delivery, ordered delivery, and duplicate elimination. These are described along with message reply patterns later in this article. Figures illustrating the reliable messaging model and the reply patterns are available below.

The milestones reached by the WS-RM TC are noted in Table 1.

Additionally, four companies (NEC, Fujitsu, Hitachi and Oracle) participated in a successful interoperability demo of the WS-Reliability specification. This was the third such validation of multi-vendor interoperability by the WS-RM TC.

WS-Reliability will be used in the Japanese Business Grid project to ensure reliable delivery of SOAP formatted notification messages, which are sent when some predefined condition is met, e.g. CPU/Server throughput exceeds a pre-set threshold level or drops below a "low water mark." These reliable notification messages may be sent between different companies at different geographical locations. This may facilitate disaster recovery or remote database synchronization (or back-up) between multiple Grid sites. The standards used in the Business Grid project are listed in Table 2.

As other Web services standards are approved by OASIS, they will likely be encapsulated in WS Reliability envelopes to ensure reliable delivery between end points.

A presentation on the Japanese Business Grid Project was made at the GGF12 Enterprise Grid Workshop session, and is available for free download from:


I. Why Is Reliability (i.e. Reliable Message Delivery) Needed For Web Services?

As Web services (WS) start to be deployed across enterprise boundaries and for collaborative e-business and e-transaction scenarios, reliable delivery of messages becomes a critical issue. This is because communications over the Internet and Intranets are inherently unreliable, as the underlying "transport protocols" (HTTP, SMTP and JMS) do not offer any form of guaranteed or ordered delivery for SOAP messages. Yet, those messages must be delivered to the ultimate receiver, even in the presence of component, system or network failures! If a message cannot be reliably delivered, then the user must be so informed.

For Web services messaging to be robust within an enterprise, or to be used across firewalls, it is imperative that a large amount of control, management and security related protocol information be delivered over a reliable connection. It is also important to ensure that user data exchanges are similarly delivered in a reliable fashion to the Application entity. A Reliable Messaging sender and receiver must co-operate to achieve this WS Reliability. The "users" of reliable messaging are either other WS protocols (e.g. WS Security, WS Notification, WS Resource Properties, WS Distributed Management, etc.) and/or Application layer/user information exchanges between the end points of the connection.

Accordingly, reliable messaging becomes one of the first problems that need to be addressed for Web services to become a truly viable software technology. (Would you consider sending credit transactions to your bank or placing a stock purchase or sale order over an unreliable Web service connection?)

II. The OASIS WS Reliability Specification Explained

A. Overview:

WS Reliability is an open specification for ensuring reliable message delivery for Web services. Reliability, in this context, is defined as the ability to guarantee message delivery to "users" with a chosen level of protocol capability and Quality of Service (QOS). Again, the users are either other WS protocols (e.g. WS Security, WS Distributed Management, WS-Notifications, etc.), or Application layer/user information messages which are exchanged between the end points of the connection.

To facilitate WS Reliability, there is a need for SOAP based Reliable Messaging Processors (RMPs) -- in the sender and in the receiver endpoints* -- that work together to ensure that messages are delivered in a reliable manner over a connection that may be inherently unreliable.

The sender and receiver RMPs operate on newly defined SOAP headers that are transmitted as either self contained messages, or they are attached to other WS protocol messages or user data messages (all of which are SOAP/XML encoded). Fault messages may extend to the SOAP message body.

*Intermediaries are considered to be transparent in the WS Reliability specification.

The "users" determine the level of WS Reliability. Reliability may include one or more reliable messaging protocol capabilities for the delivery of WS messages (see II C below for a detailed description of these capabilities):

  • Guaranteed delivery to the user or Application entity -- the message MUST be persisted (i.e. stored in non-volatile memory) in the sender RMP until delivery to the ultimate receiver has been acknowledged. Either a message is delivered, or the sending application is notified of a delivery failure. A resending mechanism, controlled by acknowledgements and handled by the RMPs, overcomes occasional connection failures or message loss.
  • Duplicate elimination -- delivery at most once -- with duplicates detected and eliminated by the receiver RMP. Duplicate messages could be generated accidentally by some network component (e.g. a router), or intentionally by a resending mechanism. In both cases, applications that require a single instance of the message must receive only one copy, independent of how much time elapsed between the reception of a message and its duplicate.
  • Guaranteed message ordering -- when delivered by the receiver RMP to the user, the messages are properly sequenced, in the same order as they were sent. The problem arises when messages are received out of sequence, or are resent because acknowledgements were lost. The sender RMP retransmits unacknowledged messages after a time-out, and the receiver RMP holds out-of-sequence messages, waiting for delayed ones to arrive, so that all are delivered to the user/Application entity in the proper order.

The users of the WS Reliability protocol may agree upon any or all of the above message delivery capabilities. Different users or applications may choose different protocol capabilities, which are conveyed to the RMP sender and receiver prior to initiating communications. Alternatively, the receiver RMP can determine the protocol capability via explicit parameter values sent in each reliable message request.
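
The two ways of conveying capabilities described above can be sketched in a few lines of Python. This is only an illustration: the `Capabilities` class, the `resolve_capabilities` function and the rule that explicit per-message parameters take precedence over the prior agreement are assumptions made for this sketch, not names or rules taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    """Which reliability capabilities apply to a message (hypothetical type)."""
    guaranteed_delivery: bool = False
    duplicate_elimination: bool = False
    message_ordering: bool = False


def resolve_capabilities(agreed: Capabilities,
                         per_message: Optional[Capabilities]) -> Capabilities:
    """Return the capabilities the receiver RMP should apply.

    Assumption: explicit parameters carried in a request win over the
    capabilities agreed before communications started.
    """
    return per_message if per_message is not None else agreed


# Capabilities agreed by the users prior to initiating communications.
agreed = Capabilities(guaranteed_delivery=True)

# Explicit parameter values carried in one particular request.
explicit = Capabilities(guaranteed_delivery=True, message_ordering=True)

without_params = resolve_capabilities(agreed, None)
with_params = resolve_capabilities(agreed, explicit)
```

Here `without_params` falls back to the prior agreement, while `with_params` additionally requests ordered delivery for that one message.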

For purposes of the WS RM TC, QOS is defined as the ability to determine the following aspects:

  • Message persistence (ability to store a message until it is reliably delivered to the Application).
  • Message acknowledgement (by the receiver) and resending (by the sender on No Ack time-out).
  • Ordered delivery of messages (by use of Sequence numbers).
  • Delivery status awareness for both sender and receiver (via state saving and status checkpointing).

The WS Reliability specification defines extensions to SOAP Headers. It is assumed that the payload (user information) is specified using a WSDL description (fault messages may also use the payload to convey fault code information). While WS Reliability is currently based on SOAP 1.1, it could be updated for use with SOAP 1.2, when it becomes a W3C Recommendation.

B. Reliable Messaging (RM) Model and RM Reply Patterns:

In the Reliable Messaging Model described in this specification, the sender node sends a message to the receiver node (i.e., intermediaries are assumed to be transparent in the WS Reliability specification). Upon receipt of the message and at the appropriate time, the receiver node sends back an Acknowledgment message or Fault message to the sender node.

There are three ways for the receiver to send back an Acknowledgment message or a Fault message to the sender. These are referred to as the "RM Reply patterns," which are defined as follows:

  • Response RM-Reply Pattern

    We say that a Response RM-Reply pattern is in use if the outbound Reliable Message is sent in the underlying protocol request, and the resultant Acknowledgment message (or Fault message) is contained in the underlying protocol response message which corresponds to the original request. In essence, the Acknowledgement is "piggybacked" onto the business response message.

  • Callback RM-Reply Pattern

    We say that a Callback RM-Reply pattern is in use if the Acknowledgment message (or Fault message) is contained in an underlying protocol request of a second request/response exchange (or a second one-way message), operating in the opposite direction to the message containing the outbound Reliable Message.

  • Polling RM-Reply Pattern

    We say that the Polling RM-Reply pattern is being used if a second underlying protocol request is generated, in the same direction as the one containing the outbound Reliable Message, to act as a "request for acknowledgment." The Acknowledgment message (or Fault message) is contained in the underlying protocol response to this request. This polling pattern can be used in instances where it is inappropriate for the sender of reliable messages to receive underlying protocol requests, e.g. when the sender is behind a firewall.

These three reply patterns provide "the users" with flexibility to send reliable request/response or one-way SOAP messages (Callback and Polling patterns). Callback is important for one-way request message patterns and for batching of acknowledgements and fault messages.

Additionally, "polling" enables reliable message delivery to extend beyond the firewall, which might otherwise block external reliable messages from reaching the intended recipient. Polling makes it possible to use the WS Reliability protocol, even when a firewall prevents 3rd parties from initiating messages or requests.
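
As a rough guide to when each reply pattern fits, the reasoning above can be written as a small selection function. The three pattern names come from the specification as described here; the function `choose_reply_pattern`, its two parameters and the selection policy itself are assumptions made for illustration, since the specification leaves the choice to the users.

```python
from enum import Enum


class ReplyPattern(Enum):
    RESPONSE = "Response"   # Ack rides on the underlying protocol response
    CALLBACK = "Callback"   # Ack arrives in a separate request, opposite direction
    POLLING = "Polling"     # sender asks for the Ack with a second request


def choose_reply_pattern(ack_in_protocol_response: bool,
                         sender_accepts_inbound: bool) -> ReplyPattern:
    """Pick a reply pattern (hypothetical policy, not mandated by the spec)."""
    if ack_in_protocol_response:
        # Request/response exchange: the Ack comes back on the same exchange.
        return ReplyPattern.RESPONSE
    if sender_accepts_inbound:
        # One-way message, and the receiver can open a request to the sender.
        return ReplyPattern.CALLBACK
    # One-way message, sender behind a firewall: the sender must poll.
    return ReplyPattern.POLLING


firewalled_one_way = choose_reply_pattern(ack_in_protocol_response=False,
                                          sender_accepts_inbound=False)
```

In this sketch `firewalled_one_way` is `ReplyPattern.POLLING`, matching the firewall case described above.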

The illustrations of the basic messaging model and the reply patterns are available below.

C. WS Reliability Protocol Capabilities:

Three types of message delivery capabilities are defined in the WS Reliability protocol. One or more of these protocol capabilities may be used with each of the RM Reply patterns defined in II B above. The selection depends on prior agreements between end users, or is determined by the receiver RMP from explicit parameter values in request messages.

  • Guaranteed Delivery

To successfully deliver a message from a sender RMP to a receiver RMP without failure; if this is not possible, to report the failure to the sender's application. To realize guaranteed delivery, the message MUST be persisted (i.e. stored) in the sender RMP until delivery to the receiver is acknowledged, or until the ultimate failure is reported to its requester. (There is a requirement on the underlying transport protocol that the message MUST be transported without corruption.) If message persistence is lost for any reason, it is no longer possible to guarantee message delivery. Since the reliability of message persistence is a property of the system implementation, the conditions under which guaranteed message delivery holds are also a property of the system implementation and are outside the scope of the specification.

Example 1. A PC Server may use a HDD for its persistent storage, and those messages persisted in the HDD are reliably maintained even if the system software crashes and the system is rebooted. However, if the HDD itself crashes, it is no longer possible to guarantee message delivery.

Example 2. A message persisted in a mobile phone may be lost when its battery is detached. In this case, message delivery is only guaranteed by proper battery maintenance of the mobile phone.
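
The sender side of guaranteed delivery (persist, send, resend until acknowledged, otherwise report failure) might be sketched as follows. Everything here is hypothetical: `SenderRMP`, the `transport` callable (which returns True when an Acknowledgment comes back and False to model a No Ack time-out), the retry limit, and the in-memory dictionary that stands in for non-volatile storage.

```python
from typing import Callable, Dict, List


class SenderRMP:
    """Toy sender RMP: persist, send, resend until acked, else report failure."""

    def __init__(self, transport: Callable[[str, str], bool],
                 max_attempts: int = 3):
        self.transport = transport        # True means an Ack was received
        self.max_attempts = max_attempts  # assumed retry limit
        self.store: Dict[str, str] = {}   # stand-in for non-volatile storage
        self.failures: List[str] = []     # failures reported to the application

    def send(self, message_id: str, payload: str) -> bool:
        self.store[message_id] = payload  # persist before the first attempt
        for _ in range(self.max_attempts):
            if self.transport(message_id, payload):
                del self.store[message_id]  # acknowledged: safe to discard
                return True
        del self.store[message_id]
        self.failures.append(message_id)    # notify the sending application
        return False


def flaky_transport(drop_first: int) -> Callable[[str, str], bool]:
    """Build a fake transport that loses the first `drop_first` transmissions."""
    state = {"calls": 0}

    def transport(message_id: str, payload: str) -> bool:
        state["calls"] += 1
        return state["calls"] > drop_first

    return transport


ok_sender = SenderRMP(flaky_transport(drop_first=2))
delivered = ok_sender.send("m1", "hello")      # succeeds on the third attempt

bad_sender = SenderRMP(flaky_transport(drop_first=5))
gave_up = not bad_sender.send("m2", "hello")   # three attempts, then failure
```

A real RMP would resend on a timer and keep the store on disk; the loop above only shows the control flow.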

  • Duplicate Elimination

A number of conditions may result in transmission of duplicate message(s), e.g. temporary downtime of the sender or receiver, a routing problem between the sender and receiver, etc. In order to provide at-most-once semantics, the ultimate RMP receiver MUST eliminate duplicate messages and never present them to the user. Messages with the same Message Identifier value MUST be treated as duplicates and not delivered to the application.
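
A minimal sketch of duplicate elimination at the receiver, keyed on the Message Identifier as described above. The class name `DedupReceiver` and the `deliver` callback are hypothetical, and the set of seen identifiers is held in memory here, whereas a real RMP would need to persist it and eventually expire old entries.

```python
from typing import Callable, List, Set, Tuple


class DedupReceiver:
    """Toy receiver RMP that delivers each Message Identifier at most once."""

    def __init__(self, deliver: Callable[[str, str], None]):
        self.deliver = deliver
        self.seen: Set[str] = set()  # identifiers already delivered

    def on_message(self, message_id: str, payload: str) -> bool:
        """Return True if the message was delivered, False if it was a duplicate."""
        if message_id in self.seen:
            return False             # duplicate: never shown to the application
        self.seen.add(message_id)
        self.deliver(message_id, payload)
        return True


delivered: List[Tuple[str, str]] = []
receiver = DedupReceiver(lambda mid, body: delivered.append((mid, body)))

receiver.on_message("m1", "order #1")
receiver.on_message("m1", "order #1")   # resent copy, dropped
receiver.on_message("m2", "order #2")
```

After these three calls the application has seen "m1" and "m2" exactly once each.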

  • Guaranteed Message Ordering

Some applications will expect to receive a sequence of messages from the same sender in the same order those messages were sent. Although there are often means to enforce this at the Application layer, this is not always possible or practical. In such cases, the Reliable Messaging layer is required to guarantee the message order. This specification defines a model, illustrated in Figure 3, to meet this requirement.

When the sender application sends three messages (1), (2), and (3) with Guaranteed Message Ordering, the receiver's RMP MUST guarantee that message order when it makes those messages available to the receiver's application (the user). In Figure 3, the receiver's RMP has received messages (1) and (3), but not yet message (2). It makes message (1) available to the application, but it persists message (3) until message (2) is received. When the receiver's RMP receives message (2), it then makes messages (2) and (3) available to the application, in that order.
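
The (1), (3), (2) scenario just described can be reproduced with a small hold-back buffer. This is a sketch under assumptions: `OrderingReceiver` and its `deliver` callback are hypothetical names, sequence numbers start at 1 to match the example in the text, and held messages are kept in memory rather than persisted.

```python
from typing import Callable, Dict, List


class OrderingReceiver:
    """Toy receiver RMP that releases messages strictly in sequence order."""

    def __init__(self, deliver: Callable[[int, str], None], first_seq: int = 1):
        self.deliver = deliver
        self.next_seq = first_seq        # next sequence number to release
        self.held: Dict[int, str] = {}   # out-of-sequence messages, by number

    def on_message(self, seq: int, payload: str) -> None:
        if seq < self.next_seq or seq in self.held:
            return                       # already delivered or already held
        self.held[seq] = payload
        # Release every message that is now contiguous with what was delivered.
        while self.next_seq in self.held:
            self.deliver(self.next_seq, self.held.pop(self.next_seq))
            self.next_seq += 1


delivered: List[int] = []
rmp = OrderingReceiver(lambda seq, body: delivered.append(seq))

rmp.on_message(1, "first")    # delivered at once
rmp.on_message(3, "third")    # held: message 2 has not arrived
after_gap = list(delivered)   # snapshot while message 2 is still missing
rmp.on_message(2, "second")   # releases message 2, then message 3
```

While message (2) is missing only message (1) has reached the application; once it arrives, (2) and (3) follow in that order.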

Table 1. Milestones Reached By OASIS WS RM TC
  • Dec. 9, 2003 -- Public Interop Demo at XML/2003 conference: Fujitsu, Hitachi, Oracle, NEC and Sun implemented WS-Reliability CD* 0.52.
  • March 17, 2004 -- OASIS Public Review of CD 0.992 initiated.
  • Aug. 24, 2004 -- TC votes to recommend CD 1.086 for OASIS Member Review.
  • Oct. 16, 2004 -- OASIS Member Vote on WS-Reliability Version 1.1 -- initiated.
  • Oct. 30, 2004 -- OASIS Member Vote completed.
  • Nov. 10, 2004 -- WS Reliability becomes an OASIS standard.

*CD= Committee Draft

Table 2. Relevant Standardization Bodies For Japanese Business Grid Project
  • GGF
    • OGSA-WG (architecture, roadmap, WG factory).
    • CMM-WG (resource management).
    • JSDL-WG (job portability).
    • CDDLM-WG (configuration, deployment, lifecycle management).
  • OASIS
    • WSDM TC.
    • WSRM TC (WS Reliability).
    • WSBPEL TC.
    • WSRF TC, WSN TC.
  • DMTF
    • Server Management WG.
    • Utility Computing WG.
About Alan J. Weissberger

As the founder and Technical Director of Data Communications Technology (DCT), a technical consulting firm started in March 1983, Alan J. Weissberger specializes in telecommunications standards and their implementation. His clients have included network providers (AT&T, NTT, Pacific Bell, US West, Entel and CTC in Chile, Telkom South Africa, Moroccan PTT, others), equipment and semiconductor manufacturers, and large end users. In 1995 and 1996 Alan was the principal architect for the European Commission's multi-service, multi-country ATM network -- the largest private network in Europe (that network has now evolved into Gig Ethernet over CWDM). In 2000-01, he was Ciena's lead ITU-T delegate, contributing to the standardization of the optical control plane in SG13 and SG15. Alan now represents NEC Corp in several OASIS TCs dealing with Web Services, while also attending the Global Grid Forum and the Optical Internetworking Forum (OIF).

Weissberger can be reached via e-mail at aweissberger@sbcglobal.net or ajwdct@technologist.com. To read his entire biography, please visit www.gridtoday.com/04/1011/bio.html.
