
DAILY NEWS AND INFORMATION FOR THE GLOBAL GRID COMMUNITY / MARCH 17, 2003: VOL. 2 NO. 11


Special Features:

PLATFORM CASE STUDY: SPACE SYSTEMS LORAL

Background

Space Systems/Loral (SS/L), a subsidiary of Loral Space & Communications, is one of the fastest growing full-service producers of commercial communications and weather satellites. The company has an international base of customers whose applications include broadband digital communications, wireless telephony, direct-to-home broadcast, environmental monitoring, and air traffic control.

Based in Palo Alto, California, with worldwide operations, SS/L employs over 2,800 people. In 2000, SS/L generated $1 billion in revenue. The company has one of the world's premier facilities for making advanced satellites: 29 buildings encompassing 1.3 million square feet that house a variety of specialized development laboratories and modern production environments.

SS/L's dedicated test networks, comprising 270 servers and workstations running Solaris, NT, Linux, and various real-time operating systems, are deployed to local and remote test sites and manage an intensive quality assurance process. Its development networks include 42 Windows and Sun Solaris workstations.

The Challenge

An important part of a process improvement effort (made even more significant in the current economic climate) is to "do more with less". For manufacturing firms, shorter production schedules mean increased throughput and increased profit.

SS/L identified two areas as candidates for improvement: spacecraft test system validation and recurring administrative tasks.

For system validation, SS/L needed a solution that could monitor and test its distributed computing environment, let users visualize system status, and automatically prevent or correct system failures. To reduce recurring and unnecessary labor costs, SS/L needed to automate its system administration procedures, including host configuration and data management (test data, logs, etc.). By streamlining the manual testing process and eliminating downtime, SS/L engineers could ultimately increase their productivity and maximize throughput.

To reduce recurring and unnecessary labor costs in maintaining its test and development networks, SS/L needed a solution to automate its system administration procedures, monitor its distributed computing environment, and automatically prevent and handle potential system downtimes or failures.

  • The SiteAssure software provided SS/L with multi-platform support for its heterogeneous technology infrastructure.
  • With SiteAssure, SS/L has a simple solution for viewing complex resource information.
  • SiteAssure has drastically reduced the time spent by system administrators for troubleshooting.
  • SS/L has minimized developer downtime, increased productivity, and boosted its throughput capability.

The Solution

SS/L evaluated a number of different open source and commercial software applications to address its challenges. As Platform LSF was already in production with SS/L's design and modeling systems, it was a natural choice to include Platform SiteAssure in the evaluation process. SS/L's goal was to implement a resource management solution in under a month.

SS/L very quickly narrowed the field to SiteAssure. While SiteAssure enabled SS/L to monitor its systems and automate system administration tasks, it was also the only solution with the ability to probe and track the internal state of applications that SS/L was developing.

Today, 50 SiteAssure agents are deployed across SS/L's test networks and development environment. Platform SiteAssure monitors SS/L's host systems and provides notifications of problems such as limited disk space, system abnormalities, and excessive memory swapping or CPU utilization. It also automates recurring tests that were previously done manually. Policies established in SiteAssure handle any problems, deliver automated alerts, and provide feedback on certain operational levels.
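The kind of threshold policy described above can be illustrated with a minimal Python sketch. It is not SiteAssure's policy language or API, which the article does not show. The metric names, the `Thresholds` class, the `evaluate_host` function, and all threshold values are hypothetical and chosen only for illustration.

```python
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical limits; real values would be site-specific."""
    min_free_disk_fraction: float = 0.10   # alert below 10% free disk
    max_swap_pages_per_sec: float = 100.0  # alert above this swap rate
    max_cpu_percent: float = 90.0          # alert above 90% CPU


def evaluate_host(metrics, thresholds=Thresholds()):
    """Return a list of alert strings for one host's metrics.

    `metrics` is a dict with the (assumed) keys
    'free_disk_fraction', 'swap_pages_per_sec' and 'cpu_percent'.
    """
    alerts = []
    if metrics["free_disk_fraction"] < thresholds.min_free_disk_fraction:
        alerts.append("low disk space")
    if metrics["swap_pages_per_sec"] > thresholds.max_swap_pages_per_sec:
        alerts.append("excessive memory swapping")
    if metrics["cpu_percent"] > thresholds.max_cpu_percent:
        alerts.append("high CPU utilization")
    return alerts


def free_disk_fraction(path="/"):
    """Measure the free fraction of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

Keeping the evaluation separate from metric collection means the same rules can be applied to readings gathered on any platform, which matters in a mixed Solaris, NT, and Linux environment like the one described.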

Platform Value

While other solutions were either Windows-only or UNIX-based, the SiteAssure solution provided much-needed multi-platform support for SS/L's heterogeneous technology infrastructure. Platform also worked together with SS/L to develop the binary port monitoring agent (now a part of the SiteAssure solution), which allows arbitrary protocols to be used to probe services. SS/L uses the agent to gather metrics on the internal state of SS/L-developed servers. The agent was developed by Platform within 12 weeks, beating all time estimates.
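The general idea of probing a service over an arbitrary binary protocol can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SiteAssure agent, whose implementation and interface are not described in the article. The names `ProbeResult` and `probe_port` are hypothetical, and the request bytes are whatever the probed server's own protocol expects.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    reachable: bool  # True if the TCP connection was established
    reply: bytes     # raw bytes returned by the service, possibly empty


def probe_port(host, port, request=b"", timeout=2.0, max_reply=4096):
    """Connect to host:port, optionally send raw bytes, and read one reply.

    The caller supplies the protocol-specific request and interprets
    the reply; this function only moves bytes.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or timed out while connecting.
        return ProbeResult(reachable=False, reply=b"")
    with conn:
        try:
            if request:
                conn.sendall(request)
            reply = conn.recv(max_reply)
        except OSError:
            # Connected, but the service sent nothing before the timeout.
            reply = b""
    return ProbeResult(reachable=True, reply=reply)
```

A caller would decode `reply` according to the server's own message format to extract internal-state metrics; distinguishing "unreachable" from "connected but silent" helps separate network or host problems from application problems.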

The Agent Interface contained in SiteAssure provided SS/L with the flexibility to custom develop its own monitoring interface to display the status and health of its workstations and servers. Using color-coded data flows, the monitoring interface brings together a combination of SiteAssure agent information and additional metrics from SS/L's existing applications. This provided a simple solution for visualizing complex information.

Because SiteAssure lets SS/L engineers access and evaluate system information from their own workstations, SS/L has dramatically reduced the time that system administrators and developers spend troubleshooting. The color-coding system enables users to immediately identify the source of any problem and contact the responsible parties (system administrators for hardware issues, developers for software issues) to further isolate and resolve it. Previously, it would take up to three developers two hours to isolate, identify and resolve a problem. With SiteAssure, one person now supports a problem for a maximum of one hour. This minimizes developer downtime, increases productivity and boosts SS/L's throughput capability.

SiteAssure also automatically notifies system administrators when workstations and servers are offline for location changes, which helps streamline the reconfiguration process and ensures minimal network interruptions.

Looking forward, SS/L has identified that at least half of its development workstations are candidates for implementation of SiteAssure software, and plans to implement an additional 85 SiteAssure licenses.

"Platform has increased our productivity tenfold. By monitoring and troubleshooting problems with intelligent corrective actions, SiteAssure not only minimizes disruptions for our primary engineering team, but also reduces recurring system administration costs." Jim Jaquet, Test Software Section Supervisor, Spacecraft Test and Operations World Headquarters

About Platform

Platform is the world's leading distributed computing software provider, with desktop-to-Grid solutions that allow organizations to dramatically improve time to market and quality of results while maximizing their I.T. investment. Platform has strategic relationships with industry leaders including Compaq, HP, IBM, SGI, Cadence and SAS Institute, and its open, scalable software solutions are the choice of more than 1,500 result-driven organizations around the world. Platform is a private company with 400 employees in 14 offices in North America, Europe and Asia.

Web site: www.platform.com
