  DBS AND IBM DETAIL FINDINGS OF 5 JULY OUTAGE

***

 Wide range of actions underway to prevent recurrence

Singapore, 4 August 2010 – DBS and IBM today announced that the portion of the investigation into the DBS systems outage on 5 July 2010 related to determining the cause of the incident has been concluded. DBS and IBM jointly provided a detailed account of events which preceded the outage, and consequent recovery activities and actions.

Events preceding the outage
IBM determined that a repeated failure to apply the correct procedure when addressing instability in the communications link of the storage subsystem resulted in the service outage on 5 July.

IBM’s immediate priority was to ensure that customer data was not in any way compromised while services were being restored as quickly as possible. DBS' services were restored the same morning with full and complete data integrity.

Prior to the outage, the following events took place:
• 3 July 2010, 11.06am: IBM software monitoring tools sent an alert message to IBM’s Asia Pacific support centre located outside of Singapore.  It indicated there was instability in a communications link in the storage system which was connected to a mainframe.  At this point, the storage system was functioning. An IBM field engineer was despatched to the DBS data centre and was given approval by DBS to repair the machine. 

• 3 July 2010, 7.50pm: The cable in question was replaced. The IBM field engineer did not use the machine’s maintenance interface and instead followed the instructions given by the support centre. Although the replacement was done using an incorrect step, the error message ceased. The storage system was still functioning.

• 4 July 2010, 2.55pm: The error message reappeared. This time, it indicated instability in the cable and associated electronic cards. The IBM field engineer was despatched for the second time to the data centre. He diagnosed and escalated the issue to the regional IBM support centre.

• 4 July 2010, 5.16pm: Based on instructions from the regional IBM support centre, the cable was removed for inspection and reseated, using the same incorrect step. The error message ceased. The storage system continued functioning.

• 4 July 2010, 6.14pm: The error message reappeared. Over the next five hours and 22 minutes, the regional IBM support centre analysed the log from the machine and recommended to the field engineer that he unplug the cable and check for a bent pin.  The storage system continued functioning.

• 4 July 2010, 11.38pm: The IBM field engineer did not find a bent pin and reseated the cable. The error message persisted. The storage system was still functioning and able to communicate with the mainframe. The regional IBM support centre and the IBM field engineer continued diagnosing the issue, including reseating the cable for a second time.

• Subsequently, DBS was contacted and authorised a cable change at 2.50am on 5 July 2010, a quiet period, which is standard operating procedure. While waiting to replace the cable, the IBM field engineer decided to inspect the cable again to ensure that it was not defective and that it was installed properly. He then unplugged the cable for inspection using the previous incorrect procedure recommended by the regional IBM support centre.

• 5 July 2010, 2.58am: The cable was replaced using the same procedures. This caused errors that threatened data integrity. As a result, the storage system ceased communicating in order to protect the data.

At this point, DBS banking services were disrupted. 
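The sequence above can be laid out as data to see how long the fault was worked on before the outage. The following is a minimal sketch, assuming all timestamps are Singapore local time; the `EVENTS` list, the event labels and the `elapsed_between` helper are illustrative names, not part of any DBS or IBM tooling.

```python
from datetime import datetime

# Timestamps as stated in the account above (assumed Singapore local time).
# Labels are paraphrased summaries, not official event names.
EVENTS = [
    (datetime(2010, 7, 3, 11, 6), "alert: instability in communications link"),
    (datetime(2010, 7, 3, 19, 50), "cable replaced; error message ceased"),
    (datetime(2010, 7, 4, 14, 55), "error message reappeared"),
    (datetime(2010, 7, 4, 17, 16), "cable reseated; error message ceased"),
    (datetime(2010, 7, 4, 18, 14), "error message reappeared"),
    (datetime(2010, 7, 4, 23, 38), "cable reseated; error message persisted"),
    (datetime(2010, 7, 5, 2, 58), "cable replaced; storage system ceased communicating"),
]


def elapsed_between(events, start_index, end_index):
    """Return the time elapsed between two entries of the event list."""
    return events[end_index][0] - events[start_index][0]


# Time from the first monitoring alert to the point the storage system stopped.
alert_to_outage = elapsed_between(EVENTS, 0, -1)
print("First alert to outage:", alert_to_outage)

# Gap between each pair of consecutive events.
for (t0, _), (t1, label) in zip(EVENTS, EVENTS[1:]):
    print(f"+{t1 - t0} -> {label}")
```

On these timestamps, 39 hours and 52 minutes passed between the first alert and the outage, during which the storage system remained functional.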

If the correct procedures had been used, the storage system would have automatically suspended the communications link, and the machine would have instructed the engineer to replace the cable and both cards together while maintaining the redundancy of the system.

As data integrity is considered a higher priority than availability, the storage system is designed to automatically cease communicating under these conditions. In doing so, the system preserved full data integrity.

In spite of the machine’s high availability and redundancy, these incorrect procedures caused the outage.
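The principle of placing data integrity above availability can be illustrated with a toy model. This is only a sketch of the general fail-stop idea and does not represent the actual storage system's design; `FailStopStore`, `IntegrityError` and `report_link_error` are hypothetical names introduced for illustration.

```python
class IntegrityError(Exception):
    """Raised when the hypothetical store refuses I/O to protect its data."""


class FailStopStore:
    """Toy key-value store that stops serving requests once integrity is at risk."""

    def __init__(self):
        self._data = {}
        self._fenced = False  # True once the store has ceased communicating

    def report_link_error(self, threatens_integrity: bool) -> None:
        # Benign errors are tolerated; integrity-threatening ones fence the store.
        if threatens_integrity:
            self._fenced = True

    def write(self, key, value):
        self._check()
        self._data[key] = value

    def read(self, key):
        self._check()
        return self._data[key]

    def _check(self):
        # Availability is sacrificed: no request is served while fenced.
        if self._fenced:
            raise IntegrityError("store fenced to protect data")


store = FailStopStore()
store.write("balance", 100)
store.report_link_error(threatens_integrity=True)
try:
    store.write("balance", 999)
except IntegrityError as exc:
    print("request refused:", exc)
```

In this model the stored value is left untouched after fencing, at the cost of every subsequent request being refused until the fault is resolved.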

Recovery actions following the outage
Immediately after the outage occurred, IBM informed DBS, and an onsite technical command function comprising DBS and IBM staff was activated by 3.40am.

The immediate priority was to ensure that customer data was not in any way compromised while restoring services as quickly as possible.

This process required time to ensure that data integrity was maintained. This included careful efforts to reconcile data in the cache and disk within the storage subsystem. 

A restart of the systems was initiated at 5.20am. Banking services were restored progressively from 10am to 12.30pm on 5 July 2010.

IBM restored the system with full data integrity.

Actions moving forward

IBM has taken steps to enhance the training of all related personnel on current procedures. IBM has brought in experts from its global team to undertake a root cause incident report with recommended actions.

Corrective procedural actions include:

• Immediate review of the operations of the regional IBM support centre to ensure adherence to maintenance procedures and quality control processes. This was completed on 30 July 2010.
• Monthly recertification of every IBM engineer involved, on the most current procedures.
• Enhancements in the escalation processes for incident diagnosis and recovery.
• IBM personnel directly involved with this incident have been removed from direct customer support activity and disciplined.
• IBM has also appointed technical advocates from IBM Development to provide deeper technical expertise to DBS.

Cordelia Chung, Regional General Manager, IBM ASEAN, once again apologised to DBS and its customers for the inconvenience caused by this incident.

“The corrective and preventive actions which we are taking are of the highest priority for IBM. We have also taken steps to review installations of the same storage system at other financial institutions in Singapore for whom we provide maintenance services,” she said.

David Gledhill, Managing Director and Group Head of Technology and Operations at DBS, said, “DBS has taken steps to improve the bank’s internal escalation process and the speed and manner in which it reaches out to customers during periods of service disruptions. The bank is defining specific ‘red alert’ scenarios that will automatically trigger group-wide crisis management procedures, including specific actions to be taken for each of the red alert scenarios identified. DBS is also implementing additional modes of internal alerts and implementing new processes to expedite escalation.”

Today, MAS announced that it requires DBS to apply a multiplier of 1.2 times to its risk-weighted assets for operational risk. This translates to the bank setting aside approximately SGD 230 million in additional regulatory capital on a group basis, based on figures as at 30 June 2010. DBS’ Tier 1 capital ratio and total capital adequacy ratio (CAR) as at 30 June 2010 were 13.1% and 16.5%, respectively. The sanction would result in DBS having a pro forma Tier 1 capital ratio of 12.9% and a total CAR of 16.3%.
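The mechanics of the multiplier can be sketched as follows: scaling operational-risk risk-weighted assets (RWA) enlarges the denominator of each capital ratio, so the ratio falls even though capital is unchanged. This is a simplified illustration, assuming capital stays constant and ignoring rounding in the published percentages; the function names and the sample balance sheet are hypothetical and are not DBS figures.

```python
def pro_forma_ratio(capital, total_rwa, operational_rwa, multiplier):
    """Capital ratio after operational-risk RWA is scaled by `multiplier`."""
    extra_rwa = (multiplier - 1.0) * operational_rwa
    return capital / (total_rwa + extra_rwa)


def implied_rwa_increase(ratio_before, ratio_after):
    """Relative RWA increase implied by a ratio change, assuming unchanged capital."""
    return ratio_before / ratio_after - 1.0


# Ratios quoted in the release (in percent). The two results differ slightly
# because the published ratios are rounded to one decimal place.
tier1_implied = implied_rwa_increase(13.1, 12.9)
car_implied = implied_rwa_increase(16.5, 16.3)
print(f"Implied RWA increase: {tier1_implied:.2%} (Tier 1), {car_implied:.2%} (CAR)")

# Hypothetical balance sheet (NOT DBS figures), chosen only to show the mechanics:
# capital of 13.1 against total RWA of 100, of which 8 is operational-risk RWA.
ratio = pro_forma_ratio(capital=13.1, total_rwa=100.0, operational_rwa=8.0, multiplier=1.2)
print(f"Hypothetical pro forma ratio: {ratio:.2%}")
```

The published ratios imply that total risk-weighted assets rise by roughly 1.2% to 1.6% under the multiplier, with the spread reflecting rounding in the reported percentages.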

Piyush Gupta, DBS Chief Executive Officer, said, “The system outage is of grave concern to us and we acknowledge MAS’ censure. DBS would like to assure customers that taking into account the regulatory capital charge, our total capital adequacy ratio is still comfortably above the required levels. Measures to strengthen our technology and risk management controls are also well underway. Twelve months ago, DBS commenced a two-year programme to further enhance our system reliability and resilience and we are accelerating the implementation of these initiatives. DBS is deeply sorry for the outage and once again, my apologies to our customers for all the inconvenience caused.”


About DBS

DBS - Living, Breathing Asia
DBS is one of the largest financial services groups in Asia with operations in 15 markets. Headquartered in Singapore, DBS is a well-capitalised bank with "AA-" and "Aa1" credit ratings that are among the highest in the Asia-Pacific region.

As a bank that specialises in Asia, DBS leverages its deep understanding of the region, local culture and insights to serve and build lasting relationships with its clients. DBS provides the full range of services in corporate, SME, consumer and wholesale banking activities across Asia and the Middle East. The bank is committed to expanding its pan-Asia franchise by leveraging its growing presence in mainland China, Hong Kong and Taiwan to intermediate the increasing trade and investment flows between these markets. Likewise, DBS is focused on extending its end-to-end services to facilitate capital within fast-growing countries such as Indonesia and India.

DBS acknowledges the passion, commitment and can-do spirit in each of its 14,000 staff, representing over 30 nationalities. For more information, please visit www.dbs.com
 
About IBM
IBM creates business value for clients and solves business problems through integrated solutions that leverage information technology and deep knowledge of business processes. IBM solutions typically create value by reducing a client’s operational costs or by enabling new capabilities that generate revenue. These solutions draw from an industry leading portfolio of consulting, delivery and implementation services, enterprise software, systems and financing.

For more information on IBM, please visit http://www.ibm.com/sg.

© 2007 DBS Bank Ltd | Co. Reg. No. 196800306E