Why Defining Your Risk Model is the Best Place to Start with Mission Critical Operations
This week marks the halfway point of the 2015 Atlantic hurricane season. Many facility managers, building owners, and businesses along the Eastern Seaboard of the US will breathe a sigh of relief when we officially make it to the end of November. Knock on wood, it has been a relatively uneventful season so far. Although the topic is certainly timely, critical building systems and emergency preparedness are most efficiently and effectively addressed during the pre-design phase. That is when options for where and how much to invest in redundancy and resiliency can be weighed against the individualized risk model of your future (or future upgraded) facility.
Earlier this month, I sat down with HLW Principal and Director of Engineering Joe Calabrese, LEED AP, and Power Management Concepts (PMC) president Peter Curtis. We discussed mission critical operations in the context of assessing risk, whether Mother Nature is to blame or something more human in origin, like deficiencies in the power grid or even terrorism.
A Note about the Mission Critical Experts Consulted for this Article
HLW and PMC recently formalized a working relationship as a strategic alliance. When combined, PMC’s mission critical expertise and HLW’s strength in architectural and engineering services offer a distinct advantage for our clients: budgetary considerations and issues integral to the basis of design are addressed early and at every stage of the process, from site selection to construction and ongoing maintenance protocols. HLW and PMC are currently working together on the expansion of a “flagship data center” in Commack, NY, for managed services supplier mindSHIFT Technologies.
Cost of Downtime in Mission Critical Environments
“Before determining where and how much to invest to prepare for a potential emergency situation, you have to first identify the cost of downtime,” said Peter Curtis, who emphasized that an effective risk model must be defined in terms of lost revenue down to the fraction of a minute. In the manufacturing, media, and healthcare sectors, the cost per hour of failing to maintain continuous production operations can range from $600,000 to $1.6M. For financial institutions that demand continuous trading or banking operations, the cost is significantly higher, between $6.4M and $20M.
To put the cost of downtime in relatable terms for just about any business (because every company has a “critical” aspect to its operations), even 99.999% availability allows about 5.26 minutes of downtime each year. So, what is the bottom line behind your 5 minutes?
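The arithmetic behind the “five nines” figure is easy to reproduce. The short Python sketch below is illustrative only: the function names are hypothetical, it assumes a 365-day year, and it assumes that downtime cost accrues linearly with time, which is a simplification rather than anything Peter or Joe stated. The hourly figures are the ranges quoted above.

```python
# Illustrative sketch: convert an availability percentage into annual
# downtime, then into an annual cost at a given hourly cost of downtime.
# Assumptions (not from the interviews): a 365-day year and a cost that
# grows linearly with the length of the outage.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def annual_downtime_cost(availability_pct: float, cost_per_hour: float) -> float:
    """Yearly cost of the permitted downtime at a given hourly cost."""
    hours_down = annual_downtime_minutes(availability_pct) / 60
    return hours_down * cost_per_hour


if __name__ == "__main__":
    # Hourly cost ranges quoted in the article, in US dollars.
    scenarios = [
        ("Manufacturing/media/healthcare, low end", 600_000),
        ("Manufacturing/media/healthcare, high end", 1_600_000),
        ("Financial institutions, low end", 6_400_000),
        ("Financial institutions, high end", 20_000_000),
    ]
    minutes = annual_downtime_minutes(99.999)
    print(f"99.999% availability: {minutes:.2f} minutes of downtime per year")
    for label, hourly_cost in scenarios:
        cost = annual_downtime_cost(99.999, hourly_cost)
        print(f"{label}: ${cost:,.0f} per year")
```

Under these assumptions, five nines leaves roughly five and a quarter minutes of downtime a year, which works out to about $52,560 at the $600,000-per-hour figure and about $1,752,000 at the $20M-per-hour figure. Swapping in your own hourly cost and availability target is a quick first pass at the bottom line behind your five minutes.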
Joe Calabrese expanded on the idea: “You want the ‘degree of preparedness’ decision-making process to start as soon as possible. How much risk (and when and where) you’re willing to expose your critical infrastructure to during an emergency: those kinds of decisions need to be made long before you formalize any plans or design and, certainly, before you commit to specific redundancies.” The added value of this kind of investment approach is increased awareness about precisely how design and operational options can impact the entire mission critical development process.
Emergency preparedness then becomes about reducing risk as defined by your facility’s specific risk model.
What Emergencies Can Teach Mission Critical and Emergency Management Professionals
“Past emergency situations and how an organization handled a stress or disruption to its critical infrastructure can provide very helpful information to the emergency management professional who is not sure where to allocate dollars for future resiliency,” said Peter. “Emergencies teach us to prioritize our capital dollars.” Peter went on to explain how identifying infrastructure weaknesses can focus attention on the internal political roadblocks that have the potential to “value engineer out” proposed improvements or necessary budget line items.
Challenging situations can also serve to focus (or refocus) our efforts by forcing an organization to continuously evolve its messaging to align with a changing environment and to discover new ways to be innovative. Joe defined true measures of resiliency as “your infrastructure’s ability to adapt to changing conditions and to function at normal capacity, despite a specific stress or disruption.” What is that old adage? If you are standing still, you are also going backwards.
Immediately after a critical event, mission critical and emergency management professionals should sit down with all stakeholders and get buy-in and budget approval for the next steps to business resiliency. “Emergencies can be treated as a kind of pop quiz,” said Peter. “Consider them an unplanned wake-up call that allows us to practice and to implement better business resiliency strategies and plans.” Emergencies provide real-world experience, as well as a benchmark for defining areas for investment and improvement, including enhanced situational awareness among staff through new or refined best practices.
Critical Building Systems Emergency Preparedness Pop Quiz
Peter’s company PMC specializes in developing transition-to-operations documentation. He describes this service as the “owner’s manual” for a company’s critical building systems infrastructure. Optimizing operations and energy utilization are at the top of the list, but so is emergency preparedness. He sometimes uses the following informal “survey” to help owners and facilities managers adopt the mindset that emergency preparedness is about protecting assets throughout the long lifecycle of their building.
How would your organization be impacted by the loss of critical personnel due to employee retirement, sick leave, or internal promotion? Keep in mind that more than 50% of downtime is attributed to human error.
- What knowledge would be lost?
- Where is the critical documentation?
- How do you train new personnel?
- What risks will your business face during a transitional period?
How would you rate your staff’s level of situational awareness preceding and during critical events, like fires, blackouts, natural disasters, terrorism, and building equipment failures?
- Who is contacted in the event of an emergency?
- Can you identify the critical systems that support production and infrastructure?
- What are your emergency procedures—and where/how do you maintain them?
- When was the last time that you updated your emergency procedures?
- Will you have sufficient time to respond to an emergency?
How would you rate your daily operations? Consider the consistency (or inconsistency) of data and information sharing. Also, how accessible is your documentation, and how structured is your work flow?
- How is information shared and leveraged throughout your organization?
- Is system data readily accessible?
- Are the EOPs, SOPs, and ARPs up-to-date?
- How are revisions approved and made available to all users?