For LA County IT Chief, Disaster Drill Was Big Success
“This is a drill. Assume a major earthquake has happened and the building is uninhabitable. You may not come to work.”
For the rest of the day, that large staff worked everywhere and anywhere except at the office. Many worked at home, at a public library or at some other location with secure Internet. For others, it was a day in the park — literally. Thus began a test of the Continuity of Operations Plan (COOP) of the IT department in the nation’s largest county.
Dave Wesolik, general manager of the county’s sprawling Internal Services Department (ISD), recently reviewed the daylong exercise for Techwire and the background leading up to it. In order to test the agility of the department, almost no one knew ahead of time that the drill was planned.
“We’ve been working on emergency planning for quite a while, and it’s been evolving into serious levels of business continuity planning and so forth, trying to make sure we’re ready for the Big One,” Wesolik recounted. “So, we have been doing modular or compartmental-type testing for data center folks: What happens if we go all out and we have a major earthquake on one of the two data center faults? We thought, ‘We need to be the first backup for the rest of the county.’ … So we started planning an all-in: Everybody has to be involved in a major scenario.”
That meant sending out robo-calls beginning at 5 a.m. to all 1,100 ISD staffers. If someone didn’t respond to a text or email, they’d get a phone call — “and they’d keep getting them until they answered,” added Rebecca Friedman, ISD’s Media Services director, who was interviewed along with Wesolik for this story.
Using a system called Everbridge, the county’s massive call-and-response drill was a success, Wesolik said.
“Everyone who participated got the notification that morning,” he said. “Just 15 people out of 1,100 hadn’t checked email, didn’t get the phone call, whatever the reason was. It was pretty successful overall.”
Having everyone from ISD’s IT Services (ITS) division working remotely didn’t cause any significant interruptions of service. County staff in other departments were able to conduct their business seamlessly.
“It was great,” Wesolik said. “I’d give them an A. I was very satisfied, both with the response of the teams getting set up by 9 a.m., as well as everybody reporting in to their management as to their secondary location.”
While many staffers were happy to be able to skip the commute and work from home for the day, for others it was a chance to get out of the office and work outside. Wesolik said the county sent a brigade of more than 100 staffers and truckloads of hardware to Whittier Narrows Park: “microwave connections, 4G connections, satellites, generators and all these different alternate communications. And we set up our network monitoring, our data center monitoring and our help desk out of tents in a parking lot out there, so we could operate in a worst-case scenario.”
Wesolik drew an analogy between the county’s COOP drill and a theater troupe rehearsing for a show.
“It’s like if we were putting together a Broadway show for New York, we’d do dress rehearsals all the way up,” he said. “So, OK, Scene 1 does a rehearsal, then Scene 2, and finally we put this whole thing together and said, ‘OK, we’re doing the final dress rehearsal before the opening.’”
Along with the overall success, Wesolik said, came some lessons:
- “The hard part of what we’re doing is to determine productivity loss” from having a staff of 1,100 working out of the office for a day with no warning. “We’re having (top managers) confirm that first-line supervision, secondary-line management, etc., were tracking the productivity of every employee that they had in their particular organization. So now each division has written an incident report to reflect how they were able to measure that. We’ve built to this. We’ve got people in the office who’ve done COOP drills for other organizations, and we’re leveraging their expertise.”
- “Another was that we needed more VPN licenses. … We’ve already taken care of that.”
- Finally, the non-IT elements of the COOP exercise were lacking, he said: “If this were actually a real emergency, we’d need to have more dedicated infrastructure. We were able to get generators rolling out to the sites with our microwave systems, and we were able to get chairs and tables and tents … but all that stuff is shared with other organizations, and in the event of a real emergency, we might be fighting over that stuff. And so we need to make sure we’re nailing that down before a real emergency occurs.”
More significantly, on a day when hundreds of things might have gone wrong and wreaked havoc in other county government agencies, “We got zero negative feedback from departments,” he said. “It was business as usual as far as they were concerned. There was no, ‘Hey, great job on Thursday’s operations,’ because Thursday’s operations were just like Wednesday’s and Tuesday’s operations.”
LA County IT Services division: 1,100 employees
Number who failed to report in: 15
Data centers affected: Two
Telecom downtime that day: 30 to 40 minutes
Public phone calls dropped: 37