Amazon Fleet Management: Meet the Man Who Keeps Amazon Servers Running

Meet Brian Herman, who oversees the servers and services that run Amazon’s retail, Kindle, Alexa, and Instant Video businesses.

Picture this: You oversee a fleet of servers that supports one of the world’s largest online retailers, and a special promotion everyone thought would bump sales by 21% is actually delivering closer to a 120% boost, all in the first minute. That’s exactly where Brian Herman, Director of Datacenter Compute Capacity at Amazon Web Services, was standing in 2015, moments after the first Prime Day launched. The fleet Herman oversees doesn’t involve a single truck or plane. Instead, it’s the servers and services that run Amazon’s retail, Kindle, Alexa, and Instant Video businesses. As he remembers it, that low projection initially “made a lot of sense to us. This was a sale that nobody knew was coming. We didn’t announce it until a week before the event, so there was no hype. Also, it was only for our Prime Members, which is a subset of our total shopping population.”

Luckily, being nimble enough to handle even the most unexpected traffic deluges is pretty much Herman’s job description, a skill he has honed while working his way up the ladder of Amazon’s internal systems teams since 2009. Back to those pivotal 2015 moments: “We sounded the alarm to all of the software teams, and across the company they jumped into action. First, we had our software teams turn off anything that wasn’t used by the production website. Next, we reached out to teams developing new software products that weren’t live yet. We got any available hardware and pressed that into service for production traffic. Then we reached out to the AWS services and said, ‘We’re going to have a big on-demand day. Get ready for it.’ We ramped up and used every single EC2 reserved instance that we owned, and a whole lot of on-demand EC2 capacity, and went on to have a very successful day.”
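
For readers who want to see the order of operations Herman describes, here is a rough Python sketch of that kind of tiered response: fill a surge from the cheapest capacity first, and fall back to on-demand last. It is an illustration only, not Amazon’s tooling. The pool names, the server counts, and the `plan_surge_capacity` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CapacityPool:
    """A source of spare capacity. Names and sizes here are made up."""
    name: str
    available: int  # servers or instances this pool can contribute


def plan_surge_capacity(demand, pools):
    """Fill `demand` from `pools` in priority order.

    Returns (allocation, shortfall): how many units each pool supplies,
    and how much demand is still unmet once every pool is exhausted.
    """
    allocation = {}
    remaining = demand
    for pool in pools:
        if remaining == 0:
            break
        take = min(pool.available, remaining)
        if take > 0:
            allocation[pool.name] = take
        remaining -= take
    return allocation, remaining


# Priority order loosely follows the steps in the quote above.
# Every number below is a placeholder, not a figure from Amazon.
pools = [
    CapacityPool("freed_non_production", 200),  # non-production work switched off
    CapacityPool("prelaunch_hardware", 150),    # hardware from products not yet live
    CapacityPool("ec2_reserved", 500),          # reserved instances already owned
    CapacityPool("ec2_on_demand", 10_000),      # elastic capacity, used last
]

allocation, shortfall = plan_surge_capacity(1_000, pools)
print(allocation, shortfall)
```

The point of ordering the pools this way is that on-demand capacity only absorbs whatever the earlier, already-paid-for sources cannot cover.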

Herman is tasked with turning experiences like that monster first Prime Day into bigger and better future events, while keeping a realistic view of the groundwork involved. Examples of the progress Herman’s team has made include spreading out deals, so there’s great stuff to shop throughout a Prime Day or Black Friday, and distributing traffic across servers.

He’s not content to lean solely on AWS’s capacity to handle traffic influxes. As Herman puts it, “I think one of our big lessons from Prime Day 2015 is we did the best we could, and the elasticity of AWS really saved us, but we never want to see that happen again. It’s important if you want to have a successful event to plan that event. And so, even if AWS has got your back from a capacity perspective, you need to think about what’s going to happen with the rest of your business. It’s important in that planning process to bring your software teams and leaders together to talk about the interdependencies amongst each other, and how things are going to work.” He continues, “One of the things we do at Amazon is this full end-to-end testing of the entire retail website in the weeks leading up to an event. We call those ‘game days’. It’s a chance to see how the whole website is going to perform.”

Herman is also quick to point out that you can borrow the same event management strategies, even if you don’t have Amazon’s considerable resources at your disposal. He recommends sitting down with your team leaders and considering, “Okay, what’s the most rigid part of our infrastructure? Is it a database? Is it an old piece of code that we haven’t replaced in years? What happens if that thing fails?” It’s solid advice from a guy who’s seen the kind of traffic surges most websites dream about.

Amazon Web Services (AWS) Worldwide Public Sector helps government, education, and nonprofit customers deploy cloud services to reduce costs, drive efficiencies, and increase innovation across the globe.