‘Tis the Season to Avoid Enterprise IT and DevOps Gremlins

by: Brian Baggett / November 27, 2018


The likelihood of dealing with enterprise IT gremlins¹ is heightened during certain times of the year for any DevOps team. My brother, who works in IT disaster recovery for a healthcare agency, reminded me of this at our most recent Thanksgiving gathering. Right before the holiday, he had to deal with four hours of downtime after a DevOps-related change was pushed to the production system instead of a test environment. Sound familiar?

Whether it’s a holiday, the close of the quarter, or “go live” day, any number of factors can put extra stress on IT staff and raise the chance that network gremlins will plague an enterprise. Sloppiness may not be as mischievous as a mythical gremlin, but it causes trouble, difficulties, and unexpected failures that threaten security and contribute to downtime and poor performance.

Self-Service Resources and IT Automation

You can keep gremlins at bay with a solid plan for self-service options and IT automation. End users need access to hardened resources and processes when the people who hold the keys to those resources are on PTO or swamped by other high-priority projects.

Leaving users in the dust while they wait for resources or an update can push them toward workarounds or shortcuts. You don’t want anyone in your organization going rogue during stressful times. The more self-service IT that enterprise IT and DevOps teams have enabled, the less likely it is that folks will be left to fend for themselves.

Making any DevOps practice or IT process completely bulletproof against occasional mishaps is nearly impossible, but reducing their likelihood is worth the effort. The following approaches can help:

Eliminate bottlenecks
Consider a typical workflow from start to finish. If any dependencies require manual input, make sure you have accounted for them and have an alternative method for achieving the end result. One way to do this is to enable administrator access for trusted individuals who can step in if the primary admin is not available. In some cases, this person could sit above or just below the primary admin on the org chart. Getting your boss’s boss to intervene when necessary will usually move a bottleneck along a little faster.
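To make the backup-admin idea concrete, here is a minimal sketch of an escalation chain. The `Admin` class, the names in the chain, and the `first_available` helper are all hypothetical, made up for illustration; a real setup would pull availability from a calendar or on-call system.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Admin:
    """A hypothetical person who can approve or unblock a request."""
    name: str
    available: bool


def first_available(chain: Sequence[Admin]) -> Optional[Admin]:
    """Return the first available admin in priority order, or None."""
    for admin in chain:
        if admin.available:
            return admin
    return None


# Hypothetical chain: primary admin first, then trusted backups.
chain = [
    Admin("primary-admin", available=False),  # e.g. out on PTO
    Admin("backup-admin", available=True),
    Admin("director", available=True),
]

approver = first_available(chain)
print(approver.name if approver else "no one available, escalate manually")
```

The point of the sketch is only that the order of fallbacks is written down ahead of time, so nobody has to work it out in the middle of an outage.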
Automate approvals
Review routine approval processes and automate them whenever you can. That does not mean approving every request automatically. Instead, set up automated checklists so that a request that meets the requirements needs no manual approval. You could also define specific sets of resources that already meet the requirements and can be provisioned without an approval. This is particularly useful when you want self-service resources but not an open faucet. IT automation also reduces unnecessary manual errors.
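As a rough illustration of an automated checklist, the sketch below auto-approves a request only when every check passes and otherwise routes it to a person along with the reasons. The `Request` fields, the thresholds, and the check names are assumptions invented for this example, not rules from any particular platform.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Request:
    """A hypothetical self-service resource request."""
    environment: str
    cpu_cores: int
    monthly_cost_usd: float


# Hypothetical checklist: each entry maps a description to a pass/fail test.
CHECKLIST = {
    "non-production environment": lambda r: r.environment in {"dev", "test"},
    "at most 8 CPU cores": lambda r: r.cpu_cores <= 8,
    "monthly cost at most $500": lambda r: r.monthly_cost_usd <= 500,
}


def review(request: Request) -> Tuple[str, List[str]]:
    """Auto-approve if every check passes; otherwise list the failed checks."""
    failed = [name for name, check in CHECKLIST.items() if not check(request)]
    if failed:
        return "needs manual approval", failed
    return "auto-approved", []


print(review(Request("test", cpu_cores=4, monthly_cost_usd=120.0)))
print(review(Request("prod", cpu_cores=16, monthly_cost_usd=120.0)))
```

Requests that stay inside the guardrails go straight through, while anything unusual still lands in front of a human, which is the "self-service but not an open faucet" balance described above.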
Consolidate resources
Another way to reduce mishaps, or what might be called the “who’s on first?” effect, is to centralize resource management in specific teams with defined roles and plans for backup coverage. When resources and roles are scattered throughout the organization and someone with a key role is out on PTO, you’ll be scrambling to figure out where to get the IT resources you need, just like in the old Abbott & Costello skit.
Embed security
Security must be part of the whole process from start to finish. When provisioning IT resources on premises or in private and public cloud environments, containerized and other virtualized environments in the cloud call for special consideration. Here’s a quick reference for security concerns and DevOps resources: Enterprise Hybrid Cloud Containerization and Rugged DevOps and DevSecOps for Security. This post drills down to these two manifestos, which are also helpful in hardening security.
- Rugged Software
- DevSecOps

A centrally managed platform like CloudBolt can put any IT organization on the right path to avoiding the “gremlin” effect, especially as we approach another holiday season, when schedules and priorities will undoubtedly be different for many enterprises.

1. Gremlins are unexplained problems or faults.