Software platforms permeate the fabric of our lives, yet only 27% of CEOs in the Fortune 100 have degrees in engineering and science. Join a quarterly earnings call, and you’ll hear plenty of discussion about revenue, expenses, and geographic trends, but little (if anything) about the quality of the company’s software. The results are obvious: For nearly every major disaster caused by software defects, the postmortem determines that the defect had been around for some time.
The problem is not that company leaders need to have engineering backgrounds and don’t, but that few outside of engineering silos know how to discuss critical software systems. As a result, software bugs generally stay below the radar of the CEO unless a cataclysmic event occurs.
The sudden-acceleration problem in Toyota cars was a textbook case of what can happen without a proactive internal quality system. In 2004 the National Highway Traffic Safety Administration opened an investigation into complaints about the electronic throttle control in the Lexus ES300s. But Toyota didn’t issue large recalls, or halt the sales of affected models, until 2010. When a lawsuit against Toyota was settled three years later, two expert witnesses reported that the “system was defective and dangerous, riddled with bugs and gaps in its failsafes that led to the root cause of the crash.” A Toyota programmer described the engine control application as “spaghetti-like.” The drawn-out, irresolute handling of this case is a clear sign that an inadequate quality review system was in place. With a good system, that bug would have been visible to top leadership month after month, early and often, prompting efficient, life-saving action.
Company leaders can, and should, be intimately involved in software quality, just as they are involved in sales and finance divisions. This means understanding how technical teams work and implementing a quality management system.
What Leading the Way Looks Like
I worked in IBM’s mainframe division at a time when both IBM and Microsoft had quality issues that caused significant business headwinds and customer satisfaction issues. In the early 1990s IBM was experiencing serious field quality problems in addition to difficulties with meeting deadlines for new products. CEO Louis Gerstner Jr. was frustrated that, on the date some products were due, he was told they would miss the launch date by a year. In a legendary memo Gerstner offered an amnesty program — 30 days to reset the dates, and after that missing a date would be cause for dismissal.
The task of creating new schedules was given to Nicholas Donofrio, the SVP of technology and manufacturing. Donofrio’s motto was “Be forthright [about your schedule and quality issues] and I’ll be forthcoming [about getting you the needed resources].” He did not have direct line management for the systems (both hardware and software) business, but since he was the prior leader of the business unit, he had in-depth knowledge and deep personal relationships. Almost 80% of the product dates were reset, and each product organization established end-to-end quality management systems. By 1999 IBM was the worldwide server leader, with 23% market share.
Similarly, in the 1990s Microsoft’s Windows operating systems had a series of bugs that resulted in computers’ frequently freezing. The public largely got used to those failures — “blue screens of death” — but then Microsoft was hit by other bugs: As Windows systems connected to the internet, they suffered many embarrassing hacks, viruses, and security problems. Bill Gates responded to the crisis with a memo issuing a call to arms, which was sent to all 50,000 employees and simultaneously published in Wired. In it he defined “trustworthy computing,” a broad set of initiatives to improve security and product design, and outlined the prerequisites for a broad quality management system (QMS), including changes in software design and development processes, new error-reporting capabilities, and new update features. By 2000 Microsoft’s revenue crossed $20 billion, and the company reported that customers were citing the newly released Windows 2000 server as the most reliable version to date.
These two examples show how leaders can turn around deteriorating engineering divisions by asking simple questions, setting standards, and seeing themselves as part of the process.
When Software Is Critical, So Are Bugs
Creating an effective QMS that includes engineers and top leadership alike should be approached as an evolutionary exercise, starting simple with a focus on high impact early on. CEOs should ask two simple questions about a product that has recently shipped:
1. “What criteria were used to determine when the product was ready to be shipped?” There should be a clear discussion of the amount of time spent in system testing, the types of tests completed, and the precise criteria used in the decision to ship the product.
2. “What is the current defect status after the first six months?” It is normal to see an uptick in defects after the initial shipment, because of increased usage in real-world environments, but leaders should probe how engineering teams are prioritizing bugs and categorizing the most severe defects. With this information, leaders can drill down on the key metric of days open for a defect, to ensure the most severe defects are being fixed promptly.
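For readers who want to see what the days-open metric might look like in practice, here is a minimal sketch. It is an illustration only: the `Defect` record, the numeric severity scale (1 as most severe), and the sample dates are all assumptions made for this example, not a description of any particular company's tracking system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Defect:
    defect_id: str
    severity: int                  # assumed scale: 1 = most severe
    opened: date
    closed: Optional[date] = None  # None means the defect is still open


def days_open(defect: Defect, today: date) -> int:
    """Days from opening until closure, or until today if still open."""
    end = defect.closed or today
    return (end - defect.opened).days


def oldest_open_by_severity(defects: List[Defect], today: date) -> Dict[int, int]:
    """For each severity, the age in days of the oldest still-open defect."""
    oldest: Dict[int, int] = {}
    for d in defects:
        if d.closed is None:
            age = days_open(d, today)
            oldest[d.severity] = max(oldest.get(d.severity, 0), age)
    return oldest


# Invented sample data for illustration.
defects = [
    Defect("D-1", 1, date(2024, 1, 10)),
    Defect("D-2", 1, date(2024, 2, 1), closed=date(2024, 2, 3)),
    Defect("D-3", 2, date(2024, 1, 20)),
]
report = oldest_open_by_severity(defects, today=date(2024, 3, 1))
print(report)
```

A summary like this, one number per severity level, is the kind of figure a leader could ask to see at every review without needing to read the underlying bug reports.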
The 2×2 grid below shows how to grade a quality management system based on answers to those two questions. Along the y-axis you measure how siloed the information is, based on how quickly the questions are answered. Along the x-axis you measure the organization’s response to your questioning, based on how proactive the team is about addressing bugs. A “Mature” QMS can be identified by the team’s sharing defect information quickly and earnestly with top leaders; in a “Troubled” system, answers are brought forward slowly and with a great deal of antagonism.
If you find your company is in the Troubled quadrant, your first step should be to immediately create a proactive focus on defects in order to move to the Learning quadrant. This is what Gerstner and Gates both did. Your focus should be on creating a culture where honesty is rewarded and employees feel safe discussing methods and techniques to address software defects. In this environment, information should flow easily between the required silos, moving your system from Learning to the Mature quadrant.
Next-Level Quality Management
If you’re creating a QMS from scratch, your first step is to decide how the organization will classify and prioritize bugs. This should be done by the client-facing teams in conjunction with clients. Generally, teams will prioritize two types of bugs: those that crash a system and cause a loss of service (these earned top severity at IBM) and those that are less severe but could be pervasive.
Next, as an organization, decide your target response time for each level of severity. If the QMS is new, then the initial focus should be on fixing the most severe bugs within hours or days. As you use your system, you can gather data on two key metrics, incoming bug rates and the productivity of the bug fixers, and adjust your targets as needed. Finally, you should create a review system that involves yourself and other top leaders. Reviews of open defects and time to resolve a defect should be done with various degrees of detail at all levels of the organization.
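The relationship between response-time targets, incoming bug rates, and the productivity of the bug fixers can be sketched in a few lines. Everything here is hypothetical: the target values in `TARGET_DAYS`, the function names, and the sample numbers are assumptions chosen to illustrate the arithmetic, and real targets should come from the organization's own data.

```python
import math
from typing import Optional

# Hypothetical target response times in days, keyed by severity (1 = most severe).
TARGET_DAYS = {1: 2, 2: 14, 3: 60}


def is_overdue(severity: int, days_open: int) -> bool:
    """True if a defect has been open longer than the target for its severity."""
    return days_open > TARGET_DAYS[severity]


def weekly_backlog_change(incoming_per_week: float, fixers: int,
                          fixes_per_fixer_per_week: float) -> float:
    """Net weekly change in open defects: incoming rate minus fixing capacity."""
    return incoming_per_week - fixers * fixes_per_fixer_per_week


def weeks_to_clear(backlog: int, incoming_per_week: float, fixers: int,
                   fixes_per_fixer_per_week: float) -> Optional[int]:
    """Weeks until the backlog reaches zero, or None if it is not shrinking."""
    change = weekly_backlog_change(incoming_per_week, fixers,
                                   fixes_per_fixer_per_week)
    if change >= 0:
        return None
    return math.ceil(backlog / -change)


# Invented example: 120 open defects, 30 new per week, each fixer closes 5 per week.
print(weeks_to_clear(120, 30, fixers=8, fixes_per_fixer_per_week=5))  # 12
print(weeks_to_clear(120, 30, fixers=6, fixes_per_fixer_per_week=5))  # None
```

The second call shows why both metrics matter together: when fixing capacity only matches the incoming rate, the backlog never shrinks, and either the targets or the staffing must change.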
Once the QMS is established, the CEO is unlikely to see many old bugs, simply because nobody wants to give the same excuse two months in a row. The CEO could also review all of the high-severity bugs and the pervasive ones. The CEO could ask the simplest question: “How did the software get released with that type of bug?” The product development team might respond with software engineering jargon about “escapes,” and the CEO could then ask an almost rhetorical question: “Will we test those conditions next time before releasing the product?”
This type of QMS will lead to improved client experiences. To understand how important this is, consider a software defect that I encounter regularly in my bank’s online checking system: An expense-code function frequently fails, requiring me to reboot my computer. I have complained countless times over the last year, and all the bank says is, “We told IT, but we don’t know if they will fix it.” This is a sign that the bank lacks a rigorous QMS, which leaves its financial advisers looking helpless and annoys its customers, perhaps causing them to change banks. I wish I could tell them that they can make a turnaround, as IBM and Microsoft did.