Encomium to Technology: Six Sigma in software development

By definition, Six Sigma is a process or methodology which, when practiced in production, ensures that the output quality lies within 6 standard deviations of the mean or average quality. It is an interesting as well as convoluted definition, so let’s understand what exactly it means. Let’s first remind ourselves of the normal distribution. The picture below shows the normal distribution, which is centered at the mean.

68% of the time the outcome lies within 1 sigma, i.e. 1 standard deviation, of the mean. 95% of the time it lies within 2 standard deviations, and so on. Going further, how often does the outcome lie within 6 sigma, i.e. six standard deviations, which would be far off the edge of the graph shown above? The answer is that the outcome falls outside this range only 3.4 times in a million. (The 3.4 figure is the conventional Six Sigma number, which allows for the process mean drifting by 1.5 standard deviations over time.)

All confusing so far, so let’s work through a small example. Every morning when I drive to the office, which is 25 miles from my house, I need 5 gallons of gas on average. I tracked this for some time and noted a standard deviation of 1 gallon.

mean = 5

standard deviation = 1

The question now to be answered is: how can I be sure of not running out of gas on the way to the office? That is, how do I ensure a quality trip to the office by making it predictable and making sure I never run out of gas?

With the above values, a mean of 5 and a standard deviation of 1, let’s calculate the 6 sigma value, which is a simple calculation:

6 sigma = 6 * 1 = 6

The mean or average is 5, so all I need to do is always have 5 + 6 = 11 gallons of gas in my car. With 11 gallons I achieve 6 sigma in the process of reaching the office, i.e. there would be only a 3.4 in a million chance that my car runs out of gas. If I travel to the office a million times, I would run out of gas on about 3.4 of those trips. In other words, under similar conditions it would practically never happen.
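The numbers in this example can be checked with a few lines of Python. This is only a minimal sketch, assuming gas consumption is normally distributed; the function names `coverage_within` and `reserve_needed` are made up for illustration. It also shows where the conventional 3.4 per million comes from: it is the one-sided tail beyond 4.5 sigma, i.e. 6 sigma minus an assumed 1.5 sigma drift of the mean.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def coverage_within(k: float) -> float:
    """Probability that a normal outcome lies within k standard deviations of the mean."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)


def reserve_needed(mean: float, sigma: float, k: float = 6.0) -> float:
    """Amount to keep on hand: the mean plus k standard deviations."""
    return mean + k * sigma


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} sigma: {coverage_within(k):.4f}")

    # Gas example from the text: mean = 5 gallons, standard deviation = 1 gallon.
    print("gallons to carry:", reserve_needed(mean=5, sigma=1))

    # Conventional Six Sigma defect rate: one-sided tail beyond 4.5 sigma
    # (6 sigma minus the assumed 1.5 sigma long-term shift).
    tail = 1 - STD_NORMAL.cdf(4.5)
    print(f"failures per million: {tail * 1_000_000:.1f}")
```

Without the 1.5 sigma shift the tail beyond a full 6 sigma is far smaller still, which only strengthens the point that running out of gas would practically never happen.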

Now that we understand the basic definition of 6 sigma, let’s see how it can be used in software development. Software development, unlike manufacturing, is an iterative process. Some companies also call their software development research & development. Calling these teams R&D is quite natural, as the work done is not very predictable and quite often tends to change direction. Over the years, there have been a number of recommendations on quality control intended to bring predictability to the whole process. Predictability in terms of 6 σ would mean a plan that gives us no more than 3.4 failures per million opportunities; at that rate we practically never miss the target. Applied to software quality, this would mean 3.4 failures in 1 million software executions. In my software development experience, that is not at all an easy target.

Any statistical process requires gathering data, i.e. creating a population of opportunities. The opportunities in software development are nothing but the lines of code written. Every line of code written is an opportunity for a defect, and if a process results in only 3.4 defects per 1 million opportunities, we can term it 6 sigma compliant.
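To make the idea of defects per million opportunities concrete, here is a small Python sketch. It treats every line of code as one opportunity, as described above, and converts the defect rate into a sigma level using the conventional 1.5 sigma shift. The function names `dpmo` and `sigma_level` are my own, and the shift is an industry convention assumed here, not something derived in this post.

```python
from statistics import NormalDist


def dpmo(defects: int, opportunities: int) -> float:
    """Defects per million opportunities (one line of code = one opportunity)."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return defects / opportunities * 1_000_000


def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Convert a DPMO figure to a sigma level, assuming the conventional 1.5 sigma shift."""
    if not 0 < dpmo_value < 1_000_000:
        raise ValueError("dpmo_value must be between 0 and 1,000,000 (exclusive)")
    yield_fraction = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(yield_fraction) + shift


if __name__ == "__main__":
    # 10 bugs in 1,000 lines of code (10 bugs per KLOC).
    rate = dpmo(defects=10, opportunities=1_000)
    print(f"DPMO: {rate:.0f}, sigma level: {sigma_level(rate):.2f}")

    # The Six Sigma target of 3.4 defects per million opportunities.
    print(f"sigma level at 3.4 DPMO: {sigma_level(3.4):.2f}")
```

Seen this way, a process producing 10 bugs per KLOC sits well below 4 sigma, which gives a feel for how far typical software is from the 3.4 per million target.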

Modern software is complicated, and trends suggest that it will become even more complicated in the near future. The number of bugs per thousand lines of code (KLOC) varies from system to system. Estimates range from 5 to 50 bugs per KLOC. On average, each module is about 100K bytes in size. Assuming that a single LOC results in 10 bytes of code, a module is about 10 KLOC, so at a conservative rate of 5 bugs per KLOC each executable module has about 50 bugs. This is an industry average, and the actual figure could very well vary. For the calculation that follows, take an average of 10 bugs per KLOC with a standard deviation of 1, i.e.

mean = 10

sigma = 1

Can we say that a 6 σ compliant software development process would not produce more than 16 bugs per KLOC? Possibly yes, but only once we classify what these bugs are. For example, if the bug or defect is about not meeting the required level of performance, then it is a critical issue and cannot be ignored. 6 sigma adherence in such a situation would mean meeting the performance target by making sure resources are available in abundance.

Let me highlight this with another example. A disaster recovery solution for a data center relies on creating a disaster recovery site outside the data center. The site is usually created far enough away that it is not affected by a disaster within the data center or by a natural disaster that hits the data center. For convenience, let’s call the disaster recovery site the satellite site. The major issue that disaster recovery software needs to solve is making sure that data from the local site actually reaches the remote satellite site. As we all know, an IP network is connectionless, sending data in packets over the wire. The layers above IP, viz. TCP, can be used to number the packets for proper data organization. This still does not guarantee that the data has actually reached the satellite site, so it is the responsibility of the application layer to perform retries or use other mechanisms to ensure data availability.

To build in such quality, what are the important factors this solution depends upon? Definitely data bandwidth: the software developer can identify the packet size and, based on the distance, the minimum bandwidth required. A bandwidth of 100 Mbps may be just sufficient for the average case, but a bandwidth of 110 Mbps might bring the reliability into the 6 sigma range.
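The application-level retries mentioned in the disaster recovery example might look roughly like the sketch below. It is an illustration only, not code from any real disaster recovery product: `send` is a hypothetical callable that transmits one block of data and returns True only when the satellite site acknowledges it, and the attempt count and delay are made-up defaults.

```python
import time
from typing import Callable


def send_with_retries(
    send: Callable[[bytes], bool],
    payload: bytes,
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
) -> int:
    """Send payload until the receiver acknowledges it.

    Returns the number of attempts used. Raises ConnectionError if no
    acknowledgement arrives within max_attempts, so the failure is exposed
    to the caller instead of being silently dropped.
    """
    for attempt in range(1, max_attempts + 1):
        if send(payload):
            return attempt
        if attempt < max_attempts:
            time.sleep(base_delay_s * attempt)  # simple linear backoff
    raise ConnectionError(f"no acknowledgement after {max_attempts} attempts")


if __name__ == "__main__":
    # Hypothetical link that loses the first two transmissions.
    outcomes = iter([False, False, True])
    attempts = send_with_retries(lambda data: next(outcomes), b"block-1", base_delay_s=0.0)
    print("acknowledged after", attempts, "attempts")
```

The point of the sketch is that the sender never assumes delivery: either it sees an acknowledgement or it raises an error that someone has to deal with.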

The software developer working on the requirement starts by analyzing the environment where the software will be deployed and listing the key failures that might occur in the software.

How do we achieve this? Difficult but not impossible

Having read all the theory above, the important question is how this can be achieved. Over the years, a number of development processes have been suggested, but what has worked is a process that is agile and iterative: an iterative process with provision for measuring quality at every step. Well-defined measurement criteria and well-calibrated measuring instruments are the key to success. A few steps that have worked effectively:

Identify the key critical factors a requirement must fulfill, e.g. you may want to list the situations that will impact the end use case. As in the example above, the satellite site not receiving a packet would result in failure during a disaster.
Identify areas that may cause problems, i.e. failure modes. A high number of failure modes might indicate very low reliability.
Group the failure modes and add code to make sure they are not suppressed but exposed, along with suggestions or remedies.
For these failure modes, find ways to avoid them. For example, if we ensure high memory availability, we reduce the chance of a failure occurring.
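One possible way to expose a failure mode together with its remedy, rather than suppressing it, is sketched below. The `FailureMode` class and the `check_free_memory` helper are hypothetical names used only for illustration, and the memory figures are made up.

```python
class FailureMode(Exception):
    """An identified failure mode, reported together with a suggested remedy."""

    def __init__(self, description: str, remedy: str) -> None:
        super().__init__(f"{description} (suggested remedy: {remedy})")
        self.description = description
        self.remedy = remedy


def check_free_memory(free_mb: int, required_mb: int) -> None:
    """Raise a FailureMode instead of continuing silently with too little memory."""
    if free_mb < required_mb:
        raise FailureMode(
            f"only {free_mb} MB free, {required_mb} MB required",
            "free up memory or provision a larger host",
        )


if __name__ == "__main__":
    try:
        check_free_memory(free_mb=100, required_mb=512)
    except FailureMode as failure:
        print("failure mode exposed:", failure)
```

Grouping failure modes under one exception type like this also makes it easy to count them, which ties back to the point that a high number of failure modes hints at low reliability.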

Apart from having such a thought process, it has also become important to adopt a management strategy that is agile and iterative.

A few suggestions which work well in making this happen:

Identify the key stakeholders and the roles of individuals in the team. A few iterative processes also call for naming individuals Pigs or Chickens. Pigs are those who do the work or code; Chickens are the stakeholders. Mixing the roles is a recipe for disaster. Ask the chickens to identify failures or quality metrics along with the pigs.
Create a list of undefined or unclear work, and work with the people involved to get the list defined.
Measure the defects or mistakes in the work at every step; if possible, get the average case from previous projects. Measure the deviation and calculate the 6 sigma value. Work iteratively to improve the sigma value.
Most important: involve the team in predicting work timelines, and use the team’s average speed or velocity when deciding critical timelines.
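The measuring step in the list above (take the average, measure the deviation, calculate the 6 sigma value) can be sketched as follows. The defect counts are invented sample data, and `six_sigma_limit` is just an illustrative name.

```python
from statistics import mean, stdev
from typing import Sequence


def six_sigma_limit(samples: Sequence[float], k: float = 6.0) -> float:
    """Return the mean plus k sample standard deviations of the measurements."""
    if len(samples) < 2:
        raise ValueError("need at least two measurements to estimate deviation")
    return mean(samples) + k * stdev(samples)


if __name__ == "__main__":
    # Hypothetical defects per KLOC measured over five iterations.
    defects_per_kloc = [9, 10, 11, 10, 10]
    print(f"average: {mean(defects_per_kloc):.1f}")
    print(f"deviation: {stdev(defects_per_kloc):.2f}")
    print(f"6 sigma value: {six_sigma_limit(defects_per_kloc):.2f}")
```

Re-running the same calculation after every iteration shows whether the average and the deviation are actually coming down.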

What not to do?

There are certain things which should be taken into account while practicing 6 sigma for software development. Software does not work on its own, but depends on other components on the computer. On average, 3000 modules are installed on a given workstation. Assuming a single LOC results in 10 bytes and each module is 100K bytes, it is very likely that each executable module will have about 50 bugs. Thus achieving a target of 6 sigma depends highly on the quality of the other dependent modules.

100Kbytes = 10KLOC per exe

5 bugs / KLOC = 50 bugs per exe
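The same arithmetic, written out as a short Python sketch. It takes 100K bytes to mean 100,000 bytes, which is an assumption on my part, and `bugs_per_module` is only an illustrative name. Multiplying by the 3000 modules on a workstation shows how many bugs our own software may be sitting on top of.

```python
def bugs_per_module(module_bytes: int, bytes_per_loc: int, bugs_per_kloc: float) -> float:
    """Estimate the bugs in one module from its size and an assumed defect density."""
    kloc = module_bytes / bytes_per_loc / 1000  # thousands of lines of code
    return kloc * bugs_per_kloc


if __name__ == "__main__":
    per_module = bugs_per_module(module_bytes=100_000, bytes_per_loc=10, bugs_per_kloc=5)
    modules_on_workstation = 3000  # figure quoted in the text
    print("bugs per module:", per_module)
    print("bugs across the workstation:", per_module * modules_on_workstation)
```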

In my experience it has been very tough to achieve such discipline, but believe me, the feeling of achievement is very high once you get there. So keep trying and keep improving.

Wednesday, August 12, 2015
