Encomium to Technology

Tuesday, January 12, 2016

Fine Grained Resource Management using Mesos and Docker

Mesos

Mesos is a fault-tolerant cluster manager. It consists of a master daemon that manages the slave daemons running on each node in a cluster, as well as the tasks running on those slaves. Each slave reports its available resources to the master, and the master shares the physical resources across tasks by making resource offers to frameworks. A framework running on top of Mesos consists of two components: a scheduler that registers with the master, and an executor process that is launched on slave nodes to run the framework’s tasks. Mesos uses ZooKeeper to coordinate activities across the cluster, such as electing the leading master. More information on Mesos can be found at http://mesos.apache.org.
Our goal was to explore ways of assigning fine-grained resources (like ports) to a specific slave and to run a dedicated Docker daemon for each of these slaves for isolation. We used Marathon (https://github.com/mesosphere/marathon) as our scheduler.

Here is our deployment

We have two hosts running Ubuntu Linux: the Mesos master and the Marathon scheduler run on 10.13.216.113, and the slaves run on 10.13.216.214. Two Docker daemons also run on 10.13.216.214. We would like to use Marathon to schedule a task that launches a web application called “outyet” inside Docker.

Starting two Docker daemons in the same host

We wanted to deploy our application “outyet” on a separate Docker daemon for each slave on 10.13.216.214.
Before we explain our motivation as to why we need two separate Docker daemons and how we start multiple daemons, we need to understand the internals of Docker. A good explanation of Docker architecture and its components can be found at https://docs.docker.com/introduction/understanding-docker/.

Docker uses a client-server architecture: the client is the CLI, which connects to the server, the Docker daemon.
A Docker client can connect to a Docker daemon running locally or on a remote host.
Docker images, registries and containers are the main components that a Docker daemon works with.
A Docker image is a template that contains the operating system, the web server etc. for an application.
A Docker container is created from a Docker image; the container is the runtime component of Docker.
A Docker registry holds images; a registry can be public or private. When a Docker client issues the run command, Docker checks whether the image is present locally and, if it is not, pulls it from the configured public or private registry.
A Docker image is read-only; when Docker runs a container from an image, it adds a read-write layer on top of the image, in which the application can run.

Our motivation to start multiple Docker daemons

Isolation: Docker takes advantage of a technology called namespaces to provide isolation; when we run a container, Docker creates a set of namespaces for that container. By running a dedicated Docker daemon for each Mesos slave, we add another level of isolation, so each Docker daemon has its own set of constraints, resources and data.
Multi-Tenancy: Isolation and multi-tenancy go hand in hand; isolation ensures good multi-tenancy. Since each tenant runs on its own Docker daemon, tenants do not have access to each other's file systems, which adds another layer of security.
Microservices: A tenant can have its own registry and can restrict access to it from other tenants, as each Docker daemon can have its own private or public registry.

Starting Docker Daemons

As we have seen from the Docker architecture, each Docker daemon running on the same host requires its own socket to bind to, and its own files and directories to persist its filesystems and images. Docker provides flags such as host, graph and pidfile to facilitate running multiple Docker daemons on the same host.

sudo docker --daemon=true --host=unix:///var/run/docker_a.sock --graph=/var/lib/docker_a --pidfile=/var/lib/docker_a_pid

The --daemon=true flag runs docker as a daemon.
The --host flag is the socket the daemon binds to; if we are running multiple daemons, we have to provide a unique socket for each (a Unix socket is represented by a file descriptor once it is created, so that I/O operations can be performed on it just as with normal file descriptors).
The --graph flag specifies the path to use as the root of the Docker runtime. The default is /var/lib/docker, but if we are running multiple daemons, we have to specify a unique directory for each daemon. Each Docker daemon maintains its own images, filesystems, volumes etc. in this directory.
The --pidfile flag is the file the process ID is written to; it must be unique so that one daemon does not overwrite the other's.
Similarly, let us start another Docker daemon:

sudo docker --daemon=true --host=unix:///var/run/docker_b.sock --graph=/var/lib/docker_b --pidfile=/var/lib/docker_b_pid
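
The two commands above differ only in the suffix used for the socket, the graph directory and the pidfile. A minimal Python sketch of that pattern follows. The helper name docker_daemon_command is hypothetical, and the flags are simply the ones used in this post, so they may differ in other Docker releases.

```python
def docker_daemon_command(suffix):
    """Build the daemon command line used in this post for one daemon.

    `suffix` (for example "a" or "b") is what keeps the socket, the graph
    directory and the pidfile unique per daemon.
    """
    return [
        "sudo", "docker", "--daemon=true",
        "--host=unix:///var/run/docker_%s.sock" % suffix,
        "--graph=/var/lib/docker_%s" % suffix,
        "--pidfile=/var/lib/docker_%s_pid" % suffix,
    ]


# One command line per daemon; nothing is executed here, only printed.
commands = {s: docker_daemon_command(s) for s in ("a", "b")}
for suffix, argv in sorted(commands.items()):
    print(" ".join(argv))
```

Deriving all three paths from one suffix makes it hard to start a second daemon that accidentally shares a socket or a pidfile with the first.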

Starting Mesos master.

sudo <MESOS_INSTALL_DIR>/bin/mesos-master.sh --ip=10.13.216.113 --work_dir=/var/lib/mesos --quorum=1 --zk=zk://127.0.0.1:2181/mesos

--ip: The IP address the master listens on.
--work_dir: The directory where Mesos stores persistent information.
--quorum: The size of the quorum of master replicas; 1 here, since we run a single master.
--zk: The ZooKeeper URL; we need this as we are running a cluster.

Starting Mesos slave.

Now let us start the Mesos slaves on our slave node, 10.13.216.214:

sudo ./mesos-slave.sh --log_dir=/var/log/mesos1 --port=5051 --hostname=sclq214-1 --work_dir=/tmp/mesos1 --master=10.13.216.113:5050 --containerizers="docker" --executor_registration_timeout="5mins" --resources='ports:[21000-24000]' --docker_socket="/var/run/docker_a.sock" --ip=10.13.216.214 --attributes='slave:a'

Please note that support for specifying a Docker socket (--docker_socket) is not available before Mesos version 0.25.

--log_dir: The directory where the Mesos slave writes its logs.
--port: The port to listen on.
--hostname: The hostname the slave should report. If left unset, the hostname is resolved from the IP address that the slave binds to.
--work_dir: The directory path in which to place Mesos work directories.
--master: The address of the master or masters.
--containerizers: To enable the Docker containerizer, the slave must be launched with “docker” as one of the containerizer options.
--executor_registration_timeout: The amount of time to wait for an executor to register with the slave before considering it hung and shutting it down.
--resources: The resources the slave offers; here, the port range 21000-24000.
--docker_socket: The socket of the Docker daemon that the slave's executor connects to for executing the task.
--ip: The IP address of the slave.
--attributes: Attributes of the slave; here 'slave:a', which a Marathon constraint can match against.
Similarly, let us start another Mesos slave on the same node:

sudo ./mesos-slave.sh --log_dir=/var/log/mesos2 --port=5052 --hostname=sclq214-2 --work_dir=/tmp/mesos2 --master=10.13.216.113:5050 --containerizers="docker" --executor_registration_timeout="5mins" --resources='ports:[21000-24000]' --docker_socket="/var/run/docker_b.sock" --ip=10.13.216.214 --attributes='slave:b'
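
When several slaves share one host, a copied command line can easily reuse a port or a working directory. The sketch below checks a list of slave settings for values used more than once. Which settings need to be unique is our reading of the two commands above, and find_conflicts is a hypothetical helper, not part of Mesos.

```python
from collections import Counter

# Settings that differ between the two slaves in this post.
slaves = [
    {"port": 5051, "work_dir": "/tmp/mesos1", "log_dir": "/var/log/mesos1",
     "docker_socket": "/var/run/docker_a.sock"},
    {"port": 5052, "work_dir": "/tmp/mesos2", "log_dir": "/var/log/mesos2",
     "docker_socket": "/var/run/docker_b.sock"},
]


def find_conflicts(configs):
    """Return {setting: [values used by more than one slave]}."""
    conflicts = {}
    for key in configs[0]:
        counts = Counter(cfg[key] for cfg in configs)
        shared = sorted(value for value, n in counts.items() if n > 1)
        if shared:
            conflicts[key] = shared
    return conflicts


print(find_conflicts(slaves))  # {} when every setting is unique
```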

Executing a Marathon application in Mesos

For this work we chose a sample Docker web application called "outyet" (http://open.mesosphere.com/intro-course/ex12.html). The "outyet" image has to be built and loaded in each of the Docker daemons; more instructions are available at the above link.

sudo docker -H unix:///var/run/docker_a.sock build -t outyet .
sudo docker -H unix:///var/run/docker_b.sock build -t outyet .

Once the "outyet" images are loaded in the Docker daemons, the application "outyet" can be deployed to Marathon from the host hosting Marathon (10.13.216.113).

curl -i -H 'Content-Type: application/json' -d @outyet.json localhost:8080/v2/apps

The contents of outyet.json from the example have to be changed to:

{
    "id": "outyet",
    "cpus": 0.2,
    "mem": 20.0,
    "instances": 1,
    "constraints": [["slave", "CLUSTER", "a"]],
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "outyet",
            "network": "BRIDGE",
            "portMappings": [{ "containerPort": 8080, "hostPort": 21001, "servicePort": 0, "protocol": "tcp" }]
        }
    }
}

id: Has to be a unique identifier.
constraints: The constraint here forces Mesos to execute the task on the slave whose "slave" attribute is "a"; the slave was started with this attribute.
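
Before posting the definition to Marathon, it can be worth checking that the requested host port lies inside the range the slave offers. The sketch below rebuilds the same definition in Python and does that check. The function name ports_outside_offer is hypothetical, and the range is copied from the --resources flag used earlier.

```python
import json

OFFERED_PORTS = (21000, 24000)  # from --resources='ports:[21000-24000]'

app = {
    "id": "outyet",
    "cpus": 0.2,
    "mem": 20.0,
    "instances": 1,
    "constraints": [["slave", "CLUSTER", "a"]],
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "outyet",
            "network": "BRIDGE",
            "portMappings": [
                {"containerPort": 8080, "hostPort": 21001,
                 "servicePort": 0, "protocol": "tcp"}
            ],
        },
    },
}


def ports_outside_offer(app_def, offered):
    """Return the hostPort values that fall outside the offered range."""
    low, high = offered
    mappings = app_def["container"]["docker"]["portMappings"]
    return [m["hostPort"] for m in mappings if not low <= m["hostPort"] <= high]


print(ports_outside_offer(app, OFFERED_PORTS))  # [] means every port fits
print(json.dumps(app, indent=4))                # body for outyet.json
```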

Marathon Console

We can view the status of our apps ("outyet") from the Marathon console.

How do we test if the app has been deployed and if the app is running?

To test the app ("outyet"), we first need the port where the app is running. We can use the following command to figure this out.

sudo docker --host unix:///var/run/docker_a.sock ps

The output of the command will look like:

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
58c253680cd7 outyet "go-wrapper run" 2 weeks ago Up 2 weeks 0.0.0.0:23705->8080/tcp mesos-20150814-181831-1909984522-5050-12612S9.23dd487c-01de-4d02-8e20-0deaca20c193
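
Picking the host port out of that listing can also be scripted. The sketch below is a rough illustration that assumes the PORTS column has the form shown above (address:hostPort->containerPort/protocol); host_ports is a hypothetical helper.

```python
import re

# Matches entries such as 0.0.0.0:23705->8080/tcp
PORT_MAPPING = re.compile(
    r"(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<host>\d+)->(?P<container>\d+)/(?P<proto>\w+)"
)


def host_ports(ps_output, container_port):
    """Return the host ports mapped to `container_port` in `docker ps` output."""
    return [
        int(match.group("host"))
        for match in PORT_MAPPING.finditer(ps_output)
        if int(match.group("container")) == container_port
    ]


sample = (
    '58c253680cd7 outyet "go-wrapper run" 2 weeks ago Up 2 weeks '
    "0.0.0.0:23705->8080/tcp mesos-20150814-181831-1909984522-5050-12612S9"
)
print(host_ports(sample, 8080))  # [23705]
```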

Now we can test the app using curl:

curl http://127.0.0.1:23705

The output will look like:

<!DOCTYPE html><html><body><center>
<h2>Is Go 1.4 out yet?</h2>
<h1>
<a href="https://go.googlesource.com/go/+/go1.4">YES!</a>
</h1>
</center></body></html>

Conclusion and Future work

In this exercise we demonstrated how to assign fine-grained resources like ports to a Mesos slave and how to run a simple Docker web application image ("outyet") using a Mesos Docker executor, scheduled by Marathon. We also demonstrated how to run multiple Docker daemons on a single host and how to map each Docker daemon to a Docker executor, thereby providing isolation and multi-tenancy in a Mesos/Docker environment.
We should be able to extend this approach to provision a ScaleIO volume to a Docker container and guard-rail it using ScaleIO's "Protection Domains", thereby improving the isolation and multi-tenancy characteristics of Docker.

Monday, November 2, 2015

Demulsifying Intel ClearLinux

When we talk about containers, two things get highlighted: (a) a packaging format, so that applications and their dependencies travel together, making deployment easy, and (b) a high-performing alternative to hypervisor-based virtualization. The big tradeoff for (b) is security. In my earlier post, I dissected the available Docker security constructs and concluded that not much is available to make the environment secure enough.

ClearContainers

Clear Containers takes a different track, proposing to start KVM-based virtual machines as fast as Linux containers. Each container brings its own version of the Linux kernel, and does not rely on the popular cgroups and namespaces to isolate the application.
There was a long debate about whether kvmtool, a stripped-down userspace tool for booting KVM ("Kernel Virtual Machine") guests, should live in the Linux kernel source tree. It was developed in-tree under tools/kvm [1], although Linus Torvalds did not merge it into the mainline kernel. Its "lkvm" utility [1] can be used to start these small virtual machines.

lkvm (Linux KVM Tool)

This new native Linux KVM tool was announced at the end of March 2011 and was written by Pekka Enberg, Cyrill Gorcunov, and Asias He. According to the announcement, "The goal of this tool is to provide a clean, from-scratch, lightweight KVM host tool implementation that can boot Linux guest images with no BIOS dependencies and with only the minimal amount of legacy device emulation." At the time of the announcement the tool was still in development, at only around 5,000 lines of C code, yet capable of booting a Linux guest image while leveraging KVM. The tool can be launched with simple CLI options, a userspace image and a path to a bootable Linux kernel bzImage [2].

Virtual Devices (virtio)

Virtio, or virtual I/O, lets Linux kernel virtualization share devices with guests. Rather than having a variety of device emulation mechanisms (for network, block, and other drivers), virtio provides a common front end for these device emulations to standardize the interface and increase the reuse of code across platforms. This design allows a hypervisor to support a common set of emulated devices through a predefined common set of APIs. The guest operating system implements front-end drivers, which make use of the virtio APIs, and the back-end drivers are implemented in the hypervisor [3].

Virtio has interfaces for block, network, pci, balloon, console and 9p. Most of these are quite self-explanatory; the one that catches my attention is 9p, or VirtFS. VirtFS is a paravirtualized filesystem interface designed to improve passthrough technologies in the KVM environment. It is based on the virtio framework and uses the 9P protocol. The containers created using Clear Linux make use of VirtFS, i.e. the virtio 9p protocol, for exposing volumes from the underlying system inside the virtual machine.

Direct access (DAX)

Non-volatile memory (NVM) devices are going to provide terabytes of fast persistent storage at RAM speeds. It is easy to wrap a block device around a portion or the whole of an NVM device. Direct access, or DAX, is a patch set introduced in the kernel to replace the XIP (execute-in-place) code; it allows files on directly addressable devices to be mapped into user space. Clear Containers uses this feature to map the rootfs inside the virtual machine image.

Clear Containers & Dockers

The Intel engineers have extended Docker to launch Clear Containers using the docker CLI. This code is not yet part of the main Docker repository, but it can be installed easily by following the instructions on the Clear Linux website [4]. The installation brings in all the components needed for running Clear Containers using the Docker CLI. All the commands that normally work with Docker work here as is.

After the installation, verify that the modified version of Docker is in use:

sudo docker -v
docker version 1.8.1-clear-containers, build d12ea79-clear-containers

Now, try running an Ubuntu container:

sudo docker run -d ubuntu sleep 5000

Check that lkvm is running, and look at the parameters with which it is invoked:

lkvm run -c 6 -m 1024 --name 99ed6cf540e3e93baa9a5d452dbeb7df38ebb19441454c16781bc33a33b87c07 --console virtio --kernel /usr/lib/kernel/vmlinux.container --params root=/dev/plkvm0p1 rootfstype=ext4 rootflags=dax,data=ordered init=/usr/lib/systemd/systemd systemd.unit=container.target rw tsc=reliable systemd.show_status=false no_timer_check rcupdate.rcu_expedited=1 console=hvc0 quiet ip=172.17.0.19::172.17.42.1::99ed6cf540e3::off --shmem 0x200000000:0:file=/var/lib/docker/clear-4740-containers.img:private --network mode=tap,script=none,tapif=tb-99ed6cf540e3,guest_mac=02:42:ac:11:00:13 --9p /var/lib/docker/aufs/mnt/99ed6cf540e3e93baa9a5d452dbeb7df38ebb19441454c16781

Some observations:

a) The root file system is exposed using DAX.
b) File systems mounted from the host are exposed as VirtFS, using the 9p protocol.
c) The init system is systemd.

References

[1] https://github.com/penberg/linux-kvm/tree/master/tools/kvm & https://lkml.org
[2] https://lkml.org/lkml/2011/3/31/406
[3] http://www.ibm.com/developerworks/library/l-virtio/
[4] https://clearlinux.org/blogs/clear-containers-docker-engine

Thursday, August 20, 2015

Dissecting Docker Security Part 2

In my last post, I discussed the various security constructs available in Linux that help in creating a secure application environment. Beyond these Linux kernel constructs, there are best practices and architectures which, if followed, can make a container environment less vulnerable and thus more secure.

Unikernel

Small footprint - more secure

For next-generation data centers making use of public or in-house clouds, a statically linked, small-footprint kernel, also known as a unikernel, has made it possible to run many more virtual appliances per host. A unikernel includes only the minimal functionality needed by the application, thus making the host less vulnerable to attacks. An application that does not need to talk to the outside world can avoid having a kernel stack with networking components. This custom-built kernel is statically linked to the application and shipped along with it. MirageOS, a project that originated at the University of Cambridge, is an example of such a unikernel. The team has shown with example deployments the level of security it offers when compared with a full-size generic Linux distribution.

CoreOS

What is Linux? A tarball of code called the kernel? Download the kernel code from the kernel repository, add a few utilities, create a tarball, and there you have a distribution. The various commercial distribution vendors, such as Redhat, Ubuntu and Suse, are doing more or less the same. The vendors add value by making sure the distribution has all the necessary utilities, that they work together, and that support is available if anything breaks. CoreOS is one such distribution, with the aim of making the distribution as small as possible. They take their tarballs and add only what is needed to run Docker container microservices. The distribution does not come with any fancy gizmos, utilities, GUI or graphs, but just enough to get things going with Docker.

The basic image boots up in less than a minute and takes just about 3GB of space. There is no package manager, and one has to use Docker to pull any software. The default install does not let one log in as any user, but expects cloud-init to be used for deployment. Cloud-init helps in bringing up as many instances of CoreOS as needed, with applications pulled from a Docker registry.

In short, a small distribution not only keeps the footprint small but also makes the system less vulnerable. So, use CoreOS or anything similar.

Golang - nolibc

Golang, commonly referred to as Go, is a statically typed programming language developed at Google. It is influenced by C, with additions such as garbage collection and dynamic arrays. Having always been a C programmer, I found learning Go super easy. Go comes with its own concurrency model, designed for cheaper context switching between threads of execution without making each one a kernel thread. Engineers familiar with the Unix threading model can easily understand and appreciate such differences. The language implements its own system call interface and does not use libc for calls into the kernel. All Go code is statically linked, though with 1.5 they started supporting shared libraries. Not relying on any system-installed libraries to talk to the kernel puts applications written in Go in the "less vulnerable" category.

Go makes system calls directly, without libc or anything else.

The sample Dockerfile below shows how to build a Docker container around a statically linked binary that uses the syscall package:

FROM scratch
MAINTAINER Kelsey Hightower <kelsey.hightower@gmail.com>
ADD contributors contributors
ENV PORT 80
EXPOSE 80
ENTRYPOINT ["/contributors"]

More info here

Conclusion

Micro-service architecture focuses on splitting the components of an application, functional and non-functional, into small separate entities. Their life cycles are managed independently, they scale up or down on their own, and they come with endpoint APIs for communicating. Linux containers using Docker provide an easy way to compose such micro-services. The applications have to be small in size, secure and portable enough to run in any environment. A small application footprint, with little dependency on host-based libraries or entities, is the key to a successful, secure micro-service architecture.

In my next post, I will cover a few more solutions from Docker and some tricks that make container data management easy and secure.

Wednesday, August 12, 2015

Dissecting Docker security - Part 1

What is Docker?

Docker is an open source project that automates the deployment of software applications inside Linux containers. Linux containers utilize namespaces and cgroups to provide fast and reliable isolation for software applications. A high-end system can be used effectively to run thousands of containers belonging to varied clients or tenants. Such systems have to be secured, as an exploited application executing with high privileges or permissions can bring the system down.

Why Security is important for Docker?

The isolation provided by Linux containers is quite secure and covers most aspects of security. Containers are widely used, but they are rarely the first choice for multi-tenant environments. That reluctance is justified: some security constructs are missing, and they have to be investigated and/or built.

Security Constructs

There have been a number of additions to Linux container security features. An evaluation of vulnerability focuses on three major areas. Even though these security constructs suffice for many cases, they are lacking in a number of aspects and therefore do not give a complete solution.

1. The Linux kernel security features

a) Namespace

Linux namespaces are a lightweight form of process virtualization. They enable a process to have a different view of the system. Processes running within a container cannot see the rest of the system or other containers. The Linux kernel provides six types of namespaces: mount, process, network stack, inter-process communication, hostname and user. Each container gets its own network stack; containers do not get privileged access to the sockets or interfaces of other containers. Mount namespaces ensure that volume mounts in one container are not seen or accessed by processes in other containers, though they may be visible on the host.
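
On a Linux host, the namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns; two processes share a namespace when the link targets match. The sketch below lists them for the current process. It assumes /proc is mounted and returns an empty result elsewhere; process_namespaces is a hypothetical helper.

```python
import os


def process_namespaces(pid="self"):
    """Map namespace name -> identifier, e.g. 'net' -> 'net:[<inode>]'.

    Assumes a Linux host with /proc mounted; returns {} elsewhere.
    """
    ns_dir = "/proc/%s/ns" % pid
    if not os.path.isdir(ns_dir):
        return {}
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry unreadable (permissions, or the process has exited).
            continue
    return result


for name, ident in process_namespaces().items():
    print(name, ident)
```

Running this on the host and again inside a container should show different identifiers for the namespaces Docker created for that container.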

b) Control groups (cgroups)

Cgroups are a kernel mechanism for grouping processes and for tracking and limiting their resource usage. Cgroups ensure that each process gets its fair share of resources, and that a single container cannot bring down the system by exhausting resources.

2. Linux kernel capabilities

Traditional Linux privileges come in two varieties: regular user and root. Users have to act as root to get the capabilities of the root user. However, granting a user such broad access is more risky than advantageous. Having only two levels of privilege is therefore not sufficient; a more granular privilege set is required. POSIX capabilities are designed for exactly this purpose.

3. The Linux kernel hardening features

a. Linux Security Modules - SELinux , AppArmor

i. Security-Enhanced Linux (SELinux) is a Linux feature that provides a variety of security policies for Linux kernel. It is included with CentOS / RHEL / Fedora Linux, Debian / Ubuntu, Suse, Slackware and many other distributions.

ii. AppArmor is the most effective and easy-to-use Linux application security system available on the market today. AppArmor is a security framework that proactively protects the operating system and applications from external or internal threats, even zero-day attacks, by enforcing good program behavior and preventing even unknown software flaws from being exploited.

iii. Grsecurity is a set of patches for the Linux kernel with an emphasis on enhancing security. It utilizes multi-layered detection and prevention, and a powerful policy setup.

It can easily be argued that the aesthetics of Linux security modules make them an unfavorable option for implementing security policies.

b. SECCOMP

Secure computing mode (seccomp) is a facility that provides sandboxing in the Linux kernel by restricting the system calls a program can make. This prevents an exploited program from taking control of the system, even when it is executed as a highly privileged user.

Evaluating Security Constructs

It can be said that the above-mentioned security constructs suffice for most use cases, but:

a) Do they require more skills than an application developer would typically have?

b) Are these constructs easily maintainable? Can they withstand change, such as migration of the application?

c) Does using them require changes to the application code or program?

d) Are these tools agnostic of the platform they run on? For example, does a SELinux policy work with CentOS as the base OS and Ubuntu as the container OS?

In an earlier blog post I assessed the aesthetics of Linux security modules; I find them useful but not very usable.

Conclusion

Before choosing a particular security construct, it is good to understand what skills and requirements are needed to bring it into play. In future posts, we will evaluate each of these against the criteria above. We will also look at a few things an application developer can do to make the best use of Docker/Linux container technology.

Part2

Six Sigma in software development

Six sigma, by definition, is a process or methodology which, when practiced in production, ensures that the output quality lies within 6 standard deviations of the mean or average quality. It is an interesting as well as convoluted definition, so let's understand what exactly it means. Let’s first remind ourselves of the normal distribution. The picture below shows the normal distribution, which is centered at the mean.

The outcome lies within 1 sigma (one standard deviation) of the mean 68% of the time, within 2 standard deviations 95% of the time, and so on. Going further, how often does the outcome fall outside 6 sigma, i.e. six standard deviations, which would be way off the graph shown above? The answer, as six sigma is conventionally quoted, is only 3.4 times in a million. This may be confusing so far, so let us work through a small example. Every morning I drive to the office, which is 25 miles from my house, and on average I need 5 gallons of gas. I tracked this for some time and found a standard deviation of 1 gallon.

mean = 5

standard deviation = 1

The question now is: how can I be sure of not running out of gas on the way to the office? That is, how do I ensure a quality commute by making it predictable and making sure I never run out of gas?

With the above values, a mean of 5 and a standard deviation of 1, let’s calculate the 6 sigma value, which is a simple calculation:

6 sigma = 6 * 1 = 6

The mean or average is 5, so all I need to do is always have 5 + 6 = 11 gallons of gas in my car. With 11 gallons I achieve 6 sigma in the process of reaching the office, i.e. there is only a 3.4 in a million chance that my car runs out of gas. If I travel to the office a million times, I would run out of gas about 3.4 times. In other words, under similar conditions it would practically never happen.
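
The arithmetic above can be checked with a few lines of Python. This is a sketch that assumes gas consumption is normally distributed; the function names are ours. One caveat worth labelling: the 3.4 per million figure, as six sigma is conventionally quoted, matches a one-sided tail at 4.5 standard deviations (6 sigma less a customary 1.5 sigma allowance for drift) rather than the raw tail at 6.

```python
import math


def within_k_sigma(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


def upper_tail(k):
    """Probability that a normal variable exceeds its mean by more than k standard deviations."""
    return 0.5 * math.erfc(k / math.sqrt(2))


mean_gallons = 5.0
sigma_gallons = 1.0
fuel_needed = mean_gallons + 6 * sigma_gallons

print(fuel_needed)                  # 11.0
print(round(within_k_sigma(1), 4))  # 0.6827
print(round(within_k_sigma(2), 4))  # 0.9545
print(upper_tail(4.5) * 1e6)        # roughly 3.4 per million
```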

Now that we understand the basic definition of 6 sigma, let’s see how it can be used in software development. Software development, unlike manufacturing, is an iterative process. Some companies also call their software development research & development. Calling these teams R&D is quite natural, as the work done is not very predictable and quite often changes direction. Over the years, there have been a number of recommendations on quality control, intended to bring predictability to the whole process. Predictability in terms of 6 σ would mean a plan that gives us no more than 3.4 failures per million opportunities; in practice, that means we never miss the target. In terms of software quality, this would mean 3.4 failures in 1 million software executions. In my software development experience, that is not at all an easy target.

Any statistical process requires gathering data, i.e. creating a population of opportunities. The opportunities in software development are nothing but the lines of code written. Every line of code written is an opportunity for a defect, and if a process results in only 3.4 defects per 1 million opportunities, we can term it 6 sigma compliant.
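
Treating each line of code as one opportunity, the defect rate can be expressed as defects per million opportunities (DPMO) and compared with the 3.4 target. The sketch below uses made-up project numbers purely for illustration.

```python
def dpmo(defects, opportunities):
    """Defects per million opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return defects * 1_000_000 / opportunities


SIX_SIGMA_DPMO = 3.4

# Hypothetical project: 12 defects found in 250,000 lines of code.
rate = dpmo(12, 250_000)
print(rate)                    # 48.0
print(rate <= SIX_SIGMA_DPMO)  # False
```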

Modern software is complicated, and trends suggest that it will become even more complicated in the near future. The number of bugs per thousand lines of code (KLOC) varies from system to system; estimates range from 5 to 50 bugs per KLOC. On average, each module is about 100K bytes in size. Assuming that a single LOC results in 10 bytes of code, a module is about 10 KLOC, so at a conservative rate of 5 bugs per KLOC each executable module has about 50 bugs. These are averages, and actual figures could very well vary. Taking the industry average as 10 bugs per KLOC with a deviation of 1, i.e.

mean = 10

sigma = 1

Can we say that a 6 σ compliant software development process would not give out more than 16 bugs per KLOC? Possibly yes, but only once we classify what these bugs are. For example, if the bug or defect is about not meeting the required level of performance, then it is a critical issue and cannot be ignored. Adhering to 6 sigma in such a situation would mean meeting the performance target by making sure resources are available in abundance.

Let me highlight this with another example. A disaster recovery solution for a data center relies on creating a disaster recovery site outside the data center. The site is usually created far enough away that it is not impacted by any disaster within the data center or by any natural disaster that might affect it. For convenience, let’s call the disaster recovery site the satellite site. The major problem that disaster recovery software needs to solve is making sure that data from the local site actually reaches the remote satellite site. As we all know, an IP network is connectionless, sending data in packets over the wire. The layers above IP, such as TCP, can be used to number the packets for proper data organization. This still does not guarantee that the packets have actually reached the satellite site, and it becomes the responsibility of the application layer to perform retries or use other mechanisms to ensure data availability. To have such quality built in, which factors does the solution depend upon? Definitely data bandwidth: the software developer can identify the packet size and, based on the distance, the minimum bandwidth required. A bandwidth of 100 Mbps may be just sufficient to meet the average case, but a bandwidth of 110 Mbps might make sure the reliability falls in the 6 sigma range.

The software developer working on the requirement starts by analyzing the environment where the software will be deployed and listing the key failures that might occur in the software.

How do we achieve this? Difficult but not impossible

Reading all the theory above, the important question now comes how can this be achieved. Over the years, there has been number of suggested development process but what has worked is the process which is agile and iterative.. An iterative process, with provision of measuring quality at every step The measure criteria well said, as well measuring instrument well calibrated are the key to success. Few steps which have worked effective:

Identify key critical factors for a requirement to fulfill e.g. you may want to list down situations that will impact the end use-case. As cited in above example satellite site not receiving packet would result in failure during disaster.
Identify areas which may cause problems or failure modes. A high number of failure modes might indicate very low reliability.
Group failure modes put software code to make sure these failure mode are not suppressed but exposed with suggestions or remedies
If these failure modes ways to avoid them. For e.g. if we make sure high memory availability then we reduce the chance of having a failure to occur.

Apart from having such thought process, it has also become import to adopt a strategy of management which is agile and iterative.

Few suggestions which work well in making this happen:

Identify the key stakeholders and the roles of individuals in the team. Some iterative processes also call for naming individuals Pigs or Chickens. Pigs are those who work or code; Chickens are the stakeholders. Mixing the roles is a recipe for disaster. Ask the chickens to identify failures or quality metrics along with the pigs.
Create a list of undefined or unclear work, and work with the people involved to get the list defined.
Measure the defects or mistakes in the work at every step; if possible, get the average case from previous projects. Measure the deviation and calculate the sigma value. Work iteratively to improve the sigma value.
Most important: involve the team in predicting work timelines, and use the average of the team’s speed or velocity when deciding critical timelines.
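The measuring step in the list above could be sketched like this. It is a minimal illustration under several assumptions that are not from the original text: every line of code is counted as one defect opportunity, the per-iteration defect counts are made-up numbers, and the sigma level uses the conventional 1.5 sigma shift applied in Six Sigma tables.

```python
from statistics import NormalDist, mean, stdev


def dpmo(defects, units, opportunities_per_unit=1):
    """Defects per million opportunities."""
    return defects * 1_000_000 / (units * opportunities_per_unit)


def sigma_level(dpmo_value, shift=1.5):
    """Convert DPMO to a sigma level using the conventional 1.5 sigma shift.

    Note: a DPMO of exactly 0 (or 1,000,000) has no finite sigma level,
    and inv_cdf raises an error for it.
    """
    yield_fraction = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(yield_fraction) + shift


# Hypothetical data: defects found in four iterations of 10,000 LOC each.
defects_per_iteration = [12, 9, 7, 4]
loc_per_iteration = 10_000

levels = [sigma_level(dpmo(d, loc_per_iteration)) for d in defects_per_iteration]

print("average defects per iteration:", mean(defects_per_iteration))
print("deviation (sample stdev):", round(stdev(defects_per_iteration), 2))
for number, level in enumerate(levels, start=1):
    print(f"iteration {number}: sigma level {level:.2f}")
```

With these made-up numbers the sigma level rises from one iteration to the next, which is the kind of trend the iterative process is meant to produce.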

What not to do?

There are certain things which should be taken into account while practicing 6 sigma for software development. Software does not work on its own; it depends on the other components installed on the computer. On average, about 3000 modules are installed on a given workstation. Assuming that a single LOC results in about 10 bytes and that each module is about 100K bytes, each module holds roughly 10 KLOC; at 5 bugs per KLOC, it is very likely that each executable module has about 50 bugs. Thus achieving a target of 6 sigma depends highly on the quality of the other modules it depends on.

100Kbytes = 10KLOC per exe

5 bugs / KLOC = 50 bugs per exe
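The arithmetic above can be checked with a few lines of Python. All figures are the rough assumptions quoted in the text (10 bytes per LOC, 100K bytes per module, 5 bugs per KLOC, 3000 modules); the workstation total is simply those assumptions multiplied out, not a measured value.

```python
BYTES_PER_LOC = 10              # assumed: one line of code compiles to ~10 bytes
MODULE_SIZE_BYTES = 100_000     # assumed: ~100K bytes per executable module
BUGS_PER_KLOC = 5               # assumed defect density
MODULES_PER_WORKSTATION = 3000  # assumed number of installed modules

loc_per_module = MODULE_SIZE_BYTES // BYTES_PER_LOC
kloc_per_module = loc_per_module // 1000
bugs_per_module = kloc_per_module * BUGS_PER_KLOC
bugs_per_workstation = bugs_per_module * MODULES_PER_WORKSTATION

print(kloc_per_module, "KLOC per exe")           # 10 KLOC per exe
print(bugs_per_module, "bugs per exe")           # 50 bugs per exe
print(bugs_per_workstation, "bugs per workstation under these assumptions")
```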

In my experience it has been very tough to achieve such discipline, but believe me, the feeling of achievement is very high once you get there. So keep trying and keep improving.

Aesthetics of Linux Security Modules

Linux security modules are enhancements made to the Linux kernel to provide security mechanisms that restrict entities (programs or files) to their specific roles. These enhancements, pioneered by the NSA with SELinux, bring needed security features but come with a heavy maintenance price.

Mandatory Access Control

Most operating systems use access controls to determine whether an entity (a file or a program) can access a given resource. Linux-based systems use a form of discretionary access control (DAC). For example, files in GNU/Linux have an owner, a group, and a set of permissions. The permissions define who can access a given file: who can read it, who can write to it, and who can execute it. These permissions are split across three sets of users: the user (the owner of the file), the group (all users who are members of the file’s group), and others (all users who are neither members of the group nor the owner of the file). A program executed by a highly privileged user can be exploited to do things at that user’s access level, which is undesirable. Rather than defining privileges in this fashion, it may be better to define the minimal set of functions a program can perform. For example, if the function of a program is to listen on a socket, it should not get access to file-system details. This type of control is called Mandatory Access Control (MAC).
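To make the user/group/others split concrete, here is a small sketch that decodes a Unix permission mode with Python's standard `stat` module. The `describe_permissions` helper and the example mode `0o640` are illustrative choices, not something taken from the original post.

```python
import stat


def describe_permissions(mode):
    """Break a Unix file mode into read/write/execute flags per user class."""
    classes = {
        "user": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "others": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    }
    return {
        who: {
            "read": bool(mode & r),
            "write": bool(mode & w),
            "execute": bool(mode & x),
        }
        for who, (r, w, x) in classes.items()
    }


# Example: a regular file with mode 640 (owner read/write, group read).
example = describe_permissions(0o640)
print(stat.filemode(stat.S_IFREG | 0o640))  # -rw-r-----
print(example["group"])
```

For a real file, the mode can be obtained with `os.stat(path).st_mode` and passed to the same helper.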

Role based Access Control

Another approach to controlling access is role-based access control (RBAC). In RBAC, permissions are provided based on roles that are granted by the security system. The concept of a role differs from that of a traditional group in that a group represents one or more users. A role can represent multiple users, but it also represents the operations that this set of users is permitted to perform.
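The difference between a role and a plain group can be sketched with two small lookup tables. This is a toy model only: the role names, user names and permission strings are hypothetical, and it is not how SELinux or any other kernel module stores its policy.

```python
# Hypothetical policy: each role carries the permissions it grants.
ROLE_PERMISSIONS = {
    "web_admin": {"restart_httpd", "read_httpd_logs"},
    "auditor": {"read_httpd_logs", "read_audit_logs"},
}

# Hypothetical assignment of users to roles.
USER_ROLES = {
    "alice": {"web_admin"},
    "bob": {"auditor"},
}


def is_allowed(user, permission):
    """A user may perform an action only if one of their roles grants it."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "restart_httpd"))  # True
print(is_allowed("bob", "restart_httpd"))    # False
```

Unknown users and unknown roles fall through to an empty set, so anything not explicitly granted is denied.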

Security-Enhanced Linux (SELinux)

SELinux adds both MAC and RBAC to the GNU/Linux operating system and provides all the tools necessary for creating a MAC and RBAC policy. The policy implementation adds extended attributes to the entities (programs or files), thus associating each entity with its role.

AppArmor

AppArmor was developed by the security vendor Immunix. AppArmor has many of the features of SELinux but boasts a simplicity that serves as its main selling point. A security policy, called a profile, is assigned to each application and defines the system resources and privileges that the application can access.

Grsecurity

Grsecurity is a patch for the Linux kernel that allows you to increase prevention, protection and detection. Its main feature is the hardening of chroot: grsecurity’s chroot hardening automatically converts all uses of chroot into real jails with confinement levels equivalent to containers. Processes inside a chroot will not be able to create suid/sgid binaries, see or attack processes outside the chroot jail, mount filesystems, use sensitive capabilities, or modify UNIX domain sockets or shared memory created outside the chroot jail.

Feature Comparison

Each of these security enhancements comes with its pros and cons. They promise a lot of features, but before any of them can be recommended, they need to be compared. The table below compares each of them against a set of features.

FEATURE	SELINUX	APPARMOR	GRSECURITY
Admin skill set (learning curve)	High	Medium	Medium to low
Complex and powerful	Yes	Yes	Somewhat less
Detailed configuration required	Yes	Yes	Appears to offer a learning mode
GUI tools to write / modify rule sets	Yes	Yes	No
Ease of use	Horrible	Horrible	Somewhat less horrible
Binary package	Most Linux distributions	Ubuntu, CentOS, not all	Ubuntu, CentOS, not all
System performance impact	None	None	None
Typical user base	Enterprise	Enterprise	Web-server and hosting companies
Documentation	Plenty	Plenty	Not much
Auditing and logging supported	Yes	Yes	Yes

Conclusion

It can easily be argued that the features supported by these extensions are not only useful but important. At the same time, they score really low when measured for their aesthetics. The skills needed to use any such tool are comparable to those of a system administrator. Once configured, the system tends to remain stable but becomes very inflexible to change. SELinux tags entities (files and programs), making it next to impossible to introduce any change.

The datacenter is now heading for a change: the role of the administrator is vanishing, with the infrastructure providing everything an end user would demand. An end user wants agility, flexibility, workload migration, and guaranteed resource availability. All these requirements bring bigger challenges, and a system that cannot meet such needs is not needed.