Kubernetes — What Is It, What Problems Does It Solve and How Does It Compare With Alternatives?
In this article, we will try to answer:
- What is Kubernetes?
- What problems does it aim to solve?
- When should you choose to use Kubernetes? What alternatives are available?
In Part 2, we will discuss:
- What is the architecture of Kubernetes?
- How one can use Kubernetes (with a simple example).
Before we dive in and answer these questions, let’s briefly discuss some background. This will help us to better understand the motivation behind designing a system like Kubernetes.
Background of Software Applications in the Early Days
In the early days, software applications were designed and deployed as monolithic applications. In monolithic applications, a single instance performed all the business functions.
The majority of these applications were enterprise software, meant to be used by a single organization. They were deployed and maintained internally, on-premises. These applications were primarily used for simple tasks like accounting and billing within the organization.
Designing such applications with a set of predetermined, constant parameters (such as the organization's size and requirements) could guarantee high performance and robustness. Downtime for maintenance was accepted, as 24/7 availability of these applications wasn't a requirement in those days.
The disruptive new millennium
The technological breakthroughs of the late nineties and early 2000s pushed the limits of what was possible for a software application. Software applications started performing more and more complex tasks, like business intelligence and customer relationship management.
To handle this complexity, the code-bases of these applications grew in size. Apart from this, the exponential growth of the internet around this time added a completely new dimension, and gave birth to a new breed: web applications.
Accessible from anywhere on the web, these web applications were expected to be available around the clock, without downtime. Their workload was not predetermined, and they were expected to be resilient enough to handle unexpected fluctuations in it.
The rise of cloud
As the size of applications and the demand for their around-the-clock availability grew, maintaining these applications on-premises became a challenge. The cost and engineering effort required to maintain the physical hardware made little economic sense. Instead, a new alternative quickly became the popular choice: on-demand cloud computing platforms.
Cloud computing platforms offered the physical hardware resources required to run applications as a service. These providers promise high availability and reliability of their infrastructure, and charge customers based on how many resources they use, and for how long.
The relief of not having to maintain physical hardware in a world of Moore's law, and the option of pay-per-use, fueled the widespread adoption of the cloud. Amazon Web Services (AWS), Microsoft Azure, and Google Cloud are some of the popular cloud service providers available today.
Development and Deployment Issues Remain Unsolved
Even with the cloud taking care of the physical hardware problem, numerous issues in the application life-cycle remained unsolved, particularly around development and deployment.
- Long build times. Extremely large code-bases resulted in long build times, i.e., long downtime during deployment. But high availability was a critical requirement, and downtime was unacceptable.
- Slow development. Another big problem with a large code-base is its effect on development speed. Since the entire code-base is encompassed in a single instance, broken code from a single developer could block the work of a hundred others from being deployed. In the early 2000s, the Agile software development method emerged, advocating evolutionary development, early delivery, and continual improvement. But the monolithic style discouraged developers from making small incremental changes, as the cost of a new deployment was very high.
- Poor resilience to failures. A monolithic web application handles failures very poorly. Since all business functionality is handled by a single instance, a fatal error in a single business function can bring down the entire application. Instead of all business functionality becoming unavailable, we would prefer that the well-functioning parts of the application remain available without interruption while the faulty functionality is fixed.
- Low resource utilization. When the workload of a particular business function suddenly increases, the entire monolithic application has to be scaled to balance it. This results in sub-optimal consumption of computing resources, and in a world of cloud, redundant usage of resources costs money. Instead, one would prefer that only the part of the application responsible for the affected business function scales, resulting in optimal utilization of resources.
An industry-wide engineering effort to avoid the problems with the monolithic architectural style discussed above resulted in the development of a new architectural style: Microservices.
In this style, the application is designed as a collection of loosely coupled services (smaller applications), instead of a single large monolithic application. Each of these services is responsible for a self-contained business function. These services communicate with each other whenever necessary and work together to meet all the business requirements.
Decomposing an application into smaller services makes it easier to understand, develop, and test. All of the above-mentioned problems with the monolithic style are addressed:
- Only a small service has to be rebuilt whenever the code is updated, thus reducing the deployment time. The reduction in deployment time encourages continuous delivery and deployment.
- The decomposition also results in the application becoming more resilient to failures, as downtime is observed only for the error-prone services, while other services remain fully functional.
- Workload fluctuations can be handled much more efficiently. Optimal utilization of cloud resources is possible if the number of replicas of a service is dynamically adjusted according to the workload.
Figure: high-level comparison of the monolithic and Microservices architectural styles
Although the Microservices style solved most of the problems with monolithic on-premises applications, it introduced new challenges.
- Redundant usage of cloud resources. As mentioned earlier, in Microservices, dynamically scaling the replicas of a service is essential for optimal utilization of cloud resources. This requires fine-grained control over how cloud resources are utilized. But a Microservices architecture has too many moving parts, and manually controlling the usage of all the cloud resources is not feasible. So the new challenge is to design a solution that controls the utilization of cloud resources at a fine-grained level.
- The nightmare of deployment operations. In Microservices, we're splitting a single application into multiple smaller applications. Care must be taken to ensure that communication between the services is efficient and stable, and that the host OS has all the required dependencies before deployment.
- The system now has multiple smaller applications instead of one. Because of this, the approach to monitoring the overall application must be adapted to handle a large number of smaller applications, making the whole process much more complex.
- This complexity grows quickly with the number of services, and it isn't an exaggeration to say that executing the deployment operations becomes a nightmare. A robust solution to better handle the complexity of deployment operations became indispensable.
Innovation for New Challenges
The solutions to these new challenges required two new key innovations.
The first key innovation, which became an integral part of the solution for deploying distributed applications on the cloud, was Containerization. Containerization is the practice of deploying applications by packaging them in containers.
What is a container?
A container is a unit of OS-level virtualization, in which the kernel allows the existence of multiple isolated user-space instances.
In simpler terms, you could say that a container is a fully functional virtual computer, running inside another computer (host). Each container is isolated from other containers, and from the host.
Figure: traditional deployment of applications vs. containerized deployment
Why Is Containerization of Applications Important for Deployment?
- Decoupling of application and infrastructure. Traditionally, software applications were deployed directly onto the host machine. This had the disadvantage of entangling the application's executables, configuration, libraries, and lifecycle with each other and with the host OS. Packaging each application and its dependencies into a self-sufficient, isolated container decouples the application completely from the host OS and infrastructure. It allows an application to be deployed on any host, without first worrying whether the host has all the dependencies installed.
- Isolation of resources. Aside from decoupling an application from the host, an even more important advantage of containerization is the isolation of resources. If multiple containers are running on a host, a fatal error inside one container can't affect other containers, or the host. By contrast, when all applications are installed directly on the host, a fatal error in one can bring down or corrupt the others, and in some cases crash the entire host. There is no scope for such a scenario when applications are deployed in containers. Containerization adds great value, and simplifies a great deal of the complexity involved in developing, deploying, and maintaining distributed applications.
It wasn't long before engineering teams across the industry realized the advantages of containerizing applications, and decided that every application deployed within their organization must be containerized.
Container Orchestration System
As soon as the industry embraced containerization, the need for our second key innovation was born: the container orchestration system. At a high level, a container orchestration system allows the user to effectively manage the deployment of containerized applications.
What does a container orchestration system do? Where does it fit in the system? How much responsibility does it have? How complex should it be? The answers to these questions are not definite, and depend completely on the choices made by the engineering team that designed that particular container orchestration system.
One could argue that all a container orchestration system brings to the table is the ability to start and stop containerized applications, and that it doesn't deserve to be a separate component in the system. But the experience of many engineering teams across the industry has shown that this is not the case. To understand why, let's take a step back and recapitulate the critical requirements and unresolved issues we have outlined so far.
- Applications which handle large-scale, complex operations must be highly available, without any downtime.
- Applications should be resilient to large fluctuations in workload.
- Failure to optimally utilize cloud resources results in large costs.
- Reliably performing deployment operations, to support evolutionary development, is crucial.
Almost all of the large-scale organizations in the industry started facing some version of the problems above by the late 2000s or early 2010s, and their engineering teams made efforts to solve them.
A few of these teams built an individual software package for each problem they faced. But many teams eventually realized that designing a container orchestration system which addressed all of these problems together was the best approach. A robust, well-engineered container orchestration system could make the system resilient and efficient, and completely abstract the complexity of deployment operations from the user.
Now that we understand why an engineering effort to build a container orchestration system is valuable, let's focus on one such effort: the work done by the engineering team at Google. Over the years, Google's engineers worked on multiple internal projects aimed at solving deployment problems.
Early on, they realized the benefits of containerization when performing deployment operations at large scale. They then went on to design a system for automating containerized application deployment, scaling, and management. This attempt at designing a system which could reliably function at the scale of Google gave the engineering team rare and useful insights.
After realizing how ubiquitous the problems they solved were, Google’s engineering team considered democratizing this technology, and started working on a new project — reshaping the concepts behind their internal projects, to work in tandem with other open source technologies.
In 2014, Google open-sourced this project with the name Kubernetes.
What is Kubernetes?
To quote the official documentation:
Kubernetes (K8s) is an open source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery.
Kubernetes is a container orchestration system developed by Google, designed to effectively solve many of the issues we outlined earlier:
- Kubernetes can run containerized applications of any scale without any downtime.
- Kubernetes can self-heal containerized applications, making them resilient to unexpected failures.
- Kubernetes can auto-scale containerized applications as per the workload, and ensure optimal utilization of cloud resources.
- Kubernetes greatly simplifies deployment operations. With Kubernetes, however complex an operation is, it can be performed reliably by executing at most a couple of commands. A minimal example of this declarative approach follows below.
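To make this concrete, here is a minimal sketch of a Kubernetes Deployment manifest (all names and the container image are hypothetical). You declare the desired state (three replicas of a containerized app), and Kubernetes continuously reconciles reality with that declaration: if a replica crashes, a replacement is started; if the image tag changes, a rolling update is performed.

```yaml
# A minimal Deployment sketch (hypothetical names and image).
# Kubernetes keeps three replicas of this container running;
# if one crashes, a replacement is started automatically.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: example/web-app:1.0  # hypothetical image
          ports:
            - containerPort: 8080
```

Applying this manifest with kubectl apply is all it takes to deploy; re-applying it with a new image tag rolls out a new version, and kubectl rollout undo rolls a bad update back.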
Although Kubernetes was initially designed by Google, it is now maintained by the Cloud Native Computing Foundation (CNCF), an open source software foundation dedicated to making cloud-native computing universal and sustainable. Since Kubernetes' initial release, the open source community has contributed heavily to its growth, adding new features and making Kubernetes more robust and efficient.
Knowledge of Kubernetes is indispensable for anyone tasked with designing applications to modern requirements. But before we dive deep into Kubernetes, let's look at what one can expect from it, and when to choose it. We'll also look at a few alternatives to Kubernetes, and how they differ.
What Problems Does Kubernetes Aim to Solve?
Kubernetes provides a container-centric management environment. It orchestrates computing, networking, and storage infrastructure on behalf of the user. It eliminates the need for direct manual orchestration in a cluster, and automates the orchestration process, so that the applications are highly available and compute resources are optimally utilized.
- Service discovery and load balancing. You have an application comprising hundreds of Microservices, and they need to communicate with each other efficiently and reliably. Kubernetes can take care of that (see the Service sketch after this list).
- Horizontal scaling. The workload on your application is known to surge abruptly. Scaling up the replicas of a service balances the workload; scaling it down after the surge keeps your costs low. Kubernetes can do that for you automatically (see the autoscaler sketch below).
- Self-healing. Whenever one of the hundreds of services goes down due to a fatal error, you'll want a new, healthy replica of that service instantiated automatically. Kubernetes does this for you (self-healing).
- Automated rollouts and rollbacks. Kubernetes can take care of deployment operations like the rollout of a new version of the application, and the rollback to a previous version. All of these operations can be performed reliably just by executing a couple of commands from a command line.
- Secret and configuration management. Kubernetes provides built-in mechanisms to store and manage configuration (like environment variables and database connection details) across different environments (e.g. production, test, development). It also stores sensitive configuration data, meant to be kept secret, in a special manner, so that accidental exposure of such data is minimized (a sketch follows this list).
- Storage orchestration. Kubernetes allows for effective management of the storage required by an application. In Kubernetes, storage management is separated into multiple steps: storage is first provisioned and configured, then a claim is made whenever an application in the cluster requires it (see the claim sketch below). Kubernetes integrates well with the various storage solutions supported by cloud providers.
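As a sketch of service discovery and load balancing, the Service below gives the web-app replicas from the earlier Deployment a single stable endpoint (names are hypothetical). Other services in the cluster reach it through the DNS name web-app, and Kubernetes load-balances the traffic across the healthy replicas.

```yaml
# A Service sketch: one stable, load-balanced endpoint
# in front of every pod labeled app: web-app.
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
  ports:
    - port: 80          # port other services connect to
      targetPort: 8080  # port the container listens on
```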
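For horizontal scaling, a HorizontalPodAutoscaler adjusts the replica count automatically. A minimal sketch, assuming the Deployment above and a target of 70% average CPU utilization:

```yaml
# An autoscaler sketch: grow from 3 up to 10 replicas
# whenever average CPU utilization exceeds the 70% target.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
```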
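Configuration and secrets live in their own objects, injected into containers at run time rather than baked into the application image. A sketch with hypothetical values:

```yaml
# Plain configuration lives in a ConfigMap...
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_HOST: db.internal.example.com  # hypothetical value
---
# ...while sensitive data lives in a Secret, which Kubernetes
# stores separately and exposes to pods only where needed.
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
stringData:
  DATABASE_PASSWORD: change-me  # hypothetical placeholder
```

A pod can consume both as environment variables or mounted files, so the same container image runs unmodified in production, test, and development.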
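Storage follows the configure-then-claim pattern described above: storage is provisioned (by an administrator or a cloud integration), and an application claims what it needs with a PersistentVolumeClaim. A minimal sketch:

```yaml
# A claim sketch: request 10Gi of storage that can be
# mounted read-write by a single node (ReadWriteOnce).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```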
Tasks Unrelated to Core Functionality
Kubernetes offers a large set of features; however, many tasks are unrelated to its core functionality. It is designed to integrate well with other tools, which are expected to perform the tasks outside the realm of Kubernetes.
- Kubernetes doesn’t deploy source code or build the application. Continuous integration, delivery, and deployment (CI/CD) workflows are not a core feature of Kubernetes. Automation tools like Jenkins, which has excellent integration with Kubernetes, can be used for such tasks.
- Kubernetes does not provide application-level services, such as message buses, data processing frameworks, databases, caches and storage systems as built-in services. These components can run on Kubernetes, and/or can be accessed by applications running on Kubernetes through portable mechanisms, such as the Open Service Broker.
- Kubernetes does not dictate logging, monitoring, or alerting solutions. Although Kubernetes ships with some proof-of-concept integrations, and mechanisms to collect and export metrics, using external tools best suited to your particular use case is recommended. Fluentd is a good choice for logging, and Prometheus is a popular choice for monitoring and alerting. Helm and Envoy are also popular projects which work well with Kubernetes and simplify the workflow. The Projects page under CNCF is a good place to find more.
Kubernetes has been successful in adding great value to its end users by providing a robust platform to orchestrate deployment operations, and great interoperability with external tools.
Kubernetes is now the world’s most popular container orchestration platform, and is used by some of the world’s most innovative companies across a wide range of industries. But there are some caveats you should be aware of when working with Kubernetes.
Caveats With Kubernetes
Once configured, Kubernetes greatly reduces the manual workload of deployment operations. But Kubernetes’ priorities lie in providing high flexibility and configurability, rather than ease of use.
Configuring Kubernetes for a large production environment is a complex task. Correctly configuring and maintaining a production system requires expert-level knowledge, and the learning curve is known to be steep.
Many teams that tried to switch abruptly to Kubernetes without proper training faced difficulties, and often switched back to their old solution. Any new team, before choosing Kubernetes, should first allocate time to learn the fundamentals required to work with it.
As the size of a project deployed on Kubernetes grows, the amount of Kubernetes configuration required grows with it. In large projects, maintaining and updating this configuration becomes a challenging task in itself.
This complexity of Kubernetes configuration management is a well-known problem, and steps are being taken towards a good solution. Kustomize, a Kubernetes-native configuration management solution, is actively under development (see the sketch below). At present, many teams use external tools like Helm to manage Kubernetes configuration.
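As a taste of what this looks like, here is a minimal kustomization.yaml sketch (the referenced manifest files are hypothetical). Kustomize layers environment-specific changes over a shared set of base manifests, so the base never has to be copied and hand-edited per environment:

```yaml
# kustomization.yaml: a minimal sketch over hypothetical base manifests.
resources:
  - deployment.yaml
  - service.yaml
namePrefix: staging-    # prefix every resource name for this environment
commonLabels:
  environment: staging  # stamp a shared label onto every resource
```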
As mentioned above, Kubernetes is expected to be used alongside other tools. It is designed to support patterns where specialized tools can easily integrate with it. Using Kubernetes without additional tools where they are required, or using inappropriate ones, can result in a bad experience.
Teams should actively explore the available tools and choose the ones best suited to their needs.
Another important caveat when choosing Kubernetes is that it is a highly active project, and new releases are made frequently. Although rare, there are times when new changes are not backward compatible, and can cause production outages. One should therefore be diligent and design a testing mechanism for detecting application-breaking changes whenever updating the Kubernetes version.
When to Use Kubernetes and Alternatives
Kubernetes offers solutions to container orchestration and the other challenges one faces whilst working with applications designed in the Microservices architectural style.
We have already discussed that the problem of efficiently and effectively deploying large-scale Microservices applications was an industry-wide concern, and many engineering teams worked towards solving it. Many designed their own version of a container orchestration system.
We will discuss a few of the other popular projects that aimed to solve the same problems as Kubernetes. Having some basic familiarity with these projects will help us better understand where Kubernetes stands, and when you should choose to use it.
Netflix OSS and Spring Cloud
Although this project did not directly tackle the problem of container orchestration, there is some overlap between the problems solved here and the solutions offered by Kubernetes, so let’s briefly discuss it.
A few years prior to the release of Kubernetes by Google, Netflix released a set of libraries aimed at handling various issues one faces, while working with Microservices architecture.
These libraries were built from the experience and insights the Netflix engineering team gained while scaling their infrastructure on AWS. The team behind the Java-based Spring Framework then built on top of these libraries and released Spring Cloud Netflix, which greatly simplified the process of integrating the Netflix libraries with Spring applications.
These releases quickly gained traction, and were adopted by a large number of teams struggling to get their Microservices architecture right. They provided capabilities like service discovery (Eureka), routing (Zuul), fault tolerance (Hystrix), etc. But using these libraries had some drawbacks.
Most of them were written primarily in Java, and applications were required either to be aware of these libraries/services, or to interact with them at run time. This made using them awkward in applications implemented in other languages and frameworks. An even more important concern was that the application code had to incorporate the logic necessary to communicate with these libraries/services.
Kubernetes, on the other hand, does not restrict the choice of language or framework in which an application is implemented, and aspires to be general purpose. Applications deployed on Kubernetes are containerized, and are completely unaware of the deployment infrastructure. Service discovery, routing, health monitoring, etc., are handled by Kubernetes, allowing applications to focus on core business logic.
Spring Cloud Netflix sees little development activity today, and is mostly in maintenance mode, in contrast with Kubernetes, which is very active.
Docker Swarm
Docker Swarm is the cluster management and container orchestration solution that comes integrated with Docker Engine.
Released in 2015, Docker Swarm is completely native to the Docker environment, and doesn’t require any additional software apart from Docker. Docker Swarm today supports a large subset of the functionality that Kubernetes offers: service discovery, load balancing, rolling updates, auto-scaling, and state reconciliation are available in Docker Swarm as well.
Over the years, features that proved to be of great value in Kubernetes were later added to Docker Swarm in one way or another. Borrowing good features has allowed Docker Swarm to stay relevant, and it is still used by a good number of teams in production.
Unlike Kubernetes, which is notoriously daunting for beginners, Docker Swarm is very simple to use. But, Docker Swarm sacrifices configurability and flexibility in favor of simplicity and ease of use — a choice which differentiates it from Kubernetes.
Because of this choice, a large ecosystem of specialized external solutions integrates seamlessly with Kubernetes, but not with Swarm. Almost all cloud providers support great integration with Kubernetes, but only a few directly support integration with Swarm, which is a very important concern if you plan to deploy your applications on the cloud.
Kubernetes enjoys large development activity, and is often the first place to find great new features. These features might take a while to reach Swarm, or in some cases, Swarm may never adopt those features.
Apart from this, Docker Swarm can run only Docker containers, whereas Kubernetes supports numerous container runtimes, including Docker. And unlike Kubernetes, which is entirely a community project, Swarm is maintained by Docker Inc.
If you have a team that’s already using Docker, and is very comfortable with Docker CLI, and you are not interested in integrations with external tools or cloud providers, you could consider using Docker Swarm. Otherwise, choose Kubernetes.
Marathon on Apache Mesos
Marathon is a container orchestration framework that runs on top of Apache Mesos, and comes bundled with DC/OS (the Distributed Cloud Operating System), an open source distributed operating system based on the Apache Mesos distributed systems kernel.
Marathon can’t run without Mesos, so only projects using Mesos (or DC/OS) can use Marathon. Kubernetes, on the other hand, can run on DC/OS, along with numerous other platforms and operating systems.
Marathon offers features like service discovery, load balancing, health checks, a CLI, a GUI, etc. But Kubernetes offers more mature versions of these features, along with many others that aren’t part of Marathon.
Marathon is native to DC/OS, takes advantage of this, and performs some operations more efficiently and effectively. Kubernetes, however, has better integration with external services and cloud providers. Consider using Marathon only if it is essential for your project, but do explore the possibility of using Kubernetes, as it gives you more flexibility.
The most important factor, where Kubernetes outperforms all of its alternatives, is development activity. Development activity is a reliable indicator of a project’s longevity, and of the support behind it.
The fact that Kubernetes has such a strong developer community behind it, is often the reason why most teams choose to use it in their production environment.
To summarize: anyone interested in designing large-scale applications that remain highly available and fault tolerant while optimizing the consumption of computing resources should consider Kubernetes for container orchestration, combined with an engineering team that has expert knowledge of configuring, maintaining, and updating it.
Kubernetes enjoys the largest development activity, and supports a large ecosystem of tools and services, which makes it a good choice for most teams.
In Part 2 of this article, we will dive deep into the details of the inner workings of Kubernetes. We will continue our discussion of the concepts, design, and architecture of Kubernetes.
Then, we will go through a detailed example showing how to deploy a stateful web application with Kubernetes.