Kubernetes is a container management/orchestration solution responsible for configuring, deploying/redeploying, tracking, monitoring, scheduling, scaling, handling the availability of, and load balancing containerized applications across many server machines. The increased adoption of microservices has driven increased usage of containers, since containers are a good fit for hosting small microservices. A complex application may have hundreds of services and consequently hundreds of containers, hence container orchestration tools are imperative.
Kubernetes and the complementary projects mentioned below come together to create a true cloud native platform.
- Service meshes, e.g. Istio,
- A service mesh is a platform layer on top of the infrastructure layer that enables managed, observable, and secure communication between individual services. This platform layer enables companies or individuals to create robust enterprise applications, made up of many microservices on a chosen infrastructure. Service meshes use consistent tools to factor out all the common concerns of running a service, like monitoring, networking, and security. That means service developers and operators can focus on creating and managing applications for their users instead of worrying about implementing measures to address challenges for every service.
- Monitoring tools like Prometheus,
- Distributed tracing and observability from the likes of Jaeger and Kiali,
- Enterprise registries like Quay,
- Inspection utilities like Skopeo,
A Kubernetes cluster is a group of nodes that run your containerized applications. kubectl is a command line interface (CLI) for managing operations on your Kubernetes clusters. It does so by communicating with the Kubernetes API, which is the entry point to the cluster. kubectl can be used with any Kubernetes cluster (cloud or Minikube).
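As a minimal sketch of that workflow, the hypothetical manifest below (the name hello-web and the nginx image are illustrative assumptions, not taken from this text) could be saved as pod.yaml, applied with kubectl apply -f pod.yaml, and inspected with kubectl get pods; both commands go through the API server.

```yaml
# pod.yaml - hypothetical minimal Pod manifest, applied via kubectl,
# which sends it to the Kubernetes API server.
apiVersion: v1
kind: Pod
metadata:
  name: hello-web            # illustrative name
  labels:
    app: hello-web
spec:
  containers:
    - name: web
      image: nginx:1.25      # any container image would do
      ports:
        - containerPort: 80
```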
Key building blocks of a Kubernetes cluster
- Kubernetes control plane / Kubernetes master (also called cluster services) is a key building block of Kubernetes. Desired state management is achieved by feeding a specific configuration to the control plane, which then runs that configuration. The desired configuration is specified in a deployment YAML file. Typically a deployment file contains a pod configuration specifying the container image for the pod, how many replicas of the pod should be running, which TCP ports to expose, and so on. The control plane ensures that the specified configuration is running on the container hosts; in other words, control plane components make global decisions about the cluster. A Kubernetes cluster typically has multiple master nodes, and each master node runs the following processes.
- kube-apiserver - The API server is the component of the Kubernetes control plane that exposes the Kubernetes API; it is the front end of the control plane and runs on each master node. The Kubernetes API is the gateway to the cluster: for example, it receives any request to update the cluster (schedule new pods, deploy a new service). It also handles authentication and authorization, so only authenticated and authorized requests are forwarded by the API server to the other processes. If you want to query the health of the cluster or create a new service, the request must be made to the API server, making the API server the single entry point into the cluster. Note that the API server is load balanced, as it is available on all master nodes.
- etcd - A consistent and highly available key-value store used as Kubernetes' backing store for all cluster data. etcd can be considered the brain of the cluster, as all major cluster changes are recorded in it; this is, for example, how the scheduler knows what resources are available on which worker node. If you make a query request to the API server about system health, the API server fetches the data from etcd. Note that only cluster data is stored in etcd, not application data. etcd is distributed storage across the master nodes.
- kube-scheduler - decides which worker node a new pod should run on; the kubelet on that node then starts the pod. The worker node is selected intelligently based on
- the resource requirements of the pod (CPU/RAM) and the resources available on each worker,
- hardware/software/policy constraints.
- kube-controller-manager - Runs the controllers that watch the cluster state and drive it towards the desired state. Logically, each controller is a separate process, but to reduce complexity they are all compiled into a single binary and run in a single process. Some types of controllers are:
- Node controller: Responsible for noticing and responding when nodes go down or pods die. The node controller triggers a request to the scheduler, which in turn picks a worker node, and the pod is started by interacting with that node's kubelet.
- Job controller: Watches for Job objects that represent one-off tasks, then creates Pods to run those tasks to completion.
- cloud-controller-manager - A Kubernetes control plane component that embeds cloud-specific control logic. The cloud controller manager lets you link your cluster into your cloud provider's API, and separates out the components that interact with that cloud platform from components that only interact with your cluster. The cloud-controller-manager only runs controllers that are specific to your cloud provider. If you are running Kubernetes on your own premises, or in a learning environment inside your own PC, the cluster does not have a cloud controller manager.
- Kubernetes node (aka worker node/minion node) - Nodes are the physical or virtual machines in your cluster. These “worker” machines have everything necessary to run your application containers, including the container runtime and other critical services; i.e. the node is the workhorse of the Kubernetes cluster. Worker nodes typically have more CPU and RAM than master nodes. Each worker node in the cluster gets a range of IP addresses and runs multiple pods; the command kubectl get pod -o wide shows the IP address of each pod. Several critical processes need to run on each node:
- kubelet - Each worker runs the kubelet process, which communicates with the Kubernetes cluster services and ensures that all containers on the node are running and healthy. The kubelet interacts with both the container runtime and the worker node itself, as it is responsible for starting a pod with its container(s) inside and allocating node resources such as CPU and RAM to the pod.
- kube-proxy - A network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept. It is one of the main implementers of service discovery and load balancing in the cluster. kube-proxy has smart routing logic; for instance, it may route a request to a pod on the same worker node for performance reasons.
- Container runtime - The container runtime is the software that is responsible for running containers.
- cAdvisor: An agent responsible for monitoring and gathering data about resource usage and performance metrics on each node.
- Pod - The pod is the smallest unit in a Kubernetes cluster. A pod is a layer of abstraction over containers; the advantage of this approach is that the underlying container technology can be swapped. A pod can contain one or more containers, but typically there is just one main container per pod plus optional helper containers (sidecar pattern). Containers in the same pod must listen on different ports. Since Kubernetes provides a virtual network, each pod gets its own internal IP address. Note that pods are ephemeral in nature.
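As a rough sketch of the sidecar pattern and of the resource requests the scheduler looks at, the hypothetical manifest below runs a main container plus a logging sidecar in a single pod (all names, images, and values are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar       # illustrative name
spec:
  containers:
    - name: web                # main application container
      image: nginx:1.25
      ports:
        - containerPort: 80
      resources:
        requests:              # the scheduler uses these when picking a worker node
          cpu: "250m"
          memory: "128Mi"
    - name: log-tailer         # sidecar container sharing the pod's network and volumes
      image: busybox:1.36
      command: ["sh", "-c", "tail -f /dev/null"]   # placeholder for a real log shipper
```

Both containers share the pod's internal IP address, which is why they must listen on different ports.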
What is a Kubernetes Service?
A Kubernetes Service can be used to expose an application running on a set of Pods as a network service. In other words, a Service is an abstraction that defines a logical set of Pods and a policy by which to access them (sometimes this pattern is called a micro-service). There is no need to modify your application to use an unfamiliar service discovery mechanism: Kubernetes uses a server-side discovery mechanism, giving a set of Pods a single DNS name and a static IP address and load-balancing across them. Note that while each pod has its own IP address, pods are ephemeral and can be created and destroyed frequently, and new pods get new IP addresses. Hence pod IP addresses cannot be relied upon; the static internal IP of the Service should be used instead. The set of pods targeted by a Service is usually determined by a selector.
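A minimal sketch of such a Service, assuming the pods carry an app: hello-web label (the names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hello-web-svc          # illustrative name
spec:
  selector:
    app: hello-web             # targets every pod carrying this label
  ports:
    - port: 80                 # stable port on the Service's cluster IP
      targetPort: 80           # port the pod containers listen on
```

Pods matching the selector are added to and removed from the Service automatically as they come and go; clients only ever use the stable Service IP or DNS name.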
Ingress
Ingress is the entry point to the Kubernetes cluster: external requests go to the Ingress, which forwards them to a Service, i.e. the Ingress component is used for routing traffic into the cluster. An Ingress contains routing rules in which host-to-service mappings are defined; the host has to be a valid domain address. So typically a request goes from the load balancer to the ingress controller, and from the ingress controller to a Service. Different paths can lead to different Services.
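A sketch of an Ingress with one host-to-service routing rule; the domain and Service name are illustrative assumptions, and an ingress controller must be installed in the cluster for the rule to take effect:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-web-ingress        # illustrative name
spec:
  rules:
    - host: myapp.example.com    # host-to-service mapping; must be a valid domain
      http:
        paths:
          - path: /              # different paths could route to different Services
            pathType: Prefix
            backend:
              service:
                name: hello-web-svc
                port:
                  number: 80
```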
ConfigMap
External configuration for the application, e.g. a database URL. Because this configuration lives outside the image, configuration changes do not require building new images or redeploying. Note that user IDs and passwords should not be put in a ConfigMap. A ConfigMap is managed by Kubernetes and can be mounted into a pod/container as a local volume, which is also useful when your pod needs a configuration file.
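A sketch of a ConfigMap holding a hypothetical database URL, consumed by a pod both as an environment variable and as a mounted file (all names and values are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config                      # illustrative name
data:
  DB_URL: "mongodb://mongo-svc:27017"   # hypothetical, non-secret setting
---
apiVersion: v1
kind: Pod
metadata:
  name: app-with-config
spec:
  containers:
    - name: app
      image: nginx:1.25
      env:
        - name: DB_URL                  # injected as an environment variable
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: DB_URL
      volumeMounts:
        - name: config-volume           # or mounted as a file under /etc/app
          mountPath: /etc/app
  volumes:
    - name: config-volume
      configMap:
        name: app-config
```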
Secrets
A Secret is just like a ConfigMap, but it is used to store sensitive data, e.g. credentials. The data is stored in base64-encoded format (note that base64 encoding is not encryption).
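A minimal sketch; the values are illustrative placeholders (dXNlcg== and cGFzc3dvcmQ= are simply base64 for "user" and "password"):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials           # illustrative name
type: Opaque
data:
  username: dXNlcg==             # base64 of "user"
  password: cGFzc3dvcmQ=         # base64 of "password"
```

Secrets are consumed the same way as ConfigMaps, via environment variables (secretKeyRef) or volume mounts.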
Volumes
Physical storage can be attached to a pod, and this storage does not depend on the pod lifecycle. Note that a pod could be restarted on any worker node, so the storage should be accessible to all nodes. The storage can be on a local machine or remote (not part of the Kubernetes cluster). Hence if, for example, a database pod is restarted, all its data is still persisted. Persistent volumes should be thought of as a cluster resource, like RAM or CPU, that is used to store data. The type of storage the app needs does not depend on Kubernetes; in other words, storage is plugged into the cluster. Persistent volumes are not namespaced, i.e. they are available to the whole cluster. Typically remote storage is used for databases. A pod can have multiple volumes.
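As a sketch, a pod typically claims storage through a PersistentVolumeClaim and mounts it into a container; the size, image, and names below are illustrative, and the password is read from the hypothetical db-credentials Secret sketched above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data                       # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                    # how much storage the pod needs
---
apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
    - name: db
      image: mysql:8.0
      env:
        - name: MYSQL_ROOT_PASSWORD   # taken from the hypothetical Secret above
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql   # data here survives pod restarts and rescheduling
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: db-data
```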
Storage class
A StorageClass provisions persistent volumes dynamically, i.e. a PersistentVolume is created on demand whenever a PersistentVolumeClaim requests that class.
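A sketch of a StorageClass and a claim that uses it; the GCE persistent disk provisioner is shown only as one example (the right provisioner depends on your platform), and the names and sizes are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                      # illustrative name
provisioner: kubernetes.io/gce-pd     # example provisioner; varies per platform
parameters:
  type: pd-ssd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data-fast
spec:
  storageClassName: fast-ssd          # a matching PersistentVolume is provisioned on demand
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```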
Deployment
Deployments represent a set of multiple, identical Pods with no unique identities. A Deployment runs multiple replicas of your application and automatically replaces any instances that fail or become unresponsive. In this way, Deployments help ensure that one or more instances of your application are available to serve user requests. Deployments are managed by the Kubernetes Deployment controller.
Deployments use a Pod template, which contains a specification for its Pods. The Pod specification determines how each Pod should look: what applications should run inside its containers, which volumes the Pods should mount, its labels, and more.
Deployments are well-suited for stateless applications.
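A minimal Deployment sketch; the name, image, and replica count are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web                # illustrative name
spec:
  replicas: 3                    # desired number of identical pods
  selector:
    matchLabels:
      app: hello-web
  template:                      # the Pod template described above
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
```

If a pod crashes or its node fails, the Deployment controller creates a replacement to keep three replicas running.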
Stateful sets
StatefulSets represent a set of Pods with unique, persistent identities and stable hostnames that GKE maintains regardless of where they are scheduled. The state information and other resilient data for any given StatefulSet Pod is maintained in persistent disk storage associated with the StatefulSet.
StatefulSets use a Pod template, which contains a specification for its Pods. The Pod specification determines how each Pod should look: what applications should run inside its containers, which volumes it should mount, its labels and selectors, and more.
StatefulSets are designed to deploy stateful applications and clustered applications that save data to persistent storage, such as Compute Engine persistent disks. StatefulSets are suitable for deploying Kafka, MySQL, Redis, ZooKeeper, and other applications needing unique, persistent identities and stable hostnames.
In a StatefulSet the pods are not identical. Giving each pod its own individual identity is what a StatefulSet does differently compared to a Deployment. A sticky identity is maintained for each pod: each pod has a persistent identifier that is maintained across any rescheduling, i.e. when a pod dies and is replaced, the replacement keeps that identity. Pods in a StatefulSet get fixed, ordered names, and each pod gets its own DNS endpoint from the Service. When a pod restarts, its IP address may change, but its name and endpoint stay the same; in other words, pods get a sticky identity across restarts, so state and role are maintained.
Note that it is common practice to host database applications outside the Kubernetes cluster, since StatefulSets are more tedious to operate than Deployments and stateful applications are not a perfect fit for containerized environments.
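A minimal StatefulSet sketch, assuming a headless Service named redis-headless exists to provide the stable per-pod DNS names (the application, names, and sizes are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis                    # pods get fixed, ordered names: redis-0, redis-1, redis-2
spec:
  serviceName: redis-headless    # headless Service giving each pod a stable DNS endpoint
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.2
          volumeMounts:
            - name: data
              mountPath: /data   # each replica keeps its own persistent data
  volumeClaimTemplates:          # one PersistentVolumeClaim per pod, reattached after rescheduling
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```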
Namespaces in Kubernetes
Resources can be organized into namespaces. A namespace can be thought of as a virtual cluster within the Kubernetes cluster. Namespaces are useful in large applications with many services, where it would otherwise be difficult to get an overall picture of the cluster: you can create a database namespace, a monitoring namespace, and so on. Namespaces are also useful when multiple teams work on the same application, and for blue/green deployments, i.e. running two different versions of the application in the same cluster. Access control can be implemented by giving each team access only to its own namespace, and you can limit the resources (CPU/RAM) a namespace can consume.
Note that each namespace must have its own ConfigMaps/Secrets.
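A sketch of a namespace plus a ResourceQuota that caps how much CPU/RAM the namespace can consume (the name and limits are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring               # illustrative namespace
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: monitoring-quota
  namespace: monitoring
spec:
  hard:
    requests.cpu: "4"            # cap on total CPU requested by pods in this namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```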
What is Minikube?
Minikube is an open source tool that enables you to run Kubernetes on your laptop or other local machine. All the master node processes and worker node processes run on one machine.
Advantages of Kubernetes
- Portable - Kubernetes can run containers on one or more public cloud environments, on premises, or locally.
- Auto/manual scaling
- High availability
- Health checks
- Load balancing
- Deployment ease
- Automated Rollouts and Rollbacks: Kubernetes rolls out new versions and updates of your app without downtime, while monitoring health during the roll-out. If a failure occurs during the process, the roll-out can be rolled back (see the sketch after this list).
- Canary Deployments: Kubernetes lets you test a new deployment in production alongside the previous version, before scaling up the new deployment and scaling down the previous one.
- Kubernetes allows stateful containers.
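As a sketch of how a rolling update is configured, the hypothetical Deployment from earlier is shown again with its update strategy made explicit (the values are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web                # same hypothetical app as in the earlier sketch
spec:
  replicas: 3
  strategy:
    type: RollingUpdate          # replace pods gradually so the app stays available
    rollingUpdate:
      maxUnavailable: 1          # at most one pod down during the update
      maxSurge: 1                # at most one extra pod created during the update
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
        - name: web
          image: nginx:1.26      # bumping the image version triggers a rolling update
```

A failed roll-out can be reverted with kubectl rollout undo deployment/hello-web.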