Apache Kafka is a distributed publish-subscribe messaging system.
What does Publisher-Subscriber do?
A publisher publishes messages, and subscribers subscribe to receive those messages.
What is Messaging System?
Apache Kafka's main purpose is to store the messages sent by publishers and to supply those messages to subscribers whenever they request them.
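The store-and-forward model described above can be sketched in a few lines. This is a toy illustration of the idea, not Kafka's actual API; the class and method names are invented for the example.

```python
# Minimal sketch of store-and-forward messaging: the broker retains every
# published message, and subscribers pull messages whenever they ask.
class MiniBroker:
    def __init__(self):
        self._messages = []          # messages are kept after delivery

    def publish(self, message):
        self._messages.append(message)

    def fetch(self, from_position=0):
        # Subscribers request messages on their own schedule; the broker
        # does not push or delete them on delivery.
        return self._messages[from_position:]

broker = MiniBroker()
broker.publish("order-created")
broker.publish("order-shipped")

print(broker.fetch())        # a late subscriber still sees everything
print(broker.fetch(1))       # or resumes from a stored position
```

Because the broker retains messages rather than deleting them on delivery, a subscriber that connects late, or re-reads, gets the same data.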
Kafka Overview
Producer / Publisher
In Apache Kafka, publishers are referred to as producers. Producers are responsible for producing messages and sending them to Kafka topics; the messages are stored on the brokers and consumed from there by subscribers (also known as consumers).
Consumer/ Subscriber
Subscribers are called consumers because they consume messages from the Kafka brokers.
Broker
Brokers store messages and serve both producers and consumers. A Kafka broker simply stores messages in files on disk: producers append messages to those files, and consumers read messages from them.
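The file-backed storage model above can be sketched with plain file operations. This is an illustration of the append-and-read idea only; Kafka's real on-disk log format is binary and segmented, and the file name here is invented.

```python
# Sketch of a broker's storage: an append-only file that producers append
# to and consumers read from at a position they track themselves.
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "topic-0.log")

def append_message(path, message):
    # Producers may only append to the end of the file.
    with open(path, "a") as f:
        f.write(message + "\n")

def read_messages(path, from_line=0):
    # Consumers read sequentially, starting from their own position.
    with open(path) as f:
        return [line.rstrip("\n") for line in f][from_line:]

append_message(log_path, "msg-1")
append_message(log_path, "msg-2")
print(read_messages(log_path))       # both messages remain on disk
```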
Zookeeper
Zookeeper manages the configuration of topics and partitions in a Kafka cluster. When a topic is created, Zookeeper:
Stores the topic configuration
Distributes the configuration to all brokers in the cluster
In summary, Zookeeper is essential in the Kafka ecosystem as it maintains and coordinates the configuration of topics and partitions across the cluster.
Kafka Topic and Partition
Kafka Topic : Messages sent by publishers to brokers are stored in a special entity called a topic.
Key Points about Kafka Topics
Unique Name: Each topic has its own unique name.
Cluster-Wide Uniqueness: The topic name must be unique across the entire Kafka cluster.
Offset Number: Each message within a topic is assigned a specific number called an offset.
Offset Assignment: The offset number is assigned to each message when it arrives at a specific broker.
Append-Only: Producers can only append messages to the end of the log, not insert them in the middle or at the beginning.
These points highlight the organization and structure of topics in Kafka, ensuring efficient and scalable data processing.
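The points above (unique topic names, per-message offsets, append-only writes) can be sketched with a dictionary of lists, where the list index plays the role of the offset. The topic names are invented for the example.

```python
# Sketch of topics and offsets: each topic is an append-only log, and a
# message's offset is simply its position in that log.
topics = {}

def append(topic_name, message):
    log = topics.setdefault(topic_name, [])   # topic names are unique keys
    log.append(message)                        # append-only: end of log
    return len(log) - 1                        # the offset assigned on arrival

print(append("payments", "a"))   # first message in "payments": offset 0
print(append("payments", "b"))   # offsets grow monotonically: offset 1
print(append("audit", "x"))      # offsets are independent per topic: offset 0
```

Note that offsets are assigned per topic, so two topics can each have a message at offset 0.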
Kubernetes is an open-source platform for managing containerized workloads and services. It enables you to automate the deployment, scaling, and management of containerized applications across a cluster of hosts on a shared network. Kubernetes was originally developed by Google and is now maintained by the Cloud Native Computing Foundation (CNCF).
With Kubernetes, you can easily manage a large number of container instances and services, making it ideal for applications that require a distributed environment. The platform provides functionality such as automatic load balancing, scaling, and self-healing, which helps keep your applications available and running smoothly. Additionally, Kubernetes offers a rich set of APIs that allow you to automate many operational tasks, and it integrates with many other tools and platforms, such as Prometheus for monitoring and Istio for building service meshes.
The following are some of the key components of Kubernetes:
Nodes: The nodes are the physical or virtual machines on which your containers run.
Pods: Pods are the smallest deployable units in Kubernetes, and they may contain one or more containers.
Services: These allow your application to be exposed to the network and to be discovered by other services.
Controllers: These control the desired state of your application, ensuring that it is always operating as expected.
API Server: The API Server is the central control-plane component that manages all aspects of the Kubernetes cluster.
In summary, Kubernetes is an important tool for managing containerized workloads, and it is widely used both in small and large-scale deployments.
Kubernetes components
Kube-api-server
The Kubernetes API server is a central component of the Kubernetes control plane. It is responsible for exposing the Kubernetes API, which is used by all other components of the control plane, as well as external clients and applications, to interact with the Kubernetes cluster.
This API server is designed to be highly scalable and available so that it can handle a large number of requests from various sources at the same time. This system is also designed to be extensible, which means that it can be easily customized and extended to support custom APIs and resources according to your specific requirements.
Through the API server, clients can perform CRUD (Create, Read, Update, Delete) operations on Kubernetes objects, such as pods, services, and deployments, using a RESTful interface. It also provides authentication and authorization mechanisms to ensure that the Kubernetes API can be accessed only by those who are authorized to do so.
It is also important to note that the API server verifies all incoming requests and processes them in a timely manner, as well as storing the desired state of the Kubernetes objects in etcd, the distributed key-value store that Kubernetes uses to store all of its state information.
The API server is an integral part of the Kubernetes architecture, as it provides centralized management and control for the entire Kubernetes cluster.
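The RESTful interface mentioned above follows a predictable URL convention: core-group resources live under `/api/v1`, while resources in named API groups live under `/apis/<group>/<version>`. The helper below sketches that path convention; it is an illustration, not a client library, and the resource names in the usage lines are examples.

```python
# Sketch of the Kubernetes API server's REST path convention for
# namespaced resources.
def resource_path(resource, namespace, group=None, version="v1", name=None):
    # Core-group resources (pods, services) have no group segment.
    prefix = f"/apis/{group}/{version}" if group else f"/api/{version}"
    path = f"{prefix}/namespaces/{namespace}/{resource}"
    # Appending an object name addresses a single object (GET/PUT/DELETE);
    # without it, the path addresses the whole collection (LIST/POST).
    return f"{path}/{name}" if name else path

print(resource_path("pods", "default"))
# /api/v1/namespaces/default/pods
print(resource_path("deployments", "prod", group="apps", name="web"))
# /apis/apps/v1/namespaces/prod/deployments/web
```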
etcd
etcd stores all the state information about the Kubernetes cluster in the form of a key-value store. It is used to configure and manage the cluster, as well as to store data related to its state. This store holds the cluster's configuration, such as nodes, pods, services, and other objects, along with the data needed to serve scheduling decisions and API requests.
Kubernetes offers several options for setting up highly available etcd clusters, including kubeadm, kops, and Kubespray. When setting up an etcd cluster, it is also important to follow the best practices and considerations for large clusters operating across multiple zones. One suggested approach is to create an external etcd cluster, which requires more infrastructure but can provide greater control and flexibility.
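etcd organizes cluster state as keys under hierarchical prefixes (Kubernetes stores objects under paths such as `/registry/pods/<namespace>/<name>`), and clients read related objects with prefix range queries. The sketch below mimics that access pattern with a plain dictionary; the specific keys and values are illustrative.

```python
# Sketch of etcd's key-value model: hierarchical keys plus prefix reads.
store = {}

def put(key, value):
    store[key] = value

def get_prefix(prefix):
    # etcd "range" reads return every key sharing a prefix, in key order.
    return {k: v for k, v in sorted(store.items()) if k.startswith(prefix)}

put("/registry/pods/default/web-1", "Running")
put("/registry/pods/default/web-2", "Pending")
put("/registry/services/default/web", "ClusterIP")

print(get_prefix("/registry/pods/"))   # only the pod keys come back
```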
Kubernetes Controller Manager
The Kubernetes Controller Manager is one of the components of the Kubernetes control plane that is responsible for managing the different controllers that are used by Kubernetes. Essentially, these controllers are in charge of maintaining the state of the cluster according to its desired state, by constantly monitoring the current status of the cluster and making changes accordingly so that the desired state can be maintained.
The Controller Manager runs several different controllers, each of which is logically a separate control loop. To reduce complexity, they are all compiled into a single binary and run in a single process, with each controller responsible for a specific aspect of the Kubernetes cluster. For example, the Replication Controller ensures that the desired number of replicas of a particular pod is maintained at all times, while the Node Controller monitors the cluster's nodes and takes action if a node becomes unavailable.
The Controller Manager is responsible for starting and stopping the different controllers, as well as for updating their configuration as needed. For fault tolerance, multiple instances of the Controller Manager can be run with leader election, so that if the active instance fails or becomes unavailable, another can take over and maintain a high level of system availability.
As a whole, the Kubernetes Controller Manager is a vital component of the Kubernetes control plane, ensuring that the cluster remains in its desired state and that all necessary actions are taken to maintain that state.
Some types of controllers are:
Node controller: Responsible for noticing and responding when nodes go down.
Job controller: Watches for Job objects that represent one-off tasks, then creates Pods to run those tasks to completion.
EndpointSlice controller: Populates EndpointSlice objects (to provide a link between Services and Pods).
ServiceAccount controller: Creates default ServiceAccounts for new namespaces.
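All of the controllers listed above share the same control-loop pattern: observe the current state, compare it with the desired state, and compute the actions needed to close the gap. The sketch below illustrates that pattern using the replica-count example from the Replication Controller discussion; the action tuples are invented for the example.

```python
# Sketch of the reconciliation loop pattern shared by Kubernetes controllers.
def reconcile(desired_replicas, running_pods):
    """Return the actions needed to move current state toward desired state."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: create the missing ones.
        return [("create-pod", i) for i in range(diff)]
    if diff < 0:
        # Too many pods: delete the surplus.
        return [("delete-pod", pod) for pod in running_pods[diff:]]
    return []                       # already at desired state, nothing to do

print(reconcile(3, ["web-1"]))          # scale up: two pods to create
print(reconcile(1, ["web-1", "web-2"])) # scale down: one pod to delete
print(reconcile(1, ["web-1"]))          # converged: no actions
```

Real controllers run this comparison continuously, so the cluster keeps converging toward the desired state even after failures.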
kube-scheduler
This control plane component watches for newly created pods that have no assigned node and selects a node for them to run on, based on their configuration.
Scheduling decisions take the following factors into account:
Individual and collective resource requirements
Hardware/software/policy constraints
Affinity and anti-affinity specifications
Data locality
Inter-workload interference
Deadlines
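A drastically simplified version of this decision has two phases: filter out nodes that cannot fit the pod's resource request, then score the remaining nodes and pick the best one. The sketch below uses free CPU as the only scoring signal; the real kube-scheduler weighs all the factors listed above, and the node names and values here are invented.

```python
# Toy scheduler: filter infeasible nodes, then score by free CPU.
def schedule(pod_cpu_request, nodes):
    """nodes maps node name -> free CPU in millicores."""
    # Filtering phase: drop nodes that cannot fit the request.
    feasible = {n: free for n, free in nodes.items() if free >= pod_cpu_request}
    if not feasible:
        return None                 # no feasible node: the pod stays Pending
    # Scoring phase: pick the node with the most headroom.
    return max(feasible, key=feasible.get)

nodes = {"node-a": 500, "node-b": 2000, "node-c": 100}
print(schedule(250, nodes))    # node-b has the most free CPU
print(schedule(4000, nodes))   # nothing fits: None
```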
Kubernetes HA Design Patterns
Stacked etcd
In a stacked etcd topology, each Kubernetes control plane node runs its own etcd member alongside the other control plane components (the API server, scheduler, and controller manager), and those co-located etcd members together form the cluster's etcd datastore. This is the default topology set up by kubeadm when creating a highly available cluster.
The main benefit of this approach is simplicity: it requires less infrastructure than a dedicated etcd cluster, since the control plane and the datastore share the same machines, and replication is easier to set up and manage.
The trade-off is coupled failure: if a control plane node is lost, both its control plane instance and its etcd member are lost together, reducing cluster redundancy. For this reason it is recommended to run at least three stacked control plane nodes.
Stacked etcd still requires careful configuration to operate properly, and it is important to weigh these benefits and drawbacks before using it in a production environment.
External etcd
In an external etcd topology, the etcd cluster is a separate service installed on hosts outside the Kubernetes control plane nodes. It stores the entire state of the cluster, including configuration data, secrets, and other information that is important to the cluster's operation.
Running Kubernetes against an external etcd cluster can be helpful in a number of scenarios, such as deploying the cluster across multiple data centers or entrusting the management of etcd to a separate team or organization. It also simplifies disaster recovery and backup, since the state of the Kubernetes cluster can be restored from the external store after an unexpected failure.
External etcd requires additional configuration and maintenance compared to the stacked etcd that kubeadm sets up by default. It can, however, provide more flexibility and control over the storage and management of the cluster's data.
Kubernetes Fundamentals
Pod
ReplicaSet
Deployment
Service
Pod: A Pod is a single instance of an application, and it is the smallest object you can create in Kubernetes. Pods generally have a one-to-one relationship with containers. In the exceptional (multi-container) case, a Pod can also include a helper container, such as a sidecar that pushes or pulls data.
ReplicaSet: A ReplicaSet maintains a stable set of replica Pods at any given time, guaranteeing that a specified number of identical Pods is always available.
Deployment: A Deployment runs multiple replicas of your application and automatically replaces any instances that fail or become unresponsive. It is well suited for stateless applications.
Service: A Service abstracts a set of Pods, providing clients with a stable, so-called virtual IP address. It acts as a load balancer in front of the Pods.
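The four objects above fit together through labels: a Deployment's template describes the Pod, its selector ties the ReplicaSet-managed replicas to those labels, and a Service targeting the same labels load-balances across them. Since the examples in this document use Python, the manifest is shown as the dict you would serialize to YAML; the names, labels, and image are illustrative.

```python
# Sketch of a Deployment manifest as a Python dict (serializable to YAML).
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,                                 # maintained by the ReplicaSet
        "selector": {"matchLabels": {"app": "web"}},   # must match the Pod labels
        "template": {                                  # the Pod definition
            "metadata": {"labels": {"app": "web"}},
            "spec": {"containers": [{"name": "web", "image": "nginx:1.25"}]},
        },
    },
}

# The selector and the Pod template labels must agree, or the Deployment
# is rejected; a Service selecting app=web would front these replicas.
assert (deployment["spec"]["selector"]["matchLabels"]
        == deployment["spec"]["template"]["metadata"]["labels"])
print(deployment["spec"]["replicas"])
```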
Using Amazon EKS, you can run Kubernetes on AWS without installing and operating your own control plane or worker nodes.
Kubernetes is an open-source container orchestration system that allows you to deploy and manage containerized applications. Kubernetes organizes containers into logical groups for management and discovery, then launches them onto Amazon Elastic Compute Cloud (Amazon EC2) instances. You can run containerized applications on premises and in the cloud using Kubernetes, including microservices, batch processing workers, and PaaS platforms.
EKS deploys the Kubernetes control plane, including the API servers and backend persistence layer, across multiple AWS Availability Zones (AZs) for high availability and fault tolerance. AWS EKS automatically detects and replaces unhealthy nodes in the control plane. AWS Fargate provides serverless compute for containers, so you can run EKS using it as part of a serverless computing setup. With AWS Fargate, there is no need to provision and manage servers, you can specify the resources for a given application and pay for them as needed, and the software enhances security through application isolation by design.
Amazon EKS is integrated with many AWS services to provide scalability and security for your applications. These services include Elastic Load Balancing for load distribution, AWS Identity and Access Management (IAM) for authentication, Amazon Virtual Private Cloud (VPC) for isolation, and AWS CloudTrail for logging.
Amazon EKS works by provisioning (starting) and managing the Kubernetes control plane and worker nodes for you. At a high level, Kubernetes consists of two major components: a cluster of ‘worker nodes’ running your containers, and the control plane managing when and where containers are started on your cluster while monitoring their status.
Without Amazon EKS, you have to run both the Kubernetes control plane and the cluster of worker nodes yourself. With Amazon EKS, you provision your worker nodes using a single command in the EKS console, command-line interface (CLI), or API. AWS handles provisioning, scaling, and managing the Kubernetes control plane in a highly available and secure configuration. This removes a significant operational burden and allows you to focus on building applications instead of managing AWS infrastructure.