Kubernetes for Python Developers: Part 1

A Kubernetes tutorial for Python developers

Published on November 28, 2018
Estimated reading time: 10 minutes
The full source code is available on https://github.com/bstiel/celery-kubernetes

Kubernetes is an open-source container-orchestration system for automating deployment, scaling and management of containerised apps.

Kubernetes helps you to run, track and monitor containers at scale. It has become the de facto tool for container management.

It is also one of the largest and fastest-growing open-source container orchestration projects.

This blog post is the first part of a series: Kubernetes for Python developers.

Our goal is to migrate a Celery app we developed in a previous blog post from Docker Compose to Kubernetes.

You do not need any Kubernetes knowledge to follow this blog post. You should have some experience with Docker.

In this first part of the series, you will learn how to set up RabbitMQ as your Celery message broker on Kubernetes.

You will learn about kubectl, the Kubernetes command line interface. And by the end of this article you will know how to deploy a self-healing RabbitMQ application with a stable IP address and DNS name into the cluster.

In order to run Kubernetes on your machine, make sure to enable it. You can find instructions here.

screenshot

kubectl

The first thing you need to know is kubectl. kubectl is the Kubernetes command line tool. It is the docker-compose equivalent and lets you interact with your Kubernetes cluster.

For example, run kubectl cluster-info to get basic information about your Kubernetes cluster, or kubectl logs worker to get the stdout/stderr logs of a Pod named worker. The latter is very similar to docker-compose logs worker.

screenshot

Pods

You cannot run a container directly on Kubernetes. A container must always run inside a Pod. A Pod is the smallest and most basic building block in the Kubernetes world.

A Pod is an environment for a single container, or for a small number of tightly coupled containers (think of a log forwarding container running next to your app).

A Pod shares some of the properties of a Docker Compose service. A Pod specifies the Docker image and the command to run. It allows you to define environment variables as well as memory and CPU resources.

Unlike a Docker Compose service, a Pod does not provide self-healing functionality. It is ephemeral. When a Pod dies, it’s gone. 

Nor does a Pod come with DNS capabilities. This is handled by a Service object which we will cover further down. Pods are much lower level compared to Docker Compose services.

Let’s create a RabbitMQ Pod, using the RabbitMQ image from Docker Hub, tag 3.7.8.

# rabbitmq-pod.yaml

apiVersion: v1  
kind: Pod  
metadata:
  name: rabbitmq-pod
spec:
  containers:  
  - name: rabbitmq-container
    image: rabbitmq:3.7.8

Create the Pod with kubectl and confirm it is up and running:

=> kubectl apply -f rabbitmq-pod.yaml
pod/rabbitmq-pod created

=> kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
rabbitmq-pod                    1/1     Running   0          10s

Delete the Pod and confirm:

=> kubectl delete -f rabbitmq-pod.yaml
pod "rabbitmq-pod" deleted

=> kubectl get pods
No resources found.

ReplicaSets

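When a Pod that you created directly is deleted, or the node it runs on fails, the Pod is gone for good. Kubernetes may restart a crashed container inside an existing Pod, but it never recreates the Pod itself. Pods do not self-heal and they do not scale.

The lack of self-healing capabilities means that it is not a good idea to create a Pod directly.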

This is where ReplicaSets come in. A ReplicaSet ensures that a specified number of Pod replicas are running at any given time.

A ReplicaSet is a management wrapper around a Pod. If a Pod that is managed by a ReplicaSet dies, the ReplicaSet brings up a new Pod instance.

# rabbitmq-rs.yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: rabbitmq-rs
  labels:
    app: rabbitmq-rs
spec:
  replicas: 1
  selector:
    matchLabels:
      name: rabbitmq-pod
  template:
    metadata:
      labels:
        name: rabbitmq-pod
    spec:
      restartPolicy: Always
      containers:
      - name: rabbitmq-container
        image: rabbitmq:3.7.8

Instead of having a dedicated Pod manifest file, we now define the Pod inside .spec.template. This is the RabbitMQ Pod manifest from above.

.spec.template has exactly the same schema as the Pod manifest, except that it is nested and does not have an apiVersion or kind.
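To make the nesting concrete, here is a minimal Python sketch that wraps a Pod manifest (as a plain dict) in a ReplicaSet manifest. The helper name pod_to_replicaset is made up for this illustration and is not part of any Kubernetes client library. The sketch assumes the Pod already carries at least one label, which it reuses as the selector.

```python
import copy
import json


def pod_to_replicaset(pod, name, replicas=1):
    """Nest a Pod manifest (a dict) inside a ReplicaSet manifest (hypothetical helper)."""
    # Work on a copy so the caller's Pod manifest is left untouched.
    template = copy.deepcopy(pod)
    # The template follows the Pod schema, minus apiVersion and kind.
    template.pop("apiVersion", None)
    template.pop("kind", None)
    # Pods created from a template get generated names, so drop the fixed one.
    template["metadata"].pop("name", None)
    labels = template["metadata"].get("labels") or {}
    if not labels:
        raise ValueError("the Pod needs at least one label for the selector")
    return {
        "apiVersion": "apps/v1",
        "kind": "ReplicaSet",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": template,
        },
    }


pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "rabbitmq-pod", "labels": {"name": "rabbitmq-pod"}},
    "spec": {
        "containers": [{"name": "rabbitmq-container", "image": "rabbitmq:3.7.8"}]
    },
}
replicaset = pod_to_replicaset(pod, name="rabbitmq-rs")
print(json.dumps(replicaset, indent=2))
```

kubectl apply -f also accepts JSON files, so the printed document could be saved and applied as is. For a handful of manifests, writing the YAML by hand as shown in this post is simpler.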

We also rearranged the Pod’s metadata slightly. We now attach the label name: rabbitmq-pod to the RabbitMQ Pod. This matches the ReplicaSet’s .spec.selector.matchLabels selector.

This means the ReplicaSet can manage the RabbitMQ Pods as the selector matches. We set the number of RabbitMQ Pods we want to run concurrently in .spec.replicas to 1.
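The matching rule can be sketched in a few lines of Python. This is only an illustration of the matchLabels semantics (every key/value pair in the selector must be present on the Pod), not code taken from Kubernetes; the function name selector_matches is made up for this example.

```python
def selector_matches(match_labels, pod_labels):
    """Return True if every selector key/value pair is present in the Pod's labels."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


selector = {"name": "rabbitmq-pod"}

# Extra labels on the Pod do not matter, only the selector's pairs are checked.
print(selector_matches(selector, {"name": "rabbitmq-pod", "tier": "broker"}))  # True
# A different value for the same key does not match.
print(selector_matches(selector, {"name": "celery-worker"}))                   # False
# A Pod without labels does not match either.
print(selector_matches(selector, {}))                                          # False
```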

Create the ReplicaSet with kubectl and confirm it is up and running. Then check that the ReplicaSet created one instance of the RabbitMQ Pod.

=> kubectl apply -f rabbitmq-rs.yaml
replicaset.apps/rabbitmq-rs created

=> kubectl get rs
NAME          DESIRED   CURRENT   READY   AGE
rabbitmq-rs   1         1         1       5s

=> kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
rabbitmq-rs-fxdqp   1/1     Running   0          7s

Our ReplicaSet created one RabbitMQ Pod. Let’s see what happens when we delete that Pod.

=> kubectl delete pod rabbitmq-rs-fxdqp
pod "rabbitmq-rs-fxdqp" deleted

=> kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
rabbitmq-rs-5sldl   1/1     Running   0          24s

What happened here? We deleted the ephemeral Pod rabbitmq-rs-fxdqp. The ReplicaSet then noticed that the actual number of RabbitMQ Pods running was 0. And it created a new RabbitMQ Pod instance named rabbitmq-rs-5sldl. We have a self-healing RabbitMQ instance.
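The behaviour we just observed boils down to a loop that compares the desired number of Pods with the actual number and closes the gap. Here is a toy Python model of that idea. It only tracks Pod names in memory and is not how Kubernetes is implemented; the class ToyReplicaSet and its counter-based Pod names are assumptions made for this sketch (Kubernetes generates random suffixes such as fxdqp).

```python
import itertools


class ToyReplicaSet:
    """A toy, in-memory model of a ReplicaSet. It does not talk to a cluster."""

    def __init__(self, name, replicas):
        self.name = name
        self.replicas = replicas          # desired number of Pods
        self.pods = set()                 # names of the Pods that currently exist
        self._counter = itertools.count(1)

    def reconcile(self):
        """Create or remove Pods until the actual count equals the desired count."""
        while len(self.pods) < self.replicas:
            self.pods.add(f"{self.name}-{next(self._counter)}")
        while len(self.pods) > self.replicas:
            self.pods.pop()
        return sorted(self.pods)

    def delete_pod(self, pod_name):
        """Simulate kubectl delete pod."""
        self.pods.discard(pod_name)


rs = ToyReplicaSet("rabbitmq-rs", replicas=1)
first = rs.reconcile()     # one Pod is created
rs.delete_pod(first[0])    # the actual count drops to 0
second = rs.reconcile()    # a replacement Pod is created
print(first, second)       # ['rabbitmq-rs-1'] ['rabbitmq-rs-2']
```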

Delete the ReplicaSet and confirm the ReplicaSet and any RabbitMQ Pods are gone:

=> kubectl delete -f rabbitmq-rs.yaml
replicaset.apps "rabbitmq-rs" deleted

=> kubectl get rs
No resources found.

=> kubectl get pods
No resources found.

Deployments

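Updating a ReplicaSet directly is an imperative affair: changing its Pod template does not touch the Pods that are already running, so you would have to replace them yourself. It is much easier to define the desired state and let Kubernetes work out how to get there.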

This is the use case for Deployments. A Deployment provides declarative updates for ReplicaSets and Pods.

Create a Deployment to create a ReplicaSet which, in turn, brings up one RabbitMQ Pod:

# rabbitmq-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      name: rabbitmq-pod
  template:
    metadata:
      labels:
        name: rabbitmq-pod
    spec:
      restartPolicy: Always
      containers:
      - name: rabbitmq-container
        image: rabbitmq:3.7.8

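Apply the manifest with kubectl apply -f rabbitmq-deploy.yaml, just as we did for the Pod and the ReplicaSet. ReplicaSets manage Pods. Deployments manage ReplicaSets.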

Now, let’s say we need RabbitMQ with the management plugin. We need to replace rabbitmq:3.7.8 with rabbitmq:3.7.8-management.

The new Deployment manifest defines the updated desired state for rabbitmq-deploy.

# rabbitmq-management-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      name: rabbitmq-pod
  template:
    metadata:
      labels:
        name: rabbitmq-pod
    spec:
      restartPolicy: Always
      containers:
      - name: rabbitmq-container
        image: rabbitmq:3.7.8-management

Deploy the new Deployment version and see how it updates the ReplicaSet and Pod.

=> kubectl apply -f rabbitmq-management-deploy.yaml
deployment.apps/rabbitmq-deploy configured

=> kubectl get pods
NAME                               READY   STATUS              RESTARTS   AGE
rabbitmq-deploy-7f86fcd959-fgtxr   1/1     Running             0          8m
rabbitmq-deploy-f98989967-qmxzn    0/1     ContainerCreating   0          2s

=> kubectl get pods
NAME                               READY   STATUS        RESTARTS   AGE
rabbitmq-deploy-7f86fcd959-fgtxr   0/1     Terminating   0          8m
rabbitmq-deploy-f98989967-qmxzn    1/1     Running       0          19s

=> kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
rabbitmq-deploy-7f86fcd959   0         0         0       13m
rabbitmq-deploy-f98989967    1         1         1       1m
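The names in the output above show the ownership chain: each Pod name starts with the name of its ReplicaSet, which in turn starts with the name of the Deployment. The following Python sketch splits a Pod name into those parts. It relies only on the naming pattern visible in this output, which is a convention rather than a guaranteed interface; the helper split_pod_name is made up for this illustration.

```python
def split_pod_name(pod_name):
    """Split '<deployment>-<template hash>-<suffix>' into its parts."""
    # rsplit from the right, because the Deployment name may contain hyphens itself.
    deployment, template_hash, suffix = pod_name.rsplit("-", 2)
    return {
        "deployment": deployment,
        "replicaset": f"{deployment}-{template_hash}",
        "suffix": suffix,
    }


old = split_pod_name("rabbitmq-deploy-7f86fcd959-fgtxr")
new = split_pod_name("rabbitmq-deploy-f98989967-qmxzn")

print(old["deployment"] == new["deployment"])  # True: same Deployment
print(old["replicaset"])                       # rabbitmq-deploy-7f86fcd959
print(new["replicaset"])                       # rabbitmq-deploy-f98989967
```

Both Pods belong to the same Deployment but to different ReplicaSets: the Deployment created a new ReplicaSet for the new image and scaled the old one down to 0.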

Get more details about the new Pod:

=> kubectl get pod rabbitmq-deploy-f98989967-qmxzn -o yaml

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2018-11-23T16:33:38Z
  generateName: rabbitmq-deploy-f98989967-
  labels:
    name: rabbitmq-pod
    pod-template-hash: "954545523"
  name: rabbitmq-deploy-f98989967-qmxzn
  namespace: default
  ownerReferences:
  - apiVersion: extensions/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: rabbitmq-deploy-f98989967
    uid: 87be145f-ef3d-11e8-886a-025000000001
  resourceVersion: "594134"
  selfLink: /api/v1/namespaces/default/pods/rabbitmq-deploy-f98989967-qmxzn
  uid: 87c0e8ca-ef3d-11e8-886a-025000000001
spec:
  containers:
  - image: rabbitmq:3.7.8-management
    imagePullPolicy: IfNotPresent
    name: rabbitmq-container
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-r7js4
      readOnly: true

RabbitMQ 3.7.8-management is successfully deployed, replacing RabbitMQ 3.7.8 and giving you access to the RabbitMQ management plugin. You now know how to create and deploy a self-healing RabbitMQ Kubernetes instance!

Services

We still lack a stable Pod IP address or DNS name.

Remember that Pods are not durable. When a Pod dies, the ReplicaSet creates a new Pod instance. The new Pod’s IP address differs from the old Pod’s IP address.

In order to run a Celery worker Pod, we need a stable connection to the RabbitMQ Pod.

Enter Services. A Service is another type of Kubernetes object. A Service gets its own stable IP address, a stable DNS name and a stable port.

Services provide service discovery, load-balancing, and features to support zero-downtime deployments.

Kubernetes provides several types of Services. We look at two of them here.

A ClusterIP service gives you a service inside your cluster. Your apps inside your cluster can access that service via a stable IP address, DNS name and port. A ClusterIP service does not provide access from outside the cluster.

A NodePort service provides everything a ClusterIP service provides, plus access to a Pod from outside the cluster.

Make the RabbitMQ Pod available inside the cluster under the service name rabbitmq and expose the AMQP port 5672.

Expose the RabbitMQ management UI externally on port 30672.

# rabbitmq-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
spec:
  type: NodePort
  selector:
    name: rabbitmq-pod
  ports:
  - protocol: TCP
    port: 15672
    nodePort: 30672
    targetPort: 15672
    name: http
  - protocol: TCP
    port: 5672
    targetPort: 5672
    name: amqp

Deploy with kubectl and check the service’s status:

=> kubectl apply -f rabbitmq-service.yaml
service/rabbitmq created

=> kubectl get service rabbitmq
NAME       TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                          AGE
rabbitmq   NodePort   10.105.37.247   <none>        15672:30672/TCP,5672:32610/TCP   1m
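The PORT(S) column packs several values into one string: each entry reads port:nodePort/protocol. The following Python sketch unpacks it. The function parse_ports is made up for this illustration and only handles the format shown in the output above.

```python
def parse_ports(ports_column):
    """Parse e.g. '15672:30672/TCP,5672:32610/TCP' into a list of dicts."""
    parsed = []
    for entry in ports_column.split(","):
        numbers, protocol = entry.split("/")
        # A ClusterIP Service shows only '<port>/<protocol>', without a node port.
        port, _, node_port = numbers.partition(":")
        parsed.append({
            "port": int(port),
            "node_port": int(node_port) if node_port else None,
            "protocol": protocol,
        })
    return parsed


for item in parse_ports("15672:30672/TCP,5672:32610/TCP"):
    print(item)
# {'port': 15672, 'node_port': 30672, 'protocol': 'TCP'}
# {'port': 5672, 'node_port': 32610, 'protocol': 'TCP'}
```

Port 30672 is the nodePort we set in the manifest for the management UI. We did not set a nodePort for the amqp port, so Kubernetes assigned one itself (32610 in this run, yours will likely differ).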

The RabbitMQ management UI should be available on http://localhost:30672:

screenshot

And RabbitMQ is now accessible internally under amqp://guest:guest@rabbitmq:5672.

Now that we have a stable RabbitMQ URL, we can set up our Celery worker on Kubernetes.
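As a preview, here is a minimal Python sketch of how an app running inside the cluster might assemble that broker URL. The environment variable names (BROKER_HOST, BROKER_PORT, BROKER_USER, BROKER_PASSWORD) are hypothetical and not something Kubernetes or Celery defines; the defaults are the Service name and port from above plus the guest credentials used in this post.

```python
import os
from urllib.parse import quote, urlsplit


def broker_url(env=os.environ):
    """Build an AMQP broker URL from (hypothetical) environment variables."""
    host = env.get("BROKER_HOST", "rabbitmq")   # the Service name doubles as DNS name
    port = int(env.get("BROKER_PORT", "5672"))  # the amqp port of the Service
    # Percent-encode the credentials so characters like '@' or '/' stay unambiguous.
    user = quote(env.get("BROKER_USER", "guest"), safe="")
    password = quote(env.get("BROKER_PASSWORD", "guest"), safe="")
    return f"amqp://{user}:{password}@{host}:{port}"


url = broker_url(env={})
parts = urlsplit(url)
print(url)                          # amqp://guest:guest@rabbitmq:5672
print(parts.hostname, parts.port)   # rabbitmq 5672
```

The resulting string is what you would pass to Celery as the broker, for example Celery("tasks", broker=url). The hostname rabbitmq only resolves from inside the cluster, so this works from a Pod but not from your laptop.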

Conclusion

In this blog post, we built the foundations for migrating our Docker Compose Celery app to Kubernetes.

We set up a self-healing RabbitMQ Deployment and a RabbitMQ service that gives us a stable URL.

In the next part of this series, you will learn about persistent storage (volumes) and configuration via ConfigMaps. And we will migrate the remainder of our Celery app’s stack to Kubernetes.
