Logo

Documentation

Determining Health

Overview

In this tutorial we’ll explore how to use health rules to let users know if their supply chain is working correctly. We’ll see how the different types of resources you create in a supply chain effects what health rules are appropriate.

Environment setup

For this tutorial you will need a kubernetes cluster with Cartographer installed. You may follow the installation instructions here.

Alternatively, you may choose to use the ./hack/setup.sh script to install a kind cluster with Cartographer. This script is meant for our end-to-end testing and while we rely on it working in that role, no user guarantees are made about the script.

Command to run from the Cartographer directory:

$ ./hack/setup.sh cluster cartographer-latest

If you later wish to tear down this generated cluster, run

$ ./hack/setup.sh teardown

Personas

In previous tutorials, we’ve split the personas between app operator and app developer. Now we will split the personas between authors and users.

Template Author

A template author is the expert on some kubernetes resource (e.g. Pods, kpack Images, Knative Services). They understand the behavior of the resource and the fields in the resource’s spec and status. It is their responsibility to write a Cartographer template that wraps their resource.

Supply Chain Author

A supply chain author is the organization’s expert on organizational policy. They know what steps must happen to take source code and verify that it is ready for deployment on a cluster. (There are no special steps for supply chain authors in this tutorial, but they are mentioned for completeness)

User

A user will be any persona that is interested in what happens when a workload is applied to the cluster. Often this is the app developer persona, who has created a workload and wants to know that their code has reached production. This can also be the app operator persona, who knows that some devs have workloads and wants to know that changes are smoothly reaching production.

Scenario

At the Hello World Application Inc., we’ve observed that not all workloads are providing valid configuration, leading to supply chains that cannot stamp out k8s deployments. We want to make sure that in this case, the workload object reflects the problem. We will add health rules to the template that we created in “Build Your First Supply Chain”.

Steps

Template Author Steps

Previously, we created a Supply Chain with just one step: it creates a deployment. As template authors it is our responsibility to be experts on the resource we template out. Let’s review just a few details that will be important to remember about kubernetes deployments:

  • A deployment creates a replicaset, which in turn creates pods.
  • A deployment status has conditions. Read more on k8s conditions.
  • The deployment condition “Available” reports whether the declared number of pod replicas are available on the cluster.
  • The deployment condition “Progressing” reports whether the managed replicasets are making progress in creating pods.
  • The progressing condition will change from True to False if the timeout set in the deployment’s spec.progressDeadlineSeconds field is exceeded.

For a more thorough review of Deployments, see the kubernetes documentation.

We’ll start by setting a progress deadline. We’ll set a timeout of 30 seconds because this is a demo (we’re not suggesting this is the appropriate value for the real world). In the template field of our cluster template we see our deployment. There we can see the new spec.progressDeadlineSeconds field.

apiVersion: carto.run/v1alpha1
kind: ClusterTemplate
metadata:
  name: app-deploy
spec:
  template:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: $(workload.metadata.name)$-deployment
      labels:
        app: $(workload.metadata.name)$
    spec:
      progressDeadlineSeconds: 30 # <=== NEW CONFIG
      replicas: 3
      selector:
        matchLabels:
          app: $(workload.metadata.name)$
      template:
        metadata:
          labels:
            app: $(workload.metadata.name)$
        spec:
          containers:
            - name: $(workload.metadata.name)$
              image: $(workload.spec.image)$

Next we’ll write our health rule. When the workload reports the health of the deployment we’ll report if healthy is " True", “False”, or “Unknown”. Deployments have two conditions, progressing and available that report “True” or “False”. Let’s consider how we’ll want to represent each of these states:

Available Progressing Workload Reports Healthy as: Reason
True True True Pods are all available and any updates necessary are progressing properly
True False False There are pods available, but the necessary updates (changes our workload expects) aren’t progressing
False True Unknown The expected pods are not available, but work is progressing and may resolve
False False False The expected pods are not available, and necessary updates aren’t progressing

From this we know that Workload should report the Deployment as Healthy when both available and progressing are true. It should report False whenever progressing is False. And report unknown otherwise. With this in mind, we’re ready to write our healthrule.

Because health of a Deployment depends on more than one condition, we’ll write a multimatch health rule. A multimatch rule requires that we define what constitutes both healthy and unhealthy. (Good thing we just determined that above!) For both healthy and unhealthy we’ll specify a set of matchers. If all the healthy matchers are satisfied, we’ll report healthy == True. If any of the unhealthy matchers are satisfied, we’ll report healthy == False. Otherwise, we’ll report healthy == Unknown.

apiVersion: carto.run/v1alpha1
kind: ClusterTemplate
spec:
  ...
  healthRule:
    multiMatch:
      healthy:    # Matchers are ANDed
      unhealthy:  # Matchers are ORed

Note: Health rules are available on all Carto templates (e.g. ClusterSourceTemplate, ClusterImageTemplate, etc).

Let’s begin with the healthy matchers. Two different conditions on a Deployment must be true for it to be healthy. We can write these as matchConditions. We just need to provide the conditions' type and status.

      healthy:
        matchConditions:
          - type: Available
            status: 'True'
          - type: Progressing
            status: 'True'

And we can write the unhealthy matcher:

      unhealthy:
        matchConditions:
          - type: Progressing
            status: 'False'

Let’s bring this all together and look at the template we’ll apply to the cluster:

---
apiVersion: carto.run/v1alpha1
kind: ClusterTemplate
metadata:
  name: app-deploy
spec:
  template:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: $(workload.metadata.name)$-deployment
      labels:
        app: $(workload.metadata.name)$
    spec:
      progressDeadlineSeconds: 30
      replicas: 3
      selector:
        matchLabels:
          app: $(workload.metadata.name)$
      template:
        metadata:
          labels:
            app: $(workload.metadata.name)$
        spec:
          containers:
            - name: $(workload.metadata.name)$
              image: $(workload.spec.image)$
  healthRule:
    multiMatch:
      healthy:
        matchConditions:
          - type: Available
            status: 'True'
          - type: Progressing
            status: 'True'
      unhealthy:
        matchConditions:
          - type: Progressing
            status: 'False'

Otherwise we’ll apply the same app operator objects (supply chain, service account, role, role binding) from the “Build Your First Supply Chain” tutorial.

App Dev Steps

Let’s apply a workload that we know will succeed, as we’ve used it before:

---
apiVersion: carto.run/v1alpha1
kind: Workload
metadata:
  name: hello
  labels:
    workload-type: pre-built
spec:
  image: docker.io/nginxdemos/hello:latest

Observe

We’ve seen this workload and supply chain before, so we know what objects will be created (a deployment, which will create a replicaset, which will create pods). What is different in this tutorial is the status of the workload itself.

Let’s observe the workload after giving a moment for the deployment’s pods to come up.

$ kubectl get -o yaml workload hello

First let’s consider the status.resources field:

status:
  ...
  resources:
      - name: deploy
        conditions:
          - type: ResourceSubmitted
            status: "True"
            reason: ResourceSubmissionComplete
          - type: Healthy
            status: True
            reason: MatchedCondition
            message: 'condition status: True, message: Deployment has minimum availability.'
          - reason: Ready
            status: Unknown
            type: Ready

Look at that second condition! Healthy is true. Our matchers were satisfied. Great stuff.

Next let’s look at the top level conditions of the workload and concentrate on the condition with type ResourcesHealthy:

status:
  conditions:
    - reason: HealthyConditionRule
      status: True
      type: ResourcesHealthy

This condition on the workload aggregates the health of all the objects created by the workload. If all are healthy, the condition is true. If any are unhealthy, the condition is False. Otherwise the condition is Unknown. In our case, the aggregation is trivial to compute; the workload’s ResourcesHealthy condition is true.

Steps of an unfortunate dev

At some point, each of us will make a mistake, like mistyping the name of an image in our workload. Let’s try submitting the following workload:

---
apiVersion: carto.run/v1alpha1
kind: Workload
metadata:
  name: typo
  labels:
    workload-type: pre-built
spec:
  image: docker.io/what-a-typo-this-image-definitely-does-not-exist/hello-world:latest

We’ll see what feedback we get in the workload status.

Observe

First, we’ll check the workload just after deploying, inspecting the status.resources:

$ kubectl get -o yaml workload typo
status:
  resources:
    - conditions:
        - type: ResourceSubmitted
          status: "True"
          reason: ResourceSubmissionComplete
        - type: Healthy
          status: Unknown
          reason: NoMatchesFulfilled
        - type: Ready
          status: Unknown
          reason: NoMatchesFulfilled

From our discussion above, we know that the deployment will never reach a healthy state, but until it hits the timeout it will continue to report that it is progressing but the expected pods are not available. We can observe this directly:

$ kubectl get -o yaml deployment typo-deployment
apiVersion: apps/v1
kind: Deployment
status:
  conditions:
    - message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
      ...
    - message: ReplicaSet "hello-deployment-SOMEHASH" is progressing.
      reason: ReplicaSetUpdated
      status: "True"
      type: Progressing
      ...

Let’s check back in on the deployment status after 30 seconds:

  - message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - message: ReplicaSet "typo-deployment-SOMEHASH" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing

We see that the Progressing condition has switched to False.

Let’s verify that our workload healthy condition is reflecting this. We’ll observe status.resources

status:
  resources:
    - conditions:
      ...
      - type: Healthy
        status: "False"
        message: 'condition status: False, message: ReplicaSet "typo-deployment-7b8bd888d8"
        has timed out progressing.'
        reason: MatchedCondition

We see that status.resources reports that an unhealthy condition matcher was satisfied. The message of that condition on the deployment is reflected in the workload’s status.resources[x].conditions[x].message field.

And we cn observe that the workload’s top level conditions then mirror this message in the ResourcesHealthy condition:

status:
  conditions:
  ...
  - type: ResourcesHealthy
    status: "False"
    message: 'condition status: False, message: ReplicaSet "typo-deployment-7b8bd888d8"
      has timed out progressing.'
    reason: HealthyConditionRule

Wrap Up

Congratulations, you’ve used a healthrule to make your supply chain more understandable and repairable! You’ve learned:

  • How to specify a multimatch rule with matchConditions matchers
  • How to read the workload’s status.resources healthy conditions
  • How to read the workload’s ResourcesHealthy condition

To learn more, read the troubleshooting guide on ResourcesHealthy. It explores the possible values you’ll see and their meanings.

Also check out the reference page for the template CRDs

And read this blog post on an example resource for which determining a health rule is currently not possible!