Understanding Kubernetes Probes - Liveness, Readiness, and Startup

Kubernetes probes let the kubelet monitor the health and availability of applications running in containers. There are three types of probes: liveness, readiness, and startup. Each serves a distinct purpose and helps Kubernetes decide when to restart a container and when to send it traffic. This blog post explains the differences between these probes, shows when to use each, and provides a YAML configuration example.

Feature	Liveness Probe	Readiness Probe	Startup Probe
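Purpose	Checks if the application in the container is still healthy and responsive.	Checks if the container is ready to serve traffic.	Checks if the application within the container has finished starting.
Action on Fail	Restarts the container.	Removes the pod from Service endpoints, so traffic stops being routed to it; the container is not restarted.	Restarts the container once the failure threshold is reached. Until the probe succeeds, liveness and readiness checks are held off.
Use Case	Detect deadlocks or hangs that only a restart can fix.	Determine when a container can accept traffic.	Ensure slow-starting applications don’t get killed before they finish starting.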

Probe Examples

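apiVersion: v1
kind: Pod
metadata:
  name: liveness-example
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/busybox
    args:
    - /bin/sh
    - -c
    # Create both marker files, then delete /tmp/healthy after 30 seconds
    # so the liveness probe starts failing and the container is restarted.
    - touch /tmp/healthy /tmp/ready; sleep 30; rm -f /tmp/healthy; sleep 600
    # Liveness: a failing check causes the kubelet to restart the container.
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
    # Readiness: the pod receives traffic only while /tmp/ready exists.
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/ready
      initialDelaySeconds: 5
      periodSeconds: 5
    # Startup: liveness and readiness checks begin only after this succeeds.
    # Allows up to 30 x 10 = 300 seconds for the application to start.
    startupProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      failureThreshold: 30
      periodSeconds: 10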
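
The exec-based probes above work well for a demo, but a web service would more typically expose HTTP health endpoints. The manifest below is a minimal sketch of the same three probes using httpGet. The pod name, the image, port 8080, and the /healthz and /ready paths are all hypothetical and would need to match whatever your application actually serves.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: http-probes-example            # hypothetical name
spec:
  containers:
  - name: web
    image: example.com/my-web-app:1.0  # hypothetical image
    ports:
    - containerPort: 8080              # assumed application port
    # Startup: up to 30 x 10 = 300 seconds before the container is restarted.
    startupProbe:
      httpGet:
        path: /healthz                 # assumed health endpoint
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    # Liveness: restart the container if the health endpoint stops responding.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
    # Readiness: withhold traffic while the readiness endpoint fails.
    readinessProbe:
      httpGet:
        path: /ready                   # assumed readiness endpoint
        port: 8080
      periodSeconds: 5
```

Using separate endpoints for liveness and readiness lets the application report "alive but not ready", for example while it waits on a dependency, without triggering a restart.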

Adjust the following based on what your application needs:

Parameter	Description	Reason to Adjust	Example Scenario
initialDelaySeconds	The time in seconds Kubernetes waits before performing the first probe after the container starts.	Allows the application enough time to start up.	Set to 40 seconds for an app that takes about 30 seconds to start and stabilize.
periodSeconds	Defines how often (in seconds) the probe is performed after the initial delay.	Balances responsiveness and resource use.	Set to 10 seconds if the application's state is relatively stable.
failureThreshold	Indicates the number of consecutive failures required for Kubernetes to take the action specified by the probe (restart or stop traffic).	Avoids reacting to transient issues or temporary spikes.	Set to 3 to prevent Kubernetes from reacting to momentary failures in an app with occasional spikes.
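
To make the table concrete, the fragment below applies its example values to a single readiness probe. It is only a sketch: the /ready path and port 8080 are hypothetical, and the numbers should be tuned to the measured behavior of your own application.

```yaml
# Fragment of a container spec; the endpoint and port are hypothetical.
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 40   # app takes about 30 seconds to start and stabilize
  periodSeconds: 10         # probe every 10 seconds after the initial delay
  failureThreshold: 3       # act only after 3 consecutive failures
```

With these values, Kubernetes reacts to a persistent failure after roughly periodSeconds x failureThreshold = 10 x 3 = 30 seconds, while a single failed check is ignored.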