Implementing Docker-in-Docker in Kubernetes
Author: Justin D Vrana
As part of my role, I build bioinformatics and data pipelines. I envisioned a data platform that integrates third-party open-source tools into data workflows that run on auto-scaling compute platforms such as AWS EKS. This setup would let us pick tools à la carte.
In this post, I'll discuss the setup but won't go deeply into how workflow orchestration works. Briefly, a workflow orchestration server pulls workflow definitions from code repositories. Workflows are triggered on a schedule, by sensors, or manually through an API or a web-app GUI. Each job spins up a new node on the Kubernetes cluster and creates a Pod that contains the workflow code and its required software packages.
This approach works for simple workflows. However, many bioinformatics pipelines rely on third-party tools distributed as Docker images, such as minimap2, FLASH (for merging paired-end reads), and Oxford Nanopore tools. These workflows must launch additional containers, each from its own image, to run specific commands. In theory, we could build every tool and workflow into one comprehensive Docker image, but bundling all the tools, workflow code, packages, and orchestration libraries into a single image is cumbersome and complicates repository management. I found it more streamlined for each workflow step to launch its own Docker container as needed.
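To make the per-step pattern concrete, here is a minimal sketch, assuming the workflow code is written in Python (the post does not say). Each hypothetical `Step` pairs a command with the image that provides its tool, and the hypothetical `docker_argv` helper turns it into a `docker run` invocation. The image names are placeholders, not published images, and the FLASH and minimap2 arguments are illustrative only.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Step:
    """One workflow step: a command plus the image that provides its tool."""
    name: str
    image: str                    # hypothetical placeholder image reference
    command: Sequence[str] = ()


def docker_argv(step: Step, workdir: str) -> List[str]:
    """Build a `docker run` command that mounts `workdir` at the same path."""
    return [
        "docker", "run", "--rm",      # remove the container when it exits
        "-v", f"{workdir}:{workdir}",  # share input/output files with the step
        "-w", workdir,                 # run the tool from the shared directory
        step.image,
        *step.command,
    ]


def run_pipeline(steps: Sequence[Step], workdir: str) -> None:
    """Run steps in order; each one starts (and removes) its own container."""
    for step in steps:
        subprocess.run(docker_argv(step, workdir), check=True)


# Hypothetical two-step pipeline: merge paired reads, then align them.
PIPELINE = [
    Step("merge", "example/flash:placeholder",
         ("flash", "reads_1.fastq", "reads_2.fastq")),
    Step("align", "example/minimap2:placeholder",
         ("minimap2", "-a", "-o", "aligned.sam", "ref.fa",
          "out.extendedFrags.fastq")),
]
```

The design choice here is that swapping a tool version means changing one image reference rather than rebuilding a monolithic image.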
As the title suggests, each workflow already runs inside a container in a Pod, so launching further containers from it requires Docker-in-Docker (DinD). The official docker:dind image provides this functionality. However, using it as our base image would mean building and maintaining a new image with all our dependencies layered on top.
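Instead, we opted for the Kubernetes sidecar pattern: alongside our primary container, each Pod runs a DinD sidecar container. Containers in a Pod share a network namespace, so the primary container reaches the sidecar's Docker daemon over its TCP port on localhost (set through DOCKER_HOST). The sidecar keeps /var/lib/docker on a Pod volume, and both containers mount a shared working directory (/opt/worker) at the same path, because the daemon resolves bind-mount paths in its own filesystem. Together, these enable Docker-in-Docker within our Pods.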
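```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 4
  # apps/v1 requires a selector that matches the Pod template's labels.
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
        # Primary container. "busybox" is a placeholder: substitute an image
        # with your workflow code and the Docker CLI or SDK.
        - name: launcher
          image: "busybox"
          imagePullPolicy: Always
          # Keeps the placeholder running; replace with your workflow entrypoint.
          command: ["sh", "-c", "while true; do sleep 3600; done"]
          env:
            # Containers in a Pod share a network namespace, so the
            # sidecar's daemon is reachable on localhost.
            - name: DOCKER_HOST
              value: tcp://localhost:2375
          volumeMounts:
            # Shared working directory, mounted at the same path in dind
            # so bind-mount paths resolve in both containers.
            - name: worker-storage
              mountPath: /opt/worker
              subPath: worker
        # Sidecar running the Docker daemon.
        - name: dind
          image: "docker:dind"
          imagePullPolicy: Always
          # Plain TCP without TLS, bound to 127.0.0.1 so only the Pod can reach it.
          # Newer Docker releases may refuse unauthenticated TCP without --tls=false.
          command: ["dockerd", "--host", "tcp://127.0.0.1:2375", "--tls=false"]
          securityContext:
            privileged: true
          volumeMounts:
            # Daemon state: images, layers, and container filesystems.
            - name: worker-storage
              mountPath: /var/lib/docker
              subPath: docker
            - name: worker-storage
              mountPath: /opt/worker
              subPath: worker
      volumes:
        - name: worker-storage
          emptyDir: {}
```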
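On the launcher side, workflow code can then shell out to `docker run` as usual. The following is a minimal sketch, assuming Python workflow code and a launcher image that ships the Docker CLI (busybox does not). The helper names `sidecar_env`, `check_shared`, and `run_in_sidecar` are hypothetical. The main point it illustrates: the daemon in the dind container resolves `-v` source paths in its own filesystem, so only paths under the /opt/worker directory mounted in both containers are safe to bind-mount.

```python
import os
import subprocess
from typing import Dict, List, Sequence

# Mount path of the working directory shared by launcher and dind.
SHARED_DIR = "/opt/worker"
# Same value as the DOCKER_HOST set on the launcher container.
DEFAULT_DOCKER_HOST = "tcp://localhost:2375"


def sidecar_env() -> Dict[str, str]:
    """Copy the environment, defaulting DOCKER_HOST to the sidecar daemon."""
    env = dict(os.environ)
    env.setdefault("DOCKER_HOST", DEFAULT_DOCKER_HOST)
    return env


def check_shared(path: str) -> str:
    """Return the normalized path, or raise if dind cannot see it.

    The daemon resolves bind-mount sources inside the dind container,
    so only paths under SHARED_DIR exist on both sides.
    """
    resolved = os.path.abspath(path)
    if os.path.commonpath([SHARED_DIR, resolved]) != SHARED_DIR:
        raise ValueError(f"{path!r} is not under {SHARED_DIR}")
    return resolved


def run_in_sidecar(image: str, command: Sequence[str], data_dir: str) -> str:
    """Run `command` in `image` via the sidecar daemon; return its stdout."""
    src = check_shared(data_dir)
    argv: List[str] = [
        "docker", "run", "--rm",
        "-v", f"{src}:/data",  # expose the shared directory as /data
        "-w", "/data",
        image, *command,
    ]
    result = subprocess.run(
        argv, env=sidecar_env(), check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, `run_in_sidecar("busybox", ["ls"], "/opt/worker/run1")` would list that directory from inside a fresh container, while passing `/tmp/run1` fails fast instead of silently mounting an empty directory that only exists (or not) on the dind side.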