by Arpit Kumar
10 Sep, 2023
9 minute read
Understanding how Linux Containers and Docker work

Insights into the core principles behind Linux containerization and Docker technology, exploring how these tools facilitate efficient application deployment, isolation, and management in a lightweight, portable manner


In the last decade, software infrastructure has moved to the cloud. Most startups, and now even enterprises, are moving to the cloud for scalability and lower capital costs. This has helped companies reduce the cost of experimenting with new products by scaling infrastructure up and down at will.

With cloud infrastructure, new deployment patterns have emerged. Microservices have become the de facto architecture for most software infrastructure. And as microservices gained prominence, technologies like Docker became universal.

Today, let’s explore how Linux containers and Docker work, since they have become the foundation of cloud deployments.

Docker depends on containerization, which is built on two main features of the Linux kernel: namespaces and cgroups. Over time, the kernel has added features that provide isolation for certain global resources.

We will discuss these two features and see how they work together to make containers possible.

What are namespaces

Quoting directly from the namespaces(7) man page:

A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.

Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes.

In Linux, namespaces are a feature of the kernel that allow processes to have isolated views of system resources. Namespaces are used to provide process and resource isolation, making it possible for multiple processes to run on the same system without interfering with each other.

There are several types of namespaces in Linux, each responsible for isolating a specific set of resources. Here are some of the most commonly used namespaces:

PID Namespace (pid):

  • This namespace isolates the process ID (PID) number space. Each process in a PID namespace has its own unique set of PIDs, which may or may not correspond to PIDs in the parent namespace.
  • Useful for creating process hierarchies and isolating processes.

Mount Namespace (mnt):

  • Mount namespaces isolate the filesystem mount points. Processes in different mount namespaces can have different views of the filesystem, including their own separate root filesystem.
  • Useful for creating container-like environments with their own filesystem.

UTS Namespace (uts):

  • UTS namespaces isolate the hostname and domain name. This allows processes in different UTS namespaces to have their own separate system identification.
  • Useful for containers and virtual machines.

IPC Namespace (ipc):

  • IPC namespaces isolate interprocess communication resources like System V IPC objects (e.g., message queues, semaphores, shared memory).
  • Useful for process isolation and security.

Network Namespace (net):

  • Network namespaces provide isolation for network-related resources such as network interfaces, routing tables, and firewall rules. Processes in different network namespaces can have their own virtual network stack.
  • Essential for creating network isolation for containers and virtual machines.

User Namespace (user):

  • User namespaces isolate user and group IDs. They allow non-privileged users to have their own isolated view of user and group IDs, making it safer to run processes with reduced privileges.
  • Useful for containerization and privilege separation.
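The mapping between IDs inside and outside a user namespace is exposed via /proc/self/uid_map; each row maps a range of IDs inside the namespace to IDs outside it. A minimal, Linux-only sketch for inspecting it:

```python
# /proc/self/uid_map rows have the form: <inside-id> <outside-id> <length>
# In the initial user namespace this is typically "0 0 4294967295",
# i.e. an identity mapping over the whole ID range.
with open("/proc/self/uid_map") as f:
    uid_map = [tuple(int(x) for x in line.split()) for line in f]

for inside, outside, length in uid_map:
    print(f"inside={inside} outside={outside} length={length}")
```

Inside an unprivileged container, this file is where you would see, for example, in-container root (UID 0) mapped to an unprivileged UID on the host.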

Cgroup Namespace (cgroup):

  • Cgroup namespaces virtualize a process’s view of the cgroup hierarchy, so a container sees its own cgroup root rather than the host’s full hierarchy.
  • Useful for managing resource allocation within containers.

Time Namespace (time):

  • Time namespaces isolate the system’s boot-time and monotonic clocks, allowing processes in different time namespaces to see different clock offsets.
  • Potential use cases for specialized time-sensitive applications.

Each of these namespaces contributes to process and resource isolation, which is crucial for various use cases, including containerization, virtualization, and system security.

By combining these namespaces, Linux can create isolated environments for applications and services, preventing interference and improving system resource management.

If you run `lsns` on your Linux machine, you will see a list of the current namespaces along with their types.
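Each process’s namespace memberships are also exposed as symlinks under /proc/&lt;pid&gt;/ns — this is the same information `lsns` reads. A minimal Python sketch (Linux-only) that prints them:

```python
import os

# Each entry in /proc/self/ns is a symlink like "pid -> pid:[4026531836]".
# Two processes that share a namespace see the same inode number in brackets.
NS_DIR = "/proc/self/ns"

namespaces = {}
for name in sorted(os.listdir(NS_DIR)):
    # readlink yields e.g. "uts:[4026531838]"
    namespaces[name] = os.readlink(os.path.join(NS_DIR, name))

for name, ident in namespaces.items():
    print(f"{name:20s} {ident}")
```

Comparing these inode numbers between a shell on the host and a shell inside a container is a quick way to confirm which namespaces the container actually got.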


PID namespace (image source: https://www.nginx.com/blog/what-are-namespaces-cgroups-how-do-they-work/)

What are Cgroups

Quoting from the cgroups(7) man page:

Control groups, usually referred to as cgroups, are a Linux kernel feature which allow processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored. The kernel’s cgroup interface is provided through a pseudo-filesystem called cgroupfs.  Grouping is implemented in the core cgroup kernel code, while resource tracking and limits are implemented in a set of per-resource type subsystems (memory, CPU, and so on).

A cgroup is a collection of processes that are bound to a set of limits or parameters defined via the cgroup filesystem.

Various subsystems have been implemented, making it possible to do things such as limiting the amount of CPU time and memory available to a cgroup, accounting for the CPU time used by a cgroup, and freezing and resuming execution of the processes in a cgroup. Subsystems are sometimes known as resource controllers.

Cgroups allow for the control and limitation of system resources (e.g., CPU, memory, disk I/O, network bandwidth) used by a set of processes. Through this, cgroups ensure that containers do not consume all available resources on a host system and that resource constraints are enforced.

  1. Hierarchy: Cgroups are organized into a hierarchy, which resembles a filesystem tree structure. Each level in the hierarchy represents a specific resource type (e.g., CPU, memory) or subsystem.
  2. Controllers: Cgroups use controllers to manage specific resources. Each controller is responsible for regulating a specific resource type, such as the CPU controller for CPU usage or the memory controller for memory usage.
  3. Assignment: When a process is started inside a container, it is associated with a cgroup within the hierarchy. This assignment is determined by the container runtime, like Docker.
  4. Resource Limits: For each cgroup, resource limits can be set using the controllers. These limits define how much of a resource a group of processes (e.g., those running in a container) can use.
  5. Monitoring and Accounting: Cgroups provide monitoring and accounting information, allowing administrators to track resource usage and enforce resource limits.
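The cgroup membership of the current process is visible in /proc/self/cgroup. On a cgroup v2 system there is a single line like `0::/user.slice/...`; on v1 there is one line per hierarchy, each naming its controllers. A small Linux-only parsing sketch:

```python
# Each line of /proc/self/cgroup has the form:
#   <hierarchy-id>:<controller-list>:<cgroup-path>
# On cgroup v2 the controller list is empty and the hierarchy id is 0.
entries = []
with open("/proc/self/cgroup") as f:
    for line in f:
        hier_id, controllers, path = line.rstrip("\n").split(":", 2)
        entries.append((int(hier_id),
                        controllers.split(",") if controllers else [],
                        path))

for hier_id, controllers, path in entries:
    print(hier_id, controllers or "(v2 unified)", path)
```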

Container runtimes use cgroups to enforce resource constraints specified by users or administrators, ensuring that containers do not monopolize system resources.

For example, if a container has a memory limit of 512MB, the memory controller cgroup enforces this limit, preventing the container from exceeding its allocated memory.
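On a cgroup v2 host, a runtime enforces such a limit by writing the byte value into that container cgroup’s memory.max file (a path like /sys/fs/cgroup/docker/&lt;id&gt;/memory.max — illustrative only, the exact layout varies by distro and runtime). A sketch of the size parsing involved; the helper name is hypothetical:

```python
# Convert a human-readable limit such as "512m" into bytes, which is the
# format cgroup v2's memory.max file expects. parse_mem_limit is a
# hypothetical helper, not a real Docker API.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def parse_mem_limit(limit: str) -> int:
    limit = limit.strip().lower()
    if limit and limit[-1] in UNITS:
        return int(limit[:-1]) * UNITS[limit[-1]]
    return int(limit)  # plain byte count

print(parse_mem_limit("512m"))  # 536870912
# A runtime would then, conceptually, do the equivalent of:
#   echo 536870912 > /sys/fs/cgroup/<container-cgroup>/memory.max
```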

Namespaces isolate processes and their views of system resources, while cgroups manage and limit the use of those resources.

These technologies enable the creation of secure, isolated, and efficient containerized environments.

Docker Ecosystem on top of Namespaces and Cgroups

Docker provides isolation through a combination of these technologies and mechanisms, both at the application and operating system levels.

Process Isolation

Docker containers run as separate processes on the host system. Each container has its own isolated process space, and processes within one container cannot see or interfere with processes in other containers.

Network Isolation

Docker provides networking isolation by creating virtual network interfaces for each container.

Containers can be connected to user-defined bridge networks, overlay networks, or other custom networks, allowing them to communicate with each other while isolating their network traffic from the host and other containers.

User Isolation

Docker containers often run as non-root users within the container. This limits their access to sensitive system resources and reduces the potential impact of security breaches.

Isolation from Host

Containers are isolated from the host system. Even though they share the host’s kernel, they cannot directly access the host’s filesystem, processes, or network interfaces.

Filesystem Isolation

Docker, in conjunction with cgroups and namespaces, adds filesystem isolation using a union filesystem (UnionFS).


Union FS (image source: https://medium.com/@knoldus/unionfs-a-file-system-of-a-container-2136cd11a779)

It is a type of file system that allows multiple directories or file systems to be combined into a single, unified directory structure.

Union file systems are particularly useful in situations where you want to overlay multiple directories or file systems while presenting them as a single directory hierarchy without physically copying the data.

The key features of a Union Filesystem include:

  1. Overlaying: Union file systems allow you to overlay multiple directories or file systems, which means you can combine the contents of these directories into a single view. Changes to the virtual filesystem are typically transparently managed without affecting the underlying file systems.
  2. Read-Only and Read-Write Layers: In a typical union file system, you have multiple layers. The lower layers are often read-only, and the uppermost layer may be read-write. This setup is beneficial for scenarios where you have a base filesystem (read-only) and then multiple overlay layers (read-write) where changes are stored without modifying the original data.
  3. Priority and Order: Union filesystems have rules for determining which layer takes precedence when there are conflicting files in multiple layers. Generally, files in upper layers take precedence over those in lower layers. If a file exists in both the upper and lower layers with the same path, the one in the upper layer will be visible.
  4. Copy-on-Write (CoW): Changes made to files in upper layers are often implemented as copy-on-write operations. This means that when a modification is made to a file, a new copy of the file is created in the upper layer, and the original file remains intact in the lower layer. This minimizes the risk of data corruption and allows for efficient use of storage space.
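The precedence and copy-on-write rules above can be sketched as a toy in-memory model: layers are dicts, lookups walk from the top layer down, and writes land only in the writable upper layer. This is purely illustrative — real overlay filesystems implement this at the VFS level:

```python
class UnionFS:
    """Toy union mount: read-only lower layers plus one writable upper layer."""

    def __init__(self, lower_layers):
        self.lower_layers = lower_layers  # list of dicts, highest priority first
        self.upper = {}                   # the read-write layer

    def read(self, path):
        # The upper layer wins; otherwise the first lower layer with the path.
        if path in self.upper:
            return self.upper[path]
        for layer in self.lower_layers:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # Copy-on-write: changes land in the upper layer only;
        # the lower (read-only) layers are never modified.
        self.upper[path] = data

base = {"/etc/conf": "default", "/bin/app": "v1"}
overlay = {"/bin/app": "v2"}               # shadows the copy in base
fs = UnionFS([overlay, base])

print(fs.read("/bin/app"))   # the higher-priority layer's "v2" wins
fs.write("/etc/conf", "tuned")
print(fs.read("/etc/conf"))  # now served from the writable layer
print(base["/etc/conf"])     # the lower layer still holds "default"
```

This is exactly why many containers can share one base image: the image layers stay read-only, and each container only stores its own upper-layer changes.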

This provides a flexible way to combine multiple directories or file systems into a single view, making them valuable in various computing scenarios, particularly for lightweight and efficient storage management.

Each Docker container has its own isolated filesystem, which is created using the “Union File System” (UnionFS) or overlay filesystem.

UnionFS layers allow containers to share a base image while maintaining their individual file systems. This reduces duplication and minimizes storage overhead.

By combining these technologies and mechanisms, Docker creates a high level of isolation between containers, ensuring that they operate independently and securely.

This isolation allows containers to be portable and run consistently across various environments while minimizing potential security risks and resource conflicts.

Docker vs Linux Containers

Docker containers and Linux containers are closely related, but there are some important distinctions.

Docker is a platform that provides a comprehensive solution for building, packaging, distributing, and managing containers. It includes a higher-level toolset built around containerization.

Linux containers, on the other hand, refer to the containerization technology itself, which is a feature provided by the Linux kernel. Linux containers are a low-level technology that allows you to isolate processes and filesystems, and they can be used independently of Docker.

Container Format

Docker uses its own container format and runtime, which includes Docker images and the Docker Engine.

“Linux containers” is a generic term that encompasses various container runtimes, including Docker, containerd, rkt (pronounced “Rocket”), and more. These runtimes can use the same container format or different formats, such as those defined by the OCI (Open Container Initiative) standards.

Ecosystem and Tools

Docker has a rich ecosystem of tools and services that make it easy to work with containers, including Docker Compose for defining multi-container applications, Docker Swarm for container orchestration, and Docker Hub for sharing and distributing images.

Linux containers may require additional tools and setup to achieve similar functionality. For example, Kubernetes, which is a popular container orchestration platform, can be used with various container runtimes, including Docker and containerd.

Docker is primarily focused on providing a user-friendly experience for developers and DevOps teams. It abstracts many of the complexities of working with containers and offers a high-level command-line interface.

Linux containers, being a kernel-level technology, are more developer-agnostic. They can be accessed and manipulated directly using system tools, making them flexible but potentially more complex for some users.

Linux containers can be used without Docker, and alternative container runtimes can be employed to create and manage containers without Docker’s toolset.

Your choice between Docker and Linux containers depends on your specific requirements and familiarity with the technology.
