Explores use cases, principles, and common implementation strategies, including Redis, ZooKeeper, and etcd for distributed locks
Most modern services and products are backed by digital infrastructure, where workloads are divided into smaller tasks to ensure correctness, reliability, availability, and durability.
A variety of these workloads run behind the scenes, and the majority are deployed in a distributed manner: a given task can be performed by any machine in a group of workers. Whether it’s API services, streaming jobs, or workers consuming queued tasks, distributed systems are everywhere.
From the perspective of reliability and correctness, many tasks performed by these systems must be executed only once or processed by a single worker/process at a time.
This is where distributed locks become essential. Distributed locks are a critical component in distributed systems, allowing multiple processes or nodes to coordinate access to shared resources. In this post, we’ll explore the key use cases for distributed locks and examine some common implementation approaches.
Why Distributed Locks Matter
In a distributed system, resources might be accessed and modified by multiple nodes. Without proper coordination, concurrent operations can lead to unpredictable results. Distributed locks are crucial for:
- Consistency: Ensuring that only one process modifies a resource at a time to maintain data integrity.
- Coordination: Synchronizing actions across different nodes that need to work together.
- Fault Tolerance: Handling failures gracefully and ensuring that locks are properly released or reassigned if a node crashes.
Core Principles of Distributed Locks
Distributed locks rely on a few key principles to function correctly:
- Exclusivity: At any moment, only one node should hold the lock and have access to the shared resource.
- Timeout: Locks should have a timeout to prevent indefinite holding if a node fails or becomes unresponsive.
- Fairness: Ideally, locks should be granted in the order requests are received to avoid starvation.
- Atomicity: Lock acquisition and release operations should be atomic to prevent race conditions.
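To make these properties concrete, here is a deliberately simplified single-process sketch of the lock contract; real distributed locks move this state into a shared store, and fairness is omitted for brevity (all names are illustrative):
import threading
import time
import uuid

class TtlLock:
    def __init__(self):
        self._mutex = threading.Lock()  # makes acquire/release atomic
        self._holder = None             # token of the current holder, if any
        self._expires_at = 0.0          # deadline enforcing the timeout rule

    def acquire(self, ttl):
        with self._mutex:
            now = time.monotonic()
            if self._holder is None or now >= self._expires_at:
                self._holder = uuid.uuid4().hex  # exclusivity: one holder at a time
                self._expires_at = now + ttl
                return self._holder
            return None  # a live lock is held by someone else

    def release(self, token):
        with self._mutex:
            if self._holder == token:  # only the holder may release
                self._holder = None
                return True
            return False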
Use Cases for Distributed Locks
- Resource Contention Management: Distributed locks help manage access to shared resources like databases, files, or API rate limits. They ensure that only one process at a time can modify a resource, preventing data corruption or inconsistencies.
- Leader Election: In distributed systems, it’s often necessary to designate a leader node for coordination tasks. Distributed locks can facilitate leader election by allowing nodes to compete for a lock, with the winner becoming the leader.
- Distributed Cron Jobs: When running scheduled tasks across multiple nodes, distributed locks prevent duplicate job execution; only the node that acquires the lock will run the task (see the sketch after this list).
- Concurrency Control in Microservices: Microservices often need to coordinate operations across multiple services. Distributed locks can ensure that critical sections of code are executed by only one service instance at a time.
- Distributed Transactions: In systems without built-in distributed transaction support, locks can be used to implement basic transactional semantics across multiple resources.
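For instance, a nightly job deployed on several nodes can guard itself with a lock so that only one replica runs it per window. A minimal sketch using the redis-py client; the key name, TTL, and generate_report function are illustrative assumptions:
import redis

r = redis.Redis(host='localhost', port=6379)

def generate_report():
    print("building report...")  # stand-in for the real scheduled work

def run_nightly_report():
    # NX + EX: only the first node's SET succeeds within the 1-hour window
    if not r.set('cron:nightly_report', 'running', nx=True, ex=3600):
        return  # another node already claimed this run
    try:
        generate_report()
    finally:
        r.delete('cron:nightly_report')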
Distributed Lock Implementations
Let’s examine some common approaches to implementing distributed locks:
1. Database-Level Locks
Many relational databases provide features that can be used to implement distributed locks:
BEGIN;
-- Try to insert the lock row; RETURNING yields a row only if we inserted it
INSERT INTO distributed_locks (key, locked_at)
VALUES ('my_key', NOW())
ON CONFLICT (key) DO NOTHING
RETURNING key;
-- If a row came back, the lock is ours: perform the critical operation
COMMIT;
However, this approach adds load on the database and increases the risk of deadlocks if not implemented carefully.
Another way to implement locking in databases is with SELECT … FOR UPDATE:
BEGIN;
-- Ensure the lock exists and acquire it
INSERT INTO distributed_locks (lock_key)
VALUES ('my_lock_key')
ON CONFLICT (lock_key) DO NOTHING;
-- Lock the row
SELECT *
FROM distributed_locks
WHERE lock_key = 'my_lock_key'
FOR UPDATE;
-- Critical section: perform your operations here
COMMIT;
The lock is automatically released when the transaction ends (COMMIT or ROLLBACK).
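PostgreSQL also offers advisory locks, which avoid a dedicated lock table entirely; the lock lives in the server itself and is keyed by an integer. A hedged sketch using psycopg2 (the connection string, key, and function names are assumptions):
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # assumed connection details

def try_with_advisory_lock(lock_key, critical_section):
    with conn.cursor() as cur:
        # pg_try_advisory_lock returns true only for the first session to grab this key
        cur.execute("SELECT pg_try_advisory_lock(%s)", (lock_key,))
        if not cur.fetchone()[0]:
            return False  # someone else holds the lock
        try:
            critical_section()
        finally:
            cur.execute("SELECT pg_advisory_unlock(%s)", (lock_key,))
        return True
Session-level advisory locks are also released automatically if the connection drops, which gives crash safety similar to a TTL.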
2. Redis-based Locks (Redlock algorithm)
Redis provides atomic operations that make it suitable for implementing distributed locks:
import redis

r = redis.Redis(host='localhost', port=6379)  # deployment-specific address

def acquire_lock(resource, ttl):
    # NX: set only if the key is absent; EX: expire after ttl seconds
    return r.set(resource, 'locked', nx=True, ex=ttl)

def release_lock(resource):
    r.delete(resource)
The Redlock algorithm extends this concept to multiple Redis instances for increased reliability.
For more information, the Redis documentation provides a comprehensive guide: https://redis.io/docs/latest/develop/use/patterns/distributed-locks/
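One caveat with the simple release above: if the lock expires while the first client is paused, that client’s DEL can remove a lock now held by someone else. The pattern in the linked Redis guide stores a random token and releases via an atomic compare-and-delete script; a sketch along those lines:
import uuid
import redis

r = redis.Redis(host='localhost', port=6379)

# Delete the key only if its value is still our token (runs atomically inside Redis)
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def acquire_lock(resource, ttl):
    token = uuid.uuid4().hex
    if r.set(resource, token, nx=True, ex=ttl):
        return token  # keep this to release later
    return None

def release_lock(resource, token):
    return r.eval(RELEASE_SCRIPT, 1, resource, token) == 1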
3. ZooKeeper
Apache ZooKeeper, a distributed coordination service, provides robust primitives for implementing distributed locks.
ZooKeeper is fast and offers strong ordering guarantees.
From the official ZooKeeper docs:
Unlike standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children. It is like having a file-system that allows a file to also be a directory. (ZooKeeper was designed to store coordination data: status information, configuration, location information, etc., so the data stored at each node is usually small, in the byte to kilobyte range.) We use the term znode to make it clear that we are talking about ZooKeeper data nodes.
Znodes maintain a stat structure that includes version numbers for data changes, ACL changes, and timestamps, to allow cache validations and coordinated updates. Each time a znode’s data changes, the version number increases. For instance, whenever a client retrieves data it also receives the version of the data.
The data stored at each znode in a namespace is read and written atomically. Reads get all the data bytes associated with a znode and a write replaces all the data. Each node has an Access Control List (ACL) that restricts who can do what.
ZooKeeper also has the notion of ephemeral nodes. These znodes exist as long as the session that created the znode is active. When the session ends, the znode is deleted. Ephemeral nodes are useful when you want to implement [tbd].
ZooKeeper’s recipes page documents a lock recipe: https://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_recipes_Locks
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class DistributedLock {
    private final ZooKeeper zk;
    private final String lockPath;

    public DistributedLock(ZooKeeper zk, String lockPath) {
        this.zk = zk;
        this.lockPath = lockPath;
    }

    public boolean acquireLock() throws KeeperException, InterruptedException {
        try {
            // An ephemeral znode is removed automatically if our session dies,
            // so a crashed holder cannot leave the lock stuck
            zk.create(lockPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;
        } catch (KeeperException.NodeExistsException e) {
            // Another client already holds the lock
            return false;
        }
    }

    public void releaseLock() throws KeeperException, InterruptedException {
        zk.delete(lockPath, -1); // -1 matches any znode version
    }
}
This is one of the most correct and performant implementations, but it requires running a ZooKeeper cluster, which can be challenging if the team is not experienced in operating one.
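In practice, most teams reach for a client library’s lock recipe instead of hand-rolling the znode handling above (in Java, Apache Curator’s InterProcessMutex is the usual choice). A sketch of the equivalent with the Python kazoo client; the ensemble address, path, and identifier are assumptions:
from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')  # assumed ensemble address
zk.start()

# kazoo's recipe uses ephemeral sequential znodes, so waiters queue up fairly
lock = zk.Lock('/locks/my_resource', identifier='worker-1')
with lock:  # blocks until the lock is acquired
    pass    # critical section

zk.stop()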
4. etcd
etcd, a distributed key-value store, offers a lease mechanism that’s well-suited for implementing distributed locks:
import (
    "context"
    "errors"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func acquireLock(ctx context.Context, client *clientv3.Client, key string) error {
    // Tie the lock key to a lease so it expires if the holder dies
    lease, err := client.Grant(ctx, 10) // 10-second TTL
    if err != nil {
        return err
    }
    // Create the key only if it does not exist yet (CreateRevision == 0)
    resp, err := client.Txn(ctx).
        If(clientv3.Compare(clientv3.CreateRevision(key), "=", 0)).
        Then(clientv3.OpPut(key, "locked", clientv3.WithLease(lease.ID))).
        Commit()
    if err != nil {
        return err
    }
    if !resp.Succeeded {
        return errors.New("lock already held by another client")
    }
    return nil
}
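etcd’s Go client also ships an official concurrency package (clientv3/concurrency) with a ready-made mutex, so the raw transaction above is rarely needed. On the Python side, the community python-etcd3 client exposes a similar lock helper; a hedged sketch, assuming that library and its endpoint defaults:
import etcd3

client = etcd3.client(host='localhost', port=2379)  # assumed endpoint

# The lock key is attached to a lease, so it expires if this process dies
with client.lock('jobs/nightly-rollup', ttl=10):
    pass  # critical section: run the protected work here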
Best Practices and Considerations
When implementing distributed locks, keep these factors in mind:
- Fencing Tokens: Use monotonically increasing values with locks to prevent issues with stale lock holders (see the sketch after this list).
- Timeouts and TTLs: Always use timeouts when acquiring locks and TTLs on the locks themselves to prevent indefinite blocking.
- Reentrancy: Consider whether your locks need to be reentrant (allowing the same process to acquire the lock multiple times).
- Lock Granularity: Balance between fine-grained locks for maximum concurrency and coarse-grained locks for simplicity.
- Failure Handling: Design your system to handle cases where lock acquisition fails or where the lock-holding process crashes.
- Performance Impact: Monitor the performance impact of your locking system, especially under high concurrency.
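To make the fencing-token bullet concrete: the lock service hands out a monotonically increasing number with each grant, and the protected resource rejects requests carrying a number older than the newest it has seen. A minimal, storage-agnostic sketch (all names are illustrative):
class FencedStore:
    """A resource that refuses writes from stale lock holders."""

    def __init__(self):
        self.data = {}
        self.highest_token_seen = 0

    def write(self, token, key, value):
        # A holder that was paused past its TTL shows up with an old token
        if token < self.highest_token_seen:
            raise PermissionError(f"stale fencing token: {token}")
        self.highest_token_seen = token
        self.data[key] = value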
Conclusion
Distributed locks are a powerful tool for coordinating activities in distributed systems. By understanding the use cases and carefully considering implementation options, you can choose the right approach for your specific needs. Remember that distributed locks add complexity to your system, so use them judiciously and always consider simpler alternatives where appropriate.