Distributed storage systems must navigate the trade-offs between Consistency, Availability, and Partition Tolerance (the CAP Theorem). This lab uses a distributed MinIO cluster and network chaos tools to visualize these trade-offs in action.

Learning Objectives

  • Experience network partitions and latency in a distributed storage system.
  • Understand the real-world implications of the CAP Theorem.
  • Observe quorum behavior and cluster recovery.

Prerequisites

  • Docker and Docker Compose installed
  • Basic understanding of distributed systems
  • A terminal with curl or a load-testing tool like ab (Apache Bench)

Step 1: Distributed Cluster Setup

We will deploy a 4-node distributed MinIO cluster. In a 4-node setup, MinIO requires a quorum of nodes to be available to perform operations.

  1. Create a docker-compose.yml file:

    services:
      minio1:
        image: quay.io/minio/minio
        volumes:
          - data1-1:/data1
        ports:
          - '9001:9000'
        environment:
          MINIO_ROOT_USER: admin
          MINIO_ROOT_PASSWORD: password
        command: server http://minio{1...4}/data{1...1}
    
      minio2:
        image: quay.io/minio/minio
        volumes:
          - data2-1:/data1
        ports:
          - '9002:9000'
        environment:
          MINIO_ROOT_USER: admin
          MINIO_ROOT_PASSWORD: password
        command: server http://minio{1...4}/data{1...1}
    
      minio3:
        image: quay.io/minio/minio
        volumes:
          - data3-1:/data1
        ports:
          - '9003:9000'
        environment:
          MINIO_ROOT_USER: admin
          MINIO_ROOT_PASSWORD: password
        command: server http://minio{1...4}/data{1...1}
    
      minio4:
        image: quay.io/minio/minio
        volumes:
          - data4-1:/data1
        ports:
          - '9004:9000'
        environment:
          MINIO_ROOT_USER: admin
          MINIO_ROOT_PASSWORD: password
        command: server http://minio{1...4}/data{1...1}
    
    volumes:
      data1-1:
      data2-1:
      data3-1:
      data4-1:
    
  2. Start the cluster:

    docker-compose up -d
    
  3. Verify: Access any node (e.g., localhost:9001) and ensure you can see the health of all 4 nodes in the dashboard.


Step 2: Baseline Benchmarking

  1. Upload an object:

    ./mc alias set cluster http://localhost:9001 admin password
    ./mc mb cluster/test-bucket
    dd if=/dev/urandom of=sample.bin bs=1M count=10
    ./mc cp sample.bin cluster/test-bucket/
    
  2. Measure Latency: Use a loop to repeatedly read the object and measure the time taken.

    time for i in {1..20}; do ./mc cat cluster/test-bucket/sample.bin > /dev/null; done
    

Step 3: Inducing Chaos with Toxiproxy

Toxiproxy allows us to simulate network conditions. We will inject latency and then a partition.

  1. Add Toxiproxy to your Compose file: (Conceptual - in a real lab, you would route minio1's traffic through a proxy container).

  2. Simulate Latency: Introduce 500ms of latency to minio4. Observe how the "time for i in..." loop now takes significantly longer as the cluster waits for nodes to synchronize.

  3. Induce a Partition: Completely block traffic to minio3 and minio4.

    • Your cluster now has 2 nodes online and 2 offline.
    • For a 4-node cluster, MinIO typically requires (N/2 + 1) nodes for a "Read+Write" quorum.

Step 4: Observation & Recovery

  1. Test Quorum:

    • Attempt to Write a new object: mc cp sample.bin cluster/test-bucket/new-file.bin
    • Attempt to Read the existing object: mc cat cluster/test-bucket/sample.bin
  2. Analysis:

    • Does the write fail? Why? (Likely due to lack of write quorum).
    • Does the read succeed? MinIO often allows reads if at least N/2 nodes are up.
  3. Heal the Cluster: Restore connectivity to the partitioned nodes.

    # Remove the toxiproxy blocks or restart the containers
    docker-compose restart
    
  4. Verify Synchronization: Check the logs to see the nodes handshaking and re-syncing any missed data.


Lab Reflection

  1. In the CAP theorem, which two letters does MinIO prioritize by default during a partition?
  2. How does adding more nodes (e.g., moving from 4 to 16 nodes) affect the probability of maintaining availability during a network split?
  3. What is the difference between "Strong Consistency" and "Eventual Consistency" in the context of what you saw during the recovery phase?