Multiple Components In One Server

The VDA cluster can manage thousands of DNs and CNs. Generally, each DN or CN should be deployed to a dedicated server, but for testing purposes we can deploy multiple DNs and CNs to a single server. This guide will show you how to do that and examine how VDA manages multiple DNs and CNs. Below are the components we will deploy:

[Figure: multiple components in one server]

Each cn_agent and dn_agent should have its own SPDK application. We will deploy two cn_agents and two dn_agents, so we will launch four SPDK applications. All of their NVMe-oF targets should listen on different ports. To deploy these SPDK applications, the server should have at least 8G of hugepages.
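
Before starting, it is worth confirming that the server has enough free memory for an 8G hugepage pool plus the rest of the processes, e.g.:

free -h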

Note

In this guide, we deploy the VDA components to an Ubuntu 20.04 system, but you could deploy them to any Linux x86_64 system.

Create a work directory

Here we create a directory. We will store all the data (e.g. sockets, logs, etcd data) in this directory.

mkdir -p /tmp/vda_data

Install and launch etcd

Follow the official install guide to install etcd. The easiest way is to download the pre-built binaries. Open the latest release page <https://github.com/etcd-io/etcd/releases/latest> and find the binaries for your OS and architecture. In this doc, the latest version is v3.5.0 and we choose the linux-amd64 one:

curl -L -O https://github.com/etcd-io/etcd/releases/download/v3.5.0/etcd-v3.5.0-linux-amd64.tar.gz
tar xvf etcd-v3.5.0-linux-amd64.tar.gz

Launch etcd:

etcd-v3.5.0-linux-amd64/etcd --listen-client-urls http://localhost:2389 \
--advertise-client-urls http://localhost:2389 \
--listen-peer-urls http://localhost:2390 \
--name etcd0 --data-dir /tmp/vda_data/etcd0.data \
> /tmp/vda_data/etcd0.log 2>&1 &

Here we don’t use the default etcd port numbers. Later we will let the VDA control plane components (portal and monitor) connect to the etcd 2389 port.
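
You can confirm that etcd is serving on the non-default port with the etcdctl binary from the same tarball:

etcd-v3.5.0-linux-amd64/etcdctl --endpoints localhost:2389 endpoint health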

Install vda

Go to the latest vda release page. Download and unzip the package. In this doc, the latest version is v0.2.1:

curl -L -O https://github.com/virtual-disk-array/vda/releases/download/v0.2.1/vda_linux_amd64_v0.2.1.tar.gz
tar xvf vda_linux_amd64_v0.2.1.tar.gz

Go to the vda_linux_amd64_v0.2.1 directory. We will run all the following commands in this directory:

cd vda_linux_amd64_v0.2.1

Prepare SPDK environment

The vda dataplane is an SPDK application, so we should configure the SPDK environment before we run it. Here we use 8G of hugepages because we will launch multiple SPDK applications; the default 2G of hugepages is not large enough:

sudo HUGEMEM=8192 ./spdk/scripts/setup.sh
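
After the script finishes, you can verify the allocation; with the common 2 MiB hugepage size, HUGEMEM=8192 should result in a HugePages_Total of 4096:

grep HugePages /proc/meminfo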

Launch dn0

Launch the dataplane application:

sudo ./vda_dataplane --config ./dataplane_config.json \
--rpc-socket /tmp/vda_data/dn0.sock > /tmp/vda_data/dn0.log 2>&1 &

Change the owner of dn0.sock, so the controlplane agent could communicate with it:

sudo chown $(id -u):$(id -g) /tmp/vda_data/dn0.sock

Launch the controlplane agent:

./vda_dn_agent --network tcp --address '127.0.0.1:9720' \
--sock-path /tmp/vda_data/dn0.sock --sock-timeout 10 \
--lis-conf '{"trtype":"tcp","traddr":"127.0.0.1","adrfam":"ipv4","trsvcid":"4420"}' \
--tr-conf '{"trtype":"TCP"}' \
> /tmp/vda_data/dn_agent_0.log 2>&1 &

We let the dn0 controlplane listen on 127.0.0.1:9720 and the dataplane listen on 127.0.0.1:4420.
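
As a quick sanity check (the same pattern applies to the other agents below, with their own ports and sockets), the agent should be listening on its port and the SPDK application should answer on its RPC socket. spdk_get_version is a standard SPDK RPC; this assumes the bundled rpc.py matches the dataplane version:

ss -tln | grep ':9720'
sudo ./spdk/scripts/rpc.py -s /tmp/vda_data/dn0.sock spdk_get_version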

Launch dn1

Launch the dataplane application:

sudo ./vda_dataplane --config ./dataplane_config.json \
--rpc-socket /tmp/vda_data/dn1.sock > /tmp/vda_data/dn1.log 2>&1 &

Change the owner of dn1.sock, so the controlplane agent could communicate with it:

sudo chown $(id -u):$(id -g) /tmp/vda_data/dn1.sock

Launch the controlplane agent:

./vda_dn_agent --network tcp --address '127.0.0.1:9721' \
--sock-path /tmp/vda_data/dn1.sock --sock-timeout 10 \
--lis-conf '{"trtype":"tcp","traddr":"127.0.0.1","adrfam":"ipv4","trsvcid":"4421"}' \
--tr-conf '{"trtype":"TCP"}' \
> /tmp/vda_data/dn_agent_1.log 2>&1 &

We let the dn1 controlplane listen on 127.0.0.1:9721 and the dataplane listen on 127.0.0.1:4421.

Launch cn0

Launch the dataplane application:

sudo ./vda_dataplane --config ./dataplane_config.json \
--rpc-socket /tmp/vda_data/cn0.sock > /tmp/vda_data/cn0.log 2>&1 &

Change the owner of cn0.sock, so the controlplane agent could communicate with it:

sudo chown $(id -u):$(id -g) /tmp/vda_data/cn0.sock

Launch the controlplane agent:

./vda_cn_agent --network tcp --address '127.0.0.1:9820' \
--sock-path /tmp/vda_data/cn0.sock --sock-timeout 10 \
--lis-conf '{"trtype":"tcp","traddr":"127.0.0.1","adrfam":"ipv4","trsvcid":"4430"}' \
--tr-conf '{"trtype":"TCP"}' \
> /tmp/vda_data/cn_agent_0.log 2>&1 &

We let the cn0 controlplane listen on 127.0.0.1:9820 and the dataplane listen on 127.0.0.1:4430.

Launch cn1

Launch the dataplane application:

sudo ./vda_dataplane --config ./dataplane_config.json \
--rpc-socket /tmp/vda_data/cn1.sock > /tmp/vda_data/cn1.log 2>&1 &

Change the owner of cn1.sock, so the controlplane agent could communicate with it:

sudo chown $(id -u):$(id -g) /tmp/vda_data/cn1.sock

Launch the controlplane agent:

./vda_cn_agent --network tcp --address '127.0.0.1:9821' \
--sock-path /tmp/vda_data/cn1.sock --sock-timeout 10 \
--lis-conf '{"trtype":"tcp","traddr":"127.0.0.1","adrfam":"ipv4","trsvcid":"4431"}' \
--tr-conf '{"trtype":"TCP"}' \
> /tmp/vda_data/cn_agent_1.log 2>&1 &

We let the cn1 controlplane listen on 127.0.0.1:9821 and the dataplane listen on 127.0.0.1:4431.

Launch portal

Run the below command:

./vda_portal --portal-address '127.0.0.1:9520' --portal-network tcp \
--etcd-endpoints localhost:2389 \
> /tmp/vda_data/portal.log 2>&1 &

Launch monitor

Run the below command:

./vda_monitor --etcd-endpoints localhost:2389 \
> /tmp/vda_data/monitor.log 2>&1 &
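
At this point all the control plane processes should be up. As an optional check, you can list them (assuming procps pgrep is available):

pgrep -af 'vda_dn_agent|vda_cn_agent|vda_portal|vda_monitor'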

Create DNs, PDs and CNs

Create dn0:

./vda_cli dn create --sock-addr localhost:9720 \
--tr-type tcp --tr-addr 127.0.0.1 --adr-fam ipv4 --tr-svc-id 4420 \
--location localhost:9720

Create pd0 on dn0:

dd if=/dev/zero of=/tmp/vda_data/pd0.img bs=1M count=512
./vda_cli pd create --sock-addr localhost:9720 --pd-name pd0 \
--bdev-type-key aio --bdev-type-value /tmp/vda_data/pd0.img

Create dn1:

./vda_cli dn create --sock-addr localhost:9721 \
--tr-type tcp --tr-addr 127.0.0.1 --adr-fam ipv4 --tr-svc-id 4421 \
--location localhost:9721

Create pd1 on dn1:

dd if=/dev/zero of=/tmp/vda_data/pd1.img bs=1M count=512
./vda_cli pd create --sock-addr localhost:9721 --pd-name pd1 \
--bdev-type-key aio --bdev-type-value /tmp/vda_data/pd1.img

When we create dn0 and dn1, we use the --location option. The location is a string. When VDA allocates VDs across multiple DNs, it will make sure that no two of the chosen DNs have the same location. This ensures that the DA is constructed from multiple DNs. If we omit --location, this DN can go together with any other DN.

In the previous tutorial, we used a malloc bdev as the PD. Here we use aio bdevs as pd0 and pd1. The aio bdev is also intended for testing. You can create a file as the backend of an aio bdev; the file size will be the aio bdev size, so an aio bdev can emulate a larger bdev than a malloc bdev. pd1 could have the same pd-name as pd0; here we use different names to avoid confusion.
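
As a side note, if you prefer not to write out the whole backing file with dd, a sparse file created with truncate usually works as an aio backend too. The pd2.img name below is only a hypothetical example and is not used elsewhere in this guide:

truncate -s 512M /tmp/vda_data/pd2.img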

Create cn0:

./vda_cli cn create --sock-addr localhost:9820 \
--tr-type tcp --tr-addr 127.0.0.1 --adr-fam ipv4 --tr-svc-id 4430 \
--location localhost:9820

Create cn1:

./vda_cli cn create --sock-addr localhost:9821 \
--tr-type tcp --tr-addr 127.0.0.1 --adr-fam ipv4 --tr-svc-id 4431 \
--location localhost:9821

Similar to dn0 and dn1, we use --location to make sure we won’t allocate two cntlrs from the same CN.

Create da0

Create da0:

./vda_cli da create --da-name da0 --size-mb 128 --physical-size-mb 128 \
--cntlr-cnt 2 --strip-cnt 2 --strip-size-kb 64

We have two CNs, so we can set --cntlr-cnt 2 to give da0 two cntlrs. We have two DNs, so we can set --strip-cnt 2 to give da0 two strips.

Get the da0 status

Run the below command to get the DA status:

./vda_cli da get --da-name da0

Below is an example response:

{
  "reply_info": {
    "req_id": "fded5447-b92e-4642-b21f-448c5977f2b1",
    "reply_msg": "succeed"
  },
  "disk_array": {
    "da_id": "81427a2f66f64c228bd0d8ef25817a50",
    "da_name": "da0",
    "da_conf": {
      "qos": {},
      "strip_cnt": 2,
      "strip_size_kb": 64
    },
    "cntlr_list": [
      {
        "cntlr_id": "0ee93ac9fee54eb99e0ae0095e2c523c",
        "sock_addr": "localhost:9820",
        "is_primary": true,
        "err_info": {
          "timestamp": "2021-06-22 05:45:52.255526703 +0000 UTC"
        }
      },
      {
        "cntlr_id": "4d296c6044994f0aaee7ef9ea14571d9",
        "sock_addr": "localhost:9821",
        "cntlr_idx": 1,
        "err_info": {
          "timestamp": "2021-06-22 05:45:52.443623618 +0000 UTC"
        }
      }
    ],
    "grp_list": [
      {
        "grp_id": "45d0135352ed4620a760f874ca8f1560",
        "size": 134217728,
        "err_info": {
          "timestamp": "2021-06-22 05:45:51.391511017 +0000 UTC"
        },
        "vd_list": [
          {
            "vd_id": "821db145028c41a5b7bdd5257be3e1f1",
            "sock_addr": "localhost:9720",
            "pd_name": "pd0",
            "size": 67108864,
            "qos": {},
            "be_err_info": {
              "timestamp": "2021-06-22 05:45:47.47142903 +0000 UTC"
            },
            "fe_err_info": {
              "timestamp": "2021-06-22 05:45:51.231529123 +0000 UTC"
            }
          },
          {
            "vd_id": "5a786119a887413ea39716b0baf419cd",
            "vd_idx": 1,
            "sock_addr": "localhost:9721",
            "pd_name": "pd1",
            "size": 67108864,
            "qos": {},
            "be_err_info": {
              "timestamp": "2021-06-22 05:45:47.947491643 +0000 UTC"
            },
            "fe_err_info": {
              "timestamp": "2021-06-22 05:45:49.663537187 +0000 UTC"
            }
          }
        ]
      }
    ]
  }
}

There are two cntlrs in the cntlr_list. We can find "is_primary": true in the first cntlr, so it is the primary. There are also two VDs in the vd_list: one is allocated from localhost:9720/pd0, and the other from localhost:9721/pd1.
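
As a side note, the sizes in this output follow directly from the da create parameters: the group covers the full --size-mb 128, and with --strip-cnt 2 it is striped across two VDs of half that size each:

echo $((128 * 1024 * 1024))       # 134217728, the grp size
echo $((128 * 1024 * 1024 / 2))   # 67108864, each vd size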

Create exp0a

Run the below command to create an EXP:

./vda_cli exp create --da-name da0 --exp-name exp0a \
--initiator-nqn nqn.2016-06.io.spdk:host0

Get exp0a status

Run the below command to get the EXP status:

./vda_cli exp get --da-name da0 --exp-name exp0a

Below is an example response:

{
  "reply_info": {
    "req_id": "0b05cada-25f7-4cf5-aac1-cbc1d4f77779",
    "reply_msg": "succeed"
  },
  "exporter": {
    "exp_id": "e01d5adb4f694591afdce2838b9112d9",
    "exp_name": "exp0a",
    "initiator_nqn": "nqn.2016-06.io.spdk:host0",
    "target_nqn": "nqn.2016-06.io.vda:exp-da0-exp0a",
    "serial_number": "c5e94c313982b7e362dd",
    "model_number": "VDA_CONTROLLER",
    "exp_info_list": [
      {
        "nvmf_listener": {
          "tr_type": "tcp",
          "adr_fam": "ipv4",
          "tr_addr": "127.0.0.1",
          "tr_svc_id": "4430"
        },
        "err_info": {
          "timestamp": "2021-06-22 05:50:16.047444703 +0000 UTC"
        }
      },
      {
        "cntlr_idx": 1,
        "nvmf_listener": {
          "tr_type": "tcp",
          "adr_fam": "ipv4",
          "tr_addr": "127.0.0.1",
          "tr_svc_id": "4431"
        },
        "err_info": {
          "timestamp": "2021-06-22 05:50:18.039508566 +0000 UTC"
        }
      }
    ]
  }
}

We can see two items in the exp_info_list; they are the two EXPs on the two cntlrs. The host can connect to both of them.

Connect to the DA/EXP

Install the nvme-tcp kernel module:

sudo modprobe nvme-tcp

Install nvme-cli. For example, you may run the below command on an Ubuntu system:

sudo apt install -y nvme-cli

Now we can connect to the two cntlrs:

sudo nvme connect -t tcp -n nqn.2016-06.io.vda:exp-da0-exp0a -a 127.0.0.1 -s 4430 --hostnqn nqn.2016-06.io.spdk:host0
sudo nvme connect -t tcp -n nqn.2016-06.io.vda:exp-da0-exp0a -a 127.0.0.1 -s 4431 --hostnqn nqn.2016-06.io.spdk:host0

If kernel NVMe multipath is enabled, the two cntlrs will be aggregated into a single device automatically. You may run the below command to check whether NVMe multipath is enabled:

grep CONFIG_NVME_MULTIPATH /boot/config-$(uname -r)
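
With multipath enabled, the two connections show up as a single namespace; you can inspect it with nvme-cli (list-subsys availability depends on your nvme-cli version):

sudo nvme list
sudo nvme list-subsys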

You may use it as a normal disk on the host, e.g.:

sudo parted /dev/disk/by-id/nvme-VDA_CONTROLLER_c5e94c313982b7e362dd print
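
For example, a minimal sketch of putting a filesystem on it and mounting it, assuming the same by-id path as in the parted example above (remember to umount it before running nvme disconnect in the cleanup below):

sudo mkfs.ext4 /dev/disk/by-id/nvme-VDA_CONTROLLER_c5e94c313982b7e362dd
sudo mount /dev/disk/by-id/nvme-VDA_CONTROLLER_c5e94c313982b7e362dd /mnt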

Check the cluster status

List all the CNs:

./vda_cli cn list

Result:

{
  "reply_info": {
    "req_id": "68a165e6-5314-43f8-9561-c1ba506a79dc",
    "reply_msg": "succeed"
  },
  "token": "L3ZkYS9saXN0L2NuLzAwMDBhMmQ4QGxvY2FsaG9zdDo5ODIw",
  "cn_summary_list": [
    {
      "sock_addr": "localhost:9821"
    },
    {
      "sock_addr": "localhost:9820"
    }
  ]
}

You can find all the sock_addr values in the cn_summary_list. If there are too many CNs, the result will be paginated. You can use vda_cli cn list --token xxxx to get the next page; the token xxxx can be found in the previous result.
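
For example, using the token from the reply above (with only two CNs there is no further page to fetch, so this is just to illustrate the pattern):

./vda_cli cn list --token L3ZkYS9saXN0L2NuLzAwMDBhMmQ4QGxvY2FsaG9zdDo5ODIw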

After we know the sock_addr of a CN, we can check its status:

./vda_cli cn get --sock-addr localhost:9820

Result:

{
  "reply_info": {
    "req_id": "85f4bac4-3041-438f-aefa-3940ed84c28d",
    "reply_msg": "succeed"
  },
  "controller_node": {
    "cn_id": "058a4172396c441885dd3286c122ff4e",
    "sock_addr": "localhost:9820",
    "nvmf_listener": {
      "tr_type": "tcp",
      "adr_fam": "ipv4",
      "tr_addr": "127.0.0.1",
      "tr_svc_id": "4430"
    },
    "hash_code": 41688,
    "err_info": {
      "timestamp": "2021-06-22 05:50:16.207509206 +0000 UTC"
    },
    "cntlr_fe_list": [
      {
        "cntlr_id": "0ee93ac9fee54eb99e0ae0095e2c523c",
        "da_name": "da0",
        "is_primary": true,
        "err_info": {
          "timestamp": "2021-06-22 05:50:16.047447453 +0000 UTC"
        },
        "grp_fe_list": [
          {
            "grp_id": "45d0135352ed4620a760f874ca8f1560",
            "size": 134217728,
            "err_info": {
              "timestamp": "2021-06-22 05:50:15.539520506 +0000 UTC"
            },
            "vd_fe_list": [
              {
                "vd_id": "821db145028c41a5b7bdd5257be3e1f1",
                "size": 67108864,
                "err_info": {
                  "timestamp": "2021-06-22 05:50:15.475493961 +0000 UTC"
                }
              },
              {
                "vd_id": "5a786119a887413ea39716b0baf419cd",
                "vd_idx": 1,
                "size": 67108864,
                "err_info": {
                  "timestamp": "2021-06-22 05:50:15.443433468 +0000 UTC"
                }
              }
            ]
          }
        ],
        "snap_fe_list": [
          {
            "snap_id": "68a303d4411a442dbd07d5bc4912f0a9",
            "err_info": {
              "timestamp": "2021-06-22 05:50:15.667518335 +0000 UTC"
            }
          }
        ],
        "exp_fe_list": [
          {
            "exp_id": "e01d5adb4f694591afdce2838b9112d9",
            "err_info": {
              "timestamp": "2021-06-22 05:50:16.047444703 +0000 UTC"
            }
          }
        ]
      }
    ]
  }
}

The controller_node field has the basic information of this CN. The cntlr_fe_list field has all the cntlrs of this CN.
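
You can run the same query against the other CN; it should show the secondary cntlr of da0 on localhost:9821:

./vda_cli cn get --sock-addr localhost:9821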

List all the DNs:

./vda_cli dn list

Result:

{
  "reply_info": {
    "req_id": "912b7d2c-31ec-42f1-aece-88f1e89c7254",
    "reply_msg": "succeed"
  },
  "token": "L3ZkYS9saXN0L2RuLzAwMDBjZjg3QGxvY2FsaG9zdDo5NzIw",
  "dn_summary_list": [
    {
      "sock_addr": "localhost:9721"
    },
    {
      "sock_addr": "localhost:9720"
    }
  ]
}

Similar to CNs, after we have the DN sock_addr list, we can check each individual DN:

./vda_cli dn get --sock-addr localhost:9720

Result:

{
  "reply_info": {
    "req_id": "d2e19fa3-394c-4add-bba1-b124ad769726",
    "reply_msg": "succeed"
  },
  "disk_node": {
    "dn_id": "07ff85310a864b449ce9b53231e8389f",
    "sock_addr": "localhost:9720",
    "version": 3,
    "nvmf_listener": {
      "tr_type": "tcp",
      "adr_fam": "ipv4",
      "tr_addr": "127.0.0.1",
      "tr_svc_id": "4420"
    },
    "hash_code": 53127,
    "err_info": {
      "timestamp": "2021-06-22 05:45:47.567450797 +0000 UTC"
    }
  }
}

The result shows the basic information of this DN, but it doesn’t have any PD information. We can list all PDs on a given DN:

./vda_cli pd list --sock-addr localhost:9720

Result:

{
  "reply_info": {
    "req_id": "59d5ac98-1b65-4849-b211-060f563eecff",
    "reply_msg": "succeed"
  },
  "pd_summary_list": [
    {
      "pd_name": "pd0"
    }
  ]
}

Then we can get the details of a given PD:

./vda_cli pd get --sock-addr localhost:9720 --pd-name pd0

Result:

{
  "reply_info": {
    "req_id": "cfee6d23-c042-480f-b63c-9671b0c1cd36",
    "reply_msg": "succeed"
  },
  "physical_disk": {
    "pd_id": "e86bb5e03b2446e48ac9465aacf602eb",
    "pd_name": "pd0",
    "total_size": 264241152,
    "free_size": 197132288,
    "total_qos": {},
    "free_qos": {},
    "BdevType": {
      "BdevMalloc": {
        "size": 268435456
      }
    },
    "err_info": {
      "timestamp": "2021-06-22 05:45:47.503458789 +0000 UTC"
    },
    "vd_be_list": [
      {
        "vd_id": "821db145028c41a5b7bdd5257be3e1f1",
        "da_name": "da0",
        "size": 67108864,
        "qos": {},
        "cntlr_id": "0ee93ac9fee54eb99e0ae0095e2c523c",
        "err_info": {
          "timestamp": "2021-06-22 05:45:47.47142903 +0000 UTC"
        }
      }
    ]
  }
}

The vd_be_list field lists all the VDs allocated from this PD.

Clean up all resources

  • Disconnect from the host:

    sudo nvme disconnect -n nqn.2016-06.io.vda:exp-da0-exp0a
    

    You should get the below output:

    NQN:nqn.2016-06.io.vda:exp-da0-exp0a disconnected 2 controller(s)
    

    It indicates that both controllers are disconnected.

  • Delete the exp0a:

    ./vda_cli exp delete --da-name da0 --exp-name exp0a
    
  • Delete the da0:

    ./vda_cli da delete --da-name da0
    
  • Delete the cn0:

    ./vda_cli cn delete --sock-addr localhost:9820
    
  • Delete the cn1:

    ./vda_cli cn delete --sock-addr localhost:9821
    
  • Delete the pd0:

    ./vda_cli pd delete --sock-addr localhost:9720 --pd-name pd0
    
  • Delete the dn0:

    ./vda_cli dn delete --sock-addr localhost:9720
    
  • Delete the pd1:

    ./vda_cli pd delete --sock-addr localhost:9721 --pd-name pd1
    
  • Delete the dn1:

    ./vda_cli dn delete --sock-addr localhost:9721
    
  • Terminate all the processes:

    killall vda_portal
    killall vda_monitor
    killall vda_dn_agent
    killall vda_cn_agent
    killall etcd
    ./spdk/scripts/rpc.py -s /tmp/vda_data/dn0.sock spdk_kill_instance SIGTERM
    ./spdk/scripts/rpc.py -s /tmp/vda_data/dn1.sock spdk_kill_instance SIGTERM
    ./spdk/scripts/rpc.py -s /tmp/vda_data/cn0.sock spdk_kill_instance SIGTERM
    ./spdk/scripts/rpc.py -s /tmp/vda_data/cn1.sock spdk_kill_instance SIGTERM
    
  • Delete the work directory:

    rm -rf /tmp/vda_data
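
  • Optionally, confirm that nothing from this guide is still running (a quick sketch using pgrep, which never matches itself):

    pgrep -af 'vda_|etcd' || echo "all processes stopped"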