I'm setting up a new Ceph cluster for testing purposes using virtual machines on Hyper-V, all running Debian 12 (Bookworm). Here's the current setup:
4 VMs, each with:
- 40 GB disk for the system
- 1000 GB disk for Ceph (clean, no partitions or filesystem)
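For reference, this is how I check on each OSD VM that the 1000 GB data disk is really bare (sdb is just an example device name; it may differ per VM):
lsblk -o NAME,SIZE,TYPE,FSTYPE     # the Ceph disk should show no partitions and an empty FSTYPE
sudo wipefs --no-act /dev/sdb      # dry run: lists any leftover filesystem/RAID signatures without erasing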
Hostnames and Network
Each VM has a static IP and hostname (one command per VM):
sudo hostnamectl set-hostname adm001
sudo hostnamectl set-hostname osd001
sudo hostnamectl set-hostname osd002
sudo hostnamectl set-hostname osd003
All hosts share the following /etc/hosts entries:
192.168.155.10 adm001
192.168.155.21 osd001
192.168.155.22 osd002
192.168.155.23 osd003
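A quick check I run from each node to rule out name-resolution or reachability problems (plain shell, nothing Ceph-specific):
for h in adm001 osd001 osd002 osd003; do getent hosts $h && ping -c1 -W1 $h; done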
Installed packages on all nodes:
sudo apt -y install podman lvm2
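And a quick sanity check that the container runtime and LVM tooling cephadm relies on are actually usable:
podman --version
sudo lvm version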
Ceph Installation
On adm001, I installed Ceph 19.2.1:
CEPH_RELEASE=19.2.1
curl --silent --remote-name --location https://download.ceph.com/rpm-${CEPH_RELEASE}/el9/noarch/cephadm
chmod +x cephadm
./cephadm add-repo --release squid
./cephadm install
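To confirm the downloaded binary and the packaged one end up on the same release, cephadm has a version subcommand:
./cephadm version
cephadm version    # after ./cephadm install, should report the same 19.2.x release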
Then I bootstrapped the cluster:
cephadm bootstrap --mon-ip=192.168.155.10 \
--apply-spec=initial-config-primary-cluster.yaml \
--initial-dashboard-password='niceone' \
--dashboard-password-noupdate \
--ssh-user=admin
The initial-config-primary-cluster.yaml file defines the cluster topology (full contents at the end of this post). After about 20 minutes, the installation completes successfully. I can access the Ceph Dashboard, and the cluster looks healthy:
ceph status
Output:
  cluster:
    id:     ...
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum adm001,osd001
    mgr: adm001 (active)
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 33 pgs
    usage:   146 MiB used, 3.0 TiB / 3.0 TiB avail
    pgs:     33 active+clean

  io:
    client:   3.0 KiB/s rd, 1 op/s rd
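For completeness, these are the orchestrator-level views from adm001 that I can also share if useful (standard cephadm orchestrator commands):
ceph orch host ls      # the four hosts and their labels
ceph orch ls           # services deployed from the spec (mon, mgr, osd, ...)
ceph orch ps           # individual daemons and their current state
ceph orch device ls    # whether the 1000GB disks were picked up and consumed as OSDs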
NVMe-oF Configuration (via Dashboard)
- Created a pool named pool1 and assigned rbd as its application.
- Set up the NVMe/TCP gateway service on osd002 and osd003.
- Created a subsystem: nqn.2001-07.com.ceph:1742891486659
- Created a Namespace with image name nvme_ns_image:1742891396750, size 1TB, in pool1.
- Created a Listener on osd002 at port 4420.
- Added an initiator (the host NQN from the Linux client); I also tried allowing all hosts.
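For reference, these are the CLI checks I can run to confirm the gateway side (standard cephadm/ceph/rbd commands; the subsystem, namespace and listener above were created through the dashboard):
ceph orch ls nvmeof                    # the nvmeof gateway service and its placement
ceph orch ps --daemon-type nvmeof      # the gateway daemons on osd002/osd003 and whether they are running
ceph osd pool application get pool1    # should list rbd
rbd ls pool1                           # the namespace's backing RBD image should appear here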
Client-Side (Linux)
On the Linux client:
sudo apt install nvme-cli
sudo modprobe nvme-tcp
sudo nvme connect -t tcp -a 192.168.155.22 -n "nqn.2001-07.com.ceph:1742891486659"
Output:
message: connecting to device: nvme0
But:
- lsblk shows no new device
- nvme list only shows the header, no devices
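On the client side, these are the standard nvme-cli diagnostics I can run and post output from if that helps (nothing Ceph-specific; port 8009 is the usual NVMe/TCP discovery port and is an assumption on my part):
sudo nvme discover -t tcp -a 192.168.155.22 -s 8009    # discovery log page: the subsystem NQN should be advertised here
sudo nvme list-subsys                                  # shows whether the controller connection is actually live
sudo dmesg | grep -i nvme                              # kernel messages, e.g. connect errors or namespace issues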
Client-Side (Windows)
I also tried the StarWind NVMe-oF Initiator on Windows. The connection succeeds, but no usable device is listed.
Question: What might I be missing in the Ceph NVMe-oF setup? Why is no device appearing on either the Linux or Windows client? The image and listener seem to be set up correctly, and the connection doesn't error out — but no usable device is visible.
Any hints, suggestions, or troubleshooting steps are highly appreciated!
initial-config-primary-cluster.yaml:
service_type: host
hostname: adm001
addr: 192.168.155.10
location:
  root: default
  datacenter: DC1
  rack: HVHost1
labels:
  - mon
  - mgr
  - admin
---
service_type: host
hostname: osd001
addr: 192.168.155.21
location:
  root: default
  datacenter: DC1
  rack: HVHost1
labels:
  - osd
  - mon
---
service_type: host
hostname: osd002
addr: 192.168.155.22
location:
  root: default
  datacenter: DC1
  rack: HVHost1
labels:
  - osd
  - nvme
---
service_type: host
hostname: osd003
addr: 192.168.155.23
location:
  root: default
  datacenter: DC1
  rack: HVHost1
labels:
  - osd
  - nvme
---
service_type: mon
placement:
  label: "mon"
---
service_type: mgr
placement:
  label: "mgr"
---
service_type: osd
service_id: default_drive_group
spec:
  data_devices:
    all: true
placement:
  label: "osd"