
I recently began learning about Ceph and wrote my own Ansible scripts to set up a Ceph cluster (yes, I'm aware that cephadm-ansible exists, but I want to get comfortable managing Ceph myself).

Initially, I provisioned the default rgw service and then tried to create another zone with the pools I wanted and make it the default, but that didn't work so well (the service seemed to run, but was inaccessible). The script is here: https://github.com/Magnitus-/ansible-playbooks/blob/23385216d078251939a6ad03f197d3aad9a79516/roles/ceph/rgw/templates/setup_rgw_service.sh.j2
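
Roughly, the kind of commands involved in that approach look like this (this is a sketch rather than the exact script; myrealm/myzonegroup/myzone, the endpoint and the placement are placeholders, and the orchestrator flags vary a bit between Ceph releases):

    # create a realm/zonegroup/zone and make them the defaults
    radosgw-admin realm create --rgw-realm=myrealm --default
    radosgw-admin zonegroup create --rgw-zonegroup=myzonegroup --rgw-realm=myrealm --endpoints=http://rgw-host:8080 --master --default
    radosgw-admin zone create --rgw-zonegroup=myzonegroup --rgw-zone=myzone --endpoints=http://rgw-host:8080 --master --default
    radosgw-admin period update --commit

    # point the gateway service at the new realm/zone
    ceph orch apply rgw myservice --realm=myrealm --zone=myzone --placement="1 rgw-host"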

I deleted the new zone, removed the rgw service and cleaned up all the pools except for .rgw.root, because the documentation warned against removing that one (in retrospect, I probably should have removed it too).
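
That cleanup amounted to something along these lines (zone, service and pool names are placeholders for whatever your setup created):

    # detach the extra zone from its zonegroup, delete it and commit the change
    radosgw-admin zonegroup remove --rgw-zonegroup=myzonegroup --rgw-zone=myzone
    radosgw-admin zone delete --rgw-zone=myzone
    radosgw-admin period update --commit

    # remove the cephadm-managed gateway service
    ceph orch rm rgw.myservice

    # pool deletion has to be enabled explicitly before pools can be removed
    ceph config set mon mon_allow_pool_delete true
    ceph osd pool rm default.rgw.meta default.rgw.meta --yes-i-really-really-mean-it   # repeat for the other rgw pools, leaving .rgw.root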

I then re-provisioned the rgw service, but this time I pre-created the default pools with the settings I wanted before doing so: https://github.com/Magnitus-/ansible-playbooks/blob/main/roles/ceph/rgw/templates/setup_rgw_pools.sh.j2 https://github.com/Magnitus-/ansible-playbooks/blob/main/roles/ceph/rgw/templates/setup_rgw_service.sh.j2
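
The gist of pre-creating the pools is the following (the pg counts and the erasure-code profile here are just illustrative, not necessarily what the linked scripts use):

    # example erasure-code profile for the bucket data pool
    ceph osd erasure-code-profile set rgw-data-ec k=2 m=1 crush-failure-domain=host

    # pre-create the default zone's pools with the settings you want;
    # the gateway reuses pools that already exist instead of creating them
    ceph osd pool create default.rgw.buckets.data 32 32 erasure rgw-data-ec
    ceph osd pool create default.rgw.buckets.index 16 16 replicated
    ceph osd pool create default.rgw.meta 8 8 replicated
    ceph osd pool create default.rgw.log 8 8 replicated
    ceph osd pool create default.rgw.control 8 8 replicated
    for pool in default.rgw.buckets.data default.rgw.buckets.index default.rgw.meta default.rgw.log default.rgw.control; do
        ceph osd pool application enable "$pool" rgw
    done

    # then deploy the gateway as usual
    ceph orch apply rgw default --placement="1 rgw-host"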

That seemed to work well. However, when I list the users by running radosgw-admin user list, I do get the list, but it is preceded by this:

2024-02-05T13:20:40.999+0000 7f37fd330a40  0 failed reading obj info from .rgw.root:realms.c08fb4e1-502c-42f2-98b9-63202f161420: (2) No such file or directory
2024-02-05T13:20:40.999+0000 7f37fd330a40  0 failed reading obj info from .rgw.root:realms.c08fb4e1-502c-42f2-98b9-63202f161420: (2) No such file or directory
2024-02-05T13:20:41.003+0000 7f37fd330a40  0 failed reading obj info from .rgw.root:realms.c08fb4e1-502c-42f2-98b9-63202f161420: (2) No such file or directory
2024-02-05T13:20:41.075+0000 7f37fd330a40  0 failed reading obj info from .rgw.root:realms.c08fb4e1-502c-42f2-98b9-63202f161420: (2) No such file or directory

I'm guessing I have some corruption left over from my previous setup. It hasn't affected me so far (well, creating a user with read-only access on some buckets really feels like pulling teeth, but I have the same problem on a fresh virtualized test Ceph cluster), but I feel like I should clean house before it becomes a problem, so I want to re-provision a fresh rgw service (no previous metadata, nothing).
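
For what it's worth, this is how I'd poke at what's left behind (these commands only read metadata, they don't change anything):

    # realms/periods rgw currently knows about
    radosgw-admin realm list
    radosgw-admin period list

    # raw objects stored in .rgw.root; the realms.<uuid> object from the
    # error above is what's being looked up and not found
    rados -p .rgw.root ls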

First, I want to migrate the data within the same Ceph cluster (I don't have the disk capacity outside my Ceph cluster, I don't want to pay cloud egress fees and I don't want to re-download everything). Ideally, I'd provision another rgw service that doesn't use .rgw.root and rclone to it, but I get the feeling that might be a tall order. Instead, I guess I'll figure out how to set up CephFS, mount a volume and rclone my buckets to it (I have enough capacity in my Ceph cluster to duplicate the data).
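
A minimal sketch of that CephFS detour, assuming a volume named "migration", a client named client.migration and an rclone remote called "s3" that points at the gateway:

    # create a CephFS volume; the orchestrator deploys MDS daemons for it
    ceph fs volume create migration

    # create a restricted client key and mount the filesystem
    # (exact mount options depend on the kernel client version; mon and key
    # lookup here rely on /etc/ceph/ceph.conf and the keyring file)
    ceph fs authorize migration client.migration / rw > /etc/ceph/ceph.client.migration.keyring
    mkdir -p /mnt/migration
    mount -t ceph :/ /mnt/migration -o name=migration,fs=migration

    # copy each bucket out of rgw (repeat or loop over "rclone lsd s3:")
    rclone sync s3:my-bucket /mnt/migration/my-bucket --progress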

Then, to get a squeaky clean rgw service, I guess I'll remove it like last time and clean all its pools, but this time I'll also clean up .rgw.root, and then I'll be good? No more ghosts from the past?
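
Concretely, the teardown I have in mind looks like this (destructive, so only after the data is copied out; the service name depends on what you called it):

    # stop the gateway so nothing recreates pools while they're being deleted
    ceph orch rm rgw.default

    # allow pool deletion, then remove every rgw pool, .rgw.root included
    # (double-check that the grep only matches rgw pools in your cluster)
    ceph config set mon mon_allow_pool_delete true
    for pool in $(ceph osd pool ls | grep '\.rgw\.'); do
        ceph osd pool rm "$pool" "$pool" --yes-i-really-really-mean-it
    done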

Magnitus

1 Answer


I validated it in a virtualized Ceph cluster: removing the service and all the pools related to rgw works well (assuming you have a way to transfer all the data you want to keep out of the rgw pools, which I do in my case).

I had forgotten about it, but a Ceph cluster starts with just the .mgr pool and nothing else. The .rgw.root pool only gets added later, when you set up the RADOS Gateway service.
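
You can see this on a test cluster with just (placement is an example; give the daemon a moment to start before the second listing):

    # right after bootstrap, only the manager's pool exists
    ceph osd pool ls

    # once a gateway is deployed, .rgw.root and the default.rgw.* pools show up
    ceph orch apply rgw default --placement="1 some-host"
    ceph osd pool ls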

Also, from what I could tell so far, rgw's state doesn't appear to leak outside its pools, so just shutting off the service and deleting all of its pools seems to be enough to start again with a clean slate.

Magnitus