
Just as a test, I created and mounted the same XFS file system on two hosts, backed by a shared device (pmem). Host A created a file in its mounted directory and ran the sync command to ensure that xfs_db could see the newly created inode information. However, the new file is not visible on Host B until Host B unmounts and remounts the filesystem. I would like to know why.

I noticed that ls (via the getdents system call) eventually calls xfs_readdir(), which uses XFS's on-disk format to retrieve inode information. Does this process actually access the disk, or is some of the xfs_inode metadata cached in memory once the file system is mounted?
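
For reference, here is a minimal sketch of the userspace side of that path: ls ends up issuing getdents64(2) on the directory file descriptor, and the VFS dispatches that to the filesystem's directory iterator (xfs_readdir() on XFS). Whether that iterator touches the device or is satisfied from cached metadata is up to the kernel. The record layout below mirrors the getdents64(2) man page; the directory path is just whatever you pass in.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout of the records returned by getdents64(2). */
struct linux_dirent64 {
    unsigned long long d_ino;     /* inode number, as XFS stores it on disk */
    long long          d_off;     /* offset to the next record */
    unsigned short     d_reclen;  /* length of this record */
    unsigned char      d_type;    /* file type, if the filesystem reports it */
    char               d_name[];  /* NUL-terminated entry name */
};

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[32768];
    for (;;) {
        /* Same system call ls makes via readdir(3). The VFS hands it to
         * the filesystem's directory iterator (xfs_readdir() on XFS);
         * whether that reads the device or uses cached metadata is the
         * kernel's decision, not visible from here. */
        long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));
        if (n < 0) { perror("getdents64"); return 1; }
        if (n == 0) break;

        for (long off = 0; off < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            printf("%llu\t%s\n", d->d_ino, d->d_name);
            off += d->d_reclen;
        }
    }
    close(fd);
    return 0;
}
```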

2 Answers


What you're seeing is absolutely normal: XFS isn't built for multi-host setups, it's not a clustered file system. By design!

In a nutshell:

Host A updates its in-memory metadata and syncs it to disk, but Host B doesn't know about it, since the two hosts don't share cache updates. The caches aren't coherent, and xfs_readdir() mostly pulls from memory unless it has to hit the disk, so Host B's view stays stale.

Unmounting on Host B forces it to refresh from disk, which is why the file shows up.
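
If you want to convince yourself the metadata really did land on the shared device (same idea as the xfs_db check in the question), read the device directly instead of going through Host B's mount. Here's a minimal sketch that pulls the primary XFS superblock off the raw device with O_DIRECT, so it sees the media rather than this host's caches. The field offsets follow the on-disk struct xfs_dsb as I understand it, so double-check them against fs/xfs/libxfs/xfs_format.h, and treat /dev/pmem0 as a placeholder for your shared device:

```c
#define _GNU_SOURCE
#include <endian.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/pmem0";   /* placeholder */
    int fd = open(dev, O_RDONLY | O_DIRECT);               /* bypass the page cache */
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT needs an aligned buffer; 4 KiB covers common sector sizes. */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096)) { perror("posix_memalign"); return 1; }
    if (pread(fd, buf, 4096, 0) != 4096) { perror("pread"); return 1; }

    /* First fields of the on-disk superblock (all big-endian):
     * offset 0: sb_magicnum, offset 4: sb_blocksize, offset 8: sb_dblocks. */
    const uint8_t *sb = buf;
    uint32_t magic, bsize;
    uint64_t dblocks;
    memcpy(&magic, sb + 0, sizeof(magic));
    memcpy(&bsize, sb + 4, sizeof(bsize));
    memcpy(&dblocks, sb + 8, sizeof(dblocks));

    if (be32toh(magic) != 0x58465342) {                    /* "XFSB" */
        fprintf(stderr, "%s: no XFS superblock at offset 0\n", dev);
        return 1;
    }
    printf("XFS superblock on %s: blocksize=%u, dblocks=%llu\n",
           dev, be32toh(bsize), (unsigned long long)be64toh(dblocks));

    free(buf);
    close(fd);
    return 0;
}
```

This is why tools like xfs_db can see what Host A synced while Host B's mounted view doesn't change: reading the raw device shows whatever is on the media, the mounted filesystem serves you its cache.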

Bottom line:

XFS doesn't handle multi-host mounts. If you need that, roll with a clustered file system like GFS2 or OCFS2. You can bring in a distributed lock manager, say SANlock. Or stick with a 'network redirector' like NFS or SMB3, which is the safest way to go, hands down! There's some good reading on the topic, see:

https://forums.starwindsoftware.com/viewtopic.php?t=1392

Hint:

Just ignore the iSCSI and StarWind context; everything these guys are talking about is completely relevant to any shared block storage, whatever the vendor.

NISMO1968

You're seeing the perfectly normal behavior of shared storage access without a filesystem designed for that usage (currently, on mainline Linux, the options for that are GFS2 and OCFS2, neither of which is great).

To answer your title question: any non-clustered filesystem on Linux caches directory entries and inodes (and, notably, the filesystem superblock), because the VFS layer itself does that. This has nothing to do with XFS specifically. XFS may do additional caching of its own, but what you demonstrated is behavior you would also see with ext4, BTRFS, F2FS, and essentially any other Linux filesystem except GFS2 and OCFS2.
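
To see that VFS-level caching directly, you can watch the dentry count reported by /proc/sys/fs/dentry-state grow while a directory tree is scanned. A minimal sketch (the first field of that file is nr_dentry; /usr/share is just an example path, and the numbers will bounce around a bit because the kernel may reclaim dentries at any time):

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the current number of dentries the VFS has cached.
 * /proc/sys/fs/dentry-state starts with nr_dentry. */
static long nr_dentry(void)
{
    FILE *f = fopen("/proc/sys/fs/dentry-state", "r");
    if (!f) { perror("dentry-state"); exit(1); }
    long n = 0;
    if (fscanf(f, "%ld", &n) != 1) { fclose(f); exit(1); }
    fclose(f);
    return n;
}

/* Walk a tree so the kernel has to look up (and thus cache) every entry.
 * Subdirectories are only followed where d_type is filled in, which is
 * fine for a rough demonstration. */
static void walk(const char *path, int depth)
{
    if (depth > 8) return;                 /* keep the example bounded */
    DIR *d = opendir(path);
    if (!d) return;
    struct dirent *e;
    char child[4096];
    while ((e = readdir(d)) != NULL) {
        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
            continue;
        if (e->d_type == DT_DIR) {
            snprintf(child, sizeof(child), "%s/%s", path, e->d_name);
            walk(child, depth + 1);
        }
    }
    closedir(d);
}

int main(int argc, char **argv)
{
    const char *root = argc > 1 ? argv[1] : "/usr/share";  /* example path */
    long before = nr_dentry();
    walk(root, 0);
    long after = nr_dentry();
    printf("dentries cached: %ld -> %ld (+%ld)\n", before, after, after - before);
    return 0;
}
```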

Expanding on this a bit more, essentially any filesystem not designed for shared storage access assumes it has exclusive access to the underlying storage device when mounted. This is really important for performance reasons, because it allows things that are not expected to change to just be cached, which eliminates a lot of unnecessary storage accesses.
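
To make the performance point concrete, here is a rough illustration (the path below is a placeholder): hammer stat() on one file and watch the backing device with something like iostat -x 1 in another terminal. After the first lookup the calls are answered from the dentry and inode caches, so you should see essentially no further reads hitting the device.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(int argc, char **argv)
{
    /* Placeholder path; point it at a file on the filesystem in question. */
    const char *path = argc > 1 ? argv[1] : "/mnt/xfs/testfile";
    struct stat st;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 1000000; i++) {
        /* Each call walks the path through the VFS; after the first one
         * the dentries and the inode are already cached in memory. */
        if (stat(path, &st) != 0) { perror("stat"); return 1; }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("1000000 stat() calls in %.3f s (~%.0f ns each), inode %llu\n",
           secs, secs * 1e3, (unsigned long long)st.st_ino);
    return 0;
}
```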

This, in turn, leads to coherency issues in shared storage situations like the one you set up. What you saw is actually the best case; the worst case is that one or more of the hosts crashes because of a bug in the filesystem driver triggered by on-disk state it never accounted for, such as a torn write.

If you need multiple systems to access the same storage, you need to instead use one of:

  • A clustered filesystem like GFS2 or OCFS2.
  • A network filesystem like NFS or SMB3, backed by non-shared storage.
  • A distributed storage system like Ceph.