Have you been able to resolve this?
TL;DR: enabling SR-IOV seems to be required.
We have seen a similar, probably the same, issue with 3 A16 cards: only 1 physical card was working.
One thing we noticed was that the system complained about memory overlap in the dmesg logs. The other 2 cards were not able to map memory.
This can be checked with lspci. Below is an example of the output when it does not work. Each A16 board shows up as four GPU functions, so the first four (5a:00.0 to 5d:00.0) are the working card and the remaining eight, marked <ignored>, are the two cards without mapped memory:
sudo lspci | grep NVIDIA | cut -d ' ' -f1 | xargs -I@ bash -c 'echo @; sudo lspci -v -s @ | grep non-prefetchable'
5a:00.0
Memory at bd000000 (32-bit, non-prefetchable) [size=16M]
5b:00.0
Memory at bf000000 (32-bit, non-prefetchable) [size=16M]
5c:00.0
Memory at c1000000 (32-bit, non-prefetchable) [size=16M]
5d:00.0
Memory at c3000000 (32-bit, non-prefetchable) [size=16M]
c6:00.0
Memory at <ignored> (32-bit, non-prefetchable)
c7:00.0
Memory at <ignored> (32-bit, non-prefetchable)
c8:00.0
Memory at <ignored> (32-bit, non-prefetchable)
c9:00.0
Memory at <ignored> (32-bit, non-prefetchable)
de:00.0
Memory at <ignored> (32-bit, non-prefetchable)
df:00.0
Memory at <ignored> (32-bit, non-prefetchable)
e0:00.0
Memory at <ignored> (32-bit, non-prefetchable)
e1:00.0
Memory at <ignored> (32-bit, non-prefetchable)
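If you only want the list of functions that failed to get a region assigned, the check above can be condensed. This is a rough sketch, not something we ran in exactly this form: the helper name unmapped_bars is made up for illustration, and it assumes the usual lspci -v layout (device header in column 0, details indented below it).

```shell
# Hypothetical helper: reads `lspci -v` output on stdin and prints the address
# of every device that has at least one "Memory at <ignored>" region.
unmapped_bars() {
    awk '
        /^[0-9a-f]/ { dev = $1 }   # header line, e.g. "c6:00.0 3D controller: ..."
        /Memory at <ignored>/ && dev != last { print dev; last = dev }
    '
}

# On a live system (10de is the NVIDIA PCI vendor ID):
#   sudo lspci -v -d 10de: | unmapped_bars
```

An empty result would mean every NVIDIA function got its memory regions assigned.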
Once we enabled SR-IOV, all cards got their memory mapped.
We had already been enabling the Intel IOMMU in the boot options (AMD probably has an equivalent), since all our other cards required it. So if you haven't enabled it yet, you will probably need it with A16 cards too.
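To see whether the running kernel was booted with the Intel IOMMU option, something like the following should work. It is a sketch under one assumption: that the boot option in question is the standard intel_iommu=on kernel parameter (the exact option is not spelled out above). The function name has_intel_iommu is made up for illustration.

```shell
# Hypothetical helper: reads a kernel command line on stdin and succeeds
# only if intel_iommu=on appears as a separate parameter.
has_intel_iommu() {
    tr ' ' '\n' | grep -qx 'intel_iommu=on'
}

# On a live system:
#   has_intel_iommu < /proc/cmdline && echo "intel_iommu=on is set"
```

If it is missing, the parameter is normally added through the bootloader configuration and takes effect after a reboot.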
Some of the error logs we collected:
pnp 00:01: disabling [mem 0xff000000-0xffffffff disabled] because it overlaps 0000:e1:00.0 BAR 8 [mem 0x00000000-0x7ffffffff 64bit pref]
pci 0000:e1:00.0: BAR 8: no space for [mem size 0x800000000 64bit pref]
vfio 0000:ca:00.0: hardware reports invalid configuration, MSIX PBA outside of specified BAR
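To look for the same kind of messages on your own machine, you can filter the kernel log for them. This is only a sketch: the patterns are taken from the three lines above, the name bar_errors is made up, and other kernel versions may word these messages differently, so an empty result does not prove there is no problem.

```shell
# Hypothetical helper: reads kernel log text on stdin and keeps lines that
# look like the BAR allocation / overlap / MSI-X errors quoted above.
bar_errors() {
    grep -E 'BAR [0-9]+: no space for|because it overlaps|MSIX PBA outside of specified BAR'
}

# On a live system:
#   sudo dmesg | bar_errors
```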