I have nearly 300 devices interconnected on a local IPv4 network, and I would like them to discover each other. At the moment I'm using mDNS announcements (via the Avahi library) to achieve this. This works for up to about 50 devices, but beyond 100 devices I started to see a flood of Avahi announce messages on the network (which, in hindsight, is to be expected).

Is there any alternative to mDNS discovery that achieves the same result? All my devices run Arch Linux, and I'm fine with installing any open-source software.

Vencat

2 Answers

I believe the paper referenced by Robert Harvey focuses on locating resources within an established P2P network, while your question is about the initial discovery of possible neighbors. Is that right?

For scalability, I would make the frequency of announce messages inversely proportional to the known number of nodes in the P2P network, so that in the steady state the total number of mDNS messages per unit of time is bounded.
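A minimal Python sketch of that rule, assuming a hypothetical network-wide target of one announcement per `BASE_INTERVAL_S` seconds and that every node's estimate of the node count is roughly right. Each node stretches its own period in proportion to its estimate, so the aggregate rate stays flat as the network grows.

```python
BASE_INTERVAL_S = 60.0  # hypothetical target: ~1 announcement per minute network-wide


def announce_interval(known_nodes: int, base: float = BASE_INTERVAL_S) -> float:
    """Seconds between this node's own announcements, proportional to node count."""
    n = max(1, known_nodes)  # a freshly booted node knows at least itself
    return base * n


def aggregate_rate(known_nodes: int, base: float = BASE_INTERVAL_S) -> float:
    """Expected announcements per second, summed over all nodes.

    N nodes each announcing every base * N seconds give
    N / (base * N) = 1 / base, independent of N.
    """
    n = max(1, known_nodes)
    return n / announce_interval(n, base)


for n in (10, 100, 300):
    print(f"{n:3d} nodes: each announces every {announce_interval(n):6.0f} s, "
          f"network total {aggregate_rate(n) * 60:.2f}/min")
```

The price is slower discovery of a brand-new node by a large network, which is what the node-count estimate trades away for a bounded message rate.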

To achieve that, each device needs to know (or have a good estimate of) the number of other distinct devices reachable via direct or indirect connections. If you have disjoint meshes, the number of announcements per unit of time will be higher, but such disjoint meshes should quickly discover and join each other.
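One simple way to get that estimate, sketched in Python under the assumption (a hypothetical message format) that each announcement carries the sender's full set of known node IDs: every node unions what it hears into its own set, and the set's size is the estimate. Exact sets like this are simplest at a few hundred nodes. When two disjoint meshes first hear each other, both sides jump to the size of the union.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class MeshNode:
    node_id: str
    known: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.known.add(self.node_id)  # a node always counts itself

    def estimate(self) -> int:
        """Current estimate of how many distinct devices are reachable."""
        return len(self.known)

    def on_announce(self, peer_known: Set[str]) -> None:
        """Merge the ID set carried in a peer's announcement."""
        self.known |= peer_known


# Two disjoint meshes: {a1, a2} and {b1, b2, b3}.
a1 = MeshNode("a1", {"a2"})
b1 = MeshNode("b1", {"b2", "b3"})
print(a1.estimate(), b1.estimate())  # 2 3

# The meshes hear each other once and merge their views.
a1.on_announce(b1.known)
b1.on_announce(a1.known)
print(a1.estimate(), b1.estimate())  # 5 5
```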

Scaling up is not too hard. You will need to elect some group leaders.

Also, while it is more powerful than your current setup (at the cost of higher overhead), you might find the notion of reliable multicast or virtual synchrony to be of interest. Ken Birman did great work on this in the 1980s. A host becomes a member of sequentially numbered "views", which may shrink in membership due to power failure or network partition, and may grow when partitions are later merged.
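To make the "numbered views" idea concrete, here is a toy Python sketch of the bookkeeping only, from one host's perspective: each membership change produces a successor view with the next number. The class and method names are hypothetical, and the hard part of virtual synchrony, getting all members to agree on each view change and on which messages were delivered in it, is deliberately not shown.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class View:
    """One numbered membership view, as seen by a single host."""
    view_id: int
    members: FrozenSet[str]

    def shrink(self, lost: FrozenSet[str]) -> "View":
        # Power failure or partition: the successor view drops the lost hosts.
        return View(self.view_id + 1, self.members - lost)

    def grow(self, joined: FrozenSet[str]) -> "View":
        # Partition healed or host back up: the successor view adds them.
        return View(self.view_id + 1, self.members | joined)


v1 = View(1, frozenset({"a", "b", "c", "d"}))
v2 = v1.shrink(frozenset({"c", "d"}))  # a partition cuts off c and d
v3 = v2.grow(frozenset({"c", "d"}))    # the partition heals
print(v2.view_id, sorted(v2.members))  # 2 ['a', 'b']
print(v3.view_id, sorted(v3.members))  # 3 ['a', 'b', 'c', 'd']
```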


Goal: you want both the traffic through a given WAN link and the number of messages seen by a given node to stay bounded even as the number of nodes grows.

  1. Keep sending mDNS messages as you currently do, but at random intervals of roughly 24 hours, plus once soon after boot.
  2. Every node has a unique (mostly persistent) ID, perhaps derived from its MAC address or from a GUID generated at software install time.
  3. Define some "regions" of network topology. If you think in terms of ARP broadcasts, a region corresponds to an IPv4 subnet prefix, e.g. 10.0.2.0/23. You might instead use "nearest leader node" based on ping round-trip time, administrative groupings, or even the SHA3 hash of the MAC address modulo K.
  4. Region leaders send frequent mDNS advertisements, perhaps once per minute. Upon boot, a node pauses for a random delay, sends an advertisement, considers itself an auxiliary leader, and waits for messages. It should soon hear one from the region's leader that contains the node's ID, at which point the node demotes itself to an ordinary participant.
  5. Leaders send "current state of the world" messages that list all of a region's node IDs, together with the timestamp at which the leader last heard from each node. These messages also include administrivia: (A) the region's subleader(s), and (B) the other regions along with their leaders.
  6. Nodes periodically send "I'm alive!" unicast messages to leaders and aux leaders.
  7. Auxiliary leaders send messages similar to the leader's, but less frequently. If the leader becomes unreachable, the lowest-numbered aux leader promotes itself to leader.
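Steps 2, 3 and 7 might look like this in Python. This is a sketch under stated assumptions: node IDs are MAC addresses normalized to lowercase colon-separated form (a hypothetical choice), `NUM_REGIONS` is an arbitrary K, and "lowest numbered" is taken to mean lowest in plain string sort order. `hashlib.sha3_256` and `uuid.uuid4` are standard-library calls; the other names are illustrative.

```python
import hashlib
import uuid
from typing import Iterable, Optional, Set

NUM_REGIONS = 8  # hypothetical K


def node_id_from_mac(mac: str) -> str:
    """Stable node ID: the MAC address, lowercase and colon-separated."""
    return mac.strip().replace("-", ":").lower()


def install_time_guid() -> str:
    """Alternative ID: a random GUID generated once at install, then persisted."""
    return str(uuid.uuid4())


def region_for(node_id: str, k: int = NUM_REGIONS) -> int:
    """Region number in [0, k): SHA3-256 of the ID, reduced modulo k."""
    digest = hashlib.sha3_256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % k


def next_leader(aux_leaders: Iterable[str], reachable: Set[str]) -> Optional[str]:
    """Lowest-numbered aux leader that is still reachable, or None.

    The subleader list comes from the leader's state-of-the-world
    message, so nodes that agree on which aux leaders are reachable
    pick the same successor.
    """
    candidates = sorted(a for a in aux_leaders if a in reachable)
    return candidates[0] if candidates else None


nid = node_id_from_mac("AA-BB-CC-DD-EE-01")
print(nid, "-> region", region_for(nid))
print(next_leader(["aa:bb:cc:dd:ee:07", "aa:bb:cc:dd:ee:03"], {"aa:bb:cc:dd:ee:07"}))
```

Hashing gives an even spread of nodes over regions with no coordination, but unlike the subnet or ping-based options it ignores topology, so a region's members may sit on different subnets.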

This is an asymmetric gossip protocol.

The idea is that everyone can see what the protocol believes the state of the world is, and typically they silently agree with it. Seeing yourself listed in an advertisement from the leader squelches your own advertisement. In some ways this resembles the bridge Spanning Tree Protocol.
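The squelch rule from step 4, sketched in Python. The message shape (`WorldState`) and the role names are hypothetical; the sketch assumes a node listed among the subleaders keeps advertising (at the aux leader's lower rate), and any other node that sees its own ID in the leader's list goes quiet apart from its unicast heartbeats.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Tuple


class Role(Enum):
    AUX_LEADER = "aux_leader"    # a freshly booted node starts here (step 4)
    PARTICIPANT = "participant"  # demoted: only unicast heartbeats (step 6)


@dataclass
class WorldState:
    """Hypothetical shape of a leader's state-of-the-world message (step 5)."""
    leader_id: str
    last_heard: Dict[str, float]  # node ID -> when the leader last heard it
    subleaders: Tuple[str, ...] = ()


@dataclass
class Participant:
    node_id: str
    role: Role = Role.AUX_LEADER
    world: Dict[str, float] = field(default_factory=dict)

    def on_world_state(self, msg: WorldState) -> None:
        self.world = dict(msg.last_heard)  # silently adopt the leader's view
        if self.node_id in msg.subleaders:
            self.role = Role.AUX_LEADER    # designated subleader: keep advertising
        elif self.node_id in msg.last_heard:
            self.role = Role.PARTICIPANT   # leader knows us: squelch our adverts

    def should_advertise(self) -> bool:
        return self.role is Role.AUX_LEADER


p = Participant("n42")
print(p.should_advertise())  # True
p.on_world_state(WorldState("n01", {"n01": 1000.0, "n42": 998.5}))
print(p.should_advertise())  # False
```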

The leader and aux leaders will have overhead proportional to region size. Everyone else wins. For less overhead per leader, use a greater number of smaller regions.
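Back-of-the-envelope arithmetic for that trade-off, assuming N = 300 nodes split evenly over K regions and one heartbeat per node per period; these are simple counts, not measurements. A leader's heartbeat load falls as N/K, but the list of other regions in step 5(B) grows with K, so the size of a state-of-the-world message bottoms out near K = sqrt(N).

```python
def leader_heartbeats(n: int, k: int) -> float:
    """'I'm alive!' messages one leader receives per heartbeat period (step 6)."""
    return n / k


def world_state_entries(n: int, k: int) -> float:
    """Entries in one leader's state-of-the-world message (step 5):
    its own region's nodes, plus one leader entry per other region."""
    return n / k + (k - 1)


N = 300
for k in (1, 5, 17, 30):
    print(f"K={k:2d}  heartbeats/leader={leader_heartbeats(N, k):6.1f}  "
          f"entries/message={world_state_entries(N, k):6.1f}")

best_k = min(range(1, N + 1), key=lambda k: world_state_entries(N, k))
print("smallest message at K =", best_k)  # smallest message at K = 17
```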


Very important: jitter all timers by adding some random amount of delay. Otherwise nodes will synchronize with one another. I have seen this in production with MUA clients hitting a POP3 mail server. The BGP community was impacted by this badly enough that mandatory jitter is now written into the protocol spec.
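A minimal Python sketch of jittered timers. The 0.75 to 1.0 scaling range is an assumption borrowed from the jitter range RFC 4271 specifies for BGP timers; other ranges work too. `jittered` and `run_periodic` are hypothetical helpers, and `sleep` is injectable only so the loop can be exercised without waiting.

```python
import random
import time
from typing import Callable, Optional


def jittered(base_s: float, low: float = 0.75, high: float = 1.0,
             rng: Optional[random.Random] = None) -> float:
    """Scale base_s by a factor drawn uniformly from [low, high]."""
    source = rng if rng is not None else random
    return base_s * source.uniform(low, high)


def run_periodic(task: Callable[[], None], base_s: float, iterations: int,
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Run task repeatedly, drawing a fresh random delay before every call.

    Re-drawing each cycle, rather than picking one offset at boot, spreads
    nodes apart again if they fall into step, e.g. after all reacting
    to the same event.
    """
    for _ in range(iterations):
        sleep(jittered(base_s))
        task()


# Record the delays instead of sleeping, for a nominal 60 s heartbeat.
delays = []
run_periodic(lambda: None, base_s=60.0, iterations=5, sleep=delays.append)
print([round(d, 1) for d in delays])  # five values between 45.0 and 60.0
```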

J_H