// Some design notes and history: // VXLAN encapsulates L2 packets (though flannel is L3 only so don't expect to be able to send L2 packets across hosts) // The first versions of vxlan for flannel registered the flannel daemon as a handler for both "L2" and "L3" misses // - When a container sends a packet to a new IP address on the flannel network (but on a different host) this generates // an L2 miss (i.e. an ARP lookup) // - The flannel daemon knows which flannel host the packet is destined for so it can supply the VTEP MAC to use. // This is stored in the ARP table (with a timeout) to avoid constantly looking it up. // - The packet can then be encapsulated but the host needs to know where to send it. This creates another callout from // the kernal vxlan code to the flannel daemon to get the public IP that should be used for that VTEP (this gets called // an L3 miss). The L2/L3 miss hooks are registered when the vxlan device is created. At the same time a device route // is created to the whole flannel network so that non-local traffic is sent over the vxlan device. // // In this scheme the scaling of table entries (per host) is: // - 1 route (for the configured network out the vxlan device) // - One arp entry for each remote container that this host has recently contacted // - One FDB entry for each remote host // // The second version of flannel vxlan removed the need for the L3MISS callout. When a new remote host is found (either // during startup or when it's created), flannel simply adds the required entries so that no further lookup/callout is required. // // // The latest version of the vxlan backend removes the need for the L2MISS too, which means that the flannel deamon is not // listening for any netlink messages anymore. This improves reliability (no problems with timeouts if // flannel crashes or restarts) and simplifies upgrades. // // How it works: // Create the vxlan device but don't register for any L2MISS or L3MISS messages // Then, as each remote host is discovered (either on startup or when they are added), do the following // 1) create routing table entry for the remote subnet. It goes via the vxlan device but also specifies a next hop (of the remote flannel host). // 2) Create a static ARP entry for the remote flannel host IP address (and the VTEP MAC) // 3) Create an FDB entry with the VTEP MAC and the public IP of the remote flannel daemon. // // In this scheme the scaling of table entries is linear to the number of remote hosts - 1 route, 1 arp entry and 1 FDB entry per host // // In this newest scheme, there is also the option of skipping the use of vxlan for hosts that are on the same subnet, // this is called "directRouting"