Wednesday, January 16, 2013

how to network jumbo frames to a kvm guest

Jumbo frames do not reach a KVM guest out of the box, but getting them to work takes only a little configuration.

One way to get jumbo frames to a KVM guest is to pass the guest a physical NIC directly through PCI Passthrough, or a Virtual Function through SR-IOV.

Another way is to use the traditional libvirt bridge: create a bridge on the host, and bridge the guest's NICs into it. This doesn't work by default because all the interfaces are created with the default Ethernet MTU of 1500 bytes.

You can set the MTU of the physical NIC, the bridge, and the guest interface with the network-scripts, but the network-scripts cannot automatically set the MTU of the tap interface. The tap interface is how the guest's virtual network interface connects to the "real world". You'll see these tap devices in your bridge with names like "vnet0".

So how do we do this?

First, some requirements:
  • A physical interface which supports jumbo frames.
    The Linux bridge code will only allow the bridge to change to the smallest MTU of all its interfaces.
  • A bridge with the larger required MTU
    Technically a jumbo frame covers MTU from 1501 to 9000, but almost everyone just wants to set 9000
  • virtio-net or Intel e1000 network interface inside the guest
    The real-life Realtek 8139 does not support an MTU of 9000, therefore there is no driver support for large MTU in the emulated version either.
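
Since the bridge is limited by its smallest port, it can help to see which port is holding it back. The following is a minimal sketch, assuming the standard sysfs layout where each bridge port appears under /sys/class/net/<bridge>/brif/. The function name bridge_min_mtu and the SYSFS_NET override are hypothetical names introduced here for illustration:

```shell
# Print the smallest MTU among the ports currently attached to a bridge.
# SYSFS_NET is a hypothetical override (useful for testing); by default
# the real /sys/class/net is read.
bridge_min_mtu() {
    bridge=$1
    net="${SYSFS_NET:-/sys/class/net}"
    min=""
    for port in "$net/$bridge"/brif/*; do
        # Skip the unexpanded glob when the bridge has no ports
        [ -e "$port" ] || continue
        name=${port##*/}
        mtu=$(cat "$net/$name/mtu")
        if [ -z "$min" ] || [ "$mtu" -lt "$min" ]; then
            min=$mtu
        fi
    done
    # Print nothing and return non-zero if no ports were found
    [ -n "$min" ] && echo "$min"
}
```

Running "bridge_min_mtu br0" on the host prints the lowest MTU among the bridge's ports. If it prints 1500 while a guest is running, a tap device is the likely culprit.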

Our first step is to give every tap interface on the host an MTU of 9000. We do this by adding a udev rule to the top of the network device rules file, as follows:

/etc/udev/rules.d/70-persistent-net.rules (on host)
SUBSYSTEM=="net", ACTION=="add", KERNEL=="vnet*", ATTR{mtu}="9000"

Then set your physical interface MTU to 9000:

/etc/sysconfig/network-scripts/ifcfg-eth0 (on host)
HWADDR=[physical MAC address]
MTU=9000

And do the same for the bridge:

/etc/sysconfig/network-scripts/ifcfg-br0 (on host)
MTU=9000

Now inside your guest, set your virtio or e1000 interface to use the larger MTU as well:

/etc/sysconfig/network-scripts/ifcfg-eth0 (in guest)
HWADDR=[guest MAC address]
MTU=9000

You'll need to run "service network restart" on the host, or at least bring the bridge and eth interfaces down and up again. Then start your guest and confirm that all interfaces have been created with the correct MTU:
# ip link
2: eth0: mtu 9000 qdisc pfifo_fast state UP qlen 1000
    link/ether 01:23:45:67:89:0A brd ff:ff:ff:ff:ff:ff
3: br0: mtu 9000 qdisc noqueue state UNKNOWN
    link/ether 01:23:45:67:89:0A brd ff:ff:ff:ff:ff:ff
4: vnet0: mtu 9000 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether fe:54:00:01:23:45 brd ff:ff:ff:ff:ff:ff
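
If you'd rather not read through the "ip link" output by eye, the same check can be scripted. This is only a sketch, assuming each interface exposes its MTU in /sys/class/net/<name>/mtu. The function name check_mtu and the SYSFS_NET override are hypothetical names introduced here for illustration:

```shell
# Report the MTU of each named interface and return non-zero if any
# interface is missing or does not have the wanted MTU.
# SYSFS_NET is a hypothetical override (useful for testing); by default
# the real /sys/class/net is read.
check_mtu() {
    want=$1
    shift
    rc=0
    for ifname in "$@"; do
        mtu_file="${SYSFS_NET:-/sys/class/net}/$ifname/mtu"
        if [ ! -r "$mtu_file" ]; then
            echo "$ifname: not found"
            rc=1
            continue
        fi
        have=$(cat "$mtu_file")
        if [ "$have" = "$want" ]; then
            echo "$ifname: mtu $have ok"
        else
            echo "$ifname: mtu $have, expected $want"
            rc=1
        fi
    done
    return $rc
}
```

On the host this would be called as "check_mtu 9000 eth0 br0 vnet0", using the interface names from the output above.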

This should be enough to send a 9000-byte IP packet with the "Do Not Fragment" flag set over the network. Let's test with a ping:
# ping -c 4 -s 8972 -M do [destination address]
PING ( 8972(9000) bytes of data.
8980 bytes from icmp_seq=1 ttl=64 time=1.27 ms
8980 bytes from icmp_seq=2 ttl=64 time=0.284 ms
8980 bytes from icmp_seq=3 ttl=64 time=0.202 ms
8980 bytes from icmp_seq=4 ttl=64 time=0.260 ms
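
The 8972 passed to "-s" is the ICMP payload size, not the MTU: an IPv4 header without options takes 20 bytes and the ICMP echo header takes 8, which leaves 8972 bytes of payload in a 9000-byte packet. A small sketch of that arithmetic, where icmp_payload is a hypothetical helper name, assuming IPv4 with no IP options:

```shell
# Largest ICMP echo payload that fits in a single IPv4 packet of the
# given MTU: subtract the 20-byte IPv4 header (no options) and the
# 8-byte ICMP echo header.
icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}
```

For example, "icmp_payload 9000" gives the 8972 used above, and the same calculation for the default MTU of 1500 gives 1472.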


Tim Small said...

FWIW, the 8139C+ supports 4k frames according to the datasheet. The other 8139s support 'baby giant' packets up to an MTU of 1700 bytes or so (can be useful for encapsulation of 1500 mtu packets inside MPLS, GRE, PPPoE etc.).



Anonymous said...

Thank you very much for that udev rule, I can confirm it works on CentOS 7 also.

Anonymous said...

Does not work for me on CentOS 7. Initially the MTU is set to the required size, but later, once the network interfaces are up, I see 1500. It seems something changes it back to the default value afterwards.