Wednesday, January 16, 2013

how to network jumbo frames to a kvm guest

Getting jumbo frames to a KVM guest does not work out of the box, but it is quite easy to configure.

One way to get jumbo frames to a KVM guest is to pass the guest a physical NIC directly through PCI Passthrough, or a Virtual Function through SR-IOV.

Another way is to use the traditional libvirt bridge: create a bridge on the host, and bridge the guest's NICs into it. This doesn't work by default because all the interfaces are created with the default Ethernet MTU of 1500 bytes.

You can set the MTU of the physical NIC, the bridge, and the guest interface with the network-scripts, but you cannot automatically set the MTU of the tap interface that way. The tap interface is how the virtual network interface connects to the "real world". You'll see these tap devices with names like "vnet0" in your bridge.

So how do we do this?

First, some requirements:
  • A physical interface which supports jumbo frames.
    The Linux bridge code will only allow the bridge to change to the smallest MTU of all its interfaces.
  • A bridge with the larger required MTU
    Technically a jumbo frame covers MTUs from 1501 to 9000, but almost everyone just wants to set 9000.
  • virtio-net or Intel e1000 network interface inside the guest
    The real-life Realtek 8139 does not support an MTU of 9000, therefore there is no driver support for large MTU in the emulated version either.

Our first step is to make all tap interfaces on the host have an MTU of 9000. We do this by adding a udev rule at the top of the persistent network device rules file, as follows:

/etc/udev/rules.d/70-persistent-net.rules (on host)

SUBSYSTEM=="net", ACTION=="add", KERNEL=="vnet*", ATTR{mtu}="9000"
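
The rule is only evaluated when a tap device is added, so guests that are already running keep their 1500-byte taps until they are restarted. As a rough sketch, existing taps could be changed by hand with something like the following. The function name set_tap_mtu and the IP_CMD override are made up for illustration and are not part of any tool:

```shell
#!/bin/bash
# Set the MTU on every tap device (vnet*) that already exists on the host.
# IP_CMD is a hypothetical override so the commands can be previewed,
# e.g. IP_CMD=echo; by default the real "ip" command is run (needs root).
set_tap_mtu() {
    mtu=$1
    sysnet=${2:-/sys/class/net}
    for dev in "$sysnet"/vnet*; do
        # Skip the unexpanded glob when no tap devices exist
        [ -e "$dev" ] || continue
        ${IP_CMD:-ip} link set dev "${dev##*/}" mtu "$mtu"
    done
}
```

Calling IP_CMD=echo set_tap_mtu 9000 prints the commands without changing anything, and set_tap_mtu 9000 as root applies them. Since the bridge follows the smallest MTU of its ports, br0 may need its MTU set again afterwards.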

Then set your physical interface MTU to 9000:

/etc/sysconfig/network-scripts/ifcfg-eth0 (on host)

DEVICE=eth0
HWADDR=[physical MAC address]
TYPE=Ethernet
ONBOOT=yes
BRIDGE=br0
MTU=9000

And do the same inside the bridge:

/etc/sysconfig/network-scripts/ifcfg-br0 (on host)

DEVICE=br0
TYPE=Bridge
ONBOOT=yes
DELAY=0
MTU=9000

Now inside your guest, set your virtio or e1000 interface to also have the larger MTU:

/etc/sysconfig/network-scripts/ifcfg-eth0 (in guest)

DEVICE=eth0
HWADDR=[guest MAC address]
TYPE=Ethernet
ONBOOT=yes
MTU=9000

You'll need to run "service network restart" on the host, or at least bring the bridge and physical interfaces down and up. Then start your guest and confirm that all interfaces have been created with the correct MTU:

# ip link
2: eth0: mtu 9000 qdisc pfifo_fast state UP qlen 1000
    link/ether 01:23:45:67:89:0A brd ff:ff:ff:ff:ff:ff
3: br0: mtu 9000 qdisc noqueue state UNKNOWN
    link/ether 01:23:45:67:89:0A brd ff:ff:ff:ff:ff:ff
4: vnet0: mtu 9000 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether fe:54:00:01:23:45 brd ff:ff:ff:ff:ff:ff
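
Rather than reading the ip link output by eye, the same check can be scripted against sysfs, where each interface exposes its MTU in /sys/class/net/<name>/mtu. This is only a sketch: check_mtu is a made-up name, and the interface names are the ones from the example above.

```shell
#!/bin/bash
# Report every listed interface whose MTU differs from the expected value.
# Arguments: expected MTU, sysfs network directory, then interface names.
# Returns 0 when all interfaces match, 1 otherwise.
check_mtu() {
    want=$1
    sysnet=$2
    shift 2
    rc=0
    for dev in "$@"; do
        # An interface that does not exist yields an empty value
        have=$(cat "$sysnet/$dev/mtu" 2>/dev/null)
        if [ "$have" != "$want" ]; then
            echo "$dev: mtu ${have:-unknown}, expected $want"
            rc=1
        fi
    done
    return $rc
}
```

On the host this would be called as check_mtu 9000 /sys/class/net eth0 br0 vnet0, and it stays silent when everything is set correctly.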


This should be enough to send a 9000-byte packet with "Do Not Fragment" set over the network. Let's test with a ping:

# ping -c 4 -s 8972 -M do 172.16.0.2
PING 172.16.0.2 (172.16.0.2) 8972(9000) bytes of data.
8980 bytes from 172.16.0.2: icmp_seq=1 ttl=64 time=1.27 ms
8980 bytes from 172.16.0.2: icmp_seq=2 ttl=64 time=0.284 ms
8980 bytes from 172.16.0.2: icmp_seq=3 ttl=64 time=0.202 ms
8980 bytes from 172.16.0.2: icmp_seq=4 ttl=64 time=0.260 ms


Success!
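
The 8972 passed to -s is not arbitrary. The -s option sets the ICMP payload size, and the 20-byte IPv4 header (without options) and the 8-byte ICMP header are added on top, so 8972 + 28 = 9000. A small helper, with a name made up for this example, shows the arithmetic for any MTU:

```shell
#!/bin/bash
# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
ping_payload() {
    echo $(( $1 - 20 - 8 ))
}
```

ping_payload 9000 gives 8972, the value used above, and ping_payload 1500 gives 1472 for a standard Ethernet MTU.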

Sunday, August 5, 2012

how to persist ethtool settings across reboot

You can use the ethtool command to read and change information about your network interfaces.

For example, ethtool -g ethX reads the size of the ring buffer on the NIC, and ethtool -G ethX rx A tx B changes it. Use man ethtool to discover more settings.


But these options don't persist across reboot, so how do you make sure your settings are kept permanent?

You can enter the ethtool commands in /etc/rc.local (or your distribution's equivalent), where commands are run after the current runlevel completes, but this isn't ideal. Network services may have started during the runlevel, and ethtool commands tend to interrupt network traffic. It would be preferable to have the commands applied as the interface is brought up.

The network service in CentOS has the ability to do this. The script /etc/sysconfig/network-scripts/ifup-post checks for the existence of /sbin/ifup-local and, if it exists, runs it with the interface name as a parameter (e.g. /sbin/ifup-local eth0).

We can create this file with touch /sbin/ifup-local, make it executable with chmod +x /sbin/ifup-local, set its SELinux context with chcon --reference /sbin/ifup /sbin/ifup-local, and then open it in an editor.

A simple script to apply the same settings to all interfaces would be something like:

#!/bin/bash
# $1 is the interface name passed in by ifup-post
if [ -n "$1" ]; then
    /sbin/ethtool -G "$1" rx 4096 tx 4096
    /sbin/ethtool -K "$1" tso on gso on
fi


Keep in mind this will attempt to apply settings to ALL interfaces, even the loopback.

If we want to apply different settings to different interfaces, or want to skip the loopback, we can use a case statement:

#!/bin/bash
# Apply per-interface settings; interfaces not listed (such as lo) are skipped
case "$1" in
 eth0)
  /sbin/ethtool -G "$1" rx 16384 tx 16384
  /sbin/ethtool -K "$1" gso on gro on
  ;;
 eth1)
  /sbin/ethtool -G "$1" rx 64 tx 64
  /sbin/ethtool -K "$1" tso on gso on
  /sbin/ip link set "$1" txqueuelen 0
  ;;
esac
exit 0
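
Case patterns are shell globs, so if every ethN interface should get the same settings while the loopback and everything else is left alone, the two scripts above can be combined. The following is only a sketch of that idea. The apply_settings function and the ETHTOOL variable are made up here so that the logic can be tried out without touching a real NIC, and the ring sizes are the ones from the first example:

```shell
#!/bin/bash
# Hypothetical variant of /sbin/ifup-local: the same settings for every
# ethN interface, and nothing for lo or any other interface.
# ETHTOOL can be pointed at a stub for a dry run.
ETHTOOL=${ETHTOOL:-/sbin/ethtool}

apply_settings() {
    case "$1" in
        eth*)
            "$ETHTOOL" -G "$1" rx 4096 tx 4096
            "$ETHTOOL" -K "$1" tso on gso on
            ;;
        *)
            # lo, bridges, taps and anything else are left untouched
            ;;
    esac
}

# ifup-post passes the interface name as the first argument
apply_settings "$1"
```

This only matches interfaces that use the ethN naming scheme, so the pattern would need adjusting on systems that name their NICs differently.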


Now ethtool settings are applied to interfaces as they start, all potential interruptions to network communication are done as the interface is brought up, and your server can continue to boot with full network capabilities.

Saturday, August 4, 2012

kvm virtual machine won't start

Recently I brought my KVM host down for an upgrade. I shut down the guests within their OSes, confirmed they were powered off with a virsh list --all, and everything went well with the upgrade.

However, upon returning the guests to service, one of them would not start up. Typing virsh start, I got the helpful error message:

Error restoring domain: cannot send monitor command
Connection reset by peer

Why was this one VM broken but the others were fine?

I played around with virsh autostart and virsh autostart --disable but that had no effect, nor did a reboot of the hypervisor.

After some searching around, it turns out libvirt has the capability to keep "managed save states" of guests, kinda like a sleep mode or snapshot, to save you fully powering a guest OS off.

For some reason, a managed save had been created for this one guest. Perhaps it had not shut down properly, or perhaps there's an error in libvirt. I could view the saved state with:

# virsh list --all --managed-save
 Id Name                 State
----------------------------------
  - guest1               shut off
  - guest2               saved


Now a virsh managedsave-remove guest2 returned it to the "shut off" state, and I could start it properly with virsh start as per usual.
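
On a host with many guests it can be handy to pull just the saved ones out of that listing. Here is a minimal sketch, assuming the table layout shown above and guest names without spaces. The function name saved_domains is made up:

```shell
#!/bin/bash
# Print the names of domains whose State column reads "saved".
# Expects the output of "virsh list --all --managed-save" on stdin.
saved_domains() {
    # Skip the two header lines; the state is the last field, the name the second
    awk 'NR > 2 && $NF == "saved" { print $2 }'
}
```

virsh list --all --managed-save | saved_domains would print guest2 for the listing above. Each name can then be passed to virsh managedsave-remove once you are sure you don't need the saved state.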

Wednesday, June 20, 2012

grub2 usb keyboard not working

I installed the newly released Fedora 17 this week, only to find I could no longer control the GRUB2 screen to get into Windows to play some games. The keyboard and mouse work perfectly in BIOS, and in Linux once USB drivers are loaded, just not at the GRUB2 screen.

Many forum threads exist for this, most pointing towards the "USB Legacy" or similar option in the BIOS. I had this turned on, however turning it off made no difference either.

GRUB2 can load some driver modules, so perhaps it wasn't loading the USB modules. Adding GRUB_PRELOAD_MODULES="usb usb_keyboard ehci ohci uhci" to /etc/default/grub and then rebuilding the config files with grub2-mkconfig -o /boot/grub2/grub.cfg didn't change anything either.

At this point I started coming across articles mentioning UEFI support for GRUB2. UEFI is the "new BIOS" standard coming out on new motherboards. My motherboard is a fairly new model, so it does have EFI firmware.

Turns out the solution is to install a version of GRUB2 with EFI support. This was done with yum install grub2-efi to install the package, then grub2-efi-install /dev/sda to install the EFI-supporting bootloader onto my hard drive. I regenerated a new config with grub2-efi-mkconfig -o /boot/grub2/grub.cfg while I was at it.
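
If you're not sure whether your Linux install was actually booted through EFI, the kernel can tell you: the directory /sys/firmware/efi only exists after an EFI boot. A quick sketch of that check, with a made-up function name and an optional path argument that is only there so it can be tried against a fake directory tree:

```shell
#!/bin/bash
# Report whether the running Linux system was booted through UEFI or legacy BIOS.
# The kernel creates /sys/firmware/efi only on a UEFI boot.
# The optional argument (an alternative sysfs path) exists only for testing.
boot_mode() {
    sysfs=${1:-/sys}
    if [ -d "$sysfs/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}
```

Run boot_mode with no arguments on the machine in question.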

Now my USB keyboard works perfectly in GRUB2.

Thursday, April 26, 2012

su to root without a password

I'm sick of typing the root password every time I want to su - on Fedora to become the root user. I know how to allow sudo access without a password, but I don't want to use sudo; I want to be able to just type su - and become root.

I couldn't find a good answer for this on Google, so I read the man pages of pam (Pluggable Authentication Modules) until I figured it out.

In the file /etc/pam.d/su put this as the second line:

auth            sufficient      pam_permit.so

This is incredibly insecure as it lets literally anyone at all with a login become root.

To restrict this to just your username, use this line instead, replacing yourusername with your actual username:

auth            sufficient      pam_succeed_if.so use_uid user = yourusername

You can also restrict this to a group. Here, members of the group allowedpeople can su without a password:

auth            sufficient      pam_succeed_if.so use_uid user ingroup allowedpeople
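
Before relying on the group rule, it's worth confirming that the account really is in the group. id -nG prints a user's groups as a space-separated list, and a tiny helper (the name has_group is made up) can test that list for an exact match:

```shell
#!/bin/bash
# Succeed if the group name given as $1 appears in the space-separated
# group list read from stdin (the format printed by "id -nG").
has_group() {
    # One group per line, then match the whole line as a fixed string
    tr ' ' '\n' | grep -qxF -- "$1"
}
```

For example, id -nG yourusername | has_group allowedpeople && echo "in group" prints "in group" only when yourusername is a member of allowedpeople.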