
Showing posts with label ethernet. Show all posts

Thursday, August 29, 2013

finding the netdev_priv struct

In addition to the kernel's net_device struct, each network driver has its own private, device-specific struct, which stores state and statistics unique to that hardware.

The name of the struct varies from driver to driver, but its location is always the same: immediately after net_device, with the size of net_device rounded up to a multiple of NETDEV_ALIGN:

linux-2.6.32-358.14.1.el6.x86_64/include/linux/netdevice.h
/**
 *      netdev_priv - access network device private data
 *      @dev: network device
 *
 * Get network device private data
 */
static inline void *netdev_priv(const struct net_device *dev)
{
        return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN);
}
A search in cscope for netdev_priv shows many driver functions which obtain a pointer to their private struct this way, for example this one in e1000_main.c:
struct e1000_adapter *adapter = netdev_priv(netdev);
So, given that we know what the device-specific private struct is called and where it lives, how do we find it and read it?

We'll need the ability to read kernel memory using crash, either on a live system or from a vmcore captured with kdump:
crash /usr/lib/debug/lib/modules/2.6.32-358.14.1.el6.x86_64/vmlinux /var/crash/2013-08-26/vmcore
We'll also need the debugging symbols for the driver in question:
crash> mod -s e1000 /usr/lib/debug/lib/modules/2.6.32-358.14.1.el6.x86_64/kernel/drivers/net/e1000/e1000.ko.debug
     MODULE       NAME                 SIZE  OBJECT FILE
ffffffffa01550e0  e1000              170678  /usr/lib/debug/lib/modules/2.6.32-358.14.1.el6.x86_64/kernel/drivers/net/e1000/e1000.ko.debug 
The net command will show you the network devices in the system:
crash> net
   NET_DEVICE     NAME   IP ADDRESS(ES)
ffff88007e76b820  lo     127.0.0.1
ffff8800372c0020  eth0   192.168.1.126
We can then interpret the memory at that address as a net_device to see the in-kernel device struct:
crash> net_device 0xffff8800372c0020
struct net_device {
  name = "eth0\000\000\060:03.0\000\000\000",
...
But what we want is the memory just past the end of this struct, interpreted as the driver's private struct.

We need to know how big net_device is:
crash> struct -o net_device
struct net_device {
...
}
SIZE: 1728
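We then add the size of net_device to the device's address. netdev_priv rounds the size up to a multiple of NETDEV_ALIGN (32 bytes in this kernel), but 1728 is already a multiple of 32, so no padding is needed here: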
crash> px 0xffff8800372c0020+1728
$1 = 0xffff8800372c06e0
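The same arithmetic can be sketched outside crash. This is a minimal Python illustration of what netdev_priv computes, assuming NETDEV_ALIGN is 32 bytes as in this kernel's netdevice.h. The function names are hypothetical, and the address and size are the ones from the crash session above.

```python
NETDEV_ALIGN = 32  # assumption: value of NETDEV_ALIGN in this kernel


def align_up(value, boundary):
    """Round value up to the next multiple of boundary (a power of two)."""
    return (value + boundary - 1) & ~(boundary - 1)


def netdev_priv_addr(dev_addr, sizeof_net_device, boundary=NETDEV_ALIGN):
    """Address of the driver-private area that follows net_device."""
    return dev_addr + align_up(sizeof_net_device, boundary)


if __name__ == "__main__":
    eth0 = 0xffff8800372c0020  # address of eth0 from the crash "net" output
    size = 1728                # SIZE reported by "struct -o net_device"
    print(hex(netdev_priv_addr(eth0, size)))
```

For these values the result is 0xffff8800372c06e0, the same as the px output. On a kernel where the size of net_device is not a multiple of 32, the rounding step would matter and a plain addition would land short of the private struct.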
We can now interpret the memory at this new address as the device-specific private struct:
crash> e1000_adapter 0xffff8800372c06e0
struct e1000_adapter {
  vlgrp = 0x0,
  mng_vlan_id = 65535,
  bd_number = 0,
  rx_buffer_len = 1522,
...
and we're done!

Wednesday, January 16, 2013

how to network jumbo frames to a kvm guest

Getting a jumbo frame to a KVM guest is not something which works out of the box, but can be configured to work quite easily.

One way to get jumbo frames to a KVM guest is to pass the guest a physical NIC directly through PCI Passthrough, or a Virtual Function through SR-IOV.

Another way is to use the traditional libvirt bridge: create a bridge on the host, and bridge the guest's NICs into it. This doesn't work by default because all the interfaces are created with the default Ethernet MTU of 1500 bytes.

You can set the MTU of the physical NIC, the bridge, and the guest interface with the network-scripts, but the network-scripts cannot automatically set the MTU of the tap interface. The tap interface is how the guest's virtual network interface connects to the "real world". You'll see these tap devices with names like "vnet0" in your bridge.

So how do we do this?

First, some requirements:
  • A physical interface which supports jumbo frames.
    The Linux bridge code will not let the bridge MTU exceed the smallest MTU among its member interfaces.
  • A bridge with the larger required MTU
    Technically a jumbo frame covers any MTU from 1501 to 9000, but almost everyone just wants to set 9000.
  • virtio-net or Intel e1000 network interface inside the guest
    The real-life Realtek 8139 does not support an MTU of 9000, therefore there is no driver support for large MTU in the emulated version either.
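The bridge constraint in the first requirement is the reason every member, including the tap device, has to be raised. A minimal Python sketch of that rule follows, assuming the bridge ceiling is simply the minimum over its member interfaces. The function names and the example MTU values are hypothetical.

```python
def max_bridge_mtu(member_mtus):
    """Largest MTU the bridge can take: the smallest MTU among its members."""
    if not member_mtus:
        # Sketch-only choice: the empty-bridge case is not modelled here.
        raise ValueError("no member interfaces given")
    return min(member_mtus.values())


def can_set_bridge_mtu(requested, member_mtus):
    """True if the requested bridge MTU fits under every member's MTU."""
    return requested <= max_bridge_mtu(member_mtus)


if __name__ == "__main__":
    # Physical NIC already at 9000, tap device still at the 1500 default.
    members = {"eth0": 9000, "vnet0": 1500}
    print(max_bridge_mtu(members))
    print(can_set_bridge_mtu(9000, members))
```

With one member left at 1500, the ceiling stays at 1500 no matter how the physical NIC is configured.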
Our first step is to give every tap interface on the host an MTU of 9000. We do this by adding a udev rule at the top of the persistent network device rules file, so that it is applied whenever a vnet device is created:

/etc/udev/rules.d/70-persistent-net.rules (on host)
SUBSYSTEM=="net", ACTION=="add", KERNEL=="vnet*", ATTR{mtu}="9000"
Then set your physical interface MTU to 9000:

/etc/sysconfig/network-scripts/ifcfg-eth0 (on host):
DEVICE=eth0
HWADDR=[physical MAC address]
TYPE=Ethernet
ONBOOT=yes
BRIDGE=br0
MTU=9000
And do the same for the bridge:

/etc/sysconfig/network-scripts/ifcfg-br0 (on host)
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
DELAY=0
MTU=9000
Now inside your guest, set your virtio or e1000 interface to also have the larger MTU:

/etc/sysconfig/network-scripts/ifcfg-eth0 (in guest)
DEVICE=eth0
HWADDR=[guest MAC address]
TYPE=Ethernet
ONBOOT=yes
MTU=9000
You'll need to run "service network restart" on the host, or at least bring the bridge and eth interfaces down and up again. Then start your guest and confirm that all interfaces have been created with the correct MTU:
# ip link
2: eth0: mtu 9000 qdisc pfifo_fast state UP qlen 1000
    link/ether 01:23:45:67:89:0A brd ff:ff:ff:ff:ff:ff
3: br0: mtu 9000 qdisc noqueue state UNKNOWN
    link/ether 01:23:45:67:89:0A brd ff:ff:ff:ff:ff:ff
4: vnet0: mtu 9000 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether fe:54:00:01:23:45 brd ff:ff:ff:ff:ff:ff
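If you would rather check this from a script than read the ip link output by eye, the MTU of each interface is also exposed in sysfs as /sys/class/net/<interface>/mtu. The following is a minimal Python sketch under that assumption. The function names are hypothetical, and the interface names are the ones used in this post.

```python
import os


def read_mtus(sysfs_net="/sys/class/net"):
    """Return {interface: mtu} for every interface found under sysfs_net."""
    mtus = {}
    for name in sorted(os.listdir(sysfs_net)):
        mtu_file = os.path.join(sysfs_net, name, "mtu")
        try:
            with open(mtu_file) as f:
                mtus[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # skip entries without a readable numeric MTU
    return mtus


def below_mtu(mtus, wanted=9000, names=("eth0", "br0", "vnet0")):
    """List the named interfaces that are missing or below the wanted MTU."""
    return [n for n in names if mtus.get(n, 0) < wanted]


if __name__ == "__main__":
    # An empty list means all three interfaces are at 9000 or above.
    print(below_mtu(read_mtus()))
```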
This should be enough to send a 9000-byte packet with "Do Not Fragment" set over the network. Let's test with a ping:
# ping -c 4 -s 8972 -M do 172.16.0.2
PING 172.16.0.2 (172.16.0.2) 8972(9000) bytes of data.
8980 bytes from 172.16.0.2: icmp_seq=1 ttl=64 time=1.27 ms
8980 bytes from 172.16.0.2: icmp_seq=2 ttl=64 time=0.284 ms
8980 bytes from 172.16.0.2: icmp_seq=3 ttl=64 time=0.202 ms
8980 bytes from 172.16.0.2: icmp_seq=4 ttl=64 time=0.260 ms
Success!
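The -s 8972 in the ping above is not arbitrary: it is the MTU minus the IPv4 and ICMP headers. A minimal Python sketch of that arithmetic follows, assuming a 20-byte IPv4 header without options and the 8-byte ICMP echo header. The function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ping_payload_for_mtu(mtu):
    """Largest ping -s value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def reply_size_reported(payload):
    """Bytes ping reports per reply: the ICMP header plus the payload."""
    return payload + ICMP_HEADER


if __name__ == "__main__":
    payload = ping_payload_for_mtu(9000)
    print(payload, reply_size_reported(payload))
```

For an MTU of 9000 this gives a payload of 8972 and a reported reply size of 8980, matching the "8972(9000)" and "8980 bytes from" figures in the output above.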