Monday, August 5, 2013

Using kdump on Ubuntu in Azure

This is another of my occasional posts that may help the next guy. I call them YAHTNG, yet another helping the next guy, blog entry...

kdump

kdump is a tool that lets you capture the Linux kernel's state in a file when it crashes (oops). It uses the kexec functionality that has long been part of the Linux kernel (since 2004, if memory serves). To use it on Ubuntu, you install the linux-crashdump metapackage, which in turn depends on the right bits and pieces.

apt-get install linux-crashdump

On different versions of Ubuntu, different bits and pieces get installed. Prior to Raring (13.04) you get one set of packages; on Raring and newer you get a different set. In either case, on Microsoft's Azure cloud and elsewhere under the Hyper-V hypervisor, you will get a hang if you just install the linux-crashdump package and then experience a crash. This is due to some Azure-specific kernel modules that get loaded in the kexec/kdump kernel. You need to exclude these modules, i.e., blacklist them. Here's how.

Older Ubuntu Releases including Precise

In 12.04 (Precise) and 12.10 (Quantal) you want to edit /etc/init.d/kdump. This is the script that runs at boot time; it loads the kdump kernel into memory and configures it.

--- /etc/init.d/kdump 2013-06-28 00:09:22.400504335 +0000
+++ kdump.nohyperv 2013-06-28 00:16:48.903733116 +0000
@@ -48,6 +48,7 @@ do_start () {
  # Append kdump_needed for initramfs to know what to do, and add
  # maxcpus=1 to keep things sane.
  APPEND="$APPEND kdump_needed maxcpus=1 irqpoll reset_devices"
+ APPEND="$APPEND ata_piix.prefer_ms_hyperv=0 modprobe.blacklist=hv_vmbus,hv_storvsc,hv_utils,hv_netvsc,hid_hyperv"

  # --elf32-core-headers is needed for 32-bit systems (ok
  # for 64-bit ones too).

As you can see, we are simply prohibiting the Azure kernel modules hv_vmbus, hv_storvsc, hv_utils, hv_netvsc, and hid_hyperv from loading in the kdump kernel. They still get loaded in the regular Azure kernel (and you will want to keep them there for performance and behavior reasons). However, if they load in the kdump kernel, they won't actually work and will "hang" the kdump kernel while they try to connect to the Azure services (or Hyper-V services). Additionally, we set ata_piix.prefer_ms_hyperv=0 so that ata_piix drives the disks itself in the kdump kernel rather than leaving them to the Hyper-V storage driver, which is now blacklisted.
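Before changing anything, it can help to see which of these modules the regular kernel has actually loaded. Here is a minimal sketch; the function name hyperv_modules is made up for illustration, and it assumes the usual lsmod layout of one header line followed by one module per line.

```shell
# Hypothetical helper: read lsmod-style output on stdin and print
# the names of the Hyper-V modules discussed in this post.
hyperv_modules() {
    # Skip the header line, then keep hv_* modules and hid_hyperv.
    awk 'NR > 1 && ($1 ~ /^hv_/ || $1 == "hid_hyperv") { print $1 }'
}

# Only call lsmod if it is available on this machine.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | hyperv_modules
fi
```

On an instance that is not running under Hyper-V this should simply print nothing.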

After you modify this init script, you will want to reboot. (But take note and read the last section on the crashkernel, as you will likely want to make that change as well prior to rebooting.)

Newer Ubuntu Releases (Raring and the upcoming Saucy)

The newest releases of Ubuntu include an additional package, kdump-tools, that handles kdump configuration. This package keeps the kdump kernel's extra command-line options in a simple config file, /etc/default/kdump-tools. You can edit that file to blacklist the appropriate modules:

    #KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb"
    KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb ata_piix.prefer_ms_hyperv=0 modprobe.blacklist=hv_vmbus,hv_storvsc,hv_utils,hv_netvsc,hid_hyperv"

As in the older releases, this sets ata_piix.prefer_ms_hyperv=0 and blacklists the same kernel modules as previously mentioned.
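If you have several instances to fix, the edit above could be scripted. This is only a sketch: the function name add_hyperv_exclusions is hypothetical, and it assumes the config file is read as shell variable assignments so that a later KDUMP_CMDLINE_APPEND line overrides an earlier one.

```shell
# Hypothetical helper: append the Hyper-V exclusions to a kdump-tools
# style config file given as the first argument.
add_hyperv_exclusions() {
    conf="$1"
    extra='ata_piix.prefer_ms_hyperv=0 modprobe.blacklist=hv_vmbus,hv_storvsc,hv_utils,hv_netvsc,hid_hyperv'

    # Do nothing if the blacklist is already mentioned in the file.
    if grep -q 'modprobe.blacklist=hv_vmbus' "$conf"; then
        return 0
    fi

    # Append a new assignment; assumed to win over any earlier one.
    printf 'KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb %s"\n' "$extra" >> "$conf"
}
```

You would run it as root against the real file, e.g. add_hyperv_exclusions /etc/default/kdump-tools, ideally after trying it on a copy first.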

Smaller Images

Low-memory (extrasmall, small) Azure instances (well, really any small machines, including small physical ones) unfortunately run into bug #1206691: the default crashkernel setting rarely works on a system with little memory. You will need to modify /etc/grub.d/10_linux so that 128M is reserved for the crash kernel on any size instance. Do this by simply altering the range, changing this:

    # add crashkernel option if we have the required tools
    if [ -x "/usr/bin/makedumpfile" ] && [ -x "/sbin/kexec" ]; then
        GRUB_CMDLINE_EXTRA="$GRUB_CMDLINE_EXTRA crashkernel=384M-2G:64M,2G-:128M"

to this:

    # add crashkernel option if we have the required tools
    if [ -x "/usr/bin/makedumpfile" ] && [ -x "/sbin/kexec" ]; then
        GRUB_CMDLINE_EXTRA="$GRUB_CMDLINE_EXTRA crashkernel=384M-700M:64M,700M-:128M"

With the new range, any machine with at least 700M of memory reserves 128M instead of 64M.
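To see what the range change does, here is a rough sketch of how a crashkernel range list picks a reservation. The helper names to_mib and crashkernel_for are made up for illustration, only the M and G suffixes are handled, and the matching rule (start inclusive, end exclusive, a missing end meaning "no upper limit") is my reading of the kernel's documented syntax. The 768 MiB figure is just an example memory size.

```shell
# Hypothetical helper: convert a size such as 384M or 2G to mebibytes.
to_mib() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the reservation a range list selects
# for a machine with the given amount of RAM in MiB.
crashkernel_for() {
    spec="$1"; ram_mib="$2"

    # Split the comma-separated list into positional parameters.
    old_ifs="$IFS"; IFS=,
    set -- $spec
    IFS="$old_ifs"

    for entry in "$@"; do
        range="${entry%%:*}"; size="${entry#*:}"
        start="${range%%-*}"; end="${range#*-}"
        start_mib=$(to_mib "$start") || return 1
        if [ -z "$end" ]; then
            end_mib=""          # open-ended range such as 2G-
        else
            end_mib=$(to_mib "$end") || return 1
        fi
        if [ "$ram_mib" -ge "$start_mib" ] &&
           { [ -z "$end_mib" ] || [ "$ram_mib" -lt "$end_mib" ]; }; then
            echo "$size"
            return 0
        fi
    done
    echo "none"                 # no range matched, nothing reserved
}

crashkernel_for '384M-2G:64M,2G-:128M' 768       # prints 64M
crashkernel_for '384M-700M:64M,700M-:128M' 768   # prints 128M
```

In other words, a machine of that size moves from the 64M bucket to the 128M bucket once the threshold is lowered to 700M.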



Once you have made this change, be sure to update grub:

sudo update-grub

so that the change will take effect. You will also want to reboot. Then you can validate that change by inspecting the boot command line:

cat /proc/cmdline

and see that the new value is now shown.

ubuntu@bug1195328-1210:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.5.0-36-generic root=UUID=39eb48d3-958a-48e0-896e-b6b03cc2342a ro crashkernel=128M console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300
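If you want to script that check, something like the following could work. It is a sketch only: crashkernel_param is a hypothetical helper name, and reading /sys/kernel/kexec_crash_loaded assumes your kernel exposes that file (it should contain 1 once a crash kernel has been loaded).

```shell
# Hypothetical helper: print the crashkernel= value found in a kernel
# command line string, or return non-zero if there is none.
crashkernel_param() {
    # Rely on word splitting to walk the space-separated parameters.
    for word in $1; do
        case "$word" in
            crashkernel=*) echo "${word#crashkernel=}"; return 0 ;;
        esac
    done
    return 1
}

if [ -r /proc/cmdline ]; then
    if value=$(crashkernel_param "$(cat /proc/cmdline)"); then
        echo "crashkernel reservation requested: $value"
    else
        echo "no crashkernel= parameter on the command line"
    fi
fi

# Report whether a crash kernel is currently loaded, if the file exists.
if [ -r /sys/kernel/kexec_crash_loaded ]; then
    echo "kexec_crash_loaded: $(cat /sys/kernel/kexec_crash_loaded)"
fi
```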

Reference Material

The official references for configuring Ubuntu for kdump are here:
https://wiki.ubuntu.com/Kernel/CrashdumpRecipe
https://help.ubuntu.com/lts/serverguide/kernel-crash-dump.html
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/785394
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1206691

and you should refer to them for procedures for testing and verifying your crashdump setup.

Microsoft Azure has some notes on the kernel modules here:
http://support.microsoft.com/kb/2858695
