BenV's notes

Xen, DRBD and live migration

Posted on Mar. 10, 2011, under Software

Once again I have some new hardware that’s been labeled “Xen Server”.
This time I want to set it up in a way that brings some redundancy so we can actually have 1 server fail and still have our hosts up and running.
(or at least back up in a few minutes instead of several hours).
To achieve this goal I will install the latest version of Xen (which seems to be 4.0.1) and use DRBD with LVM for storage.

Xen Setup

First we have to download the Xen sources, compile and install them. Here goes:


root@newserver:/# mkdir /usr/src/xen
root@newserver:/# cd /usr/src/xen
root@newserver:/usr/src/xen# wget http://bits.xensource.com/oss-xen/release/4.0.1/xen-4.0.1.tar.gz
root@newserver:/usr/src/xen# tar zxvf xen-4.0.1.tar.gz
root@newserver:/usr/src/xen# cd xen-4.0.1
root@newserver:/usr/src/xen/xen-4.0.1# make xen
root@newserver:/usr/src/xen/xen-4.0.1# make install-xen
root@newserver:/usr/src/xen/xen-4.0.1# make tools
# lots of output
make[10]: Entering directory `/usr/src/xen/xen-4.0.1/tools/firmware/rombios/32bit/tcgbios'
gcc -O2 -fomit-frame-pointer -m32 -march=i686 -fno-strict-aliasing -std=gnu99 -Wall -Wstrict-prototypes -Wno-unused-value -Wdeclaration-after-statement -D__XEN_TOOLS__ -MMD -MF .tcgbios.o.d -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -mno-tls-direct-seg-refs -DNDEBUG -Werror -fno-stack-protector -fno-exceptions -fno-builtin -msoft-float -I../../../../../tools/include -I.. -I../.. -c -o tcgbios.o tcgbios.c
In file included from /usr/include/features.h:380:0,
from /usr/include/stdint.h:26,
from /usr/lib64/gcc/x86_64-slackware-linux/4.5.2/include/stdint.h:3,
from ../rombios_compat.h:8,
from tcgbios.c:24:
/usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory
compilation terminated.
make[10]: *** [tcgbios.o] Error 1
make[10]: Leaving directory `/usr/src/xen/xen-4.0.1/tools/firmware/rombios/32bit/tcgbios'
make[9]: *** [subdir-all-tcgbios] Error 2
make[9]: Leaving directory `/usr/src/xen/xen-4.0.1/tools/firmware/rombios/32bit'
make[8]: *** [subdirs-all] Error 2
make[8]: Leaving directory `/usr/src/xen/xen-4.0.1/tools/firmware/rombios/32bit'
make[7]: *** [subdir-all-32bit] Error 2
make[7]: Leaving directory `/usr/src/xen/xen-4.0.1/tools/firmware/rombios'
make[6]: *** [subdirs-all] Error 2
make[6]: Leaving directory `/usr/src/xen/xen-4.0.1/tools/firmware/rombios'
make[5]: *** [subdir-all-rombios] Error 2
make[5]: Leaving directory `/usr/src/xen/xen-4.0.1/tools/firmware'
make[4]: *** [subdirs-all] Error 2
make[4]: Leaving directory `/usr/src/xen/xen-4.0.1/tools/firmware'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/usr/src/xen/xen-4.0.1/tools/firmware'
make[2]: *** [subdir-install-firmware] Error 2
make[2]: Leaving directory `/usr/src/xen/xen-4.0.1/tools'
make[1]: *** [subdirs-install] Error 2
make[1]: Leaving directory `/usr/src/xen/xen-4.0.1/tools'
make: *** [install-tools] Error 2

Yay, it broke. Some searching reveals that it’s something silly again.
The 32-bit firmware code tries to include 32-bit glibc headers that aren’t available on our 64-bit version of Slackware.
See this for details.
A quick solution is to fake the file:

root@newserver:~# ln -s /usr/include/gnu/stubs-64.h /usr/include/gnu/stubs-32.h
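Before reaching for the symlink, it can help to confirm that the missing header really is the problem. The following is only a sketch: the function name `needs_stubs32` is made up for illustration, and it assumes the glibc headers live under /usr/include as in the error output above.

```shell
#!/bin/sh
# Report whether the 32-bit glibc stub header is missing from an include dir.
# Usage: needs_stubs32 [INCLUDE_DIR]   (defaults to /usr/include)
# Returns 0 (true) when stubs-64.h exists but stubs-32.h does not.
needs_stubs32() {
    inc_dir="${1:-/usr/include}"
    [ -e "$inc_dir/gnu/stubs-64.h" ] && [ ! -e "$inc_dir/gnu/stubs-32.h" ]
}

if needs_stubs32; then
    echo "gnu/stubs-32.h is missing: the 32-bit firmware build in 'make tools' will fail"
else
    echo "gnu/stubs-32.h is present (or this is not a 64-bit-only glibc)"
fi
```

Keep in mind that the symlink is a hack: it only satisfies the include, it does not give you a real 32-bit toolchain.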

Let’s continue compiling!

root@newserver:/usr/src/xen/xen-4.0.1# make tools
# more tons of output
make[8]: Entering directory `/usr/src/xen/xen-4.0.1/tools/firmware/hvmloader/acpi'

ACPI ASL compiler (iasl) is needed
Download and install Intel ACPI CA from
http://acpica.org/downloads/

make[8]: *** [iasl] Error 1

Oh… well, I guess we need some garbage Intel compiler now. Fine, let’s grab and install it.
Obviously you need to agree with giving away your first born child to Intel, the usual, but I don’t care.
My clownfish downloaded and compiled it for me. He’s very intelligent 😉

root@newserver:/usr/src/xen/xen-4.0.1# cd /usr/src
root@newserver:/usr/src# wget http://acpica.org/download/acpica-unix2-20110112.tar.gz
root@newserver:/usr/src# tar zxvf acpica-unix2-20110112.tar.gz
root@newserver:/usr/src# cd acpica-unix2-20110112/compiler
root@newserver:/usr/src/acpica-unix2-20110112/compiler# make
root@newserver:/usr/src/acpica-unix2-20110112/compiler# ginstall -m755 iasl /usr/local/bin

Tada, the Irregular Antimatter Structure Layer is installed. Or whatever it stands for today 🙂
Let’s retry the Xen part.

root@newserver:/usr/src/acpica-unix2-20110112/compiler# cd /usr/src/xen/xen-4.0.1
root@newserver:/usr/src/xen/xen-4.0.1# make tools
# And this time it works!
root@newserver:/usr/src/xen/xen-4.0.1# make install-tools

That’s part 1 complete.
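In hindsight, a quick preflight check would have saved a failed build or two. This is a minimal sketch: `missing_tools` is a hypothetical helper, and the tool list is just my guess based on the failures above, not a complete list of what Xen needs.

```shell
#!/bin/sh
# Print each named command that cannot be found in PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tool list is an assumption based on the errors hit above; extend as needed.
missing=$(missing_tools gcc make iasl)
if [ -n "$missing" ]; then
    echo "Install these before running 'make tools':" $missing
fi
```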

Xen dom0 kernel

Next, we’ll need a dom0 kernel. These days the vanilla linux 2.6 kernel has Xen dom0 support…. but it’s not complete enough to boot yet (tried with 2.6.37).
At least, that’s what my experiments have shown so far. If you want to try it yourself, see this.
My tries crashed like this:

# more boot gibberish
[ 0.0000000000] kernel direct mapping tables up to 2000000 @ 26fd0000-27ff0000
[ 0.0000000000] init_memory_mapping: 0000001000000000-080000022000000
[(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Google got me to this, so I’ve given up on that for now.

However, what DOES work is the patched kernel maintained by Jeremy.
Let’s get it and build it!

root@newserver:/usr/src/xen# git clone git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git linux-2.6-jeremy-git
root@newserver:/usr/src/xen# cd linux-2.6-jeremy-git
root@newserver:/usr/src/xen/linux-2.6-jeremy-git# git reset --hard
root@newserver:/usr/src/xen/linux-2.6-jeremy-git# git checkout -b xen/stable-2.6.32.x origin/xen/stable-2.6.32.x
root@newserver:/usr/src/xen/linux-2.6-jeremy-git# git pull
root@newserver:/usr/src/xen/linux-2.6-jeremy-git# make menuconfig
# Have fun, make sure to enable the required dom0 stuff as described here http://wiki.xensource.com/xenwiki/XenParavirtOps
root@newserver:/usr/src/xen/linux-2.6-jeremy-git# make bzImage modules modules_install
root@newserver:/usr/src/xen/linux-2.6-jeremy-git# cp arch/x86_64/boot/bzImage /boot/vmlinuz-xen-dom0-2.6.32.x
# update your grub /boot/grub/grub.cfg
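Since it’s easy to miss an option in menuconfig, here is a rough way to double-check the resulting .config before rebooting. Treat it as a sketch: `missing_kconfig` is a made-up helper, and the symbols listed are only the subset of dom0 options I’m fairly sure about; the wiki page mentioned above is the authoritative list.

```shell
#!/bin/sh
# List the given CONFIG_ symbols that are neither =y nor =m in a kernel .config.
# Usage: missing_kconfig CONFIG_FILE SYMBOL...
missing_kconfig() {
    cfg="$1"; shift
    for sym in "$@"; do
        grep -Eq "^${sym}=(y|m)\$" "$cfg" || echo "$sym"
    done
}

# Example run against the tree built above, if it exists.
cfg=/usr/src/xen/linux-2.6-jeremy-git/.config
if [ -f "$cfg" ]; then
    missing_kconfig "$cfg" CONFIG_XEN CONFIG_XEN_DOM0 CONFIG_XEN_BACKEND \
        CONFIG_XEN_BLKDEV_BACKEND CONFIG_XEN_NETDEV_BACKEND
fi
```

No output means all the listed symbols are enabled.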

Time to test it! Make sure you check /etc/xen/xend-config.sxp. I set my networking to bridge, but that’s the only thing I needed to change.
You should now be able to reboot your machine into the new Xen 4.0.1 + 2.6.32.x kernel. At least I could 🙂

Xen domU test

Let’s see if our Xen actually works. Start it up using the /etc/init.d/xend script.

Next, grab an image from stacklet. … wait, what!
Since when do they ask money for downloads?
*grr*. Another useful site down the drain… guess their bandwidth bills were too high and they weren’t intelligent enough to use bittorrent. I hope they go bankrupt soon and vanish from the internet. Free replacements are welcome.

Fine, I’ll build my own slackware image. Use a real block device like an LVM partition if you plan on using it for more than testing 😉
Since this is just a test for me I’ll use an image instead.

  • Create a 10GB image:
    dd if=/dev/zero of=slackware64_v13.1.img bs=1M seek=10240 count=0
  • Format as ext4:
    mkfs.ext4 -L Root slackware64_v13.1.img
  • Mount it somewhere:
    mkdir -p /mnt/tmp
    mount -t ext4 -o loop slackware64_v13.1.img /mnt/tmp
  • From your slackware64 source (dvd, nfs, whatever), install the packages you need. I’m lazy so I’ll install way too much:
    cd $YOURSLACKWARESOURCE/slackware64/
    for PKG in `grep ADD {a,ap,l,n}/tagfile | cut -f1,2 -d':' | sed -e 's/tagfile://'` ; do
    installpkg -root /mnt/tmp -infobox $PKG-*.t?z
    done
  • Basic configuration for image. At this point you should be able to chroot into your new install:
    chroot /mnt/tmp
    And edit the needed setup stuff. To make a domU boot properly, we need:
    cat > /etc/fstab
    /dev/xvda1 / ext4 defaults 0 1
    none /dev/pts devpts gid=5,mode=620 0 0
    none /dev/shm tmpfs noexec 0 0
    none /proc proc defaults 0 0
    none /tmp tmpfs noexec 0 0
    Make sure we can login:
    echo hvc0 >> /etc/securetty
    cat >> /etc/inittab
    # Xen console
    hvc0:1235:respawn:/sbin/agetty 38400 hvc0
  • Try booting it:
    Exit the shell by using ‘exit’ or press ctrl-d on an empty line.
    umount /mnt/tmp
    Create a Xen domU config:
    cat >> testdomain.cfg
    kernel = "/boot/vmlinuz-xen-dom0-2.6.32.x"
    memory = 768
    name = "test"
    vcpus = 2
    vif = [ 'vifname=test.0' ]
    dhcp = "no"
    disk = [
    'file:/xen/hosts/test/slackware64_v13.1.img,xvda1,w'
    ]
    root = "/dev/xvda1 ro"
    extra = "xencons=hvc0"
  • Test time:
    xm create -c testdomain.cfg
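The configuration step above uses interactive `cat >` redirects inside the chroot. If you build images more than once, the same files can be written from the dom0 with here-documents. This is a sketch under assumptions: `write_domu_basics` is a hypothetical name, and it expects the image to still be mounted (on /mnt/tmp in this post).

```shell
#!/bin/sh
# Write the minimal domU boot files into an unpacked root tree.
# Usage: write_domu_basics ROOT_DIR   (e.g. /mnt/tmp while the image is mounted)
write_domu_basics() {
    root="$1"
    mkdir -p "$root/etc"

    # Root filesystem on the paravirtual disk, plus the usual pseudo filesystems.
    cat > "$root/etc/fstab" <<'EOF'
/dev/xvda1 / ext4 defaults 0 1
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs noexec 0 0
none /proc proc defaults 0 0
none /tmp tmpfs noexec 0 0
EOF

    # Allow root logins on the Xen console, adding the entry only once.
    touch "$root/etc/securetty" "$root/etc/inittab"
    grep -qx 'hvc0' "$root/etc/securetty" || echo hvc0 >> "$root/etc/securetty"

    # Spawn a getty on the Xen console, again only if it is not there yet.
    grep -q '^hvc0:' "$root/etc/inittab" || cat >> "$root/etc/inittab" <<'EOF'
# Xen console
hvc0:1235:respawn:/sbin/agetty 38400 hvc0
EOF
}
```

Call it as `write_domu_basics /mnt/tmp` before the umount. Unlike the plain `>>` appends, running it twice does not duplicate the hvc0 entries.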

Note that there’s still a lot of tweaking to be done on this image, like getting rid of hardware clock, adding network config, a root password, and tons of other details, but at least it boots and you can use it 🙂
Oh yeah, you can exit the console by hitting ctrl-] in case you’re lost 😉

DRBD Setup

Finally, time to play with new stuff. For those of you who don’t know it, DRBD is a kernel module and some tools that allow you to do RAID-1 between 2 machines over the network. This is awesome, because it will give us redundancy, AND networked storage (which means live migrations!).
Since I don’t want to use more than two machines, and don’t have the funds for dedicated network storage, this is an excellent solution.
If you want to use more than two machines you could opt for using DRBD on two machines that you use as storage nodes that export their storage using iSCSI or some kind of cluster filesystem to your other nodes that run Xen. But that’s for next time :-p

For now we’ll stick to this setup:

  • 2 machines, both with enough storage
  • each machine has their own (small) RAID-1 backed partition from which they boot and run dom0
  • each machine has a big chunk of RAID-1 backed storage that’s put into LVM
  • DRBD is put on top of LVM volumes as needed to run domU’s from

Time to install it!
Since 2.6.32 is the last kernel that DOESN’T have DRBD included we’ll have to build it ourselves. (This always happens with stuff you want to use :))
Here goes:

root@newserver:/usr/src# wget http://oss.linbit.com/drbd/8.3/drbd-8.3.10.tar.gz
root@newserver:/usr/src# tar zxvf drbd-8.3.10.tar.gz
root@newserver:/usr/src# cd drbd-8.3.10
root@newserver:/usr/src/drbd-8.3.10# ./configure --prefix=/usr --with-utils --with-km --with-udev --with-xen --with-pacemaker --with-heartbeat
root@newserver:/usr/src/drbd-8.3.10# slackbuild.pl
# create package
root@newserver:/usr/src/drbd-8.3.10# installpkg /usr/src/packages/drbd-8.3.10-x86_64-1.txz
root@newserver:/usr/src/drbd-8.3.10# modprobe drbd
root@newserver:/usr/src/drbd-8.3.10# cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root@newserver, 2011-02-02 14:54:41

Seems to work. Note how it also installed a new xen script for us – /etc/xen/scripts/block-drbd.
Time for a test.
For every DRBD block device you want you’ll have to have an equally sized block device available on both hosts. (sounds obvious)
As I said earlier, I’ll use software RAID-1 backed LVM devices on both hosts.
Note that for those who don’t have free space available it’s theoretically possible to upgrade an existing partition to a DRBD device. But I won’t go there yet.
The link between the two hosts is dedicated, both on eth1 using the 10.0.0.x range. (I’ll pick 10.0.0.1 for machine1 and 10.0.0.2 for machine2 to make it easy).
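To get equally sized backing devices, the simplest thing is to create the logical volume with the exact same command on both hosts. The sketch below only prints the commands instead of running them. `plan_backing_lv` is a made-up helper; the volume group name Storage and the volume name drbd_test match the paths used in this post, while the 10G size and running things over ssh as root are my assumptions.

```shell
#!/bin/sh
# Print (not run) the command that creates one backing volume per host.
# Usage: plan_backing_lv VG LV SIZE HOST...
plan_backing_lv() {
    vg="$1"; lv="$2"; size="$3"; shift 3
    for host in "$@"; do
        echo "ssh root@$host lvcreate -L $size -n $lv $vg"
    done
}

# Prints one lvcreate line for each of the two hosts.
plan_backing_lv Storage drbd_test 10G machine1 machine2
```

Review the printed lines and paste them into a shell once they look right.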

Time for a test setup: the configuration for one DRBD device, or ‘resource’ as DRBD calls it.
/etc/drbd.conf includes /etc/drbd.d/global_common.conf and /etc/drbd.d/*.res.
global_common.conf is fine with the default values for now (Protocol C, usage stats enabled and some handlers), so let’s create a resource in /etc/drbd.d/test.res:

resource test {
on machine1 {
device /dev/drbd1;
disk /dev/Storage/drbd_test;
address 10.0.0.1:7789;
meta-disk internal;
}
on machine2 {
device /dev/drbd1;
disk /dev/Storage/drbd_test;
address 10.0.0.2:7789;
meta-disk internal;
}
}

Easy enough. Let’s test it!
On both hosts you’ll have to initialize the resource.
First on machine1. Note that at this point my network is still down 😉

root@newserver# drbdadm create-md test
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
root@newserver# drbdadm up test
root@newserver# cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root@newserver, 2011-02-02 14:54:41

1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----s
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:10485404

root@newserver# ls -la /dev/drbd/by-res/
total 0
drwxr-xr-x 2 root root 60 Feb 2 15:25 ./
drwxr-xr-x 4 root root 80 Feb 2 15:25 ../
lrwxrwxrwx 1 root root 11 Feb 2 15:25 test -> ../../drbd1

Looks good, it’s not even complaining because of the second host being unreachable.
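The status line in /proc/drbd packs the connection state (cs), the local/peer roles (ro) and the local/peer disk states (ds) into one row. If you want to use those values in scripts, something like this works. It’s a small sketch; `drbd_field` is a hypothetical helper name, and the sample line is copied from the output above.

```shell
#!/bin/sh
# Extract one "key:value" field (cs, ro or ds) from a /proc/drbd status line.
# Usage: drbd_field KEY "STATUS LINE"
drbd_field() {
    key="$1"; line="$2"
    # Rely on word splitting to walk the space-separated fields.
    for word in $line; do
        case "$word" in
            "$key":*) echo "${word#*:}"; return 0 ;;
        esac
    done
    return 1
}

line='1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----s'
echo "connection: $(drbd_field cs "$line")"
echo "roles:      $(drbd_field ro "$line")"
echo "disks:      $(drbd_field ds "$line")"
```

For the sample line this reports WFConnection (waiting for the peer), Secondary/Unknown and Inconsistent/DUnknown.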

Let’s fix that. Here goes the second host. It’s a dd-made replica of the first host, taken after the last step. Needless to say I had to recompile the dom0 kernel because of an issue with cpufreq and the cheap Sempron. cpufreq seemed to be the cause, so I removed it from the kernel completely (it’s just for testing after all):

root@machine2:~# drbdadm create-md test
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
root@machine2:~# drbdadm up test
root@machine2:~# cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root@machine2, 2011-03-10 11:25:33

1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:10485404

Yay, they’re connected. The state of the device is still inconsistent since DRBD doesn’t know what side might contain valid data.
In theory we could upgrade an existing block device to DRBD and then make that existing one the ‘primary‘ to use as synchronisation source. If your disks are empty it doesn’t matter which machine you run this command from. I’ll pick machine1:

root@machine1:~# drbdadm -- --overwrite-data-of-peer primary test
root@machine1:~# cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root@machine1, 2011-02-02 14:54:41

1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:41856 nr:0 dw:0 dr:42520 al:0 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:10443548
[>....................] sync'ed: 0.5% (10196/10236)M
finish: 2:11:51 speed: 1,316 (1,308) K/sec

And the other machine shows something similar:

root@machine2:~# cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root@machine2, 2011-03-10 11:25:33

1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
ns:0 nr:92032 dw:92032 dr:0 al:0 bm:5 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:10393372
[>....................] sync'ed: 0.9% (10148/10236)M
finish: 2:13:14 speed: 1,292 (1,296) want: 250 K/sec

Looks good. However, it’s slow. Given that this is a 100 Mbit link, it should manage at least a few MB/s, ideally 10 MB/s.
The speed is regulated by the syncer, whose rate you can change in your configuration files.
For this one-time sync we’ll speed it up to ludicrous speed:

root@machine1:~# drbdsetup /dev/drbd1 syncer -r 110M
# A little later on machine 2
root@machine2:~# cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root@machine2, 2011-03-10 11:25:33

1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
ns:0 nr:2241024 dw:2241024 dr:0 al:0 bm:136 lo:1 pe:3059 ua:0 ap:0 ep:1 wo:b oos:8244380
[===>................] sync'ed: 21.5% (8048/10236)M
finish: 0:11:23 speed: 12,044 (4,860) want: 112,640 K/sec

Now it does 12MB/s, which is a lot faster.
If you want to make this permanent, put it in your resource configuration, which is /etc/drbd.d/test.res for us now.

resource test {
syncer {
rate 110M;
}
# other stuff, leave it there :)
}

Note that DRBD suggests using only about 30% of your available bandwidth for synchronisation. This 100 Mbit link manages roughly 11 MB/s, so instead of 110M we should use about 3.3M for the permanent config. Sounds reasonable.
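The arithmetic behind that number is simple enough to script. The sketch below takes the throughput you actually measured (in K/sec, as shown in /proc/drbd) and a percentage, and prints a value with a K suffix. `sync_rate` is a made-up helper, and the 30% default is just the rule of thumb mentioned above.

```shell
#!/bin/sh
# Suggest a syncer rate as a percentage of measured replication throughput.
# Usage: sync_rate THROUGHPUT_K_PER_S [PERCENT]   (PERCENT defaults to 30)
sync_rate() {
    throughput_k="$1"
    percent="${2:-30}"
    # Integer arithmetic is precise enough for a rule of thumb.
    echo "$(( throughput_k * percent / 100 ))K"
}

sync_rate 11000    # roughly 11 MB/s on the 100 Mbit link, prints 3300K
```

Using whole K values also sidesteps the question of whether the config parser accepts a fractional value like 3.3M, which I haven’t verified.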

After changing your configuration files, you need to synchronise them on both hosts (they should always be identical on all nodes) and then let drbd know that they’ve changed with:

root@machine1:~# rsync -ar /etc/drbd.d/ machine2:/etc/drbd.d/
root@machine1:~# drbdadm adjust test
# AND on the other node as well!
root@machine1:~# ssh machine2 'drbdadm adjust test'

It’s probably wise to put those commands in a script, combined with some ssh-key magic. I’ll call this script drbd-sync-cfg.
Mine looks like this:

#!/bin/bash
# Push the DRBD config to machine2 and apply it on both nodes.
set -e
rsync -ar /etc/drbd.d/ machine2:/etc/drbd.d/
drbdadm adjust test
ssh -i /root/.ssh/drbd machine2 'drbdadm adjust test'

Anyway, after a few minutes of syncing it should be done and the status should be like this:

root@machine1:~# drbd-overview
1:test Connected Primary/Secondary UpToDate/UpToDate C r-----
root@machine1:~# cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root@machine1, 2011-02-02 14:54:41

1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:10485404 nr:0 dw:0 dr:10486424 al:0 bm:640 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
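Instead of staring at /proc/drbd until the sync finishes, you can let a script wait for it. The following is a sketch with made-up function names (`drbd_in_sync`, `wait_for_sync`); it treats a resource as done when its line shows cs:Connected and ds:UpToDate/UpToDate, as in the output above. The retry count and delay defaults are arbitrary.

```shell
#!/bin/sh
# Succeed once every resource line in a /proc/drbd-style file is fully synced.
# Usage: drbd_in_sync [STATUS_FILE]   (defaults to /proc/drbd)
drbd_in_sync() {
    status_file="${1:-/proc/drbd}"
    # There must be at least one resource line...
    grep -q ' cs:' "$status_file" || return 1
    # ...and none of them may be anything but Connected with UpToDate/UpToDate.
    ! grep ' cs:' "$status_file" | grep -qv 'cs:Connected .*ds:UpToDate/UpToDate'
}

# Poll until synced; give up after MAX_TRIES checks, DELAY seconds apart.
# Usage: wait_for_sync [STATUS_FILE] [MAX_TRIES] [DELAY]
wait_for_sync() {
    status_file="${1:-/proc/drbd}"; max_tries="${2:-720}"; delay="${3:-10}"
    tries=0
    until drbd_in_sync "$status_file"; do
        tries=$((tries + 1))
        [ "$tries" -ge "$max_tries" ] && return 1
        sleep "$delay"
    done
}
```

Something like `wait_for_sync && echo "sync done"` would then block until both sides are UpToDate.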

Xen + DRBD == Live migration!

Time to introduce Xen and DRBD to each other. First we’ll enable live migration on our resource by adding this to the resource config /etc/drbd.d/test.res:

resource test {
net {
allow-two-primaries;
}
# Leave the rest as before.
}

Next run your drbd-sync-cfg script.

Now that we have a DRBD block device that should be capable of running a Xen domU … let’s test it!
I’ll copy my test slackware domU from earlier onto the DRBD device and then we’ll try booting it.

root@machine1:~# dd if=/xen/hosts/test/slackware64_v13.1.img of=/dev/drbd1
# takes a while, see 'top' and you'll see dd, and some drbd processes doing stuff :)
# Or, from a separate shell:
root@machine1:~# killall -USR1 dd
# and the dd process will barf:
4124409+0 records in
4124409+0 records out
2111697408 bytes (2.1 GB) copied, 251.721 s, 8.4 MB/s
# After a while:
20970809+0 records in
20970808+0 records out
10737053696 bytes (11 GB) copied, 1277.9 s, 8.4 MB/s
# Note that mounting the image and drbd1 device and then running rsync is probably faster and works as well ;)
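One thing the dd output hints at: it read one more record than it wrote, and the byte count matches the DRBD device size (10485404 KiB) rather than the 10240 MiB image. The DRBD device appears to be slightly smaller than its backing volume because of the internal metadata, so the image doesn’t quite fit. A size check before copying would catch this. This is a sketch: `fits_on_device` is a hypothetical helper and the numbers are taken from the output in this post.

```shell
#!/bin/sh
# Succeed when an image of IMAGE_BYTES fits on a device of DEVICE_BYTES.
# Usage: fits_on_device IMAGE_BYTES DEVICE_BYTES
fits_on_device() {
    [ "$1" -le "$2" ]
}

image_bytes=$((10240 * 1024 * 1024))   # the 10240 MiB image created earlier
device_bytes=$((10485404 * 1024))      # the 10485404 KiB reported by /proc/drbd
if ! fits_on_device "$image_bytes" "$device_bytes"; then
    echo "image is $((image_bytes - device_bytes)) bytes larger than the device"
fi
```

On a live system the two sizes could come from `stat -c %s` on the image and `blockdev --getsize64` on /dev/drbd1. With the numbers above the image comes out 364544 bytes too large, so for anything beyond testing it is probably safer to make the backing volume a bit bigger than the image.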

Next, change the config file for the domU. Xen will use the block-drbd script if you change the disk line:

# disk = [ 'file:/xen/hosts/test/slackware64_v13.1.img,xvda1,w' ]
disk = [ 'drbd:test,xvda1,w' ]

Let’s see if it runs 🙂


root@machine1:~# xm create -c /xen/hosts/test/testdomain.cfg
# bootup stuff appears
Welcome to Linux 2.6.32.27-BenV-g75cc13f (hvc0)

darkstar login:
# hit ctrl-]
root@machine1:~# xm list
Name ID Mem VCPUs State Time(s)
Domain-0 0 768 1 r----- 2135.5
test 1 768 2 -b---- 4.4
root@machine1:~# xm migrate --live test 10.0.0.2
root@machine1:~# xm list
Name ID Mem VCPUs State Time(s)
Domain-0 0 768 1 r----- 2168.9
root@machine1:~# drbd-overview
1:test Connected Secondary/Primary UpToDate/UpToDate C r-----

Note how after the migration machine1 has become the ‘Secondary’ on the DRBD device.
Needless to say the migrated test domU didn’t show many signs of being moved, apart from a single line in its dmesg.
After shutting down the domU on machine2 we can start it again on machine1 and the DRBD roles will be fixed automagically by Xen (using the block-drbd script).
Isn’t this cool? 🙂

Since this post is way too long already I’ll leave it at this for today.





1 Comment for this entry

  • cron0

    Very nice post! We also use Xen/DRBD to provide redundancy to our VMs. We use them in an Active/Active configuration where we have 2 DRBD resources on each server and the first host runs one VM and the other one runs another. This way we don’t have a standby server doing nothing and we maximize resource usage.

    Also you can skip the initial synchronization if your disks do not contain any data you wish to keep. For that you simply clear the bitmap on the resource and it’ll instantly be ready for use:
    # drbdadm -- --clear-bitmap new-current-uuid drbd0
