BenV's notes

Xen 4.2.0 on Slackware64 14.0

Nov. 20, 2012, under Software

Yay, a new Xen version! Well, it’s not that new, but I’m upgrading to it today. And while we’re at it: Jeremy got his pvops kernel almost to version 3.1.0 (it’s at rc9 today, good enough for me for now).
So what’s new in this latest Xen version? First of all, the xm command is finally on its way out: it’s still there, but it’s deprecated now that xl has replaced it. For a good overview of what has improved since Xen 4.0, the Xen folks have published a nice list.
One cool thing in the later Xen releases (that is: 4.0 and up) is the integration of Remus. We’ll test that out later.

A few things have changed since my last Xen installation on Slackware, so here’s an updated Xen on Slackware64 howto. We’ll start right after the installation of Slackware64; if you can’t handle that, you’d better run. I also assume you already have grub2 up and running, and the same goes for (software) RAID arrays etc.

Xen 4.2.0 Installation on Slackware64 v14.0


root@xen420:~# mkdir -p /usr/src/xen
root@xen420:~# cd /usr/src/xen
root@xen420:/usr/src/xen# wget http://bits.xensource.com/oss-xen/release/4.2.0/xen-4.2.0.tar.gz
root@xen420:/usr/src/xen# tar axvf !$:t
root@xen420:/usr/src/xen# chown -R root:root xen-4.2.0
root@xen420:/usr/src/xen# cd xen-4.2.0
root@xen420:/usr/src/xen/xen-4.2.0# make -j4 xen
# Time passes
root@xen420:/usr/src/xen/xen-4.2.0# make install-xen
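The `!$` shortcuts above are bash history expansion, which only works in an interactive shell. If you’d rather script the download-and-unpack step, here’s a minimal sketch that spells the names out instead. The function names are my own, and it assumes the tarball unpacks into a directory named after it (NAME.tar.gz unpacks into NAME/), which holds for the Xen release tarball:

```shell
#!/bin/bash
# Sketch: download and unpack a release tarball without relying on
# interactive history expansion (!$). Function names are hypothetical.
set -eu

# Same as !$:t on a URL: strip everything up to the last slash.
tarball_name() {
    printf '%s\n' "${1##*/}"
}

# Directory the tarball is assumed to unpack into (NAME.tar.gz -> NAME).
source_dir_name() {
    local t
    t=$(tarball_name "$1")
    printf '%s\n' "${t%.tar.gz}"
}

# Fetch (if needed), unpack and chown the tree; prints the source directory.
fetch_and_unpack() {
    local url="$1" dest="$2" tarball dir
    tarball=$(tarball_name "$url")
    dir=$(source_dir_name "$url")
    mkdir -p "$dest"
    (
        cd "$dest"
        [ -f "$tarball" ] || wget "$url"
        tar xzf "$tarball"
        chown -R root:root "$dir"   # the extracted tree, not the tarball
    )
    printf '%s/%s\n' "$dest" "$dir"
}

# Usage, as root:
#   src=$(fetch_and_unpack http://bits.xensource.com/oss-xen/release/4.2.0/xen-4.2.0.tar.gz /usr/src/xen)
#   cd "$src" && make -j4 xen && make install-xen
```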

That’s the Xen hypervisor part done, but we also need the tools and a dom0 kernel in order to do something useful with it. Tools are next. They have a bunch of requirements that depend on the options you want enabled (like the “Trusted” boot garbage etc.). If you want HVM support (to run Windows, for example), you’ll need the ACPI ASL Compiler from Intel. It doesn’t seem possible to disable that anymore, so we’ll get the latest iasl first:

root@xen420:~# cd /usr/src
root@xen420:/usr/src# wget https://acpica.org/download/acpica-unix2-20121018.tar.gz
root@xen420:/usr/src# tar axvf !$:t
root@xen420:/usr/src# cd acpica-unix2-20121018
root@xen420:/usr/src/acpica-unix2-20121018# make -j4
root@xen420:/usr/src/acpica-unix2-20121018# ginstall -m755 generate/unix/bin64/iasl /usr/local/bin/

That covers the ACPI compiler. Back to the Xen tools installation. In the Xen README we notice another annoying new dependency called ‘yajl’. Yet Another Junk Library? No, it’s JSON. Run! Fine, we’ll get the piece of trash. Weird requirement for Xen if you ask me, but it’s probably for their API.

root@xen420:~# cd /usr/src
root@xen420:/usr/src# wget http://github.com/lloyd/yajl/tarball/2.0.1 -O yajl-2.0.1.tar.gz
root@xen420:/usr/src# tar axvf !$
# Fuck, tag a release version, Lloyd; nobody likes directory names like these:
root@xen420:/usr/src# mv lloyd-yajl-f4b2b1a yajl-2.0.1
root@xen420:/usr/src# cd !$
root@xen420:/usr/src/yajl-2.0.1# ./configure --prefix=/usr
root@xen420:/usr/src/yajl-2.0.1# make -j4
root@xen420:/usr/src/yajl-2.0.1# mkdir -p pkg /usr/src/packages && make install DESTDIR=$(pwd)/pkg && cd pkg
root@xen420:/usr/src/yajl-2.0.1/pkg# makepkg -l y -c n /usr/src/packages/yajl-2.0.1-x86_64-1.txz
root@xen420:/usr/src/yajl-2.0.1/pkg# installpkg /usr/src/packages/yajl-2.0.1-x86_64-1.txz
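Before running Xen’s configure, a quick check for the two dependencies installed above can save you a failed build. A minimal sketch: the function name and the ROOT override are hypothetical, and it simply checks that iasl is in PATH and that the yajl headers landed in /usr/include/yajl:

```shell
#!/bin/bash
# Sketch: check the two dependencies installed above before running Xen's
# ./configure. ROOT is a hypothetical prefix override (empty = live system).
ROOT="${ROOT:-}"

check_xen_build_deps() {
    local missing=0
    # iasl must be in PATH for the HVM firmware build.
    if ! command -v iasl >/dev/null 2>&1; then
        echo "missing: iasl (ACPI ASL compiler) in PATH" >&2
        missing=1
    fi
    # yajl's headers should have been installed by the package above.
    if [ ! -f "$ROOT/usr/include/yajl/yajl_parse.h" ]; then
        echo "missing: $ROOT/usr/include/yajl/yajl_parse.h" >&2
        missing=1
    fi
    return "$missing"
}

# Usage: check_xen_build_deps && ./configure
```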

Now that we’re done with the requirements, let’s finally get the Xen tools installed.

# First we run configure, change your requirements here if needed.
root@xen420:/usr/src/xen/xen-4.2.0# ./configure
# If you're missing dependencies like yajl it'll complain here. It shouldn't if you followed me so far.
root@xen420:/usr/src/xen/xen-4.2.0# make -j4 tools
# You might run into this error -- I discussed it on my previous Xen 4.0 post, it's still the same.
In file included from /usr/include/features.h:382:0,
from /usr/include/stdint.h:26,
from /usr/lib64/gcc/x86_64-slackware-linux/4.7.1/include/stdint.h:3,
from ../../../hvmloader/acpi/acpi2_0.h:21,
from ../util.h:4,
from tcgbios.c:27:
/usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory
# A quick workaround (the clean fix would be having 32-bit glibc headers installed, i.e. a multilib setup):
root@xen420:/usr/src/xen/xen-4.2.0# ln -s /usr/include/gnu/stubs-64.h /usr/include/gnu/stubs-32.h
root@xen420:/usr/src/xen/xen-4.2.0# make -j4 tools
root@xen420:/usr/src/xen/xen-4.2.0# make install-tools

There, the tools are installed. Note that we’re still running into the gnu/stubs-32.h error with this Xen release, but the fix is still the same. On to the next part, the dom0 kernel.

Xen dom0 kernel

The options for a dom0 kernel are still plentiful. I’ll be using Jeremy’s pvops kernel; you can read more about pvops on the XenParavirtOps page of the Xen wiki. Alternatively, they claim you can use a vanilla Linux kernel for dom0 these days, but I haven’t tried that lately. I’ll stick to what I know works 🙂
Building and installing this kernel is a simple but tedious and time-consuming process that goes a bit like this:

root@xen420:/usr/src/xen# git clone git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git linux-jeremy-git
# This'll take a bit
root@xen420:/usr/src/xen# cd linux-jeremy-git
# If you have an old configuration, I suggest you put it in place and run make oldconfig:
root@xen420:/usr/src/xen/linux-jeremy-git# cp /root/linux-jeremy-config-2.6 .config
root@xen420:/usr/src/xen/linux-jeremy-git# make oldconfig
# Either hold enter for a while and check everything in the menuconfig later, or seriously read and answer the questions that follow
root@xen420:/usr/src/xen/linux-jeremy-git# make menuconfig

You’ll end up in the famous kernel configuration menu. I’m sure you’ll manage, but here are some pointers on the options Xen dom0 needs. Since you started off with your current kernel configuration, most options are already set; make oldconfig only asked you about new or changed ones.
The options to look out for can be found by hitting / and typing ‘xen’. Enable them, except maybe the XEN_DEBUG option and the NETXEN_NIC driver (a NetXen network card driver that only matches because of its name).
After you’re done with the configuration, exit and save the menu. Let’s build the kernel:

root@xen420:/usr/src/xen/linux-jeremy-git# make bzImage modules modules_install
# See you in 10 minutes or so
root@xen420:/usr/src/xen/linux-jeremy-git# cp arch/x86_64/boot/bzImage /boot/vmlinuz-jeremy-3.1.0-rc9
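Before rebooting, it doesn’t hurt to double-check the resulting .config from the command line. A sketch: the option list is my own selection of the usual dom0 suspects (including the DEVTMPFS options that bit me further down), not an official checklist, and the function name is hypothetical:

```shell
#!/bin/bash
# Sketch: report which dom0-related options are enabled in a kernel .config.
# The option list below is my own selection, not an official checklist.
check_dom0_config() {
    local config="${1:-.config}" opt status=0
    for opt in CONFIG_XEN CONFIG_XEN_DOM0 CONFIG_XEN_DEV_EVTCHN \
               CONFIG_XENFS CONFIG_DEVTMPFS CONFIG_DEVTMPFS_MOUNT; do
        # Built in (=y) or as a module (=m) both count as enabled here.
        if grep -q "^${opt}=[ym]\$" "$config"; then
            echo "ok: $opt"
        else
            echo "missing: $opt"
            status=1
        fi
    done
    return "$status"
}

# Usage, from the kernel tree: check_dom0_config .config
```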

The kernel is in place, now all that’s left is adding it to Grub. You ARE running grub(2), right? 😉
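Here’s my /boot/grub/grub.cfg blurb (if your grub.cfg is generated by grub-mkconfig, put the entry in /etc/grub.d/40_custom instead, or the next regeneration will wipe it):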

menuentry "Xen 4.2.0 / Linux 3.1.0-rc9" {
    echo Xen 4.2.0 / Linux 3.1.0-rc9 loading...
    multiboot /boot/xen-4.2.0.gz dom0_mem=768M,max:768M dom0_vcpus_pin dom0_max_vcpus=1 loglvl=all guest_loglvl=all vga=current,keep
    module /boot/vmlinuz-jeremy-3.1.0-rc9 root=/dev/md0 ro nomodeset console=hvc0 earlyprintk=xen
}

Now all that’s left is starting the Xen services when your system boots. You could do that manually for now, or add this to /etc/rc.d/rc.local:

if [ -d /proc/xen ]
then
    /etc/init.d/xencommons start
    # This one is optional; it'll start the Xen domains whose configs you put in /etc/xen/auto:
    /etc/init.d/xendomains start
fi
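Slackware’s rc.6 runs /etc/rc.d/rc.local_shutdown during shutdown and reboot if that file is executable, so a matching stop block could live there. A sketch, assuming the stop actions of the same Xen init scripts; stopping xendomains first gives the domUs a chance to shut down or be saved while xenstored is still around:

```shell
#!/bin/sh
# /etc/rc.d/rc.local_shutdown (sketch): counterpart of the rc.local block.
# rc.6 only runs this file if it is executable: chmod +x /etc/rc.d/rc.local_shutdown
if [ -d /proc/xen ]
then
    # Stop the domUs first, while the Xen daemons are still running.
    /etc/init.d/xendomains stop
    /etc/init.d/xencommons stop
fi
```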

These lines start the daemons Xen needs, like xenstored; without it, the Xen tools can’t communicate with the hypervisor.

One issue I ran into after upgrading from Xen 4.0 with Jeremy’s pvops 2.6.32.something kernel to Xen 4.2 with Jeremy’s 3.1.0-rc9 kernel was a nicely segfaulting xenstored. I noticed the segfault in dmesg, but obviously no reason for the crash was given. At the same time, a simple command such as xl list would hang completely with no way to kill it (ctrl-c, ctrl-z etc. didn’t do anything; it seemed to be waiting for something in the kernel).
The reason became clearer when I started xenstored manually: it complained about evtchn missing. That seemed like a complete lie, since evtchn was definitely in my kernel (built in, not as a module, mind you). However, xenstored doesn’t check what’s in the kernel; it simply tries to open /dev/xen/evtchn, which didn’t exist.
Why? Upgrading from 2.6.x to 3.x introduced a ton of new kernel options. The missing ones in this case weren’t new to me (it’s a udev issue I ran into before); I simply forgot ;). To repeat the solution: enable CONFIG_DEVTMPFS and CONFIG_DEVTMPFS_MOUNT under Device Drivers -> Generic Driver Options.
Without those options udev fails, which means no /dev/xen for you (among other things).
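If you hit the same xenstored segfault, a check along these lines points at the cause faster than dmesg did for me. A sketch: the function name is hypothetical, and FILESYSTEMS and DEVDIR are hypothetical overrides so it can be tried against a fake tree:

```shell
#!/bin/bash
# Sketch: check the two things behind the xenstored/evtchn failure.
# FILESYSTEMS and DEVDIR are hypothetical overrides for testing against a
# fake tree; the defaults are the real locations.
FILESYSTEMS="${FILESYSTEMS:-/proc/filesystems}"
DEVDIR="${DEVDIR:-/dev}"

check_evtchn() {
    local status=0
    # A kernel built with CONFIG_DEVTMPFS lists devtmpfs here.
    if ! grep -qw devtmpfs "$FILESYSTEMS"; then
        echo "kernel lacks devtmpfs (CONFIG_DEVTMPFS)" >&2
        status=1
    fi
    # xenstored opens this character device directly.
    if [ ! -c "$DEVDIR/xen/evtchn" ]; then
        echo "$DEVDIR/xen/evtchn is missing" >&2
        status=1
    fi
    return "$status"
}

# Usage: check_evtchn && /etc/init.d/xencommons start
```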

Anyhow, a reboot later and we’re up and running in Xen 4.2.0 with Linux 3.1.0-rc9 🙂

Remus

Time for something cool (at least in theory). With Remus we should be able to yank the plug on the Xen node running a domU, and the domU will magically keep running as seen from the outside, because the other Xen node takes over. Let’s see how well this works in reality.
First we’ll need a test domU. For testing I just took a generic Slackware skeleton domU.
To run a domU with Remus you need either replicated storage (think DRBD) or, if that’s not an option, an identical block device on both Xen nodes, so Remus can handle the storage updates.
Then again, I like to see how things fail, so here goes without setting any of that up. First I boot my domU, then I try to move it to my other Xen node, which doesn’t know about this domU yet.

root@xen01:/xen/hosts/test# xl create test.cfg
Parsing config from test.cfg
Daemon running with PID 2312
root@xen01:/xen/hosts/test# xl list
Name          ID   Mem  VCPUs  State    Time(s)
Domain-0       0   767      1  r-----     971.7
test           2   512      1  -b----       9.2
root@iejoor:/xen/hosts/test# xl remus test 192.168.1.7
root@192.168.1.7's password:
Saving to migration stream new xl format (info 0x0/0x0/174)
migration target: Ready to receive domain.
Loading new save file (new xl fmt info 0x0/0x0/174)
Savefile contains xl domain config
libxl: error: libxl_device.c:243:libxl__device_disk_set_backend: Disk vdev=xvda1 failed to stat: /dev/Storage/slacktest: No such file or directory
migration target: Domain creation failed (code -3).
libxl: error: libxl_utils.c:363:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 2 save/restore helper stdout pipe
libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 2 save/restore helper [2135] died due to fatal signal Broken pipe
remus sender: libxl_domain_suspend failed (rc=-3)
Remus: Backup failed? resuming domain at primary.
root@iejoor:/xen/hosts/test# xl list
Name          ID   Mem  VCPUs  State    Time(s)
Domain-0       0   768      1  r-----     979.1

What? Remus fails to transfer my domU, so it kills it? That’s terrible. To be fair, I was negligent: I didn’t give the test domU the same storage on both ends. Let’s try again (still without DRBD), but this time with a storage file. The disk configuration now says:
disk = ['tap:aio:/xen/hosts/test/test.img,xvda1,w']
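The copy itself is a one-liner along these lines. A sketch, assuming the domU is shut down while you copy (otherwise the image on the other node is inconsistent), that /xen/hosts/test already exists there, and that 192.168.1.7 is the other node as in the xl remus commands:

```shell
# Copy the domU image to the second Xen node while the domU is down.
# --sparse keeps a sparse image file sparse on the receiving side.
rsync -av --sparse /xen/hosts/test/test.img 192.168.1.7:/xen/hosts/test/
```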
And after running rsync to make sure this file exists on both nodes, we try again:

root@xen01:/xen/hosts/test# xl create test.cfg
Parsing config from test.cfg
Daemon running with PID 2843
root@xen01:/xen/hosts/test# xl remus test 192.168.1.7
root@192.168.1.7's password:
Saving to migration stream new xl format (info 0x0/0x0/228)
migration target: Ready to receive domain.
Loading new save file (new xl fmt info 0x0/0x0/228)
Savefile contains xl domain config
libxl: cannot execute /usr/lib/xen/bin/libxl-save-helper: No such file or directory
libxl: error: libxl_utils.c:363:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 4 save/restore helper stdout pipe
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: domain 4 save/restore helper [2742] exited with error status 255
libxl: error: libxl_create.c:901:domcreate_rebuild_done: cannot (re-)build domain: -3
migration target: Domain creation failed (code -3).
libxl: error: libxl_utils.c:363:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 6 save/restore helper stdout pipe
libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 6 save/restore helper [2850] died due to fatal signal Broken pipe
remus sender: libxl_domain_suspend failed (rc=-3)
Remus: Backup failed? resuming domain at primary.
root@xen01:/xen/hosts/test# xl list
Name          ID   Mem  VCPUs  State    Time(s)
Domain-0       0   768      1  r-----    1423.2

Dead again. But wait, this time there’s an error hidden in there: my second node seems to be missing the /usr/lib/xen/bin/libxl-save-helper binary. After some inspection it looks like a 32/64-bit fuckup. Somehow the Xen installation created both /usr/lib/xen and /usr/lib64/xen; I guess some of their tools need their autoconf rules checked. Anyhow, I moved everything to /usr/lib64/xen and symlinked the other directory. Third try:
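For the record, that cleanup boils down to something like this. It’s a sketch rather than exactly what I typed: the function name and the root-prefix argument are hypothetical (handy for trying it on a scratch tree first), where both trees contain the same file the copy from /usr/lib/xen wins, and it’s best done on both nodes with the Xen daemons stopped:

```shell
#!/bin/bash
# Sketch of the lib/lib64 cleanup: merge /usr/lib/xen into /usr/lib64/xen
# and leave a symlink so both paths resolve. The optional argument is a
# hypothetical root prefix for trying this on a scratch tree first.
set -eu

merge_xen_libdir() {
    local root="${1:-}"
    local old="$root/usr/lib/xen" new="$root/usr/lib64/xen"
    # Nothing to do if /usr/lib/xen is already a symlink or doesn't exist.
    if [ -L "$old" ] || [ ! -d "$old" ]; then
        return 0
    fi
    mkdir -p "$new"
    # Copy the contents (dotfiles included); same-named files get overwritten.
    cp -a "$old/." "$new/"
    rm -rf "$old"
    # Relative link, so it also works under a prefix: lib/xen -> ../lib64/xen
    ln -s ../lib64/xen "$old"
}

# Usage, as root on each Xen node: merge_xen_libdir
```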

root@xen01:/xen/hosts/test# xl remus test 192.168.1.7
root@192.168.1.7's password:
Saving to migration stream new xl format (info 0x0/0x0/228)
migration target: Ready to receive domain.
Loading new save file (new xl fmt info 0x0/0x0/228)
Savefile contains xl domain config

Well, this looks more promising. On the other Xen node we now see this:

root@xen02:~:0>xl list
Name          ID   Mem  VCPUs  State    Time(s)
Domain-0       0  2852      4  r-----     268.9
test--incoming 5   512      0  --p---       0.0

So far this is as promised: the domain is continuously being live migrated. In theory I should now be able to pull the plug on one of the machines, and the other should go “oh, I need to take over now”. While I was contemplating what to try next, however, this gem flew past:

[ 281.142163] ------------[ cut here ]------------
[ 281.142215] kernel BUG at arch/x86/xen/irq.c:105!
[ 281.142224] invalid opcode: 0000 [#1] SMP
[ 281.142236] CPU 0
[ 281.142242] Modules linked in:
[ 281.142251]
[ 281.142259] Pid: 0, comm: swapper Not tainted 3.1.0-rc9-Owl-ICT-00087-g4c41042 #2
[ 281.142271] RIP: e030:[] [] xen_safe_halt+0x10/0x13
[ 281.142312] RSP: e02b:ffffffff8186df58 EFLAGS: 00010202
[ 281.142318] RAX: 0000000000000001 RBX: ffffffff8186dfd8 RCX: 00000000c0010055
[ 281.142325] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[ 281.142333] RBP: ffffffff8196adc0 R08: 0000000000000000 R09: 0000000000000000
[ 281.142340] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88001fffbc80
[ 281.142346] R13: ffffffffffffffff R14: 0000000000000000 R15: 0000000000000000
[ 281.142357] FS: 0000000000000000(0000) GS:ffff88001feb5000(0000) knlGS:0000000000000000
[ 281.142365] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 281.142372] CR2: 00000000f766d8f0 CR3: 000000001e34d000 CR4: 0000000000000660
[ 281.142382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 281.142389] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 281.142396] Process swapper (pid: 0, threadinfo ffffffff8186c000, task ffffffff81879020)
[ 281.142404] Stack:
[ 281.142407] ffffffff8100f723 ffffffff8100f8ba 0000000000000000 000000008186dfd8
[ 281.142421] ffffffff8196adc0 ffffffff81008204 ffffffff81965e70 ffffffff81927aaf
[ 281.142434] 0000000000000000 ffffffff8196adc0 0000000000000000 ffffffff81ccc000
[ 281.142447] Call Trace:
[ 281.142461] [] ? default_idle+0x21/0x3b
[ 281.142470] [] ? amd_e400_idle+0xdf/0xe4
[ 281.142480] [] ? cpu_idle+0x5c/0x99
[ 281.142503] [] ? start_kernel+0x32e/0x339
[ 281.142512] [] ? xen_start_kernel+0x579/0x57f
[ 281.142518] Code: c0 0f b6 c0 48 f7 d8 25 00 02 00 00 c3 65 48 8b 04 25 00 b1 00 00 c6 40 01 01 c3 bf 01 00 00 00 31 f6 e8 a6 b6 ff ff 85 c0 74 02 <0f> 0b c3 ff 14 25 c0 ea 87 81 f6 c4 02 75 18 65 8b 34 25 40 d3
[ 281.142638] RIP [] xen_safe_halt+0x10/0x13
[ 281.142647] RSP
[ 281.142662] ---[ end trace 25ca6ac18424c5e4 ]---
[ 281.142669] Kernel panic - not syncing: Attempted to kill the idle task!
[ 281.142676] Pid: 0, comm: swapper Tainted: G D 3.1.0-rc9-Owl-ICT-00087-g4c41042 #2
[ 281.142684] Call Trace:
[ 281.142696] [] ? panic+0x95/0x194
[ 281.142709] [] ? do_exit+0x8b/0x6d1
[ 281.142729] [] ? do_raw_spin_unlock+0x5/0x8
[ 281.142738] [] ? _raw_spin_unlock_irqrestore+0x9/0x12
[ 281.142746] [] ? arch_local_irq_restore+0x7/0x8
[ 281.142754] [] ? kmsg_dump+0x40/0xc5
[ 281.142762] [] ? oops_end+0xaa/0xaf
[ 281.142769] [] ? do_invalid_op+0x87/0x91
[ 281.142777] [] ? xen_safe_halt+0x10/0x13
[ 281.142784] [] ? check_events+0x12/0x20
[ 281.142791] [] ? xen_force_evtchn_callback+0x9/0xa
[ 281.142799] [] ? check_events+0x12/0x20
[ 281.142807] [] ? invalid_op+0x1b/0x20
[ 281.142814] [] ? xen_safe_halt+0x10/0x13
[ 281.142822] [] ? xen_safe_halt+0xc/0x13
[ 281.142829] [] ? default_idle+0x21/0x3b
[ 281.142835] [] ? amd_e400_idle+0xdf/0xe4
[ 281.142842] [] ? cpu_idle+0x5c/0x99
[ 281.142849] [] ? start_kernel+0x32e/0x339
[ 281.142857] [] ? xen_start_kernel+0x579/0x57f

So much for that test domU; it crashed hard. Maybe something with power saving / CPU idle is ruining things (note amd_e400_idle and xen_safe_halt in the trace), but I’m no kernel hacker.

In the next retry I didn’t see this crash again. I pulled out the network cable, and w00t… it worked! A ping I had running from one of my servers on the network to the domU didn’t show any signs of packet loss whatsoever. Of course this isn’t a serious test, but it’s a first sign of cool stuff working as it should 🙂
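If you want slightly harder numbers than “no visible packet loss”, let ping count for you while you pull the cable and read its summary line. A sketch: the helper name is mine, 192.0.2.10 is a placeholder for the domU’s address, and the parsing assumes the iputils summary format (“N packets transmitted, M received, X% packet loss”):

```shell
#!/bin/bash
# Extract the packet-loss percentage from ping's summary on stdin.
# Assumes iputils-style output: "... received, 0% packet loss, time ...".
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage: ping 5 times a second for a minute, pull the cable meanwhile.
#   ping -i 0.2 -c 300 192.0.2.10 | packet_loss
```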
My conclusion for now is that Remus is cute for testing, but I wouldn’t trust my production domUs on it yet. It’s just too damn easy to crash a domain through either a configuration issue or a fat-finger mistake, and that’s not even counting the kernel panic.

I’ll give Remus another try later in combination with DRBD, but for now this post is long enough. Thanks for reading 🙂





