BenV's notes

Xen and booting domU using a vanilla kernel

Posted on Aug. 18, 2009, under Morons, Software

Our server in the datacenter is running Slackware (duh) with a nice Xen installation on it.
It's still on Xen 3 with PAE, and it has run without any major problems for over 2 years now, ever since the last time I updated/upgraded it 🙂
(ignoring the time that the power supply let go of the magic smoke)

So after the latest local root exploit thing, and testing it on some machines during HAR2009, I figured it might be nice to get the patch into my kernels. Running a webserver usually means you’re at risk after all, especially when it’s running WordPress 😉

So I grabbed the latest kernel that had that patch and went for the usual make menuconfig ordeal. After half an hour of selecting “yes”, “maybe”, “I guess so”, and some Xen options (CONFIG_PARAVIRT_GUEST=y, CONFIG_XEN=y, CONFIG_HVC_DRIVER=y, CONFIG_XEN_BLKDEV_FRONTEND=y, CONFIG_XEN_NETDEV_FRONTEND=y, CONFIG_XEN_KBDDEV_FRONTEND=y, CONFIG_HVC_XEN=y, CONFIG_XEN_BALLOON=y, CONFIG_XEN_SCRUB_PAGES=y, CONFIG_XENFS=y, CONFIG_XEN_COMPAT_XENFS=y), I built the thing.
As I usually build bzImages, I also did so here: make bzImage modules modules_install.
Half an hour later I had a shiny bzImage. Let’s try it on a test domain!
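
If you want to double-check that those options really made it into the .config before burning another half hour on a build, something like this does the trick. It's only a minimal sketch: check_xen_opts is a made-up helper name, and it looks for nothing but the =y options listed above.

```shell
# Hypothetical helper: print every Xen-related option from the list above
# that is NOT set to =y in the given kernel config file.
check_xen_opts() {
    config_file="$1"
    for opt in PARAVIRT_GUEST XEN HVC_DRIVER XEN_BLKDEV_FRONTEND \
               XEN_NETDEV_FRONTEND XEN_KBDDEV_FRONTEND HVC_XEN \
               XEN_BALLOON XEN_SCRUB_PAGES XENFS XEN_COMPAT_XENFS; do
        # -x matches whole lines only, so CONFIG_XEN=y is not
        # satisfied by CONFIG_XENFS=y.
        if ! grep -qx "CONFIG_${opt}=y" "$config_file"; then
            echo "missing: CONFIG_${opt}"
        fi
    done
}
```

Run it as check_xen_opts .config from the kernel source directory; no output means all of them are set.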

I copied the bzImage to /boot, pointed the kernel = "/boot/vmlinuz-xen-old" option at the new image, and GO!
This is what it told me:

root@iejoor:/xen/hosts/purple# xm create -c purple.cfg-newkernel
Using config file "./purple.cfg-newkernel".
Started domain purple
root@iejoor:/xen/hosts/purple#

Huh, where’s my console?
Checking xm list it seemed to be there… but paused… and with constantly increasing ids. Waaait a minute!
Checking the log files (xend.log in this case) revealed a secret:

[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices irq.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices vkbd.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices vfb.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices console.
[2009-08-18 16:40:01 16116] DEBUG (DevController:167) Waiting for 0.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices pci.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices ioports.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices tap.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices vtpm.
[2009-08-18 16:40:01 16116] INFO (XendDomain:1165) Domain purple (35) unpaused.
[2009-08-18 16:40:01 16116] WARNING (XendDomainInfo:1240) Domain has crashed: name=purple id=35.
[2009-08-18 16:40:01 16116] DEBUG (XendDomainInfo:1879) XendDomainInfo.destroy: domid=35
[2009-08-18 16:40:01 16116] DEBUG (XendDomainInfo:1896) XendDomainInfo.destroyDomain(35)

Note the ‘WARNING’ line. Crashed?!
Gee, that’s … interesting. Why? Took me 5 minutes to find the other log file, but xend-debug.log had a magic line:
ERROR Invalid kernel: xc_dom_find_loader: no loader found
… great. Another why.
In case you’re wondering, the increasing ids in ‘xm list’ were caused by the on_crash = 'reboot' line in the xen host config.
I quickly destroyed the domain and changed it to a oneshot try: on_crash = 'destroy'

After some searching another hint presented itself. Aren’t they nice?
This hint was: “Xen is a retarded piece of cancer and can’t decypher the bzImage format, try vmlinux instead”. Aha!
Back to the kernel, make vmlinux, copy, and another try to boot it. Obviously this was way too simple. It still crashed.
Running file on my old kernel, the one that works for both dom0 and domU, I noticed that a gzip-compressed image should at least work.

# file /boot/vmlinuz-2.6*
/boot/vmlinuz-2.6.18.8-xen: gzip compressed data, from Unix, last modified: Tue Nov 25 16:13:16 2008, max compression
/boot/vmlinuz-new: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
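
What file does there boils down to peeking at the first few bytes of the image. Here is a rough sketch of the same check, assuming od and tr are available. kernel_kind is a made-up helper name, and it only knows the gzip and ELF magic numbers; anything else (a bzImage, for instance) ends up as "other".

```shell
# Hypothetical helper: classify a kernel image by its first four bytes.
kernel_kind() {
    # Dump the first 4 bytes as hex and strip spaces and newlines.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$magic" in
        1f8b*)    echo "gzip" ;;   # gzip magic: 1f 8b
        7f454c46) echo "elf" ;;    # ELF magic: 7f 'E' 'L' 'F'
        *)        echo "other" ;;
    esac
}
```

For example, kernel_kind vmlinux should say elf, and after gzip -9 -c vmlinux > vmlinux.gz the compressed copy should say gzip, just like the old Xen kernel. Whether Xen then boots it is another matter.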

Some more details about the crashing domains can be found using xm dmesg.
It gave me something like:

(XEN) traps.c:413:d38 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S (ff18928e)
(XEN) Domain 38 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-3.2.3 x86_32p debug=n Not tainted ]----
(XEN) CPU: 3
(XEN) EIP: e019:[]
(XEN) EFLAGS: 00000282 CONTEXT: guest
(XEN) eax: 8000c068 ebx: c064c040 ecx: 80000000 edx: 00000cf8
(XEN) esi: c0651f3c edi: c0651f30 ebp: c06981b8 esp: c0651f14
(XEN) cr0: 8005003b cr4: 000006f0 cr3: 00bd5c80 cr2: 00000000
(XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
(XEN) Guest stack trace from esp=c0651f14:
(XEN) 00000000 c03e9e60 0001e019 00010082 c04bd0cd 00000068 00000000 00000000
(XEN) 00002003 00000000 00003030 00000002 00000007 c064c07c c065dfd8 00000000
(XEN) c06423c0 c064c100 c0651fd8 c065a63f 00000005 00000000 00000000 00000000
(XEN) c065d02b 006faee4 00000000 00000000 c056dad4 00100000 00000000 00100000
(XEN) 00100000 00000000 006faee4 c065d27c 006faee4 00000000 00000000 c0651fe8
(XEN) 00000000 00000000 00000000 c0651fe8 00000000 00000000 c065756f c056dd50
(XEN) c04c5020 c0657073 c0651ff4 c065973e 00000000 17898175 00800001 03040800
(XEN) 00100f22 00000000 c08e0000 c04ba91b c04ba923 c0103371 c0103878 c0659a92
(XEN) c01039c7 c0103afc c0103d1a c0104004 c0104163 c01041eb c010451d c010457c
(XEN) c0659eb7 c0659ebf c04ba975 c04baa22 c0105536 c01055ae c0105781 c0105c5c
(XEN) c0105c9e c0106253 c0106345 c0106831 c010683d c0107b07 c010825e c0108333
(XEN) c065a45c c065a46a c0108e56 c065a982 c065a98a c0109bf0 c065af2c c065af93
(XEN) c065afe9 c065b042 c065b09b c065b0f4 c065b14d c065b1a6 c065b1ff c065b258
(XEN) c065b2b1 c065b32d c065b341 c065b397 c065b3f0 c065b449 c065c14f c065c157
(XEN) c010aee5 c010aef3 c010af77 c010af85 c065d7be c065d7d1 c010bdc3 c010be22
(XEN) c010c113 c010c14a c010c1c6 c010c2f3 c010c41a c010c443 c010c4dd c010d267
(XEN) c010d4cb c010d77b c010d782 c010de76 c010defd c010df0c c010df3f c010df49
(XEN) c010df84 c010df8c c010df94 c04bb1d7 c04bb466 c04bb4db c04bb5b9 c04bb60f
(XEN) c04bb62f c04bbb8b c04bbd60 c065de79 c065de81 c04bc041 c04bc46b c04bc50d
(XEN) c04bc899 c04bc902 c065e012 c065e01a c065e028 c065e030 c065e038 c065e040

As you can see, that’s really useful….. 😉

A useful page with some info can be found here. It confirms that a gz kernel should work (but bz probably doesn’t) since I’m still running Xen 3.2. However, I can’t find a good reason for why it won’t work.

Google to the rescue! Hard to find a useful keyword, since most xen kernel issues seem to be similar, but eventually I found this tidbit: a tiny kernel patch.
Weird that this should be needed, one would hope that basic shit like this would work after 8 kernel versions, but obviously they’re all retards when it comes to this.
Thanks Jeremy Fitzhardinge for the patch!
Just for quick reference, this is the actual patch:

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 28e5f59..e2485b0 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -356,7 +356,7 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 #endif
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_PCI)
 	/* check CPU config space for extended APIC ID */
-	if (c->x86 >= 0xf) {
+	if (cpu_has_apic && c->x86 >= 0xf) {
 		unsigned int val;
 		val = read_pci_config(0, 24, 0, 0x68);
 		if ((val & ((1 << 17) | (1 << 18))) == ((1 << 17) | (1 << 18)))

So much for 'vanilla'.

However, it boots now:

root@iejoor:/xen/hosts/purple# xm create -c purple.cfg-newkernel
Using config file "./purple.cfg-newkernel".
Started domain purple
Reserving virtual address space above 0xf5800000
Linux version 2.6.30.5-jemoeder (root@iejoor) (gcc version 4.1.2) #9 SMP Tue Aug 18 22:59:38 CEST 2009
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
NSC Geode by NSC
Cyrix CyrixInstead
Centaur CentaurHauls
Transmeta GenuineTMx86
Transmeta TransmetaCPU
UMC UMC UMC UMC
ACPI in unprivileged domain disabled
... and a lot more yadda yadda until it hits a new wall: the root device.
Well, that sounds solvable.

In fact, here's a solution. What you say?
"AAAAH, I GET NOTHING, IT WON'T EVEN CRASH ANYMORE, JUST NO OUTPUT???!"
Ah yeah, I forgot to mention: they changed the console device (it's hvc0 now) as well as the block devices (xvd* instead of sd*). So here's a solution for both issues:
Edit your Xen host config file and make it so:

root = "/dev/xvda1 ro"
extra = "xencons=hvc0"

(xvda1 is what used to be sda1, you can figure out the rest).
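
Part of "the rest" is the domU's own fstab, which will still point at the old names. Here is a minimal sketch for converting it, assuming the fstab refers to plain /dev/sdXN devices (labels or UUIDs need nothing). sd_to_xvd is a made-up helper name.

```shell
# Hypothetical helper: rewrite /dev/sdXN device names to /dev/xvdXN.
# Reads an fstab on stdin and writes the converted version to stdout;
# nothing is modified in place.
sd_to_xvd() {
    sed -e 's|/dev/sd\([a-z][0-9]*\)|/dev/xvd\1|g'
}
```

Use it like sd_to_xvd < /etc/fstab > /tmp/fstab.new and read the result before copying it over the real thing.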

And after all this: HAHA! Success is mine!
Except of course for the undeniable fact that my domU has a fucked up fstab and all, but no issues there 🙂
Now go mess up your own system! Oh, I see... you already did. Good luck fixing it 😉

Update

I just tested a testing kernel -- 2.6.31-rc8, and it seems like they fixed it. Probably has been fixed in the testing branch for quite a while now, but still not in stable 2.6.30 🙂
Let's hope they release 2.6.31 soon.



