BenV's notes

Tag: kernel

Linux 3.8 and NVIDIA driver

by on Feb.21, 2013, under Software

Good news everyone! Linux 3.8 has been released! Obviously I immediately fired up my compiler to upgrade from the by now ancient 3.6.8 kernel that I was running.
After rebooting my Slackware64 machine into the new kernel without a problem it was time to recompile the NVIDIA binary blob. You know, this piece of garbage. It doesn’t matter if you pick the latest official release version (seems to be 310.32 atm) or the beta that I picked, it won’t compile.
Running the installer will fail and leave you a /var/log/nvidia-installer.log that contains something like this:

root@machine:/# tail /var/log/nvidia-installer.log
/tmp/selfgz7139/NVIDIA-Linux-x86_64-313.18/kernel/nv.c: In function ‘nv_kern_open’:
/tmp/selfgz7139/NVIDIA-Linux-x86_64-313.18/kernel/nv.c:1521:30: warning: passing argument 2 of ‘request_irq’ from incompatible pointer type [enabled by default]
In file included from /tmp/selfgz7139/NVIDIA-Linux-x86_64-313.18/kernel/nv-linux.h:128:0,
from /tmp/selfgz7139/NVIDIA-Linux-x86_64-313.18/kernel/nv.c:13:
include/linux/interrupt.h:130:1: note: expected ‘irq_handler_t’ but argument is of type ‘enum irqreturn_t (*)(int, void *, struct pt_regs *)’
/tmp/selfgz7139/NVIDIA-Linux-x86_64-313.18/kernel/nv.c:1525:17: error: implicit declaration of function ‘NV_TASKQUEUE_INIT’ [-Werror=implicit-function-declaration]
/tmp/selfgz7139/NVIDIA-Linux-x86_64-313.18/kernel/nv.c:1537:25: warning: passing argument 2 of ‘request_irq’ from incompatible pointer type [enabled by default]
In file included from /tmp/selfgz7139/NVIDIA-Linux-x86_64-313.18/kernel/nv-linux.h:128:0,
from /tmp/selfgz7139/NVIDIA-Linux-x86_64-313.18/kernel/nv.c:13:
include/linux/interrupt.h:130:1: note: expected ‘irq_handler_t’ but argument is of type ‘enum irqreturn_t (*)(int, void *, struct pt_regs *)’
cc1: some warnings being treated as errors
make[3]: *** [/tmp/selfgz7139/NVIDIA-Linux-x86_64-313.18/kernel/nv.o] Error 1
make[2]: *** [_module_/tmp/selfgz7139/NVIDIA-Linux-x86_64-313.18/kernel] Error 2
NVIDIA: left KBUILD.
nvidia.ko failed to build!
make[1]: *** [module] Error 1
make: *** [module] Error 2
-> Error.
ERROR: Unable to build the NVIDIA kernel module.

Fortunately here’s a little patch you can run to fix it. This assumes you have your linux 3.8 kernel sources symlinked in /usr/src/linux!

root@machine:/usr/src# wget -q ftp://download.nvidia.com/XFree86/Linux-x86_64/313.18/NVIDIA-Linux-x86_64-313.18.run
root@machine:/usr/src# bash NVIDIA-Linux-x86_64-313.18.run -x
root@machine:/usr/src# cd NVIDIA-Linux-x86_64-313.18
root@machine:/usr/src/NVIDIA-Linux-x86_64-313.18# wget -q http://notes.benv.junerules.com/wp-content/uploads/2013/02/nvidia-313.18-linux-3.8.patch
root@machine:/usr/src/NVIDIA-Linux-x86_64-313.18# patch -p1 < nvidia-313.18-linux-3.8.patch
root@machine:/usr/src/NVIDIA-Linux-x86_64-313.18# ./nvidia-installer # now it should work

Works for me at least 😉
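Since the patch only makes sense when /usr/src/linux really points at the tree you are building against, here is a rough sketch of a sanity check. It reads the VERSION, PATCHLEVEL and SUBLEVEL lines from the top-level kernel Makefile; the function name kernel_src_version is my own invention and not part of any tool.

```sh
# Print the version of the kernel source tree in directory $1
# (default: /usr/src/linux), read from its top-level Makefile.
# kernel_src_version is a hypothetical helper name.
kernel_src_version() (
    makefile="${1:-/usr/src/linux}/Makefile"
    [ -r "$makefile" ] || { echo "cannot read $makefile" >&2; exit 1; }
    # The top of the kernel Makefile has lines like "VERSION = 3".
    awk -F ' *= *' '
        $1 == "VERSION"    { v = $2 }
        $1 == "PATCHLEVEL" { p = $2 }
        $1 == "SUBLEVEL"   { s = $2 }
        END { printf "%s.%s.%s\n", v, p, s }
    ' "$makefile"
)
```

Running kernel_src_version and comparing the result with the start of `uname -r` before launching the installer should catch a stale symlink early.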


Slackware64-current and udev 1.82

by on Jul.24, 2012, under Software

Some days after tinkering for a little bit you come to the realization that it might be better to stop doing anything with devices and just wait for the day to pass, because everything you touch breaks in the most spectacular ways. Of course this never stopped me from breaking even more, but I’m stupid like that.
Today is a day like that it seems. First our ADSL line at home received an upgrade to FTTH (aka a fiber connection), boosting our internet speed from a lousy 8Mbit down to 50Mbit down, and from less than 1Mbit upstream to 50Mbit upstream. (continue reading…)


Missing /dev/sd* in slackware 13

by on Jul.11, 2010, under Software

I’ve bashed my head into this problem at least three times now, and when I finally ran to Google it made me search more than I liked.

The problem descriptions:
* Your system boots fine (maybe because it’s running on software raid), but your /dev/sd* files are gone.
* Your system doesn’t boot anymore, complaining about not finding your boot device when booting your custom kernel, but the stock kernel does work.
* Mounting partitions doesn’t work anymore, saying stuff like mount: special device /dev/sda does not exist

Reason:
* Your custom kernel has CONFIG_SYSFS_DEPRECATED enabled. To find out:

benv@uil$ zcat /proc/config.gz | grep CONFIG_SYSFS_DEPRECATED
CONFIG_SYSFS_DEPRECATED=y

Since udev version 151 (or something close to that) this will sparsely populate /dev. Yay.

If you don’t believe it, check out /usr/share/doc/udev-*/README:

- Udev will not work with the CONFIG_SYSFS_DEPRECATED* option.
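If you want to check a config before building rather than after rebooting, something along these lines might do. It is only a sketch: has_sysfs_deprecated is a made-up name, and it assumes the config is either a plain file or a gzipped one like /proc/config.gz.

```sh
# Succeed (exit 0) if the kernel config in file $1 enables any
# CONFIG_SYSFS_DEPRECATED* option. Handles plain and gzipped configs.
# has_sysfs_deprecated is a hypothetical helper name.
has_sysfs_deprecated() (
    cfg="${1:-/proc/config.gz}"
    case "$cfg" in
        *.gz) gzip -dc "$cfg" ;;
        *)    cat "$cfg" ;;
    esac | grep -q '^CONFIG_SYSFS_DEPRECATED[A-Z0-9_]*=y'
)
```

For example, `has_sysfs_deprecated /usr/src/linux/.config && echo "udev will not be happy"` checks the tree you are about to build.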

Another problem solved.


NFS issues

by on Sep.14, 2009, under Morons, Software

Yesterday evening after getting tired of playing the Aion open beta (it was the last night of the open beta, so we felt like at least reaching level 10, which we did… and then we could ~FLYYYY) we decided to go downstairs to watch some series on our beamer. So we fire up the machine connected to it, which runs everything from NFS. It didn’t take long for the boot screen to come up and after the default selection was made for us it ran through the boot process spewing out the usual kernel messages….. (continue reading…)


Pokemon OS, rsync/ssh and MAC

by on Sep.11, 2009, under Software

So yesterday at work I ran into the famous ssh MAC failure like this:

wouter@wouter-laptop:~:0> rsync -varP ./vmware/ wouter@192.168.1.2:/archive/archive2/programs/vmware/
Password:
sending incremental file list
./
Keys
116 100% 0.00kB/s 0:00:00 (xfer#1, to-check=8/10)
linux/
linux/VMware-server-2.0.1-156745.i386.tar.gz
32768 0% 800.00kB/s 0:10:11 Received disconnect from 192.168.1.2: 2: Corrupted MAC on input.

rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: connection unexpectedly closed (53 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]

No, that has nothing to do with Apple/Mac computers or Media Access Control, it’s part of the SSH protocol (and others) called Message Authentication Code. This blog has a nice explanation: Jan Pechanec on SSH messages.

Funny thing, my brother also had this exact issue with the same kind of laptop. Well…. in his case it was PuTTY failing his connection from a Windows machine to this laptop.
The reason? Same as ever, checksum offloading.
You can check if your card does this with the ethtool program:

wouter@wouter-laptop:~:0> sudo ethtool -k eth0
Offload parameters for eth0:
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off

And the fix for this:

wouter@wouter-laptop:~:85> sudo ethtool -K eth0 tx off

Fixed.
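To see at a glance which offloads a card has switched on, the ethtool output can be filtered. This is a small sketch under the assumption that the output keeps the "name: on/off" shape shown above; offloads_enabled is a name I made up.

```sh
# Read "ethtool -k" output on stdin and print the names of the
# offload features reported as "on", one per line.
# offloads_enabled is a hypothetical helper name.
offloads_enabled() {
    awk -F ': *' '$2 ~ /^on/ {
        sub(/^[ \t]+/, "", $1)   # newer ethtool versions indent the lines
        print $1
    }'
}
```

Usage would be `sudo ethtool -k eth0 | offloads_enabled`. If tx-checksumming shows up in the list, it is a candidate for the `ethtool -K eth0 tx off` treatment.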

Wait, what does Pokemon OS have to do with this?
Well, isn’t that obvious? It’s supposed to magically work, and it doesn’t! 😉
Probably more a kernel thing though… oh well.


P4-Clockfuck

by on Sep.03, 2009, under Software

You know what’s really annoying?

Try this:

root@Uil:/sys/devices/system/cpu/cpu0/cpufreq:0>echo ondemand > scaling_governor
root@Uil:/sys/devices/system/cpu/cpu0/cpufreq:0>dmesg
[1325843.712549] ondemand governor failed, too long transition latency of HW, fallback to performance governor

Apparently this is caused by the p4-clockmod module.
You know, this piece of junk:

p4-clockmod


Also known as “CONFIG_X86_P4_CLOCKMOD”.

The reason?
Well, the ondemand and the conservative governors want to be able to switch to a different speed before their calculations become invalid.
The p4-clockmod thing apparently makes the latency of switching so BIG that it becomes impossible. Or so they think.
Which leaves only the sucky powersave and performance governors. (and of course userspace).

The kernel menuconfig help states:

This driver should be only used in exceptional
circumstances when very low power is needed because it causes severe
slowdowns and noticeable latencies. Normally Speedstep should be used
instead.

Lovely. I’ll keep that in mind, thanks.
*enables CONFIG_X86_ACPI_CPUFREQ instead*

Note that when they’re both compiled in (as opposed to modules) the retarded thing seems to go for the clockmod. Just great.
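To find out which driver actually won, the cpufreq directory in sysfs can be asked directly. A minimal sketch, assuming the usual scaling_driver, scaling_governor and cpuinfo_transition_latency files (the last one is in nanoseconds and may be missing on older kernels); cpufreq_summary is a hypothetical name.

```sh
# Print the driver, current governor and transition latency found in
# the cpufreq directory $1 (default: the one for cpu0).
# cpufreq_summary is a hypothetical helper name.
cpufreq_summary() (
    dir="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
    for f in scaling_driver scaling_governor cpuinfo_transition_latency; do
        if [ -r "$dir/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s: (not available)\n' "$f"
        fi
    done
)
```

If scaling_driver reports p4-clockmod, the ondemand fallback message above is the likely result.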


New server, day 2. DomU and networking.

by on Aug.29, 2009, under Software

Another day, another time for fun!

Since we got Xen up and running yesterday, it’s now time for actually having some fun with it.
The goals are:

  1. Getting xend started automagically when booting without destroying my network connection
  2. Getting a domU up and running with a network connection
  3. Getting an internal network between the domUs and dom0, shielded from the big bad internet.

(continue reading…)


Xen and booting domU using a vanilla kernel

by on Aug.18, 2009, under Morons, Software

Our server in the datacenter is running slackware (duh) with a nice Xen installation on it.
Still running with Xen 3 using PAE since the last time I updated/upgraded it without any major problems for over 2 years now 🙂
(ignoring the time that the power supply let go of the magic smoke)

So after the latest local root exploit thing, and testing it on some machines during har2009, I figured it might be nice to get the patch into my kernels. Running a webserver usually means you’re at risk after all, especially when it’s running wordpress 😉

So I grabbed the latest kernel that had that patch and went for the usual make menuconfig ordeal. After half an hour of selecting “yes”, “maybe”, “I guess so”, and some XEN options (CONFIG_PARAVIRT_GUEST=y, CONFIG_XEN=y, CONFIG_HVC_DRIVER=y, CONFIG_XEN_BLKDEV_FRONTEND=y, CONFIG_XEN_NETDEV_FRONTEND=y, CONFIG_XEN_KBDDEV_FRONTEND=y, CONFIG_HVC_XEN=y, CONFIG_XEN_BALLOON=y, CONFIG_XEN_SCRUB_PAGES=y, CONFIG_XENFS=y, CONFIG_XEN_COMPAT_XENFS=y) I built the thing.
As I usually build bzImages I also did so here. make bzImage modules modules_install.
Half an hour later I had a shiny bzImage. Let’s try it on a test domain!
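Rather than eyeballing menuconfig, the option list from this post can be checked against the resulting .config. This is just a sketch of that idea: missing_xen_options is a made-up name, and the list is simply the one quoted above, not an authoritative set of requirements.

```sh
# Print every option from the list in this post that is not set to
# "y" in the kernel config file $1 (default: .config).
# The exit status is 0 only when none are missing.
# missing_xen_options is a hypothetical helper name.
missing_xen_options() (
    cfg="${1:-.config}"
    missing=0
    for opt in PARAVIRT_GUEST XEN HVC_DRIVER XEN_BLKDEV_FRONTEND \
               XEN_NETDEV_FRONTEND XEN_KBDDEV_FRONTEND HVC_XEN \
               XEN_BALLOON XEN_SCRUB_PAGES XENFS XEN_COMPAT_XENFS; do
        if ! grep -q "^CONFIG_${opt}=y\$" "$cfg"; then
            echo "CONFIG_${opt}"
            missing=1
        fi
    done
    exit "$missing"
)
```

Run from the kernel source directory, `missing_xen_options` prints nothing when everything in the list is enabled.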

I copied the bzImage to /boot, changed the kernel = "/boot/vmlinuz-xen-old" option to the new place, and GO!
This is what it told me:

root@iejoor:/xen/hosts/purple# xm create -c purple.cfg-newkernel
Using config file "./purple.cfg-newkernel".
Started domain purple
root@iejoor:/xen/hosts/purple#

Huh, where’s my console?
Checking xm list it seemed to be there… but paused… and with constantly increasing ids. Waaait a minute!
Checking the log files (xend.log in this case) revealed a secret:

[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices irq.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices vkbd.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices vfb.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices console.
[2009-08-18 16:40:01 16116] DEBUG (DevController:167) Waiting for 0.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices pci.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices ioports.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices tap.
[2009-08-18 16:40:01 16116] DEBUG (DevController:162) Waiting for devices vtpm.
[2009-08-18 16:40:01 16116] INFO (XendDomain:1165) Domain purple (35) unpaused.
[2009-08-18 16:40:01 16116] WARNING (XendDomainInfo:1240) Domain has crashed: name=purple id=35.
[2009-08-18 16:40:01 16116] DEBUG (XendDomainInfo:1879) XendDomainInfo.destroy: domid=35
[2009-08-18 16:40:01 16116] DEBUG (XendDomainInfo:1896) XendDomainInfo.destroyDomain(35)

Note the ‘WARNING’ line. Crashed?!
Gee, that’s … interesting. Why? Took me 5 minutes to find the other log file, but xend-debug.log had a magic line:
ERROR Invalid kernel: xc_dom_find_loader: no loader found
… great. Another why.
In case you’re wondering, the increasing ids in ‘xm list’ were caused by the on_crash = 'reboot' line in the xen host config.
I quickly destroyed the domain and changed it to a oneshot try: on_crash = 'destroy'

After some searching another hint presented itself. Aren’t they nice?
This hint was: “Xen is a retarded piece of cancer and can’t decipher the bzImage format, try vmlinux instead”. Aha!
Back to the kernel, make vmlinux, copy, and another try to boot it. Obviously this was way too simple. It still crashed.
Checking my old kernel that works for both dom0 and domU with file I noticed that gz should at least work.

# file /boot/vmlinuz-2.6*
/boot/vmlinuz-2.6.18.8-xen: gzip compressed data, from Unix, last modified: Tue Nov 25 16:13:16 2008, max compression
/boot/vmlinuz-new: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
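The same distinction that file makes here can be sketched by looking at the first few bytes of the image: gzip data starts with 1f 8b, an ELF binary with 7f 45 4c 46. The helper name kernel_image_type is hypothetical, and anything else (a bzImage, for instance) is simply reported as unknown.

```sh
# Guess the container format of kernel image $1 from its first bytes.
# kernel_image_type is a hypothetical helper name.
kernel_image_type() (
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$magic" in
        1f8b*)    echo gzip ;;     # gzip magic number
        7f454c46) echo elf ;;      # "\177ELF"
        *)        echo unknown ;;
    esac
)
```

Calling `kernel_image_type /boot/vmlinuz-new` before pointing a domU config at the image gives a quick hint whether the loader has a chance of recognising it.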

Some more details about the crashing domains can be found using xm dmesg
It gave me something like:

(XEN) traps.c:413:d38 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S (ff18928e)
(XEN) Domain 38 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-3.2.3 x86_32p debug=n Not tainted ]----
(XEN) CPU: 3
(XEN) EIP: e019:[]
(XEN) EFLAGS: 00000282 CONTEXT: guest
(XEN) eax: 8000c068 ebx: c064c040 ecx: 80000000 edx: 00000cf8
(XEN) esi: c0651f3c edi: c0651f30 ebp: c06981b8 esp: c0651f14
(XEN) cr0: 8005003b cr4: 000006f0 cr3: 00bd5c80 cr2: 00000000
(XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
(XEN) Guest stack trace from esp=c0651f14:
(XEN) 00000000 c03e9e60 0001e019 00010082 c04bd0cd 00000068 00000000 00000000
(XEN) 00002003 00000000 00003030 00000002 00000007 c064c07c c065dfd8 00000000
(XEN) c06423c0 c064c100 c0651fd8 c065a63f 00000005 00000000 00000000 00000000
(XEN) c065d02b 006faee4 00000000 00000000 c056dad4 00100000 00000000 00100000
(XEN) 00100000 00000000 006faee4 c065d27c 006faee4 00000000 00000000 c0651fe8
(XEN) 00000000 00000000 00000000 c0651fe8 00000000 00000000 c065756f c056dd50
(XEN) c04c5020 c0657073 c0651ff4 c065973e 00000000 17898175 00800001 03040800
(XEN) 00100f22 00000000 c08e0000 c04ba91b c04ba923 c0103371 c0103878 c0659a92
(XEN) c01039c7 c0103afc c0103d1a c0104004 c0104163 c01041eb c010451d c010457c
(XEN) c0659eb7 c0659ebf c04ba975 c04baa22 c0105536 c01055ae c0105781 c0105c5c
(XEN) c0105c9e c0106253 c0106345 c0106831 c010683d c0107b07 c010825e c0108333
(XEN) c065a45c c065a46a c0108e56 c065a982 c065a98a c0109bf0 c065af2c c065af93
(XEN) c065afe9 c065b042 c065b09b c065b0f4 c065b14d c065b1a6 c065b1ff c065b258
(XEN) c065b2b1 c065b32d c065b341 c065b397 c065b3f0 c065b449 c065c14f c065c157
(XEN) c010aee5 c010aef3 c010af77 c010af85 c065d7be c065d7d1 c010bdc3 c010be22
(XEN) c010c113 c010c14a c010c1c6 c010c2f3 c010c41a c010c443 c010c4dd c010d267
(XEN) c010d4cb c010d77b c010d782 c010de76 c010defd c010df0c c010df3f c010df49
(XEN) c010df84 c010df8c c010df94 c04bb1d7 c04bb466 c04bb4db c04bb5b9 c04bb60f
(XEN) c04bb62f c04bbb8b c04bbd60 c065de79 c065de81 c04bc041 c04bc46b c04bc50d
(XEN) c04bc899 c04bc902 c065e012 c065e01a c065e028 c065e030 c065e038 c065e040

As you can see, that’s really useful….. 😉

A useful page with some info can be found here. It confirms that a gz kernel should work (but bz probably doesn’t) since I’m still running Xen 3.2. However, I can’t find a good reason for why it won’t work.

Google to the rescue! Hard to find a useful keyword, since most xen kernel issues seem to be similar, but eventually I found this tidbit:
tiny kernel patch.
Weird that this should be needed, one would hope that basic shit like this would work after 8 kernel versions, but obviously they’re all retards when it comes to this.
Thanks Jeremy Fitzhardinge for the patch!
Just for quick reference, this is the actual patch:

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 28e5f59..e2485b0 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -356,7 +356,7 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 #endif
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_PCI)
 	/* check CPU config space for extended APIC ID */
-	if (c->x86 >= 0xf) {
+	if (cpu_has_apic && c->x86 >= 0xf) {
 		unsigned int val;
 		val = read_pci_config(0, 24, 0, 0x68);
 		if ((val & ((1 << 17) | (1 << 18))) == ((1 << 17) | (1 << 18)))

So much for 'vanilla'.

However, it boots now:

root@iejoor:/xen/hosts/purple# xm create -c purple.cfg-newkernel
Using config file "./purple.cfg-newkernel".
Started domain purple
Reserving virtual address space above 0xf5800000
Linux version 2.6.30.5-jemoeder (root@iejoor) (gcc version 4.1.2) #9 SMP Tue Aug 18 22:59:38 CEST 2009
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
NSC Geode by NSC
Cyrix CyrixInstead
Centaur CentaurHauls
Transmeta GenuineTMx86
Transmeta TransmetaCPU
UMC UMC UMC UMC
ACPI in unprivileged domain disabled

and a lot more yadieyada until it hits a new wall: root device.
Well, that sounds solvable.

In fact, here's a solution. What you say?
"AAAAH, I GET NOTHING, IT WON'T EVEN CRASH ANYMORE, JUST NO OUTPUT???!"
Ah yeah, I forgot to mention: they changed the console device as well as the block device. So here's a solution for both issues:
Edit your xen host config file and make it so:

root = "/dev/xvda1 ro"
extra = "xencons=hvc0"

(xvda1 is what used to be sda1, you can figure out the rest).
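For a pile of old guest configs, a quick scan for the two settings above might save some head-bashing. A rough sketch only: check_domu_cfg is a made-up name, and it merely greps for a root on /dev/sd* and for hvc0 in the extra line, nothing smarter.

```sh
# Warn about settings in Xen guest config $1 that still assume the
# old device names. Prints one line per finding and nothing when both
# settings look like the ones in this post.
# check_domu_cfg is a hypothetical helper name.
check_domu_cfg() (
    cfg="$1"
    if grep -Eq '^[[:space:]]*root[[:space:]]*=.*/dev/sd' "$cfg"; then
        echo "root still points at /dev/sd*, expected /dev/xvd*"
    fi
    if ! grep -Eq '^[[:space:]]*extra[[:space:]]*=.*hvc0' "$cfg"; then
        echo "no hvc0 console in extra"
    fi
)
```

Something like `for f in /xen/hosts/*/*.cfg; do check_domu_cfg "$f"; done` would walk a directory layout like mine; adjust the path to wherever your configs live.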

And after all this: HAHA! Success is mine!
Except of course for the undeniable fact that my domU has a fucked up fstab and all, but no issues there 🙂
Now go mess up your own system! Oh, I see... you already did. Good luck fixing it 😉

Update

I just tested a testing kernel -- 2.6.31-rc8, and it seems like they fixed it. Probably has been fixed in the testing branch for quite a while now, but still not in stable 2.6.30 🙂
Let's hope they release 2.6.31 soon.
