BenV's notes

Archive for January, 2011

Linux Software Raid-1 issue

Posted on Jan.29, 2011, under Software

It just took me about an hour to figure this one out, so here’s the story for the next time I run into it.
Steps taken:
* New machine
* 2 harddisks (Western Digital Greens, so used wdidle3 on them!)
* Boot Slackware64 installer from PXE/NFS
* cfdisk, create 2 identical partitions, make bootable, set type to FD, write, quit
* mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
* install slackware64, grub2 and some other junk
* reboot

Sounds good, right?
Well, for some reason the array kept booting up with only 1 of 2 disks active.
No errors or warnings, just kept fucking up. /proc/mdstat looked like

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1[1]
39061952 blocks [1/2] [_U]

Adding /dev/sda1 back (mdadm /dev/md0 --add /dev/sda1) worked fine too; the array resynced without problems, until the next reboot.
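
Since the array degraded silently, a small check after every boot would have saved me some time. Here is a minimal sketch, assuming the usual /proc/mdstat layout where the status line ends in a member map like [_U] (an underscore marks a missing member). The function name degraded_arrays is just something I made up for illustration.

```shell
# List md arrays that are running with missing members.
# Reads mdstat-formatted text from the file given as $1
# (defaults to /proc/mdstat).
degraded_arrays() {
    awk '
        # Array header, e.g. "md0 : active raid1 sdb1[1]"
        /^md[0-9]+ :/ { name = $1; next }
        # Status line with the member map, e.g. "[_U]" or "[UU]"
        name != "" && match($0, /\[[U_]+\]/) {
            map = substr($0, RSTART + 1, RLENGTH - 2)
            total = length(map)
            missing = gsub(/_/, "", map)   # count the "_" slots
            if (missing > 0)
                print name ": " (total - missing) " of " total " members active"
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays without arguments reads the live /proc/mdstat; it prints nothing when every array is complete, so it is easy to hook into a boot script or cron job.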

After about an hour of trying to recreate superblocks and that sort of stuff I found it:
The partition type of /dev/sda1 was set to 0x83 instead of 0xFD.
Thanks, cfdisk; that’s the last time I use that piece of garbage. (I’m 100% certain I set both partitions to 0xFD, but cfdisk has been acting up for me lately.)
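
To avoid trusting the partitioning tool next time, the type byte can be read straight from the MBR. This is a rough sketch, assuming a classic MBR (DOS) partition table and primary partitions only. The helper name mbr_part_type is hypothetical.

```shell
# Print the type byte (two hex digits) of primary partition $2 (1-4)
# from the MBR of the disk or image file $1.
mbr_part_type() {
    # The partition table starts at byte 446; each entry is 16 bytes
    # and the type byte sits at offset 4 inside the entry.
    offset=$((446 + ($2 - 1) * 16 + 4))
    printf '%s\n' "$(dd if="$1" bs=1 skip="$offset" count=1 2>/dev/null \
        | od -An -tx1 | tr -d ' \n')"
}
```

Running mbr_part_type /dev/sda 1 and mbr_part_type /dev/sdb 1 should both print fd for this setup. In my case the first one would presumably have printed 83. If the type is wrong, recent util-linux versions should be able to change it with sfdisk --part-type /dev/sda 1 fd, but check the man page of your sfdisk version first.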


Why I hate lilo

Posted on Jan.11, 2011, under Software

Every time I install a machine with the latest Slackware, I’m amazed again at the installed boot manager – lilo.
Sure, lilo works. Most of the time. Even when you have a raid-1 boot device.
Unless you don’t have the latest version of lilo of course.

Today I tried to continue a Slackware64 (current) install on a machine that I set up a week ago.
It worked fine, and I was just about to install Xen when one of the disks started acting up.
Obviously SMART didn’t help one bit:
* Report – No errors!
* Short Self test – Your disk is fine!
* You want a long test that takes 4 hours? Your machine locks up before it completes, haha!
But when the disk kept failing every time the md1 resync hit 36%, I yanked out the disk and sent it in for RMA.
dmesg showed errors like this:

[ 3362.784129] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 3362.784132] ata1.00: failed command: READ DMA EXT
[ 3362.784135] ata1.00: cmd 25/00:00:3f:60:f4/00:04:57:00:00/e0 tag 0 dma 524288 in
[ 3362.784135] res 40/00:00:02:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 3362.784136] ata1.00: status: { DRDY }
[ 3362.784139] ata1: hard resetting link
[ 3364.002049] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 3364.009142] ata1.00: configured for UDMA/33
[ 3364.009148] sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
[ 3364.009150] sd 0:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
[ 3364.009152] Descriptor sense data with sense descriptors (in hex):
[ 3364.009153] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 3364.009156] 00 00 00 01
[ 3364.009158] sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
[ 3364.009159] sd 0:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 57 f4 60 3f 00 04 00 00
[ 3364.009162] end_request: I/O error, dev sda, sector 1475633215
[ 3364.009174] ata1: EH complete
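
For the record, the failing sector can be pulled out of such a log without reading it by eye, and the CDB line confirms it. A small sketch, assuming the kernel uses the "end_request: I/O error" wording shown above (other kernel versions phrase this differently) and that the command is a READ(10), opcode 0x28, where bytes 2 to 5 hold the LBA. Both function names are made up for illustration.

```shell
# Print "device sector" for every block-layer I/O error found in
# kernel log text on stdin (e.g. piped from dmesg).
io_error_sectors() {
    sed -n 's/.*end_request: I\/O error, dev \([a-z0-9]*\), sector \([0-9]*\).*/\1 \2/p'
}

# Decode the LBA from a READ(10) CDB given as space-separated hex bytes.
# Byte 0 is the opcode, byte 1 the flags, bytes 2..5 the big-endian LBA.
cdb_read10_lba() {
    set -- $1                 # split the hex bytes into $1, $2, ...
    echo $((0x$3$4$5$6))      # bytes 2..5 as one hex number
}
```

With the log above, dmesg | io_error_sectors gives "sda 1475633215", and cdb_read10_lba '28 00 57 f4 60 3f 00 04 00 00' gives the same 1475633215, so the read that timed out started exactly at the sector the kernel complained about.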

So today I figured I could continue installing with only half a raid-1 array.
But it didn’t boot (“Loading operating system…” *halt*).
I figured lilo must have been installed to the MBR of the disk that I yanked, so I booted from LAN and ran lilo.
Obviously lilo complained, because /dev/sda was only half the raid-1 array and disks were missing!
Fine. I changed my boot device to /dev/md0, hoping that lilo would get the hint.


# lilo
Warning: LBA32 addressing assumed
Fatal: Not all RAID-1 disks are active; use '-H' to install to active disks only
# lilo -H
Warning: LBA32 addressing assumed
Warning: Partial RAID-1 install on active disks only: booting is not failsafe

Warning: Faulty disk in RAID-1 array; boot with caution!!
Fatal: Unusual RAID bios device code: 0xFF

*sigh*
This is why I hate lilo. If it doesn’t work, it doesn’t work.
And it never tells you why. Or maybe it does, just like Windows always tells you what’s wrong when you get a blue screen.

It’s probably this bug, but I don’t care. Always something.
Time to find the sources to grub.

