NFS issues
by BenV on Sep.14, 2009, under Morons, Software
Yesterday evening, after getting tired of playing the Aion open beta (it was the last night of the open beta, so we felt like at least reaching level 10, which we did… and then we could ~FLYYYY), we decided to go downstairs to watch some series on our beamer. So we fired up the machine connected to it, which runs everything from NFS. It didn’t take long for the boot screen to come up, and after the default selection was made for us it ran through the boot process, spewing out the usual kernel messages…
Until it ran into this:
Looking up port of RPC 100003/2 on 192.168.1.1
Looking up port of RPC 100005/1 on 192.168.1.1
VFS: Mounted root (nfs filesystem) on device 0:14
Freeing unused kernel memory: 412k freed
INIT: version 2.86 booting
nfs: server 192.168.1.1 not responding, still trying
… huh? What’s taking so long? Our server with issues? That’s unlikely since I still used it 5 minutes earlier. After checking out the NFS server and deciding nothing was wrong with it (or at least, it didn’t look like it) I got fed up with it and went to bed.
Don’t you hate it when stuff stops working just when you’re tired and want to go to sleep? Fortunately for me this is our own internal problem so I could actually do so 😉
Today I decided to go find out what the hell was wrong with it. So again I booted the machine, only to get the same kernel messages as above.
Time for some tcpdump on the nfs server to see if that can shed some light on this situation. (Needless to say, there were no useful log entries anywhere.)
Tcpdump:
18:27:51.931303 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 132)
192.168.1.9.1304809574 > 192.168.1.1.2049: 104 getattr fh Unknown/010004014D00460054C8D81757072101679D7E1B000000000000000000000000
18:27:51.931402 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 124)
192.168.1.1.2049 > 192.168.1.9.1304809574: reply ok 96 getattr DIR 40755 ids 0/0 sz 69632
18:27:55.725726 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 140)
192.168.1.9.3787772006 > 192.168.1.1.2049: 112 lookup fh Unknown/010004004D00460054C8D8170000000000000000000000000000000000000000 "proc"
18:27:56.825633 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 140)
192.168.1.9.3787772006 > 192.168.1.1.2049: 112 lookup fh Unknown/010004004D00460054C8D8170000000000000000000000000000000000000000 "proc"
18:27:59.025453 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 140)
192.168.1.9.3787772006 > 192.168.1.1.2049: 112 lookup fh Unknown/010004004D00460054C8D8170000000000000000000000000000000000000000 "proc"
18:28:03.425078 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 140)
192.168.1.9.3787772006 > 192.168.1.1.2049: 112 lookup fh Unknown/010004004D00460054C8D8170000000000000000000000000000000000000000 "proc"
18:28:12.224347 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 140)
192.168.1.9.3787772006 > 192.168.1.1.2049: 112 lookup fh Unknown/010004004D00460054C8D8170000000000000000000000000000000000000000 "proc"
18:28:13.324257 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 140)
192.168.1.9.3787772006 > 192.168.1.1.2049: 112 lookup fh Unknown/010004004D00460054C8D8170000000000000000000000000000000000000000 "proc"
18:28:15.524071 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 140)
192.168.1.9.3787772006 > 192.168.1.1.2049: 112 lookup fh Unknown/010004004D00460054C8D8170000000000000000000000000000000000000000 "proc"
Hmm, interesting. The first request seems to “work” (read: it receives an answer) while the rest of the messages are somehow ignored. Wonder why…
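Spotting this by eye works for a short capture, but the same check can be scripted. Below is a minimal sketch, assuming the verbose tcpdump text format shown above, where the number glued to the client address is the RPC transaction id (xid) rather than a port. The function name `unanswered_requests` and the regular expression are my own illustration, not part of tcpdump.

```python
import re
from collections import defaultdict

# Matches the second line of each record in the dump above, e.g.
#   192.168.1.9.3787772006 > 192.168.1.1.2049: 112 lookup fh ... "proc"
# The timestamp/header lines do not match and are skipped.
RECORD = re.compile(
    r"^\s*(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): (?P<rest>.*)$"
)

NFS_PORT = "2049"


def unanswered_requests(dump_text):
    """Return {xid: (times_sent, first_request_text)} for requests
    sent to the NFS port that never got a reply in this capture."""
    sent = defaultdict(list)   # xid -> list of request descriptions
    answered = set()           # xids for which a reply was seen
    for line in dump_text.splitlines():
        m = RECORD.match(line)
        if not m:
            continue
        if m["dport"] == NFS_PORT:
            # Client -> server: the "source port" field is really the xid.
            sent[m["sport"]].append(m["rest"])
        elif m["sport"] == NFS_PORT and m["rest"].startswith("reply"):
            # Server -> client: the "destination port" field is the xid.
            answered.add(m["dport"])
    return {
        xid: (len(requests), requests[0])
        for xid, requests in sent.items()
        if xid not in answered
    }
```

Feeding it the saved tcpdump output (for example the contents of a text file read with `open(...).read()`) gives one entry per request that was retried without ever being answered, along with how often it was sent.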
Just to make sure I checked the firewall on the server, which should allow pretty much everything:
root@uil# iptables -L -v -n
Chain INPUT (policy ACCEPT 198M packets, 89G bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 228M packets, 166G bytes)
pkts bytes target prot opt in out source destination
Well, that seems to be true. Maybe something weird in the mount options? Let’s see the /etc/fstab for the NFS client:
root@teigetje# cat /etc/fstab
192.168.1.1:/mnt/general_stores2/teigetje / nfs rsize=8192,wsize=8192,nfsvers=3 0 0
proc /proc proc defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
tmpfs /tmp tmpfs defaults 0 0
Nothing weird… or is there? After messing around with it for a bit I decided it really wasn’t anything in there.
Time to do a test on a different machine in the network. On my workstation I tried to mount the root path after giving myself permissions and re-exporting the nfs paths:
root@janeman:/mnt:0>mount -o rsize=8192,wsize=8192,nfsvers=3 192.168.1.1:/mnt/general_stores2/teigetje tmp
root@janeman:/mnt:0>ls -al tmp
*stuck*
And there it was, stuck for what seemed like forever. In other words: the mount worked, but the ls command then got no response. What… the… fuck.
Mind-boggling, and the logs on the nfs server gave no hints at all (as usual).
To make this even more interesting, there is another export on that same nfs server that most workstations use, called general_stores (to access the music collection etc.).
Can’t remember that being broken…. let’s test:
benv@janeman:~:0>ls -lad /mnt/general_stores/.
drwxr-xr-x 14 root root 4096 Aug 25 16:58 /mnt/general_stores/./
How can this be?! This mount works fine while the other doesn’t? What the hell is so special about that NFS client’s export?
And then it hit me.
*WHACK*
(ouch)
The window that I had open with the ls -al tmp suddenly decided to work. It gave this listing:
ls: cannot access tmp/proc: Input/output error
total 164
drwxr-xr-x 21 root root 4096 Sep 14 18:14 .
drwxr-xr-x 12 root root 4096 Sep 26 2006 ..
drwxr-xr-x 2 root root 4096 Sep 14 18:14 bin
drwxr-xr-x 13 root root 4096 Sep 14 18:05 boot
drwxr-xr-x 17 root root 69632 Oct 6 1997 dev
drwxr-xr-x 71 root root 12288 Sep 14 18:24 etc
drwxr-xr-x 5 root root 4096 Oct 6 1997 home
drwxr-xr-x 6 root root 12288 Sep 14 18:11 lib
drwxr-xr-x 16 root root 4096 Sep 14 16:47 media
drwxr-xr-x 12 root root 4096 Sep 26 2006 mnt
drwxr-xr-x 9 root root 4096 Jun 10 2007 opt
?????????? ? ? ? ? ? proc
drwx--x--- 28 root root 4096 Sep 14 18:24 root
drwxr-xr-x 2 root root 4096 Jun 14 05:35 sbin
drwxr-xr-x 2 root root 4096 Apr 8 2007 srv
drwxr-xr-x 2 root root 4096 May 12 2004 sys
drwxrwxrwt 4 root root 4096 Sep 14 18:17 tmp
drwxr-xr-x 19 root root 4096 Jul 16 02:22 usr
drwxr-xr-x 19 root root 4096 Aug 4 01:52 var
Doh! /proc … of course that’s the problem! That’s why tcpdump showed the “proc” entries!
But why?
Well, recently I moved some disks around and added a ‘crossmnt’ option in the /etc/exports file on the NFS server.
This option lets nfs clients cross mount point boundaries inside an export, so for example if I have this on the nfs server:
* /mnt/disk with device /dev/sda1 mounted there
* /mnt/disk/stuff with device /dev/sdb1 mounted there
And then export /mnt/disk and mount it on my nfs client, I can see /mnt/disk/stuff with its contents because of the crossmnt option. Without that option the directory would be empty on nfs clients.
So how does this relate to proc? Well, since proc is “special” it has a ton of special files that nfs can’t deal with and therefore cannot export.
Now normally this wouldn’t be an issue, since proc shouldn’t be mounted there in the first place (the nfs server has its own /proc, and the nfs client mounts its own version, so the export only needs an empty directory to mount it on). However, a little earlier my girly friend mounted it there “because she had to build E”. Don’t ask.
E again…. always the same… 😉
After unmounting the proc dir and rebooting the NFS client it had no more problems and booted fine.
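A check like the following, run on the NFS server, would have pointed at the culprit a lot sooner. It is only a sketch: it assumes the usual /proc/mounts layout (device, mount point, filesystem type, ...), the function name `nested_mounts` is made up for this post, and the set of "pseudo" filesystem types is my own guess at what you would not want inside an export, not an official list.

```python
# Filesystem types that are kernel interfaces rather than real storage.
# Assumption: this list is illustrative, extend it as needed.
PSEUDO_FS = {"proc", "sysfs", "devpts", "devtmpfs"}


def nested_mounts(mounts_text, export_root):
    """List mounts that live below export_root.

    mounts_text is the content of /proc/mounts on the server.
    Returns (mount_point, fs_type, is_pseudo) tuples.
    """
    root = export_root.rstrip("/")
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a valid mount entry
        mount_point, fs_type = fields[1], fields[2]
        below_root = mount_point.startswith(root + "/")
        is_root_itself = mount_point.rstrip("/") == root
        if below_root and not is_root_itself:
            found.append((mount_point, fs_type, fs_type in PSEUDO_FS))
    return found


if __name__ == "__main__":
    # The export path is the one from the fstab earlier in this post.
    with open("/proc/mounts") as handle:
        for entry in nested_mounts(handle.read(), "/mnt/general_stores2/teigetje"):
            print(entry)
```

Anything it reports with the last field set to True is a mount like the stray proc one described above, sitting inside a tree that gets exported with crossmnt.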
Conclusion:
NFS can’t deal with proc. Don’t hide a mounted procfs in your nfs exports.
NFS sucks because of stuff like this, but I haven’t found anything better so far. Guess we’ll stick with it for now.
September 15th, 2009 on 14:48
This is why I had to mount proc:
—– Installing packages —–
– eina ……. autogen: Error, do this: mount -t proc none /proc