nvidia drivers + 2.6 smp kernel = random x11 crash anybody?

tapeworm
Posts: 15
Joined: Thu Mar 06, 2003 10:44 am
Location: italy


Post by tapeworm »

hi,
first of all, sorry for my english

i've had this BIG problem for ages (i believe since my first attempts with the 2.6 kernels, but maybe even with 2.4)

basically, from time to time, with the system under no load and only basic 2d apps open (no 3d, though quake3 is very "crashy" too, unlike quake1), xfree crashes, leaving me with a frozen screen and no keyboard control (so no control+alt+backspace), but with a working mouse pointer (i can't click anything, but i can move it). the only way to get the machine back, apart from a reset, is to log in remotely, kill xfree and restart it.

i even got close to reproducing these crashes at will (some random factor remains), and if i use the open source nv driver i have no problems (but no opengl), so i think there is something wrong with the nvidia binary drivers.

i think i'm not alone in this, because i found reports of other people having stability problems with nvidia drivers on smp systems, for example this:
http://lwn.net/Articles/71832/
and this (italian, sorry):
http://lists.debian.org/debian-italian/ ... 00144.html
and some others i can't locate anymore.

so now i'm posting this here (a forum with mostly linux smp users) because i'd like to know if anybody else is experiencing this kind of instability, and possibly discuss it, thanks.
Powered by Debian Sid/Unstable on 2.6.8-1-686-smp...
purrkur
Linux Guru
Posts: 687
Joined: Fri Dec 12, 2003 5:57 pm
Location: Sweden

Post by purrkur »

Hello Tapeworm!

First of all, don't worry about your English. Your message came through loud and clear :)

I am also running my BP6 with Debian unstable and Nvidia's drivers, and X on my BP6 is literally crash free. I haven't seen a single X crash yet. So let's compare notes and see if we can figure out your issues, shall we?

Two things are missing from your message: which Nvidia driver version you are using and which Nvidia card you are running. I am using the latest and greatest from Nvidia and a Geforce2 MX graphics adapter. I also have a Dual XEON machine at work using an older TNT2 graphics adapter (but still the same driver) and I have no problems there either.

I checked out your sig and saw that you are running a Pre 2.6.0 kernel! Is that true? If so then this is where I would start! You also don't mention if you have checked out the X logfiles? It would be great if you could check those out and include what is being written into it when X crashes. You can find the logfile at /var/log/XFree86.0.log.

Just a quick comment on your first link. The discussion there is about Hyperthreading and not true SMP, so I wouldn't worry too much. Hyperthreading is also a technology that I have absolutely no faith in, and I have seen its negative impacts often enough to know that it is best turned off in a production environment. Even Intel seems to know this, since computers are usually delivered with it turned off, and it also seems that once Intel starts launching their dual-core processors, Hyperthreading will lose focus in the marketplace.

So, to recap: I need:

1. What Nvidia Driver version you are running
2. What Nvidia graphics adapter
3. Kernel version? Is it pre-2.6.0 like your signature says?
4. The X log file. What does it say when X crashes?
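
For item 4, a minimal Python sketch of how the log could be filtered once you are logged in remotely. It relies on XFree86 prefixing error lines with (EE) and warning lines with (WW). The function names and the sample text are made up for illustration; the sample messages are placeholders, not real driver output.

```python
import re

# XFree86 prefixes log lines with a marker: (EE) is an error, (WW) a warning.
PROBLEM_MARKER = re.compile(r"^\((EE|WW)\)")


def problem_lines(log_text):
    """Return the error and warning lines, in the order they were logged."""
    return [line for line in log_text.splitlines() if PROBLEM_MARKER.match(line)]


def last_lines(log_text, count=5):
    """Return the final `count` lines, i.e. what X wrote last before the freeze."""
    return log_text.splitlines()[-count:]


# Placeholder text standing in for /var/log/XFree86.0.log; the messages are
# invented for illustration and are not real driver output.
sample_log = """\
(II) placeholder informational message
(WW) placeholder warning message
(II) another placeholder informational message
(EE) placeholder error message
"""

print(problem_lines(sample_log))
print(last_lines(sample_log, count=2))
```

To run it on the real log, replace sample_log with the contents of /var/log/XFree86.0.log, read before X is restarted so the file still shows the crashed session.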

Ciao!
2x533MHz@544MHz, 2.0V
640MB PC100 memory
Realtek RTL-8139 NIC
Maxtor 6Y080L0 80GB hdd
Debian Linux stable with 2.4.8 kernel
tapeworm
Posts: 15
Joined: Thu Mar 06, 2003 10:44 am
Location: italy

Post by tapeworm »

thanks for the quick reply
purrkur wrote: There are two things that are lacking in your message and that is what Nvidia driver version you are using and what Nvidia card you are running. I am using the latest and greatest from Nvidia and a Geforce2 MX graphics adapter. I also got a Dual XEON machine at work using an older TNT2 graphics adapter (but still the same driver) and I have no problems there either.
sorry, i forgot, i use an asus geforce 2 mx and the latest nvidia drivers, exactly these ones:
http://packages.debian.org/unstable/x11 ... nel-source
purrkur wrote: I checked out your sig and saw that you are running a Pre 2.6.0 kernel! Is that true? If so then this is where I would start!
doh... i forgot to update that too... i'm running this one:
http://packages.debian.org/unstable/bas ... -1-686-smp
but basically with every other 2.6 kernel i have more or less the same kind of crash.
purrkur wrote: You also don't mention if you have checked out the X logfiles? It would be great if you could check those out and include what is being written into it when X crashes. You can find the logfile at /var/log/XFree86.0.log.
last time i checked there was absolutely nothing in the logs. i'll check more carefully next time.
purrkur wrote: Just a quick comment on your first link. The discussion there is about Hyperthreading and not true SMP so I wouldn't worry too much.
it's possible that these are 2 unrelated circumstances, but i think there is probably something in common: it's a kernel with smp support after all.
most of all, in the second case (the italian one, also hyperthreading related), the description of the crashes is absolutely identical to mine, and it's a pretty strange kind of crash, where you can still control the mouse. i have been exposed to several xfree crashes, due to overclocking and whatnot, and usually i get either a completely locked up display, or i'm able to go to the console.

i was thinking about agp drivers and settings: do you maybe use the integrated nvagp instead of agpgart? or is it possible that completely disabling agp support makes some difference? do you use the package from the nvidia home page or the "debianized" version? (there should be no difference, the binary part doesn't change, and the compiling part uses the nvidia makefile). do you use a precompiled kernel from debian? do you compile it yourself from the debian sources? or do you compile it yourself from the vanilla (kernel.org) sources? do you use a 2.4 or a 2.6 version? unfortunately there is a very large number of factors, and trying out all of them one by one takes quite a bit of time...
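
To put a number on "a very large number of factors", here is a rough Python sketch that enumerates the options listed above. The factor names and the baseline are assumptions made for illustration (in particular, the thread does not say which agp backend is in use); only the option values come from the post.

```python
from itertools import product

# Options taken from the questions above; the dictionary keys are invented labels.
factors = {
    "agp_backend": ["nvagp", "agpgart", "agp disabled"],
    "driver_package": ["nvidia home page", "debianized"],
    "kernel_build": ["debian precompiled", "debian sources", "vanilla kernel.org"],
    "kernel_series": ["2.4", "2.6"],
}


def all_setups(factors):
    """Every combination of every option (the exhaustive test matrix)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def one_at_a_time(factors, baseline):
    """Change a single factor per run, keeping the rest at the baseline."""
    runs = []
    for name, values in factors.items():
        for value in values:
            if value != baseline[name]:
                runs.append({**baseline, name: value})
    return runs


# Assumed starting point; the agp backend here is a guess, not stated in the thread.
baseline = {
    "agp_backend": "agpgart",
    "driver_package": "debianized",
    "kernel_build": "debian precompiled",
    "kernel_series": "2.6",
}

print(len(all_setups(factors)))               # 3 * 2 * 3 * 2 combinations
print(len(one_at_a_time(factors, baseline)))  # one run per alternative option
```

Varying one factor at a time from a known setup needs far fewer runs than the full matrix, at the cost of missing crashes that only show up when two factors interact.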
purrkur
Linux Guru
Posts: 687
Joined: Fri Dec 12, 2003 5:57 pm
Location: Sweden

Post by purrkur »

Hi again,

I just double-checked my machine and found that I have only downloaded the latest driver, 6111, but not installed it. I checked the Nvidia Linux user forums and found that there seems to be some issue with this driver that may be similar to what you are seeing.

I would recommend rolling back to 6106 (that is what I am using on my BP6) instead. There is a thread about this issue here in the above mentioned forums.

FWIW, I compile my own kernels (downloaded from www.kernel.org) and I also download the drivers separately from Nvidia and install them myself. My current kernel is still 2.6.5 on my BP6. I also use the agpgart driver without issues!

The Linux kernel gang has recently made a change in the kernel that directly affected the Nvidia driver. I thought that 6111 was supposed to fix that, but it may be that some other bugs reared their ugly heads. Roll back to 6106 and see how that goes. If you are still having problems then post your XF86Config file and check the log for any hints!

Cheers,
24seven
IRC Lurker
Posts: 495
Joined: Wed Jul 24, 2002 5:23 pm
Location: Derbyshire UK

Post by 24seven »

Hi, I've had similar problems with my machine, but I put the problem down to heat.

The graphics would crash and it locked the machine up locally, but if you ssh'd into the machine it was still running. Killing the X11 process didn't help; the only way to fix it was a reboot.

David
tapeworm
Posts: 15
Joined: Thu Mar 06, 2003 10:44 am
Location: italy

Post by tapeworm »

purrkur wrote:Hi again,
I would recommend backing up and using 6106 (that is what I am using on my BP6) instead.
unfortunately those drivers exhibited the same issue. i usually upgrade the nvidia drivers a few days after they come out, and i have had this problem for many months.

however, i'll keep an eye on the logs (no crashes for 3 days, quite a record...), and i will try some other kernels. thanks for the help.
24seven wrote:Hi, I've had similar problems with my machine, but I put the problem down to heat.
i don't think this could be the problem: my video card does not generate a lot of heat, my case is always open, the situation doesn't get better with lower ambient temperature, the crashes aren't related to the load of the system or the activity (opengl rendering) of the gpu, and they disappear with the nv driver.

it's quite strange, i was convinced that many more bp6 users could have this problem...
purrkur
Linux Guru
Posts: 687
Joined: Fri Dec 12, 2003 5:57 pm
Location: Sweden

Post by purrkur »

My BP6 has yet to experience an X crash so this is definitely something with your setup. I can also upgrade my kernel to 2.6.8.1 and the Nvidia driver to 6111 just to see what happens...
tapeworm
Posts: 15
Joined: Thu Mar 06, 2003 10:44 am
Location: italy

some more info found just now...

Post by tapeworm »

in this review of fedora core 3 test 2:
http://www.osnews.com/story.php?news_id=8349
the guy talks about EXACTLY the same problem as mine, calling it "the notorious nvidia soft freeze".
i don't know if he has an smp kernel (if he doesn't, that would definitely prove me wrong about smp), but he says that recompiling without framebuffer support fixes it.
i'll have to try that, but i'm not very happy about this solution anyway, because a kernel recompile the way i want it takes hours, and doing it just to disable some drivers that i don't use anyway is suboptimal... if the nvidia drivers can't cope with framebuffers i'd really prefer that nvidia fix them...
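
Before recompiling, it may be worth checking what the running kernel was actually built with. Below is a minimal Python sketch that reads kernel config text and reports the framebuffer options. CONFIG_FB, CONFIG_FB_VESA and CONFIG_FB_RIVA are the kernel's option names for generic, VESA and riva framebuffer support; the function names and the sample config are made up for illustration and do not describe any real machine in this thread.

```python
# Kernel options for framebuffer support: generic, VESA, and nvidia riva.
FB_OPTIONS = ("CONFIG_FB", "CONFIG_FB_VESA", "CONFIG_FB_RIVA")

NOT_SET_SUFFIX = " is not set"


def parse_kernel_config(text):
    """Map option name -> 'y', 'm', or 'n' from the text of a kernel .config."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # Disabled options appear as "# CONFIG_FOO is not set".
            settings[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            settings[name] = value
    return settings


def framebuffer_status(text):
    """Report each framebuffer option; options absent from the file count as 'n'."""
    settings = parse_kernel_config(text)
    return {option: settings.get(option, "n") for option in FB_OPTIONS}


# Invented sample, not the config of any kernel discussed here.
sample_config = """\
CONFIG_FB=y
CONFIG_FB_VESA=y
# CONFIG_FB_RIVA is not set
"""

print(framebuffer_status(sample_config))
```

On debian the config of a packaged kernel is normally installed under /boot as config-<version>, so its contents can be passed to framebuffer_status in place of the sample. A value of 'm' means the driver is built as a module and only matters if it is actually loaded.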

purrkur, out of curiosity, do you enable vesa framebuffer or riva framebuffer as modules on your kernels? or do you use framebuffers in any way?
oh, and don't change drivers and kernel just to test them for me, it's not necessary, thank you anyway.
purrkur
Linux Guru
Posts: 687
Joined: Fri Dec 12, 2003 5:57 pm
Location: Sweden

Post by purrkur »

Hmmm. This freeze was also mentioned in the link to the Nvidia forums that I sent you. The curious thing is that some users have it while others do not. It is definitely a config issue, but the question is which setting?

I use framebuffers because I like being able to change the resolution of my screen when not running X (I never start X at boot). However, I only compile support for VESA VGA and nothing else.

Oh, and BTW, why does a kernel compile take hours?? I think that a full kernel compile for me takes somewhere between 35 and 45 minutes on my BP6. I will definitely upgrade my kernel and Nvidia drivers sometime soon when I have the time to do so. Nothing beats playing around with the latest and greatest :) If you decide to do a kernel recompile then I will gladly send you my .config file. You might have to alter some things, but you will only need to do a minimal amount of configuration.