Tainted Kernel messages in abrt but nothing in dmesg



dcharlespyle
24th September 2012, 02:53 PM
I am getting lots of messages about tainted kernels in Fedora 18. I try to run dmesg and get no results stating where or what is the cause. I only see a message about USB storage being not tainted. Any idea what might be up with that? I'm stumped at the moment. :blink:

Kernel and dmesg output, as well as kernel version, follow.

# rpm -qa kernel
kernel-3.6.0-0.rc6.git2.1.fc18.x86_64
kernel-3.6.0-0.rc6.git0.2.fc18.x86_64
kernel-3.6.0-0.rc2.git2.1.fc18.x86_64

# dmesg | grep kernel
[ 0.000000] kernel direct mapping tables up to 0xcff5ffff @ [mem 0x1f97b000-0x1fffffff]
[ 0.000000] kernel direct mapping tables up to 0x12fffffff @ [mem 0xcff5a000-0xcff5ffff]
[ 0.000000] Booting paravirtualized kernel on bare hardware
[ 0.000000] Memory: 4004180k/4980736k available (7085k kernel code, 787532k absent, 189024k reserved, 6424k data, 3244k init)
[ 1.853867] DMA-API: debugging enabled by kernel config
[ 3.425494] Freeing unused kernel memory: 3244k freed
[ 3.427602] Write protecting the kernel read-only data: 12288k
[ 3.430772] Freeing unused kernel memory: 1096k freed
[ 3.434237] Freeing unused kernel memory: 1216k freed
[ 4.412518] [drm] radeon kernel modesetting enabled.
[ 4.424283] [drm] initializing kernel modesetting (JUNIPER 0x1002:0x68BE 0x1043:0x0338).
[ 4.429835] [TTM] Zone kernel: Available graphics memory: 2014478 kiB
[ 5.164979] [<ffffffff816e8704>] kernel_thread_helper+0x4/0x10

# dmesg | grep tainted
[ 5.164915] Pid: 165, comm: usb-storage Not tainted 3.6.0-0.rc6.git2.1.fc18.x86_64 #1

I have attached the information displayed by abrt-gui.

jpollard
24th September 2012, 03:06 PM
Are you using an Nvidia driver?

That is the usual cause.

dcharlespyle
26th September 2012, 03:26 PM
No, I am not using the nVidia driver. I have an ATI/AMD card, but I am using only the FOSS driver. If it were the graphics driver, which I have seen happen in the past, dmesg would have said something. I have seen nothing. The only thing dmesg gives me is the following:

dmesg | grep tainted
[ 5.238939] Pid: 180, comm: usb-storage Not tainted 3.6.0-0.rc6.git2.1.fc18.x86_64 #1

"Taints" gives me a blank. Whenever it has been a video driver it has been "taints" that gives me information about the driver that did it.

I have not had any errors like the above until I installed the 3.6 series of the kernel, with the exception of one of the 3.5 kernels.
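
A note on the `dmesg | grep tainted` search above: grep is case-sensitive by default, so it matches "Not tainted" but would miss a line of the form "Tainted: G W", which is how the kernel prints it once a flag is set. That may be one reason the search looks empty. A minimal sketch of a case-insensitive filter follows; the function name `taint_lines` is made up for illustration.

```shell
#!/bin/sh
# taint_lines: hypothetical helper. Reads kernel log text on stdin and
# prints only the lines where a taint flag is actually reported.
taint_lines() {
    # -i matches both "Tainted:" and "tainted"; the second grep drops
    # the "Not tainted" lines that only report a clean kernel.
    grep -i 'tainted' | grep -v -i 'not tainted'
}
```

It could be used as `dmesg | taint_lines`. No output means the ring buffer holds no line reporting a set taint flag, which can also happen if the message has already scrolled out of the buffer.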

Here is what I get when I grep "radeon":

$ dmesg | grep radeon
. . .

[ 4.381138] [drm] radeon kernel modesetting enabled.
[ 4.381287] fb: conflicting fb hw usage radeondrmfb vs VESA VGA - removing generic driver
[ 4.396850] radeon 0000:01:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[ 4.396853] radeon 0000:01:00.0: GTT: 512M 0x0000000040000000 - 0x000000005FFFFFFF
[ 4.405028] [drm] radeon: 1024M of VRAM memory ready
[ 4.405044] [drm] radeon: 512M of GTT memory ready.
[ 4.405714] radeon 0000:01:00.0: irq 44 for MSI/MSI-X
[ 4.405806] radeon 0000:01:00.0: radeon: using MSI.
[ 4.406835] [drm] radeon: irq initialized.
[ 4.408655] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[ 4.634708] radeon 0000:01:00.0: WB enabled
[ 4.634712] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8801204c4c00
[ 4.682380] [drm] radeon: power management initialized
[ 4.728171] fbcon: radeondrmfb (fb0) is primary device
[ 4.952859] fb0: radeondrmfb frame buffer device
[ 4.952922] [drm] Initialized radeon 2.22.0 20080528 for 0000:01:00.0 on minor 0
[ 5.238932] Modules linked in: radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core usb_storage

I haven't had any troubles with this before. None of the above have caused the kernel to be tainted before, either. I did notice this morning when I booted up that I saw a message regarding TSC calibration failing. Using grep for that gives me the following:

$ dmesg | grep TSC
[ 0.000000] tsc: Fast TSC calibration failed
[ 2.854138] tsc: Refined TSC clocksource calibration: 3191.999 MHz

But, this is my CPU speed and not referring to the graphics card. I have also not seen this failure in the 3.5 series kernels.

So, for the moment, I am stumped as to what is causing this. I am wondering whether or not it is the bluetooth driver that is forced to load with BNEP. I do not even have a bluetooth card in the computer. Trying to remove the modules BNEP and bluetooth is met with failure, too.

DBelton
26th September 2012, 03:47 PM
I wonder if this goes back to the abrt bug that shows up when it hits multiple kernel oopses. Only the first oops shows as being untainted (if the kernel is untainted); the ones afterwards show it as tainted, and abrt picks that up as the kernel being tainted.

dcharlespyle
27th September 2012, 01:36 AM
I don't know about that at this point. So far, every single time I restart and log on I get the tainted kernel error. I think I know at least part of the cause, however. It looks like a recursive locking error. I have managed to recover the file I needed to figure it out. Here is what I am getting when the tainted kernel message comes up. It looks like it is in the cx18 driver again. That is a regression.

[ INFO: possible recursive locking detected ]
3.6.0-0.rc6.git2.1.fc18.x86_64 #1 Tainted: G W
---------------------------------------------
systemd-udevd/429 is trying to acquire lock:
(hdl->lock){+.+...}, at: [<ffffffffa02a685e>] find_ref_lock+0x2e/0x60 [videodev]
but task is already holding lock:
(hdl->lock){+.+...}, at: [<ffffffffa02a979d>] v4l2_ctrl_add_handler+0x7d/0xd0 [videodev]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(hdl->lock);
lock(hdl->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by systemd-udevd/429:
#0: (&__lockdep_no_validate__){......}, at: [<ffffffff8143fb0b>] __driver_attach+0x5b/0xb0
#1: (&__lockdep_no_validate__){......}, at: [<ffffffff8143fb19>] __driver_attach+0x69/0xb0
#2: (hdl->lock){+.+...}, at: [<ffffffffa02a979d>] v4l2_ctrl_add_handler+0x7d/0xd0 [videodev]
stack backtrace:
Pid: 429, comm: systemd-udevd Tainted: G W 3.6.0-0.rc6.git2.1.fc18.x86_64 #1
Call Trace:
[<ffffffff810d58cf>] __lock_acquire+0x15bf/0x1ae0
[<ffffffff810ac8a8>] ? sched_clock_cpu+0xa8/0x120
[<ffffffff810ac8a8>] ? sched_clock_cpu+0xa8/0x120
[<ffffffff81021dc3>] ? native_sched_clock+0x13/0x80
[<ffffffff810d64d1>] lock_acquire+0xa1/0x1f0
[<ffffffffa02a685e>] ? find_ref_lock+0x2e/0x60 [videodev]
[<ffffffff816da596>] mutex_lock_nested+0x76/0x390
[<ffffffffa02a685e>] ? find_ref_lock+0x2e/0x60 [videodev]
[<ffffffffa02a979d>] ? v4l2_ctrl_add_handler+0x7d/0xd0 [videodev]
[<ffffffffa02a685e>] ? find_ref_lock+0x2e/0x60 [videodev]
[<ffffffff810d705d>] ? trace_hardirqs_on_caller+0x10d/0x1a0
[<ffffffffa02a685e>] find_ref_lock+0x2e/0x60 [videodev]
[<ffffffffa02a8d61>] handler_new_ref+0x51/0x200 [videodev]
[<ffffffffa02a97c8>] v4l2_ctrl_add_handler+0xa8/0xd0 [videodev]
[<ffffffffa02a365f>] v4l2_device_register_subdev+0x9f/0x1a0 [videodev]
[<ffffffffa03455ab>] cx18_av_probe+0x28b/0x2f0 [cx18]
[<ffffffff816db5de>] ? mutex_unlock+0xe/0x10
[<ffffffffa0349934>] cx18_probe+0xddf/0x143b [cx18]
[<ffffffff816de57f>] ? _raw_spin_unlock_irqrestore+0x3f/0x80
[<ffffffff810d70fd>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff813764a9>] local_pci_probe+0x79/0x100
[<ffffffff81376641>] pci_device_probe+0x111/0x120
[<ffffffff8143f7ab>] driver_probe_device+0x8b/0x390
[<ffffffff8143fb5b>] __driver_attach+0xab/0xb0
[<ffffffff8143fab0>] ? driver_probe_device+0x390/0x390
[<ffffffff8143d745>] bus_for_each_dev+0x55/0x90
[<ffffffff8143f11e>] driver_attach+0x1e/0x20
[<ffffffff8143ed40>] bus_add_driver+0x1b0/0x2a0
[<ffffffffa0359000>] ? 0xffffffffa0358fff
[<ffffffff81440257>] driver_register+0x77/0x170
[<ffffffffa0359000>] ? 0xffffffffa0358fff
[<ffffffff8137517f>] __pci_register_driver+0x6f/0xf0
[<ffffffffa0359000>] ? 0xffffffffa0358fff
[<ffffffffa0359078>] module_start+0x78/0x1000 [cx18]
[<ffffffff8100212a>] do_one_initcall+0x12a/0x180
[<ffffffff810e3db6>] sys_init_module+0x156/0x2290
[<ffffffff81365970>] ? ddebug_proc_open+0xd0/0xd0
[<ffffffff811d0ee0>] ? delayed_fput+0xb0/0xb0
[<ffffffff816e7529>] system_call_fastpath+0x16/0x1b

I still get nothing in dmesg, though.

DBelton
27th September 2012, 04:10 AM
You have a Hauppauge card?

Have you tried installing the cx18-firmware?

stevea
27th September 2012, 07:12 AM
Proprietary modules are NOT the problem.

If you look at the screenshot for taint flags, the OP has the G and W flags set.
G means all modules are GPL'ed. (P means proprietary).
W means there was already a kernel problem reported.

http://www.kernel.org/doc/Documentation/oops-tracing.txt


'W' if a warning has previously been issued by the kernel.
(Though some warnings may set more specific taint flags.)
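
The same flags can also be read as a number from /proc/sys/kernel/tainted, where each flag is one bit. Below is a rough sketch of a decoder that covers only a handful of the flags. The function name `decode_taint` is invented here, and the bit positions are my reading of the kernel documentation, so check them against the docs for your kernel version.

```shell
#!/bin/sh
# decode_taint: hypothetical helper. Turns a taint bitmask (as found in
# /proc/sys/kernel/tainted) into flag letters. Only a subset of flags is
# handled; bit positions are assumed from the kernel documentation.
decode_taint() {
    value=$1
    if [ "$value" -eq 0 ]; then
        echo "Not tainted"
        return 0
    fi
    flags=""
    # bit:letter pairs -> P proprietary module, F forced module load,
    # D kernel died (oops), W warning issued, C staging driver,
    # O out-of-tree module
    for pair in 0:P 1:F 7:D 9:W 10:C 12:O; do
        bit=${pair%%:*}
        letter=${pair##*:}
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            flags="$flags$letter"
        fi
    done
    # When the proprietary bit is clear, the kernel prints G in its place.
    case $flags in
        P*) ;;
        *) flags="G$flags" ;;
    esac
    echo "$flags"
}
```

For example, `decode_taint "$(cat /proc/sys/kernel/tainted)"` on a system showing "Tainted: G W" should print GW under these assumptions, since W alone corresponds to a value of 512.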
===========

I see you are getting a bunch of SIGSEGVs for processes.
Is your DRAM OK? It could cause this problem too.
Install the 'memtest86+' package and boot it; let it run a few FULL cycles or overnight.
This *might* cause the other problem, but that's iffy.
===========

This bug in systemd-udevd seems to be the cause of the taint, not abrt.


[ INFO: possible recursive locking detected ]
3.6.0-0.rc6.git2.1.fc18.x86_64 #1 Tainted: G W
---------------------------------------------
systemd-udevd/429 is trying to acquire lock:
(hdl->lock){+.+...}, at: [<ffffffffa02a685e>] find_ref_lock+0x2e/0x60 [videodev]
but task is already holding lock:


I strongly urge you report that one to bugzilla.redhat.com
It looks serious and could impact a lot of others.
Linux is a community effort and you can do your bit by reporting problems.

The traceback is through the v4l2 video code, so I suspect it's your video camera driver's
probe code. You might try blacklisting the driver for it, disconnecting it,
or whatever, as a test. The driver is either cx18 or cx18-alsa.

You can add the lines
blacklist cx18
blacklist cx18-alsa
to /etc/modprobe.d/blacklist.conf and reboot to test if that fixes it.
You won't have access to the camera.
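
The blacklist step above can be scripted; this is only a sketch. The function name `add_blacklist` is made up for illustration, and the file name is the one from the post. As far as I know, a `blacklist` line only stops a module from being autoloaded through its aliases, so an explicit modprobe or a dependency can still pull it in.

```shell
#!/bin/sh
# add_blacklist: hypothetical helper. Appends "blacklist <module>" lines
# to the given modprobe.d file, skipping modules already listed there.
add_blacklist() {
    conf=$1
    shift
    for mod in "$@"; do
        # -x matches whole lines only, so "cx18" is not confused
        # with "cx18-alsa".
        if ! grep -q -x "blacklist $mod" "$conf" 2>/dev/null; then
            echo "blacklist $mod" >> "$conf"
        fi
    done
}
```

Run as root, it would be called as `add_blacklist /etc/modprobe.d/blacklist.conf cx18 cx18-alsa`, followed by a reboot.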
===========

Your kernel abort problem originated in lib/dma-debug.c which is part of the kernel DMA routines.
There may be a bug in one of the drivers vs. your chipset. The disk drivers and Ethernet drivers are top of the list, and *maybe* your camera.

Try booting w/o some parts or drivers if you can.

If the above don't work out post the output of
dmesg
lsmod
lspci -nnk

---------- Post added at 02:12 AM ---------- Previous post was at 01:21 AM ----------

After scanning the kernel, it seems that the probe/traceback was likely from the cx18-alsa driver, which includes 'av' hardware, specifically a sound system. The error is the result of the setup of the 'av' hardware. It really is possible that that failure caused the dma-debug abort later.

Most likely the cx18-alsa driver has a problem.

dcharlespyle
27th September 2012, 12:55 PM
You have a Hauppauge card?

Have you tried installing the cx18-firmware?

Yes and yes to both questions. It still happens with the 3.6 kernels but not the 3.5 kernels (now gone from the /boot directory due to updates).

A recent set of updates just made my system go loopy so I have to for the moment reply to this using Windows...unfortunate.

---------- Post added at 05:55 AM ---------- Previous post was at 05:50 AM ----------


I strongly urge you report that one to bugzilla.redhat.com
It looks serious and could impact a lot of others.
Linux is a community effort and you can do your bit by reporting problems.

I have tried but the application would not allow it. I would have to do that manually on the website. I will try the other suggestions as soon as I can get my Linux installation to run without freezing on me repeatedly. That behavior started after applying a number of updates. No errors in the updating process but plenty afterward. Lots of new SELinux denials, too. Well, I did click the "I accept my fate" button, now didn't I?

stevea
28th September 2012, 05:16 AM
Check your dram
try the blacklist

dcharlespyle
28th September 2012, 08:24 AM
Blacklisting failed. It still loads. The same is the case for the bluetooth and bnep drivers I tried blacklisting. Checking dram tonight.

stevea
28th September 2012, 08:54 AM
Hmm, something is wrong if the blacklist is not working. Unless maybe the module is in the initramfs, in which case you'll have to remove & reinstall the kernel (w/ the blacklist in place) to fix it.

dcharlespyle
28th September 2012, 12:57 PM
Looks like I can't run the memtest, either. I am logged on as root and am getting all sorts of 'permission denied' errors.

jpollard
28th September 2012, 02:19 PM
memtest is a very tiny kernel that does nothing but exercise the CPU/memory.

Frequently you can boot it from either the install disk/rescue disk, or (if it is installed) from the grub menu.

DBelton
28th September 2012, 02:53 PM
memtest86+ has to be booted using its own kernel. I don't believe you can run it from a running kernel.

You have to run memtest-setup and it will put an entry in your grub2 menu to boot it if you installed it to your F18 install. Or you can run it from the install media.

But the kernel module (for cx18) may have been built into your initramfs. To blacklist it, you may have to blacklist the module, then rebuild the initramfs with dracut. Memtest doesn't use an initramfs.

You can try rebuilding your initramfs after you blacklist the module.

To back up the current initramfs and rebuild the initramfs for your currently running kernel:


cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).bak

dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
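
To see whether the rebuilt image still carries the driver, the image contents can be listed with lsinitrd (which ships with dracut) and searched. A small sketch follows. The name `has_module` is invented for illustration, and it reads the listing on stdin so it can be tried on any text.

```shell
#!/bin/sh
# has_module: hypothetical helper. Succeeds if the file listing on stdin
# contains a kernel module file named <module>.ko (possibly compressed,
# e.g. <module>.ko.xz).
has_module() {
    grep -q "/$1\.ko"
}
```

Usage would be along the lines of `lsinitrd /boot/initramfs-$(uname -r).img | has_module cx18 && echo "cx18 is still in the initramfs"`.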

dcharlespyle
29th September 2012, 12:54 PM
Well, that explains why I couldn't run it. Checked DRAM. All fine. Zero errors after 24 hours. Checked Pango and got:

$ rpm -qa pango
pango-1.31.2-1.fc18.x86_64
pango-1.31.2-1.fc18.i686

I am going to try to grab the latest off koji and see what happens. Then I will try to rebuild the initramfs and reinstall the kernel after doing that.

I still think it is odd that I am only getting this error and seeing this behavior with the 3.6 kernels. More details to follow after I have done all this.

DBelton
29th September 2012, 04:52 PM
I forgot to add above that you have to run grub2-mkconfig -o /boot/grub2/grub.cfg after you run the memtest-setup.

Memtest-setup puts the scripts needed by grub2 to create the menu entries in your /etc/grub.d folder. You still have to run grub2-mkconfig for it to actually create the grub2 menu entries for it.
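
Put together, the two steps look roughly like this. It is a sketch, not a tested recipe: the `run` and `setup_memtest_entry` names are made up, and the DRY_RUN switch is only there so the commands can be previewed without root.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# setup_memtest_entry: hypothetical wrapper for the two steps described
# in the posts. memtest-setup drops its script into /etc/grub.d, then
# grub2-mkconfig regenerates the menu so the entry actually appears.
setup_memtest_entry() {
    run memtest-setup &&
    run grub2-mkconfig -o /boot/grub2/grub.cfg
}
```

With the memtest86+ package installed, `DRY_RUN=1 setup_memtest_entry` prints the two commands, and running `setup_memtest_entry` as root executes them.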

And it's not really odd that you had good results from the 3.5 kernel and 3.6 is broken. Major version changes are usually where the modules get broken, if they are going to break.

Which brings up something that's been bugging me for a while now: these fast-paced changes to major system components, like the kernel, systemd, etc. We ran on the 2.6 kernel for years, and now within the past year or so, it's gone from 3.0 to 3.6.

I think they are now putting out new releases more to fix regressions and errors with their own patches than new development. It's like there aren't any real developers and programmers involved anymore. It's just what I call "hacker programmers". They hack at a problem, sticking patches in until they get things to work, release it and then start the cycle again when the patch breaks something else.

dcharlespyle
29th September 2012, 07:35 PM
I just took for granted that I have to run grub2-mkconfig -o /boot/grub2/grub.cfg after running setup.

Memory was fine. I just applied another spate of test updates a couple hours ago, and all the crashes I have been having are gone. I also am no longer seeing the tainted kernel errors so far. Unfortunately, Cheese is broken--again.

I am thinking something similar in relation to programming practices. I think there are exceptions but...