FedoraForum.org - Fedora Support Forums and Community
  1. #1
    Join Date
    Sep 2004
    Posts
    65

    [SOLVED] Random Crashing Fedora 26 After Upgrading from 4.12 to 4.13

    THIS IS SOLVED - or at least I've identified a fix.

    The problem was caused by a change in kernel config options between the 4.12 and 4.13 kernels:
    https://bugzilla.kernel.org/show_bug.cgi?id=196683

    Solution:
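    Append the following to the kernel boot parameters to offload RCU callback processing from all CPUs (0-15 covers the 16 hardware threads of the 1700X). Note that this does not disable RCU itself: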
    Code:
    rcu_nocbs=0-15
    Uptime is now over 7 days; the machine used to need a reboot 1-2 times per day.
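
    For anyone applying this on Fedora, here is a minimal sketch of one way to add the parameter persistently. It assumes a GRUB-based install where grubby is available and a kernel built with CONFIG_RCU_NOCB_CPU. The helper name rcu_nocbs_arg is made up for illustration, and the script only prints the grubby command so it can be reviewed before being run as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the rcu_nocbs range for N logical CPUs.
# $1 = number of logical CPUs (16 on a Ryzen 1700X with SMT enabled).
rcu_nocbs_arg() {
    echo "rcu_nocbs=0-$(($1 - 1))"
}

# nproc reports the logical CPUs available to this process.
arg="$(rcu_nocbs_arg "$(nproc)")"

# Print the command instead of running it; review it, then run it as root.
echo "sudo grubby --update-kernel=ALL --args=\"$arg\""
```

    The parameter only takes effect after a reboot.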


    --------------------------

    Fedora 26
    Ryzen 1700X
    32GB RAM (memtested without issue)
    MSI B350 Tomahawk
    GTX 1070

    Temps are very good (mid 30s °C)

    Random crashes, a few times a day. The computer freezes: the mouse is lit up and num lock works, but the monitors won't wake up, the computer won't respond on the network via ICMP, and it has to be physically restarted.

    I've used several kernel versions with the same random crashes, including the latest:

    4.13.5-200
    4.13.4-200
    4.12.9-300

    nvidia driver is from negativo repo:
    dkms-nvidia-384.90-2

    From journalctl:

    Oct 24 07:11:59 hostname rtkit-daemon[997]: The canary thread is apparently starving. Taking action.
    Oct 24 07:11:59 hostname rtkit-daemon[997]: Demoting known real-time threads.
    Oct 24 07:11:59 hostname rtkit-daemon[997]: Successfully demoted thread 5046 of process 5024 (/usr/bin/pulseaudio).
    Oct 24 07:11:59 hostname rtkit-daemon[997]: Successfully demoted thread 5043 of process 5024 (/usr/bin/pulseaudio).
    Oct 24 07:11:59 hostname rtkit-daemon[997]: Successfully demoted thread 5040 of process 5024 (/usr/bin/pulseaudio).
    Oct 24 07:11:59 hostname rtkit-daemon[997]: Successfully demoted thread 5036 of process 5024 (/usr/bin/pulseaudio).
    Oct 24 07:11:59 hostname rtkit-daemon[997]: Successfully demoted thread 5033 of process 5024 (/usr/bin/pulseaudio).
    Oct 24 07:11:59 hostname rtkit-daemon[997]: Successfully demoted thread 5024 of process 5024 (/usr/bin/pulseaudio).
    Oct 24 07:11:59 hostname rtkit-daemon[997]: Demoted 6 threads.
    Oct 24 07:12:07 hostname kernel: list_add corruption. next->prev should be prev (ffff8c08ce9dadf8), but was ffff8c0802d9cd68. (next=ffff8c08ce9dadf8).
    Oct 24 07:12:07 hostname kernel: ------------[ cut here ]------------
    Oct 24 07:12:07 hostname kernel: kernel BUG at lib/list_debug.c:25!
    Oct 24 07:12:07 hostname kernel: invalid opcode: 0000 [#1] SMP
    Oct 24 07:12:07 hostname kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masque
    Oct 24 07:12:07 hostname kernel: crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep snd_rawmidi snd_seq joydev snd_seq_
    Oct 24 07:12:07 hostname kernel: CPU: 2 PID: 4688 Comm: chrome Tainted: P OE 4.13.5-200.fc26.x86_64 #1
    Oct 24 07:12:07 hostname kernel: Hardware name: Micro-Star International Co., Ltd MS-7A34/B350 TOMAHAWK ARCTIC (MS-7A34), BIOS H.50 06/22/2017
    Oct 24 07:12:07 hostname kernel: task: ffff8c08c628cc80 task.stack: ffffb1d68bbfc000
    Oct 24 07:12:07 hostname kernel: RIP: 0010:__list_add_valid+0x3b/0x70
    Oct 24 07:12:07 hostname kernel: RSP: 0018:ffffb1d68bbffd30 EFLAGS: 00010086
    Oct 24 07:12:07 hostname kernel: RAX: 0000000000000075 RBX: ffff8c08c8678200 RCX: 0000000000000000
    Oct 24 07:12:07 hostname kernel: RDX: 0000000000000000 RSI: ffff8c08ce68e118 RDI: ffff8c08ce68e118
    Oct 24 07:12:07 hostname kernel: RBP: ffffb1d68bbffd30 R08: 000000000000043f R09: 0000000000000004
    Oct 24 07:12:07 hostname kernel: R10: 000000000000ba14 R11: ffffffff92314aed R12: ffff8c07c9eaa700
    Oct 24 07:12:07 hostname kernel: R13: ffff8c08ce9da440 R14: ffff8c07c9eaa728 R15: ffff8c08ce9dadf8
    Oct 24 07:12:07 hostname kernel: FS: 00007f116d817ac0(0000) GS:ffff8c08ce680000(0000) knlGS:0000000000000000
    Oct 24 07:12:07 hostname kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Oct 24 07:12:07 hostname kernel: CR2: 00003aa455bb5000 CR3: 00000003f185a000 CR4: 00000000003406e0
    Oct 24 07:12:07 hostname kernel: Call Trace:
    Oct 24 07:12:07 hostname kernel: account_entity_enqueue+0xd8/0x100
    Oct 24 07:12:07 hostname kernel: enqueue_entity+0x9a/0x7a0
    Oct 24 07:12:07 hostname kernel: enqueue_task_fair+0x7a/0x720
    Oct 24 07:12:07 hostname kernel: activate_task+0x51/0xc0
    Oct 24 07:12:07 hostname kernel: wake_up_new_task+0x117/0x250
    Oct 24 07:12:07 hostname kernel: _do_fork+0x132/0x390
    Oct 24 07:12:07 hostname kernel: SyS_clone+0x19/0x20
    Oct 24 07:12:07 hostname kernel: do_syscall_64+0x67/0x140
    Oct 24 07:12:07 hostname kernel: entry_SYSCALL64_slow_path+0x25/0x25
    Oct 24 07:12:07 hostname kernel: RIP: 0033:0x7f11671d7bb1
    Oct 24 07:12:07 hostname kernel: RSP: 002b:00007fff454a4538 EFLAGS: 00000206 ORIG_RAX: 0000000000000038
    Oct 24 07:12:07 hostname kernel: RAX: ffffffffffffffda RBX: 00007f11487ff700 RCX: 00007f11671d7bb1
    Oct 24 07:12:07 hostname kernel: RDX: 00007f11487ff9d0 RSI: 00007f11487fec30 RDI: 00000000003d0f00
    Oct 24 07:12:07 hostname kernel: RBP: 00007fff454a4610 R08: 00007f11487ff700 R09: 00007f11487ff700
    Oct 24 07:12:07 hostname kernel: R10: 00007f11487ff9d0 R11: 0000000000000206 R12: 0000000000000000
    Oct 24 07:12:07 hostname kernel: R13: 00007fff454a4760 R14: 00000d2020cb6c10 R15: 000000000000001c
    Oct 24 07:12:07 hostname kernel: Code: 48 8b 10 4c 39 c2 75 25 48 39 c7 74 34 48 39 d7 74 2f b8 01 00 00 00 5d c3 48 89 d1 48 c7 c7 38 ab cc 91 48 89 c2 e8 26 58 cc ff <0f> 0b 48 89 c
    -- Reboot --
    Last edited by antikythera; 11th November 2017 at 03:02 PM. Reason: formatting

  2. #2
    Join Date
    Sep 2004
    Posts
    65

    Re: Random Crashing

    Any suggestions? My guess is the nvidia driver, but I'm grasping at straws. I might try rolling it back, and I might switch to the nouveau driver for a while to see whether the nvidia driver really is the cause.
    Last edited by jon3k; 26th October 2017 at 04:00 PM.

  3. #3
    Join Date
    Dec 2009
    Posts
    83

    Re: Random Crashing

    I've been running into the same issue. I'm currently running the nouveau drivers, though I have used the proprietary drivers in the past. I still have not found an answer.

  4. #4
    Join Date
    Jun 2005
    Location
    Montreal, Que, Canada
    Posts
    4,496

    Re: Random Crashing

    Is your CPU one of the early manufactured ones (before week 30 of production)? If so, you could ask for an RMA and get a new CPU to replace that one. Tom's Hardware has a test suite to verify that you have the "good" CPU.

    What motherboard are you using?

    I am actually looking to purchase the 1700 cpu system. (I don't need 95 watts when 65 will do for me).
    Last edited by lsatenstein; 26th October 2017 at 10:03 PM.
    Leslie in Montreal

    Interesting web sites list
    http://forums.fedoraforum.org/showth...40#post1697840

  5. #5
    Join Date
    Sep 2004
    Posts
    65

    Re: Random Crashing

    Quote Originally Posted by lsatenstein
    Is your CPU one of the early manufactured ones (before week 30 of production)? If so, you could ask for an RMA and get a new CPU to replace that one. Tom's Hardware has a test suite to verify that you have the "good" CPU.

    What motherboard are you using?

    I am actually looking to purchase the 1700 cpu system. (I don't need 95 watts when 65 will do for me).
    I would recommend the 1700, I only bought the X because I got it for essentially the same price.

    MSI Tomahawk B350 motherboard

    I've been hoping and praying it isn't the Ryzen bug, but thanks for the advice I will go check now. Ugh, I hope it's not.

  6. #6
    Join Date
    Jun 2005
    Location
    Montreal, Que, Canada
    Posts
    4,496

    Re: Random Crashing

    Quote Originally Posted by jon3k
    I would recommend the 1700, I only bought the X because I got it for essentially the same price.

    MSI Tomahawk B350 motherboard

    I've been hoping and praying it isn't the Ryzen bug, but thanks for the advice I will go check now. Ugh, I hope it's not.
    AMD is honorable and reputable. If it is the faulty CPU, the RMA request is not because of CPU burnout, but because it's got a bug.

    Also check if your motherboard bios is up to date.
    Last edited by lsatenstein; 3rd November 2017 at 11:01 PM.

  7. #7
    Join Date
    Sep 2004
    Posts
    65

    Re: Random Crashing

    I'm relatively certain that my CPU is affected by the bug. I was able to get segfaults using the ryzen-kill script. However, I don't think it's related to my random freezing because that happens when the computer is totally idle. The Linux Ryzen bug should only appear during heavy compilation.

    Each crash creates a directory in /var/spool/abrt/ named "oops-[date:time]", so I'm trying to determine whether that helps identify the problem.

  8. #8
    Join Date
    Jun 2005
    Location
    Montreal, Que, Canada
    Posts
    4,496

    Re: Random Crashing

    Hi jon3k,

    Contact AMD for a replacement CPU. Do realize that when you read the popular articles about Ryzen, those presenters have had only a few minutes with Ryzen. They get the system running, run some tests, and then produce a report.
    And if they did hit a problem, they probably stopped testing until they got a replacement. That is probably why they did not report other issues with the original chip.

  9. #9
    Join Date
    Oct 2013
    Location
    In A Bottle
    Posts
    9

    Re: Random Crashing

    Quote Originally Posted by jon3k
    However, I don't think it's related to my random freezing because that happens when the computer is totally idle. The Linux Ryzen bug should only appear during heavy compilation.
    I think you're right about it; mine ran fine on 4.11.x and 4.12.x but not on 4.13.x.
    If you look at the kernel Bugzilla and the Red Hat Bugzilla you'll find a few bugs reported already.
    My Ryzen 7 CPU was running smoothly on 4.12.x, then as soon as I was on 4.13.x my system froze and had some random reboots.

    Modify your /etc/dnf/dnf.conf to keep more than 3 kernels for rollback, and wait for kernel 4.15, which should bring some nice features for AMD users, including working lm_sensors support for Ryzen.
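
    The dnf.conf option that controls how many kernels are kept is installonly_limit. Here is a rough sketch of changing it. The helper name set_installonly_limit and the value 5 are only illustrative, and the example works on a scratch file rather than the real /etc/dnf/dnf.conf (editing the real file needs root).

```shell
#!/usr/bin/env bash
# Hypothetical helper: set installonly_limit in a dnf.conf-style file.
# $1 = path to the config file, $2 = number of kernels to keep.
set_installonly_limit() {
    if grep -q '^installonly_limit=' "$1"; then
        # Option already present: rewrite its value in place (GNU sed).
        sed -i "s/^installonly_limit=.*/installonly_limit=$2/" "$1"
    else
        # Option missing: append it (assumes [main] is the last section).
        echo "installonly_limit=$2" >> "$1"
    fi
}

# Try it on a scratch file first; 5 is an arbitrary example value.
conf="$(mktemp)"
printf '[main]\ngpgcheck=1\ninstallonly_limit=3\n' > "$conf"
set_installonly_limit "$conf" 5
cat "$conf"
rm -f "$conf"
```

    With a higher limit, dnf keeps more old kernels installed, so a known-good 4.12.x kernel stays selectable from the boot menu for longer.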

    Currently I am on Arch Linux, but I stick to the LTS kernel; otherwise, as on Fedora, I get random reboots with the latest kernel, which is also a 4.13.x version.

    Feel free to try some of the tricks suggested on the kernel Bugzilla if you want. I haven't tried them yet because I have work to do and need something stable, but I will give them a try as soon as I have time.

    https://bugzilla.kernel.org/show_bug.cgi?id=195919
    https://bugzilla.kernel.org/show_bug.cgi?id=196683
    https://bugzilla.redhat.com/show_bug.cgi?id=1502095
    https://bugzilla.redhat.com/show_bug.cgi?id=1502067

  10. #10
    Join Date
    Sep 2004
    Posts
    65

    Re: Random Crashing

    kleepeat:

    I saw the exact same behavior. This turned out to be due to some changes in kernel config options from 4.12 to 4.13 involving RCU.

    I've been following this bug specifically:
    https://bugzilla.kernel.org/show_bug.cgi?id=196683

    Passing the following boot parameter solved my problem:
    rcu_nocbs=0-15
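
    To confirm after a reboot that the kernel actually received the parameter, the running command line can be checked in /proc/cmdline. A small sketch follows; the helper name cmdline_has is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the kernel command line string in $1
# contains a space-separated parameter that starts with the prefix in $2.
cmdline_has() {
    case " $1 " in
        *" $2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the parameters the running kernel was booted with.
if cmdline_has "$(cat /proc/cmdline 2>/dev/null)" "rcu_nocbs="; then
    echo "rcu_nocbs is set on the running kernel"
else
    echo "rcu_nocbs is NOT set on the running kernel"
fi
```

    If it reports that the parameter is not set, the bootloader configuration was probably not updated for the kernel that was booted.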

  11. #11
    Join Date
    Oct 2013
    Location
    In A Bottle
    Posts
    9

    Re: Random Crashing

    Glad your system runs fine now, I followed the thread over there too.
    I've been testing the exact same parameter for 5 days, passed to the kernel (currently 4.13.11-100) via grub (I didn't rebuild the kernel), and so far so good, like you.
    Remember that it is only a workaround and doesn't solve the real problem yet, but for now it's good enough, at least with Fedora.
    On Arch Linux, by contrast, I still get some random reboots; the workaround doesn't work there, but that's another story.

    See you mate
