CPU Hardware Error
FedoraForum.org - Fedora Support Forums and Community
  1. #1
    Join Date
    Mar 2016
    Location
    Earth
    Posts
    5

    CPU Hardware Error

    Hi all,
    I have had occasional crashes on this machine, which I think are related to these entries in the log:

    kernel: mce: [Hardware Error]: Machine check events logged
    kernel: mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 5: bea0000000000108
    kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff9205da4e MISC d012000101000000 SYND 4d000000 IPID 500b000000000
    kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1517252461 SOCKET 0 APIC 1 microcode 8001126

    The CPU is an AMD Ryzen. It only happens very occasionally, but it's annoying. Any thoughts on what the error is telling me?
    Trebor
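    The raw numbers in the log above can be unpacked by hand. The Python sketch below splits the MCA status word (bea0000000000108) into its flag bits and error code, maps the IPID (500b000000000) to a bank type, and converts the TIME field. The bit positions and the short bank table are assumptions based on AMD's Scalable MCA layout for Zen as I understand it; `decode_status` and `decode_ipid` are hypothetical helper names, and the kernel's own decoder (drivers/edac/mce_amd.c) or AMD's Processor Programming Reference should be treated as the authority.

```python
# Sketch: decode the raw fields of an AMD Zen machine-check record.
# Bit positions are an assumption based on AMD's Scalable MCA layout;
# verify them against the AMD PPR for your CPU family.
from datetime import datetime, timezone

# Single-bit flags in MCA_STATUS (bit number -> name).
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCA_MISC contents are valid
    58: "ADDRV",  # MCA_ADDR contents are valid
    57: "PCC",    # processor context may be corrupt
    55: "TCC",    # task context is corrupt
    53: "SYNDV",  # MCA_SYND contents are valid
}

# (HardwareID, McaType) -> bank type; only a subset of the Zen core banks.
SMCA_BANKS = {
    (0xB0, 0x0): "LS (load-store unit)",
    (0xB0, 0x1): "IF (instruction fetch unit)",
    (0xB0, 0x2): "L2 cache",
    (0xB0, 0x3): "DE (decode unit)",
    (0xB0, 0x5): "EX (execution unit)",
    (0xB0, 0x6): "FP (floating-point unit)",
}


def decode_status(status: int) -> dict:
    """Split a 64-bit MCA_STATUS value into set flags and error codes."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "ext_error_code": (status >> 16) & 0x3F,  # bits 21:16
        "error_code": status & 0xFFFF,            # bits 15:0
    }


def decode_ipid(ipid: int) -> str:
    """Name the bank type from HardwareID (bits 43:32) and McaType (bits 63:48)."""
    hwid = (ipid >> 32) & 0xFFF
    mca_type = (ipid >> 48) & 0xFFFF
    return SMCA_BANKS.get((hwid, mca_type),
                          f"unknown (hwid={hwid:#x}, type={mca_type:#x})")


if __name__ == "__main__":
    # Values copied from the log in post #1.
    print(decode_status(0xBEA0000000000108))
    print(decode_ipid(0x500B000000000))
    # TIME is a Unix timestamp in seconds (UTC).
    print(datetime.fromtimestamp(1517252461, tz=timezone.utc).isoformat())
```

    Run on the post #1 values, it reports VAL, UC, EN, MISCV, ADDRV, PCC, TCC and SYNDV set with error code 0x108, the IPID of the "Bank 5" record decodes to the EX (execution unit) bank type, and TIME works out to 2018-01-29 19:01:01 UTC. UC together with PCC would mark an uncorrected error in which the processor state may be corrupt, which fits a crash rather than a benign corrected error.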

  2. #2
    Join Date
    Sep 2011
    Location
    New York
    Posts
    184

    Re: CPU Hardware Error

    I have the same issue. As a fix, I turned off the Opcache setting in the BIOS, which seems to have worked. Some also claim that changing their C-state settings fixed the issue.

  3. #3
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: CPU Hardware Error

    I have a Ryzen 5 1600X on an Asus B350 mobo, and I haven't seen any such errors in about a month of heavy usage. I did update the mobo BIOS to the latest version, though, and I am not overclocking.
    "monsters John ... monsters from the ID..."
    "ma vule teva maar gul nol naya"

  4. #4
    Join Date
    Mar 2016
    Location
    Earth
    Posts
    5

    Re: CPU Hardware Error

    Thanks for the tips, Steve. It does seem to be connected to power management: it only happened after the machine had been suspended and woken up.

  5. #5
    Join Date
    Oct 2007
    Posts
    398

    Re: CPU Hardware Error

    Did you report the bug? It would be great for us Ryzen users to have this fixed.

  6. #6
    Join Date
    Oct 2006
    Posts
    25

    Re: CPU Hardware Error

    Just piling on here. I recently upgraded my system to a Ryzen 1800X with a new mobo, RAM, and video card, but I reused my Wi-Fi card (TP-Link, ath9k) and power supply (550W 80 PLUS Bronze). I've been getting similar random lockups, and they seem to occur when my internet traffic is heaviest (e.g., streaming video, downloading files, or browsing a lot of websites). It usually shows up as my computer rapidly slowing down before it hangs completely, and I have to reset it with the hardware power button.

    I've noticed the following in my journalctl log at the time of the incidents:
    Mar 11 20:52:44 phobos kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kworker/u32:1:11933]
    Mar 11 20:52:44 phobos kernel: Modules linked in: usblp fuse ccm nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_br
    Mar 11 20:52:44 phobos kernel: snd_timer sp5100_tco rfkill k10temp snd i2c_piix4 soundcore shpchp wmi acpi_cpufreq dm_crypt amdkfd amd_iommu_v2 amdgpu chash drm_kms_helper ttm igb drm crct10dif_pclmul ptp crc32
    Mar 11 20:52:44 phobos kernel: CPU: 7 PID: 11933 Comm: kworker/u32:1 Not tainted 4.15.6-300.fc27.x86_64 #1
    Mar 11 20:52:44 phobos kernel: Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 3803 01/22/2018
    Mar 11 20:52:44 phobos kernel: Workqueue: phy0 ath_reset_work [ath9k]
    Mar 11 20:52:44 phobos kernel: RIP: 0010:ioread32+0x19/0x30
    Mar 11 20:52:44 phobos kernel: RSP: 0018:ffffab3e0c0a3d78 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff11
    Mar 11 20:52:44 phobos kernel: RAX: 00000000ffffffff RBX: 0000000000000086 RCX: 0000000000000002
    Mar 11 20:52:44 phobos kernel: RDX: 00000000000092ac RSI: 0000000000007000 RDI: ffffab3e02a07000
    Mar 11 20:52:44 phobos kernel: RBP: ffff9af64b35c028 R08: 00000000ffffffff R09: 0000000000000000
    Mar 11 20:52:44 phobos kernel: R10: 0000000000000002 R11: 000000000000000f R12: 0000000000002710
    Mar 11 20:52:44 phobos kernel: R13: 0000000000007000 R14: 0000000000000003 R15: 0000000000000000
    Mar 11 20:52:44 phobos kernel: FS: 0000000000000000(0000) GS:ffff9af65edc0000(0000) knlGS:0000000000000000
    Mar 11 20:52:44 phobos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Mar 11 20:52:44 phobos kernel: CR2: 00007f59c01faff0 CR3: 000000032120a000 CR4: 00000000003406e0
    Mar 11 20:52:44 phobos kernel: Call Trace:
    Mar 11 20:52:44 phobos kernel: ath9k_hw_wait+0x56/0x90 [ath9k_hw]
    Mar 11 20:52:44 phobos kernel: ath9k_hw_set_reset+0x288/0x410 [ath9k_hw]
    Mar 11 20:52:44 phobos kernel: ath9k_hw_reset+0x1d8/0x1460 [ath9k_hw]
    Mar 11 20:52:44 phobos kernel: ath_reset_internal+0xfd/0x1e0 [ath9k]
    Mar 11 20:52:44 phobos kernel: ath_reset_work+0x1f/0x30 [ath9k]
    Mar 11 20:52:44 phobos kernel: process_one_work+0x175/0x390
    Mar 11 20:52:44 phobos kernel: worker_thread+0x2e/0x380
    Mar 11 20:52:44 phobos kernel: ? process_one_work+0x390/0x390
    Mar 11 20:52:44 phobos kernel: kthread+0x113/0x130
    Mar 11 20:52:44 phobos kernel: ? kthread_create_worker_on_cpu+0x70/0x70
    Mar 11 20:52:44 phobos kernel: ? do_syscall_64+0x74/0x180
    Mar 11 20:52:44 phobos kernel: ? SyS_exit+0x13/0x20
    Mar 11 20:52:44 phobos kernel: ret_from_fork+0x22/0x40
    Mar 11 20:52:44 phobos kernel: Code: b8 ff ff 00 00 c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 48 81 ff ff ff 03 00 77 0e 48 81 ff 00 00 01 00 76 08 0f b7 d7 ed c3 8b 07 <c3> 48 c7 c6 69 5d 0d 82 e8 2a ff ff ff b8 f
    Mar 11 20:52:45 phobos abrt-dump-journal-oops[1281]: abrt-dump-journal-oops: Found oopses: 1
    Mar 11 20:52:45 phobos abrt-dump-journal-oops[1281]: abrt-dump-journal-oops: Creating problem directories
    Mar 11 20:52:46 phobos abrt-dump-journal-oops[1281]: Reported 1 kernel oopses to Abrt
    Mar 11 20:52:49 phobos abrt-server[11969]: Can't find a meaningful backtrace for hashing in '.'
    Mar 11 20:52:49 phobos abrt-server[11969]: Option 'DropNotReportableOopses' is not configured
    Mar 11 20:52:49 phobos abrt-server[11969]: Preserving oops '.' because DropNotReportableOopses is 'no'
    Mar 11 20:52:50 phobos abrt-notification[11988]: System encountered a non-fatal error in ??()
    Mar 11 20:53:03 phobos kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [wpa_supplicant:1349]
    -- Reboot --


    I suspect that either my wireless card is incompatible with the new hardware or my power supply is at fault, although neither was an issue a few hours before I swapped in the new parts.

    Since then I've updated the BIOS to the latest version and turned off automatic C-states. Doing so has reduced the frequency of these hangs from multiple times a day to about once a day. I also ran the full memtest86+ suite on my new RAM just in case, and it found zero errors. I haven't filed a bug report yet, but I'm willing to do so if you think it would help. I don't think this is a Fedora bug, because I've tried a couple of other distros (Ubuntu, openSUSE) with the same result. Is this a bug, or faulty/incompatible hardware?
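    When hangs recur, it helps to see which CPU and task stalled each time. The soft-lockup lines in the log above follow a fixed pattern, so a short Python sketch can pull them out of a saved journal. The regex assumes the watchdog wording printed by this 4.15 kernel, and `find_lockups` is a hypothetical name, not an existing tool.

```python
# Sketch: extract soft-lockup reports from saved kernel journal text.
# Assumes the watchdog wording seen in the 4.15 log above.
import re
import sys
from pathlib import Path
from typing import NamedTuple

SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+):(?P<pid>\d+)\]"
)


class Lockup(NamedTuple):
    cpu: int
    seconds: int
    task: str
    pid: int


def find_lockups(journal_text: str) -> list[Lockup]:
    """Return one Lockup per soft-lockup message, in log order."""
    return [
        Lockup(int(m["cpu"]), int(m["secs"]), m["task"], int(m["pid"]))
        for m in SOFT_LOCKUP.finditer(journal_text)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # First argument: a file holding saved journal output.
    for lockup in find_lockups(Path(sys.argv[1]).read_text()):
        print(f"CPU {lockup.cpu}: {lockup.seconds}s in {lockup.task} (PID {lockup.pid})")
```

    For example, save the previous boot's kernel messages with journalctl -k -b -1 > kernel.log and pass that file name as the first argument. On the excerpt above it finds CPU 7 stuck for 23 s in kworker/u32:1 (PID 11933), the ath9k reset worker, and CPU 3 stuck for 22 s in wpa_supplicant (PID 1349).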

  7. #7
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: CPU Hardware Error

    my $0.02:

    On my new AMD EPYC server and my new Ryzen home PC, I have:

    1. Modified /etc/default/grub and added acpi=off to the CMDLIN...
    then ran grub2-mkconfig -o /boot/grub2/grub.cfg

    2. Blacklisted the i2c_piix4 module (Intel chipsets) by creating the file /etc/modprobe.d/i2c_piix4.conf and adding "blacklist i2c_piix4" to it.
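    The first step above edits /etc/default/grub by hand. As a sketch of the same edit in Python, assuming the setting sits on a single line inside double quotes (as in Fedora's default file), the hypothetical helper `add_kernel_args` appends parameters to GRUB_CMDLINE_LINUX while skipping any that are already there:

```python
# Sketch: append kernel parameters to GRUB_CMDLINE_LINUX in the text of
# /etc/default/grub. Assumes a single-line, double-quoted setting;
# add_kernel_args is a hypothetical helper, not part of GRUB.
import re

CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")', re.MULTILINE)


def add_kernel_args(grub_defaults: str, *args: str) -> str:
    """Return the file text with each argument added once to GRUB_CMDLINE_LINUX."""

    def extend(match: re.Match) -> str:
        params = match.group(2).split()
        for arg in args:
            if arg not in params:  # skip parameters that are already present
                params.append(arg)
        return match.group(1) + " ".join(params) + match.group(3)

    # GRUB_CMDLINE_LINUX_DEFAULT is not matched because "=" must follow "LINUX".
    return CMDLINE.sub(extend, grub_defaults, count=1)


if __name__ == "__main__":
    before = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="rhgb quiet"\n'
    print(add_kernel_args(before, "nohpet"), end="")
```

    The new text still has to be written back to /etc/default/grub as root, followed by grub2-mkconfig -o /boot/grub2/grub.cfg as in step 1. On Fedora, grubby --update-kernel=ALL --args="nohpet" is an alternative that edits the installed boot entries directly.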

    hope this helps.

  8. #8
    Join Date
    Oct 2006
    Posts
    25

    Re: CPU Hardware Error

    Update to my previous post: I've now swapped out my PSU, and I'm still getting these hangs. I'm now experimenting with disabling the C6 C-state using ZenStates to see if that has any impact.
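    To check which idle states the kernel itself is using, independent of the BIOS setting, the standard cpuidle sysfs files can be read. A minimal Python sketch, assuming the usual /sys/devices/system/cpu/cpuN/cpuidle/stateM/ layout with name, latency and disable files; `idle_states` is a hypothetical name. These are the OS-visible idle states, which are not necessarily the same control ZenStates changes for core C6.

```python
# Sketch: list the CPU idle states the kernel exposes through sysfs.
# Assumes the standard cpuidle layout under /sys/devices/system/cpu.
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def _read(path: Path) -> str:
    """Read a one-line sysfs attribute without its trailing newline."""
    return path.read_text().strip()


def idle_states(cpu: int = 0, root: Path = CPU_ROOT) -> list[dict]:
    """Return name, exit latency (microseconds) and disabled flag per idle state."""
    base = root / f"cpu{cpu}" / "cpuidle"
    # Sort numerically so state10 would come after state9.
    state_dirs = sorted(base.glob("state*"), key=lambda p: int(p.name[len("state"):]))
    return [
        {
            "state": d.name,
            "name": _read(d / "name"),
            "latency_us": int(_read(d / "latency")),
            "disabled": _read(d / "disable") == "1",
        }
        for d in state_dirs
    ]


if __name__ == "__main__":
    for state in idle_states():
        print(state)
```

    Writing 1 to a state's disable file as root turns that state off until the next reboot, which makes for a quicker experiment than a BIOS change.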

  9. #9
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: CPU Hardware Error

    You may also consider disabling nf_conntrack. I ran into some other web chats where people were talking about it.
    The problem does seem to be network/Wi-Fi related.
    Also, you may consider adding "nohpet" to your GRUB boot config.
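    Before disabling nf_conntrack it is worth checking whether it is loaded and what uses it; the module list in post #6 includes conntrack-related modules such as xt_conntrack, xt_CT and nf_conntrack_netbios_ns. A small Python sketch that parses /proc/modules, whose fourth column lists the modules using each one ("-" if none); `loaded_modules` is a hypothetical name:

```python
# Sketch: check whether modules are loaded and what uses them, by parsing
# /proc/modules (name, size, use count, users or "-", state, address).
from pathlib import Path


def loaded_modules(text: str) -> dict[str, list[str]]:
    """Map each loaded module to the modules that currently use it."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a module line
        # The users column is comma-separated, often with a trailing comma.
        users = [] if fields[3] == "-" else [u for u in fields[3].split(",") if u]
        modules[fields[0]] = users
    return modules


if __name__ == "__main__":
    proc = Path("/proc/modules")
    mods = loaded_modules(proc.read_text()) if proc.exists() else {}
    # Module names taken from this thread.
    for name in ("nf_conntrack", "ath9k", "i2c_piix4"):
        status = f"used by {mods[name] or 'nothing'}" if name in mods else "not loaded"
        print(f"{name}: {status}")
```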

  10. #10
    Join Date
    Oct 2006
    Posts
    25

    Re: CPU Hardware Error

    Disabling the C6 C-state didn't work, nor did making the GRUB changes from comments 7 and 9. However, in my research I stumbled upon Kernel Bug 196683 (Random Soft Lockup on new Ryzen build), which implicates some combination of Ryzen CPUs and Atheros Wi-Fi chips. I've now swapped in a spare wireless card with a Broadcom chipset to see if that resolves the issue. Will report back later.

  11. #11
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: CPU Hardware Error

    Heya, I just spent 2 hours reading the full thread at that link you posted.
    Did you try the RCU settings too?
    Also, I saw that for some people C6 was still enabled even though they had disabled it in the BIOS, and they fixed that with the zenstates.py script.

  12. #12
    Join Date
    Oct 2006
    Posts
    25

    Re: CPU Hardware Error

    Quote Originally Posted by bobx001
    Heya, I just spent 2 hours reading the full thread at that link you posted.
    Did you try the RCU settings too?
    Also, I saw that for some people C6 was still enabled even though they had disabled it in the BIOS, and they fixed that with the zenstates.py script.
    I haven't tried the RCU settings yet; I will after my experiment with the Broadcom Wi-Fi card. Disabling C6, whether in the BIOS or via the zenstates.py script, did not solve the issue with my Atheros card.

  13. #13
    Join Date
    Oct 2006
    Posts
    25

    Re: CPU Hardware Error

    I'm now testing the RCU settings mentioned in the bug, because I hit the hang again yesterday. For reference, I have added rcu_nocbs=0-15 processor.max_cstate=5 to my default GRUB_CMDLINE_LINUX. So far so good, but I will report back later.
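    After rebooting, /proc/cmdline shows whether the new parameters actually reached the kernel. A minimal Python sketch that parses it and expands the rcu_nocbs CPU list; it handles only single CPUs and plain ranges, does not cope with quoted values containing spaces, and `parse_cmdline` and `expand_cpulist` are hypothetical names:

```python
# Sketch: confirm boot parameters such as rcu_nocbs and processor.max_cstate
# are active, and expand a kernel CPU list like "0-15".
# Handles single numbers and plain ranges only; quoted values are not supported.
import os
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {param: value}; bare flags map to ""."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def expand_cpulist(cpulist: str) -> set[int]:
    """Expand a CPU list such as "0-3,8" into {0, 1, 2, 3, 8}."""
    cpus = set()
    for part in cpulist.split(","):
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    params = parse_cmdline(proc.read_text()) if proc.exists() else {}
    nocbs = expand_cpulist(params["rcu_nocbs"]) if params.get("rcu_nocbs") else set()
    print("processor.max_cstate:", params.get("processor.max_cstate", "not set"))
    print(f"rcu_nocbs covers {len(nocbs)} of {os.cpu_count()} CPUs")
```

    A Ryzen 7 1800X has 8 cores and 16 threads, so rcu_nocbs=0-15 should cover all 16 logical CPUs.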
