Diagnosing hardware crashing problem
FedoraForum.org - Fedora Support Forums and Community
  1. #1

    Since I installed my new motherboard and processor:

    • ASUS Prime B350-Plus motherboard
    • AMD Ryzen 5 1600 processor (6 cores/12 threads)

    I have been having spontaneous reboots (maybe every other day).
    When I log in after one of these reboots, a dialog box on the
    desktop offers to send some information about the crash
    (presumably to the Fedora project), but when I try to start
    this procedure, I get a message saying that the crash was
    caused by a hardware problem.


    After each of these spontaneous crashes and reboots,
    I have been saving the following information:

    • /var/log/messages
    • /var/log/boot.log
    • output of dmesg
    • output of journalctl

    so that I can try to diagnose this issue.
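One way to make that collection routine, sketched in Python under a few assumptions: the journal is stored persistently (so `/var/log/journal` exists and `journalctl -b -1` can still reach the boot that crashed), the script runs as root, and the helper names (`collection_commands`, `collect`) and output file names are made up for illustration. Only the `journalctl` and `dmesg` options themselves are real, and anything the kernel did not get written to disk before the reset will be missing.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def collection_commands(boot_offset: int = -1) -> dict[str, list[str]]:
    """Map hypothetical output file names to the commands that produce them.

    "-b -1" selects the previous boot (the one that crashed); "-k" limits
    journalctl to kernel messages. Plain dmesg shows the *current* boot,
    which is where the mce lines logged early at start-up appear.
    """
    boot = str(boot_offset)
    return {
        "boots.txt": ["journalctl", "--list-boots", "--no-pager"],
        "journal-prev-boot.txt": ["journalctl", "-b", boot, "--no-pager"],
        "kernel-prev-boot.txt": ["journalctl", "-k", "-b", boot, "--no-pager"],
        "dmesg-this-boot.txt": ["dmesg"],
    }


def collect(out_dir: str, boot_offset: int = -1) -> None:
    """Run each command and save its output (stdout, then stderr) in out_dir."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name, argv in collection_commands(boot_offset).items():
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            output = result.stdout + result.stderr
        except FileNotFoundError:
            output = f"{argv[0]} not found\n"
        (target / name).write_text(output)
```

For example, `collect("crash-logs-1")` right after logging in again saves one directory per crash.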

    Now, when one of these crashes/reboots occurs, some messages about
    a "Hardware Error" are displayed in the left-hand corner of the
    screen during the reboot. These messages appear and disappear too
    quickly for me to write them down, but the dmesg output just after
    one of the crashes/reboots contains the following lines:

    Code:
      [    0.031021] mce: [Hardware Error]: Machine check events logged
      [    0.031023] mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 0: baa0000000060165
      [    0.031027] mce: [Hardware Error]: TSC 0 MISC d012000101000000 SYND 2d032500 IPID b000000000
      [    0.031031] mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1540830475 SOCKET 0 APIC 3 microcode 8001129

    The above lines _only_ occur in the dmesg output _after_ one of these
    spontaneous crash/reboots, _not_ when I reboot with the command

    shutdown -r now

    So now I wonder: are these lines in the dmesg output the key?
    Do they indicate that my processor is bad?
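A first, hedged look at the `Bank 0: baa0000000060165` value: in the x86 machine-check architecture, the top bits of each bank's status register mean the same thing on Intel and AMD CPUs (63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC), and the low 16 bits hold the MCA error code. The sketch below decodes only those architectural fields plus `TIME`, which is seconds since the Unix epoch. The AMD-specific details (which unit bank 0 is on Zen, the extended error code) need AMD's processor documentation and are deliberately left out; the function names are hypothetical.

```python
from datetime import datetime, timezone

# Architectural MCi_STATUS bits shared by Intel and AMD x86 CPUs.
STATUS_BITS = {
    63: "VAL (register contents valid)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_status(status: int) -> dict:
    """Return the set architectural flags and the low 16-bit MCA error code."""
    flags = [name for bit, name in STATUS_BITS.items() if (status >> bit) & 1]
    return {"flags": flags, "mca_error_code": status & 0xFFFF}


def decode_time(seconds: int) -> str:
    """The TIME field is wall-clock seconds since the Unix epoch."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    info = decode_status(0xBAA0000000060165)  # value from the dmesg line above
    for flag in info["flags"]:
        print(flag)
    print(f"MCA error code: {info['mca_error_code']:#06x}")
    print(decode_time(1540830475))
```

For this crash it reports VAL, UC, EN, MISCV and PCC (an uncorrected error that the CPU marked as having corrupted processor context, which is consistent with the machine resetting), error code 0x0165, logged at 2018-10-29 16:27:55 UTC. On its own, that says the error was fatal, not whether the cause is a defective CPU or something like power-state or firmware trouble.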


    Also:

    - In the information I have been saving after these crash/reboots,
    what else might I be checking for?

    - What other information should I possibly look at and/or save
    after one of these crash/reboots?

    - Do people have other ideas?


    Thank you very much for your help.
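On the question of what else to look for in the saved files: one hedged suggestion is to pull every `mce: [Hardware Error]` status line out of the saved dmesg dumps and check whether the same CPU and bank show up after every crash. As a rough heuristic, the same core every time tends to point toward that core, while errors spread over several cores tend to point toward something shared. The regular expression is written against the line format quoted above (plus its `Machine Check Exception` variant) and may need adjusting for other kernels; the function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from pathlib import Path

# Written against lines such as
# "mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 0: baa0000000060165"
MCE_RE = re.compile(
    r"mce: \[Hardware Error\]: CPU (?P<cpu>\d+): Machine Check(?: Exception)?: "
    r"[0-9a-f]+ Bank (?P<bank>\d+): (?P<status>[0-9a-f]+)"
)


def find_mce_events(text: str) -> list[tuple[int, int, int]]:
    """Return (cpu, bank, status) for every MCE status line in the text."""
    return [
        (int(m["cpu"]), int(m["bank"]), int(m["status"], 16))
        for m in MCE_RE.finditer(text)
    ]


def summarize(paths: list[str]) -> Counter:
    """Count how often each (cpu, bank) pair appears across the given log files."""
    counts: Counter = Counter()
    for path in paths:
        text = Path(path).read_text(errors="replace")
        counts.update((cpu, bank) for cpu, bank, _status in find_mce_events(text))
    return counts
```

Pass it only one kind of file per crash (for example only the dmesg dumps, `summarize(["dmesg-crash1.txt", "dmesg-crash2.txt"])` with hypothetical file names); otherwise the same event is counted twice.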

  2. #2

    My "inxi -Fxmz" output:

    Code:
    System:    Host: computer-room01 Kernel: 4.18.16-100.fc27.x86_64 x86_64 bits: 64 compiler: gcc v: 7.3.1 
               Desktop: Xfce 4.12.4 Distro: Fedora release 27 (Twenty Seven) 
    Machine:   Type: Desktop Mobo: ASUSTeK model: PRIME B350-PLUS v: Rev X.0x serial: <filter> 
               UEFI: American Megatrends v: 0902 date: 09/08/2017 
    Memory:    RAM: total: 15.66 GiB used: 3.03 GiB (19.4%) 
               Array-1: capacity: 64 GiB slots: 4 EC: None max module size: 16 GiB note: est. 
               Device-1: DIMM_A1 size: No Module Installed 
               Device-2: DIMM_A2 size: 8 GiB speed: 2133 MT/s type: DDR4 
               Device-3: DIMM_B1 size: No Module Installed 
               Device-4: DIMM_B2 size: 8 GiB speed: 2133 MT/s type: DDR4 
    CPU:       Topology: 6-Core model: AMD Ryzen 5 1600 bits: 64 type: MT MCP arch: Zen rev: 1 L2 cache: 3072 KiB 
               flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 76647 
               Speed: 1375 MHz min/max: 1550/3200 MHz Core speeds (MHz): 1: 1375 2: 1371 3: 1373 4: 1372 5: 1375 
               6: 1375 7: 1374 8: 1375 9: 1374 10: 1373 11: 1374 12: 1373 
    Graphics:  Device-1: NVIDIA GT218 [GeForce 210] driver: nouveau v: kernel bus ID: 22:00.0 
               Display: server: Fedora Project X.org 1.19.6 driver: modesetting unloaded: fbdev,vesa 
               resolution: 1920x1080~60Hz 
               OpenGL: renderer: NVA8 v: 3.3 Mesa 17.3.9 direct render: Yes 
    Audio:     Device-1: VIA ICE1712 [Envy24] PCI Multi-Channel I/O driver: snd_ice1712 v: kernel bus ID: 20:00.0 
               Device-2: NVIDIA High Definition Audio driver: snd_hda_intel v: kernel bus ID: 22:00.1 
               Device-3: Advanced Micro Devices [AMD] Family 17h HD Audio driver: snd_hda_intel v: kernel 
               bus ID: 24:00.3 
               Device-4: Creative Live! Cam Chat HD [VF0700] type: USB 
               driver: hid-generic,snd-usb-audio,usbhid,uvcvideo bus ID: 1-7:2 
               Sound Server: ALSA v: k4.18.16-100.fc27.x86_64 
    Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169 v: 2.3LK-NAPI 
               port: e000 bus ID: 1e:00.0 
               IF: enp30s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
    Drives:    Local Storage: total: 931.51 GiB used: 494.52 GiB (53.1%) 
               ID-1: /dev/sda vendor: Western Digital model: WD1002FAEX-007BA0 size: 931.51 GiB temp: 35 C 
    Partition: ID-1: / size: 72.83 GiB used: 12.55 GiB (17.2%) fs: ext4 dev: /dev/sda6 
               ID-2: /boot size: 3.60 GiB used: 192.5 MiB (5.2%) fs: ext4 dev: /dev/sda2 
               ID-3: swap-1 size: 29.80 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/sda7 
    Sensors:   System Temperatures: cpu: 28.4 C mobo: N/A gpu: nouveau temp: 58 C 
               Fan Speeds (RPM): cpu: 0 
    Info:      Processes: 297 Uptime: 1h 09m Init: systemd runlevel: 5 Compilers: gcc: 7.3.1 Shell: bash v: 4.4.23 
               inxi: 3.0.26

  3. #3

    Is your CPU fan running? Your printout shows a CPU fan speed of zero:
    Code:
      Sensors:   System Temperatures: cpu: 28.4 C mobo: N/A gpu: nouveau temp: 58 C
                 Fan Speeds (RPM): cpu: 0
    Leslie in Montreal


  4. #4

    Refer to this thread about C-states and see if the fixes work for you. Apart from that, it wouldn't do any harm to do the following, since F27 will reach end of life in roughly 30 days anyway.

    Update to the latest available firmware build for the motherboard, which will contain the newest AGESA code for the Ryzen CPU internals, and also try Fedora 29 with its newer builds of the hardware stack, drivers, kernel 4.19, firmware modules and the latest amd-ucode.

    PRIME B350-PLUS BIOS 4023 (version 4023, 2018/09/14, 8.01 MBytes)
    1. Improve system compatibility
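To confirm afterwards that a flash (and later a newer amd-ucode) actually took effect, here is a small sketch that assumes the standard Linux sysfs/procfs layout. The kernel exports the BIOS version and date in `/sys/class/dmi/id/bios_version` and `bios_date`, and each CPU's loaded microcode revision appears as a `microcode` line in `/proc/cpuinfo` (the MCE line in the first post shows `microcode 8001129`). The function names are hypothetical, and the paths are parameters so the functions can be tried on sample data.

```python
from __future__ import annotations

from pathlib import Path


def read_bios(dmi_dir: str = "/sys/class/dmi/id") -> dict[str, str]:
    """Return the BIOS version and date the kernel exports via DMI sysfs files."""
    base = Path(dmi_dir)
    return {
        name: (base / name).read_text().strip()
        for name in ("bios_version", "bios_date")
        if (base / name).exists()
    }


def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Collect the distinct 'microcode' values from /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions
```

inxi currently reports the firmware as `v: 0902 date: 09/08/2017`, so after flashing, `read_bios()` should show a newer version, and `microcode_revisions(open("/proc/cpuinfo").read())` can be compared before and after an update.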

    As for the zero fan speed reading, check what the UEFI firmware's fan control page reports and ignore inxi for now; it may not be able to detect the CPU fan properly. The temperature itself indicates that heat is not an issue: that's the normal operating window for that CPU. I have the same CPU model and chipset combination, although mine is on an MSI motherboard and I don't run Linux on it yet.
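To see whether the kernel exposes a CPU fan sensor at all, the hwmon sysfs interface lists each sensor chip under `/sys/class/hwmon/hwmon*/`, with a `name` file and fan speeds in `fan*_input` (RPM). The sketch below assumes that layout, and `fan_readings` is a hypothetical helper. If no `fan*_input` files show up, or the one that does reads 0 while the fan visibly spins, the sensor reading rather than the fan is the likely problem. That is an inference, though, and the UEFI fan page remains the check to trust.

```python
from __future__ import annotations

from pathlib import Path


def fan_readings(hwmon_root: str = "/sys/class/hwmon") -> dict[str, int]:
    """Map '<chip name>/<fanN_input>' to the RPM value each hwmon chip reports."""
    root = Path(hwmon_root)
    if not root.is_dir():
        return {}
    readings = {}
    for chip in sorted(root.glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for fan in sorted(chip.glob("fan*_input")):
            try:
                readings[f"{chip_name}/{fan.name}"] = int(fan.read_text().strip())
            except (OSError, ValueError):
                continue  # skip inputs the driver cannot read
    return readings
```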

  5. #5

    Thank you very much for your response. Sorry I was
    away for a while. antikythera writes:

    refer to this thread
    [https://forums.fedoraforum.org/showt...t=ryzen+crash]
    about c-state and see if the fixes work for you.

    I did not actually see any suggested fixes in that thread.
    Did I miss something?

    update to the latest available firmware build for the motherboard which will
    contain the newest agesa code for the ryzen cpu internals but also try using
    Fedora 29 with the newer builds of hardware stacks, drivers, kernel 4.19,
    firmware modules and the latest amd-ucode

    OK, I am making plans to upgrade to Fedora 29. As for the latest firmware
    for the motherboard, I found an update of the UEFI BIOS at ASUS, but
    nothing else.

    And what about the dmesg lines logged after a crash, which read:

    Code:
      [    0.031021] mce: [Hardware Error]: Machine check events logged
      [    0.031023] mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 0: baa0000000060165
      [    0.031027] mce: [Hardware Error]: TSC 0 MISC d012000101000000 SYND 2d032500 IPID b000000000
      [    0.031031] mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1540830475 SOCKET 0 APIC 3 microcode 8001129
    Aren't those an indication of a faulty processor?

    Thanks again.

  6. #6

    Quote: "Did I miss something?"
    Set the global C-state to disabled instead of auto.

    Quote: "I found an update of the UEFI bios"
    Yes, that is what I am referring to as the firmware build.

  7. #7

    Hey man,

    I have a very similar combination:
    Code:
    System:    Host: nova Kernel: 4.18.16-200.fc28.x86_64 x86_64 bits: 64 compiler: gcc v: 8.2.1 
               Desktop: Xfce 4.12.4 Distro: Fedora release 28 (Twenty Eight) 
    Machine:   Type: Desktop Mobo: ASUSTeK model: PRIME B350M-A v: Rev X.0x serial: <filter> 
               UEFI [Legacy]: American Megatrends v: 4014 date: 05/11/2018 
    Memory:    RAM: total: 31.41 GiB used: 5.34 GiB (17.0%) 
               Array-1: capacity: 256 GiB note: check slots: 4 EC: None max module size: 64 GiB note: est. 
               Device-1: DIMM_A1 size: 8 GiB speed: 2133 MT/s type: DDR4 
               Device-2: DIMM_A2 size: 8 GiB speed: 2133 MT/s type: DDR4 
               Device-3: DIMM_B1 size: 8 GiB speed: 2133 MT/s type: DDR4 
               Device-4: DIMM_B2 size: 8 GiB speed: 2133 MT/s type: DDR4 
    CPU:       Topology: 6-Core model: AMD Ryzen 5 1600X bits: 64 type: MT MCP arch: Zen rev: 1 L2 cache: 3072 KiB 
               flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 86237 
               Speed: 2196 MHz min/max: 2200/3600 MHz Core speeds (MHz): 1: 2195 2: 2189 3: 3180 4: 3192 5: 2195 
               6: 2194 7: 2196 8: 2196 9: 2128 10: 2129 11: 2591 12: 2137 
    Graphics:  Device-1: NVIDIA GP106 [GeForce GTX 1060 6GB] driver: nvidia v: 410.73 bus ID: 09:00.0 
               Display: server: Fedora Project X.org 1.19.6 driver: nvidia 
               resolution: 1920x1080~60Hz, 2560x1440~60Hz, 2560x1440~60Hz 
               OpenGL: renderer: GeForce GTX 1060 6GB/PCIe/SSE2 v: 4.6.0 NVIDIA 410.73 direct render: Yes 
    Audio:     Device-1: NVIDIA GP106 High Definition Audio driver: snd_hda_intel v: kernel bus ID: 09:00.1 
               Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio driver: snd_hda_intel v: kernel 
               bus ID: 0b:00.3 
               Sound Server: ALSA v: k4.18.16-200.fc28.x86_64 
    Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169 v: 2.3LK-NAPI 
               port: f000 bus ID: 07:00.0 
               IF: enp7s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
               IF-ID-1: bridge0 state: up speed: N/A duplex: N/A mac: <filter> 
    Drives:    Local Storage: total: 13.42 TiB used: 7.98 TiB (59.5%) 
               ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 960 EVO 250GB size: 232.89 GiB 
               ID-2: /dev/sda vendor: SanDisk model: Ultra II 500GB size: 465.76 GiB temp: 34 C 
               ID-3: /dev/sdb vendor: Seagate model: ST2000LM015-2E8174 size: 1.82 TiB temp: 31 C 
               ID-4: /dev/sdc vendor: Seagate model: ST8000VN0022-2EL112 size: 7.28 TiB 
               ID-5: /dev/sdd vendor: Seagate model: ST4000DX001-1CE168 size: 3.64 TiB temp: 37 C 
               ID-6: /dev/sdf type: USB vendor: Transcend model: JetFlash TS2GJFT3 size: 1.91 GiB 
    Partition: ID-1: / size: 56.46 GiB used: 19.82 GiB (35.1%) fs: ext4 dev: /dev/nvme0n1p2 
               ID-2: /home size: 3.48 TiB used: 3.11 TiB (89.4%) fs: ext4 dev: /dev/dm-7 
               ID-3: swap-1 size: 32.23 GiB used: 10.5 MiB (0.0%) fs: swap dev: /dev/nvme0n1p1 
    Sensors:   System Temperatures: cpu: 39.0 C mobo: N/A gpu: nvidia temp: 47 C 
               Fan Speeds (RPM): cpu: 0 gpu: nvidia fan: 27% 
    Info:      Processes: 502 Uptime: 4d 11h 27m Init: systemd runlevel: 5 Compilers: gcc: 8.2.1 Shell: bash 
               v: 4.4.23 inxi: 3.0.26
    The fan speed also reads 0, but since I have a large glass case (Corsair 570X), I can clearly see that the fan is working fine.
    My box is super stable with the following setup:
    1. I boot legacy, not UEFI.
    2. My /etc/default/grub says: GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 rhgb quiet selinux=0 nmi_watchdog=0 nohpet pci=biosirq rcu_nocbs=0-11"
    (I use NVIDIA, as seen in the inxi output above.)
    3. I will post my BIOS settings if you need them, but I know that virtualization is enabled, my power profile is set to "Standard/Typical", and I don't do any overclocking.
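Before copying those kernel parameters, it may help to check what the running kernel actually booted with (`/proc/cmdline`) and what a CPU-list value such as `rcu_nocbs=0-11` covers. Below is a minimal sketch. `parse_cmdline` and `expand_cpulist` are hypothetical helpers, and the expander handles only comma-separated numbers and `a-b` ranges, not every form the kernel's cpulist parser accepts.

```python
from __future__ import annotations

import shlex


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {param: value}; bare flags map to ''."""
    params = {}
    for token in shlex.split(cmdline):
        key, _, value = token.partition("=")
        params[key] = value
    return params


def expand_cpulist(cpulist: str) -> list[int]:
    """Expand a cpulist such as '0-3,8,10-11' into a sorted list of CPU numbers."""
    cpus = set()
    for part in cpulist.split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)
```

`expand_cpulist("0-11")` gives CPUs 0 through 11, i.e. all 12 threads of a Ryzen 5 1600/1600X. Edits to `/etc/default/grub` only show up in `/proc/cmdline` after the GRUB configuration has been regenerated (with `grub2-mkconfig` on Fedora) and the machine has been rebooted.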

    If you like reading: I have participated in a long-running discussion about Ryzen lockups: https://bugzilla.kernel.org/show_bug.cgi?id=196683

    Basically, what it all boils down to is this: when the CPU is idle, the power (actually the voltage) it "wants" drops so low that it starves the processor, and the processor stalls.
    A good "C6-capable" power supply is definitely recommended, and disabling the C6 state with a small program called "c6state.py" basically tells the processor never to go into its idle ("coma") mode.

    I do have a good power supply, which came with my Corsair case, but I haven't done anything with the C6 states, i.e. I have not modified anything in that respect. And my box is very stable: the last uptime was about 70 days, and I only rebooted after a dnf upgrade.
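For a read-only look at which idle states Linux itself is using, the cpuidle sysfs interface has one `state*` directory per idle state under `/sys/devices/system/cpu/cpuN/cpuidle/`, each with `name`, `disable` and `usage` files. The sketch below only lists them, and `idle_states` is a hypothetical helper. How those names map onto the hardware C6 state, and how the firmware's global C-state setting or c6state.py show up there, depends on the driver and firmware, so treat any such mapping as an assumption.

```python
from __future__ import annotations

from pathlib import Path


def _read(path: Path) -> str:
    """Return the stripped contents of a sysfs file, or '' if it is missing."""
    return path.read_text().strip() if path.exists() else ""


def idle_states(cpu_dir: str = "/sys/devices/system/cpu/cpu0") -> list[dict]:
    """List one CPU's cpuidle states with name, disabled flag and usage count."""
    base = Path(cpu_dir) / "cpuidle"
    if not base.is_dir():
        return []  # no cpuidle driver registered, or not a Linux sysfs tree
    states = []
    for state in sorted(base.glob("state*"), key=lambda p: int(p.name[len("state"):])):
        states.append({
            "state": state.name,
            "name": _read(state / "name"),
            "disabled": _read(state / "disable") == "1",
            "usage": int(_read(state / "usage") or 0),
        })
    return states
```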

    EDIT: Since this is an MCE (Machine Check Exception) issue, I would definitely recommend updating your motherboard BIOS first. I think you can do this directly from the BIOS screens using the EZ Flash utility, which is part of the BIOS; you just need to configure your network, and voilà. Then go into the BIOS and make sure all your DRAM timing options are set to "default", "optimal" or "auto".
    EDIT #2: Move your RAM to the other slots: not A2/B2 as the motherboard manual says, but A1/B1.
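To double-check which slots the firmware sees as populated after moving the modules, `dmidecode --type 17` (run as root) prints one `Memory Device` section per slot with `Size:` and `Locator:` lines, and empty slots read `Size: No Module Installed`. Below is a rough parser sketch; `populated_slots` is a hypothetical helper. The exact locator strings depend on the board's firmware (inxi shows DIMM_A1 through DIMM_B2 on this board).

```python
def populated_slots(dmidecode_text: str) -> dict:
    """Map each slot's Locator to its Size, from `dmidecode --type 17` output."""
    slots = {}
    # Everything after each "Memory Device" header belongs to one slot.
    for section in dmidecode_text.split("Memory Device")[1:]:
        fields = {}
        for line in section.splitlines():
            key, sep, value = line.strip().partition(":")
            if sep:
                fields.setdefault(key, value.strip())  # "Bank Locator" stays separate
        if "Locator" in fields:
            slots[fields["Locator"]] = fields.get("Size", "unknown")
    return slots
```

For example: `populated_slots(subprocess.run(["dmidecode", "--type", "17"], capture_output=True, text=True).stdout)`.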
    Last edited by bobx001.

  8. #8

    Quote Originally Posted by antikythera
    set the global c-state to disabled instead of auto
    Thanks!
