[SOLVED] first server error,reboot , what is this UUID ?
FedoraForum.org - Fedora Support Forums and Community
  1. #1
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    first server error,reboot , what is this UUID ?

    Hello guys,

    I just got my first crash/reboot on the large new server while doing extensive testing.
    Below is the dmesg output after the reboot. The question is: what does this UUID, dc3ea0b0-a144-4797-b95b-53fa242b6e1d, mean? It's not in the blkid list. Could it be a DIMM?

    Code:
    Mar 10 15:11:51 boa kernel: [Hardware Error]: event severity: info
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  Error 0, type: fatal
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  fru_text: ProcessorError
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section length: 0xd0
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000000: 00000007 00000000 00000011 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000010: 00800f12 00000000 11200800 00000000  .......... .....
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000090: c0f2e150 0000555d 00000000 00000000  P...]U..........
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  Error 1, type: fatal
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  fru_text: ProcessorError
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section length: 0xd0
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000000: 00000007 00000000 00000019 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000010: 00800f12 00000000 19200800 00000000  .......... .....
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000090: b28a1b47 0101ffff 00000000 00000000  G...............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  Error 2, type: fatal
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  fru_text: ProcessorError
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section length: 0xd0
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000000: 00000007 00000000 00000020 00000000  ........ .......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000010: 00800f12 00000000 20200800 00000000  ..........  ....
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000040: 00000001 00000000 00980027 00000000  ........'.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000080: 00000016 00000000 0000080b faa00000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff7  ...]............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  Error 3, type: fatal
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  fru_text: ProcessorError
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section length: 0xd0
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000000: 00000007 00000000 00000030 00000000  ........0.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000010: 00800f12 00000000 30200800 00000000  .......... 0....
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000040: 00000001 00000000 00180007 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000080: 00000016 00000000 0000080b baa00000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff6  ...]............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  Error 4, type: fatal
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  fru_text: ProcessorError
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section length: 0xd0
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000000: 00000007 00000000 00000040 00000000  ........@.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000010: 00800f12 00000000 40200800 00000000  .......... @....
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000040: 00000001 00000000 00980027 00000000  ........'.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000080: 00000016 00000000 0000080b faa00000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff7  ...]............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  .
    .
    .
    .
    Mar 10 15:11:51 boa kernel:  Magic number: 10:937:183
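
    This value does not appear to be a disk or DIMM identifier. Going by the Common Platform Error Record (CPER) appendix of the UEFI specification, it appears to be the section-type GUID for an IA32/x64 processor error, which would fit the fru_text: ProcessorError lines; the kernel prints "unknown" when it has no decoder for a section type. A minimal Python sketch, assuming a hand-copied two-entry table (KNOWN_SECTION_TYPES and section_types are names made up for illustration), that pulls the section types out of a saved log:

```python
import re

# Hand-copied from the UEFI CPER appendix; treat this table as an assumption
# and check it against the spec before relying on it.
KNOWN_SECTION_TYPES = {
    "dc3ea0b0-a144-4797-b95b-53fa242b6e1d": "IA32/x64 processor error",
    "a5bc1114-6f64-4ede-b863-3e83ed7c83b1": "platform memory error",
}

# Matches lines such as "section type: unknown, <guid>".
SECTION_RE = re.compile(r"section type: ([^,\n]+), ([0-9a-fA-F-]{36})")


def section_types(log_text):
    """Return (kernel_label, guid, table_name) for each 'section type' line."""
    found = []
    for label, guid in SECTION_RE.findall(log_text):
        guid = guid.lower()
        name = KNOWN_SECTION_TYPES.get(guid, "not in table")
        found.append((label.strip(), guid, name))
    return found


sample = (
    "Mar 10 15:11:51 boa kernel: [Hardware Error]:   section type: "
    "unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d\n"
)
for label, guid, name in section_types(sample):
    print(f"{guid}: kernel says {label!r}, table says {name!r}")
```

    A GUID that is missing from the table is reported as "not in table" rather than guessed at.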
    "monsters John ... monsters from the ID..."
    "ma vule teva maar gul nol naya"

  2. #2
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: first server error,reboot , what is this UUID ?

    Well, the only "relatively similar" errors I could find on the net are mostly related to UEFI and ACPI.

    I have decided to add acpi=off to the GRUB boot string. We shall see how that fares.

    I know my bro had a hell of a time with UEFI getting the machine to boot from one of the SSDs. The system kept offering only the NVMe drives or one of the connected SATA hard drives for boot. He finally got it to work, but unfortunately we shall never know how he did it, lol. I guess I'll get a NOC tech to take photos of all the BIOS settings and send them to me.


  3. #3
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: first server error,reboot , what is this UUID ?

    BTW, memtest86 won't run with this UEFI crap, so I am running memtester from the booted FC27 at the moment. I allocated 125GB of RAM to it and am doing a single pass. It has been running for about 12 hours so far, with no errors.

    EDIT: I have now launched 10 memtesters with 12000MB each and one with 7000MB, leaving 1GB of RAM for the OS etc.
    This seems to be much quicker.
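
    A rough sketch of how that split can be computed, assuming 128000 MB in total and 1000 MB held back for the OS (both figures are a reading of the numbers above, and plan_memtesters is a made-up helper). It only prints memtester command lines; it does not launch anything:

```python
def plan_memtesters(total_mb, reserve_mb, chunk_mb):
    """Split the testable RAM into chunk_mb pieces plus one remainder piece."""
    usable = total_mb - reserve_mb
    if usable <= 0 or chunk_mb <= 0:
        raise ValueError("nothing left to test")
    full, rest = divmod(usable, chunk_mb)
    sizes = [chunk_mb] * full
    if rest:
        sizes.append(rest)
    return sizes


# Assumed figures: 128000 MB installed, 1000 MB left for the OS.
sizes = plan_memtesters(total_mb=128000, reserve_mb=1000, chunk_mb=12000)
for size in sizes:
    # memtester takes the amount of memory (M suffix = megabytes) and a loop count.
    print(f"memtester {size}M 1 &")
```

    With these numbers the plan is ten 12000 MB instances and one 7000 MB instance, matching the run described above.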
    [Attachment: memtesters.png]


    UPDATE: the memtesters finished without error, so I am inclined to think it was ACPI or the NVMe drives set up in RAID mode.
    [Attachment: memtesters_success.png]
    Last edited by bobx001; 11th March 2018 at 01:42 PM.

  4. #4
    Join Date
    Oct 2006
    Location
    CN99CF Agassiz BC Canada
    Posts
    397

    Re: first server error,reboot , what is this UUID ?

    Perhaps 'sudo updatedb' followed by 'locate dc3ea0b0-a144-4797-b95b-53fa242b6e1d'. This should give some idea of what type of device it is, based on its location in the file tree.
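
    A narrower check than a full locate database, sketched in Python: walk a directory tree (for example /dev/disk or /sys) and report any entry whose name contains the UUID. The function name find_by_name is made up, and a miss only means the UUID is not a file or device name under that tree:

```python
import os


def find_by_name(root, needle):
    """Return sorted paths under root whose file or directory name contains needle."""
    needle = needle.lower()
    hits = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            if needle in name.lower():
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


uuid = "dc3ea0b0-a144-4797-b95b-53fa242b6e1d"
# /dev/disk/by-uuid holds one symlink per filesystem UUID on most Linux systems.
for path in find_by_name("/dev/disk", uuid):
    print(path)
```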
    -----
    f26 x86_64 Acer Predator G5910 Quad core Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

  5. #5
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: first server error,reboot , what is this UUID ?

    Cheers bro, but no luck. However, I have been through many reboots since the error.
    Code:
    [root@boa NVME1]# updatedb
    [root@boa NVME1]# locate dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    zilch
    If this happens again, I will do precisely that on the next boot of the box and report here.

  6. #6
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    acpi=off , very interesting performance effect

    Hello guys,

    Still hunting for that Hardware Error, which happened once again (the server was running 2 VMs on QEMU, FC27 32-bit and Solaris 10, plus postgres on the server itself). This one:
    https://forums.fedoraforum.org/showt...t-is-this-UUID
    (with the same UUID! I tried looking for it in /dev and /sys, to no avail)

    So I turned off ACPI and also HPET, with acpi=off and nohpet, and interestingly, instead of 64 "cpu siblings", it went down to 32.

    The funny thing is that there is a batch postgres proggy which calculates some heavy stuff, and with acpi=off it ran in 22 minutes. As soon as I took acpi=off out of grub.cfg again to get back my 64 "cpus", the proggy took 28 minutes to run.
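
    When toggling boot options like this, it helps to record what the kernel actually booted with next to each timing. A small sketch, assuming the standard /proc/cmdline file; parse_cmdline and boot_summary are made-up helpers, and quoted values containing spaces are not handled:

```python
import os


def parse_cmdline(text):
    """Turn a kernel command line into a dict; bare flags map to True."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def boot_summary(cmdline_text, cpu_count):
    """One line with the acpi/nohpet settings and the visible CPU count."""
    p = parse_cmdline(cmdline_text)
    return f"acpi={p.get('acpi', 'default')} nohpet={'nohpet' in p} cpus={cpu_count}"


try:
    with open("/proc/cmdline") as fh:
        print(boot_summary(fh.read(), os.cpu_count()))
except OSError:
    print("no /proc/cmdline on this system")
```

    Logging this line with each benchmark run makes it clear whether a 22-minute or 28-minute result came from the acpi=off boot.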

    nohpet is still there, and for the moment I have turned off both QEMU guests to see if maybe that is what causes the crashing (maybe the Solaris one?).

    Anyway, it is interesting that with acpi=off and half the cores gone, postgres runs quicker.

  7. #7
    Join Date
    Dec 2013
    Location
    United Kingdom
    Posts
    6,277

    Re: first server error,reboot , what is this UUID ?

    Threads merged. Please don't open more than one thread at a time about the same issue.

  8. #8
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: first server error,reboot , what is this UUID ?

    Zorry guys, but I thought that since my latest post was just about acpi affecting postgres, it was a different issue.

    Anyway, an update on this elusive error: I can now say that it only happens when KVM/QEMU is running (I had 2 guests, an FC27 32-bit and a Solaris 10). I will try launching just one of them to see if I can identify the culprit (willing to bet it's a Solaris+QEMU issue).

    Supermicro replied to my query about this and told me that if I do not see any hardware error in the IPMI logs, then it must be linked to the OS running on the box (and since FC27 is not certified, it may be that).

    I shall update on the next step in the investigation.

  9. #9
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: first server error,reboot , what is this UUID ?

    Getting closer to the source of the problem.

    One thing I noticed today: I have an old application which needs to run on 32 bits, so I created a QEMU guest with FC27 32-bit, and voilà, it would not start. Actually, the guest would just freeze. The funny thing is that when I do exactly the same on my Intel-based laptop, it runs!
    So then I went into the guest machine settings, and instead of the Opteron G3 (default) CPU I chose a Nehalem, and voilà... the app runs! Go figure...
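
    For anyone launching QEMU by hand instead of through the guest settings dialog, the CPU model is chosen with the -cpu option. A hedged sketch that only builds the argument list: the disk image path and memory size are placeholders, qemu_args is a made-up helper, and Opteron_G3 and Nehalem are the QEMU model names as far as I know them:

```python
import shlex


def qemu_args(cpu_model, disk_image, mem_mb=4096):
    """Build a qemu-system-x86_64 command line for a KVM guest."""
    return [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-cpu", cpu_model,
        "-m", str(mem_mb),
        "-drive", f"file={disk_image},format=qcow2",
    ]


# Hypothetical image path; compare the default model with the one that worked.
for model in ("Opteron_G3", "Nehalem"):
    print(shlex.join(qemu_args(model, "/var/lib/libvirt/images/fc27-32.qcow2")))
```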

    Which leads me to the following thought:
    I am going to test the Solaris 10 guest now with different CPU configs to see if the server crashes again, and hey, if I can get it to run for at least a few days, we may have something there.

  10. #10
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: first server error,reboot , what is this UUID ?

    Quick update on this: it is not the Solaris KVM/QEMU guest. That guest was not running, but the problem happened again, with the exact same error, under a heavily loaded FC27 32-bit KVM/QEMU guest + postgres + rsync.

    At this point, I will test the server with CentOS 7.4, which is at least certified by Supermicro, and see what mileage I get.

  11. #11
    Join Date
    May 2018
    Location
    Tokyo
    Posts
    1

    Re: first server error,reboot , what is this UUID ?

    Have you had any luck with centos?

  12. #12
    Join Date
    Jun 2018
    Location
    greece
    Posts
    4

    Re: first server error,reboot , what is this UUID ?

    I have encountered the same error on 2 Epyc servers with Supermicro motherboards running VMware Workstation.
    The motherboard model is H11SSL-i, the CPU is an Epyc 7301, and the OS is installed on an NVMe M.2 drive, booting through UEFI.

    Both machines run Ubuntu 18.04, and the error/reboot occurred while running Windows 7 64-bit VMs in VMware Workstation.

    Have you had any luck in solving the problem?
    I am planning to test the VM on another Epyc server with a different CPU and motherboard (also Supermicro), running CentOS 7.4.

    Here is the error log from one of the machines, which is almost identical to your log:
    Code:
    [    1.385678] [Hardware Error]:  Error 0, type: fatal
    [    1.385767] [Hardware Error]:  fru_text: ProcessorError
    [    1.385858] [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    [    1.385985] [Hardware Error]:   section length: 0xd0
    [    1.386076] [Hardware Error]:   00000000: 00000007 00000000 00000000 00000000  ................
    [    1.386206] [Hardware Error]:   00000010: 00800f12 00000000 00200800 00000000  .......... .....
    [    1.386336] [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    [    1.387728] [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
    [    1.387857] [Hardware Error]:   00000040: 00000001 00000000 00180007 00000000  ................
    [    1.387987] [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    [    1.388139] [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    [    1.388270] [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
    [    1.388400] [Hardware Error]:   00000080: 00000016 00000000 0000080b baa00000  ................
    [    1.388529] [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
    [    1.388658] [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
    [    1.388787] [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff6  ...]............
    [    1.388917] [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  ................
    [    1.389045] [Hardware Error]:  Error 1, type: fatal
    [    1.389134] [Hardware Error]:  fru_text: ProcessorError
    [    1.389224] [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    [    1.389351] [Hardware Error]:   section length: 0xd0
    [    1.389441] [Hardware Error]:   00000000: 00000007 00000000 00000010 00000000  ................
    [    1.389570] [Hardware Error]:   00000010: 00800f12 00000000 10200800 00000000  .......... .....
    [    1.389699] [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    [    1.389829] [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
    [    1.389958] [Hardware Error]:   00000040: 00000001 00000000 00980027 00000000  ........'.......
    [    1.390087] [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    [    1.390216] [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    [    1.390346] [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
    [    1.390475] [Hardware Error]:   00000080: 00000016 00000000 0000080b faa00000  ................
    [    1.390604] [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
    [    1.390733] [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
    [    1.390862] [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff7  ...]............
    [    1.390992] [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  ................
    [    1.391120] [Hardware Error]:  Error 2, type: fatal
    [    1.391209] [Hardware Error]:  fru_text: ProcessorError
    [    1.391299] [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    [    1.391426] [Hardware Error]:   section length: 0xd0
    [    1.391515] [Hardware Error]:   00000000: 00000007 00000000 00000013 00000000  ................
    [    1.391645] [Hardware Error]:   00000010: 00800f12 00000000 13200800 00000000  .......... .....
    [    1.391774] [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    [    1.391903] [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
    [    1.392066] [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
    [    1.392195] [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    [    1.392324] [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    [    1.392454] [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
    [    1.392583] [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
    [    1.392712] [Hardware Error]:   00000090: 578e6f12 00007ff5 00000000 00000000  .o.W............
    [    1.392841] [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
    [    1.392971] [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
    [    1.393100] [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
    [    1.393229] [Hardware Error]:  Error 3, type: fatal
    [    1.393317] [Hardware Error]:  fru_text: ProcessorError
    [    1.393407] [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    [    1.393534] [Hardware Error]:   section length: 0xd0
    [    1.393624] [Hardware Error]:   00000000: 00000007 00000000 00000020 00000000  ........ .......
    [    1.393753] [Hardware Error]:   00000010: 00800f12 00000000 20200800 00000000  ..........  ....
    [    1.393882] [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    [    1.394011] [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
    [    1.394141] [Hardware Error]:   00000040: 00000001 00000000 00980027 00000000  ........'.......
    [    1.394270] [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    [    1.394399] [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    [    1.394528] [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
    [    1.394657] [Hardware Error]:   00000080: 00000016 00000000 0000080b faa00000  ................
    [    1.394786] [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
    [    1.394916] [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
    [    1.395045] [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff7  ...]............
    [    1.395174] [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  ................
    [    1.395303] [Hardware Error]:  Error 4, type: fatal
    [    1.395391] [Hardware Error]:  fru_text: ProcessorError
    [    1.395481] [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    [    1.395609] [Hardware Error]:   section length: 0xd0
    [    1.395698] [Hardware Error]:   00000000: 00000007 00000000 00000028 00000000  ........(.......
    [    1.395827] [Hardware Error]:   00000010: 00800f12 00000000 28200800 00000000  .......... (....
    [    1.395956] [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    [    1.396108] [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
    [    1.396237] [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
    [    1.396367] [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    [    1.396496] [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    [    1.396625] [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
    [    1.396754] [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
    [    1.396884] [Hardware Error]:   00000090: 9c71f647 0101ffff 00000000 00000000  G.q.............
    [    1.397013] [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
    [    1.397142] [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
    [    1.397271] [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
    [    1.397400] [Hardware Error]:  Error 5, type: fatal
    [    1.397488] [Hardware Error]:  fru_text: ProcessorError
    [    1.397578] [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    [    1.397706] [Hardware Error]:   section length: 0xd0
    [    1.397795] [Hardware Error]:   00000000: 00000007 00000000 00000032 00000000  ........2.......
    [    1.397924] [Hardware Error]:   00000010: 00800f12 00000000 32200800 00000000  .......... 2....
    [    1.398054] [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    [    1.398183] [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
    [    1.398312] [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
    [    1.398441] [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    [    1.398570] [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    [    1.398700] [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
    [    1.398829] [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
    [    1.398958] [Hardware Error]:   00000090: 9cf951eb 0101ffff 00000000 00000000  .Q..............
    [    1.399087] [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
    [    1.399217] [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
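
    To compare dumps like these without eyeballing them, the rows can be turned back into bytes. A sketch under two assumptions: each 8-digit group is a 32-bit little-endian word (the ASCII column agrees, since 76d8320b is shown as ".2.v"), and the word at offset 0x10 is the CPUID leaf 1 EAX value, as in the UEFI layout for the IA32/x64 processor error section. The helper names are made up:

```python
import re
import struct

# One dump row: an 8-digit offset, a colon, then up to four 8-digit words.
ROW_RE = re.compile(r"\b([0-9a-f]{8}):((?: [0-9a-f]{8}){1,4})")


def dump_to_bytes(log_text):
    """Rebuild raw bytes from 'offset: w0 w1 w2 w3' rows of one section."""
    data = bytearray()
    for offset, words in ROW_RE.findall(log_text):
        if int(offset, 16) != len(data):
            raise ValueError(f"row {offset} out of sequence")
        for word in words.split():
            data += struct.pack("<I", int(word, 16))
    return bytes(data)


def cpu_family_model_stepping(eax):
    """Decode a CPUID leaf 1 EAX signature (standard x86 bit layout)."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    if family in (0x6, 0xF):
        model |= ((eax >> 16) & 0xF) << 4  # extended model
    if family == 0xF:
        family += (eax >> 20) & 0xFF  # extended family
    return family, model, stepping


rows = """
[Hardware Error]:   00000000: 00000007 00000000 00000000 00000000  ................
[Hardware Error]:   00000010: 00800f12 00000000 00200800 00000000  .......... .....
"""
raw = dump_to_bytes(rows)
(eax,) = struct.unpack_from("<I", raw, 0x10)
print("family 0x%x model 0x%x stepping %d" % cpu_family_model_stepping(eax))
```

    For the rows above this prints family 0x17 model 0x1 stepping 2, which is at least consistent with an AMD family 17h part such as the Epyc 7301 mentioned in this post, provided the offset assumption holds.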

  13. #13
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: first server error,reboot , what is this UUID ?

    Quote Originally Posted by nterzeni
    I have encountered the same error on 2 Epyc servers with Supermicro motherboards running VMware workstation.
    The motherboard model is H11SSL-i, CPU Epyc 7301 and the O/S is installed on an Nvme M.2. Booting through UEFI.

    Both of the machines are based on Ubuntu 18.04 and the error/reboot was encountered during execution of Windows 7 64bit VMs on the VMWare Workstation.

    Have you had any luck in solving the problem ?
    I am planning to test the VM on another Epyc server with different CPU & mobo (also Supermicro), based on Centos 7.4

    Here is the error log from one of the machines, which is almost identical to your log:
    Code:
    [    1.385678] [Hardware Error]:  Error 0, type: fatal
    [    1.385767] [Hardware Error]:  fru_text: ProcessorError
    [    1.385858] [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    [    1.385985] [Hardware Error]:   section length: 0xd0
    .
    .
    .
    Hey man,

    try this first:
    ===============
    Hello Robert,

    It looks like you want to enable the virtualization feature.
    Please go into BIOS,
    Advanced -> NB Configuration -> IOMMU (change to Enabled).
    Advanced -> PCIe/PCI/PnP Configuration -> SR-IOV Support (change to Enabled).

    Hope this takes care of your issue.
    ==================
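
    After changing those BIOS settings, one way to confirm from Linux that the IOMMU is actually active is to look for IOMMU groups in sysfs. A minimal sketch, assuming the usual /sys/kernel/iommu_groups location; list_iommu_groups is a made-up helper, and an empty result can also mean the kernel itself was booted without IOMMU support enabled:

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group name to the device names inside it."""
    groups = {}
    if not os.path.isdir(root):
        return groups
    for group in sorted(os.listdir(root)):
        devices_dir = os.path.join(root, group, "devices")
        if os.path.isdir(devices_dir):
            groups[group] = sorted(os.listdir(devices_dir))
    return groups


groups = list_iommu_groups()
if groups:
    print(f"IOMMU active: {len(groups)} groups")
else:
    print("no IOMMU groups found")
```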

    If that takes care of it, post here right away, so that I can migrate off CentOS 7 and back to Fedora, hopefully 28.
    This little fix, given to me by a Supermicro dude (a hero), made my CentOS 7 server WORK PERFECTLY WELL, without random reboots, and with much better performance.

    So I wonder if it's the same issue.

    I hope you have IPMI access (IPMIView, available from Supermicro, works wonders for me).

    cheersola.
    Last edited by bobx001; 3rd June 2018 at 02:01 PM.

  14. #14
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    918

    Re: first server error,reboot , what is this UUID ?

    Quote Originally Posted by yunta
    Have you had any luck with centos?
    Check out my reply to nterzeni with the "fixes" given to me by Supermicro. ONLY after applying them does the server work like a charm on CentOS 7 (before applying them it was rebooting randomly). I am waiting to see what nterzeni says about his Ubuntu machines; if it works there, I am immediately moving everything to FC28 (so I have the latest kernel and the latest KVM, which allows for shared folders instead of NFS), although it's not "certified".
    Last edited by bobx001; 3rd June 2018 at 03:05 PM.

  15. #15
    Join Date
    Jun 2004
    Location
    Maryland, US
    Posts
    7,347

    Re: first server error,reboot , what is this UUID ?

    Quote Originally Posted by bobx001
    check out my reply to nterzeni, with the "fixes" given to me by Supermicro, ONLY after applying them the server works like a charm on Centos7 (before applying them it was rebooting randomly). Waiting to see what nterzeni says about his Ubuntus, and if it works, then I am immediately changing it all to FC28 (so I have the latest Kernel, and the latest KVM which allows for Shared Folders, .... instead of NFS ...), although it's not "certified".
    You can run the latest and greatest kernels on CentOS; there's a special kernel repo. I think it's 'elrepo', and you install the kernel-ml package. In a lot of cases I've found that the CentOS kernel you get that way is newer than even the Fedora one (maybe they have more testing resources, so they move to it sooner?).
    REF:
    https://linuxhint.com/upgrade-kernel-centos-7/
    http://www.gooksu.com/2015/04/21/running-the-latest-kernel-centos7/
    https://elrepo.org/tiki/kernel-ml

    ...

    ...

    ...

    ...

    ...
    Last edited by marko; 3rd June 2018 at 08:49 PM.
