FedoraForum.org - Fedora Support Forums and Community
  1. #1
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    731

    first server error,reboot , what is this UUID ?

    Hello guys,

    I just got my first crash and reboot on the large new server while doing extensive testing.
    Below is the dmesg error after the reboot. The question is: what does this UUID, dc3ea0b0-a144-4797-b95b-53fa242b6e1d, mean? It's not in the blkid list. Could it be a DIMM?

    Code:
    Mar 10 15:11:51 boa kernel: [Hardware Error]: event severity: info
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  Error 0, type: fatal
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  fru_text: ProcessorError
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section length: 0xd0
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000000: 00000007 00000000 00000011 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000010: 00800f12 00000000 11200800 00000000  .......... .....
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000090: c0f2e150 0000555d 00000000 00000000  P...]U..........
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  Error 1, type: fatal
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  fru_text: ProcessorError
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section length: 0xd0
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000000: 00000007 00000000 00000019 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000010: 00800f12 00000000 19200800 00000000  .......... .....
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000090: b28a1b47 0101ffff 00000000 00000000  G...............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  Error 2, type: fatal
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  fru_text: ProcessorError
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section length: 0xd0
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000000: 00000007 00000000 00000020 00000000  ........ .......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000010: 00800f12 00000000 20200800 00000000  ..........  ....
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000040: 00000001 00000000 00980027 00000000  ........'.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000080: 00000016 00000000 0000080b faa00000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff7  ...]............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  Error 3, type: fatal
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  fru_text: ProcessorError
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section length: 0xd0
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000000: 00000007 00000000 00000030 00000000  ........0.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000010: 00800f12 00000000 30200800 00000000  .......... 0....
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000040: 00000001 00000000 00180007 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000080: 00000016 00000000 0000080b baa00000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff6  ...]............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  Error 4, type: fatal
    Mar 10 15:11:51 boa kernel: [Hardware Error]:  fru_text: ProcessorError
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   section length: 0xd0
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000000: 00000007 00000000 00000040 00000000  ........@.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000010: 00800f12 00000000 40200800 00000000  .......... @....
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000040: 00000001 00000000 00980027 00000000  ........'.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000080: 00000016 00000000 0000080b faa00000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff7  ...]............
    Mar 10 15:11:51 boa kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  .
    .
    .
    .
    Mar 10 15:11:51 boa kernel:  Magic number: 10:937:183
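    For anyone wanting to poke at the raw dump: below is a minimal Python sketch (Python is an arbitrary choice, nothing in the thread uses it) that turns dump rows like the ones above back into bytes. It assumes each group is a 32-bit word printed in little-endian order, which is consistent with the ASCII column, and it treats the 16 bytes at offset 0x30 as a mixed-endian GUID, which is a guess. The names SAMPLE, parse_rows and SECTION_ROWS are made up for the illustration. As far as I can tell, the section type value dc3ea0b0-a144-4797-b95b-53fa242b6e1d matches the GUID that the UEFI specification's Common Platform Error Record appendix assigns to the IA32/x64 processor error section, which would make it a firmware record type rather than a disk or DIMM identifier, but please verify that against the spec.

```python
import re
import uuid

# Three rows copied from the "Error 0" dump above (offsets 0x00, 0x10, 0x30).
SAMPLE = """\
Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000000: 00000007 00000000 00000011 00000000  ................
Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000010: 00800f12 00000000 11200800 00000000  .......... .....
Mar 10 15:11:51 boa kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
"""

# An 8-digit hex offset, a colon, then four 8-digit hex words.
ROW = re.compile(r"\b([0-9a-f]{8}): ((?:[0-9a-f]{8} ){4})")


def parse_rows(text):
    """Return {offset: 16 raw bytes} for every hex-dump row found in text."""
    rows = {}
    for match in ROW.finditer(text):
        offset = int(match.group(1), 16)
        words = match.group(2).split()
        # Assumption: each group is a 32-bit word shown in little-endian host order.
        rows[offset] = b"".join(int(w, 16).to_bytes(4, "little") for w in words)
    return rows


rows = parse_rows(SAMPLE)

# "section length: 0xd0" is 208 bytes, i.e. 13 rows of 16 bytes each.
SECTION_ROWS = 0xD0 // 16
print(SECTION_ROWS)  # 13

# Guess: the 16 bytes at offset 0x30 are a GUID stored in mixed-endian layout.
guid_at_0x30 = uuid.UUID(bytes_le=rows[0x30])
print(guid_at_0x30)  # a55701f5-e3ef-43de-ac72-249b573fad2c
```

    Feeding the full dmesg text to parse_rows would only keep the last record, because the offsets repeat for every "Error N" block; splitting the log on those headers first would be the obvious next step.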
    "monsters John ... monsters from the ID..."
    "ma vule teva maar gul nol naya"

  2. #2
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    731

    Re: first server error,reboot , what is this UUID ?

    Well, the only "relatively similar" errors I could find on the net are mostly related to UEFI and ACPI.

    I have decided to add acpi=off to the GRUB boot string. We shall see how that fares.

    I know my bro had a hell of a time with UEFI getting the machine to boot from one of the SSDs. The system kept offering only the NVMe drives or one of the connected SATA hard drives for boot. He finally got it to work, but unfortunately we shall never know how he did it, lol. I guess I'll get a NOC tech to take photos of all the BIOS settings and send them to me.

  3. #3
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    731

    Re: first server error,reboot , what is this UUID ?

    BTW, memtest86 won't run with this UEFI crap, so I am running memtester from the booted FC27 at the moment. I allocated 125GB of RAM to it and am doing a single pass. It has been running for about 12 hours so far with no errors.

    EDIT: I have now launched 10 memtesters with 12000MB each and one with 7000MB, leaving 1GB of RAM for the OS etc.
    This seems to be MUUUUUUUUUCH quicker.
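    The split described in the EDIT can be sketched as follows. This is only an illustration under assumptions: the 128000MB total is inferred from 10 x 12000MB + 7000MB + 1GB reserve, not stated in the post, and plan_memtester and memtester_commands are hypothetical helper names. The commands are built as lists but never executed here; the "memtester <size>M <loops>" form is the usual memtester invocation.

```python
def plan_memtester(total_mb, reserve_mb, chunk_mb):
    """Split the testable RAM into chunk_mb pieces plus one smaller remainder."""
    usable = total_mb - reserve_mb
    full_chunks, remainder = divmod(usable, chunk_mb)
    sizes = [chunk_mb] * full_chunks
    if remainder:
        sizes.append(remainder)
    return sizes


def memtester_commands(sizes_mb, loops=1):
    """Build one memtester argv per chunk (not executed in this sketch)."""
    return [["memtester", f"{size}M", str(loops)] for size in sizes_mb]


# Assumed total of 128000 MB with 1000 MB kept back for the OS.
sizes = plan_memtester(total_mb=128_000, reserve_mb=1_000, chunk_mb=12_000)
commands = memtester_commands(sizes)

print(len(sizes), sizes[-1])  # 11 7000
for argv in commands:
    print(" ".join(argv))
```

    Each argv could be handed to subprocess.Popen to start the instances side by side, which is presumably why the parallel run finishes so much sooner than one big single-process pass.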
    [Attachment: memtesters.png]


    UPDATE: the memtesters finished without errors, so I am inclined to think it was ACPI or the NVMe drives set up in RAID mode.
    [Attachment: memtesters_success.png]
    Last edited by bobx001; 11th March 2018 at 01:42 PM.

  4. #4
    Join Date
    Oct 2006
    Location
    CN99CF Agassiz BC Canada
    Posts
    382

    Re: first server error,reboot , what is this UUID ?

    Perhaps 'sudo updatedb' followed by 'locate dc3ea0b0-a144-4797-b95b-53fa242b6e1d'. This should give some idea of what type of device it is, based on its location in the file tree.
    -----
    f26 x86_64 Acer Predator G5910 Quad core Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

  5. #5
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    731

    Re: first server error,reboot , what is this UUID ?

    Cheers bro, but no luck; however, I have been through many reboots since.
    Code:
    [root@boa NVME1]# updatedb
    [root@boa NVME1]# locate dc3ea0b0-a144-4797-b95b-53fa242b6e1d
    zilch
    If this happens again, I will do precisely that on the next boot of the box and report here.

  6. #6
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    731

    acpi=off , very interesting performance effect

    Hello guys,

    Still hunting for that hardware error, which has happened once again (the server was running two VMs on QEMU, FC27 32-bit and Solaris 10, plus postgres on the server itself). It is this one:
    https://forums.fedoraforum.org/showt...t-is-this-UUID
    (with the same UUID! I tried looking for it in /dev and /sys, to no avail)

    So I turned off ACPI and also HPET, with acpi=off and nohpet, and interestingly the number of "cpu siblings" went down from 64 to 32.
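    To confirm which of these switches a given boot actually picked up, the kernel command line can be inspected. A minimal Python sketch, assuming a simple whitespace-separated command line (quoted values containing spaces are not handled); SAMPLE_CMDLINE is a hypothetical example, not the real line from this server, and parse_cmdline and read_cmdline are illustrative names.

```python
import os


def parse_cmdline(cmdline):
    """Map kernel parameters to their values; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# Hypothetical command line, for illustration only.
SAMPLE_CMDLINE = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro acpi=off nohpet"
params = parse_cmdline(SAMPLE_CMDLINE)

print("acpi:", params.get("acpi", "default"))
print("nohpet:", params.get("nohpet", False))
print("logical CPUs seen by this Python process:", os.cpu_count())
```

    On the server itself, parse_cmdline(read_cmdline()) would give the real values, and the CPU count printed before and after the change should show the 64 versus 32 difference.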

    The funny thing is that there is a batch postgres proggy which calculates some heavy stuff, and with acpi=off it ran in 22 minutes. As soon as I removed acpi=off from grub.cfg again to get my 64 "cpus" back, the proggy took 28 minutes to run.

    nohpet is still there, and for the moment I have turned off both QEMU guests to see whether they are what causes the crashing (maybe the Solaris one?).

    Anyway, it is interesting that with acpi=off, and despite losing half the cores, postgres runs quicker.
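    To put the two batch timings into proportion, here is the plain arithmetic as a small Python sketch. Only the 22 and 28 minute figures come from the post; the variable names are made up.

```python
fast_min = 22  # batch runtime with acpi=off (32 "cpus")
slow_min = 28  # batch runtime without acpi=off (64 "cpus")

# How much longer the slow run takes relative to the fast one.
slowdown = slow_min / fast_min - 1
# How much of the slow runtime is saved by the fast configuration.
saving = 1 - fast_min / slow_min

print(f"slower by {slowdown:.1%}")  # slower by 27.3%
print(f"time saved {saving:.1%}")   # time saved 21.4%
```

    A single pair of runs is not much to go on, so repeating the batch a few times in each configuration would show whether the gap is stable.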

  7. #7
    Join Date
    Dec 2013
    Location
    United Kingdom
    Posts
    5,872

    Re: first server error,reboot , what is this UUID ?

    Threads merged. Please don't open more than one thread at a time about the same issue.

  8. #8
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    731

    Re: first server error,reboot , what is this UUID ?

    Sorry guys, but I thought that since my latest post was just about an "ACPI affects postgres" issue, it was a different topic.

    Anyway, an update on this ever so elusive error: I can now say that it only happens when KVM/QEMU is running (I had two guests, an FC27 32-bit and a Solaris 10). I will try launching just one of them to see if I can identify the culprit (I'm willing to bet it's a Solaris+QEMU issue).

    Supermicro replied to my query about this and told me that if I do not see any hardware error in the IPMI logs, then it must be linked to the OS running on the box (and since FC27 is not certified, it may be that).

    I shall update on the next step in the investigation.

  9. #9
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    731

    Re: first server error,reboot , what is this UUID ?

    Getting closer to the source of the problem.

    One thing I noticed today: I have an old application which needs to run on 32 bits, so I created a QEMU guest with FC27 32-bit, and voilà, the app would not start. Actually, the guest would just freeze. The funny thing is that I do exactly the same on my Intel-based laptop, and it runs!
    So then I went into the guest machine settings, and instead of the Opteron G3 (default) I chose a Nehalem, and voilà... the app runs! Go figure...

    Which leads me to the following thought:
    I am going to test the Solaris 10 guest now with different CPU configs to see if the server crashes again. If I can get it to run for at least a few days, we may have something there.
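    For trying several guest CPU models in a row, the switch can also be made on the QEMU command line instead of in the guest settings dialog. A hedged Python sketch that only builds the argument list: the disk image name, memory size and core count are hypothetical, qemu_argv is an illustrative helper, and Opteron_G3 and Nehalem are the QEMU model names corresponding to the two choices mentioned above.

```python
import shlex


def qemu_argv(cpu_model, disk_image, memory_mb=4096, cpus=2):
    """Build a qemu-system-x86_64 command line for a KVM guest (not executed)."""
    return [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-cpu", cpu_model,
        "-m", str(memory_mb),
        "-smp", str(cpus),
        "-drive", f"file={disk_image},format=qcow2",
    ]


# Hypothetical image name; the thread does not say where the guest disk lives.
for model in ("Opteron_G3", "Nehalem"):
    argv = qemu_argv(model, "fc27-32bit.qcow2")
    print(shlex.join(argv))
```

    Keeping everything else identical and changing only the -cpu value makes it easier to tell whether the freeze follows the CPU model or something else in the guest definition.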

  10. #10
    Join Date
    Dec 2012
    Location
    santa barbara, CA
    Posts
    731

    Re: first server error,reboot , what is this UUID ?

    Quick update on this: it is not the Solaris KVM/QEMU guest. That guest was not running, but the problem happened again, with the exact same error, under a heavily loaded FC27 32-bit KVM/QEMU guest + postgres + rsync.

    At this point I will test the server with CentOS 7.4, which is at least certified by Supermicro, and see what mileage I get.
