View Full Version : SuperMicro 1U EM64T and FC4/64 Probs



wdingus
6th August 2005, 11:40 PM
I'm guessing the "AMD" 64 forum is the correct place for this. This is a cut/paste of a forum posting I made on linuxquestions.org this morning. I thought it might be more appropriate here. I find somewhat similar problems detailed here, but not quite the same thing I think.


I'm having problems and wondering if anyone can make any suggestions.

SuperMicro 6014H-X8 with 2 3.2GHz 800FSB EM64T 2MB Xeon CPUs, 16GB RAM, Adaptec(DPT) i2o RAID, IPMI, etc...

So far I've tried various BIOS settings but am currently getting the best results with hyper-threading disabled. I am also forced to boot UP (uniprocessor) kernels. If I boot an SMP kernel, it fails to boot all the way, with lots of complaints along the way like "/sbin/consoletype: cannot execute binary file".

The last thing I attempted was a minimal install, after which I booted from the CD with "linux rescue" and set SELinux to disabled. The UP kernel then boots, but still with a few errors from LVM (which I didn't use anyway) and a few other things. It did boot well enough that I was able to install all updates. Afterwards the newer 2.6.12 kernel still wouldn't work SMP and is in about the same shape UP.

So, are there known issues with EM64T I've not read about previously? With any of the other hardware in this box or perhaps BIOS or similar settings?

Thanks...

PS. I also have a few Sun v20z AMD64 Opteron servers successfully running FC3. Been very rock solid SMP so far until the most recent updates. The most recent 2.6.12 kernel fails to boot on them. Not sure how to report the "bug" or what all info to include, only saw this for a short while when at my co-lo site yesterday evening.

The SuperMicro 6014H-X8:
http://www.supermicro.com/products/...YS-6014H-X8.cfm

ddb123
19th August 2005, 11:36 PM
Seems to be a 32-bit driver bug.
Try limiting the memory to less than 4GB and see if that works. Add mem=4032M to your boot options.
I just tried that on a Supermicro 6014P-82RB with 16GB RAM & Adaptec 2010S and it booted SMP.
I also tried it without the 2010S and it booted fine with 16GB.
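
As a rough illustration of why mem=4032M counts as "less than 4GB", here is a minimal Python sketch that pulls a mem= value out of a kernel command line and compares it with the 4 GiB boundary. The function name and the sample command line are made up for the example; only mem=4032M comes from the post, and only the K/M/G suffixes are handled here.

```python
import re

# Binary multipliers for the size suffixes handled in this sketch.
_SUFFIX = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

FOUR_GIB = 4 << 30  # 4 GiB, the limit of 32-bit addressing


def parse_mem_option(cmdline):
    """Return the mem= limit in bytes, or None if the option is absent.

    Hypothetical helper for illustration; not part of any real tool.
    """
    for token in cmdline.split():
        m = re.fullmatch(r"mem=(\d+)([KMG]?)", token, flags=re.IGNORECASE)
        if m:
            return int(m.group(1)) * _SUFFIX[m.group(2).upper()]
    return None


# Hypothetical command line; only mem=4032M is taken from the thread.
cmdline = "ro root=LABEL=/ mem=4032M"
limit = parse_mem_option(cmdline)
print(limit, limit < FOUR_GIB)
```

On a running system the same function could be fed the contents of /proc/cmdline to check which limit, if any, the kernel was actually booted with.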

wdingus
20th August 2005, 01:28 AM
Thanks for the reply. Since posting this I've loaded, reloaded, tested, tweaked, etc.. I've submitted a question to RedHat since I have some RHEL licenses, etc...

Findings:

DPT_I2O does not exist in the 64-bit version of RHEL3 but does in the 32-bit version. I have multiple servers with RHEL3 using it (with 2015S cards) in production with 4GB RAM each.

Booting a UP kernel appears at this point to be totally stable. I put the box into production today with Core4 installed. I did so with the hopes that the appropriate fixes will catch up in a kernel upgrade sometime in the future. (I plugged 2 1.7TB external RAID units into it and started copying from one to the other, 70% at this point and no probs yet)

I2O exists in the 64-bit version of RHEL4 but has the same problems as Core4 and most everything else I tested with. Since it's included I submitted this to RH Support. So far they've taken 2 days to answer my query and then only with a request for more info.

With SUSE beta10 x86_64 I could not even get an install to complete.

In doing lots of searching on the net, I see discussions of "final fixes to i2o before 2.6.13 final" and similar. By all indications it's a known SMP bug and is being worked on. I hope so at least...

In the end though, I'm quite confident it's related more to SMP than anything else. I'll just run it with a single CPU until it's fixed, not that big of a deal.

Thanks again!

wdingus
20th August 2005, 01:34 AM
Oh yeah, when the problem was first identified I pulled out 1 CPU, all but 2GB RAM, etc... If it would run SMP with only 4GB RAM, that would have proven what was going on, but I need the RAM more than the second CPU or HT. I have 14GB as InnoDB cache:

set-variable = innodb_buffer_pool_size=14G

Any suggestions on ways of getting more than 16GB in a 64-bit box without breaking the bank any more than 8 2GB DIMMs already does?

PS. I was amused while installing Windows XP-64 on it, reading the bits of text displayed during. One of them described how the 64-bit version of XP allowed you to do more, faster. How it did this by allowing more virtual memory and faster access to virtual memory. Cracked me up...

wdingus
11th October 2005, 01:19 AM
For closure and for anyone who might have followed this.. The recent 2.6.13 kernel update for FC4 resolved the problem. It's booted SMP and doing fine right now. Thanks kernel hackers!

jowah
11th October 2005, 02:36 PM
Thanks for updating this thread, wdingus. Now that you've gotten the problem sorted for FC4, I'm curious...

* What about RHEL4 on a similar machine? What has RedHat come up with, if anything?

This would be interesting for people wishing to run "production level" distributions such as RHEL4 and clones thereof. FC4 is more of a "bleeding edge" distro, as you know. RHEL4 currently runs a variant of the 2.6.9 kernel IIRC, hence my asking.

* Any particular Redhat bugzilla ticket # for this issue?


"Any suggestions on ways of getting more than 16GB in a 64-bit box without breaking the bank any more than 8 2GB DIMMs already does?"

Only one way I can think of: get a machine with more memory sockets. The IWILL DK88 (http://www.iwill.net/product_2.asp?p_id=102&sp=Y) is one such beast. It's a two-socket Opteron motherboard with 8 DIMMs per processor. That should get you 32 GB with "cheap" 2 GB DIMMs.


"PS. I was amused while installing Windows XP-64 on it, reading the bits of text displayed during. One of them described how the 64-bit version of XP allowed you to do more, faster. How it did this by allowing more virtual memory and faster access to virtual memory. Cracked me up..."

Hehe...yep, that reminds me of how P4s "make the Internet go faster" or however the commercials went...oh boy...the suckers of the world. :eek:

wdingus
11th October 2005, 02:53 PM
Thanks.. I was beginning to wonder if anyone was even paying attention to this :)

No bugzilla ticket I'm aware of, but I suspect the RH tech will submit something somewhere. Right before posting this I updated my open RH support ticket, letting them know 2.6.13 cured it. I suspect they'll do what they always do: backport whatever the fix is to their 2.6.9 kernel. I ended up installing and going into production on this box with FC4 instead of RHEL because of the problem though. Both OSes were subject to it, and after having an open ticket sit there for 3 days with no response from RH I got impatient.

I hate to say anything bad about them, they do great things and they've helped us a lot. But in this case their responsiveness left a bit to be desired. By the time they were asking for sysreport results and the status of booting in various ways, I already had the box deployed and in production. When you buy expensive boxes, the boss doesn't want to hear that you're "playing" with it for 3 weeks trying to get it going. Deploy it, now... I did so with it booting UP, in a slightly "crippled" mode, for a few months until a new kernel came out that resolved the issue, and it's up to full speed now. The boss was unaware of that behind-the-scenes hassle, he just knew it was installed and doing what he bought it for.

Thanks for the heads up on the IWill board. I was unaware that was possible, very interesting...