How much do you trust memtest86?


MakeMyDay
2005-09-10, 06:05 PM CDT
I'm curious about your experience with memtest86 http://www.memtest86.com/. If it finds errors in the memory, is it wise to replace the memory even if no adverse effect on system performance has been observed? In other words, does it test the memory beyond what it will ever see in the real world, or is it just a matter of luck until a certain bad bit pattern comes up?

I try to hit the "New Posts" link at least once a week to answer questions if I can or to learn new aspects of FC. I saw a few posts recently that suggested running memtest86. Out of curiosity I tested my laptop using the latest version, 3.2, and found about 100 errors. The errors are very repeatable. By swapping my two 256MB sticks around I was able to isolate the errors to a single stick.

I've been running FC3 for almost a year with no issues (other than those caused by user error :)).

I did search this forum (primarily suggestions to run memtest86 after unusual system behavior), and I did read through the memtest86 website above. The author states:

I am often asked about the reliability of errors reported by Memtest86. In the vast majority of cases errors reported by the test are valid. There are some systems that cause Memtest86 to be confused about the size of memory and it will try to test non-existent memory. This will cause a large number of consecutive addresses to be reported as bad and generally there will be many bits in error. If you have a relatively small number of failing addresses and only one or two bits in error you can be certain that the errors are valid. Also intermittent errors are without exception valid. Frequently memory vendors question if Memtest86 supports their particular memory type or a chipset. Memtest86 is designed to work with all memory types and all chipsets. Only support for ECC requires knowledge of the chipset.

All valid memory errors should be corrected. It is possible that a particular error will never show up in normal operation. However, operating with marginal memory is risky and can result in data loss and even disk corruption. Even if there is no overt indication of problems you cannot assume that your system is unaffected. Sometimes intermittent errors can cause problems that do not show up for a long time. You can be sure that Murphy will get you if you know about a memory error and ignore it.

It looks like I can get a new stick on eBay for under $50, so I may go ahead and replace it, but I'm still curious about your experience.
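
For anyone wanting to apply the author's rule of thumb quoted above (a small number of failing addresses, one or two bits in error each), here is a rough Python sketch. The `MemError` record, the function names, and both thresholds are made up for illustration; memtest86 does not export its results in this form, and "relatively small number" is not defined by the author, so the cutoff of 200 addresses is purely an assumption.

```python
from dataclasses import dataclass


@dataclass
class MemError:
    """Hypothetical record of one reported error (not a memtest86 format)."""
    address: int   # failing address
    expected: int  # value that was written
    actual: int    # value that was read back


def bits_in_error(err):
    """Count the bits that differ between the written and read-back values."""
    return bin(err.expected ^ err.actual).count("1")


def looks_like_real_fault(errors, max_addresses=200, max_bits=2):
    """Assumed heuristic: few failing addresses, each with only 1-2 bad bits.

    Both thresholds are illustrative guesses, not values from memtest86.
    """
    if not errors:
        return False
    addresses = {e.address for e in errors}
    few_addresses = len(addresses) <= max_addresses
    few_bits = all(1 <= bits_in_error(e) <= max_bits for e in errors)
    return few_addresses and few_bits


if __name__ == "__main__":
    single_bit = [MemError(0x1000, 0xFFFFFFFF, 0xFFFFFFFE)]
    many_bits = [MemError(0x2000, 0xFFFFFFFF, 0x00000000)]
    print(looks_like_real_fault(single_bit))
    print(looks_like_real_fault(many_bits))
```

A report with many bits wrong at once fails the check, which matches the author's description of what testing non-existent memory looks like.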

Jman
2005-09-10, 08:41 PM CDT
Not my personal experience, but here goes.

memtest stress tests your memory. It's equivalent to reading and writing various patterns on your entire hard drive--several times.
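
To make the "reading and writing various patterns" idea concrete, here is a toy Python sketch. It is not memtest86's actual algorithm and it cannot test physical RAM from a normal process; it only runs the write-then-read-back loop over a buffer. The `StuckBitBuffer` class is a made-up stand-in for a faulty stick with one bit stuck at 1.

```python
# All zeros, all ones, and the two alternating-bit patterns.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def pattern_test(buf, patterns=PATTERNS):
    """Write each pattern to every byte, read it back, and collect mismatches.

    Returns a list of (index, expected, actual) tuples.
    """
    failures = []
    for pattern in patterns:
        for i in range(len(buf)):
            buf[i] = pattern
        for i in range(len(buf)):
            actual = buf[i]
            if actual != pattern:
                failures.append((i, pattern, actual))
    return failures


class StuckBitBuffer:
    """Simulated faulty memory: bit 0 of one byte always reads back as 1."""

    def __init__(self, size, bad_index):
        self._data = bytearray(size)
        self._bad_index = bad_index

    def __len__(self):
        return len(self._data)

    def __setitem__(self, i, value):
        self._data[i] = value

    def __getitem__(self, i):
        value = self._data[i]
        return (value | 0x01) if i == self._bad_index else value


if __name__ == "__main__":
    print(pattern_test(bytearray(16)))          # healthy buffer
    print(pattern_test(StuckBitBuffer(16, 5)))  # simulated bad byte
```

In the simulated case only the patterns with bit 0 cleared (0x00 and 0xAA) expose the fault, while 0xFF and 0x55 pass. That is the "matter of luck" part of the original question: a bad bit only shows when the data written happens to disagree with it.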

One opinion from a Red Hat kernel hacker:

- memtest86
Yes it takes ages to run. Sometimes it takes at least a day before it shows up that there's a bit error in some DIMM. (The worst I've seen was an error that only showed up after a week long run). It's really worth the time testing though. If you don't do this test, and the problem really is flaky RAM, then the 'bug' will never be fixed, and just cause extensive head-scratching.

http://people.redhat.com/davej/hardware-problems.txt

So this may cause unreproducible bugs in the future.

Your call, but I would set it aside as possibly faulty.

jspaar
2005-09-13, 09:58 PM CDT
I heartily second what Jman said. Aside from Dave Jones' opinion, which is enough for me any day, when I've had new sticks of RAM show errors in memtest86, that has always been enough to convince vendors to take them back.

Also, sometimes the errors will go away if you use less aggressive RAM timing in your BIOS settings. That will cost you in performance though, and if they won't test clean at the timing they were rated for, chuck 'em. Otherwise you're just rolling the dice for when they'll corrupt something important.

RahulSundaram
2005-09-14, 12:10 AM CDT
An updated version of the page Jman linked (http://people.redhat.com/davej/hardware-problems.txt) is available at http://fedoraproject.org/wiki/HardwareProblems

More general information on reporting bugs:

http://fedoraproject.org/wiki/ReportingBugs

Rahul
Red Hat

MakeMyDay
2005-09-14, 04:34 AM CDT
Thanks everyone.

The bad stick originally came with a laptop I purchased used from Dell leasing many years ago, so there's no way to return it. There are no markings on the stick other than a Mitsubishi logo and part number.

The laptop (CPx H/J series) doesn't allow RAM timing tweaks in the BIOS.

I found a new stick on eBay and it's on the way.