memory failure when hot
reader
14th May 2008, 02:35 AM
I recently bought a Dell Inspiron 530 with a quad-core CPU and 3GB of memory. If the computer has been running at full load (all four cores busy) for a while, I start to get errors: compressed files fail their integrity check, and copied files differ from the originals. If I run a memory test at that point, either memtest86+ from the Fedora live CD or the test on the Dell resource CD, I get memory errors. But if I turn off the computer, wait for a while, turn it back on and run the memory test, both tests pass without any problem. So the problem only shows up when the system is hot. How likely is it that the memory module is bad, and that replacing it will solve the problem? Or is there a problem with the design of the Dell box, so that it overheats (at least when all four cores are running at full speed), and the memory is not at fault?
Is there any way to tell the temperature of the CPU or memory or anything else inside the box?
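A quick way to catch the "copied files differ" symptom without running a full memtest is to hash the original and the copy and compare them. Below is a minimal Python sketch; `sha256_of` and `copy_and_verify` are hypothetical helper names. Because the second read may be served from the operating system's page cache, a mismatch is a strong hint of corruption in RAM, while a match does not prove that the bytes on disk are good.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then hash both files and report whether they match."""
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)
```

Running this on a large file repeatedly while all four cores are loaded, and again after a cold start, would show whether mismatches only appear when the machine is hot.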
marko
14th May 2008, 02:42 AM
Sure, in Fedora just install the lm_sensors package, run "sensors-detect" (as root) and follow the instructions.
When it's done you can run on the command line:
sensors
and it will tell you various temperatures and other stats reported by the CPU, system, chipset, etc.
There are fancy GUI applets like "gkrellm" that will show you all this info continuously.
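For anyone who prefers a script to a GUI applet, the readings that `sensors` prints come from the Linux kernel's hwmon sysfs interface. Here is a minimal Python sketch, assuming a recent kernel where each /sys/class/hwmon/hwmonN directory holds a `name` file plus `tempN_input` files in millidegrees Celsius (older kernels, like those current when this thread was written, sometimes put these files one level down in `device/`). `read_hwmon_temps` is a hypothetical helper, not part of lm_sensors.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Collect (chip_name, label, degrees_c) tuples from the hwmon sysfs tree.

    Assumes tempN_input files hold integer millidegrees Celsius.
    """
    readings = []
    for chip_dir in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(chip_dir, "name")) as fh:
                chip = fh.read().strip()  # e.g. "coretemp" or "it8718"
        except OSError:
            chip = os.path.basename(chip_dir)
        for input_file in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            sensor = os.path.basename(input_file)[: -len("_input")]  # "temp1"
            label = sensor
            label_file = os.path.join(chip_dir, sensor + "_label")
            if os.path.exists(label_file):
                with open(label_file) as fh:
                    label = fh.read().strip()  # e.g. "Core 0"
            try:
                with open(input_file) as fh:
                    millideg = int(fh.read().strip())
            except (OSError, ValueError):
                continue  # unreadable or non-numeric sensor; skip it
            readings.append((chip, label, millideg / 1000.0))
    return readings
```

Calling it in a loop with `time.sleep(5)` between passes gives a crude continuous monitor in the spirit of gkrellm.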
Mark
savage
14th May 2008, 02:51 AM
I had memory overheating problems when I went to 4GB; the modules were so close together that one of them fried.
If you're confident it's overheating, and happy opening your PC, I would remove a module, at least until you can get a heatsink/fan for the memory, rather than risk frying it. I'm assuming you have 3x1GB modules, and 2 are close together.
I've changed my RAM now from Corsair to OCZ, which has heat pipes that direct the heat away from the modules, instead of sideways onto one another.
Dan
14th May 2008, 02:58 AM
Bad news.
Even though your memory seems to recover when it cools, the blunt truth is that electronic chips don't heal. Once a component has hit a threshold temperature and voltage combination that has broken down the junction in a component, a small "hole" has been punched, and although it may function normally for a while, (sometimes a long while if no further stress is introduced) the component is now flawed and will succumb to further damage all the more quickly. It is therefore unreliable.
If your system failed the first time while all cores were active, and the case and components were not being subjected to conditions outside their rated operating range, the blunt truth is the enclosure/cooling solution was inherently flawed.
In a system which is known to generate higher than average waste heat, I don't think there is such a thing as too much cooling. Noise be damned! Mine is a fairly modest system, running only an AMD Athlon XP 3200+, but once in a while it gets a bit warm in the office, so I've got 2 case fans, 2 power supply fans and a solid copper Thermaltake Volcano 7+ cooler with a 6000 RPM fan on the CPU. As a result, even if the CPU fan itself fails, the system will not overheat.
This is the end result of unhappy experience. I've smelled the magic smoke before. It's expensive, and it stinks!
In short, figure on replacing the memory soon, and even if you have to do some sheet-metal carving ... get some more airflow through your case!
Hope that helps.
Dan
reader
14th May 2008, 07:14 AM
I had a Dell Dimension 9200 with a dual-core CPU and 3GB as my main desktop. I was quite happy with it, except that it's a bit noisy in my bedroom. But last month, Dell had this quad-core 3GB Inspiron 530 on sale with $310 off, and I could use my company's EPP to get 2% off with free shipping, plus a 10% discount coupon I found on the web. It was too good a deal to pass up, and some Dell users say the 530 is really quiet, so I bought it to replace the 9200 as my main desktop.
The 3GB is 2x1GB + 2x512MB, so yes, they are really close together. When I opened the case to remove two of them for testing, they were really hot. Maybe I should get memory modules with metal-plate heat sinks. I think a quad-core is too much for such a small-footprint desktop.
reader
14th May 2008, 08:45 AM
Mark,
I did exactly what you said. I had to reboot the machine to get the sensors working. Here is the output:
bash-3.2$ sensors
it8718-isa-0290
Adapter: ISA adapter
in0: +1.14 V (min = +2.03 V, max = +3.95 V) ALARM
in1: +3.04 V (min = +0.00 V, max = +4.08 V)
in2: +3.34 V (min = +0.00 V, max = +4.08 V)
in3: +3.02 V (min = +0.00 V, max = +4.08 V)
in4: +2.99 V (min = +0.00 V, max = +4.08 V)
in5: +0.06 V (min = +0.00 V, max = +4.08 V)
in6: +0.11 V (min = +0.00 V, max = +4.08 V)
in7: +2.96 V (min = +0.00 V, max = +4.08 V)
in8: +3.06 V
fan1: 1683 RPM (min = 0 RPM)
fan2: 0 RPM (min = 0 RPM)
fan3: 1102 RPM (min = 0 RPM)
temp1: +84°C (low = -1°C, high = +127°C) sensor = thermistor
temp2: +43°C (low = -3°C, high = +127°C) sensor = thermistor
temp3: -51°C (low = -1°C, high = +127°C) sensor = thermistor
vid: +0.000 V
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +44°C (high = +100°C)
coretemp-isa-0001
Adapter: ISA adapter
Core 1: +44°C (high = +100°C)
coretemp-isa-0002
Adapter: ISA adapter
Core 2: +42°C (high = +100°C)
coretemp-isa-0003
Adapter: ISA adapter
Core 3: +43°C (high = +100°C)
bash-3.2$
I see three things: in0 is out of range, fan2 is not running, and temp3 is colder than the North Pole. I know all 4 fans in my PC are running; I saw them when I opened the case. I don't know what temp1-3 are measuring.
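The three anomalies spotted above by eye can also be flagged mechanically. Below is a minimal Python sketch that parses text in the format of the `sensors` output in this post (other lm_sensors versions format lines slightly differently, so the regular expressions are an assumption). It flags a line if it carries ALARM, falls outside its min/low or max/high limits, is a fan reading 0 RPM, or is a temperature below a hypothetical plausibility floor of 0 °C. `flag_readings` and `IMPLAUSIBLE_TEMP_C` are illustrative names.

```python
import re

# One reading per line: "label: value unit ...", e.g. "in0: +1.14 V (min = ...)"
READING = re.compile(
    r"^(?P<label>[^:]+):\s+(?P<value>[+-]?\d+(?:\.\d+)?)\s*(?P<unit>V|RPM|°C)"
)
# Limit annotations such as "min = +2.03" or "high = +127"
LIMIT = re.compile(r"\b(min|max|low|high)\s*=\s*([+-]?\d+(?:\.\d+)?)")

# Assumption: below 0 °C inside a running indoor PC is treated as a bogus
# reading (e.g. a wrong conversion constant), not a real temperature.
IMPLAUSIBLE_TEMP_C = 0.0


def flag_readings(text):
    """Return {label: [reasons]} for suspicious lines of `sensors` output."""
    flags = {}
    for line in text.splitlines():
        m = READING.match(line.strip())
        if not m:
            continue  # chip names, "Adapter:" lines, shell prompts, etc.
        label, unit = m["label"], m["unit"]
        value = float(m["value"])
        limits = {k: float(v) for k, v in LIMIT.findall(line)}
        low = limits.get("min", limits.get("low"))
        high = limits.get("max", limits.get("high"))
        reasons = []
        if "ALARM" in line:
            reasons.append("ALARM")
        if low is not None and value < low:
            reasons.append("below minimum")
        if high is not None and value > high:
            reasons.append("above maximum")
        if unit == "RPM" and value == 0:
            reasons.append("fan reports 0 RPM")
        if unit == "°C" and value < IMPLAUSIBLE_TEMP_C:
            reasons.append("implausible temperature")
        if reasons:
            flags[label] = reasons
    return flags
```

Fed the output above, this should flag exactly in0, fan2 and temp3, matching the manual reading. Note that it cannot tell a genuinely dead fan from one that simply lacks a tach signal.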
marko
14th May 2008, 10:28 PM
Fans showing 0 RPM is pretty common; usually that's due to a fan not having the yellow RPM signal wire (for example, a two-wire fan on a three-wire (+12 V, ground, signal) header). lm_sensors generates the temperature values by multiplying the value it gets from the sensor by a constant taken from a database of various chipsets and motherboards. If it gets a constant wrong, either by misidentifying your board or because the constant itself is wrong, the values can be really wrong.
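Both effects described above can be illustrated with a little arithmetic. lm_sensors lets sensors.conf apply a `compute` expression to each raw value; the sketch below reduces that to a hypothetical linear factor and offset, and models the fan reading as tachometer pulses counted over a time window, assuming two pulses per revolution (common for PC fans, but an assumption here). `scaled_reading` and `fan_rpm` are illustrative names, not lm_sensors functions.

```python
def scaled_reading(raw, factor, offset=0.0):
    """Convert a raw sensor value with a linear correction: raw * factor + offset.

    With the wrong factor or offset for the board, the displayed value is
    wrong even though the sensor itself is fine.
    """
    return raw * factor + offset


def fan_rpm(pulses, seconds, pulses_per_rev=2):
    """Estimate fan speed from tach pulses counted over `seconds`.

    A fan with no tach (signal) wire produces no pulses, so it reads 0 RPM
    even while it is spinning.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return pulses / pulses_per_rev / seconds * 60
```

For example, with made-up numbers, a raw 1.5 reading scaled by a correct factor of 2.0 gives 3.0, but with a wrong factor of 1.0 it gives 1.5; and 56 pulses in one second corresponds to 1680 RPM, while a missing signal wire gives 0 pulses and therefore 0 RPM.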
The temp1, temp2 and temp3 readings probably come from sensors at the voltage regulators near the CPU. The one you can mostly put stock in is the CPU temperature, since those sensors are on the Intel chip itself and it's pretty predictable (there aren't dozens of different manufacturers of different Intel chips, there's just the one).
Mark