Troubleshooting Thermal Issues
Common cause of bugchecks, sudden resets, shutdowns and slowdowns
When our customer support team points out to users that their system crashes are likely due to overheating, we are often met with disbelief.
What many users do not realize is that a CPU without appropriate cooling can reach its maximum temperature (often around 100 degrees Celsius) within a fraction of a second.
Thermal issues are among the most common root causes of system crashes. When memory corruption is reported in the bugcheck description or by WhoCrashed, the cause is often system RAM that failed because it overheated. Most RAM modules are not equipped with temperature sensors, so they lack the protection that such sensors can provide.
A temperature sensor on the CPU or a motherboard component may detect that a thermal trip point has been reached and inform the operating system through the ACPI driver that action should be taken. Depending on the hardware, operating system version, power policies and other factors, thermal issues can manifest in various ways. Windows might put the system to sleep or hibernate it gracefully, a blue (or black) screen may appear, or the system may suddenly reset or shut down without notice. Memory corruption often occurs, which can produce all sorts of artifacts and strange behavior. A blue (or black) screen may also appear only as a side effect of a thermal issue.
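To illustrate the idea of a thermal trip point, here is a minimal Python sketch of how a reading might be mapped to an action. It is not how Windows or any particular firmware implements its thermal policy. The two thresholds and the names `ThermalAction` and `action_for` are hypothetical and chosen only for illustration; real trip points are defined by the system firmware.

```python
from enum import Enum


class ThermalAction(Enum):
    NONE = "none"          # below every trip point, nothing to do
    THROTTLE = "throttle"  # passive cooling: lower the clock speed
    SHUTDOWN = "shutdown"  # critical: power off to protect the hardware


# Hypothetical trip points in degrees Celsius (assumptions for this sketch).
PASSIVE_TRIP_C = 85.0
CRITICAL_TRIP_C = 100.0


def action_for(temp_c: float,
               passive_c: float = PASSIVE_TRIP_C,
               critical_c: float = CRITICAL_TRIP_C) -> ThermalAction:
    """Map a temperature reading to the action a thermal policy might take."""
    # Check the most severe trip point first so it always wins.
    if temp_c >= critical_c:
        return ThermalAction.SHUTDOWN
    if temp_c >= passive_c:
        return ThermalAction.THROTTLE
    return ThermalAction.NONE
```

With these assumed thresholds, `action_for(60.0)` yields `ThermalAction.NONE`, while a reading at or above the critical value yields `ThermalAction.SHUTDOWN`, which corresponds to the sudden shutdown without notice described above.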
Monitoring CPU temperatures
We offer a utility named WhySoSlow, which allows you to measure the temperature of your CPUs. Your BIOS/CMOS setup program may also offer a utility that allows you to measure the temperature of your processor(s).
Another utility that allows you to measure CPU temperatures is CoreTemp.
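If you log temperature readings yourself, note that ACPI thermal zones report temperatures in tenths of a kelvin rather than in degrees Celsius (on Windows, for example, the `MSAcpi_ThermalZoneTemperature` WMI class exposes such values where the hardware supports it). The following Python sketch converts such raw values and summarizes a series of samples. The function names and the 90 degree limit are assumptions made for this example, and how you obtain the raw readings is left open.

```python
def decikelvin_to_celsius(raw: int) -> float:
    """Convert a reading in tenths of a kelvin to degrees Celsius."""
    return raw / 10.0 - 273.15


def summarize(samples_c, limit_c=90.0):
    """Return (peak, average, number of samples at or above limit_c).

    limit_c is a hypothetical warning level, not a vendor specification.
    """
    if not samples_c:
        raise ValueError("no samples to summarize")
    peak = max(samples_c)
    average = sum(samples_c) / len(samples_c)
    hot_count = sum(1 for t in samples_c if t >= limit_c)
    return peak, average, hot_count
```

A typical use would be to convert each raw reading with `decikelvin_to_celsius` and pass the resulting list to `summarize`. A peak close to the processor's maximum temperature, or many samples above the limit, suggests that cooling should be checked.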
About clock speeds and processor throttling
If your system is equipped with a CPU with a dynamic clock speed feature such as Intel SpeedStep or AMD Cool'n'Quiet, the operating clock speed of your processor might be reduced to an unacceptably low level when it runs hot, and the system may eventually crash. We suggest using the WhySoSlow utility to check both your clock speed and temperature.
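The reasoning behind checking clock speed and temperature together can be sketched in Python as follows. A low clock speed alone is normal on an idle system with a dynamic clock speed feature, so this sketch only flags a reading when the clock is low and the CPU is hot at the same time. The function names and both thresholds are hypothetical and serve only as an illustration; this is not how WhySoSlow works internally.

```python
def clock_ratio(current_mhz: float, rated_mhz: float) -> float:
    """Return the current clock speed as a fraction of the rated speed."""
    if rated_mhz <= 0:
        raise ValueError("rated_mhz must be positive")
    return current_mhz / rated_mhz


def looks_thermally_throttled(current_mhz: float,
                              rated_mhz: float,
                              temp_c: float,
                              min_ratio: float = 0.5,
                              hot_c: float = 85.0) -> bool:
    """True when the clock is well below its rated speed while the CPU is hot.

    min_ratio and hot_c are assumed values for this sketch.
    """
    slow = clock_ratio(current_mhz, rated_mhz) < min_ratio
    hot = temp_c >= hot_c
    return slow and hot
```

For example, a processor rated at 3600 MHz that runs at 800 MHz at 95 degrees Celsius would be flagged, while the same clock speed at 40 degrees Celsius would not, because that is most likely ordinary power saving.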
Notebooks and other portable computers
If you are using a notebook or other portable system, always make sure it is placed on a solid surface so that heat can dissipate. A notebook placed on a blanket or cloth can overheat quickly and cause problems. Notebooks also commonly collect a lot of dust in their fans, so check the information below.
Check the fans, remove dust
If a system is getting too hot, you should check the fans. Notebooks very commonly collect dust over time. Dust blocks the airflow through the cooling components, so heat may build up quickly. On certain computer systems, you might be able to blow the dust out of the fans without opening the system.
Desktop and server systems also collect dust over time. If your system gets hot, you should open it up and remove the dust from the fans.
Problems caused by dust can manifest in various ways, including blue (or black) screens, sudden resets, shutdowns, slowdowns and erratic behavior of applications running on the system.
The thermal paste that sits between a CPU or GPU and its heatsink degrades over time. If your system is overheating and the fans are clean and working properly, you may need to replace the thermal paste.
Page generated on 6/1/2023 2:04:17 AM. Last updated on 4/19/2022 9:05:02 PM.