Stability Testing with StressLinux

I've been experimenting with overclocking my primary gaming machine, and I became extremely concerned about the constant BSODs I was triggering while stability testing with Windows-based tools such as Prime95 and Intel CPU Burn. Knowing how fragile a Windows installation can be, I was keen to reduce the risk that my testing would leave me with a non-booting system.

My initial searches turned up some bootable images, such as the Ultimate Boot CD (UBCD), that came equipped with builds of Prime95, but I finally settled on a specialised Linux Live CD: StressLinux. This is an openSUSE-derived distribution that comes bundled with the following utilities:

stress - a basic load generating tool that can stress CPU, memory and disk I/O.
cpuburn - a collection of utilities for stressing various CPU architectures.
hddtemp - a tool for reading drive temperatures from the S.M.A.R.T. data exposed by modern hard drives.
lm_sensors - a tool for monitoring hardware temperatures (CPU, motherboard, etc.).
mPrime - a Linux version of the popular Prime95 tool.
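
To give an idea of how the lm_sensors readings could be tracked during a torture test, here is a minimal Python sketch (not something shipped with StressLinux). It assumes the sensors command prints temperature lines of the form "label: +45.0°C ..."; the actual labels depend on the chip and driver. The SAMPLE text and the function names parse_temps, hottest and read_sensors are purely illustrative.

```python
import re
import subprocess

# Matches temperature lines such as:
#   Core 0:       +45.0°C  (high = +80.0°C, crit = +100.0°C)
# Only the first reading on each line (the current value) is captured.
TEMP_LINE = re.compile(r"^(?P<label>[^:]+):\s+(?P<temp>[+-]?\d+(?:\.\d+)?)\s*°?C\b")

# Made-up sample in the usual layout of `sensors` output (assumption, not real data).
SAMPLE = """\
coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +45.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:       +47.5°C  (high = +80.0°C, crit = +100.0°C)
"""


def parse_temps(sensors_text):
    """Return {label: degrees Celsius} for every temperature line found."""
    temps = {}
    for line in sensors_text.splitlines():
        match = TEMP_LINE.match(line.strip())
        if match:
            temps[match.group("label").strip()] = float(match.group("temp"))
    return temps


def hottest(temps):
    """Return the (label, temperature) pair with the highest reading, or None."""
    if not temps:
        return None
    return max(temps.items(), key=lambda item: item[1])


def read_sensors():
    """Run the lm_sensors `sensors` command and return its text output."""
    result = subprocess.run(["sensors"], capture_output=True, text=True, check=True)
    return result.stdout
```

On a machine with lm_sensors installed, hottest(parse_temps(read_sensors())) could be called in a loop from a second terminal while the stress test runs; parse_temps(SAMPLE) exercises the parser without any hardware.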

The reason I favoured the StressLinux distribution over the UBCD was that the combination of the above utilities allowed me to watch my system temperatures while running the mPrime/Prime95 torture tests, helping me ascertain where the instabilities lay. I was pleased to find that the mPrime tool mimicked the functionality of its Windows counterpart, despite being command-line driven:


       Main Menu

    1.  Test/Primenet
    2.  Test/Worker threads
    3.  Test/Status
    4.  Test/Continue
    5.  Test/Exit
    6.  Advanced/Test
    7.  Advanced/Time
    8.  Advanced/P-1
    9.  Advanced/ECM
   10.  Advanced/Manual Communication
   11.  Advanced/Unreserve Exponent
   12.  Advanced/Quit Gimps
   13.  Options/CPU
   14.  Options/Preferences
   15.  Options/Torture Test
   16.  Options/Benchmark
   17.  Help/About
   18.  Help/About PrimeNet Server
Your choice: 15

Number of torture test threads to run (2):
Choose a type of torture test to run.
1 = Small FFTs (maximum FPU stress, data fits in L2 cache, RAM not tested much).
2 = In-place large FFTs (maximum heat and power consumption, some RAM tested).
3 = Blend (tests some of everything, lots of RAM tested).
11,12,13 = Allows you to fine tune the above three selections.
Blend is the default.  NOTE: if you fail the blend test, but can pass the
small FFT test then your problem is likely bad memory or a bad memory
controller.
Type of torture test to run (3): 3

Accept the answers above? (Y):
[Main thread Apr 11 23:22] Starting workers.
[Worker #1 Apr 11 23:22] Worker starting
[Worker #2 Apr 11 23:22] Worker starting
[Worker #1 Apr 11 23:22] Setting affinity to run worker on logical CPU #1
[Worker #2 Apr 11 23:22] Setting affinity to run worker on logical CPU #2
[Worker #1 Apr 11 23:22] Beginning a continuous self-test to check your computer.
[Worker #1 Apr 11 23:22] Please read stress.txt.  Hit ^C to end this test.
[Worker #2 Apr 11 23:22] Beginning a continuous self-test to check your computer.
[Worker #2 Apr 11 23:22] Please read stress.txt.  Hit ^C to end this test.
[Worker #1 Apr 11 23:22] Test 1, 6500 Lucas-Lehmer iterations of M12451841 using AMD K10 type-2 FFT length 640K, Pass1=640, Pass2=1K.
[Worker #2 Apr 11 23:22] Test 1, 6500 Lucas-Lehmer iterations of M12451841 using AMD K10 type-2 FFT length 640K, Pass1=640, Pass2=1K.

Peter Green - Azrael808

Thursday, April 11, 2013
