Monday, October 29, 2012

"Core Blimey!": Phenom II Core Unlocking

The manufacturing of microprocessors, such as the CPUs found in modern computers, is a complex and time-consuming process, often resulting in a low yield of fully functional components. A percentage of the chips produced may only partially work, while others do not work at all; only through rigorous testing can the perfect parts be identified. Rather than simply discard the chips with imperfections, the manufacturer identifies the working components on each die and the clock speeds at which they are stable, so that the parts can be "binned" as different SKUs. Even after a manufacturing process has been perfected and the yield of fully functional parts is higher, a company may decide to bin chips with no defects into a lesser SKU simply to meet demand for cheaper products, for example by disabling perfectly functional CPU cores.

In 2009, a Korean over-clocker using a Biostar motherboard discovered it was possible to re-enable factory-disabled cores in AMD's Phenom series of CPUs. Once the news broke, some motherboard manufacturers started to add core "unlocking" features to their high-end products to facilitate the process. When building my current PC, the motherboard I settled on was an Asus M4A77TD Pro, which I discovered had this feature present in the BIOS the first time I booted the system.

Ever since then, I've been toying with the idea of trying to unlock any additional cores present on my dual-core Phenom II CPU. What finally convinced me was a YouTube video I watched recently. In the video, modern game titles were benchmarked to see the effect multiple CPU cores have on performance, and the Frostbite 2 engine was mentioned as specifically taking advantage of additional cores in a system, so I decided to undertake my own experiment:

Would it be possible for me to unlock additional cores present on my CPU and benefit from improved performance in games I regularly play?

To measure the change in performance of the system, I used a mixture of synthetic and real-world benchmarks. The synthetic benchmarks (save for Unigine Heaven) are geared towards testing the raw compute power the CPU has to offer, while the real-world tests will hopefully show the impact additional cores have on gaming.

Synthetic Tests

Cinebench 11.529
Both the single and multi-core CPU tests. The result is a score calculated by the program, with a higher score indicating better performance.
POVRay 3.7 Beta
Both single and multi-core tests again. Previous versions of the program only support single-threaded rendering, hence the need for the beta release. This test simply reports the amount of time (in seconds) that the benchmark render took to complete; a lower time is obviously preferable in this case.
wPrime 2.09
Calculates the square roots of numbers. This is another benchmark that measures performance by timing how long the system takes to complete the task.
Unigine Heaven Benchmark 3.0
Using DirectX 11, with the resolution set to 1920x1080, with 8xAA, 16x anisotropic filtering, hardware tessellation set to "normal", shaders and textures set to "high" and vsync off. The benchmark simply takes several predetermined paths through a 3D environment, recording the minimum, maximum and average frame-rate and awarding a score based on the performance.

Each of these tests was run five times, with the median of the scores being used. In order to get the most consistent results possible, the benchmarks were run as "Administrator" after preparing the system using Maximum PC's "How To Properly Benchmark Your PC" pre-flight checklist:

  1. Turn off screen savers
  2. Turn off power saving modes
  3. Disconnect from the network/Internet
  4. Disable antivirus and any other security-related tools
  5. Turn off Windows update
  6. Defrag HDD if needed
  7. Disable System Restore
  8. Reboot: Self explanatory
  9. Wait for the machine to fully boot and log on
  10. Force Windows to process tasks scheduled to run when the system is idle. This is a neat trick I learned from the Maximum PC article, which I think is worth repeating here:
    1. Run "Command Prompt" as Administrator
    2. Type: Rundll32.exe advapi32.dll,ProcessIdleTasks
    3. Wait for disk activity to die down
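
As a side note on the "five runs, take the median" approach used for the synthetic tests, here is a minimal Python sketch of the idea. It assumes the scores have simply been noted down by hand; the function name and the numbers are made up for illustration and are not my results. The point of the median is that a single run disturbed by background activity does not drag the reported figure down the way it would with a mean.

```python
from statistics import median


def representative_score(runs):
    """Return the median of a set of repeated benchmark runs."""
    if not runs:
        raise ValueError("at least one run is required")
    return median(runs)


# Hypothetical scores from five runs (NOT measured results).
# The fourth run is an outlier, e.g. a background task kicked in.
example_runs = [3.01, 2.98, 3.00, 2.41, 3.02]

# The outlier is ignored: the middle value of the sorted runs is reported.
print(representative_score(example_runs))
```

With an odd number of runs the median is always one of the values actually recorded, which is another reason five runs is a convenient choice.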

Real-World Tests

For my real-world tests, I conducted a short play-through of three games I regularly play. Each was performed five times, but without running the system through any special pre-flight checklist (i.e. a "real world" scenario). I configured FRAPS to benchmark over 60 seconds and to record the minimum, maximum and average frame-rate. I also decided to measure the time taken to render each frame during the benchmark, after reading a very intriguing article on The Tech Report site, entitled "Inside the second: A new look at game benchmarking", which investigated the difference between measuring the average FPS and the time taken to render each individual frame. Without going into too much detail, a system that produces a good average frame-rate can still struggle to render the occasional frame, which can result in a jarring experience while gaming. The three games I tested were:

Battlefield 3
The resolution set to 1920x1080, with the graphics options set to "ultra" and vsync on. I ran through a portion of the Operation Swordbreaker stage; in the parking lot, just after being attacked with the RPG.
The Elder Scrolls V: Skyrim
The resolution set to 1920x1080, with graphics options set to "ultra" and vsync on. I found an outdoor location that was near a giant's encampment with a dragon circling overhead and started benchmarking before attacking the giants.
Civilization V
Running in DirectX 11, with the resolution set to 1920x1080, 2x anti-aliasing, 16x anisotropic filtering and vsync on. I loaded a save from a late-stage game I had been playing and benchmarked the graphics performance while taking a turn. As this game really taxes other components in the system as well as the GPU, I decided to measure the time taken to load a saved game and how long the AI moves take to complete. This was probably the least accurate test, as I had to manually time these actions.
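
To illustrate why I bothered with per-frame times as well as the average FPS, here is a rough Python sketch. It assumes the frame-time log is a list of cumulative timestamps in milliseconds (that is my understanding of how FRAPS records them, so treat the format as an assumption), and the data is invented purely to make the point; the function names are my own.

```python
def frame_times_ms(timestamps_ms):
    """Convert cumulative timestamps (ms) into per-frame render times (ms)."""
    return [later - earlier
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


def fps_summary(times_ms):
    """Return the average FPS and the slowest single frame time."""
    total_seconds = sum(times_ms) / 1000.0
    return {
        "average_fps": len(times_ms) / total_seconds,
        "slowest_frame_ms": max(times_ms),
    }


# Made-up log: nine frames at 10 ms each and one 110 ms hitch.
timestamps = [0, 10, 20, 30, 40, 150, 160, 170, 180, 190, 200]

times = frame_times_ms(timestamps)
print(times)
print(fps_summary(times))
```

In this invented example, ten frames are rendered in 200 ms, so the average is a healthy-looking 50 FPS, yet one frame took 110 ms. That single hitch is exactly the kind of stutter the average hides.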

Stability Testing

The final round of tests I needed to carry out was aimed at testing any cores I was able to unlock (and the entire system) for stability.

  • IntelCPUBurn - 100 iterations with the stress test option set to "maximum".
  • Prime95 - the small FFT test run on all available cores overnight and through the next working day, plus the blend test for around 2 hours.

The Unlocking Process

After I had run the above synthetic and real-world benchmarks, I began the unlocking process:

  1. As the core unlocking feature is controlled via the BIOS, the first step was to reboot the machine.
  2. During POST, I pressed "4" on the keyboard (as prompted by the BIOS splash screen).
  3. This resulted in the machine instantly powering off (which I found rather disconcerting!), remaining in that state for a second or two, before powering back on again.
  4. Worryingly, the system did not seem to proceed to display the usual POST messages and instead the screen remained dark. I left this for several minutes before deciding to hit the reset switch on the machine.
  5. Fortunately, this allowed the machine to boot as normal, but this time, the splash screen displayed a message stating "3 cores are activated!" (see the image at the beginning of this blog post).
  6. Booting into Windows and starting Task Manager confirmed an additional core was now available to the operating system. Additionally, CPU-Z identified my processor as a Phenom II X4 B50 (codename "Deneb"), instead of a Phenom II X2 550 (codename "Callisto"), but with only 3 cores:

Before I started my stability tests, I rebooted and entered the BIOS to see the available settings relating to core unlocking, and I even tried running the core unlocker again to see if a 4th core could be activated. However, the result was always the same; just 3 cores could be unlocked. I did consider forcing the BIOS to unlock the 4th core, but I suspected that it remained locked because the BIOS was unable to confirm its stability.

The Results

After the extensive and lengthy stress testing completed without any errors, BSODs or shutdowns, I considered the additional core stable. That's a positive result in itself; I had successfully unlocked a third core on my CPU! In addition, I noticed the highest recorded temperature of the CPU was 52°C. This is much lower than the maximum operating temperature AMD state for the component, so I have some potential headroom if I decide to overclock the chip.

Continuing with the performance testing, I started seeing some interesting results. First, let's take a look at the CPU-bound synthetic tests:

As expected, the multi-threaded tests showed performance gains of up to 50% after enabling the third core (or exactly 50% in the case of Cinebench); great news if I run any CPU-intensive tasks, like encoding video!
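
For reference, the 50% figure is what perfect scaling from two cores to three would give. The Python sketch below shows the arithmetic, including the slightly different calculation needed for time-based benchmarks such as POVRay and wPrime, where lower is better. The function names and sample numbers are hypothetical and are not my measurements.

```python
def ideal_gain_percent(cores_before, cores_after):
    """Best-case gain if the workload scaled perfectly with core count."""
    return (cores_after / cores_before - 1) * 100


def measured_gain_percent(before, after, higher_is_better=True):
    """Percentage gain in throughput between two benchmark results.

    For score-based tests a higher number is better; for time-based
    tests a lower number is better, so the ratio is inverted.
    """
    ratio = after / before if higher_is_better else before / after
    return (ratio - 1) * 100


# Going from 2 to 3 cores can yield at most a 50% gain.
print(ideal_gain_percent(2, 3))

# Hypothetical score-based result (higher is better).
print(measured_gain_percent(2.0, 3.0))

# Hypothetical time-based result in seconds (lower is better).
print(measured_gain_percent(90.0, 60.0, higher_is_better=False))
```

Note that for a timed test, a 50% gain corresponds to the render time dropping by a third (90 seconds to 60 seconds in the made-up example), not by half.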

When comparing the Unigine results, a different story emerges:

A bit disappointing, especially the tri-core posting a lower minimum FPS value than the dual core! Considering that fact, I'm not too sure how the tri-core was awarded a higher score, as the maximum and average FPS values were only marginally higher. Whatever the reason, it doesn't look like the additional core has improved graphical performance at all. I suspect that the Unigine benchmark doesn't benefit from additional cores; it's not a full game that could potentially use additional threads for AI subroutines.

Moving onto the real-world tests, Battlefield 3 produced unimpressive, but interesting results. First, let's take a look at the minimum, maximum and average FPS achieved during each play-through:

Strangely, the tri-core system recorded ever-so-slightly lower maximum and average FPS values, but the minimum FPS value was raised by a similarly small amount. This is confirmed when looking at the time taken to render each frame:

The tri-core system actually renders fewer frames over the 60 second benchmark, but they are rendered at a more consistent rate. In fact, while the dual-core configuration resulted in 4 frames taking over 50ms to render, the 3 slowest frames in the tri-core setup only took just over 40ms. So despite the lower average FPS, the tri-core system should produce a smoother gameplay experience.
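
The comparison above boils down to counting frames that exceed a threshold and looking at the handful of slowest frames. A minimal Python sketch of that check follows; the function names are my own and the frame times are invented for illustration, so they are not the values from my benchmark runs.

```python
def count_slow_frames(times_ms, threshold_ms):
    """Count the frames that took longer than threshold_ms to render."""
    return sum(1 for t in times_ms if t > threshold_ms)


def slowest_frames(times_ms, n):
    """Return the n longest frame times, slowest first."""
    return sorted(times_ms, reverse=True)[:n]


# Invented per-frame render times in milliseconds (NOT measured data).
dual_core = [16, 17, 55, 16, 52, 18]
tri_core = [18, 19, 41, 18, 42, 19]

print(count_slow_frames(dual_core, 50))
print(count_slow_frames(tri_core, 50))
print(slowest_frames(tri_core, 2))
```

In this toy data the second configuration has slightly longer typical frame times (so a lower average FPS) but no frames over the 50ms mark, which is the same pattern I saw in Battlefield 3.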

Skyrim's results are even more positive, with the min/max/avg comparison showing a clear improvement:

The tri-core system posted significantly higher maximum, minimum and average FPS values. In fact, for all 5 repetitions of this test 61 FPS was the maximum, which suggests the machine was hitting the vsync limit and potentially could be rendering faster without it enabled.

Looking at the frame-times, the tri-core configuration produced frames quicker (and therefore more of them in total) for the majority of the sixty second benchmark. However, even with the additional core there was still a single frame that took over 60ms to produce.

Benchmarking Civilization V really made me understand just how much the game pushes my system:

The minimum frame-rate recorded for both the dual and tri-core configurations is extremely low; this is most likely while panning around the map with the mouse. Happily though, there are noticeable performance increases, which the frame-time comparison also shows:

The next two benchmarks are by far the most inaccurately measured; I simply had to use a stop-watch to record the time between clicking the mouse and the operation completing. I estimate that this introduced a one or two second margin of error for the results. Despite this, the tri-core setup appears to have improved saved game load times:

The time taken to process AI moves seems to improve ever so slightly, but I'm not sure the difference is significant enough, given the timing inaccuracies mentioned previously:

Conclusions

Given the results of the real-world tests, I wish I had taken the time to benchmark more games to see which titles have benefited from my tinkering. Since the core unlock, however, I have noticed that if I have Windows Task Manager open while playing certain games (specifically, Crysis and Diablo III), there is significant load displayed on all three CPU cores, leading me to believe that they are taking advantage of the additional processing core.

Overall, I am happy with the results; clearly the third core provides some assistance when playing modern titles. What I'm particularly pleased about is that I'm able to notice the improved performance in Skyrim; the experience does seem smoother.

Given how easy it was to coax a little bit of extra performance out of my system, I'm now considering trying to over-clock the CPU to boost it further. As I mentioned previously, the operating temperature of the CPU was well below the maximum, so I should be able to increase the voltage if necessary.