Top Model With Four Cores - AMD Phenom 9700
Translation: The Spider Spins Its Web.
Today marks a historic occasion for AMD. After delays of more than a year, the company can finally present its new, highly anticipated processor - and not a moment too soon. AMD needs a fresh product. While this CPU was originally meant as a competitor to Intel's Core 2 CPUs, the balance of power in the CPU arena has shifted over the past 18 months. The new processor, dubbed Phenom by AMD, is the first quad-core CPU by AMD and, as the company likes to remind us, the first native quad-core design.
The exhaustion on the faces of our editors in the Munich lab is a testament to the hard work they've put into this article over the past few hours and days. We tested all three models of the new processor, the Phenom 9700, Phenom 9600 and Phenom 9500, running each of them through our benchmark suite. Along with the Phenom processor, AMD is also presenting its "AMD OverDrive" tool.
With the new 7-series chipset family, consisting of the 790FX, 790X and 770, AMD is simultaneously unveiling the Spider platform. Up to four graphics cards can be set up in a CrossFire X configuration using the new 790FX chipset.
All current (AM2) and new (AM2+) motherboards and processors are fully compatible with one another.
Looking towards Eastern Europe: For the actual introduction of its Phenom processor, AMD invited the press to the Polish capital of Warsaw, where the company held a three-day press conference.
Jochen Polster, manager of AMD Germany, opened the event with a keynote addressing the press. For the first time since the acquisition of graphics chip company ATI, AMD is presenting a complete platform consisting of the Phenom processor, the 790FX chipset and the HD3800 graphics card series. With this platform, code-named Spider, AMD aims to offer the basis for a computer that is affordable for everyone.
Jochen Polster emphasised that the Phenom quad-core processor does not represent a high-end model for now. AMD plans to price the Phenom models markedly lower than Intel's quad-core models.
The gaming market has always been a driving force in PC sales. With the 790FX chipset, AMD now offers buyers the possibility of building a system with up to four graphics cards in a CrossFire configuration. The corresponding driver is expected to be released in January 2008.
Since we already covered the HD3800 series of graphics cards in a separate launch article, we will concentrate exclusively on the Phenom quad-core processor and the new 790FX chipset in this article.
The Phenom In Detail - A Revamped Athlon 64
AMD has thoroughly reworked the core of the Phenom processor compared to the Athlon 64, succeeding in raising the number of instructions per clock cycle (IPC). According to AMD, we should expect to see a performance increase of up to 25% at the same clock speed.
Like several of the later Athlon 64 models, the Phenom is manufactured on a 65 nm production process. In its presentation, AMD stated that it will begin transitioning to a 45 nm process starting in 2008. Unlike Intel's quad-core solutions, which consist of two dual-core processors combined in one CPU package, AMD's Phenom uses a single die comprising four cores. The resulting die has an area of 285 mm² and consists of 600 million transistors. That means that the transistor count has more than doubled compared to the Athlon 64 X2, which consisted of 227 million transistors.
The BIOS POST message
The downside to the single-die quad-core approach is a greater risk of manufacturing defects and thus lower yields: if even one of the four cores is defective, the chip can no longer be sold as a quad-core CPU. AMD has a solution for this case, though. The defective core is deactivated, and the processor is sold as a three-core model. In an interview in Warsaw, AMD officially confirmed that the tri-core models are indeed quad-cores with one deactivated core. In the end, this is a boon to the consumer. Intel can only sell a chip with a defective core in the notebook sector, since its desktop line does not include a single-core Core 2 processor; AMD's customers, by contrast, will be able to purchase an inexpensive tri-core CPU. However, for now it is unclear when the Phenom X3 processors will go on sale.
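The yield argument can be made concrete with a simple binomial model. This is a minimal sketch that assumes, purely for illustration, that each core fails independently with the same probability and that defects outside the cores (L3 cache, memory controller) never occur; the 5% per-core defect rate is a hypothetical number, not an AMD figure.

```python
from math import comb


def core_yield_distribution(p_defect: float, cores: int = 4) -> dict[int, float]:
    """Probability that exactly k of `cores` cores are defect-free,
    assuming each core fails independently with probability p_defect."""
    p_good = 1.0 - p_defect
    return {
        k: comb(cores, k) * p_good**k * p_defect ** (cores - k)
        for k in range(cores + 1)
    }


def sellable_fractions(p_defect: float) -> tuple[float, float]:
    """Return (fully working quad-core fraction, tri-core salvage fraction)."""
    dist = core_yield_distribution(p_defect, cores=4)
    return dist[4], dist[3]


if __name__ == "__main__":
    # Hypothetical 5% per-core defect rate, chosen only for illustration.
    quad, tri = sellable_fractions(0.05)
    print(f"fully working quad-cores: {quad:.1%}")
    print(f"salvageable as tri-cores: {tri:.1%}")
    print(f"sellable in total:        {quad + tri:.1%}")
```

With the hypothetical 5% rate, roughly 81% of dies would be fully working quad-cores and about 17% more could be salvaged as tri-cores instead of being scrapped.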
Stars Core Micro-Architecture
While AMD's quad-core processor was still in development, the new micro-architecture was referred to as K10. Now, with the official introduction, it is being rechristened the Stars core micro-architecture.
The last time AMD introduced a completely new micro-architecture was in September of 2003 with the launch of the Athlon 64. During the long development time for the Phenom processor, a great number of alterations were made to the core design, resulting in a performance increase at the same clock speed.
Technology I - Advanced Memory Prefetcher, SSE4a
As our avid readers will undoubtedly remember, Intel introduced the first SIMD extensions to the x86 ISA in the shape of the MMX instruction set. As a countermove, AMD implemented 3DNow! in its own processors. As a result, software did not get the same performance boost on both companies' processors, since it had to be specially optimized for each extension. Thankfully, this kind of competition and incompatibility died down, and the SSE, SSE2 and SSE3 extensions used by AMD and Intel were identical. However, the two chipmakers are now parting ways once again, to the detriment of users and programmers. With the launch of the Penryn core, Intel introduced the SSE4.1 instruction set. AMD, meanwhile, is implementing SSE4a (formerly known as SSE128) in the new Stars core micro-architecture.
The Phenom's SSE unit is being widened to 128 bits, up from the Athlon 64's 64-bit unit. Additionally, AMD is adding four new instructions, namely EXTRQ/INSERTQ and MOVNTSD/MOVNTSS. Two more instructions, LZCNT and POPCNT, are included as well; these are bit-manipulation instructions that count the leading zero bits and the set bits of an operand, respectively.
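To illustrate what LZCNT and POPCNT compute, here is a minimal software emulation for 32-bit operands. It only models the results, not the single-instruction speed that is the point of the hardware versions; the function names, including the `floor_log2` helper, are hypothetical. Note that, unlike the older BSR instruction, LZCNT has a defined result (the operand width) for a zero input.

```python
def lzcnt32(x: int) -> int:
    """Leading-zero count of a 32-bit value; 32 for x == 0, like LZCNT."""
    x &= 0xFFFFFFFF  # treat the input as an unsigned 32-bit register value
    return 32 - x.bit_length()


def popcnt32(x: int) -> int:
    """Number of set bits in a 32-bit value, like POPCNT."""
    return bin(x & 0xFFFFFFFF).count("1")


def floor_log2(x: int) -> int:
    """Typical use of a leading-zero count: integer log2 of a 32-bit value."""
    if not 0 < x <= 0xFFFFFFFF:
        raise ValueError("x must be a positive 32-bit value")
    return 31 - lzcnt32(x)


if __name__ == "__main__":
    for value in (0, 1, 0xFF, 0x80000000):
        print(hex(value), "lzcnt:", lzcnt32(value), "popcnt:", popcnt32(value))
```

Workloads such as bitboard-based chess engines or sparse bit sets run exactly these operations in tight loops, which is why a dedicated instruction pays off there.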
Sadly, Intel's SSE4.1 and AMD's SSE4a are incompatible with one another - a fact that may soon cause problems for programmers and users alike.
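Because the two extensions are incompatible, software that wants to use either one has to check at runtime which one the CPU supports, typically with the CPUID instruction. The sketch below decodes the relevant feature bits from CPUID register values that have already been read (Python cannot execute CPUID itself). The bit positions follow the vendors' CPUID documentation; the code-path names and the order of preference are hypothetical.

```python
from dataclasses import dataclass

# Feature bits in the ECX register returned by CPUID.
SSE41_BIT = 19      # leaf 0x00000001
POPCNT_BIT = 23     # leaf 0x00000001
ABM_LZCNT_BIT = 5   # leaf 0x80000001 (AMD "ABM", covers LZCNT)
SSE4A_BIT = 6       # leaf 0x80000001


@dataclass(frozen=True)
class Features:
    sse41: bool
    sse4a: bool
    popcnt: bool
    lzcnt: bool


def _bit(reg: int, n: int) -> bool:
    return bool((reg >> n) & 1)


def decode_features(leaf1_ecx: int, ext_leaf1_ecx: int) -> Features:
    """Extract SSE4-related flags from the two CPUID ECX values."""
    return Features(
        sse41=_bit(leaf1_ecx, SSE41_BIT),
        popcnt=_bit(leaf1_ecx, POPCNT_BIT),
        sse4a=_bit(ext_leaf1_ecx, SSE4A_BIT),
        lzcnt=_bit(ext_leaf1_ecx, ABM_LZCNT_BIT),
    )


def pick_code_path(f: Features) -> str:
    """Choose a (hypothetical) optimized kernel; the priority is arbitrary."""
    if f.sse41:
        return "sse41_kernel"
    if f.sse4a:
        return "sse4a_kernel"
    return "sse2_fallback"


if __name__ == "__main__":
    # Synthetic register values containing only the relevant bits.
    phenom_like = decode_features(1 << POPCNT_BIT, (1 << SSE4A_BIT) | (1 << ABM_LZCNT_BIT))
    penryn_like = decode_features(1 << SSE41_BIT, 0)
    print(pick_code_path(phenom_like), pick_code_path(penryn_like))
```

Every optimized routine therefore needs up to three variants (SSE4.1, SSE4a and a fallback), which is exactly the extra burden on programmers the split creates.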
The advanced memory prefetcher can load data directly from the RAM to the core's L1 cache without needing to take a detour through the L2 cache first. Thus, the data can be loaded into the processor with a much lower latency. Simultaneously, this also results in a lower load on the L2 cache, which can instead buffer data more efficiently, in turn translating into an overall performance boost.
Furthermore, the prefetcher identifies recurring access patterns and can fetch the corresponding data before it is requested.
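A common way to detect such recurring patterns is a stride prefetcher: if successive accesses keep advancing by the same distance, the next addresses can be predicted. The following is a minimal textbook-style sketch, not AMD's undisclosed design; the class name and the `confirm` and `depth` parameters are hypothetical.

```python
from typing import Optional


class StridePrefetcher:
    """Toy stride prefetcher: once the same address delta has been seen
    `confirm` times in a row, predict the next `depth` addresses."""

    def __init__(self, confirm: int = 2, depth: int = 2) -> None:
        self.confirm = confirm
        self.depth = depth
        self.last_addr: Optional[int] = None
        self.stride: Optional[int] = None
        self.matches = 0  # consecutive accesses that followed self.stride

    def access(self, addr: int) -> list[int]:
        """Record a demand access and return the addresses to prefetch."""
        prefetches: list[int] = []
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if delta == self.stride:
                self.matches += 1
            else:
                # New pattern: remember the delta and start counting again.
                self.stride = delta
                self.matches = 1
            if self.stride != 0 and self.matches >= self.confirm:
                prefetches = [addr + self.stride * i for i in range(1, self.depth + 1)]
        self.last_addr = addr
        return prefetches


if __name__ == "__main__":
    p = StridePrefetcher()
    for a in (1000, 1064, 1128, 1192):  # walking an array in 64-byte steps
        print(a, "->", p.access(a))
```

After two identical 64-byte steps, the sketch starts requesting the following cache lines ahead of time, so a loop walking through an array finds its data already loaded.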
x86 instructions are between 1 and 15 bytes long. Compared to the Athlon 64 core, the buffer for fetching instructions was increased to 32 bytes, allowing the core to fetch more instructions per clock cycle. Thus, as you can see in our diagram, up to three instructions can be processed at the same time, depending on their length.
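A simplified model shows why the window size matters: only whole instructions that fit into one fetch block can be handed to the three decoders. The sketch ignores alignment and instructions that straddle block boundaries, and it assumes a 16-byte window for the Athlon 64 as the comparison point; the function name is hypothetical.

```python
MAX_X86_LEN = 15  # architectural upper limit for one x86 instruction


def instructions_per_fetch(lengths: list[int], window: int, decode_width: int = 3) -> int:
    """Count how many whole instructions from the start of `lengths` fit
    into one fetch window of `window` bytes, capped at `decode_width`."""
    used = 0
    count = 0
    for n in lengths:
        if not 1 <= n <= MAX_X86_LEN:
            raise ValueError(f"invalid x86 instruction length: {n}")
        if count == decode_width or used + n > window:
            break
        used += n
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical instruction streams with long instructions (e.g. SSE with
    # prefixes, displacements and immediates).
    for stream in ([7, 7, 7], [12, 12, 12]):
        print(stream,
              "16-byte window:", instructions_per_fetch(stream, 16),
              "32-byte window:", instructions_per_fetch(stream, 32))
```

With short instructions, a 16-byte block already feeds all three decoders; the wider window pays off for long instructions, where three 7-byte instructions need 21 bytes.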
Technology II - Branch Prediction, Stack Counter
Object-oriented programming languages such as C++, Delphi and Java cause the most problems for branch prediction units. When a branch occurs in the machine code, the question is not only whether or not the jump is taken, but also which address it jumps to: indirect branches, such as virtual function calls, can lead to a different target each time they execute. AMD has analyzed the current crop of compilers and tweaked its branch prediction logic to increase the likelihood that the processor predicts the correct target. This allows many programs to execute faster.
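The simplest form of indirect branch prediction remembers the last target each branch jumped to. The sketch below models such a "last target" table for a virtual call site; it is a generic textbook technique, not AMD's actual predictor, and the table size, addresses and names are hypothetical.

```python
from typing import Optional


class IndirectTargetPredictor:
    """Toy predictor: for each indirect branch, guess the last target seen."""

    def __init__(self, entries: int = 512) -> None:
        self.entries = entries
        self.table: dict[int, int] = {}

    def _index(self, branch_addr: int) -> int:
        # Direct-mapped indexing; different branches may alias to one entry.
        return branch_addr % self.entries

    def predict(self, branch_addr: int) -> Optional[int]:
        return self.table.get(self._index(branch_addr))

    def update(self, branch_addr: int, target: int) -> None:
        self.table[self._index(branch_addr)] = target


def count_correct(trace: list[tuple[int, int]], entries: int = 512) -> int:
    """Replay (branch address, actual target) pairs; count correct guesses."""
    predictor = IndirectTargetPredictor(entries)
    correct = 0
    for branch_addr, target in trace:
        if predictor.predict(branch_addr) == target:
            correct += 1
        predictor.update(branch_addr, target)
    return correct


if __name__ == "__main__":
    CALL_SITE = 0x401000                    # hypothetical shape->draw() call
    CIRCLE_DRAW, SQUARE_DRAW = 0x5000, 0x6000
    trace = [(CALL_SITE, CIRCLE_DRAW)] * 3 + [(CALL_SITE, SQUARE_DRAW)] * 2
    print(count_correct(trace), "of", len(trace), "predicted correctly")
```

The first call and every change of object type cause a misprediction; any pipeline restart avoided beyond that is where a better predictor earns its performance gain.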
The ESP register holds the stack pointer, i.e. the address of the current top of the stack, a memory area that buffers return addresses, function parameters and local data. Instructions such as PUSH, POP, CALL and RET change ESP implicitly. Until now, the processor had to execute additional micro-ops to perform these ESP updates, which cost processor time. AMD's Phenom now comes with a sideband counter that tracks these changes independently during decoding and automatically adjusts the ESP register. Thus, the instructions for updating ESP no longer have to be executed separately, speeding up overall program execution.
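The idea behind such a sideband counter can be sketched as follows: instead of issuing an ESP update for every stack instruction, the decoder accumulates the pending offset and only emits one synchronizing update when an instruction reads ESP explicitly. This is a simplified model assuming 32-bit code with 4-byte pushes; the operation names are hypothetical labels, not real decoder internals.

```python
# ESP change caused implicitly by each stack instruction (32-bit code).
STACK_OPS = {"push": -4, "pop": +4, "call": -4, "ret": +4}


def esp_update_uops(trace: list[str], sideband: bool) -> int:
    """Count the micro-ops spent updating ESP for a stream of operations.

    "read_esp" stands for any instruction that uses ESP explicitly, which
    forces the tracked offset to be written back first."""
    uops = 0
    pending = 0  # offset tracked by the sideband counter, not yet in ESP
    for op in trace:
        if op in STACK_OPS:
            if sideband:
                pending += STACK_OPS[op]  # tracked in the decoder, no uop
            else:
                uops += 1                 # one explicit ESP update per op
        elif op == "read_esp" and sideband and pending != 0:
            uops += 1                     # single synchronizing update
            pending = 0
        # Other operations do not touch ESP in this model.
    return uops


if __name__ == "__main__":
    trace = ["push", "push", "call", "read_esp", "ret", "pop", "pop"]
    print("without sideband counter:", esp_update_uops(trace, sideband=False))
    print("with sideband counter:   ", esp_update_uops(trace, sideband=True))
```

In this model, six separate ESP updates collapse into a single synchronization, because only one instruction actually needs to see the current ESP value.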
Technology III - Virtualisation, L3-Cache, HTT 3.0
The virtualisation functionality integrated in the Phenom processor has also received a notable performance boost. Now, operating systems in virtual environments can interact directly with the hypervisor, the management software for virtual machines. This reduces the switching times between the hypervisor and the virtual machines.
This functionality is already found in the Barcelona-based Opteron processors of the server segment. Since both processors, i.e. the Phenom and the new Opteron, use the same core design, this function can now be used on the desktop without limitation as well.
Because AMD uses a single-die design for its quad-core processor, all four cores can share a common L3 cache. Each of the four cores has its own 512 kB L2 cache, and in addition all cores have access to the same data pool through the 2 MB L3 cache. Data that several cores work on therefore does not have to be fetched from system memory separately for each core, which leads to an additional increase in performance.
Additionally, the L3 cache acts as a write buffer for the system memory, which also brings a small performance increase with it.
Whereas the Athlon 64 X2 uses the HyperTransport 2.0 interface, the Phenom processor is paired with the faster HyperTransport 3.0, which increases the available bandwidth to up to 20.8 GB/s and can result in better 3D performance. This is an especially important factor when a multi-card CrossFire setup is used.
HyperTransport versions and their respective bandwidths (16-bit link, both directions combined; effective transfer rate):
- Version 1.0: 6.4 GB/s, 1600 MT/s
- Version 2.0: 8.0 GB/s, 2000 MT/s
- Version 3.0: 20.8 GB/s, 5200 MT/s
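The bandwidth figures follow from simple arithmetic: transfers per second times link width in bytes, counted once per direction. The sketch below assumes a 16-bit link in each direction, the configuration these aggregate numbers are usually quoted for. By this formula, HyperTransport 3.0's 20.8 GB/s corresponds to 5,200 MT/s, while a link running at 3,600 MT/s delivers 14.4 GB/s.

```python
def ht_aggregate_bandwidth_gbs(transfer_rate_mts: int, link_width_bits: int = 16) -> float:
    """Aggregate HyperTransport bandwidth in GB/s (both directions).

    transfer_rate_mts: effective transfers per second, in millions (MT/s).
    link_width_bits:   width of the link in each direction (16 assumed)."""
    bytes_per_transfer = link_width_bits / 8
    per_direction_gbs = transfer_rate_mts * bytes_per_transfer / 1000
    return 2 * per_direction_gbs


if __name__ == "__main__":
    for label, rate in [("1.0", 1600), ("2.0", 2000), ("3.0 @ 3600 MT/s", 3600),
                        ("3.0 @ 5200 MT/s", 5200)]:
        print(f"HT {label}: {ht_aggregate_bandwidth_gbs(rate):.1f} GB/s")
```

Quoting "MHz" for these rates is common in marketing material, but since data moves on both clock edges, the underlying link clock is half the transfer rate.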
Since the HyperTransport protocol is backward compatible, users will be able to plug the Phenom into older AM2 socket boards as well as the new AM2+ generation.
Direct AMD Comparison - Phenom And Athlon X2
In the following table we compare the most important technical characteristics of the Phenom and the Athlon X2.
The bottom of the Phenom CPU.
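Phenom vs. Athlon X2
| | Phenom | Athlon X2 |
|---|---|---|
| Clock speed | max. 2.30 GHz | max. 3.20 GHz |
| L1 cache (instruction + data) | 4x 64+64 kB | 2x 64+64 kB |
| L2 cache | 4x 512 kB | 2x 1 MB or 2x 512 kB |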
The Phenom processor carries the code name Agena and uses the B2 stepping.
Phenom - Models
Three different CPU models were mentioned in the first slide of the presentation we received in preparation for AMD's event.
| Model | Clock speed | L2 cache |
|---|---|---|
| AMD Phenom 9700 | 2.40 GHz | 4x 512 kB |
| AMD Phenom 9600 | 2.30 GHz | 4x 512 kB |
| AMD Phenom 9500 | 2.20 GHz | 4x 512 kB |
The first slide of AMD's Phenom presentation
We tested all three models extensively.
AMD Phenom 9700
AMD Phenom 9600
AMD Phenom 9500
However, once the presentation reached the point where specific models were mentioned, AMD only spoke about two of the models.
Now there's only talk of two CPUs...
As you can imagine, the journalists present asked why AMD's presentation was limited to the Phenom 9500 and 9600 models and what had happened to the 9700...
Phenom Caught A Processor Bug - Remembering The Pentium 60
A statement about the Phenom 9700
Somehow, we couldn't help but think back to the P60 as Dave Everitt informed the audience that a processor bug had found its way into the early Phenom processor samples. The bug causes the system to freeze when a certain combination of instructions coincides with extraordinarily high traffic.
This bug can only be reproduced in the lab but does not occur under normal, real-world conditions. It is still present in the 2.20 GHz and 2.30 GHz versions of the Phenom (9500 and 9600).
As a result of this bug, the 2.4 GHz version of the Phenom with the model number 9700 has been pushed back to January of 2008. When it comes out, that version will not contain the bug.
This turn of events caught both the press and AMD's employees by surprise, as nothing about the bug had been known before the launch event. Currently, many online stores still list the Phenom 9700, but any attempt to order it should end in a cancellation by the retailer.
Still listed - Phenom 9700
We did not encounter any crashes or instabilities with the CPU we received for testing.
We should mention that bugs like these are nothing extraordinary; they are a comparatively common occurrence. Processor makers list them in so-called errata, which detail the specifics of each bug. In most cases, the error is fixed in the next stepping of the CPU without the user ever knowing it existed in the first place. Intel, for example, details how the errata on each of its processors can be provoked to cause an exception or error. In other words, the fact that the first batch of Phenom processors has a bug shouldn't be dwelled on all that much.