Finally, I turned to the GCC team for information on how the compiler may help many applications get “free” performance when they are built with a Core 2/SSE4/Penryn-aware compiler. I quickly got a response to the same SSE4 inquiry from developers H. J. Lu and Jan Hubicka:
H. J.: “The main benefit of SSE4 is the wider vectorizer support. More loops can be vectorized when SSE4 is enabled. The performance boost from SSE4 vectorizer varies, depending on applications.”
“On the other hand, processors with SSE4 support run faster than the current Core 2 Duo at the same clock speed, due to other architecture improvements. That means even if you just use plain -O2 with gcc 4.3, your executables will run faster on processors with SSE4 support than ones compiled with gcc 4.2 or older.”
Jan: “The generic model works well for core2 (as well as for AMD chips), so distros compiled with GCC 4.2 or 4.3 will work better on those chips. (Originally most distros were optimized for i686 for 32-bit and for K8 for 64-bit; the second especially results in quite big performance losses for core2.)”
“When you compare -mtune=generic relative to -march=core2, the main benefits, as pointed out by H.J., come from SSE4 support and auto-vectorization. Integer codegen differs just a little, but it might accumulate to noticeable speedups for some specific codebases.”
With that answer came a few follow-up questions.
Linux Hardware: Will GCC do vectorization internally or is this something the developer will have to code manually?
H. J.: “You only need to add -ftree-vectorize -msse4.1.”
Jan: “There is -ftree-vectorize for automatic vectorization and SSE intrinsics for writing SSE code by hand.”
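To make the two routes Jan mentions concrete, here is a minimal sketch in C. It is not taken from the developers; the function names (scale_add, mul_i32) and the file name in the comment are hypothetical. The first function is a plain loop of the kind the auto-vectorizer may pick up with -ftree-vectorize. The second does the same sort of work by hand with an SSE4.1 intrinsic (_mm_mullo_epi32, the 32-bit integer multiply added in SSE4.1) and falls back to scalar code when the file is not built with -msse4.1.

```c
/* Sketch only. One possible build line, using the flags quoted above:
 *   gcc -std=c99 -O2 -ftree-vectorize -msse4.1 -c vec_demo.c
 * (vec_demo.c is a hypothetical file name.) Without -msse4.1 the file
 * still compiles, and mul_i32 uses only its scalar loop. */
#include <stddef.h>
#include <stdint.h>

#ifdef __SSE4_1__
#include <smmintrin.h>   /* SSE4.1 intrinsics */
#endif

/* Route 1: a plain loop. "restrict" tells the compiler the arrays do
 * not overlap, which makes the loop a candidate for auto-vectorization. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * k + b[i];
}

/* Route 2: hand-written SSE4.1. Multiplies 32-bit integers four at a
 * time when SSE4.1 is enabled, then finishes any leftover elements
 * (or the whole array, when SSE4.1 is not enabled) with scalar code. */
void mul_i32(int32_t *out, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
#ifdef __SSE4_1__
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_mullo_epi32(va, vb));
    }
#endif
    for (; i < n; i++)
        out[i] = a[i] * b[i];
}
```

Calling mul_i32(out, a, b, 6) on two six-element arrays gives the same element-wise products whether or not the SSE4.1 path is compiled in; only the instructions used differ. Whether the compiler actually vectorizes scale_add depends on the GCC version and flags, so treat it as a candidate rather than a guarantee.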
LH: Will there be a "penryn" -march option in GCC or will people need to specify -msse4 manually?
H. J.: “There is no penryn -march switch since penryn isn't a product name. You should use -msse4.1. BTW, -msse4 means -msse4.1 -msse4.2. You should use -msse4.1 for Penryn class processors since they only support SSE4.1.”
Following this, there was some discussion about the “nocona” arch name and how it wasn't a product either. There was some final concession that there might be a “core2-sse4.1” arch.
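The difference between -msse4.1 and -msse4 that H. J. describes is visible to C code through GCC's predefined macros __SSE4_1__ and __SSE4_2__. The following is a small sketch, not something from the interview, and the function name sse4_level is made up for illustration. It reports which SSE4 level a file was compiled for, which is one way to confirm that a Penryn build was given -msse4.1 rather than -msse4.

```c
/* Sketch only: reports the SSE4 level selected at compile time.
 * sse4_level is a hypothetical helper name.
 *   returns 2 if built with SSE4.2 enabled (e.g. -msse4 or -msse4.2)
 *   returns 1 if built with SSE4.1 only    (e.g. -msse4.1)
 *   returns 0 if SSE4 is not enabled */
int sse4_level(void)
{
#if defined(__SSE4_2__)
    return 2;   /* SSE4.2 was enabled for this build */
#elif defined(__SSE4_1__)
    return 1;   /* SSE4.1 only: the level Penryn supports */
#else
    return 0;   /* no SSE4 code generation enabled */
#endif
}
```

This is a compile-time answer only: it tells you what the compiler was allowed to generate, not what the processor running the program supports.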
LH: Is there an easy way to tell which optimizations will be enabled (by default) for each -march?
H. J.: “# info gcc, and search for it.”
LH: How far are we from an official 4.3 release? Is the current pre-release snapshot reasonably safe to use?
H. J.: “No one knows when 4.3 will be released. I have been tracking stability and performance of gcc 4.3. It is quite good. But we need more tests with real applications. I would encourage users to try pre-release snapshot and report any issue with a gcc bug report.”
Jan: “Well, this is quite a difficult question ;) There are 170 bugs classified as serious regressions, while the 4.2.2 release has roughly 150 known bugs in this category. Whether it is safe enough for your use depends on your expectations. I use it for daily work just to get it tested and it is mostly fine.”
After that discussion, I felt very hopeful about what Penryn will bring to Linux, thanks to the work of the team on the GCC project. Not only are they confident that the new features of the processor will provide noticeable performance improvements, but work has also gone into the compiler so that the “core2” architecture flag, used along with “-msse4.1”, gets the most out of Penryn's architectural features.