Using GPUs as general purpose processors is not a new concept. What’s new is the push to use these specialized chips for tasks well beyond graphics. Read on for an overview of GPGPU computing and how it applies to gamers.

The modern computer seems to be used more now as a vehicle for connecting to the rest of the world than as the instrument of computational aid that it was originally intended to be. Although it is probably not used for such a purpose as often as it should be, the computer sitting in front of you right now is quite capable of carrying out complex mathematical and scientific calculations at speeds that, just 10 years ago, would have seemed impossible. There are, however, computers out there that are used strictly for the computational purposes for which they were developed. These machines, often housed in expansive, air-conditioned rooms in the fine establishments of the academic world, differ in one rather fundamental way from the personal computers that reside in our homes.

The parallel architecture used in modern supercomputers is actually not much different from the technology used in most desktop-level PCs. Indeed, modern supercomputers are really just clusters of SIMD (Single Instruction, Multiple Data) processors, not unlike those used in the consumer market. The aforementioned “cluster”, however, is what allows supercomputers to achieve levels of massive parallelism not found in the personal computer. Interconnects and shared memory architectures implemented in the top modern supercomputers (currently IBM’s Blue Gene/L, housed at the Lawrence Livermore National Laboratory) are more reminiscent of server farms and network rendering setups than any modern personal computer. The key here is not only the number of processors involved, but the way in which they collaborate.

The fastest x86 (the instruction set used by essentially all personal computers) processors from AMD or Intel, in multi-processor server configurations, are capable of just about 60 GFLOPS. FLOPS, or Floating Point Operations per Second, is the most commonly used measure of the sheer computational power of computing systems. The SI system of prefixes applies to FLOPS, and as such one GFLOPS is simply a billion FLOPS. By contrast, the fastest modern supercomputers are capable of 280 TFLOPS, or roughly 4,700 times more FLOPS than the fastest PC. When Sony’s PlayStation 3 game console was announced, it was well-publicized that the system was capable of pushing 2 TFLOPS. The Cell Broadband Engine processor used in the PS3 itself was capable of a little over 200 GFLOPS, still more than three times faster than the most capable PC.
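
As a rough check on those comparisons, here is a minimal Python sketch of the prefix arithmetic. It uses only the round figures quoted above, which are themselves approximations, and the helper name `to_flops` is ours, made up purely for illustration.

```python
# SI prefixes as they apply to FLOPS (floating point operations per second).
SI_PREFIX = {"G": 1e9, "T": 1e12}


def to_flops(value, prefix):
    """Convert a figure such as 60 GFLOPS into plain FLOPS."""
    return value * SI_PREFIX[prefix]


# Round numbers taken from the article; treat them as approximations.
pc = to_flops(60, "G")              # fastest multi-processor x86 setup
supercomputer = to_flops(280, "T")  # fastest supercomputer
cell = to_flops(200, "G")           # Cell Broadband Engine in the PS3

print(f"Supercomputer vs. PC: {supercomputer / pc:,.0f}x")
print(f"Cell vs. PC: {cell / pc:.1f}x")
```

With these inputs the supercomputer comes out a few thousand times ahead of the PC, and the Cell a little over three times ahead.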

When you look at the math for the supposed performance capability of the PS3, you notice that somewhere along the line 1.8 TFLOPS of performance is unaccounted for. According to Sony marketing material, this missing horsepower manifests itself in the Reality Synthesizer, also known as the RSX graphics processor supplied by NVIDIA. A graphics processor pushing 1.8 TFLOPS? You know what they say: if it sounds too good to be true, it probably is. Independent tests, and even tests conducted internally by Sony, have revealed that the actual floating point performance of the RSX is more in the 300 GFLOPS neighborhood. This was certainly not the kind of news that champions of the PS3 had wanted to hear, but the number was still an extremely significant one. The RSX in the PS3, for all intents and purposes, is just a slightly modified GPU not unlike those found in most gaming PCs. In fact, the RSX is only half as powerful as many of the very fastest GPUs used in today’s gaming PCs. The R600 GPU used in AMD’s Radeon HD2900XT graphics card is capable of just over 600 GFLOPS, with NVIDIA’s G80 not far behind. Two of these cards in CrossFire or SLI breach the TFLOPS range.

The Bigger Picture

It really is not even slightly important how powerful certain GPUs are compared to others. When it comes down to it, FLOPS performance is a pretty poor means with which to compare the gaming performance of a graphics card, so any mention of this number in an actual graphics card review should be taken with a grain of salt. What is very important about these numbers, however, is that they are A LOT higher than the numbers produced by modern CPUs. The current best consumer-level desktop processor, the Core 2 Extreme QX6850 from Intel, is capable of somewhere in the neighborhood of 30 GFLOPS. The R600 GPU is capable of 600 GFLOPS. This means that the R600 is 20 times more powerful than the QX6850 at floating point calculations. The Cell B/E used in the PS3 pushes just over 200 GFLOPS, just a third of the R600’s potential.

Of course, by taking a step back from the situation, it is pretty clear that the publicized FP capabilities of these processors are a far cry from the realized values. Evidence of this can be found by taking a look at the statistics for the distributed computing program Folding@Home (F@H). As a quasi-supercomputer, the F@H network ranks pretty high on the list of top overall computing power devoted to a single cause. This overall computing power is broken down by F@H client, which varies depending on the platform it runs on. Windows-based PCs are currently folding at a rate of 165 TFLOPS. There are 173,075 CPUs currently working to produce this statistic. That means that each CPU is contributing about 0.95 GFLOPS to the F@H network. PS3 systems are currently folding at a rate of 437 TFLOPS, with 24,162 active systems. Each PS3 is contributing about 18 GFLOPS to the F@H network. The only GPU series that supports F@H is the R5XX from ATI. This core, which is used in Radeon X1000 class graphics cards, features 48 programmable pixel shaders (R580) that are particularly well-suited for molecular dynamics calculations. There are currently 663 ATI GPUs folding at a rate of 39 TFLOPS. This means that each GPU is contributing an average of 59 GFLOPS, roughly a 60x increase over traditional CPUs. So why don’t we see GPUs powering all of our applications if they’re so much better? For an answer to that, we have to go back to the very origins of the GPU itself.
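
The per-device figures above are just total throughput divided by the number of active clients. A minimal Python sketch of that arithmetic follows, using the F@H statistics quoted in the paragraph; the function and dictionary names are ours, chosen only for illustration.

```python
def gflops_per_device(total_tflops, devices):
    """Average contribution of one device, in GFLOPS."""
    return total_tflops * 1000 / devices


# (total TFLOPS, active devices) as quoted in the article.
clients = {
    "Windows CPU": (165, 173_075),
    "PS3": (437, 24_162),
    "ATI GPU": (39, 663),
}

per_device = {
    name: gflops_per_device(tflops, count)
    for name, (tflops, count) in clients.items()
}

for name, gflops in per_device.items():
    print(f"{name}: {gflops:.2f} GFLOPS per device")

# Ratio behind the "roughly 60x" comparison between GPUs and CPUs.
gpu_vs_cpu = per_device["ATI GPU"] / per_device["Windows CPU"]
print(f"GPU vs. CPU: about {gpu_vs_cpu:.0f}x")
```

Keep in mind that these are averages over whatever hardware happens to be folding at the time, so they say more about realized throughput in one application than about peak capability.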

1981
As computer graphics began to develop beyond that flashing white underscore that practically doesn’t exist today, the need for a dedicated processor to handle the graphical operations associated with various computing tasks began to emerge. The earliest graphics cards were single color, used 4KB of video memory, and were very effective at displaying text on a screen. All GPUs used on graphics cards produced for a period of about 13 years after 1981 were rather primitive parallel processors that were used to draw geometry on the screen. In 1995, companies like ATI, S3, and Matrox introduced the first 2D/3D graphics chips, which were extremely powerful and far more capable of producing 3D images on the screen than the CPUs of the era. The whole purpose of the GPU was to take a workload from the CPU and process it so the CPU did not have to. GPUs were designed to be good at floating point operations, such as those common in 3D functions. As such, they really were not very good at other kinds of calculations. That was then.

Twelve years of development later, the GPU has transformed into something that barely resembles the single purpose chip that it was in 1995. The addition of pixel shaders, essentially primitive processors in their own right, and the slow (now pretty much complete) transition to a unified shader architecture, in which no shader has a fixed task and each can do whatever needs to be done, have transformed the GPU into, basically, a multi-processor system on a single chip; a supercomputer on a chip, if you will. So, although GPUs have been around for more than 20 years, and the idea that GPUs could be used for more than just graphics processing has been around since about 2000, the GPU architecture itself has only been truly hospitable to these kinds of calculations for about four years. So now that it’s here, what are we planning to do with it?

Best Laid Schemes
The whole idea that GPUs can be used for more than just graphics processing has been given the label GPGPU, General Purpose Graphics Processing Unit. The most public application of GPGPU technology thus far is the Folding@Home campaign, launched in September of 2006. However, GPUs have been used in scientific and medical applications for quite a few years now. NVIDIA’s Tesla GPU Computing Solutions are the first mainstream hardware products devoted to GPGPU work. Tesla GPUs should be pumping out calculations at previously unheard-of rates for far more companies and individuals than ever before. Now supercomputer-level floating point performance is available to the masses, rather than the few who are fortunate enough to run their applications on real supercomputers. Scientific calculations and highly complex 3D rendering will probably be the forte of Tesla GPUs for the time being. Unfortunately, those holding out for more everyday applications of the GPGPU concept might still have a bit of a wait ahead of them.

It is true that GPU-based physics processing is right around the corner. However, GPU physics was announced originally by Havok with their Havok FX engine, which had been in development since 2005. In March of 2006 at the Game Developers Conference in San Francisco, NVIDIA, in a joint announcement with Havok, announced that GPU-based physics would be possible on SLI GeForce hardware in the near future. Needless to say, more than a year has passed without much news on the topic beyond a couple of NVIDIA tech demonstrations. Despite Havok’s attempts at creating middleware for physics computing on a GPU, the technology we were excited about more than a year ago still is not here.

After reviewing several white papers and technology presentations from the GPGPU organization, it becomes quite apparent that the main thing holding back the development of more applications for GPUs is the difficulty of programming such applications. Since standard code based on the x86 instruction set cannot (yet) be run on GPUs, any developer making an application for the GPU needs to do so from the ground up. Unfortunately, nobody at FPS Labs has the faintest knowledge of programming beyond HTML and PHP, so we can’t really elaborate on the troubles that developers face when creating applications to run on GPUs. What we can say, however, is that there are steps being taken by graphics card manufacturers such as NVIDIA, who have developed something called CUDA, which includes a C compiler and development environment that should aid in coding applications to run on GPUs.

Beyond physics, the only other really promising application of GPGPU technology for gamers is AI calculations. With games becoming increasingly complex and ‘smart’, AI is becoming more demanding as well. The calculations necessary to run the AI are becoming a burden for even the fastest CPUs. Since AI calculations are basically floating point algorithms, GPUs would potentially handle such a load much more easily than current hardware.

The Battle for Physics
When NVIDIA and Havok announced their partnership at the Game Developers Conference in 2006, another company was rolling out a product that had been in development since 2002. AGEIA’s PhysX PPU launched into retail channels shortly after GDC 2006, and has been the ONLY hardware-based physics accelerator on the market ever since. When it launched, there was a great deal of skepticism in the enthusiast crowd. These doubts centered mostly on the affordability of low-end NVIDIA GPUs that would supposedly be able to do exactly the same thing as the PPU, with the added bonus of being able to do normal graphics duty when not needed for physics. Although this NVIDIA/Havok technology has yet to break onto the scene in any sort of purchasable form, the mere idea that it exists has likely hurt AGEIA’s success with what is otherwise one of the most innovative products released in the last five years.

AGEIA’s PhysX PPU, for all intents and purposes, is a very useful product for calculating the game physics in applicable games. The problem is, however, that there are currently very few games on the market that truly make use of AGEIA’s PhysX technology. An even bigger problem is that the games that are currently out have only recently been released, meaning that the PhysX PPU was dominating a market that had no real demand whatsoever. It cannot be disputed that the PhysX PPU is an amazing performer in games like CellFactor: Revolution that were coded from the ground up using the PhysX API. However, it also cannot be disputed that the games that are really going to take the PhysX PPU or other forms of hardware physics acceleration into the limelight have yet to be released.

With NVIDIA’s recent launch of its Tesla family of products, a reasonable analyst has to think that NVIDIA’s first real step into the Havok FX world is right around the corner. Their latest GPU, the G80 core, can be considered one of the most programmable and powerful graphics cores developed to date. Its unified shader architecture, the basis for what NVIDIA calls CUDA, means that all of the stream processors on the G80 core can work together on a single task and complete it up to 100 times more efficiently than a traditional CPU. Nobody doubts the computational ability of the G80 core, and nobody doubts the ability of big green to push their development package into center stage, but we have already been through more than a year of no physics capabilities from NVIDIA hardware, and nobody really seems to be doing anything about it.

The big problem with this battle for physics is that it doesn’t seem to be happening. After GDC 2006, we kind of expected one of the two main combatants - AGEIA or NVIDIA - to come out as a clear winner in the near future. More than a year later, that still has not happened. Now there are more players in the game. Although never really "out" of the game so to speak, ATI has now become a strong player in the GPGPU/GPU Physics game. Using three ATI graphics cards in a triple-CrossFire setup - something that is possible on many CrossFire chipsets with three x16 PCI-Express ports - should yield a strong CrossFire graphics setup with one GPU dedicated to physics calculations. It would be nice if we could see this concept in action, but it’s cool nonetheless. AMD/ATI’s future plans for something called "Fusion" are also a potentially huge factor in the future of in-game physics. AMD’s initiative to integrate the GPU architecture onto the same die as the CPU not only promises to speed up the communication between the two processors (obviously), but also to bring about x86 instructions as a standard for GPU applications. Finally, Intel and their recently formed graphics division will probably be a pretty strong contender for top honors in GPU performance by the end of next year. Since Intel started the whole x86 ISA, it is reasonable to believe they would like to put their GPUs to work on the instruction set as well.

In the end, there is a ton of potential for general purpose graphics processing units, but there is limited ability to use that potential. A hodgepodge of different architectures and development environments is the problem that currently plagues the rapid development of GPGPU applications. Physics engines such as Havok FX, which touted more than a year ago its ability to run to full effect on NVIDIA graphics hardware, have yet to prove themselves in the face of battle. Meanwhile, a whole new kind of specialized chip has been reaping the benefits of accelerated in-game physics when a GPU is supposed to be able to do it just as well. For now, nothing is certain about the future of GPGPU. One thing that is certain, however, is that we, as a collective community, need to get organized and get behind the drive to push GPGPU computing forward.


Sources:
GPGPU.org Homepage
Wikipedia - Graphics Processing Unit
Havok FX
NVIDIA Tesla
