Zelimir Fedoran
email zelimir.fedoran@gmail.com

General Purpose GPU Computation

Posted on March 13th, 2009


The GPU has become increasingly attractive as a general purpose computing device, with every new hardware iteration providing significantly greater computational power. As a result, I have spent the past few months exploring and researching several methods for harnessing the GPU for general, non-graphics applications.

Nvidia CUDA Framework

GPU programming tools have been evolving very rapidly over the past few years. Recently, Nvidia launched the CUDA framework for general GPU computation, allowing many researchers and industry professionals to develop new GPU algorithms for difficult problems such as sorting, linear algebra, database operations, pattern and data matching, and n-body simulations.

CUDA provides a more flexible level of access using a C-like language while bypassing the graphics pipeline entirely. Tasks that were previously impossible or impractical with shaders, such as scatter operations (writes to computed memory addresses), can now be implemented using CUDA. However, it is important to note that this flexibility comes at the cost of complexity. As a result, even seemingly trivial algorithms need to be carefully thought out in order to achieve reasonable performance. Also, the CUDA drivers require Nvidia GeForce 8 series graphics hardware or newer. Fortunately, most recent mid to high end computers with Nvidia graphics units meet this requirement.
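To make the gather/scatter distinction concrete, here is a minimal CPU-side sketch in Python (it is not CUDA code). In a gather, each output element chooses where to read from, which is the access pattern a pixel shader supports naturally. In a scatter, each input element chooses where to write. The function names are hypothetical and exist only for illustration.

```python
def gather(src, indices):
    """Each output slot i reads from a computed address: out[i] = src[indices[i]]."""
    return [src[j] for j in indices]


def scatter(src, indices, size, fill=0):
    """Each input slot i writes to a computed address: out[indices[i]] = src[i]."""
    out = [fill] * size
    for i, j in enumerate(indices):
        # On a real GPU, two threads writing the same address would need
        # atomics or some other rule to decide which write wins.
        out[j] = src[i]
    return out


data = [10, 20, 30, 40]
perm = [2, 0, 3, 1]

gathered = gather(data, perm)            # reads data[2], data[0], data[3], data[1]
scattered = scatter(data, perm, size=4)  # writes 10 to slot 2, 20 to slot 0, ...
```

When the index list is a permutation, as it is here, gathering from the scattered result with the same indices recovers the original data, which shows that the two operations are inverses of each other in that case.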

Shader Level GPGPU

In the past, general purpose GPU algorithms were mapped to the graphics pipeline using shader languages. Unfortunately, certain parallel CPU based algorithms cannot be mapped efficiently to the GPU using shaders. That said, shaders do provide a higher level of abstraction, which does not require the developer to worry about things like which threads are running in the same block. (I highly encourage anyone with an interest in shader based GPGPU to grab a hardcover copy of GPU Gems 2 or download the online version, available for free, on Nvidia’s website.)

GPGPU Shader Engine

Many algorithms use the GPU to accelerate simulations such as cellular automata, fluid dynamics, n-body and rigid-body simulations. A common characteristic of these algorithms is that the simulation can be simplified into a nearest-neighbor or all-pairs problem. Thus we can create a framework which provides this functionality. However, an abstract language is desirable for the elements which differ between simulations. Shader languages offer that abstraction, whereas CUDA exposes lower-level details such as thread blocks. Therefore, CUDA does not offer a great solution for a generalized simulation engine.
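The split between shared framework and simulation-specific rules can be sketched on the CPU in plain Python. This is only an illustration under my own assumptions, not the engine itself: `nearest_neighbor_step` and `all_pairs_step` are hypothetical names for the part a framework would provide, while `life_rule` and the lambda stand in for the part that would be written per simulation (in a shader, on the GPU).

```python
def nearest_neighbor_step(grid, rule):
    """Apply rule(cell, neighbors) to every cell of a 2D grid with wrap-around edges."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighbors = [grid[(y + dy) % h][(x + dx) % w]
                         for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                         if (dy, dx) != (0, 0)]
            out[y][x] = rule(grid[y][x], neighbors)
    return out


def all_pairs_step(items, interact):
    """For every item, sum interact(item, other) over all other items."""
    return [sum(interact(a, b) for j, b in enumerate(items) if j != i)
            for i, a in enumerate(items)]


def life_rule(cell, neighbors):
    """Conway's Game of Life expressed as a nearest-neighbor rule."""
    alive = sum(neighbors)
    return 1 if alive == 3 or (cell == 1 and alive == 2) else 0


blinker = [[0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 0, 0, 0]]
next_gen = nearest_neighbor_step(blinker, life_rule)

# Toy 1D "attraction": each point is pulled toward the others by signed distance.
pulls = all_pairs_step([0.0, 1.0, 3.0], lambda a, b: b - a)
```

Only the rule functions change from one simulation to the next; the traversal code stays the same, which is the property a generalized engine relies on.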

A basic GPGPU application utilizes the graphics pipeline in the following way:

  1. Initialize the GPU using a graphics API
  2. Initialize GPU data buffers
  3. Initialize function kernels
  4. Declare and specify constant parameters
  5. Invoke a kernel
  6. Read resulting data buffer
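The six steps above can be mimicked on the CPU to show how they fit together. The sketch below is a toy model in Python under stated assumptions: `FakeDevice` and its methods are hypothetical stand-ins, not calls from any real graphics API, and the "kernel" is an ordinary function rather than a compiled shader.

```python
class FakeDevice:
    """Hypothetical stand-in for a GPU context created through a graphics API."""

    def __init__(self):
        self.buffers = {}    # named data buffers (textures on a real GPU)
        self.kernels = {}    # named kernels (pixel shaders on a real GPU)
        self.constants = {}  # constant parameters shared by every invocation

    def create_buffer(self, name, data):
        self.buffers[name] = list(data)

    def create_kernel(self, name, func):
        self.kernels[name] = func

    def set_constant(self, name, value):
        self.constants[name] = value

    def invoke(self, kernel, source, target):
        """Run the kernel once per output element, like a full-screen render pass."""
        src = self.buffers[source]
        func = self.kernels[kernel]
        self.buffers[target] = [func(i, src, self.constants) for i in range(len(src))]

    def read_back(self, name):
        """Copy a buffer back to the caller, like a GPU-to-CPU read."""
        return list(self.buffers[name])


def scale_and_offset(i, src, constants):
    # The kernel may read anywhere in src but only produces its own output slot.
    return src[i] * constants["scale"] + constants["offset"]


device = FakeDevice()                                        # 1. initialize the "GPU"
device.create_buffer("input", [1.0, 2.0, 3.0])               # 2. data buffers
device.create_kernel("scale_and_offset", scale_and_offset)   # 3. function kernels
device.set_constant("scale", 2.0)                            # 4. constant parameters
device.set_constant("offset", 0.5)
device.invoke("scale_and_offset", "input", "output")         # 5. invoke a kernel
result = device.read_back("output")                          # 6. read the result
```

Writing to a separate target buffer, rather than updating the source in place, mirrors the way a render pass reads one texture and writes another.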

It is possible to create an application which generalizes the above steps into the following:

  1. Read XML files
  2. Read HLSL files
  3. Initialize the GPU using parameters from the XML file
  4. Initialize GPU data buffers using parameters from the XML file
  5. Initialize function kernels from the HLSL files
  6. Declare and specify constant parameters
  7. Invoke kernels in the order specified by an XML file
  8. Read resulting data buffer
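A data-driven version of the steps above might look like the following Python sketch. Everything specific here is an assumption made for illustration: the XML element and attribute names are invented, and the entries in `KERNELS` are plain Python functions standing in for kernels that would really be compiled from HLSL files.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration format; element and attribute names are illustrative only.
CONFIG_XML = """
<simulation>
  <buffer name="state" size="4" fill="1.0"/>
  <constant name="rate" value="0.5"/>
  <pass kernel="decay" target="state"/>
  <pass kernel="double" target="state"/>
  <pass kernel="decay" target="state"/>
</simulation>
"""

# Stand-ins for kernels that would be compiled from HLSL files on a real GPU.
KERNELS = {
    "decay": lambda value, constants: value * constants["rate"],
    "double": lambda value, constants: value * 2.0,
}


def run_simulation(xml_text, kernels):
    root = ET.fromstring(xml_text.strip())

    # Initialize data buffers from the XML description.
    buffers = {b.get("name"): [float(b.get("fill"))] * int(b.get("size"))
               for b in root.findall("buffer")}

    # Declare constant parameters.
    constants = {c.get("name"): float(c.get("value"))
                 for c in root.findall("constant")}

    # Invoke kernels in the order the <pass> elements appear in the file.
    order = []
    for p in root.findall("pass"):
        kernel = kernels[p.get("kernel")]
        target = p.get("target")
        buffers[target] = [kernel(v, constants) for v in buffers[target]]
        order.append(p.get("kernel"))

    # Returning the buffers plays the role of reading the result back.
    return buffers, order


final_buffers, pass_order = run_simulation(CONFIG_XML, KERNELS)
```

Because the pass order lives in the configuration rather than in code, reordering or repeating kernels requires no change to the host program.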

Furthermore, it is possible to allow run-time changes to an HLSL file by dynamically recompiling it. As a result, simulations can be changed on the fly without affecting the data on which they are running.
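The idea of swapping the kernel while keeping the data can be sketched in Python. This is an analogy under stated assumptions, not the actual mechanism: Python's built-in `compile` stands in for the HLSL compiler, kernel source is a single expression over a variable named `value`, and the class name is hypothetical.

```python
def compile_kernel(source):
    """Compile a one-expression kernel; stands in for recompiling an HLSL file."""
    code = compile(source, "<kernel>", "eval")
    return lambda value: eval(code, {"__builtins__": {}}, {"value": value})


class HotReloadSimulation:
    def __init__(self, data, source):
        self.data = list(data)
        self.source = None
        self.kernel = None
        self.reload(source)

    def reload(self, source):
        """Swap the kernel if the source changed and compiles; data is untouched."""
        if source == self.source:
            return False
        try:
            kernel = compile_kernel(source)
        except SyntaxError:
            return False  # keep running the previous kernel
        self.source, self.kernel = source, kernel
        return True

    def step(self):
        self.data = [self.kernel(v) for v in self.data]


sim = HotReloadSimulation([1.0, 2.0], "value + 1.0")
sim.step()
after_first = list(sim.data)

swapped = sim.reload("value * 10.0")   # new kernel, same data
sim.step()
after_second = list(sim.data)

rejected = sim.reload("value *")       # syntax error: old kernel stays active
sim.step()
after_third = list(sim.data)
```

A real application would more likely watch the shader file's modification time and call the reload step when it changes. Keeping the old kernel when compilation fails means a typo in the shader does not interrupt the running simulation.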


For the time being, the results are limited to simplistic simulations, since I am still in the process of developing the application. The compiled shader byte code running on the GPU is the same regardless of which framework or language invokes it from the CPU. Therefore, I decided to take advantage of the power and rapid development capabilities of the C# language combined with the .NET and XNA frameworks.