Features Overview: Lightspeed Memory Architecture Continued
Crossbar Memory Controller
One of the chief advancements of the GeForce 3 is a whole new memory controller. As we learnt earlier, the memory bottleneck is proving to be a big hindrance to achieving better graphics performance. NVIDIA has thus borrowed the idea of a crossbar-based memory controller from high-speed server memory technologies to improve the efficiency of access to the frame buffer.
Previous 128-bit memory controllers, like those found on the GeForce 2, manage access to the frame buffer memory (DDR) in 256-bit "chunks", but it has been found that this block size is not always optimal. Present-day interactive content is built from triangles only a few pixels in size, so much of the access bandwidth is wasted.
The crossbar memory controller uses a complex, load-balanced system of four independent memory controllers. It is optimized for accessing the frame buffer with a fine-grained access pattern, down to individual accesses of 64 bits, whilst retaining the full capability of accessing 256 bits of information in one clock cycle.
As touted by NVIDIA, this design provides up to a four-fold increase in memory bandwidth efficiency under certain load conditions.
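To see where a figure like "four-fold" can come from, here is a rough back-of-the-envelope sketch. It only counts bits moved across the bus and assumes every request is aligned to the access granularity; the function name and the request sizes are our own illustrative choices, not anything published by NVIDIA.

```python
import math

def fetch_efficiency(useful_bits, granularity_bits):
    """Fraction of the bits transferred that were actually requested.

    Assumes the request is aligned, so it needs ceil(useful / granularity)
    whole chunks and nothing more.
    """
    chunks = math.ceil(useful_bits / granularity_bits)
    return useful_bits / (chunks * granularity_bits)

# Compare a 256-bit-only controller with one that can fetch 64-bit pieces.
for useful in (64, 128, 256):
    coarse = fetch_efficiency(useful, 256)
    fine = fetch_efficiency(useful, 64)
    print(f"{useful:3d} bits wanted: coarse {coarse:.2f}, fine {fine:.2f}, "
          f"gain {fine / coarse:.1f}x")
```

In this simplified model the gain is 4x only for the smallest request (64 bits) and vanishes for a full 256-bit access, which fits NVIDIA's "up to" and "under certain load conditions" wording.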
Occlusion detection addresses the problem of scene overdraw. The method is more accurately described as z-occlusion culling.
As mentioned earlier, scene overdraw happens when a pixel is rendered even though it is not displayed onscreen, taking up precious resources that could have been better utilized. As an example, for a scene of depth complexity two (which means that for every pixel displayed, two pixels, on average, have to be rendered), two accesses to the frame buffer are required for each displayed pixel.
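The cost of overdraw is easy to put into numbers. The short sketch below assumes a 1024x768 display with 32-bit colour (4 bytes per pixel); both figures are our own example values, and z-buffer and texture traffic are ignored.

```python
def colour_traffic_bytes(width, height, bytes_per_pixel, depth_complexity):
    """Colour bytes written per frame when every rendered pixel is written."""
    return width * height * bytes_per_pixel * depth_complexity

ideal = colour_traffic_bytes(1024, 768, 4, 1)      # each pixel drawn once
overdrawn = colour_traffic_bytes(1024, 768, 4, 2)  # depth complexity two

print(ideal)      # 3145728
print(overdrawn)  # 6291456
```

Under these assumptions, half of the colour writes in the depth-complexity-two case go to pixels that never reach the screen.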
Now on the GeForce 3, a hardware unit determines whether a pixel will ultimately be displayed, which helps save precious frame buffer bandwidth whenever possible.
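NVIDIA does not spell out how the hardware unit works, so the following is only a conceptual sketch of the idea: test a pixel's depth first, and skip the colour work if something nearer is already there. The data layout and the `render` function are hypothetical, and smaller depth values are taken to mean "nearer".

```python
def render(fragments, cull_occluded):
    """Process (pixel, depth, colour) fragments in submission order.

    Returns the final image and the number of colour writes performed.
    """
    depth = {}
    image = {}
    colour_writes = 0
    for pixel, z, colour in fragments:
        visible = z < depth.get(pixel, float("inf"))
        if cull_occluded and not visible:
            continue            # rejected before any colour access
        colour_writes += 1      # this fragment costs frame buffer bandwidth
        if visible:
            depth[pixel] = z
            image[pixel] = colour
    return image, colour_writes

# Two pixels, each covered by a near and a far surface (depth complexity two),
# submitted front to back.
scene = [(0, 0.2, "near"), (0, 0.8, "far"), (1, 0.3, "near"), (1, 0.9, "far")]

print(render(scene, cull_occluded=False))  # ({0: 'near', 1: 'near'}, 4)
print(render(scene, cull_occluded=True))   # ({0: 'near', 1: 'near'}, 2)
```

The image is identical either way; only the amount of work differs. Note that in this simple model the saving depends on draw order: if the far surfaces were submitted first, they would pass the depth test and nothing would be culled.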
NVIDIA has also moved part of the responsibility for improving efficiency to the game designer, who knows best whether a scene has objects that are often occluded at a particular point during gameplay. The designer can then code a specific "occlusion query" for any region in a frame, which is tested for visibility when it is processed by the GeForce 3.
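As an illustration of what such a query boils down to, the sketch below checks a rectangular screen region at a given depth against an existing depth buffer and counts how many pixels would be visible. This is not the actual driver or API interface; the function, the region format (x0, y0, x1, y1, end-exclusive) and the tiny 4x4 buffer are all made up for the example.

```python
def occlusion_query(depth_buffer, region, region_depth):
    """Count pixels in the region that are nearer than what is already drawn."""
    x0, y0, x1, y1 = region
    return sum(
        1
        for y in range(y0, y1)
        for x in range(x0, x1)
        if region_depth < depth_buffer[y][x]
    )

# A wall at depth 0.3 covers the left half of the screen; the right half
# is still empty (depth 1.0 is the far plane).
depth_buffer = [[0.3, 0.3, 1.0, 1.0] for _ in range(4)]

behind_wall = occlusion_query(depth_buffer, (0, 0, 2, 4), 0.6)
in_the_open = occlusion_query(depth_buffer, (2, 0, 4, 4), 0.6)

print(behind_wall)  # 0 -> the object can be skipped entirely
print(in_the_open)  # 8 -> the object must be drawn
```

A game would typically issue the query with an object's cheap bounding volume and only submit the full, detailed model when the count comes back non-zero.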
Last but not least, in the quest to improve efficiency - a key word repeated over and over again in this article - the GeForce 3 applies a lossless form of data compression on the z-buffer data. How is this significant?
As the z-buffer represents the depth of each pixel that is ultimately shown onscreen, graphics processors typically read and potentially write z data for every pixel rendered, which uses up a large amount of the available bandwidth.
Again, the compression and decompression are performed in real time and implemented in hardware, which makes them transparent to applications. The lossless method means that there will be no degradation of image quality or precision. It all sounds well and good.
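NVIDIA has not disclosed the compression scheme itself, so the sketch below uses a stand-in technique (delta encoding followed by run-length encoding) purely to show what "lossless" means for depth data and why it compresses well: along a flat surface, z changes by the same step from pixel to pixel. The function names and the sample row are our own.

```python
def compress(zs):
    """Delta-encode a row of integer z values, then run-length-encode the deltas."""
    runs = []   # each entry is [delta, repeat_count]
    prev = 0
    for z in zs:
        delta = z - prev
        if runs and runs[-1][0] == delta:
            runs[-1][1] += 1
        else:
            runs.append([delta, 1])
        prev = z
    return runs

def decompress(runs):
    """Rebuild the exact original z values from the run list."""
    zs = []
    prev = 0
    for delta, count in runs:
        for _ in range(count):
            prev += delta
            zs.append(prev)
    return zs

# Sixteen pixels across a flat, tilted polygon: z rises by 4 per pixel.
row = [1000 + 4 * i for i in range(16)]
packed = compress(row)

print(packed)                     # [[1000, 1], [4, 15]]
print(decompress(packed) == row)  # True
```

Sixteen values shrink to two runs here, and decompression returns every value bit for bit. Rows that cross many polygon edges would compress less well under this toy scheme.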
Note: A host of features and effects are not fully covered or illustrated in this writeup. You may wish to download NVIDIA's Technical Brief on the GeForce 3: Lightspeed Memory Architecture (.PDF) here.