Texture Atlases for Fun and Profit...

I'm guessing that everyone who's ever taken "intro to not particularly slow computer graphics" has written their own texture atlas implementation, so hey, I should too :).  A texture atlas is a collection of a large number of small textures packed into a single, larger texture, with some book-keeping metadata that lets the GL translate coordinates expressed in terms of the original texture (0-1) into coordinates in the atlas texture.  Legacy GL even has a texture transform matrix to make this easier.
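To make the book-keeping concrete, here is a minimal sketch of the coordinate translation as a scale plus an offset per sub-texture.  The AtlasRegion class and region_from_pixels helper are names made up for this illustration, not taken from any particular library, and the atlas size and placement are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtlasRegion:
    """Hypothetical record of where one sub-texture sits, in atlas UV units."""
    u0: float      # left edge of the region in atlas coordinates
    v0: float      # bottom edge of the region in atlas coordinates
    u_size: float  # region width as a fraction of the atlas width
    v_size: float  # region height as a fraction of the atlas height

    def to_atlas(self, u, v):
        """Map a (0-1) coordinate of the original texture into the atlas."""
        return (self.u0 + u * self.u_size, self.v0 + v * self.v_size)


def region_from_pixels(x, y, width, height, atlas_width, atlas_height):
    """Build a region from a pixel rectangle inside the atlas."""
    return AtlasRegion(x / atlas_width, y / atlas_height,
                       width / atlas_width, height / atlas_height)


# A 32x32 glyph stored at pixel (64, 128) of a 256x256 atlas.
region = region_from_pixels(64, 128, 32, 32, 256, 256)
print(region.to_atlas(0.0, 0.0))  # (0.25, 0.5)
print(region.to_atlas(1.0, 1.0))  # (0.375, 0.625)
```

The same offset and scale are what you would put in the texture matrix if you wanted the GL to do the translation for you instead of baking it into the texture coordinates.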

Why bother?  Well, it's one of those side-effects of the implementation poking through into your code.  Switching textures on the card is slow, as the card needs to load a (potentially large) texture into the extremely-fast-access texture-sampler-accessible memory.  Since the sampler allows random access to the texture and will be called potentially multiple times per pixel, it makes sense to trade off some load-time speed for run-time speed.

Which brings us to the "Individual Quad Anti-pattern" we all hear so much about.  In this pattern the programmer creates, for example, a Font class that renders each glyph of a font as a single texture and compiles the glyph as a display list.  When she wants to render a string of text, she does a little loop which pushes the matrix, renders the first quad with its texture, does a glTranslate() for the advance for the rendered character and renders the second quad with its texture, repeating until the entire string of text is displayed.  You see the same pattern used on game sprites and the like.  There are a number of problems with the pattern, but the one we're focussing on here is the switching of the textures.

For each character in the string we have forced a new texture to be loaded into texture memory.  Sure, each glyph texture was small, but as mentioned, there's a setup penalty for enabling each texture.  What the texture atlas allows is for there to be only one texture switch (or a very small number of them) when rendering the text.  We load each of the glyph textures into one huge texture, and then when we render any given quad we specify the texture coordinate as a transformation of the original coordinate into the atlas coordinates.
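Getting the glyphs into the one huge texture means deciding where each one goes.  This post doesn't depend on any particular packing strategy, so the following is just one simple possibility: a "shelf" packer that fills rows left to right, tallest rectangles first.  The function name and the sizes in the example are invented for illustration.

```python
def pack_shelves(sizes, atlas_width, atlas_height):
    """Place (width, height) rectangles in rows ("shelves").

    Returns a list of (x, y) pixel positions in the same order as `sizes`.
    Raises ValueError if the rectangles do not fit in the atlas.
    """
    positions = [None] * len(sizes)
    # Visit the tallest rectangles first so each shelf wastes less height.
    order = sorted(range(len(sizes)), key=lambda i: sizes[i][1], reverse=True)
    x = y = shelf_height = 0
    for i in order:
        width, height = sizes[i]
        if width > atlas_width:
            raise ValueError("rectangle is wider than the atlas")
        if x + width > atlas_width:
            # No room left on this shelf: start a new one above it.
            y += shelf_height
            x = 0
            shelf_height = 0
        if y + height > atlas_height:
            raise ValueError("atlas is full")
        positions[i] = (x, y)
        x += width
        shelf_height = max(shelf_height, height)
    return positions


glyph_sizes = [(10, 12), (6, 8), (10, 16), (8, 8)]
print(pack_shelves(glyph_sizes, 16, 64))
# [(0, 16), (10, 16), (0, 0), (0, 28)]
```

Shelf packing wastes some space compared with cleverer rectangle packers, but it is easy to get right, and for glyphs of similar height the waste tends to be small.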

Those who have been following along in your "intro to not particularly slow computer graphics" texts will see what comes next.  Namely, because the "quads" are now all using the same texture/appearance, we could render them all with a single call that specified the coordinates and texture coordinates for each vertex as a big array of data that would be friendly to the streaming nature of the GPU.
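As a rough sketch of what that big array of data might look like for a string of text: the function below walks the string once and emits four corners per glyph into two flat lists.  The glyph table layout (advance, width, height, atlas rectangle) is an assumption made for this example; a real font class will have its own metrics.

```python
def build_text_batch(text, glyphs):
    """Build flat position and texture-coordinate lists for a string.

    `glyphs` maps a character to (advance, width, height, (u0, v0, u1, v1)),
    where the last item is the glyph's rectangle in atlas coordinates.
    Each glyph contributes one quad: four (x, y) and four (u, v) pairs.
    """
    positions = []
    texcoords = []
    pen_x = 0.0
    for ch in text:
        advance, width, height, (u0, v0, u1, v1) = glyphs[ch]
        x0, x1 = pen_x, pen_x + width
        # Corners in counter-clockwise order, starting bottom-left.
        positions.extend([x0, 0.0, x1, 0.0, x1, height, x0, height])
        texcoords.extend([u0, v0, u1, v0, u1, v1, u0, v1])
        pen_x += advance  # replaces the per-glyph glTranslate()
    return positions, texcoords


# Made-up metrics for two glyphs sharing one atlas.
glyphs = {
    "a": (8.0, 7.0, 10.0, (0.0, 0.0, 0.25, 0.5)),
    "b": (9.0, 8.0, 10.0, (0.25, 0.0, 0.5, 0.5)),
}
positions, texcoords = build_text_batch("ab", glyphs)
print(len(positions) // 8)  # 2 quads
```

With legacy GL you would then bind the atlas once, hand the two arrays over with glVertexPointer and glTexCoordPointer, and issue a single glDrawArrays call for the whole string.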

For more general geometry, rendering engines will often do a texture/appearance sort so that they can group geometry by appearance before doing a front-to-back sort to optimize occlusion rendering.  When you are using texture atlases the number of objects that show up in the same "texture plane" increases, which lets the front-to-back sort get closer to an ideal ordering.
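A toy version of that two-level sort, assuming each drawable is just a (name, texture_id, depth) tuple where a smaller depth means nearer the viewer.  The tuple layout and the names are invented for the example; a real engine would sort on much richer appearance state.

```python
from itertools import groupby
from operator import itemgetter


def order_for_drawing(objects):
    """Group objects by texture, nearest first within each group.

    `objects` is a list of (name, texture_id, depth) tuples.
    Returns a list of (texture_id, [names]) batches; each batch needs
    only one texture switch.
    """
    ordered = sorted(objects, key=lambda obj: (obj[1], obj[2]))
    return [
        (texture_id, [name for name, _, _ in group])
        for texture_id, group in groupby(ordered, key=itemgetter(1))
    ]


scene = [
    ("tree", 2, 5.0),
    ("glyph_a", 1, 3.0),
    ("rock", 2, 1.0),
    ("glyph_b", 1, 2.0),
]
batches = order_for_drawing(scene)
print(batches)
# [(1, ['glyph_b', 'glyph_a']), (2, ['rock', 'tree'])]
```

The fewer distinct texture ids there are, the bigger each batch gets, which is exactly what packing things into an atlas buys you.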

Atlases are not perfect solutions.  They can introduce rendering artefacts when two textures are packed next to one another and the atlas is then mip-mapped down to a smaller size.  The reduction will tend to merge the edge pixels of the two textures, which with repeating textures can cause rather ugly artefacts.  You can get around that with some more book-keeping and some heuristics (potentially even composing your mip-map levels yourself), though I haven't done so myself.
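One bit of book-keeping that is easy to sketch is working out how far down the mip chain a packed texture stays unmixed.  This assumes the mip levels are made by averaging 2x2 blocks (a plain box filter) and that the atlas dimensions are powers of two; actual mip generation may filter differently, so treat it as an illustration rather than a guarantee.  Under that assumption, a texel at level k covers a 2^k by 2^k block of base texels, so a sub-texture whose position and size are all multiples of 2^k never shares a level-k texel with a neighbour.

```python
def max_clean_mip_level(x, y, width, height):
    """Deepest mip level at which a packed rectangle shares no texel.

    Assumes box-filtered mip levels, where each level-k texel averages a
    2**k by 2**k block of base texels aligned to multiples of 2**k.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    level = 0
    # Keep going while the rectangle stays aligned to the next block size.
    while all(value % (2 ** (level + 1)) == 0
              for value in (x, y, width, height)):
        level += 1
    return level


print(max_clean_mip_level(0, 0, 16, 16))    # 4
print(max_clean_mip_level(16, 48, 16, 16))  # 4
print(max_clean_mip_level(10, 0, 6, 8))     # 1
```

A packer could use a check like this either to pad and align sub-textures to power-of-two cells, or to cap the number of mip levels generated for the atlas.  It says nothing about bilinear filtering at the base level, which can still sample a neighbouring texel right at the edge of a sub-texture.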

Oh, yes, I'm aware that Pyglet has a texture atlas implementation I could have copied.  I wanted to build one myself from scratch as a learning exercise.  It's one thing to "know" how something works; it's another to implement it.

Comments

  1. Rene Dudfield on 04/13/2009 8:53 p.m.

    hi,

    very cool :)

    Image packing is also a good technique for most places images are used.

    Websites for example can be sped up greatly using the same technique.

    It's a more general form of batching. Rather than acting on a single item, acting on multiple items is often quicker and uses less code.

    I wrote a paper about "batching APIs applied to websites" for the 2007 europython:
    http://rene.f0o.com/~rene/stuff/europython2007/Batching%20APIs,%20as%20applied%20to%20web%20applications.pdf

    Using those techniques you can create websites with 10x the content of the normal approach.

    cheers,
