How can virtual texturing actually be efficient?



For reference, what I'm referring to is the "generic name" for the technique first (I believe) introduced with idTech 5's MegaTexture technology. See the video here for a quick look at how it works.

I've been skimming some papers and publications related to it lately, and what I don't understand is how it can possibly be efficient. Doesn't it require constantly recalculating UV coordinates from the "global texture page" space into the virtual texture coordinates? And doesn't that curb most attempts at batching geometry altogether? How can it allow arbitrary zooming in? Wouldn't it at some point require subdividing polygons?

There is just so much I don't understand, and I have been unable to find any genuinely approachable resources on the topic.


Posted 2015-12-02T10:07:54.247





The main reason for Virtual Texturing (VT), or Sparse Virtual Textures as it is sometimes called, is memory optimization. The gist of it is to move into video memory only the actual texels (generalized as pages/tiles) that you might need for a rendered frame. This allows you to keep much more texture data in offline or slow storage (HDD, optical disc, the cloud) than would otherwise fit in video memory, or even main memory. If you understand the concept of Virtual Memory used by modern operating systems, it is the same thing in essence (the name is no accident).

VT does not require recomputing UVs in the sense that you'd do it every frame before rendering a mesh and then resubmit vertex data, but it does require some substantial work in the vertex and fragment shaders to perform the indirect lookup from the incoming UVs. In a good implementation, however, whether a given texture is virtual or traditional should be completely transparent to the application. In practice, most applications will mix both types of texturing, virtual and traditional.

Batching can in theory work very well, though I have never looked into the details of this. Since textures are the usual criterion for grouping geometry, and with VT every polygon in the scene can share the same "infinitely large" texture, you could theoretically draw the full scene with one draw call. In reality, other factors come into play that make this impractical.

Issues with VT

Zooming in/out and abrupt camera movement are the hardest things to handle in a VT setup. It can look very appealing for a static scene, but once things start moving, more texture pages/tiles will be requested than you can stream from external storage. Async file IO and threading can help, but in a real-time system, like a game, you'll occasionally just have to render a few frames with lower-resolution tiles until the hi-res ones arrive, resulting in blurry textures. There's no silver bullet here, and that's the biggest issue with the technique, IMO.

Virtual Texturing also doesn't handle transparency in an easy way, so transparent polygons need a separate traditional rendering path for them.

All in all, VT is interesting, but I wouldn't recommend it for everyone. It can work well, but it is hard to implement and optimize, and there are just too many corner cases and case-specific tweaks needed for my taste. But for large open-world games or data visualization apps, it might be the only feasible approach to fit all the content into the available hardware. With a lot of work, it can be made to run fairly efficiently even on limited hardware, as seen in the PS3 and Xbox 360 versions of id's Rage.


I have managed to get VT working on iOS with OpenGL-ES, to a certain degree. My implementation is not "shippable", but I could conceivably make it so if I wanted to and had the resources. You can view the source code here; it might help you get a better idea of how the pieces fit together. Here's a video of a demo running in the iOS Simulator. It looks very laggy because the simulator is terrible at emulating shaders, but it runs smoothly on a device.

The following diagram outlines the main components of the system in my implementation. It differs quite a bit from Sean's SVT demo (linked below), but it is closer in architecture to the one presented in the paper Accelerating Virtual Texturing Using CUDA, found in the first GPU Pro book (also linked below).

virtual texturing system

  • Page Files are the virtual textures, already cut into tiles (AKA pages) as a preprocessing step, so they are ready to be moved from disk into video memory whenever needed. A page file also contains the whole set of mipmaps, also called virtual mipmaps.

  • Page Cache Manager keeps an application-side representation of the Page Table and Page Indirection textures. Since moving a page from offline storage to memory is expensive, we need a cache to avoid reloading what is already available. This cache is a very simple Least Recently Used (LRU) cache. The cache is also the component responsible for keeping the physical textures up-to-date with its own local representation of the data.

  • The Page Provider is an async job queue that will fetch the pages needed for a given view of the scene and send them to the cache.

  • The Page Indirection texture is a texture with one pixel for each page/tile in the virtual texture, that will map the incoming UVs to the Page Table cache texture that has the actual texel data. This texture can get quite large, so it must use some compact format, like RGBA 8:8:8:8 or RGB 5:6:5.
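The LRU behavior described for the Page Cache Manager can be sketched in a few lines. This is a hypothetical, CPU-side Python sketch; the names `PageCache`, `lookup`, and `insert` are illustrative, not taken from the actual implementation:

```python
from collections import OrderedDict

class PageCache:
    """Simplified LRU cache of virtual-texture pages (illustrative sketch).

    Keys are (mip_level, page_x, page_y) tuples; values stand in for the
    slot in the physical texture where the page's texels were uploaded.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # insertion order doubles as LRU order

    def lookup(self, page_id):
        """Return the cached slot for a page, or None on a cache miss."""
        if page_id not in self.pages:
            return None
        self.pages.move_to_end(page_id)  # mark as most recently used
        return self.pages[page_id]

    def insert(self, page_id, slot):
        """Insert a freshly streamed page, evicting the LRU page if full."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
        elif len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict least recently used
        self.pages[page_id] = slot
```

LRU is a natural fit here because the pages visible in one frame are very likely to be visible in the next, so the pages that were touched least recently are the best candidates for eviction.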

But we are still missing a key piece here, and that's how to determine which pages must be loaded from storage into the cache and consequently into the Page Table. That's where the Feedback Pass and the Page Resolver enter.

The Feedback Pass is a pre-render of the view, with a custom shader and at a much lower resolution, that writes the ids of the required pages to the color framebuffer. The colorful patchwork on the cube and sphere above consists of actual page indexes encoded as RGBA colors. This pre-pass output is then read back into main memory and processed by the Page Resolver, which decodes the page indexes and fires the new requests through the Page Provider.
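One way to picture that encoding and the resolver's decoding step is the following Python sketch; the packing (page x/y in R/G, mip in B, alpha as a validity flag) is one possible scheme for illustration, not necessarily the one the demo uses:

```python
def encode_page_id(page_x, page_y, mip, valid=True):
    """Pack a page request into an RGBA8 color (illustrative scheme):
    R = page x, G = page y, B = mip level, A = 255 marks a valid pixel."""
    assert 0 <= page_x < 256 and 0 <= page_y < 256 and 0 <= mip < 256
    return (page_x, page_y, mip, 255 if valid else 0)

def decode_feedback(pixels):
    """Collect the unique page requests from a read-back feedback buffer,
    skipping pixels that wrote no request (alpha == 0)."""
    return {(r, g, b) for (r, g, b, a) in pixels if a != 0}
```

Because many feedback pixels fall on the same page, the resolver deduplicates them into a small set of requests, which is what gets handed to the Page Provider.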

After the Feedback pre-pass, the scene can be rendered normally with the VT lookup shaders. Note that we don't wait for new page requests to finish; that would be terrible, because we'd simply block on synchronous file IO. The requests are asynchronous and might or might not be ready by the time the final view is rendered. If they are ready, great; if not, we always keep a locked page with a low-res mipmap in the cache as a fallback, so there is always some texture data available to use, even if it is blurry.

Other resources worth checking out

VT is still a somewhat hot topic in Computer Graphics, so there's tons of good material available, and you should be able to find a lot more. If there's anything else I can add to this answer, please feel free to ask. I'm a bit rusty on the topic, haven't read much about it for the past year, but it is always good for the memory to revisit stuff :)



Hey, thank you for the excellent answer. I know this is typically frowned upon, but I have various issues, so I mostly just skim through things - to get an intuitive overview of topics for the future (I'm afraid properly learning and implementing things is out of my reach for the moment) - anyway, if possible, could you post a pseudocode example outlining the process itself, ideally, but not necessarily, illustrated? – Llamageddon – 2015-12-03T09:33:14.113

@Llamageddon, it just so happens that I still had a diagram at hand ;) I'm afraid pseudo-code is going to be a bit hard to provide, since there's quite a bit of real code to it. But I hope the expanded answer helps give a general idea of the technique. – glampert – 2015-12-03T18:47:15.340

Amazing answer, though I still find some details unclear: If the pre-pass is low-res, isn't it possible to miss some textures altogether? What happens then? How do the shaders for the pre-pass and final render look? Is the pre-pass used only for fetching the textures, or...? – Llamageddon – 2015-12-03T22:03:09.053


It's worth noting that most modern hardware now exposes programmable page tables, eliminating the need for an indirection texture. This is exposed through e.g. DirectX 12 reserved resources, which build on DirectX 11 tiled resources, or OpenGL sparse textures.

– MooseBoys – 2015-12-04T02:32:18.077

@Llamageddon, the feedback pre-pass can be done at a lower res to save as much computation and memory as possible, since pixels for a page will generally repeat (you can notice the big colored squares in my demo). You're correct that it might occasionally miss a visible page that way, but that's not usually going to have a big visual impact, because the system should always keep at least the lowest mipmap of the whole VT available in the cache. The second paper I linked has all the shader examples in the appendix; you can also refer to the repo for my own project, they are similar. – glampert – 2015-12-04T03:15:23.287

The feedback pre-pass is only useful for determining the visible set of pages needed for a view/frame, but you could also combine something like a depth pre-pass with it. – glampert – 2015-12-04T03:16:55.707

BTW, why do you say that "Virtual Texturing also doesn't handle transparency in an easy way"? Transparency has nothing to do with how textures are stored in memory... – Nathan Reed – 2015-12-04T19:52:20.630

@NathanReed, it's because of the page id pre-pass. It's impossible to handle transparency there, because blending 2 or more colors would produce a different page number. What I mean is, suppose object A gets assigned page X, which when encoded into a color makes a red tone, and object B gets assigned page Y, which results in green when encoded in the feedback pass. If A and B were blended, the color output in the pre-pass for those pixels would be a shade of yellow, translating to the wrong page id/number. – glampert – 2015-12-05T00:00:57.850


@glampert Ahh, I see; that makes sense. Still, I think there are lots of options for handling transparencies; in the page ID pass, you could dither (so histogramming would see all the pages, unless there were a huge number of transparent layers), or use a k-buffer approach, or even just base transparent texture residency on which objects are near the camera (as opposed to rendering them in a feedback pass).

– Nathan Reed – 2015-12-05T02:12:20.280


Virtual Texturing is the logical extreme of texture atlases.

A texture atlas is a single giant texture that contains textures for individual meshes inside it:

Texture Atlas Example

Texture atlases became popular because switching the bound texture between draw calls is expensive on the GPU (historically described as a pipeline flush). When creating the meshes, the UVs are compressed/shifted so that they represent the correct 'portion' of the whole texture atlas.
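That UV remapping can be illustrated with a tiny sketch (the function and parameter names are illustrative; engines typically bake this into the vertex data at asset-build time):

```python
def remap_to_atlas(u, v, region):
    """Remap a mesh-local UV in [0, 1] into a sub-rectangle of the atlas.

    `region` is (offset_u, offset_v, scale_u, scale_v) in atlas UV units,
    describing where the mesh's texture lives inside the atlas.
    """
    off_u, off_v, scale_u, scale_v = region
    return (off_u + u * scale_u, off_v + v * scale_v)
```

For example, a texture occupying the top-left quarter of the atlas has `region = (0.0, 0.0, 0.5, 0.5)`, so the mesh-local corner `(1, 1)` maps to `(0.5, 0.5)` in atlas space.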

As @nathan-reed mentioned in the comments, one of the main drawbacks of texture atlases is losing wrap modes such as repeat, clamp, border, etc. In addition, if the textures don't have enough border around them, you can accidentally sample from an adjacent texture when doing filtering. This can lead to bleeding artifacts.

Texture atlases do have one major limitation: size. Graphics APIs place a soft limit on how big a texture can be, and graphics memory is only so big, so there is also a hard limit on texture size, given by the size of your VRAM. Virtual textures solve this problem by borrowing concepts from virtual memory.

Virtual textures exploit the fact that in most scenes, you only see a small portion of all the textures. So only that subset of textures needs to be in VRAM; the rest can be in main RAM or on disk.

There are a few ways to implement it, but I will explain the implementation described by Sean Barrett in his GDC talk (which I highly recommend watching).

We have three main elements: the virtual texture, the physical texture, and the lookup table.

Virtual Texture

The virtual texture represents the theoretical mega atlas we would have if we had enough vram to fit everything. It doesn't actually exist in memory anywhere. The physical texture represents what pixel data we actually have in vram. The lookup table is the mapping between the two. For convenience, we break all three elements into equal sized tiles, or pages.

The lookup table stores the location of the top-left corner of the tile in the physical texture. So, given a UV to the entire virtual texture, how do we get the corresponding UV for the physical texture?

First, we need to find the location of the page within the physical texture. Then we need to calculate the location of the UV within the page. Finally, we can add these two offsets together to get the location of the UV within the physical texture.

float2 pageLocInPhysicalTex = ...
float2 inPageLocation = ...
float2 physicalTexUV = pageLocInPhysicalTex + inPageLocation;

Calculating pageLocInPhysicalTex

If we make the lookup table the same size as the number of tiles in the virtual texture, we can just sample the lookup table with nearest neighbor sampling and we will get the location of the top-left corner of the page within the physical texture.

float2 pageLocInPhysicalTex = lookupTable.Sample(nearestNeighborSampler, virtTexUV);

Calculating inPageLocation

inPageLocation is a UV coordinate that is relative to the top-left of the page, rather than to the top-left of the whole texture.

One way to calculate this is by subtracting off the UV of the top-left of the page, then scaling by the size of the page. However, this is quite a bit of math. Instead, we can exploit how IEEE floating point is represented: it stores the fractional part of a number as a sum of base-2 fractions.

Binary representation of a float's fraction bits

In this example, the number is:

number = 0 + (1/2) + (1/8) + (1/16) = 0.6875
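You can verify that bit pattern numerically. A small Python helper (the name is illustrative) extracts the fraction bits by repeated doubling, which is exactly the "shift one bit out at a time" view used below:

```python
def fraction_bits(x, n_bits):
    """Return the first n_bits of the binary expansion of x in [0, 1)."""
    bits = []
    for _ in range(n_bits):
        x *= 2.0        # shift the next bit into the integer part
        bit = int(x)    # that bit is now 0 or 1
        bits.append(bit)
        x -= bit        # drop it and continue with the remainder
    return bits
```

For 0.6875 this yields the bits 1, 0, 1, 1, matching 1/2 + 1/8 + 1/16 above.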

Now let's look at a simplified version of the virtual texture:

Simple Virtual Texture

The 1/2 bit tells us if we're in the left half of the texture or the right. The 1/4 bit tells us which quarter of the half we're in. In this example, since the texture is split into 16, or 4 to a side, these first two bits tell us what page we're in. The remaining bits tell us the location inside the page.

We can get the remaining bits by multiplying the float by the number of tiles per side, which is equivalent to a left shift by log2(numTiles) bits, and then stripping out the page bits with fract()

float2 inPageLocation = virtTexUV * numTiles;
inPageLocation = fract(inPageLocation);

Where numTiles is an int2 giving the number of tiles per side of the texture. In our example, this would be (4, 4)

So let's calculate the inPageLocation for the green point, (x,y) = (0.6875, 0.375)

inPageLocation = float2(0.6875, 0.375) * int2(4, 4);
               = float2(2.75, 1.5);

inPageLocation = fract(float2(2.75, 1.5));
               = float2(0.75, 0.5);

One last thing to do before we're done. Currently, inPageLocation is a page-relative UV coordinate, covering [0, 1) across a single page. However, we want a UV offset in the physical texture 'space'. To do this, we scale inPageLocation by the size of one page relative to the physical texture; a page covers virtualTextureSize / numTiles texels, so

inPageLocation *= (virtualTextureSize / numTiles) / physicalTextureSize;

So the finished function is:

float2 CalculatePhysicalTexUV(float2 virtTexUV, Texture2D<float2> lookupTable, SamplerState nearestNeighborSampler, uint2 physicalTexSize, uint2 virtualTexSize, uint2 numTiles) {
    // UV of the top-left corner of the page within the physical texture
    float2 pageLocInPhysicalTex = lookupTable.Sample(nearestNeighborSampler, virtTexUV);

    // Page-relative location, in [0, 1) across one page
    float2 inPageLocation = fract(virtTexUV * numTiles);
    // Rescale to the extent of one page in physical-texture UV space
    inPageLocation *= float2(virtualTexSize / numTiles) / float2(physicalTexSize);

    return pageLocInPhysicalTex + inPageLocation;
}
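The same math is easy to sanity-check on the CPU. Here is a Python version of it; the lookup-table value and the texture sizes passed in are made-up numbers for illustration, standing in for what the shader would sample and be handed as constants:

```python
def physical_uv(virt_uv, page_top_left, num_tiles, page_size, physical_size):
    """CPU-side version of the virtual-to-physical UV math (illustrative).

    virt_uv:       UV into the whole virtual texture
    page_top_left: UV of the page's top-left corner in the physical texture,
                   i.e. what the lookup-table sample would return
    num_tiles:     pages per side of the virtual texture
    page_size:     page size in texels
    physical_size: physical texture size in texels
    """
    u, v = virt_uv
    # Strip the page-index bits; what remains is the page-relative location.
    in_page = ((u * num_tiles) % 1.0, (v * num_tiles) % 1.0)
    # Rescale to the extent of one page in physical-texture UV space.
    scale = page_size / physical_size
    return (page_top_left[0] + in_page[0] * scale,
            page_top_left[1] + in_page[1] * scale)
```

Feeding in the green point (0.6875, 0.375) with 4 tiles per side reproduces the page-relative (0.75, 0.5) from the worked example; the final UV then depends on where the cache happened to place that page in the physical texture.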



I'm not; I'm referring to virtual texturing, best known as idTech 5's MegaTexture technology. Also see this and this. I've seen it mentioned in overviews of many modern engines' rendering pipelines, and in a few papers that use a similar approach for shadow maps. It does have a lot in common with texture atlases - yes, it uses them, in a way - but I'm not confusing it with texture atlases.

– Llamageddon – 2015-12-02T18:18:08.880

Ahh. Thanks for the links. Can you add them to the question? I will update my answer accordingly. – RichieSams – 2015-12-02T18:49:41.500

IMO, the main drawback of simple texture atlases (not virtual textures) is that you lose wrap modes like repeat and clamp, and bleeding occurs due to filtering/mipmapping - not floating-point precision. I'd be surprised to see float precision becoming a problem for ordinary (non-virtual) textures; even a 16K texture (the max allowed by current APIs) isn't big enough to really strain float precision. – Nathan Reed – 2015-12-02T20:58:17.627

@RichieSams Btw, I think your answer is a good one, even if to a different question. You should make a Q&A post. – Llamageddon – 2015-12-03T09:29:11.247

Hmm, this explains it quite well, though I don't really understand how it works with mip levels. I wish I could write down my specific problem with understanding it down, but it kinda eludes me... – Llamageddon – 2015-12-04T10:29:13.167

@NathanReed Why would you need the "repeat" mode if you have a quasi-unlimited amount of texture space? – Dudeson – 2018-01-12T03:13:27.813