What is Importance Sampling?

23

12

What is importance sampling? Every article I read about it mentions 'PDF' what is that as well?

From what I gather, importance sampling is a technique to only sample areas on a hemisphere that matter more than others. So, ideally, I should sample rays towards light sources to reduce noise and increase speed. Also, some BRDF's at grazing angles have little no difference in the calculation so using importance sampling to avoid that is good?

If I were to implement importance sampling for a Cook-Torrance BRDF how could I do this?

This is link a good read that explains what a PDF is. TL;DR a PDF is a function that describes the probability of (continuous aka floating point) random numbers. Generating random numbers from a specific PDF can be challenging and there are a few techniques for doing so. This talks about one of them. The article after this one talks about another way. https://blog.demofox.org/2017/08/05/generating-random-numbers-from-a-specific-distribution-by-inverting-the-cdf/

– Alan Wolfe – 2017-12-05T21:25:07.703

30

Importance sampling is a method to reduce variance in Monte Carlo Integration by choosing an estimator close to the shape of the actual function.

PDF is an abbreviation for Probability Density Function. A $pdf(x)$ gives the probability of a random sample generated being $x$.

To start, let's review what Monte Carlo Integration is, and what it looks like mathematically.

Monte Carlo Integration is an technique to estimate the value of an integral. It's typically used when there isn't a closed form solution to the integral. It looks like this:

$$\int f(x) \, dx \approx \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{pdf(x_{i})}$$

In english, this says that you can approximate an integral by averaging successive random samples from the function. As $N$ gets large, the approximation gets closer and closer to the solution. $pdf(x_{i})$ represents the probability density function of each random sample.

Let's do an example: Calculate the value of the integral $I$.

$$I = \int_{0}^{2\pi} e^{-x} \sin(x) dx$$

Let's use Monte Carlo Integration:

$$I \approx \frac{1}{N} \sum_{i=1}^N \frac{e^{-x} \sin(x_i)}{pdf(x_{i})}$$

A simple python program to calculate this is:

import random
import math

N = 200000
TwoPi = 2.0 * math.pi

sum = 0.0

for i in range(N):
x = random.uniform(0, TwoPi)

fx = math.exp(-x) * math.sin(x)
pdf = 1 / (TwoPi - 0.0)

sum += fx / pdf

I = (1 / N) * sum
print(I)


If we run the program we get $I = 0.4986941$

Using separation by parts, we can get the exact solution:

$$I = \frac{1}{2} (1 − e−2π) = 0.4990663$$

You'll notice that the Monte Carlo Solution is not quite correct. This is because it is an estimate. That said, as $N$ goes to infinity, the estimate should get closer and closer to the correct answer. Already at $N = 2000$ some runs are almost identical to the correct answer.

A note about the PDF: In this simple example, we always take a uniform random sample. A uniform random sample means every sample has the exact same probability of being chosen. We sample in the range $[0, 2\pi]$ so, $pdf(x) = 1 / (2\pi - 0)$

Importance sampling works by not uniformly sampling. Instead we try to choose more samples that contribute a lot to the result (important), and less samples that only contribute a little to the result (less important). Hence the name, importance sampling.

If you choose a sampling function whose pdf very closely matches the shape of $f$, you can greatly reduce the variance, which means you can take less samples. However, if you choose a sampling function whose value is very different from $f$, you can increase the variance. See the picture below: Image from Wojciech Jarosz's Dissertation Appendix A

One example of importance sampling in Path Tracing is how to choose the direction of a ray after it hits a surface. If the surface is not perfectly specular (ie. a mirror or glass), the outgoing ray can be anywhere in the hemisphere.

We could uniformly sample the hemisphere to generate the new ray. However, we can exploit the fact that the rendering equation has a cosine factor in it:

$$L_{\text{o}}(p, \omega_{\text{o}}) = L_{e}(p, \omega_{\text{o}}) + \int_{\Omega} f(p, \omega_{\text{i}}, \omega_{\text{o}}) L_{\text{i}}(p, \omega_{\text{i}}) \left | \cos \theta_{\text{i}} \right | d\omega_{\text{i}}$$

Specifically, we know that any rays at the horizon will be heavily attenuated (specifically, $\cos(x)$ ). So, rays generated near the horizon will not contribute very much to the final value.

To combat this, we use importance sampling. If we generate rays according to a cosine weighted hemisphere, we ensure that more rays are generated well above the horizon, and less near the horizon. This will lower variance and reduce noise.

In your case, you specified that you will be using a Cook-Torrance, microfacet-based BRDF. The common form being:

$$f(p, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{F(\omega_{\text{i}}, h) G(\omega_{\text{i}}, \omega_{\text{o}}, h) D(h)}{4 \cos(\theta_{i}) \cos(\theta_{o})}$$

where

$$F(\omega_{\text{i}}, h) = \text{Fresnel function} \\ G(\omega_{\text{i}}, \omega_{\text{o}}, h) = \text{Geometry Masking and Shadowing function} \\ D(h) = \text{Normal Distribution Function}$$

The blog "A Graphic's Guy's Note" has an excellent write up on how to sample Cook-Torrance BRDFs. I will refer you to his blog post. That said, I will try to create a brief overview below:

The NDF is generally the dominant portion of the Cook-Torrance BRDF, so if we are going to importance sample, the we should sample based on the NDF.

Cook-Torrance doesn't specify a specific NDF to use; we are free to choose whichever one suits our fancy. That said, there are a few popular NDFs:

• GGX
• Beckmann
• Blinn

Each NDF has it's own formula, thus each must be sampled differently. I am only going to show the final sampling function for each. If you would like to see how the formula is derived, see the blog post.

GGX is defined as:

$$D_{GGX}(m) = \frac{\alpha^2}{\pi((\alpha^2-1) \cos^2(\theta) + 1)^2}$$

To sample the spherical coordinates angle $\theta$, we can use the formula:

$$\theta = \arccos \left( \sqrt{\frac{\alpha^2}{\xi_1 (\alpha^2 - 1) + 1}} \right)$$

where $\xi$ is a uniform random variable.

We assume that the NDF is isotropic, so we can sample $\phi$ uniformly:

$$\phi = \xi_{2}$$

Beckmann is defined as:

$$D_{Beckmann}(m) = \frac{1}{\pi \alpha^2\cos^4(\theta)} e^{-\frac{\tan^2(\theta)}{\alpha^2}}$$

Which can be sampled with:

$$\theta = \arccos \left(\sqrt{\frac{1}{1 = \alpha^2 \ln(1 - \xi_1)}} \right) \\ \phi = \xi_2$$

Lastly, Blinn is defined as:

$$D_{Blinn}(m) = \frac{\alpha + 2}{2 \pi} (\cos(\theta))^{\alpha}$$

Which can be sampled with:

$$\theta = \arccos \left(\frac{1}{\xi_{1}^{\alpha + 1}} \right) \\ \phi = \xi_2$$

Putting it in Practice

Let's look at a basic backwards path tracer:

void RenderPixel(uint x, uint y, UniformSampler *sampler) {
Ray ray = m_scene->Camera.CalculateRayFromPixel(x, y, sampler);

float3 color(0.0f);
float3 throughput(1.0f);

// Bounce the ray around the scene
for (uint bounces = 0; bounces < 10; ++bounces) {
m_scene->Intersect(ray);

// The ray missed. Return the background color
if (ray.geomID == RTC_INVALID_GEOMETRY_ID) {
color += throughput * float3(0.846f, 0.933f, 0.949f);
break;
}

// We hit an object

// Fetch the material
Material *material = m_scene->GetMaterial(ray.geomID);
// The object might be emissive. If so, it will have a corresponding light
// Otherwise, GetLight will return nullptr
Light *light = m_scene->GetLight(ray.geomID);

// If we hit a light, add the emmisive light
if (light != nullptr) {
color += throughput * light->Le();
}

float3 normal = normalize(ray.Ng);
float3 wo = normalize(-ray.dir);
float3 surfacePos = ray.org + ray.dir * ray.tfar;

// Get the new ray direction
// Choose the direction based on the material
float3 wi = material->Sample(wo, normal, sampler);
float pdf = material->Pdf(wi, normal);

// Accumulate the brdf attenuation
throughput = throughput * material->Eval(wi, wo, normal) / pdf;

// Shoot a new ray

// Set the origin at the intersection point
ray.org = surfacePos;

// Reset the other ray properties
ray.dir = wi;
ray.tnear = 0.001f;
ray.tfar = embree::inf;
ray.geomID = RTC_INVALID_GEOMETRY_ID;
ray.primID = RTC_INVALID_GEOMETRY_ID;
ray.instID = RTC_INVALID_GEOMETRY_ID;
ray.time = 0.0f;
}

m_scene->Camera.FrameBuffer.SplatPixel(x, y, color);
}


IE. we bounce around the scene, accumulating color and light attenuation as we go. At each bounce, we have to choose a new direction for the ray. As mentioned above, we could uniformly sample the hemisphere to generate the new ray. However, the code is smarter; it importance samples the new direction based on the BRDF. (Note: This is the input direction, because we are a backwards path tracer)

// Get the new ray direction
// Choose the direction based on the material
float3 wi = material->Sample(wo, normal, sampler);
float pdf = material->Pdf(wi, normal);


Which could be implemented as:

void LambertBRDF::Sample(float3 outputDirection, float3 normal, UniformSampler *sampler) {
float rand = sampler->NextFloat();
float r = std::sqrtf(rand) * radius;
float theta = sampler->NextFloat() * 2.0f * M_PI;

float x = r * std::cosf(theta);
float y = r * std::sinf(theta);

// Project z up to the unit hemisphere
float z = std::sqrtf(1.0f - x * x - y * y);

return normalize(TransformToWorld(x, y, z, normal));
}

float3a TransformToWorld(float x, float y, float z, float3a &normal) {
// Find an axis that is not parallel to normal
float3a majorAxis;
if (abs(normal.x) < 0.57735026919f /* 1 / sqrt(3) */) {
majorAxis = float3a(1, 0, 0);
} else if (abs(normal.y) < 0.57735026919f /* 1 / sqrt(3) */) {
majorAxis = float3a(0, 1, 0);
} else {
majorAxis = float3a(0, 0, 1);
}

// Use majorAxis to create a coordinate system relative to world space
float3a u = normalize(cross(normal, majorAxis));
float3a v = cross(normal, u);
float3a w = normal;

// Transform from local coordinates to world coordinates
return u * x +
v * y +
w * z;
}

float LambertBRDF::Pdf(float3 inputDirection, float3 normal) {
return dot(inputDirection, normal) * M_1_PI;
}


After we sample the inputDirection ('wi' in the code), we use that to calculate the value of the BRDF. And then we divide by the pdf as per the Monte Carlo formula:

// Accumulate the brdf attenuation
throughput = throughput * material->Eval(wi, wo, normal) / pdf;


Where Eval() is just the BRDF function itself (Lambert, Blinn-Phong, Cook-Torrance, etc.):

float3 LambertBRDF::Eval(float3 inputDirection, float3 outputDirection, float3 normal) const override {
return m_albedo * M_1_PI * dot(inputDirection, normal);
}


For example GGX, to sample spherical coordinates angle cos(θ) we use the importance sampled formula to calculate the angle and use that in GGX as usual right? Or does the formula replace GGX entirely? – Arjan Singh – 2017-04-19T15:22:30.663

2I added a section to help answer your questions. But, in short, your first method is correct. You use the sampling formula to generate a direction, then you use that new direction in the normal GGX formula and to get the pdf for the Monte Carlo formula. – RichieSams – 2017-04-19T21:26:38.637

For GGX how would I calculate/sample wi? I understand how to sample the spherical coordinates angle θ but for the actual direction vector how is that done? – Arjan Singh – 2017-06-13T14:59:53.260

GGX is assumed to be isotropic so $\phi = \xi_{2}$. With $\phi$ and $\theta$, you can transform to cartesian coordinates to get $w_i$ http://tutorial.math.lamar.edu/Classes/CalcIII/SphericalCoords.aspx

– RichieSams – 2017-06-14T15:05:38.347

8

If you have a 1D function $f(x)$ and you want to integrate this function from say 0 to 1, one way to perform this integration is by taking N random samples in range [0, 1], evaluate $f(x)$ for each sample and calculate the average of the samples. However, this "naive" Monte Carlo integration is said to "converge slowly", i.e. you need a large number of samples to get close to the ground truth, particularly if the function has high frequencies.

With importance sampling, instead of taking N random samples in [0, 1] range, you take more samples in the "important" regions of $f(x)$ that contribute most to the final result. However, because you bias sampling towards the important regions of the function, these samples must be weighted less to counter the bias, which is where the PDF (probability density function) comes along. PDF tells the probability of a sample at given position and is used to calculate weighted average of the samples by dividing the each sample with the PDF value at each sample position.

With Cook-Torrance importance sampling the common practice is to distribute samples based on the normal distribution function NDF. If NDF is already normalized, it can serve directly as PDF, which is convenient since it cancels the term out from the BRDF evaluation. Only thing you need to do then is to distribute sample positions based on PDF and evaluate BRDF without the NDF term, i.e. $$f=\frac{FG}{\pi(n\cdot\omega_i)(n\cdot\omega_o)}$$ And calculate average of the sample results multiplied by the solid angle of the domain you integrate over (e.g. $2\pi$ for hemisphere).

For NDF you need to calculate Cumulative Distribution Function of the PDF to convert uniformly distributed sample position to PDF weighted sample position. For isotropic NDF this simplifies to 1D function because of the symmetry of the function. For more details about the CDF derivation you can check this old GPU Gems article.