Why is recursion forbidden in OpenCL?

I'd like to use OpenCL to accelerate rendering of raytraced images, but I notice that the Wikipedia page claims that recursion is forbidden in OpenCL. Is this true? As I make extensive use of recursion when raytracing, this will require a considerable amount of redesign in order to benefit from the speed-up. What is the underlying restriction that prevents recursion? Is there any way around it?

trichoplax

Posted 2015-08-28T18:47:48.083

Reputation: 3 748

GPUs work in a different way. Some architectures don't have the concept of a global "program stack", so recursive function calls are not possible on them. OpenCL probably adopts the lowest common denominator, disallowing recursion completely to remain portable across GPUs. Newer CUDA hardware seems to have introduced support for recursion at some point: http://stackoverflow.com/q/3644809/1198654

– glampert – 2015-08-29T02:04:27.370

Answers

It's essentially because not all GPUs can support function calls—and even if they can, function calls may be quite slow or have limitations such as a very small stack depth.

Shader code and GPU compute code may appear to have function calls all over the place, but under normal circumstances they're all 100% inlined by the compiler. The machine code executed by the GPU contains branches and loops, but no function calls. However, recursive function calls cannot be inlined for obvious reasons. (Unless some of the arguments are compile-time constants, in such a way that the compiler can fold them and inline the entire tree of calls.)
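To make the parenthetical concrete (ordinary C for illustration; OpenCL C rejects the recursive form outright, and the function names here are made up): when the recursion depth is a compile-time constant, the compiler can expand the whole call tree into straight-line code, which is exactly the kind of folding that becomes impossible once the depth depends on runtime data.

    /* Recursive form: with a constant argument such as attenuate(e, 3),
       a compiler can in principle expand the entire call tree. */
    float attenuate(float energy, int bounces)
    {
        if (bounces == 0)
            return energy;
        return attenuate(energy * 0.5f, bounces - 1);
    }

    /* What the fully folded/inlined result looks like: no calls remain. */
    float attenuate3(float energy)
    {
        return energy * 0.5f * 0.5f * 0.5f;
    }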

In order to implement true function calls, you need a stack. Most of the time, shader code doesn't use a stack at all—GPUs have large register files and shaders can keep all their data in registers the whole time. It's difficult to make a stack work because (a) you would need a lot of stack space to provide for all the many warps that can be in flight at a time, and (b) the GPU memory system is optimized for batching together a lot of memory transactions to achieve high throughput, but this comes at the expense of latency, so my guess is stack operations like saving/restoring local variables would be awfully slow.
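To put rough, purely illustrative numbers on point (a): a GPU keeping on the order of 50,000 threads in flight, each with even a modest 1 KB stack, would need about 50 MB of stack storage, far more than fits in on-chip register files and caches, so those stacks would have to live in comparatively slow off-chip memory.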

Historically, hardware-level function calls haven't been too useful on the GPU, as it has made more sense to inline everything in the compiler. So GPU architects haven't focused on making them fast. Different tradeoffs could probably be made if there is demand for efficient hardware-level calls in the future, but (as with everything in engineering) they would incur a cost somewhere else.

As far as raytracing is concerned, the way people usually handle this sort of thing is by creating queues of rays that are in the process of being traced. Instead of recursing, you add a ray to a queue, and at the high level somewhere you have a loop that keeps processing until all the queues are empty. It does require significant reorganization of your rendering code if you're starting from a classic recursive raytracer, though. For more info, a good paper to read on this is Wavefront Path Tracing.
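A very rough sketch of the queue idea in OpenCL C (the Ray layout, kernel name, and queue handling are all illustrative, not the interface from the paper): each launch consumes one generation of rays and pushes any secondary rays onto an output queue, and the host keeps launching until the output queue comes back empty.

    /* Illustrative ray record; a real renderer would carry more state. */
    typedef struct {
        float3 origin;
        float3 dir;
        float3 weight;
        int    pixel;
    } Ray;

    __kernel void trace_wave(__global const Ray *in_rays,
                             const int           in_count,
                             __global Ray       *out_rays,
                             __global int       *out_count,   /* host zeroes this before launch */
                             __global float4    *framebuffer)
    {
        int i = get_global_id(0);
        if (i >= in_count)
            return;

        Ray r = in_rays[i];

        /* ... intersect the scene and shade the hit point (omitted) ...   */
        /* framebuffer[r.pixel] += contribution;  -- atomics if pixels collide */

        /* Instead of recursing, push any secondary ray onto the output queue. */
        bool spawn_secondary = /* hit a reflective/refractive surface? */ false;
        if (spawn_secondary) {
            int slot = atomic_inc(out_count);
            out_rays[slot] = r;   /* in reality: the reflected/refracted ray */
        }
    }

    /* Host side (pseudocode): enqueue the camera rays, then loop:
       while (in_count > 0) { run trace_wave; swap in/out queues; read back out_count; } */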

Nathan Reed

Posted 2015-08-28T18:47:48.083

Reputation: 15 036

I'm reluctant to share this secret sauce, but I've had pretty good luck having a fixed maximum bounce count and having a stack of a fixed size (and a loop with a fixed number of iterations) to handle this. Also (and this is the real secret sauce imo!) I have my materials be either reflective or refractive but never both, which makes it so rays don't split when they bounce. The end result of all this is recursive type raytraced rendering, but through fixed size iteration, not recursion. – Alan Wolfe – 2015-08-29T03:03:43.217

Like tail recursion? – Tanmay Patil – 2015-08-31T20:34:32.567
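
For what it's worth, a minimal sketch of the fixed-iteration idea from Alan Wolfe's comment (OpenCL C; MAX_BOUNCES, the function name, and the material model are illustrative): because a surface is either reflective or refractive, each bounce produces exactly one continuation ray, so a bounded loop with a handful of local variables replaces the recursion.

    #define MAX_BOUNCES 8   /* illustrative fixed bounce count */

    float3 shade_path(float3 origin, float3 dir)
    {
        float3 color      = (float3)(0.0f);
        float3 throughput = (float3)(1.0f);

        for (int bounce = 0; bounce < MAX_BOUNCES; ++bounce) {
            /* ... intersect the scene; break out if the ray escapes ...  */
            /* color += throughput * light_emitted_at_hit;                */

            /* Exactly one continuation ray: reflect OR refract, never
               both, so no per-ray stack of pending rays is needed.       */
            /* dir        = is_reflective ? reflected_dir : refracted_dir; */
            /* origin     = hit_point;                                      */
            /* throughput *= surface_attenuation;                           */
        }
        return color;
    }

If materials could both reflect and refract, this is where the small fixed-size stack from the comment would come in: push the second ray when it is spawned and pop it on a later iteration of the same bounded loop.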