It's essentially because not all GPUs can support function calls—and even if they can, function calls may be quite slow or have limitations such as a very small stack depth.
Shader code and GPU compute code may appear to have function calls all over the place, but under normal circumstances they're all 100% inlined by the compiler. The machine code executed by the GPU contains branches and loops, but no function calls. However, recursive function calls cannot be inlined, because the compiler would have to keep expanding the function inside itself without end. The exception is when enough of the arguments are compile-time constants that the compiler can fold them, determine the recursion depth, and inline the entire tree of calls.
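As a rough illustration of the compile-time-constant case, here is a C++ analogy (not shader code). The `Attenuate` template and its halving step are made up for this sketch. Because the depth is a template parameter, the "recursion" is really a finite chain of distinct functions that a compiler can inline and fold completely.

```cpp
// Hypothetical example: recursion whose depth is a compile-time constant.
// Each Attenuate<Depth> is a separate function, so there is no true
// self-call and the whole chain can be inlined.
template <int Depth>
struct Attenuate {
    static constexpr float apply(float value) {
        // One "bounce" halves the value, then hands off to the next level.
        return 0.5f * Attenuate<Depth - 1>::apply(value);
    }
};

// Base case: terminates the chain at compile time.
template <>
struct Attenuate<0> {
    static constexpr float apply(float value) { return value; }
};

// The whole call tree is evaluated by the compiler.
static_assert(Attenuate<3>::apply(8.0f) == 1.0f, "three halvings of 8 give 1");
```

If the depth were only known at run time, no such finite chain would exist, which is the situation where a real call stack becomes necessary.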
In order to implement true function calls, you need a stack. Most of the time, shader code doesn't use a stack at all—GPUs have large register files and shaders can keep all their data in registers the whole time. It's difficult to make a stack work because (a) you would need a lot of stack space to provide for all the many warps that can be in flight at a time, and (b) the GPU memory system is optimized for batching together a lot of memory transactions to achieve high throughput, but this comes at the expense of latency, so my guess is stack operations like saving/restoring local variables would be awfully slow.
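To get a feel for point (a), here is a back-of-the-envelope calculation. The numbers (80 compute units, 2048 resident threads each, 1 KiB of stack per thread) are purely hypothetical and not taken from any particular GPU; the function name is made up for this sketch.

```cpp
#include <cstdint>

// Total memory needed if every thread in flight gets its own stack.
// All parameters are illustrative assumptions, not real hardware figures.
constexpr std::uint64_t totalStackBytes(std::uint64_t computeUnits,
                                        std::uint64_t threadsPerUnit,
                                        std::uint64_t stackBytesPerThread) {
    return computeUnits * threadsPerUnit * stackBytesPerThread;
}

// 80 units * 2048 threads * 1 KiB = 160 MiB reserved just for stacks.
static_assert(totalStackBytes(80, 2048, 1024) == 160ull * 1024 * 1024,
              "hypothetical configuration needs 160 MiB of stack");
```

Even a modest per-thread stack multiplies into a large reservation once every thread in flight needs one, and that memory lives behind the high-latency memory system described in (b).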
Historically, hardware-level function calls haven't been too useful on the GPU, as it has made more sense to inline everything in the compiler. So GPU architects haven't focused on making them fast. Probably some different tradeoffs could be made, if there is a demand for efficient hardware-level calls in the future, but (as with everything in engineering) it will incur a cost somewhere else.
As far as raytracing is concerned, the way people usually handle this sort of thing is by creating queues of rays that are in the process of being traced. Instead of recursing, you add a ray to a queue, and at the high level somewhere you have a loop that keeps processing until all the queues are empty. It does require significant reorganization of your rendering code if you're starting from a classic recursive raytracer, though. For more info, a good paper to read on this is "Megakernels Considered Harmful: Wavefront Path Tracing on GPUs" by Laine, Karras, and Aila.
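Here is a minimal CPU-side sketch of the queue idea in C++, to show the shape of the transformation rather than a real renderer. Everything in it is a made-up stand-in: the toy `Ray` carries only a depth and a weight, `shade` replaces real intersection and shading, and `kMaxDepth` is an assumed bounce limit. The recursive version is included only for comparison.

```cpp
#include <cassert>
#include <queue>

// Hypothetical toy "ray": a bounce depth and a throughput weight only.
struct Ray {
    int depth;
    float weight;
};

constexpr int kMaxDepth = 4;  // assumed bounce limit

// Stand-in for intersection + shading: light gathered at this hit.
inline float shade(const Ray& r) { return 0.5f * r.weight; }

// Classic recursive tracer: each hit spawns two secondary rays.
float traceRecursive(const Ray& r) {
    float radiance = shade(r);
    if (r.depth + 1 < kMaxDepth) {
        radiance += traceRecursive(Ray{r.depth + 1, 0.5f * r.weight});
        radiance += traceRecursive(Ray{r.depth + 1, 0.25f * r.weight});
    }
    return radiance;
}

// Queue-based tracer: secondary rays are enqueued instead of recursed into,
// and one top-level loop runs until the queue is empty.
float traceWithQueue(const Ray& primary) {
    std::queue<Ray> pending;
    pending.push(primary);
    float radiance = 0.0f;
    while (!pending.empty()) {
        const Ray r = pending.front();
        pending.pop();
        assert(r.depth < kMaxDepth);  // only rays within the limit are queued
        radiance += shade(r);
        if (r.depth + 1 < kMaxDepth) {
            pending.push(Ray{r.depth + 1, 0.5f * r.weight});
            pending.push(Ray{r.depth + 1, 0.25f * r.weight});
        }
    }
    return radiance;
}
```

Calling `traceWithQueue(Ray{0, 1.0f})` visits the same rays as `traceRecursive(Ray{0, 1.0f})`, just in a different order. The key design choice is that each ray carries its own weight, so its contribution can be added to the result directly instead of being returned up a chain of calls. On a GPU the queue would be a buffer in memory processed by many threads at once, which this single-threaded sketch does not attempt to show.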
GPUs work in a different way. Some architectures don't have the concept of a global "program stack", so recursive function calls are not possible on them. OpenCL probably adopts the lowest common denominator, disallowing recursion completely to remain portable across GPUs. Newer CUDA hardware seems to have introduced support for recursion at some point: http://stackoverflow.com/q/3644809/1198654
– glampert – 2015-08-29T02:04:27.370