Say I have one server with 10 GPUs. I have a Python program which detects the available GPUs and uses all of them.
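For concreteness, the detection relies on the usual CUDA behaviour: frameworks honour the `CUDA_VISIBLE_DEVICES` environment variable, so whatever launches the program controls which GPUs it sees. A minimal sketch (the helper function and `total_gpus=10` are my own illustration, not real library code):

```python
import os

def visible_gpus(total_gpus=10):
    """Return the GPU indices a CUDA program would see.

    CUDA libraries honour CUDA_VISIBLE_DEVICES: when it is set,
    only the listed devices are visible to the process (and are
    renumbered from 0 internally). total_gpus=10 matches my server.
    """
    env = os.environ.get("CUDA_VISIBLE_DEVICES")
    if env is None:            # unset -> all GPUs visible
        return list(range(total_gpus))
    if env.strip() == "":      # empty string -> no GPUs visible
        return []
    return [int(i) for i in env.split(",")]

# A scheduler can hand a job a subset of GPUs simply by setting the
# variable before launching the unmodified program:
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
print(visible_gpus())  # -> [2, 3]
```

This is why I hope scheduling can happen outside the program itself.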
Several users will run Python programs (machine learning or data mining) on this server, each needing GPUs.
I initially considered Hadoop, since YARN is good at managing resources, including GPUs, and offers scheduling strategies such as fair, FIFO, and capacity scheduling.
I don't want hard-coded rules, e.g. user1 may only use gpu1 and user2 may only use gpu2.
I later found that Hadoop seems to require programs written in the map-reduce pattern, but my requirement is to run the code unmodified, just as we run it on Windows or a local desktop, or with as few changes as possible.
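To make the requirement concrete, here is roughly the kind of scheduling I have in mind: a tiny FIFO launcher that assigns whichever GPUs are free to each job, without touching the job's code. The script names and GPU counts are made up for illustration:

```python
import os
import queue
import subprocess

free_gpus = list(range(10))          # the server's 10 GPUs

# FIFO queue of (script, gpus_needed); names are hypothetical.
jobs = queue.Queue()
jobs.put(("user1_train.py", 2))
jobs.put(("user2_mine.py", 1))

launched = []
while not jobs.empty():
    script, need = jobs.get()
    # Take the next free GPUs; no fixed user-to-GPU mapping.
    assigned, free_gpus = free_gpus[:need], free_gpus[need:]
    env = dict(os.environ,
               CUDA_VISIBLE_DEVICES=",".join(map(str, assigned)))
    # The user's script runs unchanged; it just "sees" fewer GPUs.
    # subprocess.Popen(["python", script], env=env)  # not run here
    launched.append((script, assigned))

print(launched)
```

This toy version doesn't handle job completion, fairness, or queueing across users, which is exactly the part I'd like an existing tool to provide.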
What should I look into for running and scheduling Python programs on a machine with multiple GPUs?