How to run unmodified Python programs on a GPU server with scheduled GPUs?


Say I have one server with 10 GPUs. I have a Python program which detects the available GPUs and uses all of them.

I have a couple of users who will run Python (machine learning or data mining) programs that use the GPUs.

I initially thought of using Hadoop, as I find YARN good at managing resources, including GPUs, and YARN offers several scheduling strategies, such as fair, FIFO, and capacity scheduling.

I don't want hard-coded rules, e.g. user1 may only use gpu1 and user2 may only use gpu2.

I later found that Hadoop seems to require programs written in the map-reduce pattern, but my requirement is to run the code unmodified, just as we run it on Windows or a local desktop, or with as few changes as possible.

What should I look into for running and scheduling Python programs on a machine with multiple GPUs?


Posted 2021-01-26T15:31:54.870





A popular solution for job management in GPU environments is SLURM.

SLURM lets you specify the resources a job needs (e.g. 2 CPUs, 2 GB of RAM, 4 GPUs), and the job is scheduled for execution once those resources become available.
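As an illustration, a SLURM batch script for such a job might look like the sketch below; the script name, job name, and resource values are assumptions, not part of the original question:

```shell
#!/bin/bash
# Hypothetical SLURM batch script (train.py and all values are assumptions).
# Submit with: sbatch train_job.sh
#SBATCH --job-name=train
#SBATCH --cpus-per-task=2
#SBATCH --mem=2G
#SBATCH --gres=gpu:4       # request 4 GPUs; the job waits in the queue until they are free
#SBATCH --time=04:00:00

# The unmodified Python program runs as-is; SLURM controls which GPUs it sees.
python train.py
```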

A job can be any program or script.
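One reason unmodified programs work under SLURM is that, when GPUs are requested with `--gres`, SLURM sets `CUDA_VISIBLE_DEVICES` for each job, so a CUDA-based framework only enumerates the granted devices. The effect can be simulated by hand; the device indices below are just an example:

```shell
# Mask the visible GPUs the way SLURM would for a job granted GPUs 2 and 3;
# any CUDA program launched from this shell then sees only those two devices,
# renumbered internally as 0 and 1.
export CUDA_VISIBLE_DEVICES=2,3
printenv CUDA_VISIBLE_DEVICES    # prints: 2,3
```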

