What are general best practices or considerations in designing a model that is optimized for real-time inference?
The only difference between a normal model and a "real-time" one is that the real-time model needs to be as fast as possible. Consider rephrasing your question as "What are the fastest models/techniques for real-time [USAGE]?". The question is missing the usage. Is this image recognition in real time? Is it audio detection? Voice generation? Tell us what you want to do in real time, then maybe people can help. – Recessive – 2019-12-17T14:28:39.117
I wouldn't agree that it depends on the use case; it depends more on how inference is generally implemented on the GPU. For example, keeping convolution kernels small but using more layers may present very different bottlenecks in GPU implementations than using large convolution kernels and few layers. – user1282931 – 2019-12-17T15:34:30.713
That's true, but we still don't know what you are even doing. What do you want to do in real time? – Recessive – 2019-12-18T00:42:26.073
Many things. I want to start with some pixel classification, but only to learn about inference performance and implementation specifics. My whole goal is to acquire general best practices to follow during modeling when inference performance (on mobile GPUs in particular) is a concern. – user1282931 – 2019-12-18T11:02:04.617
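To make the kernel-size versus depth trade-off raised in the comments concrete, here is a minimal sketch that compares a stack of small convolutions with a single large one that has the same receptive field. Everything in it is an assumption for illustration: the helper names (`conv2d_cost`, `stack_cost`, `receptive_field`), the 128x128 feature map, the 64 channels, and the simplification to stride-1, same-padded, bias-free layers. It counts only parameters and multiply-accumulates (MACs), which is not the same thing as latency.

```python
def conv2d_cost(height, width, c_in, c_out, kernel):
    """Parameters and MACs of one stride-1, same-padded conv layer (no bias)."""
    params = kernel * kernel * c_in * c_out
    # With stride 1 and same padding, every output pixel needs one MAC per weight.
    macs = params * height * width
    return params, macs


def stack_cost(height, width, channels, kernel, layers):
    """Total parameters and MACs of `layers` identical conv layers in sequence."""
    total_params = 0
    total_macs = 0
    for _ in range(layers):
        params, macs = conv2d_cost(height, width, channels, channels, kernel)
        total_params += params
        total_macs += macs
    return total_params, total_macs


def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convs with the same kernel size."""
    return layers * (kernel - 1) + 1


# Hypothetical workload: 128x128 feature map with 64 channels throughout.
H, W, C = 128, 128, 64

small_params, small_macs = stack_cost(H, W, C, kernel=3, layers=3)
large_params, large_macs = stack_cost(H, W, C, kernel=7, layers=1)

print("three 3x3 layers:", receptive_field(3, 3), small_params, small_macs)
print("one 7x7 layer:   ", receptive_field(7, 1), large_params, large_macs)
```

The stacked 3x3 variant needs fewer parameters and MACs for the same receptive field, but it also launches more GPU kernels and writes more intermediate activations to memory, so the lower arithmetic count does not guarantee lower latency. On a mobile GPU the only reliable answer comes from profiling both variants on the target device.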