Does fp32 & fp64 performance of GPU affect deep learning model training?



I am purchasing a Titan RTX GPU. Everything seems fine with it except its float32 and float64 performance, which seems lower than some of its counterparts. I wanted to understand whether the single-precision and double-precision performance of a GPU affects deep learning training or efficiency. We work mostly with images, but are not limited to that.

Ruchit Dalwadi

Posted 2019-02-13T07:30:20.597

Reputation: 297

Question was closed 2020-05-13T21:40:59.513

I would say that this type of question is better suited elsewhere; this website is better suited for questions related to the theory of AI (RL, ML, logic, etc.).

– nbro – 2019-02-14T11:47:45.130

@nbro I would agree with you, but when I posted this question I wanted to understand it from a deep learning perspective, so I felt it was suited here. I will keep that in mind next time :) – Ruchit Dalwadi – 2019-02-14T12:39:40.333



First off, I would like to point to this comprehensive blog, which compares all kinds of NVIDIA GPUs.

The most popular deep learning library, TensorFlow, uses 32-bit floating-point precision by default. This choice helps in two ways:

  • Lower memory requirements
  • Faster calculations
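The memory point is easy to verify directly. Here is a minimal NumPy sketch (the image batch shape is just a hypothetical example) showing that a tensor stored in fp64 takes exactly twice the memory of the same tensor in fp32:

```python
import numpy as np

# A hypothetical batch of 64 RGB images at 224x224 resolution.
batch = np.ones((64, 224, 224, 3))

fp64_bytes = batch.astype(np.float64).nbytes  # 8 bytes per element
fp32_bytes = batch.astype(np.float32).nbytes  # 4 bytes per element

print(fp64_bytes // fp32_bytes)  # fp64 needs exactly 2x the memory
```

On a GPU, that factor of two translates directly into halving the maximum batch size (or model size) you can fit.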

64-bit is only marginally better than 32-bit, in that very small gradient values will also be propagated back to the earliest layers. But the trade-off between that gain in performance and the cost (the time for calculations + memory requirements + the time to run through enough epochs for those small gradients to actually do something) is not worth it. There are also state-of-the-art CNN architectures that inject gradients at intermediate points of the network and achieve very good performance.

So, overall, 32-bit performance is the one that should really matter for deep learning, unless you are doing a very high-precision job (where it would still hardly matter, as the small differences due to the 64-bit representation are practically erased by any kind of softmax or sigmoid). So 64-bit might increase your classification accuracy by $\ll 1\%$, and that will only become significant over very large datasets.
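As a rough illustration of that claim, here is a small sketch comparing a sigmoid output computed in fp32 and fp64 for the same (arbitrarily chosen) logit. The difference is orders of magnitude below anything that would flip a predicted class:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logit = 3.141592653589793  # an arbitrary example logit

out64 = sigmoid(np.float64(logit))
out32 = sigmoid(np.float32(logit))

diff = abs(float(out64) - float(out32))
print(diff < 1e-6)  # True: far too small to change a classification
```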

As far as raw specs go, comparing the TITAN RTX to the 2080 Ti: the TITAN will perform better in fp64 (it has double the memory of the 2080 Ti, along with higher clock speeds, bandwidth, etc.), but a more practical approach would be to use two 2080 Tis coupled together, giving much better performance for the price.

Side note: good GPUs require good CPUs. It is difficult to tell whether a given CPU will bottleneck a GPU, as it depends entirely on how training is performed (whether the data is fully loaded onto the GPU before training, or continuously fed from the CPU). Here are a few links explaining the problem:

CPU and GPU Bottleneck: A Detailed Explanation

A Full Hardware Guide to Deep Learning


Posted 2019-02-13T07:30:20.597

Reputation: 4 881

Well, yes. I googled this before posting here, and everywhere they unanimously agreed that fp64 does not matter, but I wanted to understand it from a technical angle. It makes sense now :) Thanks. @DuttaA – Ruchit Dalwadi – 2019-02-13T08:05:10.193

@RuchitDalwadi I have not included all the details, but if you want to know some specific detail, I can add it after researching. – DuttaA – 2019-02-13T08:06:09.483

I compared both the TITAN RTX and the 2080 Ti. In my use case, I'd have to purchase either 2 TITAN RTXs or 4 2080 Tis. I did some calculations and came to realize that purchasing 2 TITAN RTXs would be more efficient. But I'm happy to be proven otherwise – Ruchit Dalwadi – 2019-02-13T08:07:52.537

@RuchitDalwadi It is tough to compare 4 GPUs with 2 GPUs, as increasing the number of GPUs can cause bottleneck issues (the engine (cores) is much more powerful, but the gas (data) is not being provided at timely intervals, leading to unused cores). So you have to check performance in practice. But if you are able to plan your architecture meticulously and spread it properly over all the GPUs using TensorFlow, then 4 GPUs might be more efficient, though generally it's probably not worth the effort (I do not have experience with this). – DuttaA – 2019-02-13T08:11:49.063

My machine specs are: TITAN RTX with 48 GB GPU memory, 2 TB SSD, 4 TB HDD, i9 (10 cores), 128 GB RAM. – Ruchit Dalwadi – 2019-02-13T08:12:03.150

@RuchitDalwadi I would make sure the CPU is very capable; an i9 might be good for a PC, but there are server-grade CPUs like Xeons, etc. Otherwise the CPU will cause bottleneck issues. – DuttaA – 2019-02-13T08:13:44.550

I am purchasing this from Lambda Labs, and even in the customization box they are not letting me buy anything other than the Intel i7/i9 series. Let me do a bit more research on CPUs then. I never thought this might cause a problem. Also, if you can add more details about what problems an i9 MIGHT cause, it would be a great help. – Ruchit Dalwadi – 2019-02-13T08:17:26.640

@RuchitDalwadi I have added a few links. – DuttaA – 2019-02-13T08:24:59.770

@RuchitDalwadi Make sure your power requirements are met by the motherboard. You have to check everything (slots, bandwidth, pins) before setting it up. Check the data sheets for detailed info. – DuttaA – 2019-02-13T13:34:04.067

Yes, that makes sense. I will definitely ensure this :) @DuttaA – Ruchit Dalwadi – 2019-02-14T09:24:14.073


Deep models are very tolerant of arithmetic underflow. You can expect negligible differences in prediction accuracy between FP32 and FP16 models. Check this paper for concrete results.
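To make the underflow point concrete, here is a small NumPy sketch showing a gradient magnitude that is representable in fp32 but flushes to zero in fp16, whose smallest subnormal value is about 6e-8. This is the kind of underflow that loss scaling in mixed-precision training is designed to work around:

```python
import numpy as np

# A gradient value that fits comfortably in fp32...
grad32 = np.float32(1e-8)

# ...but underflows to zero in fp16 (smallest fp16 subnormal is 2**-24 ~ 6e-8).
grad16 = np.float16(grad32)

print(grad32 > 0)   # True
print(grad16 == 0)  # True: the gradient vanishes entirely in fp16
```

In practice, as the answer above notes, models tolerate a lot of this, and mixed-precision recipes rescale the loss so most gradients stay inside the fp16 range.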


Posted 2019-02-13T07:30:20.597

Reputation: 429