Read this article at https://www.pugetsystems.com/guides/1247
Dr Donald Kinghorn (Scientific Computing Advisor)

NVIDIA RTX 2080 Ti vs 2080 vs 1080 Ti vs Titan V, TensorFlow Performance with CUDA 10.0

Written on October 3, 2018 by Dr Donald Kinghorn

Are the NVIDIA RTX 2080 and 2080Ti good for machine learning?

Yes, they are great! The RTX 2080 Ti rivals the Titan V for performance with TensorFlow. The RTX 2080 seems to perform as well as the GTX 1080 Ti (although the RTX 2080 only has 8GB of memory).

Probably the most impressive new feature of the new NVIDIA RTX cards is their astounding Ray-Tracing performance. Beyond that, they are also excellent cards for GPU-accelerated computing. They are very well suited to machine learning workloads, and having "Tensorcores" is nice.

I have just finished some quick testing using TensorFlow 1.10 built against CUDA 10.0 running on Ubuntu 18.04 with the NVIDIA 410.48 driver. These are preliminary results after spending only a few hours with the new RTX 2080 Ti and RTX 2080. I'll be doing more testing in the coming weeks.

I'm not going to go over details of the new RTX cards; there are already plenty of posts on-line that cover that. I will mention that the Turing architecture does include Tensorcores (FP16), similar to what you would find in the Volta architecture. From a GPU computing perspective the RTX Turing cards offer an affordable alternative to the Volta-based Titan V, Quadro GV100 or server-oriented Tesla V100. The main drawback with the Turing-based RTX cards is the lack of the outstanding double precision (FP64) performance found on Volta. However, for most machine learning workloads that is not an issue. In fact, the inclusion of FP16 Tensorcores is a big plus in the ML/AI domain.

Test system


  • Puget Systems Peak Single
  • Intel Xeon-W 2175 14-core
  • 128GB Memory
  • 1TB Samsung NVMe M.2
  • GPUs tested:
    • GTX 1080 Ti
    • RTX 2080
    • RTX 2080 Ti
    • Titan V


The workstation is my personal system, along with the extra GPUs that are being tested.

The TensorFlow build that I used for this testing is the latest build on NGC. It is TensorFlow 1.10 linked with CUDA 10.0. The convolutional neural network code used for the ResNet-50 model is from "nvidia-examples" in the container instance, as is the "billion word LSTM" network code ("big_lstm").

For details on how I have Docker/NVIDIA-Docker configured on my workstation have a look at the following post along with the links it contains to the rest of that series of posts.

How-To Setup NVIDIA Docker and NGC Registry on your Workstation - Part 5 Docker Performance and Resource Tuning

Note: For my own development work I mostly use Anaconda Python installed on my local workstation along with framework packages from Anaconda Cloud. However, I am a big fan of docker, nvidia-docker and NGC! I use NGC on my workstation not "the cloud".

Do I need to have CUDA 10 to use TensorFlow on NVIDIA RTX 20xx series GPU's?

No. The RTX 20xx GPUs are "CUDA compute 7.5" devices, but they will run code built for lower compute levels. I did some testing using TensorFlow 1.4 linked with CUDA 9.0 and it worked with the 2080 and 2080 Ti GPUs. What IS required is to have NVIDIA display driver version 410 or later installed on your system. You need the new 410 or later driver even if you are using docker/nvidia-docker, since the CUDA "run-time" libraries are included with the driver. Driver version 410 or later is required for RTX 20xx cards and also for CUDA 10 linked programs.
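As a quick sanity check (a minimal sketch of my own, not from the original setup): the version string reported by `nvidia-smi --query-gpu=driver_version --format=csv,noheader` just needs a major version of at least 410. The hard-coded version strings below stand in for real nvidia-smi output:

```python
def driver_ok(version: str, minimum: int = 410) -> bool:
    """True if the NVIDIA driver major version meets the minimum for RTX 20xx / CUDA 10."""
    major = int(version.split(".")[0])
    return major >= minimum

# Example version strings, as nvidia-smi would report them
print(driver_ok("410.48"))  # True  -- works with RTX 20xx and CUDA 10
print(driver_ok("396.54"))  # False -- too old for Turing cards
```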

As of this writing the easiest way to install the new NVIDIA driver on Ubuntu 18.04 is to do a CUDA 10 install which includes the driver. See my recent post,

How To Install CUDA 10 (together with 9.2) on Ubuntu 18.04 with support for NVIDIA 20XX Turing GPUs.

TensorFlow benchmark results - GTX 1080Ti vs RTX 2080 vs RTX 2080Ti vs Titan V

The benchmark for GPU ML/AI performance that I've been using the most recently is a CNN (convolutional neural network) Python code contained in the NGC TensorFlow docker image. NVIDIA has been maintaining that with frequent updates and it's easy to use with synthetic image data for quick benchmarking.

For reference, an example of command-lines used is,

kinghorn@i9:~$ docker run --runtime=nvidia --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:18.09-py3
root@90752be3917b:/workspace# cd nvidia-examples/cnn
root@90752be3917b:/workspace/nvidia-examples/cnn# export CUDA_VISIBLE_DEVICES=0
root@90752be3917b:/workspace/nvidia-examples/cnn# python resnet.py --layers 50 -b64 --precision fp16

That is starting the NGC TensorFlow docker image tagged 18.09-py3, which contains TensorFlow 1.10 linked with CUDA 10.0. The job run is the ResNet-50 CNN model with a batch size of 64 at FP16 (half) precision. The environment variable CUDA_VISIBLE_DEVICES is used to select the GPU (or GPUs) being used (device 0, in my case, is a Titan V). Note that --precision fp16 means "use tensorcores".
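As an aside, the same device selection can be done from inside Python rather than with `export`, provided it happens before the framework initializes CUDA (a minimal sketch):

```python
import os

# Restrict which GPUs CUDA programs can see. This must be set before the
# framework (TensorFlow, PyTorch, ...) initializes CUDA for it to take effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"      # only device 0 (here, the Titan V)
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # or a comma-separated list for multi-GPU

print(os.environ["CUDA_VISIBLE_DEVICES"])  # 0
```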

ResNet-50 - GTX 1080Ti vs RTX 2080 vs RTX 2080Ti vs Titan V - TensorFlow - Training performance (Images/second)

GPU            FP32    FP16 (Tensorcores)
GTX 1080 Ti    207     N/A
RTX 2080       207     332
RTX 2080 Ti    280     437
Titan V        299     547
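From the table above, the FP16 (Tensorcore) speedup over FP32 works out to roughly 1.6x-1.8x; a quick check:

```python
# images/second from the ResNet-50 table above: (fp32, fp16)
results = {
    "RTX 2080":    (207, 332),
    "RTX 2080 Ti": (280, 437),
    "Titan V":     (299, 547),
}
for gpu, (fp32, fp16) in results.items():
    print(f"{gpu}: {fp16 / fp32:.2f}x FP16 speedup")
# RTX 2080: 1.60x, RTX 2080 Ti: 1.56x, Titan V: 1.83x
```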

ResNet-50 with RTX GPU's

I also ran the LSTM example on the "Billion Words data set". The results are a little inconsistent, but actually I like that! It's a reminder that benchmark-like results are subject to change and don't always "go your way".

For reference, an example of command-lines used is, (continuing in the container image used above)

root@90752be3917b:/workspace/nvidia-examples# cd big_lstm
root@90752be3917b:/workspace/nvidia-examples/big_lstm# ./download_1b_words_data.sh
root@90752be3917b:/workspace/nvidia-examples/big_lstm# export CUDA_VISIBLE_DEVICES=0
root@90752be3917b:/workspace/nvidia-examples/big_lstm# python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output/ --hpconfig run_profiler=False,max_time=90,num_steps=20,num_shards=8,num_layers=2,learning_rate=0.2,max_grad_norm=1,keep_prob=0.9,emb_size=1024,projected_size=1024,state_size=8192,num_sampled=8192,batch_size=512

"Big LSTM" - GTX 1080Ti vs RTX 2080 vs RTX 2080Ti vs Titan V - TensorFlow - Training performance (Images/second)

GPU                  Words/second
GTX 1080 Ti          6460
RTX 2080 (Note 1)    5071
RTX 2080 Ti          8945
Titan V (Note 2)     7066
Titan V (Note 3)     8373


  • Note 1: With only 8GB of memory on the RTX 2080 I had to drop the batch size down to 256 to keep from getting "out of memory" errors. That typically has a big (downward) influence on performance.
  • Note 2: For whatever reason this result for the Titan V is worse than expected. This is TensorFlow 1.10 linked with CUDA 10 running NVIDIA's code for the LSTM model. The RTX 2080 Ti performance was very good!
  • Note 3: I re-ran the "big-LSTM" job on the Titan V using TensorFlow 1.4 linked with CUDA 9.0 and got results consistent with what I have seen in the past. I have no explanation for the slowdown with the newer version of "big-LSTM".

Should you get an RTX 2080 or RTX 2080Ti for machine learning work?

OK, I think that is an obvious yes! For the kind of GPU compute workload that you will find with ML/AI work, the new NVIDIA Turing RTX cards will give excellent performance.

The RTX 2080 is a good value with performance similar to the beloved GTX 1080Ti, and it has the added benefit of Tensorcores, not to mention the Ray-Tracing capabilities! Its main downside is the limit imposed by the 8GB of memory.

The RTX 2080Ti is priced about the same as the older Titan Xp but offers performance for ML/AI workloads that rivals the Titan V (which is over twice the cost). I think the RTX 2080Ti is the obvious and worthy successor to the GTX 1080Ti as the practical workstation GPU for ML/AI development work.

What's my favorite GPU?

I do like the RTX 2080Ti, but I just love the Titan V! The Titan V is a great card, and even though it seems expensive from a "consumer" point of view, I consider it an incredible bargain. I am doing experimental work where I really need to have double precision, i.e. FP64. The Titan V offers the same stellar FP64 performance as the server-oriented Tesla V100. For a development workstation for someone doing a lot of experimenting or more general scientific computing, it is an easy recommendation. That said, if you really don't need FP64, then the RTX 2080Ti is going to give the best performance for the cost.

Happy computing! --dbk

Tags: RTX 2080 Ti, TensorFlow, CUDA, NVIDIA, ML/AI, Machine Learning

I agree with you...i am waiting for an RTX Titan which will hopefully be a clone of the RTX Quadro 6000: Full TU102 with 24GB memory for $3000.
This would give me the same price/performance ratio as a Titan V with 12GB, as I can get twice the performance/accuracy for my monte carlo simulations.
And I need at least 4 of them for one machine.

The main reason I don't want to use the 2080ti is that it cannot run in TCC mode like the Titan series cards can, and I get access to the full 12GB of memory in the Titan Xp, as compared to a 1080ti, when running the Titan Xp in TCC mode.

An even better card would be the special edition cards of the Titan V with 32GB...this would be a great version too if they replaced the 12GB version at the same price of $3000...

Posted on 2018-10-03 23:04:37
Donald Kinghorn

Yes! I'm hoping to see a Titan T (?) for 3K$ with at least 16GB mem (24GB would be great) and full FP64. For the quantum mechanics work I did recently I could really use the 32GB Titan V.

I think the 2080Ti will be great for ML stuff. I'm not planning on doing more testing until they are available in distribution ...

I do nearly everything on Linux so I don't run into the issues that TCC takes care of, but, that is a real consideration for Windows. Thanks for mentioning it since I hadn't thought about it. ( ... I need to send a msg to one of our sales consultants about that for something I recommended earlier today!)

Posted on 2018-10-03 23:54:24

Yes..I do everything under windows 10 for development and I am too old(lazy??..lol) to learn linux even though i did use Sun Solaris Unix back in the day when I was doing trading support in the banks.

I have a 1080ti and a Titan Xp for development and they have both been stellar but the Titan Xp really takes the cake for me as I can get it to run nearly 20% faster and do twice as many simulations running in TCC mode compared to the 1080Ti 11GB...that 1 extra GB makes a world of difference if you allocate ALL of vram in one shot, which I do to run my simulations at max speed. I allocate all memory once and never deallocate while running my simulations when fed live market data.

I have looked at the half datatype (fp16) and the precision is way too small for me to run my simulations. I run approx 1 million simulations to price a single option with a max of 2048 prices generated per batch. I usually run as many batches that will fit in memory which then allows me to simulate ONE day of option prices for a single option!!! I need to run these simulations for the full path of the option from trade date to maturity date which i have limited to approx 10 days max!!! Definitely running many billions of fp32 ops.

I would love to use the tensor cores but the fp16 precision is no good as I need to generate one random variable per simulation which obviously has a greater range than fp16. FP32 allows me up to 2 million simulations with accuracy up 6 decimal places

...I think I am talking myself into getting the Titan V too...lol
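As an illustration of the precision concern raised above (my own sketch, not the commenter's code): IEEE-754 half precision (fp16) carries an 11-bit significand, so it represents integers exactly only up to 2048, roughly 3-4 decimal digits of precision, versus about 7 for fp32:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a value through IEEE-754 half precision (fp16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Integers above 2048 are no longer representable exactly in fp16,
# which is why high-count simulation sums lose accuracy at this precision.
print(to_fp16(2048.0))  # 2048.0 -- still exact
print(to_fp16(2049.0))  # 2048.0 -- rounded: 2049 is not representable in fp16
```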

Posted on 2018-10-04 00:15:38
Donald Kinghorn

Win 10 is a good platform and keeps getting better. I've been back and forth between that and Linux but have mostly been on Linux since the start of the cluster computing days in the late 1990's. I had the pleasure of meeting the WSL team doing the Linux on Windows thing, they are awesome! They are getting close to having the best of both worlds. Still a ways to go but I actually believe that MS loves Linux :-) these days.

I am "long term" borrowing the Titan V I'm using now. If didn't have it I would really miss it! It would be wonderful to have the 32GB version because, as you observed, extra mem is a BIG plus.

Posted on 2018-10-04 17:54:07

Any rumors on when/if new Titan cards coming from Nvidia??

I suppose they have their hands full with the new RTX cards...will have to be patient I guess as I really want to have a prototype version of my production system ready by middle of 2019 with minimum 4 Nvidia cards with at least 12GB...2080ti will not cut it for production and Quadro RTX is just still too expensive...

Posted on 2018-10-04 18:56:50
Norbertas GL

what software stack are you using to run monte carlo on Nvdia?

Posted on 2018-10-04 10:05:08

I am using Microsoft VS2017 Enterprise Edition: C#, Managed C++, Standard C++, NVidia Cuda C/C++ (9.2)

C#: high level interface to represent the monte carlo batches
Managed C++: mid level interface to connect managed .net/c# to standard c++ library
Standard C++: library code to interface to Cuda host code
Nvidia Cuda C/C++: Cuda kernel code that actual runs on the nvidia gpu hardware to run the monte carlo simulations

Quite involved but works seamlessly...by far the hardest part was learning how to run cuda code to take advantage of the ridiculous number of parallel tasks/threads that you can run on the gpu hardware

Posted on 2018-10-04 12:36:41
Gordon Freeman

Thank you! These are the first benchmarks that have correctly used Tensor Cores for RTX. I'm curious if the speedup remains for large networks that can only be trained in very small batches. For example MaskRCNN max batch is 1-2 etc. Also, how much advantage did mixed precision give you in terms of extra GPU ram available for fitting a larger batch?

Posted on 2018-10-04 19:44:00
Donald Kinghorn

Those are great questions! Unfortunately I didn't look at them when I was testing. My experience in the past was that tensorcores failed numerically for larger models. When I first looked at fp16, Inception3 was the largest model I could train. Inception4 blew up until I went back to fp32. Mixed precision needs extra care, scaling of gradients and such. Still, I think it is a good thing. What I really want to test is model size reduction for inference with TensorRT targeted to tensorcores. I think that is probably the best use case. Non-linear optimization is just too susceptible to precision loss.

Mem usage would have been another great thing to check. Going fp16 should open up a lot more "room". In the PyTorch code I was working with recently I used FP64 on a Titan V since I was going for convergence to as many digits as I could get. If I dropped down to FP32 I got nearly double the memory space so I could start up larger jobs (more basis functions) but the optimization I was doing would stall out way too soon for what I was doing. Having full performance FP64 on the Titan V was wonderful!

I won't have my hands on the 2080Ti again for awhile but we do have a bunch of 2080's and we just got an NVLink connector in so I will likely do more testing again with 2 2080's + NVLink soon. When I do that I'll have more time so I can have better answers about the mem usage.

Posted on 2018-10-04 23:57:07
Mark Johnstone

Great work, as always Don. Do you have any thoughts on being able to take advantage of the NVLink on the 2080 ti?

Posted on 2018-10-04 21:32:04
Donald Kinghorn

Thanks Mark! I was actually holding in my hand the NVLink bridge we just got in when you posted your question :-) I was also looking at a table full of GPU's. Unfortunately just one 2080Ti but several 2080's. I'm pretty curious about it. It looks like it doesn't have as many "lanes" as the bridge for the GV100 or P100 Quadro's. There is just one bridge for the RTX cards instead of the two on the Quadro's ... I don't really know anything about the implementation yet. This is going to be a big question so I'll probably grab a couple cards and the bridge next week and see what I can do with it.

Posted on 2018-10-05 00:06:32
Lawrence Barras

I was looking forward to seeing some benchmarks on the 2080ti vs the Titan V. I just got around to running some myself this evening. I don't have a 2080, but I'm running some tests on my Titan V vs. some benchmarks that have popped up on the 2080ti. So far, the 2080ti seems to be doing about the same as the Titan V, perhaps a little better at times.

I am somewhat hopeful that a driver update may allow better Titan V performance. I'm pretty sure that it could go faster than it is...

Posted on 2018-10-05 06:38:04

This is good news...I am waiting for the Titan RTX but maybe I should consider the 2080ti in more depth. I run monte carlo simulations on a Titan Xp and it is at least 15-20% faster than a 1080ti and so I figured that a 2080ti would only be 10-20% faster than a Titan Xp, which to me is not worth an upgrade yet.

The Titan V is definitely faster for compute tasks but the price/performance ratio is a non-starter as I want to buy multiple cards. I think I should get a 2080ti blower version since I want a multi-gpu setup, and test against my Titan Xp before making any final decision. Price is not really a consideration if I buy a single card for development, but I need at least 4-8 cards for real time calculation at a minimum and therefore price/performance will be critical once I move from development to production...I am so amazed how fast Cuda runs. What used to take hours is now taking a few seconds!!! Realtime for my needs is being able to run my cuda app in less than a second. Currently it takes around 7-10 seconds on a single Titan Xp and 4-8 2080ti cards just may get me under 1 second.

This is a great column as I can't find anyone else doing low level compute benchmarks on all of the gpu cards...keep up the good work!!!

Posted on 2018-10-05 12:14:49
Lawrence Barras

With the crash of crypto-currency mining, there are plenty of cards on the secondary market, including Titan-V cards. The savings can be significant vs. MSRP.

Another vendor of custom-built high-performance computers posted benchmarks of 2080ti vs. 1080ti a few days ago, and a link to their github with the tests. I'm finding the Titan V, with cuda 9.2, driver 396.54 and NGC container tensorflow-18.08-py03 is running a little bit slower than the numbers they published for a 2080ti. The workstation I tested on is an older Skylake 6700k, so that may have held it back a little.

Posted on 2018-10-05 21:43:52

Hello! Thanks so much for this phenomenal set of bench marks and results :) I was wondering if you maybe captured temperature rates at all for the runs? I am curious to see what happened between the Titan V and RTX 2080Ti for thermals and if any watercooling would at all be worthwhile.

Posted on 2018-10-05 09:28:42
Donald Kinghorn

See my comment a little further down ... there will be blower versions that will exhaust out the back ... like they should!

Posted on 2018-10-05 22:35:39

Hi thanks for posting this and your other helpful cuda/DL-based tutorials.

I want to point out that there is a typo in your code. The command "docker run --runtime=nvidia --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:1..." loads a nvidia container with cuda 9.0. To get this to work with the RTX GPUs, you need to use 18.09-py3, not 18.09-py3

Thanks again and looking forward to your investigation of NVLink


Posted on 2018-10-05 14:00:51

Sorry, I mean 18.09-py3, not 18.08-py3. 18.08-py3 loads a cuda9.0 container.

Posted on 2018-10-05 14:03:05
Donald Kinghorn

Thanks! I'll fix that. I did run a few older builds just to see if they would work ... and they did. I copied in the wrong tag when I wrote that up. I was not thorough with the older build testing, just a quick check ...

Posted on 2018-10-05 22:17:09
Scott Le Grand

A little birdie has told me that you really can't operate more than 2 RTX 2080TIs in a single box right now without thermal issues causing them to downclock. 1080TI and TitanV did not display such behavior. Also, we've normalized $3000 consumer (prosumer?) GPUs? Okeydokey then. I see a booming business ahead in bespoke cooling solutions for these widgets. Looking forward to my first two myself in the next few weeks.

Posted on 2018-10-05 14:59:28
Donald Kinghorn

Hey Scott, :-) I'm hopeful that the blower versions will cool well with 4 2080Ti's in a properly set up chassis. We've only got 1 2080Ti (borrowed!) right now. We might be able to do some testing with 2080's with blowers

The 2080Ti is going to be a killer card for ML/AI workstations. For me personally, I have big love for the Titan V and I feel it is a bargain at $3K. I'm looking forward to the next Titan release!

Posted on 2018-10-05 22:58:21
Scott Le Grand

So if FedEx is telling the truth, I will have my first 2 on Monday. But as a 2x Camaro owner, I just can't justify TitanV vs 1080TI. I heart !/$. I frickin' love that they found a way to expose all the Tensor core goodness in consumer Turing GPUs, but there's also a note of sadness that FP32 performance is levelling off because they had an absolutely amazing run over the past decade. My path going forward is to try to make the contents of https://arxiv.org/abs/1710.... 100% automagic in my obscure indie filmmaker deep learning framework (that also happens to support model parallelism out of the box and rules the world of sparse data (for now) https://github.com/amzn/ama....

Posted on 2018-10-06 16:14:52
Donald Kinghorn

I hear you 100% on all of that! The compute performance gains have been amazing until recently. If I had to pay for my Titan V I probably wouldn't have it (I'm hoping the office forgets I'm using it :-). If I do more straight up scientific work and I need the FP64 I will seriously consider the next Titan, but, it will blow my entire computing budget for the year. Like you are doing, mixed precision is the way to go when you can, and I agree, it's great that they did tensorcores on the 20xx cards!

Posted on 2018-10-08 19:16:41
Donald Kinghorn

A note to everyone concerned about cooling with the side fans that exhaust into the case ... yea, that sucks or blows or something :-) One of the other guys here (William George) did a good post on this. We will be carrying the side fan AND blower versions of these cards!

Posted on 2018-10-05 22:32:42

Yep...all of my cards are blower versions and obviously no real need for overclocking as I would rather they be stable at room temperature while running my monte carlo calculations within trading hours 6 days a week

I think Nvidia will release the RTX Titan as a blower version similar to the RTX Quadro 6000 with a minimum of 12GB.
Hopefully they will release multiple versions of the Titan real soon (12/24GB Turing RTX, 32GB Volta ???).
If it will do approx 18 TFlops (fp32), it would be at least 50% faster than a Titan Xp...

Posted on 2018-10-06 03:45:37
Donald Kinghorn

Yup, I'm saving my compute budget for the new Titan ... if it is what we hope it will be

Posted on 2018-10-08 19:17:59
Darko Jocic

How big an influence do the Tensor cores have on the performance of the 2080? I'm still thinking whether I should get a 2080 or 1080ti. The 2080's only problem is the 8GB of VRAM (is that enough? atm am running a 1060 6GB), everything else seems good. What is your opinion, Dr Donald Kinghorn?

Posted on 2018-10-10 10:42:39
Donald Kinghorn

Tensor cores, i.e. FP16, give a nice 50-75% performance boost IF you can get away with using them. It takes some work to get code numerically stable enough to use that low a precision. So if you are doing the code development you will have more work to do. However, more libraries and packages are taking advantage, people are figuring out "best practices", and for ML inference with a well tuned and compressed model it makes a lot of sense. (I haven't tested this yet but plan to look at using NVIDIA's TensorRT.)

For me the memory limit is the bigger issue. For example, having to reduce the batch size when training a model can cause a significant slowdown in training. There are people that get the Titan Xp just to have a little more memory. On the plus side for the 2080, the memory is faster GDDR6. So compared to the 1060 you are using, the 2080 is a nice improvement in every aspect.

Personally if I was doing work right now and I knew that I would NOT be able to take advantage of FP16 I would probably be looking for good deals on 1080Ti's. The 1080Ti is just a great card and it was priced very well. The supply of new cards is going away rapidly but there could be a lot of deals on the used market. If you can find a used 1080Ti for a good price it would be tempting. The best cards will be ones that are not "super clocked" The best ones in my opinion are the founder editions with the blower fan or really any that are using blowers instead of the side fans. (I have the EVGA card with the blower, love it)

Posted on 2018-10-10 16:05:39
Darko Jocic

Thanks a lot for the answer and help! I'm not doing any code development; I am a mechanical engineer (mostly working in the field of artificial intelligence, autonomous vehicles, etc.). I guess the 1080 Ti would be the better deal in that case due to memory and performance.

Posted on 2018-10-10 16:18:09

Nowadays, the RTX 2070 has been released. Comparing it with the GTX 1080 Ti, which one should I choose for ML if the price difference is about $120?

Posted on 2018-10-25 09:09:56
Donald Kinghorn

That is a good question ... I just grabbed an RTX 2070 ... I'll be putting it in my system in a few minutes ... I've got some more testing that will be posted tomorrow or Monday using 2 RTX 2080Ti's ... I probably won't include the 2070 in that so I'll just post a couple of numbers here. I would like to do some 4 x multi-GPU testing with all of the new cards. That will probably happen in a couple of weeks ... check back here in a bit for some single 2070 numbers ...

...see the comment above this for the results...

Posted on 2018-10-25 20:10:58

One thing to note with the RTX 2070 is that it does not support NVLink like the higher-end RTX cards. Neither does the older 1080 Ti, of course, and if you just want a single card it won't matter - but if you have any interest in multi-GPU stuff down the road, you might want to spend a little more now so that you have the option of utilizing NVLink later on.

Posted on 2018-10-25 21:03:31
Donald Kinghorn

OK, ran the same jobs on the RTX2070 ...

For the CNN (ResNet-50): fp32 192 images/sec, fp16 280 images/sec.

For the LSTM with batch size 256: 4740 words/sec.

It looks pretty good to me. That is close to the 1080Ti performance. The biggest drawback is the memory limit of 8GB. That's why the LSTM job on the 1080Ti is better: it used a batch size of 448, and that makes a performance difference for that job. The nice thing is that you can use fp16 on it if you can get away with that.

I'm sure I'll do multi-GPU testing including the 2070. It could be a good multi-card option for code that scales well.

Posted on 2018-10-25 21:43:19


Hello, thanks for the information. Any idea if pytorch works with RTX 2080?

Posted on 2018-11-20 23:01:38
Donald Kinghorn

There shouldn't be any problems, but I didn't get any PyTorch testing done while I had the cards. I know PyTorch is great on Volta, i.e. the Titan V I've been using, and that has most of the compute related stuff found on the 20xx cards. In any case NVIDIA has been really good about backward compatibility for several years. (They learned their lesson going from Fermi to Kepler.)

I should be getting back to working with PyTorch again soon ... I really like it and have some projects to work on.

Posted on 2018-11-26 02:45:01
Yin Tianwei

Hi, you can see my last discussion: mxnet with cuda10 can get about 395 samples/second at maximum settings. I also tried the pytorch version (I use the code here https://github.com/u39kun/d... ) but the result is quite ordinary and fp16 doesn't have much improvement over fp32. I think the problem is that it is still hard to write a correct fp16 training script for pytorch. Additionally, I personally think that mxnet may be the fastest framework.

Posted on 2018-12-01 08:40:31
Yin Tianwei

I also tried the RTX 2080 with mxnet and cuda10. The result for ResNet-50 training is about 395 samples/second with maximum fan speed. I think this partially comes from the better mixed-precision implementation and computation efficiency of mxnet. On the other hand, I use an overclocked card and notice the frequency comes up to about 1900. As I only have one GPU and can't do anything while the gpu is training (I use AMD, which doesn't have an in-CPU GPU), I would consider getting another 2070 for basic work. I wonder what 2070 brand you are using, and at what frequency do you get the 280 result? (Did you get an OC version GPU like gigabyte or evga?)

Posted on 2018-12-01 08:35:35
Donald Kinghorn

The cards I was using were NVIDIA reference cards. Those are usually made by PNY, who make the Quadro cards too. Their stuff is generally really good! I also like the EVGA cards a lot (and have liked them for many years!). I try to avoid overclocked cards as much as possible since they used to be more likely to fail under load. I don't really think that is much of a problem anymore though. The 900, 1000, and hopefully 2000 series cards have been reliable even when overclocked.

I've been wanting to explore MXNet since I've heard good things about its performance and usability. In general, framework performance just keeps getting better.

Posted on 2018-12-03 18:33:33
Donald Kinghorn

Looks like NVIDIA announced the Titan RTX at NIPS. I'll be testing as soon as we can get one. Will probably have to borrow from a YouTube reviewer again :-)

Posted on 2018-12-03 17:12:21
Donald Kinghorn

Titan RTX is 24GB and $2500 ... Fantastic! Just in time for a new project I'll be helping some researchers with at U of Arizona

Posted on 2018-12-03 17:27:17

But it seems FP64 performance of the Titan RTX is only slightly better than 2080Ti. Far below the Titan V, right?

Posted on 2018-12-20 18:09:07
Donald Kinghorn

Yes, unfortunately! The theoretical perf is fp32 -> 12442 GFLOPS, fp64 -> 389 GFLOPS, a 1/32 ratio. Ugggg! I don't think this is an artificial "crippling" as there has been in some older cards, but is actually a design limitation/trade-off??? I don't expect to see a T100 Tesla with a 1/2 ratio like the V100 and Titan V.
That's actually a big disappointment to me personally because I often need fp64 for "new" code. I do like the 24GB memory! That is a big deal. It will be a great card for most ML/AI kinds of stuff and most code that was designed with 32-bit floating point as a consideration. AND that covers a lot of programs that have been written since CUDA has been available. We didn't really have good/great fp64 until Volta. Many programs work around that successfully with careful algorithms and mixed precision methods, using 64-bit on the CPU side when it's needed.

The biggest problem with the lack of good fp64 perf is when you are trying to work up new code that needs the accuracy, or you are trying to do a quick port of some CPU code. My recent example of that is the QM code I did with PyTorch, which was very quick to work up, and at first it would only run with fp64. I was later able to stabilize it enough to work in fp32, but then it quickly ran out of precision because of the very high accuracy I was getting with the calculations. You can't get 12 digits of accuracy if your precision will only give you 7 :-) I was running the code purely on GPU.
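The 1/32 ratio mentioned above follows directly from the quoted theoretical peaks; a quick check:

```python
# Titan RTX theoretical peak numbers quoted above
fp32_gflops = 12442
fp64_gflops = 389
print(f"fp64 : fp32 = 1/{fp32_gflops / fp64_gflops:.0f}")  # 1/32
```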

I'm going to copy your comment and my response at the end of the comments so it's more visible ... this is pretty important in my opinion.

Posted on 2018-12-21 16:36:41

I see a Titan V 32GB coming out in the near future...

Posted on 2019-01-02 15:51:01

There are technically Titan Vs with 32GB out there - it's the CEO Edition, which were handed out by Huang himself.

Definitely not impossible.

Posted on 2019-01-15 09:27:12

I see a retail Titan V 32GB available for order on Nvidia website out in the near future....

Posted on 2019-01-15 12:29:19

Unfortunately, I don't think that will happen. The CEO Edition cards seem to have been limited to 20 that were given away (not even sold?!?) about 6-7 months ago, and there has been no further mention of them or a similar retail card since.

It is a bummer, but I bet it would have cut into sales of cards like the Quadro GV100 - which is based on the same tech as the Titan V, and sports 32GB of memory. You can buy them if you want something like this, but they will set you back a cool $9k each :/

Posted on 2019-01-15 17:15:07

No thanks to the Quadro GV100, but I do hope they at least replace the Titan Xp with a 12GB Titan V this year at/near the same Titan Xp price.
The chance of this happening is very low, as Nvidia has no real competition from AMD at this end of the market...

Posted on 2019-01-17 00:23:51

Happy New Year!!!

Yes, you need to get on that Titan RTX asap... LOL!!!
I think the Titan RTX is a good deal, but only for a max of 2 cards in your workstation/GPU server... not good.
If you could give some recommendations for water cooling, then this would be the best card, but as it stands quad Titan Xp still looks like it would deliver great price/performance in FP32, where I think the majority of HPC/AI/deep learning still runs.
I definitely could max out the 24GB per card, but I need 4 cards for production. The Titan V looks great too, but its price/performance is starting to decrease rapidly!!
Oh, the decisions I have to make this year... a watercooled Titan RTX looks like the real winner for my apps; otherwise I may still stick with getting 3 more Titan Xp's.
Would love to hear from you and others regarding this: 2080 Ti vs Titan V vs Titan RTX vs Titan Xp

Posted on 2019-01-02 15:49:22

Nice analysis. But honestly, I think for deep learning training people spend most of their time exploring topology options and tweaking which portions of pre-trained models to lock in or train. They're not running straight-up ResNet non-stop (at least that's the case for me). Moreover, doing training at FP16 is still a bit tricky when messing with the topology.
Right now you can get lightly used 1080s on eBay for ~$350, so for the price of a 2080 Ti or Titan V you can get about six 1080 cards and easily get 2x the total throughput of the Titan V (at FP16) to try out multiple network topology variations at the same time. The downside is that once you find an optimal network topology, you'll have to leave it running longer for fine tuning. So I don't know - I guess it depends on what type of work you're doing.
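
On the "FP16 is tricky" point: the usual workaround is loss scaling, because small gradients underflow to zero in half precision. A pure-Python sketch of the idea (the scale factor and gradient value are just example numbers, not from any particular framework; `struct`'s half-precision format stands in for hardware fp16):

```python
import struct

def fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision."""
    return struct.unpack('e', struct.pack('e', x))[0]

grad = 1e-8                  # a small but meaningful gradient value
print(fp16(grad))            # 0.0 -- underflows: fp16 bottoms out near 6e-8

scale = 2.0 ** 16            # example loss-scale factor
scaled = fp16(grad * scale)  # scaled value is well inside fp16's range
recovered = scaled / scale   # unscale in fp32/fp64 before the weight update
print(abs(recovered - grad) < 1e-10)  # True -- the gradient survives
```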

Posted on 2018-12-06 08:02:27
Donald Kinghorn

You are correct, of course... I'm not a big fan of fp16, but it helps to bring some workflows or production code closer to real time. For me personally, the algorithms I usually work with require fp64, so the Titan V is fantastic for me, but I can't afford it (thankfully I have one that I can borrow). Picking up used 1080 Ti's is a great idea; those are wonderful cards, and newer stuff being available doesn't diminish that.

On the other hand, for a lot of our customers the cost of compute hardware is inconsequential: if it saves any human time or just gets to better results faster, then it's worth it. It's all relative! That RTX Titan with 24GB is 3.5 times less expensive than the similar Quadro RTX 6000.

In general I'm like a kid in a candy store. I've been doing scientific computing one-way-or-another for over 30 years, and the CPU's, GPU's, memory... that we have available now is light years ahead of what I started with... I'm constantly blown away by the performance.
Best wishes!

Posted on 2018-12-06 17:19:05

I agree with every word!!!
When I first started running my Monte Carlo simulations on my brand new Threadripper PC, it took hours.
Now that I have a 1080 Ti and a Titan Xp GPU, it takes less than 3 seconds!!! Absolutely bonkers!!!
But the amount of money to get it to less than one second is starting to creep up, so hurry up and do some more testing on the Titan RTX.
I am now looking into implementing an AI-based trading algorithm, but it needs to execute in under 1 second once trained.
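
For anyone wondering why Monte Carlo gets such a huge GPU speedup: every sample is independent, so the work parallelizes trivially. A toy single-threaded sketch of the structure (estimating pi - just an illustration, not my actual simulation); on a GPU, each loop iteration becomes its own thread:

```python
import random

def mc_pi(n_samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling random points in the unit square.
    Each sample is independent -- this loop is exactly the kind of
    work a GPU runs across thousands of threads at once."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / n_samples

est = mc_pi(100_000)
print(est)  # close to 3.14; the error shrinks like 1/sqrt(n)
```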

Have you looked at Microsoft Infer.net and inferencing using probabilistic programming??
Looks interesting as it may apply to more fields where you already have a good idea of the predictive model to use...

Posted on 2019-01-02 16:04:01
Donald Kinghorn

OK the probabilistic programming thing got me interested :-) ... I'll have a look at that

Posted on 2019-01-03 22:03:41

Thanks Don

The idea of training for days is crazy to me!!!
The market changes every second and does not repeat at all, so of course I am skeptical about using TensorFlow for any kind of real-time trading.
I am trying to understand this probabilistic programming, but I am crazy busy every day!!! (I know... no excuse!!)

You are doing an excellent job of testing these GPU's for the performance limits encountered in HPC/AI and deep learning.
I don't know of any other website with such useful, practical information on maximizing performance on all of this new high-tech GPU-based hardware...

Posted on 2019-01-03 23:43:35
Donald Kinghorn

I'm copying a comment from 'mazo' and my response here so it's more visible ... this is important in my opinion. (Note: I have not tested the RTX Titan yet)

... "But it seems FP64 performance of the Titan RTX is only slightly better than 2080Ti. Far below the Titan V, right?"

Yes, unfortunately! The reported theoretical performance is fp32 -> 12442 GFLOPS, fp64 -> 389 GFLOPS, a 1/32 ratio... ugggg! I don't think this is an artificial "crippling" as there has been in some older cards; it seems to be an actual design limitation/trade-off. I don't expect to see a T100 Tesla with a 1/2 ratio like the V100 and Titan V (and the Pascal Tesla P100... not GTX Pascal). That's actually a disappointment to me personally because I sometimes need fp64 for "new" code.

I do like the 24GB memory! That is a big deal. It will be a great card for most ML/AI kinds of stuff and most code that has been designed with 32-bit floating point in mind. AND that covers a lot of programs that have been written since CUDA has been available. We didn't really have good/great fp64 until Volta and the P100. Many programs work around that successfully, with careful algorithms and mixed-precision methods using 64-bit on the CPU side when it's needed.

The biggest problem with the lack of good fp64 perf is when you are trying to work up new code that needs the accuracy, or you are trying to do a quick port of some CPU code. My recent example of that is the QM code I did with PyTorch, which was very quick to work up and at first would only run with fp64. I was later able to stabilize it enough to work in fp32, but then it quickly ran out of precision because of the very high accuracy I was getting with the calculations. You can't get 12 digits of accuracy if your precision will only give you 7 :-) I was running the code purely on the GPU.

Posted on 2018-12-21 17:03:39

Very interesting. Thanks a lot for your detailed answer!

Posted on 2018-12-23 16:41:32

Hello Donald, thank you for this review. Were you lucky enough to get a Radeon VII? Any thoughts or hearsay on its value for TensorFlow?

Posted on 2019-03-01 22:33:18
Donald Kinghorn

AMD didn't send us any... In general AMD GPU's are not well supported with the various ML frameworks. There is some support (mostly unofficial) using OpenCL but the thing that looks the most promising to me is ROCm. I do want to try that.

Posted on 2019-03-04 16:21:39

I'm new to ML but have paid my dues in the traditional programming world. I have been playing with TensorFlow via ROCm on a Vega 64 and a Radeon VII. TensorFlow seems to work well, in the sense that none of the tutorials and examples I've tried crashes. Caffe compiles but fails most of its unit tests. But I'm too green to say more.

ROCm as a substitute for CUDA for HPC is pretty solid, though of course NVIDIA has an enormous lead. AMD seems to be concentrating on TensorFlow and maybe Theano for ML support.

Posted on 2019-03-04 19:21:22
Donald Kinghorn

Thanks, it's good to hear from someone who has had some success with ROCm. I had the pleasure of talking with the principal dev at SC17 when he was just getting it going. It looked like a good project. I'm not surprised about TensorFlow, since it's the 800-lb gorilla in this space. (I'm spending time with PyTorch and really liking it, but I don't expect it to be running on anything other than NVIDIA for a while.)

I was at the office today doing multi-GPU testing, running the same (but updated) jobs I did in this post. I'll have results for 1-4 GPU's for as much stuff as I can stand testing :-)

A Radeon VII showed up today. I probably won't get my hands on it for a while, though.

Posted on 2019-03-06 02:49:23

Ah, excellent! I think you will like the VII.

Posted on 2019-03-06 03:35:46

Newbie with ML here. Just spun up Anaconda 5.3/Keras mnist_cnn on an RTX "Turing". Compared side-by-side with a "Pascal", I see your 50% improvement in training time. However, I see no discernible difference in evaluation time. As far as I can tell, you haven't shown any evaluation results in your testing. Is it expected that the RTX cores improve training time but don't affect actual evaluation?

Posted on 2019-05-15 16:23:50
Donald Kinghorn

Welcome to the ML/AI world!

I benchmark with training, which is forward and backward (back-propagation). The heaviest compute is on the backward pass, computing gradients and such. I generally use reasonably complex models. I think ResNet-50 with synthetic data is a pretty good benchmark. I like the Big-LSTM too, because it uses a 2GB real data set. These two together expose a fair number of differences between hardware setups. (They are both easy to get working with NGC and docker too, which makes life a lot easier.)

MNIST is a relatively small data set of just 28 x 28 pixel grayscale images. It's the "hello world" of ML/AI. It's not enough "work" to be a good benchmark with most models. The evaluation time will be swamped by basic start-up time, which will be nearly the same on any hardware. This is especially true on the forward pass, i.e. inference.

In general I don't care that much about inference (forward evaluation) benchmarks, because in the real world that is what you deploy on "hyperscale" hardware OR you simplify enough to run on a smartphone. As a developer you care most about getting a good model to start with, and that's where I focus performance testing.
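
To make the forward/backward distinction concrete, here's a toy single-weight model in plain Python (purely illustrative - nothing like the real benchmark models): inference runs only the forward function, while each training step also computes gradients, which is where the extra work goes.

```python
def forward(w: float, b: float, x: float) -> float:
    """Inference: just the forward pass."""
    return w * x + b

def backward(w: float, b: float, x: float, y: float):
    """Training also needs gradients of the squared-error loss,
    dL/dw and dL/db -- extra compute the forward pass never does."""
    d = forward(w, b, x) - y
    return 2.0 * d * x, 2.0 * d

# One SGD training step on one sample:
w, b, lr = 0.5, 0.0, 0.1
x, y = 2.0, 3.0
gw, gb = backward(w, b, x, y)      # gradients: (-8.0, -4.0)
w, b = w - lr * gw, b - lr * gb    # update moves the prediction toward y
```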

Posted on 2019-05-16 15:54:07