47 Comments

  • CajunArson - Tuesday, March 17, 2015 - link

    "Deep Learning??"

    [Double Face Palm]

    Well it's good to see that both Nvidia and AMD like to spew empty marketing buzzwords during their presentations.
  • nandnandnand - Tuesday, March 17, 2015 - link

    Quadruple face palm?
  • maximumGPU - Tuesday, March 17, 2015 - link

    I don't get it; these aren't exactly marketing buzzwords. Deep learning is a field of machine learning and has many applications. NVIDIA is just telling the world that GPUs are better at it than CPUs, so you're better off buying them.
  • alfalfacat - Tuesday, March 17, 2015 - link

    Exactly. Deep learning has been around in CS research for decades, and most machine learning techniques are, in some way or another, deep learning. It just means that it's a neural network with many layers. It certainly isn't an empty marketing buzzword.
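A "neural network with many layers" can be made concrete with a tiny sketch. This is a hypothetical forward pass in NumPy with arbitrary layer sizes and random placeholder weights, just to illustrate what "deep" means; it is not trained on anything:

```python
import numpy as np

def relu(x):
    """Elementwise rectified linear unit, a common layer nonlinearity."""
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)

# "Deep" just means several stacked layers: input -> 3 hidden -> output.
# Sizes and weights here are arbitrary placeholders.
sizes = [8, 16, 16, 16, 4]
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    # Hidden layers: affine transform followed by a nonlinearity.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    # Linear output layer.
    return x @ weights[-1] + biases[-1]

out = forward(rng.standard_normal(8))
print(out.shape)  # (4,)
```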
  • grrrgrrr - Tuesday, March 17, 2015 - link

    Deep learning is, if anything, the next generation of computing.

    We are shifting from Turing machines to recursive neural networks, baby.
  • Refuge - Tuesday, March 17, 2015 - link

    Digital Jazz baby!
  • name99 - Tuesday, March 17, 2015 - link

    It's not really buzzwords. Take something like driving.
    To drive well (or even acceptably) you need some sort of "theory of mind" for the other cars on the road. For example, if you're driving down a street and see a car move in a certain way, you DON'T just react to how the car is moving; you understand (through your theory of mind) that the GOAL of the car you've seen is to try to parallel park in the spot ahead of you, AND that parallel parking is done in a certain way AND that lots of people struggle with this and have to take two or three tries.

    And because you understand all that, you slow down in a certain way, and look out for certain behavior.

    Likewise for your car AI. Even after the car has learned about physics and avoiding collisions and so on, to do a really good job, it needs this sort of similar theory of mind, to understand how people behave when they are trying to park, to understand the GOALs of other entities on the road, and how they will behave in the future (not just at this immediate second) based on those goals.
  • blanarahul - Tuesday, March 17, 2015 - link

    I really wasn't expecting any FP64 performance anyway. There's only so much that can be done with 28 NM.
  • axien86 - Tuesday, March 17, 2015 - link

    Just 30% more performance than the GTX 980 and little to no FP64 capability means NVIDIA's Maxwells are starting to run on fumes for the next two years or so.

    By comparison, the new AMD 390X series, with 4096 cores and 4096-bit HBM, will clearly dominate PC gaming AND graphics for a long time.
  • JarredWalton - Tuesday, March 17, 2015 - link

    I'd wager we'll see a compute-optimized GM200 from NVIDIA in Tesla/Quadro cards at some point; NVIDIA just doesn't want to cut into those margins is my bet. Both Quadro and Tesla are long overdue for an upgrade. Of course, maybe part of the "power optimizations" for Maxwell 2.0 involved removing all hardware-level support for FP64, and that's why we haven't seen a new Tesla/Quadro on GM2x0? I haven't seen anything explicitly stating that's the case, but it's certainly possible.
  • nathanddrews - Tuesday, March 17, 2015 - link

    It makes sense. If NVIDIA's CUDA customers are primarily universities, governments, and large corporations, then why bother making a "budget" DP card like the Titan X? They are clearly selling enough Tesla cards to make it worthwhile to strip down the Titan brand.

    I wonder how many - if any - CUDA programmers are using Swan or other conversion methods to OpenCL? Also, what sort of performance difference would there be between CUDA on Titan X and that converted code on 390X?
  • blanarahul - Tuesday, March 17, 2015 - link

    "Of course, maybe part of the 'power optimizations' for Maxwell 2.0 involved removing all hardware level support for FP64"

    Then why keep support for that 0.2 TFLOPS of FP64 performance? Unless it's separate from the majority of SMMs, that is.
  • JarredWalton - Tuesday, March 17, 2015 - link

    You can always emulate FP64 with software, which is why it's 1/32 -- you basically use 32X as many FP32 instructions emulating FP64 compared to doing it natively. CPUs, GPUs, whatever -- they can all do FP64, but for some of them it's very slow.
  • hammer256 - Tuesday, March 17, 2015 - link

    Can you actually use FP32 instructions to emulate FP64 operations? I would imagine that to emulate FP64 operations you'd need to be using integer and logical ops. Does anyone know anything about this?
    But yeah, there goes that FP64 performance... that's 28nm for you. 14/16nm FinFET can't come soon enough :(
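On hammer256's question: extra precision can in fact be squeezed out of FP32 pairs without integer ops, via compensated "double-float" arithmetic such as Knuth's TwoSum. Below is a minimal sketch simulating float32 in NumPy; it illustrates the general technique, not anything NVIDIA's hardware or compiler actually does:

```python
import numpy as np

f32 = np.float32

def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) such that s = fl(a + b) and
    a + b == s + e exactly, using only FP32 adds and subtracts."""
    s = f32(a) + f32(b)
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

a, b = f32(1.0), f32(1e-8)
hi, lo = two_sum(a, b)

# The naive FP32 sum loses the small term entirely...
print(a + b)                   # 1.0
# ...but the (hi, lo) pair preserves it.
print(float(hi) + float(lo))   # ~1.00000001
```

Full double-float multiplication and division need more machinery (TwoProduct, FMA), which is why software emulation is so much slower than native FP64 hardware.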
  • JarredWalton - Tuesday, March 17, 2015 - link

    You may be right -- it could be they're doing FP64 using lots of INT operations. It's been a while since I looked at doing any of this so I'm not up to speed.
  • JarredWalton - Tuesday, March 17, 2015 - link

    And now I've read the full review and understand that there is native FP64 hardware. I do wonder how hard it would be (performance wise) to emulate FP64 using other calculations, but most likely it would be even slower than the 1/32 ratio. Oops.

    [Goes and wipes egg off face...]
  • Ryan Smith - Tuesday, March 17, 2015 - link

    GM200 has 1 FP64 ALU for every 32 FP32 ALUs. This is the case for all Maxwell GPUs.
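Ryan's 1:32 ratio lines up with the theoretical throughput figures. A quick back-of-the-envelope check, assuming the Titan X's roughly 1 GHz base clock and 2 FLOPs per ALU per clock via fused multiply-add:

```python
# GM200 / Titan X back-of-the-envelope throughput, per Ryan's ratio.
smm_count = 24           # SMMs on a full GM200
fp32_per_smm = 128       # FP32 CUDA cores per SMM
fp64_per_smm = 4         # dedicated FP64 ALUs per SMM

cores_fp32 = smm_count * fp32_per_smm    # 3072
cores_fp64 = smm_count * fp64_per_smm    # 96

clock_ghz = 1.0          # approximate base clock (assumption)
flops_per_alu = 2        # one fused multiply-add per clock

fp32_tflops = cores_fp32 * flops_per_alu * clock_ghz / 1000
fp64_tflops = cores_fp64 * flops_per_alu * clock_ghz / 1000

print(cores_fp32 // cores_fp64)  # 32, i.e. the 1/32 FP64 rate
print(fp32_tflops)               # ~6.1 TFLOPS FP32
print(fp64_tflops)               # ~0.19 TFLOPS FP64
```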
  • Kevin G - Tuesday, March 17, 2015 - link

    I'm actually surprised that NVIDIA hasn't made more use of the GK210 chip. They quietly announced it late last year after the GTX 980. I took it as an indication it'd carry the double precision performance banner for months to come, and with GM200's weak DP performance, my prediction came true.

    Now the second part of my prediction is that we'd see a Quadro (K6200?) based upon GK210. That hasn't panned out yet and the Quadro lineup may just go with the GM200 after all. Pictures of a Quadro M6000 have been floating around the past couple of days.

    Ultimately the reason to remove much of the DP hardware from GM200 comes down to die size. It is already a huge chip and beefing up its DP throughput would balloon its size even more.
  • MrSpadge - Tuesday, March 17, 2015 - link

    Quadro needs the FP32 and other Maxwell improvements more than FP64.
  • testbug00 - Tuesday, March 17, 2015 - link

    "compute optimized" means what exactly? The only thing Nvidia would need to do is to not disable the FP64 units. Given that they have any on the die.
  • JarredWalton - Tuesday, March 17, 2015 - link

    Well, that's the $5000 question: does GM200 actually have FP64 capability that is simply turned off on Titan X, or did they totally remove all native FP64 hardware support from GM2x0? We haven't seen any Quadro/Tesla products based on GM2x0 yet, and the only Maxwell parts used in Quadro so far are built off GM107 (e.g. Quadro K2200). But there are rumors of the Quadro M6000:
    http://wccftech.com/quadro-m6000-flagship-professi...
  • JarredWalton - Tuesday, March 17, 2015 - link

    Update: in the Titan X review Ryan states that the FP64 is four FP64 units per SMM, compared to 128 FP32 units. So unless NVIDIA is being coy, the rumors of Quadro M6000 are probably just that: rumors. NVIDIA would need a new die if they wanted to have high FP64 performance. But again, NVIDIA isn't always particularly forthcoming when it comes to details like what is present but disabled, so there's still a (small) chance.
  • Yojimbo - Tuesday, March 17, 2015 - link

    Seems the M6000 is real: http://www.slashgear.com/nvidia-quadro-m6000-detai...
    The K5200 and K6000 are the only Quadro cards in their current lineup that looked to me to have fast FP64 performance. The name M6000 seems odd, though: previous Maxwell-based Quadros were released under the "K" naming scheme, the higher-numbered parts seem to be the ones with strong FP64 performance, and "M6000" makes it look like it replaces the K6000. I have no idea what the demand for FP64 performance is with Quadros, though. Maybe it's just a small sector?
  • JarredWalton - Tuesday, March 17, 2015 - link

    That's true... wouldn't be surprised to see M6000 have poor FP64 performance and be billed as an FP32 optimized "professional" GPU. But unless I'm mistaken, most of the compute people really need FP64, right? It will be interesting to see the whole story when/if M6000 launches. And we still need a Tesla update. Maybe it will be:

    GeForce: Gaming GPUs
    Quadro: FP32 with optimized OpenGL drivers
    Tesla: FP64 compute monsters
  • JlHADJOE - Tuesday, March 17, 2015 - link

    Could also be why they iterated Big Kepler into GK210 for the Tesla K80. I'm guessing we might see Kepler hang on for a while longer in the Quadro/Tesla market.
  • smilingcrow - Tuesday, March 17, 2015 - link

    "Maxwells are starting to run on fumes for the next two years or so."
    "AMD with HBM will clearly dominate PC gaming AND graphics for a long time."

    I think nVidia have a new architecture due next year which 'potentially' addresses both of your fears.
  • name99 - Tuesday, March 17, 2015 - link

    You obviously have your interests and concerns, and that is fine. But don't make the mistake of imagining that the world, and even the GPU companies, share your interests and concerns.

    Games DROVE (PAST TENSE) GPUs, but obsessive gamers are a tiny market compared to mainstream, mobile, auto, maybe even accelerators for HPC and enterprise. Gamers will pick up by default whatever advances are made for those other markets, but they are not going to be the focus of R&D by nV or ATI (or Intel or Imagination or ...)
  • Morawka - Tuesday, March 17, 2015 - link

    Pascal is due for release July this year by those charts.

    AMD may have a 2-3 month lead on memory with the 390x but not years lol
  • AnnihilatorX - Wednesday, March 18, 2015 - link

    No way, I checked the chart. Pascal definitely won't be released this year, 2016 probably.
  • Taneli - Tuesday, March 17, 2015 - link

    Maybe the lack of FP64 performance with Maxwell offers AMD the possibility to gain market share in HPC?
  • psyq321 - Tuesday, March 17, 2015 - link

    I am quite sure NVIDIA will follow up with a full-FP64 SKU for their Tesla line of products. I doubt they would design "big Maxwell" just for the highest-end enthusiast market. The size of that market is literally nothing compared to the scientific / HPC / finance markets, which have no problem shelling out $5K per card in batches of dozens or hundreds per order.

    I do hope AMD gains market share, though - simply because more competition is good for the consumer. If NVIDIA remains being Intel of the GPGPU market, it is going to become the same kind of market as mature x86 market, with mobile/entry desktop on one extreme and four-digit enterprise range on the other extreme (soon five probably for the highest end Xeon EX 8xxx v5 or whatever).
  • ExarKun333 - Tuesday, March 17, 2015 - link

    Interestingly enough, NV's own slides don't show Tesla getting updated until Pascal. It doesn't seem that Maxwell is getting DP at all. Not set in stone, but that was the word just last month. Not surprising this Titan doesn't have it...
  • psyq321 - Tuesday, March 17, 2015 - link

    I think it is simply down to the process - GM200 is simply exhausting what the TSMC 28nm process can offer. Even without FP64, the die is >huge<.

    Maybe NVIDIA will retire Tesla as a brand (though this would sound to me like a strange decision taking into account Tesla's brand recognition in HPC field), but they would be crazy to abandon the HPC and enterprise markets which require FP64.

    I would rather suspect that GM200 is, simply, a short-term answer to the AMD R390X -- a mere necessity due to the market situation. Without that, I doubt NVIDIA would even release "Big Maxwell" as a consumer product before doing a process shrink and launching it as a professional product first. This "high-end enthusiast" market is very small; this is the same reason why Intel is simply wrapping Xeon EP rejects as "-E" HEDT "enthusiast" parts -- it is merely a by-product of the R&D done for the professional markets where the margins and volumes are.

    After the process shrink, I am almost 100% sure NVIDIA will release GM210 or whatever, which will have proper FP64 support and feature in the Tesla products. GM210 might even be exclusively reserved for the pro/enterprise market in the same way GK210 was.

    Kepler-based K80, with 24 GB of RAM is still quite competitive if one needs such power and FP64 support and I suppose NVIDIA launched GK210 late last year precisely because they would need to wait for a process shrink before "Big Maxwell" will be ready for the pro market.

    While this situation with GM200 launch is clearly suboptimal for NVIDIA, since pro market commands the highest margins and volumes, I suppose they could not avoid this situation due to the 16nm process maturity and availability from TSMC.

    I suppose the best that could happen for the consumer (consumer as in non-pro market) would be some kind of Maxwell version of the 780 Ti -- maybe a 980 Ti or whatever -- something which would give Titan X performance but for the $500-$600 mark. I suppose at this very moment NVIDIA has no reason to release such a product, and the fate of such a part depends firmly on AMD's next-gen performance.
  • testbug00 - Tuesday, March 17, 2015 - link

    50% more CUDA cores and a 50% larger bus shouldn't take ~43% more die area, given a 600mm^2 die. Either NVIDIA didn't max out the die size, NVIDIA spread out the transistors to lower power usage, or NVIDIA disabled the units.

    I think the latter 2 are the most likely, I hope that it's the 2nd, not the 3rd option. While I don't like Nvidia's pricing, being able to get access to FP64 CUDA high performance for less than 4-5 thousand dollars is good.
  • Yojimbo - Tuesday, March 17, 2015 - link

    There already are Maxwell-based Quadro cards, and many of the Kepler-based Quadros have 1/24 double precision.
  • hammer256 - Tuesday, March 17, 2015 - link

    If AMD is also using 28nm, there isn't a whole lot that can be done, I would imagine. I would also guess that FP64 units are going to take more transistors than their equivalent FP32 units (if both are running at the same IPC). Argh, we need smaller processes.
  • testbug00 - Tuesday, March 17, 2015 - link

    GCN is good at FP64 to start with; you don't need dedicated units. That's what NVIDIA used in Kepler GK110, and, well... personally, I assumed they would use them in GM200.

    The 390(X) silicon will have an FP64 rate of 1/2. AMD will probably lock it down to 1/8, like Hawaii.

    NVIDIA's architectural choices allow for other "advantages," but personally I think a huge GPU like this really needs FP64 to sell at high margins and pay for itself. But I assume NVIDIA does far more market research than me when it comes to this kind of stuff.
  • Yojimbo - Tuesday, March 17, 2015 - link

    I don't think AMD has the same level of support in terms of compute-oriented development tools as NVIDIA has. Also, AMD hasn't released any new architecture, so how will AMD gain market share? The old situation of NVIDIA having preferable offerings still exists.

    Plus, market share in HPC seems like it shouldn't shift around as quickly as gaming GPU market share. With gaming cards, one can drop in a new card from a competitor and gain benefits just as easily as with a new card from the same manufacturer as one's old card. But if one buys a new compute GPU with a different architecture, one must re-optimize one's code for that architecture.

    People aren't likely to switch to AMD even if AMD released a compute card in July that was superior to Tesla cards. Some time later, say 9 months after that, NVIDIA would be releasing Pascal and presumably take the lead again. It probably doesn't make sense to re-optimize all code for 9 months of better performance. Perhaps for new entrants it might make sense, but one is still more likely to choose the ecosystem which has demonstrated ambition and service in the sector.
  • eanazag - Tuesday, March 17, 2015 - link

    This live blog could use some iPhone panorama pictures. Nvidia has a wide format presentation.
  • mdriftmeyer - Tuesday, March 17, 2015 - link

    ``DIGITS DevBox: $15,000, available May 2015''

    Suddenly I'm back to 1996 and SGI is rolling out a new Octane System with a cheap shell.
  • p1esk - Tuesday, March 17, 2015 - link

    This price is unjustified. 4 cards ($4k), motherboard and CPU ($1k), 64GB RAM ($1k), 2x 1TB SSD ($1k), and a PSU ($500) come to around $7.5k, and the software is free open source.
    50% margin?
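Taking p1esk's component estimates at face value (they are the commenter's guesses, not NVIDIA's actual bill of materials), the implied margin works out as follows:

```python
# Hypothetical component costs from the comment above (USD).
parts = {
    "4x Titan X":        4000,
    "motherboard + CPU": 1000,
    "64GB RAM":          1000,
    "2x 1TB SSD":        1000,
    "power supply":       500,
}
cost = sum(parts.values())       # 7500
price = 15_000                   # DIGITS DevBox list price

gross_margin = (price - cost) / price
print(cost)          # 7500
print(gross_margin)  # 0.5, i.e. the "50% margin" in the comment
```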
  • r3loaded - Tuesday, March 17, 2015 - link

    Basically, Nvidia's being held back by TSMC taking its time getting its 20nm high performance process online.
  • testbug00 - Tuesday, March 17, 2015 - link

    No one is using 20nm for dGPUs currently, as far as anyone can tell. If they do use it, it will be for smaller GPUs.
  • JarredWalton - Tuesday, March 17, 2015 - link

    No one did a high performance 20nm node AFAIK -- they're all LP (Low Power) nodes for SoCs. I believe TSMC was planning an HP 20nm node but it wasn't really going to do any better than 28nm HP and so they abandoned it and will wait for 16nm HP.
  • JlHADJOE - Tuesday, March 17, 2015 - link

    Deep learning sounds like just what the doctor ordered for getting hours and hours of CCTV footage automatically analyzed, with the individuals and actions in it tagged for easy searching. Cities like London have had nearly ubiquitous recording for some time now, and this tool looks poised to make sense of that mountain of data.
  • deviceprogrammer - Tuesday, March 17, 2015 - link

    Well, the age of dark silicon, along with the death of Moore's Law (even for multicore), has truly arrived when hardware companies have to rely on software buzzwords and applications to justify their existence.
  • watersb - Wednesday, March 18, 2015 - link

    I'm just catching up now. Exceptional photos for the live blog this time. Not easy to cover this thing in real time. Thanks much!
