
Read this article at https://www.pugetsystems.com/guides/1637
Dr Donald Kinghorn (Scientific Computing Advisor)

How To Use MKL with AMD Ryzen and Threadripper CPU's (Effectively) for Python Numpy (And Other Applications)

Written on November 27, 2019 by Dr Donald Kinghorn

Introduction

In this post I'm going to show you a simple way to significantly speed up Python numpy compute performance on AMD CPUs when using Anaconda Python.

We will set a DEBUG environment variable for Intel MKL that forces it to use the AVX2 vector unit on AMD CPUs (this will work for other applications too, MATLAB for example) ... but please see the "BIG Caveat!" at the end of this post.

You may be wondering why this is an issue. In a recent post, "AMD Ryzen 3900X vs Intel Xeon 2175W Python numpy - MKL vs OpenBLAS", I showed how to use OpenBLAS instead and how bad performance was on AMD when using MKL. I also gave a bit of a history lesson explaining the long-running "Optimization" issue between AMD and Intel. The short story is that Intel checks for "Genuine Intel" CPUs when its numerical library MKL starts executing code. If it finds an Intel CPU then it follows an optimal code path for maximum performance on the hardware. If it finds an AMD processor it takes a code path that is only optimized to the old (ancient) SSE2 instruction level, i.e. it doesn't take advantage of the performance features on AMD, and performance will be several times slower than it "needs" to be.

The following paragraph is one of the most regretted things I've written ...

Maybe you're thinking that it's not "fair" for Intel to do that, but ... Intel has every right to do that! It IS their stuff. They worked hard and used a lot of resources to develop it. And, there IS some incompatibility at the highest (or lowest) levels of optimization for the hardware. MKL is insanely well optimized for Intel CPUs ... as it should be!

I honestly don't feel that way! This came up years ago when Intel first started marketing their compilers. I was outraged then and really, I still am. I think I was trying to "justify" it because I couldn't change it then and I can't change it now. The only thing I can do now is show people how to get around it! For everyone that takes offense to that paragraph, I understand and I regret saying it. Collectively we should continue the fight against that sort of corporate tactic. --dbk

Read the post listed above if you are interested in this old and ongoing issue.

In the next sections we'll look at performance results from a simple numpy matrix algebra problem. There will be results from the post that was linked above along with new results using the 24-core AMD Threadripper 3960x.

Test systems: AMD Threadripper 3960X, Ryzen 3900X and Intel Xeon 2175W

AMD Hardware

  • AMD Threadripper 3960x 24-core AVX2
  • Motherboard Gigabyte TRX40 AORUS EXTREME
  • Memory 8x DDR4-2933 16GB (128GB total)
  • 1TB Samsung 960 EVO NVMe M.2
  • NVIDIA RTX 2080Ti GPU
  • AMD Ryzen 3900X 12-core AVX2
  • Motherboard Gigabyte X570 AORUS ULTRA
  • Memory 4x DDR4-3200 16GB (64GB total)
  • 2TB Intel 660p NVMe M.2
  • NVIDIA RTX 2080Ti GPU

Intel Hardware

  • Intel Xeon-W 2175 14-core AVX512
  • ASUS C422 Pro SE (my personal workstation)
  • 128GB DDR4 2400 MHz Reg ECC memory
  • Samsung 960 EVO 1TB NVMe M.2
  • NVIDIA Titan V GPU

Software

  • Ubuntu 18.04
  • Anaconda Python build Anaconda3-2019.07-Linux-x86_64
  • numpy 1.16.4
  • mkl 2019.4
  • libopenblas 0.3.6

Notes:

  • OpenBLAS is an excellent open source BLAS library based on the highly regarded work originally done by Kazushige Goto.
  • OpenBLAS does not currently have optimizations for AVX512 (It does include AVX2 optimizations)

Using MKL_DEBUG_CPU_TYPE=5 with AMD CPUs

The environment variable above is the "new secret way" to fool MKL into using an AVX2 optimization level on AMD CPU's. This environment variable has been available for years but it is not documented. PLEASE SEE THE CAVEAT IN THE CONCLUSION!

I seem to remember this from long ago with Opteron?? In any case it has been making the rounds on forums recently as a solution for getting MATLAB to perform better on AMD CPUs (other use cases too). This should work for any application that makes calls to the MKL runtime library. I believe it forces MKL to take the Haswell/Broadwell code path, which gives an optimization level that includes AVX2. By default, MKL looks for "Genuine Intel" and if it doesn't find that it drops to a code path only optimized to the SSE2 instruction level, i.e. no modern hardware optimizations.

On Linux you would set this environment variable in your working shell or add it to .bashrc

export MKL_DEBUG_CPU_TYPE=5

In a Jupyter notebook cell, note that !export MKL_DEBUG_CPU_TYPE=5 only sets the variable in a throw-away subshell, not in the kernel itself. Use the %env magic instead,

%env MKL_DEBUG_CPU_TYPE=5

On Windows 10 you could set this in (Anaconda) PowerShell as,

$Env:MKL_DEBUG_CPU_TYPE=5

or, you could set it in a Jupyter notebook cell with the same %env magic as on Linux.
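For a plain Python script, the variable can also be set from inside the process, as long as that happens before numpy (and therefore MKL) is first imported; MKL reads it when the library loads, so setting it afterwards has no effect. A minimal sketch:

```python
import os

# MKL reads MKL_DEBUG_CPU_TYPE when the library is first loaded,
# so set it before the first "import numpy"
os.environ["MKL_DEBUG_CPU_TYPE"] = "5"

# ... only now import numpy (or anything else that pulls in MKL)
```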

You can also set this in System in Control Panel (Advanced tab or the Advanced System Settings item),


Threadripper 3960X, Ryzen 3900X and Xeon 2175W performance using MKL, MKL_DEBUG_CPU_TYPE=5 and OpenBLAS for a Python numpy "norm of matrix product" calculation

numpy is the most commonly used numerical computing package in Python. The calculation presented in this testing is simple but computationally intensive. It takes advantage of the BLAS library that gives numpy its great performance. In this case we will use Anaconda Python with "envs" set up for numpy linked with Intel MKL (the default) and with OpenBLAS (described in the next section).
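The Python code used for the testing is in the linked post; as a rough sketch of what a "norm of matrix product" benchmark looks like (the matrix size below is an illustrative guess, not the exact figure from the original test):

```python
import time

import numpy as np

n = 2000  # illustrative size; the real test is similarly DGEMM-bound
A = np.random.rand(n, n)

start = time.time()
norm = np.linalg.norm(A @ A)  # dense matrix product, then Frobenius norm
elapsed = time.time() - start

print("took", elapsed, "seconds")
print("norm =", norm)
```

Nearly all of the runtime is in the matrix product, which numpy dispatches to the BLAS DGEMM routine, so this effectively measures the BLAS library itself.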


[Chart: numpy Ryzen 3900X vs Xeon 2175W, MKL vs OpenBLAS]

Look at those results and think about them for a while ... The standout features are,

  • The best result in the chart is for the TR 3960X using MKL with the environment variable MKL_DEBUG_CPU_TYPE=5, AND it is significantly better than the low-optimization code path from MKL alone. AND, OpenBLAS does nearly as well as MKL with MKL_DEBUG_CPU_TYPE=5 set.
  • MKL provides tremendous performance optimization on Intel CPUs. The test job is definitely benefiting from AVX512 optimizations, which are not available in this OpenBLAS version.
  • OpenBLAS levels the performance difference considerably by providing good optimization up to the level of AVX2. (Keep in mind that the 2175W is 14 cores vs 12 cores on the Ryzen 3900X and 24 cores on the TR 3960X.)
  • The low-optimization code path used for AMD CPUs by MKL is devastating to performance.

This test clearly shows the effect of hardware-specific code optimization. It is also pretty synthetic! In the real world, programs are more complicated and are usually not anywhere near fully optimized, especially with regard to vectorization that takes advantage of AVX. There are also common numerical libraries that are not so heavily targeted to specific architectures, for example, the popular, and very good, C++ Boost library suite.

Creating an "env" with conda that includes OpenBLAS for numpy

Please see this older post "AMD Ryzen 3900X vs Intel Xeon 2175W Python numpy - MKL vs OpenBLAS" for information on how to use OpenBLAS with Anaconda Python and the Python code that was used for this testing.
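For quick reference, the approach from that post amounts to creating a conda env that pins the BLAS build to OpenBLAS (package names per the Anaconda channels at the time; exact versions will differ today). This is environment setup, so treat it as a sketch:

```shell
# Create a conda env whose numpy links against OpenBLAS instead of MKL
conda create --name np-openblas numpy blas=*=openblas
conda activate np-openblas

# Verify which BLAS numpy is actually linked against
python -c "import numpy; numpy.show_config()"
```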

Conclusion (BIG Caveat!)

I have to reiterate, MKL_DEBUG_CPU_TYPE is an undocumented environment variable. That means that Intel can remove it at any time without warning. And, they have every right to do that! It is obviously intended for internal debugging, not for running with better performance on AMD hardware. It is also possible that the resulting code path has some precision loss or other problems on AMD hardware. I have not tested for that!
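If you want a quick sanity check on your own machine, you can compare the BLAS-backed result against a slow pure-Python reference on a small matrix. This is only a smoke test I'm sketching for illustration, not a rigorous precision analysis:

```python
import math

import numpy as np

np.random.seed(42)
n = 64
A = np.random.rand(n, n)

# BLAS-backed matrix product and Frobenius norm
fast = float(np.linalg.norm(A @ A))

# Naive pure-Python reference, O(n^3) but fine for small n
C = [[sum(A[i, k] * A[k, j] for k in range(n)) for j in range(n)]
     for i in range(n)]
ref = math.sqrt(sum(x * x for row in C for x in row))

# Relative difference; should be down at machine-precision level
print(abs(fast - ref) / ref)
```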

The best solution for running numerically intensive code on AMD CPUs is to try working with AMD's BLIS library if you can. Version 2.0 of BLIS gave very good performance in my recent testing on the new 3rd gen Threadripper. For the numpy testing above it would be great to be able to use the BLIS v2.0 library with Anaconda Python the same way that I used OpenBLAS. Someone just needs to set up the conda package with the proper hooks to make it the default BLAS. I don't have the time or expertise to do this myself, so, if you can do it then please do! and let me know about it!

Happy computing! --dbk @dbkinghorn



Tags: Ryzen, Python, Scientific Computing, AMD, numpy, BLAS, Threadripper
Methylzero

OpenBLAS 0.3.7 now has some AVX512 optimization, and further optimizations are present in trunk, so when 0.3.8 comes around it will have quite good AVX512 performance on at least some of the BLAS functions.

Posted on 2019-11-28 11:51:29
Donald Kinghorn

Thanks for adding your comment! People should know that OpenBLAS is more up-to-date than what is pulled down by conda as I did in this post. It would be good to see that updated ... I'd also really like to see a BLIS setup

Posted on 2019-11-30 00:02:03
tim3lord

You should make a post about installing OpenBLAS with numpy and pytorch. I can't seem to figure it out on my AMD Threadripper machine!

Posted on 2019-11-30 00:13:19
Donald Kinghorn

If you check out the post "AMD Ryzen 3900X vs Intel Xeon 2175W Python numpy - MKL vs OpenBLAS" you'll see a way to do it with Anaconda Python. It's basically changing the default BLAS runtime lib in the environment. I haven't checked this with PyTorch yet.

In that post I added OpenBLAS when I set up the Jupyter kernel but you can (and maybe should) do it when you create the env,

conda create --name openblas-np blas=*=openblas

You are right, I should write up a better post detailing how that works ... and check it out for PyTorch (when I first worked on this stuff I was going to do PyTorch but just did numpy for the write up ...)

Posted on 2019-11-30 00:33:06
tim3lord

Thanks for all of the work you put into your articles. Very helpful!

Posted on 2019-11-30 00:42:20
Donald Kinghorn

... I just noticed that OpenBLAS 0.3.7 is in conda-forge when I was replying to tim3lord below ... I feel another post coming on :-)

Posted on 2019-11-30 00:35:20
Donald Kinghorn

yup here it is,
kinghorn@u18tr:~$ conda create --name openblas.3.7 -c conda-forge blas=*=openblas
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

environment location: /home/kinghorn/miniconda3/envs/openblas.3.7

added / updated specs:
- blas[build=openblas]

The following packages will be downloaded:

package                  | build       | size   | channel
-------------------------|-------------|--------|------------
blas-2.14                | openblas    | 10 KB  | conda-forge
libblas-3.8.0            | 14_openblas | 10 KB  | conda-forge
libcblas-3.8.0           | 14_openblas | 10 KB  | conda-forge
libgcc-ng-9.2.0          | hdf63c60_0  | 8.6 MB | conda-forge
liblapack-3.8.0          | 14_openblas | 10 KB  | conda-forge
liblapacke-3.8.0         | 14_openblas | 10 KB  | conda-forge
libopenblas-0.3.7        | h6e990d7_3  | 7.6 MB | conda-forge
------------------------------------------------------------
Total: 16.3 MB

Posted on 2019-11-30 01:39:58
Donald Kinghorn

Here is a quick test on Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (I have access to this for a few days)

MKL default
kinghorn@u18tr:~$ conda activate nptest
(nptest) kinghorn@u18tr:~$ python nptest.py
took 10.723653316497803 seconds
norm = 2828339.6004819106

OpenBLAS 0.3.7 from -c conda-forge
(openblas.3.7) kinghorn@u18tr:~$ python nptest.py
took 38.21160125732422 seconds
norm = 2828248.9869486988

OpenBLAS 0.3.6 from -c anaconda
kinghorn@u18tr:~$ conda activate np-openblas
(np-openblas) kinghorn@u18tr:~$ python nptest.py
took 24.212520122528076 seconds
norm = 2828509.2474168343

The conda-forge package doesn't seem to work as well in this case for some reason??? Note: this is the same test code that I ran in the post. You can see that the Core-X 18-core does a little bit better with MKL (as expected) than the TR 3960x ... MKL is highly optimized for Core and Xeon for this kind of problem, it's mostly DGEMM going to the AVX512 vector unit...

for completeness here's NAMD STMV on this CPU
Info: Benchmark time: 18 CPUs 0.379879 s/step 4.39675 days/ns 4820.6 MB memory
The Ryzen 3950X 16-core does this job faster, at 4.2 days/ns !!

ooops, does a bit better with hyperthreading in use,
Info: Benchmark time: 36 CPUs 0.323866 s/step 3.74845 days/ns 5569.66 MB memory

Posted on 2019-11-30 02:15:00
Methylzero

Ooops, I think I was wrong about AVX-512 in 0.3.7, that version just disabled more AVX-512 code, due to at the time unresolved bugs causing wrong results. Trunk should now have the fixed asm kernels and whatnot and AVX-512 enabled.

Posted on 2019-12-03 13:47:37
Ned Flanders

I understand you included a word of caution about using MKL in AVX2 mode on AMD CPUs. But this is nowhere near a new tweak. It never made it out of the HPC department but it has been used there for a long time already. It's not dangerous to use and AMD CPUs do not give incorrect results with this tweak. Still, OpenBLAS, if available, is the lib of choice. Fully agree with that, and I sincerely hope that MATLAB will include a choice between MKL and OpenBLAS in future releases. Bundling software with MKL without offering alternatives is wrong.

Posted on 2019-11-29 21:52:37
Donald Kinghorn

Hey Ned :-) Yes, I do agree with you, but I did decide to put a caution in there (while I was writing the post) just in case there is something strange with how it responds on Zen2. I really didn't do the depth of testing I would do in a critical production environment ... but yes, it's highly unlikely that there are any problems.

I think it is great that you and others have exposed this old hack ... I had completely forgotten about it! I haven't really seriously used AMD stuff since I was building Opteron clusters "back in the day" ... This is a wonderfully simple way to get proper performance on AMD hardware for some of these important programs that linked to MKL for BLAS and Lapack calls.

My biggest fear is that, with this spreading around the net, Intel will pull the plug ... because they could ... (or at least make it more difficult)

I am pretty impressed with BLIS v2.0 and OpenBLAS is really very good (I used Goto's libs years ago). With AMD back in the game I hope that devs and ISVs will start offering alternatives to a default link to MKL.

I'm also really impressed with the new Ryzen and Threadripper processors! I think the 64-core TR will be tremendous for codes like NAMD, really looking forward to testing that.

Posted on 2019-11-30 00:04:46
Misha Engel

Maybe you can get your hands on an EPYC 7H12; it's also a 280 Watt part with 64c/128t, and if PCIe 3 is fast enough you can use an old board.

Posted on 2019-11-30 00:43:43
Donald Kinghorn

would seriously love that :-)

Posted on 2019-12-02 22:05:10

See, that's exactly why I published it. "Awareness!" My feeling was that many simply implemented MKL and when they or their customers realized that AMD was slow... well... it was slow... it's AMD.
The way this MKL workaround spread across the net like wildfire only confirms this. Now at least everyone knows what the problem is, and I sincerely hope that a vendor-string-discriminating piece of software will not be implemented anymore without alternatives. This has always been wrong but now people know at least. So in case Intel pulls the plug, companies like MathWorks would face even more pressure to implement alternatives. Not supporting the fastest CPUs (Threadripper and 3950X) on the market is not a good idea. And as you say... alternatives are there. OpenBLAS is actually very good, and gets better with every release. Again, thanks for picking this up and providing some benchmarks here!

Posted on 2019-11-30 19:04:29
mockingbird
Intel has every right to do that! It IS their stuff.

The FTC said otherwise.

https://www.ftc.gov/sites/d...

"IT IS FURTHER ORDERED that Respondent shall not make any engineering or design change to a Relevant Product if that change (1) degrades the performance of a Relevant Product sold by a competitor of Respondent and (2) does not provide an actual benefit to the Relevant Product sold by Respondent, including without limitation any improvement in performance, operation, cost, manufacturability, reliability, compatibility, or ability to operate or enhance the operation of another product; provided, however, that any degradation of the performance of a competing product shall not itself be deemed to be a benefit to the Relevant Product sold by Respondent. Respondent shall have the burden of demonstrating that any engineering or design change at issue complies with Section V. of this Order."

Posted on 2019-12-02 13:56:50
Neo Morpheus

Doesn't matter, Puget is in Intel's pockets. No wonder that this whole site always seems to recommend Intel systems, regardless of Intel CPUs being more expensive and slower than AMD ones.

Posted on 2019-12-02 17:27:56

That would be a shame for this site. I would hope that benchmark data dictates what CPU is better rather than some kind of Intel bias.

Posted on 2019-12-02 18:01:24

I'm not going to comment on the whole FTC ruling (it is 23 pages and I'm not even going to pretend I understand all of it), but saying we are in Intel's pockets is not at all accurate. We are an Intel partner, but we are also an AMD partner - not to mention NVIDIA, Samsung, etc. If you read through our recent articles ( https://www.pugetsystems.co... ), you will see that we show that with the latest Ryzen and Threadripper CPUs, AMD comes out on top of Intel outside of a few isolated cases where Intel keeps a slim lead at certain price points. I'm not sure where people get the idea that we are paid off by Intel - we have always been about getting our customers the fastest and most reliable product for their workflow. We make the same margins whether it is AMD or Intel, so why would we ever offer a sub-par product to our customers?

Now, it is true that if AMD and Intel are very close in terms of price and performance that we will lean towards Intel for our customers. A lot of that is simply the fact that we have a TON of experience with the Z390 and X299 platforms since Intel has been pretty dominant in the markets we cater to for so long. We know their quirks and how to mitigate them, and have solid engineering contacts with both Intel and the motherboard manufacturers when issues come up. We don't have that with AMD quite yet, and that is something that can only be gained over time. Thunderbolt support that we know works is also a big factor since we have a significant number of customers who are moving from Mac to PC that need it.

As we get more sales and experience with AMD Ryzen and get Threadripper qualified and start selling it as well, we may start shifting more of our systems where Intel and AMD are neck and neck over to AMD, but it is going to depend on whether or not any issues come up and how severe those issues are. Our customers are overwhelmingly not tinkerers or even all that interested in computer technology (which I suspect most of our article readers are), and they are more than willing to sacrifice a bit of performance in order to guarantee stability. It is the same reason we don't do overclocking - a bit more performance in exchange for even the chance of a few more crashes a month/year/whatever simply isn't a good exchange for our customers.

Posted on 2019-12-02 18:05:50

Thanks for clarifying. I did hope that you were not in Intels' pocket as suggested by Neo Morpheus even if you lean towards Intel if neck and neck on benchmarks.

...they are more than willing to sacrifice a bit of performance in order to guarantee stability


Does security fall under stability? I find it quite odd that people are willing to risk Intel when they have now-published unfixable security vulnerabilities just waiting to be written into some consumer targeted hack tool. It makes me nervous to run Intel, which I currently do.

PS. Your Link has a ")" in it causing 404

Posted on 2019-12-02 18:17:59

Security I think is different than stability. This is getting into my personal opinion here, but a lot of the security concerns recently are not as big of a deal as some people make it out to be. Many of them are definitely a problem for servers, but for a workstation they require such a specific set of circumstances for them to actually be a problem. I'm relying on what I've been told from people way more informed about this stuff than I am, but if someone needs physical access to the machine in order to take advantage of the flaws that isn't too big of a deal IMO. If someone already has physical access, you are in trouble no matter what.

Thanks for the mention of the link error - I got that fixed.

Posted on 2019-12-02 18:51:23
libastral

Also these fixes degrade performance, mostly on Intel, since AMD architecture is far less vulnerable by design. Something to keep in mind when choosing a pricey workstation that's supposed to last many years.

Posted on 2019-12-02 21:20:01
Larry

I'll take Matt's word over you guys any day since he's the one who works with these systems on a daily basis.

Posted on 2019-12-03 19:38:53
Donald Kinghorn

This really is not true (Matt lays it out well) ... the fact is that AMD is back in the game in a serious way with the new Zen 2 core CPUs. I hope we will be able to look at EPYC too ... I'm pretty blown away by the new Threadripper and really looking forward to trying the 64-core! We will do extensive testing and make recommendations as appropriate ...

You know, there is some great hardware coming out, and innovation is happening again. It's an exciting time to be involved with it!

Posted on 2019-12-02 22:04:05
Donald Kinghorn

It's a shame that AMD "settled" on that. It seems that the only real thing Intel "had" to do was put that "Optimization Notice" on their docs! But it doesn't matter. This is now and we need to move on. AMD is on a roll. Their new BLIS v2.0 lib is looking good and OpenBLAS is also excellent. I'm planning on trying some optimized code builds linked with BLIS to see what performance we can get. AMD and the community need to focus on building an "ecosystem".

Posted on 2019-12-02 22:11:11
Larry

Thing is.. the AVX/AVX2 on AMD CPUs aren't changed. There's no performance loss with them.

It's the software library and Intel has all the rights to do whatever they want with that software.

Posted on 2019-12-03 19:54:02
La Frite David Sauce Ketchup

"Ford cars get up to 20% more MPG when running on Ford gas*.

*By intentionally not shifting into the OD gear when non-Ford gas is detected."

Good example of what you're saying about Intel having the right to destroy AMD performance in software. Imagine a company like Intel spending money to de-optimize AMD CPUs.

big lol

Posted on 2019-12-02 17:48:38
Kaptein Sabeltann

ok boomer

Posted on 2019-12-02 17:50:47
Donald Kinghorn

ha ha yup I've been doing HPC since it was a thing ... really ... was one of the first folks to build a Linux cluster for computational chemistry. ... and yes we used AMD Opteron for that

Posted on 2019-12-02 21:58:08
Hifihedgehog

I am disappointed by the level of incompetence and misunderstanding this industry professional demonstrates about the creation of libraries and the anti-trust violations involved here. I suppose he is unfamiliar with Microsoft and the decades of examples of precedent that has been established that prohibit this kind of behavior. AMD's Ryzen processors fully support AVX so Intel manually restricting its use in an industry-wide ubiquitous library is like forcing the opposing team to wear weighted shoes and combat packs at a sports match at your home stadium. You can claim that your fans are the ones mostly attending here and your team invested in the stadium as a reason for forcing the other team to play under different conditions but such action is still a clear violation of league regulations.

Posted on 2019-12-02 18:35:07
libastral

Yeah, I'm astounded that so many people fail to realize the issue here is deliberate competitor sabotage, not just "Intel not optimizing for AMD". I feel that this reddit comment thread summarizes the issue perfectly.

Posted on 2019-12-02 21:18:38

Except that the Reddit post takes a comment from this blog, by a single employee here at Puget Systems, and portrays it as the company's stance... and then also ignores the fact that it was said in the intro to a post about *sharing with the wider community how to get around Intel's dumb choice*. Dr Kinghorn was trying to be helpful, and spread word of a work-around that massively improves performance on AMD with this software, and he is getting lambasted over one paragraph that doesn't line up with what some (many? most? does it matter?) readers believe. "Biting the hand that feeds" seems like an appropriate metaphor in this case :/

I am not a lawyer, but it seems to me that there is a difference between the "right" to do something and whether or not it is a good / smart / moral decision. If someone writes software that straight-up does not run on a certain CPU (or brand of CPUs)... I suppose that is their right? It's a stupid decision, in my opinion, and I would consider it to be morally wrong, but that doesn't mean that they are legally prohibited from it. Maybe there is something more in the rulings that have been passed down regarding this stuff specific to Intel (again, I am not a lawyer) that specifies that they are not legally allowed to do stuff like this, but if not then it seems to me correct to say that they have the "right" to do it... but also that it is a really dumb thing to do, and bad for the tech community as a whole. Please note, however, this is *my opinion* and does not necessarily reflect the views or opinions of Puget Systems as a whole :)

Posted on 2019-12-02 21:42:23
Hifihedgehog

There is no denying he had good intentions when he wrote his piece. Intel shilling wasn’t my beef here. I am just highly disappointed by his “Intel has every right” line. That is miscategorizing anti-trust violations as something up to moral debate and not a legal issue, an assertion Puget’s other staff also made. This is completely inaccurate and is a stark reflection of his and Puget’s misunderstanding of the legal ramifications at play here. Intel may have billions of dollars and 10,000’s of employees behind this, but they still cannot exclude a competitor’s product from AVX optimization. There are already constructs in place to programmatically verify proper AVX support in a processor at a lower level than just the name of the CPU manufacturer. This would be akin to blocking 64-bit support for Intel products on AMD libraries just because AMD could not independently verify compatibility even though Intel processors with the AMD64 flag enabled should produce a 100% identical result in conforming to the AMD64 instruction set.

Posted on 2019-12-03 00:10:44

That is a fair criticism, and as far as I know none of us here at Puget are lawyers or particularly up to date on legal stuff like that :)

Posted on 2019-12-03 00:16:05
Donald Kinghorn

That line kind of blew up on me :-) Your take is right on! This kind of thing has been going on for so long that maybe I've become numb to it when in the past I would have been outraged. I can tell you, I was outraged when it first happened years ago. I couldn't believe that Intel got away with it and that AMD "settled". At that point I just used PGI compilers and Goto's library (which became OpenBLAS). Performance was good with that.
Maybe it's a good thing that people are riled up again about this kind of thing. There is lots of new hardware coming out, including really innovative stuff. We need to make sure that good things don't get squashed by the "big guys".

Posted on 2019-12-03 00:59:37
gparmar76

The CPUs are all x86 and AMD pays a license for it... Intel deliberately crippled AMD here and that much is clear. It has nothing to do with "owning" the software... perhaps now would be a good time to read up on anti-competitive laws.

Posted on 2019-12-03 03:14:04
Methylzero

And all Intel CPUs are using the AMD64 extensions. AMD and Intel agreed to license a lot of their patents to each other for compatibility reasons.

Posted on 2019-12-04 14:18:11
Donald Kinghorn

Yes, it really is sad. I'm a big fan of open source (open everything) and sharing. The "issue" with MKL and the Intel compilers in general was a big disappointment. When Intel launched their compilers they were looking really good, so everyone wanted to use them, but AMD was making some great CPUs at that time and then this whole stupid "disable optimizations" thing came up. Myself and the people I worked with used PGI compilers and did so for many years.

Then AMD kind of just stopped, and folks shifted to Intel compilers and MKL. Once Intel opened the MKL dev libs for free use a lot of ISVs started linking to it by default, since most high performance work was being done on Intel. Now that AMD has killer CPUs again this whole thing has come to light again.

I really hope that AMD will be able to get the resources to keep up the work on BLIS and keep their momentum going on the hardware.

Posted on 2019-12-02 21:55:51
Behrouz Sedigh

AVX, AVX2, SSE2, SSE3, SSE4: those extensions are free (based on an agreement between Intel and AMD), so AMD can use them on their CPUs, and Intel can't write a compiler with code like this:

If CPU = i5/i7/i9
then use AVX2
otherwise
use SSE2

This is illegal. BUT:

If CPU = i5/i7/i9
then use non-free specific extension
otherwise
use standard extension

Or

If CPU = i5/i7/i9
then use hack method = go to B then C then A then D
otherwise
use standard method = go to A then B then C then D

Or

If CPU = i5/i7/i9
then AVX2
otherwise
get error
// this means only available on a specific CPU

This is legal. This is called "OPTIMIZATION".

Posted on 2019-12-02 19:54:41
Leandro Ferrero
Maybe you're thinking that it's not "fair" for Intel to do that, but ... Intel has every right to do that! It IS their stuff. They worked hard and used a lot of resources to develop it. And, there IS some incompatibility at the highest (or lowest) levels of optimization for the hardware. MKL is insanely well optimized for Intel CPUs ... as it should be!

I have been buying a lot of Intel over the last decade. But Intel has no right to hurt the performance of end users just because we don't buy hardware from them. It's not ethical, and it may even be illegal.

They have the right to make new optimizations, and when the compiler finds that those optimizations are officially supported by the CPU, it should use them.

If every company does that... maybe you can receive 80v instead of 110v because you are connected to the electrical network of another company. "They have every right to do that, because they spend money and resources on generating that electricity"... that logic is absurd.

That is not fair competition at all. Imagine Apple throttling the internet connection of Google apps, like Google Chrome on iOS. Is that fair? Do they have every right too?

They have to be stopped. That's wrong, period.

Posted on 2019-12-03 00:58:41
Donald Kinghorn

Sorry, you are right. I've added a note around that paragraph ... It's not right and I should not have tried to justify it because I honestly think it was the worst sort of corporate tactic!

Posted on 2019-12-03 02:02:27
Leandro Ferrero

I respect that.

Posted on 2019-12-03 02:38:59