https://www.pugetsystems.com

Read this article at https://www.pugetsystems.com/guides/508
Dr Donald Kinghorn (Scientific Computing Advisor)

Top 5 Xeon Phi Misconceptions

Written on October 3, 2013 by Dr Donald Kinghorn

The Xeon Phi has been generally available for over 6 months now, and most people who keep up with developments in high performance computing have heard about it. However, I've talked with many people who have heard about the Xeon Phi but have little understanding of its characteristics beyond it being a many-core co-processor that is used in some of the world's fastest computer systems. What follows are the top 5 misconceptions I've been hearing from people about the Phi.

1) "If I add a Phi card to my system I will have an extra 60 cores for my programs to run on."

Well, yes and no. The Xeon Phi does have 57-61 cores and they are 64-bit x86 compatible. However, those Xeon Phi cores look like they belong to a separate computer connected to your host OS over an internal network. I've talked with several people who thought those extra cores would automatically be available to the host system. They do not show up alongside your system processor cores and are not directly visible to the host OS!
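
To make the point concrete, here is a minimal Python sketch (not part of the original article) that asks the host OS how many logical CPUs it schedules on. On a host with a Phi card installed, this number should cover only the host processors, since the card's cores sit behind the card's own Linux OS. The function name `host_logical_cpus` is made up for this illustration.

```python
import os


def host_logical_cpus() -> int:
    """Return the number of logical CPUs the host OS reports.

    os.cpu_count() can return None when the count is unknown,
    so fall back to 1 in that case.
    """
    return os.cpu_count() or 1


if __name__ == "__main__":
    # Cores on a Phi card are not part of this number; they belong to
    # the separate Linux system running on the card.
    print(f"Host OS reports {host_logical_cpus()} logical CPUs")
```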

2) "The Xeon Phi is Linux only."

Again, this is yes and no. The Phi itself runs an embedded Linux OS, called uOS, in memory on the card. The card has a boot-loader flashed onto non-volatile memory, and when the card starts up it loads a file system and Linux kernel that are stored on the host system. However, the card can be booted into uOS from a Windows host! There is a Windows driver for the card and it works fine. I've tried it. (I'll talk about that in another post.) Just keep in mind that even on a Windows host the card is still running Linux. You can't run Windows programs on it since the executables would be binary incompatible.

3) "You can run any Linux programs on the Phi."

NO! As I said earlier, the Phi is running an embedded Linux OS called uOS. The "u" is really supposed to be the Greek letter mu and stands for "micro", as in micro-OS. That's a hint! Most of the OS functionality on the card is provided by a cool application called "busybox", which provides some of the functionality of many common core Linux command line system programs in a very compact way. It's a pretty lean Linux system! Core system level commands and network capability are provided on the card but not much else. You can log in directly to the card and run programs in "native mode", but any programs you run will need to bring their run-time libraries over to the card or make them available over the network interface to the card. (You can do NFS mounts to the card.) It's really not a general purpose Linux OS. It's an embedded system OS running in, and using up, the limited memory available on the card. ... I've had more than one person ask me if they could run Windows programs on the Phi using "Wine". Sorry, that's not going to work!

4) "It's easy to write programs for the Xeon Phi"

No way! Programming is not easy, period. It is easier to get something to run on the Phi than it is on a GPU with CUDA, for example. However, getting a code to run with good performance on the Phi requires considerable effort. You have to be able to utilize all those cores and hardware threads, along with the vector units in each core, to get good performance out of the Phi. Also, you have the same problems of limited memory and limited bandwidth over the PCIe bus that you have when working with GPU accelerators. It's a non-trivial programming effort. Is it worth the effort? Yes. The effort you put into optimization for the Phi will also benefit your code's performance on the CPU, and the Intel compiler tools help tremendously with this.
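
One way to see why using all those cores is the hard part is Amdahl's law. The article does not mention it, but it is the standard back-of-the-envelope model: if only a fraction p of the run time can be parallelized, the ideal speedup on n cores is 1 / ((1 - p) + p / n). The Python sketch below uses a hypothetical function name and illustrative fractions; none of the numbers are measurements from a Phi.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction`
    of the serial run time can be spread across them (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions only; 60 cores matches the card discussed above.
    for p in (0.50, 0.90, 0.95, 0.99):
        print(f"p = {p:.2f}: speedup on 60 cores = {amdahl_speedup(p, 60):.1f}x")
```

Even with 95% of the work parallelized, this model caps 60 cores at roughly a 15x speedup. It also ignores vectorization, memory limits, and bus transfers, so treat it as an upper bound on the threading side only.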

5) "The Phi is just a fad, it won't last"

I'm calling that a misconception too. Those who have been in the HPC business for a while may have "déjà vu" moments when they look at the Phi. Remember the "Cell BE"? (The "Cell" processor was used in the PlayStation 3.) It had interesting potential, it was added to some supercomputers of the time, and effort was made to take advantage of it. It didn't go well! The architecture was too limiting. The Phi does not have this problem! How about FPGAs (field programmable gate arrays)? They have tremendous potential, they are still actively developed, and advances in usability have been made. However, they are by design special purpose and can't really be used as a general platform for running varied codes. The Phi has a big advantage in this regard. Then there's Tesla + CUDA. This is the strongest competition for the Xeon Phi and has a solid 4-year head start. Nvidia has done an outstanding job of nurturing a developer "ecosystem", and there are many successful projects making good use of the strong compute capabilities of their GPUs. However, writing compute intensive code to run well on GPUs is challenging and requires a solid understanding of the architecture. Tesla isn't going away any time soon and the Phi is not going to replace it. The Phi is different from any compute accelerator that has come before, and I believe it is going to evolve into a standalone platform. It's nearly a general purpose many-core "CPU" already, and Intel has the means to move it forward to become a complete high performance computing system. The next couple of years should be interesting indeed!

Tags: Intel, Xeon Phi, HPC
chandra

Hi
I am thinking of purchasing a Xeon Phi and using it to assemble a desktop (using the default procedure), and I am planning to install Windows 7. Is it possible?

Posted on 2014-07-06 16:05:04
Donald Kinghorn

You can use a Phi under Windows but you can't use Windows on a Phi. The OS running on the Phi is a custom embedded Linux. You can access it from Windows and you can generate code that runs on Windows and off-loads to the Phi ... but the Phi itself is running Linux. Take care -Don

Posted on 2014-07-07 17:02:10
gregge

A killer app for the Phi would be x265 video encoding, with executables launchable from OS X, Linux, and Windows. Run your program, select the video files you want to process, and it feeds them to the Phi to be processed at lightning speeds - with all the features that (currently) nVidia, AMD, and Intel hardware encoders in their GPUs and chipsets do not have.

It takes a very expensive CPU to software encode x265 at realtime speeds. That same hardware can encode to most any other codec much faster than realtime. The payoff for x265 is much smaller file sizes at the same quality as older codecs.

Where super fast and high quality x265 encoding on Xeon Phi could find a market is when 4K ATSC 3.0 TV broadcasting gets going. Live sports broadcasts would be big users of it so they could stream realtime while simultaneously recording for editing and replays.

Posted on 2018-05-28 05:29:46
Alex

You need special programs and programming libraries to access the Intel Phi co-processing card. It's not one of those plug and play types of PCI-E cards.

Unless you have an application that directly interfaces with said card or if you're a great programmer that works with C/C++/C# and assembly language, don't bother with it.

Posted on 2016-03-24 07:23:27
Ramos

Hi Donald, I love your articles and your work here, please please do more articles on the Phi and any parallel computing, cause I love to read it! I have a CS Ba and nearly a Ms.C bar a thesis, cause I *hangs head in shame* got offered a great job in enterprise database work and bailed 10 yrs ago.

But I've taken courses in Parallel computing, I've played with Unix(and ofc Linux) in nearly all forms, I've had courses in pretty much all algorithmic branches and took the algorithms way in University. Specifically in Parallel environments, I've done code on a SGI Onyx 6 and 10-way machine with legacy GCC libraries, run parallel code(Cholesky factorization, Karypis 1994 article) on either MP(I?) or was it OpenMP, I can't remember, 10 nodes of dual cpu things back then in the Uni parallel computer.

The reason I blurt all this out, is that I know you can see I know my way around this and have actual C implementation experience, but I wanted to ask, if you think I've got something to get out of the Xeon Phi if I shell out the significant cash for one?
(Would it feel like my own little parallel cluster?...is it like writing MP-code?)

(Linux cmd line skills no problem at all)

Books, got any Top 5 for a modern parallel programmer? (pro-sumer / enthusiast / professional level)

I have only currently the Joseph Jaja (isbn 0201548569) algorithms book on the shelf atm, but I did play around with an economic algorithmic "Hello World" (Black Scholes option pricing) in CUDA as well so read some e-books on that too.

Any hints/tricks/tips you can give would be greatly appreciated.

Posted on 2014-08-21 22:31:30
Donald Kinghorn

Hi Ramos, Thanks for your kind words! You are doing all the right things. Keep a focus on parallel programming and you will always have interesting projects. Parallelism is increasing everywhere and EVERY angle you take to it will give you more insight. You want a solid foundation with OpenMP and MPI and then experiment with accelerators as much as you can. (On the GPU, OpenACC looks interesting and there may be some open source implementations soon...) For the Phi, mid-year 2015 will be a game changer with Knights Landing, so you are good to keep an eye on that. Keep in mind, though, that you don't have to have a Phi card to do the learning work you need to get performance out of it. If you have a modern Intel processor with AVX or, better, AVX2, you can go a long way by working on getting the highest parallel performance you possibly can out of that. The Phi is all about managing many threads and getting code to vectorize. The techniques you develop on the CPU will translate directly to the Phi. I wish I had more time to spend on programming. It seems I get started on something and then fall back into mostly sysadmin type of work, but I hope those insights will be good for you too! Best wishes -Don

Posted on 2014-08-25 16:04:07
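
An editorial aside on the reply above: before practicing vectorization on a CPU, it helps to confirm that the processor actually advertises AVX or AVX2. The Python sketch below is one hedged way to check on Linux, by looking for the `avx` and `avx2` entries in the `flags` line of `/proc/cpuinfo`. The function name `simd_flags` is hypothetical, and the check does not apply to Windows or macOS.

```python
from pathlib import Path


def simd_flags(cpuinfo_text: str) -> set[str]:
    """Return which of 'avx' and 'avx2' appear in the first 'flags' line
    of Linux /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Compare whole tokens so 'avx512f' is not mistaken for 'avx'.
            return {"avx", "avx2"} & set(value.split())
    return set()


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only
    if cpuinfo.exists():
        found = simd_flags(cpuinfo.read_text())
        print("AVX:", "avx" in found, "AVX2:", "avx2" in found)
    else:
        print("No /proc/cpuinfo here; this check is Linux-specific.")
```
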
David Anderson

Just as an aside, you can now pick up a 60 core previous generation Phi card for around USD $300, so experimenting is much more affordable.

Posted on 2015-11-18 23:15:21
Ramos

Great info, thanks! I take it the Knights Landing parts on actual sockets, and not PCIe, aren't out yet, right?

Posted on 2015-12-06 22:06:43

I found this column informative, but confusing. How can one be sure that it isn't another chip that will not succeed on the market if it can't be easily used in the most obvious ways? Actually, that question is easy to answer, but no hint of how you could actually put the Xeon Phi to work for you is given here.

Posted on 2014-08-24 17:16:47
Donald Kinghorn

... you are wise to keep history in mind when you look at things like the Phi. I've been around long enough to have seen many "co-processors" fail, and there is still really good stuff like FPGAs that hasn't caught on well because of usability issues. However, this is always going to be the case when people are seeking the highest performance; you have to make trade-offs. Right now the Phi is for developers, not so much for end users! Parallel programming is hard no matter what the underlying hardware is. The nice thing about the Phi is that the techniques you use for optimal performance on a modern multi-core CPU are the same ones you need for the Phi (thread management and vectorization). Also, the Phi is not going away! Mid-year 2015 will be a game changer with Knights Landing, and eventually it will support full "standard" OS installs and behave like a "normal" system ... In the meantime, new conventional CPU stuff coming out of Intel is fantastic! Best wishes -Don

Posted on 2014-08-25 16:22:10
Cheng Tan

Hi dear Donald,
I want to ask you: does the Phi support message passing in its architecture? It has a coherent L2, so I believe it is based on cache coherence, and pthread programming would do well in this shared memory environment. But does it support message passing with special message passing buffers like the Intel SCC?

Posted on 2014-10-15 17:03:30
Donald Kinghorn

Hi Cheng, The Phi does support message passing with Intel MPI but I haven't tried it. ... scc is something I am not familiar with. Something that the Phi does that is kind of cool -- if you compile in OFED support it will simulate InfiniBand over the PCIe bus between cards and the host ... again I haven't tried this yet.
Best wishes --Don

Posted on 2014-10-17 17:36:44
tw119

Is there any way this could be used in a gaming rig?

Posted on 2014-10-27 12:59:28

No, this would be of absolutely no help for gaming.

Posted on 2014-10-27 16:07:48
El~Osmodivs

I would like to use one of these cards for 3D rendering but still can't find any software for this. Some cards have passive cooling and would fry in the first minutes, and I don't think native active cooling would be enough, so I would use a water cooling solution for optimal performance.
Anyway, what software is usable for this hardware?

Posted on 2014-11-17 21:30:15
Alex

While this could be used for 3-D rendering the specific type of support libraries available from Intel are geared more toward scientific, low latency tasks. The Phis are really made for big data and processing therein. However, I invite you to try to find another use for it.

If you're looking to scale and get better performance get a Xeon based motherboard from either Tyan or Supermicro, dual 2011 v3 sockets, get a lot of DDR4 RAM (64 GiB or more) with ECC @ 2666 MHz and a decent CAS timing. Get two Xeons of the same type and speed, that's the best way to go about it, if you're doing 3-D that doesn't utilize GP-GPU architecture.

Posted on 2016-03-24 07:36:02
El~Osmodivs

The main reason was to save some money, the Xeon Phi is way cheaper than two Xeon CPUs.
Intel has this:
https://embree.github.io/
for ray tracing rendering, but nobody has had the guts to make it work on a 3D renderer. I use Blender for modeling but it would be nice to see the Bullet Physics engine take advantage of all those cores to make fast fluid/smoke/particle simulations.

Posted on 2016-03-24 17:21:53
Alex

Hey, that's a great use for that card, actually!

What version of Blender are you currently using?

32 or 64-bit version?

Posted on 2016-03-25 00:24:39
El~Osmodivs

I have the latest GIT build, 64bit obviously.
I like Luxrender.
We all know GPU rendering is faster but GPU still can't do all of the CPU tricks.

Posted on 2016-03-25 00:40:18
TeslaK20

Hello Dr. Kinghorn
I've seen a Xeon Phi priced at an insane $200. First, however, I need your help.
I am not a programmer. What applications use Xeon Phi? I am interested in Stata and Autodesk software. Supposing that Stata uses Xeon Phi, how would I run it from Windows? Or would I need to dual-boot Linux? If so, how much RAM should I give it, out of 32 GB?
Thanks

Posted on 2015-01-17 14:06:34
Donald Kinghorn

Well, yes, the ~$200 Phis are a pretty good deal. However, you are not going to have much luck with applications if you are not a programmer. With some programs it is possible to get acceleration from the Phi by using offload from MKL, if you can get the codes to link to the MKL shared libs. However, assuming you can get it to work, you are only going to see a performance benefit for large matrix operations. There is a growing list of codes that are being recompiled with Phi support, but you are probably going to need to be a programmer to do much of anything with the Phi at this point in time.
Check out https://software.intel.com/... for some code porting efforts.
Best wishes --Don

Posted on 2015-01-21 00:40:28
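
An editorial aside on the MKL offload mentioned in the reply above: in the Phi-era MKL releases, Automatic Offload was switched on with the `MKL_MIC_ENABLE` environment variable. The Python sketch below is a hedged illustration of launching an existing program with that variable set. It assumes the program is dynamically linked against an MKL build that supports Automatic Offload; otherwise the variable has no effect. The helper name `offload_env` and the script name in the comments are hypothetical.

```python
import os
import subprocess
import sys


def offload_env(base=None) -> dict:
    """Copy an environment mapping and switch on MKL Automatic Offload."""
    env = dict(os.environ if base is None else base)
    env["MKL_MIC_ENABLE"] = "1"  # assumed: the target links against such an MKL
    return env


if __name__ == "__main__":
    # Usage: python run_with_offload.py <program> [args...]
    # "run_with_offload.py" is a hypothetical file name for this sketch.
    if len(sys.argv) < 2:
        sys.exit("usage: run_with_offload.py <program> [args...]")
    result = subprocess.run(sys.argv[1:], env=offload_env())
    sys.exit(result.returncode)
```

As the reply says, any gain would be limited to large matrix operations, so most applications would see no change at all.
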
batmobil

Any alternatives if you want the first misconception to be true? Simply a CPU available to the host like any other CPU, but with loads of cores?

Posted on 2015-02-11 08:58:27

You can get Intel Xeon and AMD Opteron processors in 1-, 2-, and 4-CPU configurations if you just need a ton of cores. AMD Opterons are cheaper but slower (in terms of instructions per clock cycle per core) than Xeons. They can be had in up to 16 cores per CPU, for 16, 32, or 64 core systems.

Xeons are more expensive, but much faster per core / per clock and utilize much newer technology and instruction sets. They are available in up to 18-core models for 1- and 2-CPU configurations, with the 4-CPU designs currently maxing out at 12-cores per CPU (but slated to be updated to match the others later this year). That currently means 18, 36, and 48 core options... but the 36-core and 48-core Xeon setups would beat a 64-core Opteron in most usage cases, while costing more.

Posted on 2015-02-11 17:15:24
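
An editorial aside: the core totals quoted in the reply above are simply sockets times cores per CPU. A short Python sketch of that arithmetic, using only the figures from the comment (the names are made up for illustration):

```python
def total_cores(sockets: int, cores_per_cpu: int) -> int:
    """Total cores in a system with identical CPUs in every socket."""
    return sockets * cores_per_cpu


# (sockets, cores per CPU) pairs taken from the comment above.
OPTERON_CONFIGS = [(1, 16), (2, 16), (4, 16)]
XEON_CONFIGS = [(1, 18), (2, 18), (4, 12)]

if __name__ == "__main__":
    for name, configs in (("Opteron", OPTERON_CONFIGS), ("Xeon", XEON_CONFIGS)):
        totals = [total_cores(s, c) for s, c in configs]
        print(f"{name}: {totals} cores for 1, 2 and 4 sockets")
```
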
Alex

The one thing I would like to add to this George, if I may.

The Intel Phis aren't directly utilized by the host system, such as Microsoft Windows, by default. However, the Phis are a type of slave architecture, meaning the main CPU and system are the masters. Technically speaking, a general-purpose support library for the Intel Phi on Microsoft Windows or other operating systems that automatically used the cards when they are present might be an interesting proposition.

Posted on 2016-03-24 07:43:23
John Doe

And you also don't have to buy all the CPUs at once if you have budget concerns.

Posted on 2017-03-07 06:28:38
Alex

You've made several erroneous statements:

"Most of the OS functionality on the card is provided by a cool application called "busybox" which provides some of the functionality of many common core Linux command line system programs in a very compact way. It's a pretty lean Linux system!"

First off, BusyBox isn't an application, it's a userland and micro-kernel.

BusyBox / Linux, is a stripped down Linux kernel with the Busybox command set, support libraries, support scripts and binaries necessary to build a distribution around said userland and kernel. The most common userland associated with the Linux kernel is GNU / Linux for which the vast majority of the distributions are based off of. I can see why people call it an application but it's not really.

Technically speaking, an application requires an operating system and support libraries in order to work (kernel level wise). That's where your error is. BusyBox is the kernel and utilities in one file but in and of itself isn't an application. The micro operating system or uOS for short is in fact a distribution built around the BusyBox binary and userland. Binaries that are loaded in from the PCI-E bus and executed in the userspace of the uOS are the "applications".

Posted on 2016-03-24 07:20:36
MP

Small question: you wrote that you cannot use a Phi as the main processor in the system, but how do you explain this one?

https://www.supermicro.com/...

As is clearly shown there, there is no extra CPU or even a grid array for one to be placed. In this system the Phi is THE main processor!

P.S.: If you mean "micro" to be a Greek letter then use that damn letter. On my keyboard it's "ALT GR + M" et voilà: µ. But never ever use that small u instead. If you can't write a "my / µ" (even your 'mu' is a giant mistake, have a look at a dictionary or at least Wikipedia) then use the word micro or the letter "m" as an abbreviation of micro.

Posted on 2016-11-24 13:19:52

What you linked to is a new generation of Phi processors, and yes some of those can be socketed directly onto a motherboard. Remember that the article you replied to is over three years old :)

Posted on 2016-11-24 18:00:13
Donald Kinghorn

Yes, like William said :-) The photo you have is of an Intel Ninja box. There are several companies selling those ... but not us ... at least at this point. The best place for information is Colfax Int. They have comprehensive training on the Xeon Phi.

Posted on 2016-11-28 16:40:07
John Doe

Interesting. Interesting indeed. I must go do some googling on this. I have never seen this before. It does make sense though that it would be socketed to a board anyhow. It is just that the Phi I knew of was a PCI express card. I am sure that it will be expensive at least for a while until something else comes along. I am looking for hardware for photogrammetry and that is very CPU intensive. It runs a CPU at 100% for anywhere from 20 minutes to an hour depending on the object you are making a 3D scan of. At least until they make it utilize the GPU more, it will be painfully slow on some systems.

Posted on 2017-03-07 06:35:38
John Doe

I am glad that I have read this. If I am not mistaken, these don't work on all motherboards either, so keep that in mind when selecting all the other stuff. I am glad that I have read all of these limitations before buying one of these expensive things. Even on eBay they are expensive used. If they ever design a plug and play version of this or anything similar to that I can imagine it becoming popular. There are other ways to get many processor cores. Try a motherboard that supports CPUs with high core counts. I have seen Opterons with as many as 16 cores. They will run your software directly and without having to use Linux and Busybox. Some Intel CPUs have a lot of cores also. Neither the AMD nor the Intel plug and play processors will be 60 cores, but you could go with your favorite brand, AMD or Intel, and choose a motherboard with 4 sockets that just happens to allow CPUs with many cores. I don't know how many cores the latest Xeons have, but you could get 64 cores with four 16-core Opterons and a motherboard that can handle them and all the RAM you will ever need for the rest of your life. Intel is pretty good too but I like a less expensive CPU.

Posted on 2017-03-07 06:24:35
halplus

the phi is gonna be killed by intel itself with multicores cpus. That is. A ton less problematic. No linux, no network no problem at all. Anyways looks like the way for intel to bring all those multicore CPUS. That is 64+

Posted on 2017-10-24 16:53:09
Glen Duncan

I have an application I know will run on this card, and have the chops to get it going. The question I have yet to answer is can it run in a PCIe 1x slot? I don't need fast IO, just lots of cores.

Posted on 2018-01-05 17:54:08
Donald Kinghorn

It should work but I have not tried it personally. Since the card has its own power connector it is probably OK, but it will be slow to start up etc... As long as you are not putting a lot of money into it, it's probably a safe bet. I can see putting a bunch of the old Phis in a PCIe expansion chassis from a coin mining rig. In theory you should be OK but I hope someone that has done it will reply ...

Posted on 2018-01-06 23:54:01
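
An editorial aside on why a x1 slot would be slow for start-up and data movement: transfer time scales inversely with the number of lanes. The Python sketch below assumes a PCIe 2.0 link at its theoretical 500 MB/s per lane per direction, and the 1 GB payload is a hypothetical figure rather than the size of anything the card actually loads. Real throughput will be lower than the theoretical rate.

```python
def transfer_seconds(payload_bytes: float, lanes: int,
                     bytes_per_sec_per_lane: float = 500e6) -> float:
    """Best-case time to move `payload_bytes` over a link with `lanes` lanes.

    The default rate is the theoretical PCIe 2.0 figure per lane, per
    direction; pass a different value for other PCIe generations.
    """
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    if bytes_per_sec_per_lane <= 0:
        raise ValueError("bytes_per_sec_per_lane must be positive")
    return payload_bytes / (lanes * bytes_per_sec_per_lane)


if __name__ == "__main__":
    payload = 1e9  # hypothetical 1 GB payload
    for lanes in (1, 4, 16):
        print(f"x{lanes}: {transfer_seconds(payload, lanes):.3f} s")
```

For a workload that, like Glen's, needs lots of cores but little I/O, the 16x difference mostly shows up at start-up and whenever data is copied to or from the card.
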
Robert Burkhall

Hi Doc & Glen,

Currently I have a Phi 5110p running from a PCIe riser and will test the unit with a 4 port PCIe expander as I have 4 more to add to GPU mining rig as the cooling system for 5 cards is a special setup. But the card works good from a riser and only once did I have an issue with PCIe error where the riser card was problematic.
I plan to have a graphics rendering farm with the Phis as well as mine some coin too. I do believe the Phis will work well in my AI setup, where I'd like to have TensorFlow coded on the Phis. Maybe over my head, but it's how you learn.

PS: The test is successful. Now I do need to add the other Phi's but one is working from a 4 port extender.

Posted on 2018-02-12 22:26:59