interface to a PCI graphics card?
prof_braino
Posts: 4,313
I hear folks are using bunches of graphics cards as a "supercomputer": all the cores, stream processing units, and memory.
What about using a prop (i.e. a demo board configuration) as the user I/O node, and a PCI graphics card as the processing node?
Anybody play with this yet?
Comments
I would imagine that with the slow data rate in and out of the Prop and the minimal memory space within the Prop there is not much point in doing it anyway.
My conclusion: Impossible to do and mostly pointless.
There, with that said I can be sure someone has done it already
The NVIDIA cards that do all that neat parallel processing stuff use PCI-Express, which is a lot different from your standard PCI. You'd need an FPGA to interface to it. In short, it's a major project even for people who know the PCI-Express bus and VHDL coding. Plus, forget about getting NVIDIA driver info unless you're a tier one PC vendor.
There are the Nouveau open source NVIDIA drivers for Linux. They did a lot of reverse engineering with these cards.
As you say, it's already been done.
http://www.engadget.com/2009/12/14/university-of-antwerp-stuffs-13-gpus-into-fastra-ii-supercompute/
To clarify, the goal is NOT to use a video card as a video card, it is to use the processor as a massively parallel computing node. A traditional driver is NOT what is needed here.
The role of the prop is to handle the sensors and actuators, as the prop lends itself well to this application. Again, the role of the GPU is to crunch big data, as expected.
The "slow data rate of the prop and minimal memory space" are completely appropriate for the prop target application. In fact, the GPU can be slowed as needed; it will still crunch a lot better than a prop, and nicely fulfill this role.
Remember, the goal is not to go toe to toe with Cray or Fujitsu, the goal is to get something remotely similar in concept for under $200, so more undergrads can play more often.
Ok, first step forward is made. PCI-E is what we want, not old PCI. I heard somebody mention FPGA in the past, and interfaces to these cards exist; so the request is for somebody that knows about this stuff. Probably hard to find on these forums, maybe I'll have to look elsewhere. It's ok, I can wait.
The fly in the ointment is that the Prop cannot host the NVIDIA development environment. It's predicated on either Win, Linux or Mac hosts. And the tools are rather resource hogs.
Maybe the PCI bit is the mistake. It doesn't have to be PCI, it just needs to be an app on the GPU that we can send a serial stream to for processing, and receive some result when it has finished.
You gain nothing by this approach. It's easier to buy a used PC with the appropriate bus, then go to geeks.com and buy a $40 CUDA-enabled GPU card. This can all be done for about $300.00, well within the range of a hobbyist or student.
And you need the PCI-E bus to talk to the GPU. There is no getting around this, and that means you need a bridge chip to handle the interface stuff, which equals a big dog FPGA like the Spartan-6 series, which is in BGA only. Maybe it's not the only way, but no one AFAIK has hacked an NVIDIA GPU the way you want and succeeded.
A Propeller with a new NVIDIA GPU looks like the motor from the smallest possible lawn mower added to a heavy truck. If we add some suitable custom-made gears, yes, it can make this truck move, but only very slowly.
We can still add this motor to a smaller car. I have some old PCI and ISA 2D-accelerator cards with 512KB/1MB RAM and documentation. They can be suitable to attach a propeller to, particularly ISA one.
Sorry, this is my shallow understanding of the concept. I thought the Antwerp guys linked above did exactly this.
While using a stock graphics card does require a PCI bus, we don't want to use the GPU as a graphics card, so we don't need it.
The GPU chip, AFAIK, does NOT require PCI-E to function. The GPU is to be used as a pool of processing power with pool of memory. All we need is a serial or parallel channel to talk to it, and the channel's speed can be whatever, all it needs to send is the result. The GPU does that manipulation of large amounts of data.
At least this is my thought.
Maybe I got it backwards. I thought using a heavy duty GPU to process the relatively small data stream from one or a series of props would result in something like these:
http://csudigitalhumanities.org/exhibits/archive/files/2ea878ba7cdc2803db6d1f218a914a18.jpg
http://2.bp.blogspot.com/_7gYNb3GSh4M/TH0buf-dVXI/AAAAAAAAVnw/t314AE09bwg/s640/4079865886586287.JPG
Maybe it doesn't work that way, but this is NOT intended to be used as a graphics card, it's a number cruncher.
I don't know, just guessing.
The wiki description of the Cray 1 series was the initial driver for this inquiry. http://en.wikipedia.org/wiki/Cray_1
Considering that Cray cost 9 million dollars, and only a Cray will be a Cray, what can we make that has some similarities to a Cray, and still remain under $50?
NOTE to lurkers: This is NOT to build a replacement for your university's supercomputer. This is to get something that shows techniques similar to those used in big computers, and that we can have on (my) desktop.
So far, nothing fits the bill really well, but I think if we raise the cost barrier, we can get a bunch of Cray-ish function. The question is how high do we have to raise the cost to get something interesting to play with? My hypothesis is that the cost is closer to $200 than $9 million.
So here's what I'm thinking:
Compare:      Cray       Braino's box of parts (mostly prop chips)
Clock:        80 MHz     80 MHz
Word size:    64 bit     32 bit, so use double words and/or 2 or more cogs per operation
Processors:   144        20 prop chips = 160 cogs; but 2 cogs per chip are used for communication channel overhead, so use 25(?) prop chips
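The chip count in that last row can be checked with a little arithmetic. This is only a rough sketch: the 8 cogs per chip is the standard Propeller figure (consistent with 20 chips = 160 cogs), the 2 communication cogs and the 144 target come from the comparison, and the function name and the `cogs_per_processor` parameter are made up for illustration.

```python
COGS_PER_CHIP = 8        # a Propeller chip has 8 cogs (20 chips = 160 cogs)
COMM_COGS_PER_CHIP = 2   # cogs per chip reserved for channel overhead (from the post)
TARGET_PROCESSORS = 144  # processor count quoted in the comparison


def chips_needed(target, cogs_per_processor=1,
                 cogs_per_chip=COGS_PER_CHIP, comm_cogs=COMM_COGS_PER_CHIP):
    """Return how many prop chips give `target` processors.

    `cogs_per_processor` is a hypothetical knob for the "2 or more cogs
    per operation" idea used to make up for the 32-bit word size.
    """
    usable = cogs_per_chip - comm_cogs           # cogs left for real work
    if usable <= 0:
        raise ValueError("no cogs left after communication overhead")
    cogs_required = target * cogs_per_processor
    return -(-cogs_required // usable)           # integer ceiling division


print(chips_needed(TARGET_PROCESSORS))                        # one cog per processor
print(chips_needed(TARGET_PROCESSORS, cogs_per_processor=2))  # two cogs per processor
```

With one cog per processor this comes out to 24 chips, close to the 25(?) guessed above. Pairing cogs for 64-bit work doubles it to 48.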
Propforth can easily implement the use of registers for processing, and we think we can figure out something to address the concept of vector processing.
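To make the "double words" idea concrete, here is a minimal sketch of a 64-bit add built from 32-bit halves, written in Python only to show the carry handling. The function name and the (high, low) layout are assumptions, not anything from Propforth. On the prop itself this would presumably map to an add that sets the carry flag followed by an add-with-carry.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value inside one 32-bit word


def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values held as (high, low) 32-bit words.

    Returns (hi, lo, carry_out), where carry_out is 1 if the sum
    overflowed 64 bits.
    """
    lo_sum = a_lo + b_lo
    lo = lo_sum & MASK32
    carry = lo_sum >> 32            # 1 if the low words overflowed
    hi_sum = a_hi + b_hi + carry    # propagate the carry into the high words
    hi = hi_sum & MASK32
    carry_out = hi_sum >> 32
    return hi, lo, carry_out


# 0x00000000_FFFFFFFF + 1 carries into the high word
print(add64(0x00000000, 0xFFFFFFFF, 0x00000000, 0x00000001))
```

The same pattern extends to longer words by chaining the carry through more 32-bit pieces, at the cost of one extra step per piece.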
We (arbitrarily) have 32 channels to access which sets of processors we talk to at any given time. One could have more slower channels, or fewer faster channels, or use more cogs per chip for channels, and just add more chips.
Ignoring the ROM based tables and successive approximation techniques, the prop does not excel at floating point. So the major function we are lacking is the ability to crunch large amount of floating point numbers quickly. My thought is that a GPU or DSP is the hardware for this job, folks tell me the GPU as used by Antwerp is the way to go.
Of course, even if we get something that works, folks will still have to write software ON TOP of the infrastructure before we can actually do something useful. Those folks are already watching from the wings. But the model would be some "firmware" on the prop for infrastructure (more or less in PASM, even if I have the source in propforth), and then something on top that serves as an application.
one prop with a program and all pins connected to an FPGA and FPGAs each with a prop connected to it using the hyper transfer examples to send data to all the props at the same time, a central ram bank and each FPGA with simple but fast 8 bit CPU's.
I sort of did this with an old computer where other CPUs with machine code are called on and given some variables to work on.
just picture an apple 2 with 2 65c02s plus the initial 65c02.
I had an interest in augmenting computer power, so I wrote BASIC programs and some machine code on the main 65c02, and copied those machine code segments to the 6502s that were on a breadboard with a RAM chip and some support circuits. Then, when it was needed, I had the main CPU send some bytes telling one of the secondaries to use some variables and pass over them with the machine code.
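As a rough illustration of that "send some bytes" scheme, here is a sketch of what such a job message could look like. The wire format, the routine table, and every name below are hypothetical. They are not the format used on the 6502 setup, just one simple way to frame "node, routine, variables".

```python
import struct

# Hypothetical wire format (an assumption, not from the post):
#   1 byte  node id
#   1 byte  routine number
#   1 byte  operand count n
#   n * 4   little-endian unsigned 32-bit operands
HEADER = struct.Struct("<BBB")

# Stand-ins for the machine code segments copied to each secondary.
ROUTINES = {0: sum, 1: max}


def pack_job(node, routine, operands):
    """Main CPU side: build the byte string sent to the secondaries."""
    body = struct.pack(f"<{len(operands)}I", *operands)
    return HEADER.pack(node, routine, len(operands)) + body


def unpack_job(msg):
    """Secondary side: decode node id, routine number, and operands."""
    node, routine, count = HEADER.unpack_from(msg, 0)
    operands = struct.unpack_from(f"<{count}I", msg, HEADER.size)
    return node, routine, list(operands)


def secondary(msg, my_node):
    """Run the requested routine if the job is addressed to this node."""
    node, routine, operands = unpack_job(msg)
    if node != my_node:
        return None                      # not for us, ignore it
    return ROUTINES[routine](operands)


job = pack_job(node=1, routine=0, operands=[10, 20, 12])
print(len(job), secondary(job, my_node=1), secondary(job, my_node=2))
```

Every secondary sees the same bytes and only the addressed one acts on them, which keeps the main CPU's side down to a single broadcast.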
I think this is what we have with the regular prop-to-prop multi-channel serial, except we use two pins instead of many pins.
The thing is the props are not good at the crunching we want to do, hence the quest for GPU stream processor and memory etc.
so the entire prop talks to the FPGA and the FPGA has some support systems like a math coprocessor
I suppose you could use propeller chips along with an external memory to emulate the Cray. Hub ram could be used for the registers and one or more cogs for each of the pipeline stages. It would require multiple propeller chips and the floating point performance would be dismal without a fp coprocessor but the MIPS rating would probably be respectable. Anyone have any Cray software and manuals laying around?
I knew I'd find somebody smart! Keep working on it, we should have .... Never mind, just keep working on it.
and a cluster of nodes to do a big job
here it is
http://chrisfenton.com/homebrew-cray-1a/
it is a working and cool looking example.
since it both is a working model and he made it 1/10 scale model as well.
Certainly a doable project. Prototype it on the Digilent board, get the software to work and that's most of the effort.
The thing the Cray does that I am interested in is what I want to do on a GPU. Like in the link, but not using a graphics card, and not for use as a graphics card.
Good luck
You may want to take a look at some of the DSP chips as well. It may be simpler to interface them to a prop compared to the GPU chips. I am not sure the DSP's (or the GPU's for that matter) will do 64 bit fixed or floating point math. That is something that would need to be determined from the data sheets or manuals. It has been a while since I have been involved in this area so I have not kept up with the latest and greatest in the supercomputing realm. Most of my work has been in the industrial automation, building automation, and security systems lately.
Thanks, I agree, but I don't have any FPGA resources, and can't get any for the foreseeable future. Not having it in the shop makes it difficult to test, so I would have to have somebody else do the experiments, and I would do the requirements. But I don't have an FPGA guy yet. I do think FPGA is a good way to go, but it remains out of my scope.