Propeller supercomputer hardware questions

Humanoido · 2010-10-20 03:07

Dr. Mario, you have lots of encouragement here. I finally finished reading your remarkable report about the Propeller supercomputer you plan to build. I agree, it's a very exciting project and a worthy direction. You have great enthusiasm and very interesting ideas, including those for applications. Bill Henning is one of the great experienced masters on the Propeller forum, along with others, and you are lucky to have his and their comments. He has given you excellent encouragement and advice. Given the two years you estimate for construction of your supercomputer, I believe you will have plenty of time to learn the things he mentioned and update your "supercomputer PDF." When will you begin construction? You could put together a few Propeller chips first and see how that works. Did you try this? When you get to that point, remember to get your project into the list of multiple-Prop projects by contacting me and posting your project.

BTW, you may also want to "predetermine" the number of required hypercube connections and see if you are willing to make that many. For example, in a hypercube rounded to the nearest power of two, with one CPU counted as one Propeller chip, you would have 1,024 chips requiring 5,120 connections, which seems a bit challenging to assemble and debug. A custom PCB will be very handy.
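For reference, the 5,120 figure matches the usual hypercube formula: with 2^d nodes, each node has d links (one per address bit), and every link is shared by two nodes, so a d-dimensional hypercube needs d * 2^(d-1) links. For 1,024 chips (d = 10) that is 10 x 512 = 5,120. A short Python sketch for tabulating other sizes, assuming one Propeller chip per node and one physical link per hypercube edge (the function names are illustrative only):

```python
from __future__ import annotations


def hypercube_size(dimension: int) -> tuple[int, int]:
    """Return (nodes, links) for a binary hypercube of the given dimension.

    Each of the 2**d nodes connects to d neighbours (one per address bit),
    and every link is shared by two nodes, so links = d * 2**d / 2.
    """
    nodes = 2 ** dimension
    links = dimension * nodes // 2
    return nodes, links


def neighbours(node: int, dimension: int) -> list[int]:
    """Neighbours differ from `node` in exactly one address bit."""
    return [node ^ (1 << bit) for bit in range(dimension)]


if __name__ == "__main__":
    for d in (4, 6, 8, 10):
        n, l = hypercube_size(d)
        print(f"d={d:2d}: {n:5d} chips, {l:5d} links, {d} links per chip")
```

Doubling the chip count adds one more link per chip, so the wiring grows faster than the chip count itself, which is what makes a custom PCB attractive.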

Dr. Mario · 2010-10-20 17:22

Thanks, you guys! Bill and Humanoido, I agree that I have to learn much more, which is exactly why I said "I'm delving into the software." However, the floating-point whitepapers require registering on the publisher's website (which always costs $$$$), and so do the VLIW ones. Still, I managed to find a few VLIW presentations, including one from MIT on delayed issue (using it to provide out-of-order execution even in a simpler VLIW processor [why not? they have so many IUs in there...]). It also gives basic insight into how these processors function.

That said, I am confident that most points in the PDF are valid from a computer-science point of view. I'm willing to learn much more within the target time, from what Beau is doing now until the finalized Propeller II chips go on sale, which gives me plenty of time to learn. (If I go to Japan, I will be even luckier and learn much more at my new job, even getting a grasp of floating point and VLIW.)

And, admittedly, I have limited hands-on experience programming a VLIW CPU, let alone tinkering with the RISC-like cores inside x86 CPUs (such as the AMD Athlon 64 X2 in my own computer).

BTW, Humanoido, I decided to use what's really proven in the computing industry: both parallel and serial interconnects. That means 32 pins shared by all the Propeller II chips (on the same pinouts), plus eight serial pins from each individual Propeller II chip (128 pins) going to the FPGA system controller, for 160 pins total. Yes, I'm going to use a custom PCB; that way it's easier to make a hypercube board. I'm wondering if it's really desirable to give the hypercube board a PCIe x1 edge connector so that you can literally plug it into a PC and watch the Propeller II magic go POOMF! on the computer monitor... (That way even a mere mortal who has used the Propeller II and has a basic understanding of the programming language can use it in his or her own PC and play with it.)
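The 160-pin total follows from splitting the interconnect into a shared parallel bus (a fixed cost) and per-chip serial links (a cost that scales with the chip count). The 128 serial pins imply 16 chips at eight pins each, which matches the 16-chip board described later in the thread. A minimal sketch of that pin budget, with the function name chosen here for illustration:

```python
def controller_pin_count(chips: int, shared_bus_pins: int = 32,
                         serial_pins_per_chip: int = 8) -> int:
    """Pins needed on the FPGA system controller (hypothetical helper).

    The parallel bus is shared, so it costs the same number of pins no
    matter how many chips hang on it; the serial links are point-to-point,
    so they scale linearly with the chip count.
    """
    return shared_bus_pins + chips * serial_pins_per_chip


if __name__ == "__main__":
    for chips in (4, 8, 16, 32):
        print(f"{chips:2d} chips -> {controller_pin_count(chips)} FPGA pins")
```

Doubling the board to 32 chips would need 288 controller pins under the same scheme, so the serial links dominate the FPGA's I/O budget as the design grows.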

Dr. Mario · 2010-10-20 17:47

And, good news: I found boatloads of VLIW programming guides, presentations, and whitepapers on the HP website (search keyword: VLIW); there are lots of hits. Since HP is associated with Fisher, the father of VLIW, I figured I'd try there, and I found a real treasure trove.

Pipelining is covered there too, so at least my work is cut out for me. Yay!
(Rats, there's so much to cram into my poor brain... T____T Oh well, it's my job anyway to learn everything before I can really write the firmware and the scripts.)

Dr. Mario · 2010-10-21 10:41

UPDATE:

I think it's possible to have the COGs in the Propeller II execute instructions other than Spin out of order. There are a few methods to do so, but unfortunately, because of the nature of the Propeller I COGs' RISC microarchitecture, it has to be a truly pipelined CPU to allow out-of-order execution (OoOE). Plus, I really wonder whether the Java VM-based PropSIM program can handle such complex pipelined tasks... If not, I may need to find (or wait for) a Propeller II emulator to test the OoOE task scheduler.

Still researching a few papers... =P
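To make the out-of-order idea concrete, here is a toy issue model of the kind a software scheduler might start from. It is a hypothetical sketch, not a model of the Propeller II: it assumes register renaming removes write-after-read and write-after-write hazards, tracks only true data dependences, and each cycle issues the oldest instructions whose operands are ready. The instruction names and latencies are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str            # register written
    srcs: tuple = ()     # registers read
    latency: int = 1     # cycles until the result is usable (>= 1)


def schedule_out_of_order(program, issue_width=1):
    """Return {name: issue cycle} for a toy out-of-order issue model."""
    # Renaming: each source depends on the most recent earlier writer.
    last_writer = {}
    deps = []
    for i, ins in enumerate(program):
        deps.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = i

    issue = [None] * len(program)   # cycle each instruction issues
    ready = [None] * len(program)   # cycle each result becomes available
    cycle = 0
    while None in issue:
        slots = issue_width
        for i, ins in enumerate(program):   # oldest-first selection
            if slots == 0:
                break
            if issue[i] is None and all(
                ready[d] is not None and ready[d] <= cycle for d in deps[i]
            ):
                issue[i] = cycle
                ready[i] = cycle + ins.latency
                slots -= 1
        cycle += 1
    return {ins.name: issue[i] for i, ins in enumerate(program)}


EXAMPLE = [
    Instr("load", "r1", ("r0",), latency=3),      # slow memory read
    Instr("add", "r2", ("r1", "r0")),             # needs the load result
    Instr("mul", "r3", ("r4", "r5"), latency=2),  # independent of the load
    Instr("sub", "r6", ("r3", "r4")),             # needs the mul result
]
```

With EXAMPLE and one issue slot, `mul` issues in cycle 1 while `add` waits for the three-cycle `load`; a strictly in-order machine would stall `mul` behind `add`.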

Forest Godfrey · 2010-10-21 21:54

Dr. Mario, earlier in this thread you referred to using PCI Express as the "spine" of the computer you want to build. I might be misunderstanding, but it sounds like you want to link a *lot* of boards together with PCIe, which is going to be a problem. Unlike PCI, which was a true parallel bus, PCIe is a point-to-point serial link, not a bus, so you get one card per PCIe port on the root bridge.

Dr. Mario · 2010-10-26 12:24

Yes. That's why I am going to use a PCIe switch chipset (such as the LucidLogix Hydra 100, a popular PCIe switch for video cards). I am aiming for a PCIe x1 link on each hypercube board, which will let me attach as many boards as the switch's specifications allow. (It also allows the Propeller II-based hypercube board to be used in a regular PC, should that be desired!)

The PCIe driver is somewhat simplified because PCIe keeps the traditional PCI bus model. IRQs, however, are somewhat complicated because of throughput, so I am going to either burn an active IRQ processor into a Lattice ECP2M20 chip or wire a PIC32MX to the FPGA, managing two different types of IRQs: intra-system IRQs (from the motherboard or via the switch) and inter-processor IRQs (for all the Propeller II interrupt request queues). So it's a mixed result, both good and bad, but at least the bad part can be eliminated with the active IRQ processor.
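One way to picture the "active IRQ processor" is as an arbiter that merges the two interrupt sources into a single service queue. The sketch below is a hypothetical model in Python, not firmware for the ECP2M20 or PIC32MX; it assumes host-side IRQs (motherboard or switch) should be serviced before IRQs raised by the Propeller II chips, and that requests within a class are handled in arrival order. The class and constant names are illustrative.

```python
import heapq
import itertools

SYSTEM_IRQ = 0      # from the motherboard or the PCIe switch (assumed higher priority)
PROCESSOR_IRQ = 1   # raised by one of the Propeller II chips on the board


class IrqArbiter:
    """Hypothetical two-class interrupt arbiter for the board controller."""

    def __init__(self):
        self._queue = []
        self._seq = itertools.count()   # preserves arrival order within a class

    def raise_irq(self, irq_class, source):
        heapq.heappush(self._queue, (irq_class, next(self._seq), source))

    def next_irq(self):
        """Pop the highest-priority pending request, or None if idle."""
        if not self._queue:
            return None
        irq_class, _, source = heapq.heappop(self._queue)
        return irq_class, source
```

The sequence counter breaks ties inside a class, so a burst of chip interrupts is served first-come, first-served once no host interrupt is pending.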

Dr. Mario · 2010-11-05 13:01

Progress is currently on hold, but you guys can still talk about supercomputers here.

Meanwhile, I have been looking around for the beefiest software, like a pi-constant calculator. I am grabbing the J-ALICE software for human-like AI computation across more than a handful of COGs. I still need more Linux-type software (so I can reverse-engineer it and then port it to my own GreenOmega OS on the PowerPC seed computer, and especially to the thread kernels on the individual flash ROMs). If anyone can recommend any software, please do so by PM or here. Open source only, please... for legal reasons (aside from the convenience of having the source code).

Dr. Mario · 2010-12-09 13:02

Yet whenever I talk about my project in progress, everyone stops responding.

I don't know what to say, but I feel like abandoning this supercomputer project unless you guys want me to keep going with it. (I get discouraged way too easily; a few words or actions alone are enough...)

Humanoido · 2010-12-09 17:40

Don't stop now! You're just building up great momentum to carry this project through. Maybe you can start small with a few Propeller chips and get the fundamental design working, then add more as resources become available. As this thread has evolved with many of your ideas, I now wonder how much of the intended machine is Propeller and how much is something else. It's easy to become discouraged, but there are many people to encourage you and help you along the way. When you finally complete the project, you will know it's a great accomplishment, and you will have gained a tremendous amount of knowledge and a new machine that will do great things.

Dr. Mario · 2010-12-10 13:08

Yeah. (I did read this forum on my cell phone; too bad I can't post from the phone...)

Anyway, thanks for reassuring me...

Humanoido · 2010-12-10 13:53

Fascinating update, but from what I read you're certainly not starting out small! For starters, according to the estimates, you'll need at least $10,000 for the base unit. What is your cost limit on this project?

Dr. Mario · 2010-12-10 20:50

Just one card at a time, at $500 to $2,000 each, so I can build the Rev. A hardware card first and do more afterwards. It's for testing software and boot firmware.

So, yeah, I am starting out small. (I already have a DIP-40 Propeller and may get more; I'll also buy the $100 XS1-G4 HDK to test-run my own software. It's for testing the boot firmware and the interconnections.)

Leon · 2010-12-11 01:56

With XMOS chips delivering 1600 MIPS, and with high-speed XLinks for comms between them, what is the point of including Propellers?

Dr. Mario · 2010-12-11 16:38

The point is to retain the software instruction execution capability. We could port Spin to the multi-core DSP, but then it would get way too expensive. Plus, the reason to connect the XMOS XS1 to the Propeller II chip is to provide SIMD-like floating point, something the Propeller II isn't built to tackle without outside assistance. If you don't like the idea, look inside a hard drive or a video card and you will see why. I'm just pointing out facts that lie in everyday objects, like the ones inside your computer.

Also, we all have very different objectives for our own projects. What I learned in the past is that the old saw "two brains are better than one" holds true, as in the case of the latest HD graphics processors: they have up to 7 different CPU cores assigned to different tasks.

Dr. Mario · 2010-12-23 12:10

Yet another report: I am going to place an exploratory order.

Humanoido · 2010-12-27 02:08

Dr. Mario, I'm following your updates, and the latest "starting out" PDF is very interesting. I see you even included a starter schematic and parts list for the exploratory initiative. Each person working on their own supercomputing machine definitely has a unique purpose and design. Make sure the memory chip is pluggable. Those SMT parts are not designed to be swapped after installation.

Dr. Mario · 2010-12-28 16:45

Yeah, but what's permanent is going to be cache RAM for the Propeller II chips, maybe as a decoding LUT cache for 32-bit x86 decoding or something else that needs semi-permanent cache storage, kind of like in a CPU. The pluggable RAM would be at the switch chipset (a Toshiba MIPS32 chip like the ones found in laptops, in a Tape Carrier Package Ball Grid Array (TCP-BGA) package, or the FPGA with a RAM controller); that way I can give each card a 32-bit physical address space (4 GB).
The pluggable RAMs for now are SPI or I2C types, both volatile and non-volatile.
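For the SPI case, accessing a serial SRAM comes down to clocking out a short command frame. The sketch below assumes a Microchip 23K256-style part (READ opcode 0x03, WRITE opcode 0x02, 16-bit address sent MSB first); other SPI RAMs, and all I2C parts, use different framing, so check the datasheet of the chip actually chosen. The function name is hypothetical.

```python
READ_CMD = 0x03   # 23K256-style SPI SRAM opcodes (assumed part)
WRITE_CMD = 0x02


def spi_sram_frame(opcode, address, data=b""):
    """Build the bytes clocked out on MOSI for one SPI SRAM transaction.

    Layout assumed: opcode, 16-bit address (MSB first), then payload.
    For a read, send one dummy byte per byte wanted; the SRAM returns
    data on MISO while they are clocked out.
    """
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return bytes([opcode, (address >> 8) & 0xFF, address & 0xFF]) + bytes(data)
```

A 2-byte read from address 0x0010 would be `spi_sram_frame(READ_CMD, 0x0010, bytes(2))`; multi-byte bursts like that assume the part has been put in sequential mode.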

Dr. Mario · 2011-01-04 15:22

Still so many choices for the final hardware... I will just go with simple hardware for now...

Humanoido · 2011-01-05 23:39

Dr. Mario wrote:

Still so many choices for the final hardware... I will just go with simple hardware for now...

Going with simple hardware and software can expedite the project. Perhaps you can start right away. Make a platform that can be expanded with more processors. Then things can get up and running soon.

Ale · 2011-01-06 04:13

Dr. Mario, I also dream of a sort of "plug in more for extra horsepower" machine,

but I struggle to see where I could use such a system (and how to distribute the load on it). Maybe you already have some application in mind... or is it purely a hardware research project?

Dr. Mario · 2011-01-06 15:30

Yeah. My goal is an expandable supercomputer. It uses a PCI Express connection between a host seed computer and a board containing 16 Propeller II chips. PCIe x1 or x8? I don't know yet, but I intend to find out later on. It's a hardware research project, and a software one too. It involves a deeper understanding of pipelining and distributed computing, which means stirring up a hornets' nest here. It may look simple to computer programmers, but once you peel back the first layer of what's going on, it isn't so simple. That's why I consider it R&D hardware. And XMOS XS1-G4 chips will be used here because they're also cheaper (excellent for handling integer-based SIMD, I guess).
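To help with the x1 versus x8 question, a rough per-direction bandwidth estimate can be worked out from the signalling rate and line code of each PCIe generation (2.5 GT/s and 5 GT/s with 8b/10b encoding for 1.x and 2.0, 8 GT/s with 128b/130b for 3.0). The sketch ignores packet and protocol overhead, so real throughput is lower, and it assumes the 16 Propeller II chips share the link evenly; the function names are illustrative.

```python
from fractions import Fraction

# Raw transfer rate per lane (GT/s) and line-code efficiency per generation.
PCIE_GEN = {
    1: (Fraction(25, 10), Fraction(8, 10)),   # 2.5 GT/s, 8b/10b
    2: (Fraction(5), Fraction(8, 10)),        # 5.0 GT/s, 8b/10b
    3: (Fraction(8), Fraction(128, 130)),     # 8.0 GT/s, 128b/130b
}


def link_mb_per_s(gen, lanes):
    """Bandwidth per direction in MB/s, before packet/protocol overhead."""
    rate_gt, efficiency = PCIE_GEN[gen]
    bits_per_s = rate_gt * 10**9 * efficiency * lanes
    return bits_per_s / 8 / 10**6


def per_chip_share(gen, lanes, chips=16):
    """Even split of one link among the chips on a board (an assumption)."""
    return link_mb_per_s(gen, lanes) / chips


if __name__ == "__main__":
    for lanes in (1, 8):
        print(f"PCIe 1.x x{lanes}: {float(link_mb_per_s(1, lanes)):.0f} MB/s, "
              f"{float(per_chip_share(1, lanes)):.3f} MB/s per chip")
```

A PCIe 1.x x1 link works out to 250 MB/s, or about 15.6 MB/s per chip; x8 raises that to 125 MB/s per chip.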

For now, Ale, that's really tricky. If you're handy with the recent generations of HD video cards, or maybe a Cell BE card, then you may be able to luck out. I haven't had a chance to buy the parts yet. (Money...)

P.S. I like the AMD Radeon HD 4600 series (I have an HD 4670 card), since it also gives me an upper hand in prototyping research: I can use its vector units to run Propeller software in emulation, for example emulating up to 320 COGs. I know it won't be easy, but hey, let me cheat for a bit...

Ale · 2011-01-07 03:43

Wouldn't it be better to have an array of FPGAs? I know that as you go up in density they become quite expensive... just a thought.
But now that I remember it, the people at XMOS have the XMP-64, with 16 G4s connected in a hypercube. It costs $1,500, but it is ready to go... The advantage I see is the links between the G4s. I hope the Propeller II brings something similar to the table... that would really be great! I think they mentioned a serializer/deserializer or a latch on that clock (used for SDRAM).

Leon · 2011-01-07 04:31

The XMP-64 should outperform an equivalent array of Propeller 2 chips, and software development will be much easier.

pharseid · 2011-01-08 06:23

I would think that with the FPGA approach, you would squeeze as much as you can onto a lower-end chip (which is still quite a lot) and then connect the chips with HyperTransport or some other high-speed I/O standard. Leon mentions that software development would be easier on the XMOS, which is designed for multichip implementations, but I suppose an ambitious FPGA project would use a CPU with an MMU and run Beowulf. Or something, like using the picoJava-II CPU and one of the Java implementations of Linda. There are a lot of directions that could take.

Leon · 2011-01-08 06:38

Inter-chip comms are the key to high performance and it would be best to design the system around a suitable comms protocol. XMOS uses a five-wire system for the fastest XLinks, with two-wire XLinks being available for long-distance interconnects. It's not as fast as HyperTransport can be, but it is a lot simpler.

Ale · 2011-01-09 00:24

You can always use a fibre interconnect and get any distance you want, plus galvanic isolation. Leon is right that inter-processor comms are key. You can start with a few Props and low-speed serial comms (a serializer/deserializer can simplify the software side too).
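A low-speed serial link between Props can start with something as simple as UART-style 8N1 framing (one start bit, eight data bits LSB first, one stop bit, line idling high). This is a plain software serializer/deserializer for illustration, not a model of any particular SerDes hardware, and the function names are hypothetical:

```python
from __future__ import annotations


def serialize(data: bytes) -> list[int]:
    """Encode bytes as 8N1 frames: start bit 0, 8 data bits LSB first, stop bit 1."""
    bits = []
    for byte in data:
        bits.append(0)                                   # start bit
        bits.extend((byte >> i) & 1 for i in range(8))   # data, LSB first
        bits.append(1)                                   # stop bit
    return bits


def deserialize(bits: list[int]) -> bytes:
    """Recover bytes from an 8N1 bit stream; the line idles high (1)."""
    out = bytearray()
    i = 0
    while i + 10 <= len(bits):
        if bits[i] == 1:          # idle line: keep hunting for a start bit
            i += 1
            continue
        byte = sum(bit << n for n, bit in enumerate(bits[i + 1:i + 9]))
        if bits[i + 9] != 1:
            raise ValueError(f"framing error at bit {i + 9}")
        out.append(byte)
        i += 10
    return bytes(out)
```

The stop-bit check gives a cheap framing-error signal; a faster link would add a checksum per packet on top of this.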

Humanoido · 2011-01-09 04:03

Even simple serial has a higher speed version.

Dr. Mario · 2011-01-09 17:15

Leon, good point... really. I also included the XS1-G4 chips to provide some raw horsepower whenever needed (such as running machine code meant for the Propeller II chips, which is much faster with JIT compiler software on the XS1-G4 chips).

Fiber optics is neat, but the issue is price. PCI Express is much cheaper, and faster too. (BTW, I heard that PCIe 3.0 will probably use fiber optics for a faster mode.) Also, fiber optics automatically provides galvanic isolation (which is also nice: no signal noise, and no high voltage to accidentally fry the chips).

Ale · 2011-01-09 23:53

I was thinking of implementing a fibre interconnect for a couple of projects. Even 1 MBps, like I2S, would be enough for many projects. I only have to find the appropriate transceivers/connectors; cables are readily available!

Dr. Mario · 2011-01-10 16:33

Yeah, that's the problem. But at least you can try semiconductor vendors like Maxim for fiber-optic (FO) VCSEL diode driver chipsets. (You have to read some PDFs first before you can make a solid decision.)

Also, VCSELs are the cheapest kind of laser diode, so you shouldn't have trouble buying them.
