Parallella
steddyman
Posts: 91
I know it has been mentioned on these forums previously, but I just purchased one of the new 16-core Parallella boards.
Pretty amazing value for money. Fully open source, with 16 x 1 GHz Epiphany cores in addition to a Xilinx Zynq 7010/7020. I went for the 7020-based embedded platform, which has 85k LEs on the FPGA. For around £160 that is cheaper than any of the Zynq dev boards out there, with a lot more power.
http://www.parallella.org/board/
It should be possible to port the P1V over to it. Wonder if I can get it working alongside the ARM A9 cores and the Epiphany cores all in one design!
Comments
Shamefully, my Parallella has not even been booted up yet; there are just too many things to do in the time available.
Getting a P1 in there sounds like a great project. I have no idea what use such a beast might be in the end but it sounds like a challenge that someone has to tackle.
I'm confident there is no GPU and no specific video logic at all. I'm gonna guess the HDMI signals connect to the FPGA fabric, as does the Epiphany chip.
Here's the datasheet for the CPU/FPGA - http://www.xilinx.com/publications/prod_mktg/zynq7000/Zynq-7000-combined-product-table.pdf
Damn post man didn't deliver my board today. Hope it comes tomorrow now.
85k LEs, that's a lot of heft! Combined with the Epiphany stuff, those boards look pretty mighty!
Those binary blobs in GPUs and such are annoying for sure.
Note however that the binary blob in, for example, the Raspberry Pi is not a driver. Or at least not a Linux kernel driver module. Rather it is firmware loaded into the GPU.
The Raspi gets a lot of stick for this, but, oddly, if that same firmware blob were permanently blown into ROM on the GPU, even the Free Software Foundation have said they would be happy with it.
And we have the same problem with all the firmware blobs living in USB sticks, SD cards, hard disks and SSDs, WiFi and 3G dongles, etc. etc. Heck, even the very CPUs we run can be updated with new microcode binary blobs.
The Parallella stuff looks more promising in that respect.
R
And then...
My P1V retromachine is preloaded with a blob too. It starts the machine, then looks for a "boot.sys" file on the SD card and runs it, so you don't have to program the machine with a Prop Plug after every restart. boot.sys can be anything that runs on the P1V, but it is intended to be an operating system or operating system loader, or even a "micro DOS" (the name comes from the 8-bit Atari) which lists all the executable files on the disk (SD in this case) and lets you select one.
The Pi works in a similar way.
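The flow of the blob is roughly this (just a C-style sketch of the idea, not the real code; sd_mount(), fat_open(), fat_read(), jump_to() and the load address are made-up placeholders for whatever SD/FAT driver is actually in there):

/* Rough sketch of the boot blob flow only, not real code.
   sd_mount(), fat_open(), fat_read(), jump_to() and LOAD_ADDR are
   hypothetical placeholders for the actual SD/FAT driver and launcher. */
extern int  sd_mount(void);
extern int  fat_open(const char *name);
extern int  fat_read(int handle, unsigned char *dst, int max_len);
extern void jump_to(unsigned char *entry);

#define LOAD_ADDR ((unsigned char *)0x0000)   /* illustrative hub address */
#define MAX_IMAGE (32 * 1024)                 /* hub RAM size */

void boot(void)
{
    if (sd_mount() != 0)
        return;                        /* no card: stay in the blob */

    int f = fat_open("boot.sys");
    if (f < 0)
        return;                        /* no boot.sys: nothing to chain to */

    if (fat_read(f, LOAD_ADDR, MAX_IMAGE) > 0)
        jump_to(LOAD_ADDR);            /* hand control to the loaded OS/loader */
}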
This Parallella board looks interesting... 85k LEs? I have to read more about it...
The link in post #1 leads to a 404 error page.
http://www.parallella.org/parallella-board/
Maybe by the time I can buy one, this 7020 version will be available too. They say 18 cores; well, when we add some P1Vs we will have 26, or maybe 34... 85k LEs? ... maybe 42 cores...
There is only one problem: no GPIO available in a standard way. How do you connect to these slots? I cannot find any extension boards for this.
There is an extension board available called the Porcupine which brings all the I/O out to standard pin headers:
http://www.parallella.org/meet-porcupine-the-parallella-breakout-board/
It has only just been released and I can't see it in the UK yet, but Digikey will ship internationally and it is showing as in stock.
Here is all the info you need on the Zynq ARM/FPGA combo:
http://www.xilinx.com/support/documentation/data_sheets/ds190-Zynq-7000-Overview.pdf
Also notice that somebody is porting a full OpenGL / Direct 3D GPU to it (in Verilog):
http://forums.parallella.org/viewtopic.php?f=51&t=1581
Here is a picture of it next to an SD card and an Apple TV remote for comparison:
The Porcupine suddenly makes tinkering with the Parallella a whole lot more attractive.
@pik33,
I would not get so carried away with those core counts. Only some of them are real general-purpose cores, namely the ARMs. The 16 or whatever Epiphany cores are little more than FPU accelerators. If you throw a P1 in there then that is 8 more processors again, of a very different kind.
Throw it all together and we have a fast Linux machine, with super-fast maths capability and funky real-world bit-banging interfacing via the P1. A real Frankenmachine that could be, shall we say, "interesting" to develop applications on.
A lot of different instruction sets to develop for. I suppose with the right compiler tools it wouldn't be so bad. But it seems like there is a lot of overlap in functionality between the Epiphany itself and the Prop?
Now you have another problem. How to program the mess you have created in an integrated "holistic" way.
With the Parallella, for example, one could imagine the ARM application, the Epiphany maths and the Propeller I/O are all programmed in C in a single project. Your IDE or other tooling would make sure all the right code ends up in the right place. The compilers take care of the instruction set differences nicely.
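To give a flavour of the ARM side of such a project, here is a minimal host sketch using the Epiphany eSDK's e-hal, as best I remember it (the device binary name e_task.srec and the read offset are just placeholders, and the exact calls may differ between SDK versions):

#include <e-hal.h>

int main(void)
{
    e_platform_t platform;
    e_epiphany_t dev;

    e_init(NULL);                    /* use the default platform description (HDF) */
    e_reset_system();
    e_get_platform_info(&platform);

    e_open(&dev, 0, 0, 1, 1);        /* a 1x1 workgroup at core (0,0) */
    e_load("e_task.srec", &dev, 0, 0, E_TRUE);   /* load device code and start it */

    /* ...later, pull results out of core-local memory, e.g.:
       e_read(&dev, 0, 0, 0x2000, buffer, sizeof(buffer)); */

    e_close(&dev);
    e_finalize();
    return 0;
}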
There is almost no overlap between what you would want to do on the Epiphany and the Propeller. The Epiphany is basically a multi-core floating point accelerator. Its processors have little (no) interaction with the outside world (pins). The Prop's COGs are very much slower, integer-only processors with very limited memory, but they do have that tight connection to the I/O pins.
Certainly would be interesting.
Sounds like the Epiphany also has quite limited memory: 32k per core. That is at least more than a COG's RAM, but still.
Interesting how each Epiphany core seems to be memory mapped. You could do something similar from the ARM to the Prop cores: write instructions directly into hub RAM, trigger a cog start from a register address or something, and then have the cog's RAM memory mapped into the ARM for reading.
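Something like this from the Linux side, if the P1V port ever exposed hub RAM through an AXI slave window (the HUB_BASE address and the window itself are pure invention here; nothing in the current P1V sources provides them):

/* Sketch only: assumes a hypothetical AXI window onto hub RAM at HUB_BASE.
   No such window exists in the current P1V sources. */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUB_BASE 0x43C00000u    /* made-up AXI GP0 address for the hub window */
#define HUB_SIZE 0x8000u        /* 32 KB of hub RAM */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return 1;

    volatile uint8_t *hub = (volatile uint8_t *)mmap(NULL, HUB_SIZE,
                                PROT_READ | PROT_WRITE, MAP_SHARED, fd, HUB_BASE);
    if (hub == MAP_FAILED)
        return 1;

    hub[0] = 0x55;              /* poke a byte (or an instruction image) into hub RAM */
    uint8_t back = hub[0];      /* ...and read it straight back from the ARM side */

    munmap((void *)hub, HUB_SIZE);
    close(fd);
    return back == 0x55 ? 0 : 1;
}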
But the deal is that we have some code that needs to be compiled and run on the ARM, no doubt with its own main() function etc. Then we have some code that needs to run on the Epiphany, no doubt compiled with a different compiler and perhaps with its own main(). Then we have more code for the Prop, again a program with its own main() and a different compiler.
Somehow it would be cool to be able to write all of that as a single project, no matter what compiler is used in each case. Heck, I might not even want to use C everywhere; perhaps I want the code running on the ARM to be JavaScript run under Node.js, after all the fast stuff is being done elsewhere in this system.
http://parallellagram.org/cat/Parallella%20FPGA%20Tutorials
However, I'm finding the USB implementation incredibly flaky and, looking at the forums, I am not alone. I only get a working keyboard about 20% of the times I boot up, and a working mouse even less often.
The USB and HDMI are implemented as hard IP blocks from Xilinx, so there is no access to the source code. It may not be a panacea after all. Probably much better suited as a headless compute engine.
Here is the Software Development book from Xilinx:
http://www.xilinx.com/support/documentation/user_guides/ug821-zynq-7000-swdev.pdf
Maybe we can try to make this kind of structure with modified p1v cogs.
A while back I took up the challenge of parallelizing my home-grown FFT for the Propeller using prop-gcc and OMP. Sure enough, that can spread the work over 2 or 4 (or even all 8) COGs for a performance boost. No, it does not scale linearly.
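Very roughly, the shape of the thing (not my actual FFT, just one radix-2 stage with the butterfly loop handed to OMP, integer maths to suit the Prop):

#include <omp.h>
#include <stdint.h>

/* Sketch of one radix-2 stage, butterflies split across COGs by OpenMP.
   Not real code: 12-bit fixed-point twiddles, contiguous halves, no bit reversal. */
void fft_stage(int32_t *re, int32_t *im,
               const int32_t *wr, const int32_t *wi, int half)
{
    #pragma omp parallel for num_threads(4)     /* 2, 4 or all 8 COGs */
    for (int i = 0; i < half; i++) {
        int j = i + half;
        int32_t tr = (re[j] * wr[i] - im[j] * wi[i]) >> 12;
        int32_t ti = (re[j] * wi[i] + im[j] * wr[i]) >> 12;
        re[j] = re[i] - tr;  im[j] = im[i] - ti;
        re[i] += tr;         im[i] += ti;
    }
}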
Thing is it's not easy. Not something I want to be doing everyday. No doubt there are already far better parallel FFTs for the Parallella than anything I could come up with.
It's just hard. That is why my Parallella board has never even been booted up.
I guess if I had an idea in mind that needed such performance and there was some ready made code to do the work then my Parallella would get unboxed.
The algorithm I tested here was meant to run on a GPU. Tested on a "normal" multicore CPU, it scales nearly linearly up to 32 threads and maybe more. Theoretically it should run on 2000 compute units at about 1000x the speed; the single-thread overhead is very small.
Now I want to buy this thing. Maybe in summer.
Interesting stuff you have going on there. You may also like to take a look at the NVIDIA Jetson TK1 board. Dual core ARM and 192 GPU cores ready to run your CUDA code. Only 200 dollars or so.
https://www.youtube.com/watch?v=XmnM7ikhY1s
If you are waiting till the summer, then hang on for the 64-core Parallella that is due out shortly. Combined with the FPGA, that means you should be able to custom-craft a hybrid hardware and software solution.
After much pain and learning I can now use the Xilinx ISE tools to compile the original sources with a minor code change to prove it works. I have compiled and tested the 7020 HDMI sources.
On the 7020 board the HDMI configuration takes around 25% of the LEs and around 10% of the BRAM, though there is still tons of space for extra custom logic.
Bought a QuickStart board for US$6 (cheap) at a Radio Shack. Recently started playing with it, and found the Propeller 1 Verilog forum. Since I design in Verilog and have used FPGAs, I was excited to check it out.
Alas, it was targeted for an Altera part, and I have only used Xilinx parts.
@steddyman, have you ported the P1 to Xilinx yet?
@pik33, you're right about implementing your DSP in hardware (FPGA). I implemented a 40Gbps EFEC 7 years ago in a couple of Xilinx parts (100% hardware) and was able to sustain the full rate, while other vendors used mixed hardware/firmware and had to burst the data. Needless to say, when you're attempting to achieve 10e-24, even at-speed sustained testing takes forever to verify.
I haven't ported the P1V yet. I've had to take a break to update a couple of my iOS apps.