FPGA recommendations?
prof_braino
Posts: 4,313
Hi All
We're looking at making a new P1V type FPGA build. Previously we used the BEMicro Cyclone V; I think it was around $50.
What is your current favorite best bang for the buck FPGA option?
We are hoping to get the FPGA on a development board for around $50 again.
The project is a FORTH optimized P1V variant: no VGA, no ROM tables, no timers, and instead use resources for more Cogs and RAM.
Tentative spec
8 to 16 cogs
2k to 8k bytes cog RAM
64k to 128k hub RAM
64 to 128 I/O
160 to 200 MHz
The BEMicro achieved most of the low-end specs and one or two of the high-end specs; now we are looking for an option that might provide more of the high-end specs.
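As a rough aid for comparing boards, the tentative spec can be turned into a total on-chip RAM figure to check against a candidate FPGA's block RAM. This is only a sketch: the function name is made up for illustration, "k" is assumed to mean 1024 bytes, and logic-cell usage (which also grows with cog count) is not modeled.

```python
KIB = 1024  # assumption: "k" in the spec means 1024 bytes


def total_ram_bytes(cogs, cog_ram_bytes, hub_ram_bytes):
    """RAM needed by the design: one private RAM per cog plus the shared hub RAM."""
    return cogs * cog_ram_bytes + hub_ram_bytes


# Low end of the tentative spec: 8 cogs, 2k cog RAM, 64k hub RAM
low_end = total_ram_bytes(8, 2 * KIB, 64 * KIB)
# High end of the tentative spec: 16 cogs, 8k cog RAM, 128k hub RAM
high_end = total_ram_bytes(16, 8 * KIB, 128 * KIB)

print(low_end // KIB, "KiB")   # 80 KiB
print(high_end // KIB, "KiB")  # 256 KiB
```

The spread between the two ends is a bit over 3x, so the RAM requirement alone may decide which FPGA family is in range.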
Any recommendations?
Thanks!
Comments
Cool specs. However, changing the size of Cog RAM is a big ask. There are good reasons why the Prop2 didn't up that spec, the main one being that instructions are limited to 32 bits in size. With the current 512 general 32-bit registers, 2x9=18 bits of each instruction are needed just for the two operands.
PS: Prop2 will have a variant of similar spec. See http://forums.parallax.com/discussion/164364/prop2-family/p1
PPS: On top of the 18 bits for operands there are another 6 bits for condition code handling. This leaves a mere 8 bits for opcodes and addressing modes. If Cog RAM were upped by one more address bit to allow 1024 registers (4 kByte), that would reduce it to just 6 bits. That'd be extremely tight.
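The bit budget described above can be worked through as a small calculation. This is a sketch using only the figures quoted in this post (32-bit instructions, two register operands, 6 bits for condition code handling); the function name is hypothetical.

```python
INSTRUCTION_BITS = 32
CONDITION_BITS = 6  # figure quoted in the post above


def bits_left_for_opcode(registers):
    """Bits remaining for opcodes and addressing modes with two register operands."""
    operand_bits = (registers - 1).bit_length()  # 512 registers -> 9 bits each
    return INSTRUCTION_BITS - 2 * operand_bits - CONDITION_BITS


print(bits_left_for_opcode(512))   # 8
print(bits_left_for_opcode(1024))  # 6
print(bits_left_for_opcode(2048))  # 4
```

Each doubling of Cog RAM costs two bits, one per operand field, which is why an 8k byte (2048 register) cog would leave only 4 bits under these assumptions.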
PPPS: Cluso has done some mighty fine tables of the instruction map for the Prop2 and recently even did the Prop1 - http://forums.parallax.com/discussion/164612/p1-instruction-set-summary - where you can see the 64 dual operand instruction slots, one of which (%000011) is set aside for the single operand hub-ops.
In the Prop2, Chip has removed the R bit from the instruction encoding to make room for 128 dual operand instruction slots. See http://forums.parallax.com/discussion/comment/1381397/#Comment_1381397
Although it isn't quite that plain: by reusing the Z, C, I bits in instructions that have no use for them, only 105 slots are used to make 143 dual operand instructions. On the other hand, for example, 16 of the 128 slots are used by just 4 single operand instructions. There are currently just 2 spare dual operand slots.
Who is going to design this thing? It sounds like quite a tall order. Chip and a cast of thousands have been working on the P2 design for years.
I think apart from the MHz and cog ram, you'll get somewhere close, but you'll have to work out how to get the cogs to co-operate (modified hub).
It may make sense to aim for something like 12 cogs and 120 MHz to preserve the same mips per cog.
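One possible reading of the 12 cog / 120 MHz suggestion is that it keeps each cog's hub access rate the same as on a stock P1. The sketch below assumes a P1-style round-robin hub where one rotation takes 2 clocks per cog (16 clocks for 8 cogs) and a stock P1 clock of 80 MHz; a modified hub could behave differently, and the function name is made up for illustration.

```python
CLOCKS_PER_HUB_SLOT = 2  # assumption: P1-style hub, 16 clocks per rotation with 8 cogs


def hub_accesses_per_second(clock_hz, cogs):
    """Hub access windows each cog gets per second under round-robin rotation."""
    return clock_hz / (cogs * CLOCKS_PER_HUB_SLOT)


print(hub_accesses_per_second(80_000_000, 8))    # stock P1: 5000000.0
print(hub_accesses_per_second(120_000_000, 12))  # 12 cogs at 120 MHz: 5000000.0
print(hub_accesses_per_second(160_000_000, 16))  # 16 cogs at 160 MHz: 5000000.0
```

Under these assumptions, adding cogs without raising the clock in proportion slows every cog's access to hub RAM.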
Just be wary about the stock situation with the bemicros though, compared to Terasic
@evanh - maybe I confused myself on the cog RAM size. The "more cog RAM" might have come from more cogs with the same 2k byte cog RAM as on the P1. So maybe ignore this spec. On the other hand, we won't be needing as many instructions, since we are removing any instructions that we haven't had a use for, such as vid, extra timers, etc. Sal has the details; I just have a general picture.
@heater - yes, P2 without smart pins. The idea is simple bit banging at its finest, per our primary use-case for the P1. We only want to talk (bit bang) to the external devices, and implement the custom circuitry on an à la carte basis in the FPGA.
Sal does all the design. He already did all this one piece at a time on the BEMicro (more cogs, more hub RAM, more I/O, faster clock); now we are looking at a design that includes all the previous work in one package if possible. It all works already, we just need the right FPGA board.
@Tubular - maybe the cog RAM change was my mistake, but to my knowledge the previous experiments ran at 160 MHz, so the assumption is that we might find a part capable of 200 MHz if we ask.
I am not aware of stock issues for be-micro vs Terasic; could you please post a link to an article etc for my reference?
@David Betz - yes, there would be significant changes, but I don't know exactly what those will be. I just do some of the web-surfing research up front, and some of the documentation after the fact. Sal said the goal is for the PropFORTH code to be compatible with the P1, but the assembler would end up different and totally incompatible. That is fine in our case: the PropFORTH assembler words generate whatever assembler is needed at the last (optimization) step, and that result gets loaded at run time. So the same code will work on either chip (physical P1 or FPGA P1propforth), and we really would not have a case where the wrong code can get generated.
@T Chap - that may be true in the case of the strict P1V, but once we remove the stuff we don't use, the limitation is FPGA resources rather than the number of cores. Sal did experiments with 2 to 16 cores (I think it was 16 cores, but it was definitely more than 8) on the BEMicro.
As I said, the earlier experiments demonstrated that this stuff all works, and now the target is to put them all together and get one bigger faster prop chip.
Please bear in mind that the result we are looking for will NOT be spin compatible with arbitrary code from the OBEX, this is a custom image that is tailored to run PropFORTH for bit-banging. We are removing the ROM tables, the SPIN interpreter (for the most part), the video generators, the extra timers, and some other stuff.
So, what's your favorite e.g. 144 pin FPGA these days, and can it be had on a dev board for close to $50?
Thanks!
I know little of these things but I would have thought that if one wanted to build multi-core Forth engine it might be better to tailor a CPU to run Forth rather than recycle the COG. You know, a stack based machine and all.
Doug, it seems like you would be better off designing an FPGA machine that directly executes Forth primitives. Forth operations take many cycles on the P1 mostly because of the stack operations. You may want to look at the J1 Forth machine that was implemented in an FPGA. It should be possible to design a Forth machine that can execute primitive words in a single cycle. This would be very straightforward for words that don't grow or shrink the stack. Other words that change the stack might be doable in a single cycle as well, or maybe they would take an extra cycle or two.
Yes, faster parts could get you there, but I don't recall anything running reliably at 160 MHz on the BeMicros yet.
The BeMicro stock is based on observation over the past 3 years. It comes and goes, get them while you can. I can't speak about backorder durations but Peter Jakacki was looking to order some CV A9's and might have some better insight into how it works.
Here's a link to OzPropDev's P1V generator, which might be useful for working out what will fit
http://forums.parallax.com/discussion/160327/custom-p1-verilog-code-generator/p1
The Prop seems to be attracting many a Forth folk. I once had a forum PM from an individual from my town here asking me how I got into Forth. When I said I wasn't on the forum because of Forth I never got any further replies.
Cyclone V is hard to go past for a higher end FPGA, and the MAX 10 looks popular for more middle-of-the-road designs, and it does come in a TQFP option.
If you are going to tune for forth, you could look at HyperRAM - I think there is a MAX10 Board coming with HyperRAM.
This gives you a lot more code memory, without using an expensive FPGA.
You can also get more COGS/dollar, with a standard P1, so it makes sense to at least include a P1 footprint on any custom MAX10 board that is P1V targeting.
So what is the super application you have in mind?
BTW, like Heater I also wonder why Sal doesn't just communicate directly on the forum like the rest of us unwashed masses.
IIRC I had a MAX10 P1V running at 140 MHz but I seem to recall the barrel shifter was the first thing to show flakiness.
I need to get back to it all and look deeper into it.
Edit: Adding hub RAM is relatively easy, e.g. a Cyclone V A9 will support 1M of hub RAM.
Extra I/O can be done too. I have had a CV-A9 with 6 * 32 bit ports (PortA..PortF).
* stock of bemicros is iffy compared to Terasic
* Cyclone V is hard to go past for higher end FPGA,
* MAX 10 looks popular for more middle-road and that does come in a TQFP option.
* look at HyperRAM - MAX10 Board
* Cyclone V A9 will support 1M hub ram
* CV-A9 has done 6 * 32 bit ports (PortA..PortF)
To the questions posed:
Sal is of course Sal Sanci, author of spinforth and propforth. He doesn't have time to spend on the forums; I do. He's not interested in discussions about e.g. how he should implement the "create" forth word in conformance with the standard (which is just a guideline anyway) for compatibility with arbitrary code examples, at the expense of the compactness and speed he needs for his projects (others can and did implement this and other words differently than Sal's way, and this is how it should be).
Sal is considering implementing a forth-only engine in FPGA. It will be propforth compatible, but the machine code will be different from the prop. He considered simply moving to an ARM etc, but he prefers propforth for bit banging, he's already familiar with this tool he created, and he's used to the prop's configuration.
The "super application" is just another tool to use more tasks, in a more or less deterministic fashion. In a nutshell: the available cogs are in a "pool". He implements the standard forth round robin software multitasking. The pool of cogs (on the physical prop) represents eight "execution opportunities" for the pool of software tasks. The number and size of tasks is only limited by memory resources. If one task gets delayed, the total available system capacity is only reduced by 1/8; we can still have 7 tasks executing deterministically. If we have 16 cogs (FPGA prop) we could lose several cogs to slow or non-deterministic execution, but the remaining ten or so tasks would be unaffected (or minimally impacted).
The multitasker already works on the physical prop 1 (multitasker is in propforth 6, as yet unreleased), and it works on the FPGA propforth P1V. Six cogs supported dozens of tasks, all talking one to the next. Just a silly example, but it works, simple and elegant.
It's kind of cool, mostly good for bit banging, which is primarily what we use the prop for. It easily interfaces to a linux box or other larger (but not necessarily deterministic) processor via routines in Go and C. The user interface and I/O to the linux box consumes a cog, so more cogs gives a better ratio of available resources to overhead.
Anyway, more cogs and more memory is something he needs for a project, so he is going to build his non-prop propforth engine in any case. He asked me to check with you guys if you had any recommendations. We'll discuss the list above next time.
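The capacity argument above reduces to simple arithmetic. The sketch below is not PropFORTH's multitasker, just an illustration of the fractions involved; the function name is hypothetical, and it assumes every cog contributes an equal share of "execution opportunities".

```python
from fractions import Fraction


def remaining_capacity(cogs, stalled):
    """Share of execution opportunities left when `stalled` cogs are tied up."""
    if not 0 <= stalled <= cogs:
        raise ValueError("stalled must be between 0 and the number of cogs")
    return Fraction(cogs - stalled, cogs)


print(remaining_capacity(8, 1))   # physical prop, one delayed task: 7/8
print(remaining_capacity(16, 6))  # 16-cog FPGA prop, six cogs lost: 5/8
```

With 16 cogs, losing six still leaves ten cogs running, which is more than the whole physical prop provides.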
Thanks all!
Forth on the Propeller could do with some cooperation, synergy and focus towards applications rather than towards the language, which is itself driven by the applications. There was a time when I used PropForth and published a lot on it, but because of this isolationism and PR channel no progress could really be made in improving the language, which I found surprisingly far too slow, amongst other things. There was no proper source code, as there were all these mysterious base64 machine code definitions and Spinmaker etc, so I wrote a neat decompiler to help me see what was actually going on. From there I concluded that, although reluctant, I would have to write a Forth myself, hence I created Tachyon, which turned out to be even more compact and very fast. There were also minor annoyances with PropForth, such as entering numbers as h1234 yet not allowing the more recognizable $1234 or 1234h etc, that could have been very easily addressed.
The proof is in the pudding so they say but the desire is not to create Yet Another Fantastic Forth but to get a job done as efficiently and effectively as possible while not missing out on having fun of course. Sal's Forth was the first proper native Forth for the Propeller chip and I am grateful for that.
Sure. You must have missed the Google Code site. It's been linked from the propforth threads for the past couple of years. You downloaded propforth from there. It's archived, but still out there. Tons of drivers, code snippets, explanations, instructions, and the littlerobot project to teach elementary school kids to build and program scratch-built robots. The index had a couple of pages of wiki pages I wrote; I think there were 50 or 100 entries per index page(?). There were only just over 1000 downloads of PF5.5 when Google Code shut down, so there never was such a big audience. I guess not so many folks "get" forth, or maybe my documentation skills are crappier than I imagined. I don't know how many folks cloned PF5.5 from github. So, yeah, I don't do much, and what I do is not always of the highest quality.
Possibly, but Tachyon and propforth are different tools for different purposes. Sal shared his code (and test automation and scripts), and you took it in your own direction, great, cooperation and synergy. Sal shouldn't be expected to change the way he works to fit your tool, he already has a tool that is complete and stable and is designed to do exactly what he wants.
Anyway, the conversation seems to have strayed from FPGA recommendations to FORTH. I'll post if we find anything wonderful. Likely we'll have to just use whatever is on the bench free.