MCL86 & now MCL51 micro-sequencer FPGA cores. 308 & 312 LUT
jmg
Posts: 15,171
in Propeller 2
First, in February there was this:
http://www.embedded.com/electronics-blogs/max-unleashed-and-unfettered/4441454/Only-308-FPGA-LUTs-required-to-create-cycle-accurate-8088-8086-soft-processor-core
They quietly omit mentioning how large the ROM is, but they do say
"The result is the MCL86, which is basically a 7-instruction, 32-bit micro-sequencer. Some of the micro-sequencer's instructions are specialized so as to allow it to rapidly decode instructions as well as nest function calls. With these seven instructions, I was able to microcode all of the 8086 opcodes in a relatively small number of micro-sequencer clocks."
Addit:
I see comments saying the core needs 100MHz to give 4.77MHz emulation, and another note giving a 180MHz spec on a Kintex-7 FPGA, which indicates 8.586 MHz top speed (and a ~21:1 ratio of microcode MHz to CPU MHz).
Updated to mention 4 Blocks for ROM, which I think means ~18k Bytes, so quite a bit of 'microcode'?
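A quick check of those clock numbers, as a sketch only. The variable names are made up for illustration, and it assumes the emulated speed scales linearly with the microcode clock, which the article does not state outright.

```python
# Figures quoted in the comments: 100 MHz microcode clock gives 4.77 MHz emulation
microcode_mhz = 100.0
emulated_mhz = 4.77

# Microcode clocks spent per emulated 8088 clock
ratio = microcode_mhz / emulated_mhz

# Assumption: emulated speed scales linearly with the microcode clock
kintex7_mhz = 180.0
top_speed_mhz = kintex7_mhz / ratio

print(f"microcode:CPU ratio ~ {ratio:.2f}:1")
print(f"top emulated speed ~ {top_speed_mhz:.3f} MHz")
```

That comes out at roughly 21:1 and about 8.586 MHz, which is where the figures above come from.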
and now, there is this
http://www.embedded.com/electronics-blogs/max-unleashed-and-unfettered/4442147/Micro-sequencer-based-8051-core-requires-only-312-FPGA-LUTs
Almost exactly the same LUT count 308 vs 312 for MCL86 vs MCL51, no ROM size mentioned yet.
This says "8-bit controller running at clock speeds from DC to a few megahertz"
"a small demo that instantiates four of these cores to create a quad-core 8051 (just when you think you've heard of everything, right?). It was easy to do, but I thought you might find it interesting that this four-core 8051 actually consumes less logic than a single 8051 core from the major core vendors. This is not surprising as they are logic-based and the MCL51 is micro-sequencer-based. A single MCL51 core is around 312 LUTs, which is a fifth of the size of 8051 cores from these vendors."
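The LUT claim in that quote can be checked with simple arithmetic. This is a rough sketch that reads "a fifth of the size" literally, so the vendor-core figure is an implied estimate and not a number from the article; the names are illustrative.

```python
MCL51_LUTS = 312  # per-core figure quoted in the article

# Assumption: "a fifth of the size" taken literally, so a vendor core is ~5x larger
vendor_core_luts = 5 * MCL51_LUTS

# Four MCL51 cores side by side, ignoring any shared glue logic
quad_mcl51_luts = 4 * MCL51_LUTS

print("quad MCL51:", quad_mcl51_luts, "LUTs")
print("implied vendor 8051:", vendor_core_luts, "LUTs")
print("quad smaller than one vendor core:", quad_mcl51_luts < vendor_core_luts)
```

Note this counts LUTs only; the block RAM holding the microcode for each core is not included.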
An MCL51 core is probably all you need to configure and service a Smart Pin.
Suppose a design was done with 12 P2 COGs and 4 MCL51 COGs, how much more RAM would fit in the space freed by removing the larger COGs?
Comments
I also came across this:-
http://www.mouser.com/ds/2/778/aloriumtechnology_XLR8%20Product%20Brief-876414.pdf
It seems to emulate an ATmega328 inside the MAX10 FPGA. If I'm reading that right, it should work with existing tools.
It'd be kind of neat to be able to grab a board, and just load on your favourite micro, such as an ATmega328 or a 3 cog Prop 1 verilog...
It seems to get vague around how you combine the Core and Verilog Xcelerator Blocks ?
They would have to release source, in order to have the tools compile both user XB and the AVR-clone-core ?
Also nothing around Debug support ?
Aside from that, quite a nice MAX10 usage.
Yes, or 3 COG P1V with (say) 4 Cores/cogs of MCL51, and maybe some P2 Smart Pin Cells ...
I saw your postings and thought I could help answer a few questions.
Yes, the MCL86 uses 16KB of ROM for the microcode, which consumes four Xilinx 7-series block RAMs. The 8086 is a 16-bit processor that has many addressing modes and multi-byte instructions, which took a lot of code to emulate. And because the MCL86 core can be cycle accurate to the original CPU, a small amount of code used to load a cycle counter is needed for each x86 instruction. It's all these little things that added up to the 16KB of microcode!
The MCL51 is an 8-bit core that needs only 1KB for the microcode since the instruction set is simpler and there is no cycle accuracy requirement. This would just consume a single Xilinx 7-series block RAM.
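The block RAM counts can be sanity-checked with a small sketch. It assumes each 7-series block RAM is 36 Kbit, of which 32 Kbit (4 KB) holds data when the parity bits are not used for storage; the helper name `brams_needed` is hypothetical.

```python
import math

# Assumption: 36 Kbit per block RAM, 32 Kbit (4 KB) of it usable as plain data
BRAM_DATA_BYTES = 32 * 1024 // 8    # 4096 bytes
BRAM_TOTAL_BYTES = 36 * 1024 // 8   # 4608 bytes including the parity bits

def brams_needed(rom_bytes: int) -> int:
    """Number of block RAMs needed to hold rom_bytes of microcode."""
    return math.ceil(rom_bytes / BRAM_DATA_BYTES)

mcl86_brams = brams_needed(16 * 1024)  # 16 KB of MCL86 microcode
mcl51_brams = brams_needed(1 * 1024)   # 1 KB of MCL51 microcode

# Raw capacity of four blocks if the parity bits are counted too
four_blocks_with_parity_kb = 4 * BRAM_TOTAL_BYTES / 1024

print(mcl86_brams, mcl51_brams, four_blocks_with_parity_kb)
```

Under those assumptions the MCL86 needs four blocks and the MCL51 one, and four blocks counted with parity bits give 18 KB, which would explain the earlier ~18k Bytes guess against the 16KB actually used.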
The goal for designing the MCL86 and MCL51 cores was to create the smallest processors possible, which frees up the most space in users' FPGAs/ASICs for their value-added logic. The embedded processor should be "out of sight", off in a small corner of the FPGA. Well, that's my opinion!
Please contact me through my website at www.MicroCoreLabs.com if I can help answer any other questions.
Thanks,
-Edward
Oh my frikken God, Heater is referenced from a Wikipedia article. See below.
Anyway, I was wondering how these micro-sequencers compare in size to the ZPU 32-bit core, which is designed to be a minimal-LUT-usage 32-bit core? https://github.com/zylin/zpu
The ZPU is small enough that I made a ZPU emulator for the Propeller that fits in 512 instructions.
Which brings me to this Wikipedia article : https://en.wikipedia.org/wiki/ZPU_(microprocessor)
Which has a link to my ZPU emulator (ZOG) in the references at the end!
https://en.wikipedia.org/wiki/ZPU
which came up when I first clicked on it.
Do you have Opcode-Cycle tables available yet for this MCL51, and MHz numbers ?
There is another thread here about www.baysand.com devices, metal-mask FPGAs.
Even if you do not make a device on their shuttle, it could be useful benchmarking to run up how many MCL86 and MCL51 cores could fit, and at what MHz ?
"The Shuttle is for the FG65L-5 device, which is a 65nm technology up to 600,000 usable gates, up to 900kbit of memory, 4 PLLs/DCMs with maximum of 242 user IOs."
The reason they care about 8051 or x86 backwards compatibility is on the software side, not the hardware.
There may be existing code they need to port, and existing tool chains, and compilers/simulators they want to leverage, and many will likely have more than one product, using more than one version or variants, especially with the 8051.
You can buy a 16k Flash 8051 these days for 28c/1k, which is likely similar in speed to, or faster than, the MCL51, so that does give a reality-check to FPGA use.
The appeal of FPGA is what else can be included at the same time.
Imagine something like MCL51, paired with P2-Smart Pin cells ?
I can see that this is a possibility, but in my experience it's really nice to port your high-level 8051 code to a clean 16-bit/32-bit architecture. If you're really concerned about 8051 compatibility then you'll potentially need to add the DPTR(s), MDU, timers/counters, and whatever other peripherals. And then the design isn't going to be as small.
BTW - I've heard that Nick Tredennick (mentioned on the Microcore labs site) worked on a high-performance 8051 at one point.
Edited to add: a sample of Nick's "flowchart method" https://www.computer.org/csdl/mags/co/1981/12/01667207.pdf
You want some high speed custom logic in your FPGA. But then you want a CPU and a bit of software to manage things in there. Because you have all that fast logic available a super fast CPU is not required. But it needs to be small to fit in a cheap FPGA.
Enter the ZPU, a simple 32 bit processor using very few LUTs. With the GCC tool chain to make it useful.
Keith, I dropped the ball on that. I went back and looked at your email from August 2014 and then used Synopsys VCS at edaplayground.com to cite the problems and test my fixes. I've got the three violating files (hub.v, cog.v, cog_ctr.v) updated. I tried to do a 'git push origin master', but it doesn't allow me. I assume I don't have permission. At the time this was set up, I didn't know about github, so Jeff Martin here made it happen. Once he gives me permission tomorrow, I'll upload the new files. Thanks for sending me that email a while ago. Sorry it took so long.
Please, can you post those changes here?
A long time ago I also did some tests on three different Verilog simulators (Icarus Verilog, CVC, and Verilator). I used the Xilinx version (from Magnus Karlsson's Pipistrello), and I also needed to edit the ROM hex files (to remove comments).
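For anyone repeating the hex-file cleanup, something like the following could do it. This is only a sketch: it assumes the comments in the ROM hex files are `//` to end of line, which the post does not confirm, and `strip_hex_comments` is a made-up helper name.

```python
def strip_hex_comments(lines):
    """Return the data lines of a hex memory file with comments removed."""
    cleaned = []
    for line in lines:
        # Assumption: comments run from '//' to the end of the line
        data = line.split("//", 1)[0].strip()
        if data:  # drop lines that were blank or comment-only
            cleaned.append(data)
    return cleaned

# Small made-up sample, not taken from the actual P1 ROM files
sample = ["// boot ROM", "0A1B2C3D  // first word", "", "FFFFFFFF"]
print(strip_hex_comments(sample))
```

To process a real file, read it with `open(path).read().splitlines()` and write the returned lines back out joined with newlines.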
For P3, I wonder if we will be able to go back again to P1 verilog after this P2 decade.
https://github.com/parallaxinc/Propeller_1_Design
>Long time ago I did also some test on three different verilog simulators (icarus verilog, CVC, and verilator)
It's been so long that I don't have the setup that I was using - Synopsys VCS. You might like to try the version of Modelsim which Altera makes available.