MCL86 & now MCL51 micro-sequencer FPGA cores. 308 & 312 LUT

jmg · 2016-06-09 05:31

First, in February there was this:

http://www.embedded.com/electronics-blogs/max-unleashed-and-unfettered/4441454/Only-308-FPGA-LUTs-required-to-create-cycle-accurate-8088-8086-soft-processor-core

They quietly omit mentioning how large the ROM is, but they do say
"The result is the MCL86, which is basically a 7-instruction, 32-bit micro-sequencer. Some of the micro-sequencer's instructions are specialized so as to allow it to rapidly decode instructions as well as nest function calls. With these seven instructions, I was able to microcode all of the 8086 opcodes in a relatively small number of micro-sequencer clocks."
Addit:
I see comments saying the core needs 100 MHz to give 4.77 MHz emulation, and another note giving a 180 MHz spec on a Kintex-7 FPGA, which indicates an 8.586 MHz top speed (and ~21:1 microcode MHz to CPU MHz).
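The arithmetic behind those two figures can be checked with a short Python sketch. It assumes the emulated speed scales linearly with the micro-sequencer clock, which the article does not state outright; the function and constant names are illustrative only.

```python
# Figures quoted in the post above. Linear scaling between sequencer clock
# and emulated CPU clock is an assumption, not something the article confirms.
SEQUENCER_MHZ_REFERENCE = 100.0  # sequencer clock said to give 4.77 MHz emulation
EMULATED_MHZ_REFERENCE = 4.77    # original IBM PC 8088 clock
KINTEX7_SEQUENCER_MHZ = 180.0    # quoted Kintex-7 spec


def sequencer_clocks_per_cpu_clock(sequencer_mhz: float, emulated_mhz: float) -> float:
    """Return how many sequencer clocks are spent per emulated CPU clock."""
    return sequencer_mhz / emulated_mhz


def emulated_speed_mhz(sequencer_mhz: float, ratio: float) -> float:
    """Return the emulated CPU clock for a given sequencer clock and ratio."""
    return sequencer_mhz / ratio


ratio = sequencer_clocks_per_cpu_clock(SEQUENCER_MHZ_REFERENCE, EMULATED_MHZ_REFERENCE)
top_speed = emulated_speed_mhz(KINTEX7_SEQUENCER_MHZ, ratio)

print(f"ratio ~{ratio:.1f}:1, emulated top speed ~{top_speed:.3f} MHz")
```

Under that assumption the ratio comes out at roughly 21:1 and the top speed at about 8.586 MHz, matching the numbers above.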

Updated to mention 4 block RAMs for the ROM, which I think means ~18k bytes, so quite a bit of 'microcode'?

and now, there is this
http://www.embedded.com/electronics-blogs/max-unleashed-and-unfettered/4442147/Micro-sequencer-based-8051-core-requires-only-312-FPGA-LUTs

Almost exactly the same LUT count 308 vs 312 for MCL86 vs MCL51, no ROM size mentioned yet.

This says "8-bit controller running at clock speeds from DC to a few megahertz"

"a small demo that instantiates four of these cores to create a quad-core 8051 (just when you think you've heard of everything, right?). It was easy to do, but I thought you might find it interesting that this four-core 8051 actually consumes less logic than a single 8051 core from the major core vendors. This is not surprising as they are logic-based and the MCL51 is micro-sequencer-based. A single MCL51 core is around 312 LUTs, which is a fifth of the size of 8051 cores from these vendors."
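As a rough check of the quoted claim, here is a small Python sketch of the LUT arithmetic. It reads "a fifth of the size" as exactly 5x, which is an approximation, and it counts LUTs only; block RAM for the microcode ROM is not included because the article gives no figure for it here.

```python
# LUT-only comparison based on the quote above.
MCL51_LUTS = 312        # single MCL51 core, from the article
VENDOR_SIZE_FACTOR = 5  # assumption: "a fifth of the size" taken as exactly 5x

quad_core_luts = 4 * MCL51_LUTS
vendor_core_luts = VENDOR_SIZE_FACTOR * MCL51_LUTS

print(f"quad-core MCL51: {quad_core_luts} LUTs")
print(f"single vendor 8051 (estimated): {vendor_core_luts} LUTs")
print(f"quad-core is smaller: {quad_core_luts < vendor_core_luts}")
```

On these numbers the quad-core design is about 1248 LUTs against an estimated 1560 for one vendor core, which is consistent with the quote.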

An MCL51 core is probably all you need to configure and service a Smart Pin.

Suppose a design was done with 12 P2 COGS and 4 MCL51 COGS, how much more RAM would fit in the space freed by removing the larger COGs ?

Tubular · 2016-06-09 05:41

Interesting, thanks for the update

I also came across this:-
http://www.mouser.com/ds/2/778/aloriumtechnology_XLR8%20Product%20Brief-876414.pdf
seems to emulate an ATMEGA328 inside the max10 FPGA. If I'm reading that right, it should work with existing tools.

It'd be kind of neat to be able to grab a board, and just load on your favourite micro, such as an ATmega328 or a 3 cog Prop 1 verilog...

jmg · 2016-06-09 06:15

Tubular wrote: »

I also came across this:-
http://www.mouser.com/ds/2/778/aloriumtechnology_XLR8%20Product%20Brief-876414.pdf
seems to emulate an ATMEGA328 inside the max10 FPGA. If I'm reading that right, it should work with existing tools.

It seems to get vague around how you combine the Core and Verilog Xcelerator Blocks ?
They would have to release source, in order to have the tools compile both user XB and the AVR-clone-core ?
Also nothing around Debug support ?

Aside from that, quite a nice MAX10 usage.

Tubular wrote: »

It'd be kind of neat to be able to grab a board, and just load on your favourite micro, such as an ATmega328 or a 3 cog Prop 1 verilog...

Yes, or 3 COG P1V with (say) 4 Cores/cogs of MCL51, and maybe some P2 Smart Pin Cells ...

MicroCoreLabs · 2016-08-10 16:20

Hi,

I saw your postings and thought I could help answer a few questions.

Yes, the MCL86 uses 16KB of ROM for the microcode which consumes four Xilinx 7-series block RAMS. The 8086 is a 16-bit processor that has many addressing modes and multi-byte instructions which took a lot of code to emulate. And because the MCL86 core can be cycle accurate to the original CPU, a small amount of code used to load a cycle counter is needed for each x86 instruction. It's all these little things that added up to the 16KB of microcode!

The MCL51 is an 8-bit core that needs only 1KB for the microcode since the instruction set is simpler and there is no cycle accuracy requirement. This would just consume a single Xilinx 7-series block RAM.
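The block RAM counts can be sanity-checked with a short Python sketch. It assumes the standard Xilinx 7-series block RAM size of 36 Kb, of which 32 Kb are data bits and 4 Kb are parity bits; whether the microcode ROM uses the parity bits is not stated in this thread, so the sketch counts data bits only. The names are illustrative.

```python
import math

# Assumed Xilinx 7-series block RAM geometry (36 Kb total, 32 Kb of data bits).
BRAM_TOTAL_BITS = 36 * 1024
BRAM_DATA_BITS = 32 * 1024


def block_rams_needed(rom_bytes: int) -> int:
    """Return the number of block RAMs needed to hold a ROM, using data bits only."""
    return math.ceil(rom_bytes * 8 / BRAM_DATA_BITS)


mcl86_brams = block_rams_needed(16 * 1024)  # 16KB of MCL86 microcode
mcl51_brams = block_rams_needed(1 * 1024)   # 1KB of MCL51 microcode

# Raw capacity of four blocks with parity bits included, in KB.
four_block_total_kbytes = 4 * BRAM_TOTAL_BITS / 8 / 1024

print(f"MCL86: {mcl86_brams} block RAMs, MCL51: {mcl51_brams} block RAM")
print(f"four blocks hold {four_block_total_kbytes:.0f} KB including parity")
```

Under these assumptions, 16KB fills exactly four blocks and 1KB fits in one. The ~18k bytes guessed earlier in the thread corresponds to the raw capacity of four blocks with parity bits included.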

The goal for designing the MCL86 and MCL51 cores was to create the smallest processors possible, which frees up the most space in users' FPGAs/ASICs for their value-added logic. The embedded processor should be "out of sight", off in a small corner of the FPGA. Well, that's my opinion!

Please contact me through my website at www.MicroCoreLabs.com if I can help answer any other questions.

Thanks,
-Edward

Heater. · 2016-08-10 16:41

I came here to make a comment but thought I'd do a bit of research first and..

Oh my frikken God, Heater is referenced from a Wikipedia article. See below.

Anyway, I was wondering how these micro-sequencers compare in size to the ZPU, which is designed to be a 32-bit core with minimal LUT usage? https://github.com/zylin/zpu

The ZPU is small enough that I made a ZPU emulator for the Propeller that fits in 512 instructions.

Which brings me to this Wikipedia article : https://en.wikipedia.org/wiki/ZPU_(microprocessor)
Which has a link to my ZPU emulator (ZOG) in the references at the end!

Publison · 2016-08-10 16:46

You have been famous for a number of years. Now you have the proof!

Heater. · 2016-08-10 16:58

Yeah, as you see the forum software mangles that URL. Took me a while to notice and fix it.

Publison · 2016-08-10 17:02

Yeah, I tried fixing the link, it's weird with the parens. I didn't think you were involved with:

https://en.wikipedia.org/wiki/ZPU

which came up when I first clicked on it.

KeithE · 2016-08-10 18:57

@Heater - I wonder about that as well. I think that as long as a processor is small, royalty free, and has free C development tools then it could get some interest. I don't know that many engineers care about 8051 or x86 backwards compatibility in the context of an FPGA or custom chip design. Also having a processor implementation that can be targeted to an FPGA or ASIC would be a plus. And it should run in a Verilog/VHDL simulator. This is a place where the propeller Verilog strangely fails unless someone has made edits. (I did a long time ago, and submitted the simple edits to Chip, but he doesn't care about such things.)

jmg · 2016-08-10 21:27

MicroCoreLabs wrote: »

The MCL51 is an 8-bit core that needs only 1KB for the microcode since the instruction set is simpler and there is no cycle accuracy requirement. This would just consume a single Xilinx 7-series block RAM.

Do you have Opcode-Cycle tables available yet for this MCL51, and MHz numbers ?

There is another thread here about www.baysand.com devices, metal-mask FPGAs.
Even if you do not make a device on their shuttle, it could be useful benchmarking to run up how many MCL86 and MCL51 cores could fit, and at what MHz ?
"The Shuttle is for the FG65L-5 device, which is a 65nm technology up to 600,000 usable gates, up to 900kbit of memory, 4 PLLs/DCMs with maximum of 242 user IOs."

jmg · 2016-08-10 21:37

KeithE wrote: »

I don't know that many engineers care about 8051 or x86 backwards compatibility in the context of an FPGA or custom chip design.

The reason they care about 8051 or x86 backwards compatibility is on the software side, not hardware.

There may be existing code they need to port, and existing tool chains, and compilers/simulators they want to leverage, and many will likely have more than one product, using more than one version or variants, especially with the 8051.
You can buy a 16k Flash 8051 these days for 28c/1k, which is likely similar or faster than MCL51, so that does give a reality-check to FPGA use.
The appeal of FPGA is what else can be included at the same time.

Imagine something like MCL51, paired with P2-Smart Pin cells ?

KeithE · 2016-08-10 22:18

jmg wrote: »

The reason they care about 8051 or x86 backwards compatibility is on the software side, not hardware.

I can see that this is a possibility, but in my experience it's really nice to port your high-level 8051 code to a clean 16-bit/32-bit architecture. If you're really concerned about 8051 compatibility then you'll potentially need to add the DPTR(s), MDU, timers/counters, and whatever other peripherals. And then the design isn't going to be as small.

BTW - I've heard that Nick Tredennick (mentioned on the MicroCoreLabs site) worked on a high-performance 8051 at one point.

Edited to add: a sample of Nick's "flowchart method" https://www.computer.org/csdl/mags/co/1981/12/01667207.pdf

Heater. · 2016-08-11 00:52

That is what the ZPU is all about.

You want some high speed custom logic in your FPGA. But then you want a CPU and a bit of software to manage things in there. Because you have all that fast logic available a super fast CPU is not required. But it needs to be small to fit in a cheap FPGA.

Enter the ZPU, a simple 32 bit processor using very few LUTs. With the GCC tool chain to make it useful.

cgracey · 2016-08-11 07:15

KeithE wrote: »

...And it should run in a Verilog/VHDL simulator. This is a place where the propeller Verilog strangely fails unless someone has made edits. (I did a long time ago, and submitted the simple edits to Chip, but he doesn't care about such things.)

Keith, I dropped the ball on that. I went back and looked at your email from August 2014 and then used Synopsys VCS at edaplayground.com to cite the problems and test my fixes. I've got the three violating files (hub.v, cog.v, cog_ctr.v) updated. I tried to do a 'git push origin master', but it doesn't allow me. I assume I don't have permission. At the time this was set up, I didn't know about GitHub, so Jeff Martin here made it happen. Once he gives me permission tomorrow, I'll upload the new files. Thanks for sending me that email a while ago. Sorry it took so long.

Ale · 2016-08-11 12:55

It would be great to have a peek at the MCL86 code!... but nothing is available

KeithE · 2016-08-11 15:17

Chip - cool! Hopefully there will be some forum users interested in pursuing that. I'm not sure how P2 development is done with Treehouse, but I could see them wanting to run simulations on the P2 once certain transformations are done. A lot is done with formal equivalence checks, but simulations are important too. Strangely I found that gate level simulations were really important for low power designs given the flow used, but hopefully that's not relevant for you.

Ramon · 2016-08-12 14:18

KeithE wrote: »

And it should run in a Verilog/VHDL simulator. This is a place where the propeller Verilog strangely fails unless someone has made edits. (I did a long time ago, and submitted the simple edits to Chip, but he doesn't care about such things.)

Please, can you post here those changes?

A long time ago I also ran some tests on three different Verilog simulators (Icarus Verilog, CVC, and Verilator). I used the Xilinx version (from Magnus Karlsson's Pipistrello), and I also needed to edit the ROM hex files (to remove comments).

For P3, I wonder if we will be able to go back again to P1 verilog after this P2 decade.

KeithE · 2016-08-12 15:05

Ramon wrote: »

Please, can you post here those changes?

It looks like they have been posted on GitHub:

https://github.com/parallaxinc/Propeller_1_Design

>Long time ago I did also some test on three different verilog simulators (icarus verilog, CVC, and verilator)

It's been so long that I don't have the setup that I was using - Synopsys VCS. You might like to try the version of Modelsim which Altera makes available.
