CMM in hardware?
David Betz
Posts: 14,516
I know it's crazy but I've been wondering how practical it would be to implement the PropGCC CMM engine in hardware. Mostly, it unpacks a byte stream into PASM instructions that it executes one at a time. The idea would be to fetch a long at a time from hub memory and step through bytes decoding CMM instructions and building a PASM instruction which would be executed as soon as it was complete. I've attached a document describing the PropGCC CMM instruction set. Tell me if I'm insane! :-)
CMM.txt
By the way, the advantage of this approach is that there is already a PropGCC code generator for this instruction set.
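To make the fetch side of the idea concrete, here is a minimal C model of "fetch a long at a time from hub memory and step through bytes". It is only a sketch: the struct and function names (`cmm_fetch_t`, `cmm_fetch_init`, `cmm_next_byte`) are made up for illustration, and it covers just the byte streamer, not the CMM decoding described in CMM.txt. The only hardware fact it relies on is that Propeller hub memory is little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the fetch stage; none of these names come from PropGCC. */
typedef struct {
    const uint32_t *hub;    /* hub memory viewed as an array of longs */
    size_t pc;              /* byte address of the next CMM byte */
    uint32_t latch;         /* the long most recently read from hub */
    size_t latched_long;    /* index of the long currently held in latch */
    int valid;              /* nonzero once latch holds real data */
    unsigned hub_reads;     /* number of hub accesses performed so far */
} cmm_fetch_t;

static void cmm_fetch_init(cmm_fetch_t *f, const uint32_t *hub, size_t start_pc)
{
    assert(f != NULL && hub != NULL);
    f->hub = hub;
    f->pc = start_pc;
    f->latch = 0;
    f->latched_long = 0;
    f->valid = 0;
    f->hub_reads = 0;
}

/* Return the next byte of the stream, touching hub memory only when the
   program counter moves into a long that is not already latched. */
static uint8_t cmm_next_byte(cmm_fetch_t *f)
{
    size_t idx = f->pc >> 2;            /* which long holds this byte */
    if (!f->valid || idx != f->latched_long) {
        f->latch = f->hub[idx];         /* one hub read per long */
        f->latched_long = idx;
        f->valid = 1;
        f->hub_reads++;
    }
    /* Little-endian: byte 0 is the least significant byte of the long. */
    uint8_t b = (uint8_t)(f->latch >> (8 * (f->pc & 3)));
    f->pc++;
    return b;
}
```

A decoder would call `cmm_next_byte` repeatedly, build up one PASM instruction, and hand it to the execute stage as soon as it is complete. The `hub_reads` counter is there only to show that sequential code costs at most one hub access per four bytes.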
Comments
You're insane
As indeed most of us are here
Now, let me go look at what you posted.
I second that, a good hack. Probably no problem at all adding it.
Just make sure to implement simulation so you can try your core within the development environment. It will save you time and be good to have a test-platform later.
Good luck!
The CMM instruction set is closely related to Propeller PASM, but CMM code is up to 65% smaller than comparable LMM* code. It's like ARM's Thumb mode, except that most of the instructions get decoded into PASM by a virtual machine running in a COG, which is relatively slow.
The idea seems to be to get rid of the PASM virtual machine by changing the COG (as an experiment for a CMM-native Propeller) so that all the time-consuming decoding gets done in the COG core hardware (for the set of instructions that are used most often). I don't think any optimizations for HUB window stalls have been considered, and they probably don't need to be, at least in a first cut.
Advantages:
1. Higher code density over LMM*
2. Speed on par with LMM* for non-fcached code, second only to COG PASM speed
3. CMM allows block read/execute of up to 64 PASM longs at COG speed
4. Getting rid of the VM saves global HUB RAM space.
5. A proven compiler that generates CMM code already exists
Disadvantages:
1. Bigger COGs?
2. Development time.
*Notes: LMM was suggested by Bill Henning as a way to read and execute PASM code from HUB RAM one long at a time. Fcache is a larger fill-and-execute mechanism Bill mentioned as a higher-performance overlay method. The only problems with LMM are the macros needed to do certain things and the fetch window stall problem. The macros used for LMM have been considered for a HUB-EXEC mechanism, and auto-incrementing index variants of RDLONG have been considered to fix the fetch window stall problem.
That's two more than me
Not insane, the idea has strong merit.
The PropGCC code generator is a serious plus.
It probably needs a pause/enable signal added to the Prop opcode fetch, so it can idle clocks until the CMM reader is ready. (I think such a signal is already needed for the WAIT opcodes.)
When this works from HUB, it should be possible to expand it to stream from a QuadSPI device.
I guess the problem is that the resulting COG might be more than double the size of the current COG, so you'd need a much bigger FPGA to get a P1+CMM with 8 COGs.