Expanding the P1V's instruction set (A Poor Man's P2)
ozpropdev
Hi All
Here's an experiment in expanding the P1 instruction set with only a slight change to the architecture.
The expansion was achieved by re-purposing the "IF_NEVER" condition to select an alternative function.
This means any standard propeller tool can be used to compile the new code.
The only PASM instruction that needs changing would be a NOP which simply can be replaced with
something like OR 0,0.
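
For anyone following along, here is a minimal Python sketch of the idea, not the actual Verilog. It assumes the standard P1 instruction word layout (6-bit opcode, ZCRI flags, 4-bit condition, 9-bit destination, 9-bit source). The function names are made up for illustration, and the OR opcode value is quoted from the P1 manual from memory, so check it before relying on it.

```python
# P1 PASM instruction word layout (32 bits):
#   31..26 opcode | 25 Z | 24 C | 23 R | 22 I | 21..18 condition | 17..9 dest | 8..0 source
IF_NEVER = 0b0000
IF_ALWAYS = 0b1111
OPCODE_OR = 0b011010  # assumption: OR opcode as listed in the P1 manual


def encode(opcode, zcri, cond, dest, src):
    """Pack the instruction fields into one 32-bit long."""
    return (((opcode & 0x3F) << 26) | ((zcri & 0xF) << 22) |
            ((cond & 0xF) << 18) | ((dest & 0x1FF) << 9) | (src & 0x1FF))


def decode(word):
    """Split a 32-bit long into its instruction fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "zcri": (word >> 22) & 0xF,
        "cond": (word >> 18) & 0xF,
        "dest": (word >> 9) & 0x1FF,
        "src": word & 0x1FF,
    }


def classify(word):
    """Hypothetical decoder rule: IF_NEVER selects the alternative function."""
    return "alternate" if decode(word)["cond"] == IF_NEVER else "standard"


nop = 0x00000000                                       # the usual NOP encoding
or_0_0 = encode(OPCODE_OR, 0b0010, IF_ALWAYS, 0, 0)    # OR 0,0 with R set

print(classify(nop))      # alternate
print(classify(or_0_0))   # standard
print(hex(or_0_0))        # 0x68bc0000
```

This also shows why NOP has to change: the all-zero long has a condition field of 0000, so under this scheme it would land in the alternative instruction space.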
The example code adds 16 instructions to the existing instruction set. These instructions are simply
some experimental ones I conjured as part of my adventures in Verilog. Included is a PASM test program
and some notes on the added instructions.
Code was tested on a DE0-nano @ 80MHz clock. I imagine adding even more instructions may cause
some fmax drama with so many muxes. Let's see what happens... maybe adding some finite state machines
might be next on the list. This FPGA stuff is addictive.
Cheers
Brian
Comments
Will look at your code later.
Sounds like a great direction, as MORE opcodes (but keeping binary compatible) is an important way to extend P1V.
Some can be quite low in logic cost.
For example, WAIT could be expanded to include edge sense (not just HI/LO), and a 'paired' opcode could let WAITCNT delay one opcode and run in parallel with the next opcode (typically WAIT Hi/Lo/Edge). That would allow compact timeouts, still with HW granularity.
Opcodes that support the inner-most loops of byte-code languages would also boost performance.
It seems that no two implementations of P1V are the same. We all have different hardware and different ideas/applications for them.
Some have video removed, fewer cogs, 64+ IO pins, video added, etc.
As MUL, MULS, ENC and ONES are non-standard P1 opcodes, their allocation seems to be available for "whatever".
The plan was to make the COGS as close to 100% compatible as possible. This method effectively doubles the available opcodes.
Making some custom "SPIN" instructions may also make it possible to shrink its size to allow some headroom for SPR expansion.
It will be interesting to see what gains can be made.
Cheers
Brian
Don't forget that the SYSOP instruction could be expanded into more opcodes.
I would even go as far as saying we could make this instruction binary incompatible with P1 if necessary ???
I like the idea of using PAR & CNT as special write registers.
There wasn't much use of them in P1 AFAIK. I used them in my spin/pasm debugger, together with INA & INB.
However, we likely want to keep INA, INB, DIRA & DIRB for masks.
It is a worthwhile exercise just to see how much logic (i.e. die space) is consumed by all these extra instructions. We don't want to make a "hot" P2 though!
SUBS & CMPS could share the same opcode. IIRC the C flag is different between these opcodes. This would mean SUBS NR would be lost.
SUBSX & CMPSX - same applies as to SUBS & CMPS.
DJNZ, TJNZ & TJZ could be combined into 2 opcodes also giving the opportunity for one of DJZ or IJZ or IJNZ, with the loss of DJNZ NR.
Alternately, the WR versions of TJNZ & TJZ could be used as other new instructions.
Of course, some of these possibilities remove binary compatibility.
And while I think of it, the P2 implemented the shift and rotate flags differently than on the P1. IIRC the Z & C flags were set by the resultant values, not the source values. This was considered better.
We have mul and muls done, doing what they are supposed to do and supported by proptool, so let mul do multiply.
I don't know what ENC and ONES are supposed to do, but let them do what Chip wanted them to do, and let's implement that.
Then we have all the rest: if_never, sysop, and whatever else. Let them do whatever we can think of until we create some standard for what we need and where to implement it.
And there are some things which cost too much. I tried to implement div. It works, but the cost is too high: 2500 LEs/cog. So maybe it would be better to have some addresses, maybe in the hub ROM space, to implement a coprocessor. Put the data and instruction there, then get the result. Some operations like div, mod and a floating point library could be done this way.
The way to accomplish this is to require P1+ programs to take special action to make the alternative instructions go into effect. Such an action should be something that normal functioning P1 code would never do. For example, it could set bit 31 of VCFG (on the P1, this should normally remain 0). This would be a cog-local change so if one cog does it, the other cogs would not be affected. This is important so that any other cogs running the Spin interpreter won't be affected.
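
A rough Python model of that opt-in rule, just to make the behaviour concrete. The class and method names are hypothetical, and bit 31 of VCFG is only the example flag suggested above, not an agreed standard.

```python
VCFG_EXT_BIT = 1 << 31  # assumption: bit 31 of VCFG used as the opt-in flag


class Cog:
    """Hypothetical per-cog model; each cog has its own VCFG."""

    def __init__(self):
        self.vcfg = 0  # on a stock P1 bit 31 normally stays 0

    def dispatch(self, word):
        cond = (word >> 18) & 0xF      # condition field, bits 21..18
        if cond != 0b0000:
            return "standard"          # any real condition: normal P1 decode
        if self.vcfg & VCFG_EXT_BIT:
            return "alternate"         # this cog opted in to the new opcodes
        return "nop"                   # legacy behaviour: IF_NEVER does nothing


spin_cog = Cog()        # e.g. a cog running the Spin interpreter, never opts in
new_cog = Cog()
new_cog.vcfg |= VCFG_EXT_BIT

print(spin_cog.dispatch(0x00000000))  # nop
print(new_cog.dispatch(0x00000000))   # alternate
```

Because the flag lives in the cog's own register, setting it in one cog leaves every other cog decoding IF_NEVER as a NOP.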
Another possibility that I've been thinking of, is to create a new write-only register that's accessed at the CNT location, to turn features on or off. CNT is normally a read-only register so old software should never attempt to write to it anyway. And I'm thinking 32 bits is not going to be enough bits so maybe it should be a register bank, e.g. bit 0 determines the value that gets written to the feature register bank, and bits 1-31 are the potential address bits in a 1-bit wide register bank. This could be used to switch propeller-wide features on and off. For example, it could enable expanded hub RAM (at the price of ROM).
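
To illustrate the encoding, a small Python sketch of such a write-only feature bank, assuming exactly the split described above: bit 0 is the value, bits 1-31 are the address. The class name and the feature number are made up for the example.

```python
class FeatureBank:
    """Hypothetical 1-bit-wide register bank, written through the CNT address."""

    def __init__(self):
        self.enabled = set()  # addresses whose feature bit is currently 1

    def write_cnt(self, value):
        value &= 0xFFFFFFFF
        bit = value & 1           # bit 0: value to store
        address = value >> 1      # bits 1..31: which feature bit to change
        if bit:
            self.enabled.add(address)
        else:
            self.enabled.discard(address)

    def is_enabled(self, address):
        return address in self.enabled


EXPANDED_HUB_RAM = 5  # assumption: arbitrary feature number for illustration

bank = FeatureBank()
bank.write_cnt((EXPANDED_HUB_RAM << 1) | 1)   # switch the feature on
print(bank.is_enabled(EXPANDED_HUB_RAM))      # True
bank.write_cnt((EXPANDED_HUB_RAM << 1) | 0)   # switch it off again
print(bank.is_enabled(EXPANDED_HUB_RAM))      # False
```

One write changes one feature bit, so no read-modify-write is needed, which suits a register that cannot be read back.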
Additionally, we may be able to use some bit space in existing instructions to implement extended instructions. For example, the COGSTOP and some of the other hub instructions only use the lower 3 bits of the source, so it may be possible to modify COGSTOP to a different instruction if/whenever the top bit of the source is set to 1. However this will only be interesting for new instructions that are hub instructions and don't need the full source and destination fields.
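
A sketch of how that could decode, in Python. It assumes "top bit of the source" means bit 8 of the 9-bit source field, which is only one reading of the suggestion, and the "EXT_" names are placeholders rather than real instructions.

```python
# The eight P1 hub operations, selected by the lower 3 bits of the source.
HUB_OPS = ["CLKSET", "COGID", "COGINIT", "COGSTOP",
           "LOCKNEW", "LOCKRET", "LOCKSET", "LOCKCLR"]

EXT_FLAG = 1 << 8  # assumption: top bit of the 9-bit source field


def decode_hubop(src):
    """Return the hub operation name, or a placeholder extended name."""
    name = HUB_OPS[src & 0b111]
    if src & EXT_FLAG:
        return "EXT_" + name   # hypothetical slot for a new hub instruction
    return name


print(decode_hubop(3))             # COGSTOP
print(decode_hubop(EXT_FLAG | 3))  # EXT_COGSTOP
```

That gives up to eight extra hub-type opcodes per flag bit without touching the condition field.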
The point is, instructions with IF_NEVER may not be used often, but they ARE valid instructions and there ARE situations where existing P1 code may execute them (and expect them to be NOPs). Replacing those instructions with others and disregarding those potential compatibility issues would not be a good idea, in my humble opinion.
===Jac
In order to continue with testing/development it is necessary to implement some sort of extension of opcodes.
We only had 4 spare opcodes but we now have Willy, Cluso and Rogloh's new instruction implementations already!
IF_NEVER was used for a couple of reasons. Firstly, it was very simple to implement. Secondly, it also
keeps the PASM code small, as the instructions are still 1 LONG in size. Preceding "special" flag bits are not required.
The condition bits were also used with success in the P2. Some instructions there used the
condition bits to expand opcode depth. This meant that it was illegal to use conditions on some instructions.
It was never my intention for this technique to be "implemented" in silicon, merely as a "quick and easy" approach
to the problem. The same applies to the sample instructions in the code. Once again, these are test instructions, simply a
learning exercise in Verilog. It's a little difficult to add features without breaking 100% compatibility somewhere.
BTW the code also uses a shadow register at the CNT address to pass additional 32-bit data to some new instructions.
Anyhow...did I mention the "crazy" other stuff I did with the ZC and R bits!
Cheers
Brian