Expanding the P1V's instruction set (A Poor man's P2)

ozpropdev · 2014-10-10 06:43

Hi All

Here's an experiment in expanding the P1 instruction set with only a slight change to the architecture.
The expansion was achieved by re-purposing the "IF_NEVER" condition to select an alternative function.
This means any standard propeller tool can be used to compile the new code.
The only PASM instruction that needs changing would be a NOP which simply can be replaced with
something like OR 0,0.
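
Why only NOP is affected can be seen from the encoding. A P1 instruction is one 32-bit long: 6-bit opcode, four ZCRI bits, a 4-bit condition, then 9-bit destination and source fields. NOP assembles to all zeros, so its condition field is IF_NEVER (%0000). The Python sketch below models that decode. The helper name `uses_alt_set` is hypothetical, and the rule that IF_NEVER simply selects a second opcode table is an assumption about the attached Verilog, not something read out of it.

```python
# Minimal model of the P1 instruction fields (bit positions per the P1 encoding).
def decode(instr):
    return {
        "opcode": (instr >> 26) & 0x3F,   # bits 31..26
        "zcri":   (instr >> 22) & 0xF,    # bits 25..22 (Z, C, R, I)
        "cond":   (instr >> 18) & 0xF,    # bits 21..18
        "dest":   (instr >> 9) & 0x1FF,   # bits 17..9
        "src":    instr & 0x1FF,          # bits 8..0
    }

IF_NEVER = 0b0000
IF_ALWAYS = 0b1111

def uses_alt_set(instr):
    # ASSUMPTION: on the modified core, IF_NEVER selects the alternate opcode table.
    return decode(instr)["cond"] == IF_NEVER

NOP = 0x00000000      # stock NOP: every field zero, so cond == IF_NEVER
OR_0_0 = 0x68BC0000   # or 0,0: opcode %011010, R bit set, IF_ALWAYS

print(uses_alt_set(NOP))      # the old NOP would now hit the alternate table
print(uses_alt_set(OR_0_0))   # the replacement stays in the standard table
```

OR 0,0 with no WZ/WC writes a register back to itself unchanged, so it behaves as a do-nothing instruction while keeping a non-zero condition field.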

The example code adds 16 instructions to the existing instruction set. These instructions are simply
some experimental ones I conjured as part of my adventures in Verilog. Included is a PASM test program
and some notes on the added instructions.

Code was tested on a DE0-nano @ 80MHz clock. I imagine adding even more instructions may cause
some fmax drama with so many muxes. Let's see what happens... maybe adding some finite state machines
might be next on the list. This FPGA stuff is addictive.

Cheers
Brian

pik33 · 2014-10-10 08:09

Looks good.. but... don't use mul for something other than multiply

Cluso99 · 2014-10-10 08:13

Interesting idea. There may be other ways to get some extra instructions, depending on what you want.
Will look at your code later.

jmg · 2014-10-10 16:09

ozpropdev wrote: »

Hi All

Here's an experiment in expanding the P1 instruction set with only a slight change to the architecture.
The expansion was achieved by re-purposing the "IF_NEVER" condition to select an alternative function.
...

The example code adds 16 instructions to the existing instruction set.

Sounds like a great direction, as MORE opcodes (but keeping binary compatible) is an important way to extend P1V.

Some can be quite low Logic cost -
examples like WAIT expanded to include edge sense (not just HI LO), and a 'paired' opcode that allows WAITCNT to delay one opcode, and run in parallel with the next opcode (typically WAIT Hi/Lo/Edge), would allow compact timeouts still with HW granularity.
Opcodes that support the inner-most loops of byte-code languages would also boost performance.

ozpropdev · 2014-10-10 17:01

pik33 wrote: »

Looks good.. but... don't use mul for something other than multiply

It seems that no two implementations of P1V are the same. We all have different hardware and different ideas/applications for them.
Some have video removed, less cogs, 64+ IO pins, video added, etc.
As MUL,MULS,ENC and ONES are non standard P1 opcodes their allocation seems to be available for "whatever".

jmg wrote: »

Opcodes that support the inner-most loops of byte-code languages would also boost performance.

The plan was to make the COGS as close to 100% compatible as possible. This method effectively doubles the available opcodes.
Making some custom "SPIN" instructions may also make it possible to shrink its size to allow some headroom for SPR expansion.
It will be interesting to see what gains can be made.

Cheers
Brian

Cluso99 · 2014-10-10 17:58

ozpropdev,
Don't forget that the SYSOP instruction could be expanded into more opcodes.
I would even go as far as saying we could make this instruction binary incompatible with P1 if necessary ???

I like the idea of using PAR & CNT as special write registers.
There wasn't much use of them in P1 AFAIK. I used them in my spin/pasm debugger, together with INA & INB.

However, we likely want to keep INA, INB, DIRA & DIRB for masks.

It is a worthwhile exercise just to see how much logic (ie die space) is consumed by all these extra instructions. We don't want to make a "hot" P2 though!

Cluso99 · 2014-10-10 21:13

There are some more P1 instructions that could be looked at. They have been mentioned in P2.

SUBS & CMPS could share the same opcode. IIRC the C flag is different between these opcodes. This would mean SUBS NR would be lost.

SUBSX & CMPSX - same applies as to SUBS & CMPS.

DJNZ, TJNZ & TJZ could be combined into 2 opcodes also giving the opportunity for one of DJZ or IJZ or IJNZ, with the loss of DJNZ NR.
Alternately, the WR versions of TJNZ & TJZ could be used as other new instructions.

Of course, some of these possibilities remove binary compatibility.

And while I think of it, the P2 implemented the shift and rotate flags differently than on the P1. IIRC the Z & C flags were set by the resultant values, not the source values. This was considered better.

pik33 · 2014-10-10 23:07

ozpropdev wrote: »

As MUL,MULS,ENC and ONES are non standard P1 opcodes their allocation seems to be available for "whatever".

We have mul and muls done, doing what they are supposed to do and supported by proptool, so let mul do multiply.
I don't know what ENC and ONES are supposed to do, but let them do what Chip wanted them to do... and let's implement this.

Then we have all the rest: if_never, sysop, I don't know what else. Let them do whatever we can think of until we create some standard: what we need and where to implement it.

And there are some things which cost too much. I tried to implement div. It works but the cost is too high: 2500 LEs/cog. So maybe it would be better to have some addresses, maybe in the hub ROM space, to implement a coprocessor. Put the data and instruction there, then get the result. Some operations like div, mod and a floating point library could be done this way.
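
A rough Python model of that coprocessor idea, only to make the "put the data and instruction there, then get the result" flow concrete. Everything here is hypothetical: the register offsets, the command codes, and the divide-by-zero result are made up for illustration, and a real hub-side unit would take some number of clocks rather than answering instantly.

```python
# Hypothetical memory-mapped math coprocessor shared by all cogs.
OP_DIV, OP_MOD = 0, 1                        # made-up command codes
REG_A, REG_B, REG_CMD, REG_RESULT = 0, 1, 2, 3   # made-up register offsets

class Coprocessor:
    def __init__(self):
        self.regs = [0, 0, 0, 0]

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF
        if offset == REG_CMD:
            # Writing the command register starts the operation.
            self._execute(self.regs[REG_CMD])

    def read(self, offset):
        return self.regs[offset]

    def _execute(self, cmd):
        a, b = self.regs[REG_A], self.regs[REG_B]
        if b == 0:
            result = 0xFFFFFFFF              # arbitrary choice for divide by zero
        elif cmd == OP_DIV:
            result = a // b                  # unsigned 32-bit divide
        elif cmd == OP_MOD:
            result = a % b
        else:
            result = 0                       # unknown command
        self.regs[REG_RESULT] = result

cop = Coprocessor()
cop.write(REG_A, 100)
cop.write(REG_B, 7)
cop.write(REG_CMD, OP_DIV)
print(cop.read(REG_RESULT))
```

The point of the design is that the expensive divider exists once rather than once per cog, at the price of hub access latency.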

jac_goudsmit · 2014-10-11 01:14

I don't think the if_never instructions are a good place to put new instructions. I only have one major project with a Propeller and it uses a MUX instruction to change the conditional bits between if_never and other values to turn an instruction into a NOP. True, you could argue that I could change my code to reset/set ALL bits instead of just the ones that are conditional, but that's not even the point. In the event that we make modifications to the instruction set, any future silicon version of the P1V needs to be compatible with all existing code.
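
To make the compatibility concern concrete, here is a small Python model of that kind of self-modifying trick. It is a sketch, not the actual project code: `mux` imitates what the MUXC family does (copy a flag into every bit selected by a mask), and the sample instruction is just an illustrative `add 1,2`.

```python
COND_MASK = 0xF << 18            # condition field, bits 21..18

def mux(dest, mask, flag):
    """Model of a MUXC-style write: copy `flag` into every masked bit of dest."""
    if flag:
        return (dest | mask) & 0xFFFFFFFF
    return dest & ~mask & 0xFFFFFFFF

def cond(instr):
    return (instr >> 18) & 0xF

def opcode(instr):
    return (instr >> 26) & 0x3F

ADD_1_2 = 0x80BC0202             # add 1,2: opcode %100000, R bit set, IF_ALWAYS

disabled = mux(ADD_1_2, COND_MASK, False)   # cond becomes %0000 = IF_NEVER
enabled = mux(disabled, COND_MASK, True)    # cond back to %1111 = IF_ALWAYS

print(hex(disabled), cond(disabled), opcode(disabled) == opcode(ADD_1_2))
print(enabled == ADD_1_2)
```

On a stock P1 the `disabled` long is a NOP. On a core that decodes IF_NEVER as a second instruction set, the same long still carries opcode %100000 and its operands, so it would execute whatever the alternate table puts in that slot.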

The way to accomplish this is to require P1+ programs to take special action to make the alternative instructions go into effect. Such an action should be something that normal functioning P1 code would never do. For example, it could set bit 31 of VCFG (on the P1, this should normally remain 0). This would be a cog-local change so if one cog does it, the other cogs would not be affected. This is important so that any other cogs running the Spin interpreter won't be affected.
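
A minimal Python sketch of such an opt-in, assuming the extended instructions are still encoded with IF_NEVER and that VCFG bit 31 is the enable. The function name and the three return labels are hypothetical. The decoder only leaves stock behaviour when the cog has set the bit itself.

```python
EXT_ENABLE = 1 << 31             # ASSUMPTION: VCFG bit 31 used as the opt-in flag

def select_decoder(instr, vcfg):
    """Pick how one cog treats an instruction, given that cog's own VCFG."""
    cond = (instr >> 18) & 0xF
    if cond != 0:
        return "standard"        # normal conditional execution, unchanged
    if vcfg & EXT_ENABLE:
        return "extended"        # IF_NEVER reinterpreted, only after opt-in
    return "nop"                 # stock P1 behaviour for IF_NEVER

old_cog_vcfg = 0x00000000        # e.g. a cog running the Spin interpreter
new_cog_vcfg = EXT_ENABLE        # a cog running P1+ code that opted in

if_never_instr = 0x80800202      # add 1,2 with its condition bits cleared
print(select_decoder(if_never_instr, old_cog_vcfg))
print(select_decoder(if_never_instr, new_cog_vcfg))
```

Because VCFG is per cog, the two cogs above can run side by side with different interpretations of the same long.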

Another possibility that I've been thinking of, is to create a new write-only register that's accessed at the CNT location, to turn features on or off. CNT is normally a read-only register so old software should never attempt to write to it anyway. And I'm thinking 32 bits is not going to be enough bits so maybe it should be a register bank, e.g. bit 0 determines the value that gets written to the feature register bank, and bits 1-31 are the potential address bits in a 1-bit wide register bank. This could be used to switch propeller-wide features on and off. For example, it could enable expanded hub RAM (at the price of ROM).
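
The write format described here can be modelled in a few lines of Python. This is only a sketch of the idea: the class and method names are hypothetical, the feature number used in the example is arbitrary, and a real implementation would hold only the handful of bits it actually supports rather than a sparse dictionary.

```python
class FeatureBank:
    """1-bit-wide register bank written through the (normally read-only) CNT address."""

    def __init__(self):
        self.bits = {}                       # feature address -> 0 or 1

    def write_cnt(self, value):
        value &= 0xFFFFFFFF
        data = value & 1                     # bit 0: the value to store
        addr = value >> 1                    # bits 1..31: which feature bit
        self.bits[addr] = data

    def enabled(self, addr):
        return self.bits.get(addr, 0) == 1   # features default to off

EXPANDED_HUB_RAM = 5                         # hypothetical feature address

bank = FeatureBank()
bank.write_cnt((EXPANDED_HUB_RAM << 1) | 1)  # switch the feature on
print(bank.enabled(EXPANDED_HUB_RAM))
bank.write_cnt((EXPANDED_HUB_RAM << 1) | 0)  # and off again
print(bank.enabled(EXPANDED_HUB_RAM))
```

One write touches exactly one feature bit, so code can turn a feature on or off without needing to read back and preserve the others.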

Additionally, we may be able to use some bit space in existing instructions to implement extended instructions. For example, the COGSTOP and some of the other hub instructions only use the lower 3 bits of the source, so it may be possible to modify COGSTOP to a different instruction if/whenever the top bit of the source is set to 1. However this will only be interesting for new instructions that are hub instructions and don't need the full source and destination fields.

The point is, instructions with IF_NEVER may not be used often, but they ARE valid instructions and there ARE situations where existing P1 code may execute them (and expect them to be NOPs). Replacing those instructions with others and disregarding those potential compatibility issues would not be a good idea, in my humble opinion.

===Jac

ozpropdev · 2014-10-11 04:50

Hmmm...
In order to continue with testing/development it is necessary to implement some sort of extension of opcodes.
We only had 4 spare opcodes but we now have Willy, Cluso and Rogloh's new instruction implementations already!
The use of "IF_NEVER" was done for a couple of reasons. Firstly it was very simple to implement. Secondly it also
keeps the PASM code small as they are still 1 LONG in size. Preceding "special" flag bits are not required.
The use of the condition bits was also used with success in the P2. Some instructions there used the
condition bits to expand opcode depth. This meant that it was illegal to use conditions on some instructions.
It was never my intention for this technique to be "implemented" in silicon, merely as a "quick and easy" approach
to the problem. Same applies to the sample instructions in the code. Once again these are test instructions simply as a
learning exercise in Verilog. It's a little difficult to add features without breaking 100% compatibility somewhere.
BTW the code also uses a shadow register at the CNT address to pass additional 32 bit data to some new instructions.

Anyhow...did I mention the "crazy" other stuff I did with the ZC and R bits!

Cheers
Brian

mark · 2014-10-12 12:16

In the early days of P2 discussions, someone on the forum suggested increasing the long length to 33 or 34 bits in cog ram, and the 33rd bit could contain a permanent carry bit since the ALU already has this ability. I don't know if it would make some of the self-modifying code instructions trickier to implement, or in the case of FPGAs, if memory elements tend to be multiples of 8 bits or something which would be wasteful for memory of different sizes. Otherwise, you could have all those instructions you want without repurposing if_never.
