Where to grow the Prop1 architecture, instruction-wise
cgracey
In the current Prop1 chip, there are four unused instruction opcodes:
000100 ZCRICCCC DDDDDDDDD SSSSSSSSS <MUL> D,S/#
000101 ZCRICCCC DDDDDDDDD SSSSSSSSS <MULS> D,S/#
000110 ZCRICCCC DDDDDDDDD SSSSSSSSS <ENC> D,S/#
000111 ZCRICCCC DDDDDDDDD SSSSSSSSS <ONES> D,S/#
These instructions were planned for future implementation, but are unused in the current chip and tools.
Making new instructions that work in these spaces will not impede the operation of the current tool systems. If you were to make the ADD instruction behave differently, on the other hand, you would cause the ROM code to malfunction and you wouldn't even be able to download a program. By doing new things in these four instruction spaces, you'll encounter no problems with the ROM code or current tools.
I propose, for the purpose of varied development efforts, to rename these opcodes as follows:
000100 ZCRICCCC DDDDDDDDD SSSSSSSSS USR0 D,S/#
000101 ZCRICCCC DDDDDDDDD SSSSSSSSS USR1 D,S/#
000110 ZCRICCCC DDDDDDDDD SSSSSSSSS USR2 D,S/#
000111 ZCRICCCC DDDDDDDDD SSSSSSSSS USR3 D,S/#
The assemblers can be modified to recognize and assemble these instructions so that a way exists to grow the instruction set without any fixed functional requirements.
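As a concrete illustration of what an assembler (or someone hand-assembling) would do with these opcodes, here is a minimal Python sketch that packs USR0..USR3 into a 32-bit long. It assumes the usual P1 field order from the table above: a 6-bit opcode in bits 31..26, then the Z, C, R and I flags, the 4-bit CCCC condition, the 9-bit D field and the 9-bit S field. The names encode_usr and IF_ALWAYS are hypothetical, not part of OpenSpin or PropGCC.

```python
# Hypothetical helper: hand-assemble a USRn instruction into a 32-bit P1 long.
# Field layout (from the opcode table): iiiiii ZCRI CCCC DDDDDDDDD SSSSSSSSS

USR_BASE_OPCODE = 0b000100  # USR0; USR1..USR3 follow at 000101..000111
IF_ALWAYS = 0b1111          # CCCC = 1111: execute unconditionally


def encode_usr(n, d, s, *, z=0, c=0, r=1, i=0, cond=IF_ALWAYS):
    """Return the 32-bit long for USRn D,S (or USRn D,#S when i=1)."""
    if not 0 <= n <= 3:
        raise ValueError("only USR0..USR3 exist")
    if not (0 <= d < 512 and 0 <= s < 512):
        raise ValueError("D and S are 9-bit fields")
    if not 0 <= cond <= 0xF:
        raise ValueError("CCCC is a 4-bit field")
    for flag in (z, c, r, i):
        if flag not in (0, 1):
            raise ValueError("Z, C, R and I are single bits")
    opcode = USR_BASE_OPCODE + n
    return ((opcode << 26) | (z << 25) | (c << 24) | (r << 23) | (i << 22)
            | (cond << 18) | (d << 9) | s)


print(f"long ${encode_usr(0, 5, 6):08X}")  # prints: long $10BC0A06
```

The printed line can be dropped into a DAT section as a plain long until the assemblers learn the new mnemonics.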
Questions
1) Wouldn't we also have the
xxxxxx xxxx 0000 ddddddddd sssssssss
encodings (condition field CCCC = 0000, i.e. "never execute") available for instructions that execute unconditionally?
2) Is Port B left as an exercise for the readers?
1) Yes, that would work, too. Then we could implement those four planned instructions, which would be useful.
2) I didn't add a Port B because I figured it would be a good learning opportunity for others.
I think many of you are going to get way into hardware, from now on.
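Question 1 can be illustrated with a small decoder. This hypothetical Python sketch splits a long into its fields and sorts it into one of three buckets: the CCCC = 0000 ("never execute") space that could be reused for unconditional instructions, a USRn opcode, or an ordinary existing instruction. Checking the condition field first reflects the assumption that every 0000-conditioned word would belong to the new space. As pointed out further down the thread, existing code may rely on 0000 to disable instructions, so this is only one possible scheme.

```python
# Hypothetical decoder sketch: split a P1 long into its fields and flag the
# two kinds of "spare" encodings discussed in this thread.


def decode(word):
    """Split a 32-bit P1 instruction word into its named fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "z": (word >> 25) & 1,
        "c": (word >> 24) & 1,
        "r": (word >> 23) & 1,
        "i": (word >> 22) & 1,
        "cond": (word >> 18) & 0xF,
        "d": (word >> 9) & 0x1FF,
        "s": word & 0x1FF,
    }


def classify(word):
    f = decode(word)
    if f["cond"] == 0b0000:
        # Today this means "never execute"; the proposal would reuse it as an
        # extra instruction space whose members always execute.
        return "never-execute space"
    if 0b000100 <= f["opcode"] <= 0b000111:
        return f"USR{f['opcode'] - 0b000100}"
    return "existing instruction"


print(classify(0x10BC0A06))  # USR0 5,6 with CCCC = 1111 -> USR0
print(classify(0x10800A06))  # same word with CCCC = 0000 -> never-execute space
print(classify(0x80BC0A06))  # opcode 100000 (ADD) -> existing instruction
```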
Excellent idea! I had wondered about that a day or so ago in the "anticipation thread". To really get into the Verilog and make significant changes, at some point you would need to get into OpenSpin and PropGCC to have them recognize your new instructions. This would solve that problem.
Plus we could decode the "never execute" opcodes.
Agreed, Port B is an excellent user exercise.
Dang it! There goes what little "free" time I had... where is my Arrow account... (at least I know where my DE0-Nanos are)
In the medium term, a macro assembler can manage new opcodes at the ASM level, and I think PropGCC has a macro assembler.
Of course, longer term, PropGCC and OpenSpin can also code generate for new opcodes, but that will need more care, and a target-version-management scheme.
How hard is it to modify the compiler to accept different hub sizes for the P1? The problem is that the compiler gives an error when 32KB is exceeded. IIRC, the hub size is hard-coded in some file that is built into the compiler when PropGCC itself is compiled.
The bigger problem will be managing which MUL and (if any) DIV a given core provides.
All of these take room, and some ARMs, for example, either skip a hardware DIV or provide division in ROM.
What the compiler emits needs to match what the silicon does, and I can see that getting murky quickly.
Another way to manage this is to treat the new opcodes as inline hardware function calls.
For example, someone could code a scaled multiply (A * B / C), where it is nice to have an interim result of 64 bits, but that interim result does not need to be in register space.
Likewise, a DivMod inline hardware function that returns both quotient and remainder can give tighter code.
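A reference model helps pin down what such inline hardware functions should return before any Verilog is written. This Python sketch assumes unsigned 32-bit operands: scaled_mul keeps the full 64-bit product internally and raises an error if the final quotient does not fit in 32 bits, and divmod32 returns quotient and remainder together. Both names and the error behavior are assumptions for illustration.

```python
# Reference model (hypothetical) for two "inline hardware function" ideas:
# a scaled multiply A * B / C with a 64-bit interim product, and a DivMod
# that returns quotient and remainder together. Operands are unsigned 32-bit.

MASK32 = 0xFFFF_FFFF


def scaled_mul(a, b, c):
    """Return (a * b) // c, keeping the full 64-bit product internally."""
    a, b, c = a & MASK32, b & MASK32, c & MASK32
    if c == 0:
        raise ZeroDivisionError("scaled_mul divisor is zero")
    product = a * b  # up to 64 bits; never needs to live in a cog register
    result = product // c
    if result > MASK32:
        raise OverflowError("quotient does not fit in 32 bits")
    return result


def divmod32(a, b):
    """Return (quotient, remainder) of unsigned 32-bit a / b in one call."""
    a, b = a & MASK32, b & MASK32
    if b == 0:
        raise ZeroDivisionError("divmod32 divisor is zero")
    return a // b, a % b


print(scaled_mul(3_000_000_000, 4, 6))      # 2000000000
print(((3_000_000_000 * 4) & MASK32) // 6)  # 568344234: 32-bit product wrapped
print(divmod32(100, 7))                     # (14, 2)
```

The second line shows why the 64-bit interim matters: with only a 32-bit product, the same A * B / C silently gives a wrong answer.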
Most controller families keep these resource-check warning limits in a set of small config files somewhere.
Those would need to be provided for all the Verilog-P1 variants (and also P2).
IIRC, all you need is a different .ld linker script. I'm pretty sure the default set of .ld files imposes this restriction in CMM/LMM to keep people from getting horribly confused.
Hi Bill,
There was no DIV opcode on the P1, only the unimplemented MUL and MULS (signed), from what I can see.
It would be really nice to get a 16x16 multiply operation out of the FPGA hardware, in a single instruction cycle if possible. I don't know much about how it all works yet.
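For checking an FPGA multiplier against expected results, a software reference model is handy. This sketch assumes MUL/MULS take the low 16 bits of D and S and produce a 32-bit product for D; that operand width follows the 16x16 suggestion above and is not a documented P1 behavior.

```python
# Reference model (hypothetical) for a 16x16 MUL/MULS, e.g. for comparing a
# Verilog implementation's outputs in a testbench. Assumes the operands are
# the low 16 bits of D and S and the 32-bit product is written back to D.

MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF


def sign_extend16(x):
    """Interpret the low 16 bits of x as a two's-complement signed value."""
    x &= MASK16
    return x - 0x10000 if x & 0x8000 else x


def mul16(d, s):
    """Unsigned 16x16 -> 32-bit product."""
    return (d & MASK16) * (s & MASK16)


def muls16(d, s):
    """Signed 16x16 -> 32-bit product, returned as a two's-complement long."""
    return (sign_extend16(d) * sign_extend16(s)) & MASK32


assert mul16(0xFFFF, 0xFFFF) == 0xFFFE0001
assert muls16(0xFFFF, 0x0002) == 0xFFFFFFFE  # -1 * 2 = -2
assert muls16(0x8000, 0x8000) == 0x40000000  # -32768 * -32768
```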
Port B would naturally make a good second I/O port for FPGAs with enough pins brought out, or a quick way to control internal/custom FPGA peripherals without too much mucking about. Having all three registers (DIRB, OUTB and INB) each readable and writable gives plenty of flexibility there. You could perhaps use it to issue SDRAM operations if you wanted to, or try to use the hub to coordinate things better if multiple cogs are sharing it.
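One design question for a Port B is how the per-cog registers combine. This behavioral Python sketch assumes Port B copies the P1 Port A rule: a pin is an output if any cog sets its DIR bit, and it is driven high if any cog that has the pin as an output also has its OUT bit set. The CogPortB and resolve_port names are hypothetical, and the read-back of INB is modeled simply as the driven level for outputs and the external level for inputs.

```python
# Behavioral sketch (hypothetical) of combining per-cog DIRB/OUTB registers,
# assuming Port B follows the same OR-combining rule as Port A on the P1.

from dataclasses import dataclass

MASK32 = 0xFFFF_FFFF


@dataclass
class CogPortB:
    dirb: int = 0
    outb: int = 0


def resolve_port(cogs, external_inputs):
    """Return (direction, driven_high, inb) as 32-bit masks for all pins."""
    direction = 0
    driven_high = 0
    for cog in cogs:
        direction |= cog.dirb
        driven_high |= cog.outb & cog.dirb  # OUT only counts where DIR is set
    # Output pins read back their driven level; input pins read external levels.
    inb = (driven_high | (external_inputs & ~direction)) & MASK32
    return direction & MASK32, driven_high & MASK32, inb


cogs = [CogPortB(dirb=0b0011, outb=0b0001), CogPortB(dirb=0b0100, outb=0b0100)]
print([bin(x) for x in resolve_port(cogs, 0b1000)])  # ['0b111', '0b101', '0b1101']
```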
HUBEXEC
IMHO this is going to be a must-have feature for most users here.
This requires at least two instructions with a large number of immediate bits. They are:
CALLX/JMPX/RETX #nnnnnnnnn_nnnnnnnnn (18 bits reserved for the absolute goto address)
LOAD #nnnnnnnnn_nnnnnnnnn (18 bits reserved for the immediate load/address)
I propose to use the two opcode instruction spaces reserved for ENC and ONES.
Both these instructions will use the combined D & S fields to store the constant value.
I propose that all 18 bits be available. Currently that would permit a hub of up to 18 address bits = 256KB.
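The combined-field idea can be sketched in a few lines of Python. This assumes D carries the upper 9 bits of the 18-bit constant and S the lower 9; which half goes where is an assumption, not a settled format, and split_imm18/join_imm18 are hypothetical names.

```python
# Sketch (hypothetical) of packing an 18-bit constant into the combined D and S
# fields, as proposed for CALLX/JMPX/RETX and LOAD. D holds the upper 9 bits,
# S the lower 9 bits.

IMM_BITS = 18


def split_imm18(value):
    """Return (d, s) 9-bit fields for an 18-bit immediate."""
    if not 0 <= value < (1 << IMM_BITS):
        raise ValueError("immediate must fit in 18 bits")
    return (value >> 9) & 0x1FF, value & 0x1FF


def join_imm18(d, s):
    """Recombine the D and S fields into the 18-bit immediate."""
    return ((d & 0x1FF) << 9) | (s & 0x1FF)


# 2**18 distinct byte addresses = 262,144 bytes = 256KB of hub space.
print(1 << IMM_BITS, (1 << IMM_BITS) // 1024)  # 262144 256
```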
The requirements for the 'LOAD' instruction are not fully defined yet. We know what was done in the P2, but that was quite a bit more complicated and involved a number of instructions. IMHO we don't have that luxury while keeping code backward compatible.
This 'LOAD' type instruction may go thru a few iterations... simple at first to test the Verilog waters.
There are also instructions in the SYSOP opcode that can be further utilised along the P2 lines (particularly the early versions). It might be necessary to put the RETX here.
For now, utilising the CCCC field for other than conditional execution is problematic in the Verilog code, at least for me as a Verilog beginner.
BTW I would not like to utilise the CCCC=0000 for further instructions. Remember, it was/is an easy way to disable instructions, so there is likely code relying on this.
That's what I was hoping/anticipating: MUL in a single instruction cycle, just like ADD today. That saves code space and time versus the successive-addition approach. Even the simple AVR already has MUL (albeit 8x8), and it came in handy for me before with some real-time audio scaling stuff.
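For comparison, the software approach a MUL instruction would replace is typically a shift-and-add loop that handles one multiplier bit per iteration. This Python sketch models only that loop structure (16 iterations for a 16-bit multiplier), not exact PASM instruction counts or timing; shift_add_mul16 is a hypothetical name.

```python
# Sketch of the software shift-and-add multiply that a hardware MUL would
# replace. Each loop iteration stands in for a few PASM instructions (test a
# bit, conditionally add, shift); the exact instruction count is not modeled.


def shift_add_mul16(a, b):
    """Unsigned 16x16 multiply by shift-and-add; returns (product, iterations)."""
    a &= 0xFFFF
    b &= 0xFFFF
    product = 0
    iterations = 0
    for bit in range(16):
        if (b >> bit) & 1:
            product += a << bit
        iterations += 1
    return product, iterations


print(shift_add_mul16(1234, 5678))  # (7006652, 16)
```

A single-cycle MUL collapses those 16 iterations into one instruction, which is where the code-space and time savings come from.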
+1 to that as well. Could do some nifty and fast FFT and DSP.
+1
My guess is you can have a 32x32 MUL running together with your Propeller at "full" speed.
Meanwhile, for testing, you can use a long to define the USR instructions; you hand-assemble each USR instruction. That's how I am testing the AUGDS and JMPRETX instructions.
Thanks. Sorry I didn't respond sooner. I should have said that I was on the road and wouldn't be able to check in for a day or two.
From Chip's remarks and my own (occasionally misguided) understanding, I thought that a somewhat generic compiler interface was possible...
So, for instance, I could write code that made sense to my Verilog substrate (using USR0), and others could write code that made sense to their Verilog substrates (also using USR0), and we could all use (pretty much) the same compiler. But since our Verilog implementations would vary, our results would be specific to each Verilog implementation. I understand that there will be examples that fall outside of this, but is it generally correct?