Pipeline/data forwarding issue with PTRx registers?

in Propeller 2
Hi Chip
I stumbled onto a little issue today that had me scratching my head for a while.
It appears to be a pipeline/data forwarding issue with the flags and PTRx register.
I went back to V26 and it has the same symptoms as V28 so it's not a new thing.
In the following code the testb instruction tests the bit ok and sets the C flag as expected.
But this seems to zero the original PTRx value.
The subsequent shr fails.
If I then comment out the testb instruction the shr works fine as expected.
On the other hand, if I place a nop before the testb instruction, it fails too.
I'm guessing it could also be a shadow RAM conflict, as the PTRx registers have
an indexing mechanism as well.
If I substitute any other register for PTRx, it all works OK.
Am I breaking the rules again? Sorry to be a pain.
dat     org
        bmask   dirb,#15                'enable leds
        mov     ptra,##$f100_0000
'       nop
        testb   ptra,#24 wc
        shr     ptra,#24 wz
        outnz   #40                     'show c and z results on leds
        outc    #41
        waitx   ##80_000_000            'wait a while then show result
        mov     outb,ptra
        jmp     #$
Comments
PTRA and PTRB are only 20-bit registers, so their top 12 bits always read 0.
While PTRA and PTRB are only 20 bits, they can receive ALU results, which are 32 bits. When an ALU result is being written to a register and the next instruction in the pipeline references that same register, the ALU data is forwarded to the next instruction's S and/or D value(s). There is no time to mask the ALU data down to 20 bits for PTRA and PTRB, as the ALU paths are already the longest within the cog and cannot stand to be made any longer.
So, this is an anomaly, but a rather harmless one. This can be documented, in case anyone runs into this and supposes something is wrong.
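The forwarding behaviour described here can be mimicked with a toy Python model. It is a simplification under stated assumptions, not the cog's real pipeline: the register file keeps only the low 20 bits of PTRA, an ALU result is forwarded unmasked to the very next instruction only, and NOP and TESTB write no register. The names `run` and `PTR_MASK` are made up for illustration.

```python
# Toy model (not RTL) of the PTRA forwarding anomaly.
# Assumptions: the register file stores only the low 20 bits of PTRA, while
# an ALU result is forwarded unmasked to the next instruction only.
PTR_MASK = 0x000F_FFFF  # 20-bit PTRA storage

def run(program):
    """Run (op, arg) tuples against PTRA; return (shr_result, c, z)."""
    stored = 0          # what the 20-bit register file holds
    forwarded = None    # unmasked ALU result, visible to the next instruction only
    c = z = shr_result = None
    for op, arg in program:
        # Operand fetch: forwarded data takes priority over the register file.
        operand = forwarded if forwarded is not None else stored
        forwarded = None
        if op == "mov":
            result = arg & 0xFFFF_FFFF
        elif op == "testb":
            c = (operand >> arg) & 1
            continue                # TESTB writes no register
        elif op == "shr":
            result = (operand >> arg) & 0xFFFF_FFFF
            z = int(result == 0)
            shr_result = result
        elif op == "nop":
            continue                # NOP writes no register
        else:
            raise ValueError(op)
        stored = result & PTR_MASK  # register file keeps 20 bits
        forwarded = result          # next instruction sees all 32 bits
    return shr_result, c, z

print(run([("mov", 0xF100_0000), ("testb", 24), ("shr", 24)]))
```

Run against the three variants from the first post, the model gives C=1 with a zero SHR result for the original code, $F1 with Z=0 when TESTB is commented out, and C=0 with a zero SHR result when a NOP is inserted before TESTB, which is consistent with the symptoms reported.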
That's right.
Do you guys see much value in doing this?
It would make PTRA and PTRB more like other registers, in that they'd be 32 bits. The bottom 20 bits would automatically update in some situations, though.
That seems like more expected behaviour. While only 20 bits of PTRx would be used for addressing, what would happen if a PTRx increment overflowed? Would b20 (the 21st bit) increment, or would the carry be lost? I guess either way is acceptable.
Might even be some tricks here
What FPGA image would you like this in? I figure either Prop123_A9 or BeMicro_A9. Which one first?
Might there be any point in
1. Increasing the depth?
2. Increasing the width from 22? to 32 bits?
Just wondering, since the reduction to 8 cogs might have freed up some silicon space. Presuming, of course, it's only a parameter change.
The incrementing would be limited to the bottom 20 bits. The PTRx registers already flirt with the critical path. There's no time to increment a full 32 bits. We are at the limit there, already. Those top 12 bits will just be normal RAM. They will be affected by ALU writing, but not by auto-inc/dec behavior.
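The widened PTRx as described (32 bits of storage, with auto-increment/decrement confined to the low 20 bits) can be sketched as two update rules. This is a behavioural sketch under the assumption that a carry or borrow out of bit 19 is simply discarded, which is one reading of "not affected by auto-inc/dec behavior"; the function names are hypothetical.

```python
# Behavioural sketch of a widened (32-bit) PTRx register.
# Assumption: auto-increment/decrement is confined to bits 19..0 and any
# carry/borrow out of bit 19 is discarded, so bits 31..20 change only
# when an ALU result is written to the register.
LOW_MASK = 0x000F_FFFF    # bits 19..0: updated by auto-inc/dec
HIGH_MASK = 0xFFF0_0000   # bits 31..20: plain storage
WORD_MASK = 0xFFFF_FFFF

def alu_write(value: int) -> int:
    """An ALU write (e.g. MOV PTRA,##value) replaces all 32 bits."""
    return value & WORD_MASK

def auto_step(ptr: int, delta: int) -> int:
    """Auto-inc/dec by delta: wrap within the low 20 bits, keep the top 12."""
    return (ptr & HIGH_MASK) | ((ptr + delta) & LOW_MASK)

ptra = alu_write(0xABCF_FFFF)
ptra = auto_step(ptra, 1)   # low 20 bits wrap to 0; bit 20 does not increment
print(hex(ptra))            # 0xabc00000
```

Under this reading, the answer to the b20 question is that the carry is lost and the top 12 bits keep whatever the last ALU write put there.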
It's 8 levels deep.
We could increase it to 32 bits, instead of 22. It will take 10 bits * 8 levels * 8 cogs = 640 more flip-flops.
This means there are no more undersized registers anywhere.
When I get up in the morning, it will be done and I'll post it.
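The flip-flop figure is easy to check: 10 extra bits per level, times 8 levels, times 8 cogs. A trivial Python version, with the widths, depth and cog count taken from the posts:

```python
# Extra flip-flops needed to widen every cog's hardware stack.
# Widths, depth and cog count are the figures quoted in the thread.
OLD_WIDTH = 22   # bits per stack level before the change
NEW_WIDTH = 32   # bits per stack level after the change
LEVELS = 8       # stack depth per cog
COGS = 8

def extra_flops(old_width: int, new_width: int, levels: int, cogs: int) -> int:
    """(added bits per level) x (levels per cog) x (cogs)."""
    return (new_width - old_width) * levels * cogs

print(extra_flops(OLD_WIDTH, NEW_WIDTH, LEVELS, COGS))  # 640
```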
A 32 bit wide stack will make a big difference.
Keeping everything symmetrical makes life easier.
Thanks Chip!
Isn't the PAR register also short?
We have PTRA and PTRB on Prop2, and no PAR, right?
Wait, you mean the width of the parameters from COGINIT, right? Those can be widened, too.
I've got CV-A9 in the oven and I'm going to bed. I'll recompile everything else later.
I will have me a slice of that CVA9 pie when it's out of the oven and cooling on the Google Drive rack.
The wider PTRx will enhance SETQ/COGINIT nicely too.
Yes. Thanks.
The widening to 32 bits will simplify things for users. Fewer restrictions and lurking caveats is excellent news.
BTW I am on CVA9 too. Unless someone is testing another board, maybe you can save compiling the others for now.
https://drive.google.com/file/d/1l7yWVpljQN8OTV7d4Mok3ukAdjTwmoDX/view?usp=sharing
Could you please see if this works as expected? Tonight I will get to widening the COGINIT conduit.
Thanks.
Sure, one immediate benefit of holding 32 bits is better memory management.
A simple routine can be called that checks PTRx: if the address is on-chip, a single opcode is used; if it is off-chip, more code is employed to get that read/write done.
The user does not need to care where the data is.
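The memory-management idea can be sketched as a dispatch on the pointer value. Everything here is hypothetical: `HUB_SIZE` is an illustrative on-chip limit, `ExternalMemory` stands in for whatever off-chip driver would be used, and on real hardware the fast path would be a single hub read/write rather than a dictionary lookup. The relevant point is that an off-chip address such as $1000_0000 needs more than 20 bits, so it could not be held in PTRx before the widening.

```python
# Hypothetical sketch: route a read/write by pointer value.
# HUB_SIZE and ExternalMemory are illustrative stand-ins, not P2 APIs.
HUB_SIZE = 0x8_0000  # assumed on-chip address limit (illustrative only)

class ExternalMemory:
    """Stand-in for an off-chip memory driver (hypothetical)."""
    def __init__(self):
        self.cells = {}
        self.accesses = 0

    def read(self, addr):
        self.accesses += 1
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value

class Memory:
    def __init__(self):
        self.hub = {}                # on-chip storage (sparse for brevity)
        self.ext = ExternalMemory()

    def read(self, ptr):
        ptr &= 0xFFFF_FFFF           # full 32-bit pointer, as a widened PTRx allows
        if ptr < HUB_SIZE:
            return self.hub.get(ptr, 0)  # fast path: one access
        return self.ext.read(ptr)        # slow path: driver code

    def write(self, ptr, value):
        ptr &= 0xFFFF_FFFF
        if ptr < HUB_SIZE:
            self.hub[ptr] = value
        else:
            self.ext.write(ptr, value)

mem = Memory()
mem.write(0x0000_0100, 42)   # lands on-chip (fast path)
mem.write(0x1000_0000, 7)    # needs more than 20 bits: off-chip (slow path)
```

The caller only ever sees `read` and `write`; where the data lives is decided inside the routine.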
Good point.
Stack and PTRx widening working as expected.
        push    ##$f0000000
        push    ##$ff000000
        push    ##$fff00000
        push    ##$ffff0000
        push    ##$fffff000
        push    ##$ffffff00
        push    ##$fffffff0
        push    ##$ffffffff
        pop     $100
        pop     $101
        pop     $102
        pop     $103
        pop     $104
        pop     $105
        pop     $106
        pop     $107
Resulted in
(P2 Debugger) STACK : --,00000008 CZ,00300270 --,00000006 --,00000000 --,00000000 --,00000000 --,00000000 --,00000000
00004: FFF80000 AUGD #$780000
00005: FD64002A PUSH #$000 {$F0000000}
(? for help) >

(P2 Debugger) STACK : --,F0000000 --,00000008 CZ,00300270 --,00000006 --,00000000 --,00000000 --,00000000 --,00000000
00006: FFFF8000 AUGD #$7F8000
00007: FD64002A PUSH #$000 {$FF000000}
(? for help) >

(P2 Debugger) STACK : --,FF000000 --,F0000000 --,00000008 CZ,00300270 --,00000006 --,00000000 --,00000000 --,00000000
00008: FFFFF800 AUGD #$7FF800
00009: FD64002A PUSH #$000 {$FFF00000}
(? for help) >

(P2 Debugger) STACK : CZ,FFF00000 --,FF000000 --,F0000000 --,00000008 CZ,00300270 --,00000006 --,00000000 --,00000000
0000A: FFFFFF80 AUGD #$7FFF80
0000B: FD64002A PUSH #$000 {$FFFF0000}
(? for help) >

(P2 Debugger) STACK : CZ,FFFF0000 CZ,FFF00000 --,FF000000 --,F0000000 --,00000008 CZ,00300270 --,00000006 --,00000000
0000C: FFFFFFF8 AUGD #$7FFFF8
0000D: FD64002A PUSH #$000 {$FFFFF000}
(? for help) >

(P2 Debugger) STACK : CZ,FFFFF000 CZ,FFFF0000 CZ,FFF00000 --,FF000000 --,F0000000 --,00000008 CZ,00300270 --,00000006
0000E: FFFFFFFF AUGD #$7FFFFF
0000F: FD66002A PUSH #$100 {$FFFFFF00}
(? for help) >

(P2 Debugger) STACK : CZ,FFFFFF00 CZ,FFFFF000 CZ,FFFF0000 CZ,FFF00000 --,FF000000 --,F0000000 --,00000008 CZ,00300270
00010: FFFFFFFF AUGD #$7FFFFF
00011: FD67E02A PUSH #$1F0 {$FFFFFFF0}
(? for help) >

(P2 Debugger) STACK : CZ,FFFFFFF0 CZ,FFFFFF00 CZ,FFFFF000 CZ,FFFF0000 CZ,FFF00000 --,FF000000 --,F0000000 --,00000008
00012: FFFFFFFF AUGD #$7FFFFF
00013: FD67FE2A PUSH #$1FF {$FFFFFFFF}
(? for help) >

(P2 Debugger) STACK : CZ,FFFFFFFF CZ,FFFFFFF0 CZ,FFFFFF00 CZ,FFFFF000 CZ,FFFF0000 CZ,FFF00000 --,FF000000 --,F0000000
00014: FD62002B POP $100
(? for help) >

(P2 Debugger) STACK : CZ,FFFFFFF0 CZ,FFFFFF00 CZ,FFFFF000 CZ,FFFF0000 CZ,FFF00000 --,FF000000 --,F0000000 --,F0000000
00015: FD62022B POP $101
(? for help) >

(P2 Debugger) STACK : CZ,FFFFFF00 CZ,FFFFF000 CZ,FFFF0000 CZ,FFF00000 --,FF000000 --,F0000000 --,F0000000 --,F0000000
00016: FD62042B POP $102
(? for help) >

(P2 Debugger) STACK : CZ,FFFFF000 CZ,FFFF0000 CZ,FFF00000 --,FF000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000
00017: FD62062B POP $103
(? for help) >

(P2 Debugger) STACK : CZ,FFFF0000 CZ,FFF00000 --,FF000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000
00018: FD62082B POP $104
(? for help) >

(P2 Debugger) STACK : CZ,FFF00000 --,FF000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000
00019: FD620A2B POP $105
(? for help) >

(P2 Debugger) STACK : --,FF000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000
0001A: FD620C2B POP $106
(? for help) >

(P2 Debugger) STACK : --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000
0001B: FD620E2B POP $107
(? for help) >

(P2 Debugger) STACK : --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000 --,F0000000
0001C: FD9FFFFC JMP #$FFFFC (ABS addr = $1C)
(? for help) >REG 100 107
100: $FFFFFFFF %11111111_11111111_11111111_11111111 #4294967295 #-1
101: $FFFFFFF0 %11111111_11111111_11111111_11110000 #4294967280 #-16
102: $FFFFFF00 %11111111_11111111_11111111_00000000 #4294967040 #-256
103: $FFFFF000 %11111111_11111111_11110000_00000000 #4294963200 #-4096
104: $FFFF0000 %11111111_11111111_00000000_00000000 #4294901760 #-65536
105: $FFF00000 %11111111_11110000_00000000_00000000 #4293918720 #-1048576
106: $FF000000 %11111111_00000000_00000000_00000000 #4278190080 #-16777216
107: $F0000000 %11110000_00000000_00000000_00000000 #4026531840 #-268435456
(? for help) >