Pipeline/data forwarding issue with PTRx registers?
ozpropdev
Posts: 2,792
in Propeller 2
Hi Chip
I stumbled onto a little issue today that had me scratching my head for a while.
It appears to be a pipeline/data forwarding issue with the flags and PTRx register.
I went back to V26 and it has the same symptoms as V28 so it's not a new thing.
In the following code the testb instruction tests the bit ok and sets the C flag as expected.
But this seems to zero the original PTRx value.
The subsequent shr fails.
If I then comment out the testb instruction the shr works fine as expected.
On the other hand, if I place a nop before the testb instruction it fails too.
I'm guessing it could also be a shadow RAM conflict, as the PTRx registers have
an indexing mechanism as well.
If I substitute PTRx with any other register it all works OK.
Am I breaking the rules again? Sorry to be a pain.
dat     org
        bmask   dirb,#15                'enable leds
        mov     ptra,##$f100_0000
'       nop
        testb   ptra,#24        wc
        shr     ptra,#24        wz
        outnz   #40                     'show c and z results on leds
        outc    #41
        waitx   ##80_000_000            'wait a while then show result
        mov     outb,ptra
        jmp     #$
Comments
PTRA and PTRB are only 20-bit registers, so their top 12 bits always read 0.
While PTRA and PTRB are only 20 bits, they can receive ALU results which are 32 bits. When ALU results are being written to a register and the next instruction in the pipeline is referencing that same register, the ALU data is forwarded to the next instruction's S and/or D value(s). There is no time to mask the ALU data down to 20 bits for the cases of PTRA and PTRB, as the ALU paths are the longest within the cog, already, and cannot stand to be made any longer.
So, this is an anomaly, but a rather harmless one. This can be documented, in case anyone runs into this and supposes something is wrong.
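For anyone who wants to reason about the three cases in the original post, here is a rough behavioral model in Python. It is only a sketch of the explanation above, not the actual cog logic: the class `PtrModel`, its methods, and the helper `run` are made-up names, and the model assumes forwarding only ever reaches the very next instruction.

```python
MASK20 = (1 << 20) - 1
MASK32 = (1 << 32) - 1


class PtrModel:
    """Toy model: a 20-bit PTRA plus a one-instruction forwarding path."""

    def __init__(self):
        self.stored = 0        # what the register itself holds (20 bits)
        self.forwarded = None  # 32-bit ALU result from the previous instruction

    def _read(self):
        # Assumption from the thread: forwarded ALU data is not masked to 20 bits.
        value = self.stored if self.forwarded is None else self.forwarded
        self.forwarded = None
        return value

    def _write(self, result):
        result &= MASK32
        self.stored = result & MASK20   # top 12 bits are dropped on storage
        self.forwarded = result         # the next instruction sees all 32 bits

    def mov(self, value):
        self.forwarded = None
        self._write(value)

    def nop(self):
        # A nop separates writer and reader, so nothing is forwarded past it.
        self.forwarded = None

    def testb(self, bit):
        # Writes only the C flag, so nothing is forwarded afterwards.
        return (self._read() >> bit) & 1

    def shr(self, count):
        self._write(self._read() >> count)
        return self.stored


def run(with_testb, with_nop=False):
    """Replay the test program; returns (C flag or None, PTRA after shr)."""
    cog = PtrModel()
    cog.mov(0xF100_0000)
    if with_nop:
        cog.nop()
    c = cog.testb(24) if with_testb else None
    return c, cog.shr(24)
```

In this model, `run(True)` gives C set but PTRA zero after the shr, `run(False)` leaves $F1 in PTRA, and `run(True, with_nop=True)` gives C clear and PTRA zero, which matches the symptoms described in the first post.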
That's right.
Do you guys see much value in doing this?
It would make PTRA and PTRB more like other registers, in that they'd be 32 bits. The bottom 20 bits would automatically update in some situations, though.
Seems like a more expected behaviour. While only 20 bits of PTRx would be used for addressing, what would happen if PTRx increments causing an overflow? Would b20 (21st bit) increment, or would it be lost? I guess either way is acceptable anyway.
Might even be some tricks here
What FPGA image would you like this in? I figure either Prop123_A9 or BeMicro_A9. Which one first?
Might there be any point in
1. Increasing the depth?
2. Increasing the width from 22? to 32 bits?
Just wondering since the reduction to 8 cogs might have freed some spare silicon space. Presuming of course it's only a parameter change.
The incrementing would be limited to the bottom 20 bits. The PTRx registers already flirt with the critical path. There's no time to increment a full 32 bits. We are at the limit there, already. Those top 12 bits will just be normal RAM. They will be affected by ALU writing, but not by auto-inc/dec behavior.
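To illustrate the overflow question, a small Python sketch of the behavior as described: the bottom 20 bits wrap on their own and the top 12 bits are left alone. The function name `ptr_step` is hypothetical and this is a model of the description, not of the Verilog.

```python
MASK20 = (1 << 20) - 1
MASK32 = (1 << 32) - 1


def ptr_step(ptr, delta):
    """Auto-inc/dec a widened PTRx: only bits 19..0 take part in the add."""
    upper = ptr & MASK32 & ~MASK20      # top 12 bits behave like plain RAM
    lower = (ptr + delta) & MASK20      # carry out of bit 19 is lost
    return upper | lower
```

So stepping $F10F_FFFF by +1 would give $F100_0000 rather than $F110_0000, and bit 20 never sees the carry.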
It's 8 levels deep.
We could increase it to 32 bits, instead of 22. It will take 10 bits * 8 levels * 8 cogs = 640 more flipflops.
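The flipflop figure is easy to re-check for other widths or depths. A throwaway Python helper, with a made-up name, that just restates the multiplication above:

```python
def extra_flipflops(old_width, new_width, levels, cogs):
    """Added flipflops when widening every stack level in every cog."""
    return (new_width - old_width) * levels * cogs


print(extra_flipflops(22, 32, 8, 8))  # 640
```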
This means there are no more undersized registers anywhere.
When I get up in the morning, it will be done and I'll post it.
A 32 bit wide stack will make a big difference.
Keeping everything symmetrical makes life easier.
Thanks Chip!
Isn't the PAR register also short?
We have PTRA and PTRB on Prop2, and no PAR, right?
Wait, you mean the width of the parameters from COGINIT, right? Those can be widened, too.
I've got CV-A9 in the oven and I'm going to bed. I'll recompile everything else later.
I will have me a slice of that CVA9 pie when it's out of the oven and cooling on the Google Drive rack.
The wider PTRx will enhance SETQ/COGINIT nicely too.
Yes. Thanks.
The widening to 32 bits will simplify things for the users. Fewer restrictions and lurking caveats is excellent news.
BTW I am on CVA9 too. Unless someone is testing another board, maybe you can save compiling the others for now.
https://drive.google.com/file/d/1l7yWVpljQN8OTV7d4Mok3ukAdjTwmoDX/view?usp=sharing
Could you please see if this works as expected? Tonight I will get to widening the COGINIT conduit.
Thanks.
Sure, one immediate benefit of PTRx holding 32 bits is better memory management.
A simple routine can be called to check PTRx: if the address is on-chip, a single opcode is used; if it is off-chip, more code is employed to get that R/W done.
The user does not need to care where the data is.
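A rough sketch of that dispatch idea in Python, just to show the shape of it. Everything here is an assumption for illustration: treating any pointer with nonzero top 12 bits as off-chip, the name `read_byte`, and the `external_read` callback standing in for whatever off-chip access code would be needed.

```python
HUB_LIMIT = 1 << 20   # assumed: 20 address bits reach on-chip hub RAM


def read_byte(ptr, hub_ram, external_read):
    """Read one byte via a 32-bit pointer, hiding where the data lives."""
    if ptr < HUB_LIMIT:
        # On-chip: the equivalent of a single read opcode.
        return hub_ram[ptr]
    # Off-chip: hand the full 32-bit address to the slower path.
    return external_read(ptr)
```

The caller passes the same pointer either way, and only the routine knows which path was taken.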
Good point.
Stack and PTRx widening working as expected.
Resulted in