Question on hardware stack "PUSH/POP" data width
ozpropdev
Posts: 2,792
Hi All
I've noticed that if I PUSH a value onto the hardware stack only 22 bits are stored.
Is this the "hard wired" data width or some other issue?
If it's hard wired that way then that needs to be documented in "the rules of play"
87654321 'pushed on stack
00254321 'pops this result
Test code
con
  sys_clk   = 50_000_000
  baud_rate = 115_200
  rx_pin    = 63
  tx_pin    = 62

dat
                orgh    1
launch          setq    #$1F7
                rdlong  cogregs,ptrb[(code-launch)>>2]
                jmp     #mycode
'============================================================
code            org
cogregs         long    0[16]
mycode          setb    outb,#tx_pin            'set tx output high
                setb    dirb,#tx_pin
wait4start      testb   inb,#rx_pin     wz      'press a key to start in PST
        if_nz   jmp     #wait4start
                mov     val,test_pattern        'show pattern
                call    #show_hex
                call    #newline
                push    test_pattern            'push/pop val, show result
                pop     val
                call    #show_hex
                call    #newline
here            jmp     @here
test_pattern    long    $87654321
bit_time        long    sys_clk / baud_rate
val             long    0
timer           long    0
dx              long    0
nibc            long    0
'============================================================
newline         mov     dx,#13
send_byte       setb    dx,#8
                shl     dx,#1
                getcnt  timer
                rep     @endrep,#10
                testb   dx,#0   wz
                setbnz  outb,#tx_pin
                addcnt  timer,bit_time
                waitcnt
                shr     dx,#1
endrep          ret
'============================================================
show_hex        mov     nibc,#8                 '8 x nibbles
show_hex2       mov     dx,val
                shr     dx,#28
                cmp     dx,#9   wz,wc
        if_a    add     dx,#"A"-10
        if_be   add     dx,#"0"
                call    @send_byte
                shl     val,#4
                djnz    nibc,@show_hex2
                ret
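The result above is what you would get if the stack simply kept the low 22 bits of the pushed long. A quick check of that arithmetic in Python (this only models the masking, not the hardware; the helper name `truncate_to_width` is made up for illustration):

```python
# Width observed in the test above; an observation, not a documented spec.
STACK_WIDTH_BITS = 22

def truncate_to_width(value, width=STACK_WIDTH_BITS):
    """Keep only the low `width` bits of a value (hypothetical helper)."""
    return value & ((1 << width) - 1)

pushed = 0x87654321
popped = truncate_to_width(pushed)
print(f"{pushed:08X} -> {popped:08X}")  # 87654321 -> 00254321
```

The mask is $3FFFFF, so the top ten bits of the test pattern are dropped, which matches the value PST showed.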
Comments
It's useful for CALL/RET. The PUSH/POP are there so that you can inspect and modify return addresses.
I guess this means that Z and C flags don't come back to the caller.
But, sounds like subroutine could pop, change flag setting, then push before returning to have a flag set upon return to caller. Is this the idea (along with changing where it returns to)?
It might be useful to pass parameters to subroutines.
Bean
The hardware stack is only 8 or 16 deep, it's just meant for return addresses/flags. It's implemented internal to the cog as hidden registers, not taking up any actual memory.
It seems quirky and confusing for PUSH/POP behavior to vary by context (even on a small stack).
Is fixing that a zero-cost fix? If the stack is already 32 bits wide, it seems so.
If I recall correctly, those instructions are using PTRA/B, hence the A/B suffix.
Maybe PUSH/POP could be renamed to PUSHR/POPR to indicate the internal return stack.
or even PUSH22/POP22 ?
I wouldn't, at least for now. At only 8 deep, you'll get into trouble real quick if you're also pushing/popping data.
Maybe rename them, though. I agree that PUSH/POP may be a bit too generic (implying generic 32-bit width).
Don't you mean "aux" ram?
Add PTRA/PTRB behavior to RDLUT/WRLUT. However, it might make more sense to use ADRA/ADRB, but the functionality would be basically the same. This would provide an easy means to maintain stacks in the LUT.
(I suspect you Forth programmers would really like this change.)
From there, you could also add a LUT version of CALLA/B and RETA/B.
note: with those two changes, it may also diminish the need for the hidden 8-deep stack. Just a thought...
Hmm. 32 bits is more consistent and expected; then PUSH/POP work like on any other MCU and do not need to be renamed.
Can users read the stack pointer, and what happens on stack overflow/underflow?
Can debug read the stack contents? E.g. would POP 8x leave SP back where it was, having extracted the whole stack?
That's certainly what was on my mind. That and I feel even more address registers would be good. And since the memory addressing has been flattened somewhat there's a natural tendency to want to use the same indirection capabilities everywhere.
cccc 1011011 CZ0 DDDDDDDDD SSSSSSSSS : RDLUT D, ADRx
cccc 1011011 CZ1 DDDDDDDDD SSSSSSSSS : WRLUT D, ADRx
Incidentally, this would return one bit to the index field, allowing -32 to +31. Unfortunately, that wouldn't quite be consistent with the PTRx formatting. If there were one more spare opcode, the memory hubops could have been re-arranged in the same way, though.
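For reference, -32 to +31 is exactly the range of a 6-bit two's-complement field. A small Python sketch of decoding such an index (the 6-bit width is inferred from the range quoted above, and `decode_signed_index` is a hypothetical name, not part of any assembler or tool):

```python
def decode_signed_index(field, bits=6):
    """Interpret the low `bits` bits of `field` as a two's-complement index.

    The default of 6 bits is an assumption taken from the -32..+31 range.
    """
    field &= (1 << bits) - 1          # discard anything above the field
    sign_bit = 1 << (bits - 1)        # top bit of the field marks negatives
    return field - (1 << bits) if field & sign_bit else field

print(decode_signed_index(0b011111))  # 31
print(decode_signed_index(0b100000))  # -32
print(decode_signed_index(0b111111))  # -1
```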
Would a more conventional stack be able to map into either COG or LUT ?
This sounds like something that could be changed later, once the more fundamental things are worked out...
I agree, but figured there wasn't an available slot in the pipeline for that. But! If there *were*, then I'd suggest adding the two instructions (maybe call them something else... RDADR/WRADR maybe) and use the MSB of the S field to select cog or LUT. Then the rest of the field would still be identical to PTRx usage.
Glitch how? Just curious...
...or a streamer read being usurped by a CALL/RET, causing a bad pixel to be output.
That sounds like a bad outcome, so such a stack would need to be able to be placed elsewhere too....(in COG)
Unless that is easy to do, sounds best to keep the present design ?
My question above was: can a debug routine POP the HW stack 8 times and 'end up back where it was', and so make the stack visible during debug?
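To make the question concrete, here is a Python model of the behavior being asked about. It is a sketch under assumptions, not a description of the actual hardware: the depth of 8 and width of 22 bits come from this thread, while the wrap-around stack pointer is purely hypothetical and is exactly the point that needs confirming.

```python
class CircularStackModel:
    """Hypothetical 8-entry stack whose pointer wraps modulo its depth."""

    def __init__(self, depth=8, width=22):
        self.depth = depth                # assumed depth (thread says 8 or 16)
        self.mask = (1 << width) - 1      # 22 bits, as observed in the test
        self.slots = [0] * depth
        self.sp = 0                       # index of the next free slot

    def push(self, value):
        self.slots[self.sp] = value & self.mask
        self.sp = (self.sp + 1) % self.depth

    def pop(self):
        self.sp = (self.sp - 1) % self.depth
        return self.slots[self.sp]

    def dump(self):
        """Pop every slot once; under the wrap assumption sp ends where it began."""
        return [self.pop() for _ in range(self.depth)]

stack = CircularStackModel()
for addr in (0x100, 0x200, 0x300):
    stack.push(addr)

sp_before = stack.sp
contents = stack.dump()
print([hex(v) for v in contents])
print(stack.sp == sp_before)  # True
```

If the real stack wraps like this, a debug routine could pop 8 times to read everything without disturbing later RETs. If it instead stops or shifts in zeros on underflow, the routine would have to push the values back afterwards.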
I think, too, that the current stack design is adequate.
Then you can keep streamer activity in the low half (0 thru FF) and simultaneously have stack (or plain RDLUT/WRLUT) activity in the top half, in parallel.
My 0.02
Henrique
Cool, I'm sure novice users will get lost with the stack...as they do on all MCUs as they JMP out of a routine when they should RET...
Stack inspect will be important for Data use too...
(and I can see COP watchdog uses as well.. )
Do you mean with the nudge to 32b wide ?