Discussion about using LUT as a STACK
Cluso99
in Propeller 2
The discussion about the possibility of using LUT as a stack originated in Peter's Tachyon thread.
Here is the basis of that discussion...
Peter,
I'm not able to decipher which stacking levels are in use there. Was it intended to help answer Chip's question of whether there is a strong case for using the LUT for stacking?
Chip,
Another stacking variant that might be more palatable to all is for the CALLA/B, PUSHA/B instructions being able to stack to CogRAM and/or LUTRAM. Would that be feasible? I guess that cuts off a chunk of Hub addresses though.
Wow! Make CALLA/CALLB/RETA/RETB address sensitive, just like cog/hub instruction fetching is, so that addresses $200..$3FF use the lut, instead of the hub. I really like that, because it means we don't need a whole extra set of instructions and pointers to use lut for stack. Excellent!
Are there any other sleeper purposes for address sensitivity?
P.S. This scheme would work only for CALLA/CALLB/RETA/RETB, since they are dedicated instructions that could easily be redirected. PUSHA/PUSHB/POPA/POPB are actually cases of WRLONG/RDLONG, though, and to redirect them, as well, would mean that hub $200..$3FF would not be reachable for R/W. So, this would only work for calls and returns.
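Here is a rough Python model of the address-sensitive idea. It is not P2 code. `ToyCog` and its methods are hypothetical names, and PTRA is assumed to step by 1 in both regions for simplicity. CALLA/RETA send PTRA values in $200..$3FF to the lut. PUSHA/POPA are WRLONG/RDLONG underneath, so they always go to hub.

```python
LUT_LO, LUT_HI = 0x200, 0x3FF  # PTRA range redirected to lut in the proposal


class ToyCog:
    """Toy model: CALLA/RETA are address sensitive; PUSHA/POPA always use hub.

    Simplification (assumed): PTRA steps by 1 in both regions, whereas a
    real hub stack would step by 4 bytes per long.
    """

    def __init__(self, ptra):
        self.ptra = ptra
        self.hub = {}         # sparse hub memory: address -> long
        self.lut = [0] * 512  # 512-long lut
        self.pc = 0

    def _in_lut(self, addr):
        return LUT_LO <= addr <= LUT_HI

    def _store(self, addr, value):
        if self._in_lut(addr):
            self.lut[addr & 0x1FF] = value
        else:
            self.hub[addr] = value

    def _load(self, addr):
        if self._in_lut(addr):
            return self.lut[addr & 0x1FF]
        return self.hub.get(addr, 0)

    def calla(self, target):
        # push the return address wherever PTRA points, then jump
        self._store(self.ptra, self.pc + 1)
        self.ptra += 1
        self.pc = target

    def reta(self):
        # pop the return address from wherever PTRA points
        self.ptra -= 1
        self.pc = self._load(self.ptra)

    def pusha(self, value):
        # really WRLONG, so always hub
        self.hub[self.ptra] = value
        self.ptra += 1

    def popa(self):
        # really RDLONG, so always hub
        self.ptra -= 1
        return self.hub.get(self.ptra, 0)


cog = ToyCog(ptra=0x200)
cog.pc = 10
cog.calla(100)     # return address 11 goes to lut[0]
print(cog.popa())  # prints 0: POPA looked in hub, not lut
```

The last two lines show what goes wrong when only calls and returns are redirected. CALLA puts the return address in the lut, but POPA reads the hub long at the same address.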
David Betz wrote: » Umm... Doesn't that mean that if I have PTRA pointing into LUT and do a CALLA to some function, and that function does a POPA to get its return address, it will fetch the wrong value?
Yes, because PUSHA/PUSHB/POPA/POPB are actually WRLONG/RDLONG, only CALLA/CALLB/RETA/RETB could have this lut functionality. Maybe that makes it too confusing, since people would assume that PUSHx and POPx work, too. To do a pop without caring about the data, you'd just use 'SUB PTRx,#1', while a push would be 'WRLUT data,PTRx' plus 'ADD PTRx,#1'.
It would be easy to support PTRx expressions for RDLUT/WRLUT, where only the lower 9 bits are used for the lut address. So, a pop from lut would be 'RDLUT data,--PTRx' and a push would be 'WRLUT data,PTRx++'. If we made discrete PUSHx/POPx instructions that weren't just aliases for WRLONG/RDLONG, we could handle this better by invoking either WRLONG/RDLONG or WRLUT/RDLUT, depending on whether PTRx is within the lut range. That would be pretty easy.
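Below is a minimal sketch of the lut stack described above. It assumes a PTRx-style pointer whose low 9 bits select the lut long, and `LutStack` is a hypothetical name. The methods map to the instructions as follows:
- `push` mirrors 'WRLUT data,PTRx++', which is the same as 'WRLUT data,PTRx' plus 'ADD PTRx,#1'.
- `pop` mirrors 'RDLUT data,--PTRx'.
- `drop` mirrors 'SUB PTRx,#1'.

```python
class LutStack:
    """Lut stack addressed through a PTRx-like pointer.

    Only the low 9 bits of the pointer select a lut long, so the stack
    wraps within the 512-long lut.
    """

    def __init__(self, base=0):
        self.lut = [0] * 512
        self.ptr = base

    def push(self, data):
        # 'WRLUT data,PTRx++': write at PTRx, then post-increment
        self.lut[self.ptr & 0x1FF] = data & 0xFFFFFFFF
        self.ptr += 1

    def pop(self):
        # 'RDLUT data,--PTRx': pre-decrement, then read
        self.ptr -= 1
        return self.lut[self.ptr & 0x1FF]

    def drop(self):
        # 'SUB PTRx,#1': discard the top entry without reading it
        self.ptr -= 1


s = LutStack(base=0x1F0)
s.push(1)
s.push(2)
s.drop()
print(s.pop())  # prints 1
```

The sketch masks only the address and leaves the pointer itself unmasked. That is one possible reading of "only the lower 9 bits are used". With it, the stack wraps around the 512 longs instead of spilling past them.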
We would have these discrete instructions which would use hub or lut, based on the PTRx address:
PUSHA D/#
PUSHB D/#
POPA D
POPB D
CALLA D/#/@
CALLB D/#/@
RETA
RETB
Meanwhile, RDBYTE/RDWORD/RDLONG and WRBYTE/WRWORD/WRLONG would always access hub only.
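The routing rule for the discrete instructions listed above can be written as a simple lookup. This is an assumed reading of the proposal, with the lut window taken as $200..$3FF from the earlier post. `target_memory` is a hypothetical helper.

```python
# Hypothetical routing rule for the proposed discrete stack instructions.
STACK_OPS = {"PUSHA", "PUSHB", "POPA", "POPB", "CALLA", "CALLB", "RETA", "RETB"}
HUB_ONLY_OPS = {"RDBYTE", "RDWORD", "RDLONG", "WRBYTE", "WRWORD", "WRLONG"}
LUT_LO, LUT_HI = 0x200, 0x3FF


def target_memory(op, ptr):
    """Return "lut" or "hub" for an instruction using pointer value ptr."""
    op = op.upper()
    if op in HUB_ONLY_OPS:
        return "hub"  # plain reads/writes never see the lut
    if op in STACK_OPS:
        return "lut" if LUT_LO <= ptr <= LUT_HI else "hub"
    raise ValueError(f"not a stack or hub access instruction: {op}")


for op in ("POPA", "RDLONG", "CALLB"):
    print(op, target_memory(op, 0x240))  # POPA lut, RDLONG hub, CALLB lut
```

Under this rule, David's case from the post above goes away. A POPA with PTRA in the lut window reads the lut, which is where CALLA wrote.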
Comments
It is a full 32 bits wide, so pushing 32-bit values would work.
Chip,
Since you are using PTRA/PTRB to point to HUB or LUT, could it also be used to point to COG (stack in cog)?
It seems to me that the internal 8-level, 22-bit stack would now be redundant. Can we get rid of it and its supporting instructions? IMHO it is much better to be able to put the stack in COG/LUT/HUB just by setting PTRA/PTRB. It also settles the argument over making the stack wider (32-bit) or deeper than 8 levels.
This actually makes a further good case for the LUT Max_Adr-justified mapping (which brings the bonus of 1K of unbroken code), as that keeps clear of any accidental PTR roll into LUT space.
LUT may be used for other tasks, so this needs to be an explicit user decision.
No, the PC address wouldn't have anything to do with this.
Phew, when you said earlier:
".... Make CALLA/CALLB/RETA/RETB address sensitive, just like cog/hub instruction fetching is, so that addresses $200..$3FF use the lut, instead of the hub."
I thought the mention of 'just like cog/hub instruction fetching' meant the PC got into the mix here, like it does to change cog/hub operation.
If the code is PC agnostic, and only PTRx determines the action, then that is good.
Yes, I think that needs to stay, no matter what, because it's bullet-proof and fast. Eight levels is enough for any cog/lut program. Hub programs might as well use hub for their stacks.
We'll see about some intermediate solution.
The safest way to debug is to run the debugger in hubexec, with any registers it needs backed up to hub first.
FYI, I built a P1 single stepper that worked on both PASM and SPIN. It had zero footprint in the cog because it resided entirely in the shadow RAM register space (used for an LMM engine). Shadow RAM is being used in the P2, but we now have hubexec.
INA and INB shadow registers serve as the debug interrupt jump and return vectors. They can only be read and written via the SETBRK command or within a debug ISR.
The use of INA & INB shadow registers as the debug interrupt jump and return vectors is nice.
Since they can only be read and written using SETBRK, perhaps when a COGINIT is performed on a cog that was already running, it could save the PC plus the C&Z flags in the INB shadow register? And if the cog was not running, could it clear the INB shadow register?
This would permit an errant cog to be force-interrupted and interrogated. INA would not be used, because the COGINIT would start in hubexec debug code. By examining the contents of INB, the debugger could determine whether the cog was previously running or stopped. It could be set to continue by returning via INB (RET0 ???). Of course there are caveats, such as the cog being in a wait or rep loop, etc. However, debuggers always have caveats.
It's going to work a lot better to just push C/Z/PC onto the hardware stack when it gets COGINIT'd. No special mux is needed, just the signal to do it.
An obvious question...
If the stack is going to expand to allow storing C/Z, can users pass two booleans back from functions using the stack?
The stack has always been wide enough to store the C&Z flags plus the address, hence 22 bits: a 20-bit address (hub addresses are byte addresses) plus C and Z.
So yes, you could put 2 Booleans instead provided you did the appropriate pop/push.
A PUSH pushes D[21:0], while a RET pops bit 21 into C if WC, bit 20 into Z if WZ, and bits 19:0 into the PC.
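The 22-bit layout described here can be checked with a few lines of Python. `pack_entry` and `ret_wcz` are hypothetical helpers. The pop/OR/push sequence in the comment is an assumed way a function might hand back two booleans, not a documented idiom.

```python
PC_BITS = 20
PC_MASK = (1 << PC_BITS) - 1  # bits 19:0


def pack_entry(c, z, pc):
    """22-bit stack entry: bit 21 = C, bit 20 = Z, bits 19:0 = PC."""
    return (int(bool(c)) << 21) | (int(bool(z)) << 20) | (pc & PC_MASK)


def ret_wcz(entry):
    """What a RET with WC and WZ would restore from the entry: (C, Z, PC)."""
    return bool((entry >> 21) & 1), bool((entry >> 20) & 1), entry & PC_MASK


# Assumed idiom for returning two booleans: the function pops its return
# address, ORs in the flag bits, pushes it back, then RETs with WC,WZ.
ret_addr = 0x00ABC
entry = pack_entry(True, False, ret_addr)
print(hex(entry), ret_wcz(entry))
```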
So, what's the idea? Would the COGINIT just translate into executing a CALL (instead of a JMP) on the target COG?
This feature only exists so that you can find out where the cog was executing before the COGINIT. The new program would have to do a POP to find out where the last program left off.
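Here is a rough model of the COGINIT idea. It rests on two assumptions. First, the hardware stack keeps its contents across the restart. Second, it drops its oldest entry on overflow. `HwStack` and `coginit_restart` are hypothetical names.

```python
from collections import deque


class HwStack:
    """8-level, 22-bit hardware stack (toy model).

    Assumption: pushing a ninth entry silently discards the oldest one.
    """

    def __init__(self):
        self.entries = deque(maxlen=8)

    def push(self, value):
        self.entries.append(value & 0x3FFFFF)

    def pop(self):
        return self.entries.pop()


def coginit_restart(stack, c, z, pc):
    """Hypothetical: before restarting, push where the old program was."""
    stack.push((int(bool(c)) << 21) | (int(bool(z)) << 20) | (pc & 0xFFFFF))


# The new (debugger) program pops to learn where the old one left off.
stack = HwStack()
coginit_restart(stack, c=False, z=True, pc=0x01234)
entry = stack.pop()
was_c = bool((entry >> 21) & 1)
was_z = bool((entry >> 20) & 1)
was_pc = entry & 0xFFFFF
print(was_c, was_z, hex(was_pc))
```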
Have you ever considered implementing the LUT as a 256 x 64-bit true dual-ported RAM, with independent long-select RD and WR controls?
Could an approach like this enable simultaneous, independent use of the streamer and code+stack operations, provided that each class is confined to its own longs?
If an application only needs code+stack usage, A0 could select the appropriate long, making 512 contiguous longs accessible.
But when the streamer is meant to be used at the same time, since it could be writing into LUT space concurrently, the RAM would need to be divided into upper/lower long halves to avoid any interference.
I'm not sure if it's feasible or really worth the effort, but if it is and it solves the problem, why not?
Henrique
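Henrique's addressing scheme can be sketched as follows. In segregated mode, the sketch assumes the low long of each row goes to code+stack and the high long to the streamer; the proposal does not say which half is which. `Lut256x64` is a hypothetical name. The model shows only the addressing, not the simultaneous port timing that true dual porting would provide.

```python
class Lut256x64:
    """256 rows of two longs each = 512 longs total (addressing model only)."""

    def __init__(self):
        self.rows = [[0, 0] for _ in range(256)]

    # Contiguous view: 9-bit long address -> row = addr[8:1], long = A0.
    def read(self, addr):
        return self.rows[(addr >> 1) & 0xFF][addr & 1]

    def write(self, addr, value):
        self.rows[(addr >> 1) & 0xFF][addr & 1] = value & 0xFFFFFFFF

    # Segregated view: each side sees 256 longs of its own half.
    def stack_write(self, index, value):
        self.rows[index & 0xFF][0] = value & 0xFFFFFFFF  # assumed: low long

    def streamer_write(self, index, value):
        self.rows[index & 0xFF][1] = value & 0xFFFFFFFF  # assumed: high long


lut = Lut256x64()
lut.stack_write(2, 0x11111111)     # code+stack side: low long of row 2
lut.streamer_write(2, 0x22222222)  # streamer side: high long of row 2
print(hex(lut.read(4)), hex(lut.read(5)))
```

Because the two sides in segregated mode never address the same long, a streamer write cannot clobber a stack entry. In contiguous mode, all 512 longs are reachable through A0.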
We already have the ALTDS instruction which can modify the following instruction in many ways.
From the ALTDS notes for the SSS field we have
So if we had a couple of new assembler aliases like
Simply initialize the ADRA reg with the lut stack base address.
Before "popping" from the LUT, a simple sub adra,#1 compensates for the lack of pre-decrement.
We almost have all the pieces we need to implement a stack, except for a pre-decrement operation. Chip?
I feel better now.