XBYTE question
TonyB
Posts: 73
in Propeller 2
I've looked at xbyte.spin2 but the inner workings of XBYTE are not completely clear to me.
How is LUT address[8:0] for RDLUT formed exactly, for the different PUSH #$1Fx?
        push    #$1F8           'push $1F8 for xbyte with 8-bit lut index
_ret_   setq    #$100           'start xbyte with lut base = $100, no stack pop

{
clock   phase   hidden                          description
----------------------------------------------------------------------------------------------------------------------
1       go      RFBYTE byte                     last clock of instruction which is executing a RET/_RET_ to $1F8..$1FF
2       get     RDLUT @byte, write byte to PA   1st clock of 1st cancelled instruction
3       go      LUT long --> next D             2nd clock of 1st cancelled instruction
4       get     EXECF D, write GETPTR to PB     1st clock of 2nd cancelled instruction
5       go      EXECF                           2nd clock of 2nd cancelled instruction
6       get     flush pipe                      1st clock of 3rd cancelled instruction
7       go      flush pipe                      2nd clock of 3rd cancelled instruction
8       get                                     1st clock of 1st instruction of bytecode routine, loop to 1 if _RET_
}
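For the one case the comments above spell out (8-bit LUT index, LUT base = $100), the RDLUT address can be modelled as the base OR'd with the bytecode. This is only a sketch in Python, not the hardware's actual logic: `lut_address` and its `index_bits` parameter are hypothetical, and the assumption that a narrower index would take the low bits of the bytecode is mine, not something stated in this thread.

```python
LUT_SIZE = 512  # cog LUT addresses are 9 bits wide, address[8:0]


def lut_address(base, bytecode, index_bits=8):
    """Hypothetical model: upper address bits come from the SETQ base,
    the low `index_bits` bits come from the fetched bytecode."""
    if not 1 <= index_bits <= 8:
        raise ValueError("index_bits must be between 1 and 8")
    index_mask = (1 << index_bits) - 1
    # Keep only the base bits that sit above the index field.
    base_part = base & (LUT_SIZE - 1) & ~index_mask
    # ASSUMPTION: the index is taken from the low bits of the bytecode.
    return base_part | (bytecode & index_mask)


# Case from the comments: lut base = $100, 8-bit index.
print(hex(lut_address(0x100, 0x2A)))
```

With base $100 and an 8-bit index, every bytecode lands in the upper half of the LUT ($100..$1FF) in this model.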
Comments
I am updating the docs to cover this. It should be done in about half an hour:
https://docs.google.com/document/d/1x9mCjSTTPy2FBnYZlMxz7Vk6tstnhhj-Hgv_ueolMUI/edit?usp=sharing
Do SETQ and SETQ2 write to the same internal 9-bit register or counter?
It's done!
SETQ/SETQ2 writes to a 32-bit register that is always used by the next instruction. Interrupts are inhibited on SETQ/SETQ2, so that it will be reliably coupled to the next instruction.
I'm approaching the P2 from the assembler and hardware direction and I'm interested in how much the P2 could replace FPGAs. The COG LUTs will be very useful and quite often the address inputs will come from two separate sources, changed at different frequencies. Instead of combining these in software, it would be simpler and quicker for this to be available in hardware if required for all RDLUT instructions, not just those that are part of XBYTE.
The following assumes that the Q register output is preserved until the next SETQ or SETQ2 instruction. Each RDLUT address bit could be either the corresponding RDLUT S bit or SETQ bit as given by a new 9-bit LUTMUX register, written by a new D-only instruction and cleared on reset. 0 selects S and 1 selects Q.
LUTMUX would allow all LUT base and index permutations in the documentation, although some of the LUT address bits would be swapped as the bytecode would not be shifted. This is not a problem as the LUT data could be written in a different order. In fact LUTMUX would allow much more, SSSQQQSSS or QQQSSSQQQ or SQQQQQQQQ or QSSSSSSSS, etc.
In the SQQQQQQQQ example, S could be a bit from a pattern byte shifted left at eight times the frequency that its attribute byte QQQQQQQQ is written. As LUTMUX would determine the LUT address muxing, it would be possible to push only one high COG address to start XBYTE, e.g. PUSH #$1FF.
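The per-bit selection proposed above can be sketched in Python as a plain bitwise mux. LUTMUX is only a proposal in this thread, so `lutmux_address` and `pattern_to_mask` are hypothetical helper names used to illustrate the idea, not anything the P2 provides.

```python
ADDR_MASK = 0x1FF  # 9-bit LUT address


def pattern_to_mask(pattern):
    """Turn a 9-character string such as 'SSSQQQSSS' into a LUTMUX value.
    'S' becomes 0 (take the bit from S), 'Q' becomes 1 (take it from Q)."""
    if len(pattern) != 9 or set(pattern) - {"S", "Q"}:
        raise ValueError("pattern must be 9 characters of 'S' or 'Q'")
    return int(pattern.replace("S", "0").replace("Q", "1"), 2)


def lutmux_address(s, q, lutmux):
    """Each address bit is the S bit where LUTMUX is 0, the Q bit where it is 1."""
    return ((s & ~lutmux) | (q & lutmux)) & ADDR_MASK


# SQQQQQQQQ: top bit from S (the pattern bit), low eight bits from Q (the attribute byte).
mux = pattern_to_mask("SQQQQQQQQ")
print(hex(lutmux_address(0x100, 0xAB, mux)))
```

With LUTMUX = 0 (the proposed reset value) the address is just S, so existing RDLUT behaviour would be unchanged.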
Is XBYTE exited by popping $1F8-$1FF with a POP? If so and if XBYTE was started with a PUSH #$1Fx, what will be in D[21:9] and C & Z if enabled?
Can SETQ be used anywhere in the executed bytecode to change the LUT base address?
Can CALLs be made in the bytecode? Is the 8-level (?) stack circular?
It can only start with a _ret_ SETQ combination.
I think SETQ would return to its normal function during an XBYTE sequence.
I believe CALLs can be made within an XBYTE sequence but would have to be allowed for in the SKIP pattern.
The hardware stack is not circular, so on overflow the entry at the bottom of the stack is lost.
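A small Python model of that overflow behaviour, assuming the 8-level depth mentioned in the question. `HardwareStack` is a hypothetical name, and what a pop from an empty stack returns on real hardware is not modelled here.

```python
class HardwareStack:
    """Non-circular stack model: pushing onto a full stack drops the oldest entry."""

    DEPTH = 8  # ASSUMPTION: depth taken from the "8-level (?)" in the question

    def __init__(self):
        self.slots = []  # last element is the top of the stack

    def push(self, value):
        self.slots.append(value)
        if len(self.slots) > self.DEPTH:
            self.slots.pop(0)  # the bottom-of-stack entry is lost

    def pop(self):
        # Raises IndexError when empty; real hardware behaviour is not modelled.
        return self.slots.pop()


stack = HardwareStack()
for value in range(9):  # nine pushes into eight levels
    stack.push(value)
popped = [stack.pop() for _ in range(8)]
print(popped)
```

After nine pushes the first value (0) has fallen off the bottom, so only the last eight come back out.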
Would pushing a value other than $1F8-$1FF be the simplest way to exit XBYTE?
I have an XBYTE-related question about SKIPF. What happens if a CALL is one of the instructions not skipped halfway through a sequence?
Then the remaining skip pattern gets applied to the code after the branch.
Choices 1-8 and 10-17 are independent. The code for 8 and 17 already exists in two routines, both too long to fit in the skip sequence in full and there is not enough space in cog RAM to duplicate them anyway.
If SKIP/SKIPF/EXECF could treat a subroutine as a single instruction, by suspending or "freezing" the skipping within the subroutine itself, it would make this excellent new mechanism even better and more powerful.
However that is not the way things work at the moment and some tweaking would be needed. The following suggestions might not be practicable but nothing ventured, nothing gained.
A new 1-bit Skip flag or SF could be created. This would be "Skip Freeze" in fact, cleared by reset or SKIP/SKIPF/EXECF and set by a CALL after the previous SF value has been pushed onto the stack along with PC, C & Z, as bit 31.
When SF = 1 (following a CALL) the skip bit pattern shifter would be disabled. At the end of the routine SF = 0 would be popped and skipping would restart. If the routine called another one, SF = 1 would be pushed and popped with skipping frozen in the second routine too.
The instruction at the return address might need cancelling and replacing with NOP if it is being skipped. It might be possible for skipping to be interrupted but that is not the main objective. Although the code above has only two calls, other examples could have more.
Deleted - see below for simpler solution.
Could you break out the "call abc" and "call xyz" cases into their own routines? It would add 16 copies of the routines, if all cases really are independent (and used).
Another way to handle "call xyz" would be to replace it with a flag-setting operation; e.g. at the beginning of your routine clear C, replace "call xyz" with something that sets C, and then before instr_18 do an unconditional "if_c call xyz". Once the SKIPF pattern is down to all 0's you can use call (or conditional call) all you want without fear of causing conflicts.
You mention that COG memory is full. Is there space in LUT memory to place routines?
Eric
Thanks for the reply. The cases really are independent with all permutations bar one possible, so the code snippet would handle 63 bytecodes (or 90+ instructions with variants) if it could work.
Cog RAM is really tight, I need all of the LUT as LUT and I'm using both C and Z as special-purpose flags. I might be able to jump to xyz, though, which would avoid that call.
Getting to abc is easy, it's how to choose one of 10-17 afterwards. I wouldn't have made my suggestion if I could come up with some other way. Skipping as-is will do the Spin2 interpreter but I'm sure other P2 users would want to do the same thing as me.
You could always do something like:

All the branches (except possibly for the call to abc) can be done unconditionally, so the SKIPF pattern will hold all 0's after the 1-8 choice and won't cause any problems for the abc subroutine, nor for the xyz subroutine.

I've used XBYTE to construct a ZPU interpreter with skipping as-is, so it's certainly usable for more than Spin2. It would be nice if a call could be treated as a single instruction for skip purposes, but that sounds complicated and I *really* don't think we want to delay the P2 any more than it already has been!
Eric
Eric, thanks for the workaround, in the absence of proper skip call handling. I'm not sure how many extra clock cycles a return adds when it's a prefix - is it 2 or 4? Instructions 10-16 would take 10 or 12 cycles altogether compared to only 2 in my code, a big difference.
The other issue is that instructions which should really be at the end have to be moved to the beginning. Creating skip patterns is enough work without having to jump through more mental hoops. The P2 should be as easy to program as possible.
* * * * * * * * * *
What I suggested before was too complicated. Pushing or popping is not necessary and here is a much simpler alternative:
If SKIP/SKIPF/EXECF could treat a subroutine as a single instruction, by suspending or "freezing" the skipping within the subroutine itself, it would make this excellent new mechanism even better and more powerful.
A new 1-bit Skip_Freeze flag would be needed, set by a CALL and reset by a RET or SKIP or SKIPF or EXECF. The skip bit pattern shifter would be disabled when Skip_Freeze = 1. Apart from possible pipelining to keep things in sync, logically that's it.
Chip,
While you're sorting out the smart pins, could you also please consider adding "easy calls" to skipping as described? It would be ace and I've done most of the hard work already - the thinking!
I agree that skipping would be better if it was suspended within CALL'd code. I'll see about making it work that way.
I guess that becomes a 'rule' - what happens if someone accidentally breaks that rule?
What happens to skip structures should an interrupt occur in the middle of a skip action? (Effectively that is a call?)
Please be *extremely* conservative about this... while I agree that suspending SKIP over call would be handy sometimes, I don't think it's worth delaying the hardware over.
Eric
That won't work if there are nested subroutines. It would have to be a counter, which means even more logic.
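The nesting problem can be seen in a toy Python model that compares a 1-bit freeze flag with a call-depth counter. Both are hypothetical models of the proposals in this thread; `run`, `use_counter` and the routine names `outer` and `inner` are invented for the example.

```python
def run(program, pattern, use_counter, entry="main"):
    """Skipping is frozen while depth > 0.

    use_counter=False: depth is forced to 1 on CALL and 0 on RET (1-bit flag).
    use_counter=True:  depth counts nested calls.
    """
    executed, return_stack = [], []
    depth = 0
    routine, pc = entry, 0
    while pc < len(program[routine]):
        instr = program[routine][pc]
        pc += 1
        if depth > 0:
            skip = 0  # frozen: no pattern bit is consumed
        else:
            skip, pattern = pattern & 1, pattern >> 1
        if skip:
            continue
        executed.append(instr)
        if instr.startswith("call "):
            return_stack.append((routine, pc))
            routine, pc = instr.split()[1], 0
            depth = depth + 1 if use_counter else 1
        elif instr == "ret":
            if not return_stack:
                break
            routine, pc = return_stack.pop()
            depth = depth - 1 if use_counter else 0
    return executed


program = {
    "main": ["instr_1", "call outer", "instr_3", "instr_4"],
    "outer": ["outer_1", "call inner", "outer_3", "ret"],
    "inner": ["inner_1", "ret"],
}
# Bit 2 is meant to skip instr_3.
print(run(program, 0b100, use_counter=False))
print(run(program, 0b100, use_counter=True))
```

With the flag, the inner RET unfreezes skipping too early and bit 2 hits `outer_3`; with the counter, skipping stays frozen until the outer RET and bit 2 reaches `instr_3`.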
In an ideal world I agree that skip would treat the whole call+subroutine as one instruction. On the other hand, in an ideal world the P2 would have shipped already. It's a tricky balancing act. My feeling is this feature should be added only if it's trivial. We really really need to have a real freeze!
Eric
In this test I fire off an interrupt in the middle of a SKIP action.
The main loop functions as expected with the ISR unaffected by the SKIP.
It would seem that the SKIP action is "frozen" during interrupts.
On the other hand, there are a lot of COGS.
And they would be allowed in PASM blocks.
In return for that, we avoid a lot of complex interrupt state management. IMHO, this is a big win, given we have a lot of interrupt event triggers spread out over all the COGS.