XBYTE question

TonyB · 2017-05-23 23:42

I've looked at xbyte.spin2 but the inner workings of XBYTE are not completely clear to me.

		push	#$1F8		'push $1F8 for xbyte with 8-bit lut index
	_ret_	setq	#$100		'start xbyte with lut base = $100, no stack pop

{
clock	phase	hidden				description
----------------------------------------------------------------------------------------------------------------------
1	go	RFBYTE byte			last clock of instruction which is executing a RET/_RET_ to $1F8..$1FF

2	get	RDLUT @byte, write byte to PA	1st clock of 1st cancelled instruction
3	go	LUT long --> next D		2nd clock of 1st cancelled instruction
4	get	EXECF D, write GETPTR to PB	1st clock of 2nd cancelled instruction
5	go	EXECF				2nd clock of 2nd cancelled instruction
6	get	flush pipe			1st clock of 3rd cancelled instruction
7	go	flush pipe			2nd clock of 3rd cancelled instruction

8	get					1st clock of 1st instruction of bytecode routine, loop to 1 if _RET_
}

How is LUT address[8:0] for RDLUT formed exactly, for the different PUSH #$1Fx?

cgracey · 2017-05-24 01:04

TonyB,

I am updating the docs to cover this. It should be done in about half an hour:

https://docs.google.com/document/d/1x9mCjSTTPy2FBnYZlMxz7Vk6tstnhhj-Hgv_ueolMUI/edit?usp=sharing

TonyB · 2017-05-24 01:19

Thanks Chip.

Do SETQ and SETQ2 write to the same internal 9-bit register or counter?

cgracey · 2017-05-24 01:22

TonyB wrote: »

Thanks Chip. No rush as it's way past bedtime here already.

Do SETQ and SETQ2 write to the same internal 9-bit register or counter?

It's done!

SETQ/SETQ2 writes to a 32-bit register that is always used by the next instruction. Interrupts are inhibited on SETQ/SETQ2, so that it will be reliably coupled to the next instruction.

TonyB · 2017-05-25 00:29

Having seen the full details of how XBYTE works for the first time today, I hope it is not too late to make a suggestion.

I'm approaching the P2 from the assembler and hardware direction and I'm interested in how much the P2 could replace FPGAs. The COG LUTs will be very useful and quite often the address inputs will come from two separate sources, changed at different frequencies. Instead of combining these in software, it would be simpler and quicker for this to be available in hardware if required for all RDLUT instructions, not just those that are part of XBYTE.

The following assumes that the Q register output is preserved until the next SETQ or SETQ2 instruction. Each RDLUT address bit could be either the corresponding RDLUT S bit or SETQ bit as given by a new 9-bit LUTMUX register, written by a new D-only instruction and cleared on reset. 0 selects S and 1 selects Q.

LUTMUX would allow all LUT base and index permutations in the documentation, although some of the LUT address bits would be swapped as the bytecode would not be shifted. This is not a problem as the LUT data could be written in a different order. In fact LUTMUX would allow much more, SSSQQQSSS or QQQSSSQQQ or SQQQQQQQQ or QSSSSSSSS, etc.

In the SQQQQQQQQ example, S could be a bit from a pattern byte shifted left at eight times the frequency that its attribute byte QQQQQQQQ is written. As LUTMUX would determine the LUT address muxing, it would be possible to push only one high COG address to start XBYTE, e.g. PUSH #$1FF.
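
As a rough illustration of the per-bit selection proposed above, here is a minimal Python model. It is only a sketch of the suggestion, not of anything in the P2: the LUTMUX register does not exist, and the function and constant names are made up for this example. It assumes only the low 9 bits of Q and S take part.

```python
LUT_ADDR_MASK = 0x1FF  # LUT addresses are 9 bits, address[8:0]

def lutmux_address(s: int, q: int, lutmux: int) -> int:
    """Form a 9-bit LUT address bit by bit: LUTMUX bit = 0 takes S, 1 takes Q."""
    from_q = q & lutmux        # bits selected from the SETQ value
    from_s = s & ~lutmux       # bits selected from the RDLUT S operand
    return (from_q | from_s) & LUT_ADDR_MASK

# SQQQQQQQQ: address bit 8 comes from S, bits 7..0 come from Q.
SQQQQQQQQ = 0b0_1111_1111
addr = lutmux_address(s=0x100, q=0x5A, lutmux=SQQQQQQQQ)
print(hex(addr))  # 0x15a
```

With LUTMUX = 0 (the proposed reset value) the address is just S, so existing RDLUT code would behave as before.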

ersmith · 2017-05-25 01:43

Let's stop adding features and get this thing out the door. I'm sure there are things that could be done to improve XBYTE; there's always room for improvement. But the most important thing is to get it debugged and shipping, and the more features we add the longer that will take. I've used XBYTE to implement a ZPU interpreter, and used RDFAST and other techniques to implement a RISC-V interpreter, and I think the P2 will be fine for interpreting other processors as-is.

potatohead · 2017-05-25 05:51

Seconded

TonyB · 2017-05-27 00:52

A few more questions:

Is XBYTE exited by popping $1F8-$1FF with a POP? If so and if XBYTE was started with a PUSH #$1Fx, what will be in D[21:9] and C & Z if enabled?

Can SETQ be used anywhere in the executed bytecode to change the LUT base address?

Can CALLs be made in the bytecode? Is the 8-level (?) stack circular?

cgracey · 2017-05-27 01:56

XBYTE is just something that happens each time a RET/_RET_ to $1F8..$1FF occurs. What happens before and after that is completely independent.

ozpropdev · 2017-05-27 02:12

TonyB wrote: »

A few more questions:

Is XBYTE exited by popping $1F8-$1FF with a POP? If so and if XBYTE was started with a PUSH #$1Fx, what will be in D[21:9] and C & Z if enabled?

Can SETQ be used anywhere in the executed bytecode to change the LUT base address?

Can CALLs be made in the bytecode? Is the 8-level (?) stack circular?

XBYTE doesn't start with a POP if TOS is $1F8..$1FF.
It can only start with a _ret_ SETQ combination.

I think SETQ would return to its normal function during an XBYTE sequence.

I believe CALLs can be made within an XBYTE sequence but would have to be allowed for in the SKIP pattern.

The hardware stack is not circular, so the bottom of the stack is lost on overflow.

cgracey · 2017-05-27 04:26

_RET_ + SETQ with stack=$1F8..$1FF configures and executes XBYTE. The same thing without the SETQ just executes XBYTE using the last-set configuration, but not necessarily the last SETQ. The configuration is stored separately from normal SETQ data.

TonyB · 2017-05-27 23:23

LUT base could change frequently after XBYTE has started, sometimes at the end of every byte. If LUT index stays the same then _ret_ setq lutbase would be enough to do this, else a push #$1fx is also needed first - is that correct?

Would pushing a value other than $1F8-$1FF be the simplest way to exit XBYTE?

cgracey · 2017-05-28 03:27

Think of XBYTE as an instruction, not a mode. When a RET to $1F8..$1FF occurs, XBYTE executes, using the last configuration set by _RET_+SETQ. So, you want to execute your initial XBYTE by _RET_+SETQ, to put it into a known mode - the mode being the RET address which sets the number of bytecode lookup bits, the LSB of the SETQ value which selects between MSBs and LSBs of the bytecodes, and the MSB(s) of the SETQ value which select(s) the base address of the LUT table.

TonyB · 2017-06-05 18:30

Thanks Chip, that finally sunk in (last week).

I have an XBYTE-related question about SKIPF. What happens if a CALL is one of the instructions not skipped halfway through a sequence?

cgracey · 2017-06-07 02:46

TonyB wrote: »

Thanks Chip, that finally sunk in (last week).

I have an XBYTE-related question about SKIPF. What happens if a CALL is one of the instructions not skipped halfway through a sequence?

Then the remaining skip pattern gets applied to the code after the branch.
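
To make that concrete, here is a toy Python model of a skip pattern being consumed one bit per instruction, LSB first, with 1 meaning "skip". It is a sketch under stated assumptions, not a description of the silicon: the program layout, instruction names and addresses are invented, and timing is ignored entirely.

```python
def run_with_skip(program, start, pattern):
    """Return the instructions actually executed under a skip pattern.

    program maps address -> (mnemonic, operand). 'call' and 'ret' are the
    only control-flow instructions in this toy model.
    """
    pc, stack, executed = start, [], []
    while pc in program:
        op, arg = program[pc]
        skip_this = pattern & 1   # LSB governs the instruction at pc
        pattern >>= 1             # the pattern shifts no matter where pc goes next
        pc += 1
        if skip_this:
            continue
        executed.append(op if arg is None else f"{op} {arg}")
        if op == "call":
            stack.append(pc)      # return address is the instruction after the call
            pc = arg
        elif op == "ret":
            if not stack:
                break
            pc = stack.pop()
    return executed

program = {
    0: ("instr_0", None),
    1: ("call", 10),
    2: ("instr_2", None),
    3: ("instr_3", None),   # the instruction the pattern was meant to skip
    10: ("sub_a", None),
    11: ("sub_b", None),    # the instruction that is skipped instead
    12: ("ret", None),
}

print(run_with_skip(program, 0, 0b1000))
# ['instr_0', 'call 10', 'sub_a', 'ret', 'instr_2', 'instr_3']
```

The pattern %1000 was written against the straight-line listing to skip instr_3, but because the call is taken, the set bit lands on the second instruction of the subroutine.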

TonyB · 2017-06-08 00:58

I've been looking at a real-world use for XBYTE and the skip code looks as follows, where instr_X are various single-long instructions:

	instr_0		'0	 always do this
	instr_1		'1  \
	instr_2		'2   \
	instr_3		'3    \
	instr_4		'4     \ do one of 1-8
	instr_5		'5     /
	instr_6		'6    /
	instr_7		'7   /
	call #\abc	'8  /
	instr_9 	'9	 always do this
	instr_10	'10 \
	instr_11	'11  \
	instr_12	'12   \
	instr_13	'13    \ do one of 10-17
	instr_14	'14    /
	instr_15	'15   /
	instr_16	'16  /
	call #\xyz	'17 /
	instr_18	'18	 always do this
	...

Choices 1-8 and 10-17 are independent. The code for 8 and 17 already exists in two routines, both too long to fit in the skip sequence in full and there is not enough space in cog RAM to duplicate them anyway.

If SKIP/SKIPF/EXECF could treat a subroutine as a single instruction, by suspending or "freezing" the skipping within the subroutine itself, it would make this excellent new mechanism even better and more powerful.

However that is not the way things work at the moment and some tweaking would be needed. The following suggestions might not be practicable but nothing ventured, nothing gained.

A new 1-bit Skip flag or SF could be created. This would be "Skip Freeze" in fact, cleared by reset or SKIP/SKIPF/EXECF and set by a CALL after the previous SF value has been pushed onto the stack along with PC, C & Z, as bit 31.

When SF = 1 (following a CALL) the skip bit pattern shifter would be disabled. At the end of the routine SF = 0 would be popped and skipping would restart. If the routine called another one, SF = 1 would be pushed and popped with skipping frozen in the second routine too.

The instruction at the return address might need cancelling and replacing with NOP if it is being skipped. It might be possible for skipping to be interrupted but that is not the main objective. Although the code above has only two calls, other examples could have more.

Deleted - see below for simpler solution.

ersmith · 2017-06-08 12:53

Tony:

Could you break out the "call abc" and "call xyz" cases into their own routines? It would add 16 copies of the routines, if all cases really are independent (and used).

Another way to handle "call xyz" would be to replace it with a flag setting operation; e.g. at the beginning of your routine clear C, replace "call xyz" with something that sets C, and then before instr_18 do an unconditional " if_c call xyz". Once the SKIPF pattern is into all 0's you can use call (or conditional call) all you want without fear of causing conflicts.

You mention that COG memory is full. Is there space in LUT memory to place routines?

Eric

TonyB · 2017-06-08 16:33

Eric

Thanks for the reply. The cases really are independent with all permutations bar one possible, so the code snippet would handle 63 bytecodes (or 90+ instructions with variants) if it could work.

Cog RAM is really tight, I need all of the LUT as LUT and I'm using both C and Z as special-purpose flags. I might be able to jump to xyz, though, which would avoid that call.

Getting to abc is easy, it's how to choose one of 10-17 afterwards. I wouldn't have made my suggestion if I could come up with some other way. Skipping as-is will do the Spin2 interpreter but I'm sure other P2 users would want to do the same thing as me.

ersmith · 2017-06-08 18:42

TonyB wrote: »

Cog RAM is really tight, I need all of the LUT as LUT and I'm using both C and Z as special-purpose flags. I might be able to jump to xyz, though, which would avoid that call.

Any branch will have the skip problem, although I guess you could construct the code after the branch to include a nop where an instruction will potentially be skipped.

Getting to abc is easy, it's how to choose one of 10-17 afterwards.

You could always do something like:

	instr_0		'0	 always do this
        mov   cfunc, #i10	'10 \
	mov   cfunc, #i11	'11  \
	mov   cfunc, #i12	'12   \
	mov   cfunc, #i13	'13    \ do one of 10-17
	mov   cfunc, #i14	'14    /
	mov   cfunc, #i15	'15   /
	mov   cfunc, #i16	'16  /
	mov   cfunc, #i17	'17 /
	instr_1		'1  \
	instr_2		'2   \
	instr_3		'3    \
	instr_4		'4     \ do one of 1-8
	instr_5		'5     /
	instr_6		'6    /
	instr_7		'7   /
	call #abc	'8  /
        instr_9 	'9	 always do this
	call cfunc		'always do indirect call via cfunc
        instr_18	'18	 always do this

i10
  _ret_ instr_10
i11
  _ret_ instr_11
...
i17
        jmp #xyz

All the branches (except possibly for the call to abc) can be done unconditionally, so the SKIPF pattern will hold all 0's after the 1-8 choice and won't cause any problems for the abc subroutine, nor for the xyz subroutine.

Skipping as-is will do the Spin2 interpreter but I'm sure other P2 users would want to do the same thing as me.

I've used XBYTE to construct a ZPU interpreter with skipping as-is, so it's certainly usable for more than Spin2. It would be nice if a call could be treated as a single instruction for skip purposes, but that sounds complicated and I *really* don't think we want to delay the P2 any more than it already has been!

Eric

potatohead · 2017-06-08 21:02

Wouldn't handling call require another state be added and handled? May exceed timing, if so.

TonyB · 2017-06-10 16:10

ersmith wrote: »
TonyB wrote: »

Cog RAM is really tight, I need all of the LUT as LUT and I'm using both C and Z as special-purpose flags. I might be able to jump to xyz, though, which would avoid that call.

Any branch will have the skip problem, although I guess you could construct the code after the branch to include a nop where an instruction will potentially be skipped.

Getting to abc is easy, it's how to choose one of 10-17 afterwards.

You could always do something like:
	instr_0		'0	 always do this
        mov   cfunc, #i10	'10 \
	mov   cfunc, #i11	'11  \
	mov   cfunc, #i12	'12   \
	mov   cfunc, #i13	'13    \ do one of 10-17
	mov   cfunc, #i14	'14    /
	mov   cfunc, #i15	'15   /
	mov   cfunc, #i16	'16  /
	mov   cfunc, #i17	'17 /
	instr_1		'1  \
	instr_2		'2   \
	instr_3		'3    \
	instr_4		'4     \ do one of 1-8
	instr_5		'5     /
	instr_6		'6    /
	instr_7		'7   /
	call #abc	'8  /
        instr_9 	'9	 always do this
	call cfunc		'always do indirect call via cfunc
        instr_18	'18	 always do this

i10
  _ret_ instr_10
i11
  _ret_ instr_11
...
i17
        jmp #xyz
All the branches (except possibly for the call to abc) can be done unconditionally, so the SKIPF pattern will hold all 0's after the 1-8 choice and won't cause any problems for the abc subroutine, nor for the xyz subroutine.

Eric

Eric, thanks for the workaround, in the absence of proper skip call handling. I'm not sure how many extra clock cycles a return adds when it's a prefix - is it 2 or 4? Instructions 10-16 would take 10 or 12 cycles altogether compared to only 2 in my code, a big difference.

The other issue is that instructions which should really be at the end have to be moved to the beginning. Creating skip patterns is enough work without having to jump through more mental hoops. The P2 should be as easy to program as possible.

* * * * * * * * * *

What I suggested before was too complicated. Pushing or popping is not necessary and here is a much simpler alternative:

If SKIP/SKIPF/EXECF could treat a subroutine as a single instruction, by suspending or "freezing" the skipping within the subroutine itself, it would make this excellent new mechanism even better and more powerful.

A new 1-bit Skip_Freeze flag would be needed, set by a CALL and reset by a RET or SKIP or SKIPF or EXECF. The skip bit pattern shifter would be disabled when Skip_Freeze = 1. Apart from possible pipelining to keep things in sync, logically that's it.

Chip,

While you're sorting out the smart pins, could you also please consider adding "easy calls" to skipping as described? It would be ace and I've done most of the hard work already - the thinking!

cgracey · 2017-06-11 05:42

TonyB,

I agree that skipping would be better if it was suspended within CALL'd code. I'll see about making it work that way.

jmg · 2017-06-11 08:39

TonyB wrote: »

If SKIP/SKIPF/EXECF could treat a subroutine as a single instruction, by suspending or "freezing" the skipping within the subroutine itself, it would make this excellent new mechanism even better and more powerful.

That structure would allow more flexible skip coding, but implies you cannot have any SKIPs inside the called subroutine?
I guess that becomes a 'rule' - what happens if someone accidentally breaks that rule?
What happens to skip structures, should an interrupt occur in the middle of a skip action? (Effectively that is a call?)

ersmith · 2017-06-11 10:40

Chip:

Please be *extremely* conservative about this... while I agree that suspending SKIP over call would be handy sometimes, I don't think it's worth delaying the hardware over.

Eric

ersmith · 2017-06-11 10:46

TonyB wrote: »

The other issue is that instructions which should really be at the end have to be moved to the beginning. Creating skip patterns is enough work without having to jump through more mental hoops. The P2 should be as easy to program as possible.

Sure, but there comes a point when you're running out of COG RAM that you have to resort to odd tricks to make everything fit. That's going to happen at some point no matter what.

If SKIP/SKIPF/EXECF could treat a subroutine as a single instruction, by suspending or "freezing" the skipping within the subroutine itself, it would make this excellent new mechanism even better and more powerful.

A new 1-bit Skip_Freeze flag would be needed, set by a CALL and reset by a RET or SKIP or SKIPF or EXECF. The skip bit pattern shifter would be disabled when Skip_Freeze = 1. Apart from possible pipelining to keep things in sync, logically that's it.

That won't work if there are nested subroutines. It would have to be a counter, which means even more logic.
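
Here is a small Python sketch of why a single flag falls over with nesting while a depth counter does not. It models the proposal only (set on CALL, cleared on RET), not any actual P2 logic, and the program, names and addresses are made up for illustration.

```python
def run(program, start, pattern, use_counter):
    """Execute a toy program with skipping suspended inside called code.

    use_counter=False models a 1-bit freeze flag (set by CALL, cleared by RET);
    use_counter=True models a call-depth counter.
    """
    pc, stack, executed = start, [], []
    frozen, depth = False, 0
    while pc in program:
        op, arg = program[pc]
        pc += 1
        suspended = depth > 0 if use_counter else frozen
        if not suspended:
            skip_this = pattern & 1   # LSB first, 1 = skip
            pattern >>= 1
            if skip_this:
                continue
        executed.append(op if arg is None else f"{op} {arg}")
        if op == "call":
            stack.append(pc)
            pc = arg
            frozen, depth = True, depth + 1
        elif op == "ret":
            if not stack:
                break
            pc = stack.pop()
            frozen, depth = False, depth - 1
    return executed

program = {
    0: ("instr_0", None),
    1: ("call", 10),
    2: ("instr_2", None),
    3: ("instr_3", None),    # meant to be skipped by pattern %1000
    10: ("outer_a", None),
    11: ("call", 20),        # nested call
    12: ("outer_b", None),
    13: ("ret", None),
    20: ("inner_a", None),
    21: ("ret", None),
}

with_flag = run(program, 0, 0b1000, use_counter=False)
with_counter = run(program, 0, 0b1000, use_counter=True)
print(with_flag)
print(with_counter)
```

In this model the inner RET clears the flag, so skipping resumes while still inside the outer subroutine and the set bit hits the outer RET. With the counter, skipping stays suspended until the depth is back to zero and instr_3 is the one skipped.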

In an ideal world I agree that skip would treat the whole call+subroutine as one instruction. On the other hand in an ideal world the P2 would have shipped already. It's a tricky balancing act. My feeling is this feature should be added only if it's trivial. We really really need to have a real freeze!

Eric

ozpropdev · 2017-06-12 03:58

jmg wrote: »

What happens to skip structures, should an interrupt occur in the middle of a skip action? (Effectively that is a call?)

Good question.
In this test I fire off an interrupt in the middle of a SKIP action.
The main loop functions as expected with the ISR unaffected by the SKIP.
It would seem that the SKIP action is "frozen" during interrupts.

dat	org

	mov	ijmp1,#isr

	drvh	#47
	drvh	#46
	drvh	#44

loop	waitx	##30_000_000
	skip	#%1000
	drvnot	#44
	drvnot	#43
	trgint1			'trigger interrupt
	drvnot	#47		'cancelled as expected
	drvnot	#46
	jmp	#loop

isr	drvnot	#32		'isr functions OK
	drvnot	#33
	reti1

cgracey · 2017-06-12 04:03

Interrupts are disallowed if there are any SKIP bits waiting.

ozpropdev · 2017-06-12 04:35

Ah. Ok, Got it.

David Betz · 2017-06-12 11:21

cgracey wrote: »

Interrupts are disallowed if there are any SKIP bits waiting.

Couldn't that be a very large percentage of the time while executing byte codes? Does that effectively mean that interrupts can't be used on a byte code COG?

potatohead · 2017-06-12 16:04

Yes, and on one hand that makes interrupts on that COG far less useful in the "do it right now" sense. But they would continue to be very useful in the "get this done at the first chance" sense.

On the other, there are a lot of COGS.

And they would be allowed in PASM blocks.

In return for that, we avoid a lot of complex interrupt state management. IMHO, this is a big win, given we have a lot of interrupt event triggers spread out over all the COGS.

cgracey · 2017-06-12 16:39

As I develop the interpreter, I run three separate timer interrupts to make sure that everything is working together. While interrupts are blocked for what might be most of the time, interrupts do get very frequent opportunities to execute. The jitter on the scope seems pretty low.
