XBYTE question

TonyB · 2017-06-12 23:00

cgracey wrote: »

TonyB,

I agree that skipping would be better if it was suspended within CALL'd code. I'll see about making it work that way.

Thank you, Chip.

ersmith wrote: »

TonyB wrote: »

If SKIP/SKIPF/EXECF could treat a subroutine as a single instruction, by suspending or "freezing" the skipping within the subroutine itself, it would make this excellent new mechanism even better and more powerful.

A new 1-bit Skip_Freeze flag would be needed, set by a CALL and reset by a RET or SKIP or SKIPF or EXECF. The skip bit pattern shifter would be disabled when Skip_Freeze = 1. Apart from possible pipelining to keep things in sync, logically that's it.

That won't work if there are nested subroutines. It would have to be a counter, which means even more logic.

In an ideal world I agree that skip would treat the whole call+subroutine as one instruction. On the other hand in an ideal world the P2 would have shipped already. It's a tricky balancing act. My feeling is this feature should be added only if it's trivial. We really really need to have a real freeze!

Eric

If necessary, the rule could be the first RET encountered unfreezes skipping, in which case nested calls (which probably won't happen much of the time) could be done by a jump there after patching the jump back. Not ideal but the simplest to do.

I like your idea of a counter: incremented by CALL, decremented by RET and reset by SKIP/SKIPF/EXECF (when skip pattern shifter is loaded perhaps) with skip shifter disabled when count > 0. A 3-bit counter should be more than enough and what's that compared to the rest of the cog logic?
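The counter scheme above can be sketched as a small Python behavioural model. This is only an illustration of the rule as stated (increment on CALL, decrement on RET, reset when a skip pattern is loaded, shifter frozen while the count is above zero). The class name, method names and the saturating increment are assumptions of the sketch, not anything from the actual P2 logic.

```python
class SkipFreezeModel:
    """Toy model of the proposed skip-freeze counter (hypothetical names)."""

    def __init__(self, counter_bits=3):
        self.max_count = (1 << counter_bits) - 1
        self.count = 0      # CALL nesting depth since the pattern was loaded
        self.pattern = 0    # remaining skip bits; LSB applies to the next instruction

    def load_pattern(self, pattern):
        # SKIP/SKIPF/EXECF load the shifter and reset the counter.
        self.pattern = pattern
        self.count = 0

    def step(self, op):
        """Feed one instruction; return True if it executes, False if skipped."""
        if self.count == 0:
            # Shifter is only active at the top level.
            skip = self.pattern & 1
            self.pattern >>= 1
            if skip:
                return False
        if op == "CALL":
            # Saturating increment is an assumption of this sketch.
            self.count = min(self.count + 1, self.max_count)
        elif op == "RET" and self.count > 0:
            self.count -= 1
        return True


model = SkipFreezeModel()
model.load_pattern(0b0101)  # skip the 1st and 3rd top-level instructions
ops = ["ADD", "CALL", "MOV", "SHL", "RET", "SUB", "XOR"]
trace = [model.step(op) for op in ops]
print(list(zip(ops, trace)))
```

In this run the CALL uses one pattern bit, the three instructions inside the subroutine use none, and the next bit is applied to SUB after the return.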

Although I haven't been here long, I'd like to try out the P2 as soon as possible. I think what I've suggested is worthwhile and I promise you this is my final suggestion.

TonyB · 2017-06-12 23:19

If an interrupt could increment and decrement the skip freeze counter, then ...

This is an unfinished sentence and I haven't broken my promise!

cgracey · 2017-06-13 10:02

After getting the smart pin filtering done, I've been looking at the SKIPF circuit and I think it won't be hard to suspend skipping during CALL and resuming skipping on RET. This will solve a problem I've had with bitfields in the interpreter, where an extra bytecode was needed to write the bitfield back. The idea of a 3-bit counter was good - I would have wrongly thought to use just one bit.

TonyB · 2017-06-13 12:10

Eric (ersmith) deserves the credit for the multi-bit counter.

It must be 4-bit to be safe as stack level is 8. The extra counts available above 8 are more than enough for the 4 interrupts, too. Maximum count possible is 12 for 8-level nested subroutines and 4-level nested interrupts at the same time.
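The width argument is just arithmetic, sketched here in Python using the figures from this post (8 stack levels, 4 interrupt levels). The variable names are made up for illustration.

```python
STACK_LEVELS = 8      # hardware stack depth quoted in the post
INTERRUPT_LEVELS = 4  # nested interrupt levels quoted in the post

max_count = STACK_LEVELS + INTERRUPT_LEVELS  # worst-case counter value
bits_needed = max_count.bit_length()         # bits required to hold max_count
three_bit_max = (1 << 3) - 1                 # largest value a 3-bit counter can hold

print(max_count, bits_needed, three_bit_max)
```

A 3-bit counter tops out at 7, so even 8 nested CALLs without any interrupts would not fit.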

Although not the main intention, interruptible skipping could be a big win here. On the face of it the extra logic required is not very much, however IRETx is a CALLD. Definitely worth investigating but omit if too tricky.

Could all the different CALLs and RETs be supported?

cgracey · 2017-06-14 08:50

I worked on this all day to finally realize that this is not going to work. I got it working under SKIP, but I couldn't get it to work under SKIPF, which is what XBYTE uses, where, rather than cancel instructions at the top level of the pipeline, the PC is variably advanced to skip over instructions four clocks ahead of time. There is not enough time to decode the CALL instructions that early, as that path is too tight, already. There is only enough time to register the instruction data from the RAM, not analyze it.

So, I've reverted to what is in v19. Sorry about this. I was looking forward to this working, as the Spin interpreter would have benefited from this, but I suspect there are other ways around my problem there.

TonyB · 2017-06-14 14:04

cgracey wrote: »

I worked on this all day to finally realize that this is not going to work. I got it working under SKIP, but I couldn't get it to work under SKIPF, which is what XBYTE uses, where, rather than cancel instructions at the top level of the pipeline, the PC is variably advanced to skip over instructions four clocks ahead of time. There is not enough time to decode the CALL instructions that early, as that path is too tight, already. There is only enough time to register the instruction data from the RAM, not analyze it.

So, I've reverted to what is in v19. Sorry about this. I was looking forward to this working, as the Spin interpreter would have benefited from this, but I suspect there are other ways around my problem there.

Chip, thanks for spending the time on this. From what you say, I infer that skip shifting cannot be disabled at the right time. It would be a shame to waste a whole day and here's my last throw of the dice:

If a CALL is to be executed under SKIPF, it must be treated as two instructions (not one) and therefore the skip pattern for it is 00. The first 0 applies to the CALL and the second to the first instruction in the subroutine. As a CALL takes four clocks, this might allow enough time to disable the skip shifter with the second 0 as the LSB so that the rest of the subroutine is executed.

It is then a matter of enabling the shifter again. If the same decoding delay applies to RET, then the instruction after the CALL will always be executed (a NOP if necessary) but it would have no skip bit and the second 0 in 00 could be thought of as referring to this instruction.

Using two pattern bits for a CALL is not a big deal. Usually the CALL will replace several instructions and reduce the net number of pattern bits. Even a 000 pattern would not be that bad, if required to make this work. If the CALL is skipped the pattern is 1 as now.

I don't know whether allowing interrupts during skipping, if considered worthwhile, is easier or harder.

cgracey · 2017-06-14 20:13

TonyB wrote: »

...If a CALL is to be executed under SKIPF, it must be treated as two instructions (not one) and therefore the skip pattern for it is 00. The first 0 applies to the CALL and the second to the first instruction in the subroutine. As a CALL takes four clocks, this might allow enough time to disable the skip shifter with the second 0 as the LSB so that the rest of the subroutine is executed.

This would almost solve the problem, but in the case of the Spin interpreter, it uses ALTI to bring a random instruction into the next pipeline slot. If I could make that a CALL, I could handle all the bitfield reading/writing, but at compile time, I can't represent that slot with either one or two bits.

Another way around this would be to always follow the "0" for the CALL with another "0", so that you'd have to execute both the CALL and the following instruction. That would break up the big shift, to inhibit the swallowing of too many "1"'s that cause the CALLed routine to skip its own initial instructions. But then, I feel there'd be too many caveats stacked up and complexity would ruin an otherwise-understandable feature.

TonyB · 2017-06-14 21:12

What's the difference between my idea of "00" and your idea of "00"?

If it's needed for it to work, making the instruction after CALL always execute is a price worth paying, I think. Quite often it would do something useful and a NOP could be avoided.

So CALL could work in SKIPF but just not for your specific Spin interpreter example?

cgracey · 2017-06-15 02:38

I've been working on this all day and I realized there IS a way to make it work: the CALL must be absolute immediate or register indirect, and NOT relative immediate. The PC was being advanced so early when there were skips after the CALL, that it was causing the skip offset to be realized in the CALL destination. Also, the skip offset was helpfully advancing the RETurn address. Anyway, by using only 'CALL #\address', it can work fine. This is all the Spin interpreter needs out of it. I think I will have it just support 'CALL #\address', 'CALL D', and _RET_ and RET. It could also support 'CALLPA/B {#}D,S', but not the #S version. So, very limited use cases, and I have mixed feelings about allowing interrupts, since they would introduce hidden state data (skip bits waiting in the wings) during debug interrupts. What do you think?
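A loose Python model of the effect described above, assuming the early-advanced PC is simply the CALL address plus one plus the number of skipped instructions that follow it. This is a simplification for illustration, not a model of the real pipeline, and the function name and parameters are hypothetical.

```python
def call_under_skipf(pc, skips_after_call, *, absolute=None, relative=None):
    """Return (target, return_address) for a CALL at cog address pc.

    skips_after_call: number of skipped instructions directly after the CALL.
    Exactly one of absolute / relative should be given.
    """
    # PC has already been advanced past the skipped instructions.
    advanced_pc = pc + 1 + skips_after_call
    if absolute is not None:
        target = absolute                # unaffected by the early advance
    else:
        target = advanced_pc + relative  # skip offset leaks into the destination
    # The advanced PC is the return address, which is the desired behaviour.
    return target, advanced_pc


pc, sub = 0x10, 0x40
rel = sub - (pc + 1)  # offset an assembler would encode for a relative CALL

abs_target, abs_ret = call_under_skipf(pc, 2, absolute=sub)
rel_target, rel_ret = call_under_skipf(pc, 2, relative=rel)
print(hex(abs_target), hex(rel_target), hex(abs_ret))
```

With two skips after the CALL, the relative form lands two longs past the subroutine entry while the absolute form lands on it, and both return past the skipped instructions.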

Cluso99 · 2017-06-15 04:26

Just get the silicon out Chip.

While these are nice speedups, they are not going to make or break the P2's success. But time to market will kill the P2.

jmg · 2017-06-15 08:44

cgracey wrote: »

.... So, very limited use cases, and I have mixed feelings about allowing interrupts, since they would introduce hidden state data (skip bits waiting in the wings) during debug interrupts. What do you think?

Seems this raises the question of just how someone can debug SKIP code if (debug) interrupts are disabled?

Dave Hein · 2017-06-15 12:31

jmg wrote: »

cgracey wrote: »

.... So, very limited use cases, and I have mixed feelings about allowing interrupts, since they would introduce hidden state data (skip bits waiting in the wings) during debug interrupts. What do you think?

Seems this raises the question of just how someone can debug SKIP code if (debug) interrupts are disabled?

Use a simulator.

ozpropdev · 2017-06-15 14:47

jmg wrote: »

Seems this raises the question of just how someone can debug SKIP code if (debug) interrupts are disabled?

In my V19 P2 debugger (still a WIP) single stepping a "SKIP" action is a problem.
The only thing I could do was to display the chunk of code about to be stepped over.

-----------------------------------------------------------------------------------------(P2 Debugger)
00004: F704200F              INCMOD  $010,#$00F
(? for help) >*
-----------------------------------------------------------------------------------------(P2 Debugger)
00005: 00000000 _RET_        ROR     $000,$000
(? for help) >*
-----------------------------------------------------------------------------------------(P2 Debugger)
00006: 00000000 _RET_        ROR     $000,$000
(? for help) >*
-----------------------------------------------------------------------------------------(P2 Debugger)
00007: FD644031              SKIP    #$020
00008: FA002010              MUL     $010,$010
00009: FD644459              DRVH    #$022
0000A: F1042003              ADD     $010,#$003
0000B: F5642007              XOR     $010,#$007
0000C: F5242001              AND     $010,#$001
0000D: F5442008 <cancel>     OR      $010,#$008
(? for help) >*
-----------------------------------------------------------------------------------------(P2 Debugger)
0000E: FD644058              DRVL    #$020

The instructions that will be "cancelled" are shown.
Better than nothing at all.
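For reference, the way the pattern in the listing above maps onto the instructions can be sketched in Python: bit 0 of the SKIP pattern applies to the first instruction after the SKIP, and a 1 bit cancels that instruction. The helper name is made up for illustration.

```python
def cancelled(pattern, instructions):
    """Return the instructions whose skip bit is 1 (LSB = first instruction)."""
    return [ins for i, ins in enumerate(instructions) if (pattern >> i) & 1]


# The six instructions following SKIP #$020 in the debugger output.
following = ["MUL", "DRVH", "ADD", "XOR", "AND", "OR"]
print(cancelled(0x020, following))
```

Pattern $020 has only bit 5 set, which matches the single <cancel> shown on the OR at address 0000D.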

XBYTE is another issue again.

potatohead · 2017-06-15 21:18

I say keep interrupts inhibited. The way it is now is clean, and free of lots of little ugly potential state problems.

We have 16 cogs, each with all the events. If interrupt precision is needed, put that on a COG and nail it.

Debugging SPIN code can be simulator, or a modified interpreter.

TonyB · 2017-06-15 22:46

cgracey wrote: »

I've been working on this all day and I realized there IS a way to make it work: the CALL must be absolute immediate or register indirect, and NOT relative immediate. The PC was being advanced so early when there were skips after the CALL, that it was causing the skip offset to be realized in the CALL destination. Also, the skip offset was helpfully advancing the RETurn address. Anyway, by using only 'CALL #\address', it can work fine. This is all the Spin interpreter needs out of it. I think I will have it just support 'CALL #\address', 'CALL D', and _RET_ and RET. It could also support 'CALLPA/B {#}D,S', but not the #S version. So, very limited use cases, ...

Great news, thanks Chip! I think you described this as ninja-level stuff so limited uses perfectly acceptable in my view. The most important thing is it's there for people to use if they choose.

Is there a 4-bit +CALL/-RET counter as suggested?

cgracey wrote: »

... and I have mixed feelings about allowing interrupts, since they would introduce hidden state data (skip bits waiting in the wings) during debug interrupts. What do you think?

I vote yes to allowing interrupts during skipping. Why would you not want the interrupt response to be as fast as possible? The delay could be quite long otherwise, especially if there is a CALL or even worse nested CALLs.

Cluso99 · 2017-06-15 23:21

What was to be a very specific set of use cases is blowing out into another nightmare.

Just let it be and get the silicon out! Otherwise there will be no silicon for 2018 either !!!

cgracey · 2017-06-16 01:55

potatohead wrote: »

I say keep interrupts inhibited. The way it is now is clean, and free of lots of little ugly potential state problems...

Lots of state problems, for sure, and I don't think my brain could figure them all out. Better to keep it sane and likely bug-free.

I have a three-bit call+/ret- counter. I'd kind of like to make it even ONE bit. Putting nested calls into a skip sequence would be bad practice, considering that interrupts would be inhibited. I'm thinking maybe two bits would be about all you'd practically want to use.

Where this feature shines is in cases where you often have a single instruction, but may need more code to handle special cases. No problem - just put a CALL in, instead. The CALL'd code can do more than one instruction. Consider the dynamically-inserted instructions following the ALTI's in this case:

sha_mod		mov	y,x		'	x x a b c d e | | h i	a: >>
sgn_mod		not	y,x		'	x x | | | | | f g | |	b: <<
		alti	rd		'rd	m n | | | | | | | | |	c: SAR
		popa	x		'rd,op	m n a b c d e f g h i	d: ROR
		rev	x		'REV	x x | | | | | f | | |	e: ROL
		shr	x,y		'>>	x x a | | | | f | | |	f: REV
		shl	x,y		'<<	x x | b | | | | g | |	g: SIGNX
		sar	x,y		'SAR	x x | | c | | | g | |	h: +
		ror	x,y		'ROR	x x | | | d | | | | |	i: -
		rol	x,y		'ROL	x x | | | | e | | | |
		add	x,y		'+	x x | | | | | | | h |
		sub	x,y		'-	x x | | | | | | | | i
		alti	wr		'wr	m n | | | | | | | | |
		ret			'wr,op	m n a b c d e f g h i	m: var ?= exp	(isolated)
	_ret_	popa	x		'iso	m |			n: var ?= exp	(push)
	_ret_	zerox	x,sz		'push	  n			x: use a..i

Registers 'rd' and 'wr' hold instructions that are usually atomic, but in the case of bitfield reading or writing, a CALL can be in 'rd' or 'wr' to point to the several instructions needed to perform the bitfield operations, which are more complex than the usual atomic RDxxxx/WRxxxx.

TonyB · 2017-06-16 14:01

cgracey wrote: »

I have a three-bit call+/ret- counter. I'd kind of like to make it even ONE bit. Putting nested calls into a skip sequence would be bad practice, considering that interrupts would be inhibited. I'm thinking maybe two bits would be about all you'd practically want to use.

A 3-bit counter was my first suggestion. Please don't reduce it as it seems to be the optimum size. It should be possible to have a good number of nested subroutines when interrupts are not used. A nested level of 3 with a 2-bit counter probably won't be enough sometimes and saving one counter bit doesn't gain anything. If interrupts are used and response time is important don't do nested calls - simple!

A few questions:

1. Are skip CALLs working in practice or just in theory?

2. Is the instruction following the CALL skippable?

3. Does anything bad happen in XBYTE if there is a RET/_RET_ to $1F8-$1FF when the next skip bit is 1 or does a return clear the skip pattern shifter (if the skip freeze counter = 0)?

cgracey · 2017-06-16 16:34

TonyB wrote: »

cgracey wrote: »

I have a three-bit call+/ret- counter. I'd kind of like to make it even ONE bit. Putting nested calls into a skip sequence would be bad practice, considering that interrupts would be inhibited. I'm thinking maybe two bits would be about all you'd practically want to use.

A 3-bit counter was my first suggestion. Please don't reduce it as it seems to be the optimum size. It should be possible to have a good number of nested subroutines when interrupts are not used. A nested level of 3 with a 2-bit counter probably won't be enough sometimes and saving one counter bit doesn't gain anything. If interrupts are used and response time is important don't do nested calls - simple!

A few questions:

1. Are skip CALLs working in practice or just in theory?

2. Is the instruction following the CALL skippable?

3. Does anything bad happen in XBYTE if there is a RET/_RET_ to $1F8-$1FF when the next skip bit is 1 or does a return clear the skip pattern shifter (if the skip freeze counter = 0)?

1) They are working NOW.
2) Yes, no caveats.
3) Nothing bad happens. XBYTE sets a new SKIPF pattern.

potatohead · 2017-06-16 16:41

Where this feature shines

I'm good there. You got it to meet timing and without gotchas.

TonyB · 2017-06-16 23:28

cgracey wrote: »

1) They are working NOW.
2) Yes, no caveats.
3) Nothing bad happens. XBYTE sets a new SKIPF pattern.

Excellent! I'm really pleased about this, for all of us. Thanks Chip.

cgracey · 2017-06-17 02:23

TonyB wrote: »

Excellent! I'm really pleased about this, for all of us. Thanks Chip.

You're welcome. This improves the bytecode density for variable bitfields in Spin, removing a shortcoming that was there.

TonyB · 2017-06-18 00:17

cgracey wrote: »

I have mixed feelings about allowing interrupts, since they would introduce hidden state data (skip bits waiting in the wings) during debug interrupts. What do you think?

The ensuing debate was short but, thinking about it some more, the choice was a false one.

If these skip bits waiting in the wings might cause trouble (not yet proven) then it would be wrong to always allow interrupts. The real benefits such as instant interrupts and the ability to single-step skipping (needed here more than anywhere else perhaps) mean it would be wrong to never allow interrupts.

The solution is to make this programmable and there are a couple of options, both within debug ISRs:

1. Modify ALLOWI behaviour to allow skip interrupts. Once enabled they would remain so until cog stopped or a reset. No need for corresponding STALLI. Default state on cog start is disallowed.

2. Add skip interrupt enable bit to SETBRK write data. Possibly also a read bit for "any 1 bits left in skip pattern?"

No new instruction required.

ozpropdev · 2017-06-18 01:50

The "hidden" debug interrupt has to be set up before the cog is launched and cannot be set after cog launch.
Therefore the user has already made a conscious decision to use single stepping, breakpoints, etc.
Perhaps a flag set by the write to hub $FFC0 + (cog * 4) is all that is needed to allow ONLY debug interrupts during a SKIP code sequence.

TonyB · 2017-06-18 23:30

Normal interrupts don't appear to be a problem. For some reason, it is considered too risky to allow interrupts during skipping because "they would introduce hidden state data (skip bits waiting in the wings) during debug interrupts."

I assume hidden means not readable directly. Skip bits are hidden but not unknown and isn't Q also hidden?

It would help if the potential problems during debug interrupts were explained. They are essentially the same as CALLs in terms of what happens to the skip bits. To be honest, I don't know why we have to mess about and can't have skip interrupts all the time.

How does XBYTE debugging with interrupts compare to that without?

cgracey · 2017-06-20 00:21

TonyB wrote: »

Normal interrupts don't appear to be a problem. For some reason, it is considered too risky to allow interrupts during skipping because "they would introduce hidden state data (skip bits waiting in the wings) during debug interrupts."

I assume hidden means not readable directly. Skip bits are hidden but not unknown and isn't Q also hidden?

It would help if the potential problems during debug interrupts were explained. They are essentially the same as CALLs in terms of what happens to the skip bits. To be honest, I don't know why we have to mess about and can't have skip interrupts all the time.

How does XBYTE debugging with interrupts compare to that without?

My big concern was that it may be too complex for me to figure out how to allow interrupts during skipping, not just that there would be hidden state data.

I thought I would do an experiment and take the obvious and simple path, which might be too simplistic, but it would let me see if the functionality was even in the ball park. So, I added a simple skip disabler that just inhibited skipping if an interrupt service routine was executing. Very simple. To my surprise, it seems to be all that was needed! Skipping is working with all interrupts now. You can single-step through it. I'm really glad about that, because I hated that single stepping was opaque, not to mention causing interrupt latencies to grow.

So, skipping now works with interrupts.
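A minimal Python sketch of that skip disabler, for illustration only. The post says skipping is inhibited while an interrupt service routine is executing. That the pending skip bits are held unchanged until the ISR returns is an assumption of this sketch, and the function name and stream format are hypothetical.

```python
def run(pattern, stream):
    """stream: list of (name, in_isr) tuples; returns the names that execute."""
    executed = []
    for name, in_isr in stream:
        if in_isr:
            # Skipping inhibited inside the ISR; pattern is held, not shifted.
            executed.append(name)
            continue
        skip = pattern & 1
        pattern >>= 1
        if not skip:
            executed.append(name)
    return executed


stream = [
    ("A", False),
    ("isr1", True),   # interrupt taken after A
    ("isr2", True),
    ("B", False),     # back in the skip sequence
    ("C", False),
    ("D", False),
]
print(run(0b0110, stream))
```

Here B and C are still skipped after the interrupt returns, so the main-line code behaves the same as it would without the interrupt.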

jmg · 2017-06-20 00:32

cgracey wrote: »

So, skipping now works with interrupts.

Great! That's what most users will expect, so is easiest to explain.

Is this working in all forms (abs, relative) and COG/HUB, as well as the 2 versions of SKIP?

SKIP opcodes are niche ones, so there will always be some rules around use, but best if those caveats do not spread too far into other operations & 'unexpected areas'.

ozpropdev · 2017-06-20 01:39

Fabulous! Perfect timing Chip!

I had just started on a bytecode simulator this morning.
I had already tried a breakpoint in some xbyte code which fired the interrupt Ok but then broke the xbyte operation.
Thanks Chip!

David Betz · 2017-06-20 01:50

cgracey wrote: »

My big concern was that it may be too complex for me to figure out how to allow interrupts during skipping, not just that there would be hidden state data.

I thought I would do an experiment and take the obvious and simple path, which might be too simplistic, but it would let me see if the functionality was even in the ball park. So, I added a simple skip disabler that just inhibited skipping if an interrupt service routine was executing. Very simple. To my surprise, it seems to be all that was needed! Skipping is working with all interrupts now. You can single-step through it. I'm really glad about that, because I hated that single stepping was opaque, not to mention causing interrupt latencies to grow.

So, skipping now works with interrupts.

One thing this won't work with is if you decide to implement a time-slice scheduler that switches tasks on a timer interrupt. In that case you'll be returning to different code after the interrupt. I suppose that isn't likely on the Propeller though since independent tasks are likely to be run on separate COGs.

cgracey · 2017-06-20 03:07

ozpropdev wrote: »

Fabulous! Perfect timing Chip!
I had just started on a bytecode simulator this morning.
I had already tried a breakpoint in some xbyte code which fired the interrupt Ok but then broke the xbyte operation.
Thanks Chip!

Can I send you a Prop123-A9 image really quick, so that you can try it out? Do you need more than two cogs? It's fast to compile smaller images.
