Fast Bytecode Interpreter

msrobots · 2017-03-15 18:29

Since this is replacing the condition if_never it should occupy the place used for conditions to avoid exactly this possible confusion.

AUTORET seems fine to me.

I do not like the idea of letting the assembler automagically replace ret instructions by AUTORET on the previous instruction.

If you want to do so, write it like that in the first place.

Enjoy!

Mike

jmg · 2017-03-15 18:51

cgracey wrote: »

I really like RET_AFTER. Is there any way to get that under 8 characters, though, so that it can sit at the tab stop just prior to the instruction? Then, it aligns with most IF_x prefixes.

RET_FIN for ret after finish, or ret finally ?

cgracey wrote: »

I agree with the reasoning about RET being placed along with WC and WZ, after the instruction and operands, but it would be marooned way out there. I think this thing needs to go where the conditionals go, so that it jumps out in a consistent position. Otherwise, it gets lost in the noise.

I think for that reason RET_ needs to be the leading root word, so that excludes AUTORET and P_RET etc.

Of course, I also find RET_ marooned as a prefix too, and the clearest code to visually scan is the ret on its own line I proposed. Let the PC do what it is good at.

You can easily support both formats, so those who like more clutter, but fewer lines, can code that way.

jmg · 2017-03-15 19:02

potatohead wrote: »

If we don't do that, labels can get confusing. Say someone wants to jmp to the ret as an early exit, or something similar. They will have trouble.

I'm not following this ?
If someone wants to jump to ret, surely they need a label to jump to & the label solves the problem.
The code works as their glance suggested it would, in both cases.

Heater. wrote: »

I get the idea. It's just that now instructions as written may generate a variable number of actual instructions. Which makes all that instruction counting that people like to do on the Propeller, to get timing right, rather more difficult.

I have always campaigned for simplicity in the P2, but that boat seems to have sailed a long time ago.

Many boats sailed long ago.

Tedious 'counting instructions' used to mostly work on older MCUs, but these days opcode timings on many CPUs are more complex. Even P2 depends on where the code is run.

The simplicity I prefer, is that of code-clarity and easily scanned source and LST files.
Thankfully, that is still around, and on many MCUs.

cgracey · 2017-03-15 19:23

I thought of RET_FIN, too. My thinking was that the Latin FIN means END, which works.

RET_FIN doesn't look that good in code, though. It's kind of long.

_RET_ may be awkward, but it really pops out.

msrobots · 2017-03-15 19:27

maybe it is just that, a prefix

RET_ whatever instruction.

Ending underscore to show that it combines with the following instruction.

On the other hand we already have PASM instructions automagically producing two longs of code for extended constants so counting instructions is no longer counting clock cycles anyways.

So @jmg might be right and we could let the assembler sort this out. Just write the RET in its own line, and if no label in front of it and the previous instruction has no condition modifier it will attach to the previous instruction.

So you do not need to think about it while writing code, just use RET as before and used in most assembly languages.

It will use the shortest and fastest version by default, and if you need the explicit RET for some reason, just put a label there.

And no marooned clutter at all, easy to read, write and understand.
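For illustration only, the folding rule described above could be sketched in Python roughly like this. It is not PNut or any real assembler: the Line record, the fold_rets name, and the use of "_ret_" as the marker on the folded instruction are all assumptions made for the sketch. It also ignores cases a real tool would need to consider, such as a preceding branch instruction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    """One parsed source line (hypothetical structure for this sketch)."""
    label: Optional[str]
    cond: Optional[str]      # e.g. "if_c", or None when unconditional
    mnemonic: str
    operands: str = ""


def fold_rets(lines: List[Line]) -> List[Line]:
    """Fold a bare RET into the previous instruction when the rule allows it."""
    out: List[Line] = []
    for line in lines:
        is_bare_ret = (
            line.mnemonic.lower() == "ret"
            and line.label is None       # a label keeps the RET explicit
            and line.cond is None
        )
        prev = out[-1] if out else None
        can_fold = (
            is_bare_ret
            and prev is not None
            and prev.cond is None        # condition field must be free
            # Extra guard added for this sketch: never fold a RET into a RET.
            and prev.mnemonic.lower() != "ret"
        )
        if can_fold:
            out[-1] = Line(prev.label, "_ret_", prev.mnemonic, prev.operands)
        else:
            out.append(line)
    return out


program = [
    Line("con_bx", None, "rfbyte", "y"),
    Line(None, "if_c", "not", "x"),
    Line(None, None, "pusha", "x"),
    Line(None, None, "ret"),
]
folded = fold_rets(program)
print(len(folded), folded[-1].cond, folded[-1].mnemonic)  # 3 _ret_ pusha
```

Putting a label on the RET, or a condition on the instruction before it, leaves both lines in place, which matches the "just put a label there" escape hatch.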

Enjoy!

Mike

Heater. · 2017-03-15 19:34

Hmm..."tedious instruction counting" was never so easy as it was on the Propeller. Many older MCUs and microprocessors had variable length instructions and different clocks for different ops that made it quite a chore.

Instruction counting was used to great effect by many people using the Prop and building serial or video drivers etc.

All that simplicity has sailed away, as you say.

The only other device I know whose build tools can tell you how long your code will take to run is from XMOS.

I'm all for code-clarity and easily scanned source files. Sadly not much of that around nowadays.

potatohead · 2017-03-15 19:56

RET_AFTR

Just drop the E.

I'm in favor of it going in the conditionals.

I really like _RET_

potatohead · 2017-03-15 20:03

RET_BNT

Return on branch not taken?

jmg · 2017-03-15 20:08

Heater. wrote: »

Hmm..."tedious instruction counting" was never so easy as it was on the Propeller. Many older MCUs and microprocessors had variable length instructions and different clocks for different ops that made it quite a chore.

Newer MCUs tend to have conditional opcodes that vary with taken/not taken, just a natural consequence of pipelines.
That makes good ol' eye-ball estimates of exact timing quite hard, especially across conditionals.

The P2 is now in that group, and the Cycles column for COGexec varies from 2 to 41 and for HUBexec from 2 through 70.

Heater. wrote: »

The only other device I know whose build tools can tell you how long your code will take to run is from XMOS.

Simulators, and actual hardware measurement, are the other approach, required by the more complex cycle counts.
P2 at least, makes self-hardware timing somewhat easy.

8051 and AVR have good simulation tools, and I believe the AVR8 simulator engine is derived from the Verilog files.

That last detail is impressive, and maybe that can be done for the P2 ?
Verilog derivation is the ideal, as it includes the peripheral delay effects, and all the 'warts' of the final device.
However, it comes with a time overhead, & I suspect a verilog-to-exe P2 simulator may be too slow ?

Seairth · 2017-03-15 20:09

The thing is, we are now encoding both predicates and postdicates(???) in the same field. This is partly why there's desire to stick the "ret" with the "wc" and "wz". Except, the "wc" and "wz" aren't postdicates. They're part of the actual instruction. Since I never really liked those flags there, I'll suggest a format that changes up a couple things.

Here's a prior snippet of Chip's code:

con_bx		rfbyte	y
		decod	x,y
		testb	y,#5	wc
	if_c	sub	x,#1
		testb	y,#6	wc
	if_c	not	x
	_ret_	pusha	x

To emphasize the predicate and postdicate nature, as well as to address the "wc" and "wz", suppose the following variation:

con_bx              rfbyte      y
                    decod       x,y
                    testb.c     y,#5
        pre(c)      sub         x,#1
                    testb.c     y,#6
        pre(c)      not         x
        post(ret)   pusha       x

Personally, I find this at least as readable as the original version, but I prefer it for a couple reasons:
* The predicates and postdicates are obvious
* The "c" and "z" flags are part of the instruction instead of being somewhere to the right of the parameter(s).

The one thing I don't really like is that "pre()" and "post()" are somewhat verbose for an otherwise terse syntax. So, I might suggest the following simplification:

con_bx          rfbyte      y
                decod       x,y
                testb.c     y,#5
        c:      sub         x,#1
                testb.c     y,#6
        c:      not         x
        :ret    pusha       x

I still find this very readable. The colon position helps emphasize the predicate/postdicate quality. I doubt that Chip will come up with any more postdicates, but who really knows! Just in case he does, this syntax would still be usable.

potatohead · 2017-03-15 20:15

con_bx		rfbyte	y
		decod	x,y
		testb	y,#5	wc
	if_c	sub	x,#1
		testb	y,#6	wc
	if_c	not	x
	     	pusha	x 	_ret_

Frankly, I like putting it in the effects column the most. This does work after an instruction is processed.

And it could just be RET, no underscore.

potatohead · 2017-03-15 20:21

The other terse syntaxes seem too far afield from existing PASM for me, BTW

potatohead · 2017-03-15 20:23

con_bx		rfbyte	y
		decod	x,y
		testb	y,#5	wc
	if_c	sub	x,#1
		testb	y,#6	wc
	if_c	not	x
      if_nxt    pusha	x 	ret

Since it is related to both, put something in both fields to reflect that. This does solve the need to put a lot of meaning into a single identifier too.

And it reads like "if next instruction, return"

jmg · 2017-03-15 20:55

potatohead wrote: »
con_bx		rfbyte	y
		decod	x,y
		testb	y,#5	wc
	if_c	sub	x,#1
		testb	y,#6	wc
	if_c	not	x
	     	pusha	x 	_ret_
Frankly, I like putting it in the effects column the most. This does work after an instruction is processed.

And it could just be RET, no underscore.

Yup, and then add the simple step of having the assembler tolerate CrLf as white space, so that these are equivalent one-opcode forms:

	     	pusha	x 	ret

	     	pusha	x
	     	ret

and this gives 2 opcodes:

	     	pusha	x
label
	     	ret

jmg · 2017-03-15 21:02

Seairth wrote: »
....
con_bx          rfbyte      y
                decod       x,y
                testb.c     y,#5
        c:      sub         x,#1
                testb.c     y,#6
        c:      not         x
        :ret    pusha       x
I still find this very readable. The colon position helps emphasize the predicate/postdicate quality.

I like the .c opcode variant idea, helps avoid the 'too many columns syndrome'.

However, colons tag labels in most assemblers, so this is not as user-portable, and they are harder to see visually.

eg GAS has labels like this, which is pretty much industry wide.

_main:

The words should indicate what the operation does, so if_c wins on that count.
c alone makes everyone reach for the manual, and makes ASM more like Forth...

Heater. · 2017-03-15 21:03

jmg,

Simulators, and actual hardware measurement, are the other approach, required by the more complex cycle counts.
P2 at least, makes self-hardware timing somewhat easy.

The XMOS compiler I'm referring to does not simulate or do hardware measurements. It calculates, at compile time, how long sequences of instructions will take. It can do that because of the regular timing of XMOS instructions. Effectively it is instruction counting for you. Except whilst using a high level language.

Of course when your code hits a shared resource it's all but impossible to keep track of execution time. Such is the case with HUB RAM on the Prop or receiving data over a channel from some other core.

Of course "instruction counting" is not the preferred way to time things on XMOS. All I/O can be timed precisely, clocked in and out according to the system clock. Your code can be a bit sloppy in its execution time, but it's good that the compiler can tell you when you are going to miss your deadlines.

I have not followed P2 developments so much recently but I get the impression that the "smart pins" are intended to make instruction counting less necessary. In the XMOS style.

jmg · 2017-03-15 21:10

Heater. wrote: »

I have not followed P2 developments so much recently but I get the impression that the "smart pins" are intended to make instruction counting less necessary. In the XMOS style.

Certainly Smart Pin cells and the streamer will handle many timing problems.

I expect in P2, the questions will be more around 'possible maximum cycles', and some of the cycle ranges in the P2 docs look a little scary.

I'm unclear, even with hardware self-timing, how you can prove you have covered a worst-case path ?

A simulator could be instructed to take the worst-case time, or even do some min-max bouncing to allow fastest and slowest coverage.

potatohead · 2017-03-15 21:24

Re: syntax

I get it jmg, just don't like the ambiguity of it. I feel it's important to note something in the conditional field.

jmg · 2017-03-15 21:41

potatohead wrote: »

Re: syntax

I get it jmg, just don't like the ambiguity of it. I feel it's important to note something in the conditional field.

That's fine too - notice my suggestion supports both forms.

Heater. · 2017-03-15 22:09

jmg,

I'm unclear, even with hardware self-timing, how you can prove you have covered a worst-case path ?

In most programming languages such proof is impossible. For example a compiler may be able to calculate how long it takes to do one iteration of a loop but it has no idea what value the loop counter or condition will have at run time, so it cannot tell you how long a loop takes to complete.

But there is an easy way:

Arrange that your lumps of code are started on some clock tick. Your code runs from top to bottom. At the end it exits, it's dead until it gets restarted, at the top, on the next clock tick. Your code is written in a high level language with expressions, operators, functions, conditionals etc. BUT it has no loop constructs. No "while" or "for" or "goto" etc. In the absence of loops the compiler can calculate the execution time of every possible pathway from the top of your code to the bottom. Build a program out of hundreds of such lumps of code (modules) and your build system can report exactly the execution time of the whole thing. Basically your code is a Directed Acyclic Graph (DAG) so it becomes very easy to reason about.
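As a rough illustration of that analysis, here is a small Python sketch that computes the worst-case cost through a loop-free module. It is not how any particular avionics toolchain or the XMOS tools work internally; the block names, the cycle costs, and the worst_case_cycles function are made up for the example, and it assumes the graph really is acyclic (a loop would make the recursion never finish).

```python
from functools import lru_cache
from typing import Dict, List


def worst_case_cycles(cost: Dict[str, int],
                      succ: Dict[str, List[str]],
                      entry: str) -> int:
    """Longest (most expensive) path from entry to any exit of a DAG."""

    @lru_cache(maxsize=None)
    def longest(node: str) -> int:
        following = succ.get(node, [])
        # An exit block has no successors, so nothing is added after it.
        tail = max((longest(n) for n in following), default=0)
        return cost[node] + tail

    return longest(entry)


# Hypothetical module: one if/else, then a common tail.
cost = {"start": 4, "then": 10, "else": 6, "end": 2}
succ = {"start": ["then", "else"], "then": ["end"], "else": ["end"]}

print(worst_case_cycles(cost, succ, "start"))  # 16
```

Swapping max for min in the same walk gives the best case, so a build system can report both ends of the range for every module.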

This is how we were building avionic control systems 30 years ago. Some of the most easy to test, fault free and predictable code I have ever seen.

I probably fibbed a bit when describing the timing analysis of the XMOS tools. I'm pretty sure it cannot tell you the execution time of any arbitrary function full of weird loops and such. It can tell you the execution time from point A in your code to point B, assuming no loops along the way. It does the DAG part. It's up to you to structure your code so as to allow the analysis tools to help you.

David Betz · 2017-03-15 22:17

How important is it on P2 to control timing exactly? Don't the smart pins take care of a lot of that for you?

cgracey · 2017-03-15 22:45

The trouble with MNEM.C.Z is that some mnemonics are already 7 characters long, which means a tab always gets you into the operand column. Having .C and .Z suffixes on mnemonics would necessitate another tab to get to operands, consistently.
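To make the column arithmetic concrete, here is a small Python sketch. It assumes 8-space tab stops and a layout with the label at column 0, the condition at column 8 and the mnemonic at column 16 (0-based); the function names and the seven-letter placeholder mnemonic are made up for the example.

```python
TAB = 8  # assumed tab width


def column_after_tab(col: int, tab: int = TAB) -> int:
    """0-based column reached by pressing Tab while at 0-based column col."""
    return (col // tab + 1) * tab


def operand_column(mnemonic: str, start: int = 16) -> int:
    """Column where operands begin after typing the mnemonic and one Tab."""
    return column_after_tab(start + len(mnemonic))


seven = "abcdefg"  # placeholder for any 7-character mnemonic

print(operand_column("rfbyte"))      # 24
print(operand_column(seven))         # 24
print(operand_column("rfbyte.c"))    # 32
print(operand_column(seven + ".c"))  # 32
```

A 7-character mnemonic still reaches the operand column with one tab, but adding a .c suffix pushes the operands out to the next stop, so shorter mnemonics would need a second tab to stay aligned.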

cgracey · 2017-03-15 22:47

David Betz wrote: »

How important is it on P2 to control timing exactly? Don't the smart pins take care of a lot of that for you?

Timing gets crazy when you access hub memory, particularly during hub-exec, and especially when doing hub-based stack calls during hub-exec.

For code that runs in the cog or LUT, timing is very simple.

jmg · 2017-03-15 23:08

Heater. wrote: »

I'm unclear, even with hardware self-timing, how you can prove you have covered a worst-case path ?

In most programming languages such proof is impossible. For example a compiler may be able to calculate how long it takes to do one iteration of a loop but it has no idea what value the loop counter or condition will have at run time, so it cannot tell you how long a loop takes to complete.

I did not even mean that level of complexity, just plain in-line P2 code...

The P2 opcode timings have many entries like 2...17 & 2...41 & 9...24 etc

I think Chip mentioned he individually timed those, with a scatter test.
That's ok for one opcode under test, but what if you have a series of them ?

eg Once one slot-aligns, does that reduce the variance of later ones, or not ?
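For reference, the naive bound for a series is just the sum of the individual ranges. The Python sketch below does that for the three example ranges quoted above. It assumes each opcode's range is independent of its neighbours, which is exactly the assumption being questioned here, so the real spread after slot alignment may be narrower; the sequence_bounds name is made up for the example.

```python
from typing import Iterable, Tuple


def sequence_bounds(ranges: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Sum per-opcode (min, max) cycle ranges, treating them as independent."""
    ranges = list(ranges)
    best = sum(lo for lo, _ in ranges)
    worst = sum(hi for _, hi in ranges)
    return best, worst


# The three example ranges from the post: 2...17, 2...41, 9...24
best, worst = sequence_bounds([(2, 17), (2, 41), (9, 24)])
print(best, worst)  # 13 82
```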

ozpropdev · 2017-03-15 23:35

The easiest solution with the least amount of modifications to Pnut is to
simply change the "IF_NEVER" condition to "RET_AFTER".
As this new return feature uses the condition bits it makes sense to position
the command in the condition location in the instruction line.
As for string length and tab alignment, the larger condition names well and
truly exceed 7 characters.

	if_nc_and_nz	add	r0,r1
	ret_after	add	r2,r3

IMHO The purpose of "RET_AFTER" is quite clear.

Roy Eltham · 2017-03-16 00:37

Honestly, I don't care what the syntax for this _ret_ thing is, as long as it's documented.
ASM for any CPU or MCU has weird syntax with weird shortened instruction names, it all just needs documenting for reference.

If it was up to me, I'd make all the instructions and modifiers have longer more meaningful names. We have screens that can display 100, 200, or even more columns of text. My current editor setup has 218 columns of text using a nice readable fixed width font. Why must we pack everything into a few 8 space tab stops such that your code is all crammed into the first 24 columns? The editor can be configured to whatever tab stops are useful to make the PASM code line up nice.

"All the typing" you say? What code editor these days doesn't have tab completion of keywords and context based completion of non-keywords? If you are stuck in such an editor, please do yourself a favor and upgrade.

jmg · 2017-03-16 00:52

Roy Eltham wrote: »

If it was up to me, I'd make all the instructions and modifiers have longer more meaningful names. We have screens that can display 100, 200, or even more columns of text. My current editor setup has 218 columns of text using a nice readable fixed width font. Why must we pack everything into a few 8 space tab stops such that your code is all crammed into the first 24 columns? The editor can be configured to whatever tab stops are useful to make the PASM code line up nice.

That's a very good point, and systems in the future will be even more capable.
In my code, I always favour clarity over terseness & I see the same trend in vendor naming conventions.

Even a $10 RasPi does not need 8 space tabs, and most editors can save tabs as spaces, so making portable source files.

potatohead · 2017-03-16 04:12

ozpropdev wrote: »
The easiest solution with the least amount of modifications to Pnut is to
simply change the "IF_NEVER" condition to "RET_AFTER".
As this new return feature uses the condition bits it makes sense to position
the command in the condition location in the instruction line.
As for string length and tab alignment, the larger condition names well and
truly exceed 7 characters.
	if_nc_and_nz	add	r0,r1
	ret_after	add	r2,r3
IMHO The purpose of "RET_AFTER" is quite clear.

Let's do this. Least change, makes sense.

cgracey · 2017-03-16 04:24

potatohead wrote: »
ozpropdev wrote: »
The easiest solution with the least amount of modifications to Pnut is to
simply change the "IF_NEVER" condition to "RET_AFTER".
As this new return feature uses the condition bits it makes sense to position
the command in the condition location in the instruction line.
As for string length and tab alignment, the larger condition names well and
truly exceed 7 characters.
	if_nc_and_nz	add	r0,r1
	ret_after	add	r2,r3
IMHO The purpose of "RET_AFTER" is quite clear.
Let's do this. Least change, makes sense.

I went to do that, but realized it will force an extra tab (I know, some hate tabs) for lines with labels:

con_n1	_ret_	pusha	neg1
con_0	_ret_	pusha	#0
con_1	_ret_	pusha	#1
con_2	_ret_	pusha	#2
con_3	_ret_	pusha	#3
con_4	_ret_	pusha	#4
con_7	_ret_	pusha	#7
con_8	_ret_	pusha	#8
con_15	_ret_	pusha	#15
con_16	_ret_	pusha	#16
con_31	_ret_	pusha	#31
con_32	_ret_	pusha	#32

jmg · 2017-03-16 04:34

cgracey wrote: »
I went to do that, but realized it will force an extra tab (I know, some hate tabs) for lines with labels:
con_n1	_ret_	pusha	neg1
con_0	_ret_	pusha	#0
con_1	_ret_	pusha	#1
con_2	_ret_	pusha	#2
con_3	_ret_	pusha	#3
con_4	_ret_	pusha	#4
con_7	_ret_	pusha	#7
con_8	_ret_	pusha	#8
con_15	_ret_	pusha	#15
con_16	_ret_	pusha	#16
con_31	_ret_	pusha	#31
con_32	_ret_	pusha	#32

? do you mean the issue is that that becomes this ?

con_n1  ret_after       pusha   neg1
con_0   ret_after       pusha   #0
con_1   ret_after       pusha   #1
con_2   ret_after       pusha   #2
con_3   ret_after       pusha   #3
con_4   ret_after       pusha   #4
con_7   ret_after       pusha   #7
con_8   ret_after       pusha   #8
con_15  ret_after       pusha   #15
con_16  ret_after       pusha   #16
con_31  ret_after       pusha   #31
con_32  ret_after       pusha   #32

with ASM at column 25 ?
That looks fine to me, for anyone wanting label+prefix+mnemonic on one line.
I looked at some of my ASM, and Col25 is common, to allow for reasonable length label names anyway.
