TIA - An interactive inline assembler for TAQOZ

rogloh · 2020-03-12 05:36

This undocumented behaviour needs to be documented.

evanh · 2020-03-12 05:44

Yeah, I think so now too. Originally, Chip had just said don't use a branch as the last instruction of the block. But since this issue has cropped up a few times already, it's clearly going to keep recurring over the life of the prop2.

Another alternative to using an absolute branch is to compensate for the REP offset by adding the block length to the relative branch distance.

Yanomani · 2020-03-12 07:27

Electrodude wrote: »

Using GETPTR instead of GETCT should shave off two instructions, and it doesn't require STALLI and ALLOWI:

                  rdfast   #0, a
                  rep      #2, #0 ' loop 2 next instructions forever until we branch out
                  rfbyte   x wcz
   if_z_or_c      jmp      #exit  ' leave loop
exit              getptr   b
                  sub      b,a
                  sub      b, #2 ' adjust length for extra rfbyte, done just before loop exits

P.S. my fault...
xor a,b
xor b,a
xor a,b

                     getptr    a
                     ret            ' now, on ret, a contains next string start address
'                                  b contains last string length, excluding end==0 or >127
'                                  c and z would reflect the value of next string first character,
'                                  useful to get each string length, from a list terminated by a null
'                                  string, which uses the other possible terminator char.

I hope I've got it right after adapting it in light of evanh's warning about the extra pass caused by using a relative branch to exit the REP block, and after extracting another useful result or two.

"If you only have some sausage and bread, then be assured you'd make the best sausage sandwich you can."

TonyB_ · 2020-03-12 10:26

Electrodude wrote: »
Using GETPTR instead of GETCT should shave off two instructions, and it doesn't require STALLI and ALLOWI:
		rdfast	#0,a
		rep	#2,#0		' loop 2 next instructions forever until we branch out
		rfbyte	x	wcz
  if_z_or_c	jmp	#exit		' leave loop
exit		getptr	b
		sub	b,a
		sub	b,#1		' don't count trailing null byte
		ret

Is there any reason not to use _ret_ ?

It's possible to replace jmp with execf to jump with skipping, if the latter is ever needed.

Peter Jakacki · 2020-03-12 15:07

After adapting the getptr method I found that there is some peculiar inconsistency.
This is my code:

code LEN ' ( str -- len )
	rdfast	#0, a
	rep	#2, #0	' loop 2 next instructions forever until we branch out
        rfbyte	x wcz
 {c|z}	ijnz	a,#l0	' leave loop but compensate length calc
.l0	getptr   x
 _ret_	subr     a, x	' a = x-a'
 	end

I then load this into cog memory and create an alias LEN that points to this code. Now I try it out on a string:

TAQOZ# " ABCDEFGHIJKLMNOPQRSTUVWXYZ" LEN . --- 26  ok

So I tried it on a 64kB block by first filling the block with a valid character and then terminating and checking it.

TAQOZ# $1.0000 $1.0000 'A' FILL  ---  ok
TAQOZ# 0 $2.0000 C! ---  ok
TAQOZ# $1.FFF0 $20 DUMP --- 
1FFF0: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
20000: 00 5A 82 53  C3 A7 C7 A2  86 89 28 D8  E5 1E E1 E1     '.Z.S......(.....' ok

Then I run it expecting 65536

TAQOZ# $1.0000 LEN . --- 65537  ok

Timing-wise it works out to 4 cycles per character, with some overhead of course:

TAQOZ# $1.0000 LAP LEN LAP .LAP --- 262,208 cycles= 1,311,040ns @200MHz ok
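As a quick sanity check on those figures, here is a minimal Python sketch of the arithmetic. It only uses the numbers printed above; the split into "loop" and "overhead" assumes the 4 cycles per character stated in the post, and the constant and function names are made up for illustration.

```python
# Figures copied from the LAP output above.
CYCLES = 262_208         # reported cycle count
CLOCK_HZ = 200_000_000   # 200 MHz system clock
CHARS = 65_536           # 'A' bytes filled before the terminator


def cycles_to_ns(cycles: int, clock_hz: int) -> int:
    """Convert a cycle count to whole nanoseconds."""
    return cycles * 1_000_000_000 // clock_hz


elapsed_ns = cycles_to_ns(CYCLES, CLOCK_HZ)
loop_cycles = 4 * CHARS           # 4 cycles per character, as stated in the post
overhead = CYCLES - loop_cycles   # everything else: call, RDFAST, exit, return

print(elapsed_ns)  # 1311040
print(overhead)    # 64
```

So the reported time matches 5 ns per cycle, and about 64 cycles are left over once the per-character cost is taken out.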

edit: It seems to have something to do with the memory area. I've dropped right down to 32 terminated bytes at $10000 and compared that against a 32-byte string.

TAQOZ# $1.0000 $40 DUMP --- 
10000: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
10010: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
10020: 00 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     '.AAAAAAAAAAAAAAA'
10030: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA' ok
TAQOZ#  ---  ok
TAQOZ# " ABCDEFGHIJKLMNOPQRSTUVWXYZ123456" LEN . --- 32  ok
TAQOZ# $1.0000 LEN . --- 33  ok

TonyB_ · 2020-03-12 16:08

Peter Jakacki wrote: »
After adapting the getptr method I found that there is some peculiar inconsistency.
This is my code:
code LEN ' ( str -- len )
	rdfast	#0, a
	rep	#2, #0	' loop 2 next instructions forever until we branch out
        rfbyte	x wcz
 {c|z}	ijnz	a,#l0	' leave loop but compensate length calc
.l0	getptr   x
 _ret_	subr     a, x	' a = x-a'
 	end

Did you see Evan's post?
http://forums.parallax.com/discussion/comment/1491515/#Comment_1491515

IJNZ is a clever way to save an instruction, which I tried to do this morning but failed. How about this:

code LEN ' ( str -- len )
	mov	b, #.l0
	rdfast	#0, a
	rep	#2, #0	' loop 2 next instructions forever until we branch out
        rfbyte	x wcz
 {c|z}	ijnz	a,b	' leave loop but compensate length calc
.l0	getptr   x
 _ret_	subr     a, x	' a = x-a'
 	end

MOV adds a long that IJNZ removes, though.

TonyB_ · 2020-03-12 16:25

Or since initial a is not being preserved:

code LEN ' ( str -- len )
	rdfast	#0, a
	add	a, #1	' compensate length calc
	rep	#2, #0	' loop 2 next instructions forever until we branch out
        rfbyte	x wcz
 {c|z}	jmp	#\.l0	' leave loop
.l0	getptr   x
 _ret_	subr     a, x	' a = x-a'
 	end

evanh · 2020-03-12 21:14

Another variation

code LEN ' ( str -- len )
	rdfast	#0, a
	rep	#2, #0	' loop 2 next instructions forever until we branch out
        rfbyte	x wcz
 {c|z}	ijnz	a,#l0+2	' leave loop but compensate length calc
.l0	getptr   x
 _ret_	subr     a, x	' a = x-a'
 	end

rogloh · 2020-03-12 21:22

Could a new REP loop break out of an existing REP loop, like this?

code LEN ' ( str -- len )
	rdfast	#0, a
	rep	#2, #0	' loop 2 next instructions forever until we branch out
        rfbyte	x wcz
 {c|z}	rep     #1, #1
   	getptr  x
 _ret_	subr    a, x	' a = x-a'
 	end

evanh · 2020-03-12 21:37

Lol, I haven't tried that one ... tested ... still has the offset. I guess that figures since a new REP is still a relative setup.

EDIT: Ah, eek, it does something extra with the second REP. It seems to concatenate the two REPs together. You get a mix of both.

rogloh · 2020-03-12 21:54

evanh wrote: »

EDIT: Ah, eek, it does something extra with the second REP. It seems to concatenate the two REPs together. You get a mix of both.

Wha?

evanh · 2020-03-12 22:00

In the below snippet:
misc2 increments 4 times
misc3 increments 3 times

		...
		rep	#3, #0
		add	misc2, #1
		nop
		rep	#1, #3
		add	misc3, #1
		nop
		nop
		nop
		...

rogloh · 2020-03-12 22:16

That code is a bit different from what I have, as it is missing the conditional I had there to hold off the second REP until the final character (0 or >127) that breaks out of the loop.

Peter Jakacki · 2020-03-12 22:31

Just a quick check again this morning before I rush out, but using the same routine I get varying results. With a 32 character string it will report a length of 33 at times depending upon the memory area.

See if you can follow what I'm doing (haven't enough time to comment this, but $! will copy a string at an address to an address):

TAQOZ# $1.0000 PRINT$ --- ABCDEFGHIJKLMNOPQRSTUVWXYZ123456 ok
TAQOZ# $1.0000 LEN . --- 32  ok
TAQOZ# $1.0000 $30 DUMP --- 
10000: 41 42 43 44  45 46 47 48  49 4A 4B 4C  4D 4E 4F 50     'ABCDEFGHIJKLMNOP'
10010: 51 52 53 54  55 56 57 58  59 5A 31 32  33 34 35 36     'QRSTUVWXYZ123456'
10020: 00 D8 CA 0B  B0 34 F1 C3  02 B6 40 9C  4A 45 50 63     '.....4....@.JEPc' ok
TAQOZ# $1.0000 $7.8000 $! ---  ok
TAQOZ# $7.8000 PRINT$ --- ABCDEFGHIJKLMNOPQRSTUVWXYZ123456 ok
TAQOZ# $7.8000 LEN . --- 32  ok
TAQOZ# $7.0000 $1000 'A' FILL ---  ok
TAQOZ# 0 $7.0020 C! ---  ok
TAQOZ# $7.0000 LEN . --- 33  ok
TAQOZ# $7.0000 $30 DUMP --- 
70000: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
70010: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
70020: 00 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     '.AAAAAAAAAAAAAAA' ok
TAQOZ# $7.8000 $7.0000 $! ---  ok
TAQOZ# $7.0000 $30 DUMP --- 
70000: 41 42 43 44  45 46 47 48  49 4A 4B 4C  4D 4E 4F 50     'ABCDEFGHIJKLMNOP'
70010: 51 52 53 54  55 56 57 58  59 5A 31 32  33 34 35 36     'QRSTUVWXYZ123456'
70020: 00 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     '.AAAAAAAAAAAAAAA' ok
TAQOZ# $7.0000 LEN . --- 33  ok
TAQOZ# $7.0000 $7.1080 $! ---  ok
TAQOZ# $7.1080 LEN . --- 33  ok
TAQOZ# $7.8000 LEN . --- 32  ok
TAQOZ# $7.8000 $30 DUMP --- 
78000: 41 42 43 44  45 46 47 48  49 4A 4B 4C  4D 4E 4F 50     'ABCDEFGHIJKLMNOP'
78010: 51 52 53 54  55 56 57 58  59 5A 31 32  33 34 35 36     'QRSTUVWXYZ123456'
78020: 00 C8 6A 0C  58 C6 41 C7  00 B2 31 02  1E DD 96 C6     '..j.X.A...1.....' ok

evanh · 2020-03-12 22:45

rogloh wrote: »

That code is a bit different from what I have, as it is missing the conditional I had there to hold off the second REP until the final character (0 or >127) that breaks out of the loop.

It's certainly not going to do what you want.

rogloh · 2020-03-12 22:55

evanh wrote: »

rogloh wrote: »

That code is a bit different from what I have, as it is missing the conditional I had there to hold off the second REP until the final character (0 or >127) that breaks out of the loop.

It's certainly not going to do what you want.

Ok, it sounds like the whole end-of-REP-loop thing is going to be an issue to deal with when exiting loops.
I wonder if using

if_c_or_z   REP #0, #1

would behave any differently?

evanh · 2020-03-12 23:06

That's the same result.

The above examples that Tony and I posted should all work correctly - https://forums.parallax.com/discussion/comment/1491558/#Comment_1491558

Three options:
- Add a compensating block length offset to the relative immediate.
- or, use register direct, which is always an absolute address.
- or, use an instruction that can encode absolute immediate, like JMP.
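The first option can be sketched as plain address arithmetic. This is a hedged Python model, not hardware documentation: it assumes the behaviour described in this thread, namely that a relative branch in the last slot of a cog-exec REP block is applied after REP has already pointed the program counter back at the start of the block. The function name and the addresses (RFBYTE at cog long address 0, the exit label at 2) are hypothetical.

```python
def landing_address(block_start: int, block_len: int, target: int,
                    compensate: bool = False) -> int:
    """Where a relative branch in the LAST slot of a REP block ends up.

    Assumption (from the thread): the assembler encodes the offset relative
    to the instruction after the branch, but the hardware applies it to the
    block start, because REP has already looped the PC back.
    """
    branch_addr = block_start + block_len - 1
    offset = target - (branch_addr + 1)   # what the assembler encodes
    if compensate:
        offset += block_len               # the "+block length" workaround
    return block_start + offset           # assumed hardware behaviour


# Layout of the two-instruction loop: rfbyte at 0, branch at 1, exit label at 2.
print(landing_address(0, 2, 2))                    # 0 -> back on the rfbyte
print(landing_address(0, 2, 2, compensate=True))   # 2 -> the exit label
```

Under that assumption the uncompensated branch lands one block length short of the label, which is why adding the block length (2 here) puts it right. The other two options sidestep the arithmetic entirely because an absolute address does not depend on where the PC currently points.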

Peter Jakacki · 2020-03-13 02:57

Nobody has commented on the weird discrepancies I'm finding. Testing with this routine:

TAQOZ# code LEN ' ( str -- len )
071E0 FC78_0022         rdfast  #0, a         
071E4 FCDC_0400         rep     #2, #0  ' loop 2 next instructions forever until we branch out
071E8 FD78_1610         rfbyte  x wcz
071EC EB8C_45FF   {c|z} ijnz    a,#l0   ' leave loop but compensate length calc
071F0 FD60_1634 .l0     getptr   x
071F4 02C0_440B   _ret_ subr     a, x   ' a = x-a'
                        end ---  ok

(The ijnz forward reference lists before it is resolved but it's fine: TAQOZ# $71EC @ .L --- $EB8C_4400 ok)

Now I store a 32 character string and check it out:

TAQOZ# " AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" $1.0000 $! ---  ok
TAQOZ# $1.0000 $30 DUMP --- 
10000: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
10010: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
10020: 00 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     '.AAAAAAAAAAAAAAA' ok

How long is the 32 character string?

TAQOZ# $1.0000 LEN . --- 33  ok

Wrong!

Try terminating the terminator and see what happens:

TAQOZ# $A4 $1.0021 C! ---  ok
TAQOZ# $1.0000 LEN . --- 32  ok
TAQOZ# 0 $1.0021 C! ---  ok
TAQOZ# $1.0000 LEN . --- 32  ok
TAQOZ# $41 $1.0021 C! ---  ok
TAQOZ# $1.0000 LEN . --- 33  ok

Weird, right?
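One possible reading of these results, sketched as a small Python model. It assumes the REP exit behaviour discussed earlier in the thread: the uncompensated relative IJNZ lands back on the RFBYTE instead of on the exit label, so one extra byte is fetched and IJNZ may run a second time, depending on that byte. The function names and the `mem` layout are illustrative only, and this is a guess at the mechanism, not a confirmed trace of the hardware.

```python
def is_terminator(byte: int) -> bool:
    """rfbyte ... wcz followed by {c|z}: true for 0 (Z) or bit 7 set (C)."""
    return byte == 0 or byte > 127


def intended_len(mem: bytes, start: int) -> int:
    """What LEN is meant to return: bytes before the first terminator."""
    n = 0
    while not is_terminator(mem[start + n]):
        n += 1
    return n


def modelled_len(mem: bytes, start: int) -> int:
    """LEN as posted, under the assumed mis-landing on loop exit."""
    a = start      # register a: string start, incremented by IJNZ
    ptr = start    # FIFO read pointer, as returned by GETPTR
    while True:    # the REP block
        x = mem[ptr]
        ptr += 1
        if is_terminator(x):
            a += 1         # IJNZ increments a, then branches
            break
    # Assumed: the branch lands on RFBYTE again, REP no longer active.
    x = mem[ptr]
    ptr += 1
    if is_terminator(x):
        a += 1             # IJNZ runs again; this time it reaches the label
    return ptr - a         # getptr x ; subr a, x


base = bytes([0x41] * 32 + [0x00])
for following in (0x41, 0x00, 0xA4):
    mem = base + bytes([following] * 4)
    print(hex(following), intended_len(mem, 0), modelled_len(mem, 0))
# 0x41 32 33
# 0x0 32 32
# 0xa4 32 32
```

With this model the result is 33 when the byte after the terminator is an ordinary character and 32 when it is 0 or above 127, which appears to agree with the three pokes above and with the earlier dumps.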

evanh · 2020-03-13 03:07

Peter,
Fix your code as per the discussion, then try again.

Peter Jakacki · 2020-03-13 03:08

Everybody has these variations. I haven't had time to check them all.

rogloh · 2020-03-13 03:14

Try evanh's latest posted version called "Another variation".

rogloh · 2020-03-13 03:30

When that works, you can try the unrolled speedups if you care to spare the extra long(s). In theory it could boost performance by up to 1.6x for really long strings with 4 rfbyte unrolls vs single rfbyte in the loop. Only tiny strings are faster without unrolling.

I charted the speedup gain for 2, 3 and 4 rfbyte unrolls, including the execution time of the subroutine itself (ret and call overhead etc., plus an assumed average rdfast execution time of 15.5 clocks), and got this result.

evanh · 2020-03-13 03:52

Peter Jakacki wrote: »

Everybody has these variations. I haven't had time to check them all.

They're all to work around the same issue. Choose the one that suits you best.

Peter Jakacki · 2020-03-13 07:06

The reality is that strings are normally very short, but the other place I would use this is to scan a 512-byte sector buffer for a terminator. The easy-peasy 6-cycle method is fine for this, but I still want to see what's going on.

@evanh - I can't see how your +2 on the ijnz helps and it definitely locks up when I try it.

code LEN ' ( str -- len )
	rdfast	#0, a
	rep	#2, #0	' loop 2 next instructions forever until we branch out
        rfbyte	x wcz
 {c|z}	ijnz	a,#l0+2	' leave loop but compensate length calc
.l0	getptr   x
 _ret_	subr     a, x	' a = x-a'
 	end

You are essentially saying that it should jump to just after the last instruction, whatever that is.

evanh · 2020-03-13 09:05

Oh, is that code hubexec? Maybe it needs to be +8 instead of +2. Hmm, that'll suck having to customise ...

Peter Jakacki · 2020-03-13 09:24

evanh wrote: »

Oh, is that code hubexec? Maybe it needs to be +8 instead of +2. Hmm, that'll suck having to customise ...

It's compiled in the hub memory but I copy it to cog for this test and run it as there is no way you can use rdfast otherwise, even if rep limps along.

evanh · 2020-03-13 10:27

Ha! Interesting, the offset effect doesn't happen when the REP block is executing in hubram. All seems to be ordinary behaviour.

The +2 should be working. Does the missing dot from the label name matter? Fastspin certainly complains about that.

TonyB_ · 2020-03-13 10:48

Peter Jakacki wrote: »

Everybody has these variations. I haven't had time to check them all.

If you don't like having to adjust the IJNZ jump, here's one I posted earlier that adds two cycles overall:

code LEN ' ( str -- len )
	rdfast	#0, a
	add	a, #1	' compensate length calc
	rep	#2, #0	' loop 2 next instructions forever until we branch out
        rfbyte	x wcz
 {c|z}	jmp	#\.l0	' leave loop
.l0	getptr   x
 _ret_	subr     a, x	' a = x-a'
 	end

rogloh · 2020-03-13 10:56

I noticed that some of the pasted code above uses the number "1" followed by zero instead of a lower-case letter "L" followed by zero for the label. Be careful what you type into the code. It could be crashing if you are doing a jump to #10 (ten) instead of "L" zero.
