TIA - An interactive inline assembler for TAQOZ - Page 2 — Parallax Forums

TIA - An interactive inline assembler for TAQOZ


  • This undocumented behaviour needs to be documented.
  • evanh Posts: 16,027
    Yeah, I think so now too. Originally, Chip had just said don't use a branch as the last instruction of the block. But since this issue has cropped up a few times already, it's clearly going to keep recurring over the life of the Prop2.

    Another alternative to using an absolute branch is to compensate for the REP offset by adding the block length to the relative branch distance.

  • Yanomani Posts: 1,524
    edited 2020-03-12 07:48
    Using GETPTR instead of GETCT should shave off two instructions, and it doesn't require STALLI and ALLOWI:
                      rdfast   #0, a
                      rep      #2, #0 ' loop 2 next instructions forever until we branch out
                      rfbyte   x wcz
       if_z_or_c      jmp      #exit  ' leave loop
    exit              getptr   b
                      sub      b,a
                      sub      b, #2 ' adjust length for extra rfbyte, done just before loop exits
    
    P.S. my fault...
    xor a,b
    xor b,a
    xor a,b
                         getptr    a
                         ret            ' now, on ret, a contains next string start address
    '                                  b contains last string length, excluding end==0 or >127
    '                                  c and z would reflect the value of next string first character,
    '                                  useful to get each string length, from a list terminated by a null
    '                                  string, which uses the other possible terminator char.
    

    I hope I've got it right after adapting it in light of evanh's warning about the extra pass caused by using a relative branch to exit the REP block, and after extracting another useful result or two.

    "If you only have some sausage and bread, then be sure to make the best sausage sandwich you can." :lol:
  • Using GETPTR instead of GETCT should shave off two instructions, and it doesn't require STALLI and ALLOWI:
    		rdfast	#0,a
    		rep	#2,#0		' loop 2 next instructions forever until we branch out
    		rfbyte	x	wcz
      if_z_or_c	jmp	#exit		' leave loop
    exit		getptr	b
    		sub	b,a
    		sub	b,#1		' don't count trailing null byte
    		ret
    

    Is there any reason not to use _ret_ ?

    It's possible to replace jmp with execf to jump with skipping, if the latter is ever needed.
  • Peter Jakacki Posts: 10,193
    edited 2020-03-12 15:16
    After adapting the getptr method I found that there is some peculiar inconsistency.
    This is my code:
    code LEN ' ( str -- len )
    	rdfast	#0, a
    	rep	#2, #0	' loop 2 next instructions forever until we branch out
            rfbyte	x wcz
     {c|z}	ijnz	a,#l0	' leave loop but compensate length calc
    .l0	getptr   x
     _ret_	subr     a, x	' a = x-a'
     	end
    

    I then load this into cog memory and create an alias LEN that points to this code. Now I try it out on a string:
    TAQOZ# " ABCDEFGHIJKLMNOPQRSTUVWXYZ" LEN . --- 26  ok
    

    So I tried it on a 64kB block by first filling the block with a valid character and then terminating and checking it.
    TAQOZ# $1.0000 $1.0000 'A' FILL  ---  ok
    TAQOZ# 0 $2.0000 C! ---  ok
    TAQOZ# $1.FFF0 $20 DUMP --- 
    1FFF0: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
    20000: 00 5A 82 53  C3 A7 C7 A2  86 89 28 D8  E5 1E E1 E1     '.Z.S......(.....' ok
    
    Then I run it, expecting 65536:
    TAQOZ# $1.0000 LEN . --- 65537  ok
    

    Timing-wise, it works out to 4 cycles per character, plus some overhead:
    TAQOZ# $1.0000 LAP LEN LAP .LAP --- 262,208 cycles= 1,311,040ns @200MHz ok
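
A rough cross-check of those figures, as a sketch only. The 5 ns clock period follows from the 200 MHz shown in the output. Treating the reported length of 65537 as the number of passes through the loop, and calling the remainder "overhead", is an assumption made here, not something TAQOZ reports.

```python
# Cross-check of the LAP figures quoted above (sketch, not a TAQOZ feature).
CLOCK_HZ = 200_000_000      # from "@200MHz" in the LAP output
CYCLES_PER_CHAR = 4         # rfbyte + untaken conditional branch, as stated in the post
reported_cycles = 262_208   # from the LAP output
reported_len = 65_537       # from the LEN result on the 64kB block

def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to nanoseconds at the given clock."""
    return cycles * 1_000_000_000 // clock_hz

# Assumption: one loop pass per reported byte; everything else is overhead.
loop_cycles = reported_len * CYCLES_PER_CHAR
overhead_cycles = reported_cycles - loop_cycles

print(cycles_to_ns(reported_cycles))   # total time in ns
print(loop_cycles, overhead_cycles)    # loop share and remainder
```

Under that assumption the loop accounts for 262,148 cycles and about 60 cycles remain for rdfast, the call and return, and the LAP measurement itself.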
    

    edit: It seems to have to do with the memory area. I've dropped right down to 32 bytes, terminated, at $10000 and compared that against a 32-byte string.
    TAQOZ# $1.0000 $40 DUMP --- 
    10000: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
    10010: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
    10020: 00 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     '.AAAAAAAAAAAAAAA'
    10030: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA' ok
    TAQOZ#  ---  ok
    TAQOZ# " ABCDEFGHIJKLMNOPQRSTUVWXYZ123456" LEN . --- 32  ok
    TAQOZ# $1.0000 LEN . --- 33  ok
    
  • TonyB_ Posts: 2,193
    edited 2020-03-12 16:26
    After adapting the getptr method I found that there is some peculiar inconsistency.
    This is my code:
    code LEN ' ( str -- len )
    	rdfast	#0, a
    	rep	#2, #0	' loop 2 next instructions forever until we branch out
            rfbyte	x wcz
     {c|z}	ijnz	a,#l0	' leave loop but compensate length calc
    .l0	getptr   x
     _ret_	subr     a, x	' a = x-a'
     	end
    
    Did you see Evan's post?
    http://forums.parallax.com/discussion/comment/1491515/#Comment_1491515

    IJNZ is a clever way to save an instruction, which I tried to do this morning but failed. How about this:
    code LEN ' ( str -- len )
    	mov	b, #.l0
    	rdfast	#0, a
    	rep	#2, #0	' loop 2 next instructions forever until we branch out
            rfbyte	x wcz
     {c|z}	ijnz	a,b	' leave loop but compensate length calc
    .l0	getptr   x
     _ret_	subr     a, x	' a = x-a'
     	end
    
    MOV adds a long that IJNZ removes, though.
  • TonyB_ Posts: 2,193
    edited 2020-03-13 11:03
    Or since initial a is not being preserved:
    code LEN ' ( str -- len )
    	rdfast	#0, a
    	add	a, #1	' compensate length calc
    	rep	#2, #0	' loop 2 next instructions forever until we branch out
            rfbyte	x wcz
     {c|z}	jmp	#\.l0	' leave loop
    .l0	getptr   x
     _ret_	subr     a, x	' a = x-a'
     	end
    
  • evanh Posts: 16,027
    Another variation
    code LEN ' ( str -- len )
    	rdfast	#0, a
    	rep	#2, #0	' loop 2 next instructions forever until we branch out
            rfbyte	x wcz
     {c|z}	ijnz	a,#l0+2	' leave loop but compensate length calc
    .l0	getptr   x
     _ret_	subr     a, x	' a = x-a'
     	end
    
  • Could a new REP loop break out of an existing REP loop, like this?
    code LEN ' ( str -- len )
    	rdfast	#0, a
    	rep	#2, #0	' loop 2 next instructions forever until we branch out
            rfbyte	x wcz
     {c|z}	rep     #1, #1
       	getptr  x
     _ret_	subr    a, x	' a = x-a'
     	end
    
  • evanh Posts: 16,027
    edited 2020-03-12 21:51
    Lol, I haven't tried that one ... tested ... still has the offset. I guess that figures since a new REP is still a relative setup.

    EDIT: Ah, eek, it does something extra with the second REP. It seems to concatenate the two REPs together. You get a mix of both.
  • evanh wrote: »
    EDIT: Ah, eek, it does something extra with the second REP. It seems to concatenate the two REPs together. You get a mix of both.

    Wha?
  • evanh Posts: 16,027
    In the below snippet:
    misc2 increments 4 times
    misc3 increments 3 times
    		...
    		rep	#3, #0
    		add	misc2, #1
    		nop
    		rep	#1, #3
    		add	misc3, #1
    		nop
    		nop
    		nop
    		...
    
  • That code is a bit different from what I have: it is missing the conditional I had there to hold off the second REP until the final character, the one matching 0 or >127, which is the case that breaks out.
  • Peter Jakacki Posts: 10,193
    edited 2020-03-12 22:31
    Just a quick check again this morning before I rush out, but using the same routine I get varying results. With a 32 character string it will report a length of 33 at times depending upon the memory area.

    See if you can follow what I'm doing (haven't enough time to comment this, but $! will copy a string at an address to an address):
    TAQOZ# $1.0000 PRINT$ --- ABCDEFGHIJKLMNOPQRSTUVWXYZ123456 ok
    TAQOZ# $1.0000 LEN . --- 32  ok
    TAQOZ# $1.0000 $30 DUMP --- 
    10000: 41 42 43 44  45 46 47 48  49 4A 4B 4C  4D 4E 4F 50     'ABCDEFGHIJKLMNOP'
    10010: 51 52 53 54  55 56 57 58  59 5A 31 32  33 34 35 36     'QRSTUVWXYZ123456'
    10020: 00 D8 CA 0B  B0 34 F1 C3  02 B6 40 9C  4A 45 50 63     '.....4....@.JEPc' ok
    TAQOZ# $1.0000 $7.8000 $! ---  ok
    TAQOZ# $7.8000 PRINT$ --- ABCDEFGHIJKLMNOPQRSTUVWXYZ123456 ok
    TAQOZ# $7.8000 LEN . --- 32  ok
    TAQOZ# $7.0000 $1000 'A' FILL ---  ok
    TAQOZ# 0 $7.0020 C! ---  ok
    TAQOZ# $7.0000 LEN . --- 33  ok
    TAQOZ# $7.0000 $30 DUMP --- 
    70000: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
    70010: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
    70020: 00 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     '.AAAAAAAAAAAAAAA' ok
    TAQOZ# $7.8000 $7.0000 $! ---  ok
    TAQOZ# $7.0000 $30 DUMP --- 
    70000: 41 42 43 44  45 46 47 48  49 4A 4B 4C  4D 4E 4F 50     'ABCDEFGHIJKLMNOP'
    70010: 51 52 53 54  55 56 57 58  59 5A 31 32  33 34 35 36     'QRSTUVWXYZ123456'
    70020: 00 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     '.AAAAAAAAAAAAAAA' ok
    TAQOZ# $7.0000 LEN . --- 33  ok
    TAQOZ# $7.0000 $7.1080 $! ---  ok
    TAQOZ# $7.1080 LEN . --- 33  ok
    TAQOZ# $7.8000 LEN . --- 32  ok
    TAQOZ# $7.8000 $30 DUMP --- 
    78000: 41 42 43 44  45 46 47 48  49 4A 4B 4C  4D 4E 4F 50     'ABCDEFGHIJKLMNOP'
    78010: 51 52 53 54  55 56 57 58  59 5A 31 32  33 34 35 36     'QRSTUVWXYZ123456'
    78020: 00 C8 6A 0C  58 C6 41 C7  00 B2 31 02  1E DD 96 C6     '..j.X.A...1.....' ok
    
  • evanh Posts: 16,027
    rogloh wrote: »
    That code is a bit different from what I have: it is missing the conditional I had there to hold off the second REP until the final character, the one matching 0 or >127, which is the case that breaks out.
    It's certainly not going to do what you want.

  • evanh wrote: »
    rogloh wrote: »
    That code is a bit different from what I have: it is missing the conditional I had there to hold off the second REP until the final character, the one matching 0 or >127, which is the case that breaks out.
    It's certainly not going to do what you want.
    OK, it sounds like the whole end-of-REP-loop behaviour is going to be an issue to deal with when exiting loops.
    I wonder if using
    if_c_or_z   REP #0, #1
    

    would behave any differently?
  • evanh Posts: 16,027
    edited 2020-03-12 23:16
    That's the same result.

    The above examples that Tony and I posted should all work correctly - https://forums.parallax.com/discussion/comment/1491558/#Comment_1491558

    Three options:
    - Add a compensating block length offset to the relative immediate.
    - or, use register direct, which is always an absolute address.
    - or, use an instruction that can encode absolute immediate, like JMP.
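
The arithmetic behind the first option can be sketched as follows. This is a minimal illustration, assuming cog execution (addresses counted in longs) and a relative immediate measured from the instruction after the branch. The function names and the cog addresses are hypothetical, chosen only to mirror the rep / rfbyte / ijnz / .l0 layout used in the thread.

```python
# Sketch of "add the block length to the relative branch distance".
# Assumptions: cog exec, long addressing, relative to the next instruction.

def relative_operand(branch_addr, target_addr):
    """Plain relative distance from the instruction after the branch."""
    return target_addr - (branch_addr + 1)

def compensated_operand(branch_addr, target_addr, rep_block_len):
    """Relative distance with the REP block length added as compensation."""
    return relative_operand(branch_addr, target_addr) + rep_block_len

# Hypothetical layout: rep at $101, rfbyte at $102, ijnz at $103, .l0 at $104.
IJNZ_ADDR = 0x103
L0_ADDR = 0x104
REP_BLOCK_LEN = 2   # rep #2, #0 covers rfbyte and ijnz

print(relative_operand(IJNZ_ADDR, L0_ADDR))                    # uncompensated
print(compensated_operand(IJNZ_ADDR, L0_ADDR, REP_BLOCK_LEN))  # as in "+2"
```

With a two-instruction block this gives the same adjustment as writing the branch target as the label plus 2.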
  • Nobody has commented on the weird discrepancies I'm finding. Testing with this routine:
    TAQOZ# code LEN ' ( str -- len )
    071E0 FC78_0022         rdfast  #0, a         
    071E4 FCDC_0400         rep     #2, #0  ' loop 2 next instructions forever until we branch out
    071E8 FD78_1610         rfbyte  x wcz
    071EC EB8C_45FF   {c|z} ijnz    a,#l0   ' leave loop but compensate length calc
    071F0 FD60_1634 .l0     getptr   x
    071F4 02C0_440B   _ret_ subr     a, x   ' a = x-a'
                            end ---  ok
    
    (The ijnz forward reference lists before it is resolved but it's fine: TAQOZ# $71EC @ .L --- $EB8C_4400 ok)

    Now I store a 32 character string and check it out:
    TAQOZ# " AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" $1.0000 $! ---  ok
    TAQOZ# $1.0000 $30 DUMP --- 
    10000: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
    10010: 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     'AAAAAAAAAAAAAAAA'
    10020: 00 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41     '.AAAAAAAAAAAAAAA' ok
    
    How long is the 32 character string?
    TAQOZ# $1.0000 LEN . --- 33  ok
    
    Wrong!

    Try terminating the terminator and see what happens:
    TAQOZ# $A4 $1.0021 C! ---  ok
    TAQOZ# $1.0000 LEN . --- 32  ok
    TAQOZ# 0 $1.0021 C! ---  ok
    TAQOZ# $1.0000 LEN . --- 32  ok
    TAQOZ# $41 $1.0021 C! ---  ok
    TAQOZ# $1.0000 LEN . --- 33  ok
    
    Weird, right?
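
One reading that is consistent with the results above is sketched below. It is an assumption, not confirmed silicon behaviour: it supposes the relative branch out of the REP block lands two instructions early, back on the rfbyte, with REP no longer active, so one extra byte is read and the ijnz is evaluated once more against that byte. The function names are hypothetical.

```python
# Hypothetical model of the uncompensated LEN routine run from cog memory.

def is_terminator(byte):
    """rfbyte ... wcz: Z for a zero byte, C for bit 7 set (0 or >127)."""
    return byte == 0 or byte > 127

def modelled_len(mem, start=0):
    a = start          # string start, incremented by each executed ijnz
    ptr = start        # FIFO read pointer, as later returned by getptr
    # REP loop: rfbyte, then {c|z} ijnz
    while True:
        byte = mem[ptr]
        ptr += 1
        if is_terminator(byte):
            a += 1     # ijnz increments a, then branches
            break
    # Assumed extra pass: the branch lands on rfbyte again, outside REP.
    byte = mem[ptr]
    ptr += 1
    if is_terminator(byte):
        a += 1         # ijnz runs again and this time reaches .l0
    return ptr - a     # getptr x ; subr a, x

text = [0x41] * 32
for following in (0xA4, 0x00, 0x41):
    print(hex(following), modelled_len(text + [0x00, following]))
```

Under this model the result depends on the byte after the terminator: $A4 and 0 give 32, while $41 gives 33, which matches the session above. It also gives 65537 for the 64kB block, where the byte after the terminator is $5A.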
  • evanh Posts: 16,027
    Peter,
    Fix your code as per the discussion, then try again.
  • Everybody has these variations. I haven't had time to check them all.
  • Try evanh's latest posted version called "Another variation".
  • rogloh Posts: 5,837
    edited 2020-03-13 03:41
    When that works, you can try the unrolled speedups if you care to spare the extra long(s). In theory it could boost performance by up to 1.6x for really long strings with 4 rfbyte unrolls vs a single rfbyte in the loop. Only tiny strings are faster without unrolling.

    I charted the speedup gain for 2, 3 and 4 rfbyte unrolls, including the execution time of the subroutine itself (ret and call overhead etc., plus an assumed average rdfast execution time of 15.5 clocks), and got this result.


    [Image: speedup.png, chart of the speedup gain for 2, 3 and 4 rfbyte unrolls]
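
A back-of-envelope version of the 1.6x figure, as a sketch under stated assumptions. It takes the 4 cycles per character measured earlier in the thread and asks what per-character cost a 1.6x limit implies. The finite-length function uses only the 15.5-clock rdfast average as fixed overhead for both versions, which is a simplification: it ignores call and ret and any extra setup the unrolled version needs, so it cannot show the crossover for tiny strings. Function names are hypothetical.

```python
# Rough model of the unrolling speedup (assumptions labelled inline).

def implied_cycles_per_char(base_cycles_per_char, speedup):
    """Per-character cost implied by an asymptotic speedup factor."""
    return base_cycles_per_char / speedup

def modelled_speedup(n_chars, base=4.0, unrolled=2.5, fixed=15.5):
    """Speedup for a string of n_chars, assuming the same fixed overhead
    (the 15.5-clock rdfast average) for both versions."""
    return (fixed + base * n_chars) / (fixed + unrolled * n_chars)

print(implied_cycles_per_char(4, 1.6))   # cost per character at the 1.6x limit
for n in (32, 512, 65536):
    print(n, round(modelled_speedup(n), 3))
```

The limit works out to about 2.5 cycles per character, and the modelled speedup approaches 1.6 from below as the string gets longer.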
  • evanh Posts: 16,027
    everybody has these variations. I haven't had time to check them all.
    They're all to work around the same issue. Choose the one that suits you best.

  • Peter Jakacki Posts: 10,193
    edited 2020-03-13 07:06
    The reality is that strings are normally very short, but the other place I would use this is to scan a 512-byte sector buffer for a terminator. The easy-peasy 6-cycle method is fine for this, but I still want to see what's going on.

    @evanh - I can't see how your +2 on the ijnz helps and it definitely locks up when I try it.
    code LEN ' ( str -- len )
    	rdfast	#0, a
    	rep	#2, #0	' loop 2 next instructions forever until we branch out
            rfbyte	x wcz
     {c|z}	ijnz	a,#l0+2	' leave loop but compensate length calc
    .l0	getptr   x
     _ret_	subr     a, x	' a = x-a'
     	end
    

    You are essentially saying that it should jump to just after the last instruction, whatever that is.
  • evanh Posts: 16,027
    Oh, is that code hubexec? Maybe it needs to be +8 instead of +2. Hmm, that'll suck having to customise ...

  • evanh wrote: »
    Oh, is that code hubexec? Maybe it needs to be +8 instead of +2. Hmm, that'll suck having to customise ...
    It's compiled into hub memory, but for this test I copy it to the cog and run it there, as there is no way you can use rdfast otherwise, even if rep limps along.

  • evanh Posts: 16,027
    edited 2020-03-13 10:28
    Ha! Interesting, the offset effect doesn't happen when the REP block is executing in hubram. All seems to be ordinary behaviour.

    The +2 should be working. Does the missing dot from the label name matter? Fastspin certainly complains about that.

  • TonyB_ Posts: 2,193
    edited 2020-03-13 11:02
    everybody has these variations. I haven't had time to check them all.

    If you don't like having to adjust the IJNZ jump, here's one I posted earlier that adds two cycles overall:
    code LEN ' ( str -- len )
    	rdfast	#0, a
    	add	a, #1	' compensate length calc
    	rep	#2, #0	' loop 2 next instructions forever until we branch out
            rfbyte	x wcz
     {c|z}	jmp	#\.l0	' leave loop
    .l0	getptr   x
     _ret_	subr     a, x	' a = x-a'
     	end
    
  • I noticed that some of the code pasted above uses the digit "1" followed by zero instead of a lower-case letter "L" followed by zero for the label. Be careful what you type into the code. It could be crashing if you are doing a jump to #10 (ten) instead of "L" zero.