Shop OBEX P1 Docs P2 Docs Learn Events
Fast Bytecode Interpreter - Page 26 — Parallax Forums

Fast Bytecode Interpreter

12426282930

Comments

  • evanhevanh Posts: 15,858
    edited 2017-09-02 03:12
    Dave,
    Intriguing, so the FIFO stops being used when the code path enters CogExec, the FIFO is then being used for a buffered block write, then automatically refills with HubExec code when the RET is hit.

    There's probably a FIFO flush delay at this point but "That's so easy!" :) Now for the block copy version ... it'll need to chop it up into chunks like Chip has.
  • cgraceycgracey Posts: 14,134
    edited 2017-09-02 03:18
    One nice thing about SETQ+RDLONG/WRLONG is that the FIFO is not used. Data is streamed directly between the cog and the hub at one clock per long. FIFO status and mode are unchanged.
  • evanhevanh Posts: 15,858
    So many ways in the Prop2.

    I've been using Dave's comport code for my testing. It's simple bit-bashing code (presumably been like that since Prop2Hot days) but it saved me going through the trial and error effort myself. Had to hack it around a little to remove the GCC parts. I wanted all the printf() code too but that was just too hard to extract. So I settled for making my own itoa() that uses the putch() primitive.
  • cgraceycgracey Posts: 14,134
    edited 2017-09-02 04:59
    Dave Hein wrote: »
    Chip, that looks great. However, I wonder if you could get a speed improvement by separating xxxxFILL from xxxxMOVE. Also, because the P2 does unaligned reads and writes the BYTE and WORD operations could be almost as fast as the LONG operations. It just requires performing 0 to 3 BYTE accesses at the beginning, and then doing LONG accesses after that. This is the code I used for memset() in p2gcc, which is basically the same as BYTEFILL() in Spin.
    ' This code resides in cog RAM
    __LONGFILL
            wrfast  #0, r0
            rep     #1, r2
            wflong  r1
            ret
    
    // This code resides in hub RAM
    // ptr, val and num are in r0, r1 and r2
    void memset(void *ptr, int val, int num)
    {
        __asm__("       mov     r3, #3");
        __asm__("       and     r3, r2 wz");
        __asm__(" if_z  jmp     #label1");
        __asm__("label2 wrbyte  r1, r0");
        __asm__("       add     r0, #1");
        __asm__("       djnz    r3, #label2");
        __asm__("label1 setnib  r1, r1, #1");
        __asm__("       setbyte r1, r1, #1");
        __asm__("       setword r1, r1, #1");
        __asm__("       shr     r2, #2 wz");
        __asm__(" if_nz call    #\\__LONGFILL");
    }
    
    I could speed it up a bit by moving all the code to cog RAM, but p2gcc C code currently calls functions that only reside in hub RAM.

    Dave, there was a lot of overlap between the move and fill functions. So, I melded them together. I didn't want one full-length SETQ+WRLONG fill, since interrupt latency would have been compromised.

    If you don't care about interrupt latency, you can do one SETQ+WRLONG combo, using a constant for the WRLONG's D. It will fill at the rate of 1 long per clock.
  • I'll have to change it to use SETQ+WRLONG. Since it doesn't use the FIFO I should be able to do this from hub RAM and I won't need to use cog RAM. I guess if I need to worry about interrupt latency in the future I could break the SETQ+WRLONG into to smaller chunks.
  • cgraceycgracey Posts: 14,134
    Work on the interpreter is going slowly, but almost done. It is really challenging for me because there are so many subtle things that can be done to create optimal code, that it's enough to drive one nuts.

    We almost need some kind of compiler that can optimize variable placement and coordinate it with SETQ+RDLONG activity to get the big speed improvements possible.

    I've been thinking for years now that it would be really neat to have a tool where you can define some kind of outcome that you want, based on inputs, and the tool, using random instruction selection and simulation, comes up with really efficient solutions that would have been too difficult for a human to produce in as timely a way.
  • jmgjmg Posts: 15,171
    cgracey wrote: »
    Work on the interpreter is going slowly, but almost done. It is really challenging for me because there are so many subtle things that can be done to create optimal code, that it's enough to drive one nuts.

    We almost need some kind of compiler that can optimize variable placement and coordinate it with SETQ+RDLONG activity to get the big speed improvements possible.

    I've been thinking for years now that it would be really neat to have a tool where you can define some kind of outcome that you want, based on inputs, and the tool, using random instruction selection and simulation, comes up with really efficient solutions that would have been too difficult for a human to produce in as timely a way.

    So you mean sort of like a Place & Route in FPGA, only for Opcodes... ?
    Could be interesting to try, but likely to quickly become exponentially slower, as the task increases.

    Easier could be more-automated tools for your masked skip opcodes.
    ie fairly simple ones that re-create masks as lines are added and removed.
  • Chip, that's interesting...a Monte Carlo approach to coding.

    https://en.m.wikipedia.org/wiki/Monte_Carlo_method
  • cgraceycgracey Posts: 14,134
    edited 2017-09-07 14:28
    Yes, it seems almost like an FPGA compilation problem, but with a larger-grain and a more-random set of building blocks that cover a different set of functions. It's like chess, whereas FPGA's are more like Legos.

    Yesterday, I had the problem of needing to find if a register held a value that was between two other random register values. I can do it in six instructions, but there must be a simpler solution. I think it would be reasonable to have a compiler come up with a solution for that simple of a problem.
  • cgraceycgracey Posts: 14,134
    Jeff Haas wrote: »
    Chip, that's interesting...a Monte Carlo approach to coding.

    https://en.m.wikipedia.org/wiki/Monte_Carlo_method

    That didn't occur to me, at first, but that would be the best way to do it. You could eliminate bad combos right away with that approach. A full parametric sweep would be impossible.
  • cgraceycgracey Posts: 14,134
    Maybe the simplest way to make such a thing is to give the tool some code sequence that works, and have it try to find a shorter/faster sequence. That way, there's less abstraction needed on defining the goal. It just simulates the model while it tries to find a better one.
  • cgracey wrote: »
    Maybe the simplest way to make such a thing is to give the tool some code sequence that works, and have it try to find a shorter/faster sequence. That way, there's less abstraction needed on defining the goal. It just simulates the model while it tries to find a better one.

    Sounds like your moving toward genetic algorithms :-)
  • cgraceycgracey Posts: 14,134
    edited 2017-09-07 20:42
    The fastest way to build this and the fastest way to run it, is on the prop2, itself. No need for a simulator, then:

    1) generate new random input values
    2) run 'model' routine with generated random input values, capture outputs
    4) run 'test' routine with generated random input values, capture outputs
    5) compare outputs and iterate 'test' routine if mismatch
    6) repeat until n cycles with no mismatch
  • isn't that exactly what neuronal networks do?

    provide them with input-data and a goal output to train them?

    nothing new here

    Mike
  • msrobots wrote: »
    isn't that exactly what neuronal networks do?
    Mike

    No, thats exactly what Genetic Algorithms do.

    "The genetic algorithm is a method for solving both constrained and unconstrained optimization problems that is based on natural selection, the process that drives biological evolution. The genetic algorithm repeatedly modifies a population of individual solutions."
  • cgracey wrote: »
    Yesterday, I had the problem of needing to find if a register held a value that was between two other random register values. I can do it in six instructions, but there must be a simpler solution. I think it would be reasonable to have a compiler come up with a solution for that simple of a problem.

    Hi Chip,
    I would like to show you a solution that I came up with to your above problem. I have tested it pretty thoroughly
    using Cluso's debugger, so I hope that it helps.
    InRange       cmps n,r1  wc,wz        
    IF_NC_AND_NZ  cmps r2,n  wc,wz            
    IF_NC         jmp #InRange_ret        
                  cmps r1,n  wc,wz        
    IF_NC_AND_NZ  cmps n,r2  wc,wz       
    InRange_ret   ret
    

    HydraHacker

  • cgraceycgracey Posts: 14,134
    cgracey wrote: »
    Yesterday, I had the problem of needing to find if a register held a value that was between two other random register values. I can do it in six instructions, but there must be a simpler solution. I think it would be reasonable to have a compiler come up with a solution for that simple of a problem.

    Hi Chip,
    I would like to show you a solution that I came up with to your above problem. I have tested it pretty thoroughly
    using Cluso's debugger, so I hope that it helps.
    InRange       cmps n,r1  wc,wz        
    IF_NC_AND_NZ  cmps r2,n  wc,wz            
    IF_NC         jmp #InRange_ret        
                  cmps r1,n  wc,wz        
    IF_NC_AND_NZ  cmps n,r2  wc,wz       
    InRange_ret   ret
    

    HydraHacker

    Neat! I was wondering if something like this was possible - checking the range in two directions. It hurt my head so bad that I couldn't get on top of it. Looks like you got there, though.

    Meanwhile, here is what I came up with (comments are first, to state the goal of the code):
    '
    '
    ' LOOKDOWN(target : ,,,value1..value2,,,)
    '
    '  if value1 <= value2
    '    delta = value2 - value1
    '    if value1 <= target <= value2
    '      result = - value1 + index + target
    '      branch to address
    '    else index += delta + 1
    '
    '  if value1 > value2
    '    delta = value1 - value2
    '    if value2 <= target <= value1
    '      result = value1 + index - target
    '      branch to address
    '    else index += delta + 1
    '
    '
    '  x		value2		x
    '  ptra[-1]	value1		t
    '  ptra[-2]	index		u
    '  ptra[-3]	target		v
    '  ptra[-4]	address		w
    '
    '
    lookdnr		setq	#3		'pop value1+index+target+address into t+u+v+w
    		rdlong	w,--ptra[4]
    
    		cmps	x,t	wc	'z = (value1 > value2)
    		modcz	0,_c	wz
    
      if_nz		cmps	v,t	wc	'if nz, nc = (value1 <= target <= value2)
      if_nz_and_nc	cmps	x,v	wc
    
      if_z		cmps	v,x	wc	'if z, nc = (value2 <= target <= value1)
      if_z_and_nc	cmps	t,v	wc
    
      if_nc		negnz	x,t		'if target within range,
      if_nc		add	x,u		'..get result on stack top
      if_nc		sumz	x,v
      if_nc		jmp	#casex		'..and branch
    
      if_nz		sub	x,t		'else, get new index on stack top
      if_z		subr	x,t
    		addx	x,u		'(c=1, results in +1)
    
      _ret_		add	ptra,#2*4	'..and unpop address+target
    

    This code gets the direction of the compare into Z, so that it can still be used after C indicates if we are in range, or not.
  • cgracey wrote: »
    cgracey wrote: »
    Yesterday, I had the problem of needing to find if a register held a value that was between two other random register values. I can do it in six instructions, but there must be a simpler solution. I think it would be reasonable to have a compiler come up with a solution for that simple of a problem.

    Hi Chip,
    I would like to show you a solution that I came up with to your above problem. I have tested it pretty thoroughly
    using Cluso's debugger, so I hope that it helps.
    InRange       cmps n,r1  wc,wz        
    IF_NC_AND_NZ  cmps r2,n  wc,wz            
    IF_NC         jmp #InRange_ret        
                  cmps r1,n  wc,wz        
    IF_NC_AND_NZ  cmps n,r2  wc,wz       
    InRange_ret   ret
    

    HydraHacker

    Neat! I was wondering if something like this was possible - checking the range in two directions. It hurt my head so bad that I couldn't get on top of it. Looks like you got there, though.

    Meanwhile, here is what I came up with (comments are first, to state the goal of the code):
    '
    '
    ' LOOKDOWN(target : ,,,value1..value2,,,)
    '
    '  if value1 <= value2
    '    delta = value2 - value1
    '    if value1 <= target <= value2
    '      result = - value1 + index + target
    '      branch to address
    '    else index += delta + 1
    '
    '  if value1 > value2
    '    delta = value1 - value2
    '    if value2 <= target <= value1
    '      result = value1 + index - target
    '      branch to address
    '    else index += delta + 1
    '
    '
    '  x		value2		x
    '  ptra[-1]	value1		t
    '  ptra[-2]	index		u
    '  ptra[-3]	target		v
    '  ptra[-4]	address		w
    '
    '
    lookdnr		setq	#3		'pop value1+index+target+address into t+u+v+w
    		rdlong	w,--ptra[4]
    
    		cmps	x,t	wc	'z = (value1 > value2)
    		modcz	0,_c	wz
    
      if_nz		cmps	v,t	wc	'if nz, nc = (value1 <= target <= value2)
      if_nz_and_nc	cmps	x,v	wc
    
      if_z		cmps	v,x	wc	'if z, nc = (value2 <= target <= value1)
      if_z_and_nc	cmps	t,v	wc
    
      if_nc		negnz	x,t		'if target within range,
      if_nc		add	x,u		'..get result on stack top
      if_nc		sumz	x,v
      if_nc		jmp	#casex		'..and branch
    
      if_nz		sub	x,t		'else, get new index on stack top
      if_z		subr	x,t
    		addx	x,u		'(c=1, results in +1)
    
      _ret_		add	ptra,#2*4	'..and unpop address+target
    

    This code gets the direction of the compare into Z, so that it can still be used after C indicates if we are in range, or not.

    Thanks chip for looking at my code, I really busted my brain! Trying to make a range check that works in two directions. I’ll add some comments soon to my code, I just have to figure out a good way to explain it. Chip your code really looks good and because you write such efficient code sometimes I have trouble understanding what you wrote :-)

    HydraHacker
  • Hi Chip,

    I have documented my code that performs a range check in two directions.
    dat
    
    {
    if n>=r1                r1<r2 direction
       if n<=r2
          return true       n is in range of r1 and r2
    elseif n<=r1            r1>r2 direction
       if n>=r2
          return true       n is in range of r1 and r2
    else
       return false         n is not in range of r1 and r2
    }
        
    InRange       cmps n,r1  wz,wc     ' checks for n>=r1   
    IF_NZ_AND_NC  cmps r2,n  wz,wc     ' if n>=r1 then check if n<=r2       
    IF_NC         jmp #InRange_ret     ' if n>=r1 and n<=r2 then exit    
                  cmps r1,n  wz,wc     ' checks for n<=r1   
    IF_NZ_AND_NC  cmps n,r2  wz,wc     ' if n<=r1 then check if n>=r2   
    InRange_ret   ret
    
    ' defines what flags mean upon subroutine exit
    
    ' z=1&c=0 means that n is equal to r1 or n is equal to r2
    ' z=0&c=0 means that n is between r1 and r2, but does not include r1 or r2
    ' z=0&c=1 means that n is not in range!
    
  • cgraceycgracey Posts: 14,134
    edited 2017-10-08 22:06
    Hydrahacker,

    Thanks for documenting that. Looks great! Stuff like this is hard to think about.

    If that code is used as a subroutine, it could be quickened a little:
        
    InRange       cmps n,r1  wz,wc     ' checks for n>=r1   
    IF_NZ_AND_NC  cmps r2,n  wz,wc     ' if n>=r1 then check if n<=r2       
    IF_NC         ret                  ' if n>=r1 and n<=r2 then exit    
    
                  cmps r1,n  wz,wc     ' checks for n<=r1   
    IF_NZ_AND_NC  cmps n,r2  wz,wc     ' if n<=r1 then check if n>=r2   
                  ret
    
  • cgraceycgracey Posts: 14,134
    edited 2018-02-01 09:54
    I ran into a limitation while working on Spin2 with XBYTE...

    XBYTE uses SETQ to establish the bytecode base and LSB/MSB indexing, usually at the start of bytecode execution:
    _RET_   SETQ    #$000           'set bytecode base to $000 in LUT, stack = $1F8, do XBYTE
    

    After that, _RET_+Instruction's are done at the end of each code snippet to execute the next bytecode.

    What if you have an alternate set of bytecodes, though, that you want to execute through a portal bytecode? You could do a new _RET_+SETQ, but then you permanently change the base and MSB/LSB indexing for subsequent _RET_+instruction's.

    There needs to be a way to do a temporary base-and-MSB/LSB-indexing change that only works on the next bytecode, so that alternate sets of bytecodes could be used to run lower-usage code, without any need for a restorative _RET_+SETQ instruction later.

    SETQ2 will now serve this purpose:
    _RET_   SETQ2   #$100           'set bytecode base to $100 in LUT for next bytecode only, stack = $1F8, do XBYTE
    

    This will add 10 flops to each cog, but make alternate bytecode sets easy to engage without the need for post clean-up.
  • Chip,
    Very cool, that will make emulation of some CPUs and VMs nicer too!
  • cgraceycgracey Posts: 14,134
    Roy Eltham wrote: »
    Chip,
    Very cool, that will make emulation of some CPUs and VMs nicer too!

    Man! I was just thinking that I wanted you to see this, because you would probably like it.
  • cgracey wrote: »
    Yesterday, I had the problem of needing to find if a register held a value that was between two other random register values. I can do it in six instructions, but there must be a simpler solution. I think it would be reasonable to have a compiler come up with a solution for that simple of a problem.
    In P1 code:
                  sub       lim2, val wz
            if_nz sub       lim1, val wz
                  xor       lim1, lim2
                  shl       lim1, #1 wc
    
    The value is in range "if_c_or_z". Unfortunately it changes the values of your limt variables, and I'm guessing that's not allowed.

    Jonathan
  • TonyB_TonyB_ Posts: 2,178
    edited 2018-02-17 22:43
    Deleted.

    Please see this post and later
  • TonyB_TonyB_ Posts: 2,178
    edited 2018-02-16 23:10
    .
  • TonyB_TonyB_ Posts: 2,178
    edited 2018-02-16 22:53
    .
  • cgraceycgracey Posts: 14,134
    lonesock wrote: »
    cgracey wrote: »
    Yesterday, I had the problem of needing to find if a register held a value that was between two other random register values. I can do it in six instructions, but there must be a simpler solution. I think it would be reasonable to have a compiler come up with a solution for that simple of a problem.
    In P1 code:
                  sub       lim2, val wz
            if_nz sub       lim1, val wz
                  xor       lim1, lim2
                  shl       lim1, #1 wc
    
    The value is in range "if_c_or_z". Unfortunately it changes the values of your limt variables, and I'm guessing that's not allowed.

    Jonathan

    Jonathan, this is hard to think about it. I will make the effort to ingest this soon.
  • cgraceycgracey Posts: 14,134
    edited 2018-02-04 09:16
    TonyB_ wrote: »
    Further, refined thoughts on two options for a possible new instruction as introduced above. The existing instructions that would be replaced by the new one are as follows:

    1. Minimal option
    	MOV	temp,bytecode		'preserve bytecode for later
    	AND	temp,#andmask		'andmask = %11 {1 of 4} or %111 {1 of 8}
    	DECOD	temp			'temp = %0001...%1000 or %00000001...%10000000
    	XOR	temp,#xormask		'xormask = %1111 or %11111111
    	SHL	temp,#1			'temp = %1110_0...%0111_0 or %11111110_0...%01111111_0
    	SKIPF	temp			'temp = skip pattern, unused high bits all zero
    

    2. Ideal option
    	MOV	temp,bytecode
    	SHR	temp,#shift		'right shift desired group to low bits
    	AND	temp,#andmask
    	DECOD	temp
    	XOR	temp,#xormask
    	SHL	temp,#1
    	SKIPF	temp
    

    The ideal option assumes the existing shift logic can be used for the right shift, the decode logic afterwards is trivial. The left shift inserts a '0' as the first skip bit, so that one instruction is always executed before the four or eight skippable ones.

    As the instruction after SKIPF can only be cancelled if unused, not skipped, the leading '0' avoids 3 of 4 or 7 of 8 options incurring a two-cycle penalty, it keeps the timing consistent and creates space for a preliminary jump instruction. The new skip instruction, SKIPX for want of a better name, could be placed one instruction early if a branch is not needed.

    The minimal option would require a right shift beforehand if the lowest bytecode bits are not the desired ones and a D-only instruction could suffice, with an opcode bit selecting 1 of 4 / 1 of 8. The ideal option would need a D,S slot, where S[4:0] could select the right shift and S[8] 1 of 4 / 1 of 8.

    As can be seen, SKIPX could replace up to six or seven instructions, a better ratio than some other instructions. It would be a direct binary-to-skip-fast instruction, which is currently missing.

    TonyB_,

    I'm not readily understanding your idea. I'm going to need to look carefully when I have more time.

    Right now, I'm getting to leave in the morning to go to Treehouse in Colorado, so that we can get the layout changes done once and for all, hopefully.
  • TonyB_TonyB_ Posts: 2,178
    edited 2018-02-04 20:10
    cgracey wrote: »
    TonyB_,

    I'm not readily understanding your idea. I'm going to need to look carefully when I have more time.

    Right now, I'm getting to leave in the morning to go to Treehouse in Colorado, so that we can get the layout changes done once and for all, hopefully.

    After more consideration, what I'm trying to do could be achieved with two or three existing instructions, not six or seven, therefore no real need for a new one.

    There is a real issue here, though, related to your recent SETQ2 addition to XBYTE. I hope I can explain it better when you get back from your trip.
Sign In or Register to comment.