F32 - Concise floating point code for the Propeller - Page 2 — Parallax Forums

F32 - Concise floating point code for the Propeller


Comments

  • John Abshier Posts: 1,116
    edited 2011-03-11 12:06
    I was surprised at the magnitude of the speedup. I vote for your suggested change. For me, 70 bytes is not worth the slowdown of a function call.

    John Abshier
  • lonesock Posts: 917
    edited 2011-05-02 07:48
    Version 1.3 is now in the OBEX. It contains the faster "repeat" loops, and also fixes a bug in FRound found by John Abshier...THANKS!!

    Jonathan
  • Kye Posts: 2,200
    edited 2011-05-02 07:58
    I'm all about space savings.

    @Lonesock: You got the idea.
  • lonesock Posts: 917
    edited 2011-08-02 15:58
    I'm thinking of adding the following 'atof' function to F32. The weak link is that it uses Exp10 to correct the exponent, which in turn uses the Prop's internal log/exp tables, so I'll probably need to swap that out for a loop multiplying by 10.0 or 0.1. Can anyone see any bugs, or ways to speed it up?
    PUB atof( strptr ) : f | int, sign, dmag, mag, get_exp, b
      ' get all the digits as if this is an integer (but track the exponent)
      ' int := sign := dmag := mag := get_exp := 0
      longfill( @int, 0, 5 )
      repeat
        case b := byte[strptr++]
          "-": sign := $8000_0000
          "0".."9":
               int := int*10 + b - "0"
               mag += dmag
          ".": dmag := -1
          other: ' either done, or about to do exponent
               if get_exp
                 ' we just finished processing the exponent
                 if sign
                   int := -int
                 mag += int
                 quit
               else
                 ' convert int to a (signed) float
                 f := FFloat( int ) | sign
                 ' should we continue?
                 if (b == "E") or (b == "e")
                   ' int := sign := dmag := 0
                   longfill( @int, 0, 3 )
                   get_exp := 1
                 else
                   quit
      ' Exp10 is the weak link...uses the Log table in P1 ROM
      f := FMul( f, Exp10( FFloat( mag ) ) )
    
    thanks,
    Jonathan
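
The alternative Jonathan mentions, correcting the exponent with a loop that multiplies by 10.0 or 0.1 instead of calling Exp10, can be modelled outside the Propeller. The following is a minimal Python sketch of the same parsing loop, not F32 code: `atof_sketch` is a hypothetical name, and Python floats are 64-bit doubles, so rounding will differ from F32's 32-bit results.

```python
def atof_sketch(s: str) -> float:
    """Model of the posted atof loop; exponent applied by repeated multiply."""
    value = 0          # digits accumulated as an integer ("int" in the post)
    negative = False   # stands in for the sign bit
    dmag = 0           # becomes -1 once a "." has been seen
    mag = 0            # running power-of-ten correction
    get_exp = False    # True while reading the digits after "E"/"e"
    f = 0.0
    for ch in s + "\0":                 # "\0" models the string terminator
        if ch == "-":
            negative = True
        elif "0" <= ch <= "9":
            value = value * 10 + ord(ch) - ord("0")
            mag += dmag
        elif ch == ".":
            dmag = -1
        elif get_exp:
            # just finished the exponent digits
            mag += -value if negative else value
            break
        else:
            # mantissa finished: convert to a signed float
            f = -float(value) if negative else float(value)
            if ch == "E" or ch == "e":
                value, negative, dmag = 0, False, 0
                get_exp = True
            else:
                break
    # Replacement for Exp10: scale by 10.0 or 0.1 once per decade
    factor = 10.0 if mag > 0 else 0.1
    for _ in range(abs(mag)):
        f *= factor
    return f
```

In this model, as in the posted loop, a "+" after the "E" falls into the `other` case and ends parsing, so "1e+5" yields 1.0.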
  • James Newman Posts: 133
    edited 2012-02-13 17:04
    About to start using this, and noticed that some of the comments in the asm are from the old Float objects. An example is this line ':execCmd nop ' execute command, which was replaced by getCommand'

    Just thought I'd let you know. Great object btw.

    [EDIT] Oh, also the subtract function has an unneeded jmp as far as I can tell, since execution would fall through into _FAdd anyway:
    _FSub                   xor     fnumB, Bit31            ' negate B
                            jmp     #_FAdd                  ' add values
    
    _FAdd                   call    #_Unpack2               ' unpack two variables   
    
  • lonesock Posts: 917
    edited 2012-04-27 13:25
    Hi, everyone! Been away for a while, but I'm back ;-)

    Here is an update to F32. I'm calling it 1.5, leaving 1.4 for the code fixes that Marty Lawson did in the table interpolation code...THANKS!

    Here's what's different:
    * fixed table interp (buggy LOG function, and maybe sine?), for certain input values
    * FCmp is faster and smaller
    * replaced "jmp #label_ret" with "jmp label_ret", saving 4 clocks...thanks kuroneko!
    * fixed the PASM dispatch offset table bug (in counting, I think I had skipped 10). You would only have seen this if calling directly from PASM. I can't find a record of who found this! If it was you, please let me know so I can credit you!

    That's basically it. I'll leave it here for some testing. If no bugs crop up, I'll update the OBEX.

    Thanks, everyone!
    Jonathan
  • MacTuxLin Posts: 821
    edited 2012-04-27 18:18
    Thank you Jonathan! I'll be putting this to test on my side too.
  • SRLM Posts: 5,045
    edited 2012-05-21 01:11
    I tried to make a version of F32 that could be parallel, but in the end it operated faster in a single cog!

    I have a bunch of math that could be parallelized, and so I edited F32 a bit to make it so that the wait for a result is in an external function. This requires some new variables and a bit of moving data around. By parallelizing F32, I can have four instances of the object running in four cogs, and have them all crunching numbers at the same time. So here's the multiply function that I came up with:
    VAR
    	long	func_result, func_a, func_b
    PUB FMulPar(a, b)
    {{
      Multiplication: result = a * b
      Parameters:
        a        32-bit floating point value
        b        32-bit floating point value
      Returns:   32-bit floating point value
    }}
      func_result  := cmdFMul
      func_a := a
      func_b := b  
      f32_Cmd := @func_result 
      
    PUB Wait
      
      repeat
      while f32_Cmd
    
      result := func_result
    

    And for my code that calls it, I basically had something like
    	fp.FMulPar(w1,w2)
    	fp1.FMulPar(w1,x2)
    	fp2.FMulPar(w1,y2)
    	fp3.FMulPar(w1,z2)
    	
    	w := fp.wait
    	x := fp1.wait
    	y := fp2.wait
    	z := fp3.wait
    

    In the end, the parallelization overhead added about 50 microseconds to the total execution time (of the four multiplies above). A slight benefit might be seen for functions that take longer (such as cos/atan/sqrt?).

    Any suggestions on how to make it faster? If not, then I might look into porting the interpreter thing from Float32.
  • Heater. Posts: 21,230
    edited 2012-05-21 01:36
    Every Spin bytecode takes 50 to 100 Prop instructions to execute, and every Spin statement generates lots of bytecodes, so we can guess that the speed of F32 is just lost in the noise here. If I remember correctly, F32 pretty much fills a COG, so you would have to throw out some operations in order to fit the "interpreter thing", which would be a shame.
    If I needed speed and floating point, and provided my application was not too big, I would consider writing in C. The code would be much prettier, not having to call functions to perform all the operations, and it would run at about one quarter of native PASM speed.
    I'm not sure if the propgcc compiler can make use of F32 "out of the box" but I have used it from the zpugcc compiler with Zog without much difficulty.
  • Lawson Posts: 870
    edited 2012-05-21 12:46
    @SRLM

    When I looked through the F32 code to fix the table bug, I noticed that several of the functions are made by chaining a few core functions. For example, x^y is done as exp(y*ln(x)). These types of functions could be re-implemented using the "interpreter thing" from Float32, possibly freeing up enough space to add an interpreter? (Last I checked, F32 only has a couple of longs free.)

    Lawson
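
The chaining Lawson describes can be checked numerically. This is a small Python illustration of the identity x^y = exp(y*ln(x)) for positive x, not a copy of the F32 routine; `pow_via_exp_log` is a made-up name and the arithmetic here is double precision rather than F32's single precision.

```python
import math

def pow_via_exp_log(x: float, y: float) -> float:
    """Compute x**y by chaining log, multiply and exp (valid for x > 0)."""
    if x <= 0.0:
        raise ValueError("this identity needs x > 0")
    return math.exp(y * math.log(x))

# Compare the chained version against the built-in power operator
chained = pow_via_exp_log(2.0, 10.0)
direct = 2.0 ** 10.0
```

Because the power function is only a composition of three core operations, it is the kind of routine that could be expressed as a short sequence of calls rather than dedicated cog code.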
  • Lawson Posts: 870
    edited 2014-04-24 09:18
    It'd be nice if one cog running F32 could be shared between several cogs. My first thought for doing this would be to protect the Spin interface with locks, but that would be slow. My second thought is to have F32 check 3-4 command mailboxes in round-robin fashion (instead of the one box currently used). Assuming that the Spin interface code is taking most of the time, this could be quite fast. For a software interface, I think it'd be simplest to have the extra objects "register" to a given mailbox and then have one object start the F32 cog.

    Marty
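
A rough model of the round-robin mailbox idea, written in Python rather than PASM. Everything here is hypothetical: the mailbox layout, the `CMD_FMUL` code and the function names are illustrative only and not part of F32. The one convention taken from the thread is that a zero command word means "done", as in the `repeat while f32_Cmd` wait loop posted earlier.

```python
CMD_IDLE = 0
CMD_FMUL = 1  # hypothetical command code, not F32's real value

def new_mailbox():
    """One mailbox per client cog: command word, two operands, one result."""
    return {"cmd": CMD_IDLE, "a": 0.0, "b": 0.0, "result": 0.0}

def post_fmul(box, a, b):
    """Client side: fill in the operands, then write the command word last."""
    box["a"], box["b"] = a, b
    box["cmd"] = CMD_FMUL

def service_once(mailboxes):
    """Server side: one round-robin pass over every mailbox."""
    served = 0
    for box in mailboxes:
        if box["cmd"] == CMD_FMUL:
            box["result"] = box["a"] * box["b"]
            box["cmd"] = CMD_IDLE   # zero tells the waiting client it is done
            served += 1
    return served

boxes = [new_mailbox() for _ in range(4)]
post_fmul(boxes[0], 2.0, 3.0)
post_fmul(boxes[2], 1.5, 4.0)
served = service_once(boxes)
```

Each client only touches its own mailbox, so no locks are needed in this model; the cost is that an idle mailbox still takes one check per pass.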
  • Duane Degn Posts: 10,588
    edited 2015-02-06 19:24
    I'm using two instances of F32 in my hexapod code. F32 is sure a useful object. Thanks for writing it Jonathan.

    I was starting to run out of RAM so I wanted to move the PASM section of F32 to the EEPROM and then temporarily move the F32 PASM code (and PASM code from other objects) into the hub prior to launching it into a cog.

    I created a modified version of the call table which can exist independent of the PASM code.

    I discuss some of the details in this thread. An example of how to save the PASM section to the EEPROM is given as well as the modified call table.
  • I think I've found a way to speed up your comparisons and get you a couple longs back.

    You do this:
    _FCmp                   mov     t1, fnumA               ' if both values...
                            and     t1, fnumB               '  are negative...
                            shl     t1, #1 wc               ' (bit 31 high)...
    

    but I think this has the same effect, does it not?
                            add     fNumA, fNumB  nr, wc
    

    If the highest bit (sign) is set in both values, doing an unsigned add will overflow, setting the carry flag. Specifying 'NR' tells the add not to write its result, so you preserve the value of fNumA.

    It's a tiny change (it's only two longs) but I'm trying to squeeze as much into this cog as I can and I thought I'd pass that along.

    J
  • Electrodude Posts: 1,658
    edited 2015-08-22 04:52
    JasonDorie wrote: »
    I think I've found a way to speed up your comparisons and get you a couple longs back.

    You do this:
    _FCmp                   mov     t1, fnumA               ' if both values...
                            and     t1, fnumB               '  are negative...
                            shl     t1, #1 wc               ' (bit 31 high)...
    

    but I think this has the same effect, does it not?
                            add     fNumA, fNumB  nr, wc
    

    If the highest bit (sign) is set in both values, doing an unsigned add will overflow, setting the carry flag. Specifying 'NR' tells the add not to write its result, so you preserve the value of fNumA.

    It's a tiny change (it's only two longs) but I'm trying to squeeze as much into this cog as I can and I thought I'd pass that along.

    J


    I don't think that will work. A counterexample is (fnumA = $FFFF_FFFF) + (fnumB = $0000_0002) = $1_0000_0001. The carry bit gets set in that case, although only one of the two numbers is negative.

    How about this? It only saves one long, though.
                            shl     fnumA, #1  wc, nr       ' if fnumA is negative
            if_c            shl     fnumB, #1  wc, nr       ' and fnumB is negative (don't run if fnumA was positive)
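
The counterexample and the two-instruction replacement can be checked with a small Python model of the carry flag. This is a sketch under the assumption that SHL with WC sets C to the original bit 31 of the destination, and that ADD with WC sets C to the unsigned carry out of bit 31; the function names are made up for illustration.

```python
MASK32 = 0xFFFF_FFFF

def c_from_and_shl(a: int, b: int) -> int:
    """Original three instructions: C = bit 31 of (a AND b)."""
    return ((a & b & MASK32) >> 31) & 1

def c_from_add(a: int, b: int) -> int:
    """Proposed ADD ... NR, WC: C = unsigned carry out of bit 31."""
    return ((a & MASK32) + (b & MASK32)) >> 32

def c_from_chained_shl(a: int, b: int) -> int:
    """SHL fnumA WC, then IF_C SHL fnumB WC: C ends as a31 AND b31."""
    c = (a >> 31) & 1          # first SHL: C = sign bit of a
    if c:                      # second SHL only runs when C is set
        c = (b >> 31) & 1      # and replaces C with the sign bit of b
    return c

# The counterexample from the thread: only fnumA is negative
a, b = 0xFFFF_FFFF, 0x0000_0002
```

For these inputs the ADD version reports a carry while the other two do not, which is the mismatch described above.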
    
  • D'oh - I didn't think that through far enough apparently. I'll happily use yours. :)
  • I like Electrodude's solution, though you probably want to either SHL by #0, or include the NR flag.

    Jonathan
  • lonesock wrote: »
    I like Electrodude's solution, though you probably want to either SHL by #0, or include the NR flag.

    Jonathan

    Right. I'll edit my post to use nr.