PNut/Spin2 Latest Version (v35q - Floating-Point Added, symbol table increased from 64KB to 256KB) - Page 46 — Parallax Forums


Comments

  • ersmith Posts: 5,051

    @evanh said:
    Good stuff Wuerfel_21. And quick. I'm sure Eric is happy there is someone else with the skills to help.

    Very happy. Ada has contributed so much, not just in the obvious things she's added to flexspin, but also behind the scenes with bug reports, suggestions, and testing. She's really a great programmer!

  • Rayman Posts: 12,220

    @cgracey said:

    @Rayman said:
    Great to see floating point in Spin2!

    Does Spin2 use up more of the cog memory now?

    The Spin2 interpreter is now 4,784 bytes long, occupying $00000..$012AF. The floating-point instructions added a total of 554 bytes of code and data to the interpreter, increasing its size by 13%.

    Wait, that must be HUB space, right? Is cog usage still the same?

  • cgracey Posts: 13,627
    edited 2021-09-24 03:29

    @Rayman said:

    @cgracey said:

    @Rayman said:
    Great to see floating point in Spin2!

    Does Spin2 use up more of the cog memory now?

    The Spin2 interpreter is now 4,784 bytes long, occupying $00000..$012AF. The floating-point instructions added a total of 554 bytes of code and data to the interpreter, increasing its size by 13%.

    Wait, that must be HUB space, right? Is cog usage still the same?

    Sorry, I didn't answer your actual question.

    Cog RAM usage is exactly the same. No change there.
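
The size figures quoted above can be cross-checked with a little arithmetic. This is only a sanity check on the numbers in the post; the variable names are made up for illustration and do not come from PNut.

```python
# Cross-check the interpreter size figures quoted in the post.
first_addr = 0x00000
last_addr = 0x012AF
total_bytes = last_addr - first_addr + 1    # inclusive address range
fp_bytes = 554                              # floating-point code and data

previous_bytes = total_bytes - fp_bytes     # size before floating point was added
growth_percent = 100 * fp_bytes / previous_bytes

print(total_bytes)                  # 4784
print(previous_bytes)               # 4230
print(round(growth_percent, 1))     # 13.1
```

The range $00000..$012AF is a hub address range, which matches the follow-up that cog RAM usage did not change.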

  • ersmith Posts: 5,051

    @"Stephen Moraco" said:
    @Wuerfel_21 thanks for clarifying, I jumped the gun... I'll look forward to the new operations arriving when they do!

    They're in github now.

  • evanh Posts: 11,824

    Cool! floating_point_demo.spin2 now works with flexspin.

  • evanh Posts: 11,824
    edited 2021-09-25 08:47

    Hmm, just investigated converting the clock frequency calculations back into floats. But it doesn't really look any better unless all variables are converted back. Otherwise the code looks just as untidy with lots of float() and round() sprinkled around.

    Really, muldiv65() does what is needed there. After all, the objective is to find the best set of integer components of clock-mode for a HUBSET instruction.

  • pik33 Posts: 1,061

    @ersmith said:

    @"Stephen Moraco" said:
    @Wuerfel_21 thanks for clarifying, I jumped the gun... I'll look forward to the new operations arriving when they do!

    They're in github now.

    @evanh said:
    Cool! floating_point_demo.spin2 now works with flexspin.

    What I can see now on Github is the commit from 10 days ago (2021.09.15) without any floating point debug/operators

  • evanh Posts: 11,824
    edited 2021-09-25 10:00

    Ah, yeah, I'm not using the website. Using the "git pull" and "make" commands from shell instead. Eric has updated the master dev branch in the Git repository. One advantage of having a Linux box. These tools are just there, even if they're a mystery to work sometimes.

  • pik33 Posts: 1,061
    edited 2021-09-25 11:14

    @evanh said:
    Ah, yeah, I'm not using the website. Using the "git pull" and "make" commands from shell instead. Eric has updated the master dev branch in the Git repository. One advantage of having a Linux box. These tools are just there, even if they're a mystery to work sometimes.

    I have several Raspberry Pis to try this on.
    Edit: a fresh build on the RPi400 doesn't understand the new operators: "Unexpected ."

  • evanh Posts: 11,824
    edited 2021-09-25 11:44

    Is that the make? Or is that a flexspin error?

    Do a make clean and make it again. And make sure you're actually using the fresh build of flexspin.

  • pik33 Posts: 1,061

    I did git pull, make, all went ok and I have a working flexprop on my RPi400. It says it is 5.9.2
    However if I tried

    c:= a /. b

    it told me "unexpected ." while trying to compile this for a P2

  • evanh Posts: 11,824
    edited 2021-09-25 12:06

    It may not be part of flexprop. Did you build flexspin alone? git clone https://github.com/totalspectrum/spin2cpp

    Or you can download the zip from the website and unzip then git pull and make from that.

  • evanh Posts: 11,824

    Yeah, that'll be it. My flexspin version string is Version 5.9.3-beta-v5.9.2-21-gd463d1ea Compiled on: Sep 25 2021

  • ersmith Posts: 5,051

    @pik33 said:
    I did git pull, make, all went ok and I have a working flexprop on my RPi400. It says it is 5.9.2
    However if I tried

    c:= a /. b

    it told me "unexpected ." while trying to compile this for a P2

    Sorry, I had only updated spin2cpp, not flexprop. Try again now (and if you're still having trouble do a "make clean" before "make install").

  • pik33 Posts: 1,061

    Now it is 5.9.3 and it compiled /.

  • ersmith Posts: 5,051

    @cgracey : I've found a few bugs in the Spin2 floating point routines. There are a few general categories:

    (1) Rounding errors: IEEE 32 says results should be rounded to the nearest even bit pattern. I think Spin2 may always be rounding up.
    (2) Overflow errors: Many operations which should give infinite results give NaN patterns (but not always the "official" NaN)
    (3) Negative zero: there are a few cases where $8000_0000 needs to be treated specially.

    I've attached a .zip file with the test program I used to generate these.

    IEEE 32 bit binary tests
    Add tests: 
      Error: 7F800000 +. 7F800000 -> 7FFFFFFF expected 7F800000
      Error: FF800000 +. FF800000 -> FFFFFFFF expected FF800000
      Error: 7F800000 +. 7F000000 -> 7FC00000 expected 7F800000
      Error: 7F800000 +. FF000000 -> 7F000000 expected 7F800000
      Error: FF800000 +. 7F000000 -> FF000000 expected FF800000
      Error: FF800000 +. FF000000 -> FFC00000 expected FF800000
      Error: 7F000000 +. 7F800000 -> 7FC00000 expected 7F800000
      Error: 7F000000 +. FF800000 -> FF000000 expected FF800000
      Error: FF000000 +. 7F800000 -> 7F000000 expected 7F800000
      Error: FF000000 +. FF800000 -> FFC00000 expected FF800000
      Error: 7F7FFFFE +. 7F7FFFFE -> 7FFFFFFE expected 7F800000
      Error: FF7FFFFE +. FF7FFFFE -> FFFFFFFE expected FF800000
      Error: 3F800001 +. 3F800000 -> 40000001 expected 40000000
      Error: BF800001 +. BF800000 -> C0000001 expected C0000000
      Error: C0000000 +. C0000001 -> C0800001 expected C0800000
      Error: 40000000 +. 40000001 -> 40800001 expected 40800000
      Error: 7F7FFFFE +. 7F7FFFFF -> 7FFFFFFF expected 7F800000
      Error: FF7FFFFE +. FF7FFFFF -> FFFFFFFF expected FF800000
      Error: 7F000001 +. 7F000000 -> 7F800001 expected 7F800000
      Error: FF000001 +. FF000000 -> FF800001 expected FF800000
      Error: 7E800001 +. 7E800000 -> 7F000001 expected 7F000000
      Error: FE800001 +. FE800000 -> FF000001 expected FF000000
      Error: 7EFFFFFE +. 7EFFFFFF -> 7F7FFFFF expected 7F7FFFFE
      Error: FEFFFFFE +. FEFFFFFF -> FF7FFFFF expected FF7FFFFE
      Error: 40000000 +. 34000000 -> 40000001 expected 40000000
      Error: C07FFFFF +. B3FFFFFF -> C0800000 expected C07FFFFF
      Error: 3F800000 +. 3F800001 -> 40000001 expected 40000000
      Error: BF800000 +. BF800001 -> C0000001 expected C0000000
     Elapsed microseconds: 225
     count: 344 passed: 316
    Sub tests: 
      Error: 3F800001 -. BF800000 -> 40000001 expected 40000000
      Error: BF800001 -. 3F800000 -> C0000001 expected C0000000
      Error: 7F800000 -. FF800000 -> 7FFFFFFF expected 7F800000
      Error: FF800000 -. 7F800000 -> FFFFFFFF expected FF800000
      Error: 7F800000 -. FF000000 -> 7FC00000 expected 7F800000
      Error: 7F800000 -. 7F000000 -> 7F000000 expected 7F800000
      Error: FF800000 -. FF000000 -> FF000000 expected FF800000
      Error: FF800000 -. 7F000000 -> FFC00000 expected FF800000
      Error: 7F000000 -. FF800000 -> 7FC00000 expected 7F800000
      Error: 7F000000 -. 7F800000 -> FF000000 expected FF800000
      Error: FF000000 -. FF800000 -> 7F000000 expected 7F800000
      Error: FF000000 -. 7F800000 -> FFC00000 expected FF800000
      Error: 7F7FFFFE -. FF7FFFFE -> 7FFFFFFE expected 7F800000
      Error: FF7FFFFE -. 7F7FFFFE -> FFFFFFFE expected FF800000
      Error: C0000000 -. 40000001 -> C0800001 expected C0800000
      Error: 40000000 -. C0000001 -> 40800001 expected 40800000
      Error: 7F7FFFFE -. FF7FFFFF -> 7FFFFFFF expected 7F800000
      Error: FF7FFFFE -. 7F7FFFFF -> FFFFFFFF expected FF800000
      Error: 7F000001 -. FF000000 -> 7F800001 expected 7F800000
      Error: FF000001 -. 7F000000 -> FF800001 expected FF800000
      Error: 7E800001 -. FE800000 -> 7F000001 expected 7F000000
      Error: FE800001 -. 7E800000 -> FF000001 expected FF000000
      Error: 7EFFFFFE -. FEFFFFFF -> 7F7FFFFF expected 7F7FFFFE
      Error: FEFFFFFE -. 7EFFFFFF -> FF7FFFFF expected FF7FFFFE
      Error: 40000000 -. B4000000 -> 40000001 expected 40000000
      Error: C07FFFFF -. 33FFFFFF -> C0800000 expected C07FFFFF
     Elapsed microseconds: 305
     count: 316 passed: 290
    Mul tests: 
      Error: C0000000 *. 7F800000 -> FFFFFFFF expected FF800000
      Error: C0800000 *. FF800000 -> 7FFFFFFF expected 7F800000
      Error: 40A00000 *. 7F800000 -> 7FFFFFFF expected 7F800000
      Error: 40E00000 *. FF800000 -> FFFFFFFF expected FF800000
      Error: C1100000 *. 7F000000 -> FFFFFFFF expected FF800000
      Error: C0400000 *. FF000000 -> 7FC00000 expected 7F800000
      Error: BF000001 *. 00000001 -> 80000000 expected 80000001
      Error: BFC00000 *. 80000001 -> 00000001 expected 00000002
      Error: C0200001 *. 00000001 -> 80000002 expected 80000003
      Error: C0600000 *. 80000001 -> 00000003 expected 00000004
      Error: 7F800000 *. 7F800000 -> 7FFFFFFF expected 7F800000
      Error: FF800000 *. 7F800000 -> FFFFFFFF expected FF800000
      Error: 7F800000 *. FF800000 -> FFFFFFFF expected FF800000
      Error: FF800000 *. FF800000 -> 7FFFFFFF expected 7F800000
      Error: 7F800000 *. C0400000 -> FFFFFFFF expected FF800000
      Error: FF800000 *. 40C00000 -> FFFFFFFF expected FF800000
      Error: FF800000 *. C1000000 -> 7FFFFFFF expected 7F800000
      Error: 7F000000 *. 7F800000 -> 7FFFFFFF expected 7F800000
      Error: FE800000 *. 7F800000 -> FFFFFFFF expected FF800000
      Error: 7F800000 *. FF000000 -> FFFFFFFF expected FF800000
      Error: FF800000 *. FE800000 -> 7FFFFFFF expected 7F800000
      Error: 7F800000 *. 7EFFFFFF -> 7FFFFFFF expected 7F800000
      Error: FE7FFFFF *. 7F800000 -> FFFFFFFF expected FF800000
      Error: 7F800000 *. FF7FFFFF -> FFFFFFFF expected FF800000
      Error: FF7FFFFF *. FF800000 -> 7FFFFFFF expected 7F800000
      Error: 00800000 *. 7F800000 -> 40800000 expected 7F800000
      Error: 81000000 *. 7F800000 -> C1000000 expected FF800000
      Error: 7F800000 *. 81000000 -> C1000000 expected FF800000
      Error: FF800000 *. 80800000 -> 40800000 expected 7F800000
      Error: 7F800000 *. 00FFFFFF -> 40FFFFFF expected 7F800000
      Error: 80800001 *. 7F800000 -> C0800001 expected FF800000
      Error: 7F800000 *. 80800001 -> C0800001 expected FF800000
      Error: 80FFFFFF *. FF800000 -> 40FFFFFF expected 7F800000
      Error: 00000001 *. 7F800000 -> 35000000 expected 7F800000
      Error: 80000003 *. 7F800000 -> B5C00000 expected FF800000
      Error: 7F800000 *. 80000002 -> B5800000 expected FF800000
      Error: FF800000 *. 80000004 -> 36000000 expected 7F800000
      Error: 7F800000 *. 007FFFFF -> 407FFFFE expected 7F800000
      Error: 807FFFFF *. 7F800000 -> C07FFFFE expected FF800000
      Error: 7F800000 *. 807FFFFF -> C07FFFFE expected FF800000
      Error: 807FFFFF *. FF800000 -> 407FFFFE expected 7F800000
      Error: C03FFFFE *. 7F000000 -> FFBFFFFE expected FF800000
      Error: C09FFFFE *. FF000000 -> 7FFFFFFF expected 7F800000
      Error: C0DFFFF9 *. 7F000000 -> FFFFFFFF expected FF800000
      Error: C1100001 *. FF000000 -> 7FFFFFFF expected 7F800000
      Error: 7F000000 *. 7F000000 -> 7FFFFFFF expected 7F800000
      Error: FF7FFFFD *. 7F000000 -> FFFFFFFF expected FF800000
      Error: 7F000000 *. FE800004 -> FFFFFFFF expected FF800000
      Error: FF000005 *. FF000001 -> 7FFFFFFF expected 7F800000
      Error: 7F000009 *. 7F7FFFFA -> 7FFFFFFF expected 7F800000
      Error: FE7FFFF9 *. 7F000000 -> FFFFFFFF expected FF800000
      Error: 7F000000 *. FE800000 -> FFFFFFFF expected FF800000
      Error: FF7FFFFF *. FF7FFFFF -> 7FFFFFFF expected 7F800000
      Error: FEFFFFF7 *. 7E800001 -> FFFFFFFF expected FF800000
      Error: 7F000000 *. FF000000 -> FFFFFFFF expected FF800000
      Error: 7F000000 *. 7F7FFFFE -> 7FFFFFFF expected 7F800000
      Error: FF7FFFFD *. FF000001 -> 7FFFFFFF expected 7F800000
      Error: 7EFFFFFD *. C0000008 -> FF800006 expected FF800000
      Error: 3F800002 *. 7F7FFFFE -> 7F800001 expected 7F800000
      Error: 7F000009 *. C0C00002 -> FFFFFFFF expected FF800000
      Error: FF7FFFFD *. C0400001 -> 7FFFFFFF expected 7F800000
      Error: 00000001 *. 3F7FFFFF -> 00000000 expected 00000001
      Error: 00000001 *. BF7FFFFF -> 80000000 expected 80000001
      Error: 00FFFFFF *. 3F000000 -> 007FFFFF expected 00800000
      Error: 00FFFFFF *. BF000000 -> 807FFFFF expected 80800000
     Elapsed microseconds: 490
     count: 334 passed: 269
    Div tests: 
      Error: 7F800000 /. 00000000 -> FFFFFFFF expected 7F800000
      Error: FF800000 /. 00000000 -> FFFFFFFF expected FF800000
      Error: 7F800000 /. 80000000 -> FFFFFFFF expected FF800000
      Error: FF800000 /. 80000000 -> FFFFFFFF expected 7F800000
      Error: FF800000 /. 40000000 -> FF000000 expected FF800000
      Error: 7F800000 /. C0400000 -> FEAAAAAB expected FF800000
      Error: FF800000 /. C0800000 -> 7E800000 expected 7F800000
      Error: 7F800000 /. 40A00000 -> 7E4CCCCD expected 7F800000
      Error: FF800000 /. 40C00000 -> FE2AAAAB expected FF800000
      Error: 7F800000 /. C0E00000 -> FE124925 expected FF800000
      Error: FF800000 /. C1000000 -> 7E000000 expected 7F800000
      Error: 3F800000 /. 7F800000 -> 00200000 expected 00000000
      Error: C0000000 /. 7F800000 -> 80400000 expected 80000000
      Error: 40400000 /. FF800000 -> 80600000 expected 80000000
      Error: C0800000 /. FF800000 -> 00800000 expected 00000000
      Error: 40A00000 /. 7F800000 -> 00A00000 expected 00000000
      Error: C0C00000 /. 7F800000 -> 80C00000 expected 80000000
      Error: 40E00000 /. FF800000 -> 80E00000 expected 80000000
      Error: C1000000 /. FF800000 -> 01000000 expected 00000000
      Error: 7F000000 /. 7F800000 -> 3F000000 expected 00000000
      Error: FE800000 /. 7F800000 -> BE800000 expected 80000000
      Error: 7F000000 /. FF800000 -> BF000000 expected 80000000
      Error: FE800000 /. FF800000 -> 3E800000 expected 00000000
      Error: 7EFFFFFF /. 7F800000 -> 3EFFFFFF expected 00000000
      Error: FE7FFFFF /. 7F800000 -> BE7FFFFF expected 80000000
      Error: 7F7FFFFF /. FF800000 -> BF7FFFFF expected 80000000
      Error: FF7FFFFF /. FF800000 -> 3F7FFFFF expected 00000000
      Error: 7F800000 /. 7F000000 -> 40000000 expected 7F800000
      Error: FF800000 /. 7E800000 -> C0800000 expected FF800000
      Error: 7F800000 /. FF000000 -> C0000000 expected FF800000
      Error: FF800000 /. FE800000 -> 40800000 expected 7F800000
      Error: 7F800000 /. 7EFFFFFF -> 40000001 expected 7F800000
      Error: 7F800000 /. FE7FFFFF -> C0800001 expected FF800000
      Error: 7F800000 /. FF7FFFFF -> BF800001 expected FF800000
      Error: FF800000 /. FF7FFFFF -> 3F800001 expected 7F800000
      Error: 7F800000 /. 00800000 -> 7FFFFFFF expected 7F800000
      Error: FF800000 /. 01000000 -> FFFFFFFF expected FF800000
      Error: 7F800000 /. 81000000 -> FFFFFFFF expected FF800000
      Error: FF800000 /. 80800000 -> 7FFFFFFF expected 7F800000
      Error: 7F800000 /. 00FFFFFF -> 7FFFFFFF expected 7F800000
      Error: FF800000 /. 00800001 -> FFFFFFFF expected FF800000
      Error: 7F800000 /. 80800001 -> FFFFFFFF expected FF800000
      Error: FF800000 /. 80FFFFFF -> 7FFFFFFF expected 7F800000
      Error: 7F800000 /. 00000001 -> 7FFFFFFF expected 7F800000
      Error: FF800000 /. 00000003 -> FFFFFFFF expected FF800000
      Error: 7F800000 /. 80000002 -> FFFFFFFF expected FF800000
      Error: FF800000 /. 80000004 -> 7FFFFFFF expected 7F800000
      Error: 7F800000 /. 007FFFFF -> 7FFFFFFF expected 7F800000
      Error: FF800000 /. 007FFFFF -> FFFFFFFF expected FF800000
      Error: 7F800000 /. 807FFFFF -> FFFFFFFF expected FF800000
      Error: FF800000 /. 807FFFFF -> 7FFFFFFF expected 7F800000
      Error: 3F800000 /. 00000000 -> FFFFFFFF expected 7F800000
      Error: C0000000 /. 00000000 -> FFFFFFFF expected FF800000
      Error: 40400000 /. 80000000 -> FFFFFFFF expected FF800000
      Error: C0800000 /. 80000000 -> FFFFFFFF expected 7F800000
      Error: 40A00000 /. 00000000 -> FFFFFFFF expected 7F800000
      Error: C0C00000 /. 00000000 -> FFFFFFFF expected FF800000
      Error: 40E00000 /. 80000000 -> FFFFFFFF expected FF800000
      Error: C1000000 /. 80000000 -> FFFFFFFF expected 7F800000
      Error: 7F000000 /. 00000000 -> FFFFFFFF expected 7F800000
      Error: FE800000 /. 00000000 -> FFFFFFFF expected FF800000
      Error: 7F000000 /. 80000000 -> FFFFFFFF expected FF800000
      Error: FE800000 /. 80000000 -> FFFFFFFF expected 7F800000
      Error: 7EFFFFFF /. 00000000 -> FFFFFFFF expected 7F800000
      Error: FE7FFFFF /. 00000000 -> FFFFFFFF expected FF800000
      Error: 7E7FFFFF /. 80000000 -> FFFFFFFF expected FF800000
      Error: FEFFFFFF /. 80000000 -> FFFFFFFF expected 7F800000
      Error: 00800000 /. 00000000 -> FFFFFFFF expected 7F800000
      Error: 81000000 /. 00000000 -> FFFFFFFF expected FF800000
      Error: 01000000 /. 80000000 -> FFFFFFFF expected FF800000
      Error: 80800000 /. 80000000 -> FFFFFFFF expected 7F800000
      Error: 00FFFFFF /. 00000000 -> FFFFFFFF expected 7F800000
      Error: 80800001 /. 00000000 -> FFFFFFFF expected FF800000
      Error: 00800001 /. 80000000 -> FFFFFFFF expected FF800000
      Error: 80FFFFFF /. 80000000 -> FFFFFFFF expected 7F800000
      Error: 00000001 /. 00000000 -> FFFFFFFF expected 7F800000
      Error: 80000003 /. 00000000 -> FFFFFFFF expected FF800000
      Error: 00000002 /. 80000000 -> FFFFFFFF expected FF800000
      Error: 80000004 /. 80000000 -> FFFFFFFF expected 7F800000
      Error: 007FFFFF /. 00000000 -> FFFFFFFF expected 7F800000
      Error: 807FFFFF /. 00000000 -> FFFFFFFF expected FF800000
      Error: 007FFFFF /. 80000000 -> FFFFFFFF expected FF800000
      Error: 807FFFFF /. 80000000 -> FFFFFFFF expected 7F800000
      Error: 7B000000 /. 05000000 -> 7FFFFFFF expected 7F800000
      Error: 7F7FFFFF /. 00000001 -> 7FFFFFFF expected 7F800000
      Error: 7F000000 /. 007FFFFF -> 7FFFFFFF expected 7F800000
      Error: 00FFFFFF /. 40000000 -> 007FFFFF expected 00800000
      Error: 5E800000 /. 00000000 -> FFFFFFFF expected 7F800000
      Error: 46FFFE00 /. 00000000 -> FFFFFFFF expected 7F800000
      Error: C6FFFE00 /. 00000000 -> FFFFFFFF expected FF800000
     Elapsed microseconds: 740
     count: 379 passed: 289
    Sqrt tests: 
      Error: FSQRT 80000000 -> FFFFFFFF expected 80000000
      Error: FSQRT 7F800000 -> 5F800000 expected 7F800000
     Elapsed microseconds: 753
     count: 96 passed: 94
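
For readers decoding the hex patterns in the log above, here is a small Python sketch that classifies a 32-bit pattern using the standard IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits). The helper names are hypothetical and are not part of the attached test program.

```python
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def classify_binary32(bits):
    """Return a short label for the class of a binary32 bit pattern."""
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:                    # all-ones exponent: Inf or NaN
        return (sign + "Inf") if fraction == 0 else "NaN"
    if exponent == 0:                       # zero exponent: zero or subnormal
        return (sign + "0") if fraction == 0 else (sign + "subnormal")
    return sign + "normal"

# First line of the Add tests: got 7FFFFFFF, expected 7F800000.
print(classify_binary32(0x7FFFFFFF))    # NaN
print(classify_binary32(0x7F800000))    # +Inf
# FSQRT 80000000 should return negative zero unchanged.
print(classify_binary32(0x80000000))    # -0
print(bits_to_float(0x40800000))        # 4.0
```

Read this way, the first Add failure is Inf + Inf returning a NaN pattern where +Inf was expected, and the FSQRT 80000000 failure is the negative-zero case from category (3).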
    
  • evanh Posts: 11,824

    Rounding errors: IEEE 32 says results should be rounded to the nearest even bit pattern

    Gee, never tried to consider that level of specificity myself. I'm guessing there is some particular method that produces that as a natural outcome ...

  • ersmith Posts: 5,051

    Oh, and another (minor) bug: DEBUG(fdec(x)) is printing "3.402824e+38" instead of "Inf" for $7f80_0000.
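
One possible reading of that output, offered as a guess and not as a diagnosis of the PNut source: "3.402824e+38" is what 2^128 prints as, and 2^128 is the value $7F80_0000 would have if the all-ones exponent were decoded like any ordinary exponent. The largest finite binary32 value prints differently. The Python sketch below checks this; `decode_ignoring_special_cases` is a hypothetical model written for illustration only.

```python
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def decode_ignoring_special_cases(bits):
    """Hypothetical model: decode a nonzero-exponent pattern as a normal
    number, even when the exponent field is 255 (Inf/NaN in IEEE 754)."""
    sign = -1.0 if bits >> 31 else 1.0
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)

print("%e" % decode_ignoring_special_cases(0x7F800000))   # 3.402824e+38
print("%e" % bits_to_float(0x7F7FFFFF))                   # 3.402823e+38
print(bits_to_float(0x7F800000))                          # inf
```

The same reading would be consistent with two lines of the test log: FSQRT 7F800000 -> 5F800000 (since $5F80_0000 is 2^64, the square root of 2^128) and 00800000 *. 7F800000 -> 40800000 (2^-126 times 2^128 is 4.0).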

  • ersmith Posts: 5,051
    edited 2021-09-25 17:07

    And another oddity: NaN ==. NaN is returning FALSE (as it should), but NaN <>. NaN is also returning FALSE. I'm not sure if this is a bug, exactly, but in most other languages I've seen NaN <>. NaN would return TRUE, just like !(NaN ==. NaN).

    ~~EDITED: Whoops! It turns out that NaN <>. NaN should be false, according to the spec, so this is a case where Spin2 is doing the right thing, and gcc is not.~~

    EDITED further: I should have gone back to the actual standard rather than relying on web pages. The IEEE spec is very careful in specifying that any comparison involving NaN should return unordered, which is distinct from equal (and hence x "not equal" x should be TRUE if x is NaN).
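
Python floats follow the same IEEE 754 comparison rules, so the expected behaviour can be illustrated there. This shows what the standard asks for; it is not output from Spin2 or flexspin.

```python
nan = float("nan")

# Every ordered comparison involving NaN is false ...
print(nan == nan)   # False
print(nan < nan)    # False
print(nan > nan)    # False

# ... and "not equal" is the logical negation of "equal", so it is true.
print(nan != nan)   # True
```

A handy consequence is that `x != x` is true only when `x` is NaN, which is a common NaN test in languages without a dedicated function.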

  • ersmith Posts: 5,051

    @evanh said:

    Rounding errors: IEEE 32 says results should be rounded to the nearest even bit pattern

    Gee, never tried to consider that level of specificity myself. I'm guessing there is some particular method that produces that as a natural outcome ...

    No, it's just that if you have a value like, say, 1.5 that's exactly halfway between integers, then there's no unique "nearest" value to round to. Always rounding up results in a small bias away from 0. This can matter a lot in scientific and financial applications. For an integer example, if you have inputs that are exactly 1.5 and 2.5 and you round up then you get 2 and 3 (the sum is 5), whereas if you round to nearest even you get 2 and 2 (the sum is 4, the same as the sum of the original inputs). This special rounding only needs to happen if the input is exactly halfway between possible outputs of the rounding; otherwise, of course, you round to the nearest number.
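
The 1.5 and 2.5 example can be reproduced with a short Python sketch. The two helper functions are written here for illustration; Python's built-in `round()` already rounds ties to even.

```python
import math

def round_half_up(x):
    """Round to nearest integer; exact ties go up."""
    return math.floor(x + 0.5)

def round_half_even(x):
    """Round to nearest integer; exact ties go to the even neighbour."""
    lower = math.floor(x)
    diff = x - lower
    if diff < 0.5:
        return lower
    if diff > 0.5:
        return lower + 1
    return lower if lower % 2 == 0 else lower + 1

values = [1.5, 2.5]
print(sum(values))                              # 4.0
print([round_half_up(v) for v in values])       # [2, 3]  (sum 5)
print([round_half_even(v) for v in values])     # [2, 2]  (sum 4)
print([round(v) for v in values])               # [2, 2]  built-in, ties to even
```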

  • evanh Posts: 11,824
    edited 2021-09-25 17:03

    Okay, got it.

    What I've been doing previously is adding 50% of the divisor to the dividend before division. But the means by which I get 50% is simple truncation (SHR #1) ...

  • evanh Posts: 11,824
    edited 2021-09-25 18:44

    I think SHR #2 then SHL #1 would work. In other words, clear the LSB of the 50% divisor. Nope, failed.
    Best I've got is: ((dividend + (divisor>>1) - 1) | 1) / divisor. Grr, that fails too.
    Maybe this time: first copy dividend bit1 to bit0, then (dividend + (divisor>>1)) / divisor. Damn it! Way past bedtime.

  • evanh Posts: 11,824
    edited 2021-09-26 16:06

    Right, I've cheated and used the Cordic's pipeline to give me both results at once.

    pub  div33ieee( dividend, divisor ) : r | s
        org
                    mov     r, divisor
                    shr     r, #1           wc      ' odd/even check
                    add     dividend, r             ' + 50% divisor
    '               rep     @.rend, #1
                    qdiv    dividend, divisor
        if_nc       sub     dividend, #1            ' even
        if_nc       qdiv    dividend, divisor       ' even
        if_nc       getqx   s                       ' even
        if_nc       testb   s, #0           wc      ' even
                    getqx   r
    .rend
        if_nc       mov     r, s                    ' even
        end
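
The idea behind div33ieee() can be modelled in Python for checking against a reference. This is a sketch of my reading of the routine, not a translation of the PASM2: it uses Python's unbounded integers, so it ignores 32-bit wraparound and the CORDIC pipeline, and the function name is made up for illustration.

```python
from fractions import Fraction

def div_round_half_even(dividend, divisor):
    """Unsigned dividend / divisor, rounded to nearest with ties to even."""
    half = divisor >> 1
    q_up = (dividend + half) // divisor             # ties round up
    if divisor & 1:
        return q_up                                 # odd divisor: no exact tie exists
    q_down = (dividend + half - 1) // divisor       # ties round down
    # The two quotients differ only on an exact tie; keep the even one.
    return q_up if q_up % 2 == 0 else q_down

print(div_round_half_even(5, 2))    # 2   (2.5 goes to the even neighbour)
print(div_round_half_even(7, 2))    # 4   (3.5 goes to the even neighbour)
print(div_round_half_even(7, 3))    # 2   (2.33 is not a tie)

# Compare with Python's exact rational rounding, which also rounds ties to even.
for d in range(1, 40):
    for n in range(0, 400):
        assert div_round_half_even(n, d) == round(Fraction(n, d))
```

The model mirrors the structure of the assembly: the second division is only needed when the divisor is even, because only then can the quotient land exactly halfway between two integers.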
    
  • evanh Posts: 11,824

    Here's the muldiv65() version:

    pub  muldiv65ieee( mult1, mult2, divisor ) : r | s
        org
                    rep     @.rend1, #1             ' IRQ shield
                    qmul    mult1, mult2
                    mov     r, divisor
                    shr     r, #1
                    testb   divisor, #0     wz      ' odd(1)/even(0)
                    getqx   mult1
                    getqy   mult2
    .rend1
                    add     mult1, r        wc      ' + 50% divisor
                    addx    mult2, #0
    
                    rep     @.rend2, #1             ' IRQ shield
                    setq    mult2
                    qdiv    mult1, divisor
        if_nz       sub     mult1, #1       wc      ' minus one
        if_nz       subx    mult2, #0
        if_nz       setq    mult2
        if_nz       qdiv    mult1, divisor
        if_nz       getqx   s
        if_nz       testb   s, #0           wz      ' odd(1)/even(0)
                    getqx   r
    .rend2
        if_nz       mov     r, s                    ' even
        end
    
  • evanh Posts: 11,824
    edited 2021-09-27 08:12

    I figure Chip's original muldiv64(), including two IRQ shields, should take 124 sysclock ticks. In reality it takes more because of skipping and other overheads. But, for comparison, I'll use 124 ticks.

    pub  muldiv64( mult1, mult2, divisor ) : r
        org
                    rep     @.rend1, #1             ' IRQ shield
                    qmul    mult1, mult2            ' 4
                    getqx   mult1                   ' 60
    .rend1
                    getqy   mult2
    
                    rep     @.rend2, #1             ' IRQ shield
                    setq    mult2
                    qdiv    mult1, divisor          ' 68
                    getqx   r                       ' 124
    .rend2
        end
    

    My extended muldiv65() comes to 132 ticks. An extra 8 ticks between the QMUL and QDIV.

    pub  muldiv65( mult1, mult2, divisor ) : r
        org
                    rep     @.rend1, #1             ' IRQ shield
                    qmul    mult1, mult2            ' 4
                    mov     r, divisor
                    shr     r, #1
                    getqx   mult1                   ' 60
                    getqy   mult2
    .rend1
                    add     mult1, r        wc
                    addx    mult2, #0
    
                    rep     @.rend2, #1             ' IRQ shield
                    setq    mult2
                    qdiv    mult1, divisor          ' 76
                    getqx   r                       ' 132
    .rend2
        end
    

    And finally, muldiv65ieee() comes to 142 ticks.
    All versions additionally have variability of up to +7 more for hub-op alignment.
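
To compare what the three routines return, as opposed to how long they take, here is a Python model of each. It is a sketch under stated assumptions: unsigned inputs, Python's unbounded integers standing in for the 64-bit product, and no modelling of overflow or timing. The function bodies are my reading of the assembly above, not code from PNut or flexspin.

```python
def muldiv64(mult1, mult2, divisor):
    """Truncating model: floor(mult1 * mult2 / divisor)."""
    return (mult1 * mult2) // divisor

def muldiv65(mult1, mult2, divisor):
    """Adds half the divisor to the product first, so exact ties round up."""
    return (mult1 * mult2 + (divisor >> 1)) // divisor

def muldiv65ieee(mult1, mult2, divisor):
    """As muldiv65, but an exact tie goes to the even quotient."""
    biased = mult1 * mult2 + (divisor >> 1)
    q_up = biased // divisor
    if divisor & 1 or q_up % 2 == 0:
        return q_up
    return (biased - 1) // divisor

for args in [(5, 1, 2), (7, 1, 2), (10, 1, 3)]:
    print(args, muldiv64(*args), muldiv65(*args), muldiv65ieee(*args))
# (5, 1, 2) 2 3 2
# (7, 1, 2) 3 4 4
# (10, 1, 3) 3 3 3
```

The three models only disagree when the exact quotient has a fractional part of one half or more, and muldiv65 and muldiv65ieee only disagree on exact ties.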

  • cgracey Posts: 13,627

    I just posted a new version 35q at the top of this thread.

    The main symbol table was increased from 64KB to 256KB. Others went from 4KB to 32KB.

    https://drive.google.com/file/d/1Op4YOyFIul-7-UhRq7Cdv-OfGx5afb5M/view?usp=sharing
