PNut/Spin2 Latest Version (v46 - DEBUG gating, clock-setter control, VAR flexibility, C_Z for DEBUG) - Page 49 — Parallax Forums


Comments

  • cgracey Posts: 14,151

    @evanh said:
    Cool. I've not delved into floats yet but that negative constants improvement will definitely prevent some extra swear words.

    Did the Cordic pipeline trick work out for de-biasing?

    Which pipeline trick are you meaning? I just added 3 instructions, I think, to do the job. It's in the "packf" routine.

  • evanh Posts: 15,914
    edited 2022-02-06 10:57

    Okay, of course, it's done at packing time. The one I did was integrated into the divide and not generic. And I keep forgetting, I didn't have the luxury of excess lower bits.

  • cgracey Posts: 14,151

    @evanh said:
    Okay, of course, it's done at packing time. The one I did was integrated into the divide and not generic. And I keep forgetting, I didn't have the luxury of excess lower bits.

    I think I remember this. Its purpose was to give a rounded quotient, right?

  • evanh Posts: 15,914
    edited 2022-02-06 11:05

    Yes. Matching the IEEE de-biasing results, but applied to 32-bit integers instead of floats. Mainly for use with enhanced muldiv65() routine. Which itself was an extension to your original muldiv64().

  • cgracey Posts: 14,151
    edited 2022-02-06 11:21

    @evanh said:
    Yes. Matching the IEEE de-biasing results, but applied to 32-bit integers instead of floats. Mainly for use with enhanced muldiv65() routine. Which itself was an extension to your original muldiv64().

    Ah, yes. I keep remembering and forgetting about this. Do you think it would be good to modify the MULDIV64() to operate differently? I remember someone pointing out that the error could be quite large in some cases.

  • evanh Posts: 15,914

    @cgracey said:
    Ah, yes. I keep remembering and forgetting about this. Do you think it would be good to modify the MULDIV64() to operate differently? I remember someone pointing out that the error could be quite large in some cases.

    Yes, the cost for the basic (+ divisor>>1) was minimal ... but it doesn't combat the nature of integers losing resolution at an alarming rate. I was surprised how much of an improvement fixed-point can be, even with multiplication. Ada made that clear not long ago.
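
For anyone following along, here is a minimal Python model of the rounding being discussed. It is only a sketch: the function names are made up for illustration and are not the actual MULDIV64() or muldiv65() code. It compares a truncating multiply-divide, the basic "+ divisor>>1" rounding, and a round-half-to-even variant of the kind the IEEE de-biasing refers to.

```python
def muldiv_trunc(a, b, d):
    """Truncating a*b/d on unsigned integers (wide product, then divide)."""
    return (a * b) // d


def muldiv_round(a, b, d):
    """Round to nearest by adding half the divisor first (ties round up)."""
    return (a * b + (d >> 1)) // d


def muldiv_round_even(a, b, d):
    """Round to nearest with ties going to the even quotient (no upward bias)."""
    q, r = divmod(a * b, d)
    twice = 2 * r
    if twice > d or (twice == d and (q & 1)):
        q += 1
    return q


print(muldiv_trunc(7, 5, 4))        # 35/4 = 8.75 -> 8
print(muldiv_round(7, 5, 4))        # 8.75 -> 9
print(muldiv_round(10, 1, 4))       # 2.5 -> 3 (tie rounds up)
print(muldiv_round_even(10, 1, 4))  # 2.5 -> 2 (tie goes to even)
```

The rounding only fixes the final half-bit bias. It does nothing about precision already lost in earlier integer steps, which is the point about fixed-point made above.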

  • evanh Posts: 15,914
    edited 2022-02-07 01:10

    So that's a feature request, btw, of the Cordic if the Prop2 ever goes for a re-spin. Providing a new instruction, GETQZ say, that grabs QY[15:0] and QX[31:16] for doing faster 16.16 fixed point.

  • Byte -$80 to $FF ?
    Word -$8000 to $FFFF?

    how?

    confused

    Mike

  • cgracey Posts: 14,151
    edited 2022-02-06 18:33

    @msrobots said:
    Byte -$80 to $FF ?
    Word -$8000 to $FFFF?

    how?

    confused

    Mike

    It checks to make sure values will fit into bytes/words.

    It accommodates byte/word signed values that can be later sign-extended back to 32 bits.

    Also, it allows unsigned values up to the byte/word size.

    I will probably change it in the next release to work by placing the word FIT after BYTE/LONG.
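
A small Python sketch of the range check as described above, assuming it accepts anything from the most negative signed value up to the largest unsigned value for the given width. The function names are hypothetical and this is not the compiler's actual code.

```python
def fits(value, bits):
    """True if value fits in `bits` bits as either a signed or an unsigned number."""
    return -(1 << (bits - 1)) <= value <= (1 << bits) - 1


def sign_extend(stored, bits):
    """Recover a signed value from the low `bits` bits of a stored byte/word."""
    stored &= (1 << bits) - 1
    if stored & (1 << (bits - 1)):
        return stored - (1 << bits)
    return stored


print(fits(-0x80, 8), fits(0xFF, 8))    # True True
print(fits(-0x81, 8), fits(0x100, 8))   # False False
print(sign_extend(-0x80 & 0xFF, 8))     # -128
```

Note that the stored byte does not record which interpretation was meant: $FF reads back as 255 unsigned or -1 after sign extension, so the reader of the data has to know which one applies.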

  • evanh Posts: 15,914
    edited 2022-02-06 22:55

    Hmm, not too sure about performing unsigned bounding. Unsigned is naturally circular. That's how carry/borrow functions. Masking suits it better than bounding.

  • evanh Posts: 15,914
    edited 2022-02-07 01:21

    Chip,
    Scratch the above feature request. I just realised that GETQX and GETQY are not bound to the usual hub-op granularity timing. It isn't clear in the spreadsheet that they aren't affected by hub slot alignment.

    For 16.16 fixed-point, this is pretty quick in the scheme of things:

        getqx  pa
        getqy  fp
        rolword fp, pa, #1
    

    And even 8.24 fixed-point is just as quick:

        getqx  pa
        getqy  fp
        rolbyte fp, pa, #3
    

    And 24.8 is one more instruction:

        getqx  pa
        getqy  fp
        rolword fp, pa, #1
        rolbyte fp, pa, #1
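
To check which product bits each of these sequences picks out, here is a minimal Python model. It assumes ROLWORD D,S,#N gives D = {D[15:0], word N of S} and ROLBYTE D,S,#N gives D = {D[23:0], byte N of S}, with GETQX/GETQY returning the low and high longs of the unsigned QMUL product. The Python helpers are illustrative stand-ins, not real tooling.

```python
MASK32 = 0xFFFFFFFF


def rolword(d, s, n):
    """Model of ROLWORD D,S,#N: shift D left 16 and insert word N of S."""
    return ((d << 16) & MASK32) | ((s >> (16 * n)) & 0xFFFF)


def rolbyte(d, s, n):
    """Model of ROLBYTE D,S,#N: shift D left 8 and insert byte N of S."""
    return ((d << 8) & MASK32) | ((s >> (8 * n)) & 0xFF)


def qmul(a, b):
    """Model of QMUL then GETQX/GETQY: unsigned product as (low, high) longs."""
    p = (a & MASK32) * (b & MASK32)
    return p & MASK32, p >> 32


# 3.5 * 2.25 with 16 fractional bits
qx, qy = qmul(0x00038000, 0x00024000)
print(hex(rolword(qy, qx, 1)))  # 0x7e000, i.e. 7.875 with 16 fractional bits
```

In this model the ROLWORD #1 form returns product bits [47:16], the single ROLBYTE #3 form returns bits [55:24] (the right scaling when both operands carry 24 fractional bits), and ROLWORD #1 followed by ROLBYTE #1 returns bits [39:8] (the right scaling when both operands carry 8 fractional bits).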
    
  • @evanh said:
    For 16.16 fixed-point, this is pretty quick in the scheme of things:

        getqx  pa
        getqy  fp
        rolword fp, pa, #1
    

    Think of that in a loop though. Remember, it's not single multiply ops, it's vectors of sums of products. So add an ALTR + ADD and whoops it no longer fits into an 8-cycle loop and you have to waste half the cordic throughput.

  • evanh Posts: 15,914

    Isn't there more to it though? It's never going to fit in an eight clock window.

  • Yes, with current P2 you also need to work around QMUL being unsigned-only, which is a much worse issue than extracting the fixed point result. Other than that, no.
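
    As an illustration of what working around an unsigned-only multiply can look like, here is a Python sketch of one standard technique: take the unsigned product of the two's-complement bit patterns, then correct the high long. This is an assumption about one possible approach, not code from the thread and not necessarily the cheapest option on the P2 (taking absolute values and negating the result is another). All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed64(x):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x & (1 << 63) else x


def signed_mul_via_unsigned(a, b):
    """Signed 32x32 -> 64 multiply using only an unsigned 32x32 multiply."""
    ua, ub = a & MASK32, b & MASK32
    p = ua * ub               # what an unsigned multiplier would return
    lo, hi = p & MASK32, p >> 32
    if a < 0:
        hi -= ub              # undo the extra 2^32 * ub from a's sign bit
    if b < 0:
        hi -= ua              # undo the extra 2^32 * ua from b's sign bit
    hi &= MASK32
    return to_signed64((hi << 32) | lo)


print(signed_mul_via_unsigned(-3, 5))    # -15
print(signed_mul_via_unsigned(-1, -1))   # 1
```

    Either way, the correction adds instructions per product, which is why it matters more in a tight loop than the fixed-point extraction does.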

  • cgracey Posts: 14,151

    If you want to pipeline these operations, then just store the GETQX and GETQY values and process them later.

  • evanh Posts: 15,914
    edited 2022-02-07 03:09

    I imagine something like this. The second loop needs 15 clock cycles, first loop padded to suit the second:

            ...
            rep @.rend1, #4
            rflong  pa
            rdlut   pb, ptra++
            qmul    pa, pb
            waitx   #4
    .rend1
            rep @.rend2, count
            rflong  pa
            rdlut   pb, ptra++
            qmul    pa, pb
    
            getqx   pa
            getqy   pb
            altr    bufi, buff_m
            rolword pb, pa, #1
    .rend2
            rep @.rend3, #4
            getqx   pa
            getqy   pb
            altr    bufi, buff_m
            rolword pb, pa, #1
    .rend3
            ...
    
  • evanh Posts: 15,914
    edited 2022-02-07 08:47

    Chip,
    For inline Pasm code within a Spin method, is it guaranteed to be a fresh copy of the routine in cogRAM for each time the assembled code is called? I ask because I'm considering doing self-modify. And it would be shorter if the cogRAM copy gets reset for each call.

  • @evanh said:
    Chip,
    For inline Pasm code within a Spin method, is it guaranteed to be a fresh copy of the routine in cogRAM for each time the assembled code is called? I ask because I'm considering doing self-modify. And it would be shorter if the cogRAM copy gets reset for each call.

    A search for "inline" in Spin2_interpreter.spin2 will provide the answer.

  • evanh Posts: 15,914

    I should ask Eric too. If he's not happy to support it then I better not. I've kind of already flagged even the self-modifying option as too much overhead with minimal speed up. The results are good as is.

  • cgracey Posts: 14,151
    edited 2022-02-07 16:24

    @evanh said:
    Chip,
    For inline Pasm code within a Spin method, is it guaranteed to be a fresh copy of the routine in cogRAM for each time the assembled code is called? I ask because I'm considering doing self-modify. And it would be shorter if the cogRAM copy gets reset for each call.

    Yes, the inline PASM code gets reloaded into cog RAM each time it is executed.

  • evanh Posts: 15,914

    Thanks Chip.

  • @cgracey said:

    @evanh said:
    Chip,
    For inline Pasm code within a Spin method, is it guaranteed to be a fresh copy of the routine in cogRAM for each time the assembled code is called? I ask because I'm considering doing self-modify. And it would be shorter if the cogRAM copy gets reset for each call.

    Yes, the inline PASM code gets reloaded into cog RAM each time it is executed.

    but what about interrupts, you had some sample staying resident?

  • evanh Posts: 15,914
    edited 2022-02-08 04:30

    @msrobots said:
    but what about interrupts, you had some sample staying resident?

    That's not inline code then. It's loaded differently. Flexspin doesn't support it, so I'm vague on how. It'll be a DAT section using regload() or something.

  • cgracey Posts: 14,151
    edited 2022-02-08 09:26

    The inline PASM code loads wherever the ORG says to. If you ORG $080, the code will be loaded into register $080, upwards. If your next ORG is $000 (default), your code loads into register $000, upwards. It doesn't affect the old code at $080 unless it overlaps it.

  • evanh Posts: 15,914

    Huh, never expected a value after ORG to have an effect for inlined. Obviously important for regload()ed sections though.

  • Some improvement requests for the Propeller Tool:

    • The debug window has a mysterious delay when scrolling. The last line is not cleared immediately but instead keeps a copy of the line just scrolled upward. Half a second later it is eventually cleared or overwritten with new content.
    • The debug window is not refreshed when it is obscured by another window and then brought to the front again. Instead, it stays black.
    • It would be nice if text from the debug window could be copied/pasted into an editor. This would make it much easier to share or document debugging output, for example here in the forum.
    • The debug window has a mysterious delay when scrolling. The last line is not cleared immediately but instead keeps a copy of the line just scrolled upward. Half a second later it is eventually cleared or overwritten with new content.

    That annoys me too. I think this may happen only on certain systems or OS versions.

    • The debug window is not refreshed when it is obscured by another window and then brought to the front again. Instead, it stays black.

    I've seen that also. What OS are you experiencing this on? I saw it on Win 7, but not Win 10 when I tested. The solution (I've already found) may slow the display slightly, so I was leery of implementing it.

    • It would be nice if text from the debug window could be copied/pasted into an editor. This would make it much easier to share or document debugging output, for example here in the forum.

    Noted. In the meantime, you can use the DEBUG log feature, but you should be careful to limit the log size, and it is certainly not as convenient when you have an impromptu need for such text.

  • @"Jeff Martin" said:
    What OS are you experiencing this? I saw it on Win 7,

    Yes, I also use Win7

    Noted. In the meantime, you can use the DEBUG log feature

    I think you've already mentioned that but, unfortunately, I haven't managed to find out how this works. How is the log feature activated?

  • @ManAtWork said:
    I think you've already mentioned that but, unfortunately, I haven't managed to find out how this works. How is the log feature activated?

    In your code, just define DEBUG_LOG_SIZE as a value > 0... set it to the maximum number of bytes you'd like to limit the log file to.

    CON
    
      DEBUG_LOG_SIZE = 1024
    

    When using Propeller Tool, the log file will be stored as the "DEBUG.log" file in your My Documents > Propeller Tool folder.

  • cgracey Posts: 14,151

    Yes, the DEBUG window is just a visual thing designed to run as fast as possible. It simply scrolls the bitmap each time a new line is printed. It is up to Windows to get around to repainting the window. This is much faster than redrawing all the text, and it enables things to go really fast, to not incur much of a DEBUG cache delay.

    Like Jeff said, you can log the first N bytes of DEBUG data by setting that DEBUG_LOG_SIZE to a size limit, so that the DEBUG data goes into a DEBUG.log file.
