PNut/Spin2 Latest Version (v47 - Cooperative multitasking added to Spin2, up to 32 tasks)

cgracey · 2022-02-06 10:51

@evanh said:
Cool. I've not delved into floats yet but that negative constants improvement will definitely prevent some extra swear words.

Did the Cordic pipeline trick work out for de-biasing?

Which pipeline trick are you meaning? I just added 3 instructions, I think, to do the job. It's in the "packf" routine.

evanh · 2022-02-06 10:55

Okay, of course, it's done at packing time. The one I did was integrated into the divide and wasn't generic. And I keep forgetting: I didn't have the luxury of excess lower bits.

cgracey · 2022-02-06 10:59

@evanh said:
Okay, of course, it's done at packing time. The one I did was integrated into the divide and wasn't generic. And I keep forgetting: I didn't have the luxury of excess lower bits.

I think I remember this. Its purpose was to give a rounded quotient, right?

evanh · 2022-02-06 11:02

Yes. It matches the IEEE de-biasing results, but applied to 32-bit integers instead of floats. It was mainly for use with the enhanced muldiv65() routine, which itself was an extension of your original muldiv64().
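For reference, IEEE 754's default rounding (round half to even) breaks an exact tie toward the even result, so ties don't always push the quotient the same way. Below is a minimal Python sketch of that rule applied to an integer multiply-then-divide. `muldiv_round_even` is a hypothetical name. This is not evanh's muldiv65() code, only an illustration of the rounding rule for non-negative operands.

```python
def muldiv_round_even(a: int, b: int, d: int) -> int:
    """Compute a*b/d for non-negative ints, rounding ties to even (IEEE-style).

    Hypothetical helper for illustration; not the actual muldiv65() routine.
    """
    if d == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    q, r = divmod(a * b, d)      # full-width product, then truncating divide
    twice_r = 2 * r              # compare remainder against half the divisor
    if twice_r > d or (twice_r == d and q & 1):
        q += 1                   # above half, or an exact tie with odd q
    return q

print(muldiv_round_even(5, 1, 2))   # 2.5 -> 2 (tie goes to even)
print(muldiv_round_even(7, 1, 2))   # 3.5 -> 4 (tie goes to even)
```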

cgracey · 2022-02-06 11:21

@evanh said:
Yes. It matches the IEEE de-biasing results, but applied to 32-bit integers instead of floats. It was mainly for use with the enhanced muldiv65() routine, which itself was an extension of your original muldiv64().

Ah, yes. I keep remembering and forgetting about this. Do you think it would be good to modify the MULDIV64() to operate differently? I remember someone pointing out that the error could be quite large in some cases.

evanh · 2022-02-06 11:28

@cgracey said:
Ah, yes. I keep remembering and forgetting about this. Do you think it would be good to modify the MULDIV64() to operate differently? I remember someone pointing out that the error could be quite large in some cases.

Yes, the cost of the basic rounding (+ divisor>>1) was minimal, but it doesn't combat the nature of integers losing resolution at an alarming rate. I was surprised how much of an improvement fixed-point can be, even with multiplication. Ada made that clear not long ago.
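The `(+ divisor>>1)` trick costs only one add before the divide. The Python sketch below uses hypothetical helper names and is not the actual MULDIV64() code. It brute-forces the worst-case error of a plain truncating divide against the rounded version for a divisor of 7.

```python
from fractions import Fraction

def muldiv_trunc(a: int, b: int, d: int) -> int:
    """a*b/d with the quotient truncated (plain integer divide)."""
    return (a * b) // d

def muldiv_round(a: int, b: int, d: int) -> int:
    """a*b/d rounded to nearest by adding half the divisor first (ties round up)."""
    return (a * b + (d >> 1)) // d

def worst_error(fn, d: int, values) -> Fraction:
    """Largest |exact - computed| over all pairs drawn from `values`."""
    return max(abs(Fraction(a * b, d) - fn(a, b, d))
               for a in values for b in values)

print(worst_error(muldiv_trunc, 7, range(50)))   # 6/7
print(worst_error(muldiv_round, 7, range(50)))   # 3/7
```

Rounding halves the worst-case error, but exact ties still always round upward, so it is not the unbiased tie-breaking that IEEE rounding provides.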

evanh · 2022-02-06 11:35

So that's a feature request, btw, for the Cordic if the Prop2 ever goes for a re-spin: a new instruction, GETQZ say, that grabs QY[15:0] and QX[31:16] for doing faster 16.16 fixed point.

msrobots · 2022-02-06 17:04

Byte -$80 to $FF ?
Word -$8000 to $FFFF?

how?

confused

Mike

cgracey · 2022-02-06 18:31

@msrobots said:
Byte -$80 to $FF ?
Word -$8000 to $FFFF?

how?

confused

Mike

It checks to make sure values will fit into bytes/words.

It accommodates byte/word signed values that can be later sign-extended back to 32 bits.

Also, it allows unsigned values up to the byte/word size.

I will probably change it in the next release to work by placing the word FIT after BYTE/LONG.
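As I read the description above, a value passes if it is either a valid signed value or a valid unsigned value for the field width. Here is a Python sketch of that rule together with the sign-extension round trip. The function names are hypothetical. It models the described behaviour and is not PNut's source.

```python
def fits_field(value: int, bits: int) -> bool:
    """True if value fits a BYTE (bits=8) or WORD (bits=16) field.

    Accepts signed values down to -2**(bits-1) and unsigned values up to
    2**bits - 1, i.e. the -$80..$FF and -$8000..$FFFF ranges.
    """
    return -(1 << (bits - 1)) <= value <= (1 << bits) - 1

def store_field(value: int, bits: int) -> int:
    """Range-check, then keep only the low `bits` bits (what lands in memory)."""
    if not fits_field(value, bits):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)

def sign_extend(field: int, bits: int) -> int:
    """Recover a signed 32-bit-style value from a stored byte/word field."""
    sign = 1 << (bits - 1)
    return (field ^ sign) - sign

print(hex(store_field(-1, 8)))                 # 0xff
print(sign_extend(store_field(-0x80, 8), 8))   # -128
```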

evanh · 2022-02-06 22:54

Hmm, not too sure about performing unsigned bounding. Unsigned is naturally circular. That's how carry/borrow functions. Masking suits it better than bounding.
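The "naturally circular" point can be illustrated with a short example. Masking is just reduction modulo 2^n, so masking after every step gives the same answer as masking once at the end, which is how carry and borrow behave in a fixed-width register. Below is a small Python sketch with hypothetical helper names.

```python
def wrap(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits by masking, as a carry/borrow chain would."""
    return value & ((1 << bits) - 1)

def sum_stepwise(values, bits: int) -> int:
    """Add values one at a time, masking after every step (like a byte register)."""
    total = 0
    for v in values:
        total = wrap(total + v, bits)
    return total

print(wrap(0xFF + 1, 8))   # 0: carry out of a byte wraps to zero
print(wrap(0x00 - 1, 8))   # 255: borrow wraps to $FF
```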

evanh · 2022-02-07 01:10

Chip,
Scratch the above feature request. I just realised that GETQX and GETQY are not bound to the usual hub-op granularity timing. The spreadsheet doesn't make it clear that they aren't affected by hub slot alignment.

For 16.16 fixed-point, this is pretty quick in the scheme of things:

    getqx  pa
    getqy  fp
    rolword fp, pa, #1

And even 8.24 fixed-point is just as quick:

    getqx  pa
    getqy  fp
    rolbyte fp, pa, #3

And 24.8 is one more instruction:

    getqx  pa
    getqy  fp
    rolword fp, pa, #1
    rolbyte fp, pa, #1
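Here is a Python model for checking these sequences. It assumes QMUL yields the unsigned 64-bit product, with the low long read by GETQX and the high long by GETQY. It takes ROLWORD D,S,#N as D = {D[15:0], S.WORD[N]} and ROLBYTE D,S,#N as D = {D[23:0], S.BYTE[N]}. Under that model:

- `rolword fp, pa, #1` keeps product bits [47:16].
- `rolbyte fp, pa, #3` keeps bits [55:24] (24 fractional bits).
- The `rolword #1` / `rolbyte #1` pair keeps bits [39:8] (8 fractional bits).

Only unsigned operands are modelled, and any integer-part overflow is silently truncated to 32 bits.

```python
MASK32 = 0xFFFF_FFFF

def qmul(a: int, b: int):
    """Model QMUL: unsigned 32x32 -> 64. Returns (QX low long, QY high long)."""
    p = (a & MASK32) * (b & MASK32)
    return p & MASK32, p >> 32

def rolword(d: int, s: int, n: int) -> int:
    """ROLWORD D,S,#N as modelled here: D = {D[15:0], S.WORD[N]}."""
    return ((d << 16) | ((s >> (16 * n)) & 0xFFFF)) & MASK32

def rolbyte(d: int, s: int, n: int) -> int:
    """ROLBYTE D,S,#N as modelled here: D = {D[23:0], S.BYTE[N]}."""
    return ((d << 8) | ((s >> (8 * n)) & 0xFF)) & MASK32

def fix16_16(a: int, b: int) -> int:
    pa, fp = qmul(a, b)            # getqx pa / getqy fp
    return rolword(fp, pa, 1)      # rolword fp, pa, #1 -> bits [47:16]

def fix8_24(a: int, b: int) -> int:
    pa, fp = qmul(a, b)
    return rolbyte(fp, pa, 3)      # rolbyte fp, pa, #3 -> bits [55:24]

def fix24_8(a: int, b: int) -> int:
    pa, fp = qmul(a, b)
    fp = rolword(fp, pa, 1)        # rolword fp, pa, #1
    return rolbyte(fp, pa, 1)      # rolbyte fp, pa, #1 -> bits [39:8]

# 1.5 * 2.25 = 3.375 in 16.16
print(hex(fix16_16(0x18000, 0x24000)))   # 0x36000
```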

Wuerfel_21 · 2022-02-07 01:29

@evanh said:
For 16.16 fixed-point, this is pretty quick in the scheme of things:
    getqx  pa
    getqy  fp
    rolword fp, pa, #1

Think of that in a loop though. Remember, it's not single multiply ops, it's vectors of sums of products. So add an ALTR + ADD and, whoops, it no longer fits into an 8-cycle loop, and you have to waste half the cordic throughput.

evanh · 2022-02-07 01:37

Isn't there more to it though? It's never going to fit in an eight clock window.

Wuerfel_21 · 2022-02-07 01:45

Yes, with current P2 you also need to work around QMUL being unsigned-only, which is a much worse issue than extracting the fixed point result. Other than that, no.
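One standard workaround for an unsigned-only multiplier is the signed high-product correction described in Hacker's Delight. It is not necessarily what anyone in this thread uses. The method takes the unsigned product, then subtracts b from the high long if a is negative and subtracts a if b is negative. The low long needs no correction. Below is a Python sketch of that arithmetic, with hypothetical function names.

```python
MASK32 = 0xFFFF_FFFF

def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x

def signed_high(a: int, b: int) -> int:
    """High long of the signed 32x32 product, built from an unsigned multiply.

    a and b are raw 32-bit patterns; the unsigned high long is corrected by
    subtracting b when a is negative and a when b is negative (mod 2**32).
    """
    a &= MASK32
    b &= MASK32
    hi = (a * b) >> 32            # what an unsigned-only multiplier provides
    if a & 0x8000_0000:
        hi -= b
    if b & 0x8000_0000:
        hi -= a
    return hi & MASK32

print(hex(signed_high(0xFFFF_FFFF, 1)))   # -1 * 1: high long is 0xffffffff
```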

cgracey · 2022-02-07 02:05

If you want to pipeline these operations, then just store the GETQX and GETQY values and process them later.

evanh · 2022-02-07 03:07

I imagine something like this. The second loop needs 15 clock cycles, first loop padded to suit the second:

        ...
        rep @.rend1, #4
        rflong  pa
        rdlut   pb, ptra++
        qmul    pa, pb
        waitx   #4
    .rend1
        rep @.rend2, count
        rflong  pa
        rdlut   pb, ptra++
        qmul    pa, pb

        getqx   pa
        getqy   pb
        altr    bufi, buff_m
        rolword pb, pa, #1
    .rend2
        rep @.rend3, #4
        getqx   pa
        getqy   pb
        altr    bufi, buff_m
        rolword pb, pa, #1
    .rend3
        ...
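The loop structure above is ordinary software pipelining. It primes the CORDIC with a few operations, then issues one and collects one per iteration, and finally drains the rest. Below is a timing-free Python model of just the ordering, with a queue standing in for the CORDIC pipeline. `pipelined_products` and `depth` are hypothetical names, and `depth=4` mirrors the four primed QMULs.

```python
from collections import deque

def pipelined_products(xs, ys, depth: int = 4):
    """Keep up to `depth` multiplies in flight: prime, steady state, drain."""
    pairs = list(zip(xs, ys))
    in_flight = deque()          # stands in for results still inside the CORDIC
    out = []
    # Prime: issue the first `depth` operations without collecting results.
    for x, y in pairs[:depth]:
        in_flight.append(x * y)
    # Steady state: issue one new operation, then collect the oldest result.
    for x, y in pairs[depth:]:
        in_flight.append(x * y)
        out.append(in_flight.popleft())
    # Drain: collect whatever is still in flight, in issue order.
    while in_flight:
        out.append(in_flight.popleft())
    return out

print(pipelined_products([1, 2, 3, 4, 5], [10, 10, 10, 10, 10]))   # [10, 20, 30, 40, 50]
```

Results come out in issue order, so storing them raw and post-processing later (as Chip suggests) gives the same values as handling each one immediately.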

evanh · 2022-02-07 08:42

Chip,
For inline Pasm code within a Spin method, is it guaranteed to be a fresh copy of the routine in cogRAM for each time the assembled code is called? I ask because I'm considering doing self-modify. And it would be shorter if the cogRAM copy gets reset for each call.

TonyB_ · 2022-02-07 10:22

@evanh said:
Chip,
For inline Pasm code within a Spin method, is it guaranteed to be a fresh copy of the routine in cogRAM for each time the assembled code is called? I ask because I'm considering doing self-modify. And it would be shorter if the cogRAM copy gets reset for each call.

A search for "inline" in Spin2_interpreter.spin2 will provide the answer.

evanh · 2022-02-07 10:46

I should ask Eric too. If he's not happy to support it then I better not. I've kind of already flagged even the self-modifying option as too much overhead with minimal speed up. The results are good as is.

cgracey · 2022-02-07 16:24

@evanh said:
Chip,
For inline Pasm code within a Spin method, is it guaranteed to be a fresh copy of the routine in cogRAM for each time the assembled code is called? I ask because I'm considering doing self-modify. And it would be shorter if the cogRAM copy gets reset for each call.

Yes, the inline PASM code gets reloaded into cog RAM each time it is executed.

evanh · 2022-02-07 23:15

Thanks Chip.

msrobots · 2022-02-08 03:41

@cgracey said:

@evanh said:
Chip,
For inline Pasm code within a Spin method, is it guaranteed to be a fresh copy of the routine in cogRAM for each time the assembled code is called? I ask because I'm considering doing self-modify. And it would be shorter if the cogRAM copy gets reset for each call.

Yes, the inline PASM code gets reloaded into cog RAM each time it is executed.

But what about interrupts? Didn't you have a sample that stays resident?

evanh · 2022-02-08 04:25

@msrobots said:
but what about interrupts, you had some sample staying resident?

That's not inline code then; it's loaded differently. Flexspin doesn't support it, so I'm vague on how. It'll be a DAT section using regload() or something.

cgracey · 2022-02-08 09:25

The inline PASM code loads wherever the ORG says to. If you ORG $080, the code will be loaded into register $080, upwards. If your next ORG is $000 (default), your code loads into register $000, upwards. It doesn't affect the old code at $080 unless it overlaps it.

evanh · 2022-02-08 09:45

Huh, never expected a value after ORG to have an effect for inlined. Obviously important for regload()ed sections though.

ManAtWork · 2022-02-19 16:29

Some improvement requests for the Propeller Tool:

The debug window has a mysterious delay when scrolling. The last line is not cleared immediately but instead keeps a copy of the line just scrolled upward. Half a second later it is eventually cleared or overwritten with new content.
The debug window is not refreshed when it is obscured by another window and then brought to the front again. Instead, it stays black.
It would be nice if text from the debug window could be copied/pasted into an editor. This would make it much easier to share or document debugging output, for example here in the forum.

Jeff Martin · 2022-02-19 16:54

The debug window has a mysterious delay when scrolling. The last line is not cleared immediately but instead keeps a copy of the line just scrolled upward. Half a second later it is eventually cleared or overwritten with new content.

That annoys me too. I think this may happen only on certain systems or OS versions.

The debug window is not refreshed when it is obscured by another window and then brought to the front again. Instead, it stays black.

I've seen that also. On what OS are you experiencing this? I saw it on Win 7, but not on Win 10 when I tested. The solution (which I've already found) may slow the display slightly, so I was leery of implementing it.

It would be nice if text from the debug window could be copied/pasted into an editor. This would make it much easier to share or document debugging output, for example here in the forum.

Noted. In the meantime, you can use the DEBUG log feature, but you should be careful to limit the log size, and it is certainly not as convenient when you have an impromptu need for such text.

ManAtWork · 2022-02-19 19:24

@"Jeff Martin" said:
On what OS are you experiencing this? I saw it on Win 7,

Yes, I also use Win7

Noted. In the meantime, you can use the DEBUG log feature

I think you've already mentioned that but, unfortunately, I haven't managed to find out how this works. How is the log feature activated?

Jeff Martin · 2022-02-22 18:28

@ManAtWork said:
I think you've already mentioned that but, unfortunately, I haven't managed to find out how this works. How is the log feature activated?

In your code, just define DEBUG_LOG_SIZE as a value > 0, setting it to the maximum number of bytes you'd like to limit the log file to.

CON

  DEBUG_LOG_SIZE = 1024

When using Propeller Tool, the log file will be stored as the "DEBUG.log" file in your My Documents > Propeller Tool folder.

cgracey · 2022-02-25 03:18

Yes, the DEBUG window is just a visual thing designed to run as fast as possible. It simply scrolls the bitmap each time a new line is printed. It is up to Windows to get around to repainting the window. This is much faster than redrawing all the text, and it enables things to go really fast, to not incur much of a DEBUG cache delay.

Like Jeff said, you can log the first N bytes of DEBUG data by setting that DEBUG_LOG_SIZE to a size limit, so that the DEBUG data goes into a DEBUG.log file.
