P2 PASM SIN 16Bit

evanh · 2021-03-26 13:36

I multiplied freql, not clkfreq. But dividing clkfreq is also an option.

MOV pa, clkfreq
SHR pa, #8     'divide by 256
QFRAC freql, pa
GETQX f1

pic18f2550 · 2021-03-26 13:45

I already have tomatoes on my eyes.

Here, 268,437,500Hz would be a bigger calculation error than 256,000,000Hz.

Thank you

evanh · 2021-03-26 14:07

The assembly code below is the same as f1 long round(freql * 65536.0 * 65536.0 * 256.0 / float(clkfreq_))

MOV pa, freql
SHL pa, #8     'PA = freql * 256
QFRAC pa, clkfreq
GETQX f1     'f1 = freql * 256 * $1_0000_0000 / clkfreq

pic18f2550 · 2021-03-26 21:40

The calculation of the sine curve is all well and good, but it consumes more than 60 clk impulses.

I will rather create a sin table in the hram.
That is definitely faster.

That way I can calculate the frequency more easily. The missing gaps are simply calculated out.

evanh · 2021-03-26 22:41

No problem. Good exercise anyway. Chip's Spiral Demo is a good one to have a look at for seeing the Cordic in heavy use.

pic18f2550 · 2021-03-27 00:11

Yes, I want to watch the demo too.
But I'm still missing the DVI adapter.
I still have to build it.
You can fool the eye, but the ear doesn't miss a thing.
I have installed 5 oscillators so far.
Two of them have an extra reel. (LFO)
The 1st generates an amplitude modulation at 3..5.
The 2nd generates a frequency modulation at 3..5.

If everything works out, I will be able to add 5 more.
And all this in one COG.
Then there will be 5 of them working around in the propeller.
50 oscillators - a great idea

Translated with www.DeepL.com/Translator (free version)

evanh · 2021-03-27 02:15

VGA version - https://forums.parallax.com/discussion/comment/1484706/#Comment_1484706

pic18f2550 · 2021-03-27 22:36

In my attempts, I have come across something that I can't explain.
A wrong frequency is generated here.

dac2
        {{ 1. Sampelausschnitt und Amplitute Berechnen }}
        add     ph1, fr1
        qrotate am1, ph1
        getqy   outw
        bitnot  outw, #15

The correct frequency is generated here.

dac2
        {{ 1. Sampelausschnitt und Amplitute Berechnen }}
        add     ph1, fr1
        nop                 '<--- Warum auch immer sonst stimmt getqy nicht
        qrotate am1, ph1
        getqy   outw
        bitnot  outw, #15

fr1 and fr2 are apparently not the same either.
Since a frequency error also occurs here.

PUB
K1              = 38400

PUB main()
  fr1 := (12345) / 30
  fr2 := (12345.00) / 30.00

evanh · 2021-03-27 23:32

The NOP is not needed for what's there. So something else, not listed, is the problem.

fr2 is an IEEE float. It won't be usable without conversion using constant ROUND(), ie: At compile time only. It is not run-time usable unless you know how to decode a float, bit by bit.

pic18f2550 · 2021-03-28 00:42

I have now replaced the wait function "waitse1" with "waitct1".
I let the DAC run freely with 1M/s sampels.
By "waitse1" I can reduce the sampelrate to 250k/s and get the needed computing time.
I don't need the NOP anymore.

I have made the calculation over constants, here rounding errors can be removed more easily.

evanh · 2021-03-28 01:06

Cleaned up the constant floats for you (without a "." they are integers):

K10kHz          = 163840000.0
K1kHz           = 16384000.0
K100Hz          = 1638400.0
K10Hz           = 163840.0
K1Hz            = 16384.0
KHz1            = 1638.4
KHz10           = 163.84
fr1_1       = round( (2.0 * K10kHz) + (0.0 * K1kHz) + (0.0 * K100Hz) + (0.0 * K10Hz) + (0.0 * K1Hz) )

  fr1 := fr1_1

evanh · 2021-03-28 01:14

Best not to use ASMCLK when mixed Spin/Pasm. It's for pure pasm only code. Just let Spin handle the sysclock setup.

evanh · 2021-03-28 01:24

The "dacs" value isn't suitable for PWM dither either. I'd ditch the variable and just use an immediate WXPIN #256,#pin1

EDIT: The small value in "dacs" will be why the WAITSE1 wasn't good.

evanh · 2021-03-28 01:31

Pin mode value commented:

dac     long    %0000_0000_000_101_00_00000000_01_00011_0   'P_DAC_990R_3V dac mode + DAC_DITHER_PWM smartpin mode

pic18f2550 · 2021-03-28 14:28

asmclk I have taken out, had no negative effects,

pic18f2550 · 2021-03-28 14:59

Signal image of 20kHz at 250k sampling rate.
With the help of linear interpolation, I want to calculate the 3 missing intermediate values.
So I come to 1M sampling, and the stairs become smaller.

Here is the sum signal of 4 oscillators.

Now come square, triangle and PWM.
And everything must be based with the same values as the sinius.

evanh · 2021-03-28 21:44

Nice.

But "dacs" needs to be a multiple of 256 if you want the PWM dither to work. And then you can correctly go back to using WAITSE1.

pic18f2550 · 2021-03-28 22:22

dacs is now fixed at 256.
Otherwise it makes no sense in the project.
I still have to come up with something for the other signal types. Tables are unsuitable here.
I'm racking my brains over the interpolation. But I haven't found a useful solution yet.
I will probably reduce to 200ks.
That will make the calculation easier, divide by 2, 4 and simple addition.

evanh · 2021-03-28 22:56

Cordic can go a lot faster for prebuilding multiple points. Just have to stagger the calculations to use the pipeline. Up to one command per 8-clocks. And if that is still not enough, the next level is get other cogs to help by using their share of the Cordic. It can, in theory, achieve a command per clock.

pic18f2550 · 2021-03-29 08:54

Let's see if that works too.

        {{ linear interpolation }}
        mov     outwd, outwa     {{  2  }}
        sub     outwd, outwn     {{  2  }}
        qdiv    outwd, #3        {{  9  }}
        getqx   outwd
        mov     outw1, outwa     {{  2  }}
        bitnot  outw1, #15
        mov     outw2, outwa     {{  2  }}
        add     outw2, outwd     {{  2  }}
        bitnot  outw2, #15
        mov     outw3, outwa     {{  2  }}
        add     outw3, outwd     {{  2  }}
        add     outw3, outwd     {{  2  }}
        bitnot  outw3, #15
        mov     outw4, outwa     {{  2  }}
        add     outw3, outwd     {{  2  }}
        add     outw3, outwd     {{  2  }}
        add     outw3, outwd     {{  2  }}
        bitnot  outw4, #15
        mov     outwa, outwn     {{  2  }}

evanh · 2021-03-29 09:58

Here's a part of Chip's Spiral Demo. It demonstrates how to fill the Cordic pipeline:

.pixels     qvector .x,.y   '0 in       'do overlapped QVECTOR ops for 16 pixels
        add .x,#1   '1 in
        qvector .x,.y
        add .x,#1   '2 in
        qvector .x,.y
        add .x,#1   '3 in
        qvector .x,.y
        add .x,#1   '4 in
        qvector .x,.y
        add .x,#1   '5 in
        qvector .x,.y
        add .x,#1   '6 in
        qvector .x,.y
        add .x,#1   '7 in
        qvector .x,.y
        getqx   .px+0   '0 out
        getqy   .py+0
        add .x,#1   '8 in
        qvector .x,.y
        getqx   .px+1   '1 out
        getqy   .py+1
        add .x,#1   '9 in
        qvector .x,.y
        getqx   .px+2   '2 out
        getqy   .py+2
        ...
        ...

evanh · 2021-03-29 10:03

Note how results have to be collected too. Two GETQ's, an ADD, and QVECTOR is 8 sysclocks, so couldn't issue commands faster even if the pipeline could go faster.

pic18f2550 · 2021-03-29 10:46

If everything works out, I'll be there with 35 sysclocks.
The "bitnot" already belong to the signal conversion of the PWM.

evanh · 2021-03-29 11:30

I've shaved 14 clocks from your above code:

        {{ linear interpolation }}
        mov     outwd, outwa     {{  2  }}
        sub     outwd, outwn     {{  2  }}
        qdiv    outwd, #3        {{  9  }}

        mov     outw1, outwa     {{  2  }}
        bitnot  outw1, #15
        mov     outw2, outwa     {{  2  }}
        mov     outw3, outwa     {{  2  }}
        mov     outw4, outwa     {{  2  }}
        bitnot  outw4, #15
        mov     outwa, outwn     {{  2  }}

        getqx   outwd
        add     outw2, outwd     {{  2  }}
        bitnot  outw2, #15
        add     outw3, outwd     {{  2  }}
        add     outw3, outwd     {{  2  }}
        bitnot  outw3, #15
        add     outw3, outwd     {{  2  }}
        add     outw3, outwd     {{  2  }}
        add     outw3, outwd     {{  2  }}

pic18f2550 · 2021-03-29 12:19

How do you come to 14?

        {{ linear interpolation }}
        mov     outwd, outwa     {{  2  }}
        sub     outwd, outwn     {{  2  }}

        mov     outw1, outwa     {{  2  }}
        mov     outw2, outwa     {{  2  }}
        mov     outw3, outwa     {{  2  }}
        mov     outw4, outwa     {{  2  }}

        shr     outwd, #1        {{  2  }}{{ DIV outwd/2 }}
        add     outw3, outwd     {{  2  }}
        add     outw4, outwd     {{  2  }}
        shr     outwd, #1        {{  2  }}{{ DIV outwd/2 }}
        add     outw2, outwd     {{  2  }}
        add     outw4, outwd     {{  2  }}

        mov     outwa, outwn     {{  2  }}

evanh · 2021-03-29 17:45

Seven instructions between QDIV and GETQX.

pic18f2550 · 2021-03-29 19:54

But there are still 7 commands missing around it, so that one value becomes 4.

evanh · 2021-03-30 04:14

I just took your prior listing and optimised around the cordic's QDIV. Making it 14 clocks faster by not having to twiddle thumbs for so long. Nothing else.

pic18f2550 · 2021-03-30 08:26

OK, now I've eaten it.
The cordic is like a coprocessor for the COG.
Therefore, it can perform calculations in parallel with the COG.

Does each COG have its own cordic?

{{ ==== 1. linear interpolation }}
        mov     outw1, outwa     {{  2  }}
        mov     outw2, outwa     {{  2  }}
        mov     outw3, outwa     {{  2  }}
        mov     outw4, outwa     {{  2  }}

        mov     outwd, outwa     {{  2  }}
        sub     outwd, outwn     {{  2  }}
        qdiv    outwd, #2        {{  2 (9)  }}
        NOP                      {{  OP-CODE SPACE 2 }}
        NOP                      {{  OP-CODE SPACE 2 }}
        NOP                      {{  OP-CODE SPACE 2 }}
        NOP                      {{  OP-CODE SPACE 2 }}
        NOP                      {{  OP-CODE SPACE 2 }}
        getqx   t1               {{  58 }}
        add     outw3, t1        {{  2  }}
        add     outw4, t1        {{  2  }}
        qdiv    outwd, #4        {{  2 (9)  }}
        NOP                      {{  OP-CODE SPACE 2 }}
        NOP                      {{  OP-CODE SPACE 2 }}
        NOP                      {{  OP-CODE SPACE 2 }}
        NOP                      {{  OP-CODE SPACE 2 }}
        NOP                      {{  OP-CODE SPACE 2 }}
        getqx   t1               {{  58 }}
        add     outw2, t1        {{  2  }}
        add     outw4, t1        {{  2  }}

        mov     outwa, outwn     {{  2  }}

For better understanding, I have not optimized the code.

evanh · 2021-03-30 10:43

Yes, the QDIV to GETQX is stall time, like a NOP, until result comes available.

@pic18f2550 said:
OK, now I've eaten it.
The cordic is like a coprocessor for the COG.
Therefore, it can perform calculations in parallel with the COG.

Yes.

Does each COG have its own cordic?

Not quite. It's one fully pipelined unit with 54 stages. By "fully" I mean it can be issued a new command on every clock cycle. Which means there is 54 individual stages of hardware in that unit, many of them are identical functions, that constitute the whole Cordic operation.

It is located in the Hub. Each Cog gets one-in-eight slots of the command issuing. Which has the effect of it being in all eight Cogs. This allows excellent performance in a compact footprint and still fits the deterministic symmetry of the Propeller.

P2 PASM SIN 16Bit

Comments