
# P2 PASM SIN 16Bit

• Posts: 10,802

I multiplied freql, not clkfreq. But dividing clkfreq is also an option.

```
MOV pa, clkfreq
SHR pa, #8     'divide by 256
QFRAC freql, pa
GETQX f1
```
• Posts: 128
edited 2021-03-26 13:45

I must have been blind to the obvious.

Here, a clock of 268,437,500 Hz would give a larger calculation error than 256,000,000 Hz.

Thank you

• Posts: 10,802

The assembly code below is the same as `f1 long round(freql * 65536.0 * 65536.0 * 256.0 / float(clkfreq_))`

```
MOV pa, freql
SHL pa, #8     'PA = freql * 256
QFRAC pa, clkfreq
GETQX f1     'f1 = freql * 256 * $1_0000_0000 / clkfreq
```
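To see how the two variants (scale `freql` up, or scale `clkfreq` down) differ numerically, here is a rough Python model. It assumes QFRAC followed by GETQX yields the truncated quotient of `(D << 32) / S`; the function names are made up for illustration and are not part of any P2 toolchain.

```python
MASK32 = 0xFFFF_FFFF

def qfrac(d: int, s: int) -> int:
    """Model of QFRAC + GETQX: truncated quotient of (d << 32) / s."""
    if not (0 <= d < s <= MASK32):
        raise ValueError("need 0 <= d < s so the quotient fits in 32 bits")
    return (d << 32) // s

def step_scale_freq(freql: int, clkfreq: int) -> int:
    """Variant from the post above: multiply freql by 256 first."""
    return qfrac(freql << 8, clkfreq)

def step_scale_clk(freql: int, clkfreq: int) -> int:
    """Earlier variant: divide clkfreq by 256 (low 8 bits are dropped)."""
    return qfrac(freql, clkfreq >> 8)

if __name__ == "__main__":
    for clk in (256_000_000, 268_437_500):
        print(clk, step_scale_freq(1000, clk), step_scale_clk(1000, clk))
```

When `clkfreq` is an exact multiple of 256 both variants agree. Otherwise the shifted-down clock loses its low bits and the result drifts, which fits the remark about 268,437,500 Hz versus 256,000,000 Hz.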
• Posts: 128

Calculating the sine curve is all well and good, but it takes more than 60 clock cycles.

I would rather create a sine table in hub RAM.
That is definitely faster.

That way I can calculate the frequency more easily. The missing values in between are simply interpolated.

• Posts: 10,802

No problem. Good exercise anyway. Chip's Spiral Demo is a good one to have a look at for seeing the Cordic in heavy use.

• Posts: 128
edited 2021-03-27 00:15

Yes, I want to watch the demo too.
But I'm still missing the DVI adapter; I still have to build it.
You can fool the eye, but the ear doesn't miss a thing.
I have implemented 5 oscillators so far.
Two of them have an extra role (LFO).
The 1st generates an amplitude modulation on 3..5.
The 2nd generates a frequency modulation on 3..5.

If everything works out, I will be able to add 5 more.
And all this in one COG.
Then there will be 5 such COGs working in the Propeller.
50 oscillators - a great idea


• Posts: 10,802
• Posts: 128

In my attempts, I have come across something that I can't explain.
A wrong frequency is generated here.

```
dac2
{{ 1. Compute sample segment and amplitude }}
qrotate am1, ph1
getqy   outw
bitnot  outw, #15
```

The correct frequency is generated here.

```
dac2
{{ 1. Compute sample segment and amplitude }}
nop                 '<--- for whatever reason, getqy is wrong without this
qrotate am1, ph1
getqy   outw
bitnot  outw, #15
```

fr1 and fr2 are apparently not the same either, since a frequency error also occurs here.

```
CON
K1              = 38400

PUB main()
fr1 := (12345) / 30
fr2 := (12345.00) / 30.00
```
• Posts: 10,802

The NOP is not needed for what's there. So something else, not listed, is the problem.

fr2 is an IEEE float. It isn't usable without conversion using the constant ROUND(), i.e. at compile time only. It is not usable at run time unless you know how to decode a float, bit by bit.
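As a rough illustration of why the raw float is useless as an integer, the Python sketch below prints the 32-bit IEEE 754 pattern of 12345.0 / 30.0 next to the integer result, and shows a minimal bit-by-bit decode. The helper names are hypothetical, and the decode only handles positive, normal values.

```python
import struct

def float_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def decode_positive_float(bits: int) -> int:
    """Truncate a positive, normal single-precision pattern to an integer."""
    exponent = ((bits >> 23) & 0xFF) - 127
    mantissa = (bits & 0x7F_FFFF) | 0x80_0000   # restore the implicit leading 1
    if exponent < 0:
        return 0
    if exponent <= 23:
        return mantissa >> (23 - exponent)
    return mantissa << (exponent - 23)

fr1 = 12345 // 30                        # integer division, as for fr1
fr2_bits = float_bits(12345.0 / 30.0)    # what a float constant stores

if __name__ == "__main__":
    print(fr1, hex(fr2_bits), decode_positive_float(fr2_bits))
```

Treated as an integer, the float pattern is a huge number that has nothing to do with 411, which would explain a wildly wrong frequency.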

• Posts: 128

I have now replaced the wait instruction "waitse1" with "waitct1".
I let the DAC run freely at 1 M samples/s.
With "waitse1" I can reduce the sample rate to 250 k samples/s and get the computing time I need.
I don't need the NOP anymore.

I now do the calculation with constants; rounding errors are easier to eliminate that way.

• Posts: 10,802

Cleaned up the constant floats for you (without a "." they are integers):

```
K10kHz          = 163840000.0
K1kHz           = 16384000.0
K100Hz          = 1638400.0
K10Hz           = 163840.0
K1Hz            = 16384.0
KHz1            = 1638.4
KHz10           = 163.84
fr1_1       = round( (2.0 * K10kHz) + (0.0 * K1kHz) + (0.0 * K100Hz) + (0.0 * K10Hz) + (0.0 * K1Hz) )
```
```
  fr1 := fr1_1
```
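The decade constants above can be checked with a small Python model. The dictionary layout and the `phase_step` helper are illustrative assumptions; only the constant values come from the listing. Note that Python's `round()` rounds halves to even, which may differ from Spin2's ROUND() for values ending in exactly .5.

```python
# Scale factor per decade (Hz weight -> step per unit), copied from the listing.
K = {
    10_000: 163840000.0,
    1_000: 16384000.0,
    100: 1638400.0,
    10: 163840.0,
    1: 16384.0,
}

def phase_step(digits: dict) -> int:
    """Sum digit * decade constant and round once at the end."""
    return round(sum(digit * K[weight] for weight, digit in digits.items()))

if __name__ == "__main__":
    # 20 kHz, as in fr1_1: two units of 10 kHz and zeros elsewhere.
    print(phase_step({10_000: 2, 1_000: 0, 100: 0, 10: 0, 1: 0}))
```

Rounding only once, after the floating-point sum, keeps the per-digit rounding errors from accumulating.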
• Posts: 10,802

Best not to use ASMCLK in mixed Spin/PASM code. It's for pure PASM code only. Just let Spin handle the sysclock setup.

• Posts: 10,802
edited 2021-03-28 01:26

The "dacs" value isn't suitable for PWM dither either. I'd ditch the variable and just use an immediate `WXPIN #256,#pin1`

EDIT: The small value in "dacs" will be why the WAITSE1 wasn't good.

• Posts: 10,802
edited 2021-03-28 01:46

Pin mode value commented:

```
dac     long    %0000_0000_000_101_00_00000000_01_00011_0   'P_DAC_990R_3V dac mode + DAC_DITHER_PWM smartpin mode
```
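To make the bit groups in that mode word easier to read, here is a small Python sketch that splits it into fields. The field names and widths follow my understanding of the P2 pin mode layout (%AAAA_BBBB_FFF_MMMMMMMMMMMMM_TT_SSSSS_0) and should be treated as assumptions, not as an official definition.

```python
# The mode word from the listing, written with the same grouping.
DAC_MODE = 0b0000_0000_000_101_00_00000000_01_00011_0

def decode_pin_mode(mode: int) -> dict:
    """Split a 32-bit pin mode word into its assumed fields."""
    return {
        "a_input":    (mode >> 28) & 0xF,     # AAAA
        "b_input":    (mode >> 24) & 0xF,     # BBBB
        "filter":     (mode >> 21) & 0x7,     # FFF
        "pin_mode":   (mode >> 8) & 0x1FFF,   # M (13 bits), %101_00_... here
        "pin_ctrl":   (mode >> 6) & 0x3,      # TT
        "smart_mode": (mode >> 1) & 0x1F,     # SSSSS, %00011 here
    }

if __name__ == "__main__":
    print(hex(DAC_MODE), decode_pin_mode(DAC_MODE))
```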
• Posts: 128
edited 2021-03-28 14:34

I have taken out ASMCLK; it had no negative effects.

• Posts: 128
edited 2021-03-28 15:00

Scope image of a 20 kHz signal at a 250 k samples/s rate.
With the help of linear interpolation, I want to calculate the 3 missing intermediate values.
That gets me to 1 M samples/s, and the steps become smaller.

Here is the sum signal of 4 oscillators.

Next come square, triangle and PWM.
And everything must be based on the same values as the sine.

• Posts: 10,802
edited 2021-03-28 21:46

Nice.

But "dacs" needs to be a multiple of 256 if you want the PWM dither to work. And then you can correctly go back to using WAITSE1.
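A quick way to sanity-check a candidate "dacs" value is sketched below in Python. It assumes the value is the sample period in sysclocks and that the system clock is 256 MHz (a figure mentioned earlier in the thread); both helper names are made up.

```python
def dither_period_ok(period: int) -> bool:
    """PWM dither needs a sample period that is a positive multiple of 256."""
    return period > 0 and period % 256 == 0

def sample_rate(clkfreq: int, period: int) -> float:
    """Samples per second for a given sample period in sysclocks."""
    return clkfreq / period

if __name__ == "__main__":
    CLKFREQ = 256_000_000  # assumed system clock
    for period in (100, 256, 1024):
        print(period, dither_period_ok(period), sample_rate(CLKFREQ, period))
```

Under those assumptions a period of 256 gives 1 M samples/s and 1024 gives 250 k samples/s, matching the rates quoted above.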

• Posts: 128

dacs is now fixed at 256.
Anything else makes no sense in this project.
I still have to come up with something for the other signal types. Tables are unsuitable here.
I'm racking my brains over the interpolation, but I haven't found a useful solution yet.
I will probably reduce to 200 k samples/s.
That will make the calculation easier: divide by 2 and 4, plus simple addition.

• Posts: 10,802

Cordic can go a lot faster for prebuilding multiple points. You just have to stagger the calculations to use the pipeline: up to one command per 8 clocks. And if that is still not enough, the next level is to get other cogs to help by using their share of the Cordic. It can, in theory, achieve a command per clock.

• Posts: 128
edited 2021-03-29 10:56

Let's see if that works too.

```
        {{ linear interpolation }}
mov     outwd, outwa     {{  2  }}
sub     outwd, outwn     {{  2  }}
qdiv    outwd, #3        {{  9  }}
getqx   outwd
mov     outw1, outwa     {{  2  }}
bitnot  outw1, #15
mov     outw2, outwa     {{  2  }}
add     outw2, outwd     {{  2  }}
bitnot  outw2, #15
mov     outw3, outwa     {{  2  }}
add     outw3, outwd     {{  2  }}
add     outw3, outwd     {{  2  }}
bitnot  outw3, #15
mov     outw4, outwa     {{  2  }}
add     outw4, outwd     {{  2  }}
add     outw4, outwd     {{  2  }}
add     outw4, outwd     {{  2  }}
bitnot  outw4, #15
mov     outwa, outwn     {{  2  }}
```
• Posts: 10,802

Here's a part of Chip's Spiral Demo. It demonstrates how to fill the Cordic pipeline:

```
.pixels     qvector .x,.y   '0 in       'do overlapped QVECTOR ops for 16 pixels
qvector .x,.y
qvector .x,.y
qvector .x,.y
qvector .x,.y
qvector .x,.y
qvector .x,.y
qvector .x,.y
getqx   .px+0   '0 out
getqy   .py+0
qvector .x,.y
getqx   .px+1   '1 out
getqy   .py+1
qvector .x,.y
getqx   .px+2   '2 out
getqy   .py+2
...
...
```
• Posts: 10,802

Note how results have to be collected too. Two GETQ's, an ADD, and QVECTOR is 8 sysclocks, so couldn't issue commands faster even if the pipeline could go faster.
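A back-of-the-envelope model of what the overlap buys, in Python. The 8-clock issue slot comes from the discussion above; the 58-clock latency is only the worst-case GETQX figure quoted in code comments in this thread, so treat both as assumptions rather than exact hardware timing.

```python
def clocks_sequential(n_ops: int, latency: int = 58) -> int:
    """Each command waits for its own result before the next one is issued."""
    return n_ops * latency

def clocks_pipelined(n_ops: int, latency: int = 58, slot: int = 8) -> int:
    """Commands are issued every `slot` clocks; results stream out behind them."""
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1) * slot

if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, clocks_sequential(n), clocks_pipelined(n))
```

For 16 points this rough model gives 928 clocks when done one at a time against 178 clocks when overlapped.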

• Posts: 128
edited 2021-03-29 10:55

If everything works out, I'll get there with 35 sysclocks.
The "bitnot" instructions already belong to the signal conversion for the PWM.

• Posts: 10,802

I've shaved 14 clocks from your above code:

```
        {{ linear interpolation }}
mov     outwd, outwa     {{  2  }}
sub     outwd, outwn     {{  2  }}
qdiv    outwd, #3        {{  9  }}

mov     outw1, outwa     {{  2  }}
bitnot  outw1, #15
mov     outw2, outwa     {{  2  }}
mov     outw3, outwa     {{  2  }}
mov     outw4, outwa     {{  2  }}
bitnot  outw4, #15
mov     outwa, outwn     {{  2  }}

getqx   outwd
add     outw2, outwd     {{  2  }}
bitnot  outw2, #15
add     outw3, outwd     {{  2  }}
add     outw3, outwd     {{  2  }}
bitnot  outw3, #15
add     outw4, outwd     {{  2  }}
add     outw4, outwd     {{  2  }}
add     outw4, outwd     {{  2  }}
```
• Posts: 128

How do you get 14?

```
        {{ linear interpolation }}
mov     outwd, outwa     {{  2  }}
sub     outwd, outwn     {{  2  }}

mov     outw1, outwa     {{  2  }}
mov     outw2, outwa     {{  2  }}
mov     outw3, outwa     {{  2  }}
mov     outw4, outwa     {{  2  }}

shr     outwd, #1        {{  2  }}{{ DIV outwd/2 }}
add     outw3, outwd     {{  2  }}
add     outw4, outwd     {{  2  }}
shr     outwd, #1        {{  2  }}{{ DIV outwd/2 }}
add     outw2, outwd     {{  2  }}
add     outw4, outwd     {{  2  }}

mov     outwa, outwn     {{  2  }}
```

{{ clock cycles from Propeller 2 Rev B/C PASM Instructions }}
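The shift-based variant can be modelled in a few lines of Python to check the resulting values. This sketch takes the difference as next minus current and uses arithmetic shifts, which is an assumption on my part: the PASM listing subtracts in the other order and SHR is a logical shift, so signs need separate care on the real hardware.

```python
def interpolate4(current: int, nxt: int) -> list[int]:
    """Return the current sample plus 3 intermediate points toward nxt."""
    d = nxt - current
    half = d >> 1        # d / 2
    quarter = d >> 2     # d / 4
    return [
        current,                   # outw1
        current + quarter,         # outw2 = a + d/4
        current + half,            # outw3 = a + d/2
        current + half + quarter,  # outw4 = a + 3d/4
    ]

if __name__ == "__main__":
    print(interpolate4(0, 400))
    print(interpolate4(400, 0))
```

Only shifts and additions are involved, so no Cordic division is needed for a 4x interpolation.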

• Posts: 10,802

Seven instructions between QDIV and GETQX.

• Posts: 128

But 7 more instructions are still needed around it to turn one value into 4.

• Posts: 10,802
edited 2021-03-30 04:17

I just took your prior listing and optimised around the cordic's QDIV. Making it 14 clocks faster by not having to twiddle thumbs for so long. Nothing else.
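The 14-clock figure can be reproduced with a tiny Python model: GETQX only stalls for whatever part of the Cordic latency has not already been covered by other instructions. The 2 clocks per instruction and the 58-clock latency are taken from the comments in the listings in this thread and are assumptions here.

```python
def getqx_stall(filler_instr: int, latency: int = 58, clocks_per_instr: int = 2) -> int:
    """Clocks GETQX still has to wait after the filler instructions have run."""
    return max(0, latency - filler_instr * clocks_per_instr)

def clocks_saved(filler_instr: int, latency: int = 58, clocks_per_instr: int = 2) -> int:
    """Stall time removed by moving useful work between QDIV and GETQX."""
    return getqx_stall(0, latency, clocks_per_instr) - getqx_stall(
        filler_instr, latency, clocks_per_instr
    )

if __name__ == "__main__":
    print(clocks_saved(7))   # seven 2-clock instructions moved into the gap
```

Moving more instructions into the gap keeps paying off until the whole latency is covered; beyond that the result is simply ready when GETQX executes.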

• Posts: 128
edited 2021-03-30 09:00

OK, now I get it.
The cordic is like a coprocessor for the COG.
Therefore, it can perform calculations in parallel with the COG.

Does each COG have its own cordic?

```
{{ ==== 1. linear interpolation }}
mov     outw1, outwa     {{  2  }}
mov     outw2, outwa     {{  2  }}
mov     outw3, outwa     {{  2  }}
mov     outw4, outwa     {{  2  }}

mov     outwd, outwa     {{  2  }}
sub     outwd, outwn     {{  2  }}
qdiv    outwd, #2        {{  2 (9)  }}
NOP                      {{  OP-CODE SPACE 2 }}
NOP                      {{  OP-CODE SPACE 2 }}
NOP                      {{  OP-CODE SPACE 2 }}
NOP                      {{  OP-CODE SPACE 2 }}
NOP                      {{  OP-CODE SPACE 2 }}
getqx   t1               {{  58 }}
add     outw3, t1        {{  2  }}
add     outw4, t1        {{  2  }}
qdiv    outwd, #4        {{  2 (9)  }}
NOP                      {{  OP-CODE SPACE 2 }}
NOP                      {{  OP-CODE SPACE 2 }}
NOP                      {{  OP-CODE SPACE 2 }}
NOP                      {{  OP-CODE SPACE 2 }}
NOP                      {{  OP-CODE SPACE 2 }}
getqx   t1               {{  58 }}
add     outw2, t1        {{  2  }}
add     outw4, t1        {{  2  }}

mov     outwa, outwn     {{  2  }}
```

For better understanding, I have not optimized the code.

• Posts: 10,802

Yes, the QDIV to GETQX is stall time, like a NOP, until result comes available.

@pic18f2550 said:
> OK, now I get it.
> The cordic is like a coprocessor for the COG.
> Therefore, it can perform calculations in parallel with the COG.

Yes.

> Does each COG have its own cordic?

Not quite. It's one fully pipelined unit with 54 stages. By "fully" I mean it can be issued a new command on every clock cycle. That means there are 54 individual hardware stages in that unit, many of them identical in function, which together make up the whole Cordic operation.

It is located in the Hub. Each Cog gets a one-in-eight slot for issuing commands, which has the effect of the Cordic being present in all eight Cogs. This allows excellent performance in a compact footprint and still fits the deterministic symmetry of the Propeller.