Is it possible to tighten this loop? — Parallax Forums

Andrey Demenev Posts: 377
edited 2011-06-09 23:34 in Propeller 1
I need to make an arbitrary waveform generator using an external DAC. The frequency value should be read from HUB RAM. Once the frequency is set to zero, generation should stop - not immediately, but as close as possible to the moment when the waveform crosses zero. The following code works just fine:
gen
:loop           rdword  OUTA, addr
                mov     addr, PHSA
                nop 'mov     OUTA, #0
                rdlong  freq, PAR               wz
                shr     addr, #19
                add     addr, base

                rdword  OUTA, addr  
                mov     addr, PHSA
                nop 'mov     OUTA, #0
                shr     addr, #19
                add     addr, base
    if_z        jmp     #fadeout
                nop

                rdword  OUTA, addr
                mov     addr, PHSA
                nop 'mov     OUTA, #0
                shr     addr, #19
                add     addr, base
                mov     FRQA, freq
                jmp     #:loop

fadeout         test    middle, OUTA             wc
:loop           rdword  OUTA, addr
                test    middle, OUTA             wz
                nop 'mov     OUTA, #0
                mov     addr, PHSA
                shr     addr, #19
                add     addr, base
    if_c_ne_z   jmp     #:loop
                mov     OUTA, middle
:wait           rdlong  freq, PAR               wz
    if_z        jmp     #:wait
                jmp     #gen


middle  long    $200
base    long    16384
pinmask long    $3FF

addr    res     1
sample  res     1
freq    res     1
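
The stop condition in the fadeout part of the listing (keep playing until the output passes through mid-scale) can be modelled outside the Propeller. This is a minimal Python sketch, not Propeller code. It assumes 10-bit samples whose mid-scale bit is $200, as in `middle long $200`; the function name and the sample data are hypothetical.

```python
MIDDLE = 0x200  # mid-scale bit of a 10-bit sample, as in `middle long $200`

def fadeout_stop_index(samples, start):
    """Return the index of the first sample after `start` whose mid-scale
    bit differs from that of samples[start], or None if there is none."""
    initial_half = samples[start] & MIDDLE
    for i in range(start + 1, len(samples)):
        if (samples[i] & MIDDLE) != initial_half:
            return i
    return None

# Hypothetical 10-bit ramp that rises through mid-scale (0x200) at index 4.
wave = [0x100, 0x140, 0x180, 0x1C0, 0x200, 0x240]
print(fadeout_stop_index(wave, 0))  # 4
```

In the listing the same comparison is done with the C flag (half of the output when fadeout starts) and the Z flag (half of each new sample).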

The problem is that the sampling frequency is not high enough. I would like to use another cog to double it. Obviously, the two cogs should be interleaved, and while one cog outputs a sample, the other should output zero, so they do not step on each other's toes. It may seem that there is room to do this (see the NOP'ed instructions), but the zeroing of OUTA occurs at the wrong time. One sampling period takes 32 cycles, and each cog should zero OUTA exactly 16 cycles after it outputs its sample. The difficulty is that I need to access HUB RAM twice per sampling period: first to read the sample value, and second to read the frequency value. Because I read the sample directly into OUTA, the frequency read occurs at the same moment the zeroing should happen. Reading the sample into a register could help, but I am out of cycles.
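
The timing budget in this paragraph can be checked with a little arithmetic. The sketch below uses Python purely as a calculator. The 80 MHz system clock is an assumption (it agrees with the 5 MHz two-cog figure reported later in the thread), and the function name is illustrative.

```python
CLOCK_HZ = 80_000_000      # ASSUMED system clock, not stated in the post
CYCLES_PER_SAMPLE = 32     # from the post: one sampling period per cog
HUB_WINDOW_CYCLES = 16     # a cog gets one hub access slot every 16 clocks

def sample_rate(clock_hz, cycles_per_sample, cogs=1):
    """Samples per second when `cogs` cogs are evenly interleaved."""
    return clock_hz * cogs / cycles_per_sample

single = sample_rate(CLOCK_HZ, CYCLES_PER_SAMPLE)
dual = sample_rate(CLOCK_HZ, CYCLES_PER_SAMPLE, cogs=2)
hub_slots_per_sample = CYCLES_PER_SAMPLE // HUB_WINDOW_CYCLES
zero_offset = CYCLES_PER_SAMPLE // 2   # when each cog must clear OUTA

print(single, dual)                       # 2500000.0 5000000.0
print(hub_slots_per_sample, zero_offset)  # 2 16
```

With only two hub slots per sample and both already used, the clear at cycle 16 collides with the second hub read, which is the conflict described above.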

I could save 4 cycles per sample if the waveform table started at address zero - then I would not need the add addr, base instruction. Unfortunately this does not seem possible (well, it IS possible, but in that case I would have to code everything in assembler and use very non-trivial programming techniques).
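
For reference, the address calculation being discussed (shr addr, #19 followed by add addr, base) can be written out as plain arithmetic. This is a Python model under the constants in the listing; the function names are hypothetical.

```python
BASE = 16384   # `base long 16384` in the listing
SHIFT = 19     # `shr addr, #19`

def sample_address(phs, base=BASE):
    """Table address for a 32-bit phase value: top 13 bits plus base."""
    offset = (phs & 0xFFFFFFFF) >> SHIFT   # 0..8191
    return base + offset

def sample_address_zero_base(phs):
    """Same lookup if the table started at address 0: no add needed."""
    return (phs & 0xFFFFFFFF) >> SHIFT

print(sample_address(0), sample_address(0xFFFFFFFF))  # 16384 24575
```

The add is one 4-cycle instruction per sample, which is where the 4-cycle saving comes from.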

Any advice?

Comments

  • Ale Posts: 2,363
    edited 2011-06-08 03:48
    The waveform can be at 0, no problem. And the trickery you refer to is trivial:

    Your waveform is in a DAT section, the first one in your main source file (so it will be placed at the lowest address). When initializing, you just move that block a few longs down so that it starts at zero :). Of course you lose the clock config long at address 0, so any code that depends on it should be rewritten :).
    CON
        _clkmode      = xtal1 + pll16x
        _xinfreq      = 5_000_000        ' 5 MHz crystal x 16 = 80 MHz system clock
    OBJ
    { objects here }
    DAT
    waveform long 0, 0, 0, 0
    
    DAT ' this DAT identifier can be skipped
      org
    ' assembly here
    
    PUB startall
      longmove(0, @waveform, length_waveform)   ' longmove(dest, src, count in longs)
    

    or something like that :)
  • Andrey Demenev Posts: 377
    edited 2011-06-08 04:03
    I do not think that would work. The lower addresses of RAM hold information about where objects' code is located - overwriting that area will break all spin code.
  • kuroneko Posts: 3,623
    edited 2011-06-09 00:08
    Reading from hub directly into outa - while convenient - is not going to fly as you found out already because you need the slot for reading the frqx update. I'm still suffering a bit from jetlag but this should do (despite adding some latency due to a slightly longer loop). It uses a pair of address pointers to compensate for the rdlong, code is untested and doesn't include the fadeout code. There are some limitations as to what the base address can be but that should only be a minor issue as you're using 16K already.
    DAT
    
    ' setup eins as initial sample address
    '               ...
    gen
    :loop
    ' eins: active
    ' zwei: usable (base)
                    rdword  temp, eins              '  +0 =
                    mov     outa, temp              '  +8   set
                    add     zwei, phsa              '  -4   base + offset (rotated)
                    rdlong  freq, par wz            '  +0 =
                    mov     outa, #0                '  +8   clr
                    ror     zwei, #19               '       sample address
    ' eins: dirty
    ' zwei: active
                    rdword  temp, zwei              '  +0 =
                    mov     outa, temp              '  +8   set
                    mov     zwei, phsa              '  -4   offset        (rotated)
                    add     zwei, base              '  +0 = offset + base (rotated)
                    ror     zwei, #19               '       sample address
                    mov     outa, #0                '  +8   clr
            if_z    jmp     #fadeout                '       exit
    ' eins: dirty
    ' zwei: active
                    rdword  temp, zwei              '  +0 =
                    mov     outa, temp              '  +8   set
                    mov     zwei, phsa              '  -4   offset        (rotated)
                    add     zwei, base              '  +0 = offset + base (rotated)
                    ror     zwei, #19               '       sample address
                    mov     outa, #0                '  +8   clr
                    mov     frqa, freq              '       update
    ' eins: dirty
    ' zwei: active
                    rdword  temp, zwei              '  +0 =
                    mov     outa, temp              '  +8   set
                    mov     zwei, phsa              '  -4   offset        (rotated)
                    add     zwei, base              '  +0 = offset + base (rotated)
                    ror     zwei, #19               '       sample address
                    mov     outa, #0                '  +8   clr
                    mov     eins, base              '       reset (base)
    ' eins: usable
    ' zwei: active
                    rdword  temp, zwei              '  +0 =
                    mov     outa, temp              '  +8   set
                    add     eins, phsa              '  -4   offset + base (rotated)
                    ror     eins, #19               '  +0 = sample address
                    mov     zwei, base              '       reset (base)
                    mov     outa, #0                '  +8   clr
    ' eins: active
    ' zwei: usable
                    jmp     #:loop
    
    fadeout         ' ...
    
    base            long    16384 <- 19
    eins            long    16384 <- 19
    zwei            long    16384 <- 19
    
    freq            res     1
    temp            res     1
    
    DAT
    
  • Andrey Demenev Posts: 377
    edited 2011-06-09 02:47
    Thanks, kuroneko! You're a genius! I have replaced mov outa, #0 with nop, and it works just fine on a single cog. Now I need to figure out how to write the code for the second cog. Should be easy - move all mov outa, XXXX one instruction down (the last mov outa, #0 goes to the beginning of the loop), and choose cogs with 12 cycles spacing between hub access windows.
  • kuroneko Posts: 3,623
    edited 2011-06-09 03:48
    Should be easy - move all mov outa, XXXX one instruction down (the last mov outa, #0 goes to the beginning of the loop), and choose cogs with 12 cycles spacing between hub access windows
    Or leave the code as is and simply pick a cog 8 cycles down the road (i.e. ID+4 mod 8). But I'm sure you'll find a way.

    Scratch that, I need more coffee. 10 days off the grid does that to you. 4/12 sounds fine.
  • Andrey Demenev Posts: 377
    edited 2011-06-09 04:36
    Yes, I am running this now on cogs 1 and 7 - works fine. So now I can make a 2-channel low frequency DDS with sampling frequency of 5 MHz. The final goal is a 1-channel synthesizer with 10 MHz sampling - but that is a different story.
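
The 4/12 spacing for cogs 1 and 7 follows from the hub rotation: each cog gets a slot every 16 clocks, and consecutive cog IDs are 2 clocks apart. A small Python model of that arithmetic (the function name is hypothetical):

```python
CLOCKS_PER_COG_SLOT = 2   # consecutive cog IDs are 2 clocks apart
HUB_PERIOD = 16           # each cog gets a hub slot every 16 clocks

def hub_spacing(cog_a, cog_b):
    """Clocks from cog_a's hub slot to the next hub slot of cog_b."""
    return ((cog_b - cog_a) * CLOCKS_PER_COG_SLOT) % HUB_PERIOD

print(hub_spacing(1, 7), hub_spacing(7, 1))   # 12 4
print(hub_spacing(0, 4))                      # 8

# 12 clocks from the hub window plus the one-instruction (4 clock) shift of
# the mov outa instructions gives half of the 32-cycle sample period.
interleave = hub_spacing(1, 7) + 4
```

The ID+4 choice gives 8 clocks, which would also need an 8-clock code shift to reach the 16-cycle interleave.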
  • kuroneko Posts: 3,623
    edited 2011-06-09 04:46
    Nice. Is that 16bit (rdword) resolution or less?
  • Andrey Demenev Posts: 377
    edited 2011-06-09 05:42
    In theory, it can be anything up to 16 bits. In practice, I think no more than 12 bits. The main reasons are:

    - at higher resolution, DACs are too expensive
    - it makes no sense to use higher resolution with 16 KiB of sample memory

    Also, I want 2 channels, so I think I will go with 10 bits, using 12-bit DACs (lower bits always zero, to minimize linearity errors). Otherwise I do not have enough pins: I need a rotary encoder, some keys, an input for the frequency counter, 2 outputs for DC offsets, an LCD, etc.

    If I wanted only sine, triangle and square, it would make more sense to use an integrated DDS circuit. But I want an AWG.
  • Andrey Demenev Posts: 377
    edited 2011-06-09 05:47
    Actually, in theory, it can be anything up to 32 bits :)
  • kuroneko Posts: 3,623
    edited 2011-06-09 05:47
    I see. Unfortunately that precludes use of the video h/w which could do the setting/clearing value quite easily (8 bits only). Unless you want to throw more cogs at it ... but maybe that's more trouble than it's worth.
  • Andrey Demenev Posts: 377
    edited 2011-06-09 05:57
    Using the video generator sounds attractive, but... there is always a but :) It would not bring any improvement unless it is used at frequencies higher than the Prop clock. But chaining two PLLs may make the output worse in terms of SNR. Also, a higher clock frequency requires a wider phase accumulator to achieve the same frequency resolution.
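
The last point is the usual phase-accumulator relation: output frequency is frq * clock / 2^N, so the smallest step is clock / 2^N. A short Python check, with 80 MHz and 160 MHz as assumed example clocks and hypothetical function names:

```python
def dds_resolution_hz(clock_hz, accumulator_bits):
    """Smallest frequency step of an N-bit phase accumulator."""
    return clock_hz / (1 << accumulator_bits)

def dds_output_hz(frq, clock_hz, accumulator_bits=32):
    """Output frequency for phase increment `frq` added on every clock."""
    return frq * clock_hz / (1 << accumulator_bits)

res_80 = dds_resolution_hz(80_000_000, 32)     # about 0.0186 Hz
res_160 = dds_resolution_hz(160_000_000, 32)   # twice as coarse
res_160_33 = dds_resolution_hz(160_000_000, 33)
print(res_160 / res_80, res_160_33 == res_80)  # 2.0 True
```

Doubling the clock costs one extra accumulator bit to keep the same step size.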
  • Andrey Demenev Posts: 377
    edited 2011-06-09 17:22
    Bad news. The code above only works when the lower 2 bits of the frequency value are both zero :(
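
One way to see how low-order bits can leak into the address with the rotated-base trick (add the phase to a pre-rotated base, then ror by 19) is to model it. This Python sketch rests on an assumption that is not stated in the thread: only the low 16 bits of the pointer are used as the hub address. It shows a condition on the low bits of the phase value. It does not by itself establish the exact frequency condition reported above, which also depends on when PHSA is sampled.

```python
MASK32 = 0xFFFFFFFF
BASE = 16384
SHIFT = 19

def rol32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32

def ror32(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32

BASE_ROTATED = rol32(BASE, SHIFT)   # `16384 <- 19` in the listing

def rotated_address(phs):
    """add zwei, phsa / ror zwei, #19, keeping the low 16 bits (ASSUMED)."""
    return ror32((BASE_ROTATED + phs) & MASK32, SHIFT) & 0xFFFF

def plain_address(phs):
    """shr addr, #19 / add addr, base."""
    return BASE + (phs >> SHIFT)

print(hex(rotated_address(0x12345678)), hex(plain_address(0x12345678)))  # 0x4246 0x4246
print(hex(rotated_address(0x12345679)))                                  # 0x6246
```

In this model the two calculations agree only when the low three bits of the phase value are zero; otherwise those bits rotate into address bits 13 to 15.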
  • kuroneko Posts: 3,623
    edited 2011-06-09 19:28
    DAT
    
    ' eins: current phsx value (slot -4)
    ' zwei: initial sample address
    '               ...
    gen
    :loop
    ' eins: primed[1]
    ' zwei: active
                    rdword  temp, zwei              '  +0 =
                    mov     outa, temp              '  +8   set
                    shr     eins, #19               '  -4   offset (based on phsx[0])
                    rdlong  freq, par wz            '  +0 =
                    mov     outa, #0                '  +8   clr
                    add     eins, base              '       sample address
    ' eins: active
    ' zwei: unused
                    rdword  temp, eins              '  +0 =
                    mov     outa, temp              '  +8   set
                    mov     zwei, phsa              '  -4   offset        (rotated)
                    shr     zwei, #19               '  +0 = offset
                    add     zwei, base              '       sample address
                    mov     outa, #0                '  +8   clr
            if_z    jmp     #fadeout                '       exit
    ' eins: unused
    ' zwei: active
                    rdword  temp, zwei              '  +0 =
                    mov     outa, temp              '  +8   set
                    mov     zwei, phsa              '  -4   offset        (rotated)
                    shr     zwei, #19               '  +0 = offset
                    add     zwei, base              '       sample address
                    mov     outa, #0                '  +8   clr
                    mov     frqa, freq              '       update
    ' eins: unused
    ' zwei: active
                    rdword  temp, zwei              '  +0 =
                    mov     outa, temp              '  +8   set
                    mov     zwei, phsa              '  -4   offset        (rotated)
                    mov     eins, zwei              '  +0 = phsx[-128]
                    shr     zwei, #19               '       offset
                    mov     outa, #0                '  +8   clr
                    add     zwei, base              '       sample address
    ' eins: primed[0]
    ' zwei: active
                    rdword  temp, zwei              '  +0 =
                    mov     outa, temp              '  +8   set
                    mov     zwei, phsa              '  -4   offset        (rotated)
                    shr     zwei, #19               '  +0 = offset
                    add     zwei, base              '       sample address
                    mov     outa, #0                '  +8   clr
                    shl     freq, #7                '       frqx*128
    ' eins: primed[0]
    ' zwei: active
                    rdword  temp, zwei              '  +0 =
                    mov     outa, temp              '  +8   set
                    mov     zwei, phsa              '  -4   offset        (rotated)
                    shr     zwei, #19               '  +0 = offset
                    add     zwei, base              '       sample address
                    mov     outa, #0                '  +8   clr
                    add     eins, freq              '       phsx[-128] + frqx*128 = phsx[0]
    ' eins: primed[1]
    ' zwei: active
                    rdword  temp, zwei              '  +0 =
                    mov     outa, temp              '  +8   set
                    mov     zwei, phsa              '  -4   offset        (rotated)
                    shr     zwei, #19               '  +0 = offset
                    add     zwei, base              '       sample address
                    mov     outa, #0                '  +8   clr
    ' eins: primed[1]
    ' zwei: active
                    jmp     #:loop
    
    fadeout         ' ...
    
    base            long    16384
    eins            res     1
    zwei            res     1
    
    freq            res     1
    temp            res     1
    
    DAT
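
The comment phsx[-128] + frqx*128 = phsx[0] in the listing relies on the counter adding FRQ to PHS on every clock, so a phase value captured earlier can be projected forward with a shift and an add. The Python model below checks only that identity (modulo 2^32, with a constant frequency). Whether 128 is the right number of cycles for this particular loop is a timing question it does not answer; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def phs_after(phs, frq, cycles):
    """Hardware accumulator model: PHS += FRQ on every clock, modulo 2**32."""
    for _ in range(cycles):
        phs = (phs + frq) & MASK32
    return phs

def predict(phs_old, frq, log2_cycles=7):
    """mov eins, zwei / shl freq, #7 / add eins, freq."""
    return (phs_old + ((frq << log2_cycles) & MASK32)) & MASK32

print(phs_after(0x12345678, 53687, 128) == predict(0x12345678, 53687))  # True
```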
    
  • Andrey Demenev Posts: 377
    edited 2011-06-09 22:57
    This code has another problem: on the second cog, PHSA will be sampled at the wrong moments, 4 cycles too early, leading to unevenly spaced samples (20-12-20-12 instead of 16-16-16-16). I think I will change to a software phase accumulator, so the exact moments when the phase value is used are not so critical. The plan is to partially prepare 16 samples, without the base address added. This gives enough time to also zero the output, check for the exit condition, account for 16 additional phase increments, and update the frequency. Then, in the output loop, I have 2 instructions every 16 cycles to add base to the address and output the sample. The order of these 2 will differ between the cogs, producing a 4-cycle offset, and another 12 cycles will be added by the hub access window.
  • kuroneko Posts: 3,623
    edited 2011-06-09 23:34
    ... I will change to a software phase accumulator, so the exact moments when the phase value is used are not so critical ...
    Sounds familiar. I had similar issues when writing soft-duty counters (16/8/4 cycle pulses). It may feel odd (compared to using h/w) but gives you more freedom in the way you can (re)arrange your code. So if nothing else I/we had some form of brain exercise :)
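
A rough model of the software phase accumulator idea discussed in the last two posts, written in Python rather than PASM. It is one possible reading of the plan, not the code that was eventually used: each cog steps its own accumulator by 32 clocks' worth of phase, the second cog starts 16 clocks later, and base is added only at output time. All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SHIFT = 19                # same table indexing as `shr addr, #19`
BASE = 16384              # `base long 16384`
COG_PERIOD = 32           # cycles between samples from one cog
INTERLEAVE = 16           # second cog runs half a period later

def cog_addresses(phase0, frq, delay_cycles, count):
    """Table addresses one cog outputs, using a software phase accumulator."""
    phase = (phase0 + frq * delay_cycles) & MASK32
    step = (frq * COG_PERIOD) & MASK32
    addresses = []
    for _ in range(count):
        addresses.append(BASE + (phase >> SHIFT))   # base added at output time
        phase = (phase + step) & MASK32
    return addresses

def merged_addresses(phase0, frq, count_per_cog):
    """Interleave cog A (delay 0) and cog B (delay 16 cycles)."""
    a = cog_addresses(phase0, frq, 0, count_per_cog)
    b = cog_addresses(phase0, frq, INTERLEAVE, count_per_cog)
    merged = []
    for sample_a, sample_b in zip(a, b):
        merged.extend([sample_a, sample_b])
    return merged

def reference_addresses(phase0, frq, count):
    """A single accumulator stepped every 16 cycles, for comparison."""
    return [BASE + (((phase0 + frq * INTERLEAVE * n) & MASK32) >> SHIFT)
            for n in range(count)]

print(merged_addresses(0, 123457, 8) == reference_addresses(0, 123457, 16))  # True
```

Because the phase is computed rather than sampled from PHSA, the result no longer depends on the exact cycle at which each instruction runs.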