Signal processing on Propeller, need help with assembly routines

sylvain.azarian · 2009-08-30 18:04

Hello all,
I am a beginner in Propeller ASM and I am trying to write something like a downsampling routine. My code has one cog sampling and filling hub RAM with 12-bit samples at 72 kHz. A second cog reads the samples, and I need a basic average that produces one output sample for every 6 input samples.

With RDWORD I can retrieve the samples, but I do not know how to write 6 values into an array stored in cog RAM and then loop over it to do some math.
I've been through deSilva's tutorial on assembly, but his example ex07c with ADDS sum, 0-0 looks a bit tricky...

Mainly my question is: how can I store and access local arrays in cog RAM? In effect, I need indirect addressing...

thanks a lot
Sylvain Azarian

SRLM · 2009-08-30 18:20

You can use self-modifying code. Specifically, instructions like MOVS and MOVD let you modify the source or destination field of another instruction. You may also want to look at the MCP3208 fast 3 chip ADC code in the obex. It allows averaging of 1 to 16 samples. I suspect that you could almost cut and paste the code.
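
To make the MOVS/MOVD idea concrete, here is a rough C model (not PASM) of what the two instructions do to a 32-bit Propeller instruction word, where the source field sits in bits 8..0 and the destination field in bits 17..9. The helper names `movs_model` and `movd_model` are invented for this sketch and are not part of any Propeller tool.

```c
#include <stdint.h>

/* Propeller 1 instruction word layout: bits 8..0 hold the source field,
   bits 17..9 hold the destination field. */
#define FIELD_MASK 0x1FFu
#define DEST_SHIFT 9

/* Hypothetical helper: models MOVS, which replaces the 9-bit source field
   and leaves every other bit of the instruction untouched. */
static uint32_t movs_model(uint32_t instr, uint32_t value)
{
    return (instr & ~FIELD_MASK) | (value & FIELD_MASK);
}

/* Hypothetical helper: models MOVD, which replaces the 9-bit destination
   field and leaves every other bit of the instruction untouched. */
static uint32_t movd_model(uint32_t instr, uint32_t value)
{
    return (instr & ~(FIELD_MASK << DEST_SHIFT))
         | ((value & FIELD_MASK) << DEST_SHIFT);
}
```

Because a cog register address is 9 bits wide, patching one of these fields with the address of an array element is what stands in for indirect addressing.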

sylvain.azarian · 2009-08-30 20:00

Thanks for the sample. Self-modifying code does make the code quite hard to re-read, though.

Looking forward to a next-gen Propeller with an embedded multiply...

SRLM · 2009-08-30 20:21

For self-modifying code, just write 0-0 as a placeholder in the source or destination field of the instruction that gets modified. That signals to the reader that the field is filled in at run time.

CounterRotatingProps · 2009-08-30 23:01

Hi Sylvain,

welcome to the Forums!

SRLM> modified code with 0-0

At the risk of thread drift - that token is as old as dirt ... anyone know where it came from --- IBM BAL 360 or earlier even?

- Howard


pgbpsu · 2009-08-31 14:53

@Sylvain-

Welcome to the forums. The suggestions to use self-modifying code are good ones, but I wonder if that's necessary. If you are simply trying to average samples and get a single result why not do that in the same cog that is acquiring the data? Maybe you have something else going on that you didn't mention. I'm doing that very thing and I can give you a PASM example if you would like it.

I've used self-modifying code but found it hard to get my head around. I try to use the simplest possible solution for my problems and I think averaging (or at the very least summing) in the acquisition cog would be simpler and less HUB intensive than writing all the values to HUB to be averaged by another cog.

Let me know if you'd like an example. I'm collecting 24-bit samples at 80 kHz and averaging down to 10 kHz or all the way down to 625 sps. The 32-bit size of a long allows that without overflow. Doing it for 12 bits gives you lots of room for summing!
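
The headroom argument can be checked with a small C sketch (not the PASM offered above). It averages groups of 6 samples, as asked in the first post, and computes the worst-case sum for a given sample width. The function names are invented here, and the samples are assumed to be unsigned 12-bit values stored in 16-bit words.

```c
#include <stdint.h>
#include <stddef.h>

#define DECIMATION 6   /* one output sample per 6 input samples */

/* Hypothetical helper: averages each group of DECIMATION samples.
   Returns the number of output samples written; leftover input is ignored. */
static size_t decimate_by_average(const uint16_t *in, size_t n_in, uint16_t *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i + DECIMATION <= n_in; i += DECIMATION) {
        uint32_t sum = 0;                 /* 6 * 4095 = 24570, far below 2^32 */
        for (size_t k = 0; k < DECIMATION; k++)
            sum += in[i + k] & 0x0FFFu;   /* keep only the 12 sample bits */
        out[n_out++] = (uint16_t)(sum / DECIMATION);   /* truncating divide */
    }
    return n_out;
}

/* Worst-case sum of `count` unsigned samples that are `bits` bits wide. */
static uint64_t worst_case_sum(unsigned bits, uint64_t count)
{
    return ((UINT64_C(1) << bits) - 1) * count;
}
```

Going from 80 kHz to 625 sps means summing 128 samples. `worst_case_sum(24, 128)` is 2147483520, which still fits below 2^31, so a 32-bit long is enough.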

Regards,
Peter

Agent420 · 2009-08-31 15:00

Regarding self-modifying code for an indexed loop, I found this snippet in one of Chip's VGA drivers that I am hacking; it is a clever example of how to accomplish this. The base address is moved into the destination field in the first lines, and then incremented on each pass by adding the d0 constant.

This is one of those areas where previous 8-bit experience relied on index registers to accomplish the task... it requires a change in mindset for the Prop.

' Build cursor pixels
                        mov     par,#32                 'ready for 32 cursor segments
                        movd    :pix_l,#cpix_l
                        movd    :pix_r,#cpix_r
                                    
:pixloop                rdlong  cseg_l,cshape_l         'custom
                        add     cshape_l,#4
                        rdlong  cseg_r,cshape_r
                        add     cshape_r,#4
                        
:pix_l                  mov     cpix_l,cseg_l           'save segment into pixels
                        add     :pix_l,d0
:pix_r                  mov     cpix_r,cseg_r           'save segment into pixels
                        add     :pix_r,d0
                        
                        djnz    par,#:pixloop           'another segment?
 
' Data
d0s0                    long    1 << 9 + 1              'd and s field increments
d0                      long    1 << 9                  'd field increment
d1                      long    2 << 9                  'd field double increment
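
For readers more at home in C, the `:pix_l` pattern can be modelled roughly as follows. This is only a sketch: `cog` stands in for the 512 longs of cog RAM, `instr` stands in for the patched `mov` instruction, and `indexed_store` is an invented name.

```c
#include <stdint.h>

#define COG_LONGS 512          /* a cog has 512 longs of RAM */
#define D0 (1u << 9)           /* same value as the d0 constant in the PASM */

/* Extract the 9-bit destination field (bits 17..9) of an instruction word. */
static uint32_t dest_field(uint32_t instr)
{
    return (instr >> 9) & 0x1FFu;
}

/* Hypothetical model of the :pix_l pattern: store `count` values into cog RAM
   starting at `base`, stepping the destination field by adding D0 each pass. */
static void indexed_store(uint32_t cog[COG_LONGS], uint32_t base,
                          const uint32_t *values, uint32_t count)
{
    uint32_t instr = 0;                       /* stands in for "mov 0-0, cseg_l" */
    instr = (instr & ~(0x1FFu << 9)) | ((base & 0x1FFu) << 9);   /* movd */
    for (uint32_t i = 0; i < count; i++) {
        cog[dest_field(instr)] = values[i];   /* the patched mov executes */
        instr += D0;                          /* add :pix_l, d0 */
    }
}
```

On the real chip, at least one other instruction has to execute between patching an instruction and running it. The loop in the snippet satisfies that because each `add` is followed by other instructions before the patched `mov` runs again.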


sylvain.azarian · 2009-08-31 19:05

Thanks for all these examples. I like the vga one

it is quite similar to what I need

About the advice to do that in the sampling cog: in fact I need the data in two flavors, and the averaging is really a decimation filter that runs in its own cog.

I have:
- one task sampling two analog channels at full speed (2 channels at 75 ksamples/sec, 12 bits) and storing the data in a hub RAM block "A"
- a second task doing exactly the same and filling a second block "B" once "A" is full
- while B is being filled, a task reads the samples from A, decimates them (low-pass filtering), and stores the results in buffer "C" in a fixed-point format (Q1.15)
- the same with the roles swapped when A is being filled and B is full
- another task doing a slow average and some data processing on blocks "A" and "B" to adjust the gain of the A/D converters (that is why I need the averaging to be done elsewhere)
- another task managing the display
- a last task doing a Hilbert transform on the C buffers and sending the output to a D/A converter

Currently my main problem is understanding the addressing subtleties of fixed-point multiplication in Propeller assembly.
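
As a reference point for the Q1.15 format mentioned above, here is a minimal C sketch of conversion and multiplication. It illustrates the number format only and says nothing about how the PASM would address its operands. The type and function names are invented, and the code assumes the compiler performs an arithmetic right shift on negative values, which is the common behaviour but is implementation-defined in C.

```c
#include <stdint.h>

typedef int16_t q15_t;   /* Q1.15: 1 sign bit, 15 fractional bits, range [-1, 1) */

/* Hypothetical helper: convert a real value to Q1.15 with rounding
   and saturation at the ends of the range. */
static q15_t q15_from_double(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0) return 32767;
    if (scaled <= -32768.0) return -32768;
    return (q15_t)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}

/* Multiply two Q1.15 values: the 16x16 product is Q2.30 in 32 bits,
   so it is rounded and shifted right by 15 to return to Q1.15. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* Q2.30 */
    product += 1 << 14;                          /* round to nearest */
    product >>= 15;                              /* back to Q1.15 (arithmetic shift assumed) */
    if (product > 32767) product = 32767;        /* only -1 * -1 can overflow */
    return (q15_t)product;
}
```

For instance, 0.5 is stored as 16384, and `q15_mul(16384, 16384)` gives 8192, which is 0.25. Values outside [-1, 1) need a format with more integer bits.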

Thanks all for your advice. I hope my project will be published soon... it aims to be a "Propeller defined radio". But "do not sell the bear skin before you have killed it" (poor translation from French). The road is still long before I can hear the first "noise" from the receiver.

sylvain

sylvain.azarian · 2009-08-31 19:45

As a complement... here is exactly the C code I need to implement :

/* Digital filter designed by mkfilter/mkshape/gencode   A.J. Fisher
   Command line: /www/usr/fisher/helpers/mkfilter -Bu -Lp -o 6 -a 1.6666666667e-01 0.0000000000e+00 -l */

#define NZEROS 6
#define NPOLES 6
#define GAIN   2.319917755e+02

static float xv[NZEROS+1], yv[NPOLES+1];

static void filterloop()
  { for (;;)
      { xv[0] = xv[1]; xv[1] = xv[2]; xv[2] = xv[3]; xv[3] = xv[4]; xv[4] = xv[5]; xv[5] = xv[6]; 
        xv[6] = next input value / GAIN;
        yv[0] = yv[1]; yv[1] = yv[2]; yv[2] = yv[3]; yv[3] = yv[4]; yv[4] = yv[5]; yv[5] = yv[6]; 
        yv[6] =   (xv[0] + xv[6]) + 6 * (xv[1] + xv[5]) + 15 * (xv[2] + xv[4])
                     + 20 * xv[3]
                     + ( -0.0135636846 * yv[0]) + (  0.1354403429 * yv[1])
                     + ( -0.5962625798 * yv[2]) + (  1.4692829541 * yv[3])
                     + ( -2.2523796180 * yv[4]) + (  1.9816107366 * yv[5]);
        next output value = yv[6];
      }
  }
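
The generated code above is a template ("next input value" and "next output value" are placeholders). A compilable variant could look like the sketch below. It assumes the usual mkfilter index pattern (`xv[0]..xv[6]`, `yv[0]..yv[6]`) and wraps the loop body in a per-sample function. The names `lowpass_state` and `lowpass_step` are invented here.

```c
#define NZEROS 6
#define NPOLES 6
#define GAIN   2.319917755e+02f

/* Filter state: delay lines for inputs (xv) and outputs (yv).
   Zero-initialise it before the first call. */
typedef struct {
    float xv[NZEROS + 1];
    float yv[NPOLES + 1];
} lowpass_state;

/* Hypothetical wrapper: pushes one input sample, returns one output sample. */
static float lowpass_step(lowpass_state *s, float input)
{
    for (int i = 0; i < NZEROS; i++) s->xv[i] = s->xv[i + 1];
    s->xv[6] = input / GAIN;
    for (int i = 0; i < NPOLES; i++) s->yv[i] = s->yv[i + 1];
    s->yv[6] = (s->xv[0] + s->xv[6]) + 6 * (s->xv[1] + s->xv[5])
             + 15 * (s->xv[2] + s->xv[4]) + 20 * s->xv[3]
             + (-0.0135636846f * s->yv[0]) + ( 0.1354403429f * s->yv[1])
             + (-0.5962625798f * s->yv[2]) + ( 1.4692829541f * s->yv[3])
             + (-2.2523796180f * s->yv[4]) + ( 1.9816107366f * s->yv[5]);
    return s->yv[6];
}
```

The input coefficients (1, 6, 15, 20, 15, 6, 1) sum to 64, and 64 / GAIN matches one minus the sum of the feedback coefficients, so a constant input settles to the same constant at the output.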

sylvain

ericball · 2009-09-02 17:27

Your biggest challenge is going to be performing the multiplications fast enough. The inner loop of a multiply is 4 instructions (Shift, Add, Shift, Jump) per bit, and each instruction takes 4 clock cycles, so that's 16*32 clock cycles per full 32-bit multiply. (Yes, it is possible that fewer than 32 reps will be required, but that's the worst case.) You've got 7 full multiplies in your routine (yv[0-5] scaling plus GAIN), so that's 3584 cycles, or 22k samples per second at 80 MHz. So that's off by a factor of about 3.5 per channel. Now, if you can fudge the scaling factors so they can be done as a small number of shifts and adds (like the xv[0-6] scaling), then you may be able to process the data fast enough.
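
The cost estimate and the shift-and-add idea can be sketched in C as follows. This is an illustration of the technique, not the actual PASM multiply loop, and all function names are invented.

```c
#include <stdint.h>

/* Classic shift-and-add multiply: one pass per multiplier bit. */
static uint64_t shift_add_mul(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    uint64_t addend = a;
    while (b != 0) {
        if (b & 1u) acc += addend;  /* add when the current multiplier bit is set */
        addend <<= 1;               /* shift the multiplicand left */
        b >>= 1;                    /* shift the multiplier right */
    }
    return acc;
}

/* The small integer factors of the xv terms as shifts and adds only. */
static uint32_t times6(uint32_t x)  { return (x << 2) + (x << 1); }
static uint32_t times15(uint32_t x) { return (x << 4) - x; }
static uint32_t times20(uint32_t x) { return (x << 4) + (x << 2); }

/* Samples per second if each sample costs `multiplies` worst-case multiplies. */
static uint32_t max_sample_rate(uint32_t clock_hz, uint32_t cycles_per_bit,
                                uint32_t bits, uint32_t multiplies)
{
    return clock_hz / (cycles_per_bit * bits * multiplies);
}
```

With the figures from the post, `max_sample_rate(80000000, 16, 32, 7)` gives 22321, which is the "22k samples per second" quoted above.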

I also wonder why you propose having multiple buffers. The filter cog is a consumer/producer. It should be able to wait on a lock to get the next sample, process the sample, then write out the output value and kick the lock so the next stage can handle it. No buffers required. Buffers are only required if the processing performed by the stage varies enough so some samples require more time to process than others, but the overall average is less than the sample interval.


sylvain.azarian · 2009-09-02 19:12

Hello,

Thanks for your advice regarding multiplication. I agree that it is going to be tricky...
About multiple buffering: the ADCs are AD7823s from Analog Devices, with a serial clock and a "s/h" pin. I need quite precise timing, and currently the sampling cog does more or less nothing but sample and do a wrlong at the end of the cycle. There is not much room left for extra computation or queue management.
Each of the 2 sampling cogs fills 1024 words of data to make sure nothing is lost; acquisition runs at around 72000 samples/sec on the two AD7823s.
I use a "flip/flop" with 1 semaphore for cog synchronization: cog 1 filters while cog 2 acquires, then they swap, so cog 1 acquires while cog 2 filters, and so on...
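
The "flip/flop" scheme can be sketched in C as a pair of blocks whose roles swap each time one fills up. This single-threaded model only shows the bookkeeping. On the Propeller the hand-over between cogs would go through the semaphore mentioned above, which is left out here, and the names `pingpong` and `pingpong_push` are invented.

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_WORDS 1024   /* block size quoted in the post */

typedef struct {
    uint16_t block[2][BLOCK_WORDS]; /* the two hub RAM blocks, "A" and "B" */
    int fill_index;                 /* block currently being filled (0 or 1) */
    size_t fill_pos;                /* next free word in that block */
} pingpong;

/* Store one sample. Returns the index of a block that just became full and
   may now be filtered, or -1 if the current block still has room. */
static int pingpong_push(pingpong *p, uint16_t sample)
{
    p->block[p->fill_index][p->fill_pos++] = sample;
    if (p->fill_pos < BLOCK_WORDS)
        return -1;
    int full = p->fill_index;
    p->fill_index ^= 1;   /* swap roles: start filling the other block */
    p->fill_pos = 0;
    return full;
}
```

The filtering side has one full block period, 1024 sample intervals, to finish with the block it was handed before that block is reused.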

re-edit:
To produce audible sound on this software defined radio system, the DSP chain is quite complex. My idea is to use several cogs to split up the math operations so that all of it can be done.
For readers who want to know what "software defined radio" (aka "SDR") is, google "Software defined radio for the masses", a very good PDF article from QST magazine.

sylvain

Post Edited (sylvain.azarian) : 9/2/2009 7:18:30 PM GMT
