I2S read to 8-bit write. I need a little help.

pgbpsu · 2007-09-02 03:33

I've written my first bit of Propeller assembly code to read an I2S-type ADC. My code works, but not at the speed I need. I was hoping some of the more clever, more experienced programmers here could help me tweak it to get to that speed.

I have two questions regarding my first propeller project:
1. How can I speed my code up so I can read a single channel of data?
2. How difficult is it to move from 1 channel to 5?

The ADC I'm reading (AD1871) puts out a bit clock (BCLK = 6.144 MHz), a left-right clock (LRCLK = 96 kHz) and data (DATA). The data are I2S-justified, meaning that during the low part of LRCLK the MSB of my 24-bit Lsample comes on the second rising edge of BCLK. The next 23 rising edges of BCLK carry the rest of my Lsample. The situation is similar for the right channel: the second rising edge of BCLK after a rising edge on LRCLK carries the MSB of my Rsample, and the next 23 rising edges of BCLK carry valid data. Each half of the LRCLK period (L and R) spans 32 BCLK cycles, of which only 24 carry valid data.

So far I've only been able to read data when the ADC LRCLK is running at 80 kHz (BCLK = 5.12 MHz). The engineer who built the eval board for me smartly put in a switchable master clock (24 MHz or 20 MHz). While LRCLK is low, I read my 24 bits and immediately turn around and write them out in 8-bit chunks to a Linux box connected to the Propeller. To be sure that data are transferred correctly between the Propeller and the Linux box, I've ginned up a bit of handshaking. The Propeller starts with the FRAME pin high, indicating data are NOT valid. Once all 24 bits have been collected from the ADC, I put the lowest 8 onto my Propeller output pins and clear the FRAME pin. The Linux box sees this and reads the valid data. After reading the data it sets the RX_OK line high. When the Propeller sees this, it raises the FRAME pin and clears the data. The FRAME pin has to stay high long enough for the Linux box to recognize it and clear RX_OK. This completes one 8-bit transfer. The Propeller then moves to the next 8-bit chunk. It cycles through this 4 times before returning to read the next Lsample.

My current design only reads 25 bits of the left channel and writes them out in 8-bit chunks during the remaining 39 BCLK cycles. This seems the most basic place to start. When this is working I can tackle the problem of reading multiple channels simultaneously. Maybe this is where other cogs would be useful, but that's still beyond me. My problem is that the bit loop at the top of the code, which reads the data from the ADC, can't keep up with 96 kHz data. It does keep up with 80 kHz data. How can I speed this up? I should point out that for this to work at 80 kHz there must be 3 instructions and then the DJNZ command. As you can see in the code below, I've got an AND(4), RCL(4), and NOP(4) (or MUX(4) for testing). This is the only way I can get this loop to work. If I take the NOP (or MUX) out, things fall apart. I've verified, using an oscilloscope, that the 24-bit data word coming from the ADC matches what the Propeller gets. On the scope, the Propeller word takes less total time but it's all there. The Propeller seems to be running a bit faster than the incoming data, but they are close enough that the Propeller doesn't skip any bits. I'm clearly wasting time in this loop just to keep things "synced". There must be a way to rewrite this so I can get data at 6.144 MHz.

Suggestions on how to tweak this loop, or how to expand this code from 1 channel to 5 would be greatly appreciated.

Thanks,
Peter

{{LCD_8_.spin
 This program reads 24 bit data from the AD1871 and
 writes it out to be read by the TS7260 on its LCD Header

 INPUTS:
 LRCLK - 96 kHz clock; left and right channels on low or high
 BCLK - 6.144 MHz clock; data are valid on rising edge
 DATA - 6.144 MHz nominal; data for BCLK
 RX_OK - when this line goes high the TS7260 has received the data

 OUTPUTS:
 DATA7-DATA0
 FRAME - line that goes low when output data are valid
 }}

CON

  _CLKMODE = XTAL1 + PLL16X
  _XINFREQ = 5_000_000
VAR
  long  Lsamp
  
PUB Main
{Launch cog to read I2S data}

  cognew(@i2s, @Lsamp)               'Launch new cog

DAT
{Read data from the AD1871 and pass it on to the TS7260}
              ORG       0                       'Begin at cog RAM addr 0

i2s           mov       dira, DIRA_DEFN         ' Set up output pins; all other pins stay inputs

' BEGIN READ/WRITE LOOP
' A BIT OF SETUP AT THE BEGINNING OF EACH READ/WRITE CYCLE
:readloop     mov       outa, FRAME             ' Set FRAME to indicate data are no longer valid; clear data pins
              mov       nbits, #25              ' Number of iterations: first bit is junk, next 24 are good data
              mov       Lsamp_, #0              ' Clear Lsamp_ variable
              waitpeq   LRCLK, LRCLK            ' Wait for LRCLK to go high
              waitpne   LRCLK, LRCLK            ' Wait for LRCLK to go low: now synced with LRCLK, so we
                                                ' are sure to start reading at the first BCLK in this LRCLK
              xor       outa, SHADOW            ' Toggle SHADOW at LRCLK rate
' START READING 25 BITS FROM ADC, STORE IN Lsamp_
:bitloop      and       DATA, ina nr, wc        ' AND ina with the DATA mask, result not written (nr);
                                                ' C = parity of the result, i.e. the state of the DATA pin
              rcl       Lsamp_, #1              ' Shift C left into Lsamp_
'              muxc      outa, SHADOW            ' For testing - echo
              nop                               ' Needed to keep loop synced
              djnz      nbits, #:bitloop        ' Decrement loop count; loop until all bits are read
              
' FINISHED READING INPUT DATA

' START WRITING AS 8-BIT CHUNKS
' get first group of 8 bits onto the data pins
              mov       t1, nibble1             ' Move 0x0000_00FF into t1
              and       t1, Lsamp_              ' Extract bits 7-0 of Lsamp_
              shl       t1, #16                 ' Shift result into bits 23-16
              mov       outa, t1                ' Clear FRAME and write DATA
              waitpeq   RX_OK, RX_OK            ' Wait for RX_OK to go HIGH
              mov       outa, FRAME             ' Set FRAME and clear data
              waitpne   RX_OK, RX_OK            ' Wait for RX_OK to go LOW

' get second group of 8 bits onto the data pins
              mov       t1, nibble2             ' Move 0x0000_FF00 into t1
              and       t1, Lsamp_              ' Extract bits 15-8 of Lsamp_
              shl       t1, #8                  ' Shift result into bits 23-16
              mov       outa, t1                ' Clear FRAME and write DATA
              waitpeq   RX_OK, RX_OK            ' Wait for RX_OK to go HIGH
              mov       outa, FRAME             ' Set FRAME and clear data
              waitpne   RX_OK, RX_OK            ' Wait for RX_OK to go LOW

' get third group of 8 bits onto the data pins
              mov       t1, nibble3             ' Move 0x00FF_0000 into t1
              and       t1, Lsamp_              ' Extract bits 23-16 of Lsamp_
              shr       t1, #0                  ' Already in bits 23-16; shift by 0 kept for symmetry
              mov       outa, t1                ' Clear FRAME and write DATA
              waitpeq   RX_OK, RX_OK            ' Wait for RX_OK to go HIGH
              mov       outa, FRAME             ' Set FRAME and clear data
              waitpne   RX_OK, RX_OK            ' Wait for RX_OK to go LOW

' get upper group of 8 bits onto the data pins
              mov       t1, nibble4             ' Move 0xFF00_0000 into t1
              and       t1, Lsamp_              ' Extract bits 31-24 of Lsamp_
              shr       t1, #8                  ' Shift result into bits 23-16
              mov       outa, t1                ' Clear FRAME and write DATA
              waitpeq   RX_OK, RX_OK            ' Wait for RX_OK to go HIGH
              mov       outa, FRAME             ' Set FRAME and clear data
              waitpne   RX_OK, RX_OK            ' Wait for RX_OK to go LOW

' FINISHED WRITING

'              mov      Lsamp_, cnt
'              xor       outa, SHADOW       

              jmp       #:readloop            ' jump back to beginning
' DONE READING AND WRITING THIS SAMPLE JUMP BACK TO START AND DO ALL OVER
' AGAIN
              
' outa bits which are active outputs
DIRA_DEFN     long      %0000_0000_1111_1111_1000_0000_0000_0000 
'PIN ASSIGNMENTS
DATA7         long      |< 23                   'TS LCD_7 (out) = pin 23
DATA6         long      |< 22                   'TS LCD_6 (out) = pin 22
DATA5         long      |< 21                   'TS LCD_5 (out) = pin 21
DATA4         long      |< 20                   'TS LCD_4 (out) = pin 20
DATA3         long      |< 19                   'TS LCD_3 (out) = pin 19
DATA2         long      |< 18                   'TS LCD_2 (out) = pin 18
DATA1         long      |< 17                   'TS LCD_1 (out) = pin 17
DATA0         long      |< 16                   'TS LCD_0 (out) = pin 16
DATABITS      long      %1111_1111 << 16        'All eight data pins (23..16); OR-ing the labels would combine addresses, not masks
FRAME         long      |< 15                   'TS FRAME (out) = pin 15
RX_OK         long      |< 14                   'TS RX_OK (in)  = pin 14
DATA          long      %0000_0100              'ADC DATA (in)  = pin 2
BCLK          long      %0000_0010              'ADC BCLK (in)  = pin 1
LRCLK         long      %0000_0001              'ADC LRCLK (in) = pin 0

SHADOW        long      %0000_0000_0000_0000_0001_0000_0000_0000    'P12 will shadow the input data

'VARIABLES
MSB           long      %10000000_00000000_00000000_00000000

DataWord      long      %0000_0001_1111_1111_0110_1111_1001_0111  
                        '0x1FF6F97 = 33517463            
'DataWord      long      %0000_0001_0000_0000_0000_0000_0011_0111  
                        '0x1000037 = 55 after removing parity bits            
'DataWord      long      %1111_1111_1111_1111_1111_1111_1111_1111              
nibble1       long      %0000_0000_0000_0000_0000_0000_1111_1111
nibble2       long      %0000_0000_0000_0000_1111_1111_0000_0000
nibble3       long      %0000_0000_1111_1111_0000_0000_0000_0000
nibble4       long      %1111_1111_0000_0000_0000_0000_0000_0000

nbits         res       1
average       res       1
total         res       1
nibbles       res       1
Lsamp_        res       1
DataValue     res       1
t1            res       1

deSilva · 2007-09-02 09:14

Peter:
it is always a pleasure to look at a well structured program - especially if it basically works and just needs some "tuning".
You use some quite clever tricks, and I must say I doubt this is your first assembly program.

Wrt timing: The ADC delivers one bit every 162 ns (6.144 MHz BCLK) or 195 ns (5.12 MHz BCLK), and each one has to be accepted in that time.
A COG instruction takes 4 x 12.5 ns = 50 ns,
so you have 3 instructions per bit for the fast case and "nearly" 4 for the "slow" case. Bingo!

You now have to look carefully at where the sample strobe falls relative to the stable part of each bit.
The difference between 195 ns and 200 ns is just small enough that you can catch 30 bits, but by then you are at the very end of the bit window...
Synchronizing to 162 ns with 3 instructions = 150 ns is not possible: you can read around 10 to 15 bits and are then out of sync.
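
The two cases can be checked with a little arithmetic. This is only a sketch: it assumes an 80 MHz system clock (5 MHz crystal with PLL16X, as in Peter's CON block), 4 clocks per instruction, and the exact BCLK values from Peter's post.

```latex
\begin{align*}
T_{\text{instr}} &= \frac{4}{80\ \text{MHz}} = 50\ \text{ns} \\[4pt]
\text{fast case:}\quad T_{\text{bit}} &= \frac{1}{6.144\ \text{MHz}} \approx 162.8\ \text{ns},
  \qquad \Delta = T_{\text{bit}} - 3\,T_{\text{instr}} \approx 12.8\ \text{ns per bit},
  \qquad \frac{T_{\text{bit}}}{\Delta} \approx 12.8\ \text{bits} \\[4pt]
\text{slow case:}\quad T_{\text{bit}} &= \frac{1}{5.12\ \text{MHz}} \approx 195.3\ \text{ns},
  \qquad \Delta = 4\,T_{\text{instr}} - T_{\text{bit}} \approx 4.7\ \text{ns per bit},
  \qquad \frac{T_{\text{bit}}}{\Delta} \approx 41.7\ \text{bits}
\end{align*}
```

So a three-instruction loop slips by a whole bit after roughly 13 bits, while the four-instruction loop needs about 42 bits to slip that far. How many bits are really usable depends on where in the bit window the first sample lands.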

This is a very demanding application!

I see two ways forward:
(1) Use a slower (!) crystal, so that 3 instructions run in 160 ns rather than 150 ns.
(Around 15/16 x 5 MHz = 4.7 MHz; I think even a 4.433 MHz PAL crystal will do.)

(2) Use 3 COGs: two COGs run the data acquisition, one taking the even bits, the other the odd bits.
That gives each of them 320 ns per bit, which you can sync to EXACTLY with WAITCNT.
It also gives you additional headroom for changes in the timing of the ADC.
You then write the results to the HUB and let the third COG do all the bookkeeping, bit merging and Linux comms.

(3) Edit2 -> see below!!

This is exactly what the Prop is made for. An excellent example I should say!
Much luck!

Edit:
Wrt your second question, "5 channels":
First, you will need a separate COG to do the Linux comms anyhow, so my proposal above is not work in vain.
Second, you now have only 4 COGs left, giving 3 channels only.
Using the "slow crystal" trick, however, gives you enough COGs for 5 channels.

As the communication to the host is much slower than the data acquisition, many samples will get lost...
I think this has to be controlled in an orderly way somehow...
But if this is not so important, you can use just ONE dedicated communication cell in the HUB for each channel.
There is no need for LOCKs or semaphores in this situation.
However, before sending a LONG byte by byte you should save a copy; otherwise a COG might change the value while you are underway. Reading or writing a LONG is an indivisible operation.
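
A minimal sketch of that "save a copy" step for the comms COG. It assumes the COG was started with PAR pointing at the channel's LONG in the HUB (a hypothetical layout), and it reuses the FRAME and RX_OK masks and the P16..P23 data pins from Peter's listing. The names send, sample and nbytes are illustrative only.

```spin
' Comms COG fragment: take ONE snapshot of the shared LONG, then send the snapshot.
send          rdlong    sample, par             ' A single hub read fetches all 32 bits together
              mov       nbytes, #4              ' Four bytes per LONG, lowest byte first
:nextbyte     mov       t1, sample
              and       t1, #$FF                ' Keep the lowest remaining byte
              shl       t1, #16                 ' Align with data pins P16..P23
              mov       outa, t1                ' FRAME low, data valid
              waitpeq   RX_OK, RX_OK            ' Wait for the host to acknowledge
              mov       outa, FRAME             ' FRAME high, data cleared
              waitpne   RX_OK, RX_OK            ' Wait for the host to release RX_OK
              shr       sample, #8              ' Bring the next byte down to bits 7..0
              djnz      nbytes, #:nextbyte
              jmp       #send                   ' Fetch a fresh snapshot for the next transfer

' Declare with the other RES variables at the end of the DAT block:
sample        res       1                       ' Private copy; acquisition COGs cannot touch it
t1            res       1
nbytes        res       1
```

Because the acquisition COG may overwrite the hub LONG at any time, all four bytes are taken from the private copy, so they always belong to the same sample.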

Edit2:
(3a) I just devised a third solution. Being faster is always better than being slower.

How can you profit?
The answer is "loop unrolling": write down all 24 fetches = 72 instructions

  IN
  SHIFT
  NOP

This will easily fit into the COG, and copy and paste will help.

Now you have 150 ns per sample, so you have to pad carefully for the best fit to 162 ns:
Insert a second NOP in (about) every fifth pack.

This really looks like an "easy win"
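
A minimal sketch of the unrolled read, meant to drop in where :bitloop and its DJNZ sit in Peter's listing and reusing its DATA mask and Lsamp_ register. At 80 MHz one 6.144 MHz BCLK period is 80 / 6.144 = about 13.02 clocks, while a fetch/shift/NOP pack is 12 clocks. A pad NOP after every fourth pack therefore averages 13 clocks per bit, and a pad after every fifth pack averages 12.8. Which spacing works best, and where the first fetch should sit relative to the LRCLK edge, are assumptions to check on the scope.

```spin
' Unrolled bit read (fragment). Each pack is 3 instructions = 12 clocks = 150 ns.
' One extra NOP per four packs stretches them to 52 clocks, i.e. 13 clocks
' (162.5 ns) per bit on average.

              test      DATA, ina wc            ' Pack 1: C := state of the DATA pin
              rcl       Lsamp_, #1              ' Shift C into bit 0 of Lsamp_
              nop

              test      DATA, ina wc            ' Pack 2
              rcl       Lsamp_, #1
              nop

              test      DATA, ina wc            ' Pack 3
              rcl       Lsamp_, #1
              nop

              test      DATA, ina wc            ' Pack 4
              rcl       Lsamp_, #1
              nop
              nop                               ' Pad: lets BCLK catch up

' ...repeat this four-pack pattern until all bits of the word have been read...
```

TEST is used here as the equivalent of Peter's "and DATA, ina nr, wc": it ANDs without writing the result, and with a one-bit mask the C flag ends up equal to the pin state.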

(3b) If you shy away from the "mass of unrolled code" - and as you have the NOP to play with - you can still use DJNZ with 5 runs. Add 4 of these loops one after another.
But now it becomes tricky.

Jumps not taken take 8 ticks rather than 4, so you spend exactly the needed extra time between the "loop packs"!
Everything would look perfect except for the poor loop variable.
All right, there is one way out: use 4 different loop variables, each preset to #5 at the beginning.
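
A sketch of the back-to-back DJNZ blocks, again reusing DATA and Lsamp_ from Peter's listing. The counter names n1 and n2 are illustrative. With the 4-tick/8-tick DJNZ timing mentioned above, a block of 5 runs takes 4 x 12 + 16 = 64 clocks, or 12.8 clocks (160 ns) per bit. As an aside that is not part of the original suggestion, blocks of 4 runs would take 52 clocks, or 13 clocks per bit, so the run length may be worth experimenting with.

```spin
' Before the timed section (for example before waiting on LRCLK): preset the counters.
              mov       n1, #5
              mov       n2, #5                  ' ...one counter per block

' Timed section. Each run is 12 clocks while DJNZ jumps back and 16 clocks on
' the last run, where the jump is not taken. That extra 4 clocks is the "pad".
:block1       test      DATA, ina wc            ' C := state of the DATA pin
              rcl       Lsamp_, #1              ' Shift C into Lsamp_
              djnz      n1, #:block1            ' 4 clocks if taken, 8 on fall-through
:block2       test      DATA, ina wc
              rcl       Lsamp_, #1
              djnz      n2, #:block2
' ...further blocks follow in the same way, each with its own counter...

' Declare with the other RES variables at the end of the DAT block:
n1            res       1
n2            res       1
```

Separate counters avoid a MOV between blocks, which would add 4 clocks to every block boundary.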

Really, I like this project!

Post Edited (deSilva) : 9/2/2007 11:23:38 AM GMT

deSilva · 2007-09-02 10:45

I just had a look at the datasheet: A nice chip! Quanta costa?

When you also connect the BCLK line to the Prop, a clocked approach can be considered as well
(in contrast to the more or less async approaches above).

Unrolling your loop according to solution 3a, you have 62 ns to spend after the fetch and the shift, filled by one or two NOPs.
You could replace the NOP(s) with a WAITPEQ instruction that waits for the rising BCLK.

However....
Looking into the (Propeller) data sheet, you find that wait instructions take 5+ ticks.
This is a real pity, as 5 ticks are 62.5 ns.

It looks risky, but one can try... In any case you will not lose anything, as WAITPEQ does NOT wait for
the rising edge, but falls through when the signal is already high (or whatever).
In that case, however, it will be no better than the async approach.
But note that it will automatically handle the "low speed" case!!
Obviously it will no longer work below 3.2 MHz.

I wanted to mention this technique however, as it is the cleanest one available - when it fits!
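
A sketch of one clocked pack, with WAITPEQ taking the place of the NOP. It reuses the BCLK and DATA masks and the Lsamp_ register from Peter's listing. Using the 5-tick minimum quoted above, the pack needs at least 5 + 4 + 4 = 13 clocks against roughly 13.02 clocks per bit at 6.144 MHz, so there is essentially no margin. The actual WAITPEQ timing should be confirmed against the current Propeller documentation.

```spin
' One pack of the unrolled, BCLK-clocked read (fragment).
              waitpeq   BCLK, BCLK              ' Continue once BCLK reads high (level, not edge)
              test      DATA, ina wc            ' C := state of the DATA pin
              rcl       Lsamp_, #1              ' Shift C into Lsamp_
' ...repeat the pack once per bit...
```

Since WAITPEQ tests a level, it only resynchronizes when it reaches the instruction while BCLK is still low. If BCLK is already high it falls straight through.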

Post Edited (deSilva) : 9/2/2007 11:22:53 AM GMT

Fred Hawkins · 2007-09-02 11:26

Why not use modulator mode so you can get 4 bits at a time?

pgbpsu · 2007-09-02 17:21

deSilva-

Thanks so much for your quick and helpful replies. I'm very impressed at how quickly you grasped my problem.

You've explained the timing problem much better than I could.

Regarding Fred's suggestion of using the Modulator mode: we'd really like to stick with the I2S mode of the ADC because it is the default mode for that chip. Any other mode requires us to program the chip before we start reading data. That's not a big deal, but we are trying to avoid unnecessary communication. If I'm unable to read these data with the Propeller, we'll switch to an FPGA. In fact we have a grad student writing Verilog code for just this purpose. But I find the Propeller much more interesting and more accessible. I don't know the first thing about programming an FPGA, but with no previous experience and only a bit of C background, I was able to put together this program, which almost does what we want.

Moving to a slower ADC speed is not desirable. The AD1871 is designed to run at 6.144 MHz. Besides, that kind of sidesteps the problem ;) It seems like some code to read 96 kHz I2S data would be of use to others. When I get that part of this working I'll be sure to post it.

deSilva, I think you've found the solution. Simply unroll the bitloop. It may not be as elegant as looping, but I believe it will get the job done. My one-track mind saw the need to read 25 bits and immediately thought loop. Strange how one gets tunnel vision like that. That's the first thing I'll try when I get the new ADC board back from the engineer. As you mention, some well-placed NOPs or WAITs will help put the Propeller back in sync with incoming data. I'm not worried about the slower case. We will be running the ADC flat out. In fact, we plan to average 16 incoming samples into one, then pass that averaged value out to the Linux box. In the upper 8 bits of each output sample I need to add a parity bit for each 8-bit word, and a 2-bit channel code. The fifth channel will simply be a 24-bit counter so I can keep track of missing or dropped samples. In the end we'll have 5 channels at 96 kHz, each averaged down to 12 kHz (shift right 8). These 5 channels of 32-bit, 12 kHz data will be passed out to the Linux box. My guess is that the Linux box (a 200 MHz ARM-based board) will struggle to keep up. If that's the case I'll have to find a way to pass 16 bits at a time or move to some kind of buffering between the two. Can't say I'm looking forward to that.
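
A rough sketch of the 16-sample averaging step, under several assumptions: the samples are 24-bit two's complement, each one is read into Lsamp_ as in the listing above (25 bits, the first one junk, Lsamp_ cleared beforehand), and total is the otherwise unused RES variable from that listing. The counter navg is an illustrative name. Averaging 16 samples gives 96 / 16 = 6 kHz and a right shift of 4; the 12 kHz and "shift right 8" figures above do not match a 16-sample average, so this follows the 16-sample description only.

```spin
' Fragment: accumulate 16 samples of one channel, then divide the sum by 16.
              mov       total, #0
              mov       navg, #16
:accum
              ' ...read one sample into Lsamp_ here (25 bits, first bit junk)...
              shl       Lsamp_, #8              ' Drop the junk bit, move sample bit 23 up to bit 31
              sar       Lsamp_, #8              ' Shift back arithmetically: sign-extended 24-bit value
              adds      total, Lsamp_           ' Sixteen 24-bit values fit easily in 32 bits
              djnz      navg, #:accum
              sar       total, #4               ' Divide by 16; arithmetic shift keeps the sign

' Declare with the other RES variables at the end of the DAT block:
navg          res       1
```

The parity bit and channel code are left out here because their bit positions in the output word are not specified.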

I'll keep you posted. Thanks for all the feed back.

Oh yeah, the AD1871 runs about $5 (which gets you 2 channels) in quantity 100. I think that's not a bad buy.

Regards,
Peter

Fred Hawkins · 2007-09-02 19:55

Food for thought: would this project be feasible without the linux box?

pgbpsu · 2007-09-03 04:59

Hi Fred-

I wish, but I don't see how. The Linux box is currently how we write these data to SD card (something I think we could do with only the Propeller), downsample them and send the downsampled version over wireless Ethernet (>5 Mbit/s) to a base station. Since the Linux box is there, we are also using it to add some headers to these data files, record GPS data, and start and stop acquisition. The overall idea here is to build 10 or so small datalogging boxes with these ADCs and enough brains that we can record data on site and control them from a remote location. We'll distribute them across about 1 square km and they will all talk with the base station via the wireless.

The Linux box can also run C programs, which is what we are most familiar with, and provides a lot of flexibility. However, the unpredictable timing of the Linux system is starting to give us trouble. For example, I ran the above code with the bitloop commented out. I simply placed the value of CNT into Lsamp and handed that out. I trust that the Propeller clock runs at a constant 80 MHz. When I put the CNT values back together on the Linux box, it's clear that the time it takes to pass data between the Propeller and the TS varies depending on what the Linux system wants to do. If the kernel runs off to do something else, our transfers suffer and we end up dropping samples, which is not what we want. If I could get something as deterministic as the Propeller I'd be delighted.

A while back we looked at a serial to ethernet device but never really played with it. Maybe we need to go back and look at that.

Do you have something in mind?

Regards,
Peter

Fred Hawkins · 2007-09-03 08:26

Peter,

Truthfully I was fishing. deSilva's unrolled loop offers some hope -- all those NOPs. Maybe an SD card with four parallel lines -- they are rated to 25 MB* per second, and twice as fast in "high speed mode". (Physical Layer Simplified Specification Version 2.00 from the SD Card Association -- simplified means no timing diagrams, and nearly useless.)

*But: a SanDisk doc says 12.5 MB per second. They also say the max clock is 25 MHz, as do the association's docs. (SanDisk does show timing details: Product Manual Version 1.9, Document No. 80-13-00169; search for this name: ProdManualSDCardv1.9.pdf at Sandisk.com)

Fred

deSilva · 2007-09-03 18:45

I think there is a basic misunderstanding here. The mass of data (24 bit @ 6 MHz) is most likely a consequence of the generic delta-sigma process ("oversampling"). It seems there is no decimator used in the ADC, so this must be done in the receiving controller.

Note that there can never be any information beyond the recorded bandwidth; with audio this is around 48 kHz, ultrasonic can be considerably higher.
So it makes no sense to transmit anything beyond 96 kHz (= 300 kByte/s) times the number of channels. Decimation can be done in a straightforward but orderly way; one can also use moving averages...

Fred Hawkins · 2007-09-03 19:36

deSilva said...
It seems there is no decimator used in the ADC, so this must be done in the receiving controller.

"Each of the AD1871’s input channels (left and right) can be
configured as either differential or single-ended (two inputs
muxed with internal single-ended-to-differential conversion).
The input PGA features a gain range of 0 dB to 12 dB in steps
of 3 dB. The Σ-Δ modulator features a proprietary multibit
architecture that realizes optimum performance over an audio
bandwidth with standard audio sampling rates of 32 kHz up to
96 kHz. The decimation filter response features very low passband
ripple and excellent stop-band attenuation."
_____________________________________
"Digital Decimating Filters
The filtering and decimation of the AD1871’s modulator data
stream is implemented in an embedded DSP engine. The first
stage of filtering is the sinc filtering, which has selectable decimation
(selected by the modulator clock control bit (AMC, see
Modulator section). The default decimation in the sinc stage
provides a sample rate reduction of 16; this corresponds with a
MODCLK rate of 128 × fS. The alternate setting of the AMC
Bit gives a sinc decimation factor of 8 that corresponds with a
MODCLK rate of 64 × fS. The output of the sinc decimator
stage is at a rate of 8 × fS.
The filter engine implements two half-band FIR filter sections
and a sinc compensation stage that together give a further
decimation factor of 8. Please refer to TPCs 1 through 4 for
details on the responses of the sinc and FIR filter sections.
TPC 5 gives the composite response of the sinc and FIR filters.
High-Pass Filter
The AD1871 features an optional high-pass filter section that
provides the ability of rejecting dc from the output data stream.
The high-pass filter is enabled by setting Bit 8 (HPE) of Control
Register I to 1. Please refer to TPC 7 and TPC 8 for details of
the high-pass filter characteristics."
_________________________
"Modulator Mode Enable
The AD1871 defaults to the conversion of the analog audio to
linear, PCM-encoded digital outputs. Modulator Mode allows
the user to bypass the digital decimation filter section and access
the multibit sigma-delta modulator outputs directly. When in
this mode, certain pins are redefined (see Modulator Mode) and
the modulator output (at a nominal rate of 128 fS) is available
on the modulator data pins (D[0–3]). To enable the Modulator
Mode, set the MME Bit to high."
__________________________

I think the graph of the Modulator Mode's FFT shows that skipping the decimation filters does give LOTS of ultrasonic. I suspect that they mean to say that when you select Modulator Mode, you bypass the filters. I don't think they mean that you can get the filtered output four bits at a time. Pity.

Post Edited (Fred Hawkins) : 9/3/2007 8:04:19 PM GMT

Fred Hawkins · 2007-09-03 20:31

After following this thread, http://forums.parallax.com/showthread.php?p=670946, I am thinking that directing the AD1871's firehose into a dataflash chip might be way simpler. Atmel's AT26DF321 stores 32Mbit, goes as fast as 66 MHz Maximum Clock Frequency. http://www.atmel.com/dyn/resources/prod_documents/doc3633.pdf

Maybe the prop can set these two chips up and just step out of the way. After they're done recording the stream, moving data to a SDcard would be a piece of cake. And stereo feeds, if it all fits together as easily as I imagine (hardly ever), ought to be simple.

Post Edited (Fred Hawkins) : 9/3/2007 8:36:28 PM GMT

deSilva · 2007-09-04 07:37

Again a nightly Black-Out!
Of course 96 kHz sampling x 32 bit x stereo is 6.144 MHz.

But 300 kB/s per mono channel was at least correct. (5 stereo channels = 3 MBytes/s)
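
Spelled out, assuming 24 useful bits (3 bytes) per sample and 5 stereo pairs, i.e. 10 mono channels:

```latex
\begin{align*}
96\,000\ \text{Hz} \times 32\ \text{bit} \times 2 &= 6\,144\,000\ \text{bit/s} \\
96\,000\ \text{Hz} \times 3\ \text{byte} &= 288\,000\ \text{byte/s} \approx 300\ \text{kB/s per mono channel} \\
10 \times 288\,000\ \text{byte/s} &= 2\,880\,000\ \text{byte/s} \approx 3\ \text{MB/s}
\end{align*}
```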

So this means that when you cannot deliver all samples, you definitely lose bandwidth.
(This is an uncontrolled, implicit low-pass filter.)

There is only a slight advantage of a low-pass filter over a reduced sampling rate.
But other kinds of filters may make sense, especially correlations between the channels.

When handling stereo it is sometimes sufficient to compute the difference signal and filter it down to a much lower bandwidth, as you do not expect the R/L situation to change within microseconds (FM pilot tone).

Post Edited (deSilva) : 9/4/2007 9:29:06 AM GMT
