pipe ADC (SPI interface) output directly into SRAM (SPI interface)

iammegatron · 2014-07-22 08:40

I have sixteen 12-bit ADCs (AD7276) that are driven simultaneously (same SPI clock) to sample 16 channels over some period of time. That's a lot of data to store in a Propeller, or in most MCUs for that matter. So I am thinking of piping the data directly into an SPI SRAM chip and querying the data later.

The sampling rate is about 1MHz. So the question is: can the Propeller, within 1 microsecond, clock in a data point over SPI and then put it into some SRAM?

kwinn · 2014-07-22 20:38

Not sure this could be done with SPI SRAM, but by using the cog counters and a 16bit parallel sram it should be possible. How many samples do you need to buffer in the sram before processing the data?

jmg · 2014-07-22 21:02

iammegatron wrote: »

I have sixteen 12-bit ADCs (AD7276) that are driven simultaneously (same SPI clock) to sample 16 channels over some period of time. That's a lot of data to store in a Propeller, or in most MCUs for that matter. So I am thinking of piping the data directly into an SPI SRAM chip and querying the data later.

The sampling rate is about 1MHz. So the question is: can the Propeller, within 1 microsecond, clock in a data point over SPI and then put it into some SRAM?

If the ADC sampling rate is 1MHz, then you need a ~14 to 16MHz SPI clock.
I guess you also mean 16 x SPI RAM, to link to the 16 x 12b ADCs you mentioned ?

SPI memory can stream at ~20MHz once it is addressed, and the ADC can be clocked at > 20MHz.
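As a rough check on the clock figure above, the required SPI clock is simply the sample rate times the number of SCLK cycles per conversion. This is a minimal sketch, assuming 14 to 16 clocks per sample (the range mentioned in this thread) and ignoring any extra /CS high time between conversions. `spi_clock_hz` is a made-up helper name.

```python
def spi_clock_hz(sample_rate_hz: int, clocks_per_sample: int) -> int:
    """Minimum SCLK rate if the whole sample period is filled with clocks."""
    return sample_rate_hz * clocks_per_sample


# 1 MSPS with 14 or 16 SCLK cycles per conversion (assumed frame lengths)
for clocks in (14, 16):
    mhz = spi_clock_hz(1_000_000, clocks) / 1e6
    print(f"{clocks} clocks/sample -> {mhz:.0f} MHz SCLK")
```

Both results sit below the ~20MHz streaming figure quoted for the SPI memory, which is why the direct ADC-to-RAM pairing looks plausible on paper.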

This looks do-able on a split SW-HW basis.
With 32 CLK loads & 16 CS loads, some CLK/CS buffers are probably a good idea.

a) first SW mode addresses the SPI memory Prop -> MEM.DI, ADC.CS = Hi (floats ADC.SDATA)
b) Set HW (Prop Counters) to generate 16MHz (CLK's) and 1MHz (ADC.CS) signals
(in this phase, take care to float the Prop pins, to allow the ADC.SDATA to drive the MEM.DI on each ADC-MEM pairing)
c) Run for XX cycles, then stop HW generate.
d) read result using SW SPI mode.

The relative starting phase of HW CLK and HW CS, will define where the ADC aligns inside the 16b fields, and that may be tricky to get right.
During testing, a contention resistor of ~330 ohms or so can protect against the oops of Prop and ADC driving MEM.DI at the same time.

kwinn · 2014-07-22 21:24

@jmg

I took a look at using 16 x SPI RAM and 4 x QSPI RAM. The difficulty there is the initialization of the SPI RAM and writing 12 bits of data into the byte oriented SPI RAM. A single 16 bit wide parallel sram and some external hardware (address counter) reduces prop pin count, chip count, and clock loading. Hard to say which route would be better without knowing how much data needs to be buffered.

Peter Jakacki · 2014-07-22 21:40

This would be one of those rare occasions I would not use a Prop, there are ARM and other chips better suited for this type of thing and with the 16 channels of 12-bit A/D and 192KB RAM on-chip in the case of the STM32F407.

jmg · 2014-07-22 22:49

kwinn wrote: »

@jmg

I took a look at using 16 x SPI RAM and 4 x QSPI RAM. The difficulty there is the initialization of the SPI RAM and writing 12 bits of data into the byte oriented SPI RAM. A single 16 bit wide parallel sram and some external hardware (address counter) reduces prop pin count, chip count, and clock loading. Hard to say which route would be better without knowing how much data needs to be buffered.

I assumed 16 x SPI, but Quad would be ok if the OP can tolerate 1/4 the depth per capture, and reads and writes in quads.
12b of data can be written with 16 clocks, so each sample always takes 2 bytes, which helps if random indexing is needed later. Or you could lower the CLK count to the chip limit to pack a little more in, but that is harder to manage.

I'd favour 16-clocks, to give some lee-way on the launch phase of CLK and CS.

jmg · 2014-07-22 22:52

Peter Jakacki wrote: »

This would be one of those rare occasions I would not use a Prop, there are ARM and other chips better suited for this type of thing and with the 16 channels of 12-bit A/D and 192KB RAM on-chip in the case of the STM32F407.

With 16 channels needing 1 MSPS, a micro is not quite going to make it.
16 separate ADCs gives simultaneous sampling, and 16x the rate of a multiplexed single ADC.
I'm guessing that matters to the OP, or he would have chosen a MUX-style ADC

Peter Jakacki · 2014-07-22 23:16

jmg wrote: »

With 16 channels needing 1 MSPS, a micro is not quite going to make it.
16 separate ADCs gives simultaneous sampling, and 16x the rate of a multiplexed single ADC.
I'm guessing that matters to the OP, or he would have chosen a MUX-style ADC

So, as you have said, a micro is not going to make it, which raises the question: why would the OP be trying to do this with a Prop? The STM32F407 does have 3 A/D converters to provide 16 channels at 2.7MSPS each, but the important thing is that if that speed were insufficient he could still use the external A/D chips, because the ARM has three high-speed SPI ports that each run at 21MHz or more. Or there is the option of just using two ARM chips, so that 6 converters at 2.7MSPS, slightly over-clocked, could handle 16 1MSPS channels. But the OP is not quite clear on the sampling rate or why.

abecedarian · 2014-07-23 00:06

Since it was brought up, there are micros that can do it.

TM4C1294NCPDT (similar to the STM32F407) has 4 SSI modules (synchronous serial interface) that support quad-SPI, so one SSI module could be set up to read the ADC's, one could be set up to write to SPI RAM, and the DMA module could transfer data from one to the other with minimal core intervention, and if the SPI RAM doesn't support bi- or quad- SPI, the 256KB RAM on the chip and/or RAM on EPI would make a heck of a buffer.

Peter Jakacki · 2014-07-23 00:30

abecedarian wrote: »

Since it was brought up, there are micros that can do it.

TM4C1294NCPDT (similar to the STM32F407) has 4 SSI modules (synchronous serial interface) that support quad-SPI, so one SSI module could be set up to read the ADC's, one could be set up to write to SPI RAM, and the DMA module could transfer data from one to the other with minimal core intervention, and if the SPI RAM doesn't support bi- or quad- SPI, the 256KB RAM on the chip and/or RAM on EPI would make a heck of a buffer.

Isn't the TM4C unobtainium?

abecedarian · 2014-07-23 00:40

Peter Jakacki wrote: »

Isn't the TM4C unobtainium?

That one so far yes, as a chip, but they do have it available on one of their 'launchpad' kits.
The TM4C123GH6PM is available but I don't think it could keep up since it's limited to 12MHz SPI, IIRC.

And sorry for perpetuating the off-topic.

jmg · 2014-07-23 00:43

abecedarian wrote: »

Since it was brought up, there are micros that can do it.

TM4C1294NCPDT (similar to the STM32F407) has 4 SSI modules (synchronous serial interface) that support quad-SPI, so one SSI module could be set up to read the ADC's, one could be set up to write to SPI RAM, and the DMA module could transfer data from one to the other with minimal core intervention, and if the SPI RAM doesn't support bi- or quad- SPI, the 256KB RAM on the chip and/or RAM on EPI would make a heck of a buffer.

I make that only 8 pathways, the OP wanted 16

Isn't the TM4C unobtainium?

Findchips says it is expensive ($18.64), zero stocked, and in big packages (128TQF)

You could also do this with HW-DMA bursts, using almost any small uC and a SPLD for the CLK and CS drives.

iammegatron · 2014-07-23 08:29

Thanks for the discussion. Please allow me to elaborate on the problem I have.

I need to use a DAC to generate a voltage signal and an ADC to sample the resulting current signal (for impedance measurement, for example).

I used to have a PIC32MX (at 80MHz clock) with timer interrupt sync-ed clock to generate SPI signal to drive both an external DAC and external ADC (reasons not to use the internal DAC and ADC). But the interrupt routine (written in assembly) can only go as fast as 100kHz (10us). That is just to read 1 ADC. And the resulting data takes about 24K-byte RAM.

Now I have 16 ADC's to read. There are certainly large RAM PIC32 (the recent one being PIC32MZ 512K RAM, claiming 200MHz clock). It's going to be very tight.

That's why I wonder if I can instead have 16 ADC's stream into 16 32K SPI RAM's directly. Then the MCU just needs to clock fast enough.

"jmg" raised a good point. The difficult part is to have the address for the RAM spliced in during ADC data piping. Wonder if someone has done this sort of thing before. Or any better ideas?

Mark_T · 2014-07-23 08:45

How about clocking into an SDRAM: clock the video generator at the required rate (20MHz?), and its 8 outputs drive the control signals to the SDRAM and CS on the ADCs... Play back at a slower rate later.

abecedarian · 2014-07-23 10:00

Just thinking out loud here....

Most SPI RAMs have auto-incrementing addressing, meaning that once an address has been written, each following write goes to the next contiguous address unless another address is specified.

So what if you had one cog doing only ADC reads and one doing only RAM writes, and it was set up in a round-robin fashion such that cog A reads an ADC, stores the value and moves to the next, cog B writes the value to the appropriate chip and moves to the next and so on? You'd need some sort of semaphore / flag system so that the cogs remain synchronized. I think you could have 1 SCLK shared among the ADC / RAM, 16 chip-selects shared among ADC and RAM, one MOSI/MISO pair shared across ADC, and one pair shared across RAM: 22 pins. The CS lines could be shared like: ADC1 / RAM16; ADC2 / RAM1; ADC3 / RAM2... ADC16 / RAM15. The only issues, I think, would be the first read since there's nothing yet ready to write, and the final write since there's not another read, and could they reach 1MSPS.

Bill Henning · 2014-07-23 10:15

The really basic questions are:

1) Do you REALLY need 1Msps for all 16 channels?

2) Do you REALLY need 16 channels?

4) How many samples must you capture from each channel?

5) How long a burst must you capture?

6) Is there a regular dwell time (ie when you don't need to sample)?

Until you answer those basic questions, there is not enough information to do a proper design.

Regards,

Bill

iammegatron · 2014-07-23 13:36

Bill Henning wrote: »

The really basic questions are:

1) Do you REALLY need 1Msps for all 16 channels?

yes.

2) Do you REALLY need 16 channels?

yes.

4) How many samples must you capture from each channel?

12,000 samples

5) How long a burst must you capture?

12,000 samples at 1Meg sample/sec = 12 msec

6) Is there a regular dwell time (ie when you don't need to sample)?

Yes, after 12 msec. I have all the time I need to process/store/retrieve the sampled data.

Until you answer those basic questions, there is not enough information to do a proper design.

Regards,

Bill

Thanks for asking.

jmg · 2014-07-23 13:37

iammegatron wrote: »

...That is just to read 1 ADC. And the resulting data takes about 24K-byte RAM...
Now I have 16 ADC's to read..

That's why I wonder if I can instead have 16 ADC's stream into 16 32K SPI RAM's directly. Then the MCU just needs to clock fast enough.

"jmg" raised a good point. The difficult part is to have the address for the RAM spliced in during ADC data piping.

If you are reading ~12000 samples at a time (as the 24k above indicates), you only need the address defined once at the start. From there, it AutoINCs (and wraps), and the ADC and SPI RAM can run up to ~20MHz streaming.

24k/chan does allow you to nicely pack 4 chans into a 128k QuadRAM
(there is a little RAM 'wastage', as you cannot get a conversion done in 12 clocks, but 16b stores looks easy.)

You would SW-set the RAM address (all RAM can set in parallel), ready Counters, FloatPins, then wait on trigger, and then you can quickly start the CLK & CS stream clocks.

WAIT has 1 tSys granularity/jitter, and the opcodes to start the counters are 4 tSys each, so I get ~9 tSys, or ~140ns if you use 64MHz fSys (~112ns @ 80MHz, which could support 1.25MSPS x 16 chans). Jitter is always ~tSys.
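The launch-latency numbers above can be reproduced with a few lines of arithmetic. This is only a sketch: the breakdown into 1 system clock of WAIT granularity plus two 4-clock opcodes is an assumption about how the ~9 tSys figure was reached, and `launch_latency_ns` is a hypothetical helper.

```python
def launch_latency_ns(f_sys_hz: float,
                      wait_clocks: int = 1,
                      opcodes: int = 2,
                      clocks_per_opcode: int = 4) -> float:
    """Time from trigger to both counters running, in nanoseconds."""
    total_clocks = wait_clocks + opcodes * clocks_per_opcode  # 1 + 2*4 = 9
    return 1e9 * total_clocks / f_sys_hz


for f_sys in (64_000_000, 80_000_000):
    print(f"{f_sys / 1e6:.0f} MHz fSys -> {launch_latency_ns(f_sys):.1f} ns")
```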

The nice thing about a Prop for this sort of work, is the COG doing this is isolated from all other code, so it takes a bit to get the start phase right, but once done, it is unlikely to break on other changes.
A PLD could also do the CLKs, but (PLD+AnyuC) does not handle trigger as well as a Prop.

jmg · 2014-07-23 13:45

iammegatron wrote: »

12,000 samples at 1Meg sample/sec = 12 msec.

Assuming an easy 16b of RAM used per sample (which means you can choose a 14b ADC later too), you can fit 2^14 samples per channel in a 128k SPI part packed as quad, or 2^16 per 128k part if you use one SPI-RAM per ADC, to allow some design head room.
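The capacity figures in this post can be checked as follows. This sketch assumes that "128k" means a 128 KiB (1 Mbit) part and that each sample occupies 2 bytes. `samples_per_channel` is an illustrative name, not anything from a library.

```python
RAM_BYTES = 128 * 1024      # assumed: 128 KiB (1 Mbit) SPI RAM
BYTES_PER_SAMPLE = 2        # 16 clocks stored per 12-bit sample


def samples_per_channel(ram_bytes: int, channels_per_chip: int,
                        bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Capture depth per ADC when channels_per_chip ADCs share one RAM."""
    return ram_bytes // (channels_per_chip * bytes_per_sample)


one_per_adc = samples_per_channel(RAM_BYTES, 1)   # one SPI RAM per ADC
quad_packed = samples_per_channel(RAM_BYTES, 4)   # four ADCs per quad RAM
print(one_per_adc, quad_packed)
```

Either arrangement leaves room for the 12,000 samples per channel the OP asked for.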

PCB design may be simpler with 1 RAM per ADC, and a Prop could support 24+ pairs, and RAM is cheaper than most ADC anyway...

Bill Henning · 2014-07-23 20:00

You are welcome, and thanks for the answers.

Ok, it is feasible, but only because of the ability to read back at a saner pace.

FYI, ADC samples at 1MHz, but ram/adc must be clocked at ~16MHz.

Probably needs a small CPLD to help manipulate the /CS signals to the memories, as all must be asserted during writes, but only one during reads.

Not needed. Can ignore unneeded bits during read.

Will require using the counters, and SPI ram chips with minimum 16MHz SPI clock.

Interesting little project.

16 channels * 12k samples * 2 bytes per sample = 384KB of ram needed, PIC32MZ with 512KB of ram using PMP is probably a better choice.
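For reference, the buffer budget works out as below. It is a simple sketch of the arithmetic in this post; the comparison against 512 KiB assumes the PIC32MZ figure quoted in the thread, and `capture_bytes` is a made-up helper name.

```python
def capture_bytes(channels: int, samples: int, bytes_per_sample: int) -> int:
    """Total storage for one burst capture."""
    return channels * samples * bytes_per_sample


total = capture_bytes(channels=16, samples=12_000, bytes_per_sample=2)
print(total, "bytes")                       # 384000 bytes
print(total <= 512 * 1024)                  # fits in 512 KiB of on-chip RAM
```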

Easiest to do in an FPGA, with enough bram.

This would be a piece of cake on the upcoming prop2.

kwinn · 2014-07-23 20:57

Can be done with 16 ADC's, 1 32K x 16 sram, a prop, and 3 or 4 7400 chips.

I have done something like the reverse of this in the past when I needed to replace a Victor Lister with an interface that could send data to a parallel printer and a serial modem. One bit of an eprom was used to shift out the serial data, one bit was used to enable a 555 timer, one was used as the strobe to the printer, and the remaining 5 bits were used for 0 – 9, CR, and LF. What you want to do is conceptually simpler and has the advantage of using a prop chip in place of a 555 and eprom.

Based on what has been posted so far getting the data from the ADC's is relatively simple and straight forward. Setting up the RAM and writing the data to it is a bit more difficult, as is reading the data and shuffling the bits to make sense of the data.

On the ADC side, all that seems to be required is a single /CS and a single SCLK signal going to all of the ADCs in parallel. A low /CS and fourteen or sixteen clocks will shift the data out.

If you are using one SPI RAM per ADC once the start address has been set up you can clock 8 bits of data in, toggle /CS, and shift the next 8 bits in. Synchronizing the data out from the ADC with simultaneously inputting it to the RAM is the tricky part. Reading the data is simple since two consecutive bytes from each SPI ram chip will hold one reading for the ADC it is connected to.

If you use a QSPI memory the timing is similar to an SPI ram but the bits of each byte will need to be shifted around when the data is transferred from the QSPI ram. Each byte will hold 2 bits from each of the four ADC's connected to it.

If you use a parallel 16 bit wide sram then bit 1 of each word would always contain the data from ADC 1, bit 2 data from ADC2, etc., to bit 16 being data from ADC16. This would be my preferred method, and IMHO the simplest way to do it even though you would need to shift every bit from each ram location to one of 16 registers to resynchronize the data.
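The bit shuffling described for the parallel SRAM is essentially a 16 x 16 bit transpose. Here is a minimal sketch of the readback side, under several assumptions that are not confirmed in the thread: one RAM word is stored per SCLK, bit c of each word comes from ADC c (counting from 0 rather than 1), data arrives MSB first, and a 16-clock frame holds two leading zeros, 12 data bits, and two trailing zeros. Check the ADC datasheet before relying on that frame layout. `deinterleave` and `extract_12bit` are hypothetical names.

```python
def deinterleave(words, clocks_per_sample=16, channels=16):
    """Turn per-clock RAM words into per-channel serial frames.

    words: list of ints, one per SCLK; bit c of a word is channel c's bit.
    Returns a list of `channels` lists, each holding one frame per sample.
    """
    if len(words) % clocks_per_sample:
        raise ValueError("word count is not a whole number of samples")
    out = [[] for _ in range(channels)]
    for start in range(0, len(words), clocks_per_sample):
        frames = [0] * channels
        for word in words[start:start + clocks_per_sample]:
            for ch in range(channels):
                # Shift in MSB first: earlier clocks end up in higher bits
                frames[ch] = (frames[ch] << 1) | ((word >> ch) & 1)
        for ch in range(channels):
            out[ch].append(frames[ch])
    return out


def extract_12bit(frame):
    """Assumed layout: 2 leading zeros, 12 data bits, 2 trailing zeros."""
    return (frame >> 2) & 0xFFF
```

On a Propeller the same transpose would be done with shifts and rotates in a cog, but the indexing is the same.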

jmg · 2014-07-23 21:40

kwinn wrote: »

Can be done with 16 ADC's, 1 32K x 16 sram, a prop, and 3 or 4 7400 chips.

I presume you mean 74xx family parts, used to increment the address of the BIG SRAM @ 16MHz LSB speed ?
The PCB design would be less flexible, and I think you need bigger than 32K x 16 ?
- if the OP wants 12000 samples, 16 channels and 16bps, into one RAM, that's 256k x 16 SRAM.
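The sizing argument for the single wide SRAM can be sketched like this, assuming one 16-bit word is written per SCLK so that each sample costs as many words as there are clocks in the frame. `words_needed` and `next_power_of_two` are illustrative helpers.

```python
def words_needed(samples: int, clocks_per_sample: int) -> int:
    """One RAM word per SCLK, all 16 channels stored side by side."""
    return samples * clocks_per_sample


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()


need = words_needed(12_000, 16)              # 192000 words
print(need, next_power_of_two(need) // 1024, "K x 16")
print((32 * 1024) // 16, "samples fit in a 32K x 16 part")
```

With 14 clocks per sample the requirement drops to 168,000 words, which still rounds up to the same 256K x 16 device.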

iammegatron · 2014-07-24 05:41

As was pointed out, the difficult part is to sync the HW clock (counter)--which drives ADC and SPI RAM, with the Propeller assembly code--which supplies RAM address at the beginning.

I understand that on most MCUs, assembly code can be synchronized with the main execution clock. In a sense, it is like coding Verilog for an FPGA.

Propeller assembly is no exception. Or even exceptional in this respect.

Is there sample code like this, "syncing" HW clock/counter, with assembly code in Propeller?

kwinn · 2014-07-24 06:35

jmg wrote: »

I presume you mean 74xx family parts, used to increment the address of the BIG SRAM @ 16MHz LSB speed ?
The PCB design would be less flexible, and I think you need bigger than 32K x 16 ?
- if the OP wants 12000 samples, 16 channels and 16bps, into one RAM, that's 256k x 16 SRAM.

Yes, if 12,000 16 bit samples are required then the ram needs to be 256K x 16. The PCB design may be a bit less flexible but it simplifies clocking the data out of the ADC's and into the RAM, and clocking the data from the RAM to the propeller. Each of the 3 memory choices ( SPI, QSPI, parallel RAM ) have advantages and disadvantages. The advantage of the parallel RAM is that it can be clocked with the same signals that clock the ADC's, and more samples can be stored in less memory and time by using 14 clock pulses rather than 16. In the RAM the data can be stored as 12 bits rather than as the 2 bytes the SPI and QSPI chips require.

PS - Yes, I meant 74xx parts. Old habits die hard.

jmg · 2014-07-24 13:53

iammegatron wrote: »

As was pointed out, the difficult part is to sync the HW clock (counter)--which drives ADC and SPI RAM, with the Propeller assembly code--which supplies RAM address at the beginning.

I understand that on most MCUs, assembly code can be synchronized with the main execution clock. In a sense, it is like coding Verilog for an FPGA.

Propeller assembly is no exception. Or even exceptional in this respect.

The details matter, and there the Prop is rather different.

The WAIT opcodes are 1 fSys granular, so they pause the core to 12.5ns @ 80MHz (15.625 ns @ 64MHz) precision on a Counter or pin event. (trigger). From there, opcodes take 4 sysClks each, but their start-phase is 12.5ns accurate.
That is a lot better than a generic micro + interrupt design, but you use 1 COG to do it.

Prop COG Counters, are also really 32b adders, another detail that is important.*

iammegatron wrote: »

Is there sample code like this, "syncing" HW clock/counter, with assembly code in Propeller?

Someone may have a link ?

Study AN001 carefully. I think the sync/launch is 3 lines of ASM, and 3 more to stop:
a) one line using one of WAITPEQ, WAITPNE, or WAITCNT to time the start
b) one line to start CTRA in NCO single-ended mode, for the 16MHz CLK (50% duty)
c) one line to start CTRB in Duty single-ended mode, for the 1MHz CS (1 clk high, 63 clks low)
d) one line using WAITCNT to time the stop (12ms?); 15.625ns precision is available here, so stop on a clean sample boundary
e) one line to stop CTRA
f) one line to stop CTRB

a) times(releases) to 15.625ns granularity, and b) and c) follow 62.5ns each from that.

The NCO output frequency formula is fo = fSys*FRQA/2^32.
64M*2^30/2^32 = 16000000 (fSys divided by 4), so you want the adder value FRQA to be 2^30.

Duty mode: AN001 Fig10 shows the effect. You need a 1MHz CS repeat rate = 64MHz/64, so you have 63 sys clks low and 1 sys clk high (adder value FRQB is 2^26). IIRC the 15.625ns high time is OK for that ADC.
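The two adder values can be derived from the formula fo = fSys*FRQ/2^32 quoted above. This is a small sketch using integer arithmetic; `frq_for` and `f_out_hz` are made-up helper names, not Propeller tool functions.

```python
def frq_for(f_out: int, f_sys: int) -> int:
    """FRQ adder value giving f_out = f_sys * FRQ / 2^32 (rounded to nearest)."""
    return (f_out * 2**32 + f_sys // 2) // f_sys


def f_out_hz(frq: int, f_sys: int) -> float:
    """Average output rate for a given adder value."""
    return f_sys * frq / 2**32


F_SYS = 64_000_000
frqa = frq_for(16_000_000, F_SYS)   # CLK adder
frqb = frq_for(1_000_000, F_SYS)    # CS adder
print(frqa == 2**30, frqb == 2**26)
print(f_out_hz(frqa, F_SYS), f_out_hz(frqb, F_SYS))
```

At 64MHz both adders are exact powers of two, which is the "clean binary ratio" the post goes on to recommend.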

* Every clock FRQA adds to PHSA, and outputs either MSB(NCO mode), or CY(Duty mode), so you can control the CLK phase by preloading PHSA and CS phase by preload of PHSB, before you start the counters.
From there, they add the same each SysClk, so stay in sync.
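To see how the two adders stay locked together, the accumulators can be modelled in a few lines. This is a simplified behavioural sketch, not cycle-accurate hardware: it assumes NCO mode outputs bit 31 of PHSA, Duty mode outputs the carry of the PHSB addition, and it ignores any output pipeline delay. `simulate` and `rising_edges` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def simulate(frqa, frqb, phsa=0, phsb=0, n_clocks=128):
    """Step both 32-bit accumulators once per system clock."""
    clk, cs = [], []
    for _ in range(n_clocks):
        total_b = phsb + frqb
        cs.append(1 if total_b > MASK32 else 0)   # Duty mode: carry out
        phsb = total_b & MASK32
        phsa = (phsa + frqa) & MASK32
        clk.append((phsa >> 31) & 1)              # NCO mode: MSB of PHSA
    return clk, cs


def rising_edges(bits):
    """Count 0 -> 1 transitions in a list of bits."""
    return sum(1 for a, b in zip(bits, bits[1:]) if (a, b) == (0, 1))


clk, cs = simulate(frqa=1 << 30, frqb=1 << 26)
print(rising_edges(clk), "CLK edges and", sum(cs), "CS pulses in 128 sys clocks")
```

Changing the `phsa` and `phsb` preloads in this model shifts where the CS pulse lands relative to the CLK edges, which is the phase adjustment described above.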

64MHz is suggested to start with, as that gives a clean binary ratio adder, which avoids numeric jitter effects.

[Once you have that working, if you really needed 80MHz for other tasks, you could try this

fo = 80M*round(2^32/(80M/16M))/2^32 = 15999999.9962747097
- notice that is (very) close, but not quite an exact /5, which means there are rare /6
- also note the 1MHz CS is (very) close, but not 16:1, so phase creep occurs over long time periods.
If you preload every time and trigger short (12ms) bursts, those effects may be tolerable, but users should be aware of the effect.]
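The 80MHz numbers can be reproduced, and the size of the creep estimated, with the sketch below. The drift estimate is an interpretation rather than something stated in the post: it treats the creep as the difference between the CLK adder and 16 times the CS adder, accumulated over a 12ms burst, and reports it as a fraction of one CLK cycle.

```python
F_SYS = 80_000_000

frqa = round(2**32 / (F_SYS / 16_000_000))   # CLK adder, nearest to 2^32/5
frqb = round(2**32 / (F_SYS / 1_000_000))    # CS adder, nearest to 2^32/80

fo = F_SYS * frqa / 2**32
print(frqa, frqb)
print(f"{fo:.10f} Hz")

# An exact 16:1 ratio would need frqa == 16 * frqb
mismatch = frqa - 16 * frqb
burst_clocks = F_SYS * 12 // 1000            # sys clocks in a 12 ms burst
drift_cycles = mismatch * burst_clocks / 2**32
print(mismatch, f"{drift_cycles:.6f} CLK cycles of average drift per burst")
```

Under these assumptions the average drift over one burst is far below a single system clock, which fits the suggestion that preloading before each short burst keeps the effect tolerable.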
