Maximum Sampling Speed

lanternfish · 2010-10-28 20:01

Hi - a noob here

A noob question that I am having trouble finding an answer for.

What would be the fastest sampling speed under the following circumstances:

Assembler: detect sample ready pulse, read 24-bits on port(s) and storing in HUB ram?

Assume 100MHz clock.

Cheers

jazzed · 2010-10-28 22:00

lanternfish wrote: »

What would be the fastest sampling speed

10ns sample period for fewer than 500 samples using 5 COGs. 100MByte/s COG to HUB storage rate (75MB/s for 24 bits). Sampling and storage would be separate phases in this case, for a maximum aggregate rate of 50MB/s.

Another method can be used to sample and store data all at once, but I'm not sure what the back-to-back latency is, so a more precise number is elusive. Maybe Kuroneko knows the answer.

kwinn · 2010-10-28 23:10

Best I could come up with on short notice was between 31 and 54 clocks (310 to 540ns), or roughly 2 to 3 million samples a second for one cog. Acquiring the data can take as little as 12 clocks, or 20 clocks if you just miss the data ready pulse. Writing the data to hub can take from 19 to 34 clocks, depending on hub synchronization.
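
The range quoted above follows from adding the two stages together. A small sketch of that arithmetic, assuming the 100MHz clock from the original question (the function name is illustrative only):

```python
CLOCK_HZ = 100_000_000  # 100 MHz system clock, as assumed in the question


def loop_budget(acquire_clocks, hub_write_clocks, clock_hz=CLOCK_HZ):
    """Return (total clocks, period in ns, samples per second) for one pass."""
    total = acquire_clocks + hub_write_clocks
    period_ns = total * 1_000_000_000 / clock_hz
    rate = clock_hz / total
    return total, period_ns, rate


best = loop_budget(12, 19)   # ready pulse caught at once, hub window aligned
worst = loop_budget(20, 34)  # ready pulse just missed, hub window just missed

for name, (clocks, ns, rate) in (("best", best), ("worst", worst)):
    print(f"{name}: {clocks} clocks, {ns:.0f} ns, {rate / 1e6:.2f} MS/s")
```

The worst case lands a little under 2 MS/s and the best case a little over 3 MS/s, which is where "roughly 2 to 3 million" comes from.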

pjv · 2010-10-29 12:32

Lanternfish;

If one could locate the ReadyBit on P0 and the databits on P1..P24, then I think the following would work. It is a somewhat weird way to use the Ina shadow register, and I have not tested it:

Begin           shr     ina, ina        wc      'get ready bit into carry, and right justify databits
        if_nc   jmp     #Begin                  'loop until ready bit high
                wrlong  ina, HubPtr             'put it away
                add     HubPtr, #4              'presumably we need to advance the storage
                waitpne State, #1               'wait for ready pulse to leave
                jmp     #Begin                  'go again

State           long    1
HubPtr          long    BufferStart

This sequence consumes 28 clocks plus extra for the hub write, so it will spin at 32 clocks.
At your suggested 100MHz clock that means 320 nanoseconds, or just over 3 MS/s.
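
One way to see why 28 clocks becomes 32: a loop containing a hub instruction locks to the hub's rotation. The sketch below assumes that each cog gets one hub access slot every 16 clocks on this chip, so the loop time rounds up to the next multiple of 16. The helper names are made up for illustration.

```python
HUB_WINDOW_CLOCKS = 16  # assumption: one hub slot per cog every 16 clocks


def synced_loop_clocks(raw_clocks, window=HUB_WINDOW_CLOCKS):
    """Round a loop's raw clock count up to the next multiple of the hub window."""
    return -(-raw_clocks // window) * window


def rate_msps(loop_clocks, clock_hz=100_000_000):
    """Samples per second, in millions, for a loop of the given length."""
    return clock_hz / loop_clocks / 1e6


loop = synced_loop_clocks(28)
# 10 ns per clock at the 100 MHz assumed in the question
print(loop, "clocks,", loop * 10, "ns,", rate_msps(loop), "MS/s")
```

Under the same assumption, a loop that grew past 32 clocks would drop straight to 48, so a few extra instructions can cost a full hub rotation.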

Cheers,

Peter (pjv)

Hanno · 2010-10-29 13:18

Lanternfish,
As Jazzed alluded to with his 10ns sample period using 5 cogs- you can get quite high sample rates using multiple cogs. Basically, think of a 4 cylinder 4 stroke engine. Each cylinder only fires once every 4 strokes. However, if you "interleave" them- you can set up your engine so that one of the 4 cylinders fires every stroke. Now apply this to cogs: Interleave your cogs' execution so that:
cog 1 starts sampling at cnt=n+1
cog 2 starts sampling at cnt=n+2
cog 3 starts sampling at cnt=n+3
cog 4 starts sampling at cnt=n+4
You can dynamically create a pasm program that executes a whole bunch of
mov x,ina
...
mov x,ina

So- each cog will take a sample of all 32 bits every 4 clock cycles. And, since you have 4 cogs doing this, you're sampling every clock cycle. That gets you 100Msps or 10ns sample period with a 100MHz clock. Since you have 4 cogs doing this, you can store not 500 samples- but close to 2000- depending on how clever your dynamic program is.
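
The numbers in this paragraph can be checked with a short sketch. It assumes a cog has 512 longs of RAM, 16 of which are special-purpose registers, and it treats the remaining 496 as an upper bound on stored samples (the sampling code itself needs room, which is where the "clever" part comes in). Function names are illustrative.

```python
CLOCK_HZ = 100_000_000   # 100 MHz system clock from the question
CLOCKS_PER_MOV = 4       # a "mov x,ina" takes 4 clocks
COG_LONGS = 512          # cog RAM size in longs
SPECIAL_REGISTERS = 16   # the last 16 cog addresses are special-purpose registers


def start_offsets(cogs):
    """Clock offsets (relative to cnt = n) at which each cog starts sampling."""
    return [cog + 1 for cog in range(cogs)]


def interleaved_rate_hz(cogs, clock_hz=CLOCK_HZ):
    """Aggregate sample rate; gains stop once every clock is covered."""
    return clock_hz * min(cogs, CLOCKS_PER_MOV) // CLOCKS_PER_MOV


def max_samples(cogs):
    """Upper bound on stored samples, ignoring room needed for the code itself."""
    return cogs * (COG_LONGS - SPECIAL_REGISTERS)


print(start_offsets(4), interleaved_rate_hz(4), max_samples(4))
```

Four cogs cover every clock, so a fifth sampling cog adds storage but no extra rate.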

I did this ~3 years ago with ViewPort's QuickSample object- it lets you use your Propeller as a logic state analyzer to monitor the pins of your Propeller while the rest of the Propeller's cogs are running your code. ViewPort is a PC application that graphs this data in a simulated logic analyzer- with timescale, measurements, trigger.... People have used it to troubleshoot all sorts of protocol issues.

I refined the above algorithm for use in the Parallax PropScope. There, it continually samples into cog ram while looking for a trigger. This allows the PropScope to show you samples taken BEFORE the trigger.

See my sig for links- including a review of ViewPort in Robot magazine...

Hanno

lanternfish · 2010-10-29 14:06

Still getting my head around how a Prop operates and using multiple cogs so thanks for the excellent replies.

Phil Pilgrim (PhiPi) · 2010-10-29 14:45

lanternfish,

What exactly is the timing of your sample pulse relative to the data on the 24 input lines? IOW, is the sample ready on a rising edge, falling edge, or ... ? How wide is the pulse?

-Phil

lanternfish · 2010-10-29 14:59

Hi Phil

Sample on rising edge. Timing pulse is 20 - 30ns.

Cheers

Phil Pilgrim (PhiPi) · 2010-10-29 15:33

What's the setup time for the ready pulse, i.e. the time between when the data is ready and the pulse rising edge? What's the data hold time after the falling edge of the ready pulse? How many data in one burst? How much time between bursts?

-Phil

lanternfish · 2010-10-29 19:36

The ADC outputs data every 25ns. The Ready Pulse goes high approx 12ns later and holds for approx 12ns.
Each burst will be 800 (24 bit) values.
Time between bursts is 6.4us
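
To put these figures next to the Propeller's timing, here is a rough sketch. It assumes the 100MHz clock from the first post and 4 clocks per instruction, and it only counts throughput: a 25ns sample period is not a whole number of 10ns clocks, so real code would also have to deal with alignment.

```python
import math

SAMPLE_PERIOD_NS = 25        # ADC outputs a value every 25 ns
BURST_LENGTH = 800           # values per burst (one video line)
GAP_NS = 6400                # 6.4 us between bursts
CLOCK_PERIOD_NS = 10         # assumption: 100 MHz Propeller clock
CLOCKS_PER_INSTRUCTION = 4   # assumption: one "mov x,ina" per sample

burst_ns = BURST_LENGTH * SAMPLE_PERIOD_NS      # active part of a line
line_ns = burst_ns + GAP_NS                     # full line period
clocks_per_sample = SAMPLE_PERIOD_NS / CLOCK_PERIOD_NS
# Throughput-only lower bound on the number of interleaved sampling cogs.
min_cogs = math.ceil(CLOCKS_PER_INSTRUCTION * CLOCK_PERIOD_NS / SAMPLE_PERIOD_NS)

print(f"burst {burst_ns / 1000} us, line {line_ns / 1000} us")
print(f"{clocks_per_sample} clocks per sample, at least {min_cogs} cogs")
```

With only 2.5 clocks per sample, a single cog running one 4-clock instruction per sample cannot keep up, which is why interleaving comes into the picture.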

Having found a bit more info in the FAQ section (and other areas) I think the Prop will be suitable, with tight assembler code, to do the following:

800 x 600 SVGA signal applied to Analog Devices AD9985A (or similar).

Prop reads in each line (800 24-bit values) using Hanno's 4 cog suggestion (?) and stores in HUB RAM (video buffer).
5th cog decimates two lines when they are completed, from 1600 to 400 24-bit values, and also sets the buffer pointer(s). Alternatively, it decimates each line from 800 to 400 24-bit values while sampling is in progress, with the final 2-to-1 line reduction after the 1600th 24-bit value has been read.

The decimation is an averaging of line(x)pixel(y) + line(x)pixel(y+1) + line(x+1)pixel(y) + line(x+1)pixel(y+1)

6th cog? outputs this in SPI mode to LED drivers (type yet to be decided)

Of course there is insufficient COG RAM to digitize and decimate a full 800 x 600 frame to 400 x 300 so I am not sure how many decimated video lines can be stored in HUB RAM for SPI output.

And it gets a little messier, as the 6th cog will not be outputting 24-bit values in the order 1 - 400, 401 - 800 but 1, 401, 801 ... up to the maximum number of lines stored, i.e. it outputs pixels in columns rather than rows.

The result, a POV rotating 360deg video display. Or complete failure.....

I hope all that makes sense.

Phil Pilgrim (PhiPi) · 2010-10-29 22:54

lanternfish wrote:

The decimation is an averaging of line(x)pixel(y) + line(x)pixel(y+1) + line(x+1)pixel(y) + line(x+1)pixel(y+1)

... for each of the packed 8-bit RGB values, so you won't be adding longs to longs, right?
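
The point about not adding longs to longs can be shown with a small sketch. It assumes each pixel is a 32-bit value with three 8-bit colour fields in the low 24 bits; which field holds which colour does not matter here. The function names are illustrative, not from any Propeller library.

```python
def average_packed(pixels):
    """Average packed pixels one 8-bit field at a time."""
    out = 0
    for shift in (0, 8, 16):
        total = sum((p >> shift) & 0xFF for p in pixels)
        out |= (total // len(pixels)) << shift
    return out


def naive_average(pixels):
    """Average the packed values as plain integers (shown only for contrast)."""
    return sum(pixels) // len(pixels)


def decimate_2x2(top, bottom):
    """Reduce two equal-length lines to one line of half the width."""
    return [
        average_packed([top[i], top[i + 1], bottom[i], bottom[i + 1]])
        for i in range(0, len(top) - 1, 2)
    ]


block = [0x00FF00, 0x000000, 0x00FF00, 0x000000]
print(hex(average_packed(block)), hex(naive_average(block)))
```

In the naive version the remainder of the middle field spills into the field below it, so a colour that was absent in all four pixels shows up in the result.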

-Phil

lanternfish · 2010-10-30 02:03

I have not delved into Prop operators yet but believe that I can manipulate HUB RAM in byte, word or long formats.

Another option is to physically route the ADC RGB outputs to the Prop inputs such that the RGB values are 3 x 5 bits but are read as 3 x 8 bit:

(for each of the ADC RGB)

ADC bit   Prop pin (R)   Prop pin (G)   Prop pin (B)
D7        P4             P12            P20
D6        P3             P11            P19
D5        P2             P10            P18
D4        P1             P9             P17
D3        P0             P8             P16
D2        nc             nc             nc
D1        nc             nc             nc
D0        nc             nc             nc

All other Prop Pn tied to 0V (via pulldown?)

As this decimation is a simple form of bilinear interpolation there will be some degradation of the final RGB values.

I'm not sure if I have this correct but believe I can store at least 30 decimated lines in HUB RAM.
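
A quick way to sanity-check the line count is to divide hub RAM by the size of one decimated line. The sketch assumes 32KB of hub RAM and 400 pixels per line, and it gives an upper bound only, since program code and other buffers share the same memory.

```python
HUB_RAM_BYTES = 32 * 1024   # assumption: 32 KB of hub RAM
PIXELS_PER_LINE = 400       # one decimated line


def max_lines(bytes_per_pixel):
    """Upper bound on decimated lines that fit if hub RAM held nothing else."""
    return HUB_RAM_BYTES // (PIXELS_PER_LINE * bytes_per_pixel)


for label, size in (("long per pixel", 4), ("packed 24-bit", 3), ("15-bit in a word", 2)):
    print(f"{label}: {max_lines(size)} lines")
```

On these assumptions, 30 lines is only within reach when each pixel is stored in two bytes, which fits the 3 x 5 bit wiring described above.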

Jack Buffington · 2010-10-30 08:36

This isn't going to be possible as you are proposing it. The prop doesn't have enough speed to capture every pixel in an 800x600 pixel image. The pixel clock rate is too high. At 56 Hz refresh, your pixel clock will be 36 MHz. Since each instruction takes four clocks, you would have to clock your propeller at 144 MHz to do that. You could capture every other pixel on a scan line and let an RC filter do your averaging for you though. That would work.

I wrote a program that captures groups of pixels from a 640x480 x 60Hz signal and sends it out to LED arrays. I can't share my code but I can describe it a bit. I had to overclock the prop to 100.7MHz to have one instruction cycle per pixel. I captured using one cog but a single cog doesn't have enough RAM to grab every pixel in even a scan line of 400 pixels. I think that you would also need two serial cogs going to get all of that data out but I never really figured out what portion of my serial stream I was using since I wasn't using all of the pixels in the image. Something that I did to help speed things up was to capture in 16-bit color rather than 24 bit. This allowed me to pack the data into hub RAM and get it out of the serial port faster.
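
The clock figures in the two paragraphs above come from one rule: a single cog needs one 4-clock instruction per captured pixel. A sketch of that calculation follows; the 25.175MHz value is the standard pixel clock for 640x480 at 60Hz and is an assumption about the signal used, not a figure from the post.

```python
CLOCKS_PER_INSTRUCTION = 4


def required_system_clock_hz(pixel_clock_hz, sample_every_nth_pixel=1):
    """System clock needed for one cog to run one instruction per sampled pixel."""
    return pixel_clock_hz * CLOCKS_PER_INSTRUCTION // sample_every_nth_pixel


cases = {
    "800x600 @ 56 Hz, every pixel": required_system_clock_hz(36_000_000),
    "800x600 @ 56 Hz, every other pixel": required_system_clock_hz(36_000_000, 2),
    "640x480 @ 60 Hz, every pixel": required_system_clock_hz(25_175_000),
}
for name, hz in cases.items():
    print(f"{name}: {hz / 1e6} MHz")
```

The last case works out to 100.7MHz, which matches the overclock mentioned above.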

Now that I think about it, you *might* be able to clock your prop at 72MHz and do some fancy timing so that you use two cogs to sample at different times. This would allow you to sample at every pixel. One of the cogs would output the clock for the ADC. That is the shaky part. The data sheet says that the PLL for the counters can only accept up to an 8 MHz signal. You would need to feed it a 9MHz signal to output 144 MHz. This would probably work. That is 12% faster than the PLL is rated for but I was running my prop at 100.7 MHz, which is 25% faster than the system clock PLL is rated for. Presumably they are the same circuit.
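
The two-cog idea and the PLL margins can be worked through as below. The x16 multiplier is implied by the 9MHz in, 144MHz out figures in the post; the 80MHz rated system clock is an assumption added here to reproduce the "25% faster" remark.

```python
CLOCKS_PER_INSTRUCTION = 4
COUNTER_PLL_MULTIPLIER = 16            # implied by 9 MHz in, 144 MHz out
COUNTER_PLL_MAX_INPUT_HZ = 8_000_000   # data sheet limit quoted in the post
RATED_SYSTEM_CLOCK_HZ = 80_000_000     # assumption: rated system clock


def interleaved_sample_rate_hz(system_clock_hz, cogs):
    """Aggregate rate when cogs sample at evenly staggered clock offsets."""
    return system_clock_hz // CLOCKS_PER_INSTRUCTION * cogs


def percent_over(actual, rated):
    """How far above its rating a value is, in percent."""
    return (actual / rated - 1) * 100


pll_input_hz = 144_000_000 // COUNTER_PLL_MULTIPLIER

print(interleaved_sample_rate_hz(72_000_000, 2))            # two cogs at 72 MHz
print(percent_over(pll_input_hz, COUNTER_PLL_MAX_INPUT_HZ))  # counter PLL input
print(percent_over(100_700_000, RATED_SYSTEM_CLOCK_HZ))      # system clock
```

Two cogs at 72MHz, offset by two clocks, reach the same 36 MS/s as one cog at 144MHz.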

Make sure to design using a four-layer circuit board with a big ground plane. My first version didn't do this and although it worked, it had some noise in the video.

Good luck.

lanternfish · 2010-10-30 13:34

Thanks for that Jack.

Hanno suggested in post #5 to run 4 cogs interleaved to overcome the sampling problem.

And reducing from 24 (3 x 8) to 15 (3 x 5) bit sampling will allow more display lines to be stored in HUB RAM.

As the final 400 x 300 output does not have to have an excellent image we are not too worried about noise at this point.

What LED drivers did you use on your project?

Jack Buffington · 2010-10-30 14:02

I was at first planning on using the TLC5940 with a newer PIC18F chip but ran into a lot of issues with the PIC since it was too bleeding edge and my compiler didn't support its features well enough yet. In the end, I just used a propeller, some transistors, and a 74XX238 chip for every 8x8 grid of RGB LEDs. That allowed me to customize the PWM routine to exactly what I needed and made it easy to transmit/receive the data to/from all of the arrays. It also saved about $3 on each circuit board. The cool thing is that I can interactively do stuff like color balancing on the arrays themselves. I could also calculate some interesting eye candy if I wanted to because coming up with 64 pixels worth of data is easily within the processing power of a propeller.

lanternfish · 2010-10-30 14:41

@ Jack

Rather than matrix panels as per your project, this project will be a 'column' of LEDs rotating at 60 rpm.

There will in fact be two columns with an approx 10mm horizontal offset and a 3mm vertical offset. This provides an effective pixel spacing of 3mm.

The main PCB(s) will be the ADC, and Prop/LED drivers. A wiring loom will run out to the LED boards.

The rotating drum motor is from a Fisher & Paykel Smartdrive (701). Another 701 motor sits on top of the drum and is wired as an alternator and will power the LED electronics and a netbook that will provide the video source. The netbook will be controlled wirelessly.

The rotating drum is currently under construction.

Cheers

Hanno · 2010-10-30 14:58

Sounds like a very cool project Lanternfish!

You write about Fisher & Paykel, wiring loom and use "mm"- are you in New Zealand?

If you turn your camera sideways you may be able to do much of the IO "on the fly"- to make the most of the Prop's limited RAM... So, you would scan one "vertical" line from the camera- and then directly output it to the LEDs. Then, scan the next line and output that to the LEDs, which are then in the new position. This way you wouldn't have to store the whole frame- at most one line!

With one cog my ViewPort PropCapture object grabs ~200x240 pixels at 4 bits into HUB RAM- however only grayscale. This consumes most of the 32KB- so I haven't pursued color. The cog can also do on-the-fly detection of the brightest spot- useful for tracking a laser pointer.
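
A rough check of the memory figure, assuming 32KB of hub RAM and pixels packed with no padding. The 24-bit case is an illustrative comparison only, not a figure from the post.

```python
HUB_RAM_BYTES = 32 * 1024  # assumption: 32 KB of hub RAM


def frame_bytes(width, height, bits_per_pixel):
    """Bytes needed for a tightly packed frame."""
    return width * height * bits_per_pixel // 8


gray = frame_bytes(200, 240, 4)     # the grayscale capture described above
color = frame_bytes(200, 240, 24)   # hypothetical 24-bit colour at the same size

print(gray, f"{gray / HUB_RAM_BYTES:.0%} of hub RAM")
print(color, "fits" if color <= HUB_RAM_BYTES else "does not fit")
```

The grayscale frame takes about three quarters of hub RAM, while the same frame in 24-bit colour would need several times more than the chip has.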

I've been very happy with the ADC08100 to convert the NTSC's analog signal into 4 digital inputs to the Propeller- I wrote a Circuit Cellar article on this- as well as 2 chapters in the "Official Propeller Guide".
Hanno

tdlivings · 2010-10-30 15:51

Hanno
Which issue of Circuit Cellar was your article in?
Thanks
Tom

lanternfish · 2010-10-30 16:17

Hanno wrote: »

...You write about Fisher & Paykel, wiring loom and use "mm"- are you in New Zealand?

Yup - Dunedin

If you turn your camera sideways you may be able to do much of the IO "on the fly"- to make the most of the Prop's limited RAM... So, you would scan one "vertical" line from the camera- and then directly output it to the LEDs. Then, scan the next line and output that to the LEDs, which are then in the new position. This way you wouldn't have to store the whole frame- at most one line!

Yes, we had thought of that. But it doesn't quite suit the project's visual goal. A hell of a lot easier for sure. And could make a good rotating poster bollard.....

... I've been very happy with the ADC08100 to convert the NTSC's analog signal into 4 digital inputs to the Propeller- I wrote a Circuit Cellar article on this- as well as 2 chapters in the "Official Propeller Guide".
Hanno

So that was you. We need to mix stills and live video (albeit with some latency) so will be using a netbook with PowerPoint & wireless (UltraVNC or similar) rather than a PAL/NTSC video feed.

And why do they call a 4.1 an aftershock?

Humanoido · 2010-10-30 18:43

Hanno wrote: »

Lanternfish,
As Jazzed alluded to with his 10ns sample period using 5 cogs- you can get quite high sample rates using multiple cogs. Basically, think of a 4 cylinder 4 stroke engine. Each cylinder only fires once every 4 strokes. However, if you "interleave" them- you can set up your engine so that one of the 4 cylinders fires every stroke. Now apply this to cogs: Interleave your cogs' execution so that:
cog 1 starts sampling at cnt=n+1
cog 2 starts sampling at cnt=n+2
cog 3 starts sampling at cnt=n+3
cog 4 starts sampling at cnt=n+4
You can dynamically create a pasm program that executes a whole bunch of
mov x,ina
...
mov x,ina

So- each cog will take a sample of all 32 bits every 4 clock cycles. And, since you have 4 cogs doing this, you're sampling every clock cycle. That gets you 100Msps or 10ns sample period with a 100MHz clock. Since you have 4 cogs doing this, you can store not 500 samples- but close to 2000- depending on how clever your dynamic program is.

I did this ~3 years ago with ViewPort's QuickSample object- it lets you use your Propeller as a logic state analyzer to monitor the pins of your Propeller while the rest of the Propeller's cogs are running your code. ViewPort is a PC application that graphs this data in a simulated logic analyzer- with timescale, measurements, trigger.... People have used it to troubleshoot all sorts of protocol issues.

I refined the above algorithm for use in the Parallax PropScope. There, it continually samples into cog ram while looking for a trigger. This allows the PropScope to show you samples taken BEFORE the trigger.

See my sig for links- including a review of ViewPort in Robot magazine...

Hanno

What kind of sample rate could you get with 40 props?

Hanno · 2010-10-31 00:54

We've had weekly aftershocks measuring 5.x... In most other places those would and have caused significant loss of life and damage. We've started betting pools to see who can most accurately determine a shock's Richter value. I recently came across the mapping between peak g-force, duration and Richter value- so together with my Propeller powered earthquake monitor (posted earlier) I should do well

Give me a call if/when you're brave enough to visit Christchurch!

Humanoido-
More props give us more sampling cogs. To interleave additional cogs, we can take advantage of speed of light- ~30cm/nanosecond. 4 cogs lets us sample every 10ns. If we wanted to sample every 1ns- for a rate of 1gsamples/second- then we would need 40 cogs- doable in 5 or 6 propellers. The clock wire to the other propellers could be delayed using several feet/meters of bare wire- to delay cogs 4-7 by 1ns, 8..11 by 2ns and so on. Sounds like fun
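
The cog count and wire lengths in this idea can be tabulated with a short sketch. It uses the 30cm per nanosecond figure from the post (signals in real wire travel somewhat slower, so actual lengths would be shorter) and 8 cogs per Propeller. All names are illustrative.

```python
import math

CLOCK_PERIOD_NS = 10      # 100 MHz Propeller clock
COGS_PER_GROUP = 4        # four interleaved cogs cover one clock period
COGS_PER_PROPELLER = 8
CM_PER_NS = 30            # figure from the post; real wire is slower


def delay_plan(target_period_ns):
    """Return (cogs, propellers, [(group, delay_ns, wire_cm), ...])."""
    groups = CLOCK_PERIOD_NS // target_period_ns
    cogs = groups * COGS_PER_GROUP
    propellers = math.ceil(cogs / COGS_PER_PROPELLER)
    wires = [
        (g, g * target_period_ns, g * target_period_ns * CM_PER_NS)
        for g in range(groups)
    ]
    return cogs, propellers, wires


cogs, propellers, wires = delay_plan(1)   # 1 ns target, i.e. 1 Gsample/s
print(cogs, "cogs,", propellers, "Propellers by cog count")
for group, delay_ns, wire_cm in wires:
    print(f"group {group}: {delay_ns} ns delay, about {wire_cm} cm of wire")
```

The Propeller count here is simply 40 cogs divided by 8 per chip, as in the post. Since the cogs in one chip share a clock, a layout in which every delayed group needs its own clock wire may need more chips than that.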

Tom,
Circuit Cellar article is here:
http://www.circuitcellar.com/archives/viewable/224-Sander/index.html

All my articles are listed here:
http://hannoware.com/press.php

Blog is here:
http://blog.hannoware.com

If people are interested I could post a tutorial on connecting the ADC08100 and camera to the Prop...
Hanno

Humanoido · 2010-11-01 05:45

Hanno wrote: »

Humanoido-More props give us more sampling cogs. To interleave additional cogs, we can take advantage of speed of light- ~30cm/nanosecond. 4 cogs lets us sample every 10ns. If we wanted to sample every 1ns- for a rate of 1gsamples/second- then we would need 40 cogs- doable in 5 or 6 propellers. The clock wire to the other propellers could be delayed using several feet/meters of bare wire- to delay cogs 4-7 by 1ns, 8..11 by 2ns and so on. Sounds like fun Hanno

What is the g in 1gsamples/second? So with enough props and some predetermined lengths of light-speed wire, a machine could be assembled to measure samples as fast as a 100 MHz to 1 GHz scope. Is there an upper limit on this?

kwinn · 2010-11-01 06:48

Humanoido wrote: »

What is the g in 1gsamples/second? So with enough props and some predetermined lengths of light-speed wire, a machine could be assembled to measure samples as fast as a 100 MHz to 1 GHz scope. Is there an upper limit on this?

I am wondering if speeds this high really are doable without high speed latches on each prop. How do you capture an input signal that is of shorter duration than the time required for a mov instruction to capture the data?

jazzed · 2010-11-01 08:51

kwinn wrote: »

How do you capture an input signal that is of shorter duration than the time required for a mov instruction to capture the data?

Use multiple COGs. Each COG can sample INA at different times.
Relying on the speed of light for different sample points is interesting.

Sounds like a good Propeller tower application. All you need now is wire,
a meter stick, and some software development skills. You could do 10 GS/s.

Hanno · 2010-11-01 21:08

Humanoido- yes g for giga! 1Billion samples/sec! Woohoo...

When I was programming the ViewPort QuickSample object I was watching the "cnt" register instead of the "ina" register, and was happy to see all clock activity on a 100MHz Propeller. I've also hooked up a 100Msps ADC and gotten a nice signal in ViewPort. I have not hooked up multiple Propellers to take advantage of the "slowness" of light but don't see any reason why it wouldn't work- at least up to 1Gsps. Of course at some point it may be easier to just switch to faster hardware, rather than using more Propellers...

A nice demonstration of the speed of light is to use the function generator of the PropScope to output a signal and measure that signal with Ch1. PropScope can graph both the function generator output and the ch1 input on the same screen- up to 50ns/div. There's a constant delay due mainly to the ADC lag- but you can see and measure the effect of using a longer wire between output and input- 1ns/foot!

kwinn · 2010-11-01 21:50

Guess I did not ask the right question or perhaps did not make it clear. I am basically referring to the fact that the ready pulse would have to be no longer than the time the data is valid, so what is the shortest ready pulse the prop can detect with a waitxxx instruction.

Also, at the very minimum a cog would need the data to be valid long enough for the waitxxx and at least one more instruction (to input the data) to execute. At 100MHz (10ns per clock) that would be 80ns.
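
The 80ns figure corresponds to two 4-clock instructions. The sketch below reproduces it and compares it with the ADC timing given earlier in the thread (25ns per sample, roughly 12ns ready pulse). Counting the wait instruction as 4 clocks follows the post; wait instructions generally need more than that, so this is, if anything, optimistic.

```python
CLOCK_NS = 10                # 100 MHz clock
CLOCKS_PER_INSTRUCTION = 4   # as implied by the 80 ns figure in the post

ADC_SAMPLE_PERIOD_NS = 25    # from the ADC description earlier in the thread
READY_PULSE_NS = 12          # approximate ready pulse width, same source


def min_valid_ns(instructions=2):
    """Time the data must stay valid for a wait plus one read instruction."""
    return instructions * CLOCKS_PER_INSTRUCTION * CLOCK_NS


needed = min_valid_ns()
print(f"needed {needed} ns, data changes every {ADC_SAMPLE_PERIOD_NS} ns, "
      f"ready pulse lasts about {READY_PULSE_NS} ns")
print("per-sample handshake feasible:", needed <= ADC_SAMPLE_PERIOD_NS)
```

On these numbers a cog cannot wait on the ready pulse for every sample, which is why the burst approach synchronizes once and then samples on a fixed schedule.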

If I am wrong here please explain how this would be accomplished.

lanternfish · 2010-11-01 22:27

I'm taking a stab in the dark on answering this, but I am guessing 40ns @ 100MHz. This is assuming 1/2 of the 1st instruction time plus 1/2 of the second instruction time. So trigger clock would be 25MHz. Yet Hanno seems to allude to greater values?

Hanno · 2010-11-01 23:00

We've been referring to burst sample speed. We're assuming all cogs are synchronized to some trigger event- either by using waitcnt or waitpne. Then, a series of mov x,ina instructions sample the ina port every 4 cycles- for every cog. On one propeller you can interleave 4 cogs so that a "mov" is done every cycle. If we assume that the actual sampling time is very short (i.e. something like 1ns), then by interleaving multiple Propellers using delay lines for the common clock you could attain higher rates.

This approach is not able to sample before the trigger- or up to 6 or so cycles after the trigger has been found by waitpne. For the PropScope I had to use yet another cog to allow high speed samples to be taken BEFORE the trigger event.

kwinn · 2010-11-02 09:44

Hanno, that's what I thought. I understood how multiple cogs could be synchronized to read or write at the rate of one r/w per clock cycle, but could not see any way to do that using a pin and waitxx to indicate data was ready before reading it.

Still sometimes falling into old habits of thought when inputting data as my first post in this thread shows.

jazzed · 2010-11-02 10:42

kwinn wrote: »

Hanno, that's what I thought. I understood how multiple cogs could be synchronized to read or write at the rate of one r/w per clock cycle, but could not see any way to do that using a pin and waitxx to indicate data was ready before reading it.

Looking back at your post now it's obvious what you were asking.

Now, I'm not sure if waitxx would miss a pulse between 1 and 4 clock cycles wide unless the pulse starts at almost exactly the same time as waitxx. Maybe kuroneko would know.

Humanoido · 2010-11-02 13:41

Hanno wrote: »

Humanoido- yes g for giga! 1Billion samples/sec! Woohoo...

When I was programming the ViewPort QuickSample object I was watching the "cnt" register instead of the "ina" register, and was happy to see all clock activity on a 100MHz Propeller. I've also hooked up a 100Msps ADC and gotten a nice signal in ViewPort. I have not hooked up multiple Propellers to take advantage of the "slowness" of light but don't see any reason why it wouldn't work- at least up to 1Gsps. Of course at some point it may be easier to just switch to faster hardware, rather than using more Propellers...

A nice demonstration of the speed of light is to use the function generator of the PropScope to output a signal and measure that signal with Ch1. PropScope can graph both the function generator output and the ch1 input on the same screen- up to 50ns/div. There's a constant delay due mainly to the ADC lag- but you can see and measure the effect of using a longer wire between output and input- 1ns/foot!

Hanno, incredible and that's just crazy - I mean it's totally crazy in a very good way! Mainly, any light speed experiments are fantastic. I once determined the speed of light with Jupiter's moons and a small telescope. So I've got that value to work with.
