Maximum Sampling Speed
lanternfish
Posts: 366
Hi - a noob here
A noob question that I am having trouble finding an answer for.
What would be the fastest sampling speed under the following circumstances:
Assembler: detect a sample-ready pulse, read 24 bits on the port(s), and store them in HUB RAM?
Assume 100MHz clock.
Cheers
Comments
10ns sample period for < 500 samples using 5 COGs. 100MByte/s COG to HUB storage rate (75MB/s for 24 bits). Sample and storage rate would be separate in this case for maximum aggregate 50MB/s rate.
Another method can be used to sample and store data all at once but I'm not sure what the back-to-back latency is, so a more precise number is elusive. Maybe Kuroneko knows that answer.
If one could locate the ReadyBit on P0, and the databits on P1..P24, then I think the following would work. It is a somewhat weird way to use the INA shadow register, and I have not tested it:
Begin     shr       ina,ina         wc    'get ready bit into carry, and right justify databits
if_nc     jmp       #Begin                'loop until ready bit high
          wrlong    ina,HubPtr            'put it away
          add       HubPtr,#4             'presumably we need to advance the storage
          waitpne   State,#1              'wait for ready pulse to leave
          jmp       #Begin                'go again
State     long      1
HubPtr    long      BufferStart
This sequence consumes 28 clocks plus extra for the hub write, so it will loop every 32 clocks.
At your suggested 100MHz clock that means 320 nanoseconds per sample, or just over 3 MS/s.
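For anyone who wants to check the arithmetic, here is a minimal Python sketch. The function name is made up for illustration, and the 32-clock loop length is simply the figure quoted in the post, not something measured on hardware.

```python
def loop_timing(clocks_per_loop, clock_hz):
    """Return (loop period in ns, samples per second) for one sample per loop pass."""
    period_ns = clocks_per_loop * 1e9 / clock_hz
    rate_sps = clock_hz / clocks_per_loop
    return period_ns, rate_sps

# 32 clocks per pass at a 100 MHz system clock (both figures from the post)
period_ns, rate_sps = loop_timing(32, 100_000_000)
print(f"{period_ns:.0f} ns per sample, {rate_sps / 1e6:.3f} MS/s")
```

Plugging in a different loop length shows how strongly the rate depends on where the hub write lands.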
Cheers,
Peter (pjv)
As Jazzed alluded with his 10ns sample period using 5 cogs, you can get quite high sample rates using multiple cogs. Basically, think of a 4 cylinder 4 stroke engine. Each cylinder only fires once every 4 strokes. However, if you "interleave" them, you can set up your engine so that one of the 4 cylinders fires every stroke. Now apply this to cogs: interleave your cogs' execution so that:
cog 1 starts sampling at cnt=n+1
cog 2 starts sampling at cnt=n+2
cog 3 starts sampling at cnt=n+3
cog 4 starts sampling at cnt=n+4
You can dynamically create a pasm program that executes a whole bunch of
mov x,ina
...
mov x,ina
So- each cog will take a sample of all 32 bits every 4 clock cycles. And, since you have 4 cogs doing this, you're sampling every clock cycle. That gets you 100Msps or 10ns sample period with a 100MHz clock. Since you have 4 cogs doing this, you can store not 500 samples- but close to 2000- depending on how clever your dynamic program is.
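The interleaving can be sanity-checked with a small Python sketch. This is only a model of the schedule, assuming each `mov x,ina` takes 4 clocks and the cogs start exactly one clock apart; the function name and the default sample count are invented for illustration.

```python
def interleaved_schedule(num_cogs=4, clocks_per_instr=4, samples_per_cog=3):
    """List (clock, cog) pairs for cogs started one clock apart,
    each taking one sample every clocks_per_instr clocks."""
    events = []
    for cog in range(num_cogs):
        for k in range(samples_per_cog):
            events.append((cog + k * clocks_per_instr, cog))
    return sorted(events)

schedule = interleaved_schedule()
clocks = [clock for clock, _ in schedule]
print(clocks)  # every clock from 0 to 11 is covered by exactly one cog
```

With fewer cogs than clocks per instruction, the same model shows gaps in the covered clocks.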
I did this ~3 years ago with ViewPort's QuickSample object. It lets you use your Propeller as a logic state analyzer to monitor the pins of your Propeller while the rest of the Propeller's cogs are running your code. ViewPort is a PC application that graphs this data in a simulated logic analyzer, with timescale, measurements, trigger.... People have used it to troubleshoot all sorts of protocol issues.
I refined the above algorithm for use in the Parallax PropScope. There, it continually samples into cog ram while looking for a trigger. This allows the PropScope to show you samples taken BEFORE the trigger.
See my sig for links- including a review of ViewPort in Robot magazine...
Hanno
What exactly is the timing of your sample pulse relative to the data on the 24 input lines? IOW, is the sample ready on a rising edge, falling edge, or ... ? How wide is the pulse?
-Phil
Sample on rising edge. Timing pulse is 20 - 30ns.
Cheers
-Phil
Each burst will be 800 (24 bit) values.
Time between bursts is 6.4us
Having found a bit more info in the FAQ section (and other areas) I think the Prop will be suitable, with tight assembler code, to do the following:
800 x 600 SVGA signal applied to Analog Devices AD9985A (or similar).
Prop reads in each line (800 24-bit values) using Hanno's 4 cog suggestion (?) and stores in HUB RAM (video buffer).
5th cog decimates two lines when they are completed from 1600 to 400 24-bit values. Also sets buffer pointer(s). Or alternatively decimates 800 to 400 24-bit values as sampling is progressing with final 2 to 1 line reduction after last 1600th 24-bit value read.
The decimation is an averaging of line(x)pixel(y) + line(x)pixel(y+1) + line(x+1)pixel(y) + line(x+1)pixel(y+1)
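A minimal Python sketch of that 2x2 averaging, done per colour channel. It assumes each pixel is packed as 0xRRGGBB, which the post does not specify, and the function names are hypothetical.

```python
def average_2x2(p00, p01, p10, p11):
    """Average four 24-bit pixels (assumed 0xRRGGBB) channel by channel."""
    result = 0
    for shift in (16, 8, 0):  # R, G, B
        total = sum((p >> shift) & 0xFF for p in (p00, p01, p10, p11))
        result |= (total // 4) << shift  # integer divide, rounds down
    return result

def decimate_line_pair(line_a, line_b):
    """Reduce two equal-length scan lines to one line of half the width."""
    return [
        average_2x2(line_a[i], line_a[i + 1], line_b[i], line_b[i + 1])
        for i in range(0, len(line_a) - 1, 2)
    ]

out = decimate_line_pair([0x101010] * 800, [0x303030] * 800)
print(len(out), hex(out[0]))
```

Averaging each channel separately keeps a carry from one colour from spilling into the next, which a plain add and shift on the whole 24-bit value would not.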
6th cog? outputs this in SPI mode to LED drivers (type yet to be decided)
Of course there is insufficient COG RAM to digitize and decimate a full 800 x 600 frame to 400 x 300 so I am not sure how many decimated video lines can be stored in HUB RAM for SPI output.
And it gets a little messier as the 6th cog will not be outputting 24-bit values from 1 - 400, 401 - 800 but 1,401,801 .... to max lines stored i.e. outputs pixels in columns rather than rows.
The result, a POV rotating 360deg video display. Or complete failure.....
I hope all that makes sense.
-Phil
Another option is to physically route the ADC RGB outputs to the Prop inputs such that the RGB values are 3 x 5 bits but are read as 3 x 8 bit:
(for each of the ADC RGB)
ADC bit   Prop pin
          R     G     B
D7        P4    P12   P20
D6        P3    P11   P19
D5        P2    P10   P18
D4        P1    P9    P17
D3        P0    P8    P16
D2        nc
D1        nc
D0        nc
All other Prop Pn tied to 0V (via pulldown?)
As this decimation is a simple form of bilinear interpolation there will be some degradation of the final RGB values.
I'm not sure if I have this correct but believe I can store at least 30 decimated lines in HUB RAM.
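As a rough check, here is a Python sketch of unpacking the 3 x 5 bit wiring above and of the line-storage estimate. The 16-bit word layout (R in the top bits), the 2 bytes per pixel, and the function names are assumptions for illustration; how much of the 32 KB of HUB RAM is actually free for the buffer depends on the rest of the program.

```python
def pack_rgb555(ina):
    """Pull R (P0..P4), G (P8..P12), B (P16..P20) out of a 32-bit pin read
    and pack them into one 15-bit value (layout RRRRRGGGGGBBBBB is assumed)."""
    r = ina & 0x1F
    g = (ina >> 8) & 0x1F
    b = (ina >> 16) & 0x1F
    return (r << 10) | (g << 5) | b

def lines_that_fit(buffer_bytes, pixels_per_line=400, bytes_per_pixel=2):
    """Whole decimated lines that fit in a buffer of the given size."""
    return buffer_bytes // (pixels_per_line * bytes_per_pixel)

print(hex(pack_rgb555(0x0001001F)))   # R = 31, G = 0, B = 1
print(lines_that_fit(32 * 1024))      # upper bound if all 32 KB were free
print(30 * 400 * 2)                   # bytes needed for 30 lines
```

Under these assumptions 30 lines need 24,000 bytes, so the estimate holds only if roughly three quarters of HUB RAM can be given to the buffer.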
I wrote a program that captures groups of pixels from a 640x480 x 60Hz signal and sends it out to LED arrays. I can't share my code but I can describe it a bit. I had to overclock the prop to 100.7MHz to have one instruction cycle per pixel. I captured using one cog but a single cog doesn't have enough RAM to grab every pixel in even a scan line of 400 pixels. I think that you would also need two serial cogs going to get all of that data out but I never really figured out what portion of my serial stream I was using since I wasn't using all of the pixels in the image. Something that I did to help speed things up was to capture in 16-bit color rather than 24 bit. This allowed me to pack the data into hub RAM and get it out of the serial port faster.
Now that I think about it, you *might* be able to clock your prop at 72MHz and do some fancy timing so that you use two cogs to sample at different times. This would allow you to sample at every pixel. One of the cogs would output the clock for the ADC. That is the shaky part. The data sheet says that the PLL for the counters can only accept up to an 8 MHz signal. You would need to feed it a 9MHz signal to output 144 MHz. This would probably work. That is 12% faster than the PLL is rated for but I was running my prop at 100.7 MHz, which is 25% faster than the system clock PLL is rated for. Presumably they are the same circuit.
Make sure to design using a four-layer circuit board with a big ground plane. My first version didn't do this and although it worked, it had some noise in the video.
Good luck.
Hanno suggested in post #5 to run 4 cogs interleaved to overcome the sampling problem.
And reducing from 24 (3 x 8) to 15 (3 x 5) bit sampling will allow more display lines to be stored in HUB RAM.
As the final 400 x 300 output does not have to have an excellent image we are not too worried about noise at this point.
What LED drivers did you use on your project?
Rather than matrix panels as per your project, this project will be a 'column' of LEDs rotating at 60 rpm.
There will in fact be two columns with an approx 10mm horizontal offset and a 3mm vertical offset. This provides an effective pixel spacing of 3mm.
The main PCB(s) will be the ADC, and Prop/LED drivers. A wiring loom will run out to the LED boards.
The rotating drum motor is from a Fisher & Paykel Smartdrive (701). Another 701 motor sits on top of the drum and is wired as an alternator and will power the LED electronics and a netbook that will provide the video source. The netbook will be controlled wirelessly.
The rotating drum is currently under construction.
Cheers
You write about Fisher & Paykel, wiring loom and use "mm". Are you in New Zealand?
If you turn your camera sideways you may be able to do much of the I/O "on the fly" to make the most of the Prop's limited RAM. So, you would scan one "vertical" line from the camera and then directly output it to the LEDs. Then, scan the next line and output that to the LEDs, which are then in the new position. This way you wouldn't have to store the whole frame, at most one line!
With one cog my ViewPort PropCapture object grabs ~200x240 pixels at 4 bits into HUB RAM, however only grayscale. This consumes most of the 32 KB, so I haven't pursued color. The cog can also do on-the-fly detection of the brightest spot, useful for tracking a laser pointer.
I've been very happy with the ADC08100 to convert the NTSC's analog signal into 4 digital inputs to the Propeller- I wrote a Circuit Cellar article on this- as well as 2 chapters in the "Official Propeller Guide".
Hanno
Which issue of Circuit Cellar was your article in?
Thanks
Tom
Yup - Dunedin
Yes, we had thought of that. But it doesn't quite suit the project's visual goal. A hell of a lot easier for sure. And could make a good rotating poster bollard.....
So that was you. We need to mix stills and live video (albeit with some latency) so will be using a netbook with PowerPoint and wireless (UltraVNC or similar) rather than a PAL/NTSC video feed.
And why do they call a 4.1 an aftershock?
Give me a call if/when you're brave enough to visit Christchurch!
Humanoido-
More props give us more sampling cogs. To interleave additional cogs, we can take advantage of the speed of light, ~30cm/nanosecond. 4 cogs let us sample every 10ns. If we wanted to sample every 1ns, for a rate of 1 Gsample/second, then we would need 40 cogs, doable in 5 or 6 Propellers. The clock wire to the other Propellers could be delayed using several feet/meters of bare wire, to delay cogs 4-7 by 1ns, 8..11 by 2ns and so on. Sounds like fun
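The numbers in that idea can be worked through with a short Python sketch. It takes the ~30 cm/ns figure at face value (signals in real wire travel somewhat slower, so the lengths are upper bounds), and the function names are made up for illustration.

```python
import math

def cogs_needed(clock_hz, clocks_per_sample, target_period_ns):
    """Cogs required so that interleaved samples land target_period_ns apart."""
    cog_period_ns = clocks_per_sample * 1e9 / clock_hz
    return math.ceil(cog_period_ns / target_period_ns)

def wire_length_cm(delay_ns, cm_per_ns=30.0):
    """Wire length for a given clock delay, assuming ~30 cm per nanosecond."""
    return delay_ns * cm_per_ns

cogs = cogs_needed(100_000_000, 4, 1.0)  # 4-clock mov at 100 MHz, 1 ns target
props = math.ceil(cogs / 8)              # 8 cogs per Propeller
print(cogs, props, wire_length_cm(2))
```

This gives the 40 cogs mentioned above, which fit in 5 Propellers only if every cog on each chip is used for sampling.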
Tom,
Circuit Cellar article is here:
http://www.circuitcellar.com/archives/viewable/224-Sander/index.html
All my articles are listed here:
http://hannoware.com/press.php
Blog is here:
http://blog.hannoware.com
If people are interested I could post a tutorial on connecting the ADC08100 and camera to the Prop...
Hanno
I am wondering if speeds this high really are doable without high speed latches on each prop. How do you capture an input signal that is of shorter duration than the time required for a mov instruction to capture the data?
Relying on the speed of light for different sample points is interesting.
Sounds like a good Propeller tower application. All you need now is wire, a meter stick, and some software development skills. You could do 10 GS/s
When I was programming the ViewPort QuickSample object I was watching the "cnt" register instead of the "ina" register, and was happy to see all clock activity on a 100MHz Propeller. I've also hooked up a 100Msps ADC and gotten a nice signal in ViewPort. I have not hooked up multiple Propellers to take advantage of the "slowness" of light but don't see any reason why it wouldn't work, at least up to 1 Gsps. Of course at some point it may be easier to just switch to faster hardware, rather than using more Propellers...
A nice demonstration of the speed of light is to use the function generator of the PropScope to output a signal and measure that signal with Ch1. PropScope can graph both the function generator output and the ch1 input on the same screen- up to 50ns/div. There's a constant delay due mainly to the ADC lag- but you can see and measure the effect of using a longer wire between output and input- 1ns/foot!
Also, at the very minimum a cog would need the data to be valid long enough for the waitxxx and at least one more instruction (to input the data) to execute. At 100MHz (10 ns per clock) that would be 80 ns.
If I am wrong here please explain how this would be accomplished.
This approach is not able to sample before the trigger- or up to 6 or so cycles after the trigger has been found by waitpne. For the PropScope I had to use yet another cog to allow high speed samples to be taken BEFORE the trigger event.
Still sometimes falling into old habits of thought when inputting data as my first post in this thread shows.
Now, I'm not sure if waitxx would miss a pulse between 1 and 4 clock cycles wide unless the pulse starts at almost exactly the same time as waitxx. Maybe kuroneko would know.