streamer conflicts

rjo__ · 2016-04-23 18:54

I'll be the first to admit that I don't understand the egg-beater, and I have gotten pretty sloppy using the streamers...
right up until I painted myself into a corner:)

What exactly happens if I use a streamer to write to the exact location that another streamer is already reading from, or vice versa?

It looks to me as though the write waits and then occasionally goes to the wrong place.

Is this correct?

cgracey · 2016-04-23 19:16

All hub accesses are atomic, so there's no problem of conflict. Remember, there are 16 RAMs and 16 cogs, and each cog gets a turn at one of the RAMs on each clock.

Electrodude · 2016-04-23 19:44

On a given clock T, cog N can only access memory addresses X for which (X >> 2) mod 16 = (T + N) mod 16, although the rotation might not actually be tied to the main system clock. This is implemented through 16 separate RAM blocks: RAM 0 contains all addresses ending in %0000_xx, RAM 1 contains all addresses ending in %0001_xx, ..., RAM 15 contains all addresses ending in %1111_xx. Since only one cog has access to a given RAM block at any point in time, there can be no conflicts. If you tell the streamer to access some memory, it waits for the right block to come around before accessing it. If two cogs tell their streamers to access the same location at the same time, the accesses will happen in the order in which the streamers actually get access to the appropriate RAM block.
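A rough way to play with the rule above is a few lines of Python. This is only a model of the description in this post, taking the cog number as the offset added to the clock; the function names are made up for illustration, and real hardware adds pipeline latency that is not modelled here.

```python
NUM_SLICES = 16  # 16 hub RAM blocks, as described in the post above


def slice_of(addr):
    """RAM block holding byte address addr: address bits [5:2]."""
    return (addr >> 2) % NUM_SLICES


def slice_visible(cog, clock):
    """RAM block that cog can reach on this clock, per (T + N) mod 16."""
    return (clock + cog) % NUM_SLICES


def clocks_until_access(cog, addr, clock):
    """Clocks a request issued at `clock` waits until its block comes around."""
    return (slice_of(addr) - slice_visible(cog, clock)) % NUM_SLICES


if __name__ == "__main__":
    # Cog 3 asks for address $0004 (block 1) on clock 0, when it sees block 3.
    print(clocks_until_access(cog=3, addr=0x0004, clock=0))
```

In this model the worst case is a 15-clock wait and the best case is zero, which is the "waits for the right block to come around" behaviour described above.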

rjo__ · 2016-04-24 00:31

A problem is occurring when I push the PropCam in parallel/snapshot mode, above 275fps.
I had myself convinced it was a streamer issue. Now I am convinced that this is probably the limit
for this mode.

Thanks

evanh · 2016-04-24 00:35

Electrodude wrote: »

If two cogs tell their streamers to access the same location at the same time, the accesses will happen in the order the streamers actually get access to the appropriate RAM block.

I'd guess the SysCounter is generating the block address. And along with the added CogID, a single rotation of the Hub provides access to the following low 6-bit addresses:

CNT(mod16)         0      1      2      3      4      5      6      7      8      9      A      B      C      D      E      F
Cog0(+CNT)  ->  0000xx,0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx
Cog1(+CNT)  ->  0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx,0000xx
Cog2(+CNT)  ->  0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx,0000xx,0001xx
Cog3(+CNT)  ->  0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx,0000xx,0001xx,0010xx
...
CogE(+CNT)  ->  1110xx,1111xx,0000xx,0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx
CogF(+CNT)  ->  1111xx,0000xx,0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx

Every single one of those 256 address combinations occurs in one 16-SysClock Hub rotation.

Chip to verify the order ...
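For anyone who wants to check the table without reading 256 entries by eye, here is a small Python sketch that rebuilds it from the (cog + CNT) mod 16 guess and confirms that no two cogs share a RAM on any clock. It only tests the guess as written in this post, not the silicon, and the helper names are invented for the example.

```python
NUM_COGS = 16


def low_address_bits(cog, cnt):
    """Low 6 address bits a cog can reach when CNT mod 16 == cnt, e.g. '0001xx'."""
    return format((cog + cnt) % NUM_COGS, "04b") + "xx"


def rotation_table():
    """One row per cog, one column per CNT(mod16) value."""
    return [[low_address_bits(cog, cnt) for cnt in range(NUM_COGS)]
            for cog in range(NUM_COGS)]


def no_conflicts(table):
    """True if, on every clock, the 16 cogs are connected to 16 different RAMs."""
    return all(len({row[cnt] for row in table}) == NUM_COGS
               for cnt in range(NUM_COGS))


if __name__ == "__main__":
    table = rotation_table()
    for cog, row in enumerate(table):
        print("Cog%X(+CNT)  ->  %s" % (cog, ",".join(row)))
    print("conflict-free:", no_conflicts(table))
```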

evanh · 2016-04-24 00:41

So, for example, CogF trails Cog0 by a single clock; except for addresses with %xxxxxxxxxxxxxx1111xx. CogF then leads Cog0 by 15 clocks.

EDIT: I suppose it's not quite right to view it that way. Each Cog most likely starts its own rotation at 0000xx. So they are all phase shifted in time.

cgracey · 2016-04-24 01:58

evanh wrote: »
Electrodude wrote: »

If two cogs tell their streamers to access the same location at the same time, the accesses will happen in the order the streamers actually get access to the appropriate RAM block.

I'd guess the SysCounter is generating the block address. And along with the added CogID, a single rotation of the Hub provides access to the following low 6-bit addresses:
CNT(mod16)         0      1      2      3      4      5      6      7      8      9      A      B      C      D      E      F
Cog0(+CNT)  ->  0000xx,0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx
Cog1(+CNT)  ->  0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx,0000xx
Cog2(+CNT)  ->  0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx,0000xx,0001xx
Cog3(+CNT)  ->  0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx,0000xx,0001xx,0010xx
...
CogE(+CNT)  ->  1110xx,1111xx,0000xx,0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx
CogF(+CNT)  ->  1111xx,0000xx,0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx
Every single one of those 256 address combinations occurs in one 16-SysClock Hub rotation.

Chip to verify the order ...

That looks right.

Cluso99 · 2016-04-24 02:32

Chip,
Is the order correct?
ie I thought/expected Cog0 reads 0000xx, next clock Cog1 reads 0000xx, following clock Cog2 reads 0000xx, etc ???
From the diagram, it appears reversed...
ie Cog15 reads 0000xx, next clock Cog14 reads 0000xx, following clock Cog13 reads 0000xx, etc.
(reads means access)

rjo__ · 2016-04-24 03:10

Cluso

There is an outside chance that there is a problem here. I've done what I could... I think we are fine,
but I would like to see a real stress test.

Rich

cgracey · 2016-04-24 03:11

Cluso99 wrote: »

Chip,
Is the order correct?
ie I thought/expected Cog0 reads 0000xx, next clock Cog1 reads 0000xx, following clock Cog2 reads 0000xx, etc ???
From the diagram, it appears reversed...
ie Cog15 reads 0000xx, next clock Cog14 reads 0000xx, following clock Cog13 reads 0000xx, etc.
(reads means access)

On each clock cycle, each cog is connected to one of the 16 RAMs. Each cog sees an ascending long address from clock to clock, as he gets access to the next of the 16 RAMs.

From the cogs' perspectives they are each accessing the next-higher RAM on each clock. From the RAMs' perspectives, they are each being accessed by the next lower cog on each clock.

Cluso99 · 2016-04-24 05:43

I had never really thought about it before.

If you want to pass data between cogs via hub, it would therefore be better to have a higher cog pass to a lower cog to minimise the delay.

This is not the same as the P1 where the next higher cog gets access to the hub after the previous cog. So the quickest pass is from a lower cog to a higher cog, which will be the reverse in P2.
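The higher-to-lower point can be put into numbers with a short Python sketch. It assumes the rotation described earlier in the thread (each RAM is visited by the next lower cog on each clock) and counts only slot spacing, ignoring instruction and pipeline latency; `handoff_delay` is a made-up helper, not anything from the P2 tools.

```python
NUM_COGS = 16


def access_clock(cog, ram):
    """Clock (mod 16) at which `cog` is connected to RAM block `ram`."""
    return (ram - cog) % NUM_COGS


def handoff_delay(writer, reader, ram=0):
    """Clocks between the writer's slot and the reader's next slot on one RAM.

    A result of 16 means writer and reader are the same cog, which has to
    wait a full rotation to see that RAM again.
    """
    delay = (access_clock(reader, ram) - access_clock(writer, ram)) % NUM_COGS
    return delay or NUM_COGS


if __name__ == "__main__":
    print("cog 5 -> cog 4:", handoff_delay(5, 4), "clock(s)")
    print("cog 4 -> cog 5:", handoff_delay(4, 5), "clock(s)")
```

Under these assumptions the delay is the same for every RAM block, so it depends only on the two cog numbers.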

rjo__ · 2016-04-25 16:22

I figured out my problem and it had nothing to do with streamers:)

I won't show you the original code... because it would take a shaggy dog story to explain it all,

BUT... look how nice the PASM2 code ended up being to acquire images from a PropCam.

PropCam_Acq_Parallel
            mov acq_lpf,#101
            mov acq_line_ptr,##_Screen_Buffer
            wrfast #0,##_Screen_Buffer
            mov PCA_timer1,##16_318  
            getct PCA_timer2
            setb dira,#_PROPCAM_VSYNC
            setb outa,#_PROPCAM_VSYNC
            waitx ##20
            clrb outa,#_PROPCAM_VSYNC  
            waitx ##_integration_time_plus            'PropCam integration time plus hblank = acquisition time
acq_line          
            rep @end_line_ACQa,##127       'readout
            testb ina,#_PropCam_tap wc
       if_c mov parallel_pixel,ina
       if_c shl parallel_pixel,#4
       if_c wfbyte  parallel_pixel

end_line_ACQa

            add acq_line_ptr,#320
            wrfast #0,acq_line_ptr
            waitx #310
            djnz acq_lpf ,#acq_line

Have you ever seen anything so beautiful!!!

500+ FPS

rjo__ · 2016-04-25 16:29

I think it is going to be always true that to do something with the P2 that you previously did with a P1... it is going to be simpler and easier to understand. If you want to do something that you couldn't do with a P1, then it is going to be marginally more complicated... this means that the entry path for new users should be better... and the exit point for frustrated engineers much farther down the trail.

mhmm

cgracey · 2016-04-25 17:56

rjo__ wrote: »

I figured out my problem and it had nothing to do with streamers:)

I won't show you the original code... because it would take a shaggy dog story to explain it all,

BUT... look how nice the PASM2 code ended up being to acquire images from a PropCam.

PropCam_Acq_Parallel
            mov acq_lpf,#101
            mov acq_line_ptr,##_Screen_Buffer
            wrfast #0,##_Screen_Buffer
            mov PCA_timer1,##16_318  
            getct PCA_timer2
            setb dira,#_PROPCAM_VSYNC
            setb outa,#_PROPCAM_VSYNC
            waitx ##20
            clrb outa,#_PROPCAM_VSYNC  
            waitx ##_integration_time_plus            'PropCam integration time plus hblank = acquisition time
acq_line          
            rep @end_line_ACQa,##127       'readout
            testb ina,#_PropCam_tap wc
       if_c mov parallel_pixel,ina
       if_c shl parallel_pixel,#4
       if_c wfbyte  parallel_pixel

end_line_ACQa

            add acq_line_ptr,#320
            wrfast #0,acq_line_ptr
            waitx #310
            djnz acq_lpf ,#acq_line

Have you ever seen anything so beautiful!!!

500+ FPS

I've got a question: How come you are only saving nibbles via wfbyte, and not whole bytes? You are testing that PropCam_tap signal and if it's low, you don't do anything. If it's high, you do a WFBYTE with {ina[3:0], 4'b0000}.

rjo__ · 2016-04-25 20:30

PropCam only exposes 4 of the sensor's data pins. In Serial Snapshot mode, PropCam uses two of these pins and we can get 8bit data... one data bit per clock, but in Parallel SnapShot we get a nibble per clock. The other nibble isn't available.
My next step is to plug another PropCam into the available nibble and get stereo imaging going at 500FPS:)

For lower frame rates (60-90FPS), the PropCam's acquisition mode isn't critical, but I wanted max FPS for work I am planning with the Elev8's flight controller.

Phil says that he has plenty of un-mounted sensors, so it will be possible to get 8bits at max FPS.

In my experience this is probably not necessary... the PropCam is really nice just the way it is,
but there might be a commercial application that would require 8-bits at 500fps for process control... who knows?
For the Elev8 stuff, I expect the unmodified PropCam to be perfect.

cgracey · 2016-04-25 22:08

rjo__ wrote: »

PropCam only exposes 4 of the sensor's data pins. In Serial Snapshot mode, PropCam uses two of these pins and we can get 8bit data... one data bit per clock, but in Parallel SnapShot we get a nibble per clock. The other nibble isn't available.
My next step is to plug another PropCam into the available nibble and get stereo imaging going at 500FPS:)

For lower frame rates (60-90FPS), the PropCam's acquisition mode isn't critical, but I wanted max FPS for work I am planning with the Elev8's flight controller.

Phil says that he has plenty of un-mounted sensors, so it will be possible to get 8bits at max FPS.

In my experience this is probably not necessary... the PropCam is really nice just the way it is,
but there might be a commercial application that would require 8-bits at 500fps for process control... who knows?
For the Elev8 stuff, I expect the unmodified PropCam to be perfect.

I see.

What about this PropCam_tap signal? Why is it not always true, since I see your REP is set to 127?

Here's how you could do two nibbles per WFBYTE, if you knew PropCam_tap was a go:

	REP	@.end,#127

	ROLNIB	parallel_pixel,ina,#0	'2
	NOP				'2
	ROLNIB	parallel_pixel,ina,#0	'2
	WFBYTE	parallel_pixel		'2	total = 8 clocks
.end

That would be twice as fast. It assumes that PropCam_tap is high, though.
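To see what ends up in hub RAM in each case, here is a small Python model of the two packing schemes. It assumes the camera nibble sits on ina[3:0], that ROLNIB D,S,#N shifts D left four bits and inserts nibble N of S at the bottom, and that WFBYTE writes the low byte of its operand; the function names are invented for the example.

```python
MASK32 = 0xFFFFFFFF


def rolnib(d, s, n):
    """Model of ROLNIB D,S,#N: shift D left 4 bits, insert nibble N of S."""
    nibble = (s >> (4 * n)) & 0xF
    return ((d << 4) | nibble) & MASK32


def single_nibble_byte(ina):
    """Byte written by the original loop: {ina[3:0], 4'b0000}."""
    return (ina << 4) & 0xFF


def packed_byte(first_ina, second_ina, d=0):
    """Byte written after two ROLNIBs: first sample high, second sample low."""
    d = rolnib(d, first_ina, 0)
    d = rolnib(d, second_ina, 0)
    return d & 0xFF  # WFBYTE only stores the low byte


if __name__ == "__main__":
    print(hex(single_nibble_byte(0xA)))
    print(hex(packed_byte(0xA, 0x5)))
```

Whatever was left in parallel_pixel beforehand is shifted out of the low byte after two ROLNIBs, so in this model the register does not need clearing between bytes.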

rjo__ · 2016-04-26 22:01

I've been on the road with the rugrat.

It is all timing. I am throwing away a pixel, but I didn't have time to fiddle further:)
I think the answer is in your first comment... I am using the wrong part of the clock.

PropCam_tap: I've set up a 10MHz NCO to drive the PropCam, which is actually sitting on a Propeller Activity Board. PropCam_tap is the monitor for this NCO... I didn't know how else to do it:) I suspect that there is a smart pin that will do all of the above if I feed it right:)

The really, really nice thing about the PropCam is that it uses a global shutter, so all of the readout sequences are exactly the same... two versions, but it doesn't matter what the integration time is, the readout time is fixed and clock accurate.
Phil has an eye for hardware that is hard to beat.

That extra instruction (the NOP) might come in very handy.

Thank you!
