streamer conflicts — Parallax Forums

I'll be the first to admit that I don't understand the egg-beater, and I have gotten pretty sloppy using the streamers...
right up until I painted myself into a corner:)

What exactly happens if I use a streamer to write to the exact location that another streamer is already reading from, and vice versa?

It looks to me as though the write waits and then occasionally goes to the wrong place.

Is this correct?

Comments

  • cgracey Posts: 14,152
    All hub accesses are atomic, so there's no problem of conflict. Remember, there are 16 RAMs and 16 cogs, and each cog gets a turn at one of the RAMs on each clock.
  • Electrodude Posts: 1,657
    edited 2016-04-23 19:45
    On a given clock T, cog N can only access memory addresses X for which (X >> 2) mod 16 = (T+N) mod 16, although the rotation might not actually be tied to the main system clock. This is implemented through 16 separate RAM blocks: RAM 0 contains all addresses ending in %0000_xx, RAM 1 contains all addresses ending in %0001_xx, ..., RAM 15 contains all addresses ending in %1111_xx. Since only one cog has access to a given RAM block at any point in time, there can be no conflicts. If you tell the streamer to access some memory, it waits for the right block to come around before accessing it. If two cogs tell their streamers to access the same location at the same time, the accesses will happen in the order the streamers actually get access to the appropriate RAM block.
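
The waiting rule described above can be sketched as a small Python model. This is only an illustration: it assumes the block a cog sees at clock T is exactly (T + N) mod 16 (the real phase relative to the system counter is not confirmed here), and the function names are made up for the example.

```python
NUM_SLICES = 16  # 16 hub RAM blocks, one long wide each

def slice_of(addr):
    """RAM block that holds byte address addr (address bits 5..2)."""
    return (addr >> 2) % NUM_SLICES

def slice_seen_by(cog, t):
    """RAM block that cog is connected to at clock t (assumed phase)."""
    return (t + cog) % NUM_SLICES

def clocks_until_access(cog, addr, t):
    """How many clocks cog must wait at clock t before addr's block comes around."""
    return (slice_of(addr) - slice_seen_by(cog, t)) % NUM_SLICES

# Cog 0 at clock 0 is already on block 0, so address $40 needs no wait.
print(clocks_until_access(0, 0x40, 0))
# Cog 3 at clock 0 is on block 3; block 1 (address $04) is 14 clocks away.
print(clocks_until_access(3, 0x04, 0))
```

The wait is never more than 15 clocks in this model, which is why an access can be delayed but cannot collide with another cog's access.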
  • rjo__ Posts: 2,114

    A problem is occurring when I push the PropCam in parallel/snapshot mode, above 275fps.
    I had myself convinced it was a streamer issue. Now I am convinced that this is probably the limit
    for this mode.


    Thanks

  • evanh Posts: 15,915
    edited 2016-04-24 01:12
    Electrodude wrote: »
    If two cogs tell their streamers to access the same location at the same time, the accesses will happen in the order the streamers actually get access to the appropriate RAM block.

    I'd guess the SysCounter is generating the block address. With the CogID added to it, a single rotation of the Hub provides access to the following low 6-bit addresses:
    CNT(mod16)         0      1      2      3      4      5      6      7      8      9      A      B      C      D      E      F
    Cog0(+CNT)  ->  0000xx,0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx
    Cog1(+CNT)  ->  0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx,0000xx
    Cog2(+CNT)  ->  0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx,0000xx,0001xx
    Cog3(+CNT)  ->  0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx,1111xx,0000xx,0001xx,0010xx
    ...
    CogE(+CNT)  ->  1110xx,1111xx,0000xx,0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx
    CogF(+CNT)  ->  1111xx,0000xx,0001xx,0010xx,0011xx,0100xx,0101xx,0110xx,0111xx,1000xx,1001xx,1010xx,1011xx,1100xx,1101xx,1110xx
    
    Every single one of those 256 address combinations occurs in one 16-SysClock Hub rotation.

    Chip to verify the order ...
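
A quick way to check the "256 combinations" claim is to build the table in Python. This is a sketch under the same guess as the table above (block = CNT + CogID, mod 16); `rotation_table` is a name invented for this example.

```python
NUM_COGS = 16
NUM_SLICES = 16

def rotation_table():
    """table[cog][t] = RAM block (address bits 5..2) that cog reaches at clock t."""
    return [[(t + cog) % NUM_SLICES for t in range(NUM_SLICES)]
            for cog in range(NUM_COGS)]

table = rotation_table()

# Every (cog, block) pair shows up once per rotation: 16 * 16 = 256.
pairs = {(cog, s) for cog, row in enumerate(table) for s in row}
assert len(pairs) == 256

# On any single clock, no two cogs are connected to the same block.
for t in range(NUM_SLICES):
    assert len({table[cog][t] for cog in range(NUM_COGS)}) == NUM_COGS

# Print a few rows in the same style as the table above.
for cog in (0, 1, 15):
    print(f"Cog{cog:X}(+CNT)  ->  " + ",".join(f"{s:04b}xx" for s in table[cog]))
```

The second assertion is the "no conflict" property: each column of the table is a permutation of the 16 blocks.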

  • evanh Posts: 15,915
    edited 2016-04-24 01:03
    So, for example, CogF trails Cog0 by a single clock; except for addresses with %xxxxxxxxxxxxxx1111xx. CogF then leads Cog0 by 15 clocks.

    EDIT: I suppose it's not quite right to view it that way. Each Cog most likely starts its own rotation at 0000xx. So they are all phase shifted in time.
  • cgracey Posts: 14,152
    evanh wrote: »
    Every single one of those 256 address combinations occurs in one 16-SysClock Hub rotation.

    Chip to verify the order ...

    That looks right.
  • Cluso99 Posts: 18,069
    Chip,
    Is the order correct?
    ie I thought/expected Cog0 reads 0000xx, next clock Cog1 reads 0000xx, following clock Cog2 reads 0000xx, etc ???
    From the diagram, it appears reversed...
    ie Cog15 reads 0000xx, next clock Cog14 reads 0000xx, following clock Cog13 reads 0000xx, etc.
    (reads means access)
  • rjo__ Posts: 2,114
    Cluso

    There is an outside chance that there is a problem here. I've done what I could... I think we are fine,
    but I would like to see a real stress test.

    Rich
  • cgracey Posts: 14,152
    Cluso99 wrote: »
    Chip,
    Is the order correct?
    ie I thought/expected Cog0 reads 0000xx, next clock Cog1 reads 0000xx, following clock Cog2 reads 0000xx, etc ???
    From the diagram, it appears reversed...
    ie Cog15 reads 0000xx, next clock Cog14 reads 0000xx, following clock Cog13 reads 0000xx, etc.
    (reads means access)

    On each clock cycle, each cog is connected to one of the 16 RAMs. Each cog sees an ascending long address from clock to clock, as he gets access to the next of the 16 RAMs.

    From the cogs' perspectives they are each accessing the next-higher RAM on each clock. From the RAMs' perspectives, they are each being accessed by the next lower cog on each clock.
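
The two perspectives can be put side by side in a short Python sketch. It assumes the same mapping as the table earlier in the thread (block = clock + cog, mod 16); both function names are hypothetical.

```python
NUM_COGS = 16

def slice_at_cog(cog, t):
    """Cog's view: which RAM block the cog is connected to at clock t."""
    return (t + cog) % NUM_COGS

def cog_at_slice(ram, t):
    """RAM's view: which cog is connected to block ram at clock t."""
    return (ram - t) % NUM_COGS

# Cog 5 sees ascending blocks from clock to clock.
print([slice_at_cog(5, t) for t in range(4)])   # [5, 6, 7, 8]
# RAM 0 sees descending cogs from clock to clock.
print([cog_at_slice(0, t) for t in range(4)])   # [0, 15, 14, 13]
```

The second function is just the first one solved for the cog, which is why the order looks reversed when read from the RAM's side.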
  • Cluso99 Posts: 18,069
    edited 2016-04-24 11:29
    I had never really thought about it before.

    If you want to pass data between cogs via hub, it would therefore be better to have a higher cog pass to a lower cog to minimise the delay.

    This is not the same as the P1 where the next higher cog gets access to the hub after the previous cog. So the quickest pass is from a lower cog to a higher cog, which will be the reverse in P2.
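
A rough Python model of that handoff delay, for illustration only. It assumes the block a cog sees is (clock + cog) mod 16 and counts just the slot-to-slot gap on one RAM block; instruction and pipeline latency are ignored, and `handoff_clocks` is an invented name.

```python
NUM_COGS = 16

def handoff_clocks(writer_cog, reader_cog):
    """Clocks between the writer's slot on a RAM block and the reader's
    next slot on that same block (0 means they are the same cog)."""
    return (writer_cog - reader_cog) % NUM_COGS

print(handoff_clocks(5, 4))   # higher cog to next lower cog: 1
print(handoff_clocks(4, 5))   # lower cog to next higher cog: 15
```

So in this model the cheapest neighbour-to-neighbour pass is from cog N+1 down to cog N, and the same pair in the other direction costs almost a full rotation.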
  • rjo__ Posts: 2,114
    edited 2016-04-25 16:23
    I figured out my problem and it had nothing to do with streamers:)

    I won't show you the original code... because it would take a shaggy dog story to explain it all,

    BUT... look how nice the PASM2 code ended up being to acquire images from a PropCam.
    PropCam_Acq_Parallel
                mov     acq_lpf,#101                    'number of lines to acquire
                mov     acq_line_ptr,##_Screen_Buffer
                wrfast  #0,##_Screen_Buffer             'point the fast-write FIFO at the buffer
                mov     PCA_timer1,##16_318
                getct   PCA_timer2
                setb    dira,#_PROPCAM_VSYNC
                setb    outa,#_PROPCAM_VSYNC
                waitx   ##20
                clrb    outa,#_PROPCAM_VSYNC
                waitx   ##_integration_time_plus        'PropCam integration time plus hblank = acquisition time
    acq_line
                rep     @end_line_ACQa,##127            'readout
                testb   ina,#_PropCam_tap wc
          if_c  mov     parallel_pixel,ina
          if_c  shl     parallel_pixel,#4
          if_c  wfbyte  parallel_pixel

    end_line_ACQa

                add     acq_line_ptr,#320               'advance to the next line in the buffer
                wrfast  #0,acq_line_ptr
                waitx   #310
                djnz    acq_lpf,#acq_line
    

    Have you ever seen anything so beautiful!!!

    500+ FPS
  • rjo__ Posts: 2,114
    I think it is going to be always true that to do something with the P2 that you previously did with a P1... it is going to be simpler and easier to understand. If you want to do something that you couldn't do with a P1, then it is going to be marginally more complicated... this means that the entry path for new users should be better... and the exit point for frustrated engineers much farther down the trail.

    mhmm
  • cgracey Posts: 14,152
    I've got a question: How come you are only saving nibbles via wfbyte, and not whole bytes? You are testing that PropCam_tap signal and if it's low, you don't do anything. If it's high, you do a WFBYTE with {ina[3:0], 4'b0000}.
  • rjo__ Posts: 2,114
    PropCam only exposes 4 of the sensor's data pins. In Serial Snapshot mode, PropCam uses two of these pins and we can get 8bit data... one data bit per clock, but in Parallel SnapShot we get a nibble per clock. The other nibble isn't available.
    My next step is to plug another PropCam into the available nibble and get stereo imaging going at 500FPS:)

    For lower frame rates (60-90FPS), the PropCam's acquisition mode isn't critical, but I wanted max FPS for work I am planning with the Elev8's flight controller.

    Phil says that he has plenty of un-mounted sensors, so it will be possible to get 8bits at max FPS.

    In my experience this is probably not necessary... the PropCam is really nice just the way it is,
    but there might be a commercial application that would require 8-bits at 500fps for process control... who knows?
    For the Elev8 stuff, I expect the unmodified PropCam to be perfect.

  • cgracey Posts: 14,152
    edited 2016-04-26 04:05
    I see.

    What about this PropCam_tap signal? Why is it not always true, since I see your REP is set to 127?

    Here's how you could do two nibbles per WFBYTE, if you knew PropCam_tap was a go:
    	REP	@.end,#127
    
    	ROLNIB	parallel_pixel,ina,#0	'2
    	NOP				'2
    	ROLNIB	parallel_pixel,ina,#0	'2
    	WFBYTE	parallel_pixel		'2	total = 8 clocks
    .end
    

    That would be twice as fast. It assumes that PropCam_tap is high, though.
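
To see what each loop ends up storing, here is a small Python model. It is a sketch, not PASM2: it assumes ROLNIB D,S,#N shifts D left by four bits and inserts nibble N of S at the bottom, that WFBYTE stores the low byte of its operand, and that the PropCam data sits on the low four input pins. The helper names are made up.

```python
MASK32 = 0xFFFFFFFF

def rolnib(d, s, n):
    """Model of ROLNIB D,S,#N: shift D left 4 bits, insert nibble n of S."""
    nib = (s >> (4 * n)) & 0xF
    return ((d << 4) | nib) & MASK32

def byte_original(ina):
    """Original loop (MOV, SHL #4, WFBYTE): stores {ina[3:0], 0000}."""
    return (ina << 4) & 0xFF

def byte_two_nibbles(ina_first, ina_second, pixel=0):
    """ROLNIB variant: two pin samples packed into one stored byte."""
    pixel = rolnib(pixel, ina_first, 0)
    pixel = rolnib(pixel, ina_second, 0)
    return pixel & 0xFF  # WFBYTE keeps only the low byte

print(hex(byte_original(0xA)))           # 0xa0
print(hex(byte_two_nibbles(0xA, 0x5)))   # 0xa5
```

Whatever was left in the upper bits of the register from earlier samples is rotated out of the low byte after two ROLNIBs, so the register does not need clearing between pixels in this model.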
  • rjo__ Posts: 2,114
    I've been on the road with the rugrat.

    It is all timing, I am throwing away a pixel but I didn't have time to fiddle further:)
    I think the answer is in your first comment... I am using the wrong part of the clock.

    PropCam_tap: I've set up a 10MHz NCO to drive the PropCam, which is actually sitting on a Propeller Activity Board. The PropCam_tap is the monitor for this NCO... didn't know how else to do it:) I suspect that there is a smart pin that will do all of the above if I feed it right:)

    The really, really nice thing about the PropCam is that it uses a global shutter, so all of the readout sequences are exactly the same... two versions, but it doesn't matter what the integration time is, the readout time is fixed and clock accurate.
    Phil has an eye for hardware that is hard to beat.

    That extra instruction (the NOP) might come in very handy.

    Thank you!

