New Hub Scheme For Next Chip - Page 27 — Parallax Forums

New Hub Scheme For Next Chip


Comments

  • jmgjmg Posts: 15,179
    edited 2014-05-20 22:15
    cgracey wrote: »
    Ah, I'm thinking hardware. Software DMA is going to be done within a few clocks, usually, getting the FIFO out of the picture.

    OK, I think we are saying the same things then.
  • jmgjmg Posts: 15,179
    edited 2014-05-20 22:21
    Ugh. You're right. Any chance that competition between a direct op and a FIFO op could defer to the FIFO, allowing the direct op to complete on the next opportunity? (Of course that would typically mean waiting for the FIFO to flush, I s'pose.)
    In HW-FIFO cases, with moderate fSys/N (say /3 and slower), there are quite a number of spare slots where a FIFO can be idling.

    In SW FIFO cases, it is unlikely SW will be both feeding the FIFO at high rates, and trying to do a direct access in close timing proximity.
  • Invent-O-DocInvent-O-Doc Posts: 768
    edited 2014-05-21 03:18
    Either this FIFO is too complicated and inelegant or people are making something simple and reasonable into something quixotic and labyrinthine. Unless there is simplicity and elegance, the resulting chip will be an ugly kludge.
  • dMajodMajo Posts: 855
    edited 2014-05-21 03:44
    RossH wrote: »
    Why would we want non-blocking direct read/writes?

    Ross.

    Because when you are dealing with real-world (pin) events, assuming they are not faster than the hub can tolerate, they will most probably be asynchronous and not in sync with, e.g., the random-write hub window. If you need to acquire the event, process it somehow and store it, then even if its frequency is the same as or a bit lower than the hub's but its duty cycle varies, you risk missing the hub window and thus missing the next data acquisition. One level of write buffering is mandatory IMHO. It is OK for the second write to stall the cog, since this means you are trying to deal with frequencies too high for the Propeller to handle, but it is not admissible to lose details just because they are out of phase, and this is ordinary with real-world events.
  • cgraceycgracey Posts: 14,231
    edited 2014-05-21 04:21
    dMajo wrote: »
    Because when you are dealing with real-world (pin) events, assuming they are not faster than the hub can tolerate, they will most probably be asynchronous and not in sync with, e.g., the random-write hub window. If you need to acquire the event, process it somehow and store it, then even if its frequency is the same as or a bit lower than the hub's but its duty cycle varies, you risk missing the hub window and thus missing the next data acquisition. One level of write buffering is mandatory IMHO. It is OK for the second write to stall the cog, since this means you are trying to deal with frequencies too high for the Propeller to handle, but it is not admissible to lose details just because they are out of phase, and this is ordinary with real-world events.


    With the hub FIFO, once you set it up for read or write, every read or write instruction always takes just one clock. The limitation, of course, is that you are reading/writing the hub memory in a straight line.
  • cgraceycgracey Posts: 14,231
    edited 2014-05-21 04:30
    I've just about got the logic done for the interface between the cog, FIFO, and hub memory. It's been really challenging, even though it's not much logic.

    Once you do a RDINIT D/#address19, the bottom level of the FIFO is already primed and you are ready to pull any number of sequential bytes/words/longs from hub memory, either via software or hardware, at up to a byte/word/long per clock. You can never outpace it. Same goes for WRINIT D/#address19. You are immediately ready to software write or hardware stream, at any rate, up to the system clock, any number of bytes/words/longs into hub memory.

    For cases where determinism is important, this is the ultimate in efficiency, as long as reading/writing in a stream is what you need.

    Does anyone see a strong need for separate read and write FIFOs that could operate concurrently (but not at top speeds, together)? This would be good for software reading and writing. In my experience, I usually need to input for a while, or output for a while, in which case a single FIFO, usable for either reading or writing, is adequate.
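The streaming read described above can be pictured with a small software model. This is only a sketch under stated assumptions: the 512 KB hub size is inferred from the 19-bit address in `RDINIT D/#address19`, little-endian byte order is assumed, and the class and method names (`HubReadStream`, `rdinit`, `read_long`, ...) are hypothetical stand-ins rather than anything defined in the thread. It models only the ordering of the data, not the one-clock timing.

```python
import struct

HUB_SIZE = 1 << 19  # assumption: 19-bit address space, i.e. 512 KB of hub RAM


class HubReadStream:
    """Toy model of a sequential hub read stream (not the real hardware)."""

    def __init__(self, hub: bytearray):
        self.hub = hub
        self.addr = 0

    def rdinit(self, address: int) -> None:
        # Stand-in for RDINIT: set the start address for sequential reads.
        self.addr = address % HUB_SIZE

    def _take(self, size: int) -> bytes:
        # Collect `size` bytes in a straight line, wrapping at the end of hub memory.
        data = bytes(self.hub[(self.addr + i) % HUB_SIZE] for i in range(size))
        self.addr = (self.addr + size) % HUB_SIZE
        return data

    def read_byte(self) -> int:
        return self._take(1)[0]

    def read_word(self) -> int:
        return struct.unpack("<H", self._take(2))[0]  # assumed little-endian

    def read_long(self) -> int:
        return struct.unpack("<I", self._take(4))[0]  # assumed little-endian


hub = bytearray(HUB_SIZE)
hub[0x100:0x108] = bytes([1, 2, 3, 4, 5, 6, 7, 8])

stream = HubReadStream(hub)
stream.rdinit(0x100)
first_long = stream.read_long()  # consumes bytes 1..4
next_word = stream.read_word()   # consumes bytes 5..6
next_byte = stream.read_byte()   # consumes byte 7
```

The point of the model is the limitation mentioned earlier in the thread: after the single `rdinit` call, every read simply continues from where the previous one stopped, so random access needs a new `rdinit`.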
  • RossHRossH Posts: 5,502
    edited 2014-05-21 04:36
    cgracey wrote: »
    I've just about got the logic done for the interface between the cog, FIFO, and hub memory. It's been really challenging, even though it's not much logic.

    Once you do a RDINIT D/#address19, the bottom level of the FIFO is already primed and you are ready to pull any number of sequential bytes/words/longs from hub memory, either via software or hardware, at up to a byte/word/long per clock. You can never outpace it. Same goes for WRINIT D/#address19. You are immediately ready to software write or hardware stream, at any rate, up to the system clock, any number of bytes/words/longs into hub memory.

    For cases where determinism is important, this is the ultimate in efficiency, as long as reading/writing in a stream is what you need.

    Great! Can you confirm that direct read/writes are not "buffered" or "non-blocking"? I.e. that they behave the way one would normally expect?

    Ross.
  • cgraceycgracey Posts: 14,231
    edited 2014-05-21 04:40
    RossH wrote: »
    Great! Can you confirm that direct read/writes are not "buffered" or "non-blocking"? I.e. that they behave the way one would normally expect?

    Ross.


    I think direct read and writes need to yield to FIFO activity, and use slots the FIFO skips. For software FIFO activity, this is no problem, but for hardware streaming, this could introduce delays. Is this okay?
  • RossHRossH Posts: 5,502
    edited 2014-05-21 04:47
    cgracey wrote: »
    I think direct read and writes need to yield to FIFO activity, and use slots the FIFO skips. For software FIFO activity, this is no problem, but for hardware streaming, this could introduce delays. Is this okay?

    Hmm. If I understand the FIFO operation correctly, I think it is useful only in quite limited scenarios, so that should be ok. You would either use the FIFO or direct access - rarely both at the same time.

    Ross.
  • cgraceycgracey Posts: 14,231
    edited 2014-05-21 05:00
    RossH wrote: »
    Hmm. If I understand the FIFO operation correctly, I think it is useful only in quite limited scenarios, so that should be ok. You would either use the FIFO or direct access - rarely both at the same time.

    Ross.


    That's true. Both would be getting used during hub exec, though.
  • RossHRossH Posts: 5,502
    edited 2014-05-21 05:06
    cgracey wrote: »
    That's true. Both would be getting used during hub exec, though.

    Yes, I wondered about that. What happens when the instruction fetched via the FIFO is a hub access - do you have to wait till the FIFO fills up before the hub access is executed? If so, would that be up to 20 clocks plus whatever the hub latency happened to be for the address being accessed?

    Ross.
  • cgraceycgracey Posts: 14,231
    edited 2014-05-21 05:10
    RossH wrote: »
    Yes, I wondered about that. What happens when the instruction fetched via the FIFO is a hub access - do you have to wait till the FIFO fills up before the hub access is executed? If so, would that be up to 20 clocks plus whatever the hub latency happened to be for the address being accessed?

    Ross.


    Well, since we are drawing instructions from the FIFO at no more than half the rate they are going into the FIFO, the FIFO will almost always be nearly topped off, so there's not much waiting, if any. Wait... on branches the FIFO will want to reload pretty often. Maybe for hub exec, we limit it to a depth of only eight, or so.
  • RossHRossH Posts: 5,502
    edited 2014-05-21 05:14
    cgracey wrote: »
    Well, since we are drawing instructions from the FIFO at no more than half the rate they are going into the FIFO, the FIFO will almost always be nearly topped off, so there's not much waiting, if any.

    But the FIFO will be empty after each branch, so if the next instruction after the branch is a hub operation, the wait may be very long. And in some code the FIFO will rarely if ever get a chance to fill up.

    Ross.
  • cgraceycgracey Posts: 14,231
    edited 2014-05-21 05:16
    RossH wrote: »
    But the FIFO will be empty after each branch, so if the next instruction after the branch is a hub operation, the wait may be very long. And in some code the FIFO will rarely if ever get a chance to fill up.

    Ross.


    Maybe hub instructions should take priority over instruction spooling, then. Any ideas about how to improve this?
  • RossHRossH Posts: 5,502
    edited 2014-05-21 05:22
    cgracey wrote: »
    Maybe hub instructions should take priority over instruction spooling, then.

    Yes, I think that would be better. If you want to use the FIFO for other purposes (like streaming), then don't use direct access!

    Ross.
  • RossHRossH Posts: 5,502
    edited 2014-05-21 05:30
    cgracey wrote: »
    Any ideas about how to improve this?

    No. Except perhaps to make the operation of the FIFO (such as how it behaves in the presence of direct access) configurable for different purposes.

    More complexity!

    Ross.
  • cgraceycgracey Posts: 14,231
    edited 2014-05-21 05:43
    RossH wrote: »
    No. Except perhaps to make the operation of the FIFO (such as how it behaves in the presence of direct access) configurable for different purposes.

    More complexity!

    Ross.


    Would you like to go back to the strict round-robin approach? Maybe with some slot allocation?
  • RossHRossH Posts: 5,502
    edited 2014-05-21 05:49
    cgracey wrote: »
    Would you like to go back to the strict round-robin approach? Maybe with some slot allocation?

    Now I know you're taking the mickey :smile:

    No - I think this could work. The simplest thing is just to make direct hub access take precedence over the FIFO. Those who want to use the FIFO for other purposes just have to be aware of the consequences of also using direct access.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-05-21 05:52
    Agreed.

    This also works best for hubexec.
    RossH wrote: »
    Now I know you're taking the mickey :smile:

    No - I think this could work. The simplest thing is just to make direct hub access take precedence over the FIFO. Those who want to use the FIFO for other purposes just have to be aware of the consequences of also using direct access.
  • cgraceycgracey Posts: 14,231
    edited 2014-05-21 05:55
    RossH wrote: »
    Now I know you're taking the mickey :smile:

    No - I think this could work. The simplest thing is just to make direct hub access take precedence over the FIFO. Those who want to use the FIFO for other purposes just have to be aware of the consequences of also using direct access.


    Okay.
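The priority rule agreed on here (direct hub access wins, the FIFO takes whatever slots are left) can be sketched as a simple per-slot arbiter. This is an illustrative model only: the function name `run_slots`, the idea of a numeric FIFO backlog, and the slot numbering are assumptions made for the example, not details of the actual logic.

```python
def run_slots(num_slots, direct_slots, fifo_backlog):
    """Grant each hub slot to 'direct', 'fifo', or 'idle'.

    direct_slots: set of slot numbers in which a direct read/write is pending.
    fifo_backlog: number of transfers the FIFO still wants to make.
    A pending direct access always wins; the FIFO only uses slots that
    direct access leaves free, so its backlog drains more slowly.
    """
    grants = []
    for slot in range(num_slots):
        if slot in direct_slots:
            grants.append("direct")
        elif fifo_backlog > 0:
            grants.append("fifo")
            fifo_backlog -= 1
        else:
            grants.append("idle")
    return grants, fifo_backlog


grants, remaining = run_slots(num_slots=6, direct_slots={1, 2}, fifo_backlog=3)
```

In this run the FIFO needs five slots to move three items because two slots went to direct access, which is the consequence streaming users would have to be aware of.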
  • dMajodMajo Posts: 855
    edited 2014-05-21 06:37
    RossH wrote: »
    But the FIFO will be empty after each branch, so if the next instruction after the branch is a hub operation, the wait may be very long. And in some code the FIFO will rarely if ever get a chance to fill up.

    Ross.

    Isn't the FIFO a lung (perhaps not the right word; to clarify, isn't a stack a LIFO?). I mean, in a 20-element FIFO you get out the first element that went in, in order. It doesn't mean that all the elements need to be filled before you can get the first one out; it's variable-length storage of up to n elements, isn't it?
  • cgraceycgracey Posts: 14,231
    edited 2014-05-21 06:41
    dMajo wrote: »
    Isn't the FIFO a lung (perhaps not the right word; to clarify, isn't a stack a LIFO?). I mean, in a 20-element FIFO you get out the first element that went in, in order. It doesn't mean that all the elements need to be filled before you can get the first one out; it's variable-length storage of up to n elements, isn't it?


    That's right. It starts out at size=0 and can grow to 19 (used to be 20, but 19 is what we actually need).
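As a minimal sketch of the behaviour confirmed here, a bounded first-in, first-out queue can be modelled like this. The depth of 19 comes from the post above; the `Fifo` class and its `push`/`pop` methods are hypothetical names used only for illustration.

```python
from collections import deque

FIFO_DEPTH = 19  # depth quoted in the thread


class Fifo:
    """Bounded first-in, first-out queue that starts empty."""

    def __init__(self, depth=FIFO_DEPTH):
        self.depth = depth
        self.items = deque()

    def push(self, value) -> bool:
        # Refuse the value when the FIFO is already full.
        if len(self.items) >= self.depth:
            return False
        self.items.append(value)
        return True

    def pop(self):
        # Oldest element comes out first; raises IndexError when empty.
        return self.items.popleft()


fifo = Fifo()
fifo.push("first")
only_item = fifo.pop()  # available immediately, no need to fill the FIFO

accepted = sum(fifo.push(n) for n in range(25))  # try to overfill
oldest = fifo.pop()
```

An element can be taken out as soon as one has gone in, and pushes beyond the depth are rejected rather than overwriting older data.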
  • dMajodMajo Posts: 855
    edited 2014-05-21 06:45
    Chip,

    will the FIFO linearly read/fill the hub source/destination, endlessly incrementing the hub address? Is it also possible to set a known number of hub longs (space) and use the FIFO to, e.g., read/write a hub-based circular buffer (auto roll-over to the starting address)? Or do you need to stop the FIFO, reset the starting address, and start it again?
  • David BetzDavid Betz Posts: 14,516
    edited 2014-05-21 06:48
    How would I use the FIFO to copy a block of data from one hub location to another? I can see how I can use it to stream data into or out of a COG, but is hub-to-hub copy supported?
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-05-21 06:59
    The FIFO is a HUGE advance for Px!

    Having separate read/write FIFOs would potentially double hub-to-hub copying bandwidth, with the addition of "COPYB/W/L" instructions, so things like the str* mem* C library code would benefit, as would video blits, sprites etc.

    This time, even I am not sure it is needed / worth the gates.
    cgracey wrote: »
    I've just about got the logic done for the interface between the cog, FIFO, and hub memory. It's been really challenging, even though it's not much logic.

    Once you do a RDINIT D/#address19, the bottom level of the FIFO is already primed and you are ready to pull any number of sequential bytes/words/longs from hub memory, either via software or hardware, at up to a byte/word/long per clock. You can never outpace it. Same goes for WRINIT D/#address19. You are immediately ready to software write or hardware stream, at any rate, up to the system clock, any number of bytes/words/longs into hub memory.

    For cases where determinism is important, this is the ultimate in efficiency, as long as reading/writing in a stream is what you need.

    Does anyone see a strong need for separate read and write FIFOs that could operate concurrently (but not at top speeds, together)? This would be good for software reading and writing. In my experience, I usually need to input for a while, or output for a while, in which case a single FIFO, usable for either reading or writing, is adequate.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-05-21 07:02
    With one fifo:

    Stream into a buffer on the cog, stream out of it, assuming REPs. When copying longs, 200MB/sec copy rate, 100MB/sec for words, 50MB/sec for bytes

    With separate read & write fifos, and addition of "COPYB/W/L" instructions, assuming REPs:

    When copying longs 400MB/sec, words 200MB/sec, bytes 100MB/sec

    For comparison, an 80MHz P1 would copy at:

    longs 10MB/sec, words 5MB/sec, bytes 2.5MB/sec
    David Betz wrote: »
    How would I use the FIFO to copy a block of data from one hub location to another? I can see how I can use it to stream data into or out of a COG, but is hub-to-hub copy supported?
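The copy rates quoted in this post can be reproduced with simple arithmetic. The assumptions below are not stated in the post and are only one reading that matches its numbers: a 200 MHz system clock, two clocks per instruction inside a REP loop (so a read plus a write costs four clocks per item with one FIFO, and a single hypothetical COPYx costs two), and for the 80 MHz P1 one hub access per 16-clock hub window (32 clocks per item copied).

```python
def copy_rate_mb(clock_hz, clocks_per_item, bytes_per_item):
    """Copy rate in MB/s (1 MB = 1e6 bytes) for a given per-item cost."""
    return clock_hz / clocks_per_item * bytes_per_item / 1e6


SIZES = {"long": 4, "word": 2, "byte": 1}

# Assumed: 200 MHz clock, read instruction + write instruction = 4 clocks.
one_fifo = {name: copy_rate_mb(200e6, 4, size) for name, size in SIZES.items()}

# Assumed: 200 MHz clock, one hypothetical COPYx instruction = 2 clocks.
two_fifos = {name: copy_rate_mb(200e6, 2, size) for name, size in SIZES.items()}

# Assumed: 80 MHz P1, one hub read and one hub write = 2 x 16 clocks.
p1 = {name: copy_rate_mb(80e6, 32, size) for name, size in SIZES.items()}

for label, rates in (("one FIFO", one_fifo), ("two FIFOs", two_fifos), ("P1", p1)):
    print(label, rates)
```

Under these assumptions the item size is the only thing that changes between the long, word and byte figures, which is why each rate halves from one size to the next.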
  • David BetzDavid Betz Posts: 14,516
    edited 2014-05-21 07:06
    Bill Henning wrote: »
    With one fifo:

    Stream into a buffer on the cog, stream out of it, assuming REPs. When copying longs, 200MB/sec copy rate, 100MB/sec for words, 50MB/sec for bytes

    With separate read & write fifos, and addition of "COPYB/W/L" instructions, assuming REPs:

    When copying longs 400MB/sec, words 200MB/sec, bytes 100MB/sec

    But the COG has very limited memory. I guess moves have to be done in small chunks.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-05-21 07:15
    With one FIFO, it depends on how much cog space you have available... say 128 longs would work great.

    With separate read/write FIFOs and the COPYB/W/L instructions, no cog buffer is needed.
    ' 400MB/sec hub copy
    INITR src
    INITW dst
    REP count
    COPYL
    
    David Betz wrote: »
    But the COG has very limited memory. I guess moves have to be done in small chunks.
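The single-FIFO approach (stream a chunk into a cog buffer, then stream it back out) can be sketched in software as follows. This is a hypothetical model, not cog code: `hub_copy_chunked` and its parameters are invented for the example, the 128-long buffer is the size suggested in the post, and the source and destination regions are assumed not to overlap.

```python
def hub_copy_chunked(hub, src, dst, count_longs, buf_longs=128):
    """Copy count_longs longs from src to dst through a small bounce buffer.

    Assumes the source and destination regions do not overlap.
    Returns the number of chunks that were needed.
    """
    copied = 0
    chunks = 0
    while copied < count_longs:
        n = min(buf_longs, count_longs - copied)
        start = src + copied * 4
        cog_buffer = bytes(hub[start:start + n * 4])  # "stream in" phase
        out = dst + copied * 4
        hub[out:out + n * 4] = cog_buffer             # "stream out" phase
        copied += n
        chunks += 1
    return chunks


hub = bytearray(4096)
hub[0:1200] = bytes(i % 251 for i in range(1200))  # 300 longs of test data
chunks_used = hub_copy_chunked(hub, src=0, dst=2048, count_longs=300)
```

Each chunk costs one pass in each direction, which is why the single-FIFO rate in the earlier estimate is half of the two-FIFO rate.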
  • cgraceycgracey Posts: 14,231
    edited 2014-05-21 07:24
    dMajo wrote: »
    Chip,

    will the FIFO linearly read/fill the hub source/destination, endlessly incrementing the hub address? Is it also possible to set a known number of hub longs (space) and use the FIFO to, e.g., read/write a hub-based circular buffer (auto roll-over to the starting address)? Or do you need to stop the FIFO, reset the starting address, and start it again?


    It loops through the whole memory right now, but it could be made to wrap in a limited area. That's a great idea you have! That way, you could output to four 8-bit DACs a loop of longs at up to 200MHz. If we had one more control bit somewhere, we could make the buffer switchable in position, so that we could write one buffer while we output the other.
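The "wrap in a limited area" idea amounts to an address generator that rolls over to the start of a region instead of running through all of hub memory. A minimal sketch, with the function name, parameters and example addresses all chosen for illustration only:

```python
def wrapped_addresses(start, length, count, step=4):
    """Yield `count` hub addresses that wrap inside [start, start + length).

    `step` is the item size in bytes (4 for longs); `length` is assumed
    to be a multiple of `step`.
    """
    offset = 0
    for _ in range(count):
        yield start + offset
        offset = (offset + step) % length


# A 16-byte (four-long) loop starting at 0x1000, read six times.
addrs = list(wrapped_addresses(start=0x1000, length=16, count=6))
```

After the fourth long the sequence returns to the starting address on its own, so no software is needed to stop the stream and reset it.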
  • pjvpjv Posts: 1,903
    edited 2014-05-21 07:40
    cgracey wrote: »
    ........If we had one more control bit somewhere, we could make the buffer switchable in position, so that we could write one buffer while we output the other.

    Yes!

    Would it be possible (reasonable) to have two FIFOs like this?

    Cheers,

    Peter (pjv)