Ugh. You're right. Any chance that competition between a direct op and a FIFO op could defer to the FIFO, allowing the direct op to complete on the next opportunity? (Of course that would typically mean waiting for the FIFO to flush, I s'pose.)
In HW-FIFO cases, with moderate fSys/N (say /3 and slower), there are quite a number of spare slots where a FIFO can be idling.
In SW FIFO cases, it is unlikely SW will be both feeding the FIFO at high rates, and trying to do a direct access in close timing proximity.
Either this FIFO is too complicated and inelegant or people are making something simple and reasonable into something quixotic and labyrinthine. Unless there is simplicity and elegance, the resulting chip will be an ugly kludge.
Why would we want non-blocking direct read/writes?
Ross.
Because when you are dealing with real-world (pin) events (not, of course, at speeds higher than the hub can tolerate), they will most probably be asynchronous and not in sync with, e.g., the random-write hub window. If you need to acquire the event, process it somehow, and store it, then even if its frequency is the same as or a bit lower than the hub's, a varying duty cycle means you risk missing the hub window and thus missing the next data acquisition. One level of write buffering is mandatory, IMHO. It is OK for the second write to stall the cog, since that means you are trying to deal with frequencies too high for the Propeller to handle, but it is not admissible to lose details just because they are out of phase, which is ordinary with real-world events.
With the hub FIFO, once you set it up for read or write, every read or write instruction always takes just one clock. The limitation, of course, is that you are reading/writing the hub memory in a straight line.
I've just about got the logic done for the interface between the cog, FIFO, and hub memory. It's been really challenging, even though it's not much logic.
Once you do a RDINIT D/#address19, the bottom level of the FIFO is already primed and you are ready to pull any number of sequential bytes/words/longs from hub memory, either via software or hardware, at up to a byte/word/long per clock. You can never outpace it. Same goes for WRINIT D/#address19. You are immediately ready to software write or hardware stream, at any rate, up to the system clock, any number of bytes/words/longs into hub memory.
For cases where determinism is important, this is the ultimate in efficiency, as long as reading/writing in a stream is what you need.
Does anyone see a strong need for separate read and write FIFOs that could operate concurrently (but not at top speeds, together)? This would be good for software reading and writing. In my experience, I usually need to input for a while, or output for a while, in which case a single FIFO, usable for either reading or writing, is adequate.
Great! Can you confirm that direct read/writes are not "buffered" or "non-blocking"? I.e. that they behave the way one would normally expect?
Ross.
I think direct reads and writes need to yield to FIFO activity, and use slots the FIFO skips. For software FIFO activity, this is no problem, but for hardware streaming, this could introduce delays. Is this okay?
Hmm. If I understand the FIFO operation correctly, I think it is useful only in quite limited scenarios, so that should be ok. You would either use the FIFO or direct access - rarely both at the same time.
Ross.
That's true. Both would be getting used during hub exec, though.
Yes, I wondered about that. What happens when the instruction fetched via the FIFO is a hub access - do you have to wait till the FIFO fills up before the hub access is executed? If so, would that be up to 20 clocks plus whatever the hub latency happened to be for the address being accessed?
Ross.
Well, since we are drawing instructions from the FIFO at no more than half the rate they are going into the FIFO, the FIFO will almost always be nearly topped off, so there's not much waiting, if any. Wait... on branches the FIFO will want to reload pretty often. Maybe for hub exec, we limit it to a depth of only eight, or so.
But the FIFO will be empty after each branch, so if the next instruction after the branch is a hub operation, the wait may be very long. And in some code the FIFO will rarely if ever get a chance to fill up.
Ross.
Maybe hub instructions should take priority over instruction spooling, then. Any ideas about how to improve this?
Would you like to go back to the strict round-robin approach? Maybe with some slot allocation?
Now I know you're taking the mickey
No - I think this could work. The simplest thing is just to make direct hub access take precedence over the FIFO. Those who want to use the FIFO for other purposes just have to be aware of the consequences of also using direct access.
Ross.
Isn't the FIFO a lung (perhaps not the right word; to clarify, isn't a stack a LIFO?). I mean, in a 20-element FIFO you get out the first element that went in, in order. It doesn't mean that all the elements need to be filled in to get out the first one; it's a variable-length storage of up to n elements, isn't it?
That's right. It starts out at size=0 and can grow to 19 (used to be 20, but 19 is what we actually need).
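The point above (a FIFO hands back its oldest entry whenever it holds at least one, without needing to be full) can be sketched in a few lines of Python. This is only a toy model: the class name `HubFifo` and its methods are made up for illustration, and only the depth of 19 comes from the thread.

```python
from collections import deque


class HubFifo:
    """Toy bounded FIFO (hypothetical model, not the actual hardware logic)."""

    def __init__(self, depth=19):
        self.depth = depth      # maximum number of entries (19 per the thread)
        self.items = deque()    # starts out at size 0

    def can_push(self):
        return len(self.items) < self.depth

    def push(self, value):
        if not self.can_push():
            raise OverflowError("FIFO full")
        self.items.append(value)

    def pop(self):
        if not self.items:
            raise IndexError("FIFO empty")
        return self.items.popleft()   # oldest entry comes out first


fifo = HubFifo()
fifo.push(0x11)
fifo.push(0x22)
first = fifo.pop()   # available immediately, even though the FIFO is far from full
```

Here `first` is the first value pushed, and one entry remains queued.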
Will the FIFO linearly read/fill the hub source/destination, endlessly increasing the hub address? Is it also possible to set a known amount of hub longs (space) and use the FIFO to, e.g., read/write a hub-based circular buffer (auto roll-over to the starting address)? Or do you need to stop the FIFO, reset the starting address, and start it again?
How would I use the FIFO to copy a block of data from one hub location to another? I can see how I can use it to stream data into or out of a COG, but is hub-to-hub copy supported?
Having separate read/write FIFO's would potentially double hub-to-hub copying bandwidth, with the addition of "COPYB/W/L" instructions, so things like the str* mem* C library code would benefit, as would video blits, sprites etc.
This time, even I am not sure it is needed / worth the gates.
It loops through the whole memory right now, but it could be made to wrap in a limited area. That's a great idea you have! That way, you could output to four 8-bit DACs a loop of longs at up to 200MHz. If we had one more control bit somewhere, we could make the buffer switchable in position, so that we could write one buffer while we output the other.
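The "wrap in a limited area" idea amounts to modular address arithmetic. A minimal Python sketch follows, assuming a byte-addressed hub and a step of one long (4 bytes); the function name `next_hub_address`, the base address `0x1000` and the 16-byte length are all invented for illustration and say nothing about how the hardware would actually be configured.

```python
def next_hub_address(addr, base, length, step=4):
    """Advance addr by step bytes, wrapping within [base, base + length)."""
    offset = (addr - base + step) % length
    return base + offset


# Walk a hypothetical 16-byte loop of longs starting at 0x1000.
addr = 0x1000
visited = []
for _ in range(6):
    visited.append(addr)
    addr = next_hub_address(addr, base=0x1000, length=16)
```

After four longs the address rolls over to the base again, so the same four longs would be output in a loop. A switchable buffer position, as suggested, would correspond to changing `base` between two regions.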
Yes!
Would it be possible (reasonable) to have two FIFOs like this?
Comments
OK, I think we are saying the same things then.
Maybe hub instructions should take priority over instruction spooling, then. Any ideas about how to improve this?
Yes, I think that would be better. If you want to use the FIFO for other purposes (like streaming), then don't use direct access!
Ross.
No. Except perhaps to make the operation of the FIFO (such as how it behaves in the presence of direct access) configurable for different purposes.
More complexity!
Ross.
Would you like to go back to the strict round-robin approach? Maybe with some slot allocation?
Now I know you're taking the mickey
No - I think this could work. The simplest thing is just to make direct hub access take precedence over the FIFO. Those who want to use the FIFO for other purposes just have to be aware of the consequences of also using direct access.
This also works best for hubexec.
Okay.
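The rule settled on here (direct hub access takes precedence, the FIFO uses whatever is left) can be written as a tiny priority arbiter. This Python sketch is only an illustration of that rule; `grant_slot` and the request lists are hypothetical and do not model real hub timing.

```python
def grant_slot(direct_request, fifo_request):
    """Decide who gets the hub access this cycle; direct access wins ties."""
    if direct_request:
        return "direct"
    if fifo_request:
        return "fifo"
    return "idle"


# Hypothetical request pattern over five cycles: (direct, fifo)
requests = [(False, True), (True, True), (False, True), (False, False), (True, False)]
grants = [grant_slot(d, f) for d, f in requests]
```

In the second cycle both want access and the FIFO is the one that waits, which is the consequence users of hardware streaming would have to be aware of.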
Having separate read/write FIFO's would potentially double hub-to-hub copying bandwidth, with the addition of "COPYB/W/L" instructions, so things like the str* mem* C library code would benefit, as would video blits, sprites etc.
This time, even I am not sure it is needed / worth the gates.
Stream into a buffer on the cog, then stream out of it, assuming REPs. Copy rates: longs 200MB/sec, words 100MB/sec, bytes 50MB/sec.
With separate read & write FIFOs and the addition of "COPYB/W/L" instructions, assuming REPs:
longs 400MB/sec, words 200MB/sec, bytes 100MB/sec
For comparison, an 80MHz P1 would copy at:
longs 10MB/sec, words 5MB/sec, bytes 2.5MB/sec
With separate read/write FIFO's and the COPYB/W/L instructions, no cog buffer is needed.
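The figures above can be reproduced with simple arithmetic. The Python sketch below assumes a 200MHz clock for the new chip (the rate mentioned elsewhere in the thread) and clocks-per-element values of 4, 2 and 32; those three values are not stated in the post, they are inferred here only because they reproduce the quoted numbers.

```python
def copy_rate_mb_per_s(clock_hz, clocks_per_element, bytes_per_element):
    """Copy throughput in MB/sec for one element moved every N clocks."""
    return clock_hz / clocks_per_element * bytes_per_element / 1e6


SIZES = {"long": 4, "word": 2, "byte": 1}

# Assumed: 4 clocks per element (read into cog, then write out) at 200MHz
single_fifo = {n: copy_rate_mb_per_s(200e6, 4, b) for n, b in SIZES.items()}
# Assumed: 2 clocks per element with a hypothetical COPYB/W/L instruction
dual_fifo = {n: copy_rate_mb_per_s(200e6, 2, b) for n, b in SIZES.items()}
# Assumed: 32 clocks per element on an 80MHz P1
p1 = {n: copy_rate_mb_per_s(80e6, 32, b) for n, b in SIZES.items()}
```

Under these assumptions the three dictionaries match the longs/words/bytes figures quoted for the single FIFO, the separate read/write FIFOs, and the P1.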
It loops through the whole memory right now, but it could be made to wrap in a limited area. That's a great idea you have! That way, you could output to four 8-bit DACs a loop of longs at up to 200MHz. If we had one more control bit somewhere, we could make the buffer switchable in position, so that we could write one buffer while we output the other.
Yes!
Would it be possible (reasonable) to have two FIFOs like this?
Cheers,
Peter (pjv)