Is there a reason the FIFO couldn't also be used for buffered random writes (and possibly reads)? Perhaps by extending each FIFO element to also hold the address?
I hear you, but I would think the required real estate would be negligible.
The gain is diminishing, as the system still needs to wait for a slot to align, and if non-blocking direct is possible (as requested before and in #742), then that is very close to a buffered version.
If we can get non-blocking WRx, then that's fine, as that's what I'm after. I just figured we could use the FIFO, as much of the logic is likely already there.
Even without non-blocking direct, if the FIFO is not being used for anything more important, it can achieve the same thing, i.e. set Address and Data, then continue code, and if a new Address arrives before the previous one is done, it then waits.
Is that how it works? I guess I missed that.
To load both address and Data fields is going to need 2 opcodes anyway.
Not sure why that would need to be the case, just as it's not for WRx. I guess it's not really a big deal though.
By "work around" do you mean "wait politely for the FIFO ops to finish" or "barge to the head of the line" ?
-Phil
I think Chip means that direct read/writes would stall the FIFO, which would then resume operation after the direct read/write has completed.
This will make things a bit confusing if the same addresses are involved. I foresee very interesting times ahead trying to debug programs on this new chip!
By "work around" do you mean "wait politely for the FIFO ops to finish" or "barge to the head of the line" ?
I don't think either is very accurate - they are separate from the FIFO. "Work around" I take as bypass, or 'go around the outside of'.
The Rotate scheme means they might both want the same slot, if the lower 4 bits of their addresses match.
There, some decision is needed on whether HW trumps SW, or vice versa.
At fSys/1 there are no spare slots anyway, but at fSys/2 and slower, there is some slack.
If SW flukes 16c-spaced access to the same LSN every rotate, then SW-trumps-HW could stall the fill forever?
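To make that contention concrete, here is a toy C model (the one-slot-per-clock rotation and matching on the low address nibble are assumptions for illustration, not the actual P2 logic): if SW always wins the contested nibble, a sequential FIFO fill parks at that nibble and never gets past it.

    #include <stdio.h>

    int main(void)
    {
        unsigned sw_nibble   = 0x7;  /* low nibble SW happens to hit every 16 clocks */
        unsigned fifo_nibble = 0x0;  /* next low nibble a sequential FIFO fill needs  */

        for (unsigned clock = 0; clock < 256; clock++) {
            unsigned slot = clock & 0xF;      /* rotating slot, one per clock */

            if (slot == sw_nibble)
                continue;                     /* "SW trumps HW": SW takes the slot */

            if (slot == fifo_nibble)
                fifo_nibble = (fifo_nibble + 1) & 0xF;   /* fill advances */
        }
        /* With SW priority, the fill never gets past the contested nibble. */
        printf("FIFO fill stuck waiting for nibble 0x%X\n", fifo_nibble);
        return 0;
    }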
When you guys talk about 'non-blocking', what do you mean?
To me it means that the operation returns immediately, regardless of pending conditions. For example, a non-blocking read from the serial port might return either the ASCII character or -1 to indicate that there was no character available to read. A blocking read would wait for an available character.
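In C terms, with made-up names standing in for the real status and data registers, that distinction is roughly:

    /* rx_has_char / rx_char are toy stand-ins for a UART's status and data
       registers; the names are invented for illustration. */
    static volatile int rx_has_char = 0;
    static volatile int rx_char     = 0;

    int read_nonblocking(void)
    {
        if (!rx_has_char)
            return -1;          /* no character available: return at once */
        rx_has_char = 0;
        return rx_char;         /* the ASCII character */
    }

    int read_blocking(void)
    {
        while (!rx_has_char)
            ;                   /* wait here until a character arrives */
        rx_has_char = 0;
        return rx_char;
    }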
When you guys talk about 'non-blocking', what do you mean?
From the software side, it means the opcode executes and execution then continues; in the example of a WR_Direct, a simple single buffer (as in a classic UART) stores the value and waits for the matching Slot.
The advantage is software can proceed and do other stuff, and so fill the available (Rotate+LSN)-determined fSys cycles.
SW that tries to run faster will block on the second write, as the buffer is not yet empty.
Mainly, it gives a means to work 'somewhat more' like a P1, and 'interlace' HUB writes with opcodes.
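As I read that, the write side behaves something like this one-deep buffer (all names invented; the hub logic that drains it on the matching slot is only a comment here):

    /* One-deep write buffer, per the "classic UART" analogy. */
    static volatile struct {
        int      full;
        unsigned addr;
        unsigned data;
    } wrbuf;

    /* Returns immediately if the buffer is empty; a second write issued
       before the first has drained stalls here until the slot has passed. */
    void wr_direct_buffered(unsigned addr, unsigned data)
    {
        while (wrbuf.full)
            ;                   /* previous write still waiting for its slot */
        wrbuf.addr = addr;
        wrbuf.data = data;
        wrbuf.full = 1;         /* hub side clears this when the matching slot
                                   comes around and the data is written */
    }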
Both jmg and markaeric seem to be assuming we will. But I don't understand why. Isn't that what the FIFO is for?
The FIFO handles streaming linear data for HW and SW clients very nicely.
Direct opcodes are for random single data transfers, and the merit of a (single) buffered write in the Direct case is that the code can continue.
I wouldn't call it entirely non-blocking, as rapid writes will have to wait.
Buffered reads are a little more complex, and need a RDREQ to set the address and a RDGET to be applied some opcodes later to pick up the result.
That's pretty much a clone of the FIFO, so the only case for this would be if the FIFO were busy doing something else more important.
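A rough software model of that split read (RDREQ/RDGET are the proposed names above, not existing instructions):

    static volatile struct {
        int      ready;     /* set by the hub side when the slot has matched */
        unsigned addr;
        unsigned data;
    } rdbuf;

    void rdreq(unsigned addr)        /* post the address, return immediately */
    {
        rdbuf.addr  = addr;
        rdbuf.ready = 0;
    }

    unsigned rdget(void)             /* some opcodes later: collect the result */
    {
        while (!rdbuf.ready)
            ;                        /* only stalls if the slot hasn't arrived */
        return rdbuf.data;
    }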
There are cases where the writes need to bypass the fifo and have priority, and there are cases where the fifo must complete its writes before allowing the bypass.
If the fifo is used for reading, and a bypass write takes place, there are cases where the fifo needs to be flushed and there are cases where the fifo doesn't/shouldn't be flushed.
This is way more complex, and becoming so by the minute.
This New Hub Scheme was supposed to be simple and easy. Clearly it's not!
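One concrete instance of the flush question, as a made-up C model just to show the hazard: if the FIFO has already prefetched a location and a bypass write then changes it, the FIFO's copy is stale unless it is flushed.

    #include <stdio.h>

    static unsigned hub[64];
    static unsigned fifo[4];
    static unsigned fifo_base;           /* hub address of fifo[0] */

    static void fifo_prefetch(unsigned base)
    {
        fifo_base = base;
        for (int i = 0; i < 4; i++)
            fifo[i] = hub[base + i];     /* FIFO reads ahead of the cog */
    }

    int main(void)
    {
        hub[10] = 1;
        fifo_prefetch(8);                /* FIFO now holds copies of hub[8..11] */

        hub[10] = 2;                     /* bypass (direct) write lands in hub  */

        /* Without a flush, the streamed read of location 10 is stale: */
        printf("hub[10]=%u  fifo copy=%u\n", hub[10], fifo[10 - fifo_base]);
        return 0;
    }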
The FIFO handles streaming linear data for HW and SW clients very nicely.
Direct opcodes are for random single data transfers, and the merit of a (single) buffered write in the Direct case is that the code can continue.
I wouldn't call it entirely non-blocking, as rapid writes will have to wait.
Buffered reads are a little more complex, and need a RDREQ to set the address and a RDGET to be applied some opcodes later to pick up the result.
That's pretty much a clone of the FIFO, so the only case for this would be if the FIFO were busy doing something else more important.
So you want a one-entry FIFO even for direct reads and writes? But what's it for?
It seems very wasteful to make even a simple Read require two instructions to execute.
Ross.
There are cases where the writes need to bypass the fifo and have priority, and there are cases where the fifo must complete its writes before allowing the bypass.
If the fifo is used for reading, and a bypass write takes place, there are cases where the fifo needs to be flushed and there are cases where the fifo doesn't/shouldn't be flushed.
This is way more complex, and becoming so by the minute.
This New Hub Scheme was supposed to be simple and easy. Clearly it's not!
It's faster, and the FIFO SW path clearly has caveats, so it needs to be used with caution.
The real scrambles can come where the same data areas are being accessed, but that is rare, and is really operator error.
More interesting are the subtle details of allowing HUB SW access (via Direct) and hardware streaming.
Some nice performance and control can result, but there are little fish-hooks, like the one I gave in #761.
If SW flukes 16c-spaced access to the same LSN every rotate, then SW-trumps-HW could stall the fill forever?
For HW streaming, it may be better if HW (FIFO) trumps SW (Direct).
Borrowing from serial I/O protocols, bypassing the FIFO can occur where software (XON/XOFF) handshaking is in effect. You don't want to buffer outgoing handshaking characters when the input buffer is threatening overflow: they need to be sent at once. So there are likely to be cases outside the serial I/O realm where the same is true. Therefore I agree that direct reads/writes should bypass the FIFO in the P2 by default.
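The serial analogy above, sketched in C with invented names: ordinary bytes queue behind the TX FIFO, but a handshake byte such as XOFF takes a bypass path and wins the next transmit slot.

    #include <stdint.h>

    #define XOFF 0x13

    static uint8_t  tx_fifo[64];
    static unsigned tx_head, tx_tail;
    static int      urgent_valid;
    static uint8_t  urgent_byte;

    void send_queued(uint8_t c)          /* ordinary data: waits its turn */
    {
        tx_fifo[tx_head & 63] = c;
        tx_head++;
    }

    void send_urgent(uint8_t c)          /* e.g. send_urgent(XOFF): bypasses the FIFO */
    {
        urgent_byte  = c;
        urgent_valid = 1;
    }

    int next_tx_byte(void)               /* transmitter picks the next byte to send */
    {
        if (urgent_valid) {              /* bypass path always wins the slot */
            urgent_valid = 0;
            return urgent_byte;
        }
        if (tx_tail != tx_head) {
            uint8_t c = tx_fifo[tx_tail & 63];
            tx_tail++;
            return c;
        }
        return -1;                       /* nothing pending */
    }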
Therefore I agree that direct reads/writes should bypass the FIFO in the P2 by default.
I would agree for the SW-using-the-FIFO case, but in the DMA (hardware using the FIFO) instances, the FIFO is not available for SW, and here it is best if DMA trumps SW direct access, should a slot allowing both occur.
Do I have this Block Diagram sort of correct?
The FIFO can be set to be used in one direction only, either Read from Hub, or Write to HUB?
That looks right.
A thought: It may not be wise to step in front of the FIFO on a needed hub slot, because we could easily disrupt its operation enough that it starves on reads or overflows on writes. I think we either wait for the needed slot when the FIFO is not using it, or turn off the FIFO before doing direct reads/writes. Probably the former is best. In 98% of cases, a free slot of interest would be coming up very soon. Only if the FIFO is set for 100% throughput would you not get a free slot, in which case interrupting it would certainly cause it to err.
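A quick numeric model of that 98%/100% point (the duty-cycle model and the numbers are invented, purely illustrative): below full throughput a free slot of interest turns up within a rotation or so, while at fSys/1 streaming it never does.

    #include <stdio.h>

    /* Returns how many clocks a direct op waits for a slot the FIFO isn't
       using; -1 means never (the FIFO owns every slot). Crude model only. */
    int clocks_until_free_slot(unsigned want_nibble, unsigned fifo_every_n)
    {
        for (unsigned clock = 0; clock < 1000; clock++) {
            unsigned slot      = clock & 0xF;
            int      fifo_busy = (clock % fifo_every_n) == 0;

            if (slot == want_nibble && !fifo_busy)
                return (int)clock;
        }
        return -1;
    }

    int main(void)
    {
        printf("fSys/2 streaming: free slot after %d clocks\n",
               clocks_until_free_slot(0x5, 2));
        printf("fSys/1 streaming: free slot after %d clocks (-1 = never)\n",
               clocks_until_free_slot(0x5, 1));
        return 0;
    }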
A thought: It may not be wise to step in front of the FIFO on a needed hub slot, because we could easily disrupt its operation enough that it starves on reads or overflows on writes. I think we either wait for the needed slot when the FIFO is not using it, or turn off the FIFO before doing direct reads/writes. Probably the former is best. In 98% of cases, a free slot of interest would be coming up very soon. Only if the FIFO is set for 100% throughput would you not get a free slot, in which case interrupting it would certainly cause it to err.
Are you talking about SW or HW (DMA/video streaming) use of the FIFO here?
I think those different use cases need different priority rules.
A thought: It may not be wise to step in front of the FIFO on a needed hub slot, because we could easily disrupt its operation enough that it starves on reads or overflows on writes.
Ugh. You're right. Any chance that competition between a direct op and a FIFO op could defer to the FIFO, allowing the direct op to complete on the next opportunity? (Of course that would typically mean waiting for the FIFO to flush, I s'pose.)
If we can get non-blocking WRx, then that's fine, as that's what I'm after. I just figured we could use the FIFO, as much of the logic is likely already there.
Is that how it works? I guess I missed that.
Not sure why that would need to be the case, just as it's not for WRx. I guess it's not really a big deal though.
-Phil
I think Chip means that direct read/writes would stall the FIFO, which would then resume operation after the direct read/write has completed.
This will make things a bit confusing if the same addresses are involved. I foresee very interesting times ahead trying to debug programs on this new chip!
Ross.
It could step in front of the FIFO when the needed slot came up.
The Rotate scheme means they might both want the same slot, if the lower 4 bits of their addresses match.
There, some decision is needed on whether HW trumps SW, or vice versa.
At fSys/1 there are no spare slots anyway, but at fSys/2 and slower, there is some slack.
If SW flukes 16c-spaced access to the same LSN every rotate, then SW-trumps-HW could stall the fill forever?
-Phil
From the software side, it means the opcode executes and execution then continues; in the example of a WR_Direct, a simple single buffer (as in a classic UART) stores the value and waits for the matching Slot.
The advantage is software can proceed and do other stuff, and so fill the available (Rotate+LSN)-determined fSys cycles.
SW that tries to run faster will block on the second write, as the buffer is not yet empty.
Mainly, it gives a means to work 'somewhat more' like a P1, and 'interlace' HUB writes with opcodes.
-Phil
Why would we want non-blocking direct read/writes?
Ross.
-Phil
Mainly for deterministic loops that incorporate random reads/writes every 16 clocks or more.
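For what that case might look like, reusing the one-deep write buffer sketched earlier (the helper names are placeholders): as long as the loop body spends 16 or more clocks between writes, the previous write's slot has always come and gone, so the buffered write never stalls and the loop timing stays deterministic.

    /* Assumes wr_direct_buffered() from the earlier sketch. */
    extern void wr_direct_buffered(unsigned addr, unsigned data);
    extern void sixteen_clocks_of_other_work(void);   /* placeholder */

    void deterministic_loop(const unsigned *addrs, const unsigned *vals, unsigned n)
    {
        for (unsigned i = 0; i < n; i++) {
            wr_direct_buffered(addrs[i], vals[i]);    /* posts and returns at once */
            sixteen_clocks_of_other_work();           /* slot rotates past meanwhile */
        }
    }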
Both jmg and markaeric seem to be assuming we will. But I don't understand why. Isn't that what the FIFO is for?
Ross.
Direct opcodes are for random single data transfers, and the merit of a (single) buffered write in the Direct case is that the code can continue.
I wouldn't call it entirely non-blocking, as rapid writes will have to wait.
Buffered reads are a little more complex, and need a RDREQ to set the address and a RDGET to be applied some opcodes later to pick up the result.
That's pretty much a clone of the FIFO, so the only case for this would be if the FIFO were busy doing something else more important.
There are cases where the writes need to bypass the fifo and have priority, and there are cases where the fifo must complete its writes before allowing the bypass.
If the fifo is used for reading, and a bypass write takes place, there are cases where the fifo needs to be flushed and there are cases where the fifo doesn't/shouldn't be flushed.
This is way more complex, and becoming so by the minute.
This New Hub Scheme was supposed to be simple and easy. Clearly it's not!
So you want a one-entry FIFO even for direct reads and writes? But what's it for?
It seems very wasteful to make even a simple Read require two instructions to execute.
Ross.
It's faster, and the FIFO SW path clearly has caveats, so it needs to be used with caution.
The real scrambles can come where the same data areas are being accessed, but that is rare, and is really operator error.
More interesting are the subtle details of allowing HUB SW access (via Direct) and hardware streaming.
Some nice performance and control can result, but there are little fish-hooks, like the one I gave in #761.
If SW flukes 16c-spaced access to the same LSN every rotate, then SW-trumps-HW could stall the fill forever?
For HW streaming, it may be better if HW (FIFO) trumps SW (Direct).
The FIFO can be set to be used in one direction only, either Read from Hub, or Write to HUB?
This is not operator error. It is a design flaw.
Ross.
-Phil
That looks right.
A thought: It may not be wise to step in front of the FIFO on a needed hub slot, because we could easily disrupt its operation enough that it starves on reads or overflows on writes. I think we either wait for the needed slot when the FIFO is not using it, or turn off the FIFO before doing direct reads/writes. Probably the former is best. In 98% of cases, a free slot of interest would be coming up very soon. Only if the FIFO is set for 100% throughput would you not get a free slot, in which case interrupting it would certainly cause it to err.
Are you talking about SW or HW (DMA/video streaming) use of the FIFO here?
I think those different use cases need different priority rules.
Ah, I'm thinking hardware. Software DMA is going to be done within a few clocks, usually, getting the FIFO out of the picture.
-Phil