Streamer WFBYTE vs fast block move
SaucySoliton
My NTSC input program streams ADC data from scope mode to the hub. I was experimenting with oversampling the video, because why not? It appears that streamer WFWORD and WFBYTE don't use the hub as efficiently as WFLONG.
Clock frequency is 280MHz.
The processing code on the same cog uses fast block moves to
read 12,273,000 Longs per second, 16 longs at a time
and write 3,068,250 Longs per second, 4 longs at a time.
Block moves are used mostly because the FIFO is already in use by the streamer, and because I want a useful video decoder in one cog.
The streamer is feeding the FIFO with data from the scope filter at these rates:
12,273,000 Longs per second: OK (data added to FIFO about every 22.8 clocks)
24,546,000 Words per second: OK (data added to FIFO about every 11.4 clocks??)
49,092,000 Bytes per second: Not OK. (data added to FIFO about every 5.7 clocks???) The processing code is not keeping up. That is odd, because the total data rate from the streamer should be unchanged compared to the previous two cases. This leads me to believe that bytes are not combined into longs before being written to the hub.
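For anyone checking the numbers, here is a minimal Python sketch of the arithmetic behind the three rates above. It only uses the 280MHz clock and the sample rates quoted in the post; the names (`rates`, `clocks_per_sample`, `bytes_per_second`) are made up for illustration.

```python
CLOCK_HZ = 280_000_000  # system clock quoted in the post

# Streamer sample rates quoted in the post: (samples per second, bytes per sample)
rates = {
    "long": (12_273_000, 4),
    "word": (24_546_000, 2),
    "byte": (49_092_000, 1),
}

def clocks_per_sample(samples_per_second):
    """Average number of system clocks between samples entering the FIFO."""
    return CLOCK_HZ / samples_per_second

def bytes_per_second(samples_per_second, bytes_per_sample):
    """Total data rate the streamer produces at this sample size."""
    return samples_per_second * bytes_per_sample

for name, (rate, size) in rates.items():
    print(f"{name}: {clocks_per_sample(rate):.1f} clocks/sample, "
          f"{bytes_per_second(rate, size):,} bytes/s")
```

All three cases come out at the same byte rate, so if the FIFO packed bytes into longs, the hub traffic should look identical in each case.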
Comments
Does it help if you experiment with the number of reads/writes in each block transfer burst? They seem fairly small at 16 and 4, so requesting larger block transfers might help if the setup overhead gets spread out (amortized) over a longer burst, for writes in particular. Could that get you over the line for the bandwidth you require, I wonder?
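A rough way to picture the amortization idea, as a Python sketch. The setup cost here (`SETUP_CLOCKS`) is a purely hypothetical placeholder, not a measured P2 figure, and the model assumes one long per clock once a burst is running.

```python
SETUP_CLOCKS = 10  # HYPOTHETICAL fixed cost per burst, not a measured P2 value

def clocks_per_long(burst_len, setup_clocks=SETUP_CLOCKS):
    """Average clocks per long, assuming one long per clock after setup."""
    return (setup_clocks + burst_len) / burst_len

for burst_len in (4, 16, 32, 64):
    print(f"burst of {burst_len:2d} longs: "
          f"{clocks_per_long(burst_len):.2f} clocks per long")
```

Whatever the real overhead is, the per-long cost drops as the burst gets longer, which is why the 4-long writes look like the first thing to try enlarging.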
Yet more details for Chip still to reveal.
So, the FIFO would need to write single bytes to the hub to make this 20 clock deadline. If the bytes were combined into words/longs, how long should the FIFO wait before writing a single byte? Even if it did have some feature to combine bytes, I'm writing a bit too slowly to fill a long and have it written within 20 clocks.
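To put numbers on that, a small Python sketch using the 280MHz clock and the byte rate from the original post, with the 20 clock figure taken from the comment above. The variable names are just for illustration.

```python
CLOCK_HZ = 280_000_000    # system clock from the original post
BYTE_RATE = 49_092_000    # streamer byte rate from the original post
DEADLINE_CLOCKS = 20      # deadline figure mentioned in the comment above

clocks_per_byte = CLOCK_HZ / BYTE_RATE
clocks_to_fill_word = 2 * clocks_per_byte
clocks_to_fill_long = 4 * clocks_per_byte

print(f"one byte every {clocks_per_byte:.1f} clocks")
print(f"word full after {clocks_to_fill_word:.1f} clocks")
print(f"long full after {clocks_to_fill_long:.1f} clocks")
print("long fits in deadline:", clocks_to_fill_long <= DEADLINE_CLOCKS)
```

At this rate a word fills in time but a full long does not, so a FIFO that waited for a complete long could not meet a 20 clock deadline.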
Another part of the problem could be that when writing bytes the FIFO will use the same hub slice several times in a row, stalling the block move for a very long time.
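A quick Python sketch of why byte writes would keep landing on the same slice. It assumes the usual description of the 8-cog P2 hub, where consecutive longs are spread across 8 slices so the slice is picked by address bits [4:2]; `hub_slice` is a made-up helper name, and this says nothing about actual arbitration timing.

```python
SLICES = 8  # assumption: 8-cog P2, one hub RAM slice per cog

def hub_slice(byte_addr):
    """Slice holding this byte, assuming consecutive longs rotate through slices."""
    return (byte_addr >> 2) % SLICES

# 16 sequential byte writes versus the same 16 bytes written as 4 longs
byte_writes = [hub_slice(addr) for addr in range(0, 16)]
long_writes = [hub_slice(addr) for addr in range(0, 16, 4)]

print("byte writes hit slices:", byte_writes)
print("long writes hit slices:", long_writes)
```

Under that assumption, each group of four byte writes targets one slice before moving on, while long writes step to a new slice every time.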
Reminds me: Back in the day when dialup modems got compression features added they sent the first character of a string straight through uncompressed, with the remaining arriving later. It worked well for BBSes and remote terminals where you were either typing one character at a time or large strings like display updates or file transfers. But it caused havoc in industrial control applications where everything was in round-robin short strings.