Streamer WFBYTE vs fast block move

SaucySoliton · 2020-07-22 14:29

My NTSC input program streams ADC data from scope mode to the hub. I was experimenting with oversampling the video, because why not? It appears that the streamer's WFWORD and WFBYTE modes don't use the hub as efficiently as WFLONG.

Clock frequency is 280 MHz.
The processing code on the same cog uses fast block moves to
read 12,273,000 longs per second, 16 longs at a time,
and write 3,068,250 longs per second, 4 longs at a time.
I use fast block moves mostly because the FIFO is already in use by the streamer, and I want a useful video decoder in a single cog.
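As a quick check of those numbers, the read and write rates work out to the same burst rate: one 16-long read burst and one 4-long write burst roughly every 365 clocks at 280 MHz. The sketch below only does that arithmetic; the equal rates suggest one read burst and one write burst per loop iteration, but that is an inference, not something stated in the post.

```python
CLOCK_HZ = 280_000_000

READ_LONGS_PER_S = 12_273_000   # fast block reads from the post
READ_BURST = 16                 # longs per read burst
WRITE_LONGS_PER_S = 3_068_250   # fast block writes from the post
WRITE_BURST = 4                 # longs per write burst

read_bursts_per_s = READ_LONGS_PER_S / READ_BURST
write_bursts_per_s = WRITE_LONGS_PER_S / WRITE_BURST

# Clock budget between consecutive bursts of each kind.
clocks_per_read_burst = CLOCK_HZ / read_bursts_per_s
clocks_per_write_burst = CLOCK_HZ / write_bursts_per_s

print(f"read bursts/s:  {read_bursts_per_s:,.1f}")
print(f"write bursts/s: {write_bursts_per_s:,.1f}")
print(f"clocks between bursts: {clocks_per_read_burst:.2f}")
```

Both burst rates come out at 767,062.5 per second, so everything the cog does per iteration (two block moves plus the decoding itself) has to fit in about 365 clocks.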

The streamer is feeding the FIFO with data from the scope filter at these rates:
12,273,000 longs per second: OK (data added to the FIFO every ~22.8 clocks)
24,546,000 words per second: OK (data added to the FIFO every ~11.4 clocks?)
49,092,000 bytes per second: Not OK (data added to the FIFO every ~5.7 clocks?). The processing code is not keeping up. That is odd, because the total data rate from the streamer should be unchanged compared to the previous two cases: it is 49,092,000 bytes per second in all three. This leads me to believe that bytes are not combined into longs for writing.
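The three rates can be compared directly. The sketch below checks that the byte rate is identical in all three modes and, under the hypothesis being discussed (one hub write per streamer element, no packing into longs; an assumption, not documented behaviour), how many hub write slots each mode would consume.

```python
CLOCK_HZ = 280_000_000

# Streamer rates from the first post: (mode, elements per second, bytes per element)
MODES = [
    ("WFLONG", 12_273_000, 4),
    ("WFWORD", 24_546_000, 2),
    ("WFBYTE", 49_092_000, 1),
]

def mode_stats(elems_per_s, bytes_per_elem, clock_hz=CLOCK_HZ):
    bytes_per_s = elems_per_s * bytes_per_elem
    clocks_per_elem = clock_hz / elems_per_s
    # Hypothesis under discussion: one hub write per element,
    # i.e. the FIFO does not pack bytes or words into longs.
    hub_write_share = elems_per_s / clock_hz
    return bytes_per_s, clocks_per_elem, hub_write_share

for mode, rate, size in MODES:
    bps, cpe, share = mode_stats(rate, size)
    print(f"{mode}: {bps:,} B/s, one element per {cpe:.2f} clocks, "
          f"hub writes on {share:.1%} of clocks")
```

If that hypothesis holds, WFBYTE needs four times as many hub write slots as WFLONG (about 17.5% of all clocks versus 4.4%) to move the same number of bytes, which would fit the symptom.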

rogloh · 2020-07-23 03:22

So you are basically saying that if you configure the streamer to write ADC data to the hub as bytes rather than as longs or words, it starves your cog of more block read/write bandwidth, when that same cog is also reading the sampled data back for further processing and then block-writing the results back to the hub. Have I got that right?

If you experiment with the number of reads/writes in each block-transfer burst, does that help? At 16 and 4 they seem fairly small, so requesting larger block transfers might help by spreading (amortizing) the setup overhead over a longer burst, writes in particular. Could doing that get you over the line for the bandwidth you require, I wonder?
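The amortization argument can be sketched with a simple cost model: a fixed setup cost per block move, then one long per clock once the burst is running. Both the model and the `SETUP_CLOCKS` value are assumptions for illustration, not measured P2 figures, and the model ignores any stalls caused by the streamer's FIFO traffic.

```python
# Hypothetical cost model for a fast block move: fixed setup, then
# one long per clock. SETUP_CLOCKS is a made-up placeholder value.
SETUP_CLOCKS = 10

def clocks_per_long(burst_len, setup=SETUP_CLOCKS):
    """Average clocks spent per long for a burst of burst_len longs."""
    if burst_len <= 0:
        raise ValueError("burst_len must be positive")
    return (setup + burst_len) / burst_len

for n in (4, 16, 64):
    print(f"burst of {n:2d} longs: {clocks_per_long(n):.3f} clocks/long")
```

With this placeholder overhead, a 4-long write burst costs 3.5 clocks per long while a 64-long burst costs about 1.16, which is why the short write bursts look like the better target.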

evanh · 2020-07-23 04:35

SaucySoliton wrote:

... This leads me to believe that bytes are not combined into longs for writing.

I certainly haven't tested that myself, but your guess seems likely. The docs do distinguish those modes, stating that they use WFLONG, WFWORD and WFBYTE respectively. And given that hubRAM byte access is supported, presumably the FIFO does exactly that: it writes each byte individually. That way it avoids the alternative problems of delayed flushing, stale data, and the like.

Yet more details for Chip still to reveal.

evanh · 2020-07-23 04:40

evanh wrote:

... it won't have the alternative problems of dealing with delayed flushing, stale data, and the likes.

Speaking of which, the 1/2/4-bit modes might have length granularity requirements to fill a byte.
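If the FIFO really only writes whole bytes, a 1-, 2- or 4-bit transfer would need a sample count that packs into complete bytes. The hypothetical helper below (not part of any P2 tool) shows what that rounding would look like; whether the hardware actually imposes such a requirement is untested here.

```python
def samples_per_byte(bits_per_sample):
    # Sub-byte modes discussed in the thread, plus 8-bit for comparison.
    if bits_per_sample not in (1, 2, 4, 8):
        raise ValueError("expected 1, 2, 4 or 8 bits per sample")
    return 8 // bits_per_sample

def round_up_to_whole_bytes(samples, bits_per_sample):
    """Smallest sample count >= samples that fills a whole number of bytes."""
    per_byte = samples_per_byte(bits_per_sample)
    return -(-samples // per_byte) * per_byte  # ceiling division

for bits in (1, 2, 4, 8):
    print(f"{bits}-bit: 13 samples -> {round_up_to_whole_bytes(13, bits)}")
```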

SaucySoliton · 2020-07-24 03:21

From the documentation:

If a cog has been writing to the hub via WRFAST, and it wants to immediately COGSTOP itself, a 'WAITX #20' should be executed first, in order to allow time for any lingering FIFO data to be written to the hub.

So the FIFO would need to write single bytes to the hub to meet this 20-clock deadline. If bytes were combined into words/longs, how long should the FIFO wait before writing a lone byte? And even if it did have some feature to combine bytes, I'm writing a bit too slowly to fill a long and have it written within 20 clocks.
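The "too slow to fill a long" point is simple arithmetic on the numbers from the first post. This sketch counts only the time needed to collect four bytes at the WFBYTE rate and ignores the clocks the hub write itself would take.

```python
CLOCK_HZ = 280_000_000
BYTE_RATE = 49_092_000   # WFBYTE rate from the first post
WAITX_DEADLINE = 20      # the 'WAITX #20' from the documentation quote

clocks_per_byte = CLOCK_HZ / BYTE_RATE
clocks_to_fill_long = 4 * clocks_per_byte

print(f"{clocks_per_byte:.2f} clocks per byte, "
      f"{clocks_to_fill_long:.2f} clocks to collect 4 bytes")
print("fits in deadline:", clocks_to_fill_long <= WAITX_DEADLINE)
```

At about 22.8 clocks just to gather a long's worth of bytes, a FIFO that waited to pack bytes would already miss the 20-clock window.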

Another part of the problem could be that, when writing bytes, the FIFO uses the same hub slice several times in a row (all four bytes of a long sit in one slice), stalling the block move for a very long time.
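The slice-reuse effect can be illustrated with a toy model. It assumes 8 hub slices interleaved by long address (byte address bits [4:2]), and a cog whose hub window starts at slice 0 and steps to the next-higher slice every clock, the arrangement that lets sequential longs stream at one per clock. It models a single writer walking the rotating window; it does not model the real FIFO's buffering or its priority over block moves.

```python
NUM_SLICES = 8  # assumption: hub RAM split into 8 long-interleaved slices

def slice_of(byte_addr):
    # All 4 bytes of a long (and both of its words) land in the same slice.
    return (byte_addr >> 2) % NUM_SLICES

def clocks_to_write(byte_addrs):
    """Clocks for one writer to hit each address's slice, in order.

    Assumes the window is at slice 0 on clock 0 and advances one slice
    per clock; each access uses one clock.
    """
    t = 0
    for addr in byte_addrs:
        target = slice_of(addr)
        while t % NUM_SLICES != target:
            t += 1  # wait for the rotating window to reach the slice
        t += 1      # perform the access in this slot
    return t

SPAN = 32  # bytes of sample data (8 longs)
for name, step in (("longs", 4), ("words", 2), ("bytes", 1)):
    print(f"{name}: {clocks_to_write(range(0, SPAN, step))} clocks")
```

Under these assumptions the same 32 bytes take 8 clocks as longs, 72 as words and 200 as bytes, because each repeat visit to a slice costs a full 8-clock rotation.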

evanh · 2020-07-24 09:08

I remember Chip having to extend the FIFO depth at one stage when he discovered conditions where it needed more.

This reminds me: back in the day, when dial-up modems got compression features, they sent the first character of a string straight through uncompressed, with the rest arriving later. That worked well for BBSes and remote terminals, where you were either typing one character at a time or moving large strings such as display updates or file transfers. But it caused havoc in industrial control applications, where all the traffic was short strings sent round-robin.
