I think the way to do this is to eliminate one of the two hub RAM lookups. You need to look up characters and then font scanlines. The characters are contiguous in memory, but the font scanline reads are effectively random, as they are dictated by the character data. So, use SETQ+RDLONG to get a row of characters into cog registers. It only takes one clock per long that way. Then, use RDLONG/RDWORD/RDBYTE instructions to look up the font scanlines. Do it from cog-exec, not hub-exec, of course. I think you could make a fast character-based display that way.
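To make the two-stage lookup concrete, here is a rough model of it in plain Python (not P2 code). The 8-scanline glyph height, the 4-column row, the fake font contents and all function names are assumptions made up for illustration. The slice stands in for the SETQ+RDLONG block read and the per-character indexing stands in for the RDBYTE font reads.

```python
# Sketch only: models the two-stage lookup, it is not P2 assembly.
FONT_HEIGHT = 8   # assumed: 8 scanlines per glyph
COLUMNS = 4       # assumed: tiny row, just for illustration


def make_font():
    """Hypothetical font: scanline s of glyph c is simply (c + s) & 0xFF."""
    return bytes(((c + s) & 0xFF)
                 for c in range(256) for s in range(FONT_HEIGHT))


def fetch_char_row(screen, row, columns=COLUMNS):
    """Stage 1: one contiguous block read of a whole row of characters."""
    start = row * columns
    return screen[start:start + columns]


def scanline_bits(font, chars, scanline):
    """Stage 2: one random-access font read per character."""
    return [font[c * FONT_HEIGHT + scanline] for c in chars]


screen = b"P2 !"                            # one row of character codes
font = make_font()
row_chars = fetch_char_row(screen, 0)       # contiguous, so it can be burst-read
line3 = scanline_bits(font, row_chars, 3)   # font addresses depend on the data
```

The point of the split is that only stage 2 has data-dependent addresses, so only stage 2 pays the random-access cost.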
Inspired by Chip's suggestions I had another attempt at building a text driver without a frame buffer in 1 cog.
I used "SETQ+RDLONG" instructions to blast text and font data into cogram.
I also organized font tables for faster sequential flow.
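The post does not say how the font tables were reorganized. One plausible reading, offered only as an assumption, is a scanline-major layout: all glyphs' scanline 0 first, then all glyphs' scanline 1, and so on, so that everything needed for one scanline is a single contiguous block. The sizes and names below are hypothetical.

```python
# Sketch only: an assumed scanline-major font layout, not the posted driver.
FONT_HEIGHT = 8   # assumed scanlines per glyph
GLYPHS = 256      # assumed glyph count


def to_scanline_major(glyph_major):
    """Reorder font[glyph][scanline] into font[scanline][glyph]."""
    out = bytearray(len(glyph_major))
    for g in range(GLYPHS):
        for s in range(FONT_HEIGHT):
            out[s * GLYPHS + g] = glyph_major[g * FONT_HEIGHT + s]
    return bytes(out)


def scanline_slice(scanline_major, scanline):
    """Return the contiguous block holding one scanline of every glyph."""
    start = scanline * GLYPHS
    return scanline_major[start:start + GLYPHS]


# Fake font data so the reordering can be checked.
glyph_major = bytes(((g * 3 + s) & 0xFF)
                    for g in range(GLYPHS) for s in range(FONT_HEIGHT))
scanline_major = to_scanline_major(glyph_major)
slice5 = scanline_slice(scanline_major, 5)
```

With this layout the block for the current scanline could be fetched in one sequential read, and the per-character lookup then becomes an index into local memory.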
In addition to these tweaks I also precalculated pixel masks and stored them in LUT.
The target was 40 columns (32 was achieved).
The killer is the pixel conversion stuff. Silicon at x2 speed would obviously be OK.
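For anyone wondering what the pixel conversion with precalculated masks might look like, here is a minimal Python sketch. It assumes 1bpp font data, 8-bit pixels and a 16-entry table indexed by a 4-bit nibble. The actual mask format used in the posted code is not described, so the bit order and names here are hypothetical.

```python
# Sketch only: assumed mask format, not the one used in the posted driver.
def build_mask_table():
    """For each 4-bit nibble, set byte i of a 32-bit mask if bit i is set."""
    table = []
    for nibble in range(16):
        mask = 0
        for bit in range(4):
            if nibble & (1 << bit):
                mask |= 0xFF << (8 * bit)
        table.append(mask)
    return table


MASKS = build_mask_table()   # stands in for the masks stored in LUT


def expand_scanline(font_byte, fg, bg):
    """Turn 8 font bits into two 32-bit words holding four 8-bit pixels each."""
    fg4 = fg * 0x01010101    # foreground colour replicated into 4 bytes
    bg4 = bg * 0x01010101    # background colour replicated into 4 bytes
    words = []
    for nibble in (font_byte & 0x0F, font_byte >> 4):
        m = MASKS[nibble]
        words.append((fg4 & m) | (bg4 & ~m & 0xFFFFFFFF))
    return words


pixels = expand_scanline(0b10100101, fg=0x0F, bg=0x01)
# pixels == [0x010F010F, 0x0F010F01]
```

Even with the table, each character still costs a lookup plus an AND/OR merge per nibble, which fits the observation that the conversion is the expensive part.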
I also added a test to see the side effects of shared LUT/streamer operation.
As predicted the "glitch" is visible.
Experimenting with frequency of access and access length had mixed results.
Maybe some synchronization using "COGATN" might be able to disguise the glitch.
Anyhow here's the code.
Maybe some fresh eyes can see where more gains can be made.
Does this mean two COGS, or live LUT changes?
For the two COG case, can a carefully interleaved access solve the side effects?
Does it mean that for an 80MHz SysCLK you limit the streamer to 40MHz, so that every second slot is available for 'other cog' access?
Not sure just how you would sync that exactly. I don't think the 2nd COG can see any of the 1st COG's streamer info, but they could maybe use a common master timestamp and pivot off that. A quick test could simply run a few char-rows at each timing.
What about two COGS 'sharing at the pins' to run two streamers in short char-wide bursts? How serious are the side effects of doing that?
The "noise test" was two cogs with the second cog writing to the first cog's LUT.
A write to LUT from the second cog while the streamer is outputting visible lines causes pixel loss (black) noise.
Dual streamers would not work as each cog has its own exclusive DAC set.
Is that what you meant by 'sharing at the pins'?
What pixel clock to sysclk ratios did you try?
For ratios other than 1:1, it seems to me you could interleave LUT access?
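The interleaving idea can be sketched as a simple slot model in Python. This is only a model of the proposal, not verified hardware behaviour. It assumes the streamer touches the LUT on exactly one system-clock cycle per pixel at a fixed phase, and that both cogs can derive that phase from a shared counter. Neither assumption is confirmed anywhere in this thread, and the function names are made up.

```python
# Sketch only: a timing model of the proposal, not measured P2 behaviour.
def lut_access_slots(ratio, cycles, streamer_phase=0):
    """Label each sysclk cycle 'streamer' or 'free' for a sysclk:pixel ratio."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1 sysclk per pixel")
    return ["streamer" if (t - streamer_phase) % ratio == 0 else "free"
            for t in range(cycles)]


def writer_can_use(t, ratio, streamer_phase=0):
    """True if the other cog could write the LUT on cycle t in this model."""
    return (t - streamer_phase) % ratio != 0


# 80MHz sysclk with a 40MHz pixel clock gives a ratio of 2.
slots = lut_access_slots(ratio=2, cycles=8)
free_cycles = [t for t, s in enumerate(slots) if s == "free"]
# free_cycles == [1, 3, 5, 7]
```

At a 1:1 ratio the model has no free slots at all, which is why the question only makes sense for other ratios.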
I had to include the image as a data file.
A 2bpp BMP image is possible, but I can't find any app that can save in that format...
Photo has some weird artifacts from camera, looks better in reality...
This code, on YCbCr (component), should perform nicely and deliver 640x400ish lines. 80 columns is definitely possible.
Want to get on my FPGA! Grrrr....
The pixel conversions are expensive! As we all thought.
Seems like changing the LUT on char boundaries is worth a look.