Bufferless text video mode?
Rayman
Posts: 14,641
in Propeller 2
Was just thinking about a text mode that was like P1, only with arbitrary colors.
Seems simple enough: just bring in, say, the Parallax font as a 1 bit per pixel array of words.
Use a screen array with one byte per character position on screen for the character number.
Use two longs per character position for foreground and background color.
But, how can you change colors between characters?
The streamer would be in 1 bpp LUT mode.
How do you arrange to change the first two LUT entries between characters?
Is there a way?
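To make the data layout concrete, here is a minimal Python model of what the hardware would have to do per scanline. It is only a sketch: the names (`render_scanline`, `font`, `screen_row`, `colors`) are made up for illustration, and it assumes an 8-pixel-wide cell for brevity rather than the 16-bit-wide font words mentioned above.

```python
# Hypothetical model of the proposed text mode; not P2 code.
CELL_W = 8  # assumed cell width in pixels (kept small for the demo)

def render_scanline(font, screen_row, colors, glyph_row):
    """Expand one scanline of text into a list of 24-bit RGB pixels.

    font[ch][glyph_row] -> int whose bit x is pixel x of that glyph row
    screen_row          -> character codes, one per column
    colors              -> (foreground, background) pair per column
    """
    pixels = []
    for ch, (fg, bg) in zip(screen_row, colors):
        bits = font[ch][glyph_row]
        for x in range(CELL_W):
            # 1 bpp lookup: a set bit selects foreground, a clear bit background
            pixels.append(fg if (bits >> x) & 1 else bg)
    return pixels

# Demo: a one-glyph "font" whose only row has its low four bits set.
font = {65: [0b00001111]}
line = render_scanline(
    font,
    [65, 65],
    [(0xFFFFFF, 0x000000), (0xFF0000, 0x0000FF)],
    0,
)
```

The inner lookup is exactly the 2-entry LUT the streamer would use in 1 bpp mode; the open question in this post is how to swap those two entries at every cell boundary.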
Comments
I need to get an FPGA. I've been wanting to try scanline graphics on the P2 for a long time now.
That way could also be adapted to do arbitrary graphics without a buffer, like we talked about in a different thread.
What we'd need to do it the easy way is some kind of XLUTOFF command that lets the streamer work from some LUT offset instead of LUT address 0... If that were there, then LUT could hold 256 pairs of foreground and background colors... Would allow up to 256 columns of text with arbitrary colors...
Right?
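A rough Python model of that idea, assuming the LUT is 512 longs (so 256 pairs) and that the hypothetical XLUTOFF-style offset simply selects which pair a 1 bpp pixel indexes into. `build_lut` and `lookup` are illustrative names, not real instructions.

```python
LUT_SIZE = 512  # longs in LUT RAM, i.e. room for 256 (background, foreground) pairs

def build_lut(color_pairs):
    """Pack (background, foreground) pairs into consecutive LUT entries."""
    if len(color_pairs) * 2 > LUT_SIZE:
        raise ValueError("too many color pairs for the LUT")
    lut = [0] * LUT_SIZE
    for i, (bg, fg) in enumerate(color_pairs):
        lut[2 * i] = bg       # pixel bit 0 -> background
        lut[2 * i + 1] = fg   # pixel bit 1 -> foreground
    return lut

def lookup(lut, column, pixel_bit):
    """Color for a 1 bpp pixel when the LUT origin is offset by the column's pair."""
    return lut[2 * column + pixel_bit]

lut = build_lut([(0x000000, 0xFFFFFF), (0x0000FF, 0xFFFF00)])
color = lookup(lut, 1, 1)  # column 1, foreground pixel
```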
I think the problem here is information bandwidth.
1 bpp is the lowest possible bit rate per character, but if you want to change colours on a truly random FG/BG basis at any char boundary, you now need to move 24+24 bits every 8 pixel clocks.
i.e., on average, you are now at 6 bpp for the colours alone (7 bpp once the 1 bpp pixel data is included).
Of course, real systems do not need fully random colours, so you may be able to do some compression by using a table of used colours, per line group.
This becomes a form of scan-line buffering, only you do not fill-in at the pixel level, you fill-in at the palette index level.
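The arithmetic behind those numbers, as a small Python sketch (the function name and defaults are just for illustration):

```python
def bits_per_pixel(cell_width=8, color_bits=24, pixel_bpp=1):
    """Average bandwidth when FG and BG colors change on every cell boundary.

    Returns (color overhead in bpp, total in bpp).
    """
    color_overhead = 2 * color_bits / cell_width  # FG + BG spread over the cell
    return color_overhead, color_overhead + pixel_bpp

overhead_8, total_8 = bits_per_pixel()                 # 8-pixel-wide cells
overhead_16, total_16 = bits_per_pixel(cell_width=16)  # 16-pixel-wide cells
```

With 16-pixel-wide cells the same 48 bits are spread over twice as many pixels, so the average halves to 3 bpp of colour data plus the 1 bpp of pixels.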
http://forums.parallax.com/discussion/153913/p2-vga-text-driver
I know I had a bit of a battle building the scan lines in time for the video hardware.
I remember at the time that a pixel conversion instruction would have been nice.
Since P2_Hot had twice the performance of the current P2 it's now even trickier.
In the current P2 I have made text on the fly but only in 2 colors. (See Invaders 2.0 & the even earlier PST style text driver)
Here's the Verilog code for the pixel convert idea.
Anyhow it was just an idea based on my experiences so far using/creating text stuff.
It's NTSC with 80 columns and a color per character. For VGA timing, I got max. 40 columns. But with the higher clock frequency of the real P2, 80 chars per line should be possible.
It works with the Instruction set of Nov 2015, I have not updated my P2 FPGA since then.
Andy
What I'd like to do is reproduce the P1 XGA (1024x768) text mode.
But, with arbitrary 24-bit foreground and background colors per 16x16 pixel cell.
Maybe use two cogs to alternate between each 16 pixel tall character row.
Within that row all the colors and characters are the same, so maybe can be fast.
I think two 16-pixel tall row buffers for XGA come in at ~100 kB, so not too bad.
Pixel clock is 65 MHz. With two cogs, that's 32.5 MHz each. We're at 40 MIPS per cog, so that's pretty tight. There is 30% blank time on each horizontal line, so maybe it can be done with a tight REP loop? Anyway, I think 4 cogs might be able to get it done.
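A quick Python check of those figures. The 3 bytes per pixel used for the row buffers is an assumption on my part (packed 24-bit color); it is the value that lands near the ~100 kB quoted. Blanking time is ignored here, so the real budget is a bit better than shown.

```python
PIXEL_CLOCK_HZ = 65_000_000   # XGA pixel clock from the post
COG_IPS = 40_000_000          # 40 MIPS per cog from the post

WIDTH, ROW_HEIGHT = 1024, 16
BYTES_PER_PIXEL = 3           # assumption: packed 24-bit color

# Two buffers, each one character row (16 scanlines) tall
row_buffer_bytes = 2 * WIDTH * ROW_HEIGHT * BYTES_PER_PIXEL

def instructions_per_pixel(n_cogs):
    """Instructions available per output pixel when n_cogs share the work."""
    return COG_IPS * n_cogs / PIXEL_CLOCK_HZ

budget_2 = instructions_per_pixel(2)
budget_4 = instructions_per_pixel(4)
```

So two cogs get a little over one instruction per pixel between them, and four cogs get roughly two and a half, which is why a tight loop or more cogs is needed.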
Still wonder how feasible it would be to change the LUT origin with the XINIT/XCONT or some new X instruction... Or, just have the LUT origin advance two places after every xcont...
Three COG is another possible solution point.
I think auto-advance does not really help, as you need to refill the LUT every block of lines, with that X colour-repeat set. It may allow a slightly smaller LUT, or more pixels per LUT.
This problem is similar to the HyperRAM Streamer questions I posed.
The streamer can already manage blocks of data fast, but I've not seen details about how it behaves at the edges of those blocks.
eg With HyperRAM, you need to be able to flip from Read to Write, on a clean burst boundary.
That means a write-mode streamer drives the pin, and as soon as a read-mode streamer starts, the pin direction needs to reverse.
If those can be queued, so much the better, but HyperRAM is more tolerant of hand-over gaps than Video is.
With Multi COG video and Streamer, you need a tighter, gap-less hand-over between COGs.
i.e. COG1 primes with a LUT Start and Count (and can be 1, 2, 4, ... bpp I think?)
Meanwhile COG 2 readies the next burst Start and Count, but the Streamer needs to queue that, and change over on the boundary, ideally without dropping clocks.
COG3 can give 3x the time to prepare the information, and so on....
In MCU land, some SPI ports can manage this change over better than others. Losing a clock is quite common, but some do manage clean packing.
More usual Streamer use, is to set for a whole line, and 'go', and the boundaries are less critical then.
Is there any info, or test results, on exactly how the Streamer behaves with small, closely packed bursts ?
(and commands coming from more than one COG, on a ping-pong basis ?)
Need to read in 128 longs, but this is only 1 clock each with SetQ2, right?
Common might be 4,8,12,16... pixels (in this instance, you want 16)
Next, does that UAB alignment need to change, or does this work only on 1bpp streaming (==LSB) ?
Sounds like a reasonably significant amount of logic and config registers.
Too poor to buy an FPGA, so I have to wait for silicon.
Enjoy!
Mike
I think that would cause glitches in the video output based on what the docs state.
I got all the instruction timings worked out in the Google Sheets file I've been working on:
https://docs.google.com/spreadsheets/d/1EM9LYoqcUgn0hAhzE38vLEi7-IABeD1CdLqDgICx3Hc/edit?usp=sharing
I just need to finish making descriptions for the math/logic instructions. Anything heavy will be explained in the Google Doc.
Looks like he was able to use some bits in D to set an offset in LUT for each character.
That's exactly what I think I need...
Well, this gives 16 possible sets of colors anyway. Guess that's enough to replicate P1 video somewhat...
In my old Potato text driver, I got 80 columns and did it with a single scanline RAM buffer. While this worked, racing ahead of the beam, I found I could not reliably composite (overlay) additional graphics, such as a mouse pointer. It also had sharp lower clock speed bounds and would not fail gracefully.
(I did a 2 to 4 color lookup conversion, planning for the mouse to use an unused color to always be unique to the two color chars. 2 bits per pixel waitvid.)
If a double buffer is used for the scanline RAM, all of this gets much easier.
Fast fetch char row, do hub lookups for char pixel data, overlay pointer, or sprites, etc... and there is the whole scan line to do it in.
Use the vertical scan line counter to toggle display vs fetch buffers.
Doing this is likely one cog on p2. Took two on P1 due to how expensive color lookups are.
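The buffer toggle can be modelled in a few lines of Python. This is only a sketch of the ping-pong idea: `build_line` stands in for the char fetch, font lookup, and overlay work, and all names are made up.

```python
WIDTH = 8  # tiny line width for the demo

def build_line(n):
    """Stand-in for fetching the char row and looking up pixel data for line n."""
    return [n] * WIDTH

def run_frame(n_lines):
    buffers = [build_line(0), None]  # line 0 is prepared during vertical blank
    shown = []
    for line in range(n_lines):
        display = line & 1           # low bit of the scanline counter picks the buffer
        fetch = display ^ 1          # the other buffer is free to be rebuilt
        if line + 1 < n_lines:
            # This work has the whole scanline time to finish
            buffers[fetch] = build_line(line + 1)
        shown.append(list(buffers[display]))
    return shown

frame = run_frame(4)
```

Because the display side never reads the buffer being written, overlays such as a pointer or sprites can be composited into the fetch buffer without racing the beam.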
I'm thinking that maybe using the actual P1 ROM font here might make sense...
If you want the 16-bit wide font. Since a long read takes just as long as a word read...
Just use 2bpp mode. That might reduce the number of color sets from 16 to 8, as you'd need a different color set to select between the merged pairs of characters, just like P1.
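For anyone who hasn't used the P1 ROM font: as I understand it, each long holds one 16-pixel row for an even/odd pair of characters, with the even character in the even bits and the odd character in the odd bits. The Python sketch below models that, and why each foreground/background choice costs two 4-entry palettes. The function names are made up, and the bit layout is stated from memory of the P1 manual, so treat it as an assumption.

```python
def merge_pair(even_row, odd_row):
    """Interleave two 16-bit glyph rows into one 32-bit long (assumed ROM layout)."""
    merged = 0
    for x in range(16):
        merged |= ((even_row >> x) & 1) << (2 * x)      # even char -> even bits
        merged |= ((odd_row >> x) & 1) << (2 * x + 1)   # odd char  -> odd bits
    return merged

def palette_for(ch, fg, bg):
    """4-entry palette that shows only one character of the merged pair."""
    if ch & 1:
        return [bg, bg, fg, fg]   # odd char: high bit of each 2-bit pixel decides
    return [bg, fg, bg, fg]       # even char: low bit of each 2-bit pixel decides

def decode_row(merged, ch, fg, bg):
    """Run the merged long through 2 bpp lookup with the palette for ch."""
    pal = palette_for(ch, fg, bg)
    return [pal[(merged >> (2 * x)) & 3] for x in range(16)]

even_row, odd_row = 0x00FF, 0xF00F
merged = merge_pair(even_row, odd_row)
even_pixels = decode_row(merged, 64, 1, 0)
odd_pixels = decode_row(merged, 65, 1, 0)
```

Since every color choice needs both an even and an odd palette, 16 LUT-offset slots would cover only 8 distinct foreground/background combinations.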
Yeah, Parallax font would be good 2bpp.
I've got my FPGA updated, but am currently porting a bunch of old code. Stepped away a bit too long.
Might just rewrite.
The slowest sweeps are 640x480, and that same mode interlaced is the slowest. Did that a while back and many displays deinterlace for free. But 640x480 can do nice text.
I'll bet there is time on P2 with a full scanline buffer, using a quick row fetch like Chip suggested.
Cog indexing is fast now and there is the LUT. It's probably best to keep scanline buffers in COG. 2bpp is nice and small.
You only need 160 bytes per line that way, and you could also just convert one color fonts to two color via script or at runtime too.
That leaves a color for the pointer, should you drop one in.
If I were to attempt it right now, that is what I would try.
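For reference, the 160 bytes comes from 640 pixels at 2 bpp, and the one-color to two-color font conversion is just a bit spread. A minimal Python sketch, with hypothetical function names:

```python
def bytes_per_line(width_px, bpp):
    """Scanline buffer size in bytes."""
    return width_px * bpp // 8

def expand_1bpp_to_2bpp(row_bits, width):
    """Spread each 1 bpp font bit into a 2-bit pixel with value 0 or 1.

    Pixel values 2 and 3 are never produced, so they stay free for a pointer.
    """
    out = 0
    for x in range(width):
        out |= ((row_bits >> x) & 1) << (2 * x)
    return out

line_bytes = bytes_per_line(640, 2)
expanded = expand_1bpp_to_2bpp(0b1011, 4)
```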
The challenge could be getting that careful interleave and keeping it?
I wonder about 4bpp and some combination of the nib and mux instructions too.
Drive it 1bpp, use an interrupt to directly modify color entries in the LUT on char boundaries. I did test this early on. When Chip double buffered the streamer I did not test again. May not work now.
Could be two cog, shared LUT scenario too.
A buffer change should not break it, but it may shift the load alignment.
How many bpp did you test with ?
I think the write needs to be either
Atomic (i.e. a 32b write can update 16 FG, 16 BG in one clock edge),
or maybe 2 writes can be carefully ordered, based on the current-pixel content, so you avoid swap of an about to read pixel.
Along the lines of That is one test, for choice of one of two write orders, inside the interrupt ?
INT update of 2bpp I think gets harder, could work for an 8b palette ?
But, can't be two cogs, right? Each cog has a streamer tied to its own 4 I/O pins.
That means you could time-share 2-3-4+ COGS streamers onto pins, which I've assumed was possible.
The next question is, how seamless can that time-share be made ?
I'm hoping Chip just buffered commands. If so, this should still work.
As for writes, the LUT read for the pixel should happen, and once it does, changing it would affect a future pixel.
The signal cog does the streaming to display, and maybe char row fetch on blank too.
Either cog can modify shared LUT values.
The graphics cog does hub fetches for char values to load line buffer and can start this in the blank after char row arrives to be ahead of the signal cog.
BTW, char values are the same, so they can be buffered too. Not sure that gets us anything though.
Streamer on graphics cog goes unused. It's fetching char and attribute values from HUB to get pixel data addresses, and looking up colors from the attribute values, used as addresses. The interrupt will need to get those, maybe small ring buffer.
Hopefully, the signal cog can have the interrupts for color changes.
Put line buffers, char row in LUT. One buffer should work. Two would work at higher sweep frequencies. Should fit at 320 bytes per line.
This is basically P1 style, but with the shared LUT and pointers. It's gonna work much faster.
Instead, think I'll try a 2 bpp screen array. At 1024x768 that's ~200 kB.
I'll use the P1 ROM font and Andy's trick to set LUT offsets.
Just need a byte array of color choices for each 16x16 tile.
This sounds easy and can allow arbitrary graphics over full screen (with just a restriction on colors).
Or, maybe a long array of 4 colors for each tile so that tile colors can be arbitrary.
Don't need the LUT offset trick in this case.
RFAST should be able to get those four longs super fast...
I get ~50kB for color array.
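Checking the memory numbers for the 2 bpp screen array and the two per-tile color table options mentioned above, in Python (constant names are just for illustration):

```python
SCREEN_W, SCREEN_H = 1024, 768
TILE = 16          # 16x16 pixel tiles
BPP = 2
COLORS_PER_TILE = 4
BYTES_PER_LONG = 4

screen_bytes = SCREEN_W * SCREEN_H * BPP // 8           # full 2 bpp bitmap
tiles = (SCREEN_W // TILE) * (SCREEN_H // TILE)         # 64 x 48 tiles
byte_table_bytes = tiles                                # one color-set index per tile
long_table_bytes = tiles * COLORS_PER_TILE * BYTES_PER_LONG  # four longs per tile
```

That gives 196,608 bytes for the bitmap, 3,072 bytes for a byte-per-tile color index, and 49,152 bytes for four arbitrary colors per tile, matching the ~200 kB and ~50 kB estimates.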
That gives me 8 slots for color sets that I can cycle through.
This avoids the buffered streamer command issue, I think...
It's cool we have enough RAM now to make that kind of tradeoff.
One other thing you might consider, if you don't need a full screen bitmap would be to just partial buffer the bitmap.
Just make vertically small region, couple of rows and have a text COG render into them ahead of the raster.
1024x64 or something.
But it looks like arbitrary tile colors aren't going to be possible with one cog.
Looks like it's going to be limited to 16 color options with the LUT offsets.
But, maybe arbitrary tile colors can be done with second cog with shared LUT.