In my other recent thread about 2MB hub RAM running PropGCC code on a P1V, a question was asked about what hardware I was using so here are some more juicy details of what I've been playing with. I decided to start another thread on this as there is quite a bit to show and tell.
Brief background: Over many years I've been fascinated with getting video generation directly out of embedded microcontrollers, and I've developed quite a collection of little home-grown systems using discrete TTL logic, AVRs, Propellers etc. with tightly timed software to do this. But there was always something lacking: limitations in resolution, color depth, memory bandwidth, application memory size etc. I forever wanted something faster/better that I could still mostly create myself (no RasPi's for me!). When the P1V came out I knew there was still hope for me if I moved from software back into FPGAs again...
So in the last few months I've designed/assembled/tested a handy little SRAM memory expansion PCB that fits neatly under the BeMicro MAX 10 FPGA board (Tubular and Ozpropdev each assisted a little bit too along the way, thx guys). This PCB probably suits several other BeMicro boards too like the CV, maybe even their new A9, as they each seem to use a common high density (MEC6) connector interface with basically the same pinout.
My PCB supports up to 2MB of 32 bit wide RAM (byte addressable) using two 16 bit TSSOP-II devices. It connects to every single pin the FPGA brings out on the BeMicro MAX10 MEC6 connector, giving maximum memory expansion without any bus multiplexing.
The memory fitted on my particular board is high speed SRAM rated for 8ns access at 3.3V. With bus turnaround accounted for I am able to run my board with an 80MHz P1V propeller and get 8 independent read or write accesses per hub cycle out of it, which is ideal for what I require.
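As a rough sanity check on those numbers, the timing budget can be sketched in C. This assumes the stock P1 hub rotation of 16 system clocks carries over to this P1V build, with the 8 accesses spread evenly across it; the constant and function names are made up for illustration.

```c
#include <stdio.h>

/* Assumptions (not stated in the post): a stock P1-style hub rotation
   of 16 system clocks, divided evenly into 8 memory access slots. */
#define SYS_CLK_HZ       80000000.0
#define CLOCKS_PER_HUB   16.0
#define SLOTS_PER_HUB    8.0
#define SRAM_ACCESS_NS   8.0   /* rated access time from the post */

/* Length of one system clock in nanoseconds. */
static double sys_clock_period_ns(void)
{
    return 1e9 / SYS_CLK_HZ;
}

/* Time available to each of the 8 SRAM accesses. */
static double sram_slot_time_ns(void)
{
    return sys_clock_period_ns() * CLOCKS_PER_HUB / SLOTS_PER_HUB;
}

/* Time left in a slot for bus turnaround, setup and routing delays. */
static double sram_slot_margin_ns(void)
{
    return sram_slot_time_ns() - SRAM_ACCESS_NS;
}

void print_slot_timing(void)
{
    printf("clock %.1f ns, slot %.1f ns, margin %.1f ns\n",
           sys_clock_period_ns(), sram_slot_time_ns(), sram_slot_margin_ns());
}
```

Under these assumptions each access gets two system clocks, so the 8ns part leaves a comfortable margin for bus turnaround.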
In conjunction with this I've also designed a video rendering engine in Verilog that integrates with the P1V hub and shares this 2MB SRAM with the 3 COGs in the MAX10 FPGA. Having 2MB now is great because it allows for really good looking 800x600x15bpp double buffered graphics plus a bit extra left over for a text buffer and fonts, all without burning any internal hub RAM. To save some LEs on the 8k LE 10M08 FPGA I removed the legacy video Verilog from my P1V COGs, because there is no real need for it any more. The WAITVID instruction has also been tweaked to wait for VSYNC instead, which is handy for syncing with a screen redraw.
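The memory budget behind that claim is easy to check. The sketch below assumes 2MB means 2 x 2^20 bytes and that each 15bpp pixel occupies one 16 bit word; the text buffer size uses a hypothetical 100x37 character screen (an 8x16 cell on 800x600), which is not a figure from the post.

```c
#include <stdint.h>

#define SRAM_TOTAL_BYTES   (2u * 1024u * 1024u)  /* assumes 2MB = 2 MiB */
#define FB_BYTES_PER_PIXEL 2u                    /* 15bpp held in a 16 bit word */
#define TEXT_BYTES_PER_CHAR 4u                   /* one 32 bit long per character */

/* Bytes needed for one full frame of bitmapped graphics. */
static uint32_t fb_frame_bytes(uint32_t width, uint32_t height)
{
    return width * height * FB_BYTES_PER_PIXEL;
}

/* Bytes left in SRAM after allocating `buffers` frames. */
static uint32_t sram_bytes_left(uint32_t width, uint32_t height, uint32_t buffers)
{
    return SRAM_TOTAL_BYTES - buffers * fb_frame_bytes(width, height);
}

/* Bytes for a text screen; cols and rows here are hypothetical. */
static uint32_t text_buffer_bytes(uint32_t cols, uint32_t rows)
{
    return cols * rows * TEXT_BYTES_PER_CHAR;
}
```

Two 800x600 frames take 1,920,000 bytes, leaving 177,152 bytes, which is plenty for a text screen of that size plus font data.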
My video engine supports both attributed color text and bitmapped graphics modes (one 16 bit word per pixel, one 32 bit long per character). I'll list a few more details below.
In its text mode the per character attributes are all fully independent and include: bright/dim, flash, hidden, underline, inverse, font1/2 along with separate foreground and background color. Text mode actually uses 9 bit color (3 bits per primary color), allowing the bright/dim attributes to be achieved by simple logical shifts of the 3 bits within the overall 5 bit dynamic range per color. All this information nicely fits within the 32 bits per character allowing clean atomic updates of each character when writing to the screen memory so there are no weird flashes during video memory updates.
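To illustrate how everything fits in one long, here is one possible packing in C. The bit positions are purely hypothetical (the post does not give the real layout); the point is that 9 bits of character, two 9 bit colors and five flag bits add up to exactly 32, so a single 32 bit store updates a whole character cell. The shift amounts used for bright/dim are likewise a guess.

```c
#include <stdint.h>

/* Hypothetical layout, NOT the engine's documented format:
   bits  8:0   character code (bit 8 selects font 1 or 2)
   bits 17:9   foreground color, RGB333
   bits 26:18  background color, RGB333
   bits 31:27  attribute flags */
#define CELL_FG_SHIFT   9
#define CELL_BG_SHIFT   18
#define CELL_BRIGHT     (1u << 27)
#define CELL_FLASH      (1u << 28)
#define CELL_HIDDEN     (1u << 29)
#define CELL_UNDERLINE  (1u << 30)
#define CELL_INVERSE    (1u << 31)
#define CELL_FLAG_MASK  0xF8000000u

/* Build one 32 bit character cell from its fields. */
static uint32_t make_cell(uint32_t code9, uint32_t fg333, uint32_t bg333,
                          uint32_t flags)
{
    return (code9 & 0x1FFu)
         | ((fg333 & 0x1FFu) << CELL_FG_SHIFT)
         | ((bg333 & 0x1FFu) << CELL_BG_SHIFT)
         | (flags & CELL_FLAG_MASK);
}

/* Extract the 9 bit foreground color again. */
static uint32_t cell_fg(uint32_t cell)
{
    return (cell >> CELL_FG_SHIFT) & 0x1FFu;
}

/* Expand one 3 bit primary into the 5 bit DAC range by shifting.
   The exact shift amounts are an assumption: bright shifts by 2,
   dim by 1, so dim is half the level of bright. */
static uint32_t expand_primary(uint32_t c3, int bright)
{
    return (c3 & 7u) << (bright ? 2 : 1);
}
```

Writing `screen[i] = make_cell(...)` is then a single long write, which is what keeps updates free of partial-attribute glitches.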
Character scanlines for the fonts are selectable up to 32. There is a hardware cursor that can be set to flash, and its starting scanline is also programmable, allowing block or underscore types of cursors. Two different fonts are allowed per screen and the 9th bit of the character conveniently selects between them. You can use this feature for alternate shaped fonts such as italicized or bold text for example, and still have the rest of the attributes available.
In graphics mode you can vary the start address where pixels are read from which then allows for some scrolling / page flipping effects. The graphics engine can also do pixel doubling giving 400x300 resolution which allows 4x as many screens in the same memory and will be good for some games that don't need full resolution.
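A small C sketch of the addressing involved, assuming a linear frame buffer with one 16 bit word per pixel and no padding between lines. The R:G:B bit order inside the word and all function names are assumptions for illustration.

```c
#include <stdint.h>

#define GFX_WIDTH   800u
#define GFX_BPP_BYTES 2u

/* Pack three 5 bit primaries into a 15bpp pixel.
   The field order (R high, B low) is an assumption. */
static uint16_t px_rgb555(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)(((r & 31u) << 10) | ((g & 31u) << 5) | (b & 31u));
}

/* Byte offset of pixel (x, y) from the frame start address. */
static uint32_t px_offset(uint32_t x, uint32_t y)
{
    return (y * GFX_WIDTH + x) * GFX_BPP_BYTES;
}

/* Start address that scrolls the view down by whole lines.
   Page flipping is the same idea with a full frame as the step. */
static uint32_t scrolled_start(uint32_t base, uint32_t lines)
{
    return base + lines * GFX_WIDTH * GFX_BPP_BYTES;
}

/* How many complete frames of a given size fit in `total` bytes. */
static uint32_t screens_that_fit(uint32_t total, uint32_t w, uint32_t h)
{
    return total / (w * h * GFX_BPP_BYTES);
}
```

With 2 x 2^20 bytes this gives 2 full frames at 800x600 and 8 at 400x300, which matches the 4x figure.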
All video register control is done by using simple COGINIT calls to the unused COG IDs on the MAX10. I found this is a very convenient way to control the system. For example COGINIT(3, param1, param2) can be used to enable the video engine, set up the mode and pass an address with the parameters. It allows any COG to control video, and the hub nicely synchronizes the different COG accesses to my engine.
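For reference, on a stock P1 the COGINIT operand is a single 32 bit value: the cog ID sits in bits 2:0, bit 3 is the "start any free cog" flag, and two 14 bit fields occupy bits 17:4 and bits 31:18 (normally the code address and PAR). The sketch below packs such an operand in C. How the video engine interprets the two fields is not spelled out in the post, so `field_a` and `field_b` are hypothetical names.

```c
#include <stdint.h>

/* Build a P1-style COGINIT operand for a specific cog ID.
   field_a lands in bits 17:4, field_b in bits 31:18; each is 14 bits.
   Bit 3 is left clear so the given ID is targeted directly.
   What the engine does with the fields is an assumption left open. */
static uint32_t coginit_operand(uint32_t cog_id, uint32_t field_a,
                                uint32_t field_b)
{
    return ((field_b & 0x3FFFu) << 18)
         | ((field_a & 0x3FFFu) << 4)
         | (cog_id & 7u);
}
```

So a call in the style of COGINIT(3, param1, param2) boils down to one 32 bit write that carries 28 bits of payload to "cog" 3.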
The hub uses 3 of its 8 slots for the 3 P1V COGs to read/write SRAM/hub memory on the MAX10 board. Four of the remaining time slots get used by the video engine for reading bitmapped graphics or text.
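A quick bandwidth check shows why four slots is a natural number for the video engine. The sketch assumes a 200ns hub rotation (16 clocks at 80MHz, as on a stock P1) and the standard VESA 800x600 at 60Hz pixel clock of 40MHz; the post does not state the refresh rate, so treat that figure as an assumption.

```c
#include <stdint.h>

#define HUB_ROTATION_NS  200u  /* assumes 16 clocks at 80 MHz */
#define SLOT_BYTES       4u    /* one 32 bit access per slot */

/* Sustained bytes per second delivered by `slots` hub slots. */
static uint32_t slots_bytes_per_sec(uint32_t slots)
{
    uint32_t rotations_per_sec = 1000000000u / HUB_ROTATION_NS;
    return slots * SLOT_BYTES * rotations_per_sec;
}

/* Bytes per second consumed while scanning out 16 bit pixels. */
static uint32_t scanout_bytes_per_sec(uint32_t pixel_clock_hz)
{
    return pixel_clock_hz * 2u;
}
```

Four slots deliver 80,000,000 bytes per second, which is exactly what a 40MHz stream of 16 bit pixels consumes during the active part of a line.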
There is still one spare hub timeslot left which is handy as I plan to use it with a general purpose DMA engine. This DMA engine will support both audio sample streaming and graphics image copies (eg. sprites with transparency) in the background, giving the audio transfers priority when reading hub memory. Video memory copies will happen in the remaining hub timeslots unused by audio. I'm possibly interested in also doing short intermittent SPI or UART type transfers this way. This saves burning whole COGs for just simple UARTs etc and I can still use COGINIT with other COG IDs for controlling the other various DMA engine(s) I may add.
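Since the DMA engine is only planned, the following is just a behavioural C model of the two ideas described: fixed-priority arbitration of the spare slot (audio first, copies otherwise) and a copy that skips a transparent color. The names and the use of a single key color for transparency are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { GRANT_NONE, GRANT_AUDIO, GRANT_COPY } spare_slot_grant;

/* Decide who gets the spare hub slot this rotation.
   Audio always wins; image copies use whatever is left over. */
static spare_slot_grant arbitrate_spare_slot(int audio_pending, int copy_pending)
{
    if (audio_pending) return GRANT_AUDIO;
    if (copy_pending)  return GRANT_COPY;
    return GRANT_NONE;
}

/* Copy 16 bit pixels, leaving the destination untouched wherever
   the source equals the transparent key color. */
static void copy_keyed(uint16_t *dst, const uint16_t *src, size_t count,
                       uint16_t key)
{
    for (size_t i = 0; i < count; i++) {
        if (src[i] != key) {
            dst[i] = src[i];
        }
    }
}
```

In hardware the copy would advance one word each time the arbiter grants it the slot, so audio streaming never stalls behind a long sprite transfer.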
Once I had my SRAM board working, a carrier board that fits the BeMicro MAX 10 was then built up for my experiments, and right now it contains the RGB DAC resistors and Prop Plug interface. Ultimately I may design a full PCB for all this but for now it still suffices for my various experiments.
Propeller PORT A was mapped to all the BeMicro MAX10 board's PMOD I/O pins and to the Prop Plug UART interface. The remaining pins of PORT A are mapped to the onboard peripherals like the accelerometer and 2MB SPI flash.
I kept Propeller PORT B free and fully available for 32 bit GPIO. PORT B sits along one side of the BeMicro MAX10 IDC pin header connectors. The second IDC header side is used for the RGB video DAC pins and still contains 8 spare differential pairs for connecting to a future LVDS LCD panel, plus a few more for possibly some I2C capacitive touchscreen experiments. The expansion port obviously goes to my SRAM board. So everything seems to fit together really nicely now. All my Verilog so far with 3 COGs uses about 84% of the MAX10 10M08 device, so I still have a bit left to spare, and the remaining things I need to add should fit.
Once I got PropGCC working with this (my other thread) I quickly hacked up some simple gfx drawing code in PropGCC C to do Bresenham's lines/circles, and it works well and is already quite fast in C at 5 MIPS. Native COG PASM would be even faster, and if I get ambitious I might even put some simple line drawing and fill functionality in my sprite engine to max it out (@5Mpixel/second draw rate).
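For readers who haven't met it, this is what a Bresenham line looks like in plain C on a 16 bit per pixel frame buffer. It is a generic textbook version, not the actual PropGCC code from this project; `stride` is the number of pixels per line and there is no clipping, so the caller must keep both endpoints on screen.

```c
#include <stdint.h>
#include <stdlib.h>

/* Write one 16 bit pixel; stride is pixels per scanline. */
static void plot16(uint16_t *fb, int stride, int x, int y, uint16_t color)
{
    fb[y * stride + x] = color;
}

/* Integer-only Bresenham line from (x0, y0) to (x1, y1), inclusive.
   Works in all octants. No clipping is performed. */
static void draw_line16(uint16_t *fb, int stride,
                        int x0, int y0, int x1, int y1, uint16_t color)
{
    int dx = abs(x1 - x0);
    int dy = -abs(y1 - y0);
    int sx = (x0 < x1) ? 1 : -1;
    int sy = (y0 < y1) ? 1 : -1;
    int err = dx + dy;              /* combined error term */

    for (;;) {
        plot16(fb, stride, x0, y0, color);
        if (x0 == x1 && y0 == y1) {
            break;
        }
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }  /* step in y */
    }
}
```

The inner loop needs only adds, compares and one store per pixel, which is why it maps well to LMM C code and would map even better to COG PASM or a hardware engine.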
Attached are some photos of some of this work... apologies for the SVGA screen shot quality, my monitor is sometimes difficult to photograph with my room lighting. However in person I can assure you it all looks totally awesome and clear on my old Sony Trinitron and the 32 levels of primary color gradients are really nice and even.
Still to do:
- Add SD card to my carrier board
- Add PS/2 Keyboard and Mouse - trivial, though H/W USB would probably be nicer (maybe one day!)
- Incorporate my RDLONG WC/WRLONG WC/etc autoincrementing instructions back into the P1V Verilog to speed up LMM execution by 12.5%
- Extend DMA/sprite engine, add a few extra video enhancements allowing multiple text/gfx regions per screen, more scrolling, masking ops
- Add a 24 bit AUDIO I2S DAC - some PMOD devices are available for this which will suit me
- Use the on board 2MB flash to boot a shell and write some simple OS code to load/run expanded LMM images from SD
- Get the onboard 8MB SDRAM up and allocate it to a single COG for the really large programs