P2D2, TAQOZ and HyperRAM design decisions

Peter Jakacki · 2018-12-04 03:40

This P2 sure is keeping me busy but there are a couple of design decisions I thought I'd get some feedback on.

First off I've tested out a lot of extras for TAQOZ and I'd like to include them in the ROM but I only have maybe 12kB of ROM available. The current ROM is this:

ADDR	BYTES	FUNCTION
========================================
$FC000	1376	BOOTER
$FC560	1304	SD BOOT VARIABLES & CODE
$FCA78	1455	LMM DEBUGGER
$FD027	12195	TAQOZ
$FFFCA	54	FREE
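
As a quick sanity check on the table above, the sizes can be summed and each start address compared with the end of the previous region. This is only a minimal sketch: the end address $100000 is an assumption derived from the last row ($FFFCA + 54), and the names ROM_MAP and check_map are made up for illustration.

```python
# ROM map as listed in the table: (start address, size in bytes, function)
ROM_MAP = [
    (0xFC000, 1376, "BOOTER"),
    (0xFC560, 1304, "SD BOOT VARIABLES & CODE"),
    (0xFCA78, 1455, "LMM DEBUGGER"),
    (0xFD027, 12195, "TAQOZ"),
    (0xFFFCA, 54, "FREE"),
]


def check_map(entries, end=0x100000):
    """Return True if every region ends exactly where the next one begins.

    `end` is the assumed first address past the ROM.
    """
    for (addr, size, _), (next_addr, _, _) in zip(entries, entries[1:]):
        if addr + size != next_addr:
            return False
    last_addr, last_size, _ = entries[-1]
    return last_addr + last_size == end


# Total bytes covered by the table, including the FREE region
total = sum(size for _, size, _ in ROM_MAP)
print(check_map(ROM_MAP), total)
```

The regions are contiguous and add up to 16384 bytes, so the 54 bytes marked FREE are all that is left without compression.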

TAQOZ is made up of PASM, HUBEXEC, and 16-bit wordcode as well as a dictionary, but I have also added many extras and the dictionary has expanded too. How do I fit all that into 12k? Of course I could remove some other useless/useful code but that's not enough. However, if I take the binary compiled for testing in RAM and compress it with LZMA it compresses to around 8k. I'm taking a look at what would be required in P2ASM to extract the contents of the archive and load it into RAM. Do you think I could write a decompressor in around 2k of assembly?
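
One way to sanity-check the numbers in this question is to compress a test image on a PC and compare the result plus the decompressor against the budget. A rough sketch follows, assuming Python's standard lzma module as the compressor (the post does not say which LZMA tool was used), a 2k decompressor and a 12k budget. packed_size and fits_budget are hypothetical helpers, and the sample image is synthetic stand-in data, not the TAQOZ binary.

```python
import lzma


def packed_size(image: bytes) -> int:
    """Return the LZMA-compressed size in bytes (legacy .lzma container)."""
    return len(lzma.compress(image, format=lzma.FORMAT_ALONE))


def fits_budget(image: bytes, decompressor_bytes: int = 2048,
                budget_bytes: int = 12 * 1024) -> bool:
    """True if compressed image plus decompressor fit in the assumed budget."""
    return packed_size(image) + decompressor_bytes <= budget_bytes


# Synthetic, highly repetitive stand-in data (NOT a real ROM image)
sample = bytes(range(256)) * 64
print(len(sample), packed_size(sample), fits_budget(sample))
```

A real image will compress far less well than this sample; the point is only the bookkeeping of compressed size plus decompressor against the ROM space.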

Second, the P2D2 hardware now has a tiny 3x3mm QFN20 Busy Bee micro supervising the power, reset, DTR, and all 6 boot lines. This chip is actually initially programmed from the P2 itself and I even have an assembler written in TAQOZ too. This means it provides the brown-out and power-on reset, optional watchdog, DTR edge detection, plus it can also load up the P2 directly, even replacing the boot ROM in RAM. It might do this by reading the SPI Flash or by using a Busy Bee with larger Flash, in which case compressing TAQOZ in the ROM would not be needed.

Thirdly, I don't think it would hurt to include HyperRAM or even QSPI RAM but certainly HyperRAM is available in tiny 16M byte BGA24 packs which I could integrate into the P2D2 itself as an option. Due to the bus speed that they can handle plus what we would need them to handle they are best placed close to the P2 itself, that is, on the P2D2 which is already a very tight and compact design.

Note also that rather than relying on a multilayer PCB (although it could be manufactured that way), there is a separate thermal PCB that can be surface-mount soldered to the bottom of the P2D2, with the advantage that both sides of this PCB can have rather thick copper without restriction. However, this will only be an option as my P2D2 runs at 240MHz without any cooling at present. The thermal PCB has cutouts to allow for the microSD socket to be used on that side of the board. Also the P2D2 has a 0.2" strip on the serial end with a small reset switch and larger LEDs that can be mounted on either side of the board. Maybe the thermal PCB could be loaded with a whole stack of LEDs and buffers!

Peter Jakacki · 2018-12-04 04:09

To answer my first question it seems someone has done this in 219 bytes of Z80 assembly.

Excuse the lack of formatting.

;
;	Original algorithm by Yann Collet
;;	http://cyan4973.github.io/lz4/
;   This source is based on LZ4 explained
;;   http://fastcompression.blogspot.fr/2011/05/lz4-explained.html
;
;   Z80 assembly version by edouard.berge (at) gmail (dot) com
;
;	
;
;   As the Z80 cannot address more than 64K in a row
;   the decrunching routine DOES NOT handle relative offsets
;   bigger than 65535. You've been warned!
;
; 
; hl' compressed data address
; de' negative length of output data (set -256 to output 256 bytes)
; de  output address of data
;

LZ4_decrunch


nextsequence
exx
ld a,(hl)
inc hl
ld (lzunpacklength+1),a
exx
and #F0
jr z,lzunpack ; no literal bytes
srl a
rrca
rrca
rrca
ld b,0
ld c,a
cp 15 ; more bytes for length?
call z,getbytelength
push bc
exx
pop bc
push hl
ex hl,de
add hl,bc ; increase counter
ex hl,de
add hl,bc ; jump over literals
exx
pop hl
ldir

lzunpack
exx
ld a,d
or e
ret z ; because the last sequence is ALWAYS literals only!

; read 2 bytes offset
ld c,(hl)
inc hl
ld b,(hl)
inc hl
push bc
exx
pop hl
push de
ex hl,de
sbc hl,de
pop de

lzunpacklength ld a,#12
and #F
add 4
ld b,0
ld c,a
cp 19 ; more bytes for length?
call z,getbytelength
push bc
ldir

; update counter
exx
ex hl,de
pop bc
add hl,bc
ex hl,de
exx
jr nextsequence

; get additional length subroutine
getbytelength
exx
ld a,(hl)
inc hl
exx
cp 255
jr nz,mediumlength
inc b
dec bc
jr getbytelength

mediumlength
add a,c
ld c,a
ld a,b
adc a,0
ld b,a
; bc=length
ret
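
For checking a P2 port against known-good output, a PC-side reference decoder helps. The following is a minimal sketch of a raw LZ4 block decoder in Python, written from the public LZ4 block format rather than translated from the Z80 routine above. It stops at the end of the input instead of counting output bytes as the Z80 code does, it does not handle the LZ4 frame format, and the function name is made up for illustration.

```python
def lz4_block_decompress(src: bytes) -> bytes:
    """Decode one raw LZ4 block (no frame header, no checksums)."""
    out = bytearray()
    i = 0
    n = len(src)
    while i < n:
        token = src[i]
        i += 1

        # High nibble: literal length; 15 means extra length bytes follow
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                extra = src[i]
                i += 1
                lit_len += extra
                if extra != 255:
                    break
        out += src[i:i + lit_len]
        i += lit_len

        # The last sequence carries literals only
        if i >= n:
            break

        # 2-byte little-endian offset back into the output
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")

        # Low nibble: match length minus 4, same extension scheme
        match_len = token & 0x0F
        if match_len == 15:
            while True:
                extra = src[i]
                i += 1
                match_len += extra
                if extra != 255:
                    break
        match_len += 4

        # Byte-by-byte copy so overlapping matches repeat correctly
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])
    return bytes(out)
```

The byte-by-byte match copy plays the same role as LDIR in the Z80 version: when the offset is shorter than the match length, the bytes just written are read again, which is how runs are encoded.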

jmg · 2018-12-04 04:19

Peter Jakacki wrote: »

Thirdly, I don't think it would hurt to include HyperRAM or even QSPI RAM but certainly HyperRAM is available in tiny 16M byte BGA24 packs which I could integrate into the P2D2 itself as an option. Due to the bus speed that they can handle plus what we would need them to handle they are best placed close to the P2 itself, that is, on the P2D2 which is already a very tight and compact design.

HyperRAM would certainly be nice to have! There is also OctaRAM, which looks broadly similar (data is sparse at the moment).

If the 6x8mm BGA24 is too tight to fit, there are also SO8 QSPI PSRAM parts like IPS6404L-SQ-SPN and LY68L6400SLIT; these look to have similar refresh rules to HyperRAM.

Peter Jakacki wrote: »

Second, the P2D2 hardware now has a tiny 3x3mm QFN20 Busy Bee micro supervising the power, reset, DTR, and all 6 boot lines. This chip is actually initially programmed from the P2 itself and I even have an assembler written in TAQOZ too. This means it provides the brown-out and power-on reset, optional watchdog, DTR edge detection, plus it can also load up the P2 directly, even replacing the boot ROM in RAM. It might do this by reading the SPI Flash or by using a Busy Bee with larger Flash, in which case compressing TAQOZ in the ROM would not be needed.

If you do decide to bump the Flash size, I see that in QFN20 you can get 40kB of Flash in the EFM8UB3 (USB for free?), and QFN24 (3x3) has 32kB and 64kB Flash choices.
The UB3 pinout is a slight change, as they nudge in 2 dedicated USB pins. It also comes with a USB bootloader, which may simplify things.

jmg · 2018-12-04 04:53

Peter Jakacki wrote: »

To answer my first question it seems someone has done this in 219 bytes of Z80 assembly.

Interesting, a Google search for that takes me to

https://docs.google.com/spreadsheets/d/1385RUevHgwYEnhL7wUaVAYPej5wvg9WPlq1Zm8usNqU/edit#gid=2094062265

https://www.cemetech.net/forum/viewtopic.php?t=11406&start=0

a poster there claims
"Also, if anyone is looking for alternatives to this, check out this topic for a similar compression / decompression algorithm
http://www.cemetech.net/forum/viewtopic.php?t=11292
The decompression code is considerably smaller and it yields similar compression ratios given the same data.
The compressor is somewhat slower.
I don't know if the decompression is faster, slower, or nearly equivalent.
I prefer lz4 because it's pretty easy to install with a system package manager, but take your pick based on where your priorities are."

dlz77 code here - seems to be simple lookups, which explains why it is fast. I think 36 LOC vs 72 LOC above?

;cccc ccc0 : lits[c+1]
;bbcc ccc1 bbbb bbbb : count c+2 1024-backref


;HL=reader
;DE=writer
;BC=number of codewords
dlz77_4_main:
 ld c,(hl)
 inc hl
 ld b,(hl)
 inc hl
dlz77_4_loop:
 ld a,b
 or c  ;ensures carry is dead when rra is reached
 ret z
 push bc
   ld a,(hl)
   inc hl
   ld b,a
   rra
   inc a
   jr nc,dlz77_output_literals
   push hl
     and %00011111
     inc a
     ld c,a
     ld a,b
     or  %00111111  ;set all bits to simulate negative
     rlca
     rlca
     ld L,(hl)
     ld H,a
     add hl,de
     ld b,0
     ldir
   pop hl
   inc hl
dlz77_4_loopend:
 pop bc
 dec bc
 jr dlz77_4_loop
dlz77_output_literals:
   ld c,a
   ld b,0
   ldir
   jr dlz77_4_loopend
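
To make the two codeword forms concrete, here is a minimal Python model of the Z80 routine above, read directly from its instructions. Treat it as a sketch: the function name is made up, and the count formula mirrors the `and %00011111` / `inc a` pair, so a field of c=31 wraps to a count of 1 rather than 33, which the c+2 comment does not mention.

```python
def dlz77_decompress(src: bytes) -> bytes:
    """Model of dlz77_4_main: 16-bit codeword count, then codewords."""
    # First two bytes: number of codewords, little-endian (ld c,(hl) / ld b,(hl))
    n_codewords = src[0] | (src[1] << 8)
    i = 2
    out = bytearray()
    for _ in range(n_codewords):
        ctrl = src[i]
        i += 1
        c = ctrl >> 1                      # rra
        if ctrl & 1 == 0:
            # cccc ccc0 : copy c+1 literal bytes from the input
            count = c + 1
            out += src[i:i + count]
            i += count
        else:
            # bbcc ccc1 bbbb bbbb : back-reference
            count = ((c + 1) & 0x1F) + 1   # inc a / and %00011111 / inc a
            stored = ((ctrl >> 6) << 8) | src[i]
            i += 1
            distance = 1024 - stored       # H = %111111bb makes HL negative
            start = len(out) - distance
            if start < 0:
                raise ValueError("back-reference before start of output")
            # Byte-by-byte copy, like LDIR, so overlaps repeat
            for k in range(count):
                out.append(out[start + k])
    return bytes(out)
```

The 10-bit field therefore stores 1024 minus the distance, giving distances of 1 to 1024 bytes and match lengths of 2 to 32 bytes.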

thej · 2018-12-04 05:51

There are a few small unpackers here written for 6502/65C02.

ZX7 unpacker in 142 bytes for 65C02
LZ4 unpacker in 136 bytes for 65C02
LZ compressor/decompressor
http://pferrie.host22.com/misc/appleii.htm

j

rogloh · 2018-12-04 06:05

Hi Peter,

Please whatever you decide, keep the memory optional.

If you start to add memory / peripherals to your board it would likely start to require pins. I am designing a nice PCB to make good use of a P2D2 and I am already accessing all the P2 IO pins apart from SD/Flash! It's almost done now.

Was recently looking at that small Quad SPI Espressif PSRAM myself as a possible on-board RAM solution, while keeping the ability to add other memory types via breakouts etc. What is nice is that its SO-8 footprint is generic and allows other (Q)SPI devices etc.

Update: Yeah, maybe your thermal PCB could have the footprint for a memory device.

Peter Jakacki · 2018-12-04 06:18

@jmg - Yes, the EFM8UB3 is only around $1 and seems to be readily available in the same tiny 3x3mm QFN20 as well. Something that I might consider but then I'd need to add a micro USB connector somewhere too unless I cheat and just run to 4 pins, but only if I am desperate.

@thej - thanks for that. I'm converting the 65C02 code across as an exercise; of course it takes fewer instructions than the 65C02, although it may take more memory. We'll see.

@rogloh - The RAM will be optional and whatever 11 I/O it uses it will be just like the boot pins in that they are all totally accessible. That was the other option too since the HyperRAM is a fairly large 6x8mm package but the problem is that it would mean not having as thick a copper layer as I would like. But mind you, the RAM could have its own little pcb to stack onto the back of the P2D2.

Peter Jakacki · 2018-12-04 07:07

Ok, I've replaced my 8-pin IDC serial header with a micro B USB connector while still having room on the end strip for direct pin headers for serial. This allows for the USB version of the Busy Bee which at $1 seems quite reasonable. BTW, this chip also monitors temperature, communicates via I2C on P58/P59, and can optionally clock the P2 since its 48MHz oscillator is spec'd at 1.5%.

Also the HyperRAM will connect up to P47..P57 so that the connections are very short. I'm trying to fit it in just below the 50mil pads at P52..P57 on the bottom but otherwise it will be mounted on a thin pcb that will sit in the same spot and connect to the 50mil pads.

So it also looks like I should be able to compress TAQOZ into the ROM and when it is called it will simply run a small assembly routine to decompress the image into RAM. That lets me pack a lot into a little for all P2 chips but the P2D2 will also have the alternate booter that the Busy Bee can load in independent of the Flash or SD.

EDIT: Other changes on the P2D2 include:
* 1M pull-down on SD CS - this makes it easier to detect the card, it is either high or low, no need for a float test.
* P-channel to switch power to the microSD to bypass any weird mode - controlled from Busy Bee via I2C or every reset.

jmg · 2018-12-04 07:48

Peter Jakacki wrote: »

So it also looks like I should be able to compress TAQOZ into the ROM and when it is called it will simply run a small assembly routine to decompress the image into RAM. That lets me pack a lot into a little for all P2 chips but the P2D2 will also have the alternate booter that the Busy Bee can load in independent of the Flash or SD.

Looking at that compression code I pasted above of

;cccc ccc0 : lits[c+1]
;bbcc ccc1 bbbb bbbb : count c+2 1024-backref

they mention adjusting the ccccc / bbbbbbbbbb split in the second form to tune the compression.
On P2 code the chunks will all be 32 bits, so the counts and back-references could be in longs rather than bytes, giving 4x greater reach. It also means any 32-bit opcode already used (anywhere in the previous 1024 longs) is replaced by a 2-byte codeword, so it will compress 2:1.
A scan of the ROM code should let you find the optimal split of c/b bits for P2 opcodes.
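
The scan suggested here could start with something as simple as counting how many longs of an image have already appeared within a given window. The sketch below assumes a little-endian image whose length is padded to a multiple of 4 (trailing bytes are ignored) and only looks at single-long repeats, not longer runs; the function name is hypothetical.

```python
import struct


def repeated_long_fraction(image: bytes, window: int = 1024) -> float:
    """Fraction of 32-bit longs that also occur in the previous `window` longs."""
    n = len(image) // 4
    if n == 0:
        return 0.0
    longs = struct.unpack("<%dI" % n, image[:n * 4])
    last_seen = {}          # value -> index of its most recent occurrence
    hits = 0
    for idx, value in enumerate(longs):
        prev = last_seen.get(value)
        if prev is not None and idx - prev <= window:
            hits += 1
        last_seen[value] = idx
    return hits / n


# Tiny illustration: the long 1 repeats two positions back
demo = struct.pack("<4I", 1, 2, 1, 3)
print(repeated_long_fraction(demo), repeated_long_fraction(demo, window=1))
```

Running it with window set to 2**b for each candidate number of b bits gives a first feel for how much reach is worth trading against count bits.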

rogloh · 2018-12-04 11:37

Peter, if you are planning on compressing TAQOZ and uncompressing on boot (presumably for revB P2), I have two questions:
1) how much longer will it take to boot?
2) what extra features are you planning to add into the newly freed extra space?

Peter Jakacki · 2018-12-04 12:00

rogloh wrote: »

Peter, if you are planning on compressing TAQOZ and uncompressing on boot (presumably for revB P2), I have two questions:
1) how much longer will it take to boot?
2) what extra features are you planning to add into the newly freed extra space?

At present TAQOZ needs a terminal sequence to become active so "boot time" would seem unimportant but I want the new ROM to fall through to TAQOZ if no boot was found in the timeout period. So even if TAQOZ took a second to boot (although I doubt it would be that long), it wouldn't matter.

The extra features are to do with the SD and FAT32 by allowing files to be renamed, copied, created, loaded, and saved etc. With this, plus some standalone features where it can run a keyboard and VGA monitor, it would be possible to write code without any special tools or even a PC. If I can fit a logic analyser in there too, I will.

Nonetheless, if we had a choice to have more or have less for the same "price", which would we choose?

rogloh · 2018-12-04 12:20

Thanks Peter, sounds very interesting. I quite like the idea of automatically having the ability to fall through to TAQOZ in the case of some timeout, though it probably needs to be safely controlled via strapping/pullup options of some type to keep the now complex boot sequence predictable/sane.

If you do manage to get it to boot to a minimal dev environment with no external PC/serial console needed that would be a great achievement indeed. One tricky part might be in determining/allocating which pins to use for the KB/display to keep things flexible as this would vary with HW implementation.

potatohead · 2018-12-04 12:34

More, of course! Peter, there are basically no downsides.

Peter Jakacki · 2018-12-04 12:40

rogloh wrote: »

Thanks Peter, sounds very interesting. I quite like the idea of automatically having the ability to fall through to TAQOZ in the case of some timeout, though it probably needs to be safely controlled via strapping/pullup options of some type to keep the now complex boot sequence predictable/sane.

If you do manage to get it to boot to a minimal dev environment with no external PC/serial console needed that would be a great achievement indeed. One tricky part might be in determining/allocating which pins to use for the KB/display to keep things flexible as this would vary with HW implementation.

The boot pins are fixed and we work with that. If we have fixed VGA and PS/2 pins for TAQOZ in boot ROM then that shouldn't be a problem either. If a system config is available then it could use that, much like the clock config in Flash or SD that I've mentioned before. So if we have a bare P2 chip that we have placed on a board and for some reason we do not have a PC, we can still talk to that chip and write code for it and on it. Imagine if the majority of PCs were struck by a super StuxCryptoDoomFicker virus or a Winever update solved the problem of the BSOD and made it black instead? Maybe, maybe not, but for no cost we have an independent software development tool that doesn't need a PC to get it up and going and at the very least a real computer on a chip, locked into the chip.

David Betz · 2018-12-04 13:02

Yeah but to talk to your "real computer" you'll probably need a PC running a terminal emulator. When was the last time you saw an actual terminal for sale?

Peter Jakacki · 2018-12-04 13:40

David Betz wrote: »

Yeah but to talk to your "real computer" you'll probably need a PC running a terminal emulator. When was the last time you saw an actual terminal for sale?

? As mentioned in my many posts, I have VGA and keyboard capability built-in, that is what I have been using now for some time, running 5x7 font over a 640x480 8-bpp VGA along with PS/2 keyboard. Of course I could add a mouse too and some basic editing facility. It's much more than DOS ever did but even that needed to be booted from a disk.

David Betz · 2018-12-04 13:46

Peter Jakacki wrote: »

David Betz wrote: »

Yeah but to talk to your "real computer" you'll probably need a PC running a terminal emulator. When was the last time you saw an actual terminal for sale?

? As mentioned in my many posts, I have VGA and keyboard capability built-in, that is what I have been using now for some time, running 5x7 font over a 640x480 8-bpp VGA along with PS/2 keyboard. Of course I could add a mouse too and some basic editing facility. It's much more than DOS ever did but even that needed to be booted from a disk.

Excellent. I'll have to try that sometime. So actually, a P2 with VGA+keyboard could act as a terminal I guess. Do you actually use P2+VGA+keyboard for development?

Peter Jakacki · 2018-12-04 14:21

David Betz wrote: »

Excellent. I'll have to try that sometime. So actually, a P2 with VGA+keyboard could act as a terminal I guess. Do you actually use P2+VGA+keyboard for development?

Well yes, as I am developing the tools that is, so I still need a PC in the meantime.

But I can load/run/compile files as standalone and even assemble and program 8051 code for the helper micro. I need to write a simple editor I suppose, but I have quite a few aspects of P2 development on my plate, not just the software. Right now I am making up schematic symbols and pcb footprints for HyperRAM and integrating that into my P2D2 revision, for one.

I also have added single key commands to TAQOZ so that function keys which I represent with hex codes $F1 to $FC and any other 8th bit code will automatically search for a word with the keycode in its name in the TAQOZ dictionary. So if I define a word kFC then when I hit F12 it immediately searches the dictionary for kFC and if found executes it. In this case I just get it to list the files in short format with:

pub kFC  ls ;
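
The lookup described above can be modelled in a few lines of Python. This is only an illustrative sketch and not how TAQOZ implements it: the dictionary is a plain dict of callables, and key_word_name and dispatch_key are made-up names.

```python
def key_word_name(keycode: int):
    """Map an 8th-bit keycode to a word name, e.g. 0xFC -> 'kFC'."""
    if keycode & 0x80:
        return "k%02X" % keycode
    return None


def dispatch_key(keycode: int, dictionary: dict) -> bool:
    """Run the word bound to keycode, if any. Returns True if one was found."""
    name = key_word_name(keycode)
    word = dictionary.get(name) if name else None
    if word is None:
        return False
    word()
    return True


# Stand-in for: pub kFC  ls ;
listed = []
words = {"kFC": lambda: listed.append("ls")}
print(dispatch_key(0xFC, words), dispatch_key(0xF1, words), listed)
```

Keys with no matching word simply fall through, so defining a new function key action is just a matter of defining a word with the right name.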

While I can view bmp files standalone, I want to be able to play wav files next, which is fairly easy, especially for the P2. Then I can write some simple games such as a VGA version of the Breakout game I've had running on the P1. But I'm not really interested in the games (yeah yeah), but they are a good platform for testing software.

TonyB_ · 2018-12-06 00:00

Peter Jakacki wrote: »

TAQOZ is made up of PASM, HUBEXEC, and 16-bit wordcode as well as a dictionary, but I have also added many extras and the dictionary has expanded too. How do I fit all that into 12k? Of course I could remove some other useless/useful code but that's not enough. However, if I take the binary compiled for testing in RAM and compress it with LZMA it compresses to around 8k. I'm taking a look at what would be required in P2ASM to extract the contents of the archive and load it into RAM. Do you think I could write a decompressor in around 2k of assembly?

I've translated the ZX7 decoder (standard version) from Z80 to p2asm and it's only ~50 longs for code and registers. I think (a) most of the ROM should be compressed and (b) smaller size is more important than quicker decompression.

ZX7
http://www.worldofspectrum.org/infoseekid.cgi?id=0027996

rogloh · 2018-12-06 01:44

Decompression makes sense for packing in even more TAQOZ functionality. Hopefully it won't be too slow if running on the RC oscillator, which I am presuming it could be during boot. Keeping a one second boot response or thereabouts is nice. Hopefully it won't drag into multiple seconds or anything excessive to unpack 12kB or so.
