This is just an update on the state of Tachyon in preparation for the TAQOZ BOOT ROM version. Since booting from SD cards is highly desirable I have been spending quite a bit of time on the do's and don'ts and whatif's and finer points of SD cards in SPI mode.
Reading 64KB with single-sector reads takes 105ms, or around 618kB/second, so it would be possible to do a full 512kB load in under a second. However, I'm sure that multi-block reads would be much faster. Fully initializing the card and mounting the file system takes 125ms the first time, and that's with my optimizations. A 64GB card takes only 98ms, which is in the same ballpark. So even if the user code required a 512k load, and even with single-block reads, we can still be up and running in under a second. Final silicon will run faster, and besides multi-block reads I may even use smartpins with a software mode fallback.
TAQOZ# FOPEN WEBSTERS.TXT...opened at 0009.61C0 ok
TAQOZ# $3.0000 64 KB LAP FREAD LAP .LAP 8450497 cycles = 105.631ms ok
TAQOZ# $2.0000 256 KB LAP FREAD LAP .LAP 33435841 cycles = 417.948ms ok
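The .LAP figures above can be cross-checked with a little arithmetic. This is only a host-side sketch in Python, not TAQOZ code. It assumes an 80MHz system clock (the cycle counts and millisecond figures above are consistent with that), takes 1 KB as 1024 bytes and 1 kB/s as 1000 bytes/s, and the function names are made up for illustration.

```python
CLOCK_HZ = 80_000_000  # assumed system clock; consistent with the .LAP output


def lap_ms(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count reported by .LAP into milliseconds."""
    return cycles / clock_hz * 1000.0


def throughput_kb_s(num_bytes, cycles, clock_hz=CLOCK_HZ):
    """Average transfer rate in kB/s (1 kB = 1000 bytes) for a timed read."""
    seconds = cycles / clock_hz
    return num_bytes / 1000.0 / seconds


if __name__ == "__main__":
    # Cycle counts copied from the two single-block FREAD runs above.
    for size_kb, cycles in ((64, 8450497), (256, 33435841)):
        ms = lap_ms(cycles)
        rate = throughput_kb_s(size_kb * 1024, cycles)
        print(f"{size_kb} KB: {ms:.3f} ms, {rate:.0f} kB/s")
```

Both runs work out to a little over 600kB/second, in line with the figure quoted for single-sector reads.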
Perhaps I should start up a TAQOZ thread specifically for that version
I spent some time comparing and reusing Kye's Fat_Engine and FSRW while doing my failed RAISD project.
I am still very impressed by @Lonesock's block-driver used in FSRW. Besides using multi-block reads he implemented read ahead and write behind keeping a sector buffer in the COG.
Really enjoyable PASM code. Somewhere in it you also find @kuronekos's handwriting, a very clever way to do fast HUB-COG or COG-HUB transfers.
On a P1 this is the fastest code I found, running at over 1200kB/second on read and a respectable 980kB/second on write. The read-ahead and write-behind made a big difference compared to Kye's Fat_Engine.
It uses the SD block driver cog as a parallel running process: it reads the next sector in advance, even if not asked for, and it writes behind. For a write, the driver copies the buffer to COG RAM and reports success (it now has the content of the buffer in cog RAM), then writes it to SD while the caller prepares the next buffer in HUB.
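To make the read-ahead idea concrete, here is a minimal Python model. It is not the FSRW driver and not PASM: a single worker thread stands in for the driver cog, `read_sector` is a hypothetical blocking function supplied by the caller, and only the read-ahead half is modelled (write-behind is left out).

```python
from concurrent.futures import ThreadPoolExecutor

SECTOR_SIZE = 512


class ReadAheadReader:
    """Toy model: after serving sector n, start fetching sector n+1."""

    def __init__(self, read_sector):
        self._read_sector = read_sector  # hypothetical: sector number -> bytes
        self._pool = ThreadPoolExecutor(max_workers=1)  # stands in for the cog
        self._ahead_no = None   # sector number being prefetched
        self._ahead = None      # future holding the prefetched data
        self.hits = 0           # how often the prefetch was actually used

    def read(self, sector_no):
        if self._ahead is not None and self._ahead_no == sector_no:
            # The speculative read is already done or in flight: just collect it.
            data = self._ahead.result()
            self.hits += 1
        else:
            if self._ahead is not None:
                # Let the stale prefetch finish so only one read is active.
                self._ahead.result()
            data = self._read_sector(sector_no)
        # Speculatively fetch the next sector while the caller uses this one.
        self._ahead_no = sector_no + 1
        self._ahead = self._pool.submit(self._read_sector, sector_no + 1)
        return data

    def close(self):
        self._pool.shutdown(wait=True)
```

With sequential access every read after the first is served from the prefetch, which is where the speed-up comes from; a random access simply wastes one speculative read.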
Really cool code. Has anybody heard from @lonesock lately?
Mike
Just tried multiple block reads and the results look good:
TAQOZ# $2.0000 256 KB LAP FREAD LAP .LAP 21455057 cycles = 268.188ms ok
That's 268ms with multi-block reads vs 418ms with single-block reads for 256kB, or just shy of 1MB/sec.
So a full 512K can be loaded in just over 600ms, including mounting a fresh card, and the point is that this is at 80MHz and in bit-bash mode only. At 160MHz this will barely be noticeable, especially if the user code is a lot smaller. Next comes the smartpin mode.
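A back-of-envelope check of that boot-time figure, as a Python sketch. It assumes the 125ms first-time mount quoted earlier, takes the 268.188ms multi-block time for 256KB as the reference, and assumes load time scales linearly with size; the function and constant names are illustrative only.

```python
MOUNT_MS = 125.0          # first-time init + mount, as quoted earlier
REF_BYTES = 256 * 1024    # size of the timed multi-block read
REF_MS = 268.188          # measured time for that read at 80MHz


def boot_estimate_ms(load_bytes, mount_ms=MOUNT_MS):
    """Estimate mount + load time, assuming read time is linear in size."""
    return mount_ms + load_bytes / REF_BYTES * REF_MS


if __name__ == "__main__":
    print(f"512KB load: about {boot_estimate_ms(512 * 1024):.0f} ms")
```

Under those assumptions a full 512KB image comes out at roughly 660ms, and smaller user code loads proportionally faster.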
BTW, these SD routines aren't using a special cog, just the same one that the TAQOZ console runs from.
I'm working on getting v28 out. We have an enhanced XORO32 and a new ONES instruction which counts the '1's in a register. Also, I added more status bits to GETINT and SETBRK. Nothing big, but I've been tidying up the Verilog as I go, trying to get everything into shape for final synthesis.
How is the SD loader going to know if there actually is a crystal and what the highest PLL multiplier it can safely use is? Will it start in slow mode and then switch to fast mode after reading clkset info off of the SD card after mounting the card but before loading the program?
Everything seems to be working well on V28 so I have just been tidying up the code while getting ready to squeeze a boot ROM version of it into around 12kB or so.
One of my tests with files is to do a simple lookup, in this case the zipcode. This is only a very rudimentary lookup but suitable for a demonstration. The longest search takes about 2 seconds. The code is rough and ready and if I was doing this a lot I would create a record handling extension vocabulary to simplify doing this (as if it were a new language).
Usage: ( response in red )
TAQOZ# ZIP Knoxville GA 31050 ok
TAQOZ# ZIP Londonderry NH 03053 ok
TAQOZ# ZIP Canton NY 13617 ok
BTW, it's an old ZIPCODE file.
{
Simple ZIPCODE lookup - type in town and state to display zipcode
ZIPCODE.TXT file format
00.0000: 0D 0A 41 62 62 65 76 69 6C 6C 65 20 20 20 20 20   ..Abbeville
00.0010: 20 20 20 20 41 4C 20 33 36 33 31 30 0D 0A 41 62       AL 36310..Ab
Usage:
TAQOZ# ZIP Knoxville GA 31050 ok
TAQOZ# ZIP Londonderry NH 03053 ok
TAQOZ# ZIP Canton NY 13617 ok
}
18 BYTES town
pre ZIP ( <town> <state> -- )
?MOUNT
" ZIPCODES.TXT" FOPEN$ NOT IF PRINT" No ZIP file " EXIT THEN
town 17 + C~ town 17 $20 FILL
--- Town
GETWORD town OVER LEN$ CMOVE
--- State (2 characters) and leave on stack as a 16-bit word
GETWORD W@
--- locate state first as 2 characters offset from start of file
20 BEGIN 2DUP FSW@ <> WHILE 28 + REPEAT NIP
--- Found state - now point back to town field and search
SPACE 18 - ( @town )
--- quick search based on first 4 chars <> next record
town @ SWAP BEGIN 2DUP FS@ <> WHILE 28 + REPEAT NIP
--- matched first 4 characters, now match as strings
town 4+ SWAP 4+ ( town+4 rcdptr )
--- terminate rcd (temp) then compare as string
BEGIN 2DUP FSADR DUP 13 + C~ COMPARE$ 0= WHILE 28 + REPEAT NIP
--- found it ( rcdptr ) type out zip code field
17 + FSADR 5 CTYPE
;
FSADR buffers the correct sector for the 32-bit FileSystem Address of the opened file and returns with the physical address in hub RAM
FSW@ uses FSADR but returns with a 16-bit word
FS@ same as FSW@ but fetches a 32-bit long
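For anyone who doesn't read Forth, the search in ZIP can be modelled on a host in Python. This is a sketch of the same fixed-width record walk, not a translation of the TAQOZ words: the offsets are read off the hex dump in the listing (28-byte records: CR LF, 18-byte town, 2-byte state, a space, 5-digit zip), the function names are invented, and the end-of-file checks are an addition the original does not have.

```python
RECORD_SIZE = 28   # CR LF + 18-byte town + 2-byte state + space + 5-digit zip
TOWN_OFFSET = 2
STATE_OFFSET = 20
ZIP_OFFSET = 23
TOWN_COMPARE = 17  # the listing compares 17 space-padded characters


def make_record(town, state, zipcode):
    """Build one record in the layout shown in the hex dump."""
    return (b"\r\n" + town.encode("ascii").ljust(18)
            + state.encode("ascii") + b" " + zipcode.encode("ascii"))


def lookup_zip(data, town, state):
    """Return the zip code for town/state, or None if the file runs out."""
    want_town = town.encode("ascii").ljust(TOWN_COMPARE)[:TOWN_COMPARE]
    want_state = state.encode("ascii")
    rec = 0
    # Pass 1: step record by record until the 2-character state matches.
    while (rec + RECORD_SIZE <= len(data)
           and data[rec + STATE_OFFSET:rec + STATE_OFFSET + 2] != want_state):
        rec += RECORD_SIZE
    # Pass 2: from there on, quick-check 4 characters, then compare the field.
    # Like the original, this pass does not re-check the state.
    while rec + RECORD_SIZE <= len(data):
        field = data[rec + TOWN_OFFSET:rec + TOWN_OFFSET + TOWN_COMPARE]
        if field[:4] == want_town[:4] and field == want_town:
            return data[rec + ZIP_OFFSET:rec + ZIP_OFFSET + 5].decode("ascii")
        rec += RECORD_SIZE
    return None
```

Finding the state first keeps the more expensive town comparison to the tail of the file, which presumably relies on the file being ordered by state.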