Dr_Acula: I've written the cache driver for ZOG based on the code you posted. Unfortunately, it doesn't work. That's not surprising on a first attempt; I'm sure I misunderstood your code and transcribed it incorrectly. My question now is this: is there an SRAM test program for the Dracblade that I can use to verify that my SRAM is working correctly? I'd like to rule that out before I spend a lot of time poring through the code.
While I'd still love to have an SRAM test, it is no longer necessary. ZOG is now running on the Dracblade!
Looks like you have it working now. I must confess we never did write a ram test to test all locations. Tested three, then went straight into an emulation of a CP/M computer. Looking back, it is amazing that it did all work.
Heater keeps telling me that Zog is this huge monster. I must admit some trepidation about putting such a thing in my Dracblade.
It has always been in the back of my mind to create customized zogs that access external RAM directly on DracBlades and perhaps TriBlades with no overhead of a cache or virtual memory layer.
The idea would be that Zog fetches its byte codes from external RAM but keeps the stack and data in HUB, allowing big programs with the speed gain of data in HUB. Given that Zog instructions are all single bytes, this seems an excellent match for the RAM interface and might even stand a chance of competing speed-wise with Catalina on such boards, given Catalina's 32-bit instruction fetch.
It was this match between ZPU and Ext RAM that got me all excited about the ZPU architecture when I discovered it.
I'm not suggesting we tackle this right now, but how easy would it be to get your current Zog build/load system to operate that way?
This shouldn't be difficult to do. Accessing the SRAM chip is quite easy. Here is the code I stole from Dr_Acula's driver. I'm not sure when I'll have time to try integrating this directly into ZOG but it should be pretty easy to make a first pass by using this to replace the getbyte code and build up words and longs from bytes. It helps that the setting of the address is separate from the reading of the bytes. You could set the address once and loop over the bytes in a word or long. My only concern is if there is enough memory in ZOG for this code.
''From Dracblade driver for talking to a ram chip via three latches
'' Modified code from Cluso's triblade
'---------------------------------------------------------------------------------------------------------
'Memory Access Functions

read_memory_byte        call    #RamAddress             ' sets up the latches with the correct ram address
                        mov     dira,LatchDirection2    ' for reads so P0-P7 tristate till do read
                        mov     outa,GateHigh           ' actually ReadEnable but they are the same
                        andn    outa,GateHigh           ' set gate low
                        nop                             ' short delay to stabilise
                        nop
                        mov     data_8, ina             ' read SRAM
                        and     data_8, #$FF            ' extract 8 bits
                        or      outa,GateHigh           ' set the gate high again
read_memory_byte_ret    ret

write_memory_byte       call    #RamAddress             ' sets up the latches with the correct ram address
                        mov     outx,data_8             ' get the byte to output
                        and     outx, #$FF              ' ensure upper bytes=0
                        or      outx,WriteEnable        ' or with correct 138 address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        nop                             ' no nop doesn't work, one does, so put in two to be sure
                        nop                             ' another NOP
                        or      outa,GateHigh           ' set it high again
write_memory_byte_ret   ret

RamAddress              ' sets up the ram latches. Assumes high latch A16-A18 low so only accesses 64k of ram
                        mov     dira,LatchDirection     ' set up the pins for programming latch chips
                        mov     outx,address            ' get the address into a temp variable
                        and     outx,#$FF               ' mask the low byte
                        or      outx,LowAddress         ' or with 138 low address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set it high again
                        ' now repeat for the middle byte
                        mov     outx,address            ' get the address into a temp variable
                        shr     outx,#8                 ' shift right by 8 places
                        and     outx,#$FF               ' mask the low byte
                        or      outx,MiddleAddress      ' or with 138 middle address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set it high again
RamAddress_ret          ret

delay           long    80                                      ' waitcnt delay to reduce power (#80 = 1uS approx)
ctr             long    0                                       ' used to pause execution (lower power use) & byte counter
GateHigh        long    %00000000_00000000_00000001_00000000    ' HC138 gate high, all others must be low
Outx            long    0                                       ' for temp use, same as n in the spin code
LatchDirection  long    %00000000_00000000_00001111_11111111    ' 138 active, gate active and 8 data lines active
LatchDirection2 long    %00000000_00000000_00001111_00000000    ' for reads so data lines are tristate till the read
LowAddress      long    %00000000_00000000_00000101_00000000    ' low address latch = xxxx010x and gate high xxxxxxx1
MiddleAddress   long    %00000000_00000000_00000111_00000000    ' middle address latch = xxxx011x and gate high xxxxxxx1
HighAddress     long    %00000000_00000000_00001001_00000000    ' high address latch = xxxx100x and gate high xxxxxxx1
'ReadEnable     long    %00000000_00000000_00000001_00000000    ' /RD = xxxx000x and gate high xxxxxxx1 (commented out as the same as GateHigh)
WriteEnable     long    %00000000_00000000_00000011_00000000    ' /WE = xxxx001x and gate high xxxxxxx1
data_8          long    %00000000_00000000_00000000_00000000    ' for code compatibility with zicog driver
address         long    %00000000_00000000_00000000_00000000    ' address for ram chip
ledpin          long    %00000000_00000000_00000000_00001000    ' to turn on led
HighLatch       long    %00000000_00000000_00000000_00000000    ' static value for the 374 latch that does the led, A16-A19 and the other 4 outputs
I didn't realise you were doing it that way? Well of course, just grab the bytes directly from ram. And - like you say, put the stack and data in hub. If you want speed we can do speed!
There are more things we can optimise if you have the code space. For example, if you work in blocks of 256 bytes you don't need to update the middle latch very often, only when the address goes outside that block. It depends on whether you have the code space.
Have you ever got zog running on Cluso's ramblade that accessed all address lines directly?
I'll try merging the SRAM access code with ZOG later tonight. One reason that is somewhat less appealing than the cache driver approach is that the code has to be merged into ZOG using lots of ifdefs. It's easier just to have a single set of ifdefs in ZOG to interface to the JCACHE API and then write lots of JCACHE drivers. I did this first because it was easy to get working.
I was thinking for this case just to have a totally separate zog.spin with only the DracBlade memory interface. It's another version to maintain but as I said before we don't expect changes to the Zog interpreter code itself anymore.
Or we should sort this out using HomeSpun and #include
I just realized that it's a bit of a pain to merge the SRAM code with ZOG. In the original SRAM code there are places where DIRA and OUTA are simply set to a value. Since ZOG allows C programs to manipulate unused I/O pins, I can't just jam values into those registers anymore: I have to ANDN with a mask to clear the bits I want to update and then OR in the new value. This complicates the code and also slows it down a bit.
Okay, my first attempt at merging Dracblade SRAM access into ZOG.spin was a failure. It overflows COG memory space by 42 longs. Not even close!
Here is my code. Can anyone see a way to shrink it significantly? I haven't even added the code to preserve the non-SRAM bits in DIRA and OUTA and I'm already out of space.
'---------------------------------------------------------------------------------------------------------
'Memory Access Functions

' Read functions:
'   address - address in sram
'   data - return value

read_sram_byte          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
read_sram_byte_ret      ret

read_sram_word          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        sub     address, #1
read_sram_word_ret      ret

read_sram_long          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        sub     address, #3
read_sram_long_ret      ret

' Write functions:
'   address - address in sram
'   data - data to write

write_sram_byte         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        call    #write_memory_byte
write_sram_byte_ret     ret

write_sram_word         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        rol     data_8, #24
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        sub     address, #1
write_sram_word_ret     ret

write_sram_long         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        sub     address, #3
write_sram_long_ret     ret

''From Dracblade driver for talking to a ram chip via three latches
'' Modified code from Cluso's triblade

read_memory_byte        mov     dira,LatchDirectionRd   ' for reads so P0-P7 tristate till do read
                        mov     outa,ReadEnable         ' or with correct 138 address
                        andn    outa,GateHigh           ' set gate low
                        nop                             ' short delay to stabilise
                        nop
                        mov     data_8, ina             ' read SRAM
                        and     data_8, #$FF            ' extract 8 bits
                        or      outa,GateHigh           ' set the gate high again
read_memory_byte_ret    ret

write_memory_byte       mov     outx,data_8             ' get the byte to output
                        and     outx, #$FF              ' ensure upper bytes=0
                        or      outx,WriteEnable        ' or with correct 138 address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        nop                             ' no nop doesn't work, one does, so put in two to be sure
                        nop                             ' another NOP
                        or      outa,GateHigh           ' set it high again
write_memory_byte_ret   ret

ram_address             mov     dira,LatchDirectionWr   ' set up the pins for programming latch chips
                        ' set bits 23:16
                        mov     outx,address            ' get the address into a temp variable
                        shr     outx,#16                ' bits 23:16
                        and     outx,#$FF               ' mask the low byte
                        or      outx,HighAddress        ' or with 138 high address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set it high again
                        ' set bits 15:8
                        mov     outx,address            ' get the address into a temp variable
                        shr     outx,#8                 ' bits 15:8
                        and     outx,#$FF               ' mask the low byte
                        or      outx,MiddleAddress      ' or with 138 middle address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set it high again
                        jmp     #ram_address_low
ram_address_next        add     address,#1
                        mov     dira,LatchDirectionWr   ' set up the pins for programming latch chips
ram_address_low         ' set bits 7:0
                        mov     outx,address            ' get the address into a temp variable
                        and     outx,#$FF               ' mask the low byte
                        or      outx,LowAddress         ' or with 138 low address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set it high again
ram_address_ret
ram_address_next_ret    ret

LatchMask               ' same as LatchDirectionWr below - mask of all bits used by the SRAM interface
LatchDirectionWr long   %00000000_00000000_00001111_11111111    ' 138 active, gate active and 8 data lines active
LatchDirectionRd long   %00000000_00000000_00001111_00000000    ' for reads so data lines are tristate till the read
GateHigh                ' same as ReadEnable below - HC138 gate high, all others must be low
ReadEnable      long    %00000000_00000000_00000001_00000000    ' xxxx000x = /RD and gate high xxxxxxx1
WriteEnable     long    %00000000_00000000_00000011_00000000    ' xxxx001x = /WE and gate high xxxxxxx1
LowAddress      long    %00000000_00000000_00000101_00000000    ' xxxx010x = low address latch and gate high xxxxxxx1
MiddleAddress   long    %00000000_00000000_00000111_00000000    ' xxxx011x = middle address latch and gate high xxxxxxx1
HighAddress     long    %00000000_00000000_00001001_00000000    ' xxxx100x = high address latch and gate high xxxxxxx1
data_8          long    0                                       ' data to read from or write to memory
outx            long    0                                       ' for temp use
I know next to nothing about hardware design but it seems like it would be nice to have the address latches auto-increment. A fair amount of the overhead of reading multiple bytes (like a word or long) from SRAM is setting the address over and over again. Could at least the low order latch be replaced by a counter? This would also make it unnecessary to constantly change the low order DIRA bits from write mode to read mode.
Heater is probably the best person to answer this as you probably have to look at code and think about how often things are run and prioritise code.
We ran into this problem when the original 8080 emulation was ported over to Z80 with more instructions. I think it was Pullmoll who took the idea of LMM code and managed to move almost everything into LMM. It takes a little while to get the skeleton of an LMM kernel working, but once you have that, you just keep adding PASM code without ever worrying about running out of space.
So to clarify, you have code working in a separate cog for the ram access, and now want to merge it into the main cog? Could you post that main zog cog code?
#ifdef USE_JCACHED_MEMORY
                        mov     addr, tos
                        xor     addr, #%11              'XOR here is an endianess fix.
                        call    #zpu_cache              'z flag is set if address is in current buffer
        if_ne           call    #cache_read             'If address not in current buffer, read new buffer
                        rdbyte  tos, memp               'zpu_cache / cache_read leaves address in memp
#endif
#ifdef USE_VIRTUAL_MEMORY
':waitmbox              rdlong  temp, mboxcmd wz        'If only one client to VMCOG, these first two
'        if_nz          jmp     #:waitmbox              'instructions are not necc
                        mov     addr, tos
                        xor     addr, #%11              'XOR here is an endianess fix.
                        shl     addr, #9
                        movs    addr, #READVMB
                        wrlong  addr, mboxcmd
:waitres                rdlong  data, mboxcmd wz
        if_nz           jmp     #:waitres
                        rdbyte  tos, mboxdat
#endif
#ifdef USE_HUB_MEMORY
                        mov     memp, tos
                        xor     memp, #%11              'XOR here is an endianess fix.
                        add     memp, zpu_memory_addr
                        rdbyte  tos, memp
#endif
                        jmp     #done_and_inc_pc

zpu_storeb              call    #pop
                        mov     memp, tos               ' use by this and USE_HUB_MEMORY
                        cmp     memp, zpu_hub_start wc  'Check for normal memory access
        if_c            jmp     #:next
                        sub     memp, zpu_hub_start
                        wrbyte  data, memp
                        jmp     #zpu_storeb_done
:next
Does the dracblade code replace this or is it as well as this? Just thinking in general terms, a virtual memory driver might take more than an sram driver, but maybe that isn't true?
Dr_Acula: Did you have a chance to look at my Dracblade SRAM code to see if there is any way to trim it down so it has a chance of fitting in ZOG.spin?
I don't think you can make the dracblade sram much smaller. So it is a matter of making savings somewhere else. I'm a bit confused about which code you don't now need. If all the other ifdefs are false, there should be more than enough space?
I'm surprised if the DracBlade RAM driver does not fit into Zog.
Re: "have the address latches auto-increment."
If one is using direct access to RAM from Zog this only makes sense if one is executing code from external memory but keeping all data and stack in HUB. In that case quite long sequences of code would run faster as the address counter auto-increments; loading the counter is only required for jumps. This could be quite a useful speed-up.
Is there such a counter chip that can be loaded and used like the current latches but also just clocked to increment without a load? Presumably two or more of these can be cascaded so as to be able to auto-increment seamlessly through the entire address range.
But if one has data and stack in external RAM as well it would not make any/much improvement. Because basically every instruction would fetch an op code from ext RAM and then make one or more accesses to stack or data thus causing an address counter reload. Then it's back to opcode fetch and another counter reload. And so on.
If one is using a memory manager COG perhaps there is a gain in using an address counter as sizeable buffers of data are fetched from ext RAM in bursts. Depends on the memory manager I guess.
It could still help with long and word accesses. As it is now, I have to reload at least the low order address byte on every byte fetch. That's four times to fetch a long. Auto-incrementing could eliminate three of those four loads.
Looks like we would need a counter with parallel load so that it acts like a latch and then counts as required. The only one I can find is the 74LS160, which is only 4 bits wide in a 16-pin package. So it looks like it increases the chip count by one.
@David, is it not just a matter of deleting all the other code? (or comment it out?). Can you see how much space is used now, and how much space you have when you start deleting code? You only need to free up about 20 lines of code.
You should gain something with a counter. I suspect you would gain a lot more by only incrementing the bottom latch. Possibly nearly twice as fast. It would add a few lines of code to check if the current middle and high latches need to be changed, and most of the time they would not, so then you would only change the lower latch.
Then - possibly you could push the speed even further. I am not sure how quickly zog interprets each byte, but if it is slower (even by a small amount) than it takes to read a byte off external ram, you could prefetch 256 bytes into hub ram. Pass a long to the ram decoder. Decode that to an 18 bit address. Compare the middle and upper byte to the previous request.
1) if different, then load the new addresses then start loading 256 bytes over to hub. Return a true flag once the first one has been loaded.
2) if the same, return a flag to say that hub ram already has the value.
Most of the time it will do 2). The ram code might need to be modified so it checks for requests more often than once per cycle. Say it checks every 5 or 10 lines of code? Or maybe the checking for the new middle and upper blocks happens inside the zog code?
Now, this won't achieve much if zog is faster than the ram. But if it is slower, then by prefetching data it could significantly increase the speed. And it would be a cool use of the parallel abilities of the propeller.
I tried adding the Dracblade code that I posted in place of all of the ifdefs in ZOG.spin and that is how I found that I was 40 some longs over the size of COG RAM. I can do the merge again and post the merged code if you'd like.
Maybe it won't fit? We had all sorts of similar problems with the 8080 and Z80 code. I spent a lot of time coding that original ram code with just one long spare, which made debugging very hard as there was no room to put in any debugging code.
Can you spare the extra cog and dedicate it just to ram access?
If so, the upside is you have more code space in which to do clever things like prefetching data. I see some intriguing possibilities.
Sorry for not wading through all of the hundreds of messages in this thread to find this info. I'm sure it must be here somewhere but could someone give me a concise list of which Propeller pins are connected to each Dracblade device? I'm particularly interested in the pins used for the TV, VGA, SD card, PS2, and the second serial port. I'm assuming the first serial port is on P31/P30 as usual. I need these to get those devices working with ZOG.
Sure, no problem
Pins 0-11 = RAM access and external latches etc.
Pins 12-15 = SD card (12=DO, 13=CLK, 14=DI, 15=CS)
Pins 16-23 = VGA, as per the demo board
Pins 16-18 = TV
Pins 24,25 = second serial port (24=data out)
Pins 24,25 = mouse (shared with second serial port, use a jumper to select)
Pins 26,27 = keyboard
Pins 28,29 = eeprom
Pins 30,31 = programming serial port
Thanks,
David
While I'd still love to have an SRAM test, it is no longer necessary. ZOG is now running on the Dracblade!
Heater keeps telling me that Zog is this huge monster. I must admit some trepidation on putting such a thing in my Dracblade.
It has always been in the back of my mind to create customized zogs that access external RAM directly on DracBlades and perhaps TriBlades with no overhead of a cache or virtual memory layer.
The idea would be that Zog fetches it's byte codes from external RAM but the stack and data are in HUB. Thus allowing big programs but with a speed gain of data in HUB. Given that Zog instructions are all single bytes this seems to be an excellent match to the RAM interface and might even stand a chance of competing speed wise with Catalina on such boards due to Catalina's 32 bit instruction fetch.
It was this match between ZPU and Ext RAM that got me all excited about the ZPU architecture when I discovered it.
I'm not suggesting we think about this now so much but how easy would it be to get your current Zog build/load system to operate that way?
This shouldn't be difficult to do. Accessing the SRAM chip is quite easy. Here is the code I stole from Dr_Acula's driver. I'm not sure when I'll have time to try integrating this directly into ZOG but it should be pretty easy to make a first pass by using this to replace the getbyte code and build up words and longs from bytes. It helps that the setting of the address is separate from the reading of the bytes. You could set the address once and loop over the bytes in a word or long. My only concern is if there is enough memory in ZOG for this code.
''From Dracblade driver for talking to a ram chip via three latches '' Modified code from Cluso's triblade '--------------------------------------------------------------------------------------------------------- 'Memory Access Functions read_memory_byte call #RamAddress ' sets up the latches with the correct ram address mov dira,LatchDirection2 ' for reads so P0-P7 tristate till do read mov outa,GateHigh ' actually ReadEnable but they are the same andn outa,GateHigh ' set gate low nop ' short delay to stabilise nop mov data_8, ina ' read SRAM and data_8, #$FF ' extract 8 bits or outa,GateHigh ' set the gate high again read_memory_byte_ret ret write_memory_byte call #RamAddress ' sets up the latches with the correct ram address mov outx,data_8 ' get the byte to output and outx, #$FF ' ensure upper bytes=0 or outx,WriteEnable ' or with correct 138 address mov outa,outx ' send it out andn outa,GateHigh ' set gate low nop ' no nop doesn't work, one does, so put in two to be sure nop ' another NOP or outa,GateHigh ' set it high again write_memory_byte_ret ret RamAddress ' sets up the ram latches. 
Assumes high latch A16-A18 low so only accesses 64k of ram mov dira,LatchDirection ' set up the pins for programming latch chips mov outx,address ' get the address into a temp variable and outx,#$FF ' mask the low byte or outx,LowAddress ' or with 138 low address mov outa,outx ' send it out andn outa,GateHigh ' set gate low or outa,GateHigh ' set it high again ' now repeat for the middle byte mov outx,address ' get the address into a temp variable shr outx,#8 ' shift right by 8 places and outx,#$FF ' mask the low byte or outx,MiddleAddress ' or with 138 middle address mov outa,outx ' send it out andn outa,GateHigh ' set gate low or outa,GateHigh ' set it high again RamAddress_ret ret delay long 80 ' waitcnt delay to reduce power (#80 = 1uS approx) ctr long 0 ' used to pause execution (lower power use) & byte counter GateHigh long %00000000_00000000_00000001_00000000 ' HC138 gate high, all others must be low Outx long 0 ' for temp use, same as n in the spin code LatchDirection long %00000000_00000000_00001111_11111111 ' 138 active, gate active and 8 data lines active LatchDirection2 long %00000000_00000000_00001111_00000000 ' for reads so data lines are tristate till the read LowAddress long %00000000_00000000_00000101_00000000 ' low address latch = xxxx010x and gate high xxxxxxx1 MiddleAddress long %00000000_00000000_00000111_00000000 ' middle address latch = xxxx011x and gate high xxxxxxx1 HighAddress long %00000000_00000000_00001001_00000000 ' high address latch = xxxx100x and gate high xxxxxxx1 'ReadEnable long %00000000_00000000_00000001_00000000 ' /RD = xxxx000x and gate high xxxxxxx1 ' commented out as the same as GateHigh WriteEnable long %00000000_00000000_00000011_00000000 ' /WE = xxxx001x and gate high xxxxxxx1 data_8 long %00000000_00000000_00000000_00000000 ' so code compatability with zicog driver address long %00000000_00000000_00000000_00000000 ' address for ram chip ledpin long %00000000_00000000_00000000_00001000 ' to turn on led HighLatch long 
%00000000_00000000_00000000_00000000 ' static value for the 374 latch that does the led, hA16-A19 and the other 4 outputs
I didn't realise you were doing it that way? Well of course, just grab the bytes directly from ram. And - like you say, put the stack and data in hub. If you want speed we can do speed!
There are more things we can optimise if you have the code space. eg, if you work in blocks of 256 bytes you don't need to update the middle latch very often. Only when it goes outside that block. It depends on whether you have the code space.
Have you ever got zog running on Cluso's ramblade that accessed all address lines directly?
I was thinking for this case just to have a totally separate zog.spin with only the DracBlade memory interface. It's another version to maintain but as I said before we don't expect changes to the Zog interpreter code itself anymore.
Or we should sort this out using HomeSpun and #include
Here is my code. Can anyone see a way to shrink it significantly? I haven't even added the code to preserve the non-SRAM bits in DIRA and OUTA and I'm already out of space.
'---------------------------------------------------------------------------------------------------------
' Memory Access Functions
' Read functions:
'   address - address in sram
'   data    - return value

read_sram_byte          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
read_sram_byte_ret      ret

read_sram_word          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        sub     address, #1
read_sram_word_ret      ret

read_sram_long          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        sub     address, #3
read_sram_long_ret      ret

' Write functions:
'   address - address in sram
'   data    - data to write

write_sram_byte         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        call    #write_memory_byte
write_sram_byte_ret     ret

write_sram_word         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        rol     data_8, #24
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        sub     address, #1
write_sram_word_ret     ret

write_sram_long         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        sub     address, #3
write_sram_long_ret     ret

'' From Dracblade driver for talking to a ram chip via three latches
'' Modified code from Cluso's triblade

read_memory_byte        mov     dira, LatchDirectionRd  ' for reads so P0-P7 tristate till do read
                        mov     outa, ReadEnable        ' or with correct 138 address
                        andn    outa, GateHigh          ' set gate low
                        nop                             ' short delay to stabilise
                        nop
                        mov     data_8, ina             ' read SRAM
                        and     data_8, #$FF            ' extract 8 bits
                        or      outa, GateHigh          ' set the gate high again
read_memory_byte_ret    ret

write_memory_byte       mov     outx, data_8            ' get the byte to output
                        and     outx, #$FF              ' ensure upper bytes=0
                        or      outx, WriteEnable       ' or with correct 138 address
                        mov     outa, outx              ' send it out
                        andn    outa, GateHigh          ' set gate low
                        nop                             ' no nop doesn't work, one does, so put in two to be sure
                        nop                             ' another NOP
                        or      outa, GateHigh          ' set it high again
write_memory_byte_ret   ret

ram_address             mov     dira, LatchDirectionWr  ' set up the pins for programming latch chips
' set bits 23:16
                        mov     outx, address           ' get the address into a temp variable
                        shr     outx, #16               ' bits 23:16
                        and     outx, #$FF              ' mask the low byte
                        or      outx, HighAddress       ' or with 138 high address
                        mov     outa, outx              ' send it out
                        andn    outa, GateHigh          ' set gate low
                        or      outa, GateHigh          ' set it high again
' set bits 15:8
                        mov     outx, address           ' get the address into a temp variable
                        shr     outx, #8                ' bits 15:8
                        and     outx, #$FF              ' mask the low byte
                        or      outx, MiddleAddress     ' or with 138 middle address
                        mov     outa, outx              ' send it out
                        andn    outa, GateHigh          ' set gate low
                        or      outa, GateHigh          ' set it high again
                        jmp     #ram_address_low

ram_address_next        add     address, #1
                        mov     dira, LatchDirectionWr  ' set up the pins for programming latch chips
ram_address_low
' set bits 7:0
                        mov     outx, address           ' get the address into a temp variable
                        and     outx, #$FF              ' mask the low byte
                        or      outx, LowAddress        ' or with 138 low address
                        mov     outa, outx              ' send it out
                        andn    outa, GateHigh          ' set gate low
                        or      outa, GateHigh          ' set it high again
ram_address_ret
ram_address_next_ret    ret

LatchMask               ' same as LatchDirection below
                        ' mask of all bits used by the SRAM interface
LatchDirectionWr        long    %00000000_00000000_00001111_11111111    ' 138 active, gate active and 8 data lines active
LatchDirectionRd        long    %00000000_00000000_00001111_00000000    ' for reads so data lines are tristate till the read
GateHigh                ' same as ReadEnable below
                        ' HC138 gate high, all others must be low
ReadEnable              long    %00000000_00000000_00000001_00000000    ' xxxx000x = /RD and gate high xxxxxxx1
WriteEnable             long    %00000000_00000000_00000011_00000000    ' xxxx001x = /WE and gate high xxxxxxx1
LowAddress              long    %00000000_00000000_00000101_00000000    ' xxxx010x = low address latch and gate high xxxxxxx1
MiddleAddress           long    %00000000_00000000_00000111_00000000    ' xxxx011x = middle address latch and gate high xxxxxxx1
HighAddress             long    %00000000_00000000_00001001_00000000    ' xxxx100x = high address latch and gate high xxxxxxx1

data_8                  long    0                       ' data to read from or write to memory
outx                    long    0                       ' for temp use
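To sanity-check the byte ordering in the routines above, here is a small Python model of the read path (the helper names are mine, not from the driver). It shows the 24-bit address split across the three 8-bit latches, and confirms that read_sram_long assembles four successive bytes most-significant first, just as the shl/or sequence does.

```python
# Python model of the Dracblade read path (hypothetical helper names).
# A 24-bit address is split across three 8-bit latches; multi-byte reads
# assemble bytes big-endian, as read_sram_word / read_sram_long do.

def split_address(address):
    """Return (high, middle, low) latch bytes for a 24-bit address."""
    return (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF

def read_sram_long(sram, address):
    """Assemble a 32-bit value from four successive bytes, MSB first."""
    data = 0
    for offset in range(4):
        data = (data << 8) | sram[address + offset]   # shl data,#8 / or data,data_8
    return data

sram = {0x1000: 0x12, 0x1001: 0x34, 0x1002: 0x56, 0x1003: 0x78}
assert split_address(0x1000) == (0x00, 0x10, 0x00)
assert read_sram_long(sram, 0x1000) == 0x12345678
```

This is only a behavioural sketch; the real routines also bump `address` via ram_address_next and undo that with the trailing `sub address, #N`.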
We ran into this problem when the original 8080 emulation was ported over to Z80, with its larger instruction set. I think it was Pullmoll who took the idea of LMM code and managed to move almost everything into LMM. It takes a little while to get the skeleton of an LMM kernel working, but once you have that, you just keep adding PASM code without ever worrying about running out of space.
So to clarify, you have code working in a separate cog for the ram access, and now want to merge it into the main cog? Could you post that main zog cog code?
Here it is.
#ifdef USE_JCACHED_MEMORY
                        mov     addr, tos
                        xor     addr, #%11              'XOR here is an endianess fix.
                        call    #zpu_cache              'z flag is set if address is in current buffer
        if_ne           call    #cache_read             'If address not in current buffer, read new buffer
                        rdbyte  tos, memp               'zpu_cache / cache_read leaves address in memp
#endif
#ifdef USE_VIRTUAL_MEMORY
':waitmbox              rdlong  temp, mboxcmd wz        'If only one client to VMCOG, these first two
'        if_nz          jmp     #:waitmbox              'instructions are not necc
                        mov     addr, tos
                        xor     addr, #%11              'XOR here is an endianess fix.
                        shl     addr, #9
                        movs    addr, #READVMB
                        wrlong  addr, mboxcmd
:waitres                rdlong  data, mboxcmd wz
        if_nz           jmp     #:waitres
                        rdbyte  tos, mboxdat
#endif
#ifdef USE_HUB_MEMORY
                        mov     memp, tos
                        xor     memp, #%11              'XOR here is an endianess fix.
                        add     memp, zpu_memory_addr
                        rdbyte  tos, memp
#endif
                        jmp     #done_and_inc_pc

zpu_storeb              call    #pop
                        mov     memp, tos               ' used by this and USE_HUB_MEMORY
                        cmp     memp, zpu_hub_start wc  'Check for normal memory access
        if_c            jmp     #:next
                        sub     memp, zpu_hub_start
                        wrbyte  data, memp
                        jmp     #zpu_storeb_done
:next
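The `xor addr, #%11` that appears in every variant above deserves a note. The ZPU image is big-endian while the Propeller hub is little-endian, so flipping the low two bits of a byte address maps a big-endian byte offset within a long onto the position where `rdbyte` will actually find it. A quick Python illustration (my own, not from Zog):

```python
# Sketch of the 'xor addr, #%11' endianness fix (illustrative only).
# A big-endian long stored in little-endian hub RAM has its bytes in
# reverse order within the long; XOR-ing the low two address bits
# compensates, so byte 0 fetches the most significant byte.

import struct

big_endian_long = 0x11223344
hub = struct.pack('<I', big_endian_long)      # the long as hub RAM stores it

def zpu_byte(hub_mem, addr):
    return hub_mem[addr ^ 0b11]               # the XOR fix

assert zpu_byte(hub, 0) == 0x11               # byte 0 = most significant byte
assert zpu_byte(hub, 3) == 0x44               # byte 3 = least significant byte
```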
Does the dracblade code replace this, or sit alongside it? Just thinking in general terms, a virtual memory driver probably takes more code than a plain SRAM driver, but maybe that isn't true?
Re: "have the address latches auto-increment."
If one is using direct access to RAM from Zog, this only makes sense if one is executing code from external memory while all the data and stack are in HUB. In that case quite long sequences of code would run faster as the address counter auto-increments; loading the counter is only required for jumps. This could be quite a useful speed-up.
Is there such a counter chip that can be loaded and used like the current latches, but also just clocked to increment without a reload? Presumably two or more of these could be cascaded so as to auto-increment seamlessly through the entire address range.
But if one has data and stack in external RAM as well, it would not be much of an improvement, because basically every instruction would fetch an opcode from ext RAM and then make one or more accesses to the stack or data, forcing an address counter reload. Then it's back to an opcode fetch and another counter reload, and so on.
If one is using a memory manager COG perhaps there is a gain in using an address counter as sizeable buffers of data are fetched from ext RAM in bursts. Depends on the memory manager I guess.
Looks like we would need a counter with parallel load, so that it acts like a latch and then counts as required. The only one I can find is the 74LS160, which is only 4 bits wide in a 16-pin package. So it looks like it increases the chip count by one.
You should gain something with a counter. I suspect you would gain a lot more by only incrementing the bottom latch. Possibly nearly twice as fast. It would add a few lines of code to check if the current middle and high latches need to be changed, and most of the time they would not, so then you would only change the lower latch.
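To put a rough number on the "only increment the bottom latch" idea, here is a small Python model (entirely my own naming): it keeps the last middle and high bytes and counts how many latch loads a run of addresses actually needs when those two latches are reloaded only on change.

```python
# Hypothetical model of the lazy-latch idea: reload the middle and high
# latches only when those address bytes change; the low latch is loaded
# on every access. Counts total latch writes for a sequence of addresses.

def latch_loads(addresses):
    loads = 0
    last_mid = last_high = None
    for a in addresses:
        high, mid = (a >> 16) & 0xFF, (a >> 8) & 0xFF
        if high != last_high:           # high latch changed: reload it
            loads += 1
            last_high = high
        if mid != last_mid:             # middle latch changed: reload it
            loads += 1
            last_mid = mid
        loads += 1                      # low latch is loaded every access
    return loads

seq = list(range(0x1000, 0x1100))       # 256 sequential byte fetches
assert latch_loads(seq) == 258          # vs 768 if all three reload each time
```

For sequential fetches the saving approaches three-to-one on latch traffic, which is where the "possibly nearly twice as fast" estimate comes from once the read cycle itself is counted in.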
Then - possibly you could push the speed even further. I am not sure how quickly zog interprets each byte, but if it is slower (even by a small amount) than it takes to read a byte off external ram, you could prefetch 256 bytes into hub ram. Pass a long to the ram decoder. Decode that to an 18 bit address. Compare the middle and upper byte to the previous request.
1) if different, then load the new addresses then start loading 256 bytes over to hub. Return a true flag once the first one has been loaded.
2) if the same, return a flag to say that hub ram already has the value.
Most of the time it will do 2). The ram code might need to be modified so it checks for requests more often than once per cycle. Say it checks every 5 or 10 lines of code? Or maybe the checking of the new middle and upper bytes happens inside the zog code?
Now, this won't achieve much if zog is faster than the ram. But if it is slower, then by prefetching data it could significantly increase the speed. And it would be a cool use of the parallel abilities of the propeller.
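The prefetch scheme sketched in 1) and 2) above amounts to a one-page cache keyed on the upper address bits. A rough Python model (class and names invented here, not from any driver) of how the hit/miss logic would behave:

```python
# Rough sketch of the 256-byte prefetch idea: compare the middle/upper
# address bits with the previous request, and only go out to external
# RAM when the 256-byte page changes.

class PrefetchedRam:
    def __init__(self, ext_ram):
        self.ext_ram = ext_ram          # stand-in for the SRAM read routine
        self.page = None                # upper bits of the cached page
        self.buf = b''                  # 256-byte buffer in "hub ram"
        self.misses = 0

    def read_byte(self, addr):
        page = addr >> 8                # middle + upper bytes of the address
        if page != self.page:           # case 1): new page, fetch 256 bytes
            self.misses += 1
            base = page << 8
            self.buf = bytes(self.ext_ram[base + i] for i in range(256))
            self.page = page
        return self.buf[addr & 0xFF]    # case 2): already in hub

ram = PrefetchedRam(list(range(256)) * 4)   # 1 KB of test "external RAM"
data = [ram.read_byte(a) for a in range(512)]
assert ram.misses == 2                      # two page fills for 512 reads
assert data == list(range(256)) * 2
```

Sequential code fetch is the best case for this: one external burst per 256 bytes, with every other access served from hub. A jump across a page boundary costs a full refill, which is the trade-off the prefetch cog would have to absorb.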
Can you spare the extra cog and dedicate it just to ram access?
If so, the upside is you have more code space in which to do clever things like prefetching data. I see some intriguing possibilities.
Thanks,
David
Pins 0-11  = ram access and external latches etc
Pins 12-15 = SD card (12=DO, 13=CLK, 14=DI, 15=CS)
Pins 16-23 = VGA as per the demo board.
Pins 16-18 = TV
Pins 24,25 = second serial port (24=data out)
Pins 24,25 = mouse (shared with second serial port, use a jumper to select)
Pins 26,27 = keyboard
Pins 28,29 = eeprom
Pins 30,31 = programming serial port