Dr_Acula: I've written the cache driver for ZOG based on the code you posted. Unfortunately, it doesn't work. That's not surprising on a first attempt; I'm sure I misunderstood your code and transcribed it incorrectly. My question now is this: is there an SRAM test program for the Dracblade that I can use to verify that my SRAM is working correctly? I'd like to rule that out before I spend a lot of time poring through the code.
While I'd still love to have an SRAM test, it is no longer necessary. ZOG is now running on the Dracblade!
Looks like you have it working now. I must confess we never did write a ram test to test all locations. Tested three, then went straight into an emulation of a CP/M computer. Looking back, it is amazing that it did all work.
Heater keeps telling me that Zog is this huge monster. I must admit some trepidation about putting such a thing in my Dracblade.
It has always been in the back of my mind to create customized zogs that access external RAM directly on DracBlades and perhaps TriBlades with no overhead of a cache or virtual memory layer.
The idea would be that Zog fetches its byte codes from external RAM but keeps the stack and data in HUB, allowing big programs with the speed gain of data in HUB. Given that Zog instructions are all single bytes, this seems an excellent match for the RAM interface and might even stand a chance of competing speed-wise with Catalina on such boards, given Catalina's 32-bit instruction fetch.
It was this match between ZPU and Ext RAM that got me all excited about the ZPU architecture when I discovered it.
I'm not suggesting we tackle this right now, but how easy would it be to get your current Zog build/load system to operate that way?
This shouldn't be difficult to do. Accessing the SRAM chip is quite easy. Here is the code I stole from Dr_Acula's driver. I'm not sure when I'll have time to try integrating this directly into ZOG but it should be pretty easy to make a first pass by using this to replace the getbyte code and build up words and longs from bytes. It helps that the setting of the address is separate from the reading of the bytes. You could set the address once and loop over the bytes in a word or long. My only concern is if there is enough memory in ZOG for this code.
''From Dracblade driver for talking to a ram chip via three latches
'' Modified code from Cluso's triblade
'---------------------------------------------------------------------------------------------------------
'Memory Access Functions

read_memory_byte        call    #RamAddress             ' sets up the latches with the correct ram address
                        mov     dira,LatchDirection2    ' for reads so P0-P7 tristate till do read
                        mov     outa,GateHigh           ' actually ReadEnable but they are the same
                        andn    outa,GateHigh           ' set gate low
                        nop                             ' short delay to stabilise
                        nop
                        mov     data_8, ina             ' read SRAM
                        and     data_8, #$FF            ' extract 8 bits
                        or      outa,GateHigh           ' set the gate high again
read_memory_byte_ret    ret

write_memory_byte       call    #RamAddress             ' sets up the latches with the correct ram address
                        mov     outx,data_8             ' get the byte to output
                        and     outx, #$FF              ' ensure upper bytes=0
                        or      outx,WriteEnable        ' or with correct 138 address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        nop                             ' no nop doesn't work, one does, so put in two to be sure
                        nop                             ' another NOP
                        or      outa,GateHigh           ' set it high again
write_memory_byte_ret   ret

RamAddress              ' sets up the ram latches. Assumes high latch A16-A18 low so only accesses 64k of ram
                        mov     dira,LatchDirection     ' set up the pins for programming latch chips
                        mov     outx,address            ' get the address into a temp variable
                        and     outx,#$FF               ' mask the low byte
                        or      outx,LowAddress         ' or with 138 low address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set it high again
                        ' now repeat for the middle byte
                        mov     outx,address            ' get the address into a temp variable
                        shr     outx,#8                 ' shift right by 8 places
                        and     outx,#$FF               ' mask the low byte
                        or      outx,MiddleAddress      ' or with 138 middle address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set it high again
RamAddress_ret          ret

delay           long    80                                      ' waitcnt delay to reduce power (#80 = 1uS approx)
ctr             long    0                                       ' used to pause execution (lower power use) & byte counter
GateHigh        long    %00000000_00000000_00000001_00000000    ' HC138 gate high, all others must be low
Outx            long    0                                       ' for temp use, same as n in the spin code
LatchDirection  long    %00000000_00000000_00001111_11111111    ' 138 active, gate active and 8 data lines active
LatchDirection2 long    %00000000_00000000_00001111_00000000    ' for reads so data lines are tristate till the read
LowAddress      long    %00000000_00000000_00000101_00000000    ' low address latch = xxxx010x and gate high xxxxxxx1
MiddleAddress   long    %00000000_00000000_00000111_00000000    ' middle address latch = xxxx011x and gate high xxxxxxx1
HighAddress     long    %00000000_00000000_00001001_00000000    ' high address latch = xxxx100x and gate high xxxxxxx1
'ReadEnable     long    %00000000_00000000_00000001_00000000    ' /RD = xxxx000x and gate high xxxxxxx1 (commented out as the same as GateHigh)
WriteEnable     long    %00000000_00000000_00000011_00000000    ' /WE = xxxx001x and gate high xxxxxxx1
data_8          long    %00000000_00000000_00000000_00000000    ' for code compatibility with zicog driver
address         long    %00000000_00000000_00000000_00000000    ' address for ram chip
ledpin          long    %00000000_00000000_00000000_00001000    ' to turn on led
HighLatch       long    %00000000_00000000_00000000_00000000    ' static value for the 374 latch that does the led, A16-A19 and the other 4 outputs
I didn't realise you were doing it that way? Well of course, just grab the bytes directly from ram. And - like you say, put the stack and data in hub. If you want speed we can do speed!
There are more things we can optimise if you have the code space. For example, if you work in blocks of 256 bytes you don't need to update the middle latch very often, only when the address goes outside that block. It depends on whether you have the code space.
Have you ever got zog running on Cluso's ramblade that accessed all address lines directly?
I'll try merging the SRAM access code with ZOG later tonight. One reason that is somewhat less appealing than the cache driver approach is that the code has to be merged into ZOG using lots of ifdefs. It's easier just to have a single set of ifdefs in ZOG to interface to the JCACHE API and then write lots of JCACHE drivers. I did this first because it was easy to get working.
I was thinking for this case just to have a totally separate zog.spin with only the DracBlade memory interface. It's another version to maintain but as I said before we don't expect changes to the Zog interpreter code itself anymore.
Or we should sort this out using HomeSpun and #include
I just realized that it's a bit of a pain to merge the SRAM code with ZOG. In the original SRAM code there are places where DIRA and OUTA are simply set to a value. Since ZOG allows C programs to manipulate unused I/O pins, I can't just jam values into those registers anymore: I have to ANDN with a mask to clear the bits I want to update and then OR in the new value. This complicates the code and also slows it down a bit.
Okay, my first attempt at merging Dracblade SRAM access into ZOG.spin was a failure. It overflows COG memory space by 42 longs. Not even close!
Here is my code. Can anyone see a way to shrink it significantly? I haven't even added the code to preserve the non-SRAM bits in DIRA and OUTA and I'm already out of space.
'---------------------------------------------------------------------------------------------------------
'Memory Access Functions

' Read functions:
'   address - address in sram
'   data - return value

read_sram_byte          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
read_sram_byte_ret      ret

read_sram_word          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        sub     address, #1
read_sram_word_ret      ret

read_sram_long          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        sub     address, #3
read_sram_long_ret      ret

' Write functions:
'   address - address in sram
'   data - data to write

write_sram_byte         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        call    #write_memory_byte
write_sram_byte_ret     ret

write_sram_word         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        rol     data_8, #24
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        sub     address, #1
write_sram_word_ret     ret

write_sram_long         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        sub     address, #3
write_sram_long_ret     ret

''From Dracblade driver for talking to a ram chip via three latches
'' Modified code from Cluso's triblade

read_memory_byte        mov     dira,LatchDirectionRd   ' for reads so P0-P7 tristate till do read
                        mov     outa,ReadEnable         ' or with correct 138 address
                        andn    outa,GateHigh           ' set gate low
                        nop                             ' short delay to stabilise
                        nop
                        mov     data_8, ina             ' read SRAM
                        and     data_8, #$FF            ' extract 8 bits
                        or      outa,GateHigh           ' set the gate high again
read_memory_byte_ret    ret

write_memory_byte       mov     outx,data_8             ' get the byte to output
                        and     outx, #$FF              ' ensure upper bytes=0
                        or      outx,WriteEnable        ' or with correct 138 address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        nop                             ' no nop doesn't work, one does, so put in two to be sure
                        nop                             ' another NOP
                        or      outa,GateHigh           ' set it high again
write_memory_byte_ret   ret

ram_address             mov     dira,LatchDirectionWr   ' set up the pins for programming latch chips
                        ' set bits 23:16
                        mov     outx,address            ' get the address into a temp variable
                        shr     outx,#16                ' bits 23:16
                        and     outx,#$FF               ' mask the low byte
                        or      outx,HighAddress        ' or with 138 high address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set it high again
                        ' set bits 15:8
                        mov     outx,address            ' get the address into a temp variable
                        shr     outx,#8                 ' bits 15:8
                        and     outx,#$FF               ' mask the low byte
                        or      outx,MiddleAddress      ' or with 138 middle address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set it high again
                        jmp     #ram_address_low
ram_address_next        add     address,#1
                        mov     dira,LatchDirectionWr   ' set up the pins for programming latch chips
ram_address_low         ' set bits 7:0
                        mov     outx,address            ' get the address into a temp variable
                        and     outx,#$FF               ' mask the low byte
                        or      outx,LowAddress         ' or with 138 low address
                        mov     outa,outx               ' send it out
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set it high again
ram_address_ret
ram_address_next_ret    ret

LatchMask               ' same as LatchDirectionWr below - mask of all bits used by the SRAM interface
LatchDirectionWr long   %00000000_00000000_00001111_11111111    ' 138 active, gate active and 8 data lines active
LatchDirectionRd long   %00000000_00000000_00001111_00000000    ' for reads so data lines are tristate till the read
GateHigh                ' same as ReadEnable below - HC138 gate high, all others must be low
ReadEnable      long    %00000000_00000000_00000001_00000000    ' xxxx000x = /RD and gate high xxxxxxx1
WriteEnable     long    %00000000_00000000_00000011_00000000    ' xxxx001x = /WE and gate high xxxxxxx1
LowAddress      long    %00000000_00000000_00000101_00000000    ' xxxx010x = low address latch and gate high xxxxxxx1
MiddleAddress   long    %00000000_00000000_00000111_00000000    ' xxxx011x = middle address latch and gate high xxxxxxx1
HighAddress     long    %00000000_00000000_00001001_00000000    ' xxxx100x = high address latch and gate high xxxxxxx1
data_8          long    0                                       ' data to read from or write to memory
outx            long    0                                       ' for temp use
I know next to nothing about hardware design but it seems like it would be nice to have the address latches auto-increment. A fair amount of the overhead of reading multiple bytes (like a word or long) from SRAM is setting the address over and over again. Could at least the low order latch be replaced by a counter? This would also make it unnecessary to constantly change the low order DIRA bits from write mode to read mode.
Heater is probably the best person to answer this as you probably have to look at code and think about how often things are run and prioritise code.
We ran into this problem when the original 8080 emulation was ported over to Z80 with more instructions. I think it was Pullmoll who took the idea of LMM code and managed to move almost everything into LMM. It takes a little while to get the skeleton of an LMM kernel working, but once you have that, you just keep adding PASM code without ever worrying about running out of space.
So to clarify, you have code working in a separate cog for the ram access, and now want to merge it into the main cog? Could you post that main zog cog code?
#ifdef USE_JCACHED_MEMORY
                        mov     addr, tos
                        xor     addr, #%11              'XOR here is an endianess fix.
                        call    #zpu_cache              'z flag is set if address is in current buffer
        if_ne           call    #cache_read             'If address not in current buffer, read new buffer
                        rdbyte  tos, memp               'zpu_cache / cache_read leaves address in memp
#endif
#ifdef USE_VIRTUAL_MEMORY
':waitmbox              rdlong  temp, mboxcmd wz        'If only one client to VMCOG, these first two
'        if_nz          jmp     #:waitmbox              'instructions are not necc
                        mov     addr, tos
                        xor     addr, #%11              'XOR here is an endianess fix.
                        shl     addr, #9
                        movs    addr, #READVMB
                        wrlong  addr, mboxcmd
:waitres                rdlong  data, mboxcmd wz
        if_nz           jmp     #:waitres
                        rdbyte  tos, mboxdat
#endif
#ifdef USE_HUB_MEMORY
                        mov     memp, tos
                        xor     memp, #%11              'XOR here is an endianess fix.
                        add     memp, zpu_memory_addr
                        rdbyte  tos, memp
#endif
                        jmp     #done_and_inc_pc

zpu_storeb              call    #pop
                        mov     memp, tos               ' use by this and USE_HUB_MEMORY
                        cmp     memp, zpu_hub_start wc  'Check for normal memory access
        if_c            jmp     #:next
                        sub     memp, zpu_hub_start
                        wrbyte  data, memp
                        jmp     #zpu_storeb_done
:next
Does the dracblade code replace this or is it as well as this? Just thinking in general terms, a virtual memory driver might take more than an sram driver, but maybe that isn't true?
Dr_Acula: Did you have a chance to look at my Dracblade SRAM code to see if there is any way to trim it down so it has a chance of fitting in ZOG.spin?
I don't think you can make the dracblade sram much smaller. So it is a matter of making savings somewhere else. I'm a bit confused about which code you don't now need. If all the other ifdefs are false, there should be more than enough space?
I'm surprised if the DracBlade RAM driver does not fit into Zog.
Re: "have the address latches auto-increment."
If one is using direct access to RAM from Zog this only makes sense if one is executing code from external memory but keeping all data and stack in HUB. In that case quite long sequences of code would run faster as the address counter auto-increments; loading the counter is only required for jumps. This could be quite a useful speed-up.
Is there such a counter chip that can be loaded and used like the current latches but also just clocked to increment without a load? Presumably two or more of these can be cascaded so as to be able to auto-increment seamlessly through the entire address range.
But if one has data and stack in external RAM as well it would not make any/much improvement. Because basically every instruction would fetch an op code from ext RAM and then make one or more accesses to stack or data thus causing an address counter reload. Then it's back to opcode fetch and another counter reload. And so on.
If one is using a memory manager COG perhaps there is a gain in using an address counter as sizeable buffers of data are fetched from ext RAM in bursts. Depends on the memory manager I guess.
It could still help with long and word accesses. As it is now, I have to reload at least the low order address byte on every byte fetch. That's four times to fetch a long. Auto-incrementing could eliminate three of those four loads.
Looks like we would need a counter with parallel load so that it acts like a latch and then counts as required. The only one I can find is the 74LS160, which is only 4 bits wide in a 16-pin package. So it looks like it increases the chip count by one.
@David, is it not just a matter of deleting all the other code? (or comment it out?). Can you see how much space is used now, and how much space you have when you start deleting code? You only need to free up about 20 lines of code.
You should gain something with a counter. I suspect you would gain a lot more by only incrementing the bottom latch. Possibly nearly twice as fast. It would add a few lines of code to check if the current middle and high latches need to be changed, and most of the time they would not, so then you would only change the lower latch.
Then - possibly you could push the speed even further. I am not sure how quickly zog interprets each byte, but if it is slower (even by a small amount) than it takes to read a byte off external ram, you could prefetch 256 bytes into hub ram. Pass a long to the ram decoder. Decode that to an 18 bit address. Compare the middle and upper byte to the previous request.
1) if different, then load the new addresses then start loading 256 bytes over to hub. Return a true flag once the first one has been loaded.
2) if the same, return a flag to say that hub ram already has the value.
Most of the time it will do 2). The ram code might need to be modified so it checks for requests more often than once per cycle. Say it checks every 5 or 10 lines of code? Or maybe the checking for the new middle and upper blocks happens inside the zog code?
Now, this won't achieve much if zog is faster than the ram. But if it is slower, then by prefetching data it could significantly increase the speed. And it would be a cool use of the parallel abilities of the propeller.
I tried adding the Dracblade code that I posted in place of all of the ifdefs in ZOG.spin and that is how I found that I was 40 some longs over the size of COG RAM. I can do the merge again and post the merged code if you'd like.
Maybe it won't fit? We had all sorts of similar problems with the 8080 and Z80 code. I spent a lot of time coding that original ram code with just one long spare, which made debugging very hard as there was no room to put in any debugging code.
Can you spare the extra cog and dedicate it just to ram access?
If so, the upside is you have more code space in which to do clever things like prefetching data. I see some intriguing possibilities.
Sorry for not wading through all of the hundreds of messages in this thread to find this info. I'm sure it must be here somewhere but could someone give me a concise list of which Propeller pins are connected to each Dracblade device? I'm particularly interested in the pins used for the TV, VGA, SD card, PS2, and the second serial port. I'm assuming the first serial port is on P31/P30 as usual. I need these to get those devices working with ZOG.
Sure, no problem
Pins 0-11 = RAM access and external latches etc.
Pins 12-15 = SD card (12=DO, 13=CLK, 14=DI, 15=CS)
Pins 16-23 = VGA, as per the demo board
Pins 16-18 = TV
Pins 24,25 = second serial port (24=data out)
Pins 24,25 = mouse (shared with second serial port, use a jumper to select)
Pins 26,27 = keyboard
Pins 28,29 = eeprom
Pins 30,31 = programming serial port
Thanks,
David
While I'd still love to have an SRAM test, it is no longer necessary. ZOG is now running on the Dracblade!
Heater keeps telling me that Zog is this huge monster. I must admit some trepidation on putting such a thing in my Dracblade.
It has always been in the back of my mind to create customized zogs that access external RAM directly on DracBlades and perhaps TriBlades with no overhead of a cache or virtual memory layer.
The idea would be that Zog fetches it's byte codes from external RAM but the stack and data are in HUB. Thus allowing big programs but with a speed gain of data in HUB. Given that Zog instructions are all single bytes this seems to be an excellent match to the RAM interface and might even stand a chance of competing speed wise with Catalina on such boards due to Catalina's 32 bit instruction fetch.
It was this match between ZPU and Ext RAM that got me all excited about the ZPU architecture when I discovered it.
I'm not suggesting we think about this now so much but how easy would it be to get your current Zog build/load system to operate that way?
This shouldn't be difficult to do. Accessing the SRAM chip is quite easy. Here is the code I stole from Dr_Acula's driver. I'm not sure when I'll have time to try integrating this directly into ZOG but it should be pretty easy to make a first pass by using this to replace the getbyte code and build up words and longs from bytes. It helps that the setting of the address is separate from the reading of the bytes. You could set the address once and loop over the bytes in a word or long. My only concern is if there is enough memory in ZOG for this code.
''From Dracblade driver for talking to a ram chip via three latches '' Modified code from Cluso's triblade '--------------------------------------------------------------------------------------------------------- 'Memory Access Functions read_memory_byte call #RamAddress ' sets up the latches with the correct ram address mov dira,LatchDirection2 ' for reads so P0-P7 tristate till do read mov outa,GateHigh ' actually ReadEnable but they are the same andn outa,GateHigh ' set gate low nop ' short delay to stabilise nop mov data_8, ina ' read SRAM and data_8, #$FF ' extract 8 bits or outa,GateHigh ' set the gate high again read_memory_byte_ret ret write_memory_byte call #RamAddress ' sets up the latches with the correct ram address mov outx,data_8 ' get the byte to output and outx, #$FF ' ensure upper bytes=0 or outx,WriteEnable ' or with correct 138 address mov outa,outx ' send it out andn outa,GateHigh ' set gate low nop ' no nop doesn't work, one does, so put in two to be sure nop ' another NOP or outa,GateHigh ' set it high again write_memory_byte_ret ret RamAddress ' sets up the ram latches. 
Assumes high latch A16-A18 low so only accesses 64k of ram mov dira,LatchDirection ' set up the pins for programming latch chips mov outx,address ' get the address into a temp variable and outx,#$FF ' mask the low byte or outx,LowAddress ' or with 138 low address mov outa,outx ' send it out andn outa,GateHigh ' set gate low or outa,GateHigh ' set it high again ' now repeat for the middle byte mov outx,address ' get the address into a temp variable shr outx,#8 ' shift right by 8 places and outx,#$FF ' mask the low byte or outx,MiddleAddress ' or with 138 middle address mov outa,outx ' send it out andn outa,GateHigh ' set gate low or outa,GateHigh ' set it high again RamAddress_ret ret delay long 80 ' waitcnt delay to reduce power (#80 = 1uS approx) ctr long 0 ' used to pause execution (lower power use) & byte counter GateHigh long %00000000_00000000_00000001_00000000 ' HC138 gate high, all others must be low Outx long 0 ' for temp use, same as n in the spin code LatchDirection long %00000000_00000000_00001111_11111111 ' 138 active, gate active and 8 data lines active LatchDirection2 long %00000000_00000000_00001111_00000000 ' for reads so data lines are tristate till the read LowAddress long %00000000_00000000_00000101_00000000 ' low address latch = xxxx010x and gate high xxxxxxx1 MiddleAddress long %00000000_00000000_00000111_00000000 ' middle address latch = xxxx011x and gate high xxxxxxx1 HighAddress long %00000000_00000000_00001001_00000000 ' high address latch = xxxx100x and gate high xxxxxxx1 'ReadEnable long %00000000_00000000_00000001_00000000 ' /RD = xxxx000x and gate high xxxxxxx1 ' commented out as the same as GateHigh WriteEnable long %00000000_00000000_00000011_00000000 ' /WE = xxxx001x and gate high xxxxxxx1 data_8 long %00000000_00000000_00000000_00000000 ' so code compatability with zicog driver address long %00000000_00000000_00000000_00000000 ' address for ram chip ledpin long %00000000_00000000_00000000_00001000 ' to turn on led HighLatch long 
%00000000_00000000_00000000_00000000 ' static value for the 374 latch that does the led, hA16-A19 and the other 4 outputs
I didn't realise you were doing it that way? Well of course, just grab the bytes directly from ram. And - like you say, put the stack and data in hub. If you want speed we can do speed!
There are more things we can optimise if you have the code space. eg, if you work in blocks of 256 bytes you don't need to update the middle latch very often. Only when it goes outside that block. It depends on whether you have the code space.
Have you ever got zog running on Cluso's ramblade that accessed all address lines directly?
I was thinking for this case just to have a totally separate zog.spin with only the DracBlade memory interface. It's another version to maintain but as I said before we don't expect changes to the Zog interpreter code itself anymore.
Or we should sort this out using HomeSpun and #include
Here is my code. Can anyone see a way to shrink it significantly? I haven't even added the code to preserve the non-SRAM bits in DIRA and OUTA and I'm already out of space.
'---------------------------------------------------------------------------------------------------------
' Memory Access Functions
' Read functions:
'   address - address in sram
'   data    - return value

read_sram_byte          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
read_sram_byte_ret      ret

read_sram_word          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        sub     address, #1
read_sram_word_ret      ret

read_sram_long          call    #ram_address            ' sets up the latches with the correct ram address
                        call    #read_memory_byte
                        mov     data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        shl     data, #8
                        call    #ram_address_next
                        call    #read_memory_byte
                        or      data, data_8
                        sub     address, #3
read_sram_long_ret      ret

' Write functions:
'   address - address in sram
'   data    - data to write

write_sram_byte         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        call    #write_memory_byte
write_sram_byte_ret     ret

write_sram_word         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        rol     data_8, #24
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        sub     address, #1
write_sram_word_ret     ret

write_sram_long         call    #ram_address            ' sets up the latches with the correct ram address
                        mov     data_8, data
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        call    #ram_address_next
                        rol     data_8, #8
                        call    #write_memory_byte
                        sub     address, #3
write_sram_long_ret     ret

'' From Dracblade driver for talking to a ram chip via three latches
'' Modified code from Cluso's triblade

read_memory_byte        mov     dira, LatchDirectionRd  ' for reads so P0-P7 tristate till do read
                        mov     outa, ReadEnable        ' or with correct 138 address
                        andn    outa, GateHigh          ' set gate low
                        nop                             ' short delay to stabilise
                        nop
                        mov     data_8, ina             ' read SRAM
                        and     data_8, #$FF            ' extract 8 bits
                        or      outa, GateHigh          ' set the gate high again
read_memory_byte_ret    ret

write_memory_byte       mov     outx, data_8            ' get the byte to output
                        and     outx, #$FF              ' ensure upper bytes=0
                        or      outx, WriteEnable       ' or with correct 138 address
                        mov     outa, outx              ' send it out
                        andn    outa, GateHigh          ' set gate low
                        nop                             ' no nop doesn't work, one does, so put in two to be sure
                        nop                             ' another NOP
                        or      outa, GateHigh          ' set it high again
write_memory_byte_ret   ret

ram_address             mov     dira, LatchDirectionWr  ' set up the pins for programming latch chips
' set bits 23:16
                        mov     outx, address           ' get the address into a temp variable
                        shr     outx, #16               ' bits 23:16
                        and     outx, #$FF              ' mask the low byte
                        or      outx, HighAddress       ' or with 138 high address
                        mov     outa, outx              ' send it out
                        andn    outa, GateHigh          ' set gate low
                        or      outa, GateHigh          ' set it high again
' set bits 15:8
                        mov     outx, address           ' get the address into a temp variable
                        shr     outx, #8                ' bits 15:8
                        and     outx, #$FF              ' mask the low byte
                        or      outx, MiddleAddress     ' or with 138 middle address
                        mov     outa, outx              ' send it out
                        andn    outa, GateHigh          ' set gate low
                        or      outa, GateHigh          ' set it high again
                        jmp     #ram_address_low

ram_address_next        add     address, #1
                        mov     dira, LatchDirectionWr  ' set up the pins for programming latch chips
ram_address_low
' set bits 7:0
                        mov     outx, address           ' get the address into a temp variable
                        and     outx, #$FF              ' mask the low byte
                        or      outx, LowAddress        ' or with 138 low address
                        mov     outa, outx              ' send it out
                        andn    outa, GateHigh          ' set gate low
                        or      outa, GateHigh          ' set it high again
ram_address_ret
ram_address_next_ret    ret

LatchMask               ' same as LatchDirection below
                        ' mask of all bits used by the SRAM interface
LatchDirectionWr        long    %00000000_00000000_00001111_11111111    ' 138 active, gate active and 8 data lines active
LatchDirectionRd        long    %00000000_00000000_00001111_00000000    ' for reads so data lines are tristate till the read
GateHigh                ' same as ReadEnable below
                        ' HC138 gate high, all others must be low
ReadEnable              long    %00000000_00000000_00000001_00000000    ' xxxx000x = /RD and gate high xxxxxxx1
WriteEnable             long    %00000000_00000000_00000011_00000000    ' xxxx001x = /WE and gate high xxxxxxx1
LowAddress              long    %00000000_00000000_00000101_00000000    ' xxxx010x = low address latch and gate high xxxxxxx1
MiddleAddress           long    %00000000_00000000_00000111_00000000    ' xxxx011x = middle address latch and gate high xxxxxxx1
HighAddress             long    %00000000_00000000_00001001_00000000    ' xxxx100x = high address latch and gate high xxxxxxx1

data_8                  long    0                       ' data to read from or write to memory
outx                    long    0                       ' for temp use
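To sanity-check the byte ordering in the routines above, here is a small Python model of the read path (the helper names are mine, not from the driver). It shows the 24-bit address split across the three 8-bit latches, and confirms that read_sram_long assembles four successive bytes most-significant first, just as the shl/or sequence does.

```python
# Python model of the Dracblade read path (hypothetical helper names).
# A 24-bit address is split across three 8-bit latches; multi-byte reads
# assemble bytes big-endian, as read_sram_word / read_sram_long do.

def split_address(address):
    """Return (high, middle, low) latch bytes for a 24-bit address."""
    return (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF

def read_sram_long(sram, address):
    """Assemble a 32-bit value from four successive bytes, MSB first."""
    data = 0
    for offset in range(4):
        data = (data << 8) | sram[address + offset]   # shl data,#8 / or data,data_8
    return data

sram = {0x1000: 0x12, 0x1001: 0x34, 0x1002: 0x56, 0x1003: 0x78}
assert split_address(0x1000) == (0x00, 0x10, 0x00)
assert read_sram_long(sram, 0x1000) == 0x12345678
```

This is only a behavioural sketch; the real routines also bump `address` via ram_address_next and undo that with the trailing `sub address, #N`.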
We ran into this problem when the original 8080 emulation was ported over to Z80, with its larger instruction set. I think it was Pullmoll who took the idea of LMM code and managed to move almost everything into LMM. It takes a little while to get the skeleton of an LMM kernel working, but once you have that, you just keep adding PASM code without ever worrying about running out of space.
So to clarify, you have code working in a separate cog for the ram access, and now want to merge it into the main cog? Could you post that main zog cog code?
Here it is.
#ifdef USE_JCACHED_MEMORY
                        mov     addr, tos
                        xor     addr, #%11              'XOR here is an endianess fix.
                        call    #zpu_cache              'z flag is set if address is in current buffer
        if_ne           call    #cache_read             'If address not in current buffer, read new buffer
                        rdbyte  tos, memp               'zpu_cache / cache_read leaves address in memp
#endif
#ifdef USE_VIRTUAL_MEMORY
':waitmbox              rdlong  temp, mboxcmd wz        'If only one client to VMCOG, these first two
'        if_nz          jmp     #:waitmbox              'instructions are not necc
                        mov     addr, tos
                        xor     addr, #%11              'XOR here is an endianess fix.
                        shl     addr, #9
                        movs    addr, #READVMB
                        wrlong  addr, mboxcmd
:waitres                rdlong  data, mboxcmd wz
        if_nz           jmp     #:waitres
                        rdbyte  tos, mboxdat
#endif
#ifdef USE_HUB_MEMORY
                        mov     memp, tos
                        xor     memp, #%11              'XOR here is an endianess fix.
                        add     memp, zpu_memory_addr
                        rdbyte  tos, memp
#endif
                        jmp     #done_and_inc_pc

zpu_storeb              call    #pop
                        mov     memp, tos               ' used by this and USE_HUB_MEMORY
                        cmp     memp, zpu_hub_start wc  'Check for normal memory access
        if_c            jmp     #:next
                        sub     memp, zpu_hub_start
                        wrbyte  data, memp
                        jmp     #zpu_storeb_done
:next
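The `xor addr, #%11` that appears in every variant above deserves a note. The ZPU image is big-endian while the Propeller hub is little-endian, so flipping the low two bits of a byte address maps a big-endian byte offset within a long onto the position where `rdbyte` will actually find it. A quick Python illustration (my own, not from Zog):

```python
# Sketch of the 'xor addr, #%11' endianness fix (illustrative only).
# A big-endian long stored in little-endian hub RAM has its bytes in
# reverse order within the long; XOR-ing the low two address bits
# compensates, so byte 0 fetches the most significant byte.

import struct

big_endian_long = 0x11223344
hub = struct.pack('<I', big_endian_long)      # the long as hub RAM stores it

def zpu_byte(hub_mem, addr):
    return hub_mem[addr ^ 0b11]               # the XOR fix

assert zpu_byte(hub, 0) == 0x11               # byte 0 = most significant byte
assert zpu_byte(hub, 3) == 0x44               # byte 3 = least significant byte
```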
Does the dracblade code replace this, or sit alongside it? Just thinking in general terms, a virtual memory driver probably takes more code than a plain SRAM driver, but maybe that isn't true?
Re: "have the address latches auto-increment."
If one is using direct access to RAM from Zog, this only makes sense if one is executing code from external memory while all the data and stack are in HUB. In that case quite long sequences of code would run faster as the address counter auto-increments; loading the counter is only required for jumps. This could be quite a useful speed-up.
Is there such a counter chip that can be loaded and used like the current latches, but also just clocked to increment without a reload? Presumably two or more of these could be cascaded so as to auto-increment seamlessly through the entire address range.
But if one has data and stack in external RAM as well, it would not be much of an improvement, because basically every instruction would fetch an opcode from ext RAM and then make one or more accesses to the stack or data, forcing an address counter reload. Then it's back to an opcode fetch and another counter reload, and so on.
If one is using a memory manager COG perhaps there is a gain in using an address counter as sizeable buffers of data are fetched from ext RAM in bursts. Depends on the memory manager I guess.
Looks like we would need a counter with parallel load, so that it acts like a latch and then counts as required. The only one I can find is the 74LS160, which is only 4 bits wide in a 16-pin package. So it looks like it increases the chip count by one.
You should gain something with a counter. I suspect you would gain a lot more by only incrementing the bottom latch. Possibly nearly twice as fast. It would add a few lines of code to check if the current middle and high latches need to be changed, and most of the time they would not, so then you would only change the lower latch.
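To put a rough number on the "only increment the bottom latch" idea, here is a small Python model (entirely my own naming): it keeps the last middle and high bytes and counts how many latch loads a run of addresses actually needs when those two latches are reloaded only on change.

```python
# Hypothetical model of the lazy-latch idea: reload the middle and high
# latches only when those address bytes change; the low latch is loaded
# on every access. Counts total latch writes for a sequence of addresses.

def latch_loads(addresses):
    loads = 0
    last_mid = last_high = None
    for a in addresses:
        high, mid = (a >> 16) & 0xFF, (a >> 8) & 0xFF
        if high != last_high:           # high latch changed: reload it
            loads += 1
            last_high = high
        if mid != last_mid:             # middle latch changed: reload it
            loads += 1
            last_mid = mid
        loads += 1                      # low latch is loaded every access
    return loads

seq = list(range(0x1000, 0x1100))       # 256 sequential byte fetches
assert latch_loads(seq) == 258          # vs 768 if all three reload each time
```

For sequential fetches the saving approaches three-to-one on latch traffic, which is where the "possibly nearly twice as fast" estimate comes from once the read cycle itself is counted in.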
Then - possibly you could push the speed even further. I am not sure how quickly zog interprets each byte, but if it is slower (even by a small amount) than it takes to read a byte off external ram, you could prefetch 256 bytes into hub ram. Pass a long to the ram decoder. Decode that to an 18 bit address. Compare the middle and upper byte to the previous request.
1) if different, then load the new addresses then start loading 256 bytes over to hub. Return a true flag once the first one has been loaded.
2) if the same, return a flag to say that hub ram already has the value.
Most of the time it will do 2). The ram code might need to be modified so it checks for requests more often than once per cycle. Say it checks every 5 or 10 lines of code? Or maybe the checking of the new middle and upper bytes happens inside the zog code?
Now, this won't achieve much if zog is faster than the ram. But if it is slower, then by prefetching data it could significantly increase the speed. And it would be a cool use of the parallel abilities of the propeller.
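The prefetch scheme sketched in 1) and 2) above amounts to a one-page cache keyed on the upper address bits. A rough Python model (class and names invented here, not from any driver) of how the hit/miss logic would behave:

```python
# Rough sketch of the 256-byte prefetch idea: compare the middle/upper
# address bits with the previous request, and only go out to external
# RAM when the 256-byte page changes.

class PrefetchedRam:
    def __init__(self, ext_ram):
        self.ext_ram = ext_ram          # stand-in for the SRAM read routine
        self.page = None                # upper bits of the cached page
        self.buf = b''                  # 256-byte buffer in "hub ram"
        self.misses = 0

    def read_byte(self, addr):
        page = addr >> 8                # middle + upper bytes of the address
        if page != self.page:           # case 1): new page, fetch 256 bytes
            self.misses += 1
            base = page << 8
            self.buf = bytes(self.ext_ram[base + i] for i in range(256))
            self.page = page
        return self.buf[addr & 0xFF]    # case 2): already in hub

ram = PrefetchedRam(list(range(256)) * 4)   # 1 KB of test "external RAM"
data = [ram.read_byte(a) for a in range(512)]
assert ram.misses == 2                      # two page fills for 512 reads
assert data == list(range(256)) * 2
```

Sequential code fetch is the best case for this: one external burst per 256 bytes, with every other access served from hub. A jump across a page boundary costs a full refill, which is the trade-off the prefetch cog would have to absorb.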
Can you spare the extra cog and dedicate it just to ram access?
If so, the upside is you have more code space in which to do clever things like prefetching data. I see some intriguing possibilities.
Thanks,
David
Pins 0-11  = ram access and external latches etc
Pins 12-15 = SD card (12=DO, 13=CLK, 14=DI, 15=CS)
Pins 16-23 = VGA as per the demo board.
Pins 16-18 = TV
Pins 24,25 = second serial port (24=data out)
Pins 24,25 = mouse (shared with second serial port, use a jumper to select)
Pins 26,27 = keyboard
Pins 28,29 = eeprom
Pins 30,31 = programming serial port