memory expansion circuit for the propeller. — Parallax Forums


mahjonggmahjongg Posts: 141
edited 2007-02-01 00:26 in Propeller 1
Although I do not own a propeller board (yet) I have been studying this remarkable chip for some time, and have written a bit about it on Wikipedia.

One of the things I imagined you could do with this chip is to emulate other hardware; recently somebody suggested emulating a PC-junior with it.
I have been thinking along similar lines: I thought about building a replica of a simple home computer with it, something along the lines of a Sinclair Spectrum.
It should be possible to emulate a Z80 microprocessor with one or two cogs, and either write a video display driver that can read a Sinclair Spectrum video buffer directly, or let the emulator write to an area of RAM that acts as its "video buffer" and use another cog to translate (copy) the video data to the native video buffer of the Propeller's own TV driver. A project like this should also be very interesting to owners of the XGameStation.

But there is a problem, the 32K RAM of the propeller just is not enough. The Sinclair Spectrum had 48K. The 32K of the Propeller should be enough for the two video buffers (The Spectrum, and the Propeller's native one) and for the emulator of the Z80 and the other logic of the Spectrum. The keyboard for example could be emulated by using a bit of memory for each "key", and using the PS/2 keyboard driver to set and reset bits in this memory. But the remaining memory is much too small to hold a copy of Sinclair Basic, and the 48K of ram of the spectrum.

So some kind of "virtual memory" scheme is needed, and some external hardware to attach a quick memory system to swap blocks of memory to and from.
Really only a fast RAM addressing setup would be fast enough. One problem with attaching a large memory to the Propeller is the fact that in a typical system (such as the XGameStation) there are only a few I/O ports that can be spared. The XGameStation has an expansion connector with an 8-bit port, and a few bits that are also used for something else (the USB port, I think; I do not own the XGameStation and in-depth details are scarce). I know that somebody has published a scheme that uses an old DRAM memory stick, but I am not sure that is the ideal solution: the data has to go through a single (or a few) I/O port(s), and a RAM refresh system is also needed.
So I designed another solution around a 512 KB SRAM, using the following design criteria.
* Very fast uploading and downloading of blocks of memory.
* Minimal number of I/O ports needed, no more than 8+2.
* Minimal part count.
* Easy to write a driver for.

The resulting design is in the attached .PDF.

Let me describe the design.
Note that the eight-bit port is used for eight-bit data transfers to and from the RAM, and for writing the "command and upper address latch".
Transferring eight bits at a time ensures a high transfer rate is possible.
A 512KB RAM uses 19 address lines to address all of its 2^19 bytes.
I used a 12-bit counter to drive the lower address lines, so a block of 4096 bytes (2^12) can be addressed with it.
I designed the system so that after each one-byte transfer this counter would auto-increment.
The remaining 7 address lines come from a latch, so that any of the 128 4K blocks can be addressed at random.
The eighth bit of the latch determines the direction of the 8-bit data transfer: from the Propeller to the RAM, or from the RAM to the Propeller.
I used a "0" for a "read cycle" (the latch typically powers up with its outputs at low level, and defining low for a read cycle helps to protect the contents of the RAM during booting) and a "1" for a write cycle (from the Propeller to the RAM).
Because the counter must be reset to zero before a new block transfer, I designed the system so that writing a new control word to the latch automatically resets the counter to zero.
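
The addressing scheme described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the assumption that latch bits 0 to 6 map onto the upper address lines in ascending order is mine (the schematic in the PDF is the authority on that).

```python
BLOCK_BITS = 7     # upper address lines, held in the latch (128 blocks)
OFFSET_BITS = 12   # lower address lines, driven by the 12-bit counter
WRITE_FLAG = 0x80  # eighth latch bit: 1 = write to RAM, 0 = read from RAM


def control_word(block: int, write: bool) -> int:
    """Build the byte that a CTRL-B pulse stores in the latch."""
    if not 0 <= block < (1 << BLOCK_BITS):
        raise ValueError("block number must be in 0..127")
    return (WRITE_FLAG if write else 0) | block


def ram_address(block: int, offset: int) -> int:
    """Combine latch and counter outputs into the 19-bit SRAM address."""
    if not 0 <= block < (1 << BLOCK_BITS):
        raise ValueError("block number must be in 0..127")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must be in 0..4095")
    return (block << OFFSET_BITS) | offset


print(hex(control_word(5, write=True)))  # 0x85
print(ram_address(127, 4095))            # 524287, the last byte of 512 KB
```

Because the latch powers up at zero, the default control word is "read block 0", which matches the boot-time protection described above.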
It took a bit of thought how to control everything with just two control bits, but I managed to do it with only a single 74HC00 chip.
One bit, which I call Control-A, controls the data transfer function. I designed it so that a high level is the "inactive" state, so that before the Propeller boots, while this pin is still an input, a pull-up ensures nothing happens to the contents of the RAM. Furthermore, I added a "write protect switch" so that the contents of the RAM can be protected against accidents, and you can turn it into a "ROM" after filling it. Pulsing Control-A low will initiate a data transfer. The second control bit (Control-B) controls the latch and the reset function of the counter. This bit is low in its inactive state; pulling it high resets the counter and latches the data on the 8-bit port into the latch. During a reset this bit will be high, but that does not cause a problem.

To transfer a 4K block to the RAM you first program the 8-bit port as output and set the lower 7 bits to the number of the 4K block you want to address and the most significant bit to "1", then pulse the CTRL-B bit high and low again. Note that the latch acts on the rising edge of the pulse, so the data on the data bus must be stable before pulsing CTRL-B. Also note that I have added "protection resistors" in the data bus of the RAM, in case runaway software tries to output data on the 8-bit port while the CTRL-A bit is low and the latch is in "read" mode; this makes sure neither the Propeller nor the RAM chip can be damaged.

Now that we have primed the memory expander for a block write, a data byte should be put on the 8-bit port and CTRL-A must be pulsed low. The data byte will be written to the RAM chip, and on the rising edge of the pulse the address counter will be incremented to the next byte. So new data can be put on the port and CTRL-A can be pulsed again, and so on, 4096 times. Note that it is NOT possible to address individual bytes with this design, but a block transfer can be extremely fast.
One way in which it might be possible to increase the transfer rate even more is to use a hardware timer to pulse the CTRL-A output, synchronised to a routine that does nothing but change the data on the 8-bit port. But I am not sure this can be done.

Reading a 4K block is similar, only the top bit of the latch is set to "0" for a read cycle, and the 8-bit port is programmed as an input port. The CTRL-A pin is then driven low, and the RAM will put the addressed data on the 8-bit port where it can be read, after that the CTRL-A pin is driven high again.
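
The write and read sequences can be tried out against a small behavioural model in Python. This is a sketch under assumptions, not a description of the actual circuit: the class and function names are invented for illustration, each method stands for one complete pulse on a control line, and the write protect switch and protection resistors are left out.

```python
class ExpanderModel:
    """Behavioural model: 512 KB SRAM, control latch, 12-bit counter."""

    def __init__(self) -> None:
        self.ram = bytearray(512 * 1024)
        self.latch = 0    # powers up low: read mode, block 0
        self.counter = 0  # 12-bit address counter

    def pulse_ctrl_b(self, port_value: int) -> None:
        """CTRL-B high then low: latch the control word, reset the counter."""
        self.latch = port_value & 0xFF
        self.counter = 0

    def pulse_ctrl_a(self, port_value: int = 0) -> int:
        """CTRL-A low then high: move one byte, then increment the counter."""
        address = ((self.latch & 0x7F) << 12) | self.counter
        if self.latch & 0x80:          # write mode: Propeller drives the port
            self.ram[address] = port_value & 0xFF
            value = port_value & 0xFF
        else:                          # read mode: RAM drives the port
            value = self.ram[address]
        self.counter = (self.counter + 1) & 0xFFF
        return value


def write_block(dev: ExpanderModel, block: int, data: bytes) -> None:
    """Write one 4K block: set the latch for writing, then 4096 pulses."""
    if len(data) != 4096:
        raise ValueError("a block is exactly 4096 bytes")
    dev.pulse_ctrl_b(0x80 | (block & 0x7F))
    for byte in data:
        dev.pulse_ctrl_a(byte)


def read_block(dev: ExpanderModel, block: int) -> bytes:
    """Read one 4K block: set the latch for reading, then 4096 pulses."""
    dev.pulse_ctrl_b(block & 0x7F)
    return bytes(dev.pulse_ctrl_a() for _ in range(4096))


dev = ExpanderModel()
payload = bytes(i & 0xFF for i in range(4096))
write_block(dev, 3, payload)
print(read_block(dev, 3) == payload)  # True
```

The model makes the main limitation visible: there is no way to reach byte N of a block without first issuing N CTRL-A pulses.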

That's it, I release this design for non commercial use, I hope it is of use, Mahjongg.

Comments

  • AndreLAndreL Posts: 1,004
    edited 2007-01-23 07:48
    Boy, sounds very similar to the new Hydra SRAM card. But mine has more programmability in the post increment/decrement, and it's designed for random access all over, with block access from any address, not just on the 4K boundaries. Trying to get it manufactured as soon as possible and trying to decide whether to use 128, 256, or 512K SRAMs to control cost...

    Andre'
  • mahjonggmahjongg Posts: 141
    edited 2007-01-23 13:30
    Interesting,
    I also thought about a design with similar features to yours, using sn74as867 presettable up/down counters, but the control logic would take many more than the three chips of my design, and I could see no way to control it reliably with only two control bits. Does your design use an 8-bit data path? And how many control ports do you need?

    The random access of your design is of course a very desirable feature, and probably warrants a much more expensive and complex design.

    I listed the RAM in my design as a M68AW511A because of its low cost. Farnell lists it at €5.20 per piece, or only €3.40 at 100+, but of course in principle almost any SRAM device will work.

    It will be very interesting to see what a Hydra equipped with your card will be able to do.

    Mahjongg.
  • KlossKloss Posts: 43
    edited 2007-01-23 15:35
    Why not use a multiplexed address/data bus like most other
    micros do?
    As a cog has 32-bit access to the ports, it should be able to
    serve both address/data lines and the status lines in one cycle.
    We would need 19 lines to address 64k RAM 16bits wide.
    The 16 address/data lines could be used for other purposes
    when the RAM is not addressed.

    Other idea: I'm currently reading the RDRAM specification.
    RDRAM uses 8 or 16 data bits and has a serial command interface for addressing.
    There are additional row/column bits, but you don't have to use them if you
    don't need all the memory.

    And:
    Have you ever seen this?
    http://www.nanoamp.com/SSRAMs.html
  • LawsonLawson Posts: 870
    edited 2007-01-23 15:41
    AndreL: I'd say use the 512K SRAM. My primary reason for this opinion is that if someone needs more than the 32K hub RAM, it is very likely that this same person will need a LOT more than the 32K hub RAM.

    Any chance that your memory card design would be extendable to support more memory over the same data bus and control pins?
  • Dennis FerronDennis Ferron Posts: 480
    edited 2007-01-23 15:43
    That's a really neat circuit.

    Using a virtual memory/caching scheme to implement an emulator for a system with more RAM than the Propeller carries internally is an intriguing concept. It should get fairly good performance because most programs spend much of their time in short loops. You'd only have to access RAM when a different block becomes necessary.

    I would think this could be a useful thing to use with Mike's Propeller OS, too. A virtual memory system like this would allow you to run bigger programs or more than one at a time, and more OS code and device drivers as well. The programs would have to be designed to work with it but it could be done; especially if it were used for an interpreted language like BASIC or Smalltalk - the interpreter could keep track of moving blocks in and out of memory without the programmer worrying about it.
  • mahjonggmahjongg Posts: 141
    edited 2007-01-23 16:16
    Well, in essence my design -does- multiplex the 8-bit port for data and addresses; that is just the trick, although it does it in a very different fashion than most microcontrollers.

    Of course, the more pins you use for the "pipe" to the external RAM, the higher the potential bandwidth would be, but using 19 I/O pins just for the data/address bus of the RAM plus the four or so you would need for the control interface makes 23 of the available 32 pins. Using that many pins would be quite impossible as an add-on to most existing designs. Doubling up the bidirectional data/address bus pins with any other function would also be almost impossible to realise.

    My design was wrought with the basic idea that I needed to use as few I/O pins as possible. Many systems have one 8-bit port free, and a few other I/O pins can be scraped together by dropping some other function, so ten I/O bits is more or less the practical limit. Also, I wanted a design that could be easily wired on a breadboard, using as few cheap and easy-to-get off-the-shelf components as possible.

    Most dynamic RAMs use a multiplexed address bus (because of the row/column addressing of the memory array). Using such a multiplexed address bus would indeed lower the I/O count for the address lines, but you also need a second port for the data bus. One design that uses an old DRAM stick has already been proposed on this forum. This solution solved the I/O port scarcity problem by using just a single bit of the data bus (throwing away 7/8 of the capacity of the memory stick). Also, with most DRAM solutions you need to implement a DRAM refresh generator; that can be done quite easily in software, but is still a hassle and will take bus cycles, which slows down the transfer rate.

    The SPI bus controlled serial RAM is indeed a nice find, thanks for that. I tried to find something similar that could be connected to the I2C bus like the boot EEPROM (it could share the I2C bus with the EEPROM), but I realised that for most applications it would be too slow. This RAM, hanging on an SPI bus, would be much faster, but still not as fast as a system using an 8-bit data port. Also, these devices are still very exotic, hard for a hobbyist to get, and probably expensive.

    Mahjongg.
  • AndreLAndreL Posts: 1,004
    edited 2007-01-24 08:32
    I know 512 would be better than 256, then I am like why not make it a meg :) We will see... And mine uses an 8-bit data bus from the Hydra and 2 control lines, plus a clock.

    Andre'
  • mahjonggmahjongg Posts: 141
    edited 2007-01-25 12:28
    Dear Andre',
    Yes, having a third (clock) control bit makes all the difference.

    I can see how I would create a fully addressable RAM expansion with it. Just for fun, let me tell you how. I know you already designed your own version, but maybe you can still use this information.

    You could use a string of three sn74as867 presettable up down counters for the address generator, with it you can address up to 4MB of RAM (or FLASH).

    Using one half of a 74ALV139, with the control lines connected to A and B and the clock line connected to the enable, you can create four low-going control strobes.

    These could have the following four functions (A and B are the two control lines):

    AB
    00 Read from RAM (with auto increment/decrement)
    01 Write to RAM (with auto increment/decrement)
    10 Write to control latch (see below)
    11 Write 8 bit of address data to one of the three presettable up down counters

    The first two are straightforward: Y0 of the 139 goes to the RAM /OE and Y1 to the /WE; the A control line goes to the /CE of the RAM and to the clocks of the up/down counters.

    The control latch is a simple 8-bit latch; its outputs can be divided into four 2-bit pairs. Three of them control the S0 and S1 inputs of the three up/down counters (Clear, Count Down, Load and Count Up), and the fourth pair steers the A and B inputs of the other half of the 74ALV139 to select which of the three up/down counters will be addressed when writing the 8-bit partial start address using the fourth function.

    Of course there are still a lot of small details to fill in, but this is the basis.

    By the way, at first thought you might think that auto decrement is not very useful, but when implementing a fast (Z80 LDIR-like) block move command it is an essential feature to have.

    I hope this helps.

    Mahjongg

    Post Edited (mahjongg) : 1/25/2007 4:36:07 PM GMT
  • KlossKloss Posts: 43
    edited 2007-01-25 13:50
    Well, sounds like a good application for a CPLD.
  • QuattroRS4QuattroRS4 Posts: 916
    edited 2007-01-25 19:55
    mahjongg

    That's a blast from the past ..

    Sounds great - I am a serious speccy fan - have the zx80,zx81,16K,48K,128K,128 +2,and 128 +3

    I have copies of the rom files a and b from the 48K and loads of circuit diagrams for external HDD connection - with modified roms - even though only half of the HDD is addressable - I can knock them onto you if it helps ?
    Have a boot to choice - i.e pick your spectrum ! Can copy the other roms if you like

    Also loads of links to emulators etc..


    Post Edited (QuattroRS4) : 1/25/2007 8:42:49 PM GMT
  • mahjonggmahjongg Posts: 141
    edited 2007-01-26 02:15
    Thanks QuattroRS4, but that won't be necessary. You see, I own the ZX80 and ZX81 myself, and a couple of Spectrums, and even a QL (although I think I do not own any of the newer Amstrad-style Spectrums).

    You know, when the ZX80 came on the market I was working for a small company (called MCP, in the Netherlands) that sold computer kits, mainly TRS-80 stuff, but also the ZX80. I must have built and repaired several dozen ZX80s for clients who bought a kit through us but found they couldn't get it to work, or simply paid us to build it for them. I also remember that when the ZX81 came out I retrofitted many ZX80s with the new ZX81 ROMs so they could run ZX81 software (except for the SLOW mode, of course).
    I still have all my old speccies somewhere in a room with about eighty to ninety other old home and single board computers. My first one (that worked) was a KIM-1.
    I have always liked Sinclair's designs because they were always so brilliantly simple, although he "borrowed" many ideas for the ZX80 directly from Don Lancaster's "TV Typewriter Cookbook". I know, because at the same time I also designed a similar video system for a 6502-based system, the "Junior Computer" from Elektor/Elektuur (the hobby electronics magazine), using ideas from the very same book.

    As you might guess, I am a collector of sorts, but it's all stored away in a small room and it's difficult to get at.

    I chose the Spectrum as an example of a system that could use my memory expansion idea because it has relatively easy-to-emulate electronics. But the same basic principle can apply to almost any 8-bit home computer of the time, such as the Amstrad CPC. The exception is re-creating an Atari 800, which would be much more difficult; the video chip in that machine was really quite complex.

    Emulating the hardware of the ZX80/81 with a Propeller should also be possible, but its video hardware is so bizarrely different from the principles used in the video software for the Propeller that software emulating it must be written completely from scratch, and a memory extension is perhaps unnecessary.

    Mahjongg

    (see my user page user:mahjongg at Wikipedia, or click on the little house icon to the left of this text, underneath the member picture.)
  • QuattroRS4QuattroRS4 Posts: 916
    edited 2007-01-26 05:46
    mahjongg,
    That's very interesting. I also like the way that initial design used 'damaged memory ICs', either OKI or TI - link selectable at the top of the board - with only the working portions addressed (TI 4532-4) for expansion. At the time it was said that if the full complement of completely working 4116 was addressable, the 48K could be a 64K! I think Sir Clive and Amstrad have allowed ROM usage / replication. The good old Z80 and the Ferranti ULA!


    You have re-sparked my interest - must go and 'poke' out a few speccies and have a 'peek' inside to refresh on the internals - just remembered I have an old Currah speech unit ! I wonder .....

    Incidentally, I am also a big Elektor reader ..

    Keep me posted on your progress - btw: nice idea !

    Post Edited (QuattroRS4) : 1/26/2007 5:57:08 AM GMT
  • AndreLAndreL Posts: 1,004
    edited 2007-01-26 08:28
    Mine is already done, but has some similar features. I have just been wishy-washy over the memory size to control cost. But my final design is very fast and more flexible. At first I was thinking of using the counters to make things easy, but that just locks you into things, plus you need an extra instruction or cycle to tell the counters what to do. Not to mention, this stuff has to fit on that teenie weenie Hydra expansion board, and multiple chips are out of the question, so after 100 milliseconds I started in with a CPLD.

    When doing graphics it's SPEED SPEED SPEED. So I am interested in having the fastest random access as well as the fastest block transfers possible. And those two constraints force you to a CPLD to make it happen. In graphics, you could have a sprite here, a buffer there, a texture there, strewn all over memory, so not being able to get to them with the fewest possible operations, and having the second level of indirection with the command latch, kills your performance. You're talking about slowing yourself down 30-50% in many cases for random access and memory hopping all over the place; for example, a linked list would kill you, or generally anything that's not contiguous reads/writes.

    But, bottom line is all these will be similar in design since there are only so many ways to use 2 bits of control and yes being able to decrement is a nice feature as well. My original design had full math to the address operand, but to fit in this particular CPLD, I had to get less freaky and settle on less exotic stuff that were playing games with the data for graphics features.

    However, I think I will save that for another board that is more of a blitter and DMA unit with graphics in mind.

    Andre'


  • mahjonggmahjongg Posts: 141
    edited 2007-01-26 14:38
    Yes I agree, for a professional design a CPLD is definitely the way to go.
    But I just wanted to throw something together from parts that a typical hobbyist would have in his parts box, and that he could build on his breadboard with minimal fuss, but I also thought about the experimental plug-in boards for the hydra (I really must order one sometime soon).
    I did not realise it had a third (clock?) control line available, having only two really complicates matters. Also the same mechanism that is used to read a byte of data should auto increment (or decrement) the address pointer, otherwise you must indeed use more machine cycles to change the address. My design solves this by incrementing the address on the end of the read signal pulse.


    If logic complexity is unimportant, then a completely random access way to address the RAM is of course the way to go, ideally with as few Propeller assembler instructions per byte transferred as possible.

    I am very curious how your circuit works. It sounds like you are using some kind of co-processor approach that somehow "knows" what data the propeller needs, and feeds that data to it; otherwise I can't quite place the "math for the addressing mode" remark.

    Mahjongg
  • AndreLAndreL Posts: 1,004
    edited 2007-01-26 22:03
    Nope, nothing that complicated, just the memory controller allows general access via the latch address. And has a "behavior" at the end of the read/write cycle, that behavior is add, sub, or nothing.

    Andre'
  • KaioKaio Posts: 253
    edited 2007-01-28 20:48
    Hi Mahjongg,

    That is a nice memory expansion design. I am also a big Speccy fan and I have thought about building a ZX emulator with the Propeller. I have not designed a memory expansion in detail yet, because I am a programmer. But I would like to use a multiplexer for the address and data bus.

    I think your solution is a good beginning, but the 12-bit counter could be the bottleneck in your design. Let me explain why.
    In real Z80 code you do not only have sequential access to the memory, which your solution handles very well. You must also have random access to the memory, even in a block move (LDIR). When you want to access a memory address which is located in another memory page, you must select the memory page and then advance the lower address lines to the desired address by pulses. This can take anywhere from 0 up to 4095 clocks on the 12-bit counter. So I have calculated how much time this can take.
    With optimized Propeller assembler code you can do one clock count with 3 instructions:

    :countTo  mov outa,#clockLow   'sets the clock bit to low
              mov outa,#clockHigh  'sets the clock bit to high
              djnz count,#:countTo

    You need 3*4=12 Propeller clocks for one counter clock; at an 80 MHz internal clock this takes 150 ns. So the addressing for the lower lines alone will take up to 614.25 µs in the worst case (last byte of a page). If we suppose that an average addressing operation for the lower lines takes a quarter of the worst-case time, then it will cost about 153.56 µs. In this time a Speccy processes about 77 simple assembler instructions.

    Therefore I would propose to use a second latch and a multiplexer logic for the lower address lines, for performance reasons.
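
    The seek-time arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation; the constant and function names are made up for illustration, and it assumes each of the three loop instructions takes 4 clocks.

```python
CLOCK_HZ = 80_000_000        # Propeller system clock
CLOCKS_PER_INSTRUCTION = 4   # assumed cost of each instruction in the loop
INSTRUCTIONS_PER_PULSE = 3   # mov, mov, djnz


def pulse_time_ns() -> float:
    """Time for one counter clock pulse generated by the loop."""
    clocks = INSTRUCTIONS_PER_PULSE * CLOCKS_PER_INSTRUCTION
    return clocks * 1e9 / CLOCK_HZ


def seek_time_us(pulses: int) -> float:
    """Time to advance the 12-bit counter by the given number of pulses."""
    return pulses * pulse_time_ns() / 1000


print(pulse_time_ns())                   # 150.0
print(seek_time_us(4095))                # 614.25
print(round(seek_time_us(4095) / 4, 2))  # 153.56
```

    The worst case is stepping to the last byte of a page, which needs 4095 pulses.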

    Kaio
  • mahjonggmahjongg Posts: 141
    edited 2007-01-29 17:09
    Dear Kaio,

    I am quite familiar with Z80 assembler, so I am aware of the limitations of my design when trying to use it as a random access memory. The problem is even worse when you try to do a block move where the direction of the move is "downwards", so you need to access memory addresses that are sequentially decremented (especially if the source and destination areas overlap; otherwise you could still program around this problem).

    But that is not how I wanted the software to work in the first place! I wanted to use it with a "virtual memory" setup. Let me explain what that means. It means that the CPU emulator divides its memory map into 4K segments, uses a number of 4K blocks (as many as there is memory left in the 32K main memory of the Propeller) as "real memory", and uses a swapping mechanism to load/store 4K segments of memory from/to my external memory expansion on an "as needed" basis. When the emulated Z80 needs to address a 4K block that is not in any of the "real memory" blocks, the virtual memory routine must load it from the memory expander, after first writing the "least used" 4K block of "real" memory (the virtual memory manager needs to keep track) to the memory expander.
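
    A minimal Python sketch of such a swapping scheme is shown below. It is an illustration under assumptions, not a Propeller driver: the class name is invented, a plain bytearray stands in for the external memory expander, and "least recently used" is used as a simple stand-in for the "least used" bookkeeping.

```python
from collections import OrderedDict

PAGE_SIZE = 4096


class PagedMemory:
    """Emulated address space backed by a few resident 4K pages."""

    def __init__(self, backing: bytearray, resident_pages: int) -> None:
        self.backing = backing          # stand-in for the external SRAM
        self.resident_pages = resident_pages
        self.pages = OrderedDict()      # page number -> bytearray, LRU first
        self.swaps = 0                  # number of block loads performed

    def _page(self, number: int) -> bytearray:
        if number in self.pages:
            self.pages.move_to_end(number)  # mark as most recently used
            return self.pages[number]
        if len(self.pages) >= self.resident_pages:
            victim, data = self.pages.popitem(last=False)
            start = victim * PAGE_SIZE
            self.backing[start:start + PAGE_SIZE] = data  # write back first
        start = number * PAGE_SIZE
        page = bytearray(self.backing[start:start + PAGE_SIZE])
        self.pages[number] = page
        self.swaps += 1
        return page

    def read(self, address: int) -> int:
        return self._page(address // PAGE_SIZE)[address % PAGE_SIZE]

    def write(self, address: int, value: int) -> None:
        self._page(address // PAGE_SIZE)[address % PAGE_SIZE] = value & 0xFF


mem = PagedMemory(bytearray(64 * 1024), resident_pages=2)
mem.write(0x0000, 0x11)
mem.write(0x1000, 0x22)
mem.write(0x2000, 0x33)   # a third page forces page 0 out to the backing store
print(mem.read(0x0000), mem.swaps)  # 17 4
```

    Every miss costs a full 4K write plus a full 4K read on the expander, so the swap counter gives a rough feel for how badly scattered accesses would hurt.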

    Any LDIR instruction (when it stays within the borders of the available real memory blocks) should then run at high speed using the 32K internal memory, without needing to access my memory expander.

    My memory expander was designed with the idea that it had to be as simple as possible in terms of hardware, but I am well aware that writing a virtual memory driver for the Propeller is not trivial, and that the resulting speed of the emulated system cannot be deterministic. By that I mean that it is very hard to predict the performance of the emulated system while running different kinds of software on such a virtual memory based system. If the emulated system continuously accesses memory "all over the place", the performance will drop dramatically.

    The ideal solution for a memory expander for the propeller would be to have a fully random access system where you can set a beginning address where you want to start reading/writing to the external memory, and the possibility to either auto increment, or auto decrement the "memory pointer" after each byte written or read. The software can then just implement a kind of read ahead caching system.

    But with just two control lines it was not possible to control the "memory pointer" logic of such a complex design, so I settled for a "virtual memory" solution.

    But in the meantime I got an idea of how to create seven different control pulses from an interface with three control bits, by using a "deglitcher" circuit I just invented.
    That is possible because (as I understand it, please correct me if I'm wrong) it is possible to set (or reset) more than one I/O bit at the same time using a single instruction, so that all three output control bits of my interface change at exactly (to the nanosecond) the same time. My idea is that when all three control output bits are '1' the circuit is "inactive", but any combination of the three I/O bits with one or more '0's in it means that one of seven different "commands" is given. Normally such an idea would not work, because even when every output bit transitions at the "exact same time", there would still be fatal "glitches" on the outputs of the 3-to-8 decoder that decodes the three control bits to activate one out of eight signals. But I think I know how to solve that problem.
    With this idea, and the sn74as867 presettable up down counters I talked about earlier, it would be possible to design a more complex memory expander without using too many components. The question is whether there is a need for such a design. If there is, I am willing to put some work into designing it, and releasing it here.

    Mahjongg
  • KaioKaio Posts: 253
    edited 2007-01-29 20:57
    Please be aware that a Propeller cog has speed-limited access to the main memory. Each cog has only 1984 bytes of its own memory, which can be accessed without speed limitations, as can the external ports. Hence it would be better, in my opinion, for the page size to be at most 1 KB. Then you can transfer one page from the main memory to the cog memory and then with "high speed" to the memory expansion. The cog is then limited to 240 longs for code and variables, which should be enough.

    An alternative would be to act directly on the memory expansion, so you can always operate at full speed on the external memory.

    Kaio
  • mahjonggmahjongg Posts: 141
    edited 2007-01-29 22:54
    Dear Kaio,

    Thanks, you are raising a good point! I am aware that the cog needs to wait for access to the 32K main memory, and only has 512 32-bit words of RAM (2K). That is indeed a limiting factor for the performance of a virtual memory system!

    I am not sure the block size of the memory expander matters much though.
    As far as I understand it (I do not know every detail of the Propeller, and I am not in the first place an assembly language programmer), in each access cycle that a cog gets, it can execute one first instruction, which takes 7 cycles, and then it has another nine cycles before the end of the access window. So in one access window time period it can transfer four bytes (one 32-bit long word) with the first instruction, and then has 9 more cycles to execute only one more long word memory instruction, because such an instruction takes seven cycles. So in theory, as far as I understand it, you can transfer 8 bytes per cog cycle.
    Therefore the optimal transfer method seems to be to try to read eight bytes from the external memory in between two cog cycles, and then, during the cog access cycle, to transfer them to the main RAM.
    Because of this limitation, at least in theory, it does not seem to matter much how big the buffer is, because the limiting factor is not the size of the cog RAM but the limited access to the main RAM that the cog has.

    But just in case you still want to limit the buffer to 1K, for whatever reason, it's quite possible to do that with my simple design, but the price is that you can only address 128K of RAM. In many cases that can be quite enough, so it's a valid option. To do this, just do not use the upper two outputs (Q11 and Q12) of the counter, and shift the address lines of the latch down two positions. So the address lines of the RAM are connected this way: A0 to A9 come from Q1 to Q10 of the 74HC4040, and address lines A10 to A16 come from the latch outputs 1Q to 7Q.
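
    The effect of the 1K rewiring on the address map can be sketched in Python. The function name is hypothetical, and the assumption that latch outputs 1Q to 7Q map onto A10 to A16 in ascending order is mine.

```python
BLOCK_SIZE_1K = 1024   # counter outputs Q1..Q10 drive A0..A9
BLOCK_COUNT = 128      # latch outputs 1Q..7Q drive A10..A16


def ram_address_1k(block: int, offset: int) -> int:
    """SRAM address for the 1K-block wiring variant (17 address lines)."""
    if not 0 <= block < BLOCK_COUNT:
        raise ValueError("block number must be in 0..127")
    if not 0 <= offset < BLOCK_SIZE_1K:
        raise ValueError("offset must be in 0..1023")
    return (block << 10) | offset


print(ram_address_1k(127, 1023))           # 131071, the last byte of 128 KB
print(BLOCK_COUNT * BLOCK_SIZE_1K // 1024)  # 128 (KB reachable)
```

    With only 7 latch bits available for the block number, shrinking the block to 1K shrinks the reachable RAM by the same factor of four.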

    I am not completely sure it's possible, but maybe the optimal solution would be to try to have the complete Z80 emulator stored in the RAM of a single cog and, with the use of an external memory system that allows random access (the new design I could try to develop), access the external memory one Z80 instruction at a time. For example, an LDIR would then not have to go through the main 32K memory, and thus would be executed quite fast.
    If it's not possible to fit the whole Z80 emulator in a single cog, another way would be to divide the emulation work over two (or more) cogs, so that for example simple Z80 instructions would be "executed" by the "main cog", and a "slave cog" would do the more difficult instructions like LDIRs and complex indirect addressing instructions.

    A system based on "virtual memory" could work in a similar way. You could use one cog to manage the virtual memory system, and this cog would then "feed" the other (Z80 emulating) cog(s) as many bytes as needed to execute one (or maybe two) Z80 instructions.

    So, my question is still, is there a need for an external memory expander that can access that memory in a completely random access way. If so, I will try to design such a circuit.

    Mahjongg
  • LawsonLawson Posts: 870
    edited 2007-01-30 03:25
    I think Kaio is thinking of using a 1K byte or less block size with the memory circuit so that the cogs can directly use the external memory, bypassing the HUB entirely.

    neat stuff,
    marty
  • mahjonggmahjongg Posts: 141
    edited 2007-01-30 07:59
    Lawson said...
    I think Kaio is thinking of using a 1K byte or less block size with the memory circuit so that the cogs can directly use the external memory, bypassing the HUB entirely.
    Yes, I understood that, but what would the cog then -do- with the 1K data from the memory circuit?
    When trying to implement a virtual memory scheme the data -has- to go to the 32K main memory first! Please remember that my simple memory expander -cannot- efficiently handle random-access, so it needs a virtual memory scheme to become somewhat efficient.

    With a truly random access memory expander it would perhaps be feasible to use the remaining 240 words of code in the cog to do basic decoding of a Z80 assembler instruction by decoding the Z80 instructions on a "micro instruction" level, but with my "4K memory sector" design it is just not efficient.

    With a truly random access external memory circuit it may also be feasible to use the external memory as the frame buffer, and run a special "TV-device" in a cog that accesses the external memory expander to display its contents. Come to think of it, perhaps it would even be possible to do that with my simple device, as long as nothing else needs it while the frame is being built up (in between the frames, during the blanking period, it may be used for other things). At least it would free up room in the 32K for other things.

    That said, there is another way you could use the memory expander for large programs. It's feasible to put 1K code snippets in the external memory expander, and use these to run much more code than is now possible, but the remaining 240 words of cog memory are not very useful then, and using 1K blocks means the memory expander (in its current form) is limited to just 128K.
    What you could also do is write a very small memory threader that just loads the next needed "external memory sector" into the remaining room in the cog, using all the remaining cog room for "memory sectors". It would mean throwing away some memory capacity, but that might be a small price to pay. Say, for example, that you can get this "memory threader" running in 32 instructions; then you could have "memory sectors" that are 2048-16(registers)-32(memory threader) = 2000 bytes long, so you throw away the last 48 bytes of each 2K memory sector. With a 2K memory sector my device would support 256K of external RAM, or 128x2000 bytes of memory sectors. It would perhaps not be very usable for an emulator setup, but for running very large assembler programs it might be.

    Mahjongg
  • mahjonggmahjongg Posts: 141
    edited 2007-01-30 11:56
    I have been thinking about a fully random access memory expansion design, and I think I have come up with a nice controller setup with a 3-line interface that can control a maximum of 2^24 bytes = 16 MB of memory, and has auto increment and decrement.

    Assuming we have one eight-bit I/O port (for all eight-bit data transfers), and a second I/O port with three control lines labeled A, B and C (where A is the least significant bit of that I/O port), then:

    CBA
    111 = inactive

    100 = write least significant byte of memory pointer address (A0-A7) to counter/latch #1
    101 = write mid significant byte of memory pointer address (A8-A15) to counter/latch #2
    110 = write most significant byte of memory pointer address (A16-A23) to counter/latch #3

    000 = write to RAM at location pointed at by the memory pointer, and then increment memory pointer
    001 = read from RAM at location pointed at by the memory pointer, and then increment memory pointer
    010 = write to RAM at location pointed at by the memory pointer, and then decrement memory pointer
    011 = read from RAM at location pointed at by the memory pointer, and then decrement memory pointer

    Of course all three control bits must be set/reset using a single instruction, or intermediate output results would trigger unwanted actions.
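    To make the command set above concrete, here is a small software model of it. It is only a sketch of the behaviour, not of the circuit: the class and method names are invented for the example, and wrapping the pointer around at 24 bits is my assumption about how the counters would behave.

```python
from enum import IntEnum


class Ctrl(IntEnum):
    """Control codes on lines C, B, A (C is the most significant bit)."""
    WRITE_INC = 0b000
    READ_INC = 0b001
    WRITE_DEC = 0b010
    READ_DEC = 0b011
    SET_LOW = 0b100    # pointer bits A0-A7
    SET_MID = 0b101    # pointer bits A8-A15
    SET_HIGH = 0b110   # pointer bits A16-A23
    INACTIVE = 0b111


class ExpanderModel:
    """Software model only; the real thing would be counters, latches and RAM."""
    ADDR_MASK = (1 << 24) - 1   # assumption: the pointer wraps at 16 MB

    def __init__(self):
        self.pointer = 0
        self.ram = {}   # sparse stand-in for the external RAM

    def strobe(self, ctrl, data=0):
        """Apply one control code; return the byte read, or None."""
        if ctrl == Ctrl.INACTIVE:
            return None
        if ctrl in (Ctrl.SET_LOW, Ctrl.SET_MID, Ctrl.SET_HIGH):
            shift = 8 * (ctrl - Ctrl.SET_LOW)
            cleared = self.pointer & ~(0xFF << shift)
            self.pointer = (cleared | ((data & 0xFF) << shift)) & self.ADDR_MASK
            return None
        result = None
        if ctrl in (Ctrl.WRITE_INC, Ctrl.WRITE_DEC):
            self.ram[self.pointer] = data & 0xFF
        else:
            result = self.ram.get(self.pointer, 0)
        step = 1 if ctrl in (Ctrl.WRITE_INC, Ctrl.READ_INC) else -1
        self.pointer = (self.pointer + step) & self.ADDR_MASK
        return result
```

    In use, a block transfer is three pointer writes followed by a run of `WRITE_INC` or `READ_INC` strobes, so sequential access costs one strobe per byte.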

    To "deglitch" the 3-to-1-of-8 decoder (so no unwanted pulses occur at its outputs) you could use a three-input OR gate, with each input connected to one of the control lines, and delay its output by a few tens of nanoseconds (a few inverters in series). That delayed signal then feeds a one-shot (a simple RC combination and a Schmitt-trigger input would do) that creates an output pulse of, say, 150 ns when the output of the OR gate goes low, and this pulse drives the enable of the 3-to-1-of-8 decoder.
    The software timing for the control bit driver must be such that the control lines all stay the same for the duration of the delay+one-shot pulse (say 200 ns) and afterward all turn high again.
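    For a feel of what that hold time means in software, here is a quick calculation. The 80 MHz clock and 4 clocks per instruction are my assumptions about a typical Propeller setup, not something fixed by the design, and the function name is made up.

```python
import math

CLOCK_HZ = 80_000_000        # assumption: typical Propeller system clock
CLOCKS_PER_INSTRUCTION = 4   # assumption: most cog instructions take 4 clocks


def instructions_to_hold(hold_ns):
    """Number of whole instruction times needed to cover hold_ns nanoseconds."""
    instruction_ns = CLOCKS_PER_INSTRUCTION * 1e9 / CLOCK_HZ   # 50 ns here
    return math.ceil(hold_ns / instruction_ns)
```

    Under those assumptions `instructions_to_hold(200)` gives 4, so the control lines would have to stay put for about four instructions before going back to 111.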

    For a 64K max memory system only the counter/latch #1 and #2 would be needed.

    Such a design could be built using a handful of discrete components, or with a CPLD.

    Mahjongg.
  • KaioKaio Posts: 253
    edited 2007-01-31 14:22
    Mahjongg said...
    I am not completely sure it's possible ... complete Z80 emulator stored in the RAM of a single cog

    I have tried it; it is not possible to have a complete Z80 emulator in one cog. I have not tried to split the code so that the emulator uses more than one cog.

    Currently I am implementing a Z80 emulator in Spin for a first version. I will use a second cog as a "memory controller" which reads from and writes to an external memory expansion. The whole video RAM is located in the prop's main memory and is not copied to the external memory.

    Your concept of a memory expansion up to 16 MB is very interesting for a real OS implementation. But for an 8-bit emulator I would prefer a simple design with random access to external memory, up to a maximum of 128 KB. You also need I/O capability for such a design, so that you can use original hardware extensions.

    Kaio
  • mahjonggmahjongg Posts: 141
    edited 2007-02-01 00:26
    Interesting,

    Won't SPIN be much too slow?

    If you look at the Z80 instruction matrix, as documented in many places, such as here:

    www.z80.info/decoding.htm

    you can see that you can "disassemble" the instruction (just like the internal decoder of the Z80 would do), and "split off" some of the harder to implement instructions and use a semaphore mechanism to let another cog decode and "execute" them.
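    As a sketch of what that splitting could look like, the decoding page breaks every opcode byte into the fields x, y, z, p and q, and the ED-prefixed block instructions (LDIR and friends) fall in one corner of that table. The function names below are made up, and which instructions actually count as "hard" is a choice for whoever writes the emulator.

```python
def decode_fields(opcode):
    """Split an opcode byte into the x, y, z, p, q fields used on z80.info."""
    x = (opcode >> 6) & 0b11    # bits 7-6
    y = (opcode >> 3) & 0b111   # bits 5-3
    z = opcode & 0b111          # bits 2-0
    return {"x": x, "y": y, "z": z, "p": y >> 1, "q": y & 1}


def is_ed_block_instruction(opcode):
    """True if the byte after an ED prefix is a block instruction (LDI, LDIR, CPIR, ...)."""
    f = decode_fields(opcode)
    return f["x"] == 2 and f["z"] <= 3 and f["y"] >= 4
```

    LDIR is ED B0, so `is_ed_block_instruction(0xB0)` is true, and a main cog could use a test like this to decide when to hand an instruction over to the other cog.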

    I can really relate to you wanting a simple memory controller with random access capabilities.

    So you know what, how does 3 chips, and using only two control bits for a completely random access 128K memory controller sound?

    I think I found a way to build one with only two 74HC574 latches and one 74HC137, nothing more!
    I will draw up the schematic diagram tomorrow, and post it here.

    There is one catch. My design does NOT do auto increment, so you have to change the address even when access is sequential. But doing that only takes very few instruction cycles. Also, I used a trick to double the 64K that is addressable with two 8-bit latches to 128K; it only complicates the code a little bit, and does not slow down access. The design is also extendable: if you want more memory, just add another latch; four more can be added! That means you can potentially address up to 2^48 = 281474976710656 bytes, or 262144 gigabytes! That should be enough.
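    Just to check that last figure: six 8-bit latches (the two original ones plus four more) give 48 address bits. The names below are only for the example, and the gigabytes are counted as 2^30 bytes each.

```python
LATCHES = 2 + 4          # the two original latches plus four added ones
BITS_PER_LATCH = 8

address_bits = LATCHES * BITS_PER_LATCH   # 48
total_bytes = 1 << address_bits           # 2^48
total_gigabytes = total_bytes >> 30       # counting 2^30 bytes per gigabyte
```

    That works out to 281474976710656 bytes, or 262144 gigabytes.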

    Mahjongg