PROJECT: Extended addressing for Extreme 512K

epmoyer · 2007-06-14 06:05

In its production release configuration, the Extreme 512K card only supports random access addressing of the lower 64K of memory. All memory above the 64K boundary must be accessed by sequentially “walking” up to the desired address using auto-increment mode. As you might expect, this makes “random” access to upper memory extremely slow.

If you want to implement a double buffered 256x192 8-bit video driver with the E512K, that would mean two pages of 48K bytes each, so one of those pages would reside (at least in part) above the 64K boundary. While the video driver was drawing the lower page, the application would be drawing in the upper page, and would (generally) require random access to the upper page in order to be remotely efficient.
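The page arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes one byte per pixel for the 8-bit mode and that the two pages are packed back to back starting at address 0, which is an illustrative layout rather than anything the card requires.

```python
# Rough check of the double-buffer layout described above.
WIDTH, HEIGHT = 256, 192
BYTES_PER_PIXEL = 1                      # assumption: 8-bit color, 1 byte per pixel
RANDOM_ACCESS_LIMIT = 64 * 1024          # production PLD: random access below 64K only

page_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL   # 49152 bytes = 48K

# Assumption: page 0 starts at 0 and page 1 follows immediately.
page1_start = page_bytes
page1_end = page1_start + page_bytes            # one past the last byte of page 1

# Part of page 1 that lies above the 64K random-access boundary.
bytes_above_limit = page1_end - RANDOM_ACCESS_LIMIT

print(page_bytes, page1_start, page1_end, bytes_above_limit)
```

With this layout, 32K of the second page sits above the boundary, so only the first 16K of it can be randomly addressed with the production PLD.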

To resolve the above issue I have developed an alternative version of the E512K PLD which supports “Extended Addressing”, allowing random access to the entire 512K.

Here is the nitty gritty:

There was not enough room in the PLD to implement the change without giving something up. Since the auto increment / decrement modes are a “one time” setting at reset anyway, and since everyone (in general) will choose auto-increment reads and auto-increment writes, the Extended Addressing version defaults to this mode. The configuration cycle is still necessary (for backward software compatibility, of a sort), but is ignored.

I strongly suspect that I can add the W0 and R0 bits (1 bit magnitudes) back in and still fit in the PLD. The sw and sr bits are the ones that take a huge amount of PLD resources to implement (i.e. supporting both count up and count down takes a lot of space).

The extended address bits (A18, A17, A16) are accessed by issuing two “Latch Upper Address” commands in a row (i.e. SRAM_C1 = 1, SRAM_C0 = 1). The first command will latch D7..D0 into A15..A8 as usual. The second command will latch D2..D0 into A18..A16. If any other command is issued after the first “Latch Upper Address” (Write, Read, or Latch Lower Address), then the internal state is reset and the next “Latch Upper Address” will function as a standard A15..A8 latch. In general, there is no reason for code to ever perform two back-to-back “Latch Upper Address” commands, so this implementation will generally be backward compatible with existing code. For example, Andre’s SRAM test app still works perfectly with the Extended Addressing PLD.
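For anyone who finds the sequence easier to follow in code, here is a small behavioral model of the latch logic in Python. It is a sketch, not the PLD source. The names `Cmd`, `ExtAddrModel` and `set_address` are made up for illustration. The model assumes that a first “Latch Upper Address” clears A18..A16 (the current behavior mentioned later in this post), that “Latch Lower Address” leaves A18..A16 alone, that the state resets after the second latch, and that the auto-increment wraps at 512K. Those last three points are guesses, not confirmed behavior.

```python
from enum import Enum

class Cmd(Enum):
    # Symbolic command names; the real SRAM_C1/SRAM_C0 encodings are not modeled.
    LATCH_LOWER = "latch_lower"
    LATCH_UPPER = "latch_upper"
    READ = "read"
    WRITE = "write"

class ExtAddrModel:
    SIZE = 512 * 1024

    def __init__(self):
        self.mem = bytearray(self.SIZE)
        self.addr = 0          # 19-bit address, A18..A0
        self.armed = False     # True right after a first Latch Upper Address

    def command(self, cmd, data=0):
        data &= 0xFF
        if cmd is Cmd.LATCH_UPPER:
            if self.armed:
                # Second latch in a row: D2..D0 -> A18..A16.
                self.addr = (self.addr & 0x0FFFF) | ((data & 0x07) << 16)
                self.armed = False   # assumption: state resets after the pair
            else:
                # First latch: D7..D0 -> A15..A8, and A18..A16 are cleared.
                self.addr = (self.addr & 0x000FF) | (data << 8)
                self.armed = True
            return None
        # Any other command resets the "two in a row" state.
        self.armed = False
        if cmd is Cmd.LATCH_LOWER:
            # Assumption: A18..A16 are preserved by a lower latch.
            self.addr = (self.addr & 0x7FF00) | data
            return None
        result = None
        if cmd is Cmd.WRITE:
            self.mem[self.addr] = data
        else:
            result = self.mem[self.addr]
        # Auto-increment after every read or write (wrap is an assumption).
        self.addr = (self.addr + 1) % self.SIZE
        return result

def set_address(model, address):
    """Hypothetical helper: 2 commands below 64K, 3 commands above."""
    model.command(Cmd.LATCH_LOWER, address & 0xFF)
    model.command(Cmd.LATCH_UPPER, (address >> 8) & 0xFF)
    if address > 0xFFFF:
        model.command(Cmd.LATCH_UPPER, (address >> 16) & 0x07)
```

Usage would look like `set_address(m, 0x12345)` followed by `m.command(Cmd.WRITE, 0xAB)`. The lower latch is issued first because a lower latch between the two upper latches would reset the state.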

I have written a modified version of Andre’s test app which includes an Extended Addressing test.

I will post the PLD code and test app tomorrow, but I wanted to sleep on the design first and I want to get approval from Andre’ to post the source.

I’m toying with the idea of also adding the ability to modify the “default” state of A18,A17,A16. Today those bits get cleared to zero whenever you write the upper address bits (A15..A8), but I am considering changing the format of the “extended address” cycle from “X,X,X,X,X,A18,A17,A16” (i.e. how I have it now) to “q,DEFA18,DEFA17,DEFA16,X,A18,A17,A16”. If bit q were set to 1, then DEFA18, DEFA17, DEFA16 would become the new default A18,A17,A16 (i.e. the default 64K page), which would get loaded whenever a “Latch Upper Address” command is issued. That would allow some speed optimizations: the cog doing random access (i.e. drawing) could set a default 64K page and use the standard 2 command method to specify the lower 16 bits for each write. The video cog(s) would have to send the whole extended address using the new 3 command method, but generally the video cog(s) will be caching a whole scan line of reads, so the 3 command hit is less painful than it would be for the random access cog.
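To make the proposed byte layout concrete, here is a sketch in Python of packing and unpacking the “q,DEFA18,DEFA17,DEFA16,X,A18,A17,A16” format, reading it MSB first (q in D7, the default page in D6..D4, the page in D2..D0). The function names `pack_ext` and `unpack_ext` are hypothetical, and this only illustrates the format under consideration, not anything that is in the PLD today.

```python
def pack_ext(page, default_page=None):
    """Build the proposed extended-address byte.

    page:         64K page for this access (A18..A16), 0..7
    default_page: if given, sets q=1 and DEFA18..DEFA16 to this value
    """
    if not 0 <= page <= 7:
        raise ValueError("page must be 0..7")
    byte = page & 0x07                      # D2..D0 = A18,A17,A16
    if default_page is not None:
        if not 0 <= default_page <= 7:
            raise ValueError("default_page must be 0..7")
        byte |= 0x80                        # D7 = q
        byte |= (default_page & 0x07) << 4  # D6..D4 = DEFA18,DEFA17,DEFA16
    return byte                             # D3 is the unused X bit, left at 0

def unpack_ext(byte):
    """Return (page, default_page); default_page is None when q is 0."""
    page = byte & 0x07
    default_page = (byte >> 4) & 0x07 if byte & 0x80 else None
    return page, default_page
```

For example, `pack_ext(2, default_page=1)` selects page 2 for the current access and asks for page 1 to become the new default.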

Post Edited (epmoyer) : 6/14/2007 6:31:59 AM GMT

potatohead · 2007-06-14 06:18

Nice!!

I like the default page idea. I suspect this will come into play for more than video, given tasks that span multiple cogs.

epmoyer · 2007-06-14 06:32

Andre gave me the nod so I've posted the current version. Included in the .zip are the PLD source code (.ABL), the PLD binary (.JED), and a modified version of the test app which performs a very rudimentary test of the current implementation of the extended addressing function. I'll add a more complete test when I've settled on the final implementation.

Jasper_M · 2007-06-14 07:15

Sounds good ^_^ the upper three bits could be used to swap display buffers... (well, in my driver the display buffers will be interleaved, but on a driver that does the drawing on VBLANK)

epmoyer · 2007-06-14 15:21

I got the W0 and R0 bits back in (no problem) but am struggling to fit the DEFA18, DEFA17, DEFA16 bits. There is plenty of room to set them and store them internally, but I am challenged for routing channels to use them for the re-initialization of A18,A17,A16. I will have to pick through the detailed logs and see what's really going on under the hood to see if there is some clever way to shoehorn them in.

AndreL · 2007-06-14 18:35

Yup, the thing is that there are basically 8-16 different ways to skin this cat; the problem is once you pick one then you have to write 50 pages of docs on it :) So I prefer the fast one and simply use lower bit depth and the 64K mode, but you can for example do things like forget about the 512K, assume it's 128K, then set the upper 16 address bits and then use a default auto increment, and at worst have to auto inc 1 time to get to the next byte. Anyway, the CPLD isn't that big, so it's a real challenge getting things to fit in it, but lots of fun as well.

As more people come up with different behaviors and APIs, I will add them to the CD that comes with it, so users can try them (if they build or buy a programming cable).

Andre'

Bob the Builder on a C64 · 2007-06-16 14:33

Potatohead, I have one comment: If you wanted to get around the slow speed of the data addressing beyond 64 KB, you could always make programs that run at 4 MHz or at very slow speeds! Think of the SNES. Although it wouldn't run fast, you'd have to program it so it would run at a decent pace despite that.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Realize that I am really a mad scientist... and

Don't forget it!

http://raydillon.com/Images/Illustration/GameArt/WildIsle/WildIsle-Ink-ScientistClose.jpg

epmoyer · 2007-06-16 14:56

It's not just about game frame rate (i.e. how many different frames of video are seen each second), it's about being able to access SRAM fast enough that the video driver can paint the screen at video display rates. A double buffered full color paging scheme means two 48K pages, which means one of the two will be above the 64K boundary. Since the video renderer and the draw engine have to work in SRAM at the same time, the video renderer cannot just rely on the sequential address increment feature of the production release E512K PLD, since the draw engine will be periodically changing the SRAM address in a random access fashion. If the render engine were rendering from memory above 64K, lost its place because the draw engine rewrote the SRAM address pointers, and had to "walk" back up to the right spot using the auto-increment read trick, it would take way too long to keep up with the necessary video output speeds.

Mike Huselton · 2009-06-04 08:29

Any flashes of brilliance? Any flashes of mediocrity?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH

RossH · 2009-06-09 23:31

@all,

I'm currently testing out epmoyer's clever mod for the HX512 for use with Catalina on the Hydra (and soon on the Hybrid as well!) - this looks like a much better solution than the one I proposed, since it maintains a high degree of compatibility with existing HX512 software.

I'll report when testing is complete.

Ross.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina

Post Edited (RossH) : 6/10/2009 2:08:16 AM GMT

RossH · 2009-06-10 23:14

@all,

Well, so far everything seems to work beautifully. Since Catalina only uses the autoincrement mode of the Xtreme, adopting this new addressing scheme means it will still be backwards compatible with the original Xtreme firmware - so I will include support for both in the next release.

If you have the original Xtreme firmware you should really only use the first 64K of the Xtreme with Catalina. While you can use the rest, once you get above 64K the memory access slows down dramatically - and gets slower the further above 64K you get.
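To put a rough number on "slower the further above 64K you get", here is a back-of-the-envelope sketch in Python. It assumes the original firmware can latch any address up to 0xFFFF directly and that each dummy read advances the address by exactly one. The function name `walk_steps` is made up, and real timings per access are not modeled.

```python
def walk_steps(target):
    """Dummy auto-increment accesses needed before `target` can be read,
    assuming we first latch 0xFFFF (the highest directly reachable address)."""
    if not 0 <= target < 512 * 1024:
        raise ValueError("target must be within the 512K address space")
    return max(0, target - 0xFFFF)

for address in (0x08000, 0x10000, 0x20000, 0x7FFFF):
    print(hex(address), walk_steps(address))
```

Under these assumptions the cost grows linearly with the distance above 64K, and the top of the 512K needs several hundred thousand accesses to reach.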

But if you install epmoyer's version of Xtreme firmware, you can now randomly access ALL the 512K of the Xtreme, which means Catalina programs can have code segments up to 512K.

Many thanks to epmoyer for this.

Ross.

Mike Huselton · 2009-07-07 03:15

RossH, any updates on the progress of your development and testing? Would you consider reprogramming my Extreme 512K for a reasonable fee? I don't want to build the cable and go through the whole experience of mistrusting my work. You know the drill: is it the cable or my ignorance of using it? I would rather take advantage of your experience of the whole process and be done with it. I would consider $30 + shipping to be fair if it takes 15-20 minutes of your time, $50 if it takes more.

Epmoyer, would you be willing to take this on? I can do PayPal.

Post Edited (James Michael Huselton) : 7/7/2009 3:23:25 AM GMT

RossH · 2009-07-07 07:50

Hi JMH,

I have run my Xtreme on both the Hybrid and the Hydra and it seems fine. I haven't done any rigorous testing to make sure I can access the full 512K - I've been busy on other things (and in any case it didn't seem very urgent since I can always revert to the original firmware if there's a problem). But I'll have a go at doing some more testing in the next couple of days and report back.

I'd happily reprogram your card for nothing (all care but no responsibility!) if you want to pay the postage both ways. But you're in the US - surely there MUST be someone closer who has a suitable programming cable?

Also, I believe in the US you can buy a real cable for about $45 (look for the RS-232 version, not the USB one) - which is probably cheaper than the return postage to Australia in any case. I only resorted to making one because the last time I looked the only place that stocked them here wanted nearly $500 for one lousy cable!

Ross.
