PROJECT: Extended addressing for Extreme 512K
epmoyer
In its production release configuration, the Extreme 512K card only supports random access addressing of the lower 64K of memory. All memory above the 64K boundary must be accessed by sequentially “walking” up to the desired address using auto-increment mode. As you might expect, this makes “random” access to upper memory extremely slow.
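One rough way to see the difference is to count the bus commands needed just to position the address counter before an access. The sketch below is a hypothetical cost model in C; the function names are invented for illustration. It assumes that the production PLD reaches an address above 64K by latching 0xFFFF (the highest randomly addressable location) and then issuing one dummy auto-increment access per byte. It counts commands and does not measure real timing.

```c
#include <assert.h>
#include <stdint.h>

#define XSRAM_TOP 0x7FFFFu   /* highest address in 512K (A18..A0) */

/* Production PLD (assumed strategy): random access only below 64K; above it,
   latch 0xFFFF and then walk up one auto-increment access per byte. */
static uint32_t seek_cost_production(uint32_t addr)
{
    assert(addr <= XSRAM_TOP);
    if (addr <= 0xFFFFu)
        return 2u;                    /* Latch Upper + Latch Lower */
    return 2u + (addr - 0xFFFFu);     /* plus the sequential walk above 64K */
}

/* Extended Addressing PLD: 2 commands below 64K, 3 anywhere above it. */
static uint32_t seek_cost_extended(uint32_t addr)
{
    assert(addr <= XSRAM_TOP);
    return (addr <= 0xFFFFu) ? 2u : 3u;
}
```

Under these assumptions, reaching the top byte (0x7FFFF) costs 458,754 commands on the production PLD versus 3 with Extended Addressing, and the production cost grows linearly with the distance above 64K.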
If you want to implement a double-buffered 256x192 8-bit video driver with the E512K, you need two pages of 48K bytes each, which means at least part of one page must reside above the 64K boundary. While the video driver is displaying the lower page, the application is drawing in the upper page, and it generally needs random access to the upper page in order to be remotely efficient.
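The arithmetic behind the 48K figure can be checked with a short sketch. It assumes, hypothetically, that page 0 starts at SRAM address 0 and page 1 immediately follows it; `page_base` and `pixel_addr` are illustrative names, not part of any existing driver.

```c
#include <assert.h>
#include <stdint.h>

/* Frame geometry from the post: 256 x 192 pixels, 1 byte per pixel. */
#define FB_WIDTH   256u
#define FB_HEIGHT  192u
#define FB_BYTES   (FB_WIDTH * FB_HEIGHT)   /* 49152 bytes = 48K per page */

/* Hypothetical layout: page 0 at address 0, page 1 directly after it. */
static uint32_t page_base(unsigned page)
{
    assert(page < 2u);
    return page * FB_BYTES;
}

/* SRAM address of pixel (x, y) in the given page. */
static uint32_t pixel_addr(unsigned page, unsigned x, unsigned y)
{
    assert(x < FB_WIDTH && y < FB_HEIGHT);
    return page_base(page) + y * FB_WIDTH + x;
}
```

With this layout page 1 occupies 0xC000..0x17FFF, so rows 64 through 191 of it (two thirds of the page) lie above the 64K boundary. Placing the pages elsewhere moves the split but cannot avoid it, since 2 x 48K exceeds 64K.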
To resolve the above issue I have developed an alternative version of the E512K PLD which supports “Extended Addressing”, allowing random access to the entire 512K.
Here is the nitty gritty:
There was not enough room in the PLD to implement the change without giving something up. Since the auto-increment/decrement modes are a one-time setting at reset anyway, and since nearly everyone will choose auto-increment reads and auto-increment writes, the Extended Addressing version always uses this mode. The configuration cycle is still necessary (for backward software compatibility, of a sort), but it is ignored.
I strongly suspect that I can add the W0 and R0 bits (1-bit magnitudes) back in and still fit in the PLD. The sw and sr bits are the ones that take a huge amount of PLD resources to implement (i.e. supporting both count up and count down takes a lot of space).
The extended address bits (A18, A17, A16) are accessed by issuing two “Latch Upper Address” commands in a row (i.e. SRAM_C1 = 1, SRAM_C0 = 1). The first command latches D7..D0 into A15..A8 as usual. The second command latches D2..D0 into A18..A16. If any other command (Write, Read, or Latch Lower Address) is issued after the first “Latch Upper Address”, the internal state is reset and the next “Latch Upper Address” functions as a standard A15..A8 latch. In general there is no reason for code to ever perform two back-to-back “Latch Upper Address” commands, so this implementation should be backward compatible with existing code. For example, Andre’s SRAM test app still works perfectly with the Extended Addressing PLD.
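The latch sequence can be modeled as a small state machine. This is a behavioral sketch in C, not the PLD source. The names `xcmd_t`, `xsram_cmd` and `xsram_seek` are hypothetical, the enum values are not the hardware SRAM_C1/SRAM_C0 encodings, and it assumes that a read or write auto-increments the full 19-bit counter and that a third back-to-back Latch Upper Address acts as a standard A15..A8 latch again (the post does not say).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command names; values are NOT the SRAM_C1/SRAM_C0 encodings. */
typedef enum { CMD_LATCH_LOWER, CMD_LATCH_UPPER, CMD_READ, CMD_WRITE } xcmd_t;

typedef struct {
    uint32_t addr;        /* 19-bit address counter, A18..A0 */
    bool     upper_seen;  /* last command was a standard Latch Upper Address */
} xsram_t;

/* Apply one bus command to the modeled address logic. */
static void xsram_cmd(xsram_t *s, xcmd_t cmd, uint8_t data)
{
    switch (cmd) {
    case CMD_LATCH_UPPER:
        if (s->upper_seen) {
            /* Second Latch Upper in a row: D2..D0 -> A18..A16. */
            s->addr = (s->addr & 0x0FFFFu) | ((uint32_t)(data & 0x07u) << 16);
            s->upper_seen = false;   /* assumed: a third one is standard again */
        } else {
            /* Standard latch: D7..D0 -> A15..A8, A18..A16 cleared to zero. */
            s->addr = (s->addr & 0x000FFu) | ((uint32_t)data << 8);
            s->upper_seen = true;
        }
        break;
    case CMD_LATCH_LOWER:
        s->addr = (s->addr & 0x7FF00u) | data;   /* D7..D0 -> A7..A0 */
        s->upper_seen = false;
        break;
    case CMD_READ:
    case CMD_WRITE:
        /* Access at addr (data transfer not modeled), then auto-increment. */
        s->addr = (s->addr + 1u) & 0x7FFFFu;
        s->upper_seen = false;
        break;
    }
}

/* Three-command random access to any address in 512K. Assumes the previous
   command was not a lone Latch Upper Address. */
static void xsram_seek(xsram_t *s, uint32_t addr)
{
    assert(addr <= 0x7FFFFu);
    xsram_cmd(s, CMD_LATCH_UPPER, (uint8_t)(addr >> 8));   /* A15..A8 */
    xsram_cmd(s, CMD_LATCH_UPPER, (uint8_t)(addr >> 16));  /* A18..A16 */
    xsram_cmd(s, CMD_LATCH_LOWER, (uint8_t)addr);          /* A7..A0 */
}
```

The ordinary two-command sequence (Latch Upper, then Latch Lower) still lands below 64K because the standard upper latch clears A18..A16, which is why existing software keeps working unchanged.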
I have written a modified version of Andre’s test app which includes an Extended Addressing test.
I will post the PLD code and test app tomorrow; I want to sleep on the design first, and I want to get approval from Andre to post the source.
I’m toying with the idea of also adding the ability to modify the “default” state of A18, A17, A16. Today those bits are cleared to zero whenever you write the upper address bits (A15..A8), but I am considering changing the format of the “extended address” cycle from “X,X,X,X,X,A18,A17,A16” (i.e. how I have it now) to “q,DEFA18,DEFA17,DEFA16,X,A18,A17,A16”. If bit q were set to 1, then DEFA18, DEFA17, DEFA16 would become the new default A18, A17, A16 (i.e. the default 64K page), which would be loaded whenever a “Latch Upper Address” command is issued. That would allow some speed optimizations: the cog doing random access (i.e. drawing) could set a default 64K page and then use the standard 2-command method to specify the lower 16 bits for each write. The video cog(s) would have to send the whole extended address using the new 3-command method, but they will generally be caching a whole scan line of reads, so the 3-command hit is less painful for them than it would be for the random access cog.
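The proposed default-page format can be sketched the same way. This models only the idea floated above (q in bit 7, DEFA18..DEFA16 in bits 6..4, bit 3 unused, A18..A16 in bits 2..0); the format was not final, and `ext_byte`, `latch_upper` and `latch_lower` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Proposed (not final) second-cycle byte: q | DEFA18..16 | X | A18..16. */
static uint8_t ext_byte(bool set_default, unsigned def_page, unsigned page)
{
    assert(def_page < 8u && page < 8u);
    return (uint8_t)((set_default ? 0x80u : 0u) | (def_page << 4) | page);
}

typedef struct {
    uint32_t addr;        /* 19-bit address counter, A18..A0 */
    uint8_t  def_page;    /* default A18..A16 loaded by a standard upper latch */
    bool     upper_seen;  /* last command was a standard Latch Upper Address */
} xsram_def_t;

static void latch_upper(xsram_def_t *s, uint8_t data)
{
    if (s->upper_seen) {
        if (data & 0x80u)                          /* q = 1: new default page */
            s->def_page = (uint8_t)((data >> 4) & 0x07u);
        s->addr = (s->addr & 0x0FFFFu) | ((uint32_t)(data & 0x07u) << 16);
        s->upper_seen = false;
    } else {
        /* Standard latch: A18..A16 come from the default page, not zero. */
        s->addr = ((uint32_t)s->def_page << 16) | ((uint32_t)data << 8)
                | (s->addr & 0xFFu);
        s->upper_seen = true;
    }
}

static void latch_lower(xsram_def_t *s, uint8_t data)
{
    s->addr = (s->addr & 0x7FF00u) | data;         /* D7..D0 -> A7..A0 */
    s->upper_seen = false;
}
```

Because the default page lives in the PLD, it is shared by every cog: once the drawing cog sets it, each standard two-command latch lands in that page, while the video cog(s) stay correct by always sending the full three-command address with q = 0, whose second byte overwrites A18..A16 explicitly.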
Post Edited (epmoyer) : 6/14/2007 6:31:59 AM GMT
Comments
I like the default page idea. I suspect this will come into play for more than video, given tasks that span multiple cogs.
As more people come up with different behaviors and APIs, I will add them to the CD that comes with it, so users can try them (if they build or buy a programming cable).
Andre'
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Realize that I am really a mad scientist and
Don't forget it!
http://raydillon.com/Images/Illustration/GameArt/WildIsle/WildIsle-Ink-ScientistClose.jpg
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH
I'm currently testing out epmoyer's clever mod for the HX512 for use with Catalina on the Hydra (and soon on the Hybrid as well!) - this looks like a much better solution than the one I proposed, since it maintains a high degree of compatibility with existing HX512 software.
I'll report when testing is complete.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
Post Edited (RossH) : 6/10/2009 2:08:16 AM GMT
Well, so far everything seems to work beautifully. Since Catalina only uses the auto-increment mode of the Xtreme, adopting this new addressing scheme still leaves it backward compatible with the original Xtreme firmware, so I will include support for both in the next release.
If you have the original Xtreme firmware you should really only use the first 64K of the Xtreme with Catalina. While you can use the rest, once you get above 64K the memory access slows down dramatically - and gets slower the further above 64K you get.
But if you install epmoyer's version of Xtreme firmware, you can now randomly access ALL the 512K of the Xtreme, which means Catalina programs can have code segments up to 512K.
Many thanks to epmoyer for this.
Ross.
Epmoyer, would you be willing to take this on? I can do PayPal.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH
Post Edited (James Michael Huselton) : 7/7/2009 3:23:25 AM GMT
I have run my Xtreme on both the Hybrid and the Hydra and it seems fine. I haven't done any rigorous testing to make sure I can access the full 512K - I've been busy on other things (and in any case it didn't seem very urgent since I can always revert to the original firmware if there's a problem). But I'll have a go at doing some more testing in the next couple of days and report back.
I'd happily reprogram your card for nothing (all care but no responsibility!) if you want to pay the postage both ways. But you're in the US - surely there MUST be someone closer who has a suitable programming cable?
Also, I believe in the US you can buy a real cable for about $45 (look for the RS-232 version, not the USB one) - which is probably cheaper than the return postage to Australia in any case. I only resorted to making one because the last time I looked the only place that stocked them here wanted nearly $500 for one lousy cable!
Ross.