P2 HyperRAM External Memory Driver
* supports Parallax HyperRAM/HyperFlash P2-EVAL expansion board
* is also intended to support other HyperRAM implementations with the P2
* supports multiple chip selected devices on a shared Hyper data bus
* configurable control pin allocation & reset for multiple devices
* data transfers made at sysclk/1 or sysclk/2 byte rates (sysclk/2 can be enforced)
* transfer rate and read delay configured by operating frequency range
* multiple COGs can all share the external memory together
* nominated COGs can have priority access to memory, eg. video & audio COG drivers
* all other non-priority COGs are round-robin polled for fairness of requests
* COGs not needing to access this memory can be excluded to reduce polling overhead
* COG polling loop code is constructed automatically to optimize performance
* selectable byte/word/long or larger burst transfers to/from external memory
* longer round-robin COG bursts automatically divided up to guarantee stable video operation
* a simple mailbox interface is used for COGs requesting external memory service
* serviced COGs can be notified of results via the mailbox and optional COGATN
* supports up to 15 (14?) devices on the same bus
* banks can be arranged in 16MB memory blocks or higher
* flexible bank to device mapping allowing variable device sizes, eg. 32MB flash
* inbuilt support for reporting errors, eg. device busy / locked or invalid bank etc
* support a list of multiple requests in single mailbox access
(this may be very handy for supporting multiple audio input channels, eg wavetable synthesis)
* dynamic addition of other RR COGs to the polling list being serviced after driver startup
* dynamic control of burst size for more bandwidth fairness amongst RR COGs?
* enabling larger transfer bursts when video is known to be inactive
* support for other similar memory devices like octaRAM?
* multiple HyperRAM driver instances could share a common SW interface and select driver+bank based on address?
* locks for RR COGs if/when their bursts are divided?
* wider memory / parallel HyperRAM?
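The polling policy described in the list above (priority COGs first, round-robin for the rest, excluded COGs never polled, long round-robin bursts divided up) can be modelled roughly as below. This is only a behavioural sketch in Python, not the driver's PASM2 polling loop; `service_order`, its parameters and the static "snapshot" of pending requests are all hypothetical names and simplifications chosen for illustration.

```python
from collections import deque

def service_order(requests, priority_cogs, rr_cogs, max_rr_burst):
    """Return the (cog, nbytes) fragments in the order they would be serviced.

    requests      : dict of cog id -> bytes requested (a static snapshot)
    priority_cogs : cogs checked first on every pass, in priority order
    rr_cogs       : remaining polled cogs, serviced round-robin
    max_rr_burst  : largest fragment a round-robin cog gets in one turn
    Cogs in neither list are never polled, so their requests are ignored.
    """
    pending = dict(requests)
    rr = deque(rr_cogs)
    polled = list(priority_cogs) + list(rr_cogs)
    order = []
    while any(pending.get(c, 0) > 0 for c in polled):
        # Priority cogs are looked at first and get their whole burst.
        served = False
        for cog in priority_cogs:
            if pending.get(cog, 0) > 0:
                order.append((cog, pending[cog]))
                pending[cog] = 0
                served = True
                break
        if served:
            continue
        # Otherwise take the next round-robin cog that has work pending,
        # limiting it to one fragment before moving on.
        for _ in range(len(rr)):
            cog = rr[0]
            rr.rotate(-1)
            if pending.get(cog, 0) > 0:
                n = min(pending[cog], max_rr_burst)
                order.append((cog, n))
                pending[cog] -= n
                break
    return order

# Cog 0 is a priority (e.g. video) cog, cogs 1 and 2 are round-robin,
# cog 5 is excluded from polling.
print(service_order({0: 64, 1: 100, 2: 30, 5: 8}, [0], [1, 2], 40))
```

Because each round-robin turn is capped at `max_rr_burst`, a priority COG never waits longer than one fragment, which is the point of dividing long bursts.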
                fltl    rwdspin                     'prepare pins for next memory access
                wrpin   #0, clkpin                  'run clock in GPIO mode
p5              setbyte dira+PINX, #$FF, #BYTEX     'setup data bus as output
                getbyte pb, addrhi, #3              'fetch the top byte (byte 3) of addrhi
p6              setbyte outa+PINX, pb, #BYTEX       'drive that byte onto the data bus
rogloh wrote: »
Yes I think once HyperRAM devices that use variable latency are found and utilised it could be revisited with optimisations. ..
Octal RAM CR[7:4]
0000          3 clocks                      6 clocks     83 MHz
0001          4 clocks                      8 clocks     100 MHz
0010          5 clocks (default at 3V)      10 clocks    133 MHz
0011          6 clocks                      12 clocks    150 MHz
0100          7 clocks                      14 clocks    NA
0101          8 clocks (default at 1.8V)    16 clocks    166/200 MHz (2)
0110 - 1111   Reserved                      -            NA
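As a rough illustration of how a driver might choose a latency setting from this table, the Python sketch below picks the lowest latency code whose listed frequency covers the operating clock. It assumes the columns read as latency code, latency in clocks, doubled latency, and maximum clock frequency. Code 0100 is left out because its frequency is listed as NA, and 166 MHz is used for code 0101 as the conservative reading of "166/200". `OCTAL_LATENCY` and `pick_latency_code` are hypothetical names.

```python
# (CR[7:4] code, latency clocks, assumed max clock in MHz), from the table.
OCTAL_LATENCY = [
    (0b0000, 3, 83),
    (0b0001, 4, 100),
    (0b0010, 5, 133),
    (0b0011, 6, 150),
    (0b0101, 8, 166),   # listed as 166/200 MHz; 166 taken as an assumption
]

def pick_latency_code(clock_mhz):
    """Return (code, clocks) for the lowest latency valid at clock_mhz."""
    for code, clocks, max_mhz in OCTAL_LATENCY:
        if clock_mhz <= max_mhz:
            return code, clocks
    raise ValueError(f"no latency code listed for {clock_mhz} MHz")

code, clocks = pick_latency_code(125)
print(f"{code:04b} -> {clocks} clocks")
```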
HyperRAM CR7-4 Initial Latency
0000 - 5 Clock Latency
0001 - 6 Clock Latency (default)
0010 - Reserved
0011 - Reserved
0100 - Reserved
1101 - Reserved
1110 - 3 Clock Latency
1111 - 4 Clock Latency
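For comparison, the HyperRAM encoding above is not monotonic (the two shortest latencies sit at the top of the code range), so a lookup table is the simplest way to handle it. The Python sketch below decodes and sets bits 7:4 of a 16-bit configuration register value. The function names are hypothetical, and the register value used in the example is arbitrary.

```python
# CR[7:4] code -> initial latency in clocks, from the list above.
# Codes not present here are the reserved ones.
HYPERRAM_INITIAL_LATENCY = {
    0b0000: 5,
    0b0001: 6,   # default
    0b1110: 3,
    0b1111: 4,
}

def initial_latency_clocks(cr):
    """Return the latency in clocks encoded in cr[7:4], or None if reserved."""
    return HYPERRAM_INITIAL_LATENCY.get((cr >> 4) & 0xF)

def with_initial_latency(cr, clocks):
    """Return cr with bits 7:4 replaced by the code for `clocks`."""
    for code, n in HYPERRAM_INITIAL_LATENCY.items():
        if n == clocks:
            return (cr & ~0xF0) | (code << 4)
    raise ValueError(f"no code for a latency of {clocks} clocks")

cr = 0x8F1F                                   # arbitrary example value
print(initial_latency_clocks(cr))             # code 0001
print(hex(with_initial_latency(cr, 3)))       # code 1110 written into bits 7:4
```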
rogloh wrote: »
quickly fill in rectangles, ellipses or triangles (3D?) as well.
Wuerfel_21 wrote: »
I wonder if the technique is viable on P2... (with somewhat simpler geometry and no 1/Z buffering (and thus necessarily much simpler moving entity geometry)). In theory 4 P2 cogs at 250+ MHz should(tm) have integer power comparable to a Pentium's combined integer+float power, the performance is there, but memory is tight.
rogloh wrote: »
I would tend to agree about 15-16 devices not being practical/realistic. However the Parallax board already has 2 chips fitted on the one bus, and once you mix two devices then supporting more than two is not a great deal harder in software, it's mainly extra space. A full nibble is allocated to the device/bank selection which can be apportioned between either device address or bank, so you could have two banks of 128MB each or 15 banks of 16MB devices for example. It's flexible.
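A small Python sketch of how one nibble could be shared between device and bank selection, as described in the post above. It assumes the nibble sits directly above a 24-bit (16MB) bank offset, i.e. address bits 27:24, and that devices are laid out in consecutive banks. The bit positions, the `build_bank_map` and `decode` helpers, and the device names are illustrative assumptions, not the driver's actual data structures.

```python
BANK_SHIFT = 24                  # 16 MB banks -> 24 offset bits (assumed)
BANK_SIZE = 1 << BANK_SHIFT
NUM_BANKS = 16                   # one nibble of bank/device select

def build_bank_map(devices):
    """devices: list of (name, size_bytes), placed in consecutive banks.

    Returns a 16-entry list; each entry is None (unmapped bank) or
    (device name, byte offset of that bank within the device).
    """
    bank_map = [None] * NUM_BANKS
    bank = 0
    for name, size in devices:
        nbanks = -(-size // BANK_SIZE)        # round up to whole banks
        if bank + nbanks > NUM_BANKS:
            raise ValueError("devices do not fit in 16 banks")
        for i in range(nbanks):
            bank_map[bank + i] = (name, i * BANK_SIZE)
        bank += nbanks
    return bank_map

def decode(addr, bank_map):
    """Split an address into (device name, byte offset within the device)."""
    entry = bank_map[(addr >> BANK_SHIFT) & 0xF]
    if entry is None:
        raise LookupError("invalid bank")
    name, base = entry
    return name, base + (addr & (BANK_SIZE - 1))

# One 16MB RAM followed by a 32MB flash that spans two banks.
bank_map = build_bank_map([("ram", 16 << 20), ("flash", 32 << 20)])
print(decode(0x0012_3456, bank_map))
print(decode(0x0200_0010, bank_map))
```

With two 128MB devices the same table simply assigns eight banks to each, so the top bit of the nibble ends up acting as the device select.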
rogloh wrote: »
If someone wanted to split the busses for the different devices they can then do multiple transfers simultaneously to increase overall bandwidth albeit with more memory COG drivers.
I was wondering about direct inter-chip transfers too yesterday, thinking you might be able to drive a different address out to each device if you could suspend clocks at the right time in the latency period. Maybe something is possible where one chip is reading and the other writing, sharing a common clock phase over both clock outputs at the same time and using a common data bus. It could be tricky to get working, though not impossible with suitable clock control. It could double the transfer rate and completely avoid using the hub memory as the intermediate buffer.