Memory drivers for P2 - PSRAM/SRAM/HyperRAM (was HyperRAM driver for P2) - Page 14 — Parallax Forums


Comments

  • Cluso99 Posts: 18,066
    edited 2020-05-20 06:31
    rogloh wrote: »
    I was able to compile my new HyperRAM driver codebase in PNut v34s running on VirtualBox. Still not tested, just compiling without errors.
    The size difference vs Fastspin is interesting. Looks like the SPIN2 driver object is currently about 8kB including the 3600 byte PASM code. This probably compares to just over 13kB in Fastspin with optimisation enabled. Though the Fastspin version should still be somewhat faster to run of course. By how much, I'm keen to find out at some point.

    driver.png

    I needed to change a few things before it compiled and this is what I learned (I'm sure it has been discussed before, but this is the first time I've ever run PNut so I'm learning the hard way when porting the driver code to be hopefully runnable using both environments):

    - PNUT needs that return parameter to compile without errors if you want to return something, Fastspin doesn't need it.
    PUB getHyperDriver() : r
        return @hyper_driver
    vs
    PUB getHyperDriver()  ' Fastspin allows this syntax and can still return a value
        return @hyper_driver
    

    - PNUT needs cogid to be returned via function cogid() while Fastspin allows just cogid to be used

    - Fastspin allows # but PNut now always needs a dot. Eg:
    driver#REQ_READBYTE  ' Fastspin allows this
    vs
    driver.REQ_READBYTE
    

    - There is no cognew function in PNut to spawn PASM COGs; you need to use coginit with 16 as the first argument to start a new COG.
    driverCog := cognew(addr, @params)
    vs
    driverCog := coginit(16, addr, @params)
    

    - SPIN2 method parameters can't use the same name as labels do in the PASM2 code in PNut.

    - PNut requires any no-argument SPIN2 methods to be defined and called with ()

    - Finally there was a problem with the character order of the greater-than-or-equal operator.
    PNut needs this:
    repeat until long[m] >= 0
    
    while (perhaps an older) Fastspin needed this to work correctly:
    repeat until long[m] => 0
    
    Hopefully a newer Fastspin should fix this.
    Update: looks like Fastspin 4.1.9 is doing what I want now and can use the Pnut syntax...this should work according to the listing output.
    00730 | ' repeat until long[m] >= 0
    00730 | LR__0001
    00730 81 CC 01 FB | rdlong dump_tmp001_, _dump_m
    00734 00 CC 5D F2 | cmps dump_tmp001_, #0 wcz
    00738 F4 FF 9F CD | if_b jmp #LR__0001
    Do yourself a big favor and get the latest fastspin release :)

    Yes, found all those. :wink:

    BTW the latest fastspin uses >= and <= too.
    pnut Spin2 leaves cog $000-$131 free and PR0-PR7 is usable too (? $1D8-$1DF)
    fastspin leaves cog $000-$01F free only.
  • Yeah I downloaded 4.1.9 yesterday. The only problem is that it changes faster than I can keep up with! LOL.

    Eric is a very productive guy.
  • rogloh Posts: 5,122
    edited 2020-05-21 12:25
    Finally had a bit of success today after a lot of stupid little problems I ran into with my new SPIN2 HyperRAM driver interface. It's been one of those days and I shouldn't be staying up so late and waking up so early, I guess. But in any case I have been able to read/write to the HyperRAM again using this whole new interface. A lot of new code needed to execute correctly for this to work, including a small change to the PASM driver to initialise things, which I wasn't expecting to cause as many issues. Still needs a lot more testing but it shows life now. :smile:

    Software like this is hard if you leave it alone for too long and need to relearn it each time you come back to it.
  • rogloh Posts: 5,122
    edited 2020-05-24 04:31
    evanh wrote: »
    With low and high temperature measurements:

    Frequency bands for HR Read Data using Eval Board and Hyper Accessory (data pins P16-P23, clock pin P24):
    1-96 MHz, 112-193 MHz, 232-288 MHz: All registered pins, no capacitor.
    1-87 MHz, 107-174 MHz, 217-266 MHz: All registered pins, 22 pF capacitor on clock.
    1-92 MHz, 113-183 MHz, 226-279 MHz: -5 °C, All registered pins, 22 pF capacitor on clock.
    1-82 MHz, 101-165 MHz, 208-253 MHz: 55 °C, All registered pins, 22 pF capacitor on clock.

    1-94 MHz, 107-188 MHz, 221-277 MHz: Registered data pins, no capacitor.
    1-84 MHz, 103-168 MHz, 208-256 MHz: Registered data pins, 22 pF capacitor on clock.
    1-88 MHz, 108-177 MHz, 216-268 MHz: -5 °C, Registered data pins, 22 pF capacitor on clock.
    1-79 MHz, 97-159 MHz, 199-243 MHz: 55 °C, Registered data pins, 22 pF capacitor on clock.

    1-80 MHz, 90-162 MHz, 180-241 MHz, 276-317 MHz: Registered clock pin, no capacitor.
    1-73 MHz, 86-145 MHz, 171-221 MHz, 259-295 MHz: Registered clock pin, 22 pF capacitor on clock.
    1-77 MHz, 91-153 MHz, 183-233 MHz, 273-309 MHz: -5 °C, Registered clock pin, 22 pF capacitor on clock.
    1-68 MHz, 81-137 MHz, 161-208 MHz, 250-281 MHz: 55 °C, Registered clock pin, 22 pF capacitor on clock.

    1-78 MHz, 87-156 MHz, 173-233 MHz, 264-308 MHz: All unregistered pins, no capacitor.
    1-71 MHz, 83-140 MHz, 165-214 MHz, 249-286 MHz: All unregistered pins, 22 pF capacitor on clock.
    1-75 MHz, 88-147 MHz, 175-225 MHz, 260-299 MHz: -5 °C, All unregistered pins, 22 pF capacitor on clock.
    1-66 MHz, 78-132 MHz, 154-200 MHz, 238-272 MHz: 55 °C, All unregistered pins, 22 pF capacitor on clock.
    Looking at this again @evanh, I spent some more time on this today and added the ability to enable registered clock outputs for the Hyper bus transfers. It is just a flag passed at driver startup time and is not further modifiable at runtime at this point, although it could potentially be later to help dynamically switch between operating intervals. The registered data pin setting (used just for reads) can in theory be changed at run time, even per device on the bus in case there are slight path timing differences. I'm hoping that this will be sufficient for anyone wishing to deal with the temperature variation.

    In my driver I have just set up some frequency intervals in the CON section and they can easily be altered if a suitable timing profile is known in advance. The table I am using is this (for sysclk/2 read transfer rates):
        DELAY1 = 5            ' value used at/below FREQ1
        FREQ1  = 90_000_000   ' in Hz
        DELAY2 = 6            ' value used between FREQ1 and FREQ2
        FREQ2  = 120_000_000
        DELAY3 = 7
        FREQ3  = 180_000_000
        DELAY4 = 8
        FREQ4  = 225_000_000
        DELAY5 = 9
        FREQ5  = 270_000_000
        DELAY6 = 10          ' value used above FREQ5
    
    
    The effect of these DELAYx values is to do this (the actual bits of DELAYx get split further):

    P2 frequency <90MHz : use registered data pins, delay between WYPIX clock output and streamer XINIT read = 6 clocks
    90-120MHz : use unregistered pins, delay = 7 clocks
    120-180MHz : use registered data pins, delay = 7 clocks
    180-225MHz : use unregistered pins, delay = 8 clocks
    225-270MHz : use registered data pins, delay = 8 clocks
    P2 frequency >270MHz : use unregistered data pins, delay = 9 clocks

    Also if reading at sysclk/1 the above clock delays need to be reduced by 1 because the phase differs by one clock cycle with respect to the streamer.
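
A minimal sketch of how the DELAYx table and the listed effects fit together, modelled in Python rather than Spin2 so it can be checked quickly. The decoding rule here (LSB set means registered data pins, clocks = upper bits + 4) is an inference that happens to reproduce all six rows above; it is not taken from the driver source, and the function names are made up for illustration.

```python
# Frequency limits and DELAYx values copied from the CON table above.
DELAY_TABLE = [
    (90_000_000, 5),    # DELAY1, used at/below FREQ1
    (120_000_000, 6),   # DELAY2, used between FREQ1 and FREQ2
    (180_000_000, 7),   # DELAY3
    (225_000_000, 8),   # DELAY4
    (270_000_000, 9),   # DELAY5
]
DELAY_ABOVE = 10        # DELAY6, used above FREQ5


def select_delay(freq_hz):
    """Pick the DELAYx value for a given P2 clock frequency in Hz."""
    for limit, delay in DELAY_TABLE:
        if freq_hz <= limit:
            return delay
    return DELAY_ABOVE


def decode_delay(delay, sysclk1=False):
    """Split a DELAYx value into (registered_data_pins, clock_delay).

    ASSUMPTION: LSB = registered flag, remaining bits + 4 = clocks.
    This is inferred from the two lists in the post, nothing more.
    """
    registered = bool(delay & 1)
    clocks = (delay >> 1) + 4
    if sysclk1:
        clocks -= 1     # sysclk/1 reads need one clock less, per the post
    return registered, clocks


if __name__ == "__main__":
    for mhz in (80, 100, 150, 200, 250, 300):
        d = select_delay(mhz * 1_000_000)
        print(mhz, "MHz ->", d, decode_delay(d))
```

Because the LSB only toggles the pin registering, each step of 1 in DELAYx behaves like a half step of real delay.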

    For the current writes and the address phase for both reads/writes I always leave the data pins registered to keep the timing constant there. For future sysclk/1 writes I might need to review that choice.

    What worries me slightly is that boards without the suggested capacitor vs with the capacitor will have different optimal operational frequency ranges. There is no way the driver can fully know which to apply in advance which means any distributed applications/demos etc using the HyperRAM can perform quite differently in different systems depending on them having that capacitor fitted or not, even at the same temp and P2 clock speed. The addition of the capacitor for supporting future sysclk/1 writes reduces the operational range and interval overlaps slightly too, unfortunately around the 297MHz rate which is used for HDTV timing, which is not ideal. Perhaps with proper 3V rated v2 HyperRAM this may not be such an issue.
  • evanh Posts: 15,126
    I'm thinking that using the capacitor should be reserved for special board layout with prop2 and a dedicated HR side by side. When those boards come available the read frequency bands will be entirely different anyway.

    So there'll be sysclock/2 read/write reliable on all boards, and sysclock/2 writes reliable with sysclock/1 reads somewhat reliable on most boards. And fully sysclock/1 "mostly" reliable on the special boards. Mostly because there is still going to be read frequency bands, but hopefully broad enough to cater for general use without issue.

  • Yes that is a reasonable way to consider it evanh. It is pretty much impossible for this driver to cater for every case in advance automatically, but at least it will have some flexibility to be tweaked so people can attempt to tune it for their situation if they wish to operate in frequency bands causing problems.

    This information might be important for @"Peter Jakacki", particularly if he plans to fit a capacitor on his upcoming P2PAL HyperRAM. Perhaps it might be sufficient just to have the footprint for one on the PCB and people could solder it on later if they want to experiment with sysclk/1 writes.
  • rogloh Posts: 5,122
    edited 2020-05-28 03:56
    Made some decent progress on wrapping up the SPIN2/Fastspin API code for my HyperRAM driver in the last couple of days as I've been in the right frame of mind to get it done. Now I'm testing and documenting it.

    I think it is quite a bit easier to use now as I have simplified some APIs and only do the bus creation internally within the driver once its first device is mapped. You can now use it as easily as one or two lines like this...
    OBJ mem : "memorydriver"
    
    ' a minimalist setup...
    PUB simpleStart()
        ' map and init HyperRAM at address 0-$ffffff, HyperFlash at $2000000-$3ffffff
        ' base module P2 pin number is 32
        ' all COGs round-robin serviced
        ' maximum burst limited only by the device's !CS limit or $ffff (whichever is lower)
        mem.initHyperDriver(32, 0, $2000000, 0)
    
        ' read byte from address $aaaa of HyperFlash
        mem.readByte($200aaaa) 
    
        ' write long $abcdef12 to address $bcd0 of HyperRAM
        mem.writeLong($bcd0, $abcdef12)
    
    ' a more complex setup and config...
    PUB customStart() | bus
        ' map 16MB HyperRAM only to the $80000000-$80FFFFFF range, 
        ' transfer burst is automatically limited to fit 4uS
        bus := mem.mapHyperRam($80000000, S_16MB, 32, 32+12, 32+8, 32+10, 32+15, 0) 
    
        ' setup all COGs to use round-robin polling with a 256 byte burst limit
        mem.setupCogParams(ALLCOGS, bus, 256, 0) 
    
        ' then make this COG the highest priority COG (priority 7) and don't yield during transfer requests
        mem.setupCogParams(1<<cogid(), bus, -1, F_LOCKED + F_PRIORITY + 7)
    
        ' start the driver and also enable faster sysclk/1 reads
        mem.start(bus, F_FASTREAD)
    
        ' start some video driver on this COG, pass it the mailbox address for this COG and HyperRAM address
        startVideoCog(cogid(), mem.getMailboxAddr(bus, cogid()), $80000000)
    

    Here's the latest API I have now and it shouldn't need to change much now I hope, maybe some minor name tweaking. In each description using it, "r" represents the returned result/error.

    There also may be some scope in the future to map other memory types such as SPI flash using a similar API so the software infrastructure could remain common. Eg. there could be a mapSpiFlash(flashStartAddr, size, miso, mosi, cspin, clkpin) API added etc which could map elsewhere into the common 4GB external memory address space. Some extra overhead in the outer Read/Write functions is required for enabling this probably using method pointers, but the software flexibility gains could be rather good allowing data to be sourced from different devices with the same API. TBD..
    'P2-EVAL HyperRAM/HyperFlash simple init
    PUB initHyperDriver(basePin, ramStartAddr, flashStartAddr, flags) : bus
    PUB initHyperDriverCog(basePin, ramStartAddr, flashStartAddr, flags, cog) : bus 
    
    'init/config related
    PUB mapHyperRam(ramStartAddr, size, datapin, cspin, clkpin, rwdspin, resetpin, burst) : bus
    PUB mapHyperFlash(flashStartAddr, size, datapin, cspin, clkpin, rwdspin, resetpin, burst) : bus
    PUB start(bus, flags) : driverCog
    PUB startCog(bus, flags, cog) : driverCog 
    PUB setupCogParams(cogmask, bus, burst, priorityFlags) : cog 
    PUB removeCogs(cogmask, bus) : r
    PUB shutdown(bus) : r 
    
    'helpers
    PUB getMailboxAddr(bus, cog) : addr 
    PUB getDriverCogID(bus) : cog
    PUB getMaxBurst(frequency, cs_interval, latency) : clocks
    
    'reads
    PUB readByte(srcAddr) : r 
    PUB readWord(srcAddr) : r 
    PUB readLong(srcAddr) : r 
    PUB read(dstHubAddr, srcAddr, count) : r
    PUB readReg(addr, addrhi_16, addrlo_32) : r 
    
    'writes
    PUB writeByte(dstAddr, data) : r 
    PUB writeWord(dstAddr, data) : r 
    PUB writeLong(dstAddr, data) : r 
    PUB write(srcHubAddr, dstAddr, count) : r
    PUB writeReg(addr, addrhi_16, addrlo_32, value) : r 
    
    'complex transfers/request lists
    'listPtr is an optional non-zero pointer to build a request list item for later processing instead of executing the single request immediately
    PUB readBytes(dstHubAddr, srcAddr, count, listPtr) : r 
    PUB writeBytes(srcHubAddr, dstAddr, count, listPtr) : r 
    PUB fillBytes(dstAddr, pattern, count, listPtr) : r 
    PUB fillWords(dstAddr, pattern, count, listPtr) : r 
    PUB fillLongs(dstAddr, pattern, count, listPtr) : r 
    PUB gfxCopyImage(dstAddr, dstPitch, srcAddr, srcPitch, byteWidth, height, hubbuf, listPtr) : r 
    PUB gfxReadImage(dstHubAddr, dstPitch, srcAddr, srcPitch, byteWidth, height, listPtr) : r 
    PUB gfxWriteImage(srcHubAddr, srcPitch, dstAddr, dstPitch, byteWidth, height, listPtr) : r 
    PUB gfxFillBytes(dstAddr, dstPitch, width, height, pattern, listPtr) : r 
    PUB gfxFillWords(dstAddr, dstPitch, width, height, pattern, listPtr) : r 
    PUB gfxFillLongs(dstAddr, dstPitch, width, height, pattern, listPtr) : r 
    PUB copyBuf(dstAddr, srcAddr, totalBytes, hubBuffer, bufSize, listPtr) : r
    PUB execList(bus, listPtr) : r
    
    'advanced setup/config
    PUB readIR(addr, ir_num, mcpdie_num) : r
    PUB readCR(addr, cr_num, mcpdie_num) : r
    PUB writeCR(addr, cr_num, mcpdie_num, value) : r
    PUB setFlashLatency(addr, latency) : r ' future?
    PUB setRamLatency(addr, latency) : r 
    PUB setBurst(addr, burst) : r 
    PUB setDelay(addr, delay) : r 
    PUB getBurst(addr) : r
    PUB getDelay(addr) : r 
    

  • I've updated the above API slightly to make its calls more consistent with respect to address order, and introduced the more generic read/write API for single calls for the typical read/write burst transfers, and made readBytes/writeBytes as the list capable forms.

    I think I'll also introduce a non-blocking read/write option in the list. Possibly I could use the MSB of the listPtr as an optional flag that will indicate not to block, and the requesting client can then later poll or wait on its ATN for the result. Even in SPIN this will be useful, especially once longer lists are used, and you'll be able to do work while the data is being transferred in the background.
  • That'd be a rather ugly design choice to repurpose the msb.
  • rogloh Posts: 5,122
    edited 2020-05-28 06:15
    Why? HUB RAM addresses are only 19 bits. You can OR in the top bit. This is not being used as an external address, just a flag & HUB address in my driver function. We could otherwise add an additional parameter to every call involving list creation but it seems excessive to do it that way. The only reason not to would be if you know your hub addresses of your own managed lists are already using the upper bits for some weird reason and you don't want to have to clear it each time if you don't want to enable non-blocking operation.

    Eg. what's wrong with having this:
     fillBytes(addr, pattern, count, list)
     execList(bus, list | NON_BLOCKING)
    

    I wouldn't mind adding an additional flags argument to just execList so much for this, but it is the dozen other methods that would also need it, when their listPtr is 0 and they want non-blocking operation enabled.
  • rogloh Posts: 5,122
    edited 2020-05-28 06:58
    The HyperFLASH is responding! Only needed to flip the endianness to get it to work.

    Reading its NVCR gives $8EBB which is what the data sheet says is the default. This proves both reads/writes are functional because you first need to write a special pattern before reading the register.
  • @rogloh
    If some future spin of the silicon can accommodate the full 1MB of Hub RAM then you'll want that 20th bit. That said, if you are satisfied with limiting listPtr to only be able to point at the lower (existing) half of the Hub that's probably ok (but not pretty).

    If the coloured chart you produced still accurately represents the fields within the mailbox then you have 4 don't care bits above the list pointer field. Could you give one of those this purpose?

  • rogloh Posts: 5,122
    edited 2020-05-28 07:20
    AJL, I'm planning on setting bit 31 not bit 19 in this listPtr to the driver layer so 1MB HUB is still fine down the track. Also the 20 bit list pointer field sent to the HyperRAM driver will already have its top 8 bits overwritten with the special start list request pattern ($BF). This is why I can reuse these upper bits before they even get into the PASM driver's mailbox area. Eg:
    ' todo add non blocking
    PUB execList(bus, listptr) : r | m
        m := getMailboxAddr(bus, cogid())
        if m < 0 
            return m
        repeat until long[m] >= 0       'don't start another list if the last one hasn't ended
        long[m] := (listptr & $fffff) | R_STARTLIST ' R_STARTLIST = $BF<<24
        repeat until long[m] >= 0
        r := (long[m] == 0) ? 0 : long[m][1]
    
    

    Note: This Non-blocking thing is something handled in the SPIN layer where it won't wait in the repeat loops above, it is not done in the PASM, though you would want to enable ATN notifications for it to work correctly.
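
The bit juggling can be sketched in Python to show why a flag in bit 31 never reaches the PASM driver. R_STARTLIST and the $fffff mask come from the code above; NON_BLOCKING as bit 31 is the proposal from this post, not something already in the driver, and the helper names are made up.

```python
R_STARTLIST = 0xBF << 24     # start-list request pattern, from the post
HUB_ADDR_MASK = 0xFFFFF      # 20-bit hub address field, from the post
NON_BLOCKING = 1 << 31       # ASSUMPTION: proposed flag position (bit 31)


def split_list_arg(arg):
    """Separate the hub list address from the optional non-blocking flag."""
    return arg & HUB_ADDR_MASK, bool(arg & NON_BLOCKING)


def mailbox_request(arg):
    """Build the long written to the mailbox; the flag is masked away."""
    hub_addr, _ = split_list_arg(arg)
    return hub_addr | R_STARTLIST


if __name__ == "__main__":
    arg = 0x12345 | NON_BLOCKING
    print(hex(mailbox_request(arg)), split_list_arg(arg))
```

The mailbox long is identical with or without the flag, so only the SPIN layer would need to look at it when deciding whether to wait in the second repeat loop.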
  • Writing to a HyperFlash sector and the individual words within it is working now. :smile:

    I found we can also write using a single 512 byte write burst operation after sending the special 3 word unlock sequence. This will improve application performance, avoiding many extra mailbox transactions doing it word at a time etc. With video running we should still be able to get 1-4 HyperFlash mailbox transactions done per scan line, so writes can happen fast enough, certainly faster than the 0.5-2ms write time per 512 bytes, which is limited by the device itself.

    Any sector erase time is still not ideal though. It's 2.9 seconds per 256kB sector erased. I still can't get my head around that delay. Writing 1MB will likely take ~13 seconds if you also have to erase the sector first but probably just 1 second if already erased, and it's up to 4 minutes to erase the whole 32MB chip!
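
A quick back-of-envelope check of the 1MB figures in Python, using only the numbers quoted in this post (2.9 s per 256kB sector erase, 0.5-2 ms per 512 byte write). It ignores mailbox and bus overhead and takes 1MB as 1024kB, so it is a rough estimate only.

```python
SECTOR_BYTES = 256 * 1024        # erase sector size from the post
SECTOR_ERASE_S = 2.9             # seconds per sector erase, from the post
PAGE_BYTES = 512                 # write burst size from the post
PAGE_WRITE_S = (0.0005, 0.002)   # min/max seconds per 512 byte write


def program_time_s(num_bytes, erase_first):
    """Return (best, worst) seconds to write num_bytes, optionally erasing first."""
    sectors = -(-num_bytes // SECTOR_BYTES)   # ceiling division
    pages = -(-num_bytes // PAGE_BYTES)
    erase = sectors * SECTOR_ERASE_S if erase_first else 0.0
    return erase + pages * PAGE_WRITE_S[0], erase + pages * PAGE_WRITE_S[1]


if __name__ == "__main__":
    print(program_time_s(1024 * 1024, erase_first=True))
    print(program_time_s(1024 * 1024, erase_first=False))
```

That works out to roughly 12.6-15.7 s with erase and 1.0-4.1 s without, so the ~13 s and ~1 s figures sit at the fast end of the device's write time range.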

    This is the first time I've been able to try anything like this until now, and having the proper SPIN2 API integrated makes all the difference for faster experimental testing.
  • rogloh wrote: »
    AJL, I'm planning on setting bit 31 not bit 19 in this listPtr to the driver layer so 1MB HUB is still fine down the track. Also the 20 bit list pointer field sent to the HyperRAM driver will already have its top 8 bits overwritten with the special start list request pattern ($BF). This is why I can reuse these upper bits before they even get into the PASM driver's mailbox area. Eg:
    ' todo add non blocking
    PUB execList(bus, listptr) : r | m
        m := getMailboxAddr(bus, cogid())
        if m < 0 
            return m
        repeat until long[m] >= 0       'don't start another list if the last one hasn't ended
        long[m] := (listptr & $fffff) | R_STARTLIST ' R_STARTLIST = $BF<<24
        repeat until long[m] >= 0
        r := (long[m] == 0) ? 0 : long[m][1]
    
    

    Note: This Non-blocking thing is something handled in the SPIN layer where it won't wait in the repeat loops above, it is not done in the PASM, though you would want to enable ATN notifications for it to work correctly.

    Ok, I understand now.
  • Cluso99 Posts: 18,066
    @rogloh
    Is there any point in keeping b31 & b30 free?
    You can pass the c & z flags in these. Of course you can test b31 into c on a rdxxxx from hub.
  • rogloh Posts: 5,122
    edited 2020-05-30 03:10
    Cluso99, I'm not entirely sure what you are asking about bit31 & bit30 being free if it relates directly to this above SPIN2 example.

    However in my mailbox scheme I already make extensive use of bit31 to test whether the mailbox request is active (with TJS, etc.), and the lower bits including bit30 already contain other data used to indicate the request type (in fact bit30 = read/write).
                        setq    #24-1                   'read 24 longs
                        rdlong  req0, mbox              'get all mailbox requests and data longs
    polling_code        skipf   pattern                 ']dynamic polling code starts from here....
                        jatn    atn_handler             ']JATN (or JINT?) triggers reconfiguration 
                        tjs     req0, cog0_handler      ']
                        tjs     req1, cog1_handler      ']Initially this is just a dummy placeholder
                        tjs     req2, cog2_handler      ']loop taking up the most space if there is
                        tjs     req3, cog3_handler      ']a polling loop with all round robin COGs.
                        ...
    

    I've found that when extracting the C/Z pair in one go with RCZL/RCZR you need to rotate twice or copy/restore the original value, as there is no "NR" anymore, so reading them independently as needed is just as fast and avoids that double rotation step.

    Actually within the driver I just use the entire 8 bit upper value of the first mailbox long as a table jump index anyway, which includes the bank bits / memory type, so extracting two flags there is not that important. This method gives me instant branching to where I want it, by both service and memory type (flash or RAM), as well as my control path with special bank 15. It's fast and avoids multiple branches, but the jump table does burn 128 COG longs. I can save part of this space in the future by adding 2-3 more instructions per request if I get desperate.
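
The bit-31 test and the top-byte dispatch can be modelled in a few lines of Python. This is only a sketch: the real loop is PASM with a SKIPF pattern and per-COG priorities, whereas this model just scans in fixed COG order, and the function names are invented.

```python
def is_active(req):
    """TJS jumps when bit 31 of the register is set, i.e. the request is live."""
    return bool(req & 0x8000_0000)


def first_active(requests):
    """Return the index of the first mailbox with a pending request, else None."""
    for cog, req in enumerate(requests):
        if is_active(req):
            return cog
    return None


def jump_index(req):
    """Upper 8 bits of the first mailbox long, used as the jump table index."""
    return (req >> 24) & 0xFF


if __name__ == "__main__":
    mailboxes = [0, 0, 0xBF012345, 0x80000000]
    cog = first_active(mailboxes)
    print(cog, hex(jump_index(mailboxes[cog])))
```

Since bit 31 is always set on an active request, only indexes $80-$FF can occur, which would be consistent with a 128-long table; that last point is an inference from the numbers above rather than something stated in the post.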
  • Cluso99 Posts: 18,066
    @rogloh
    Nothing in particular. I remembered there were instructions to extract the c&z flags. Shame we didn't think about making another instruction that did not rotate.
  • Speaking of COG space usage, the PASM driver is getting full again now. I added a couple more niceties to prevent potential problems with lists trashing the mailbox area and it is now consuming 485 COG RAM longs without any of my state dump debug code included, though I can always push up to 502 longs as I don't use interrupts at this stage. It is also consuming 508 LUT RAM longs without any debug code.

    If I hunt for some more instruction optimizations I'm sure I can shrink it down a little here and there, but I haven't got desperate enough for that yet because it is sort of feature complete now. There's no room left for arbitrary angle pixel plotting at this stage. That potential idea might have to be jettisoned for now, at least until the 128 COG RAM long jump table implementation gets ditched. Adding it would still speed up non-horizontal and non-vertical line drawing in 8/16/32bpp modes though...so I still like the idea of it.
  • evanh wrote: »
    Same as Brian's results. Although the capacitor degrades the bands a little more. And higher temperature will degrade them further. Running the sources above ...

    Frequency bands for HR Read Data, room temperature, data pins P16-P23, clock pin P24:
    1-96 MHz, 112-193 MHz, 232-288 MHz: All registered pins, no capacitor.
    1-87 MHz, 107-174 MHz, 217-266 MHz: All registered pins, 22 pF capacitor on clock.

    1-94 MHz, 107-188 MHz, 221-277 MHz: Registered data pins, no capacitor.
    1-84 MHz, 103-168 MHz, 208-256 MHz: Registered data pins, 22 pF capacitor on clock.

    1-80 MHz, 90-162 MHz, 180-241 MHz, 276-317 MHz: Registered clock pin, no capacitor.
    1-73 MHz, 86-145 MHz, 171-221 MHz, 259-295 MHz: Registered clock pin, 22 pF capacitor on clock.

    1-78 MHz, 87-156 MHz, 173-233 MHz, 264-308 MHz: All unregistered pins, no capacitor.
    1-71 MHz, 83-140 MHz, 165-214 MHz, 249-286 MHz: All unregistered pins, 22 pF capacitor on clock.

    EDIT: Oops, corrected a bug with HRclock pin registering.

    @evanh. I was able to replicate this type of HyperRAM read test using my driver by iterating through the different P2 frequencies from 25MHz to 310 MHz, although I think 45-50MHz is probably about the practical lower operating limit if the 4uS CS time is to be honoured and you want to be able to transfer more than a long at a time, given the overheads and address phase etc.

    The code and module appear to work together across the frequency bands if you set the delay appropriately at the transition points. This is what I used to change the delay as I varied the P2 frequency (freq in MHz). The LSB of the delay actually controls registered/unregistered data pin selection which adds a little more delay (it's like a half step of the true delay).
         delay := (fast) ? 9 :  11 ' fast <> 0 for sysclk/1 reads
         if freq < 270
            delay--
         if freq < 225
           delay--
         if freq < 180
           delay--
         if freq < 120   
           delay--
         if freq < 88
           delay--
         mem.setDelay(RAM, delay) ' where RAM is the base address of the HyperRAM bank to adjust
    
    I tested out the HyperRAM module in pin positions 0-15, 16-31, and 32-47 and all worked. Operating the module at base pins 0 or 16 seemed to top out at around 308MHz and 304MHz for final successful sysclk/2 and sysclk/1 read rates respectively at 20C room temperature. Running at base pin 32 with the P2-EVAL is slightly slower hitting around 304-305MHz P2 limit for both rates. It's still thankfully a little over 297MHz which is a sweet spot for 1080p.

    Test output is attached showing the delay value changes and read test results for HyperRAM. I write a different 256 byte pattern at different addresses once at the start at 200MHz with sysclk/2 and then read back each pattern for different frequencies, then compare byte by byte. It's not an intensive memory test, just there to test my own timing delay values which, when incorrect, quickly show up as a skew offset by one or more bytes. Only the first 16 bytes of what was sent and received back are dumped for brevity.
    HyperRAM driver init, result bus = 0
    HyperRAM cog id = 1
    HyperRAM mailbox addr = 000053C4
    Freq=25 MHz, delay=6: read values compared ok
    Freq=26 MHz, delay=6: read values compared ok
    Freq=27 MHz, delay=6: read values compared ok
    Freq=28 MHz, delay=6: read values compared ok
    Freq=29 MHz, delay=6: read values compared ok
    ... <snip>
    Freq=85 MHz, delay=6: read values compared ok
    Freq=86 MHz, delay=6: read values compared ok
    Freq=87 MHz, delay=6: read values compared ok
    Freq=88 MHz, delay=7: read values compared ok
    Freq=89 MHz, delay=7: read values compared ok
    Freq=90 MHz, delay=7: read values compared ok
    Freq=91 MHz, delay=7: read values compared ok
    ... <snip>
    Freq=117 MHz, delay=7: read values compared ok
    Freq=118 MHz, delay=7: read values compared ok
    Freq=119 MHz, delay=7: read values compared ok
    Freq=120 MHz, delay=8: read values compared ok
    Freq=121 MHz, delay=8: read values compared ok
    Freq=122 MHz, delay=8: read values compared ok
    Freq=123 MHz, delay=8: read values compared ok
    Freq=124 MHz, delay=8: read values compared ok
    ... <snip>
    Freq=176 MHz, delay=8: read values compared ok
    Freq=177 MHz, delay=8: read values compared ok
    Freq=178 MHz, delay=8: read values compared ok
    Freq=179 MHz, delay=8: read values compared ok
    Freq=180 MHz, delay=9: read values compared ok
    Freq=181 MHz, delay=9: read values compared ok
    Freq=182 MHz, delay=9: read values compared ok
    Freq=183 MHz, delay=9: read values compared ok
    Freq=184 MHz, delay=9: read values compared ok
    ... <snip>
    Freq=221 MHz, delay=9: read values compared ok
    Freq=222 MHz, delay=9: read values compared ok
    Freq=223 MHz, delay=9: read values compared ok
    Freq=224 MHz, delay=9: read values compared ok
    Freq=225 MHz, delay=10: read values compared ok
    Freq=226 MHz, delay=10: read values compared ok
    Freq=227 MHz, delay=10: read values compared ok
    Freq=228 MHz, delay=10: read values compared ok
    Freq=229 MHz, delay=10: read values compared ok
    Freq=230 MHz, delay=10: read values compared ok
    ... <snip>
    Freq=266 MHz, delay=10: read values compared ok
    Freq=267 MHz, delay=10: read values compared ok
    Freq=268 MHz, delay=10: read values compared ok
    Freq=269 MHz, delay=10: read values compared ok
    Freq=270 MHz, delay=11: read values compared ok
    Freq=271 MHz, delay=11: read values compared ok
    Freq=272 MHz, delay=11: read values compared ok
    ... <snip>
    Freq=302 MHz, delay=11: read values compared ok
    Freq=303 MHz, delay=11: read values compared ok
    Freq=304 MHz, delay=11: read values compared ok
    Freq=305 MHz, delay=11: first mismatch at offset 80
    00000000 00003F30 : 31 62 93 C4 F5 26 57 88 B9 EA 1B 4C 7D AE DF 10 
    00000000 00007498 : 31 62 93 C4 F5 26 57 88 B9 EA 1B 4C 7D AE DF 10 
    Freq=306 MHz, delay=11: first mismatch at offset 104
    00000000 00003F30 : 32 64 96 C8 FA 2C 5E 90 C2 F4 26 58 8A BC EE 20 
    00000000 00007498 : 32 64 96 C8 FA 2C 5E 90 C2 F4 26 58 8A BC EE 20 
    Freq=307 MHz, delay=11: first mismatch at offset 30
    00000000 00003F30 : 33 66 99 CC FF 32 65 98 CB FE 31 64 97 CA FD 30 
    00000000 00007498 : 33 66 99 CC FF 32 65 98 CB FE 31 64 97 CA FD 30 
    Freq=308 MHz, delay=11: first mismatch at offset 20
    00000000 00003F30 : 34 68 9C D0 04 38 6C A0 D4 08 3C 70 A4 D8 0C 40 
    00000000 00007498 : 34 68 9C D0 04 38 6C A0 D4 08 3C 70 A4 D8 0C 40 
    Freq=309 MHz, delay=11: first mismatch at offset 4
    00000000 00003F30 : 35 6A 9F D4 09 3E 73 A8 DD 12 47 7C B1 E6 1B 50 
    00000000 00007498 : 35 6A 9F D4 1D 3E 73 A8 DD 12 47 7C B1 E6 3B 50 
    Freq=310 MHz, delay=11: first mismatch at offset 0
    00000000 00003F30 : 36 6C A2 D8 0E 44 7A B0 E6 1C 52 88 BE F4 2A 60 
    00000000 00007498 : 3C 00 00 00 00 00 00 00 00 00 3C 00 00 00 00 00
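
For anyone wanting to cross-check the log, the delay stepping code from this post can be transcribed into Python as below. The thresholds and start values are copied from the Spin2 snippet; only the function name is made up. It reproduces the delay column of the log for the sysclk/2 case.

```python
def read_delay(freq_mhz, fast=False):
    """Python transcription of the delay stepping used for the read test.

    fast=True corresponds to sysclk/1 reads (start value 9 instead of 11).
    """
    delay = 9 if fast else 11
    for threshold in (270, 225, 180, 120, 88):
        if freq_mhz < threshold:
            delay -= 1
    return delay


if __name__ == "__main__":
    for mhz in (25, 87, 88, 119, 120, 179, 180, 224, 225, 269, 270, 304):
        print(f"Freq={mhz} MHz, delay={read_delay(mhz)}")
```

Each transition in the log (88, 120, 180, 225 and 270MHz) lines up with one of the thresholds.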
    
  • rogloh Posts: 5,122
    edited 2020-05-30 13:14
    I also tested out the HyperFlash but only seem to get it reading okay from 95-278MHz (sysclk/2) or 191-278MHz (sysclk/1) for some reason. Could be differences in output timing compared to HyperRAM - if so, I am very glad I made my delay a per bank parameter, not global per driver. :relieved: Still checking.

    One thing users will need to know is that reading a burst from flash can introduce gaps in the data when it crosses certain page boundaries. The streamer cannot compensate for this because it does not interpret RWDS as the data byte strobe, and the gaps will end up in hub memory. These gaps can be reduced or eliminated in some cases by reducing the latency, but this reduces the upper operating frequency as well. Thankfully this problem does not happen if you start your read from the beginning of the page boundary, so ideally any burst read that crosses the page boundary should really begin there.
    flash.png
  • rogloh wrote: »

    Test output is attached showing the delay values changes and read test result for HyperRAM. I write a different 256 byte pattern at different addresses once at the start at 200MHz with sysclk/2 and then read back each pattern for different frequencies, then compare byte by byte. It's not an intensive memory test, just there to test my own timing delay values which, when incorrect, quickly show up as a skew offset by one or more bytes. Only the first 16 bytes of what was sent and received back are dumped for brevity.

    Nice work. It would be interesting to see how far the transitions shift with temperature.

    Did you add a capacitor, or is this the straight Parallax Hyper accessory board?
  • Just the straight board, no mods.
  • roglohrogloh Posts: 5,122
    edited 2020-05-30 15:36
    I fixed a bug in the HyperFlash testing and can now get it to read successfully with sysclk/2 transfers from 25MHz to 360MHz (didn't want to try any higher).

    However, with sysclk/1 read timing the HyperFlash doesn't seem to follow the same profile as the HyperRAM, and I get errors in different ranges if I set up the same input delay as the RAM uses. So it likely requires a different delay profile. This will be ok as the driver does it per bank, but I'll just need to play more to figure out the new ranges. This is what I found with the delays used earlier:
    25-87 MHz ok
    88-95 MHz Bad 
    96-119 MHz ok
    120-125 MHz Bad
    126-179 MHz ok
    180-191 MHz Bad
    192-224 MHz ok
    225-249 MHz Bad
    250-269 MHz ok
    270-286 MHz Bad
    287-360 MHz ok
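
    For anyone who wants to check a frequency against that list, here is a rough sketch in Python (purely illustrative, not driver code). The band edges are copied from the list above and only hold for the input delays used in that run; the names `BANDS`, `flash_read_ok` and `ok_band_widths` are made up for this example.

```python
# Pass/fail bands copied from the sysclk/1 HyperFlash list above (MHz, inclusive).
# They only apply to the input delays used in that particular test run.
BANDS = [
    (25, 87, True),
    (88, 95, False),
    (96, 119, True),
    (120, 125, False),
    (126, 179, True),
    (180, 191, False),
    (192, 224, True),
    (225, 249, False),
    (250, 269, True),
    (270, 286, False),
    (287, 360, True),
]


def flash_read_ok(freq_mhz):
    """Return True/False for a tested frequency, or None if it was not tested."""
    for low, high, ok in BANDS:
        if low <= freq_mhz <= high:
            return ok
    return None


def ok_band_widths():
    """Width in MHz of each passing band, lowest band first."""
    return [high - low + 1 for low, high, ok in BANDS if ok]
```

    Listing the widths of the passing bands this way makes it easier to see how they vary across the frequency range.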
    
  • Cluso99Cluso99 Posts: 18,066
    Yes, I agree 360MHz is a safe overclocking limit :)
  • evanhevanh Posts: 15,126
    edited 2020-05-30 22:08
    Intriguing, the hyperRAM fades above 300 MHz.

    That hyperFlash has more frequency bands than expected too. Presumably that's registered clock pin, unregistered data pins, correct?
  • roglohrogloh Posts: 5,122
    edited 2020-05-31 00:41
    evanh wrote: »
    Intriguing, the hyperRAM fades above 300 MHz.

    That hyperFlash has more frequency bands than expected too. Presumably that's registered clock pin, unregistered data pins, correct?
    No, this was with the clock unregistered. It was alternating between registered/unregistered data only through the frequency bands.

    I had earlier also tried enabling a registered clock with the HyperRAM only (experimental only right now) and I think it worked only at sysclk/2 IIRC. I probably still have some software timing off with sysclk/1 and registered clock operation in the actual driver code, and hopefully I may just need to change the delay by another clock to compensate, but I'll need to look into that more once I slow it down and hook it back into the logic analyzer again.

    If the address phase gets timed wrong all bets are off, so this is important. Writes would be rather risky if the HyperRAM thinks it got a read command instead, because the P2 driver would then drive data out on its pins at the same time as the device does. :scream:

    Update: It's probably best to register these clock outputs and keep that as the default, plus the upper value is increased. I suspect keeping the clock output timing unregistered is possibly more dependent on path delays through the P2 vs when it is latched but I can't be sure. To do this I'll need to change those breakpoint frequencies again and try to center them in the overlapping portions.
  • roglohrogloh Posts: 5,122
    edited 2020-05-31 01:00
    I've been thinking about this page crossing problem in the HyperFlash. I think it makes sense to break apart the transfers that cross page boundaries into multiple portions. So if the page size is 16 bytes and you wanted to transfer 43 bytes from address offset 9 in some page, you would transfer 16-9 = 7 bytes first, then the remaining 36 bytes using some multiple of the page size as the burst size. I can certainly do this in the SPIN2 driver layer for burst reads but it would be good to squeeze it into the PASM driver itself and it would work with gfx and general list transfers etc.

    Given the way I already fragment the long bursts and can continue them, this may not be too much code and could probably fit the way I do things. I need to think about it...
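
    As a rough illustration of the splitting idea (plain Python rather than SPIN2 or PASM, and not the actual driver code): the first fragment only runs up to the next page boundary, and every later fragment starts on a boundary and is at most a whole number of pages. The 16-byte page size is just the figure from the example above, and `split_burst` and `max_burst_pages` are hypothetical names.

```python
def split_burst(addr, length, page_size=16, max_burst_pages=4):
    """Split a read so that only the first fragment may start mid-page.

    Returns a list of (start_address, byte_count) tuples.
    page_size and max_burst_pages are illustrative assumptions, not device values.
    """
    if page_size <= 0 or max_burst_pages <= 0:
        raise ValueError("page_size and max_burst_pages must be positive")
    if length <= 0:
        return []

    fragments = []
    offset = addr % page_size
    if offset:
        # Short first fragment: just enough bytes to reach the next page boundary.
        first = min(length, page_size - offset)
        fragments.append((addr, first))
        addr += first
        length -= first

    # Every later burst starts on a page boundary and spans whole pages,
    # except possibly the final one, which carries whatever is left.
    burst = page_size * max_burst_pages
    while length > 0:
        chunk = min(length, burst)
        fragments.append((addr, chunk))
        addr += chunk
        length -= chunk
    return fragments
```

    With the numbers from the example, `split_burst(9, 43)` gives `[(9, 7), (16, 36)]`, i.e. 7 bytes to reach the boundary and then the remaining 36 bytes in one burst. Setting `max_burst_pages=1` would instead break the remainder into single-page reads.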
  • evanhevanh Posts: 15,126
    edited 2020-05-31 01:29
    rogloh wrote: »
    evanh wrote: »
    Intriguing, the hyperRAM fades above 300 MHz.

    That hyperFlash has more frequency bands than expected too. Presumably that's registered clock pin, unregistered data pins, correct?
    No, this was with the clock unregistered. It was alternating between registered/unregistered data only through the frequency bands.
    Oh, ouch, narrow. Makes sense though. That highest band is too wide, it doesn't fit the trend.
  • roglohrogloh Posts: 5,122
    edited 2020-05-31 01:39
    Are you talking about the flash or RAM? I mean the narrow comment.