HyperBus (AKA xSPI) example for P2-ES Eval Board HyperRAM & HyperFlash Add-on, Part number 64004-ES

I've attached Ozpropdev's example program, which we used to test the HyperRAM/HyperFlash add-on. He wrote it as a quick example, so it isn't heavily commented, but it is clearly laid out, with sections that read and write data to the RAM and a separate flash test.

Comments

  • For those of you with RevB Eval boards, here's another code example.
    It requires a VGA module on pin 48+ and HyperRAM on pin 32+.
    The RGB24.DAT file is written to HyperRAM and repeatedly burst read back to hub.
    The code also includes compensation for all system clock speeds.
    Tested OK up to 380 MHz (190 MB/s) :cool:
  • Rayman Posts: 10,470
    edited 2019-11-13 - 21:16:15
    This is strange...

    I'm getting some video that almost (but not quite) looks right with VGA on Pin 0+, but nothing on Pin 48+.
    I'm looking at the code and don't see how this is possible though...
  • Oops, I was looking at the pin numbers wrongly, seems board layout different...
    Think it might be working...
  • Rayman wrote: »
    Oops, I was looking at the pin numbers wrongly, seems board layout different...
    Think it might be working...

    Yes, that's the correct output.
  • evanh Posts: 9,001
    edited 2019-11-14 - 10:47:53
    I want to see how feasible HyperRAM is with sysclock/1 bursting. If it works at all then I would expect to exceed the rated peak 200 MB/s without needing crazy prop2 overclocking.

    But I'm still waiting for delivery of HyperRAM accessory board. It's been stuck in Auckland since 5:50 AM (18 hours and counting). Sadly tomorrow is Canterbury anniversary day so there won't be any deliveries until Monday now.
  • If you can get it to run at sysclock/1 speeds that would be fantastic for video applications too and enable higher colour depths/resolutions, especially in that midband operating range of around 100MHz where sysclock/2 reduces HyperRAM performance down to only 50MB/s which isn't that fast compared to what it is really capable of.
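
The throughput figures quoted in this thread can be reproduced with a little arithmetic. The sketch below is only an illustration: the function name `hyperbus_rates` is made up here, and it assumes one byte moves every `divider` sysclk cycles on the 8-bit DDR bus, which matches the numbers posted above (380 MHz at sysclock/2 giving 190 MB/s).

```python
def hyperbus_rates(sysclk_mhz, divider):
    """Return (bus_clock_mhz, throughput_mb_s) for an 8-bit DDR HyperBus.

    Assumption: one byte is transferred every `divider` sysclk cycles.
    """
    throughput_mb_s = sysclk_mhz / divider   # bytes per microsecond = MB/s
    bus_clock_mhz = throughput_mb_s / 2      # DDR moves two bytes per bus clock
    return bus_clock_mhz, throughput_mb_s


# The midband case discussed above: 100 MHz sysclk at sysclock/2 and sysclock/1
for divider in (2, 1):
    clk, rate = hyperbus_rates(100, divider)
    print(f"sysclock/{divider}: bus clock {clk} MHz, {rate} MB/s")
```

Under that assumption, sysclock/1 doubles the byte rate at any given sysclk, which is why it matters most in the 100 MHz region.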
  • Hell yeah, the benefit for display support really sticks out.

    The other possible strong application is for deep memory "scope mode" capturing. It won't be the GHz sampling of common scopes on the market today but still shouldn't be too shabby. At the very least Brian's logic analyser can be deep memory extended with HyperRAM.
  • rogloh Posts: 1,999
    edited 2019-11-14 - 14:36:59
    evanh wrote: »
    The other possible strong application is for deep memory "scope mode" capturing. It won't be the GHz sampling of common scopes on the market today but still shouldn't be too shabby. At the very least Brian's logic analyser can be deep memory extended with HyperRAM.

    I'll say. I'd take a 200MHz logic analyser with 16MB sample memory or more. Right now at home I'm using a 12MHz 8 channel one and it's good for what it can do when I under clock things but there are times when higher speed and/or more channels is warranted. The best thing is you can basically use it on the same platform you are developing on which is ideal for doing IO state analysis/debug work. It was great for me to capture the HDMI pin output at full speed when I was looking at that.
  • Yeah, sysclk/1 would be cool but trickier.
    Even at sysclk/2 I found I had to "tune" the clock edge to guarantee valid data.
    Here's what my testing revealed by sweeping sysclk across 20-390 MHz.
    '$' represents valid data (On the money!)
    WAITX #004
    $$$$$$$$$$$-----------------------------------------------------------------
    020 to 070 MHz
    
    WAITX #005
    $$$$$$$$$$$$$$$$$$$$$$$$$$$-------------------------------------------------
    020 to 150 MHz
    
    WAITX #006
    --------------$$$$$$$$$$$$$$$$$$$$$$$$$$$$----------------------------------
    090 to 225 MHz
    
    WAITX #007
    ------------------------------$$$$$$$$$$$$$$$$$$$$$$$$$$$$------------------
    170 to 305 MHz
    
    WAITX #008
    -----------------------------------------------$$$$$$$$$$$$$$$$$$$$$$$$$$---
    255 to 380 MHz
    
    WAITX #009
    -----------------------------------------------------------------$$$$$$$$$$-
    345 to 390 MHz
    
    
    000000000000000011111111111111111111222222222222222222223333333333333333333
    223344556677889900112233445566778899001122334455667788990011223344556677889
    050505050505050505050505050505050505050505050505050505050505050505050505050
    
    Looking forward to see what you come up with Evan. :)
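
One way to turn the sweep above into a lookup is sketched here in Python rather than P2 assembly. The band edges are copied from the chart; the names `WAITX_BANDS` and `pick_waitx`, and the rule of preferring the band with the most distance to its nearest edge, are assumptions made for illustration and are not taken from the attached code. Other boards and temperatures may shift the bands.

```python
# Valid sysclk bands in MHz per WAITX value, copied from the sweep above
# (sysclk/2 transfers, one board, one temperature).
WAITX_BANDS = {
    4: (20, 70),
    5: (20, 150),
    6: (90, 225),
    7: (170, 305),
    8: (255, 380),
    9: (345, 390),
}


def pick_waitx(sysclk_mhz):
    """Return the WAITX value whose band leaves the most margin, or None.

    Margin is the distance in MHz from sysclk to the nearest band edge.
    On a tie the lower WAITX value is kept.
    """
    best = None
    best_margin = -1
    for waitx, (lo, hi) in WAITX_BANDS.items():
        if lo <= sysclk_mhz <= hi:
            margin = min(sysclk_mhz - lo, hi - sysclk_mhz)
            if margin > best_margin:
                best, best_margin = waitx, margin
    return best


for mhz in (50, 200, 360, 400):
    print(mhz, "MHz ->", pick_waitx(mhz))
```

Where two bands overlap, for example 170 to 225 MHz, this picks whichever band keeps the frequency furthest from an edge.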

  • I ran some tests this afternoon and had code with a sysclk of 200MHz burst reading hyperram @ 200MB/S.
    Encouraging early results... :)
  • evanh Posts: 9,001
    edited 2019-11-15 - 10:26:57
    Well done!

    I had pondered keeping quiet to get first shot at it. :D

    PS: I don't know if you've tried it but I speculate that enabling clocked (registered) pins will be beneficial for a wider working range of each compensating value.
  • rogloh Posts: 1,999
    edited 2019-11-15 - 10:32:44
    This sounds great evanh and ozpropdev. If it all works out it will be handy for cases with P2's doing video with lower to medium clock rates. For P2's at over say 200MHz+, they then could always choose to run at sysclk/2. I wonder what maximum overclocked HyperRAM speed can now be attained before it tops out...?
  • Rayman Posts: 10,470
    edited 2019-11-15 - 12:04:44
    That's great news ozpropdev. I wasn't sure that was going to be possible...

    BTW: Chip asked once about using "registered" I/O... Have you tried that?
  • This is exciting not least because it now opens up the possibility of using the 1v8 hyperram (with the pins in 1v8 dac mode for output, and a>d mode for input).

    Those 1v8 hyperrams are rated to 333 MB/sec from memory
  • jmg Posts: 14,278
    Tubular wrote: »
    This is exciting not least because it now opens up the possibility of using the 1v8 hyperram (with the pins in 1v8 dac mode for output, and a>d mode for input).
    Those 1v8 hyperrams are rated to 333 MB/sec from memory

    I'm not sure those pathways are going to be capable of useful speed apertures.
    The WAITX bands nicely plotted above by ozpropdev, show that delays are already needing to be tuned.
    Less clear is how much temperature spread comes into play, but usually that is significant.
    That may mean users need care in selecting a MHz justified left in one of those bands, to allow for temperature spreads.
  • Yes these things need to be tested. The output dacs seem plenty fast enough, which leaves the question of the input comparator in pin a>d mode, and the delays associated with this.

    There is another approach and that is to start with the 3v3 (100 MHz) hyperram and gradually lower the supply voltage, to see if the 133 MHz/266MBps and 166 MHz/333MBps spec points can be achieved at some intermediate (non 1v8) operating point

    It makes sense to do some temperature sweeps of the hyperram at some point too
  • evanh Posts: 9,001
    edited 2019-11-15 - 20:24:31
    I'm thinking lowering the voltage won't help at all, ie: the Hyper parts won't change mode. It'll still go faster at 3.3 volts, just out of spec is all. Reliability is the compromise. Attenuation means it'll probably not be a centred waveform and at some higher frequency that'll stop crossing the logic threshold.

    The prop2 will perform worse if the voltage is lowered. It won't be a viable option.
  • There's no changing the prop2 voltage, it still has 3v3 VIO

    You're using the fast settling time of the P2 dacs to do the physical i/o driving at 1v8 or 1v1 or 300mV or whatever level you need. You still use the normal digital signal as if it's a 3v3 logic pin. It's a mode I suggested to Chip so we can interface with non 3v3 logic just like this

    The hyperrams have spec points at 100 and 166 MHz but also at 133 MHz in between. Of course this operation may venture off the straight and narrow spec, just like standard overclocking

    Whether the input comparator (in pin A>D mode) can keep up is one question I have, but even if the input can't keep up, being able to burst write fast is still useful for capture (just not so useful for video).
  • The RevA eval pcb would allow tweaking the voltages (or injecting alternative voltage sources) for both VIO and VDD. Could swap the P2 chip out to the latest one.
  • evanh Posts: 9,001
    edited 2019-11-15 - 23:27:02
    Brian,
    Effective use of assembler with pinx and bytx constants you've got there. I don't know what I would have done beyond going straight to streamer ops. Maybe used ALTxx prefix instructions.

    EDIT: Ah, not so effective for the VGA example though because of the WFBYTE.

    It shows up how important the streamer pin groupings are.

  • jmg Posts: 14,278
    Tubular wrote: »
    There is another approach and that is to start with the 3v3 (100 MHz) hyperram and gradually lower the supply voltage, to see if the 133 MHz/266MBps and 166 MHz/333MBps spec points can be achieved at some intermediate (non 1v8) operating point
    You would probably do better tightening up the supply, rather than lowering it.
    Data specs 3.0V, or almost 10%, so improving Vcc to 2% or 1% would give more MHz. - might get to 250MHz P2 sysclk.
    They seem to also have automotive temp specs, so ordering those may give more Temp margin, and so also higher MHz.


  • Are there any more blank rev a pcbs I wonder?
  • There's nothing new in the revB chip special to Hyperbus. A revA chip could prove anything worth testing using the VIO jumpers on the revA Eval Board.
  • evanh wrote: »
    There's nothing new in the revB chip special to Hyperbus. A revA chip could prove anything worth testing using the VIO jumpers on the revA Eval Board.

    How about the ability to push the RevB silicon to higher speeds... Would that be relevant in this case?

  • I guess that depends on what extremes can be done at all. RevA can reliably do 250 MHz without cooling. First up, we need to prove sysclock/1 at slower speeds.
  • sysclk/1 burst transfers have turned out to require quite a bit of tuning depending on speed.
    I'm using the streamer to achieve sysclk/1 transfers instead of fast hub mode.
    Some speeds require "clocked IO" and others don't.
    I'm still trying to map it all out and nail down a reliable universal code snippet.
    I'll keep digging....
  • "clocked IO"... That's what I was thinking of...
    Hopefully that will work at all speeds...
  • ozpropdev wrote: »
    I'm still trying to map it all out and nail down a reliable universal code snippet.
    I'll keep digging....

    Yeah keep at it if you can Brian. This is going to be really useful and max out the HyperRAM's performance with the P2 if it is possible to find a stable/reliable setup.
  • jmg Posts: 14,278
    ozpropdev wrote: »
    sysclk/1 burst transfers have turned out to require quite a bit of tuning depending on speed.
    I'm using the streamer to achieve sysclk/1 transfers instead of fast hub mode.
    Some speeds require "clocked IO" and others don't.
    I'm still trying to map it all out and nail down a reliable universal code snippet.
    I'll keep digging....

    Interesting, yes, it's expected to be 'fussy'.
    Hopefully, a solution can be found that does not need re-tune with temperature changes...
  • evanh Posts: 9,001
    edited 2019-11-18 - 05:10:16
    I've got my old streamer-streamer copy via pins testing program setup. Integrated a bit-bashed block write to hyperRAM followed by a bit-bashed block read and compare. Using the same 1 kByte of pre-generated random data as previously, to prove reliability over many cycles and many transition combinations. Took a couple of days to get my act together but all went smoothly there.

    Now I've replaced the bit-bashed block write with a streamer+smartpin fed one. At this point I can reintroduce the compensation value and find a band of useful starting alignments between smartpin and streamer.

    At sysclock/2, and larger dividers, everything went as planned and got reliable 100% scoring on a single compensation value all the way to 380 MHz sysclock. Same method as I was able to do for the dual SPI testing. With high temperatures, > 80 °C, that was reduced to about 320 MHz. Err, haven't got that far yet, that's a rx issue.

    At sysclock/1 not so much. Best I've got so far is about 99% data integrity.

    I thought maybe there was a page boundary issue but even first look using the scope shows me a bit0 error at 18th byte, address 17. It should be a 1 but always reads back as a 0. And it's consistent.
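
The write, read back and compare scoring described above could look roughly like the following host-side Python sketch. It is not the actual P2 test program: `compare_blocks` is a hypothetical helper, and the stuck bit at address 17 is simulated here purely to show what the report would contain.

```python
import random


def compare_blocks(written, read_back):
    """Return (percent_ok, first_error).

    first_error is (address, mask_of_differing_bits) for the first
    mismatching byte, or None when the two blocks are identical.
    """
    assert len(written) == len(read_back)
    first_error = None
    good = 0
    for addr, (w, r) in enumerate(zip(written, read_back)):
        if w == r:
            good += 1
        elif first_error is None:
            first_error = (addr, w ^ r)
    return 100.0 * good / len(written), first_error


# Simulated run: 1 kByte of seeded random data, as in the test described above.
rng = random.Random(1)
written = bytearray(rng.randrange(256) for _ in range(1024))
written[17] |= 0x01        # make sure bit 0 of address 17 should be a 1
read_back = bytearray(written)
read_back[17] &= 0xFE      # simulate that bit always reading back as 0

percent_ok, first_error = compare_blocks(written, read_back)
print(f"{percent_ok:.2f}% bytes OK, first error (address, bit mask): {first_error}")
```

Reporting the XOR mask alongside the address makes a consistent single-bit fault stand out from a general alignment problem.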
