Could you post a closer top view of the Spansion HyperRam you are using in that setup, and do you have a link to the original datasheet of that exact chip?
Also, can you confirm that the HyperRam chip is soldered to a Proto Advantage IPC0171 BGA-25 to DIP-25 SMT Adapter?
This is my first attempt at running the HyperVGA_640_x_480_16bpp_1d.spin2 code.
Is there no capacitor on that HR board? Looks like it's almost working...
I'd try the VGA 8bpp code and then turn down the P2 clock as low as you can...
Is clocking enabled on the pins, to align in and out timing with the clock?
Thanks. I have no idea how to enable clocking... I'm just using plain WAITSE1 and then a WFBYTE INB,.. loop for reading.
It's a bit, %C, in the pin config bits set with the WRPIN instruction, right next to the invert config bits. I doubt you'll notice any improvement, though. On the other hand, it will add 2 clocks of lag from HR clock start to the data coming back, which may be useful if you need more setup time.
One graphics updating strategy might be to buffer changes to the screen on a per line basis.
Then, update that line after it is read in from HyperRam.
Then, write it back into HyperRam.
For VGA@8bpp, there's easily 16 us of space to edit the line. That's ~2000 instructions worth of time.
Here's that scope capture again, showing HyperRam access (when the yellow trace is low) and VGA HSync (when the green trace is high).
This assumes though that we can write as fast as we read, which I think is true, but not demonstrated...
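As a sanity check on that line-edit budget, here's the arithmetic, assuming a 250 MHz sysclk and 2-sysclk instructions (both figures are assumptions on my part, not measurements from this setup):

```python
# Line-edit budget for VGA@8bpp: how many instructions fit in the
# ~16 us gap between HyperRam accesses. The 250 MHz sysclk and the
# 2-sysclk instruction cost are assumptions, not measurements.
sysclk_hz = 250e6        # assumed P2 system clock
gap_s = 16e-6            # free time per scan line, from the scope capture
clocks_per_instr = 2     # typical PASM2 instruction cost

budget = gap_s * sysclk_hz / clocks_per_instr
print(f"~{budget:.0f} instructions per line")  # ~2000 instructions
```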
Wait, maybe I do know how to enable clocking. I suppose one could enable smartpins on the bus and do something like that...
Maybe a pulldown would be a good idea too...
But this works perfectly as is.
Have you tried a temperature sweep test?
Enabling pin-register clocking tightens up timing skews, and especially variations with temperature.
The present code syncs to RWDS, and then reads every second sysclk, so there is scope to have 2 sample phase choices.
What is the output and input timing sample instant, relative to an opcode?
It may be better to wait on the RWDS edge and then shift by half an opcode (1 sysclk), so you sample away from the edge zone (in the current 2-sysclk design).
(Assuming WAITSE1 and WFBYTE INB have the same sample instant - the WAITSE1 may have an edge-sense delay?)
Would that be a WAIT #3 as the shortest?
The faster 100MHz HRclk and 200MHz sysclk idea samples every sysclk, so no software adjust is possible, but I think capturing the RWDS pattern could give a modest timing check.
If marginal, this may need some external hardware phase adjust.
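To illustrate why the faster scheme leaves no software adjustment, compare the sysclks available per received byte in the two cases (a rough model of my own; DDR is taken as one byte per clock edge on the 8-bit bus):

```python
# Sample-phase headroom: sysclks available per received byte.
# DDR on an 8-bit bus delivers 2 bytes per HR clock period.
def sysclks_per_byte(sysclk_hz, hrclk_hz):
    byte_rate = hrclk_hz * 2          # DDR: one byte per clock edge
    return sysclk_hz / byte_rate

# Current design: 250 MHz sysclk, 62.5 MHz HR clock
print(sysclks_per_byte(250e6, 62.5e6))   # 2.0 -> two sample phase choices
# Faster idea: 200 MHz sysclk, 100 MHz HR clock
print(sysclks_per_byte(200e6, 100e6))    # 1.0 -> no spare sysclk to shift
```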
I think there is no 1 clock instruction.
If there is an issue, which there doesn't seem to be now, maybe changing the RWDS pin to a smartpin and using Schmitt trigger mode would fix it...
Or, if the RWDS timing really is fixed in this configuration, then we could do a 3-clock instruction in there to line it up better...
I would tend to make the RWDS pin setting identical to the data-bus setting, to try and match up their sampling instants, and select whatever pin mode gives the lowest skews, CLK out and DATA back.
I ran some quick Spice numbers on a simple RLC delay line, to get some ballparks for hardware delay lines...
Delays, 50% to 50%, with Tr,Tf = 1ns and Period = 10ns:
180Ω / 0.5pF / 1nH / 3pF => 499.45474 ps
180Ω / 0.5pF / 100nH / 3pF => 763.4584 ps
180Ω / 0.5pF / 200nH / 3pF => 928.80259 ps
Digikey lists a series
HK1005R10J-T Taiyo Yuden $0.01208/3k Multilayer Ferrite 100nH ±5% 200mA Unshielded 1.5 Ohm Max Q=8 @ 100MHz SR>600MHz -55°C ~ 125°C 100MHz Surface Mount 0402
Seems things can be moved - I wonder how much P2 varies the sample window instant, with PVT ?
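For what it's worth, a crude analytic cross-check of those Spice figures: taking an RC settling term (0.69·R·C with both capacitors) root-sum-squared with an LC flight-time term lands within roughly 15% of the simulated delays. This is my own back-of-envelope model, not anything derived from the Spice runs themselves:

```python
import math

# Rough model of the 180 ohm / 0.5 pF / L / 3 pF delay line:
# RSS of the RC settling delay and an LC flight-time term.
R, C1, C2 = 180.0, 0.5e-12, 3e-12
t_rc = 0.693 * R * (C1 + C2)             # ~437 ps RC component

for L, spice_ps in ((1e-9, 499.45), (100e-9, 763.46), (200e-9, 928.80)):
    t_lc = math.sqrt(L * C2)             # flight-time component
    est_ps = math.hypot(t_rc, t_lc) * 1e12
    print(f"L={L*1e9:5.0f} nH  est {est_ps:4.0f} ps  (Spice {spice_ps} ps)")
```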
I did add two 0.01 uF ceramic capacitors last night, right after taking the photo, on the surface-mount side of the bottom of the board where the +3V and 0V supply wires happened to be adjacent.
It didn't change much at all, but having them there is obviously better.
I'm thinking I'm missing pullups and pulldowns?
RWDS especially might be picking up noise. Of all the pins, it is the most troublesome because it tri-states and is both input and output.
Rayman, on your board, is CSn pulled up, and is RWDS pulled down? Or something different?
I'm probably also going to directly solder the CLK wire and run a ground next to it if the resistors don't help.
Most of the pin config bits in a WRPIN instruction are not smartpin mode settings, so pullup/down features can be applied to the plain OUT control. Just keep the %MMMMM bits all zeros, for no smartpin, and you'll get IN/OUT bit-bashing with optional features like a pullup.
Each DAC can be configured and operated without activating its smartpin. In fact using the streamers to control the DACs requires avoiding the smartpin.
I think I now have RWDS as input only for reads.
BTW: I'm thinking about using two HR chips to make a 16-bit bus...
That's game, but cool. Are you making an accessory board each time? Actually, I'm not sure how you've got the existing HyperRAM mounted. I thought you had an adaptor between two eval boards but that might have been someone else.
I did add some WS leds to the other side, but the HyperRAM part is the same...
I was able to bring all the signals out without any vias under the chip...
Thanks. Good, a designed accessory board. You've got a good ground plane, check.
In your first post you stated 125 MB/s... is that number holding up?
The DQ1 track needs some shortening to make it a better match to the others. The RWDS track also should length-match the data tracks. Moving the footprint up a little, closer to the second header, may help with the track lengths.
You'll need a third accessory header to operate a second HyperRAM as 16-bit data. Obviously that's a different arrangement but do keep this guideline in mind when doing the layout.
Here's the source for burst reading after having issued the command to the HyperRAM chip:
        setbyte dirb,#$00,#0        'release control of buffer
'configure smartpin to run HR clock
        dirl    #Pin_CK
        wrpin   #%1_00110_0,#Pin_CK 'NCO frequency mode
        wxpin   #1,#Pin_CK          'add Y on every sysclk
        mov     pa,#1
        shl     pa,#30              '250/4 = 62.5 MHz
        wypin   pa,#Pin_CK
        dirh    #Pin_CK
'prepare to load buffer using fifo
        loc     ptra,#@HyperBuffer
        mov     pa,HyperRow         'need 4 rows to fill buffer
        and     pa,#3
        mul     pa,##512
        add     ptra,pa
        wrfast  #0,ptra
'wait for Pin_RWDS to go high
        setse1  #$80+Pin_RWDS
        waitse1
'read in 512 bytes, one every 2 sysclks
        rep     #1,##512
        wfbyte  inb
Note the setup doesn't seem to need to be quick.
That's one really nice feature of HR, the clock can be lazy...
I was reading another manufacturer's datasheet and remember thinking that the text is nearly verbatim...
I wonder if there's a common spec that they all have to meet...
Today, I'm going to dream about what I could do with 4 HR chips to make a 32-bit bus...
I hope you already have them but, just in case...
The latest version (Document Number: 001-99253 Rev. *H) of the HyperBus™ Specification, Low Signal Count, High Performance DDR Bus, can be freely downloaded from the Cypress site, at this link:
https://cypress.com/file/213356/download
Apart from device-specific characteristics (available in its most recently updated datasheet), all non-NDA-dependent information you may need can be found there.
Since you are laying out your own boards, there is another interesting read:
Cypress AN211622, HyperFlash™ and HyperRAM™ Layout Guide, Document No. 002-11622 Rev. *B, available at:
https://cypress.com/file/278156/download
Hope it helps.
Henrique
It might have helped me to have seen the recommended layout...
I think I probably violated most of their recommendations.
But I'm at less than half of the 333 MB/s maximum possible read rate.
I thought I was pretty smart when I was able to bring all the traces out without vias, but now I see that it was designed that way!
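The rate arithmetic behind that "less than half", assuming a 250 MHz sysclk as in the burst-read code above (the 333 MB/s figure is the spec maximum at 166 MHz HyperClk):

```python
# Read-rate check: the smartpin NCO clock runs at sysclk/4, and DDR
# moves one byte per clock edge on the 8-bit bus.
sysclk_hz = 250e6
hrclk_hz = sysclk_hz / 4       # 62.5 MHz, per the '250/4' comment
read_rate = hrclk_hz * 2       # DDR: 2 bytes per HR clock period

print(read_rate / 1e6)         # 125.0 MB/s
print(read_rate / 333e6)       # ~0.375, i.e. under half the spec max
```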
Just had a thought about better alignment of HR data with INA clock...
I could sacrifice the first byte or two in each row and insert a 3-clock RDLUT instruction before reading in row data.
In the current configurations, I'm not using all the data in each 1024 byte row...
But, seeing as I'm not having an issue, I'll probably just leave as is.
It could be worth seeing if it is any worse, or just as stable - add some warming and cooling in there too?
It's not easy to nail down the exact slot timing from the docs, and it may be that the Rise detect uses D-FFs plus a gate, which has a built-in 1-sysclk delay?
i.e. if that's the case, adding a single sysclk offset may make things worse...
Put another way: does wait-for-high have identical edge-relative timing to wait-for-rise?
I think the near-verbatim datasheet text is related to the need for a second source; otherwise no one designs the part in.
As strange as it may seem to many people, the true access-time limiting factor, at least for the 3.0 V parts, is in the interface circuits crafted to communicate with the external world; they are switching-voltage-limited.
There is a lot of externally generated noise picked up by the internal interface circuits, which also have to deal with their own self-generated noise and with the noise produced in their close vicinity by the underlying circuits: the next sensitive elements in the data chain, the main and aux full-row buffers, 1024 bytes long each.
To the point that they had to limit the specced transfer rate to 200 MB/s (HyperClk = 100 MHz max.).
To achieve 333 MB/s (at 166 MHz HyperClk), they had to lower the interface voltage levels to 1.8 V and add a second clock, CK#, to better sync the data transfers, while leaving to the application designer/engineer the burden of ensuring the right phase and timing of both CKs, RWDS and the data lanes.
So the real point is: irrespective of the version an application uses (3.0 V or 1.8 V), good signal conditioning and impedance matching always pay for themselves.
At the end of the day, statistics always bite; without any ECC error-correction provisions, a single bad read or written bit can trash a whole application. Or, at least, cause some headaches and some not-so-gentle phone calls (back in the old days; now e-mails, for real-world applications).
Always remember: the original market HyperRam was intended for was enabling XIP, as part of a cache scheme. That is the reason it has so many commands intended to rapidly fill a cache line. The early Spansion designers were targeting the $$$$ of the massive automotive application market. HyperFlash was designed with that kind of application in mind.
The uses HyperRam finds in screen buffers, with its single linear burst transfer mode, are a foreseeable sequel.
Better stay within the best tracks you could craft (pun intended).
Henrique