Cooled it down with some ice packs and got another 30 MHz at the top.
Great scans
These are the waterfalls I derived from that, and the timing apertures they imply (I dropped the upper 4 pins, as they have unusual PCB loadings):
1/357M - 1/379M = 0.162ns timing aperture spread to cover 60 pins, register mode
1/287M - 1/327M = 0.426ns timing aperture spread to cover 60 pins, non-register mode
TmNon - TmReg = 0.5399ns shift in mean aperture sample point, non-register relative to register (+2 Sysclks)
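The three figures above can be reproduced from the sweep edge frequencies. This is a sketch under two assumptions that the numbers seem to support: each aperture spread is the period at the low edge frequency minus the period at the high edge frequency, and the "mean aperture sample point" is the period at the midpoint frequency of each range. That second assumption is what reproduces 0.5399ns; averaging the two edge periods instead gives about 0.551ns.

```python
def period_ns(f_mhz):
    """Period in nanoseconds for a frequency given in MHz."""
    return 1e3 / f_mhz

def spread_ns(f_lo_mhz, f_hi_mhz):
    """Aperture spread: period at the low edge minus period at the high edge."""
    return period_ns(f_lo_mhz) - period_ns(f_hi_mhz)

def midpoint_period_ns(f_lo_mhz, f_hi_mhz):
    """Assumed 'mean sample point': the period at the midpoint frequency."""
    return period_ns((f_lo_mhz + f_hi_mhz) / 2)

# Edge frequencies (MHz) read off the waterfalls.
REG = (357, 379)      # register pin mode
NONREG = (287, 327)   # non-register pin mode

reg_spread = spread_ns(*REG)        # ~0.163 ns
nonreg_spread = spread_ns(*NONREG)  # ~0.426 ns
mean_shift = midpoint_period_ns(*NONREG) - midpoint_period_ns(*REG)  # ~0.540 ns

print(f"register spread:     {reg_spread:.3f} ns")
print(f"non-register spread: {nonreg_spread:.3f} ns")
print(f"mean shift:          {mean_shift:.4f} ns")
```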
As expected, register pin mode has a much tighter spread: 162ps vs 426ps for non-register mode.
That register spread is quite good, about one sixth of a nanosecond across all the pins.
There is also an absolute shift in the mean sampling point of just over half a nanosecond (0.5399ns + 2 Sysclks) between the Reg and NoReg choices, with no skirt overlap, which might be useful in some testing or design cases.
Some of that 162 ps could even be due to XDIV = 20. I had to make it a larger value to achieve 1 MHz steps.
True, but the VCO only needs to be stable over a handful of sysclk cycles here, and the larger XDIV mainly increases lower-frequency noise, i.e. the variation over a few thousand sysclks.
It is useful to include any XDIV variation anyway.
If we take ~3 sysclks as ~10ns, a litmus test of 0.1% jitter variation (poor) on that maps to ~10ps.
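The litmus-test arithmetic as a small sketch. It assumes a sysclk near 300MHz, so that 3 sysclks span about 10ns, and simply treats jitter as a fixed fraction of that span.

```python
def jitter_ps(n_sysclks, f_sys_mhz, jitter_fraction):
    """Timing error (ps) if the span of n_sysclks varies by jitter_fraction."""
    span_ps = n_sysclks * 1e6 / f_sys_mhz   # one period in ps is 1e6 / f[MHz]
    return span_ps * jitter_fraction

# ~3 sysclks at ~300 MHz is ~10 ns; 0.1 % of that is ~10 ps.
print(f"{jitter_ps(3, 300, 0.001):.1f} ps")
```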
That 162ps aperture is going to 'walk' with temperature (PVT) as well.
It is nice to know what the pin-spread component is, and how register mode helps reduce that spread.
Your two plots above are at different cooling, right?
Do you have a guess/estimate of the dTemp between those 2 captures?
Did any code change between them? It may be possible to extract a ps/°C tempco indicator from the two tables.
Here is a comparison of the warm and cold plots.
There seems to be about 16ps of variance (VCO jitter?) in the waterfall patterns, i.e. the same pin is last in both plots, but the exact ripples vary by a couple of lines across the pins.
The shift due to test temperature looks to be ~143ps; turning that into a tempco needs the temperature change.
Ice packs may imply a 25°C or 30°C change?
That's a ballpark indication of 5 ps/°C (143ps / 5 ps/°C = 28.6°C), or 250ps over a 50°C operating range, or 375ps over 75°C.
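The tempco estimate is a straight ratio. A sketch with the numbers from this thread; note that the 25°C to 30°C change is only the estimate given above, not a measured value, and 28.6°C is simply the change that makes the ratio come out at exactly 5 ps/°C.

```python
def tempco_ps_per_c(shift_ps, delta_t_c):
    """Temperature coefficient from a measured shift and temperature change."""
    return shift_ps / delta_t_c

def drift_ps(tempco, range_c):
    """Aperture drift expected over an operating temperature range."""
    return tempco * range_c

SHIFT_PS = 143  # warm vs ice-pack shift read off the compared waterfalls

for dt in (25, 28.6, 30):
    print(f"dT = {dt:>4} C -> {tempco_ps_per_c(SHIFT_PS, dt):.2f} ps/C")

for span in (50, 75):
    print(f"over {span} C at 5 ps/C -> {drift_ps(5, span)} ps")
```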
The Unregistered sweep to the next aperture gives a healthy 3.15ns between the 4- and 3-sysclk change points, which roughly equals the period at 312MHz, i.e. an 'expected' result.
The puzzle here is that the Registered 6-sysclk and 5-sysclk aperture change points seem to be only 466ps apart?!
Below ~316MHz, Registered remains stable (only pin 63 changes late, at ~150MHz, due to high loading effects on that pin, and the top 6 pins are clearly different).
I'm not sure how to explain how it can pick up another sysclk with just a 466ps change in period.
This is pushing up to the top spec and well overclocked, so maybe other effects come into play?
Below 300+ MHz, the Registered delay is well behaved.
Just above 300MHz is where both Registered & UnRegistered have an aperture point, so that's probably an operating point best avoided.
Here's 1080p @ 4bpp. Can't do a pretty bird in 4bpp...
This is full screen, whereas I couldn't quite get to full screen at 8bpp.
Need 300 MHz to get my monitor to like it.
Still needed that NOP in read burst routine.
Had to reorder nibbles in video loop. Glad that option was there!
Had to lengthen the Hsync front porch to make it right.
I was looking at your code rather in depth today.
I noticed this line in most of your different display resolution sources:
'Wait for Pin_RWDS to go high
setse1 #$80+Pin_RWDS
waitse1
When I look at the P2 documentation V32, I see that setse1 has a pattern like this:
%001_PPPPPP = INA/INB bit of pin %PPPPPP rises
%010_PPPPPP = INA/INB bit of pin %PPPPPP falls
%011_PPPPPP = INA/INB bit of pin %PPPPPP changes
%10x_PPPPPP = INA/INB bit of pin %PPPPPP is low
%11x_PPPPPP = INA/INB bit of pin %PPPPPP is high
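Decoding the 9-bit value against that table shows why the $80 matters: $80 is %010_000000, the "falls" code, so $80+pin waits for a falling edge (a rising edge would be $40+pin, %001_PPPPPP). A small sketch of the decode; the pin number is a hypothetical stand-in for Pin_RWDS, and %000 codes are not in the quoted table, so the sketch rejects them rather than guess.

```python
def decode_se(value):
    """Decode a 9-bit %EEE_PPPPPP selectable-event value per the table above.

    Returns (condition, pin). Only the codes listed in the table are decoded.
    """
    if not 0 <= value < 0x200:
        raise ValueError("expected a 9-bit value")
    mode = value >> 6        # top three bits, %EEE
    pin = value & 0x3F       # low six bits, %PPPPPP
    if mode == 0b001:
        return "rises", pin
    if mode == 0b010:
        return "falls", pin
    if mode == 0b011:
        return "changes", pin
    if mode in (0b100, 0b101):  # %10x: bit 6 is don't-care
        return "is low", pin
    if mode in (0b110, 0b111):  # %11x: bit 6 is don't-care
        return "is high", pin
    raise ValueError("%000 codes are not covered by the table above")

PIN = 30  # hypothetical pin number standing in for Pin_RWDS
print(decode_se(0x80 + PIN))    # $80+pin  = %010_PPPPPP
print(decode_se(0x40 + PIN))    # $40+pin  = %001_PPPPPP
print(decode_se(0x180 + PIN))   # $180+pin = %110_PPPPPP
```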
Yep, I'd think 30 °C. I have a little PTC based weather thermometer resting on top of the chip. And the ice packs were top and bottom. Bottom one was touching the eval board. I left it that way for a while before the cold run. Thermometer read about 2.0 °C. Got a little lower in the end.
The first run, at room temperature, the thermometer was probably in the low 30's.
I've been planning to buy some fine thermocouple wire, that I'll solder to the centre underside of the board. This'll give much more direct reading. I must get it ordered ...
There is a difference in how those two event types get triggered. The edge detect version, %010, does as you are saying.
The level-detect version, %100, however, can catch you out, as it will retrigger on a held level if that level condition is still present after checking (clearing the trigger). If not managed, this will create excess unexpected triggers, which can produce duplicate or leading-type effects where the program seems to proceed instantly when it shouldn't.
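The difference between the two event types can be illustrated with a deliberately simplified model: a loop that checks the event, then clears it, once per sample. This is a sketch of the behaviour described above, not a cycle-accurate model of the P2 event hardware; the sample sequence is made up.

```python
def count_triggers(levels, mode):
    """Count event triggers seen by a loop that checks, then clears, the event.

    levels: levels[0] is the pin state before waiting starts; each later
            entry is the pin level (0 or 1) at one check.
    mode:   "falls" (edge event, like %010) or "low" (level event, like %10x).
    """
    triggers = 0
    prev = levels[0]
    for level in levels[1:]:
        if mode == "falls":
            fired = prev == 1 and level == 0   # only the 1 -> 0 transition
        elif mode == "low":
            fired = level == 0                 # re-fires while the level is held
        else:
            raise ValueError(mode)
        triggers += fired
        prev = level
    return triggers

samples = [1, 1, 0, 0, 0, 1]
print(count_triggers(samples, "falls"))  # one falling edge
print(count_triggers(samples, "low"))    # one trigger per check while held low
```

The level version's extra triggers are the "duplicate or leading" effects: a later wait returns at once because the level is still present.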
Whicker: good catch. I’ll have to fix the text and/or code.
It would be correct if said:
'Wait for Pin_RWDS to go high and then low
Maybe I should try to skew the timing somehow... On the other hand, it seems to work...
I think it works because the data for the first pixel comes after skipping at least one byte, or else it is unfortunately testing the 8 ns data-line hold time right at the margin. I think it's the former, because in WriteRamBurstSub we see the following lines:
'Latency Clocks
nop
mov i2,#27'24 'need to check that this is right...
LoopLat1
outnot #Pin_CK
djnz i2,#LoopLat1
I think the default latency is 5 clocks, and 2x latency?
I haven't looked at the ISSI datasheet.
But the latency clocking starts early, before all of the command/address has arrived: one rising edge for CA15:8 and one falling edge for CA7:0.
This is already clocked manually in the first part of the code.
So with the assumed default settings, 5-1=4 clock cycles for the first part of the latency. 4*2 = 8 transitions.
5 more clock cycles for the second part of the latency. 5*2 = 10 transitions.
So that's only 8+10=18 clock transitions.
If the default latency was 6, then it's 6-1=5, 5*2=10 transitions. Adding the second part, 6*2 = 12, gives 22 transitions to get the first data byte.
As shown in the attached image, with a latency of 4 it's a total of 14 transitions, then the data is put on, then the rising edge of the clock.
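The transition count above can be written as a small function. It follows the reasoning in this post and assumes fixed 2x latency; it is a sketch of the arithmetic, not a statement of what the ISSI part actually requires. For comparison, the latency loop quoted from WriteRamBurstSub (mov i2,#27) issues 27 outnot toggles, more than any of these counts.

```python
def transitions_to_first_data(latency, double_latency=True):
    """Clock edges (rise or fall) before the first data byte.

    The CA15:8 / CA7:0 clock cycle is already sent manually, so the first
    latency block needs (latency - 1) more cycles; with 2x latency a second
    block of `latency` cycles follows. Each clock cycle is two transitions.
    """
    cycles = latency - 1
    if double_latency:
        cycles += latency
    return 2 * cycles

for lat in (4, 5, 6):
    print(lat, transitions_to_first_data(lat))
```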
Somewhere there's something interesting going on is all I'm saying.
So despite the comment, it's waiting for a falling edge of RWDS?
Well spotted.
The clock also changes gear: from software sending the address/command part, to a faster SmartPin-generated clock for the latency 'spare clocks' region and the following data.
setbyte outb,#0,#0 'CA7..0 'lower 3 bits are A2..A0
outnot #Pin_CK
setbyte dirb,#$00,#0 'release control of buffer
'configure smartpin to run HR clock
dirl #Pin_CK
wrpin #%1_00110_0,#Pin_CK
wxpin #1,#Pin_CK 'add on every clock
mov pa,#1
shl pa,#30 'SysCLK/4 = 62.5 or 75MHz
wypin pa,#Pin_CK
dirh #Pin_CK
'prepare to load buffer using fifo
loc ptra,#@HyperBuffer
mov pa,pb
mul pa,##480 'pb is section of buffer, 0..2
add ptra,pa
wrfast #0,ptra
'calc # bytes to read
mov pa,##480
'Wait for Pin_RWDS to go high
setse1 #$80+Pin_RWDS
waitse1
nop 'need this at 300 MHz?
'read in bytes
rep #1,pa
wfbyte inb
.reploop2
The current code uses SysCLK/4, but that pushes the PLL up to 300MHz, which is OK for testing but not really shippable...
It may be possible to use SysCLK/2 and shuffle the code sections about a little, e.g. so that 'prepare to load buffer using fifo goes before 'configure smartpin to run HR clock.
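The 62.5/75MHz figures in the code comment follow from the NCO arithmetic. A sketch, assuming the smart pin's NCO frequency mode adds Y to a 32-bit accumulator every X clocks and drives the pin from the accumulator's top bit (my reading of wxpin #1 and wypin 1<<30; check the P2 documentation). The 1<<31 value for SysCLK/2 is hypothetical, not taken from the posted code.

```python
def nco_clock_mhz(sysclk_mhz, y, x=1):
    """Pin frequency of an accumulator NCO whose top bit drives the pin.

    Assumed model: Y is added to a 32-bit accumulator every X sysclks,
    so the top bit completes sysclk / X * Y / 2**32 cycles per unit time.
    """
    return sysclk_mhz / x * y / 2**32

for sysclk in (250, 300):
    quarter = nco_clock_mhz(sysclk, 1 << 30)   # current code: shl pa,#30
    half = nco_clock_mhz(sysclk, 1 << 31)      # hypothetical SysCLK/2 setting
    print(f"{sysclk} MHz sysclk: /4 -> {quarter} MHz, /2 -> {half} MHz")
```

Under this model, SysCLK/2 would reach the same 75MHz HyperRAM clock from a 150MHz sysclk instead of 300MHz.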
I notice OctaRAM floats DQS slightly longer (it goes low later) than HyperFlash floats RWDS, so maybe a pull-down mode on the pin would make it more tolerant of different RAM types.
Looking some more into the 466ps apparent clock width, I did find one mechanism that appeared when doing PCB trace-delay SPICE runs: clock ringing.
If the internal clock on the pad ring has some ringing, as in the SPICE runs, could that explain the 466ps separation measured?
I'm not sure what this could mean for things like edge-detection circuits, but I think that's not done in the pad ring?
It seems maybe it missed the first clock, but was registered on the second clock, instead?
If that were true, the change would not be 466ps?
The non-registered path does seem to have next-clock step size, which is more expected.
Maybe there is some delay between the clocking that occurs in the actual I/O pad and the internal core flops? Or, the data paths subject to the two clocking contexts?
I just did a test where I made the first two pixels in the first row different, did the fast row read, and then output the result to the serial port.
Looks good. We are getting the data read in where it is supposed to be.
I was just wondering why the 640x480x16bpp bird photo looks so good on my 1080p monitor...
Detail around the beak is much smoother than it should be at this resolution.
Turns out the monitor actually has upscaling built into it.
It's a Samsung S24E450. Not even my best monitor...
This is really good for a P2 with a VGA/HDMI focus...
Think I'm going to target 16bpp VGA for games. 1080p for sure for profi stuff.
Just tried the HyperRam code on HyperFlash. Looks to me like it has exactly the same read and write protocol.
But, got no response at all from HyperFlash…
Went back to datasheet and now I see that the CSn ball is different for HyperFlash
Looks like all the other needed pins are the same, except this one.
They made it that way to enable multi-chip package options that accommodate both a HyperFlash and a HyperRam, with their silicon dies stacked one on top of the other in a single FBGA package footprint.
Be aware that some multi-chip packaged devices can also impose a timing penalty of up to 1 ns on some or all of the signals and their relative timing.
Comments
I especially enjoyed the waterfalls, because they ease the study of when and where things start to change.
I wonder if this MHz sweep technique can give a margin indicator showing which frequencies to avoid?
So I then scan the file for ~50% of bits flipped, as a better-quality indicator of the middle of the aperture (the highest slope of change).
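Finding the ~50% point can be sketched as a crossing search over the sweep. The data format (one flipped-bit fraction per frequency step) and the numbers below are made up for illustration, and linear interpolation between the bracketing steps is my choice, not necessarily what the original scan does. The distance between an intended operating frequency and such a crossing is one possible reading of a margin indicator.

```python
def half_flip_point(sweep):
    """Frequency (MHz) where the fraction of flipped bits crosses 0.5.

    sweep: (freq_mhz, flipped_fraction) pairs sorted by frequency.
    Linear interpolation between the two bracketing samples; returns None
    if the sweep never reaches 50 %.
    """
    for (f0, p0), (f1, p1) in zip(sweep, sweep[1:]):
        if p0 == 0.5:
            return f0
        if (p0 < 0.5) != (p1 < 0.5):
            return f0 + (0.5 - p0) * (f1 - f0) / (p1 - p0)
    if sweep and sweep[-1][1] == 0.5:
        return sweep[-1][0]
    return None

# Made-up example: fraction of 60 pins that changed at each 5 MHz step.
sweep = [(350, 0 / 60), (355, 12 / 60), (360, 42 / 60), (365, 60 / 60)]
print(half_flip_point(sweep))   # about 358 MHz
```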
Maybe half line each line
So the line with $80 is basically %010_PPPPPP, i.e. the INA/INB bit of Pin_RWDS falls.
BTW: There are 62 pins in the graphs, numbered 61..0.
Actually, it's 4 pixels, two bytes...
Also looking at dual HyperRam+HyperFlash along with dual eMMC (see attached)