Cooled it down with some ice packs and got another 30 MHz at the top.
Great scans
These are the waterfalls I derive from that, and the timing apertures they imply (I dropped the upper 4 pins, as they have unusual PCB loadings)
1/379M-1/357M = 0.162ns timing aperture spread to cover 60 pins, register mode
1/327M-1/287M = 0.426ns timing aperture spread to cover 60 pins, non-register mode
TmNon-TmReg = 0.5399ns shift in mean aperture sample point, relative register to non register. (+2 Sysclks)
As expected, register pin mode has a much tighter spread: 162ps, vs 426ps in non-register mode.
That register spread is quite good, about one sixth of a nanosecond across all 60 pins.
There is also an absolute shift in mean sampling point of just over half a nanosecond (0.5399ns + 2 Sysclks), with no skirt overlap, between the Reg<->NoReg choices. That might be useful in some testing or design cases.
Some of that 162 ps could even be due to XDIV = 20. I had to make it a larger value to achieve 1 MHz steps.
True, but the VCO only needs to be stable over a handful of sysclk cycles here, and the larger XDIV mainly increases lower frequency noise, or the variation over a few thousand sysclks.
It is useful to include any XDIV variation anyway.
If we take ~ 3 sysclks at ~ 10ns, a litmus test of 0.1% jitter variation (poor) on that, maps to ~ 10ps.
That 162ps aperture is going to 'walk' with temperature (PVT) as well.
It is nice to know what the pin-spread component is & how registered helps reduce spread.
Your two plots above are at different cooling, right ?
Do you have a guess/estimate of the dTemp on those 2 captures ?
Did any code change between them ? It may be possible to extract a ps/°C indicator tempco from the two tables ?
Here is a compare of the two warmer/cold plots
There seems to be about 16ps of variance (VCO jitter?) in the waterfall patterns, ie the same pin is last in both plots, but the exact ripples vary a couple of lines across the pins
The shift due to test temperature looks to be ~ 143ps, which needs a temperature change figure to turn it into a tempco.
Icepacks may imply 25°C or 30°C change ?
That's a ballpark indication of 5 ps/°C (143ps at 5 ps/°C = 28.6°C), or over a 50°C operating range => 250ps, or over 75°C => 375ps
The Unregistered sweep to next aperture, gives a healthy 3.15ns between 4 & 3 changes, which nicely ~equals the period at 312MHz - ie an 'expected' result.
The puzzle here, is the Registered 6 sysclk & 5 sysclk apertures change points, seem to be only 466ps apart ?!
Below ~ 316MHz, Registered remains stable, (just pin 63 is late change at ~150MHz due to high loading effects on that pin, & the top 6 pins are clearly different )
Not sure how to explain how it can pick up another sysclk in just 466 ps period change ?
This is pushing up to the top-spec and well overclocked, so maybe other effects come into play ?
Below 300+ MHz, the Registered delay is well behaved.
Just above 300MHz is where both Registered & UnRegistered have an aperture point, so that's probably an operating point best avoided.
Here's 1080p @ 4bpp. Can't do a pretty bird in 4bpp...
This is full screen, whereas couldn't quite get to full screen at 8bpp.
Need 300 MHz to get my monitor to like it.
Still needed that NOP in read burst routine.
Had to reorder nibbles in video loop. Glad that option was there!
Had to add to front porch Hsync to make right.
I was looking at your code rather in depth today.
I noticed this line in most of your different display resolution sources:
'Wait for Pin_RWDS to go high
setse1 #$80+Pin_RWDS
waitse1
When I look at the P2 documentation V32, I see that setse1 has a pattern like this:
%001_PPPPPP = INA/INB bit of pin %PPPPPP rises
%010_PPPPPP = INA/INB bit of pin %PPPPPP falls
%011_PPPPPP = INA/INB bit of pin %PPPPPP changes
%10x_PPPPPP = INA/INB bit of pin %PPPPPP is low
%11x_PPPPPP = INA/INB bit of pin %PPPPPP is high
Yep, I'd think 30 °C. I have a little PTC based weather thermometer resting on top of the chip. And the ice packs were top and bottom. Bottom one was touching the eval board. I left it that way for a while before the cold run. Thermometer read about 2.0 °C. Got a little lower in the end.
The first run, at room temperature, the thermometer was probably in the low 30's.
I've been planning to buy some fine thermocouple wire, that I'll solder to the centre underside of the board. This'll give much more direct reading. I must get it ordered ...
There is a difference in how those two event types get triggered. The edge detect version, %010, does as you are saying.
The level detect version, %100, however, can catch you out, as it will retrigger on a held level if that level condition is still present after checking (clearing the trigger). If not managed, this will create excess unexpected triggers, which can produce duplicate or leading type effects where the program seems to be instant when it shouldn't be.
Whicker: good catch. I’ll have to fix the text and/or code.
It would be correct if said:
'Wait for Pin_RWDS to go high and then low
Maybe I should try to skew the timing somehow... On the other hand, it seems to work...
I think it works because the data for the first pixel is from skipping at least one byte, or it unfortunately is testing the 8 nS data line hold time right at the margin. I think it's the former, because in WriteRamBurstSub we see the following line:
'Latency Clocks
nop
mov i2,#27'24 'need to check that this is right...
LoopLat1
outnot #Pin_CK
djnz i2,#LoopLat1
I think the default latency is 5 clocks, and 2x latency?
I haven't looked at the ISSI datasheet.
But the latency clocking starts early before all of the command address has arrived. One rise for 15:8 and one fall for 7:0.
This is already clocked manually in the first part of the code.
So with the assumed default settings, 5-1=4 clock cycles for the first part of the latency. 4*2 = 8 transitions.
5 more clock cycles for the second part of the latency. 5*2 = 10 transitions.
So that's only 8+10=18 clock transitions.
If default latency was 6, then it's 6-1=5, 5*2=10 transitions. Then adding to it is 6*2 = 12. 22 transitions to get the first data byte.
As shown in the attached image, with a latency of 4, then it's a total of 14 transitions, then put the data on, then rising edge of clock.
Somewhere there's something interesting going on is all I'm saying.
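The transition counting above can be written out as a quick check. This is only a sketch of the arithmetic in this post; the function name and the `double_latency` flag are made up for illustration, and the "first latency count starts one clock early" rule is the assumption stated above, not something taken from a datasheet.

```python
def latency_transitions(latency_clocks, double_latency=True):
    """Clock transitions (edges) between the end of the command/address
    phase and the first data byte, following the counting in the post.

    Assumption from the post: the first latency count already started one
    clock before the last CA bytes were sent, so only (latency - 1) clocks
    of it remain. With 2x latency a second full latency count follows.
    """
    first_part = (latency_clocks - 1) * 2  # two edges per clock
    second_part = latency_clocks * 2 if double_latency else 0
    return first_part + second_part


for latency in (4, 5, 6):
    print(latency, "clocks ->", latency_transitions(latency), "transitions")
```

None of the three cases lands on the 27 (or 24) loaded into i2, which is the mismatch being pointed at.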
So despite the comment, it's waiting for a falling edge of RWDS?
Well spotted.
The clock also changes gear, from SW send of the addr.command part, to using a SmartPin generated faster clock, for the latency 'spare clocks' region & following data
setbyte outb,#0,#0 'CA7..0 'lower 3 bits are A2..A0
outnot #Pin_CK
setbyte dirb,#$00,#0 'release control of buffer
'configure smartpin to run HR clock
dirl #Pin_CK
wrpin #%1_00110_0,#Pin_CK
wxpin #1,#Pin_CK 'add on every clock
mov pa,#1
shl pa,#30 'SysCLK/4 = 62.5 or 75MHz
wypin pa,#Pin_CK
dirh #Pin_CK
'prepare to load buffer using fifo
loc ptra,#@HyperBuffer
mov pa,pb
mul pa,##480 'pb is section of buffer, 0..2
add ptra,pa
wrfast #0,ptra
'calc # bytes to read
mov pa,##480
'Wait for Pin_RWDS to go high
setse1 #$80+Pin_RWDS
waitse1
nop 'need this at 300 MHz?
'read in bytes
rep #1,pa
wfbyte inb
.reploop2
Current code uses SysCLK/4, but that pushes to need 300MHz PLL's, which is ok for testing, but not really shippable...
It may be possible to use SysCLK/2, and shuffle the code sections about a little, eg so 'prepare to load buffer using fifo goes before 'configure smartpin to run HR clock
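For reference, the SysCLK/4 figure follows from the WXPIN/WYPIN values in the snippet. The sketch below assumes the smart pin NCO frequency mode adds Y to a 32-bit accumulator once every X sysclks and outputs the top bit, which is consistent with the 'SysCLK/4 = 62.5 or 75MHz comment; the function name is hypothetical.

```python
def nco_clock_mhz(sysclk_mhz, y, x=1):
    """Output frequency of an NCO that adds y to a 32-bit accumulator
    every x system clocks and drives the pin from the accumulator MSB.
    (Assumed behaviour, see the note above.)"""
    return sysclk_mhz / x * y / 2**32


y_div4 = 1 << 30  # value built by: mov pa,#1 / shl pa,#30
y_div2 = 1 << 31  # what a SysCLK/2 clock would need

print(nco_clock_mhz(250, y_div4))  # HR clock at 250 MHz sysclk
print(nco_clock_mhz(300, y_div4))  # HR clock at 300 MHz sysclk
print(nco_clock_mhz(250, y_div2))  # SysCLK/2 at 250 MHz sysclk
```

So SysCLK/2 at a more comfortable 250 MHz would already give a faster HR clock than SysCLK/4 at 300 MHz, if the code timing can be made to fit.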
I notice octaRAM floats DQS slightly longer (goes low later), than HyperFLASH RQDS, so maybe a pull-down mode on the pin would tolerate RAM types more.
Looking some more into 466ps apparent clock width, I did find one mechanism that appeared in doing PCB trace delay spice runs.. and that is clock ringing.
If the internal clock on the PAD ring has some ringing, as per the spice, that could explain the 466ps separation measurement result ?
I'm not sure what this could mean for things like edge-detection circuits, but I think that's not done in the PAD Ring ?
It seems maybe it missed the first clock, but was registered on the second clock, instead?
If that was true, the change would not be 466ps ?
The non-registered path does seem to have next-clock step size, which is more expected.
Maybe there is some delay between the clocking that occurs in the actual I/O pad and the internal core flops? Or, the data paths subject to the two clocking contexts?
I just did a test where I made the first two pixels in the first row different. Did the fast row read and then output result to serial port.
Looks good. We are getting the data read in where it is supposed to be.
I was just wondering why the 640x480x16bpp bird photo looks so good on my 1080p monitor...
Detail around the beak is much smoother than it should be at this resolution.
Turns out the monitor actually has upscaling built into it.
It's a Samsung S24E450. Not even my best monitor...
This is really good for a P2 with VGA HDMI focus...
Think I'm going to target 16bpp VGA for games. 1080p for sure for profi stuff.
Just tried HyperRam code for HyperFlash. Looks to me like has same exact read and write protocol.
But, got no response at all from HyperFlash…
Went back to datasheet and now I see that the CSn ball is different for HyperFlash
Looks like all the other needed pins are the same, except this one.
They made it that way in order to enable multi-chip package options, where both can be accommodated: one HyperFlash and one HyperRam, their respective silicon dies stacked one on top of the other, using a single FBGA package footprint.
Be aware that some multi-chip packaged devices could also present an up-to 1 ns timing penalty on some or all of the signals and their respective interrelationships.
Comments
Derived from pin_echo2.txt
Registered
380 MHz 6 00000000000000000000000000000000000000000000000000000000000100
379 MHz 6 00000000000000000000000000000000000000000000000000000000000100
378 MHz 6 00000000000000000000000000000000000000000000010000000000000100
377 MHz 6 00000000001100000010000000000000000000000000010000000000000100
376 MHz 6 00000000001110000010000000000000000000000000010000000000000100
375 MHz 6 00000000001110000110000000000000000000000000010000000000000100
374 MHz 6 00000000001110000010000000000000000000000000010010000000000100
373 MHz 6 00000000001110001110000000010000000000000000110010000000000100
372 MHz 6 00000000001111111110000000010000000000000000111010000110000100
371 MHz 6 00000000001111111110010000111100000000000000111010010110001100
370 MHz 6 00010100011111111110011001111100010011000000111111110110011101
369 MHz 6 00010100011111111110011011111110110011000000111111111110011101
368 MHz 6 00010100011111111110011111111110111011000010111111111110011101
367 MHz 6 00011100011111111110111111111111111011000011111111111110011101
366 MHz 6 00011100011111111110111111111111111011000011111111111110011111
365 MHz 6 00011100011111111111111111111111111111000011111111111110011111
364 MHz 6 00011101111111111111111111111111111111010011111111111110011111
363 MHz 6 00011101111111111111111111111111111111010011111111111110011111
362 MHz 6 00011111111111111111111111111111111111110011111111111110111111
361 MHz 6 00011111111111111111111111111111111111110011111111111110111111
360 MHz 6 00011111111111111111111111111111111111110011111111111110111111
359 MHz 6 00011111111111111111111111111111111111110111111111111111111111
358 MHz 6 00011111111111111111111111111111111111110111111111111111111111
357 MHz 6 00011111111111111111111111111111111111111111111111111111111111 << ignore upper pins
356 MHz 6 00011111111111111111111111111111111111111111111111111111111111
355 MHz 6 00111111111111111111111111111111111111111111111111111111111111
354 MHz 6 01111111111111111111111111111111111111111111111111111111111111
Unregistered
328 MHz 4 00000000000000000000000000000000000000000000000000000000000000
327 MHz 4 00000000000000000000010001000000000000000000000000000000000000
...
297 MHz 4 00000011011111111111111111111110111111111111101111101111101100
296 MHz 4 00001011011111111111111111111111111111111111111111111111111100
295 MHz 4 00001011011111111111111111111111111111111111111111111111111100
294 MHz 4 00001011011111111111111111111111111111111111111111111111111100
293 MHz 4 00001011111111111111111111111111111111111111111111111111111100
292 MHz 4 00001011111111111111111111111111111111111111111111111111111100
291 MHz 4 00001011111111111111111111111111111111111111111111111111111100
290 MHz 4 00001011111111111111111111111111111111111111111111111111111101
289 MHz 4 00001011111111111111111111111111111111111111111111111111111111
288 MHz 4 00001011111111111111111111111111111111111111111111111111111111
287 MHz 4 00001111111111111111111111111111111111111111111111111111111111
1/379M-1/357M = 0.162ns timing aperture spread to cover all pins, register mode
1/327M-1/287M = 0.426ns timing aperture spread to cover all pins, non-register mode
TmReg = 1/((379M+357M)/2) = 2.717391304347826087e-9
TmNon = 1/((327M+287M)/2) = 3.2573289902280130293e-9
TmNon-TmReg = 0.5399ns shift in mean aperture sample point, relative register to non register. (+ 2 sysclks)
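For anyone wanting to redo these numbers on their own scans, here is a small sketch of the same arithmetic. The function names are invented for illustration; the frequencies are the first-flip and last-flip rows read off the tables above (379/357 MHz registered, 327/287 MHz unregistered).

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def aperture_spread_ns(f_first_mhz, f_last_mhz):
    """Period difference between the first pin flipping and the last."""
    return abs(period_ns(f_first_mhz) - period_ns(f_last_mhz))


def mean_sample_point_ns(f_first_mhz, f_last_mhz):
    """Period at the mid frequency of the flip range (TmReg / TmNon)."""
    return period_ns((f_first_mhz + f_last_mhz) / 2.0)


reg_spread = aperture_spread_ns(379, 357)
non_spread = aperture_spread_ns(327, 287)
tm_reg = mean_sample_point_ns(379, 357)
tm_non = mean_sample_point_ns(327, 287)
shift = tm_non - tm_reg

print(f"registered spread   {reg_spread:.3f} ns")
print(f"unregistered spread {non_spread:.3f} ns")
print(f"TmNon - TmReg       {shift:.4f} ns (+ 2 sysclks)")
```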
I especially enjoyed the waterfalls, because they ease the study of when and where things start to change.
pin_echo2.txt Registered COLD
363 MHz 6 00011101111111111111111111111111111111010011111111111110011111 <<
362 MHz 6 00011111111111111111111111111111111111110011111111111110111111
361 MHz 6 00011111111111111111111111111111111111110011111111111110111111
360 MHz 6 00011111111111111111111111111111111111110011111111111110111111
359 MHz 6 00011111111111111111111111111111111111110111111111111111111111
358 MHz 6 00011111111111111111111111111111111111110111111111111111111111
357 MHz 6 00011111111111111111111111111111111111111111111111111111111111
^- last pin
pin_echo.txt Registered WARMER
347 MHz 6 00011101111111111110111111111111111111010011111111111110011111
346 MHz 6 00011101111111111110111111111111111111010011111111111110111111
345 MHz 6 00011111111111111111111111111111111111010011111111111110111111 <<
344 MHz 6 00011111111111111111111111111111111111110011111111111111111111
343 MHz 6 00111111111111111111111111111111111111110111111111111111111111
342 MHz 6 01111111111111111111111111111111111111110111111111111111111111
341 MHz 6 01111111111111111111111111111111111111110111111111111111111111
340 MHz 6 01111111111111111111111111111111111111111111111111111111111111
^- last pin
General wobble in pins: 1/347M-1/345M = 16.706 ps
d Aperture / dT ?°C: 1/363M-1/345M = 143 ps
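A rough way to turn the cold/warm comparison into a ps/°C figure is sketched below. The 30 °C delta is an assumption taken from the ice-pack estimate elsewhere in the thread, not a measured value, and the variable names are made up.

```python
def period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


# Matching '<<' rows in the two tables: 363 MHz cold, 345 MHz warmer.
shift_ps = period_ps(345) - period_ps(363)

# Ripple between near-identical rows of the warmer run (347 vs 345 MHz).
wobble_ps = period_ps(345) - period_ps(347)

ASSUMED_DELTA_T_C = 30.0  # assumption: ice packs gave roughly a 30 °C drop
tempco_ps_per_c = shift_ps / ASSUMED_DELTA_T_C

print(f"shift  {shift_ps:.1f} ps, wobble {wobble_ps:.1f} ps")
print(f"tempco {tempco_ps_per_c:.2f} ps/°C")
for span_c in (50, 75):
    print(f"over {span_c} °C: about {tempco_ps_per_c * span_c:.0f} ps")
```

With a 30 °C delta this comes out a little under 5 ps/°C, so the 250ps and 375ps ballparks for 50 °C and 75 °C ranges hold up as round numbers.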
I wonder if this MHz sweep technique can give a margin indicator to show where to avoid ?
So I then scan the file for ~50% bits flipped, as a better quality indicator of the middle of the aperture (highest slope of change)
pin_echo2.txt Registered COLD
372 MHz -> 6 : 17 1's
371 MHz -> 6 : 23 1's
317 MHz -> 5 : 15 1's
315 MHz -> 5 : 25 1's
316 MHz -> 5 : 29 1's - remains at 5 : 1's in bottom 58 pins, down to 80MHz test stop.
1/372M-1/317M = 0.466ns ??
312MHz Unregistered -> 4 : 17 1's
158MHz Unregistered -> 3 : 14 1's
1/158M-1/315M = 3.1545ns - more expected spacing
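The scan itself is just a count of 1's per row. A minimal sketch, using the 372 MHz and 371 MHz registered rows copied from the table earlier in the thread; the helper names are hypothetical, and parsing of the real pin_echo2.txt file is left out since its exact layout isn't shown here.

```python
ROWS = {
    372: "00000000001111111110000000010000000000000000111010000110000100",
    371: "00000000001111111110010000111100000000000000111010010110001100",
}


def ones_count(bits):
    """Number of pins that have flipped (reported as 1) in one scan row."""
    return bits.count("1")


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


counts = {mhz: ones_count(bits) for mhz, bits in ROWS.items()}
for mhz, n in counts.items():
    print(f"{mhz} MHz -> {n} 1's of {len(ROWS[mhz])} pins")

# Spacing between the registered 6-sysclk and 5-sysclk change points.
reg_step_ns = period_ns(317) - period_ns(372)
# Spacing between the unregistered 4-sysclk and 3-sysclk change points.
unreg_step_ns = period_ns(158) - period_ns(315)
print(f"registered step {reg_step_ns:.3f} ns, unregistered {unreg_step_ns:.4f} ns")
```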
Had to add to front porch Hsync to make right.
Maybe half line each line
I noticed this line in most of your different display resolution sources:
'Wait for Pin_RWDS to go high
setse1 #$80+Pin_RWDS
waitse1
When I look at the P2 documentation V32, I see that setse1 has a pattern like this:
so the line with $80 is basically:
setse1 #%1000_0000+Pin_RWDS 'same as: setse1 #%010_000000+Pin_RWDS
So despite the comment, it's waiting for a falling edge of RWDS?
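To double-check the decode, here is a sketch that splits a setse1 operand into the 3-bit event field and 6-bit pin number, using only the %eee_PPPPPP patterns listed from the documentation. The pin number 20 is a placeholder, since the real Pin_RWDS value isn't shown in the thread, and the function name is made up.

```python
PIN_RWDS = 20  # hypothetical pin number, for illustration only


def decode_pin_event(value):
    """Split a 9-bit %eee_PPPPPP selector into (event, pin)."""
    event_bits = (value >> 6) & 0b111
    pin = value & 0b111111
    if event_bits == 0b001:
        event = "rises"
    elif event_bits == 0b010:
        event = "falls"
    elif event_bits == 0b011:
        event = "changes"
    elif event_bits in (0b100, 0b101):
        event = "is low"
    elif event_bits in (0b110, 0b111):
        event = "is high"
    else:
        event = "not a pin event"  # %000 selects other event sources
    return event, pin


print(decode_pin_event(0x80 + PIN_RWDS))  # what the code currently arms
print(decode_pin_event(0x40 + PIN_RWDS))  # what the comment describes
```

So $80 sets the %010 field (falls), while a rising-edge wait would need $40.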
BTW: There are 62 pins in the graphs, numbered 61 .. 0.
Actually, it's 4 pixels, two bytes...
Also looking at dual HyperRam+HyperFlash along with dual eMMC (see attached)