@Scroungre said:
Just in the hardware. It seemed mostly okay, off and on, up to about 150MHz, sysclk/2, registered clk pin, but after that it was just hopeless. Sysclk/1 was worse. This is completely unacceptable for my application - looks like I'm going back to SRAM and a kajillion pins again.
To go above 150 MHz sysclock/2 the delay compensation value has to be increased by one. And increased again at even higher frequencies.
The layout I suspect is not good - I'm using a P2 Edge on a Jon McPhalen "breadboard" adapter, and the module is plugged into 'base' pin 0 (closest to the chip!). I have not stuck a 22pF capacitor on it.
The added capacitor is only to allow writes at sysclock/1. Adding it actually makes reads less reliable. Writes are rock solid at sysclock/2, across all frequencies, without the capacitor.
Yeah that is interesting @Scroungre. The JonnyMac board potentially has much longer trace lengths than the P2-EVAL, which is where all the HyperRAM driver development and testing has taken place and is also what determined the default timing profile.
The memory timing might be tweakable if you run the included delay test program to find the delay breakpoints in your setup and adjust the driver code accordingly in the area of code shown below in memory.spin2. I typically try to have them transition in the center of the overlapping bands which is why I like to see the percentage of transfers that fail tapering off rather than a simple binary good/bad result in the output.
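As a rough illustration of "transition in the center of the overlapping bands", here is a small Python sketch (host-side, not driver code). It assumes you have already read off, for each delay value, the lowest and highest frequency at which the delay test passes; the function name and the example numbers are invented for illustration and are not measurements from any board.

```python
def build_delay_profile(bands):
    """Turn per-delay working bands into a delay profile list.

    bands maps delay value -> (lowest_ok_hz, highest_ok_hz), as read off
    a delay test report. The result uses the driver's profile layout:
    initial delay, ascending breakpoint frequencies, terminating zero.
    """
    delays = sorted(bands)
    profile = [delays[0]]
    for cur, nxt in zip(delays, delays[1:]):
        if nxt != cur + 1:
            raise ValueError(f"delay {cur} has no neighbour {cur + 1}")
        overlap_lo = bands[nxt][0]  # next delay starts working here
        overlap_hi = bands[cur][1]  # current delay stops working here
        if overlap_lo > overlap_hi:
            raise ValueError(f"delays {cur} and {nxt} do not overlap")
        # switch delays in the middle of the region where both work
        profile.append((overlap_lo + overlap_hi) // 2)
    profile.append(0)
    return profile


# Invented numbers for illustration only, not real test results.
example_bands = {
    6: (20_000_000, 100_000_000),
    7: (80_000_000, 150_000_000),
    8: (120_000_000, 200_000_000),
}
print(build_delay_profile(example_bands))
```

If two neighbouring delay values never overlap in your report, there is no safe breakpoint between them, which is why the sketch raises an error rather than guessing.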
If that doesn't help at all in the different plug-in locations I think what will be needed for the JonnyMac will be to use the upcoming PSRAM based Edge board, and not the HyperRAM breakout.
With HyperRAM 1.0 devices running at 3.3V, clocking it at anything over 100MHz operation (which is a P2 at 200MHz with sysclk/1 DDR operation) is actually overclocking it, but at room temps I have found typically we can push it to about 150MHz (300MHz P2) before it completely fails (on the P2-EVAL board).
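The clock arithmetic here can be checked with a few lines of Python. The sysclk/1 figures (200MHz P2 gives a 100MHz HyperRAM clock, 300MHz gives 150MHz) come from the post; the sysclk/2 case is my assumption that the same DDR relationship holds with the transfer rate halved. The function and constant names are made up for this sketch.

```python
HYPERRAM_RATED_HZ = 100_000_000  # HyperRAM 1.0 at 3.3V, per the post above


def hyperram_clock_hz(sysclk_hz, divider):
    """HyperRAM clock for a given P2 sysclk and transfer divider (1 or 2).

    The bus is DDR, so two transfers happen per HyperRAM clock cycle:
    at sysclk/1 the memory clock is sysclk/2, and (assumed) at sysclk/2
    it is sysclk/4.
    """
    return sysclk_hz // (2 * divider)


def is_overclocked(sysclk_hz, divider):
    return hyperram_clock_hz(sysclk_hz, divider) > HYPERRAM_RATED_HZ


for sysclk in (200_000_000, 300_000_000):
    for divider in (1, 2):
        print(sysclk, divider, hyperram_clock_hz(sysclk, divider),
              is_overclocked(sysclk, divider))
```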
{
Below are the P2 frequency bands with the input timing delay & registered data pins correction
factors to get the memory reads working on the P2. This timing is potentially also affected by
temperature/process/voltage etc, and may need further tweaking in a given setup or whenever non
room temp operating conditions are experienced - YMMV.
Thanks go to @ozpropdev and @evanh for experimentally determining operating ranges that helped
figure some of this out below!
Default delay profiles used for HyperFlash and HyperRAM on P2-EVAL HyperRAM breakout board
operating at room temp. This can be tweaked or others added for different temperatures.
These delay profiles can be assigned to each configured device at address mapping time.
The actual operating input delay can also be adjusted on the fly per bank if the variation
of delay with temperature is already determined and the temperature is known/measurable.
}
'sysclk/1
HyperRamDelays1 long 6,92_000000,135_000000,188_000000,234_000000,280_000000,308_000000,0
HyperRamDelaysUnreg1 long 6,88_000000,120_000000,180_000000,225_000000,270_000000,305_000000,0
'sysclk/2
HyperRamDelays2 long 7,92_000000,135_000000,188_000000,232_000000,280_000000,307_000000,0
HyperRamDelaysUnreg2 long 7,89_000000,135_000000,180_000000,222_000000,266_000000,303_000000,0
HyperFlashDelays1 long 5,70_000000,110_000000,160_000000,225_000000,277_000000,320_000000,0
HyperFlashDelaysUnreg1 long 5,70_000000,105_000000,150_000000,210_000000,260_000000,315_000000,0
HyperFlashDelays2 long 6,70_000000,110_000000,160_000000,225_000000,277_000000,320_000000,0
HyperFlashDelaysUnreg2 long 6,70_000000,105_000000,150_000000,210_000000,260_000000,315_000000,0
PSRamDelays long 7,92_000000,150_000000,206_000000,258_000000,310_000000,333_000000,0
' SRAM delay as yet untested
SRamDelays long 4,26_000000,92_000000,150_000000,206_000000,258_000000,310_000000,333_000000,0
{
The profile format begins with the initial delay value, followed by frequencies at which the
delay is sequentially increased until either it falls below the next frequency, or the list
terminates with a zero. Frequencies must be stored in increasing order.
e.g. using HyperRam data above for sysclk/1 unregistered clock:
if 0 <= freq < 88000000 Hz, the delay compensation value is 6,
if 88000000 <= freq < 120000000 Hz, the delay compensation value is 7,
if 120000000 <= freq < 180000000 Hz, the delay compensation value is 8,
...etc...
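The profile format described in this comment can be modelled in a few lines of Python. This is only a sketch of the lookup rule as the comment states it, not the driver's actual Spin2/PASM code; `lookup_delay` is a name invented here, and the list is the HyperRamDelaysUnreg1 data from the table.

```python
# HyperRamDelaysUnreg1 from the table: initial delay, breakpoints, zero terminator
HYPERRAM_DELAYS_UNREG1 = [6, 88_000_000, 120_000_000, 180_000_000,
                          225_000_000, 270_000_000, 305_000_000, 0]


def lookup_delay(profile, freq_hz):
    """Return the delay compensation value for freq_hz.

    Start from the initial delay and add one for every breakpoint the
    frequency has reached, stopping at the first breakpoint above the
    frequency or at the terminating zero.
    """
    delay = profile[0]
    for threshold in profile[1:]:
        if threshold == 0 or freq_hz < threshold:
            break
        delay += 1
    return delay


for f in (50_000_000, 88_000_000, 150_000_000, 310_000_000):
    print(f, lookup_delay(HYPERRAM_DELAYS_UNREG1, f))
```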
Oh, oops, right, the JonnyMac may be a bad choice. The signal is probably rounding off and fading away (attenuating) too much. Maybe 75 MHz is as good as that can do.
@rogloh said:
If that doesn't help at all in the different plug-in locations I think what will be needed for the JonnyMac will be to use the upcoming PSRAM based Edge board, and not the HyperRAM breakout.
I haven't seen a schematic for the RAM&Edge board. The Prop2 pins used for the RAMs better not also be routed to the edge connector.
Using rogloh's test software the 'delay' value runs up to thirteen, but it doesn't help. I poked around a little looking at some of the default values, and got confused. I'll poke around a bit more. S.
Here's what I see when I run the delay test on a JonnyMac (at base pin 0) vs P2-EVAL (at base pin 16). Exact same test image and HyperRAM board, but I see problems on the JonnyMac. Looks like poorer signal integrity on the JonnyMac board. First result in the file below is the JonnyMac, the second is the P2-EVAL.
Well the traces from the Edge connector to the P0-P15 pins are already up to 3-4 inches long and skew might be a problem.
If you need high speed with parallel busses it might be better to use the P2-EVAL. Not sure the JonnyMac was designed with high speed signal integrity in mind.
That's almost exactly what I see too. I can post my results if you still want me to, but there's not much point, in my opinion. S.
eta: PS, you're probably right, that it's better to use an Eval for this - or even better, a P2 Edge with PsRAM on-board!! (hint hint? ;-) S.
Yeah don't worry about it if it looks about the same as mine. Will very likely be the same issue.
A PSRAM Edge will likely be the way to go for the JonnyMac breadboard users if they want lots of fast external memory, once it becomes available. It should work pretty well with video in particular.
Also if you have a P2 EVAL, building your own PSRAM based breakout can be pretty inexpensive too and easy to solder as I have discovered. Those memories are less than $2 USD from Adafruit or Mouser etc.
I was wondering about fiddling with your 'mailbox' format.
In the application I have in mind each (of four or five) "calculation" cog will want to collect three or four longs from external RAM on a fairly regular basis - and it would be nice to have one custom request that asks for several (possibly non-contiguous) longs from the external RAM.
Writing can be done one long at a time, slowly, offline.
Perhaps (up to a limit: there's actually quite a lot of Hub RAM) there would be a way for a cog to specify several external RAM addresses at once, and expect back data from all of them once the cog doing the transfer is done.
I was also going to put ring buffers in the "calculation" cogs to help keep up with the "eggbeater". This isn't video - that's your department ;-) Just systems control.
So adjustable 'mailbox' sizes would be nice. Anyhow - I guess if I want it, I should get on with writing it, huh? ;-) (I can save some space because I don't care about bytes and words - I'm doing everything in 32 bits). S.
@Scroungre said:
I was wondering about fiddling with your 'mailbox' format.
In the application I have in mind each (of four or five) "calculation" cog will want to collect three or four longs from external RAM on a fairly regular basis - and it would be nice to have one custom request that asks for several (possibly non-contiguous) longs from the external RAM.
Writing can be done one long at a time, slowly, offline.
Perhaps (up to a limit: there's actually quite a lot of Hub RAM) there would be a way for a cog to specify several external RAM addresses at once, and expect back data from all of them once the cog doing the transfer is done.
Yeah that type of operation is supported and I do have the ability to do this in the driver. Instead of a single memory request you can construct a linked list of n independent requests ahead of time and just pass the address of the head of this request list to the memory driver and it will work its way through the list sequence and notify you at the end, instead of one at a time. You can either block on the overall list completion, or instead run this list in a non-blocking mode where the COG that triggers the request can go off to do other work in the meantime and optionally check the result later whenever it is convenient, sometime before the next request list needs to be issued.
Note: When running in a list I think you need to set up burst/block transfers instead of single-item reads. But you can still set burst transfer sizes matching single bytes/words/longs in your burst request lengths, so there is no restriction on transferring these standard item sizes into your arbitrary hub addresses.
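To make the request-list idea concrete, here is a conceptual Python model. It is not the driver's real mailbox or list-item layout, which this thread doesn't spell out: `ReadRequest`, `run_request_list` and `gather_longs` are hypothetical names, and memory is modelled as plain bytearrays. It only shows the shape of the idea: several small burst reads from non-contiguous external addresses, chained together, handed over by their head, and completed as one unit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadRequest:
    """One hypothetical list item: a burst read of `length` bytes."""
    ext_addr: int                      # source address in external RAM
    hub_addr: int                      # destination address in hub RAM
    length: int                        # burst length in bytes (4 = one long)
    next: Optional["ReadRequest"] = None


def run_request_list(head, ext_mem, hub_mem):
    """Walk the list from its head, doing each transfer in order.

    Returns the number of requests completed, standing in for the single
    end-of-list notification.
    """
    done = 0
    item = head
    while item is not None:
        src = ext_mem[item.ext_addr:item.ext_addr + item.length]
        hub_mem[item.hub_addr:item.hub_addr + item.length] = src
        done += 1
        item = item.next
    return done


def gather_longs(ext_addrs, hub_base):
    """Build a list that gathers scattered longs into one hub buffer."""
    head = None
    for i in reversed(range(len(ext_addrs))):
        head = ReadRequest(ext_addrs[i], hub_base + 4 * i, 4, head)
    return head


ext = bytearray(range(256))            # stand-in for external RAM
hub = bytearray(32)                    # stand-in for hub RAM
count = run_request_list(gather_longs([0x10, 0x80, 0x44], 8), ext, hub)
print(count, hub[8:20].hex())
```

In the non-blocking mode described above, the calling cog would do other work between handing over the head and checking for completion; this model just runs the list synchronously.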
Hm. Dunno if that seemed useful for just three longs, but I guess that counts as 12 bytes, so... I'll tinker with it for awhile. It will be definitely useful for loading up the RAM in the first place. Thanks! S.
I bought a KISS0000 board and built a little adapter for it so I could use the Parallax P2 HyperFlash / HyperRam card I had, and ran some more memory tests on it.
The results were wonderful! Far, far, better than on the Jon McPhalen breadboard. 100% up to 300MHz!! (for some reason, my KISS board doesn't like anything over 300MHz. 350 seems 'right out'. Dunno why - will ask on the correct threads).
But yeah, here's my results up to 300MHz - and they're 100% at the top! Yay! S.
That's a decent result, even via your adapter. The shorter traces on the KISS0000 board obviously help a lot. With its long path lengths the JonnyMac board is not really designed for high speed operation. You should try the sysclk/1 option as well to see if that can be achieved at any frequencies, but it pushes the timing much harder so it might not work as well in your setup. Its performance might also vary with module position on the KISS0000 board if there are differences in data bus path lengths in different positions.
Pandemic lockdown demotivation (~231 days and counting since March last year) has slowed things down and I've been messing about quite a bit with my P2ME2 and VOYAGER boards in my spare time lately but I know I want to get back and release my video driver update that includes the official external HyperRAM and PSRAM capability that it already supports. This driver will also help when the updated P2Edge board comes along with PSRAM and make it easy to use that system with external video frame buffers, or other applications.
Both interesting thoughts, but if it's perfect as it is (which I'm not sure - Instead of "percentage of successful read/write cycles" I'd rather see "Errors in [say] 10,000 cycles") there's not much point in trying to make it better. Unless I can coax the KISS to 350 and find errors thereby.
I'll try your suggestions anyhow - even if it makes it worse, that's useful information. Oddly enough, unlike the Edge breadboard, the headers on the KISS are equally spaced, so I can move down eight data pins at a time (if I stick in some more headers). I don't think the code will like that, though. Maybe I can tweak it a little - more news to come later. Hafta go do carpentry now.
You're right. I didn't even look at your supplied report file. The fails you are getting above 300 MHz are sounding like a chip reset rather than hyper timing limits. Your report, when compared to the Eval Board, shows there is more room for tight integration. The new Edge Board with RAM should be highly reliable at all clock rates.
It does reek a bit of a chip reset. The RAM test walks up through the frequencies, and if you go over 300MHz it clears the serial terminal window! Rude, that! Can't cut'n'paste at all, let alone logfile it (at least, not with the Propeller Serial Terminal). It also seems a really 'hard' limit - just a bit over, nothing, at 300 exactly, everything's working fine.
If it were weird power or bum caps or something, I'd expect more "flakiness" - more of a 'fuzzy' limit, where below it you get occasional errors and above it you get occasional success.
Looking forward to the Edge with RAM, although the KISS is a more convenient form factor for me. That's just me... S.
I don't see what would be intentionally fitted in place to cause a hard P2 reset above 300MHz. If it is thermal or some current limit on the KISS board then it should also probably be load dependent and idling a single COG at 300MHz shouldn't hit that limit so you could write something that just clocks it at over 300MHz and doesn't do much else and see if that works without failure. Also you could monitor the reset pin too when you encounter this. My own P2ME2 board can reset the P2 if the regulator power good signal is driven low indicating the 1.8V voltage is out of range or for a UVLO condition, maybe your KISS board does something similar.
I think the next gen EDGE should help resolve your problems when it arrives assuming it's all designed well timing wise (should be given its small size) and its power design is solid. The P2-EVAL also works reasonably well (apart from use of P28-P31 for high speed stuff).
@Scroungre said:
... and if you go over 300MHz it clears the serial terminal window! Rude, that! Can't cut'n'paste at all, let alone logfile it (at least, not with the Propeller Serial Terminal). It also seems a really 'hard' limit - just a bit over, nothing, at 300 exactly, everything's working fine.
That's even odder than I was thinking. Time to verify the clock frequencies. Here's a quick test I've thrown together. Compiled with Flexspin. It requires an interactive terminal. It starts at 300 MHz and each key-press thereafter will bump the frequency by 1 MHz.
Monitor accuracy with an oscilloscope (or frequency counter) on pin P56. Pulse rate is sysclock/1000 so 300 MHz should read 300 kHz.
Clock frequencies verified! I ran your 25MHz code and looked at it on a 'scope, and it does indeed run quite well at up to nearly 400! (yeah, by then it gets weird, but that's to be expected.) It's fine and does indeed display 350kHz when the serial port claims it's running at 350MHz.
Weird. With a fair amount of hackwork I managed to get your code to compile under the Propeller Spin Tool (the remaining difference!) instead of the FlexSpin compiler, and it worked okay there too.
Running Rogloh's driver code through Flexspin got me this note at the end:
Aha, in theory that is because it is trying to go in steps of 1MHz but couldn't find a valid set of values using a 25MHz crystal. But it should be able to if it divides by 25 then multiplies by 300. I'll need to look at that code.
Here's the code I use... for a given desired output frequency it should compute the clock mode value for the PLL setup, or fail with a zero returned if it couldn't find a match.
CON
' clock source used below
#0, CLKSRC_XTAL, CLKSRC_XIN
' setup one of these based on your P2 HW input clock,
' this will only be used if the PLL settings get automatically computed (see code below)
'CLKIN_HZ = _xtalfreq ' also only enable CLKSRC_XTAL below as CLKSRC
'CLKIN_HZ = _xinfreq ' also only enable CLKSRC_XIN below as CLKSRC
CLKIN_HZ = 20000000 ' assume 20MHz crystal by default
CLKSRC = CLKSRC_XTAL ' enable this for crystal clock source (default)
'CLKSRC = CLKSRC_XIN ' enable this for direct input clock source on XI (no crystal)
' parameters used when automatically determining PLL settings
TOLERANCE_HZ = 500000 ' pixel clock accuracy will be constrained by this when no exact ratios are found
MAXVCO_HZ = 350000000 ' for safety, but you could try to overclock even higher at your own risk
MINVCO_HZ = 100000000
MINPLLIN_HZ = 500000 ' setting lower can find more PLL ratios but may begin to introduce more PLL jitter
PRI computeClockMode(desiredHz) : mode | vco, finput, fval, p, div, m, error, bestError
    bestError := -1
    repeat p from 0 to 30 step 2
        ' compute the ideal VCO frequency fval at this value of P
        if p <> 0
            if desiredHz > MAXVCO_HZ/p ' test it like this to not overflow
                quit
            fval := desiredHz * p
        else
            fval := desiredHz
            if fval > MAXVCO_HZ
                quit
        ' scan through D values, and find best M, retain best case
        repeat div from 1 to 64
            'compute the PLL input frequency from the crystal through the divider
            finput := CLKIN_HZ/div
            if finput < MINPLLIN_HZ ' input getting too low, and only gets lower so quit now
                quit
            ' determine M value needed for this ideal VCO frequency and input frequency
            m := fval / finput
            ' check for the out of divider range case
            if m +> 1024
                quit
            ' zero is special and gets a second chance
            if m == 0
                m++
            ' compute the actual VCO frequency at this particular M, D setting
            vco := finput * m
            if vco +< MINVCO_HZ
                quit
            if vco +> MAXVCO_HZ
                next
            ' compute the error and check next higher M value if possible, it may be closer
            error := abs(fval - vco)
            if m < 1024 and (vco + finput) +< MAXVCO_HZ
                if error > abs(fval - (vco + finput))
                    error := abs(fval - (vco + finput))
                    m++
            ' retain best allowed frequency error and divider bits found so far
            if error +< bestError and error +< TOLERANCE_HZ+1
                bestError := error
                mode := ((div-1) << 18) + ((m-1) << 8) + (((p/2 - 1) & $f) << 4)
                ' quit whenever perfect match found
                if bestError == 0
                    quit
        if bestError == 0
            quit
    ' final clock mode format is this #%0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS
    if mode
        ' also set 15 or 30pF capacitor loading based on input crystal frequency
        mode |= (1<<24) ' enable PLL
        if (CLKSRC == CLKSRC_XTAL) ' enable oscillator and caps for crystal
            mode |= (CLKIN_HZ < 16000000) ? %1111 : %1011
        else
            mode |= %0111 ' don't enable oscillator
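For anyone who wants to poke at the PLL search on a PC, here is my Python transcription of the method above. It is a sketch, not the driver source: the nesting follows my reading of the pasted code, Spin2's unsigned compares become ordinary compares on non-negative ints, and `decode_clock_hz` is a helper added here that unpacks the D/M/P fields of the mode word (PPPP = %1111 meaning VCO/1, otherwise VCO/(2*(PPPP+1))).

```python
TOLERANCE_HZ = 500_000
MAXVCO_HZ = 350_000_000
MINVCO_HZ = 100_000_000
MINPLLIN_HZ = 500_000


def compute_clock_mode(desired_hz, clkin_hz=25_000_000, use_crystal=True):
    """Return a P2 clock mode word for desired_hz, or 0 if none is found."""
    mode = 0
    best_error = None                     # stands in for bestError := -1
    for p in range(0, 31, 2):
        if p != 0:
            if desired_hz > MAXVCO_HZ // p:
                break
            fval = desired_hz * p         # ideal VCO frequency at this P
        else:
            fval = desired_hz
            if fval > MAXVCO_HZ:
                break
        for div in range(1, 65):
            finput = clkin_hz // div      # PLL input after the divider
            if finput < MINPLLIN_HZ:
                break
            m = fval // finput
            if m > 1024:
                break
            if m == 0:
                m = 1
            vco = finput * m
            if vco < MINVCO_HZ:
                break
            if vco > MAXVCO_HZ:
                continue
            error = abs(fval - vco)
            # the next higher M may land closer to the target
            if m < 1024 and vco + finput < MAXVCO_HZ:
                if error > abs(fval - (vco + finput)):
                    error = abs(fval - (vco + finput))
                    m += 1
            if (best_error is None or error < best_error) and error <= TOLERANCE_HZ:
                best_error = error
                mode = ((div - 1) << 18) + ((m - 1) << 8) + (((p // 2 - 1) & 0xF) << 4)
                if best_error == 0:
                    break
        if best_error == 0:
            break
    if mode:
        mode |= 1 << 24                   # enable PLL
        if use_crystal:
            mode |= 0b1111 if clkin_hz < 16_000_000 else 0b1011
        else:
            mode |= 0b0111
    return mode


def decode_clock_hz(mode, clkin_hz=25_000_000):
    """Recover the output frequency from the D, M and P fields."""
    div = ((mode >> 18) & 0x3F) + 1
    mul = ((mode >> 8) & 0x3FF) + 1
    pppp = (mode >> 4) & 0xF
    post = 1 if pppp == 15 else 2 * (pppp + 1)
    return clkin_hz * mul // div // post


for mhz in range(295, 311):
    mode = compute_clock_mode(mhz * 1_000_000)
    print(mhz, hex(mode), decode_clock_hz(mode) if mode else None)
```

With a 25MHz crystal, a 1MHz step size needs the divide-by-25 setting mentioned earlier, so the loop has to run deep into the `div` range before it finds an exact match.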
Just tried a quick test program calling the method above with settings for both 20MHz and 25MHz crystals. The code seems to work in both cases and does not return zero when it hits 301MHz. I'm not sure what is happening in your setup. I am using an older 5.3.1 version of FlexSpin in case that makes a difference. If you can run this same test code I posted above in your setup it would be worth comparing the results @Scroungre.
OBJ
uart : "SmartSerial"
f : "ers_fmt"
PUB main() | fr, v
    'setup serial port output
    uart.start(BAUD)
    send := @uart.tx
    f.nl()
    send("PLL test program using crystal operating at ",f.dec(CLKIN_HZ),"Hz",13,10)
    repeat fr from 295000000 to 310000000 step 1000000
        v := computeClockMode(fr)
        f.dec(fr)
        send(" - ")
        f.hex(v)
        f.nl()
    repeat
Better! But not quite perfect... I cut and pasted your "PRI computeClockMode(desiredHz)" code and replaced the old method with that, and it ran happily up to about 348MHz before complaining about a 'set RAM Delay failed'. See attached text.
I ran all this through FlexProp 5.2. (No Optimization). It seems to behave the same with the Propeller Tool 2.5.3, except that clears the screen when it exits, so I can't cut and paste the results.
My 'CON' block values are:
CON
_xtlfreq = 25_000_000
_clkfreq = 100000000
RAM_START = $00_000000
FLASH_START = $02_000000
FLASH_SIZE = 32*1024*1024
BAUD = 115200
BUFSIZE = 128
' clock source used below
#0, CLKSRC_XTAL, CLKSRC_XIN
' setup one of these based on your P2 HW input clock,
' this will only be used if the PLL settings get automatically computed (see code below)
'CLKIN_HZ = _xtalfreq ' also only enable CLKSRC_XTAL below as CLKSRC
'CLKIN_HZ = _xinfreq ' also only enable CLKSRC_XIN below as CLKSRC
' CLKIN_HZ = 20000000 ' assume 20MHz crystal by default
CLKIN_HZ = 25000000 ' 25MHz crystal on this board
CLKSRC = CLKSRC_XTAL ' enable this for crystal clock source (default)
'CLKSRC = CLKSRC_XIN ' enable this for direct input clock source on XI (no crystal)
' parameters used when automatically determining PLL settings
TOLERANCE_HZ = 500000 ' pixel clock accuracy will be constrained by this when no exact ratios are found
MAXVCO_HZ = 350000000 ' for safety, but you could try to overclock even higher at your own risk
MINVCO_HZ = 100000000
MINPLLIN_HZ = 500000 ' setting lower can find more PLL ratios but may begin to introduce more PLL jitter
PS - Using 'fast' sysclk/1 is a dead loss. Not even good up to 300 on my gizmo. S.
The PLL code (with the CON block and the computeClockMode method) gave me exactly the same results as yours (at 25MHz xtal). S.
PS - For grinsies, I tested out my other KISS (I had bought two) and lightly tweaked your PLL calculation numbers to run up to 360MHz. It did. See attached...
Comments
I haven't seen a schematic for the RAM&Edge board. The Prop2 pins used for the RAMs better not also be routed to the edge connector.
No they don't route to the edge connector - I also have checked that with Parallax...
Using rogloh's test software the 'delay' value runs up to thirteen, but it doesn't help. I poked around a little looking at some of the default values, and got confused. I'll poke around a bit more. S.
Can you post the delay test results @Scroungre ?
Here's what I see when I run the delay test on a JonnyMac (at base pin 0) vs P2-EVAL (at base pin 16). Exact same test image and HyperRAM board, but I see problems on the JonnyMac. Looks like poorer signal integrity on the JonnyMac board. First result in the file below is the JonnyMac, the second is the P2-EVAL.
Ouch! That's horrible. Need to look below 50 MHz too.
Waking up an older thread here, but here goes:
I bought a KISS0000 board and built a little adapter for it so I could use the Parallax P2 HyperFlash / HyperRam card I had, and ran some more memory tests on it.
The results were wonderful! Far, far, better than on the Jon McPhalen breadboard. 100% up to 300MHz!! (for some reason, my KISS board doesn't like anything over 300MHz. 350 seems 'right out'. Dunno why - will ask on the correct threads).
But yeah, here's my results up to 300MHz - and they're 100% at the top! Yay! S.
That's a decent result, even via your adapter. The shorter traces on the KISS0000 board obviously help a lot. With its long path lengths the JonnyMac board is not really designed for high speed operation. You should try the sysclk/1 option as well to see if that can be achieved at any frequencies, but it pushes the timing much harder so it might not work as well in your setup. Its performance might also vary with module position on the KISS0000 board if there are differences in data bus path lengths in different positions.
Pandemic lockdown demotivation (~231 days and counting since March last year) has slowed things down and I've been messing about quite a bit with my P2ME2 and VOYAGER boards in my spare time lately but I know I want to get back and release my video driver update that includes the official external HyperRAM and PSRAM capability that it already supports. This driver will also help when the updated P2Edge board comes along with PSRAM and make it easy to use that system with external video frame buffers, or other applications.
And after those tests, try shifting the adaptor to the other end so that the hyper data bus is shorter than the hyper clock.
Both interesting thoughts, but if it's perfect as it is (which I'm not sure - Instead of "percentage of successful read/write cycles" I'd rather see "Errors in [say] 10,000 cycles") there's not much point in trying to make it better. Unless I can coax the KISS to 350 and find errors thereby.
I'll try your suggestions anyhow - even if it makes it worse, that's useful information. Oddly enough, unlike the Edge breadboard, the headers on the KISS are equally spaced, so I can move down eight data pins at a time (if I stick in some more headers). I don't think the code will like that, though. Maybe I can tweak it a little - more news to come later. Hafta go do carpentry now.
So far, so good, S.
You're right. I didn't even look at your supplied report file. The fails you are getting above 300 MHz are sounding like a chip reset rather than hyper timing limits. Your report, when compared to the Eval Board, shows there is more room for tight integration. The new Edge Board with RAM should be highly reliable at all clock rates.
It does reek a bit of a chip reset. The RAM test walks up through the frequencies, and if you go over 300MHz it clears the serial terminal window! Rude, that! Can't cut'n'paste at all, let alone logfile it (at least, not with the Propellor Serial Terminal). It also seems a really 'hard' limit - just a bit over, nothing, at 300 exactly, everything's working fine.
If it were weird power or bum caps or something, I'd expect more "flakiness" - more of a 'fuzzy' limit, where below it you get occasional errors and above it you get occasional success.
Looking forward to the Edge with RAM, although the KISS is a more convenient form factor for me. That's just me... S.
I don't see what would be intentionally fitted to cause a hard P2 reset above 300MHz. If it is thermal or some current limit on the KISS board, then it should probably also be load dependent, and idling a single COG at 300MHz shouldn't hit that limit. So you could write something that just clocks the P2 at over 300MHz and doesn't do much else, and see if that runs without failure. You could also monitor the reset pin when you encounter this. My own P2ME2 board can reset the P2 if the regulator's power-good signal is driven low, indicating the 1.8V rail is out of range, or on a UVLO condition; maybe your KISS board does something similar.
I think the next gen EDGE should help resolve your problems when it arrives assuming it's all designed well timing wise (should be given its small size) and its power design is solid. The P2-EVAL also works reasonably well (apart from use of P28-P31 for high speed stuff).
That's even odder than I was thinking. Time to verify the clock frequencies. Here's a quick test I've thrown together, compiled with Flexspin. It requires an interactive terminal. It starts at 300 MHz and each key-press thereafter will bump the frequency by 1 MHz.
Monitor accuracy with an oscilloscope (or frequency counter) on pin P56. Pulse rate is sysclock/1000 so 300 MHz should read 300 kHz.
Oops, now rebuilt for a 25 MHz crystal:
Clock frequencies verified! I ran your 25MHz code and looked at it on a 'scope, and it does indeed run quite well at up to nearly 400! (yeah, by then it gets weird, but that's to be expected.) It's fine and does indeed display 350kHz when the serial port claims it's running at 350MHz.
Weird. With a fair amount of hackwork I managed to get your code to compile under the Propeller Spin Tool (the remaining difference!) instead of the FlexSpin compiler, and it worked okay there too.
Running Rogloh's driver code through Flexspin got me this note at the end:
because it then asks for an 'enter to continue', which it doesn't in the Propeller serial window.
This is quite confusing. I have tried going back to some other code, and it seems to be behaving now? Very strange. S.
Aha, in theory that is because it is trying to go in steps of 1MHz but couldn't find a valid set of values using a 25MHz crystal. But it should be able to if it divides by 25 then multiplies by 300. I'll need to look at that code.
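As a sanity check on that divide-then-multiply arithmetic, here is a small host-side Python sketch that unpacks a P2 clock mode word and works out the frequency it implies. It assumes the bit layout used by the driver code in this thread (%0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS, with D and M stored minus one, and PPPP = %1111 meaning the VCO is used undivided). The function name and the returned dictionary are made up for illustration.

```python
def decode_clock_mode(mode: int, clkin_hz: int) -> dict:
    """Unpack a P2 clock mode word and compute the frequency it implies.

    Assumed layout: %0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS
    """
    pll_on = bool((mode >> 24) & 1)      # E: PLL enable
    div = ((mode >> 18) & 0x3F) + 1      # D: input divider, 1..64
    mul = ((mode >> 8) & 0x3FF) + 1      # M: VCO multiplier, 1..1024
    pppp = (mode >> 4) & 0xF             # P: post-divider selector
    post = 1 if pppp == 0xF else (pppp + 1) * 2
    # Integer division mirrors how the driver code computes the PLL input.
    vco_hz = clkin_hz // div * mul
    return {
        "pll_on": pll_on,
        "div": div,
        "mul": mul,
        "post": post,
        "vco_hz": vco_hz,
        "sysclk_hz": vco_hz // post,
    }


if __name__ == "__main__":
    # Hand-built word: 25 MHz / 25 * 300, no post-divide, PLL on, crystal bits %1011.
    mode = (1 << 24) | ((25 - 1) << 18) | ((300 - 1) << 8) | (0xF << 4) | 0b1011
    print(hex(mode), decode_clock_mode(mode, 25_000_000))
```

With a 25MHz crystal divided by 25, the PLL input is 1MHz, so every whole-MHz target up to the VCO limit has an exact multiplier.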
Here's the code I use... for a given desired output frequency it should compute the clock mode value for the PLL setup, or fail with a zero returned if it couldn't find a match.
    CON
        ' clock source used below
        #0, CLKSRC_XTAL, CLKSRC_XIN

        ' setup one of these based on your P2 HW input clock,
        ' this will only be used if the PLL settings get automatically computed (see code below)
        'CLKIN_HZ = _xtalfreq ' also only enable CLKSRC_XTAL below as CLKSRC
        'CLKIN_HZ = _xinfreq ' also only enable CLKSRC_XIN below as CLKSRC
        CLKIN_HZ = 20000000 ' assume 20MHz crystal by default

        CLKSRC = CLKSRC_XTAL ' enable this for crystal clock source (default)
        'CLKSRC = CLKSRC_XIN ' enable this for direct input clock source on XI (no crystal)

        ' parameters used when automatically determining PLL settings
        TOLERANCE_HZ = 500000 ' pixel clock accuracy will be constrained by this when no exact ratios are found
        MAXVCO_HZ = 350000000 ' for safety, but you could try to overclock even higher at your own risk
        MINVCO_HZ = 100000000
        MINPLLIN_HZ = 500000 ' setting lower can find more PLL ratios but may begin to introduce more PLL jitter

    PRI computeClockMode(desiredHz) : mode | vco, finput, fval, p, div, m, error, bestError
        bestError := -1
        repeat p from 0 to 30 step 2
            ' compute the ideal VCO frequency fval at this value of P
            if p <> 0
                if desiredHz > MAXVCO_HZ/p ' test it like this to not overflow
                    quit
                fval := desiredHz * p
            else
                fval := desiredHz
                if fval > MAXVCO_HZ
                    quit
            ' scan through D values, and find best M, retain best case
            repeat div from 1 to 64
                'compute the PLL input frequency from the crystal through the divider
                finput := CLKIN_HZ/div
                if finput < MINPLLIN_HZ ' input getting too low, and only gets lower so quit now
                    quit
                ' determine M value needed for this ideal VCO frequency and input frequency
                m := fval / finput
                ' check for the out of divider range case
                if m +> 1024
                    quit
                ' zero is special and gets a second chance
                if m == 0
                    m++
                ' compute the actual VCO frequency at this particular M, D setting
                vco := finput * m
                if vco +< MINVCO_HZ
                    quit
                if vco +> MAXVCO_HZ
                    next
                ' compute the error and check next higher M value if possible, it may be closer
                error := abs(fval - vco)
                if m < 1024 and (vco + finput) +< MAXVCO_HZ
                    if error > abs(fval - (vco + finput))
                        error := abs(fval - (vco + finput))
                        m++
                ' retain best allowed frequency error and divider bits found so far
                if error +< bestError and error +< TOLERANCE_HZ+1
                    bestError := error
                    mode := ((div-1) << 18) + ((m-1) << 8) + (((p/2 - 1) & $f) << 4)
                ' quit whenever perfect match found
                if bestError == 0
                    quit
            if bestError == 0
                quit
        ' final clock mode format is this #%0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS
        if mode
            ' also set 15 or 30pF capacitor loading based on input crystal frequency
            mode |= (1<<24) ' enable PLL
            if (CLKSRC == CLKSRC_XTAL) ' enable oscillator and caps for crystal
                mode |= (CLKIN_HZ < 16000000) ? %1111 : %1011
            else
                mode |= %0111 ' don't enable oscillator

Just tried a quick test program calling this method above with settings for both 20MHz and 25MHz crystals. The code seems to work in both cases and does not return zero when it hits 301MHz. I'm not sure what is happening in your setup. I am using an older 5.3.1 version of FlexSpin in case that makes a difference. If you can run this same test code in your setup it would be worth comparing the results @Scroungre .

    OBJ
        uart : "SmartSerial"
        f : "ers_fmt"

    PUB main() | fr, v
        'setup serial port output
        uart.start(BAUD)
        send := @uart.tx
        f.nl()
        send("PLL test program using crystal operating at ",f.dec(CLKIN_HZ),"Hz",13,10)
        repeat fr from 295000000 to 310000000 step 1000000
            v := computeClockMode(fr)
            f.dec(fr)
            send(" - ")
            f.hex(v)
            f.nl()
        repeat

Results:
and
Better! But not quite perfect... I cut and pasted your "PRI computeClockMode(desiredHz)" code and replaced the old method with that, and it ran happily up to about 348MHz before complaining about a 'set RAM Delay failed'. See attached text.
I ran all this through FlexProp 5.2 (no optimization). It seems to behave the same with the Propeller Tool 2.5.3, except that it clears the screen when it exits, so I can't cut and paste the results.
My 'CON' block values are:
    CON
        _xtlfreq = 25_000_000
        _clkfreq = 100000000

        RAM_START = $00_000000
        FLASH_START = $02_000000
        FLASH_SIZE = 32*1024*1024

        BAUD = 115200
        BUFSIZE = 128

        ' clock source used below
        #0, CLKSRC_XTAL, CLKSRC_XIN

        ' setup one of these based on your P2 HW input clock,
        ' this will only be used if the PLL settings get automatically computed (see code below)
        'CLKIN_HZ = _xtalfreq ' also only enable CLKSRC_XTAL below as CLKSRC
        'CLKIN_HZ = _xinfreq ' also only enable CLKSRC_XIN below as CLKSRC
        ' CLKIN_HZ = 20000000 ' assume 20MHz crystal by default
        CLKIN_HZ = 25000000 ' 25MHz crystal here (default was 20MHz)

        CLKSRC = CLKSRC_XTAL ' enable this for crystal clock source (default)
        'CLKSRC = CLKSRC_XIN ' enable this for direct input clock source on XI (no crystal)

        ' parameters used when automatically determining PLL settings
        TOLERANCE_HZ = 500000 ' pixel clock accuracy will be constrained by this when no exact ratios are found
        MAXVCO_HZ = 350000000 ' for safety, but you could try to overclock even higher at your own risk
        MINVCO_HZ = 100000000
        MINPLLIN_HZ = 500000 ' setting lower can find more PLL ratios but may begin to introduce more PLL jitter

PS - Using 'fast' sysclk/1 is a dead loss. Not even good up to 300 on my gizmo. S.
The PLL code (with the CON block and the computeClockMode method) gave me exactly the same results as yours (at 25MHz Xtal). S.
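For anyone wanting to cross-check the PLL search without hardware, here is a rough Python port of the computeClockMode method posted earlier in this thread, meant to run on a PC. It is a sketch, not the driver code: the Python names are mine, the crystal frequency and VCO ceiling are passed in as parameters, and Python integers don't wrap at 32 bits, so the unsigned `+<` / `+>` comparisons become plain ones and `bestError := -1` becomes `None`.

```python
TOLERANCE_HZ = 500_000
MAXVCO_HZ = 350_000_000
MINVCO_HZ = 100_000_000
MINPLLIN_HZ = 500_000


def compute_clock_mode(desired_hz, clkin_hz=25_000_000,
                       maxvco_hz=MAXVCO_HZ, use_crystal=True):
    """Return a P2 clock mode word for desired_hz, or 0 if no match is found."""
    mode = 0
    best_error = None  # stands in for the Spin2 "bestError := -1" (unsigned max)
    for p in range(0, 31, 2):
        # ideal VCO frequency at this post-divider (p == 0 means VCO/1)
        if p != 0:
            if desired_hz > maxvco_hz // p:
                break
            fval = desired_hz * p
        else:
            fval = desired_hz
            if fval > maxvco_hz:
                break
        for div in range(1, 65):
            finput = clkin_hz // div          # PLL input after the divider
            if finput < MINPLLIN_HZ:
                break
            m = fval // finput                # multiplier needed
            if m > 1024:
                break
            if m == 0:
                m = 1
            vco = finput * m
            if vco < MINVCO_HZ:
                break
            if vco > maxvco_hz:
                continue
            # see whether the next higher multiplier lands closer
            error = abs(fval - vco)
            if m < 1024 and (vco + finput) < maxvco_hz:
                if error > abs(fval - (vco + finput)):
                    error = abs(fval - (vco + finput))
                    m += 1
            if (best_error is None or error < best_error) and error <= TOLERANCE_HZ:
                best_error = error
                mode = ((div - 1) << 18) + ((m - 1) << 8) + (((p // 2 - 1) & 0xF) << 4)
            if best_error == 0:
                break
        if best_error == 0:
            break
    if mode:
        mode |= 1 << 24                       # enable PLL
        if use_crystal:
            mode |= 0b1111 if clkin_hz < 16_000_000 else 0b1011
        else:
            mode |= 0b0111
    return mode


if __name__ == "__main__":
    for fr in range(295_000_000, 310_000_001, 1_000_000):
        print(f"{fr} - {compute_clock_mode(fr):X}")
```

In this port a request above the VCO ceiling returns zero, and raising `maxvco_hz` is the equivalent of tweaking MAXVCO_HZ to reach higher targets.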
PS - For grinsies, I tested out my other KISS (I had bought two) and lightly tweaked your PLL calculation numbers to run up to 360MHz. It did. See attached...