Just did the real run-from-PSRAM test using the P2-EVAL and my own PSRAM board at 160MHz. It only made a small improvement in the total tick count, but that's to be expected given the number of iterations and the fact that only 31 rows are ever loaded.
It worked first go, which is great but almost makes me doubt it's real. Now I want to deliberately impair the PSRAM to make sure it's legit; I'll just yank the breakout board out for that. UPDATE: yep, it crashed right away with the board removed, so it really is loading the code from the PSRAM now!
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 1853091552
Total time (secs): 11
Iterations/Sec : 18
Iterations : 200
I thought the P2 was originally designed for a maximum 180MHz clock speed, although it has since shown it can generally go much, much faster. I tend to use 200MHz myself, but Catalina's default is 180MHz because I thought that was still the design maximum. If there is now a reason to make the default slower than that, I'd like to know it.
I haven't considered it yet, but in theory multiple COGs running from external memory could each have their own I-cache and share a common external memory, with a commensurate reduction in performance. The main limitation here is that this model doesn't keep directly accessed data in the external RAM, just code.
That would be analogous to Catalina's SMALL mode, which only stores code in external RAM (as opposed to LARGE mode, where both code and data are stored in external RAM). But even multi-processing in SMALL mode would be a huge win. My biggest problem is that I have always tried to keep Catalina compatible with both the P1 and the P2, which I have (just about!) managed. But that is largely just nostalgia on my part: I have really fond memories of the P1, but I knew that one day this compatibility would probably have to come to an end.
There's no reason for the slower default IMO; 160MHz is just what flexspin had set up as the default from way back.
I will follow progress with interest
I think it's probably doable to have multiple COGs sharing the external RAM for their code (or even running different code as separate applications). Right now, for this proof-of-concept setup, I initialise an I-cache and a block mapping table at startup in another linked module called extmem.c, and during startup I pass their addresses into the COG's external memory handler routines, which execute from LUTRAM. The only change needed would be to allocate these per COG on the stack (e.g. via alloca in main() or something like that) rather than defining them as a global variable block. I'm already running the cache transfer requests through my external memory mailbox, so other COGs can certainly make their own requests in parallel with the COG executing external C routines. I could easily prove that out to myself by introducing another video COG that shares the memory, but I already know it would work, just slower.
The 160MHz default probably originates from the Prop2-Hot days; I think that design was targeted at 160MHz.