@evanh said:
That is cool. Just shows that, in good conditions, the specs are conservative. Temperature in particular will be a factor. They'll be spec'd for 70+ C, with some leeway.
Also, the timing compensation transition around 180 MHz is narrow and will shift with temperature, and board layout too. Those regions are always going to be risky to use in an automated way.
@Surac said:
If only we had more RAM inside the P2. That would nullify the need for external memory for video out.
@rogloh said:
So my driver framework can now output video from either Hub RAM, HyperRAM, PSRAM, or SRAM, and you can have a mix of them too (I still need to test that but it's designed to work). Other COGs still get write access to these RAMs. It could also read region image data from HyperFlash too if needed, but that would be limited to just static image data.
Comments
Yes, unfortunately it's our same old data sampling problem, across all the different RAM types. We get input timing variation with frequency, board layout/delay, process, temperature, and probably voltage as well (though that one is controlled).
Wow, a truly stunning result. Gotta be happy with those speeds
Yeah, and you would have to wait for a horizontal or vertical blank to write to it, since access to hub RAM is arbitrated in hardware. Other cogs could generate sprites and stuff them into the framebuffer at the right time. Actually, you could have a neighboring cog feed sprite data through the mailbox so that it never even goes into the framebuffer. That way you could have sprites on top of a character-mapped screen, C64 style, just a lot larger. Anyway, the P2 sure does allow for some fun ideas even in 512KB!
Yeah, indeed promising results.
Congratulations!
If something can be learned by using faster chips, Mouser Australia shows a small stock (26 pieces) of Cypress CY7C1049CV33-8ZSXC (8 ns parts), incorrectly tagged as 15 ns ones:
(https://au.mouser.com/ProductDetail/Cypress-Semiconductor/CY7C1049CV33-8ZSXC?qs=/z5Af45Rph2Fqr8bD0ehSA==)
At AU$ 7.17 apiece, it's perhaps worth checking out...
Yeah at least something's available there. Although for ~$17 there is 4x the capacity with that ISSI SRAM (10ns). That part is also available and it would suit my other half of the board and give full 1080p. I should probably put in an order soon to try it.
https://au.mouser.com/ProductDetail/ISSI/IS61WV20488FBLL-10TLI?qs=l7cgNqFNU1hrNZYLU3MwTg==
In looking for other parts I was interested in recently for another P2 project, I found there is certainly a shortage of some chips at places like Mouser and Element14, with various devices I have on my BOM on back order with long lead times. Looks like this worldwide semiconductor shortage affects some of us hobbyists too.
I like that 2 MB part. It's still fast.
This is confusing me. Did you mean "would not have to wait"?
Actually, in general, if you use my driver, other COGs' r/w access to external memory while video is operating does not have to be only in the blanking portions, because a maximum burst size is configurable along with COG priority. You just need to configure the burst size for non-video COGs so that it does not delay the video COG's request beyond the next scan line, and there still needs to be at least some excess memory bandwidth available per scan line to service the lower priority COG access. More of the latter can allow overlap with the active video portion.
However, looking at the chip select low time with the P2 at 277MHz and the dot clock at 138.5MHz, in this particular case I do see there is very little memory bandwidth available to other COGs, given we are reading at 138MB/s during the video scan line at the 1080p resolution with 8bpp. It would probably only be feasible to set a very small burst size. Or you could choose to run with 4bpp and get about half the total bandwidth left for other COG writes.
This is where PSRAM on the next P2 Edge shines with its 16 bit wide bus implementation which doubles the bandwidth available vs SRAM (@ sysclk/2 read rates). It saves pins too and probably $. Just not quite as good for emulators with respect to latency but for video and other general use it's great.
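As a rough illustration of the burst size constraint described above, here is a minimal Python sketch. It is not the driver's actual logic: the function name, the model (a non-video burst plus one fixed per-request overhead must fit into whatever time the video read leaves free in the scan line) and the example numbers (a 15.0us line, CS low for 13.9us at 8bpp and 6.95us at 4bpp, 1us overhead) are all assumptions chosen to be in the same ballpark as the figures in this thread.

```python
import math

def max_burst_bytes(line_us, video_cs_low_us, overhead_us, bandwidth_mb_s):
    """Largest non-video burst (in bytes) that fits in the time left over
    on one scan line after the video read request.

    Hypothetical model: the burst pays one fixed per-request overhead and
    must complete inside the slack the video request leaves in the line.
    """
    slack_us = line_us - video_cs_low_us - overhead_us
    if slack_us <= 0:
        return 0
    # MB/s is numerically the same as bytes per microsecond.
    return math.floor(slack_us * bandwidth_mb_s)

# Illustrative values only (assumed, not measured).
burst_8bpp = max_burst_bytes(15.0, 13.9, 1.0, 138.5)
burst_4bpp = max_burst_bytes(15.0, 6.95, 1.0, 138.5)
print("8bpp: largest burst", burst_8bpp, "bytes")
print("4bpp: largest burst", burst_4bpp, "bytes")
```

Under these assumed numbers the 8bpp case leaves room for only a handful of bytes per line, while the 4bpp case leaves room for several hundred, which matches the qualitative point that 8bpp at 1080p only permits a very small burst size.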
Nice ISSI part (4x capacity), indeed.
I was also checking for the availability of some chips I've been studying since last year or so, and also found a shortage of many, if not all, of them.
https://rocelec.com
(Rochester Electronics) (also promoted at Digikey's marketplace) seems to be a good source for a lot of hard-to-find and obsolete parts, though I know nothing about their commercial policy (minimum quantities/invoice values), or even if they are keen to deal with small companies.
One thing I noted about their offerings (at Digikey's web site): sometimes the available package selection seems to be a little different from the original ones (in height, at least), but I didn't check the solder footprints thoroughly.
Here's what the 1080p 8bpp external memory CS low time looks like per scan line. Already takes up most of the scan line doing the video read request.
And here's the 4bpp version, with about half the total bandwidth left for writes where CS is high (excluding some overhead per request of ~1us). This is because the video request completes early (halfway into the scan line). Its data for the whole scan line is already stored in HUB RAM at this point.
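The two captures above can be cross-checked with some quick arithmetic. The Python sketch below estimates what fraction of each scan line is spent on the video read. The 277MHz sysclk, 138.5MHz dot clock and sysclk/2 read rate on an 8 bit bus come from this thread; the horizontal total of 2080 pixels is an assumption (the usual CVT reduced-blanking figure for 1080p60), and the function name is made up for illustration.

```python
H_ACTIVE = 1920          # visible pixels per line at 1080p
H_TOTAL = 2080           # ASSUMPTION: CVT reduced-blanking total, not stated in the thread
DOT_CLOCK_HZ = 138.5e6   # dot clock quoted in the thread
SYSCLK_HZ = 277e6        # P2 clock quoted in the thread

def line_read_fraction(bits_per_pixel, bus_bytes=1, clocks_per_transfer=2):
    """Fraction of one scan line needed to read that line's pixel data."""
    line_time = H_TOTAL / DOT_CLOCK_HZ
    line_bytes = H_ACTIVE * bits_per_pixel / 8
    read_rate = SYSCLK_HZ / clocks_per_transfer * bus_bytes  # bytes per second
    return (line_bytes / read_rate) / line_time

for bpp in (8, 4):
    print(f"{bpp}bpp: {line_read_fraction(bpp):.1%} of the scan line spent reading")
```

With these assumptions the 8bpp read takes a little over 92% of the line and the 4bpp read a little over 46%, before adding any per-request overhead.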
This is brilliant!!!
That’s a lot faster than I imagined would be possible... and lots of spare RAM left for blocks or sprites and the like... nice!
I've received a shipment of SRAM with the 8ns 512kB part @Yanomani suggested (CY7C1049CV33-8ZSXC) as well as the 2MB 10ns ISSI part (IS61WV20488FBLL-10TLI).
Just soldered up the 8ns Cypress part with my PCB using the TSSOP-II footprint and tried it out with the SRAM driver. Getting good random read results using sysclk/2 reads (results attached) although there is a gap around 183MHz at a delay crossover point. Using this 8ns SRAM, both 252 and 270MHz P2 operation look good. At 297MHz it is still working but it's probably a bit too close to the cutoff point of 305MHz for comfort.
Update: in comparison with the SOJ SRAM I tested last time, this seems slightly worse. That SOJ SRAM was rated to 12ns but works better than the 8ns one. Speed binning?
Video is also working nicely with this.
Will also try the 2MB ISSI one soon which might allow full 1080p frame size at 8bpp depending on performance.
Any possibility of tweaking the A[18:0]/CE/OE/WE arrival/change timings a bit, by experimenting with combinations/variations of clocked/unclocked behaviour at the P2 pins?
Perhaps the address lines are changing really fast and the memory device is reacting accordingly, so there is a chance that all or some of the data bits change earlier than ideal, just before being captured at their best "cat's eye" as expected.
Sure, there is also the possibility of the Cypress devices being a little "picky" in terms of total noise, be it power supply related or on the signaling lines themselves.
Yeah, that sort of thing is possible to play with, @Yanomani.
I've just soldered up the 2MB ISSI SRAM, but I'm seeing some video noise when testing it, which is weird, plus what looks like address foldover, so maybe something is not right there... either bad soldering or a bad test setup. Interestingly, the delay test I ran with random data worked really well all the way to 350MHz (attached).
Update: okay, I found a setup issue in the video test, as it was still set up for 512kB, but it's looking better now, still testing...
Update2: full HD 8bpp appears to be working at 297MHz!
Here's a close up of the pixels in full HD mode on the monitor. Nice and stable.
The results for these 10 ns ISSI parts look really promising!
Pity the 8 ns Cypress ones were not that "faithful"...
Yeah the ISSI SRAM looks good based on early results (you'll be happy @aaaaaaaargh as that is the part you have too). Plus it makes better use of the pin groups available on a port and allows a full 1080p framebuffer at 8bpp. I'll probably stick to using this board now for my SRAM experiments, it seems to be the best one I have if it can still run with the P2 at 350MHz (175MHz memory read clock).
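The framebuffer sizing behind that statement is easy to check. A minimal Python sketch, assuming "512kB" and "2MB" mean 512K x 8 and 2M x 8 devices (binary sizes) and that the framebuffer is packed with no padding between lines; the helper name is made up for illustration.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Size of a tightly packed framebuffer in bytes."""
    return width * height * bits_per_pixel // 8

SRAM_512K = 512 * 1024        # Cypress part, assumed 512K x 8
SRAM_2M = 2 * 1024 * 1024     # ISSI part, assumed 2M x 8

full_hd_8bpp = framebuffer_bytes(1920, 1080, 8)
print("1080p 8bpp framebuffer:", full_hd_8bpp, "bytes")
print("fits in 2MB SRAM:  ", full_hd_8bpp <= SRAM_2M)
print("fits in 512kB SRAM:", full_hd_8bpp <= SRAM_512K)
```

A full 1080p frame at 8bpp needs 2,073,600 bytes, which just fits in the 2MB part and is roughly four times too large for the 512kB one.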
Wow, a great result. Nice to get reward for effort.
It'd be good to get a heat gun and freeze spray onto it sometime.
Yeah room temp operation is one thing, operating over a temperature/voltage range is another...
Yippie - This gets better and better!
I found what you need Roger
GS82583ED36GK-625I
8Mx36 625MHz 1V3 for only $643.84 at Mouser. 10 in stock
How's your BGA soldering?
The best bit, for an $869AUD order:
This Product Ships FREE
Recently I received my PSRAM parts order, and I soldered up my PSRAM test board. My first attempt got fully messed up (wrong silk, don't ask), but I had thankfully ordered enough parts to make up a second one. I had obtained the Adafruit PSRAM parts from Mouser which have their part markings shaved off (why?!) but it looked like at least some of them are ESP-PSRAM64H devices according to some partial logos remaining.
EDIT: the first board has been resoldered and is now also working.
So far I am getting good results from this board in operation. I ran my video driver using it with different resolutions/P2 frequencies, as well as with my delay test that scans through the frequency range to find stable delay values - it was still working up to 350MHz which is as far as I wanted to take the P2. This means the PSRAM is still clocking readable data at 175MHz, which is quite a lot above its rated 144MHz. The delay test results are also attached below, and they show very good overlap.
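For readers wondering what "overlap" means for a delay scan like this, here is a small Python sketch of how such results could be post-processed. It is not the actual test program: the data layout (P2 frequency in MHz mapped to the set of input delay values that passed), the selection rule and the sample numbers are all hypothetical.

```python
def pick_delays(scan):
    """Choose one working delay per frequency and list frequencies with none.

    scan maps P2 frequency (MHz) -> set of delay values that passed the
    read test at that frequency (hypothetical data layout).
    """
    chosen, gaps = {}, []
    previous = None
    for freq in sorted(scan):
        passing = scan[freq]
        if not passing:
            gaps.append(freq)          # no delay worked: a crossover gap
            continue
        if previous in passing:
            chosen[freq] = previous    # overlap: keep using the same delay
        else:
            previous = min(passing)    # switch to a new delay value
            chosen[freq] = previous
    return chosen, gaps

# Made-up sample results, for illustration only.
sample = {160: {5}, 170: {5, 6}, 180: {6}, 183: set(), 190: {6, 7}, 200: {7}}
chosen, gaps = pick_delays(sample)
print("chosen delays:", chosen)
print("frequencies with no working delay:", gaps)
```

Good overlap means neighbouring frequencies share at least one passing delay, so a delay table can switch values with margin on both sides; a frequency with an empty set is the kind of crossover gap mentioned earlier in the thread for the 8ns SRAM.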
Wow!! Normally I complain about circular logic, but in this case I'll give you a pass...
Being serial, I'm thinking that it's 175M bits per second, or about 21M bytes/s - or is there some other serial overhead? For 32-bit words, that would be about 5.4 M words/s. Fast enough for me! S.
PS - half-scraped silkscreen on the chips might just be lousy quality control. If they really wanted to erase what they were, I don't think there'd be any visible markings.
No the PSRAM is far better than that. With 4 devices in parallel it's able to transfer at 350MByte/s! i.e. 16 bits every 2 P2 clocks @350MHz using quad SPI transfers per device. There is some address setup overhead at the start but it's not too significant for larger video transfers of 8us duration.
This is easily able to do 1080p60 at 8bpp; 18-20 IO pins are required.
My board above splits the PSRAM into two independent banks (two chip select pins, two clock pins, 16 data pins) so it could also be used by two different applications at the exact same time (8 bits each), or operated together for a wider single RAM with higher bandwidth. Right now my driver is 16 bits only, but could be adapted to work with 8 bit widths at some point down the track.
8 bit would be nice - any chance that this could work with just 1 PSRAM chip? (at lower resolutions?)
With lots of work, potentially yes. But with just a single 4 bit device you'd only get a quarter of the overall bandwidth (87.5MB/s) and even less is actually usable with the address overheads and write bandwidth needs, and this is with a P2 @ 350MHz which is not going to be all that realistic in practice. So a single PSRAM wouldn't support very many video modes. Also you can't really rely on clocking the PSRAM memory above 144MHz if you want to remain within its rated limits, although I have found they do seem to overclock fairly easily (at room temp).
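The bandwidth figures in the last few posts follow from one simple formula. The Python sketch below reproduces them, assuming one transfer every 2 P2 clocks and ignoring address setup and per-request overheads; the function and variable names are made up for illustration.

```python
def peak_mb_s(sysclk_mhz, devices, bits_per_device=4, clocks_per_transfer=2):
    """Peak transfer rate in MB/s for a bank of PSRAM devices in parallel."""
    bits_per_transfer = devices * bits_per_device
    return sysclk_mhz / clocks_per_transfer * bits_per_transfer / 8

four_devices = peak_mb_s(350, devices=4)                    # 16 bit wide bus
one_device = peak_mb_s(350, devices=1)                      # single quad SPI device
one_bit = peak_mb_s(350, devices=1, bits_per_device=1)      # plain 1 bit serial

# Visible pixel data only for 1080p60 at 8bpp (blanking ignored).
needed_1080p60_8bpp = 1920 * 1080 * 60 / 1e6

print("4 devices:", four_devices, "MB/s")
print("1 device: ", one_device, "MB/s")
print("1 bit SPI:", one_bit, "MB/s")
print("1080p60 8bpp needs about", needed_1080p60_8bpp, "MB/s")
```

This gives 350MB/s for four devices, 87.5MB/s for a single quad SPI device and about 21.9MB/s for 1 bit serial access. Since 1080p60 at 8bpp needs roughly 124MB/s of pixel data alone, a single device cannot sustain that mode even at peak rate, which is why only lower resolutions or colour depths would be candidates.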
Oh no, this was just an idea.... Better spend your precious time on (p)SRAM support :-)
Just checking.... @rogloh: is there perhaps some news on your display driver's (S)RAM support feature?
Maybe a beta version ;-)
I wonder if the memory plane differs, as you also changed sizes there, right?
RAM vendors are usually vague about exact pin-array address mapping, but you are going to have faster access on the lines to the downstream side of the memory arrays than on any address line upstream of the memory array.
Maybe you can go looking for those fast/slow pins?
Nothing further to report on the display driver as I've been working on my P2ME2 and Voyager boards over the last months and more recently tied up doing some AVR I2C, USB, and touchscreen stuff for it. However I do still want to get back to this and release it. The video driver already works with my HyperRAM driver (released) and PSRAM (unreleased) but is just undocumented. I can't recall exactly how much SRAM was working without reading all the above again but I think it was somewhat functional too.
I do have a 4 bit single PSRAM on my Voyager board (not 16 bit), so there will be some impetus to get that going as well, and the PSRAM based Edge might be coming out soon too, which in theory should be able to use my driver right away unless the final chip selected has been changed and differs somehow. Those things will very likely prompt further work on this stuff again soon. It will get done eventually.
In another more time-abundant universe perhaps. Too many other projects consuming time in this one I'm in...