A lot of small sprites in there. A quick search on what this platform is: "Windows 98". The screen looks like 24/32 bpp; it's Windows and the year is 200x, so that's normal.
PC hardware has no hardware sprites, 200x graphic cards have no programmable shaders, so these have to be software rendered. A 2003 PC is fast enough to do this. A P2... I don't know. Especially in 32 bpp.
18 sprites are not too many, but they are up to 512 px wide. At 8 bpp, with the sprite width limited to 32 px, 128 sprites should be no problem at all, except that their coordinates and pointers have to fit in the hub alongside the driver code, and that may limit their count. However, the driver code can be as simple as possible.
My code is about 190 longs, plus variables, for 310 longs total. It has to fit in 368 longs to leave room for a sprite buffer, so 58 longs are left. 19 more sets of sprite coordinates can be added without modifying the driver (except for changing constants), giving 37 total... Maybe I will test this, because why not?
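The long-count budget above can be checked with a few lines of arithmetic. This is only a sketch: the figure of 3 longs per sprite entry is an assumption inferred from "58 longs left, 19 sets", and the current count of 18 sprites is inferred from the 37 total; neither is stated outright in the post.

```python
# Budget figures quoted in the post.
BUDGET_LONGS = 368        # space the driver has to fit in
DRIVER_LONGS = 310        # code (about 190 longs) plus variables

# Assumptions, not stated in the post: inferred from "58 longs -> 19 sets"
# and "19 added -> 37 total".
LONGS_PER_SPRITE = 3
CURRENT_SPRITES = 18

free_longs = BUDGET_LONGS - DRIVER_LONGS
extra_sprites = free_longs // LONGS_PER_SPRITE
total_sprites = CURRENT_SPRITES + extra_sprites

print(free_longs, extra_sprites, total_sprites)  # prints: 58 19 37
```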
WMLONG works for 32bpp too, I have the 32 bpp version of the driver. The sprite size is limited to 128 px in it, as the limit is the COG RAM left after the code for the RDLONG/WMLONG buffer. Making sprite colors like $xx0000 or $xxyy00 (etc) can give an interesting effect...
It actually does use Direct3D 8, no shaders required. (Just some kind of software pseudo-instancing that puts all sprites of the same type into the same vertex buffer so they can all be drawn at once with a single set of DX API calls)
The main problem when you increase the sprite count isn't the memory, it's that you start wasting a lot of time drawing nothing when you simply check the Y position of every sprite on every line, O(N*M) style. The way I solve this in the P1 sprite driver is to use two acceleration structures. (warning: simplified explanation, the way the P1 driver actually works is slightly more cursed) One is a Y-sorted list of sprites that the render cog(s) generate during VBlank. The other is a bit mask of active sprites. Before starting to render sprites for a line, the code goes through the sorted list and enables all sprites it finds in the bit mask, until it finds a sprite with a larger Y than the current line. Then it scans through the bit mask to find which sprites to draw. When it draws the last line of a sprite it unsets the sprite bit. This method does very little work per-line, and mostly depends on how fast you can work with the bit mask. On P1 doing this with more than 32 sprites is kinda eh (there might be a better method for P1 based on linked lists...), but on P2 there's ALT* and ENCOD ops to make it fast.
In terms of memory, maybe each sprite needs 8 bytes (1 type + 3 pointer + 2 X + 2 Y), times two for double buffering, so 16. Additionally, 4 bytes for each entry in the sorted list (no double buffer needed if it can be built during VBlank. Could also be just 2 bytes (no copy of Y value) in exchange for slower sorting and traversal), so 20. So 20K memory usage to handle 1024 sprites, not a lot. Of course the actual images need memory, too, which is as always the larger problem (even if you have small ones that need to rotate or something, that eats memory quickly).
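The sorted-list-plus-bit-mask scheme described above can be sketched in Python. This is a simplified model of the idea, not the actual P1 or P2 driver code: the `Sprite` class, the function names, and the arbitrary-width integer used as the bit mask are all assumptions made for illustration. `int.bit_length() - 1` stands in for a find-top-bit operation such as ENCOD.

```python
from dataclasses import dataclass

@dataclass
class Sprite:
    y: int       # first scanline the sprite covers
    height: int  # number of scanlines it covers

def build_sorted(sprites):
    """Once per frame (during VBlank in the description): sprite indices ordered by Y."""
    return sorted(range(len(sprites)), key=lambda i: sprites[i].y)

def sprites_per_line(sprites, num_lines):
    """Return, for every scanline, the indices of the sprites to draw on it."""
    order = build_sorted(sprites)
    cursor = 0   # next entry of the sorted list that is not yet enabled
    active = 0   # bit i set means sprite i is currently active
    lines = []
    for line in range(num_lines):
        # Enable every sprite that starts at or above this line; stop at the
        # first sorted entry with a larger Y.
        while cursor < len(order) and sprites[order[cursor]].y <= line:
            active |= 1 << order[cursor]
            cursor += 1
        # Walk only the set bits instead of testing every sprite.
        drawn = []
        scan = active
        while scan:
            i = scan.bit_length() - 1        # index of the highest set bit
            scan &= ~(1 << i)
            last = sprites[i].y + sprites[i].height - 1
            if line <= last:
                drawn.append(i)              # a real driver would render here
            if line >= last:
                active &= ~(1 << i)          # last line done: unset the bit
        lines.append(drawn)
    return lines

demo = [Sprite(y=2, height=2), Sprite(y=0, height=3), Sprite(y=2, height=1)]
print(sprites_per_line(demo, 5))  # prints: [[1], [1], [2, 1, 0], [0], []]
```

Per line, the work is proportional to the number of newly enabled sprites plus the number of active ones, rather than to the total sprite count.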
Here's the revised robot version of the board.
Adds a lipo connector.
And 2020 Neopixel
Need to test it all out…
So, I un-bitrotted the Spin Hexagon port (fixed compile errors and replaced USB driver). Here's it preconfigured for the funny board. Note that you need to get all the audio files from the P1 version: https://github.com/IRQsome/Spin-Hexagon
config is in platform.spin. Try setting VIDEO_RESOLUTION to different values (allegedly RES_1920x1600 actually works!). Though as I realize, many of the HD modes configure quite pedestrian clock speeds, so perhaps not a good test case for block access power issues.
@Wuerfel_21 That might give me a good way to test robot board.. Looks like I need files on uSD card.
Also, guess it needs to be compiled with FlexProp, right?
Yes, as mentioned, grab the audio files (RAW and VU) from the linked github. Also, yes, flexspin. Note that it's in Spin1 format but obviously needs to be compiled for P2.
Ok, I got it on Bot board.
Need to figure out why USB LED isn't lit though. Should be, right?
BTW: That game is way too fast for me, but is a good way to test things...
Time to field test revised bot board.
BTW: I'm not liking how NeoPixel 2020 led is sharing P7 with Eval header. Just moved it to P17 (sharing with Ping input) in next version. Think that's OK with 1k resistor between Neopixel output pin and Ping input pin.
Also, the Lipo connector is 180 degrees off, fixing that too.
I'm hoping that with just two servos, can power both servos and P2 with AA pack. If not, will have to use the Lipo for P2...
I need to figure out which way is forward…
Guessing big gear is in front…
Add a cannon
Think have final versions of two of the three Simple boards done.
The LDOs seem to have fixed the audio issues and should be good for ADC work as well.
Added a SparkFun style LiPo battery connector. This is connected to +5V input and seems to work.
Added a QWIIC connector to both and connected to P52 and P53.
These pins have 1k resistors in series with blue LEDs on them, but appears that I2C works just fine with them and the LEDs then become I2C activity lights.
For the ++ version, tested out the mic and think it's OK.
Surface mount RGB led works on Bot board and sharing the pin with Ping sensor seems to work.
Top layer Eagle view and schematics attached.
Probably going to place order for some of these today...
Ordered boards today from Seeed fusion. 4-layer gold boards are expensive everywhere.
But, Fusion has 2 layer boards <100mm x 100mm for dirt cheap, $4.90 for 10.
So, I really quickly (too quickly) put these together to add to order.
Not really thought out, just slapped anything I could find together.
The only board I really need is the one with Leds on every pin. Need that to verify that P2 is soldered in right...
I thought they might complain about the breakout one having too many holes, but they haven't yet...
Built first four units of latest (final?) version, 3k.
Results with MegaYume are that 3 of 4 run at least for a while with no cooler. NeoYume is only 50% (although freezing the boards helps sometimes).
Seems like you can tell how well it will do by just running a simple VGA bitmap viewer and seeing how fast a clock it can work at.
The range of results has me a bit puzzled. Boards and paste are pretty identical...
Wondering if it's really the P2 chips themselves that are different?
P2 Eval boards all seem good though, but maybe they picked out the best chips for their boards somehow? (seems possible, but unlikely).
Hard to imagine what is causing this range in results, but maybe it's all just borderline >320 MHz...
Could try 6 layers, but think I'm done. Having MegaYume work is good enough, I think. Hopefully that low freq one is just a fluke.
Wouldn’t necessarily call the 320 MHz limited board bad, but not what I’m aiming for.
I think the hdmi example at 250 MHz is the highest clock example from parallax. Is this right?
Think original official spec was 160 MHz but seems to be 180 MHz now.
The absolute minimum for me is 300 MHz so can do 1080p. I’m not going to sell anything that can’t do that, at least with no cooling or other tricks.
Could wind up that only these boards hand made by yours truly have any chance at >320 MHz operation.
Think I'm going to try the ECS-TXO-2520-33-200-AN-TR oscillator mentioned by @jmg earlier in the thread.
Don't think it will help, but worth a shot...
Hmm, crusty crystals? Wonder what my "known non-flakey" boards use. Those being an EVAL Rev C, EC32MB Rev. A and oddly enough, the P2STAMP.
EVAL uses unspecified crystal of some sort.
EDGE uses unspecified oscillator IC with some 74xx part doing... something?
P2STAMP uses IC "ECS-200-10-37B"
So all sorts of osc setups. Note once again that the P2STAMP is rather solid despite having bad thermals and getting really hot under extended operation (somewhere between "fresh cheese pretzel" and "surface of the sun" hot). I think the PSRAMs may be more prone to thermal problems than the P2 itself. (and the stamp uses a HyperRAM that is actually being operated in spec range. Only 16MB, oh well)
Something that all the more stable boards share in common is that the capacitors are the smaller type and placed much closer to the P2. On your board they're all spaced out because you also route all the signals on the top layer.
I have places for closer caps on bottom of boards. I've tested using them, but didn't help with max. clock.
Still not 100% sure it's a thermal issue. It chokes immediately, seems like before it would have time to heat up.
But, putting in freezer helps, so guess is thermal...
Still, going to try CMOS oscillator instead of crystal and hope that will help. Don't think it will though...
@Rayman If you have some decent/stable programmable lab power supply gear with controllable voltage you could possibly try using that instead of your regulators...? Take a lower performing board and feed into its voltage rails directly, bypassing the on-board regulators. At least to rule out the power supply section as being the cause of your problems, if you've not done that. Maybe some boards are noisier than others at high current loads. Although the fact that you can get a bad result with a simple VGA viewer probably means it doesn't take a lot of current to fail, just high frequency switching currents perhaps.
The classic PVT variations can include process, voltage and temperature, so I'd suggest also logging the supply voltages with the MHz test points, to see if they correlate.
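One way to act on that suggestion is to compute a plain Pearson correlation between the logged supply voltage and the highest clock each board or run reached. The sketch below is an assumption about how such a log might be analyzed; the function name and the idea of keeping two parallel lists are illustrative, and no measured values are included.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the sequences is constant")
    return sxy / sqrt(sxx * syy)
```

Called as `pearson(core_volts, max_mhz)` with one entry per board or test run, a result near +1 or -1 would point at the supply rail, while a value near 0 would suggest looking at process or temperature instead.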
If you confirm this is thermal, you could try more aggressive copper under the P2, still with 4 layers ?
Not sure what the final PCB used, but the earlier plots had open area on the inner planes, whilst some ground inner pours looked possible.
Even some finger interleave on inner layers, could help transfer heat from GND slug to the planes.
The bottom plane is a trade off between thermal paths and decoupling.
Something less than a-cap-per-pin would allow more focus on thermal.
Feeding voltage sounds like a good idea. Could be regulators doing something that P2 doesn't like at high frequency...
Be nice if I could use 1 AA for 1.8 V and 2 AA for 3.3 V. Don't think that'd work though...
Here's what the layers look like:
You could probably hotwire the supply pins from an EVAL board (there's a jumper that has the 1.8V supply on it), that also seems convenient if you don't have a bench supply or something like that.
Okay, that is bad. Not even one layer has good thermal spread.
The bottom layer is the only one that has any spread at all but it is severely limited by the tight ring of capacitor footprints. Fixing this would be a start. Since you've tested successfully without those caps in place you may as well try a layout that has all those footprints removed as well.
Used VGA test on previous (non-LDO) version and it goes to 370 MHz, same as Eval board. So, not clear thermal spread is a factor as bottom layer is pretty much identical...
Did some more tests on worst performing board:
Tried adding in a 1000 uF cap on 3.3 V rail then 1.8 V rail, no help.
Tried Frankenstein-ing in the Eval board's 3.3 and 1.8 V rails, no help.
Guess the CMOS oscillator is last best chance of improvement...
Still 320 MHz is good enough for every code out there that I know of, except of course, NeoYume, MegaYume, and Rogloh's VGA examples at >640x480..
VGA test is what? A couple of cogs doing very little each. Biggest load is the streamer's hubRAM reads for video output.
You could use a variable version of the regulator and nudge the 1v8 supply up a little?
That will offset temperature effects.
Only when sufficient cooling is applied. Otherwise the temperature gets pushed even higher.
Unless it's sourcing the image data from PSRAM perhaps, then more IO current will be apparent at high frequencies...?