New p2docs pages: Errata and Optimization Guide

I just added two new pages to the p2docs site:
Optimization and Coding for Speed, which should collect advice for fast coding - please tell me if you don't understand something (unless there's already a "TODO" there)
Hardware Bugs & Errata, a list of P2 hardware bugs. I feel like I forgot at least one that was previously discovered. (The official doc still doesn't have the RDFAST startup bug.)
Now debate this 2AM coldpost while I go to bed.
(I also added and moved stuff around elsewhere)
Comments
Typos: Memory Access section - First sentence contains the misspelt word "obviosu".
I had seen others but I think you've already fixed them.
Details: HubRAM RDLONG/WRLONG exec times are 9..16 and 3..10 respectively. Not 9..17 and 3..11.
"Hub slice alignment" is a case of not worrying about initial alignment and concentrating just on optimising the loop times. I say this because every recompile or shifted instruction will move the absolute location, and therefore the hubRAM slice number, of the referenced data and/or executing code. That makes every edit a timing minefield if you try to predetermine how neatly the eggbeater aligns upon entering the loop.
Basically, inside or outside the loop, both knowing the relative/cyclic slice order of hubRAM accesses and matching that with associated execution timings is needed. Both are equally important. It's just a lot easier to manage optimising such inside a loop rather than outside.
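To make the relative-versus-absolute point concrete, here is a rough Python sketch of a toy eggbeater model. Everything in it is an assumption for illustration: it supposes 8 slices, that a long at byte address A sits in slice (A >> 2) & 7, that a cog's access window advances one slice per clock, and that RDLONG costs 9 clocks plus 0..7 clocks of waiting. The names (slice_of, rdlong_cost, loop_costs) and the 2-clock loop overhead are made up, and none of this has been checked against real silicon.

```python
HUB_SLICES = 8   # assumed: 8 hub slices, one per cog position
RDLONG_MIN = 9   # assumed best-case RDLONG time in clocks


def slice_of(addr):
    """Slice holding the long at byte address addr (assumed mapping)."""
    return (addr >> 2) % HUB_SLICES


def rdlong_cost(addr, cog, clock):
    """Toy RDLONG cost: minimum time plus clocks spent waiting for the slice."""
    wait = (slice_of(addr) - cog - clock) % HUB_SLICES
    return RDLONG_MIN + wait


def loop_costs(start_addr, iterations, overhead=2, cog=0):
    """Cost of each RDLONG in a loop reading consecutive longs.

    overhead is the (hypothetical) number of clocks spent on other
    instructions between one RDLONG finishing and the next one starting.
    """
    clock = 0
    addr = start_addr
    costs = []
    for _ in range(iterations):
        cost = rdlong_cost(addr, cog, clock)
        costs.append(cost)
        clock += cost + overhead
        addr += 4  # next long lands in the next slice
    return costs


if __name__ == "__main__":
    for start in range(0, 32, 4):
        print(start, loop_costs(start, 4))
```

In this toy model only the first read depends on where the data happens to land; every later read costs the same, because it is fixed by the slice order and the clocks spent between accesses. That is the sense in which the loop is the manageable part to optimise.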
When it comes to doing both loads and stores in an order, that can be optimised too. But loads are not the same phase as stores. I did measure it once-upon-a-time ...
That's how you know it's my authentic artisan post-midnight screeds.
There are probably many typos in there, I don't have spellchecking in my editor.
I've included the possible +1 misalignment penalty in the table for brevity.
Yes, I meant that, the relative alignment of successive ops. Maybe I need to clear up the wording. Though I think absolute alignment comes into play when mixing hub access and CORDIC commands. In pure ASM you can align to exact addresses, so it's possible to use that there.
Though IME when CORDIC is also involved, there's probably time to use the FIFO trick instead.
Yes, I remember... I need to run my own tests so I can write a guide. If just for my own use, because I can never remember how to do it properly. Another under-researched topic is access stalls due to the FIFO in hubexec.
My attitude is that outside a tight loop, it gets way too hard too quickly.
CORDIC commands will count as hub ops, I believe. It's just that they're on a set phase, like all the Prop1 hub ops. They could be used as a reference to compare multiple cogs. I'm guessing now.
Cool.
More errata: If using the internal crystal oscillator, pins P28-P31 should not be used to output high frequency signals.
https://forums.parallax.com/discussion/comment/1520712/#Comment_1520712 Took me way too long to find this.
Oh, forgot about that one (because I was thinking more from the software side)
Yep, 'tis a nasty one. Only the 20 Ohm Fast pin drive strength is enough to trip it up though. All other output modes don't affect it. Even the hungry 2.0 volt DAC drive is fine.
Wasn’t this just a layout issue?
Yeah, found my post where I checked with my boards here:
https://forums.parallax.com/discussion/173205/how-to-kill-a-p2-video-driver-and-probably-usb-etc/p3
This was just a layout error that only affected the P2 Eval board, AFAIK...
Huh, I'd missed that Cluso had proven his board reliable under all tests.
I see Vons trying to come to a resolution but it doesn't look like he did before Chip just decided to use an external crystal oscillator part in place of a plain crystal.
Cluso's board has everything on a single common 3.3 Volt rail. Given the excessive number of 100 nF capacitors that Cluso used, I now suspect it's down to those caps performing better at high frequency and probably the sheer number of them makes them so effective.
The Eval and Edge boards use none at all. For each group of eight I/O pins there is just one 1 uF and one 4.7 uF on the Eval board, and two 4.7 uF on the Edge revA.
I am really happy to hear this. Since the Edge went to a TCXO I assumed that was to mitigate this issue. I'm designing a Pico style P2 board and really wanted to use a crystal. The Eval gerbers are posted in that thread. The Eval board has individually shielded IO lines. And then the XI/XO lines are routed like a differential pair. Or a directional coupler.
There is still a possibility that the canned oscillators have better phase noise than the P2 oscillator. I might try to check that.
Wow, yeah, I suppose that is problematic. And I see Rayman also noted it back then too - https://forums.parallax.com/discussion/comment/1521000/#Comment_1521000