New p2docs pages: Errata and Optimization Guide

I just added two new pages to the p2docs site:
Optimization and Coding for Speed, which should collect advice for fast coding - please tell me if you don't understand something (unless there's already a "TODO" there)
Hardware Bugs & Errata, a list of P2 hardware bugs. I feel like I forgot at least one that was previously discovered. (The official doc still doesn't have the RDFAST startup bug.)
Now debate this 2AM coldpost while I go to bed.
(I also added and moved stuff around elsewhere)
Comments
Typos: Memory Access section - First sentence contains the misspelt word "obviosu".
I had seen others but I think you've already fixed them.
Details: HubRAM RDLONG/WRLONG exec times are 9..16 and 3..10 respectively, not 9..17 and 3..11.
"Hub slice alignment" is a case of not worrying about initial alignment and concentrating just on optimising the loop times. I say this because every recompile or shifted instruction will move the absolute location, and therefore the hubRAM slice number, of the referenced data and/or executing code. That makes every edit a timing minefield if you try to predetermine how neatly the eggbeater aligns upon entering the loop.
Basically, inside or outside the loop, you need both to know the relative/cyclic slice order of hubRAM accesses and to match that with the associated execution timings. Both are equally important. It's just a lot easier to manage that optimisation inside a loop than outside.
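To illustrate why absolute alignment is fragile, here is a minimal Python sketch of how a hub address maps to a slice number. It assumes 8 slices selected by address bits [4:2], which is my reading of the hub RAM description; the address 0x1000 and the function name hub_slice are made up for the example.

```python
# Assumption: 8 hub RAM slices, each one long (4 bytes) wide,
# selected by bits [4:2] of the byte address.
NUM_SLICES = 8

def hub_slice(byte_addr: int) -> int:
    """Return the slice number holding the long at byte_addr."""
    return (byte_addr >> 2) % NUM_SLICES

# Hypothetical data address before and after an edit that inserts
# one extra long (e.g. one instruction) somewhere in front of it.
before = 0x1000
after = before + 4

print(hub_slice(before), hub_slice(after))  # slice number moves by one
```

Under that assumption, a one-long shift moves the data to the neighbouring slice, so the wait on entering the loop changes, while the relative slice order of accesses inside the loop stays the same.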
When it comes to doing both loads and stores in an order, that can be optimised too. But loads are not the same phase as stores. I did measure it once-upon-a-time ...
That's how you know these are my authentic artisan post-midnight screeds.
There are probably many typos in there; I don't have spellchecking in my editor.
I've included the possible +1 disalignment penalty in the table for brevity.
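For anyone comparing the two sets of numbers, this small Python sketch shows how they relate. The base ranges are the ones quoted above; the assumption that the penalty is exactly one extra clock on the worst case is taken from the "+1" wording here, and the names BASE and with_penalty are only for illustration.

```python
# Base execution times in clocks (min, max), as quoted in this thread.
BASE = {"RDLONG": (9, 16), "WRLONG": (3, 10)}

# Assumption: a disaligned access costs at most one extra clock.
PENALTY = 1

def with_penalty(rng):
    """Widen a (min, max) range to include the possible +1 penalty."""
    lo, hi = rng
    return (lo, hi + PENALTY)

for name, rng in BASE.items():
    lo, hi = with_penalty(rng)
    print(f"{name}: aligned {rng[0]}..{rng[1]}, with penalty {lo}..{hi}")
```

So 9..17 and 3..11 are the aligned ranges with the worst case bumped by one.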
Yes, I meant that: the relative alignment of successive ops. Maybe I need to clear up the wording. Though I think absolute alignment comes into play when mixing hub access and CORDIC commands. In pure ASM you can align to exact addresses, so it's possible to use that there.
Though IME when CORDIC is also involved, there's probably time to use the FIFO trick instead.
Yes, I remember... I need to run my own tests so I can write a guide, if just for my own use, because I can never remember how to do it properly. Another under-researched topic is access stalls due to the FIFO in hubexec.
My attitude is outside a tight loop, it gets way too hard too quickly.
CORDIC commands will count as hub ops, I believe. It's just that they're on a set phase, like all the Prop1 hub ops. They could be used as a reference to compare multiple cogs. I'm guessing now.
Cool.