3D teapot demo - Page 3 — Parallax Forums

3D teapot demo


Comments

  • Rayman Posts: 15,035

    @Wuerfel_21 you might be a perfectionist :)

    I would have moved on…

  • Wuerfel_21 Posts: 5,272
    edited 2025-03-15 00:45

    Of course.

    (Though as noted, I had that idea while writing an unrelated program)

  • Another interesting thought wrt. PSRAM usage: If I write a custom driver, I think I could support a "hybrid" mode between fast and slow modes (sysclk/3 and sysclk/2 for QPI PSRAM, sysclk/2 and sysclk/1 for HyperRAM). This would normally use the slower, more reliable speed, but certain transfers could opt into the faster, unreliable speed. Notably, everything related to textures and the framebuffer should be pretty resilient against bit errors and benefit from faster bulk transfer. A risk here is that if the command phase gets corrupted, unrelated memory may be written to, which would be pretty bad. (Though I think the problem usually happens when data is going from the RAM chip to the P2)
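    A minimal sketch of how such a driver might choose the rate per transfer. All names here (`Kind`, `DIVIDERS`, `data_divider`, `command_divider`) are hypothetical. Keeping the command/address phase at the slow rate even for fast transfers is an assumption aimed at the corrupted-command risk described in the post, not something the post specifies, and whether the P2 can switch rates partway through a transaction is not established here.

```python
from enum import Enum


class Kind(Enum):
    TEXTURE = 1
    FRAMEBUFFER = 2
    OTHER = 3  # code, vertex data, display lists, ...


# (slow, fast) sysclk dividers per RAM type, as listed in the post.
DIVIDERS = {
    "psram": (3, 2),     # QPI PSRAM: sysclk/3 reliable, sysclk/2 fast
    "hyperram": (2, 1),  # HyperRAM: sysclk/2 reliable, sysclk/1 fast
}

# Bulk data that can shrug off an occasional flipped bit.
ERROR_TOLERANT = {Kind.TEXTURE, Kind.FRAMEBUFFER}


def command_divider(ram: str) -> int:
    """Command/address phase: always the slow, reliable rate (assumption),
    because a corrupted address could send a write to unrelated memory."""
    return DIVIDERS[ram][0]


def data_divider(ram: str, kind: Kind) -> int:
    """Data phase: error-tolerant bulk transfers opt into the fast rate."""
    slow, fast = DIVIDERS[ram]
    return fast if kind in ERROR_TOLERANT else slow
```

    Everything not flagged as error-tolerant falls back to the slow divider, so the default stays on the reliable side.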

  • evanh Posts: 16,345
    edited 2025-03-15 23:11

    @Wuerfel_21 said:
    ...(Though I think the problem usually happens when data is going from the RAM chip to the P2)

    Correct. Write timings are consistently precise because the Prop2 produces the bus clock, at least for sysclock/2.

    Sysclock/1 has struggled in both directions. It was hard to construct a suitable clock-data phase relationship for the write timings at sysclock/1. I hope to make progress on this with the P2Stamp. It may still require adding a small capacitor to tweak the clock lag, but I'm hopeful that an unregistered clock plus registered data will suffice for writes.

    Reads will need all the pin mode combinations. I made some progress on mapping those when testing SD cards at speed. The nice part here is that startup calibration can write to as much of the RAM as it needs to tune itself, which wasn't an option with the SD cards.
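    A rough sketch of what such a startup calibration could look like, assuming the candidate read settings (delay and registered/unregistered pin mode combinations) can be numbered in order of effective sampling delay. The `write`/`read` callbacks and the index encoding are hypothetical; the "pick the middle of the longest passing window" rule is a common calibration heuristic, not necessarily what evanh's driver does.

```python
from typing import Callable


def calibrate(num_settings: int,
              write: Callable[[int, bytes], None],
              read: Callable[[int, int, int], bytes],
              base: int = 0,
              pattern: bytes = bytes(range(256)) * 16) -> int:
    """Return the read setting in the middle of the longest passing run.

    Settings 0..num_settings-1 are assumed to be ordered by effective
    sampling delay (hypothetical encoding of delay + pin mode combos).
    """
    # Writes are assumed reliable, so one write serves every read trial.
    write(base, pattern)
    passes = [read(s, base, len(pattern)) == pattern
              for s in range(num_settings)]

    best_start, best_len, run_start = 0, 0, None
    for i, ok in enumerate(passes + [False]):  # sentinel closes a trailing run
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    if best_len == 0:
        raise RuntimeError("no read setting passed calibration")
    # The centre of the window leaves the most margin against drift.
    return best_start + best_len // 2
```

    Because the test pattern can be as large as the RAM allows, a longer pattern gives more confidence that a passing setting is genuinely stable.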

  • Wuerfel_21 Posts: 5,272
    edited 2025-03-15 23:33

    Unrelatedly, here's a "3.2" version of the same teapot demo. I made some further tweaks:

    7597 µs E2E, 2752 µs GEO, 2802 µs RAS <- demo 3.0
    6812 µs E2E, 2227 µs GEO, 2802 µs RAS, 1617 µs upload <- demo 3.1
    6497 µs E2E, 2139 µs GEO, 2576 µs RAS, 1616 µs upload <- demo 3.2
    

    300-something microseconds saved. The main thing is opportunistically switching to the constant-L version of the texture mapping loop (48 cycles/pixel instead of 56), which overall saves time on both GEO and RAS. The latter is obvious, but it turns out the check is very cheap (10 cycles in fail case) and being able to skip the L gradient computation sometimes makes up for it.
    Other than that, there are a bunch of more optimized routines and improved control flow.
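    Two quick checks on these numbers: the per-stage differences between 3.1 and 3.2, and a rough model of why a 10-cycle check can pay for itself. The break-even model is an assumption, not from the post: it treats the skipped L gradient setup cost as an unknown parameter (`gradient_cycles`, hypothetical) and ignores whatever the check costs when it succeeds.

```python
# (E2E, GEO, RAS) in microseconds, from the 3.1 and 3.2 rows of the timings.
V31 = (6812, 2227, 2802)
V32 = (6497, 2139, 2576)


def deltas(old, new):
    """Per-stage time saved going from old to new."""
    return tuple(o - n for o, n in zip(old, new))


GENERAL_CPP = 56      # cycles/pixel, general texture mapping loop
CONST_L_CPP = 48      # cycles/pixel, constant-L loop
CHECK_FAIL_COST = 10  # cycles spent when the constant-L check fails


def break_even_hit_rate(span_pixels: int, gradient_cycles: int = 0) -> float:
    """Hit rate p at which p * saved_per_hit == (1 - p) * CHECK_FAIL_COST.

    gradient_cycles is the (unknown) cost of the L gradient setup that a
    hit skips; the cost of the check on a hit is ignored.
    """
    saved_per_hit = (GENERAL_CPP - CONST_L_CPP) * span_pixels + gradient_cycles
    return CHECK_FAIL_COST / (saved_per_hit + CHECK_FAIL_COST)
```

    The deltas come out as 315 µs E2E, 88 µs GEO and 226 µs RAS. For a 10-pixel span with no gradient savings at all, the check only needs to succeed about 11% of the time (10/90) to break even, and any skipped gradient work lowers that further.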

  • rogloh Posts: 5,886

    @Wuerfel_21 said:
    Another interesting thought wrt. PSRAM usage: If I write a custom driver, I think I could support a "hybrid" mode between fast and slow modes (sysclk/3 and sysclk/2 for QPI PSRAM, sysclk/2 and sysclk/1 for HyperRAM). This would normally use the slower, more reliable speed, but certain transfers could opt into the faster, unreliable speed. Notably, everything related to textures and the framebuffer should be pretty resilient against bit errors and benefit from faster bulk transfer. A risk here is that if the command phase gets corrupted, unrelated memory may be written to, which would be pretty bad. (Though I think the problem usually happens when data is going from the RAM chip to the P2)

    For sysclk/1 writes, you would need to delay the HyperRAM clock phase for reliable command and data latching, and @evanh will no doubt know the intricacies from his testing of different boards and his high-speed captures. If that is the case, it may make sense to keep writes at sysclk/2 and only do reads at sysclk/1 optionally. I'm not sure the registered/unregistered IO pin setting alone will always resolve the data timing at sysclk/1 rates, unlike sysclk/2, which gives you the extra steps to adjust the clock phase.

  • evanh Posts: 16,345

    I've ordered a PLCC84 socket to fit in the breakout board that Knivd supplied with the P2Stamp module. Decided I can just pull ten pins out of that to isolate the HyperRAM pins.

  • Hmm, so it sounds like sysclk/1 is really treacherous. Maybe skip that and focus on the PSRAMs with that idea. Though the win from going sysclk/3 -> sysclk/2 isn't that big, the 96MB-type setups that'd need it are already rather bandwidth-constrained. All of the teapot timings have been with @MXX's 96MB board at 252MHz/2; going to 320MHz/3 would actually slow things down.
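    The arithmetic behind that last point, as a tiny sketch that treats sysclk/N simply as the rate at which the PSRAM bus is clocked (`bus_mhz` is a hypothetical helper, not a driver API):

```python
def bus_mhz(sysclk_mhz: float, divider: int) -> float:
    """Effective PSRAM bus clock for a given sysclk and divider."""
    return sysclk_mhz / divider


current = bus_mhz(252, 2)      # setup used for the teapot timings
alternative = bus_mhz(320, 3)  # faster sysclk, but a slower divider
```

    That is 126 MHz versus about 106.7 MHz, so the higher sysclk loses roughly 15% of bus rate. At a fixed sysclk, going from /3 to /2 is a 1.5x bandwidth gain.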
    I'm also thinking it would be better if, instead of linear rectangles, PSRAM framebuffers were stored with 3 lines packed together in 2048-byte blocks (leaving 128 bytes of padding). That way no scanline ever crosses a row boundary, unless 4-bit mode is used. You'd need less dense packing (one line with 384 padding bytes) to be 4-bit compliant. Not that those padding areas need to be wasted; they could buffer audio or something.
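    A sketch of the block-packed addressing with a row-crossing check. The 640-byte line length is implied by (2048 - 128) / 3; the row sizes used below (1024 bytes in 4-bit mode, 2048 bytes when two chips are paired in 8-bit mode) are assumptions about the PSRAM page size, and all function names are hypothetical.

```python
# 3 scanlines per 2048-byte block; 640 bytes per line is implied by
# (2048 - 128) / 3. Row sizes passed to layout_ok are assumptions.
LINE_BYTES = 640


def line_address(y: int, lines_per_block: int = 3,
                 block_bytes: int = 2048, base: int = 0) -> int:
    """Byte address of scanline y in the block-packed layout."""
    block, slot = divmod(y, lines_per_block)
    return base + block * block_bytes + slot * LINE_BYTES


def crosses_row(addr: int, length: int, row_bytes: int) -> bool:
    """True if [addr, addr + length) spans more than one RAM row."""
    return addr // row_bytes != (addr + length - 1) // row_bytes


def layout_ok(height: int, row_bytes: int, **layout) -> bool:
    """True if no scanline of the given layout crosses a row boundary."""
    return not any(
        crosses_row(line_address(y, **layout), LINE_BYTES, row_bytes)
        for y in range(height))
```

    With 2048-byte rows the dense layout never crosses a boundary; with 1024-byte rows the second line of each block (bytes 640 to 1279) does. One line per 1024-byte block (384 bytes of padding) passes with either row size.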

  • rogloh Posts: 5,886

    Yes scan line alignment to PSRAM rows is always a handy optimization where possible.

  • evanh Posts: 16,345

    @Wuerfel_21 said:
    Hmm, so it sounds like sysclk/1 is really treacherous. Maybe skip that and focus on the PSRAMs with that idea.

    It's the only real challenge! :) https://forums.parallax.com/discussion/comment/1561510/#Comment_1561510

  • evanh Posts: 16,345
    edited 2025-03-18 04:12

    @evanh said:
    I've ordered a PLCC84 socket to fit in the breakout board that Knivd supplied with the P2Stamp module. Decided I can just pull ten pins out of that to isolate the HyperRAM pins.

    Done and tested with Roger's 4-bit SD add-on board. Now to get back up to speed with HyperRAM ...
