To cache or not to cache... musings on improving Spin/LMM performance
Bill Henning
Some musings prompted by Chip's suggested move to RDOCTAL / WROCTAL.
This change is suggested for P3.
RDxxxx vs RDxxxxC
Performance of all virtual machines (including Spin) would be improved if RDxxxx did not disturb the cache, but bypassed it.
For Spin and other byte code interpreters, any RDxxxx performed during the execution of a byte code flushes the upcoming, already-fetched bytes.
WRxxxx may also damage the cache; I simply don't know. It should update the long in the cache if the address matches, and otherwise leave the cache alone.
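To make the proposed write rule concrete, here is a minimal Python sketch of a single cache line holding four longs (the "four longs" figure comes from the reading of the docs in the comments below). The class and method names are hypothetical and this is a toy model, not a description of the real Propeller hardware. A write always goes to hub RAM, patches the cached copy only if the address falls inside the line, and never evicts the line.

```python
LINE_BYTES = 16  # assumed: one cache line = four 32-bit longs


class CacheLine:
    """Hypothetical single-line cache model (not actual P2 hardware)."""

    def __init__(self, hub):
        self.hub = hub                    # backing hub RAM as a bytearray
        self.base = None                  # hub address of the cached line, or None
        self.data = bytearray(LINE_BYTES)

    def fill(self, addr):
        """Load the line containing addr (what a cached read does on a miss)."""
        self.base = addr - (addr % LINE_BYTES)
        self.data[:] = self.hub[self.base:self.base + LINE_BYTES]

    def holds(self, addr):
        return self.base is not None and self.base <= addr < self.base + LINE_BYTES

    def write_long(self, addr, value):
        """Proposed WRLONG rule; assumes addr is long-aligned.

        Always update hub RAM. If the long is in the cached line, patch it
        there too; otherwise leave the cache untouched (no eviction).
        """
        raw = value.to_bytes(4, "little")
        self.hub[addr:addr + 4] = raw
        if self.holds(addr):
            off = addr - self.base
            self.data[off:off + 4] = raw


hub = bytearray(64)
cache = CacheLine(hub)
cache.fill(0)                        # byte codes at 0..15 are now cached
cache.write_long(4, 0xDEADBEEF)      # inside the line: hub and cache both updated
cache.write_long(32, 1)              # outside the line: hub only, line kept
```

The point of the design is that a write can never cost the interpreter its prefetched byte codes; at worst it keeps the cached copy coherent.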
Simply avoiding such cache "damage" would significantly speed up LMM & byte code interpreters.
Best guess: a 20%+ improvement for LMM and byte code interpreters.
It is also conceptually cleaner for the RDxxxx ops to be known to bypass the cache line, while the RDxxxxC ops fetch from the cache (refilling it when needed).
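The cost of the current behaviour versus the proposed RDxxxx / RDxxxxC split can be illustrated with a toy Python model. It assumes a single four-long cache line; "code" entries stand for cached RDxxxxC-style byte code fetches and "data" entries for plain RDxxxx operand reads. The function name and the trace are made up for illustration and say nothing about real timings.

```python
LINE_BYTES = 16  # assumed: one cache line = four longs


def code_misses(trace, data_reads_bypass):
    """Count byte code fetches that miss the single cache line.

    trace is a list of ("code", addr) or ("data", addr) tuples.
    If data_reads_bypass is False, a plain data read reloads the line
    with its own address, evicting the prefetched byte codes.
    If True, the data read goes straight to hub RAM and the cache is untouched.
    """
    cached_line = None
    misses = 0
    for kind, addr in trace:
        line = addr // LINE_BYTES
        if kind == "data":
            if not data_reads_bypass:
                cached_line = line   # cache "damage": byte codes flushed
            continue
        if line != cached_line:      # byte code not in the cache: refill
            misses += 1
            cached_line = line
    return misses


# Hypothetical trace: 16 consecutive byte codes, each reading one operand
# from a distant part of hub RAM.
trace = []
for i in range(16):
    trace.append(("code", i))
    trace.append(("data", 1000 + 4 * i))

flushing = code_misses(trace, data_reads_bypass=False)   # 16 misses
bypassing = code_misses(trace, data_reads_bypass=True)   # 1 miss
```

This trace is a worst case chosen for clarity: when every operand read evicts the line, every byte code fetch misses, while with bypassing reads only the first fetch does. Real programs fall somewhere in between, which is why the 20% figure above remains a guess.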
Comments
My reading of the docs was that RDBYTE/WORD/LONG would always fetch four longs, overwriting the contents of the cache.
I am delighted to be wrong on this one.