To cache or not to cache... musings on improving Spin/LMM performance

Bill Henning · 2013-12-02 18:24

Some musings brought on by Chip's suggested move to RDOCTAL / WROCTAL

This change is suggested for P3

RDxxxx vs RDxxxxC

Performance of all virtual machines (including Spin) would be improved if RDxxxx did not disturb the cache, but bypassed it.

For Spin and other byte code interpreters, any RDxxxx performed during the execution of a byte code flushes the upcoming byte codes that were already fetched into the cache.

WRxxxx may also damage the cache; I simply don't know. It should update the long in the cache if the address matches, and otherwise leave the cache alone.

Simply avoiding such cache "damage" would significantly speed up LMM & byte code interpreters.

Best guess: a 20%+ improvement for LMM and byte code interpreters.

It is also conceptually cleaner to have the RDxxxx ops be known to bypass the cache line, and RDxxxxC ops fetch from the cache (and refill cache when needed).

cgracey · 2013-12-02 20:01

RDBYTE/RDWORD/RDLONG already bypass the cache. Only RDBYTEC/RDWORDC/RDLONGC/RDQUADC reload the cache, and only if they access an address that is outside the last-read quad. RDQUAD always resets the cache address. The only way to flush the cache is to execute a CACHEX instruction.

Bill Henning · 2013-12-02 20:03

Excellent!

My reading of the docs was that RDBYTE/WORD/LONG would always fetch four longs, overwriting the contents of the cache.

I am delighted to be wrong on this one.

cgracey wrote: »

RDBYTE/RDWORD/RDLONG already do not flush the cache. Only RDBYTEC/RDWORDC/RDLONGC/RDQUADC reload the cache if they access an address that is outside of the last-read quad. RDQUAD always resets the cache address.
