Dual Port LUT RAM - Do we need RD/WRLUT ?
Cluso99
Posts: 18,069
in Propeller 2
Now that Chip has made the LUT dual port, do we need RDLUT & WRLUT ?
If we now treat COG RAM & LUT RAM as a contiguous block of cog memory, we can use RD/WR/RF/WF-LONG/WORD/BYTE etc. to load cog/lut simply by addressing the cog/lut registers contiguously as $000..$3FF. For example:
RD <register>,<hubpointer>
<register> is normally a 9-bit register address in the now lower half of cog ram. It could be modified by an AUGDS D/#,S/# instruction, making the <register> a 10-bit address, addressing the whole of the new cog ram.
<hubpointer> is normally a 9-bit register address in the now lower half of cog ram which contains the 20-bit hub address. It could be modified by an AUGDS D/#,S/# instruction, making the <hubpointer> a 10-bit address, addressing the whole of the new cog ram, which would contain the 20-bit hub address.
PTRA/PTRB/ADDRA/ADDRB would now use 10-bits when referring to cog registers, so the whole of the new cog ram could be selected.
By using the AUGDS D/#,S/# instruction, any existing AND/OR/ADD/etc. instruction could now operate on the whole of the new cog ram.
Using the above method, the compiler could be extended to automatically insert the "AUGDS" instruction when required, simplifying programming (similar to how the jmp instruction works).
This should simplify the explanation of the programming model. No need to refer to LUT vs COG any more.
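To make the auto-insert idea concrete, here is a rough Python sketch of what the assembler side might do. It assumes the proposal as described (a plain instruction carries 9-bit D and S fields, and a preceding AUGDS supplies the 10th bit of either one). The function names and the tuple output are hypothetical, purely for illustration, and are not a real P2 encoding.

```python
COG_TOP = 0x1FF      # highest address reachable with a plain 9-bit field
UNIFIED_TOP = 0x3FF  # highest address in the proposed unified cog+LUT map


def split_operand(addr):
    """Return (needs_prefix, low9) for a register address in $000..$3FF."""
    if not 0 <= addr <= UNIFIED_TOP:
        raise ValueError(f"address ${addr:03X} is outside $000..$3FF")
    return addr > COG_TOP, addr & 0x1FF


def assemble(mnemonic, d, s):
    """Emit pseudo-instructions, adding an AUGDS prefix only when needed.

    The tuples are an illustrative stand-in for real opcodes:
    ("AUGDS", d_bit9, s_bit9) followed by (mnemonic, d_low9, s_low9).
    """
    d_hi, d_lo = split_operand(d)
    s_hi, s_lo = split_operand(s)
    out = []
    if d_hi or s_hi:
        out.append(("AUGDS", int(d_hi), int(s_hi)))
    out.append((mnemonic, d_lo, s_lo))
    return out
```

With this, assemble("ADD", 0x010, 0x020) yields a single instruction, while assemble("ADD", 0x210, 0x020) yields the AUGDS prefix followed by the ADD, so the programmer never writes the prefix by hand.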
Comments
Having the LUT placed above COG, in a simple memory map does make things easier.
Depends how much logic is involved in that mapping, and in RDLUT & WRLUT themselves.
I think now they can load # easier and naturally wrap into LUT space only ?
The only complication in coding cog ram is that we need to skip over the special registers $1F0..$1FF. The compiler ultimately will need to generate an error if cog encounters instruction/data for the special registers (sort of like/why FIT $1F0 is used)
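A minimal sketch of that compiler check, in Python. It assumes the unified $000..$3FF map with the special registers at $1F0..$1FF as stated above; the function name and error messages are made up for illustration.

```python
SPECIAL_FIRST = 0x1F0  # first special register
SPECIAL_LAST = 0x1FF   # last special register
UNIFIED_TOP = 0x3FF    # top of the proposed unified cog+LUT map


def check_block(origin, length):
    """Validate a block of `length` longs placed at register `origin`.

    Returns the last address used, or raises ValueError if the block
    leaves the map or overlaps the special registers.
    """
    if length <= 0:
        raise ValueError("block must contain at least one long")
    end = origin + length - 1
    if origin < 0 or end > UNIFIED_TOP:
        raise ValueError("block does not fit in $000..$3FF")
    if origin <= SPECIAL_LAST and end >= SPECIAL_FIRST:
        raise ValueError("block overlaps special registers $1F0..$1FF")
    return end
```

A block of $1F0 longs at $000 passes (it ends at $1EF), as does a full 512-long block at $200. A block of $1F1 longs at $000 is rejected, which is the same job FIT $1F0 does today.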
FWIW with dual port lut, the instructions can execute from the lut section and access lut ram concurrently (i.e. 2 clock instructions).
Do you mean that the program code can wrap into LUT?
Code can run between cog and lut naturally using any of the call/jmp/djnz/etc, either directly or indirectly. IIRC there was no problem under the current mapping.
The big advantage is that now we can refer to the space as a singular contiguous cog ram space with a few caveats, rather than two separate cog and lut spaces together with caveats.
Maybe Chip needs to expand on what the LUT Dual port change means, to how the system can now operate ?
All it does is allow the streamer to read the LUT at the same time the cog reads or writes it. Nothing else. This enables the cog to update the LUT at the same time the streamer plays it. Or, you could say that while the streamer plays the LUT, the cog can still use another portion of the LUT for its own purposes, without glitching the streamer's reads.
Currently Cog Ram is $000..$1FF and LUT is $000..$1FF (or mapped to $200..$3FF). Both are addressed as longs only.
As Chip keeps pointing out, the LUT is used by the streamer.
Also, with the limitation of 9-bit addresses (without resorting to AUGx), there is no way for the LUT memory to be accessed in the same way as the COG memory. I see that you are suggesting a solution for that, but I don't think it's that straightforward (hence my original comment).
So... I wonder if this could be used to quickly execute code stored in external RAM...
edit: Never mind. I just reread the streamer docs. It writes to the HUB RAM, not LUT.
edit: no, I'm not advocating more changes. I'd still rather have the chip stay as it is now than to prolong its development any further. This was just a thought that went in a different direction than @Cluso99 had gone.
The magic of the LUT, in relation to the streamer, is that it can be randomly accessed, not just sequentially, like the hub RAM. This is important during DDS mode, where the step size may be greater than one location. Also, color lookups for video must be random.
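To illustrate why DDS needs random access, here is a generic phase-accumulator sketch in Python. It assumes a 512-entry table indexed by the top 9 bits of a 32-bit accumulator; that is the textbook DDS arrangement, not a statement of how the streamer is actually wired.

```python
LUT_SIZE_BITS = 9   # 512-entry table, matching the LUT size
PHASE_BITS = 32     # assumed accumulator width (illustrative)


def dds_indices(step, count, phase=0):
    """Return the table indices visited by a DDS phase accumulator.

    Each sample uses the top 9 bits of the phase as the table index,
    then advances the phase by `step`, wrapping at 2**32.
    """
    mask = (1 << PHASE_BITS) - 1
    shift = PHASE_BITS - LUT_SIZE_BITS
    indices = []
    for _ in range(count):
        indices.append(phase >> shift)
        phase = (phase + step) & mask
    return indices
```

With a step of 3 << 23 the accumulator advances three table entries per sample, so successive reads land on 0, 3, 6, 9 and so on, and wrap past entry 511 back to the start. A memory that can only be read sequentially cannot serve that pattern.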
Excellent point! 'nuff said.