Dual Port LUT RAM - Do we need RD/WRLUT ?

Cluso99 · 2016-04-22 00:21

Now that Chip has made the LUT dual port, do we need RDLUT & WRLUT ?

If we now treat COG RAM & LUT RAM as a contiguous block of cog memory, we can use RD/WR/RF/WF-LONG/WORD/BYTE etc to load cog/lut simply by addressing the cog/lut register contiguously $000..$3FF. For example:

RD <register>,<hubpointer>

<register> is normally a 9-bit register address in what is now the lower half of cog ram. It could be modified by an AUGDS D/#,S/# instruction, making <register> a 10-bit address that reaches the whole of the new cog ram.
<hubpointer> is normally a 9-bit register address in what is now the lower half of cog ram, and that register contains the 20-bit hub address. It could likewise be modified by an AUGDS D/#,S/# instruction, making <hubpointer> a 10-bit address, so the register holding the 20-bit hub address could be anywhere in the new cog ram.

PTRA/PTRB/ADDRA/ADDRB would now use 10-bits when referring to cog registers, so the whole of the new cog ram could be selected.

By using the AUGDS D/#,S/# instruction, any existing AND/OR/ADD/etc instruction could now operate on the whole of the new cog ram.

Using the above method, the compiler could be extended to automatically insert the "AUGDS" instruction when required, simplifying programming (similar to how the jmp instruction works).
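To make the auto-insert idea concrete, here is a minimal Python sketch of what an assembler pass might do. It is only a model of the proposal: the function name emit and the way the prefix carries the upper address bits are assumptions made for illustration, not the actual AUGDS encoding or any real tool's behaviour.

```python
def emit(op, d_addr, s_addr):
    """Return pseudo-instructions for `op D,S` in a unified $000..$3FF space.

    A prefix tuple is added only when an operand needs a 10th address bit.
    The ("AUGDS", d_hi, s_hi) form is a hypothetical encoding.
    """
    for addr in (d_addr, s_addr):
        if not 0 <= addr <= 0x3FF:
            raise ValueError("address outside $000..$3FF")
    out = []
    if d_addr > 0x1FF or s_addr > 0x1FF:
        # Carry bit 9 of each operand in the prefix (assumed layout).
        out.append(("AUGDS", d_addr >> 9, s_addr >> 9))
    # The instruction itself keeps its normal 9-bit D and S fields.
    out.append((op, d_addr & 0x1FF, s_addr & 0x1FF))
    return out
```

With this model, emit("ADD", 0x010, 0x020) produces a single instruction, while emit("ADD", 0x210, 0x020) produces a prefix followed by the instruction, which is the cost the programmer would no longer have to think about.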

This should simplify the explanation of the programming model. No need to refer to LUT vs COG any more.

jmg · 2016-04-22 00:39

Cluso99 wrote: »

Now that Chip has made the LUT dual port, do we need RDLUT & WRLUT ?

Good point.
Having the LUT placed above COG, in a simple memory map does make things easier.
Depends how much logic is involved in that mapping, and in RDLUT & WRLUT themselves.

I think now they can load # easier and naturally wrap into LUT space only ?

Cluso99 · 2016-04-22 00:53

I was just checking the AUGDS instruction. Not sure where this is up to, so we might require another instruction to achieve what I said above.

The only complication in coding cog ram is that we need to skip over the special registers $1F0..$1FF. The compiler will ultimately need to generate an error if cog code places an instruction or data in the special registers (much like the reason FIT $1F0 is used).
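A rough Python sketch of the kind of check the compiler would need. The name check_image and the idea of passing an origin and a length in longs are assumptions for illustration; only the $1F0..$1FF range and the $000..$3FF span come from the discussion above.

```python
SPECIAL_FIRST = 0x1F0   # first special register
SPECIAL_LAST = 0x1FF    # last special register
TOP = 0x3FF             # last long of the combined cog+lut space

def check_image(origin, length):
    """Validate a block of `length` longs placed at `origin`.

    Returns the last address used, or raises ValueError if the block
    overlaps the special registers or runs past $3FF.
    """
    if length < 1 or origin < 0:
        raise ValueError("origin must be >= 0 and length >= 1")
    end = origin + length - 1
    if end > TOP:
        raise ValueError("block runs past $3FF")
    if origin <= SPECIAL_LAST and end >= SPECIAL_FIRST:
        raise ValueError("block overlaps special registers $1F0..$1FF")
    return end
```

So a block of $1F0 longs at $000 is accepted, a block at $200 filling the LUT half is accepted, and anything that spills from $1EF into $1F0 is rejected, which is the same job FIT $1F0 does today.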

FWIW with dual port lut, instructions can execute from the lut section and access lut ram concurrently (i.e. as 2-clock instructions).

jmg said
I think now they can load # easier and naturally wrap into LUT space only ?

Do you mean that the program code can wrap into LUT?

Code can run between cog and lut naturally using any of the call/jmp/djnz/etc, either directly or indirectly. IIRC there was no problem under the current mapping.

The big advantage is that now we can refer to the space as a singular contiguous cog ram space with a few caveats, rather than two separate cog and lut spaces together with caveats.

Seairth · 2016-04-22 02:19

I dunno. I think you are trading one set of complications for another set. LUT (AUX!) memory is not the same as COG memory, no matter how you present it.

Electrodude · 2016-04-22 02:29

I thought LUT was only at $200..$3FF (maybe times 4 because of byte addressing?) for the sake of instruction fetching, not memory access?

jmg · 2016-04-22 02:53

Cluso99 wrote: »

Now that Chip has made the LUT dual port..

Maybe Chip needs to expand on what the LUT Dual port change means, to how the system can now operate ?

cgracey · 2016-04-22 03:06

jmg wrote: »

Cluso99 wrote: »

Now that Chip has made the LUT dual port..

Maybe Chip needs to expand on what the LUT Dual port change means, to how the system can now operate ?

All it does is allow the streamer to read the LUT at the same time the cog reads or writes it. Nothing else. This enables the cog to update the LUT at the same time the streamer plays it. Or, you could say that while the streamer plays the LUT, the cog can still use another portion of the LUT for its own purposes, without glitching the streamer's reads.

Cluso99 · 2016-04-22 03:10

Seairth wrote: »

I dunno. I think you are trading one set of complications for another set. LUT (AUX!) memory is not the same as COG memory, no matter how you present it.

I am certainly not seeing it that way. Perhaps I am missing something.

Electrodude wrote: »

I thought LUT was only at $200..$3FF (maybe times 4 because of byte addressing?) for the sake of instruction fetching, not memory access?

Currently Cog Ram is $000..$1FF and LUT is $000..$1FF (or mapped to $200..$3FF). Both are addressed as longs only.
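As a rough illustration of that mapping, here is a small Python model (not P2 code). The function name decode and the bank labels are made up for this sketch; it only shows how a 10-bit long address would split into a memory and a 9-bit index if the two were treated as contiguous.

```python
def decode(addr10):
    """Split a unified long address $000..$3FF into (bank, 9-bit index)."""
    if not 0 <= addr10 <= 0x3FF:
        raise ValueError("address outside $000..$3FF")
    # Bit 9 selects the memory; the low 9 bits index a long within it.
    bank = "lut" if addr10 & 0x200 else "cog"
    return bank, addr10 & 0x1FF
```

Here decode(0x1FF) gives ("cog", 0x1FF) and decode(0x200) gives ("lut", 0x000), so LUT long $000 and unified address $200 are the same location.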

Seairth · 2016-04-22 10:26

Cluso99 wrote: »

Seairth wrote: »

I dunno. I think you are trading one set of complications for another set. LUT (AUX!) memory is not the same as COG memory, no matter how you present it.

I am certainly not seeing it that way. Perhaps I am missing something.

As Chip keeps pointing out, the LUT is used by the streamer.

Also, with the limitation of 9-bit addresses (without resorting to AUGx), there is no way for the LUT memory to be accessed in the same way as the COG memory. I see that you are suggesting a solution for that, but I don't think it's that straightforward (hence my original comment).

Seairth · 2016-04-22 10:30

cgracey wrote: »

All it does is allow the streamer to read the LUT at the same time the cog reads or writes it. Nothing else. This enables the cog to update the LUT at the same time the streamer plays it. Or, you could say that while the streamer plays the LUT, the cog can still use another portion of the LUT for its own purposes, without glitching the streamer's reads.

So... I wonder if this could be used to quickly execute code stored in external RAM...
edit: Never mind. I just reread the streamer docs. It writes to the HUB RAM, not LUT.

Seairth · 2016-04-22 10:42

Crazy question. Would it be possible for the streamer to read from the HUB instead? And if that were the case, could you get rid of the LUTs altogether and free up enough die space to double HUB memory? That would really simplify the memory model.

edit: no, I'm not advocating more changes. I'd still rather have the chip stay as it is now than to prolong its development any further. This was just a thought that went in a different direction than @Cluso99 had gone.

Electrodude · 2016-04-22 13:58

RD/WRLUT are absolutely critical to the P2. If they are removed or anything about them changed, the P2 will probably never become a reality.

cgracey · 2016-04-22 17:33

Seairth wrote: »

Crazy question. Would it be possible for the streamer to read from the HUB instead? And if that were the case, could you get rid of the LUTs altogether and free up enough die space to double HUB memory? That would really simplify the memory model.

edit: no, I'm not advocating more changes. I'd still rather have the chip stay as it is now than to prolong its development any further. This was just a thought that went in a different direction than @Cluso99 had gone.

The magic of the LUT, in relation to the streamer, is that it can be randomly accessed, not just sequentially, like the hub RAM. This is important during DDS mode, where the step size may be greater than one location. Also, color lookups for video must be random.
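To show why a step size greater than one needs random access, here is a minimal Python sketch of a phase accumulator indexing a 512-entry table. The 32-bit accumulator width and the use of its top 9 bits as the index are assumptions for illustration; the streamer's actual arrangement is not described in this thread.

```python
LUT_BITS = 9  # 512-entry table

def dds_indices(step, count, phase=0, acc_bits=32):
    """Return the table indices visited by a phase accumulator.

    Each sample uses the top LUT_BITS of the accumulator as the index,
    then adds `step`, wrapping at 2**acc_bits.
    """
    mask = (1 << acc_bits) - 1
    shift = acc_bits - LUT_BITS
    indices = []
    for _ in range(count):
        indices.append(phase >> shift)
        phase = (phase + step) & mask
    return indices
```

With a step of 3 << 23 the indices go 0, 3, 6, 9 and eventually wrap from 510 to 1, so consecutive reads are not to consecutive locations. A sequential-only memory such as the hub FIFO path cannot serve that pattern, and the same holds for pixel values used as color-lookup indices.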

Seairth · 2016-04-22 18:20

cgracey wrote: »

Seairth wrote: »

Crazy question. Would it be possible for the streamer to read from the HUB instead? And if that were the case, could you get rid of the LUTs altogether and free up enough die space to double HUB memory? That would really simplify the memory model.

edit: no, I'm not advocating more changes. I'd still rather have the chip stay as it is now than to prolong its development any further. This was just a thought that went in a different direction than @Cluso99 had gone.

The magic of the LUT, in relation to the streamer, is that it can be randomly accessed, not just sequentially, like the hub RAM. This is important during DDS mode, where the step size may be greater than one location. Also, color lookups for video must be random.

Excellent point! 'nuff said.
