SETQ2 with LUT RAM

ke4pjw · 2022-02-15 06:06

I need a quick sanity check. SETQ2 would be used with RDLONG and WRLUT, correct? Something like...

     mov       t2, #$200  ' Set the LUT address in t2
     setq2     #10-1      ' get 10 longs from hub to LUT
     rdlong    t1, ptra   ' populate intermediate register with HUB data @ptra
     wrlut     t1, t2     ' write intermediate register data into LUT @t2

Would it increment the intermediate cog register (t2) that is used to shuffle data from HUBram into the LUT?

Am I even close to getting this right?

As always, thanks in advance.

--Terry

evanh · 2022-02-15 07:14

Just the SETQ2+RDLONG is all you need. The SETQ2 completely changes how RDLONG operates.

What you've got there copies ten longwords from hubRAM (beginning at the address held in ptra) to lutRAM (beginning at the address equal to t1's register number, not t1's contents). The WRLUT then places the single value held in t1 into lutRAM at address $200, which is truncated to 9 bits and so lands at $000.
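To make the distinction concrete, here is a small Python model of the snippet in the first post. It is a sketch, not the hardware: it assumes the SETQ2+RDLONG transfer stays inside lutRAM, uses Wuerfel_21's note later in the thread that LUT addresses are truncated to 9 bits, and gives t1 and t2 hypothetical register numbers (the real ones depend on where the assembler places them).

```python
LUT_LONGS = 512          # lutRAM data space: $000-$1FF

T1 = 0x1E0               # hypothetical register number of t1
T2 = 0x1E1               # hypothetical register number of t2

def hub_long(hub, addr):
    """Read one little-endian long from a byte-addressed hubRAM model."""
    return int.from_bytes(hub[addr:addr + 4], "little")

def setq2_rdlong(lut, hub, q, d_field, hub_addr):
    """SETQ2 #q + RDLONG D,S: copy q+1 longs from hubRAM into lutRAM.

    d_field is the D field itself (a register *number*, used as the starting
    lutRAM address); hub_addr is the *contents* of the S register.
    """
    assert d_field + q < LUT_LONGS, "model only covers in-range transfers"
    for i in range(q + 1):
        lut[d_field + i] = hub_long(hub, hub_addr + 4 * i)

def wrlut(lut, value, addr):
    """WRLUT D,S: the lutRAM address is truncated to 9 bits."""
    lut[addr & 0x1FF] = value

cog = [0] * 512
lut = [0] * LUT_LONGS
hub = bytearray(0x1000)
ptra = 0x400
for i in range(10):                       # ten longs 100..109 at ptra
    hub[ptra + 4 * i:ptra + 4 * i + 4] = (100 + i).to_bytes(4, "little")

cog[T1] = 0xDEAD                          # whatever t1 happened to hold
cog[T2] = 0x200                           # mov t2,#$200
setq2_rdlong(lut, hub, 10 - 1, T1, ptra)  # setq2 #10-1 / rdlong t1,ptra
wrlut(lut, cog[T1], cog[T2])              # wrlut t1,t2
```

In the model, t1 itself is never written by the block read: the ten longs land in lutRAM starting at t1's register number, and the WRLUT then copies t1's old contents to LUT address $000.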

evanh · 2022-02-15 08:25

There are four types of modal instructions like SETQ. They all have one thing in common: they put a temporary hold on interrupts.

In no particular order:

Firstly, those that flatly modify the next instruction, no matter what it is. These are known as prefixing instructions. Eleven of these: all the ALTx instructions.
Secondly, those that feed result data back as operand data for the next instruction. Four of these: XORO32, SCA, SCAS, and GETXACC.
Thirdly, those that set a hidden mode bit and/or fill a special register. These only affect certain instructions that are built for those conditions. Importantly, the mode bit self-resets on the next instruction. Four of these: SETQ, SETQ2, GETCT WC, and CRCNIB.
And finally, those that hold an operation pending, completing it only when some condition is met. They have dedicated hidden bits and registers. Three of these: REP, AUGD, and AUGS.

ke4pjw · 2022-02-15 17:17

@evanh so this should work?

     mov       t1, #$200  ' Set the LUT address in t1
     setq2     #10-1      ' get 10 longs from hub to LUT
     rdlong   ptra, t1   ' populate LUT with HUB data @ptra

Wuerfel_21 · 2022-02-15 17:27

No, you need

     setq2     #10-1
     rdlong   ($200-$200), ptra

evanh · 2022-02-15 22:46

Yeah, the answer looks odd because the RDLONG assembly syntax is ill-fitting for this. Both operands are addresses, but D is memory direct (the D field itself is the lutRAM address) while S is register indirect (the S register holds the hubRAM address). You get that when an instruction is repurposed. And then, just to add to the fun, there are the memory map differences between data space and program space.

ke4pjw · 2022-02-16 03:11

Uh, now I am really confused. What is the ($200-$200) syntax? Isn't that $00? I thought LUT memory started at $0200.

Sorry for being so thick.

evanh · 2022-02-16 03:18

Yeah, understandable. Data addressing of lutRAM starts at zero. Only instruction fetching for program execution maps lutRAM to $200-$3FF. It's maybe something Chip should've aligned, but it wasn't thought about much at the time. EDIT: Actually, there is a good reason: direct address values (encoded in the instruction itself) are limited to 9 bits. By keeping the data addresses in the 0 to $1FF range, the program is more compact and runs faster.
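The two address views can be sketched in Python as below. This is an illustrative model, not an assembler feature: `lut_data_addr` and `fits_direct_field` are hypothetical helper names and the label value is made up. It shows the $200 offset a program-space label must lose before a data access, and why data addresses below $200 fit directly in a 9-bit D or S field.

```python
LUT_EXEC_BASE = 0x200    # instruction fetch sees lutRAM at $200-$3FF
LUT_LONGS = 0x200        # 512 longs of lutRAM
FIELD_BITS = 9           # width of the D and S fields in an instruction

def lut_data_addr(exec_addr):
    """Convert a lutRAM program-space address to its data-space address."""
    if not LUT_EXEC_BASE <= exec_addr < LUT_EXEC_BASE + LUT_LONGS:
        raise ValueError(f"{exec_addr:#x} is not in lutRAM program space")
    return exec_addr - LUT_EXEC_BASE

def fits_direct_field(addr):
    """True if addr can be encoded directly in a 9-bit D or S field."""
    return 0 <= addr < (1 << FIELD_BITS)

label = 0x210                       # hypothetical label placed in lutRAM
data_addr = lut_data_addr(label)    # the address RDLUT/WRLUT would need
```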

ke4pjw · 2022-02-16 03:30

Got it. One last, very dumb question: can LUT locations be reserved for symbols? If so, how is it done?

evanh · 2022-02-16 03:59

Yes, as program labels. But that means if you use them for a data access (RDLUT/WRLUT) then it's up to you to subtract the $200 appropriately.

ke4pjw · 2022-02-16 04:35

How would that look? I mean, how would it make its way into the LUT address space?

evanh · 2022-02-16 04:36

Here's a list of addressing modes:

Immediate Data - instruction has data value encoded in it.
Register Direct - instruction has register number (cogRAM address) that contains the data.
Memory Direct - instruction has the memory address (hubRAM or lutRAM) of the data. Generally not used for hubRAM addressing.
Register Indirect Register - instruction has a register number which itself contains a register number, i.e. the ALTx prefixing instructions. Possibly unique to the Propeller architecture.
Register Indirect Memory - instruction has register number which contains a memory address (hubRAM or lutRAM). Most hubRAM accesses use this.
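As an illustration only, the five modes can be written as tiny Python accessor functions over a toy machine state. The function names and memory sizes are assumptions for the sketch; each function returns the operand value the corresponding mode would fetch for a given instruction field.

```python
# Toy state: 512 cog registers, 512 lutRAM longs, 1 KB of byte-addressed hubRAM.
cog = [0] * 512
lut = [0] * 512
hub = bytearray(1024)

def hub_long(addr):
    return int.from_bytes(hub[addr:addr + 4], "little")

def immediate(field):                   # #value: the field is the data
    return field

def register_direct(field):             # field names a cog register
    return cog[field]

def memory_direct_lut(field):           # field is a lutRAM address
    return lut[field]

def register_indirect_register(field):  # register holds a register number (ALTx)
    return cog[cog[field] & 0x1FF]

def register_indirect_hub(field):       # register holds a hubRAM byte address
    return hub_long(cog[field])

cog[5] = 7          # register 5 points at register 7 ...
cog[7] = 42         # ... which holds 42
cog[6] = 0x100      # register 6 holds a hub byte address
hub[0x100:0x104] = (1234).to_bytes(4, "little")
lut[5] = 99
```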

ke4pjw · 2022-02-16 04:47

Wait, I can use a constant to point to a specific LUT address if I wanted to, right?

evanh · 2022-02-16 04:57

@ke4pjw said:
Wait, I can use a constant to point to a specific LUT address if I wanted to, right?

Yep. When it's the S operand, it has a preceding #. In the special case of the block copy, where the address is in D, there is no #.

Christof Eb. · 2023-11-11 10:20

Hi,
I'm struggling with block moves to LUT too. The code is executing from hub.
I had:

   mov r0,#$1CB ' retrieve cog3 ########################
   mov r2,#47 ' $1cb...$1F9 
   FOR: ' retrieve cog
      altd r0,#0
      rdlong 0,r4
      add r0,#1      
      add r4,#4
   NEXT: r2
   mov r0,#0 ' retrieve lut ########################
   mov r2,##512
   FOR:
      rdlong r1,r4
      wrlut r1,r0
      add r0,#1
      add r4,#4
   NEXT: r2
   ret

which works

(FOR: ... NEXT: r2 is a TAQOZ macro for the loop, built on DJNZ.)

Now I am trying to use a block move for the LUT part:

   mov r0,#$1CB ' retrieve cog3 ########################
   mov r2,#47 ' $1cb...$1F9 
   FOR: ' retrieve cog
      altd r0,#0
      rdlong 0,r4
      add r0,#1      
      add r4,#4
   NEXT: r2
   mov r0,#0 ' retrieve complete lut, first_source=r4, first_dest=r0=0 #####
   setq2 #511 ' Number: 512-1
   rdlong r0,r4
   ret

It does not work. I want to replace the cog-move section later as well, but I started with the LUT part.

I don't understand why ($200-$200) is used in post #5 instead of $0. Is this a way to preload register 0 with zero?

     setq2     #10-1
     rdlong   ($200-$200), ptra

Is it necessary to use a pointer register, or will r4 work?
Edit: Hm, changed to use PTRA, did not work either.

What's wrong with my snippet? Thanks for any hints!
Christof

evanh · 2023-11-11 14:13

@"Christof Eb." said:

   mov r0,#0 ' retrieve complete lut, first_source=r4, first_dest=r0=0 #####
   setq2 #511 ' Number: 512-1
   rdlong r0,r4

The r0 is wrong. The register number itself is the cogRAM or lutRAM address. The earlier ALTD is the solution to indexing the D operand. But you aren't wanting to index it so do this instead:

   setq2 #511 ' Number: 512-1
   rdlong 0,r4


I don't understand, why ($200-$200) is used in post #5 instead of $0 ? Is this a way to preload the register0 with zero?
     setq2     #10-1
     rdlong   ($200-$200), ptra

She used $200-$200 as a sort of comment, alluding to lutRAM's program space, which begins at address $200, even though that association doesn't really fit here.
It equates to zero, and zero is the correct first address of lutRAM data space. Using a plain zero is sensible.
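The difference between `rdlong r0,r4` and `rdlong 0,r4` (or the ALTD form) comes down to what ends up in the 9-bit D field, which SETQ2 uses as the starting lutRAM address. A minimal Python sketch, with a made-up register number for r0 since TAQOZ's actual assignment isn't shown here; the ALTD model covers only the `altd reg,#imm` form used in this thread.

```python
R0 = 0x1E0              # hypothetical register number the assembler gave r0
cog = [0] * 512
cog[R0] = 0             # mov r0,#0

def d_field_direct(reg_number):
    """rdlong r0,r4: the assembler encodes r0's register NUMBER in D."""
    return reg_number & 0x1FF

def d_field_after_altd(cog, d_reg, s=0):
    """altd r0,#s: the next instruction's D field becomes (r0 + s) & $1FF."""
    return (cog[d_reg] + s) & 0x1FF
```

So `rdlong r0,r4` after SETQ2 starts the block at LUT address $1E0 in this model, while `altd r0,#0` / `rdlong 0,r4` and `rdlong ($200-$200),r4` both start it at 0.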

Christof Eb. · 2023-11-11 15:27


Thank you very, very much! I had spent several hours on this mystery, trying to isolate it, checking the code the assembler produces, experimenting...
So this is the code that works:

   mov r0,#$1CB ' retrieve cog3 ########################
   mov r2,#46 ' $1cb...$1F9 minus 1
   setq r2
   altd r0,#0
   rdlong 0,r4    
   add r4,#188 ' 47*4
   mov r0,#0 ' retrieve complete lut, first_source=r4, first_dest=r0=0 #####
   mov r2,#511 ' Number: 512-1
   setq2 r2
   altd r0,#0
   rdlong 0,r4
   ret
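The constants in that version follow from simple arithmetic: a block of N longs needs a SETQ/SETQ2 operand of N-1 and advances the hub pointer by 4*N bytes. A small Python check (the `block` helper is hypothetical, for illustration) reproduces the 46, the 188-byte step, and the 511 used for the full LUT.

```python
def block(first, last):
    """Longs moved, SETQ/SETQ2 operand, and hub bytes covered for first..last."""
    longs = last - first + 1
    return longs, longs - 1, 4 * longs

cog3 = block(0x1CB, 0x1F9)     # cog registers $1CB..$1F9
lut_all = block(0x000, 0x1FF)  # the whole lutRAM
```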

Don't you want to write a book on P2?
Christof

evanh · 2023-11-11 16:00

@"Christof Eb." said:
Don't you want to write a book on P2?

Ha, no way! I don't have the ambition. Nor do I have the will to even catalogue a list to write about. Not to mention I'd get sidetracked and not finish anything anyway.

ManAtWork · 2023-11-13 15:03

BTW, does PTRA addressing work with WRLUT? I mean, can I write

  WRLUT data,ptra++

to write a long to LUT with auto-incrementing pointer? And do I have to put $200 into ptra or $0?

And yes, somebody should finally finish the %&#! documentation of the P2. The assembly language manual still misses the explanation of some of the most important instructions.

Wuerfel_21 · 2023-11-13 15:11

@ManAtWork said:
And do I have to put $200 into ptra or $0?

Either. The addresses are truncated to 9 bits.

And yes, somebody should finally finish the %&#! documentation of the P2. The assembly language manual still misses the explanation of some of the most important instructions.

https://p2docs.github.io/lutmem.html#wrlut
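A Python sketch of what Wuerfel_21 describes: because only the low 9 bits of the address are used, starting PTRA at $200 or at $0 writes the same lutRAM locations. The step of one long per `++` and the detail that PTRA keeps its full (untruncated) value are assumptions of this model; check the linked p2docs page for the exact behaviour.

```python
LUT_MASK = 0x1FF   # lutRAM addresses are truncated to 9 bits

def wrlut_postinc(lut, value, ptr):
    """Model of 'wrlut value,ptra++': write at ptr (low 9 bits), return ptr+1.

    Assumption: the pointer steps by one long per access and the register
    keeps its full value; only the address actually used is truncated.
    """
    lut[ptr & LUT_MASK] = value
    return ptr + 1

lut_a = [0] * 512
lut_b = [0] * 512
pa, pb = 0x000, 0x200          # the two starting values ManAtWork asked about
for v in (11, 22, 33):
    pa = wrlut_postinc(lut_a, v, pa)
    pb = wrlut_postinc(lut_b, v, pb)
```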

ManAtWork · 2023-11-13 15:47

Great, thanks! So you wrote the book. I think somebody should put a link to it here.

Christof Eb. · 2023-11-13 17:09

... still struggling badly with SETQ.

The last long of the LUT (at least) causes trouble: it gets overwritten with wrong values.
This is the routine that swaps the tasks. To do this, it swaps the complete LUT and also the registers of the FORTH machine. It executes from hub.

code swapTasks ( oldTask newTask -- oldTask newTask, does not work  )
   push cog0
   mov cog0,b 

   setq #511 ' write all cog RAM
   wrlong 0,cog0

   mov cog0,b
   add cog0,##2048
   setq2 #511 ' 512-1 write all LUT from 0 3,645ns @200MHz #####
   wrlong 0,cog0

   mov cog0,a
   add cog0,#84 ' 21 4 * 
   setq #8 ' read 21-29 to cog RAM
   rdlong 21,cog0

   mov cog0,a
   add cog0,#148 ' 37 * 4  
   setq #5 ' read from Stackc 37...42 to cog RAM
   rdlong 37,cog0

   mov cog0,a
   add cog0,#1836 ' 459 * 4  
   setq #46 ' read 459...505 to cog RAM
   rdlong 459,cog0

   mov cog0,a   
   add cog0,##2048
   setq2 #511 ' 512-1 read all LUT from 0 3,645ns @200MHz #####
   rdlong 0,cog0

   pop cog0
   ret
end

This is the slow version that works:

code swapTasks ' ( oldTask newTask -- oldTask newTask )
   mov r4,a
   mov r3,b
   add r3,#8 ' pointer in hub -- Save now #################################
   mov r0,#$15 ' save cog1 ######################
   mov r2,#9   ' $15... 
   FOR: ' save cog1
      altd r0,#0
      wrlong 0,r3
      add r0,#1      
      add r3,#4
   NEXT: r2
   mov r0,#$25 ' save cog2 ########################
   mov r2,#5 ' $25...$29 ( a , b irrelevant )
   FOR: ' save cog2
      altd r0,#0
      wrlong 0,r3
      add r0,#1      
      add r3,#4
   NEXT: r2
   mov r0,#$1CB ' save cog3 ########################
   mov r2,#47 ' $1CB...$1F9 
   FOR: ' save cog3
      altd r0,#0
      wrlong 0,r3
      add r0,#1      
      add r3,#4
   NEXT: r2
   mov r0,#0 ' save LUT ###########################
   mov r2,##512
   FOR: ' save lut
      rdlut r1,r0
      wrlong r1,r3
      add r0,#1
      add r3,#4
   NEXT: r2
   add r4,#8 ' pointer in hub Read now #########################
   mov r0,#$15 ' retrieve cog1 ######################
   mov r2,#9   ' $15... 
   FOR: ' retrieve cog
      altd r0,#0
      rdlong 0,r4
      add r0,#1      
      add r4,#4
   NEXT: r2
   mov r0,#$25 ' retrieve cog2 ########################
   mov r2,#5 ' $25...$29 
   FOR: ' retrieve cog
      altd r0,#0
      rdlong 0,r4
      add r0,#1      
      add r4,#4
   NEXT: r2   
   mov r0,#$1CB ' retrieve cog3 ########################
   mov r2,#47 ' $1cb...$1F9 
   FOR: ' retrieve cog
      altd r0,#0
      rdlong 0,r4
      add r0,#1      
      add r4,#4
   NEXT: r2
   mov r0,#0 ' retrieve lut ########################
   mov r2,##512
   FOR:
      rdlong r1,r4
      wrlut r1,r0
      add r0,#1
      add r4,#4
   NEXT: r2
   ret
end

What am I overlooking? Is this some timing problem? Is there some difference between RDLONG and WRLONG in combination with SETQ2?

Edit:
If I use lut510 instead of lut511, my code works. The slow version, though, works with lut511.

If anyone can spot something, I'd be grateful!
Christof

Christof Eb. · 2023-11-14 09:34

Hm, I spent some more hours on this.

Is it possible to fill PTRB (register 505) with a fast block move using SETQ? Something very strange seems to be happening here!

   mov cog0,a
   add cog0,#1836 ' 459 * 4  
   setq #46 ' read 459...505 to cog RAM
   rdlong 459,cog0

Edit: Very strange indeed: if I fill up to 506, which is DIRA, then PTRB gets filled correctly.
The last register of the block gets filled with garbage.
Is this known?
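For reference, 459 + 46 = 505, so `setq #46` from register 459 ends exactly on PTRB ($1F9), with PTRA ($1F8) just before it and DIRA ($1FA) just after. A small Python helper (hypothetical, for illustration) lists which of these special registers a SETQ block touches; it only does the address arithmetic and says nothing about the hardware behaviour being reported.

```python
# Top-of-cogRAM special-purpose registers involved in this thread.
SPECIAL = {0x1F8: "PTRA", 0x1F9: "PTRB", 0x1FA: "DIRA"}

def block_regs(first, q):
    """Registers written by 'setq #q' + 'rdlong first,ptr' (q+1 longs)."""
    return range(first, first + q + 1)

def specials_hit(first, q):
    """Names of the special registers inside that block, in address order."""
    return [SPECIAL[r] for r in block_regs(first, q) if r in SPECIAL]
```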

Wuerfel_21 · 2023-11-14 10:02

I don't think I ever tested interaction of block moves with hardware registers. Might actually be a real chip bug.

Christof Eb. · 2023-11-15 15:50

@Wuerfel_21 said:
I don't think I ever tested interaction of block moves with hardware registers. Might actually be a real chip bug.

Thanks for the reply! It gave me some encouragement that I'm not completely crazy.
So I did some more investigation and wrote a bug report. https://forums.parallax.com/discussion/175592/bug-in-setq-for-fast-block-move#latest

If you are ever looking for an example for https://p2docs.github.io/hubmem.html#block-transfers, you could consider the following code, which swaps all of the LUT and also the relevant parts of the TAQOZ Forth virtual processor in cog memory. The two save buffers are pointed to by registers a and b. As I am fed up with this, PTRA is also done the slow way (one long at a time). These block moves make the code about 9 times faster than conventional loops.

code swapTasks ( oldTask newTask -- oldTask newTask 59us )
   push cog0
   mov cog0,b 

   setq #503 ' write cog RAM 0...503
   wrlong 0,cog0

   mov r0,#504 ' PTRA
   mov cog0,r0
   shl cog0,#2
   add cog0,b
   altd r0,#0   
   wrlong 0,cog0

   mov r0,#505 'PTRB
   mov cog0,r0
   shl cog0,#2
   add cog0,b
   altd r0,#0   
   wrlong 0,cog0

   mov cog0,b
   add cog0,##2048
   setq2 #511 ' 512-1 write all LUT from 0 3,645ns @200MHz #####
   wrlong 0,cog0

   mov r0,#21 ' 21...29
   mov cog0,r0
   shl cog0,#2
   add cog0,a
   setq #8 ' read 21-29 to cog RAM
   altd r0,#0
   rdlong 0,cog0

   mov r0,#37 ' 37...42 
   mov cog0,r0
   shl cog0,#2
   add cog0,a
   setq #5 ' 
   altd r0,#0
   rdlong 0,cog0

   mov r0,#459 ' 459...503 
   mov cog0,r0
   shl cog0,#2
   add cog0,a
   setq #44 ' 
   altd r0,#0
   rdlong 0,cog0

   mov r0,#504 ' PTRA
   mov cog0,r0
   shl cog0,#2
   add cog0,a
   altd r0,#0   
   rdlong 0,cog0   

   mov r0,#505 'PTRB
   mov cog0,r0
   shl cog0,#2
   add cog0,a
   altd r0,#0   
   rdlong 0,cog0      

   mov cog0,a   
   add cog0,##2048
   setq2 #511 ' 512-1 read all LUT from 0 3,645ns @200MHz #####
   rdlong 0,cog0
   waitx #80

   pop cog0
   ret
end
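The hub image this routine reads and writes has a simple layout: cog register n at byte offset 4*n from the buffer start, and LUT entry i at 2048 + 4*i. A Python sketch of that layout (helper names are hypothetical) also checks that the save side's three cog transfers cover registers 0 through 505 exactly once, which splitting PTRA and PTRB out of the block has to preserve.

```python
LUT_OFFSET = 2048                # LUT image starts after 512 cog longs

def cog_slot(base, reg):         # mirrors: shl cog0,#2 / add cog0,a
    return base + 4 * reg

def lut_slot(base, index):       # mirrors: add cog0,##2048
    return base + LUT_OFFSET + 4 * index

# Save side: (first register, number of longs) for each transfer.
SAVE_PLAN = [(0, 504), (504, 1), (505, 1)]   # setq #503, then PTRA, PTRB singly

def covered(plan):
    """Register numbers written by the plan, in order."""
    regs = []
    for first, n in plan:
        regs.extend(range(first, first + n))
    return regs
```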

Christof
