SETQ2 with LUT RAM — Parallax Forums

SETQ2 with LUT RAM

I need a quick sanity check. SETQ2 would be used with RDLONG and WRLUT correct? Something like....

     mov       t2, #$200  ' Set the LUT address in t2
     setq2     #10-1      ' get 10 longs from hub to LUT
     rdlong    t1, ptra   ' populate intermediate register with HUB data @ptra
     wrlut     t1, t2     ' write intermediate register data into LUT @t2

Would it increment the intermediate cog register (t2) that is used to shuffle data from HUBram into the LUT?

Am I even close to getting this right?

As always, thanks in advance.

--Terry

Comments

  • evanhevanh Posts: 16,023
    edited 2022-02-15 07:19

    Just the SETQ2+RDLONG is all you need. The SETQ2 completely changes how RDLONG operates.

    What you've got there copies ten longwords from hubRAM (beginning at address ptra) to lutRAM (beginning at the address given by t1's register number). The WRLUT then subsequently places the single value from t1 into lutRAM at the address held in t2 ($200).

  • evanhevanh Posts: 16,023

    There's four types of modal instructions like SETQ. They all have one thing in common - they put a temporary hold on interrupts.

    In no particular order:

    • Firstly, those that flat modify the next instruction no matter what it is - these are known as prefixing instructions. Eleven of these, all the ALTx's.
    • Secondly, those that fold result data back as operand data for the next instruction. Four of these: XORO32, SCA, SCAS, and GETXACC.
    • Thirdly, those that set a hidden mode bit and/or fill out a special register. These will only have an effect on certain instructions that are built for those conditions. Importantly the mode bit self-resets on the next instruction. Four of these: SETQ, SETQ2, GETCT WC, CRCNIB.
    • And finally, those that retain an operation - only completing on criteria. They have dedicated hidden bits and registers. Three of these: REP and AUGD and AUGS.
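
    A rough sketch with one pairing from each group, for illustration only. The register names (idx, table, value, gain, sample, acc, buf, sum) are made up here and are not from the thread:

         altd    idx, #table     ' prefix: D field of the next instruction becomes (idx + table) & $1ff
         mov     0-0, value      ' so value lands in the register at table + idx
         sca     gain, sample    ' fold-back: (gain[15:0] * sample[15:0]) >> 16 ...
         add     acc, 0-0        ' ... replaces the S value of this ADD
         setq    #4-1            ' mode bit + Q register: only certain instructions react
         rdlong  buf, ptra       ' here RDLONG becomes a 4-long block read into buf..buf+3
         rep     #1, #8          ' retained: repeat the next 1 instruction 8 times
         add     sum, value

    The 0-0 operands are just placeholders that assemble to zero; the preceding instruction supplies the real value at run time.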
  • ke4pjwke4pjw Posts: 1,169
    edited 2022-02-15 17:23

    @evanh so this should work?

         mov       t1, #$200  ' Set the LUT address in t2
         setq2     #10-1      ' get 10 longs from hub to LUT
         rdlong   ptra, t1   ' populate LUT with HUB data @ptra
    
  • No, you need

         setq2     #10-1
         rdlong   ($200-$200), ptra
    
  • evanhevanh Posts: 16,023
    edited 2022-02-16 03:25

    Yeah, the answer looks odd because the RDLONG assembly syntax is ill-fitting for this. Both operands are addresses but D is immediate direct mode while S is direct indirect mode. You get that when it's a repurposed instruction. ... And then, just to add fun, there's the memory map differences between data space and program space.

  • Uh, now I am really confused. What is the ($200-$200) syntax? Isn't that $00 ? I thought LUT memory started at $0200

    Sorry for being so thick.

  • evanhevanh Posts: 16,023
    edited 2022-02-16 03:25

    Yeah, understandable. Data addressing of lutRAM starts at zero. Only the instruction fetching for program execution maps lutRAM to $200 to $3ff. It's maybe something Chip should've aligned but it wasn't thought much about at the time. EDIT: Actually, there is a good reason: Direct address values (encoded in the instruction itself) are limited to 9 bits. By keeping the addresses in the 0 to $1ff range the program is more compact and faster running.

  • Got it. One last, and very dumb question. Can LUTs be reserved for symbols? If so, how is it done?

  • evanhevanh Posts: 16,023
    edited 2022-02-16 04:00

    Yes, as program labels. But that means if you use them for a data access (RDLUT/WRLUT) then it's up to you to subtract the $200 appropriately.
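
    A minimal sketch of that subtraction, assuming a made-up label (table) reserved at the start of lutRAM program space and a scratch cog register (x) declared elsewhere:

                 org     $200               ' labels from here on get lutRAM program addresses
    table        res     10                 ' table = $200 as a program label

                 ' elsewhere, in the code that accesses the data:
                 setq2   #10-1
                 rdlong  table-$200, ptra   ' block copy: D field is lutRAM data address 0
                 rdlut   x, #table-$200     ' single read: S is lutRAM data address 0

    The label keeps its program-space value; only the operand expression is adjusted at each data access.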

  • How would that look? I mean, how would it make its way into the LUT address space?

  • evanhevanh Posts: 16,023
    edited 2022-02-16 05:15

    Here's a list of addressing modes:

    • Immediate Data - instruction has data value encoded in it.
    • Register Direct - instruction has register number (cogRAM address) that contains the data.
    • Memory Direct - instruction has memory address (hubRAM or lutRAM) of data. Generally not used for hubRAM addressing.
    • Register Indirect Register - instruction has register number which itself contains a register number, ie: The ALTx prefixing instructions. Possibly unique to Propeller architecture.
    • Register Indirect Memory - instruction has register number which contains a memory address (hubRAM or lutRAM). Most hubRAM accesses use this.
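
    As a hedged illustration, one instruction (or pair) per mode. The registers x, y, idx and ptr are hypothetical names, not from the thread:

         mov     x, #5           ' Immediate Data: the 5 is encoded in the instruction
         mov     x, y            ' Register Direct: y is a cogRAM register number
         rdlut   x, #$10         ' Memory Direct: lutRAM address $10 is encoded in the instruction
         alts    idx, #0         ' Register Indirect Register: idx holds a register number ...
         mov     x, 0-0          ' ... which replaces the S field of this MOV
         rdlong  x, ptr          ' Register Indirect Memory: ptr holds a hubRAM address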
  • Wait, I can use a constant to point to a specific LUT address if I wanted to, right?

  • evanhevanh Posts: 16,023
    edited 2022-02-16 04:59

    @ke4pjw said:
    Wait, I can use a constant to point to a specific LUT address if I wanted to, right?

    Yep. When it's S operand it has a preceding #. When it's the special case block copy then no #.

  • Christof Eb.Christof Eb. Posts: 1,213
    edited 2023-11-11 10:52

    Hi,
    struggling with block move to LUT too. The code is executing from HUB.
    I had:

       mov r0,#$1CB ' retrieve cog3 ########################
       mov r2,#47 ' $1cb...$1F9 
       FOR: ' retrieve cog
          altd r0,#0
          rdlong 0,r4
          add r0,#1      
          add r4,#4
       NEXT: r2
       mov r0,#0 ' retrieve lut ########################
       mov r2,##512
       FOR:
          rdlong r1,r4
          wrlut r1,r0
          add r0,#1
          add r4,#4
       NEXT: r2
       ret
    

    which works

    ( FOR: ... NEXT: r2
    Is a Taqoz macro for the loop, with djnz )

    And now I am trying to use a block move for the LUT part:

       mov r0,#$1CB ' retrieve cog3 ########################
       mov r2,#47 ' $1cb...$1F9 
       FOR: ' retrieve cog
          altd r0,#0
          rdlong 0,r4
          add r0,#1      
          add r4,#4
       NEXT: r2
       mov r0,#0 ' retrieve complete lut, first_source=r4, first_dest=r0=0 #####
       setq2 #511 ' Number: 512-1
       rdlong r0,r4
       ret
    

    It does not work. I want to replace the cog move section later, but started with the LUT first here.

    I don't understand, why ($200-$200) is used in post #5 instead of $0 ? Is this a way to preload the register0 with zero?

         setq2     #10-1
         rdlong   ($200-$200), ptra
    

    Is it necessary to use a pointer register and r4 will not work?
    Edit: Hm, changed to use PTRA, did not work either.

    What's wrong with my snippet? Thanks for any hints!
    Christof

  • evanhevanh Posts: 16,023
    edited 2023-11-11 14:31

    @"Christof Eb." said:

       mov r0,#0 ' retrieve complete lut, first_source=r4, first_dest=r0=0 #####
       setq2 #511 ' Number: 512-1
       rdlong r0,r4
    

    The r0 is wrong. The register number itself is the cogRAM or lutRAM address. The earlier ALTD is the solution to indexing the D operand. But you aren't wanting to index it so do this instead:

       setq2 #511 ' Number: 512-1
       rdlong 0,r4
    

    .

    I don't understand, why ($200-$200) is used in post #5 instead of $0 ? Is this a way to preload the register0 with zero?

         setq2     #10-1
         rdlong   ($200-$200), ptra
    

    She's used $200-$200 like a comment, an ill-fitting nod to lutRAM's program space, which begins at address $200.
    It equates to zero, and zero is the correct first address of lutRAM data space. Using a plain zero is sensible.

  • Thank you very very much! I had spent several hours on this miracle! Trying to isolate it, checking the code, that the assembler produces, experimenting...
    So this is the code, that works:

       mov r0,#$1CB ' retrieve cog3 ########################
       mov r2,#46 ' $1cb...$1F9 minus 1
       setq r2
       altd r0,#0
       rdlong 0,r4    
       add r4,#188 ' 47*4
       mov r0,#0 ' retrieve complete lut, first_source=r4, first_dest=r0=0 #####
       mov r2,#511 ' Number: 512-1
       setq2 r2
       altd r0,#0
       rdlong 0,r4
       ret
    

    Don't you want to write a book on P2?
    Christof

  • evanhevanh Posts: 16,023

    @"Christof Eb." said:
    Don't you want to write a book on P2?

    Ha, no way! I don't have the ambition. Nor do I have the will to even catalogue a list to write about. Not to mention I'd get sidetracked and not finish anything anyway.

  • BTW, does PTRA addressing work with WRLUT? I mean, can I write

      WRLUT data,ptra++
    

    to write a long to LUT with auto-incrementing pointer? And do I have to put $200 into ptra or $0?

    :# And yes, somebody should finally finish the %&#! documentation of the P2. The assembly language manual still misses the explanation of some of the most important instructions.

  • @ManAtWork said:
    And do I have to put $200 into ptra or $0?

    Either. The addresses are truncated to 9 bits.

    :# And yes, somebody should finally finish the %&#! documentation of the P2. The assembly language manual still misses the explanation of some of the most important instructions.

    https://p2docs.github.io/lutmem.html#wrlut

  • Great, thanks! So you wrote the book ;) I think somebody should put a link to it here.

  • Christof Eb.Christof Eb. Posts: 1,213
    edited 2023-11-13 17:30

    .... still struggling awfully with setq.

    The last long of the LUT (at least) causes trouble. It gets overwritten with wrong values.
    This is the routine which swaps the tasks. To do this it swaps the complete LUT and also the registers of the FORTH machine. It executes from HUB.

    code swapTasks ( oldTask newTask -- oldTask newTask, does not work  )
       push cog0
       mov cog0,b 
    
       setq #511 ' write all cog RAM
       wrlong 0,cog0
    
       mov cog0,b
       add cog0,##2048
       setq2 #511 ' 512-1 write all LUT from 0 3,645ns @200MHz #####
       wrlong 0,cog0
    
       mov cog0,a
       add cog0,#84 ' 21 4 * 
       setq #8 ' read 21-29 to cog RAM
       rdlong 21,cog0
    
       mov cog0,a
       add cog0,#148 ' 37 * 4  
       setq #5 ' read from Stackc 37...42 to cog RAM
       rdlong 37,cog0
    
       mov cog0,a
       add cog0,#1836 ' 459 * 4  
       setq #46 ' read 459...505 to cog RAM
       rdlong 459,cog0
    
       mov cog0,a   
       add cog0,##2048
       setq2 #511 ' 512-1 read all LUT from 0 3,645ns @200MHz #####
       rdlong 0,cog0
    
       pop cog0
       ret
    end
    

    This is the slow version, that works:

    code swapTasks ' ( oldTask newTask -- oldTask newTask )
       mov r4,a
       mov r3,b
       add r3,#8 ' pointer in hub -- Save now #################################
       mov r0,#$15 ' save cog1 ######################
       mov r2,#9   ' $15... 
       FOR: ' save cog1
          altd r0,#0
          wrlong 0,r3
          add r0,#1      
          add r3,#4
       NEXT: r2
       mov r0,#$25 ' save cog2 ########################
       mov r2,#5 ' $25...$29 ( a , b irrelevant )
       FOR: ' save cog2
          altd r0,#0
          wrlong 0,r3
          add r0,#1      
          add r3,#4
       NEXT: r2
       mov r0,#$1CB ' save cog3 ########################
       mov r2,#47 ' $1CB...$1F9 
       FOR: ' save cog3
          altd r0,#0
          wrlong 0,r3
          add r0,#1      
          add r3,#4
       NEXT: r2
       mov r0,#0 ' save LUT ###########################
       mov r2,##512
       FOR: ' save lut
          rdlut r1,r0
          wrlong r1,r3
          add r0,#1
          add r3,#4
       NEXT: r2
       add r4,#8 ' pointer in hub Read now #########################
       mov r0,#$15 ' retrieve cog1 ######################
       mov r2,#9   ' $15... 
       FOR: ' retrieve cog
          altd r0,#0
          rdlong 0,r4
          add r0,#1      
          add r4,#4
       NEXT: r2
       mov r0,#$25 ' retrieve cog2 ########################
       mov r2,#5 ' $25...$29 
       FOR: ' retrieve cog
          altd r0,#0
          rdlong 0,r4
          add r0,#1      
          add r4,#4
       NEXT: r2   
       mov r0,#$1CB ' retrieve cog3 ########################
       mov r2,#47 ' $1cb...$1F9 
       FOR: ' retrieve cog
          altd r0,#0
          rdlong 0,r4
          add r0,#1      
          add r4,#4
       NEXT: r2
       mov r0,#0 ' retrieve lut ########################
       mov r2,##512
       FOR:
          rdlong r1,r4
          wrlut r1,r0
          add r0,#1
          add r4,#4
       NEXT: r2
       ret
    end
    

    What am I overlooking? Is this some timing problem? Is there some difference between rdlong and wrlong in combination with setq2?

    Edit:
    If I use lut510 instead lut511, my code works. The slow version though works with lut511.

    If someone can see something, I am grateful!
    Christof

  • Christof Eb.Christof Eb. Posts: 1,213
    edited 2023-11-14 09:39

    Hm, spent some more hours here.

    Is it possible to fill PTRB=505 with fast block move using setq? Something very strange seems to happen here?!

       mov cog0,a
       add cog0,#1836 ' 459 * 4  
       setq #46 ' read 459...505 to cog RAM
       rdlong 459,cog0
    

    Edit: Very strange indeed, if I fill up to 506, which is DIRA, then PTRB gets filled correctly.
    The last register is filled with garbage???
    Is this known?

  • I don't think I ever tested interaction of block moves with hardware registers. Might actually be a real chip bug.

  • @Wuerfel_21 said:
    I don't think I ever tested interaction of block moves with hardware registers. Might actually be a real chip bug.

    Thanks for the reply! It did give me some encouragement, not to believe to be completely crazy.
    So I did some more investigation and wrote a bug report. https://forums.parallax.com/discussion/175592/bug-in-setq-for-fast-block-move#latest

    If you are ever looking for an example for https://p2docs.github.io/hubmem.html#block-transfers , you could consider the following code, which swaps all of LUT and also the relevant parts of the Taqoz Forth virtual processor in COG memory. The two buffers are addressed through registers a and b. As I am fed up with this, PTRA is also done the slow way.... These block moves make the code about 9 times faster than conventional loops.

    code swapTasks ( oldTask newTask -- oldTask newTask 59us )
       push cog0
       mov cog0,b 
    
       setq #503 ' write cog RAM 0...503
       wrlong 0,cog0
    
       mov r0,#504 ' PTRA
       mov cog0,r0
       shl cog0,#2
       add cog0,b
       altd r0,#0   
       wrlong 0,cog0
    
       mov r0,#505 'PTRB
       mov cog0,r0
       shl cog0,#2
       add cog0,b
       altd r0,#0   
       wrlong 0,cog0
    
       mov cog0,b
       add cog0,##2048
       setq2 #511 ' 512-1 write all LUT from 0 3,645ns @200MHz #####
       wrlong 0,cog0
    
       mov r0,#21 ' 21...29
       mov cog0,r0
       shl cog0,#2
       add cog0,a
       setq #8 ' read 21-29 to cog RAM
       altd r0,#0
       rdlong 0,cog0
    
       mov r0,#37 ' 37...42 
       mov cog0,r0
       shl cog0,#2
       add cog0,a
       setq #5 ' 
       altd r0,#0
       rdlong 0,cog0
    
       mov r0,#459 ' 459...503 
       mov cog0,r0
       shl cog0,#2
       add cog0,a
       setq #44 ' 
       altd r0,#0
       rdlong 0,cog0
    
       mov r0,#504 ' PTRA
       mov cog0,r0
       shl cog0,#2
       add cog0,a
       altd r0,#0   
       rdlong 0,cog0   
    
       mov r0,#505 'PTRB
       mov cog0,r0
       shl cog0,#2
       add cog0,a
       altd r0,#0   
       rdlong 0,cog0      
    
       mov cog0,a   
       add cog0,##2048
       setq2 #511 ' 512-1 read all LUT from 0 3,645ns @200MHz #####
       rdlong 0,cog0
       waitx #80
    
       pop cog0
       ret
    end
    

    Christof
