Problems with wrlong — Parallax Forums

Problems with wrlong

Tectu Posts: 22
edited 2011-07-12 20:01 in Propeller 1
Hello folks!

I have a small problem in an assembly section.
I want to copy the value from INA to the main RAM using wrlong. This is what my code looks like:
makescan                wrlong INA, address               
                        add address, #4                      
                        djnz samples, #makescan




When I execute that, main RAM contains only $0000_0000. Then I tried this one:
makescan                mov temp, INA
                        wrlong temp, address                               
                        add address, #4                                       
                        djnz samples, #makescan                       


That one works just fine... but WHY???
And how can I fix it? It's very important that the subroutine (makescan) takes exactly 16 cycles...


Can anyone help me?


Greetings Tectu

Comments

  • localroger Posts: 3,452
    edited 2011-07-11 16:46
    INA cannot be used in the destination field of an instruction. Even though it's technically the "source" in your WRLONG, to the PASM microcode it's in the destination field, and the instruction doesn't work. I've had the same issue myself.

    On Edit: There's no way to fix this if your requirement is to really record all 32 bits of INA every 16 clocks, but if your real need is more limited it might be possible to do some kind of partial caching into Cog RAM. If you tell us more about the application there might be a workaround.
  • Tectu Posts: 22
    edited 2011-07-11 16:57
    Hello localroger,

    Thank you for your very fast response.

    I would like to build a logic analyzer. I know that there is the Parallax Digital Storage Logic Analyzer and also the Propalyzer, but I'd like to build one completely on my own, for learning... Another reason is that I want more than a 4.44MHz sample rate.

    Now, should I talk more about my ideas, or are you going to tell me something like "Don't reinvent the wheel"?


    Thanks for your help!
  • Mike Green Posts: 23,101
    edited 2011-07-11 17:18
    You can't do "wrlong ina,address", as localroger mentioned. There are actually 512 longs of memory for each cog. There's special circuitry that chooses INA instead of the corresponding "shadow" memory location when that location occurs in the source field of an instruction. There's no such circuitry when INA occurs in the destination field of an instruction, as in your case, so the "shadow" memory location is used.
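The source/destination difference described above can be mimicked with a toy model. This is only an illustrative Python sketch, not Propeller code: the `Cog` class, the `TEMP` register address and the pin value are made up for the example, and it assumes the shadow long behind INA holds 0 (which matches the $0000_0000 reported in the first post).

```python
INA = 0x1F2  # cog address of the INA register on the Propeller 1


class Cog:
    """Toy model: 512 longs of cog RAM plus live input pins (illustrative only)."""

    def __init__(self, pins):
        self.ram = [0] * 512   # assumption: the shadow long behind INA holds 0
        self.pins = pins

    def read_source(self, addr):
        # Source field: special circuitry substitutes the pin states for INA.
        return self.pins if addr == INA else self.ram[addr]

    def read_dest(self, addr):
        # Destination field: no such circuitry, so the shadow RAM long is read.
        return self.ram[addr]


def mov(cog, dest, src):
    # MOV reads its operand through the source field.
    cog.ram[dest] = cog.read_source(src)


def wrlong(cog, hub, dest, hub_addr):
    # WRLONG takes the value to write from its destination field.
    hub[hub_addr & ~3] = cog.read_dest(dest)


TEMP = 0x010                       # hypothetical scratch register
cog = Cog(pins=0x5F018003)         # arbitrary example pin pattern
hub = {}
wrlong(cog, hub, INA, 0x100)       # wrlong INA, address   -> shadow value
mov(cog, TEMP, INA)                # mov    temp, INA      -> real pin states
wrlong(cog, hub, TEMP, 0x104)      # wrlong temp, address
```

In this model the direct write stores the shadow long, while the detour through `TEMP` stores the pin states, which is the behaviour reported at the top of the thread.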

    It's possible to use more than one cog to sample I/O pins so that one sample is made each system clock cycle. You have to synchronize the cogs so that their 4 clock instruction cycles are offset, each by one. This is done with a WAITCNT instruction with each cog waiting for a different system clock value to pick up execution. One cog waits for TIME+0. Another cog waits for TIME+1, etc. The cogs would store the samples in a buffer in their own memory, then copy the saved values to a common buffer in hub memory for processing. That's how the existing Propeller logic analyzers work.
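As a rough illustration of the timing idea, here is a small Python sketch (not Propeller code). It assumes each cog takes one sample per 4-clock instruction and that cog k starts at TIME+k; the function name and the numbers are invented for the example.

```python
def sample_times(t0, cogs=4, period=4, samples_per_cog=3):
    """Clock ticks at which each cog samples, with cog k delayed by k clocks."""
    return {k: [t0 + k + period * i for i in range(samples_per_cog)]
            for k in range(cogs)}


times = sample_times(1000)
# Merge the per-cog sample times into one timeline.
merged = sorted(t for ts in times.values() for t in ts)
```

With four cogs offset by one clock each, the merged timeline has no gaps: one sample on every system clock.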
  • Tectu Posts: 22
    edited 2011-07-11 17:26
    The Parallax Logic Analyzer uses more than one cog to sample the I/Os? I don't see that in the source code, where is that?
  • localroger Posts: 3,452
    edited 2011-07-11 17:52
    Mike has it, multiple cogs.

    You write your cog program to do, say, 4x interleaved writes. Each instance doesn't care about the other instances, it just reads, then waitcnts until the right time to do the next 4x interleaved read. When you start the cogs, you give them each a starting CNT to wait for, chosen a judicious way into the future considering cogstart overhead, and offset by the appropriate number of clocks for each of the four cogs. When your four cogs start they each wait for their individual offset start points and go about their individual merry ways stuffing data into Hub RAM at 4 long / 16 byte intervals. What your top spin app sees is a smooth stream of data.
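The hub layout implied by "4 long / 16 byte intervals" can be sketched in a few lines of Python. This is an illustration under assumptions: the buffer base address is arbitrary, and cog k is assumed to start at long k of the buffer and step by 16 bytes.

```python
def interleaved_addresses(base, cogs, samples_per_cog):
    """Hub byte addresses written by each cog in an interleaved sampler."""
    stride = 4 * cogs                      # 4 cogs -> 16 bytes between writes
    return {k: [base + 4 * k + stride * i for i in range(samples_per_cog)]
            for k in range(cogs)}


layout = interleaved_addresses(base=0x1000, cogs=4, samples_per_cog=3)
# Seen from the Spin side, the writes of all cogs form one contiguous array.
merged = sorted(a for addrs in layout.values() for a in addrs)
```

Each cog only ever touches its own longs, yet the merged list covers every long of the buffer exactly once.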
  • kuroneko Posts: 3,623
    edited 2011-07-11 17:56
    localroger wrote: »
    On Edit: There's no way to fix this if your requirement is to really record all 32 bits of INA every 16 clocks, ...
    Does that count as impossible? Anyway, there is a way to get ina to the hub every 16 cycles as long as you don't mind transferring 2n longs starting at high addresses and going down from there. Check the thread "[POC] reverse overlay loader aka cog to hub transfer" (thread 129719). The only thing which has to change is the main loop; something like this should do:
                    mov     tmp, ina                '  +8
                    mov     phsa, size              '  -4   hub byte count (8n + 7)

    :copy7          wrlong  tmp, phsa               '  +0 = transfer long between cog and hub
                    mov     tmp, ina                '  +8
                    sub     phsa, #7 wz             '  -4

    :copy1          wrlong  tmp, phsa               '  +0 = transfer long between cog and hub
            if_nz   mov     tmp, ina                '  +8
            if_nz   djnz    phsa, #:copy7           '  -4
    
  • localroger Posts: 3,452
    edited 2011-07-11 19:11
    kuroneko, that is absolutely wicked brilliant.
  • Tectu Posts: 22
    edited 2011-07-11 20:09
    localroger wrote: »
    kuroneko, that is absolutely wicked brilliant.

    Too bad that I am so new to assembler and the Propeller that I cannot understand it :(
  • K2 Posts: 693
    edited 2011-07-11 20:09
    It took a while before my brain would indicate anything other than $deadbeef. But now I get it! I can't believe I actually understand one of kuroneko's masterpieces. One should earn another star just for that.
  • Tectu Posts: 22
    edited 2011-07-11 20:11
    Okay... Would anyone be kind enough to explain how it works, step by step?

    I don't want to use things when I don't know how they work...
  • kuroneko Posts: 3,623
    edited 2011-07-11 20:57
    Tectu wrote: »
    Too bad that I am so new to assembler and the Propeller that I cannot understand it :(
    Well, you noticed that there is not much time left for all the things you have to do (reading ina, writing it (temp) to hub, incrementing the hub address, decrementing the loop counter etc).

    So some people (Hi Phil!) came up with the sub #7/djnz approach. This exploits the fact that rdlong/wrlong ignore the lowest two address bits. Say you have a 4n address. You set the bottom two bits, thereby making it 4n+3. Doing a rdlong will read data from 4n (lower two bits ignored). Then you subtract #7 and end up with 4n-4 (the long address before that). Finally the djnz subtracts #1 and we end up with 4n-5 (or 4(n-2)+3). In the end we transferred two longs and adjusted the address by 8 (7+1), which is what we would have done anyway (4+4). It's a bit tricky the first time you see it, but take the time to go through an example (on paper) and follow the steps.
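For anyone who prefers to check the paper exercise by machine, here is a small Python sketch of the pointer arithmetic described above. It is only a model: `hub_long` and `copy_pairs` are made-up names, and loop termination (the job of the djnz condition) is replaced by a fixed number of passes.

```python
def hub_long(addr):
    """Byte address actually used by rdlong/wrlong: the lowest two bits are ignored."""
    return addr & ~3


def copy_pairs(start, pairs):
    """Walk a pointer the way the sub #7 / djnz loop does; list the longs touched."""
    ptr = start | 3                      # set the bottom two bits: 4n + 3
    touched = []
    for _ in range(pairs):
        touched.append(hub_long(ptr))    # first transfer uses 4n
        ptr -= 7                         # sub  ptr, #7 -> 4n - 4
        touched.append(hub_long(ptr))    # second transfer uses 4n - 4
        ptr -= 1                         # djnz ptr     -> 4n - 5 = 4(n-2) + 3
    return touched


addresses = copy_pairs(0x40, 2)
```

Starting at $40, two passes touch $40, $3C, $38 and $34: two longs per pass, moving down by 8 bytes each time.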

    Something is still missing: the loop size. For a read loop (hub to cog) that was initially handled by letting the code being loaded overwrite the final djnz of the transfer loop, so execution simply continued with what we had just loaded.

    This overwrite-and-drop-through approach didn't work too well in the opposite direction (cog to hub), because there isn't anything to overwrite. But don't despair. This is where shadow registers come in. Basically, some of the special registers ($1F0-$1FF) behave differently depending on whether you just read from them, write to them, or both. One of them is the counter phase accumulator (phsx).
    1. If you read from it you get the counter value (value := counter[phsx]).
    2. If you write to it its shadow location and the counter are written to. (shadow[phsx] := counter[phsx] := value)
    3. Finally, read-modify-write performs the operation based on shadow[phsx] but updates both shadow and counter.
      ' phsx read-modify-write issue
      
            mov     temp, phsx     ' temp := counter[phsx]
            shr     temp, #1       ' temp >>= 1
            mov     phsx, temp     ' update shadow and counter
      
            ' is equivalent to
      
            mov     phsx, phsx     ' shadow[phsx] := counter[phsx]
            shr     phsx, #1       ' r: operate on shadow[phsx]
                                   ' m: shadow[phsx] >>= 1
                                   ' w: update shadow and counter
      
    So what I've done is exploit this disconnect between the shadow and the counter register. I use the shadow as the loop counter (sub phsa, #7 wz/djnz phsa, #:copy7). A loop counter for e.g. 10 longs doesn't give me proper hub addresses by itself, though (the counter register is used as the wrlong target). So what we do now is enable the counter and let it add the base address. Let's look at these two lines:
                    mov     phsa, size              '  -4   hub byte count (8n + 7)
    :copy7          wrlong  tmp, phsa               '  +0 = transfer long between cog and hub
    
    Using the 10 longs as an example we feed phsa with 40-1 (one less than byte count). Up to the point when the wrlong has collected all its operands frqa has been added twice to phsa (that's simply something you have to know or figure out). Which means we divide our base address by 2 (it's 4n so no issues here) and place it into frqa. So the first write goes to base/2 + base/2 + 39 = base + 36 + 3. 36 is the offset of element 9, that's what we want (followed by 8 down to 0).
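The 10-long example can be followed in a short Python model. It is a sketch under the assumptions stated in the post: frqa holds base/2, frqa is added exactly twice between a write to phsa and the following wrlong, and the wz flag reflects the shadow value. The function name and base address are invented for illustration.

```python
def write_addresses(base, longs):
    """Hub addresses hit by the copy loop, modelling shadow[phsa] + 2 * frqa."""
    assert base % 4 == 0 and longs % 2 == 0
    frqa = base // 2               # base/2, added twice before each wrlong
    shadow = longs * 4 - 1         # mov phsa, size
    out = []
    while True:
        out.append((shadow + 2 * frqa) & ~3)   # :copy7 wrlong
        shadow -= 7                            # sub phsa, #7 wz
        out.append((shadow + 2 * frqa) & ~3)   # :copy1 wrlong
        if shadow == 0:                        # Z set: if_nz djnz is skipped
            break
        shadow -= 1                            # djnz phsa, #:copy7
    return out


addrs = write_addresses(0x2000, 10)
```

For a buffer at $2000 the first write lands at base + 36 (element 9) and the last at base + 0, matching the description above.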

    I'm not a great explainer but I hope that gives you an idea why this works. Feel free to ask more questions (preferably in the Propeller sub forum).
  • K2 Posts: 693
    edited 2011-07-11 22:43
    If kuroneko had been born earlier he probably would have discovered relativity or created the first polio vaccine.
  • Tectu Posts: 22
    edited 2011-07-12 04:43
    Well, I think I understand the concept of that code now, but I could never build that on my own.

    I have just two more questions about your explanation:

    1. What do 4n, 8n, 5n, etc. mean? A number of digits?
    2. What is "size" for, and what value should it have?


    Edit: Someone should move this thread to where it belongs. I could not figure out the right board for it, sorry.
  • kuroneko Posts: 3,623
    edited 2011-07-12 06:00
    Tectu wrote: »
    Well, I think I understand the concept of that code now, but I could never build that on my own.
    Don't worry, neither could I when I started :)
    1. What do 4n, 8n, 5n, etc. mean? A number of digits?
    2. What is "size" for, and what value should it have?
    4n simply stands for a number divisible by 4 without remainder and is used when the actual value isn't important, e.g. it could be 1024 or 44. A long variable is usually stored at long aligned addresses which - given its size of 4 bytes - is 4n.

    As for size, this is the number of bytes you want to transfer, minus 1. Just follow the link to the POC thread and have a look at the code listing (64 longs transferred amounts to 64*4-1 = 255). Or have a look at the cog storage code (post 978929), which uses - IIRC - 484 longs (size = 1935). HTH
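The two figures quoted above can be reproduced with a one-line formula. The helper below is a hypothetical Python sketch; the even-count check reflects the "2n longs" condition mentioned earlier in the thread.

```python
def transfer_size(longs):
    """Value for 'size': byte count of the transfer minus 1 (longs must be even)."""
    if longs <= 0 or longs % 2:
        raise ValueError("the copy loop moves longs in pairs; use a positive even count")
    return longs * 4 - 1


poc_size = transfer_size(64)       # 64 longs from the POC listing
storage_size = transfer_size(484)  # 484 longs from the cog storage code
```

Both results agree with the numbers in the post (255 and 1935).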
  • Tectu Posts: 22
    edited 2011-07-12 08:39
    Okay... I have now tried to implement that in my code (which is actually the whole code).
    I just get H(+??????????? from the serial terminal, without any newlines.

    Normally the Spin code just reads main RAM and sends it to the serial terminal, with a newline after every long.
    CON
      _clkmode      = xtal1 + pll16x
      _xinfreq      = 5_000_000
    
    VAR
      long data[100]
      byte i
    
    OBJ
      Display       : "VGA_text"
      Key           : "Keyboard"
      Debug         : "Parallax Serial Terminal"
    
    PUB main
      Debug.start(115200)
    
      cognew(@entry, @data{0})
    
      waitcnt(clkfreq + cnt)
    
      Debug.char(13)
      Debug.str(string("Begin now: "))                      'begin output now
      Debug.char(13)
    
      repeat i from 0 to 99
        Debug.bin(data[i], 32)
        Debug.char(13)
    
      Debug.str(string("Finished!!"))                       'output finished
    
    
    DAT
                            org
    
    entry                   mov     samples, #100           'make 100 samples
                            mov     size, #3                'we want to sent 1 long (4 bytes) - 1 = 3 to main RAM
    
    makescan                mov     tmp, ina                '  +8
                            mov     phsa, size              '  -4   hub byte count (8n + 7)
    
    :copy7                  wrlong  tmp, phsa               '  +0 = transfer long between cog and hub
                            mov     tmp, ina                '  +8
                            sub     phsa, #7 wz             '  -4
    
    :copy1                  wrlong  tmp, phsa               '  +0 = transfer long between cog and hub
                  if_nz     mov     tmp, ina                '  +8
                  if_nz     djnz    phsa, #:copy7           '  -4
                            djnz    samples, #makescan
    
    :here                   jmp     #:here                  'never-ever-lands
    
    
    tmp           RES       1
    size          RES       1                               'need to send a long to main RAM
    samples       RES       1                               'amount of samples
    


    Sorry for whatever I am doing wrong. It's my first project with a Propeller, so please tell me what I should do better ;-)


    ~ Tectu
  • Tectu Posts: 22
    edited 2011-07-12 12:09
    Edit: I get that string from the serial console more than 100 times; it repeats indefinitely.
  • kuroneko Posts: 3,623
    edited 2011-07-12 17:59
    I'll prepare a sample for you unless you figure it out in the meantime :)
  • Tectu Posts: 22
    edited 2011-07-12 18:29
    I did not work on it any further. I tried to add the multi-cog stuff, with partial success.
    To try the routines, I took my old wrlong scan method; I would replace that with your 16-cycle routine to be faster.

    This is what I wrote:
    CON
      _clkmode      = xtal1 + pll16x
      _xinfreq      = 5_000_000
    
    VAR
      long waitcog                  'how long the cog has to wait
      long samples_amount           'how many samples should be made
      long coghptr                   'to switch to the right pointer
      long data[100]                'space on main RAM
      byte i
    
    OBJ
      Display       : "VGA_text"
      Key           : "Keyboard"
      Debug         : "Parallax Serial Terminal"
    
    PUB main
      Debug.start(115200)
      samples_amount := 5                                   'samples that will be done *4
    
      coghptr := 0
      waitcog := 100_000 + cnt
    
    '----------------------------------
    
      coghptr += 0
      waitcog += 0
      waitcnt(10_000+cnt)
      cognew(@entry, @waitcog)
    
      coghptr += 4
      waitcog += 32
      waitcnt(10_000+cnt)
      cognew(@entry, @waitcog)
    
      coghptr += 4
      waitcog += 32
      waitcnt(10_000+cnt)
      cognew(@entry, @waitcog)
    
      coghptr += 4
      waitcog += 32
      waitcnt(10_000+cnt)
      cognew(@entry, @waitcog)
    
    '----------------------------------
    
      waitcnt(clkfreq + cnt)
    
      Debug.char(13)
      Debug.str(string("Begin now: "))                      'begin output now
      Debug.char(13)
    
      repeat i from 0 to samples_amount*4-1
        Debug.bin(data[i], 32)
        Debug.char(13)
    
      Debug.str(string("Finished!!"))                       'output finished
    
    
    DAT
                            org     0
    
    entry                   mov     tmp, par
                            rdlong  wait, tmp
                            add     tmp, #4
                            rdlong  samples, tmp
                            add     tmp, #4
                            rdlong  addhptr, tmp
                            add     tmp, #4
                            mov     hptr, tmp               'copy pointer to begin of main RAM
    
                            add     hptr, addhptr
    
                            mov     dira, 0                 'make all pins input
    
                            waitcnt wait, #0                'wait for sncy
                            nop
    
    makescan                mov     tmp, ina                'write INA to tmp
                            wrlong  tmp, hptr               'write the sample to main RAM
                            add     hptr, #16               'point to the fourth-next long
                            nop
                            nop
                            nop
                            djnz    samples, #makescan      'do for amount of samples
    
    here                    jmp #here
    
    
    wait          RES       1                               'how many cyles should you wait
    hptr          RES       1                               'pointer to main RAM
    samples       RES       1                               'amount of samples that will be done
    tmp           RES       1
    addhptr       RES       1
    



    And this is what I get on the serial terminal:
    Begin now:
    00000000000000000000000000000000
    01011111000000011000000000000011
    01011111000000011000000000000011
    01011111000000011000000000000011
    00000000000000000000000000000000
    01011111000000011000000000000011
    01011111000000011000000000000011
    01011111000000011000000000000011
    00000000000000000000000000000000
    01011111000000011000000000000011
    01011111000000011000000000000011
    01011111000000011000000000000011
    00000000000000000000000000000000
    01011111000000011000000000000011
    01011111000000011000000000000011
    01011111000000011000000000000011
    00000000000000000000000000000000
    01011111000000011000000000000011
    01011111000000011000000000000011
    01011111000000011000000000000011
    Finished!!
    


    No idea why the first cog is not working properly...
    But anyway, that is not what this thread is about.


    ~ Tectu
  • kuroneko Posts: 3,623
    edited 2011-07-12 18:42
    Can you place the waitcnt after the cognew? As it stands, the first cog's parameters get overwritten by the second cog's set before the first cog has read them.
  • kuroneko Posts: 3,623
    edited 2011-07-12 20:01
    Here is the 16 cycle sample code. You basically just missed a bit of the setup. This example generates a 2.5MHz wave (@80MHz) so you can see the sampler picking up different values.
    CON
      _clkmode = XTAL1|PLL16X
      _xinfreq = 5_000_000
    
    CON
      lcnt = 32                                             ' must be an even number of longs
      
    VAR
      long  data[lcnt]
    
    OBJ
      debug: "Parallax Serial Terminal"
    
    PUB main | i
    
      debug.start(115200)
      waitcnt(clkfreq*3 + cnt)
      debug.char(0)
    
      ' generate data
      
      dira[16]~~
      ctra := constant(%0_00100_000 << 23 | 16)
      frqa := constant(%00001_0000 << 23)                   ' change every 16 cycles
    
      ' start sampler
    
      data[0] := constant(lcnt*4 -1)                        ' transfer array length
      cognew(@entry, @data{0})
    
      waitcnt(clkfreq + cnt)
    
      ' display data
      
      debug.str(string(13, "Begin now: ", 13))              ' begin output now
    
      repeat i from 0 to constant(lcnt -1)
        debug.bin(data[i], 32)
        debug.char(13)
    
      debug.str(string("Finished!!"))                       ' output finished
    
    DAT             org     0
    
    entry           movi    ctra, #%0_11111_000     '  -4   LOGIC always
    
                    rdlong  size, par               '  +0 = read byte count -1
                                      
                    mov     frqa, par               '  +8   data buffer (base address)
                    shr     frqa, #1                '  -4   base/2
    
                    long    0[2] {2 x nop}          '  +0 =
                    
                    mov     temp, ina               '  +8
                    mov     phsa, size              '  -4   hub byte count (8n - 1)
    
    :copy7          wrlong  temp, phsa              '  +0 = transfer long between cog and hub
                    mov     temp, ina               '  +8
                    sub     phsa, #7 wz             '  -4
    
    :copy1          wrlong  temp, phsa              '  +0 = transfer long between cog and hub
          if_nz     mov     temp, ina               '  +8
          if_nz     djnz    phsa, #:copy7           '  -4
    
                    cogid   cnt                     '
                    cogstop cnt                     ' sayonara ...
    
    ' initialised data and/or presets
    
    ' uninitialised data and/or temporaries
    
    size            res     1
    temp            res     1
    
                    fit
                    
    DAT             org     0                       ' array validation
    
                    res     lcnt & 1                ' lcnt must be 2n (even)
    
                    fit     0
    
    DAT
    
    Note that when you go multi cog the success/failure depends on how you do it. While this is blindingly obvious, using the 16 cycle loop has certain conditions in order to get it to work interleaved. For example each of the 4 cogs would sample 4 cycles apart. This also means that the hub access is 4 cycles apart. That's where the problem lies. Cog N and cog N+1 have their respective hub window slots 2 cycles apart. So you can't use them together however much you sync them with waitcnt. This means that for a 4 cog sampler using the above sample loop you have to use either all even (2n) or odd (2n+1) numbered cogs.
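The even/odd rule can be checked with a small Python model. It is only a sketch: it takes the hub window of cog N to be 2 clocks after that of cog N-1 in a 16-clock rotation, as stated above, and the function names are invented.

```python
def hub_slot(cog):
    """Clock offset (mod 16) of a cog's hub window, relative to cog 0."""
    return (2 * cog) % 16


def can_interleave(cogs, spacing=4):
    """True if the cogs' hub windows are exactly `spacing` clocks apart."""
    slots = sorted(hub_slot(c) for c in cogs)
    return all(b - a == spacing for a, b in zip(slots, slots[1:]))


even_ok = can_interleave([0, 2, 4, 6])
odd_ok = can_interleave([1, 3, 5, 7])
adjacent_ok = can_interleave([0, 1, 2, 3])
```

The all-even and all-odd sets have windows 4 clocks apart, while four adjacent cogs do not, whatever their waitcnt offsets.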