p2asm


Comments

  • evanh Posts: 6,964
    edited 2018-10-15 - 03:08:22
    In RTC you use the return-stack pointer as your Forth IP register --- it points to the threaded code --- so, NEXT is a RET instruction.
    There is a mechanism named Xbyte. It might be useful to you.

    When CALL is done, is a 32-bit address pushed onto the return-stack? So RET expects a 32-bit number, although it only uses the lower 20 bits for the address. Is this true?
    All variants of CALLx and RETx only use the lower 20 bits for the address, bit31 for the C flag and bit30 for the Z flag. C and Z are the only condition code bits, so all condition codes are preserved across the call. The remaining bits are presumably set to zero on call and ignored on return.
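    To make the return-stack word layout described above concrete, here is a minimal C sketch that packs and unpacks it. It assumes bits 19..0 hold the return address, bit 31 holds C, bit 30 holds Z and all other bits are zero. The helper names (rs_pack, rs_addr, rs_c, rs_z) are hypothetical. As evanh points out later in the thread, the flags are only restored by RETx forms that ask for it (_RET_ does not restore them), so this only models the stored word, not the restore behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

#define RS_ADDR_MASK 0x000FFFFFu   /* bits 19..0: return address */
#define RS_C_BIT     (1u << 31)    /* bit 31: C flag */
#define RS_Z_BIT     (1u << 30)    /* bit 30: Z flag */

/* Build the word a CALL would push (remaining bits assumed zero). */
uint32_t rs_pack(uint32_t ret_addr, bool c, bool z)
{
    uint32_t word = ret_addr & RS_ADDR_MASK;  /* only 20 address bits survive */
    if (c) word |= RS_C_BIT;
    if (z) word |= RS_Z_BIT;
    return word;
}

/* Fields a RET would use; all other bits are ignored. */
uint32_t rs_addr(uint32_t word) { return word & RS_ADDR_MASK; }
bool rs_c(uint32_t word) { return (word & RS_C_BIT) != 0; }
bool rs_z(uint32_t word) { return (word & RS_Z_BIT) != 0; }
```

    For example, rs_pack(0x12345, true, false) yields $80012345. This is why a full 32-bit Forth xt on the stack would only ever contribute 20 address bits.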

    How many clock cycles does RET take?
    4 clocks. Same for all branching ... within register space.

    EDIT: added the proviso for execution speed.
    "There's no huge amount of massive material
    hidden in the rings that we can't see,
    the rings are almost pure ice."
  • jmg wrote: »
    When CALL is done, is a 32-bit address pushed onto the return-stack? So RET expects a 32-bit number, although it only uses the lower 20 bits for the address. Is this true?
    IIRC the stack is only as wide as it needs to be, and I think the non-address bits are ignored

    The stack width was increased to 32-bit some time ago.

    Formerly known as TonyB
  • evanh wrote: »
    In RTC you use the return-stack pointer as your Forth IP register --- it points to the threaded code --- so, NEXT is a RET instruction.
    There is a mechanism named Xbyte. It might be useful to you.

    When CALL is done, is a 32-bit address pushed onto the return-stack? So RET expects a 32-bit number, although it only uses the lower 20 bits for the address. Is this true?
    All variants of CALLx and RETx only use the lower 20 bits for the address, bit31 for the C flag and bit30 for the Z flag. C and Z are the only condition code bits, so all condition codes are preserved across the call. The remaining bits are presumably set to zero on call and ignored on return.

    How many clock cycles does RET take?
    4 clocks. Same for all branching ... within register space.

    EDIT: added the proviso for execution speed.
    I looked up XBYTE and it says there are 6 clock cycles to call the primitive, plus the RET, which is 2 clock cycles. This is a _RET_ prefix on a 2-cycle instruction.
    This is presumably the 6 clock-cycles that Peter Jakacki mentioned, but he was not counting the RET.
    Also, you said above that RET is 4 clock-cycles, not 2.

    What do you mean "within register space"? Does this mean that the return-stack has to be within the 512 bytes of register space? If so, then that obviously kills RTC --- I need the return-stack pointer to cover the entire range of memory because I would be using it as the Forth IP.

    Actually, I'm chilling out on the P2 as a platform for RTC. Your return-address is 32-bit, so my xt would have to be 32-bit --- this would result in hugely bloated code.
    It may be better to just go with XBYTE like in Tachyon Forth --- that is what the ISA designer wanted.
  • evanh Posts: 6,964
    edited 2018-10-16 - 12:40:39
    I looked up XBYTE and it says there are 6 clock cycles to call the primitive, plus the RET, which is 2 clock cycles. This is a _RET_ prefix on a 2-cycle instruction.
    This is presumably the 6 clock-cycles that Peter Jakacki mentioned, but he was not counting the RET.
    Also, you said above that RET is 4 clock-cycles, not 2.
    That _RET_ is sort of a half instruction. It adds two clocks to whatever instruction it is combined with. It's cool but a real oddity. Oh, and I wasn't complete about the preservation of flags across calls either - they are only restored by RETx when specified. _RET_ does not restore the flags.

    To be honest, I haven't tried to get my head around xbyte at all. I can't help any further on that one, sorry.

    What do you mean "within register space"? Does this mean that the return-stack has to be within the 512 bytes of register space? If so, then that obviously kills RTC --- I need the return-stack pointer to cover the entire range of memory because I would be using it as the Forth IP.
    No, that is in reference to execution flow - where the program counter is pointing. Instructions can be fetched from registers (cogram) and also another area called lutram. Both these ram blocks are local to each propeller core (cog).

    Things aren't quite as smooth when executing from the shared main memory (hubram). There is a sort of prefetching buffer, called the fifo, that allows instructions to flow at full speed ... until a branch is hit. Then a refill is needed, which stalls for 9-17 clocks. Most hub reads stall similarly. Hub writes are faster, 3-11 clocks. There are a couple of burst-type mechanisms for faster data handling.

    Stacking to hubram is easy; there are a couple of special pointer registers, PTRA/PTRB, for this.
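    As a rough illustration of PTRA/PTRB stacking, here is a C model of a long-sized stack in hubram. On my reading of the P2 docs, this would typically be written as something like WRLONG x, PTRA++ to push and RDLONG x, --PTRA to pop. Treat that as an assumption and check the official docs. The model stores and then post-increments by 4 on push, and pre-decrements by 4 and then loads on pop. HubStack, hub_push and hub_pop are hypothetical names, and there is no bounds checking.

```c
#include <stdint.h>
#include <string.h>

#define HUB_BYTES 1024u   /* toy hub size, not the real P2 hub size */

typedef struct {
    uint8_t  hub[HUB_BYTES];
    uint32_t ptra;        /* byte address of the next free long */
} HubStack;

/* Push: store at PTRA, then advance PTRA by one long (like PTRA++). */
void hub_push(HubStack *s, uint32_t value)
{
    memcpy(&s->hub[s->ptra], &value, sizeof value);  /* host byte order */
    s->ptra += 4;
}

/* Pop: step PTRA back one long first, then load (like --PTRA). */
uint32_t hub_pop(HubStack *s)
{
    uint32_t value;
    s->ptra -= 4;
    memcpy(&value, &s->hub[s->ptra], sizeof value);
    return value;
}
```

    With post-increment push and pre-decrement pop, PTRA always points at the next free slot. That is why the two addressing modes pair up naturally.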
  • Not exactly sure when it started happening, but p2asm v0.009 (mingw Windows .exe) is no longer setting ERRORLEVEL > zero on a failed compile:
    p2asm - an assembler for the propeller 2 - version 0.009, 2019-02-21
    
    con
            _1ms = 80_000_000 / 1000
    dat
    org
            waitx   ##_ms * 25
            mov     foo, #1000
    foo     long    0
    
    I mostly do my p2asm compiles using Notepad++ with the NppExec plugin:
    p2asm "test.spin2"
    Process started (PID=4536) >>>
    5: ERROR: Symbol _ms is undefined when evaluating an expression
            waitx   ##_ms * 25
    6: ERROR: Immediate value must be between 0 and 511
            mov     foo, #1000
    <<< Process finished (PID=4536). (Exit code 0)
    ================ READY ================
    
    garryj
  • Dave Hein Posts: 5,898
    edited 2019-02-23 - 21:04:19
    p2asm returns a value of 1 when it fails to open a file, encounters a bad command-line parameter or the symbol table is full. I think it's always returned a value of 0 for all other errors. However, it would be useful to return a non-zero value for assembly errors. I'll go ahead and add that.

    EDIT: I checked the change into GitHub. Also bumped the version number to 0.010.
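    The fix amounts to making the process exit status reflect whether any assembly error was reported. That lets a batch file test it (for example with IF ERRORLEVEL 1) and lets NppExec show a non-zero exit code. Here is a minimal sketch of that idea. It is not p2asm's actual code; error_count, report_error and assembler_exit_status are hypothetical names.

```c
#include <stdio.h>

static int error_count = 0;   /* incremented for every reported error */

/* Print an error in the same "line: ERROR: message" shape and count it. */
void report_error(int line, const char *msg)
{
    fprintf(stderr, "%d: ERROR: %s\n", line, msg);
    error_count++;
}

/* Value main() would return: 0 on success, 1 if any error was seen. */
int assembler_exit_status(void)
{
    return error_count > 0 ? 1 : 0;
}
```

    A driver would end with `return assembler_exit_status();`. Fatal errors such as a missing file can still exit early with their own code.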
  • Dave Hein wrote: »
    I think it's always returned a value of 0 for all other errors.
    Ah. That could very well be, as I have been using Notepad++ and NppExec to do compiles only, and I never really paid any attention to the program exit value because the error text was always there. Yesterday I was working on a batch file to compile with an option to load/run using loadp2 and its terminal emulator, and that's when I noticed the zero exit value on a compile error. Guess I'll chalk this one up to yet another "senior moment".

    In any case, thanks very much for the fix!
  • There is an incompatibility between PNUT and P2ASM, when parsing instructions like
    mov dest, #-(-1)
    

    This should be interpreted as
    mov dest, #1
    

    I know it looks odd, but such statements are spat out by LCC, which is the code generator I use for Catalina, and there is no easy way to change it. PNUT handles it ok, as did all the old P1 assemblers, but P2ASM doesn't.

    To fix P2ASM I have added the following code to the EvaluateExpression procedure, straight after the first CheckForEOL ...
    if (!strcmp(tokens[i], "-"))
    {
        // Unary minus: step past the "-" and make sure an operand follows
        if (CheckForEOL(++i, num))
        {
            *pindex = i;
            return 1;
        }
        // Evaluate the operand, then negate it
        if ((errnum = EvaluateExpression(prevprec, &i, tokens, num, &value, is_float))) return errnum;
        if (*is_float)
            value ^= 0x80000000;   // float: flip the IEEE-754 sign bit
        else
            value = -value;        // integer: two's-complement negation
        *pval = value;
        *pindex = i;
        return 0;
    }
    

    Someone who is more familiar with P2ASM may want to check this is a correct fix.
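    The float branch of the snippet negates by XORing bit 31. That works on the assumption that a float operand is held as its raw IEEE-754 single-precision bit pattern in a 32-bit integer. In that format, bit 31 is the sign and flipping it is exactly negation. Here is a small self-contained check of that idea. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Flip the IEEE-754 sign bit of a float stored as raw bits. */
uint32_t negate_float_bits(uint32_t bits)
{
    return bits ^ 0x80000000u;
}

/* Reinterpret raw bits as a float (memcpy avoids aliasing problems). */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a float as its raw 32-bit pattern. */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

    Note that integer negation (value = -value) would be wrong for float bit patterns, which is why the snippet needs the separate branch.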
    Catalina - a FREE ANSI C compiler for the Propeller.
    Download it from http://catalina-c.sourceforge.net/
  • Ross, thanks for finding the issue. I'll take a look at it, and fix it.
    Would someone please explain these directives? I understand DRVH, DRVL and DRVNOT.
    I am trying to figure out the rest, and which one reads the input (INA).
    Thanks.

    DIRL/DIRH/DIRC/DIRNC/DIRZ/DIRNZ/DIRRND/DIRNOT {#}D - affect pin D bit in DIRx
    OUTL/OUTH/OUTC/OUTNC/OUTZ/OUTNZ/OUTRND/OUTNOT {#}D - affect pin D bit in OUTx
    FLTL/FLTH/FLTC/FLTNC/FLTZ/FLTNZ/FLTRND/FLTNOT {#}D
  • Those instructions write the DIR and/or OUT bits. To read the IN bits, you can use TESTP/TESTPN.
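    Here is a C model of what the eight suffixes mean for a single pin bit: L = 0, H = 1, C/NC = the C flag or its inverse, Z/NZ = the Z flag or its inverse, RND = a random bit, NOT = invert the current bit. On my reading of the P2 docs, DIRx and OUTx each change only their own register, while DRVx writes OUT and sets DIR to 1 (drive) and FLTx writes OUT and clears DIR to 0 (float). Please verify this against Parallax's instruction list. Reading the pin is separate (TESTP/TESTPN). All names below are hypothetical, and rnd stands in for the hardware's random bit.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SUF_L, SUF_H, SUF_C, SUF_NC, SUF_Z, SUF_NZ, SUF_RND, SUF_NOT } Suffix;

typedef struct { uint64_t dir, out; } PinRegs;  /* DIRB:DIRA and OUTB:OUTA */

/* New value of one pin bit for a given suffix. */
bool suffix_value(Suffix s, bool current, bool c, bool z, bool rnd)
{
    switch (s) {
    case SUF_L:   return false;
    case SUF_H:   return true;
    case SUF_C:   return c;
    case SUF_NC:  return !c;
    case SUF_Z:   return z;
    case SUF_NZ:  return !z;
    case SUF_RND: return rnd;
    case SUF_NOT: return !current;
    }
    return current;
}

static uint64_t write_bit(uint64_t reg, unsigned pin, bool value)
{
    uint64_t mask = (uint64_t)1 << (pin & 63);
    return value ? (reg | mask) : (reg & ~mask);
}

static bool read_bit(uint64_t reg, unsigned pin)
{
    return (reg >> (pin & 63)) & 1u;
}

/* DIRx: only the DIR bit changes. */
void dirx(PinRegs *r, unsigned pin, Suffix s, bool c, bool z, bool rnd)
{
    r->dir = write_bit(r->dir, pin, suffix_value(s, read_bit(r->dir, pin), c, z, rnd));
}

/* OUTx: only the OUT bit changes. */
void outx(PinRegs *r, unsigned pin, Suffix s, bool c, bool z, bool rnd)
{
    r->out = write_bit(r->out, pin, suffix_value(s, read_bit(r->out, pin), c, z, rnd));
}

/* DRVx (assumed): write OUT, then enable the driver (DIR = 1). */
void drvx(PinRegs *r, unsigned pin, Suffix s, bool c, bool z, bool rnd)
{
    outx(r, pin, s, c, z, rnd);
    r->dir = write_bit(r->dir, pin, true);
}

/* FLTx (assumed): write OUT, then disable the driver (DIR = 0). */
void fltx(PinRegs *r, unsigned pin, Suffix s, bool c, bool z, bool rnd)
{
    outx(r, pin, s, c, z, rnd);
    r->dir = write_bit(r->dir, pin, false);
}
```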
  • @cgracey
    This is Chantal's buddy Martin in L.A.
    I am attempting the following:
    1. Charlieplexing, as described below, by switching the pins to inputs.

    This lookup table shows the required state of the Propeller I/O pins to illuminate each LED. For example, to light the LED at Row 3 Col 6, the table shows 3 HIGH 0 LOW. In this case, we would set IO 3 HIGH and IO 0 LOW to light this LED. Set both IO pins to INPUT mode to turn the LED off.
    This is from the description of the charlieplex accessory.

    Eventually I want to write my own asynchronous serial routine for my GPS.
    I am looking for examples of each of these if possible.
    Thanks
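    The charlieplexing rule quoted above reduces to two bit masks. To light one LED, drive its "high" pin to 1 and its "low" pin to 0, and leave every other pin in the group as an input (DIR = 0). To turn it off, return all of them to inputs. Here is a sketch on a DIRA/OUTA image. It assumes pins below 32. PinState, charlie_on, charlie_off and charlie_apply are hypothetical names, and the actual high/low pairs come from the accessory's lookup table (e.g. Row 3 Col 6 = IO 3 high, IO 0 low).

```c
#include <stdint.h>

typedef struct { uint32_t dir; uint32_t out; } PinState;

/* Light one LED: only the two named pins are driven. */
PinState charlie_on(unsigned high_pin, unsigned low_pin)
{
    PinState s;
    s.dir = (1u << high_pin) | (1u << low_pin);  /* both pins become outputs */
    s.out = 1u << high_pin;                      /* high pin = 1, low pin = 0 */
    return s;
}

/* All pins back to input mode: no LED lit. */
PinState charlie_off(void)
{
    PinState s = { 0u, 0u };
    return s;
}

/* Apply a state to the pins in group_mask without touching other pins. */
void charlie_apply(uint32_t *dira, uint32_t *outa, uint32_t group_mask, PinState s)
{
    *dira = (*dira & ~group_mask) | (s.dir & group_mask);
    *outa = (*outa & ~group_mask) | (s.out & group_mask);
}
```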
  • I'm afraid my proposed fix (a few posts above) for the p2asm bug involving expressions like "-(-2)" causes yet another problem.

    With my fix, p2asm incorrectly compiles expressions like "-28&$FF" as 0xffffffe4 instead of 0xe4, i.e. it gets the precedence wrong.

    I can see why, but I cannot seem to fix it - I guess I'll have to leave it to the p2asm experts to come up with a real fix!
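    The symptom is consistent with the operand of "-" being parsed as a whole binary expression, so that -28&$FF becomes -(28&$FF). One common way to avoid this is precedence climbing, where unary minus negates exactly one primary (a number, a parenthesised expression or another unary). That makes it bind tighter than any binary operator. Here is a small sketch, not p2asm's code. It supports only decimal and $hex literals, parentheses, unary "-" and binary "&" and "|", with "&" binding tighter than "|". All names are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>

typedef struct { const char *p; } Lexer;

static uint32_t parse_expr(Lexer *lx, int min_prec);

static void skip_ws(Lexer *lx)
{
    while (isspace((unsigned char)*lx->p)) lx->p++;
}

/* Decimal or $hex literal. */
static uint32_t parse_number(Lexer *lx)
{
    uint32_t v = 0;
    if (*lx->p == '$') {
        lx->p++;
        while (isxdigit((unsigned char)*lx->p)) {
            int ch = tolower((unsigned char)*lx->p++);
            v = v * 16 + (uint32_t)(isdigit(ch) ? ch - '0' : ch - 'a' + 10);
        }
    } else {
        while (isdigit((unsigned char)*lx->p))
            v = v * 10 + (uint32_t)(*lx->p++ - '0');
    }
    return v;
}

/* Unary minus negates exactly one operand, never a binary expression. */
static uint32_t parse_unary(Lexer *lx)
{
    skip_ws(lx);
    if (*lx->p == '-') {
        lx->p++;
        return 0u - parse_unary(lx);          /* 32-bit two's complement */
    }
    if (*lx->p == '(') {
        lx->p++;
        uint32_t v = parse_expr(lx, 0);
        skip_ws(lx);
        if (*lx->p == ')') lx->p++;
        return v;
    }
    return parse_number(lx);
}

static int binary_prec(char op)
{
    return op == '&' ? 2 : op == '|' ? 1 : -1;
}

/* Precedence climbing over the binary operators. */
static uint32_t parse_expr(Lexer *lx, int min_prec)
{
    uint32_t lhs = parse_unary(lx);
    for (;;) {
        skip_ws(lx);
        char op = *lx->p;
        int prec = binary_prec(op);
        if (prec < 0 || prec < min_prec) return lhs;
        lx->p++;
        uint32_t rhs = parse_expr(lx, prec + 1);  /* left-associative */
        lhs = (op == '&') ? (lhs & rhs) : (lhs | rhs);
    }
}

uint32_t eval_expr(const char *src)
{
    Lexer lx = { src };
    return parse_expr(&lx, 0);
}
```

    With this structure, eval_expr("-(-1)") gives 1 and eval_expr("-28&$FF") gives $E4. Only an explicit eval_expr("-(28&$FF)") gives $FFFFFFE4.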

    Ross.
  • pilot0315 wrote: »
    Set both IO pins to INPUT mode to turn the LED off.
    All pins are inputs all the time, so you can monitor an output by reading the same pin's input. However, to stop driving a pin, you can use DIRL, FLTx, or an AND on DIRA that clears the pin's bit. There are other ways too.
  • Ross, I fixed the handling of the unary "-" operator in p2asm.c, and pushed the fix to GitHub.
  • Dave Hein wrote: »
    Ross, I fixed the handling of the unary "-" operator in p2asm.c, and pushed the fix to GitHub.

    Thanks, Dave. Unfortunately, I've just found another problem.

    Compiling these hub to cog "calld" instructions ...
    DAT
            org 0
    
    COG_0    nop 
    COG_1    nop 
    COG_2    nop 
    COG_3    nop 
    COG_4    nop 
    COG_5    nop 
    COG_6    nop 
    COG_7    nop 
    COG_8    nop 
    COG_9    nop 
    COG_a    nop 
    COG_b    nop 
    COG_c    nop 
    COG_d    nop 
    COG_e    nop 
    COG_f    nop 
    COG_10   nop 
    
            orgh $1000
    
      calld PA, #COG_0
      calld PA, #COG_1
      calld PA, #COG_2
      calld PA, #COG_3
      calld PA, #COG_4
      calld PA, #COG_5
      calld PA, #COG_6
      calld PA, #COG_7
      calld PA, #COG_8
      calld PA, #COG_9
      calld PA, #COG_a
      calld PA, #COG_b
      calld PA, #COG_c
      calld PA, #COG_d
      calld PA, #COG_e
      calld PA, #COG_f
      calld PA, #COG_10
    
    

    Gives this result ...
    '#line 1 "C:\Users\User\AppData\Local\Temp\absolute.spin2"
                       DAT
    00000 000                  org 0
                       
    00000 000 00000000 COG_0    nop
    00004 001 00000000 COG_1    nop
    00008 002 00000000 COG_2    nop
    0000c 003 00000000 COG_3    nop
    00010 004 00000000 COG_4    nop
    00014 005 00000000 COG_5    nop
    00018 006 00000000 COG_6    nop
    0001c 007 00000000 COG_7    nop
    00020 008 00000000 COG_8    nop
    00024 009 00000000 COG_9    nop
    00028 00a 00000000 COG_a    nop
    0002c 00b 00000000 COG_b    nop
    00030 00c 00000000 COG_c    nop
    00034 00d 00000000 COG_d    nop
    00038 00e 00000000 COG_e    nop
    0003c 00f 00000000 COG_f    nop
    00040 010 00000000 COG_10   nop
                       
    01000                      orgh $1000
                       
    01000     fb27edff   calld PA, #COG_0
    01004     fe000001   calld PA, #COG_1
    01008     fe000002   calld PA, #COG_2
    0100c     fe000003   calld PA, #COG_3
    01010     fb27edfc   calld PA, #COG_4
    01014     fe000005   calld PA, #COG_5
    01018     fe000006   calld PA, #COG_6
    0101c     fe000007   calld PA, #COG_7
    01020     fb27edf9   calld PA, #COG_8
    01024     fe000009   calld PA, #COG_9
    01028     fe00000a   calld PA, #COG_a
    0102c     fe00000b   calld PA, #COG_b
    01030     fb27edf6   calld PA, #COG_c
    01034     fe00000d   calld PA, #COG_d
    01038     fe00000e   calld PA, #COG_e
    0103c     fe00000f   calld PA, #COG_f
    01040     fb27edf3   calld PA, #COG_10
                       
    

    These calls should all use absolute addressing, but if the cog address is divisible by 4, it gets it wrong.
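    The two kinds of output in the listing above are two different CALLD encodings. Here is a small decoder for both. The field layout is inferred from these listings and my reading of the P2 instruction table, so treat it as an assumption. CALLD PA/PB/PTRA/PTRB,#A is EEEE 11100WW R A(20 bits). CALLD D,{#}S is EEEE 1011001 CZI D(9) S(9). Decoding shows that $fe000001 is the 20-bit absolute form (PA, R = 0, A = 1). The bad lines such as $fb27edff are the 9-bit form with D = $1F6 (PA) and an immediate S of $1FF. CalldInfo and decode_calld are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     long_form;  /* true: 20-bit #A form; false: D,{#}S form */
    unsigned reg;        /* long form: 0=PA 1=PB 2=PTRA 3=PTRB; else D field */
    bool     flag;       /* long form: R (relative) bit; else I (immediate S) bit */
    uint32_t target;     /* long form: 20-bit A; else raw 9-bit S */
} CalldInfo;

bool decode_calld(uint32_t insn, CalldInfo *out)
{
    uint32_t op7 = (insn >> 21) & 0x7Fu;      /* bits 27..21 */
    if ((op7 >> 2) == 0x1Cu) {                /* 11100WW */
        out->long_form = true;
        out->reg       = op7 & 0x3u;
        out->flag      = (insn >> 20) & 1u;
        out->target    = insn & 0xFFFFFu;
        return true;
    }
    if (op7 == 0x59u) {                       /* 1011001 */
        out->long_form = false;
        out->reg       = (insn >> 9) & 0x1FFu;
        out->flag      = (insn >> 18) & 1u;
        out->target    = insn & 0x1FFu;
        return true;
    }
    return false;                             /* not a CALLD */
}
```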
  • I fixed the problem with cog addresses that are divisible by 4. It was caused by a kludge that I used to signal whether to use relative or absolute addresses.

    I believe I uncovered a problem with PNut when generating the CALLD instruction in cog memory with a cog address. It doesn't seem to be generating the 9-bit relative address correctly. The source field should be handled the same way DJNZ handles it. When PNut assembles the following program
    dat
            org
    COG_0   nop
    COG_1   nop
            calld PA, #COG_0
            calld PA, #COG_1
            calld PA, #COG_2
            calld PA, #COG_3
    COG_2   nop
    COG_3   nop
    COG_4   nop
    COG_5   nop
            djnz  PA, #COG_4
            djnz  PA, #COG_5
            djnz  PA, #COG_6
            djnz  PA, #COG_7
    COG_6   nop
    COG_7   nop
    
    the binary disassembles to this:
    0000 00000000              nop     
    0004 00000000              nop     
    0008 fb27edf7              long    $fb27edf7
    000c fb27edf4              long    $fb27edf4
    0010 fb27edf5              long    $fb27edf5
    0014 fb27edf2              long    $fb27edf2
    0018 00000000              nop     
    001c 00000000              nop     
    0020 00000000              nop     
    0024 00000000              nop     
    0028 fb6fedfd              djnz    $1f6, #$-2
    002c fb6fedfd              djnz    $1f6, #$-2
    0030 fb6fec01              djnz    $1f6, #$+2
    0034 fb6fec01              djnz    $1f6, #$+2
    0038 00000000              nop     
    003c 00000000              nop     
    
    The source fields for the CALLD instructions should match those in the DJNZ instructions.
  • I read the documentation the same way as you did, Dave, and fastspin produces the same offsets for djnz and calld in the example above. @cgracey, is this a PNut bug, or does the hardware really interpret the offsets differently for calld and djnz/tjnz/etc.?
  • Dave Hein wrote: »
    I fixed the problem with cog addresses that are divisible by 4. It was caused by a kludge that I used to signal whether to use relative or absolute addresses.

    Thanks, Dave!
  • It took a long time for me to figure out what was going on here. The problem was that the assembler's internal cog ORG tracks byte locations, not longs. So, I wasn't shifting the current cog ORG value right by 2 before doing the address subtraction. All fixed now. I will get a new assembler out sometime soon, but for now, just be aware of this bug. My compiler/assembler is undergoing a lot of work at this time and isn't that amenable to release just now.
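    The explanation above can be checked with simple arithmetic. This assumes, as the DJNZ lines in Dave's listing show, that the 9-bit offset is measured in longs from the instruction after the branch: S = target - (pc + 1), truncated to 9 bits. Using the byte-based cog ORG without the shift by 2 reproduces exactly the PNut values $1F7, $1F4, $1F5 and $1F2 seen for the four CALLDs. The function names are hypothetical.

```c
#include <stdint.h>

/* Correct 9-bit relative field: both addresses in longs. */
uint32_t rel9(uint32_t target_long, uint32_t pc_long)
{
    return (target_long - (pc_long + 1u)) & 0x1FFu;
}

/* The bug: using the byte-based cog ORG directly as the PC. */
uint32_t rel9_buggy(uint32_t target_long, uint32_t org_bytes)
{
    return (target_long - (org_bytes + 1u)) & 0x1FFu;
}

/* The fix: shift the byte ORG right by 2 before subtracting. */
uint32_t rel9_fixed(uint32_t target_long, uint32_t org_bytes)
{
    return rel9(target_long, org_bytes >> 2);
}
```

    For example, the first CALLD sits at cog long 2 (byte 8) and targets COG_0 at long 0. The fixed value is $1FD (-3), matching what DJNZ produces for the same distance.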
  • evanh Posts: 6,964
    edited 2019-04-10 - 01:24:57
    Sounds like the same as the ... Err, maybe not. Might be a good time to look at the LOC bug (https://forums.parallax.com/discussion/comment/1457051/#Comment_1457051). The issue occurs when code running in cogexec references hubram below $400 relatively. Maybe also relative cogram referencing from cogexec.

    Interestingly, this can't affect branching as I first thought, so it may be a LOC-only issue.
  • evanh Posts: 6,964
    edited 2019-04-13 - 04:36:10
    Here's a refreshed example. It waits for a keypress from the terminal; hopefully this helps those who need time to start up a terminal program.
    PS: Tested with Fastspin v3.9.21 and Pnut v32i and P2asm v0.012.

    EDIT: Updated with dereferenced content displayed
    EDIT2: Updated with program counter also displayed
    EDIT3: Shift PC display to after adjustment
    EDIT4: Make it loop on every keypress
    EDIT5: Added lutram tests
  • Hi Dave

    Found another problem - this time with alignl or alignw used in cog code. PNUT disallows this, whereas p2asm allows it but gets it wrong.

    To see the problem, try compiling the following code, with and without the "alignl", and look at the cog addresses generated ...
    DAT 
            org  0
    
    cog0    mov  cog1,cog2
    
            alignl
    
    cog1    long 0
    cog2    long 0
    
  • RossH wrote: »
    Hi Dave

    Found another problem - this time with alignl or alignw used in cog code. PNUT disallows this, whereas p2asm allows it but gets it wrong.

    To see the problem, try compiling the following code, with and without the "alignl", and look at the cog addresses generated ...
    DAT 
            org  0
    
    cog0    mov  cog1,cog2
    
            alignl
    
    cog1    long 0
    cog2    long 0
    

    In PNut, ALIGNL/ALIGNW both align to hub memory, which cog ORG code may not be aligned to.

    If you want your cog code to be long-aligned to hub memory, you'll need to do an ALIGNL before the ORG.
  • Yet PNut requires that longs must be aligned relative to the starting hub address. The following code fails under PNut.
      org 0
      byte 123
      long 456
    
    So bytes, words and longs cannot be compacted within an ORG section. Also, PNut will silently align PASM instructions relative to the starting hub address.

    Originally, p2asm forced long alignment when processing the ORG directive, but this was changed to match PNut many months ago. p2asm currently doesn't implicitly align longs or instructions, but I may change this to match PNut.
  • I fixed the cog addresses for the alignl, alignw and .balign directives. I allow the use of the alignment directives in long-aligned ORG segments, but I print an error if they are used in unaligned COG segments.

    I did not change the alignment handling for words, longs and instruction to match PNut. I'm still thinking about whether this needs to be done.
  • evanh Posts: 6,964
    edited 2019-04-10 - 22:05:24
    Anyone looking at the LOC bug?
    EDIT: I've just updated it.
  • cgracey Posts: 11,050
    edited 2019-04-10 - 22:02:03
    Dave Hein wrote: »
    Yet PNut requires that longs must be aligned relative to the starting hub address. The following code fails under PNut.
      org 0
      byte 123
      long 456
    
    So bytes, words and longs cannot be compacted within an ORG section. Also, PNut will silently align PASM instructions relative to the starting hub address.

    Originally, p2asm forced long alignment when processing the ORG directive, but this was changed to match PNut many months ago. p2asm currently doesn't implicitly align longs or instructions, but I may change this to match PNut.

    It looks like it works, to me, in PNut.

    This code:
    dat	org	0
    
    	byte	123
    	long	456
    

    Produces:
    00000- 7B C8 01 00 00 00 00 00
    

    This is my working version of PNut, which has been undergoing changes, but I don't think this behavior has changed. The object does get padded to the next long alignment, so that when objects are appended, each is long-aligned.
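    The PNut output above (7B C8 01 00 00 00 00 00) is what you get by emitting the byte 123 ($7B), then the long 456 ($000001C8) little-endian with no alignment, then padding the object to the next long boundary. Here is a minimal sketch of that emission. Image, emit_byte, emit_long and pad_to_long are hypothetical names, and the buffer has no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t buf[64]; size_t len; } Image;

void emit_byte(Image *im, uint8_t b) { im->buf[im->len++] = b; }

/* Little-endian long at the current offset, no alignment applied. */
void emit_long(Image *im, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        emit_byte(im, (uint8_t)(v >> (8 * i)));
}

/* Pad the object so the next appended object starts long-aligned. */
void pad_to_long(Image *im)
{
    while (im->len % 4 != 0)
        emit_byte(im, 0);
}
```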
  • Sorry, I didn't actually try to assemble that particular code snippet. The error occurs when there is a label that is not long-aligned, such as with this code.
    dat	org	0
    
    	byte	123
    cog0	long	456
    
    So now I understand that the error concerns the label, and not the long value. That makes sense to me.

    I also now realize that the long-alignment is relative to the start of the ORG section. As an example, if the two least significant hub address bits at the beginning of the COG section are 01, then the hub address of any labels within the ORG section must also end in 01. Tricky!
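    The rule just described can be expressed as a check: a label inside an ORG section only has a valid cog (long) address if its offset from the section's starting hub address is a multiple of 4. In other words, its two low hub-address bits match those of the section start. Here is a sketch with hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the label's hub address has the same two low bits as the
   hub address where the ORG section began. */
bool label_is_cog_aligned(uint32_t org_start_hub, uint32_t label_hub)
{
    return ((label_hub - org_start_hub) & 3u) == 0;
}

/* Cog long address of an aligned label, given the ORG value in longs. */
uint32_t label_cog_addr(uint32_t org_longs, uint32_t org_start_hub, uint32_t label_hub)
{
    return org_longs + ((label_hub - org_start_hub) >> 2);
}
```

    For instance, in the failing example the section starts at hub 0 and cog0 follows one byte, at hub 1, so the check fails. A section that starts at hub $401 accepts a label at $405 as cog address 1.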

    Personally, I don't see much value in unaligned COG sections, but the chip can handle it, so I guess it should be allowed.
  • jmg Posts: 13,463
    RossH wrote: »
    Hi Dave

    Found another problem - this time with alignl or alignw used in cog code. PNUT disallows this, whereas p2asm allows it but gets it wrong.

    To see the problem, try compiling the following code, with and without the "alignl", and look at the cog addresses generated ...
    DAT 
            org  0
    
    cog0    mov  cog1,cog2
    
            alignl
    
    cog1    long 0
    cog2    long 0
    

    In other assemblers I use, placing ALIGN N in the code rounds the location counter in that segment up to the next multiple of N.
    e.g. ALIGN 4 would shift an address ending in 0b01, 0b10 or 0b11, but would not change an address ending in 0b00.
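    The ALIGN N behaviour described above is the usual round-up-to-a-multiple operation. Here is a one-function sketch, assuming N is a power of two. For other N, ((addr + n - 1) / n) * n gives the same result.

```c
#include <stdint.h>

/* Round addr up to the next multiple of n (n must be a power of two);
   addresses that are already aligned are returned unchanged. */
uint32_t align_up(uint32_t addr, uint32_t n)
{
    return (addr + n - 1u) & ~(n - 1u);
}
```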