p2asm

evanh · 2018-10-15 02:55

HughAguilar wrote: »

In RTC you use the return-stack pointer as your Forth IP register --- it points to the threaded code --- so, NEXT is a RET instruction.

There is a mechanism named Xbyte. It might be useful to you.

When CALL is done, is a 32-bit address pushed onto the return-stack? So RET expects a 32-bit number, although it only uses the lower 20 bits for the address. Is this true?

All variants of CALLx and RETx only use lower 20 bits for address, bit31 for C flag, bit30 for Z flag. C and Z are the only condition code bits, so all condition codes are preserved across the call. Remaining bits are presumably set zero on call and ignored on return.

How many clock cycles does RET take?

4 clocks. Same for all branching ... within register space.

EDIT: added the proviso for execution speed.
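
For readers following the CALL/RET discussion, here is a minimal sketch in C of the return-word layout described above: address in the lower 20 bits, C in bit31, Z in bit30. The helper names are hypothetical, and the sketch assumes the remaining bits are zero, which the post only presumes.

```c
#include <stdint.h>
#include <stdbool.h>

/* Layout as described in the post: bits 19..0 = return address,
   bit 31 = C flag, bit 30 = Z flag. Bits 29..20 are ASSUMED zero. */
#define RET_ADDR_MASK 0x000FFFFFu
#define RET_C_BIT     (1u << 31)
#define RET_Z_BIT     (1u << 30)

/* Hypothetical helper: build the 32-bit word a CALL would push. */
uint32_t pack_return_word(uint32_t addr, bool c, bool z)
{
    uint32_t w = addr & RET_ADDR_MASK;
    if (c) w |= RET_C_BIT;
    if (z) w |= RET_Z_BIT;
    return w;
}

/* Hypothetical helpers: pull the fields back out of a stacked word. */
uint32_t return_address(uint32_t w) { return w & RET_ADDR_MASK; }
bool     saved_c(uint32_t w)        { return (w & RET_C_BIT) != 0; }
bool     saved_z(uint32_t w)        { return (w & RET_Z_BIT) != 0; }
```

With this layout, a return to address $12345 with C set and Z clear would be stacked as $80012345.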

TonyB_ · 2018-10-15 12:54

jmg wrote: »

HughAguilar wrote: »

When CALL is done, is a 32-bit address pushed onto the return-stack? So RET expects a 32-bit number, although it only uses the lower 20 bits for the address. Is this true?

IIRC the stack is only as wide as it needs to be, and I think the non-address bits are ignored

The stack width was increased to 32-bit some time ago.

HughAguilar · 2018-10-16 05:01

evanh wrote: »

HughAguilar wrote: »

In RTC you use the return-stack pointer as your Forth IP register --- it points to the threaded code --- so, NEXT is a RET instruction.

There is a mechanism named Xbyte. It might be useful to you.

When CALL is done, is a 32-bit address pushed onto the return-stack? So RET expects a 32-bit number, although it only uses the lower 20 bits for the address. Is this true?

All variants of CALLx and RETx only use lower 20 bits for address, bit31 for C flag, bit30 for Z flag. C and Z are the only condition code bits, so all condition codes are preserved across the call. Remaining bits are presumably set zero on call and ignored on return.

How many clock cycles does RET take?

4 clocks. Same for all branching ... within register space.

EDIT: added the proviso for execution speed.

I looked up XBYTE and it says there are 6 clock cycles to call the primitive, plus the RET which is 2 clock cycles. This is a _RET_ prefix on a 2-cycle instruction.
This is presumably the 6 clock-cycles that Peter Jakacki mentioned, but he was not counting the RET.
Also, you said above that RET is 4 clock-cycles, not 2.

What do you mean "within register space"? Does this mean that the return-stack has to be within the 512 bytes of register space? If so, then that obviously kills RTC --- I need the return-stack pointer to cover the entire range of memory because I would be using it as the Forth IP.

Actually, I'm chilling out on the P2 as a platform for RTC. Your return-address is 32-bit, so my xt would have to be 32-bit --- this would result in hugely bloated code.
It may be better to just go with XBYTE like in Tachyon Forth --- that is what the ISA designer wanted.

evanh · 2018-10-16 12:36

HughAguilar wrote: »

I looked up XBYTE and it says there are 6 clock cycles to call the primitive, plus the RET which is 2 clock cycles. This is a _RET_ prefix on a 2-cycle instruction.
This is presumably the 6 clock-cycles that Peter Jakacki mentioned, but he was not counting the RET.
Also, you said above that RET is 4 clock-cycles, not 2.

That _RET_ is sort of a half instruction. It adds two clocks to whatever instruction it is combined with. It's cool but a real oddity. Oh, and I wasn't complete about the preservation of flags across calls either - they are only restored by RETx when specified. _RET_ does not restore the flags.

To be honest, I haven't tried to get my head around xbyte at all. I can't help any further on that one sorry.

What do you mean "within register space"? Does this mean that the return-stack has to be within the 512 bytes of register space? If so, then that obviously kills RTC --- I need the return-stack pointer to cover the entire range of memory because I would be using it as the Forth IP.

No, that is in reference to execution flow - where the program counter is pointing. Instructions can be fetched from registers (cogram) and also another area called lutram. Both these ram blocks are local to each propeller core (cog).

Things aren't quite as smooth when executing from the shared main memory (hubram). There is a sort of prefetching buffer, called the fifo, that allows instructions to flow at full speed ... until a branch is hit. Then a refill is needed, which stalls for 9-17 clocks. It is similar for most hub reads. Hub writes are faster, at 3-11 clocks. There are a couple of burst-type mechanisms for faster data handling.

Stacking to hubram is easy; there are a couple of special referencing registers, PTRA/PTRB, for this.

garryj · 2019-02-23 20:19

Not exactly sure when it started happening, but p2asm v0.009 (mingw Windows .exe) is no longer setting ERRORLEVEL > zero on a failed compile:

p2asm - an assembler for the propeller 2 - version 0.009, 2019-02-21

con
        _1ms = 80_000_000 / 1000
dat
org
        waitx   ##_ms * 25
        mov     foo, #1000
foo     long    0

I mostly do my p2asm compiles using Notepad++ with the NppExec plugin:

p2asm "test.spin2"
Process started (PID=4536) >>>
5: ERROR: Symbol _ms is undefined when evaluating an expression
        waitx   ##_ms * 25
6: ERROR: Immediate value must be between 0 and 511
        mov     foo, #1000
<<< Process finished (PID=4536). (Exit code 0)
================ READY ================

Dave Hein · 2019-02-23 20:41

p2asm returns a value of 1 when it fails to open a file, encounters a bad command-line parameter or the symbol table is full. I think it's always returned a value of 0 for all other errors. However, it would be useful to return a non-zero value for assembly errors. I'll go ahead and add that.

EDIT: I checked the change into GitHub. Also bumped the version number to 0.010.
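
A minimal sketch in C of the behaviour being added: count assembly errors as they are reported and turn the count into the process exit status. The names `error_count`, `report_error` and `exit_status` are hypothetical and are not taken from the p2asm source.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter, bumped once per reported assembly error. */
static int error_count = 0;

/* Print an error in the "line: ERROR: text" style shown in the
   output above, and remember that the assembly has failed. */
void report_error(int line, const char *msg)
{
    fprintf(stderr, "%d: ERROR: %s\n", line, msg);
    error_count++;
}

/* Value to return from main(): non-zero if any error was reported. */
int exit_status(void)
{
    return error_count > 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

A batch file can then test the result with `if errorlevel 1` and skip the load/run step when the compile failed.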

garryj · 2019-02-24 00:10

Dave Hein wrote: »

I think it's always returned a value of 0 for all other errors.

Ah. That could very well be, as I have been using Notepad++ and NppExec to do compiles only, and I never really paid any attention to the program exit value because the error text was always there. Yesterday I was working on a batch file to compile with an option to load/run using loadp2 and its terminal emulator, and that's when I noticed the zero exit value on compile error. Guess I'll chalk this one up to yet another "senior moment".

In any case, thanks very much for the fix!

RossH · 2019-03-25 02:35

There is an incompatibility between PNUT and P2ASM, when parsing instructions like

mov dest, #-(-1)

This should be interpreted as

mov dest, #1

I know it looks odd, but such statements are spat out by LCC, which is the code generator I use for Catalina, and there is no easy way to change it. PNUT handles it ok, as did all the old P1 assemblers, but P2ASM doesn't.

To fix P2ASM I have added the following code to the EvaluateExpression procedure, straight after the first CheckForEOL ...

    if (!strcmp(tokens[i], "-"))
    {
        if (CheckForEOL(++i, num))
        {
            *pindex = i;
            return 1;
        }
        if ((errnum = EvaluateExpression(prevprec, &i, tokens, num, &value, is_float))) return errnum;
        if (*is_float)
            value ^= 0x80000000;
        else
            value = -value;
        *pval = value;
        *pindex = i;
        return 0;
    }

Someone who is more familiar with P2ASM may want to check this is a correct fix.

Dave Hein · 2019-03-25 03:07

Ross, thanks for finding the issue. I'll take a look at it, and fix it.

pilot0315 · 2019-03-30 18:15

Would someone please translate these directives? I understand drvh, drvl, drvnot.
I am trying to figure out the rest and find which one is the ina.
Thanks.

DIRL/DIRH/DIRC/DIRNC/DIRZ/DIRNZ/DIRRND/DIRNOT {#}D - affect pin D bit in DIRx
OUTL/OUTH/OUTC/OUTNC/OUTZ/OUTNZ/OUTRND/OUTNOT {#}D - affect pin D bit in OUTx
FLTL/FLTH/FLTC/FLTNC/FLTZ/FLTNZ/FLTRND/FLTNOT {#}D

cgracey · 2019-03-30 18:45

Those instructions write the DIR and/or OUT bits. To read the IN bits, you can use TESTP/TESTPN.

pilot0315 · 2019-03-30 22:18

@cgracey
This is Chantal's buddy Martin in L.A.
I am attempting the following:
1. the charlieplexing as described by changing the pins to ina.

This lookup table shows the required state of the Propeller I/O pins to illuminate each LED. For example, to light the LED at Row 3 Col 6, the table shows 3 HIGH 0 LOW. In this case, we would set IO 3 HIGH and IO 0 LOW to light this LED. Set both IO pins to INPUT mode to turn the LED off.
This is from the description of the charlieplex accessory.

Eventually I want to write my own asynchronous serial routine for my GPS.
I am looking for examples of each of these if possible.
Thanks
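
A minimal sketch in C of the pin states the quoted charlieplex description asks for: drive one pin HIGH and one LOW to light an LED, and leave every pin as an input to turn it off. The `pin_state` type and function names are hypothetical, the masks stand in for the DIR and OUT bits of a single 32-pin port, and pin numbers are assumed to be below 32.

```c
#include <stdint.h>

/* Hypothetical pair of masks standing in for the DIR and OUT bits. */
typedef struct {
    uint32_t dir;   /* 1 = pin is driven, 0 = pin is floating (input) */
    uint32_t out;   /* level driven on pins whose dir bit is 1 */
} pin_state;

/* Light one LED: both pins are driven, only high_pin is driven high.
   Every other pin stays floating so the other LEDs remain dark. */
pin_state led_on(unsigned high_pin, unsigned low_pin)
{
    pin_state s;
    s.dir = (1u << high_pin) | (1u << low_pin);
    s.out = (1u << high_pin);
    return s;
}

/* All LEDs off: no pin is driven at all. */
pin_state led_off(void)
{
    pin_state s = { 0u, 0u };
    return s;
}
```

For the Row 3 Col 6 example (IO 3 HIGH, IO 0 LOW), `led_on(3, 0)` gives a DIR mask of %1001 and an OUT mask of %1000.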

RossH · 2019-04-05 02:18

I'm afraid my proposed fix (a few posts above) for the p2asm bug involving expressions like "-(-2)" causes yet another problem.

With my fix, p2asm incorrectly compiles expressions like "-28&$FF" as 0xffffffe4 instead of 0xe4 - i.e. it gets the precedence wrong.

I can see why, but I cannot seem to fix it - I guess I'll have to leave it to the p2asm experts to come up with a real fix!

Ross.
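
A minimal sketch in C of one way to keep unary minus from swallowing the rest of the expression: apply it to the next primary only, then let the usual precedence loop handle the binary operators. This is not the p2asm code. The function names and the precedence table are assumptions made for illustration, only integers are handled (no floats), and there is no error checking.

```c
#include <stdint.h>
#include <ctype.h>

static const char *p;                 /* cursor into the expression text */

static void skip_ws(void) { while (*p == ' ') p++; }

/* ASSUMED precedence table for this sketch; 0 means "not a binary op". */
static int binary_prec(char op)
{
    switch (op) {
    case '&':           return 3;
    case '*':           return 2;
    case '+': case '-': return 1;
    default:            return 0;
    }
}

static uint32_t parse_expr(int min_prec);

/* A primary is a number, a parenthesised expression, or a unary minus
   applied to another primary. The minus never reaches past the primary,
   so it binds tighter than every binary operator. */
static uint32_t parse_primary(void)
{
    uint32_t v = 0;
    skip_ws();
    if (*p == '-') { p++; return (uint32_t)0 - parse_primary(); }
    if (*p == '(') {
        p++;
        v = parse_expr(1);
        skip_ws();
        if (*p == ')') p++;
        return v;
    }
    if (*p == '$') {                  /* hexadecimal literal, e.g. $FF */
        p++;
        while (isxdigit((unsigned char)*p)) {
            int c = tolower((unsigned char)*p++);
            v = v * 16 + (uint32_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
        }
        return v;
    }
    while (isdigit((unsigned char)*p)) v = v * 10 + (uint32_t)(*p++ - '0');
    return v;
}

/* Precedence climbing: consume binary operators at or above min_prec. */
static uint32_t parse_expr(int min_prec)
{
    uint32_t lhs = parse_primary();
    for (;;) {
        skip_ws();
        char op = *p;
        int prec = binary_prec(op);
        if (prec == 0 || prec < min_prec) return lhs;
        p++;
        uint32_t rhs = parse_expr(prec + 1);   /* left-associative */
        switch (op) {
        case '&': lhs &= rhs; break;
        case '*': lhs *= rhs; break;
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        }
    }
}

/* Entry point: evaluate a whole expression string. */
uint32_t eval(const char *s) { p = s; return parse_expr(1); }
```

With this structure, `eval("-28&$FF")` negates 28 before the AND and yields $E4, while `eval("-(-1)")` still yields 1.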

evanh · 2019-04-05 06:30

pilot0315 wrote: »

Set both IO pins to INPUT mode to turn the LED off.

All pins are inputs all the time. You can monitor the output by reading the same input. However, to stop output drive: DIRL can be used, FLTx can be used, and AND DIRA can be used. There are other ways too.

Dave Hein · 2019-04-06 01:56

Ross, I fixed the handling of the unary "-" operator in p2asm.c, and pushed the fix to GitHub.

RossH · 2019-04-07 07:42

Dave Hein wrote: »

Ross, I fixed the handling of the unary "-" operator in p2asm.c, and pushed the fix to GitHub.

Thanks, Dave. Unfortunately, I've just found another problem.

Compiling these hub to cog "calld" instructions ...

DAT
        org 0

COG_0    nop 
COG_1    nop 
COG_2    nop 
COG_3    nop 
COG_4    nop 
COG_5    nop 
COG_6    nop 
COG_7    nop 
COG_8    nop 
COG_9    nop 
COG_a    nop 
COG_b    nop 
COG_c    nop 
COG_d    nop 
COG_e    nop 
COG_f    nop 
COG_10   nop 

        orgh $1000

  calld PA, #COG_0
  calld PA, #COG_1
  calld PA, #COG_2
  calld PA, #COG_3
  calld PA, #COG_4
  calld PA, #COG_5
  calld PA, #COG_6
  calld PA, #COG_7
  calld PA, #COG_8
  calld PA, #COG_9
  calld PA, #COG_a
  calld PA, #COG_b
  calld PA, #COG_c
  calld PA, #COG_d
  calld PA, #COG_e
  calld PA, #COG_f
  calld PA, #COG_10

Gives this result ...

'#line 1 "C:\Users\User\AppData\Local\Temp\absolute.spin2"
                   DAT
00000 000                  org 0
                   
00000 000 00000000 COG_0    nop
00004 001 00000000 COG_1    nop
00008 002 00000000 COG_2    nop
0000c 003 00000000 COG_3    nop
00010 004 00000000 COG_4    nop
00014 005 00000000 COG_5    nop
00018 006 00000000 COG_6    nop
0001c 007 00000000 COG_7    nop
00020 008 00000000 COG_8    nop
00024 009 00000000 COG_9    nop
00028 00a 00000000 COG_a    nop
0002c 00b 00000000 COG_b    nop
00030 00c 00000000 COG_c    nop
00034 00d 00000000 COG_d    nop
00038 00e 00000000 COG_e    nop
0003c 00f 00000000 COG_f    nop
00040 010 00000000 COG_10   nop
                   
01000                      orgh $1000
                   
01000     fb27edff   calld PA, #COG_0
01004     fe000001   calld PA, #COG_1
01008     fe000002   calld PA, #COG_2
0100c     fe000003   calld PA, #COG_3
01010     fb27edfc   calld PA, #COG_4
01014     fe000005   calld PA, #COG_5
01018     fe000006   calld PA, #COG_6
0101c     fe000007   calld PA, #COG_7
01020     fb27edf9   calld PA, #COG_8
01024     fe000009   calld PA, #COG_9
01028     fe00000a   calld PA, #COG_a
0102c     fe00000b   calld PA, #COG_b
01030     fb27edf6   calld PA, #COG_c
01034     fe00000d   calld PA, #COG_d
01038     fe00000e   calld PA, #COG_e
0103c     fe00000f   calld PA, #COG_f
01040     fb27edf3   calld PA, #COG_10

These calls should all use absolute addressing, but if the cog address is divisible by 4, it gets it wrong.

Dave Hein · 2019-04-08 02:16

I fixed the problem with cog addresses that are divisible by 4. It was caused by a kludge that I used to signal whether to use relative or absolute addresses.

I believe I uncovered a problem with PNut when generating the CALLD instruction in cog memory with a cog address. It doesn't seem to be generating the 9-bit relative address correctly. The source field should be handled the same way as DJNZ handles it. When PNut assembles the following program

dat
        org
COG_0   nop
COG_1   nop
        calld PA, #COG_0
        calld PA, #COG_1
        calld PA, #COG_2
        calld PA, #COG_3
COG_2   nop
COG_3   nop
COG_4   nop
COG_5   nop
        djnz  PA, #COG_4
        djnz  PA, #COG_5
        djnz  PA, #COG_6
        djnz  PA, #COG_7
COG_6   nop
COG_7   nop

the binary disassembles to this.

0000 00000000              nop     
0004 00000000              nop     
0008 fb27edf7              long    $fb27edf7
000c fb27edf4              long    $fb27edf4
0010 fb27edf5              long    $fb27edf5
0014 fb27edf2              long    $fb27edf2
0018 00000000              nop     
001c 00000000              nop     
0020 00000000              nop     
0024 00000000              nop     
0028 fb6fedfd              djnz    $1f6, #$-2
002c fb6fedfd              djnz    $1f6, #$-2
0030 fb6fec01              djnz    $1f6, #$+2
0034 fb6fec01              djnz    $1f6, #$+2
0038 00000000              nop     
003c 00000000              nop

The source fields for the CALLD instructions should match those in the DJNZ instructions.

ersmith · 2019-04-08 10:41

I read the documentation the same way as you did, Dave, and fastspin produces the same offsets for djnz and calld in the example above. @cgracey, is this a PNut bug, or does the hardware really interpret the offsets differently for calld and djnz/tjnz/etc.?

RossH · 2019-04-08 10:59

Dave Hein wrote: »

I fixed the problem with cog addresses that are divisible by 4. It was caused by a kludge that I used to signal whether to use relative or absolute addresses.

Thanks, Dave!

cgracey · 2019-04-09 23:41

It took a long time for me to figure out what was going on here. The problem was that the assembler's internal cog ORG tracks byte locations, not longs. So, I wasn't shifting the current cog ORG value right by 2 before doing the address subtraction. All fixed now. I will get a new assembler out sometime soon, but for now, just be aware of this bug. My compiler/assembler is undergoing a lot of work at this time and isn't that amenable to release just now.
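
A minimal sketch in C of the arithmetic behind this bug, assuming the 9-bit source field is the target cog address minus the address of the following instruction, as the DJNZ encodings in the listing suggest. The function names are hypothetical. The buggy variant leaves the current location as a byte offset, which reproduces the CALLD values in the PNut disassembly.

```c
#include <stdint.h>

/* 9-bit relative source field: target minus the address of the NEXT
   instruction, with both measured in longs (cog addresses). */
uint32_t rel9(uint32_t target_long, uint32_t here_long)
{
    return (target_long - (here_long + 1u)) & 0x1FFu;
}

/* The bug as described: the current location is a byte offset that
   was not shifted right by 2 before the subtraction. */
uint32_t rel9_buggy(uint32_t target_long, uint32_t here_byte)
{
    return (target_long - (here_byte + 1u)) & 0x1FFu;
}
```

For the first CALLD (byte offset $08, cog address 2, target COG_0), the buggy form gives $1F7, matching the low bits of $fb27edf7, while the shifted form gives $1FD, the same field as the DJNZ at $28.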

evanh · 2019-04-10 01:12

Sounds like the same as the ... Err, maybe not. Might be a good time to look at the LOC bug - https://forums.parallax.com/discussion/comment/1457051/#Comment_1457051 The issue occurs when, from cogexec, relative referencing hubram below $400. Also, maybe relative cogram referencing from cogexec.

Interestingly, this can't affect branching as I first thought, so it is maybe a LOC-exclusive issue.

evanh · 2019-04-10 03:05

Here's a refreshed example. It waits for a keypress from the terminal; hopefully this is helpful to those who need time to start up a terminal program.
PS: Tested with Fastspin v3.9.21 and Pnut v32i and P2asm v0.012.

EDIT: Updated with dereferenced content displayed
EDIT2: Updated with program counter also displayed
EDIT3: Shift PC display to after adjustment
EDIT4: Make it loop on every keypress
EDIT5: Added lutram tests

RossH · 2019-04-10 11:54

Hi Dave

Found another problem - this time with alignl or alignw used in cog code. PNUT disallows this, whereas p2asm allows it, but gets it wrong.

To see the problem, try compiling the following code, with and without the "alignl", and look at the cog addresses generated ...

DAT 
        org  0

cog0    mov  cog1,cog2

        alignl

cog1    long 0
cog2    long 0

cgracey · 2019-04-10 17:23

RossH wrote: »
Hi Dave

Found another problem - this time with alignl or alignw used in cog code. PNUT disallows this, whereas p2asm allows it, but gets it wrong.

To see the problem, try compiling the following code, with and without the "alignl", and look at the cog addresses generated ...
DAT 
        org  0

cog0    mov  cog1,cog2

        alignl

cog1    long 0
cog2    long 0

In PNut, ALIGNL/ALIGNW both align to hub memory, which cog ORG code may not be aligned to.

If you want your cog code to be long-aligned to hub memory, you'll need to do an ALIGNL before the ORG.

Dave Hein · 2019-04-10 17:57

Yet PNut requires that longs must be aligned relative to the starting hub address. The following code fails under PNut.

  org 0
  byte 123
  long 456

So bytes, words and longs cannot be compacted within an ORG section. Also, PNut will silently align PASM instructions relative to the starting hub address.

Originally, p2asm forced long alignment when processing the ORG directive, but this was changed to match PNut many months ago. p2asm currently doesn't implicitly align longs or instructions, but I may change this to match PNut.

Dave Hein · 2019-04-10 21:45

I fixed the cog addresses for the alignl, alignw and .balign directives. I allow the use of the alignment directives in long-aligned ORG segments, but I print an error if they are used in unaligned COG segments.

I did not change the alignment handling for words, longs and instructions to match PNut. I'm still thinking about whether this needs to be done.

evanh · 2019-04-10 21:50

Anyone looking at the LOC bug?
EDIT: I've just updated it.

cgracey · 2019-04-10 22:00

Dave Hein wrote: »
Yet PNut requires that longs must be aligned relative to the starting hub address. The following code fails under PNut.
  org 0
  byte 123
  long 456
So bytes, words and longs cannot be compacted within an ORG section. Also, PNut will silently align PASM instructions relative to the starting hub address.

Originally, p2asm forced long alignment when processing the ORG directive, but this was changed to match PNut many months ago. p2asm currently doesn't implicitly align longs or instructions, but I may change this to match PNut.

It looks like it works, to me, in PNut.

This code:

dat	org	0

	byte	123
	long	456

Produces:

00000- 7B C8 01 00 00 00 00 00

This is my working version of PNut, which has been undergoing changes, but I don't think this behavior has changed. The object does get padded to the next long alignment, so that when objects are appended, each is long-aligned.

Dave Hein · 2019-04-10 22:20

Sorry, I didn't actually try to assemble that particular code snippet. The error occurs when there is a label that is not long-aligned, such as with this code.

dat	org	0

	byte	123
cog0	long	456

So now I understand that the error concerns the label, and not the long value. That makes sense to me.

I also now realize that the long-alignment is relative to the start of the ORG section. As an example, if the two least significant hub address bits at the beginning of the COG section are 01, then the hub address of any labels within the ORG section must also end in 01. Tricky!

Personally, I don't see much value in unaligned COG sections, but the chip can handle it, so I guess it should be allowed.
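
A minimal sketch in C of the rule as I read it from the post above: a label inside an ORG section is acceptable when its hub address is a whole number of longs away from the hub address where the section starts. The function and parameter names are hypothetical, and this is not taken from the PNut or p2asm source.

```c
#include <stdint.h>
#include <stdbool.h>

/* A label is long-aligned relative to its ORG section when the low two
   bits of its hub address match those of the section's start address. */
bool label_is_long_aligned(uint32_t org_start_hub, uint32_t label_hub)
{
    return ((label_hub - org_start_hub) & 3u) == 0;
}

/* Cog (long) address of the label; only meaningful when the check
   above passes. org_value is the operand given to ORG. */
uint32_t cog_address(uint32_t org_value, uint32_t org_start_hub,
                     uint32_t label_hub)
{
    return org_value + ((label_hub - org_start_hub) >> 2);
}
```

In the failing snippet, the section starts at hub offset 0 and `cog0` follows a single byte, so it sits at offset 1 and the check fails. If the section itself started at an address ending in 01, a label at an address ending in 01 would pass.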

jmg · 2019-04-10 22:20

RossH wrote: »
Hi Dave

Found another problem - this time with alignl or alignw used in cog code. PNUT disallows this, whereas p2asm allows it, but gets it wrong.

To see the problem, try compiling the following code, with and without the "alignl", and look at the cog addresses generated ...
DAT 
        org  0

cog0    mov  cog1,cog2

        alignl

cog1    long 0
cog2    long 0

In other assemblers I use, placing ALIGN N in the code rounds up the location base in that segment to the next whole-N base.
e.g. ALIGN 4 would shift an address that ended in 0b01, 0b10 or 0b11, but would not need to change an address that ends in 0b00.
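
A minimal sketch in C of that round-up behaviour, assuming N is a power of two. The function name is hypothetical.

```c
#include <stdint.h>

/* Round addr up to the next multiple of n. Assumes n is a power of two,
   so (n - 1) is a mask of the low bits that must end up clear. */
uint32_t align_up(uint32_t addr, uint32_t n)
{
    return (addr + (n - 1u)) & ~(n - 1u);
}
```

With n = 4, addresses ending in 0b01, 0b10 or 0b11 all move up to the next multiple of 4, and an address ending in 0b00 is returned unchanged.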
