In RTC you use the return-stack pointer as your Forth IP register --- it points to the threaded code --- so, NEXT is a RET instruction.
There is a mechanism named Xbyte. It might be useful to you.
When CALL is done, is a 32-bit address pushed onto the return-stack? So RET expects a 32-bit number, although it only uses the lower 20 bits for the address. Is this true?
All variants of CALLx and RETx only use the lower 20 bits for the address, bit31 for the C flag, and bit30 for the Z flag. C and Z are the only condition code bits, so all condition codes are preserved across the call. The remaining bits are presumably set to zero on call and ignored on return.
How many clock cycles does RET take?
4 clocks. Same for all branching ... within register space.
When CALL is done, is a 32-bit address pushed onto the return-stack? So RET expects a 32-bit number, although it only uses the lower 20 bits for the address. Is this true?
IIRC the stack is only as wide as it needs to be, and I think the non-address bits are ignored
The stack width was increased to 32-bit some time ago.
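For anyone wanting to see the layout discussed above in concrete terms, here is a minimal C sketch of the 32-bit value pushed by CALLx: address in the lower 20 bits, Z in bit30, C in bit31. The function and macro names are made up for illustration, and the sketch assumes the remaining bits are zero, which the thread only describes as "presumably" the case.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout as described in the thread; names are hypothetical. */
#define RET_ADDR_MASK 0x000FFFFFu   /* lower 20 bits: return address */
#define RET_Z_BIT     (1u << 30)    /* bit30: Z flag */
#define RET_C_BIT     (1u << 31)    /* bit31: C flag */

/* Build the value a CALLx would push (unused bits assumed zero). */
static uint32_t pack_return(uint32_t addr, int c, int z)
{
    assert(addr <= RET_ADDR_MASK);  /* address must fit in 20 bits */
    return addr | (c ? RET_C_BIT : 0u) | (z ? RET_Z_BIT : 0u);
}

/* Extract the fields a RETx would look at. */
static uint32_t return_addr(uint32_t v) { return v & RET_ADDR_MASK; }
static int      return_c(uint32_t v)    { return (int)((v >> 31) & 1u); }
static int      return_z(uint32_t v)    { return (int)((v >> 30) & 1u); }
```

So a call from address $12345 with C set and Z clear would push $80012345 under these assumptions. This is also why a 32-bit cell per return address is needed even though only 20 bits address memory.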
EDIT: added the proviso for execution speed.
I looked up XBYTE and it says there are 6 clock cycles to call the primitive, plus the RET which is 2 clock cycles. This is a _RET prefix on a 2-cycle instruction.
This is presumably the 6 clock-cycles that Peter Jakacki mentioned, but he was not counting the RET.
Also, you said above that RET is 4 clock-cycles, not 2.
What do you mean "within register space"? Does this mean that the return-stack has to be within the 512 bytes of register space? If so, then that obviously kills RTC --- I need the return-stack pointer to cover the entire range of memory because I would be using it as the Forth IP.
Actually, I'm chilling out on the P2 as a platform for RTC. Your return-address is 32-bit, so my xt would have to be 32-bit --- this would result in hugely bloated code.
It may be better to just go with XBYTE like in Tachyon Forth --- that is what the ISA designer wanted.
I looked up XBYTE and it says there are 6 clock cycles to call the primitive, plus the RET which is 2 clock cycles. This is a _RET prefix on a 2-cycle instruction.
This is presumably the 6 clock-cycles that Peter Jakacki mentioned, but he was not counting the RET.
Also, you said above that RET is 4 clock-cycles, not 2.
That _RET_ is sort of a half instruction. It adds two clocks to whatever instruction it is combined with. It's cool but a real oddity. Oh, and I wasn't complete about the preservation of flags across calls either - they are only restored by RETx when specified. _RET_ does not restore the flags.
To be honest, I haven't tried to get my head around xbyte at all. I can't help any further on that one sorry.
What do you mean "within register space"? Does this mean that the return-stack has to be within the 512 bytes of register space? If so, then that obviously kills RTC --- I need the return-stack pointer to cover the entire range of memory because I would be using it as the Forth IP.
No, that is in reference to execution flow - where the program counter is pointing. Instructions can be fetched from registers (cogram) and also another area called lutram. Both these ram blocks are local to each propeller core (cog).
Things aren't quite as smooth when executing from the shared main memory (hubram). There is a sort of prefetching buffer, called the fifo, that allows instructions to flow at full speed ... until a branch is hit. Then a refill is needed. This stalls for 9-17 clocks. Most hub reads are similar. Hub writes are faster, 3-11 clocks. There are a couple of burst-type mechanisms for faster data handling.
Stacking to hubram is easy; there are a couple of special referencing registers, PTRA/PTRB, for this.
Not exactly sure when it started happening, but p2asm v0.009 (mingw Windows .exe) is no longer setting ERRORLEVEL > zero on a failed compile:
p2asm - an assembler for the propeller 2 - version 0.009, 2019-02-21
con
  _1ms = 80_000_000 / 1000
dat
        org
        waitx   ##_ms * 25
        mov     foo, #1000
foo     long    0
I mostly do my p2asm compiles using Notepad++ with the NppExec plugin:
p2asm "test.spin2"
Process started (PID=4536) >>>
5: ERROR: Symbol _ms is undefined when evaluating an expression
waitx ##_ms * 25
6: ERROR: Immediate value must be between 0 and 511
mov foo, #1000
<<< Process finished (PID=4536). (Exit code 0)
================ READY ================
p2asm returns a value of 1 when it fails to open a file, encounters a bad command-line parameter or the symbol table is full. I think it's always returned a value of 0 for all other errors. However, it would be useful to return a non-zero value for assembly errors. I'll go ahead and add that.
EDIT: I checked the change into GitHub. Also bumped the version number to 0.010.
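For illustration only, here is a minimal C sketch of the general idea: count assembly errors as they are reported and turn a non-zero count into a non-zero exit status. This is not taken from the p2asm source; `error_count`, `report_error` and `exit_status` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical global error counter, bumped on every reported error. */
static int error_count = 0;

/* Print an error in the "<line>: ERROR: <text>" style shown above. */
static void report_error(int line, const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "%d: ERROR: %s\n", line, msg);
    error_count++;
}

/* Value main() would return: non-zero if any error was reported. */
static int exit_status(void)
{
    return error_count > 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

With something like this in place, a batch file can test the result with `if errorlevel 1` after the assembler runs and skip the load/run step on failure.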
I think it's always returned a value of 0 for all other errors.
Ah. That could very well be, as I have been using Notepad++ and NppExec to do compiles only, and I never really paid any attention to the program exit value because the error text was always there. Yesterday I was working on a batch file to compile with an option to load/run using loadp2 and its terminal emulator, and that's when I noticed the zero exit value on compile error. Guess I'll chalk this one up to yet another "senior moment".
There is an incompatibility between PNUT and P2ASM, when parsing instructions like
mov dest, #-(-1)
This should be interpreted as
mov dest, #1
I know it looks odd, but such statements are spat out by LCC, which is the code generator I use for Catalina, and there is no easy way to change it. PNUT handles it ok, as did all the old P1 assemblers, but P2ASM doesn't.
To fix P2ASM I have added the following code to the EvaluateExpression procedure, straight after the first CheckForEOL ...
    if (!strcmp(tokens[i], "-"))
    {
        if (CheckForEOL(++i, num))
        {
            *pindex = i;
            return 1;
        }
        if ((errnum = EvaluateExpression(prevprec, &i, tokens, num, &value, is_float))) return errnum;
        if (*is_float)
            value ^= 0x80000000;
        else
            value = -value;
        *pval = value;
        *pindex = i;
        return 0;
    }
Someone who is more familiar with P2ASM may want to check this is a correct fix.
Would someone please translate these directives. I understand drvh, drvl, drvnot.
I am trying to figure out the rest and find which one is the ina.
Thanks.
DIRL/DIRH/DIRC/DIRNC/DIRZ/DIRNZ/DIRRND/DIRNOT {#}D - affect pin D bit in DIRx
OUTL/OUTH/OUTC/OUTNC/OUTZ/OUTNZ/OUTRND/OUTNOT {#}D - affect pin D bit in OUTx
FLTL/FLTH/FLTC/FLTNC/FLTZ/FLTNZ/FLTRND/FLTNOT {#}D
@cgracey
This is Chantal's buddy Martin in L.A.
I am attempting the following:
1. the charlieplexing as described by changing the pins to ina.
This lookup table shows the required state of the Propeller I/O pins to illuminate each LED. For example, to light the LED at Row 3 Col 6, the table shows 3 HIGH 0 LOW. In this case, we would set IO 3 HIGH and IO 0 LOW to light this LED. Set both IO pins to INPUT mode to turn the LED off.
This is from the description of the charlieplex accessory.
Eventually wanting to write my own asynchronous serial routine for my gps.
I am looking for examples of each of these if possible.
Thanks
Set both IO pins to INPUT mode to turn the LED off.
All pins are inputs all the time. You can monitor the output by reading the same pin's input. However, to stop output drive, DIRL can be used, FLTx can be used, and AND on DIRA can be used. There are other ways too.
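To make the charlieplexing recipe concrete, here is a small C model (not PASM) of what happens to the DIR and OUT bits. It treats one port as two plain 32-bit words and assumes that DRVH/DRVL set the DIR bit while writing OUT high/low, and that FLTL clears both the DIR and OUT bits; all names are hypothetical and the instruction semantics should be checked against the P2 documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one 32-pin port: direction and output registers. */
typedef struct {
    uint32_t dir;   /* 1 = pin is driven, 0 = pin floats (input only) */
    uint32_t out;   /* output level used when the pin is driven */
} PinRegs;

/* Assumed DRVH behaviour: drive the pin high. */
static void drv_high(PinRegs *r, unsigned pin)
{
    assert(pin < 32);
    r->dir |= 1u << pin;
    r->out |= 1u << pin;
}

/* Assumed DRVL behaviour: drive the pin low. */
static void drv_low(PinRegs *r, unsigned pin)
{
    assert(pin < 32);
    r->dir |= 1u << pin;
    r->out &= ~(1u << pin);
}

/* Light one charlieplexed LED: float every pin in the group,
   then drive exactly one pin high and one pin low. */
static void charlieplex_light(PinRegs *r, uint32_t group_mask,
                              unsigned high_pin, unsigned low_pin)
{
    r->dir &= ~group_mask;   /* all group pins back to input */
    r->out &= ~group_mask;
    drv_high(r, high_pin);
    drv_low(r, low_pin);
}
```

For the "3 HIGH 0 LOW" entry quoted from the accessory description, `charlieplex_light(&regs, 0xFF, 3, 0)` leaves only pins 3 and 0 driven. Turning the LED off is just the first two lines of that function: clear the group's DIR bits.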
I fixed the problem with cog addresses that are divisible by 4. It was caused by a kludge that I used to signal whether to use relative or absolute addresses.
I believe I uncovered a problem with PNut when generating the CALLD instruction in cog memory with a cog address. It doesn't seem to be generating the 9-bit relative address correctly. The source field should be handled the same way as DJNZ handles it. When PNut assembles the following program
I read the documentation the same way as you did, Dave, and fastspin produces the same offsets for djnz and calld in the example above. @cgracey, is this a PNut bug, or does the hardware really interpret the offsets differently for calld and djnz/tjnz/etc.?
I fixed the problem with cog addresses that are divisible by 4. It was caused by a kludge that I used to signal whether to use relative or absolute addresses.
It took a long time for me to figure out what was going on here. The problem was that the assembler's internal cog ORG tracks byte locations, not longs. So, I wasn't shifting the current cog ORG value right by 2 before doing the address subtraction. All fixed now. I will get a new assembler out sometime soon, but for now, just be aware of this bug. My compiler/assembler is undergoing a lot of work at this time and isn't that amenable to release just now.
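A rough C sketch of the calculation described above may help. It assumes the assembler keeps the current cog ORG in bytes, the target as a cog long address, and that the 9-bit relative offset is measured from the instruction after the branch; the function name is made up and this is not PNut's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Compute a 9-bit relative source field for a cog-to-cog branch.
   cur_org_bytes: location of the branch, tracked in bytes.
   target_long:   cog address of the target, in longs. */
static uint32_t cog_rel9(uint32_t cur_org_bytes, uint32_t target_long)
{
    assert((cur_org_bytes & 3u) == 0u);      /* instruction is long-aligned */
    uint32_t cur_long = cur_org_bytes >> 2;  /* the shift that was missing */
    /* Offset relative to the next instruction, wrapped to 9 bits. */
    return (target_long - (cur_long + 1u)) & 0x1FFu;
}
```

Under these assumptions, a branch sitting at byte offset 16 (cog long 4) that targets cog long 0 gets the field $1FB, i.e. -5. Leaving out the shift would subtract 17 instead of 5, which matches the kind of wrong offset described.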
Here's a refreshed example. It waits for a keypress from the terminal; hopefully this is helpful to those who need time to start up a terminal program.
PS: Tested with Fastspin v3.9.21 and Pnut v32i and P2asm v0.012.
EDIT: Updated with dereferenced content displayed
EDIT2: Updated with program counter also displayed
EDIT3: Shift PC display to after adjustment
EDIT4: Make it loop on every keypress
EDIT5: Added lutram tests
Yet PNut requires that longs must be aligned relative to the starting hub address. The following code fails under PNut.
        org     0
        byte    123
        long    456
So bytes, words and longs cannot be compacted within an ORG section. Also, PNut will silently align PASM instructions relative to the starting hub address.
Originally, p2asm forced long alignment when processing the ORG directive, but this was changed to match PNut many months ago. p2asm currently doesn't implicitly align longs or instructions, but I may change this to match PNut.
I fixed the cog addresses for the alignl, alignw and .balign directives. I allow the use of the alignment directives in long-aligned ORG segments, but I print an error if they are used in unaligned COG segments.
I did not change the alignment handling for words, longs and instructions to match PNut. I'm still thinking about whether this needs to be done.
It looks like it works, to me, in PNut.
This code:
dat     org     0
        byte    123
        long    456
Produces:
00000- 7B C8 01 00 00 00 00 00
This is my working version of PNut, which has been undergoing changes, but I don't think this behavior has changed. The object does get padded to the next long alignment, so that when objects are appended, each is long-aligned.
Sorry, I didn't actually try to assemble that particular code snippet. The error occurs when there is a label that is not long-aligned, such as with this code.
dat     org     0
        byte    123
cog0    long    456
So now I understand that the error concerns the label, and not the long value. That makes sense to me.
I also now realize that the long-alignment is relative to the start of the ORG section. As an example, if the two least significant hub address bits at the beginning of the COG section are 01, then the hub address of any labels within the ORG section must also end in 01. Tricky!
Personally, I don't see much value in unaligned COG sections, but the chip can handle it, so I guess it should be allowed.
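The rule as I read it from the posts above can be written as a short C check: a label is acceptable only if its distance from the start of the ORG section is a multiple of 4, regardless of the absolute hub address. The names are hypothetical and this is a sketch of the described behaviour, not PNut's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* True if a label's hub address is long-aligned relative to the
   hub address where its ORG section starts. */
static int label_is_cog_aligned(uint32_t org_hub_start, uint32_t label_hub_addr)
{
    assert(label_hub_addr >= org_hub_start);
    return ((label_hub_addr - org_hub_start) & 3u) == 0u;
}

/* Cog (long) address of an aligned label, counted from ORG 0. */
static uint32_t label_cog_addr(uint32_t org_hub_start, uint32_t label_hub_addr)
{
    assert(label_is_cog_aligned(org_hub_start, label_hub_addr));
    return (label_hub_addr - org_hub_start) >> 2;
}
```

In the failing snippet, the section starts at hub offset 0 and `cog0` lands at offset 1 after the single byte, so the check fails. With a section starting at an address ending in 01, labels at addresses ending in 01 pass, as described.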
Found another problem - this time with alignl or alignw used in cog code. PNUT disallows this, whereas p2asm allows it, but gets it wrong.
To see the problem, try compiling the following code, with and without the "alignl", and look at the cog addresses generated ...
DAT
        org     0
cog0    mov     cog1,cog2
        alignl
cog1    long    0
cog2    long    0
In other assemblers I use, placing ALIGN N in the code rounds up the location base in that segment to the next whole-N base.
E.g., ALIGN 4 would shift an address that ends in 0b01, 0b10 or 0b11, but would not need to change an address that ends in 0b00.
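The usual way to express that round-up for power-of-two alignments is shown in the C sketch below. `align_up` is a made-up helper name, not something from p2asm or PNut.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of n, where n is a power of two.
   Addresses that are already aligned are returned unchanged. */
static uint32_t align_up(uint32_t addr, uint32_t n)
{
    assert(n != 0u && (n & (n - 1u)) == 0u);  /* n must be a power of two */
    return (addr + n - 1u) & ~(n - 1u);
}
```

So `align_up(0x11, 4)`, `align_up(0x12, 4)` and `align_up(0x13, 4)` all give 0x14, while `align_up(0x10, 4)` stays at 0x10.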
Comments
In any case, thanks very much for the fix!
With my fix, p2asm incorrectly compiles expressions like "-248&$FF" as 0xffffffe4 instead of 0xe4 - i.e. it gets the precedence wrong.
I can see why, but I cannot seem to fix it - I guess I'll have to leave it to the p2asm experts to come up with a real fix!
Ross.
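To show the difference between the two readings, here is a small C sketch comparing a unary minus that binds only to the first operand with one that swallows the rest of the expression. I have used the operand 28 because that is the value that reproduces the two results quoted above (0xe4 and 0xffffffe4); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Intended reading: unary minus binds tighter than '&', i.e. (-a) & mask. */
static uint32_t negate_then_and(uint32_t a, uint32_t mask)
{
    return (0u - a) & mask;
}

/* What the quick fix effectively computes: -(a & mask), because the
   recursive call evaluates everything to the right before negating. */
static uint32_t and_then_negate(uint32_t a, uint32_t mask)
{
    return 0u - (a & mask);
}

/* Worked example with 32-bit wraparound arithmetic. */
static void check_example(void)
{
    assert(negate_then_and(28u, 0xFFu) == 0xE4u);
    assert(and_then_negate(28u, 0xFFu) == 0xFFFFFFE4u);
}
```

One plausible direction for a real fix is to have the recursive call parse only the single operand that follows the minus (a high precedence level) instead of passing the caller's precedence through, but that depends on how EvaluateExpression is structured.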
Thanks, Dave. Unfortunately, I've just found another problem.
Compiling these hub to cog "calld" instructions ...
Give this result ...
These calls should all use absolute addressing, but if the cog address is divisible by 4, it gets it wrong.
I believe I uncovered a problem with PNut when generating the CALLD instruction in cog memory with a cog address. It doesn't seem to be generating the 9-bit relative address correctly. The source field should be handled the same way as DJNZ handles it. When PNut assembles the following program, the binary disassembles to this. The source fields for the CALLD instructions should match those in the DJNZ instructions.
Thanks, Dave!
Interestingly, this can't affect branching as I first thought, so it is maybe a LOC-exclusive issue.
Found another problem - this time with alignl or alignw used in cog code. PNUT disallows this, whereas p2asm allows it, but gets it wrong.
To see the problem, try compiling the following code, with and without the "alignl", and look at the cog addresses generated ...
In PNut, ALIGNL/ALIGNW both align to hub memory, which cog ORG code may not be aligned to.
If you want your cog code to be long-aligned to hub memory, you'll need to do an ALIGNL before the ORG.
I did not change the alignment handling for words, longs and instructions to match PNut. I'm still thinking about whether this needs to be done.
EDIT: I've just updated it.