Testing rdbyte/word/long and wrbyte/word/long
**** EDITED - 24th Oct ****
I've decided to attach the source code to this post in case anyone is struggling with P2 FPGA. Not found the error yet. :frown:
I have volunteered myself to checking the rdbyte/word/long and wrbyte/word/long instructions. I have written a basic test harness which uses my DE0-Nano and Parallax Serial Terminal (PST), thanks to the serial test program uploaded on the 14th Oct by mindrobots.
I think there is a memory overwrite within the harness caused by my binary to ASCII decimal conversion routine, but I haven't found where it is happening. Making data areas larger or adding more code causes slightly different problems. I shall of course continue to look for the error. I would like to find the error before posting the code.
The rd/wr tests need to check every combination, which is a fair few with all pre/post decrement, indexing and scaling options. They also should be checked for execution in COG, LUT and HUB space.
I have written a couple of support functions for the test harness to assist with binary to decimal conversion, which I will post cut'n'pasted if anyone wishes to view or use them:
multiply32x32 -
multiplies two 32-bit values together for a 64-bit result. Two registers supply the two values and return two longs, a high long and a low long. This is useful when you require a multiplication of two longs where one has a value wholly in the low word of its long and the other has some data in the high word of its long, and where the result still fits in 32 bits.
divide32b32 -
divides a 32-bit value by a 32-bit value for a 32-bit quotient and a 32-bit remainder.
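For anyone who wants to check results from multiply32x32 on a PC, here is a minimal Python sketch of the contract described above (two 32-bit inputs, a high long and a low long returned). The function name is borrowed from the post, but the body is only a model that assumes unsigned operands; it is not the PASM2 routine.

```python
MASK32 = 0xFFFFFFFF

def multiply32x32(a, b):
    """Model of a 32x32 -> 64 bit unsigned multiply.

    Returns (high_long, low_long), mirroring the two result registers
    described in the post. Operands are truncated to 32 bits first.
    """
    product = (a & MASK32) * (b & MASK32)
    return (product >> 32) & MASK32, product & MASK32

if __name__ == "__main__":
    high, low = multiply32x32(0x00001234, 0x00010000)
    print(f"high={high:08X} low={low:08X}")
```

When the high long comes back as zero the result still fits in 32 bits, which is the case the post calls out as useful.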
Comments
FYI. The P2 has a cordic engine that facilitates multiply 32x32 and 64/32 divide in hardware.
    CCCC 1101000 0LI DDDDDDDDD SSSSSSSSS    QMUL    D/#,S/#
    CCCC 1101000 1LI DDDDDDDDD SSSSSSSSS    QDIV    D/#,S/#
    CCCC 1101001 0LI DDDDDDDDD SSSSSSSSS    QFRAC   D/#,S/#
    CCCC 1101001 1LI DDDDDDDDD SSSSSSSSS    QSQRT   D/#,S/#
    CCCC 1101010 0LI DDDDDDDDD SSSSSSSSS    QROTATE D/#,S/#
    CCCC 1101010 1LI DDDDDDDDD SSSSSSSSS    QVECTOR D/#,S/#
    CCCC 1101011 CZ0 DDDDDDDDD 000011000    GETQX   D {WC,WZ}
    CCCC 1101011 CZ0 DDDDDDDDD 000011001    GETQY   D {WC,WZ}
Edit: Sorry, I just noticed you mentioned DE0-Nano. So ignore the above.
No cordic ... no problem.
I have yet to find the error which appears to move.
Thank you, Seairth, well spotted. I have changed my code but the problem is still there ... sometimes ... depending on what is where.
I wonder if Chip would support a new instruction ... FINDBUG D, S/#
Where D is the destination to receive the bug report.
Where S is the source code to check starting address.
Where # is:
%xx0 - simple report / 1= full report
%x0x - find this bug / 1 = find future bugs
%0xx - find bug in my code / 1 = find bug in others' code
Do you suspect there's a bug in your code, the chip, or you're not sure?
I'm not sure, as it could be either. I always suspect my code, though I can't see anything obvious. The problem appears when code or data are added or moved. It doesn't always happen, which makes me suspect a code overwrite somewhere.
I am running in HubExec and using @# for Hub references and #\ for Cog references.
I have made another change, looking into what the code looks like for passing parameters and return values on the stack. Looks nice and readable even with the limitations of PNut macro support.
                jmp     #over_my_data

CON
    STK_LOCAL_1             = 0
    STK_TOS_ON_ENTRY        = -1    'ptrb here
    STK_RET_ADDR            = -2
    STK_ARG_3_DIV_RESULT    = -3
    STK_ARG_2_DIVISOR       = -4
    STK_ARG_1_NUMERATOR     = -5
    STK_RESULT              = -6

DAT
pass_on_stack
                pusha   ptrb
                pusha   ptra
                popa    ptrb                ' now addressing stack
                ' make space for locals by adding to ptra,
                ' accessed by rdlong x,ptrb[ 1 ] etc
                pusha   r1
                pusha   r2
                ' divide takes values in registers, so we must load them
                ' instead of doing a pusha scratch
                rdlong  scratch, ptrb[ STK_ARG_1_NUMERATOR ]
                mov     r1, scratch
                rdlong  scratch, ptrb[ STK_ARG_2_DIVISOR ]
                mov     r2, scratch
                calla   #@divide32b32       ' returns Q:R in r2:r1, r1 / r2
                pusha   r1                  ' Q
                rdlong  scratch, ptrb[ STK_ARG_3_DIV_RESULT ]
                cmp     r1, scratch     wc, wz
    if_e        jmp     #.L10
                ' error - we may wish to return an error code or
                ' null value here
                wrlong  scratch, ptrb[ STK_RESULT ]
                loc     adrb, #@pass_on_stack_div_error
                calla   #@send_msg
                pusha   ptrb
                popa    adrb                ' address of result
                add     adrb, ##STK_ARG_3_DIV_RESULT * 4    ' longs
                calla   #@send_dec
                loc     adrb, #@newline
                calla   #@send_msg
                jmp     #.L20
.L10            ' result is correct
                wrlong  scratch, ptrb[ STK_RESULT ]
.L20
                popa    scratch             ' discard r1
                popa    r2
                popa    r1
                popa    ptrb
                reta
' pass_on_stack end

stack_args_result           byte "Result from passing arguments on the stack = ", 0
pass_on_stack_div_error     byte "Error dividing values passed on stack", 0

over_my_data
                pusha   ##0                     ' return value
                pusha   ##$76543210             ' arg 1
                pusha   ##$1cedcafe             ' arg 2
                pusha   ##$76543210/$1cedcafe   ' arg 3
                calla   #@pass_on_stack
                sub     ptra, #3 * 4            ' 3 args of longs
                popa    r8                      ' return value
                loc     adrb, #@stack_args_result
                calla   #@send_msg
                loc     adrb,#\r8
                calla   #@send_dec
                loc     adrb, #@newline
                calla   #@send_msg
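To make the STK_ indices easier to follow, here is a small Python model of the frame that the code above builds. It is only a sketch under stated assumptions: the stack is a list of longs that grows upward, calla pushes exactly one long as the return address, and the pusha ptra / popa ptrb pair leaves ptrb pointing at the next free slot. build_frame and slot are hypothetical helper names, not part of the harness.

```python
# Slot indices relative to ptrb, copied from the CON section of the post.
STK_TOS_ON_ENTRY = -1       # saved ptrb
STK_RET_ADDR = -2
STK_ARG_3_DIV_RESULT = -3
STK_ARG_2_DIVISOR = -4
STK_ARG_1_NUMERATOR = -5
STK_RESULT = -6

def build_frame(numerator, divisor, expected_quotient):
    """Model the pushes made by over_my_data and the entry of pass_on_stack."""
    stack = []                          # one list element per long
    stack.append(0)                     # pusha ##0      (return value slot)
    stack.append(numerator)             # pusha arg 1
    stack.append(divisor)               # pusha arg 2
    stack.append(expected_quotient)     # pusha arg 3
    stack.append("return address")      # assumed: calla pushes one long
    stack.append("caller ptrb")         # pusha ptrb
    ptrb = len(stack)                   # assumed effect of pusha ptra / popa ptrb
    return stack, ptrb

def slot(stack, ptrb, index):
    """Read the long that ptrb[index] would address in this model."""
    return stack[ptrb + index]

if __name__ == "__main__":
    stack, ptrb = build_frame(0x76543210, 0x1CEDCAFE, 0x76543210 // 0x1CEDCAFE)
    print(hex(slot(stack, ptrb, STK_ARG_1_NUMERATOR)))
    print(hex(slot(stack, ptrb, STK_ARG_2_DIVISOR)))
    print(slot(stack, ptrb, STK_ARG_3_DIV_RESULT))
```

In this model the three argument slots and the result slot sit directly below the return address, which matches the caller dropping 3 longs and then popping the return value.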
Now a problem which was there, in binary to ascii conversion, has gone away again. Perhaps it is based on a modulo of an address? It may be a case of do something completely different for an AHA! moment later, or tomorrow?
I thought I recalled someone mentioning similar before.
If it is a real problem, how can we:
a. Rule out software bug
b. Determine if it is a hardware bug
I can likely get my code into a state where the effect is happening, which I don't mind posting here. That would allow more eyes on the software problem. It would also mean Chip could use the code in some way to assist with a hardware assistance / debugging feature in the FPGA Verilog code and image.
Fortunately my code is running on the original boot single cog in HubExec mode, without interrupts. The serial receive routine uses edge detection. The debug interrupt vectors are correct; I dumped the memory to ensure they were there, even though the Nano is limited to 32K of Hub RAM.
If I can help with this I certainly will.
BTW - you need to send a character to the Nano from a serial terminal to kick it off, done to prevent losing messages.
I have the code arranged so it generates 2 "Decimal error" messages. Code attached below (whole file). Don't worry about the clock and spinners on the screen, they're meant to be there.
Locate the 'start' symbol in the source. When the long data above the start symbol is commented out, by uncommenting the block comment lines (hope that makes sense), the code then generates 5 "Decimal error" messages.
By also uncommenting byte definitions before the start, the number of errors changes. I've recorded the numbers I see as comments in the code for both commented block (assembled) and uncommented block (not assembled - purely comment) to see if you get the same.
'{ 'uncomment this line to generate 5 decimal errors - see few lines below
' this is 'trap' data to aid in identifying incorrect indexing
hub_long_after  long 16, 17, 18, 19, 20, 21, 22, 23     ' some data before the array
                long 24, 25, 26, 27, 28, 29, 30, 31
                long 32
'} 'uncomment this line to generate 5 decimal errors - see few lines above
in_hub          long 0
'-----------------------------------------------------------------------------------
                alignl
                ' aligned long              2 Decimal error reports, with hub_long_after assembled
'byte 1         ' long off by 1 byte        1 Decimal error report
'byte 2         ' long off by 2 byte        1 Decimal error report
'byte 3         ' long off by 3 byte        1 Decimal error report
'byte 4         ' long off by 0 byte        1 Decimal error report - long aligned again
'byte 5         ' long off by 1 byte        1 Decimal error report
'byte 6         ' long off by 2 byte        1 Decimal error report
'byte 7         ' long off by 3 byte        1 Decimal error report
'byte 8         ' long off by 0 byte        1 Decimal error report - long aligned again

                ' aligned long              5 Decimal error reports, with hub_long_after
                '                           commented out
'byte 1         ' long off by 1 byte        1 Decimal error report
'byte 2         ' long off by 2 byte        2 Decimal error report
'byte 3         ' long off by 3 byte        2 Decimal error report
'byte 4         ' long off by 0 byte        2 Decimal error report - long aligned again
'byte 5         ' long off by 1 byte        1 Decimal error report
'byte 6         ' long off by 2 byte        1 Decimal error report
'byte 7         ' long off by 3 byte        1 Decimal error report
'byte 8         ' long off by 0 byte        1 Decimal error report - long aligned again

' comment out all the hub_long_after data a few lines above and 5 Decimal
' errors are logged
start           mov     dira, ##$ffff
I look forward to hearing your comments.
In a P2 image in an FPGA in a foreign land, far, far away, there resided a small assembler program, fluent in its native language of PASM2. The program lived in a file on a disc which went round and round all day long...
A short way down from the start symbol, this code is executed. The send_dec routine is the start of a chain of calls. The value to be converted from binary to ASCII numeric is in register r0 in COG.
                mov     r0, ##SYS_CLK       ' system clock
                loc     adrb, #@sys_clocks
                calla   #@send_msg
                loc     adrb,#\r0
                calla   #@send_dec
                loc     adrb, #@newline
                calla   #@send_msg
In the send_dec routine, divide32b32 is called. r2 is the divisor; although it is called multiplier, it serves as both, because of where it originates.

.mult_is_valid
                loc     ptrb, #@digit_buf
.next_digit
                mov     r1, r0              ' r0 = value
                mov     r2, multiplier      ' r2 = 2nd term
                calla   #@divide32b32       ' 32b / 32b, returns Q:R in r1:r2
                calla   #@check_for_decimal_error

The check_for_decimal_error routine is small and checks that the value returned is between 0 and 9 inclusive. For some reason the send_hex doesn't output to the screen; perhaps it has been clobbered? The values you see on the screen for the entire converted value, 5íë24544, have had an ASCII '0' added to them, i.e. $30.

check_for_decimal_error
                cmp     r1, #9          wc, wz
    if_be       jmp     #.L888
                pusha   adrb
                loc     adrb, #@decimal_error
                calla   #@send_msg
                loc     adrb, #\r1
                calla   #@send_hex
                loc     adrb, #@newline
                calla   #@send_msg
                popa    adrb
.L888           reta
' check_for_decimal_error end
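As a PC-side reference for what the conversion should produce, here is a minimal Python sketch of a digit-by-digit binary to decimal conversion with the same 0 to 9 range check. The full send_dec routine is not shown in this post, so the scheme here (dividing by descending powers of ten and carrying the remainder forward) is an assumption, and to_decimal_digits is a hypothetical name.

```python
def to_decimal_digits(value):
    """Convert an unsigned 32-bit value to a decimal string, one digit per divide.

    Assumed scheme: divide by 1_000_000_000, 100_000_000, ... 1 and keep the
    remainder for the next step. Each quotient must be a single digit.
    """
    value &= 0xFFFFFFFF
    multiplier = 1_000_000_000          # largest power of ten below 2**32
    digits = []
    while multiplier:
        quotient, value = divmod(value, multiplier)
        if not 0 <= quotient <= 9:      # same test as check_for_decimal_error
            raise ValueError(f"Decimal error: {quotient:#x}")
        digits.append(chr(ord("0") + quotient))   # add ASCII '0', i.e. $30
        multiplier //= 10
    return "".join(digits).lstrip("0") or "0"

if __name__ == "__main__":
    print(to_decimal_digits(50_000_000))
```

With a correct divide the range check can never fire for a 32-bit input, so any "Decimal error" report points at the quotient being wrong rather than at the digit loop itself.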
It would be good to get to the bottom of this, but I think it possible that the registers used in the divide32b32 function, which are located in the single COG, are corrupted by some means. I'm not overwriting when doing the divide, so is it something else, or did I overwrite prior to its call? I suspect it is registers r1 and r2, and/or t0 through t7, but what?
divide32b32
                pusha   t0
                pusha   t1
                pusha   t2
                pusha   t3
                pusha   t4
                pusha   t5
                pusha   t6
                or      r2, r2          wz
    if_nz       jmp     #send_dbg_div_go
                loc     adrb, #@division_by_zero_error
                calla   #@send_msg
                jmp     #send_dbg_div_done
send_dbg_div_go
                mov     t1, #0              ' Q
                mov     t2, #0              ' R
                mov     t3, r1              ' N
                mov     t4, ##1 << 31       ' bit mask
                mov     t0, #32             ' num bits in N
                mov     t6, t0
                sub     t6, #1              ' range 31 to 0
.next_loop
                shl     t2, #1              ' R = R << 1
                mov     t5, t3              ' N
                and     t5, t4          wz  ' bit n set of N
    if_z        clrb    t2, #0              ' clr bit 0 of R
    if_nz       setb    t2, #0              ' set bit 0 of R
                cmp     t2, r2          wc, wz
    if_b        jmp     #.after
                sub     t2, r2              ' R <- R - D
                setb    t1, t6              ' bit 31 to 0 of Q
.after
                shr     t4, #1              ' bit mask
                sub     t6, #1              ' adjust bit number 31 to 0
                djnz    t0, #.next_loop
send_dbg_div_done
                mov     r1, t1              ' return quotient result in r1
                mov     r2, t2              ' return remainder result in r2
                popa    t6
                popa    t5
                popa    t4
                popa    t3
                popa    t2
                popa    t1
                popa    t0
                reta
' divide32b32 end
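The loop in divide32b32 is a bit-serial (restoring) division, and a Python model of the same steps can serve as a reference when comparing results from the FPGA. This is a sketch, not a translation of the whole routine: the division-by-zero path raises an exception instead of sending a message, and Python integers do not wrap at 32 bits, so the model does not reproduce any overflow of the shifted remainder that a 32-bit register would see for divisors above $8000_0000.

```python
def divide32b32(numerator, divisor):
    """Restoring division, one quotient bit per iteration (bit 31 down to 0).

    Returns (quotient, remainder), matching r1 and r2 on exit from the
    routine in the post.
    """
    numerator &= 0xFFFFFFFF
    divisor &= 0xFFFFFFFF
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0                        # t1 in the post
    remainder = 0                       # t2 in the post
    for bit in range(31, -1, -1):       # t6 counts 31 down to 0
        # R = R << 1, then copy bit n of N into bit 0 of R
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor        # R <- R - D
            quotient |= 1 << bit        # set bit n of Q
    return quotient, remainder

if __name__ == "__main__":
    q, r = divide32b32(0x76543210, 0x1CEDCAFE)
    print(f"Q={q} R={r:#010x}")
```

Feeding the same numerator and divisor to this model and to the FPGA should give identical Q and R if the registers are not being disturbed.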
Tune in on this channel, at this time tomorrow, for the same story...or perhaps not.
I wish everyone looking into this good fortune in bug hunting.
I have NOT tried it in COG or LUT, so yes, only HUB EXEC. I could try that. I think the main thing I need to do is change all the global calla #@fun to calla #fun or #\fun, and I think all the jumps are relative anyway. Oh, and obviously to load it into a COG and init it or branch.
It is too big to fit in COG. Then I'm caught up in the same problem of moving things around and commenting some out and the problem disappears for a while.
However, I created a copy of the divide function and appended _inCog to its name. Then I checked, with that version loaded in COG, that the HUB version still produced the same error, which it did. Then I changed the call to the _inCog version. The same error result occurs.
That function only uses registers in the COG RAM. Thus it looks to me as though the problem occurs regardless of HUB EXEC.
I do not know what the L and I bits in the WR instructions refer to. One obviously means immediate, and I suspect that would be the I bit, but what is the L? Is it used to indicate that the immediate field addresses hub locations $00 to $ff inclusive? If it is for a hub address, why does it not go the full 9-bit range to $1ff? An enquiring mind wishes to know. Plus I need to know so that I can encode the instructions on the fly, as there are so many combinations to check with all the optional indexing and pre/post increment/decrement, and therefore checking register values pre and post op.
The following is taken from the P2 documentation:
Immediate S values > $FF indicate indexed PTRx variants.
These labels when referenced as hub addresses fall below $400 (cog and lut addresses).
Erratic issues in some of my code were cured by dropping an ORGH $400 in to bump them into hub space.
Strange behaviour disappeared.
Might be worth looking at.
Ah, maybe I am misunderstanding this. I think you can access registers in Cog RAM from Hub Exec mode, without having to do a wrlong etc. to Hub RAM. Are you saying this isn't the case?
The use of #\ is to get the absolute address of Cog Ram as that is where the registers are located. The addresses returned are long addresses where adjacent longs are located by current + 1, not +4 as in Hub. They are ORG 0.
My Hub Exec code and data is ORGH $1000.
I think that is correct, if it were not, HubExec would be of limited use.
This type of confusion is why I think the Assembler needs a clean-up pass to remove 'hieroglyph tags'.
I think introducing specific memory areas ( called data segments in some assemblers) can make it clear to the assembler and user which Registers/Ram they are addressing, without needing tags.
Specific data segments also makes code more naturally relocatable, which will matter more as ASM mixes with HLL.
The issue seems to be hub operations addressing < $400
Consider the following code
dat
                orgh    0
                org

                setb    dirb,#62
                setb    outb,#62
wait4key        jp      #63,#wait4key
                loc     ptrb,#message
                call    #send_msg
here            jmp     #here

send_msg        rdbyte  ax,ptrb++       wz
    if_z        ret
                call    #send_byte
                jmp     #send_msg

send_byte       setb    ax,#8
                shl     ax,#1
                getcnt  bx
                rep     @sb_loop,#10
                addcnt  bx,bit_time
                testb   ax,#0           wz
                setbnz  outb,#62
                waitcnt
                shr     ax,#1
sb_loop         ret

bit_time        long    50_000_000 / 115_200
ax              long    0
bx              long    0

                orgh    $400
message         byte    "Ozpropdev was here!",0

Remove the ORGH $400 and it does not work.
I hope we're both correct on the matter of accessing registers in Cog RAM in this manner in Hub Exec.
Are you testing / developing on an FPGA board?
Too true, a good assembler would be useful for all this tinkering, but they take a while to develop. Certainly relocating data segments would be useful.
I find PNut delightful to use, and it's so fast at assembling and downloading to the FPGA that I'm not able to make out more than a few characters on the screen whilst it does its stuff. My only problem with PNut is when I move to the browser I press F11 to post a message.
and if it is smaller than ORGH $400, what happens ?
With no ORGH there, wouldn't that drop message into COG memory ?
Because opcodes presently have the '# forcing', there is no possible error checks on this.
No, it wouldn't work, because message would then be in Cog RAM and you'd need to do a long access to it and mask or shift to get to the individual bytes in the long.
However, my code in Hub, when accessing bytes in Cog RAM, writes a long from Cog into Hub RAM and then moves each byte out into a local array, via Cog RAM, by means of a RDBYTE loop. Then if the data is valid, it executes a WRBYTE to a RAM buffer in Hub. That all looks correct to me even if it sounds a bit long-winded. Look at send_msg to see what I mean for the copying from Cog, Lut or Hub to the message buffer.
So, I only access data as longs in Cogs from Hub, even though that data may be packed into a long.
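The mask-and-shift access to bytes packed in a long, mentioned above, can be pictured with a short Python sketch. It assumes the usual little-endian packing, so the byte at the lowest address is in bits 7..0 of the long; bytes_in_long is a hypothetical helper name used only for illustration.

```python
def bytes_in_long(value):
    """Split a 32-bit long into its four bytes, lowest-addressed byte first.

    Assumes little-endian packing: byte 0 is bits 7..0, byte 3 is bits 31..24.
    """
    value &= 0xFFFFFFFF
    return [(value >> (8 * index)) & 0xFF for index in range(4)]

if __name__ == "__main__":
    packed = 0x64636261                 # "abcd" packed into one long
    print(bytes(bytes_in_long(packed)).decode("ascii"))
```

This is the work that RDBYTE does for you in Hub RAM, and that has to be done by hand when the long lives in Cog or Lut RAM.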
I'm no nearer to finding the problem, though I did look to see if I could see something in my code. And yes, I use rdlut and wrlut to access Lut RAM.
I think my brain might melt
All attempts to set ORGH to addresses below $400 will fail.
From what I see all hub references must be $400+
LOC is meant to get code addresses, not data addresses. Replace "loc ptrb, #message" with "mov ptrb, ##@message" and it should work.
Code addresses below $400 are in cogram, while data addresses below $400 are in hubram.
My understanding is:
long aligned x 512, long access only, +1 addresses next long
$000 <= Cog RAM <= $1ff
$200 <= LUT RAM <= $3ff
byte aligned x lots, byte/word/long access, + 1 addresses next byte
The LOC instruction is for location and can only be used with special address registers, adra, adrb, ptra, ptrb. LOC is for loading these ptr registers. If you want to know the value contained in a pointer register, can you do
mov my_reg, ptrb
or is it safer to do
push my_reg pop ptrb
I think they both work as the ptr access and indexing and scaling are only used with RDBYTE, WRBYTE etc.
Read the end section of the instructions.txt in the FPGA download files.
For immediate-branch and LOC address operands, "#" is used before the address. In cases where there is an option between absolute and relative addressing, the assembler will choose absolute addressing when the branch crosses between cog and hub domains, or relative addressing when the branch stays in the same domain. Absolute addressing can be forced by following "#" with "\".
    DJZ/DJNZ/DJS/DJNS/TJZ/TJNZ/TJS/TJNS/JP/JNP - relative/indirect
    JMP/CALL/CALLA/CALLB/CALLD - absolute/relative/indirect
    LOC - absolute/relative
However, there is also this in the P2 Assembly Instructions, so which is right?
Label values are determined as follows:
● Labels defined in an ORGH section resolve to a hub address or offset (in bytes), regardless of whether the label is referenced in an ORGH or ORG section.
● Labels defined in an ORG section resolve to a cog address or offset (in longs), regardless of whether the label is referenced in an ORGH or ORG section.
● When the effective hub address or offset is needed for a label that is defined in an ORG section, the label may be preceded by a "@" to force resolution to a hub address or offset.
● Though it is possible to apply the "@" to labels defined in ORGH sections, it has no effect.
Expressions
● Expressions can contain numbers, labels, and nested expressions. The simplest expression is either a single number or label.
● An expression that begins with # or ## is known as an "immediate" value.
● For branching instructions, immediate values can be either "absolute" or "relative", depending on context.
● For non-branching instructions, immediate values are always "absolute".
● "Absolute immediate" interpretation can be forced by using "#\" or "##\".
● There is no operator for forcing a "relative immediate" interpretation.
● # indicates a 9-bit (short-form) or 20-bit (long-form) immediate value:
  ○ For short-form branch instructions, this is a 9-bit relative immediate.
  ○ For long-form branch instructions that change execution mode (cog <-> hub), this is a 20-bit absolute immediate.
  ○ For long-form branch instructions that do not change execution mode, this is a 20-bit relative immediate.
  ○ For all other instructions, this is a 9-bit absolute immediate.
  ○ In circumstances where an absolute immediate must be forced, the expression is prefaced with "#\".
● ## indicates a 32-bit immediate value
  ○ An implicit AUGx will precede the instruction containing the expression.
  ○ The lower 9 bits will be encoded in the instruction and the upper 23 bits will be encoded in the AUGx.
  ○ For short-form branch instructions, this is a 20-bit relative immediate. The upper 12 bits are ignored.
  ○ For non-branch instructions, this is a 32-bit absolute immediate.
  ○ This is meaningless for long-form branch instructions. PNUT throws an error.
● For BYTE/WORD/LONG, the expression is encoded as raw data. If the expression begins with # or ##, PNUT throws an error.
● For all other expressions that do not begin with # or ##, the expression is encoded as a register address and must be between $000 and $1FF.
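The ## rule quoted above (lower 9 bits in the instruction, upper 23 bits in the AUGx) can be illustrated with a couple of lines of Python. This is only a sketch of the bit split described in the quoted text; split_augmented_immediate and join_augmented_immediate are hypothetical names, and nothing here models the rest of the AUGx or instruction encoding.

```python
def split_augmented_immediate(value):
    """Split a 32-bit immediate into (upper 23 bits, lower 9 bits)."""
    value &= 0xFFFFFFFF
    return value >> 9, value & 0x1FF

def join_augmented_immediate(upper23, lower9):
    """Rebuild the 32-bit value from the two fields."""
    return ((upper23 & 0x7FFFFF) << 9) | (lower9 & 0x1FF)

if __name__ == "__main__":
    upper, lower = split_augmented_immediate(0x76543210)
    print(f"upper23={upper:#x} lower9={lower:#x}")
```

The split makes it easy to see why an instruction that ignores the AUGx part, or an operand that is only 9 bits wide, ends up with just the low 9 bits of the value.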
Thoughts?
If you pattern fill HUB and then dump that, what does that show ?
You didn't do your data access to hubram correctly. You had "loc ptrb, #message", which gets the code address of message into ptrb. You wanted "mov ptrb, ##message", which puts the data address of message into ptrb. Code and data addresses are the same for addresses greater than or equal to $400, which is why it worked with "orgh $400".
Addresses passed to mov and normal ALU instructions are always cog addresses, exactly the same as on the P1. If AUGx or something is used to make it more than 9 bits, those extra bits are ignored.
If you do "WRxxxx data, address", you will always get data out of hubram.
These are both data addresses.
If you do "jmp #x" or "loc ptrb, #x", you get the code address of x. If x is greater than or equal to $400, then x refers to the first byte of the instruction in hubram. If x is less than $400 and greater than or equal to $200, x & $1FF refers to the long of lutram that has the instruction. If x is less than $200, x refers to the long of cogram that has the instruction.
You cannot hubexec out of addresses less than $400. There was a long thread about that.
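The code-address rule described in the last few paragraphs can be written down as a small Python sketch, which may help when checking where a given label will actually execute from. The boundaries ($200 and $400) come from the posts above; classify_code_address is a hypothetical name and the function models code addresses only, not data addresses.

```python
def classify_code_address(address):
    """Map a code address to (region, index) using the rule from the thread.

    Below $200: a long in cog RAM. $200 to $3FF: a long in LUT RAM, indexed
    by address & $1FF. $400 and above: a byte address in hub RAM.
    """
    if address < 0x200:
        return "cog", address
    if address < 0x400:
        return "lut", address & 0x1FF
    return "hub", address

if __name__ == "__main__":
    for addr in (0x010, 0x210, 0x400, 0x1000):
        print(f"${addr:04X} -> {classify_code_address(addr)}")
```

Under this rule a label placed in hub below $400 has a code address that lands in cog or LUT, which is consistent with LOC giving the wrong pointer until ORGH $400 is added.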
EDITED: added quote to top, moved last paragraph to top