Testing rdbyte/word/long and wrbyte/word/long
78rpm
Posts: 264
**** EDITED - 24th Oct ****
I've decided to attach the source code to this post in case anyone is struggling with P2 FPGA. Not found the error yet. :frown:
I have volunteered to check the rdbyte/word/long and wrbyte/word/long instructions. I have written a basic test harness which uses my DE0-Nano and Parallax Serial Terminal (PST), thanks to the serial test program uploaded on the 14th Oct by mindrobots.
I think there is a memory overwrite within the harness caused by my binary to ASCII decimal conversion routine, but I haven't found where it is happening. Making data areas larger and adding more code cause slightly different problems. I shall of course continue to look for the error. Thus I wish to find the error before posting the code.
The rd/wr tests need to check every combination, which is a fair few with all pre/post decrement, indexing and scaling options. They also should be checked for execution in COG, LUT and HUB space.
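To get a feel for how many cases "every combination" means, here is a minimal Python sketch that enumerates a test matrix. The mode labels are illustrative assumptions, not PNut syntax or names from the harness, and the real list of pointer forms may be longer once index values and scaling are varied.

```python
from itertools import product

# All labels below are hypothetical names for illustration only.
DIRECTIONS = ("rd", "wr")
SIZES = ("byte", "word", "long")
# Assumed address forms: a plain register, a pointer, a pointer with
# pre/post increment or decrement, and a pointer with a scaled index.
ADDRESS_MODES = ("reg", "ptr", "ptr++", "ptr--", "++ptr", "--ptr", "ptr[index]")
EXEC_SPACES = ("COG", "LUT", "HUB")


def test_matrix():
    """Return every (instruction, address mode, exec space) combination."""
    return [
        (direction + size, mode, space)
        for direction, size, mode, space in product(
            DIRECTIONS, SIZES, ADDRESS_MODES, EXEC_SPACES
        )
    ]


cases = test_matrix()
print(len(cases), "cases, first:", cases[0])
```

Even with this reduced set of modes the matrix runs to over a hundred cases, which is the argument for generating the instructions rather than writing each one by hand.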
I have written a couple of support functions for the test harness to assist with binary to decimal conversion, which I will post cut'n'pasted if anyone wishes to view or use them:
multiply32x32 -
multiplies two 32-bit values together for a 64-bit result. Two registers supply the two values and return two longs, a high long and a low long. This is useful when you require a multiplication of two longs where one has a value fully in the low word of the long, the other has some data in the high word of its long, and the result still fits in 32 bits.
divide32b32 -
divides a 32-bit value by a 32-bit value for a 32-bit quotient and a 32-bit remainder.
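For checking the harness results on a PC, a reference model of the two routines can be handy. The following is a minimal Python sketch that assumes shift-and-add multiplication and restoring division; the attached PASM2 may well use a different algorithm, and the divide-by-zero behaviour here is an assumption, not something the post specifies.

```python
MASK32 = 0xFFFFFFFF


def multiply32x32(a, b):
    """Shift-and-add multiply of two 32-bit values.

    Returns (high_long, low_long) of the 64-bit product.
    """
    a &= MASK32
    b &= MASK32
    product = 0
    for bit in range(32):
        if (b >> bit) & 1:
            product += a << bit
    return (product >> 32) & MASK32, product & MASK32


def divide32b32(dividend, divisor):
    """Restoring division of two 32-bit values.

    Returns (quotient, remainder). Raising on a zero divisor is an
    assumption made for this model only.
    """
    dividend &= MASK32
    divisor &= MASK32
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Bring down the next dividend bit, then try to subtract.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder
```

Feeding the same operands to the model and to the FPGA, then comparing the two longs returned, should show whether a wrong digit comes from the arithmetic or from something clobbering the registers around it.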
Comments
FYI. The P2 has a cordic engine that facilitates multiply 32x32 and 64/32 divide in hardware.
Edit: Sorry, I just noticed you mentioned DE0-Nano. So ignore the above.
No cordic ... no problem.
I have yet to find the error which appears to move.
Thank you, Seairth, well spotted. I have changed my code but the problem is still there ... sometimes ... depending on what is where.
I wonder if Chip would support a new instruction ... FINDBUG D, S/#
Where D is the destination to receive the bug report.
Where S is the source code to check starting address.
Where # is:
%xx0 - simple report / 1= full report
%x0x - find this bug / 1 = find future bugs
%0xx - find bug in my code / 1 = find bug in others' code
Do you suspect there's a bug in your code, the chip, or you're not sure?
I'm not sure, as it could be either. I always suspect my code, though I can't see anything obvious. And the problem appears when code or data are added or moved. It doesn't always happen, which makes me suspect a code overwrite somewhere.
I am running in HubExec and using @# for Hub references and #\ for Cog references.
I have made another change, looking into what the code looks like for passing parameters and return values on the stack. Looks nice and readable even with the limitations of PNut macro support.
Now a problem that was there, in the binary to ASCII conversion, has gone away again. Perhaps it depends on a modulo of an address? It may be a case of doing something completely different and having an AHA! moment later, or tomorrow.
I thought I recalled someone mentioning similar before.
If it is a real problem, how can we:
a. Rule out software bug
b. Determine if it is a hardware bug
I can likely get my code into a state where the effect is happening, and I don't mind posting it here. That would allow more eyes on the software problem. It would also mean Chip could use the code in some way to help build a hardware-assisted debugging feature into the FPGA Verilog code and image.
Fortunately my code is running on the original boot single cog in HubExec mode, without interrupts. The serial receive routine uses edge detection. The debug interrupt vectors are correct; I dumped the memory to ensure they were there, even though the Nano is limited to 32K of Hub RAM.
If I can help with this I certainly will.
BTW - you need to send a character to the Nano from a serial terminal to kick it off, done to prevent losing messages.
I have the code arranged so it generates 2 "Decimal error" messages. Code attached below (whole file). Don't worry about the clock and spinners on the screen, they're meant to be there.
Locate the 'start' symbol in the source. When some long data above the start symbol are commented out, by uncommenting the block comment lines! (hope that makes sense) the code then generates 5 "Decimal error" messages.
By also uncommenting byte definitions before the start, the number of errors changes. I've recorded the numbers I see as comments in the code for both commented block (assembled) and uncommented block (not assembled - purely comment) to see if you get the same.
I look forward to hearing your comments.
In a P2 image in an FPGA in a foreign land, far, far away, there resided a small assembler program, fluent in its native language of PASM2. The program lived in a file on a disc which went round and round all day long...
A short way down from the start symbol, this code is executed. The send_dec routine is the start of a chain of calls. The value to be converted from binary to ASCII numeric is in register r0 in COG. In the send_dec routine, divide32b32 is called. r2 is the divisor; it is called multiplier because of where it originates, and it serves as both. The check_for_decimal_error routine is small and checks that the value returned is within 0 and 9 inclusive. For some reason send_hex doesn't output to the screen; perhaps it has been clobbered? The values you see on the screen for the entire converted value, 5íë24544, have had an ASCII '0' added to them, i.e. $30.
It would be good to get to the bottom of this, but I think it possible that the registers used in the divide32b32 function, which are located in the single COG, are corrupted by some means. I'm not overwriting them when doing the divide, so is it something else, or did I overwrite them prior to its call? I suspect it is registers r1 and r2, and/or t0 through t7, but what?
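The conversion chain described above can be modelled in a few lines. This is a sketch under assumptions: it supposes send_dec divides by descending powers of ten, which the post does not confirm, and the Python function names are hypothetical stand-ins for the PASM2 routines.

```python
def digit_is_valid(digit):
    """The check_for_decimal_error idea: a decimal digit must be 0..9."""
    return 0 <= digit <= 9


def send_dec_model(value):
    """Convert a 32-bit value to ASCII decimal.

    Returns (text, error_count). Dividing by descending powers of ten
    is an assumption about how send_dec works.
    """
    value &= 0xFFFFFFFF
    divisor = 1_000_000_000  # largest power of ten below 2**32
    chars = []
    errors = 0
    while divisor:
        digit, value = divmod(value, divisor)
        if not digit_is_valid(digit):
            errors += 1
        chars.append(chr(digit + 0x30))  # add ASCII '0'
        divisor //= 10
    text = "".join(chars).lstrip("0") or "0"
    return text, errors


print(send_dec_model(1234567))
```

With correct arithmetic the error count is always zero, so any "Decimal error" message means a quotient arrived outside 0..9. Adding $30 to such a value yields a non-digit character, which is consistent with the stray characters in the output quoted above.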
Tune in on this channel, at this time tomorrow, for the same story...or perhaps not.
I wish everyone looking into this good fortune in bug hunting.
I have NOT tried it in COG or LUT, so yes, only HUB EXEC. I could try that. I think the main thing I need to do is change all the global calla #@fun to calla #fun or #\fun, and I think all the jumps are relative anyway. Oh, and obviously to load it into a COG and init it or branch.
It is too big to fit in COG. Then I'm caught up in the same problem of moving things around and commenting some out and the problem disappears for a while.
However, I created a copy of the divide function and appended _inCog to its name. Then I checked, with that version loaded in COG, that the HUB version still produced the same error, which it did. Then I changed the call to the _inCog version. The same error result occurs.
That function only uses registers in the COG RAM. Thus it looks to me as though the problem occurs regardless of HUB EXEC.
I do not know what the L and I bits in the WR instructions refer to. One obviously means immediate, and I suspect that would be the I bit, but what is the L? Is it used to indicate that the immediate field addresses hub locations $00 to $ff inclusive? If it is for a hub address, why does it not go the full 9-bit range to $1ff? An enquiring mind wishes to know. Plus, I need to know so that I can encode the instructions on the fly, as there are so many combinations to check with all the optional indexing and pre/post increment/decrement, and therefore register values to check before and after each op.
The following is taken from the P2 documentation:
Immediate S values > $FF indicate indexed PTRx variants.
These labels when referenced as hub addresses fall below $400 (cog and lut addresses).
Erratic issues in some of my code were cured by dropping an ORGH $400 in to bump them into hub space.
Strange behaviour disappeared.
Might be worth looking at.
Ah, maybe I am misunderstanding this. I think you can access registers in Cog RAM from Hub Exec mode, without having to do a wrlong etc. to Hub RAM. Are you saying this isn't the case?
The use of #\ is to get the absolute address in Cog RAM, as that is where the registers are located. The addresses returned are long addresses, where the adjacent long is located at current + 1, not + 4 as in Hub. They are ORG 0.
My Hub Exec code and data is ORGH $1000.
I think that is correct, if it were not, HubExec would be of limited use.
This type of confusion is why I think the Assembler needs a clean-up pass to remove 'hieroglyph tags'.
I think introducing specific memory areas ( called data segments in some assemblers) can make it clear to the assembler and user which Registers/Ram they are addressing, without needing tags.
Specific data segments also makes code more naturally relocatable, which will matter more as ASM mixes with HLL.
The issue seems to be hub operations addressing < $400
Consider the following code. Remove the ORGH $400 and it does not work.
I hope we're both correct on the matter of accessing registers in Cog RAM in this manner in Hub Exec.
Are you testing / developing on an FPGA board?
Too true, a good assembler would be useful for all this tinkering, but they take a while to develop. Certainly relocating data segments would be useful.
I find PNut delightful to use, and it's so fast at assembling and downloading to the FPGA that I'm not able to make out more than a few characters on the screen whilst it does its stuff. My only problem with PNut is when I move to the browser I press F11 to post a message.
and if it is smaller than ORGH $400, what happens ?
With no ORGH there, wouldn't that drop message into COG memory ?
Because opcodes presently have the '# forcing', there is no possible error checks on this.
No, it wouldn't work, because message would then be in Cog RAM and you'd need to do a long access to it and mask or shift to get to the individual bytes in the long.
However, my code in Hub, when accessing bytes in Cog RAM, writes a long from Cog into Hub RAM and then moves each byte out into a local array, via Cog RAM, by means of a RDBYTE loop. Then, if the data is valid, it executes a WRBYTE to a RAM buffer in Hub. That all looks correct to me even if it sounds a bit long-winded. Look at send_msg to see what I mean for the copying from Cog, Lut or Hub to the message buffer.
So, I only access data as longs in Cogs from Hub, even though that data may be packed into a long.
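As a rough model of that copy, here is a minimal Python sketch. It assumes the bytes sit least significant first in each long (the Propeller is little-endian) and that "valid" means non-zero, with a zero byte ending the message. That validity rule and both function names are assumptions, not taken from send_msg.

```python
def unpack_long(value):
    """Split a 32-bit long into four bytes, least significant byte first."""
    return [(value >> (8 * i)) & 0xFF for i in range(4)]


def copy_packed_chars(longs, buffer):
    """Append the bytes packed in each long to buffer.

    Stops at the first zero byte (assumed terminator) and returns buffer.
    """
    for value in longs:
        for byte in unpack_long(value):
            if byte == 0:
                return buffer
            buffer.append(byte)
    return buffer


print(copy_packed_chars([0x44434241, 0x00004645], bytearray()))
```

The shift-and-mask in unpack_long is what a long-only access to Cog or LUT RAM would need in place of the RDBYTE loop.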
I'm no nearer to finding the problem, though I did look to see if I could see something in my code. And yes, I use rdlut and wrlut to access Lut RAM.
I think my brain might melt
All attempts to set ORGH to addresses below $400 will fail.
From what I see all hub references must be $400+
LOC is meant to get code addresses, not data addresses. Replace "loc ptrb, #message" with "mov ptrb, ##@message" and it should work.
Code addresses below $400 are in cogram, while data addresses below $400 are in hubram.
My understanding is:
long aligned x 512, long access only, +1 addresses next long
$000 <= Cog RAM <= $1ff
$200 <= LUT RAM <= $3ff
byte aligned x lots, byte/word/long access, + 1 addresses next byte
The LOC instruction is for location and can only be used with special address registers, adra, adrb, ptra, ptrb. LOC is for loading these ptr registers. If you want to know the value contained in a pointer register, can you do or is it safer to do I think they both work as the ptr access and indexing and scaling are only used with RDBYTE, WRBYTE etc
Read the end section of the instructions.txt in the FPGA download files. However, there is also this in the P2 Assembly Instructions, so which is right?
Thoughts?
If you pattern fill HUB and then dump that, what does that show ?
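The pattern-fill idea can be tried on a PC model first. Below is a minimal Python sketch in which a bytearray stands in for Hub RAM; the fill value and both function names are arbitrary assumptions. On the FPGA the equivalent would be a fill loop before the test and a hex dump afterwards.

```python
FILL = 0xA5  # arbitrary fill byte, chosen for this sketch only


def pattern_fill(size, fill=FILL):
    """Return a memory image of the given size filled with one byte value."""
    return bytearray([fill]) * size


def changed_ranges(memory, fill=FILL):
    """Return (start, end) pairs, end exclusive, where memory differs from fill."""
    ranges = []
    start = None
    for addr, byte in enumerate(memory):
        if byte != fill and start is None:
            start = addr
        elif byte == fill and start is not None:
            ranges.append((start, addr))
            start = None
    if start is not None:
        ranges.append((start, len(memory)))
    return ranges


hub = pattern_fill(64)
hub[8:12] = b"\x01\x02\x03\x04"  # simulate a stray long write
hub[63] = 0                      # and a stray byte write
print(changed_ranges(hub))
```

One limitation: a stray write that happens to store the fill value is invisible, so running twice with two different fill values is a reasonable precaution.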
You didn't do your data access to hubram correctly. You had "loc ptrb, #message", which gets the code address of message into ptrb. You wanted "mov ptrb, ##message", which puts the data address of message into ptrb. Code and data addresses are the same for addresses greater than or equal to $400, which is why it worked with "orgh $400".
Addresses passed to mov and normal ALU instructions are always cog addresses, exactly the same as on the P1. If AUGx or something is used to make it more than 9 bits, those extra bits are ignored.
If you do "WRxxxx data, address", you will always get data out of hubram.
These are both data addresses.
If you do "jmp #x" or "loc ptrb, #x", you get the code address of x. If x is greater than or equal to $400, then x refers to the first byte of the instruction in hubram. If x is less than $400 and greater than or equal to $200, x & $1FF refers to the long of lutram that has the instruction. If x is less than $200, x refers to the long of cogram that has the instruction.
You cannot hubexec out of addresses less than $400. There was a long thread about that.
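The code-address versus data-address rules above can be written down as a small Python model. This is a sketch of the description in this post only, not of the hardware, and the function names are hypothetical.

```python
def resolve_code_address(x):
    """Where an instruction fetch for code address x comes from.

    Returns (memory, address): a byte address for hub, a long index
    for lut and cog.
    """
    if x >= 0x400:
        return ("hub", x)
    if x >= 0x200:
        return ("lut", x & 0x1FF)
    return ("cog", x)


def resolve_data_address(x):
    """RDxxxx/WRxxxx addresses always refer to hub RAM byte addresses."""
    return ("hub", x)


for addr in (0x100, 0x200, 0x400):
    print(hex(addr), resolve_code_address(addr), resolve_data_address(addr))
```

For any address of $400 or above the two functions agree, which matches the observation that the failing code started working once ORGH $400 was added.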
EDITED: added quote to top, moved last paragraph to top