Cog registers can be accessed indirectly most easily by using the ALTS/ALTD/ALTR instructions. These instructions sum their D[5:0] and S/#[5:0] values to compute an address that is directly substituted into the next instruction’s S field, D field, or result register address (normally, this is the same as the D field). This all happens within the pipeline and does not affect the actual program code. The idea is that S/# can serve as a register base address and D can be used as an index:
ALTS index,#table   'set next S field to table+index
MOV OUTA,0          'output register[table+index] to OUTA
ALTD index,#table   'set next D field to table+index
MOV 0,INA           'write INA to register[table+index]
ALTR index,#table   'set next write to table+index
XOR INA,INB         'write INA^INB to register[table+index]
The docs were not clear as to which bits are being used.
However, the issue is that normally, when a sub-range of bits is used, the other bits are ignored.
That is not the case here. In particular, ALTS overwrites the D value, and in some cases the new value is incorrect.
The value passed through to the instruction following the ALTS is also incorrect if the supposedly ignored bits are not zero.
For example, with ALTS where D=$144 and S=$144, D is overwritten as $5D and the passed S is $88. Note that S[7:0] seems to be added, while S[8] seems to cause a subtraction from the D value.
While it is not a show-stopper, it had best be spelt out, as it is a restriction rather than normal use:
ALTS sums D[8:0] and S/#[5:0]. All other bits of D & S must be zero.
BTW, I often store values in unused bits, and I note Chip does too:
.v _ret_ cmp 0,#0 'bottom byte used as a counter
I discovered this while trying to use ALTS followed by a RDLONG, expecting a 20-bit S result.
"These instructions sum their D[8:0] and S/#[8:0] values to compute a 9-bit address that is directly substituted into the next instruction’s S field (ALTS), D field (ALTD), or result register address (ALTR) (normally, this is the same as the D field)."
Upper bits of S are used to inc/dec D[8:0].
So S[D] means base[index], and the new incremented/decremented index is rewritten by ALTS
Needs more explanation in the docs (later)
Looks like the following happens...
Inserts S into the next instruction...
Take original D[31:0] and replace D[8:0] with the sum of (D[8:0]+S[8:0]) ignoring overflow.
Take original D[31:0] and write back with ALTS instruction D[31:0] ADDS S[17:9]>>9
Cluso99
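As a cross-check on the description above, here is a minimal Python sketch of ALTS. It assumes the behaviour Chip and Cluso describe in this thread: the next instruction's S field gets (D[8:0] + S[8:0]) & $1FF, with the carry out of bit 8 lost, and D is written back as D plus the sign-extended S[17:9]. The function names are hypothetical, and the model does not reproduce the $5D D value reported above, so treat it as a sketch of the intended behaviour rather than a simulation of the FPGA build.

```python
MASK9 = 0x1FF
MASK32 = 0xFFFF_FFFF


def sign_extend_9(value: int) -> int:
    """Interpret the low 9 bits of value as a two's-complement number (-256..255)."""
    value &= MASK9
    return value - 0x200 if value & 0x100 else value


def alts(d: int, s: int) -> tuple:
    """Hypothetical model of ALTS D,S as described in this thread.

    Returns (next_s_field, new_d):
      next_s_field: (D[8:0] + S[8:0]) & $1FF, the carry out of bit 8 is lost
      new_d:        D + sign-extended S[17:9], wrapped to 32 bits
    """
    next_s_field = (d + s) & MASK9
    new_d = (d + sign_extend_9(s >> 9)) & MASK32
    return next_s_field, new_d


# Walk a table at $100 with a step of +1 packed into S[17:9].
s = 0x100 | (1 << 9)
index = 0
for _ in range(3):
    field, index = alts(index, s)
    print(hex(field))  # 0x100, then 0x101, then 0x102
```

Packing a step into S[17:9] is what makes S[D] behave like base[index] with auto-increment; a step of $1FF (-1 in 9 bits) walks the table downwards instead.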
Another contributing factor to the incorrect results in your tests is the effect of the AUGS instruction also being applied to the instruction following the ALTS.
This can be demonstrated with the following code.
Take original D[31:0] and write back with ALTS instruction D[31:0] ADDS S[17:9]>>9
<then>
Certainly the instruction needs expanding, including a warning about the caveats of lost overflow and non-zero upper bits.
I'm not following here: the equation given shows a 31:0-wide addition, but Chip's comment and the lost-overflow comment both suggest this +/- wraps inside the lower 9 bits, D[8:0]?
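The two readings in this question differ only when the index crosses a 9-bit boundary or D carries data in its upper bits. This hypothetical Python sketch puts them side by side; neither function is taken from the P2 documentation, and which one the hardware implements is exactly what the docs need to state.

```python
MASK9 = 0x1FF
MASK32 = 0xFFFF_FFFF


def writeback_32bit(d: int, step: int) -> int:
    """Reading 1 (the ADDS equation): D[31:0] plus the signed step, 32-bit wrap."""
    return (d + step) & MASK32


def writeback_9bit(d: int, step: int) -> int:
    """Reading 2: only D[8:0] changes and wraps inside 9 bits; D[31:9] is kept."""
    return (d & ~MASK9 & MASK32) | ((d + step) & MASK9)


# An index at the top of the 9-bit range, with and without data in the upper bits.
for d in (0x0000_01FF, 0xABC0_01FF):
    print(hex(writeback_32bit(d, 1)), hex(writeback_9bit(d, 1)))
```

Under reading 2 an index stepping past $1FF silently reappears at $000; under reading 1 the carry propagates into D[9] and up, disturbing any values kept in the upper bits of D.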
Chip or anyone,
I forget, how much HUB ROM at the top needs to be kept free for the debug and interrupts?
There are 16 longs for buffering cog regs $000..$00F and 16 longs for the debug routine. That's 32 longs per cog. Times 8 cogs means 256 longs, or 1KB. So, $FFC00...$FFFFF are directly needed, plus maybe that much, again, for overlay code to make the whole debugger work. So, perhaps $FF800..$FFFFF should be reserved.
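The sizing arithmetic above can be written out as a small Python sketch. It assumes only the figures quoted (16 + 16 longs per cog, 4 bytes per long, hub addresses ending at $FFFFF); the function name and the `copies` parameter for the overlay allowance are illustrative.

```python
HUB_END = 0x100000  # one past $FFFFF, the top hub address
LONG_BYTES = 4


def debug_reserve_start(cogs: int, longs_per_cog: int = 32, copies: int = 1) -> int:
    """Lowest hub address of an area reserved at the top of hub memory.

    longs_per_cog: 16 longs buffering cog regs $000..$00F + 16 longs of debug routine.
    copies: 2 doubles the area, leaving the same again for overlay code.
    """
    size_bytes = cogs * longs_per_cog * LONG_BYTES * copies
    return HUB_END - size_bytes


print(hex(debug_reserve_start(8)))            # 0xffc00: directly needed for 8 cogs
print(hex(debug_reserve_start(8, copies=2)))  # 0xff800: with room for overlay code
```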
Using v32c FPGA code, Windows 10, PST (115,200 baud) and a BeMicroCV-A9,
I cannot get the autobaud/Prop_Chk sequence to return anything, although pnut will download code.
Maybe the timeout isn't long enough to type the sequence in manually. I tried loading new ROM code with a bigger timeout into $4C000, but that didn't work.
Can someone please confirm whether the manual sequence "> " followed by "Prop_Chk<cr>" works with W10 & PST?
I tried that in a WIN7 system with TeraTerm and typed fairly slow and it worked responding with "Prop_Ver A" so I don't think it is timing out. There is something definitely wrong with Cluso's setup.
Running same setup here.
> Prop_Chk 0 0 0 0 works fine.
> Prop_Chk 0 0 0 0
Prop_Ver F
Edit: Make sure you issue a space after the ">" character.
Yes, issuing a space after ">".
Gave up. It just won't work, yet I can download. I can also run my own code, with absolutely no problems, that outputs characters, then reads and echoes correctly.
ozprop: What are the 0 0 0 0 after the Prop_Chk? Are they necessary? I gave it "> Prop_Chk<cr>" and maybe "<lf>" (not sure what Enter sends in PST).
Postedit: Just spoke to Peter. Yep, it requires the 0 0 0 0.
Just copy and paste the next two lines (includes a CR)
> Prop_Chk 0 0 0 0
I think it replies on the 4th 0 and does not need a trailing space or <cr>? Oops, get_hex (of course) needs some non-hex character to exit; however, ">" is swallowed inside INT as the timing character.
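The point about get_hex needing a terminator can be illustrated with a hypothetical Python model of a hex-argument scanner. It is not the ROM code: it only assumes, as described above, that a value is complete once a non-hex character follows it, and it is fed just the argument text after `Prop_Chk` (the command name itself contains the hex digit C).

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def completed_hex_args(args: str) -> list:
    """Return the hex values that have been terminated by a non-hex character.

    Hypothetical model, not the ROM source: a final digit with nothing after
    it is still pending, so the command would not yet be acted on.
    """
    values = []
    current = None  # value being accumulated, or None between values
    for ch in args:
        if ch in HEX_DIGITS:
            current = (current or 0) * 16 + int(ch, 16)
        elif current is not None:
            values.append(current)
            current = None
    return values


print(completed_hex_args(" 0 0 0 0"))   # last 0 still pending
print(completed_hex_args(" 0 0 0 0 "))  # trailing space completes it
```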
One detail worth checking is to send a large file of many repeating 19-char blocks.
That repeated block is what MCU booters and PC hosts can send for the shortest possible reset-to-boot times.
The test is to confirm that P2 always autobauds and echoes its ID(s), no matter what the reset-exit phase is.
The echo ID is 15 bytes for 19 bytes in, and I see a largish 1 ms allowed for turnaround, which is 11.52 char times at 115200 baud,
so once out of reset, P2 will echo one ID (15+12 chT) for every two Prop_Chks (38 chT) received.
A small turnaround is useful, but 1 ms sounds high for baud rates of 115200 to 2 MBd?
If that is dropped to 1 char time, the echo always takes 15+1 chars and leaves a 3-char margin to catch the next _Chk, so it will echo on every _Chk, thanks to the RX FIFO.
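This timing argument is plain arithmetic, sketched here in Python under the assumption of 8N1 framing (10 bit times per character) and back-to-back commands; the function names are illustrative.

```python
BITS_PER_CHAR = 10  # assumes 8N1 framing: start bit + 8 data bits + stop bit


def chars_in(seconds: float, baud: int) -> float:
    """Number of character times that fit in an interval at a given baud rate."""
    return seconds * baud / BITS_PER_CHAR


def echoes_every_chk(cmd_chars: float, reply_chars: int, turnaround_chars: float) -> bool:
    """True if reply + turnaround fits inside the time one command takes to arrive.

    Assumes commands arrive back to back and the RX FIFO keeps filling while
    the reply is sent, as in the reasoning above.
    """
    return reply_chars + turnaround_chars <= cmd_chars


one_ms = chars_in(1e-3, 115_200)         # about 11.52 char times
print(echoes_every_chk(19, 15, one_ms))  # False: 1 ms turnaround
print(echoes_every_chk(19, 15, 1))       # True: 1-char turnaround, 3 chars to spare
```

At 2 MBd the same 1 ms is 200 char times, so a fixed millisecond pause scales badly with baud rate. Adding a 3 ms gap after each command (about 34.56 char times at 115200) leaves room for the reply even with the 1 ms pause.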
A while back I hooked up 3 Nanos to a Prop Plug and configured assorted pull-ups/downs to set their IDs.
I was able to load the Nanos separately and in parallel with ease.
Works great!
One detail worth checking is to send a large file of many repeating 18 char blocks
Sending a repeating string of "> Prop_Chk 0 0 0 0 " responds to each command with "Prop_Ver F" as expected.
Hmm... was that a packed, repeating string with no gaps, and did every _Chk give a _Ver? I.e., did you check the counts?
That's not quite what I'd expect from the 1 ms TX pause, as that adds 11.52 chars(!) to the 15-char reply, so the RX block IN (19) is smaller than the TX block OUT (11.52+15), and the RX FIFO overruns/wraps.
If there is instead a 1-char-time TX pause, the TX block fits inside the RX timing, and you can get a _Ver for every _Chk.
The test sent 10 x "> Prop_Chk 0 0 0 0 " @ 115200 with a 3 ms gap.
10 x "Prop_Ver F" responses were returned.
As the boot loader isn't a full duplex arrangement I wasn't expecting it to respond during the response message transmission.
Sorry I was a bit vague on that detail.
Comments
The docs should say...
"These instructions sum their D[8:0] and S/#[8:0] values to compute a 9-bit address that is directly substituted into the next instruction’s S field (ALTS), D field (ALTD), or result register address (ALTR) (normally, this is the same as the D field).
Upper bits of S are used to inc/dec D[8:0].
IIRC it was S[17] for the direction and S[16:9] for the value to add/subtract.
Certainly the instruction needs expanding, including a warning about the caveats of lost overflow and non-zero upper bits.
I do understand the advantages the instruction gives, but it is not obvious, and a trap.
I'm not following here: the equation given shows a 31:0-wide addition, but Chip's comment and the lost-overflow comment both suggest this +/- wraps inside the lower 9 bits, D[8:0]?
Such a wrap is not going to be very compiler/relocatable friendly.
I forget, how much HUB ROM at the top needs to be kept free for the debug and interrupts?
isr_address = $ff840 + (!cog & $f) << 7
reg_buffer = $ff800 + (!cog & $f) << 7
There are 16 longs for buffering cog regs $000..$00F and 16 longs for the debug routine. That's 32 longs per cog. Times 8 cogs means 256 longs, or 1KB. So, $FFC00...$FFFFF are directly needed, plus maybe that much, again, for overlay code to make the whole debugger work. So, perhaps $FF800..$FFFFF should be reserved.
I'm still trying to hang onto 16 cogs!
isr_address = $ffc40 + (!test_cog & 7) << 7
reg_buffer = $ffc00 + (!test_cog & 7) << 7
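The per-cog formulas above can be checked with a short Python sketch. It assumes, as the formulas intend, that the shift is applied before the add; Python's `+` binds tighter than `<<`, so the shift is parenthesised, and Spin's bitwise NOT `!` becomes Python's `~`. The function name and parameters are illustrative.

```python
def debug_slot(cog: int, cog_mask: int = 7, base: int = 0xFFC00) -> tuple:
    """Return (reg_buffer, isr_address) for one cog.

    (~cog & cog_mask) reverses the cog order so cog 0 lands at the top. Each
    cog gets 1 << 7 = 128 bytes (32 longs): 16 longs of register buffer at
    +$00 and 16 longs of ISR at +$40.
    """
    offset = (~cog & cog_mask) << 7
    return base + offset, base + 0x40 + offset


for cog in range(8):
    reg_buffer, isr_address = debug_slot(cog)
    print(cog, hex(reg_buffer), hex(isr_address))
```

Passing cog_mask=$F and base=$FF800 gives the 16-cog layout from the first pair of formulas, with the same 128-byte slot per cog.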
So we can use that area to preload something? Because all of a sudden, a 16KB ROM doesn't seem like much.
I seem to be unable to successfully test for a pull-down.
By chance, is there a weak pull-down enabled in the FPGA? And if so, is this also present in the real P2 silicon?
My mistake! Of course the pins without pull-downs will read as high.
Here is my test code:
I cannot get the autobaud/Prop_Chk sequence to return anything, although pnut will download code.
Maybe the timeout isn't long enough to type the sequence in manually. I tried loading new ROM code with a bigger timeout into $4C000, but that didn't work.
Can someone please confirm whether the manual sequence "> " followed by "Prop_Chk<cr>" works with W10 & PST?
My notes from earlier postings say a 19-char string is the minimum: "> Prop_Chk 0 0 0 0 "
and the echo is 15 chars: CR+LF+"Prop_Ver Au"+CR+LF.
You could also capture what pnut sends, as a sanity check.
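The byte counts in these notes can be checked with a small Python sketch. The reply format follows the notes above; the helper name is illustrative, and the version suffix varies by build (the FPGA image in this thread answered "Prop_Ver F", which makes the reply 14 bytes rather than 15).

```python
# Autobaud '>' and its following space, the command, four hex arguments, and a
# final space so get_hex sees a non-hex character after the last 0.
PROP_CHK = b"> Prop_Chk 0 0 0 0 "


def expected_reply(version: bytes) -> bytes:
    """CR+LF + 'Prop_Ver <version>' + CR+LF, as described in the notes above."""
    return b"\r\nProp_Ver " + version + b"\r\n"


# A burst of back-to-back checks, like the repeated-block test discussed earlier.
burst = PROP_CHK * 10
print(len(PROP_CHK), len(expected_reply(b"Au")), len(burst))
```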