Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

cgracey · 2018-04-28 09:22

Cluso99, only the 9-bit S field of the next instruction is affected. I may not be understanding. Did you see the documentation?

Cluso99 · 2018-04-28 10:30

From the docs...

REGISTER INDIRECTION

Cog registers can be accessed indirectly most easily by using the ALTS/ALTD/ALTR instructions. These instructions sum their D[5:0] and S/#[5:0] values to compute an address that is directly substituted into the next instruction’s S field, D field, or result register address (normally, this is the same as the D field). This all happens within the pipeline and does not affect the actual program code. The idea is that S/# can serve as a register base address and D can be used as an index:

ALTS index,#table 'set next S field to table+index
MOV OUTA,0 'output register[table+index] to OUTA

ALTD index,#table 'set next D field to table+index
MOV 0,INA 'write INA to register[table+index]

ALTR index,#table 'set next write to table+index
XOR INA,INB 'write INA^INB to register[table+index]

The docs were not clear as to which bits are being used.

However, the issue is that normally when a sub-range of bits is used, the others are ignored.

This is not the case here, and in particular, ALTS overwrites the D value, and in some cases it is incorrect.
Also, the result passed through to the next instruction after the ALTS is also incorrect if the ignored bits are not zero.

For an ALTS with D=$144 and S=$144, D is overwritten as $5D and the S passed on is $88. Note that S[7:0] seems to be added and S[8] results in a subtraction from the D value.

While it is not a show-stopper, it had best be spelled out, as it is a restriction, not normal use.
ALTS sums D[8:0] and S/#[5:0]. All other bits of D & S must be zero.

BTW I often store values in unused bits.
I note so does Chip...
.v _ret_ cmp 0,#0 'bottom byte used as a counter

I discovered this as a consequence of trying to use ALTS followed by a RDLONG, expecting a 20-bit S result.

cgracey · 2018-04-28 11:58

Whoops!

The docs should say...

"These instructions sum their D[8:0] and S/#[8:0] values to compute a 9-bit address that is directly substituted into the next instruction’s S field (ALTS), D field (ALTD), or result register address (ALTR) (normally, this is the same as the D field).

Upper bits of S are used to inc/dec D[8:0].

Cluso99 · 2018-04-28 13:43

cgracey wrote: »

Whoops!

The docs should say...

"These instructions sum their D[8:0] and S/#[8:0] values to compute a 9-bit address that is directly substituted into the next instruction’s S field (ALTS), D field (ALTD), or result register address (ALTR) (normally, this is the same as the D field).

Upper bits of S are used to inc/dec D[8:0].

So S[D] means base[index], and the new incremented/decremented index is rewritten by ALTS

Needs more explanation in the docs (later)

Looks like the following happens...

Inserts S into the next instruction...
Take original D[31:0] and replace D[8:0] with the sum of (D[8:0]+S[8:0]) ignoring overflow.

Take original D[31:0] and write back with ALTS instruction D[31:0] ADDS S[17:9]>>9

ozpropdev · 2018-04-29 03:54

Cluso99
Another contributing factor to the incorrect results in your tests is the effect of the
AUGS instruction also being applied to the instruction following the ALTS.
This can be demonstrated with the following code.

		mov	y,#$144
		alts	y,##$11223344
		mov	a,#0-0

		mov	y,#$144
		mov	x,##$11223344
		alts	y,x
		mov	a,#0-0

Results:

00000: F6045144              MOV     y,#$144
00001: FF089119              AUGS    #$89119
00002: F9945144              ALTS    y,#$144 {$11223344}
00003: F6045200              MOV     a,#$000

COG | $00029: "a"      $11223288 %00010001_00100010_00110010_10001000 #287453832

00004: F6045144              MOV     y,#$144
00005: FF089119              AUGS    #$89119
00006: F6045544              MOV     x,#$144 {$11223344}
00007: F990502A              ALTS    y,x
00008: F6045200              MOV     a,#$000

COG | $00029: "a"      $00000088 %00000000_00000000_00000000_10001000 #136
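
As a rough check on the two results above, the arithmetic can be sketched in Python (used here only as a calculator; the function name is hypothetical and not part of any tool). It assumes, as described in this post, that a still-pending AUGS supplies the upper 23 bits of the following MOV's immediate, while ALTS supplies the 9-bit S field:

```python
def augmented_immediate(augs_value: int, s_field: int) -> int:
    """Combine a 23-bit AUGS value with a 9-bit S field into a 32-bit immediate."""
    return ((augs_value & 0x7FFFFF) << 9) | (s_field & 0x1FF)

# ALTS y,##$11223344 with y = $144: 9-bit sum of the low fields.
s_field = (0x144 + (0x11223344 & 0x1FF)) & 0x1FF  # $144 + $144 wraps to $088

# Case 1: the AUGS is assumed to still be pending when the MOV executes.
first = augmented_immediate(0x89119, s_field)

# Case 2: the AUGS was already used up by MOV x,##..., so only S is replaced.
second = s_field

print(f"${first:08X} ${second:08X}")
```

Under those assumptions the two values come out as $11223288 and $00000088, the same as the "a" register dumps above.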

Cluso99 · 2018-04-29 04:08

I did do results with presetting the S & D values without AUGS.

 mov  base,  #$144      ' base                           
 mov  ndx,   #$144      ' original index                 
 mov  index, ndx        ' index                          
 alts index, base       ' index,base  s[d] = base[index] 
 mov  answer, #0                                         
       ----base--- ---index--- ---index--- ---answer--
  084- 44 01 00 00 44 01 00 00 44 01 00 00 88 00 00 00  'D...D...D.......'
                                              ^^
 -----------------------------------------------------
 mov  base,  ##$244     ' base +1
 mov  ndx,   ##$144     ' original index
 mov  index, ndx        ' index
 alts index, base       ' index,base  s[d] = base[index]
 mov  answer, #0
  084- 44 02 00 00 44 01 00 00 45 01 00 00 88 01 00 00
                               +1
 -----------------------------------------------------
 mov  base,  ##$444     ' base +2
 mov  ndx,   ##$144     ' original index
 mov  index, ndx        ' index
 alts index, base       ' index,base  s[d] = base[index]
 mov  answer, #0
  084- 44 04 00 00 44 01 00 00 46 01 00 00 88 01 00 00
                               +2
 -----------------------------------------------------
 mov  base,  ##$844     ' base +4
 mov  ndx,   ##$144     ' original index
 mov  index, ndx        ' index
 alts index, base       ' index,base  s[d] = base[index]
 mov  answer, #0
  084- 44 08 00 00 44 01 00 00 48 01 00 00 88 01 00 00
                               +4
 -----------------------------------------------------

You will note the overflow loss. I did work out which bit made the ALTS decrement D.
IIRC it was S[17], with S[16:9] being the value to add/subtract.
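
The behaviour seen in these dumps can be modelled with a small Python sketch. This is a model fitted to the test results above, not a statement of how the hardware is implemented: `alts_model` and `sign_extend` are hypothetical names, S[17:9] is treated as a signed 9-bit step, and the write-back is assumed to be a 32-bit add (the four tests above do not show whether it wraps within D[8:0] instead).

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def alts_model(d: int, s: int) -> tuple:
    """Return (S field given to the next instruction, D value written back)."""
    s_field = (d + s) & 0x1FF          # 9-bit sum of D[8:0] and S[8:0], overflow lost
    step = sign_extend(s >> 9, 9)      # S[17:9] as a signed increment/decrement
    new_d = (d + step) & 0xFFFFFFFF    # ASSUMPTION: full 32-bit write-back
    return s_field, new_d

# index = $144 in every test; base varies as in the dumps above.
for base in (0x144, 0x244, 0x444, 0x844):
    s_field, new_index = alts_model(0x144, base)
    print(f"base=${base:03X} index=${new_index:03X} answer=${s_field:03X}")
```

The same model gives a step of -16 for an S[17:9] field of %111110000, which agrees with the "base -16??" guess below.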

Certainly the instruction needs expanding, including a warning about the caveats of lost overflow and non-zero upper bits.

 mov  base,  ##%0000_0000__0000_00___111110000___101000100  ' $xxxx_xx44 base -16??
 mov  base,  ##%0000_0000__0000_00___011110000___101000100  ' $xxxx_xx44 base +$F0 -> $0134

I do understand the advantages the instruction gives, but it is not obvious, and a trap.

jmg · 2018-04-29 04:41

cgracey wrote: »

Upper bits of S are used to inc/dec D[8:0]

Cluso99 wrote: »

Take original D[31:0] and write back with ALTS instruction D[31:0] ADDS S[17:9]>>9
<then>
Certainly the instruction needs expanding, including a warning about the caveats of lost overflow and non-zero upper bits.

I'm not following here, the equation given shows a 31:0 wide addition, but Chip's comment, and the lost overflow comment, both suggest this +/- wraps inside the lower 9 bits D[8:0] ?

Cluso99 · 2018-04-29 05:14

jmg wrote: »

cgracey wrote: »

Upper bits of S are used to inc/dec D[8:0]

Cluso99 wrote: »

Take original D[31:0] and write back with ALTS instruction D[31:0] ADDS S[17:9]>>9
<then>
Certainly the instruction needs expanding, including a warning about the caveats of lost overflow and non-zero upper bits.

I'm not following here, the equation given shows a 31:0 wide addition, but Chip's comment, and the lost overflow comment, both suggest this +/- wraps inside the lower 9 bits D[8:0] ?

yes

jmg · 2018-04-29 05:21

Cluso99 wrote: »

yes

So the equation you gave above needs to be modified, to reflect the wrap ?
Such a wrap is not going to be very compiler/relocatable friendly.

Cluso99 · 2018-04-30 11:23

Chip or anyone,
I forget, how much HUB ROM at the top needs to be kept free for the debug and interrupts?

ozpropdev · 2018-04-30 11:49

2K $ff800 to $fffff

isr_address = $ff840 + (!cog & $f) << 7
reg_buffer = $ff800 + (!cog & $f) << 7

cgracey · 2018-04-30 13:13

Cluso99 wrote: »

Chip or anyone,
I forget, how much HUB ROM at the top needs to be kept free for the debug and interrupts?

There are 16 longs for buffering cog regs $000..$00F and 16 longs for the debug routine. That's 32 longs per cog. Times 8 cogs means 256 longs, or 1KB. So, $FFC00...$FFFFF are directly needed, plus maybe that much, again, for overlay code to make the whole debugger work. So, perhaps $FF800..$FFFFF should be reserved.

ozpropdev · 2018-04-30 13:27

ozpropdev wrote: »

2K $ff800 to $fffff

isr_address = $ff840 + (!cog & $f) << 7
reg_buffer = $ff800 + (!cog & $f) << 7

I'm still trying to hang onto 16 cogs!

isr_address = $ffc40 + (!test_cog & 7) << 7
reg_buffer = $ffc00 + (!test_cog & 7) << 7
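
To make the two layouts concrete, here is a short Python sketch of the address arithmetic. It assumes that "!" is a bitwise NOT and that the shift applies only to the masked cog number (so the expression reads base + ((!cog & mask) << 7)); `debug_addresses` is a hypothetical helper name used only for illustration.

```python
def debug_addresses(cog: int, cogs: int = 16) -> tuple:
    """Return (reg_buffer, isr_address) hub addresses for one cog."""
    base = 0x100000 - cogs * 0x80       # 128 bytes (32 longs) reserved per cog
    offset = (~cog & (cogs - 1)) << 7   # inverted cog number: cog 0 gets the top slot
    return base + offset, base + 0x40 + offset

for cogs in (16, 8):
    for cog in (0, cogs - 1):
        reg_buffer, isr_address = debug_addresses(cog, cogs)
        print(f"{cogs:2d} cogs, cog {cog:2d}: "
              f"reg_buffer=${reg_buffer:05X} isr_address=${isr_address:05X}")
```

With 16 cogs the slots span $FF800..$FFFFF (2 KB); with 8 cogs they span $FFC00..$FFFFF (1 KB), which is consistent with the 32 longs per cog mentioned earlier.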

cgracey · 2018-04-30 13:51

Yes, allowance for 16 cogs would work on any possible Prop2 chip.

Cluso99 · 2018-04-30 23:36

None of that code is being set up by the ROM though?

So we can use that area to preload something??? Because all of a sudden, 16KB ROM doesn't seem that much

Cluso99 · 2018-05-01 00:55

Chip,
I seem to be unable to successfully test for a pull-down.

By chance, is there a weak pull-down enabled in the FPGA? And if so, is this also present in the real P2 silicon?

My mistake! Of course the pins without pull-downs will read as high

Here is my test code

test_pullups    mov     pullups, #0
                mov     pulldns, #0

                callpa  #cfg_dq,#check_pullup   'spi_dq pull-up?     load from SERIAL
        if_c    bith    pullups, #0             ' y:
                outh    pa                      'spi_dq pull-down?   ignore SERIAL
                callpa  #cfg_dq,#check_pulldn   '
        if_c    bith    pulldns, #0             ' y:

                callpa  #cfg_cs,#check_pullup   'spi_cs pull-up?     load from FLASH
        if_c    bith    pullups, #2             ' y:
                outh    pa                      'spi_cs pull-down? 
                callpa  #cfg_cs,#check_pulldn   '
        if_c    bith    pulldns, #2             ' y:

                callpa  #cfg_ck,#check_pullup   'spi_ck pull-up?     load from SD
        if_c    bith    pullups, #1             ' y:
                outh    pa                      'spi_ck pull-down?   
                callpa  #cfg_ck,#check_pulldn   '
        if_c    bith    pulldns, #1             ' y:

'+-----------------------------------------------------------------------------+
'+      Check pin pull-up (pull-down)                                          +
'+-----------------------------------------------------------------------------+
check_pullup    outl    pa                      'out bit low
check_pulldn    dirh    pa                      'drive pin low (out bit must be low)
                waitx   #30*1                   'wait >1us
                dirl    pa                      'float pin
                waitx   #30*5                   'wait >5us
        _ret_   testp   pa              wc      'sample pin, c=1 if pull-up
'+-----------------------------------------------------------------------------+

ozpropdev · 2018-05-01 02:17

Also your pulldns result will be the same as pullups result.

'should be
        if_nc    bith    pulldns, #0
'..etc

Cluso99 · 2018-05-01 03:09

ozpropdev wrote: »
Also your pulldns result will be the same as pullups result.
'should be
        if_nc    bith    pulldns, #0
'..etc

Yes, I did change that too.

Cluso99 · 2018-05-01 04:56

Using v32c fpga code, Windows 10, PST (115,200) and a BeMicroCV-A9

I cannot get the autobaud/Prop_Chk sequence to return anything. pnut will download code.

Maybe the timeout isn't long enough to manually type in the sequence. I tried to load a new ROM code with bigger timeout into $4C000 but that didn't work.

Can someone please confirm that the manual sequence "> " and then "Prop_Chk<cr>" does indeed work (or not) with W10 & PST?

jmg · 2018-05-01 05:09

Cluso99 wrote: »

Using v32c fpga code, Windows 10, PST (115,200) and a BeMicroCV-A9

I cannot get the autobaud/Prop_Chk sequence to return anything. pnut will download code.

Maybe the timeout isn't long enough to manually type in the sequence.

The code shows a 100 ms monostable that resets on each char, so I doubt you would be able to type that. Most terminals can paste strings, though?

My notes from earlier postings say the minimum send string is 19 chars, "> Prop_Chk 0 0 0 0 "
and the echo is 15 chars of CR+LF+"Prop_Ver Au"+CR+LF

You could also capture what pnut sends, as a sanity check.

Peter Jakacki · 2018-05-01 05:17

I tried that on a Win7 system with TeraTerm and typed fairly slowly, and it worked, responding with "Prop_Ver A", so I don't think it is timing out. There is something definitely wrong with Cluso's setup.

ozpropdev · 2018-05-01 05:18

Cluso99 wrote: »

Using v32c fpga code, Windows 10, PST (115,200) and a BeMicroCV-A9

I cannot get the autobaud/Prop_Chk sequence to return anything. pnut will download code.

Maybe the timeout isn't long enough to manually type in the sequence. I tried to load a new ROM code with bigger timeout into $4C000 but that didn't work.

Can someone please confirm that the manual sequence "> " and then "Prop_Chk<cr>" does indeed work (or not) with W10 & PST?

Running same setup here.
> Prop_Chk 0 0 0 0 works fine.

> Prop_Chk 0 0 0 0

Prop_Ver F

Edit: Make sure you issue a space after the ">" character.

Cluso99 · 2018-05-01 07:41

Yes, issuing a space after ">".
Gave up. It just won't work, yet I can download. I can also run my own code with absolutely no problems, which outputs characters, then reads and echoes correctly.

ozprop, what are the 0 0 0 0 after the Prop_Chk? Are they necessary??? I gave it "> Prop_Chk<cr>" and maybe "<lf>" (not sure what Enter gives on PST).

Postedit: Just spoke to Peter. Yep requires the 0 0 0 0

Peter Jakacki · 2018-05-01 08:01

You need the 0 0 0 0 as a "select all Props" mask I believe and I know it doesn't work if you leave them out.

Just copy and paste the next two lines (includes a CR)

> Prop_Chk 0 0 0 0

jmg · 2018-05-01 09:05

Peter Jakacki wrote: »
Just copy and paste the next two lines (includes a CR)
> Prop_Chk 0 0 0 0

I think it replies on the 4th 0, and does not need a trailing space, or <cr> ? oops, get_hex (of course) needs anything non-hex to exit, however ">" is swallowed inside INT as timing char.

One detail worth checking is to send a large file of many repeating 19 char blocks

> Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0

That repeat is what MCU booters, and PC hosts, can repeat for the shortest possible reset-boot times.

Test is to confirm P2 always autobauds and echoes ID(s), no matter what the reset exit phase is ?
The echo ID is 15 bytes for 19 bytes in, and I see a largish 1ms allowed for turnaround, which is 11.52 char times at 115200,
so that means once out of reset, P2 will echo one ID (15+12chT) for every two Prop_Chks (36chT) received.

A small turnaround is useful, but that 1ms sounds high, for Baud speeds of 115200~2MBd ?

If that is dropped to 1 Char time, the echo now always takes 15+1 chars, and gives a 3 char margin to catch the next _Chk - thus will echo on every _Chk, thanks to rx FIFO.

edit: corrected get_hex exit and TX:RX timing
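
The turnaround arithmetic above can be checked with a few lines of Python. This is a sketch only: it assumes 8N1 framing (10 bit times per character), and the constant and function names are made up for illustration.

```python
BITS_PER_CHAR = 10  # ASSUMPTION: 8N1 framing (start + 8 data + stop)

def char_times(seconds: float, baud: int) -> float:
    """Express a time interval as a number of character times at `baud`."""
    return seconds * baud / BITS_PER_CHAR

CMD_CHARS = 19     # "> Prop_Chk 0 0 0 0 "
REPLY_CHARS = 15   # CR LF "Prop_Ver Au" CR LF

pause_1ms = char_times(1e-3, 115_200)   # present turnaround, in char times
busy_present = REPLY_CHARS + pause_1ms  # pause plus reply, present scheme
busy_proposed = REPLY_CHARS + 1         # pause plus reply, 1-char turnaround

print(f"1 ms pause = {pause_1ms:.2f} char times")
print(f"present : {busy_present:.2f} char times busy vs {CMD_CHARS} per command")
print(f"proposed: {busy_proposed} char times busy, "
      f"margin {CMD_CHARS - busy_proposed}")
```

With the 1 ms pause the reply window is longer than one 19-char command, so back-to-back commands cannot each get a reply; with a 1-char pause the window fits with 3 char times to spare.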

Cluso99 · 2018-05-01 09:43

It needs a trailing whitespace (space, cr etc)

ozpropdev · 2018-05-01 09:51

jmg wrote:

One detail worth checking is to send a large file of many repeating 18 char blocks

Sending a repeating string of "> Prop_Chk 0 0 0 0 " responds to each command with "Prop_Ver F" as expected.

ozpropdev · 2018-05-01 10:10

Prop_Chk <INAmask> <INAdata> <INBmask> <INBdata>

A while back I hooked up 3 Nanos to a Prop Plug and configured assorted pull-ups/pull-downs to set their IDs.
I was able to load the Nanos separately and in parallel with ease.
Works great!
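
For anyone scripting this from a host, a minimal Python sketch of building the command string follows. The helper name `prop_chk` is hypothetical; the sketch assumes the four fields are sent as hex digits and that a trailing space is enough to terminate the last field, as discussed earlier in the thread.

```python
def prop_chk(ina_mask: int = 0, ina_data: int = 0,
             inb_mask: int = 0, inb_data: int = 0) -> bytes:
    """Build a '> Prop_Chk ...' command with a trailing space terminator."""
    fields = " ".join(f"{value:X}" for value in
                      (ina_mask, ina_data, inb_mask, inb_data))
    return f"> Prop_Chk {fields} ".encode("ascii")

command = prop_chk()          # all-zero masks: every attached chip answers
print(command, len(command))  # the all-zero form is the 19-char string
```

The bytes would then be written to whatever serial port object the host uses; that part is left out because it depends on the setup.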

jmg · 2018-05-01 20:43

ozpropdev wrote: »

jmg wrote:

One detail worth checking is to send a large file of many repeating 18 char blocks

Sending a repeating string of "> Prop_Chk 0 0 0 0 " responds to each command with "Prop_Ver F" as expected.

hmm... Was that a packed/repeating string, no gaps ? - and every _Chk gave a _Ver ? ie you checked counts ?

That's not quite what I'd expect from the 1ms TX pause, as that adds 11.52 chars(!) to the 15ch reply, and now the rx block IN(19) is smaller than the tx block OUT(11.52+15), and the Rx fifo overruns/wraps

If there is instead a 1-char time TX pause, now the Tx block fits inside the Rx timing, and you can get a _Ver for every _Chk

present 1ms turnaround (w..w = wait, varies with baud rate, so unpredictable behaviour) RxFIFO - 16 bytes
RX:    > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 
FIFO:  -000000000000000000-123456789abcdefWWWWWWWWWW??
TX:                       wwwwwwwwwwwclProp_Ver Acl                                wwwwwwwwwwwclProp_Ver Acl

proposed 1 char turnaround (w=wait)
RX:    > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 > Prop_Chk 0 0 0 0 
FIFO:  -000000000000000000-123456789abcde0000-123456789abcde0000-123456789abcde000
TX:                       wclProp_Ver Acl    wclProp_Ver Acl    wclProp_Ver Acl   wclProp_Ver Acl

ozpropdev · 2018-05-02 03:50

jmg wrote: »

hmm... Was that a packed/repeating string, no gaps ? - and every _Chk gave a _Ver ? ie you checked counts ?

The test sent 10 x "> Prop_Chk 0 0 0 0 " @ 115200 with a 3 ms gap.
10 x "Prop_Ver F" responses were returned.
As the boot loader isn't a full duplex arrangement I wasn't expecting it to respond during the response message transmission.
Sorry I was a bit vague on that detail.
