Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

Dave Hein · 2018-12-04 20:25

evanh, could you explain in more detail what you're suggesting? The ORG directive currently tells the assembler to go into cog mode, and it sets the starting cog address. What else should it do?

evanh · 2018-12-05 01:49

I'm fine with ORG, that was just a passing remark. It's LOC that needs the work.

EDIT: I'd call it a base address rather than start address. "Start" might be mistaken for start of execution.
EDIT2: Hmm, base is wrong too, ORG is not a relative thing at all. Section origin then.

Dave Hein · 2018-12-05 02:05

I added a warning in p2asm when LOC is used with a relative address. I think that should be sufficient. Maybe PNut should add that as well.

evanh · 2018-12-05 02:20

Thanks.

potatohead · 2018-12-05 18:30

evanh wrote: »

It's not base-relative but PC-relative. PC-relative only makes sense for actual branches.

That really isn't true. One of the things Chip made explicit early on was the fact that data can be intermingled with code.

ersmith · 2018-12-05 19:05

Dave Hein wrote: »

I added a warning in p2asm when LOC is used with a relative address. I think that should be sufficient. Maybe PNut should add that as well.

I still don't quite see the danger of using LOC with a relative address, at least one above $400. After this code:

   orgh $400
   loc pa, #@label  ' relative addressing
   loc pb, #\@label ' absolute addressing
   cogstop #0
label
   long 1

PA and PB should have the same value. Am I missing something?

Dave Hein · 2018-12-05 21:21

PA will contain $40C-$400 = $C. PB will contain $40C.

Dave Hein · 2018-12-05 21:28

I was wrong. PA will contain a value of 8. Here's the listing from p2asm:

                   dat
00400                 orgh $400
3: WARNING: Relative mode used with LOC instruction
00400     fe900008    loc pa, #@label  ' relative addressing
00404     fea0040c    loc pb, #\@label ' absolute addressing
00408     fd640003    cogstop #0
0040c              label
0040c     00000001    long 1

As I said before, there might be value in using the difference for position-independent code.

Dave Hein · 2018-12-05 21:36

OK, I was wrong. I ran this under spinsim, and I got PA=$1020 and PB=$1030. This may be correct, or p2asm may be wrong, or spinsim might be wrong. I'll have to check the binary with PNut's binary.

ozpropdev · 2018-12-05 21:49

Both PA/PB will contain $408 $40C

Edit: typo

ersmith · 2018-12-05 22:48

Dave Hein wrote: »

OK, I was wrong. I ran this under spinsim, and I got PA=$1020 and PB=$1030. This may be correct, or p2asm may be wrong, or spinsim might be wrong. I'll have to check the binary with PNut's binary.

p2asm looks OK, it produces the same thing as fastspin does, and when I run the result on the FPGA both pa and pb have the same value. I've attached the code I used for testing: foo.bas is the original source, foo.spin2 is the raw PASM produced by fastspin, foo.lst is the listing file that p2asm produces when it compiles foo.spin2. The output is:

$ bin/loadp2 foo.binary -t
[ Entering terminal mode.  Press ESC to exit. ]
getting values
pa=1064 pb=     1064

which is correct (1064 = $428, which is where the label ends up in memory).

Note that there's a bug in fastspin 3.9.10 such that it cannot handle @ and \ in inline assembly. That's fixed in the current github sources, so you'll need to use those if you want to regenerate foo.spin2.

ozpropdev · 2018-12-05 23:04

Checked in Pnut

 orgh $400
   loc pa, #@label  ' relative addressing
   loc pb, #\@label ' absolute addressing
   cogstop #0
label

shows PA = $8 PB = $40c

00400- 08 00 90 FE 0C 04 A0 FE 03 00 64 FD 00 00 00 00   '..........d.....'

but this code shows PA = $40C and PB = $40C

orgh	$400
	org
	loc	pa,#@label
	loc	pb,#\@label
	cogstop	#0
label

shows

00400- 0C 04 80 FE 0C 04 A0 FE 03 00 64 FD 00 00 00 00   '..........d.....'

Edit: Pnut switches to absolute because the ORG directive causes a domain crossing.

ersmith · 2018-12-06 00:40

ozpropdev wrote: »

Checked in Pnut

 orgh $400
   loc pa, #@label  ' relative addressing
   loc pb, #\@label ' absolute addressing
   cogstop #0
label

shows PA = $8 PB = $40c

00400- 08 00 90 FE 0C 04 A0 FE 03 00 64 FD 00 00 00 00   '..........d.....'

I think you're confusing the instruction encoding with what is actually put in the register when the instruction executes. If you execute the "relative addressing" version of the loc instruction, the PC after the instruction ($404) is added to the offset ($8 in this case) to get the final value of $40c. In other words, at run time PA and PB will end up with the same value of $40c in them when the two instructions execute.

(Try it!)

Cluso99 · 2018-12-06 01:02

ersmith wrote: »

I think you're confusing the instruction encoding with what is actually put in the register when the instruction executes. If you execute the "relative addressing" version of the loc instruction, the PC after the instruction ($404) is added to the offset ($8 in this case) to get the final value of $40c. In other words, at run time PA and PB will end up with the same value of $40c in them when the two instructions execute.

(Try it!)

WHAT ?!?!

evanh · 2018-12-06 01:18

hehe ... isn't it lovely ... come join me in my torment ... hahaha

ersmith · 2018-12-06 01:21

Cluso99 wrote: »

WHAT ?!?!

Look at foo.spin2 and/or foo.lst that I posted a few pages back (that's foo.bas converted to PASM2 by fastspin). The relevant instructions are:

                   ' 
                   ' sub getlabelvals()
00408              _getlabelvals
                   '   asm
00408     fe90001c 	loc	pa, #@label
0040c     fea00428 	loc	pb, #\@label
00410     f6006df6 	mov	_var_00, pa
00414     f6006ff7 	mov	_var_01, pb
                   '   paval = x
00418     fc606c2b 	wrlong	_var_00, objptr
                   '   pbval = y
0041c     f1045604 	add	objptr, #4
00420     fc606e2b 	wrlong	_var_01, objptr
00424     f1845604 	sub	objptr, #4
                   ' label:
00428              label
00428              _getlabelvals_ret
00428     fd64002e 	reta

Note that the first loc is encoded as $fe90001c (relative addressing) whereas the second loc is encoded as $fea00428 (absolute addressing). At runtime they both put $428 into the respective registers, as is proven by the program output.

The reason is simple: the PC relative "loc" instruction adds the next PC (PC+4) to the offset to get the value to put into the register, just like a relative "jmp" adds the next PC to the offset to get the new PC. So the first loc, at address $408, adds $40c to the offset $1c to get the final value $428.

Note that it isn't *just* the offset that is different in the two loc encodings, there's actually a bit in the instruction that says whether the offset is absolute or relative.

You should be able to assemble and run foo.spin2 with PNut to verify this. Actually maybe not, it may use @@@, so you may have to use fastspin or p2asm. But all 3 assemblers agree about the encoding of the LOC instructions, so this isn't some quirk of fastspin or p2asm, it's the way the hardware works.
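
The arithmetic above can be cross-checked with a minimal Python sketch that decodes a LOC encoding and computes the value a hubexec LOC would load. This is a model under assumptions, not code from PNut, p2asm, or fastspin: the helper name `loc_result` is hypothetical, and the field layout (bits 22..21 selecting PA/PB/PTRA/PTRB, bit 20 as the relative flag, bits 19..0 as a 20-bit address or signed offset) is inferred from the encodings quoted in this thread.

```python
REGS = ("pa", "pb", "ptra", "ptrb")

def loc_result(opcode, address):
    """Model a hubexec LOC at hub `address`; return (register, value loaded)."""
    reg = REGS[(opcode >> 21) & 0x3]      # assumed: bits 22..21 pick the register
    relative = (opcode >> 20) & 1         # assumed: bit 20 set means PC-relative
    field = opcode & 0xFFFFF              # 20-bit address or offset
    if relative:
        if field & 0x80000:               # sign-extend a negative offset
            field -= 0x100000
        value = (address + 4 + field) & 0xFFFFF   # next PC plus offset
    else:
        value = field                     # absolute: the field is the address
    return reg, value

# Encodings and addresses taken from the listings quoted in the thread.
for op, addr in [(0xFE900008, 0x400), (0xFEA0040C, 0x404),
                 (0xFE90001C, 0x408), (0xFEA00428, 0x40C)]:
    reg, value = loc_result(op, addr)
    print(f"${addr:03x}: ${op:08x} -> {reg} = ${value:03x}")
```

With these inputs both encodings in each pair come out to the same register value ($40c for the first listing, $428 for the second), which matches what was reported from the FPGA. The sketch covers hubexec only; cogexec steps the PC by 1 rather than 4 and is not modelled here.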

evanh · 2018-12-06 01:38

ersmith wrote: »

The reason is simple: the PC relative "loc" instruction adds the next PC (PC+4) to the offset to get the value to put into the register, just like a relative "jmp" adds the next PC to the offset to get the new PC. So the first loc, at address $408, adds $40c to the offset $1c to get the final value $428.

Oh, oops, I've not been examining the final register content ... and I was convinced I was too, damn ...

Dave Hein · 2018-12-06 01:53

I get $40C for both cases when running on the FPGA. However, spinsim seems to be confused. It produces $1020 and $1030. It's shifting the value up by 2 bits, which means it must think it's in the COG mode.

If I move the routine to a different location other than $400 I get the correct value in the relative mode, but an incorrect value in the absolute mode. This kind of shows the value of having position-independent code. It appears that my linker isn't adjusting the address for the absolute mode. It doesn't surprise me since I don't recall handling relocation for the LOC command.

I'm going to take the warning print out for the LOC command.
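
The 2-bit shift mentioned above can be checked with a few lines of Python. This only confirms the arithmetic relationship between the spinsim values and the hub addresses in the listing; it is not a statement about where in spinsim the shift happens, and the variable names are illustrative.

```python
# Values reported from spinsim in this thread.
spinsim_pa, spinsim_pb = 0x1020, 0x1030

# Undo a shift of 2 bits (cog long address -> byte address scaling).
unshifted_pa = spinsim_pa >> 2
unshifted_pb = spinsim_pb >> 2

print(f"PA: ${spinsim_pa:x} >> 2 = ${unshifted_pa:x}")
print(f"PB: ${spinsim_pb:x} >> 2 = ${unshifted_pb:x}")
```

PB unshifts to $40c, the label address in the listing; PA unshifts to $408.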

ozpropdev · 2018-12-06 03:09

I get $40C in both cases on silicon.

evanh · 2018-12-06 03:58

The specified ORG addresses are for the "label". There are actually three labels, one for each ORG case.

===================================================
 LOC/MOV syntax        PA register results
 from hubRAM         ORG $0F  ORGH $110  ORGH $600
===================================================
loc  pa, #label     0000000f   00000110   00000600
loc  pa, #@label    0000003c   00000110   00000600
loc  pa, #\label    0000000f   00000110   00000600
loc  pa, #\@label   0000003c   00000110   00000600
mov  pa, ##label    0000000f   00000110   00000600
mov  pa, ##@label   0000003c   00000110   00000600
===================================================
 LOC/MOV syntax        PA register results
 from cogRAM         ORG $0F  ORGH $110  ORGH $600
===================================================
loc  pa, #label     000fffee   000003e9   00000600
loc  pa, #@label    00000081   000003c8   00000600
loc  pa, #\label    0000000f   00000110   00000600
loc  pa, #\@label   0000003c   00000110   00000600
mov  pa, ##label    0000000f   00000110   00000600
mov  pa, ##@label   0000003c   00000110   00000600

evanh · 2018-12-06 04:16

Here's code for one line:

		loc     pa, #\@str1
		call    #puts
		loc     pa, #palette1
		call    #itoh
		call    #putsp
		loc     pa, #palette2
		call    #itoh
		call    #putsp
		loc     pa, #palette3
		call    #itoh
		call    #putnl

evanh · 2018-12-06 04:28

Apologies on the PC-relative complaint. I was way off there.

There is still the bug in Pnut though. It is in the cogexec LOC instruction encoding for PC-relative encoding below absolute $400. I guess that's where Cluso came unstuck and got me digging.

evanh · 2018-12-06 04:32

Here's another one:
I've just been experimenting with building some diagnostic code and discovered it would be nice to know if the caller code was from cogexec or hubexec. A third status bit in the stacked address maybe.

In this case I'm wanting a subroutine to extract the encoding of the instruction prior to the call. If I don't know whether the caller was in cogexec at the time or not then I can't calculate the relative address from the call stack.

EDIT: Ah, forgot that code can't execute below $400 in hubRAM. That should be enough ...

EDIT2: And working source code:

		pop     char                  'grab caller address
		push    char                  'restack it

		cmp     char, ##$400   wcz    'test if caller was cogexec or hubexec, C = borrow of (D - S)

if_c		sub     char, #2              'was cogexec
if_c		alts    char                  'MOV indirection - get register content of register number in "char"
if_c		mov     pa, 0-0

if_nc		sub     char, #8              'was hubexec
if_nc		rdlong  pa, char
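
The same decision can be modelled in a few lines of Python. This is only a sketch of the address arithmetic in the PASM2 above: `prior_instruction` is a hypothetical helper, and it assumes the value passed in is just the return address with no other bits set.

```python
HUBEXEC_START = 0x400  # code cannot execute from hubRAM below this address

def prior_instruction(return_address):
    """Return (mode, address) of the instruction just before the CALL."""
    if return_address < HUBEXEC_START:
        # cogexec: addresses count longs, so the return address is call + 1
        # and the instruction before the call is at return_address - 2.
        return "cogexec", return_address - 2
    # hubexec: addresses count bytes, so the return address is call + 4
    # and the instruction before the call is at return_address - 8.
    return "hubexec", return_address - 8

for ret in (0x012, 0x410):
    mode, addr = prior_instruction(ret)
    print(f"return ${ret:03x}: {mode}, prior instruction at ${addr:03x}")
```

In the cogexec case the result is a register number to be read indirectly (the ALTS/MOV pair above); in the hubexec case it is a byte address to be read with RDLONG.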

evanh · 2018-12-06 06:04

Here's the results of above:

 LOC/MOV syntax         ORG $00F           ORGH $00110         ORGH $00600
 from hubRAM         op-code  PA-data    op-code  PA-data    op-code  PA-data
==============================================================================
loc  pa, #label     fe80000f 0000000f   fe800110 00000110   fe900198 00000600
loc  pa, #@label    fe80003c 0000003c   fe800110 00000110   fe900174 00000600
loc  pa, #\label    fe80000f 0000000f   fe800110 00000110   fe800600 00000600
loc  pa, #\@label   fe80003c 0000003c   fe800110 00000110   fe800600 00000600
mov  pa, ##label    f607ec0f 0000000f   f607ed10 00000110   f607ec00 00000600
mov  pa, ##@label   f607ec3c 0000003c   f607ed10 00000110   f607ec00 00000600

 LOC/MOV syntax         ORG $00F           ORGH $00110         ORGH $00600
 from cogRAM         op-code  PA-data    op-code  PA-data    op-code  PA-data
==============================================================================
loc  pa, #label     fe9fffe0 000ffff7   fe9003dc 000003f5   fe800600 00000600
loc  pa, #@label    fe900070 00000090   fe9003b8 000003da   fe800600 00000600
loc  pa, #\label    fe80000f 0000000f   fe800110 00000110   fe800600 00000600
loc  pa, #\@label   fe80003c 0000003c   fe800110 00000110   fe800600 00000600
mov  pa, ##label    f607ec0f 0000000f   f607ed10 00000110   f607ec00 00000600
mov  pa, ##@label   f607ec3c 0000003c   f607ed10 00000110   f607ec00 00000600

evanh · 2018-12-06 08:20

Wow, that detail is needed. Pnut is making a mess of the PC-relative LOC encodings. Only six of the twelve PC-relative combinations above are correct. Even two of the hubexec encodings ($fe800110 is an absolute encoding) are wrong because they use absolute encoding below $400 in hubRAM, where it should still be PC-relative.

Or is that case intentional because hubexec can't go there?
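
To see at a glance which op-codes in the two tables are relative and which are absolute, bit 20 of each one can be tested. The Python below is a sketch that does this for the `#label` and `#@label` rows only. Treating bit 20 as the relative flag is inferred from the $fe9.../$fe8... pairs discussed earlier in the thread, and the dictionary layout is just for illustration.

```python
# Op-codes copied from the "#label" and "#@label" rows of the two tables,
# in column order: ORG $00F, ORGH $00110, ORGH $00600.
OPCODES = {
    "hubRAM": [0xFE80000F, 0xFE800110, 0xFE900198,
               0xFE80003C, 0xFE800110, 0xFE900174],
    "cogRAM": [0xFE9FFFE0, 0xFE9003DC, 0xFE800600,
               0xFE900070, 0xFE9003B8, 0xFE800600],
}

def is_relative(opcode):
    """True when the (assumed) relative flag, bit 20, is set."""
    return bool(opcode & (1 << 20))

for source, ops in OPCODES.items():
    kinds = ["rel" if is_relative(op) else "abs" for op in ops]
    print(source, kinds)
```

This only classifies the encodings; whether a given choice is the right one for its label is the question being argued in the posts.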

Cluso99 · 2018-12-06 08:40

evanh wrote: »

Wow, that detail is needed. Pnut is making a mess of the PC-relative LOC encodings. Only six of the twelve PC-relative combinations above are correct. Even two of the hubexec encodings ($fe800110 is an absolute encoding) are wrong because they use absolute encoding below $400 in hubRAM, where it should still be PC-relative.

Or is that case intentional because hubexec can't go there?

LOC is also usable to obtain the hub address of a table, which can reside below or above hub $400.

That is one of the problems I found - the hard way as I wasted a whole day trying to find a bug in my program.

evanh · 2018-12-17 02:40

Chip,
I've bumped into a design flaw/bug in lutRAM sharing! RDLUT data, or address, is being garbaged if the sharing cog WRLUTs to the same address on the same sysclock.

In my case, an instruction stall would also mess me up but it would have to be a number of clocks to produce the result I'm getting.

PS: I'm very certain. Testing is on P123 board with v32i image loaded.

jmg · 2018-12-17 02:47

evanh wrote: »

Chip,
I think I've bumped into a design flaw/bug in lutRAM sharing! RDLUT data, or address, is being garbaged if the sharing cog WRLUTs to the same address on the same sysclock.

In my case, an instruction stall would also mess me up but it would have to be a number of clocks to produce the result I'm getting.

Do you mean RDLUT from COGn, occurring on the same address, and on the same sysclk as the WRLUT from COGm, is corrupted ?
ie it is neither the old value, nor the new value ?
Do you have some test code that reproduces this ?

evanh · 2018-12-17 02:52

jmg wrote: »

ie it is neither the old value, nor the new value ?

Definitely not the new value. I don't think an old value could upset things the way it has because it will be the same every time and the importance of the data is metronomic ...

Do you have some test code that reproduces this ?

It's messy and non-specific.

evanh · 2018-12-17 02:58

The source as is:

'==================================
' paired mailbox
'==================================
ORG $3fe
monitor         res     1
duration        res     1


'==================================
' Sinc3 filter (cogexec, paired)
'==================================
ORG
start_sinc3
cid		cogid   cid
		testb   cid, #0         wz
if_z		jmp     #start_monitor        'identical code on paired cogs

		wrpin   ##%0111_0000_000_0000100000000_00_01111_0, #mpin
		                'adc/counter mode, bitstream is #tpin, clock input is #mpin (#tpin+1)
		wypin   #0, #mpin             'inc on high
		wxpin   #0, #mpin             'totaliser
		dirh    #mpin                 'enable smart pin

'Sinc3 loop (8 sysclocks)
		rep     @.lend, #0            'loop forever

		rdpin   acc1, #mpin
		add     acc2, acc1
		add     acc3, acc2
		wrlut   acc3, #(monitor & $1ff)           'for the decimator (lut sharing is active)
'		add     acc4, acc3
'		wrlut   acc4, #(monitor & $1ff)           'for the decimator (lut sharing is active)
.lend
		cogstop cid



acc1		long    0
acc2		long    0
acc3		long    0
acc4		long    0
diff1		long    0
diff2		long    0
diff3		long    0
diff4		long    0
period		long    400                   'max 1024 clocks, needs 30-bit registers


'============================
' Monitor (cogexec, paired)
'============================
start_monitor
		lutson
samp		wrpin   ##%1010000000000_00_00010_0, #1
				'set DAC mode for DAC1/Blue, 16-bit dither smartpin mode
offset		wxpin   #1, #1                'continuous dither
scale		wypin   #0, #1                'DAC level
		dirh    #1                    'enable DAC

delay		setse1  #%110_000000|rx_pin     'high IN from smartpin

tick		getct   tick
key		addct1  tick, period          'start the clock!

'keyboard controls
'===================
.keyboard
		rdpin   key, #rx_pin
		shr     key, #32-8           'lsbit align

'scaling recalculation
		encod   scale, period
		shr     scale, #1
		mul     scale, #8

		cmp     key, #"+"      wz
if_z		add     offset, #511
		cmp     key, #"-"      wz
if_z		sub     offset, #511

		cmp     key, #"*"      wz
if_z		add     period, scale
		cmp     key, #"/"      wz
if_z		sub     period, scale
		fles    period, #504           'max of 1024 clocks, multiple of 8
		fges    period, #32            'fastest monitor loop time, multiple of 8

		mov     delay, period
		sub     delay, #25
		waitx   #6                     'lutRAM sharing flaw!!  Change the #6 to hide it

'monitor loop
'==============
.monl
'		waitct1
'		addct1  tick, period

'		rep     @.monend, #0          'loop forever

		waitx   delay                 '2
		rdlut   samp, #(monitor & $1ff) '5
		sub     samp, diff1           '7
		add     diff1, samp           '9
		sub     samp, diff2           '11
		add     diff2, samp           '13
		sub     samp, diff3           '15
		add     diff3, samp           '17
'		sub     samp, diff4
'		add     diff4, samp

'non-smartpin mode
'		shr     samp, scale           '     scale signal to fit DAC trace
'		add     samp, offset          '     offset centring
'		shl     samp, #8              '     align for 8-bit DAC1
'		setdacs samp
'smartpin mode
'		shr     samp, scale           '     scale signal to fit DAC trace
		add     samp, offset          '19   offset centring
		wypin   samp, #1              '21   smartpin 16-bit dither into DAC1
'		jse1   #.keyboard             '23   branch on event reduces monitor loop by one instruction
.monend

		jnse1   #.monl                '25   branch on event reduces monitor loop by one instruction
		jmp     #.keyboard

		cogstop cid
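
For readers unfamiliar with the filter structure in the source above, here is a simplified Python model of a third-order sinc (CIC) decimator: three running sums, then three differences taken once per output sample, with 32-bit wraparound as in cog registers. It is a sketch under assumptions: the function name `sinc3`, the single common sample rate for all three integrators, and the `ratio` parameter are illustrative. It does not model the smart pin, the LUT mailbox, or the sharing problem being reported.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide and wrap on overflow

def sinc3(samples, ratio):
    """Third-order CIC decimator: 3 integrators, then 3 differentiators."""
    acc1 = acc2 = acc3 = 0
    diff1 = diff2 = diff3 = 0
    out = []
    for n, x in enumerate(samples, start=1):
        # Integrator chain, run on every input sample.
        acc1 = (acc1 + x) & MASK32
        acc2 = (acc2 + acc1) & MASK32
        acc3 = (acc3 + acc2) & MASK32
        if n % ratio == 0:
            # Differentiator chain, run once per output sample. Each stage
            # subtracts the previous stage input, then stores the new one
            # (the same sub/add pairing as in the monitor loop).
            samp = acc3
            samp = (samp - diff1) & MASK32
            diff1 = (diff1 + samp) & MASK32
            samp = (samp - diff2) & MASK32
            diff2 = (diff2 + samp) & MASK32
            samp = (samp - diff3) & MASK32
            diff3 = (diff3 + samp) & MASK32
            out.append(samp)
    return out

print(sinc3([1] * 40, 8))   # [120, 456, 512, 512, 512]
```

With a constant input of ones the output settles to `ratio ** 3` (512 for a ratio of 8) from the third output onward, which is the DC gain of this filter structure.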
