Problem with LOCLONG instruction

ozpropdev · 2014-03-22 03:11

Hi All

I'm having a problem with the LOCLONG instruction.
If the instruction sits at certain addresses it fails.
The code below looks up a long in a table and flashes an LED representing the value. (DE2-115 FPGA)
If I insert spacer NOPs or any other instruction to move the LOCLONG away from those addresses it works.
I kept inserting spacers and a pattern emerged in the bad addresses.
The bad addresses were $E18, $E38, $E58. In another program (Toolbox) I got a similar pattern: $2338, $2358, $2378.
The common pattern seems to be bits 3 and 4 set in the absolute address.
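
A quick way to check an address against this pattern, as a minimal Python sketch (Python is used only for illustration; the helper name `in_bad_slot` is hypothetical). One caveat from the listing below: with one extra NOP the LOCLONG lands at $E1C, which also has bits 3 and 4 set but is marked ok, so the sketch assumes the tighter condition that the low five address bits equal $18.

```python
# Byte addresses reported as failing in the two programs mentioned above.
BAD_ADDRESSES = [0xE18, 0xE38, 0xE58, 0x2338, 0x2358, 0x2378]

LOW_FIVE_BITS = 0x1F   # selects the byte offset within a 32-byte block
BAD_OFFSET = 0x18      # bits 3 and 4 set, bit 2 clear (assumed tighter condition)


def in_bad_slot(byte_address):
    """Return True when the low five bits of the byte address equal $18."""
    return (byte_address & LOW_FIVE_BITS) == BAD_OFFSET


for addr in BAD_ADDRESSES:
    print(f"${addr:X}: matches pattern = {in_bad_slot(addr)}")
```

All six reported addresses match, while neighbours such as $E14 and $E1C do not.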

dat		orgh	$e00/4	'F11 to run from PNUT
		org
		jmp	@hub_code

bx		long	0
timer		long	0
delay		long	40_000_000

		orgh
hub_code	mov	bx,#2	'set item index = 2 (list value 4)

		nop	'fail bug address = $e18
		nop	'ok
		nop	'ok
		nop	'ok
		nop	'ok
		nop	'ok
		nop	'ok
		nop	'ok
		nop	'fail bug address = $e38
		nop	'ok
		nop	'ok
		nop	'ok
		nop	'ok
		nop	'ok
		nop	'ok
		nop	'ok
		nop	'fail bug address = $e58
		nop	'ok

bug		loclong	bx,@list
		rdlong	bx,bx

		getcnt	timer
		add	timer,delay
		waitcnt	timer,delay

:loop		setp	#32
		waitcnt	timer,delay
		clrp	#32
		waitcnt	timer,delay
		djnz	bx,@:loop

		jmp	#$

list		long	3	'index 0
		long	1	'index 1
		long	4	'index 2
		long	1	'index 3
		long	5	'index 4
		long	9	'index 5

If OK, the LED flashes 4 times; a fail flashes more times than I can be bothered to count!
Am I missing something?

Cheers
Brian

Cluso99 · 2014-03-22 05:30

What is more interesting is that it is the same long of the 8-long WIDE instruction cache line in each case (long 6, counting from 0).

ozpropdev · 2014-03-22 17:51

FYI Chip,
I just went back to the previous FPGA Build (6 Feb 2014) and the problem is still there.
Cheers
Brian

Bill Henning · 2014-03-22 19:13

I just tested my code to see if I'd have problems.

I forced the locptra to start out at $400 ($1000 byte address)

then I added nops to test all 8 longs in the cache line $400-$407

locptra worked in every slot.

Why don't you snag the bit-banged tx (verified working in cog and hubexec) then you won't have to count blinking lights... which are still very helpful tools!

CON
		_rx	= 91
		_tx	= 90
DAT
		orgh	$380

start		org

		clkset	#$FF		'set 80MHz
		
		'setsera	ser_parms, bit_time
		clrp	#_tx		' Enable _tx pin
		mov	dirb,leds

		mov	outb,#1		' if we don't get to another outb, we jumped to never-never land
		jmp	#hubexec

		nop
		nop
		nop

ser_parms	long	%10 << 16 + _rx << 9 + %10 << 7 + _tx
bit_time	long	80_000_000 / 115200
ch		long	0
leds		long    $FFF
y		long 	0
x		long	0
w		long	0
count		long	0
    
'----------------------------------------------------------------

		orgh $400			' force locptra to $400 - pass

		nop	' force locptra to $401 - pass
		nop	' force locptra to $402 - pass
		nop	' force locptra to $403 - pass
		nop	' force locptra to $404 - pass
		nop	' force locptra to $405 - pass
		nop	' force locptra to $406 - pass
		nop	' force locptra to $407 - pass

hubexec		locptra #hello_world		' orgh, F10 download works  $38A 

 
		mov	outb,##hubexec		' $38B/C
'		mov	outb,##hello_world	
'		getptra	outb
'		nop	

loop		rdbyte	ch, ptra++ wz		' $38D
		mov	x,ch
	if_nz	call	#tx 			' $38E
'	if_nz	serouta	ch 			' $38E
	if_nz	jmp	#loop			' $38F

		jmp	@hubexec		' $390

hello_world	byte	"Hello World!",13,10,0	' $391

bub		serouta #64
		jmp	#bub

tx              shl     x,#1                    ' insert start bit
                setb    x,#9                    ' set stop bit
		mov	count,#10
                getcnt  w                       ' get initial time
:loop           add     w,bit_time              ' add bit period to time
                shr     x,#1  wc
                setpc   #_tx                    ' write c to tx pin
		'passcnt  w
                waitcnt w,#0                    ' loop until bit period elapsed
                jnz	x,@:loop                 ' loop until 10 bits done
		add     w,bit_time              ' add bit period to time
                waitcnt w,#0                    ' loop until bit period elapsed
                ret
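
For readers less familiar with bit-banged serial, here is a minimal Python sketch of the framing the tx routine above performs. It mirrors only the shift, set-bit and loop logic visible in the listing; the function name `frame_bits` is made up, and no timing or pin I/O is modelled.

```python
CLOCK_HZ = 80_000_000
BAUD = 115_200
BIT_TIME = CLOCK_HZ // BAUD   # clocks per bit, same expression as bit_time above


def frame_bits(byte):
    """Return the line levels sent for one byte, in transmit order."""
    x = (byte & 0xFF) << 1    # shl x,#1   -> bit 0 becomes the start bit (0)
    x |= 1 << 9               # setb x,#9  -> bit 9 is the stop bit (1)
    bits = []
    while x:                  # jnz x,@:loop -> ends once the stop bit is shifted out
        bits.append(x & 1)    # shr x,#1 wc / setpc -> LSB goes to the pin
        x >>= 1
    return bits


print(frame_bits(ord("A")))   # start bit, 8 data bits LSB first, stop bit
print(BIT_TIME)
```

Because the stop bit is the highest set bit, the loop always runs exactly 10 times for an 8-bit value, which is why the routine does not need its `count` variable.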

ozpropdev · 2014-03-22 23:23

Looking at the issue a bit deeper it appears to be related to instruction pre fetch.

With pre fetch on (Cog start up default) the code works ok on first pass then fails thereafter.

00000EDC 00000004
00000084 FCFD000D
00000084 FCFD000D
00000084 FCFD000D
00000084 FCFD000D
00000084 FCFD000D
00000084 FCFD000D

With pre fetch off the first pass fails and every other pass works.

00000084 FCFD000D
00000EDC 00000004
00000EDC 00000004
00000EDC 00000004
00000EDC 00000004
00000EDC 00000004
00000EDC 00000004

This is with the LOCLONG aligned to hub addresses with bits 3 and 4 set. (As described in first post above)

con

	tx=90
	rx=91
	baudrate = 9600

dat		orgh	$e00/4	'F11 to run from PNUT
		org
		jmp	@hub_code

ax		long	0
bx		long	0
cx		long	0
val		long	0

timer		long	0
delay		long	10_000_000
delay2		long	80_000_000 * 20

config		long	%10 << 16 | rx << 9 | %10 << 7 | tx
bittime		long	80_000_000 / baudrate

		orgh

hub_code	setsera	config,bittime
		clrp	#tx

	'	icachep		'works first pass then fails thereafter
		icachen		'fails first pass then works thereafter

		long	0[5]
	'	long	0[8]

		getcnt	timer
		add	timer,delay2
		waitcnt	timer,delay


again		mov	bx,#2

bug		loclong	bx,@list

		rdlong	cx,bx

		mov	val,bx
		call	@show_hex
		serouta	#32
		mov	val,cx
		call	@show_hex
		serouta	#13

		getcnt	timer
		add	timer,delay
		and	cx,#15

:loop		setp	#32
		waitcnt	timer,delay
		clrp	#32
		waitcnt	timer,delay
		djnz	cx,@:loop

		getcnt	timer
		mov	cx,delay2
		shr	cx,#3
		add	timer,cx
		waitcnt	timer,cx
	
		jmp	@again


show_hex	reps	#8,#6
		nop
		getnib	ax,val,#7
		cmp	ax,#9 wz,wc
	if_a	add	ax,#"A"-10
	if_be	add	ax,#"0"
		serouta	ax
		shl	val,#4
		ret

list		long	3	'index 0
		long	1	'index 1
brian		long	4	'index 2
		long	1	'index 3
		long	5	'index 4
		long	9	'index 5

I'll keep digging....

Brian

Bill Henning · 2014-03-23 06:22

Very interesting... especially since LOCPTRA does not seem to exhibit the same issue.

I wonder if the related LOCWORD LOCBYTE LOCINST have the same issue?

ozpropdev · 2014-03-23 06:40

Bill Henning wrote: »

Very interesting... especially since LOCPTRA does not seem to exhibit the same issue.

I wonder if the related LOCWORD LOCBYTE LOCINST have the same issue?

Bill
Identical fault for LOCWORD and LOCBYTE.
I'm looking at LOCINST now...

Bill Henning · 2014-03-23 06:50

I suspected as much.

LOCBASE is also part of the same group...

ozpropdev wrote: »

Bill
Identical fault for LOCWORD and LOCBYTE.
I'm looking at LOCINST now...

ozpropdev · 2014-03-23 07:13

Bill Henning wrote: »

I suspected as much.

LOCBASE is also part of the same group...

As expected LOCBASE is the same as the others.
BTW What is LOCINST supposed to do? Wasn't in the last Docs.

ctwardell · 2014-03-23 07:46

Tried this on the nano by changing the LED on P32 to an LED I have on P1.
I see the same behavior.

I like the number sequence, makes me hungry for pie...

C.W.

ctwardell · 2014-03-23 12:47

I played around with this a little more this afternoon.

I have a nano so I modified the original example so that it saves a couple of test values to the hub and then launches the monitor so they can be reviewed.

It looks like in the cases where LOCLONG fails it is using the index value from D and the relative offset from S but is failing to include the absolute address of the instruction itself.

Examples:

When label bug is at address $e54 the value returned by the LOCLONG is $e70 which points at the correct value in the list.
When label bug is at address $e58 the value returned by the LOCLONG is $1C which is $e74 - $e58. ($e74 is the expected value)

The error pattern repeats as shown by Brian and the error value is always off by the address of the LOCLONG.
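
The arithmetic described above can be sketched in a few lines of Python. This is only a model of the observed results, not of the actual logic in the FPGA: the function names are made up, and it assumes the relative offset in S is measured from the LOCLONG's own address, which this thread does not confirm.

```python
LONG_BYTES = 4


def loclong_expected(pc, rel_offset, index):
    """Working case: instruction address + relative offset + index scaled to longs."""
    return pc + rel_offset + index * LONG_BYTES


def loclong_faulty(pc, rel_offset, index):
    """Failing case as observed: the same sum with the instruction address left out."""
    return rel_offset + index * LONG_BYTES


# Case from this post: LOCLONG at $E58, index 2, expected result $E74.
pc = 0xE58
index = 2
list_addr = 0xE74 - index * LONG_BYTES   # address of 'list' implied by the expected value
rel_offset = list_addr - pc              # assumed meaning of @list in S

print(hex(loclong_expected(pc, rel_offset, index)))   # 0xe74
print(hex(loclong_faulty(pc, rel_offset, index)))     # 0x1c
```

The same subtraction fits Brian's serial dump earlier in the thread: $EDC - $84 = $E58, which is one of the bad LOCLONG addresses.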

dat		orgh	$e00/4	'F11 to run from PNUT
		org
		jmp	@hub_code

bx		long	0
timer		long	0
delay		long	40_000_000
monitor		long	91 << 24 + 90 << 16 + $52C >> 2	'added for launching monitor
tval1		long	$4000	'a place to write a value to check from the monitor
tval2		long	$4004	'a place to write a value to check from the monitor

		orgh
hub_code	mov	bx,#2	'set item index = 2 (list value 4)

		nop
		nop
		nop
		nop
		nop
		nop	'fail bug @ $e38
		nop
		nop
		nop
		nop
		nop
		nop
		nop
		nop	'fail bug @ $e58

bug		loclong	bx,@list
		wrlong	bx, tval1	'save bx so it can be checked from the monitor
		rdlong	bx, bx
		wrlong	bx, tval2	'save bx so it can be checked from the monitor
		cogrun	monitor, #0	'run the monitor


list		long	3	'index 0
		long	1	'index 1
		long	4	'index 2
		long	1	'index 3
		long	5	'index 4
		long	9	'index 5

C.W.

ozpropdev · 2014-03-23 17:30

Hi C.W.
Nice work. That explains the results nicely.
I have been testing on DE2 and DE0 with the same results.
The tests I am doing at the moment also seem to show the results being affected by what type
of instruction precedes the LOCLONG instruction. Still working on that one.
Regarding my earlier question on LOCINST it appears to return the offset from the current PC to the @label instruction.
Cheers
Brian

Post Edit: I wondered if anyone would notice the number sequence!

ozpropdev · 2014-03-23 18:28

Here's a different result. Moving the "again" label back 3 instructions makes the code work in pre fetch mode.
If pre fetch is turned off, the result is the same as before. Hopefully this makes sense to Chip and he has an "Aha" moment.

con

	tx=90
	rx=91
	baudrate = 9600

dat		orgh	$e00/4	'F11 to run from PNUT
		org
		jmp	@hub_code

ax		long	0
bx		long	0
cx		long	0
val		long	0

timer		long	0
delay		long	10_000_000
delay2		long	80_000_000 * 8

config		long	%10 << 16 | rx << 9 | %10 << 7 | tx
bittime		long	80_000_000 / baudrate

		orgh

hub_code	setsera	config,bittime
		clrp	#tx

		icachep		'works first pass then fails thereafter
	'	icachen		'fails first pass then works thereafter

		long	0[5]
	'	long	0[8]


again
		getcnt	timer
		add	timer,delay2
		waitcnt	timer,delay


'again
		mov	bx,#2

bug		loclong	bx,@list

		rdlong	cx,bx

		mov	val,bx
		call	@show_hex
		serouta	#32
		mov	val,cx
		call	@show_hex
		serouta	#13

		getcnt	timer
		add	timer,delay
		and	cx,#15 wz
	if_z	mov	cx,#15

:loop		setp	#32
		waitcnt	timer,delay
		clrp	#32
		waitcnt	timer,delay
		djnz	cx,@:loop

		getcnt	timer
		mov	cx,delay2
		shr	cx,#3
		add	timer,cx
		waitcnt	timer,cx
	
		jmp	@again


show_hex	reps	#8,#6
		nop
		getnib	ax,val,#7
		cmp	ax,#9 wz,wc
	if_a	add	ax,#"A"-10
	if_be	add	ax,#"0"
		serouta	ax
		shl	val,#4
		ret

list		long	3	'index 0
		long	1	'index 1
brian		long	4	'index 2
		long	1	'index 3
		long	5	'index 4
		long	9	'index 5

cgracey · 2014-03-23 21:12

ozpropdev wrote: »

Here's a different result. Moving the "again" label back 3 instructions makes the code work in pre fetch mode.
If pre fetch is turned off, the result is the same as before. Hopefully this makes sense to Chip and he has an "Aha" moment.

I'm glad you guys found this problem. I'll be on it tomorrow morning.

I've been thinking that for the JMP #addresslabel(>>2) issue, I'll have the assembler return >>2 values for labels in the cases of operand use. I'll see in the morning how this will work, but that would be the proper way to handle things. That way, JMP #constant would still be what you'd expect. JMP #addresslabel would also force a long-address check, which is important.

It turns out that no '>>2' was ever needed. I didn't realize that I already had the assembler handling this properly. Sorry for all the confusion on this >>2 matter. Now, to find what's wrong with LOCLONG...

Cluso99 · 2014-03-23 22:56

cgracey wrote: »

I'm glad you guys found this problem. I'll be on it tomorrow morning.

I've been thinking that for the JMP #addresslabel(>>2) issue, I'll have the assembler return >>2 values for labels in the cases of operand use. I'll see in the morning how this will work, but that would be the proper way to handle things. That way, JMP #constant would still be what you'd expect. JMP #addresslabel would also force a long-address check, which is important.

That would only be for addresses >=$200 wouldn't it? Because cog addresses are already longs.

ctwardell · 2014-03-24 02:36

Found another interesting condition.

If there is a JMP directly after the LOCLONG the error does not occur.
It does not matter if the jump is to a COG or HUB address.

If the line with the label 'buggy' is uncommented to insert a NOP between the LOCLONG and the JMP, the error occurs.

Question: Do the cache lines always load from a WIDE boundary such as $00, $20, $40, etc.?

dat		orgh	$e00/4	'F11 to run from PNUT
		org
		jmp	@hub_code

bx		long	0
timer		long	0
delay		long	40_000_000
monitor		long	91 << 24 + 90 << 16 + $52C >> 2	'added for launching monitor
tval1		long	$4000	'a place to write a value to check from the monitor
tval2		long	$4004	'a place to write a value to check from the monitor


cog_code	nop
		jmp	@hub_code2

		orgh
hub_code	mov	bx,#2	'set item index = 2 (list value 4)

		nop
		nop
		nop
		nop	'fail bug @ $e38 when line 'buggy' uncommented, no fail when line 'buggy' commented out
		nop
		nop
		nop
		nop
		nop
		nop
		nop
		nop	'fail bug @ $e58 when line 'buggy' uncommented, no fail when line 'buggy' commented out
		nop

bug		loclong	bx,@list
'buggy		nop
		jmp	@cog_code	'can also use jmp @hub_code2, this was to see if the jump target mattered
hub_code2	wrlong	bx, tval1	'save bx so it can be checked from the monitor
		rdlong	bx, bx
		wrlong	bx, tval2	'save bx so it can be checked from the monitor
		cogrun	monitor, #0	'run the monitor


list		long	3	'index 0
		long	1	'index 1
		long	4	'index 2
		long	1	'index 3
		long	5	'index 4
		long	9	'index 5

C.W.

Bill Henning · 2014-03-24 05:34

Yes, the WIDEs always load from a 32-byte (8-long) boundary.
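
To see where a given instruction sits within its WIDE, a minimal Python sketch follows. It assumes only the 32-byte alignment stated above; the helper names `wide_base` and `wide_slot` are made up for illustration.

```python
WIDE_BYTES = 32   # one WIDE = 8 longs
LONG_BYTES = 4


def wide_base(byte_address):
    """Byte address of the 32-byte-aligned WIDE that contains this address."""
    return byte_address & ~(WIDE_BYTES - 1)


def wide_slot(byte_address):
    """Index (0..7) of the long within its WIDE."""
    return (byte_address % WIDE_BYTES) // LONG_BYTES


for addr in (0xE18, 0xE38, 0xE58):
    print(f"${addr:X}: WIDE base ${wide_base(addr):X}, slot {wide_slot(addr)}")
```

Each of the failing addresses reported in this thread lands in slot 6 of its WIDE, counting from 0.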

cgracey · 2014-03-24 10:41

ctwardell wrote: »

It looks like in the cases where LOCLONG fails it is using the index value from D and the relative offset from S but is failing to include the absolute address of the instruction itself.

Good sleuthing, ctwardell!

I was running tests of my own and I remembered that someone said this about the address of the instruction being missing. I did the math and, sure enough, that was the problem, all right.

I had accidentally omitted the pipeline-stage-advance signal from a flop clock gate, so when the cache was reloading, it clocked more than once, and after the first time, the data became errant. I fixed it and now I'm checking for any other such omissions.

I'll do a recompile soon and post new files.

GOOD JOB DISCOVERING THIS, GUYS!!!

Cluso99 · 2014-03-24 13:44

WTG guys. One bug killed

cgracey · 2014-03-24 16:46

I updated the FPGA configuration file here:

http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1251927&viewfull=1#post1251927

This fixes the LOCxxxx bug.

Cluso99 · 2014-03-24 21:22

Here is a little test I ran to call some of the monitor routines from a different cog program.
Unfortunately it is not possible to use all the routines! For instance, tx_string calls the cog routine rdstring (which of course is not present).
It is ok to call tx_crlf though. It is also possible to use the text strings such as hello, hitspace and error.

Note that the compiler requires the address in an EQU (ie in a CON block) to be shifted >>2. This is both for jmp/call and for locptra/b/etc. The shift is not required when referencing labels for data/instructions within a DAT. It is a pnut inconsistency that can be looked at later.

So here is an example...

CON
  _clkmode = xinput
  _xinfreq = 80_000_000
  _baud    = 115_200
  _bitrate = _xinfreq / _baud
  _txpin   = 90                                 ' P90=SO
  _rxpin   = 91                                 ' P91=SI

' P2 Monitor hub addresses
rx_line         = $990 >> 2
tx_crlf         = $A0C >> 2
tx_string       = $B78 >> 2
tx_hex          = $BCC >> 2
rx_chr          = $BF8 >> 2
rx_check        = $C04 >> 2
error           = $C70 >> 2
hitspace        = $C7C >> 2
hello           = $C88 >> 2
help            = $CE0 >> 2
 

  
DAT
                orgh    $00E00/4                        ' start of hub ram
                org     0
start        
              setp      #0                      '\ external led on
              clrp      #1                      '/

'the following is a 5 sec delay mechanism only (allows PST to start)
                getcnt  waitx
                add     waitx,delta5
                waitcnt waitx,0
              SETSERA   #<<7 + _txpin, baud          'set SERA for 8-bit transmit on pin at baud
              CLRP      #_txpin                         'make pin an output, SERA drives it high
:loop         SEROUTA   #"O"                            'send message
              SEROUTA   #"K"
              SEROUTA   #$0D
              SEROUTA   #"*"
                getcnt  waitx                           '\ 1s delay
                add     waitx,delta                     '|
                waitcnt waitx,0                         '/
              notp      #1                              ' toggle external led  
'             jmp       #:loop
              call      #tx_crlf
              locptra   #hitspace
'              locptra   #msg
'             call      #tx_string                      ' we cannot use this because it then calls cog code!!
:nextchr      rdbyte    x,ptra++        wz,wc           'read string byte
        if_nz serouta   x              
        if_nz jmp       @:nextchr
:monitor
                getcnt  waitx                           '\ 1s delay
                add     waitx,delta                     '|
                waitcnt waitx,0                         '/
              cogrun    monitor_pgm,#0                  'relaunch cog0 with shutdown or monitor
monitor_pgm   long      _rxpin<<24+_txpin<<16+($1B0+$37C)>>2    'monitor parameter (conveys pins)
x             long      0
'------------------------------------------------------------------------------------------------
countx          long    _xinfreq
count           long    5
waitx           long    0
delta           long    _xinfreq                ' 1 sec
delta5          long    _xinfreq * 5            ' 5 sec
baud            long    _bitrate
'================================================================================================
              orgh      $1000
msg           long
              byte      "Hit <space> to start monitor...",0

Note: I have a LED and resistor across P0-P1.
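
As a cross-check on the monitor_pgm constant above, here is a minimal Python sketch of the same packing. It assumes Spin-style precedence, where the shifts bind tighter than the additions, and it takes the field layout (rx pin, tx pin, long address of the monitor entry) from the shifts and the comment in the listing rather than from any documentation. The function name `monitor_param` is hypothetical.

```python
def monitor_param(rx_pin, tx_pin, entry_byte_address):
    """Pack the pins and the monitor entry point the way the listing's constant does."""
    # Python's + binds tighter than <<, so every shift is parenthesised explicitly.
    return (rx_pin << 24) + (tx_pin << 16) + (entry_byte_address >> 2)


value = monitor_param(91, 90, 0x1B0 + 0x37C)
print(f"${value:08X}")
```

Note that $1B0 + $37C is $52C, the same byte address ctwardell shifted by 2 in his monitor constant earlier in the thread.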

Ariba · 2014-03-24 21:49

Where should we report the bugs we find? Perhaps this thread can get a more general title.

Anyway, here is what I have found so far:

1) Delayed jumps do not work correctly if one of the delayed instructions is a WAITVID, and tasks are enabled.

2) In the Monitor: If you want to start a cog from hubmemory with 0+addr, the addr must be entered as hubaddr/4, which is a bit strange.
I think the monitor should calculate that for us (shr value,#2 before cogrun).

Andy

ctwardell · 2014-03-24 22:17

Ariba wrote: »

Delayed jumps do not work correctly if one of the delayed instructions is a WAITVID, and tasks are enabled.

I think that would be by design since waitvid jumps to itself instead of stalling the pipeline when in multitasking mode.

All of the instructions listed below would likely be in the same category.

This is one of those cases where we will need good documentation.

From the 20Mar2014 Prop2_Docs.txt:

Some instructions which stall the pipeline during single-task execution will, instead, jump back to
themselves during multi-task execution (JMP #$), until their release condition is met. This way they
avoid stalling the pipeline, allowing other tasks to execute in the interstitial time slots:

  WAITVID D/#,S/#    wait for VID to grab new data

  SERINA  D          wait for serial input on SERA
  SERINB  D          wait for serial input on SERB
  SEROUTA D/#        wait to send serial output on SERA
  SEROUTB D/#        wait to send serial output on SERB

  GETMULL D          wait for lower multiplier result
  GETMULH D          wait for upper multiplier result
  GETDIVQ D          wait for divider quotient result
  GETDIVR D          wait for divider remainder result
  GETSQRT D          wait for square root result
  GETQX   D          wait for CORDIC X result
  GETQY   D          wait for CORDIC Y result
  GETQZ   D          wait for CORDIC Z result

  SYNCTRA            wait for PHSA to roll over
  SYNCTRB            wait for PHSB to roll over


For the above instructions, multi-tasking is considered to be active when SETTASK D/# has written
a mixture of tasks to the time slots. Remember that in multi-tasking, the above instructions behave
as branches, and therefore cannot be used in REPD/REPS instruction-repeat blocks. Also, you should
not use INDx++/INDx--/++INDx with these instructions during multi-tasking, as they will cause
INDA/INDB to increment or decrement each time they loop back to themselves, before the release
condition is met.
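
A toy Python model of the INDx++ warning in the docs excerpt above. It is not a simulation of the real pipeline: it simply assumes, as the text states, that in multi-tasking a waiting instruction is re-issued until its release condition is met, and that any pointer side effect is applied on every issue. The function name and the `slots_until_ready` parameter are made up for illustration.

```python
def pointer_increments(slots_until_ready, multitasking):
    """Count how often an INDx++ side effect fires for one waiting instruction."""
    if not multitasking:
        # Single-task: the instruction is issued once and the pipeline stalls.
        return 1

    increments = 0
    slot = 0
    while True:
        increments += 1                 # side effect applied on every issue
        if slot >= slots_until_ready:   # release condition met on this issue
            break
        slot += 1                       # otherwise the instruction jumps back to itself
    return increments


print(pointer_increments(3, multitasking=False))   # 1
print(pointer_increments(3, multitasking=True))    # 4
```

In this model a three-slot wait bumps the pointer four times instead of once, which is the hazard the documentation warns about.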

C.W.

Ariba · 2014-03-24 22:37

Thank you C.W

Yes, that explains it. I am trying to port some old code from a time when this was possible, but if I think about it, it was not very efficient because the task stalled until the waitvid was done. So I should have used POLLVID, and then it's the same as we have now.

Andy

cgracey · 2014-03-25 04:03

Ariba wrote: »

Where should we report the bugs we find? Perhaps this thread can get a more general title.

Anyway, here is what I have found so far:

1) Delayed jumps do not work correctly if one of the delayed instructions is a WAITVID, and tasks are enabled.

2) In the Monitor: If you want to start a cog from hubmemory with 0+addr, the addr must be entered as hubaddr/4, which is a bit strange.
I think the monitor should calculate that for us (shr value,#2 before cogrun).

Andy

Andy, this is a hard thing to decide. With hub execution now, things are a little different, since the cog uses a 16-bit address (ignores the two LSBs) for instructions in hub memory, which are longs. Have you noticed the 'X' mode in the ROM Monitor yet? I put it there to help orient people to a cog's execution perspective. In that mode, you see and manipulate the hub memory in this 16-bit-address/long perspective. If you see what the cog sees, it becomes very simple again.
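
The two address perspectives described above differ only by a shift of two bits. A minimal Python sketch, with hypothetical helper names, using addresses that appear elsewhere in this thread:

```python
def to_long_address(byte_address):
    """Hub byte address -> long address as seen by the cog (two LSBs dropped)."""
    return byte_address >> 2


def to_byte_address(long_address):
    """Long address as seen by the cog -> hub byte address."""
    return long_address << 2


print(f"${to_long_address(0xE00):X}")    # the orgh $e00/4 used in the test programs
print(f"${to_long_address(0x1000):X}")   # Bill's $400 ($1000 byte address)
print(f"${to_byte_address(0x283):X}")    # tx_crlf = $A0C >> 2 in Cluso99's CON block
```

This is also the conversion Andy suggests the monitor could do itself before launching a cog.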
