Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

evanh · 2015-11-09 00:46

Dave Hein wrote: »

... The streamer just consists of a 16-long FIFO and the logic needed to make it work with the rest of the cog.

That's only the FIFO. The Streamer makes use of that FIFO when fetching from HubRAM but they are independent hardware otherwise.

Peter Jakacki · 2015-11-09 01:34

cgracey wrote: »

Pins' outputs can be inverted via the whole smartpin arrangement. If any cog wrote a '1', the pin would go low.

Just remember that if smart pins are to be "streamlined" that we have the inverter available although IMO the AND arrangement is far more sensible and in keeping with how signals are actually used between cogs.

On the matter of streamlined smartpins, if that is what is needed to induce a P2 birth, then we would still want some form of serdes. And if analog complicates everything, then at least have the counter modes from the P1 that allowed us to create DACs easily and output selectable frequencies.

Also, as mentioned before, upgrading the bootloader to at least load from SPI Flash would be a really big help for me and allow me to build up Tachyon with all the filesystem and networking present at boot.

P.S. I came across this post while looking for information on the new ALTx instructions to include in my assembler document. Would be nice if each of these instructions had their own thread rather than getting buried deep and then not knowing which thread it was discussed in.
EDIT: Found it in its own thread!

Electrodude · 2015-11-09 01:38

Could the pixel mixer be added as part of the hub CORDIC?

Baggers · 2015-11-09 01:55

Cheers for the info Cluso.

I guess the powers that be will state what goes in or not anyway

Dave Hein · 2015-11-09 11:54

Cluso99 wrote: »

Dave,
Chip said the extras he put in the other day just for NTSC took IIRC 250+ blocks and a couple of multipliers which will need to be converted to gates. And if I understand correctly, this is duplicated in every cog. This is not the streamer, which was already done.

I suppose my comment was based on evanh's responses to your posts. Those were specific to the streamer, and didn't mention NTSC. So I apologize if you were only referring to the NTSC feature. I agree with you that all cogs would not need this, but I don't like the idea of having cogs with different functionality. The NTSC feature should be a single resource that is shared by all the cogs.

However, I wonder whether an NTSC feature is needed at all. I would much rather see HDMI.

Dave Hein · 2015-11-09 12:04

evanh wrote: »

Dave Hein wrote: »

... The streamer just consists of a 16-long FIFO and the logic needed to make it work with the rest of the cog.

That's only the FIFO. The Streamer makes use of that FIFO when fetching from HubRAM but they are independent hardware otherwise.

With the lack of documentation it is difficult to understand what is contained in the streamer. Could someone please post a list of its features? My understanding is that the streamer consists of the FIFO and the logic that controls the FIFO. This is used with hubexec and the read/write fast instructions. I understand it is also used to stream ADC and DAC data. The actual ADC and DAC circuitry is separate from the streamer since these are a shared resource. Or maybe not. Could somebody please point to some documentation on this?

evanh · 2015-11-09 12:32

True, the FIFO was only for the Streamer initially. I consider it separate now mainly because it can be, and mostly will be, used for non-Streamer activities. Secondly, the Streamer can function from the LUT alone; it doesn't use the FIFO then. Thirdly (and I don't know if this was always so), the FIFO has to be initialised with its own RD/WRFAST instructions.

Streamer documentation: The only one I know of is page 3 of Chip's doc where there is passing mention of what can use the FIFO.

There are probably code examples in Chip's PNUT/FPGA releases.

evanh · 2015-11-09 13:06

evanh wrote: »

... the FIFO has to be initialised with its own RD/WRFAST instructions.

That's intriguing. That RDFAST dictates the HubRAM addresses. It hadn't really dawned on me until now that the Streamer has no say in addressing its Hub data. On the other hand, if it is to fetch from the LUT it has to perform its own addressing.

Obviously, I don't know what Chip has done for this.

78rpm · 2015-11-09 13:17

Oddity with V4 FPGA files release 7th Nov. My test harness has stopped transmitting to the PST.

start

                mov     dira, ##$ffff
                mov     outa, #0
                setb    outb, #TX_PIN
                setb    dirb, #TX_PIN

                setint1 %000            ' disable interrupt
                setint2 %000            ' disable interrupt
                setint3 %000            ' disable interrupt

                loc     ptra, #main_stack       ' set stack for framework

                ' load COG code for direct calling - do not Coginit it
                loc     ptrb, #cog_start
                setq    #(cog_end-cog_start)    ' thank you Searith
                rdlong  0, PTRB 

                calla   #rcv_char       ' kickoff on receive char

Executes if I sprinkle some LED driving ops in there, but nothing comes out on the serial terminal. Some facts: I am running in Hubexec and using ptra for the stack exclusively, thus using calla/reta/pusha/popa instructions.

Now, if I add the following to the code above:

mov tx_char,#"*"
calla #send_char
mov tx_char,#"*"
calla #send_char
mov tx_char,#"*"
calla #send_char
mov tx_char,#"*"
calla #send_char
mov tx_char,#"*"
calla #send_char
mov tx_char,#"*"
calla #send_char
mov tx_char,#"*"
calla #send_char
mov tx_char,#"*"
calla #send_char
mov tx_char,#"*"
calla #send_char
mov tx_char,#"*"
calla #send_char
mov tx_char,#"*"
calla #send_char

When I press CR on PST, the LEDs which indicate a cog is running start lighting up. How does this happen on the bare board Nano when there are only two cogs?!

I have changed some code to reflect the ALTDS change to ALTS. It picks up a register's address in cog RAM and loads the value into scratch. It looks correct to me, but it is the only thing I can think of. Original code was:

                mov     modifier, ptrb          ' == register address
send_get_inst   altds   modifier, #000_000_100  ' use modifier as source reg
                mov     scratch, 0-0            ' read from [ptrb]
                loc     ptrb, #send_hex_value
                wrlong  scratch, ptrb           ' write it to hub

new code is :-

                mov     modifier, ptrb          ' == register address
send_get_inst   alts    modifier, #0		' use modifier as source reg
                mov     scratch, 0-0            ' read from [ptrb]
                loc     ptrb, #send_hex_value
                wrlong  scratch, ptrb           ' write it to hub

The Prop-Plug is functional, and if I run a modified version of mindrobot's comm_test_hi.spin, which is a hubexec version, characters I enter at PST are echoed back correctly.

So it must be in my code somewhere, but I am just not seeing where the problem is.

Any thoughts?

ozpropdev · 2015-11-09 13:24

Having problems with V4 too.
Code that runs fine in V3 is broken in V4.
Spent the last 3 hours diving into my code (it's large)
Can't put a finger on the problem yet.

78rpm · 2015-11-09 13:39

ozpropdev wrote: »

Having problems with V4 too.
Code that runs fine in V3 is broken in V4.
Spent the last 3 hours diving into my code (it's large)
Can't put a finger on the problem yet.

Aha, I shall wait a while on Chip before delving further.

I thought my Prop-Plug had died originally.

My code is also large, over 5000 lines. It was working without obvious error in V3.

Is your code cog, lut or hub exec, or a mixture?
Do you use a ptra or b stack at all?
Did you use altds and now use altd/s/i ?

I wonder if it is a PNut problem or FPGA?

ozpropdev · 2015-11-09 13:48

@78rpm
Both cog and hub exec.
PTRA and PTRB stacks.
ALTDS (now ALTI)
Pnut appears to be compiling ALTx Ok.
Program with data >500K in hub.

Dave Hein · 2015-11-09 14:00

evanh wrote: »

True, the FIFO was only for the Streamer initially. I consider it separate now mainly because it can be, and mostly will be, used for non-Streamer activities. Secondly, the Streamer can function from the LUT alone; it doesn't use the FIFO then. Thirdly (and I don't know if this was always so), the FIFO has to be initialised with its own RD/WRFAST instructions.

Streamer documentation: The only one I know of is page 3 of Chip's doc where there is passing mention of what can use the FIFO.

There are probably code examples in Chip's PNUT/FPGA releases.

So the streamer is just a very simple DMA controller. This just requires an address counter and a small state machine. It can't be more than a few hundred gates per cog, which would be a very small fraction of a cog.

potatohead · 2015-11-09 14:04

Maybe we can get Chip to sketch out a quick block diagram of the various parts. He's got a nice phone. Take a photo of it, and send it to us.

@Chip: How about it? Just scribble it onto some paper or other. We can make it nice and pretty.

78rpm · 2015-11-09 14:06

ozpropdev wrote: »

@78rpm
Both cog and hub exec.
PTRA and PTRB stacks.
ALTDS (now ALTI)
Pnut appears to be compiling ALTx Ok.
Program with data >500K in hub.

Mine is mainly hub with some calls into cog.
I use PTRA stack, but I use ptrb a lot for indirect addressing.
ALTDS (now ALTS) in my case
Program with data c.9-10KB mainly in hub, some cog.

I also load the initial cog but do not issue a coginit, just call directly into the cog.

I do use a lot of rdlong / wrlong and byte variants to get at data in hub. Of course, a "PUSHA my_reg" is an alias for "WRLONG my_reg,PTRA++".
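As a rough illustration of the stack direction in question, here is a small Python model of an upward-growing PTRA stack. It takes only the "PUSHA = WRLONG reg,PTRA++" equivalence from the line above. Treating POPA as the mirror-image pre-decrement read is an assumption, and the names HubStack, pusha, popa and the base address are made up for this sketch.

```python
class HubStack:
    """Toy model of a PTRA stack in hub RAM (one long = 4 bytes)."""

    def __init__(self, base):
        self.ptra = base      # byte address of the next free long
        self.hub = {}         # sparse hub RAM: byte address -> 32-bit value

    def pusha(self, value):
        # WRLONG value,PTRA++ : write first, then post-increment by 4
        self.hub[self.ptra] = value & 0xFFFFFFFF
        self.ptra += 4

    def popa(self):
        # Assumed mirror image of the push: pre-decrement by 4, then read
        self.ptra -= 4
        return self.hub[self.ptra]


stack = HubStack(base=0x1000)     # hypothetical stack base
stack.pusha(0x11111111)
stack.pusha(0x22222222)
top_after_push = stack.ptra       # pointer has moved up, not down
first_pop = stack.popa()
second_pop = stack.popa()
```

If the hardware ever pushed downwards instead, a stack placed just below code or data would overwrite it, which is the kind of failure being guessed at here.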

I changed the CALLA #send_char in my previous post to a CALL and modified the RETA to RET, but that made no difference, so I'm not sure it is a stack problem, unless perhaps stack direction has suddenly changed, i.e. downwards instead of up?

I am grabbing at straws here, or perhaps thin air. Without being able to see inside easily with the serial terminal I can really only resort to driving leds to find how far I get.

mindrobots · 2015-11-09 14:11

All my little demos work with v4 so far but none of them are large, use much in the way of memory or do any ALT or instruction modification. I've mostly been playing with interrupts (I know, I'm a heathen!) and execution modes.

I get to play some today, so I'll see if I can find any issues that aren't mine.

Wowzer! I'm not sure I'll ever be able to write 5000+ lines of PASM again!

78rpm · 2015-11-09 14:27

mindrobots wrote: »

All my little demos work with v4 so far but none of them are large, use much in the way of memory or do any ALT or instruction modification. I've mostly been playing with interrupts (I know, I'm a heathen!) and execution modes.

I get to play some today, so I'll see if I can find any issues that aren't mine.

Wowzer! I'm not sure I'll ever be able to write 5000+ lines of PASM again!

1236 lines of the program were generated by the program itself, then cut'n'pasted from PST. It is all part of my rd/wr byte/word/long unit test with every permissible use of index value and --/++. It's also to verify that PNut and the test, which synthesises each instruction, agree on the binary representation.

206 individual forms of the ptr expression
* 3 for byte, word, long instruction forms
* 2 for read and write
gives us 1236, but
* 3 for execution of the instruction in cog, lut and hub exec modes. The instructions of course only rd/wr between cog and hub, but their execution has to be verified in all exec memory spaces. So a total of 3708 tests.
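The arithmetic above can be checked with a few lines of Python. The constant names are invented for this sketch; the numbers are the ones quoted in the post.

```python
PTR_FORMS = 206     # individual forms of the ptr expression
SIZES = 3           # byte, word, long
DIRECTIONS = 2      # read and write
EXEC_MODES = 3      # cog, lut and hub exec

# One generated source line per form/size/direction combination
generated_lines = PTR_FORMS * SIZES * DIRECTIONS

# Each generated line is then exercised in every execution mode
total_tests = generated_lines * EXEC_MODES

print(generated_lines, total_tests)
```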

ozpropdev · 2015-11-09 14:35

Been staring at ones and zeros all day and I'm starting to see twos now.
Time to step away and have another look after some sleep.

78rpm · 2015-11-09 14:47

At least you now know it's not just your code experiencing problems, not that it is a tremendous help, but hopefully we can get things resolved soon and get back on track.

Enjoy your kip, I think I may tackle my soldering that cries out for attention.

78rpm · 2015-11-09 15:39

I have done more testing with the LEDs and it looks to me that rcv_char does not execute as before. In fact it gets stuck at WAITEDG, as LEDs 5 and 4 are on, all others are off. Yet the same code works in the comm_test_hi.spin example. The only difference in my code is the CALLA and RETA. Changing those to CALL and RET makes no difference.

rcv_char                  
                setedg  #%0_10_000000 | RX_PIN          'select negative edge on p63

setb outa,#5
                polledg                         'clear edge detector
setb outa,#4
                waitedg                         'wait for start bit
setb outa,#3
                waitx   bit_time                'wait for middle of 1st data bit
setb outa,#2

                rep     @.rep,#8                'ready for 8 bits
                testb   inb,#RX_PIN     wc      'sample rx
                rcr     rx_char,#1              'rotate bit into byte
                waitx   bit_time        'wait for middle of nth data bit
.rep
setb outa,#1
                shr     rx_char,#32-8           'justify received byte
setb outa,#0
                reta

My program is larger, is that a clue? rcv_char is @ $43d2 in my code.

mindrobots · 2015-11-09 16:03

@78RPM,

Don't put your rcv_char at address $43d2!

Seriously, put an ALIGNL before rcv_char to bring it back to LONG aligned memory allocation.

In the code below, it works fine with the ORGH before rcv_char set to $43D0 but breaks if set to $43D2. Either byte aligned code execution got broken or we lost that feature someplace and you need to make sure you manually bring any code into alignment after you've defined WORD or BYTE data in HUBRAM.

con
	SYS_CLK = 50_000_000
	BAUD_RATE = 115_200

	RX_PIN = 63
	TX_PIN = 62

dat
	orgh	0



	org	0
	jmp	#@cog_entry

' these need to stay here (under org 0) so they are in COGRAM and get initialized
bit_time	long	SYS_CLK / BAUD_RATE
tx_char		res	1
timer		res	1
rx_char		res	1

' plenty of room to play in High HUBRAM (COG0 in HUBEXEC at this point)
		orgh	$4300
cog_entry
	setb	outb, #TX_PIN
	setb	dirb, #TX_PIN

'*******************************************************
'********* TEST CODE - PUT YOUR CODE HERE **************
' ***** try some input/output
loopback

		call	#@rcv_char
		mov	tx_char, rx_char
		call	#@send_char
		jmp	#@loopback

'********* TEST CODE - REPLACE WITH YOUR CODE ***********
'********************************************************
	


'*******************************************************************************
' Get one character from the input port.
' Input none
' Changes parm, temp, temp1, temp2
' Output parm
'*******************************************************************************
		orgh	$43d0
rcv_char                  
		setedg	#%0_10_000000 | RX_PIN		'select negative edge on p63

		polledg				'clear edge detector
		waitedg				'wait for start bit

		waitx	bit_time		'wait for middle of 1st data bit

		rep	@.rep,#8		'ready for 8 bits
		testb	inb,#RX_PIN	wc	'sample rx
		rcr	rx_char,#1		'rotate bit into byte
		waitx	bit_time	'wait for middle of nth data bit
.rep
		shr	rx_char,#32-8		'justify received byte
		ret


'*******************************************************************************
' Output a single character to the tx_pin.
' executes in COG mode
' Input: tx_char - character to be sent
' Changes parm, temp1, temp2
' Output none             
'*******************************************************************************

send_char	setb	tx_char,#8
		shl	tx_char,#1
		getct	timer

		rep	@.txrep,#10
		testb	tx_char,#0 wz
		setbnz	outb,#TX_PIN
		addct1	timer,bit_time
		waitct1
		shr	tx_char,#1
.txrep
		ret
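For readers following the serial routines in the listing above, here is a hedged Python sketch of the two pieces of arithmetic involved: the bit_time constant and the 10-bit frame that send_char builds with "setb tx_char,#8" and "shl tx_char,#1". It assumes the assembler's constant division truncates and that only the low 8 bits of tx_char are set on entry. The function name frame_bits is hypothetical.

```python
SYS_CLK = 50_000_000
BAUD_RATE = 115_200

# Clock ticks per serial bit (integer division, assumed to truncate)
bit_time = SYS_CLK // BAUD_RATE

# Baud rate actually produced by that rounded-down tick count
actual_baud = SYS_CLK / bit_time


def frame_bits(ch):
    """Return the 10 bits in the order they are shifted out (bit 0 first)."""
    frame = ((ch & 0xFF) | (1 << 8)) << 1   # stop bit at bit 8, then shift in a 0 start bit
    return [(frame >> i) & 1 for i in range(10)]


star = frame_bits(ord("*"))   # start bit, 8 data bits LSB first, stop bit
print(bit_time, round(actual_baud, 1), star)
```

The frame has a low start bit, the data byte least significant bit first, and a high stop bit, which matches the loop count of 10 in the rep instruction.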

See? This is fun!!!! (OK, I had fun playing detective!)

78rpm · 2015-11-09 16:12

Well done you!

Trouble is, there are byte strings near routines all over the place at present; it is convenient when developing and debugging.

What a pain, so it's alignment which has broken for some reason.

YUP! Confirmed by placing one ALIGNL, but there are millions, ok, perhaps a couple of dozen, other places to add them. Well at least we have a workaround for the moment.
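To help find the places that need the workaround, a quick Python check of hub addresses might look like the sketch below. It assumes ALIGNL pads forward to the next long (4-byte) boundary. The helper names are invented, and the addresses are the ones mentioned in this thread.

```python
def is_long_aligned(addr):
    """True if a hub byte address sits on a 4-byte (long) boundary."""
    return addr % 4 == 0


def alignl(addr):
    """Round a byte address up to the next long boundary (assumed ALIGNL behaviour)."""
    return (addr + 3) & ~3


# $43D0 works, $43D2 fails on the V4 image
for addr in (0x43D0, 0x43D2):
    status = "aligned" if is_long_aligned(addr) else "needs ALIGNL"
    print(f"${addr:04X}: {status}, ALIGNL -> ${alignl(addr):04X}")
```

Running the same check over the label addresses from a listing file would flag every routine that follows BYTE or WORD data.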

The question to ask is, why has it broken?

mindrobots · 2015-11-09 16:22

YAY!!!

78rpm wrote: »

The question to ask is, why has it broken?

I can narrow that down to either PNUT or the FPGA image!

As I think about it more, I think PNUT is in the clear. Chip was working on the streamer which also plays a part in Hub exec, maybe?

Since my little test program doesn't use anything that has changed recently, I think I can go backward and use old PNUTs against the new FPGA image.

If I narrow it down, I'll start another thread with just this problem.

Now, get back to your testing!!!

78rpm · 2015-11-09 16:33

Ah, but the LED never gets set after WAITEDG, which puts the cog into a low power state whilst it waits for the event to occur. Perhaps Chip now only saves or restores a PC with the two lsbs clear? Maybe it is a bug introduced with the event/polling/getint update, just lying there dormant until now?

mindrobots · 2015-11-09 16:36

In my testing, it works on _v3 FPGA image and fails on _v4 FPGA image. PNUT version does not matter.

New thread coming.

cgracey · 2015-11-09 17:33

'SETDACS D/#' is used to set the four 8-bit DACs associated with each cog. For cog0, this means pins 0..3, while for cog1, this means pins 4..7, etc.
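The cog-to-pin mapping described above follows a simple pattern, sketched here in Python. The function name dac_pins is made up, and extending the pattern beyond cog1 is an extrapolation of the "etc." in the description.

```python
def dac_pins(cog):
    """Pins served by a cog's four 8-bit DACs: cog0 -> 0..3, cog1 -> 4..7, and so on."""
    first = 4 * cog
    return list(range(first, first + 4))


for cog in range(4):
    print(cog, dac_pins(cog))
```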

For any clock on which the streamer is outputting, it selectively mux's its four outputs in lieu of the original SETDACS four-byte data, on a per-byte/DAC basis, as the streamer can be configured to affect only certain bytes/DACs.

The colorspace converter, when enabled, grabs the DAC-bound data from the streamer and outputs new DAC-bound data 4 clocks later, mux'ing its output data in lieu of what was going to the DACs from the streamer/SETDACS.

So, if you don't use the streamer, but turn on the colorspace converter, any SETDACS instruction will directly establish the inputs to the colorspace converter. For something as low-bandwidth as NTSC, it's not even necessary to use the streamer, but it can help a lot, and allow you to move video updating entirely into an interrupt.
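A toy Python model of the per-byte selection described above may make the data path easier to picture. It is only a sketch: the 4-bit byte_enable mask is a hypothetical stand-in for however the streamer is actually configured, and the colorspace converter stage with its 4-clock delay is not modelled.

```python
def mux_dacs(setdacs_word, streamer_word, byte_enable):
    """Pick each DAC byte from the streamer or from SETDACS.

    byte_enable is a hypothetical 4-bit mask: bit n set means the
    streamer drives DAC n on this clock, otherwise the SETDACS byte stays.
    """
    out = 0
    for n in range(4):
        lane = 0xFF << (8 * n)
        source = streamer_word if (byte_enable >> n) & 1 else setdacs_word
        out |= source & lane
    return out


# Streamer drives DACs 0 and 2; SETDACS data remains on DACs 1 and 3
mixed = mux_dacs(0x11223344, 0xAABBCCDD, 0b0101)
print(f"{mixed:08X}")
```

With the mask at zero the result is just the SETDACS data, which corresponds to the case where the streamer is not used and SETDACS feeds the colorspace converter directly.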

cgracey · 2015-11-09 17:45

One thing that has changed with PNut is that it only downloads up to where the last code was emitted. It used to download the whole memory image, but that was wasting time.

If your .spin file left off emitting hub data at an address below some other data, it only loads up to the lower address. Could that be the problem?
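One way to read the download change described above is sketched below in Python. This is an interpretation rather than PNut's actual logic: the chunk list, the function names and the example addresses are all invented for illustration.

```python
def download_end(chunks):
    """End address of the last chunk emitted in source order."""
    start, length = chunks[-1]
    return start + length


def highest_end(chunks):
    """Highest address any chunk reaches, regardless of source order."""
    return max(start + length for start, length in chunks)


# Hypothetical source: code emitted high in hub first, then data lower down
chunks = [(0x4300, 0x100), (0x0400, 0x20)]   # (start address, length in bytes)

loaded_up_to = download_end(chunks)
needed_up_to = highest_end(chunks)
truncated = loaded_up_to < needed_up_to      # True means the high code never gets loaded
print(hex(loaded_up_to), hex(needed_up_to), truncated)
```

Under this reading, a source file whose last emitted item sits below earlier code would lose everything above it, while a file that emits in ascending address order is unaffected.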

There has been no change in alignment rules.

78rpm · 2015-11-09 17:49

cgracey wrote: »

One thing that has changed with PNut is that it only downloads up to where the last code was emitted. It used to download the whole memory image, but that was wasting time.

If your .spin file left off emitting hub data at an address below some other data, it only loads up to the lower address. Could that be the problem?

There has been no change in alignment rules.

I don't think so, it's narrowed down later on. Mindrobots has started a new thread about a _v4 bug, which links to this one.

mindrobots · 2015-11-09 17:50

cgracey wrote: »

One thing that has changed with PNut is that it only downloads up to where the last code was emitted. It used to download the whole memory image, but that was wasting time.

If your .spin file left off emitting hub data at an address below some other data, it only loads up to the lower address. Could that be the problem?

There has been no change in alignment rules.

I could see that being the case except the ALIGNL makes it work.

I'll try more testing, I have an even smaller program that exhibits the problem.

con
main_led	=	0
isr_led		=	1

dat
		orgh	0

		org	0

start
		setb	dirb,#main_led
		setb	dirb,#isr_led
		getct	isrticks
		addct1	isrticks,isr_wait
		loc	adra,#@isr
		mov	ijmp1,adra
		setint1	#1

blink		
		notb	outb,#main_led		'flip its output state
		waitx	main_wait		' WAITX blocks interrupts
		jmp	#blink		'do it again
		

isr_wait	long	50_000_000
main_wait	long	5_000_000
isrticks	long	0
isr_in_hub	long	0

		orgh	$402
		ALIGNL			' needed to work on V4 FPGA
isr
		notb	outb,#isr_led
		addct1	isrticks, isr_wait
		reti1

(I never thought I'd need a SECOND 1-2-3 board for testing!!!

Don't tell Ken!)

jmg · 2015-11-09 19:42

Dave Hein wrote: »

So I apologize if you were only referring to the NTSC feature. I agree with you that all cogs would not need this, but I don't like the idea of having cogs with different functionality. The NTSC feature should be a single resource that is shared by all the cogs.

Not sure how practical that is, as it needs to bolt onto streamer flow?
Adding a 17th streamer means a 17th slot = not possible?
One channel of NTSC may be too light, but 16 is unlikely to be used.

It comes down to Logic cost, and if something else more important can fit, then COG peripherals do not have to be all equal.

Perhaps with the device routing, an NTSC cell can be MUX'd between two COGS, halving the NTSC count, and every COG can access one.

Dave Hein wrote: »

However, I wonder whether an NTSC feature is needed at all. I would much rather see HDMI.

Yes, but 180nm is not going to manage HDMI.
You can do HDMI with add-on parts.
However, note that everyone does HDMI in the Big Iron ARM parts, and P2 would be lost in the noise.

I think the Composite Video space (and direct LCD drive) are going to be important enough for a long time, and P2 can excel there.
