Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

cgracey · 2015-10-09 06:10

Peter Jakacki wrote: »

I am trying to get Tachyon to compile and work properly but so far no joy. The compiler is touchy about the position of the intvecs, as it often says "Hub origin already exceeds target" or I get an error on coginit source of "constant must be from 0 to 511". I guess I am not quite fully understanding all the orgs and orghs and how they affect the compiler itself. I chopped out most of the code but unless I have a small program I get an error.

Here's a test version that worked fine on the last version of the FPGA and PNut.

You are suffering from the same trouble as ozpropdev. We need to rearrange some things.

cgracey · 2015-10-09 06:24

While I work on getting things rearranged, you guys can keep working by only having a small initial cog program at $00000 that relaunches itself to the real program of interest that sits somewhere after $43F.

cgracey · 2015-10-09 06:28

Scratch that idea of moving cog/lut exec to the hub ceiling. It would cause a problem between register addresses and branch addresses within the cog. Not going to work. The solution lies in keeping cog/lut-exec $00000-based in hub, which means that the solution will be here a lot sooner.

jmg · 2015-10-09 07:05

cgracey wrote: »

What if...

hub $00000..$0003F = cog r/w event 16 longs (already the case, but currently covered by initial cog image)
hub $00040..$0007F = initial debug interrupt instructions (16 RETI0s/JMPs)
hub $00080+ = initial cog image

hub $00000..$FFBFF = hub exec range
hub $FFC00..$FFDFF = lut exec range (notice that lut-to-cog exec potential)
hub $FFE00..$FFFFF = cog exec range

I don't know how to make this look pretty, but there may be a simple way. Actually ORG would take care of this.

That looks good to me

yes, ORG/SEG will hide the hard-to-remember numbers.

cgracey wrote: »

It would cause a problem between register addresses and branch addresses within the cog.

Can the tools not manage that ?

cgracey · 2015-10-09 07:31

jmg wrote: »

...Can the tools not manage that ?

Here's a problem:

MOV reg,#cogroutine
JMP reg

Imagine #cogroutine is $FFF00, or $100 within the cog. This MOV already has big problems.

And reg holding a cog execution address must contain $FFExx for the JMP.

I don't know how to overcome these kinds of problems. Do you? If we could stuff cog/lut exec against the hub ceiling, it would be fantastic, but how to do it and maintain sanity?
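The conflict can be sketched numerically. This is only a host-side model in Python, not PASM: it assumes the immediate source field is 9 bits wide (consistent with the "constant must be from 0 to 511" error quoted earlier) and uses the proposed hub-ceiling cog exec base of $FFE00. The names `COG_EXEC_BASE`, `IMM_BITS`, and `fits_immediate` are made up for illustration.

```python
# Hypothetical model of the MOV reg,#cogroutine problem.
COG_EXEC_BASE = 0xFFE00  # proposed cog exec range start at the hub ceiling
IMM_BITS = 9             # assumed width of a #immediate source operand


def fits_immediate(value, bits=IMM_BITS):
    """Return True if value can be encoded as an unsigned immediate."""
    return 0 <= value < (1 << bits)


cog_register = 0x100                           # address as a register index
branch_address = COG_EXEC_BASE + cog_register  # address as a branch target

print(hex(branch_address))             # 0xfff00
print(fits_immediate(cog_register))    # True: fine as a register address
print(fits_immediate(branch_address))  # False: too wide for MOV reg,#...
```

The same label would need two different values, one for register access and one for branching, which is the mismatch described above.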

jmg · 2015-10-09 07:57

cgracey wrote: »

jmg wrote: »

...Can the tools not manage that ?

Here's a problem:

MOV reg,#cogroutine
JMP reg

Imagine #cogroutine is $FFF00, or $100 within the cog. This MOV already has big problems.
And reg holding a cog execution address must contain $FFExx for the JMP.

I don't know how to overcome these kinds of problems. Do you? If we could stuff cog/lut exec against the hub ceiling, it would be fantastic, but how to do it and maintain sanity?

hmm... To still fit in a #9/10? immediate, that would need some form of sign-extend equivalent ?
Either that, or JMP becomes COG-relative, or has a COG-relative variant.

eg ADR() tells the tools what you are trying to do

MOV reg,ADR(cogroutine) ' 9/10 active bits, upper ones are forced COG.LUT
JMP reg ' or shorter jump variant available.

addit : I think this relative form works for ADR(cogroutine) > ADR(JmpBase)

MOV reg,ADR(cogroutine)-ADR(JmpBase)
...
JmpBase
JMPREL reg
...
cogroutine

more:
This has gained smaller code in HUB page 0, and there was already an issue with LUT, so a form like
MOVCADR RegN,ADR(AddressInCog)
MOVLADR RegN,ADR(AddressInLut)
MOVHADR RegN,ADR(AddressInHub)
would allow 3 x short opcode loads of RegN, of address in 3 memory areas.
This gains code size, & is portable**, but does need opcode space.
** In actual ASM use, the common MOVADR would be used and the tools would select which opcode to use, based on memory area. out-of-reach address would expand to the 2 opcode version.
This short load triple form has merit no matter what final memory map is used, all possible code areas, have equally efficient blocks. Q:Is there opcode room ?

cgracey · 2015-10-09 08:13

These are certain:

hub $00000..$001FF = cog exec range
hub $00200..$003FF = lut exec range
hub $00400..$FFFFF = hub exec range

These are flexible:

hub $00000..$0003F = 16 special longs that create r/w events
hub $00400..$0043F = 16 cog-start debug interrupt instructions (RETI0/JMP's)

Initial cog image needs to move from $00000 to maybe $00800.
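As a rough illustration of the "certain" ranges listed above, here is a small Python model that maps a 20-bit execution address to the mode it would select. The function name `exec_mode` is hypothetical; only the three ranges come from the post.

```python
def exec_mode(addr):
    """Classify a 20-bit execution address using the ranges in the post."""
    if not 0 <= addr <= 0xFFFFF:
        raise ValueError("address outside the 20-bit range")
    if addr <= 0x001FF:
        return "cog"   # $00000..$001FF
    if addr <= 0x003FF:
        return "lut"   # $00200..$003FF
    return "hub"       # $00400..$FFFFF


for a in (0x00000, 0x001FF, 0x00200, 0x00400, 0x00800):
    print(f"${a:05X} -> {exec_mode(a)} exec")
```

With this layout both $00400 (the debug interrupt instructions) and $00800 (the suggested new home for the initial cog image) fall in the hub exec range.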

evanh · 2015-10-09 10:15

Chip,
I have a distinct dislike for System Management Mode on the PC. I'm not sure I want an equivalent being bred here.

Rayman · 2015-10-09 10:22

I suppose debugging ability is worth some pain...

Moving initial cog load to $800 sounds fine.
You could also launch it with HUBEXEC if you wanted to, I think...

What are these "hub $00000..$0003F = 16 special longs that create r/w events"?
Is that a new idea?

Doesn't it make sense to keep lowest RAM clear for table usage?
Can you put them after $43F?

cgracey · 2015-10-09 10:30

evanh wrote: »

Chip,
I have a distinct dislike for System Management Mode on the PC. I'm not sure I want an equivalent being bred here.

That doesn't sound fun.

What do you guys think about these debug hooks? Is it the wrong approach? Is your dislike centered on the awkward presence of those 16 longs in lower hub space, or the concept, itself?

Rayman · 2015-10-09 10:36

I like the idea, just wish it was more transparent...
Can ROM not set these longs for us, so we don't need to include it in code?

cgracey · 2015-10-09 10:38

Rayman wrote: »

I suppose debugging ability is worth some pain...

Moving initial cog load to $800 sounds fine.
You could also launch it with HUBEXEC if you wanted to, I think...

What are these "hub $00000..$0003F = 16 special longs that create r/w events"?
Is that a new idea?

Doesn't it make sense to keep lowest RAM clear for table usage?
Can you put them after $43F?

When cogs read or write those first 16 longs, events are triggered among all cogs. It's like a mailbox function with full handshake. I think they need to stay at the bottom of hub memory, since they have hardwired special functions.

cgracey · 2015-10-09 10:39

Rayman wrote: »

I like the idea, just wish it was more transparent...
Can ROM not set these longs for us, so we don't need to include it in code?

Of course! We just don't have that kind of context running, at this stage. When you download, you are overwriting the entire hub memory, so you must include those 16 instructions.

Rayman · 2015-10-09 11:13

I guess I mean less visible, not transparent...

Can you do this in PNut?:

reti0 [16]

Tubular · 2015-10-09 11:43

I think it's just hard to get a handle on the implications of such a potent feature, from where we are now. However that in itself is a good reason to include it in the fpga image now, so we can live with it a while, and see further ahead.

It's also going to speed up testing and faultfinding

Another question is whether a cog can detect whether it's being debugged, e.g. by checking the system CNT increment amount over some normally fixed interval, to see whether cycles are 'leaking' due to a debugger. Of course the debugger could get sophisticated too, and return offset-corrected CNT values. These kinds of alternating layers of countermeasures are probably healthy, I believe.

I suspect we'll all be as enthusiastic about it as you are Chip, once we've experienced the benefits first hand.

Rayman · 2015-10-09 13:14

I think this is the same:
' Cog 0..15 initial debug interrupt vectors
'
orgh $400
long $FABBFFFF [16]

Still don't see why the special longs to create r/w events need to be at $0000.
But, I can see them being very useful.
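For reference, the 16 vector longs at $400 can be modelled as follows. This Python sketch assumes one long per cog in cog order, so that cog n's debug interrupt instruction sits at $400 + 4*n. The thread gives the $00400..$0043F range and the "Cog 0..15" comment, but the per-cog ordering is an assumption, and `debug_vector_addr` is a hypothetical name.

```python
DEBUG_VECTOR_BASE = 0x400  # start of the 16 cog-start debug instructions
BYTES_PER_LONG = 4


def debug_vector_addr(cog):
    """Hub byte address of the debug vector for a cog (assumed cog order)."""
    if not 0 <= cog <= 15:
        raise ValueError("cog must be 0..15")
    return DEBUG_VECTOR_BASE + cog * BYTES_PER_LONG


print(hex(debug_vector_addr(0)))   # 0x400
print(hex(debug_vector_addr(15)))  # 0x43c, last long of $00400..$0043F
```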

Seairth · 2015-10-09 13:31

One other thing I don't like about this new scheme is that every coginit will now take an additional 8-10 clock cycles (up to 16-18 cycles) since it must now switch temporarily to hubexec mode and fill the instruction cache. I was looking forward to using coginit to restart a cog at various entry points without reloading code. That debug interrupt can be some significant overhead (not to mention variability).

Because of this, I still think it's a good idea to enable the debug interrupt via a bit in D.

Seairth · 2015-10-09 13:50

What is the format of D/# for SETBRK?

ozpropdev · 2015-10-09 13:52

Chip
I've got my code up and running again. I had to change to hubexec, though, to get it going.
I don't seem to be able to launch code in cogexec mode.
Here's test code to show what I'm seeing.

dat		orgh	0
		org

		setb	dirb,#8
		setb	outb,#8

		loc	adra,#code2
		coginit	#2,adra		'fails to start - cogexec

		loc	adra,#code3
		coginit	#32+3,adra	'starts ok - hubexec

here		jmp	#here


		orgh	$400
'
' Cog 0..15 initial debug interrupt vectors
'

		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0

		org			'cog code
code2		setb	dirb,#10
		setb	outb,#10
me		jmp	#me

		orgh			'hub code
code3		setb	dirb,#11
		setb	outb,#11
me2		jmp	#me2

Rayman · 2015-10-09 14:05

If I change your org to orgh above "code2" it seems to work (?)

At least, I get LED10 to light...

ozpropdev · 2015-10-09 14:08

Hmmm
Changing this line seems to fix the problem

		loc	adra,#@code2

#@ for cogexec , # for hubexec ???
It's getting late...time for sleep.

Rayman · 2015-10-09 14:13

#@ appears to work for both hub and cog exec...

cgracey · 2015-10-09 14:33

#@label should get the absolute address in hub.

cgracey · 2015-10-09 14:38

ozpropdev wrote: »

Chip
I've got my code up and running again. I had to change to hubexec, though, to get it going.
I don't seem to be able to launch code in cogexec mode.
Here's test code to show what I'm seeing.

dat		orgh	0
		org

		setb	dirb,#8
		setb	outb,#8

		loc	adra,#code2
		coginit	#2,adra		'fails to start - cogexec

		loc	adra,#code3
		coginit	#32+3,adra	'starts ok - hubexec

here		jmp	#here


		orgh	$400
'
' Cog 0..15 initial debug interrupt vectors
'

		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0
		reti0

		org			'cog code
code2		setb	dirb,#10
		setb	outb,#10
me		jmp	#me

		orgh			'hub code
code3		setb	dirb,#11
		setb	outb,#11
me2		jmp	#me2

When you do that LOC, the address operand label needs to be declared under an ORGH, before the ORG. Or, just use #@label.

cgracey · 2015-10-09 14:40

Seairth wrote: »

What is the format of D/# for SETBRK?

I'll document this after I sleep. I'll make some new FPGA files, too.

Ale · 2015-10-09 15:26

Wouldn't it be better to access debug functionality through special opcodes? Flags that have to be set, and address registers?
After years of programming various processors in assembly, I tend to favour orthogonal, flat models. If the bootloaderthingy has to live at 0x800, then so be it. I'm glad uneven addresses are gone. org/orgh seem a bit odd. sections sound a bit better...

Seairth · 2015-10-09 15:36

Ale wrote: »

Wouldn't it be better to access debug functionality through special opcodes? Flags that have to be set, and address registers?
After years of programming various processors in assembly, I tend to favour orthogonal, flat models. If the bootloaderthingy has to live at 0x800, then so be it. I'm glad uneven addresses are gone. org/orgh seem a bit odd. sections sound a bit better...

Uneven addressing is still possible for hub instructions.

Roy Eltham · 2015-10-09 16:28

Chip,
can the r/w event longs and the debug instruction longs all be moved to the end of hub space?

I think that would be the least painful/awkward setup. Then the cog image can go at 0 in hub space again.

It seems really odd to have something at $400 hub space...

Seairth · 2015-10-09 16:50

Roy Eltham wrote: »

Chip,
can the r/w event longs and the debug instruction longs all be moved to the end of hub space?

I think that would be the least painful/awkward setup. Then the cog image can go at 0 in hub space again.

It seems really odd to have something at $400 hub space...

I think he considered that (I'm losing track). The issue is that those addresses are not initialized from the code image when the chip boots.

Electrodude · 2015-10-09 17:12

Instead of putting the debug interrupt instructions in hubram, requiring a switch to hubexec and a streamer load on every coginit and requiring certain locations in hubram to contain certain values, how hard would it be to run a 32 bit global bus to every cog with the debug instruction, wired up the same as CNT is? This would mean every cog would use the same debug instruction as I mentioned before. The instruction would be stored in a register in the hub that would default to RETI0 on boot and would be settable by cogs via an instruction.
