Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

potatohead · 2015-10-09 17:14

Maybe move them to the top of RAM, and change the internal logic that copies the ROM to RAM. Just have it start below $0 and let it wrap around to complete the ROM copy, so those get initialized right away and the ROM programs can start. After that, the program users load can just write those values as needed, and there could even be a routine in the ROM that one can easily call to do it.

msrobots · 2015-10-09 17:49

Good plan @Potatohead.

Not sure if this is doable with the eggbeater hub, but it opens up the possibility of having more hub later on without moving things, and 'normal' hub can start at ORG $0 again.

Starting at $400 ($800?) would be somewhat depressing. Even DOS only needed ORG $100...

Chip mentioned that these are special hard-coded(?) RAM locations, not easy to move around.

Enjoy!

Mike

potatohead · 2015-10-09 18:00

I think we need to keep this and do the work to have it make sense.

I do worry a little about the locations getting trashed, but not much.

jmg · 2015-10-09 19:04

cgracey wrote: »

Here's a problem:

MOV reg,#cogroutine
JMP reg

Revisiting this, my earlier suggestion was close, but there is a better one.

Moving this address uncovers an asymmetry in the P2: immediate address loads can be compact in only one place.

What is needed is a Short Immediate Relative opcode.

P2 already has adders in two paths:

    RJMP    Reg       PC := PC + Reg (signed)
    MOV     Reg,i9    Reg := i9
    ADD     Reg,i9    Reg := Reg + i9
    SUB     Reg,i9    Reg := Reg - i9

What is needed is a morph of the above, using one of those existing adders, for:

    MOVADR  Reg,Label(+)    Reg := PC + i9
    MOVADR  Reg,Label(-)    Reg := PC - i9

Down from the 3 opcodes above, this is just 2, and they work over the whole address space: the PC provides the upper address bits.

The assembler can even use a single MOVADR mnemonic and choose the +i9, -i9, or a20 form, transparently to the user.

This Short Immediate Relative opcode allows any local variable address to be loaded in one opcode, in any memory area.
LUT and HUB code get smaller, and COG code can be placed anywhere.
HLLs will benefit, as local variables do not have to use COG registers.
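To make the assembler-side choice concrete, here is a minimal Python sketch of how a single MOVADR mnemonic could be resolved to the +i9, -i9, or a20 form. MOVADR is only a proposal in this thread, so the function names, the unsigned 9-bit offset range (0..511), and the 20-bit a20 operand are assumptions for illustration, not a real P2 tool.

```python
# Hypothetical sketch: resolving a single MOVADR mnemonic to one of three forms.
# Assumptions: i9 is an unsigned 9-bit magnitude (0..511) applied as PC + i9
# or PC - i9, and "a20" is a long form carrying a full 20-bit address.

I9_MAX = 0x1FF          # largest 9-bit immediate
ADDR_MASK = 0xFFFFF     # 20-bit (1 MB) address space


def choose_movadr_form(pc: int, target: int) -> tuple[str, int]:
    """Return (form, operand) for loading `target` relative to `pc`."""
    offset = target - pc
    if 0 <= offset <= I9_MAX:
        return ("+i9", offset)            # Reg := PC + i9
    if -I9_MAX <= offset < 0:
        return ("-i9", -offset)           # Reg := PC - i9
    return ("a20", target & ADDR_MASK)    # too far: fall back to the long form


def execute(form: str, operand: int, pc: int) -> int:
    """Model the value each form would load into Reg."""
    if form == "+i9":
        return (pc + operand) & ADDR_MASK
    if form == "-i9":
        return (pc - operand) & ADDR_MASK
    return operand


print(choose_movadr_form(0x12345, 0x12400))  # ('+i9', 187)
```

Because the short forms are relative, the same source resolves to the same short encoding wherever the code lands, as long as the label stays within 511 addresses of the instruction.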

cgracey · 2015-10-09 20:16

jmg wrote: »
...
What is needed is a Short Immediate Relative opcode.
...

Neat! What about having these instructions only modify the lower 9 address bits of the PC?

jmg · 2015-10-09 20:23

cgracey wrote: »
jmg wrote: »
...
What is needed is a Short Immediate Relative opcode.
...
Neat! What about having these instructions only modify the lower 9 address bits of the PC?

Do you mean simply replacing the lower bits?
I thought about that (it is simpler), but it is boundary-sensitive, and as code moves about in memory, it would change size.
If the adder is too hard to patch in, the simple replace would work on COG and LUT, and would need user alignment in HUB (which users can do on important ASM code, but not so easily in HLL code).

With an assembler that implements either the short or long form, source would not need to change, but the slight speed change could disturb very timing-critical code.
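The boundary sensitivity jmg describes can be shown with a small Python model comparing the two proposed behaviours: substituting the lower 9 bits of the PC versus adding a signed 9-bit offset. Both helpers are hypothetical and assume 20-bit addresses and a 9-bit immediate.

```python
# Hypothetical comparison of "replace the lower 9 bits of PC" versus
# "PC +/- i9". Assumes a 9-bit immediate and 20-bit addresses.

LOW9 = 0x1FF


def replace_low9(pc: int, i9: int) -> int:
    """Keep PC's upper bits and substitute the lower 9 bits."""
    return (pc & ~LOW9) | (i9 & LOW9)


def reachable_by_replace(pc: int, target: int) -> bool:
    """Replacing only reaches targets in the same 512-address block as PC."""
    return (pc & ~LOW9) == (target & ~LOW9)


def reachable_by_add(pc: int, target: int) -> bool:
    """PC +/- i9 reaches anything within 511 addresses, across block edges."""
    return abs(target - pc) <= LOW9


# Same 0x20 distance, but the first pair straddles the $600 block boundary.
print(reachable_by_replace(0x5F0, 0x610), reachable_by_add(0x5F0, 0x610))  # False True
# Move the same code down by $100 and the replace form suddenly fits.
print(reachable_by_replace(0x4F0, 0x510), reachable_by_add(0x4F0, 0x510))  # True True
```

Under the replace scheme, the assembler would have to switch between short and long encodings as code shifts across 512-address boundaries, which is why the code size could change; the adder version does not have that problem.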

Cluso99 · 2015-10-09 21:05

Many of you seem to be worried about starting at hub $00400 or $00800.

However, all modern microcontrollers (with internal memory) seem to have weird memory maps with lots of holes, etc.

Once there is a nice pic of the memory map, everything will be simple. So how about we just give it a try with where Chip currently has the cog/lut/special jump registers, etc.
In fact, for the time being why don't we just start our code at $01000 until the dust settles.

While all these changes keep happening, we are not getting to try out the main logic. In fact, only a few of us can use the current FPGA code; the rest of us are patiently waiting for other FPGA board images.

I am sure once we settle in for a bit, things will be much clearer, and the simple solution will appear like Chip's magic.

PS. I am certain we can simplify the JUMP/CALL/RET instructions (later) too!

PS#2. I am not in favour of having some things in lower hub and some in upper hub. IMHO those wanting large screen mapping will want to use a big chunk of hub, either at the top or bottom of hub. Having bits at both ends might impact this. E.g. a 256KB display buffer needs to be (or at least should be) at one end or the other: $80000..$FFFFF or $00000..$7FFFF.

Rayman · 2015-10-09 21:40

What will change in new fpga image?

Mostly the initial start moving from $000 to $800, right?

Just thinking about the interrupt vectors at $400...
Maybe it's OK, but there also seems to be potential for a difficult-to-diagnose situation if one cog accidentally writes to that area...
Same for those at hub address $000, I suppose.
But I suppose that can just be something you have to be aware of...

It's looking like HUB $800 and above will be safe to use. Below that will be "use with caution"...

I wonder if an alternative to the special registers at $00 in hub is to simply have an event triggered on any write to HUB below $400...
Ok, that's not as good. But, maybe you could divide below $400 into 16 sections with each assigned to a cog?
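Rayman's per-cog split works out to $400 / 16 = $40 bytes (16 longs) per cog. A Python sketch of which cog's event would fire for a given write, assuming cog n owns section n (the thread does not specify an ordering, and this was only a suggestion):

```python
# Hypothetical model of splitting hub bytes $000..$3FF into 16 per-cog sections.
# Assumption: cog n owns section n; writes at or above $400 are not watched.

from typing import Optional

WATCH_LIMIT = 0x400                 # writes below this address are watched
NUM_COGS = 16
SECTION = WATCH_LIMIT // NUM_COGS   # 0x40 bytes = 16 longs per cog


def cog_for_write(addr: int) -> Optional[int]:
    """Return the cog whose event would fire, or None if addr is not watched."""
    if 0 <= addr < WATCH_LIMIT:
        return addr // SECTION
    return None


print(cog_for_write(0x0C4))  # 3
```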

jmg · 2015-10-09 21:58

Cluso99 wrote: »

Many of you seem to be worried about starting at hub $00400 or $00800.
...
PS#2. I am not in favour of having some things in lower hub and some in upper hub. IMHO those wanting large screen mapping will want to use a big chunk of hub, either at the top or bottom of hub. Having bits at both ends might impact this. eg A 256KB display buffer needs, or at least should be, at one end or the other - $80000..$FFFFF or $00000..$7FFFF.

I agree, and the smarter Short Immediate Relative opcode is pivotal in allowing HUB to be placed anywhere.
I think that could even allow all 512K of HUB to be cleanly available, with no missing slices or caveats, by placing LUT:COG at the very top of the 1M space?

cgracey · 2015-10-09 22:04

potatohead wrote: »

Maybe move them to the top of RAM, and change the internal logic that copies the ROM to RAM. Just have it start below $0 and let it wrap around to complete the ROM copy, so those get initialized right away and the ROM programs can start. After that, the program users load can just write those values as needed, and there could even be a routine in the ROM that one can easily call to do it.

We could easily put them at the top of hub. No need for ROM to start there:

	setq	#15
	wrlong	##code_for_reti0,[ptra-16]	'ptra=0

That's all it would take. Now, if nobody would clobber them...
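Chip's two-line snippet depends on the hub address wrapping around below $0. A Python sketch of the arithmetic, assuming SETQ #15 makes WRLONG write 15+1 = 16 longs and that the PTRA index -16 is scaled by 4 (the long size), which is what the stated range $FFFC0..$FFFFF implies:

```python
# Model of the address arithmetic behind:
#     setq   #15
#     wrlong ##code_for_reti0,[ptra-16]   'ptra=0
# Assumptions: 20-bit hub byte addresses that wrap, SETQ #q -> q+1 longs,
# and the PTRA index scaled by the 4-byte long size.

ADDR_MASK = 0xFFFFF
LONG = 4


def block_write_addresses(ptra: int, index: int, q: int) -> list[int]:
    """Byte addresses of the longs written by a SETQ #q block write at [ptra+index]."""
    start = (ptra + index * LONG) & ADDR_MASK
    return [(start + i * LONG) & ADDR_MASK for i in range(q + 1)]


addrs = block_write_addresses(ptra=0, index=-16, q=15)
print(hex(addrs[0]), hex(addrs[-1] + LONG - 1))   # 0xfffc0 0xfffff
```

Starting 16 longs below address 0 lands exactly on the top 64 bytes of the 1 MB space, so no special ROM copy logic is needed.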

If we moved those event-triggering longs up there, too, that would pretty much clean up the lower hub:

$00000..$001FF = cog exec range
$00200..$003FF = lut exec range
$00400..$FFFFF = hub exec range

$0000..$007FF = initial cog image

$FFF80..$FFFBF = 16 event-triggering longs
$FFFC0..$FFFFF = 16 initial debug interrupt instructions (RETI0's)

What about that? That looks pretty decent, to me. No need to reinvent anything.
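The proposed map can be written down as data and sanity-checked. This Python sketch assumes the exec ranges are PC values (cog and LUT in long addresses, hub in byte addresses) and that the other ranges are hub byte addresses; the helper names are hypothetical.

```python
# Sketch of the proposed memory map as data, with size checks.
# Assumption: exec ranges are PC values; the other ranges are hub byte addresses.

from typing import Optional

EXEC_RANGES = [
    (0x00000, 0x001FF, "cog"),
    (0x00200, 0x003FF, "lut"),
    (0x00400, 0xFFFFF, "hub"),
]

COG_IMAGE = (0x00000, 0x007FF)      # initial cog image in hub (bytes)
EVENT_LONGS = (0xFFF80, 0xFFFBF)    # 16 event-triggering longs
DEBUG_VECTORS = (0xFFFC0, 0xFFFFF)  # 16 initial RETI0 instructions


def exec_mode(pc: int) -> Optional[str]:
    """Return which memory a PC value executes from, or None if out of range."""
    for lo, hi, mode in EXEC_RANGES:
        if lo <= pc <= hi:
            return mode
    return None


def size_in_longs(rng: tuple[int, int]) -> int:
    """Number of 4-byte longs in an inclusive byte-address range."""
    lo, hi = rng
    return (hi - lo + 1) // 4


print(exec_mode(0x1FF), exec_mode(0x200), exec_mode(0x400))  # cog lut hub
```

The size checks show that the initial cog image range $0000..$007FF holds 512 longs, matching the 512-long cog exec range, and that each of the two top-of-hub blocks holds 16 longs.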

Dave Hein · 2015-10-09 22:06

Are the 16 reti0 instructions necessary at ORGH $400? Is ORGH $400 correct? It seems to specify a byte address of $400, which puts it right in the middle of cog memory. Is that really where we want it?

Roy Eltham · 2015-10-09 22:08

cgracey wrote: »
potatohead wrote: »
...
We could easily put them at the top of hub. No need for ROM to start there:
...
What about that? That looks pretty decent, to me. No need to reinvent anything.

Yes, I think this is the best option for this stuff.

Rayman · 2015-10-09 22:08

Looks perfect to me, Chip!

Wait... is $007FF a typo for the initial cog image?
Or, is that what ROM will copy into RAM?

cgracey · 2015-10-09 22:10

Rayman wrote: »

Looks perfect to me, Chip!

Okay. This will be the map, for now, at least. I have a good, settled feeling about this.

potatohead · 2015-10-09 22:15

Works for me.

jmg · 2015-10-09 22:19

cgracey wrote: »

$00000..$001FF = cog exec range
$00200..$003FF = lut exec range
$00400..$FFFFF = hub exec range

$FFF80..$FFFBF = 16 event-triggering longs
$FFFC0..$FFFFF = 16 initial debug interrupt instructions (RETI0's)

What about that? That looks pretty decent, to me. No need to reinvent anything.

... or move LUT:COG to the top and clean up the whole HUB? IIRC there is a 1M address space.

LUT:COG could go at the top of that, which would open up the HUB bottom. I'm not sure if those few INT vectors etc. must be in main HUB, or if they can be a separate memory space just below LUT:COG, giving 100% of the 512K as user-free HUB?

Dave Hein · 2015-10-09 22:38

Dave Hein wrote: »

Are the 16 reti0 instructions necessary at ORGH $400? Is ORGH $400 correct? It seems to specify a byte address of $400, which puts it right in the middle of cog memory. Is that really where we want it?

I suppose my question is embedded in all the discussion about moving the interrupt vectors, so its context may not be clear. I'm trying to get a program working with the latest FPGA image, and the interrupt vectors seem to be right in the middle of my cog code. If I don't turn on interrupts, do I need to specify the interrupt vectors?

EDIT: I suppose I can work around it by making the initial cog program just start a new cog program and locate my second cog program after the interrupt vectors.

jmg · 2015-10-09 22:44

Dave Hein wrote: »

EDIT: I suppose I can work around it by making the initial cog program just start a new cog program and locate my second cog program after the interrupt vectors.

I thought those were located in HUB RAM?
(but you make a good case for avoiding memory overlaps)

cgracey · 2015-10-09 23:24

I'll try to get new FPGA images out tonight, with all this addressing stuff cleaned up, so you won't have to put the interrupt vectors into your code. They will stay snug at the top of hub.

potatohead · 2015-10-09 23:25

Nice! I'm finally free for the weekend.

Dave Hein · 2015-10-10 00:16

jmg wrote: »

Dave Hein wrote: »

EDIT: I suppose I can work around it by making the initial cog program just start a new cog program and locate my second cog program after the interrupt vectors.

I thought those were located in HUB RAM?
(but you make a good case for avoiding memory overlaps)

I haven't really been following the interrupt discussion, but I believe they are in hub RAM. However, if you want to avoid doing a COGINIT, the cog image in hub RAM overlaps that area. That's why it seems like a COGINIT would be required the way it was. The new approach will resolve this issue.

Roy Eltham · 2015-10-10 00:37

I think there is some confusion about what "interrupt" things are where.

In the image you have right now (the 10/8/2015 one), there are 16 longs at the start of hub that trigger an interrupt in the cog if they are written. Then there are 16 longs at $00400 (byte address, $100 long address) that are the debug interrupt instruction thingies. They need to be set to RETI0 before the cogs get started by a COGINIT so that the cogs will run normally; otherwise they need to contain a jump to the code you want to run as a debug session for the given cog. These are separate from the interrupt vectors that are in COG RAM just before the special registers.

Chip is now moving the 16 longs that trigger an interrupt in the COG to the end of hub address space, and he is also moving the 16 debug interrupt instruction thingies to the end of hub address space. That way hub memory is just hub memory. There is still the limitation of hubexec only working for addresses starting at $00400 (byte address). When Chip speaks of the cog image being in lower hub memory, he's just talking about where it happens to be now with his current setup. Once the 16K ROM is fully implemented, it will get loaded into the first 16K of HUB and do its thing; by the time you start actually loading user code from flash (or wherever), it can be wherever you want it to be in hub.

I think things will be a lot more clear/defined/whatever when we get the ROM implemented and the whole boot process going.
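Roy's description of the old layout explains Dave's collision. Assuming the initial cog image is the first $800 hub bytes copied long-for-long into cog registers $000..$1FF, the old debug longs at byte $00400 end up inside the cog program. A Python sketch of that check (helper names are hypothetical):

```python
# Old (10/8/2015) layout check: do the debug longs land inside the initial
# cog image? Assumes the image is hub bytes $000..$7FF, copied long-for-long
# into cog registers $000..$1FF.

COG_IMAGE_BYTES = range(0x000, 0x800)
OLD_DEBUG_BYTES = range(0x400, 0x400 + 16 * 4)   # 16 longs at byte $400


def hub_byte_to_cog_reg(addr: int) -> int:
    """Cog register that receives the long containing hub byte `addr`."""
    return addr // 4


overlap = [a for a in OLD_DEBUG_BYTES if a in COG_IMAGE_BYTES]
first_reg = hub_byte_to_cog_reg(overlap[0])
last_reg = hub_byte_to_cog_reg(overlap[-1])
print(len(overlap) // 4, hex(first_reg), hex(last_reg))  # 16 0x100 0x10f
```

All 16 vectors fall in cog registers $100..$10F, right in the middle of a 512-long cog program, which is why moving them to the top of hub removes the conflict.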

potatohead · 2015-10-10 00:49

Agreed, we are one step before it's really organized. But it's good that we are here too. Differences in this design have played out in some interesting and surprising ways compared to how "hot" got done.

One way to think about this is we are at the level Chip was when he setup "hot" with booter, crypt and monitor.

Dave Hein · 2015-10-10 01:36

So just to be clear, if I do a "jmp #$400" the cog will go into hubexec mode, correct? Does this correspond to byte address $1000 in hub RAM, or is it byte address $400?

evanh · 2015-10-10 01:45

It's byte address $400 in Hub. It would also simultaneously be long address $400 in Cog, but that doesn't exist: cog register addresses stop at $1FF.

cgracey · 2015-10-10 01:55

Those 16 interrupt instructions have been moved up to $FFFC0..$FFFFF.
The 16 event-triggering longs have been moved up to $FFF80..$FFFBF.

The cold-boot ROM in the cog (~10 instructions) writes RETI0's to the top 16 longs in hub.

When you download a program now, those last 16 longs are not loaded from your image. They are left as RETI0's from the cold-boot ROM in the cog. If you want to change them, your code must do it.

I'm almost done compiling for the Prop123-A7 board, then I'll do the DE2-115. I hope to have new files up tonight.
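The new boot behaviour can be modeled in a few lines of Python: cold boot fills the top 16 longs, and a later download writes everything except those longs. The hub is a dict keyed by long-aligned byte address, and RETI0_CODE is a placeholder value, not the actual RETI0 encoding; both are assumptions for illustration.

```python
# Hypothetical loader model: cold boot writes RETI0 into the top 16 longs,
# and a download copies the image but leaves those 16 longs alone.
# RETI0_CODE is a placeholder, not the real instruction encoding.

HUB_TOP = 0x100000          # one past the last byte address
PROTECTED = 16 * 4          # top 16 longs: $FFFC0..$FFFFF
RETI0_CODE = 0xDEADBEEF     # placeholder value


def cold_boot(hub: dict[int, int]) -> None:
    """Fill the top 16 longs with the (placeholder) RETI0 instruction."""
    for addr in range(HUB_TOP - PROTECTED, HUB_TOP, 4):
        hub[addr] = RETI0_CODE


def download(hub: dict[int, int], image: dict[int, int]) -> None:
    """Copy image longs into hub, skipping the protected vector area."""
    for addr, value in image.items():
        if addr < HUB_TOP - PROTECTED:
            hub[addr] = value


hub: dict[int, int] = {}
cold_boot(hub)
download(hub, {0x00000: 0x12345678, 0xFFFC0: 0})
print(hex(hub[0xFFFC0]), hex(hub[0x00000]))  # 0xdeadbeef 0x12345678
```

A program that wants different debug vectors has to write them itself after it starts, since the download never touches that area.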

David Betz · 2015-10-10 01:59

cgracey wrote: »

Those 16 interrupt instructions have been moved up to $FFFC0..$FFFFF.
The 16 event-triggering longs have been moved up to $FFF80..$FFFBF.
...

Are the interrupt vectors actually hub memory or are they registers? Won't fetching them from hub introduce additional and non-deterministic interrupt latency?

jmg · 2015-10-10 02:07

David Betz wrote: »

..Won't fetching them from hub introduce additional and non-deterministic interrupt latency?

I thought those were Debug Support vectors, and there are still COG vectors for 'real' interrupts and COG SFRs inside the COG memory map? (But it is a moving target ATM.)

$FFFFF = 1048575, so it looks like these are also clear of the 512K HUB memory and must be a separate small memory map (or registers)?

Sounding good; it would be nice to move LUT:COG up there too?

FPGAs can support more HUB, and with no hard limits on that 512K, it could be safely bumped with a late change on die routing, should spare die space magically appear (not that that ever happens).
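The arithmetic behind jmg's observation, as a Python sketch. It assumes 512 KB of physical hub RAM starting at $00000 inside a 20-bit (1 MB) address space; whether the top addresses map to separate registers or to something else is not settled in the thread, so this only shows that they lie above the physical RAM.

```python
# Arithmetic behind the "clear of 512K HUB" observation.
# Assumption: 512 KB of hub RAM at $00000 inside a 20-bit address space.

HUB_RAM_BYTES = 512 * 1024        # 0x80000
ADDR_SPACE_TOP = 0xFFFFF          # highest 20-bit address


def beyond_physical_hub(addr: int) -> bool:
    """True if addr is inside the 1 MB space but above the 512 KB of RAM."""
    return HUB_RAM_BYTES <= addr <= ADDR_SPACE_TOP


print(ADDR_SPACE_TOP, hex(HUB_RAM_BYTES - 1), beyond_physical_hub(0xFFF80))
# 1048575 0x7ffff True
```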

potatohead · 2015-10-10 02:32

Lets not move the lut and cog.

potatohead · 2015-10-10 02:34

The COG interrupt vectors are still in the COG and still operate as usual. These are for debug of a COG. Even one you didn't write.
