Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

cgracey · 2017-11-12 19:48

Sorry, I meant 16KB.

potatohead · 2017-11-12 20:10

16K of vectors!!! No, it's just 16k, pretty sure it's a typo.

potatohead · 2017-11-12 20:13

msrobots wrote: »

FC000...FFFFF is 16K

is it maybe possible to put it at the end of the long address space?

It could be reached with 'negative' addresses, and flow over to address 0? So sort of be continuous?

FFFFC000-FFFFFFFF

Mike

I don't know. Kind of like it where it is. Flowing around 0 is just asking for trouble, if you ask me.

cgracey wrote: »

Sorry, I meant 16KB.

Didn't refresh!

ozpropdev · 2017-11-12 22:19

Chip
BeMicro_A9_Prop2_v27y.jic is still a flat liner.

cgracey · 2017-11-12 23:17

ozpropdev wrote: »

Chip
BeMicro_A9_Prop2_v27y.jic is still a flat liner.

Thanks for testing it. This just doesn't make any sense.

cgracey · 2017-11-12 23:19

potatohead wrote: »

16K of vectors!!! No, it's just 16k, pretty sure it's a typo.

The debug interrupt vectors start at the last long ($FFFFC) for cog0 and go down. There's one for each COG.
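
The address arithmetic described here can be sketched as follows. This is a minimal illustration, assuming one long (4 bytes) per cog counted down from $FFFFC; the helper name `debug_vector_address` is hypothetical, and the default of 16 cogs is an assumption taken from the "last 16 longs" mentioned later in the thread.

```python
HUB_TOP_LONG = 0xFFFFC   # last long of the 1MB hub map, used by cog 0
LONG_BYTES = 4           # one vector long per cog


def debug_vector_address(cog: int, cogs: int = 16) -> int:
    """Hub address of the debug vector for `cog` (hypothetical helper).

    `cogs` is an assumed parameter, not a confirmed hardware constant.
    """
    if not 0 <= cog < cogs:
        raise ValueError("cog number out of range")
    return HUB_TOP_LONG - cog * LONG_BYTES


# Print the first few vector addresses, counting down from the top.
for cog in range(4):
    print(f"cog{cog}: ${debug_vector_address(cog):05X}")
```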

potatohead · 2017-11-12 23:41

Yes, and that makes great sense. Jmg's post implied 16k for vectors. !!! That's a freaking ton of vectors. Kind of funny to think about.

jmg · 2017-11-13 00:21

potatohead wrote: »

Yes, and that makes great sense. Jmg's post implied 16k for vectors. !!! That's a freaking ton of vectors. Kind of funny to think about.

Not really, that's 16k simply because that's the next increment in memory size. The vectors actually use only a tiny portion of that.
The rest is valid variable or code space.

Some have indicated that locking the vectors is less than ideal for debug, and moving them out of the locked ROM block
is one way to solve that.

potatohead · 2017-11-13 01:14

I think that too, but I don't think it's worth extending to 32k, which is a pretty big chunk of RAM. 16k seems like an amount that won't cost us too much. Global variables, data structures, other things can be packed in there with no real impact, when the region isn't being used as a kernel, OS, or support software.

I also think the possible solutions are pretty reasonable and workable.

They can be redirected, at one instruction cost, or a routine can update them at a few instruction cost.

Or, don't write inhibit.

Edit: It's also nice to have those vectors be write inhibited. The code in the region owns them, and that can make recovery from ugly bugs possible. Bonus for debug tools written to run there.

cgracey · 2017-11-13 02:19

The reason it's 16KB is because that's the ROM size. We had 8 longs up there to cover the debug interrupt instructions, but I expanded the area to accommodate the whole ROM, then added a write-protect mechanism. I think it's just peachy now. We've got the ROM loading into it in just 1.5ms on boot.

I used to have memory wrap, but that caused some real gotchas. Now, there's a big gap that reads $00's and ignores writes. That is safe and reasonable, I think.
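
A toy model of the map described above may make the behavior easier to picture. It is only a sketch under stated assumptions: main RAM starts at $00000, the 16KB block sits at $FC000, and everything in between reads $00 and ignores writes. The class `HubModel` and its `top_protected` flag are hypothetical names, and the real write-protect mechanism is not specified in the thread.

```python
MAP_SIZE = 0x100000   # 1MB hub address map ($00000-$FFFFF)
TOP_BASE = 0xFC000    # start of the top 16KB block


class HubModel:
    """Toy model of the hub map; not the actual hardware logic."""

    def __init__(self, ram_size=512 * 1024):
        # ram_size is assumed to be no larger than TOP_BASE.
        self.ram = bytearray(ram_size)             # main RAM from $00000 up
        self.top = bytearray(MAP_SIZE - TOP_BASE)  # 16KB block at $FC000
        self.top_protected = False                 # hypothetical protect flag

    def read(self, addr):
        if not 0 <= addr < MAP_SIZE:
            raise ValueError("address outside the 1MB map")
        if addr < len(self.ram):
            return self.ram[addr]
        if addr >= TOP_BASE:
            return self.top[addr - TOP_BASE]
        return 0x00                                # the gap reads as $00

    def write(self, addr, value):
        if not 0 <= addr < MAP_SIZE:
            raise ValueError("address outside the 1MB map")
        if addr < len(self.ram):
            self.ram[addr] = value & 0xFF
        elif addr >= TOP_BASE and not self.top_protected:
            self.top[addr - TOP_BASE] = value & 0xFF
        # Writes to the gap, or to a protected top block, are ignored.


hub = HubModel()
hub.write(0x80000, 0x55)         # lands in the gap
print(hex(hub.read(0x80000)))    # the gap still reads zero
```

Passing a smaller `ram_size` models boards with less hub RAM; the gap simply grows.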

ozpropdev · 2017-11-13 05:18

Chip
Is it possible to tweak Pnut to allow loads into the top 16k ($FC000-$FFFFF)?
This would help Nano users in particular, who have lost half their hub memory.

For example, the following code compiles OK in Pnut but never loads the top memory.
Ctrl-M shows the object file is correct, although the "OBJ bytes" value is weird.

OBJ bytes: :32,224

_CLKMODE: 00
_CLKFREQ: 00B71B00

00000- 00 FE 65 FD 3E F8 0C FC 38 5B 81 FF 3E 0E 1C FC   ..e.>...8[..>...
00010- 41 7C 64 FD 5A 62 82 FF 28 00 64 FD 00 C0 CF FE   A|d.Zb..(.d.....
00020- 04 00 B0 FD EC FF 9F FD 61 ED CF FA 2D 00 64 AD   ........a...-.d.
00030- 3E 00 9C FA F8 FF 9F CD 3E EC 27 FC E8 FF 9F FD   >.......>.'.....
00040- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
.
.
.
FBFD0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
FBFE0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
FBFF0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
FC000- 48 65 6C 6C 6F 20 77 6F 72 6C 64 2C 20 00 00 00   Hello world, ...
FC010- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................

And here is the source program:


CON
	sys_clk = 80_000_000
	baudrate = 115_200
	tx_pin = 62
	nco = (sys_clk / baudrate) * $1_0000 & $FFFFFC00 
	nco_f = (sys_clk - ((sys_clk / baudrate) * baudrate)) * 64 / baudrate

DAT		org

		clkset	#$ff
		wrpin	#%1_11110_0,#tx_pin
		wxpin	##nco | nco_f << 10 |7,#tx_pin
		dirh	#tx_pin

loop		waitx	##sys_clk
		loc	ptra,#@message
		call	#print
		jmp	#loop

print		rdbyte	pa,ptra++ wz
	if_z	ret
send_char	rdpin	0,#tx_pin wc
	if_c	jmp	#send_char
		wypin	pa,#tx_pin
		jmp	#print

		orgh	$fc000
message		byte	"Hello world, ",0
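
The baud-rate constants in the CON block can be checked outside the assembler. The sketch below repeats the same integer arithmetic in Python with the grouping written out explicitly. It assumes the smart pin's X value holds whole clocks per bit in its upper 16 bits, a 6-bit fraction in bits 15..10, and the data-bit count minus one in the low bits, which is how the `wxpin` operand in the listing appears to be built; treat that layout as an assumption rather than a specification.

```python
sys_clk = 80_000_000
baudrate = 115_200

# Whole clocks per bit, placed in the upper 16 bits of the value.
nco = ((sys_clk // baudrate) * 0x1_0000) & 0xFFFFFC00

# Remainder of the division, scaled to a 6-bit (1/64) fraction.
nco_f = (sys_clk - (sys_clk // baudrate) * baudrate) * 64 // baudrate

# Same combination as the wxpin operand: nco | nco_f << 10 | 7
x_value = nco | (nco_f << 10) | 7

# Effective baud rate implied by the whole + fractional clocks per bit.
clocks_per_bit = (nco >> 16) + nco_f / 64
effective_baud = sys_clk / clocks_per_bit

print(f"nco=${nco:08X} nco_f={nco_f} X=${x_value:08X}")
print(f"effective baud is about {effective_baud:.1f}")
```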

jmg · 2017-11-13 05:37

ozpropdev wrote: »

Chip
Is it possible to tweak Pnut to allow loads into the top 16k ($FC000-$FFFFF)?

Does P2load work ?

Seems pnut should allow the full 1M download, as some FPGAs do have that ?
It could easily report the memory slice sizes, on a simple bottom-up and top-down data inspection.

ozpropdev · 2017-11-13 05:48

jmg wrote: »

Seems pnut should allow the full 1M download, as some FPGAs do have that ?
It could easily report the memory slice sizes, on a simple bottom-up and top-down data inspection.

The big A9 FPGAs have been brought back to 512k to represent the final silicon.

jmg · 2017-11-13 05:54

ozpropdev wrote: »

The big A9 FPGAs have been brought back to 512k to represent the final silicon.

Sure, but that's only a software setting and easily changed.
Once the chip is released, some may want to use 1M FPGAs for development platforms.

Cluso99 · 2017-11-13 08:51

What is wrong with putting this 16KB in the bottom 16KB of HUB RAM?

The JMP vectors could be placed from $0_1000 upwards (they need to be >= $0_1000 to run hubexec).

We would like to be able to unlock the lower 128 byte block due to rdxxxx/wrxxxx immediate access.

Maybe the write protection could be done in 4x 4KB blocks?

This way, hub is still contiguous. The ROM is copied into the bottom 16KB of hub. If necessary, the user program can move any/all of this to wherever. It permits a much better use of hub for buffers (particularly screen buffers). It also allows subsequent P2 extended versions to have >1MB of hub (with some caveats due to instruction bits).

cgracey · 2017-11-13 15:31

Whichever the case, the debug interrupt instructions (1 per cog) ought to be placed at one end of memory or the other, as they must be in fixed locations. It seems to me that there is a need for 16KB of write-protectable RAM, as well, and that might as well overlap the debug interrupt instructions, as they need write-protecting, too.

Locating all that at $00000 would be cleaner and would not bifurcate hub memory, but then programs could not start at $00000, like they do now. I think it is nice that beginners get to orient their programs at the start of memory. They can be oblivious, for a time, about things like protected memory at the end of the map which contains the debug interrupt instructions.

Anyway, I think much deference should be given to locating applications at $00000. The end of memory can almost be forgotten about, while frontloading the protected area makes it stick out like a sore thumb. It's kind of like giving your guest the best seat in the room.

potatohead · 2017-11-13 15:38

I really like where it is right now.

Agreed with the sore thumb perception. Doing it this way keeps the number of things one must know to get started down lower.

Write protecting the vectors makes a ton of sense. It's an opportunity for those to be managed by system code, should it be in play.

cgracey · 2017-11-13 15:48

16KB is ~3% of 512KB. Not a big deal, anyway.

potatohead · 2017-11-13 16:05

Yup.

msrobots · 2017-11-13 17:22

My first assumption was that the ROM would be loaded from 0 upwards, containing booter and SHA-stuff.

I am delighted about the current solution. But even with a gap in between, it needs to be possible to load something there while booting.

The simplest way would be to allow Pnut and P2load to load the complete address space even if no RAM is present. Can do no harm?

@ozpropdev's example (as usual) is clean to read, and doing an ORG $FC000 for higher RAM or an ORG $FFFFC to write a debug vector makes good sense to any assembler programmer.

But if one wants to use the ROM content and set a debug vector while loading a binary he will need to include the ROM content of the upper area, thus loading a 1MB image.

The other solution would be a change in the binary format, instead of saving a copy of the RAM image, saving every ORG based block with address to load to and size.

Then the P2 booter would need to walk down the list and load each block at each address.

Mike
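
The block-list idea from the post above can be sketched as follows. The layout is purely hypothetical: each block is a little-endian 32-bit load address and a 32-bit byte count, followed by the data. It is not an existing PNut, P2load, or boot ROM format, and the names `pack_blocks` and `load_blocks` are made up for illustration.

```python
import struct

# Hypothetical block header: load address, byte count (both little-endian).
HEADER = struct.Struct("<II")


def pack_blocks(blocks):
    """Serialize (address, data) pairs into one image."""
    out = bytearray()
    for address, data in blocks:
        out += HEADER.pack(address, len(data))
        out += data
    return bytes(out)


def load_blocks(image, memory):
    """Walk the block list and copy each block to its load address."""
    offset = 0
    while offset < len(image):
        address, size = HEADER.unpack_from(image, offset)
        offset += HEADER.size
        chunk = image[offset:offset + size]
        if len(chunk) != size or address + size > len(memory):
            raise ValueError("truncated block or address out of range")
        memory[address:address + size] = chunk
        offset += size
    return memory


# A few code bytes at $00000 plus the message at $FC000, as in the example.
image = pack_blocks([
    (0x00000, bytes.fromhex("00FE65FD")),
    (0xFC000, b"Hello world, \x00"),
])
hub = load_blocks(image, bytearray(0x100000))
print(len(image), hub[0xFC000:0xFC00D])
```

With this layout the image carries only the bytes that are actually used, plus 8 bytes of header per block, instead of a zero-filled 1MB file.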

potatohead · 2017-11-13 17:30

Can't a second stage loader do that?

Not that I mind an upgrade.

Personally, I would prefer it load at addresses contained in the format, just like the P2 monitor would do on data cut n paste. That kind of thing rocks.

We should do it.

Programmers set ORG where needed, go. Simple, lean, fast, robust.

I dislike having to push a whole megabyte when it's just not gonna get used.

And, if we support ORG blocks, developers can still push a megabyte and zero / data fill the gaps in the image if they want or somehow need to.

jmg · 2017-11-13 18:43

cgracey wrote: »

....
Locating all that at $00000 would be cleaner and would not bifurcate hub memory, but then programs could not start at $00000, like they do now. I think it is nice that beginners get to orient their programs at the start of memory. They can be oblivious, for a time, about things like protected memory at the end of the map which contains the debug interrupt instructions.
...

Other MCUs have reset/interrupts at 0000, which means you always know where they are, no matter what future memory size you may have.
If you cannot add 16k of memory above 512k, you are forcing a split on what was a clean binary block, & then I'd say placing the ROM at 00H becomes more important. ( That split has already bitten written code.. )

With other MCUs the offsets are largely managed by the tools (so invisible to any beginner), and you can use segments in assembler, so that CSEG ORG 00 is still the first byte of code...
( in P2, first byte of HUB code would be something like HSEG 00 ?)

You would probably want a ROM segment in the Assembler, no matter where the base of that is.

jmg · 2017-11-13 18:48

msrobots wrote: »

The simplest way would be to allow Pnut and P2load to load the complete address space even if no RAM is present. Can do no harm?

I'm guessing P2load already does that, and pnut certainly should be fixed.

msrobots wrote: »

But if one wants to use the ROM content and set a debug vector while loading a binary he will need to include the ROM content of the upper area, thus loading a 1MB image.

The other solution would be a change in the binary format, instead of saving a copy of the RAM image, saving every ORG based block with address to load to and size.
Then the P2 booter would need to walk down the list and load each block at each address.

That's called Intel HEX.

Certainly, you do not want to be sending a large 1MB blob & even many files of 1MB are less than ideal...
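
For reference, here is a minimal sketch of how sparse blocks look in Intel HEX, using data records (type 00), extended linear address records (type 04) for the upper 16 address bits, and the end-of-file record (type 01). The function names are hypothetical, and whether any P2 loader accepts this format is not established in the thread.

```python
def hex_record(rec_type, address, data=b""):
    """Build one Intel HEX record: :LLAAAATT<data>CC."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, rec_type])
    body += data
    checksum = (-sum(body)) & 0xFF   # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def to_intel_hex(blocks, row=16):
    """Convert (address, data) pairs to a list of Intel HEX lines.

    Simplification: assumes no row straddles a 64KB boundary.
    """
    lines = []
    upper = None
    for base, data in blocks:
        for i in range(0, len(data), row):
            addr = base + i
            if addr >> 16 != upper:
                # Emit a type 04 record when the upper 16 bits change.
                upper = addr >> 16
                lines.append(hex_record(0x04, 0, upper.to_bytes(2, "big")))
            lines.append(hex_record(0x00, addr & 0xFFFF, data[i:i + row]))
    lines.append(hex_record(0x01, 0))   # end-of-file record
    return lines


for line in to_intel_hex([(0xFC000, b"Hello world, \x00")]):
    print(line)
```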

msrobots · 2017-11-13 19:00

Sure a second stage loader could do that also, but then you would NEED a second stage loader to access the upper ROM/RAM.

Would it be possible for either the address counter to wrap at $FFFFF, or for the 16K to be placed at $FFFFC000 so it wraps at the long boundary?

Then a loader could load a continuous image, say ORG $FFFFFFFC to set a debug vector and then the program image follows at address 0?

That would allow loading continuously starting at the debug vectors, leaving BIOS/ROM/RAM unchanged, or starting at 0 without changing the upper RAM, or starting at (FFF)FC000 to load a continuous image in one block?

just asking,

Mike
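
The wrap-around being asked about is ordinary 32-bit modular arithmetic. The sketch below only illustrates the question; it does not describe how the hardware behaves, and earlier in the thread Chip notes that memory wrap was removed. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def next_long(addr):
    """Address of the following long, wrapping at the 32-bit boundary."""
    return (addr + 4) & MASK32


def as_signed32(addr):
    """Interpret a 32-bit address as a signed ('negative') offset."""
    return addr - (1 << 32) if addr & 0x80000000 else addr


# A vector at $FFFFFFFC would be followed directly by address 0.
print(hex(next_long(0xFFFFFFFC)))

# $FFFFC000 is 16KB below the wrap point when read as a signed value.
print(as_signed32(0xFFFFC000))
```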

cgracey · 2017-11-13 19:18

I'll make PNut.exe, for now, just load up to $FFFBF, if there's data ORGH'd up that high. That will protect the last 16 longs, which are the debug interrupt instructions.
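
The $FFFBF limit follows from the size of the protected area: 16 longs of 4 bytes each at the very top of the 1MB map. A small sketch of that calculation, with a hypothetical `clip_for_load` helper showing how a loader might trim a full image (this is an illustration, not PNut's actual code):

```python
MAP_SIZE = 0x100000                    # 1MB hub address map
PROTECTED_LONGS = 16                   # debug interrupt instructions
LOAD_LIMIT = MAP_SIZE - PROTECTED_LONGS * 4   # first protected byte, $FFFC0


def clip_for_load(image: bytes) -> bytes:
    """Drop any bytes that would land in the protected longs."""
    return image[:LOAD_LIMIT]


print(f"last loadable byte: ${LOAD_LIMIT - 1:05X}")
print(f"loadable bytes from a full image: {len(clip_for_load(bytes(MAP_SIZE)))}")
```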

I still need to get this BeMicro-A9 problem solved, somehow. I could just make two different images, but that seems ridiculous.

msrobots · 2017-11-13 19:20

Why protect the debug vectors?

Debug is mostly used in development, so uploading an image with an activated debug vector might come in handy?

Mike

Cluso99 · 2017-11-13 19:26

While you can place code currently starting at $0_0000, users cannot run code from there (hubexec) due to mapping of the cog and lut addresses for the program counter.
So that has to be explained.

Why is that any different from explaining that their hubexec code starts at $0_1000, with the first $xxx bytes reserved for the interrupt vectors,
and that the ROM is initially copied to $0_0000-$0_3FFF (bottom 16KB of HUB RAM)?

The pnut2 (or whatever) compiler could default to compile at ORGH $0_4000.

These days, memory maps on micros are often quite complex, with maps including ram, bootloaders, flash, and eeprom, registers, etc.

The P2 would still be extremely simple, and wouldn't require the hub to be broken into two blocks, just one contiguous block. This is far superior, especially for some of the proposed later versions with fewer cogs that most likely will have smaller hub RAM.

Contiguous memory is IMHO always better. Think VGA where you want a large frame buffer. In this P2, you have a max frame buffer size of 512KB-16KB= 496KB.
A 256KB P2 would have a max buffer of 240KB, and a 128KB would give 112KB.
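
The buffer sizes quoted above are simply the hub size minus the 16KB reserved block. A quick check of that arithmetic, with the hub sizes taken from the post and a hypothetical helper name:

```python
RESERVED_KB = 16   # ROM / write-protectable block discussed in the thread


def max_contiguous_kb(hub_kb: int) -> int:
    """Largest single buffer left once the 16KB block is set aside."""
    return hub_kb - RESERVED_KB


for hub_kb in (512, 256, 128):
    share = 100 * RESERVED_KB / hub_kb
    print(f"{hub_kb}KB hub: {max_contiguous_kb(hub_kb)}KB free, "
          f"{share:.1f}% reserved")
```

The reserved share grows as the hub shrinks, which is the concern being raised for smaller variants.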

Remember all the old discussions about having a place for mailboxes, etc. These could all fit naturally in the bottom 4KB of Hub below the JMP vectors.

BTW I haven't checked lately. I have assumed the Interrupt Vectors to be physical JUMP instructions. If they are in fact just addresses, they could be placed much lower in Hub, just above the 128 bytes that can be directly accessed using immediate addressing in RDxxxx/WRxxxx instructions.

msrobots · 2017-11-13 19:42

Wasn't there something about hubexec working below $1000, but just on odd addresses?

Or is that gone?

Mike

potatohead · 2017-11-13 21:07

I think that is gone.

jmg · 2017-11-13 21:26

cgracey wrote: »

I still need to get this BeMicro-A9 problem solved, somehow. I could just make two different images, but that seems ridiculous.

Did you check to confirm the DIP sw is actually wired as expected ? - can you activate some other pin, based on the DIP setting to confirm - even using a similar equation syntax, in case Altera gets confused there ?
