Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

Seairth · 2018-03-25 00:38

Out of curiosity, what is the clock cycle overhead of entering and exiting the debug routines? Assuming it's low, I wonder if there are other non-debug uses for those resources. This group has come up with some ingenious uses for Propeller resources in the past, and I suspect it will be no different for P2. I have no doubt that every part of the P2 will be utilized, even if not for their intended purpose.

TonyB_ · 2018-03-25 01:02

potatohead wrote: »

When the end user runs the application there won't be any Debug code. Unfortunately there won't be 512KB of hub RAM, either.

Precisely. It's just there. Standard feature, same as any other feature a user may or may not use.

No other feature has a RAM cost when not used. This is unique.

Debugging as it is now has an avoidable downside and the user should be given the choice of turning it and write-protection off and getting the full RAM instead.

cgracey · 2018-03-25 01:09

Okay.

I made the last 16KB of RAM appear at both its normal address range ($7C000..$7FFFF in the case of 512KB) and in the last 16KB of hub space ($FC000..$FFFFF).

When write-protect is enabled, the last 16KB of RAM disappears from its normal address range ($7C000..$7FFFF) and becomes write-protected at $FC000..$FFFFF. Only code running in debug ISRs will be able to write the RAM from $FC000..$FFFFF. All reads from $7C000..$FBFFF will return $00's.

So, you have a big flat space, and when you want to enable strong debug or ROM at the top of memory, the last 16KB goes away from the normal RAM map and is read-only at the top of the hub map.

I think this is a compromise we can all live with. It allows contiguous access to all 512KB of RAM if an app doesn't care about debug.
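Editor's sketch: the mapping described above can be modelled in a few lines of Python. This is only an illustration of the post, not Chip's Verilog. The class and method names (`HubModel`, `read`, `write`, `in_debug_isr`) are hypothetical, and returning $00 for the unmapped range $80000..$FBFFF when write-protect is off is an assumption (the post only states it for the write-protected case).

```python
RAM_SIZE = 512 * 1024     # $80000 bytes of hub RAM
HUB_SPACE = 1024 * 1024   # $100000 bytes of hub address space
TOP_BLOCK = 16 * 1024     # the last 16KB, $4000 bytes


class HubModel:
    """Toy model of the hub map described in the post (hypothetical names)."""

    def __init__(self):
        self.ram = bytearray(RAM_SIZE)
        self.write_protect = False

    def _offset(self, addr):
        """Return the RAM offset behind a hub address, or None if unmapped."""
        if addr >= HUB_SPACE - TOP_BLOCK:
            # $FC000..$FFFFF always aliases the last 16KB of RAM.
            return RAM_SIZE - TOP_BLOCK + (addr - (HUB_SPACE - TOP_BLOCK))
        if addr < RAM_SIZE - TOP_BLOCK:
            return addr                      # ordinary RAM below $7C000
        if addr < RAM_SIZE and not self.write_protect:
            return addr                      # $7C000..$7FFFF, flat map only
        return None                          # hole: reads return $00

    def read(self, addr):
        off = self._offset(addr & (HUB_SPACE - 1))
        return 0 if off is None else self.ram[off]

    def write(self, addr, value, in_debug_isr=False):
        """Return True if the write landed, False if it was blocked."""
        addr &= HUB_SPACE - 1
        off = self._offset(addr)
        if off is None:
            return False
        protected = self.write_protect and addr >= HUB_SPACE - TOP_BLOCK
        if protected and not in_debug_isr:
            return False                     # only debug ISR code may write
        self.ram[off] = value & 0xFF
        return True
```

With write-protect off, a byte written at $7C000 reads back at $FC000. With it on, $7C000 reads $00 and only a write flagged as coming from a debug ISR changes $FC000.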

TonyB_ · 2018-03-25 01:37

Thanks, Chip. I think that should avoid the debugging downside.

Will the 12KB block starting at $7C000 / $FC000 contain code copied from the ROM after booting?

cgracey · 2018-03-25 01:41

TonyB_ wrote: »

Thanks, Chip. I think that should avoid the debugging downside.

Will the 12KB block starting at $7C000 / $FC000 contain code copied from the ROM after booting?

Yes. And you mean 16KB, I assume.

TonyB_ · 2018-03-25 01:55

cgracey wrote: »

TonyB_ wrote: »

Thanks, Chip. I think that should avoid the debugging downside.

Will the 12KB block starting at $7C000 / $FC000 contain code copied from the ROM after booting?

Yes. And you mean 16KB, I assume.

I wasn't sure about how or when the debug buffers are set up so I left out 4KB!

Please don't think some of us are ungrateful so-and-so's. Debugging should be great fun, but it won't be needed all the time. I feel a lot happier now. In fact, I'm rejoicing, very quietly.

Dave Hein · 2018-03-25 02:00

Seairth wrote: »

Out of curiosity, what is the clock cycle overhead of entering and exiting the debug routines? Assuming it's low, I wonder if there are other non-debug uses for those resources. This group has come up with some ingenious uses for Propeller resources in the past, and I suspect it will be no different for P2. I have no doubt that every part of the P2 will be utilized, even if not for their intended purpose.

The debug interrupt could be used by an OS for kernel services. With 16K of protected memory, there could be some features of the OS that are only accessible by the kernel.

cgracey · 2018-03-25 02:21

TonyB_ wrote: »

cgracey wrote: »

TonyB_ wrote: »

Thanks, Chip. I think that should avoid the debugging downside.

Will the 12KB block starting at $7C000 / $FC000 contain code copied from the ROM after booting?

Yes. And you mean 16KB, I assume.

I wasn't sure about how or when the debug buffers are set up so I left out 4KB!

Please don't think some of us are ungrateful so-and-so's. Debugging should be great fun, but it won't be needed all the time. I feel a lot happier now. In fact, I'm rejoicing, very quietly.

I feel better about it, too, actually.

jmg · 2018-03-25 03:03

cgracey wrote: »

So, you have a big flat space, and when you want to enable strong debug or ROM at the top of memory, the last 16KB goes away from the normal RAM map and is read-only at the top of the hub map.

Sounds great

When does the 16KB ROM code have to be signed off, and delivered to OnSemi ?

cgracey · 2018-03-25 03:16

jmg wrote: »

cgracey wrote: »

So, you have a big flat space, and when you want to enable strong debug or ROM at the top of memory, the last 16KB goes away from the normal RAM map and is read-only at the top of the hub map.

Sounds great
When does the 16KB ROM code have to be signed off, and delivered to OnSemi ?

I think we have about two weeks, which is short.

I'm working on testing the I/O pad now on the test chip, to make sure it's okay.

After that, I need to make new FPGA images for everyone to try out.

If there are no problems, the Verilog is done, for this version.

Then, we need to get the software wrapped up.

jmg · 2018-03-25 03:36

cgracey wrote: »

I'm working on testing the I/O pad now on the test chip, to make sure it's okay.

Surely you can hand off some of this testing to someone else in the organisation, to free you up ?

Other testing values it would be nice to check are
* Crystal Oscillator Range - What MHz crystals & what ESR margins ? - (fundamental mode Xtals come up to ~ 52MHz these days)
* Clipped sine range - what is Bandwidth of Crystal amplifier, when used with Min C, and AC driven from 800mV p-p signal ?

Peter Jakacki · 2018-03-25 04:35

cgracey wrote: »

jmg wrote: »

cgracey wrote: »

So, you have a big flat space, and when you want to enable strong debug or ROM at the top of memory, the last 16KB goes away from the normal RAM map and is read-only at the top of the hub map.

Sounds great
When does the 16KB ROM code have to be signed off, and delivered to OnSemi ?

I think we have about two weeks, which is short.

I'm working on testing the I/O pad now on the test chip, to make sure it's okay.

After that, I need to make new FPGA images for everyone to try out.

If there are no problems, the Verilog is done, for this version.

Then, we need to get the software wrapped up.

Is that correct then? Is this the final step before you get real silicon back? In that case I need to update and integrate TAQOZ with these new features. However, I haven't really had any user feedback on this either.

cgracey · 2018-03-25 04:54

Peter Jakacki wrote: »

cgracey wrote: »

jmg wrote: »

cgracey wrote: »

So, you have a big flat space, and when you want to enable strong debug or ROM at the top of memory, the last 16KB goes away from the normal RAM map and is read-only at the top of the hub map.

Sounds great
When does the 16KB ROM code have to be signed off, and delivered to OnSemi ?

I think we have about two weeks, which is short.

I'm working on testing the I/O pad now on the test chip, to make sure it's okay.

After that, I need to make new FPGA images for everyone to try out.

If there are no problems, the Verilog is done, for this version.

Then, we need to get the software wrapped up.

Is that correct then? Is this the final step before you get real silicon back? In that case I need to update and integrate TAQOZ with these new features. However, I haven't really had any user feedback on this either.

Me, neither. We'll just use our best judgement.

cgracey · 2018-03-25 04:57

jmg wrote: »

cgracey wrote: »

I'm working on testing the I/O pad now on the test chip, to make sure it's okay.

Surely you can hand off some of this testing to someone else in the organisation, to free you up ?

Other testing values it would be nice to check are
* Crystal Oscillator Range - What MHz crystals & what ESR margins ? - (fundamental mode Xtals come up to ~ 52MHz these days)
* Clipped sine range - what is Bandwidth of Crystal amplifier, when used with Min C, and AC driven from 800mV p-p signal ?

There should be no problem with capacitively-coupled clipped-sine oscillators.

I'm pretty much it, at Parallax, when it comes to working on this project. I just hope and pray that no vital detail escapes awareness.

David Betz · 2018-03-25 10:59

cgracey wrote: »

Okay.

I made the last 16KB of RAM appear at both its normal address range ($7C000..$7FFFF in the case of 512KB) and in the last 16KB of hub space ($FC000..$FFFFF).

When write-protect is enabled, the last 16KB of RAM disappears from its normal address range ($7C000..$7FFFF) and becomes write-protected at $FC000..$FFFFF. Only code running in debug ISRs will be able to write the RAM from $FC000..$FFFFF. All reads from $7C000..$FBFFF will return $00's.

So, you have a big flat space, and when you want to enable strong debug or ROM at the top of memory, the last 16KB goes away from the normal RAM map and is read-only at the top of the hub map.

I think this is a compromise we can all live with. It allows contiguous access to all 512KB of RAM if an app doesn't care about debug.

That sounds like a good choice. Thanks for doing this.

Cluso99 · 2018-03-25 11:59

Thanks Chip. Being able to have contiguous memory as an option makes sense. If the 16KB is protected, it's only mapped at the top, which makes sense too.

I need to chat with Peter and you for the SD Boot code. It's done but needs integration. Unfortunately my time is limited now as I am working M W F each week.

ErNa · 2018-03-25 12:55

"I just hope and pray that no vital detail escapes awareness. " While normally I only hope and commit praying to those, which love to pray with fervency for victims of superfluous violence, in this case I join you in praying and maybe those from down under can join me (as since about 20 years they saved praying for victims by about 50%) to help support the useful P!

TonyB_ · 2018-03-28 20:52

TonyB_ wrote: »

jmg wrote: »

cgracey wrote: »

I added C and Z flags for when you read breakpoint status using 'GETBRK D WC/WZ/WCZ' in debug ISRs.

On debug ISR entry, 'GETBRK D WC' can be used to find out if the cog has just been (re)started (C=0).

What does Z bit encode ?

Something else, perhaps. Why C=0 instead of C=1?

I mentioned this a week ago (mnemonic changed since):

TonyB_ wrote: »

Is it possible for GETBRK D WCZ (formerly GETINT) to write something useful to the flags, e.g. C = D[0] and Z = (D[31:0] == 0) ?

GETBRK D WCZ returns all the pending skip bits and can be used outside debug interrupts for nested skipping.

For GETBRK, must C be the same in WC and WCZ, or Z the same in WZ and WCZ?

If no, could the WCZ flags be as I suggested? GETBRK D WCZ is a very handy "anytime" GETSKIP D WCZ instruction. If yes, that still leaves Z available to hold a copy of the most useful single status bit, STALLI perhaps?

Using C but not Z makes it look half-finished.

cgracey · 2018-03-28 21:50

The flags can be whatever we want them to be.

cgracey · 2018-03-31 21:03

I just put v32 at the top of this thread.

There are currently only files for the Prop123-A9 and the BeMicro-A9.

I need to catch up on the documentation.

The debugging has really improved in a lot of ways. It acts very sensibly now, only showing instructions that actually execute, unless they are being individually SKIP'd (unlike SKIPF) or have false conditions. More breakpoint triggers have been added and more data is reported back. Very easy to set up and use, too. No more mandatory debug interrupt instructions at the end of memory, though buffers are now allocated for 16-long save/restore and debug program. If you go to single_step.spin2, you'll see pretty much how it works.

Thanks and I hope to hear from at least a few of you. Early next week would be good.

Have a great Easter!

Peter Jakacki · 2018-03-31 23:27

Thanks Chip, I will make certain I start using this and finalizing TAQOZ. Cheers

TonyB_ · 2018-03-31 23:47

Happy Easter to you, Chip!

What was the final decision about GETBRK D WC/WZ/WCZ flags?

cgracey · 2018-04-01 00:22

Here are the data returned by GETBRK:

// getbrk

wire [33:0] getbrk_czr	=

	i[wc] ? i[wz]	? {	brk_isr ? !brk_pass[1] : int_stall,	// c	getbrk with wcz		(only show brk_pass in brk isr)
				hubs,					// z
				brk_code & {8{brk_isr}},		// r	(only show brk_code in brk isr)
				!brk_pass[1] && brk_isr,		//	(only show brk_pass in brk isr)
				csc_active,
				xfr_active,
				mem_mode,
				int_select[3:1],
				int_state[3:1],
				int_stall,
				hubs }

			: {	skipb[0],				// c	getbrk with wc
				1'b0,					// z
				skipk[3:0],				// r
				skipm,
				lut_share,
				stk_xbyte,
				xbytet[8:0],
				trap[15:0] }

			: {	1'b0,					// c	getbrk with wz
				~|skipb,				// z
				skipb[31:0]	};			// r
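Editor's sketch: for readers who don't read Verilog, here is a rough Python decoder for two of the three cases above. The bit positions for the WC form follow from the concatenation order and the widths given in the listing (4 + 1 + 1 + 1 + 9 + 16 = 32). The WCZ form is left out because the listing does not give the widths of mem_mode, int_select and int_state. The function names are hypothetical and the field meanings are not documented here.

```python
def decode_getbrk_wc(d):
    """Split the 32-bit result of 'GETBRK D WC' into the listed fields."""
    d &= 0xFFFFFFFF
    return {
        "skipk":     (d >> 28) & 0xF,    # r[31:28]
        "skipm":     (d >> 27) & 1,      # r[27]
        "lut_share": (d >> 26) & 1,      # r[26]
        "stk_xbyte": (d >> 25) & 1,      # r[25]
        "xbytet":    (d >> 16) & 0x1FF,  # r[24:16]
        "trap":      d & 0xFFFF,         # r[15:0]
    }


def getbrk_wz(skipb):
    """Model 'GETBRK D WZ': D = skipb[31:0], C = 0, Z = 1 when no bits are set."""
    skipb &= 0xFFFFFFFF
    return {"c": 0, "z": int(skipb == 0), "d": skipb}
```

In the WC form the listing also sets C to skipb[0] and Z to 0. Those flags come from state outside D, so the decoder above does not try to reconstruct them.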

TonyB_ · 2018-04-01 00:49

GETBRK has changed a bit. In non-Verilog format it used to be:

GETBRK D WC - writes {CORDIC_inventory[4:0], Last_XBYTE_SETQ[9:0], LUT_share, Event[15:0]} into D, clears C
GETBRK D WZ - writes {8'b0, CALL_depth_during_SKIP[3:0], INT_select[3:1][3:0], INT_state[3:1][1:0], STALLI, SKIP_mode} into D, clears Z
GETBRK D WCZ - writes SKIP_pattern[31:0] into D, clears C and Z

I think an updated version of the above would be appreciated.
stk_xbyte = 1 when $1FF on top of stack? hubs = ?

evanh · 2018-04-01 01:01

I've noted something that might not be obvious at first glance. Because FIFO reloads must wait for a hub rotation to sync up its fetch window, and a hubExec recurring request will occur after the FIFO's hub sync has passed by, therefore the immediate fetch window always gets missed. Eg: REPing a single instruction on a 16-cog Prop2 takes 32 clocks per instruction. Ie: Two hub rotations per instruction.

That'd be 16 clocks on the 8-cog edition but that's still a little more encouragement for using CogExec in tight loops.

evanh · 2018-04-01 01:11

BTW: No problems in testing XORO32 and running my test code on V32.

cgracey · 2018-04-01 01:33

evanh wrote: »

I've noted something that might not be obvious at first glance. Because FIFO reloads must wait for a hub rotation to sync up its fetch window, and a hubExec recurring request will occur after the FIFO's hub sync has passed by, therefore the immediate fetch window always gets missed. Eg: REPing a single instruction on a 16-cog Prop2 takes 32 clocks per instruction. Ie: Two hub rotations per instruction.

That'd be 16 clocks on the 8-cog edition but that's still a little more encouragement for using CogExec in tight loops.

It takes five clocks to get data back from the hub. That is the problem. The next window gets missed by the time next RDxxxx gets queued up. Remember, though, that you can do SETQ + RDLONG and pull in longs at the rate of one per clock after overcoming the initial delay.

cgracey · 2018-04-01 01:34

evanh wrote: »

BTW: No problems in testing XORO32 and running my test code on V32.

Super.

cgracey · 2018-04-01 01:35

TonyB_ wrote: »

GETBRK has changed a bit. In non-Verilog format it used to be:

GETBRK D WC - writes {CORDIC_inventory[4:0], Last_XBYTE_SETQ[9:0], LUT_share, Event[15:0]} into D, clears C
GETBRK D WZ - writes {8'b0, CALL_depth_during_SKIP[3:0], INT_select[3:1][3:0], INT_state[3:1][1:0], STALLI, SKIP_mode} into D, clears Z
GETBRK D WCZ - writes SKIP_pattern[31:0] into D, clears C and Z

I think an updated version of the above would be appreciated.
stk_xbyte = 1 when $1FF on top of stack? hubs = ?

I'll get those docs updated soon.

evanh · 2018-04-01 01:47

cgracey wrote: »

It takes five clocks to get data back from the hub. ...

I've seen indications of that. The below code works as commented but if I throw an extra couple of NOPs after the CALL, but before the GETCT, the total ticks increases by 6. This obviously is due to the number of instructions between first and second reload.

More oddly, I can also get increments of 1 tick. I have no idea how.

		call    #puts
		getct   ticks               ' Importantly, the return from #puts reloads HubExec FIFO

		rep     @.endl,#20          ' 2 clks + (16 clks x 19 repeats) = 306 clocks
		xoro32  state               ' XORO32 ignores prior S value feed through
.endl
		mov     parm,0-0            ' final random value appears in S port
		getct   ticke               ' 306 + 4 clks = 310 total clocks from GETCT to GETCT ($0136)
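Editor's sketch: the arithmetic in the comments above can be checked with a small Python helper. It assumes, as evanh describes, that each hubexec instruction in the REP loop costs two hub rotations, and that the listing was run on an 8-cog image (inferred from the "16 clks" figure). The function names and the `rep_clocks`/`tail_clocks` parameters are hypothetical labels for the 2 and 4 clocks quoted in the comments.

```python
def clocks_per_hubexec_instruction(cogs):
    """Two hub rotations per instruction, one rotation = one clock per cog."""
    return 2 * cogs


def rep_hubexec_clocks(iterations, cogs=8, rep_clocks=2, tail_clocks=4):
    """GETCT-to-GETCT clocks for a one-instruction REP loop run from hub.

    Mirrors the listing's comments: 2 clocks for REP, one full
    two-rotation cost for each of (iterations - 1) repeats, and the
    4 clocks the listing adds after the loop.
    """
    per_instruction = clocks_per_hubexec_instruction(cogs)
    return rep_clocks + per_instruction * (iterations - 1) + tail_clocks


total = rep_hubexec_clocks(20, cogs=8)
print(f"{total} clocks = ${total:04X}")
```

For 20 iterations on 8 cogs this reproduces the 310 clocks ($0136) in the listing. It does not model the extra ticks seen when NOPs are inserted before the first GETCT.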
