Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

ozpropdev · 2016-07-09 14:57

SETI is already supported.

ALTR could be thought of this way.

    add reg1,reg2,reg3
which in reality is
    altr #reg1
    add reg2,reg3

So reg1 = reg2 + reg3

evanh · 2016-07-09 15:00

Heh, you've switched from SETR to ALTR.

I was mentally unintentionally switching between SETx and ALTx too. That's another reason to make the naming more different.

ozpropdev · 2016-07-09 15:05

SETR sets bits 19..27 of D for use with ALTI.

evanh · 2016-07-09 15:12

Seairth pointed out that Chip's example of SETR is documented as SETI. And he followed that with a preference for the SETR mnemonic.

I was pondering what really is the best name since this SETI instruction is not really operating on the whole instruction like ALTI does. And as you highlighted, SETR is not a suitable mnemonic either since R implies the result register post execution.

ozpropdev · 2016-07-09 15:21

It depends on the final use of the register.
If you intend to modify an instruction's opcode and/or CZ effect bits then "I" is relevant.
On the other hand, if you intend to use the register to configure the ALTI instruction's R-field then "R" is relevant.
In the end they are the same 9 bits.

evanh · 2016-07-09 15:35

Oi! You said R = Result register. And that's good right there, don't go adding more meanings. Besides, the "R" single-bit field of the instructions doesn't exist any longer.

evanh · 2016-07-09 15:41

I'm now also convinced SETO is better than SETI. SETI creates an impression of similarity to the ALTI instruction, which it certainly isn't.

cgracey · 2016-07-09 15:45

Seairth wrote: »
cgracey wrote: »
I think I've got the SHA-256/HMAC rewritten from Prop2-Hot, so that the boot ROM can do signed loader verification.

There was one snippet of code that translated in kind of a surprising way:
		setd	i,#w			'save opad key
		setr	i,#opad_key
		rep	@.r,#16
		alti	i,#%111_111_000
		xor	0,opad			'xor bytes with opad ($5C)
This picks up 16 longs in cog RAM, starting at 'w', XORs them with 'opad' ($5C5C5C5C), then writes them starting at 'opad_key'. It uses the ALTI instruction to indirect and increment pointers for both the D register ('w') and result register ('opad_key').
Your documentation states SETI, not SETR. Personally, I think SETR is more appropriate.

Yeah, I changed SETI to SETR.
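
For readers who do not have the FPGA image to hand, the effect of the quoted REP/ALTI loop can be sketched in Python. This is only a behavioural model based on Chip's description (16 longs read from 'w', XORed with $5C5C5C5C, written from 'opad_key' onward). Cog RAM is modelled as a plain list, and the addresses 0x20 and 0x40 are made up for the example.

```python
# Behavioural sketch of the quoted loop, not a cycle-accurate P2 model.
OPAD = 0x5C5C5C5C  # value held in the 'opad' register, per the post

def xor_block(cog, src, dst, count=16, pad=OPAD):
    """XOR `count` longs starting at cog[src] with `pad`, storing at cog[dst].

    Both indices advance by one per pass, which is the job the ALTI
    increment fields do for the D and result pointers in the original.
    """
    for n in range(count):
        cog[dst + n] = (cog[src + n] ^ pad) & 0xFFFFFFFF
    return cog

# Hypothetical layout: 'w' at 0x20, 'opad_key' at 0x40.
cog = [0] * 512
cog[0x20:0x30] = range(16)
xor_block(cog, src=0x20, dst=0x40)
```

The source longs at 'w' are left untouched; only the 16 longs at the destination change.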

cgracey · 2016-07-09 15:47

ozpropdev wrote: »

SETR sets bits 19..27 of D for use with ALTI.

Correct.

cgracey · 2016-07-09 15:49

ozpropdev wrote: »

It depends on the final use of the register.
If you intend to modify an instruction's opcode and/or CZ effect bits then "I" is relevant.
On the other hand, if you intend to use the register to configure the ALTI instruction's R-field then "R" is relevant.
In the end they are the same 9 bits.

This is true. We almost need different names for different uses, even though bits 27..19 are always the ones being affected. SETR is going to be the most common use for setting those bits.

Cluso99 · 2016-07-09 21:39

IMHO

SETS sets sssssssss (b8:0) to S/# of the instruction/register pointed to by D
SETD sets ddddddddd (b17:9) to S/# of the instruction/register pointed to by D
SETI sets iiiiiii_cz (b27:19) to S/# of the instruction/register pointed to by D
Note: If the destination is an instruction to be executed, then at least 2 instructions (4 clocks) must pass before the modified instruction can be executed due to the instruction pipeline.
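
The three SETx field writes listed above can be pictured with a small Python sketch. It only models the bit positions given in the list (b8:0, b17:9, b27:19). The function names are illustrative, and the pipeline delay mentioned in the note is not modelled.

```python
MASK9 = 0x1FF  # every SETx field is 9 bits wide

def set_field(word, value, low_bit):
    """Replace the 9-bit field starting at low_bit; other bits are kept."""
    cleared = word & ~(MASK9 << low_bit) & 0xFFFFFFFF
    return cleared | ((value & MASK9) << low_bit)

def sets(word, value):
    return set_field(word, value, 0)   # b8:0

def setd(word, value):
    return set_field(word, value, 9)   # b17:9

def seti(word, value):
    return set_field(word, value, 19)  # b27:19 (opcode + CZ bits)
```

Only the low 9 bits of the source value are used, so `seti(0, 0x3FF)` gives the same result as `seti(0, 0x1FF)` in this model.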

ALTS alters the sssssssss (b8:0) field of the following instruction to the value of D plus S/#
ALTD alters the ddddddddd (b17:9) field of the following instruction to the value of D plus S/#
ALTR alters the rrrrrrrrr (result) field (the register address where the following instruction's result will be written) to the value of D plus S/#
Warning: The values of D and S/# are limited to register addresses (ie 9 bits) otherwise overflow fails!

ALTISD alters the I/S/D fields of the following instruction according to the value of S/#
ALTISD is currently ALTI but it is more than that.

What happens with the next instruction if the S and/or D fields are not immediate??? I found that I needed to program MOV x,#0 for the ALTS to work properly.

Seairth · 2016-07-10 00:20

Part of the issue here is that we are using "instruction" to sometimes talk about the entire 32-bit value (e.g. ALTI) and sometimes about iiiiiiicz (e.g. SETI). Renaming SETI to SETR simplifies that somewhat. To reinforce this, it might make more sense to refer to the "iiiiiii" field as an opcode ("ooooooo"):

cccc_ooooooo_czi_ddddddddd_sssssssss
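
To make the proposed layout concrete, here is a small Python sketch that splits a 32-bit word into the fields named above and packs them back. The field names follow Seairth's string. The sample word reuses the SETCZ bit pattern quoted elsewhere in this thread, with an arbitrary condition of %1111 and D = 5 chosen purely for illustration.

```python
from collections import namedtuple

# cccc_ooooooo_czi_ddddddddd_sssssssss, most significant bits first
Fields = namedtuple("Fields", "cond opcode c z i d s")

def decode(word):
    """Split a 32-bit instruction word into its named fields."""
    return Fields(
        cond=(word >> 28) & 0xF,
        opcode=(word >> 21) & 0x7F,
        c=(word >> 20) & 1,
        z=(word >> 19) & 1,
        i=(word >> 18) & 1,
        d=(word >> 9) & 0x1FF,
        s=word & 0x1FF,
    )

def encode(f):
    """Pack named fields back into a 32-bit instruction word."""
    return ((f.cond << 28) | (f.opcode << 21) | (f.c << 20) |
            (f.z << 19) | (f.i << 18) | (f.d << 9) | f.s)

# Illustrative word: opcode %1101011, C and Z effect bits set, S = %000101001.
sample = Fields(cond=0b1111, opcode=0b1101011, c=1, z=1, i=0, d=5, s=0b000101001)
word = encode(sample)
```

In this layout bits 27..19 are the opcode plus the C and Z bits, which are the nine bits the SETI/SETR discussion above is about.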

evanh · 2016-07-10 00:32

Seairth wrote: »

cccc_ooooooo_czi_ddddddddd_sssssssss

Ohhhh, you've got my vote!

jmg · 2016-07-10 01:54

Seairth wrote: »

Part of the issue here is that we are using "instruction" to sometimes talk about the entire 32-bit value (e.g. ALTI) and sometimes about iiiiiiicz (e.g. SETI). Renaming SETI to SETR simplifies that somewhat. To reinforce this, it might make more sense to refer to the "iiiiiii" field as an opcode ("ooooooo"):

cccc_ooooooo_czi_ddddddddd_sssssssss

I'm also fine with having dual-opcodes aka 64b opcode generation, if that makes the ASM easier to read.
It is also ok to have multiple names for the same binary opcode, if that also makes intent clearer.

The idea is to have readable ASM files.

ozpropdev · 2016-07-10 02:01

Cluso99 wrote: »

What happens with the next instruction if the S and/or D fields are not immediate??? I found that I needed to program MOV x,#0 for the ALTS to work properly.

That is correct.
If you want the following S field to be an immediate, you must already have defined it as a ",#s".
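
A rough Python picture of why the "#" has to be there already: going by the descriptions in this thread, ALTS/ALTD only substitute the 9-bit S or D field of the next instruction with D plus S/#, so the immediate bit stays whatever the source code set it to. The model below is an assumption built from those descriptions, not a statement about the actual logic. It treats bit 18 as the immediate flag and rejects sums wider than 9 bits, in line with the overflow warning earlier.

```python
MASK9 = 0x1FF
IMMEDIATE_BIT = 1 << 18           # 'i' in cccc_ooooooo_czi_..., assumed position
FIELD_LOW_BIT = {"S": 0, "D": 9}  # low bit of each patchable field

def alt_field(next_instr, field, d_value, s_value):
    """Return next_instr with its S or D field replaced by d_value + s_value."""
    target = d_value + s_value
    if target > MASK9:
        # Assumption: mirror the thread's warning by refusing the overflow.
        raise ValueError("D plus S/# does not fit in a 9-bit register address")
    low = FIELD_LOW_BIT[field]
    return (next_instr & ~(MASK9 << low) & 0xFFFFFFFF) | (target << low)

# A following instruction coded with an immediate S of 0 (as in MOV x,#0).
patched = alt_field(IMMEDIATE_BIT, "S", 0x10, 3)
```

After patching, the S field holds 0x13 and the immediate bit is still set, because nothing in this model touches it.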

Cluso99 · 2016-07-14 02:57

I have used the SETCZ instruction a number of times now and I find the operation confusing with the Z flag being set if the bit=1.
This is opposite to the TESTB instruction which will set the Z flag if the bit=0, or other instructions which set the Z flag if the result is 0.

Should it be changed?

Also, it sure would be nice to have the reverse instruction
SAVECZ D
which replaces D[1:0] with the C and !Z flag bits. ie D[31:2] remain untouched.
This could occupy the SETCZ D instruction slot with WC & WZ =00, or one of the spare opcode3 slots.

Cluso99 wrote: »

SETCZ D/# {WC,WZ}

CCCC 1101011 CZ L DDDDDDDDD 000101001 SETCZ D/# {WC,WZ}

Sets the C & Z flags according to D[1:0] and WC, WZ
If WC is specified then C=D[1] (0=C cleared, 1=C set)
If WZ is specified then Z=D[0] (0=Z cleared, 1=Z set) (yes, 1 sets Zero flag)
If neither WC nor WZ is specified, then C & Z flags are not set/changed
If both WC and WZ are specified, then both C & Z flags are set/changed

I have verified the above with v10a.

Note to Chip
Might it be possible/easy to have SETCZ when neither WC nor WZ is specified, that D[1:0] is set/written with the contents of C & Z flags (ie the reverse of SETCZ D WC,WZ) ? It could be called SAVECZ.
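
The SETCZ rules listed above fit in a few lines of Python. This is just a restatement of the four bullet points as reported for v10a; the function name and argument order are illustrative.

```python
def setcz(d, c, z, wc=False, wz=False):
    """Return the (C, Z) flags after SETCZ D with the given effect bits.

    C is loaded from D[1] only when WC is given, Z from D[0] only when
    WZ is given; a flag without its effect bit keeps its old value.
    """
    if wc:
        c = (d >> 1) & 1
    if wz:
        z = d & 1
    return c, z
```

For example, `setcz(0b01, c=1, z=0, wz=True)` leaves C alone and sets Z, which is the "1 sets Zero flag" polarity questioned earlier in the post.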

Cluso99 · 2016-07-14 16:20

Anyone besides Peter J and myself testing P2?

The forum seems dead as a nail.

Seairth · 2016-07-14 17:59

Cluso99 wrote: »

Anyone besides Peter J and myself testing P2?

The forum seems dead as a nail.

Unfortunately, I'm just not finding the time to do it right now... I had hoped to get the VSCode extension finished enough to help others write more P2 code, but that's been a much bigger effort than I was expecting. I also need to find time to update/re-run these unit tests.

David Betz · 2016-07-14 18:43

Cluso99 wrote: »

Anyone besides Peter J and myself testing P2?

The forum seems dead as a nail.

I would like to start working on modifying the PropGCC code generator to produce P2 code but I'm busy with another Parallax project at the moment.

garryj · 2016-07-14 20:31

I'm still plugging along on my USB host, low-speed only at this time. I have six low-speed devices -- a mix of keyboard, mice and joysticks, and I'm able to read configuration info from all of them. I'm just starting on the HID driver to actually do something with the data, but it's been slow going :depressed:

I did some experimenting with full-speed, and it looks like it's going to be a real challenge to get something working reliably at 80MHz, but it should be doable when the P2's on 160MHz silicon.

Rayman · 2016-07-14 23:46

I'm in process of converting code to work with latest version.
But been travelling, so has been slow...

Now that it looks like last version may last a while, I'm more motivated...

jmg · 2016-07-15 00:43

garryj wrote: »

I did some experimenting with full-speed, and it looks like it's going to be a real challenge to get something working reliably at 80MHz, but it should be doable when the P2's on 160MHz silicon.

Where are the issues with FS USB ?

Hmm, that exposes something of a conundrum, as Full Speed should really be tested before passing to FAB, and it seems 80MHz is the upper ceiling ?
Next choices could be 84MHz & 96MHz ?

Note that 160MHz is purely an aspirational target, and the final device may come in well under that, so best to have some real margin in USB operation.

ozpropdev · 2016-07-15 01:40

Cluso99 wrote: »

Anyone besides Peter J and myself testing P2?

The forum seems dead as a nail.

I've been testing P2 at every free moment since V1.
I'm currently finishing off a P2 demonstration that includes working examples of ALL P2 instructions.
It uses multiple cogs (8) and uses hubexec, lutexec, lut sharing, interrupts, streamers, cog attention, cordic etc. etc.
All of the instruction examples are working (necessary) parts of the complete demo.
I hope to have it and its documentation close to finalized next week.

Teaser: The video part of the demo uses 5 cogs.

garryj · 2016-07-15 02:20

jmg wrote: »

Where are the issues with FS USB ?

Hmm, that exposes something of a conundrum, as Full Speed should really be tested before passing to FAB, and it seems 80MHz is the upper ceiling ?
Next choices could be 84MHz & 96MHz ?

Note that 160MHz is purely an aspirational target, and the final device may come in well under that, so best to have some real margin in USB operation.

It's very, very close, but a matter of too much to do and not enough time at 80MHz to git-er-done. I can reliably transmit 64-byte OUT data packets at full-speed, as there is less overhead than when receiving. But the tx routine is about a half-dozen clock cycles away from tipping over, i.e. adding one or two instructions and you fall behind in feeding the output buffer and the packet ends up being a bust.

On receive I've had success reading handshake response packets to SETUP transactions, but with data packets I can get only a few bytes read before falling behind the incoming data. I've pretty much run short of ideas on how to tighten up the routine and still do the right thing by running a CRC check on the data. I'm not the brightest bulb in the room when it comes to writing clever code, so there's likely someone that can get some demonstration code up and running at full-speed, as was done on the P1, but on that project there had to be a lot of corners cut to get it to work.

But I'm optimistic that it can be done in a robust manner with just a little more speed. 96MHz might get something workable, and if the silicon comes close to the 160MHz target that should provide more than enough headroom.
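
For a feel of how tight this is, the per-byte budget can be worked out from the USB full-speed bit rate of 12 Mbit/s. The Python sketch below assumes 2 clocks per instruction (the figure implied earlier in the thread by "2 instructions (4 clocks)") and ignores bit stuffing, hub access timing and any other overhead, so the real budget is smaller.

```python
USB_FS_BIT_RATE = 12_000_000   # USB full-speed signalling rate, bits per second
CLOCKS_PER_INSTRUCTION = 2     # assumption: typical 2-clock instruction

def usb_fs_budget(sysclk_hz):
    """Return (clocks per bit, clocks per byte, instructions per byte)."""
    clocks_per_bit = sysclk_hz / USB_FS_BIT_RATE
    clocks_per_byte = clocks_per_bit * 8
    return clocks_per_bit, clocks_per_byte, clocks_per_byte / CLOCKS_PER_INSTRUCTION

for mhz in (80, 96, 160):
    bit, byte, instr = usb_fs_budget(mhz * 1_000_000)
    print(f"{mhz} MHz: {bit:.2f} clocks/bit, {byte:.1f} clocks/byte, ~{instr:.0f} instructions/byte")
```

Under these assumptions 80MHz leaves roughly 26 instructions per received byte, 96MHz gives 32 and 160MHz about 53, which fits the "half-dozen clock cycles away from tipping over" description.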

Cluso99 · 2016-07-15 02:24

ozpropdev wrote: »

Cluso99 wrote: »

Anyone besides Peter J and myself testing P2?

The forum seems dead as a nail.

I've been testing P2 at every free moment since V1.
I'm currently finishing off a P2 demonstration that includes working examples of ALL P2 instructions.
It uses multiple cogs (8) and uses hubexec, lutexec, lut sharing, interrupts, streamers, cog attention, cordic etc. etc.
All of the instruction examples are working (necessary) parts of the complete demo.
I hope to have it and its documentation close to finalized next week.

Teaser: The video part of the demo uses 5 cogs.

Ooooh! Sounds interesting, so cannot wait to see

garryj · 2016-07-15 02:45

I've run into some unexpected behavior while trying to implement a simple timespan interrupt service routine that uses the CTx-equals-CT event. It happens when a WAITX is used in non-ISR code and the timespan is equal to, or greater than, the timespan of the ISR. When using a POLLCTx method in the non-ISR code, things work as expected. I didn't see WAITx mentioned in the doc section that lists the interrupt branch conditions. Here's my test code for the P2 1-2-3 A9 board, p2v10a image:

'
' Test of unexpected behavior of an interrupt service routine using
' the CTx-equals-CT event trigger.
' Test run on the Propeller 1-2-3 FPGA board with the P2v10a FPGA image.
'
con
	SYSCLOCK = 80_000_000
	_1ms     = SYSCLOCK / 1_000
'------------------------------------------------------------------------------
dat
		org

'------------------------------------------------------------------------------
init
		setword	dirb, ##$ffff, #0		' Use P2 1-2-3 FPGA USER_LEDs for feedback
		mov	ijmp1, #isr1
		mov	pass, #1
pass2
		getct	ct1
		getct	ct2
		addct1	ct1, ##_1ms * 500		' ISR routine holds steady one Hz blink cycle
		mov	wait1, ##_1ms * 200		' Start non-ISR code with a faster blink rate
		addct2	ct2, wait1
		setint1	#1				' Set ISR event trigger to CTx-equals-CT
		setword	outb, ##$f00f, #0		' Observe what happens on the USER_LEDs
		mov	count, #0
		cmp	pass, #2		wz
	if_z	jmp	#pollx_blink
'------------------------------------------------------------------------------
' When WAITX is used in non-ISR code everything looks fine until the WAITX
' timespan matches or exceeds the timespan of the ISR routine. The LED
' blink rate becomes synchronized for a few cycles, then the ISR routine's
' blinking goes out to lunch.
'
' If the ISR routine uses GETCT when resetting the blink timespan, when the
' non-ISR code blink rate matches the ISR blink rate the ISR blink rate
' becomes synchronized with the non-ISR blink rate.
'------------------------------------------------------------------------------
waitx_blink
		waitx	wait1
		getnib	wtmp, outb, #0
		xor	wtmp, #$f
		setnib	outb, wtmp, #0
		add	count, #1
		cmp	count, #10		wz
	if_z	add	wait1, ##_1ms * 100		' Start slowing the non-ISR routine's blink rate
	if_z	mov	count, #0
		cmp	wait1, ##_1ms * 800	wz
	if_z	add	pass, #1
	if_z	setword	outb, ##$ffff, #0
	if_z	waitx	##_1ms * 2_000
	if_z	jmp	#pass2
		jmp	#waitx_blink
'------------------------------------------------------------------------------
' When the non-ISR code uses a polling method, the behavior is as expected.
' The ISR blink rate remains static while the non-ISR code's blink rate
' continues to slow.
'------------------------------------------------------------------------------
pollx_blink
		pollct2				wc
	if_nc	jmp	#pollx_blink
		add	count, #1
		cmp	count, #10		wz
	if_z	add	wait1, ##_1ms * 100		' Start slowing the non-ISR routine's blink rate
	if_z	mov	count, #0
		addct2	ct2, wait1
		getnib	wtmp, outb, #0
		xor	wtmp, #$f
		setnib	outb, wtmp, #0
		jmp	#pollx_blink

blink
		jmp	#waitx_blink
'------------------------------------------------------------------------------
' Simple ISR routine to blink USER_LEDs 15..12 at a steady one Hz rate.
'------------------------------------------------------------------------------
isr1
'		getct	ct1
		addct1	ct1, ##_1ms * 500
		getnib	ctmp, outb, #3
		xor	ctmp, #$f
		setnib	outb, ctmp, #3
		reti1
'------------------------------------------------------------------------------
ct1		res	1
ct2		res	1
wait1		res	1
pass		res	1
count		res	1
ctmp		res	1
wtmp		res	1

ozpropdev · 2016-07-15 04:02

@garryj
No interrupts can occur while the WAITX instruction is executing.

ozpropdev · 2016-07-15 04:24

From the P2 documentation

When an interrupt event occurs, certain conditions must be met before
the interrupt branch can happen:

* ALTI / ALTR / ALTD / ALTS must not be executing
* SCLU / SCL must not be executing
* AUGS must not be executing or waiting for a S/# instruction
* AUGD must not be executing or waiting for a D/# instruction
* SETQ / SETQ2 must not be executing
* REP must not be executing or active
* STALLI must not be executing or active
* WAITX must not be executing ' *** Needs to be added ***

Once these conditions are all met, any pending interrupt is allowed to
branch, with priority given to INT1, then INT2, and then INT3.

In this example the ISR will fire once after 10 seconds and won't fire again for
another 43 seconds until the CT1 value wraps around again. From there the ISR will work OK.

dat		org

		setb	dirb,#0
		mov	ijmp1, #isr1
		getct	adra
		addct1	adra,##20_000_000
		setint1	#1
		waitx	##80_000_000 * 10
end		jmp	#end

isr1		addct1	adra,##20_000_000
		notb	outb,#0
		reti1
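
The long gap follows from the 32-bit system counter wrapping. Here is a simplified Python estimate, assuming an 80MHz clock, taking the GETCT moment as count 0 and ignoring instruction overhead: the ISR finally runs when the 10-second WAITX ends, re-arms CT1 at 40_000_000 (a value that passed long ago), and the counter has to go all the way round before it matches again.

```python
SYSCLK = 80_000_000   # clock frequency assumed from the test code in this thread
CT_WRAP = 1 << 32     # the system counter is 32 bits wide

def clocks_until_match(now, target):
    """Clocks until a free-running 32-bit counter at `now` reaches `target`."""
    return (target - now) % CT_WRAP

now = SYSCLK * 10          # counter value when the WAITX ends and the ISR runs
target = 2 * 20_000_000    # CT1 after the ISR's ADDCT1 (initial 20M plus 20M)
gap_seconds = clocks_until_match(now, target) / SYSCLK
print(f"next CT1 match in about {gap_seconds:.1f} s")
```

This idealised model gives a gap of about 44 seconds, the same ballpark as the figure quoted above; a full counter period at 80MHz is about 53.7 seconds.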

garryj · 2016-07-15 05:06

Thanks much for the explanation. When I saw the REP instruction in the conditions list that made me think that WAITX could be involved, but at the time I had sections of code where WAITX was executing and it didn't show any ill effect. It was new code, and as I was perusing it looking for my usual stupid-programmer-trick of having a "#"-less branch address I spotted a WAITX whose timespan happened to be greater than the ISR's, and it was at that point that I tried the POLLCTx approach, which works just fine -- it's just a little more verbose than a good ol' WAITX.

cgracey · 2016-07-15 16:37

Good to see people are still alive!

I've got the ROM booter written now, but it needs some debugging, yet. It does signed loader verification using the fuses. The bug is in the SHA-256/HMAC code. I hope to have that fixed today. Once that's done, we may be finished with the chip, as far as what's needed to make the silicon goes.

I decided NOT to have a monitor program in ROM, for several reasons which I'll write about later.
