Propeller II update - BLOG - Page 208 — Parallax Forums

Comments

  • cgraceycgracey Posts: 14,152
    edited 2014-03-13 05:55
    ozpropdev wrote: »
    Cool! :) A handy feature!

    Do you have the opcode for that?

    It's right after COGNEWX. No flags written.
  • ozpropdevozpropdev Posts: 2,792
    edited 2014-03-13 06:06
    cgracey wrote: »
    It's right after COGNEWX. No flags written.
    Thanks Chip!
    --L- wrD		1111111 ZC L CCCC DDDDDDDDD 011101001		COGNEWX	D/#				(waits for hub)
    --L-			1111111 00 L CCCC DDDDDDDDD 011101010		RESD	D/#
    
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-13 06:06
    Nice one, Chip!
  • roglohrogloh Posts: 5,786
    edited 2014-03-13 06:12
    Are there just 2 bits remaining in the WIDE task state now, Chip, after this change?
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-03-13 06:51
    I like it!

    A few questions:

    1) What happens when:

    AUGD #some_big_constant
    RESD #C
    MUL A,B

    is executed?

    2) Is the #/D in RESD saved in the task state WIDE? (rogloh asked this above too)

    3) I think AUGS, AUGD are already saved in the task state WIDE, correct?

    cgracey wrote: »
    I added a 'RESD D/#' instruction which sets an override address for the next D register to be written.


    RESD #C
    MUL A,B

    ...writes A*B to C


    RESD #1
    LINK #address16

    ...writes the return address to $001 instead of $000


    RESD A

    ...indirection for writes
  • cgraceycgracey Posts: 14,152
    edited 2014-03-13 06:52
    ozpropdev wrote: »
    Thanks Chip!
    --L- wrD		1111111 ZC L CCCC DDDDDDDDD 011101001		COGNEWX	D/#				(waits for hub)
    ----			1111111 ZC L CCCC DDDDDDDDD 011101010		RESD	D/#
    


    That's right, but get the '--L-' in there and change 'ZC' to '00'.

    Since this RESD instruction is conditional (unlike AUGS/AUGD), I made it wait for an executing instruction that writes a result. So now you can do things like this:
    			RESD	#result
    	if_nz_and_nc	AND	a,b
    	if_nz_and_c	OR	c,d
    	if_z_and_nc	XOR	e,f
    	if_z_and_c	ADD	g,h
    

    ...without affecting source variables.

    This opens up a lot of doors for compact functions. I'll see about addressing your pick-any-set-of-two-bits-from-D-to-get-Z-and-C idea next. These are useful little helpers.
  • cgraceycgracey Posts: 14,152
    edited 2014-03-13 06:54
    rogloh wrote: »
    Are there just 2 bits remaining in the WIDE task state now, Chip, after this change?

    You got it. Can you think of any simple way we can get some good use out of them?
  • cgraceycgracey Posts: 14,152
    edited 2014-03-13 06:57
    Bill Henning wrote: »
    I like it!

    A few questions:

    1) What happens when:

    AUGD #some_big_constant
    RESD #C
    MUL A,B

    is executed?

    2) Is the #/D in RESD saved in the task state WIDE? (rogloh asked this above too)

    3) I think AUGS, AUGD are already saved in the task state WIDE, correct?


    1) Only the bottom nine bits of D in RESD are stored. You weren't hoping this would automatically translate into a WRLONG, were you?

    2) Yes, it takes 1 bit for the enable flag and 9 bits for the register address.

    3) Yes.
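
The 1 + 9 bit figure in answer 2 can be sketched in Python as a packed field. This is only an illustration: the layout (enable flag above the address) and the names `pack_resd_state` and `unpack_resd_state` are assumptions, not something stated in the thread.

```python
from typing import Tuple

RESD_ADDR_BITS = 9                          # cog register addresses are 9 bits
RESD_ADDR_MASK = (1 << RESD_ADDR_BITS) - 1  # $1FF

def pack_resd_state(enabled: bool, address: int) -> int:
    """Pack the enable flag and register address into a 10-bit field.

    Assumed layout: bit 9 = enable, bits 8..0 = address.
    """
    return (int(enabled) << RESD_ADDR_BITS) | (address & RESD_ADDR_MASK)

def unpack_resd_state(field: int) -> Tuple[bool, int]:
    """Split the 10-bit field back into (enabled, address)."""
    return bool((field >> RESD_ADDR_BITS) & 1), field & RESD_ADDR_MASK

# An oversized value (for example one extended by AUGD) keeps only its
# bottom nine bits, matching answer 1.
state = pack_resd_state(True, 0x12345)
print(unpack_resd_state(state))   # (True, 325), i.e. address $145
```

Ten bits for RESD is consistent with the "2 bits remaining" remark earlier in the thread, though the actual bit positions inside the WIDE are not given here.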
  • SapiehaSapieha Posts: 2,964
    edited 2014-03-13 07:17
    Hi Chip.

    So I assume that

    RESD
    > without any parameter

    resets this system to standard behavior.
  • cgraceycgracey Posts: 14,152
    edited 2014-03-13 07:20
    Sapieha wrote: »
    Hi Chip.

    So I assume that

    RESD
    > without any parameter

    resets this system to standard behavior.


    RESD is a one-shot deal. You do a RESD and the next instruction that writes a register uses it, and it's over. Things are back to normal then, until another RESD is issued, followed, at some point, by another instruction that writes.

    I forgot to mention that RESD works with register remapping.
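
The one-shot behavior described above can be modeled in a few lines of Python. This is a toy sketch, not the hardware: the `Cog` class, its method names, and the register addresses are hypothetical, and it only captures the rules stated in the thread (nine address bits kept, the override is consumed by the next instruction that actually writes, and a conditionally skipped instruction leaves it pending).

```python
class Cog:
    """Toy model of a cog register file with a one-shot RESD override."""

    def __init__(self):
        self.regs = [0] * 512   # 512 cog registers, 9-bit addresses
        self.resd = None        # pending override address, or None

    def resd_op(self, address):
        """RESD D/#: arm the override; only the bottom nine bits are kept."""
        self.resd = address & 0x1FF

    def write_result(self, d, value, condition=True):
        """Write an instruction result to D, or to the RESD target if armed."""
        if not condition:
            return              # skipped instruction: RESD stays pending
        target = d if self.resd is None else self.resd
        self.resd = None        # one-shot: back to normal afterwards
        self.regs[target] = value & 0xFFFFFFFF

A, B, C = 0x10, 0x11, 0x12      # arbitrary register addresses
cog = Cog()
cog.regs[A], cog.regs[B] = 6, 7

cog.resd_op(C)
cog.write_result(A, 0, condition=False)          # skipped, RESD still armed
cog.write_result(A, cog.regs[A] * cog.regs[B])   # A*B lands in C, A untouched
cog.write_result(A, cog.regs[A] + 1)             # no RESD pending: writes A
print(cog.regs[A], cog.regs[B], cog.regs[C])     # 7 7 42
```

The skipped write in the middle mirrors the four-way conditional example: only the one instruction whose condition is true consumes the override.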
  • SapiehaSapieha Posts: 2,964
    edited 2014-03-13 07:23
    Hi Chip.

    Thanks

    Now I understand
    cgracey wrote: »
    RESD is a one-shot deal. You do a RESD and the next instruction that writes a register uses it, and it's over. Things are back to normal then, until another RESD is issued, followed, at some point, by another instruction that writes.

    I forgot to mention that RESD works with register remapping.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-03-13 07:31
    EXTREMELY COOL!

    I can see using that quite a bit.

    Basically gives us flexible three operand instructions!
    cgracey wrote: »
    That's right, but get the '--L-' in there and change 'ZC' to '00'.

    Since this RESD instruction is conditional (unlike AUGS/AUGD), I made it wait for an executing instruction that writes a result. So now you can do things like this:
    			RESD	#result
    	if_nz_and_nc	AND	a,b
    	if_nz_and_c	OR	c,d
    	if_z_and_nc	XOR	e,f
    	if_z_and_c	ADD	g,h
    

    ...without affecting source variables.

    This opens up a lot of doors for compact functions. I'll see about addressing your pick-any-set-of-two-bits-from-D-to-get-Z-and-C idea next. These are useful little helpers.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-03-13 07:34
    cgracey wrote: »
    1) Only the bottom nine bits of D in RESD are stored. You weren't hoping this would automatically translate into a WRLONG, were you?

    Not on Prop2...

    I think for P3 we will need to revisit a lot of things, but that will be a while away.

    I do like your other helper idea (any two bits to Z and C)
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-03-13 07:46
    Chip,

    Regarding your idea for any two bits to Z and C. How about:

    zzzzz = number of the bit that Z comes from
    ccccc = number of the bit that C comes from

    That needs 10 bits

    But

    - WC and WZ are available (as this instruction would always write C and Z, that's the whole point) one could be used to extend D's 9 bits
    - I is available, instruction does not make sense with an immediate operand

    GETZC zzzzzccccc,S

    May I also modestly propose (Don't think we need it)

    PUTZC D, zzzzzccccc

    These instructions would then be useful for decoding differential input, and outputting differential output... as INA/B/C/D and OUTA/B/C/D would be addressable.

    GETZC would also be very useful in VMs and other byte code / protocol decoding.

    PUTZC would be useful in protocol encoding.

    The down side? They would need two full ops.

    FYI, I think GETZC is much more useful than PUTZC.
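
As a rough illustration of the proposed GETZC, the sketch below picks Z and C from two arbitrary bit positions and packs the two 5-bit bit numbers into a 10-bit selector. GETZC was only a proposal at this point in the thread, so the function names and the field order (zzzzz above ccccc) are assumptions taken from the `zzzzzccccc` notation.

```python
def getzc(source, z_bit, c_bit):
    """Return (Z, C) taken from two arbitrary bit positions of a 32-bit value."""
    z = (source >> (z_bit & 31)) & 1
    c = (source >> (c_bit & 31)) & 1
    return z, c

def encode_selector(z_bit, c_bit):
    """Pack two 5-bit bit numbers into the 10-bit zzzzzccccc field."""
    return ((z_bit & 31) << 5) | (c_bit & 31)

print(getzc(0x80000001, 31, 0))        # (1, 1)
print(bin(encode_selector(31, 0)))     # 0b1111100000
```

The selector needs ten bits, one more than a 9-bit D field provides, which is why the post suggests borrowing WC, WZ, or I.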
  • RaymanRayman Posts: 14,641
    edited 2014-03-13 07:47
    Any chance the DE2 could run 1 cog at 200 MHz instead of 5 at 80 MHz?
  • cgraceycgracey Posts: 14,152
    edited 2014-03-13 07:51
    Rayman wrote: »
    Any chance the DE2 could run 1 cog at 200 MHz instead of 5 at 80 MHz?


    For some crazy reason, if I compile only one cog on the DE2-115 the Fmax is really low. I need to compile at least two cogs to get a decent Fmax. Two cogs and four cogs are the same Fmax. This is a peculiarity of the Cyclone IV. Stratix III, on the other hand, can go much faster with just one cog.
  • cgraceycgracey Posts: 14,152
    edited 2014-03-13 07:53
    Bill Henning wrote: »
    Chip,

    Regarding your idea for any two bits to Z and C. How about:

    zzzzz = number of the bit that Z comes from
    ccccc = number of the bit that C comes from

    That needs 10 bits

    But

    - WC and WZ are available (as this instruction would always write C and Z, that's the whole point) one could be used to extend D's 9 bits
    - I is available, instruction does not make sense with an immediate operand

    GETZC zzzzzccccc,S

    May I also modestly propose (Don't think we need it)

    PUTZC D, zzzzzccccc

    These instructions would then be useful for decoding differential input, and outputting differential output... as INA/B/C/D and OUTA/B/C/D would be addressable.

    GETZC would also be very useful in VMs and other byte code / protocol decoding.

    PUTZC would be useful in protocol encoding.

    The down side? They would need two full ops.

    FYI, I think GETZC is much more useful than PUTZC.


    I was thinking the Z/C pair could come from two adjacent bits. Is that too limiting, do you think?

    For function selectors, I think you'd probably want it that way.

    Cluso99 had brought this request up a few pages back.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-03-13 07:56
    If it can fit, without re-organizing all the other opcodes, it would be really nice if it could be any two bits, specifically to make arbitrary decoding easier.

    Don't get me wrong, adjacent bits is also useful... but random bits is more useful.
    cgracey wrote: »
    I was thinking the Z/C pair could come from two adjacent bits. Is that too limiting, do you think?

    For function selectors, I think you'd probably want it that way.
  • cgraceycgracey Posts: 14,152
    edited 2014-03-13 09:23
    Any two bits is very awkward to accomplish, unless you had the two five-bit fields waiting in a register.


    I made a new 'PICKZC D/#,S/#' instruction. It picks a Z/C pair using S/# as a 4-bit index into sixteen two-bit groups in D/#. It always writes Z and C, without needing WZ or WC.

    With PICKZC you can use registers or constants for both the data and the index, so it's very flexible.

    PICKZC data,#15 'get Z/C from data's msb's
    PICKZC data,index 'get Z/C from data according to index
    PICKZC #%10,#0 'set Z and clear C

    I will add new conditionals to the assembler:

    if_00
    if_01
    if_10
    if_11
    if_x0
    if_x1
    if_0x
    if_1x
    if_either
    if_same
    if_diff

    I got rid of the SETZC instruction because it was redundant.
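
A small Python model may make the PICKZC description concrete. It assumes that the upper bit of each two-bit group goes to Z and the lower bit to C (inferred from `PICKZC #%10,#0` setting Z and clearing C), and that the digits in the new condition names read in Z, C order. `if_either` is left out because its meaning is not spelled out above. The function names are hypothetical.

```python
def pickzc(data, index):
    """Return (Z, C) from the two-bit group of `data` selected by a 4-bit index."""
    group = (data >> (2 * (index & 0xF))) & 0b11
    return (group >> 1) & 1, group & 1   # assumed: Z = upper bit, C = lower bit

def matches(cond, z, c):
    """Test a condition suffix such as '01', 'x0', 'same' or 'diff'."""
    if cond == "same":
        return z == c
    if cond == "diff":
        return z != c
    # Two-character pattern, assumed Z then C; 'x' means don't care.
    return all(p in ("x", str(bit)) for p, bit in zip(cond, (z, c)))

print(pickzc(0b10, 0))             # (1, 0): set Z, clear C
print(pickzc(0xC0000000, 15))      # (1, 1): taken from the two msbs
print(matches("x0", 1, 0))         # True
print(matches("diff", 1, 0))       # True
```

Under this model, group n covers bits 2n+1 and 2n of the data, so sixteen groups cover all 32 bits.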
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-03-13 09:43
    Looks great!

    Allows trivial decoding of differential digital input pair...
             PICKZC ina,#16 ' 16&17 are differential +/- pair
    if_00    jmp    #S0       ' S0 state
    if_01    RCL   data,#1   ' received a 1
    if_10    SHL   data,#1   ' received a 0
    if_11    jmp    #S1
    

    Above is almost certainly incorrect for USB, but illustrates the idea. If one of S0 or S1 is not needed, this can receive at 50Mbps (@200MHz) or 48Mbps (@192MHz).

    Judicious sprinkling of NOPs and re-syncing per packet (or byte) should allow 48Mbps @ 200MHz for CRC-less testing.

    We still need a USB instruction (or state machine) though, due to CRC.
    cgracey wrote: »
    Any two bits is very awkward to accomplish, unless you had the two five-bit fields waiting in a register.


    I made a new 'PICKZC D/#,S/#' instruction. It picks a Z/C pair using S/# as a 4-bit index into sixteen two-bit groups in D/#. It always writes Z and C, without needing WZ or WC.

    With PICKZC you can use registers or constants for both the data and the index, so it's very flexible.

    PICKZC data,#15 'get Z/C from data's msb's
    PICKZC data,index 'get Z/C from data according to index
    PICKZC #%10,#0 'set Z and clear C

    I will add new conditionals to the assembler:

    if_00
    if_01
    if_10
    if_11
    if_x0
    if_x1
    if_0x
    if_1x
    if_either
    if_same
    if_diff

    I got rid of the SETZC instruction because it was redundant.
  • RaymanRayman Posts: 14,641
    edited 2014-03-13 09:50
    OK, maybe that was a longshot... Do you think the Cyclone V will get to a higher frequency, or is it the same limit?
    cgracey wrote: »
    For some crazy reason, if I compile only one cog on the DE2-115 the Fmax is really low. I need to compile at least two cogs to get a decent Fmax. Two cogs and four cogs are the same Fmax. This is a peculiarity of the Cyclone IV. Stratix III, on the other hand, can go much faster with just one cog.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-03-13 09:55
    I wonder if compiling for two cogs, and setting Quartus for maximum speed would help.

    Or if the $3000 Quartus version would make a difference in the two cog / max speed case.
    Rayman wrote: »
    OK, maybe that was a longshot... Do you think the Cyclone V will get to a higher frequency, or is it the same limit?
  • SapiehaSapieha Posts: 2,964
    edited 2014-03-13 10:12
    Hi

    Only compiling with Logic Regions can give better speed,

    BUT it is a lot of work,
    as all regions need to be placed to give short paths between relevant signals.
    That said, every move of a region needs another compile.
  • cgraceycgracey Posts: 14,152
    edited 2014-03-13 10:12
    Rayman wrote: »
    OK, maybe that was a longshot... Do you think the Cyclone V will get to a higher frequency, or is it the same limit?


    I've been trying to compile for Cyclone V through the night, but it keeps changing my flop names to its own generated names where signals feed into DSP blocks. I set a switch on the fitter that tells it to respect SDC (design constraint) settings, but it still keeps changing names and then misses the multicycle assignments. I don't know what to do about it yet. I suspect it will be 20% faster than Cyclone IV. As soon as I can find out, I'll post the results.
  • SeairthSeairth Posts: 2,474
    edited 2014-03-13 10:37
    Bill Henning wrote: »
    Looks great!

    Allows trivial decoding of differential digital input pair...
             PICKZC ina,#16 ' 16&17 are differential +/- pair
    if_00    jmp    #S0       ' S0 state
    if_01    RCL   data,#1   ' received a 1
    if_10    SHL   data,#1   ' received a 0
    if_11    jmp    #S1
    

    Above is almost certainly incorrect for USB, but illustrates the idea. If one of S0 or S1 is not needed, this can receive at 50Mbps (@200MHz) or 48Mbps (@192MHz).

    Judicious sprinkling of NOPs and re-syncing per packet (or byte) should allow 48Mbps @ 200MHz for CRC-less testing.

    We still need a USB instruction (or state machine) though, due to CRC.

    How about
             PICKZC ina,#16 ' 16&17 are differential +/- pair
    if_00    jmp    #S0
    if_11    jmp    #S1
             RCL   data,#1   ' C contains received bit
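
The three-instruction loop body above can be walked through with a Python model. This is a sketch under assumptions: PICKZC is modeled with Z as the upper bit and C as the lower bit of the selected group, and pins 16 and 17 are taken to be group 8 (16 / 2), since the index is described as a 4-bit selector of two-bit groups. The snippet in the thread writes `#16` and is left as posted. The names `receive`, `ONE` and `ZERO` are made up for the example.

```python
def pickzc(data, index):
    """Return (Z, C) from the two-bit group selected by a 4-bit index (assumed model)."""
    group = (data >> (2 * (index & 0xF))) & 0b11
    return (group >> 1) & 1, group & 1

def receive(samples, pair_index=8):
    """Shift one raw line bit per sample into `data`; stop on a 00 or 11 state."""
    data = 0
    for ina in samples:
        z, c = pickzc(ina, pair_index)
        if z == c:                              # if_00 / if_11: leave the loop
            return data, "S0" if z == 0 else "S1"
        data = ((data << 1) | c) & 0xFFFFFFFF   # RCL data,#1 shifts C in
    return data, None

ONE = 1 << 16    # pin 16 high, pin 17 low -> Z=0, C=1
ZERO = 1 << 17   # pin 17 high, pin 16 low -> Z=1, C=0
print(receive([ONE, ZERO, ZERO, ONE, 0]))       # (9, 'S0')
```

Because the two valid differential states always have Z and C different, C alone carries the raw line bit, which is why a single unconditional RCL is enough.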
    
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-03-13 10:48
    +1

    My brain was stuck in decoding four states mode...
    Seairth wrote: »
    How about
             PICKZC ina,#16 ' 16&17 are differential +/- pair
    if_00    jmp    #S0
    if_11    jmp    #S1
             RCL   data,#1   ' C contains received bit
    
  • jmgjmg Posts: 15,173
    edited 2014-03-13 12:26
    Seairth wrote: »
    How about
             PICKZC ina,#16 ' 16&17 are differential +/- pair
    if_00    jmp    #S0
    if_11    jmp    #S1
             RCL   data,#1   ' C contains received bit
    

    Close, but it is more accurate to say C contains the raw-serial data
    - in USB the change of state encodes the data, so another XOR from a previous state is needed, and also bit-stuff removal, before you have a valid received data bit.

    Instinct says a pick-pair opcode should also be useful for Quadrature Encoders, but I'm not seeing an elegant outcome yet...
  • jmgjmg Posts: 15,173
    edited 2014-03-13 12:31
    Bill Henning wrote: »
             PICKZC ina,#16 ' 16&17 are differential +/- pair
    if_00    jmp    #S0       ' S0 state
    if_01    RCL   data,#1   ' received a 1
    if_10    SHL   data,#1   ' received a 0
    if_11    jmp    #S1
    

    Above is almost certainly incorrect for USB, but illustrates the idea. If one of S0 or S1 is not needed, this can receive at 50Mbps (@200MHz) or 48Mbps (@192MHz).

    Judicious sprinkling of NOPs and re-syncing per packet (or byte) should allow 48Mbps @ 200MHz for CRC-less testing.

    We still need a USB instruction (or state machine) though, due to CRC.

    Slow down thar...
    For USB that's not a decoded bit yet; it still needs an XOR applied, bit-destuffing, and a bit counter before you can collect it into a byte.
    (see the USB thread for the Verilog skeleton of what is needed)
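
The extra steps listed here can be sketched in Python, separately from the Verilog skeleton mentioned above (which is not reproduced in this thread). The sketch uses the standard USB rules: NRZI, where a level change means 0 and no change means 1; removal of the 0 stuffed after six consecutive 1s; and LSB-first packing into bytes. The function names and the assumed idle level of 1 before the first sample are illustrative choices.

```python
def nrzi_decode(levels, previous=1):
    """NRZI as used by USB: no level change = 1, level change = 0."""
    bits = []
    for level in levels:
        bits.append(1 if level == previous else 0)   # inverted XOR with previous level
        previous = level
    return bits

def destuff(bits):
    """Drop the 0 that the sender inserts after six consecutive 1s."""
    out, ones = [], 0
    for bit in bits:
        if ones == 6:
            if bit != 0:
                raise ValueError("bit-stuff error: seventh consecutive 1")
            ones = 0
            continue                                 # discard the stuffed 0
        out.append(bit)
        ones = ones + 1 if bit else 0
    return out

def pack_bytes(bits):
    """Collect bits into bytes, LSB first; trailing partial bytes are dropped."""
    return [sum(bit << n for n, bit in enumerate(bits[i:i + 8]))
            for i in range(0, len(bits) - 7, 8)]

# Raw line levels for the byte $7F: six 1s, a stuffed 0, the seventh 1, then 0.
levels = [1, 1, 1, 1, 1, 1, 0, 0, 1]
data_bits = destuff(nrzi_decode(levels))
print([hex(b) for b in pack_bytes(data_bits)])       # ['0x7f']
```

In this model the raw bit that PICKZC leaves in C corresponds to one entry of `levels`; everything after that is the per-bit work that would still have to fit into the receive loop.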
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-03-13 12:41
    I sit corrected :)

    Unfortunately I have not had time to really look into the USB stuff.
    jmg wrote: »
    Slow down thar...
    For USB that's not a decoded bit yet; it still needs an XOR applied, bit-destuffing, and a bit counter before you can collect it into a byte.
    (see the USB thread for the Verilog skeleton of what is needed)
  • jmgjmg Posts: 15,173
    edited 2014-03-13 12:53
    cgracey wrote: »
    I've been trying to compile for Cyclone V through the night, but it keeps changing my flop names to its own generated names where signals feed into DSP blocks. I set a switch on the fitter that tells it to respect SDC (design constraint) settings, but it still keeps changing names and then misses the multicycle assignments. I don't know what to do about it yet. I suspect it will be 20% faster than Cyclone IV. As soon as I can find out, I'll post the results.

    You have to love tool flows that are somehow not quite the same across families...
    Google does not find much; it might be time to contact Altera to find the magic preserve button?