--L- wrD 1111111 ZC L CCCC DDDDDDDDD 011101001 COGNEWX D/# (waits for hub)
---- 1111111 ZC L CCCC DDDDDDDDD 011101010 RESD D/#
That's right, but get the '--L-' in there and change 'ZC' to '00'.
Since this RESD instruction is conditional (unlike AUGS/AUGD), I made it wait for an executing instruction that writes a result. So now you can do things like this:
RESD #result
if_nz_and_nc AND a,b
if_nz_and_c OR c,d
if_z_and_nc XOR e,f
if_z_and_c ADD g,h
...without affecting source variables.
This opens up a lot of doors for compact functions. I'll see about addressing your pick-any-set-of-two-bits-from-D-to-get-Z-and-C idea next. These are useful little helpers.
RESD is a one-shot deal. You do a RESD and the next instruction that writes a register uses it, and it's over. Things are back to normal then, until another RESD is issued, followed, at some point, by another instruction that writes.
I forgot to mention that RESD works with register remapping.
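The one-shot behavior can be modeled in a few lines. This is only a toy Python sketch, not the actual cog logic: the `Cog` class, its method names, and the register numbers are made up for illustration. It assumes RESD keeps one enable flag plus a 9-bit target address, and that an instruction skipped by its condition writes nothing and therefore leaves RESD armed.

```python
class Cog:
    """Toy model: 512 registers plus a one-shot result redirect (hypothetical)."""

    def __init__(self):
        self.regs = [0] * 512
        self.resd_enable = False   # assumed 1-bit enable flag
        self.resd_addr = 0         # assumed 9-bit register address

    def resd(self, addr):
        """Arm the redirect for the next instruction that writes a result."""
        self.resd_enable = True
        self.resd_addr = addr & 0x1FF

    def write_result(self, d, value):
        """Write a result, honoring a pending RESD exactly once."""
        target = self.resd_addr if self.resd_enable else d
        self.regs[target] = value & 0xFFFFFFFF
        self.resd_enable = False   # one-shot: back to normal afterwards

    def op(self, fn, d, s, cond=True):
        """Execute 'fn D,S'. A skipped instruction writes nothing, so RESD keeps waiting."""
        if cond:
            self.write_result(d, fn(self.regs[d], self.regs[s]))


# Usage: only the instruction whose condition is true consumes the RESD.
A, B, RESULT = 1, 2, 3
cog = Cog()
cog.regs[A], cog.regs[B] = 0b1100, 0b1010
cog.resd(RESULT)
cog.op(lambda d, s: d & s, A, B, cond=False)  # skipped, RESD still armed
cog.op(lambda d, s: d | s, A, B, cond=True)   # result lands in RESULT, A untouched
cog.op(lambda d, s: d ^ s, A, B)              # RESD is spent, so this writes A as usual
```

After this runs, `RESULT` holds the OR result while `A` is only changed by the final XOR.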
Regarding your idea for any two bits to Z and C. How about:
zzzzz = number of the bit that Z comes from
ccccc = number of the bit that C comes from
That needs 10 bits.
But:
- WC and WZ are available (this instruction would always write C and Z, that's the whole point), so one could be used to extend D's 9 bits
- I is available, as the instruction does not make sense with an immediate operand
GETZC zzzzzccccc,S
May I also modestly propose (I don't think we need it):
PUTZC D,zzzzzccccc
These instructions would then be useful for decoding differential input and outputting differential output, as INA/B/C/D and OUTA/B/C/D would be addressable.
GETZC would also be very useful in VMs and other bytecode / protocol decoding.
PUTZC would be useful in protocol encoding.
The downside? They would need two full ops.
FYI, I think GETZC is much more useful than PUTZC.
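To make the proposal concrete, here is a rough Python model of what the two instructions would compute. It is only a sketch of the idea as described: the function names and argument order are illustrative, and it assumes the 10-bit selector is laid out as `zzzzzccccc` with the Z bit number in the upper five bits.

```python
def getzc(selector, s):
    """Return (Z, C) picked from 32-bit value s by a 10-bit zzzzzccccc selector."""
    z_bit = (selector >> 5) & 0x1F   # upper five bits: where Z comes from
    c_bit = selector & 0x1F          # lower five bits: where C comes from
    return (s >> z_bit) & 1, (s >> c_bit) & 1


def putzc(d, selector, z, c):
    """Return d with Z and C written to the two selected bit positions."""
    z_bit = (selector >> 5) & 0x1F
    c_bit = selector & 0x1F
    d = (d & ~(1 << z_bit)) | ((z & 1) << z_bit)
    d = (d & ~(1 << c_bit)) | ((c & 1) << c_bit)   # C wins if both fields name the same bit
    return d & 0xFFFFFFFF


# Usage: take Z from bit 17 and C from bit 16, e.g. a differential pin pair.
sel = (17 << 5) | 16
z, c = getzc(sel, 1 << 17)      # bit 17 high, bit 16 low
packed = putzc(0, sel, 0, 1)    # writes C=1 into bit 16
```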
Any chance the DE2 could run 1 cog at 200 MHz instead of 5 at 80 MHz?
For some crazy reason, if I compile only one cog on the DE2-115 the Fmax is really low. I need to compile at least two cogs to get a decent Fmax. Two cogs and four cogs are the same Fmax. This is a peculiarity of the Cyclone IV. Stratix III, on the other hand, can go much faster with just one cog.
I was thinking the Z/C pair could come from two adjacent bits. Is that too limiting, do you think?
For function selectors, I think you'd probably want it that way.
Cluso99 had brought this request up a few pages back.
If it can fit, without re-organizing all the other opcodes, it would be really nice if it could be any two bits, specifically to make arbitrary decoding easier.
Don't get me wrong, adjacent bits are also useful... but arbitrary bits are more useful.
Any two bits is very awkward to accomplish, unless you had the two five-bit fields waiting in a register.
I made a new 'PICKZC D/#,S/#' instruction. It picks a Z/C pair using S/# as a 4-bit index into sixteen two-bit groups in D/#. It always writes Z and C, without needing WZ or WC.
With PICKZC you can use registers or constants for both the data and the index, so it's very flexible.
PICKZC data,#15 'get Z/C from data's msb's
PICKZC data,index 'get Z/C from data according to index
PICKZC #%10,#0 'set Z and clear C
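A small Python model of that selection, for anyone who wants to check an index by hand. This is a sketch and not the hardware: the function name is illustrative, and the bit order (Z from the upper bit of the pair, C from the lower) is inferred from the `PICKZC #%10,#0` example rather than stated outright.

```python
def pickzc(d, s):
    """Return (Z, C) from the two-bit group of d selected by the low 4 bits of s."""
    index = s & 0xF                    # 4-bit index: sixteen groups in 32 bits
    pair = (d >> (index * 2)) & 0b11   # group n covers bits 2n+1 and 2n
    return (pair >> 1) & 1, pair & 1   # assumed order: Z = upper bit, C = lower bit


# Usage, mirroring the examples above:
msb_pair = pickzc(0xC0000000, 15)   # Z/C from the top two bits
set_z_clr_c = pickzc(0b10, 0)       # Z = 1, C = 0
group8 = pickzc(1 << 16, 8)         # bits 17:16 form group 8
```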
Allows trivial decoding of a differential digital input pair...
PICKZC ina,#8 ' pins 16&17 (two-bit group 8) are the differential +/- pair
if_00 jmp #S0 ' S0 state
if_01 RCL data,#1 ' received a 1
if_10 SHL data,#1 ' received a 0
if_11 jmp #S1 ' S1 state
The above is almost certainly incorrect for USB, but it illustrates the idea. If one of S0 or S1 is not needed, it can receive at 50 Mbps (@ 200 MHz) or 48 Mbps (@ 192 MHz).
Judicious sprinkling of NOPs and re-syncing per packet (or byte) should allow 48 Mbps @ 200 MHz for CRC-less testing.
We still need a USB instruction (or state machine), though, due to the CRC.
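The dispatch can be traced in Python to see what ends up in `data`. This is a sketch under assumptions: `pickzc` is a stand-in model that treats S as a 4-bit index into two-bit groups (so pins 16 and 17 fall in group 8), Z is taken from the upper pin and C from the lower, and `receive` with its sample list is made up for the illustration.

```python
def pickzc(d, s):
    """Model of PICKZC: (Z, C) from the two-bit group of d selected by s & 15."""
    pair = (d >> ((s & 0xF) * 2)) & 0b11
    return (pair >> 1) & 1, pair & 1


def receive(samples, pair_index=8):
    """Shift differential samples into data until both lines match (S0 or S1)."""
    data = 0
    for ina in samples:
        z, c = pickzc(ina, pair_index)
        if z == c:
            # if_00 / if_11: both lines equal, leave the receive loop
            return data, ("S0" if z == 0 else "S1")
        # if_01 RCL shifts in a 1, if_10 SHL shifts in a 0: either way C is the new bit
        data = ((data << 1) | c) & 0xFFFFFFFF
    return data, None


# Usage: three valid differential samples (1, 0, 1) followed by both lines low.
P16, P17 = 1 << 16, 1 << 17
result = receive([P16, P17, P16, 0])
```

Here `result` is the shifted-in raw bit pattern together with the state that ended the loop. For USB these are still raw line bits, not decoded data.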
Only compiling with Logic Regions can give better speed,
BUT it is a lot of work,
as all regions need to be placed to give short paths between the relevant signals.
That said, every move of a region needs another compile.
Ok, maybe that was a long shot... Do you think the Cyclone V will get to a higher frequency, or does it have the same limit?
I've been trying to compile for Cyclone V through the night, but it keeps changing my flop names to its own generated names where signals feed into DSP blocks. I set a switch on the fitter that tells it to respect SDC (design constraint) settings, but it still keeps changing names and then misses the multicycle assignments. I don't know what to do about it yet. I suspect it will be 20% faster than Cyclone IV. As soon as I can find out, I'll post the results.
How about
PICKZC ina,#8 ' pins 16&17 (two-bit group 8) are the differential +/- pair
if_00 jmp #S0
if_11 jmp #S1
RCL data,#1 ' C contains received bit
Close, but it is more accurate to say C contains the raw-serial data
- in USB the change of state encodes the data, so another XOR from a previous state is needed, and also bit-stuff removal, before you have a valid received data bit.
Instinct says a pick-pair opcode should also be useful for Quadrature Encoders, but I'm not seeing an elegant outcome yet...
Slow down thar...
For USB that's not a decoded bit yet; it still needs the XOR, bit-destuffing, and a bit counter applied before you can collect it into a byte.
(See the USB thread for the Verilog skeleton of what is needed.)
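For reference, those post-processing steps look roughly like this. It's a Python sketch of NRZI decoding and bit-unstuffing as USB uses them (no level change = 1, a change = 0, and the 0 that follows six consecutive 1s is a stuff bit and gets dropped). The function names and the assumed idle level `prev=1` are illustrative, and this is not the Verilog skeleton from the USB thread. Sync, EOP detection, stuff-error checking and CRC are all left out.

```python
def nrzi_decode(levels, prev=1):
    """XOR each raw line level with the previous one: same level = 1, change = 0."""
    bits = []
    for level in levels:
        bits.append(1 if level == prev else 0)
        prev = level
    return bits


def unstuff(bits):
    """Drop the stuffed bit that follows every run of six 1s."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:               # this is the stuff bit, discard it
            skip, ones = False, 0
            continue
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 6:
            skip = True
    return out


def pack_lsb_first(bits):
    """Count bits off in groups of eight and pack them LSB first."""
    whole = len(bits) - len(bits) % 8
    return [sum(b << n for n, b in enumerate(bits[i:i + 8]))
            for i in range(0, whole, 8)]


# Usage: raw levels for the byte $FF, which needs one stuff bit on the wire.
raw = [1, 1, 1, 1, 1, 1, 0, 0, 0]
decoded = pack_lsb_first(unstuff(nrzi_decode(raw)))
```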
You have to love tool flows that are somehow not quite the same across families...
Google does not find much; it might be time to contact Altera to find the magic preserve button?
Comments
It's right after COGNEWX. No flags written.
A few questions:
1) What happens when:
AUGD #some_big_constant
RESD #C
MUL A,B
is executed?
2) Is the #/D in RESD saved in the task state WIDE? (rogloh asked this above too)
3) I think AUGS, AUGD are already saved in the task state WIDE, correct?
You got it. Can you think of any simple way we can get some good use out of them?
1) Only the bottom nine bits of D in RESD are stored. You weren't hoping this would automatically translate into a WRLONG were you?
2) Yes, it takes 1 bit for the enable flag and 9 bits for the register address.
3) Yes.
So I assume that RESD without any parameter resets this system to standard behavior?
RESD is a one-shot deal. You do a RESD and the next instruction that writes a register uses it, and it's over. Things are back to normal then, until another RESD is issued, followed, at some point, by another instruction that writes.
I forgot to mention that RESD works with register remapping.
Thanks
Now I understand
I can see using that quite a bit.
Basically gives us flexible three operand instructions!
Not on Prop2...
I think for P3 we will need to revisit a lot of things, but that will be a while away.
I do like your other helper idea (any two bits to Z and C)
I will add new conditionals to the assembler:
if_00
if_01
if_10
if_11
if_x0
if_x1
if_0x
if_1x
if_either
if_same
if_diff
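One way to read the new names is as patterns over the Z/C pair. The Python sketch below is a guess at the intended meanings, not the assembler's definition: it assumes the first digit is Z and the second is C, that `x` means don't care, that `if_same` / `if_diff` compare Z with C, and that `if_either` means at least one of the two is set. The function name is made up.

```python
def cond_matches(name, z, c):
    """Return True if conditional `name` would execute for flags z and c (assumed meanings)."""
    pattern = name[len("if_"):]
    if pattern == "same":
        return z == c
    if pattern == "diff":
        return z != c
    if pattern == "either":
        return bool(z or c)        # assumption: Z or C set
    z_pat, c_pat = pattern         # e.g. "0x" -> Z must be 0, C is don't-care
    return (z_pat == "x" or int(z_pat) == z) and (c_pat == "x" or int(c_pat) == c)


# Usage: which conditionals fire for Z=0, C=1?
names = ["if_00", "if_01", "if_10", "if_11", "if_x0", "if_x1",
         "if_0x", "if_1x", "if_either", "if_same", "if_diff"]
firing = [n for n in names if cond_matches(n, 0, 1)]
```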
I got rid of the SETZC instruction because it was redundant.
Or if the $3000 Quartus version would make a difference in the two cog / max speed case.
My brain was stuck in decoding four states mode...
Unfortunately I have not had time to really look into the USB stuff.