Instinct says a pick-pair opcode should also be useful for Quadrature Encoders, but I'm not seeing an elegant outcome yet...
Found a solution that is state-based (and also a good argument for making a Macro Assembler standard).
Idle state is 2 lines (test and jump to same) and
Change action is 3 lines (test, inc/dec, jump to new).
Approx. 28 lines of code for each Quad block.
If Chip has the index choice, then that 28 lines may be mostly share-able across 3 Quad instances.
Could do 3 threads of Quad, leaving 4th for serious lifting.
At a USB-compatible, leisurely 48 MHz, the main thread is (say) ~30 MIPS and each quad can run at ~6 MIPS.
I'm not sure how jumps and threads play together in the fine print, but that's roughly 3 x 1MHz quad-capable + Main.
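For anyone wanting to see the state-based idea outside of assembler, here is a rough Python sketch. It is only a model: each state is the last (A, B) pair seen, staying put is the "idle" case, and a change does the inc/dec and moves to the new state. The names (`FORWARD`, `build_table`, `decode`) and the choice of which rotation counts as +1 are my assumptions, not anything from the actual code.

```python
# Hypothetical model of a state-based quadrature decoder (not the PASM version).
# Assumption: forward rotation walks the Gray sequence 00 -> 01 -> 11 -> 10 -> 00.
FORWARD = [0b00, 0b01, 0b11, 0b10]


def build_table():
    """Map (old_state, new_state) to the count change for that transition."""
    table = {}
    for i, state in enumerate(FORWARD):
        table[(state, state)] = 0                   # idle: nothing changed
        table[(state, FORWARD[(i + 1) % 4])] = +1   # one step forward
        table[(state, FORWARD[(i - 1) % 4])] = -1   # one step backward
        table[(state, FORWARD[(i + 2) % 4])] = 0    # both bits changed: ignored here
    return table


TABLE = build_table()


def decode(samples, start=0b00):
    """Return the net count after feeding a sequence of 2-bit (A, B) samples."""
    state, count = start, 0
    for sample in samples:
        count += TABLE[(state, sample)]
        state = sample                              # "jump to new" state
    return count
```

A full forward cycle such as `decode([0b01, 0b11, 0b10, 0b00])` counts four steps. A real decoder would probably want to flag the "both bits changed" case as an error rather than ignore it.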
Yep. I was trying for a small, simple generic piece of sample code... yours is simpler and has the right bit-order for USB
I am thrilled at the SETZC / PICKZC instruction... it will make a lot of decoding easier!
Yes. SETZC / PICKZC and RESD / RESULT are both simple & elegant, and extremely useful.
I really love the extra extension of RESD / RESULT to be used with the first following executed instruction, permitting a set of alternate conditional instructions. It's the little gems like these that continue to boost the performance for those of us who wish to spend the time.
You have to love tool flows, that are somehow not quite the same across families...
Google does not find much; might be time to contact Altera to find the magic preserve button?
Contact Altera? They've made that an exercise in futility.
They've got lots of online help, but I can never find anything I need. One other thing I have never been able to find is any kind of benchmark comparisons between device families. If they don't want to "talk" about something, you will find no mention of the matter, anywhere. It all seems very controlled.
Here is an updated Instruction Summary with...
* Column for WZ & WC now has it for each op
* adds new instructions REPD & PICKZC (note PICKZC may not have the correct opcode bits)
* removes instruction SETZC
Tip to view without wrap - reduce text size (In IE Ctrl-Scroll)
You know we have this new RESD D/# instruction that lets you redirect the next result to another register.
What about expanding this capability a little (rdr = task's redirection register):
RESD D/# 'set rdr to D/#, redirect next write to [rdr]
RESDP D/# 'set rdr to D/#, redirect next write to [rdr++]
RESDN D/# 'set rdr to D/#, redirect next write to [rdr--]
RESDX D/# 'set rdr to D/#, redirect all subsequent writes to [rdr] until RESDOFF
RESDPX D/# 'set rdr to D/#, redirect all subsequent writes to [rdr++] until RESDOFF
RESDNX D/# 'set rdr to D/#, redirect all subsequent writes to [rdr--] until RESDOFF
RESDOFF D/# 'set rdr to D/#, cancel write redirection
RESD ' redirect next write to [rdr]
RESDP ' redirect next write to [rdr++]
RESDN ' redirect next write to [rdr--]
RESDX ' redirect all subsequent writes to [rdr] until RESDOFF
RESDPX ' redirect all subsequent writes to [rdr++] until RESDOFF
RESDNX ' redirect all subsequent writes to [rdr--] until RESDOFF
RESDOFF ' cancel write redirection
This scheme takes two more state bits and an incrementer/decrementer on the task's redirection register.
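To make the proposed modes easier to reason about, here is a small Python model of one task's redirection register. It is a sketch of the behavior described in the list above, not of the hardware: the class and method names (`Task`, `resd`, `resdoff`, `write`), the 512-register cog size, and the way the mode is stored are all assumptions for illustration.

```python
from enum import Enum


class Step(Enum):
    NONE = 0    # RESD / RESDX: rdr stays put
    INC = +1    # RESDP / RESDPX: post-increment rdr
    DEC = -1    # RESDN / RESDNX: post-decrement rdr


class Task:
    """Hypothetical model of one task's write-redirection state."""

    def __init__(self, size=512):        # assumed cog register count
        self.regs = [0] * size
        self.rdr = 0
        self.step = Step.NONE
        self.pending = False             # redirect the next write only
        self.sticky = False              # redirect every write until resdoff

    def resd(self, addr=None, step=Step.NONE, sticky=False):
        """With addr: set rdr first. Without addr: only set the redirect mode."""
        if addr is not None:
            self.rdr = addr % len(self.regs)
        self.step = step
        self.sticky = sticky
        self.pending = not sticky

    def resdoff(self, addr=None):
        """Cancel redirection, optionally loading rdr."""
        if addr is not None:
            self.rdr = addr % len(self.regs)
        self.pending = self.sticky = False

    def write(self, d, value):
        """Model an instruction writing its result; returns the register written."""
        if self.pending or self.sticky:
            target = self.rdr
            self.rdr = (self.rdr + self.step.value) % len(self.regs)
            self.pending = False
        else:
            target = d
        self.regs[target] = value
        return target
```

In this model `resd(10, Step.INC, sticky=True)` plays the role of `RESDPX #10`: the next writes land in registers 10, 11, 12, and so on until `resdoff()` is called, after which writes go to their normal D register again.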
WOW! That's some scheme Chip.
So each task has its own task register, and it can auto increment/decrement each time it gets used. That is now a really powerful instruction.
Rather than the RESDP & RESDN in pnut, perhaps they could be RESD #/D++ or RESD #/D-- to indicate increment or decrement, instead of revealing there are separate opcodes used.
I don't know about this one. I'm just trying to imagine when we would redirect a whole lot of sequential instructions to a register (or a register block). So far all I could dream up was maybe inside some tight REP loop doing a form of SIMD or something. We have INDA++ already, which could help do the writing to a block. So the only real benefit is in not destroying both the operands to the opcodes.
Chip, wouldn't post increment/decrement be better?
Might the instructions RESULT / RESULTX / RESULTN (result = result modify off = never) be better, and the ++ or -- on the #/D operand as pre or post (only 1) ?
Example of a Single Instruction Multiple Data (SIMD) type operation on adding 100 numbers was what came to mind.
REPS #100,#2
RESDPX C
op: ADD A,B
INCDS op
RESDOFF C
Result C vector = A vector + B vector
EDIT: Actually that probably cannot work because there is not enough time in the self modifying code loop to work right, so perhaps it could be this instead:
REPS #100,#1
RESDPX C
op: ADD INDA++,INDB++
RESDOFF C
Actually this is where the incrementing/decrementing RESD stuff became useful because INDA/INDB are already consumed in the loop itself. So yes it definitely could be of some benefit here.
We do 100 adds in 100 cycles above. To do this same operation otherwise takes a lot of instructions if you can't destroy the D operand.
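Here is a quick Python model of what the second loop is meant to do, just to check the idea. It assumes `ADD D,S` computes D + S as a 32-bit value, that INDA/INDB post-increment after each pass, and that the redirected result lands at [rdr++] instead of overwriting the D operand. The function name and the register layout in the demo are made up for illustration.

```python
def vector_add_redirected(regs, a, b, c, n):
    """Model n passes of `ADD INDA++,INDB++` with the result redirected to [rdr++].

    regs is the register file (a list of ints); a, b, c are the start
    addresses of the A, B and C vectors. A and B are left untouched.
    """
    inda, indb, rdr = a, b, c
    for _ in range(n):
        result = (regs[inda] + regs[indb]) & 0xFFFFFFFF  # 32-bit wraparound
        regs[rdr] = result      # the write is redirected, so regs[inda] survives
        inda += 1
        indb += 1
        rdr += 1
    return regs


# Small demo with 4 elements instead of 100.
regs = [0] * 64
regs[0:4] = [1, 2, 3, 4]          # A vector
regs[16:20] = [10, 20, 30, 40]    # B vector
vector_add_redirected(regs, a=0, b=16, c=32, n=4)
print(regs[32:36])                # C vector
print(regs[0:4])                  # A vector, still intact
```

The point the model makes is the same one as in the post: one instruction per element, and neither source vector is destroyed.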
ragloh,
I could see potential in using these instead of INDA/INDB (saved for other uses), as a block move between threads, as a way of using banking (instead of remapping). I am sure there are other uses - we just have to see.
Chip,
I see how RESD, RESDP and RESDN could be useful in multi-tasking, as we only have one set of INDA/B registers.
Question: The SETMAP instruction now has a #S value. I recall you mentioning somewhere about selecting a register block using any register rather than INDB or task number? Also, can each hw task select a separate block?
INDA/INDB are not per-task, but global to the cog, so this would provide some new capability to all tasks.
This scheme only does post-inc/dec on the redirect register.
We might need better names, for sure.
Also, these instructions have operand-less versions which don't write rdr, but set a redirect mode.
Register remapping is a cog-wide function that you'd probably want to set up once, unless, like for preemptive multi-threading, you'd like to switch around under software control. That S/# in SETMAP is to set a static remap value, rather than using only the current task ID or INDA/INDB for instantaneous remap.
Destination is only COG RAM registers? Or can it be HUB RAM too?
Do you think that it could be possible (or useful) to add an OFFSET to those instructions?
For example (instructions for a fixed 8-byte offset):
RESDP8 D/# 'set rdr to D/#, redirect next write to [rdr+8]
RESDN8 D/# 'set rdr to D/#, redirect next write to [rdr-8]
With an offset of 8, maybe some application could use eight synchronized cogs to get the fastest possible data transfer to fill HUB RAM (or use HUB RAM to synchronously send data using pin transfer (XFR)).
I remember that high-speed video on P1 was implemented using several synchronized cogs (is this right?).
Maybe some high-speed video application could benefit. Or projects that make use of fast pipelined DAC/ADC.
Another example: with a +3 offset, three cogs can send R, G, B data independently to a DAC (EDIT: "to three DACs") using pin transfer instructions. And hopefully with a 3x speed improvement over a single COG handling three colors.
What do you think?
I don't see a need for an offset because you set whatever address you want and it just goes from there.
This is something that the programmer is going to have to put in his head and then recognize when an opportunity to use it arises.
I can think of all kinds of image processing that would benefit from this. Imagine the task of comparing two images, from known optics, which are similar except for rotation and magnification (cyclopean vision), where your calculation yields a change in the position of the camera using repeated CORDIC and math functions. These tools seem to fit the bill nicely.
Rich
In the past what happened was the documentation seemed to be purposefully targeted at various classes of users at various times in the device's life cycle… first being experts, followed by advanced, beginners and intermediate. Where most Prop1 users will probably want to start is with P1 compatible coding… and then adding in advanced programming as needed.
In the mass adoption phase, the biggest problem seems to be directing users to appropriate documentation, based upon their individual skill set. Parallax does a good job of this, mostly through the forums. Even advanced users are going to struggle with some of the refinements that have occurred.
Integrating advanced help utilities into the various tools seems essential to me.
Rich
I didn't mean to go off-topic, just couldn't help myself:)
Comments
Hard not to be impressed
There are a lot of times where I have needed to decode pairs of bits and needed the full 4 cases. This instruction will help a lot.
BTW I think SETZC is still a better name than PICKZC. If it is not too much trouble, I suggest the compiler permit SETZC <reg> as SETZC <reg>,#0.
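For the "full 4 cases" decoding, here is a rough Python illustration of the general idea of pulling a bit pair out of a register and branching four ways on it. To be clear, this is not the documented behavior of SETZC / PICKZC: which bit would go to Z and which to C, and whether the second operand selects a pair or a bit position, are assumptions on my part, and the names `pick_pair`, `CASES` and `decode_pair` are made up.

```python
def pick_pair(value, index=0):
    """Return (z, c) from the bit pair at bits [2*index + 1 : 2*index].

    Assumption: the upper bit of the pair maps to Z and the lower bit to C.
    """
    pair = (value >> (2 * index)) & 0b11
    return (pair >> 1) & 1, pair & 1


# One entry per flag combination, i.e. the four cases a decoder has to handle.
CASES = {
    (0, 0): "neither",
    (0, 1): "C only",
    (1, 0): "Z only",
    (1, 1): "both",
}


def decode_pair(value, index=0):
    """Four-way dispatch on the selected bit pair."""
    return CASES[pick_pair(value, index)]
```

With `index` defaulting to 0, `pick_pair(reg)` mirrors the suggestion of letting `SETZC <reg>` mean `SETZC <reg>,#0`.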
Yep, I forgot about USB having to remember the previous bit.
Because LSB is received first. (realised this error in my USB a day or two ago)
InstructionSummary_20140312b.spin
We do 100 adds in 100 cycles above. To do this same operation otherwise takes a lot of instructions if you can't destroy the D operand. So we get a 4x speedup above!
There are some cases where there's a definite benefit. This is practically a freebie, so I say put it in the toolbox.
That's fine. When they graduate, or look at someone else's code, they will find these gems being used ;)
Probably the instruction set should be split into two, for normal and advanced users.
It's just for cog RAM registers - a little trick to swap out result register addresses.
I can see all sorts of potential uses for this in graphics, string operations and even cryptography.
Sorry, cannot resist...
I wonder if there would be uses for the inverse?
SRCD D/#
SRCDP D/#
SRCDN D/#
SRCDOFF D/#
What would these do?
C.W.