If you look at the opcodes, there's no more bits to use for something like 'indirect'.
It either has to be via special registers, or with a prefix instruction like I've done here.
I am being very non-serious, but my isr is used as a code example in this thread. Consider changing ram size from 32 to 33-1/3 bits per long, the extra bit and a bit will facilitate long play. :crazy: :clown: :crazy:
If we did that wouldn't we have to also switch from digital to analog?
Since ALTR, ALTD and ALTS are "prefixing" instructions, isn't it possible to "extend" at least some of the instructions that were "prefixed" by adding "extra" GOx stage(s) to them, thus gaining enough time for the addition(s) to happen?
I know the way I've described above will not be exactly how it would be done. It's more likely to need two decode paths for the same instruction, discriminated by selectors resulting from the prefixing instructions that were executed before them, but I'm sure you can understand what I meant.
That would be possible, and it would make the instructions one cycle longer. However, it would mean making a MULTICYCLE assignment, which opens a Pandora's box of other, necessitated MULTICYCLE assignments. I used to have parts of the chip settle over two clocks, but the ins and outs of making every requisite MULTICYCLE assignment were getting to be way too complicated and I was not 100% confident that I had covered every case. The trouble is, logic flows all over the place, in ways you don't realize at first. Unless you go from only one set of flops to another, which is actually not practical in a design where signals are borrowed all over, you may never get it straight.
A few months ago, I got rid of all the MULTICYCLE paths in the Prop2 and inserted some flops in key places, so that the tool could resolve all timing, based on one constant, everywhere. I only suffered a small Fmax reduction, but now I KNOW that I haven't left anything out. So, I don't want to go back to the MULTICYCLE abyss.
What if you change the immediate version of ALT{R,D,S} to act like PTRx? In other words, when bit 8 of the immediate source is set, D will be affected in the same way the address of a "RDLONG x, #%1_SUPNNNNN" would be affected. Then, you could also use PTRx (which would now just be a normal register) for stuff like LUT and cogram and anything else involving things like array[x += y].
        ALTS    myptr, #%1SUPNNNNN
        RDLONG  x, 0-0
myptr   res     1
EDIT: This wouldn't work for RDLONG, but I still think it would be useful in other cases.
Electrodude,
I don't think there is any gain there because the ALT prefixes always do a full accumulate from S to D. That's very flexible. The PTRA/B based operations are far more limited in inputs, so they need the special limited indexing instead.
Oops, I'm flat wrong. The new ALTR/ALTD/ALTS instructions don't accumulate at all. It's just an add.
However, the ALTI instruction maintains the old ALTDS functionality which could increment similarly to the PTRx operations. So, we may still have an equivalent ... not sure, I remember it being difficult to handle the S fields which is where the HubOps address HubRAM from.
There are three new instructions that share the opcode space with ALTI (no more C/Z writing options, as they were meaningless for these instructions):
ALTR D,S/# - use the sum of D and S/# for the result register in the next instruction
ALTD D,S/# - use the sum of D and S/# for the D register in the next instruction
ALTS D,S/# - use the sum of D and S/# for the S register in the next instruction
The idea is that D is an offset and S/# is a base:
ALTx offset,#base
So, now we'll have simple-to-use instructions for R/D/S alterations.
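To make the offset/base idea concrete, here is a rough Python model of what an ALTx prefix computes. It is only a sketch: the register numbers OFFSET and BASE are made up for illustration, and wrapping the sum to 9 bits is my assumption about how a cog register address would be formed.

```python
MASK9 = 0x1FF  # assumption: cog register addresses are 9 bits wide


def alt_target(regs, d_addr, s_value):
    """Register address that ALTx D,S/# substitutes into the next instruction."""
    return (regs[d_addr] + s_value) & MASK9


regs = [0] * 512
OFFSET = 0x10  # hypothetical register holding the offset (the D operand)
BASE = 0x40    # hypothetical base address of a table in cog RAM (the S/# operand)
regs[OFFSET] = 3
regs[BASE + 3] = 1234

# ALTS offset,#base followed by MOV x,0-0 reads from regs[base + offset]
src = alt_target(regs, OFFSET, BASE)
x = regs[src]
print(hex(src), x)
```

The same helper would serve for ALTD and ALTR; only the field of the next instruction that receives the result differs.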
This cleans up 78rpm's task switcher.
OLD CODE
task_switcher                                   ' this is the task switcher isr
                                                ' save current task
                addct1  task_time, ##TASKS_TIMER
                mov     modify, curr_task_index
                add     modify, #task_ctrl_blk
                altds   modify, #%000_100_000   ' replace D reg
                mov     0-0, IRET1              ' tcb[ task index]
                add     curr_task_index, #1     ' next task
                and     curr_task_index, #$3    ' in range 0 - 3
                                                ' set new task
                mov     modify, curr_task_index
                add     modify, #task_ctrl_blk
                altds   modify, #%000_000_100   ' replace S reg
                mov     IRET1, 0-0              ' tcb[ task index]
                reti1
NEW CODE
task_switcher   addct1  task_time,##TASKS_TIMER         'set next interrupt time
                altd    curr_task_index,#task_ctrl_blk  'save current task
                mov     0,IRET1
                incmod  curr_task_index,#3              'inc/reset task pointer
                alts    curr_task_index,#task_ctrl_blk  'set new task
                mov     IRET1,0
                reti1                                   'execute new task
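For anyone who wants to see the round-robin behaviour without hardware, the switcher can be mimicked in Python. This is a simplified sketch: the Switcher class and the entry addresses are invented for illustration, IRET1 is treated as a plain return address, and INCMOD is modelled as "wrap to 0 when the value equals the limit, otherwise add 1".

```python
def incmod(value, limit):
    """Model of INCMOD: wrap to 0 once value equals limit, else increment."""
    return 0 if value == limit else value + 1


class Switcher:
    """Hypothetical stand-in for the task switcher ISR state."""

    def __init__(self, entry_points):
        self.tcb = list(entry_points)  # task_ctrl_blk: one saved address per task
        self.index = 0                 # curr_task_index
        self.iret1 = self.tcb[0]       # IRET1: return address of the running task

    def isr(self):
        self.tcb[self.index] = self.iret1                   # altd + mov 0,IRET1
        self.index = incmod(self.index, len(self.tcb) - 1)  # incmod curr_task_index,#3
        self.iret1 = self.tcb[self.index]                   # alts + mov IRET1,0


sw = Switcher([0x100, 0x200, 0x300, 0x400])
sw.iret1 = 0x104  # pretend task 0 was interrupted at 0x104
order = []
for _ in range(5):
    sw.isr()
    order.append(sw.index)
print(order)
```

Each interrupt saves the current task's resume address in its slot and picks up the next slot, so four tasks take turns in a fixed order.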
It may be too late in the game to do this, but I will throw this idea into the pot here.
There are pros and cons with what I suggest here.
Reduce the CCCC conditional execution field from 4 bits to 2 bits. This allows the IF_Z and IF_C combination fields to still be utilised directly.
Introduce a FLAGS / COND / CONDREP / STATUS / CHECK instruction which performs the required condition check and sets a counter, in a similar fashion to REP, to specify that the next sequence of instructions is an extended conditional, i.e. IF_BE, which is IF_C | IF_Z == 1.
        if_z    jmp     #elsewhere              ' IF_Z and variants still function for speed
                                                ' FLAGS #sequence, #condition
                flags   #.cond_seq, IF_BE
                add     my_value, #1            ' only executed IF_BE
                shl     my_value, #3            ' only executed IF_BE
.cond_seq                                       ' end of conditional sequence
                add     my_value, #1            ' this is always executed, outside of conditional
        if_c    jmp     #somewhere
                flags   #.old_code, IF_NEVER
                shl     my_value, #4            ' this is never executed.
.old_code
The FLAGS / COND / CONDREP / STATUS / CHECK instruction has the same attributes as the REP instruction. It inhibits interrupts until complete, and it is cancelled when a branch is executed.
PNut and other assemblers can generate an error if an extended conditional, ie IF_BE, is specified without an enclosing FLAGS specifier.
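Here is a small Python model of how I read the proposed FLAGS behaviour, mirroring the example sequence above. Everything in it is hypothetical: the tuple-based program format, the run function, and the choice to evaluate the condition once when FLAGS executes rather than per instruction. Branch cancellation and interrupt inhibition are left out.

```python
def if_be(c, z):
    return bool(c or z)  # IF_BE as described: C | Z == 1


def if_never(c, z):
    return False


def run(program, c, z, value=0):
    """program holds ('flags', count, cond) or ('op', fn) tuples."""
    remaining = 0   # instructions left in the current conditional block
    enabled = True  # result of the condition check made by FLAGS
    for ins in program:
        if ins[0] == 'flags':
            _, remaining, cond = ins
            enabled = cond(c, z)
            continue
        if remaining > 0:
            remaining -= 1
            if not enabled:
                continue  # inside a failed conditional block: skip
        value = ins[1](value)
    return value


program = [
    ('flags', 2, if_be),
    ('op', lambda v: v + 1),   # only executed IF_BE
    ('op', lambda v: v << 3),  # only executed IF_BE
    ('op', lambda v: v + 1),   # always executed, outside of conditional
    ('flags', 1, if_never),
    ('op', lambda v: v << 4),  # never executed
]
print(run(program, c=0, z=1), run(program, c=0, z=0))
```

The counter plays the same role as the REP block length: once it runs out, instructions execute unconditionally again.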
The 2 new bits from the CCCC field are for pointer selection, but first we rename:
adra to ptrc
adrb to ptrd
This allows ptra and ptrb to continue to be associated with loc, calla/b, reta/b, pusha/b, and popa/b. In HLL use, one may be used for the stack pointer and the other for the stack frame pointer.
Ptrc and ptrd are used with all other instructions.
The pointer select bits indicate whether ptrc and/or ptrd are used in the instruction, in a similar manner to RDLONG etc., but with ptrc and ptrd instead. This allows pointer operation on all instructions, with 1 bit for the S register and 1 bit for the D register. When specified, the pattern in S or D is similar to the current RDLONG etc. pattern, 1SUPnnnnn. However, we do not need the always-1 bit, so for our 9 bits we get SUPnnnnnn.
S selects ptrc or ptrd
U specifies pointer update
P specifies pointer post modification
nnnnnn = index -32 to 31
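The SUPnnnnnn field can be sketched as a pair of pack/unpack helpers in Python. The bit positions are my assumption (S in bit 8, U in bit 7, P in bit 6, a two's-complement index in bits 5..0), read straight off the order of the letters; the proposal does not pin them down.

```python
def encode(sel, update, post, index):
    """Pack S, U, P and a signed 6-bit index into a 9-bit field (assumed layout)."""
    if not -32 <= index <= 31:
        raise ValueError("index must fit in 6 signed bits")
    return (sel << 8) | (update << 7) | (post << 6) | (index & 0x3F)


def decode(field):
    """Unpack a 9-bit SUPnnnnnn field into (S, U, P, index)."""
    index = field & 0x3F
    if index & 0x20:  # sign bit of the 6-bit index
        index -= 64
    return (field >> 8) & 1, (field >> 7) & 1, (field >> 6) & 1, index


field = encode(1, 1, 0, -32)
print(bin(field), decode(field))
```

Dropping the always-1 bit is what buys the sixth index bit, which is where the doubled range of -32 to 31 comes from.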
As pointers can now be specified for all instructions, we can now do:
                loc     ptrc, #my_long_array    ' array or long
                loc     ptrd, #my_structure     ' a data array structure
                rep     #.loop, #num_entries
                mov     ptrc++, ptrd[8]
                add     ptrd, #sizeof_structure ' next element of array structure
.loop
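In Python terms, the loop above gathers one field from each structure into a packed array. This is a sketch under assumptions: memory is a flat list indexed in the same units as the pointers, ptrc++ steps by one unit, and the gather function and its parameter names are made up for illustration.

```python
def gather(mem, array_base, struct_base, struct_size, count, field=8):
    """Copy the entry at offset `field` of each structure into a packed array."""
    ptrc, ptrd = array_base, struct_base
    for _ in range(count):          # rep #.loop, #num_entries
        mem[ptrc] = mem[ptrd + field]   # mov ptrc++, ptrd[8]
        ptrc += 1
        ptrd += struct_size             # add ptrd, #sizeof_structure
    return mem


mem = list(range(100))
gather(mem, array_base=0, struct_base=20, struct_size=10, count=3)
print(mem[:3])
```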
It may be possible to remove the RDLONG etc instructions externally, or keep them to show explicit Cog/Hub movement, if address resolution in the Cog can operate on a move:
address bits b31:b9 == 0, then cog
address bits b31:11 == 0 and not cog address, then lut
not cog and not lut then hub.
So the Verilog internally can deduce that mov #$100, #$408 would be a hub-to-cog move and execute accordingly.
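The address resolution could be modelled like this in Python. Treat it as a sketch: the region function is hypothetical, and I have assumed the LUT sits at $200..$3FF (so the LUT test is on bits 31:10), chosen because that is what makes $408 resolve to hub as in the mov example; adjust the shift if the bit ranges in the list are meant literally.

```python
def region(addr):
    """Classify an address as cog, lut or hub (assumed boundaries)."""
    if addr >> 9 == 0:    # bits 31:9 all zero -> $000..$1FF
        return "cog"
    if addr >> 10 == 0:   # assumption: bits 31:10 all zero -> $200..$3FF
        return "lut"
    return "hub"


# mov #$100, #$408: destination resolves to cog, source to hub
print(region(0x100), region(0x408))
```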
So what do we gain from this:
Four flexible pointers, double the index range, instead of two.
No additional Special Registers.
Possibly freeing or reallocating instruction opcodes for RDLONG etc externally
What do we lose:
Required to place conditional execution except for most basic Z and C in a block.
Logic wise:
An additional counter for the FLAGS / COND execution.
An additional indexing logic / circuit per cog.
Additional logic to prohibit interrupts whilst the conditional block is executing.
Also, can this be done without slowing the clock speed?
Electrodude,
Why don't you think your suggestion will work for a RDLONG?
Reposting code for reference:
        ALTS    myptr, #%1SUPNNNNN
        RDLONG  x, 0-0
myptr   res     1
That would not act like PTRx. It would instead make RDLONG get the address from a different register each time.
Using "RDLONG x, #0-0" instead wouldn't work either (when address & $100 <> 0), because then you would be doing indirect PTRx configuration or something silly that wouldn't make any sense (quadratically increasing addresses?).
Comments
Not if we go with direct servo drive.