ALTDS replaced with ALTI/ALTR/ALTD/ALTS

cgracey · 2015-11-04 00:37

David Betz wrote: »

78rpm wrote: »

cgracey wrote: »

If you look at the opcodes, there's no more bits to use for something like 'indirect'.

It either has to be via special registers, or with a prefix instruction like I've done here.

I am being very non-serious, but my isr is used as a code example in this thread. Consider changing ram size from 32 to 33-1/3 bits per long, the extra bit and a bit will facilitate long play. :crazy: :clown: :crazy:

If we did that wouldn't we have to also switch from digital to analog?

Not if we go with direct servo drive.

David Betz · 2015-11-04 00:38

cgracey wrote: »

David Betz wrote: »

78rpm wrote: »

cgracey wrote: »

If you look at the opcodes, there's no more bits to use for something like 'indirect'.

It either has to be via special registers, or with a prefix instruction like I've done here.

I am being very non-serious, but my isr is used as a code example in this thread. Consider changing ram size from 32 to 33-1/3 bits per long, the extra bit and a bit will facilitate long play. :crazy: :clown: :crazy:

If we did that wouldn't we have to also switch from digital to analog?

Not if we go with direct servo drive.

Powered by a flux capacitor no doubt.

Yanomani · 2015-11-04 01:11

Only some food for thought...

Since ALTR, ALTD and ALTS are "prefixing" instructions, isn't it possible to "extend" at least some of the instructions that were "prefixed", by adding "extra" GOx stage(s) to them, thus gaining enough time for the addition(s) to happen?

I know what I've described above is not exactly how it would be implemented. More likely there would be two decode paths for the same instruction, selected by state left over from the prefixing instruction executed before it, but I'm sure you understand what I meant.

cgracey · 2015-11-04 01:35

Yanomani wrote: »

Only some food for thought...

Since ALTR, ALTD and ALTS are "prefixing" instructions, isn't it possible to "extend" at least some of the instructions that were "prefixed", by adding "extra" GOx stage(s) to them, thus gaining enough time for the addition(s) to happen?

I know what I've described above is not exactly how it would be implemented. More likely there would be two decode paths for the same instruction, selected by state left over from the prefixing instruction executed before it, but I'm sure you understand what I meant.

That would be possible, and it would make the instructions one cycle longer. However, it would mean making a MULTICYCLE assignment, which opens a Pandora's box of other, necessitated MULTICYCLE assignments. I used to have parts of the chip settle over two clocks, but the ins and outs of making every requisite MULTICYCLE assignment were getting to be way too complicated and I was not 100% confident that I had covered every case. The trouble is, logic flows all over the place, in ways you don't realize at first. Unless you go from only one set of flops to another, which is actually not practical in a design where signals are borrowed all over, you may never get it straight.

A few months ago, I got rid of all the MULTICYCLE paths in the Prop2 and inserted some flops in key places, so that the tool could resolve all timing, based on one constant, everywhere. I only suffered a small Fmax reduction, but now I KNOW that I haven't left anything out. So, I don't want to go back to the MULTICYCLE abyss.

Yanomani · 2015-11-04 02:34

Thanks, Chip, for showing why it can't follow that path.

Electrodude · 2015-11-04 04:08

What if you change the immediate version of ALT{R,D,S} to act like PTRx? In other words, when bit 8 of the immediate source is set, D will be affected in the same way the address of a "RDLONG x, #%1_SUPNNNNN" would be affected. Then, you could also use PTRx (which would now just be a normal register) for stuff like LUT and cogram and anything else involving things like array[x += y].

        ALTS    myptr, #%1SUPNNNNN
        RDLONG  x,     0-0

myptr   res 1

EDIT: This wouldn't work for RDLONG, but I still think it would be useful in other cases.

evanh · 2015-11-04 04:28

Electrodude,
I don't think there is any gain there because the ALT prefixes are always a full accumulate from S to D. That's very flexible. The PTRA/B-based operations are far more limited in inputs, so they need the special limited indexing instead.

evanh · 2015-11-04 04:32

In other words, I think the new arrangement totally covers everything that PTRA/B can do, and more. The sacrifice is that it's a little bigger and slower.

evanh · 2015-11-04 11:16

Oops, I'm flat wrong. The new ALTR/ALTD/ALTS instructions don't accumulate at all. It's just an add.

However, the ALTI instruction maintains the old ALTDS functionality which could increment similarly to the PTRx operations. So, we may still have an equivalent ... not sure, I remember it being difficult to handle the S fields which is where the HubOps address HubRAM from.

evanh · 2015-11-04 11:32

Electrodude,
Why don't you think your suggestion will work for a RDLONG?

78rpm · 2015-11-04 11:45

cgracey wrote: »
In the next release...

What was ALTDS is now ALTI.

There are three new instructions that share the opcode space with ALTI (no more C/Z writing options, as they were meaningless for these instructions):

ALTR D,S/# - use the sum of D and S/# for the result register in the next instruction
ALTD D,S/# - use the sum of D and S/# for the D register in the next instruction
ALTS D,S/# - use the sum of D and S/# for the S register in the next instruction

The idea is that D is an offset and S/# is a base:

ALTx offset,#base

So, now we'll have simple-to-use instructions for R/D/S alterations.

This cleans up 78rpm's task switcher.

OLD CODE
task_switcher   ' this is the task switcher isr
        ' save current task
                addct1  task_time, ##TASKS_TIMER    
                mov     modify, curr_task_index
                add     modify, #task_ctrl_blk
                altds   modify, #%000_100_000           ' replace D reg
                mov     0-0, IRET1                      ' tcb[ task index]
                add     curr_task_index, #1             ' next task
                and     curr_task_index, #$3            ' in range 0 - 3
        ' set new task
                mov     modify, curr_task_index
                add     modify, #task_ctrl_blk
                altds   modify, #%000_000_100           ' replace S reg
                mov     IRET1, 0-0                      ' tcb[ task index]

                reti1
NEW CODE
task_switcher   addct1  task_time,##TASKS_TIMER         'set next interrupt time

                altd    curr_task_index,#task_ctrl_blk  'save current task
                mov     0,IRET1

                incmod  curr_task_index,#3              'inc/reset task pointer

                alts    curr_task_index,#task_ctrl_blk  'set new task
                mov     IRET1,0

                reti1                                   'execute new task
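
To check the indexing in the NEW CODE above, here is a rough Python model of one pass through the task switcher. It is only a sketch: the cog address chosen for task_ctrl_blk, the dictionary standing in for cog RAM, and the helper names alt_target, incmod and task_switch are all assumptions made for illustration. It models ALTD/ALTS as a plain add of offset and base, as described in the quoted post, and INCMOD as "increment, wrapping to 0 once the limit is reached".

```python
# Toy model of the NEW CODE task switcher. Not real P2 behaviour:
# the register address and the 4-entry table are made up for illustration.

TASK_CTRL_BLK = 0x100   # hypothetical cog address of task_ctrl_blk
NUM_TASKS = 4


def alt_target(offset, base):
    """ALTD/ALTS as described in the post: the sum of D (offset) and S/# (base)
    replaces a register field of the next instruction. D itself is unchanged."""
    return (offset + base) & 0x1FF   # keep it a 9-bit cog register address


def incmod(value, limit):
    """INCMOD as used in the post: increment, wrapping to 0 after 'limit'."""
    return 0 if value == limit else value + 1


def task_switch(cog, state):
    """One pass through the ISR. 'cog' maps register address -> value,
    'state' holds curr_task_index and IRET1."""
    # altd curr_task_index,#task_ctrl_blk / mov 0,IRET1   (save current task)
    cog[alt_target(state["curr_task_index"], TASK_CTRL_BLK)] = state["IRET1"]
    # incmod curr_task_index,#3
    state["curr_task_index"] = incmod(state["curr_task_index"], NUM_TASKS - 1)
    # alts curr_task_index,#task_ctrl_blk / mov IRET1,0   (load next task)
    state["IRET1"] = cog[alt_target(state["curr_task_index"], TASK_CTRL_BLK)]


# Four saved return addresses (arbitrary values), task 0 currently running.
cog = {TASK_CTRL_BLK + i: 0x400 + 0x10 * i for i in range(NUM_TASKS)}
state = {"curr_task_index": 0, "IRET1": 0x123}
task_switch(cog, state)
```

After the call, slot 0 of the table holds the interrupted task's return address, the index has moved to 1, and IRET1 holds whatever was saved in slot 1.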

It may be too late in the game to do this, but I will throw this idea into the pot here.

There are pros and cons with what I suggest here.

Reduce the CCCC conditional execution field from 4 bits to 2 bits. This allows the IF_Z and IF_C combination fields to still be utilised directly.

Introduce a FLAGS / COND / CONDREP / STATUS / CHECK instruction which performs the required condition check and sets a counter, in a similar fashion to REP, to specify that the next sequence of instructions is an extended conditional, e.g. IF_BE, which is IF_C | IF_Z == 1.


	if_z	jmp	#elsewhere		' IF_Z and variants still function for speed

'		FLAGS	#sequence, #condition

		flags	#.cond_seq, IF_BE
		add	my_value, #1		' only executed IF_BE
		shl	my_value, #3		' only executed IF_BE
.cond_seq					' end of conditional sequence
		add	my_value, #1		' this is always executed, outside of conditional
	if_c	jmp	#somewhere
		flags	#.old_code, IF_NEVER
		shl	my_value, #4		' this is never executed.
.old_code

The FLAGS / COND / CONDREP / STATUS / CHECK instruction has the same attributes as the REP instruction. It inhibits interrupts until complete, and it is cancelled when a branch is executed.

PNut and other assemblers can generate an error if an extended conditional, ie IF_BE, is specified without an enclosing FLAGS specifier.
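
A rough Python model of the proposed FLAGS behaviour may make the idea easier to picture. Everything in it is hypothetical: the tuple encoding of instructions, the CONDITIONS table and the run function are invented for this sketch, and it assumes the condition is evaluated once when FLAGS executes and then applied to the counted instructions that follow. Interrupt inhibition and cancellation on a branch are not modelled.

```python
# Sketch of the proposed FLAGS instruction; all names here are hypothetical.

CONDITIONS = {
    "IF_BE": lambda c, z: c or z,        # below or equal: C | Z
    "IF_NEVER": lambda c, z: False,
    "IF_ALWAYS": lambda c, z: True,
}


def run(program, c, z):
    """Run a list of ("flags", count, cond) and ("op", name) tuples.
    Returns the names of the ops that actually executed."""
    executed = []
    remaining = 0     # instructions left in the current FLAGS block
    enabled = True    # condition result latched when FLAGS executed
    for instr in program:
        if instr[0] == "flags":
            _, count, cond = instr
            remaining = count
            enabled = CONDITIONS[cond](c, z)
            continue
        if remaining > 0:
            remaining -= 1
            if not enabled:
                continue          # inside a block whose condition failed
        executed.append(instr[1])
    return executed


# Mirrors the assembly example: a 2-instruction IF_BE block, an
# unconditional add, then a 1-instruction IF_NEVER block.
program = [
    ("flags", 2, "IF_BE"),
    ("op", "add1"),
    ("op", "shl3"),
    ("op", "add_always"),
    ("flags", 1, "IF_NEVER"),
    ("op", "shl4"),
]
```

With Z set, run(program, False, True) executes the two block instructions plus the unconditional add; with both flags clear, only the unconditional add runs.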

The 2 new bits from the CCCC field are for pointer selection, but first we rename:

	adra to ptrc
	adrb to ptrd

This allows ptra and ptrb to continue to be associated with loc, calla/b, reta/b, pusha/b, and popa/b. In HLL use, one may be used for the stack pointer and the other for the stack frame pointer.

Ptrc and ptrd are used with all other instructions.

The pointer select bits indicate whether ptrc and/or ptrd are used in the instruction, in a similar manner to RDLONG etc. but with ptrc and ptrd instead. This allows pointer operation on all instructions, with 1 bit for the S reg and 1 bit for the D reg. When specified, the pattern in S or D is similar to the current RDLONG etc. pattern, 1SUPnnnnn. However, we do not need the always-1 bit, so our 9 bits become SUPnnnnnn.

	S selects ptrc or ptrd
	U specifies pointer update
	P specifies pointer post modification
	nnnnnn = index -32 to 31
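
As a sanity check on the field layout, the 9-bit SUPnnnnnn pattern can be unpacked like this in Python. This is only a sketch of the proposal: the function name decode_ptr_field is made up, and the choice that S = 0 means ptrc and S = 1 means ptrd is an assumption, since the post does not say which way round it goes. The index is treated as 6-bit two's complement, which gives the stated range of -32 to 31.

```python
def decode_ptr_field(field):
    """Split a 9-bit SUPnnnnnn field as laid out in the post.
    Bit 8 = S, bit 7 = U, bit 6 = P, bits 5..0 = signed index."""
    if not 0 <= field <= 0x1FF:
        raise ValueError("field must fit in 9 bits")
    select = (field >> 8) & 1
    update = (field >> 7) & 1
    post = (field >> 6) & 1
    index = field & 0x3F
    if index & 0x20:          # sign-extend the 6-bit index
        index -= 0x40
    return {
        # Assumption: S = 0 selects ptrc, S = 1 selects ptrd.
        "pointer": "ptrd" if select else "ptrc",
        "update": bool(update),
        "post": bool(post),
        "index": index,
    }


# ptrc, update, post-modify, index +1 (something like ptrc++)
example = decode_ptr_field(0b0_1_1_000001)
```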

As pointers can now be specified for all instructions, we can now do:

		loc	ptrc, #my_long_array	' array or long
		loc	ptrd, #my_structure	' a data array structure
		rep	#.loop, #num_entries
		mov	ptrc++, ptrd[8]
		add	ptrd, #sizeof_structure ' next element of array structure
.loop

It may be possible to remove the RDLONG etc instructions externally, or keep them to show explicit Cog/Hub movement, if address resolution in the Cog can operate on a move:

  address bits b31:b9 == 0, then cog
  address bits b31:b10 == 0 and not cog address, then lut
  not cog and not lut then hub.

So the verilog internally can deduce that mov #$100, #$408 would be hub to cog move and execute accordingly.
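
The address test can be written down as a small Python sketch. The names classify and describe_move are invented here, and the LUT limit is an assumption: the sketch treats $000..$1FF as cog and $200..$3FF as LUT, which is the split under which the $408 in the example above lands in hub.

```python
COG_BITS = 9    # b31:b9 == 0  -> cog register ($000..$1FF)
LUT_BITS = 10   # assumed: LUT occupies $200..$3FF


def classify(addr):
    """Return which memory an address would resolve to under the rules above."""
    if addr >> COG_BITS == 0:
        return "cog"
    if addr >> LUT_BITS == 0:
        return "lut"
    return "hub"


def describe_move(dst, src):
    """Describe what kind of transfer 'mov dst, src' would imply."""
    return f"{classify(src)} to {classify(dst)}"


# The example from the post: mov #$100, #$408
kind = describe_move(0x100, 0x408)
```

Under these assumptions the example resolves to a hub to cog move, and an address such as $200 would be treated as LUT.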

So what do we gain from this:

  Four flexible pointers instead of two, with double the index range.
  No additional Special Registers.
  Possibly freeing or reallocating instruction opcodes for RDLONG etc externally

What do we lose:

  Required to place conditional execution except for most basic Z and C in a block.

Logic wise:

  An additional counter for the FLAGS / COND execution.
  An additional indexing logic / circuit per cog.
  Additional logic to prohibit interrupts whilst the conditional block is executing.

Also, can this be done without slowing the clock speed?

evanh · 2015-11-04 11:50

Ugh! I know I can't talk but ... New topic please!

Electrodude · 2015-11-04 18:06

evanh wrote: »

Electrodude,
Why don't you think your suggestion will work for a RDLONG?

Reposting code for reference:

        ALTS    myptr, #%1SUPNNNNN
        RDLONG  x,     0-0

myptr   res 1

That would not act like PTRx. It would instead make RDLONG get the address from a different register each time.

Using "RDLONG x, #0-0" instead wouldn't work either (when address & $100 <> 0), because then you would be doing indirect PTRx configuration or something silly that wouldn't make any sense (quadratically increasing addresses?).

evanh · 2015-11-06 08:09

Oh gee, it's a mind bender. Bugger, all those ideas are out the door.
