XORO32 scrambler output

Chip,
A couple of times now I've wanted to feed another operation directly with a random number. But in both cases it has been a D operand so XORO32 couldn't do it without an intermediate MOV.

A variation of XORO32 that feeds the D port instead of the S port would be nice to have.

"There's no huge amount of massive material
hidden in the rings that we can't see,
the rings are almost pure ice."

Comments

  • evanh wrote:
    Chip,
    A couple of times now I've wanted to feed another operation directly with a random number. But in both cases it has been a D operand so XORO32 couldn't do it without an intermediate MOV.

    A variation of XORO32 that feeds the D port instead of the S port would be nice to have.

    Okay. Let me look into it....

    It could be done. Could you give me an example of how you would use it?
  • WRLONG was the previous one, I think. But just now I wanted to use SETDACS, albeit only for diagnostics. Both take the value to be written from D (WRLONG writes D to the hub address given by S), so the random number has to arrive in D.
  • evanh
    edited 2019-01-13 - 10:18:26
    With something like ADD it gets really interesting, because it becomes a three-operand arrangement: the injected value replaces the D input, while the ALU result is still written to the D address specified in the ADD instruction.
  • Heh, oh, there's something to ponder for the Prop3 architecture: have a bit field in all opcodes just for specifying whether the ALU result goes to the specified D or to the next instruction's D input.
  • cgracey
    edited 2019-01-13 - 10:36:14
    evanh wrote:
    Heh, oh, there's something to ponder for Prop3 architecture. Have a bit-field in all opcodes just for specifying if the ALU result goes to specified D or to next instruction's D input.

    That's a really neat idea.

    As far as having XORO32 report to D, can you give me a more compelling case? It takes more logic, so it needs to be worth it. XORO32 is an oddball instruction in how it works and it's really nice that it reports to one of D or S. I think S is way more useful if you could pick only D or S, and the S circuit exploits the same path that SCA/SCAS use. An alternate D path would take a whole new set of circuitry.
  • evanh
    edited 2019-01-13 - 10:44:08
    You know what, SCA could probably work better by outputting to the next instruction's D input too. Maybe eliminate the S versions of both.
  • evanh wrote:
    You know what, SCA can probably work better outputting to next D input itself. Maybe eliminate the S versions of both.

    But, then you couldn't do a multiply-accumulate:

    SCA X,Y
    ADD A,B

    The multiply result would get added to B and then written to A.

    The way it works now, B is ignored and the multiply result gets added into A.

    Am I missing something?
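    To make the S-versus-D question concrete, here is a toy Python model of an ADD whose preceding SCA injects its result into one of the ADD's input ports. It is only an illustration of the operand routing being discussed, not P2 hardware behaviour or Parallax tooling; the function name `add_after_injection` and the register dictionary are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_after_injection(regs, d, s, injected, port):
    """Toy model of ADD D,S when the previous instruction injects a value.

    port == "S": the injected value replaces the S operand (register S is ignored).
    port == "D": the injected value replaces the D *input*, but the sum is
                 still written to register D (a three-operand ADD).
    Returns a new register dict; the input dict is left unchanged.
    """
    d_in = injected if port == "D" else regs[d]
    s_in = injected if port == "S" else regs[s]
    out = dict(regs)
    out[d] = (d_in + s_in) & MASK32
    return out


if __name__ == "__main__":
    regs = {"A": 10, "B": 3}
    product = 100  # stands in for the SCA result
    print(add_after_injection(regs, "A", "B", product, "S"))  # A = A + product
    print(add_after_injection(regs, "A", "B", product, "D"))  # A = product + B
    print(add_after_injection(regs, "A", "A", product, "D"))  # ADD A,A: A = product + A
```

    In this model, ADD A,A under D injection accumulates exactly like ADD A,B does under S injection, while ADD A,B under D injection becomes a genuine three-operand add.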
  • TonyB_
    edited 2019-01-14 - 00:31:26
    XORO32 is a compound instruction that takes four cycles if both state and PRN are to be updated:
    	XORO32	state
    	ANYDS	PRN,0-0
    
    where ANYDS is any D,S instruction (not just a MOV) with S replaced by the XORO32 PRN output.

    It might be possible to skip PRNs as follows:
    	XORO32	state
    	XORO32	state		' PRN in S field ignored?
    	ANYDS	PRN,0-0
    

    One of the C or Z opcode bits in XORO32 could be used to specify which of S or D is replaced by the PRN in the next instruction, but is it really worth it?
    Formerly known as TonyB
  • cgracey wrote:
    SCA X,Y
    ADD A,B

    The multiply result would get added to B and then written to A.

    The way it works now, B is ignored and the multiply result gets added into A.

    "B is ignored" is the key to it working just as well for A too: ADD A,A is a perfectly fine solution there, and could even be considered tidier looking.
  • evanh
    edited 2019-01-13 - 20:32:10
    TonyB_ wrote:
    	XORO32	state
    	XORO32	state		' PRN in S field ignored?
    	ANYDS	PRN,0-0
    
    Correct, the first output is discarded. I have used that to "jump" ahead in the sequence when testing.
  • evanh wrote:
    cgracey wrote:
    SCA X,Y
    ADD A,B

    The multiply result would get added to B and then written to A.

    The way it works now, B is ignored and the multiply result gets added into A.
    "B is ignored" is the key to it working just as well for A too. ADD A,A is perfectly fine solution there. And could be considered tidier looking even.

    Wouldn't it stay as ADD A,B?

    Replacing the S field has at least two drawbacks:

    1. D must be set to something beforehand for many D,S instructions, probably most, requiring an extra instruction.
    2. An immediate or register constant cannot be used, again adding an extra instruction for these cases.

    It's not clear-cut that replacing S is better than replacing D. In fact, replacing D might be the best option overall, assuming that suppressing the D read while still writing the result to D is no more complicated than suppressing the S read.

    GET/SET/ROLxxxx wouldn't work as they do now, however.
  • Yeah, that's the reasoning behind overriding the next instruction's D input instead of its S input.

    As for why I said "ADD A,A": SCA is intended for a multiply-accumulate function, and the ADD is the accumulate part. For that, the register being accumulated into has to be both the input and the output. D would normally provide exactly that. But if SCA overrides the D input while leaving the D output intact, then the S input can be pointed at the same register to achieve an accumulator.

    ALTxx + GET/SET/ROLxxxx are their own case. Same for all ALTx instructions.
  • I'm confused. What's the consensus here?
  • My interpretation is that Tony was running over the pros and cons of overriding D inputs vs S inputs and concluded that D always seemed best as a generalisation. The one exception is the example above where one wanted to rapidly iterate XORO32 on its own.

    SCA, since it'll create the three-operand arrangement, might even gain a new ability by changing to overriding the next instruction's D input.
  • evanh
    edited 2019-01-14 - 21:53:17
    I suppose that's the one advantage overriding the S input does have: overriding the D input destroys the fastest case of self-recursion.
    EDIT: Correction, overriding the D input destroys the fastest case of self-recursion only when it's a single-operand instruction like XORO32.

    EDIT2: Hmm, I was just trying to picture recursing ADD with an overridden D, but that has no advantage over a regular ADD; it's too simple an instruction. XORO32 is unique in this way, I think.
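    The self-recursion point can be shown with a toy model of two back-to-back XORO32 instructions, where the first one's PRN is injected into either the S or the D input of the second. `toy_step` is a hypothetical stand-in (a 32-bit LCG with a rotated output), not the real XORO32 generator; only the operand routing is being illustrated.

```python
MASK32 = 0xFFFFFFFF


def toy_step(state):
    """Hypothetical stand-in for one XORO32 execution.

    Returns (next_state, prn). The state update is a 32-bit LCG and the PRN is
    the new state rotated by 16 bits. This is NOT the real XORO32 generator;
    it only needs the PRN to differ from the state.
    """
    new_state = (state * 1664525 + 1013904223) & MASK32
    prn = ((new_state << 16) | (new_state >> 16)) & MASK32
    return new_state, prn


def xoro32_twice(state, port):
    """Model 'XORO32 state' executed twice back-to-back.

    The first XORO32 injects its PRN into the second instruction:
      port == "S": XORO32 doesn't use its S field, so the PRN is dropped and
                   the state register is simply iterated twice.
      port == "D": the PRN replaces the second XORO32's D input, i.e. the
                   state it iterates, so the self-recursion breaks.
    Returns (final state register, PRN from the second XORO32).
    """
    state, prn1 = toy_step(state)
    d_in = prn1 if port == "D" else state
    return toy_step(d_in)
```

    With S injection the pair behaves like two plain iterations (skipping one PRN); with D injection the second XORO32 iterates the PRN instead of the state.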
  • In the Prop2:
    MOV  pa,raw
    ADD  pa,offset
    MUL  pa,scale
    CALL  #graph
    
    In the Prop3, it becomes:
    ADD  raw,offset   WQ
    MUL  pa,scale
    CALL  #graph
    
    :D That is cool! Covers the very reason why three-operand instructions ever existed while also eliminating any need for superscalar "move elimination".
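    A toy Python model of the proposed Prop3 bit, checking that the two sequences above leave the same value in pa. The WQ flag is the hypothetical feature from this thread, the tuple format and `run` function are made up for the illustration, CALL #graph is omitted, and MUL is simplified to a 32-bit product (the real P2 MUL multiplies the low 16 bits of each operand).

```python
MASK32 = 0xFFFFFFFF


def alu(op, d_in, s_in):
    """Tiny ALU for the ops in the example, with 32-bit wrap-around.
    MUL is simplified to a 32-bit product (the real P2 MUL is 16x16)."""
    if op == "MOV":
        return s_in  # MOV ignores its D input
    if op == "ADD":
        return (d_in + s_in) & MASK32
    if op == "MUL":
        return (d_in * s_in) & MASK32
    raise ValueError(f"unsupported op {op!r}")


def run(program, regs):
    """Run (op, d, s, wq) tuples over a copy of `regs`.

    wq=False: the result is written to register d, as on the P2.
    wq=True:  the result is NOT written; it replaces the next instruction's
              D input instead (the hypothetical Prop3 WQ bit).
    """
    regs = dict(regs)
    pending = None
    for op, d, s, wq in program:
        d_in = regs[d] if pending is None else pending
        pending = None
        result = alu(op, d_in, regs[s])
        if wq:
            pending = result
        else:
            regs[d] = result
    return regs


prop2 = [("MOV", "pa", "raw", False),
         ("ADD", "pa", "offset", False),
         ("MUL", "pa", "scale", False)]
prop3 = [("ADD", "raw", "offset", True),
         ("MUL", "pa", "scale", False)]
start = {"pa": 0, "raw": 5, "offset": 2, "scale": 3}
```

    Both programs compute pa = (raw + offset) * scale and leave raw untouched; the WQ version simply needs no MOV.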
  • TonyB_
    edited 2019-01-15 - 00:48:13
    cgracey wrote:
    I'm confused. What's the consensus here?

    My change of mind has probably caused some of the confusion; apologies if so.
    evanh wrote:
    My interpretation is Tony was running over the pro's and con's of overriding D inputs vs S inputs and concluded that D always seemed best as a generalisation. The one exception is that example above where one wanted to rapidly iterate XORO32 on its own.

    SCA, since it'll create the three operand arrangement, might even gain a new ability by changing to overriding the next instruction's D input.

    Skipping PRNs by following one XORO32 with another was not intended as a serious practical suggestion.

    I now think that P2 rev B would be improved if the SCA/SCAS/XORO32 instructions were changed so that the output is used as the next instruction's D value.

    SCA/SCAS would work just as well as now (better actually) and XORO32 would benefit even more. Code would be smaller and faster for most cases. No need for both S and D options, just the latter.

    If it's too late or D is too difficult, we could live with S.
  • May I suggest that it would be a really, really good idea to minimize changes in the instruction set. We're trying to get tools developed and used now, and instruction set changes are just going to screw up any libraries that users develop on the P2 eval board. Not to mention that changing the tools, and having them support both revisions of silicon, gets more difficult the more changes are made.

    At some point the P2 needs to be labelled as "done"!!
  • Everyone is sensitive to that concern Eric.

    I wouldn't have raised the topic if I thought it would have a significant impact on libraries. Or any impact for that matter. I'd be surprised if anyone has built a library around XORO32 or SCA yet.
  • For the SCA+ADD case, if you always write `ADD A, A` in code intended for the P2-ES, which does it the S way, your code would still work on the next silicon revision if it were changed to do it the D way.
  • For the SCA+ADD case, if you always write `ADD A, A` in code intended for the P2-ES, which does it the S way, your code would still work on the next silicon revision if it were changed to do it the D way.

    Yes and the D way would give you the extra functionality of ADD A,B. SCA/SCAS could have identical code for P2 revs A and B, but that would not apply to XORO32, e.g. writing the PRN to cog RAM:
    	XORO32	state		' PRN = S in next instruction
    	MOV	PRN,0
    ' or
    	XORO32	state		' PRN = D in next instruction
    	ADD	PRN,#0
    

    However, the D way would allow writing the PRN to LUT/hub RAM or to DACs, or bit testing it, with the minimum possible code, and XORO32 would often be only a net two-cycle instruction.

    I'm a bit sceptical that a future P3 would cater for both S and D, so the decision made now is likely to be permanent.
  • Electrodude
    edited 2019-02-11 - 16:48:03
    Can single-operand assembler aliases for `MOV x, x` (to receive the result of XORO32) and `ADD x, x` (to receive the result of SCA) be added to encourage people to write forward-compatible code in case this change is ever made?
  • TonyB_
    edited 2019-02-11 - 18:51:19
    Can single-operand assembler aliases for `MOV x, x` (to receive the result of XORO32) and `ADD x, x` (to receive the result of SCA) be added to encourage people to write forward-compatible code in case this change is ever made?

    But would MOV x,x actually work as intended after XORO32 if the PRN is in the D field?
  • Good point, Tony. MOV doesn't use the D operand as an input.
    ADD dest, #0 would do the equivalent job. Of course, this scrappy-looking code line makes Electrodude's point more urgent.
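    Why MOV x,x cannot capture a D-injected PRN, shown with a toy model of one instruction executing right after XORO32. This is an illustration of the routing discussed here, not P2 hardware; `execute` and its arguments are hypothetical, and `s_value` stands for whatever the S field would normally supply.

```python
MASK32 = 0xFFFFFFFF


def execute(op, regs, d, s_value, injected, port):
    """Toy model: execute OP D,S right after an instruction (e.g. XORO32)
    that injects `injected` into this instruction's S or D input port.
    MOV ignores its D input; ADD reads both inputs."""
    d_in = injected if port == "D" else regs[d]
    s_in = injected if port == "S" else s_value
    if op == "MOV":
        result = s_in
    elif op == "ADD":
        result = (d_in + s_in) & MASK32
    else:
        raise ValueError(f"unsupported op {op!r}")
    out = dict(regs)
    out[d] = result
    return out
```

    In this model MOV captures the PRN only with S injection and ADD x,#0 only with D injection, so no single instruction gives the true PRN both ways, which matches the observation in the following reply.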
  • You're right. On thinking about it more, all I could come up with that would work both ways for XORO32 are tricks equivalent to your `ADD dest, #0` that, while they are sufficiently random, aren't what XORO32 is supposed to output.

    If it's OK for it to compile differently for different chips, the alias could translate to `MOV x, #0` on the current chip and `ADD x, #0` if it's changed.
  • Yeah, it's not really an alias, but, like the AUGx instructions, a difference in the mnemonic's parameters lets the assembler reassemble it into different combinations. Existing code can stay as is, e.g.:
    		xoro32	state, result
    
    'with S port insertion, could reassemble to
    		xoro32	state
    		mov	result, 0-0
    
  • And:
    		xoro32	state, result
    
    'with D port insertion, could reassemble to
    		xoro32	state
    		add	result, #0
    
  • evanh
    edited 2019-02-11 - 23:54:51
    I suppose that is still an alias, just one composed of multiple instructions. Make it user-definable and it becomes a macro. :)
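    A sketch of that macro-style expansion as an assembler might do it, assuming the hypothetical two-operand `xoro32 state, result` syntax proposed above. The function `expand_xoro32` and its tuple output are made up for illustration and are not part of any existing P2 assembler.

```python
def expand_xoro32(operands, port):
    """Hypothetical assembler expansion of 'xoro32 state[, result]'.

    A one-operand xoro32 is passed through unchanged. The two-operand form is
    rewritten into the real XORO32 followed by an instruction that captures
    the PRN in `result`, depending on which port XORO32 injects into.
    """
    if len(operands) == 1:
        return [("xoro32", operands[0])]
    state, result = operands
    if port == "S":
        capture = ("mov", f"{result}, 0-0")  # PRN replaces S; MOV writes it to D
    elif port == "D":
        capture = ("add", f"{result}, #0")  # PRN replaces the D input; 0 is added
    else:
        raise ValueError(f"unknown port {port!r}")
    return [("xoro32", state), capture]
```

    Source written with the two-operand form would then assemble correctly for either silicon revision, which is the forward-compatibility goal raised earlier.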
  • As pointed out by Seairth here, GETXACC would also be affected by changing from S to D, but it would act in a similar way to XORO32. Putting Goertzel X into D and Y into the next D has a nice symmetry, and overall D is better anyway, I think.