When I code for hubexec, I use an ORGH $xxxx statement.
My code is then compiled for hub. All DJxx/JMP/CALLx instructions will use relative addressing where possible.
I presume that when a DJxx/JMP/CALLx target is out of relative range, the direct hub address will be used instead.
Everything seems fine so far.
Has anyone thought about what happens when we want the same code to execute in both hubexec and cogexec/lutexec modes? In that case we cannot let the compiler use direct mode.
I have found the ## syntax, which automatically inserts the additional AUGS or AUGD instruction, extremely useful. Lots of the extra instructions support the D/# variant.
That DJNZ x,##y is another nifty use.
Yep, a great feature.
Pnut can even do a double AUG insertion, like:
wrlong ##$1234_5678,##$3_ffb0
An immediate 32-bit value written to an immediate hub address, very cool!
I would have thought the AUGD gets applied to the AUGS, which would be a mess. There must be some special case detection when they get paired ... Has Chip talked about this before?
EDIT: Maybe it's a bit like the SETQ instruction, i.e. internal registers for S and D that are preloaded by the AUGx instructions and not used until a non-AUGx instruction is executed. That would mean any number of AUGs can be sequenced... ?
They load up internal registers which get combined at the next "normal" instruction execution, which also resets those registers. There are S and D registers for AUGS and AUGD.
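To make the mechanism described above concrete, here is a rough Python model of it. This is only a sketch: the class `AugState` and the helper names are made up for illustration, the 23-bit/9-bit split is my understanding of how AUGS/AUGD extend a 9-bit immediate, and the order in which Pnut emits the two AUG instructions is an assumption.

```python
MASK32 = 0xFFFFFFFF


def split_imm32(value):
    """Split a 32-bit immediate into (upper 23 bits, lower 9 bits)."""
    value &= MASK32
    return value >> 9, value & 0x1FF


class AugState:
    """Hypothetical model of the pending AUGS/AUGD registers in a cog."""

    def __init__(self):
        self.pending_s = None
        self.pending_d = None

    def augs(self, upper23):
        # AUGS only stores its value; nothing else happens yet.
        self.pending_s = upper23 & 0x7FFFFF

    def augd(self, upper23):
        # AUGD has its own register, so it does not disturb a pending AUGS.
        self.pending_d = upper23 & 0x7FFFFF

    def execute(self, d_imm9, s_imm9):
        """Resolve the immediates of the next normal instruction, then reset."""
        d = d_imm9 & 0x1FF
        s = s_imm9 & 0x1FF
        if self.pending_d is not None:
            d |= self.pending_d << 9
        if self.pending_s is not None:
            s |= self.pending_s << 9
        self.pending_d = None
        self.pending_s = None
        return d, s


def run_wrlong_example():
    """Model wrlong ##$1234_5678,##$3_ffb0 as AUGS + AUGD + WRLONG."""
    d_hi, d_lo = split_imm32(0x12345678)
    s_hi, s_lo = split_imm32(0x3FFB0)
    cog = AugState()
    cog.augs(s_hi)  # assumed order; swapping the two AUGs gives the same result
    cog.augd(d_hi)
    first = cog.execute(d_lo, s_lo)   # both immediates are extended to 32 bits
    second = cog.execute(d_lo, s_lo)  # registers were reset, so only 9 bits remain
    return first, second
```

Because the two registers are separate and are only consumed by the next non-AUGx instruction, pairing an AUGD with an AUGS does not create a conflict in this model.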
There are also ALTS, ALTD, ALTI and ALTR instructions, which can vary the following instruction's S, D, I (instruction) and R (result) addresses.
******* means the slot is decoded further on a subsequent line.
For example, the ******* in the SETNIB/GETNIB/ROLNIB/******* row is split on the next line into SETBYTE/GETBYTE. Therefore:
ff=00 = SETNIB
ff=01 = GETNIB
ff=10 = ROLNIB
ff=11 = SETBYTE if instr[0]=0, GETBYTE if instr[0]=1
<empty> means the opcode is available, i.e. not used.
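The decoding above can be written out as a small lookup. This is only an illustration of the table in this post; the function name is hypothetical and the field names `ff` and `instr[0]` are taken as given.

```python
def decode_nib_byte_group(ff, instr_bit0):
    """Return the mnemonic for the 2-bit ff field and instruction bit 0."""
    names = {0b00: "SETNIB", 0b01: "GETNIB", 0b10: "ROLNIB"}
    if ff in names:
        return names[ff]
    if ff == 0b11:
        # The ******* slot: decoded further using instr[0].
        return "GETBYTE" if instr_bit0 & 1 else "SETBYTE"
    raise ValueError("ff is a 2-bit field")
```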
Ok, I found this in the old P2-Hot stuff.
Looks like it should be the same in the new P2?
PIXEL MIXER
-----------
Each cog has a pixel mixer called MIX that can combine two pixels in a sum-of-products
operation, where:
inputs:
DA = D pixel A component (8 bits)
DR = D pixel R component (8 bits)
DG = D pixel G component (8 bits)
DB = D pixel B component (8 bits)
SA = S pixel A component or GETPIX A' component (8 bits)
SR = S pixel R component or GETPIX R' component (8 bits)
SG = S pixel G component or GETPIX G' component (8 bits)
SB = S pixel B component or GETPIX B' component (8 bits)
outputs:
A' = ((DA * DAX + SA * SAX + 255) / 256) max 255
R' = ((DR * DRX + SR * SRX + 255) / 256) max 255
G' = ((DG * DGX + SG * SGX + 255) / 256) max 255
B' = ((DB * DBX + SB * SBX + 255) / 256) max 255
The DAX/DRX/DGX/DBX/SAX/SRX/SGX/SBX terms determine the type of mixing that will be done.
The terms are configurable for the MIXPIX/GETPIX instructions, but fixed for the others:
ADDPIX D,S/# - Add and clamp A:R:G:B components into D
DAX = $FF SAX = $FF
DRX = $FF SRX = $FF
DGX = $FF SGX = $FF
DBX = $FF SBX = $FF
MULPIX D,S/# - Multiply A:R:G:B components into D
DAX = SA SAX = $00
DRX = SR SRX = $00
DGX = SG SGX = $00
DBX = SB SBX = $00
BLNPIX D,S/# - Blend A:R:G:B components by SA into D
DAX = !SA SAX = SA
DRX = !SA SRX = SA
DGX = !SA SGX = SA
DBX = !SA SBX = SA
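As a sanity check on the three fixed-term instructions above, here is a small Python model of the sum-of-products formula. It is a sketch only: it assumes long mode (8:8:8:8 with A in the top byte), integer division for "/ 256", and that !SA is the 8-bit complement of SA. The helper names are made up.

```python
def _split(pixel):
    """Return the [A, R, G, B] bytes of a 32-bit pixel."""
    return [(pixel >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def _join(channels):
    a, r, g, b = channels
    return (a << 24) | (r << 16) | (g << 8) | b


def mix(d, dx, s, sx):
    """One channel of the mixer: ((d*dx + s*sx + 255) / 256) max 255."""
    return min((d * dx + s * sx + 255) // 256, 255)


def addpix(d_pix, s_pix):
    # All terms are $FF, which works out to a saturating add per channel.
    return _join([mix(d, 0xFF, s, 0xFF)
                  for d, s in zip(_split(d_pix), _split(s_pix))])


def mulpix(d_pix, s_pix):
    # D term is the matching S component, S term is $00.
    return _join([mix(d, s, s, 0x00)
                  for d, s in zip(_split(d_pix), _split(s_pix))])


def blnpix(d_pix, s_pix):
    # Every channel is weighted by S alpha: !SA for D, SA for S.
    sa = (s_pix >> 24) & 0xFF
    return _join([mix(d, sa ^ 0xFF, s, sa)
                  for d, s in zip(_split(d_pix), _split(s_pix))])
```

With these terms, `blnpix` returns the S pixel when SA is $FF and the D pixel when SA is $00, which is what you would expect from an alpha blend.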
Here is the general-purpose MIXPIX instruction:
MIXPIX D,S/# - Mix A:R:G:B components according to SETMIX into D
To configure for MIXPIX/GETPIX usage, the SETMIX instruction is used:
SETMIX D/#,S/# - Set MIX configuration to D/#[8..0], S/#[31..0]
D/#[8..0] sets M - initialized to $001 *
S/#[31..24] sets DAB - initialized to $00
S/#[23..16] sets DCB - initialized to $00
S/#[15..8] sets SAB - initialized to $FF *
S/#[7..0] sets SCB - initialized to $00
M[8] = 0 for long mode, where D and S pixels are 8:8:8:8 bit A:R:G:B
M[8] = 1 for word mode, where D and S pixels are 1:5:5:5 bit A:R:G:B
1:5:5:5 pixels are expanded so that %A_BCDEF_GHIJK_LMNOP becomes
%AAAAAAAA_BCDEFBCD_GHIJKGHI_LMNOPLMN for the mixing computation.
When being packed back down to 1:5:5:5 bit A:R:G:B, the single A
bit will be 1 if the resultant A was not 0, and the R:G:B fields
will be set to the top 5 bits of the resultant R:G:B.
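The 1:5:5:5 expansion and repacking described above can be sketched in Python as follows. The function names are hypothetical, and the word-swapping behaviour of word mode is not modelled here.

```python
def _expand5(v5):
    """%BCDEF -> %BCDEFBCD: repeat the top 3 bits below the 5-bit value."""
    return ((v5 << 3) | (v5 >> 2)) & 0xFF


def expand_1555(word):
    """Expand a 1:5:5:5 A:R:G:B word into an 8:8:8:8 pixel."""
    a = 0xFF if (word >> 15) & 1 else 0x00
    r = _expand5((word >> 10) & 0x1F)
    g = _expand5((word >> 5) & 0x1F)
    b = _expand5(word & 0x1F)
    return (a << 24) | (r << 16) | (g << 8) | b


def pack_1555(pixel):
    """Pack an 8:8:8:8 pixel back down to 1:5:5:5."""
    a = 1 if (pixel >> 24) & 0xFF else 0  # A bit is set if A was not 0
    r = (pixel >> 19) & 0x1F              # top 5 bits of R
    g = (pixel >> 11) & 0x1F              # top 5 bits of G
    b = (pixel >> 3) & 0x1F               # top 5 bits of B
    return (a << 15) | (r << 10) | (g << 5) | b
```

Expanding and then packing an unmixed pixel gives back the original word, so the round trip is lossless when no mixing happens in between.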
In word mode, the low word in D will be operated on and the words
will be swapped, leaving the mixed pixel in the new high word and
the old high word in the new low word. Also, pixel data from S
will be taken alternately from the low and high word with each
operation, with SETMIX resetting the selector to the low word.
Word mode affects all ADDPIX/MULPIX/BLNPIX/MIXPIX/GETPIX.
M field          000   001   010   011   100   101   110   111
--------------------------------------------------------------
M[7]     DAX =   DAB   SA
M[6..4]  DRX =   $00   $FF   SA    !SA   DA    !DA   DCB   SR
M[6..4]  DGX =   $00   $FF   SA    !SA   DA    !DA   DCB   SG
M[6..4]  DBX =   $00   $FF   SA    !SA   DA    !DA   DCB   SB
--------------------------------------------------------------
M[3]     SAX =   SAB   DA
M[2..0]  SRX =   $00   $FF   SA    !SA   DA    !DA   SCB   DR
M[2..0]  SGX =   $00   $FF   SA    !SA   DA    !DA   SCB   DG
M[2..0]  SBX =   $00   $FF   SA    !SA   DA    !DA   SCB   DB
* M and SAB are initialized on cog start so that GETPIX will return the
scaled A:R:G:B texture pixel without any blending.
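Here is how I read the M field table, as a Python sketch of MIXPIX in long mode only (M[8] = 0). The function and parameter names are made up, the single-bit fields M[7] and M[3] are taken as 0 = first entry and 1 = second entry, and "/ 256" is assumed to be integer division.

```python
def _split(pixel):
    """Return the [A, R, G, B] bytes of a 32-bit pixel."""
    return [(pixel >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def _mix(d, dx, s, sx):
    return min((d * dx + s * sx + 255) // 256, 255)


def mixpix(d_pix, s_pix, m=0x001, dab=0x00, dcb=0x00, sab=0xFF, scb=0x00):
    """Mix two 8:8:8:8 pixels using SETMIX-style settings (defaults = cog start)."""
    da, dr, dg, db = _split(d_pix)
    sa, sr, sg, sb = _split(s_pix)

    # Alpha terms are picked by single bits.
    dax = sa if (m >> 7) & 1 else dab
    sax = da if (m >> 3) & 1 else sab

    def colour_term(selector, const_byte, other_component):
        # Column order follows the table: 000 .. 111.
        return [0x00, 0xFF, sa, sa ^ 0xFF, da, da ^ 0xFF,
                const_byte, other_component][selector]

    d_sel = (m >> 4) & 7
    s_sel = m & 7
    out = [_mix(da, dax, sa, sax)]
    for dc, sc in ((dr, sr), (dg, sg), (db, sb)):
        dx = colour_term(d_sel, dcb, sc)  # 111 selects the matching S component
        sx = colour_term(s_sel, scb, dc)  # 111 selects the matching D component
        out.append(_mix(dc, dx, sc, sx))
    a, r, g, b = out
    return (a << 24) | (r << 16) | (g << 8) | b
```

With the cog-start defaults (M = $001, SAB = $FF) the D terms are all $00 and the S terms are all $FF, so the result is just the S pixel, which matches the footnote about GETPIX returning the texture pixel without blending. Setting M = $011 with DAB = SAB = $FF reproduces the ADDPIX terms.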
The ADDPIX/MULPIX/BLNPIX/MIXPIX instructions all take 2 clocks, while GETPIX
takes 3 clocks.
The ALTx instructions modify only the next instruction, whereas the SETx instructions modify the cog RAM contents.
The SETx instructions also require 2 spacer instructions.
Ahh! That makes sense.
How about SLMBYTE (Shift Left, Move)?
Shift D left by 1 byte, inserting byte n of S as the new byte 0.
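To pin down what I mean by the SLMBYTE suggestion, here is a Python sketch of the intended result. This is a proposal, not an existing instruction, and the function name and byte numbering (byte 0 = least significant) are my assumptions.

```python
def slmbyte(d, s, n):
    """Shift D left one byte and insert byte n of S as the new byte 0."""
    if not 0 <= n <= 3:
        raise ValueError("n selects one of the 4 bytes in S")
    new_byte = (s >> (8 * n)) & 0xFF
    return ((d << 8) | new_byte) & 0xFFFFFFFF
```

For example, `slmbyte(0x11223344, 0xAABBCCDD, 2)` drops the old top byte $11 and brings in $BB at the bottom.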
I suspect that DJNZ can be augmented with AUGS to extend its range too.
Edit: Pnut allows DJNZ to use "##" to extend its range.
Maybe an improved description is in order.
Below are some further instruction opcode suggestions (includes those from above again)... The result is in the following post.
What's the diff between ******* and <empty>?
The second group (SETNIB...QVECTOR) could be re-ordered to something like the following. Whether it is worth doing is questionable. The remaining sections seem fine.
Any chance of getting some documentation on the following instructions?
I tried a few experiments to work out how they behave, and this is what I have found so far.
ADDPIX seems to add the pixel bytes together; if the 8-bit add overflows, the resultant pixel byte is set to $FF.
MULPIX appears to multiply the pixel bytes and store the scaled pixel bytes.
MIXPIX and BLNPIX seem to be affected by the contents of the SETPIX value?
Any other pixel freaks out there played with these yet?