SFUNC question
TonyB
in Propeller 2
Why does SFUNC exist as a separate instruction?
Currently there are only eight functions, and the proposed 32-bit version of xoroshiro128+ will add another, but that still leaves over 500 potential functions unused.
The opcode for SFUNC is in effect
EEEE 1001111 111 DDDDDDDDD 00000FFFF
where FFFF specifies the function.
(Is it 111 or 11I? Documentation is contradictory.)
Meanwhile the opcode
EEEE 1101011 ... ......... .........
for instructions with no source (and sometimes no destination) has room for over 400 more, and S[8:7] = 00 always.
Could SFUNC be moved to
EEEE 1101011 00L DDDDDDDDD 01000FFFF ?
(C and Z could be modified if they have any meaning.)
If so this would free up
EEEE 1001111 11I DDDDDDDDD SSSSSSSSS
for a final, last-minute instruction that really needs both D & S, e.g. a MASKNIB as described below.
Nibble-wide video (4bpp) is ideally suited to 32-bit data as one long holds eight pixels but manipulating nibbles can be tedious and slow when the minimum memory width is a byte. Unless I've misunderstood, the current nibble instructions can change only one nibble in D at a time.
I think it would be great if a mask instruction MASKNIB could change each of the eight nibbles in D in one go, as follows:
DDDD := DDDD if SSSS = 0000
DDDD := SSSS if SSSS > 0000
In other words a zero nibble is transparent which is exactly how the TI VDP and its variants in 16-colour mode worked: 0000 = transparent, 0001 = black, ... , 1111 = white. (Colours could be changed in later chips with a palette.)
MASKNIB would make it easy to have loads of sprites or cursors, pointers, grids, rulers, etc. As a logical ALU operation it would not be out of place.
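To make the proposed rule concrete, here is a minimal software model of MASKNIB in Python. It is only a sketch of the behaviour described above (a zero nibble in S is transparent, any other nibble replaces the matching nibble in D); the function name `masknib` and the sample values are made up for illustration and say nothing about how the hardware would implement it.

```python
MASK32 = 0xFFFFFFFF


def masknib(d: int, s: int) -> int:
    """Model of the proposed MASKNIB: for each of the eight nibbles,
    keep D's nibble if S's nibble is 0000, otherwise take S's nibble."""
    result = d & MASK32
    for i in range(8):
        shift = 4 * i
        nib = (s >> shift) & 0xF
        if nib != 0:
            # Clear the nibble in D, then drop in the nibble from S.
            result = (result & ~(0xF << shift)) | (nib << shift)
    return result & MASK32


# Hypothetical example: eight background pixels of colour 1,
# overlaid with a sprite long that has three non-transparent pixels.
background = 0x11111111
sprite = 0x00F0A005
print(hex(masknib(background, sprite)))
```

In software this takes a loop (or a chain of test/mux operations) per long, which is the cost a single-instruction MASKNIB would remove.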
Comments
You tool developers... would it be a hardship for you if this were done?
You are talking about moving the binary encoding of SFUNC and adding a MASKNIB opcode in the space released?
That seems OK: it adds one more word/encoding and changes the encoding of one not-often-used opcode.
The main issue is keeping things in phase, but MASKNIB is new and SFUNC sounds rarely used, so even a skew is unlikely to break much?
Have you given any consideration to making your 640x480 driver (that's in the works) interoperable with 800x480 displays? In particular, perhaps there could be a switch variable that could insert an extra pixel (duplicate an adjacent pixel) every 4 pixels to stretch what was otherwise 640 horizontal pixels to 800. As we know, many small format displays these days are WVGA with 800 native horizontal pixels per line. VGA driver chips/boards for these displays typically have the ability to stretch 640 horizontal pixels to 800, but, as far as I know, one has to use (or connect) a small keypad to such displays to toggle between stretched (640 px) and non-stretched (800 px) modes. If one doesn't activate stretching manually, then such displays will have 80-pixel-wide black bands on the left and right sides. So, a software switch (programmatic) method of utilization would seem desirable, as it would allow always "driving" such panels--likely through a VGA display driver chip--at their native 800x480 resolution (of course, ultimately, the VGA driver chips use 800x480 if that's what the panel is, but I'm referring to the VGA input signals to such driver chips). Now, while it's the case that VGA driver chips typically remember the stretched setting (i.e., store it in EEPROM), it apparently takes a keypad to be able to change things (to avoid the black bands), hence the desire to control things from the software generating the display signals (to allow changes without a keypad).
What would probably be ideal in a VGA/WVGA driver would be a standard 800x480 mode, a 640x480 mode stretched to 800x480 for driving native 800x480 panels, and a standard 640x480 mode for panels with that native resolution (or configured to display it). Such a driver (or similar drivers) could help the P2 to hit the ground running.
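The stretch suggested above (one duplicated pixel after every four) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not driver code; `stretch_line` and its `enabled` switch are hypothetical names standing in for the "switch variable" mentioned in the post.

```python
def stretch_line(pixels, enabled=True):
    """Stretch a scan line by repeating every fourth pixel,
    so 4 input pixels become 5 output pixels (640 -> 800)."""
    if not enabled:
        return list(pixels)
    out = []
    for i, p in enumerate(pixels):
        out.append(p)
        if i % 4 == 3:
            # Duplicate the last pixel of each group of four.
            out.append(p)
    return out


line = list(range(640))
print(len(stretch_line(line)))          # stretched line length
print(len(stretch_line(line, False)))   # unstretched line length
```

Simple duplication like this makes every fifth output pixel a repeat, so fine vertical detail will look slightly uneven; whether that is acceptable depends on the content being shown.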
Thinking 4-bpp might be one sweet spot for VGA and XGA resolutions.
Anything that can help make that easier is nice.
I think it's very important that the transparent nibble is 0000 as per the Texas Instruments Video Display Processors. There is a lot of software out there that could run unaltered on the P2 with a Z80 emulator (already done on P1) and a VDP emulator (attempted on P1 but never completed successfully, I believe).
BTW: Would be handy to have a 2-bit version of this too... On the other hand, 2-bit graphics probably won't be games or graphic intensive stuff...
Wouldn't mind a nibble variant of this also.
Would it make sense to offer $00 in some opcodes and $FF in others to cover user needs ?
There's insufficient opcode space to allow both. I think I'll change it to $00.
About a nibble version of WMLONG, this can't happen because whole bytes are the granularity of hub RAM.
A two-bit version could be useful for line or textual graphics, where there is a background color plus three foreground colors. We'd need to get rid of an instruction, though, in the vicinity.
(a) A version of triml that sign extends instead of fills with 0. triml is nice, but you can achieve a similar result in 1 instruction by and'ing with a predefined mask. Doing a sign extend always takes 2 instructions.
(b) A bit field extract instruction, something like BFEXT D, S, which shifts D right by S[0:4] and then masks it by ((1 << S[5:8]) - 1). This would actually serve the purposes of both trimr and triml (at least for masks up to 16 bits) while being more flexible.
(c) A SETCOND instruction: if the EEEE field of the instruction is a true condition then D := S, otherwise D := 0. This one's probably tricky because it requires write back to happen even if the condition is false. Alternatively, we could do a CMPSET instruction: CMPSET D,S compares D to S and changes D to the flags that are the result of the comparison: D[31:2] to 0, D[1] to Z, and D[0] to C.
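For anyone who wants to play with (b) and (c), here is a rough Python model of the two suggestions. It is a sketch under assumptions: BFEXT is taken literally as written above (shift count in the low five bits of S, mask width in the next four), and CMPSET is assumed to use an unsigned compare with C meaning "D is below S" and Z meaning "D equals S". The names `bfext` and `cmpset` are just for illustration.

```python
MASK32 = 0xFFFFFFFF


def bfext(d: int, s: int) -> int:
    """Model of the suggested BFEXT D,S: shift D right by the low
    five bits of S, then mask with (1 << width) - 1, where width is
    the next four bits of S."""
    shift = s & 0x1F           # low five bits of S: shift count
    width = (s >> 5) & 0xF     # next four bits of S: field width
    return ((d & MASK32) >> shift) & ((1 << width) - 1)


def cmpset(d: int, s: int) -> int:
    """Model of the suggested CMPSET D,S: result bit 1 = Z, bit 0 = C,
    all other bits zero. Unsigned compare is an assumption here."""
    d &= MASK32
    s &= MASK32
    c = 1 if d < s else 0
    z = 1 if d == s else 0
    return (z << 1) | c


# Extract an 8-bit field starting at bit 4 of a sample value.
print(hex(bfext(0xABCD1234, (8 << 5) | 4)))
# Compare results for equal, below, and above.
print(cmpset(5, 5), cmpset(3, 5), cmpset(7, 5))
```

Note that a four-bit width field taken literally gives widths of 0 to 15; encoding the width as the field value plus one would be one way to reach a full 16 bits, but that is a design choice the suggestion leaves open.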
I'm not seriously suggesting that we make hardware changes at this late stage, but if you're changing anything these are the ones that come to mind. Actually the lack of sign extension options seems really odd given that triml/trimr are there. Did I miss something in the instruction documentation?
Eric
I think $00 is fine. It'd be a bit like the P1 NTSC where $00 is sync and $02 through $07 are black through white.
Applying this doubling idea to different data widths produces the family of operations on D shown below. Half the resulting bits in adjacent lines are identical so the logic should optimise quite well.
Before:
XXXXXXXX XXXXXXXX ABCDEFGH IJKLMNOP D[31:16] are don't care
After:
AABBCCDD EEFFGGHH IIJJKKLL MMNNOOPP Bit doubler BITDBL
ABABCDCD EFEFGHGH IJIJKLKL MNMNOPOP Twit doubler TWTDBL
ABCDABCD EFGHEFGH IJKLIJKL MNOPMNOP Nibble doubler NIBDBL
ABCDEFGH ABCDEFGH IJKLMNOP IJKLMNOP Byte doubler BYTDBL
ABCDEFGH IJKLMNOP ABCDEFGH IJKLMNOP Word doubler WRDDBL
I didn't intend to suggest anything else new, these doublers just appeared out of the fog last night. They might be useful when data must be output at twice their usual rate, e.g. when mixing data streams with a single fixed clock.
Unlike MASKNIB/MNIBS, I don't think they are vital but if considered worthwhile they could be added to the fairly similar SFUNCs without affecting other instructions. Bit doubling in particular is a horrible algorithm to do in software.
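The whole doubler family can be modelled with one small Python function parameterised by group width. This is only a software sketch of the table above; `double_groups` and the `DOUBLERS` dictionary are illustrative names, and the mnemonics are the ones proposed in the post.

```python
def double_groups(d: int, width: int) -> int:
    """Take the low 16 bits of d, split them into groups of `width`
    bits, and emit each group twice in place, giving 32 bits."""
    out = 0
    mask = (1 << width) - 1
    for i in range(16 // width):
        group = (d >> (i * width)) & mask
        # Each group lands at twice its original offset, repeated once.
        out |= (group | (group << width)) << (2 * i * width)
    return out


# Group widths for the proposed mnemonics.
DOUBLERS = {"BITDBL": 1, "TWTDBL": 2, "NIBDBL": 4, "BYTDBL": 8, "WRDDBL": 16}

for name, width in DOUBLERS.items():
    print(name, hex(double_groups(0xABCD, width)))
```

The width-1 case is the awkward one in real assembly, since each input bit needs its own shift or table lookup, which is why it stands out as the best candidate for hardware.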
For massive sprite manipulation, there are 16 COGS.
We have getnib. A small lookup table gets the bits doubled pretty quick.
I'm thinking that we really need a sign-extension instruction. Maybe we could get rid of TRIMR and make it a sign-extending version of TRIML.
For a long time, I've thought about a bit field extractor. I think your way is good, where by limiting the field size to 16 bits, you can fit the bit field parameters into a 9-bit immediate.
I need to think about that last one. That's kind of an interesting requirement.
It's really not a big deal, just something that would save an instruction here and there.
Thanks,
Eric
Is this a frequent-enough occurrence to make a special instruction worthwhile?
Well, we already have a TRIML instruction that zero-extends, i.e. does the equivalent of a left shift followed by a logical right shift (and a similar TRIMR that swaps the order of the shifts). I'd argue that a sign-extending instruction is more important than that, since the TRIML / TRIMR effects can be achieved with a single AND, whereas sign extension is going to take 2 instructions.
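The asymmetry being argued here can be shown with a small Python model: zero extension of the low `bits` bits is one AND with a mask, while sign extension needs a shift left followed by an arithmetic shift right. This is a sketch on 32-bit values; `zero_extend` and `sign_extend` are illustrative helper names, not P2 instructions.

```python
MASK32 = 0xFFFFFFFF


def zero_extend(value: int, bits: int) -> int:
    """Keep the low `bits` bits, clear the rest: a single AND."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Replicate bit (bits - 1) upward: shift left, then arithmetic
    shift right, i.e. two operations on a 32-bit register."""
    shift = 32 - bits
    shifted = (value << shift) & MASK32        # step 1: shift left
    if shifted & 0x80000000:
        shifted -= 1 << 32                     # treat as signed for the next step
    return (shifted >> shift) & MASK32         # step 2: arithmetic shift right


print(hex(zero_extend(0x12F4, 8)))   # low byte only
print(hex(sign_extend(0x80, 8)))     # negative byte widened to 32 bits
print(hex(sign_extend(0x7F, 8)))     # positive byte unchanged
```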
How often would you actually use TRIML to zero-extend? The only use that comes to mind is if you are working with the least-significant field of packed data in a single register. For any other field in the packed data, you'd still end up with two instructions (triml/sar or shl/sar), so that doesn't seem worthwhile.
If anything, I'd argue that TRIML/TRIMR will be infrequently used and can be handled with a pair of shifts. Get rid of them entirely and use the instruction slots for something else...
E.g., that means 32 booleans can be packed into one long, saving 31 longs for more useful work.
Somewhat hidden from the BITx group, are boolean AND OR XOR, to support clearer coding of
bool := (a < b) AND (b < c)
but I think a simple Assembler alias allows
Some of these can already be achieved using these instructions.
It seems to use only 5 bits of the 9-bit S field, so there looks to be room to use the MSB as an Atomic/Alone choice?