SFUNC question

TonyB · 2017-04-25 17:42

Why does SFUNC exist as a separate instruction?

Currently there are only eight functions, and the proposed 32-bit version of xoroshiro128+ will add another, but that still leaves over 500 potential functions unused.

The opcode for SFUNC is in effect
EEEE 1001111 111 DDDDDDDDD 00000FFFF
where FFFF specifies the function.
(Is it 111 or 11I? The documentation is contradictory.)

Meanwhile the opcode
EEEE 1101011 ... ......... .........
for instructions with no source (and sometimes no destination) has room for over 400 more, and S[8:7] is always 00.

Could SFUNC be moved to
EEEE 1101011 00L DDDDDDDDD 01000FFFF ?
(C and Z could be modified if they have any meaning.)
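To make the field boundaries concrete, here is a minimal Python sketch that packs and unpacks a 32-bit word using the layout implied by the patterns above: EEEE, a 7-bit opcode, the C/Z/I bits, 9-bit D and 9-bit S. The `encode`/`decode` helpers and the sample condition and D values are hypothetical, for illustration only, and are not taken from any real assembler.

```python
# Field widths, read left to right from the opcode patterns in the post:
# EEEE (4) | opcode (7) | C Z I (3) | D (9) | S (9) = 32 bits
FIELDS = (("cond", 4), ("op", 7), ("czi", 3), ("d", 9), ("s", 9))


def encode(**values: int) -> int:
    """Pack named fields (missing ones default to 0) into a 32-bit word."""
    word = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word


def decode(word: int) -> dict:
    """Split a 32-bit word back into its named fields."""
    fields = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


# Current SFUNC:  EEEE 1001111 111 DDDDDDDDD 00000FFFF (function number in S[3:0])
current = encode(cond=0b1111, op=0b1001111, czi=0b111, d=5, s=0b000000011)
# Proposed SFUNC: EEEE 1101011 00L DDDDDDDDD 01000FFFF (L = 0 here)
proposed = encode(cond=0b1111, op=0b1101011, czi=0b000, d=5, s=0b010000011)

print(f"{current:032b}")
print(f"{proposed:032b}")
# S[8:7] = 01 keeps the moved SFUNC clear of the existing no-source
# instructions in this group, which all have S[8:7] = 00.
print(decode(proposed)["s"] >> 7)
```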

If so this would free up
EEEE 1001111 11I DDDDDDDDD SSSSSSSSS
for a last-minute instruction that really needs both D and S, e.g. MASKNIB as described below.

Nibble-wide video (4bpp) is ideally suited to 32-bit data, as one long holds eight pixels, but manipulating nibbles can be tedious and slow when the minimum memory width is a byte. Unless I've misunderstood, the current nibble instructions can change only one nibble in D at a time.

I think it would be great if a mask instruction MASKNIB could change each of the eight nibbles in D in one go, as follows:
DDDD := DDDD if SSSS = 0000
DDDD := SSSS if SSSS > 0000

In other words a zero nibble is transparent which is exactly how the TI VDP and its variants in 16-colour mode worked: 0000 = transparent, 0001 = black, ... , 1111 = white. (Colours could be changed in later chips with a palette.)
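A minimal Python model of the per-nibble rule above, with a 0000 source nibble treated as transparent. `masknib` is a hypothetical name for this sketch; hardware would handle all eight nibbles in parallel rather than in a loop.

```python
def masknib(d: int, s: int) -> int:
    """For each of the eight nibbles: keep D's nibble if S's is 0000, else take S's."""
    result = 0
    for i in range(8):
        shift = 4 * i
        src = (s >> shift) & 0xF
        dst = (d >> shift) & 0xF
        result |= (src if src else dst) << shift
    return result


background = 0x11111111  # eight black pixels (colour 0001)
sprite = 0x00F00F00      # white (1111) in nibbles 2 and 5, transparent elsewhere
print(f"{masknib(background, sprite):08X}")  # 11F11F11
```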

MASKNIB would make it easy to have loads of sprites or cursors, pointers, grids, rulers, etc. As a logical ALU operation it would not be out of place.

cgracey · 2017-04-25 23:47

Yes, SFUNC could be moved. It is where it is because of incremental development. MASKNIB would be better, for sure, as it would enable efficient 4-bits-per-pixel data manipulation.

You tool developers... would it be a hardship for you if this were done?

Dave Hein · 2017-04-26 00:06

Moving SFUNC to a different codepoint is not a big concern for me. That is, as long as it doesn't cause a reshuffle of a lot of other instructions. I like the idea of a MASKNIB instruction. I've been playing around with a 640x480 16-color display, and handling sprites does take a lot of cycles currently. A MASKNIB instruction would be really useful, especially if a 4-bit transparency value could be specified.

jmg · 2017-04-26 00:13

cgracey wrote: »

Yes, SFUNC could be moved. It is where it is because of incremental development. MASKNIB would be better, for sure, as it would enable efficient 4-bits-per-pixel data manipulation.

You tool developers... would it be a hardship for you if this were done?

You are talking about moving the binary encoding of SFUNC, and adding a MASKNIB opcode in the space released?
That seems OK: it adds one more word/encoding, and changes the encoding of one not-often-used opcode.
The main issue is keeping things in phase..., but MASKNIB is new, and SFUNC is sounding rare, so even a skew is unlikely to break much?

JRetSapDoog · 2017-04-26 08:04

Dave Hein wrote: »

I've been playing around with a 640x480 16-color display, and handling sprites does take a lot of cycles currently.

Dave, it's great that you're working on a 640x480 VGA driver (with sprites to boot), as the P2 seems particularly suited to lower-resolution, small-format screens.

Have you given any consideration to making your 640x480 driver (that's in the works) interoperable with 800x480 displays? In particular, perhaps there could be a switch variable that inserts an extra pixel (duplicating an adjacent pixel) every 4 pixels, stretching what would otherwise be 640 horizontal pixels to 800. As we know, many small-format displays these days are WVGA, with 800 native horizontal pixels per line. VGA driver chips/boards for these displays typically have the ability to stretch 640 horizontal pixels to 800 but, as far as I know, one has to use (or connect) a small keypad to such displays to toggle between stretched (640 px) and non-stretched (800 px) modes. If one doesn't activate stretching manually, such displays will have 80-pixel-wide black bands on the left and right sides. So a software-switch (programmatic) method would seem desirable, as it would allow always "driving" such panels, likely through a VGA display driver chip, at their native 800x480 resolution. (Ultimately, of course, the VGA driver chips use 800x480 if that's what the panel is; I'm referring to the VGA input signals to such driver chips.) Now, while VGA driver chips typically remember the stretch setting (i.e., store it in EEPROM), it apparently takes a keypad to change it (to avoid the black bands), hence the desire to control things from the software generating the display signals, so that changes can be made without a keypad.
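The stretch described above (one duplicated pixel per group of 4, so 4 pixels in become 5 out and 640 becomes 800) can be sketched in Python on a list of pixel values. Repeating the last pixel of each group is an assumption; `stretch_640_to_800` is a hypothetical helper, not part of any existing driver.

```python
def stretch_640_to_800(line):
    """Emit each group of 4 pixels followed by a copy of its last pixel (4 in, 5 out)."""
    if len(line) % 4:
        raise ValueError("line length must be a multiple of 4")
    out = []
    for i in range(0, len(line), 4):
        group = line[i:i + 4]
        out.extend(group)
        out.append(group[-1])  # the inserted, duplicated pixel
    return out


wide = stretch_640_to_800(list(range(640)))
print(len(wide))   # 800
print(wide[:10])   # [0, 1, 2, 3, 3, 4, 5, 6, 7, 7]
```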

What would probably be ideal in a VGA/WVGA driver would be an 800x480 standard mode, a 640x480-stretched-to-800x480 mode for driving native 800x480 panels, and a standard 640x480 mode for panels with that native resolution (or configured to display it). Such a driver (or similar drivers) could help the P2 hit the ground running.

Dave Hein · 2017-04-26 11:34

My main interest right now is just to get a 640x480 display that fits in the 256K hub RAM of a DE2-115. 8 bpp is too large to fit, so I went to 4 bpp instead, which fits nicely and leaves 96K for program space. 16 colors is enough for a lot of applications, such as text displays and games. I haven't considered stretching the screen. I'm not sure that can be done easily with P2's hardware, and stretching messes up the aspect ratio of the pixels. An 800x480 native mode should be easy to do. It just requires an extra 37.5K of memory.
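The buffer sizes above follow from width x height x bits-per-pixel / 8. A quick Python check, assuming 1K = 1024 bytes:

```python
def framebuffer_bytes(width: int, height: int, bpp: int) -> int:
    """Bytes needed for a packed bitmap of width x height pixels at bpp bits each."""
    return width * height * bpp // 8


HUB_BYTES = 256 * 1024  # hub RAM size of the DE2-115 build mentioned above

for w, h, bpp in [(640, 480, 8), (640, 480, 4), (800, 480, 4)]:
    size = framebuffer_bytes(w, h, bpp)
    print(f"{w}x{h} @ {bpp} bpp: {size / 1024:.1f}K, fits in hub: {size <= HUB_BYTES}")

extra = framebuffer_bytes(800, 480, 4) - framebuffer_bytes(640, 480, 4)
print(f"800x480 needs an extra {extra / 1024:.1f}K")  # 37.5K
```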

Rayman · 2017-04-26 14:05

I like the sound of this MASKNIB instruction...

Thinking 4-bpp might be one sweet spot for VGA and XGA resolutions.
Anything that can help make that easier is nice.

JRetSapDoog · 2017-04-26 15:04

Thanks, Dave. I veered away from the topic with my question (I wasn't ready to start a new thread since it's more of an application thing), but thanks for the reply. Yep, stretching can mess with the aspect ratio of various shapes. It seems that it's most noticeable with circles (in cases where one knows the shape is supposed to be circular). However, for text (such as the Propeller font), the stretching doesn't seem objectionable or noticeable. And utilizing the full width of the screen makes the text larger and possibly more legible at a distance. Of course, text drivers can be done with tile maps, so memory wouldn't be a problem. And perhaps it would be nice to combine, for example, a two- or four-color bitmap plane with tile-map text to have the best of both worlds within modest memory needs. Anyway, thanks again for the reply, and I hope that you can get the instruction(s) needed to expedite working with transparency for sprites and so on. --Jim

TonyB · 2017-04-26 18:22

Thank you everyone for supporting MASKNIB. Nothing would be lost and a useful instruction would be gained.

I think it's very important that the transparent nibble is 0000 as per the Texas Instruments Video Display Processors. There is a lot of software out there that could run unaltered on the P2 with a Z80 emulator (already done on P1) and a VDP emulator (attempted on P1 but never completed successfully, I believe).

cgracey · 2017-04-26 19:19

Does anyone have an opinion on making the no-write-byte value $00 for WMLONG? It's currently $FF.
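For readers new to WMLONG, here is a minimal Python model of a byte-masked long write: each byte of the value is written to hub RAM unless it equals the no-write value, which is the $FF-versus-$00 choice being asked about. The `wmlong` function and its `skip_byte` parameter are hypothetical, and little-endian byte order is assumed, as on the Propeller's hub.

```python
def wmlong(hub: bytearray, addr: int, value: int, skip_byte: int = 0x00) -> None:
    """Write a long little-endian, leaving any byte equal to skip_byte unwritten."""
    for i in range(4):
        b = (value >> (8 * i)) & 0xFF
        if b != skip_byte:
            hub[addr + i] = b


hub = bytearray(b"\x11\x22\x33\x44")
wmlong(hub, 0, 0xAA0000BB)                  # proposed: $00 bytes are skipped
print(hub.hex())                            # bb2233aa

hub = bytearray(b"\x11\x22\x33\x44")
wmlong(hub, 0, 0xAAFFFFBB, skip_byte=0xFF)  # current: $FF bytes are skipped
print(hub.hex())                            # bb2233aa
```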

Rayman · 2017-04-26 19:40

$00 makes more sense to me, but I think either way is OK.

BTW: Would be handy to have a 2-bit version of this too... On the other hand, 2-bit graphics probably won't be games or graphic intensive stuff...

Roy Eltham · 2017-04-26 19:53

I think $00 makes more sense.
Wouldn't mind a nibble variant of this also.

jmg · 2017-04-26 19:58

cgracey wrote: »

Does anyone have an opinion on making the no-write-byte value $00 for WMLONG? It's currently $FF.

Is it practical to allow either ?
Would it make sense to offer $00 in some opcodes and $FF in others to cover user needs ?

cgracey · 2017-04-26 20:14

jmg wrote: »

cgracey wrote: »

Does anyone have an opinion on making the no-write-byte value $00 for WMLONG? It's currently $FF.

Is it practical to allow either ?
Would it make sense to offer $00 in some opcodes and $FF in others to cover user needs ?

There's insufficient opcode space to allow both. I think I'll change it to $00.

About a nibble version of WMLONG, this can't happen because whole bytes are the granularity of hub RAM.

A two-bit version could be useful for line or textual graphics, where there is a background color plus three foreground colors. We'd need to get rid of an instruction, though, in the vicinity.
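Assuming a two-bit version would apply the same "zero field is transparent" rule as the proposed MASKNIB, but to 2-bit fields, both cases can be modelled by making the field width a parameter. `mask_fields` is a hypothetical sketch; the thread doesn't define the two-bit instruction's exact behaviour.

```python
def mask_fields(d: int, s: int, width: int) -> int:
    """For each width-bit field of a long, take s's field unless it is all zeros."""
    if 32 % width:
        raise ValueError("width must divide 32")
    field = (1 << width) - 1
    result = 0
    for shift in range(0, 32, width):
        src = (s >> shift) & field
        dst = (d >> shift) & field
        result |= (src if src else dst) << shift
    return result


# 2 bpp: 16 pixels per long; 00 = background (transparent), 01..11 = three foreground colours
print(f"{mask_fields(0x55555555, 0x0000000E, 2):08X}")  # 5555555E
# width 4 gives the nibble rule proposed as MASKNIB
print(f"{mask_fields(0x11111111, 0x00F00F00, 4):08X}")  # 11F11F11
```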

jmg · 2017-04-26 20:36

cgracey wrote: »

jmg wrote: »

cgracey wrote: »

Does anyone have an opinion on making the no-write-byte value $00 for WMLONG? It's currently $FF.

Is it practical to allow either ?
Would it make sense to offer $00 in some opcodes and $FF in others to cover user needs ?

There's insufficient opcode space to allow both. I think I'll change it to $00.

That change makes the most sense, unless someone comes with an alternate & compelling use case...

potatohead · 2017-04-26 21:01

$00

ersmith · 2017-04-26 21:39

As long as we're asking for new instructions, I'll just mention a few that would have made porting the RISC-V and ZPU interpreters easier. None of them are must-haves; they all just replace a 2-instruction sequence with 1 instruction.

(a) A version of triml that sign-extends instead of filling with 0. triml is nice, but you can achieve a similar result in 1 instruction by ANDing with a predefined mask. Doing a sign extend always takes 2 instructions.

(b) A bit field extract instruction, something like BFEXT D, S, which shifts D right by S[4:0] and then masks it by ((1 << S[8:5]) - 1). This would actually serve the purposes of both trimr and triml (at least for masks up to 16 bits) while being more flexible.

(c) A SETCOND instruction: if the EEEE field of the instruction is a true condition then D := S, otherwise D := 0. This one's probably tricky because it requires write back to happen even if the condition is false. Alternatively, we could do a CMPSET instruction: CMPSET D,S compares D to S and changes D to the flags that are the result of the comparison: D[31:2] to 0, D[1] to Z, and D[0] to C.
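A minimal Python model of the BFEXT proposed in (b): shift by S[4:0], then mask to S[8:5] bits. `bfext` and `bfext_operand` are hypothetical names. Note that, as the formula is written, the 4-bit width field gives masks of 0 to 15 bits.

```python
def bfext(d: int, s: int) -> int:
    """Proposed BFEXT D,S: shift D right by S[4:0], keep the low S[8:5] bits."""
    shift = s & 0x1F          # S[4:0]: 0..31
    width = (s >> 5) & 0xF    # S[8:5]: 0..15 as the formula is written
    return ((d & 0xFFFFFFFF) >> shift) & ((1 << width) - 1)


def bfext_operand(shift: int, width: int) -> int:
    """Pack a shift and a width into the 9-bit S field (hypothetical helper)."""
    if not (0 <= shift < 32 and 0 <= width < 16):
        raise ValueError("shift must be 0..31 and width 0..15")
    return (width << 5) | shift


d = 0x12345678
print(hex(bfext(d, bfext_operand(8, 8))))   # 0x56  (byte 1)
print(hex(bfext(d, bfext_operand(0, 12))))  # 0x678 (low 12 bits)
```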

I'm not seriously suggesting that we make hardware changes at this late stage, but if you're changing anything these are the ones that come to mind. Actually the lack of sign extension options seems really odd given that triml/trimr are there. Did I miss something in the instruction documentation?

Eric

TonyB · 2017-04-26 21:44

$00 no-write for WMLONG, please. In RGBA the alpha value is zero for fully transparent and I think the BLNPIX instruction works the same way.

TonyB · 2017-04-26 21:54

Before it gets set in stone, I'm not mad keen on the name MASKNIB as it suggests only one nibble can be changed. MASKNIBS is too long. How about MNIBS? Shorter, snappier yet more accurate. M for mask to match WMLONG.

“What's in a name? That which we call a rose, by any other name would smell as sweet.”

Tubular · 2017-04-27 06:30

I think it might have been Coley (or Baggers?) who requested the $FF transparency. May be worth checking in with him.

I think $00 is fine. It'd be a bit like the P1 NTSC where $00 is sync and $02 through $07 are black through white.

TonyB · 2017-04-27 12:00

Thinking about sprites led me somewhere unexpected. In the TI VDP, sprite bit patterns can be up to 16 x 16 pixels at normal size, or 32 x 32 magnified with bits doubled. Normal and magnified sprites can appear onscreen at the same time. A quick way of doubling would save time and code by allowing sprite patterns to be shifted out at the same rate, whether normal size or magnified.

Applying this doubling idea to different data widths produces the family of operations on D shown below. Half the resulting bits in adjacent lines are identical so the logic should optimise quite well.

Before:
XXXXXXXX XXXXXXXX ABCDEFGH IJKLMNOP D[31:16] are don't care

After:
AABBCCDD EEFFGGHH IIJJKKLL MMNNOOPP Bit doubler BITDBL
ABABCDCD EFEFGHGH IJIJKLKL MNMNOPOP Twit doubler TWTDBL
ABCDABCD EFGHEFGH IJKLIJKL MNOPMNOP Nibble doubler NIBDBL
ABCDEFGH ABCDEFGH IJKLMNOP IJKLMNOP Byte doubler BYTDBL
ABCDEFGH IJKLMNOP ABCDEFGH IJKLMNOP Word doubler WRDDBL

I didn't intend to suggest anything else new, these doublers just appeared out of the fog last night. They might be useful when data must be output at twice their usual rate, e.g. when mixing data streams with a single fixed clock.

Unlike MASKNIB/MNIBS, I don't think they are vital but if considered worthwhile they could be added to the fairly similar SFUNCs without affecting other instructions. Bit doubling in particular is a horrible algorithm to do in software.
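All five doublers in the table share one rule: split D[15:0] into g-bit chunks (g = 1, 2, 4, 8 or 16) and write each chunk twice, side by side. A minimal Python model follows; `double_chunks` is a hypothetical name, and the instruction names are only those proposed above.

```python
def double_chunks(d: int, g: int) -> int:
    """Repeat each g-bit chunk of D[15:0] twice, side by side; D[31:16] is ignored."""
    mask = (1 << g) - 1
    out = 0
    for i in range(16 // g):
        chunk = (d >> (i * g)) & mask
        # chunk i (counting from the LSB) fills the 2g-bit slot i of the result
        out |= ((chunk << g) | chunk) << (2 * g * i)
    return out


DOUBLERS = {"BITDBL": 1, "TWTDBL": 2, "NIBDBL": 4, "BYTDBL": 8, "WRDDBL": 16}

d = 0xFFFF1234  # D[31:16] is don't-care
for name, g in DOUBLERS.items():
    print(f"{name}: {double_chunks(d, g):08X}")
# BITDBL: 030C0F30, TWTDBL: 050A0F50, NIBDBL: 11223344,
# BYTDBL: 12123434, WRDDBL: 12341234
```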

potatohead · 2017-04-27 12:22

My personal opinion on this is we must remember each of these is times 16!

For massive sprite manipulation, there are 16 COGS.

We have getnib. A small lookup table gets the bits doubled pretty quick.
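The lookup-table approach can be sketched in Python as a 16-entry table mapping each nibble abcd to the byte aabbccdd, with four lookups covering a 16-bit pattern. The table and `bit_double_lut` are hypothetical; each index is the nibble a GETNIB would pick out.

```python
# Hypothetical 16-entry table: nibble abcd -> byte aabbccdd
DOUBLE_NIB = [
    sum(((n >> b) & 1) * (0b11 << (2 * b)) for b in range(4))
    for n in range(16)
]


def bit_double_lut(d: int) -> int:
    """Bit-double D[15:0] using one table lookup per nibble."""
    out = 0
    for i in range(4):
        nib = (d >> (4 * i)) & 0xF  # the nibble a GETNIB would pick out
        out |= DOUBLE_NIB[nib] << (8 * i)
    return out


print(f"{bit_double_lut(0xFFFF1234):08X}")  # 030C0F30
```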

cgracey · 2017-04-27 16:53

ersmith wrote: »

As long as we're asking for new instructions, I'll just mention a few that would have made porting the RISC-V and ZPU interpreters easier. None of them are must-haves; they all just replace a 2-instruction sequence with 1 instruction.

(a) A version of triml that sign-extends instead of filling with 0. triml is nice, but you can achieve a similar result in 1 instruction by ANDing with a predefined mask. Doing a sign extend always takes 2 instructions.

(b) A bit field extract instruction, something like BFEXT D, S, which shifts D right by S[4:0] and then masks it by ((1 << S[8:5]) - 1). This would actually serve the purposes of both trimr and triml (at least for masks up to 16 bits) while being more flexible.

(c) A SETCOND instruction: if the EEEE field of the instruction is a true condition then D := S, otherwise D := 0. This one's probably tricky because it requires write back to happen even if the condition is false. Alternatively, we could do a CMPSET instruction: CMPSET D,S compares D to S and changes D to the flags that are the result of the comparison: D[31:2] to 0, D[1] to Z, and D[0] to C.

I'm not seriously suggesting that we make hardware changes at this late stage, but if you're changing anything these are the ones that come to mind. Actually the lack of sign extension options seems really odd given that triml/trimr are there. Did I miss something in the instruction documentation?

Eric

I'm thinking that we really need a sign-extension instruction. Maybe we could get rid of TRIMR and make it a sign-extending version of TRIML.

For a long time, I've thought about a bit field extractor. I think your way is good, where by limiting the field size to 16 bits, you can fit the bit field parameters into a 9-bit immediate.

I need to think about that last one. That's kind of an interesting requirement.

ersmith · 2017-04-27 17:13

A lot of languages/processors have a feature where "x := (a < b)" sets x to either 0 or 1; it's similar to Spin, but Spin uses 0 or -1. Spin's way is a little simpler since you can do a MUXC on the whole output value to get either 0 or $FFFFFFFF. To get just 0 or 1 you then have to AND with 1 or take the ABS. So I was hoping to replace:

   CMPS A, B wc, wz
   MUXC A, maskffffffff
   AND  A, #1

with something like:

  CMPS A, B wc, wz
  SETC  A

where SETC sets A[31:1] to 0 and A[0] to C.
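A Python model of the two ways of computing "x := (a < b)" shown above, plus the CMPSET variant from (c). It assumes CMPS with WC sets C when A < B (signed) and WZ sets Z when A = B. SETC and CMPSET are proposals, not existing instructions, and all function names here are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def cmps_flags(a: int, b: int) -> tuple:
    """(C, Z) from a signed compare of a with b: C = a < b, Z = a == b."""
    sa, sb = to_signed(a), to_signed(b)
    return int(sa < sb), int(sa == sb)


def lt_muxc_and(a: int, b: int) -> int:
    """Today: CMPS, MUXC with $FFFFFFFF (giving 0 or -1), then AND #1."""
    c, _ = cmps_flags(a, b)
    a = MASK32 if c else 0  # MUXC copies C into every bit selected by the mask
    return a & 1


def lt_setc(a: int, b: int) -> int:
    """Proposed SETC: A[31:1] = 0, A[0] = C."""
    c, _ = cmps_flags(a, b)
    return c


def cmpset(a: int, b: int) -> int:
    """Proposed CMPSET: D[31:2] = 0, D[1] = Z, D[0] = C."""
    c, z = cmps_flags(a, b)
    return (z << 1) | c


print(lt_setc(3, 7), lt_setc(MASK32, 0), cmpset(7, 7))  # 1 1 2
```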

It's really not a big deal, just something that would save an instruction here and there.

Thanks,
Eric

Seairth · 2017-04-27 17:39

But if you already know the data size, you can just do the tried-and-true

SHL reg, #8
SAR reg, #8

Is this a frequent-enough occurrence to make a special instruction worthwhile?

ersmith · 2017-04-27 18:05

Seairth wrote: »
But if you already know the data size, you can just do the tried-and-true
SHL reg, #8
SAR reg, #8
Is this a frequent-enough occurrence to make a special instruction worthwhile?

Well, we already have a TRIML instruction that zero-extends, i.e. does the equivalent of

SHL reg, #val
SHR reg, #val

(and a similar TRIMR that swaps the order of the shifts). I'd argue that a sign extending instruction is more important than that, since the TRIML / TRIMR effects can be achieved with a single AND, whereas sign extension is going to take 2 instructions.
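A Python model of 32-bit shifts makes the argument concrete. SHL then SHR zero-extends (the TRIML-style result as described in the post), and a single AND does the same job. SHL then SAR sign-extends, which no single AND can do because bits above the field may have to be set. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shl(x: int, n: int) -> int:
    return (x << n) & MASK32


def shr(x: int, n: int) -> int:
    """Logical shift right: zeros come in from the top."""
    return (x & MASK32) >> n


def sar(x: int, n: int) -> int:
    """Arithmetic shift right: copies of bit 31 come in from the top."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32  # make it negative so Python's >> sign-fills
    return (x >> n) & MASK32


def zero_extend(x: int, bits: int) -> int:
    """SHL/SHR pair; equivalent to AND with (1 << bits) - 1."""
    n = 32 - bits
    return shr(shl(x, n), n)


def sign_extend(x: int, bits: int) -> int:
    """SHL/SAR pair, e.g. SHL reg,#8 / SAR reg,#8 for a 24-bit value."""
    n = 32 - bits
    return sar(shl(x, n), n)


x = 0x00ABCDEF  # 24-bit value whose top bit (bit 23) is set
print(f"{zero_extend(x, 24):08X}")  # 00ABCDEF
print(f"{sign_extend(x, 24):08X}")  # FFABCDEF
```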

Seairth · 2017-04-27 18:26

ersmith wrote: »

Well, we already have a TRIML instruction that zero-extends...

How often would you actually use TRIML to zero-extend? The only use that comes to mind is if you are working with the least-significant field of packed data in a single register. For any other field in the packed data, you'd still end up with two instructions (triml/sar or shl/sar), so that doesn't seem worthwhile.

If anything, I'd argue that TRIML/TRIMR will be infrequently used and can be handled with a pair of shifts. Get rid of them entirely and use the instruction slots for something else...

jmg · 2017-04-27 20:23

ersmith wrote: »
A lot of languages/processors have a feature where "x := (a < b)" sets x to either 0 or 1; it's similar to Spin, but Spin uses 0 or -1. Spin's way is a little simpler since you can do a MUXC on the whole output value to get either 0 or $FFFFFFFF. To get just 0 or 1 you then have to AND with 1 or take the ABS. So I was hoping to replace:
   CMPS A, B wc, wz
   MUXC A, maskffffffff
   AND  A, #1
with something like:
  CMPS A, B wc, wz
  SETC  A
where SETC sets A[31:1] to 0 and A[0] to C.

The P2 already has BIT opcodes that can place C into any bit of a register, so your X could be a boolean variable that consumes just ONE bit of valuable register space, instead of 32 bits in a LONG.
That means, e.g., 32 booleans can be packed into one LONG, saving 31 LONGs for more useful work.

Somewhat hidden in the BITx group are boolean AND, OR and XOR, which would support clearer coding of
bool := (a < b) AND (b < c)
and I think simple assembler aliases would expose them:

  BITAND  ->  IF_NC  BITL   ' bit S[4:0] of D := that bit AND C
  BITOR   ->  IF_C   BITH   ' bit S[4:0] of D := that bit OR C
  BITXOR  ->  IF_C   BITN   ' bit S[4:0] of D := that bit XOR C
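The aliases above can be checked exhaustively in Python by modelling BITL, BITH and BITN as clear, set and toggle of bit n (the S[4:0] index), each executed only when its condition on C holds. All functions here are hypothetical models, not hardware definitions.

```python
MASK32 = 0xFFFFFFFF


def bitl(d: int, n: int) -> int:
    """BITL: clear bit n of D."""
    return d & ~(1 << n) & MASK32


def bith(d: int, n: int) -> int:
    """BITH: set bit n of D."""
    return d | (1 << n)


def bitn(d: int, n: int) -> int:
    """BITN: toggle bit n of D."""
    return d ^ (1 << n)


# The aliases: each underlying instruction runs only when its condition on C holds.
def bitand(d: int, n: int, c: int) -> int:
    return d if c else bitl(d, n)  # IF_NC BITL


def bitor(d: int, n: int, c: int) -> int:
    return bith(d, n) if c else d  # IF_C BITH


def bitxor(d: int, n: int, c: int) -> int:
    return bitn(d, n) if c else d  # IF_C BITN


# Check bit n against the boolean each alias claims, and that other bits survive.
for n in (0, 13, 31):
    others = MASK32 ^ (1 << n)
    for bit in (0, 1):
        for c in (0, 1):
            d = (0x5A5A5A5A & others) | (bit << n)
            assert (bitand(d, n, c) >> n) & 1 == (bit & c)
            assert (bitor(d, n, c) >> n) & 1 == (bit | c)
            assert (bitxor(d, n, c) >> n) & 1 == (bit ^ c)
            for op in (bitand, bitor, bitxor):
                assert op(d, n, c) & others == d & others
print("aliases match AND/OR/XOR")
```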

ersmith · 2017-04-27 20:43

jmg wrote: »

The P2 already has BIT opcodes that can place C into any BIT of a register, so that opens up your X being a boolean VAR that consumes just ONE bit of valuable register space, instead of 32 bits in a LONG.

Sure, if you're writing custom code in PASM. But C uses LONGs, not bits. Similarly, for the RISC-V and ZPU interpreters I have to deal with the instruction sets as defined (which operate on 32 bits and which have instructions that produce 1/0 based on a test).

ozpropdev · 2017-04-28 02:58

TonyB wrote: »

Thinking about sprites led me somewhere unexpected. In the TI VDP, sprite bit patterns can be up to 16 x 16 pixels at normal size, or 32 x 32 magnified with bits doubled. Normal and magnified sprites can appear onscreen at the same time. A quick way of doubling would save time and code by allowing sprite patterns to be shifted out at the same rate, whether normal size or magnified.

Applying this doubling idea to different data widths produces the family of operations on D shown below. Half the resulting bits in adjacent lines are identical so the logic should optimise quite well.

Before:
XXXXXXXX XXXXXXXX ABCDEFGH IJKLMNOP D[31:16] are don't care

After:
AABBCCDD EEFFGGHH IIJJKKLL MMNNOOPP Bit doubler BITDBL
ABABCDCD EFEFGHGH IJIJKLKL MNMNOPOP Twit doubler TWTDBL
ABCDABCD EFGHEFGH IJKLIJKL MNOPMNOP Nibble doubler NIBDBL
ABCDEFGH ABCDEFGH IJKLMNOP IJKLMNOP Byte doubler BYTDBL
ABCDEFGH IJKLMNOP ABCDEFGH IJKLMNOP Word doubler WRDDBL

I didn't intend to suggest anything else new, these doublers just appeared out of the fog last night. They might be useful when data must be output at twice their usual rate, e.g. when mixing data streams with a single fixed clock.

Unlike MASKNIB/MNIBS, I don't think they are vital but if considered worthwhile they could be added to the fairly similar SFUNCs without affecting other instructions. Bit doubling in particular is a horrible algorithm to do in software.

Some of these can already be achieved using existing instructions:

'AABBCCDD EEFFGGHH IIJJKKLL MMNNOOPP Bit doubler BITDBL
	rolword	dx,dx,#0
	mergew	dx

'ABCDEFGH ABCDEFGH IJKLMNOP IJKLMNOP Byte doubler BYTDBL
	movbyts	dx,#%%1100

'ABCDEFGH IJKLMNOP ABCDEFGH IJKLMNOP Word doubler WRDDBL
	rolword	dx,dx,#0
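The sequences above can be checked against the doubler table with a Python model. The ROLWORD, MERGEW and MOVBYTS semantics below are assumptions taken from the later P2 instruction documentation, and the FPGA image of the time may have differed. For the same reason, all three functions are hypothetical models.

```python
def rolword(d: int, s: int, n: int) -> int:
    """Assumed ROLWORD D,S,#N: D = {D[15:0], S.WORD[N]}."""
    return ((d & 0xFFFF) << 16) | ((s >> (16 * n)) & 0xFFFF)


def mergew(d: int) -> int:
    """Assumed MERGEW D: interleave D[31:16] (odd bits) with D[15:0] (even bits)."""
    out = 0
    for i in range(16):
        out |= ((d >> (16 + i)) & 1) << (2 * i + 1)
        out |= ((d >> i) & 1) << (2 * i)
    return out


def movbyts(d: int, s: int) -> int:
    """Assumed MOVBYTS D,S: result byte k = D.BYTE[S[2k+1:2k]]."""
    out = 0
    for k in range(4):
        sel = (s >> (2 * k)) & 3
        out |= ((d >> (8 * sel)) & 0xFF) << (8 * k)
    return out


dx = 0xFFFF1234                        # D[31:16] is don't-care
bit_dbl = mergew(rolword(dx, dx, 0))   # rolword dx,dx,#0 / mergew dx
byte_dbl = movbyts(dx, 0b01_01_00_00)  # movbyts dx,#%%1100 (base-4 digits 1,1,0,0)
word_dbl = rolword(dx, dx, 0)          # rolword dx,dx,#0
print(f"{bit_dbl:08X} {byte_dbl:08X} {word_dbl:08X}")  # 030C0F30 12123434 12341234
```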

jmg · 2017-04-28 04:50

ersmith wrote: »

jmg wrote: »

The P2 already has BIT opcodes that can place C into any BIT of a register, so that opens up your X being a boolean VAR that consumes just ONE bit of valuable register space, instead of 32 bits in a LONG.

Sure, if you're writing custom code in PASM. But C uses LONGs, not bits. Similarly, for the RISC-V and ZPU interpreters I have to deal with the instruction sets as defined (which operate on 32 bits and which have instructions that produce 1/0 based on a test).

OK, then perhaps a variant of the BIT instruction family that clears the non-destination bits?
It seems to use only 5 bits of the 9-bit S field, so there looks to be room to use the MSB as an Atomic/Alone choice?
