What new assembler instructions would you wish for? — Parallax Forums

pjvpjv Posts: 1,903
edited 2009-11-08 16:13 in Propeller 1
Hi All;

I have been working in Prop assembler (don't understand SPIN yet) for a while now, and am often in need of bit manipulation. While I like the instructions that are available for that, there appear to be some inconvenient holes that take multiple instructions to fill, which at 4 bytes each is somewhat wasteful.

In particular, I miss or wish for:

- loading a 32-bit immediate value
- testing an arbitrary bit in any cog location without using a mask
- setting or clearing an arbitrary bit in any cog location without using a mask
- copying zero or carry (or their complements) to an arbitrary bit in any cog location without using a mask
- shifting through (as well as around) carry where carry holds its place in the sequence

There are of course many other wishes such as math operations, but I purposely restricted my set to those bit manipulations I had actually needed, and I wondered if my requirements were unique, or if others had similar experience.

If there is some consensus, then perhaps it is not too late for some to be included in the Prop2 instruction set.


Cheers,

Peter (pjv)

Comments

  • Bill HenningBill Henning Posts: 6,445
    edited 2009-11-07 18:42
    Loading an immediate value is handled by a "constant" long declaration - most of my programs have an

    allones long $ffffffff

    declaration.

    I agree about the other bit manipulations... it would be wonderful to have BIT{SET|CLEAR|TEST|TOGGLE} reg,{#}bitnum

    Perhaps WZ and WC could be used to mean "write the bit to Z or C" for the set/clear/toggle forms.
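
The proposed BIT{SET|CLEAR|TEST|TOGGLE} operations can be modelled in C to pin down what they would do. This is only a sketch of the intended semantics, not PASM and not anything Parallax has specified: the function names are invented for illustration, and the bit number is assumed to be in the range 0..31.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling BIT{SET|CLEAR|TEST|TOGGLE} reg,{#}bitnum. */
static uint32_t bit_mask(unsigned bitnum) {
    assert(bitnum < 32);            /* a long has bits 0..31 */
    return (uint32_t)1 << bitnum;
}

uint32_t bit_set(uint32_t reg, unsigned bitnum)    { return reg |  bit_mask(bitnum); }
uint32_t bit_clear(uint32_t reg, unsigned bitnum)  { return reg & ~bit_mask(bitnum); }
uint32_t bit_toggle(uint32_t reg, unsigned bitnum) { return reg ^  bit_mask(bitnum); }

/* Returns 1 when the selected bit is set, 0 otherwise. */
int bit_test(uint32_t reg, unsigned bitnum) {
    return (reg & bit_mask(bitnum)) != 0;
}
```

The mask is derived from the bit number inside the operation, which is the step a programmer currently has to supply by hand.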
  • SapiehaSapieha Posts: 2,964
    edited 2009-11-07 18:44
    Hi pjv.

    One instruction you might consider is reversing bits on a BYTE, WORD or LONG basis.

    REV dest, Byte 1 or 2 or 3 or 4
    REV dest, Word 1 or 2
    REV dest, Long
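
One possible reading of this proposal, sketched in C: reverse the bit order inside a single selected byte or word (or the whole long) and leave the rest of the destination untouched. That interpretation is an assumption, as are the names `reverse_bits` and `rev_field`; the index here counts from 0 at the least significant end rather than from 1.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `width` bits of v. */
static uint32_t reverse_bits(uint32_t v, unsigned width) {
    uint32_t r = 0;
    for (unsigned i = 0; i < width; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Reverse one field of dest in place.
   width is 8, 16 or 32; index selects which byte or word (0 = lowest). */
uint32_t rev_field(uint32_t dest, unsigned width, unsigned index) {
    assert(width == 8 || width == 16 || width == 32);
    assert(index < 32 / width);
    unsigned shift = index * width;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu
                                  : ((((uint32_t)1 << width) - 1u) << shift);
    uint32_t field = (dest & mask) >> shift;
    return (dest & ~mask) | (reverse_bits(field, width) << shift);
}
```

For comparison, the existing Prop I REV reverses the low-order bits of the long as a whole and clears the remaining upper bits, so it cannot target an individual byte or word without extra shifting and masking.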

    Regards
    ChJ

  • Mike GreenMike Green Posts: 23,101
    edited 2009-11-07 18:46
    All of the ones that are "without using a mask" are easily accomplished by allocating a location containing the mask and referencing that. This costs one more location that often can be amortized over several uses. It doesn't mean that it wouldn't be nice to have such instructions, but, in the tradeoff between complicating the instruction set and possibly dropping other instructions, these are low priority.
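
The mask-in-a-long approach can be modelled in C. On the Prop I, TEST sets Z from the AND of destination and source, and MUXC (with its relatives MUXNC, MUXZ and MUXNZ) copies a flag into every destination bit selected by a mask. The sketch below shows both with a single-bit mask; `BIT12`, `mux_flag` and `test_z` are names invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Plays the role of a cog long such as:   bit12   long   1 << 12
   (hypothetical name). One copy serves every instruction that needs it. */
#define BIT12 ((uint32_t)1 << 12)

/* MUXC-style behaviour: bits selected by mask take the flag's value,
   all other bits of dest are left unchanged. */
uint32_t mux_flag(uint32_t dest, uint32_t mask, int flag) {
    assert(flag == 0 || flag == 1);
    return flag ? (dest | mask) : (dest & ~mask);
}

/* TEST dest, mask WZ: returns the resulting Z flag
   (1 when none of the masked bits is set). */
int test_z(uint32_t dest, uint32_t mask) {
    return (dest & mask) == 0;
}
```

The extra long is paid for once per distinct mask, which is why the cost shrinks when the same bit is touched in several places.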

    The shifting through the carry (having essentially a 33 bit shift) is one where a stronger case can be made. The only way to really implement it on Prop I is to do single bit RCL or RCR shifts. You can't do multiple shifts. Whether it's feasible depends on how the barrel shifter is implemented. There would need to be an additional group of terms. It may be too costly in terms of chip area.
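
As a C model of the workaround: a single-bit RCL with WC moves the carry into bit 0 and the old bit 31 into the carry, which is a true 33-bit rotate by one place. A multi-bit RCL instead feeds the same original carry into every vacated bit, so rotating by n places means n single steps. The type `reg33_t` and the function names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* value plus carry treated as one 33-bit quantity (hypothetical type). */
typedef struct {
    uint32_t value;
    int carry;      /* 0 or 1 */
} reg33_t;

/* One step, modelling RCL value, #1 WC:
   carry enters bit 0, old bit 31 becomes the new carry. */
static reg33_t rcl1(reg33_t r) {
    int out = (int)(r.value >> 31);
    r.value = (r.value << 1) | (uint32_t)r.carry;
    r.carry = out;
    return r;
}

/* Rotate left through carry by n places, the slow way: n single-bit steps.
   Rotating a 33-bit quantity by 33 places is a no-op, hence n % 33. */
reg33_t rotate33_left(reg33_t r, unsigned n) {
    assert(r.carry == 0 || r.carry == 1);
    for (unsigned i = 0; i < n % 33; i++)
        r = rcl1(r);
    return r;
}
```

Each step in the loop corresponds to one 4-byte instruction executed once per bit, which is the cost a multi-bit rotate-through-carry would remove.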
  • potatoheadpotatohead Posts: 10,261
    edited 2009-11-07 19:13
    Actually, it doesn't cost anything. The mask must exist somewhere in the COG, IMHO. This is just a semantics problem from where I stand.

    I would like to see auto-increment and auto-decrement instructions, like DJNZ, for HUB ops: things like DWRLONG (and WORD and BYTE forms), where the index is decremented and then a long is fetched from the HUB. That would permit an index to serve as a pointer without also having to be the loop counter in a tight loop. Match that with IWRLONG, because flow control isn't a factor. Much better HUB utilization could occur in many cases, without so much effort being required to set up the loops.

    I would second the bit set, clear, and test instructions.


    Post Edited (potatohead) : 11/7/2009 7:18:36 PM GMT
  • jazzedjazzed Posts: 11,803
    edited 2009-11-07 20:46
    potatohead said...

    I would like to see auto increment and decrement instructions, like DJNZ for HUB ops. Things like DWRLONG, WORD, BYTE, where the index is decremented, then a long is fetched from the HUB, that would permit an index to serve as a pointer, and not have to be both the pointer and loop counter, for a tight loop.
    Auto increment/decrement source and destination pointers would be helpful. Meanwhile, you can use DJNZ as one index as long as you have a "zero-sentinel" to demarcate the buffer. With that approach, you can do a buffer transfer at 200ns per data element.

          mov  ptr, par      ' value at BYTE[par] is zero
          add  ptr, buflen  ' length of buffer
    :loop rdbyte val, ptr wz
          nop               ' free for user :)
    if_nz djnz ptr, #:loop  ' 200ns loop
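
A rough C model of the loop above and of the 200ns figure. The timing assumes an 80 MHz system clock (not stated in the post) and one loop pass per hub access window, which comes round every 16 clocks for each cog on the Prop I; the RDBYTE plus two 4-clock instructions fit inside that window. `loop_ns` and `scan_down` are illustrative names, and `hub` simply stands in for hub RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Time for one pass of the loop, in nanoseconds. */
uint64_t loop_ns(uint32_t clocks_per_pass, uint32_t clock_hz) {
    assert(clock_hz != 0);
    return (uint64_t)clocks_per_pass * 1000000000u / clock_hz;
}

/* Model of the scan: start at par + buflen and walk down until a zero byte
   is read (the sentinel) or the pointer itself reaches zero.
   Returns the number of bytes read, including the sentinel. */
size_t scan_down(const uint8_t *hub, size_t par, size_t buflen) {
    size_t ptr = par + buflen;
    size_t reads = 0;
    for (;;) {
        uint8_t val = hub[ptr];     /* rdbyte val, ptr wz */
        reads++;
        if (val == 0) break;        /* if_nz fails: sentinel found */
        if (--ptr == 0) break;      /* djnz falls through when ptr hits 0 */
    }
    return reads;
}
```

With 16 clocks per pass at the assumed 80 MHz, `loop_ns(16, 80000000)` gives 200. The scan relies on the data itself containing no zero bytes, since the first zero read ends the loop.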
    
    



    Yes, an immediate bit set/clear/test (1 long only per instruction) would be useful in some cases.

    A WRBYTE etc... that could use INA as destination would be wonderful :)

    Post Edited (jazzed) : 11/7/2009 9:35:28 PM GMT
  • pjvpjv Posts: 1,903
    edited 2009-11-07 22:14
    @Potatohead;

    Many of my applications for a "without a mask" are to use the immediate capability within the instruction, hence a long is not required. So bit test/set/clear/toggle and C/Z copy could be directly effected in the destination with the bit position in immediate form in the instruction's source segment. Just like the current shift/rotates are done. I would have been able to save many longs if this instruction were available.

    For generality, of course, one would still expect the non-immediate mode, with the source able to select any cog address.
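
The saving can be illustrated with a small C check. A Prop I instruction has a 9-bit source field, so an immediate can only express 0..511. A bit number (0..31) always fits; a single-bit mask fits only for bits 0..8, and every other mask needs its own long. The helper names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SRC_FIELD_BITS 9u   /* width of the source field in a Prop I instruction */

/* Can this value be written as an immediate (#value) source? */
int fits_immediate(uint32_t value) {
    return value < ((uint32_t)1 << SRC_FIELD_BITS);   /* 0..511 */
}

/* The single-bit mask for a given bit position. */
uint32_t mask_for(unsigned bitnum) {
    assert(bitnum < 32);
    return (uint32_t)1 << bitnum;
}
```

So `fits_immediate(31)` holds for the highest bit number, while `fits_immediate(mask_for(9))` already fails; the same limit is why a full 32-bit constant has to live in a separate long.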

    Cheers,

    Peter (pjv)
  • potatoheadpotatohead Posts: 10,261
    edited 2009-11-08 00:01
    Yeah, no worries. It's all good discussion.

  • Cluso99Cluso99 Posts: 18,071
    edited 2009-11-08 00:28
    IIRC, in the long PropII thread, Chip has said there will be fast block read/writes between hub and cog.

  • RaymanRayman Posts: 15,375
    edited 2009-11-08 00:30
    Uh... MUL and DIV :)

  • potatoheadpotatohead Posts: 10,261
    edited 2009-11-08 01:52
    LOL!! Absolutely Rayman. I think at least the MUL is on the table, probably both.

    Cluso, I also remember discussion about that not being on the table in favor of some better instructions too. I think we don't know where Chip is at on this just yet. Maybe I'm remembering wrong...

  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-11-08 02:47
    Single-cycle division is virtually out of the question, and I doubt that Chip would resort to a multi-cycle mode just to support a full DIV instruction. Even DSPs lack such a facility, although many support a single-cycle division step instruction. It would be nice if MUL could produce a 64-bit product in adjacent registers, though.

    In any event, I'm sure the instruction set has been a done deal for quite some time now, so it's hardly worth offering suggestions at this point.

    -Phil

    Addendum: I do kinda remember something about an iterator or repeat instruction, which would serve to augment a division step instruction. I'm not gonna comb through that thread to find it though! :) Frankly, I trust Chip's judgement enough not to be anxious or even anticipatory about the Prop II. When it's done, it's done, and it will be a welcome advance. Until then, there's plenty of performance left to wring from the Prop I. We've only begun to tap its capabilities.
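
To show what a division step combined with a repeat would amount to, here is a C sketch of ordinary restoring division: one step shifts the remainder:quotient pair left by a bit and conditionally subtracts the divisor, and 32 repeats give a full 32-bit unsigned divide. This is a generic textbook step, not the Prop II design; `divstate_t`, `div_step` and `udiv32` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Working state: remainder in rem, dividend/quotient shifting through quo. */
typedef struct {
    uint32_t rem;
    uint32_t quo;
} divstate_t;

/* One division step: shift rem:quo left one bit, then subtract the divisor
   from the remainder if it fits, recording a 1 in the quotient. */
static divstate_t div_step(divstate_t s, uint32_t divisor) {
    uint64_t r = ((uint64_t)s.rem << 1) | (s.quo >> 31);
    s.quo <<= 1;
    if (r >= divisor) {
        r -= divisor;
        s.quo |= 1u;
    }
    s.rem = (uint32_t)r;
    return s;
}

/* Full unsigned divide: the step repeated once per quotient bit. */
uint32_t udiv32(uint32_t dividend, uint32_t divisor, uint32_t *remainder) {
    assert(divisor != 0);
    divstate_t s = { 0u, dividend };
    for (int i = 0; i < 32; i++)
        s = div_step(s, divisor);
    if (remainder != NULL)
        *remainder = s.rem;
    return s.quo;
}
```

The loop body is the part a single-cycle step instruction would replace, and the `for` loop is the part an iterator or repeat instruction would replace.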

    Post Edited (Phil Pilgrim (PhiPi)) : 11/8/2009 3:12:15 AM GMT
  • hippyhippy Posts: 1,981
    edited 2009-11-08 15:34
    One thing I've argued for is an instruction which can expand 16-bit Thumb-style LMM opcodes into equivalent 32-bit PASM instructions: no or limited conditionals, and reduced-size <src> and <dst> fields. That would allow higher LMM code densities without sacrificing an awful lot of speed to numerous bit-shifting ops.
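
A C sketch of what such an expansion might do. The 32-bit side uses the Prop I field layout (6-bit opcode, 4 ZCRI bits, 4 condition bits, 9-bit destination, 9-bit source). The 16-bit layout is purely an assumption made up for this example, as are the names `expand16` and `pack16`; no such compact format is defined anywhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit layout (an assumption, not a real format):
     bits 15..10  opcode (same 6 bits as the 32-bit instruction)
     bits  9..5   destination register, 0..31
     bits  4..0   source register, 0..31                          */
uint32_t expand16(uint16_t compact) {
    uint32_t opcode = ((uint32_t)compact >> 10) & 0x3Fu;
    uint32_t dst    = ((uint32_t)compact >> 5)  & 0x1Fu;
    uint32_t src    =  (uint32_t)compact        & 0x1Fu;

    uint32_t zcri = 0x2u;   /* write result, no flag updates, register source */
    uint32_t cond = 0xFu;   /* always execute: no conditionals in this sketch */

    return (opcode << 26) | (zcri << 22) | (cond << 18) | (dst << 9) | src;
}

/* Build a compact opcode in the hypothetical layout above. */
uint16_t pack16(unsigned opcode, unsigned dst, unsigned src) {
    assert(opcode < 64 && dst < 32 && src < 32);
    return (uint16_t)((opcode << 10) | (dst << 5) | src);
}
```

Done in PASM, each shift, mask and OR in `expand16` is a separate instruction executed for every compact opcode, which is the overhead a dedicated expansion instruction would avoid.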
  • AleAle Posts: 2,363
    edited 2009-11-08 16:13
    bit field extraction...
    I'd go for:

    mul, mul, mul! Did I say mul already?
    div... with a step instruction like in the SuperH I'd be happy.
    autoincrement/decrement definitely. I'd even go for base+index ;)

    byte transfer, any byte to any byte. That'd rock! like movs and movd but for bytes...
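
Two of these wishes are easy to state precisely in C. The sketch below models bit field extraction and an any-byte-to-any-byte transfer; the function names, argument order and byte numbering (0 = least significant) are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits of value starting at bit `lsb`, right-justified. */
uint32_t extract_field(uint32_t value, unsigned lsb, unsigned width) {
    assert(width >= 1 && lsb < 32 && width <= 32 - lsb);
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu
                                  : (((uint32_t)1 << width) - 1u);
    return (value >> lsb) & mask;
}

/* Copy byte src_index of src into byte dst_index of dst,
   leaving the other three bytes of dst unchanged. */
uint32_t move_byte(uint32_t dst, unsigned dst_index,
                   uint32_t src, unsigned src_index) {
    assert(dst_index < 4 && src_index < 4);
    uint32_t byte  = (src >> (8u * src_index)) & 0xFFu;
    uint32_t shift = 8u * dst_index;
    return (dst & ~((uint32_t)0xFF << shift)) | (byte << shift);
}
```

MOVS and MOVD do something similar today, but only for the 9-bit source and destination fields of a long.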
