Prop II: New Instructions suggested — Parallax Forums

Cluso99 Posts: 18,069
edited 2010-07-01 11:00 in Propeller 1
I have had this post sitting on my desk, adding bits as I think of them. Please understand it is a bit disjointed as I think of other ideas. I won't waste the time trying to rewrite it, so here it is, warts and all.
Bill said...
movbxy dst,src - move byte Y (0..3) of src into byte X of dst
movwxy dst,src - move word Y (0,1) of src into word X of dst
This is what I believe Bill is trying to achieve. Generally we are trying to pack bytes into longs or unpack them again. Sometimes, though, we are reading bytes from an external source (SRAM) and this may be on pins other than P0-7 (or P24-31 on RamBlade). So for packing, we need to be able to shift them to a position within the long, clear the other bits, and then OR them into a long. We have similar issues for extracting them. The current method of AND to remove the bits we don't want, SHX to shift the bits into the correct position, and OR to add the bits into a destination takes both time and code space. We often also do the reverse: MOV to take a copy, AND to remove the unwanted bits, and SHX to shift the bits. This may be followed by an OR.
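
For reference, here is a minimal Python model of what movbxy and movwxy would do, as described in Bill's two lines above. Python is used only for modelling, since these instructions do not exist. The function names `movb` and `movw` are made up for this sketch, and it assumes the other bytes or words of dst are left unchanged, which the original description does not state.

```python
MASK32 = 0xFFFFFFFF

def movb(dst, src, x, y):
    """Model of the proposed movbxy: copy byte y (0..3) of src into byte x (0..3) of dst."""
    byte = (src >> (8 * y)) & 0xFF
    # Assumption: the other three bytes of dst are preserved.
    cleared = dst & ~(0xFF << (8 * x)) & MASK32
    return cleared | (byte << (8 * x))

def movw(dst, src, x, y):
    """Model of the proposed movwxy: copy word y (0..1) of src into word x (0..1) of dst."""
    word = (src >> (16 * y)) & 0xFFFF
    # Assumption: the other word of dst is preserved.
    cleared = dst & ~(0xFFFF << (16 * x)) & MASK32
    return cleared | (word << (16 * x))
```

For example, `movb(0x11223344, 0xAABBCCDD, 3, 0)` puts the low byte of src into the top byte of dst.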

So to pack into a long...
   AND dst,src   - extract 8 bits by ANDing with a 32-bit source
   SHX dst,#xx   - shift into correct position
   MOV dst2,dst  - copy to the final destination (dst2)
   ....          - obtain the next byte (maybe an INA instruction)
   AND dst,src   - extract next 8 bits by ANDing with a 32-bit source
   SHX dst,#xx   - shift into correct position
   OR  dst2,dst  - OR into the final destination (dst2)
   ....          - obtain the next byte (maybe an INA instruction)
   AND dst,src   - extract 3rd set of 8 bits by ANDing with a 32-bit source
   SHX dst,#xx   - shift into correct position
   OR  dst2,dst  - OR into the final destination (dst2)
   ....          - obtain the next byte (maybe an INA instruction)
   AND dst,src   - extract 4th set of 8 bits by ANDing with a 32-bit source
   SHX dst,#xx   - shift into correct position
   OR  dst2,dst  - OR into the final destination (dst2)
   ...           - dst2 now contains 4 bytes in 1 long
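
The AND / SHX / OR sequence above can be sketched in Python to make the cost visible: every byte needs a mask, a shift, and a merge. This is only a model under stated assumptions. `pack_long`, `samples` and `pin_base` are hypothetical names, the samples stand in for four INA reads, and the first byte is assumed to land in the lowest byte of the long.

```python
def pack_long(samples, pin_base=24):
    """Pack four 8-bit values, each sitting at bits pin_base..pin_base+7
    of a 32-bit sample, into one long (first sample in the lowest byte)."""
    if len(samples) != 4:
        raise ValueError("expected exactly four samples")
    mask = 0xFF << pin_base
    result = 0
    for i, ina in enumerate(samples):
        tmp = ina & mask                      # AND: drop the unwanted bits
        target = 8 * i                        # bit position of byte i in the long
        if target >= pin_base:                # SHL when the byte must move up
            tmp = (tmp << (target - pin_base)) & 0xFFFFFFFF
        else:                                 # SHR when the byte must move down
            tmp >>= pin_base - target
        result |= tmp                         # MOV for the first byte, OR afterwards
    return result
```

With the bytes on P24-31 (the RamBlade case), three of the four bytes need a right shift and the last needs none.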

Currently we have movs, movd and movi for moving 9bit fields into source, destination and instruction fields. We do not have the reverse to extract the destination or instruction fields although that is not often required anyway.

I love Bill's concept. But where are those instructions going to come from? The movbxy will require 4 extra bits and the movwxy will require 2 extra bits. This means finding an extra 20 opcodes (16 + 4). I understand what you are getting at, so is there another way?

Perhaps the destination [or the source] could be a fixed location (i.e. one of the special registers in $1Fx-$1FF, or pointed to by one of those registers) so that the DST field could be used to hold these extra bits. This may even permit other special op codes to be added here, as 1 opcode would allow 512 sub opcodes (being all 9 bits of the destination). If I understand Chip's concept of the new 6 special registers correctly, then perhaps one of the pointer registers could be used to point to the destination within the cog. Perhaps then, the top bit of the DST field could be used to optionally increment this register.

If my method here could be used, then perhaps 2 more similar instructions (6 opcodes) could be added within this framework to..

    extractbx src - zeros all other bits except the byte x (0..3) of the src
    extractwx src - zeros all other bits except the word x (0..1) of the src
Of course these can be done with AND $00FF0000 etc, but they do save having constants wasting cog space.

Perhaps just combining the AND and SHIFT instructions could help significantly. It would be destructive to the source, but since only 1 operand needs to be specified, the source could be used for the bits to determine..... byte/word, offset, shift and direction
   extract dst,#000wyydss (immediate value shown) ... see alternative version below, XSR & XSL
                 000: unused bits - could be used by another set of instructions
                   w: 1=word, 0=byte
                  yy: offset of required byte (0..3) or word (0..1)
                   d: direction of shift, 1=SHL (shift left), 0=SHR (shift right)
                  ss: shift of required byte (0..3) or word (0..1)

Looking at the current instruction set, there are a number of single operand fields where the src is used to specify parameters. In particular, the ROR, ROL, SHR, SHL, RCR, RCL, SAR, REV use the src as a 5 bit shift/bit specification. Perhaps these instructions could now share a common single instruction with the "extract" above, and perhaps some others, by using the src bits (5..8) as follows (immediate values shown)
   ROx dst,#000_d_bbbbb - ROR/ROL, d=direction (1=left, 0=right), bbbbb=bits to shift (0..31)
   SHx dst,#001_d_bbbbb - SHR/SHL, d and bbbbb as above
   RCx dst,#010_d_bbbbb - RCR/RCL, d and bbbbb as above
   SAR dst,#011_1_bbbbb - SAR, direction=1=right, bbbbb as above
   REV dst,#011_0_bbbbb - REV, reverse the 32-S[4..0] bottom bits in D and zero extend
   XSx dst,#100_d_wyyss - XSR/XSL, EXTRACT+SHIFT, d=direction (1=left, 0=right),
                          yy=offset of required byte (0..3) or word (0..1),
                          ss=shift of required byte (0..3) or word (0..1)
The above only uses 1 instruction and leaves #101..#111 free for other single operand instructions. So, it actually frees up 7 existing op codes.
Of course, I have no idea how much this requires in silicon, nor what delays in the design it may have. Note the impact on existing software that has the upper bits [8..5] of the source set.
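
The shared encoding in the table above can be sketched as a decoder in Python. This is only a model of the proposal as written: `decode` is a hypothetical helper, the groups #101..#111 are reported as unassigned, and the mnemonic strings are just labels.

```python
def decode(field):
    """Decode the proposed 9-bit source field #ggg_d_bbbbb.

    Returns (mnemonic, operand). For XSR/XSL the operand is (w, yy, ss).
    """
    group = (field >> 6) & 7      # bits 8..6 select the instruction family
    d = (field >> 5) & 1          # bit 5 is the direction (1=left, 0=right)
    low = field & 0x1F            # bits 4..0
    if group in (0, 1, 2):
        base = ("RO", "SH", "RC")[group]
        return base + ("L" if d else "R"), low
    if group == 3:
        # The direction bit is reused to tell SAR (1) from REV (0).
        return ("SAR" if d else "REV"), low
    if group == 4:
        w = (low >> 4) & 1
        yy = (low >> 2) & 3
        ss = low & 3
        return "XS" + ("L" if d else "R"), (w, yy, ss)
    return "unassigned", low
```

Under this scheme any existing code that relies on the shift instructions ignoring the upper source bits would decode as a different operation, which is the compatibility concern raised above.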



▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

  Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
  Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
  Prop Tools under Development or Completed (Index)
  Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
  Prop OS: SphinxOS, PropDos, PropCmd
  Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz  MultiBlade Props: www.cluso.bluemagic.biz

Comments

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2010-06-29 06:17
    Beau is already laying out silicon. Is it even practical to be suggesting new opcodes at this stage of the game?

    I'm putting my full faith and confidence in Chip's prior judgement on these issues. After all, it was his singular vision that gave us the Prop I. I would wish neither to cloud that vision nor to suggest anything that -- notwithstanding any obvious merits -- would further impede the Prop II's gestation.

    -Phil
  • Cluso99 Posts: 18,069
    edited 2010-06-29 07:00
    Personally Phil, I am in agreement with you as I am certain any changes will delay the PropII even further.

    However, since Bill raised it with Chip (http://forums.parallax.com/showthread.php?p=918739), I thought I would throw out some alternatives.

    So, having said that, I would be happier if Chip actually ignores any requests and just gets us PropII. In fact, since the PropII is still so far away, I would prefer the Prop 1.5 (64 I/O) version (no other changes folks, as I believe the layout has been complete for some time).

  • mctrivia Posts: 3,772
    edited 2010-06-29 07:03
    I thought there was an error in the Prop 1.5 code, which is why it never came out.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Lots of propeller based products in stock at affordable prices.
  • Cluso99 Posts: 18,069
    edited 2010-06-29 09:03
    As far as I understood it was in the validation software and I presume that had been resolved.

  • hinv Posts: 1,255
    edited 2010-06-29 14:42
    If you are talking about the port B prop, I wish we had taken that option as an interim. Back then PropII was a year off. I think I would like that option still, since prop2 is still a year off. I doubt that they got the software problem with the port B prop fixed yet though.
    Prop II looks to be something quite a bit different than the current prop. Much more powerful, but much more complex.
    A working 4 clock mul and div and port B would be all that is needed to propel the prop 1.5 to new heights!
    I think many of us would even like a 64i/o prop without mul and div, especially since it would be lower power.

    Does the new prop2 have a single cycle mul and div?
  • Leon Posts: 7,620
    edited 2010-06-29 15:15
    I'd say that it is at least two years off. A shuttle run in a year doesn't mean working silicon, it's just for prototyping: several customers' designs are put on one wafer.

    Single-cycle division isn't feasible, unlike multiplication.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Leon Heller
    Amateur radio callsign: G1HSM

    Post Edited (Leon) : 6/29/2010 3:30:44 PM GMT
  • LeoD Posts: 4
    edited 2010-06-29 15:27
    I agree. Prop 1.5 with 32 extra io's and 4 cycle multiply would be just awesome !!! As great as Prop is, these are the things I've always wished Prop had. Sram interfacing without running out of io, and most importantly some muscle to do more DSP-like stuff. I believe it would carry us quite a distance !

    Leo
  • potatohead Posts: 10,261
    edited 2010-06-29 16:03
    I wonder if divide couldn't be done in parallel, like a waitvid is... You execute it, get the cog back to be crunching on stuff, then X instructions later, the source register ends up valid?

    DIV Dest, Source WR
    nop
    nop 'Do other things while the result of the division is rendered
    nop

    CMP Dest, Value

    Given how some other things work in a Prop, why not?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Wiki: Share the coolness!
    8x8 color 80 Column NTSC Text Object
    Wondering how to set tile colors in the graphics_demo.spin?
    Safety Tip: Life is as good as YOU think it is!

    Post Edited (potatohead) : 6/29/2010 4:08:07 PM GMT
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2010-06-29 16:11
    If there were no Prop II "a year (or two) off", the Prop 1.5 would definitely be a value proposition -- now and for the foreseeable future. I wonder, though, how well it would line up between the Prop I and Prop II in terms of price and performance. It would be a high pin-count device, like the Prop II, but with lower performance. There's a danger that it would have to be priced higher than the Prop II (unless the Prop I price dropped significantly), but its only advantage would be lower power consumption. As a consequence, once the Prop II comes along, I doubt that it would enjoy an exploitable market niche.

    I would love to have a 64-port Prop 1.5, too, and it may have made economic sense in 2007. But what might have been a welcome extra point back then would be a mere punt at this stage of the game. I suspect Parallax has done the math and is shaking their heads over the results.

    -Phil
  • potatohead Posts: 10,261
    edited 2010-06-29 16:18
    Great point Phil. I agree.

  • Beau Schwabe Posts: 6,568
    edited 2010-06-29 16:31
    Cluso99,

    "As far as I understood it was in the validation software and I presume that had been resolved." ...

    We have another layout guy that has been working under contract to put together the Prop 1.5 (as it is being called.. not an official name I don't think). I do kind of like the name Prop I.5 (pronounced Eye-Dot-Five) ... but oh well
    Early on I had volunteered my layout resource to help find some errors that our other layout guy was experiencing, since Chip was still in the early schematic design of Prop II and for the moment I didn't have any layout to put together ... I found several errors and suggested corrections to the other layout guy, but could not make the corrections on my end since I was geared for another, smaller process. It's one thing to do an LVS (Layout Versus Schematic) check for another process, but it's a different game when applying DRC (Design Rule Check) validations.

    Some of the problems are/were getting the tool to recognize that the layout and schematic correlated in a 1 to 1 relationship. As the design can change, so can the hierarchy of certain parts of the design, and although the physical layout is identical, the encapsulation that would define different levels of hierarchy can end up breaking the LVS. (<--- The tool becomes confused) ... In many cases the tool can resolve small hierarchical discrepancies at higher levels, but for best results the layout and schematic should always follow a 1 to 1 correspondence.

    I have been out of the loop with the 1.5 version of the Prop, and what its latest status is, simply because I have had my own plate of layout to work on that involves the Prop II.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Beau Schwabe

    IC Layout Engineer
    Parallax, Inc.

    Post Edited (Beau Schwabe (Parallax)) : 6/29/2010 4:36:42 PM GMT
  • Leon Posts: 7,620
    edited 2010-06-29 16:40
    potatohead said...
    I wonder if divide couldn't be done in parallel, like a waitvid is... You execute it, get the cog back to be crunching on stuff, then X instructions later, the source register ends up valid?

    DIV Dest, Source WR
    nop
    nop 'Do other things while the result of the division is rendered
    nop

    CMP Dest, Value

    Given how some other things work in a Prop, why not?

    Microchip has a clever way of doing a fast divide on their 16-bit MCUs; they have special hardware that does a signed or unsigned 16-bit division in 18 clocks, which is pretty good. 16x16 bit multiply is only one clock, of course.

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2010-06-29 17:03
    Apparently, my take on the Prop "I.5" was a little off. ;)

    -Phil
  • jazzed Posts: 11,803
    edited 2010-06-29 17:40
    I thought Prop 1.5 was a smaller Propeller :)

    Propeller: Today's definition
    TurboProp: (PropII) 8 COGS, 96 IO, 256KB+ HUB, QFP128
    name?: Today's definition + 32 IO B port
    name?: 2 COGS, < 32 IO, 16KB HUB, TSOP28/PDIP28

    I agree that having instructions that do the reverse of movs, etc. would be good.
    Wouldn't want to see too much more fab-out delay though :)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Pages: Propeller JVM
  • Dave Hein Posts: 6,347
    edited 2010-06-29 17:45
    Beau Schwabe (Parallax) said...

    We have another layout guy that has been working under contract to put together the Prop 1.5 (as it is being called.. not an official name I don't think). I do kind of like the name Prop I.5 (pronounced Eye-Dot-Five) ... but oh well
    If we use Roman numerals, wouldn't it be Prop I.V? Or would it be Prop I V/X?
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2010-06-29 21:24
    The 1.5 could be named like Boeing names planes, i.e. "PropX Stretch".

    Or "MaxiProp" and "MiniProp" would work for the 64-port and four-cog (two aren't enough) shrink versions, respectively. I do like "TurboProp" for the Prop II. It's not without precedent, either, since the SX had a "turbo" mode with one instruction per clock.

    -Phil
  • pharseid Posts: 192
    edited 2010-06-29 21:26
    I think a TTL Databook from Texas Instruments in the late 80's had the algorithms for doing hardware divides and multiplies. It gives excellent insight into what goes on in an integer hardware divide. There's a little setup, then the heart of the algorithm produces one bit of the quotient per step.
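
    As an illustration of "one bit of the quotient per step", here is the textbook restoring (shift and subtract) divide modelled in Python. It is not taken from the TI databook, which I have not checked; `divide_step` and `udiv` are names made up for this sketch.

```python
def divide_step(remainder, quotient, divisor, bits=32):
    """One step of restoring division: produces one quotient bit."""
    mask = (1 << bits) - 1
    # Shift the remainder:quotient pair left by one bit.
    remainder = (remainder << 1) | (quotient >> (bits - 1))
    quotient = (quotient << 1) & mask
    # Trial subtract; keep the result only if it does not go negative.
    if remainder >= divisor:
        remainder -= divisor
        quotient |= 1
    return remainder, quotient

def udiv(dividend, divisor, bits=32):
    """Unsigned divide built from `bits` single-bit steps."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    # Setup: the quotient register starts out holding the dividend.
    remainder, quotient = 0, dividend & ((1 << bits) - 1)
    for _ in range(bits):
        remainder, quotient = divide_step(remainder, quotient, divisor, bits)
    return quotient, remainder
```

    A 32-bit divide therefore takes 32 steps, which is why a single-cycle divide is so much harder than a single-cycle multiply.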

    The now ancient Novix Forth Engine also had an instruction which produced a step of an integer square root algorithm. Charles Moore included it in the instruction set because of his interest in 3D graphics (I assume the square root would be for computing distances in 3D). While I don't expect this to be included in any version of the prop, with the inclusion of a 3D shading instruction on the Prop ][, it might be a good idea for somebody to dig up that algorithm and implement it in PASM, to do collision detection.
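
    For anyone wanting a starting point, here is the common bit-by-bit integer square root modelled in Python. To be clear, this is the standard shift and subtract method, not the Novix step instruction, whose details I have not verified; `isqrt32` is a name made up for this sketch.

```python
def isqrt32(n):
    """Integer square root of a 32-bit unsigned value, one result bit per loop pass."""
    root = 0
    bit = 1 << 30                 # highest power of four that fits in 32 bits
    while bit > n:
        bit >>= 2
    while bit:
        if n >= root + bit:
            n -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root
```

    A 3D distance check for collision detection would then be `isqrt32(dx*dx + dy*dy + dz*dz)`, provided the sum fits in 32 bits.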
  • jazzed Posts: 11,803
    edited 2010-06-30 01:14
    Phil Pilgrim (PhiPi) said...
    The 1.5 could be named like Boeing names planes, i.e. "PropX Stretch".

    Or "MaxiProp" and "MiniProp" would work for the 64-port and four-cog (two aren't enough) shrink versions, respectively. I do like "TurboProp" for the Prop II. It's not without precedent, either, since the SX had a "turbo" mode with one instruction per clock.

    -Phil

    I've heard that a 4 COG shrink version would not be much cheaper, smaller, or less power hungry than the current 8 COG Propeller and thus doesn't make much sense.

    I flew on a McDonnell Douglas stretch-80 once. Considering how loud the engines were, it should have been called a screech-80 ;) ... my worst flight ever.

    Cheers.
    --Steve

  • Cluso99 Posts: 18,069
    edited 2010-06-30 02:20
    AFAIK a Prop 1 with fewer cogs would be no cheaper because price is to do with volume. Even if it were possible to package the Prop 1 or 1.5 in a skinny DIP28 (may not be possible due to die size), it would be no cheaper. It would just have 20 I/Os brought out (P0..16 & P28..31). Each cog uses virtually no power if inactive, so nothing is saved here with fewer cogs. Would Parallax sell any extra Props if it had a DIP28 for the same price as the DIP40??? I doubt it.

    IMHO the best way for a Prop 1.5 would be that it would be a Prop 1 rev B silicon. It would totally replace Prop 1 rev A silicon. The new Prop 1 (or 1.5) rev B would come in DIP40, QFP44, QFN44 (with same footprint as Rev A, so only 32 I/O accessible, but internally another 32 I/O which could be used to communicate between cogs), and QFP84 and/or QFN84 (with the extra 32 I/O). This would allow the volume of Prop 1 & 1.5 to effectively be combined to allow the price to continue to drop.
    Once we ask for extras in Prop 1.5 (other than the extra 32 I/O) you may as well wait for Prop II. Prop 1.5 is close and any other extras would just delay it like Prop II.

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2010-06-30 02:33
    Cluso99 said...
    ... a Prop 1 rev B silicon. It would totally replace Prop 1 rev A silicon .... but internally another 32 I/O which could be used to communicate between cogs
    Yes, that would be very nice! It would also provide a means for linking certain counter functions without wasting external pins, assuming port A and port B can be mixed in the same counter configuration.
    Cluso99 said...
    Prop 1.5 is close...
    Is it? I'd like to believe that it were so. I guess we'll just have to be patient and see what transpires. (I'm still puzzled that Parallax would pursue it at this late date, unless they're planning significant price reductions across the Prop I line.)

    -Phil
  • Cluso99 Posts: 18,069
    edited 2010-06-30 02:53
    When I said Prop 1.5 is close... I didn't realise my statement was ambiguous. I meant that the design completion is close. This has nothing to do with production. I have no inside information about whether Parallax is pursuing the Prop 1.5 or not.

    IMHO Parallax should do the Prop 1.5 given that Prop II is still a year away. This would open up the Prop 1 market and give us some breathing space until the Prop II arrives. IIRC I understood that the Prop 1.5 could take about 3 months to silicon availability.

    Phil: I had not thought about concatenating the counters. I suppose this would be able to be done. I was more interested in having the extra I/O for adding external SRAM (of course) and also some pins could be used internally to pass say a byte between cogs, or flags, etc. I am sure they would be put to good use.


    Post Edited (Cluso99) : 6/30/2010 3:00:43 AM GMT
  • Leon Posts: 7,620
    edited 2010-06-30 06:09
    The shuttle run is a year away, not working silicon.

  • hinv Posts: 1,255
    edited 2010-06-30 06:23
    @Cluso99,

    To me, that is a great idea. If the prop1B or prop1.5 is 100% compatible with the original prop, then the production of the original prop could be dropped without affecting current products.
    Could it work that way in real life without any customers having a legitimate complaint?

    Doug
  • Cluso99 Posts: 18,069
    edited 2010-06-30 07:03
    hinv: It is not quite compatible. The interpreter would remain identical. The real difference is in the way the I/O pins are addressed.

    AFAIK the WAITPEQ & WAITPNE are the only instructions affected. If C is clear it reads P0-31 and if C is set it reads P32-63. Many of us have been remiss (in PASM) in not bothering to ensure C is in a known state for these instructions, and we may be relying on it later in our code.

  • Ale Posts: 2,363
    edited 2010-06-30 09:09
    There was also no "shuttle run" for prop 1.5 so far... so it is as far in the future as propII... maybe a bit sooner if the first silicon works out-of-the-box :(

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Visit some of my articles at Propeller Wiki:
    MATH on the propeller propeller.wikispaces.com/MATH
    pPropQL: propeller.wikispaces.com/pPropQL
    pPropQL020: propeller.wikispaces.com/pPropQL020
    OMU for the pPropQL/020 propeller.wikispaces.com/OMU
    pPropellerSim - A propeller simulator for ASM development sourceforge.net/projects/ppropellersim
  • Bill Henning Posts: 6,445
    edited 2010-06-30 13:57
    Hi Cluso99,

    I agree with parts of your great analysis - and disagree with others.

    1) I agree, 20 instructions is too much, no place to fit them. What I described is what would be ideal, but practicality can reduce it to something *almost* as useful that can fit.

    2) I don't think fixing it to I/O register range is a good idea, and I think a combined and/shift may need two cycles, depending on how fast the barrel shifter and ALU are.

    After thinking about your great analysis, I think the way to go is to use INDIRA and INDIRB

    then:

    MOVEB INDIRB[0-3],INDIRA[0-3]

    and

    MOVEW INDIRB[0-1],INDIRA[0-1]

    fit in a single opcode, and only need 4 bits and 2 bits respectively; so it fits Chip's instruction extension scheme and does not use a single "mainline" instruction

    As I recall, INDIR* have auto-increment modes, so this would work well


    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
    My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
    and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
    Las - Large model assembler Largos - upcoming nano operating system
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-06-30 14:06
    Phil Pilgrim (PhiPi):

    Of course it is up to Chip what he adds and what he does not! I just pointed out the usefulness of such instructions, and he agreed that they would be very useful, and would try to fit them if feasible.

    Beau is only finished with the I/O pads, as far as I know.

    Cluso99:

    Chip is constantly tweaking the design; they are not at a "freeze" stage yet. Since the last time it was "final", 24 bit color, 1080P, and SDRAM support have shown up.

    I think Chip is trying to make it the best possible (ahem) chip, and would rather take a bit longer.

    As far as P1.5 goes, my understanding is that there is a bug in the CADCAM layout package which prevents verification; and they are working with the CADCAM vendor to fix it. The bug is not resolved yet, as of Saturday at UPEW.

    hinv:

    I really would like P1.5 as well - it would be lower power than P2, and it could be out relatively soon. This would hold us over until P2 shows up.

    I think P1.5 would be an excellent intermediate device; I for one could really use the additional I/Os.

    Leon:

    Unfortunately I suspect you may be right - unless we are all very lucky and the shuttle run is close to perfect.

    LeoD, potatohead:

    P1.5 would not have multiply or divide; it would be exactly the same as P1, but implement PORTB.

    Phil Pilgrim (PhiPi):

    I respectfully disagree. I'd like a P1.5 even if P2 came out on the same day. For a lot of designs P2 is a huge overkill, and P1.5 would be much lower powered.

    I think the market would easily support

    P1 ~$6
    P1.5 ~$8
    P2 ~$10

    in qty. 1

  • Bill HenningBill Henning Posts: 6,445
    edited 2010-06-30 14:10
    I REALLY like this idea!
    Cluso99 said...
    AFAIK a Prop 1 with fewer cogs would be no cheaper because it is to do with volume. If it were possible to package the Prop 1 or 1.5 in a skinny DIP28 (may not be possible due to die size) then it would be no cheaper. It would just have 20 I/Os brought out (P0..16 & P28..31). Each cog uses virtually no power if inactive, so nothing saved here with fewer cogs. Would Parallax sell any extra Props if it had a DIP28 for the same price as the DIP40??? I doubt it.

    IMHO the best way for a Prop 1.5 would be that it would be a Prop 1 rev B silicon. It would totally replace Prop 1 rev A silicon. The new Prop 1 (or 1.5) rev B would come in DIP40, QFP44, QFN44 (with same footprint as Rev A, so only 32 I/O accessible, but internally another 32 I/O which could be used to communicate between cogs), and QFP84 and/or QFN84 (with the extra 32 I/O). This would allow the volume of Prop 1 & 1.5 to effectively be combined to allow the price to continue to drop.
    Once we ask for extras in Prop 1.5 (other than the extra 32 I/O) you may as well wait for Prop II. Prop 1.5 is close and any other extras would just delay it like Prop II.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2010-06-30 14:53
    Bill,

    I think you've illustrated the point I was trying to make about the P1.5: For it to fit into the product lineup, the price of the P1 would have to drop (or that of the P2 would have to be higher than the projected $10) to make room for it. Having a P1.5 priced near or above the P2's price might be a tough sell. The lower power consumption is an advantage for the P1 family, of course, but not enough of one to be competitive with the P2 at the same price.

    At some point in the P2's development, market forces will have to overcome those of engineering, and it will have to be deemed "good enough"; else it will never get finished. If we're not at that point now, we probably should be.

    -Phil
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-06-30 14:59
    I agree, it should be frozen as soon as possible, and pricing has to make room for P1.5 :)