Should the next Propeller be code-compatible? - Page 3 — Parallax Forums

Should the next Propeller be code-compatible?


Comments

  • cgraceycgracey Posts: 14,256
    edited 2008-08-28 16:39
    jazzed said...

    ...will you have any kind of post increment or other variations? This has been requested/suggested before many times.

    Yes, and it's already been implemented in the FPGA design. It uses the same RDxxxx/WRxxxx instruction codes, but is activated when S is immediate and S[8] is high (as if you were going to immediately access $100..$1FF, which probably nobody has ever done, as immediate accesses tend to be focused on locations $000..$00F). The S coding looks like this:

    %1_SUP_XXXXX

    1 = Trigger pointer addressing, not immediate $000..$0FF addressing
    S = Select pointer (0 = PTRA, 1 = PTRB)
    U = Update PTRx (0 = don't update PTRx, 1 = add scaled index to PTRx)
    P = Pre/Post usage for addressing (0 = use PTRx plus scaled index, 1 = use PTRx)
    X = MSB-extended index (-16..+15) which gets scaled according to xxBYTE/xxWORD/xxLONG

    Here's how you use them:

                   SETPTRA D              'set PTRA to D
                   SETPTRB D              'set PTRB to D

                   GETPTRA D              'get PTRA into D
                   GETPTRB D              'get PTRB into D

                   RDBYTE  D,PTRA         'read byte at PTRA into D (S = %1000_00000)
                   RDWORD  D,PTRB[10]     'read word at PTRB+10*2 into D (S = %1100_01010)
                   RDLONG  D,PTRA[-4]     'read long at PTRA-4*4 into D (S = %1000_11100)
                   RDLONG  D,PTRB[--1]    'read long at PTRB-1*4 into D, subtract 1*4 from PTRB (S = %1110_11111)
                   RDBYTE  D,PTRA[++3]    'read byte at PTRA+3*1 into D, add 3*1 to PTRA (S = %1010_00011)
                   RDBYTE  D,PTRA[1++]    'read byte at PTRA into D, add 1*1 to PTRA (S = %1011_00001)
                   WRWORD  D,PTRA[2--]    'write D to word at PTRA, subtract 2*2 from PTRA (S = %1011_11110)

    Both PTRA and PTRB get initialized to PAR.
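
A minimal Python sketch of the S coding above, for anyone who wants to check the example encodings by hand. Only the bit layout (%1_SUP_XXXXX) and the scaling rule come from the post; the function names, the dictionary keys, and the idea of modelling one access as "address used, new pointer value" are assumptions made for illustration, not anything from the actual FPGA design.

```python
# Hypothetical model of the 9-bit pointer-mode S field %1_SUP_XXXXX.
# Names are illustrative; only the bit layout is taken from the post.

SCALE = {"BYTE": 1, "WORD": 2, "LONG": 4}

def decode_ptr_field(s, size):
    """Decode a 9-bit S field for an xxBYTE/xxWORD/xxLONG access."""
    if not (s >> 8) & 1:
        raise ValueError("bit 8 clear: plain immediate address, not pointer mode")
    index = s & 0x1F
    if index & 0x10:              # sign-extend the 5-bit index to -16..+15
        index -= 32
    return {
        "ptr": "PTRB" if (s >> 7) & 1 else "PTRA",   # S bit
        "update": bool((s >> 6) & 1),                # U bit
        "post": bool((s >> 5) & 1),                  # P bit
        "offset": index * SCALE[size],               # scaled index
    }

def access(ptr, s, size):
    """Return (address used, new pointer value) for one access."""
    f = decode_ptr_field(s, size)
    target = ptr + f["offset"]
    address = ptr if f["post"] else target    # P=1: use PTRx unmodified
    new_ptr = target if f["update"] else ptr  # U=1: add scaled index to PTRx
    return address, new_ptr

# RDLONG D,PTRB[--1] with PTRB = 100: read at 96, PTRB becomes 96
print(access(100, 0b1110_11111, "LONG"))
# RDBYTE D,PTRA[1++] with PTRA = 100: read at 100, PTRA becomes 101
print(access(100, 0b1011_00001, "BYTE"))
```

Feeding the seven S values from the listing through `decode_ptr_field` is a quick way to confirm that the pointer, update, pre/post and index fields line up with the comments.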

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.

    Post Edited (Chip Gracey (Parallax)) : 8/28/2008 5:02:56 PM GMT
  • Beau SchwabeBeau Schwabe Posts: 6,568
    edited 2008-08-28 16:50
    heater,

    "Where did all that extra silicon magically appear from ?" - The current Propeller uses a 350nm process, while the Propeller under development is being done in a 180nm process.
    If all of the transistors, capacitors, resistors, etc. scaled 1:1 between processes, you would basically have a real estate gain of 3.78 times the current design. Unfortunately,
    the components don't all scale the same way, but by slightly altering the design and making functional improvements based on characteristics of the different process, you can get close.
    The current die is about 6mm square... the new die will be slightly larger.
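
The 3.78 figure is the square of the ratio of the two feature sizes. A small Python sketch of that arithmetic follows; the function name is made up for illustration, and the ideal 1:1 scaling it assumes is exactly the case the post says does not fully hold in practice.

```python
def area_gain(old_nm, new_nm):
    """Ideal area gain when every feature shrinks linearly from old_nm to new_nm."""
    return (old_nm / new_nm) ** 2

gain = area_gain(350, 180)
print(f"{gain:.2f}")  # 3.78
```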

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Beau Schwabe

    IC Layout Engineer
    Parallax, Inc.
  • potatoheadpotatohead Posts: 10,261
    edited 2008-08-28 16:51
    Thanks Chip!

    And you are just damn cool for talking CPU design with us.

    Personally, I'm very excited about this:

    rep [32,3]        'repeat 3 instructions 32 times
    nop               'must execute two instructions here
    nop
    shl x,#1          'begin 3-instruction block
    cmpsub x,y wc
    rcl q,#1          'total cycles = 3 + 3*32 = 99

    !!!!

    Way to go on optimizing that COG instruction space! Unrolled loops without actually unrolling them -->at least that's what I'm seeing.
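
To make the cycle count and the repeated block concrete, here is a rough Python model. It assumes the current Propeller's semantics for the three instructions (SHL shifts left within 32 bits, CMPSUB subtracts only when D >= S unsigned and sets C when it does, RCL shifts C into bit 0) and one cycle per instruction as the quoted comment implies; the function names are hypothetical. Under the further assumption that x < y <= 2^31, so the shift never overflows, the 32 passes amount to a bitwise long division producing a 32-bit fraction.

```python
MASK = 0xFFFFFFFF

def rep_cycles(count, block_len, setup=3):
    """Cycles for REP plus its two following instructions, then the repeated block."""
    return setup + block_len * count

def rep_divide(x, y):
    """Model 32 passes of: shl x,#1 / cmpsub x,y wc / rcl q,#1."""
    q = 0
    for _ in range(32):
        x = (x << 1) & MASK            # shl x,#1
        carry = 1 if x >= y else 0     # cmpsub x,y wc
        if carry:
            x -= y
        q = ((q << 1) | carry) & MASK  # rcl q,#1
    return q, x                        # quotient bits, remainder

print(rep_cycles(32, 3))       # 99
print(hex(rep_divide(1, 4)[0]))  # 0x40000000, i.e. 1/4 as a 32-bit fraction
```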

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Wiki: Share the coolness!

    Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
  • jazzedjazzed Posts: 11,803
    edited 2008-08-28 17:06
    Excellent Chip!

    One more ... I've often wanted to do this: "WRLONG INA, ptr"
    Yes, no, maybe ?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-28 17:10
    Chip,

    That pointer addressing is too cool for words! Thanks for finding a way to make it happen!

    -Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • cgraceycgracey Posts: 14,256
    edited 2008-08-28 17:19
    jazzed said...
    Excellent Chip!

    One more ... I've often wanted to do this: "WRLONG INA, ptr"
    Yes, no, maybe ?

    I understand. That would mean making INA accessible via D. Not a problem, but you'll have six other instructions per hub cycle outside of the WRLONG, so you'll have plenty of time to do a 'MOV reg,INA'. If you want to quickly capture INA activity into cog RAM, you could do this:

             SETINDA buffptr   'set INDA's pointer to a 256-register buffer

    again    REP     [256,1]
             NOP               'put something useful instead of these two NOPs
             NOP
             MOV     INDA,INA  'repeats 256 times, auto-inc'ing and wrapping INDA's pointer

             'buff = 256 snapshots of INA'

             JMP     #again


    buffptr  PTRX    buff,256  'define circular buffer, same as LONG (buff+256-1)<<9 + buff
    buff     RES     256
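
A Python sketch of the circular-buffer descriptor, for illustration only. The packing formula is the one given in the comment above ((buff+256-1)<<9 + buff). Everything else is an assumption: the function names, the 9-bit field unpacking, and in particular the wrap rule (jump back to the start after writing the last register), which is a guess at how an auto-incrementing, wrapping INDA might behave.

```python
def ptrx(buff, size):
    """Pack the descriptor as the post's comment describes: last address above bit 9, start below."""
    return ((buff + size - 1) << 9) + buff

def unpack(value):
    """Split a descriptor into (start, last) 9-bit cog addresses."""
    return value & 0x1FF, (value >> 9) & 0x1FF

def capture(samples, buff, size):
    """Write samples through a wrapping pointer; the wrap rule is an assumption."""
    start, last = unpack(ptrx(buff, size))
    ram = {}
    ptr = start
    for sample in samples:
        ram[ptr] = sample                         # MOV INDA,INA
        ptr = start if ptr == last else ptr + 1   # auto-increment and wrap
    return ram, ptr

start, last = unpack(ptrx(0x40, 256))
print(hex(start), hex(last))  # 0x40 0x13f
```

With a 256-register buffer, any run longer than 256 samples simply overwrites the oldest entries, which matches the "256 snapshots of INA" description.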

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • rokickirokicki Posts: 1,000
    edited 2008-08-28 17:25
    Break compatibility. Let your ideas run wild. If we get even *one* neat thing by breaking compatibility, it's worth it.

    But at the same time, it would be cool to add conditional compilation to the IDE so we could make our code work on either.
  • cgraceycgracey Posts: 14,256
    edited 2008-08-28 17:26
    Phil Pilgrim (PhiPi) said...
    Chip,

    That pointer addressing is too cool for words! Thanks for finding a way to make it happen!

    -Phil

    Thank Paul Baker. He came up here one day and we got ALL SORTS of cool stuff figured out.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • jazzedjazzed Posts: 11,803
    edited 2008-08-28 17:36
    Awesome. So would this sample at 80MHz assuming 160MHz clock?
    Entry     ORG       0
     
              MOV       PTR, PAR
              MOV       T0,  #511
    :loop     WRLONG    INA, PTR[1++]
              NOP
              WRLONG    INA, PTR[1++]
              DJNZ      T0,  #:loop
     
    PTR       PTRX      1
    T0        RES       1
    
    

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve
  • cgraceycgracey Posts: 14,256
    edited 2008-08-28 17:45
    jazzed said...
    Awesome. So would this sample at 80MHz assuming 160MHz clock?
    Entry     ORG       0
     
              MOV       PTR, PAR
              MOV       T0,  #511
    :loop     WRLONG    INA, PTR[1++]
              NOP
              WRLONG    INA, PTR[1++]
              DJNZ      T0,  #:loop
     
    PTR       PTRX      1
    T0        RES       1
    
    

    No. Each WRLONG must wait for its hub turn that comes every 8 clocks (assuming 8 cogs). So, it would sample at 20MHz. If you want really high bandwidth you need to use cog ram. Hub ram could only keep up for some periodically-distilled results, but not the whole data stream.
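
The 20MHz figure follows directly from the hub turn. A small Python sketch of the arithmetic, assuming a cog gets one hub turn every N clocks when there are N cogs (the post only states the 8-cog case; the general rule and the function name are assumptions):

```python
def hub_sample_rate_mhz(clock_mhz, cogs):
    """Best-case rate of back-to-back hub writes from one cog: one per hub turn."""
    return clock_mhz / cogs

print(hub_sample_rate_mhz(160, 8))   # 20.0
print(hub_sample_rate_mhz(160, 16))  # 10.0
```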

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-28 17:55
    Chip Gracey said...
    ... assuming 8 cogs ...
    ??? Has the number of cogs not been decided yet, or are you counting active cogs? (I presume it's not the latter, since that would break determinism; but I had to ask.)

    -Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • cgraceycgracey Posts: 14,256
    edited 2008-08-28 18:07
    Phil Pilgrim (PhiPi) said...

    ??? Has the number of cogs not been decided yet, or are you counting active cogs? (I presume it's not the latter, since that would break determinism; but I had to ask.)
    Well, the other day Beau and I did some floorplan checking and 16 cogs would result in a die about 8mm on the edge, which is huge, so I've kind of been thinking 8. I think it would be bad to break determinism, in any case.

    Once I break from current compatibility with the FPGA, interpreter, and compiler, it's going to be a lot easier to move into the wild blue yonder. I came to the conclusion last night that the next order of business is to move to 32 bit addressing. This profoundly affects the compiler (now 8,153 lines of 80386 code), interpreter, and booter, not to mention the Windows app. It's an all-or-nothing prospect. This is going to be a difficult transition, but once made, the sky's the limit.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • jazzedjazzed Posts: 11,803
    edited 2008-08-28 18:17
    So we're back to this, assuming 160MHz?  10MHz sample rate?
                  movd      :wrdata,bptr            ' set pointer
                  mov       ndx,    #511            ' set count
    :wrdata       mov       0-0,    ina             '   0ns ... get data ... next is 100ns
                  add       bptr,   #1              '  25ns ... increment pointer
                  movd      :wrdata,bptr            '  50ns ... save data
                  djnz      ndx,    #:wrdata        '  75ns ... repeat until buffer done
                  
    
    ndx           res       1
    
    

    Or maybe this?  13.3MHz sample rate?
                  movd      :wrdata,bptr            ' set pointer
                  mov       ndx,    #511            ' set count

    :wrdata       mov       0-0,    ina             '   0ns ... get data ... next is 75ns
                  movd      :wrdata,bptr[1++]       '  25ns ... save data
                  djnz      ndx,    #:wrdata        '  50ns ... repeat until buffer done
    ptr           ptrx      1              
    
    ndx           res       1
    
    

    Apples to apples without REP which I don't exactly grasp yet. Looks like your REP example samples at 8MHz no?
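
For reference, the two rates in the listings above work out as follows. This Python sketch takes the 25ns-per-instruction figure from the timing comments in the code (it is the poster's working assumption for a 160MHz clock, not a confirmed spec), and the function name is made up:

```python
def loop_sample_rate_mhz(instructions, ns_per_instruction=25.0):
    """Sample rate of a cog loop that takes one sample per pass."""
    period_ns = instructions * ns_per_instruction
    return 1000.0 / period_ns

print(loop_sample_rate_mhz(4))            # 10.0  (4-instruction loop, 100ns)
print(round(loop_sample_rate_mhz(3), 1))  # 13.3  (3-instruction loop, 75ns)
```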


    BTW, how much of a schedule change does the 32-bit non-compatibility mode cause? I'll take the current design if the change has too much impact. Right now, you are ahead of the curve and have no competition in this class of microcontroller. Losing to the next leap-frog technology competitor can be devastating in the biggest markets.

    Thanks for entertaining our questions so far.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve
  • parskoparsko Posts: 501
    edited 2008-08-28 18:20
    Chip Gracey (Parallax) said...
    ... and 16 cogs would result in a die about 8mm on the edge, which is huge, so I've kind of been thinking 8.

    Huh? Did I read that correctly?

    In my best Arnold voice: "Wha' chu talkin bout Willis?"
  • heaterheater Posts: 3,370
    edited 2008-08-28 18:26
    Oh, heart sinking here. I had convinced myself from reading all the posts that you had found a magic way to get 16 COGs in there.
    Still, 8 gives us double the HUB access rate of 16, so not so bad.
    Just have to figure out how to get two threads (or more) running in a cog at a super lick with all those new instructions/modes.
    BUT hey, what about 12 COGs then? ;-)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    For me, the past is not over yet.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-28 18:37
    Chip,

    If you do limit it to eight, would there be any chance of including four counters per cog, instead of two? A program I wrote recently gobbled cogs, not because of processing requirements, but because it needed the counters. I know there are some addressing constraints in the cogs, so I don't know where you'd put CTRC .. PHSD without eating into code space or (shudder) banking them. (Actually banking might not be that bad if each cog had a writable address translation table for the SPRs. For example, if I'm using four counters, I probably don't need access to the video registers. And once the CTRXs are set up, I could shunt them out of address range as well, to gain access to INA and INB, say.)

    Barring that, is a non-power-of-two cog count (i.e. 12) out of the question? (I'm not saying any of this stuff is necessary; just trying to probe what's possible. :-) )

    -Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • QuattroRS4QuattroRS4 Posts: 916
    edited 2008-08-28 19:06
    Chip Gracey said...
    16 cogs would result in a die about 8mm on the edge, which is huge, so I've kind of been thinking 8.


    As it has been decided 32bits and a clean slate ... I was really hoping you'd say 16Cogs.
    If the slate is clean - go all out! I am sure no one here would mind if the PropII launch were delayed to facilitate it.

    Due to recent developments Will it be More Cogs AND Ram ? as opposed to more Cogs OR Ram ! ... oh to dream!

    I am getting carried away here !

    Regards,
    John Twomey

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Necessity is the mother of invention'

    Those who can, do.Those who can’t, teach.

    Post Edited (QuattroRS4) : 8/28/2008 7:43:01 PM GMT
  • cgraceycgracey Posts: 14,256
    edited 2008-08-28 19:18
    Phil Pilgrim (PhiPi) said...

    If you do limit it to eight, would there be any chance of including four counters per cog, instead of two?
    We have an identical issue with # of ports. I've been thinking that we should have a single set of INP, OUTP, DIRP, and ALTP (new, for analog configuration) registers for all 32-bit ports. There would be a selection mechanism via special instruction (4-port example):

           setport D      'mux port D[6..5] into INP/OUTP/DIRP/ALTP register spaces (D is a pin#)
           setport [n]    'mux port n (0..3) into register INP/OUTP/DIRP/ALTP spaces (n is a port#)

    We could do the same for however many CTRs we've got. This would also free up some special register space (4-counter example):

           setctr  D      'mux D[1..0] into CTR/FRQ/PHS register spaces
           setctr  [n]    'mux n (0..1) into CTR/FRQ/PHS register spaces

    Once we get to 32-bit addressing, everything will be malleable and it will be easy to do stuff like this. Putting in 16 cogs (and a mechanism for selecting up to 64) would be really simple. It's going to take several days to get there, though.
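
A rough Python model of the port-selection idea in the 4-port example, offered only as a sketch. The pin-to-port split (bits 6..5 of the pin number pick the port) comes from the setport comment; the class, its method names, and the notion that each port keeps its own bank behind one shared register window are assumptions for illustration.

```python
def split_pin(pin):
    """Split a pin number (0..127) into (port, bit within port) using D[6..5] and D[4..0]."""
    if not 0 <= pin <= 127:
        raise ValueError("pin number out of range for four 32-bit ports")
    return (pin >> 5) & 0b11, pin & 0x1F

class PortWindow:
    """Hypothetical model: one INP/OUTP/DIRP/ALTP window muxed onto several ports."""
    def __init__(self, ports=4):
        self.banks = [dict(INP=0, OUTP=0, DIRP=0, ALTP=0) for _ in range(ports)]
        self.selected = 0

    def setport_pin(self, pin):     # models: setport D   (D is a pin#)
        self.selected = split_pin(pin)[0]

    def setport(self, n):           # models: setport [n] (n is a port#)
        self.selected = n

    def write(self, reg, value):
        self.banks[self.selected][reg] = value & 0xFFFFFFFF

    def read(self, reg):
        return self.banks[self.selected][reg]

window = PortWindow()
window.setport_pin(70)              # pin 70 lives on port 2, bit 6
window.write("DIRP", 1 << 6)        # make that pin an output
print(split_pin(70))                # (2, 6)
```

The same selection scheme would carry over to the setctr case by swapping the register names for CTR/FRQ/PHS.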

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • Capt. QuirkCapt. Quirk Posts: 872
    edited 2008-08-28 19:33
    Well, the other day Beau and I did some floorplan checking and 16 cogs would result in a die about 8mm on the edge, which is huge, so I've kind of been thinking 8. I think it would be bad to break determinism, in any case.
    How big would a finished chip be with an 8mm sq die? Is the extra 4 sq mm mega bucks more? And which manufacturer or what chip(s) are you trying to compete with?
  • simonlsimonl Posts: 866
    edited 2008-08-28 19:45
    @Chip: I can count the number of uC manufacturers that would even contemplate this kind of consumer involvement on - well - NO hands! Kudos to you.

    FWIW: I'd have said take the clean-slate approach anyway, so thanks for that decision.

    What's all this about EIGHT COGs? Noooo! I'd pay extra for 16 - my visions for chip use would mostly need more than 8 COGs - I'm not clever enough to code multiple tasks into a COG, and just throwing a COG at a problem is THE appeal of the Propeller for me...

    Oh; and if the IDE's got to be re-written anyway, PLEASE add conditional compilation ;-)

    (BTW: How many pins are we gonna have on PII?)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Cheers,

    Simon
    www.norfolkhelicopterclub.co.uk
    You'll always have as many take-offs as landings, the trick is to be sure you can take-off again ;-)
    BTW: I type as I'm thinking, so please don't take any offense at my writing style :-)
  • Mike GreenMike Green Posts: 23,101
    edited 2008-08-28 19:59
    For sure, if you're going to scrimp on pins or cogs or counters, cleanly allow room for 2 or 4 times that many in the instruction set and control registers to avoid having to change code in the future as chip density and process technology continue to improve.
  • SapiehaSapieha Posts: 2,964
    edited 2008-08-28 20:13
    Hi Chip

    You had an open mind when you constructed PropI.
    Open it even more, to keep PropII from having a fault in its parallel processing power.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Nothing is impossible, there are only different degrees of difficulty.

    Sapieha
  • heaterheater Posts: 3,370
    edited 2008-08-28 20:27
    Sapieha: Please elaborate. What fault do you see? How should it be avoided?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    For me, the past is not over yet.
  • Erik FriesenErik Friesen Posts: 1,071
    edited 2008-08-28 20:30
    Hmm. Do I understand the above to mean a semi-PIC-style banksel?

    Well whatever but if you do it that way please make it easy to keep straight.
  • RaymanRayman Posts: 14,853
    edited 2008-08-28 20:31
    I think 8 faster cogs (by 8x) would be enough for me. Considering I need 4 to do SXGA with cursor and this could probably be done with just one 8x faster cog...

    Any way to have one cog (or the hub) access the unused counters in another cog?
  • SapiehaSapieha Posts: 2,964
    edited 2008-08-28 20:40
    Hi heater.

    Sorry for my bad English, but:

    1. LMM, which many people talk about, is not parallel processing at full COG speed. It is only semi-parallel!
    2. If code compatibility decreases parallel processing power, skip it!
    And there are many other aspects that decrease the COGs' capabilities.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Nothing is impossible, there are only different degrees of difficulty.

    Sapieha
  • hippyhippy Posts: 1,981
    edited 2008-08-28 20:41
    Chip Gracey (Parallax) said...
    16 cogs would result in a die about 8mm on the edge, which is huge, so I've kind of been thinking 8.

    I reckon there's going to be a lot of disappointment if that's the case as it seems everyone here has been thinking 16.

    If there's a RDTXFR and WRTXFR which will magically transfer longs between two Propellers connected using a single pin at high speed that would mitigate the number of Cogs needed by making multi-chip arrays easier. Something like that would be nice to have even if it wasn't a blindingly fast link.
  • Paul BakerPaul Baker Posts: 6,351
    edited 2008-08-28 20:45
    Indeed, each cog will be 8x faster, and heavier use of JMPRET can fold multiple processes into the same cog. And the current multi-cog video drivers can now be ported into 1 or 2 cogs.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Paul Baker
    Propeller Applications Engineer

    Parallax, Inc.
  • SapiehaSapieha Posts: 2,964
    edited 2008-08-28 20:47
    Hi hippy.

    That was my point in this thread:

    http://forums.parallax.com/forums/default.aspx?f=25&p=1&m=212396

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Nothing is impossible, there are only different degrees of difficulty.

    Sapieha
  • Ken PetersonKen Peterson Posts: 806
    edited 2008-08-28 20:51
    Wow....I'm away from reading this forum for a couple of days and the PropII is being re-designed in real time!

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    "I have always wished that my computer would be as easy to use as my telephone. My wish has come true. I no longer know how to use my telephone."

    - Bjarne Stroustrup