Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i


Comments

  • Cluso99 wrote: »
    Just finished verifying the internal stack. It is a depth of 8.
    As you push more onto the stack, the bottom one will drop off when full.
    As you pop off the stack, the bottom value moves up one and the new bottom value is the same as it was previously, i.e. once you empty the stack, the last legitimate value popped will continue to be delivered on underflow.

    I needed to verify this behaviour as I am building some hubexec routines where I want to use the internal stack but I do not know what depth is available for use. So I can now save the stack (to hub???), and restore it when I am finished.

    That behavior is correct. As for saving the stack, you don't need to know the depth. Just pop all 8, then push all 8 when restoring it.
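The stack behavior described above can be modeled in a few lines. This is a minimal Python sketch, not P2 code: the class `HwStack` and the helpers `save_stack` and `restore_stack` are hypothetical names, and the model assumes only what is reported in this thread (depth of 8, bottom entry dropped on overflow, bottom entry repeated on underflow).

```python
class HwStack:
    """Toy model of the 8-deep hardware stack as described in the thread."""
    DEPTH = 8

    def __init__(self, fill=0):
        # Index 0 is the top of the stack; index 7 is the bottom.
        self.slots = [fill] * self.DEPTH

    def push(self, value):
        # The new value goes on top; the old bottom entry drops off.
        self.slots = [value] + self.slots[:-1]

    def pop(self):
        # The top value comes off, everything moves up one, and the
        # bottom entry keeps its previous value (so it repeats on underflow).
        value = self.slots[0]
        self.slots = self.slots[1:] + [self.slots[-1]]
        return value


def save_stack(stack):
    # Pop all 8 entries; the depth actually in use does not matter.
    return [stack.pop() for _ in range(HwStack.DEPTH)]


def restore_stack(stack, saved):
    # Push in reverse order so the first value popped ends up on top again.
    for value in reversed(saved):
        stack.push(value)


stack = HwStack()
for v in range(1, 9):          # push 1..8, so 8 is on top and 1 is at the bottom
    stack.push(v)
before = list(stack.slots)
saved = save_stack(stack)
underflow = [stack.pop() for _ in range(3)]   # extra pops on the emptied stack
restore_stack(stack, saved)
after = list(stack.slots)
```

In this model the three extra pops all return the old bottom value, and the restored contents match the original, which is why popping all 8 and pushing all 8 works without knowing the depth in use.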
  • Cluso99 wrote: »
    Just finished verifying the internal stack. It is a depth of 8.
    As you push more onto the stack, the bottom one will drop off when full.
    As you pop off the stack, the bottom value moves up one and the new bottom value is the same as it was previously, i.e. once you empty the stack, the last legitimate value popped will continue to be delivered on underflow.

    I needed to verify this behaviour as I am building some hubexec routines where I want to use the internal stack but I do not know what depth is available for use. So I can now save the stack (to hub???), and restore it when I am finished.
    Also be aware that the hardware stack is 22 bits wide not 32.
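One consequence of the 22-bit width is that a full 32-bit long cannot round-trip through the stack. A rough Python illustration follows; the helper `through_stack` is hypothetical, and which 22 bits are kept is an assumption (the low 22 here), since the thread does not say.

```python
STACK_WIDTH = 22                       # width reported in the thread
STACK_MASK = (1 << STACK_WIDTH) - 1    # $3FFFFF

def through_stack(value):
    # Hypothetical helper: what survives of a 32-bit value after a push/pop,
    # ASSUMING the low 22 bits are the ones stored.
    return value & STACK_MASK

small = through_stack(0x00012345)   # fits in 22 bits, comes back unchanged
large = through_stack(0x89ABCDEF)   # upper 10 bits are lost
```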
  • Seairth Posts: 2,428
    edited 2016-06-26 - 14:46:34
    You know, storing the stack to hub memory will be kinda slow. Too bad there isn't a pair of instructions that would pop/push all of them at once to/from the hub. They would take ~9-24 clock cycles. This would make context switches much faster than the current approach (I think).
  • Transferring the stack to cog RAM would take just 8 cycles, and then storing it to hub RAM ~9-24 cycles. It doesn't seem worth the effort to create special instructions to do the same thing. I suppose the special instructions could be faster by overlapping the stack accesses with the hub accesses. If we're voting, I'd vote not to do it so the P2 would be available sooner. I don't think I can wait another decade.
  • Dave Hein wrote: »
    Transferring the stack to cog RAM would take just 8 cycles, and then storing it to hub RAM ~9-24 cycles. It doesn't seem worth the effort to create special instructions to do the same thing. I suppose the special instructions could be faster by overlapping the stack accesses with the hub accesses. If we're voting, I'd vote not to do it so the P2 would be available sooner. I don't think I can wait another decade.

    Wouldn't that be 16 cycles, not 8? And you would also require 8 unused cog registers. And you would need to set up the fast write. But, you are right. It certainly shouldn't be added if it affects delivery of the chip. This is purely a minor optimization, and therefore not critical to the design.
  • As ozpropdev just said, the stack is only 22 bits. I don't think it's very useful in the general case. I'm just going to leave it alone and let it do its call/ret thing.
  • Sorry, I forgot that instructions take 2 cycles instead of 1. It's been a while since I paid much attention to Parallax. As time marches on there are more new and exciting chips and boards out there, and Parallax is kind of getting lost in the crowd. They really need to get the P2 out ASAP to remain relevant in the market.
  • Rayman Posts: 10,939
    edited 2016-06-26 - 19:25:46
    For smartpin instructions, think I'd like:

    SetPinM
    SetPinX
    SetPin
    GetPin
    AckPin
  • Rayman wrote: »
    For smartpin instructions, think I'd like:

    SetPinM
    SetPinX
    SetPinY
    GetPin
    AckPin

    Have you tried writing code and keeping track of what you are doing? Which is it that we write data to again? Was that X or was that Y?

    Having written some smartpin code in assembler and in high level I must say that to read and write data using RDPIN and WRPIN makes it a lot easier to read and remember. As for setting the mode then SETPIN also makes sense but so as not to confuse what we do with the extra parameter register I would avoid using the word set and simply say WXPIN. Also all these mnemonics end cleanly in "PIN" and are clearly identified as the PIN family along with their cousin AKPIN. I would say that WXPIN could be called something else but it is just as easy to leave it too.

  • Rayman Posts: 10,939
    edited 2016-06-26 - 17:05:10
    Looks to me like M, X, and Y will usually only get set once, so they're not so important.

    The thing you'll do a lot of is reading and acking the pin.

    I'd be fine with READPIN instead of GETPIN.
    Not a fan of contracting syntax when not necessary. Think that makes it less clear, and harder to remember.
  • Also, the way I have it, they all sorta rhyme with "SmartPin". I think that will also make it easier to remember...
  • Contracting is normal for mnemonics; that's why they're called mnemonics, you know, like rdbyte and wrfast etc. I don't know where you get the idea that you would mostly be reading from a pin, as I have found that that is not the case. What do you do when you are transmitting serial data?
  • Rayman Posts: 10,939
    edited 2016-06-26 - 17:19:56
    You're right Y, you do need that a lot too...
    Maybe SetPinY should just be SetPin, or WritePin (if that's not too long) or SendPin or something else without the Y.

    There is rdbyte and wrfast already, like you say, so maybe that's OK. I think those things are harder to remember though.
  • Cluso99 Posts: 16,185
    edited 2016-06-27 - 01:40:45
    When you are coding or looking at your code later, it's nice to be able to clearly see where you are reading and writing the data register. I think we are agreed on WRPIN and RDPIN (the Y and Z registers).

    That just leaves us with the M and X registers.

    Since you are mostly only ever going to set the M once, and that is the smart pin MODE, then why not just call it MODEPIN?

    In some modes, the X can be set once and in other modes it is set often. How about we call it either SETPIN or SETXPIN. I very much dislike WXPIN as it is confusing with WRPIN which is the data.

    I vote for
    M:   MODEPIN
    X:   SETPIN (or SETXPIN)
    Y:   WRPIN
    Z:   RDPIN 
    
    and (postedit)
         ACKPIN
    
  • Cluso99 Posts: 16,185
    edited 2016-06-26 - 19:16:34
    Stack saving...

    I don't see a need for special saving instructions. My use is not very time specific as it will usually result in waiting for user input. If my code only requires a depth of 4 then I only need to save/restore 4 levels.

    My code is for doing an extended monitor/debugger like I did before P2Hot. The routines will also be useful for standard input/output plus serial. Things like data, hex, decimal, binary, dump, string, etc.

    And yes, the width is only 22 bits. (Not verified - will check tomorrow)
  • Cluso99 Posts: 16,185
    edited 2016-06-26 - 19:37:35
    <ducks for cover>

    While coding, I have been thinking about the differences between the P1 instruction set and the P2 instruction set. They are really quite different aside from the basic instructions such as ADD, AND, ROR, etc.

    This made me ponder the question...

    What if the first 8 cogs were P1B compatible?

    The P1 section would be faster (than P1), a much smaller die than P2, with 64 I/O, more hub, security, and maybe in the wash-up not that much more power than the P1, and would give an immediate upgrade path.

    Remember, P1V verilog is already done.

    Or dare I even suggest, configurable P1 or P2 ALU ???

    At least, it could be considered for a pin compatible variant.
  • After getting to know this P2, I don't really want to go back to P1...
    Except maybe for cases where small size is important...
  • Rayman wrote: »
    After getting to know this P2, I don't really want to go back to P1...
    Except maybe for cases where small size is important...

    I am really missing the "glue" objects in the P1. I have to get them running before I can get to the real P2 stuff.

    So I was just musing that a mix of P1 & P2 cogs would be quite nifty. We could have the old combined with the new. Of course we would want a version with just the new bits. Others would likely be happy with just a P1 with more hub and faster. Later maybe we could get a mix of P2 and P2Hot cogs ;)

    But if the first cab out of the ranks were a P1 & P2 combo, then if the P2 section had problems, at least the P1 would most likely work.
  • Cluso99 wrote: »
    Others would likely be happy with just a P1 with more hub and faster.

    Or just faster. :)

    Tiny and with four fast P1 cogs - that's where I live.

  • jmg Posts: 14,324
    Cluso99 wrote: »
    <ducks for cover>
    What if the first 8 cogs were P1B compatible?
    It is easy to write, just a mere few words, but much harder to actually do.

    In reality, this would add more complexity to testing, increasing the chance of falling between two stools and having neither work properly.
    Documentation gets much more difficult, as do toolchains, as now you have to manage two compilers at the same time.... well, probably 3 as the P1B is bound to end up not-quite-P1.

    There seems little risk of P2 16 COGS not fitting, and that is the only scenario where any consideration of a half-baked hybrid would be needed.

    Compared to the issues you create, the issue you sought to solve is quite minor.
    Most new users will not be coding in PASM anyway.
  • Cluso99 wrote: »
    And yes, the width is only 22 bits. (Not verified - will check tomorrow)
    See here for more discussion on the hardware stack

  • jmg Posts: 14,324
    User Name wrote: »
    Or just faster. :)

    Tiny and with four fast P1 cogs - that's where I live.

    Remember, Chip has the Verilog configured so four fast P2 COGS is quite practical.

  • Does anyone recall what the ROLBYTE and MOVBYTS instructions do?

    I am trying to reverse the bytes in a long. A while ago we had ESWAP8 but this is gone now.
  • Seairth Posts: 2,428
    edited 2016-06-27 - 10:19:24
    Cluso99 wrote: »
    Does anyone recall what the ROLBYTE and MOVBYTS instructions do?

    I am trying to reverse the bytes in a long. A while ago we had ESWAP8 but this is gone now.

    I think you want:

    MOVBYTS d, #%%0123

    (Edit: dangit! I thought MOV while I typed ROL!)
  • Use MOVBYTS for a replacement for ESWAP8
    			movbyts	myreg,#%%0123
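To see why the selector `%%0123` reverses the bytes, here is a small Python model. It is only a sketch: the function `movbyts` is a hypothetical stand-in, and it assumes that each 2-bit field of the selector picks the source byte for one result byte, with the top field feeding the top byte. The thread itself only states that this instruction replaces ESWAP8.

```python
def movbyts(d, s):
    """Model of MOVBYTS D,#S on a 32-bit value (selector semantics assumed)."""
    src = [(d >> (8 * i)) & 0xFF for i in range(4)]   # src[0] is the low byte
    result = 0
    for pos in range(4):
        sel = (s >> (2 * pos)) & 0b11     # 2-bit selector for byte position pos
        result |= src[sel] << (8 * pos)
    return result

SWAP = 0b00_01_10_11                      # same value as the base-4 literal %%0123
reversed_long = movbyts(0x01234567, SWAP)
identity = movbyts(0x01234567, 0b11_10_01_00)   # %%3210 leaves the long unchanged
```

Under that assumption, `reversed_long` is `0x67452301`.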
    
  • jmg wrote: »
    User Name wrote: »
    Or just faster. :)

    Tiny and with four fast P1 cogs - that's where I live.

    Remember, Chip has the Verilog configured so four fast P2 COGS is quite practical.

    Good point. A reduced P2 has an excellent chance of being made. Seems likely the package & pin count would be reduced too.
  • User Name wrote: »
    Good point. A reduced P2 has an excellent chance of being made. Seems likely the package & pin count would be reduced too.
    A big yes to all of the above - http://forums.parallax.com/discussion/164364/prop2-family/p1
  • ozpropdev Posts: 2,688
    edited 2016-06-27 - 11:54:36
    Re: ROLBYTE instruction

    This instruction shifts the D register left by 8 bits and inserts the selected byte of the S register into D[7:0].

    Edit: Treat the following example as a sequence of instructions.
    		mov	myreg,##$01234567
    		mov	myreg2,##$89abcdef
    
    		rolbyte	myreg,myreg2,#0		'results in myreg = $234567ef
    		rolbyte	myreg,myreg2,#1		'results in myreg = $4567efcd
    		rolbyte	myreg,myreg2,#2		'results in myreg = $67efcdab
    		rolbyte	myreg,myreg2,#3		'results in myreg = $efcdab89
    

    Here are the Prop1 equivalents
    'equivalent P1 code for P2 "rolbyte myreg,myreg2,#0"
    
    		shl	myreg,#8
    		mov	temp,myreg2
    		and	temp,#$ff
    		or	myreg,temp
    
    'equivalent P1 code for P2 "rolbyte myreg,myreg2,#1"
    
    		shl	myreg,#8
    		mov	temp,myreg2
    		shr	temp,#8
    		and	temp,#$ff
    		or	myreg,temp
    
    'equivalent P1 code for P2 "rolbyte myreg,myreg2,#2"
    
    		shl	myreg,#8
    		mov	temp,myreg2
    		shr	temp,#16
    		and	temp,#$ff
    		or	myreg,temp
    
    'equivalent P1 code for P2 "rolbyte myreg,myreg2,#3"
    
    		shl	myreg,#8
    		mov	temp,myreg2
    		shr	temp,#24
    		and	temp,#$ff
    		or	myreg,temp
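The four P1 sequences above differ only in the shift amount, so they can be folded into one function. A Python sketch of that reading of ROLBYTE follows (the name `rolbyte` and the 32-bit mask are modeling choices, not P2 code), run as a sequence on the same example values:

```python
MASK32 = 0xFFFFFFFF

def rolbyte(d, s, n):
    """Shift d left by 8 bits and insert byte n of s into bits 7..0."""
    byte = (s >> (8 * n)) & 0xFF          # shr temp,#8*n / and temp,#$ff
    return ((d << 8) & MASK32) | byte     # shl myreg,#8 / or myreg,temp

myreg = 0x01234567
myreg2 = 0x89ABCDEF
sequence = []
for n in range(4):
    myreg = rolbyte(myreg, myreg2, n)     # each step feeds the next
    sequence.append(myreg)
```

Executed as a sequence, this reproduces the values in the example: $234567ef, $4567efcd, $67efcdab, $efcdab89.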
    

    Does this instruction and ROLWORD,ROLNIB really need to be SHLBYTE, SHLWORD & SHLNIB?

  • ozpropdev wrote: »
    		mov	myreg,##$01234567
    		mov	myreg2,##$89abcdef
    
    		rolbyte	myreg,myreg2,#0		'results in myreg = $234567ef
    		rolbyte	myreg,myreg2,#1		'results in myreg = $4567efcd
    		rolbyte	myreg,myreg2,#2		'results in myreg = $67efcdab
    		rolbyte	myreg,myreg2,#3		'results in myreg = $efcdab89
    

    Based on your description, should that be:
    		rolbyte	myreg,myreg2,#0		'results in myreg = $234567ef
    		rolbyte	myreg,myreg2,#1		'results in myreg = $234567cd
    		rolbyte	myreg,myreg2,#2		'results in myreg = $234567ab
    		rolbyte	myreg,myreg2,#3		'results in myreg = $23456789
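Both sets of results follow from the same description; the difference is whether `myreg` is reloaded before each instruction. A short Python sketch of the two readings (the function `rolbyte` is a hypothetical model of the description, not P2 code):

```python
def rolbyte(d, s, n):
    # Shift d left by 8 bits (32-bit wrap) and insert byte n of s at the bottom.
    return ((d << 8) & 0xFFFFFFFF) | ((s >> (8 * n)) & 0xFF)

start = 0x01234567
src = 0x89ABCDEF

# Reading 1: every instruction starts from the original myreg.
independent = [rolbyte(start, src, n) for n in range(4)]

# Reading 2: the instructions run back to back on the same myreg.
chained = []
reg = start
for n in range(4):
    reg = rolbyte(reg, src, n)
    chained.append(reg)
```

The independent reading gives the values in this post; the chained reading gives the values in the quoted example, which matches the note there to treat it as a sequence of instructions.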
    