Question about the PINx & DIRx registers and instructions AND, ANDN, OR, XOR behaviour - Page 2 — Parallax Forums


Comments

  • David Betz Posts: 14,516
    edited 2013-05-29 09:32
    Phil Pilgrim (PhiPi) wrote:
    I think common usage will dictate that we don't have a real problem here. outa virtually never has to be read, except in r/m/w atomic operations, and we've got that covered. We've always had the rule that ina not be used on the left side of a Spin assignment. Adding the rule that outx not be used on the right does not seem burdensome to me. Pathological examples aside, such use would almost never occur in real life.

    -Phil
    I think this is only true if you're talking about Spin where Chip will add the shadow register. If you compile the statement below with PropGCC you will get two different results depending on whether you use optimization. To get the r/m/w code you will have to use -Os or some other form of optimization. Unoptimized code won't work because it will expand the assignment operator into a separate read and write of the register.
    PINA &= 7;
    

    Without optimization:
        15c8:	f80fbca0 			mov	1c <r7>, 7e0 <PINA>
        15cc:	070efc60 			and	1c <r7>, #7
        15d0:	07f0bfa0 			mov	7e0 <PINA>, 1c <r7>
        15d4:	f80fbca0 			mov	1c <r7>, 7e0 <PINA>
        15d8:	070efc60 			and	1c <r7>, #7
        15dc:	07f0bfa0 			mov	7e0 <PINA>, 1c <r7>
    

    With -Os optimization:
        15bc:	07f0ff60 			and	7e0 <PINA>, #7
        15c0:	07f0ff60 			and	7e0 <PINA>, #7
    

    The optimized code will work but the unoptimized code will fail. This will be difficult to explain to people since we normally use unoptimized code when debugging and we don't expect it to behave differently from optimized code other than being slower and bigger. In practice, PropGCC will probably have to provide inline assembly to update the PINx registers so the semantics are the same whether optimizing or not.
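
    To make the failure concrete, here is a minimal host-side C model of the two code sequences. It is a sketch, not real PropGCC output or a Propeller API. It assumes, as described in this thread, that reading PINx as a source returns the pin states, while AND/OR/XOR with PINx as the destination read the output latch. The PinPort struct and the pin_read, pin_write, and_split and and_atomic names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one PINx port (not a real Propeller API). */
typedef struct {
    uint32_t latch;     /* output latch, the value written to PINx          */
    uint32_t dir;       /* direction bits, 1 = pin is driven from the latch */
    uint32_t external;  /* levels applied from off-chip to undriven pins    */
} PinPort;

/* Reading PINx as a source: driven pins show the latch, inputs show the
   external level (a simplification that ignores other cogs). */
uint32_t pin_read(const PinPort *p) {
    return (p->latch & p->dir) | (p->external & ~p->dir);
}

/* Writing PINx replaces the whole latch. */
void pin_write(PinPort *p, uint32_t value) {
    p->latch = value;
}

/* Unoptimized shape: mov r7,PINA / and r7,#mask / mov PINA,r7 */
void and_split(PinPort *p, uint32_t mask) {
    uint32_t r7 = pin_read(p);  /* picks up pin states, not the latch */
    r7 &= mask;
    pin_write(p, r7);
}

/* Optimized shape: and PINA,#mask (read-modify-write on the latch) */
void and_atomic(PinPort *p, uint32_t mask) {
    p->latch &= mask;
}
```

    With the latch holding %101, all pins set as inputs and only P1 high externally, and_split(&port, 7) leaves %010 in the latch, while and_atomic(&port, 7) leaves %101.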
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-05-29 09:58
    cgracey wrote:
    And we'd have to apply the same rules for DIRx, in order to keep all operations atomic so that concurrent PASM code runs properly.
    I guess you could say the same about the counter registers, too, but I think concurrent PASM is a separate issue from the outx/inx/pinx conundrum. In the case of the other SFRs, including dirx, it should be the programmer's responsibility to make sure his code does not stumble over itself trying to make concurrent accesses. Also, atomic operations like &= should always be atomic in compiled code -- not just in optimized form.

    -Phil
  • cgracey Posts: 14,206
    edited 2013-05-29 09:59
    David Betz wrote: »
    I think this is only true if you're talking about Spin where Chip will add the shadow register. If you compile the statement below with PropGCC you will get two different results depending on whether you use optimization. To get the r/m/w code you will have to use -Os or some other form of optimization. Unoptimized code won't work because it will expand the assignment operator into a separate read and write of the register.
    PINA &= 7;
    

    Without optimization:
        15c8:	f80fbca0 			mov	1c <r7>, 7e0 <PINA>
        15cc:	070efc60 			and	1c <r7>, #7
        15d0:	07f0bfa0 			mov	7e0 <PINA>, 1c <r7>
        15d4:	f80fbca0 			mov	1c <r7>, 7e0 <PINA>
        15d8:	070efc60 			and	1c <r7>, #7
        15dc:	07f0bfa0 			mov	7e0 <PINA>, 1c <r7>
    

    With -Os optimization:
        15bc:	07f0ff60 			and	7e0 <PINA>, #7
        15c0:	07f0ff60 			and	7e0 <PINA>, #7
    

    The optimized code will work but the unoptimized code will fail. This will be difficult to explain to people since we normally use unoptimized code when debugging and we don't expect it to behave differently from optimized code other than being slower and bigger. In practice, PropGCC will probably have to provide inline assembly to update the PINx registers so the semantics are the same whether optimizing or not.

    In Spin, we can't even have shadow registers because it would preclude use of concurrent (multi-tasking) PASM code which will be doing atomic operations directly on PINx and DIRx. All operations in Spin on PINx and DIRx must be atomic (MOV/AND/OR/XOR PINx/DIRx,data) to make everything play together.

    I think that in C, you're going to need to work on shadow registers and then do 'MOV PINx,shadow' afterwards.
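
    A rough sketch of that shadow-register idea, as a host-side C model rather than real PropGCC code: every update goes through a software copy, which is then written out whole (standing in for 'MOV PINx,shadow'). The pina_latch variable stands in for the hardware latch, and all names here are hypothetical.

```c
#include <stdint.h>

uint32_t pina_latch;   /* stand-in for the hardware output latch (assumption) */
uint32_t pina_shadow;  /* software copy that the C code reads and modifies    */

/* Models 'MOV PINx,shadow': the whole latch is replaced by the shadow. */
void pina_commit(void) {
    pina_latch = pina_shadow;
}

/* Plain assignment to the port. */
void pina_set(uint32_t value) {
    pina_shadow = value;
    pina_commit();
}

/* PINA &= mask, computed on the shadow and never on the pin states. */
void pina_and(uint32_t mask) {
    pina_shadow &= mask;
    pina_commit();
}

/* PINA |= mask */
void pina_or(uint32_t mask) {
    pina_shadow |= mask;
    pina_commit();
}

/* PINA ^= mask */
void pina_xor(uint32_t mask) {
    pina_shadow ^= mask;
    pina_commit();
}
```

    The cost shows up as soon as something else touches the latch directly: if concurrent PASM sets bit 8 in the latch after pina_set($F), the next pina_and(7) commits $7 and that bit is lost, because the shadow never saw it.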
  • cgracey Posts: 14,206
    edited 2013-05-29 10:01
    Phil Pilgrim (PhiPi) wrote:
    I guess you could say the same about the counter registers, too, but I think concurrent PASM is a separate issue from the outx/inx/pinx conundrum. In the case of the other SFRs, including dirx, it should be the programmer's responsibility to make sure his code does not stumble over itself trying to make concurrent accesses. Also, atomic operations like &= should always be atomic in compiled code -- not just in optimized form.

    -Phil

    Because counter registers aren't mapped, you have only atomic access, so it's no problem.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-05-29 10:07
    cgracey wrote:
    Because counter registers aren't mapped, you have only atomic access, so it's no problem.
    But what about
            getphsa x
            add     x,#10
            setphsa x
    

    -Phil
  • David Betz Posts: 14,516
    edited 2013-05-29 10:18
    cgracey wrote: »
    In Spin, we can't even have shadow registers because it would preclude use of concurrent (multi-tasking) PASM code which will be doing atomic operations directly on PINx and DIRx. All operations in Spin on PINx and DIRx must be atomic (MOV/AND/OR/XOR PINx/DIRx,data) to make everything play together.

    I think that in C, you're going to need to work on shadow registers and then do 'MOV PINx,shadow' afterwards.
    I think it would be better to just have inline functions or macros with embedded inline assembly to handle these registers. Otherwise, there is magic going on under the hood that won't be obvious to users. Also, it may be difficult to get the GCC compiler to treat the fake OUTA differently than other variables. Eric knows more about the code generator though so I may be wrong.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-05-29 10:31
    Isn't concurrent access what locks are for? It seems that locks should work between Spin and concurrent PASM threads without a problem. But that should be left up to the app programmer to implement, not to the compiler writer.

    -Phil
  • cgracey Posts: 14,206
    edited 2013-05-29 11:03
    Phil Pilgrim (PhiPi) wrote:
    But what about
            getphsa x
            add     x,#10
            setphsa x
    

    -Phil

    We have instructions for that: ADDPHSA/ADDPHSB/SUBPHSA/SUBPHSB.

    It's the pins, though, that are the prime resources that must be accessible without caveats by concurrent programs.
  • David Betz Posts: 14,516
    edited 2013-05-29 11:06
    cgracey wrote: »
    We have instructions for that: ADDPHSA/ADDPHSB/SUBPHSA/SUBPHSB.

    It's the pins, though, that are the prime resources that must be accessible without caveats by concurrent programs.
    When you say "concurrent" I assume you're talking about multiple threads running on a single COG. Is that right? Are you planning on supporting that in Spin2?
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-05-29 11:19
    cgracey wrote:
    We have instructions for that: ADDPHSA/ADDPHSB/SUBPHSA/SUBPHSB.
    I realize that and should have used a different example, e.g. and x,_0xffff or somesuch, instead of add.

    Anyway, locks for accesses that aren't real-time critical. Atomic access for those that are. I do think that Spin programmers should have flexibility in their use of dirx, though, with the caveat that atomic operations are necessary for real-time-critical concurrency.

    For example, a Spin program may be blinking LEDs while a concurrent PASM program is using dira or outa to control a pair of I2C lines. In that case, locking would be adequate for access to dira or outa, since neither task is real-time critical.

    I guess if a programmer is advanced enough to multitask Spin and PASM, they should be able to handle the concurrency requirements on their own.

    -Phil
  • David Betz Posts: 14,516
    edited 2013-05-29 11:23
    Phil Pilgrim (PhiPi) wrote:
    I realize that and should have used a different example, e.g. and x,_0xffff or somesuch, instead of add.

    Anyway, locks for accesses that aren't real-time critical. Atomic access for those that are.

    -Phil
    Since every COG has its own counters I assume you're talking about multiple threads running on a single COG. Can they use locks or would that result in a deadlock while one thread waits for a lock that is owned by another on the same COG?
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-05-29 11:33
    David Betz wrote:
    Since every COG has its own counters I assume you're talking about multiple threads running on a single COG.

    Yes.
    Can they use locks or would that result in a deadlock while one thread waits for a lock that is owned by another on the same COG?
    Since locks are polled, waiting for a lock will not freeze the task sequencing. It's not as if you're using a waitlock atomic instruction that brings everything to a screeching halt.

    -Phil
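
    A small host-side C sketch of why polling avoids the deadlock: two tasks share one round-robin sequencer, and a task that finds the lock taken simply gives up its slice and polls again next time. The cooperative slices only stand in for the hardware task sequencing, and Lock, Task, lock_try and the other names are hypothetical, not Propeller instructions.

```c
#include <stdbool.h>

/* Hypothetical polled lock shared by the tasks of one cog. */
typedef struct {
    bool taken;
    int  owner;
} Lock;

/* Polled acquire: returns at once, true only if the lock was free. */
bool lock_try(Lock *l, int task) {
    if (l->taken) return false;
    l->taken = true;
    l->owner = task;
    return true;
}

/* Release succeeds only for the current owner. */
void lock_release(Lock *l, int task) {
    if (l->taken && l->owner == task) l->taken = false;
}

typedef struct {
    int  id;
    int  hold_left;  /* slices left before the lock is released */
    int  work_done;  /* units of work done while holding it     */
    bool has_lock;
} Task;

/* One time slice: poll for the lock, or do one unit of work if it is held. */
void task_step(Task *t, Lock *l, int hold_slices) {
    if (!t->has_lock) {
        if (lock_try(l, t->id)) {
            t->has_lock = true;
            t->hold_left = hold_slices;
        }
        return;  /* give up the slice whether or not the poll succeeded */
    }
    t->work_done++;
    if (--t->hold_left == 0) {
        lock_release(l, t->id);
        t->has_lock = false;
    }
}

/* Round-robin sequencer: a waiting task never stops the others from running. */
void run_round_robin(Task tasks[], int n, Lock *l, int slices, int hold_slices) {
    for (int s = 0; s < slices; s++)
        task_step(&tasks[s % n], l, hold_slices);
}
```

    Because the waiting task returns from every failed poll, the owner keeps getting slices, finishes its work and releases the lock, so both tasks make progress.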
  • cgracey Posts: 14,206
    edited 2013-05-29 12:49
    David Betz wrote: »
    When you say "concurrent" I assume you're talking about multiple threads running on a single COG. Is that right? Are you planning on supporting that in Spin2?

    That's right. The Spin2 interpreter only takes about 25% of the register space, so there is room for big in-line assembly programs, as well as terminate-and-stay-resident multitasking programs.
  • David Betz Posts: 14,516
    edited 2013-05-29 12:52
    cgracey wrote: »
    That's right. The Spin2 interpreter only takes about 25% of the register space, so there is room for big in-line assembly programs, as well as terminate-and-stay-resident multitasking programs.
    Will it be possible to run Spin in more than one thread at the same time?
  • pedward Posts: 1,642
    edited 2013-05-29 13:02
    cgracey wrote: »
    That's right. The Spin2 interpreter only takes about 25% of the register space, so there is room for big in-line assembly programs, as well as terminate-and-stay-resident multitasking programs.

    And cached user defined functions? It would be nice to write functions in PASM that can be called from SPIN, so it would be like extending the SPIN language directly.

    From a compiler standpoint, it *is* possible to get rid of the parentheses that surround parameters so that you can define functions like LOOKDOWN a,b or TRANSMUTE x,y,z and they look like language native keywords.
  • cgracey Posts: 14,206
    edited 2013-05-29 13:25
    David Betz wrote: »
    Will it be possible to run Spin in more than one thread at the same time?

    I wasn't planning on it, but it is possible. The limiter is that the stack RAM is being used as the Spin stack, and its 256 longs are plenty for a single thread, but to support, say, 4 threads, they'd only get 64 longs each - which is adequate for most things, but it's getting tight.
  • Ariba Posts: 2,690
    edited 2013-05-29 18:45
    Why is nobody responding to Seairth's arguments? I think I see it the same way.

    Why is OUTA := INA & 7 different from OUTA := OUTA & 7 ?
    For pins set to output you will read the states of the output latch with ina, otherwise you have a serious hardware problem (a short or another output drives the output of the Prop pin concurrently).

    The problem I see is that for pins set to input, the bits in outa may change with outa := ina & 7. This is not a problem as long as the pins stay inputs. But if you switch a pin from input to output, you cannot be sure what state the output latch has, so you need to write to outa every time right before you switch a pin to output.

    The biggest problem may occur with pins handled as open-drain, as in a PS/2 or I2C driver:
    P1
      outa[sda..scl] := %00
      dira[sda..scl] := %11
      ...
      dira[scl] := 1         'clock low
      dira[sda] := value     'data out
      dira[scl] := 0         'clock high
    
    
    P2
      pina[sda..scl] := %00
      dira[sda..scl] := %11
      ...
      pina[scl] := 0
      dira[scl] := 1         'clock low
      pina[sda] := 0
      dira[sda] := value     'data out
      dira[scl] := 0         'clock high
    

    Andy
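
    The open-drain case can be sketched with a tiny host-side C model of one shared line. It assumes every device keeps its output latch at 0 and only toggles its direction bit, as in the P1 listing above, and it assumes an external pull-up. OdDevice, pulls_low and line_level are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model of one device on an open-drain line. */
typedef struct {
    bool out;  /* output latch, kept at 0 for open-drain use          */
    bool dir;  /* 1 = drive the line from the latch, 0 = release it   */
} OdDevice;

/* A device pulls the line low only while it is driving a 0. */
bool pulls_low(OdDevice d) {
    return d.dir && !d.out;
}

/* Wired-AND: the pull-up keeps the line high unless some device pulls it low. */
bool line_level(const OdDevice devs[], int n) {
    for (int i = 0; i < n; i++)
        if (pulls_low(devs[i])) return false;
    return true;
}
```

    When every device has released the line it reads high while each latch still holds 0, so the pin state and the latch disagree. As far as I read it, that is the situation the extra pina writes in the P2 listing guard against.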
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-05-29 18:49
    Andy,

    IIRC, P2 pins have an open drain option, so the dira two-step is unnecessary.

    -Phil
  • Cluso99 Posts: 18,069
    edited 2013-05-29 19:31
    On the P1 I drive the outa and dira when accessing my SRAMs.

    For instance, I do this...
    mov outa, addr
    mov dira, addrena
    ...
    or outa, data
    or dira, dataena (sometimes mov dira,addrdataena sometimes or dira,#xx)

    It appears this all still works correctly on P2.

    I also have thought about using this (not sure if I actually have done it or not atm)
    add outa,#1 (increments the address pins)

    From what Chip says, I believe this would work fine too.

    But what will not work is this... (remember INA and OUTA are actually PINA on P2, but I am using INA and OUTA to show the problem)
    mov tmp,OUTA (because it reads INA)
    xor tmp,invert (just invert some pins)
    mov OUTA,tmp

    Phil's suggestion above will flag the "mov tmp,OUTA" with an error, so it would be caught as an error.

    I really like the single PINx combined registers particularly because it saves valuable cog space. Therefore, if there has to be a silicon change, my preference would be to add a specific instruction to read the OUTx registers like...
    GETOUTx D/#n
    or
    GETOUTx D
    Instruction space in '000011' opcode with S=000000010 and using WZ & WC bits to encode OUTa/b/c/d or
    Instruction space in '000011' opcode with S=01000xxxx are available.
  • Ariba Posts: 2,690
    edited 2013-05-29 20:14
    Cluso99 wrote: »
    ....
    But what will not work is this... (remember INA and OUTA are actually PINA on P2, but I am using INA and OUTA to show the problem)
    mov tmp,OUTA (because it reads INA)
    xor tmp,invert (just invert some pins)
    mov OUTA,tmp
    ...
    Why does this not work?
    If you have set DIRA before with the right output pins then it should work.
    It will not work if you do it before you have set the pins to outputs. So you cannot prepare the OUTA states and then set DIRA to enable the outputs.

    @Phil
    I don't see an open-drain mode in the pin modes, but it does not really matter. What I mean is that for open-drain pins it is normal for a high to be overridden by a low from another device, and then INA and OUTA do not have the same state, which can cause problems.

    Edit: I see now that you can set the drive strength for low and high separately per pin, so one of the possible cases is in fact an open-drain mode.


    Andy
  • Cluso99 Posts: 18,069
    edited 2013-05-29 22:33
    Ariba wrote: »
    Why does this not work?
    If you have set DIRA before with the right output pins then it should work.
    It will not work if you do it before you have set the pins to outputs. So you cannot prepare the OUTA states and then set DIRA to enable the outputs.
    Andy

    Andy: When you perform the MOV TMP,OUTA, what you are actually reading are the physical pin conditions.
    Here is an example...
    MOV OUTA,#$F
    MOV DIRA,#$F0
    Let us say that P0 is an input and is currently "1" and P2,3+4="0" set by off-chip pullup and pulldowns.
    MOV TMP,PINA (this is MOV TMP,INA but what we require is MOV TMP,OUTA)
    So we read (lower 8 bits only shown)
    11_0001
    But we expected 11_0000
    Now we want to invert P0 and P1 to the inverse of what we previously set...
    XOR TMP,#$03 (expected result = 11_0011 but we have 11_0010)
    MOV PINA,TMP (this is MOV OUTA,TMP)

    This is the basic error. Using single bits allows us to use the xxxPinx instructions, but multiple bits do not. This works correctly if you replace those last 3 instructions with
    XOR PINA,#$03 (because OUTA is read for read/modify/write, not INA)

    No actual problem occurs until we follow with...
    MOV DIRA,#$FF (when P0 & P1 become outputs. P0 is incorrect!)
  • Ariba Posts: 2,690
    edited 2013-05-30 00:27
    Cluso
    Yes, sure.
    Is this not exactly what I have said in the statement you have quoted ?

    Andy
  • Cluso99 Posts: 18,069
    edited 2013-05-30 01:37
    Andy: It wasn't what I understood from your statement, hence the explanation. I am pleased we both agree.
  • cgracey Posts: 14,206
    edited 2013-05-30 08:58
    I've been taking a break from documentation to get Spin resolved, but here are the pin configuration modes:

    PinModes.bmp

    Note that both high and low output states are individually settable to fast/slow/1500ohm/10kohm/100kohm/100uA/10uA/float.

    You can set the output states (by writing PINx) before enabling the outputs (by writing DIRx), so that when outputs become enabled, pins immediately head to their output states.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-05-30 09:18
    Thanks, Chip! A couple questions:

    1. The C bit applies to both input and output for clocking, according to your chart. I assume that input clocking is for synchronization to ensure that the signal "arrives" simultaneously to all cogs. But what would be an example of an unclocked output?

    2. In the P1, outputs from the various cogs are ORed. Given that there are so many output impedance options (not to mention DACs) in the P2, how are the outputs from the various cogs combined?

    -Phil
  • cgracey Posts: 14,206
    edited 2013-05-30 09:42
    Phil Pilgrim (PhiPi) wrote:
    Thanks, Chip! A couple questions:

    1. The C bit applies to both input and output for clocking, according to your chart. I assume that input clocking is for synchronization to ensure that the signal "arrives" simultaneously to all cogs. But what would be an example of an unclocked output?

    2. In the P1, outputs from the various cogs are ORed. Given that there are so many output impedance options (not to mention DACs) in the P2, how are the outputs from the various cogs combined?

    -Phil

    1) The clocking is not to ensure that all cogs get the signal on the same clock, but to tighten up feedback-related phenomena, which are otherwise loose, having to suffer bidirectional delays to and from cogs.

    2) The OUTs are still OR'd. In many pin modes, though, it is not OUT that is used but the input state of the pin itself. For example, you can use Schmitt mode to build a 1-pin relaxation oscillator. Add clocking and you've got a cycle-accurate quantizer.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-05-30 10:28
    cgracey wrote:
    The OUTs are still OR'd. ...
    I guess I'm still confused, then, about how outputs of various impedances are ORed. If cog 0 outputs a high at 10k and cog 1 outputs a high at 1.5k, what is the net impedance of the output on the pin?

    -Phil
  • cgracey Posts: 14,206
    edited 2013-05-30 10:35
    Phil Pilgrim (PhiPi) wrote:
    I guess I'm still confused, then, about how outputs of various impedances are ORed. If cog 0 outputs a high at 10k and cog 1 outputs a high at 1.5k, what is the net impedance of the output on the pin?

    -Phil


    Oh, sorry! The single-bit OUTs from all the cogs are OR'd together for each I/O pin, just like on Prop1. Each I/O pin, though, is configured with a 13-bit word sent to it serially via the CFGPINS instruction. Each pin's configuration determines how its OUT signal is used, and how its IN signal is derived.
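
    The OR-ing can be sketched in a few lines of host-side C. Gating each cog's OUT by its own DIR before the OR is the Prop1 behaviour and is assumed here to carry over. The per-pin configuration word is not modelled, and CogPort, combined_out and combined_dir are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical per-cog view of one 32-bit port. */
typedef struct {
    uint32_t out;  /* this cog's single-bit OUT signals      */
    uint32_t dir;  /* this cog's direction bits, 1 = output  */
} CogPort;

/* OR together the OUT bits that each cog is actually driving. */
uint32_t combined_out(const CogPort cogs[], int n) {
    uint32_t value = 0;
    for (int i = 0; i < n; i++)
        value |= cogs[i].out & cogs[i].dir;
    return value;
}

/* A pin is an output if any cog has set its DIR bit. */
uint32_t combined_dir(const CogPort cogs[], int n) {
    uint32_t value = 0;
    for (int i = 0; i < n; i++)
        value |= cogs[i].dir;
    return value;
}
```

    Each pin therefore sees a single OUT bit no matter how many cogs drive it. What that bit does electrically is decided by the pin's own configuration, not by the cog that set it.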
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-05-30 11:16
    Ah so! I wasn't understanding that the pin modes were global, rather than per cog.

    Thanks,
    -Phil
  • cgracey Posts: 14,206
    edited 2013-05-30 11:42
    Phil Pilgrim (PhiPi) wrote:
    Ah so! I wasn't understanding that the pin modes were global, rather than per cog.

    Thanks,
    -Phil

    That's right. They are per-pin.