Smarter/faster array element indexing using indirect register addressing in PASM? — Parallax Forums

northcove Posts: 49
edited 2014-08-26 22:32 in Propeller 1
I am trying to improve the performance of a PASM routine that continually processes registers which have originated as separate LONG arrays of user values in hub ram. My PASM implementation works correctly using MOVS and MOVD for register indexing but my approach has several drawbacks:

1. my actual implementation reads and writes registers in many different places; keeping track of all the instructions that need modification via MOVS / MOVD is onerous;
2. my approach requires additional temporary "working" registers that represent the current element of each vector I'm processing;
3. the overhead of code self-modification for simulating vector indexing before and after every iteration is causing performance problems with my solution.

Here's an example that illustrates my issues:
dat                  
              org

main       'compute S[i]:=A[i]+B[i]+C[i]+S[i] repeatedly

:restart      movs          :rd_A, #A           'init A src reg
              movs          :rd_B, #B           'init B src reg
              movs          :rd_C, #C           'init C src reg
              movs          :rd_S, #S           'init S src reg
              movd          :wr_S, #S           'init S dst reg
              mov           _N, #N              'init counter
              mov           _P, par             'init hub ram dst reg

:compute      '      
:rd_A         mov           _A, 0-0             'read current A element into working register
:rd_B         mov           _B, 0-0             'read current B element into working register
:rd_C         mov           _C, 0-0             'read current C element into working register

              'do lots of read/write with A, B, and C here before they are added to S
              
:rd_S         mov           _S, 0-0             'read current S element into working register
              add           _S, _A              'add working A to working S
              add           _S, _B              'add working B to working S
              add           _S, _C              'add working C to working S
:wr_S         mov           0-0, _S             'write working S to pasm reg
              wrlong        _S, _P              'write working S to hub ram
              djnz          _N, #:increment     'operate on the next array elements
              jmp           #:restart           'start all over from the first array elements

:increment    add           :rd_A, #1           'incr A src reg
              add           :rd_B, #1           'incr B src reg
              add           :rd_C, #1           'incr C src reg
              add           :rd_S, #1           'incr S src reg
              add           :wr_S, DI           'incr S dst reg
              add           _P, #4              'incr dst hub ram ptr           
              jmp           #:compute

  A     long  0 [ N ]       'input vector inited by hub cog 
  B     long  0 [ N ]       'input vector inited by hub cog  
  C     long  0 [ N ]       'input vector inited by hub cog
  S     long  0 [ N ]       'input vector sum result in pasm cog
  DI    long  1 << 9        'reg val to incr dst reg       
  
  'my "working" registers
  _A    res
  _B    res
  _C    res
  _S    res
  _N    res
  _P    res

              fit
              

The single biggest issue is my loop needs to run as fast as possible but the overhead of the :restart and :increment instruction blocks is causing problems. (The project is a serial multiplexer that handles up to 32 input channels, up to 1 megabaud input data rate, and delivers a 3 megabaud output data rate.) Is there a better way to achieve array-like indexing with PASM registers than what I'm doing?

Cheers,

Christopher

Comments

  • kuroneko Posts: 3,623
    edited 2014-08-11 16:53
    The three hub arrays (A, B, C) are never updated (based on the fragment you posted). Why do you keep adding them up (once should be fine)? Also, _A, _B and _C are written to and read from once. Merge those. Then you write _S to hub, why do you keep it in cog RAM as well?

    I'd also re-order the jumps slightly (but that's just me):
                  wrlong        _S, _P              'write working S to hub ram
    
    :increment    add           :rd_A, #1           'incr A src reg
                  add           :rd_B, #1           'incr B src reg
                  add           :rd_C, #1           'incr C src reg
                  add           :rd_S, #1           'incr S src reg
                  add           :wr_S, DI           'incr S dst reg
                  add           _P, #4              'incr dst hub ram ptr           
                  djnz          _N, #:compute       'operate on the next array elements
                  jmp           #:restart           'start all over from the first array elements
    
  • northcove Posts: 49
    edited 2014-08-11 17:17
    Thanks for your reply but they are not the performance improvements you are looking for (Jedi mind trick-hand wave.)

    That code is not intended to update hub ram or even do anything useful; it is a simple example just to show a technique for indexing through PASM registers to access individual elements. I only wrote out S to hub ram to verify my example worked correctly.

    In reality, in my actual implementation A, B and C are being modified in the inner loop (hence the "do lots of read/write.." comment.)

    I need to find a better way of reading/writing registers as though they are elements in large arrays. Still learning how best to achieve this on a chip that offers self-modifying instructions rather than more conventional indirect register addressing.

    My ideal is to merely adjust a base pointer and/or register using a single instruction each time through the main loop and then access current elements of A, B, C etc using a fixed offset from that main pointer / register. But unfortunately I'm not yet high enough on the PASM learning curve to achieve this.
  • Bill Henning Posts: 6,445
    edited 2014-08-11 17:41
    The Propeller 1 does not have any index registers, indexed register modes, or auto increment/decrement addressing.
  • northcove Posts: 49
    edited 2014-08-11 17:45
    ^Thanks Bill, can you think of a faster way to index through registers than what I'm doing?
  • Bill Henning Posts: 6,445
    edited 2014-08-11 17:52
    Take a look at the documentation for hub ops, and re-organize your code to optimally interleave it.
    top:
    RDxxxxx
    ins1
    ins2
    RDxxxxx
    ins1
    ins2
    RDxxxxx
    ins1
    ins2
    WRxxxxx
    ins1
    jmp #top

    Note that once you are synchronized to the hub, each of the above RDxxxx blocks will take 16 clock cycles, ins1/2 essentially execute for free.

    If there are three instructions after a hub operation, it will waste 12 clock cycles - ie above executes in 64 clock cycles, but below takes 80

    top:
    RDxxxxx
    ins1
    ins2
    ins3 ***** effectively takes 16 clock cycles
    RDxxxxx
    ins1
    ins2
    RDxxxxx
    ins1
    ins2
    WRxxxxx
    ins1
    jmp #top



    (thanks to kuroneko for pointing out my silly mistake. Shows me I have not been writing enough crazy pasm lately)

    Basically, schedule your hub ops carefully :)
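The 64 versus 80 clock figures above can be reproduced with a small timing model. This is a minimal sketch in Python (not PASM), under simplifying assumptions that are mine rather than the poster's: the cog's hub window opens whenever the clock count is a multiple of 16, a hub instruction costs 8 clocks once its window is open, and every other instruction costs 4 clocks. The name `loop_period` is hypothetical.

```python
# Simplified timing model of a Propeller 1 cog, following the numbers quoted
# in the post above. Assumptions (not a cycle-exact emulator): the hub window
# opens whenever the clock count is a multiple of 16, a hub instruction takes
# 8 clocks once its window is open, and other instructions take 4 clocks.

HUB_PERIOD = 16   # clocks between hub windows for one cog
HUB_COST = 8      # clocks for a hub instruction that hits its window
ALU_COST = 4      # clocks for an ordinary instruction


def loop_period(program, iterations=4):
    """Return the steady-state clocks per pass through `program`.

    `program` is a list of strings: "hub" for RDxxxx/WRxxxx, anything else
    for an ordinary instruction (including the closing jmp).
    """
    clock = 0
    starts = []
    for _ in range(iterations):
        starts.append(clock)
        for op in program:
            if op == "hub":
                # Stall until the next hub window, then pay the hub cost.
                clock += -clock % HUB_PERIOD
                clock += HUB_COST
            else:
                clock += ALU_COST
    # The distance between the last two passes is the steady-state period.
    return starts[-1] - starts[-2]


two_spacers = ["hub", "ins", "ins"] * 3 + ["hub", "ins", "jmp"]
three_spacers = (["hub", "ins", "ins", "ins"]
                 + ["hub", "ins", "ins"] * 2
                 + ["hub", "ins", "jmp"])

print(loop_period(two_spacers))    # 64
print(loop_period(three_spacers))  # 80
```

In this model the third spacer instruction pushes the next hub instruction 4 clocks past its window, so it stalls 12 clocks for the following one, which is where the extra 16 clocks per pass come from.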
  • kuroneko Posts: 3,623
    edited 2014-08-11 18:32
    top:
    RDxxxxx
    ins1
    ins2
    ins3
    RDxxxxx
    ...

    Note that once you are synchronized to the hub, each of the above RDxxxx blocks will take 16 clock cycles, ins1/2/3 essentially execute for free.
    Neat trick.
  • Bill Henning Posts: 6,445
    edited 2014-08-11 18:37
    LOL!!!!

    Yep.

    DUH!

    TWO spacer instructions!!!!

    I am getting old and forgetful... fixing post.

    Thanks.
    kuroneko wrote: »
    Neat trick.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2014-08-11 18:54
    Just imagine how fast LMM would be if the three-spacer-instruction cha-cha actually worked!

    -Phil
  • northcove Posts: 49
    edited 2014-08-11 18:58
    Take a look at the documentation for hub ops, and re-organize your code to optimally interleave it.

    Yeah, that's a neat trick - thanks.

    Got any tricks for faster r/w access of incremented PASM registers other than modifying instructions via MOVS/MOVD?

    Cheers,

    Christopher
  • msrobots Posts: 3,709
    edited 2014-08-11 19:06
    you can also replace complete instructions with a simple mov. Or use add to increment source and dest in one operation.

    Enjoy!

    Mike
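The tip above about using add to step source and destination in one operation can be illustrated by treating an instruction word as a plain 32-bit integer. A minimal sketch in Python (not PASM): the source field is bits 8..0 and the destination field is bits 17..9 on the Propeller 1, while the helper names, the register addresses `A` and `S`, and the constant name `D1S1` are hypothetical and chosen only for this example.

```python
# Model of a PASM instruction word as a 32-bit integer. On the Propeller 1
# the source field is bits 8..0 and the destination field is bits 17..9.
# The helper names below are hypothetical, chosen for this sketch only.

MASK32 = 0xFFFFFFFF
SRC_MASK = 0x1FF                 # bits 8..0
DST_SHIFT = 9
DST_MASK = 0x1FF << DST_SHIFT    # bits 17..9


def movs(instr, value):
    """Replace the source field, as MOVS does."""
    return (instr & ~SRC_MASK & MASK32) | (value & SRC_MASK)


def movd(instr, value):
    """Replace the destination field, as MOVD does."""
    return (instr & ~DST_MASK & MASK32) | ((value & 0x1FF) << DST_SHIFT)


def add(instr, value):
    """Plain 32-bit ADD applied to an instruction word."""
    return (instr + value) & MASK32


def fields(instr):
    """Return (destination, source) register numbers."""
    return (instr & DST_MASK) >> DST_SHIFT, instr & SRC_MASK


# An instruction that touches two arrays at once, e.g. "add S+i, A+i".
A, S = 0x040, 0x0A0                    # hypothetical register addresses
instr = movd(movs(0x80BC0000, A), S)   # 0x80BC0000: ADD with both fields zero

D1S1 = (1 << DST_SHIFT) | 1            # one constant steps both fields
for _ in range(3):
    instr = add(instr, D1S1)

print([hex(f) for f in fields(instr)])  # ['0xa3', '0x43']
```

One caveat of this shortcut: a source field that steps past $1FF carries into the destination field, so it only suits arrays that stay inside the register range.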
  • Bill Henning Posts: 6,445
    edited 2014-08-11 19:23
    not really

    well... kuroneko has used PHSA (if my memory is working this time) as a hub pointer, but that is crazy code.
    northcove wrote: »
    Yeah, that's a neat trick - thanks.

    Got any tricks for faster r/w access of incremented PASM registers other than modifying instructions via MOVS/MOVD?

    Cheers,

    Christopher
  • ChrisGadd Posts: 310
    edited 2014-08-11 19:32
    Your top post mentions up to 32 input channels, each of which I'm guessing stores data in its own array, and you're trying to find a routine to read from each array in turn? If that's accurate, then is there a way to have all of the input channels write to the same array, and simply interleave the registers that they write to? The first channel writes to longs 0, 32, 64... channel 2 gets 1, 33, 65... and so on. The PASM routine could then just start at the beginning and read sequentially down the array.
  • northcove Posts: 49
    edited 2014-08-11 20:56
    ChrisGadd wrote: »
    Your top post mentions up to 32 input channels, each of which I'm guessing stores data in its own array, and you're trying to find a routine to read from each array in turn? If that's accurate, then is there a way to have all of the input channels write to the same array, and simply interleave the registers that they write to? The first channel writes to longs 0, 32, 64... channel 2 gets 1, 33, 65... and so on. The PASM routine could then just start at the beginning and read sequentially down the array.

    ^Yeah, that's closer to my real problem.

    My actual code scans multiple pins for delimited serial data not dissimilar to NMEA sentences from a GPS, for example. (In my application the data could be coming in at 500k-1Mbaud / 1kHz records over fibre optic cable, not 4800 baud / 1Hz records from a GPS.) The PASM code performs checks on the incoming bytes and writes the record to hub ram when records are complete. The receiver code also performs more complex preprocessing of these records before writing them to hub ram. Each receiver channel has about twenty fields associated with it: rx pin mask, ticks per bit, start char, stop char, bits remaining, checksum, error count, destination buffer address, destination buffer length, routing information, IO stats, etc. My single port implementation handles 1.5Mbaud input data just fine (tested with a multi-port transmitter I wrote that handles 3Mbaud and ~1k records per second per port.)

    Converting the single port code over to multi-port, I'm finding that the MOVS/MOVD workaround for the absence of indexed register modes or auto incrementing is causing overhead that I wasn't anticipating when writing PASM on the Prop. My multi-channel code requires ~twenty MOVS/MOVD before the first iteration to set the base source and destination registers to load the elements into working registers. Each successive iteration requires another ~twenty instructions to increment the source registers and yet another twenty instructions to save & increment the destination registers. And I'm trying to keep the port switching overhead to under 0.2uSecs !!
  • Bill Henning Posts: 6,445
    edited 2014-08-11 21:25
    Keep in mind the instruction rate of the propeller, and the bit rates you are dealing with. There is no way that you could handle more than a few high speed channels per cog, even if the timing is perfectly predictable within each stream after the start bit. Assuming a standard start bit, 8 data bits, stop bit, and only sampling each bit cell in the middle, you *might* be able to handle 3 mbps streams per cog, at best, and I would not rely on being able to handle more than two.
    northcove wrote: »
    ^Yeah, that's closer to my real problem.

    My actual code scans multiple pins for delimited serial data not dissimilar to NMEA sentences from a GPS, for example. (In my application the data could be coming in at 500k-1Mbaud / 1kHz records over fibre optic cable, not 4800 baud / 1Hz records from a GPS.) The PASM code performs checks on the incoming bytes and writes the record to hub ram when records are complete. The receiver code also performs more complex preprocessing of these records before writing them to hub ram. Each receiver channel has about twenty fields associated with it: rx pin mask, ticks per bit, start char, stop char, bits remaining, checksum, error count, destination buffer address, destination buffer length, routing information, IO stats, etc. My single port implementation handles 1.5Mbaud input data just fine (tested with a multi-port transmitter I wrote that handles 3Mbaud and ~1k records per second per port.)

    Converting the single port code over to multi-port, I'm finding that the MOVS/MOVD workaround for the absence of indexed register modes or auto incrementing is causing overhead that I wasn't anticipating when writing PASM on the Prop. My multi-channel code requires ~twenty MOVS/MOVD before the first iteration to set the base source and destination registers to load the elements into working registers. Each successive iteration requires another ~twenty instructions to increment the source registers and yet another twenty instructions to save & increment the destination registers. And I'm trying to keep the port switching overhead to under 0.2uSecs !!
  • northcove Posts: 49
    edited 2014-08-11 22:25
    Keep in mind the instruction rate of the propeller, and the bit rates you are dealing with. There is no way that you could handle more than a few high speed channels per cog, even if the timing is perfectly predictable within each stream after the start bit. Assuming a standard start bit, 8 data bits, stop bit, and only sampling each bit cell in the middle, you *might* be able to handle 3 mbps streams per cog, at best, and I would not rely on being able to handle more than two.

    Copy that. Realistically there will only be one or two high speed streams (>500kbaud) to handle. When streams are numerous they'll mostly be in the 38.4-115.2kbaud range. I have sufficient free cogs to allocate to specific streams when the bandwidth requires it. I'm also fortunate in that I can tweak the incoming baud to match my receivers because I developed the transmitters.

    One of my issues is that the PASM register layout is inherited from arrays in the Spin cog's DAT section, eg:
    DAT
      'initialised by hub obj and copied to cog ram
      rxmask      long  0 [ max_ports ]    'rx rxmask for port
      rxbyte      long  0 [ max_ports ]    'current byte for port
      bitime      long  0 [ max_ports ]    'word 0: data bit time, word 1: start bit delay
      delims      long  0 [ max_ports ]    'byte 0: start char, byte 1: end char
      flags       long  0 [ max_ports ]    'word 0: cur flags, word 1: ini flags 
      headref     long  0 [ max_slots ]    'handle to head addr 
      headptr     long  0 [ max_slots ]    'addr of rcv buffer start
      tailref     long  0 [ max_slots ]    'handle to tail addr
      tailptr     long  0 [ max_slots ]    'addr of rcv buffer end
      bufcnt      long  0 [ max_ports ]    'curr user buffers for port
      ...
    
    In the single-to-multi-port conversion I've lazily resorted to using "working" registers that reflect the single-port usage of registers. What I've ended up with is this overhead before every channel loop:
    :rescan
                        'init base registers        
                        movs        :r_rxmask, #rxmask
                        movs        :r_rxbyte, #rxbyte
                        movd        :w_rxbyte, #rxbyte
                        movs        :r_nxtbit, #nxtbit
                        movd        :w_nxtbit, #nxtbit
                        movs        :r1_bitime, #bitime
                        movs        :r2_bitime, #bitime
                        movs        :r_recdelim, #delims
                        movs        :r_flags, #flags 
                        movd        :w_flags, #flags 
                        movs        :r_dstptr, #dstptr
                        movd        :w_dstptr, #dstptr 
                        movs        :r_tailptr, #tailptr
                        movs        :r_headref, #headref 
                        movs        :r_headptr, #headptr
                        movs        :r_tailref, #tailref
                        movs        :r_hsh_rx, #hsh_rx
                        movd        :w_hsh_rx, #hsh_rx
                        movs        :r_hsh_tx, #hsh_tx
                        movd        :w_hsh_tx, #hsh_tx
    

    And this overhead after every channel iteration:
    :incr_regs
                        add         :r_rxmask, #1
                        add         :r_rxbyte, #1
                        add         :w_rxbyte, _dri
                        add         :r_nxtbit, #1
                        add         :w_nxtbit, _dri
                        add         :r1_bitime, #1
                        add         :r2_bitime, #1
                        add         :r_recdelim, #1
                        add         :r_flags, #1 
                        add         :w_flags, _dri
                        add         :r_dstptr, #1
                        add         :w_dstptr, _dri
                        add         :r_tailptr, #1
                        add         :r_headref, #1 
                        add         :r_headptr, #1
                        add         :r_tailref, #1
                        add         :r_hsh_rx, #1
                        add         :w_hsh_rx, _dri
                        add         :r_hsh_tx, #1
                        add         :w_hsh_tx, _dri
    

    Yikes & yuck.

    Possibly a better solution can be found by reorganising the layout such that each channel has a base register and its fields are in adjacent registers. Not sure this would do much to make up for the absence of register indexing/autoincrementing instructions, however.
  • ChrisGadd Posts: 310
    edited 2014-08-12 06:29
    yikes and yuck indeed. Have you considered unrolling the loop? That is instead of one loop that handles everything, break every iteration of the loop into its own routine? It would almost certainly be faster, and due to the number of registers that need updating might not be too much larger.
  • northcove Posts: 49
    edited 2014-08-12 13:36
    I cannot think of any way to unroll the loop without making the implementation be limited to a fixed number of channels. I want my implementation to be flexible enough to handle 1 channel at 1Mbaud while also supporting up to 32 channels at much slower speeds. My application has both requirements.

    While my single port implementation was really fast, I'm having a complete rethink and starting over for the multi-port implementation. Given the Prop's self-modifying code capability, I will next experiment with a template sequence of instructions (start bit scanning, data byte assembly) that handles a single channel. Upon entry, the cog will duplicate that instruction template for each additional channel, with a one-time setting of the source and destination registers of each channel's instructions. The return registers will be set appropriately to process channels in sequence. I'm a n00b at PASM and this is my first time writing self-modifying code but I like the new direction it's taking me.
  • northcove Posts: 49
    edited 2014-08-26 18:19
    ChrisGadd wrote: »
    yikes and yuck indeed. Have you considered unrolling the loop? That is instead of one loop that handles everything, break every iteration of the loop into its own routine? It would almost certainly be faster, and due to the number of registers that need updating might not be too much larger.

    Here's an update with my loop unrolling experiment for my multi-channel serial receiver. The project took almost two weeks longer than I expected. My excuse is I spent most of last week skiing in the mountains with my kids, my girlfriend, her kids and her ex. Also, writing self-modifying Propeller assembly for the first time using only a terminal program and 200MHz scope for debugging is without a doubt the slowest development I have ever endured.

    Initial performance results come from testing a single cog reading serial data (8 data bits, 1 stop bit, no handshaking) as delimited records. For this project I am using records defined by a "$" start char and $D (carriage return) stop char, as commonly used by NMEA 0183. The records used for testing had pseudo-random lengths between 12 and 96 bytes. My transmitter was emitting about 1,500 records per second at 1.5 Mbaud.
              Chans   Max Baud      MCSF*         
              -----  ---------   -----------
                1    1_500_000     1560KHz
                2      460_800      833KHz
                3      250_000      556KHz                   
                4      230_400      417KHz
                5      115_200      333KHz
                6      115_200      278KHz
    
    *Minimum Channel Sample Frequency reported by my scope when XORing a diagnostic pin each time a port's data pin was sampled.
    

    Here's what I did:

    1. Wrote a tiny PASM cog that supported reading and writing its registers from another Spin object using only 12 instructions and 2 data registers in the cog RAM.
    pub start
      return cognew ( @entry, @_req )
    
    pub get_reg ( reg_idx )
      result := reg_idx << 1
      _req := @result
      repeat while _req
    
    pub set_reg ( reg_idx, reg_val )
      result := reg_idx << 1 | 1
      _req := @result
      repeat while _req
      
    dat           org   'hub ram
    
    _req    long 0
    
    dat           org   'cog ram   
    
    entry               rdlong      _ptr, par       wz
            if_z        jmp         #entry          'no cmd, check again      
                        rdlong      _val, _ptr      'get cmd val     
                        shr         _val, #1        wc 'check set bit and shift to get reg index       
            if_nc       movd        $+2, _val       'set get_reg dest
            if_c        movd        $+3, _val       'set set_reg dest
            if_nc       wrlong      0-0, _ptr       'read reg value, copy to usr hub var
            if_c        add         _ptr, #8        'offset to reg_val param
            if_c        rdlong      0-0, _ptr       'write usr val from hub ram to reg
                        mov         _val, #0        'zero cmd
                        wrlong      _val, par       'reset cmd reg
                        jmp         #entry          'do it again
    
    _ptr    res
    _val    res
    

    This cog enabled me to use its registers $00E to $1EF for my own purposes. Once the registers were configured, I set register $000 with an instruction to execute the custom code that was created on-the-fly by a Spin object.
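The request encoding used by get_reg and set_reg above can be modelled in a few lines. This is a toy sketch in Python, not the Spin/PASM pair itself: only the bit layout (register index shifted left by one, bit 0 selecting "set") comes from the listing, while the function names and the `regs` list standing in for cog RAM are assumptions made for the example.

```python
# Toy model of the mailbox protocol described in step 1. The command word
# packs the register index into bits 9..1 and uses bit 0 to select "set".
# Function names are hypothetical; only the bit layout comes from the post.

COG_REGISTERS = 512


def encode_get(reg_idx):
    """Command word for reading a register (bit 0 clear)."""
    return reg_idx << 1


def encode_set(reg_idx):
    """Command word for writing a register (bit 0 set)."""
    return (reg_idx << 1) | 1


def service(regs, cmd, value=0):
    """Model one pass of the cog loop for a non-zero request.

    Returns the register contents for a get, or None after a set.
    """
    is_set = cmd & 1                 # shr _val, #1 wc : carry = set flag
    reg_idx = (cmd >> 1) & 0x1FF     # movd keeps only the low 9 bits
    if is_set:
        regs[reg_idx] = value & 0xFFFFFFFF
        return None
    return regs[reg_idx]


regs = [0] * COG_REGISTERS
service(regs, encode_set(0x010), 0x00004000)   # value seen at $010 in the dump
print(hex(service(regs, encode_get(0x010))))   # 0x4000
```

In the real cog a request only becomes visible once _req holds a non-zero hub address, which the model skips by calling `service` directly.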

    2. Wrote a generic PASM template for the instructions and data for reading serial data from a single pin. For explanation purposes, here is the basic code for assembling bytes from bits.
    dat
    
    port_code_beg       tjnz    _rxbyte, #:check_time
            
                        'check start bit
    :seek_start         test    _rxmask, ina        wz
            if_nz       jmp     #0-0                'process next port
    
                        'got start bit
                        mov     _nxtime, cnt 
                        add     _nxtime, _sbtime
                        mov     _rxbyte, _hibyte
                        jmp     #0-0                'process next port                             
    
    :check_time         cmp    _nxtime, cnt         wc              
            if_nc       jmp     #0-0                'process next port                                          
    
                        'got data bit
                        test    _rxmask, ina        wz
                        shl     _rxbyte, #1         wc                             
            if_nz       or      _rxbyte, #1
            if_nc       jmp     #:got_byte
                        add     _nxtime, _bitime
                        jmp     #0-0                'process next port 
    
    :got_byte           shr     _rxbyte, #1         'shift out stop bit                         
                        rev     _rxbyte, #24        'reverse data bit order
    
                        'got complete byte here
                        call    #handle_byte
    
    :exit               mov     _rxbyte, #0        'start all over with new byte
                        jmp     #0-0               'process next port
    port_code_end
    
      _rxmask     long  0
      _bitime     long  0
      _sbtime     long  0
      _dbmask     long  0
      _nxtime     long  0
      _rxbyte     long  0
      _bufreg     long  0
      _rcvptr     long  0
      _endptr     long  0
    
    

    3. Then I wrote the Spin code to duplicate that PASM template for every serial channel and set the source and destination bits of each instruction register, customised for each channel. The source bits of every JMP #0-0 instruction were overwritten with the register number of the next channel's code. Oh yeah, I also wrote a little disassembler to see what was going on with the code I was writing on-the-fly. Here's a dump of the multi-receiver's PASM registers when configured for two channels starting at register $010:
    0010:00004000
    0011:000002B6
    0012:000003E9
    0013:00010000
    0014:00000000
    0015:00000000
    0016:0000002D
    0017:00000000
    0018:00000000
    0019:E87C2A20 TJNZ   111010 0001 015 #020
    001A:623C21F2 TEST   011000 1000 010  1F2
    001B:5C54003A JMP    010111 0001 000 #03A
    001C:A0BC29F1 MOV    101000 0010 014  1F1
    001D:80BC2812 ADD    100000 0010 014  012
    001E:A0BC2A0E MOV    101000 0010 015  00E
    001F:5C7C003A JMP    010111 0001 000 #03A
    0020:853C29F1 CMP    100001 0100 014  1F1
    0021:5C4C003A JMP    010111 0001 000 #03A
    0022:623C21F2 TEST   011000 1000 010  1F2
    0023:2DFC2A01 SHL    001011 0111 015 #001
    0024:68D42A01 OR     011010 0011 015 #001
    0025:5C4C0028 JMP    010111 0001 000 #028
    0026:80BC2811 ADD    100000 0010 014  011
    0027:5C7C003A JMP    010111 0001 000 #03A
    0028:28FC2A01 SHR    001010 0011 015 #001
    0029:3CFC2A18 REV    001111 0011 015 #018
    002A:5CFC1E0F CALL   010111 0011 00F #00F
    002B:A0FC2A00 MOV    101000 0011 015 #000
    002C:5C7C003A JMP    010111 0001 000 #03A
    002D:00001FFC
    002E:00001BD8
    002F:00001C38
    0030:00000000
    0031:00020000
    0032:000002B6
    0033:000003E9
    0034:00000000
    0035:00000000
    0036:00000000
    0037:0000004E
    0038:00000000
    0039:00000000
    003A:E87C6C41 TJNZ   111010 0001 036 #041
    003B:623C63F2 TEST   011000 1000 031  1F2
    003C:5C540019 JMP    010111 0001 000 #019
    003D:A0BC6BF1 MOV    101000 0010 035  1F1
    003E:80BC6A33 ADD    100000 0010 035  033
    003F:A0BC6C0E MOV    101000 0010 036  00E
    0040:5C7C0019 JMP    010111 0001 000 #019
    0041:853C6BF1 CMP    100001 0100 035  1F1
    0042:5C4C0019 JMP    010111 0001 000 #019
    0043:623C63F2 TEST   011000 1000 031  1F2
    0044:2DFC6C01 SHL    001011 0111 036 #001
    0045:68D46C01 OR     011010 0011 036 #001
    0046:5C4C0049 JMP    010111 0001 000 #049
    0047:80BC6A32 ADD    100000 0010 035  032
    0048:5C7C0019 JMP    010111 0001 000 #019
    0049:28FC6C01 SHR    001010 0011 036 #001
    004A:3CFC6C18 REV    001111 0011 036 #018
    004B:5CFC1E0F CALL   010111 0011 00F #00F
    004C:A0FC6C00 MOV    101000 0011 036 #000
    004D:5C7C0019 JMP    010111 0001 000 #019
    004E:00001FFE
    004F:00001C38
    0050:00001C98
    0051:00000000
    

    Notice that each channel's instructions have their source and destination bits independently configured; this is how I keep the channel switching overhead to a minimum. The receiver is put into action by setting register $000 to JMP #$019. My Spin code went through the PASM code registers for each channel's instruction block and fixed up the source and destination bits wherever needed. JMPs to #0-0 were overwritten to jump to the next channel's instruction block. The data registers before each channel's instruction block contain the channel's settings: pin mask, bit ticks, next data bit ticks, received bits so far, trace mask, etc. The data registers following each code block contain the hub addresses of where and how the received data should be written for the user. (My implementation supports multiple receive buffers for each channel. If a receiver buffer is busy, eg. being processed by the user, the current record can be written to another buffer. This allows a single channel's records to be processed by multiple cogs, quite useful when records are coming in at 1KHz and the higher-level processing is being done in Spin.) I've deliberately omitted the code that does the processing of a received byte. This code is part of the generic template for the real project but I left it out to make this explanation easier to understand. After more testing I'll post the receiver project to the Propeller Object Exchange (OBEX).

    Cheers,

    Christopher
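The one-time fix-up described above can be checked against the dump itself. The following is a hypothetical sketch in Python, since the actual Spin fix-up code is not shown in the thread: the instruction words and the channel stride ($031 - $010 = $21 registers) are taken from the dump, while the per-instruction flags saying which fields are channel-local, and the names `relocate` and `retarget_jump`, are assumptions.

```python
# Hypothetical sketch of the per-channel fix-up step, using words taken from
# the register dump above. Which fields need relocating is an assumption,
# expressed here as two flags per instruction.

SRC_MASK = 0x1FF        # source field, bits 8..0
DST_SHIFT = 9           # destination field starts at bit 9
CHANNEL_STRIDE = 0x21   # registers per channel in the dump ($031 - $010)


def relocate(word, channel, fix_dst, fix_src):
    """Shift the selected register fields by `channel` strides."""
    delta = channel * CHANNEL_STRIDE
    if fix_dst:
        word += delta << DST_SHIFT
    if fix_src:
        word += delta
    return word & 0xFFFFFFFF


def retarget_jump(word, target):
    """Overwrite the source field of a JMP #0-0 placeholder."""
    return (word & ~SRC_MASK) | (target & SRC_MASK)


# (channel 0 word, fix_dst, fix_src) for three instructions from the dump.
template = [
    (0xE87C2A20, True, True),    # TJNZ $015, #$020  both fields are local
    (0x623C21F2, True, False),   # TEST $010, $1F2   source is INA, shared
    (0xA0BC2A0E, True, False),   # MOV  $015, $00E   source is a shared register
]

channel1 = [relocate(w, 1, d, s) for w, d, s in template]
print([f"{w:08X}" for w in channel1])
# ['E87C6C41', '623C63F2', 'A0BC6C0E']

# Channel 1 is the last channel, so its "next port" jumps wrap back to $019.
print(f"{retarget_jump(0x5C54003A, 0x019):08X}")  # 5C540019
```

The three relocated words match registers $03A, $03B and $03F in the dump, and the retargeted jump matches $03C.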
  • kuroneko Posts: 3,623
    edited 2014-08-26 18:33
    northcove wrote: »
                        test    _rxmask, ina        wz
                        shl     _rxbyte, #1         wc                             
            if_nz       or      _rxbyte, #1
    
    This is more efficiently done with test wc and rcl.

    I'm also a bit worried about this bit:
    :check_time         cmp    _nxtime, cnt         wc              
            if_nc       jmp     #0-0                'process next port                                          
    
  • northcove Posts: 49
    edited 2014-08-26 19:05
    Quite true, thanks. I've noticed Chip's FullDuplexSerial code using test wc & rcl too. I've deferred switching over because I'm using the C status from shl to determine when the byte is complete. _rxbyte is initialised with $ff00_0000; when the MSB is zero I know all the bits have been sampled. This approach eliminates the overhead of a bit counter.
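The sentinel technique described above (preload the receive register with $ff00_0000 and watch the carry from each shift) can be traced in a short model. A minimal sketch in Python rather than PASM, assuming one pin sample per bit time after the start bit and a non-inverted line; the function names `rev`, `receive` and `frame` are made up for the example.

```python
MASK32 = 0xFFFFFFFF
SENTINEL = 0xFF000000   # value quoted in the post for _rxbyte at the start bit


def rev(value, bits):
    """Model of PASM REV: reverse the low (32 - bits) bits, zero the rest."""
    width = 32 - bits
    out = 0
    for i in range(width):
        if value & (1 << i):
            out |= 1 << (width - 1 - i)
    return out


def receive(samples):
    """Feed pin samples (0/1) taken once per bit time after the start bit.

    Returns (byte, samples_used). Mirrors the shl/or/if_nc sequence: the bit
    shifted out of position 31 plays the role of the carry flag.
    """
    rxbyte = SENTINEL
    for count, pin in enumerate(samples, start=1):
        carry = rxbyte >> 31            # shl ... wc : carry = old bit 31
        rxbyte = (rxbyte << 1) & MASK32
        if pin:
            rxbyte |= 1                 # if_nz or _rxbyte, #1
        if not carry:                   # if_nc jmp #:got_byte
            rxbyte >>= 1                # shift out the stop bit
            return rev(rxbyte, 24), count
    raise ValueError("ran out of samples before the byte completed")


def frame(byte):
    """Eight data bits, least significant first, then a high stop bit."""
    return [(byte >> i) & 1 for i in range(8)] + [1]


print(receive(frame(ord("$"))))   # (36, 9)
```

The eight 1 bits of the sentinel produce carry for the first eight shifts; the ninth shift (the stop bit sample) shifts out a 0, which is the "byte complete" signal with no separate bit counter.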
  • northcove Posts: 49
    edited 2014-08-26 19:07
    kuroneko wrote: »
    This is more efficiently done with test wc and rcl.

    I'm also a bit worried about this bit:
    :check_time         cmp    _nxtime, cnt         wc              
            if_nc       jmp     #0-0                'process next port                                          
    

    I was wondering how long it would take you to find that. Usually only takes me $ffffffff / 80_000_000 seconds maximum. ;)
  • kuroneko Posts: 3,623
    edited 2014-08-26 19:09
    northcove wrote: »
    I've deferred switching over because I'm using the C status from shl to determine when the byte is complete. _rxbyte is initialised with $ff00_0000; when the MSB is zero I know all the bits have been sampled.
    Whether you use shl wc or rcl wc doesn't really matter in this case. They both move D[31] into carry.
    northcove wrote: »
    I was wondering how long it would take you to find that. Usually only takes me $ffffffff / 80_000_000 seconds maximum. ;)
    Are you saying you put that there on purpose? :)
  • northcove Posts: 49
    edited 2014-08-26 19:27
    Nice one, thanks. I knew the fault was in the code when I posted it, figured someone would flag it. Created it a couple days ago while hastily freeing up registers.

    Also, the 1.5MBaud MCSF* in my previous table should be 3.8MHz, not 1.56MHz.
  • northcove Posts: 49
    edited 2014-08-26 22:25
    kuroneko wrote: »
    I'm also a bit worried about this bit:
    :check_time         cmp    _nxtime, cnt         wc              
            if_nc       jmp     #0-0                'process next port                                          
    

    Is there a better way than this? Would prefer to not introduce a temporary register.
                        mov     _tmp, _nxtime
                        sub     _tmp, cnt       
                        cmps    _tmp, #0            wc
            if_nc       jmp     #0-0
    
  • kuroneko Posts: 3,623
    edited 2014-08-26 22:32
    northcove wrote: »
    Is there a better way than this? Would prefer to not introduce a temporary register.
    Use cnt instead?
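                        mov     cnt, _nxtime        'as a destination, cnt addresses its shadow register
                        sub     cnt, cnt            'shadow := _nxtime - live counter (read as source)
                        cmps    cnt, #0         wc  'signed test of the difference held in the shadow
            if_nc       jmp     #0-0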
    
    You could also use a normal counter which makes the comparison easier, i.e. run any counter in LOGIC.always so the above becomes e.g.
    cmp     _nxtime, phsa wc
    if_nc   jmp     #0-0
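The worry about the original cmp _nxtime, cnt test is the 32-bit wrap of the system counter. A minimal sketch in Python (not PASM) comparing the unsigned test with the signed-difference test from the posts above; the function names are hypothetical, and the 80 MHz clock is the figure used in the wrap-time remark earlier in the thread.

```python
MASK32 = 0xFFFFFFFF
CLKFREQ = 80_000_000    # clock rate used in the post's wrap-time remark


def due_unsigned(nxtime, cnt):
    """cmp _nxtime, cnt wc : carry set (due) when nxtime < cnt, unsigned."""
    return nxtime < cnt


def due_signed(nxtime, cnt):
    """sub then cmps #0 wc : due when (nxtime - cnt) is negative as a long."""
    diff = (nxtime - cnt) & MASK32
    return diff >= 0x80000000


# Deadline scheduled 0x20 ticks ahead, straddling the 32-bit wrap of CNT.
cnt = 0xFFFFFFF0
nxtime = (cnt + 0x20) & MASK32          # wraps to 0x00000010

print(due_unsigned(nxtime, cnt))                  # True  (0x20 ticks early)
print(due_signed(nxtime, cnt))                    # False (still waiting)
print(due_signed(nxtime, (cnt + 0x21) & MASK32))  # True  (deadline passed)
print(round(0xFFFFFFFF / CLKFREQ, 1))             # 53.7 seconds between wraps
```

The signed form stays correct as long as the deadline is never more than half the counter range (2^31 ticks) away from the current count.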
    