Machine Code Question ... will it work?

Tapperman · 2015-03-01 10:47

Greetings,

I'm trying to write a tool that requires fast study of input pins. I'm hoping somebody may see a way to improve the 'speed' of the main pin study section. Any errors pointed out will be a great help also.

I hope to test it on an I2C conversation, by installing two of these objects at the same time. It should work because the edges are recorded as clock 'ticks' ... which are global?

Thank you in advance for all your attention and input

... Tim

Duane Degn · 2015-03-01 12:01

Tapperman wrote: »

Any errors pointed out will be a great help also.

Does the code work?

It seems like the Start method would immediately stop the cog soon after it was started.

PUB Start(Ptr, aPin)
  Long[Ptr] := |< aPin
  if (cog := cognew(@Capture, Ptr) + 1) > 0
    repeat until Long[Ptr] <> aPin
    stop

It looks like you use aPin to set a mask but then compare the mask against the pin number.

Doesn't this stop the cog before it has a chance to do its job?

How about adding a line to save the mask?

PUB Start(Ptr, aPin)
  Long[Ptr] := |< aPin
  aPin := Long[ptr]                     ' added line: keep the mask for the comparison below
  if (cog := cognew(@Capture, Ptr) + 1) > 0
    repeat until Long[Ptr] <> aPin
    stop

I think the above code probably does what you want.
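The mismatch described above can be checked off-target. Below is a minimal Python sketch, not Propeller code: `pin_mask` stands in for Spin's `|<` operator and `loop_exits_immediately` is a hypothetical helper that mirrors the `repeat until Long[Ptr] <> aPin` test.

```python
def pin_mask(pin):
    """Python stand-in for Spin's |< (bitwise decode) operator."""
    return 1 << pin


def loop_exits_immediately(stored, compare_with):
    """Mirrors `repeat until Long[Ptr] <> aPin`: True means no waiting happens."""
    return stored != compare_with


def check_all_pins():
    """Return (original_exits, fixed_exits) for pins 0..31."""
    original = [loop_exits_immediately(pin_mask(p), p) for p in range(32)]
    fixed = [loop_exits_immediately(pin_mask(p), pin_mask(p)) for p in range(32)]
    return original, fixed
```

Since `1 << pin` never equals `pin`, the original comparison falls through for every pin number, while comparing against the saved mask keeps waiting until the cog changes `Long[Ptr]`.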

evanh · 2015-03-01 13:45

grr, I give up, don't my spin ...

Tapperman · 2015-03-01 15:40

Duane Degn wrote: »
How about adding a line to save the mask?
PUB Start(Ptr, aPin)
  Long[Ptr] := |< aPin
  aPin := Long[ptr]                     ' added line: keep the mask for the comparison below
  if (cog := cognew(@Capture, Ptr) + 1) > 0
    repeat until Long[Ptr] <> aPin
    stop
I think the above code probably does what you want.

Great catch ... no I'm not home, until this evening ... will try it out then .... but, I originally was passing the pin number and converting it to a mask in assembly. Then decided to just pass it the mask instead ... but failed to check the repeat/until structure for exit condition. The logic was, a pin mask would be nonzero value. And if I passed pin0 as a parameter (0) then I would have to come up with another exit strategy?

Many thanks for catching that ... hoping that someone may know if the assembly loop for detecting an edge was fast enough? Or could it be made faster yet ... since I'm still pretty green at assembly on the prop.

... Tim

Duane Degn · 2015-03-01 17:41

You could speed up your loops a bit by using "decrement, jump not zero" djnz instead of comparing an accumulating counter to a set value.

Here's the code to write the data to hub.

              mov       temp, zero
              mov       index, buffer
cloop         add       temp, #1
              add       xfer, _512
              add       index, #4
xfer          wrlong    0_0, index
              sub       temp, #127              wc, nr
        if_c  jmp       #cloop

Here's the same loop using djnz.

              mov       temp, #127
              mov       index, buffer
cloop         add       xfer, _512
              add       index, #4
xfer          wrlong    0_0, index
              djnz      temp, #cloop

The djnz command doesn't increase the speed of the program in this example, since the loop misses the hub window. It could speed up the main loop a little (by removing two instructions), but I didn't study your code enough to know if djnz would make much of a difference.

I think djnz is a nice command to be aware of. It trims the three instructions used to monitor the loop down to a single instruction.
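To convince yourself that the two loops make the same number of passes, here is a small Python model of just the loop bookkeeping. It is a sketch, not PASM: the function names are made up for illustration, and the carry flag is modelled as "unsigned subtract would borrow".

```python
def compare_loop(limit=127):
    """Models: add temp,#1 ... sub temp,#127 wc,nr / if_c jmp #cloop."""
    temp = 0
    passes = 0
    while True:
        temp += 1                # add  temp, #1
        passes += 1              # loop body (the wrlong) runs once per pass
        carry = temp < limit     # C is set when temp - 127 borrows
        if not carry:            # if_c jmp #cloop falls through
            return passes


def djnz_loop(count=127):
    """Models: mov temp,#127 ... djnz temp,#cloop."""
    temp = count
    passes = 0
    while True:
        passes += 1              # loop body (the wrlong) runs once per pass
        temp -= 1                # djnz decrements ...
        if temp == 0:            # ... and jumps only while the result is non-zero
            return passes
```

Both models return 127 passes; the difference is that the first spends three instructions per pass on bookkeeping (add, sub, jmp) and the second spends one.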

Tapperman · 2015-03-01 19:21

@Duane ... I will definitely use DJNZ for the capture buffer write ... But, I'm thinking about the two timers in each cog available.

If I made one a 'high' detector ... and one a 'low' detector .... could I sense the change of state, and then subtract PHSA from CNT and get the edge tick time to within one clock 'tick' time?

I did switch around my test for edge detection and reduced it to 5 machine cycles for loop time (when no edge is present) ... but that amounts to 20 clock ticks per pass. Seems like your CNT results could be off somewhat. Here's what I changed in that section:

:NewAnchor    mov       temp, cnt
              add       temp, delay
              
:NoChange     mov       Cstate, ina
              and       Cstate, mask
              sub       last, Cstate            wz, nr
              sub       temp, cnt               wc, nr
  if_nc_and_z jmp       #:NoChange
        if_c  jmp       #getout
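For a sense of scale, the polling period of a loop like this can be worked out off-target. This is a rough Python sketch, assuming 4 system clocks per instruction and an 80 MHz system clock; `poll_period` is a hypothetical helper, not part of the posted code.

```python
CLOCKS_PER_INSTRUCTION = 4  # assumption: each instruction in the loop takes 4 clocks


def poll_period(instructions, clkfreq_hz=80_000_000):
    """Return (clock ticks per pass, nanoseconds per pass) for a polling loop."""
    ticks = instructions * CLOCKS_PER_INSTRUCTION
    nanoseconds = ticks * 1_000_000_000 / clkfreq_hz
    return ticks, nanoseconds
```

With five instructions per pass this gives 20 ticks, or 250 ns at 80 MHz, which is the worst-case uncertainty of a CNT value read by polling alone.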

What do you think about the timer thing?

... Tim

Duane Degn · 2015-03-01 19:30

Tapperman wrote: »

What do you think about the timer thing?

I've only used counters when copying examples from the PEK or similar sources. I haven't used them from within a PASM loop. Hopefully someone else will chime in on your plan to count pulses.

I made the "two instructions shorter" remark based on the last loop of the program. I didn't study the earlier part enough to know if the first loop could also be reduced. I (apparently incorrectly) assumed it used the same technique.

Tapperman · 2015-03-01 21:02

Duane Degn wrote: »

I haven't used them (COG Timers) from within a PASM loop. Hopefully someone else ...

Me either ... but I did write a SPIN version (for fun only) auto-baud detector, that trains on the first 'space-bar' received:

PRI AutoBuad                           '' runs in its own cog
  {   This routine looks for a space-bar character before training
      on the serial stream. After figuring out the baud rate the
      top level cog will remove it from the cpu.
      
     'space-bar' character (ASCII '32') looks like this:

     
                          | |<--- contains clock speed info
     Rcv Pin: -----\______/-\__/-----  ---> time
      (#31)    1    000000 1 00  1 --> stop bit never changes position
               ^    ^\----+---/        if so, frame error gen.
               |    |     |
          Line Idle |     \----------> byte in frame (LSB to MSB bits)
        Always '1'  |
                    \----------------> start bit always '0'
                    
     We see that there are 6 zeros (w/start bit) before the
     first high bit, then two trailing zeros before the stop bit.

     This routine uses two fast timers to integrate needed info.
  }

  waitcnt(clkfreq + cnt)
  
' start counters
  frqa~
  ctra := %11010 << 26 + Rcv_Pin                        'time bit = 1
  frqb~
  ctrb := %10101 << 26 + Rcv_Pin                        'time bit = 0
  Phsb~
  repeat until Phsb
    Phsa~
    Frqb++
    waitpne(|< Rcv_Pin,|< Rcv_Pin,0)          ' wait for start bit (zero)
    Frqa++                                    ' begin adding 1 
    waitpeq(|< Rcv_Pin,|< Rcv_Pin,0)          ' wait for first high bit
    Frqb := -3                                ' on next low bit count by (-3)
    waitpne(|< Rcv_Pin,|< Rcv_Pin,0)          ' wait for low bit
    frqa~                                     ' stop counting high bit time
    waitpeq(|< Rcv_Pin,|< Rcv_Pin,0)          ' wait for stop bit
    frqb~                                     ' stop counting low bit time
    if ||Phsb > Phsa                          
      Phsb~                                   ' not space-bar, clear & go again
    else
      Phsb := 1                               ' space-bar sensed, indicate all done!
    repeat until ina[Rcv_Pin]
      waitcnt(clkfreq/10 + cnt)               ' settle time needed.
      
  ' upon exit of the repeat loop, only a space-bar is sensed
  ' and we can now trust phase-a to hold one bit time only!
  
  case phsa           ' calc @ 80MHz & +or- 2% xtal error
    01361..01417: Baud := 57_600
    01418..01457: Baud := 56_000
    02042..02125: Baud := 38_400
    02126..02136: Baud := 38_200
    02509..02611: Baud := 31_250
    02722..02833: Baud := 28_800
    04083..04250: Baud := 19_200
    05444..05666: Baud := 14_400
    08167..08500: Baud :=  9_600
    16333..17000: Baud :=  4_800
    32667..34000: Baud :=  2_400
    65333..68000: Baud :=  1_200
    other:        Baud := -1
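The ranges in the case statement look like one bit time in system clocks, widened by the stated 2% tolerance. The Python sketch below reproduces that arithmetic under the assumption of an 80 MHz clock; `bit_time_window` is a hypothetical name, and entries that were trimmed to avoid overlapping a neighbour (56_000 and 38_200, for example) will not match it exactly.

```python
def bit_time_window(baud, clkfreq_hz=80_000_000, tolerance=0.02):
    """Return (low, high) bounds, in clock ticks, for one bit time at `baud`."""
    nominal = clkfreq_hz / baud
    low = round(nominal * (1 - tolerance))
    high = round(nominal * (1 + tolerance))
    return low, high
```

For instance, `bit_time_window(57_600)` gives the same 1361..1417 range as the first entry in the table.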

Anyway, when counting 'zeros' ... it counts by 'one' (at the beginning) ... then after the first '1' is detected, it counts 'zeros' by negative three (-3) ... a kind of dual slope integrator?
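The balance behind that test can be sketched in a few lines of Python. This is only a model of the arithmetic, with hypothetical names; it ignores Spin overhead and assumes ideal bit edges. The first low run is counted at +1 per tick, the second at -3 per tick, and the result is compared against one bit time.

```python
def phsb_residue(low_bits_first, low_bits_second, bit_ticks, second_slope=-3):
    """Net PHSB count: first low run at +1 per tick, second at `second_slope`."""
    return low_bits_first * bit_ticks + low_bits_second * bit_ticks * second_slope


def looks_like_space(low_bits_first, low_bits_second, bit_ticks):
    """Mirrors `if ||Phsb > Phsa`: accept when the residue is within one bit time."""
    return abs(phsb_residue(low_bits_first, low_bits_second, bit_ticks)) <= bit_ticks
```

A space-bar gives low runs of 6 and 2 bits, so the residue is 6T - 3(2T) = 0. A character such as "A" ($41, sent LSB first) gives low runs of 1 and 5 bits, a residue of -14T, and is rejected.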

Anyway, I'm playing with an idea right now ... see if I can code it??

... Tim

PS - I'll update w/snippets

Tapperman · 2015-03-03 06:09

Follow up, and greetings once again

I have a code snippet here that shows the direction I'm thinking of going with the timers:

'-------------------------------------------------------------
              mov       lastedge, cnt
              and       original, #1            wz, nr
        if_z  jmp       #:Low
        
:Hi     if_nz add       lastedge, phsb
        if_nz mov       phsb, zero
        if_nz call      #record_edge
              sub       phsa, delay             wc, nr
              xor       phsb, zero              wz, nr
  if_c_and_nz jmp       #:Low
        if_nc jmp       #getout
              jmp       #:Hi

:Low    if_nz add       lastedge, phsa
        if_nz mov       phsa, zero
        if_nz call      #record_edge
              sub       phsb, delay             wc, nr
              xor       phsa, zero              wz, nr
  if_c_and_nz jmp       #:Hi
        if_nc jmp       #getout
              jmp       #:Low
'-------------------------------------------------------------

The timers are initialized as:

CTRA --> Counts cpu clock ticks while PIN is high. Stop counting when low.
CTRB --> Counts cpu clock ticks while PIN is low. Stop counting when high.

The transitions are sensed by the timer whose phase suddenly starts advancing from zero. Which means the other timer has stopped counting cpu ticks. Adding this other phase value to LASTEDGE should tell us the point where it switched ... and record that edge in the capture buffer.
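The idea can be tried out off-target before committing it to PASM. The Python sketch below is a simplified model under stated assumptions: one counter accumulates a tick for every clock the pin is high, the other for every clock it is low, the polling loop only looks every `poll_every` ticks, and at most one edge occurs between looks. All names are hypothetical and the real counters' input sampling delay is ignored.

```python
def capture_edges(levels, poll_every=20):
    """levels[i] is the pin level (0 or 1) at clock tick i; returns edge ticks."""
    phs = {1: 0, 0: 0}        # phs[1] models PHSA (high time), phs[0] models PHSB
    current = levels[0]       # level the polling loop believes the pin is at
    last_edge = 0             # tick of the most recent recorded edge
    edges = []
    for tick, level in enumerate(levels):
        phs[level] += 1                       # hardware counters run every tick
        if tick % poll_every == poll_every - 1:
            other = 1 - current
            if phs[other] != 0:               # the other counter started moving
                last_edge += phs[current]     # stopped counter holds the duration
                edges.append(last_edge)
                phs[current] = 0              # clear it for the next time around
                current = other
    return edges
```

In this model the recorded edge times come out exact even though the loop notices each edge several ticks late, because the stopped counter froze at the moment of the transition. If two edges fall inside one polling interval the model (and presumably the real loop) loses track.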

Is there a better way? I'm not real savvy on these timers folks.

... Tim

ksltd · 2015-03-03 07:53

Tim,

You really haven't said anything about what you're trying to accomplish - "fast study of input pins" doesn't really mean anything except, perhaps, to you. A good problem statement would really help.

It seems from various stuff that you're trying to detect pulse widths. That's certainly possible, but only if the minimum pulse widths you're guaranteed to see are longer than perhaps 10ish instructions. If your signal has any glitches, your pulse detector can easily get confused, stuck and report bogus results.

Similarly, you don't explain your criteria for "better" - is that shorter minimal pulse widths? Less memory? Spin vs. assembly language? Fewer cores?

If you're trying to auto-detect baud rate, there are a number of considerations. The choice of character used becomes more important as the baud rate goes up and the distance traveled goes up. Knowing how much control you have over the "other" end of the communication link could also help.

Tapperman · 2015-03-03 11:34

ksltd wrote: »

Tim,

You really haven't said anything about what you're trying to accomplish - "fast study of input pins" doesn't really mean anything except, perhaps, to you. A good problem statement would really help.

Everybody is a critic these days! If you look at the original message to this thread ... I say

I'm trying to write a tool

Most of the people I know ... understand this statement ... and since I included a file that was written in assembler for the prop ... Me personally, if you had posted that thread, I would assume you were trying to write an assembly version of your tool. However, since not everyone thinks the way I do, I should write a problem statement as you suggest:

I wish to write an 'assembler' capture object that records the first 128 'edges' of any burst of activity on a given pin. Each edge should be recorded as the system clock 'tick' value at the instant the pin under study changed state. Also, if more than 128 edges should be detected, each edge should be tallied and the very last edge clock 'tick' value should be recorded so the user can calculate the burst length. And once complete the cog running this code should be 'freed' for other uses.

To this end, I have started a file (which is attached to this thread) that attempts to achieve this. Knowing ahead of time, that writing to HUB ram could cause the tool to miss 'edges' ... I wrote the first version to capture the edges in COG ram .. and then later xfer to HUB ram. This implies the user must pass a pointer to a buffer of at least 128 longs in HUB ram.

My very first version used WAITPNE and then read the CNT global variable ... never off in edge value by more than 4 machine cycles. But that approach does not allow me to 'test' for the end of a 'burst' ... it just hangs that COG and doesn't function until an edge occurs on the study pin.

So, that was the reason for posting the snippet for an idea I had employing the two timers each cog has at its disposal. This could possibly get much closer to the transition time ... without the use of WAITPNE.

Now, to its use ... In my first post I said:

I hope to test it on an I2C conversation, by installing two of these objects at the same time. It should work because the edges are recorded as clock 'ticks' ... which are global?

In other words ... when plotted/graphed you should be able to see the relationship between the two signals.

I hope that clears up the questions most of you may have had that encountered this mysterious thread.

... Tim

Duane Degn · 2015-03-03 11:46

Tapperman wrote: »

I hope to test it on an I2C conversation, by installing two of these objects at the same time. It should work because the edges are recorded as clock 'ticks' ... which are global?

Tim, depending on whether you want to do this the easy way or the fun way, you might want to consider one of the many inexpensive logic analyzers available.

There are logic analyzers available for less than $20 which will capture I2C communication and display the data in the communication on the PC screen.

Of course using a commercial LA doesn't require PASM so isn't going to be as much fun to use as writing your own analyzer.

(In case you're not aware, there is Propeller based LA code available.)

ksltd · 2015-03-03 15:14

You still don't say anything about the minimum pulse widths you're trying to capture. If they're small, what you're trying to do simply isn't possible. You'll get erroneous results and not be able to detect that you got them.

Do you know anything about the signal you're trying to analyze?

Tapperman · 2015-03-03 15:48

Duane Degn wrote: »

Tim, depending on whether you want to do this the easy way or the fun way, you might want to consider ...

The Fun way ALWAYS!!! That's what I bought it for ... it's a hobby, plain and simple!

Duane Degn wrote: »

There are logic analyzers available for less than $20 which will capture I2C communication and display the data in the communication on the PC screen.

Again ... I don't actually need to analyze I2C conversations ... I said "I should be able to ..." with this tool, as a kind of running test.

I never should have mentioned I2C at all ... it tends to confuse the readers, who are (in their heart) trying to help. And the subject goes off topic ... and the thread more or less gets hi-jacked ... we've all seen that one happen.

I think I may even be guilty of that one myself, in the past?

Anyway, returning to the subject at hand ... did you understand any of the timer snippet?

Because if you did ... forget it! I found an even easier approach! I think I can use a second pin and generate a square wave output and adjust my WAITPNE mask to include the study pin and the pin generating a square wave. Read the edge CNT immediately after the waitpne and determine which pin caused the interruption? This would keep me from 'hanging' the cog, free up one of the timers, while aligning the CNT with the edges that I record. I don't know if I could get it any faster reading than that?

Working on code ... later

... Tim

Tapperman · 2015-03-03 17:44

ksltd wrote: »

You still don't say anything about the minimum pulse widths you're trying to capture. If they're small, what you're trying to do simply isn't possible. You'll get erroneous results and not be able to detect that you got them.

Do you know anything about the signal you're trying to analyze?

I agree that at some point "what you're trying to do simply isn't possible" ... but I'm not so sure about your comment "... and not be able to detect that you got them"

If I use 2 timers and set them on edge detection ... One high one low ... if the total edges did not match what the capture routine sensed as 'total edges', then I would be able to tell what I captured was total 'Smile'.

And, I plan on using that scheme to send a test signal into the routine with a 'square wave' to determine at just what frequency that happens at? A kind of feed back.

... Tim

PS - good input!

Tapperman · 2015-03-09 09:26

ksltd wrote: »

... You'll get erroneous results and not be able to detect that you got them ...

In an attempt to catch those 'erroneous' results you refer to, I have written this piece of code to run in a cog at the same time the capture cog is running.

The theory of operation is the edge detection mode (timer A, positive edge. timer B, negative edge) combined with adding 512 instead of 1 on each edge ... and writing the clock 'tick' count to a capture buffer at the address generated by adding both PHSA and PHSB.

If any edges are missed in the other cog capturing counts ... this capture buffer will have a record that can be studied to prove the other cog's data is valid or junk.

However, I must be doing something wrong ... as it does not seem to operate? Where oh where did my code go wrong?

... Tim

Tapperman · 2015-03-14 17:16

ksltd wrote: »

You'll get erroneous results and not be able to detect that you got them.

Just playing around and was curious ... has anyone ever mapped their logic in cog ram for an app?

Here's an example: dira mask for low order 9 bits (%0_0011_1100)

DAT

'Logic concept
{
                4 bits       2 bits
            ┌──────────┐   ┌───────┐
   I/O pins │ state#   │   │ input │----+
            └──────────┘   └───────┘    |
                                        |
                        SCK-SDA         |
            ┌─────┬─────┬─────┬─────┐   |
      State │ 00  │  01 │  11 │  10 │ <-+
      ──────┼─────┼─────┼─────┼─────┤
        0   │ 0/36│ 1/24│ 3/0 │ 2/4 │
        4   │ 4/8 │ 5/36│ 7/0 │ 6/4 │
        8   │ 8/8 │ 9/12│11/36│10/16│   <-- each table entry shows
       12   │12/8 │13/12│15/20│14/36│       cog ram address / next state
       16   │16/8 │17/36│19/0 │18/16│
       20   │20/36│21/12│23/20│22/4 │
       24   │24/28│25/24│27/0 │26/36│
       28   │28/28│29/24│31/36│30/32│
       32   │32/28│33/36│35/0 │34/32│
       36   │36/36│37/36│39/36│38/36│  <--- Error state trap.
      ──────┴─────┴─────┴─────┴─────┘
}

              ORG       64       ' leave room for a 6 bit table
        ' init not shown.
Loop          mov       outa, 0_0
              movs      Loop, ina
              jmp       #Loop

              fit

This table is configured for I²C (pin1 clock, pin 0 data) ... following the code from a normal start (both inputs '1'), the first move will write zero, but the direction mask prevents the lowest 2 bits from being included in the write. Therefore the contents of long address 3 (lower 9 bits only) is written to the source field of the Loop's mov instruction. Output bits are placed above the lower order 9 bits in each of the entries of the cog ram table

Whenever the next state is the same as the current state ... no state change occurs (stable state). Each row of the table has one stable state ... and, one 'impossible' state which represents both input bits changing at the same time (used for error detection)

IF the execution of the loop is fast enough, only one bit will change at a time ... but, if it's too slow, the error detection state will inform us of the condition.
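The table-driven loop can be exercised on a PC before it goes into a cog. Below is a Python sketch of the lookup only, with the next-state values copied from the table in the comment block; `ROWS`, `TABLE` and `run` are hypothetical names, the output bits stored above bit 8 are left out, and the inputs are encoded as SCK in bit 1 and SDA in bit 0.

```python
ERROR = 36  # the "error state trap" row of the table

# state -> next state for inputs 0b00, 0b01, 0b10, 0b11 (SCK = bit 1, SDA = bit 0)
ROWS = {
    0:  (36, 24,  4,  0),
    4:  ( 8, 36,  4,  0),
    8:  ( 8, 12, 16, 36),
    12: ( 8, 12, 36, 20),
    16: ( 8, 36, 16,  0),
    20: (36, 12,  4, 20),
    24: (28, 24, 36,  0),
    28: (28, 24, 32, 36),
    32: (28, 36, 32,  0),
    36: (36, 36, 36, 36),
}

# Flatten into a list indexed by "cog address" = state + input bits.
TABLE = [ERROR] * 40
for state, next_states in ROWS.items():
    for pins, next_state in enumerate(next_states):
        TABLE[state + pins] = next_state


def run(inputs, state=0):
    """Feed a sequence of 2-bit input samples through the table; return the states."""
    trace = []
    for pins in inputs:
        state = TABLE[state | pins]   # states are multiples of 4, so | acts like +
        trace.append(state)
    return trace
```

Starting idle with both lines high, `run([0b11, 0b10, 0b00, 0b10, 0b11])` walks through states 0, 4, 8, 16 and back to 0, which appears to correspond to a start condition followed directly by a stop. Changing both inputs in one sample, as in `run([0b11, 0b00])`, lands in state 36 and stays there.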

What do you think? (file it / flush it)

... Tim

PS - Anyone think the loop code is too long?

msrobots · 2015-03-14 17:48

ORG 64 might not do at all what you think it does (and what it does in a lot of assemblers).

It will assemble the code as if it would run from cog memory 64, but still put the code at cog memory position 0, thus rendering the code useless. Unless you relocate the code from cog memory 0 to cog memory 64 by yourself in your init code.

So whatever you put in the ORG statement will just affect the register addresses used in the code, NOT the position in the cog ram of the assembled code.

coginit and cognew will always load your code starting at cog memory address 0 and jump to cog memory address 0.

Using ORG with something other than 0 will only work if you read that code from hub yourself, inside the cog, and copy it into the desired address range in the cog. Think of overlay code or things like that.

Still not sure what Chip was thinking there to make that decision. But it is what it is.

If your code really needs uninitialized data in the first 64 longs, do an ORG 0 and fill the first 64 longs with NOP (0).

Or - better - put the data behind your code with res...

Enjoy!

Mike

Tapperman · 2015-03-14 20:17

msrobots wrote: »

If your code really needs uninitialized data in the first 64 longs, do an ORG 0 and fill the first 64 longs with NOP (0).

Thanx ... I never would have caught that!!!!!!!!!!!!

... Tim

Tapperman · 2015-03-14 23:20

Okay,

Help me here guys.

Am I correct in assuming that once this code is correctly debugged ... Only the table and mask change for a different logic problem ... the code stays the same?

And therefore, the table could be loaded from somewhere else. Did I just stumble across something here?

... Tim

kuroneko · 2015-03-15 00:45

@Tapperman: can you guarantee that ina[8:6] stays low? Otherwise you may output values from outside the table. Also, you may not want to output entry 0 first (first insn at Loop), so some form of initialisation might be in order. Remainder looks OK, arguably the table could be written as long 0[65] but that's just me ...

:xfer can be preinitialised (unless you want live reload). So can temp (temp long 65).

Tapperman · 2015-03-15 01:03

kuroneko wrote: »

... can you guarantee that ina[8:6] stays low?

No ... unless, it's the only cog running ... 1 trumps 0, and all that jazz. Or, no cog outputs to pins 0..9 except that code.

              mov       dira, mask
              jmp       #Init

Loop          mov       outa, 0_0
Init          movs      Loop, ina
              jmp       #Loop

And I think that takes care of a tiny oversight on startup. Thanks for spotting that!

At present, I'm making up a table that provides the following output pins ... for test:

- I²C Start Bit detected
- I²C Data flowing
- SPI data flowing
- I²C Stop bit detected
- Bus Idle
- Error ... overflow

Some may think a couple of these are redundant ... but the start bit detect COULD be followed by nothing more than a stop bit ... aka, no device code, no direction bit, no ack/nack bit.

Perhaps, use that as a non-standard prelude to an async serial feed over the data line ... say, supplying a COG_Table?

... Tim
