Shop OBEX P1 Docs P2 Docs Learn Events
High Performance Needed — Parallax Forums

High Performance Needed

WaffenGeistWaffenGeist Posts: 5
edited 2012-04-12 19:35 in General Discussion
I need to process 10 to 20 million bytes per second. Needless to say that the Propeller is not powerful enough to perform this.

I have a true random numbers generator that outputs bytes and I need to rapidly count them to see which bytes occur more often out of 1 trillion bytes.

If you needed to count 20 million bytes per second all the way up to 1 trillion bytes, what hardware would you use ?

Thanks
«1

Comments

  • User NameUser Name Posts: 1,451
    edited 2012-04-11 09:45
    Well, you might be surprised. It mostly boils down to how the bytes are being delivered. But in general, eight parallel processors go a long way toward decimating such a task.

    In a cryptography project a few years ago I plopped several propellers on a perfboard, and with surprisingly little hassle and time broke up a very big problem very nicely.

    Anyway, you'll probably have to provide more info for anyone to make specific recommendations.
  • rod1963rod1963 Posts: 752
    edited 2012-04-11 09:54
    You aren't giving much info but from what have given, see if you can host the programs on a PC/laptop/netbook.
  • Spiral_72Spiral_72 Posts: 791
    edited 2012-04-11 12:35
    I'll bet a Prop will rip right through those numbers actually. I think I'd use two cogs, one for communications and a read buffer, the other one for the "check"

    If the random numbers are a byte, that's 0-255 or 0-FF so I'd setup a data table in RAM with 256 slots. You'll need a pointer to keep track of your current read in the buffer.

    MYTABLE[read value]=+1
    increment buffer pointer one byte
    get next number

    When you are all done, you'll have a data table with 256 numbers in it. Each position in the table corresponds to the number and it's number of occurrence. That would be a thousand times faster than IF - THEN statements.

    disclaimer: it's been a couple months since I've touched my prop unfortunately, so the "code" above is not code. You'll have to work that out. It'll be slightly more complicated than that since you'll have to deal with overrunning and overwriting your data table, but you get the idea.
  • Mike GreenMike Green Posts: 23,101
    edited 2012-04-11 12:50
    A Propeller could do this easily assuming that the bytes are available as 8 (parallel) I/O lines plus a clock whose leading edge indicates that a new byte value is available. The normal Propeller's clock is 80MHz and it can easily be overclocked with some simple attention to PCB layout and filtering. Instructions mostly take 50ns to execute. That leaves about 10 instruction per sample at the higher rate you mentioned. With byte values, you'd keep an occurance table of 256 32-bit longs in the cog's memory and increment one based on the 8-bit random number received. This could be easily done in 10 instructions and at a 20MHz rate. You would use an external pushbutton or other signal to stop the counting briefly (maybe 50ms) to take a snapshot of this table, copy to the "hub" shared memory, then resume the counting. The other cogs could transfer the data, either in raw form or sorted or otherwise analyzed to a PC or display it graphically on a VGA display, allow you to enter commands to modify the display using a PS/2 keyboard, etc.

    The snapshot could also be triggered based on elapsed time since there's a 32-bit system-wide clock counter with a 12.5ns resolution. If it's important to never drop any samples, two cogs could alternate the counting of samples with the copying of the counts to the hub memory buffer and the counts could be combined there.
  • jmgjmg Posts: 15,184
    edited 2012-04-11 16:01
    I need to process 10 to 20 million bytes per second. Needless to say that the Propeller is not powerful enough to perform this.

    I have a true random numbers generator that outputs bytes and I need to rapidly count them to see which bytes occur more often out of 1 trillion bytes.

    If you needed to count 20 million bytes per second all the way up to 1 trillion bytes, what hardware would you use ?

    Thanks

    How about wait for Prop II ? :) - or is the assignment due in, before then ?

    First you need to nail down some corners,

    10^12/2^32 = 232.830
    10^12/20^6 = 15625s 10^12/20^6/60/60 = 4.340 hours

    The first number shows 32 bit counters could be marginal, on big skews, and 10^12 needs careful control.
    You could consider 40 bits, and more likely 48 bits ?

    The code is then a repeat of
    INC(CounterArray[ByteValue])
    and
    CounterArray is 256 x 48 bit Memory.

    A PC can just stream 20Mbytes/s over high speed USB, so a FTDI High speed + PC is one pathway.

    Or, you could use a small FPGA/CPLD - they can easily increment memory at 20 MHz, and you need just 1536 bytes of memory.
    Streaming rates of 50MHz-100MHz+ should be possible here.


    ["Needless to say that the Propeller is not powerful enough to perform this."]
    Well, certainly not easily, at a student level.... but...

    At 20 MBytes/sec, data is arriving at exactly a Prop 1's nominal opcode rate per cog.

    - but lets imagine we can phase-lock 4 identical coded cogs, each sampling at 200ns, but 50ns apart.
    Here is a rough once-over of how this might work :

    You now have spread your total over 4 arrays of numbers, adding at 1/4 the rate, and have 4 opcodes to play with...
    Candidates would be
    MOVD D,S Insert S[8..0] into D[17..9]
    This allows the Port pins, to self-modify a Destination byte ready for a fixed S=1 add
    100000 001i 1111 ddddddddd sssssssss ADD D,S Add S into D
    and a small fix is needed to avoid the very-ends of COG RAM address range

    and finally
    DJNZ D,S Dec D, jump if not zero to S (no jump = 8 clocks, jump = 4 clocks)

    On average, each 32 bit counter bucket will be 22.7% full at 10^12, but to reach 10^12/2 in each cog, we need 58.207 * 2*32
    - so that's another problem....
    Choices are to unroll to 59 or 60, or work on less than 10^12, and collect later.

    MOVD AddressOfADDOpCode,ByteFromPort
    ADD AddressOfADDOpCode,OffsetAroundCode
    ADD Dummy_Address,1 ; INC the actual array, note Dummy_Address has been changed each time.
    NOP / DJNZ ; unroll as needed

    We have 512 x 32 bit registers, but the top 16 are special, and code starts at bottom, so 512-16-60*4 = 256 (!!)
    - but that does not allow cog-sync ; unroll of 59, allows 4 opcodes to sync - Possible ?

    Of course, this is just a quick scribble, but it looks like 4 cogs could _just_ cope, and manage 10^12/4 loop as well.
    I make each cell with an ideal value of 976562500, over the (almost) 10^12 samples.
    A Prop II would gain in MHz and also the new Loop opcode saves 25%

    Notice this counting code, will return a prefect random score card, for a binary counter - so perhaps a distance check would catch that - but that has a time-penalty.
  • jmgjmg Posts: 15,184
    edited 2012-04-11 16:03
    Mike Green wrote: »
    Instructions mostly take 50ns to execute. That leaves about 10 instruction per sample at the higher rate you mentioned.
    Not on my calculator ? - I make 20MBytes / sec, equal to the instruction rate in one cog ?
  • Mike GreenMike Green Posts: 23,101
    edited 2012-04-11 16:25
    @jmg,
    Oops, I'm indeed off by an order of magnitude. Sorry.
  • kwinnkwinn Posts: 8,697
    edited 2012-04-11 21:14
    As User Name says “*It mostly boils down to how the bytes are being delivered”. If they are coming at a fixed rate a single propeller programmed in PASM could probably do the job using several cogs. Even with a variable rate (50ns or more between bytes) it could be doable with the help of a little hardware to indicate a valid byte.

    With truly random distribution a trillion bytes distributed over 8 cogs would be 125 billion per cog, and distributed over 256 longs that would be a count of 488,281,250 per long. Even if the distribution is not perfectly random each long can count up to 4,294,967,295 which is 8 times what the average count should be for a perfectly random distribution.
  • pik33pik33 Posts: 2,402
    edited 2012-04-11 23:27
    Mike Green wrote: »
    @jmg,
    Oops, I'm indeed off by an order of magnitude. Sorry.

    You have 8 cogs
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-04-11 23:48
    You can accumulate data at 10 MHz using a single cog, under the following conditions:

    1. The data is eight bits parallel.
    2. The data can be synchronized to a 10 MHz clock provided by the Propeller.

    The trick here is to input the data on P16..9, which is the lower eight bits of a PASM instruction's 9-bit destination field. Then setup dira and outa, so that an instruction executed from ina adds one to the bin selected by the 8-bit input and sets the carry flag on overflow. The next register after ina is inb, a normal register on the P8X32A, and it will contain a jump-on-no-carry instruction back to ina. If a bin overflows, the jump will not be taken, and the instruction in outa will be executed. But that just adds an extra one to bin zero, which can be subtracted from the results. Then the instruction in outb gets executed: a jump to the report section of the code, which writes all of the bin contents to the hub.

    Here's the skeletal code:
    CON
    
      _clkmode      = xtal1 + pll16x
      _xinfreq      = 5_000_000
    
    VAR
    
      long histo[257]
    
    PUB  start
    
      cognew(@acquire, @histo)
      repeat until histo[256]       'Wait for "done" flag.
                                    'Start serial I/O here, and output the results.
    DAT
    
    acquire       jmp       #setup
                  long      0[255]
                  
    setup         mov       0,1 wz                  'Clear first instruction and set Z flag.
                  mov       dira,dira0
                  mov       inb,inb0
                  mov       outa,outa0
                  mov       outb,outb0
                  mov       frqa,frqa0
                  mov       phsa,phsa0
                  mov       ctra,ctra0
                  jmp       #ina
    
    report        'Write the accumulated data back to the hub, then set "done" flag in histo[256].
    
    dira0         long      %111111_1111_1111_100000000_111111111
    outa0   if_z  add       0-0,#1 wc
    'Could be if_nc_or_z if CLK is high, but it doesn't matter, since Z is set anyway.
    'Same as      long      %100000_0111_101*_0********_000000001
    '                                         ├─DATA─┤ Bits 16..9
    '                                       └CLK output Bit  18
    inb0    if_nc jmp       #ina
    outb0         jmp       #report
    
    frqa0         long      $2000_0000
    phsa0         long      0                       'Set to correct value so data is stable during ina instruction fetch.
    ctra0         long      %00100 << 26 | 18       'Output a 10 MHz clock on P18.
    
    {{ When configured, this is what SFRs will look like, in order:
    
    ina     if_z  add       0-0,#1 wc
    inb     if_nc jmp       #ina
    outa    if_z  add       0,#1 wc               'Accumulates one extra count.
    outb          jmp       #report
    }}
    

    The clock signal comes out on P18, which is one of the condition code bits of the add instruction. The condition for this instruction is chosen such that the state of this bit does not matter. phsa must be initialized so that the data is stable when the add instruction in ina is fetched.

    Anyway, I haven't tried this, but I believe the principle is sound. It's not. Se post #19.

    -Phil
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2012-04-12 00:06
    This really depends on the actual scope of the task. The Propeller may do quite nicely, but if you really want fast parallel processing in the extreme - Russian hackers have been using Nvida video cards as specifically tasked parallel processors to hack passwords. They happen to be faster than a PC in some contexts. The only drawback is that you will have to make your own way with learning their system and creating code.
  • jmgjmg Posts: 15,184
    edited 2012-04-12 01:08
    The trick here is to input the data on P16..9, which is the lower eight bits of a PASM instruction's 9-bit destination field.

    Nifty. ( I guess this Prop is doing nothing else anyway )
    Then setup dira and outa, so that an instruction executed from ina adds one to the bin selected by the 8-bit input and sets the carry flag on overflow. The next register after ina is inb, a normal register on the P8X32A, and it will contain a jump-on-no-carry instruction back to ina

    I think that means it loops until any one counter saturates (overflows), and then all others will be fractions from that ?
    Not quite the original spec, but a quite nice way to normalise results.
  • LeonLeon Posts: 7,620
    edited 2012-04-12 02:08
    pik33 wrote: »
    You have 8 cogs

    How does that help in this situation?
  • jmgjmg Posts: 15,184
    edited 2012-04-12 03:22
    Leon wrote: »
    How does that help in this situation?

    You can spread the load, by interleaving samples over multiple COGS ?
    It means more housekeeping, and you need to sum the totals from each cog before you are done, but it does buy
    some cycles.
  • LeonLeon Posts: 7,620
    edited 2012-04-12 03:28
    I can't see it being useful for this particular problem, though. The inputs need to be be processed by a single cog, don't they?
  • jmgjmg Posts: 15,184
    edited 2012-04-12 04:06
    Leon wrote: »
    I can't see it being useful for this particular problem, though. The inputs need to be be processed by a single cog, don't they?

    Why do they 'need to be be processed by a single cog' ?

    All the OP wants to do, is sum the bytes as a histogram, for the nominal 1 trillion bytes.

    So they do not care if that is 4 x 5 MByte streams summed, or one 20MB stream.
  • LeonLeon Posts: 7,620
    edited 2012-04-12 04:12
    It seems to be a somewhat academic exercise, anyway, as the OP hasn't commented on any of the suggested solutions. We don't even know whether the data are parallel or serial. Timing details would be useful, as well.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-04-12 08:36
    Kuroneko has informed me by email that reading an instruction from the input pins will not work. Instructions are always fetched from the shadow registers, so nothing that appears on the pins will affect an instruction read from ina. Oh, well. The best laid plans .... :)

    -Phil
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-04-12 12:19
    Here's another attempt, this time more canonical. It uses four interleaved cogs to read and accumulate data at 20 MHz. The data must be synchronized to the 20 MHz clock output from the Prop. Here's the skeletal code (untested):
    {{
    20 MHz data analyzer. Parallel data is input on P7..P0, synchronized to 20 MHz clock on CLK_PIN.
    Four cogs take turns accumulating data in four 20 MHz bursts each, as determined by the quadrature
    clocks on I_PIN and Q_PIN:
    
        &#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570;&#61573;&#61570; CLK_PIN (20 MHz)
        &#61570;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61573;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61570; I_PIN (1.25 MHz)
        &#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61570;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61574;&#61573;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569;&#61569; Q_PIN (1.25 MHz)
        
        &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
        &#9474; READ  &#9474; Accum &#9474; Loop & wait...&#9474; histo2
        &#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
        &#9474;wait...&#9474;  READ &#9474; Accum &#9474; Loop &&#9474; histo3
        &#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
        &#9474; Loop & wait...&#9474; READ  &#9474; Accum &#9474; histo1
        &#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
        &#9474; Accum &#9474; Loop & wait...&#9474; READ  &#9474; histo0
        &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
    
    }}
    
    CON
    
      _clkmode      = xtal1 + pll16x
      _xinfreq      = 5_000_000
    
      Q_PIN         = 9
      I_PIN         = Q_PIN + 1
      CLK_PIN       = 11
    
    VAR
    
      long histo0[257], histo1[257], histo2[257], histo3[257]
    
    PUB  start
    
      dira[CLK_PIN]~~               'Drive clock pins low to start.
      dira[I_PIN]~~
      dira[Q_PIN]~~
      cognew(@acquire, @histo0)
      cognew(@acquire, @histo1)
      cognew(@acquire, @histo2)
      cognew(@acquire, @histo3)
      cognew(@quadrature, 0)
      repeat until histo0[256] and histo1[256] and histo2[256] and histo3[256] 'Wait for all "done" flags.
    
      'Code to output accumulated data.
    
    DAT
    
    ''-------[ 5 MHz Quadrature Output ]-------------------------------------------
    
                  org       0
    quadrature    mov       dira,:dira0
                  mov       frqa,:frqx0
                  mov       frqb,:frqx0
                  mov       phsa,:phsa0             'ctra will lead ctrb by 90°.
                  mov       ctra,:ctra0             
                  mov       ctrb,:ctrb0
                  jmp       $                       'Keep cog alive.
    
    :dira0        long      1 << I_PIN | 1 << Q_PIN
    :frqx0        long      $0200_0000              '1.25 MHz.
    :phsa0        long      $4000_0000 - $0800_0000 'Quadrature, less count in one instruction time.
    :ctra0        long      %00100 << 26 | I_PIN    'ctra outputs on I_PIN.
    :ctrb0        long      %00100 << 26 | Q_PIN    'ctrb outputs on Q_PIN.
    
    ''-------[ Data Acquisition and Accumulation ]---------------------------------
    
                  org       0
    acquire       jmp       #:start
                  long      255[0]                  'Data bins, including address 0.
                  
    :start        mov       0,#0                    'Clear bin 0 of jmp instruction.
                  mov       :quad,par               'Each successive hub array is offset by %100 in last three bits.
                  shr       :quad,#2                'Isolate them.
                  and       :quad,#3 wz             'Cog with histo0 drives the data clock.
                  shl       :quad,#Q_PIN            'Shift match mask into position.
            if_z  mov       dira,:dira0             'If histo0 set up to output data clock.
            if_z  mov       frqa,:frqa0
            if_z  mov       phsa,:phsa0
                  waitpeq   :mask,:mask             'Wait for quadrature cog to start.
            if_z  mov       ctra,:ctra0             'Start data clock, sync'd with quadrature.
    
    :loop         waitpeq   :quad,:mask             'Wait for our turn.
                  movd      :add0,ina               'Read data at 20 MHz burst.
                  movd      :add1,ina
                  movd      :add2,ina
                  movd      :add3,ina
    :add0         add       0-0,#1 wc               'Accumulate data that was read.
    :add1   if_nc add       0-0,#1 wc               'For shorter sampling epochs, something other than #1 can be used.
    :add2   if_nc add       0-0,#1 wc
    :add3   if_nc add       0-0,#1 wc
            if_nc jmp       #:loop                  'Loop back if no overflows.
    
    'Code here to write counts back to hub and flag to histox[256].
    
    :dira0        long      1 << CLK_PIN
    :frqa0        long      $4000_0000              '20 MHz.
    :phsa0        long      0                       'Adjust so that data is stable relative to waitpeq at :loop.
    :ctra0        long      %00100 << 26 | CLK_PIN  'Data clock outputs on P11.
    :mask         long      1 << I_PIN | 1 << Q_PIN
    
    :quad         res       1
    

    I dunno, though. Rather than engaging in the discussion, the OP seems to have left the building. This may not have been worth the effort I put into it.

    -Phil
  • LeonLeon Posts: 7,620
    edited 2012-04-12 12:26
    I'd like to see how he's generating the random bytes at 20 MHz. That isn't easy.
  • User NameUser Name Posts: 1,451
    edited 2012-04-12 14:03
    I dunno, though. Rather than engaging in the discussion, the OP seems to have left the building. This may not have been worth the effort I put into it.

    Wow! A slick trick like that is never wasted. It's now in the forum archive and I, for one, have also squirreled it away for my own delectation and benefit.
  • User NameUser Name Posts: 1,451
    edited 2012-04-12 14:11
    Leon wrote: »
    I'd like to see how he's generating the random bytes at 20 MHz. That isn't easy.

    Yup. A multiplicative congruential generator in an LPC1768 might do it. A hardware LFSR would surely do it. ...Just the first two thoughts that come to mind.
  • LeonLeon Posts: 7,620
    edited 2012-04-12 14:18
    They are real random numbers, though, not pseudo-random ones. That's a lot harder at that frequency.
  • jmgjmg Posts: 15,184
    edited 2012-04-12 15:26
    Leon wrote: »
    They are real random numbers, though, not pseudo-random ones. That's a lot harder at that frequency.

    Maybe - notice that a simple 20Mhz counter will pass this 'Random Test', at least for a novice ;)
  • jmgjmg Posts: 15,184
    edited 2012-04-12 16:14
    I dunno, though. Rather than engaging in the discussion, the OP seems to have left the building. This may not have been worth the effort I put into it.
    -Phil

    It may be a Homework type question, but these are actually very good for defining just what is possible and where the limits are.
    So the effort is certainly worthwhile, as it is now 'on record'.

    Like my #6 this refined version also uses 4 cogs, but unlike #6 you do burst-sync.

    If I read it right, this code also does not count a specific total number of bytes, but exits on the first overflow [full histogram] (which could get interesting with 4 cogs making different exit decisions - thus breaking the 'same counts' rule ? )

    The outline in #6 relied on phased-starts of 4 COGS, which I presume is easy to do ?
    If we skip the offset add, by the cleaner JMP replace you do, that is 3 opcodes including a total-sample counter.
    So that could fit in 3 COGS and meet the data rate budget ?

    Which raises a new question, of could this spread over 6 cogs, and sync to a half opcode, for a 40MHz effective rate ?
    That would need 2 CLK granularity on first opcode launch in each COG, which I think is legal ?

    The ideal would be to have ONE copy that is launched 4 times with a param tweak to set the go-time.
  • User NameUser Name Posts: 1,451
    edited 2012-04-12 16:37
    Leon wrote: »
    They are real random numbers, though, not pseudo-random ones. That's a lot harder at that frequency.

    What 'real' random numbers are you referring to? The ones mentioned by the OP? Since he never clarified a thing, how do you know they were 'real'? And if they were truly 'real' why would he be interested in a histogram?
  • LeonLeon Posts: 7,620
    edited 2012-04-12 17:25
    He said they were true random numbers.
  • User NameUser Name Posts: 1,451
    edited 2012-04-12 18:47
    Then all I can imagine is that he's using something like an avalanche diode and an 8-bit flash ADC. Those two items are common and inexpensive; though I couldn't speculate as to any embellishments that might be involved. In fact it's just possible that's the reason for the histogram requirement. Maybe it's to look for missing codes or other non-linearity.
  • LawsonLawson Posts: 870
    edited 2012-04-12 19:20
    Let me take a stab at doing this with one cog at 10MHz. I'm making the same assumptions that Phill did, except I put the data buss on P0-P7. In this case, the count starts at -1 and counts down until zero is reached. Once one bin hits zero the code should exit. This is untested and may blow up due to the Prop's pipeline. If it works, the bin array will be one count behind vs the input port so the "cleanup code" will need to check the destination address of the instruction at :loop2 and subtract one from that bin.
    'assume a parallel data buss on pins P0-P7
    'synchronised to a clock provided by the propeller       
    
    org 0
    start   jmp #start2
    hist    long  -1
            long  -1
            'repeat the above line 255 times
    start2  mov   start, hist       'reset first bin of histogram to -1 (should be at cog address 256 or higher)
            'setup a counter for a 10MHz clock
    
    :loop   movd  :loop2, ina
    :loop2  djnz  0-0,      #:loop
    
            'one of the bins overflowed
            'start the cleanup code
    

    Lawson
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-04-12 19:35
    Hey, Lawson, that's pretty clever! You'll get an extra count in bin 0, due to pipelining the first time though, but you can always start that one at zero instead of -1.

    BTW,
    hist long -1[255]

    will repeat the long for you.

    -Phil
Sign In or Register to comment.