Bursting data to/from Cog RAM- 6.6MB/sec using 9 pins? - Page 2 — Parallax Forums


  • jazzed Posts: 11,803
    edited 2009-05-06 20:55
    Phil Pilgrim (PhiPi) said...

    jazzed,

    I'm not sure I follow you. One of the gotchas with the carry flag and shifts/rotates is that only the starting bit (31 or 0) gets shifted into carry. IOW, the carry bit doesn't act like a super MSB or LSB when you do a shift or rotate with a wc.

    -Phil


    Ok, I see it now. The original value of bit 0 is put into C, so to get the original value, one must first shr by 7. But you did say that in so many words. Thanks again :)


    Welcome to our collaboration Dr. Jim

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve


    Propalyzer: Propeller PC Logic Analyzer
    http://forums.parallax.com/showthread.php?p=788230

    Post Edited (jazzed) : 5/6/2009 9:07:57 PM GMT
  • Giemme Posts: 85
    edited 2009-05-06 21:08
    @Mikediv

    The other chips are:
    2 x 512 Kb SRAM (K6X4008C1F)
    1 x Latch 74HC573N

    regards

    Gianni
  • lonesock Posts: 917
    edited 2009-05-06 21:10
    Taking jazzed's & Phil's code and tweaking it a bit, I can use a simple MOV for the first byte (so the 0 bit is actually in D[0]), then a ROR to get it into position, writing the Carry flag at that point (which pulls the carry flag from, coincidentally, D[0]), saving an instruction later:

                  mov       val,ina     'xxxxxxxxxxxxxxxxxxxxxxxxAAAAAAAA .
                  ror       val,#17 wc  'xxxxxxxxxAAAAAAAAxxxxxxxxxxxxxxx A0
                  movi      val,ina     'bBBBBBBBBAAAAAAAAxxxxxxxxxxxxxxx A0
                  shr       val,#8      '........bBBBBBBBBAAAAAAAAxxxxxxx A0
                  movi      val,ina     'cCCCCCCCCBBBBBBBBAAAAAAAAxxxxxxx A0
                  shr       val,#8      '........cCCCCCCCCBBBBBBBBAAAAAAA A0
                  movi      val,ina     'dDDDDDDDDCCCCCCCCBBBBBBBBAAAAAAA A0
                  rcl       val,#1      'DDDDDDDDCCCCCCCCBBBBBBBBAAAAAAAA A0
    
    
    



    Did I miss something, or would that work?
    Jonathan
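    A minimal Python model of the sequence above, as a sanity check rather than a substitute for testing on hardware. It assumes the documented Propeller behaviour that ROR with WC writes the original bit 0 to C, that MOVI copies S[8:0] into D[31:23], and that RCL shifts C into the low bit. The helper names (ror_wc, movi, rcl, pack_long) are made up for this sketch.

```python
MASK32 = 0xFFFFFFFF

def ror_wc(value, bits):
    """ROR ... WC: rotate right; the carry written is the original bit 0."""
    carry = value & 1
    rotated = ((value >> bits) | (value << (32 - bits))) & MASK32
    return rotated, carry

def movi(dest, src):
    """MOVI: copy src[8:0] into dest[31:23]; dest[22:0] is untouched."""
    return (dest & 0x007FFFFF) | ((src & 0x1FF) << 23)

def rcl(value, bits, carry):
    """RCL: shift left, filling the vacated low bits with the carry flag."""
    fill = (1 << bits) - 1 if carry else 0
    return ((value << bits) & MASK32) | fill

def pack_long(ina_samples):
    """Pack four successive INA reads (data on P0..P7) into one long."""
    a, b, c, d = ina_samples
    val = a                        # mov  val,ina
    val, carry = ror_wc(val, 17)   # ror  val,#17 wc
    val = movi(val, b)             # movi val,ina
    val >>= 8                      # shr  val,#8
    val = movi(val, c)             # movi val,ina
    val >>= 8                      # shr  val,#8
    val = movi(val, d)             # movi val,ina
    return rcl(val, 1, carry)      # rcl  val,#1

# The upper 24 bits are deliberately junk, standing in for the other pins.
samples = [0xFFFFFF00 | byte for byte in (0x11, 0x22, 0x33, 0x44)]
print(hex(pack_long(samples)))
```

    In this model the junk on P8..P31 never reaches the result: each MOVI overwrites the stray ninth bit left by the previous one, and the final RCL pushes the last one out of the top.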

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    lonesock
    Piranha are people too.

    Post Edited (lonesock) : 5/6/2009 9:35:36 PM GMT
  • jazzed Posts: 11,803
    edited 2009-05-06 21:17
    MagIO2 said...
    Xilinx XC9572
    Nice part. Also I noticed the data sheet mentions PQFP-100 ... (wider pin spacing than TQFP-100). Is it generally available? Digikey has a VQFP-64 with 52 IO; is that a typo? This is more attractive than the Altera Max3064.

  • MagIO2 Posts: 2,243
    edited 2009-05-06 21:33
    I got the PLCC84 version. For that sockets are available. So you can even do wirewrap prototyping. With only 2 74HC125 and some resistors you can build the programming interface which is supported by the XILINX development environment. Ok ... my current version is 5V, but the shift register with which I tested the video generator shift out is 5V as well and it works fine. So for prototyping it should work. First I have to find the cable - we just moved ;o)
  • Hanno Posts: 1,130
    edited 2009-05-06 21:45
    You guys rock! I run away for a round of golf and look what everyone has come up with! I think we're onto something here. I like the idea of having just 9 IO pins- 8 for the data which will be burst, and 1 as a serial interface to tell some other device/cog whether you want to read or write a constant amount of data. Once the serial command is sent, the other device is responsible for sending a set amount of bytes at a set speed. This allows you to do away with the clock and other io pins. Exactly how to massage the data into memory will depend on the application. If all you care about is 100 bytes- and need lots of program room, you can use my original code. If you want to pack it into longs, you'll need to shift, probably in an unrolled loop. Or, if you want to store more data, then use the movs, movi, movd variants. BUT, the hardware remains the same. Who wants to cook up a simple demo so we can talk more concretely about this? (I would, but I still have work to do for next week's meetings at Parallax and Google :) ) Demo should show:
    - 1 cog-the "controller" has some user interface to do something interesting
    - 1 cog-the "sender" has data it wants to send to another cog
    - 1 cog-the "receiver" wants that data to do something useful with it
    You won't need any hardware for this, just pretend that the "sender" and "receiver" are separated by wires- in reality they're communicating via INA and OUTA. Ideally sender and receiver should share a common subroutine... Extra credit if you use some standard serial interface- but it has to be fast. And woohoo, looks like Brian is tempted to make us hardware :)
    Hanno
  • jazzed Posts: 11,803
    edited 2009-05-06 22:11
    lonesock said...

                  mov       val,ina     'xxxxxxxxxxxxxxxxxxxxxxxxAAAAAAAA .
                  ror       val,#17 wc  'xxxxxxxxxAAAAAAAAxxxxxxxxxxxxxxx A0
                  movi      val,ina     'bBBBBBBBBAAAAAAAAxxxxxxxxxxxxxxx A0
                  shr       val,#8      '........bBBBBBBBBAAAAAAAAxxxxxxx A0
                  movi      val,ina     'cCCCCCCCCBBBBBBBBAAAAAAAAxxxxxxx A0
                  shr       val,#8      '........cCCCCCCCCBBBBBBBBAAAAAAA A0
                  movi      val,ina     'dDDDDDDDDCCCCCCCCBBBBBBBBAAAAAAA A0
                  rcl       val,#1      'DDDDDDDDCCCCCCCCBBBBBBBBAAAAAAAA A0
    
    


    Did I miss something, or would that work?

    Works for me! But then again without a quick test apparently, anything works for me :)
    This is also more than 6.6MB/s. Of course adding instructions for pointer adjust and djnz len
    (3 inst or 2 if you get really creative) adds 100-150ns per long to make it 7.3 to 8 MB/s.
    Too bad one needs overhead :)
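    The arithmetic behind those figures can be sketched as follows. It assumes an 80 MHz system clock and 4 clocks per instruction (50ns each), which is consistent with the 100-150ns quoted for 2 or 3 extra instructions; the function name is made up for this sketch.

```python
INSTRUCTION_NS = 50  # assumption: 80 MHz clock, 4 clocks per instruction

def burst_rate_mb_per_s(instructions_per_long, bytes_per_long=4):
    """Bytes moved per microsecond, which is the same number as MB/s."""
    ns_per_long = instructions_per_long * INSTRUCTION_NS
    return bytes_per_long * 1000 / ns_per_long

# 8 instructions for the bare unrolled read, plus 2 or 3 for pointer and loop upkeep.
for extra in (0, 2, 3):
    total = 8 + extra
    print(total, "instructions per long:", round(burst_rate_mb_per_s(total), 2), "MB/s")
```

    Under these assumptions the bare 8-instruction sequence works out to 10 MB/s, and the 2 and 3 instruction overheads give the 8 and roughly 7.3 MB/s mentioned above.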

    MagIO2, the VQFP is similar to TQFP spacing ... 7.5 mil ugh.

    Hanno, you're right you don't need a clock pin if it's 2 Propellers since they can share XI pins.
    But to do external memory transfers one has no choice because of the variety of data modes.

    Maybe Brian would like to talk about how to get a CPLD and SRAM on to a nice small size
    prototype that could easily fit into a generally available and cheap enclosure. I've been working
    on that myself, but having a more experienced layout guy do it would "de-risk" things.

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-05-06 22:15
    lonesock,

    Excellent! I've been racking my brain, trying to get rid of that extra shift, and you solved it! This makes it possible to use the code with a 10MHz synchronous clock.

    -Phil
  • lonesock Posts: 917
    edited 2009-05-06 22:51
    Glad I could help, thanks to Phil and jazzed for the heavy lifting! [8^)

  • mynet43 Posts: 644
    edited 2009-05-06 23:09
    I've been following this post all day.

    It's fascinating to see the brain cells synchronize and come out with something like this.

    You've definitely convinced me to move my data I/O pins to D0..D7. There's no way I want to miss out on the code that's evolved.

    Keep up the great work! I was already packing stuff into 4 bytes/long and this will speed things up quite a bit, even with control lines.

    Keep us posted as the hardware progresses.

    Thanks everyone,

    Jim
  • jazzed Posts: 11,803
    edited 2009-05-06 23:20
    Don't forget Andy's and Hanno's inspirational posts.

  • Hanno Posts: 1,130
    edited 2009-05-06 23:28
    Had some more thoughts:
    - If you're using something smart like a Propeller for the sender/receiver, you don't even need the control line, 8 data lines is enough
    - However, since the movs,movd,movi commands move 9 bits, you could use 9 lines and move 9 bits at a time
    - To connect multiple devices, use an ethernet/canbus type strategy to listen until the bus is empty, then negotiate to use it
    - Even if you're not connecting to external memory, this is a great way to share data between cogs at fast speeds!
    Hanno (thanks _rjo for the signature tip- here is my very first use of it!)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Download a free trial of ViewPort- the premier visual debugger for the Propeller
    Includes full debugger, simulated instruments, fuzzy logic, and OpenCV for computer vision. Now a Parallax Product!
  • jazzed Posts: 11,803
    edited 2009-05-07 00:56
    Nice .sig Hanno ... you forgot Logic State Analyzer.

    So for 8 data lines, you assume one cog is constantly polling for some non-zero address or attention byte?

    What if there is more than one "secondary" Propeller ? Having a start strobe that coincides with the address ensures that there is no confusion about what the byte means especially if a valid token in a packet between the "primary" and another "secondary" happens to have the attention byte for the "N secondary". If there is only one pair of Propellers (or cogs), it doesn't matter.

  • Hanno Posts: 1,130
    edited 2009-05-07 01:18
    Steve,
    A separate control line does simplify things like deciding when a request is being started. It's probably worth the 1 IO pin. Have you discovered that the "z" flag can be set on "mov" instructions if you're moving a 0? Same with wrlong/rdlong. Very useful! Is anyone programming this yet? I'm getting anxious to try out some real code and critique a real solution!
    (Logic Analyzer is one of the "simulated instruments", ViewPort also offers a spectrum analyzer, xy mode, and of course oscilloscope)
    Hanno

  • mctrivia Posts: 3,772
    edited 2009-05-07 01:50
    You guys are killing me. I had the prop galore almost done; now I have to reroute all the bus lines. Oh well, it is worth it, so I am off to the drawing board.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Need to make your prop design easier or secure? Get a PropMod has crystal, eeprom, and programing header in a 40 pin dip 0.7" pitch module with uSD reader, and RTC options.
  • jazzed Posts: 11,803
    edited 2009-05-07 02:31
    @Mac,

    I did mention P0..9 in your thread on April 17th :) For some reason I thought a ready bit was important too ... maybe for arbitration and/or "asynchronous packet" responses ... too tired to think precisely now.

    @Hanno,

    Yes indeed. I first ran into this feature last summer when someone was trying to use "rdlong ... wc" which was a bug of course :) I'm not coding this right now because I'm too busy trying out Verilog. Not knowing Verilog is a career weakness for me.

    Give the driver your best shot ... until hardware is generally available that fits the mold, it's mostly theory anyway. You could do it inter-COG. Also, I do have a 2 Propeller board with 9 bits attached already if you want me to test something.


    Post Edited (jazzed) : 5/7/2009 5:18:43 AM GMT
  • mctrivia Posts: 3,772
    edited 2009-05-07 02:38
    Yes jazzed, I should have listened to you in the beginning. Went with P10-19 because it allowed me to make the board a fair bit narrower. Now I am going with P0-P11, though the upper bits will be easily separated for IO use.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    My new unsecure propmod both 1x1 and full size arriving soon.
  • Brian Fairchild Posts: 549
    edited 2009-05-07 06:35
    I haven't fully read all the details in this thread but just one of those early morning random thoughts....

    Does it matter if the data is 'scrambled' in the memory? Does it matter if a strange number of bytes/words/longs is mapped to a sensible number of memory bytes or if it's stored in 'odd' addresses in the memory? As long as reading and writing are symmetrical then "data in = data out".

    Many years ago, in the days when 8k SRAMs were a luxury, we designed a large PCB with lots of static ram and eprom on. When we had it made we realised the pcb layout guy had messed up and the address lines were scrambled. A new version was going to put the project late and then we realised it didn't matter. The RAM was fine, all we did for eprom was write a little utility to scramble the data so that when read out it was in the right places.

    If we accept that RAM is cheap then if we waste even half a chip at the expense of fast access then we're still streets ahead.
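    The "data in = data out" point can be illustrated with a small Python sketch. The 4-bit address bus and the WIRING table are hypothetical; any fixed permutation of address lines behaves the same way, because a permutation never maps two logical addresses to the same physical cell.

```python
def scramble(addr, wiring):
    """Map a logical address to the physical one seen by the chip.

    wiring[i] is the physical address pin that logical address bit i reaches.
    """
    physical = 0
    for logical_bit, physical_bit in enumerate(wiring):
        if addr & (1 << logical_bit):
            physical |= 1 << physical_bit
    return physical

WIRING = [2, 0, 3, 1]          # hypothetical mis-routed 4-bit address bus

# RAM: writes and reads go through the same scrambled bus, so it is transparent.
ram = [0] * 16
for addr in range(16):
    ram[scramble(addr, WIRING)] = addr * 3
readback = [ram[scramble(addr, WIRING)] for addr in range(16)]

# EPROM: pre-scramble the image once so that scrambled reads see the intended data.
program = [0xA0 + addr for addr in range(16)]
eprom_image = [0] * 16
for addr, byte in enumerate(program):
    eprom_image[scramble(addr, WIRING)] = byte
eprom_readback = [eprom_image[scramble(addr, WIRING)] for addr in range(16)]

print(readback == [addr * 3 for addr in range(16)], eprom_readback == program)
```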
  • jazzed Posts: 11,803
    edited 2009-05-07 07:01
    By the way, the read32 algorithm data rate in an uncached asynchronous XMM is
    up to 1.25M LIPS (5.0MB/s) depending on SRAM type.

    loadxmm        ' 20 instructions ... 16 instructions for 3.3V < 50ns SRAM ...
                   ' unlatched, direct address/data bus ...
                   ' dira set by init code to $0fffff00
                   mov      outa,   addr
                   shl      outa,   #8      ' mov address bits into position
                   add      outa,   _0x300  'add to address to fix endian order
                   nop                         ' remove for < 50ns 3.3V SRAM
                   nop                         ' remove for < 100ns 3.3V SRAM
                   mov      val,    ina     'xxxxxxxxxxxxxxxxxxxxxxxxAAAAAAAA .
                   sub      outa,   #$100
                   ror      val,    #17 wc  'xxxxxxxxxAAAAAAAAxxxxxxxxxxxxxxx A0
                   nop                         ' remove for < 100ns 3.3V SRAM
                   movi     val,    ina     'bBBBBBBBBAAAAAAAAxxxxxxxxxxxxxxx A0
                   sub      outa,   #$100
                   shr      val,    #8      '........bBBBBBBBBAAAAAAAAxxxxxxx A0
                   nop                         ' remove for < 100ns 3.3V SRAM
                   movi     val,    ina     'cCCCCCCCCBBBBBBBBAAAAAAAAxxxxxxx A0
                   sub      outa,   #$100
                   shr      val,    #8      '........cCCCCCCCCBBBBBBBBAAAAAAA A0
                   nop                         ' remove for < 100ns 3.3V SRAM
                   movi     val,    ina     'dDDDDDDDDCCCCCCCCBBBBBBBBAAAAAAA A0
                   rcl      val,    #1      'DDDDDDDDCCCCCCCCBBBBBBBBAAAAAAAA A0
                   ret
    
    _0x300         long $300
    addr           long 0               
    val            res 1
    
    



    Brian if one uses a loader and SRAM the address or data doesn't matter as long as there is no address aliasing.
    But you knew that. Neat that you had a script to fix the EPROM file :)


    Post Edited (jazzed) : 5/7/2009 7:09:53 AM GMT
  • Cluso99 Posts: 18,069
    edited 2009-05-07 08:06
    Nice method to recover 32 bits, guys. This concept will work on both Blades #1 & #2 of the TriBladeProp.

    Jazzed:
    You will require the first 'nop' for <50nS 3v3 sram because...
                   add      outa,   _0x300  'add to address to fix endian order
                   nop                         ' remove for < 50ns 3.3V SRAM
                   nop                         ' remove for < 100ns 3.3V SRAM
                   mov      val,    ina     'xxxxxxxxxxxxxxxxxxxxxxxxAAAAAAAA .
    The timing for this at 5MHz...
    add      IdSDeR
    mov          IdSDer
                  ^      <-- write of address to the output pins
                   ^     <-- read of data pins
                  ||     less than 12.5nS but cannot guarantee any time > 0nS       
    

    The address output will be output at the R cycle and the data will be read in the next S clock cycle. The timing between these cannot be guaranteed (I don't recall this sort of timing published on the data sheet). So you will require the first 'nop' no matter how fast the memory.

    Postedit 14Dec2009: ERROR ABOVE: See http://forums.parallax.com/showthread.php?p=861676 for confirmation from Chip that "ina" is sampled on the "e" clock, not the "S" clock.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Links to other interesting threads:

    · Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
    · Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
    · Prop Tools under Development or Completed (Index)
    · Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
    · Search the Propeller forums (uses advanced Google search)
    My cruising website is: www.bluemagic.biz  MultiBladeProp is: www.bluemagic.biz/cluso.htm

    Post Edited (Cluso99) : 12/14/2009 5:17:07 AM GMT
  • heater Posts: 3,370
    edited 2009-05-07 10:52
    Jazzed, I'm not sure I'm awake enough to fully understand that last code snippet. Are you saying we can now fetch LONGs through a byte wide interface at 1.25 M longs per second from normal RAM with no hardware assistance? That is we can execute XMM PASM at say 1MIP. That's totally awesome.

    Looks like the addresses are progressing backwards, or is it me?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    For me, the past is not over yet.
  • jazzed Posts: 11,803
    edited 2009-05-07 14:29
    Yes, the address has to decrement for little endian storage ... DDCCBBAA -> 00112233. This is nice for using DJNZ though ....
    Thanks for clarity on the delay Ray ... guess the NOP can be used for something.
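    A small Python model of the address stepping in loadxmm, as a sketch only. It abstracts away the pins and just tracks which SRAM byte ends up in which byte lane of the result; the sram contents and the function name are made up for the illustration.

```python
def loadxmm_model(sram, addr):
    """Return the long that loadxmm would assemble for byte address addr."""
    bus_addr = addr + 3                  # add outa,_0x300 (3 in the address field)
    val = 0
    for shift in (0, 8, 16, 24):         # reads A, B, C, D land in rising byte lanes
        val |= sram[bus_addr] << shift   # mov/movi val,ina
        bus_addr -= 1                    # sub outa,#$100 (the real code skips the last one)
    return val

sram = bytes([0xDD, 0xCC, 0xBB, 0xAA, 0x78, 0x56, 0x34, 0x12])
print(hex(loadxmm_model(sram, 0)), hex(loadxmm_model(sram, 4)))
```

    In this model the byte at the lowest address becomes the top byte of the long, which matches the DDCCBBAA -> 00112233 mapping given above.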

  • Hanno Posts: 1,130
    edited 2009-05-07 15:18
    Good morning!
    I think we're getting there together- nice teamwork everyone. Sorry I couldn't help this time around-I'll have time next Wednesday when I fly home! I've gone through this thread and collected what I think are the salient points:
    - synchronous access device has a defined transfer packet format
    - would be great to use the Prop's XO line as a clock
    - 8 bits can be shifted/moved into 32 bits
    - Xilinx XC9572 may provide a nice interface
    - it's ok to waste external memory, or leave it scrambled
    Study below:

    jazzed: "
    Obviously as you mention a synchronous access device is necessary. As I see it, one needs 2 pins in addition to the 8 for byte data. One pin would be for the clock which would be produced by the CTRA on demand. The other pin would be a start bit. The data access would be via a packet as described below.


    Synchronous memory transfer packet format:

    # S BYTE
    1 1 WLLLLLLL
    2 0 AAAAAAAA
    3 0 AAAAAAAA
    4 0 AAAAAAAA
    5 0 XXXXXXXX
    6 0 DDDDDDDD
    7 0 DDDDDDDD
    8 0 DDDDDDDD
    M 0 DDDDDDDD

    Legend:
    # - Packet Byte
    S - Start bit state
    W - Write Bit: Write if high, Read if low
    L - Length Bit: Transaction up to N bytes
    A - Address Bit: Target address
    X - Turn Around: Need to turn the BUS to input
    D - Data Bits
    M - Packet Length: N data + 5 setup

    Timing all depends on how data is stored by Propeller. The packet could be smaller for smaller length and address.
    The turnaround byte can be skipped in a write packet. Obviously the more data that is transferred, the higher the burst rate.
    "
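    The packet layout quoted above could be assembled like this. It is a hedged sketch, not a defined protocol: the thread does not say how the 7-bit length is encoded or in which order the address bytes go out, so storing the length directly and sending the address most significant byte first are both assumptions, and build_packet is a made-up name.

```python
def build_packet(write, length, address, data=b""):
    """Return a list of (start_bit, byte) pairs for one transfer request."""
    if not 0 <= length <= 0x7F:
        raise ValueError("length must fit in 7 bits")
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("address must fit in 24 bits")
    if write and len(data) != length:
        raise ValueError("write data must match the length field")
    header = (0x80 if write else 0x00) | length             # WLLLLLLL
    packet = [(1, header)]                                   # start bit high on byte 1 only
    packet += [(0, b) for b in address.to_bytes(3, "big")]   # AAAAAAAA x3 (order assumed)
    if write:
        packet += [(0, b) for b in data]                     # turnaround skipped on writes
    else:
        packet.append((0, 0x00))                             # XXXXXXXX turnaround byte
    return packet

print(build_packet(True, 2, 0x012345, b"\xAA\x55"))
print(build_packet(False, 4, 0x000010))
```

    For a read, the bytes returned here are only the request; the N data bytes then come back from the memory device after the turnaround.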

    jazzed: BTW, I don't think it's possible to use 9 pins for this with an 8 bit bus unless the Propeller XO clock can be used some way.

    lonesock:
    Taking jazzed's & Phil's code and tweaking it a bit, I can use a simple MOV for the first byte (so the 0 bit is actually in D[0]), then a ROR to get it into position, writing the Carry flag at that point (which pulls the carry flag from, coincidentally, D[0]), saving an instruction later:


                  mov       val,ina     'xxxxxxxxxxxxxxxxxxxxxxxxAAAAAAAA .
                  ror       val,#17 wc  'xxxxxxxxxAAAAAAAAxxxxxxxxxxxxxxx A0
                  movi      val,ina     'bBBBBBBBBAAAAAAAAxxxxxxxxxxxxxxx A0
                  shr       val,#8      '........bBBBBBBBBAAAAAAAAxxxxxxx A0
                  movi      val,ina     'cCCCCCCCCBBBBBBBBAAAAAAAAxxxxxxx A0
                  shr       val,#8      '........cCCCCCCCCBBBBBBBBAAAAAAA A0
                  movi      val,ina     'dDDDDDDDDCCCCCCCCBBBBBBBBAAAAAAA A0
                  rcl       val,#1      'DDDDDDDDCCCCCCCCBBBBBBBBAAAAAAAA A0

    jazzed:
    MagIO2 said...
    Xilinx XC9572

    Nice part. Also I noticed the data sheet mentions PQFP-100 ... (wider pin spacing than TQFP-100). Is it generally available? Digikey has a VQFP-64 with 52 IO; is that a typo? This is more attractive than the Altera Max3064.

    Brian Fairchild:
    Does it matter if the data is 'scrambled' in the memory?
    If we accept that RAM is cheap then if we waste even half a chip at the expense of fast access then we're still streets ahead.

    Jazzed:

    loadxmm        ' 20 instructions ... 16 instructions for 3.3V < 50ns SRAM ...
                   ' unlatched, direct address/data bus ...
                   ' dira set by init code to $0fffff00
                   mov      outa,   addr
                   shl      outa,   #8      ' mov address bits into position
                   add      outa,   _0x300  'add to address to fix endian order
                   nop                         ' remove for < 50ns 3.3V SRAM
                   nop                         ' remove for < 100ns 3.3V SRAM
                   mov      val,    ina     'xxxxxxxxxxxxxxxxxxxxxxxxAAAAAAAA .
                   sub      outa,   #$100
                   ror      val,    #17 wc  'xxxxxxxxxAAAAAAAAxxxxxxxxxxxxxxx A0
                   nop                         ' remove for < 100ns 3.3V SRAM
                   movi     val,    ina     'bBBBBBBBBAAAAAAAAxxxxxxxxxxxxxxx A0
                   sub      outa,   #$100
                   shr      val,    #8      '........bBBBBBBBBAAAAAAAAxxxxxxx A0
                   nop                         ' remove for < 100ns 3.3V SRAM
                   movi     val,    ina     'cCCCCCCCCBBBBBBBBAAAAAAAAxxxxxxx A0
                   sub      outa,   #$100
                   shr      val,    #8      '........cCCCCCCCCBBBBBBBBAAAAAAA A0
                   nop                         ' remove for < 100ns 3.3V SRAM
                   movi     val,    ina     'dDDDDDDDDCCCCCCCCBBBBBBBBAAAAAAA A0
                   rcl      val,    #1      'DDDDDDDDCCCCCCCCBBBBBBBBAAAAAAAA A0
                   ret

    _0x300         long $300
    addr           long 0
    val            res 1




    Hanno

  • MagIO2 Posts: 2,243
    edited 2009-05-07 15:34
    Why would you like to send the WLLLLLLL? The XC9572 simply stays in burst mode until it finds the next start bit. So, as long as it has a clock it counts up the address. Maybe you have an application where you don't need high speed access, but you want to copy 20k into HUB RAM. That can easily be done that way. And I'd prefer to have the prop control the WR signal, which then allows read-modify-write cycles.

    52 IOs is not a typo for the 64-pin version of the CPLD. Of course it has some reserved pins for GND and Vcc and so on. It's not necessary to have one pin for each of the 72 macrocells. Some of the macrocells will be internal then. And we don't need to have pins for the shift register - only for the latch/counter.

    I also saw a PLCC44 version of this CPLD, which has 36 IOs. Maybe in the end this would be enough.

    PS: Don't know why I don't have linebreaks when I post with the PS3 but ... You can decide in your personal design if you want the WR to be a dedicated pin of the propeller or not. We have a 31 bit wide shift register/latch/counter. All unused bits can be used as output pin expansion of the propeller. It's only a matter of driver programming.

    Post Edited (MagIO2) : 5/7/2009 3:54:50 PM GMT
  • jazzed Posts: 11,803
    edited 2009-05-07 16:39
    MagIO2, I thought early on about just having the start bit be an enable; this way an "assert" change in enable loads the address and while enabled, the CPLD can count. I abandoned that though for some reason. The design I have today uses the length "tuple" ... that could change to save a byte for throughput and allow a 23 bit address range because we still need the WE bit ....

    Regarding Write Enable (WE), the more work the Propeller has to do, the slower the interface will be. Perhaps writing is not as critical performance wise as reading, but it would be easier to manage with the CPLD. BTW, you can't just keep WE asserted and change the address; data will get corrupted.

    Regarding clock, it would be a reasonable compromise to connect a 3 pin jumper between an XO driver output, a Propeller IO pin, and the CPLD clock input. This would de-risk the issue and allow clock flexibility, but it would be a waste of PCB real estate. Best to de-risk cut/jump though.

    By the way, I think I have an incrementing asynchronous read32 that uses the same number of instructions as the decrementing read32 ... it is not symmetrical though.

  • hippy Posts: 1,981
    edited 2009-05-07 17:12
    lonesock said...
                  mov       val,ina     'xxxxxxxxxxxxxxxxxxxxxxxxAAAAAAAA .
                  ror       val,#17 wc  'xxxxxxxxxAAAAAAAAxxxxxxxxxxxxxxx A0
                  movi      val,ina     'bBBBBBBBBAAAAAAAAxxxxxxxxxxxxxxx A0
                  shr       val,#8      '........bBBBBBBBBAAAAAAAAxxxxxxx A0
                  movi      val,ina     'cCCCCCCCCBBBBBBBBAAAAAAAAxxxxxxx A0
                  shr       val,#8      '........cCCCCCCCCBBBBBBBBAAAAAAA A0
                  movi      val,ina     'dDDDDDDDDCCCCCCCCBBBBBBBBAAAAAAA A0
                  rcl       val,#1      'DDDDDDDDCCCCCCCCBBBBBBBBAAAAAAAA A0
    
    
    

    Very neat and very impressive. In terms of 32-bit data transfers ( I'm looking at executing code from XMM rather than just getting the data ) - PASM 20 MIPS, LMM 5 MIPS, XMM 2 MIPS which isn't too bad, although that considers instructions streamed in and not any random addressing overhead.

    In terms of the block transfer protocol; would it be possible to minimise the overhead ? That is a command to set block size ( 1..128 longs ) which will be a one-off command for many uses, another to just fetch the next block, maybe another to set address without having to send all address bytes ? Maybe that needs various data transfer modes to be configured ?

    Perhaps the WLLLLLLL command could be WxxxxLLL where L sets the 2^L size of block giving 4 bits of other control info, such as indicating how many address bytes follow ( None, 1, 2 or 3 ); WxxAALLL and still 2 bits spare.
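    A sketch of how such a WxxAALLL command byte might be packed and unpacked. The bit positions follow the letters in the pattern (W in bit 7, AA in bits 4..3, LLL in bits 2..0); treating the block size as a count of longs follows the 1..128 longs mentioned earlier and is an assumption, as are the function names.

```python
def encode_command(write, addr_bytes, log2_block):
    """Pack W, the number of address bytes that follow, and log2 of the block size."""
    if addr_bytes not in (0, 1, 2, 3):
        raise ValueError("addr_bytes must be 0..3")
    if not 0 <= log2_block <= 7:
        raise ValueError("log2_block must be 0..7")
    return (int(bool(write)) << 7) | (addr_bytes << 3) | log2_block

def decode_command(byte):
    """Unpack a command byte; the two spare 'x' bits (6..5) are ignored."""
    return {
        "write": bool(byte & 0x80),
        "addr_bytes": (byte >> 3) & 0x3,
        "block_size": 1 << (byte & 0x7),   # 1..128, assumed to be in longs
    }

print(hex(encode_command(True, 3, 7)), decode_command(0x08))
```

    A single-long code fetch from the next sequential address would then be one byte, 0x00: read, no address bytes, block size 1.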

    This would help in executing code where one wants a single long at a time and a minimum overhead to fetch the next long/bytes is desirable

    Also in streaming a block; at what rate do the longs clock out ? This depends on how the Cog is streaming them in, whether an un-rolled loop or in a loop where there will need to be a pause while any 'djnz' and pointer update occurs. One of those control bits could be 'all burst', 'burst-pause-burst-pause..."

    I'm assuming that whatever comes out of this will be a 'de facto XMM implementation' which everyone will seize upon so if it can be optimised for fastest 'one long at a time' fetch as well as burst mode blocks that should suit all uses and maximise interest.
  • jazzed Posts: 11,803
    edited 2009-05-07 18:28
    hippy said...


    Perhaps the WLLLLLLL command could be WxxxxLLL where L sets the 2^L size of block giving 4 bits of other control info, such as indicating how many address bytes follow ( None, 1, 2 or 3 ); WxxAALLL and still 2 bits spare.
    I like this, but too much "coding" flexibility will require more real-estate and a more expensive CPLD. Just having a fixed number of addresses will be an advantage in many ways including potential performance issues (it will take cycles to construct the first byte).

    As far as clocking in/out, the "header bytes" WxxAALLL, AAAAAAAA, ... can be pushed in consecutively (CLK*1). The "data bytes" for long READ access will need CLK*2 to give the hardware time to "deliver" the goods ... Cluso99 made this clear. Long WRITE access will only need CLK*1 ... but again CPLD cost and size may impact that.

    A simple implementation CPLD would cost < $4 (qty 1) for the Xilinx part.

    Added: Here is an incrementing asynchronous version of the long read.

    loadxmm
                   mov      outa,   addr
                   shl      outa,   #8
                   nop
                   mov      val,    ina     'xxxxxxxxxxxxxxxxxxxxxxxxAAAAAAAA .
                   add      outa,   #$100
                   ror      val,    #8 wc   'AAAAAAAAxxxxxxxxxxxxxxxxxxxxxxxx A0
                   movs     val,    ina     'AAAAAAAAxxxxxxxxxxxxxxxbBBBBBBBB A0
                   add      outa,   #$100
                   ror      val,    #24     'xxxxxxxxxxxxxxxbBBBBBBBBAAAAAAAA A0
                   shl      val,    #7      'xxxxxxxxbBBBBBBBBAAAAAAAA....... A0
                   movi     val,    ina     'cCCCCCCCCBBBBBBBBAAAAAAAA....... A0
                   add      outa,   #$100
                   shr      val,    #8      '........cCCCCCCCCBBBBBBBBAAAAAAA A0
                   movi     val,    ina     'dDDDDDDDDCCCCCCCCBBBBBBBBAAAAAAA A0
                   rcl      val,    #1      'DDDDDDDDCCCCCCCCBBBBBBBBAAAAAAAA A0
                   ret
    
    


    Post Edited (jazzed) : 5/7/2009 6:45:42 PM GMT
  • Cluso99 Posts: 18,069
    edited 2009-05-08 03:51
    Here is what I am doing with my next pcb...
    A0... = P0...
    D0..D7 = P24..P31 (corrected)
    loadxmm
     mov outa, addr
     add addr, #4      '<-- nop reqd but can advance addr
     mov data, ina     'AAAAAAAA....
     add outa, #1
     shr data, #24     '000000000000000000000000AAAAAAAA
     mov d2, ina       'BBBBBBBB....
     add outa, #1
     and d2, hFF000000 'BBBBBBBB000000000000000000000000
     mov d3, ina       'CCCCCCCC....
     add outa, #1
     shr d3, #24       '000000000000000000000000CCCCCCCC
     mov d4, ina       'DDDDDDDD....
     and d4, hFF000000 'DDDDDDDD000000000000000000000000
     or data, d4       'DDDDDDDD0000000000000000AAAAAAAA
     or d2, d3         'BBBBBBBB0000000000000000CCCCCCCC
     rol d2, #16       '00000000CCCCCCCCBBBBBBBB00000000
     or data, d2       'DDDDDDDDCCCCCCCCBBBBBBBBAAAAAAAA
     ret
     
    random
     mov outa, addr
     add addr, #1      '<-- nop reqd but can advance addr
     mov data, ina     'AAAAAAAA....
     shr data, #24     '000000000000000000000000AAAAAAAA
     ret
    
    

    This method adds 2 instructions to XMM, but random is 1 instruction faster. The main disadvantage to my method is that I require the use of SI/SO and the Eeprom pins.
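    The mask-and-merge step in loadxmm above can be checked with a short Python model. It assumes data on P24..P31 as listed, so each INA sample carries its byte in bits 31..24, and it takes hFF000000 to be the constant $FF000000; merge_long and rol are names made up for this sketch.

```python
MASK32 = 0xFFFFFFFF

def rol(value, bits):
    """Rotate a 32-bit value left."""
    return ((value << bits) | (value >> (32 - bits))) & MASK32

def merge_long(ina_samples):
    """Combine four INA reads with the data byte in bits 31..24 into one long."""
    a, b, c, d = ina_samples
    data = a >> 24            # shr data, #24
    d2 = b & 0xFF000000       # and d2, hFF000000
    d3 = c >> 24              # shr d3, #24
    d4 = d & 0xFF000000       # and d4, hFF000000
    data |= d4                # or  data, d4
    d2 |= d3                  # or  d2, d3
    d2 = rol(d2, 16)          # rol d2, #16
    return data | d2          # or  data, d2

# The low 24 bits are junk standing in for the address pins.
samples = [(byte << 24) | 0x00ABCDEF for byte in (0x11, 0x22, 0x33, 0x44)]
print(hex(merge_long(samples)))
```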

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Links to other interesting threads:

    · Home of the MultiBladeProps: TriBladeProp, SixBladeProp, website (Multiple propeller pcbs)
    · Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
    · Prop Tools under Development or Completed (Index)
    · Emulators: Micros eg Altair, and Terminals eg VT100 (Index)
    · Search the Propeller forums (via Google)
    My cruising website is: www.bluemagic.biz  MultiBladeProp is: www.bluemagic.biz/cluso.htm

    Post Edited (Cluso99) : 5/8/2009 8:31:11 AM GMT
  • jazzed Posts: 11,803
    edited 2009-05-08 06:25
    I thought about that some, but the possibility of writing the eeprom by accident scared me out of it. One could activate the boot eeprom write protect line for normal use and have a switch for reprogramming though. After boot, the serial ports are mostly dispensable one way or another.

  • virtuPIC Posts: 193
    edited 2009-05-08 07:02
    Cluso99 said...

    ...
    D0..D7 = P25..P31
    Shouldn't this be D0..D7 = P24..P31? I know, I am nitpicking, sorry.

    Ever thought about a different wiring and using hub as target on chip memory or conversion buffer? Something like (untested!!!)
    A0.. = P8..
    D0..D7 = P0..P7
    load_via_hub
      mov     outa, addr
      shl     outa, #8
      add     outa, #$100
      wrbyte  outa, hubbuf
      add     addr, #4
      add     outa, #$100
      wrbyte  outa, hubbuf
      'instruction
      add     outa, #$100
      wrbyte  outa, hubbuf
      'instruction
      wrbyte  outa, hubbuf
      'instruction
      'instruction
      rdlong  data, hubbuf
      ret
    


    Observations:
    • The code is a little shorter than loadxmm.
    • You can insert 4 other instructions executed in the stall time of hub accesses.
    • The code is slower if you don't insert such instructions.
    • The code is faster if you insert such instructions.
    • If you use this code without call / return there is space for two more instructions in the rdlong stall.
    • In burst transfers you can save the shl in the beginning.
    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Airspace V - international hangar flying!
    www.airspace-v.com/ggadgets for tools & toys