Evaluation Process

We are almost ready to exit this phase of engineering development and move on to figuring out where the P2 fits in the real world.

In my mind, there will always be a call for more RAM and improved total bandwidth. IMHO it is always going to be faster to access Hub RAM than almost any other option. This means a multi-P2 design is always going to be superior to any other option when it comes to adding RAM. For increasing total bandwidth a multi-P2 is always going to be better than a single (similarly configured) P2.

I'm not an engineer, but I want a local area network of P2's. I don't know if my bus lines should be tied low (to eliminate noise) or high (to allow tri-stating).

I would like to see a protoboard that does nothing but connect two P2_ES_v2's. This would allow me a convenient way to play with networked P2's. I know... I could do it myself... If I wanted to dig out the design software and go through the process of figuring out which Gerber files I need to send. This stuff doesn't interest me. I'm sorry... I know it is like honey to most of us here, but I really just don't want to mess with it.

Volunteers? Parallax?

Thanks

Rich

Comments

  • rjo__ Posts: 2,027
    edited 2019-09-27 - 13:39:49
    Oh... why not just use a cable? I'm going to use the P2's default, naïve pin configurations. I'm willing to do some soldering:)

    Thanks
  • rjo__ wrote: »
    Oh... why not just use a cable?

    This was my first thought too!

    Sure, there's always educational value doing this stuff yourself. But it also takes a lot of time you might not have, so getting 2 or 3 Eval boards and a pack of jumper wires could be a reasonable option. Probably not much more cost than rolling your own at this stage either.

    These things are always relative I suppose!

  • My gut feel is that each line is going to have at least one resistor... so a straight jumper won't work for me. If I don't need to tie low then I'm going to tie each line high.

    I'm thinking of a quad-P2 configuration (any handiwork X3). If the quad-P2 works for me... then this will be a ground floor configuration.

    After that I will add a little rebar and move my way up... given that my work area has more head than elbow room.
  • rjo__ Posts: 2,027
    edited 2019-09-27 - 14:19:32
    That's about 5.1kMIPS per floor.
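
One way to arrive at a figure like this is sketched below. The inputs are assumptions, not numbers stated in the post: four P2s per floor (the quad-P2 idea), eight cogs per P2, a 320 MHz sysclock, and two clocks per instruction.

```python
def aggregate_mips(chips, cogs_per_chip, sysclk_mhz, clocks_per_instruction):
    """Peak instruction rate in MIPS, assuming every cog runs flat out."""
    return chips * cogs_per_chip * sysclk_mhz / clocks_per_instruction


# Assumed inputs: 4 chips, 8 cogs each, 320 MHz, 2 clocks per instruction.
floor_mips = aggregate_mips(4, 8, 320, 2)
print(f"{floor_mips / 1000:.2f} kMIPS per floor")
```

This is a peak figure only; hub waits and multi-clock instructions would lower it in practice.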
  • rjo__ wrote: »
    My gut feel is that each line is going to have at least one resistor... so a straight jumper won't work for me. If I don't need to tie low then I'm going to tie each line high.

    For speed, you could grab a pair of these to go between each P2-EVAL, and add whatever resistor combo you need.

    https://www.parallax.com/product/64005-es

    OR...

    You could look at the various P2 internal pullup and drive options- maybe something will work for what you have in mind.


    rjo__ wrote: »
    I'm thinking of a quad-P2 configuration
    Ah, ok. That would be a monster board indeed! Nice!

    I think you'd really need to nail down the design concept and features/connections before starting to make such a thing, or you could easily be going in circles and burning through time and cash! Starting with a pluggable off-the-shelf solution should simplify that task, and then turn that into a custom board once you've got things figured out.


  • I agree... except that I'm hoping that the commercial value will be recognized and I can just remain a customer:)

  • About money... I'm really old. One thing I have learned in life is that money is never the problem.
    If an idea doesn't pay for itself... it isn't much of an idea.
  • msrobots Posts: 2,913
    edited 2019-09-27 - 20:50:20
    @rjo__,

    I totally agree with you and am facing the same problem. I simulated connecting multiple P2 evals on one P2 eval, using smartpins connected to neighboring pins and using the streamer for fast transfer to 'share' a common HUB buffer.

    I basically stole the idea from @Beau Schwabe and am using his version on multiple P1s. The basic concept is that one can use the same mailbox system used between COGs on one propeller between COGs on different propellers by circulating a hub-buffer around.

    Thanks to @DaveJenson and @frida I do have 3 eval-a boards now and - hm - have been too afraid so far to directly connect them with jumper wires, because I KNOW I will make wiring and programming mistakes.

    There is a long-winded thread about it: http://forums.parallax.com/discussion/170216/ringbuffer-was-streamer-questions-how-to-sync#latest

    Sadly I am not old enough (or rich enough) to retire, so real work is keeping me from playing with them right now. The current code only runs on rev-a, since the streamer encodings changed with rev-b, but that will be a minor issue to solve.

    One nice thing is that the same code basically can run with 2 to 64 pins. I am more a software than a hardware guy, so I am unable to build a board like the one you were asking for, but maybe we can find someone here to do so.

    On the other hand, if Parallax starts mass-producing their own P2 module (or @Peter Jakacki's P2D2) then some standard could evolve to connect them. The current code needs two COGs, but in my mind I already think I can reduce that to one COG, since it basically just needs to wait for its buffer, allow changes (using locks), and send it to the next P2. Experiments showed me that there can be just ONE buffer circulating around, or you cannot guarantee that all P2's see the same data.
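
The circulating-buffer idea described above can be sketched as a small simulation. This is only an illustration and not the code from the linked thread: the `Node` class, its `outbox`, and the dictionary used as the shared buffer are all hypothetical stand-ins for a P2, its pending mailbox writes, and the hub buffer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one P2 in the ring."""
    node_id: int
    outbox: list = field(default_factory=list)  # messages waiting to be posted
    seen: dict = field(default_factory=dict)    # buffer contents last observed

    def handle(self, buffer):
        # A node may only change the buffer while it holds it
        # (the role the locks play in the description above).
        if self.outbox:
            buffer[self.node_id] = self.outbox.pop(0)
        self.seen = dict(buffer)
        return buffer


def circulate(nodes, buffer, laps):
    """Pass the single buffer around the ring for a number of full laps."""
    for _ in range(laps):
        for node in nodes:
            buffer = node.handle(buffer)
    return buffer


nodes = [Node(i, outbox=[f"msg-from-{i}"]) for i in range(3)]
final = circulate(nodes, {i: None for i in range(3)}, laps=2)
print(all(node.seen == final for node in nodes))
```

With a single buffer, every node has the same view once the buffer has gone around one more lap after the last write. With two buffers in flight, different nodes could hold different versions at the same moment, which matches the observation that only one buffer can circulate.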

    Anyways,

    Mike
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • Thanks Mike.

    There are a lot of gifted people around here. I count you as one of them.

    I missed the early phases of the P1 development and was determined not to be left in the dust again:)

    Thank God that there is just enough of the P1 in the P2 that I can do some things without thinking about it too much.

    And I'm confident that all I have to do is post a question and I'll get 300 replies:)

    I hope the good people at Parallax know just how much their efforts reward the willing.

  • jmg Posts: 13,928
    rjo__ wrote: »
    .... For increasing total bandwidth a multi-P2 is always going to be better than a single (similarly configured) P2.

    I'm not an engineer, but I want a local area network of P2's. I don't know if my bus lines should be tied low (to eliminate noise) or high (to allow tri-stating).

    What speed and distance are you targeting?

    For local use, the simplest will be direct-wired Async comms, and the P2 can manage up to 32b payloads P2 to P2.
    Not sure where the upper practical P2-P2 baud limit is, could be 50 MBd or more?

    For PC links, the Async limit seems to be 12 MBd commonly (FT232xH), and 15 MBd or more in niche cases.
    (I have EFM8UB3 doing bursts of 8 MBd 8.n.2)

    For higher speeds, it may be useful to target a QuadSPI variant. That supports any mix of memory and P2's, and streamers can manage 4-bit transfers.
    The FT4222H looks useful there too; it claims QSPI 40MHz master and 20MHz slave clocks, and throughputs of 52.8Mbps, for PC links.

    For moderate distances, direct wires may need buffering. Choices could be CAN transceivers, RS485/422 transceivers and M-LVDS transceivers, or maybe digital isolators?


  • rjo__ Posts: 2,027
    edited 2019-09-28 - 19:39:18
    First iteration: I'm planning to keep the P2's as tightly grouped as possible. What I want to do is see what I can get from the simplest logic possible, using 8 bits, byte aligned.

    On the send side the pseudo code would be:


    REP ……………..
    set CLKPIN to Low '2 clocks
    read fast ONE byte 'read fast from buffer... ? 1 clock?
    set corresponding byte of OUTA, Byte value, #bytenum '2 clocks
    set CLKPIN to High '2 clocks

    loop to REP '0 clocks


    On the receive side the pseudo code would be:

    Rep...……..
    wait for positive edge on clkpin
    get corresponding byte from INA
    WRITE FAST that byte 'write to buffer
    loop to Rep

    The first step will be to make sure that the receive side is getting all of the positive edges:)

    If so, it looks like one byte every 7 or 8 sysclocks, i.e. sysclock/7 or sysclock/8 MB/s (roughly 40-45 MB/s at 320MHz)

    Of course, this could be doubled by using 16 bit transfers.

    This approach doesn't require a common clock or even a common setting for the sysclocks...as long as the receive sysclock >= sending sysclock.

    IF it doesn't work... then I will need to slow the sending with NOPs and the overall exchange rate will be less.
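
The rate estimate above can be checked with a few lines of arithmetic. The per-instruction clock counts are the guesses annotated in the send-side pseudo code (including the uncertain 1-clock read), and the function name is just for illustration.

```python
def link_rate_mb_per_s(sysclk_hz, clocks_per_transfer, bytes_per_transfer=1):
    """Sustained rate in MB/s if one transfer completes every N sysclocks."""
    return sysclk_hz * bytes_per_transfer / clocks_per_transfer / 1e6


SYSCLK_HZ = 320_000_000
# Clock guesses from the send loop: clock low, fast read, set byte, clock high.
SEND_LOOP_CLOCKS = [2, 1, 2, 2]

for clocks in (sum(SEND_LOOP_CLOCKS), 8):
    rate_8 = link_rate_mb_per_s(SYSCLK_HZ, clocks)
    rate_16 = link_rate_mb_per_s(SYSCLK_HZ, clocks, bytes_per_transfer=2)
    print(f"{clocks} clocks/transfer: {rate_8:.1f} MB/s (8-bit), {rate_16:.1f} MB/s (16-bit)")
```

If the fast read actually costs 2 clocks rather than 1, the loop is 8 clocks long and the lower figure applies.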


    Regards
  • jmg Posts: 13,928
    rjo__ wrote: »
    ...
    on the send side the pseudo code would be:


    REP ……………..
    set CLKPIN to Low '2 clocks
    read fast ONE byte 'read fast from buffer... ? 1 clock?
    set corresponding byte of OUTA, Byte value, #bytenum '2 clocks
    set CLKPIN to High '2 clocks

    loop to REP '0 clocks

    On the receive side the pseudo code would be:

    Rep...……..
    wait for positive edge on clkpin
    get corresponding byte from INA
    WRITE FAST that byte 'write to buffer
    loop to Rep

    The first step will be to make sure that the receive side is getting all of the positive edges:)

    If so, it looks like sysclock/7 or sysclock/8 MB/s (40-45 MB/s at 320MHz)
    P2 HUB access is less deterministic than (eg) LUT buffers, so the highest burst speeds would avoid HUB inside the innermost loop.
    You could also look to handshake on the first clock edge, to tolerate interrupts, or work inside an interrupt?
    P2 can also wait on either edge, so you can halve the clock speed using DDR transfer.
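
The DDR point can be illustrated with a toy model. This is a sketch in Python rather than P2 code, and the function names are made up for the example: the sender either pulses the clock once per byte (single data rate) or just toggles it once per byte (double data rate), and the receiver captures on rising edges only or on both edges.

```python
def send_sdr(data):
    """Single data rate: clock low while data settles, then a rising edge."""
    for byte in data:
        yield 0, byte
        yield 1, byte


def send_ddr(data):
    """Double data rate: the clock simply toggles once per byte."""
    clk = 0
    for byte in data:
        clk ^= 1
        yield clk, byte


def receive(bus_states, both_edges):
    """Capture a byte on each rising edge, and on falling edges if enabled."""
    received, prev = [], 0
    for clk, byte in bus_states:
        rising = prev == 0 and clk == 1
        falling = prev == 1 and clk == 0
        if rising or (both_edges and falling):
            received.append(byte)
        prev = clk
    return received


def rising_edges(bus_states):
    """Number of full clock cycles seen on the wire."""
    count, prev = 0, 0
    for clk, _ in bus_states:
        count += prev == 0 and clk == 1
        prev = clk
    return count


data = list(range(10))
print(receive(send_sdr(data), both_edges=False) == data, rising_edges(send_sdr(data)))
print(receive(send_ddr(data), both_edges=True) == data, rising_edges(send_ddr(data)))
```

Both schemes deliver the same ten bytes, but the DDR clock line completes half as many cycles, which is the sense in which the clock speed is halved.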

  • rjo__ Posts: 2,027
    edited 2019-09-30 - 20:24:49
    1)
    hmmm. My understanding was that when using RDFAST and WRFAST, each transfer was only one clock?
    I could use the LUT as a buffer for some of my 3D analysis stuff... it is certainly big enough.
    But if we want to split a display buffer between P2s, the size of the LUT and the refresh process would seem to be a challenge. My general thinking on this was that as long as the transfer rate (P2<->P2) exceeds the pixel rate of the display, we should be good, and it is simply a matter of properly timing the transfers to the primary display buffer.
    So... with a pixel display rate of 25MHz, a 2-P2 setup should be capable of a full 16-bit VGA display using 8-bit transfers.

    If the P2<->P2 transfer rate is inadequate, there is no rule that says you couldn't use two cogs on the receive side and one cog each from more than one sending P2.

    I think using the LUT really shines when you have to send and receive the same data very quickly within a single P2. By using the LUT and a companion COG, which can read that LUT, you can make the data really fly.

    In the end, the dictum remains intact: you can do it, but you might need more than 1 P2:)

    2)
    Handshake would take two clocks, but those could be gained back by using both edges of the transfer clock. I could essentially get a handshake for free...

    My original idea was to make the various P2 sysclocks completely independent... so long as the receive P2 sysclock was greater than or equal to the sending P2 sysclock (and in the case of a video buffer, the sending rate would have to exceed the pixel rate of the display) it should work fine. I think the handshake preserves that.

    Can't wait.
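
A rough check of the 16-bit VGA estimate in 1) above. The 25 MHz pixel rate and 16 bits per pixel come from the post; the 640x480 active area inside an 800x525 total frame is an assumption (standard 640x480 at 60 Hz timing), since the post does not name a video mode.

```python
def required_mb_per_s(pixel_clock_hz, bytes_per_pixel, active_fraction=1.0):
    """Bandwidth needed to feed the display, in MB/s."""
    return pixel_clock_hz * bytes_per_pixel * active_fraction / 1e6


PIXEL_CLOCK_HZ = 25_000_000
BYTES_PER_PIXEL = 2  # 16-bit colour sent as two 8-bit transfers
# Assumed mode: 640x480 visible pixels in an 800x525 total frame.
ACTIVE_FRACTION = (640 * 480) / (800 * 525)

peak = required_mb_per_s(PIXEL_CLOCK_HZ, BYTES_PER_PIXEL)
average = required_mb_per_s(PIXEL_CLOCK_HZ, BYTES_PER_PIXEL, ACTIVE_FRACTION)
print(f"peak during active video: {peak:.1f} MB/s")
print(f"average including blanking: {average:.1f} MB/s")
```

Under these assumptions the peak requirement is above the 40-45 MB/s estimated earlier for an 8-bit software-clocked link, while the average over a whole frame is below it. So a single 8-bit link would seem to need some buffering across the blanking intervals, or one of the options already mentioned (16-bit transfers or a second receiving cog).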
  • jmg Posts: 13,928
    rjo__ wrote: »
    1)
    hmmm. My understanding was that when using RDFAST and WRFAST, each transfer was only one clock?

    When crossing the COG-HUB boundary, the slot alignment matters, so the opcode timing is variable, hence the data gives the timing as
    "2 or WRFAST finish + 10...17"
    Once you have waited for that alignment, the next transfer can be as quick as one per sysclk, which is how the streamer can run at high speeds.
    But it's the jitter on that first wait for alignment that will cause problems for the simplest SW-clocked transfers.
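
To see why bursts help, here is a small sketch of how the alignment wait is amortised. It assumes a simplified cost model that is not taken from the P2 documentation: a one-off setup of 10 to 17 clocks per burst (the range quoted above, ignoring any pending WRFAST), followed by one transfer per sysclock.

```python
def clocks_per_transfer(burst_length, setup_clocks):
    """Average clocks per transfer when the setup is paid once per burst."""
    return (setup_clocks + burst_length) / burst_length


for burst_length in (1, 8, 64, 512):
    best = clocks_per_transfer(burst_length, setup_clocks=10)
    worst = clocks_per_transfer(burst_length, setup_clocks=17)
    print(f"burst of {burst_length:3d}: {best:.2f} to {worst:.2f} clocks per transfer")
```

For single transfers the setup dominates and varies by several clocks, which is the jitter problem; for long bursts the average approaches one clock per transfer.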

    rjo__ wrote: »
    1)
    ...
    But if we want to split a display buffer between P2s, the size of the LUT and the refresh process would seem to be a challenge.
    That's sounding more a job for the streamer, where you config everything and then say go, probably with a small ping-pong handshake so the sender knows the receiver has slot align.
    rjo__ wrote: »
    My original idea was to make the various P2 sysclocks completely independent... so long as the receive P2 sysclock was greater than or equal to the sending P2 sysclock (and in the case of a video buffer, the sending rate would have to exceed the pixel rate of the display) it should work fine. I think the handshake preserves that.
    For highest speeds, you would need your P2's operating phase locked.
    Hopefully, a common XI signal is enough and the PLLs should have a common delay offset from XI edges, with tolerable PVT changes.
    Some tests would be needed, and P2's heating at differing rates may cause movements in relative PLL timings.
    That aligns SysCLKs, but another step would be needed if you wanted to align HUB slots. I'm not sure how you would manage that?

    Async links can tolerate non locked clocks, but they do that by running slower, and ensuring the sample point is never close to an edge.

  • Thanks... I'll try to process that:) Have grandson tonight.

  • rjo__ Posts: 2,027
    edited 2019-09-30 - 21:28:37
    time for a quick note or two
    thanks for the truth about RDFAST and WRFAST... so burst reading and then burst writing will be in order, which might just bring the LUT into play after all:)

    OOPs edit