What is the max throughput for the Spinneret using Indirect mode? — Parallax Forums

ags Posts: 386
edited 2011-03-06 12:27 in Accessories
I've been working with the SPI interface (original coding by Timothy, fixes by others) posted on the Google repository. I can see that the inline comments claim 3.55 uSec for a read (1 byte of data) and 3.60 uSec for a write (1 byte of data).

I've looked at the Indirect driver code and don't see any similar time specs. Has anyone measured and/or calculated the actual Rx/Tx time?
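
For comparison with any Indirect-mode figures, the per-byte SPI times quoted above can be converted into sustained throughput. This is only a sketch of the arithmetic: the helper name `throughput_mbps` is made up for illustration, and the result is a raw ceiling that ignores any per-transaction overhead in the driver.

```python
# Convert a per-byte transfer time into raw throughput.
# 3.55 us (read) and 3.60 us (write) are the figures quoted from the
# SPI driver's inline comments; everything else here is illustrative.

def throughput_mbps(usec_per_byte):
    """Raw throughput in Mbit/s when one byte moves every usec_per_byte."""
    return 8.0 / usec_per_byte  # bits per microsecond equals Mbit/s

spi_read_mbps = throughput_mbps(3.55)
spi_write_mbps = throughput_mbps(3.60)

print(f"SPI read:  {spi_read_mbps:.2f} Mbps")
print(f"SPI write: {spi_write_mbps:.2f} Mbps")
```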

I'm coming to realize two things:

a) In my application, latency isn't very important (on the order of tens of milliseconds is fine) but throughput is. When Rx throughput (using UDP) doesn't keep up with the sender, packets are lost once the WIZnet memory fills. This isn't rocket science, but I'm seeing that I cannot ignore it. (I had originally assumed, incorrectly, that I would never need to send data at a rate high enough for this to become an issue.)
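
A rough way to see how quickly this bites is to compute how long an empty receive buffer lasts when the sender outpaces the reader. The numbers below are assumptions for illustration only: an 8 KB receive buffer, a 3.0 Mbps sender, and a 2.25 Mbps drain rate. The function name is hypothetical.

```python
# Time until a receive buffer overflows when data arrives faster than it
# is read out. All three inputs below are illustrative assumptions.

def seconds_until_overflow(buffer_bytes, send_mbps, drain_mbps):
    """Seconds until an initially empty buffer fills, or None if it never does."""
    net_bits_per_sec = (send_mbps - drain_mbps) * 1_000_000
    if net_bits_per_sec <= 0:
        return None  # the reader keeps up, so the buffer never fills
    return buffer_bytes * 8 / net_bits_per_sec

RX_BUFFER_BYTES = 8 * 1024  # assumption: total Rx memory available to the socket

t = seconds_until_overflow(RX_BUFFER_BYTES, send_mbps=3.0, drain_mbps=2.25)
print(f"Buffer fills after about {t * 1000:.1f} ms")
```

Under these assumptions the buffer lasts well under a second, so any sustained shortfall in read rate ends in dropped UDP packets.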

and

b) I'm now back to engineering 101: managing opposing trade-offs. Either I consume more pins for the Indirect driver, which allows more throughput but takes away output pins (the purpose of the application is to drive many output signals), in which case I no longer need the higher data rate because there are fewer "clients" on the reduced output pin count; or I preserve the output pin count and do everything possible to get the throughput up. I see lots of PASM coding in the near future.

Thanks.

Comments

  • David Carrier Posts: 294
    edited 2011-03-03 12:16
    The 8-bit data portion of the indirect bus is connected to a set of pins that can be controlled by the video generator. With some optimised assembly coding, the Propeller could output as fast as the W5100 could transmit. Using multiple cogs to read back the data, the Propeller could read in data as fast as the W5100 could deliver it.

    The code hasn't been written yet, but there is a trade-off between development time, resources, speed, and latency. On the plus side, if one person writes the optimised code then releases it under an open-source license, the development will only need to be done once.

    If you have enough resources left inside the Propeller (e.g. cogs and RAM) then I would go for external shift registers to increase the number of available output pins. It may seem old fashioned, but a CMOS shift register can run at several MHz. What do you intend to connect to the output pins?

    — David Carrier
    Parallax Inc.
  • ags Posts: 386
    edited 2011-03-03 21:47
    Yes, engineering is always a tradeoff of opposing factors. I agree.

    I have studied the WIZnet datasheet, and what I see is that the maximum clock (XTAL) freq is 25MHz. I don't know how to translate that to actual data speed. I'll have to look at the timing diagrams. However, since it is touted as supporting 10/100Mbps autodetect, unless it's dropping bits at 100Mbps, that means a data rate of something approaching 5 MBps (I'm assuming the packet overhead is included in the raw bitrate, derating the actual data rate).

    If I have to go this route I'd be willing to contribute the code, if it would be useful to the community.

    I am driving a proprietary (in the sense of "not like SPI or I2C or other standards" but not "closed due to license restrictions") self-clocking protocol. It requires a resolution of 5 uSec. Frames are: 5 uSec high ("start"); 5 uSec high, 10 uSec low ("1"); 10 uSec high, 5 uSec low ("0"); 15 uSec low ("stop"). It is inherently limited to roughly 8KBps. I want to drive as many of those lines in parallel as possible (32 would be great but I've given up on that; 24 seems reasonable if I'm able to consume data from the WIZnet fast enough; 16 is an acceptable, but barely successful goal).
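
The frame description above can be sketched as a sequence of 5 uSec slots, which also confirms the "roughly 8KBps" figure. The post does not say how many bits go in a frame or in which order, so one byte per frame sent MSB first is purely an assumption here, and `encode_frame` is a hypothetical name.

```python
# Encode data into 5 us slots (1 = high, 0 = low) following the frame
# description in the post. Bit order and frame length are assumptions.

SLOT_US = 5
START = [1]        # 5 us high
ONE = [1, 0, 0]    # 5 us high, 10 us low
ZERO = [1, 1, 0]   # 10 us high, 5 us low
STOP = [0, 0, 0]   # 15 us low

def encode_frame(data):
    """Return the list of slot levels for one frame carrying `data`."""
    slots = list(START)
    for byte in data:
        for bit in range(7, -1, -1):  # assumption: MSB first
            slots += ONE if (byte >> bit) & 1 else ZERO
    slots += STOP
    return slots

frame = encode_frame(b"\xA5")

# Every data bit occupies three slots, which sets the per-line ceiling.
bit_time_us = SLOT_US * len(ONE)
line_bytes_per_sec = 1_000_000 / bit_time_us / 8

for lines in (16, 24, 32):
    total_mbps = lines * line_bytes_per_sec * 8 / 1_000_000
    print(f"{lines} lines: {total_mbps:.2f} Mbps of payload")
```

With 15 uSec per bit, one line carries about 8,333 bytes per second before start and stop overhead, so the aggregate payload rate scales directly with the number of lines driven.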

    Would you please elaborate on how to use a shift register to increase output pins? Since the output I need is self-clocking, I presume that means using three Prop outputs (one to clock a serial-in, parallel-out shift register, one as serial data out to the register, and the third to latch the parallel value into a latch once the "word" is loaded into the shift register). I'd have to think about the code to do that and determine if I could get all that done in 5 uSec with a dedicated cog running PASM. Of course the length of the register is a factor in the timing. Is that basically the approach you're suggesting?
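
The three-pin scheme described above (serial data, shift clock, latch clock) can be modelled in a few lines, along with a rough check against the 5 uSec slot. The timing part rests on assumptions that are not from the thread: an 80 MHz system clock, 4 clocks per instruction, and a hypothetical 4 instructions per shifted bit.

```python
# Model of a serial-in, parallel-out shift register with an output latch,
# plus a rough estimate of how long a bit-banged load would take.

class ShiftRegister:
    def __init__(self, width=8):
        self.width = width
        self.shift = 0     # internal shift stage
        self.outputs = 0   # latched value seen on the output pins

    def clock_in(self, bit):
        """One shift-clock pulse: move everything up and take in one bit."""
        mask = (1 << self.width) - 1
        self.shift = ((self.shift << 1) | (bit & 1)) & mask

    def latch(self):
        """One latch-clock pulse: copy the shift stage to the outputs."""
        self.outputs = self.shift

def write_word(reg, value):
    for i in range(reg.width - 1, -1, -1):  # MSB first
        reg.clock_in((value >> i) & 1)
    reg.latch()

reg = ShiftRegister(8)
write_word(reg, 0b10110001)

CLOCK_HZ = 80_000_000   # assumption
CLOCKS_PER_INSTR = 4    # assumption
INSTR_PER_BIT = 4       # hypothetical cost of one bit in a PASM loop

def load_time_us(bits, latch_instr=2):
    """Estimated time to shift `bits` bits and pulse the latch."""
    instr = bits * INSTR_PER_BIT + latch_instr
    return instr * CLOCKS_PER_INSTR / CLOCK_HZ * 1e6

for bits in (8, 16, 32):
    print(f"{bits} bits: {load_time_us(bits):.2f} us")
```

Under these assumptions a simple loop fits 8 or 16 bits inside one 5 uSec slot but not 32, which is why the register length matters so much.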

    Thanks for the response.
  • Mike G Posts: 2,702
    edited 2011-03-04 07:39
    David Carrier wrote: »
    Using multiple cogs to read back the data, the Propeller could read in data as fast as the W5100 could deliver it.

    I've been moving my web server code to PASM. So, before I get any further down the development path, can anyone point me to a resource that demonstrates how to coordinate multiple COGs for receiving data? A conceptual overview would be fine.

    I imagine you would need one COG that directs traffic? So a minimum of 3 COGs would be needed?
  • David Carrier Posts: 294
    edited 2011-03-04 12:55
    Mike,
    We don't currently have any documentation on using multiple cogs to sample, but it boils down to this:

    1) Launch the same code in multiple cogs and pass a future cnt value for them to start
    2) The cogs wait for the cnt value plus an offset set by their cogid
    3) The cog samples one or more times
    4) The cog writes the sample to its own buffer in hub RAM
    5) The cog returns to step three until it is done taking samples
    6) The code reading the data alternates between each cog's buffer while reading

    You don't need a cog to coordinate the others; they use the system counter. You do need to interleave data when reading it, but this won't take extra time, just more code space.
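
The six steps above can be illustrated with a small simulation rather than PASM. This is only a sketch of the idea: `signal` stands in for the pins being sampled, the function names are hypothetical, and real cogs would run concurrently instead of in a loop.

```python
# Simulate several cogs sampling the same source at staggered offsets,
# each into its own buffer, then interleave the buffers when reading.

def sample_interleaved(signal, n_cogs, samples_per_cog, start_cnt, period):
    """Each cog samples every `period` ticks, offset by cogid * period / n_cogs."""
    step = period // n_cogs
    buffers = []
    for cogid in range(n_cogs):
        first = start_cnt + cogid * step  # shared start cnt plus per-cog offset
        buffers.append([signal(first + k * period) for k in range(samples_per_cog)])
    return buffers

def read_interleaved(buffers):
    """Alternate between the cogs' buffers to restore time order."""
    merged = []
    for k in range(len(buffers[0])):
        for buf in buffers:
            merged.append(buf[k])
    return merged

def signal(cnt):
    return cnt  # stand-in: the "sample" is just the time it was taken

buffers = sample_interleaved(signal, n_cogs=4, samples_per_cog=3,
                             start_cnt=1000, period=16)
merged = read_interleaved(buffers)
print(merged)
```

Each simulated cog only samples once every 16 ticks, yet the merged stream has a sample every 4 ticks, with no coordinating cog involved.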

    — David Carrier
    Parallax Inc.
  • jstjohnz Posts: 91
    edited 2011-03-04 17:49
    Are you driving RGB pixels?
  • ags Posts: 386
    edited 2011-03-05 18:27
    Well, I'm still forging ahead on the design options here. I'd appreciate comments from any of the experts here to validate my progress so far. Here's a rough sketch of the situation:
    • I'm attempting a design that would consume upwards of 3Mbps of data (not including transport overhead) sent over UDP.
    • The sustained throughput is the primary concern in the application; latency of tens of mSec is acceptable.
    • Using the existing WIZnet SPI driver, the read time for one byte is 3.55 uSec or roughly 2Mbps - and this includes any transport overhead.
    • Unless I want to slow my data rate, or number of devices driven, I'll therefore have to move to a faster interface. I'm planning on the WIZnet Indirect interface.
    • I'm choosing this because it requires 13 I/O pins (if I don't use RESET or INT).
    • I found a newer version of the WIZnet datasheet and see that the read cycle using the Indirect interface is 80nSec (which translates magically to 100Mbps - go figure). That is raw bit rate, not data.
    • That speed should be sufficient for my needs, and while it doesn't consume as many pins as the Direct interface, it does however leave me with not enough pins (driving 16 devices is the minimum acceptable, 32 is preferred).
    • Shift registers were suggested as a way to deal with this. I found the 74LV594 part here: http://www.st.com/stonline/books/pdf/docs/8069.pdf. It's an 8-bit serial-in, parallel-out with latch. It runs at 40 MHz in the cold at low voltage. They can also be cascaded if needed. They will require a minimum of 3 pins (serial data, shift_clk and latch_clk). That still fits within the Propeller's total 32 pin budget.
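
The pin budget in the list above can be tallied quickly. The 13 Indirect-interface pins and 3 pins per shift register bank come from the post; the 4 reserved pins (for boot EEPROM and programming serial) are an assumption about a typical Propeller board, and the function names are hypothetical.

```python
# Tally Propeller pins for the WIZnet Indirect interface plus one or more
# independently driven banks of cascaded 8-bit shift registers.

TOTAL_PINS = 32
WIZNET_INDIRECT_PINS = 13  # from the post, without RESET or INT
PINS_PER_BANK = 3          # serial data, shift_clk, latch_clk
RESERVED_PINS = 4          # assumption: EEPROM and programming serial

def pins_used(banks):
    return WIZNET_INDIRECT_PINS + banks * PINS_PER_BANK

def fits(banks):
    return pins_used(banks) + RESERVED_PINS <= TOTAL_PINS

def outputs(banks, registers_per_bank, bits_per_register=8):
    return banks * registers_per_bank * bits_per_register

# Three ways to reach 32 outputs: (banks, registers cascaded per bank)
for banks, regs in ((1, 4), (2, 2), (4, 1)):
    print(f"{banks} bank(s) x {regs} register(s): "
          f"{outputs(banks, regs)} outputs, {pins_used(banks)} pins, "
          f"fits={fits(banks)}")
```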
    Now some specific questions/calculations:

    It has been suggested earlier in this thread that multiple cogs in parallel could consume all the data the WIZnet could provide. If I can read one byte from the WIZnet in Indirect mode in 80 nSec (which is 12.5 MHz, half of the WIZnet XTAL frequency - and 100Mbps raw) and I can shift out a byte to the shift registers at 40 MHz (8 bits x 25 nSec = 200 nSec per byte = 5 MBps, or 40 Mbps raw) then I'm in the ballpark. I might be able to squeeze all the data out through 4x8bit cascaded registers, but I think that it's more likely that two banks of 2x8bit or even four banks of 1x8bit would give me lots of headroom. Does this make sense?
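
The rates in the paragraph above can be checked with a few lines of arithmetic. These are datasheet-style ceilings only: they assume the Propeller can actually drive the bus and the shift clock at those speeds, which nothing in this thread confirms. The helper names are hypothetical.

```python
# Raw rate ceilings for the WIZnet Indirect read and the shift register.

def mbps(bits, nanoseconds):
    """Raw rate in Mbit/s for `bits` transferred in `nanoseconds`."""
    return bits * 1000 / nanoseconds

wiznet_read_mbps = mbps(8, 80)       # one byte per 80 ns read cycle
shift_out_mbps = mbps(8, 8 * 25)     # eight 25 ns shift clocks per byte
shift_out_mbytes = shift_out_mbps / 8

def cascade_update_ns(total_bits, clock_ns=25):
    """Time to refresh every output of a cascaded chain once."""
    return total_bits * clock_ns

print(f"WIZnet Indirect read: {wiznet_read_mbps:.0f} Mbps")
print(f"Shift register out:   {shift_out_mbps:.0f} Mbps ({shift_out_mbytes:.0f} MBps)")
print(f"4x8-bit chain update: {cascade_update_ns(32)} ns per refresh")
```

At the ideal 40 MHz shift clock even a 32-bit chain refreshes in under a microsecond, so the practical limit is more likely to be how fast the cog can toggle the pins.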

    I've not used the cog counters, but have read all the description in the Propeller Manual and think I get the idea (functioning PASM code is a far distance beyond that). I presume I could just load up CTRB with n[1:4] bytes then fire off CTRA at 40 MHz with a SHL/ROL at each clock, stop CTRA, clock the serial word into the latch, repeat. Is this correct? Can anyone point me towards a good resource on use of the cog counters? I've looked at the WIZnet Indirect driver; are there recommendations for other good code examples around?

    Two things are still unclear to me and require more thought/design. Help/suggestions/examples here would be greatly appreciated:

    1) What is the best (meaning fastest) method of converting <n> "parallel" bytes into 8 "serial" words of length <n> bits in PASM? I've used MUXC before but am not experienced in PASM and think there's likely to be a much faster pattern known & used by the experts.
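
The conversion asked about in question 1 is a bit transpose. The sketch below shows only the logic, in Python rather than PASM, and makes no claim about the fastest instruction sequence. The function name and the choice that the first byte ends up in the most significant position are assumptions.

```python
# Transpose n bytes into 8 words of n bits each: word b collects bit b
# of every input byte, with the first byte in the most significant place.

def transpose_bytes(data):
    words = [0] * 8
    for byte in data:
        for b in range(8):
            words[b] = (words[b] << 1) | ((byte >> b) & 1)
    return words

words = transpose_bytes(bytes([0b00000001, 0b00000011, 0b10000000]))
for b, word in enumerate(words):
    print(f"bit {b}: {word:03b}")
```

Here bit 0 is set in the first two bytes and bit 7 only in the third, so word 0 comes out as 110 and word 7 as 001.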

    2) The WIZnet driver will run in one cog; each bank of shift registers will be driven in its own cog. Data received by the WIZnet will have to be distributed to the shift register cogs through hub RAM. I don't know how to calculate the impact of that on timing, other than the worst-case scenario of 22 clocks for each write to hub RAM by the WIZnet cog and 22 clocks for each read from hub RAM by each shift register cog. Are there methods used to reliably make sure that each cog is able to read hub RAM in 7 cycles instead of closer to the worst-case 22 cycles? (I see a way to write a long at a time rather than bytes/words to save time.) Can I time a cog so that it can get 2 hub RAM reads or writes in one "slice" of access to shared resources?
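
A simple model helps put numbers on question 2. It assumes an 80 MHz clock, that each cog gets one hub access window every 16 clocks, and 4-clock ordinary instructions. The 7 and 22 clock figures are the ones quoted in the post. Treat this as a sketch of the arithmetic, not a statement of exact Propeller timing.

```python
# Hub RAM bandwidth per cog under a one-access-per-window model.

CLOCK_HZ = 80_000_000     # assumption
HUB_WINDOW_CLOCKS = 16    # assumption: one hub window per cog per 16 clocks
HUB_BEST_CLOCKS = 7       # best case quoted in the post
HUB_WORST_CLOCKS = 22     # worst case quoted in the post
CLOCKS_PER_INSTR = 4      # assumption for ordinary instructions

def hub_ops_per_sec():
    """Hub operations per second for a cog that hits every window."""
    return CLOCK_HZ / HUB_WINDOW_CLOCKS

def hub_mbytes_per_sec(bytes_per_op):
    return hub_ops_per_sec() * bytes_per_op / 1_000_000

# Clocks left between synchronized hub operations, and how many ordinary
# instructions fit there without missing the next window.
free_clocks = HUB_WINDOW_CLOCKS - HUB_BEST_CLOCKS
free_instructions = free_clocks // CLOCKS_PER_INSTR

print(f"byte transfers: {hub_mbytes_per_sec(1):.0f} MBps per cog")
print(f"long transfers: {hub_mbytes_per_sec(4):.0f} MBps per cog")
print(f"instructions between hub ops: {free_instructions}")
```

Under this model a cog cannot get two hub accesses in one window, but keeping exactly two ordinary instructions between hub operations holds each access at the best-case cost, and moving longs instead of bytes quadruples the bandwidth.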

    Thanks to any and all that offer their knowledge and experience on this.
  • DynamoBen Posts: 366
    edited 2011-03-06 12:27
    A word of caution, from experience, the W5100 doesn't function reliably when you read/write over 10MHz. Take a close look at the memory read/write timing specs in the datasheet.