Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i


Comments

  • cgracey Posts: 11,267
    edited 2016-10-15 - 21:31:43
    The space ($20) is the continuous autobaud character and the only autobaud mechanism. Because $20 is ALWAYS trapped (if the baud rate is not too fast), it provides a high-frequency means of correcting the baud, as spaces are likely everywhere in the download. I like it this way because it is absolutely safe, working with every $20, no matter how disparate in time.

    The two-stage scheme was to first trap a $20, but be too late to engage the receiver for the next byte, needing a timed delay to land in the next stop bit, if the start bit was already seen at the end of the autobaud trap. Then, a maintenance autobaud was done on $20's, firing only once per byte. The problem with this scheme is that there must be high continuity, in time, of autobaud characters. It may not tolerate multi-second delays that can happen with wifi serial ports.
  • jmg Posts: 13,609
    edited 2016-10-15 - 21:50:45
    cgracey wrote: »
    The space ($20) is the continuous autobaud character and the only autobaud mechanism. Because $20 is ALWAYS trapped (if the baud rate is not too fast), it provides a high-frequency means of correcting the baud, as spaces are likely everywhere in the download. I like it this way because it is absolutely safe, working with every $20, no matter how disparate in time.

    Given the Autobaud-tracking char can be selected to be almost anything, can the Smart Pins support this,
    via ability to Time X edges on B, started by A ?

    * Start measurement on Falling edge A (here the Start bit)
    * Count (eg) X=5 rising edges (_/=) on B, then capture the time from Start, ie at the Stop bit, giving a 9-bit-time capture
    * Signal capture ready (right after the stop-bit leading edge)
    * Wait for read and re-arm on read

    If yes, for tracking, this is even wider in tolerance than a 'receive & check', as the Rx can fail on any leading Autobaud-tracking char, but correct in time for the next command char, which I think is what you are after ?
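
The edge-counting capture described in this post can be sketched in a few lines of Python. This is only a simulation of the idea, not smart pin code: the function names, the 8N1 framing, and the 40-clocks-per-bit figure are assumptions chosen for illustration. It shows that counting rising edges up to the stop bit always yields a 9-bit-time interval, whichever tracking character is used.

```python
def uart_levels(char, data_bits=8):
    """Line level per bit slot for one frame: start (0), data LSB-first, stop (1)."""
    bits = [(char >> i) & 1 for i in range(data_bits)]
    return [0] + bits + [1]


def capture_after_edges(char, edges_to_count, bit_time):
    """Time from the start-bit falling edge to the Nth rising edge.

    Returns None if the frame has fewer rising edges than requested.
    """
    levels = uart_levels(char)
    seen = 0
    for slot in range(1, len(levels)):
        # A rising edge sits on the boundary between a low slot and a high slot.
        if levels[slot - 1] == 0 and levels[slot] == 1:
            seen += 1
            if seen == edges_to_count:
                return slot * bit_time
    return None


# Hypothetical timing: 40 system clocks per bit.
u_capture = capture_after_edges(0x55, 5, 40)      # 'U' has 5 rising edges
space_capture = capture_after_edges(0x20, 2, 40)  # $20 has 2 rising edges
print(u_capture, space_capture, u_capture // 9)
```

With a character such as $55 the fifth rising edge is the stop bit, and with $20 it is the second, so in both cases dividing the capture by 9 recovers the bit period.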
  • jmg Posts: 13,609
    cgracey wrote: »
    ...., which would allow for RC drift.

    Curious if you have any measurements on the test die yet, of:

    RC Oscillator Frequency at 25 °C and std Vcc

    Variation with Vcc

    Variation with Temperature

    Short term jitter
  • jmg wrote: »
    cgracey wrote: »
    ...., which would allow for RC drift.

    Curious if you have any measurements on the test die yet, of:

    RC Oscillator Frequency at 25 °C and std Vcc

    Variation with Vcc

    Variation with Temperature

    Short term jitter

    Not yet. They should be coming this next week. I still need to design a board to hold them.
  • jmg Posts: 13,609
    cgracey wrote: »
    ..The SHA-256/HMAC now works as the data comes in...
    How is that handled, in a flow sense ?

    Does the download host need to pause to check blocks are done ?
    If there are over-run issues, what happens to the link ?

    If there is no SHA append, as in this DOC example, what does that mean for speed ?
    Sender: " Prop_Txt 0 0 0 0 +/cj9v37I/YlJoD/KIBm/fD/n/0 ~"
    Loader: CR+LF+"PASS"+CR+LF
    

    Some low cost MCUs have somewhat coarse BAUD choices at upper speeds, eg

    1.382400MBd, 691.2kBd, 345.6kBd

    Here, being able to burst at the higher baud would have the most appeal, even if that means some block-pause is added.

  • cgracey Posts: 11,267
    edited 2016-10-17 - 04:26:34
    jmg wrote: »
    cgracey wrote: »
    ..The SHA-256/HMAC now works as the data comes in...
    How is that handled, in a flow sense ?

    Does the download host need to pause to check blocks are done ?
    If there are over-run issues, what happens to the link ?

    If there is no SHA append, as in this DOC example, what does that mean for speed ?
    Sender: " Prop_Txt 0 0 0 0 +/cj9v37I/YlJoD/KIBm/fD/n/0 ~"
    Loader: CR+LF+"PASS"+CR+LF
    

    Some low cost MCUs have somewhat coarse BAUD choices at upper speeds, eg

    1.382400MBd, 691.2kBd, 345.6kBd

    Here, being able to burst at the higher baud would have the most appeal, even if that means some block-pause is added.

    Every 64 bytes, a block gets hashed and that takes some time. I have a ~100 byte serial receive buffer, right now, that smooths things over.

    The 64-byte hash routine takes about 2.5ms. Serial bytes received go into the buffer on interrupts. I need to test this out in both hex and base 64 modes, to assure that we don't get buffer over-runs during continuous loading at top baud rates.
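
As a rough check of the over-run question, the figures in this post can be turned into arithmetic. This is a sketch on the quoted numbers only, not a measurement. It assumes 8N1 framing (10 bit times per character), a sender that transmits with no gaps, and that every received character occupies one byte of the buffer while a block is being hashed. The baud rates are the ones listed in the quoted post.

```python
HASH_TIME_S = 2.5e-3   # "about 2.5ms" per 64-byte block, from the post
BUFFER_BYTES = 100     # "~100 byte" receive buffer, from the post
BITS_PER_CHAR = 10     # assumption: 8N1 framing (start + 8 data + stop)


def chars_during_hash(baud):
    """Characters that arrive back-to-back while one block is being hashed."""
    return baud / BITS_PER_CHAR * HASH_TIME_S


for baud in (345_600, 691_200, 1_382_400):
    arriving = chars_during_hash(baud)
    verdict = "fits" if arriving <= BUFFER_BYTES else "exceeds buffer"
    print(f"{baud:>9} Bd: {arriving:6.1f} chars per hash -> {verdict}")
```

Under these assumptions the result depends heavily on the exact hash time and on how the loader drains the buffer, which is why testing at the top baud rates matters.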
  • jmg Posts: 13,609
    cgracey wrote: »
    Every 64 bytes, a block gets hashed and that takes some time. I have a ~100 byte serial receive buffer, right now, that smooths things over.

    The 64-byte hash routine takes about 2.5ms. Serial bytes received go into the buffer on interrupts. I need to test this out in both hex and base 64 modes, to assure that we don't get buffer over-runs during continuous loading at top baud rates.

    How much does that 'about 2.5ms' vary, and in the base64 with no SHA append, is that 2.5ms faster ?

    It sounds like this really needs a software handshake, or a Buffer check token, as Baud-rate and packet-rates are really two different things here.

    At higher baud rates, that token ping-pong will not cost much on a MCU, and ensures fastest practical Boot times.
    It will need some headroom, so Not-Full reply means some reasonable number of byte-quads can be sent.
    eg ~50% of that 100 bytes would be 48 base64 chars (12 byte-quads), which is 36 raw bytes or 9 words.
    96 chars (24 byte-quads), or 18 words, is the next whole-opcode point, which is getting close to that ~100 byte you mention.
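
The byte counts above follow from base64 packing 3 raw bytes into every 4 characters. A small sketch of that conversion, reading the 48 and 96 figures as base64 character counts and a "word" as 32 bits (the function name is made up for illustration):

```python
def base64_group_sizes(chars):
    """Convert a count of base64 characters into (quads, raw bytes, 32-bit words)."""
    if chars % 4:
        raise ValueError("base64 text comes in whole 4-character quads")
    quads = chars // 4
    raw_bytes = quads * 3   # each quad carries 3 raw bytes
    return quads, raw_bytes, raw_bytes / 4


print(base64_group_sizes(48))   # (12, 36, 9.0)
print(base64_group_sizes(96))   # (24, 72, 18.0)
```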

    For systems with poor handshake performance, setting the baud lower (with margin) can avoid ping-pong, but does mean a slower boot than could have occurred, in those systems.

    Do you have a means to read-back the AutoBaud divider (4 bytes, send b16) in there yet ?




  • cgracey Posts: 11,267
    edited 2016-10-17 - 06:11:48
    There is no need for handshaking, as making sure it works under fastest conditions will be sufficient. The 64 byte hash always takes the exact same amount of time.
  • jmg Posts: 13,609
    cgracey wrote: »
    The 64 byte hash always takes the exact same amount of time.

    Is that delay there, even if the data indicates no SHA (as per DOC example of b64 ) ?

    Handshakes would be a pain, maybe the P2 can echo on AutoBaud with a nibble value indicating the headroom.

    That is read once, and tells the host how many NOP packers to use.

    eg 0x?0 could be Baud <= lowest P2 clock ability.
    0x?1..F could be table based NOP (Autobaud.Trim?) counts to pack every (say) 48 chars
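
On the host side, the packing idea could look something like the sketch below. Everything here is hypothetical: the post does not give the nibble-to-count table, so the pad count is passed in directly, and the choice of a space as the filler (and the loader tolerating fillers inside the stream) is an assumption.

```python
def pad_stream(payload: bytes, pad_count: int, every: int = 48,
               pad_char: bytes = b" ") -> bytes:
    """Insert pad_count filler characters after each full group of `every` characters."""
    if pad_count == 0:
        return payload          # headroom value 0: send at full rate, no fillers
    out = bytearray()
    for start in range(0, len(payload), every):
        chunk = payload[start:start + every]
        out += chunk
        if len(chunk) == every:  # only pad after complete groups
            out += pad_char * pad_count
    return bytes(out)


padded = pad_stream(b"A" * 96, pad_count=2)
print(len(padded))   # 96 payload chars plus 2 fillers after each of the 2 groups
```

The host would read the headroom value once after autobaud and then use the same pad count for the rest of the download.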


  • ozpropdev Posts: 2,516
    edited 2016-10-20 - 05:55:32
    Chip
    On the next release, any chance you can restore PB0 on the P123-A7/A9 boards back to P54 instead of a P2 reset?
    I'm experiencing "capacitive coupling effects" that cause intermittent resets on these boards.
    The A7 seems a lot more sensitive than the A9 board.

    P.S. Can we also drop some cogs in an A9 version (maybe to 12?) to get some smartpins back?
    Melbourne, Australia
  • ozpropdev wrote: »
    Chip
    On the next release, any chance you can restore PB0 on the P123-A7/A9 boards back to P54 instead of a P2 reset?
    I'm experiencing "capacitive coupling effects" that cause intermittent resets on these boards.
    The A7 seems a lot more sensitive than the A9 board.

    P.S. Can we also drop some cogs in an A9 version (maybe to 12?) to get some smartpins back?

    I can do that for the -A7.

    It takes as much logic for 1 cog as it does for almost 9 smart pins. Getting rid of 4 cogs would yield about 35 more smart pins. Where would you want them?
  • cgracey wrote: »

    It takes as much logic for 1 cog as it does for almost 9 smart pins. Getting rid of 4 cogs would yield about 35 more smart pins. Where would you want them?

    Chip, I am looking at testing a 34-bit bus between three(?) A-9's. I don't think I will need smart pins for it, but if you think I will eventually use smart pins for the external bus, loading up one of the Ports with all smart pin capabilities would make sense. I am currently thinking I will use Port A, to avoid Pins 62 and 63.
  • jmg Posts: 13,609
    rjo__ wrote: »
    Chip, I am looking at testing a 34-bit bus between three(?) A-9's. I don't think I will need smart pins for it, but if you think I will eventually use smart pins for the external bus, loading up one of the Ports with all smart pin capabilities would make sense. I am currently thinking I will use Port A, to avoid Pins 62 and 63.

    Is this serial, or parallel ?

    A parallel bus using the Streamer would not, I think, need many smart pins, but it would expect locked clocks on all P2s.

    Multiple Serial UARTS can configure up to 32b payloads for a 34b frame, and those do need SmartPins.

    Just wiring 3 A9 boards in parallel is going to be a challenge, as those buses need to be very short, with low crosstalk.
  • cgracey wrote: »
    I can do that for the -A7.

    It takes as much logic for 1 cog as it does for almost 9 smart pins. Getting rid of 4 cogs would yield about 35 more smart pins. Where would you want them?
    That would be great!
    I need to capture smartpin activity with the streamer in 32 bit mode.
    So maybe P39..P8 for a complete 32 bit sample width.



    Melbourne, Australia
  • Wouldn't it be better to have a version with all 64 smart pins and as many cogs as can fit?
    Prop Info and Apps: http://www.rayslogic.com/
  • JMG

    It's parallel, I only need about 5MB/sec, but I'm shooting for about 20 MB/sec right now. I know...it's stupid, I could do that
    with a handful of serial lines but you have to consider the source:)

    I'm handshaking my way around clock differences. I'm curious to see if the way I understand it, will end up being the way it
    actually is:)

    Rich
  • cgracey Posts: 11,267
    edited 2016-10-31 - 23:31:21
    I noticed a really stupefying problem with the 'smartpin_usb_turnaround.spin2' program. Since I sped up the smart pin comms, it just hasn't been working right, at all. Turns out that the implied AKPIN that now occurs when RDPIN executes was causing conflicts when one cog was doing WxPIN instructions and the other was doing a RDPIN. RDPIN is no longer passive with the automatic AKPIN. I just need to find a way to do a RDPIN without an automatic AKPIN (maybe using WZ, or something) or make AKPIN not conflict with WxPIN. This has been holding up the FPGA release.
  • jmg Posts: 13,609
    cgracey wrote: »
    ... Turns out that the implied AKPIN that now occurs when RDPIN executes was causing conflicts ..
    Implied ACK is nice for many cases, but I can see that having the ACK optional can make it more general and useful.
    Another case would be multiple Pin-Cells that need to ARM and then ACK/ReARM all together, but Write (to configure) and read (to get results) need to be sequential over many Pin-Cells.

  • Can you just stall wrpin until rdpin is done?
    Prop Info and Apps: http://www.rayslogic.com/
  • Rayman wrote: »
    Can you just stall wrpin until rdpin is done?

    No. These are independent processes.

    I solved the problem by splitting RDPIN into RDPIN (auto acknowledge) and RQPIN (read "quiet", no acknowledge). Works fine now.
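
The difference between the two reads can be pictured with a toy model. This is not how the hardware is built; the class and attribute names are made up, and the only behaviour taken from the post is that RDPIN acknowledges the pin while RQPIN does not.

```python
class SmartPinModel:
    """Toy model: a result register plus a flag raised when a new result arrives."""

    def __init__(self):
        self.result = 0
        self.in_flag = False

    def new_result(self, value):
        """The pin finishes a measurement and signals it."""
        self.result = value
        self.in_flag = True

    def rdpin(self):
        """Read with automatic acknowledge: returns the result and clears the flag."""
        value = self.result
        self.in_flag = False
        return value

    def rqpin(self):
        """Read 'quiet': returns the result and leaves the flag untouched."""
        return self.result


pin = SmartPinModel()
pin.new_result(123)
print(pin.rqpin(), pin.in_flag)   # a monitoring cog can peek without side effects
print(pin.rdpin(), pin.in_flag)   # the owning cog reads and acknowledges
```

In this picture, a second cog that only observes a pin would use the quiet read so it never disturbs the cog that is configuring or servicing that pin.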
  • garryj Posts: 265
    edited 2016-11-01 - 18:39:36
    cgracey wrote: »
    I solved the problem by splitting RDPIN into RDPIN (auto acknowledge) and RQPIN (read "quiet", no acknowledge). Works fine now.
    Great news!

    I'm on the cusp of getting reliable (through a little cheating) full-speed working @80MHz with a keyboard/mouse wireless combo, as long as both keyboard and mouse support the HID boot protocol.

    USB tx is working great, but it has been the rx side that has presented most of the challenges. The last hurdle is trying to find a way to deal with rx end-of-packet in a timely manner, as it takes at least two RDPINs to distinguish the last byte read from EOP. Hopefully, the faster RDPIN will be the magic bullet that gets me over the hump, as I'm about 4 clocks away from doing a Snoopy happy-dance.
    garryj
  • cgracey Posts: 11,267
    edited 2016-11-01 - 19:03:41
    garryj wrote: »
    I solved the problem by splitting RDPIN into RDPIN (auto acknowledge) and RQPIN (read "quiet", no acknowledge). Works fine now.
    Great news!

    I'm on the cusp of getting reliable (through a little cheating) full-speed working @80MHz with a keyboard/mouse wireless combo, as long as both keyboard and mouse support the HID boot protocol.

    USB tx is working great, but it has been the rx side that has presented most of the challenges. The last hurdle is trying to find a way to deal with rx end-of-packet in a timely manner, as it takes at least two RDPINs to distinguish the last byte read from EOP. Hopefully, the faster RDPIN will be the magic bullet that gets me over the hump, as I'm about 4 clocks away from doing a Snoopy happy-dance.

    We'll get you there soon. I'm recompiling right now. You have a Prop123-A9 board, right?
  • cgracey wrote: »
    We'll get you there soon. I'm recompiling right now. You have a Prop123-A9 board, right?

    Affirmative.
    garryj
  • I recall reading that the smart-pin upgrade brings with it a full 32-bit pipe with no time penalty? If so, how difficult would it be to have the USB receiver use the upper 16 bits of the new-byte/status register as a received byte counter? The full-speed maximum packet size is 1023 bytes + PID and CRC16. It's not a must-have, but IMO would be handy, and could possibly replace (or augment) the "new-byte toggle" mechanism.

    Too late to think about this?
    garryj
  • garryj wrote: »
    I recall reading that the smart-pin upgrade brings with it a full 32-bit pipe with no time penalty? If so, how difficult would it be to have the USB receiver use the upper 16 bits of the new-byte/status register as a received byte counter? The full-speed maximum packet size is 1023 bytes + PID and CRC16. It's not a must-have, but IMO would be handy, and could possibly replace (or augment) the "new-byte toggle" mechanism.

    Too late to think about this?

    All those flops have already been dedicated to state storage for the send and receive engines. They are just masked off when you read the pin.
  • Chip,
    When we bit-bash the pins without using the smart pins, how many clocks transpire between the sample and the action?

    ie the number of clocks between the instruction and appearing at the pin / sampling the pin and the reading instruction?
      mov dira, #1    ' make P0 an output
      mov outa, #1    ' drive P0 high
      mov x, ina      ' sample the input pins into x
    
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • cgracey Posts: 11,267
    edited 2016-11-03 - 06:49:06
    Cluso99 wrote: »
    Chip,
    When we bit-bash the pins without using the smart pins, how many clocks transpire between the sample and the action?

    ie the number of clocks between the instruction and appearing at the pin / sampling the pin and the reading instruction?
      mov dira, #1    ' make P0 an output
      mov outa, #1    ' drive P0 high
      mov x, ina      ' sample the input pins into x
    

    I don't know at the moment. I just run a test to figure it out when I need to know. I will determine the hard values sometime soon and document them. Whatever they are, they don't change. It is constant for each direction and it's about 2 clocks.
  • The new Version 13 is posted at the top of this thread.

    * Fast smart pin reading and writing (2 clocks)
    * Event jumps
    * ALTB for multi-register bit field access
    * Booter ROM now does 2M baud serial
    * A few assembler branch-address bugs were fixed in PNut.exe
  • Thanks Chip!
    Melbourne, Australia
  • Text loader working nicely @ 2Mbaud. :)
    Melbourne, Australia