Issues running W5200 demo program on Spinneret

twc · 2012-12-10 06:19

Hey Mike:

As mentioned on the other thread, I had pretty good luck modifying some W5200 programs (yours and mine) to run on Spinneret. Testimony to the quality of your code that it ports so easily...

- Change occurrences of OBJ "W5200" to "W5100"
- Change Prop pin numbers used to the Spinneret pin assignment (i.e. replace wiz.QS_Init with wiz.Start)
- Change socket numbers used to fit 0..3 instead of 0..7
- Change MAC address to the Spinneret MAC address

Oddly I had some issues yesterday that I'm not seeing this early AM. May be exposed by network loading & delays, I'll test more as the day goes on. For now I'll just mention them, let you know if I can confirm as I continue to test.

1. The only 'for sure' issue at this point is that the code that tests the SPI connection with the W5200 (in, for example, TcpSocketClientDemoDhcpDns.spin aka TSCDDD) doesn't work with the W5100 (which doesn't have a version register). Not a surprise or big deal - I think you can just change to a different register that both chips init to the same known value, such as reg $17=$7 (or $18=$D0, $19=$8, etc.). Alternatively, a routine that would a) also test write cycles and b) figure out which chip is attached (by testing wiz txbuffer RAM) might look like...
NOTE: RAM_TEST means read-complement-write-read-compare (and then optionally complement-write again to leave wiz txbuffer RAM unchanged)
   1. RAM_TEST wiz address $4000, if pass chip = W5100
   2. RAM_TEST wiz address $8000, if pass chip = W5200
   3. If both fail, SPI Error
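
For anyone who wants to play with the RAM_TEST idea before touching hardware, here is a minimal sketch in Python against a simulated chip. It is not Spin and not part of W5100.spin or W5200.spin: `FakeWiz`, `ram_test` and `detect_chip` are made-up names, and the simulation assumes (untested on real silicon) that addresses outside the txbuffer RAM ignore writes and read back 0.

```python
class FakeWiz:
    """Simulated chip (hypothetical): only addresses inside `ram` keep writes."""

    def __init__(self, ram_start, ram_size):
        self.ram = range(ram_start, ram_start + ram_size)
        self.mem = {}

    def read(self, addr):
        return self.mem.get(addr, 0x00)

    def write(self, addr, value):
        # Assumption: writes to unmapped addresses are silently dropped.
        if addr in self.ram:
            self.mem[addr] = value & 0xFF


def ram_test(wiz, addr, restore=True):
    """read-complement-write-read-compare, then optionally write the original back."""
    original = wiz.read(addr)
    flipped = original ^ 0xFF
    wiz.write(addr, flipped)
    ok = wiz.read(addr) == flipped
    if ok and restore:
        wiz.write(addr, original)  # leave txbuffer RAM unchanged
    return ok


def detect_chip(wiz):
    if ram_test(wiz, 0x4000):      # W5100 txbuffer RAM starts at $4000
        return "W5100"
    if ram_test(wiz, 0x8000):      # W5200 txbuffer RAM starts at $8000
        return "W5200"
    return "SPI Error"             # neither address held a write
```

In a real driver the `read` and `write` calls would be whatever single-byte SPI read/write routines the wiz object already exposes.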

NOTE: I thought I saw the following issues #2 and #3 yesterday, but right now (early AM) I can't reproduce them. I might be going crazy, or it might be network loading/timing dependent (i.e. both my LAN and the overall internet were much busier yesterday afternoon than now at 5AM). So for now I'll just mention them in case it brings anything to mind, and will confirm whether I do/don't see the issue(s) again while I test today.

2. Premature EOF
It seemed the following code triggered premature EOF i.e. partial download of webpage...

    if(bytesToRead == 0)
      receiving := false
      'pst.str(string(CR, "Done Receiving Response", CR))
      next

...which raised a possible question that's always been lurking - how do you know if the entire webpage has been received? Depending on timing (program and net) it seems possible that the wiz rxbuffer could be empty (i.e. RXRSR==0) while a following packet is incoming. I'd asked a Wiznet person before and he suggested checking RXRSR twice with a delay between. So, when I was experiencing the problem yesterday this seemed to fix it...

    if(bytesToRead == 0)
      if(sock.Available == 0)
        receiving := false
        'pst.str(string(CR, "Done Receiving Response", CR))
        next

...but again it's working right now without the extra check, so just FYI until further notice. Anyway, even if it turns out to be an issue it would arguably be for the application to deal with (i.e. whether it's a 'true' or 'premature' EOF could be decided by comparing totalBytes (received) to expected content length from HTTP header).
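
To make the 'compare against the header' idea concrete, here is a rough sketch in Python (not Spin) that works on the raw bytes received so far. `classify_eof` is a made-up helper, not anything in the driver, and it assumes the server actually sends a Content-Length header (a chunked response would come back as "unknown").

```python
def classify_eof(response: bytes) -> str:
    """Decide whether the bytes received so far form a complete HTTP response."""
    head, sep, body = response.partition(b"\r\n\r\n")
    if not sep:
        return "incomplete-header"   # blank line ending the headers not seen yet

    expected = None
    for line in head.split(b"\r\n")[1:]:          # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length" and value.strip().isdigit():
            expected = int(value.strip())

    if expected is None:
        return "unknown"             # no usable Content-Length header
    return "complete" if len(body) >= expected else "premature"
```

The app would call something like this when bytesToRead hits 0 and only stop receiving on "complete" (or after its own timeout).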

EDIT: It did just fail without the 'check EOF twice' fix (webpage is 7066 bytes) - I'll test more with/without the fix...

Sending HTTP Request
Bytes Sent........99
Bytes Received....6144
************ Count Error **********

Here is another a couple of minutes later...

Sending HTTP Request
Bytes Sent........99
Bytes Received....2048
************ Count Error **********

3. I went ahead and included W5200 DNS though it wasn't in your Spinnerette distribution, hope I'm not jumping the gun. Anyway, it worked pretty well but (yesterday) it would intermittently 'fail' returning resolved ip 0.0.0.0. I say 'fail' in quotes because I'm not sure if this is really a problem or something to be dealt with by the app in normal operation (i.e. if resolved ip == 0.0.0.0 just retry DNS). The notable point is that this was NOT the '$801' issue i.e. I am using '$801' and when DNS 'failed' the site variable (i.e. "finance.google.com") had NOT been corrupted. Maybe you can comment on whether resolved ip==0.0.0.0 is really a 'failure' or just something that might happen from time to time and should be handled by the app (i.e. retry DNS). And in any case, DNS is working fine right at this moment so just FYI unless I see it again.

I'll continue testing, update the status later today.

twc · 2012-12-10 06:35

...as for issue #3 (DNS), no sooner had I hit submit on the previous message than it did 'fail' after all...

DNS Init (bool)...-1
DNS Lookup........finance.google.com
Resolved IP(0)....0.0.0.0
66 f
69 i
6E n
61 a
6E n
63 c
65 e
2E .
67 g
6F o
6F o
67 g
6C l
65 e
2E .
63 c
6F o
6D m

...and you can see the site variable is OK (I'm using the $801 fix for now). As in previous post discussion, not sure resolved ip==0.0.0.0 is really a 'failure' per se. If it is just something the app should expect from time to time and be prepared to handle we can put the issue aside. I want to see if frequency of this (and 'Premature EOF' *** Count Error *** ) increases as net traffic picks up during the day. I'm going to embellish the test code a bit and run more tests, let you know.

twc · 2012-12-10 17:09

Hammered on it all day and think I've got a pretty clear picture about 1) the 'Premature EOF' and 2) the DNS Null IP.

1. Premature EOF
I'd seen this a lot yesterday afternoon, but could hardly see it early this morning. Guessed it might be net traffic (and thus timing) related, so I wasn't surprised to see the frequency increase as the net woke up for the day...

Total passes: 176
Total DNS returned null ip: 1
Total TCP timeouts: 6
Total byte receive count errors: 62

Note: What I'm calling 'TCP Timeouts' are this code...

    'Check for a timeout
    if(bytesToRead < 0)
      receiving := false
      t3++
      pst.str(string("Timeout", CR))
  '    return

...though I'm not actually sure how deep in the stack (wiz chip, W5200.spin, socket.spin) they originate. They occur after the GET is issued, presumably if the server doesn't respond quickly enough. Anyway, I've seen them before on W5200, just retry the GET, no biggie.
Main point is the 'byte count errors' i.e. premature eof here...

    if(bytesToRead == 0)
        receiving := false
        'pst.str(string(CR, "Done Receiving Response", CR))
        next

So as the net got busy the frequency of premature eof shot up to 56 (i.e. 62 byte count errors - 6 TCP timeouts) out of 176 iterations.
Next, as mentioned in previous post I applied the 'check EOF twice' fix...

    if(bytesToRead == 0)
      if(sock.Available == 0)
        receiving := false
        'pst.str(string(CR, "Done Receiving Response", CR))
        next

That made a huge improvement...

Total passes: 1861
Total DNS returned null ip: 8
Total TCP timeouts: 23
Total byte receive count errors: 27

...just 4 (27-23) premature eof in 1861 iterations, but still not perfect. Just to confirm the timing nature of the issue, now I'm running...

    if(bytesToRead == 0)
      pause(500)
      if(sock.Available == 0)
        receiving := false
        'pst.str(string(CR, "Done Receiving Response", CR))
        next

...and the results so far...

Total passes: 1111
Total DNS returned null ip: 4
Total TCP timeouts: 10
Total byte receive count errors: 10

...0 (i.e. 10-10) premature eof in 1111 iterations.

All that being said, I don't know if your software can or even should try to do anything with this kind of configuration/traffic/application dependent delay. Seems to me only the application can know whether bytesToRead==0 really means eof so I guess this is just a heads-up for developers.
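
As a heads-up-style illustration of the application deciding for itself, here is a small Python simulation (not Spin) of a receive loop that treats an empty rxbuffer as 'maybe a gap' rather than EOF. Everything in it is hypothetical: `GappySocket` fakes a socket whose data arrives with gaps between packets, and `receive_body` assumes the app already knows how many bytes to expect.

```python
class GappySocket:
    """Fake socket: each poll delivers the next entry of `arrivals` (0 = gap)."""

    def __init__(self, arrivals):
        self.arrivals = list(arrivals)
        self.buffered = 0

    def available(self):
        if self.arrivals:
            self.buffered += self.arrivals.pop(0)
        return self.buffered

    def read(self, n):
        n = min(n, self.buffered)
        self.buffered -= n
        return bytes(n)


def receive_body(available, read, expected_len, max_idle_polls=50, wait=lambda: None):
    """Keep reading until expected_len bytes arrive or too many empty polls in a row."""
    total = 0
    idle = 0
    while total < expected_len and idle < max_idle_polls:
        n = available()
        if n == 0:
            idle += 1      # empty buffer: could be a gap between packets
            wait()         # a real app would pause here
            continue
        idle = 0
        total += len(read(n))
    return total, total >= expected_len
```

With arrivals of 2048, 0, 0, 4096, 0, 922 bytes and expected_len 7066, giving up on the first empty poll (max_idle_polls=1) stops at 2048 bytes, while tolerating a few empty polls collects all 7066.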

2. DNS Null IP
You can see above the null ip issue (resolved ip==0.0.0.0) is very rare and doesn't seem directly related to time of day/traffic. Whatever the cause, the good news is it recovers, just retry...

DNS Init (bool)...-1
DNS Lookup........finance.google.com
Resolved IP(0)....0.0.0.0
66 f
69 i
6E n
61 a
6E n
63 c
65 e
2E .
67 g
6F o
6F o
67 g
6C l
65 e
2E .
63 c
6F o
6D m
DNS Init (bool)...-1
DNS Lookup........finance.google.com
Resolved IP(0)....74.125.224.131
Resolved IPs......11

...so again, the app can deal with this (as you do in the Dhcp demos), something like...
    REPEAT
      Do DNS
      Check for app-specific timeout
    UNTIL resolved_ip<>0.0.0.0
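
The same loop, spelled out as a runnable Python sketch (not Spin). `resolve_with_retry` and the `lookup` callback are invented for illustration, and a simple attempt cap stands in for the app-specific timeout.

```python
NULL_IP = (0, 0, 0, 0)


def resolve_with_retry(lookup, host, max_attempts=5):
    """Call lookup(host) until it returns something other than 0.0.0.0.

    Returns (ip, attempts_used); ip is None if every attempt came back null.
    """
    for attempt in range(1, max_attempts + 1):
        ip = lookup(host)
        if ip != NULL_IP:
            return ip, attempt
    return None, max_attempts
```

Here `lookup` is whatever does the actual DNS query and hands back the resolved address as four octets.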

I thought these issues (EOF and null ip) might be W5100 vs. W5200 differences because I didn't notice them using the W5200. But I wasn't really looking either. I suspect another W5200 app, with different timing, etc., might see them too.

Anyway, your code came out yesterday and now I've got my W5200 app(s) running solid on Spinneret, nice!

Mike G · 2012-12-11 09:27

I have an idea how to handle these errors. I'll post a solution once I can verify.

twc · 2012-12-11 11:24

Just a status update...

As for the 'DNS null ip' (very rare), though a curiosity for sure, at this point I have no problem trapping 0.0.0.0 and retrying in the application, works fine.

As for the 'Premature EOF', I'm currently using...

    if(bytesToRead == 0)
      receiving:=false
      if(sock.Available <> 0)
        pst.str(string("******** Avoiding Premature EOF ********",CR ))
        t6++
        receiving:=true
      next

...and it's solid for 1000s of cycles (i.e. no (unavoided) Premature EOFs)...

Sending HTTP Request
Bytes Sent........99
******** Avoiding Premature EOF ********
Bytes Received....7109
Disconnect
Total passes: 1385
Total TCP timeouts: 25
Total avoided premature EOFs: 496
Total premature EOFs: 0

...but you can see a lot of 'avoiding' going on (which definitely increases during the day).

Was wondering why I hadn't noticed these issues before on W5200. Don't know if the DNS issue ever happens with W5200 (could look for it), but I might not have noticed because it's so rare.

As for the Premature EOF I think I know what the problem is - your SPI driver is too fast! I think it's draining the W5200 socket buffer down and exposing gaps between packets. Might not have been noticed on the W5200 demos because when you display the incoming data (slow) that would keep the W5200 socket buffer filled. On the other hand, with the pst.str(buffer) commented out, the demo does no other checking for a complete download so premature eof could have just gone by unnoticed. Meanwhile, my own W5200 programs do a lot of byte-by-byte processing (again slow) of the incoming data, keeping the W5200 buffer filled so they would never encounter premature eof. But in this stress test (and your demos if the file display is disabled) the Prop is in a pretty tight loop just sucking the data, which exposes the gap between packets. Anyway, that's my guess. Seems like the application should/would have to deal with it (and many/most apps will be 'slow enough' to never see it), but probably you've got a better idea.

Bottom line is it's working solid for me on W5200 and Spinneret.
