Maddeningly bizarre Spinneret issue

Phil Pilgrim (PhiPi) · 2011-02-21 22:05

I've come across a Spinneret programming issue that's driving me nuts. I've got a wrapper on Beau's "SimpleHTML" wrapper for the Brilldea driver (v006-p1, as modified by Kuisma). Basically, for each page or image, I call a begin method, some content methods, and an end method. These methods depend on what kind of content (e.g. html, plain text, custom) is being delivered. Only the html end method is special. The others just call a universal end_all method. Here are the end methods (still in the top-level cog: no Spin cogs were started):

PUB end_html_page

  pop_all
  newline
  last_indent~
  str(string("  </body>", 10, "</html>", 10))
  end_all

PUB end_plain_page

  end_all

PUB end_custom_content

  end_all

PUB end_all

  flush
  wrapper.NoPersistanceAllowed(sock)

flush just sends any data remaining in the local character buffer before telling Beau's wrapper to wrap it up.

Now, here's the weird part. If I'm sending plain text, for example, and just do a short-circuit call to end_all to wrap it up, everything works fine. But end_all was planned to be a PRIvate method. So if I call end_plain_page instead, things blow up. I can get the first page (sometimes), but the Spinneret keeps sending stuff after that. By using an Ethernet sniffer, I see that the additional data comes in batches of 80 bytes (a pair of 40-byte packets) at about one-second intervals, coinciding with the red and blue LED flashes on the Spinneret. If I try to refresh the page in my browser, sometimes I get the right stuff; sometimes I get streams of old HTML data repeating themselves in a plain text page.

My inclination is to believe that somehow the top-level stack is intruding on a buffer somewhere. And I do notice that the Brilldea driver, rather than allocating variables for its transmit and receive buffers, merely states where they begin in RAM ($4000 and $6000). I can't think of a reason to do it this way, but I understand almost zilch about the W5100 or the Brilldea driver, so maybe there's a good reason. Anyway, I'm using socket 0, which should use the buffers from $4000-$47FF and $6000-$67FF. I tried changing the base address of the upper buffer to $5000, but to no avail.

I'm stumped. Maybe the forum can see something obvious -- or non-obvious -- that I can't.

Thanks,
-Phil

kuisma · 2011-02-22 00:29

Phil Pilgrim (PhiPi) wrote: »

And I do notice that the Brilldea driver, rather than allocating variables for its transmit and receive buffers, merely states where they begin in RAM ($4000 and $6000). I can't think of a reason to do it this way, but I understand almost zilch about the W5100 or the Brilldea driver, so maybe there's a good reason. Anyway, I'm using socket 0, which should use the buffers from $4000-$47FF and $6000-$67FF. I tried changing the base address of the upper buffer to $5000, but to no avail.

Wrong address space.

This address refers to the W5100 on-chip address space, not Propeller hub RAM.
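For reference, here is a small sketch of how those socket 0 addresses fall out of the W5100's own memory map. It assumes the chip's default allocation of 2 KB per socket in each direction; the constant and method names are made up for illustration and are not taken from the driver.

```spin
CON
  ' W5100 internal memory map (chip addresses, NOT Propeller hub RAM)
  W5100_TX_BASE = $4000        ' start of the chip's 8 KB TX memory
  W5100_RX_BASE = $6000        ' start of the chip's 8 KB RX memory
  SOCK_BUF_SIZE = $0800        ' assumed: default 2 KB per socket

PUB tx_base(socket)
  ' Socket 0 -> $4000, socket 1 -> $4800, and so on
  return W5100_TX_BASE + socket * SOCK_BUF_SIZE

PUB rx_base(socket)
  ' Socket 0 -> $6000, socket 1 -> $6800, and so on
  return W5100_RX_BASE + socket * SOCK_BUF_SIZE
```

Nothing the Spin stack does in hub RAM can reach these locations directly; they are only touched through the driver's read and write calls to the chip.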

Phil Pilgrim (PhiPi) · 2011-02-22 00:58

Ah, so. Thanks.

So much for the stack theory then, although the only difference between working and not working is one more nested method call...

-Phil

kuisma · 2011-02-22 01:15

I still believe in your stack theory; you've just looked in the wrong place, I guess. Something must be overwriting something else, but whether it is in your code or the driver..?

You are using quite an old version of the driver. I do not believe this is the reason, but try using the latest version.

Phil Pilgrim (PhiPi) · 2011-02-22 09:01

Oh, I thought your P1 mod was the latest. I guess I'd better wander over to the OBEX...

Thanks,
-Phil

kuisma · 2011-02-22 09:03

Not found in OBEX. Have a look at http://forums.parallax.com/showthread.php?128520-Google-Code-repository-for-open-source-Spinneret-Web-Server-firmware

Or download it directly at http://code.google.com/p/spinneret-web-server/source/browse/#svn%2Ftrunk

Phil Pilgrim (PhiPi) · 2011-02-22 09:11

'Found it, thanks! I'll give it a try after breakfast. So have you taken over dev and maintenance of this driver from Timothy?

-Phil

kuisma · 2011-02-22 09:23

It's a joint venture - he corrects my bugs, I correct his. :thumb:

Phil Pilgrim (PhiPi) · 2011-02-22 09:51

'Just tried the new driver. 'Still no joy. But at least I know I've got the very latest.

-Phil

Phil Pilgrim (PhiPi) · 2011-02-23 16:03

Okay, I feel like I've gone down the rabbit hole with this project. Here's what I'm discovering:

The first request I'm receiving when I send a request from the browser is an empty string. If I don't answer it with something (anything?), I don't get a further request for a webpage, and the browser times out.
If I answer that first request with an HTTP response, followed by a call to end_all (see above), everything behaves normally from there on. (This explains an earlier head-scratcher, when my first response after reset showed me as the second visitor.)
If I answer that first request with an HTTP response, followed by a call to end_plain_page (which just calls end_all), I get a barrage of empty requests from that point on, which also get responded to (but the response text never shows up in the minimal response packets) and the browser times out.
This is not a stack issue, it turns out. I ran my main program in a separate Spin cog using a stack I can monitor, and it only used 48 longs.
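One common way to do this kind of stack check in Spin is sketched below: pre-fill the stack with a marker value, run the code in its own cog, then count how many longs have been overwritten. This is only an illustration of the general technique, not the code actually used here; the names, the stack size, and the marker value are all assumptions.

```spin
CON
  STACK_LONGS = 128              ' assumed size, generous on purpose
  MARKER      = $A5A5_A5A5       ' arbitrary fill pattern

VAR
  long stack[STACK_LONGS]

PUB start_monitored
  ' Fill the stack with the marker, then launch the main loop in a new cog
  longfill(@stack, MARKER, STACK_LONGS)
  cognew(main_loop, @stack)

PUB stack_used | i
  ' Spin stacks grow upward from the base address, so scan down from
  ' the top for the highest long that no longer holds the marker.
  repeat i from STACK_LONGS - 1 to 0
    if stack[i] <> MARKER
      return i + 1
  return 0

PRI main_loop
  ' Placeholder for the real top-level program
  repeat
```

The count can read low if the program happens to write the marker value onto the stack itself, so it is a lower bound rather than an exact figure.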

So here are my questions:

What is the first empty request for?
What's the appropriate response?
Why would calling NoPersistanceAllowed from a different stack level cause different behavior?

This just gets curiouser and curiouser.

-Phil

kuisma · 2011-02-23 22:01

Phil Pilgrim (PhiPi) wrote: »

The first request I'm receiving when I send a request from the browser is an empty string.

This is actually an impossibility.

TCP has no mechanism to send an empty string, since it is purely stream oriented. A TCP packet with payload length zero is of course possible, and is often seen (handshaking etc.), and it may cause the driver (i.e. rxTCP) to return, so you can check socket status etc., but this is not to be interpreted as a transmission of the empty string.

In TCP, you iterate calls to rxTCP collecting data until you got everything you need. Every call to rxTCP may return zero, one or more bytes of your stream.
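A minimal sketch of that accumulate-as-you-go pattern, keeping an explicit byte count instead of relying on a terminating zero. Note that rx_chunk is a hypothetical stand-in for the driver's receive call, with a made-up signature; the buffer size and all other names are assumptions as well.

```spin
CON
  BUF_SIZE = 1024                 ' assumed size, for illustration only

VAR
  byte reqbuf[BUF_SIZE]           ' accumulated request bytes
  long reqlen                     ' how many bytes of reqbuf are valid

PUB collect(maxtries) | n
  ' Call the receive routine repeatedly, appending whatever arrives.
  ' Returns the number of bytes collected so far.
  reqlen := 0
  repeat maxtries
    n := rx_chunk(@reqbuf + reqlen, BUF_SIZE - reqlen)
    reqlen += n                   ' n may legitimately be 0: nothing new yet
    if reqlen => BUF_SIZE
      quit                        ' buffer full
  return reqlen

PRI rx_chunk(addr, room)
  ' Hypothetical stand-in for the driver's receive call.
  ' It must store at most room bytes at addr and return how many it stored.
  return 0
```

When to stop collecting is decided by the application protocol, not by TCP; here the loop simply gives up after maxtries calls or when the buffer fills.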

(That said, TCP keep-alives actually send something very close to empty strings, but I hope this is handled by the W5100; I have never seen it in either the driver or at application level.)

Phil Pilgrim (PhiPi) wrote: »

If I don't answer it with something (anything?), I don't get a further request for a webpage, and the browser times out

This is strange. You should just iterate rxTCP and get the request you are expecting. This sounds more and more like you've got a race condition in your code.

Phil Pilgrim (PhiPi) wrote: »

This is not a stack issue, it turns out. I ran my main program in a separate Spin cog using a stack I can monitor, and it only used 48 longs.

Not using any pointers? I still vote for an accidental overwrite, and a race condition as second most plausible alternative.

Phil Pilgrim (PhiPi) · 2011-02-23 22:55

I'm not calling rxTCP directly. I'm using Beau's wrapper routine:

PUB HTMLReady(DataAddress)|packetSize,i,stat,skip   '<-GBS new
    readIND(Wiz_socket,@stat)
    skip := 1
    
    if stat == $14
       skip := 0
       repeat until SocketTCPestablished(Wiz_socket)       
         readIND(Wiz_socket, @stat)
{{
          stat should return $14 and then change to $16

          in the case of an error, stat can sometimes return $00 or a $1C
          $1C happens when the browser windo is terminated by the client.
}}         
         if stat <> $14
            if stat == $00 or stat == $1C
               skip := 1
               quit


       if skip==0
          'Initialize the buffers and bring the data over
          longfill(DataAddress, 0, BUFFER_SIZE/4)
          repeat  
            packetSize := rxTCP(Wiz_socket, DataAddress)
'            i++
'            if i>1000      'option to timeout if packet not received
'               skip := 1
'               quit
          while packetSize == 0

     result := skip

This is called from my own wrapper:

PUB request

  repeat while wrapper.HTMLReady(@data)
  return @data

It's the data array that often returns as an empty string, even though Beau's code is checking for a non-zero packet size. And I'm not sure how to respond to such a request. I only know that if I don't respond, things seize up; so I just send the page I would've sent to answer a valid request.

I was hoping I didn't have to delve too deeply into the bowels of the TCP/IP protocol to get this working, but it looks like I may have to.

-Phil

Phil Pilgrim (PhiPi) · 2011-02-23 22:59

kuisma wrote:

In TCP, you iterate calls to rxTCP collecting data until you got everything you need. Every call to rxTCP may return zero, one or more bytes of your stream.

So how do you know when you've got everything? (I know that may seem like an awfully basic question, but that's the level I'm working at right now.)

-Phil

kuisma · 2011-02-23 23:01

Phil Pilgrim (PhiPi) wrote: »

So how do you know when you've got everything? (I know that may seem like an awfully basic question, but that's the level I'm working at right now.)

That's what you got your application level protocol for.

Phil Pilgrim (PhiPi) · 2011-02-23 23:12

I'm only processing GET requests, so I only really need everything up to the first CRLF in the HTTP message. What happens if the rest of the message comes later (in another packet?), and I simply ignore it?

-Phil

kuisma · 2011-02-23 23:19

Phil Pilgrim (PhiPi) wrote: »

I'm only processing GET requests, so I only really need everything up to the first CRLF in the HTTP message. What happens if the rest of the message comes later (in another packet?), and I simply ignore it?

There is another thread with more about this, but briefly, use one of the following mechanisms:

a) Parse the content-length http header field.

b) Use HTTP/1.0 or HTTP/1.1 without HTTP keep-alive, and collect everything until the peer closes the session.

Wait a minute - the above applies if you act as the client, but you are acting as the server? Then it's simple. The HTTP request header is terminated by an empty line by itself. Some HTTP methods may have trailing data, but you know that from which HTTP request you get.
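On the server side, "an empty line by itself" means the byte sequence CR LF CR LF. A sketch of a check for it over a buffer with a known length follows; the method name is made up, and it assumes the request bytes collected so far sit contiguously at addr.

```spin
PUB header_end(addr, len) | i
  ' Returns the offset just past the first CR LF CR LF within the len
  ' bytes at addr, or 0 if the blank line has not arrived yet.
  if len < 4
    return 0                      ' also keeps the loop below from counting down
  repeat i from 0 to len - 4
    if byte[addr][i] == 13 and byte[addr][i+1] == 10 and byte[addr][i+2] == 13 and byte[addr][i+3] == 10
      return i + 4
  return 0
```

A receive loop would keep collecting while this returns 0, and only parse the GET line once it returns a non-zero offset.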

Phil Pilgrim (PhiPi) · 2011-02-23 23:28

Oh, right, the blank line. D'oh!

I have a suspicion that's not going to help, though, because if I wait for a complete HTTP message without responding to the null payloads, it will freeze up again like it's doing now. Am I supposed to be sending some kind of ACK for each packet? Or is the W5100 supposed to do that for me?

-Phil

kuisma · 2011-02-23 23:34

Phil Pilgrim (PhiPi) wrote: »

I'm not calling rxTCP directly. I'm using Beau's wrapper routine:

It's the data array that often returns as an empty string, even though Beau's code is checking for a non-zero packet size. And I'm not sure how to respond to such a request. I only know that if I don't respond, things seize up; so I just send the page I would've sent to answer a valid request.

I was hoping I didn't have to delve too deeply into the bowels TCP/IP protocol to get this working, but it looks like I may have to.

-Phil

a) HTMLReady() does not even return the data length, so in TCP talk, you cannot actually claim that you got anything "empty" back. But ok, in HTTP I guess NULL is not a valid character, so you may implicitly say that if data[0] == 0, you got "empty" back.

b) HTMLReady() may very well return before it has collected any of the TCP stream whatsoever. 1000 calls to rxTCP may very well pass during the time the client manufactures its request, especially as it is untimed.

c) Is your data structure at least BUFFER_SIZE bytes in size and long-aligned? Not so good of HTMLReady() to assume a byte array is long-aligned without explicit documentation. If you really need to initialize it, use bytefill instead of longfill. Myself, I would only terminate the first byte with byte[DataAddress][0] := 0, since you are already depending on NULL termination.

d) You do not know in your wrapper why HTMLReady returned, since you do not know if it did timeout, got any payload, the socket disconnected or anything. You are blind here.

Do not think "packets" when dealing with TCP, it's a stream, and thinking packets will confuse you.

kuisma · 2011-02-23 23:42

Phil Pilgrim (PhiPi) wrote: »

I have a suspicion that's not going to help, though, because if I wait for a complete HTTP message without responding to the null payloads, it will freeze up again like it's doing now. Am I supposed to be sending some kind of ACK for each packet? Or is the W5100 supposed to do that for me?

You do not need to think about ACKs, but you do have to keep track of why, what and how much rxTCP returns.

Phil Pilgrim (PhiPi) · 2011-02-24 00:09

a) "...so you may implicitly say that if data[0] == 0, you got 'empty' back." Yes, that's what I'm doing. Since the array is always cleared to zero first, the return behaves like a zero-terminated string.

b) Beau's routine checks for a non-zero packet size before returning, but apparently that does not imply there will be anything in data.

c) Yes, I caught the long alignment thing, and my data array is defined as a long.

d) That's what concerns me. I may need to do my own calls to rxTCP, rather than using Beau's wrapper -- except that, initially, it seemed to work so well.

My objective is to have a request method that transparently returns GET requests without having to mess with TCP stuff in my top-level program. That's the only way the Spinneret will really catch fire with the customers ... I think.

-Phil

kuisma · 2011-02-24 00:12

Ha!

I re-checked the code, and realized I was wrong about your own wrapper being blind.

There are two errors in the wrapper causing all your problems:

a) The variable i is uninitialized.
b) The order of the checks i>1000 and packetSize is wrong; packetSize must be checked before the timeout.

If you initialize the variable i to zero, it will work in 999 of 1000 cases. This also explains why the stack frame seemed to be involved, since it typically changes the values of uninitialized variables.

Phil Pilgrim (PhiPi) · 2011-02-24 00:15

The timeout part is commented out in his code. So the packetSize has to be non-zero for it to return, unless it's for a "skip" condition detected earlier.

-Phil

kuisma · 2011-02-24 00:25

Phil Pilgrim (PhiPi) wrote: »

The timeout part is commented out in his code. So the packetSize has to be non-zero for it to return, unless it's for a "skip" condition detected earlier.

Yes indeed. I need more coffee. So - the only condition your wrapper may return, is when HTMLReady gets a non-zero return from rxTCP, right? If so, there may be an error in the driver.

Phil Pilgrim (PhiPi) · 2011-02-24 00:29

So - the only condition your wrapper may return, is when HTMLReady gets a non-zero return from rxTCP, right?

Correct.

-Phil

kuisma · 2011-02-24 00:31

But wait again ... if you DO get a non-zero return (from rxTCP), how can you tell you don't get any data?

In TCP, NULL is a perfectly legal character, and I do not know how to interpret a NULL in HTTP -- Do you?

Rewrite the wrapper to return payload length and continue there.

kuisma · 2011-02-24 00:37

E.g. print out the data rxTCP returns based on its return value (i.e. the length), not on the assumption that it must be non-NULL.

Phil Pilgrim (PhiPi) · 2011-02-24 00:38

In TCP, NULL is a perfectly legal character, and I do not know how to interpret a NULL in HTTP. do you?

No. HTTP is an ASCII protocol.

Rewrite the wrapper to return payload length and continue there.

Does this imply that I need to be dealing with non-HTTP messages somehow? IOW, if it includes nulls, it can't be HTTP, right? I'm not sure how to respond to those.

-Phil

kuisma · 2011-02-24 00:47

Phil Pilgrim (PhiPi) wrote: »

No. HTTP is an ASCII protocol.

Uhum? NULL is an ASCII character.

A quick check in the RFC, nothing about any NULLs.

Phil Pilgrim (PhiPi) wrote: »

Does this imply that I need to be dealing with non-HTTP messages somehow? IOW, if it includes nulls, it can't be HTTP, right? I'm not sure how to respond to those.

Nah, just rewrite the wrapper to handle data+length instead of depending on null-terminated strings, mostly to get more control and actually see what it is you are receiving. If you get "<NULL>GET /hello.world HTTP/1.0" there might be an error in the client, W5100 or driver.
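Two small helpers along those lines, as a sketch: both work from an address plus a length, so a leading zero byte no longer hides whatever follows it. The method names are made up for illustration, and make_printable changes the buffer in place, so it is best run on a copy or after parsing.

```spin
PUB count_nuls(addr, len) | i
  ' Counts the zero bytes among the first len bytes at addr.
  if len < 1
    return 0
  repeat i from 0 to len - 1
    if byte[addr][i] == 0
      result++
  ' result (which Spin starts at 0) is returned implicitly

PUB make_printable(addr, len) | i, c
  ' Replaces anything outside printable ASCII (except CR and LF) with "."
  ' in place, so the first len bytes can be shown on a terminal.
  if len < 1
    return
  repeat i from 0 to len - 1
    c := byte[addr][i]
    if (c < 32 or c > 126) and c <> 13 and c <> 10
      byte[addr][i] := "."
```

If count_nuls reports anything other than 0 for a received request, that is a hint to look at the raw bytes rather than at a string print of the buffer.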

kuisma · 2011-02-24 00:50

... or pack a self sufficient (all objects needed) zip archive with everything I need to reproduce the error, and I'll have a look myself. Right now it feels like I'm fumbling in the dark. :cool:

kuisma · 2011-02-24 01:00

Have you performed an ethernet packet trace watching the "empty" packet on the net?

Phil Pilgrim (PhiPi) · 2011-02-24 01:04

Right now it feels like I'm fumbling in the dark.

So am I.

Actually, it's 1 a.m. here, and my mind has turned to mush. I'd send you a zip right now, but I'm mid modification. I'll see what tomorrow brings and take up the thread then. I really appreciate the time and attention you're giving me on this! But right now, I need to sleep.

Thanks!
-Phil
