Maddeningly bizarre Spinneret issue
Phil Pilgrim (PhiPi)
Posts: 23,514
I've come across a Spinneret programming issue that's driving me nuts. I've got a wrapper on Beau's "SimpleHTML" wrapper for the BrilIdea driver (v006-p1, as modified by Kuisma). Basically, for each page or image, I call a begin method, some content methods, and an end method. These methods depend on what kind of content (e.g. html, plain text, custom) is being delivered. Only the html end method is special. The others just call a universal end_all method. Here are the end methods (still in the top-level cog: no Spin cogs were started):
PUB end_html_page

  pop_all
  newline
  last_indent~
  str(string(" </body>", 10, "</html>", 10))
  end_all

PUB end_plain_page

  end_all

PUB end_custom_content

  end_all

PUB end_all

  flush
  wrapper.NoPersistanceAllowed(sock)

flush just sends any data remaining in the local character buffer before telling Beau's wrapper to wrap it up.
Now, here's the weird part. If I'm sending plain text, for example, and just do a short-circuit call to end_all to wrap it up, everything works fine. But end_all was planned to be a PRIvate method. So if I call end_plain_page instead, things blow up. I can get the first page (sometimes), but the Spinneret keeps sending stuff after that. By using an Ethernet sniffer, I see that the additional data comes in batches of 80 bytes (a pair of 40-byte packets) at about one-second intervals, coinciding with the red and blue LED flashes on the Spinneret. If I try to refresh the page in my browser, sometimes I get the right stuff; sometimes I get streams of old HTML data repeating themselves in a plain text page.
My inclination is to believe that somehow the top-level stack is intruding on a buffer somewhere. And I do notice that the BrilIdea driver, rather than allocating variables for its transmit and receive buffers, merely states where they begin in RAM ($4000 and $6000). I can't think of a reason to do it this way, but I understand almost zilch about the W5100 or the BrilIdea driver, so maybe there's a good reason. Anyway, I'm using socket 0, which should use the buffers from $4000-$47FF and $6000-$67FF. I tried changing the base address of the upper buffer to $5000, but to no avail.
I'm stumped. Maybe the forum can see something obvious -- or non-obvious -- that I can't.
Thanks,
-Phil
Comments
Wrong address space. Those addresses refer to the W5100's on-chip address space, not Propeller hub RAM.
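For reference, the buffer ranges quoted above can be reproduced with a little arithmetic. This is only a sketch in Python (not Spin) so it can be run on a PC; it assumes the W5100 memory map with TX memory starting at $4000, RX memory starting at $6000, and the default split of 2 KB per socket. The function name is made up for illustration.

```python
# Assumption: W5100 internal memory map with the default 2 KB per socket.
TX_BASE = 0x4000          # start of W5100 TX memory (on-chip, not hub RAM)
RX_BASE = 0x6000          # start of W5100 RX memory (on-chip, not hub RAM)
SOCKET_BUF_SIZE = 0x0800  # 2 KB per socket (assumed default configuration)


def socket_buffer_range(base, sock, size=SOCKET_BUF_SIZE):
    """Return (first, last) W5100 address of one socket's buffer."""
    start = base + sock * size
    return start, start + size - 1


if __name__ == "__main__":
    for sock in range(4):
        tx = socket_buffer_range(TX_BASE, sock)
        rx = socket_buffer_range(RX_BASE, sock)
        print(f"socket {sock}: TX ${tx[0]:04X}-${tx[1]:04X}, "
              f"RX ${rx[0]:04X}-${rx[1]:04X}")
```

With these assumptions, socket 0 comes out at $4000-$47FF and $6000-$67FF, matching the ranges in the original post. None of these addresses overlap hub RAM, because they are only reachable through the W5100 interface.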
So much for the stack theory then, although the only difference between working and not working is one more nested method call...
-Phil
You are using quite an old version of the driver. I do not believe this is the reason, but try using the latest version.
Thanks,
-Phil
Or download it directly at http://code.google.com/p/spinneret-web-server/source/browse/#svn%2Ftrunk
-Phil
-Phil
- The first request I'm receiving when I send a request from the browser is an empty string. If I don't answer it with something (anything?), I don't get a further request for a webpage, and the browser times out.
- If I answer that first request with an HTTP response, followed by a call to end_all (see above), everything behaves normally from there on. (This explains an earlier head-scratcher, when my first response after reset showed me as the second visitor.)
- If I answer that first request with an HTTP response, followed by a call to end_plain_page (which just calls end_all), I get a barrage of empty requests from that point on, which also get responded to (but the response text never shows up in the minimal response packets) and the browser times out.
- This is not a stack issue, it turns out. I ran my main program in a separate Spin cog using a stack I can monitor, and it only used 48 longs.
So here are my questions:
- What is the first empty request for?
- What's the appropriate response?
- Why would calling NoPersistanceAllowed from a different stack level cause different behavior?
This just gets curiouser and curiouser.
-Phil
This is actually an impossibility. TCP has no mechanism to send an empty string, since it is purely stream oriented. A TCP packet with payload length zero is of course possible, and is often seen (handshaking etc.), and it may cause the driver (i.e. rxTCP) to return so you can check socket status etc., but this is not to be interpreted as a transmission of the empty string.
In TCP, you iterate calls to rxTCP collecting data until you got everything you need. Every call to rxTCP may return zero, one or more bytes of your stream.
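The iteration described here can be sketched as follows. This is Python rather than Spin so it can be run on a PC, and `read_chunk` is a hypothetical stand-in for a driver call such as rxTCP (the real driver's signature and return values are not shown in this thread). The point is only that a read returning zero bytes means "nothing new yet", not "an empty request".

```python
def collect(read_chunk, is_complete, max_polls=1000):
    """Accumulate stream bytes until is_complete(buffer) is true.

    read_chunk() is a hypothetical stand-in for a driver read: it returns
    b"" when nothing new has arrived, or the next bytes of the stream.
    Returns the collected bytes, or None if max_polls reads were not enough.
    """
    buffer = bytearray()
    for _ in range(max_polls):
        chunk = read_chunk()
        if not chunk:
            continue              # zero bytes: keep waiting, do not respond
        buffer.extend(chunk)
        if is_complete(buffer):
            return bytes(buffer)
    return None


if __name__ == "__main__":
    # Simulated stream: the request arrives in pieces, with empty reads between.
    pieces = iter([b"", b"GET /he", b"", b"llo HTTP/1.0\r\n", b"\r\n"])
    request = collect(lambda: next(pieces, b""),
                      lambda buf: buf.endswith(b"\r\n\r\n"))
    print(request)
```

The caller decides what "everything you need" means by passing `is_complete`; the loop itself never interprets an empty read as data.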
(That said, TCP keep-alives actually send something very close to empty strings, but I hope this is handled by the W5100; I have never seen them in the driver or at application level.)
This is strange. You should just iterate rxTCP and get the request you are expecting. This sounds more and more like you have a race condition in your code.
Not using any pointers? I still vote for an accidental overwrite, and a race condition as second most plausible alternative.
This is called from my own wrapper:
It's the data array that often returns as an empty string, even though Beau's code is checking for a non-zero packet size. And I'm not sure how to respond to such a request. I only know that if I don't respond, things seize up; so I just send the page I would've sent to answer a valid request.
I was hoping I didn't have to delve too deeply into the bowels of the TCP/IP protocol to get this working, but it looks like I may have to.
-Phil
-Phil
That's what you got your application level protocol for.
-Phil
There is another thread with more about this, but briefly, use one of the following mechanisms:
a) Parse the content-length http header field.
b) Use HTTP/1.0 or HTTP/1.1 without HTTP keep-alive, and collect everything until the peer closes the session.
Wait a minute - the above applies if you act as the client, but you are acting as the server? Then it's simple. The HTTP request header is terminated by an empty line by itself. Some HTTP requests may have trailing data, but you know that from which request you get.
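On the server side, the two rules above (empty line ends the header, Content-Length gives the size of any trailing data) can be sketched like this. It is Python rather than Spin, `split_request` is a made-up name, and the sketch assumes a well-formed Content-Length value; it is an illustration of the idea, not a full HTTP parser.

```python
def split_request(buffer):
    """Return (head, body) once a whole request is buffered, else None.

    head is the request line plus header lines (without the blank line);
    body is the trailing data announced by Content-Length, if any.
    """
    end = buffer.find(b"\r\n\r\n")
    if end < 0:
        return None                       # blank line not seen yet
    head = buffer[:end].decode("ascii", errors="replace")

    content_length = 0
    for line in head.split("\r\n")[1:]:   # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())  # assumes a valid number

    body_start = end + 4
    body = buffer[body_start:body_start + content_length]
    if len(body) < content_length:
        return None                       # trailing data still arriving
    return head, bytes(body)


if __name__ == "__main__":
    print(split_request(b"GET / HTTP/1.0\r\nHost: x\r\n"))          # incomplete
    print(split_request(b"GET / HTTP/1.0\r\nHost: x\r\n\r\n"))      # complete
    print(split_request(b"POST /f HTTP/1.0\r\nContent-Length: 5\r\n\r\nhello"))
```

A function like this would be called each time more bytes are appended to the receive buffer; as long as it returns None, the server keeps reading and sends nothing.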
I have a suspicion that's not going to help, though, because if I wait for a complete HTTP message without responding to the null payloads, it will freeze up again like it's doing now. Am I supposed to be sending some kind of ACK for each packet? Or is the W5100 supposed to do that for me?
-Phil
a) HTMLReady() does not even return the data length, so in TCP terms you cannot actually claim that you got anything "empty" back. But OK, in HTTP I guess NULL is not a valid character, so you may implicitly say that if data[0] == 0, you got "empty" back.
b) HTMLReady() may very well return before it has collected any of the TCP stream whatsoever. 1000 calls to rxTCP may very well pass during the time the client manufactures its request, especially as it is untimed.
c) Is your data structure at least BUFFER_SIZE bytes in size and aligned to longs? It is not so good of HTMLReady() to assume a byte array is long aligned without explicit documentation. If you really need to initialize it, use bytefill instead of longfill. Myself, I would only terminate the first byte with byte[DataAddress][0] := 0, since you are already depending on NULL termination.
d) You do not know in your wrapper why HTMLReady returned, since you do not know whether it timed out, got any payload, the socket disconnected, or anything else. You are blind here.
Do not think "packets" when dealing with TCP, it's a stream, and thinking packets will confuse you.
You do not need to think about ACKs, but you do have to keep track of why, what and how much rxTCP returns.
b) Beau's routine checks for a non-zero packet size before returning, but apparently that does not imply there will be anything in data.
c) Yes, I caught the long alignment thing, and my data array is defined as a long.
d) That's what concerns me. I may need to do my own calls to rxTCP, rather than using Beau's wrapper -- except that, initially, it seemed to work so well.
My objective is to have a request method that transparently returns GET requests without having to mess with TCP stuff in my top-level program. That's the only way the Spinneret will really catch fire with the customers ... I think.
-Phil
I re-checked the code, and realized I was wrong about your own wrapper being blind.
There are two errors in the wrapper causing all your problems:
a) The variable i is uninitialized.
b) The order of the checks i>1000 and packetLength is wrong; packetLength must be checked before the timeout.
If you initialize the variable i to zero, it will work in 999 of 1000 cases. This also explains why the stack frame seemed to be involved, since it typically changes the values of uninitialized variables.
-Phil
Yes indeed. I need more coffee. So the only condition under which your wrapper may return is when HTMLReady gets a non-zero return from rxTCP, right? If so, there may be an error in the driver.
-Phil
In TCP, NULL is a perfectly legal character, and I do not know how to interpret a NULL in HTTP -- Do you?
Rewrite the wrapper to return payload length and continue there.
Does this imply that I need to be dealing with non-HTTP messages somehow? IOW, if it includes nulls, it can't be HTTP, right? I'm not sure how to respond to those.
-Phil
Uhum? NULL is an ASCII character. A quick check in the RFC, nothing about any NULLs.
Nah, just rewrite the wrapper to handle data+length instead of depending on null-terminated strings, mostly to get more control and actually see what it is you are receiving. If you get "<NULL>GET /hello.world HTTP/1.0", there might be an error in the client, the W5100, or the driver.
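The difference between the two views of the same received bytes can be shown with a small sketch. It is Python rather than Spin, the helper names are invented for illustration, and the sample payload is the hypothetical one from the post above, not data captured from a Spinneret.

```python
def as_c_string(data):
    """What a null-terminated reader sees: the bytes before the first NUL."""
    return data.split(b"\x00", 1)[0]


def describe(data):
    """Return (length, printable dump) so that stray NULs become visible."""
    dump = "".join(chr(b) if 32 <= b < 127 else f"<{b:02X}>" for b in data)
    return len(data), dump


if __name__ == "__main__":
    # Hypothetical payload with a leading NUL, as in the example above.
    received = b"\x00GET /hello.world HTTP/1.0"
    print(as_c_string(received))   # looks like an "empty request"
    print(describe(received))      # length and dump show the real content
```

Read as a null-terminated string, this payload looks empty even though a full request line follows the NUL. Carrying the length alongside the data is what makes that case distinguishable from a read that truly returned nothing.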
Thanks!
-Phil