View Full Version : Webserver randomly becomes unresponsive



chillybasen
01-04-2011, 03:33 AM
I'm using Bean's code (http://forums.parallax.com/showthread.php?127301-FORM-METHOD-quot-GET-quot-Demo), but I've had this happen with Mike's as well (http://forums.parallax.com/showthread.php?128445-Dynamic-Spinneret-with-HttpRequest-Object-and-EEPROM-Configuration-Page)

Everything works perfectly fine, sometimes for days. And then I'll just be unable to connect. I tried telnetting to port 80 and it won't even connect. It just times out.

If I reset it, everything is fine again.

Anyone else experiencing this? Any ideas on how to fix?

zapmaster
01-04-2011, 03:38 AM
Yes, this is happening to me.
Nope, I have not solved it yet.
Also, when my page is viewed on my network it is fine, but when viewed from the web it is cut short.

Beau Schwabe
01-04-2011, 04:29 AM
I have seen this also... it appears that this is hanging somewhere in software. The reason I say this is that when the web browser claims it cannot load the page and I hit 'Retry', there is consistent activity on the Spinneret lights every time. So the request seems to make it to the Spinneret, but the Spinneret is hung waiting for something else.

One solution might be to introduce a timeout if the data packet is not cleared to zero length in a reasonable amount of time.

Beau Schwabe
01-04-2011, 04:59 AM
Partially SOLVED!!

I was testing a program that was borderline, and adding a few extra spaces caused the Spinneret to hang.

It has to do with the number of BYTES you are sending at once to the W5100 .... It breaks if you try to send more than 2048 bytes at once. Looking at the code at the very top...

bytebuffersize = 2048

.... So apparently That's my sign, smack it right on my forehead :-) Duh!


EDIT Breaking down the html code length seems to have solved my problems with the Spinneret becoming unresponsive.
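
As a rough illustration of the "break the HTML into smaller pieces" workaround described above, here is a minimal Python sketch. It is not the Spin driver code; the function name and the sample page are made up for illustration, and only the 2048-byte limit comes from the post.

```python
def split_into_chunks(payload: bytes, limit: int = 2048) -> list[bytes]:
    """Split payload into consecutive pieces of at most `limit` bytes."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]


# Hypothetical page body, larger than one 2048-byte send.
page = b"<html>" + b"x" * 5000 + b"</html>"
chunks = split_into_chunks(page)
print([len(c) for c in chunks])
```

Each chunk would then be handed to the send routine separately, so no single send exceeds the limit.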

Daniel Harris
01-04-2011, 06:38 AM
This is a good description of the exact problem I seem to be having (as described in your thread, Beau). Beau, when you say "partially solved", what is still broken? Does it still hang for you? For me, I could get my Spinneret to hang if I spammed a bunch of page requests in my browser (I.E. holding F5). Maybe a receive buffer in the Propeller or transmit buffer in the W5100 is getting full as well?

I'll mess around with it a bit tomorrow to see if I can characterize my Spinneret's behavior a little better..

Beau Schwabe
01-04-2011, 06:45 AM
Daniel Harris,

I say partially, because I literally just stumbled on what appears to be the problem... in other words it should have further confirmation.

Sending the string size to the serial terminal just before sending it to the W5100 is where I had an 'Ah Ha' moment.

To answer your question... it doesn't hang right now, but under previous conditions I can cause it to hang.

I'll let it run until tomorrow, but at around 6pm I need to take it off-line ... (Meeting with Robotics group) ... and I actually don't have a Spinneret of my own. The one I'm using is being borrowed from the Robotics club, go figure :-)

jstjohnz
01-04-2011, 06:49 AM
Re buffer size issues. The W5100 defaults to a 2k receive and a 2k transmit buffer for each of the 4 sockets, but these buffer sizes can be altered, keeping in mind the total buffer size is 16k. If you only need 1 socket you could (I believe) have an 8k transmit buffer and an 8k receive buffer on that socket.
Also, there is no check in the driver for the situation where you are trying to send a string longer than the buffer size. If you do, the driver will hang forever.
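
To make the buffer arithmetic concrete, here is a small Python sketch of how per-socket sizes pack into the W5100's memory size registers (RMSR/TMSR in the datasheet, two bits per socket, socket 0 in the low bits). The function names are hypothetical, and the length guard at the end only illustrates the kind of check the post says the driver lacks.

```python
# Size in KB -> 2-bit field value, per the W5100 datasheet's RMSR/TMSR description.
SIZE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}
TOTAL_KB = 8  # 8 KB of TX memory and 8 KB of RX memory, shared by the four sockets


def memory_size_register(sizes_kb: list[int]) -> int:
    """Pack per-socket buffer sizes (socket 0 first) into one register byte."""
    if len(sizes_kb) > 4:
        raise ValueError("the W5100 has only four sockets")
    if sum(sizes_kb) > TOTAL_KB:
        raise ValueError("sizes exceed the 8 KB available in one direction")
    value = 0
    for socket, size in enumerate(sizes_kb):
        if size not in SIZE_BITS:
            raise ValueError(f"unsupported size: {size} KB")
        value |= SIZE_BITS[size] << (2 * socket)
    return value


def fits_in_buffer(length: int, buffer_kb: int) -> bool:
    """Guard a caller could apply before sending through a driver that does not check."""
    return length <= buffer_kb * 1024


print(hex(memory_size_register([2, 2, 2, 2])))  # the default 2 KB per socket
print(hex(memory_size_register([8])))           # all 8 KB given to socket 0
```

With all 8 KB assigned to socket 0, the remaining sockets have no memory left and cannot be used.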

kuisma
01-04-2011, 07:49 AM
I patched this bug weeks ago and posted it here. It's not really a matter of the size of the W5100 buffer, but the concept of the TCP implementation.

Again, you find my patched version at http://whiteboard.ping.se/Propeller/Network.

Beau Schwabe
01-04-2011, 07:59 AM
kuisma,

Thanks! I've implemented your patch for the Indirect driver on the Web server... If all goes well it will still be up and running by 6pm tomorrow. ...Leaving it alone til then.

Mike G
01-04-2011, 03:43 PM
Thanks, kuisma. Updated mine as well.
http://spinneret.servebeer.com:5000

Check out the xml stylesheet transform (xslt). Works in IE not Firefox and I'm not sure about other browsers. I think I know what's up though. Will check when I get home from work.
http://spinneret.servebeer.com:5000/xslt/sensor.xml

Also the configuration link pulls Spinneret settings from EEPROM. The save button only shows the posted values.

chillybasen
01-04-2011, 05:20 PM
Thanks kuisma. I tried your new patch, but it still hangs after a certain number of requests.

I'm able to recreate the issue pretty easily now by using ApacheBench:

ab -n 300 -c 1 http://192.168.1.252/

After around 250 requests, it hangs. I'm guessing the number of requests it survives depends on the number of bytes being received and sent.
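
For anyone without ApacheBench installed, a similar sequential test can be sketched in Python. This is only an approximation of `ab -n 300 -c 1`: the function name is made up, and the `fetch` parameter exists only so the loop can be exercised without a real server.

```python
import http.client
import urllib.request
from typing import Callable, Optional


def first_failure(url: str, requests: int, timeout: float = 5.0,
                  fetch: Callable = urllib.request.urlopen) -> Optional[int]:
    """Issue `requests` sequential GETs, one at a time.

    Returns the 1-based index of the first request that fails or times
    out, or None if every request succeeds.
    """
    for index in range(1, requests + 1):
        try:
            with fetch(url, timeout=timeout) as response:
                response.read()
        except (OSError, http.client.HTTPException):
            # URLError, timeouts and connection resets are all OSError subclasses.
            return index
    return None


# Example (address taken from the ab command above; substitute your own):
#   print(first_failure("http://192.168.1.252/", 300))
```

The returned index gives a rough count of how many requests the server handled before it stopped answering.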

chillybasen
01-04-2011, 05:24 PM
Mike, I did just try hammering your server with 400 requests and it's still up, so maybe it's my server code as well. Can you post your latest server code?

Mike G
01-04-2011, 05:30 PM
Sure, when I get home from work.

sstandfast
01-04-2011, 05:32 PM
I too have been having this problem, but I am not sure it is 100% the fault of the driver. I have added Kuisma's modifications to the SPI driver and I am still getting the random hangs. I have also set my prop/W5100 combo up to use interrupts. When the server becomes unresponsive, the W5100 does not fire the interrupt after the "refresh" button is pressed, but I can see on Wireshark that it is generating lots of TCP traffic in response to the request. Also, since my page is essentially just a modified version of the HTTP demo (HTML embedded in the SPIN code), I know that I am not coming anywhere near the 2K buffer boundary. I have not yet had an opportunity to dig through the TCP traffic and figure out what the packets are but I will hopefully soon. I do know that it does not matter if I am on my local network or an outside network, when it hangs, I can not get a page at all.

Shawn

sstandfast
01-04-2011, 05:39 PM
@Mike G

Your XML style sheet does not work in Chrome. I get the following error:
"
This page contains the following errors:

error on line 2 at column 6: XML declaration allowed only at the start of the document
Below is a rendering of the page up to the first error."

Followed by a blank page. Just FYI...

kuisma
01-04-2011, 05:45 PM
sstandfast wrote: "I know that I am not coming anywhere near the 2K buffer boundary."


With my code, there is no longer any 2K boundary. It is all handled internally in a sound way, and you can transmit (e.g. enqueue) a 10kB request with txTCP as TCP is supposed to work. Try it out some more, and if you still believe it is the driver, I'll have another look at it.

sstandfast
01-04-2011, 06:04 PM
No, I think it is NOT the driver, but rather a glitch in the W5100 itself. Since I've enabled interrupts (basically replaced the .TCPListen() with waitpeq) and the fact that when the server hangs, no more interrupts are generated, the Prop code is not in question. (at least not when no responses are concerned.) Since the W5100 doesn't interrupt, the Prop sits and waits, like it should. However, I do see TCP traffic being generated by the W5100 on Wireshark, I just haven't had an opportunity to open the packets up and see if they are connection requests, or what. I should have some time later this week to dig some more, but I'll let you know what I find.

Shawn

kuisma
01-04-2011, 06:14 PM
sstandfast wrote: "No, I think it is NOT the driver, but rather a glitch in the W5100 itself. Since I've enabled interrupts (basically replaced the .TCPListen() with waitpeq) and the fact that when the server hangs, no more interrupts are generated,"

It sounds like a race condition. Do you check you have processed everything before clearing the interrupt, or do you only process one event?

sstandfast
01-04-2011, 07:07 PM
Right now, I am only processing TCP Connect events on socket 0. Any other interrupts are ignored. I suppose I should be responding to 'Timeouts' as well by resetting the socket. Here is the bit of code associated with interrupts:

'Enable W5100 Interrupts on Socket 0
dira[8] := w5100#_input
W5100.ReadSPI(W5100#_IMR, @InterruptMask, 1)            'Get current Interrupt mask
InterruptMask |= 1                                      'Turn on Socket 0 interrupt
w5100.WriteSPI(true, W5100#_IMR, @interruptMask, 1)     'Write new Interrupt mask to W5100

repeat

  repeat                                                'Wait for an interrupt to occur
    waitpeq(0, |< 8, 0)                                 'Conserve power by halting until the INT pin goes low
    repeat until not lockset(lock_id)
    W5100.ReadSPI(W5100#_IR, @InterruptMask, 1)         'Get source of interrupt
    if InterruptMask & $01                              'If socket 0 interrupt
      W5100.ReadSPI(W5100#_S0_IR, @InterruptMask, 1)    'Get Socket Interrupt Status
      W5100.WriteSPI(true, W5100#_S0_IR,@InterruptMask, 1) 'Clear Socket Interrupt; This should drive INT pin back High unless other interrupt sources are enabled.
      if InterruptMask & $01                            'We only care about Connection Established Interrupts
        quit                                            'If connected continue
    lockclr(lock_id)


Perhaps I should try this variation when I get home tonight:


'Enable W5100 Interrupts on Socket 0
dira[8] := w5100#_input
W5100.ReadSPI(W5100#_IMR, @InterruptMask, 1)            'Get current Interrupt mask
InterruptMask |= 1                                      'Turn on Socket 0 interrupt
w5100.WriteSPI(true, W5100#_IMR, @interruptMask, 1)     'Write new Interrupt mask to W5100

repeat

  repeat                                                'Wait for an interrupt to occur
    waitpeq(0, |< 8, 0)                                 'Conserve power by halting until the INT pin goes low
    repeat until not lockset(lock_id)
    W5100.ReadSPI(W5100#_IR, @InterruptMask, 1)         'Get source of interrupt
    if InterruptMask & $01                              'If socket 0 interrupt
      W5100.ReadSPI(W5100#_S0_IR, @InterruptMask, 1)    'Get Socket Interrupt Status
      W5100.WriteSPI(true, W5100#_S0_IR,@InterruptMask, 1) 'Clear Socket Interrupt; This should drive INT pin back High unless other interrupt sources are enabled.
      if InterruptMask & $08                            'Check for socket timeout
        'Connection terminated
        W5100.SocketClose(0)

        'Clear any interrupts generated by closing the socket
        InterruptMask := $FF
        W5100.WriteSPI(true, W5100#_S0_IR, @InterruptMask,1)

        'Once the connection is closed, need to open socket again
        OpenSocketAgain

      elseif InterruptMask & $01                        'We only care about Connection Established Interrupts
        quit                                            'If connected continue
    lockclr(lock_id)


I'll let you know what happens.

Shawn
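
The two snippets in the post above test individual bits of the socket interrupt register ($01 and $08). As a reading aid, here is a small Python sketch that names the bits according to the W5100 datasheet and lists which flags a handler would leave unexamined. The helper names are invented for illustration and nothing here talks to real hardware.

```python
# Bit assignments of the W5100 socket interrupt register (Sn_IR), per the datasheet.
SN_IR_BITS = {
    0x01: "CON",      # connection established
    0x02: "DISCON",   # peer requested disconnect
    0x04: "RECV",     # data received
    0x08: "TIMEOUT",  # timeout occurred
    0x10: "SEND_OK",  # send completed
}


def decode_sn_ir(value: int) -> list[str]:
    """Return the names of all interrupt flags set in an Sn_IR value."""
    return [name for bit, name in SN_IR_BITS.items() if value & bit]


def unhandled(value: int, handled_mask: int) -> list[str]:
    """Flags that are set but that a handler using handled_mask never looks at."""
    return decode_sn_ir(value & ~handled_mask)


print(decode_sn_ir(0x09))
print(unhandled(0x0A, 0x09))
```

A handler that only checks $01 and $08 would, for example, clear a DISCON flag without acting on it, which is one way events could be lost.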

Beau Schwabe
01-04-2011, 07:42 PM
Last night, just after I posted and changed to using the patched version of the Indirect driver, someone kicked me with about 300 hits. ... chillybasen was that you? :-) ... as a result my end locked and I had to reset it.

I'm not convinced that this is an issue within the W5100 ... I still think it might be how the software handles the W5100. Anyone run a trace as to where 'in code' the Propeller is when this happens?

The reason I'm still leaning that the W5100 is not at fault is because I can hit it when it 'appears' to be locked up and the Status lights respond as I would expect them to. ... The page just doesn't load, 'I think' because the code is looking for something else to happen and doesn't see the request in order to present the html code to the browser.

Mike G
01-04-2011, 07:48 PM
I don't think this is a hardware issue either. I have 1359 hits today and counting. I'm also able to retrieve the 500k Spinneret pdf.
http://spinneret.servebeer.com:5000/docs/32203.pdf

chillybasen
01-04-2011, 08:18 PM
haha, nope, but I would have if you posted the URL :)

Mike G
01-04-2011, 08:20 PM
@sstandfast, did you happen to view the source in Chrome? I think the problem is with the headers sent back from the server. IE does not care but Firefox and, I guess, Chrome do care. Not sure though until I get home.

sstandfast
01-04-2011, 08:39 PM
I too see that the W5100 is still talking via TCP even when it is hung. However, I have my scope connected to the ~INT line and when the server is "locked up", I do not see it pulsing low in response to the TCP traffic, indicating to me that the problem is on the W5100 side. I.E. it is not establishing a TCP connection. *Edit* This could be due to software misconfiguration (such as clearing an interrupt without performing some action first) or it could be hardware.*/Edit* Now it could be that the pulse is too fast for my scope (it is only a 10MHz single channel B&K Precision scope) and I am just not seeing it so I will modify the interrupt code to stretch the time before it clears the S0_IR register and see if it shows up then. Although I thought that even at 80MHz clock, SPIN only appears to run at about 1.5MIPS which means I should be able to see a nice little pulse.

Right now, my interrupt code only responds to a socket 0 connection established interrupt. All other interrupts are ignored. I will also check the status of the ISR when the server locks up to see if it is waiting for something to happen, like waiting for me to reset the socket after a connection timeout or similar. I'll post back when I get home.

Shawn

sstandfast
01-04-2011, 08:43 PM
@Mike G.

Below is a copy of the source. The most notable thing is the presence of a blank line as the first line in the file.




<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="sensor.xsl"?>
<root>
  <readings>
    <reading time="1">
      <ir distance="10" />
      <ping distance="12" />
      <tilt x="1" y="2" z="3" />
    </reading>
    <reading time="2">
      <ir distance="12" />
      <ping distance="36" />
      <tilt x="85" y="100" z="-69" />
    </reading>
  </readings>
</root>

Mike G
01-04-2011, 09:17 PM
@sstandfast, yeah, when I dropped the files on the SD card this morning I forgot to add a content-type header for xml and xsl files. So you're seeing the main header content render and a line, then a condition block that is never entered, and finally two blank lines, for a total of 3 blank lines. I hope that's all it is. I'll know more when I get out of the office.

sstandfast
01-04-2011, 11:56 PM
I just got home a few minutes ago and have looked at the TCP traffic to and from the W5100 during a "Lock-Up" condition. It appears that when the W5100 is "hung", all TCP "Connection Establish Requests" (TCP Flags value = 0x02) are responded to with "Connection Reset" (TCP Flags value = 0x04) responses from the W5100 and no interrupt is generated. This could mean that the previous TCP connection did not close properly. I will amend my code to test this theory. Attached is the capture from Wireshark if anybody wants to view the TCP traffic in a "hung" condition.


[attachment 76945]

*Edit - I should qualify the above attachment with the fact that the Prop/W5100 is located at 192.168.0.100 and the PC is 192.168.0.102
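
For readers following along in Wireshark, the flag values quoted above can be decoded with a few lines of Python. The bit positions are the standard TCP header flags; the function name is just for illustration.

```python
# TCP header flag bits, lowest bit first.
TCP_FLAGS = [
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
]


def decode_tcp_flags(value: int) -> list[str]:
    """Return the names of the TCP flags set in a flags byte."""
    return [name for bit, name in TCP_FLAGS if value & bit]


print(decode_tcp_flags(0x02))  # the connection request from the PC
print(decode_tcp_flags(0x04))  # the reply seen from the W5100 while hung
```

So the capture shows a SYN from the browser answered by an RST, meaning the W5100 is refusing the connection rather than ignoring it.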

Mike G
01-05-2011, 12:57 AM
Just got home and added the "content-type: text/xml" for xml and xsl files. Now Firefox is responding as expected. IE, Safari, Opera, and Chrome look good too.
http://spinneret.servebeer.com:5000/xslt/sensor.xml

I had the web server up all day. The server received 1780 hits and no crashes, not too shabby. I requested the Spinneret pdf several times, which is 500k, and had no problems. I'm seeing timeouts in my log file. Not sure of the cause. When a timeout occurs I reset the socket, and that seems to work pretty well.

@sstandfast, thanks for the Ethernet capture.

jstjohnz
01-05-2011, 07:35 AM
kuisma wrote: "I patched this bug weeks ago and posted it here. It's not really a matter of the size of the W5100 buffer, but the concept of the TCP implementation. Again, you find my patched version at http://whiteboard.ping.se/Propeller/Network."

Hello,
I have found a bug in the SPI driver that truncates transmitted UDP packets. Since you have addressed a couple of other issues in your version, you may want to add that fix as well; that way there will be a single version that has all known bug fixes. It's referenced in another thread on this forum, and also in the collaboration thread.

Beau Schwabe
01-05-2011, 05:41 PM
I think we (Our robotics group) might have found something that could be the cause of the Webserver randomly becoming unresponsive...

It appears to be a timing issue, especially if you have dynamic HTML code that is generated.

We reduced the html down to the bare minimum, and the hang problems went away, but when we introduced a controlled amount of delay into the html generation, the hang problems came back.

minimum HTML code:



HTTP/1.1 404 File Not Found
Content-Type: text/html


The two lines above are sent separately using ... StringSend
An adjustable delay using ... PauseMSec ... between the two lines was used to simulate an 'html creation' delay.

A separate cog for html generation might be required, but then you introduce a latency with the most recent data.
This might boil down to a speed issue between Spin vs. Pasm.

...Anyway, just thought I would pass this info along. I think the software driver is ok, and the 5100... it's a UCE 'user code error' problem.

chillybasen
01-05-2011, 05:58 PM
@Beau that could definitely be my problem as I am controlling a servo based on the request. Is there any way to thread that?

sstandfast
01-05-2011, 05:59 PM
@Beau - Just out of curiosity, what was the break-over point? I.E. how long could you pause before the hang issues came back.

Beau Schwabe
01-05-2011, 06:07 PM
sstandfast,

"Just out of curiosity, what was the break-over point? I.E. how long could you pause before the hang issues came back. " --- Since the problem seems to be random in nature anyway (<--human ping rate, how fast can you click refresh) ... Without any delay in the code we could get thousands of hits... with a 1 second or so delay we might get 2 hits one time, and 25 another before it became 'unresponsive'.

Mike G
01-05-2011, 06:21 PM
Thanks for sharing Beau.

For another point of reference, my server has been up for two days. Some of the processes the web server executes are expensive. Every time a dynamic page is requested a temp file is created on the SD card and served up. There's also some delay involved in retrieving POST and GET values. I'm grabbing the key/value pairs at the time of the request, putting the values in a stack, decoding the string, and returning a pointer. When there are a lot of requests, it takes a bit of time. See this guy and click save, http://spinneret.servebeer.com:5000/config.htm.

I did some stress testing a few weeks ago and found that sometimes a request would come in but with no data. So I set a timer when this condition happens. If time runs out and still no data then I reset the socket. That along with the driver fix seemed to get me up and running consistently.
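
The "set a timer, then reset the socket" idea described above can be sketched as a polling loop with a deadline. This is a Python illustration only, not Mike G's code; `bytes_available` stands in for whatever call reports how much request data has arrived, and the clock and sleep functions are parameters so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_data(bytes_available: Callable[[], int], timeout_s: float,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until data arrives; return False if the deadline passes first."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if bytes_available() > 0:
            return True
        sleep(0.01)  # brief pause between polls
    return False


# Usage idea: a False result is the cue to close and re-open the socket.
#   if not wait_for_data(read_rx_size, 2.0):
#       reset_socket()
```

The important property is that the wait is bounded, so an empty request can never stall the server indefinitely.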

Beau Schwabe
01-05-2011, 06:39 PM
Mike G,

That gives me an idea... a PWp ("Please Wait page") that loads the dynamic page after it has been rendered might be a workable solution.

The PWp wouldn't need to say anything indicating that the user is waiting. A simple script in the PWp would reload the rendered html when it was complete. In most cases that should be fast enough, that you shouldn't be aware of the PWp.

just a thought.

Mike G
01-05-2011, 06:45 PM
I read your post a couple of times... I'm having a hard time picturing how that would work.

Beau Schwabe
01-05-2011, 06:52 PM
Mike G,

Basically upon the initial connection you have a bare minimum html that responds back to the web request. In that bare minimum html code there is a timer that launches the page >again<. The html timer allows you time to render the "new" html in the background so to speak, without causing any timeout issues from the web request. The actual duration of the html timer would be small enough that to the person making the web request, they would be unaware of the delay.
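
One way to picture the bare-minimum page described above is a page whose only job is to ask the browser to come back shortly. The Python sketch below builds such a page using an HTML meta refresh, which is a substitute chosen here for simplicity; the post itself talks about a script or timer and does not say which mechanism was intended. The function name and target URL are hypothetical.

```python
import html


def please_wait_page(target_url: str, delay_s: int = 1) -> str:
    """Build a minimal HTML page that asks the browser to load target_url after delay_s seconds."""
    safe_url = html.escape(target_url, quote=True)
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{delay_s};url={safe_url}">'
        "</head><body></body></html>"
    )


print(please_wait_page("/rendered.htm"))
```

The server can answer the first request with this tiny page immediately, then use the delay to finish rendering the real content.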

chillybasen
01-05-2011, 07:08 PM
FYI, I just moved my servo moving code outside the ethernet code (after ETHERNET.SocketClose(0)) and it seems to be much more stable. I threw a few thousand requests at it and it's still up.

The code is here (https://github.com/billychasen/sms-door-opener/blob/master/door.spin) and it's built off of Bean's webserver with GET.

Mike G
01-05-2011, 08:18 PM
One other thing that I always do is write the HTML header in a separate method right before rendering a dynamic (or static) page. So the browser is only waiting for the message body and usually the body is ready to go. For the most part, I already ran whatever processes and have the rendered content in memory before sending the response header.

@chillybasen, I'd do all the processing right after getting the request not in the middle of sending the response. Basically, get all the data ready to go, spin up any processing etc. then render the page.

For example, I'd put this logic elsewhere if you can. I know it does not take too long to run.


if (byte[VarStr[1]] <> 0)
if (byte[VarStr[1]] == "o")
status := "1"
StringSend(0, string("opening", CR))
elseif (byte[VarStr[1]] == "c")
status := "0"
StringSend(0, string("closing", CR))
elseif (status == "1")
StringSend(0, string("open", CR))
elseif (status == "0")
StringSend(0, string("closed", CR))

jstjohnz
01-06-2011, 03:37 AM
I am seeing this same issue. I started monitoring the socket status register. Normally its value is $14 while waiting for a connection. What I found was that when the hangs were occurring the status register showed either $1C or 0.

What I am trying now is, every time I check for the presence of a connection, if there is no connection I then check the status register. If it's not $14, I close and re-open the socket.

So far so good. Doesn't explain the why, but seems to be a workaround.

jstjohnz
01-06-2011, 10:28 AM
kuisma wrote: "With my code, there is no longer any 2K boundary. It is all handled internally in a sound way, and you can transmit (e.g. enqueue) a 10kB request with txTCP as TCP is supposed to work. Try it out some more, and if you still believe it is the driver, I'll have another look at it."

Kuisma, can you explain your rxtcp fix? I don't understand what this is supposed to do.

-jim-

kuisma
01-06-2011, 10:38 AM
jstjohnz wrote: "Kuisma, can you explain your rxtcp fix? I don't understand what this is supposed to do."

RSR starts out uninitialized and is only assigned data to the lower 16 bits, but returned the entire 32 bit long. This way it from time to time returned garbage instead of the amount of data read.
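
The effect kuisma describes can be modeled in a few lines of Python. This is only an illustration of the arithmetic, with made-up function names and a made-up stale value; the actual patch lives in the Spin driver and may be written differently.

```python
def rsr_buggy(stale_long: int, high_byte: int, low_byte: int) -> int:
    """Model of the bug: only the low 16 bits of a 32-bit variable are
    written, so whatever was in the upper 16 bits leaks into the result."""
    return (stale_long & 0xFFFF0000) | (high_byte << 8) | low_byte


def rsr_fixed(stale_long: int, high_byte: int, low_byte: int) -> int:
    """Model of the fix: start from zero, so stale contents are discarded."""
    return (high_byte << 8) | low_byte


stale = 0xDEADBEEF  # arbitrary leftover contents of the uninitialized variable
print(hex(rsr_buggy(stale, 0x01, 0x2C)))
print(rsr_fixed(stale, 0x01, 0x2C))
```

With 300 bytes actually received, the buggy version reports a huge number whenever the upper half happens to be non-zero, which matches the "from time to time returned garbage" description.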

Beau Schwabe
01-06-2011, 04:04 PM
jstjohnz,

Hmm, on page 29 and 30 of the PDF...(see below) (http://www.parallax.com/Portals/0/Downloads/docs/prod/prop/W5100Datasheetv1.2.2.pdf)

$14 certainly seems to be the correct status mode. $00 seems to shut down the port indefinitely, while $1C perpetually waits for a close. If it doesn't get a close it will just sit there.

Can you provide the snippet of code you are using to check the status?



$00 - SOCK_CLOSED - It is shown in case that CLOSE commands are given to Sn_CR, and Timeout interrupt is asserted or connection is terminated. In this SOCK_CLOSED status, no operation occurs and all resources for the connection is released.

$1C - SOCK_CLOSE_WAIT - It is shown in case that connection termination request is received from peer host. At this status, the Acknowledge message has been received from the peer, but not disconnected. The connection can be closed by receiving the DISCON or CLOSE commands.

$14 - SOCK_LISTEN - It is shown in case that LISTEN commands are given to Sn_CR at the SOCK_INIT status. The related socket will operate as TCP Server mode, and become ESTABLISHED status if connection request is normally received.
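
When logging status values to a terminal, a small lookup table makes the numbers easier to read. The Python sketch below covers the three values quoted above plus a few neighbouring TCP-server states from the same datasheet table; the function name is invented for illustration.

```python
# Sn_SR values from the W5100 datasheet that matter for a TCP server socket.
SOCKET_STATUS = {
    0x00: "SOCK_CLOSED",
    0x13: "SOCK_INIT",
    0x14: "SOCK_LISTEN",
    0x16: "SOCK_SYNRECV",
    0x17: "SOCK_ESTABLISHED",
    0x1C: "SOCK_CLOSE_WAIT",
}


def status_name(value: int) -> str:
    """Translate a status register byte into its datasheet name."""
    return SOCKET_STATUS.get(value, f"unknown (${value:02X})")


for value in (0x14, 0x1C, 0x00):
    print(f"${value:02X} = {status_name(value)}")
```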

jstjohnz
01-06-2011, 07:27 PM
I first added this method to the ethernet driver:


'***************************************
PUB SocketStatus(_socket) | temp0
'***************************************
'' Return the socket status register (new 01/05/2011 jstjohnz, for detecting 'bad' socket status)
''
''  params:  _socket is a value of 0 to 3 - only four sockets on the W5100
''  return:  the status register byte for the socket (0 if the ASM cog is not running)

  'If the ASM cog is running, execute the command
  if (W5100flags & _Flag_ASMstarted)

    readSPI((_S0_SR + (_socket * $0100)), @temp0, 1)
    return temp0.byte[0]


Then in my main spin routine, if I don't see a connection established I do a check to see if the status register value is wrong:



if WIZ.SocketTCPestablished(websocket)==0   'if no connection is established
  if wiz.socketstatus(websocket)<>$14       'chk for proper socket status
    webstate:=99                            'if wrong, set flag so we know to close/reopen the socket
    'basically this should execute the code that you would normally execute after sending
    'the page, ie close then re-open the socket.
  return
else
  'code to send the page goes here


The reason for the "webstate:=" is that I am having to split my web page code into small chunks because I am trying to render the web page without missing incoming UDP packets on the other 3 sockets. So, I can only spend a few ms at a time executing the server code. I use the webstate variable with a CASE command to execute a specific portion of the page rendering code on each pass, then I check and deal with the other 3 sockets.
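
The webstate approach described above, where each pass through the main loop renders one small portion of the page and then services the other sockets, can be sketched as a tiny state machine. This Python version is only a model of the idea; the names are hypothetical and the real code uses a Spin CASE statement.

```python
from typing import Callable, Optional


def make_renderer(parts: list[str]) -> Callable[[], Optional[str]]:
    """Return a step() function that emits one page portion per call,
    tracking its position in a webstate-style counter, and returns None
    once every portion has been emitted."""
    state = {"webstate": 0}

    def step() -> Optional[str]:
        index = state["webstate"]
        if index >= len(parts):
            return None
        state["webstate"] = index + 1
        return parts[index]

    return step


def main_loop(step: Callable[[], Optional[str]],
              service_other_sockets: Callable[[], None]) -> list[str]:
    """Alternate between sending one portion and servicing the other sockets."""
    sent = []
    while True:
        chunk = step()
        if chunk is None:
            break
        sent.append(chunk)
        service_other_sockets()  # e.g. check the UDP sockets between portions
    return sent
```

Because only one portion is produced per pass, the time spent away from the other sockets is bounded by the cost of the largest single portion.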

Here's a copy of the page I'm getting now after being up about 11 hours:

-jim-

[attachment 77002]

Beau Schwabe
01-08-2011, 04:22 AM
I am trying out a 'new' correction scheme to monitor the status register... The server has been up almost 24 hours now without hanging ... I'll let it continue over the weekend, but I need some more hits to be certain that this scheme will work. I'll leave the web-cam up as well, so you can see the interactive results. ...I am keeping a tally on this end of bad vs. good status results and will provide code and the results later.

Thanks

Web-Server --> http://24.253.241.231:5555/
Web-Camera --> http://24.253.241.231:8081/

Beau Schwabe
01-08-2011, 06:15 AM
Here is a basic connection flow that I'm using... Within about 1200 hits there are at least 3 that would have resulted in an 'unresponsive' state. The Attached flow was able to recognize a transmission error and recover.

Refer to the PDF (http://www.parallax.com/Portals/0/Downloads/docs/prod/prop/W5100Datasheetv1.2.2.pdf) starting at the bottom of page 29 for the meaning of the Status values

Hope this helps

DynamoBen
01-08-2011, 05:24 PM
While checking the status and resetting the socket is a reasonable workaround for this, I'm concerned by the fact that it is happening in the first place. Has anyone been able to isolate whether this is on the WizNet side or in the driver?

kuisma
01-08-2011, 05:30 PM
DynamoBen wrote: "While checking the status and resetting the socket is a reasonable workaround for this, I'm concerned by the fact that it is happening in the first place. Has anyone been able to isolate whether this is on the WizNet side or in the driver?"

Using the TCP Echo server and quite a nasty client, I have exercised the driver quite thoroughly without being able to either hang it or lose any data.

DynamoBen
01-08-2011, 06:48 PM
kuisma wrote: "Using the TCP Echo server and quite a nasty client, I have exercised the driver quite thoroughly without being able to either hang it or lose any data."

Interesting. So the driver may be OK, in which case it must be an issue with the way we are using the driver.

Anyone know if this problem is before, during, or after a page is served up?

Assuming this issue occurs after a page is served up there might be a timing problem between resetting the socket and waiting for another connection. If that is the case a simple fix could be a short wait between reloading the socket and listening.

kuisma
01-08-2011, 06:54 PM
DynamoBen wrote: "Interesting. So the driver may be OK, in which case it must be an issue with the way we are using the driver."


Well, not really. I just can't reproduce any error related to the driver, but still, I assure you, there are bugs in the driver. :)

If you have code reproducing the bug, and believe it's the driver, please upload the code here.

Mike G
01-08-2011, 11:59 PM
I wanted to mention this... browsers look for favicon.ico. I knew about favicon.ico but it did not cross my mind until I started using Kye's FAT driver with my HttpRequest object. The HttpRequest object processes all requests. So when it sees



GET /favicon.ico HTTP/1.1


the object tries to grab and render the file. Well, if the file does not exist, the FAT driver has a problem. Anyway, just be aware that the browser requests favicon.ico.
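
A simple defence against the favicon.ico request is to check that a file exists before handing the path to the file system layer, and to answer 404 otherwise. The Python sketch below models that check with a dictionary standing in for the SD card; it is not the HttpRequest object's actual logic, and the names are made up.

```python
def respond(path: str, files: dict[str, bytes]) -> tuple[int, bytes]:
    """Return (status code, body). Unknown paths, including a missing
    /favicon.ico, get a 404 instead of being passed on to the file layer."""
    if path in files:
        return 200, files[path]
    return 404, b""


# Hypothetical card contents: there is an index page but no favicon.
card = {"/index.htm": b"<html>hello</html>"}
print(respond("/index.htm", card))
print(respond("/favicon.ico", card))
```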

sstandfast
01-10-2011, 05:24 PM
I have made a similar change to my interrupt-based W5100 code, as recommended by Beau. I am now checking the status register and resetting the socket if it is not in the "listen" state prior to sleeping the Cog. I also changed the dynamic page generation to get data from a stored variable, rather than polling the node controllers directly. (This was on the ToDo list anyway as I need to read and respond to various room temperatures.) So far it has been up and running the last two days (211800+ seconds uptime) without locking up. Unfortunately, I can't generate enough traffic to really test it out. I know I'm probably gonna be disappointed with the results, but do you guys think you might be able to give me a thousand or so hits to test stability? I would really appreciate it. The address is http://krwsms.dyndns.org and I'm at work now, so I won't be able to reset it if it goes down, but I should be able to check it throughout the day to see if it is still up.

Thanks again,

Shawn

DynamoBen
01-10-2011, 11:15 PM
For those looking to test their webserver you may want to consider Selenium:

http://seleniumhq.org/

Mike G
01-10-2011, 11:27 PM
@sstandfast, it's hard to hit your site with any stress because the googlelead.g.doubleclick.net... script has to load first.

Anyway, the site seems to work fine.

David Carrier
01-11-2011, 12:13 AM
sstandfast,
I think I crashed your Spinneret Web Server. The best test I have found is to enter the IP address and port as the URI (in this case http://74.195.253.203:8000/) and hold Ctrl+R to reload continuously. It has been able to crash everyone else's Spinneret Web Server too.

David Carrier
Parallax Inc.

David Carrier
01-11-2011, 12:21 AM
sstandfast,
I stand corrected, it is back up and the Site Counter has gone up quite a bit. It may have queued all of those requests and it wasn't responding until it finished them.

David Carrier
Parallax Inc.

Mike G
01-11-2011, 12:24 AM
Give this one your Ctrl-R best.
http://www.agaverobotics.com/spinneret/controlpanel.htm

or this
http://spinneret.servebeer.com:5000/index.htm

sstandfast
01-11-2011, 12:24 AM
@David, I thought you crashed it too at around 395-400 hits, but apparently it was able to recover after about 10 more connection attempts.

@Mike, to avoid the google ads at the top, try using this URI instead: http://krwredirector.dyndns.org:8000/ This is the destination of the WebHop URI and points directly to my IP.

Mike G
01-11-2011, 12:32 AM
I crashed it myself

zapmaster
01-11-2011, 01:30 AM
I'm having the crash issues also. My server is up and running. It auto-resets when it crashes. I'm going to log the hits and crashes to the EEPROM.
Here is my IP address: http://72.55.239.53
I also have an issue where not all the data is transmitted to the destination; the page only partially loads.

I did not have the .8 code running; I had the .6.
I will see how it is tomorrow at work.
Tell me what you see.

Mike G
01-13-2011, 02:38 PM
I made a slight change to my web server that allows it to recover if it gets stuck. I ran a few tests and it seems to be working.
http://spinneret.servebeer.com:5000/

Beau Schwabe
01-13-2011, 03:25 PM
Mike G,

So the question begs, what changes did you make? :-)

chillybasen
01-13-2011, 05:09 PM
My webserver started hanging again, so I decided to try jstjohnz's solution. However, on SocketTCPestablished, there'd be a $14 sent and it would reset. So I modified the code slightly:



repeat
  if !ETHERNET.SocketTCPestablished(0)
    if ETHERNET.SocketStatus(0) == $00 OR ETHERNET.SocketStatus(0) == $1C
      ResetSocket
  else
    'send HTML


And ResetSocket is just



PRI ResetSocket
  ETHERNET.SocketTCPdisconnect(0)
  ETHERNET.SocketClose(0)
  ETHERNET.SocketOpen(0, ETHERNET#_TCPPROTO, localSocket, destSocket, @destIP[0])
  ETHERNET.SocketTCPlisten(0)
  return


My server has been up 3 days now, a record. And I've hit it with thousands of requests in testing.

Beau Schwabe
01-13-2011, 10:47 PM
chillybasen,

It might also be a good idea to periodically check 'ETHERNET.SocketTCPestablished(0)' and make sure that it still reads as what you would expect, as well as the status.

In addition to checking for a status of $14 ... it might be a good idea to check that it eventually becomes $17 while the ETHERNET.SocketTCPestablished(0) is still valid.
A valid status of $14 usually changes to $16 before it changes to $17, but it can go directly from $14 to $17 ... much of this depends on the flavor of browser and the handshaking implemented on the client side.

$00 is basically a complete shutdown (release of socket service) on the server side ... could be caused by a terminated or misdirected client connection.
$1C is generated when the client forces a quit ... i.e. they close the browser connection.
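
The status values discussed above are W5100 socket status register (Sn_SR) codes. As a rough sketch, they could be named in the server object so the checks read more clearly. The constant names below follow the WIZnet datasheet naming rather than anything defined in the driver, and NeedsReset is a hypothetical helper, not part of anyone's posted code:

```spin
CON
  ' W5100 socket status register (Sn_SR) values mentioned in this thread
  SOCK_CLOSED      = $00   ' socket released
  SOCK_LISTEN      = $14   ' waiting for a client
  SOCK_SYNRECV     = $16   ' connection request received, handshake in progress
  SOCK_ESTABLISHED = $17   ' client connected
  SOCK_CLOSE_WAIT  = $1C   ' client asked to close the connection

PRI NeedsReset(status)
  ' Hypothetical helper: returns true (-1) when the socket has to be reopened
  return (status == SOCK_CLOSED) OR (status == SOCK_CLOSE_WAIT)
```

With something like this in place, a check such as "if NeedsReset(ETHERNET.SocketStatus(0))" could replace the pair of hex comparisons.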

zapmaster
01-16-2011, 01:42 AM
This is the code I use for the web serving of the current 3 pages.
It does not lock up any more.
I only tested this in house.
72.55.239.53
Give it a try.


PUB webserver | idx, x, temp0

  'Infinite loop of the server
  repeat
    'Waiting for a client to connect
    'Testing Socket 0's status register and looking for a client to connect to our server
    ETHERNET.readIND(ETHERNET#_S0_SR, @temp0, 1)
    temp0 <<= 24                          'shift left to discard the upper three bytes
    temp0 >>= 24                          'shift back right, leaving only the low byte
    PST.Str(string(" hold for page"))
    PST.Str(string(PST#NL))

    repeat while temp0 == $14
      dira[23]~~                          'make the LED pin an output
      outa[23]~~                          'turn on LED to show ready for web page
      temp0 <<= 24                        'shift left to discard the upper three bytes
      temp0 >>= 24                        'shift back right, leaving only the low byte
      ETHERNET.readIND(ETHERNET#_S0_SR, @temp0, 1)
      waitcnt(clkfreq/40 + cnt)

    ETHERNET.readIND(ETHERNET#_S0_SR, @temp0, 1)
    temp0 <<= 24                          'shift left to discard the upper three bytes
    temp0 >>= 24                          'shift back right, leaving only the low byte
    PST.Str(string(PST#NL))
    if temp0 <> $17
      PST.hex(temp0, 8)
      PST.Str(string(" socket received bad data "))
      PST.Str(string(PST#NL))
      crash := crash + 1
      i2cObject.Writelong(i2cSCL, EEPROMAddr, crashprom, crash)
      end                                 'reset the socket
      next
    outa[23] := 0
    'Initialize the buffers and bring the data over
    bytefill(@data, 0, _bytebuffersize)
    ETHERNET.rxTCP(0, @data)

    if data[5] == "A"                     'A = home page
      home_page
    if data[5] == "B"                     'B = boiler page
      boiler_page
    if data[5] == "C"                     'C = Solar page
      Solar_page
    if data[0] == "G"
      home_page
.

sstandfast
01-18-2011, 05:21 PM
@zapmaster - just wanted to point out that you can make a small improvement by replacing all your


temp0 <<= 24 'bit shift left for the first byte
temp0 >>=24 'bit right shift for the first byte


with a simple masking function:


temp0 &= $FF


This will accomplish the same thing as your back-to-back shifts with two fewer instructions (shift then assignment).
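
To make the equivalence concrete, here is a small worked example. MaskDemo and the starting value are made up purely for illustration:

```spin
PUB MaskDemo | a, b
  a := $1234_5617          ' upper three bytes hold leftover data
  b := a
  a <<= 24                 ' a is now $1700_0000
  a >>= 24                 ' a is now $0000_0017 (>> is a logical shift in Spin)
  b &= $FF                 ' b is now $0000_0017
  return a == b            ' true (-1): both approaches keep only the low byte
```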

Just thought I would mention it.

Shawn

Mike G
01-20-2011, 02:06 PM
@Beau (and all),

Sorry I was off on a San Diego field trip last weekend with the Young Explorers. We saw a Grey Whale out in the Pacific, got video and all. Anyway, kinda' relaxed a bit and stayed out of the office.

I followed Beau's advice to get around the lockups. Plus I read the Wiz5100 manual. I imagine this is similar to what everyone else coded. This snippet is by no means production code but it seems to work. BTW, I also handle 0 data. I'm not sure why I get 0 data but it happens often enough.



t1 := cnt + HTTP_TIMEOUT
repeat while !W5100.SocketTCPestablished(0)
  if(t1 > cnt)
    W5100.readIND(W5100#_S0_SR, @S0_SR, 1)
    ifnot((S0_SR == $14) OR (S0_SR == $16) OR (S0_SR == $17))
      PST.Str(string("Status Code 2: "))
      PST.Hex(S0_SR, 2)
      ResetSocket


http://spinneret.servebeer.com:5000/
http://www.agaverobotics.com/spinneret/controlpanel.htm
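
One thing to watch in a timeout like the one above: cnt is a free-running 32-bit counter that wraps, and Spin comparisons are signed, so comparing cnt against a precomputed deadline can misfire near the wrap. A common idiom is to compare elapsed ticks instead. The sketch below is only an illustration; TimedOut and t0 are hypothetical names, not part of the posted server:

```spin
PRI TimedOut(start, ticks)
  ' Hypothetical helper: true once at least `ticks` system clocks have
  ' passed since `start`. The subtraction wraps the same way cnt does, so
  ' the result stays valid across a rollover as long as the interval is
  ' shorter than 2^31 ticks (roughly 26 seconds at 80 MHz).
  return (cnt - start) => ticks

PRI WaitExample | t0
  t0 := cnt
  repeat until TimedOut(t0, clkfreq * 2)   ' about 2 seconds
    ' poll the socket status here
```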

Oldbitcollector (Jeff)
01-24-2011, 03:33 PM
I'm seeing this as well..

Perhaps I'm using old code?

My server is: http://software.propellerpowered.com with a link to the code itself at the bottom of the page.

OBC

DynamoBen
01-24-2011, 03:36 PM
I wonder if this issue has to do with an uninitialized variable or a variable rolling over somewhere. :innocent:

Mike G
01-24-2011, 09:24 PM
I'm finding that the lockups are usually my fault. I log every request along with key variable states like the status register. This has helped me find bugs in my code by following what the user was doing when the server locked up.

Timothy D. Swieter
01-25-2011, 12:45 AM
When discussing code lockup or bugs or responsiveness issues, we should reference which driver and version is being used.

There have been a handful of bug fixes to both the SPI and Indirect version of the drivers. The driver code is living on Google Code here: http://code.google.com/p/spinneret-web-server/

If you are able to, I'd suggest downloading the latest SPI and IND drivers and to give them a try. One to see if it improves the performance and two to verify we aren't adding new problems and bugs. I love having a community to test code - you guys have the applications where I don't right now, so I am happy to be tech support for the drivers.

W5100_SPI_Driver.spin (http://spinneret-web-server.googlecode.com/svn/trunk/W5100_SPI_Driver.spin)
W5100_Indirect_Driver.spin (http://spinneret-web-server.googlecode.com/svn/trunk/W5100_Indirect_Driver.spin)

Oldbitcollector (Jeff)
01-25-2011, 02:22 PM
@Mike,

What is the value of HTTP_TIMEOUT in your server?

OBC




Mike G
01-25-2011, 08:32 PM
2 seconds but I removed the timeout in the most recent stuff.

Oldbitcollector (Jeff)
01-25-2011, 08:42 PM
I'm still getting lock ups over here. Would you mind sharing your source code (complete) for some comparison?

OBC

Timothy D. Swieter
01-25-2011, 11:49 PM
I was digging in the WIZnet datasheet last night while working on the feature for adjusting the W5100 socket memory size. While I was in the sheet I saw a couple of W5100 values for the Retry Time-value Register (RTR) and Retry Count Register (RCR). These are settings inside the W5100 for handling communications. The default for RTR is 200ms and RCR is 8. I need to read and do some experimenting, but I wonder if RTR should be increased to 400ms or 500ms. I am thinking this has to do with latency in a network.

Any regular web server programmers around that can talk about RTR timing on a real web server? How long should a server wait for connections and response before moving on?

After I get the socket memory size register going I was thinking of building a demo for watching the RTR, RCR and various error bits in the W5100 so we can verify if it is the W5100 having issues with serving or if it is our code in the Propeller. Maybe this weekend I will get to it, unless someone wants to beat me to it.
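
For anyone wanting to experiment with RTR, the register counts in units of 100 us, so the 200 ms default corresponds to 2000 ($07D0). The sketch below only packs a millisecond value into the two register bytes. The constant names, rtrBytes and PackRTR are hypothetical, the addresses are taken from the W5100 datasheet, and actually writing the bytes depends on the driver's register-write method, which is not shown here:

```spin
CON
  RTR_ADDR = $0017   ' Retry Time-value Register, 2 bytes (W5100 datasheet)
  RCR_ADDR = $0019   ' Retry Count Register, 1 byte (W5100 datasheet)

VAR
  byte rtrBytes[2]

PRI PackRTR(ms) | ticks
  ' RTR counts in 100 us units, so 200 ms -> 2000 ($07D0)
  ticks := ms * 10
  rtrBytes[0] := ticks >> 8      ' high byte goes to the lower address
  rtrBytes[1] := ticks & $FF     ' low byte
  return @rtrBytes               ' pointer to the two bytes, ready to be written
```

Under these assumptions, PackRTR(500) would leave $13 and $88 in rtrBytes, i.e. 5000 units of 100 us.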

Mike G
01-26-2011, 02:00 AM
@OBC, the base code can be found on http://forums.parallax.com/showthread.php?128445-Dynamic-Spinneret-with-HttpRequest-Object-and-EEPROM-Configuration-Page

The only difference related to lockups is the code snippet above. My latest code has a lot more stuff like HTTP file upload development and tons of logging. I'm afraid to make that code available on the forum. I ran out of programming space so the latest code has blocks plugged in here and there and would be difficult to follow if you're not familiar with the logic.

CassLan
01-27-2011, 09:20 PM
So after reading through the thread...to my understanding is this all solved?

I had this issue while working several weeks back and would like to get back on things :)

Rick

Oldbitcollector (Jeff)
01-28-2011, 01:17 AM
Not resolved here..

I've started using this and I can run the server for most of a day without a powercycle. (However, it does cause more problems with page load, requiring a 'reload', but at least not a powercycle.)



'Infinite loop of the server
repeat
  ETHERNET.SocketOpen(0, ETHERNET#_TCPPROTO, localSocket, destSocket, @destIP[0])
  ETHERNET.SocketTCPlisten(0)

  repeat while !ETHERNET.SocketTCPestablished(0)

    ResetSocket
    bytefill(@data, 0, _bytebuffersize)
    WaitCnt(clkfreq / 100 + cnt) ' 10mSec

  ETHERNET.rxTCP(0, @data)

  if data[0] == "G" ' Assume a GET request
    ParseURL

    'Send the web page - hardcoded here
    'status line
    StringSend(0, string("HTTP/1.1 200 OK", CR, LF))





PRI ResetSocket
  ETHERNET.SocketTCPdisconnect(0)
  ETHERNET.SocketClose(0)
  ETHERNET.SocketOpen(0, ETHERNET#_TCPPROTO, localSocket, destSocket, @destIP[0])
  ETHERNET.SocketTCPlisten(0)
  return


OBC

Mike G
01-28-2011, 02:35 AM
@CassLan, simply verifying the status register goes from 0x14 to 0x16 or 0x17 and checking for zero data fixed my problems.

OBC, you're making the Spinneret do a lot of unnecessary socket work.

Beau Schwabe
01-28-2011, 09:32 PM
I think a majority of us have decided that it's just a matter of getting the right state machine going.

Understanding the meaning of the status codes helps.

In addition to checking for "0x14 to 0x16 or 0x17" depending on what your code does, it might also be wise to check for 0x00 and 0x1C which can occur after 0x17.

0x00 - can be generated from the browser if the person reloads before the previous page has loaded.

0x1C - can be generated from the browser if the person just closes the browser

In either case if you get a 0x00 or 0x1C 'before' you send the HTML, you might as well not send anything before you re-negotiate.
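
The advice above could be sketched roughly as follows. This is an illustration only: ServeOnce is a hypothetical method name, while ETHERNET, data, StringSend and ResetSocket are the names used in the snippets earlier in this thread, and ending every request with ResetSocket is an assumption about how the server is structured:

```spin
PRI ServeOnce | status
  ' Wait for a client, resetting if the socket drops out while waiting
  repeat until ETHERNET.SocketTCPestablished(0)
    status := ETHERNET.SocketStatus(0)
    if status == $00 OR status == $1C
      ResetSocket                       ' closed, or client hung up early

  ETHERNET.rxTCP(0, @data)

  status := ETHERNET.SocketStatus(0)    ' re-check just before sending
  if status == $00 OR status == $1C
    ResetSocket                         ' client is gone, so skip the page
    return

  StringSend(0, string("HTTP/1.1 200 OK", 13, 10))
  ' ... rest of the headers and the page go here ...
  ResetSocket                           ' assumed: reopen and listen for the next request
```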

Timothy D. Swieter
01-29-2011, 02:28 AM
Regarding the state machine, should this be something that we embed in the driver and keep out of the application code? The application code would still need to know something about the state of the socket, but the gory details could be handled in the driver and the status announced to the application code if it asks for it.

CassLan
01-29-2011, 03:29 AM
@Mike - Yeah, that's what I was wondering... whether all (or most) of the causes had been determined and there was a way to deal with them.

@Beau - Thank you for explaining, I guess its def worth reading the 5100 Sheet

@Tim - There are times when folks may want this stuff true, maybe a behavior profile with which to call the driver...certain automated flags depending on how "simple" you want to use it?


I'm planning some quality Spinneret time tomorrow muhahahahaha, Mike I love your EEPROM IP settings save page.
Maybe I'll have something fun to post back by EOD tomorrow!!

Beau Schwabe
01-29-2011, 03:43 AM
Timothy D. Swieter,

"Regarding the state machine, should this be something that we embed in the driver and keep out of the application code?" - For most applications I would agree, but as soon as I say that there's probably a reason why it shouldn't be managed.

The Demo Code (http://forums.parallax.com/showthread.php?129127-DEMO-Easy-HTML-with-the-Spinneret) I posted attempts to move all of the 'gory details' to the background...

ags
02-03-2011, 05:40 AM
I coded a simple web server (just to prove the basic building blocks for a larger effort) and during development found with my first implementation that for a client on my LAN, everything was fine. When I tried to browse with an iPad over WiFi, or my Android phone over wireless, things fell apart. It turned out to be a mistake in the way I expected state transitions to happen (added latency made intermediate steps last longer, causing problems). This was the source of my problems, and I wonder if it's the cause of the problems described here.

I'll offer my tiny server up for your abuse - for the purpose of seeing if what I've done is truly a robust implementation that could be shared. The page is static: no buttons, just #page hits and uptime.

I'm already shuddering at what is going to happen here - but go at it, try to bust it!

http://xxxxxxxxx

eagletalontim
11-29-2014, 02:58 PM
I just experienced my first inbound connection loss after leaving my server up and running overnight. I was able to see the Spinneret sending data out to my server and receiving the needed data back along with DHCP and SNTP checks but I was not able to access the server through a web browser. A simple reset and I was able to connect again. I am using the latest W5100 driver. Do I need to post my code or is this still an issue with the W5100?