5 Replies Latest reply: Jun 14, 2013 7:59 PM by OvCollyer

    Server process appears to crash randomly, on average every 10 hours

    test350 Novice

      My Slingbox 350 stops streaming at random times.  The problem is easily reproduced.  It occurs with various client computers and devices, both on the LAN and over the Internet, so I'm fairly confident the problem is not with my setup.  I'm pretty sure that this is a firmware bug, and am willing to help Sling Media engineers isolate and fix the problem.  It has plagued me since the box was new in November 2012.

       

      My environment (for the test case described below):  Slingbox 350, running 1.1.52, fed with 1280x720p60 component video from a Motorola QIP-6416 DVR.  Connected to Verizon FiOS (50 Mbps upload) in New York via stock Actiontec router.  TCP port 5201 forwarded to Slingbox.  WebSlingPlayer version 1.5.14.755 is running in Paris on a Toshiba R835 laptop under Firefox 21.0 / Windows 8 Pro.  Connected to Orange La Fibre (100 Mbps download) via stock Livebox router.  TCP Streaming is (until the error) stable at about 4 Mbps.

       

      The symptom: Video and audio are playing normally, then the sound suddenly stops and the picture freezes.  The player appears to attempt a restart, but about one minute later it aborts with a W201 error.  I have an old Slingbox PRO-HD connected to the 350's component outputs and have confirmed that there is no problem with the incoming video around the time of failure.  There is also no problem with the network: pings from Paris to New York (and vice-versa) show no packet loss.  Nor does the Slingbox OS crash or reboot -- pings over the LAN to the Slingbox IP also show no loss.

       

      I took a Wireshark capture to see what is going wrong.  Here is a summary; I can supply a capture file if desired.  Packets are coming in on the stream connection and (after about 26,000,000 captured packets) suddenly cease.  The last IP packet received properly completes a 3072-byte ASF packet and gets ACKed normally by Windows.  Let's call this time t=0.  At t=~9 (in this case), the player sends the next routine keep-alive (function 0x66) on the control connection, but that does not get ACKed.  Windows does a series of TCP retransmissions, without any response, and at t=~30 sends a RST on the control connection.  A second later, the player attempts to open a new control connection to the Slingbox, but gets a [RST, ACK] (connection refused) response, i.e. the Slingbox is not "listening" on TCP port 5201 anymore.  However, the player keeps retrying at about one second intervals.  At t=~61, the connection succeeds, i.e. the server process appears to have been restarted, and the player sends HTTP headers followed by a login (function 0x67).  Unfortunately, the player tagged the login with the old Session ID, which the "new" server knows nothing about.  The server's response, although 'success' (status code 0x00), is tagged with a new Session ID.  The player appears to dislike the mismatch and closes the new control connection.  About 100 milliseconds later, a RST comes in for the old stream connection.  I don't know whether the player would have otherwise retried further, but at this point, it gives up and displays the W201 error code.

       

      I hope that you have seen this problem before, or can reproduce it in your lab.  If not, I'd be glad to perform further tests and/or provide captures as needed.

       

      I'm also interested in hearing whether any other members have encountered this issue, and if any workaround is known.

       

      Thanks in advance for your help,

       

      Stewart

        • Re: Server process appears to crash randomly, on average every 10 hours
          OvCollyer Apprentice

          Stewart

           

          I am encountering something similar.

           

          I am streaming to Istanbul, Turkey, and I've just rented two 350s, which are connected to FiOS boxes (one is a QIP 7216, the other a QIP 7332) in New York.

           

          Both, however, are at separate locations/data centres and independent from one another.

           

          After a few hours, the picture and sound freeze and I have to disconnect and then reconnect. It appears the Slingbox is rebooting, because at first it cannot connect via TCP but sometimes succeeds with SNATT or RELAY. If I disconnect that and then try again, it goes in via TCP. Alternatively, perhaps it is the router that is rebooting?

           

          When this happens, all other connectivity is fine.

           

          This happens with both boxes.

           

          Interestingly, I also have my own 350 in London and don't encounter this issue at all when streaming from this box. It's only the NY based boxes that have this issue.

           

          It's notable that you are also streaming from New York to Europe. Perhaps a certain type or make of router is contributing to this, though I don't know what router the two rented boxes are connected to. Is there a standard FiOS-issued one or something? I may be able to find out.

           

          I haven't found a workaround though.

           

          I'd say it happens more frequently for me than 10 hours, maybe every 2-3 hours.

           

          Enough to be irritating, but not showstopping.

           

          Oliver

            • Re: Server process appears to crash randomly, on average every 10 hours
              test350 Novice

              Many thanks for your reply.  We should soon have enough information on what fails and what doesn't to track down the trigger condition.  With luck, this will enable Sling Media to reproduce and fix the problem, or it will at least suggest an easy workaround.

               

              I believe that FiOS supplies a router, usually an Actiontec, for all installations.  Although you could use a different router to just access the Internet, the Actiontec includes several proprietary functions needed for FiOS TV and Voice, so in practice everyone uses it.  Although there are various complaints on the forums, the device is very reliable and IMO does not have any shortcomings related to simple TCP port forwarding.  I just logged in and found: Model MI424-WR; H/W rev D; F/W version 4.0.16.1.56.0.10.14.4; uptime 2364 hours.

               

              I had tentatively eliminated the Actiontec as a possible culprit, because my normal viewing setup captures the stream locally with a homebrew script, which connects to the Slingbox on its 192.168.1.x private IP.  This path does not involve the router at all, yet exhibits the same failure.  But you got me thinking -- the router is still physically connected and a request from the public Internet would be forwarded.  This request might be malicious or just noise traffic.  I'm pretty certain that an outside request would not be able to authenticate to the Slingbox, but there may be a firmware bug that allows the outsider to crash it anyway.  Which external TCP ports do your units use?  Mine is on the default 5201.  I may try temporarily disabling the forwarding and see if LAN streaming still crashes.

               

              What access, if any, do you have to the LAN in your New York setups?  (Shell account, remote desktop, web hosting account, etc.)

               

              Do you have a way to get another video source connected, e.g. DVD player?  Perhaps it's some unusual timing or other signal from FiOS TV that triggers this.  What resolution(s) do you feed into the Slingboxes?  What video size(s) do you stream?  Do all the combinations that you've tried have the trouble?

               

              In my case, I'm pretty sure that the Slingbox OS does not reboot, because it responds to ping from the LAN during the trouble.  While it will not respond correctly to a TCP open, to me that indicates that the server process crashed, because there is a RST response (presumably from the OS) on the attempt.
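
A minimal sketch of how this distinction could be checked from another machine on the LAN, assuming Python 3 is available there. The function name `probe_tcp` and the three state labels are illustrative and not part of any Sling Media tool. A refused connection (an RST in reply to the SYN) suggests the host is up but nothing is listening on the port, while a timeout or other error suggests the host or the path to it is not responding.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify one TCP connect attempt.

    Returns "listening" if the connection is accepted,
    "refused" if the host answers with a reset (nothing listening),
    and "no-response" for timeouts and other network errors.
    """
    try:
        # create_connection performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers timeouts, unreachable hosts, and similar failures.
        return "no-response"
```

For example, `probe_tcp("192.168.1.50", 5201)` could be run once a second during a freeze; the address here is a placeholder for the Slingbox's private IP, and 5201 is the default port mentioned above.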

               

              When your picture freezes, if you wait up to two minutes, does the web player show an error (in my case it's W201)?  Also, if you wait two minutes after it has stopped retrying and then reconnect, do you get a TCP connection?

               

              With a great deal of effort, I've attempted to make the record script more robust.  In version 1, I detect the freeze, close the current stream file, beat on the Slingbox until it responds, discard the ASF headers, and append the new stream to the file.  You have to adjust all the send and presentation timestamps for each frame, since the Slingbox starts over at zero.  The result is a file that plays correctly, but ~67 seconds of the program are missing.  With my luck, that would be the most exciting play of the game!  So, for version 2, when the new stream is started, I send three skip-back commands on the remote, each backs up 30 seconds; the file now has ~23 seconds duplicated.  I have no idea how much work it would take to implement this for the web player.
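
The timestamp adjustment and the skip-back arithmetic described above can be sketched as follows. This is an illustration in Python, not the actual record script: the function names, the millisecond units, and the frame interval argument are all assumptions made for the example.

```python
def rebase_timestamps(new_ts_ms, last_ts_ms, frame_interval_ms):
    """Shift the timestamps of a restarted stream (which begins again at
    zero) so that they continue one frame interval after last_ts_ms,
    the final timestamp already written to the file."""
    if not new_ts_ms:
        return []
    offset = last_ts_ms + frame_interval_ms - new_ts_ms[0]
    return [t + offset for t in new_ts_ms]


def overlap_after_skip_back(gap_s, skips, skip_len_s=30):
    """Seconds of programme duplicated after sending `skips` skip-back
    commands; a negative result means that much is still missing."""
    return skips * skip_len_s - gap_s


# Three 30-second skips against a ~67-second gap leave ~23 seconds duplicated.
print(overlap_after_skip_back(67, 3))
```

The same offset would need to be applied to both the send and the presentation timestamps of each frame.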

                • Re: Server process appears to crash randomly, on average every 10 hours
                  OvCollyer Apprentice

                  The two boxes I am using use port numbers 5201 and 5211.

                   

                  I don't have access to the LANs these are on, or the router, because they are rented boxes.

                   

                  As for resolutions I haven't seen any pattern.

                   

                  Originally both FiOS boxes were outputting a component 60Hz stream and the issue occurred at 640 x 480 and above. I didn't try below.

                   

                  I then changed the config on one box to use the HDMI output feeding into an Atlona scaler to change the framerate to 50Hz and then into an HDFury to convert to component for the Slingbox. (I did this for unrelated reasons btw). The issue still occurs.

                   

                  I then hooked up an Atlona component scaler to the other box's component output, again to convert the framerate from 60Hz to 50Hz. The issue still occurs on this box.

                   

                  So it's not limited to 60Hz (my working UK Sky+ HD box outputs 50Hz) but it does seem limited to these NY-based FiOS boxes!

                   

                  I'm also now connecting via a streaming proxy, but it still occurs.

                   

                  Your observation about the box not rebooting makes sense. It just seemed to behave that way from the outside, but unlike you I cannot observe the goings-on inside the LAN.

                   

                  By the way, I also had one of the boxes replaced for a new one but that didn't change anything.

                   

                  I'm using a custom third-party app (built around the ActiveX web plugin) to view; I've not tried any other players long enough to replicate the issue. This app doesn't report the error code, unfortunately.

                   

                  Not really sure where to go from here...

                   

                  PS you asked what happens if I reconnect after two minutes. Well, the sequence of events for me is like this:

                   

                  - picture and sound freeze

                  - I wait a few seconds to be sure it's not something else, and then disconnect

                  - if I immediately reconnect it times out and says the box is not online

                  - if I try again maybe 10-20s or so later it lets me in but via SNATT or RELAY, as if it's not listening on the TCP port

                  - maybe a minute or so later it lets me connect via TCP

                  - at all times during the above I have Internet connectivity to the outside World, indeed I can connect to my UK box

                   

                  This seems consistent with the server process restarting but not immediately having started listening on its TCP port.
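
If the port state were logged at regular intervals, the length of each such outage could be measured rather than estimated. The sketch below assumes a hypothetical log of `(time_in_seconds, listening)` samples, sorted by time; the function name and the sample format are made up for illustration.

```python
def outage_windows(samples):
    """Given (time_s, listening) samples sorted by time, return a list of
    (start, end) pairs covering the periods when the port was not
    listening. `end` is None if the outage was still ongoing at the
    last sample."""
    windows = []
    start = None
    for t, listening in samples:
        if not listening and start is None:
            start = t                    # outage begins
        elif listening and start is not None:
            windows.append((start, t))   # outage ends
            start = None
    if start is not None:
        windows.append((start, None))
    return windows
```

Comparing the window lengths across several freezes would show whether the time taken to start listening again is roughly constant, which is what a restarting server process would be expected to produce.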

                    • Re: Server process appears to crash randomly, on average every 10 hours
                      test350 Novice

                      Given that you have trouble on port 5211 and that I see it on the LAN, I'm leaning away from the FiOS network environment as the cause.  I'd like to rule out that the FiOS set-top boxes occasionally have an anomaly in their video output for which the Slingbox firmware is not prepared.  I can have an alternative video source connected, e.g. an old DVD player, and see whether it still freezes.  However, I don't want to disturb my only source of US TV until after the NBA championship series.  Do you have an alternate video source on either of your FiOS-connected boxes?

                       

                      If there are no freezes with an alternate source, we could take a closer look at the FiOS video around the time of failure.  I have an ASF file analyzer that shows inter-frame times, etc., though I don't know how timing variations in the incoming component video are reflected in the PRO-HD stream time stamps.
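
As a rough illustration of the kind of inter-frame check meant here (not the analyzer mentioned above), the sketch below flags gaps that deviate from the typical frame interval. It assumes the frame timestamps have already been extracted from the ASF file into a list of milliseconds; the function name and the 50% tolerance are arbitrary choices.

```python
from statistics import median


def flag_irregular_gaps(timestamps_ms, tolerance=0.5):
    """Return (frame_index, gap_ms) for every inter-frame gap that
    differs from the median gap by more than `tolerance` (a fraction
    of the median). frame_index is the frame that ends the gap."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    if not gaps:
        return []
    typical = median(gaps)
    return [
        (i + 1, gap)
        for i, gap in enumerate(gaps)
        if abs(gap - typical) > tolerance * typical
    ]
```

Running this over the last few minutes before a freeze, and over a stretch of normal playback for comparison, would show whether the timing looks any different just before the failure.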

                       

                      If the alternate source also has trouble, I can try to have the Slingbox connected via a managed switch, so all packets to/from the box can be captured; perhaps we'll see something at the time of trouble, not directly related to the stream TCP connection.

                        • Re: Server process appears to crash randomly, on average every 10 hours
                          OvCollyer Apprentice

                          Since the boxes I am having trouble with are rented it's not practical for me to test another source.

                           

                          The guy did tell me he has over 50 customers with similar FiOS rentals using the standard FiOS-issued kit, and all previous occurrences of similar issues have been down to the customer's ISP or router; right now I'm the only one getting the issue. Not everyone would report it though, of course.

                           

                          I wonder if changing my router or PC's MTU setting would have any impact on it. Not because there is anything inherently wrong with the existing settings, but maybe it would just change something that stops whatever is causing the server process to die.

                           

                          Though of course the fact I have no issues streaming from my UK box would appear to go against it being related to my setup here.

                           

                          Unfortunately I'm rather limited in what I can do - the fact you are able to reproduce it and analyse it on your LAN puts you in a better position than me to help track it down.

                           

                          I will remote login to my Mac Mini on my London LAN and set that viewing the FiOS box and see if it experiences the issue. This might help establish if the viewing environment has a bearing.

                           

                          All assuming we are even talking about the same issue, though the symptoms do seem the same.