Weird Networking Issue

TushonTushon I'm scared, Coach Alexandria, VA Icrontian
edited June 2013 in Hardware
I've troubleshot this rather exhaustively and don't know what else to try. The first symptom was a loss of connectivity to pleaseignore.com and all sub-domains. This is a problem because the comms for EVE, jabber (a chat client/announce tool), forums, wiki, etc. all rely on that domain being accessible. I've tried two different routers; left to their own devices, both work only intermittently (more failure than success). Hard resetting either one helps temporarily, but then it reverts to failing. It's not a DNS issue, since a ping or tracert resolves the name to an address under Time Warner DNS, Google DNS, and OpenDNS. I've had intermittent, variable-length success with this trick: plug my computer directly into the modem, reboot the modem, successfully navigate to pleaseignore.com and the other sub-services, then plug back into the router (and the router back into the modem), after which it continues to work. The last time I did this, it worked for a couple of weeks. Yesterday it stopped working, and the trick only helped for 30 minutes before it failed to connect again.

Any ideas or troubleshooting I can do will be gladly accepted. All other internet access seems to work normally, though there are very occasional failures to load some content on the first try (e.g. a youtube timeout, which loads normally on refresh).

Routers: Asus RT-N66U, latest firmware; Cisco E2000 w/ DD-WRT (tried several versions)
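
The "resolves fine but won't connect" distinction described above can be checked in one step. The following is a minimal sketch, not anything the poster ran: the function name `check_host` is made up for illustration, and port 443 is an assumption, since the post does not say which ports the affected services use.

```python
import socket


def check_host(hostname, port=443, timeout=5.0):
    """Return (resolved_ip, reachable).

    resolved_ip is None when the name does not resolve at all (a DNS problem).
    reachable is True only when a TCP connection to resolved_ip:port succeeds.
    The default port 443 is an assumption for illustration.
    """
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        # The name could not be resolved, so this would be a DNS issue.
        return None, False
    try:
        # Name resolved; now see whether the host actually accepts a connection.
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        # Resolved but refused or timed out: the symptom described in the post.
        return ip, False
```

Running `check_host("pleaseignore.com")` once behind the router and once plugged directly into the modem would show whether the address resolves in both cases while the connection succeeds in only one.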

Comments

  • ardichokeardichoke Icrontian
    A traceroute taken while the problem is occurring would help identify it. The problem may well be on the TEST infrastructure side, since (from your post) that appears to be the only thing that consistently has problems. Then again, your ISP might be screwing with something (or just have a bad hop).
  • TushonTushon I'm scared, Coach Alexandria, VA Icrontian
    I've tried to work with TEST ITbros, and they are aware of certain ISPs having routing issues with their infrastructure, but the odd part to me is that it works, at least temporarily, when plugged directly into the modem. I have working and non-working traces available at the links below:
    https://mega.co.nz/#F!fZ5UzIZZ!LDEuaBIRfmk73-9ETeFGdw
    https://mega.co.nz/#F!2JolBLSY!GhUNqRAe9yoT5rfvVy7h9w

    Those are both behind a router (I assume it was the same one but I don't remember) and include multiple traces to top level and sub-domains.
  • Are you behind a proxy? That could be the issue given ping works fine but your browsers/other apps have problems resolving.
    Also, I would try hardcoding the domain in your hosts file. Then your browser/apps won't even have to generate a dns query to your router and/or dns servers.

    I'd find it hard to believe that the routers involved in picking your route to the endpoint are so frequently choosing bad routes that will drop off and timeout. I'd be more inclined to guess there is something funky on your pc.

    Do you have another computer behind your router(s) that exhibits the same issue?
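
For anyone wanting to try the hosts-file suggestion above, a rough sketch of what the entry looks like follows. The helper names are made up, the Windows path is the usual default rather than something stated in the thread, and `192.0.2.10` is a documentation placeholder, not the real address of pleaseignore.com. Hosts files do not support wildcards, so every sub-domain would need its own line.

```python
import platform


def hosts_path(system=None):
    """Return the usual hosts-file location for the given OS name.

    The Windows path assumes a default install under C:\\Windows.
    """
    system = system or platform.system()
    if system == "Windows":
        return r"C:\Windows\System32\drivers\etc\hosts"
    return "/etc/hosts"


def hosts_entry(ip, *hostnames):
    """Format one hosts-file line: the address, a tab, then the names."""
    return f"{ip}\t{' '.join(hostnames)}"


# 192.0.2.10 is a placeholder address (assumption); substitute the address
# that ping or tracert actually resolves for the domain.
EXAMPLE_LINE = hosts_entry("192.0.2.10", "pleaseignore.com")
```

The resulting line would be appended by hand (with administrator rights) to the file at `hosts_path()`, and removed again if the site ever changes address.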
  • ardichokeardichoke Icrontian
    Well, at least with the traceroutes you listed, the problem is happening at this hop:

    22 167 ms 211 ms 192 ms hos-tr3.ex3k8.rz16.hetzner.de [213.239.223.201]

    From the look of it, that is the last route before traffic goes into the TEST server, so it's either a problem with the TEST server rejecting the traffic or with the TEST hosting provider dropping it instead of passing traffic on to the server.

    Also, it's not that uncommon to have routers pick bad hops. We see it all the time at work (especially with certain ISPs that favor lower cost but less reliable routes for traffic in some locations). If traceroutes continue to fail at that point for you when you're having issues, you need to pester TEST IT because it's something that they or their hosting provider have to resolve.

    In other words, there's not much you can do besides pester the people that can do something as it doesn't appear (from the limited information available) to be a problem on your end, but rather on TEST's end.
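
The reading of the traceroute above (find the last hop that still answers) can be automated across many saved traces. This is only a sketch that assumes Windows `tracert` text output; `last_responding_hop` is a made-up name, hop 22 in the sample is the line quoted in this thread, and the two timed-out hops after it are illustrative, not copied from the poster's files.

```python
import re

# A hop line starts with the hop number, followed by three probe columns.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
# One successful probe looks like "167 ms" or "<1 ms".
LATENCY_RE = re.compile(r"<?\d+\s+ms\b")


def last_responding_hop(tracert_output):
    """Return (hop_number, host_text) for the last hop with any reply, or None."""
    last = None
    for line in tracert_output.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header, blank, or "Trace complete." lines
        rest = match.group(2)
        if not LATENCY_RE.search(rest):
            continue  # every probe timed out at this hop
        # Whatever follows the final latency column is the host name/address.
        host = LATENCY_RE.split(rest)[-1].strip()
        last = (int(match.group(1)), host)
    return last


# Hop 22 is quoted from the thread; hops 23 and 24 are illustrative timeouts.
SAMPLE = """\
 22   167 ms   211 ms   192 ms  hos-tr3.ex3k8.rz16.hetzner.de [213.239.223.201]
 23     *        *        *     Request timed out.
 24     *        *        *     Request timed out.
"""

print(last_responding_hop(SAMPLE))
```

If every failing trace reports the same last hop, that supports the point that the drop happens at the hand-off into the hosting provider rather than somewhere that varies.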
  • TushonTushon I'm scared, Coach Alexandria, VA Icrontian
    edited June 2013

    Are you behind a proxy?

    Also, I would try hardcoding the domain in your hosts file. Then your browser/apps won't even have to generate a dns query to your router and/or dns servers.

    Do you have another computer behind your router(s) that exhibits the same issue?

    In order:
    No, unless there is some virus which installed one that has escaped the detection of myself and multiple antivirus apps and exhibits no other symptoms. :D

    I can try that, but I don't believe it'll fix it, since DNS resolution isn't the problem. That also forces me to update manually if they change IP for whatever reason.

    Yes, my phone exhibits the identical issue when behind the router, and works perfectly when on the cell network.
    ardichoke said:

    Well, at least with the traceroutes you listed, the problem is happening at this hop:

    22 167 ms 211 ms 192 ms hos-tr3.ex3k8.rz16.hetzner.de [213.239.223.201]

    From the look of it, that is the last route before traffic goes into the TEST server, so it's either a problem with the TEST server rejecting the traffic or with the TEST hosting provider dropping it instead of passing traffic on to the server.

    Also, it's not that uncommon to have routers pick bad hops. We see it all the time at work (especially with certain ISPs that favor lower cost but less reliable routes for traffic in some locations). If traceroutes continue to fail at that point for you when you're having issues, you need to pester TEST IT because it's something that they or their hosting provider have to resolve.

    In other words, there's not much you can do besides pester the people that can do something as it doesn't appear (from the limited information available) to be a problem on your end, but rather on TEST's end.

    That was my impression as well (that something is wrong with either Hetzner [their hosting provider] or TEST IT itself). I tried to insinuate this as nicely as possible, but the couple of dev responses were quite non-committal. I would guess a significant part of that is because my second post in the thread (after the first dev response) said that it was working ... anyway, it does always fail at the same point (where it appears that Hetzner is passing off to the TEST servers, as you stated).

    I've bumped the thread probably 6 times over the last 3-4 weeks hoping to arouse some response but they've also been very busy with some really large loads against the servers starting with the CFC reset and subsequent PL/CFC/NC. "totally not an invasion".
  • Bummer, is there some network wizardry in which case you can deny routes that contain hetzner.de and request a new route if it exists?
  • TushonTushon I'm scared, Coach Alexandria, VA Icrontian

    Bummer, is there some network wizardry in which case you can deny routes that contain hetzner.de and request a new route if it exists?

    You're asking if I can deny routes to the TEST hosting provider? I don't think that would work since it has to go
    my local network > TWC > internet > hetzner > TEST IT services
  • I meant to ask if there was another hop before TEST IT, an alternative to hetzner, and if you could build a static route based on such an alternative.
  • ardichokeardichoke Icrontian
    I'm pretty sure there is no way around hetzner. TEST only has a few servers (last I checked) which are hosted at hetzner. Pretty sure that hetzner hop is inside the datacenter and all traffic to the TEST servers has to go through it. If I had to guess, I'd say the problem is MOST LIKELY on the TEST IT side. They're probably employing some firewall rules or DoS prevention that isn't behaving the way they expect it to but can't be assed to admit it/look at it.
  • TushonTushon I'm scared, Coach Alexandria, VA Icrontian
    I see. I'm not sure, but I doubt it :/. Is it very common for hosting providers to offer that sort of routing?
  • ardichokeardichoke Icrontian
    Tushon said:

    I see. I'm not sure, but I doubt it :/. Is it very common for hosting providers to offer that sort of routing?

    No, especially not with commodity hosting providers (i.e., ones where anyone can just rent a server). Many have redundant routes, and if one fails the other will take over, but it's highly unlikely they would be willing to go in and add custom rules for people (unless they are a massive customer), as it would quickly add an unreasonable amount of complexity to managing their network infrastructure.
  • Well it sounds completely crappy, but it could be that the only thing you can do is find some sort of proxy that isn't subject to TEST IT's rules. But then you have to deal with the bleh of a proxy.
  • TushonTushon I'm scared, Coach Alexandria, VA Icrontian
    Welp, thanks for the confirmation of what I had seen/troubleshooting I had already done. Even though it's most likely an issue somewhere in their infrastructure, a la eccentric rules of some forgotten use, I would like to blame TWC since ... it's time warner. Precisely last in customer service! :D

    Till next time, boys!
  • TushonTushon I'm scared, Coach Alexandria, VA Icrontian
    One thing I have noticed since then:
    So the RT-N66U has a status screen for things like firewall packet notifications. With default settings (firewall on), I get drop notifications from a weird IP address when attempting to navigate to TEST services. If I disable the hardware firewall, it no longer logs the messages, but that does not fix the issue (I re-enabled it, in case you are wondering). Perhaps that will jog something. I would also be happy to let someone poke around via remote access if desired.
  • That did give me one new idea:
    I think it would be interesting (or at least provide some answer) to see whether, during a period when you cannot access the servers, you can get a new IP from your ISP, then test again from the new IP. If it works, and the first, second, and third octets of your new WAN address are the same as those of your old address, then it would seem likely that TEST at least temporarily banned your specific IP, not the class it was in.
    If you do get back in with a new IP, maybe you can try to better guess what causes it and/or somehow find a way to easily get a new IP from the ISP for future issues.
  • TushonTushon I'm scared, Coach Alexandria, VA Icrontian
    Mmm I was trying to say that my router's firewall appeared to be dropping incoming packets rather than something on TEST's side. It works perfectly when I'm plugged in to the modem directly for any length of time and then works for a variable (i.e. 24 hours or less, so far) length of time after replugging in router before ceasing to work ...
  • Right, I get your router is dropping incoming packets. Wouldn't it be possible that:

    When you plug your computer into the modem directly, you are using a new MAC address and you get a new IP from the ISP. You go back to your router, and the ISP may give you your old IP back because you are using the old MAC address again and they haven't refreshed their routing table yet.

    The incoming packets your router drops may be some service TEST uses to ping you or something, I don't know. Did you see what port they were requesting? But if your router blocks them, AND if they were something TEST was using to check you out, then the fact that disabling the hardware firewall didn't help doesn't rule this out: you might have a temporary ban from TEST that you have to wait out, and you don't know what their rule/timer is set to. So you don't know how long you would have to leave the router firewall disabled if that is the case.

    Anyway, it would be worth checking what your WAN IP is when you have a working connection on your modem AND what your WAN IP is when you have a non-working connection with your router being used. If it is the same IP, you can rule my theory out and move on to the next.

  • Or you can just say, Dan you suck at networking. Get off my lawn! <3
  • TushonTushon I'm scared, Coach Alexandria, VA Icrontian
    Good call. I will make sure to check external IP in both instances.
  • TushonTushon I'm scared, Coach Alexandria, VA Icrontian
    Roommates weren't home yesterday, so here is my external IP when connected directly to the modem
    76.184.148.48

    and here is modem > router > switch > switch > me (believe me, I tried eliminating the switches as well)
    76.184.137.245

    Though I'm not sure what, if anything, that tells me, since it is working right now in the second setup. FUCK YOU OBSCURE AND NIGH UNDIAGNOSABLE ISSUES.
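
The octet comparison suggested earlier in the thread can be applied to the two addresses just posted. This is a small sketch using Python's standard `ipaddress` module; `same_prefix` is a made-up helper name, and treating "same first three octets" as "same /24 block" is an interpretation of that suggestion, not something TEST or the ISP has confirmed.

```python
import ipaddress


def same_prefix(addr_a, addr_b, prefix_len=24):
    """Return True when both IPv4 addresses share the first prefix_len bits."""
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network


DIRECT = "76.184.148.48"    # connected directly to the modem (from the thread)
ROUTED = "76.184.137.245"   # modem > router > switch > switch (from the thread)

print(same_prefix(DIRECT, ROUTED, 24))  # first three octets equal? -> False
print(same_prefix(DIRECT, ROUTED, 16))  # first two octets equal?   -> True
```

The third octets differ (148 versus 137), so the two addresses are not in the same /24. Because the connection was working in both setups when these were recorded, the comparison only becomes informative once an address is captured during a failure.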
  • Straight_ManStraight_Man Geeky, in my own way Naples, FL Icrontian
    Is the router's WAN interface set to get its IP via DHCP? If not, please set it that way. The router can also be plugged into a non-bridged modem, in which case it will get its IP from the modem if the modem is behaving right. Tell the router to get its IP from the modem, with 76.184.148.48 as the gateway for the router.

    That's IF it breaks again.