How To Report A Problem

From wiki.outages.org

Part of keeping the signal-to-noise ratio on the outages mailing list as high as possible -- important since many people have that list forwarded to a cellphone for fastest response time -- is knowing how to report an outage on the list, or to ask for help in determining what might be wrong, and where.

This page has been developed to provide some hints for you, as a potential reporter, about what you can do before posting a report, so as to improve the outages-list experience for everyone.

If you're really motivated, you might also want to read How To Ask Questions The Smart Way, by Eric Raymond and Rick Moen, which will explain to you in more depth why it's so important to do your homework before making use of 'free' resources like our lists, and How To Report Bugs Effectively, by PuTTY author Simon Tatham, which will give you more insight into these matters, albeit from a more softwary perspective.

Why you might need the list

Well, because multiple reports on an outage make it easier to triangulate what's going on and where, mostly. Or, at the least, because you're not sure what you're seeing; you're just pretty sure it's not right.

Types of problems

A "problem with the Internet" can be an outage, impaired transmission, inaccessible sites, sites which don't load completely or correctly, or many other things. Different problems present different symptoms, and -- just like doctors -- network engineers need to know as many symptoms as possible to make a reasonable diagnosis, even if the diagnosis is just picking from a list in their head of known outages to figure out that one of them applies to your report.

What can cause problems

Stuff breaking.  :-) All seriousness aside, there are about a billion and 6 'moving' parts between you and some other computer you're trying to get to: DNS records can get broken; DNS servers can quit and be insufficiently backed up (or you might just time out trying to get an address from a different one); routers and switches can fail; copper cable and connectors can die; optical cables can tangle with their natural enemy--the backhoe--and lose.

And, as someone said (I think it was someone at Renesys):

"The Internet is the only human endeavour in which a single character typographical error in a file on a server *on the other side of the planet* can cause your entire network to come crashing to the ground."

Usually, that's BGP prefix hijacking, but it can be other things as well; we haven't yet seen Libya accidentally go off the air and take bit.ly down with it... but I'm waiting for it to happen.
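
With that many moving parts, one cheap first step is to find out whether the name even resolves before blaming the path. What follows is only a rough sketch, assuming Python 3 and a TCP service on the far end; the function name classify_reachability, the default port 443, and the wording of the messages are all made up for illustration and are not anything the list requires.

```python
import socket


def classify_reachability(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Say whether a failure looks like name resolution or the connection itself.

    Hypothetical helper: only the first address returned by the resolver is tried.
    """
    try:
        # Step 1: name resolution (a numeric IP passes straight through).
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as err:
        return f"DNS: could not resolve {host} ({err})"

    family, socktype, proto, _canonname, sockaddr = addresses[0]
    try:
        # Step 2: a plain TCP connect to the first resolved address.
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError as err:
        # Covers refusals, unreachable networks, and timeouts alike.
        return f"connect: {sockaddr[0]} port {port} failed ({err})"

    return f"ok: {host} resolved to {sockaddr[0]} and accepted a TCP connection on port {port}"
```

A "DNS:" result points at resolvers or records; a "connect:" result says the name resolved and the trouble is somewhere further along, which is where traceroute and friends come in.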

What we need to know before we can help

What's your Chief Complaint

It's pretty important to characterize precisely what you're seeing that caused you to post in the first place, especially if you're asking for advice. What do we mean by characterize precisely?

  • What are you seeing that made you report it?
    • Physicians call this the 'chief complaint', or 'presenting symptoms'.
  • What did you expect to see?
    • Sometimes, this is obvious from the phrasing of the first item, but not always. Reread it and make sure.
  • Where are you looking from?
    • Source IP addresses are best. If you don't want to give that out, at least include a city and carrier, e.g. "I'm connected to FPL Fibernet in Cocoa Beach"
  • What are you looking with?
    • There are many network diagnosis tools available for figuring out what's wrong with your connection; we list a lot of them on our XYZ page. traceroute (along with its cousins tcptraceroute and mtr) is probably the most common, though you do have to know how to interpret the results.
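
If it helps to have the four questions above in front of you while writing a post, they can be sketched as a small fill-in-the-blanks structure. This is only an illustration, assuming Python 3.7 or later; the class name OutageReport, its field names, and the output layout are invented here and are not a format the list requires. The sample values are made up too.

```python
from dataclasses import dataclass


@dataclass
class OutageReport:
    """Hypothetical container for the four items in the list above."""
    chief_complaint: str  # what you are seeing
    expected: str         # what you expected to see
    vantage_point: str    # source IP, or at least city and carrier
    tools: str            # what you looked with (traceroute, mtr, ...)

    def as_text(self) -> str:
        """Render the report as plain text, refusing to skip any item."""
        rows = [
            ("Seeing", self.chief_complaint),
            ("Expected", self.expected),
            ("Looking from", self.vantage_point),
            ("Looking with", self.tools),
        ]
        missing = [label for label, value in rows if not value.strip()]
        if missing:
            raise ValueError("fill in before posting: " + ", ".join(missing))
        return "\n".join(f"{label}: {value.strip()}" for label, value in rows)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    report = OutageReport(
        chief_complaint="traceroute to a partner site stops after my carrier's edge",
        expected="a complete path to the partner's router",
        vantage_point="FPL Fibernet in Cocoa Beach",
        tools="traceroute -n",
    )
    print(report.as_text())
```

The point of raising an error on an empty field is simply that a report missing any of the four items is the kind that tends to draw follow-up questions instead of answers.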

How do you connect?

As noted above, it's hard to tell exactly what's wrong if we don't know, fairly precisely, where you're looking from. While we don't need a physical address (city's usually enough, or a building, if you're in a "lit" one), your source IP address will usually help a lot.

You might not want to (or be permitted to) provide one, and that's OK, though you have to understand that it may impact your ability to receive useful answers. At least knowing what carrier you're getting your uplink from, though, is pretty much a must.

"A Road Runner Business (or residential) cablemodem in St Pete, FL" is usually enough.

On the other hand, if the problem is "my users can't connect to me" or the ever-popular "we've turned up a new block of IP addresses and they don't seem to be reachable", then you pretty much *have to* provide an IP address at which we can aim tests, even if you have to set up a machine to do it.

How do they connect?

If you're talking (or failing to talk) to something at the other end that is not a major public website (a business partner or branch office, for example), knowing how the other end is connected (to the same degree of precision as above) is also important. We can't always tell just from doing our own traceroute: routing problems and prefix theft can cause packets to go entirely awry.

What's your evidence?

This is probably the most important part.

To boil down "Smart Questions", mentioned above (you didn't read it, did you? Go back and read at least the first couple of sections. It's important. I'll wait.  :-), you're asking people who (usually) aren't even associated with the carrier or target you're having trouble with to help you diagnose it.

We don't especially mind doing some of that work for free, if we have the time, but the easier you make the job, the more likely we'll help.

You've got to do your homework, and give your posting a little thought. More to the point, you have to be seen to have done your homework.

If you're having trouble reaching a site, do a traceroute, and post the results. Explain, as suggested above, what you expected, and what you actually got.
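
For what it's worth, capturing a traceroute in a form you can paste into a post can be sketched as below. This is a rough example, assuming Python 3.9 or later on a Unix-like machine with traceroute or mtr installed; the function names build_trace_command and run_trace and the 120-second timeout are invented for illustration. The -n flag (numeric output, no reverse DNS) and mtr's --report and --report-cycles options are the only tool options used.

```python
import shutil
import subprocess


def build_trace_command(target: str, tool: str = "traceroute") -> list[str]:
    """Return the argument list for a numeric-only trace to `target`."""
    if tool == "traceroute":
        return ["traceroute", "-n", target]
    if tool == "mtr":
        # Report mode runs a fixed number of cycles and then prints a summary.
        return ["mtr", "--report", "--report-cycles", "10", "-n", target]
    raise ValueError(f"unsupported tool: {tool}")


def run_trace(target: str, tool: str = "traceroute", timeout: int = 120) -> str:
    """Run the trace and return its output, or a note saying why it could not run."""
    cmd = build_trace_command(target, tool)
    if shutil.which(cmd[0]) is None:
        return f"{cmd[0]} is not installed on this machine"
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"{' '.join(cmd)} did not finish within {timeout} seconds"
    # Keep the exact command next to the output so readers can repeat the test.
    return f"$ {' '.join(cmd)}\n{done.stdout}{done.stderr}"
```

Calling print(run_trace("example.com")) gives you the command line and its output together. Paste both, unedited, next to your description of what you expected.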

Most importantly--especially if your connection is not to something big like Google or Yahoo--say whether the problem is new, whether you've had similar problems in the past, and whether any part of your connection is new, or has been recently changed. Fingers stirring the braaaaaainz of devices in the path are probably the #1 or #2 cause of problems.

What do we get out of this?

Well, as usual, different things for different people. If I scratch your back today, you may scratch mine next week. Everyone learns something, even those who are contributing answers.

And, importantly: if there was some problem you or your carrier had, and fixed, please let us know what happened, if for no other reason than for our mailing list archives. Letting us know when you find out what the problem was is part of your payment for the "free" help you get.