Maybe you’ve experienced this before. You are minding your own business without a care in the world when all of a sudden the phone rings.
You: Hello?
Them: There’s a problem and we think the network is the cause. Can you check it?
You: Check what? The network?
Them: Yes.
You: Which part? What am I looking for?
Them: Any sort of problem.
(Fast forward an hour or so later)
You: Well, I ran a packet capture on the switch port connecting to system XYZ. I see a bunch of TCP resets coming from your server.
Them: Okay. We’ll take a look.
(Fast forward another half hour or so)
Them: It looks like we found the problem. Process blah-blah-blah was failing due to a dependency on process ha-ha-ha. We reset the services and everything is working again. Thanks for your help.
You: Okay. Not a problem.
(Back to life as before)
Sound familiar? If you have been in networking for more than a couple of years, this should evoke all kinds of warm and fuzzy memories. Meals were missed. Plans were canceled. Sleep was lost. All in the name of defending the network’s honor. Oh yes. This is the part about a career in networking that is conveniently left out of the brochure you are given before signing your life over to Cisco/Juniper/Citrix/Aruba/Nortel/F5/Brocade/Alcatel/etc.
I have seen more than my fair share of these incidents. With the exception of a brief stint in consulting and about 2 years doing things in the US military that you’ll never do anywhere else, I have lived my entire IT existence in the “corporate” setting. By that I mean chained to a desk looking over logs and configurations. Slaving away on the same network for years on end. Getting to know the lay of the land in the same way one knows all the sounds an old car or house makes. In short, after you work on a certain network long enough, you can see into the guts of it like Neo can with the Matrix.
If you are like me, you have a certain affinity towards your network. Sure, it may need some help with cabling or a cleaner route table, but you work with what you have. You make changes as you can. You replace hardware as the budget allows. You care for it like a farmer does his corn fields. Is this creeping you out yet? Well it shouldn’t. There are plenty of people out there who love their networks even to the point of showing them off to the world.
Here’s the problem with being a networking engineer/administrator/architect/designer/janitor. You have to understand everyone else’s piece of the pie, but not too many people have to understand yours. Fair? No, it isn’t, but as an officer I once worked for in the military told me: “That’s a burden you have to bear.” He was right, even if I didn’t like hearing it. That is not to say that all other entities within IT or greater corporate America are completely clueless when it comes to networks. Quite the contrary. There are plenty of systems people who understand networks very well. You can give them an IP with a classless subnet mask and they don’t even bat an eye because they know exactly what you mean when you say it’s a slash 26 network. However, when it comes to “applications” people, my experience has been that they only have to know their piece of the pie and can conveniently blame the network when a problem arises. I know what you’re thinking. Did he just paint all applications people with a broad brush? Yes. Yes I did. Of course, if you happen to be an applications person, I meant everyone else. Not you. 😉
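For anyone on the other side of that slash 26 conversation, here is a minimal sketch of what the shorthand unpacks to, using Python's standard ipaddress module. The 192.168.10.0/26 network and the describe_network helper are made up for illustration, and the usable-host arithmetic assumes an IPv4 prefix shorter than /31.

```python
import ipaddress


def describe_network(cidr):
    """Expand CIDR shorthand into the numbers a systems person needs.

    Hypothetical helper for illustration; assumes an IPv4 prefix shorter than /31.
    """
    net = ipaddress.ip_network(cidr)
    return {
        "netmask": str(net.netmask),
        "total_addresses": net.num_addresses,
        # The network and broadcast addresses are not assignable to hosts.
        "usable_hosts": net.num_addresses - 2,
        "first_host": str(net.network_address + 1),
        "last_host": str(net.broadcast_address - 1),
        "broadcast": str(net.broadcast_address),
    }


if __name__ == "__main__":
    for key, value in describe_network("192.168.10.0/26").items():
        print(f"{key}: {value}")
```

A /26 leaves 6 host bits, so it works out to a 255.255.255.192 mask, 64 addresses, and 62 usable hosts.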
That brings me to the title of this mini-rant/post. You can plead your case before everyone telling them that it probably isn’t the network, but they’re not going to believe you. Why? A lack of understanding or a lack of visibility into your world. You see, the network is just a big murky box to them. Maybe if they had access to some monitoring platforms they could be swayed, but unless your monitoring package can go down to the transaction level like Compuware’s Vantage product, you’re still going to have some explaining to do. However, in a way that I cannot begin to explain, people tend to believe packet captures. Don’t ask me why. I can tell you until I am blue in the face that the switches and routers on the network, for the most part, couldn’t care less what your payload is, and you won’t believe me. You may not even understand TCP, UDP, and the rest of the acronym soup being tossed around, but for some reason, Wireshark or tcpdump results are more credible than Stephen Hawking discussing time travel. If you want some good laughs around things like this, follow this guy on Twitter. He seems to deal with this on a regular basis and has some hilarious tweets to show for it.
Let me end this post with the following suggestions:
1. Get familiar with interpreting packet captures. Wireshark is the most well known packet capture utility for Windows boxes out there. There’s even a good book out there that covers everything in detail. You’ll also need to know about TCP and how it works. There are other protocols like UDP and ICMP that will be good to know, but TCP is by far the most useful protocol to know and understand when dealing with packet captures. For some good info on TCP, see here.
2. Don’t be afraid to run a packet capture early on in the troubleshooting process. I am finding that this tends to solve the problem when all other methods fail.
3. Don’t EVER, and I stress EVER, state emphatically that there is no way possible that the network is at fault. 99 times out of 100 you may be right. Get it wrong 1 time, and everyone will be gunning for you. There’s always the possibility that the network is at fault. Even when everything you know is telling you that it isn’t the network, if you don’t have a packet capture to back it up, you’re wasting your time.
4. Educate your co-workers about the network, or networking in general. Try to do this without condescension. Nobody wants to listen to Nick Burns tell them how stupid they are. The more people know, the less likely they are to hurl unsubstantiated accusations your way that you are manipulating traffic to break their application. It makes every organization a lot stronger when education is provided from the various departments. Please understand that although you and I might get excited when talking about routing protocols, not everyone else will. Oh how I wish my wife and I could have the EIGRP vs OSPF discussion, but it’s just not going to happen. Some people are not going to want to know a whole lot about the network, so try and figure out how much they really want to know and tailor the education to that level.
If nothing else, looking at a bunch of packet captures will help you appreciate what is going on behind the scenes every time you read an e-mail message or look at a website. Although other people might not appreciate it, I find that it helps my wife fall asleep faster when I talk about the various TCP flags and why they are used in data transmissions. At least she will never blame the network. 🙂
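To make the TCP flags a little less abstract, here is a minimal sketch of how the flag byte in a TCP header decodes, and how you might tally resets per source the way I did in the phone call story. It uses only the Python standard library. The build_header and count_resets helpers, the sample addresses, and the (source IP, header bytes) input format are all made up for illustration. In real life Wireshark or tcpdump does this decoding for you.

```python
import struct
from collections import Counter

# Flag bits in the TCP header's flags byte, lowest bit first.
TCP_FLAGS = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]

# Fixed 20-byte TCP header: ports, sequence, ack, data offset, flags,
# window, checksum, urgent pointer (network byte order).
TCP_HEADER = struct.Struct("!HHIIBBHHH")


def decode_tcp_flags(tcp_header):
    """Return the names of the flags set in a raw TCP header."""
    if len(tcp_header) < TCP_HEADER.size:
        raise ValueError("need at least 20 bytes of TCP header")
    flags = TCP_HEADER.unpack(tcp_header[:TCP_HEADER.size])[5]
    return [name for bit, name in enumerate(TCP_FLAGS) if flags & (1 << bit)]


def build_header(src_port, dst_port, flags):
    """Hypothetical helper: pack a bare TCP header for the demo below."""
    data_offset = 5 << 4  # 5 x 32-bit words, no options
    return TCP_HEADER.pack(src_port, dst_port, 0, 0, data_offset, flags, 65535, 0, 0)


def count_resets(packets):
    """Count RST segments per source IP from (src_ip, tcp_header) pairs."""
    resets = Counter()
    for src_ip, header in packets:
        if "RST" in decode_tcp_flags(header):
            resets[src_ip] += 1
    return resets


if __name__ == "__main__":
    sample = [
        ("10.0.0.9", build_header(50000, 443, 0x02)),  # SYN from the client
        ("10.0.0.5", build_header(443, 50000, 0x14)),  # RST+ACK from the server
        ("10.0.0.5", build_header(443, 50001, 0x04)),  # bare RST from the server
    ]
    for src_ip, count in count_resets(sample).items():
        print(f"{src_ip} sent {count} TCP resets")
```

A pile of resets coming from one address does not prove what is wrong with that system, but it does tell everyone where to look first.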
Curse your eyes, I should have written that myself. Well done, bravo. Should be some kind of a class based on this for network admin n00bs.
I have also learned over the years to never assume it isn’t the network. Usually it isn’t. But this could be the time you’ve got the odd interface throwing away packets or a sup engine acting up. Going all Nick Burns never helps, agreed. I’ve made great strides by working closely with the admins, hearing out their symptoms, and brainstorming WITH them about what the potential causes are. If I demonstrate that I’m open to the root cause being a network issue, and I’m transparent about what I’m doing to troubleshoot, they tend to be more open with me about what they are considering and checking.
It doesn’t hurt to have a little server background. That way, when the apache guys are talking about the new mod they loaded last night or maybe how they got a little more restrictive with the ciphers they’ll talk to a client with (whatever), you can tune in and bounce around how that issue might be related to the problem being experienced. Credibility and openness are so very key, though. If they know they can trust you to own up when it’s your (a network) problem and not just CYA, the process goes a whole lot better. And then the next time it’s even better than that.
You bring up an interesting point about having “a little server background”. I’ve been mulling over the concept of technical backgrounds for a while now. I might even do a post on it. I’ve always enjoyed working with former server, generalist, and telecom people because they always have a different approach to how they look at things. Someone with a server background brings tremendous value to application issues, whereas an old telecom engineer can really shine on service provider issues.
As for going all Nick Burns on someone, I am reminded of a book that came out a few years back entitled Network Warrior. The last chapter of that book is entitled “How Not To Be A Computer Jerk”. Although the entire book is very well done, I think I enjoyed that last chapter the most.
Great post! I learned the hard way – “it’s not the network”. Better to prove it with packet captures! The bad side to my job is that either way I am the one that would have to look into it since I handle both network and servers 😉