Tuesday, November 29, 2011

Ping!


Regularly, I see web services offer a ping(), or equivalent, operation. I hear several reasons:
1. “If I depend on the availability of another service, I’ll ask if it is available, just before I use it.”
2. “My service provider says he is available at least 90% of the time, so I will check regularly whether he keeps his promise!”
3. “My service depends on another service, so I will not make my service available if my service provider is not available.”

And thus, we see a cascade of pings arising.


Let's have a closer look at what this actually means.

1. “If I depend on the availability of another service, I’ll ask if it is available, just before I use it.”

This is brilliant. The network is known to have availability issues every now and then, especially with synchronous protocols like HTTP. So before calling a SOAP web service, let’s do a ping first. If the service doesn’t answer, we don’t call it with the real request.

Flaws:
a. There is no guarantee that after a successful ping, the real request will pass through.
b. An unsuccessful ping does not guarantee that the real request would not have passed through.

The conclusion drawn from the result of the ping can thus very well be wrong!

Solution:
Just call the service with the real request and make sure that you handle a connection error properly. This way, you never make a wrong decision based on a ping. Compared with the ping approach, there will be more successful hits, which means more business.
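A minimal sketch of this idea in Python: send the real request and treat a connection failure as the only availability signal. The name `call_service` and the `send_request` callable are hypothetical placeholders for whatever client code actually talks to the service, and which exceptions count as a connection error depends on that client.

```python
from typing import Callable, Optional


def call_service(send_request: Callable[[], str]) -> Optional[str]:
    """Send the real request; return its response, or None on a connection error.

    `send_request` is a hypothetical stand-in for the real client call.
    """
    try:
        # No ping beforehand: the real request itself tells us whether it worked.
        return send_request()
    except (ConnectionError, TimeoutError):
        # Handle the failure here (retry later, queue the work, inform the user).
        return None
```

The caller decides what a `None` result means for the business flow; the point is only that the decision is based on the real request, not on a ping.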

2. “My service provider says he is available at least 90% of the time, so I will check regularly whether he keeps his promise!”

Let’s do a ping every n minutes, and calculate the availability percentage of the provider. This is a measure of the uptime of the service provider.

Flaws:
a. There is no guarantee that between two successful pings, the service was really available.
b. An unsuccessful ping can be caused by an intermittent network issue of very short duration, but it will be counted as n minutes of downtime.

The conclusion drawn from monitoring by pinging therefore says little about the real uptime of the provider.

Solution:
Just call the service with the real request and make sure that you handle a connection error properly. It doesn’t matter if the service provider is off-air in between requests. All that matters is that at least x percent of your requests are served correctly. That’s what needs to be stated in a service level agreement. And that is what is measurable.
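As a rough illustration, the percentage of correctly served requests can be tracked with a simple counter over the real requests. The class name `RequestStats` and its methods are made up for this sketch; how a request is judged "served correctly" is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    """Counts outcomes of real requests (hypothetical helper, not a library API)."""

    succeeded: int = 0
    failed: int = 0

    def record(self, ok: bool) -> None:
        # Count one real request as either served correctly or not.
        if ok:
            self.succeeded += 1
        else:
            self.failed += 1

    def success_percentage(self) -> float:
        # Percentage of real requests that were served correctly.
        total = self.succeeded + self.failed
        if total == 0:
            raise ValueError("no requests recorded yet")
        return 100.0 * self.succeeded / total
```

With 9 requests served correctly and 1 failed, `success_percentage()` gives 90.0, which is the figure to compare against the x percent in the service level agreement.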

3. “My service depends on another service, so I will not make my service available if my service provider is not available.”

Let’s do a ping of the service provider. If he doesn’t answer, I will close down my web application so consumers can’t use this functionality. That spares people the frustration of receiving connection errors. I’ll open the web application again after a successful ping.

Flaws:
a. A failed ping is no guarantee that the service is down, so the web application will sometimes be shut down erroneously. This can result in missed business.

Solution:
Just call the service with the real request and make sure that you handle a connection error properly. You can still shut down the web application this way, for example after p unsuccessful requests. Compared with the ping “solution,” there will be more successful hits, which means more business.
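One possible way to count those p unsuccessful requests is sketched below. It assumes that the failures must be consecutive and that a single successful request resets the count; the text above does not prescribe either, and the name `FailureGate` is made up for the example.

```python
class FailureGate:
    """Signals a shutdown after p consecutive failed real requests (illustrative only)."""

    def __init__(self, p: int) -> None:
        if p < 1:
            raise ValueError("p must be at least 1")
        self.p = p
        self.consecutive_failures = 0

    def record(self, ok: bool) -> None:
        # Assumption: one successful real request resets the failure count.
        if ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    @property
    def should_close(self) -> bool:
        # True once p real requests in a row have failed.
        return self.consecutive_failures >= self.p
```

The web application would call `record()` with the outcome of every real request and check `should_close` before deciding to take the functionality offline. When and how to reopen is a separate policy decision that this sketch leaves out.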

Okay, so there can be a reason to do "proactive monitoring." After all, if the service goes down in the evening, it would be nice to notice before the first user in the morning notices it. But do you need a ping for that? The ping may check the connection to the service, but doesn't check the systems behind it. It would be far better to do a real request. The service provider has to agree that this request is recognized as a monitoring request only, and not a real consumer request.
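How such a monitoring request is marked is something consumer and provider have to agree on. As a purely hypothetical sketch, the consumer could send the normal request with an agreed marker attached; the header name `X-Monitoring-Request` and the function `build_request` are invented for this example and are not part of any standard.

```python
from typing import Any, Dict


def build_request(payload: Dict[str, Any], monitoring: bool = False) -> Dict[str, Any]:
    """Build a real request, optionally flagged as a monitoring request.

    The header name is an assumption; the actual marker must be agreed
    with the service provider.
    """
    headers: Dict[str, str] = {}
    if monitoring:
        # Lets the provider run the full request path without treating
        # the call as a real consumer request.
        headers["X-Monitoring-Request"] = "true"
    return {"headers": headers, "body": payload}
```

Because the payload is that of a real request, the check exercises the systems behind the service as well, not just the connection.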

So don't use ping operations! And especially, don't draw any statistical conclusions from them.
