Limitations of site statistics
Although web server log files do keep a record of every request
the server receives, this doesn't necessarily mean you can tell exactly
how many visitors you've had or how many pages have been viewed.
One reason is caching, which refers to situations where the file a
visitor requests has already been stored somewhere closer to them than
your web server. This could be either on their own computer in their
browser's cache, or on a caching server that an Internet service
provider runs to reduce the amount of data it has to fetch from other
servers (this is less of an issue if most of your traffic is coming
from within the Boston University network).
In both cases, multiple readings of your content are registered
only once or possibly not at all. Sometimes the browser will check
to see if there's a new version of the page available, and this
results in an entry in the log file. However, not all browsers check
for an updated version of a page every time. Think about how many
times during your web browsing you use the Back button to navigate.
Unless you have set your browser's local cache size to 0, which is
unlikely, none of those requests are logged by the server; they're
all fulfilled from your own local computer.
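When a browser does check for a newer version of a page and finds none,
the server normally answers with status 304 (Not Modified), and that
exchange appears in the log as its own entry. The following is a minimal
sketch of how you might separate full responses from these revalidation
checks. It assumes your server writes Common Log Format; the sample
entries, the addresses in them, and the function name count_statuses are
made up for illustration.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line.
# Assumption: your server writes Common (or Combined) Log Format.
LOG_PATTERN = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def count_statuses(lines, path):
    """Count responses by HTTP status code for one requested path."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("path") == path:
            counts[match.group("status")] += 1
    return counts


# Made-up sample entries, for illustration only.
sample_log = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 -0400] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.10 - - [10/Oct/2023:14:02:11 -0400] "GET /index.html HTTP/1.1" 304 -',
    '198.51.100.7 - - [10/Oct/2023:14:05:42 -0400] "GET /index.html HTTP/1.1" 200 2326',
]

counts = count_statuses(sample_log, "/index.html")
print(counts["200"], "full responses;", counts["304"], "revalidation checks")
```

Neither number tells you how many times the page was actually read:
views served entirely from a cache leave no entry at all.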
The stateless nature of HTTP connections also poses problems for
accurate tracking of web site usage: the server handles each request
on its own, with no built-in link between one request and the next.
Basically, unless you require users to log in to your site with
registered user names, you can't tell much about who they are or
their behavior on your site. For
example, while you can see a total number of requests for a specific
page, you still don't know how much time users spent on that page.
You also can't tell where they went after they left your site. And
while you can get a total count of hosts that have visited, it's
possible that one user has been counted as multiple hosts because
of a dynamic IP address, or multiple users could be counted as one
if they're sharing an ISP's proxy server. Search spiders are also
counted as hosts, although you wouldn't consider them true visitors.
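To see how far the request count, the host count, and a rough "likely
human" host count can drift apart, here is a small sketch. It assumes
Combined Log Format, where the user agent is the last quoted field on
each line. The SPIDER_MARKERS list, the function name summarize_hosts,
and the sample entries are hypothetical, and the marker list is far from
complete.

```python
import re

# Assumption: Combined Log Format, with the user agent as the last quoted field.
LINE_PATTERN = re.compile(r'^(?P<host>\S+) .* "(?P<agent>[^"]*)"$')

# Hypothetical, incomplete list of substrings that often appear in the
# user-agent strings of search spiders.
SPIDER_MARKERS = ("bot", "spider", "crawl")


def summarize_hosts(lines):
    """Return (distinct hosts, distinct hosts not flagged as spiders)."""
    all_hosts = set()
    likely_human_hosts = set()
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue  # skip lines that don't fit the assumed format
        host = match.group("host")
        agent = match.group("agent").lower()
        all_hosts.add(host)
        if not any(marker in agent for marker in SPIDER_MARKERS):
            likely_human_hosts.add(host)
    return len(all_hosts), len(likely_human_hosts)


# Made-up sample entries, for illustration only.
sample_log = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 -0400] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [10/Oct/2023:13:56:02 -0400] "GET /about.html HTTP/1.1" 200 1200 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2023:14:01:17 -0400] "GET /index.html HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"',
]

total_hosts, likely_human = summarize_hosts(sample_log)
print(len(sample_log), "requests from", total_hosts, "hosts;",
      likely_human, "not flagged as spiders")
```

Even the filtered figure counts hosts rather than people, so dynamic IP
addresses and shared proxy servers can still skew it in either
direction.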
In short, site statistics reports are based on data from server
log files, and so they can only be as accurate and complete as their
source. So rather than obsess about the specific numbers in your
reports, use them to monitor general trends and to get the bigger
picture of site usage.