Collecting Web Data: A Look at Web Analytics Methodology

A few months back, I posted briefly on Script-Based Versus Log-Based Tracking, discussing the differences between various web analytics data collection methods. With more and more questions cropping up about reporting discrepancies between the two types, I felt the time was right to revisit the topic and put some key concerns to rest.

Logfile Analysis, the older of the two methods, simply counts the hits recorded in the web server logs and stores the data in an easily readable, easily manageable format. This method is based on server-side data collection; nothing is stored on the visitor’s computer, and nothing runs in their browser.
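To make that concrete, here’s a minimal sketch of what “counting hits in the server logs” amounts to. Each line of a standard Combined Log Format logfile records one request, and a log analyzer simply parses the fields back out. The field names and sample line below are illustrative, not from any particular analytics package:

```javascript
// Parse one Apache/NGINX "combined"-style log line into its fields.
// The pattern covers IP, timestamp, request method/path, status, and bytes.
const LOG_PATTERN = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)/;

function parseLogLine(line) {
  const m = LOG_PATTERN.exec(line);
  if (!m) return null; // malformed line: skip it
  return {
    ip: m[1],
    timestamp: m[2],
    method: m[3],
    path: m[4],
    status: Number(m[5]),
    bytes: m[6] === "-" ? 0 : Number(m[6]), // "-" means no body was sent
  };
}

const hit = parseLogLine(
  '203.0.113.7 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
);
```

An analyzer is essentially this, run over millions of lines, with aggregation on top. Notice that everything it knows comes from the request itself, which is exactly why caching and proxies (discussed below) can distort the numbers.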

In the late 1990s, search engine spiders were increasingly present on the web, and they made a considerable impact on the logfiles of the sites they crawled. Between that, web proxies, the growing popularity of consumer Internet service (and the subsequent rise in dynamic IP assignment), and browser caching, it became apparent that logfile analysis needed a breath of fresh air. Supplementing logfile analysis with cookie tracking and robot exclusion lists helped to solve some of the problem, but a second method was already being developed.

Page Tagging was meant to solve many of the accuracy concerns that had arisen with logfile analysis. You probably remember the iconic web counters from the mid-1990s. These were some of the first examples of client-side web traffic analysis. Eventually, this method evolved into what it is today: script-based data collection that assigns a cookie to each visitor, observes their behavior on the website, and then processes the data remotely.
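The cookie-plus-beacon pattern behind page tagging can be sketched like this. The `_vid` cookie name and `/collect` endpoint here are hypothetical stand-ins, not any particular vendor’s tag:

```javascript
// Reuse an existing "_vid" visitor cookie, or mint a new random ID.
// This is how a tag recognizes a returning visitor even behind a
// proxy or a changing dynamic IP address.
function getVisitorId(cookieString) {
  const match = /(?:^|;\s*)_vid=([^;]+)/.exec(cookieString || "");
  if (match) return match[1];
  return "v-" + Math.random().toString(36).slice(2, 10);
}

// Build the URL the tag fires as an image request. Because the request
// goes to the collector on every pageview, the hit is recorded even
// when the page itself was served from the browser cache.
function buildBeaconUrl(visitorId, page) {
  return "/collect?vid=" + encodeURIComponent(visitorId) +
         "&page=" + encodeURIComponent(page);
}

// In a browser, the tag would run roughly as:
//   const vid = getVisitorId(document.cookie);
//   document.cookie = "_vid=" + vid + "; path=/";
//   new Image().src = buildBeaconUrl(vid, location.pathname);
```

The important design point is that the data collection request is separate from the page request, which is what lets a remote service do the processing, and also why a visitor with JavaScript or cookies disabled never shows up.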

The popularity of page tagging is due in large part to this outsourcing. In many cases, giving the job to a remote service provider contributes to an ease of configuration and vastly decreased overhead. Both of these tracking methods have their own advantages and disadvantages, and when used alone, each can fail to provide the complete data-set of a website’s performance.

Let’s take a look at the methods’ disadvantages, on their own:

Logfile Analysis Disadvantages

  • Web browser caching can drastically affect the data; your web server won’t see the hit
  • Web proxies and dynamic IP addresses make it difficult to track unique visits to the site
  • Difficult to track scripting events (JavaScript, Flash, etc.)
  • Little to no information about unique visitors’ computer setup (screen resolution, plugins, etc.)
  • Higher upfront cost than a hosted solution
  • Higher IT overhead for custom setup
  • Difficult to accurately measure traffic across domains

Page Tagging Disadvantages

  • Can’t track bandwidth
  • Depending on users’ browser setups, may ignore certain visits (cookies or JavaScript disabled, older browsers)
  • Ignores search engine spiders
  • Typically very difficult to reprocess data, which means…
  • Every page must be tagged from the start; there is no way to recapture data, and…
  • You can’t use existing historical data logs, so any data collected in logs up until the point where you tagged is out of the equation
  • Your data is usually being stored on someone else’s servers
  • What happens when you decide to change vendors? Where does your data go?

So the natural assumption is that if we could just combine the two methods, we would get all of the advantages and bypass most of the drawbacks:

Logfile Analysis Advantages

  • Can track bandwidth and file downloads
  • Won’t ignore visitors without current browsers, JavaScript, cookies, etc.
  • Robot traffic is counted and can often be filtered into a separate profile for search engine inclusion analysis
  • Data is based on info from webserver logfiles and can therefore be re-processed, if necessary
  • Your data is your own, and you can take it from one analytics package to another, no problem

Page Tagging Advantages

  • Bypasses web browser caching issues and lets you see the hit, even when your web server doesn’t
  • Multiple visits from the same dynamic IP can be broken down into unique visitors thanks to cookie-based session IDs
  • Scripting events are easily tracked
  • Can view extensive information about visitors’ computer setups
  • Typically less expensive in the short term, with low monthly fees
  • More of the IT overhead is handled remotely, so there is less for your internal staff to deal with
  • Easily traces sessions across multiple domains

In fact, this is what many web analytics vendors are moving toward: hybrid data collection methods. While Google Analytics is itself a hosted page tagging solution, its predecessor, Urchin Software, uses cookie-enabled logfiles together with page tagging to give its users the best of both worlds. This allows for greater accuracy in tracking sessions across multiple domains, eliminates the caching issues, and captures detailed web design metrics. Bandwidth and search engine spider data are still available, and, if necessary, all data can be reprocessed, since it’s based on webserver logfiles. While setting up a server-side package like Urchin could potentially involve more upfront overhead, Google Analytics Authorized Consultants like ROI Revolution can help you get over the finish line quickly, and on your way to getting the most out of your web analytics data.
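The hybrid idea can be sketched in a few lines. Suppose the web server’s log format has been extended (via a custom log directive, for example) to append the page tag’s visitor cookie to each line; the `vid=` cookie key and sample lines below are illustrative assumptions, not Urchin’s actual format. Log-based hits can then still be grouped into unique visitors:

```javascript
// Extract IP and cookie-based visitor ID from a hypothetical
// cookie-enabled log line: combined format plus a trailing "vid=..." field.
function visitorFromHybridLine(line) {
  const m = /^(\S+) .*"[^"]*" \d{3} \S+ "vid=([^"]+)"$/.exec(line);
  return m ? { ip: m[1], visitorId: m[2] } : null;
}

const hits = [
  '203.0.113.7 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 512 "vid=abc123"',
  '198.51.100.2 - - [10/Oct/2000:13:56:01 -0700] "GET /about HTTP/1.0" 200 844 "vid=abc123"',
];

// Two different IPs, one cookie: counted as a single unique visitor,
// which pure logfile analysis (keyed on IP) would get wrong.
const unique = new Set(
  hits.map(h => visitorFromHybridLine(h).visitorId)
).size;
```

Because the cookie travels in the server’s own logs, the data stays on your machines, remains reprocessable, and still benefits from cookie-level visitor identification.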

If you have any questions about the methods discussed above, please reach out directly to our team.

