Collecting Web Data: A Look at Web Analytics Methodology

A few months back, I posted briefly on Script-Based Versus Log-Based Tracking, discussing the differences between various web analytics data collection methods. With more and more questions cropping up about reporting discrepancies between the two types, I felt the time was right to revisit the topic and put some key concerns to rest.

Logfile Analysis, the older of the two methods, simply counts the hits recorded in the web server logs and stores the data in an easily readable, easily manageable format. This method is based on server-side data collection; nothing is stored on the visitor’s computer, and nothing runs in their browser.
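To make that concrete, here’s a minimal sketch of what “counting hits in the server logs” amounts to. Each line of a standard Combined Log Format logfile records one request, and a log analyzer simply parses the fields back out. The field names and sample line below are illustrative, not from any particular analytics package:

```javascript
// Parse one Apache/NGINX "combined"-style log line into its fields.
// The pattern covers IP, timestamp, request method/path, status, and bytes.
const LOG_PATTERN = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)/;

function parseLogLine(line) {
  const m = LOG_PATTERN.exec(line);
  if (!m) return null; // malformed line: skip it
  return {
    ip: m[1],
    timestamp: m[2],
    method: m[3],
    path: m[4],
    status: Number(m[5]),
    bytes: m[6] === "-" ? 0 : Number(m[6]), // "-" means no body was sent
  };
}

const hit = parseLogLine(
  '203.0.113.7 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
);
```

An analyzer is essentially this, run over millions of lines, with aggregation on top. Notice that everything it knows comes from the request itself, which is exactly why caching and proxies (discussed below) can distort the numbers.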

In the late 1990s, search engine spiders were increasingly present on the web, and they made a considerable impact on the logfiles of the sites they crawled. Between that, web proxies, the growing popularity of consumer Internet service (and the subsequent rise in dynamic IP assignment), and browser caching, it became apparent that logfile analysis needed a breath of fresh air. Supplementing logfile analysis with cookie tracking and robot exclusion lists helped to solve some of the problem, but a second method was already being developed.

Page Tagging was meant to solve many of the accuracy concerns that had arisen with logfile analysis. You probably remember the iconic web counters from the mid-1990s. These were some of the first examples of client-side web traffic analysis. Eventually, this method evolved into what it is today: script-based data collection that assigns a cookie to each visitor, observes their behavior on the website, and then processes the data remotely.
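The cookie-plus-beacon pattern behind page tagging can be sketched like this. The `_vid` cookie name and `/collect` endpoint here are hypothetical stand-ins, not any particular vendor’s tag:

```javascript
// Reuse an existing "_vid" visitor cookie, or mint a new random ID.
// This is how a tag recognizes a returning visitor even behind a
// proxy or a changing dynamic IP address.
function getVisitorId(cookieString) {
  const match = /(?:^|;\s*)_vid=([^;]+)/.exec(cookieString || "");
  if (match) return match[1];
  return "v-" + Math.random().toString(36).slice(2, 10);
}

// Build the URL the tag fires as an image request. Because the request
// goes to the collector on every pageview, the hit is recorded even
// when the page itself was served from the browser cache.
function buildBeaconUrl(visitorId, page) {
  return "/collect?vid=" + encodeURIComponent(visitorId) +
         "&page=" + encodeURIComponent(page);
}

// In a browser, the tag would run roughly as:
//   const vid = getVisitorId(document.cookie);
//   document.cookie = "_vid=" + vid + "; path=/";
//   new Image().src = buildBeaconUrl(vid, location.pathname);
```

The important design point is that the data collection request is separate from the page request, which is what lets a remote service do the processing, and also why a visitor with JavaScript or cookies disabled never shows up.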

The popularity of page tagging is due in large part to this outsourcing. In many cases, giving the job to a remote service provider contributes to an ease of configuration and vastly decreased overhead. Both of these tracking methods have their own advantages and disadvantages, and when used alone, each can fail to provide the complete data-set of a website’s performance.

Let’s take a look at the methods’ disadvantages, on their own:

Logfile Analysis Disadvantages

  • Web browser caching can drastically affect the data; your web server won’t see the hit
  • Web proxies and dynamic IP addresses make it difficult to track unique visits to the site
  • Difficult to track scripting events (JavaScript, Flash, etc.)
  • Little to no information about unique visitors’ computer setup (screen resolution, plugins, etc.)
  • Higher upfront cost than a hosted solution
  • Higher IT overhead for custom setup
  • Difficult to accurately measure traffic across domains

Page Tagging Disadvantages

  • Can’t track bandwidth
  • Depending on users’ browser setups, may ignore certain visits (cookies or JavaScript disabled, older browsers)
  • Ignores search engine spiders
  • Typically very difficult to reprocess data, which means…
  • Every page must be tagged from the start; there is no way to recapture data, and…
  • You can’t use existing historical data logs, so any data collected in logs up until the point where you tagged is out of the equation
  • Your data is usually being stored on someone else’s servers
  • What happens when you decide to change vendors? Where does your data go?

So the natural assumption is that if we could just combine the two methods, we would get all of the advantages and bypass most of the drawbacks:

Logfile Analysis Advantages

  • Can track bandwidth and file downloads
  • Won’t ignore visitors without current browsers, JavaScript, cookies, etc.
  • Robot traffic is counted and can often be filtered into a separate profile for search engine inclusion analysis
  • Data is based on info from webserver logfiles and can therefore be re-processed, if necessary
  • Your data is your own, and you can take it from one analytics package to another, no problem

Page Tagging Advantages

  • Bypasses web browser caching issues and lets you see the hit, even when your web server doesn’t
  • Multiple visits from the same dynamic IP can be broken down into unique visitors thanks to cookie-based session IDs
  • Scripting events are easily tracked
  • Can view extensive information about visitors’ computer setups
  • Typically less expensive in the short term, with low monthly fees
  • More of the IT overhead is handled remotely, so there is less for your internal staff to deal with
  • Easily traces sessions across multiple domains

In fact, this is what many web analytics vendors are moving toward: hybrid data collection methods. While Google Analytics is itself a hosted page tagging solution, its predecessor, Urchin Software, uses cookie-enabled logfiles together with page tagging to give its users the best of both worlds. This allows for greater accuracy in tracking sessions across multiple domains, eliminates the caching issues, and captures detailed web design metrics. Bandwidth and search engine spider data are still available, and, if necessary, all data can be reprocessed, since it’s based on webserver logfiles. While setting up a server-side package like Urchin could potentially involve more upfront overhead, Google Analytics Authorized Consultants like ROI Revolution can help you get over the finish line quickly, and on your way to getting the most out of your web analytics data.
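The hybrid idea can be sketched in a few lines. Suppose the web server’s log format has been extended (via a custom log directive, for example) to append the page tag’s visitor cookie to each line; the `vid=` cookie key and sample lines below are illustrative assumptions, not Urchin’s actual format. Log-based hits can then still be grouped into unique visitors:

```javascript
// Extract IP and cookie-based visitor ID from a hypothetical
// cookie-enabled log line: combined format plus a trailing "vid=..." field.
function visitorFromHybridLine(line) {
  const m = /^(\S+) .*"[^"]*" \d{3} \S+ "vid=([^"]+)"$/.exec(line);
  return m ? { ip: m[1], visitorId: m[2] } : null;
}

const hits = [
  '203.0.113.7 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 512 "vid=abc123"',
  '198.51.100.2 - - [10/Oct/2000:13:56:01 -0700] "GET /about HTTP/1.0" 200 844 "vid=abc123"',
];

// Two different IPs, one cookie: counted as a single unique visitor,
// which pure logfile analysis (keyed on IP) would get wrong.
const unique = new Set(
  hits.map(h => visitorFromHybridLine(h).visitorId)
).size;
```

Because the cookie travels in the server’s own logs, the data stays on your machines, remains reprocessable, and still benefits from cookie-level visitor identification.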

If you have any questions about the methods discussed above, please reach out directly to our team.

