Unofficial Google Analytics Blog by ROI Revolution


May 1, 2006

Collecting Web Data: A Look at Web Analytics Methodology

By Michael Harrison, Google Analytics Support Tech

A few months back, I posted briefly on Script-Based Versus Log-Based Tracking, discussing the differences between various web analytics data collection methods. With more and more questions cropping up about reporting discrepancies between the two types, I felt the time was right to revisit the topic and put some key concerns to rest.

Logfile Analysis, the older of the two methods, simply counts the hits recorded in the web server logs and stores the data in an easily readable, easily manageable format. This method is based on server-side data collection; nothing is stored on the visitor's computer, and nothing runs in their browser.
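As a rough illustration of what "counting hits in the logs" means, here is a minimal Python sketch. It assumes the server writes the Apache "combined" log format; the sample lines, the `parse_line` and `count_hits` names, and the choice to count only 2xx responses are assumptions made for this example, not a description of any particular analytics package.

```python
import re
from collections import Counter

# Pattern for the Apache "combined" log format (assumed; your server may differ).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def count_hits(lines):
    """Count successful (2xx) requests per requested path."""
    hits = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"].startswith("2"):
            hits[entry["path"]] += 1
    return hits

# Invented sample data for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/May/2006:16:45:00 -0400] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.10 - - [01/May/2006:16:45:09 -0400] "GET /pricing.html HTTP/1.1" 200 2048 "/index.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.77 - - [01/May/2006:16:46:12 -0400] "GET /missing.html HTTP/1.1" 404 312 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.5"',
    '192.0.2.77 - - [01/May/2006:16:46:30 -0400] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.5"',
]

if __name__ == "__main__":
    for path, count in count_hits(SAMPLE_LOG).most_common():
        print(path, count)
```

Note that everything here happens on the server side: a page served from a browser or proxy cache never produces a log line, so it never reaches `count_hits`.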

In the late 1990s, search engine spiders became increasingly common on the web and had a considerable impact on the logfiles of the sites they crawled. Combined with web proxies, the growth of consumer Internet service (and the resulting rise in dynamic IP assignment), and browser caching, this made it apparent that logfile analysis needed a breath of fresh air. Supplementing logfile analysis with cookie tracking and robot exclude lists helped to solve some of the problems, but a second method was already being developed.
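A robot exclude list can be sketched as a simple user-agent filter. The Python below is only an illustration: the signature list is a short, incomplete sample, and the `is_robot` and `split_traffic` names and the hit tuples are hypothetical.

```python
# Illustrative, incomplete list of user-agent substrings (an assumption for this sketch).
ROBOT_SIGNATURES = ("googlebot", "slurp", "msnbot")

def is_robot(user_agent, signatures=ROBOT_SIGNATURES):
    """Return True if the user agent contains any known robot signature."""
    agent = user_agent.lower()
    return any(signature in agent for signature in signatures)

def split_traffic(hits):
    """Split (path, user_agent) tuples into (human_hits, robot_hits)."""
    humans, robots = [], []
    for path, agent in hits:
        (robots if is_robot(agent) else humans).append((path, agent))
    return humans, robots

# Invented sample hits for illustration only.
SAMPLE_HITS = [
    ("/index.html", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
    ("/index.html", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("/pricing.html", "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"),
    ("/pricing.html", "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.5"),
]

if __name__ == "__main__":
    humans, robots = split_traffic(SAMPLE_HITS)
    print("human hits:", len(humans))
    print("robot hits:", len(robots))
```

Keeping the robot hits in their own list, instead of discarding them, leaves the door open to reporting on spider activity separately.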

Page Tagging was meant to solve many of the accuracy concerns that had arisen with logfile analysis. You probably remember the iconic web counters from the mid-90s. These were some of the first examples of client-side web traffic analysis. Eventually, this method evolved into what it is today: script-based data collection which assigns a cookie to each user, analyzes their behavior on the website, and then processes the data remotely.

The popularity of page tagging is due in large part to this outsourcing. In many cases, giving the job to a remote service provider makes configuration easier and greatly reduces overhead. Both of these tracking methods have their own advantages and disadvantages, and when used alone, each can fail to provide a complete picture of a website's performance.

Let's take a look at the methods' disadvantages, on their own:

Logfile Analysis Disadvantages
  • Web browser caching can drastically affect the data; your web server won't see the hit
  • Web proxies and dynamic IP addresses make it difficult to track unique visits to the site
  • Difficult to track scripting events (Javascript, Flash, etc.)
  • Little to no information about unique visitors' computer setup (screen resolution, plugins, etc.)
  • Higher cost upfront than hosted solution
  • Higher IT overhead for custom setup
  • Difficult to accurately measure traffic across domains

Page Tagging Disadvantages
  • Can't track bandwidth
  • Depending on users' browser setups, may ignore certain visits (cookies or Javascript disabled, older browsers)
  • Ignores search engine spiders
  • Typically very difficult to reprocess data, which means...
  • Every page must be tagged from the start; no way to recapture data, and...
  • You can't use existing historical data logs, so any data collected in logs up until the point where you tagged is out of the equation
  • Your data is being stored on someone else's servers, usually
  • What happens when you decide to change vendors? Where does your data go?

So the natural assumption is, if we could just combine the two methods, we would get all of the advantages and bypass some of the drawbacks:

Logfile Analysis Advantages
  • Can track bandwidth and file downloads
  • Won't ignore visitors without current browsers, Javascript, cookies, etc.
  • Robot traffic is counted and can oftentimes be filtered into a separate profile for search engine inclusion analysis
  • Data is based on info from webserver logfiles and can therefore be reprocessed, if necessary
  • Your data is your own and you can take it from one analytics package to another, no problem

Page Tagging Advantages
  • Bypasses web browser caching issues and lets you see the hit, even when your web server doesn't
  • Multiple visits from the same dynamic IP can be broken down into unique visitors due to cookie-based session IDs
  • Scripting events are easily tracked
  • Can view extensive information about visitor's computer setup
  • Typically less expensive for short-term--low monthly fees
  • More of the IT overhead handled remotely--less for your internal staff to deal with
  • Easily trace sessions across multiple domains

In fact, this is what many web analytics vendors are moving toward: hybrid data collection methods. While Google Analytics is itself a hosted page tagging solution, its predecessor, Urchin Software, uses cookie-enabled logfiles with page tagging to give its users the best of both worlds. This allows for more accurate tracking of sessions across multiple domains, eliminates the caching issues, and captures detailed web design metrics. Bandwidth and search engine spider data are still available, and, if necessary, all data can be reprocessed, as it's based on webserver logfiles. While setting up a server-side package like Urchin could potentially involve more upfront overhead, Google Analytics Authorized Consultants like ROI Revolution can help you get over the finish line quickly, and on your way to getting the most out of your web analytics data.
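To see why a cookie value in the log helps with the proxy and dynamic IP problems, consider this rough Python sketch. The `ip` and `visitor_id` field names and the sample records are invented for illustration; this is not Urchin's actual log format, only a toy comparison of the two counting strategies.

```python
# Hypothetical hit records; field names and values are illustrative only.
RECORDS = [
    # One visitor whose ISP assigned a new dynamic IP between hits.
    {"ip": "198.51.100.7", "visitor_id": "a1"},
    {"ip": "198.51.100.42", "visitor_id": "a1"},
    # Three different visitors sharing one web proxy address.
    {"ip": "203.0.113.5", "visitor_id": "b2"},
    {"ip": "203.0.113.5", "visitor_id": "c3"},
    {"ip": "203.0.113.5", "visitor_id": "e5"},
    # A visitor with cookies disabled: no visitor ID was recorded.
    {"ip": "192.0.2.99", "visitor_id": None},
]

def unique_by_ip(records):
    """Count unique visitors using only the IP address."""
    return len({record["ip"] for record in records})

def unique_by_cookie(records):
    """Count unique visitors by cookie ID, falling back to IP when no cookie exists."""
    keys = set()
    for record in records:
        if record["visitor_id"]:
            keys.add(("cookie", record["visitor_id"]))
        else:
            keys.add(("ip", record["ip"]))
    return len(keys)

if __name__ == "__main__":
    print("unique visitors by IP:", unique_by_ip(RECORDS))
    print("unique visitors by cookie:", unique_by_cookie(RECORDS))
```

In this sample, IP-based counting both splits one visitor in two and merges three visitors into one, while the cookie-based count keeps them straight and still retains the cookieless visitor through the IP fallback.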

If you have any questions about the methods discussed above, please drop us a line or leave a comment below.


Posted by Michael Harrison at 4:45 PM










Comments

A good analysis. I would like to nominate three more problems to page-tagging analytics: If you forget to tag a page, you're SOL. And, the data doesn't start until you start tagging (so those ten years of server logs are useless to a page-tagging solution without the hybrid option.) And if the analytics vendor goes down, you lose the data you would have collected.

Robbin

Posted by: Robbin Steif at May 1, 2006 11:03 PM

Thanks for the addenda, Robbin. I've added your very keen nominations to the list.

Posted by: Michael Harrison at May 4, 2006 2:43 PM
