The Complete Internet Marketer
How Web Analytics Works Under The Hood
by Jay Neuman

This article is an excerpt from: The Complete Internet Marketer: A Practical Guide To Everything
You Need To Know About Marketing Online


Web Analytics is the software used to measure activity on your website.  In this article, you will
learn what Web Analytics is and how to use it.  Web Analytics comes in two varieties, log file
parsing and page tagging.  You will learn how both work under the hood.


The ABC’s Of Log File Parsing

Log file parsing software dominated the Web Analytics market from 1995 through 2000.  During this
period, all of the fundamental website metrics were defined in terms of information that can be
obtained from web server logs.  The terminology developed at that time focused on the technical
aspects of serving web pages and tracking user sessions.  This technological (versus business)
focus set the tone for Web Analytics jargon, which continues to this day.

Web servers collect information about visits to your website in log files.  These are called web
server logs.  Every time a visitor enters the website an entry is recorded in the log file.  Every time
a link is clicked an entry is recorded in the log file.  The web server log provides you with a history
of every click that happens on the site.  

In its raw form, the log file is a big text file.  Reading it directly does not tell you much.  That is
where Web Analytics comes in.  The software will slice up the log file into discrete pieces of
meaningful information and store them in a database.  Once the key information is in the database, the
software is able to analyze it to identify patterns in the data and generate reports.  The process of
slicing up a text file into meaningful chunks of information is called parsing.
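
As a rough illustration of parsing, the sketch below splits one log entry into named fields.  It
assumes the server writes the widely used Apache "combined" log format.  The sample entry, the
LOG_PATTERN name and the parse_line helper are made up for illustration and are not taken from any
particular Web Analytics product.

```python
import re

# Assumed layout: Apache "combined" log format. Other servers may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


# Hypothetical sample entry, invented for this example.
SAMPLE = (
    '203.0.113.7 - - [10/Oct/2005:13:55:36 -0700] '
    '"GET /products/index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/links.html" "Mozilla/4.08 [en] (Win98; I)"'
)

entry = parse_line(SAMPLE)
print(entry["ip"], entry["path"], entry["status"], entry["referrer"])
```

Each dictionary produced this way corresponds to one row that the software could store in its
database.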

The chief advantage of log file parsing, over page tagging, is in its ability to accurately report on
website diagnostics.  Diagnostic information, such as failed page loads, is found in the web server
logs.  Page tagging does not have access to it.
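
A simple diagnostic report of this kind might be sketched as follows.  The example assumes the
log has already been parsed into (path, status) pairs and treats any HTTP status code of 400 or
above as a failed request.  The sample data and the failed_requests helper are hypothetical.

```python
from collections import Counter

# Hypothetical (path, status) pairs, as might be extracted from a server log.
requests = [
    ("/index.html", 200),
    ("/old-page.html", 404),
    ("/images/logo.gif", 200),
    ("/old-page.html", 404),
    ("/checkout", 500),
]


def failed_requests(records):
    """Count requests per path whose status code signals an error (400 or above)."""
    return Counter(path for path, status in records if status >= 400)


failures = failed_requests(requests)
for path, count in failures.most_common():
    print(path, count)
```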

Since it is reading information from your web server logs, log file parsing software must be installed
on-site.


Key Data Collected in the Web Server Log

The first step to understanding how log file parsing works is to look at the information that is
collected by your web server.  Figure 13.1 shows the most important information collected by your
web server.  These are the building blocks of your reports.  



With these basic pieces of information, the Web Analytics software is able to calculate a tremendous
amount of information about your website visitors and their activity on the website.  Some of the
most important things log file parsing allows you to accomplish are:

Provide basic site diagnostics, to keep your website functioning properly
Identify the sources and volume of web traffic, to target marketing efforts
Identify usage patterns, to optimize site content
Measure click-thru from marketing efforts, to increase ROI


Problems with Log File Parsing

Log file parsing provides a tremendous amount of information to support your Internet Marketing program.  However, there are some challenges that limit the effectiveness of this method.  The four
biggest issues are as follows.

1.        Dynamically Assigned IP Addresses

The first problem companies had with log file parsing was difficulty identifying unique users.  The
original method for identifying a user on the website was to use the IP address of the user’s
computer.  However, most people are not connected directly to the Internet.  They connect to the
Internet through an Internet Service Provider (ISP) such as AOL.  So the IP address is coming from
AOL, not from the actual user.  Additionally, the ISP may assign IP addresses dynamically, so the
same user can show up under different addresses, in some cases even from one click to the next.
This is good for the ISP, but it makes it very difficult to identify unique visitors on the website.

There was a solution to this problem.  It was to make user cookies a standard practice for
websites.  When a user enters the website, a cookie is placed on his computer.  Then, with each
subsequent click on the site, the web log records both the IP address and the cookie.  That allows
all of the hits during the visit to be associated with that specific user.  However, this method still
does not work for people who have disabled cookies on their computers.  For these cases, the
imperfect method of using IP addresses is all that can be done using the log file information.
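
The cookie-first, IP-address-fallback approach described above can be sketched in a few lines.
This is only an illustration: the visitor_key helper, the field names and the sample hits are
assumptions, not the method of any specific product.

```python
def visitor_key(hit):
    """Prefer the cookie ID; fall back to the IP address when no cookie was sent."""
    cookie = hit.get("cookie")
    if cookie:
        return "cookie:" + cookie
    return "ip:" + hit["ip"]


# Hypothetical hits, already parsed from the log.
hits = [
    {"ip": "198.51.100.4", "cookie": "abc123"},
    {"ip": "198.51.100.9", "cookie": "abc123"},  # same visitor, new IP address
    {"ip": "198.51.100.9", "cookie": None},      # cookies disabled
]

visitors = {visitor_key(h) for h in hits}
print(len(visitors))
```

The first two hits are grouped under one visitor despite the changed IP address, while the third
can only be identified by its IP address.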

2.        Page Caching

The second problem encountered with log file parsing would prove to be even more difficult than the
first.  As soon as a company’s website becomes popular, its web servers start experiencing
performance drains.  During periods of high traffic, this means customers may have to wait
a long time for pages to be displayed, because the server is also processing the requests of many
other customers at the same time.  In order to optimize the performance of their websites,
companies started saving copies of pages being served in temporary storage, called a
cache.  That way, if the same page is requested again, it will be served from the cache.  This results
in tremendous performance gains, which both improve user experience and save money.  So page
caching quickly became an indispensable practice.  However, when a page is served from the
cache rather than by the web server, no entry is recorded in the log file.  Therefore, it is
impossible to accurately record site visits when page caching is being used.

3.        Outsourcing Web Analytics

The third challenge confronting companies using log file parsing was the desire to have another
company perform their analytics for them.  Web analytics is a somewhat technical endeavor.  Not
all companies are able to dedicate in-house staff to it.  On the other hand, it is a fairly
straightforward process that could easily be done by an outside vendor.  However, with log file
parsing, the software must directly access the web server logs to work.  That means the software
must be installed in-house on the company’s web servers.  This makes it difficult to outsource.

4.        Measuring Business Objectives

The fourth problem companies had using log file parsing was that it is difficult to directly measure
whether you are meeting your business objectives online.  The log file records which pages are
being served.  It does not necessarily tell you what the customer was doing while they were on that
page.  For example, you can measure whether a sale took place on the website, by checking to see
if a confirmation page was served.  But it is difficult to tell what they actually bought, or how much
they spent.  That information is not typically recorded in the log file.


The ABC’s Of Page Tagging

The second method of performing Web Analytics, page tagging, became the method of choice for
marketers after 2001.  Companies were still reeling from the recession that followed the Dot-Com
crash.  Many were looking for a pay-as-you-go outsourcing solution for their Web Analytics.  
Businesses were also learning how to tie website activity more directly to their marketing
objectives.  They wanted a solution that reported marketing results rather than just the technical
activity on the website.

Page tagging allows companies to overcome the challenges experienced with log file parsing.  With
page tagging, you identify all of the actions you want to measure on the website.  Then you put a
small piece of programming code (usually JavaScript) on every page where those actions occur.
This is called tagging the page.  When an identified action occurs, the tag will send a message to
the Web Analytics software, which records the action in a database.  As with log file parsing,
analytics is then performed on information in the database to report on key site metrics.
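
To make the message concrete: one common approach is for the tag to request a small URL from the
analytics server, with the details of the action encoded in the query string.  The sketch below
shows only the receiving side, written in Python.  The /collect address and the parameter names
(event, page, visitor, value) are hypothetical and will differ from vendor to vendor.

```python
from urllib.parse import urlsplit, parse_qs


def decode_beacon(url):
    """Turn a tracking request URL into a flat record ready to store."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per parameter; keep the first value of each.
    return {name: values[0] for name, values in query.items()}


# Hypothetical request produced by a page tag when an item is added to the cart.
beacon = (
    "https://analytics.example.com/collect"
    "?event=add_to_cart&page=%2Fproducts%2Fwidget&visitor=abc123&value=19.95"
)

record = decode_beacon(beacon)
print(record)
```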

Page tagging is typically offered as an outsourced solution.


Going Beyond Log File Parsing

Page tagging has some significant advantages over log file parsing.  For these reasons it has
become the method of choice for companies who are using Web Analytics as a strategic tool to
measure and increase the profitability of their Internet Marketing programs.  Page tagging
overcomes three of the four major challenges faced by log file parsing.  Identifying unique users
still relies on cookies being enabled on the user’s computer.

1.        Overcomes Page Caching Limitations

With page tagging, the action is recorded by programming code on the web page itself.  When the
web page loads on the user’s computer, the script file runs and records the identified actions.  This
allows companies to overcome the problem of caching web pages.  Whether a page is served from
the web server or the cache, it will still be recorded when it is loaded by the user’s browser.

Nevertheless, this method also has its drawbacks.  The data collected by page tagging depends on
the user’s browser running the script file contained in the page tag.  This will fail for some
percentage of users on the website.  Those users will then be missing from the reported site metrics.
Those users whose computers do run the page tag scripts, though, will be recorded accurately.  So,
even though there is missing data in the report, the trends reported will generally still be accurate.

2.        Enables Outsourcing

Just as important as overcoming the caching limitation is the ability to outsource Web Analytics.
Page tagging sends information over the Internet to the Web Analytics software.  One of the great
things about the Internet is that the software can be literally anywhere in the world.  That means
Web Analytics can be added to your company’s website without installing any software at all.
You just need to put the tags on your website and direct the output to your Web Analytics vendor.
Their software will process the information and provide all the reports for you.

3.        Measures Business Objectives

Since page tagging records actions occurring while a user is viewing the web page, and not just the
log file entry recorded when the page loads, this method is able to capture more information about
the user’s visit.  You can capture information entered into forms contained on the web page as well
as data pulled from a database into the page view.  Examples of the information you can
record with page tagging include:

Responses submitted in online forms
Items put into the shopping cart
Actions taking place within a Flash content element
Behavior occurring within a page view, such as scrolling down or accessing an onsite utility


Problems with Page Tagging

It would be nice if there were a perfect world of clean data.  Unfortunately, there are always
tradeoffs.  As with log file parsing, there are also shortcomings to page tagging.

The biggest shortcoming of log file parsing is caused by the source of information used to generate
reports.  Analytics is limited by what is captured in the web server logs.  In the same way, the
shortcomings of page tagging are also caused by its source of data.  Page tagging only records
information sent from the user’s browser once a page loads.  There are two significant drawbacks:

1.        Missing Visits

The first drawback to page tagging was already discussed.  It relies on information captured by a
script file running while the page is active on the user’s computer.  Therefore, it will be missing data
from users with browsers that fail to run the script file.  

2.        Unable to Run Site Diagnostics

A second, and more significant, problem with page tagging is the inability to run certain site
diagnostics.  Page tagging can only report successful page loads for computers that successfully run
the script file contained in the tag.  Therefore, it is unable to record failed requests, such as broken
links.  It is also unable to provide the complete picture of site traffic provided by the web server
logs.

Because of this drawback, it is not uncommon for companies to set up a basic log file parsing
solution to measure site diagnostics, while using page tagging to measure their business objectives.


Website Traffic Metrics

You now know how the two methods of Web Analytics work.  These methods both start with basic
data coming from a user’s visit on your website.  That data is then assembled into meaningful
information that can be compiled into reports measuring the success of your website.  The only
thing remaining to understand how Web Analytics works is to see what the basic building blocks of a
web traffic report are.  We conclude this chapter with a brief overview of the basic metrics used to
create Web Analytics reports.  In the next chapter, we will take a look at how these building blocks
can be assembled to create your website usage reports.

1.        Hit

A hit is the very first metric used to measure website activity.  It is also the simplest metric to
calculate.  A hit is simply one entry in the web server log.  In the very first websites, each web page
might be no more than a simple HTML page with text on it.  In this simple page, there are no
images or other files associated with the web page.  So each web page has only one single entry in
the log.  That translates into one hit for each page viewed on the website.

That quickly changed.  Today, there are very few web pages that contain nothing except HTML code
and text.  As we’ve seen above, you may have pictures, graphic images, movies or other media on
a single web page.  Each one of these will record a separate entry in the log file.  Therefore, each
time a page is viewed, there will be many “hits” recorded in the log.  For this reason, a hit is not
really a useful metric any longer.

2.        Page View

A page view is one complete web page loaded in a user’s browser.  In the web server log, a page
view consists of the HTML file for the web page plus all the graphics and other files
associated with that page.  A page view is made up of one or more hits.
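
The difference between hits and page views can be shown with a small sketch.  It assumes that
page views can be recognized by file extension, which is a simplification, and the list of
extensions and the sample requests are hypothetical.

```python
import os

# File extensions treated as pages; an assumption that varies from site to site.
PAGE_EXTENSIONS = {".html", ".htm", ".php", ".asp", ""}


def is_page(path):
    """True if the requested path looks like a page rather than a supporting file."""
    extension = os.path.splitext(path)[1].lower()
    return extension in PAGE_EXTENSIONS


# Hypothetical requests recorded while a visitor viewed two pages.
requested = [
    "/index.html",
    "/images/logo.gif",
    "/css/site.css",
    "/images/banner.jpg",
    "/about.html",
    "/images/logo.gif",
]

hits = len(requested)
page_views = sum(1 for path in requested if is_page(path))
print(hits, page_views)
```

Here six log entries (hits) amount to only two page views.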

3.        Visit / Session

The words visit and session are used interchangeably.  They refer to all of the pages viewed by a
single user at one sitting.  The session is identified by finding all of the hits for a given user that
occur within a specified period of time of each other.  Typically, a half hour is used as the cutoff.
In other words, a session is calculated by stringing together all of the hits for a given user, where
each hit occurs no more than 30 minutes after the one immediately before it.  The result is a
complete session.
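
The 30-minute rule can be sketched as follows.  This is a minimal illustration for a single
visitor whose hits have already been identified.  The split_sessions helper and the sample times
are assumptions made for the example.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def split_sessions(timestamps, timeout=SESSION_TIMEOUT):
    """Group one visitor's hit times into sessions using an inactivity cutoff."""
    sessions = []
    for moment in sorted(timestamps):
        # Start a new session on the first hit or after a gap longer than the cutoff.
        if not sessions or moment - sessions[-1][-1] > timeout:
            sessions.append([moment])
        else:
            sessions[-1].append(moment)
    return sessions


# Hypothetical hit times for a single visitor.
hit_times = [
    datetime(2005, 10, 10, 9, 0),
    datetime(2005, 10, 10, 9, 10),
    datetime(2005, 10, 10, 9, 35),
    datetime(2005, 10, 10, 11, 0),
    datetime(2005, 10, 10, 11, 5),
]

sessions = split_sessions(hit_times)
print([len(s) for s in sessions])
```

The first three hits are each within 30 minutes of the previous one and form one session.  The
85-minute gap before the fourth hit starts a second session.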

4.        Unique Visitor

A unique visitor is a visitor to the website who can be uniquely identified.  That way if the same
visitor returns multiple times, you can measure his activity over time.  Unique visitors are typically
identified by the user cookie.  As discussed above, the older method of using the IP address is not a
reliable method for measuring unique visitors.  It is possible that a unique visitor can actually be
multiple persons.  In the case when a family or multiple employees at a company are using the
same computer, they will all have the same user cookie.

5.        Authenticated User

If the user is required to log in to the website at the start of the visit, they become an authenticated
user.

6.        Referring URL

The referring URL is the web page where the link that sent a visitor to your website is located.  If
the user types your URL directly into her web browser, she will have no referring URL.  Such visitors
are sometimes called walk-ins.
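
A report of traffic sources might be built along these lines.  The sketch groups visits by the
host name of the referring URL and labels visits without one as "direct".  Treating a logged "-"
as a missing referrer is an assumption about how the server writes its log, and the sample URLs
are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def referrer_source(referrer):
    """Return the referring host, or 'direct' when no referring URL was recorded."""
    # Many servers log a missing referrer as "-"; treating it as empty is an assumption.
    if not referrer or referrer == "-":
        return "direct"
    return urlsplit(referrer).netloc.lower()


# Hypothetical referring URLs, one per visit.
referrers = [
    "http://www.example.com/links.html",
    "-",
    "http://search.example.org/results?q=widgets",
    "http://www.example.com/reviews.html",
    "",
]

sources = Counter(referrer_source(r) for r in referrers)
print(sources.most_common())
```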

7.        Entry Page

The first page in a unique visit is called the entry page.  

8.        Exit Page

The last page in a unique visit is called the exit page.
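
Once hits have been grouped into visits, entry and exit pages are simply the first and last page
of each visit.  The short sketch below tallies them across several visits.  The sample visits are
hypothetical.

```python
from collections import Counter

# Hypothetical visits, each a time-ordered list of the pages viewed.
visits = [
    ["/index.html", "/products.html", "/cart.html"],
    ["/products.html", "/index.html"],
    ["/index.html"],
]

# Skip empty visits so that indexing the first and last page is always safe.
entry_pages = Counter(pages[0] for pages in visits if pages)
exit_pages = Counter(pages[-1] for pages in visits if pages)
print(entry_pages.most_common(1), exit_pages.most_common(1))
```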


Web Analytics solutions come in many varieties.  There are solutions for small businesses that
provide basic reporting at a low cost.  There are also solutions for large businesses that provide
in-depth, customized reporting and analysis at a much higher cost.  Whatever size business you have,
there is a Web Analytics solution for you.


==========================
This article is an excerpt from The Complete Internet Marketer: A Practical Guide To Everything You
Need To Know About Marketing Online by Jay Neuman.

Since 1994, Jay Neuman has been helping businesses as varied as Fortune 500 companies, startup
Dot-Coms and nonprofit organizations overcome their Internet Marketing and Database Marketing
challenges.  

Jay is currently Sole Proprietor of the KnExT Consulting Group, www.knextconsulting.com.
He can be reached at jay.neuman@knextconsulting.com