Stats the way to do it
Summary: Statistics, properly audited and interpreted, allow you to establish hundreds of useful pieces of information about your customers after careful filtering from the enormous mass of information that is the logfile.
It is a classic paradox of the new economy that a medium that arose from the ground up, offering free, democratic access to information for all, also provides companies and governments with an unparalleled opportunity to play at Big Brother. Putting these issues aside, though, internet statistics are the single most vital aid to those responsible for websites. Statistics, properly audited and interpreted, allow you to establish hundreds of useful pieces of information about your customers after careful filtering from the enormous mass of information that is the logfile.
The web is the ultimate weapon for person-to-person marketing because it can be so easily tailored to individuals, and in the last couple of years we’ve seen sites develop their customisation enormously. Amazon.co.uk, always the benchmark for B2C, offers you ‘People who bought X also bought Y, Z, A and B’: highly advanced person-to-person marketing. Pharmaceutical industry sites have huge potential, since health is the principal topic on which people search, but it is vital to create the kind of experience that builds trust in the patient. As the internet becomes the principal source of health advice, you must be confident that anyone coming to your site can find what they need quickly and easily, without errors, failed links, pages that won’t load because the user doesn’t have the right software, or sites that simply don’t work on older machines. Fortunately, everything that happens on your site is logged, so you can build up a very good idea of your users’ needs if you know where to look.
Why are statistics important?
Statistics can never be fully accurate: without retinal scanning of individual users it will remain impossible to determine whether the same person is looking at a site from different machines at home, at work and in an internet café.
However, the importance of statistics is growing as the medium of the internet matures. Chrysalis, the media conglomerate, wrote an open letter to the top 500 UK B2C sites in February of this year calling for them to “Publish a recognised industry audit by September 1st or be named as a company that is not prepared to help this platform mature.” Frustration with the accuracy of statistics is commonplace among advertisers, and cases of figures being massaged are numerous. The most famous is that of the entertainment site e-district: the AIM-listed company had contrived to falsify key statistics in the run-up to its listing, and the fraud was missed by both PriceWaterhouse and WestLB Panmure in their rush to get the company to market. It is doubly frustrating for advertisers that exaggerations like this exist when web traffic can be measured far more accurately than TV or radio audiences.
No law currently forces independent verification explicitly, although it is implicit in the Financial Services Act, and there is no official body to ensure that audits are independent. ABCe, whose homepage is an excellent example of how easily statistical information can be gleaned from a viewer, is the new media arm of ABC, the UK’s foremost media auditing organisation. ABCe is a member of the International Federation of Audit Bureaux of Circulations (IFABC) and rigorously applies internationally agreed standards to all of its own UK and Ireland audits. JICWEBS (the Joint Industry Committee for Web Standards in the UK and Ireland) is another media industry body that offers independent auditing services, but the costs of large-scale regular auditing are often prohibitive for smaller internet companies, especially now that money is no longer washing around the industry.
Fortunately for advertisers and site owners alike, a number of packages are available that enable you to verify the advertising information given by sites, or to check the progress of your own site against the information received from the hosting or design company.
Definitions
It is important first to get to grips with definitions. Four terms are often bandied around by sites touting for advertising or by companies offering design or hosting services. We’ll look at them individually.
Hits
Hits are the individual requests a server answers in order to render a single web page completely. The page document itself, the various images on the page, any other media files embedded there: each of these items represents a separate hit. In other words, the more GIFs (images) used in a page, the higher the hit count, so while hits may be a good indication of poor page design, they won't tell you much about traffic. In the early days of the internet, salesmen would often tout hits as being synonymous with users, whether through poor technical understanding or a deliberate aim to mislead. In these more enlightened times hits are a redundant measurement and should be disregarded. As an example, a company could place a series of plain white 1px by 1px images down the sides of an 800 x 600 web page; every access to that page would then generate a hit for each of these images in addition to the genuine ones.
Page Impression
Page impressions are also known as page views. According to the IFABC Global Web Standards (www.ifabc.org), a page impression is "A file or a combination of files sent to a user as a result of that user's request being received by the server." In effect, one request by a valid user should result in one page impression being counted. In most cases, a single request from a user causes the server to send several files to satisfy the request.
For example, the server may send an .html file followed by several associated graphic images and audio files. A single request from a user may also cause the server to send several additional .html files to build a frameset. The site must ensure that all additional, non-requested files are filtered out and excluded when counting the claimed number of page impressions. Framesets (where a page as seen by the viewer is actually made up of several pages, each in its own frame) make this less reliable, but it is certainly a better unit of measurement than hits. As an example of the usefulness of page impressions, imagine getting statistics that tell you your site has had 10,000 users in a month and 12,000 page impressions. The substantial number of users is highly gratifying, but the fact that they are looking at only 1.2 pages each suggests that your content is not as compelling as you would wish.
Visits
Visits, sometimes known as user sessions, are defined by the IFABC as "A series of one or more page impressions, served to one user, which ends when there is a gap of 30 minutes or more between successive page impressions for that user." A visit is effectively a near-continuous burst of activity by a valid user. Visits are determined by counting bursts of activity (page impressions) made by valid unique users who have not been active on the site within the past 30 minutes. Visits are a better indicator of total site activity than unique users, since they indicate frequency of use.
Unique Users
Unique users are the holy grail for site owners: a huge number of visits could mean a huge number of unique users, but by the same token it could mean the same single person visiting the site a myriad of times. According to the IFABC Global Web Standards, a unique user is "An IP address plus a further identifier. Sites may use user agent, cookie and/or registration ID."
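The difference between the four measures is easier to see when they are computed side by side. The following is a minimal sketch in Python, not an audited counting method. It assumes the requests have already been read from a log as (IP address, user agent, time, path) tuples, treats only .htm/.html files and directory paths as pages, takes IP address plus user agent as the unique-user key, and makes no attempt to filter frameset pages. The function names and the sample data are invented for illustration.

```python
from datetime import datetime, timedelta

# Assumption: only these extensions count as pages; images and media are hits only.
PAGE_EXTENSIONS = (".htm", ".html")
VISIT_GAP = timedelta(minutes=30)  # the gap quoted in the definition of a visit


def is_page(path):
    """True if the request is for a page rather than an embedded file."""
    return path.endswith("/") or path.lower().endswith(PAGE_EXTENSIONS)


def summarise(requests):
    """requests: iterable of (ip, user_agent, timestamp, path) tuples."""
    requests = sorted(requests, key=lambda r: r[2])  # oldest first
    hits = len(requests)  # every request is a hit
    page_impressions = 0
    visits = 0
    last_seen = {}  # (ip, user_agent) -> time of that user's last page impression
    for ip, agent, when, path in requests:
        if not is_page(path):
            continue  # a hit, but not a page impression
        page_impressions += 1
        user = (ip, agent)  # IP address plus a further identifier
        previous = last_seen.get(user)
        if previous is None or when - previous >= VISIT_GAP:
            visits += 1  # first sighting, or back after 30 minutes or more
        last_seen[user] = when
    return {
        "hits": hits,
        "page_impressions": page_impressions,
        "visits": visits,
        "unique_users": len(last_seen),  # users who viewed at least one page
    }


# Made-up sample: one user loads a page with two images, reads a second page,
# then returns 45 minutes later; a second user views a single page.
t0 = datetime(2001, 5, 9, 13, 0, 0)
IE = "MSIE 5.0"
NS = "Netscape 4.7"
sample = [
    ("10.0.0.1", IE, t0, "/index.html"),
    ("10.0.0.1", IE, t0, "/logo.gif"),
    ("10.0.0.1", IE, t0, "/banner.gif"),
    ("10.0.0.1", IE, t0 + timedelta(minutes=5), "/about.htm"),
    ("10.0.0.1", IE, t0 + timedelta(minutes=50), "/index.html"),
    ("10.0.0.2", NS, t0 + timedelta(minutes=10), "/index.html"),
]
print(summarise(sample))
```

On this made-up sample the sketch counts 6 hits, 4 page impressions, 3 visits and 2 unique users: the two image requests inflate only the hit count, and the first user's return after 45 minutes adds a visit without adding a user.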
Note that where users are allocated IP addresses dynamically (for example by dial-up Internet Service Providers), the IFABC definition of a unique user may overstate or understate the real number of individual users concerned. At a minimum, a unique user is an IP address plus a browser ID; each such combination, entering the web site by any page, is counted once for the given period (the minimum audit period is one calendar month). The number of unique users is an indicator of the site's audience or reach.
What statistics can tell you
A vast amount of data can be generated from a site's log files; however, the key elements for site owners are as follows:
- Traffic - how many people are looking and how often? From this raw data other useful information can be generated: is your traffic affected by new content, or by external developments in the news? Information on reach can also be established. If 100,000 people in the UK are directly affected by a certain condition, and your therapy area site for that condition averages 30,000 unique users a month, then a substantial number of people in that market either haven’t found your site yet or perhaps haven’t found anything useful on it.
- Audience - who is your audience and where are they coming from? Are most accesses from work or from home? Which IP addresses (the four-number addresses that actually identify machines on the internet; for example, PharmiWeb is actually http://188.8.131.52, try it!) are readers coming from and hence, if they’re accessing the web from work, which companies do they work for? A reverse look-up tool like www.ripe.net can tell you.
- Platforms/Browsers - are your users using Microsoft Internet Explorer or Netscape, and which versions are they using? A high percentage of people accessing your site from older machines with old versions of browsers will affect usability. The most aesthetically pleasing website in the world is useless to a user on an old machine with a slow download time. Plug-ins like Flash or Shockwave are all very well, but having “Plug-in required” on your site can send users away quickly.
- Errors - where are people having problems and what problems are they having? We’ve all seen the infamous “HTTP 404 - File not found” error, but other error codes can tell you whether people are having real problems accessing certain pages or areas, which will then need attention.
- Referrers - where are people coming to your site from, which banner campaigns are working, and which search engines are delivering the best results? The issue of referrers is becoming increasingly vital. Google (www.google.com), the best search engine around at the moment, delivers quality results by assessing which of the sites that fit your search criteria have the highest number of links from other sites. Referrer statistics also allow you to analyse the success of banners. Very few companies realise that they can derive their own figures for how many click-throughs have resulted from a banner placement, rather than depending on the statistics given by the site hosting the banner.
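Once log entries have been split into fields, most of the questions in the list above reduce to counting. The following is a rough sketch in Python: the record layout, function names and sample values are assumptions made for illustration, and the browser test is deliberately crude. It tallies errors, referring sites and browsers per hit, and works out the reach figure from the traffic example.

```python
from collections import Counter
from urllib.parse import urlparse


def browser_family(user_agent):
    """Very rough guess at the browser; real user-agent strings need more care."""
    if "MSIE" in user_agent:
        return "Internet Explorer"
    if user_agent.startswith("Mozilla/"):
        return "Netscape"
    return "Other"


def report(records):
    """records: iterable of dicts with path, status, referrer and user_agent keys."""
    errors = Counter()     # (status, path) -> number of failed hits
    referrers = Counter()  # referring host -> number of hits
    browsers = Counter()   # browser family -> number of hits
    for rec in records:
        if rec["status"] >= 400:
            errors[(rec["status"], rec["path"])] += 1
        host = urlparse(rec["referrer"]).netloc
        if host:  # skip blank referrers ("-"), such as typed-in addresses
            referrers[host] += 1
        browsers[browser_family(rec["user_agent"])] += 1
    return errors, referrers, browsers


def reach(unique_users, affected_population):
    """Share of the potential audience reached, as a percentage."""
    return 100.0 * unique_users / affected_population


# Made-up records standing in for parsed log entries.
records = [
    {"path": "/index.html", "status": 200,
     "referrer": "http://www.google.com/search?q=asthma",
     "user_agent": "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"},
    {"path": "/old-page.htm", "status": 404,
     "referrer": "http://www.google.com/search?q=inhaler",
     "user_agent": "Mozilla/4.7 [en] (Win98; I)"},
    {"path": "/old-page.htm", "status": 404,
     "referrer": "-",
     "user_agent": "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"},
]
errors, referrers, browsers = report(records)
print(errors.most_common(1))     # the page failing most often
print(referrers.most_common(1))  # the site sending the most traffic
print(browsers)
print(reach(30000, 100000))      # the therapy area example: 30% of the market
```

The same referrer tally, restricted to the address of a page carrying your banner, gives an independent count of click-throughs to set against the figures supplied by the site hosting it.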
Log files
All of these statistics come from log files. Logfiles come in a variety of formats, but most servers generate what are known as Common Log Format (CLF) files; these are standardized, so most logfiles look similar and are read by statistical packages in a similar way. A logfile tracks every hit on a site over a given period of time: that’s every page requested, every image downloaded to a browser as part of that page and every piece of embedded media in it. A typical entry, representing one hit, may look like this:
adsl-63-183-164.ilm.bellsouth.net - - [09/May/2001:13:42:07 -0700] "GET /about.htm HTTP/1.1" 200 3741 "http://www.statsaregreat.com" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"
If you consider that many sites see upwards of half a million hits a day, you’ll soon see how big these logfiles can become, and hence why many people shy away from descending into the mine of information they represent. Broken down, though, an entry becomes a little more intelligible:
adsl-63-183-164.ilm.bellsouth.net - the IP address, or sometimes the domain name, of the client computer requesting the file
- - - identity and user ID: if your site requires a login, the username is entered here; if there’s no login feature, the fields appear as two dashes
[09/May/2001:13:42:07 -0700] - date and time of the hit, with the server's offset from GMT
"GET /about.htm HTTP/1.1" - the request method, the path to the requested file and the protocol; POST, HEAD and DELETE can also appear in place of GET, though less often
200 - status of the hit, always a three-digit code; 200 means the hit was successful, anything in the 400s is a failed hit
3741 - size in bytes of the requested file
"http://www.statsaregreat.com" - the referrer, where the user came from
"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)" - information about the user’s browser and platform
Packages
We’ve seen the enormous amount of information contained within logfiles and the tremendous use that can be made of it, but how do you get this information out of the thousands of lines that make up a logfile? Fortunately, a variety of packages on the market can help you interpret logfiles, as can settings on many of the servers that run the sites.
Microsoft Internet Information Server (IIS) - Microsoft's Site Server application has extensive logging and analysing capabilities. If your site is hosted, the better web hosting providers will offer browser-based log file reports as part of (or as an add-on to) their basic service: graphs, charts, numbers in a row, all a few clicks away.
WebTrends - the industry standard application, WebTrends is highly configurable and user-friendly. Reports are produced in a variety of formats compatible with the office suite, and there is an excellent telephone support system. The only drawback of the WebTrends package is the cost, which may be prohibitive for smaller companies.
Sawmill - considerably cheaper than WebTrends and with a number of devotees across the web, Sawmill earns considerable credit for its range of graphical representation options, user-friendliness and ease of installation.
Analog - billing itself as the "most popular log file analyzer in the world", Analog is the best of the freeware (free software) products available. It lacks some of the user-friendliness of the commercial packages, and their quality of support, but it is nevertheless an impressive programme offering a full range of statistical analyses.
Your logfiles are the key to the habits and identities of your users and can afford a wealth of information that will enable you to better tailor and develop your site. By investing the time and effort to understand what your site’s statistics can tell you, you’ll achieve a tighter, more targeted website along with a better return on your advertising spend.