The statistics module does some of what I need and I am extending it to do the rest. If the extra code is of use to other Drupal users then I offer it free for someone to incorporate into Drupal.

This code is copyright by PeterMoulding.com and is supplied free for use in Drupal provided Drupal stays free. I hope to see Drupal set up a foundation and not go the way of some open projects such as imdb.com or Mambo.

Done

New Table

Each new visitor is recorded in table statistics_visitor. Composite fields are split into components before entry into the database so that the data can be grouped and sorted by the components.

statistics_visitor

Table statistics_visitor contains the following fields:
svid to identify the entry for the new visitor. Example: 35
date contains the timestamp the visitor arrived. Example: 20050526095220
agent contains the agent string from the visitor's browser. Example: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4 (ax)
ip_address contains the ip address of the visitor's server used to access the site. Example: 192.168.0.100
scheme contains the scheme from the visitor's server. Example: http
host contains the host name of the visitor's server. Example: all.petermoulding.com
dir contains the directory. Example: /
page contains the page. Example: Australia
query contains the query. Example: Australia
fragment contains the bit after the # for a request to a location on a page. Example: top
language contains the language request from the visitor's browser and is often not supplied. Example: en-us,en;q=0.5
encoding contains the encoding request from the visitor's browser if the encoding is supplied. Example: gzip,deflate
javascript indicates the visitor's browser accepts. 1 (or true) indicates Javascript is working. 0 (or false) indicates Javascript is not working. null (or true) indicates Javascript has not yet been tested.
ajax indicates the visitor's browser works for Ajax requests. 1 (or true) indicates Ajax works. 0 (or false) indicates Ajax does not work. null (or true) indicates that Ajax has not yet been tested.

The creation SQL is:

-- 
-- Table structure for table `drupal_statistics_visitor`
-- 

create table `drupal_statistics_visitor` (
  `svid` int(10) unsigned not null auto_increment,
  `date` timestamp(14) not null,
  `agent` tinytext not null,
  `ip_address` tinytext not null,
  `port` tinytext not null,
  `scheme` tinytext not null,
  `host` tinytext not null,
  `dir` tinytext not null,
  `page` tinytext not null,
  `query` tinytext not null,
  `fragment` tinytext,
  `language` tinytext not null,
  `encoding` tinytext not null,
  `Javascript` tinyint(4) default null,
  `Ajax` tinyint(4) default null,
  primary key  (`svid`)
) type=myisam;

Original Tables

Accesses are recorded in table accesslog and node usage is recorded in table node_count.

accesslog

Table accesslog contains the following fields:
aid to identify the log entry for one access of one page. Example: 10350
title contains the page. Example: Australia
path contains the path to the page. Example: node/15. This should be the path the user sees.
url contains a URL. Example: http://test.petermoulding.com/colour. We need the URL from the referrer.
hostname contains an IP address. Example: 192.168.0.100. This should be the referrer.
uid contains the user id. Example: 0. Zero indicates the user has not logged in.
timestamp contains the timestamp that includes the date and time the page is read. Example: 1138787372. If this field was separated into date and time fields, they could be selected and sorted in a more useful manner.
timer contains an estimate of the time taken to create the page. Example: 243. This field often contains zero.
sid should contain the session id. Example: tmv6coh5j7p5rbeo1f6obq5ss3. For some strange reason the session id is often blank.

node_counter

This table is read by the node module in two selects. One select could be replaced with a function in the statistics module as it is a discrete select. The other select is a multiple table join built by the node module.
Table node_counter contains the following fields:
nid to identify the node. Example: 2
totalcount contains the total number of times this node is accessed. Example: 81.
daycount contains the total number of times this node is accessed today. Example: 3.
timestamp contains the timestamp from the last update. Example: 1142453979.

Code Changes

Visits are recorded in module statistics.module.

statistics.module

Module statistics.module contains function statistics_settings() to offer options in the administration settings page.

statistics_exit()

Add the following code after the accesslog update. This will use the statistics_visitor class to create an entry in the statistics_visitor table. Part of the entry is loaded into the session so that any other code can access the data without reading the database.

/*	Copyright PeterMoulding.com 2006 04 23
	Use the statistics_visitor table.
*/
if(variable_get('statistics_visitor', 0))
	{
	if(!isset($_SESSION['statistics']['visitor']))
		{
		include_once 'statistics_visitor.class';
		$visitor = new statistics_visitor();
		$_SESSION['statistics']['visitor'] = $visitor->get_visitor();
		}
	}

statistics_settings()
Add the following code just before the return $form; statement. This will offer the administrator the option of using the statistics_visitor table.

	$form['visitor'] = array('#type' => 'fieldset', '#title' => t('Use visitor table'));
	$form['visitor']['statistics_visitor'] = array('#type' => 'radios',
		'#title' => t('Use visitor table'),
		'#default_value' =>  variable_get('statistics_visitor', 0),
		'#options' => $options,
		'#description' => t('Create a row in the statistics_visitor table when a visitor arrives at your site.'));

statistics_visitor.class

Module statistics_visitor.class contains class statistics_visitor() and creates a new entry in the statistics_visitor table. Parts of the entry are returned so they can be loaded into the session where other code can access the data without reading the database.

To Do

Optimise table accesslog to remove data that is stored in table statistics_visitor.

Rename Tables

Accesses are recorded in table accesslog. A better name for the table would be statistics_access_log so that people can see the table is created by the statistics module.

Add Javascript Test

Table statistics_visitor contains a field to indicate visitors who has Javascript active. One way to test for Javascript would be to have a one off Javascript script to update the links on the page with an extra query, such as &js=yes, then have code to detect the extra query value and update the statistics_visitor table entry. If the extra value is not added, we can assume Javascript is not active.

The overhead would be an extra script on the first page visited and an extra database update but performed while the user is reading the next page, not while the user is waiting for a page to download.

Add Ajax Test

Table statistics_visitor contains a field to indicate visitors who have everything available to use Ajax. The simplest way to test for Ajax capability would be to have a one off Ajax script to update the statistics_visitor table entry if the user has Javascript activated and the other bits working that are needed for Ajax.

The overhead would be an extra script on the first page visited and an extra database update but performed while the user is reading the page, not while the user is waiting for the page download.

Add Reports on Visitor Table

I have a separate external reporting system and will gradually replace it with a Drupal/Ajax based reporting system once Ajax usage settles down in Drupal. If someone adds Ajax to the exisitng reports in the statistics module then the whole reporting process will be easier because additional reports will require only simple SQL additions.

petermoulding.com/web_architect

Comments

mwu’s picture

thank you.

could you contact the maintainer of graphstat, who might be incorporating additional features into the module (which extends statistics)? maybe the code can be incorporated into a new or enhanced module.

peterx’s picture

statistics_reports.module contains reports based on the statistics_visitor table.

Originally I added the reports to the statistics module but that just adds code to a module that is used for every page. I aim to make the statistics_reports module load code only when in the administration pages so that there is not a large overhead for the rest of the processing.

petermoulding.com/web_architect

peterx’s picture

Someone loaded up with more caffeine and less beer will come up with a better solution than the following code. Whatever way you do it, make sure the included file is included only when an administrator requests a report and not when regular visitors visit.

/*	Copyright PeterMoulding.com 2006 04 23. Free for use in Drupal.
Report visitor statistics.
The bulk of the code is in statistics_reports.admin.php and is loaded only when
you run the reports. That will save load time for all the other parts of your Web site.
*/

$statistics_reports_request_parts = explode('/', $_REQUEST['q']);
if($statistics_reports_request_parts[0] == 'admin'
and $statistics_reports_request_parts[1] == 'statistics_reports')
	{
	include_once('statistics_reports.admin.php');
	}

/**
 * Implementation of hook_menu().
 */
function statistics_reports_menu($may_cache)
	{
	$items = array();
	$access = user_access('access statistics');
	if($may_cache)
		{
		$items[] = array('path' => 'admin/statistics_reports',
			'title' => t('statistics reports'),
			'callback' => 'statistics_reports_reports',
			'access' => $access);
		}
	if(function_exists('_statistics_reports_menu'))
		{
		$items = _statistics_reports_menu($may_cache, $items);
		}
	return $items;
	}

The main code is in statistics_reports.admin.php and that module is loaded only when you click on the statistics report link in admin. statistics_reports.admin.php has everything except an init function.

This two step system works because this module performs only reporting. All the data is created by the regular statistics module.

Here is the first iteration of the alpha version of the first cut of statistics_reports.admin.php version 0.0.1:

/*	Copyright PeterMoulding.com 2006 04 23. Free for use in Drupal.
Report visitor statistics.
*/

/**
 * Implementation of hook_help().
 */
function statistics_reports_help($section)
	{
	switch ($section)
		{
		case 'admin/help#statistics_reports':
			$output = '<p>'. t('The statistics module keeps track of site usage.'
				. ' The statistics reports module reports from the tables produced by the statistics module.') .'</p>';
			$output .= t('<p>Statistics reports module features</p>
				<ul>
				<li>Reports show how many times people visit your site and accesses to specific content.</li>
				</ul>
				');
			$output .= t('<p>Configuring the statistics reports module</p>
				<ul>
				<li>This module will automatically use the statistics_visitor table if the table is switched on in the statistics module.</li>
				</ul>
				');
			return $output;
		case 'admin/modules#description':
			return t('Reports access statistics for your site. Requires statistics module.');
		case 'admin/statistics_reports':
			return t('<p>Visitors lists the statistics_visitor table which has one entry per visitor to your site.</p>'
				. '<p>Visits by Host lists the total visits by hosts from the statistics_reports_visitor table.</p>');
		case 'admin/statistics_reports/visitor':
			return t('<p>List of the statistics_visitor table which has one entry per visit to your site.'
				. ' The Visits by Host report is derived from this table by counting unique hosts.</p>');
		case 'admin/statistics_reports/host':
			return t('<p>List the total visitors from each site.'
				. ' The Visits by Host report is derived from the statistics_visitor table by counting unique hosts.</p>');
		}
	}

function _statistics_reports_menu($may_cache, $items)
	{
	$access = user_access('access statistics');
	if(!$may_cache)
		{
		$items[] = array('path' => 'admin/statistics_reports/reports',
			'title' => t('Reports'),
			'type' => MENU_DEFAULT_LOCAL_TASK, 'weight' => -10);
		$items[] = array('path' => 'admin/statistics_reports/visitor',
			'title' => t('Visitors'),
			'callback' => 'statistics_reports_visitor',
			'access' => $access,
			'type' => MENU_LOCAL_TASK);
		$items[] = array('path' => 'admin/statistics_reports/hosts',
			'title' => t('Visits by Host'),
			'callback' => 'statistics_reports_visitor_host',
			'access' => $access,
			'type' => MENU_LOCAL_TASK);
		}
	return $items;
	}

/*	Copyright PeterMoulding.com 2006 04 23
List visitor statistics.
*/
function statistics_reports_reports()
	{
	$output = '';
	return $output;
	}

function statistics_reports_visitor()
	{
	$header = array(
		array('data' => t('Date'), 'field' => 'sv.date', 'sort' => 'desc'),
		array('data' => t('Host'), 'field' => 'sv.host'),
		array('data' => t('Dir'), 'field' => 'sv.dir'),
		array('data' => t('Page'), 'field' => 'sv.page'),
		t('Display')
		);
	$sql = 'SELECT sv.svid, sv.date, sv.host, sv.dir, sv.page FROM {statistics_visitor} sv';
	$sql .= tablesort_sql($header);
	$result = pager_query($sql, 50);
	$status = array(t('blocked'), t('active'));
	while ($visitor = db_fetch_object($result))
		{
		$rows[] = array($visitor->date,
		$visitor->host,
		$visitor->dir,
		$visitor->page,
		l(t('display'), 'statistics/' . $visitor->svid . '/display', array()));
		}
	$output = theme('table', $header, $rows);
	$output .= theme('pager', NULL, 50, 0);
	return $output;
	}

/*	Copyright PeterMoulding.com 2006 04 23
List visitor statistics.
*/
function statistics_reports_visitor_host()
	{
	$header = array(
		array('data' => t('Host'), 'field' => 'sv.host'),
		array('data' => t('Visits'), 'field' => 'visits', 'sort' => 'desc'));
	$sql = 'SELECT count(*) as visits, sv.host FROM {statistics_visitor} sv group by sv.host';
	$sql .= tablesort_sql($header);
	$result = pager_query($sql, 50);
	while ($visitor = db_fetch_object($result))
		{
		$rows[] = array($visitor->host, $visitor->visits);
		}
	$output = theme('table', $header, $rows);
	$output .= theme('pager', NULL, 50, 0);
	return $output;
	}

petermoulding.com/web_architect