Tag: debian

  • Import historical data from Apache logs into AWStats

    One of my clients had a problem where the last 6 months of data was not in Google Analytics. Upon investigating it turned out that for some reason the WordPress Google Analytics plugin was not active. I could not determine why it was not active when I am sure I set it up in the past.
    I had all the Apache logs for the period in question so it seemed a simple idea to put the data into something useful that would show charts to the client. AWStats is perfect for that. In fact I used to use it long ago before Google Analytics was available but I had forgotten about it. As with all good open source software, the project is still there and ticking along.
    Configuring AWStats turned out to be a bit tricky. By default, Debian sets AWStats up for a single-domain host. My Apache logs are configured in the vhost_combined format, which is one access.log file for all the virtual hosts.
    The log files are rotated by logrotate and numbered access.log.1 access.log.2 access.log.3 .. access.log.10 etc. This presents another problem as you need to get them into order and normal alphabetical sorting does not work as there are no leading 0s in the file names.
    Further, Apache was misconfigured and all the virtual host entries which should have indicated which virtual host was serving that access were in fact showing the ServerName. Luckily the entries do include the actual URL that was requested so with a bit of grep and sed it was easy to reconstruct what the virtual host should have been.
    I wrote a little bash script that takes a file name (e.g. access.log or access.log.gz) and outputs that file after fixing up the errors. (Later I discovered that zcat -f will cat a file whether it is gzipped or not, which makes the mycat function unnecessary.) You’ll see in the sed regular expression that I change the : to a space: AWStats does not like having a : between the hostname and the port, and I could find no way of making AWStats parse that correctly. There are two regex replacements in the sed command because I fixed the Apache logging of the host name before running this script, so the script has to handle both the old and the new hostname.
    I could have made the sed regex take the port number into account, but I’m only interested in port 80 anyway and didn’t see the need to spend time on getting that working.
    Log file format:

    # Actual
    old.host.name:80 199.7.156.141 - - [16/Sep/2012:17:25:51 +1000] "GET /wp-content/themes/grip/style.css HTTP/1.1" 200 7108 "http://correct.host.name/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; BTRS126493; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; eSobiSubscriber 2.0.4.16; InfoPath.2)"
    old.host.name:80 199.7.156.141 - - [16/Sep/2012:17:25:51 +1000] "GET /wp-content/themes/grip/stylesheet/nivo-slider/nivo-slider.css HTTP/1.1" 200 968 "http://correct.host.name/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; BTRS126493; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; eSobiSubscriber 2.0.4.16; InfoPath.2)"
    # Required for importing to AWStats
    correct.host.name 80 199.7.156.141 - - [16/Sep/2012:17:25:51 +1000] "GET /wp-content/themes/grip/style.css HTTP/1.1" 200 7108 "http://correct.host.name/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; BTRS126493; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; eSobiSubscriber 2.0.4.16; InfoPath.2)"
    correct.host.name 80 199.7.156.141 - - [16/Sep/2012:17:25:51 +1000] "GET /wp-content/themes/grip/stylesheet/nivo-slider/nivo-slider.css HTTP/1.1" 200 968 "http://correct.host.name/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; BTRS126493; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; eSobiSubscriber 2.0.4.16; InfoPath.2)"
    
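    A quick way to tell whether lines are already in the importable shape is to test their first two fields. The sketch below is only an illustration: the function name is_importable is made up, and it assumes host names contain no spaces or colons.

```shell
#!/bin/bash
# Hypothetical check: succeeds when every line on stdin starts with
# "hostname port " (space-separated), and fails if any line does not.
is_importable() {
    # grep -v selects lines that do NOT match and -q keeps it silent, so grep
    # succeeds when a bad line exists. The leading ! inverts that result.
    ! grep -Evq '^[^ :]+ [0-9]+ '
}
```

    For example, zcat -f access.log.2.gz | is_importable fails on the "Actual" lines above because of the host:port form.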

    catlogs.sh finds the relevant lines in the given log file, reformats them to be suitable for importing into AWStats, and writes them to stdout:

    #!/bin/bash
    mycat() {
        local f;
        for f; do
            case $f in
                *.gz) gzip -cd "$f" ;;
                *) cat "$f" ;;
            esac;
        done;
    }
    mygrep() {
        # get all the lines from the log file which mention http://correct.host.name
        mycat "$1" | grep 'http://correct\.host\.name' | \
            sed -e 's/old\.host\.name:80/correct.host.name 80/ ; s/correct\.host\.name:80/correct.host.name 80/' # replace incorrect hostnames and turn the : into a space
    }
    mygrep "$1"
    
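    With hindsight, the same script could be shorter. The following is a sketch rather than what I actually ran: it assumes zcat -f is available (it is on Debian), uses the same placeholder host names as above, and the function name fix_vhost is made up. It also handles any port number, not just 80.

```shell
#!/bin/bash
# Sketch of a simplified catlogs.sh (not the script that was actually used).
fix_vhost() {
    # Keep only lines that mention the correct site, then rewrite a leading
    # "old.host.name:PORT " or "correct.host.name:PORT " to "correct.host.name PORT ".
    grep 'http://correct\.host\.name' |
        sed -E 's/^(old|correct)\.host\.name:([0-9]+) /correct.host.name \2 /'
}

# zcat -f decompresses gzipped files and passes plain files through unchanged,
# so no separate mycat function is needed.
if [ "$#" -gt 0 ]; then
    zcat -f -- "$@" | fix_vhost
fi
```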

    Then I needed to loop through all the access.log files in the Apache log directory in historical order. To do that I wrote a simple for loop on the command line.

    for i in $(ls /var/log/apache2/access.log* | sort -r -n -k 3 -t '.' ) ; do sudo -u www-data /usr/lib/cgi-bin/awstats.pl -showcorrupted -showsteps -LogFile="bash /home/jason/catlogs.sh $i |" -config=/etc/awstats/awstats.correct.host.name.conf ; done;
    

    A nice thing with AWStats is you can pass in a command that outputs to stdout as the log file: -LogFile="bash /home/jason/catlogs.sh $i |". I used sort to get the files into numerical order. sort’s -t option sets the field separator (here a .) and -k picks the field to use as the sort “KEY” (here the third field, which is the rotation number). The logs need to go from oldest at the top to newest at the bottom, and higher numbers are older, so you have to process the files in reverse numerical order (-r).
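    To see what that sort invocation does in isolation, here is a small sketch that feeds it a few typical logrotate file names. The function name oldest_first is made up for the example, and it assumes the directory part of the path contains no dots (true for /var/log/apache2).

```shell
#!/bin/bash
# Reads log file names on stdin and prints them oldest first.
# Fields are split on '.', so field 3 is the rotation number. The live
# access.log has no third field, which sort -n treats as 0, so with -r it
# comes out last, after all the rotated files.
oldest_first() {
    sort -r -n -t '.' -k 3
}

printf '%s\n' access.log access.log.1 access.log.10.gz access.log.2.gz | oldest_first
```

    This prints access.log.10.gz, access.log.2.gz, access.log.1 and access.log, in that order, which is the order AWStats needs to see them in.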
    Lastly, to ensure AWStats can read the Apache access logs in future, I changed the Apache vhost_combined format to:

    LogFormat "%V %p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined

    and I changed the AWStats log format to:

    LogFormat = "%virtualname %other %host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot"
  • Apache Virtual Host configuration for a Networked WordPress Installation

    Direct any URL request that Apache receives to the WordPress installation. You need to do this if you are setting up a WordPress Network (multisite) installation whose sites have their own unique domain names, e.g. site1.org, site2.com, someothersite.co.uk.

    /etc/apache2/sites-enabled$ ls -al
    total 8
    drwxr-xr-x 2 root root 4096 Jul 25 13:18 .
    drwxr-xr-x 7 root root 4096 Jul 24 12:28 ..
    lrwxrwxrwx 1 root root 40 Jul 24 12:06 000-wordpress-network-ssl -> ../sites-available/wordpress-network-ssl
    lrwxrwxrwx 1 root root 36 Jul 24 12:02 010-wordpress-network -> ../sites-available/wordpress-network
    
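    Symlinks like the ones listed above can be created with ln -s. The numeric prefix controls the order because Apache reads the files in sites-enabled alphabetically, and the first virtual host it reads for an address and port becomes the default for requests that match no other host. The helper below is a hypothetical sketch; the function name and its arguments are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: link a site from sites-available into sites-enabled
# under a numeric prefix so that Apache reads it in the intended order.
# Usage: enable_site APACHE_DIR PREFIX SITE
enable_site() {
    local apache_dir=$1 prefix=$2 site=$3
    ln -s "../sites-available/$site" "$apache_dir/sites-enabled/$prefix-$site"
}

# Example (would need root on a real system):
#   enable_site /etc/apache2 000 wordpress-network-ssl
#   enable_site /etc/apache2 010 wordpress-network
```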

    The order of the files is very important. The contents of wordpress-network are below:

    <VirtualHost *:80>
    UseCanonicalName Off
    ServerAlias *.examplehost.com examplehost.com
    ServerName examplehost.com
    DocumentRoot /var/www
    Options All
    ServerAdmin myname@examplehost.com
    # Store uploads of www.domain.com in /srv/www/wp-uploads/$0
    RewriteEngine On
    RewriteRule ^/wp-uploads/(.*)$ /var/www/wp-uploads/%{HTTP_HOST}/$1
    # try and make server-status return server status
    #RewriteRule ^/server-status - [L]
    RewriteCond %{REQUEST_URI} !=/server-status
    <Location /server-status>
    SetHandler server-status
    Order Deny,Allow
    # Deny from all
    # Allow from localhost
    Allow from all
    </Location>
    <Directory />
    Options FollowSymLinks
    AllowOverride All
    </Directory>
    CustomLog /var/log/apache2/access.log vhost_combined
    ErrorLog /var/log/apache2/error.log
    # this is needed when activating multisite, WP needs to do a
    # fopen("http://randomname.domain.com") to verify
    # that apache is correctly configured
    php_admin_flag allow_url_fopen on
    </VirtualHost>
    
  • Collectd causing rrd illegal attempt to update using time errors

    I found collectd was causing RRD “illegal attempt to update using time” errors. I was seeing a whole load of lines like this in my syslog:
    Aug 20 16:27:12 mythbox collectd[32167]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/mythbox/df-root/df_complex-free.rrd) failed: /var/lib/collectd/rrd/mythbox/df-root/df_complex-free.rrd: illegal attempt to update using time 1345444032 when last update time is 1345444032 (minimum one second step)

    It was adding one message like that every second so my logs were completely full of it. Google didn’t reveal much except that this sort of error happens either because two instances of RRD are trying to write to the RRD database at the same time, or because the server’s date and time are way out of sync. Neither of these was true in my case.
    I asked on #collectd on freenode and a very nice person by the name of tokkee told me that it’s a known issue of sorts. The df plugin for collectd uses /proc/mounts to determine which drives to check free space on, and if / is in there twice, it tries to update the entry for / twice in the same second, which causes the problem.
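    A quick way to check whether a host is affected is to look for mount points that are listed more than once. This is a minimal sketch, not part of collectd; the function name duplicate_mounts is made up, and it assumes mount points contain no spaces.

```shell
#!/bin/bash
# Prints every mount point (the second field) that occurs more than once
# in /proc/mounts-style input read from stdin.
duplicate_mounts() {
    awk '{ print $2 }' | sort | uniq -d
}

# Typical use on a Linux host:
#   duplicate_mounts < /proc/mounts
```

    If this prints /, the df plugin will try to update the same RRD file twice per interval.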
    The solution is to add the following to the /etc/collectd/collectd.conf file:

    
    <Plugin df>
            FSType "rootfs"
            IgnoreSelected true
    </Plugin>
    
    

    Then I restarted collectd and my logs were peaceful again.
    Update 2014-04-10:
    I was getting these errors again on one of my VPS hosts. In this instance, / only appeared once in /proc/mounts but /run was in there multiple times:

    root@new:/etc/collectd# cat /proc/mounts
    rootfs / rootfs rw 0 0
    /dev/root / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
    devtmpfs /dev devtmpfs rw,relatime,size=1085360k,nr_inodes=271340,mode=755 0 0
    tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=217328k,mode=755 0 0
    tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
    proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
    sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
    tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=460860k 0 0
    devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620 0 0
    root@new:/etc/collectd#
    

    The solution is to ignore tmpfs instead of rootfs:

    
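    <Plugin df>
            FSType "tmpfs"
            # With IgnoreSelected false, only the filesystems selected above are
            # reported; true would exclude them instead.
            IgnoreSelected false
    </Plugin>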