Category: General

  • Import historical data from Apache logs into AWStats

    One of my clients had a problem where the last 6 months of data were missing from Google Analytics. On investigating, it turned out that for some reason the WordPress Google Analytics plugin was not active. I could not work out why, as I am sure I set it up in the past.
    I had all the Apache logs for the period in question, so it seemed a simple idea to load the data into something that could show charts to the client. AWStats is perfect for that. In fact, I used to use it long ago, before Google Analytics was available, but I had forgotten about it. As with all good open source software, the project is still there and ticking along.
    Configuring AWStats turned out to be a bit tricky. By default, Debian sets AWStats up for a single-domain host, but my Apache logs use the vhost_combined format, which writes all the virtual hosts into one access.log file.
    The log files are rotated by logrotate and numbered access.log.1, access.log.2, access.log.3 … access.log.10 and so on. This presents another problem: the files have to be processed in order, and plain alphabetical sorting puts access.log.10 before access.log.2 because there are no leading zeros in the file names.
    Further, Apache was misconfigured: the virtual host field of every entry, which should have indicated which virtual host served the request, showed the ServerName instead. Luckily the entries do include a URL containing the correct host name (in the Referer field), so with a bit of grep and sed it was easy to reconstruct what the virtual host should have been.
    I wrote a little bash script that takes a file name (e.g. access.log or access.log.gz) and outputs that file after parsing it to fix up the errors. (Later I discovered that zcat -f will cat a file whether it is gzipped or not, which makes the mycat function unnecessary.) You’ll see in the sed regular expression that I change the : to a space: AWStats does not like having a : between the hostname and the port, and I could find no way of making AWStats parse that correctly. There are two regex replacements in the sed command because I fixed the Apache logging of the host name before running this script, so it has to handle both the old host name and the new one.
    I could have made the sed regex take the port number into account, but I’m only interested in port 80 anyway and didn’t see the need to spend time on getting that working.
    Log file format:

    # Actual
    old.host.name:80 199.7.156.141 - - [16/Sep/2012:17:25:51 +1000] "GET /wp-content/themes/grip/style.css HTTP/1.1" 200 7108 "http://correct.host.name/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; BTRS126493; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; eSobiSubscriber 2.0.4.16; InfoPath.2)"
    old.host.name:80 199.7.156.141 - - [16/Sep/2012:17:25:51 +1000] "GET /wp-content/themes/grip/stylesheet/nivo-slider/nivo-slider.css HTTP/1.1" 200 968 "http://correct.host.name/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; BTRS126493; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; eSobiSubscriber 2.0.4.16; InfoPath.2)"
    # Required for importing to AWStats
    correct.host.name 80 199.7.156.141 - - [16/Sep/2012:17:25:51 +1000] "GET /wp-content/themes/grip/style.css HTTP/1.1" 200 7108 "http://correct.host.name/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; BTRS126493; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; eSobiSubscriber 2.0.4.16; InfoPath.2)"
    correct.host.name 80 199.7.156.141 - - [16/Sep/2012:17:25:51 +1000] "GET /wp-content/themes/grip/stylesheet/nivo-slider/nivo-slider.css HTTP/1.1" 200 968 "http://correct.host.name/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; BTRS126493; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; eSobiSubscriber 2.0.4.16; InfoPath.2)"
    

    catlogs.sh finds relevant lines in given log file and reformats them to be suitable for importing into AWStats and outputs to stdout:

    #!/bin/bash
    # Output a log file to stdout, decompressing it first if it is gzipped
    mycat() {
        local f
        for f; do
            case $f in
                *.gz) gzip -cd "$f" ;;
                *) cat "$f" ;;
            esac
        done
    }
    mygrep() {
        # get all the lines from the log file which have accesses to correct.host.name
        mycat "$1" | grep -F 'http://correct.host.name' | \
            sed -e 's/^old\.host\.name:80/correct.host.name 80/ ; s/^correct\.host\.name:80/correct.host.name 80/' # replace incorrect hostnames
    }
    mygrep "$1"
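    As mentioned above, zcat -f makes the mycat helper unnecessary. Here is a minimal sketch of the same script built around it, assuming GNU gzip’s zcat and keeping the placeholder host names old.host.name and correct.host.name. The function name catlog is hypothetical, and the sed patterns are anchored to the start of the line with their dots escaped, so they only touch the virtual host field:

```shell
#!/bin/bash
# Sketch of catlogs.sh using zcat -f instead of mycat: zcat -f decompresses
# gzipped files and passes plain files through unchanged (GNU gzip assumed).
# old.host.name and correct.host.name are the placeholder host names.
catlog() {
    # Keep only lines referring to the site, then replace "host:80" at the
    # start of the line with "correct.host.name 80" (both old and new names)
    zcat -f -- "$1" \
        | grep -F 'http://correct.host.name' \
        | sed -e 's/^old\.host\.name:80 /correct.host.name 80 /' \
              -e 's/^correct\.host\.name:80 /correct.host.name 80 /'
}

# Run as a script: bash catlogs.sh /var/log/apache2/access.log.2.gz
if [ $# -gt 0 ]; then
    catlog "$1"
fi
```

    It is called the same way as the original, so the AWStats loop does not need to change.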
    

    Then I needed to loop through all the access.log files in the Apache log directory in historical order. To do that I wrote a simple for loop on the command line.

    for i in $(ls /var/log/apache2/access.log* | sort -r -n -k 3 -t '.' ) ; do sudo -u www-data /usr/lib/cgi-bin/awstats.pl -showcorrupted -showsteps -LogFile="bash /home/jason/catlogs.sh $i |" -config=/etc/awstats/awstats.correct.host.name.conf ; done;
    

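    A nice thing about AWStats is that you can pass it a command that writes to stdout instead of a log file name: a LogFile value ending in a pipe, such as -LogFile="bash /home/jason/catlogs.sh $i |", makes AWStats read the log from that command. I used sort to get the files into numerical order: -t '.' splits each file name into fields at every dot, -k 3 sorts on the third field onwards (the rotation number), -n compares numerically so that 10 comes after 9, and -r reverses the result. The logs need to be processed from oldest to newest, and since logrotate gives older files higher numbers, the files have to be processed in reverse numerical order.
    Lastly, to ensure AWStats can read the Apache access logs in future, I changed the Apache vhost_combined format to: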
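    To see the ordering those sort options produce without touching any real files, here is a small sketch fed with sample file names. It assumes a Debian-style layout where access.log.1 is plain and older files are gzipped; the order_logs name is made up for illustration. The unnumbered access.log has an empty third field, which sort -n treats as 0, so it ends up last:

```shell
#!/bin/bash
# Shows the order produced by: sort -r -n -k 3 -t '.'
# The names below are sample data; no files are read.
#   -t '.'  split each name into fields at every dot
#   -k 3    sort on field 3 onwards, i.e. the N in access.log.N[.gz]
#   -n      compare numerically, so 10 sorts after 9 (a plain sort puts 10 before 2)
#   -r      reverse, so the highest number (the oldest log) comes first and
#           access.log, whose empty key counts as 0, comes last
order_logs() {
    sort -r -n -k 3 -t '.'
}

printf '%s\n' \
    /var/log/apache2/access.log \
    /var/log/apache2/access.log.1 \
    /var/log/apache2/access.log.2.gz \
    /var/log/apache2/access.log.10.gz \
    /var/log/apache2/access.log.9.gz \
    | order_logs
```

    This prints access.log.10.gz, access.log.9.gz, access.log.2.gz, access.log.1 and finally access.log, which is oldest to newest.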

    LogFormat "%V %p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined

    and I changed the AWStats LogFormat to:

    LogFormat = "%virtualname %other %host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot"
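    A rough way to see how the two formats line up is to label the first few space-separated fields of a log line with the AWStats tags they correspond to. This is a simplification of how AWStats actually parses lines, and the label_fields helper is hypothetical, but it shows why the colon mattered: with old.host.name:80 the port stays glued to the host name and every later field shifts by one position.

```shell
#!/bin/bash
# Rough sketch: label the first five space-separated fields of each log line
# with the AWStats LogFormat tag they line up with. AWStats' real parser is
# more involved; this only illustrates the field positions.
label_fields() {
    awk '{ printf "%%virtualname=%s %%other=%s %%host=%s %%other=%s %%logname=%s\n", $1, $2, $3, $4, $5 }'
}

# New format (port separated by a space): fields line up
echo 'correct.host.name 80 199.7.156.141 - - [16/Sep/2012:17:25:51 +1000] "GET / HTTP/1.1" 200 7108' | label_fields
# Old format (host:port): the client IP lands in the port's slot
echo 'old.host.name:80 199.7.156.141 - - [16/Sep/2012:17:25:51 +1000] "GET / HTTP/1.1" 200 7108' | label_fields
```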
  • Lego Brick Separator

    Lego Brick Separator

    We got some Lego for my son the other day and it came with a Lego Brick Separator, which has to be the best invention since Lego itself. It makes short work of stubborn bricks that are stuck together. As a kid I used to resort to using my teeth to separate them. This thing makes it easy.

  • .bash_profile vs .bashrc in OS X Terminal.app

    .bash_profile vs .bashrc or why does OS X ignore my .bashrc in Terminal.app?

  • Keyboard shortcut to un-minimise a window in OS X

    It’s a bit fiddly, but it can be done.
    Cmd+Tab to the application you want to un-minimise, then, while still holding Cmd, press Option. Release Cmd and then finally release Option.

  • Cmd+m to Minimise a window in OS X

    Cmd+m to minimise a window in OS X

  • Collectd causing rrd illegal attempt to update using time errors

    I found collectd causing RRD “illegal attempt to update using time” errors. I was seeing a whole load of lines like this in my syslog:
    Aug 20 16:27:12 mythbox collectd[32167]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/mythbox/df-root/df_complex-free.rrd) failed: /var/lib/collectd/rrd/mythbox/df-root/df_complex-free.rrd: illegal attempt to update using time 1345444032 when last update time is 1345444032 (minimum one second step)

    It was adding one message like that every second, so my logs were completely full of it. Google didn’t reveal much, except that this sort of error is caused either by two instances of RRD trying to write to the same RRD database at the same time, or by the server’s date and time being way out of sync. Neither was true in my case.
    I asked on #collectd on freenode, and a very nice person by the name of tokkee told me that it’s a known issue of sorts. The df plugin for collectd uses /proc/mounts to determine which file systems to check free space on, and if / is in there twice, it tries to update the entry for / twice, which causes the problem.
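    To check whether a machine is affected, you can look for mount points listed more than once in /proc/mounts. A minimal sketch, relying on the second field of each /proc/mounts line being the mount point (the duplicate_mounts name is hypothetical):

```shell
#!/bin/bash
# Sketch: print every mount point that appears more than once in a
# /proc/mounts-style file (the live /proc/mounts if no file is given).
# Field 2 of each line is the mount point.
duplicate_mounts() {
    awk '{ print $2 }' "${1:-/proc/mounts}" | sort | uniq -d
}

# Only run against the live system where /proc/mounts exists (Linux)
if [ -r /proc/mounts ]; then
    duplicate_mounts
fi
```

    With both a rootfs and a /dev/root entry mounted on /, it prints /.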
    The solution is to add the following to the /etc/collectd/collectd.conf file:

    
    <Plugin df>
            FSType "rootfs"
            IgnoreSelected true
    </Plugin>
    
    

    Then I restarted collectd and my logs were peaceful again.
    Update 2014-04-10:
    I was getting these errors again on one of my VPS hosts. In this instance, / only appeared once in /proc/mounts but /run was in there multiple times:

    root@new:/etc/collectd# cat /proc/mounts
    rootfs / rootfs rw 0 0
    /dev/root / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
    devtmpfs /dev devtmpfs rw,relatime,size=1085360k,nr_inodes=271340,mode=755 0 0
    tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=217328k,mode=755 0 0
    tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
    proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
    sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
    tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=460860k 0 0
    devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620 0 0
    root@new:/etc/collectd#
    

    The solution is to ignore tmpfs instead of rootfs:

    
    <Plugin df>
            FSType "tmpfs"
            IgnoreSelected true
    </Plugin>
    
    
  • Choosing passwords for the 21st century

    The recent Mat Honan hack got me thinking about password strength. It turns out he was hacked not because of a poor password, but because of security flaws in Amazon’s and Apple’s systems. Nevertheless, it serves as a good reminder to keep yourself safe.
    One thing you can do is use very long passwords for important things. Increasing the length of your password makes it much more difficult for anyone to brute force it.
    To get an idea of the impact of a long password, have a look at this site: How Big is Your Haystack. It lets you type in a password and estimates how long it would withstand a brute force attack. Obviously don’t type your real password in, but type in something that uses the same number of letters, numbers, capitals and punctuation and see how it looks.
    Passwords of 8 lower case letters? 2.17 seconds in an offline attack scenario. It’s not until you get up to 17 lower case letters that the offline attack scenario is pushed into the virtually impossible range.
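    The 2.17 second figure can be reproduced with a little arithmetic, assuming the site counts every password up to the given length and that its offline scenario tries 10^11 guesses per second; these two assumptions give the quoted figure exactly:

```latex
N_8 = \sum_{k=1}^{8} 26^k = \frac{26^9 - 26}{25} = 217\,180\,147\,158 \approx 2.17 \times 10^{11}

t_8 = \frac{N_8}{10^{11}\ \text{guesses/s}} \approx 2.17\ \text{s}

\frac{N_{n+1}}{N_n} \approx 26 \quad \text{(each extra lower case letter multiplies the time by about 26)}
```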
    So how do you go about picking a strong password?
    Diceware. Essentially, you roll a die 25 times to form 5 groups of 5 numbers, then look each 5-digit group up in the Diceware word list to get a 5-word passphrase. Being 5 words, it is relatively easy to remember but also very long.
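    The lookup step is easy to script once you have the rolls. A minimal sketch, assuming a local copy of a Diceware word list where each line starts with a 5-digit key followed by its word (for example 11111 a); the diceware_words function and the list file name in the usage line are hypothetical:

```shell
#!/bin/bash
# Sketch: turn 25 dice rolls into a 5-word Diceware passphrase.
# Assumes a word list whose lines look like "<5-digit key><whitespace><word>",
# e.g. "11111 a"; lines that don't match a key (headers etc.) are ignored.
diceware_words() {
    local list="$1" rolls="$2"
    # Exactly 25 rolls, each a digit from 1 to 6
    if ! printf '%s\n' "$rolls" | grep -Eqx '[1-6]{25}'; then
        echo "need exactly 25 dice rolls (digits 1-6)" >&2
        return 1
    fi
    # Split the rolls into five 5-digit keys, look each key up,
    # then join the words with spaces
    printf '%s\n' "$rolls" | fold -w 5 | while read -r key; do
        awk -v k="$key" '$1 == k { print $2; exit }' "$list"
    done | paste -s -d ' ' -
}
```

    Usage: diceware_words diceware.wordlist.txt 1234561234561234561234561. The script only does the lookup; the randomness still has to come from real dice.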
    If you don’t feel like rolling dice, you could consider using random.org to generate a list of numbers for you. If you choose this approach, make sure to visit the site using https and get a nice long list and choose a set of numbers from the list. Write it down on a piece of paper and put it in a safe place. Note this is not as secure as using the offline dice rolling approach.
    As a final note, consider using multi-factor authentication if you can. Google have made it available for Gmail and I recommend you sign up for it.

  • Event Cinemas’ lousy customer service comes good

    Recently I went to see The Dark Knight Rises with my wife. The volume in the cinema was too loud and we didn’t enjoy it because of that. My wife went out to complain to the only person she could find: a security guard who couldn’t be arsed to do anything about it. He said he would “try”, but there was no change in the volume that we could detect. I find this trend of increasing volume in cinemas disturbing.
    On the plus side, they actually have a ticketing system for complaints.
    On the negative side, their first response was lousy, but after a bit more complaining they improved. The conversation follows:

    Hi,
    on 6/8/2012 at 8:20pm session in cinema 8 i saw Dark Night Rising.
    Right from the start of the movie it was too loud. My wife and I felt uncomfortable at the noise level.
    After a while my wife left the cinema to complain. She spoke to the security guard on the ticket collection stand. He said he would “try to do something” but there was no detectable change in the volume level. it really was too loud during the action scenes and consequently we left the cinema with ringing ears. My ears continued to ring all night long.
    I don’t really like to think I paid good money to have hearing damage. I’m not quite sure what was wrong with your setup but the system failed us on two counts.
    1. it was too loud to begin with. you should have systems in place to ensure it is not too loud.
    2. when we complained about it, your staff did nothing to rectify the situation.
    I might also add that on leaving the cinema, we felt like complaining but the place was deserted except for the same security person who did nothing in the first place.
    This is not the kind of experience I expect from the Event Cinema brand and frankly I doubt I will be attending an Event Cinema again for quite some time.
    Yours, Jason.

    To which they replied with the fairly lacklustre:

    Hi Jason,
    Thank you for contacting us.
    Customer Service is of the upmost priority to us. We do endevour to make the cinema going experience the best for each individual & we can assure you that it is very rare that we receive feedback about our film volume levels. Please know that we do appreciate you bringing this to our attention and will contact the appropriate parties so this doesn’t happen again.
    We hope that you will attend our cinemas again in the near future.
    Kind Regards,
    E Support – Event Cinemas George Street

    This riled me up a little and I sent them this:

    Wow, I’m pretty surpprised that you couldn’t even manage an apology. And not only that, your email aludes to perhaps the issue is my ears rather than your sound configuration.
    In order for me to go to the cinema, I need to organise baby sitting. It’s not easy to get to the cinema, and when we got there, it was uncomfortably loud and your staff couldn’t be bothered to even fix it.
    I’m very disappointed.

    which apparently got their attention:

    Hi Jason,
    Please let me sincerely apologies for the interruption to your movie experience. As a mother of 5 myself I do know how difficult it actually is to get some time out from the kids to attend any outing. If you would like to forward your address to us, we are more than happy to provide you with some passes for you to come back & watch a movie at a time that suits you.
    We will definately speak to our staff & the security company to ensure that any issues like this are looked after appropriately & accordingly.
    Thank you for taking the time out to give us your feedback. It is feedback like this that we take on board seriously & use to better our service.

  • Debugging of Pathfinder Rovers

    Jeff Waugh @jdub tweeted an interesting article about how the previous Mars rovers turned out to have a fairly serious software problem, and how the JPL engineers diagnosed and fixed it.
    Some of the points I found particularly interesting:

    • JPL use a proprietary, off-the-shelf operating system (VxWorks) to run the rovers
    • having a replica of a live system for debugging is very useful
    • leaving debugging tools in the remote system saved the day
    • Finding a way to reproduce the error is critical
    • Don’t ignore strange behaviour thinking it might just go away

    I love reading stories of issues like this and how the engineers fixed them. Well worth reading.

  • Flipping the mouse wheel scroll direction in Windows 7

    I have been getting very confused between the mouse wheel scroll directions in Windows 7 and Mac OS X Lion. As I consider OS X to be the future, I decided to try and flip the mouse wheel scroll direction in Windows.
    Turns out there is a registry setting to do this: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\HID\????\????\Device Parameters
    Set the value of FlipFlopWheel to 1
    You need to find the USB enumeration values (shown above as ????). You can get those by going to the Mouse control panel, clicking on the Hardware tab and clicking Properties. Then, in the Details tab of the HID-compliant mouse Properties window, look at the Device Instance Path property. It will be something like: HID\VID_046D&PID_C049&MI_00\7&25DD4DC&0&0000
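    The two ???? parts are simply the second and third components of the Device Instance Path, so the full key can be built by string substitution. A small sketch (the flipflop_key function is hypothetical; it only prints the key name and does not touch the registry):

```shell
#!/bin/bash
# Sketch: build the registry key that holds FlipFlopWheel from a mouse's
# Device Instance Path. String manipulation only; nothing touches the registry.
flipflop_key() {
    local path="$1"
    # Expect exactly HID\<first ????>\<second ????>
    if ! printf '%s\n' "$path" | grep -Eqx 'HID\\[^\\]+\\[^\\]+'; then
        printf 'unexpected Device Instance Path: %s\n' "$path" >&2
        return 1
    fi
    printf '%s\\%s\\Device Parameters\n' \
        'HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum' "$path"
}

flipflop_key 'HID\VID_046D&PID_C049&MI_00\7&25DD4DC&0&0000'
```

    For the example path this prints HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\HID\VID_046D&PID_C049&MI_00\7&25DD4DC&0&0000\Device Parameters, which is where FlipFlopWheel goes.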

    This is quite well documented on superuser.com, and there is even a link to a little .exe that automates the whole process for you, although I have not tried it, so I can’t vouch for it.