Written by Steve Perry
Published on June 29, 2018

Gathering data from log files

I learn more about the command line every day, at the moment especially when it comes to working with large text files and cleaning data. That, along with learning some Python, is a huge help when sifting through server logs and the like.

For example, I'm monitoring a very large Magento payment gateway log for errors. To grab only the entries from a known point in time onwards, rather than sifting through the whole log, I run:

$ awk '/2018-06-28T14:56/,0' gene_braintree.log > gene_from_2018_06_28T14:56.txt
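To show what that awk range pattern actually does, here's a small self-contained demo. The four log lines are invented for illustration and aren't from the real Braintree log; only the `/pattern/,0` idea comes from the command above.

```shell
#!/bin/sh
# A made-up four-line log; these entries are invented for illustration.
log=$(mktemp)
out=$(mktemp)
printf '%s\n' \
  '2018-06-28T14:50:01 INFO  request ok' \
  '2018-06-28T14:56:12 DEBUG payment failed' \
  '2018-06-28T15:02:40 INFO  request ok' \
  '2018-06-28T15:03:05 DEBUG payment failed' > "$log"

# /regex/,0 is an awk range pattern: printing starts at the first line
# matching the regex and stops only when the end condition is true.
# The end condition is the constant 0, which is never true, so every
# line from the first match to the end of the file is printed.
awk '/2018-06-28T14:56/,0' "$log" > "$out"
cat "$out"
# prints the last three lines (14:56:12, 15:02:40 and 15:03:05)
```

One thing to watch: the range only opens if some line actually matches. If nothing was logged during the 14:56 minute, the output file is simply empty, so it can pay to loosen the pattern (say, to the hour).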

I'm then left with just the data I want to look through. Small things like this are a real time saver once they add up across daily use. I'm also running:

$ grep 'failed' gene_braintree.log | grep 'DEBUG' > gene_all_DEBUG_failed.txt

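The first grep keeps the lines containing failed, and the second narrows those down to the ones that also contain DEBUG. That gets me every line with both words, and each of those lines happens to include a timestamp.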
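A quick check on another made-up log (again, invented lines): the two-grep pipeline keeps lines containing both words in either order. The single awk pass with `&&` is a swapped-in alternative, not what I used above; it's shown because it picks out the same lines in one process. `grep -c` counts the matches instead of printing them.

```shell
#!/bin/sh
# Made-up log lines for illustration only.
log=$(mktemp)
printf '%s\n' \
  '2018-06-28T14:56:12 DEBUG payment failed' \
  '2018-06-28T14:57:30 INFO  payment failed' \
  '2018-06-28T14:58:02 DEBUG payment ok' \
  '2018-06-28T15:03:05 failed: DEBUG retry' > "$log"

# Two greps: keep lines with "failed", then keep only those that also have "DEBUG".
piped=$(grep 'failed' "$log" | grep 'DEBUG')

# Alternative: one awk pass that requires both regexes to match the line.
single=$(awk '/failed/ && /DEBUG/' "$log")

# Same filter, but grep -c prints the number of matching lines.
count=$(grep 'failed' "$log" | grep -c 'DEBUG')

echo "$piped"
echo "matches: $count"
if [ "$piped" = "$single" ]; then
  echo "pipeline and awk agree"
fi
```

Both versions are case-sensitive, so a line containing "Failed" won't match. Adding `-i` to the greps handles that if the log mixes cases.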

As I'll be running these a few times over the next few days, I've put both commands into a bash script that runs the awk and grep steps and writes the output files. A quick $ sh script.sh now gives me the data I need from the log:

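#!/bin/bash

# Everything from the first 14:56 entry on 28 June 2018 to the end of the log
awk '/2018-06-28T14:56/,0' gene_braintree.log > gene_from_2018_06_28T14:56.txt
# Every line that contains both "failed" and "DEBUG"
grep 'failed' gene_braintree.log | grep 'DEBUG' > gene_all_DEBUG_failed.txt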
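If the log name or start time changes between runs, a variation that takes them as arguments saves editing the script each time. This is a sketch under my own assumptions: the function name `extract_entries`, the argument order and the output file names are hypothetical, and only the defaults come from the original script. It also swaps the regex for awk's `index()`, so the timestamp is matched as plain text.

```shell
#!/bin/bash
# Hypothetical variant of script.sh: log file and start time as arguments.
# Usage: bash script.sh [logfile] [start-timestamp]

# extract_entries LOGFILE START (hypothetical helper)
# Writes two files named after the log's base name:
#   <base>_from_<START>.txt   lines from the first one containing START to EOF
#   <base>_DEBUG_failed.txt   lines containing both "failed" and "DEBUG"
extract_entries() {
  local log="$1" start="$2" base
  if [ ! -f "$log" ]; then
    echo "No such log file: $log" >&2
    return 1
  fi
  base=$(basename "$log" .log)
  # index() does a plain substring match, so characters like "." in START
  # are not treated as regex metacharacters; ",0" keeps printing to EOF.
  awk -v start="$start" 'index($0, start), 0' "$log" > "${base}_from_${start}.txt"
  # grep exits 1 when nothing matches; that just means an empty file here.
  grep 'failed' "$log" | grep 'DEBUG' > "${base}_DEBUG_failed.txt"
  return 0
}

extract_entries "${1:-gene_braintree.log}" "${2:-2018-06-28T14:56}"
```

Running `bash script.sh other.log 2018-06-29T09:00` would then write `other_from_2018-06-29T09:00.txt` and `other_DEBUG_failed.txt`, while running it with no arguments falls back to the original log and time.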