Using AWK in Bash Scripts

In the past Python has been my go-to language for quick scripts. Lately, however, I’ve done a lot of projects where I’ve needed to use small Bash scripts.

I found that by adding a little bit of AWK to my Bash scripts, I’ve been able to do something in one line of Bash/AWK that would have taken me multiple lines of Python.

AWK is named after its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. It is an old school (1977) text extraction and reporting tool.

The nice thing about AWK is that you only need to learn a couple of commands to make it useful.

Get a Specific Row and Column Item

The iostat command can be used to show CPU stats. To get the idle time, the 6th item in the 4th row needs to be accessed.

The AWK code reads the piped output, and the AWK NR (row number) variable can be used to filter just that row. The CPU idle time is item 6 (variable $6):

~$ iostat | awk '{if (NR==4) print $6}'
96.92

AWK logic needs to be in single quotes, and curly brackets group statements together. This logic says: if the Number of Records (NR) variable is 4, print the 6th item.
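The same selection can also be written in AWK's pattern-action form, where the condition sits in front of the curly brackets instead of inside an if. The sketch below uses a made-up three-row sample rather than real iostat output, so the values are purely illustrative:

```shell
# Hypothetical sample input standing in for a command's output
sample='a1 a2 a3
b1 b2 b3
c1 c2 c3'

# Condition-in-front form: for row 2, print the 3rd item
item=$(printf '%s\n' "$sample" | awk 'NR==2 {print $3}')
echo "$item"
```

Both forms behave the same here; the shorter one is handy for one-liners.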

Integer and Float Math, Variables and Formatting

Managing integers and floats in Bash can be a little challenging. Luckily, awk can offer some help.

Below is an example of float math and printing (yes… using a let with bc might be easier… but this is an example):

$ # Do math with printf
$ echo "3 4" | awk '{a=$1; b=$2; printf "%0.2f \n", (a / b) }' 
0.75
 
$ # Do math in awk and format the print (4 decimals)
$ echo "3 4" | awk '{c=$2/$1; printf "%0.4f \n", c }' 
1.3333 

$ # Send awk output to a variable
$ d=$(echo "3 4" | awk '{c=$2/$1; printf "%0.4f\n", c }') 
$ echo $d
1.3333
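When the numbers already live in Bash variables, the echo and pipe can be skipped. A minimal sketch, assuming two shell variables named a and b (names chosen only for this example): awk's -v option copies each value into an awk variable, and a BEGIN block runs without needing any input.

```shell
# Hypothetical shell variables to divide
a=3
b=4

# -v copies shell values into awk variables; BEGIN runs without reading input
ratio=$(awk -v a="$a" -v b="$b" 'BEGIN { printf "%.2f", a / b }')
echo "$ratio"
```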

Below is an example of getting just the CPU temperature from the sensors utility and stripping out the “+” and “°C”:

$ sensors
dell_smm-virtual-0
Adapter: Virtual device
Processor Fan: 2706 RPM
CPU:            +44.0°C  
Ambient:        +37.0°C  
SODIMM:         +36.0°C  

$ sensors | grep CPU
CPU:            +44.0°C  

$ sensors | grep CPU | awk '{printf "%d\n", $2}'
44
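The grep step is optional, since AWK can match a line on its own. The sketch below feeds in a copy of the sensors lines shown above (hard-coded here so it runs without the sensors utility) and uses a regular expression in front of the curly brackets to pick the CPU row:

```shell
# Sample lines copied from the sensors output above
sample='Processor Fan: 2706 RPM
CPU:            +44.0°C
Ambient:        +37.0°C'

# /^CPU:/ selects the row; %d keeps only the leading number of "+44.0°C"
temp=$(printf '%s\n' "$sample" | awk '/^CPU:/ {printf "%d", $2}')
echo "$temp"
```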

Formatting Text File Output

To get specific columns, print them as required. The example below only prints columns 1 and 3:

$ cat pi_data.txt
time temp wave(ft) comments
---- ---- -------- --------
10:00 24   3       No wind
12:00 26   5       High winds
14:00 25   4       wind calming down

$ # print columns 1 and 3 with a tab between
$ cat pi_data.txt | awk '{print $1 "\t" $3}'
time	wave(ft)
----	--------
10:00	3
12:00	5
14:00	4

Output can be filtered based on an item in a row. For example, only print the row if the 1st item contains a digit:

$ cat pi_data.txt | awk '{if ( $1 ~ /[0-9]/ ) print $0}'
10:00 24   3       No wind
12:00 26   5       High winds
14:00 25   4       wind calming down
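The /[0-9]/ test matches any 1st item that contains a digit anywhere. If that turns out to be too loose, the pattern can be anchored. The sketch below assumes the time column is always HH:MM and adds a made-up row ("run2") to show the difference:

```shell
# Rows based on pi_data.txt, plus a hypothetical "run2" row
sample='time temp wave(ft) comments
run2 20 1 made-up row
10:00 24 3 No wind'

# ^ and $ force the whole 1st item to look like HH:MM
rows=$(printf '%s\n' "$sample" | awk '$1 ~ /^[0-9][0-9]:[0-9][0-9]$/')
echo "$rows"
```

With the unanchored pattern the "run2" row would also be printed.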

If-Else Logic

More complex logic can be added with if-else logic. Below is an example that pipes only the data and then it changes a column value to a string based on a condition:

$ cat pi_data.txt
time temp wave(ft) comments
---- ---- -------- --------
10:00 24   3       No wind
12:00 26   5       High winds
14:00 25   4       wind calming down

$ # Show time and small or medium for wave size
$ cat pi_data.txt | \
>   awk '{if ( $1 ~ /[0-9]/ ) print $0}' | \
>   awk '{if ($3 < 4) {print $1 "\t small"} else { print $1 "\t medium"} }'
10:00	 small
12:00	 medium
14:00	 medium

A single AWK command to adjust the title and then change the data:

$ cat pi_data.txt | \
>   awk '{if ( $1 ~ /[0-9]/ ) \
>            { \
>               {if ($3 < 4) {print $1 "\t small"} else { print $1 "\t medium"} } \
>        } else { print $1 "\t " $3} \
>        }'
time	 wave(ft)
----	 --------
10:00	 small
12:00	 medium
14:00	 medium
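If the two title rows are not needed at all, a more compact variation is possible. This is only a sketch and assumes the data always starts on row 3: NR > 2 skips the titles, and AWK's ?: operator picks the label.

```shell
# Rows based on pi_data.txt above
sample='time temp wave(ft) comments
---- ---- -------- --------
10:00 24 3 No wind
12:00 26 5 High winds'

# NR > 2 skips the two title rows; ?: chooses small or medium
labels=$(printf '%s\n' "$sample" | awk 'NR > 2 { print $1, ($3 < 4 ? "small" : "medium") }')
echo "$labels"
```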

Math on a Column of Data

Bash is easy to use for a row of data, but I find it tricky on columns of data.

As an example, to get the total size of all SQLite files, ls -l *.db is piped to an awk statement. This awk statement has two parts: the first part runs line by line and sums column five (sum += $5), and the second part, marked with END, runs once after the last line and prints the result.

pete@lubuntu:~/dbs$ ls -l *.db 
-rw-r--r-- 1 pete pete  323584 Apr 14  2020 ebola.db
-rw-r--r-- 1 pete pete 5124096 Apr 14  2020 netflix.db
-rw-r--r-- 1 pete pete   98304 Apr  8  2020 sars.db
-rw-r--r-- 1 pete pete  503808 Apr 14  2020 schools.db
-rw-r--r-- 1 pete pete  208896 Apr  8  2020 schools_old.db
-rw-r--r-- 1 pete pete    8192 Feb  8 20:11 someuser.db
pete@lubuntu:~/dbs$ ls -l *.db | awk '{sum +=$5 } END {print "Total= " sum}'
Total= 6266880
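The END block can report more than a total. In the sketch below the sizes are made-up round numbers (not the files listed above), and NR in the END block holds the number of rows that were read, which gives a count and an average:

```shell
# Hypothetical file sizes, one per row
sizes='100
200
600'

# Sum every row, then report count, total and average once at the end
stats=$(printf '%s\n' "$sizes" | \
  awk '{ sum += $1 } END { if (NR > 0) printf "files=%d total=%d avg=%.1f", NR, sum, sum / NR }')
echo "$stats"
```

The NR > 0 check avoids a divide-by-zero when there is no input.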

I used this approach to find how much power is being consumed on all my USB ports:

pete@lubuntu:~/dbs$ lsusb -v  2>&- | grep -E  'Bus 00|MaxPower'
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
    MaxPower                0mA
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    MaxPower                0mA
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
    MaxPower                0mA
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    MaxPower                0mA
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
    MaxPower                0mA
Bus 003 Device 004: ID 413d:2107  
    MaxPower              100mA
Bus 003 Device 003: ID 04b3:310c IBM Corp. Wheel Mouse
    MaxPower              100mA
Bus 003 Device 002: ID 1a40:0101 Terminus Technology Inc. Hub
    MaxPower              100mA
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    MaxPower                0mA

pete@lubuntu:~/dbs$ (lsusb -v 2>&- | grep MaxPower | grep -o -E '[0-9]+' ) | awk '{ sum += $1} END {print "\nTotal= " sum " mA"}'

Total= 300 mA
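The two grep steps could probably be folded into the awk statement. The sketch below runs against a few lines copied from the lsusb listing above (hard-coded so it does not need lsusb) and relies on awk using only the leading number when a string such as "100mA" is used in arithmetic:

```shell
# A few lines copied from the lsusb output above
sample='Bus 003 Device 004: ID 413d:2107
    MaxPower              100mA
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    MaxPower                0mA
Bus 003 Device 003: ID 04b3:310c IBM Corp. Wheel Mouse
    MaxPower              100mA'

# Only MaxPower rows are summed; $2 + 0 turns "100mA" into 100
total=$(printf '%s\n' "$sample" | awk '/MaxPower/ { sum += $2 + 0 } END { print sum " mA" }')
echo "$total"
```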

Finding and Killing a Task

There are a number of different approaches to doing this. To find and kill a number of tasks:

#!/bin/bash
#
# stop_task.sh - stop a task
#
task1="edublocks"

echo "Stopping $task1..."
ps -e | grep -E "$task1" | \
 awk '{print $1}' | xargs sudo kill -9 1>&-

# another option:
ps -e | grep "$task1" | awk '{system("sudo kill " $1 "  1>&-")}'
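Before killing anything it can be worth printing what would be killed. The sketch below is a dry run on a made-up ps -e listing (the PIDs and the "edublocks-helper" row are invented). It compares the CMD column exactly instead of using grep, so similarly named tasks are left alone:

```shell
# Hypothetical "ps -e" style listing: PID TTY TIME CMD
sample='  PID TTY          TIME CMD
  812 ?        00:00:04 edublocks
  990 ?        00:00:00 edublocks-helper
 1204 pts/0    00:00:00 bash'

task1="edublocks"

# Exact match on column 4; only print the command instead of running it
plan=$(printf '%s\n' "$sample" | awk -v name="$task1" '$4 == name { print "kill " $1 }')
echo "$plan"
```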

Some Useful AWK Statements

There is an enhanced version of AWK, GAWK (GNU AWK), that might already be installed on your Linux system. If you are on a Raspberry Pi you can install GAWK with:

sudo apt-get install gawk

There are some excellent tutorials on AWK. Below are some of the commands that I’ve found useful:

substr(string,position,length) – get part of a string:

For example, substr can be used to get the CPU temperature from the sensors utility:

~$ sensors | grep CPU | awk '{print substr($2,2,4)}'
44.0

The substr() function takes the 2nd item (+44.0°C), starts at the 2nd character, and returns 4 characters.
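A fixed length of 4 works for readings like +44.0°C, but a reading such as +100.0°C would be cut to "100.". One possible alternative, sketched here on a hard-coded copy of the CPU line, is gsub(), which deletes every character that is not a digit or a dot:

```shell
# CPU line copied from the sensors output above
reading='CPU:            +44.0°C'

# gsub removes everything in $2 except digits and the decimal point
temp=$(printf '%s\n' "$reading" | awk '{ gsub(/[^0-9.]/, "", $2); print $2 }')
echo "$temp"
```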

The AWK print statement can be used with an if statement to show a filtered list.

An example of this would be to filter the ps (snapshot of the current processes) command, and print only lines with a time showing:

~$ # SHOW ALL PROCESSES
~$ ps -e 
   PID TTY          TIME CMD
     1 ?        00:00:03 systemd
     2 ?        00:00:00 kthreadd
     4 ?        00:00:00 kworker/0:0H
     6 ?        00:00:00 mm_percpu_wq
     7 ?        00:00:00 ksoftirqd/0
     8 ?        00:01:10 rcu_sched
...
~$  # SHOW ONLY PROCESSES WITH TIME
~$ ps -e | awk '{if ($3 != "00:00:00") print $0}'
   PID TTY          TIME CMD
     1 ?        00:00:03 systemd
     8 ?        00:01:10 rcu_sched
    10 ?        00:00:06 migration/0
    15 ?        00:00:03 migration/1
...

systime() / strftime() – get time and format time:

These time functions (available in GAWK) allow you to add time stamps and then do formatting on the date/time string. I found this useful in logging and charting projects. An example that adds a time stamp to the Core temperature lines from the sensors utility would be:

$ sensors | grep Core
Core 0:        +33.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +33.0°C  (high = +100.0°C, crit = +100.0°C)

$ sensors | grep Core | awk '{print strftime("%H:%M:%S ",systime()) $0 }'
11:06:18 Core 0:        +33.0°C  (high = +100.0°C, crit = +100.0°C)
11:06:18 Core 1:        +33.0°C  (high = +100.0°C, crit = +100.0°C)
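systime() and strftime() are extensions, so an AWK without them needs another route. A minimal fallback sketch, assuming the date command is available: Bash builds the time stamp and hands it to awk with -v (the variable name ts is only for this example).

```shell
# Sample line standing in for the sensors output
line='Core 0:        +33.0°C'

# date runs once, so every row gets the same time stamp
stamped=$(printf '%s\n' "$line" | awk -v ts="$(date +%H:%M:%S)" '{ print ts, $0 }')
echo "$stamped"
```

Unlike strftime() inside the awk statement, the time here is taken once before awk starts, which is usually fine for short commands.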

AWK Script Files

There are a number of different ways to run AWK scripts. To execute a file that is 100% AWK:

# Pass an input file to AWK
awk -f awkscriptfile  inputfile
# Or with a pipe
cat inputfile | awk -f awkscriptfile
# Pass an input file and have AWK create an output file
awk -f awkscriptfile  inputfile > outputfile
# Or with a pipe
cat inputfile | awk -f awkscriptfile > outputfile
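As a small end-to-end illustration of the -f option, the sketch below writes a one-rule AWK script to a temporary file and runs it on two made-up rows. The script content and the input are invented for this example:

```shell
# Write a tiny AWK script to a temporary file
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first item of every row in upper case
{ print toupper($1) }
EOF

# Run the script file against piped input, then clean up
result=$(printf 'alpha 1\nbeta 2\n' | awk -f "$script")
rm -f "$script"
echo "$result"
```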

Multiple lines of AWK can be used in a Bash file, using the AWK BEGIN and END statements. Below is an example that converts a CSV file to an OPC UA XML file. The first AWK section prints a header, and the second section reads the input file line by line (getting line variables $1, $2…) and formats the output:

#!/usr/bin/bash
# csv2xml.sh - create an OPC UA XML file from CSV
# 
 
# add the xml header info
awk ' BEGIN {
  print "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
  print "<UANodeSet xmlns=\"http://opcfoundation.org/UA/2011/03/UANodeSet.xsd\"" 
  print "           xmlns:uax=\"http://opcfoundation.org/UA/2008/02/Types.xsd\""
  print "           xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"" 
  print "           xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
  print "<NamespaceUris>"
  print "  <Uri>http://192.168.43.228/demo/</Uri>"
  print "</NamespaceUris>"
}'

# Read the input CSV format and process to XML
awk ' BEGIN { FS="," }  # split fields on commas, set before any line is read
{
# Skip any comment lines that start with a #
  if ( substr($1,1,1) != "#" )
  {
    i = i+1
    print "<UAVariable BrowseName=\"1:"$1"\" DataType=\"Int32\" NodeId=\"ns=1;i="i"\" ParentNodeId=\"i=85\">"
    print "  <DisplayName>"$1"</DisplayName>"
    print "  <Description>"$2"</Description>"
    print "      <References>"
    print "        <Reference IsForward=\"false\" ReferenceType=\"HasComponent\">i=85</Reference>"
    print "      </References>"
    print "    <Value>"
    print "      <uax:Int32>"$3"</uax:Int32>"
    print "    </Value>"
    print "</UAVariable>"
  }   
}
END{ print "</UANodeSet>"} '

The script and output for an example CSV file:

$ # A simple  CSV file
$ cat tags.csv
# field: tag, description, default-value
TI-101,temperature at river, 25
PI-101,pressure at river, 14

$ # Read the CSV and format an XML output 
$ cat tags.csv | ./csv2xml.sh
<?xml version="1.0" encoding="utf-8"?>
<UANodeSet xmlns="http://opcfoundation.org/UA/2011/03/UANodeSet.xsd"
           xmlns:uax="http://opcfoundation.org/UA/2008/02/Types.xsd"
           xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<NamespaceUris>
  <Uri>http://192.168.43.228/demo/</Uri>
</NamespaceUris>
<UAVariable BrowseName="1:TI-101" DataType="Int32" NodeId="ns=1;i=1" ParentNodeId="i=85">
  <DisplayName>TI-101</DisplayName>
  <Description>temperature at river</Description>
      <References>
        <Reference IsForward="false" ReferenceType="HasComponent">i=85</Reference>
      </References>
    <Value>
      <uax:Int32> 25</uax:Int32>
    </Value>
</UAVariable>
<UAVariable BrowseName="1:PI-101" DataType="Int32" NodeId="ns=1;i=2" ParentNodeId="i=85">
  <DisplayName>PI-101</DisplayName>
  <Description>pressure at river</Description>
      <References>
        <Reference IsForward="false" ReferenceType="HasComponent">i=85</Reference>
      </References>
    <Value>
      <uax:Int32> 14</uax:Int32>
    </Value>
</UAVariable>
</UANodeSet>

This example could have been easily done with Python, but the Bash/AWK version will actually require fewer lines of code because input and output files don’t need to be defined (Bash uses pipes and redirection).
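One detail in the XML output: the values appear as " 25" and " 14" because the CSV has a space after each comma. If that matters, the field separator can be a regular expression that swallows the spaces. A minimal sketch on one row of tags.csv:

```shell
# One row from tags.csv
row='TI-101,temperature at river, 25'

# The separator is a comma together with any spaces around it
value=$(printf '%s\n' "$row" | awk -F'[ ]*,[ ]*' '{ print "<uax:Int32>" $3 "</uax:Int32>" }')
echo "$value"
```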

Final Comments

I’ve found that learning a little bit of AWK has really paid off.

AWK supports a lot of functionality, and it can be used to create full-on scripting applications with user inputs, file I/O, math functions and shell commands. Despite all this, I’ll stick to Python if things get complex.
