6 Lines of Python to Plot SQLite Data

There are some great tutorials on SQL, SQLite and Matplotlib, however I wanted something super simple that I could use to teach my daughters with.

This blog documents my notes on what I used to teach them SQL, SQLite3 and then how to plot the results in Python. Yes it can only takes 6 lines of Python code to read an SQLite3 file and plot it, if you are doing 2 line charts with a title and legends its 9 lines.

Getting Started with SQL

There are some great SQL databases and SQL tools that can are quite user friendly.

For us I started with SQLite3 which is file based, no server component is required and it runs on all major OS’s and Raspberry Pi’s. There are a number of SQL Query builder tools but we found the DB Browser for SQLite gave us all the functionality we needed.

Check the install documentation for your hardware/OS, to install SQLite3 and DB Browser on Debian/Ubuntu/Raspberry Pi’s :

sudo apt install sqlite3
sudo apt-get install sqlitebrowse

SQLite Databases

For testing there are a lot of excellent databases that can downloaded from Kaggle. These data files are in CSV format and they can be imported into a SQLite database file using the DB Browser.

For our first test we used the Kaggle SARS 2003 data set.

A new database file (sars.db) was created and then the CSV was imported.

Create VIEWS to Simplify

SQL Views can be created to simply the field names, reduce the number of fields or add custom fields.

Use the “Execute SQL” tab to create a view. An example to create a view with simplified field names would be:

Views and SELECT queries can be generated to add custom fields. An example to extract the month from the date field and add 2 other fields (month number and month name):

Test SQL Queries

The DB Browser tool is good for testing out SQL queries and plot the data before moving to Python.

A query to find the worst 5 countries affected by SARS would:

select sum(deaths) sum_deaths, country from v_sars group
  by Country having sum_deaths > 0 order by sum_deaths desc limit 5

Plotting in Python

There are number of ways to plot SQL data in Python. I found that the easiest way was to use Pandas dataframes.

To load the necessary Python libraries:

pip install sqlite3
pip install pandas
pip install matplotlib

The Python code to connect to SQLite3, run the SQL query (of sum of deaths vs. country) and plot the data is:

#
# sql_bar0.py - Sqlite to Bar Charts
#
import sqlite3, pandas , matplotlib.pyplot as plt

conn = sqlite3.connect("/media/pete/RED_4GB/sql/sars.db")

sql = """select sum(deaths) sum_deaths, country from v_sars group
  by Country having sum_deaths > 0 order by sum_deaths desc limit 5"""

data = pandas.read_sql(sql, conn)
#x values: data.Country,  y values: data.sum_deaths
plt.bar(data.Country, data.sum_deaths)
plt.title("SARS Death in 2003")
plt.show()

An example with 2 lines to shows the monthly deaths and cases would be:

#
# sql_line2.py - Sqlite to 2 Line Charts
#
import sqlite3, pandas , matplotlib.pyplot as plt

conn = sqlite3.connect("/media/pete/RED_4GB/sql/sars.db")

sql = """select s_month, sum(deaths) as sum_deaths, sum(cases) as sum_cases from v_month group by n_month"""

data = pandas.read_sql(sql, conn)

plt.plot(data.s_month,data.sum_deaths, label = "Deaths")
plt.plot(data.s_month,data.sum_cases, label = "Cases")
plt.legend()
plt.title("SARS Death in 2003")
plt.show()

Summary

By keeping the Python code simple we were able to focus on SQL queries.

Using the basic Python code the SQL connection we later changed from SQLite3 to MySQL or Progresql.

GnuPlot: Realtime Plots in 20 lines

There are some excellent charting and plotting packages but if you’re like me you sometimes want to do a quick test plot to capture some realtime data.

Gnuplot has been around for quite awhile and I was happily surprised at what it can do. I was amazed that I could make real-time bar and line charts in only 20 lines of scripting code. (For simple Bash applications it is possible to create charts in just 1 line.)

In this blog I’ll introduce Gnuplot and show two examples. The first will show the status Raspberry Pi I/O pins, and the second example will be a line chart of CPU diagnostics.

Getting Started

Gnuplot can be installed on Linux, Windows, and Mac. To install Gnuplot on Ubuntu:

sudo apt-get install gnuplot

Gnuplot is typically run as a command line utility, but it can also be run manually. When Gnuplot is run manually the charting instructions and data values can be inserted directly. Below is an example where 4 sets of data points are plotted in a line chart.

pi@raspberrypi: $ gnuplot

gnuplot> $Mydata << EOD
# Now enter some data                                
2 1
3 1.5
4 2.1
5 3.3
EOD
gnuplot> plot $Mydata with line

Data block names must begin with a $ character, which distinguishes them from other types of persistent variables. The end-of-data delimiter (EOD in the example) may be any sequence of characters. For this example the plot command will use the data in the $Mydata variable and create a line chart.

A Static Bar Chart

A simple but useful example of Gnuplot would be to show the realtime status of the Raspberry Pi General Purpose Input/Output (GPIO) pins as a bar chart.

A bar chart presentation can be created using a data file of:

# gpio.dat - data file for GPIO pin values
# column1 = chart position, column2 = heading, column3 = value
0 GPIO2 0
1 GPI03 1
2 GPI04 1
#....

To plot a bar chart the fill style and bar width needs to be defined. The plot using 1:3:xtic(2) argument will make the first column in the data file the x-position, the third column the y-value and the x-labels be the second column. The interactive commands to plot the file:

pi@raspberrypi:~/pete/gnuplot $ gnuplot 

gnuplot> set style fill solid
gnuplot> set boxwidth 0.5
gnuplot> plot "gpio.dat" using 1:3:xtic(2) with boxes title ""

Real-Time Bar Chart of PI GPIO

The previous example used a manually created gpio.dat data file. The status of GPIO pins can be found using the gpio command line utility. To get the status of GPIO pin 9:

gpio read 9

By adding some Bash and awk script it is possible to create a gpio.dat file:

$ gpio read 9
1
$ gpio read 9 | awk '{ print "9 GPIO9 " $1 }'
9 GPIO9 1
$ gpio read 9 | awk '{ print "9 GPIO9 " $1 }' > gpio.dat
$ cat gpio.dat
9 GPIO9 1

To make a dynamic bar chart I created a Gnuplot script file (gpio_bars.txt). The Gnuplot scripting language is quite powerful, and it support a wide range of functions and control statements.

Rather than manually adding lines for each GPIO pin status a for loop can iterate from pins 2-29. A system command is used to run the gpio utility and bash commands. To refresh the data a replot and a pause command is used.

# Create Dynamic bar that read GPIO pins every 5 seconds
#
set title "PI GPIO Data"
set boxwidth 0.5
set style fill solid

# Create a dummy file to get started
system "echo '0 GPIO2 1' > gpio.dat"

plot "gpio.dat" using 1:3:xtic(2) with boxes title ""

while (1) {  # make a new 'gpio.dat' every cycle with fresh data
  system "echo '' > gpio.dat"
  do for [i=2:29] {
    j = i-2 # put first GPIO pin at position 0
    system "gpio read " .i.  "  | awk '{ print  \"" . j . " GPIO" . i . " \" $1 }' >> gpio.dat
  }
  replot
  pause 5
}

To run this script enter:

gnuplot -persist gpio_bars.txt

A Simple Line Chart

A bar chart presentation can be created using a data file of:

# GPU.dat - a time stamp with two data points
18:48:30 51.0 49.0
18:48:40 50.5 49.5
18:48:45 51.5 49.0
18:48:50 50.0 50.5
18:48:55 50.5 49.5

The interactive Gnuplot commands to show a line chart of this data would be:

pi@raspberrypi: $ gnuplot

gnuplot> set xdata time
gnuplot> set timefmt "%H:%M:%S"
gnuplot> set format x "%H:%M:%S"
gnuplot> plot "gpu.dat" using 1:2 with line title "GPU temp" ,\
>"gpu.dat" using 1:3:3 with line title "CPU temp"  

A few extra lines are needed in the Gnuplot script. First the plot needs to know that the x-axis is time data, and it needs to know the format of the time data. Multiple data points can be plotted at the same time, and the using argument tells Gnuplot the x:y-point data. (If the data file had a third point the using reference would be 1:4:4 ).

A Real-Time Line Chart

Linux has a lot of useful command line trouble shooting tools, one of these is the sensors utility that allows users to get fan speed and temperatures.

$ sensors
dell_smm-virtual-0
Adapter: Virtual device
Processor Fan: 2676 RPM
CPU: +47.0°C
Ambient: +38.0°C
SODIMM: +37.0°C
...

Using some Bash and Awk commands it is possible to get the fan speed and CPU temperature:

$ sensors | grep RPM
Processor Fan: 2685 RPM
$ sensors | grep RPM | awk '{print $3}'
2685

$ sensors | grep CPU
CPU: +50.0°C
$ sensors | grep CPU | awk '{print $2}'
+48.0°C
$ sensors | grep CPU | awk '{print substr($2,2,4)}'
48.0

Awk supports a systime() call to return the present date/time, and a strftime() call to customize the presentation of the time/date. (Note: a Raspberry Pi might need to have gawk installed to get this functionality, use “sudo apt-get install gawk”).

The next step is to format the sensor output with a timestamp:

$ sensors |grep RPM | awk '{print strftime("%H:%M:%S ", systime()) $3}'

10:26:46 2687

$sensors | grep CPU | awk '{print strftime(\"%H:%M:%S \",systime()) substr($2,2,4)}'

10:27:46 49.0

Now that a time and value string can be generated a Gnuplot script can be created (line_fan_cpu.txt) to show real time data.

To make the Bash code a little easier two data files are created, fan.dat and cpu.dat.

The plot has to account for different scale ranges, so y2range and y2label definitions are used. The final script addition is to include an axes (x1y2 or x1y2) to each plot point, this lines up the data value to the right or left y-axis.

The complete Gnuplot script to show fan speed and CPU temperature is only 20 lines of code!

# Create a Plot or User and System CPU Usage, update every 5 seconds
#
set title "GnuPlot - Fan Speed  and CPU Temperature"
set yrange [2650:2700] 
set ylabel "Fan Speed"
set y2range [43:49] 
set y2label "CPU Temp (C)"
set y2tics
set xdata time
set timefmt "%H:%M:%S"
set format x "%H:%M:%S"

system "sensors |grep RPM | awk '{print strftime(\"%H:%M:%S \", systime()) $3}' > fan.dat"
system "sensors | grep CPU | awk '{print strftime(\"%H:%M:%S \",systime()) substr($2,2,4)}'  > cpu.dat"

plot "fan.dat" using 1:2  with lines axes x1y1 title "fan speed (RPM)",  "cpu.dat" using 1:2 with lines axes x1y2 title "CPU Temp (C)"
while (1) {
	pause 5
	system "sensors |grep RPM | awk '{print strftime(\"%H:%M:%S \", systime()) $3}' >> fan.dat"
	system "sensors | grep CPU | awk '{print strftime(\"%H:%M:%S \", systime()) substr($2,2,4)}'  >> cpu.dat"
	replot
}

To run this script enter: $ gnuplot -persist line_fan_cpu.txt

Final Comments

I won’t give up using plotting packages like MatPlotlib or ggplot, but I was very impressed how easy it was to create real-time plots using Gnuplot.

Manipulating the Bash/awk script can be a little complex but it’s incredible useful to be able to use output from almost any command line utility in Gnuplot.

Gnuplot can plot a very large number of data points, but it makes sense to a tail command to only show the latest x-number of points.

Curve Fits for Kids

I was helping my daughters with some school projects in graphing, and I thought that it would be good  to document and share our discussion. This blog will look at solving the equations for plots using Desmos, Google Docs and Excel.

Desmos

Desmos (https://www.desmos.com/calculator) is great online free graphing package that is used in many school systems. Desmos has a lot of options to simulate, tweek and view data.

To add a table of data, use the “+” button and select table.

demos1

Once the data table has been entered an f(x) expression can be linked to the data. In the example below a linear expression was used. It’s important to note, two things:

    • The subscript must match. So if the table uses: y2  x2 then the formula needs to use the same.
    • A ~ (tilda) is used instead of an = (equals) sign.

 

demos2

If the formula matches up with the table a plot will be drawn and the slope, intercept and r-squared (goodnes of fit) values will be shown.

We found that for simple first order (straight line curves) Desmos worked really well, however for polynomials and other more advanced equations it was a little awkward.

Google Docs

Our school board uses Google Classroom which is an online suite of tools that enables kids to do Microsoft Excel, Word and Powerpoint in a free web based environment.

The Sheets options (https://docs.google.com/spreadsheets) of Google Docs works very much like Excel. To do some charting, enter a table of data and use the “chart” button to present the desired plot.

In the “Customize” options of the chart, under Series, a trend line can be added with an equation and a R-squared (fitness of curve) text.

googledoc1

It is also possible to use the SLOPE, INTERCEPT and RSQ formulas show the results in a cell.

The regression type supports functions such as log, exponential, and polynomial. For the example below we got the equation for a 2nd order polynomial.

googledoc2

Note: Sheets can import and export to and from Excel.

Excel

Different versions of Excel (and LibreOffice) will have a slightly different way to get the stats on a curve. For our work we used Excel from Office 2016.

In the Chart Elements, under Trend Line, the more Options items will offer features that are almost identical to Google Sheets.

excel1

A type of trend line can be selected and the solved equation and R-squared value can be shown on the plot.

excel2

Advanced equations like higher order polynomials can also be used.

excel3

Excel (and LibreOffice) support function calls for : SLOPE, INTERCEPT and RSQ.

 

Summary

Desmos is an excellent package for kids to play with and understand the different plot types, but it probably isn’t the best tool for doing curve fits of test data for projects.

Excel and Google Sheets are excellent for most charting and plot statistical projects.