Home automation packages offer a huge selection of data import and integration tools, but they can never cover everything.
Many of us have some personal or hobby topics that we check periodically. These personal data points could be the wave height at a local surf spot, the bug levels at a favorite camping area, or the best time to go fishing.
In this blog I will look at how to scrape some hobby data points from web pages with just a single line of Bash code. I will then show how these results can be integrated into two home Internet of Things packages, Home Assistant and Node-Red.
Getting Started
Each of the different automation solutions offers web scraping tools. For example, Python has the Beautiful Soup library, Home Assistant has the Scrape sensor, and Node-Red has the Scrape-it flow. These web scraping tools are all very usable, but unfortunately they require a detailed knowledge of the HTML/Document Object Model of the requested page.
A simple alternative approach is to use the lynx text-based browser. This command line tool can strip out all the HTML tagging and dump just the text on a page. The output from lynx can be piped to common commands like grep, sed, and awk to filter out the required values. The advantage of the lynx approach is that you don't need to understand the internal HTML of a page, and it only takes one line of Bash to get what you need.
To install Lynx on Raspbian/Debian/Ubuntu:
sudo apt install lynx
Two Off-line Examples
The first step in developing custom web scraped data points is to find the required Bash commands. Working directly on either Home Assistant or Node-Red can be a little challenging. Luckily, you can do all your basic testing on a laptop and then once you’ve got things working you can move the code over to your IoT system.
Two personal interest topics for me are the pollen levels in a local hiking region, and the amount of snow at a ski resort that I'm planning to take the family to.
The Lynx -dump option will output a stream of text with HTML tags, HTML encoding and JavaScript removed. The command syntax that I'm looking for is:
lynx -dump http://somepage.com | "filter the output until I get the result"
The picture below shows how I used lynx to find the pollen level from a weather page. For this example I first looked at the top 10 lines in the output and compared them to the actual web page. A quick check showed that the pollen level value was on the sixth line. The sed utility can be used to delete all but the sixth line by setting its option to '6!d'.
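A quick way to do that first check from a terminal is to number the dumped lines so the line holding the value is easy to spot (a minimal sketch, using the same weather page URL as below):

# dump the page text, keep the first 10 lines, and number them
lynx -dump "https://www.theweathernetwork.com/en/city/ca/ontario/lions-head/pollen" | head -10 | cat -n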

The full Bash script to get the pollen level from my favourite hiking area is:
# define the weather page URL
theurl="https://www.theweathernetwork.com/en/city/ca/ontario/lions-head/pollen"
# get the pollen value (on the 6th line)
lynx -dump "$theurl" | sed '6!d'
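Hard-coding a line number is fragile if the page layout shifts. A handy way to re-locate the value is grep -n, which prints matching lines along with their line numbers (this assumes the word 'pollen' appears on or near the line of interest, which may not hold for every page):

# list the line numbers of any lines mentioning pollen
lynx -dump "$theurl" | grep -n -i 'pollen'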
The second example (picture below) uses lynx with a grep call to find the text "Top Lift" on a web page. In this snapshot, the output is returned as four words: Top Lift: 2.3 m. The snow depth (2.3) is the third word in the string. There are a few ways to extract words from a string, and for this example I used the awk utility.

The full Bash script to get the snow base for my upcoming ski trip is:
# define the ski resort URL
theurl="https://www.snow-forecast.com/resorts/Whistler-Blackcomb/6day/mid"
# find the line with "Top Lift", and
# then parse out the 3rd (snow) value
lynx -dump "$theurl" | grep 'Top Lift' | awk '{ print $3 }'
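The awk step can be sanity-checked on its own by piping the sample string from the snapshot through it:

# verify that awk picks out the third whitespace-separated word
echo "Top Lift: 2.3 m" | awk '{ print $3 }'
# prints: 2.3

A side benefit of awk here is that it splits on runs of whitespace, so any indentation that lynx adds to its dump doesn't change the word positions.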
Now that I’ve got the Bash commands for my personal data points I can move to the next step of adding them to my Home Assistant or Node-Red systems.
Home Assistant Command Line Sensors
The Home Assistant (HA) Command Line sensors offer an interface that allows the output from Bash commands to be used as HA viewable entities.
Before our earlier commands can be run, the Lynx utility needs to be installed. There are several ways to install applications into HA. The important point is that Lynx needs to be available in the same working space that the Command Line sensors run in. A simple way to ensure this is to use a Command Line sensor to install Lynx directly.
To install sensors, modify the /config/configuration.yaml file. This file can be accessed through the File Editor or Terminal HA add-ons, or via a secure shell (SSH) connection.
The picture below shows a temporary sensor called Install_Lynx that has been added to /config/configuration.yaml. This sensor will run the apk add command to install software. After this file is updated and saved, a Home Assistant restart will be required.
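For reference, the temporary entry might look something like the sketch below. This uses the legacy command line sensor syntax; newer Home Assistant releases define these sensors under a top-level command_line: key instead, so adjust to suit your version.

sensor:
  - platform: command_line
    name: Install_Lynx
    # one-time helper that installs Lynx into the HA environment
    command: "apk add lynx"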
Once the required software is installed, it is recommended that you remove this temporary "install" sensor, otherwise the system will try to re-install Lynx every 60 seconds.

Another approach would be to install the software only if it isn't there (note: the sensor is still run every minute by default). The command for this would be:
if ! command -v lynx > /dev/null; then apk add lynx; fi
After Lynx is installed some new command line sensors can be added that access the personal web page data. The image below shows a sample /config/configuration.yaml file with the Install_Lynx sensor removed and two new sensors added that use our Bash commands.
These web pages don't update too frequently, so the scan_interval is set to 1 hour (3600 seconds). It is also good to ensure that the Lynx command is given enough time to run, so I set the command_timeout to 15 seconds. As in the previous step, HA needs to be restarted after new sensors are added.
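Putting the pieces together, the two scraping sensors might look something like the sketch below. The sensor names are my own choice; the commands, interval and timeout come from the steps above, and the same legacy-versus-new syntax caveat applies.

sensor:
  - platform: command_line
    name: Pollen_Level
    # the pollen level is on the 6th line of the page dump
    command: "lynx -dump 'https://www.theweathernetwork.com/en/city/ca/ontario/lions-head/pollen' | sed '6!d'"
    scan_interval: 3600
    command_timeout: 15
  - platform: command_line
    name: Snow_Base
    # the snow depth is the 3rd word on the 'Top Lift' line
    command: "lynx -dump 'https://www.snow-forecast.com/resorts/Whistler-Blackcomb/6day/mid' | grep 'Top Lift' | awk '{ print $3 }'"
    scan_interval: 3600
    command_timeout: 15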

The final step is to put the new sensor entities into a viewable presentation. The Home Dashboard can be selected from HA's Overview option. The picture below shows the addition of a card that contains the newly created web scraped command line sensors.
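As a rough sketch, an Entities card for the two sensors could be defined as follows (the entity ids assume the sensor names used earlier):

type: entities
title: Hobby Data
entities:
  - sensor.pollen_level
  - sensor.snow_base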

Using Lynx with Node-Red
As with Home Assistant, the Lynx utility needs to be installed on the Node-Red system. For Raspbian/Debian/Ubuntu systems this is done with: sudo apt install lynx.
To test that Lynx is working within Node-Red, an inject, an exec, and a debug node can be used.

For my test I used the Bash statement that finds the pollen levels as the command in the exec node. It's important to note that the top output connector of the exec node carries the command's standard output (stdout).
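The command field of the exec node simply holds the same one-liner that was tested on the laptop:

lynx -dump "https://www.theweathernetwork.com/en/city/ca/ontario/lions-head/pollen" | sed '6!d'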
The next step is to create some logic that schedules the scraping of web pages and then shows the data on a web dashboard. The image below shows a Node-Red example that uses a Big Timer node to trigger the scraping of web pages. For the presentation of data, a Text dashboard node is used to show the pollen levels, and a Gauge dashboard node is used for the snow levels.

The realtime data is available at: http://my_node_red_ip:1880/ui.

Summary
To create a successful home automation solution, it's important to access and view all the relevant signals.
From my personal experience, I've found that once I had set up my key measuring and controlling devices I only looked at the system infrequently. However, since I started adding hobby topics to the system I've noticed that I'm using it a lot more often.