My adventures with AWS IoT and beyond
It started out pretty simple. I had just completed the bulk of a commissioned Raspberry Pi-based embedded application project, and thought it would be a good idea to add remote logging to make future troubleshooting easier. Other than a little EC2 Linux server configuration and usage a few years ago, my experience with most of Amazon Web Services was essentially non-existent. But based on what I researched, it looked like it would be fairly easy to achieve AWS remote logging using a Python library called watchtower (available on PyPI and based on boto3). And it was. Very easy in fact:
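As a minimal sketch of the usual watchtower pattern (not necessarily the exact code from this project), attaching its handler to a standard Python logger looks roughly like this. The logger name `my_app` and the `build_logger` helper are illustrative assumptions, and the handler is created with no arguments so that it falls back to watchtower's defaults and the normal boto3 credential lookup.

```python
import logging


def build_logger(name="my_app", handler=None):
    """Return a logger whose records are also sent to CloudWatch Logs.

    `name` is a placeholder. Passing a `handler` replaces the CloudWatch
    handler, which is handy for trying the code without AWS credentials.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if handler is None:
        # Imported lazily so this module loads even where watchtower
        # (pip install watchtower) is not installed.
        import watchtower

        # With no arguments, watchtower uses its default log group and
        # finds AWS credentials through the standard boto3 lookup chain.
        handler = watchtower.CloudWatchLogHandler()
    logger.addHandler(handler)
    return logger
```

Calling `build_logger().info("Application started")` on a machine with valid AWS credentials should then produce a log event in CloudWatch Logs.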
And that was it. I now had remote logging to AWS CloudWatch from my application. The only small catch was that I had to create and configure an Amazon Web Services account. This included setting up the necessary security credentials to allow my application to connect to my newly minted AWS account. In retrospect, this ended up being the more difficult part of the task. But it certainly wasn’t insurmountable.
Since I have a general interest in IoT technology, and had already gone through the trouble of setting up an AWS account, I started looking into what else AWS had to offer. And one thing led to another.
Since I wanted to be able to remotely manage the Raspberry Pi, I started by giving the AWS Simple Management Service a try. Amazon had developed an SMS Agent for the ARM platform that the Raspberry Pi runs on, so it seemed like the logical thing to do. After getting everything installed and configured, it seemed to mostly do what I wanted. But then I found out that the SMS Agent itself had been deprecated in favor of some other Amazon service offerings. And then I couldn’t get it to start consistently at boot time. That issue was definitely a deal killer. So I wiped it off the operating system and deleted it from my AWS account. I was not off to a good start.
Next up was AWS IoT Core, and more specifically, the IoT Greengrass service. This ultimately proved to be very useful, and was central to everything else that followed. It didn’t directly solve my remote shell requirement, but it opened up a number of options for collecting information from the Raspberry Pi. The framework is pretty involved, but once you understand where all the pieces go (running in the cloud or on the Raspberry Pi itself), it all starts to make some sense.
With the Greengrass client installed on the Raspberry Pi, I used the CloudWatch Metrics connector to easily send operational information like CPU temperature and disk usage up to CloudWatch. Once data was being collected, setting alarms in CloudWatch for specific conditions was trivial. Connecting those alarms to the AWS Simple Notification Service in order to receive e-mail and text notifications when the alarms were triggered was also very easy to do.
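To make the metrics step more concrete, here is a hedged sketch of gathering the two readings and wrapping them in the kind of request message the Greengrass CloudWatch Metrics connector accepts. The topic name and message layout follow my understanding of the connector documentation and should be checked against it. The namespace `RaspberryPi`, the metric names, and the helper names are assumptions made for illustration.

```python
import json
import shutil
import time

# Input topic of the CloudWatch Metrics connector (assumed from its docs).
METRIC_TOPIC = "cloudwatch/metric/put"


def read_cpu_temp_c(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read the SoC temperature; the kernel reports millidegrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def disk_used_percent(path="/"):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def metric_request(name, value, unit, namespace="RaspberryPi", now=None):
    """Build one connector request; `namespace` is a made-up example."""
    return {
        "request": {
            "namespace": namespace,
            "metricData": {
                "metricName": name,
                "value": float(value),
                "unit": unit,  # a CloudWatch unit such as "Percent" or "None"
                "timestamp": int(time.time() if now is None else now),
            },
        }
    }


def metric_payloads():
    """Serialize both readings, ready to publish to METRIC_TOPIC."""
    return [
        json.dumps(metric_request("CpuTemperature", read_cpu_temp_c(), "None")),
        json.dumps(metric_request("DiskUsed", disk_used_percent(), "Percent")),
    ]
```

CloudWatch has no temperature unit, so the sketch sends the reading with the unit `None` and leaves the interpretation to the metric name.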
On the management side of things, one item on my list was the ability to easily trigger a restart of various services on the Raspberry Pi, such as my application or a reverse SSH tunnel service, or even to reboot the Raspberry Pi itself. I was able to accomplish this by taking advantage of MQTT messaging, which is a primary function of the IoT Core service, since it acts as an MQTT broker. I created an MQTT topic and subscribed the Raspberry Pi to it. I then created an IoT certificate and key for use on my laptop, which let me securely send MQTT messages to IoT Core via HTTPS from anywhere my laptop has internet access. To make sending the HTTPS POST request easier, I threw together a basic Python Tk GUI for generating and sending the request. With that tool, I just select the service I want to restart and click the submit button.
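The HTTPS publish itself can be sketched with nothing but the standard library. IoT Core accepts a POST to `/topics/<topic>?qos=1` on port 8443 of the account's data endpoint, authenticated with the client certificate. The endpoint, topic, payload, and file paths passed in are placeholders, and the function names are my own.

```python
import json
import ssl
import urllib.parse
import urllib.request


def publish_url(endpoint, topic, qos=1):
    """Build the IoT Core HTTPS publish URL for an MQTT topic."""
    # The topic is a single path segment, so slashes are percent-encoded.
    topic_path = urllib.parse.quote(topic, safe="")
    return f"https://{endpoint}:8443/topics/{topic_path}?qos={qos}"


def publish(endpoint, topic, payload, cert_file, key_file, ca_file=None):
    """POST a JSON payload to the topic using mutual TLS; return the status."""
    context = ssl.create_default_context(cafile=ca_file)
    # Present the device certificate and private key created in IoT Core.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    request = urllib.request.Request(
        publish_url(endpoint, topic),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, context=context, timeout=10) as resp:
        return resp.status
```

A GUI or command-line wrapper then only needs to choose the payload, for example `{"target": "application"}`, and call `publish` with the account's endpoint and certificate files.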
The HTTPS request goes to the IoT Core endpoint, which then forwards the MQTT message via Greengrass to a Lambda function running on the Raspberry Pi. The same Greengrass Lambda function then sends an SNS request out via another Greengrass connector, which in turn sends an e-mail letting me know that the restart was triggered. And all of this happens in just a second or two. The best part is I can do it from anywhere that has an internet connection. I also created a bash script on the EC2 instance that lets me perform the same task from the command line there.
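On the Raspberry Pi side, the Lambda function boils down to mapping the incoming message onto a fixed set of commands. The sketch below shows one way to do that safely. The `target` field, the service names, and the use of `systemctl` are all assumptions for illustration, and the SNS notification step is left out.

```python
import subprocess

# Hypothetical mapping from the name sent in the message to a fixed command.
# Only these commands can ever run, whatever the message contains.
ALLOWED_ACTIONS = {
    "application": ["systemctl", "restart", "my-app.service"],
    "ssh-tunnel": ["systemctl", "restart", "reverse-tunnel.service"],
    "reboot": ["systemctl", "reboot"],
}


def command_for(event):
    """Return the whitelisted command for the event, or None."""
    if not isinstance(event, dict):
        return None
    return ALLOWED_ACTIONS.get(event.get("target"))


def function_handler(event, context, run=subprocess.run):
    """Lambda entry point: run the requested restart if it is allowed."""
    command = command_for(event)
    if command is None:
        return {"status": "rejected"}
    # check=True raises if the command exits with a non-zero status.
    run(command, check=True)
    return {"status": "ok", "command": " ".join(command)}
```

Looking the command up in a whitelist, rather than executing text taken from the message, means a malformed or malicious payload can at worst be ignored.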
To satisfy my Raspberry Pi remote shell requirement, I spun up a small AWS EC2 instance, created a few public/private key pairs, and set up the Raspberry Pi to start a reverse SSH tunnel to the EC2 instance at boot time. This allows me to first log on to the EC2 instance and then connect to the Raspberry Pi via the secure pre-established tunnel. This works even with the Raspberry Pi on a local network behind a firewall. In this case, the EC2 instance basically becomes an SSH gateway. An SSH app on my phone lets me connect to the EC2 instance as well, using a passphrase-protected PPK key file.
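The reverse tunnel is a single `ssh -R` invocation started from the Raspberry Pi. A small sketch of assembling that command follows. The gateway address, the remote port 2222, and the helper names are placeholders, and how the command is launched at boot (a systemd unit, for instance) is left open.

```python
import subprocess


def reverse_tunnel_command(gateway, remote_port=2222, local_port=22, key_file=None):
    """Build the ssh command that exposes the Pi's SSH port on the gateway."""
    command = [
        "ssh",
        "-N",  # no remote command, forwarding only
        "-o", "ServerAliveInterval=60",  # notice a dead connection
        "-o", "ExitOnForwardFailure=yes",  # fail fast if the port is taken
        "-R", f"{remote_port}:localhost:{local_port}",
    ]
    if key_file:
        command += ["-i", key_file]
    command.append(gateway)
    return command


def run_tunnel(gateway, **kwargs):
    """Run the tunnel in the foreground and return ssh's exit code."""
    return subprocess.call(reverse_tunnel_command(gateway, **kwargs))
```

With the tunnel up, running `ssh -p 2222 pi@localhost` on the EC2 instance lands on the Raspberry Pi, assuming the default `pi` user.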
By this time I was already neck deep in learning AWS services. So I kept going. While I have decades of experience with SQL databases, my experience with NoSQL is sparse. So to help fill that void, I set up a table in DynamoDB, using as many best practices as I could find, and updated my application to start dumping data into it. What I found out shortly after was that analyzing data in NoSQL databases is, well, let’s just say not very easy.
So I set up a stream from DynamoDB to feed a Lambda function that then dumps the data into an AWS Elasticsearch “cluster”. I say cluster with quotes because I only used one node, since my data requirements were minimal and non-essential. I already had some experience with Solr, which, like Elasticsearch, is based on Apache Lucene, so the learning curve here wasn’t too bad for me.
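The core of such a stream-fed Lambda function is converting DynamoDB's typed attribute format into plain JSON and batching it for the Elasticsearch bulk API. Here is a sketch of that conversion only. The index name `app-data` is a placeholder, the stream is assumed to include new images, and sending the body to the domain's `_bulk` endpoint (with request signing) is left out.

```python
import json


def to_number(text):
    """DynamoDB sends numbers as strings; prefer int, fall back to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def plain_value(attr):
    """Convert one typed attribute, e.g. {"N": "42"}, to a plain value."""
    (tag, value), = attr.items()
    if tag == "N":
        return to_number(value)
    if tag == "NULL":
        return None
    if tag == "M":
        return {k: plain_value(v) for k, v in value.items()}
    if tag == "L":
        return [plain_value(v) for v in value]
    # Strings, booleans, and the remaining types are passed through as-is.
    return value


def bulk_body(event, index="app-data"):
    """Turn a DynamoDB stream event into an Elasticsearch bulk request body."""
    lines = []
    for record in event.get("Records", []):
        keys = record["dynamodb"]["Keys"]
        # Use the item's key values as a stable document id.
        doc_id = "|".join(str(plain_value(keys[k])) for k in sorted(keys))
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
        else:  # INSERT or MODIFY
            image = record["dynamodb"]["NewImage"]
            lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps({k: plain_value(v) for k, v in image.items()}))
    # The bulk API expects newline-delimited JSON with a trailing newline.
    return ("\n".join(lines) + "\n") if lines else ""
```

Older Elasticsearch versions also expect a `_type` field in each action line, so the action metadata may need adjusting for the version in use.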
But I still had to get the data out of Elasticsearch in a meaningful way. Fortunately, when you set up Elasticsearch on AWS, you automatically get Kibana, which is a web-based data visualization tool for Elasticsearch. While this part was new to me, the graphs, charts, and lists you can produce from this tool with minimal effort are incredible. But while the analytics of the web-based tool are great, accessing it from my laptop wasn’t automatically possible.
One of the main reasons I wanted to explore IoT on AWS was for the security aspects. I had developed unsecured in-house IoT projects before, but hadn’t worked with internet-based IoT yet. Well, in this case, the security features were keeping me from using Kibana from my laptop. My solution was to put the Elasticsearch domain and the EC2 instance in the same AWS Virtual Private Cloud. Since I already had secure access to the EC2 instance, I could use it as a gateway to Kibana by running a port-forwarding SSH tunnel from my laptop to the EC2 instance, thereby giving me access to the VPC. And it worked like a charm.
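This tunnel runs in the opposite direction from the one on the Raspberry Pi: a local forward (`ssh -L`) from the laptop, through the EC2 instance, to the Elasticsearch endpoint inside the VPC. A brief sketch follows, in which the gateway, the VPC endpoint, and local port 9200 are placeholders.

```python
def kibana_tunnel_command(gateway, es_endpoint, local_port=9200):
    """Build the ssh command forwarding a local port to the VPC endpoint."""
    return ["ssh", "-N", "-L", f"{local_port}:{es_endpoint}:443", gateway]


def kibana_url(local_port=9200):
    """URL to open in the browser once the tunnel is running."""
    # AWS Elasticsearch domains serve Kibana under /_plugin/kibana/.
    return f"https://localhost:{local_port}/_plugin/kibana/"
```

Because the certificate is issued for the domain's endpoint and not for `localhost`, the browser will typically show a certificate warning when the URL is opened this way.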
It’s Alive! (I hope)
The last feature I added was an AWS Lambda function to check that my application was always running. AWS has a Lambda blueprint called lambda-canary that does just this. It basically works by performing an HTTP request against your application and verifying the response. In my case, my application only had one URL (via ngrok), and it was subject to change at any time. Fortunately, my application writes a log entry every time a new ngrok tunnel is established and records the URL in the log message. So I created a CloudWatch log query to get the most recent log entry that contained the ngrok URL (regular expressions came to the rescue here). I then passed this URL into the lambda-canary function to have it periodically do a health check on the application. The Lambda function is triggered on a regular basis by a CloudWatch schedule rule. If the health check fails, it sends an SNS request and notifies me of the problem via e-mail and text.
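The two pieces of logic involved, pulling the newest ngrok URL out of log messages and checking the response, can be sketched as below. The log message wording, the `ngrok.io` domain pattern, and the function names are assumptions, and fetching the messages from CloudWatch Logs is left out.

```python
import re
import urllib.request

# Matches tunnel URLs such as https://ab12cd34.ngrok.io (assumed format).
NGROK_URL = re.compile(r"https?://[\w.-]+\.ngrok\.io")


def latest_ngrok_url(messages):
    """Return the first ngrok URL found; `messages` must be newest first."""
    for message in messages:
        match = NGROK_URL.search(message)
        if match:
            return match.group(0)
    return None


def is_healthy(url, expected_text, timeout=10):
    """Fetch the URL and check for a 200 status and an expected string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and expected_text in body
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses.
        return False
```

A scheduled function would call `latest_ngrok_url` on the query results, pass the URL to `is_healthy`, and publish to SNS when the check returns `False`.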
Even though I started with minimal experience on AWS, after configuring a dozen or more services, the AWS console starts to get pretty familiar. You find yourself having a number of windows open at the same time, especially IAM, CloudWatch, Lambda, and IoT Core. Getting used to working with AWS roles and policies had a bit of a learning curve, but now it’s almost second nature when setting up any new services or adding new configurations.
In the end, I wound up with an architecture based on AWS that fulfilled all of the access, instrumentation, and analytic features I could think of. And all it took to support my $35 Raspberry Pi was a few billion dollars’ worth of Amazon cloud infrastructure. Even though my initial need was just for remote application logging, here is a graphical representation of what I ultimately ended up with:
While I used AWS far more than I originally intended, the available tools and smooth integration of services made it easy to evolve my use cases. With so many offerings, figuring out which service to use to solve any given problem can be challenging. Sometimes this leads to a dead end after diving in, like it did for me with SMS. But the functionality that exists there is amazing and I’m glad I took the time to explore it, both for its practical application and just for the sake of learning.