Fun with pf and rdomains

In this blog post I'll speak heavily about OpenBSD concepts and networking. It is meant mostly as a resource for people trying to achieve something similar to what I did. Nevertheless, this is not a technical tutorial or guide, so feel free to read on even if you're not so familiar with OpenBSD: I'll do my best to not make it boring and to link to a lot of resources so that you can explore concepts you're unfamiliar with. I'll always be happy to explain or clarify things if you feel like writing to me. And even though I felt like explaining the whole story and the reasoning behind how I fixed it, you can also just look at the problem I tried to solve and the solution to that problem.

I am currently using Wireguard as a VPN. I use it mostly to route all my traffic through a VPS somewhere else, but I also use that VPS as a router with a static public IP. The IP you are seeing for this blog is actually the IP of that VPS, not of the server where the blog is hosted. Anywho, a few months ago I was using 9front quite a lot and the only VPN available on it was Tinc. I'd never heard of it before, so I gave it a shot and explored a bit. One very interesting feature of Tinc over Wireguard is that it fully embraces the mesh routing paradigm and dynamically updates routes. That was very cool: when I wanted to connect to a machine on the same LAN, Tinc was smart enough (given a correct configuration) to figure out how to reach the other machine through the LAN, without needing to pass through the VPS with the public IP. Unfortunately that is not really possible with Wireguard. Another feature of Tinc lets you run scripts on certain events, such as a new peer connecting to your machine or being disconnected from other peers. This is mostly meant to change routes dynamically based on those events and further leverage the mesh routing capabilities of Tinc. Again, Wireguard is less flexible than Tinc on that level, but having tasted the ability to use a VPN for all my machines and still benefit from local speeds when on the same LAN, I wanted to try to get closer to that with my current Wireguard setup.

I really liked Tinc, but the main reason I continue to use Wireguard is because it is directly supported by OpenBSD's base system, without any third-party packages, through the wg(4) interface and ifconfig(8). And I am quite a big proponent of using the base system as much as possible without installing third-party packages.

Initial setup

My laptop is connected to a VPN via Wireguard on a VPS I am renting from Vultr. I am planning to migrate my VPS to OpenBSD Amsterdam soon, as I believe it's much better to support small providers like them over big providers like Vultr, Hetzner, Linode or Digital Ocean.

The VPS acts like a router and firewall for most of my machines. That way, no matter where I am in the world, I can connect to my different servers, run my own DNS resolver and do my own DNS blocking (as opposed to the ISP/country DNS censorship), and since I do NAT, it really feels like I am on my own LAN. I honestly don't need to care whether I am at home, on a public wifi, or visiting my SO in the US. That's a setup that has worked for me for the past 4 years and I am not planning on changing it anytime soon. As this setup is heavily based on IPv4 and I am starting to dabble with IPv6, I can foresee that things might evolve significantly in the near future.

Both my laptop and my VPS run OpenBSD, so it's fairly easy to set up such a VPN with hostname.if(5), rdomain(4) and pf(4).

On my laptop, the wg0 interface, and thus all the traffic going through the VPN, is constrained within rdomain 0 (the default rdomain). The em0 interface is constrained to rdomain 1, so that all my "non-VPN" traffic goes through rdomain 1. This has the nice benefit of cleanly separating VPN traffic from non-VPN traffic. Everything goes through the VPN by default, unless I specify otherwise. I can run a program outside the VPN like so:

route -T1 exec ping 192.168.1.1

This pings a machine on my local network, but I could open a web browser and start browsing the web as if I had no VPN in the same way.

I didn't need any particular pf(4) configuration on my laptop, so I used the default pf.conf(5).

The problem

This setup feels pretty clean, but there is one slight annoyance. Each time I want to do things locally, like accessing the admin interface on my home router, or copying big files across my LAN so that my data doesn't go back and forth through my VPS and I benefit from the huge local speeds, I need to prefix my commands with route -T1 exec ... and this started to become a bit more annoying, especially since I started to do more and more with my local machines. My initial reflex was to define a shell alias in order to shorten the prefix to lan ...:

alias lan='route -T1 exec'

But then it is still required to start new instances of iridium (a chromium fork) or emacs each time I want those programs to access local resources on the network. This is still unsatisfactory.

As I explained in the intro, I was then wondering if it was possible to specify local routes "à la Tinc". Unfortunately I haven't found a way to do so, but I kind of knew that it was possible to move network traffic between rdomains with pf(4).

Methodology

One of the first possibilities was to move everything into a single rdomain (the default one), handle everything with route(8) and give a higher priority to local routes. I didn't want this because I would lose the ability to browse the internet from my local connection without disabling the VPN entirely. So the only way left was to go through pf(4) and move traffic from one rdomain to another.

That's where the frustration started. I learnt pf(4) a while ago, I had trouble remembering some basic and important details, and I really didn't want to dive once again into learning pf(4): it felt like wasted time. No real way around it though. When you want to do things on OpenBSD, it's much better to spend a bit of time reading some documentation and understanding what you are doing. And that's why I really love and appreciate OpenBSD: the quality of its documentation coupled with its relative simplicity makes such frustrating tasks much easier to carry out. Off to the excellent FAQ on PF, where a quick skim was a welcome refresher. Then a few internet searches yielded blog posts like this or this other one, and seeking some advice on #openbsd on Libera.Chat gave me a good place to start. Adding to that the example in the rdomain(4) man page, and armed with the pf.conf(5) man page, I was ready to tackle this issue.

One of my initial tries was to add something like this at the end of the default pf.conf(5) file:

match in on rdomain 0 to { 192.168.0.0/16 } rtable 1

This was mostly based on a suggestion by someone on #openbsd, and when I asked why not match out... he told me that you want to match in... almost all the time. Turns out my intuition was right, but I didn't know that yet. And as I explained in a previous blog post, it's very important to moderate your trust in random people on the internet. I then tried a few variations of that config, but each time I reloaded the pf ruleset, my network would be entirely blocked on the default rdomain (rdomain 0). I also wasn't sure if it needed to be a match statement or a pass statement. I must say I was quite confused, but then I exercised some skepticism, followed my initial intuition and tried match out...:

match out on rdomain 0 to { 192.168.0.0/16 } rtable 1

Network was working now, but I still couldn't ping a local address from rdomain 0.
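On the match-versus-pass question: as far as pf.conf(5) describes it, a match rule never passes or blocks a packet by itself. It only attaches parameters such as rtable to the packets it matches, while the verdict still comes from the last matching pass or block rule. A minimal sketch of how the two combine, assuming the stock block and pass rules of the default pf.conf are still in place above the appended line:

```
block return    # from the default pf.conf
pass            # from the default pf.conf: this rule decides that traffic is allowed

# Appended at the end: no pass/block decision here, it only tells pf
# to look up LAN-bound packets in routing table 1.
match out on rdomain 0 to { 192.168.0.0/16 } rtable 1
```

If the catch-all pass rule has been removed or tightened in your own ruleset, a match rule alone will not let this traffic through.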

Tcpdump to the rescue

The network was working, but a ping to a LAN machine from rdomain 0 sent ICMP requests without getting any replies, while the same ping worked fine from rdomain 1. So I gave a shot at sniffing traffic on my server to see what was actually going on in the "pipes".

Some important details here: my server was on the LAN with IP address 192.168.2.3 on em0, while my laptop had IP address 192.168.2.101 on em0 in rdomain 1 and IP address 10.0.1.2 on wg0 in rdomain 0.

To sniff the traffic on my server I ssh'd in and ran the following command which was just logging ICMP requests on the default interface (em0 for my server):

doas tcpdump 'icmp[icmptype] = icmp-echo'

And then from my laptop I ran first:

route -T1 exec ping 192.168.2.3

Which worked fine and the following was shown in tcpdump(8):

16:53:55.242014 192.168.2.101 > 192.168.2.3: icmp: echo request
16:53:55.242098 192.168.2.101 > 192.168.2.3: icmp: echo request
16:53:56.250806 192.168.2.101 > 192.168.2.3: icmp: echo request
16:53:56.250887 192.168.2.101 > 192.168.2.3: icmp: echo request

I then ran another ping from my laptop but this time from rdomain 0:

ping 192.168.2.3

Which resulted in 100% packet loss and no replies, but when I saw what was being displayed in tcpdump(8), the issue became self-evident:

16:54:08.332822 10.0.1.2 > 192.168.2.3: icmp: echo request
16:54:08.332902 10.0.1.2 > 192.168.2.3: icmp: echo request
16:54:09.340634 10.0.1.2 > 192.168.2.3: icmp: echo request
16:54:09.340645 10.0.1.2 > 192.168.2.3: icmp: echo request

The source IP address was still my wg0 address from rdomain 0, so the server had no way to send its replies back over the LAN. Basically, NAT was missing. I changed the rule to:

match out on rdomain 0 to { 192.168.0.0/16 } nat-to (em0) rtable 1

And then it worked fine. This took me around 3 hours.

Traffic flows both ways

I was able to reach a server on my LAN from rdomain 0 on my laptop, and the ICMP replies were able to flow back because in the default configuration pf(4) allows stateful traffic no matter where it comes from. But what if I want to reach a service in rdomain 0 on my laptop from my LAN server? That's where things become a bit unorthodox, because I don't think many people have a use case for running services on their laptop and having them reachable from the LAN. Or if they do, they probably don't use OpenBSD and rdomains to separate VPN and non-VPN traffic. To me it just made sense to have the "reverse" rule, allowing services running in rdomain 0 to be reachable from the LAN. I put reverse in quotes because it isn't actually straightforward: it's more akin to the contrapositive in logic than to a simple negation or inversion. The first thing I tried was the following:

match in on rdomain 1 from { 192.168.0.0/16 } rtable 0

It didn't work, so I started sniffing traffic again, this time on my laptop. I used the following command, which logs both ICMP requests and replies on interface em0 (the one on the LAN):

doas tcpdump -i em0 'icmp[0] = 8 or icmp[0] = 0'

With the previous pf rules, the ping was not getting any replies, and in the tcpdump(8) output nothing seemed out of place apart from the missing replies:

17:26:31.338368 192.168.2.3 > 192.168.2.101: icmp: echo request
17:26:32.339194 192.168.2.3 > 192.168.2.101: icmp: echo request
17:26:33.339185 192.168.2.3 > 192.168.2.101: icmp: echo request
17:26:34.339191 192.168.2.3 > 192.168.2.101: icmp: echo request

The replies must then either be blocked by the firewall (unlikely) or go somewhere they shouldn't (another NAT-like issue). In the hope of catching the missing replies, I tried to sniff the network on the other interfaces with:

doas tcpdump -i wg0 'icmp[0] = 8 or icmp[0] = 0'
doas tcpdump -i lo0 'icmp[0] = 8 or icmp[0] = 0'

But nothing! Things started to get weird and I wasn't really sure where to look next…

Logging with PF

On the #openbsd IRC channel someone then mentioned looking at pflog0, which is a network interface that logs traffic going through rules marked as log. I then proceeded to modify the rule like so:

match in log on rdomain 1 from { 192.168.0.0/16 } rtable 0

And in order to see the logged traffic I had to run the following:

doas tcpdump -v -e -n -i pflog0

And that's when I actually saw nothing… I was at a loss again, but not giving up, I tried to add a log statement to the general pass rule, just to see if I could actually see something. And that is when I saw it:

pass out on wg0: [rewritten: src 192.168.2.101:54615, dst 192.168.2.101:8] 192.168.2.3 > 192.168.2.101: icmp: echo request

And there it was: an address like 192.168.2.101 had no business being on the wg0 interface (which has the 10.0.1.2 address). So it really was another NAT-like issue. That's how I came to try the following, again just to see:

match in log on rdomain 1 from { 192.168.0.0/16 } nat-to (wg0) rtable 0

Which showed the following in the logs:

match in on em0: [rewritten: src 10.0.1.2:32045, dst 192.168.2.101:8] 192.168.2.3 > 192.168.2.101: icmp: echo request

And that's when the "AHA" moment struck and I understood the difference between the nat-to and rdr-to statements in pf.conf(5). nat-to modified the src field, so rdr-to would most probably modify the dst field. And this is what I wanted: the dst field should match the IP address of wg0 (10.0.1.2) for it to work. So naturally I expected the following to work, and it did:

match in log on rdomain 1 from { 192.168.0.0/16 } rdr-to (wg0) rtable 0

And the log entry for this one showed what I wanted:

match in on em0: [rewritten: src 192.168.2.3:35314, dst 10.0.1.2:8] 192.168.2.3 > 192.168.2.101: icmp: echo request

At that point, everything worked as I wanted it to.

One last tweak

The last adjustment I wanted to make before wrapping up was to specify all the from and to in the rules. By default pf(4) infers from/to any when left unspecified, and firewall rules should always match as closely as possible. So the last tweak looked like this:

match out on rdomain 0 from (egress) to { 192.168.0.0/16 } nat-to (em0) rtable 1
match in on rdomain 1 from { 192.168.0.0/16 } to (em0) rdr-to (wg0) rtable 0

We just have to make sure to put all the interface names between parentheses so that pf handles IP address changes dynamically on those interfaces. This is especially important if the interface uses DHCP or gets its IP address assigned some time after boot. If you don't put the parentheses in such a scenario, pf might just block all network traffic until you reload it with something like:

doas pfctl -f /etc/pf.conf
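To make the parentheses point concrete, here are two variants of the same rule side by side. This is only an illustrative sketch: the two lines are alternatives, not meant to be loaded together, and the first is shown purely for contrast.

```
# Without parentheses: the address of em0 is resolved once, when the
# ruleset is loaded, and the rule keeps using that address afterwards.
match out on rdomain 0 from (egress) to { 192.168.0.0/16 } nat-to em0 rtable 1

# With parentheses: pf updates the rule whenever em0 changes its address,
# for example after a new DHCP lease.
match out on rdomain 0 from (egress) to { 192.168.0.0/16 } nat-to (em0) rtable 1
```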

The solution

Adding those two lines at the end of /etc/pf.conf will allow your machine to:

reach the LAN network (192.168.0.0/16 subnet in my case) that is available on the em0 interface in rdomain 1
have services running in the default rdomain 0 be reachable by machines on your LAN, which is accessible only through rdomain 1

match out on rdomain 0 from (egress) to { 192.168.0.0/16 } nat-to (em0) rtable 1
match in on rdomain 1 from { 192.168.0.0/16 } to (em0) rdr-to (wg0) rtable 0
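If your interfaces or LAN subnet differ from mine, the same two rules can be written with pf.conf macros so that only the first lines need adjusting. This is a sketch under assumptions: the macro names lan_if, vpn_if and lan_net are made up for illustration and are not part of my actual configuration.

```
# Hypothetical macro names; set them to your own interfaces and subnet.
lan_if  = "em0"              # physical interface, lives in rdomain 1
vpn_if  = "wg0"              # Wireguard interface, lives in rdomain 0
lan_net = "192.168.0.0/16"   # LAN subnet to reach directly

# rdomain 0 -> LAN: rewrite the source to the LAN address, use routing table 1.
match out on rdomain 0 from (egress) to $lan_net nat-to ($lan_if) rtable 1
# LAN -> rdomain 0: rewrite the destination to the VPN address, use routing table 0.
match in on rdomain 1 from $lan_net to ($lan_if) rdr-to ($vpn_if) rtable 0
```

Running doas pfctl -nf /etc/pf.conf parses the file without loading it, which is a cheap way to catch a typo before it cuts your network.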

Closing thoughts

That was quite the task: it took me around 6-7 hours to achieve when I actually did it two days ago! And all this just for two lines of configuration. A perfect example to highlight the all too common fallacy of equating the number of lines of code with the amount of work produced. It was also a good refresher for me to get back into pf(4), and it will tremendously ease my system administration by removing the need to fire up a new web browser or a new Emacs instance each time I want to access some LAN resources from those programs.