BitsAndBytes

A long debugging session, written up so future-me (and maybe future-you) doesn’t have to redo it from scratch. This post walks through getting Matter-over-Thread devices commissioning successfully when Home Assistant runs in a VM on Proxmox and the OpenThread Border Router and Matter Server live in separate LXC containers. If you’ve ended up with Thread devices that join the mesh but commissioning times out at “Configuring…” — this is for you.

Setup overview

The starting point: Home Assistant is already running in a HAOS VM on Proxmox and works fine. What follows assumes you have HAOS up and want to add Thread/Matter via separate LXCs.

Three components, all on Proxmox, all on the same IoT VLAN (100):

  • HAOS VM — the existing Home Assistant Operating System VM.
  • OpenThread Border Router LXC (CTID 110) — the Thread network’s bridge to your IPv6 LAN. Deployed via the community-scripts OpenThread BR helper.
  • Open Home Foundation Matter Server LXC (CTID 111) — the Matter controller that Home Assistant talks to over WebSocket. Deployed via the community-scripts Matter Server helper. The package is python-matter-server, hence the “Python Matter Server” branding you see in its web UI.

The Thread radio is an SLZB-MR4U (Ethernet/PoE coordinator from SMLIGHT) at 192.168.100.247:6638 — i.e. the OpenThread BR talks to it over TCP, not USB. This matters for some of the failure modes later.

Network: everything on vmbr0 with VLAN tag 100 (IoT). Same broadcast domain — critical, because Matter’s mDNS discovery and IPv6 Router Advertisements (RAs) don’t cross VLANs.

┌──────────────────────────────────────────────────────────────┐
│ Proxmox Host (vmbr0, VLAN 100)                               │
│                                                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐      │
│  │ HAOS VM      │   │ OTBR LXC 110 │   │ Matter LXC   │      │
│  │              │   │              │   │ 111          │      │
│  │ Home         │←──┤ wpan0        │   │              │      │
│  │ Assistant    │   │   ↕          │   │ matter-      │      │
│  │              │   │ eth0         │   │ server       │      │
│  └──────────────┘   └──────────────┘   └──────────────┘      │
│         │                  │ TCP              │              │
│         │                  ↓                  │              │
│         │           ┌──────────────┐          │              │
│         │           │ SLZB-MR4U    │          │              │
│         │           │ Thread Radio │          │              │
│         │           └──────────────┘          │              │
│         └─────────────────────────────────────┘              │
│                     All on VLAN 100                          │
└──────────────────────────────────────────────────────────────┘
UniFi gateway (router for VLAN 100)
IPv6 ULA prefix: fd52:cf61:d373:cc4::/64

About wpan0 — that’s Linux’s network interface for the Thread mesh. “wpan” stands for Wireless Personal Area Network — the kernel’s umbrella name for low-power short-range protocols like 802.15.4. Inside the OpenThread BR LXC, otbr-agent creates wpan0 as a regular kernel netdev; packets to/from Thread devices appear there as normal IPv6 traffic, and ip -6 route shows the OMR prefix routed via wpan0. The actual radio work (encrypt, transmit on 2.4 GHz) happens on the SLZB-MR4U, which otbr-agent talks to over TCP rather than USB. So wpan0 is a virtual netdev backed by a remote radio — the OTBR routes IPv6 between eth0 (the LAN side) and wpan0 (the mesh side).

The Thread mesh has its own ULA prefix — the Off-Mesh-Routable prefix (OMR) — fdd2:b43f:bc3:1::/64, advertised by the OpenThread BR onto the LAN via RAs containing a Route Information Option (RIO).

The symptom

Commissioning a Thread/Matter bulb (an IKEA Kajplats) from the Home Assistant app would get to “Configuring…” and time out after ~30 seconds. The Matter Server logs showed:

CHIP_ERROR [chip.native.SC] PASESession timed out while waiting for a response from the peer.
CHIP_ERROR [chip.native.CTL] Discovery timed out

The bulb joined the Thread mesh briefly (visible in ot-ctl child table), then dropped. Phone-to-bulb Bluetooth Low Energy (BLE) worked fine. Thread join worked. But Matter commissioning over IPv6 wouldn’t complete.

Why this is tricky

Thread/Matter on Linux requires several things to line up, and any one of them can fail silently:

  1. The OpenThread BR must announce the OMR prefix in RAs with an RIO.
  2. Every Linux device that needs to reach Thread devices must process that RIO and install a route to the OMR prefix.
  3. The OpenThread BR must be able to route packets bidirectionally between the LAN and the Thread mesh.
  4. mDNS service discovery must work between the Matter controller and Thread devices via the OpenThread BR’s Service Registration Protocol (SRP) and mDNS proxy.

Failure in any of these layers gives you the same symptom — “configuring times out” — with no obvious clue where to look. Welcome to the rabbit hole.

How RIO processing actually works on Linux

When a router on your network sends an RA, it can include an RIO (RFC 4191, option type 24) saying “to reach prefix X, route via me.” The kernel processes this only if:

  • accept_ra=2 on the interface (1 isn't enough: once forwarding=1 is set, the kernel ignores RAs unless accept_ra is 2)
  • accept_ra_rt_info_max_plen is at least the prefix length (so 64 to accept a /64)
  • The kernel was built with CONFIG_IPV6_ROUTE_INFO

That last point matters: HAOS is built without CONFIG_IPV6_ROUTE_INFO. On HAOS, NetworkManager handles RIO processing in userspace instead, so the kernel doesn't need it. A Debian LXC has no kernel of its own; it runs on the Proxmox host's kernel, which does have the option. There, the sysctls still need to be set correctly, and the container needs the privileges to install routes from RAs (unprivileged LXCs don't have them).
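The conditions above can be checked from a shell on the machine in question. The function below is a minimal sketch: the name rio_ready is hypothetical, SYSCTL_ROOT exists only so it can be tested against a fake directory, and it treats a missing accept_ra_rt_info_max_plen file as a sign that the kernel was built without CONFIG_IPV6_ROUTE_INFO (that sysctl is only registered when the option is enabled).

```shell
# Hypothetical helper: check whether IFACE will install RIO routes of length
# PLEN (default 64) from RAs, using the conditions listed above.
# SYSCTL_ROOT is overridable only so the function can be tested on a fake tree.
rio_ready() {
    iface=$1
    plen=${2:-64}
    dir=${SYSCTL_ROOT:-/proc/sys/net/ipv6/conf}/$iface
    [ -d "$dir" ] || { echo "$iface: no such interface"; return 2; }

    # This sysctl is only registered when the kernel has CONFIG_IPV6_ROUTE_INFO.
    if [ ! -r "$dir/accept_ra_rt_info_max_plen" ]; then
        echo "$iface: no accept_ra_rt_info_max_plen (kernel likely lacks CONFIG_IPV6_ROUTE_INFO)"
        return 1
    fi

    accept_ra=$(cat "$dir/accept_ra")
    forwarding=$(cat "$dir/forwarding")
    max_plen=$(cat "$dir/accept_ra_rt_info_max_plen")

    if [ "$accept_ra" -eq 0 ]; then
        echo "$iface: accept_ra=0, RAs are ignored"
        return 1
    fi
    # With forwarding enabled, only accept_ra=2 makes the kernel process RAs.
    if [ "$forwarding" -eq 1 ] && [ "$accept_ra" -ne 2 ]; then
        echo "$iface: forwarding=1 needs accept_ra=2 (have $accept_ra)"
        return 1
    fi
    # RIOs whose prefix is longer than max_plen are dropped.
    if [ "$max_plen" -lt "$plen" ]; then
        echo "$iface: accept_ra_rt_info_max_plen=$max_plen is below $plen"
        return 1
    fi
    echo "$iface: ok"
}
```

Inside either LXC, rio_ready eth0 should print eth0: ok once the sysctls described in the next section are in place.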

We hit failures at every one of these layers.

Findings, in the order we discovered them

1. Per-interface sysctls don’t propagate from all

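You'd expect net.ipv6.conf.all.accept_ra=2 to apply everywhere. It doesn't. When an RA arrives, the kernel checks the value for the interface it arrived on, and setting all.accept_ra doesn't propagate to existing interfaces. New interfaces copy their initial values from net.ipv6.conf.default.*, not from all.*, and how all.* interacts with per-interface values has varied between kernel versions. Setting all three (all, default, and eth0) removes the guesswork.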

Fix: explicit per-interface sysctls in /etc/sysctl.d/99-thread.conf inside each LXC:

Terminal window
cat > /etc/sysctl.d/99-thread.conf <<'EOF'
net.ipv6.conf.all.forwarding=1
net.ipv6.conf.all.accept_ra=2
net.ipv6.conf.all.accept_ra_rt_info_max_plen=64
net.ipv6.conf.default.accept_ra=2
net.ipv6.conf.default.accept_ra_rt_info_max_plen=64
net.ipv6.conf.eth0.accept_ra=2
net.ipv6.conf.eth0.accept_ra_rt_info_max_plen=64
net.ipv6.conf.eth0.forwarding=1
EOF
sysctl --system
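To see the non-propagation for yourself, print the relevant keys for all, default, and each interface side by side. This is a sketch with a hypothetical function name; SYSCTL_ROOT is overridable only for testing. Per the behavior described above, eth0 keeps its own value if only all.* was changed, and this makes that visible.

```shell
# Hypothetical helper: print accept_ra and accept_ra_rt_info_max_plen for
# "all", "default" and every interface, so values that did not propagate
# from all.* stand out. SYSCTL_ROOT can point at a fake tree for testing.
show_ra_sysctls() {
    root=${SYSCTL_ROOT:-/proc/sys/net/ipv6/conf}
    printf '%-10s %-10s %s\n' IFACE accept_ra rt_info_max_plen
    for dir in "$root"/*; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # "?" means the key is missing (e.g. kernel without CONFIG_IPV6_ROUTE_INFO)
        ra=$(cat "$dir/accept_ra" 2>/dev/null || echo '?')
        plen=$(cat "$dir/accept_ra_rt_info_max_plen" 2>/dev/null || echo '?')
        printf '%-10s %-10s %s\n' "$name" "$ra" "$plen"
    done
}
```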

2. Unprivileged LXCs silently drop RA-derived routes

Unprivileged containers run with restricted capabilities in their network namespace. The kernel parses incoming RAs (you’ll see them in tcpdump -i eth0 -vv 'icmp6 && ip6[40] == 134'), but the route never gets installed in the container’s routing table. No log message, no error — just a missing route.

Fix: convert OpenThread BR and Matter Server LXCs to privileged. Edit /etc/pve/lxc/<CTID>.conf on the Proxmox host and remove the unprivileged: 1 line. Then pct stop <CTID> && pct start <CTID>.

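Verify from inside the container with cat /proc/self/uid_map: a privileged container shows 0 0 4294967295 (container UIDs map 1:1 to host UIDs), while an unprivileged one shows a shifted range such as 0 100000 65536. CapEff from /proc/self/status is less useful here, because capabilities are scoped to the user namespace and root in an unprivileged container can show a similar mask.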
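A scripted form of the privileged check, reading /proc/self/uid_map: an unprivileged container maps container UID 0 to a shifted host UID (100000 by default on Proxmox), a privileged one maps it to 0. This is a sketch; the function name is hypothetical and the path is a parameter only so it can be tested against a fake file.

```shell
# Hypothetical helper: succeed if container UID 0 maps to host UID 0,
# i.e. this is a privileged container.
is_privileged() {
    map=${1:-/proc/self/uid_map}
    # Each uid_map line is: <first UID inside> <first UID outside> <count>
    outside=$(awk '$1 == 0 { print $2; exit }' "$map")
    [ "$outside" = 0 ]
}
```

Usage: is_privileged && echo privileged || echo unprivileged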

You can verify route installation works with:

Terminal window
ip -6 route | grep -v fe80
# Should show fdd2:b43f:bc3:1::/64 via fe80::xxxx dev eth0 proto ra

The proto ra is the kernel saying “I learned this from an RA.”
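For a pass/fail check (in a script or health check), a small filter over ip -6 route output is enough. A sketch: the function name has_ra_route is hypothetical, and it accepts both proto ra and proto kernel_ra on the assumption that different iproute2 versions print the RA protocol under different names; check what yours prints.

```shell
# Hypothetical helper: read `ip -6 route` output on stdin and succeed if
# PREFIX is present as a route learned from a Router Advertisement.
has_ra_route() {
    awk -v p="$1" '
        $1 == p {
            for (i = 2; i < NF; i++)
                if ($i == "proto" && ($(i + 1) == "ra" || $(i + 1) == "kernel_ra"))
                    found = 1
        }
        END { exit(found ? 0 : 1) }'
}
```

Usage: ip -6 route | has_ra_route fdd2:b43f:bc3:1::/64 && echo "OMR route installed"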

3. The OpenThread BR couldn’t route LAN-bound Thread replies

This was the smoking gun. With three tmux panes watching ot-ctl, tcpdump -i wpan0, and Matter Server logs, we saw this in tcpdump during commissioning:

fdd2:b43f:bc3:1:70c2:5787:2867:bc91 > fdd2:b43f:bc3:1:d90c:a2c0:107:9604:
ICMP6, destination unreachable, unreachable route fd52:cf61:d373:cc4:be24:11ff:fe81:fae6

Translation: the OpenThread BR is telling the bulb it can’t route back to the Matter Server’s LAN address. Forward path was fine: Matter Server → OpenThread BR → bulb. Reply path broke at the OpenThread BR.

The OpenThread BR’s own routing table had no entry for the LAN prefix (fd52:cf61:d373:cc4::/64). It has wpan0 routes to the Thread mesh, link-local on eth0 — but no route to the LAN it sits on. So it couldn’t forward Thread→LAN replies.
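On a busy wpan0 capture, these errors are easy to miss. One option is to save the capture as text (for example tcpdump -l -n -i wpan0 icmp6 | tee wpan0.log) and pull out just the addresses the border router says it can't reach. A sketch, assuming tcpdump's text format matches the line above; the function name is hypothetical.

```shell
# Hypothetical filter: read tcpdump text on stdin and print each destination
# reported as "unreachable route", deduplicated in order of first appearance.
unreachable_targets() {
    sed -n 's/.*unreachable route \([0-9A-Fa-f:.]*\).*/\1/p' | awk '!seen[$0]++'
}
```

Usage: unreachable_targets < wpan0.log. If it prints a LAN address, the OTBR is missing a route back to the LAN.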

The cause: this OpenThread BR LXC, despite being on the same VLAN as everyone else, wasn't getting a global IPv6 address from the gateway's RAs. It sent Router Solicitations (RSes), but no RAs came back. Something on the gateway or in the OpenThread BR's own RA processing was filtering them.

Workaround that actually works: don't fight it. Add the LAN address and route manually. We did this with a systemd service (Step 5 below), because changes to /etc/network/interfaces get overwritten by Proxmox at every container start.

4. Proxmox regenerates /etc/network/interfaces on every container start

The net0 line in the LXC config is the source of truth. Anything you add to /etc/network/interfaces gets wiped on container start. Don’t even try.

5. The OpenThread BR community-script ships a SysV init script that lies to systemd

The OpenThread BR’s init script uses start-stop-daemon -b (background-and-fork). systemd sees the init wrapper exit 0 and marks the service as “active (exited)” — even when the actual otbr-agent daemon dies a moment later. This means Restart=always does nothing, because systemd never sees a failure.

Symptoms: systemctl status otbr-agent says “active (exited)” forever, but pgrep otbr-agent shows no daemon running, and ot-ctl state fails with “connect session failed: No such file or directory” (the daemon's control socket doesn't exist).
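The mismatch can be detected mechanically by comparing what systemd reports with whether an otbr-agent process actually exists. A sketch; the function name is hypothetical, and it relies only on systemctl show -p SubState --value and pgrep -x.

```shell
# Hypothetical check: fail when no otbr-agent process exists, and show what
# systemd believes (SubState "exited" is the SysV wrapper case above).
otbr_state_check() {
    substate=$(systemctl show -p SubState --value otbr-agent)
    if pgrep -x otbr-agent >/dev/null; then
        echo "otbr-agent running (systemd SubState: $substate)"
        return 0
    fi
    echo "otbr-agent NOT running, but systemd SubState is: $substate"
    return 1
}
```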

Fix: replace the SysV-generated unit with a proper systemd service. See below.

6. Thread network needs an explicit ifconfig up and thread start after each daemon restart

Even when otbr-agent is running, Thread itself doesn't come up on its own (ot-ctl state reports disabled or detached) and doesn't auto-rejoin. You have to run ot-ctl ifconfig up && ot-ctl thread start. We solved this with another systemd unit.

7. systemd race conditions on cold boot

The first attempt at the routing systemd service hit Error: Nexthop device is not up, most likely because eth0 wasn't fully up yet when the route command ran at cold boot (the kernel returns this error when the interface named in the route is down). Fixed with a sleep + retry loop.

The complete working setup

What follows is the full configuration you can apply to recreate this. Adjust IP addresses and prefixes to match your network — the placeholders are:

  • LAN ULA prefix: fd52:cf61:d373:cc4::/64
  • LAN gateway link-local: fe80::9e05:d6ff:fecb:cc7b (find yours with ip -6 neigh show | grep router)
  • Thread OMR prefix: fdd2:b43f:bc3:1::/64 (the OpenThread BR generates this; find it with ot-ctl br omrprefix)
  • Thread radio: 192.168.100.247:6638 (your SLZB-MR or USB device path)

Step 1: Install the building blocks via community-scripts

Home Assistant is already running in a HAOS VM, so we just need the two LXCs. On Proxmox:

Terminal window
# OpenThread Border Router LXC
bash -c "$(wget -qLO - https://github.com/community-scripts/ProxmoxVE/raw/main/ct/openthread-br.sh)"
# Open Home Foundation Matter Server LXC
bash -c "$(wget -qLO - https://github.com/community-scripts/ProxmoxVE/raw/main/ct/matter-server.sh)"

Set both to use VLAN tag 100 (or whichever VLAN your IoT network is on) and the same bridge as the HAOS VM.

Step 2: Make both LXCs privileged

On the Proxmox host:

Terminal window
# OpenThread BR
nano /etc/pve/lxc/110.conf
# Remove the line: unprivileged: 1
# Matter Server
nano /etc/pve/lxc/111.conf
# Remove the line: unprivileged: 1
# Restart both
pct stop 110 && pct start 110
pct stop 111 && pct start 111
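If you'd rather script Step 2 than open nano, the edit is a small filter. A sketch under assumptions: the function name make_privileged is hypothetical, it expects the line to read exactly unprivileged: 1, and it only touches the main section, leaving any snapshot sections (the [name] blocks that follow it) alone. It rewrites the file in place.

```shell
# Hypothetical helper: drop "unprivileged: 1" from the main section of a
# Proxmox LXC config, leaving snapshot sections untouched.
make_privileged() {
    conf=$1
    tmp=$(mktemp)
    # Lines after the first "[snapshot]" header belong to snapshots.
    awk '/^\[/ { in_snapshot = 1 }
         !(!in_snapshot && $0 == "unprivileged: 1")' "$conf" > "$tmp"
    if cmp -s "$conf" "$tmp"; then
        echo "$conf: no 'unprivileged: 1' in main section, nothing to do"
    else
        cat "$tmp" > "$conf"   # overwrite in place
        echo "$conf: removed 'unprivileged: 1'"
    fi
    rm -f "$tmp"
}
```

Usage: for ct in 110 111; do make_privileged /etc/pve/lxc/$ct.conf; pct stop $ct && pct start $ct; done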

Step 3: Apply IPv6 sysctls in both LXCs

In each container (pct enter 110 and pct enter 111):

Terminal window
cat > /etc/sysctl.d/99-thread.conf <<'EOF'
net.ipv6.conf.all.forwarding=1
net.ipv6.conf.all.accept_ra=2
net.ipv6.conf.all.accept_ra_rt_info_max_plen=64
net.ipv6.conf.default.accept_ra=2
net.ipv6.conf.default.accept_ra_rt_info_max_plen=64
net.ipv6.conf.eth0.accept_ra=2
net.ipv6.conf.eth0.accept_ra_rt_info_max_plen=64
net.ipv6.conf.eth0.forwarding=1
EOF
sysctl --system
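Rather than entering each container, you can write the file once on the Proxmox host, copy it in with pct push, and apply it with pct exec. A sketch, assuming the file is saved on the host as /root/99-thread.conf (a hypothetical path), both containers are running, and push_sysctls is a hypothetical helper name.

```shell
# Hypothetical helper, run on the Proxmox host: copy a sysctl file into each
# listed container and apply it there. Stops at the first failure.
push_sysctls() {
    src=$1; shift
    for ct in "$@"; do
        pct push "$ct" "$src" /etc/sysctl.d/99-thread.conf || return 1
        pct exec "$ct" -- sysctl --system >/dev/null || return 1
        echo "CT $ct: applied $(basename "$src")"
    done
}
```

Usage: push_sysctls /root/99-thread.conf 110 111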

Step 4: OpenThread BR — replace the SysV init service with proper systemd

The community-script’s init script causes systemd to lose track of the daemon’s actual state. Replace it.

In OpenThread BR LXC:

Terminal window
# Disable the SysV-generated unit
systemctl disable otbr-agent
systemctl stop otbr-agent
# Remove old drop-ins
rm -rf /etc/systemd/system/otbr-agent.service.d
# Write proper systemd unit
cat > /etc/systemd/system/otbr-agent.service <<'EOF'
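[Unit]
Description=OpenThread Border Router Agent
After=network-online.target
Wants=network-online.target
# Start-rate limits belong in [Unit]; systemd ignores StartLimitIntervalSec in [Service]
StartLimitBurst=20
StartLimitIntervalSec=600
[Service]
Type=simple
EnvironmentFile=/etc/default/otbr-agent
# exec so otbr-agent replaces the shell and systemd tracks the daemon itself
ExecStart=/bin/sh -c 'exec /usr/sbin/otbr-agent $OTBR_AGENT_OPTS'
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target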
EOF
systemctl daemon-reload
systemctl enable --now otbr-agent

The Type=simple is what makes the difference: systemd treats the daemon’s lifecycle as the service’s lifecycle, so when the daemon dies (e.g., the radio TCP connection drops), Restart=always actually triggers.

Step 5: OpenThread BR — persistent IPv6 LAN routing

If your OpenThread BR’s RA processing works on its own (you get a kernel_ra address on eth0 and a route to the LAN prefix), skip this. If not, this systemd unit adds them manually.

In OpenThread BR LXC:

Terminal window
cat > /etc/systemd/system/thread-routing.service <<'EOF'
[Unit]
Description=Thread border router IPv6 routing setup
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/bin/sleep 5
ExecStart=/sbin/ip -6 addr replace fd52:cf61:d373:cc4::110/64 dev eth0
ExecStart=/bin/sh -c 'for i in 1 2 3 4 5 6 7 8 9 10; do /sbin/ip -6 route replace fd52:cf61:d373:cc4::/64 via fe80::9e05:d6ff:fecb:cc7b dev eth0 && exit 0; sleep 2; done; exit 1'
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now thread-routing.service

The retry loop on the route command handles the boot-time race where eth0 isn't fully up yet when the unit runs.
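The same retry pattern, pulled out into a reusable shell function in case you want it for other boot-time commands. A sketch; retry is a hypothetical helper name, and the example values mirror the unit above (10 attempts, 2 seconds apart).

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between failed attempts. Returns 0 on the first success, 1 if all fail.
retry() {
    attempts=$1; delay=$2; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

Usage: retry 10 2 ip -6 route replace fd52:cf61:d373:cc4::/64 via fe80::9e05:d6ff:fecb:cc7b dev eth0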

Step 6: OpenThread BR — bring Thread up automatically

Even with otbr-agent running, the Thread network stays disabled or detached until you tell it to start. This unit waits until ot-ctl can reach otbr-agent's control socket, then brings Thread up.

In OpenThread BR LXC:

Terminal window
cat > /etc/systemd/system/thread-network.service <<'EOF'
[Unit]
Description=Bring up Thread network on OpenThread BR
After=otbr-agent.service network-online.target
Wants=otbr-agent.service network-online.target
[Service]
Type=oneshot
RemainAfterExit=yes
# Wait up to 2 minutes for the ot-ctl socket
ExecStartPre=/bin/sh -c 'for i in $(seq 1 60); do ot-ctl state >/dev/null 2>&1 && exit 0; sleep 2; done; exit 1'
ExecStart=/bin/sh -c 'ot-ctl ifconfig up || true'
ExecStart=/bin/sh -c 'ot-ctl thread start || true'
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now thread-network.service

Step 7: Configure the Matter Server in Home Assistant

Once the Matter Server LXC is reachable on its WebSocket (port 5580), in Home Assistant: Settings → Devices & Services → Add Integration → Matter. Point it at ws://<matter-lxc-ip>:5580/ws.
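Before adding the integration, it's worth confirming that port 5580 actually accepts connections from the same VLAN. A minimal sketch using bash's /dev/tcp redirection; the function name is hypothetical, bash and coreutils timeout must be installed wherever you run it, and it only proves the port accepts TCP connections, not that the WebSocket handshake succeeds.

```shell
# Hypothetical helper: succeed if HOST accepts a TCP connection on PORT
# within 3 seconds. Uses bash's /dev/tcp pseudo-path.
port_open() {
    host=$1; port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Usage: port_open <matter-lxc-ip> 5580 && echo reachable || echo "not reachable"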

Step 8: Commission a Thread device

  1. Settings → Devices & Services → Matter → Add device
  2. Scan the Matter QR code or enter the manual code
  3. Place the device close to the Thread radio for first commissioning

The first Thread device is the awkward one — it has to be in radio range of your border router. After that, mains-powered Thread devices act as routers themselves, extending the mesh.

Verification commands

When something’s wrong, here’s a debugging tour. Run these and the output tells you which layer is broken.

Terminal window
# On Proxmox host
pct config 110 | grep unprivileged # should show NOTHING (privileged)
pct config 111 | grep unprivileged # should show NOTHING (privileged)
# In OpenThread BR LXC
ot-ctl state # leader, router, or child (NOT detached or disabled)
ot-ctl br state # running
ot-ctl br omrprefix # shows the OMR prefix
ot-ctl netdata show # OMR prefix should be listed with 'paos' flags
ot-ctl child table # devices that joined as children
ot-ctl neighbor table # all Thread neighbors
ip -6 route # should have the OMR prefix (fdd2:b43f:bc3:1::/64) via wpan0 AND the LAN prefix (fd52:cf61:d373:cc4::/64) on eth0
ip -6 addr show eth0 # should have global address (manual or kernel_ra)
sysctl net.ipv6.conf.eth0.accept_ra net.ipv6.conf.eth0.accept_ra_rt_info_max_plen
# Should be 2 and 64
# In Matter Server LXC
sysctl net.ipv6.conf.eth0.accept_ra net.ipv6.conf.eth0.accept_ra_rt_info_max_plen
ip -6 route | grep fdd2 # should show route to Thread mesh via OpenThread BR
# From Matter LXC, verify reachability
ping6 -c 2 fdd2:b43f:bc3:1:<otbr-omr-suffix> # should respond
# Watch live commissioning
journalctl -u matter-server -f
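For a cron job or monitoring probe, the ot-ctl state check can be reduced to pass/fail. A sketch, assuming ot-ctl prints the role on its first line followed by Done, as it does interactively; the function name is hypothetical and the OT_CTL override exists only so it can be tested without a real border router.

```shell
# Hypothetical probe: print the Thread role and succeed only if the node is
# attached as leader, router or child.
thread_attached() {
    role=$(${OT_CTL:-ot-ctl} state 2>/dev/null | head -n 1 | tr -d '\r')
    echo "thread role: ${role:-unknown}"
    case $role in
        leader|router|child) return 0 ;;
        *) return 1 ;;
    esac
}
```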

Common failure modes and what they mean

PASESession timed out in Matter Server logs: The phone established BLE with the device, the device joined Thread, but the Matter Server can’t reach it over IPv6 to complete cryptographic commissioning. Check OpenThread BR routing in both directions.

destination unreachable, unreachable route X in OpenThread BR’s tcpdump: Pure routing problem at the OpenThread BR. It can’t forward Thread → LAN. Check ip -6 route on the OpenThread BR — needs entries for both the Thread mesh and the LAN prefix.

connect session failed: No such file or directory when running ot-ctl: The OpenThread BR daemon isn’t actually running. systemctl status otbr-agent may say “active” — that’s the SysV lie. Check pgrep otbr-agent. If it’s missing, the proper systemd unit (Step 4) should fix this.

Empty ot-ctl child table and ot-ctl childip after a device tries to join: Device joined briefly but timed out and detached. Look at the Matter Server logs for what happened in between.

ot-ctl state: detached (or disabled): The OpenThread BR is up but Thread isn't. Run ot-ctl ifconfig up && ot-ctl thread start, or restart the thread-network.service.

Error: Nexthop device is not up when adding a route: The kernel reports this when the interface named in the route (eth0) isn't up yet, typically right after boot. Wait a few seconds and try again, or use the retry loop pattern in thread-routing.service.

Things I learned the hard way

Many things, including:

  • The HAOS shell from the SSH add-on is not the HAOS host shell. The add-on runs in its own container with its own network namespace. Use nsenter --target 1 --mount --uts --ipc --net --pid -- /bin/sh or equivalent to actually get to the host.

  • ip link set eth0 down on the Proxmox host doesn’t refer to a container’s eth0. There’s no eth0 on the host (NICs are named enp87s0 etc.). Run that command in the wrong shell and at best it errors harmlessly; at worst, if you happen to have an interface named eth0, you take down your own connectivity.

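  • Filename ordering in /etc/sysctl.d/ matters. Files are applied in lexical order of their names, so with both 99-otbr.conf and 99-thread.conf present, 99-thread.conf is applied second and wins for any key both files set. Use unambiguous names (or distinct numeric prefixes) if order matters.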

  • A network-attached Thread radio (like SMLIGHT SLZB-MR series) creates a TCP connection dependency the OpenThread BR can’t recover from gracefully if it fails. The proper systemd unit + Restart=always is essential.

  • mDNS does not cross VLANs. Don’t try to be clever putting your IoT devices on a different VLAN than Home Assistant. Same broadcast domain only. UniFi’s mDNS reflector doesn’t fix it for IPv6, and reportedly causes its own pain.

  • Mains-powered, router-capable Thread devices can promote themselves to the router role after joining, when the mesh needs more routers. The Kajplats was a child briefly, then promoted to router, which is why subsequent diagnostics showed it in the neighbor table instead of the child table.
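The sysctl.d ordering rule can be checked directly: for a given key, find the last file (in filename order) that sets it. A sketch that only scans one directory, whereas sysctl --system also reads other directories and /etc/sysctl.conf; the function name sysctl_winner is hypothetical.

```shell
# Hypothetical helper: print "<file>: <line>" for the file in DIR that sets
# KEY last when DIR/*.conf are applied in C-locale filename order.
sysctl_winner() (
    key=$1
    dir=${2:-/etc/sysctl.d}
    LC_ALL=C                                            # byte-wise filename order
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')    # escape dots for grep
    winner=
    for f in "$dir"/*.conf; do
        [ -e "$f" ] || continue                         # no .conf files at all
        line=$(grep -E "^[[:space:]]*$pattern[[:space:]]*=" "$f" | tail -n 1)
        [ -n "$line" ] && winner="$(basename "$f"): $line"
    done
    [ -n "$winner" ] && echo "$winner"
)
```

Usage: sysctl_winner net.ipv6.conf.eth0.accept_ra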

What I’d do differently next time

If I were starting from zero:

  1. Convert both LXCs to privileged before doing anything else.
  2. Apply the sysctls before starting any Thread/Matter services.
  3. Replace the OpenThread BR’s SysV unit with a proper systemd service before observing any “weird crashes.”
  4. Don’t bother trying /etc/network/interfaces for IPv6 config in Proxmox LXCs. Use systemd units from the start.
  5. Have a clear understanding of what each layer does before debugging — RA processing, RIO handling, mDNS proxying, and SRP service registration are each their own thing, and each can fail silently.


Closing thought

This setup is more fragile than HAOS-native add-ons because you’re piecing together components that the official Home Assistant team has consciously chosen to provide as a single integrated stack. If you’re not particularly attached to running these in separate LXCs, the simplest “Thread/Matter just works” answer is: install the equivalent add-ons inside HAOS and let NetworkManager handle the userspace RA processing.

But if you want the separation of concerns (different LXCs for the Thread radio and Matter controller, with Home Assistant itself in its own VM — I do, for backup and update independence reasons), you can absolutely make it work. It’s just a fair bit of yak-shaving, and most of the failure modes are silent. Hopefully this document saves you from rediscovering them.