Discussion:
Bridging Network Connections with libvirt are unreliable
Rainer Dorsch
2024-08-28 08:10:01 UTC
Hello,

I have a problem on a bookworm system that is weird (at least to me):

***@h370:~$ inxi -S
System:
Host: h370 Kernel: 6.1.0-23-amd64 arch: x86_64 bits: 64 Desktop: KDE Plasma
v: 5.27.5 Distro: Debian GNU/Linux 12 (bookworm)
***@h370:~$

On it, the bridged network connections used with libvirt work unreliably.

In /etc/network/interfaces I have bridged networks, e.g.:

iface eno1.2 inet manual

# libvirt VM
auto br2
iface br2 inet dhcp
# Use the MAC address identified above.
hwaddress ether 18:31:bf:52:1b:1c
bridge_ports eno1.2
# If you want to turn on Spanning Tree Protocol, ask your hosting
# provider first as it may conflict with their network.
bridge_stp off
# If STP is off, set to 0. If STP is on, set to 2 (or greater).
bridge_fd 0

to make the interface available for libvirt.

In addition, there are non-bridged networks, e.g.:

allow-hotplug eno1.4
iface eno1.4 inet dhcp

All of them share the same physical network but are on separate VLANs.

The full /etc/network/interfaces file of the machine is here:
https://bokomoko.de/~rd/Debian/interfaces

That works well for many hours or even days, but at some point in time the
network is suddenly gone, and all network services die.

***@h370:~# ifdown br2

and

***@h370:~# ifup br2

heals the issue immediately. The non-bridging networks don't see the problem.
The problem occurs whether or not libvirt is running.
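Since `ifdown br2` followed by `ifup br2` clears the fault, it may help to record what the bridge looks like just before healing it. The following is a minimal POSIX sh sketch that only reads sysfs: `/sys/class/net/<bridge>/bridge/` exists only for bridge devices, and `/sys/class/net/<bridge>/brif/` lists the attached ports. The function name `bridge_report` and the `SYSFS` override (used only to make it testable) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the state of a bridge and of its member ports,
# read from sysfs. SYSFS can point at a fake tree for testing.
SYSFS="${SYSFS:-/sys}"

bridge_report() {
    br="$1"
    dir="$SYSFS/class/net/$br"
    # Only bridge devices have a "bridge" subdirectory.
    if [ ! -d "$dir/bridge" ]; then
        echo "$br: not a bridge (or missing)"
        return 1
    fi
    # Reading carrier fails while the device is administratively down.
    echo "$br operstate=$(cat "$dir/operstate") carrier=$(cat "$dir/carrier" 2>/dev/null || echo '?')"
    found=0
    for port in "$dir"/brif/*; do
        [ -e "$port" ] || continue   # empty brif/: glob did not match
        name="${port##*/}"
        found=1
        echo "  port $name operstate=$(cat "$SYSFS/class/net/$name/operstate")"
    done
    [ "$found" -eq 1 ] || echo "  no ports attached"
}
```

Run as `bridge_report br2` before `ifdown br2`. If eno1.2 is missing from the port list, or its operstate is not `up`, that would narrow down which layer fails (the VLAN port or the bridge itself).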

In the systemd log, the first entry indicating network problems is that the DNS
server switches to another interface. But it could easily be a consequence and
not the cause of the issue:

Aug 28 06:57:54 h370 dhclient[1195]: DHCPREQUEST for 192.168.4.203 on eno1.4 to 192.168.4.1 port 67
Aug 28 06:57:54 h370 dhclient[1195]: DHCPACK of 192.168.4.203 from 192.168.4.1
Aug 28 06:57:54 h370 dnsmasq[2386]: reading /etc/resolv.conf
Aug 28 06:57:54 h370 dnsmasq[2386]: using nameserver 192.168.4.1#53
Aug 28 06:57:54 h370 dhclient[1195]: bound to 192.168.4.203 -- renewal in 18265 seconds.

As a workaround I could probably write a small script that pings another
network host and restarts the br interfaces, but I would prefer to understand
why the problem occurs in the first place.
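The ping-and-restart workaround could look roughly like the following POSIX sh sketch. Everything in it is an assumption to adapt: the script name `bridge-watchdog.sh`, the target 192.168.2.1 (a placeholder for some host reachable only via br2), the threshold, the interval and the log location. It needs root for `ifdown`/`ifup`, and it only masks the fault; before each restart it appends `ip`/`bridge` output to a log, which is the part that could actually help find the cause.

```shell
#!/bin/sh
# bridge-watchdog.sh: hypothetical workaround sketch, not a fix.
# IFACE/TARGET defaults are placeholders; TARGET must be a host that is
# only reachable through IFACE. Run as root (ifdown/ifup need it).
IFACE="${IFACE:-br2}"
TARGET="${TARGET:-192.168.2.1}"   # hypothetical host on the br2 VLAN
THRESHOLD="${THRESHOLD:-3}"       # consecutive failed probes before acting
INTERVAL="${INTERVAL:-60}"        # seconds between probes
LOGDIR="${LOGDIR:-/var/log}"      # where state snapshots are appended
FAILS=0

# Succeeds if a single ping sent via interface $1 to host $2 is answered
# within 2 seconds (iputils ping: -I selects the interface, -W the timeout).
link_ok() {
    ping -c 1 -W 2 -I "$1" "$2" >/dev/null 2>&1
}

# Append the current state of $1 to a log for later analysis, then bounce
# it the same way it is healed by hand.
restart_iface() {
    {
        date
        ip -d link show "$1"
        ip -4 addr show "$1"
        bridge link show
    } >>"$LOGDIR/bridge-watchdog.$1.log" 2>&1
    ifdown "$1" && ifup "$1"
}

# One probe cycle: count consecutive failures and restart the interface
# once THRESHOLD is reached.
probe() {
    if link_ok "$1" "$2"; then
        FAILS=0
    else
        FAILS=$((FAILS + 1))
        if [ "$FAILS" -ge "$THRESHOLD" ]; then
            logger -t bridge-watchdog "no reply from $2 via $1, restarting $1"
            restart_iface "$1"
            FAILS=0
        fi
    fi
}

# Only loop when started explicitly, e.g.: sh bridge-watchdog.sh run
if [ "${1:-}" = "run" ]; then
    while :; do
        probe "$IFACE" "$TARGET"
        sleep "$INTERVAL"
    done
fi
```

Requiring several consecutive failures avoids bouncing the bridge on a single lost ping; comparing the snapshot timestamps with the dhclient entries in the journal would show whether outages really line up with DHCP renewals.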

Any idea or hint is welcome.

Many thanks
Rainer
--
Rainer Dorsch
http://bokomoko.de/
Tim Woodall
2024-08-29 18:40:01 UTC
Post by Rainer Dorsch
In the systemd log, the first entry indicating network problems is that the DNS
server switches to another interface. But it could easily be a consequence and
not the cause of the issue:
Aug 28 06:57:54 h370 dhclient[1195]: DHCPREQUEST for 192.168.4.203 on eno1.4 to 192.168.4.1 port 67
Aug 28 06:57:54 h370 dhclient[1195]: DHCPACK of 192.168.4.203 from 192.168.4.1
Aug 28 06:57:54 h370 dnsmasq[2386]: reading /etc/resolv.conf
Aug 28 06:57:54 h370 dnsmasq[2386]: using nameserver 192.168.4.1#53
Aug 28 06:57:54 h370 dhclient[1195]: bound to 192.168.4.203 -- renewal in 18265 seconds.
To me it looks more likely that the DHCP request (a renewal?) is what
breaks things. The DHCP client (dhclient) is presumably rewriting
resolv.conf.

I use the following hook to stop dhclient from changing resolv.conf:

$ cat /etc/dhcp/dhclient-enter-hooks.d/nodnsupdate
make_resolv_conf() {
:
}

Don't know if that will fix your problem but it should hopefully stop
those dnsmasq lines appearing in the log.

Does the problem definitely happen when the DHCP update happens, or are
these just the nearest log entries?

Tim.
Rainer Dorsch
2024-08-30 16:50:01 UTC
Post by Tim Woodall
Does the problem definitely happen when the dhcp update happens or are
these just the nearest logs?
Many thanks for your reply. I added the nodnsupdate configuration you
suggested. Now I have to see whether the problem comes back (unfortunately it
happens at very irregular intervals). Do I need to restart a service for the
change to take effect?

I cannot tell whether it happens at the DHCP update or whether this was just a
coincidence (or whether the network issue even triggered a DNS update). I can
tell, though, that it happens at nowhere near every DHCP update; there are many
more of those in the log. So something else must be happening as well.

I do see a number of active dhclient processes, though:

root 772 0.0 0.0 5872 3148 ? Ss Aug26 0:00 dhclient -4 -v -i -pf /run/dhclient.eno1.pid -lf /var/lib/dhcp/dhclient.eno1.leases -I -df /var/lib/dhcp/dhclient6.eno1.leases eno1
root 1114 0.0 0.0 5872 3524 ? Ss Aug26 0:00 dhclient -4 -v -i -pf /run/dhclient.eno1.3.pid -lf /var/lib/dhcp/dhclient.eno1.3.leases -I -df /var/lib/dhcp/dhclient6.eno1.3.leases eno1.3
root 1195 0.0 0.0 5872 3572 ? Ss Aug26 0:00 dhclient -4 -v -i -pf /run/dhclient.eno1.4.pid -lf /var/lib/dhcp/dhclient.eno1.4.leases -I -df /var/lib/dhcp/dhclient6.eno1.4.leases eno1.4
root 1268 0.0 0.0 5868 3428 ? Ss Aug26 0:00 dhclient -4 -v -i -pf /run/dhclient.eno1.6.pid -lf /var/lib/dhcp/dhclient.eno1.6.leases -I -df /var/lib/dhcp/dhclient6.eno1.6.leases eno1.6
root 377797 0.0 0.0 5848 3380 ? Ss Aug28 0:00 dhclient -4 -v -i -pf /run/dhclient.br7.pid -lf /var/lib/dhcp/dhclient.br7.leases -I -df /var/lib/dhcp/dhclient6.br7.leases br7
root 378009 0.0 0.0 5848 3560 ? Ss Aug28 0:00 dhclient -4 -v -i -pf /run/dhclient.br2.pid -lf /var/lib/dhcp/dhclient.br2.leases -I -df /var/lib/dhcp/dhclient6.br2.leases br2
root 378210 0.0 0.0 5848 3516 ? Ss Aug28 0:00 dhclient -4 -v -i -pf /run/dhclient.br5.pid -lf /var/lib/dhcp/dhclient.br5.leases -I -df /var/lib/dhcp/dhclient6.br5.leases br5

Many thanks again
Rainer
--
Rainer Dorsch
http://bokomoko.de/
Jeffrey Walton
2024-08-29 23:20:01 UTC
Post by Rainer Dorsch
Any idea or hint is welcome.
Do you know if MAC Address Randomization is happening on your interfaces?

Jeff
Rainer Dorsch
2024-08-30 17:00:01 UTC
Post by Jeffrey Walton
Do you know if MAC Address Randomization is happening on your interfaces?
Hi Jeff,

Many thanks for your reply.

I am not aware of having configured address randomization.

Checking the output of inxi right now:

***@h370:~# inxi -n
Network:
Device-1: Intel Ethernet I219-V driver: e1000e
IF: eno1 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
IF-ID-1: br2 state: up speed: 1000 Mbps duplex: unknown mac: 18:31:bf:52:1b:1c
IF-ID-2: br5 state: up speed: 1000 Mbps duplex: unknown mac: 18:31:bf:52:1b:1c
IF-ID-3: br7 state: up speed: 1000 Mbps duplex: unknown mac: 18:31:bf:52:1b:1c
IF-ID-4: docker0 state: down mac: 02:42:5a:3f:a7:55
IF-ID-5: eno1.2 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
IF-ID-6: eno1.3 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
IF-ID-7: eno1.4 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
IF-ID-8: eno1.5 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
IF-ID-9: eno1.6 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
IF-ID-10: eno1.7 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
IF-ID-11: eno1.99 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
IF-ID-12: virbr0 state: down mac: 52:54:00:79:ce:77
***@h370:~#

shows the MAC address 18:31:bf:52:1b:1c on all interfaces except docker0 and
virbr0 (the same address that ifconfig reports).
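A more direct check than comparing addresses by eye is the kernel's per-interface `addr_assign_type` attribute in sysfs, documented in the kernel's sysfs-class-net ABI: 0 = permanent, 1 = random, 2 = stolen (copied from another device), 3 = set (explicitly configured). The POSIX sh sketch below just prints it next to each address; the function names and the `SYSFS` override (for testing) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show how the kernel says each interface's MAC
# address was assigned. SYSFS can point at a fake tree for testing.
SYSFS="${SYSFS:-/sys}"

# Map the numeric addr_assign_type value to a readable name.
assign_type_name() {
    case "$1" in
        0) echo permanent ;;
        1) echo random ;;
        2) echo stolen ;;
        3) echo set ;;
        *) echo unknown ;;
    esac
}

# One line per interface: name, address, assignment type.
mac_origin_report() {
    for dir in "$SYSFS"/class/net/*; do
        [ -r "$dir/addr_assign_type" ] || continue
        echo "${dir##*/} $(cat "$dir/address") $(assign_type_name "$(cat "$dir/addr_assign_type")")"
    done
}
```

The value that matters most here is the one for the physical NIC eno1. Derived devices such as VLAN sub-interfaces or bridges can report something other than `permanent` simply because their address is inherited or configured (e.g. via `hwaddress`), so that alone would not indicate randomization.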

In a years-old note, I found this MAC address encoded in the system UUID of the
dmidecode output:

System Information
Manufacturer: System manufacturer
Product Name: System Product Name
Version: System Version
Serial Number: System Serial Number
UUID: 9c815dee-28d8-5276-d202-1831bf521b1c
Wake-up Type: Power Switch
SKU Number: ASUS_MB_CNL
Family: To be filled by O.E.M.

To me that means no address randomization is in use. Or, if it is, it runs
very infrequently :-).
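The match between the UUID and the MAC can be checked mechanically: the last 12 hex digits of 9c815dee-28d8-5276-d202-1831bf521b1c are 1831bf521b1c, i.e. 18:31:bf:52:1b:1c. A small POSIX sh sketch of that comparison follows; the function name is hypothetical, and the assumption that the onboard NIC's MAC sits at the end of the UUID holds for this board's firmware only as observed above, not in general.

```shell
#!/bin/sh
# Hypothetical helper: format the last 12 hex digits of an SMBIOS system
# UUID as a colon-separated, lower-case MAC address.
uuid_tail_as_mac() {
    # Drop the dashes (32 hex digits remain) and keep digits 21-32.
    tail=$(printf '%s\n' "$1" | tr -d '-' | tr 'A-F' 'a-f' | cut -c 21-32)
    # Insert a colon after every pair, then drop the trailing one.
    printf '%s\n' "$tail" | sed 's/\(..\)/\1:/g; s/:$//'
}

# On a live system (dmidecode needs root), compare:
#   uuid_tail_as_mac "$(dmidecode -s system-uuid)"
#   cat /sys/class/net/eno1/address
```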

Thanks again
Rainer
--
Rainer Dorsch
http://bokomoko.de/