Discussion:
timeout in shutdown, mutt killed by SIGKILL
(too old to reply)
Vincent Lefevre
2024-05-29 01:00:01 UTC
Permalink
During the latest shutdown:

May 29 01:55:05 qaa systemd[1]: Stopping session-2.scope - Session 2 of User vinc17...
[...]
May 29 01:55:26 qaa systemd[1]: session-2.scope: Stopping timed out. Killing.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Killing process 2990 (mutt) with signal SIGKILL.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Failed with result 'timeout'.
May 29 01:55:26 qaa systemd[1]: Stopped session-2.scope - Session 2 of User vinc17.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Consumed 8h 17min 54.832s CPU time.
May 29 01:55:26 qaa systemd[1]: Stopping systemd-logind.service - User Login Management...
May 29 01:55:26 qaa systemd[1]: Stopping ***@1000.service - User Manager for UID 1000...
[...]

Note: I had set DefaultTimeoutStopSec=20s because then is normally
sufficient.

The mutt process was running in GNU Screen. The current mailbox
was just a local one. Mutt does no network connections (I do not
use IMAP). I general, I quit it before logging out, but this time,
it seems that I forgot.

But I don't understand why there was a timeout. Does this mean that
mutt didn't react to SIGTERM? Any reason?
--
Vincent Lefèvre <***@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Curt
2024-05-29 16:20:01 UTC
Permalink
Post by Vincent Lefevre
May 29 01:55:05 qaa systemd[1]: Stopping session-2.scope - Session 2 of User vinc17...
[...]
May 29 01:55:26 qaa systemd[1]: session-2.scope: Stopping timed out. Killing.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Killing process 2990 (mutt) with signal SIGKILL.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Failed with result 'timeout'.
May 29 01:55:26 qaa systemd[1]: Stopped session-2.scope - Session 2 of User vinc17.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Consumed 8h 17min 54.832s CPU time.
May 29 01:55:26 qaa systemd[1]: Stopping systemd-logind.service - User Login Management...
[...]
Note: I had set DefaultTimeoutStopSec=20s because then is normally
sufficient.
The mutt process was running in GNU Screen. The current mailbox
was just a local one. Mutt does no network connections (I do not
use IMAP). I general, I quit it before logging out, but this time,
it seems that I forgot.
But I don't understand why there was a timeout. Does this mean that
mutt didn't react to SIGTERM? Any reason?
sudo journalctl -b-1?
Vincent Lefevre
2024-05-30 16:10:02 UTC
Permalink
Post by Curt
Post by Vincent Lefevre
May 29 01:55:05 qaa systemd[1]: Stopping session-2.scope - Session 2 of User vinc17...
[...]
May 29 01:55:26 qaa systemd[1]: session-2.scope: Stopping timed out. Killing.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Killing process 2990 (mutt) with signal SIGKILL.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Failed with result 'timeout'.
May 29 01:55:26 qaa systemd[1]: Stopped session-2.scope - Session 2 of User vinc17.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Consumed 8h 17min 54.832s CPU time.
May 29 01:55:26 qaa systemd[1]: Stopping systemd-logind.service - User Login Management...
[...]
Note: I had set DefaultTimeoutStopSec=20s because then is normally
sufficient.
The mutt process was running in GNU Screen. The current mailbox
was just a local one. Mutt does no network connections (I do not
use IMAP). I general, I quit it before logging out, but this time,
it seems that I forgot.
But I don't understand why there was a timeout. Does this mean that
mutt didn't react to SIGTERM? Any reason?
sudo journalctl -b-1?
See above. There's nothing else about mutt in the journal.
--
Vincent Lefèvre <***@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Max Nikulin
2024-05-29 17:30:01 UTC
Permalink
Post by Vincent Lefevre
But I don't understand why there was a timeout. Does this mean that
mutt didn't react to SIGTERM? Any reason?
Have you tried to send SIGTERM to mutt? If it ignores this signal or the
reaction is some prompt then you need to find another way to stop mutt
and to configure systemd user session to do it on logout (shutdown).
Vincent Lefevre
2024-05-30 16:20:01 UTC
Permalink
Post by Max Nikulin
Post by Vincent Lefevre
But I don't understand why there was a timeout. Does this mean that
mutt didn't react to SIGTERM? Any reason?
Have you tried to send SIGTERM to mutt?
I didn't. AFAIK, systemd sends a SIGTERM to all the processes of the
session: that's the

May 29 01:55:05 qaa systemd[1]: Stopping session-2.scope - Session 2 of User vinc17...

but this timed out, apparently due to the mutt process, which was
still running, so that systemd sent a SIGKILL (which succeeded).

May 29 01:55:26 qaa systemd[1]: session-2.scope: Stopping timed out. Killing.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Killing process 2990 (mutt) with signal SIGKILL.
Post by Max Nikulin
If it ignores this signal or the reaction is some prompt then you
need to find another way to stop mutt and to configure systemd user
session to do it on logout (shutdown).
A SIGTERM normally kills mutt. In signal.c, signals are blocked
"while doing critical ops", namely for compressing/decompressing
(which I don't use) and when locking a mailbox (but I don't see
why this would have happened). Otherwise the signal handler just
does that:

static void exit_handler (int sig)
{
curs_set (1);
endwin (); /* just to be safe */

exit_print_string (Caught_Signal_L10N ? Caught_Signal_L10N : "Caught signal ");
#if SYS_SIGLIST_DECLARED
exit_print_string (sys_siglist[sig]);
#else
#if (__sun__ && __svr4__)
exit_print_string (_sys_siglist[sig]);
#else
#if (__alpha && __osf__)
exit_print_string (__sys_siglist[sig]);
#else
exit_print_int (sig);
#endif
#endif
#endif
exit_print_string (Exiting_L10N ? Exiting_L10N : "... Exiting.\n");
exit (0);
}
--
Vincent Lefèvre <***@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Max Nikulin
2024-05-31 03:10:01 UTC
Permalink
Post by Vincent Lefevre
Post by Max Nikulin
Have you tried to send SIGTERM to mutt?
I didn't. AFAIK, systemd sends a SIGTERM to all the processes of the
session: that's the
Yes, SIGTERM is the default that systemd tries first. There are various
kill modes, another signal may be configured for a unit, but I would
leave defaults for user sessions.

Do you mean that mutt properly exits unless it receives SIGTERM in the
course of shutdown process? I would try to enable debug log in mutt.
There is a chance that networking is already disabled (or some other
system unit important for mutt is already stopped) when systemd tries to
kill mutt first time.
Vincent Lefevre
2024-05-31 08:20:01 UTC
Permalink
Post by Max Nikulin
Do you mean that mutt properly exits unless it receives SIGTERM in the
course of shutdown process?
I think that this was not the first time I did a shutdown while Mutt
was still running. But this was the first time it did not exit.
Post by Max Nikulin
I would try to enable debug log in mutt. There is a chance that
networking is already disabled (or some other system unit important
for mutt is already stopped) when systemd tries to kill mutt first
time.
No, according to the logs, the network was stopped after SIGKILL
was sent to Mutt:

May 29 01:55:26 qaa systemd[1]: session-2.scope: Stopping timed out. Killing.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Killing process 2990 (mutt) with signal SIGKILL.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Failed with result 'timeout'.
May 29 01:55:26 qaa systemd[1]: Stopped session-2.scope - Session 2 of User vinc17.
May 29 01:55:26 qaa systemd[1]: session-2.scope: Consumed 8h 17min 54.832s CPU time.
May 29 01:55:26 qaa systemd[1]: Stopping systemd-logind.service - User Login Management...
[...]
May 29 01:55:26 qaa systemd[1]: Stopping NetworkManager.service - Network Manager...
May 29 01:55:26 qaa NetworkManager[1238]: <info> [1716940526.2966] caught SIGTERM, shutting down normally.
May 29 01:55:26 qaa NetworkManager[1238]: <info> [1716940526.2975] dhcp4 (eth0): canceled DHCP transaction

Note that mutt is not the only case of timeout at shutdown. There's
also

May 25 22:53:32 qaa systemd[2230]: xdg-document-portal.service: State 'stop-sigterm' timed out. Killing.
May 25 22:53:32 qaa systemd[2230]: xdg-document-portal.service: Killing process 116031 (xdg-document-po) with signal SIGKILL.
May 25 22:53:32 qaa systemd[2230]: xdg-document-portal.service: Killing process 116037 (fusermount3) with signal SIGKILL.
May 25 22:53:32 qaa systemd[2230]: xdg-document-portal.service: Main process exited, code=killed, status=9/KILL
May 25 22:53:32 qaa systemd[2230]: xdg-document-portal.service: Failed with result 'timeout'.

This one is a known issue:

https://github.com/flatpak/xdg-desktop-portal/issues/999

I'm wondering whether there could be a same cause.
--
Vincent Lefèvre <***@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Vincent Lefevre
2024-05-31 09:20:01 UTC
Permalink
Post by Vincent Lefevre
Post by Max Nikulin
Do you mean that mutt properly exits unless it receives SIGTERM in the
course of shutdown process?
I think that this was not the first time I did a shutdown while Mutt
was still running. But this was the first time it did not exit.
I've tried a shutdown with Mutt still running in GNU Screen
a couple of times, and I got no timeout, i.e. Mutt exited as
expected.
--
Vincent Lefèvre <***@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Max Nikulin
2024-05-31 12:10:01 UTC
Permalink
Post by Vincent Lefevre
https://github.com/flatpak/xdg-desktop-portal/issues/999
I'm wondering whether there could be a same cause.
I can imagine that mutt may start a GUI handler for some attachment and
that application uses XDG desktop portal. However I would expect that
mutt just creates a file in /tmp, so it should not prevent mutt from
quit on SIGTERM. Desktop environment should ensure that the handler runs
independently of the mutt process (not as a child). Moreover that
application should be stopped earlier during shutdown.

Do you see an attempt to send SIGTERM to mutt before timeout and
SIGKILL? What other processes survived first step? Are there something
suspicious before SIGKILL stage?
Vincent Lefevre
2024-06-04 00:40:01 UTC
Permalink
Do you see an attempt to send SIGTERM to mutt before timeout and SIGKILL?
Unfortunately, there was no information from systemd. Some daemons
log a received SIGTERM, but mutt isn't a daemon.
What other processes survived first step? Are there something suspicious
before SIGKILL stage?
According to the journalctl output, only mutt survived (the only
process to receive a SIGKILL). And nothing suspicious.
--
Vincent Lefèvre <***@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Max Nikulin
2024-06-04 15:10:02 UTC
Permalink
Post by Vincent Lefevre
Do you see an attempt to send SIGTERM to mutt before timeout and SIGKILL?
Unfortunately, there was no information from systemd. Some daemons
log a received SIGTERM, but mutt isn't a daemon.
It is still a part of some systemd.service, try systemd-cgls. If mutt
does not receive SIGTERM then it has no chance to stop. Perhaps that log
entry just tells that a signal is sent to the whole cgroup.
Post by Vincent Lefevre
What other processes survived first step? Are there something suspicious
before SIGKILL stage?
According to the journalctl output, only mutt survived (the only
process to receive a SIGKILL). And nothing suspicious.
In the previous message you wrote about xdg-document-portal.

Perhaps timeouts are inconsistent for system and user services, so
***@.service is killed before SIGTERM and SIGKILL stages completed
withing the systemd user session. However I still expect SIGTERM before
SIGKILL.

You may start a shell script that logs SIGTERM, but does not exit, to
test if behavior is similar to your mutt issue.

Loading...