BYOC macOS clouds go offline for no reason
I've noticed this on two Macs now. Initially I thought it was because the machines themselves were sleeping, but I have verified that this is not the case.
Last night all was building fine. This morning, while the Mac is still accessible across the network via ping and VNC, it shows as "Offline" in the AppVeyor server build environment section. Running a build fails as it cannot find a Mac to build on.
Nothing has changed overnight. Both server and Mac are online. I've also verified a ping succeeds from the Mac to the server.
How can I diagnose this issue? Are there logs? Is there some sort of idle timeout?
1 Posted by Oliver Collyer on 06 Mar, 2020 08:58 AM
Also, here is confirmation that the service is still running, from the output of brew services list:
Name Status User Plist
appveyor-host-agent started oliver /Users/oliver/Library/LaunchAgents/homebrew.mxcl.appveyor-host-agent.plist
2 Posted by Oliver Collyer on 06 Mar, 2020 09:08 AM
And confirmation from lsof that the Mac is still listening on port 5020:
appveyor- 440 oliver 229u IPv4 0x9baeb5b4fa55a4a1 0t0 TCP localhost:5020 (LISTEN)
appveyor- 440 oliver 230u IPv6 0x9baeb5b4f2c67bc9 0t0 TCP localhost:5020 (LISTEN)
The AppVeyor server can also reach the Mac via ping.
So, as far as I can tell, nothing network-related is causing the cloud to show up as offline.
I suspect that restarting the service on the Mac would solve it, but I would like to understand the cause as I need this to be reliable.
3 Posted by Oliver Collyer on 06 Mar, 2020 09:41 AM
OK, I've been trying to upload/paste/link the logs, but your security systems won't allow it.
From the logs, though, it appears the agent loses its connection at some point. (I can't tell when, as the log lines aren't timestamped, but this may coincide with increased network activity on my LAN during overnight backups.)
The stdout log ends at the connection failure.
Is the agent missing some reconnection logic? Transient network events do occur, but judging from the logs, it didn't try to reconnect and simply gave up.
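Since the log lines carry no timestamps, one way to pin down when the disconnection happens (for example, to compare it against the overnight backup window) is to poll the log and record the wall-clock time the first time the error line appears. The following is a minimal sketch, assuming the stdout log path used by the macOS workaround later in this thread; the helper name record_first_failure and the stamp file location are hypothetical.

```sh
#!/bin/sh
# Sketch: record when "Error connecting Host Agent to AppVeyor" first shows up
# in the host agent log. record_first_failure is a hypothetical helper name.

record_first_failure() {
  log_file=$1     # host agent stdout log to inspect
  stamp_file=$2   # file that receives the first-seen timestamp

  # A timestamp is already recorded: keep the earliest one.
  [ -s "$stamp_file" ] && return 0

  if grep -q "Error connecting Host Agent to AppVeyor" "$log_file" 2>/dev/null; then
    # Local time with UTC offset
    date '+%Y-%m-%dT%H:%M:%S%z' > "$stamp_file"
  fi
}

# Example usage, polling once a minute until interrupted:
# while :; do
#   record_first_failure /usr/local/var/appveyor/host-agent/host-agent.stdout.log \
#     "$HOME/host-agent-first-failure.txt"
#   sleep 60
# done
```

The recorded time is only accurate to the polling interval, so with a 60-second poll it marks a point up to a minute after the failure was logged.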
4 Posted by Oliver Collyer on 06 Mar, 2020 10:26 AM
OK, my workaround is to run this script every 60 seconds:
if grep -q "Error connecting Host Agent to AppVeyor" /usr/local/var/appveyor/host-agent/host-agent.stdout.log; then
  brew services stop appveyor-host-agent
  # Clear the logs so the old error doesn't trigger another restart on the next run
  rm /usr/local/var/appveyor/host-agent/*.log
  brew services start appveyor-host-agent
fi
5 Posted by Oliver Collyer on 28 Apr, 2020 06:46 AM
I've found that this happens on Windows too.
To work around it, I schedule the following batch file to run every 60 seconds:
@echo off
rem Exit quietly unless the AppVeyor event log contains the connection error
wevtutil qe "AppVeyor" | findstr /C:"Error connecting Host Agent to AppVeyor" >nul || exit /b 0
PowerShell -NoProfile -ExecutionPolicy Bypass -Command "Stop-Service Appveyor.HostAgent"
wevtutil cl "AppVeyor"
PowerShell -NoProfile -ExecutionPolicy Bypass -Command "Start-Service Appveyor.HostAgent"
This can be set up as a task in Task Scheduler; it needs to run with administrator privileges.
Perhaps the host agent service could be improved so that it periodically attempts to reconnect instead of giving up forever? Otherwise, it seems to me that as soon as there is a sufficiently long network outage, the whole thing just stops unless one of these workarounds is in place.
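The behaviour being requested amounts to "retry indefinitely with a bounded delay" rather than "retry once, then stop". The agent itself is not a shell script, so this is only an illustration of that policy as a generic shell helper: it reruns a command until it succeeds, doubling the wait after each failure up to a cap. The name retry_with_backoff, the 10-second starting delay (chosen to mirror the 10-second reconnect delay the agent logs) and the 300-second cap are assumptions, not AppVeyor settings.

```sh
#!/bin/sh
# Sketch of a "never give up" reconnect policy: rerun a command until it
# succeeds, doubling the delay after each failure up to a maximum.
# retry_with_backoff is a hypothetical helper, not part of the AppVeyor agent.

retry_with_backoff() {
  delay=$1      # initial delay in seconds
  max_delay=$2  # upper bound for the delay in seconds
  shift 2       # the remaining arguments are the command to retry

  attempt=1
  until "$@"; do
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    # Double the delay, but never exceed max_delay.
    delay=$((delay * 2))
    [ "$delay" -gt "$max_delay" ] && delay=$max_delay
  done
  echo "succeeded after $attempt attempt(s)" >&2
}

# Example: keep probing until the server answers (host name is hypothetical).
# retry_with_backoff 10 300 ping -c 1 appveyor-server.local
```

With an initial delay of 10 seconds and a cap of 300, the waits between attempts are 10, 20, 40, 80, 160, 300, 300, ... seconds, so a long outage costs at most about five minutes of extra downtime once the network returns, instead of leaving the agent offline until someone restarts it.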
Hope these workarounds help someone.
6 Posted by Oliver Collyer on 09 Sep, 2022 11:53 AM
Could the cmdlet be improved to handle this?
Here is how things currently stand, 2.5 years later. This is the output from the host agent on a Mac when I stop the AppVeyor server service (Windows) for a minute or so and then start it again.
As you can see, the host agent never recovers; it stays on "Stopping Host Agent connection..." forever.
One has to stop and start the service manually to get it to reconnect. Clearly some disconnection detection and reconnection logic has been added, but it doesn't appear to work in this case.
info: Appveyor.HostAgent.Services.HostAgentHubClientCore[0]
Connected to HostAgentHub
info: Appveyor.HostAgent.AppHost[0]
Registering Host Agent
info: Appveyor.HostAgent.AppHost[0]
Host Agent registered
info: Appveyor.HostAgent.AppHost[0]
Registering all clouds received from AppVeyor
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
Configuring cloud 13
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-001] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-001] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-001] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-001] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-013] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-013] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-013] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-013] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-005] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-005] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-005] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-005] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-004] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-004] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-004] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-004] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-019] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-019] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-019] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-019] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-006] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-006] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-006] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-006] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-017] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-017] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-017] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-017] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-003] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-003] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-003] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-003] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-014] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-014] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-014] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-014] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-012] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-012] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-012] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-012] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-016] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-016] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-016] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-016] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-002] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-002] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-002] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-002] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-008] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-008] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-008] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-008] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-009] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-009] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-009] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-009] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-018] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-018] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-018] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-018] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-015] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-015] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-015] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-015] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-020] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-020] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-020] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-020] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-007] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-007] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-007] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-007] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-010] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-010] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-010] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-010] Started
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-011] Start
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-011] Run
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-011] Ready
info: Appveyor.HostAgent.AppHost[0]
Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
[worker-13-011] Started
info: Appveyor.HostAgent.AppHost[0]
Checking Host Agent connection: ba8e8de6-0a4a-48df-944e-07405eb97e16
info: Appveyor.HostAgent.AppHost[0]
Host Agent connection is alive
warn: Appveyor.HostAgent.Services.HostAgentHubClientCore[0]
Host Agent has been disconnected from AppVeyor
info: Appveyor.HostAgent.Services.HostAgentHubClientCore[0]
Host Agent will try to reconnect in 10 seconds.
info: Appveyor.HostAgent.Services.HostAgentHubClientCore[0]
Connecting to HostAgentHub...
fail: Appveyor.HostAgent.Services.HostAgentHubClientCore[0]
Error connecting Host Agent to AppVeyor: Operation timed out
info: Appveyor.HostAgent.AppHost[0]
Checking Host Agent connection: 4112a4b9-5376-4646-9967-0637e6ac331e
warn: Appveyor.HostAgent.AppHost[0]
Ping callback has not received in 20 seconds: 4112a4b9-5376-4646-9967-0637e6ac331e
warn: Appveyor.HostAgent.AppHost[0]
Host Agent connection is not responding on AppVeyor events
info: Appveyor.HostAgent.AppHost[0]
Stopping Host Agent connection...
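In this log the agent's final message is "Stopping Host Agent connection...", and nothing further is written. As an alternative or complement to grepping for the "Error connecting Host Agent to AppVeyor" line, a watchdog could check whether that "Stopping..." message is the last non-empty line of the log. This is a minimal sketch, assuming the console-logger layout shown above (a category line followed by a possibly indented message line) and the stdout log path from the macOS workaround; is_stuck_stopping is a hypothetical name. The message might also briefly be the last line during an ordinary shutdown, so it may be worth requiring it on two consecutive polls before restarting.

```sh
#!/bin/sh
# Sketch: detect a host agent whose last logged message is
# "Stopping Host Agent connection..." with nothing written after it.
# is_stuck_stopping is a hypothetical helper name.

is_stuck_stopping() {
  log_file=$1
  # Last non-empty line of the log (empty if the file is missing).
  last=$(awk 'NF { line = $0 } END { print line }' "$log_file" 2>/dev/null)
  case $last in
    *"Stopping Host Agent connection..."*) return 0 ;;  # looks stuck
    *) return 1 ;;                                        # anything else
  esac
}

# Example, reusing the restart steps from the macOS workaround:
# LOG=/usr/local/var/appveyor/host-agent/host-agent.stdout.log
# if is_stuck_stopping "$LOG"; then
#   brew services stop appveyor-host-agent
#   rm /usr/local/var/appveyor/host-agent/*.log
#   brew services start appveyor-host-agent
# fi
```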