BYOC macOS clouds go offline for no reason

Oliver Collyer

06 Mar, 2020 08:54 AM

I've noticed this on two Macs now. Initially I thought it was because the machines themselves were sleeping but I have verified this is not the case.

Last night all was building fine. This morning, while the Mac is still accessible across the network via ping and VNC, it shows as "Offline" in the AppVeyor server build environment section. Running a build fails as it cannot find a Mac to build on.

Nothing has changed overnight. Both server and Mac are online. I've also verified a ping succeeds from the Mac to the server.

How can I diagnose this issue? Are there logs? Is there some sort of idle timeout?

  1. Posted by Oliver Collyer on 06 Mar, 2020 08:58 AM

    Also, here is confirmation the service is still running, after typing brew services list:

    Name                Status  User   Plist
    appveyor-host-agent started oliver /Users/oliver/Library/LaunchAgents/homebrew.mxcl.appveyor-host-agent.plist

  2. Posted by Oliver Collyer on 06 Mar, 2020 09:08 AM

    And confirmation from netstat that the Mac is still listening on port 5020:

    appveyor- 440 oliver 229u IPv4 0x9baeb5b4fa55a4a1 0t0 TCP localhost:5020 (LISTEN)
    appveyor- 440 oliver 230u IPv6 0x9baeb5b4f2c67bc9 0t0 TCP localhost:5020 (LISTEN)

    The AppVeyor server can also reach the Mac via ping.

    So from what I can tell there is nothing network-related that is causing the cloud to show up as offline.

    I suspect that restarting the service on the Mac would solve it, but I would like to understand the cause as I need this to be reliable.

  3. Posted by Oliver Collyer on 06 Mar, 2020 09:41 AM

    Ok, I've been trying to upload/paste/link to the logs but your security systems won't allow me.

    But from the logs it appears to lose connection at some point. (I can't see what time, as the logs aren't timestamped, but this may coincide with increased network activity over my LAN during overnight backups.)

    The stdout log ends at the connection failure.

    Is the agent missing some reconnection logic? After all, transitory network events do occur, but judging from the logs, it didn't try to reconnect and just gave up.
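
    For reference, the kind of reconnection logic I mean is a retry loop that backs off between attempts and, if asked, never gives up for good. A minimal shell sketch follows; retry_with_backoff and its arguments are hypothetical names used only for illustration, not anything that exists in the Host Agent.

    ```shell
    # Hypothetical helper (not part of the AppVeyor Host Agent): run a command
    # until it succeeds, doubling the wait between attempts up to a cap.
    #   $1 = maximum attempts (0 means retry forever)
    #   $2 = initial delay in seconds
    #   $3 = maximum delay in seconds
    #   remaining arguments = the command to run
    retry_with_backoff() {
        max_attempts=$1
        delay=$2
        max_delay=$3
        shift 3
        attempt=1
        while :; do
            # Stop as soon as the command succeeds.
            if "$@"; then
                return 0
            fi
            # Give up only if a finite attempt limit was requested and reached.
            if [ "$max_attempts" -gt 0 ] && [ "$attempt" -ge "$max_attempts" ]; then
                return 1
            fi
            sleep "$delay"
            # Exponential backoff, capped at max_delay.
            delay=$((delay * 2))
            if [ "$delay" -gt "$max_delay" ]; then
                delay=$max_delay
            fi
            attempt=$((attempt + 1))
        done
    }
    ```

    For example, retry_with_backoff 0 10 300 some_connect_command (where some_connect_command is a placeholder) would keep trying forever, waiting 10s, 20s, 40s and so on, up to 5 minutes between attempts.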

  4. Posted by Oliver Collyer on 06 Mar, 2020 10:26 AM

    Ok, my workaround is to run this script every 60s:

    if grep "Error connecting Host Agent to AppVeyor" /usr/local/var/appveyor/host-agent/host-agent.stdout.log ; then
        brew services stop appveyor-host-agent
        rm /usr/local/var/appveyor/host-agent/*.log
        brew services start appveyor-host-agent
    fi
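
    A possible variation that keeps the logs instead of deleting them: restart only when the most recent connection event in the log is the error. This is only a sketch. It assumes the log is appended to across restarts and that a successful connection is logged as "Connected to HostAgentHub"; agent_needs_restart is a made-up helper name.

    ```shell
    # Hypothetical helper: succeed (exit 0) only if the newest connection event
    # in the given log file is a connection error.
    agent_needs_restart() {
        # Keep only "connected" and "error connecting" lines, then take the last one.
        last_event=$(grep -E 'Connected to HostAgentHub|Error connecting Host Agent to AppVeyor' "$1" 2>/dev/null | tail -n 1)
        case "$last_event" in
            *"Error connecting Host Agent to AppVeyor"*) return 0 ;;
            *) return 1 ;;
        esac
    }

    # Usage (log path as in the script above), e.g. from a job that runs every 60s:
    #   LOG=/usr/local/var/appveyor/host-agent/host-agent.stdout.log
    #   if agent_needs_restart "$LOG"; then
    #       brew services restart appveyor-host-agent
    #   fi
    ```

    One caveat: if the agent has not logged a successful connection by the next run, this would restart it again, so the interval needs to be longer than the agent's normal startup time.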

  5. Posted by Oliver Collyer on 28 Apr, 2020 06:46 AM

    So I've found that this happens on Windows too.

    It's necessary to schedule the following batch file every 60s to work around it:

    wevtutil qe "AppVeyor" | findstr /C:"Error connecting Host Agent to AppVeyor" || exit 0
    PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& Stop-Service Appveyor.HostAgent"
    wevtutil cl "AppVeyor"
    PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& Start-Service Appveyor.HostAgent"

    This can be created as a task in Task Scheduler and needs to be given admin privileges.

    Perhaps the service applet can be improved so that it periodically attempts to reconnect instead of giving up forever? Otherwise, it seems to me that as soon as there is a network outage of a sufficiently long period of time the whole thing just stops without these workarounds.

    Hope these workarounds help someone.

  6. Posted by Oliver Collyer on 09 Sep, 2022 11:53 AM

    I wonder if there can be improvements to the cmdlet for this?

    Here is how things currently are, 2.5 years later. This is the output from the host agent on a Mac when I stop the appveyor server service (Windows) for a minute or so, and then start it again.

    As you can see, the host agent never recovers; it stays forever on "Stopping Host Agent connection...".

    One has to manually stop and start the service to enable it to reconnect. Clearly you have put in some disconnection detection and reconnection logic, but it would appear that it isn't working in this case.

    info: Appveyor.HostAgent.Services.HostAgentHubClientCore[0]
          Connected to HostAgentHub
    info: Appveyor.HostAgent.AppHost[0]
          Registering Host Agent
    info: Appveyor.HostAgent.AppHost[0]
          Host Agent registered
    info: Appveyor.HostAgent.AppHost[0]
          Registering all clouds received from AppVeyor
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          Configuring cloud 13
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-001] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-001] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-001] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-001] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-013] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-013] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-013] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-013] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-005] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-005] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-005] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-005] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-004] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-004] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-004] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-004] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-019] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-019] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-019] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-019] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-006] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-006] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-006] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-006] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-017] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-017] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-017] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-017] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-003] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-003] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-003] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-003] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-014] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-014] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-014] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-014] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-012] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-012] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-012] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-012] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-016] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-016] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-016] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-016] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-002] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-002] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-002] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-002] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-008] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-008] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-008] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-008] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-009] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-009] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-009] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-009] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-018] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-018] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-018] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-018] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-015] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-015] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-015] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-015] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-020] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-020] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-020] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-020] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-007] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-007] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-007] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-007] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-010] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-010] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-010] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-010] Started
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-011] Start
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-011] Run
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-011] Ready
    info: Appveyor.HostAgent.AppHost[0]
          Updating cloud availability (buildCloudId=13): 20 worker(s) available, 0 worker(s) busy
    info: Appveyor.HostAgent.BuildAgentProcess.WorkerCloud[0]
          [worker-13-011] Started
    info: Appveyor.HostAgent.AppHost[0]
          Checking Host Agent connection: ba8e8de6-0a4a-48df-944e-07405eb97e16
    info: Appveyor.HostAgent.AppHost[0]
          Host Agent connection is alive
    warn: Appveyor.HostAgent.Services.HostAgentHubClientCore[0]
          Host Agent has been disconnected from AppVeyor
    info: Appveyor.HostAgent.Services.HostAgentHubClientCore[0]
          Host Agent will try to reconnect in 10 seconds.
    info: Appveyor.HostAgent.Services.HostAgentHubClientCore[0]
          Connecting to HostAgentHub...
    fail: Appveyor.HostAgent.Services.HostAgentHubClientCore[0]
          Error connecting Host Agent to AppVeyor: Operation timed out
    info: Appveyor.HostAgent.AppHost[0]
          Checking Host Agent connection: 4112a4b9-5376-4646-9967-0637e6ac331e
    warn: Appveyor.HostAgent.AppHost[0]
          Ping callback has not received in 20 seconds: 4112a4b9-5376-4646-9967-0637e6ac331e
    warn: Appveyor.HostAgent.AppHost[0]
          Host Agent connection is not responding on AppVeyor events
    info: Appveyor.HostAgent.AppHost[0]
          Stopping Host Agent connection...
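
    Going by the output above, the stuck state can be recognised from the log alone: the last line written is "Stopping Host Agent connection...". A rough check that a watchdog could use is sketched here, assuming the agent writes nothing further while it is stuck; agent_is_stuck is a hypothetical name.

    ```shell
    # Hypothetical helper: succeed (exit 0) if the last non-blank line of the
    # given log file is the "Stopping Host Agent connection..." message.
    agent_is_stuck() {
        # Drop blank lines, then look only at the final line of the log.
        last_line=$(grep -v '^[[:space:]]*$' "$1" 2>/dev/null | tail -n 1)
        case "$last_line" in
            *"Stopping Host Agent connection..."*) return 0 ;;
            *) return 1 ;;
        esac
    }
    ```

    Since the log lines carry no timestamps, this cannot tell how long the agent has been in that state, only that nothing has been logged since.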
