Intermittent network problems on Appveyor

Maxwell Grady

08 Feb, 2019 06:17 PM

All week we have been noticing issues with Appveyor's network connectivity.

Builds have failed for a variety of reasons:

  • ssh: Could not resolve hostname github.com: Name or service not known
  • HTTPSConnectionPool(host='packages.enthought.com', port=443): Max retries exceeded with url: <********************************> (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x00000000043E6278>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))
    
  1. Support Staff 1 Posted by Ilya Finkelshteyn on 08 Feb, 2019 08:01 PM

    Thank you for the report. Just yesterday we noticed that the nodes hosting the Visual Studio 2013 image have intermittent network issues. We are looking into it now.

  2. Support Staff 2 Posted by Ilya Finkelshteyn on 09 Feb, 2019 02:24 AM

    We replaced some network equipment that we suspect was the root cause. Because the issue is intermittent, we cannot say for sure that it is fixed. Please let us know if you see it again.

  3. 3 Posted by Geoffrey on 12 Feb, 2019 02:34 AM

    This is happening to us too.
    It seems to happen in "bursts" where multiple builds will fail in a row because of network issues.

    For example, installing flyway using choco will sometimes result in:

    The remote file either doesn't exist, is unauthorized, or is forbidden for url 'https://repo1.maven.org/maven2/org/flywaydb/flyway-commandline/5.2.1/flyway-commandline-5.2.1.zip'. Exception calling "GetResponse" with "0" argument(s): "The remote server returned an error: (403) Forbidden."
    
    Manually opening the URL in a browser works as expected.

    Any suggestions?

    Thank you

  4. Support Staff 4 Posted by Ilya Finkelshteyn on 12 Feb, 2019 02:53 AM

    (403) Forbidden is clearly a server-side response and is unrelated to the networking issue discussed in this topic.

    The question is why https://repo1.maven.org returns 403 from time to time. Maybe one of their frontend servers is misbehaving and you hit it during those bursts. Or maybe they do not like connections from specific datacenters (your builds ran in a few different datacenters). There could be a number of other reasons as well.

    If you send a number of links to both failed and successful builds, we can try to find commonalities that could help you find the root cause.

    But I would also send a request to maven.org support, because they should know better why this server returns 403.
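
The distinction drawn in this reply (an HTTP status means the server was reached; a resolution or connection failure means it was not) can be sketched in Python. This is only an illustration: `classify_failure` is a hypothetical helper, not part of AppVeyor or any library, and it assumes the request was made with the standard `urllib` module. As the reply notes, a 403 can still depend on where the request came from, so "server-response" does not by itself identify the cause.

```python
import socket
import urllib.error


def classify_failure(exc):
    """Roughly sort a urllib failure into 'server-response', 'dns' or 'network'."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server was reached and answered with an error status (e.g. 403).
        return "server-response"
    # urlopen wraps low-level errors in URLError and keeps the cause in .reason.
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, socket.gaierror):
        # Name resolution failed, e.g. "[Errno 11001] getaddrinfo failed".
        return "dns"
    # Anything else (reset, refused, timeout) never got an HTTP answer.
    return "network"
```

Logging this category next to the build id makes it easier to see whether a burst of failures is one problem or several.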

  5. 5 Posted by Geoffrey on 12 Feb, 2019 03:12 AM

    Thanks for your answer.

    Here are 2 failed builds in a row: 22295366, 22295507
    And a subsequent successful build: 22295633

  6. 6 Posted by Geoffrey on 12 Feb, 2019 10:13 PM

    Just a follow-up:
    We believe there are still transient networking issues.

    One of our builds just failed because:

    Error Message:
     System.Net.WebException : The remote name could not be resolved: 'api.xero.com'
    
    Build id: 22322147

  7. Support Staff 7 Posted by Ilya Finkelshteyn on 13 Feb, 2019 01:40 AM

    Builds 22295366 and 22295507 did indeed run in the same network segment in the Liquid Web (Lansing, MI) datacenter, behind public IP 67.225.164.54. I tried downloading https://repo1.maven.org/maven2/org/flywaydb/flyway-commandline/5.2.1/flyway-commandline-5.2.1.zip from builds behind the same IP and it went fine.

    The DNS resolution issue happened on a VM in the AWS West US (Oregon) datacenter. AppVeyor VMs use Google DNS servers (8.8.8.8 and 8.8.4.4).

    There were no incidents reported on https://status.liquidweb.com/ or https://status.aws.amazon.com/, and we have had no other network-related complaints. Again, this ticket was originally created for issues specific to the network segment serving Linux and Visual Studio 2013 servers.

    Network issues are very difficult to reason about because we cannot control networking end to end. In this case, however, my feeling is that the issues were on the "other side": the repo1.maven.org server and the DNS server hosting api.xero.com.

    What I would do is make your code more tolerant of this kind of issue by at least adding retries. For scripting, you can use the appveyor-retry utility. For tests (I see you use xunit) you can use https://github.com/giggio/xunit-retry. I would not recommend adding retries everywhere, but at least in places where you see network flakiness.

    Also, the build cache greatly decreases your dependency on external services.
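
The retry advice above can be sketched in Python as a small wrapper with exponential backoff. This is a minimal illustration under stated assumptions, not how appveyor-retry or xunit-retry work internally: the names `retry` and `is_transient`, the default attempt count, the delays, and the set of retried HTTP statuses are all hypothetical choices. By default only 5xx statuses and connection or DNS errors are retried; an intermittent 403 like the one reported in this thread would only be retried if 403 were added to `retry_statuses`.

```python
import random
import socket
import time
import urllib.error

# Assumption: these statuses are worth retrying; adjust for your own services.
DEFAULT_RETRY_STATUSES = frozenset({500, 502, 503, 504})


def is_transient(exc, retry_statuses=DEFAULT_RETRY_STATUSES):
    """Return True if the failure looks like network flakiness worth retrying."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered; retry only the statuses we opted into.
        return exc.code in retry_statuses
    # DNS failures, refused or reset connections, and timeouts.
    return isinstance(
        exc, (urllib.error.URLError, socket.gaierror, ConnectionError, TimeoutError)
    )


def retry(operation, attempts=3, base_delay=1.0,
          retry_statuses=DEFAULT_RETRY_STATUSES, sleep=time.sleep):
    """Call operation() up to `attempts` times, backing off between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == attempts or not is_transient(exc, retry_statuses):
                raise
            # Exponential backoff (1x, 2x, 4x, ...) plus up to 50% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

A download step could then be wrapped as `retry(lambda: urllib.request.urlopen(url, timeout=30).read())` (after `import urllib.request`), which keeps the retries limited to the one call where flakiness was observed rather than applying them everywhere.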

  8. 8 Posted by Geoffrey on 13 Feb, 2019 01:53 AM

    Thanks for taking the time to investigate.

    We'll look into implementing your suggestions.

    Have a good day.
