Sporadic ECONNRESETs during npm install grunt-contrib-imagemin - connectivity issues to github?

Anthony Seddon

07 Apr, 2015 03:25 PM

Is there an issue connecting to GitHub from the AV build machines?

We're getting sporadic ECONNRESET errors when running npm install grunt-contrib-imagemin. I've set up a reproducible example yml file as follows, using pngquant-bin, a dependency of grunt-contrib-imagemin:

version: 1.0.{build}
install:
- ps: Install-Product node 0
build_script:
- node --version
- npm --version
- npm install grunt pngquant-bin

The vast majority of the time, we get ECONNRESETs when the installer downloads binaries from GitHub (see the line with ‼ read ECONNRESET):

Build started
git clone -q --branch=master https://bitbucket.org/livsmarter/av-build-problem.git C:\projects\av-build-problem
git checkout -qf f1efe686b3d060a7a624b0b2196b21b5d97bd9f3
Running Install scripts
Install-Product node 0
Uninstalling node 0.10.38 (x86)... 
Installing node 0.12.2 (x86)... 
node --version
v0.12.2
npm --version
2.7.4
npm install grunt pngquant-bin

> pngquant-bin@2.0.3 postinstall C:\projects\av-build-problem\node_modules\pngquant-bin
> node lib/install.js

  ‼ read ECONNRESET
  ‼ pngquant pre-build test failed
  i compiling from source
  × Error: pngquant failed to build, make sure that libpng-dev is installed
    at ChildProcess.exithandler (child_process.js:751:12)
    at ChildProcess.emit (events.js:110:17)
    at maybeClose (child_process.js:1015:16)
    at Process.ChildProcess._handle.onexit (child_process.js:1087:5)

grunt@0.4.5 node_modules\grunt
├── dateformat@1.0.2-1.2.3
├── which@1.0.9
├── eventemitter2@0.4.14
├── getobject@0.1.0
├── colors@0.6.2
├── rimraf@2.2.8
├── async@0.1.22
├── grunt-legacy-util@0.2.0
├── hooker@0.2.3
├── exit@0.1.2
├── nopt@1.0.10 (abbrev@1.0.5)
├── minimatch@0.2.14 (sigmund@1.0.0, lru-cache@2.5.0)
├── glob@3.1.21 (inherits@1.0.0, graceful-fs@1.2.3)
├── lodash@0.9.2
├── coffee-script@1.3.3
├── underscore.string@2.2.1
├── iconv-lite@0.2.11
├── findup-sync@0.1.3 (glob@3.2.11, lodash@2.4.1)
├── grunt-legacy-log@0.1.1 (underscore.string@2.3.3, lodash@2.4.1)
└── js-yaml@2.0.5 (argparse@0.1.16, esprima@1.0.4)

pngquant-bin@2.0.3 node_modules\pngquant-bin
├── logalot@2.1.0 (figures@1.3.5, squeak@1.2.0)
├── bin-build@2.1.1 (exec-series@1.0.1, url-regex@2.1.2, tempfile@1.1.0, archive-type@2.0.0, rimraf@2.3.2, decompress@2.2.1, download@3.3.0)
└── bin-wrapper@2.1.3 (os-filter-obj@1.0.3, is-path-global@1.0.1, download-status@2.1.1, npm-installed@1.0.0, bin-check@1.0.0, bin-version-check@2.1.0, globby@1.2.0, lnfs@1.0.0, download@3.3.0)
dir node_modules\pngquant-bin\vendor
 Volume in drive C has no label.
 Volume Serial Number is F074-BCDF

 Directory of C:\projects\av-build-problem\node_modules\pngquant-bin

File Not Found
Command exited with code 1

During the postinstall script, the install tries to download a file from GitHub, specifically https://raw.githubusercontent.com/imagemin/pngquant-bin/v2.0.3/vend... and it looks like this is where the ECONNRESET happens. Checking the contents of C:\projects\av-build-problem\node_modules\pngquant-bin\vendor confirms this: the binary is not there.
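
The dir check used in the log above could also be scripted so that a missing binary fails the build explicitly. The following is only a minimal sketch under assumptions: the file name check-binaries.js and the function findMissing are hypothetical and not part of the original build, and the code is written in ES5 style so that it should also run on the old Node versions used in this thread.

```javascript
// check-binaries.js (hypothetical file name, not from the original build)
'use strict';
var fs = require('fs');

// Return the subset of `paths` that does not exist on disk.
function findMissing(paths) {
  return paths.filter(function (p) {
    return !fs.existsSync(p);
  });
}

// When paths are passed on the command line, report the missing ones
// and make the process exit with a non-zero code.
var requested = process.argv.slice(2);
if (requested.length > 0) {
  var missing = findMissing(requested);
  if (missing.length > 0) {
    console.error('Missing after npm install: ' + missing.join(', '));
    process.exitCode = 1;
  }
}
```

It could be run as an extra build_script step after npm install, for example: node check-binaries.js node_modules\pngquant-bin\vendor\pngquant.exe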

Interestingly (or frustratingly), it works some of the time:

Build started
git clone -q --branch=master https://bitbucket.org/livsmarter/av-build-problem.git C:\projects\av-build-problem
git checkout -qf f1efe686b3d0
Running Install scripts
Install-Product node 0
Uninstalling node 0.10.38 (x86)... 
Installing node 0.12.2 (x86)... 
node --version
v0.12.2
npm --version
2.7.4
npm install grunt pngquant-bin

> pngquant-bin@2.0.3 postinstall C:\projects\av-build-problem\node_modules\pngquant-bin
> node lib/install.js

  √ pngquant pre-build test passed successfully

grunt@0.4.5 node_modules\grunt
├── dateformat@1.0.2-1.2.3
├── which@1.0.9
├── getobject@0.1.0
├── eventemitter2@0.4.14
├── colors@0.6.2
├── rimraf@2.2.8
├── async@0.1.22
├── grunt-legacy-util@0.2.0
├── hooker@0.2.3
├── exit@0.1.2
├── nopt@1.0.10 (abbrev@1.0.5)
├── minimatch@0.2.14 (sigmund@1.0.0, lru-cache@2.5.0)
├── glob@3.1.21 (inherits@1.0.0, graceful-fs@1.2.3)
├── lodash@0.9.2
├── coffee-script@1.3.3
├── underscore.string@2.2.1
├── iconv-lite@0.2.11
├── grunt-legacy-log@0.1.1 (underscore.string@2.3.3, lodash@2.4.1)
├── findup-sync@0.1.3 (glob@3.2.11, lodash@2.4.1)
└── js-yaml@2.0.5 (argparse@0.1.16, esprima@1.0.4)

pngquant-bin@2.0.3 node_modules\pngquant-bin
├── logalot@2.1.0 (figures@1.3.5, squeak@1.2.0)
├── bin-build@2.1.1 (exec-series@1.0.1, url-regex@2.1.2, tempfile@1.1.0, archive-type@2.0.0, rimraf@2.3.2, decompress@2.2.1, download@3.3.0)
└── bin-wrapper@2.1.3 (os-filter-obj@1.0.3, is-path-global@1.0.1, download-status@2.1.1, npm-installed@1.0.0, bin-check@1.0.0, bin-version-check@2.1.0, globby@1.2.0, lnfs@1.0.0, download@3.3.0)
dir node_modules\pngquant-bin\vendor
 Volume in drive C has no label.
 Volume Serial Number is F074-BCDF

 Directory of C:\projects\av-build-problem\node_modules\pngquant-bin\vendor

04/07/2015  02:58 PM    <DIR>          .
04/07/2015  02:58 PM    <DIR>          ..
04/07/2015  02:58 PM           544,355 pngquant.exe
               1 File(s)        544,355 bytes
               2 Dir(s)  56,997,486,592 bytes free
Discovering tests...OK
Build success

I've been able to reproduce the issue with optipng-bin, gifsicle, jpegtran-bin and grunt-contrib-imagemin. The last one is what we use on our main build, and because it depends on the other four packages I've mentioned, it is causing us constant build failures.

  1. Support Staff 1 Posted by Feodor Fitsner on 07 Apr, 2015 05:12 PM

    Seems something's indeed taking place as there is another similar report with the same message.

    So, you are both running on the Pro environment. To check whether this is an environment-specific (or location/region-specific) issue, it would be interesting to see if you get the same errors while running on Azure. To force your builds to run on Azure, you can choose the "unstable" image.

    This may be a coincidence, but both reports came after we updated the default node.js on build workers to 0.10.38 (though I see you are running on the 0.12.x branch). It might be worth checking whether you still get the error after switching back to 0.10.37 with the Install-Product node 0.10.37 PS command.

    Another thing to try is the build cache for node modules, so they are not fetched from remote locations on every build.

  2. 2 Posted by giacomo.tag on 07 Apr, 2015 06:48 PM

    it would be interesting to see if you get the same errors while running on Azure

    Let me try that

    This may be a coincidence, but both reports came after we updated the default node.js on build workers to 0.10.38 (though I see you are running on the 0.12.x branch). It might be worth checking whether you still get the error after switching back to 0.10.37 with the Install-Product node 0.10.37 PS command.

    It's definitely related to a change that happened ~1-2 weeks ago, so it could be that. Let me check.

    Another thing to try is the build cache for node modules, so they are not fetched from remote locations on every build.

    I can't actually do that, because the task I am running always gets the newest files from the net, so there is no way to cache them.

  3. 3 Posted by giacomo.tag on 07 Apr, 2015 06:57 PM

    I actually checked my logs, and I noticed a difference in the npm version.

    It worked with version 2.7.4 and it stopped working with 2.7.5. Could that be the problem? The node version was always the same (0.10.38) for both good and bad builds.

  4. 4 Posted by giacomo.tag on 07 Apr, 2015 06:59 PM

    I take back what I said. There are builds that fail with the same error on npm version 2.7.4.

  5. Support Staff 5 Posted by Feodor Fitsner on 07 Apr, 2015 07:01 PM

    Yeah, I think the npm that comes with a specific node version should be the same.

  6. 6 Posted by Anthony Seddon on 07 Apr, 2015 07:52 PM

    I've tested with both the Unstable platform and node 0.10.37 and I'm still getting build failures due to connectivity resets.

    Regards,
    Ant

  7. Support Staff 7 Posted by Feodor Fitsner on 07 Apr, 2015 08:11 PM

    OK, could you please also test on the "Previous Windows Server 2012 R2" image, which was in effect between March 21 and April 4?

    - Feodor

  8. 8 Posted by Anthony Seddon on 07 Apr, 2015 09:46 PM

    Builds on the "Previous Windows Server 2012 R2" image seem to work as expected; however, they are very slow.

    I'm just testing with some caching changes, and both builds I've got queued are now stuck in a Queued state. Cancelling and restarting doesn't seem to move the state to In Progress either.

  9. 9 Posted by Anthony Seddon on 07 Apr, 2015 09:48 PM

    Typical! One of the queued builds kicked off as soon as I posted that last comment...

  10. Support Staff 10 Posted by Feodor Fitsner on 07 Apr, 2015 11:40 PM

    Have you run it on the "previous" environment several times? Are you still getting the error on the default image?

  11. 11 Posted by Anthony Seddon on 08 Apr, 2015 04:52 AM

    Yes, it works consistently for my test build and our main build on the Previous image with no issues.

  12. Support Staff 12 Posted by Feodor Fitsner on 08 Apr, 2015 05:12 AM

    Is it possible for you to share your test in a public repo, so I can play with it?

  13. 13 Posted by Anthony Seddon on 08 Apr, 2015 06:21 AM

    I've given you access to the repo - https://bitbucket.org/livsmarter/av-build-problem

  14. 14 Posted by giacomo.tag on 08 Apr, 2015 01:21 PM

    I tried on the "previous" environment a couple of times, but still no success.

  15. Support Staff 15 Posted by Feodor Fitsner on 08 Apr, 2015 01:22 PM

    Could you please provide a simple test project as well that reproduces the issue?

    - Feodor

  16. 16 Posted by giacomo.tag on 08 Apr, 2015 03:41 PM

    I am trying to, but I cannot reproduce it: https://ci.appveyor.com/project/itajaja/appveyor-troubleshoot

    There is no ECONNRESET error here, and I don't know what the difference could be.

  17. 17 Posted by giacomo.tag on 08 Apr, 2015 03:42 PM

    One difference might be that in the full repository, I am calling npm install in a subfolder from a grunt task in the root folder. That is where the ECONNRESET error happens. In the test project, npm install is called directly and there is no ECONNRESET.

  18. Support Staff 18 Posted by Feodor Fitsner on 08 Apr, 2015 04:12 PM

    Could you try reproducing the same folder structure in a test project?

  19. 19 Posted by giacomo.tag on 08 Apr, 2015 05:04 PM

    For some reason, I am unable to reproduce it:

    https://ci.appveyor.com/project/itajaja/appveyor-troubleshoot

    I honestly don't know what the difference could be, except that this one is not on a "PRO" plan but on a basic one.

  20. Support Staff 20 Posted by Feodor Fitsner on 08 Apr, 2015 05:15 PM

    So, it looks like it does not work on the Pro environment and does work on Azure?

  21. 21 Posted by giacomo.tag on 08 Apr, 2015 05:30 PM

    If the non-pro is on Azure instead of Server 2012, yes

  22. Support Staff 22 Posted by Feodor Fitsner on 08 Apr, 2015 05:34 PM

    Interesting. Will bring back the previous image on Pro to see if it's reproducible.

    Just to confirm: the same build sequence was working before (presumably before the latest update on April 4)?

  23. Support Staff 23 Posted by Feodor Fitsner on 08 Apr, 2015 06:27 PM

    Looks like it's not related to the latest build worker updates. I'm testing with the "March 21" image on the Pro environment (which was in effect prior to the latest update on April 4) and getting the same sporadic errors. It can be reproduced with a simple:

    npm install imagemin-gifsicle
    

    Is there any way to better understand what causes this issue?

  24. 24 Posted by Anthony Seddon on 08 Apr, 2015 06:54 PM

    I'm not an npm/node expert, but it looks like it fails when it tries to download the bin files from GitHub in the index.js. Could GitHub be rejecting the connection?

  25. Support Staff 25 Posted by Feodor Fitsner on 08 Apr, 2015 08:41 PM

    If that's true, what URL is that? Can we have a super simple JavaScript code sample, based on that index.js, that tries to download that file? A localized issue would be much easier to troubleshoot.

  26. 26 Posted by Anthony Seddon on 08 Apr, 2015 09:49 PM

    I've set up some samples in the repo we've been using which try to download https://raw.githubusercontent.com/imagemin/gifsicle-bin/v2.0.1/vend... emulating what the gifsicle-bin npm package does, but with Node.js and PowerShell.

    Each sample is on a separate branch and I can demonstrate that builds consistently succeed on the previous image but fail on the current image.

    Interestingly, if I do a "Re-build commit" following a failed download/build, the download succeeds! Any ideas why that could be?

    Maybe a shot in the dark, but this post (https://youtrack.jetbrains.com/issue/TW-35180) on the JetBrains bug tracker talks about some known networking issues between Azure and GitHub. Could it be a similar issue?
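
For readers without access to that repo, a Node.js download sample of the kind described in this comment might look roughly like the sketch below. It is not the code from the repo: the function name download and the file name download-sample.js are hypothetical, the URL has to be supplied on the command line (the one quoted in this thread is truncated), redirects are not followed, and plain http is accepted only so the sketch can be tried against a local server.

```javascript
// download-sample.js (hypothetical file name, not the sample from the repo)
'use strict';
var fs = require('fs');
var http = require('http');
var https = require('https');

// Download `url` to the file `dest`, then call callback(err, byteCount).
// A reset connection normally surfaces as an error with code 'ECONNRESET'.
function download(url, dest, callback) {
  var finished = false;
  function finish(err, bytes) {
    if (finished) { return; }
    finished = true;
    callback(err, bytes);
  }

  var client = /^https:/.test(url) ? https : http;
  var request = client.get(url, function (response) {
    if (response.statusCode !== 200) {
      response.resume(); // discard the body
      return finish(new Error('Unexpected HTTP status ' + response.statusCode));
    }
    var bytes = 0;
    var file = fs.createWriteStream(dest);
    response.on('data', function (chunk) { bytes += chunk.length; });
    response.on('error', finish);
    // Some Node versions report a dropped connection only through 'close'.
    response.on('close', function () {
      if (!response.complete) {
        finish(new Error('Connection closed before the response completed'));
      }
    });
    file.on('error', finish);
    file.on('finish', function () { finish(null, bytes); });
    response.pipe(file);
  });
  request.on('error', finish);
}

// Usage: node download-sample.js <url> <dest>
if (process.argv.length === 4) {
  download(process.argv[2], process.argv[3], function (err, bytes) {
    if (err) {
      console.error('Download failed: ' + (err.code || err.message));
      process.exitCode = 1;
      return;
    }
    console.log('Downloaded ' + bytes + ' bytes');
  });
}
```

Because the script sets a non-zero exit code on failure, a build step running it would fail visibly when the connection is reset.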

  27. Support Staff 27 Posted by Feodor Fitsner on 08 Apr, 2015 11:47 PM

    Great, let me try that.

    I'd like to clarify regarding environments. When you have os: in appveyor.yml with any value, the build is run on Azure. So when you specify os: Previous Windows Server 2012 R2 or os: unstable or os: <something>, it's all Azure. When there is no os specified, then it's the Hyper-V environment (non-Azure data center) if you are on the Pro plan.

    So, it seems the problem is specific to the Pro (Hyper-V) environment only, i.e. the non-Azure data center. Why it works there on the second run, I don't know yet. Maybe it's specific to some Hyper-V host, because every time the build runs on a different host.

  28. Support Staff 28 Posted by Feodor Fitsner on 09 Apr, 2015 01:38 AM

    I can confirm the issue is reproducible on Pro environment with both ps-sample and nodejs-sample samples: https://ci.appveyor.com/project/FeodorFitsner/av-build-problem/history

    The question now is whether it's an AppVeyor data center issue or a GitHub data center issue. I'm going to contact our DC first.

    In the meantime it would be great to implement some kind of retry mechanism. Too bad npm install returns a 0 exit code in all those cases. We'll have to find a different workaround for this issue.
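
As an illustration of the kind of retry mechanism mentioned in this comment, here is a minimal sketch in callback-style Node.js. It is an assumption about how such a workaround could look, not something AppVeyor or npm provides: the function name retry and its parameters are hypothetical, and it retries on any error rather than only on ECONNRESET.

```javascript
'use strict';

// Run task(cb) up to `attempts` times, waiting `delayMs` between tries.
// Calls done(null, result) on the first success, or done(lastError)
// once the attempts are used up.
function retry(task, attempts, delayMs, done) {
  task(function (err, result) {
    if (!err) {
      return done(null, result);
    }
    if (attempts <= 1) {
      return done(err);
    }
    console.warn('Attempt failed (' + (err.code || err.message) + '), retrying...');
    setTimeout(function () {
      retry(task, attempts - 1, delayMs, done);
    }, delayMs);
  });
}
```

The task can be any function that takes a Node-style callback, for example one that performs the binary download and reports an error when the connection is reset. Since npm install itself returns 0 here, the task would need to detect the failure on its own, for instance by checking that the expected file exists.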

  29. 29 Posted by giacomo.tag on 12 Apr, 2015 10:03 PM

    Any news on the issue?

  30. Support Staff 30 Posted by Feodor Fitsner on 12 Apr, 2015 10:29 PM

    Yes, good news.

    First, we sent support tickets to both SoftLayer and GitHub. They both asked for traceroutes. After looking into the traceroutes, SoftLayer suggested that it might be an issue on GitHub's side, and GitHub just didn't respond.

    In the meantime, we performed the tests in a different data center (within SoftLayer) and the problem did not reproduce there. Clearly, there is some DC-wide (Dallas) connectivity problem with the githubusercontent.com web site.

    We decided to move all Pro environment hosts to a different data center. With this move we not only fixed that connectivity issue but also upgraded the hardware! Build workers now feature the latest Xeon 2690 v3 (Haswell) processors and have 2,500 MB of RAM.

    We are going to gradually bring up the new hosts tomorrow (Monday), so your builds will start working again. We will send a notification to the AppVeyor technical updates mailing list with the updated IP ranges.
