Sporadic ECONNRESETs during npm install grunt-contrib-imagemin - connectivity issues to github?
Is there an issue connecting to GitHub from the AV build machines?
We're getting sporadic ECONNRESET errors when running npm install grunt-contrib-imagemin. I've set up a reproducible example yml file as follows, using pngquant-bin, a dependency of grunt-contrib-imagemin:
version: 1.0.{build}
install:
- ps: Install-Product node 0
build_script:
- node --version
- npm --version
- npm install grunt pngquant-bin
The vast majority of the time, we get ECONNRESETs when the installer downloads binaries from GitHub (see the line with ‼ read ECONNRESET):
Build started
git clone -q --branch=master https://bitbucket.org/livsmarter/av-build-problem.git C:\projects\av-build-problem
git checkout -qf f1efe686b3d060a7a624b0b2196b21b5d97bd9f3
Running Install scripts
Install-Product node 0
Uninstalling node 0.10.38 (x86)...
Installing node 0.12.2 (x86)...
node --version
v0.12.2
npm --version
2.7.4
npm install grunt pngquant-bin
> pngquant-bin@2.0.3 postinstall C:\projects\av-build-problem\node_modules\pngquant-bin
> node lib/install.js
‼ read ECONNRESET
‼ pngquant pre-build test failed
i compiling from source
× Error: pngquant failed to build, make sure that libpng-dev is installed
at ChildProcess.exithandler (child_process.js:751:12)
at ChildProcess.emit (events.js:110:17)
at maybeClose (child_process.js:1015:16)
at Process.ChildProcess._handle.onexit (child_process.js:1087:5)
grunt@0.4.5 node_modules\grunt
├── dateformat@1.0.2-1.2.3
├── which@1.0.9
├── eventemitter2@0.4.14
├── getobject@0.1.0
├── colors@0.6.2
├── rimraf@2.2.8
├── async@0.1.22
├── grunt-legacy-util@0.2.0
├── hooker@0.2.3
├── exit@0.1.2
├── nopt@1.0.10 (abbrev@1.0.5)
├── minimatch@0.2.14 (sigmund@1.0.0, lru-cache@2.5.0)
├── glob@3.1.21 (inherits@1.0.0, graceful-fs@1.2.3)
├── lodash@0.9.2
├── coffee-script@1.3.3
├── underscore.string@2.2.1
├── iconv-lite@0.2.11
├── findup-sync@0.1.3 (glob@3.2.11, lodash@2.4.1)
├── grunt-legacy-log@0.1.1 (underscore.string@2.3.3, lodash@2.4.1)
└── js-yaml@2.0.5 (argparse@0.1.16, esprima@1.0.4)
pngquant-bin@2.0.3 node_modules\pngquant-bin
├── logalot@2.1.0 (figures@1.3.5, squeak@1.2.0)
├── bin-build@2.1.1 (exec-series@1.0.1, url-regex@2.1.2, tempfile@1.1.0, archive-type@2.0.0, rimraf@2.3.2, decompress@2.2.1, download@3.3.0)
└── bin-wrapper@2.1.3 (os-filter-obj@1.0.3, is-path-global@1.0.1, download-status@2.1.1, npm-installed@1.0.0, bin-check@1.0.0, bin-version-check@2.1.0, globby@1.2.0, lnfs@1.0.0, download@3.3.0)
dir node_modules\pngquant-bin\vendor
Volume in drive C has no label.
Volume Serial Number is F074-BCDF
Directory of C:\projects\av-build-problem\node_modules\pngquant-bin
File Not Found
Command exited with code 1
During the postinstall script, npm tries to download a file from GitHub, specifically https://raw.githubusercontent.com/imagemin/pngquant-bin/v2.0.3/vend... and it looks like this is where the ECONNRESET happens. Checking the contents of C:\projects\av-build-problem\node_modules\pngquant-bin\vendor verifies this.
Interestingly (or frustratingly), it works some of the time:
Build started
git clone -q --branch=master https://bitbucket.org/livsmarter/av-build-problem.git C:\projects\av-build-problem
git checkout -qf f1efe686b3d0
Running Install scripts
Install-Product node 0
Uninstalling node 0.10.38 (x86)...
Installing node 0.12.2 (x86)...
node --version
v0.12.2
npm --version
2.7.4
npm install grunt pngquant-bin
> pngquant-bin@2.0.3 postinstall C:\projects\av-build-problem\node_modules\pngquant-bin
> node lib/install.js
√ pngquant pre-build test passed successfully
grunt@0.4.5 node_modules\grunt
├── dateformat@1.0.2-1.2.3
├── which@1.0.9
├── getobject@0.1.0
├── eventemitter2@0.4.14
├── colors@0.6.2
├── rimraf@2.2.8
├── async@0.1.22
├── grunt-legacy-util@0.2.0
├── hooker@0.2.3
├── exit@0.1.2
├── nopt@1.0.10 (abbrev@1.0.5)
├── minimatch@0.2.14 (sigmund@1.0.0, lru-cache@2.5.0)
├── glob@3.1.21 (inherits@1.0.0, graceful-fs@1.2.3)
├── lodash@0.9.2
├── coffee-script@1.3.3
├── underscore.string@2.2.1
├── iconv-lite@0.2.11
├── grunt-legacy-log@0.1.1 (underscore.string@2.3.3, lodash@2.4.1)
├── findup-sync@0.1.3 (glob@3.2.11, lodash@2.4.1)
└── js-yaml@2.0.5 (argparse@0.1.16, esprima@1.0.4)
pngquant-bin@2.0.3 node_modules\pngquant-bin
├── logalot@2.1.0 (figures@1.3.5, squeak@1.2.0)
├── bin-build@2.1.1 (exec-series@1.0.1, url-regex@2.1.2, tempfile@1.1.0, archive-type@2.0.0, rimraf@2.3.2, decompress@2.2.1, download@3.3.0)
└── bin-wrapper@2.1.3 (os-filter-obj@1.0.3, is-path-global@1.0.1, download-status@2.1.1, npm-installed@1.0.0, bin-check@1.0.0, bin-version-check@2.1.0, globby@1.2.0, lnfs@1.0.0, download@3.3.0)
dir node_modules\pngquant-bin\vendor
Volume in drive C has no label.
Volume Serial Number is F074-BCDF
Directory of C:\projects\av-build-problem\node_modules\pngquant-bin\vendor
04/07/2015 02:58 PM <DIR> .
04/07/2015 02:58 PM <DIR> ..
04/07/2015 02:58 PM 544,355 pngquant.exe
1 File(s) 544,355 bytes
2 Dir(s) 56,997,486,592 bytes free
Discovering tests...OK
Build success
I've been able to reproduce the issue with optipng-bin, gifsicle, jpegtran-bin and grunt-contrib-imagemin. The last is what we use in our main build, and because it depends on the other four packages I've mentioned, it is causing us constant build failures.
Support Staff 1 Posted by Feodor Fitsner on 07 Apr, 2015 05:12 PM
Seems something's indeed taking place as there is another similar report with the same message.
So, you are both running on the Pro environment. To check whether this is an environment-specific (or location/region-specific) issue, it would be interesting to see if you get the same errors while running on Azure. To force your builds to run on Azure you can choose the "unstable" image.
This may be a coincidence, but both reports came after we updated the default node.js on build workers to 0.10.38 (though I see you are running on the 0.12.x branch). It might be worth checking whether you still get the error after switching back to 0.10.37 with the PS command
Install-Product node 0.10.37
Another thing to try is the build cache for node modules, so they are not fetched from remote locations on every build.
2 Posted by giacomo.tag on 07 Apr, 2015 06:48 PM
Let me try that
It's definitely related to a change that happened ~1-2 weeks ago, so it could be that. Let me check.
I can't actually cache them, because the task I am running always gets the newest files from the net.
3 Posted by giacomo.tag on 07 Apr, 2015 06:57 PM
I actually checked my logs, and I noticed a difference in the npm version. It worked with version 2.7.4 and it stopped working with 2.7.5. Could that be the problem? The node version was always the same (0.10.38) for both good and bad builds.
4 Posted by giacomo.tag on 07 Apr, 2015 06:59 PM
I take back what I said. There are builds that fail with the same error on npm version 2.7.4.
Support Staff 5 Posted by Feodor Fitsner on 07 Apr, 2015 07:01 PM
Yeah, I think the npm that comes with a specific node version should be the same.
6 Posted by Anthony Seddon on 07 Apr, 2015 07:52 PM
I've tested with both the Unstable platform and node 0.10.37 and I'm still getting build failures due to connection resets.
Regards,
Ant
Support Staff 7 Posted by Feodor Fitsner on 07 Apr, 2015 08:11 PM
OK, could you please also test on "Previous Windows Server 2012 R2" image which was in effect between March 21 and April 4?
- Feodor
8 Posted by Anthony Seddon on 07 Apr, 2015 09:46 PM
Builds on the Previous Windows Server 2012 R2 image seem to work as expected; however, they are very slow.
I'm just testing with some caching changes and both builds I've got queued are now stuck in a Queued state. Cancelling and restarting doesn't seem to move the state to In Progress either.
9 Posted by Anthony Seddon on 07 Apr, 2015 09:48 PM
Typical! One of the queued builds kicked off as soon as I posted that last comment...
Support Staff 10 Posted by Feodor Fitsner on 07 Apr, 2015 11:40 PM
Have you run it on "previous" environment several times? Are you still getting the error on default image?
11 Posted by Anthony Seddon on 08 Apr, 2015 04:52 AM
Yes, it works consistently for my test build and our main build on the Previous image with no issues.
Support Staff 12 Posted by Feodor Fitsner on 08 Apr, 2015 05:12 AM
Is it possible for you to share your test on public repo, so I can play with it?
13 Posted by Anthony Seddon on 08 Apr, 2015 06:21 AM
I've given you access to the repo - https://bitbucket.org/livsmarter/av-build-problem
14 Posted by giacomo.tag on 08 Apr, 2015 01:21 PM
I tried on "previous" environment a couple of times, but still no success.
Support Staff 15 Posted by Feodor Fitsner on 08 Apr, 2015 01:22 PM
Could you please provide a simple test project as well that reproduces the issue?
- Feodor
16 Posted by giacomo.tag on 08 Apr, 2015 03:41 PM
I am trying to, but I cannot reproduce it (https://ci.appveyor.com/project/itajaja/appveyor-troubleshoot). There is no ECONNRESET error here, and I don't know what the difference could be.
17 Posted by giacomo.tag on 08 Apr, 2015 03:42 PM
A difference might be that in the full repository, I am calling npm install in a subfolder from a grunt task in the root folder. Here the ECONNRESET error happens. In the second case, npm install is called directly and there is no ECONNRESET
Support Staff 18 Posted by Feodor Fitsner on 08 Apr, 2015 04:12 PM
Could you try reproducing the same folder structure in a test project?
19 Posted by giacomo.tag on 08 Apr, 2015 05:04 PM
For some reason, I am unable to reproduce it:
https://ci.appveyor.com/project/itajaja/appveyor-troubleshoot
Honestly, I don't know what the difference could be, except that this one is not on a "PRO" plan but on a basic one.
Support Staff 20 Posted by Feodor Fitsner on 08 Apr, 2015 05:15 PM
So, looks like it does not work on Pro environment and does work on Azure?
21 Posted by giacomo.tag on 08 Apr, 2015 05:30 PM
If the non-pro is on Azure instead of Server 2012, yes
Support Staff 22 Posted by Feodor Fitsner on 08 Apr, 2015 05:34 PM
Interesting. Will bring back previous image on Pro to see if it's reproducible.
Just to confirm: the same build sequence was working before (presumably before the latest update on April 4)?
Support Staff 23 Posted by Feodor Fitsner on 08 Apr, 2015 06:27 PM
Looks like it's not related to the latest build worker updates. I'm testing with the "March 21" image on the Pro environment (which was in effect prior to the latest update on April 4) and getting the same sporadic errors. It could be reproduced with a simple:
Is there any way to better understand what causes this issue?
24 Posted by Anthony Seddon on 08 Apr, 2015 06:54 PM
I'm not an npm/node expert but it looks like it's when it tries to download the bin files from GitHub in the index.js. Could GitHub be rejecting the connection?
Support Staff 25 Posted by Feodor Fitsner on 08 Apr, 2015 08:41 PM
If that's true, what URL is that? Can we have a super simple JavaScript code sample from that index.js trying to download that file? A localized issue like that would be much easier to troubleshoot.
26 Posted by Anthony Seddon on 08 Apr, 2015 09:49 PM
I've set up some samples in the repo we've been using which try to download https://raw.githubusercontent.com/imagemin/gifsicle-bin/v2.0.1/vend... emulating what the gifsicle-bin npm package does, but with Node.js and PowerShell.
Each sample is on a separate branch and I can demonstrate that builds consistently succeed on the previous image but fail on the current image.
Interestingly, if I do a "Re-build commit" following a failed download/build, the download succeeds! Any ideas why that could be?
Maybe a shot in the dark but this post (https://youtrack.jetbrains.com/issue/TW-35180) on JetBrains bug tracker talks about some known networking issues between Azure and GitHub. Could it be a similar issue?
Support Staff 27 Posted by Feodor Fitsner on 08 Apr, 2015 11:47 PM
Great, let me try that.
I'd like to clarify regarding environments. When you have os: in appveyor.yml with any value, the build is run on Azure. So when you specify os: Previous Windows Server 2012 R2 or os: unstable or os: <something>, it's all Azure. When there is no os specified, then it's the Hyper-V environment (non-Azure data center) if you are on the Pro plan.
So, it seems the problem is specific to the Pro (Hyper-V) environment only, i.e. the non-Azure data center. Why it works there on the second run, I don't know yet. Maybe it's specific to some Hyper-V host, because every time the build runs on a different host.
Support Staff 28 Posted by Feodor Fitsner on 09 Apr, 2015 01:38 AM
I can confirm the issue is reproducible on Pro environment with both ps-sample and nodejs-sample samples: https://ci.appveyor.com/project/FeodorFitsner/av-build-problem/history
The question now is whether it's an AppVeyor data center issue or a GitHub data center issue. I'm going to contact our DC first.
In the meantime it would be great to implement some kind of re-try mechanism. Too bad npm install returns 0 exit code in all those cases. Have to find a different workaround for this issue.
29 Posted by giacomo.tag on 12 Apr, 2015 10:03 PM
Any news on the issue?
Support Staff 30 Posted by Feodor Fitsner on 12 Apr, 2015 10:29 PM
Yes, good news.
First, we sent support tickets to both SoftLayer and GitHub. They both asked for traceroutes. After looking into traceroutes SoftLayer suggested that it might be an issue on GitHub side and GitHub just didn't respond back.
In the meantime, we performed the tests in a different data center (within SoftLayer) and the problem did not reproduce there. Clearly, there is some DC-wide (Dallas) connectivity problem with the githubusercontent.com web site.
We decided to move all Pro environment hosts to a different data center. With this move we not only fixed that connectivity issue but upgraded the hardware as well! Build workers now feature the latest Xeon 2690 v3 (Haswell) processors and have 2,500 MB of RAM.
We are going to gradually trigger new hosts tomorrow (Monday), so your builds will start working again. Will send notification to AppVeyor technical updates mailing list with updated IP ranges.