Inexplicable timeouts
We've been having a large number of timeouts lately. A few looked similar to issues that we have run into lately, but many others did not, and appeared to be only happening on AppVeyor workers. I've done as much to mitigate the known issues as I can (including running tests in serial for 64-bit, which causes even successful builds to take quite a bit longer), but we're still hitting timeouts. Any way to look into what was happening on the build worker for the following cases?
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.398...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.395...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.393...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.387...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.385...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.369...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.363...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.354...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.347...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.336...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.333...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.328...
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.322...
1 Posted by tony on 10 Dec, 2014 09:57 PM
By "that we have run into lately," I meant "that we have run into locally."
Support Staff 2 Posted by Feodor Fitsner on 10 Dec, 2014 10:01 PM
Sure, will check it out.
-Feodor
3 Posted by tony on 11 Dec, 2014 01:56 AM
Here's another one that's running now and looks frozen https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.404...
Support Staff 4 Posted by Feodor Fitsner on 11 Dec, 2014 02:09 AM
Hm, right now there are only about 4 projects running on that server.
At first, I thought there was high load on the server and you were switched to Azure, but no - all recent Julia builds were running on the new environment.
Maybe it's a worker-AppVeyor communication issue (if the build just looks frozen). Can I look into the VM while a Julia build is running to see if there are any SignalR errors?
-Feodor
5 Posted by tony on 11 Dec, 2014 02:23 AM
Whatever instrumentation you want to do. I can contact Stefan as owner of the account if you need input from him.
It seems to only happen on our 64-bit builds. There is a chance that some change in the Julia codebase could have introduced freezing during build/tests; I'm running locally on a range of commits to check.
But in case it's some communication issue I do think it's worth looking into if you can.
Support Staff 6 Posted by Feodor Fitsner on 11 Dec, 2014 02:26 AM
Sure, will take a look.
Though you could be right that some change affected x64, as it seems 32-bit builds manage to complete in time.
-Feodor
Support Staff 7 Posted by Feodor Fitsner on 11 Dec, 2014 04:12 AM
Next time you see a build getting stuck, drop me a quick message - I'd like to see what's going on there. I can only do that during the build, as the VM is restored immediately afterwards.
Support Staff 8 Posted by Feodor Fitsner on 11 Dec, 2014 04:13 AM
Was watching this one: https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.408...
By the end of the build, the Julia process was using 820 MB (75%) of RAM. CPU was around 40%. So it's probably neither CPU nor RAM.
9 Posted by tony on 11 Dec, 2014 04:34 AM
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.410... might be stuck? The last line I see is "profile.jl" which should only take a few seconds.
10 Posted by tony on 11 Dec, 2014 04:36 AM
Oh and that build is running on our release branch which doesn't change anywhere near as dramatically as master
Support Staff 11 Posted by Feodor Fitsner on 11 Dec, 2014 06:04 PM
Looking at recently failing builds, you may notice that the ARCH=x86_64 job ran for a relatively short time (shown on the right) before getting stuck:
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.410... - 6 min
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.404... - 9 min
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.398... - 9 min
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.395... - 5 min
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.393... - 6 min
What's interesting is that in all these cases the ARCH=i686 job was successful, with a pretty consistent run time of 13-16 min.
Did that start when you moved to the new environment, or was it working there and then started to fail?
12 Posted by tony on 11 Dec, 2014 11:12 PM
It's freezing either during the "system image build," the list of *.jl files that is the last build step before running the tests, or in one of the first few tests.
It seems to me as though it had been a bit more reliable at first, but it may have been triggered by a code change. Since it's an intermittent problem it's quite difficult and time-consuming to try running git bisect on it.
13 Posted by tony on 12 Dec, 2014 04:44 AM
I'm not so sure what's actually happening here; some or all of these might be real Julia freezes that will have to be looked into if we can figure out what's causing them.
In the meantime, maybe an optional mitigation feature of early timeouts when no output is received for, say, 10 minutes? That would at least make these hold up the build queue a bit less.
Support Staff 14 Posted by Feodor Fitsner on 12 Dec, 2014 05:32 PM
A log inactivity timeout is a great idea. I'll add a new issue for that. I'm not sure about the 10-minute cap though - maybe we should make that configurable. I don't know, but maybe there could be some "heavy" projects silently doing something for longer than 10 minutes :)
15 Posted by tony on 12 Dec, 2014 07:53 PM
Yeah, configurable would make sense. Some people like running their builds with logging disabled so yeah this might be a not-for-everyone option.
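For illustration, the inactivity timeout idea above could look roughly like the following when wrapped around a build command. This is only a sketch under stated assumptions, not how AppVeyor implements or would implement it: the function name run_with_inactivity_timeout and the parameter idle_limit_s are hypothetical, and it assumes the build command writes its log to stdout/stderr without long internal buffering.

```python
import subprocess
import sys
import threading
import time


def run_with_inactivity_timeout(cmd, idle_limit_s):
    """Run cmd and kill it if it prints nothing for idle_limit_s seconds.

    Hypothetical helper for illustration only.
    Returns (exit_code, timed_out, output_lines).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # treat stderr as log output too
        text=True,
    )
    last_output = [time.monotonic()]  # one-element list so the thread can update it
    lines = []

    def pump():
        # Reading line by line lets us timestamp every piece of log output.
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            last_output[0] = time.monotonic()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    timed_out = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > idle_limit_s:
            # No log output for too long: assume the build is frozen.
            timed_out = True
            proc.kill()
            break
        time.sleep(0.05)

    proc.wait()
    reader.join(timeout=5)
    return proc.returncode, timed_out, lines


if __name__ == "__main__":
    # Example: a command that prints once and then goes silent is killed
    # after about 2 seconds instead of holding the queue for the full sleep.
    silent = [sys.executable, "-u", "-c",
              "import time; print('start'); time.sleep(60)"]
    code, timed_out, _ = run_with_inactivity_timeout(silent, idle_limit_s=2.0)
    print("timed out:", timed_out)
```

Making idle_limit_s a parameter reflects the point about configurability: a "heavy" project that legitimately stays quiet for a long time can raise the limit, and a project that runs with logging disabled would not want this check at all.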
16 Posted by tony on 16 Dec, 2014 11:31 PM
Another one frozen: https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.642...
17 Posted by tony on 17 Dec, 2014 12:20 AM
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.643... as well
Support Staff 18 Posted by Feodor Fitsner on 17 Dec, 2014 12:23 AM
Let me see what's going on there.
Support Staff 19 Posted by Feodor Fitsner on 17 Dec, 2014 12:37 AM
julia.exe - 50% CPU and 266 MB RAM. Server memory: 1.1/1.7 GB
Build started at 12:10, stalled at 12:17
Found these errors in "Application" event log:
20 Posted by tony on 17 Dec, 2014 12:39 AM
Thanks for looking into it!
Does that happen in a successful build too? Like the i686 builds, or an x86_64 build that didn't freeze? That's really strange since we aren't building with Visual Studio 9 at all, and I don't think we do anything with any files named pgo* either.
Support Staff 21 Posted by Feodor Fitsner on 17 Dec, 2014 12:41 AM
Will try to catch an i686 build next time - I can only do that while it's running.
22 Posted by tony on 17 Dec, 2014 12:46 AM
Okay, builds 644 and 645 should fail quickly: 644 is an already-merged pull request, and 645 will be stopped by my code since there are other builds pending for the same PR. 646 will have an i686 build bug that should take about 5-10 minutes to get to.
23 Posted by tony on 17 Dec, 2014 01:14 AM
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.648... should be a normal successful i686 build, taking maybe 15 minutes or so
Support Staff 24 Posted by Feodor Fitsner on 17 Dec, 2014 01:15 AM
Will take a look.
-Feodor
Support Staff 25 Posted by Feodor Fitsner on 17 Dec, 2014 01:25 AM
OK, false alarm. On the worker running i686, those 3 errors in the Windows event log were logged before the build started.
Support Staff 26 Posted by Feodor Fitsner on 17 Dec, 2014 01:28 AM
I noticed the i686 job runs 2 julia.exe processes while the x64 job runs only one?
27 Posted by tony on 17 Dec, 2014 01:30 AM
Sometimes I run 2 julia.exe processes for doing the tests. This is less reliable on win64; sometimes running tests in parallel can freeze even locally. So I've sometimes had win64 running tests in serial, and sometimes tried my luck at parallel. But that type of freeze would look different: we would get "From worker 2:" and "From worker 3:", and one of the workers would get to the end, waiting at the "parallel" test for the other worker to finish.
The freezing that's most common on AppVeyor is during an early stage, building the system image with the list of *.jl files, which runs in a single process right now.
Ilya Finkelshteyn closed this discussion on 25 Aug, 2018 01:53 AM.