Successful builds failing with code 259

acrichton

25 Jan, 2019 03:09 PM

We've recently had two builds that seem to have failed spuriously.

Both report that they're failing with exit code 259, but according to the logs the build completed successfully (and the relevant code later landed without changes and passed all tests). We noticed that on Windows 259 corresponds to STILL_ACTIVE, which may mean that a process hasn't exited yet and needs to be reaped later, so we were wondering whether this is a possible bug in AppVeyor's harness?
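
To make the STILL_ACTIVE guess concrete, here is a minimal sketch in Python, used purely for illustration and not as a description of what AppVeyor's harness does. On Windows, GetExitCodeProcess reports STILL_ACTIVE (259) for a process that has not terminated, so a caller that reads the code without waiting could see 259 instead of the real status. In Python the non-blocking check is `poll()` and the blocking one is `wait()`; the sleeping child is a made-up stand-in.

```python
import subprocess
import sys

STILL_ACTIVE = 259  # Win32 value reported for a process that has not exited yet

# Stand-in child that takes a moment to finish, then exits with status 0.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])

# Non-blocking check: while the child is still running, Python returns None.
# A raw Win32 GetExitCodeProcess call at this point would report STILL_ACTIVE.
early = child.poll()

# Blocking wait: returns only after the child has exited, so the code is real.
final = child.wait()

print("before waiting:", early)
print("after waiting:", final)
```

The point of the sketch is only that the exit code is meaningful after the wait, not before it.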

  1. Support Staff 1 Posted by Ilya Finkelshteyn on 25 Jan, 2019 09:47 PM

    Hi Alex,

    We took a look into this issue. Let me please share our observations:

    • The issue always happens with the same job (CI_JOB_NAME=x86_64-mingw).

    • The AppVeyor build worker image was last updated before this started happening.

    • There is no specific Hyper-V node where it happens.

    • The external dependencies downloaded are the same between recent "green" and "red" jobs.

    • Per my understanding, exit code 259 means that the caller started a child process and exited without waiting for that child process to complete, as described in this discussion.

    • AppVeyor interprets any exit code other than zero as an error.

    It may be that some external dependency specific to this CI_JOB_NAME=x86_64-mingw job changed. Are the two builds you mention the first ones that failed this way? Can you think of a specific downloadable dependency to check?

    As a workaround, you can catch exit code 259 and return 0, as 259 is not really a failure.
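
A minimal sketch of that workaround, assuming a small Python wrapper is what launches the build command. The names `normalize_exit_code` and `run_and_normalize` are hypothetical and not part of AppVeyor or the Rust CI scripts. Note the assumption that the wrapper runs at a layer that can actually observe 259: sh and bash typically report statuses in the 0 to 255 range, so the check may need to sit outside them.

```python
import subprocess
from typing import Sequence

STILL_ACTIVE = 259  # Windows exit code seen in the failing jobs


def normalize_exit_code(code: int) -> int:
    """Map STILL_ACTIVE to success; pass every other code through unchanged."""
    return 0 if code == STILL_ACTIVE else code


def run_and_normalize(argv: Sequence[str]) -> int:
    """Run a command, wait for it to finish, and return its normalized code."""
    completed = subprocess.run(argv)
    return normalize_exit_code(completed.returncode)
```

A wrapper script could end with `sys.exit(run_and_normalize(sys.argv[1:]))`, so that genuine failures still fail the job and only 259 is treated as success.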

    Side note: we recently implemented a RE-RUN INCOMPLETE button, so you can rerun a build that failed this way and only the failed and cancelled jobs will be executed.

    Ilya.

  2. 2 Posted by acrichton on 25 Jan, 2019 10:46 PM

    Hm interesting, thanks for the info! Since this is a spurious error, it may also be the case that a previous update to the image didn't show up until now. Looking more into this though, we print the "test run finished successfully" message at the end, and then the sequence of events is:

    • The Python interpreter which ran the whole test suite exits, hopefully with exit code 0
    • An invocation of sh -x -c "$FOO" is unblocked now that $FOO (the Python interpreter) has exited
    • An invocation of bash src/ci/run.sh is then unblocked as the CI run script exits
    • After that, control is yielded back to AppVeyor as we don't have any further items configured in the test_script section of our appveyor.yml.

    The three components which are reaping processes after the success message is printed are sh.exe, bash.exe, and AppVeyor's runner as well (I think maybe cmd.exe for us?). It's probably not a bug in cmd.exe, so this may just be a bug in sh.exe or bash.exe which we haven't ever run into before.

    I think we're using sh.exe and bash.exe from the environment by putting C:\msys64\usr\bin in PATH. Do you know if those have changed at all recently?

    Apart from that I'm not really sure where this could be showing up :(. AFAIK nothing we download could have changed to have caused this, but I've been wrong before!

  3. Support Staff 3 Posted by Ilya Finkelshteyn on 29 Jan, 2019 03:44 AM

    Can you try checking and printing the exit code of sh -x -c "$FOO"? It is here, correct?
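
A minimal sketch of this kind of check, written in Python for illustration rather than as the change that went into run.sh. It runs a command, prints the exit code observed at that layer, and passes the code on unchanged. `run_and_report` and the "script" label are hypothetical, and the child process is a stand-in that exits with status 3.

```python
import subprocess
import sys
from typing import Sequence


def run_and_report(label: str, argv: Sequence[str]) -> int:
    """Run argv, print the exit code this layer observed, and return it."""
    code = subprocess.run(argv).returncode
    print(f"{label} exited with {code}", flush=True)
    return code


# Stand-in child process that exits with status 3.
observed = run_and_report("script", [sys.executable, "-c", "raise SystemExit(3)"])
```

Wrapping each layer of the chain this way would show at which layer the reported code stops matching what the inner process returned.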

    Side note: Two jobs in the latest build (https://ci.appveyor.com/project/rust-lang/rust/builds/21954146) failed because of an issue on our side. Sorry for the trouble. Can you please use the RE-BUILD INCOMPLETE button to rerun those jobs?

  4. 4 Posted by acrichton on 29 Jan, 2019 03:47 PM

    Sure yeah, I've added some debugging to hopefully see what comes out.

    Ah I was a bit slow on those reruns! Our infrastructure workflow isn't currently architected well enough to take advantage of that, but we'll be sure to keep that in mind :)

  5. Support Staff 5 Posted by Ilya Finkelshteyn on 29 Jan, 2019 10:15 PM

    You can use this API with reRunIncomplete: True.

  6. 6 Posted by Pietro Albini on 04 Feb, 2019 06:36 PM

    Hi! I'm Pietro from the Rust infrastructure team.

    After we added the debugging code we hit the same failure again in https://ci.appveyor.com/project/rust-lang/rust/builds/22108226/job/.... The relevant log messages are:

    Build completed successfully in 2:59:39
    script exited with 0
    Command exited with code 259
    

    So that's not the problem. I opened another PR to add the same debugging code to appveyor.yml to see if we can get relevant information outside of it.

  7. Support Staff 7 Posted by Ilya Finkelshteyn on 04 Feb, 2019 09:44 PM

    Hi Pietro,

    Thanks a lot for the update. I would also ask you to add the following commands at the end of the run.sh file:

    tasklist
    where sh
    

    We would compare the tasklist output between "good" and "bad" builds in the hope of getting some idea of which process spawned by sh is not closed by the end of its run (the probable reason for exit code 259).

    where sh should show us where the shell is being called from. Your configuration is quite involved, and it is better to double-check which sh.exe is running.
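
One way the comparison could be automated, as a rough sketch only. It assumes the process lists were captured with `tasklist /FO CSV /NH` (CSV without a header row) rather than the plain `tasklist` shown above, and the sample rows are made up, not taken from any real build log. The function names are hypothetical.

```python
import csv
import io
from collections import Counter


def image_counts(tasklist_csv: str) -> Counter:
    """Count processes per image name in `tasklist /FO CSV /NH` output."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    return Counter(row[0].lower() for row in rows if row)


def extra_processes(good_csv: str, bad_csv: str) -> Counter:
    """Return images that appear more often in the bad build than the good one."""
    return image_counts(bad_csv) - image_counts(good_csv)


# Made-up sample data for illustration only.
good = (
    '"cmd.exe","100","Console","1","4,000 K"\n'
    '"sh.exe","200","Console","1","6,000 K"\n'
)
bad = good + '"example-helper.exe","300","Console","1","2,000 K"\n'

print(dict(extra_processes(good, bad)))
```

Counting by image name rather than PID is a deliberate choice here, since PIDs differ between builds even when the same set of processes is running.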

    Ilya.

  8. 8 Posted by Pietro Albini on 05 Feb, 2019 08:23 AM

    Thanks! Added the two commands at the bottom of run.sh, let's see how it goes.

    Pietro.

  9. 9 Posted by Pietro Albini on 06 Feb, 2019 01:53 PM

    We added the code to our CI, but comparing a good build's tasklist and a bad build's tasklist doesn't show anything useful (it's the same list of processes). The output of where sh is:

    C:\msys64\usr\bin\sh.exe
    C:\Program Files\Git\usr\bin\sh.exe
    
  10. Support Staff 10 Posted by Ilya Finkelshteyn on 13 Feb, 2019 12:02 AM

    Hi Pietro,

    My apologies, I missed this message. Do you still have this issue? Did it happen during the last day (asking in the hope that the recent update somehow fixed it)? And can you try using C:\Program Files\Git\usr\bin\sh.exe (which is the default on the AppVeyor image) to execute run.sh?

    Sorry for the "try this and that" approach; it is the kind of issue we have no clear idea about...

    Ilya.

  11. 11 Posted by acrichton on 13 Feb, 2019 06:55 PM

    We had a build fail yesterday which I think was after the update, so unfortunately looks like that didn't fix it. I'll switch over to git's sh.exe and see how that works.

    No worries as well, I'm just glad you've got ideas of how we can try different things because we ran out!

  12. 12 Posted by acrichton on 14 Feb, 2019 02:33 PM

    I'm certainly no expert on sh.exe on Windows, but this error we got looks like sh.exe from git is incompatible with also having msys64 items in PATH unfortunately :(

  13. Support Staff 13 Posted by Ilya Finkelshteyn on 15 Feb, 2019 07:53 PM

    Hmm, when did this error first happen?

  14. 14 Posted by acrichton on 15 Feb, 2019 10:19 PM

    Searching back a bit, and this is by no means exhaustive, the earliest instance I could find was 25 days ago.

  15. Support Staff 15 Posted by Ilya Finkelshteyn on 19 Feb, 2019 05:22 AM

    Hi Alex,

    Could it be that this started at about the same time you switched from GCE to Hyper-V? Can you please try switching back to GCE for a while to see how it behaves?

    Sorry once again for back-and-forth...

    Ilya.

  16. 16 Posted by acrichton on 20 Feb, 2019 06:03 PM

    A good point! We're trying that with https://github.com/rust-lang/rust/pull/58597 and we'll watch for a few days to see if a spurious error crops up
