Stochastic failure on Python 3.4 and 3.5 builds

Chris Markiewicz

05 Oct, 2018 07:18 PM

Hi all,

We've been having intermittent failures since June 8 that don't seem to be related to any changes in our code, and which we cannot reproduce anywhere except AppVeyor (and even there only inconsistently). For a time they were infrequent, but in the last month or two it has become very common for at least one of the four affected builds to fail. We have even tried RDPing into a machine that ran a failing build, and could not reproduce the error in three builds.

These failures occur exclusively on Python 3.4 and 3.5 (both x86 and x64), and they seem to relate to "is" checks, where "x is x" returns False. In some cases there could be a bug on our end (a caching object might fail to return identical objects for various reasons), but others seem to be class identity tests, e.g. "numpy.float64 is numpy.float64".
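As a rough sketch of the extra information that could be logged when one of these identity checks fails: the helper name identity_report is hypothetical and not part of our test suite, and the built-in float class stands in for numpy.float64 so the snippet runs without NumPy.

```python
def identity_report(a, b):
    """Collect details that may help explain why `a is b` is False."""
    return {
        "identical": a is b,
        "id_a": id(a),
        "id_b": id(b),
        "type_a": type(a).__qualname__,
        "type_b": type(b).__qualname__,
        # Classes and functions carry __module__; plain instances may not.
        "module_a": getattr(a, "__module__", None),
        "module_b": getattr(b, "__module__", None),
    }


# `float` is only a stand-in (assumption): on the failing builds the objects
# of interest would be things like numpy.float64.
report = identity_report(float, float)
print(report["identical"], report["id_a"] == report["id_b"])
```

Differing ids with matching type and module names would point at two distinct copies of the same object, rather than at a comparison between unrelated objects.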

Things it does not seem to be:

  • Machine - RDPing and re-running the build script could not reproduce the failure
  • Specific wheels - comparing successful runs to unsuccessful runs, the installation sets are identical (the failure to reproduce on the same machine also argues against this)
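The installation-set comparison can be sketched roughly as follows, assuming each build log contains "pip freeze"-style lines (name==version). The function names and the package versions in the example are hypothetical, for illustration only.

```python
def parse_freeze(text):
    """Map package name -> version from `pip freeze`-style lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything not pinned as name==version.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.lower()] = version
    return packages


def diff_installs(passing, failing):
    """Return packages whose versions differ or that appear in only one run."""
    a, b = parse_freeze(passing), parse_freeze(failing)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Illustrative input only; these are not the versions from the actual builds.
passing_log = "numpy==1.15.2\nsix==1.11.0\n"
failing_log = "numpy==1.15.2\nsix==1.11.0\n"
print(diff_installs(passing_log, failing_log))
```

An empty result means the two runs installed the same pinned versions, which is what we observed when comparing successful and unsuccessful runs.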

So we're trying to figure out where the problem might lie. Was there a known change in hardware or configuration on or around June 8 that might explain the sudden onset of these failures? Has anybody else experienced anything like this?

For reference, I've also submitted a bug to numpy, as all of the failures do seem related to numpy objects. I can copy the tracebacks here, if that would be helpful...

Thanks,
Chris

  1. Support Staff 1 Posted by Owen McDonnell on 05 Oct, 2018 09:30 PM

    There have been several platform updates since June 8th. Perhaps you can start looking through them to guess at any changes that might be affecting your build.

    Also, can you try reducing your matrix to the problematic jobs and then adding appveyor_build_worker_cloud: gce to your environment variables? I enabled this different build worker cloud, which has higher-spec VMs, on your account in case we're dealing with a threading/resource issue.

    Note, though, that these workers take a little longer to be provisioned, so your builds will lag a bit at the beginning.

    Run some tests like this and let us know how it goes.

  2. 2 Posted by Chris Markiewic... on 06 Oct, 2018 11:58 AM

    Thanks for the response. I added the environment variable (just to be sure, it wasn't supposed to be all-caps?), and the errors appeared again in Python 3.5: https://ci.appveyor.com/project/nipy/nibabel/build/1.0.521

    I looked through the updates, and didn't see any relevant changes to Python 3.4 and 3.5 in the time frame. There was an update to 3.5.4 in the preceding month, so that's possible, but no updates to Python 3.4 since 2017. I'll have a more detailed look through when I have time.

  3. 3 Posted by Chris Markiewic... on 08 Oct, 2018 03:17 PM

    Ah, and just to be clear, here's the branch we're using to test. I've pushed some empty commits to re-run the builds a few times. Using GCE seems to make failures less likely, but it does not resolve them.

    https://github.com/nipy/nibabel/pull/676

  4. Support Staff 4 Posted by Owen McDonnell on 13 Oct, 2018 01:55 AM

    Sorry for the delay in responding.

    Given that some of the comments at numpy suggest a threading issue, it seems a good one to rule out. The GCE build workers don't actually provide any more cores than the standard workers.

    I could enable our new premium Quad VMs on your account just for the purpose of testing their effect on the stability of your builds.

    This is a paid option and we cannot provide it for free outside of a short testing window (1 or 2 days), but at least it will make it clear whether or not we are dealing with a threading issue.

    Let me know if and when you are ready to test this option.

  5. 5 Posted by Chris Markiewic... on 15 Oct, 2018 02:03 PM

    Hi, just a quick note: I won't be able to devote concerted effort to this during this week. I'll see if I can carve out some time next week.

  6. Support Staff 6 Posted by Owen McDonnell on 15 Oct, 2018 02:14 PM

    Ok. We'll wait for news from you.
