Build/test timeouts

Krzysztof Czurylo's Avatar

Krzysztof Czurylo

18 Jun, 2017 11:53 AM

Hi,

For a last few days, we have seen the prolonged build/test time on pmem/nvml project (https://ci.appveyor.com/project/pmem/nvml), which results in build time out/cancellation after 1 hour.
It looks like test execution time is almost twice as long as usual. It used to take about 30 mins. to execute all the unit tests for one configuration, so including the build time, it was less than 40 mins. per job. Now, all the jobs timeouts at about 50% of work done.
At first, I thought it was a regression on our side, but i couldn't find any reason for that, so I decided to re-build the last successful build and it also failed (time out). See below.
Could it be related to the environment update on 16.06?

last successful (6 days ago) - about 37 min.
https://ci.appveyor.com/project/pmem/nvml/build/1.0.329

same commit rebuilt on 17.06.2017 - canceled after 1hr
https://ci.appveyor.com/project/pmem/nvml/build/1.0.357

Best regards,
K.

  1. 1 Posted by Krzysztof Czury... on 18 Jun, 2017 11:55 AM

    Krzysztof Czurylo's Avatar

    Ooops,

    Of course, I meant the update on Jul 12th.
    https://www.appveyor.com/updates/

    K.

  2. Support Staff 2 Posted by Feodor Fitsner on 18 Jun, 2017 06:19 PM

    Feodor Fitsner's Avatar

    Hi Krzysztof,

    Indeed, I've tested the last commit on both "previous" and current images and noticed the regression: https://ci.appveyor.com/project/FeodorFitsner/nvml/history

    Among other things installed on June 12th update were latest Windows updates and most probably something could be related to that. Will be taking a look.

    Can you give me some hint what part of the test could become slow - is it something IO related or perhaps some chore PS task before/after test, etc.?

  3. Support Staff 3 Posted by Feodor Fitsner on 19 Jun, 2017 02:56 AM

    Feodor Fitsner's Avatar

    Oh, forgot to mention - to unblock the builds please run on image: Previous Visual Studio 2015 for now.

  4. 4 Posted by Krzysztof Czury... on 19 Jun, 2017 07:32 AM

    Krzysztof Czurylo's Avatar

    Hi Feodor,

    Thanks for quick response.
    I analyzed the logs and it looks like the problem is not in the tests, but in the PowerShell scripts that are used to execute them. Each test program/script is run in a subprocess / PS job. The execution time of each job is printed in the log and in most cases it's below 1s. But when you compare it with the timestamp of the log line, it appears that spawning this subprocess/job may take a couple of seconds. It used to be ~3s (average), now it is 9s avg. (not less than 4 sec.!!!).
    I have no idea whether this change it is related to the PowerShell itself or some other Windows updates.

  5. Support Staff 5 Posted by Feodor Fitsner on 19 Jun, 2017 01:51 PM

    Feodor Fitsner's Avatar

    How do you invoke PS scripts with tests? I'd like to to a reproduce to use in regression testing.

  6. 6 Posted by Krzysztof Czury... on 20 Jun, 2017 03:32 PM

    Krzysztof Czurylo's Avatar

    It's something like this:

    $sb = {
        cd $args[0]
        Invoke-Expression $args[1]
    }
    $j1 = Start-Job -Name $name -ScriptBlock $sb -ArgumentList (pwd).Path, ".\$runscript"
    ...
    Receive-Job -Job $j1

    If needed, I can write a simple script that would illustrate that.

    K.

  7. Support Staff 7 Posted by Feodor Fitsner on 20 Jun, 2017 04:05 PM

    Feodor Fitsner's Avatar

    Yes, please, an example appveyor build in public repository would be great help. Maybe with a loop to see real delay. Thank you!

  8. 8 Posted by Krzysztof Czury... on 27 Jun, 2017 09:48 AM

    Krzysztof Czurylo's Avatar

    Hi Feodor,

    Sorry for late response.
    Eventually I found some time to write a simple script that reproduces the issue. It's a simplified version of the actual script we use - stripped out from everything that is not relevant.

    See: src/test/RUNTESTS.ps1 + src/test/TEST.ps1
    The comments in these files should explain how it works.

    https://github.com/krzycz/nvml/commits/av-test_job_time1
    https://github.com/krzycz/nvml/commits/av-test_job_time2

    I have run it on the old and new OS image and here are the results:
    https://ci.appveyor.com/project/krzycz/nvml/build/1.0.1607
    https://ci.appveyor.com/project/krzycz/nvml/build/1.0.1608

  9. 9 Posted by Krzysztof Czury... on 28 Jun, 2017 04:33 PM

    Krzysztof Czurylo's Avatar

    Even when using image: Previous Visual Studio 2015, we still observe sporadic time outs. I.e. one job completes in 30 min.s, while the second one timeouts.

  10. Support Staff 10 Posted by Feodor Fitsner on 28 Jun, 2017 04:42 PM

    Feodor Fitsner's Avatar

    Hi Krzysztof,

    Thanks for the provided repro. We are currently looking into this issue.

  11. Support Staff 11 Posted by Ilya Finkelshte... on 29 Jun, 2017 03:56 AM

    Ilya Finkelshteyn's Avatar

    Hi Krzysztof,

    Sorry with some delay and not providing too much results. Here is what I did:

    I can miss something with this repro, can you please advise?

    Ilya.

  12. 12 Posted by Krzysztof Czury... on 29 Jun, 2017 09:29 AM

    Krzysztof Czurylo's Avatar

    Hi,

    In all your tests, "RUNTESTS TOTAL" is between 48-54 sec., which means you hit the regression is all the cases. The 2-5sec. difference doesn't matter, because the actual problem is that the total time used to be ~20 sec.
    See: https://ci.appveyor.com/project/krzycz/nvml/build/1.0.1607

    Note that the actual test time is about 10s (10 iterations x 1s), but since the overhead of spawning new job is about 1s (that's very long, but still acceptable), then with additional 10s and it gives us 20s in total. (that's how it used to be)
    Now the total time is ~50s, so the overhead is 40s (4s per job), which is veeeery bad.

    Based on your results and on the fact we still observe sporadic timeouts when using old image, I suspect the problem may be related not to the image update, but rather some other factor - server machines, build cloud, etc...

    K.

  13. 13 Posted by Łukasz Plewa on 29 Jun, 2017 01:44 PM

    Łukasz Plewa's Avatar

    Hi Ilya,

    I took yours commit and started a new build and here you have a results:
    https://ci.appveyor.com/project/lplewa/nvml/build/1.0.599

    it is 9:30min to 3:30 min on old image

    It's look like that tests on new image is always slow, on old images sometimes is fast sometimes it is slow.

  14. Support Staff 14 Posted by Ilya Finkelshte... on 30 Jun, 2017 04:40 PM

    Ilya Finkelshteyn's Avatar

    Hi Łukasz and Krzysztof,

    Apparently update 4014507 causes this. We hope that installing latest updates on top of it could fix that, but they did not. We are not sure if it is safe to uninstall it globally from image. For now workaround is to uninstall it at init stage as I did here. It takes about 40 seconds (including reboot), which maybe tolerable for you, in comparison to performance degrade you observe.

    We will look what else we can do, but for now this is workaround we propose.

    Also sorry for confusion with Previous Visual Studio 2015 image -- it worked the same as new one sometimes because it was actually new one. We are in process of adding more Hyper-V hosts to our datacenter and during this process there are short periods when older images still not deployed to new host.

    Ilya.

  15. Support Staff 15 Posted by Feodor Fitsner on 22 Jul, 2017 04:41 AM

    Feodor Fitsner's Avatar

    Hi guys,

    We've just deployed an updated "Visual Studio 2015" image with PS performance issue fixed! No workarounds needed anymore.

  16. 16 Posted by Krzysztof Czury... on 22 Jul, 2017 08:30 AM

    Krzysztof Czurylo's Avatar

    Hi Feodor,

    Thanks a lot for your support.

Reply to this discussion

Internal reply

Formatting help / Preview (switch to plain text) No formatting (switch to Markdown)

Attaching KB article:

»

Attached Files

You can attach files up to 10MB

If you don't have an account yet, we need to confirm you're human and not a machine trying to post spam.

Keyboard shortcuts

Generic

? Show this help
ESC Blurs the current field

Comment Form

r Focus the comment reply box
^ + ↩ Submit the comment

You can use Command ⌘ instead of Control ^ on Mac

Recent Discussions

19 Oct, 2017 02:25 AM
18 Oct, 2017 07:26 PM
18 Oct, 2017 11:00 AM
18 Oct, 2017 08:54 AM
18 Oct, 2017 07:42 AM

 

17 Oct, 2017 07:36 PM
17 Oct, 2017 05:39 PM
17 Oct, 2017 04:29 PM
17 Oct, 2017 07:49 AM
16 Oct, 2017 11:38 PM
16 Oct, 2017 10:01 PM