Completed

Refresh: improve .gitignore processing performance [SG-11113]

Zeblote 7 years ago updated by Marc Strapetz 7 years ago 28

I'm not sure what happens during refreshing, other than it counting up the files to ignore. But it could probably run in parallel for many files, right? That would help with large repos like Unreal Engine.

GOOD, I'M SATISFIED
Satisfaction mark by Zeblote 7 years ago

Please give 17.1 RC a try and let me know whether this works better for you.

It's way faster, but it still takes almost 10 seconds, with very low CPU usage. What is it actually doing during that time?

Could it be due to a cold file system cache? If you close and reopen the repository, does it become faster? Please also generate periodical thread dumps while closing and reopening it a few times:


http://www.syntevo.com/doc/display/SG/Periodical+Thread+Dumps

It doesn't seem to become faster. Here's the log file: https://dl.dropboxusercontent.com/s/bpienbxg1qccc7m/log.zip

I opened/closed the repo 3 times, then clicked the "create periodical thread dumps".

Unfortunately the dumps did not cover the problem: you have to invoke Create Periodical Thread Dumps before opening/closing the repository. Btw, how many (tracked) files does the repository contain? Can you also provide a clone URL, so I can see how long it takes for me?

Oh. Your help page says to click that after. Here's a new log file: https://dl.dropboxusercontent.com/s/t0jc9cbi1wziiqs/log2.zip

The repo contains around 80000 tracked files, plus 60000 ignored ones.


To test for yourself you'd have to create an unreal engine account and follow these steps: https://www.unrealengine.com/en-US/ue4-on-github, then run Setup.bat after the clone finished.

Sorry, the help was actually not clear about it -- I have changed that now.


Regarding the performance problem, the thread dumps confirm that most of the time is spent in ignore-processing code. This fits with the 60000 ignored files you mentioned. Ignore processing will be slow if many files have to be checked against the .gitignore patterns. For example, if all 60000 files are in a directory "out" (and its sub-directories), ignoring "out" or "out/" will be fast, but ignoring "out/*.txt" or even "out/*" will be slow because SmartGit will have to check every out/* file against the pattern.

I guess you will see fast refreshing if you temporarily remove all ignored files or adjust the .gitignore as outlined above?
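
To illustrate why a directory rule is cheaper than a per-file pattern, here is a toy model in Python. It is only a sketch, not SmartGit's or JGit's actual code: `make_tree`, `count_files` and `scan` are made-up names, the working tree is an in-memory dict, and `fnmatchcase` only approximates .gitignore matching (its `*` also matches `/`). It counts how many names have to be checked in each case.

```python
from fnmatch import fnmatchcase


def make_tree(n_files):
    # Hypothetical working tree: two source files plus an "out" directory
    # holding n_files build outputs. Directories are dicts, files are None.
    return {
        "src": {"main.c": None, "util.c": None},
        "out": {f"f{i}.txt": None for i in range(n_files)},
    }


def count_files(tree):
    # Number of files below a directory (no pattern matching involved).
    return sum(1 if child is None else count_files(child) for child in tree.values())


def scan(tree, dir_rule=None, file_pattern=None, prefix=""):
    """Return (ignored_files, checks) for a simplified ignore scan.

    dir_rule:     a directory name ignored as a whole, like "out/"
    file_pattern: a per-file glob, like "out/*"
    """
    ignored = checks = 0
    for name, child in tree.items():
        path = prefix + name
        checks += 1  # every visited entry costs one check
        if child is not None:  # directory
            if dir_rule is not None and name == dir_rule:
                # Whole subtree is ignored: nothing below it is checked.
                ignored += count_files(child)
                continue
            sub_ignored, sub_checks = scan(child, dir_rule, file_pattern, path + "/")
            ignored += sub_ignored
            checks += sub_checks
        elif file_pattern is not None and fnmatchcase(path, file_pattern):
            ignored += 1
    return ignored, checks


print(scan(make_tree(60000), dir_rule="out"))        # (60000, 4)
print(scan(make_tree(60000), file_pattern="out/*"))  # (60000, 60004)
```

Both rules ignore the same 60000 files, but the per-file pattern needs one check per file, while the directory rule stops at the directory itself.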

Seems like you're right. A fresh clone without any hidden intermediate files refreshes lightning fast, in less than a second.


Unfortunately I'm not sure how much I can do about the .gitignore file. First it's provided by unreal engine, and it's rather weird, almost like an inverse: https://pastebin.com/KXEHn9CV


Maybe something can be done to make processing the ignored files faster?

Probably worth noting that a "git status" also finishes in < 1 second with all the hidden files, so it's definitely possible to make it faster.

It's probably possible, but I guess some optimizations will be needed here. I have to profile this in more detail, which may take a while. What you may try: if there is a certain directory which contains most of the output files, you may ignore it explicitly in .git/info/exclude.

There's the Intermediate folder which contains all of the build files, but it only has 16000 files in it; the others are scattered all over the place. So it probably wouldn't help that much.
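
To see where the ignored files actually cluster, something like the following Python sketch could be used. It relies on `git ls-files --others --ignored --exclude-standard`, which lists untracked files matched by the ignore rules. The helper names `list_ignored` and `count_by_dir`, and the `depth` parameter, are made up for this example.

```python
import subprocess
from collections import Counter


def count_by_dir(paths, depth=1):
    """Count paths per leading directory, using at most `depth` path components.

    Files directly in the repository root are grouped under ".".
    """
    counts = Counter()
    for path in paths:
        dirs = path.split("/")[:-1]  # drop the file name
        counts["/".join(dirs[:depth]) or "."] += 1
    return counts


def list_ignored(repo="."):
    # -z separates paths with NUL bytes, so unusual file names are not quoted.
    out = subprocess.run(
        ["git", "-C", repo, "ls-files", "--others", "--ignored",
         "--exclude-standard", "-z"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.split("\0") if p]


# Possible usage from the repository root (not executed here):
# for directory, n in count_by_dir(list_ignored(), depth=2).most_common(10):
#     print(f"{n:8d}  {directory}")
```

If a handful of directories account for most of the ignored files, each of them is a candidate for an explicit directory entry in .git/info/exclude.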

Can you please provide an entire repository, including working tree and ignored files to reproduce the problem? I guess a DropBox upload will be helpful.

Not really, it's over 60GB. However you could create an unreal engine account, then download and build it to get pretty much the same result.

This "building" thing is what frightens me, as I guess it will require an appropriate build environment. But: you could send a complete directory listing (dir /b /s) and let me know the exact SHA of your HEAD. Then I'll be able to reproduce your state.

That file list is pretty huge by itself! https://dl.dropboxusercontent.com/s/fe129641ilma3mk/blah.zip


I only made minor changes from the official release branch, so it's probably best if you just clone that one: https://github.com/EpicGames/UnrealEngine/commits/release

You'd need to make an unreal account and link your github account on their site to view that page.


The build environment is pretty straightforward. You need either

a) VS 2015 with c++ support

or

b) VS 2017 with following packages:

- Game dev for c++

- Desktop dev for c++

- Desktop dev for .net

- Windows 8.1 SDK under individual components


Then just

1) Clone release branch

2) Run Setup.bat (this step downloads thousands of files from somewhere)

3) Run GenerateProjectFiles.bat

4) Open UE4.sln

5) Build the UE4 target, or the entire solution to generate more hidden stuff

What's with this "on moderation" thing on my other reply, did you get it? Haven't heard anything for a week.

Sorry, not sure what happened to my reply: I can reproduce the slowness and there are some promising ideas to make .gitignore processing faster which we are investigating.

Nice! Looking forward to seeing them.

Has there been any progress on this yet?

No progress yet. I'll let you know once there is something to test.

It's been a while. Still no progress?

No progress yet. I'll let you know once there is something to test.

Still no improvements in the next preview?

What part of "I'll let you know once there is something to test." was unclear? Could you imagine that we are working on other things already?

Unfortunately it's a rather big change. Besides the SmartGit code itself, jgit is also involved, and we wanted to gather feedback on our proposed patch. So various people and projects are involved, which is why it takes somewhat longer. The patch has been acknowledged now and the improvements are planned for version 18.1.

Completed

Fixed for 18.1 preview 6

Just tried preview 6, working with the unreal repo is no longer frustrating :D

Thanks for the confirmation!