Completed

Partial Clone support (Git 2.19 and higher) [SG-13863]

Marc Strapetz, 4 years ago (updated 3 years ago)

Support for the new "partial clone" feature of Git, as described at:

https://git-scm.com/docs/partial-clone

For SmartGit, this mainly means fetching missing blobs on demand (status, Log display, ...).


Fantastic! Really glad to see SmartGit thinking about "partial clone" support - I think once partial cloning matures and becomes generally available (along with sparse-checkout), this will really shake up the industry.

I was actually playing around with this the other day (with self-hosted Git 2.28), and the current situation with SmartGit and partial cloning isn't too bad:

  1. If a blob is missing (eg, due to cloning with --filter=blob:none), the Changes view will currently display an error message. All we have to do is do a "checkout" of the relevant commit (from the GUI) and git will fetch the relevant blobs and then SmartGit will display the diff properly (the checkout will respect any sparse-checkout filters you've set up, so it's not too expensive).

    For new users who don't know this, I think a quick fix is simply for SmartGit to detect the error and replace it with a message allowing the user to fetch the commit blobs if they want to (please don't auto fetch when browsing through the Log though!). When the partial clone feature matures, it might also be possible to be more selective about what blobs to fetch. 

  2. For repos that are large because of large binary files (and not because they have millions and millions of files and commits), cloning with something like --filter=blob:limit=1M seems to work ok (it only clones blobs smaller than 1MB, so all or most text diffs will be available, but it won't fetch large binaries until needed).
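As a rough sketch of the two clone variants mentioned above: Python is used here only for illustration, the function name `partial_clone_command` and the example URL are made up, and only the `--filter=blob:none` and `--filter=blob:limit=<size>` options come from the post itself.

```python
from typing import List, Optional


def partial_clone_command(url: str, dest: str,
                          blob_limit: Optional[str] = None) -> List[str]:
    """Build the argument list for a partial clone.

    blob_limit=None  -> --filter=blob:none      (no blobs fetched up front)
    blob_limit="1M"  -> --filter=blob:limit=1M  (only blobs below that size)
    """
    spec = "blob:none" if blob_limit is None else f"blob:limit={blob_limit}"
    return ["git", "clone", f"--filter={spec}", url, dest]


if __name__ == "__main__":
    # Hypothetical repository URL, for illustration only.
    print(" ".join(partial_clone_command("https://example.com/repo.git", "repo")))
    print(" ".join(partial_clone_command("https://example.com/repo.git", "repo", "1M")))
```

The list form can be passed to `subprocess.run` as is. Whether the server honours the filter depends on its configuration.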

In future, as the partial clone path-based filter options mature, I think the ideal UI would be to have the left Repo pane be "expandable" to show files/folders in the repo (ls-tree not worktree). We could then use that Repo view to see what's available and change our partial clone filters / sparse checkout patterns without leaving the GUI (see https://smartgit.userecho.com/communities/1/topics/1150-sparse-checkout).

I think SmartGit would make sure that trees for all commits are present. Otherwise too much code might be affected by "missing objects" problems.

> If a blob is missing (eg, due to cloning with --filter=blob:none), the Changes view will currently display an error message. All we have to do is do a "checkout" of the relevant commit (from the GUI) and git will fetch the relevant blobs and then SmartGit will display the diff properly (the checkout will respect any sparse-checkout filters you've set up, so it's not too expensive).

I would not fetch the entire commit but instead just fetch the missing blobs (git fetch ... <blobid>). The blobs will only be required for the Changes view, so it will be only a few (or just a single one) out of every selected commit. Fetching these blobs should happen automatically, at least up to a certain size: if the user clicks through the Graph and investigates changes for certain commits, this should work smoothly.
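One possible way to find out which objects are missing locally is to parse the output of `git rev-list --objects --missing=print <commit>`, where Git prefixes missing object ids with `?`. The sketch below shows only the parsing step. The helper name `parse_missing_objects` is made up, and this is not a description of how SmartGit detects missing blobs.

```python
from typing import List


def parse_missing_objects(rev_list_output: str) -> List[str]:
    """Collect ids of missing objects from `git rev-list --objects --missing=print`.

    Lines for missing objects start with '?'; all other lines describe
    objects that are present locally and are ignored here.
    """
    missing: List[str] = []
    for line in rev_list_output.splitlines():
        line = line.strip()
        if not line.startswith("?"):
            continue
        parts = line[1:].split()
        if parts:  # skip a stray '?' without an id
            missing.append(parts[0])
    return missing
```

Note that walking a whole commit this way reports every missing object reachable from it, which is far more than the Changes view needs. A real implementation would narrow this down to the blobs of the selected files.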

> I would not fetch the entire commit but instead just fetch the missing blobs (git fetch ... <blobid>)


Even better if you can do that. I think you would want to fetch all blobs relating to that commit though (ie, all changed files in that commit) rather than just for the selected file. Reason being that this would allow the Compare view to track content that moves across files (SG-12887).

> Fetching these blobs should happen automatically, at least up to a certain size: if the user clicks through the Graph and investigates changes for certain commits, this should work smoothly


Agree with making things work smoothly! I think it shouldn't fetch when just clicking in the Log (because as a user I might just want to see what files changed, not the actual diffs), but if performance is good enough, then I agree with automatic fetching when selecting files for comparison. (But if auto fetching makes things unresponsive, then this will be the opposite of a smooth experience! So I guess we just need to play around and see...)

> > I would not fetch the entire commit but instead just fetch the missing blobs (git fetch ... <blobid>)

>

> Even better if you can do that. I think you would want to fetch all blobs relating to that commit though (ie, all changed files in that commit) rather than just for the selected file. Reason being that this would allow the Compare view to track content that moves across files (SG-12887).

I agree, that sounds reasonable. Even without SG-12887, chances are good that a user who investigates a commit will check the contents of other files in that commit as well. And since the "round trip" overhead of fetching a single blob is usually tremendous compared to the blob data itself, it is better to fetch all blobs at once (provided they are below a certain threshold).
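The "all blobs of a commit in one round trip" idea could be sketched like this. It assumes the raw output of `git diff-tree -r <commit>` (lines of the form `:<old mode> <new mode> <old id> <new id> <status>` followed by a tab and the path) and builds a single `git fetch <remote> <id>...` command. The helper names are made up, the size threshold is left out because the sketch has no size information for blobs that are not present locally, and fetching by object id only works if the server permits it.

```python
from typing import List


def changed_blob_ids(diff_tree_raw: str) -> List[str]:
    """Collect old and new blob ids from raw `git diff-tree -r` output."""
    ids: List[str] = []
    for line in diff_tree_raw.splitlines():
        if not line.startswith(":"):
            continue  # e.g. the commit id line printed by diff-tree
        fields = line[1:].split("\t", 1)[0].split()
        if len(fields) < 5:
            continue
        old_mode, new_mode, old_oid, new_oid = fields[:4]
        for mode, oid in ((old_mode, old_oid), (new_mode, new_oid)):
            is_gitlink = mode == "160000"   # submodule entry, not a blob
            is_null = set(oid) == {"0"}     # added/deleted side has no blob
            if not is_gitlink and not is_null and oid not in ids:
                ids.append(oid)
    return ids


def batch_fetch_command(remote: str, oids: List[str]) -> List[str]:
    """One fetch for all ids instead of one round trip per blob."""
    return ["git", "fetch", remote, *oids]
```

Ids that are already present locally would still need to be filtered out before fetching. The sketch only shows the batching.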

> I think it shouldn't fetch when just clicking in the Log (because as a user I might just want to see what files changed, not the actual diffs), but if performance is good enough, then I agree with automatic fetching when selecting files for comparison. (But if auto fetching makes things unresponsive, then this will be the opposite of a smooth experience! So I guess we just need to play around and see...)

The fetch will be triggered exactly once the Changes view tries to access blob data, which in turn is triggered by the Files view selection, which is triggered by the Graph selection. If you are working with partial clones and you are not interested in diffs at all, the best solution will be to close the Changes view in the Window|Main Perspective. This already holds true today if you want to prevent SmartGit from accessing Git-LFS blobs.

[Sorry, still figuring out the forum. Reposted as a "reply" as it should be.]


> The fetch will be triggered exactly once the Changes view tries to access blob data... which is triggered by the Graph selection.

Ah, just tried - I see what you mean. I think I was thinking of the 20.2 Preview 1 feature that allows the Changes view to auto-hide behind the Graph view and come to the front only when a file is selected. With that enabled, I think it would be good to not fetch until the Changes view is brought into focus (ie, by a Files view selection or by manually clicking the Changes tab).

This allows quickly perusing the Log for details other than the diffs (eg, parent/child commits, detailed commit message, etc), but still smoothly seeing diffs when needed. Do you agree?

> > The fetch will be triggered exactly once the Changes view tries to access blob data... which is triggered by the Graph selection.

> Ah, just tried - I see what you mean. I think I was thinking of the 20.2 Preview 1 feature that allows the Changes view to auto-hide behind the Graph view and come to the front only when a file is selected. With that enabled, I think it would be good to not fetch until the Changes view is brought into focus (ie, by a Files view selection or by manually clicking the Changes tab).

>

> This allows quickly perusing the Log for details other than the diffs (eg, parent/child commits, detailed commit message, etc), but still smoothly seeing diffs when needed. Do you agree?


I agree, that sounds like an optimization which could even be done independently. I have logged this as SG-13950.
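To make the proposed behaviour concrete, here is a minimal sketch of deferring the fetch until the Changes view is actually shown. The class and method names are purely hypothetical and say nothing about how SG-13950 is or will be implemented. The `fetch` callback stands in for whatever fetches the blobs of a commit.

```python
from typing import Callable, Optional


class DeferredBlobFetch:
    """Remember the latest selected commit; fetch only while the view is visible."""

    def __init__(self, fetch: Callable[[str], None]) -> None:
        self._fetch = fetch
        self._pending: Optional[str] = None
        self._visible = False

    def on_commit_selected(self, commit: str) -> None:
        # Only the most recent selection matters; earlier ones are dropped.
        self._pending = commit
        if self._visible:
            self._flush()

    def on_changes_view_shown(self) -> None:
        self._visible = True
        self._flush()

    def on_changes_view_hidden(self) -> None:
        self._visible = False

    def _flush(self) -> None:
        if self._pending is not None:
            commit, self._pending = self._pending, None
            self._fetch(commit)
```

With this, clicking through several commits while the Changes view is hidden triggers no fetch at all. Bringing the view to the front fetches only for the commit selected last.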



This has been implemented now for 21.1 Preview:

https://www.syntevo.com/smartgit/preview/

build 17044+ (use Help|Check for Latest Build, if necessary). Changes:

We hope to have identified all code parts which fetch a large amount of blobs (like File Log, Blame, Investigate) and properly changed this code to do a few large fetches, so overall execution time should be reasonable. If you encounter long-running operations for which blobs are still fetched one-by-one (you will see this in SmartGit's log.txt.0 file), please report back.