Git clone --depth 2 is vastly better than --depth 1 if you want to Git push later by jakub_g

TL;DR: use --depth 2. Read on for why.

Shallow clones can (but do not necessarily) defeat an important optimization. In your case this happens for the first push, but not for subsequent pushes. Other defeating cases can occur, so other pushes might also be slow.

We start with the fact that Git is really all about commits,¹ which are shaped into a Directed Acyclic Graph. The graph has vertices or nodes—whichever term you prefer—that are numbered, by commit hash IDs. Relatively immaterial here, but helpful for concreteness, is the fact that the edges / arcs between the nodes are stored as part of the nodes themselves, rather than being kept separately. Each node stores the hash IDs of its predecessor nodes.

A repository is, at heart, a database of these commit objects. A complete—non-shallow—repository has the entire graph, from every root to every tip commit. A single-branch clone potentially drops some part of the graph, but never has any “gaps” in the graph. For instance, given:

           node--node--tip1
          /
root--node
          \
           node--node--tip2

we can drop either tip and the nodes on that row, but not the nodes and root on the middle row. In all of these cases, then, we can—as Git always does—start at the tip and work backwards and eventually arrive at the root.
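The backwards walk described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the commit names (`mid`, `a1`, `b1`, and so on) are hypothetical stand-ins for hash IDs, and the dictionary is a toy model of the diagram, not how Git stores objects on disk.

```python
# Toy model of the diagram: each commit stores the IDs of its parents.
# The names are hypothetical placeholders for real hash IDs.
PARENTS = {
    "root": [],
    "mid": ["root"],
    "a1": ["mid"],
    "a2": ["a1"],
    "tip1": ["a2"],
    "b1": ["mid"],
    "b2": ["b1"],
    "tip2": ["b2"],
}


def reachable(tip, parents):
    """Return every commit reachable from `tip` by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


# A single-branch clone of tip1 keeps the top row plus the shared middle row.
single_branch = reachable("tip1", PARENTS)
print(sorted(single_branch))
```

Starting from `tip1`, the walk never visits the `tip2` row, yet it still reaches `root` with no gaps, which is the property the text relies on.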

Now, there are two properties of each node that are important here:

The number is unique. It’s a universally unique ID. No node in any other Git repository (that we’ll meet anyway) can re-use that ID.
The data in the node are strictly read-only. That includes the outgoing edge links.
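Both properties follow from the way the ID is derived from the content. As a rough sketch, assuming a SHA-1 repository: Git hashes a short header (`<type> <size>` followed by a NUL byte) together with the object body. The helper name `git_object_id` and the commit text below are made up for illustration; only the hashing scheme is Git's.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL + body, as used in SHA-1 repositories."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(parent: str) -> bytes:
    """Build a hypothetical commit body; tree ID and identity are placeholders."""
    return (
        f"tree {'1' * 40}\n"
        f"parent {parent}\n"
        "author A U Thor <author@example.com> 0 +0000\n"
        "committer A U Thor <author@example.com> 0 +0000\n"
        "\n"
        "example commit\n"
    ).encode()


# Same content always gives the same ID; a different parent gives a different ID.
id_a = git_object_id("commit", commit_body("a" * 40))
id_b = git_object_id("commit", commit_body("b" * 40))
print(id_a != id_b)
```

Because the parent line is part of the hashed body, the outgoing edges cannot be changed without producing a different commit ID, which is why the node can be treated as read-only.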

What this means is that if we have a gap-free repository—one that’s either totally complete, or at least as complete as required for the tip commits it contains—on each side of a sender-to-receiver operation, we can have the sending repository simply enumerate for us some set of commits, by their numbers. If we, the

rafaelcosta

Posted February 12, 2025 at 9:32 am

I'm wondering what the "because when we read it in, we mangle it" part really means… does this mean that there's no way to reference the commit (signaling that it's just a reference and has no actual data) without actually reading the contents of it?

Update: just realized why it wouldn't make sense: `git push` would send only the delta from the previous commit and the previous commit is… non-existent (we only know its ID), so we'd be back at square one (sending everything).

edflsafoiewq

Posted February 12, 2025 at 9:36 am

Do blobless clones suffer from this?

necovek

Posted February 12, 2025 at 10:32 am

I like the fact that none of this was tested, even if described with such authority :)

Anyone try it out yet?

(Not that I don't trust it, but I usually fetch the full history locally anyway)

nopurpose

Posted February 12, 2025 at 10:50 am

`git clone --filter=blob:none` FTW

Timwi

Posted February 12, 2025 at 10:55 am

This seems like a bug to me. Even if the previous commit is “mangled” as they call it, there's no reason why you can't diff against it and only send the diff.

haunter

Posted February 12, 2025 at 11:18 am

Wait is this a bug actually?

jbreckmckye

Posted February 12, 2025 at 11:18 am

Why can't git push, when it encounters a `.git/shallow`, just ask the git server to fill in the remaining history by verifying the parent hashes the client can send?

zeristor

Posted February 12, 2025 at 11:46 am

Should have the (2021) suffix

bradley13

Posted February 12, 2025 at 11:48 am

Ok, I'm a simplistic Git user, but: I always do a full clone. Maybe (probably) I will never need all that history, but…maybe I will. Disk space is cheap.

mg

Posted February 12, 2025 at 11:52 am

Reading this again reminds me how beautifully git uses the file system as a database, with everything laid out nicely in directories and files.

Except for performance, is there any downside to this?

In other words: When you store data in an application that only reads and writes data occasionally, is it a good idea to use the git approach and store it in files?

wvh

Posted February 12, 2025 at 12:18 pm

That's a beautiful answer. Sometimes people explain something you already know, but different parts of your brain light up. This doesn't just explain git once more, but also plants some seeds related to hashed state optimisations in other, future challenges.

kruador

Posted February 12, 2025 at 1:03 pm

It isn't mangled. The commit is there as-is. Instead the repository has a file, ".git/shallow", which tells it not to look for the parents of any commit listed there. If you do a '--depth 1' clone, the file will list the single commit that was retrieved.

This is similar to the 'grafts' feature. Indeed 'git log' says 'grafted'.

You can test this using "git cat-file -p" with the commit that got retrieved, to print the raw object.

> git clone --depth 1 https://github.com/git/git
> git log

commit 388218fac77d0405a5083cd4b4ee20f6694609c3 (grafted, HEAD -> master, origin/master, origin/HEAD)
Author: Junio C Hamano <gitster@pobox.com>
Date:   Mon Feb 10 10:18:17 2025 -0800

    The ninth batch

    Signed-off-by: Junio C Hamano <gitster@pobox.com>

> git cat-file -p 388218fac77d0405a5083cd4b4ee20f6694609c3

tree fc620998515e75437810cb1ba80e9b5173458d1c
parent 50e1821529fd0a096fe03f137eab143b31e8ef55
author Junio C Hamano <gitster@pobox.com> 1739211497 -0800
committer Junio C Hamano <gitster@pobox.com> 1739211512 -0800

The ninth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

I can't reproduce the problem pushing to Bitbucket, using the most recent Git for Windows (2.47.1.windows.2). It only sent 3 objects (which would be the blob of the new file, the tree object containing the new file, and the commit object describing the tree), not the 6000+ in the repository I tested it on.

It may be that there was a bug that has now been fixed. Or it may be something that only happens/happened with GitHub (i.e. a bug at the receiving end, not the sending one!)

I note that the Stack Overflow user who wrote the answer left a comment underneath saying

"worth noting: I haven't tested this; it's just some simple applied math. One clone-and-push will tell you if I was right. :-)"
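The ".git/shallow" behaviour kruador describes can be modelled with a small Python sketch. It is a simplification under stated assumptions: the three-commit history `A <- B <- C` and the function name `walk` are hypothetical, and real Git reads the boundary list from the `.git/shallow` file rather than from a Python set.

```python
# Hypothetical linear history A <- B <- C; each commit records its parents.
# The recorded parents are never edited, matching "the commit is there as-is".
PARENTS = {"A": [], "B": ["A"], "C": ["B"]}


def walk(tip, parents, shallow=frozenset()):
    """Collect commits reachable from `tip`, stopping at commits in `shallow`."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        if commit in shallow:
            # Boundary commit: the parent is still named in the object,
            # but the walker is told not to look for it.
            continue
        stack.extend(parents[commit])
    return seen


depth_1 = walk("C", PARENTS, shallow={"C"})  # like --depth 1: boundary is the tip
depth_2 = walk("C", PARENTS, shallow={"B"})  # like --depth 2: boundary is the tip's parent
print(sorted(depth_1), sorted(depth_2))
```

In the depth 1 case the tip itself is the boundary, so the local history contains no commit whose parent is actually present. In the depth 2 case the tip has a real local parent. Whether that difference changes what `git push` sends is exactly the point under dispute in this thread, and the sketch does not settle it.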