Git's objects
Now that you know Git stores every commit as a full tree state, or snapshot, let's look closer at the objects Git stores in the repository.
Git's object storage is a key-value store: the key is the ID of the object and the value is the object itself. The key is an SHA-1 hash computed over the object's content together with some additional information, such as its type and size. There are four types of objects in Git. In addition, there are branches (which are not objects, but are important) and the special HEAD pointer, which refers to the branch or commit currently checked out. The four object types are as follows:
- Files, or blobs as they are also called in the Git context
- Directories, or trees in the Git context
- Commits
- Tags
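How an object's key is derived can be sketched in a few lines of Python. This is a minimal illustration rather than Git's own code: the function name git_object_id is made up for this example, and it assumes the default SHA-1 object format, where the hash covers a short header (the type, a space, the size in bytes, and a NUL byte) followed by the object's content.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 ID for an object of the given type (illustrative helper)."""
    # Header: object type, a space, the content length in bytes, then a NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    # The ID is the SHA-1 of the header followed by the raw content.
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # ID of a blob whose content is the single line "hello world".
    print(git_object_id("blob", b"hello world\n"))
    # → 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the key depends only on the type and the content, two files with identical content always map to the same blob, no matter what they are called.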
We will start by looking at the most recent commit object in the repository we just cloned, keeping in mind that the special HEAD pointer points to the branch currently checked out.
Getting ready
To view the objects in the Git database, we first need a repository to be examined. For this recipe, we will clone an example repository located here:
$ git clone https://github.com/dvaske/data-model.git
$ cd data-model
Now you are ready to look at the objects in the database. We will start with the commit object, then move on to the trees, the files, and finally the branches and tags.
How to do it...
Let's take a closer look at the objects Git stores in the repository.
The commit object
The special HEAD reference always points to the current snapshot/commit, so we can use it as the target when requesting the commit we want to look at:
$ git cat-file -p HEAD
tree 34fa038544bcd9aed660c08320214bafff94150b
parent a90d1906337a6d75f1dc32da647931f932500d83
author Aske Olsson <[email protected]> 1386933960 +0100
committer Aske Olsson <[email protected]> 1386941455 +0100

This is the subject line of the commit message

It should be followed by a blank line then the body, which is this text. Here you can have multiple paragraphs etc. and explain your commit. It's like an email with subject and body, so get people's attention in the subject
The cat-file command with the -p option pretty prints the object given on the command line; in this case, HEAD, which points to master, which in turn points to the most recent commit on the branch.
We can now see the commit object, consisting of the root tree (tree), the parent commit object's ID (parent), author and timestamp information (author), committer and timestamp information (committer), and the commit message.
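The layout of the commit object can be made concrete with a small parser. The following is only a sketch: the names Commit and parse_commit are invented for this example, the author and committer identities in the usage note are placeholders, and multi-line headers such as signatures are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    tree: str = ""
    parents: list = field(default_factory=list)
    author: str = ""
    committer: str = ""
    message: str = ""


def parse_commit(text: str) -> Commit:
    """Split pretty-printed commit text into its header fields and message."""
    # The header lines and the message are separated by the first blank line.
    head, _, message = text.partition("\n\n")
    commit = Commit(message=message)
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "tree":
            commit.tree = value
        elif key == "parent":
            # A merge commit lists one parent line per parent.
            commit.parents.append(value)
        elif key == "author":
            commit.author = value
        elif key == "committer":
            commit.committer = value
    return commit
```

Feeding it text shaped like the output shown earlier, for example "tree 34fa038544bcd9aed660c08320214bafff94150b\nparent a90d1906337a6d75f1dc32da647931f932500d83\nauthor A U Thor <author@example.com> 1386933960 +0100\ncommitter A U Thor <author@example.com> 1386941455 +0100\n\nSubject\n", yields a Commit whose tree field holds the root tree ID and whose parents list holds a single parent ID.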
The tree object
To see the tree object, we can run the same command, but with the tree ID (34fa038544bcd9aed660c08320214bafff94150b) as the target:
$ git cat-file -p 34fa038544bcd9aed660c08320214bafff94150b
100644 blob f21dc2804e888fee6014d7e5b1ceee533b222c15    README.md
040000 tree abc267d04fb803760b75be7e665d3d69eeed32f8    a_sub_directory
100644 blob b50f80ac4d0a36780f9c0636f43472962154a11a    another-file.txt
100644 blob 92f046f17079aa82c924a9acf28d623fcb6ca727    cat-me.txt
100644 blob bb2fe940924c65b4a1cefcbdbe88c74d39eb23cd    hello_world.c
We can also specify that we want the tree object from the commit pointed to by HEAD by running git cat-file -p HEAD^{tree}, which gives the same result as the previous command. The special notation HEAD^{tree} means that, starting from the reference given (HEAD), Git recursively dereferences the object until a tree object is found. In this case, that is the root tree of the commit pointed to by the master branch, which in turn is pointed to by HEAD. The generic form of the notation is <rev>^{<type>}; it returns the first object of type <type> found by searching recursively from <rev>.
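The dereferencing just described can be modeled with a toy lookup table. This is a sketch of the idea only: the OBJECTS dictionary, its shortened labels (which are not real hashes), and the peel function are all invented for this illustration.

```python
# Toy object database: label -> (type, label of the object it points to, or None).
OBJECTS = {
    "tag-v1.0": ("tag", "commit-1"),
    "commit-1": ("commit", "tree-root"),
    "tree-root": ("tree", None),
}


def peel(object_id: str, wanted_type: str) -> str:
    """Follow tag -> commit -> tree links until an object of wanted_type is found."""
    current = object_id
    while current is not None:
        obj_type, target = OBJECTS[current]
        if obj_type == wanted_type:
            return current
        # Not the requested type yet: step to the object this one points to.
        current = target
    raise ValueError(f"{object_id} cannot be peeled to a {wanted_type}")


if __name__ == "__main__":
    print(peel("tag-v1.0", "tree"))    # → tree-root
    print(peel("tag-v1.0", "commit"))  # → commit-1
```

Starting from a tag, the lookup passes through the commit before it reaches the tree; starting from a commit, it needs only one step.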
From the tree object, we can see what it contains: file type/permissions, type (tree/blob), ID, and pathname:
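Type/Permissions | Type | ID/SHA-1 | Pathname |
---|---|---|---|
100644 | blob | f21dc2804e888fee6014d7e5b1ceee533b222c15 | README.md |
040000 | tree | abc267d04fb803760b75be7e665d3d69eeed32f8 | a_sub_directory |
100644 | blob | b50f80ac4d0a36780f9c0636f43472962154a11a | another-file.txt |
100644 | blob | 92f046f17079aa82c924a9acf28d623fcb6ca727 | cat-me.txt |
100644 | blob | bb2fe940924c65b4a1cefcbdbe88c74d39eb23cd | hello_world.c |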
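The four columns of a tree listing can also be pulled apart programmatically. The sketch below assumes the pretty-printed form shown by git cat-file -p, with whitespace between the fields; TreeEntry and parse_tree_listing are names made up for this example.

```python
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str       # file type/permissions, for example 100644 or 040000
    type: str       # blob or tree
    object_id: str  # SHA-1 of the entry
    path: str       # name inside the directory


def parse_tree_listing(text: str) -> list:
    """Turn pretty-printed tree output into a list of TreeEntry tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on whitespace at most three times so that a path
        # containing spaces stays in one piece.
        mode, obj_type, object_id, path = line.split(None, 3)
        entries.append(TreeEntry(mode, obj_type, object_id, path))
    return entries
```

With the listing from the example repository, filtering the result on type == "tree" would leave only the a_sub_directory entry, which can then be passed to git cat-file -p to descend one level.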
The blob object
Now, we can investigate the blob (file) object. We can do so using the same command, giving the ID of the blob for the cat-me.txt file as the target:
$ git cat-file -p 92f046f17079aa82c924a9acf28d623fcb6ca727
This is the content of the file: "cat-me.txt."
Not really that exciting, huh?
This is simply the content of the file, which we would also get by running a normal cat cat-me.txt command. So, the objects are tied together, blobs to trees, trees to other trees, and the root tree to the commit object, all by the SHA-1 identifier of the object.
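One consequence of linking objects by hash is that a change in a single file produces a new blob ID, which changes the tree that lists it, which in turn changes the commit. The following sketch demonstrates this with an in-memory dictionary standing in for the object database. It is a simplification under stated assumptions: store, tree_content, and snapshot are hypothetical helpers, the commit body omits the author and committer lines a real commit carries, and the tree is limited to a single regular file.

```python
import hashlib


def store(db: dict, obj_type: str, content: bytes) -> str:
    """Store an object under its content-derived ID and return that ID."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    object_id = hashlib.sha1(header + content).hexdigest()
    db[object_id] = (obj_type, content)
    return object_id


def tree_content(entries: dict) -> bytes:
    """Serialize {name: (mode, hex ID)} as '<mode> <name>', NUL, 20 raw ID bytes."""
    out = b""
    for name in sorted(entries):  # simplified ordering, fine for plain file names
        mode, hex_id = entries[name]
        out += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(hex_id)
    return out


def snapshot(db: dict, file_text: bytes) -> str:
    """Store a blob, a tree listing it, and a commit pointing at that tree."""
    blob_id = store(db, "blob", file_text)
    tree_id = store(db, "tree", tree_content({"cat-me.txt": ("100644", blob_id)}))
    commit_body = f"tree {tree_id}\n\nexample commit\n".encode("ascii")
    return store(db, "commit", commit_body)


if __name__ == "__main__":
    db = {}
    first = snapshot(db, b"version one\n")
    second = snapshot(db, b"version two\n")
    print(first != second)  # → True
```

Calling snapshot again with the first content returns the first commit ID once more and adds nothing new to the dictionary, since every object involved already exists under the same key.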
The branch
A branch is not really like the other Git objects; you can't print it using the cat-file command as we can with the others (if you specify the -p pretty print option, you'll just get the commit object it points to):
$ git cat-file master
usage: git cat-file (-t|-s|-e|-p|<type>|--textconv) <object>
   or: git cat-file (--batch|--batch-check) < <list_of_objects>

<type> can be one of: blob, tree, commit, tag.
...
$ git cat-file -p master
tree 34fa038544bcd9aed660c08320214bafff94150b
parent a90d1906337a6d75f1dc32da647931f932500d83
...
Instead, we can take a look at the branch inside the .git folder, where the whole Git repository is stored. If we open the text file .git/refs/heads/master, we can see the commit ID the master branch points to. We can do this using cat as follows:
$ cat .git/refs/heads/master
34acc370b4d6ae53f051255680feaefaf7f7850d
We can verify that this is the latest commit by running git log -1:
$ git log -1
commit 34acc370b4d6ae53f051255680feaefaf7f7850d
Author: Aske Olsson <[email protected]>
Date:   Fri Dec 13 12:26:00 2013 +0100

    This is the subject line of the commit message
...
We can also see that HEAD is pointing to the active branch by using cat with the .git/HEAD file:
$ cat .git/HEAD
ref: refs/heads/master
A branch is simply a pointer to a commit, identified by its SHA-1 hash.
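The two lookups above, first HEAD and then the branch file, can be chained in a short function. This is a sketch under a stated limitation: resolve_head is a made-up name, and it reads only loose reference files, so a branch that exists solely in a packed references file would raise FileNotFoundError.

```python
from pathlib import Path


def resolve_head(git_dir: str) -> str:
    """Return the commit ID that HEAD currently resolves to."""
    git_path = Path(git_dir)
    value = (git_path / "HEAD").read_text().strip()
    # Follow symbolic references of the form "ref: refs/heads/<branch>".
    while value.startswith("ref: "):
        ref_name = value[len("ref: "):]
        value = (git_path / ref_name).read_text().strip()
    # With a detached HEAD the file holds the commit ID directly,
    # so the loop body never runs and the ID is returned as read.
    return value
```

Run from the top of the example repository, resolve_head(".git") would be expected to return the same ID that cat .git/refs/heads/master printed.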
The tag object
The last object to be analyzed is the tag object. There are three different kinds of tags: a lightweight tag (just a label), an annotated tag, and a signed tag. In the example repository, there are two annotated tags:
$ git tag
v0.1
v1.0
Let's take a closer look at the v1.0 tag:
$ git cat-file -p v1.0
object 34acc370b4d6ae53f051255680feaefaf7f7850d
type commit
tag v1.0
tagger Aske Olsson <[email protected]> 1386941492 +0100

We got the hello world C program merged, let's call that a release 1.0
As you can see, the tag consists of an object, which in this case is the latest commit on the master branch; the object's type (commits, blobs, and trees can all be tagged); the tag name; the tagger and timestamp; and finally a tag message.
How it works...
The git cat-file -p command pretty prints the object given as input. It is not normally part of everyday Git work, but it is quite useful for investigating how the objects are tied together. We can also verify the output of git cat-file by rehashing it with the git hash-object command; for example, if we want to verify the commit object at HEAD (34acc370b4d6ae53f051255680feaefaf7f7850d), we can run the following command:
$ git cat-file -p HEAD | git hash-object -t commit --stdin
34acc370b4d6ae53f051255680feaefaf7f7850d
If the hash printed is the same as that of the commit HEAD points to, the object is intact. You can double-check the commit hash with git log -1.
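The same kind of check can be made directly against a file in the object database. The sketch below is an illustration with clear limits: verify_loose_object is a hypothetical helper, and it handles only loose objects, which Git stores as zlib-compressed files under objects/ named after their own hash. Objects in a freshly cloned repository usually sit in packfiles instead, which this code does not read.

```python
import hashlib
import zlib
from pathlib import Path


def verify_loose_object(git_dir: str, object_id: str) -> bool:
    """Recompute the SHA-1 of a loose object and compare it with its ID."""
    # Loose objects live at objects/<first 2 hex digits>/<remaining 38>.
    path = Path(git_dir) / "objects" / object_id[:2] / object_id[2:]
    # The file holds the zlib-compressed header plus content,
    # that is "<type> <size>", a NUL byte, and then the raw content.
    raw = zlib.decompress(path.read_bytes())
    return hashlib.sha1(raw).hexdigest() == object_id
```

A return value of False means the stored bytes no longer hash to the name they are filed under, which points to a corrupted object.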
There's more...
There are many ways to see the objects in the Git database. The git ls-tree command can easily show the contents of trees and subtrees, and git show can show the Git objects, but in a different way.
See also
- For further information about Git plumbing, see Chapter 11, Git Plumbing and Attributes, almost at the end of this book.