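Introduction to Git
Git: "the stupid content tracker". That is how Linus introduces Git to us.
Git is the version control tool used for Linux kernel development. Unlike common version control tools such as CVS and Subversion, it uses distributed repositories and requires no server-side software, which makes publishing and exchanging source code extremely convenient. Git is fast, which naturally matters for a project as large as the Linux kernel. Its most outstanding feature is its merge tracing capability.
When the kernel development team decided to develop and adopt Git as the kernel's version control system, there were quite a few objections in the open source community. The main complaint was that Git was too obscure and hard to understand, and judging by its internal mechanics that is indeed true. As development progressed, however, everyday Git operations came to be carried out by friendly wrapper commands, which made Git very pleasant to use. Even for managing our own projects, Git is a friendly and powerful tool. Today more and more well-known projects use Git to manage their development, for example Wine and U-Boot. For details see http://www.kernel.org/git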
Git Architecture
                     commit-tree
                      commit obj
                       +----+
                       |    |
                       |    |
                       V    V
                    +-----------+
                    | Object DB |
                    |  Backing  |
                    |   Store   |
                    +-----------+
                       ^
           write-tree  |     |
             tree obj  |     |
                       |     |  read-tree
                       |     |  tree obj
                             V
                    +-----------+
                    |   Index   |
                    |  "cache"  |
                    +-----------+
         update-index  ^
             blob obj  |     |
                       |     |
    checkout-index -u  |     |  checkout-index
             stat      |     |  blob obj
                             V
                    +-----------+
                    |  Working  |
                    | Directory |
                    +-----------+
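git does its work with three major data structures: the current working directory, the index file (index cache), and the git repository. git commit writes the changes recorded in the index file to the repository; git add writes changes from the working directory to the index file; git commit -a writes working-directory changes to both the index file and the repository in one step (only for files git already tracks).
Call the current working directory (1), the index file (2), and the git repository (3). Changes flow through them in the order (1) -> (2) -> (3). git add performs (1) -> (2), git commit performs (2) -> (3), and git commit -a combines the two. In terms of time, (1) holds the newest code, (2) is older, and (3) is older still.
git diff shows the changes from (2) to (1). git diff --cached shows the changes from (3) to (2). git diff HEAD shows the changes from (3) to (1).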
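The three comparisons can be observed directly by putting one file into three different states at once. The following is a minimal sketch, assuming git and a POSIX shell are available; the temporary directory, the file name, the demo identity and the helper `added` are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch: one file whose content differs in (1), (2) and (3).
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "demo"       # placeholder identity, local to this repository
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "v1"             # (3) the repository holds v1

echo "v2" > file.txt
git add file.txt                  # (2) the index holds v2

echo "v3" > file.txt              # (1) the working directory holds v3

# Keep only the added lines of a diff (drop the "+++ b/file.txt" header).
added() { grep '^+' | grep -v '^+++' || true; }

unstaged=$(git diff | added)          # (2) -> (1)
staged=$(git diff --cached | added)   # (3) -> (2)
total=$(git diff HEAD | added)        # (3) -> (1)

echo "git diff          : $unstaged"
echo "git diff --cached : $staged"
echo "git diff HEAD     : $total"
```

The first and third commands both report the working-directory content as the new side, while --cached reports what the next plain git commit would record.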
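Basic Git Usage
Reposted from http://roclinux.cn/?p=371 (the author and article title still need to be credited; also, this passage is about usage and should be moved to another page.)
The defining trait of standalone developers is that they do not need to exchange patches with anyone else and work only in a single, fixed git repository.
The following commands will help you with day-to-day work: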
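git-show-branch: shows the branch you are currently on and its commit history.
git-log: shows the commit log.
git-checkout and git-branch: switch between branches and create them.
git-add: adds changes to the index file.
git-diff and git-status: show the changes the developer has made.
git-commit: commits the current changes to the git repository.
git-reset and git-checkout: undo changes.
git-merge: merges two branches.
git-rebase: maintains topic branches by replaying their commits on top of a new base.
git-tag: creates tags.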
1) To get help, use a command of the form man git-****: man git-commit for help on commit, man git-pull for help on pull, man git-merge for help on merge, and so on.
2) Before using git, everyone should provide some basic personal information so that git can tell different committers apart.
- git config --global user.name "your name"
- git config --global user.email yourname@example.com
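3) To start a new project, first create a directory, for example myproject, and do all development for the project inside it.
- cd myproject
- git init
- git add .
- git commit //this step opens an editor and asks the committer for a message describing this commit
At this point a new project has been born, along with its first log entry.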
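4) Once you have improved the source code and feel it is time to record your progress again, commit your work.
- git commit -a //a shortcut that stages every modified or deleted tracked file and then commits, roughly git add -u; git commit. New, untracked files still need an explicit git add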
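5) To check what you have changed in the source so far (relative to the start of this round of work):
- git diff //shows changes that have not been added yet; once everything has been added with git add, its output is empty
- git diff --cached //shows changes that have been added with git add but not yet committed
- git status //lists which files have changed since the last commit
6) To view the entire development log since the project began:
- git log
- git log -p //prints a very detailed log, including the source changes made in each commit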
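7) Start an experimental branch (experimental); merge it into the main branch (master) if the experiment succeeds, otherwise abandon it.
- git branch experimental //create an experimental branch named experimental
- git branch //list all branches; the current one is marked with *
- git checkout experimental //switch to the experimental branch
(hours of development on this branch omitted) ... If the experiment succeeds:
- git commit -a //after improving the code on experimental, commit it on that branch
- git checkout master //switch back to master
- git merge experimental //the experiment worked, so merge experimental into the main branch
- git commit -a //needed only if the merge stopped on conflicts: resolve them, then commit to conclude the merge; a clean merge commits by itself
- git branch -d experimental //experimental has been merged, so the branch can be safely deleted
If the experiment fails:
- git checkout master
- git branch -D experimental //the branch turned out to be a failure, so use -D to abandon and delete it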
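The difference between -d and -D can be tried out safely in a scratch repository. This is a minimal sketch under stated assumptions: git and a POSIX shell are available, and the directory, file name, identity and the branch name failed are made up for the demonstration. The name of the initial branch is read from git rather than assumed to be master.

```shell
#!/bin/sh
# Sketch: one experiment that is merged, one that is thrown away.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "demo"            # placeholder identity
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"
trunk=$(git symbolic-ref --short HEAD) # "master" or the local default name

# Successful experiment: merge it, then delete it with -d.
git branch experimental
git checkout -q experimental
echo "feature" >> app.txt
git commit -q -a -m "add feature"
git checkout -q "$trunk"
git merge -q experimental              # clean merge, no extra commit needed
git branch -q -d experimental

# Failed experiment: it was never merged, so -d refuses and -D is required.
git branch failed
git checkout -q failed
echo "bad idea" >> app.txt
git commit -q -a -m "dead end"
git checkout -q "$trunk"
if git branch -d failed 2>/dev/null; then
    refused=no
else
    refused=yes
    git branch -q -D failed
fi

remaining=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "-d refused the unmerged branch: $refused"
echo "branches left: $remaining"
```

Because -d only deletes branches whose commits are already reachable from somewhere else, it is the safer default; -D is the explicit way to discard work.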
8) View a graphical display of the branches at any time.
- gitk
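9) When my collaborator bob wants to improve my (rocrocket's) work:
- bob$ git clone /home/rocrocket/project myrepo //clones my work into bob's myrepo directory. Note that this command may be rejected because of the permissions on /home/rocrocket; the fix is chmod o+rx /home/rocrocket.
(hours of bob's development omitted) ...
- bob$ git commit -a //bob commits his improvements to his own git repository and tells me (rocrocket) in person that he is done.
If I trust bob's work completely:
- cd /home/rocrocket/project
- git pull /home/bob/myrepo //pull fetches (git-fetch) the changed code from the remote git repository and then merges (git-merge) it into my (rocrocket's) project. A tip worth remembering: "git pull . <branch>" does the same job as "git merge <branch>", because "." names the current repository as the remote. Note that git-pull may be rejected because of the permissions on /home/bob; the fix is chmod o+rx /home/bob.
If I do not fully trust bob's work:
- cd /home/rocrocket/project
- git fetch /home/bob/myrepo master:bobworks //fetches bob's changes and stores them in a branch named bobworks in my (rocrocket's) repository. They go into a separate branch rather than master so that I can inspect bob's work carefully first: if I am satisfied I merge it into master, and if not I can simply remove it with git branch -D.
- git whatchanged -p master..bobworks //shows what bob did (git log -p master..bobworks gives similar output)
- git checkout master //switch to the master branch
- git pull . bobworks //if I am satisfied with bob's work, this merges the bobworks branch into my project
- git branch -D bobworks //if I am not satisfied with bob's work, -D abandons the branch
A few days later, if bob wants to keep helping, he first needs to pick up the work I have done in the meantime. Running git pull in the myrepo directory he originally cloned is enough:
- git pull //no arguments are needed: at clone time git recorded the location of my (rocrocket's) repository, and it fetches from there directly.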
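The cautious variant of this exchange (fetch into a review branch, inspect, then merge) can be rehearsed with two repositories on one machine. This is a minimal sketch, assuming git and a POSIX shell; the directories mine and bob stand in for /home/rocrocket/project and bob's myrepo, and the identities and file contents are placeholders. It uses git log and git merge where the text uses git whatchanged and git pull .

```shell
#!/bin/sh
# Sketch: review a collaborator's work in a side branch before merging it.
set -e
work=$(mktemp -d)

git init -q "$work/mine"               # plays the role of my project
cd "$work/mine"
git config user.name "me"
git config user.email "me@example.com"
echo "hello" > README
git add README
git commit -q -m "initial commit"
trunk=$(git symbolic-ref --short HEAD) # "master" or the local default name

git clone -q "$work/mine" "$work/bob"  # bob clones my work
cd "$work/bob"
git config user.name "bob"
git config user.email "bob@example.com"
echo "bob's improvement" >> README
git commit -q -a -m "improve README"

cd "$work/mine"
git fetch -q "$work/bob" "$trunk:bobworks"     # bob's branch -> local bobworks
review=$(git log --format=%s "$trunk..bobworks")
echo "commits to review: $review"
git merge -q bobworks                  # satisfied, so merge
git branch -q -d bobworks

echo "more work" >> README             # I keep working afterwards
git commit -q -a -m "more work"

cd "$work/bob"
git pull -q                            # bob catches up; the origin was recorded by clone
bob_last=$(tail -n 1 README)
echo "bob now sees: $bob_last"
```

Fetching into a named branch keeps the decision reversible: nothing on the main branch changes until the explicit merge.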
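Data Structures
Git has five core data structures: object, blob, tree, commit, and cache_entry. Of these:
- object: the base "class" that the other object types embed.
- blob: corresponds to the contents of a file.
- tree: corresponds to a directory. A tree contains blobs and other trees.
- commit: corresponds to a version. A commit object points to one tree object, which represents the root directory of that version. A commit object points to one parent commit, meaning it is the next version after that parent, or to several parent commits, meaning it was produced by merging them. The very first commit has no parent.
- cache_entry: corresponds to one file recorded in the index.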
Example
Run the following commands to create a test repository:
$ git init
$ echo 'Hi,rocrocket' > file.txt
$ git add .
$ git commit -a -m "initial commit"
Then type `git log` to get
commit 241e0a4d2a5644f92737b7fba8b9eb19dcb0c345
Author: rocrocket <wupengchong@gmail.com>
Date:   Fri Sep 26 10:57:13 2008 +0800

    initial commit
What is the long string of 40 hexadecimal digits after the word commit?
These 40 hexadecimal digits are the name that identifies a commit. They are a SHA-1 (Secure Hash Algorithm 1) hash, which makes the name of every commit unique for all practical purposes and valid forever.
- $ git cat-file -t 241e //the -t option of cat-file prints the type of the object with the given ID; 241e abbreviates the SHA-1 produced by the commit above
commit //the object with this ID is of type commit
Then run the cat-file command:
$ git cat-file commit 241e //commit here says that the object being queried is of type commit; the object's ID follows
tree 9a327d5e3aa818b98ddaa7b5b369f5deb47dc9f6
author rocrocket <wupengchong@gmail.com> 1222397833 +0800
committer rocrocket <wupengchong@gmail.com> 1222397833 +0800
- $ git ls-tree 9a327
100644 blob 7d4e0fa616551318405e8309817bcfecb7224cff file.txt
We can see that tree 9a327 contains one file, file.txt, whose ID is 7d4e0f.
- $ git cat-file -t 7d4e0f
blob
- $ git cat-file blob 7d4e0f
Hi,rocrocket
So the object 7d4e0f is of type blob, and its content is "Hi,rocrocket".
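The same walk from commit to tree to blob can be scripted so that no IDs have to be copied by hand. This is a minimal sketch, assuming git and a POSIX shell; the temporary directory and demo identity are placeholders, and the IDs it prints will differ from the ones above because a commit ID depends on the author and timestamp.

```shell
#!/bin/sh
# Sketch: follow commit -> tree -> blob with cat-file and ls-tree.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "demo"            # placeholder identity
git config user.email "demo@example.com"
echo 'Hi,rocrocket' > file.txt
git add .
git commit -q -m "initial commit"

commit_id=$(git rev-parse HEAD)
commit_type=$(git cat-file -t "$commit_id")

# The "tree" header line of the commit names the root tree.
tree_id=$(git cat-file commit "$commit_id" | sed -n 's/^tree //p')
tree_type=$(git cat-file -t "$tree_id")

# ls-tree prints "<mode> <type> <id><TAB><name>"; the third field is the ID.
blob_id=$(git ls-tree "$tree_id" | awk '{print $3}')
blob_type=$(git cat-file -t "$blob_id")
blob_content=$(git cat-file blob "$blob_id")

echo "$commit_type $commit_id"
echo "$tree_type $tree_id"
echo "$blob_type $blob_id -> $blob_content"
```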
object
    #ifndef OBJECT_H
    #define OBJECT_H

    struct object_list {
        struct object *item;
        struct object_list *next;
    };

    struct object {
        unsigned parsed : 1;
        unsigned used : 1;
        unsigned int flags;
        unsigned char sha1[20];
        const char *type;
        struct object_list *refs;
    };

    int nr_objs;
    struct object **objs;

    struct object *lookup_object(unsigned char *sha1);

    void created_object(unsigned char *sha1, struct object *obj);

    /** Returns the object, having parsed it to find out what it is. **/
    struct object *parse_object(unsigned char *sha1);

    void add_ref(struct object *refer, struct object *target);

    void mark_reachable(struct object *obj, unsigned int mask);

    #endif /* OBJECT_H */
blob
    #ifndef BLOB_H
    #define BLOB_H

    #include "object.h"

    extern const char *blob_type;

    struct blob {
        struct object object;
    };

    struct blob *lookup_blob(unsigned char *sha1);

    int parse_blob(struct blob *item);

    #endif /* BLOB_H */
tree
    #ifndef TREE_H
    #define TREE_H

    #include "object.h"

    extern const char *tree_type;

    struct tree_entry_list {
        struct tree_entry_list *next;
        unsigned directory : 1;
        unsigned executable : 1;
        char *name;
        union {
            struct tree *tree;
            struct blob *blob;
        } item;
    };

    struct tree {
        struct object object;
        unsigned has_full_path : 1;
        struct tree_entry_list *entries;
    };

    struct tree *lookup_tree(unsigned char *sha1);

    int parse_tree(struct tree *tree);

    #endif /* TREE_H */
commit
    #ifndef COMMIT_H
    #define COMMIT_H

    #include "object.h"
    #include "tree.h"

    struct commit_list {
        struct commit *item;
        struct commit_list *next;
    };

    struct commit {
        struct object object;
        unsigned long date;
        struct commit_list *parents;
        struct tree *tree;
    };

    extern const char *commit_type;

    struct commit *lookup_commit(unsigned char *sha1);

    int parse_commit(struct commit *item);

    void commit_list_insert(struct commit *item, struct commit_list **list_p);

    void free_commit_list(struct commit_list *list);

    void sort_by_date(struct commit_list **list);

    /** Removes the first commit from a list sorted by date, and adds all
     * of its parents.
     **/
    struct commit *pop_most_recent_commit(struct commit_list **list,
                                          unsigned int mark);

    #endif /* COMMIT_H */
cache_entry
    struct cache_entry {
        struct cache_time ce_ctime;
        struct cache_time ce_mtime;
        unsigned int ce_dev;
        unsigned int ce_ino;
        unsigned int ce_mode;
        unsigned int ce_uid;
        unsigned int ce_gid;
        unsigned int ce_size;
        unsigned char sha1[20];   // SHA-1 of the corresponding object in the repository
        unsigned short ce_flags;
        char name[0];             // path of the corresponding file in the working tree
    };
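Important Functions
Because the index cache is crucial to understanding how git works, this part mainly analyzes the functions related to it.
Global variables in the cache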
    const char *sha1_file_directory = NULL;
    struct cache_entry **active_cache = NULL;
    unsigned int active_nr = 0, active_alloc = 0;
void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size)
    void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
    {
        char *filename = sha1_file_name(sha1);
        int fd = open(filename, O_RDONLY);
        struct stat st;
        void *map;

        if (fd < 0) {
            perror(filename);
            return NULL;
        }
        if (fstat(fd, &st) < 0) {
            close(fd);
            return NULL;
        }
        map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);
        if (-1 == (int)(long)map)
            return NULL;
        *size = st.st_size;
        return map;
    }

    void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size)
    {
        int ret, bytes;
        z_stream stream;
        char buffer[8192];
        char *buf;

        /* Get the data stream */
        memset(&stream, 0, sizeof(stream));
        stream.next_in = map;
        stream.avail_in = mapsize;
        stream.next_out = buffer;
        stream.avail_out = sizeof(buffer);

        inflateInit(&stream);
        ret = inflate(&stream, 0);
        if (sscanf(buffer, "%10s %lu", type, size) != 2)
            return NULL;

        bytes = strlen(buffer) + 1;
        buf = malloc(*size);
        if (!buf)
            return NULL;

        memcpy(buf, buffer + bytes, stream.total_out - bytes);
        bytes = stream.total_out - bytes;
        if (bytes < *size && ret == Z_OK) {
            stream.next_out = buf + bytes;
            stream.avail_out = *size - bytes;
            while (inflate(&stream, Z_FINISH) == Z_OK)
                /* nothing */;
        }
        inflateEnd(&stream);
        return buf;
    }

    void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size)
    {
        unsigned long mapsize;
        void *map, *buf;

        map = map_sha1_file(sha1, &mapsize);
        if (map) {
            buf = unpack_sha1_file(map, mapsize, type, size);
            munmap(map, mapsize);
            return buf;
        }
        return NULL;
    }
int write_sha1_file(char *buf, unsigned len, unsigned char *returnsha1)
    int write_sha1_file(char *buf, unsigned len, unsigned char *returnsha1)
    {
        int size;
        char *compressed;
        z_stream stream;
        unsigned char sha1[20];
        SHA_CTX c;

        /* Set it up */
        memset(&stream, 0, sizeof(stream));
        deflateInit(&stream, Z_BEST_COMPRESSION);
        size = deflateBound(&stream, len);
        compressed = malloc(size);

        /* Compress it */
        stream.next_in = buf;
        stream.avail_in = len;
        stream.next_out = compressed;
        stream.avail_out = size;
        while (deflate(&stream, Z_FINISH) == Z_OK)
            /* nothing */;
        deflateEnd(&stream);
        size = stream.total_out;

        /* Sha1.. */
        SHA1_Init(&c);
        SHA1_Update(&c, compressed, size);
        SHA1_Final(sha1, &c);

        if (write_sha1_buffer(sha1, compressed, size) < 0)
            return -1;
        if (returnsha1)
            memcpy(returnsha1, sha1, 20);
        return 0;
    }

    int write_sha1_buffer(const unsigned char *sha1, void *buf, unsigned int size)
    {
        char *filename = sha1_file_name(sha1);
        int fd;

        fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
        if (fd < 0)
            return (errno == EEXIST) ? 0 : -1;
        write(fd, buf, size);
        close(fd);
        return 0;
    }
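The effect of these two functions, a file named after the object's SHA-1 that is written at most once, can be observed from the command line. This is a minimal sketch, assuming a current git and a POSIX shell; the temporary directory is a placeholder. Two caveats: the path layout below (first two hex digits as a directory) is what current git uses for loose objects, while sha1_file_name itself is not shown in the excerpt; and as far as we know current git hashes a type and size header plus the uncompressed content, whereas the excerpt above hashes the compressed bytes.

```shell
#!/bin/sh
# Sketch: where a freshly written object ends up on disk.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# -w stores the object; without it hash-object only prints the ID.
id=$(echo 'Hi,rocrocket' | git hash-object -w --stdin)

dir=$(printf '%s' "$id" | cut -c1-2)    # first two hex digits
rest=$(printf '%s' "$id" | cut -c3-)    # remaining digits
path=".git/objects/$dir/$rest"
if [ -f "$path" ]; then stored=yes; else stored=no; fi

# Writing identical content again yields the same ID and no second file,
# which mirrors the EEXIST branch of write_sha1_buffer.
id2=$(echo 'Hi,rocrocket' | git hash-object -w --stdin)

echo "object $id stored at $path: $stored"
echo "same ID on second write: $id2"
```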
int add_cache_entry(struct cache_entry *ce, int ok_to_add)
    int cache_name_pos(const char *name, int namelen)
    {
        int first, last;

        first = 0;
        last = active_nr;
        while (last > first) {
            int next = (last + first) >> 1;
            struct cache_entry *ce = active_cache[next];
            int cmp = cache_name_compare(name, namelen, ce->name, ce->namelen);
            if (!cmp)
                return next;
            if (cmp < 0) {
                last = next;
                continue;
            }
            first = next+1;
        }
        return -first-1;
    }

    int add_cache_entry(struct cache_entry *ce, int ok_to_add)
    {
        int pos;

        pos = cache_name_pos(ce->name, ce->namelen);

        /* existing match? Just replace it */
        if (pos >= 0) {
            active_cache[pos] = ce;
            return 0;
        }
        pos = -pos-1;

        if (!ok_to_add)
            return -1;

        /* Make sure the array is big enough .. */
        if (active_nr == active_alloc) {
            active_alloc = alloc_nr(active_alloc);
            active_cache = realloc(active_cache, active_alloc * sizeof(struct cache_entry *));
        }

        /* Add it in.. */
        active_nr++;
        if (active_nr > pos)
            memmove(active_cache + pos + 1, active_cache + pos, (active_nr - pos - 1) * sizeof(ce));
        active_cache[pos] = ce;
        return 0;
    }
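The return convention of cache_name_pos is the part that is easy to misread: a non-negative result is the index of an existing entry, while a negative result encodes where the name would have to be inserted, as -(position)-1. The following is a minimal sketch of the same binary search in POSIX shell, not git's code; the function name name_pos is hypothetical, the sorted names are passed as arguments instead of living in active_cache, and expr under the C locale stands in for cache_name_compare.

```shell
#!/bin/sh
# Sketch: binary search with the "-first-1" convention of cache_name_pos.
LC_ALL=C
export LC_ALL                     # byte-wise string comparison for expr

# name_pos NAME SORTED_NAME...
# Prints the 0-based index of NAME, or -(insertion point)-1 if it is absent.
name_pos() {
    name=$1
    shift
    first=0
    last=$#
    while [ "$last" -gt "$first" ]; do
        next=$(( (first + last) / 2 ))
        eval "entry=\${$((next + 1))}"          # positional parameters are 1-based
        if [ "$name" = "$entry" ]; then
            echo "$next"
            return 0
        fi
        # The X prefix keeps expr from treating the operands as numbers.
        if expr "X$name" \< "X$entry" >/dev/null; then
            last=$next
        else
            first=$((next + 1))
        fi
    done
    echo $(( 0 - first - 1 ))
}

found=$(name_pos b.c a.c b.c d.c)               # existing entry
missing=$(name_pos c.c a.c b.c d.c)             # absent entry
insert_at=$(( 0 - missing - 1 ))                # decode, as add_cache_entry does

echo "b.c is at index $found"
echo "c.c is missing (result $missing), insert at index $insert_at"
```

Encoding the insertion point in the negative result lets add_cache_entry reuse one search for both the replace and the insert case.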
Workflow
1) working directory -> index
You update the index with information from the working directory with the git-update-index command. You generally update the index information by just specifying the filename you want to update, like so:

    git-update-index filename
2) index -> object database
You write your current index file to a "tree" object with the program

    git-write-tree

that doesn't come with any options: it will just write out the current index into the set of tree objects that describe that state, and it will return the name of the resulting top-level tree. You can use that tree to re-generate the index at any time by going in the other direction.
3) object database -> index
You read a "tree" file from the object database, and use that to populate (and overwrite: don't do this if your index contains any unsaved state that you might want to restore later!) your current index. Normal operation is just

    git-read-tree <SHA-1 of tree>

and your index file will now be equivalent to the tree that you saved earlier. However, that is only your 'index' file: your working directory contents have not been modified.
4) index -> working directory
You update your working directory from the index by "checking out" files. This is not a very common operation, since normally you'd just keep your files updated, and rather than write to your working directory, you'd tell the index files about the changes in your working directory (i.e. `git-update-index`).
However, if you decide to jump to a new version, or check out somebody else's version, or just restore a previous tree, you'd populate your index file with read-tree, and then you need to check out the result with

    git-checkout-index filename

or, if you want to check out all of the index, use `-a`.
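Steps 1 to 4 can be chained into one round trip: working directory to index, index to tree, tree back to index, index back to working directory. This is a minimal sketch, assuming a current git (which spells the commands git update-index, git write-tree, git read-tree and git checkout-index) and a POSIX shell; the directory and file name are placeholders.

```shell
#!/bin/sh
# Sketch: save a state as a tree, change the file, then restore the state.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "saved state" > notes.txt
git update-index --add notes.txt     # 1) working directory -> index (--add: new entry)
saved_tree=$(git write-tree)         # 2) index -> object database

echo "later edit" > notes.txt
git update-index notes.txt           # the index now describes the later edit

git read-tree "$saved_tree"          # 3) object database -> index
after_read_tree=$(cat notes.txt)     # the file on disk is still untouched

git checkout-index -f notes.txt      # 4) index -> working directory (-f: overwrite)
after_checkout=$(cat notes.txt)

echo "after read-tree      : $after_read_tree"
echo "after checkout-index : $after_checkout"
```

The two printed lines make the point of step 3 visible: read-tree alone changes only the index, and the working directory follows only after checkout-index.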
5) Tying it all together
To commit a tree you have instantiated with "git-write-tree", you'd create a "commit" object that refers to that tree and the history behind it, most notably the "parent" commits that preceded it in history.
You create a commit object by giving it the tree that describes the state at the time of the commit, and a list of parents:

    git-commit-tree <tree> -p <parent> [-p <parent2> ..]

and then giving the reason for the commit on stdin.
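Building two commits by hand shows how a commit object ties a tree to its history. This is a minimal sketch, assuming a current git and a POSIX shell; the directory, file name, identity and messages are placeholders. commit-tree only creates the object, so the sketch finishes with git update-ref to make the current branch point at it.

```shell
#!/bin/sh
# Sketch: create commits with write-tree and commit-tree only.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "demo"              # placeholder identity
git config user.email "demo@example.com"

echo "one" > a.txt
git update-index --add a.txt
tree1=$(git write-tree)
commit1=$(echo "first commit" | git commit-tree "$tree1")   # root commit: no -p

echo "two" > a.txt
git update-index a.txt
tree2=$(git write-tree)
commit2=$(echo "second commit" | git commit-tree "$tree2" -p "$commit1")

parent_of_2=$(git rev-parse "$commit2^")        # first parent of commit2
tree_of_2=$(git rev-parse "$commit2^{tree}")    # tree that commit2 refers to

git update-ref HEAD "$commit2"           # let the current branch point at it
git log --format='%h %s'
```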
6) Examining the data
You can examine the data represented in the object database and the index with various helper tools. For every object, you can use git-cat-file to examine details about the object:

    git-cat-file -t <objectname>

shows the type of the object, and once you have the type (which is usually implicit in where you find the object), you can use

    git-cat-file blob|tree|commit|tag <objectname>

to show its contents.
7) Merging multiple trees
To get the "base" for the merge, you first look up the common parent of two commits with

    git-merge-base <commit1> <commit2>

which will return you the commit they are both based on. To do the merge, do

    git-read-tree -m -u <origtree> <yourtree> <targettree>

which will do all trivial merge operations for you directly in the index file, and you can just write the result out with git-write-tree.
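A merge in which the two sides touch different files is the simplest case that read-tree can resolve entirely on its own. This is a minimal sketch, assuming a current git and a POSIX shell; the directory, file names, identity and the branch name theirs are placeholders.

```shell
#!/bin/sh
# Sketch: a trivial three-way merge done with plumbing commands.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "demo"              # placeholder identity
git config user.email "demo@example.com"

echo "a0" > a.txt
echo "b0" > b.txt
git add a.txt b.txt
git commit -q -m "base"
trunk=$(git symbolic-ref --short HEAD)

git checkout -q -b theirs                # the other side changes b.txt
echo "b1" > b.txt
git commit -q -a -m "their change to b.txt"

git checkout -q "$trunk"                 # our side changes a.txt
echo "a1" > a.txt
git commit -q -a -m "my change to a.txt"

head=$(git rev-parse HEAD)
other=$(git rev-parse theirs)
base=$(git merge-base "$head" "$other")  # common ancestor: the "base" commit

git read-tree -m -u "$base" "$head" "$other"    # merge in the index, update files
merged_tree=$(git write-tree)
merge_commit=$(echo "Merge theirs" |
    git commit-tree "$merged_tree" -p "$head" -p "$other")
git update-ref HEAD "$merge_commit"

merged_a=$(git cat-file blob "$merged_tree:a.txt")
merged_b=$(git cat-file blob "$merged_tree:b.txt")
echo "a.txt=$merged_a b.txt=$merged_b"
```

When both sides change the same file, read-tree leaves the entry unmerged and write-tree fails; that is the situation the per-file merge script in the next section handles.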
Script Analysis
git-merge-one-file-script
    #!/bin/sh
    #
    # This is the git merge script, called with
    #
    #   $1 - original file SHA1 (or empty)
    #   $2 - file in branch1 SHA1 (or empty)
    #   $3 - file in branch2 SHA1 (or empty)
    #   $4 - pathname in repository
    #
    #
    # Handle some trivial cases.. The _really_ trivial cases have
    # been handled already by read-tree, but that one doesn't
    # do any merges that might change the tree layout
    #

    # if the directory is newly added in a branch, it might not exist
    # in the current tree
    dir=$(dirname "$4")
    mkdir -p "$dir"

    case "${1:-.}${2:-.}${3:-.}" in
    #
    # deleted in both
    #
    "$1..")
        echo "ERROR: $4 is removed in both branches"
        echo "ERROR: This is a potential rename conflict"
        exit 1;;
    #
    # deleted in one and unchanged in the other
    #
    "$1.." | "$1.$1" | "$1$1.")
        rm -f -- "$4"
        echo "Removing $4"
        git-update-cache --remove -- "$4"
        exit 0
        ;;

    #
    # added in one
    #
    ".$2." | "..$3" )
        echo "Adding $4 with perm $6$7"
        mv $(unpack-file "$2$3") $4
        chmod "$6$7" $4
        git-update-cache --add -- $4
        exit 0
        ;;
    #
    # Added in both (check for same permissions)
    #
    ".$2$2")
        if [ "$6" != "$7" ]; then
            echo "ERROR: File $4 added in both branches, permissions conflict $6->$7"
            exit 1
        fi
        echo "Adding $4 with perm $6"
        mv $(unpack-file "$2") $4
        chmod "$6" $4
        git-update-cache --add -- $4
        exit 0;;
    #
    # Modified in both, but differently ;(
    #
    "$1$2$3")
        echo "Auto-merging $4"
        orig=$(git-unpack-file $1)
        src1=$(git-unpack-file $2)
        src2=$(git-unpack-file $3)
        merge "$src2" "$orig" "$src1"
        ret=$?
        if [ "$6" != "$7" ]; then
            echo "ERROR: Permissions $5->$6->$7 don't match merging $src2"
            if [ $ret -ne 0 ]; then
                echo "ERROR: Leaving conflict merge in $src2"
            fi
            exit 1
        fi
        chmod -- "$6" "$src2"
        if [ $ret -ne 0 ]; then
            echo "ERROR: Leaving conflict merge in $src2"
            exit 1
        fi
        cp -- "$src2" "$4" && chmod -- "$6" "$4" && git-update-cache --add -- "$4" && exit 0
        ;;

    *)
        echo "Not handling case $1 -> $2 -> $3"
        ;;
    esac
    exit 1
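The heart of this script is the case pattern: each of the three SHA1 arguments is replaced by a dot when it is empty, and the resulting string is matched against the possible combinations. The following is a minimal sketch of just that dispatch, with no git involved; the function name classify and the short IDs aaa, bbb and ccc are made up for the illustration, and the real script acts on each case instead of printing a label.

```shell
#!/bin/sh
# Sketch: the case dispatch of the merge script, reduced to labels.

# classify ORIG BRANCH1 BRANCH2
# Each argument is an object ID, or empty if the file does not exist there.
classify() {
    case "${1:-.}${2:-.}${3:-.}" in
    "$1..")
        echo "deleted in both" ;;
    "$1.$1" | "$1$1.")
        echo "deleted in one, unchanged in the other" ;;
    ".$2." | "..$3")
        echo "added in one" ;;
    ".$2$2")
        echo "added identically in both" ;;
    "$1$2$3")
        echo "present in all three, needs a content merge" ;;
    *)
        echo "not handled" ;;
    esac
}

classify aaa ""  ""      # file existed only in the original
classify aaa aaa ""      # branch2 deleted it, branch1 left it alone
classify ""  bbb ""      # only branch1 has it
classify ""  bbb bbb     # both branches added the same content
classify aaa bbb ccc     # both branches modified it
classify aaa ""  ccc     # deleted in one, modified in the other
```

The last call falls through to the default branch, which matches the original script: a file deleted on one side and modified on the other is left for a human to resolve.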
git-pull-script
    #!/bin/sh
    #
    # use "$1" or something in a real script, this
    # just hard-codes it.
    #
    merge_repo=$1

    rm -f .git/MERGE_HEAD .git/ORIG_HEAD
    cp .git/HEAD .git/ORIG_HEAD

    echo "Getting object database"
    rsync -avz --ignore-existing $merge_repo/objects/. ${SHA1_FILE_DIRECTORY:-.git/objects}/.

    echo "Getting remote head"
    rsync -L $merge_repo/HEAD .git/MERGE_HEAD || exit 1

    head=$(cat .git/HEAD)
    merge_head=$(cat .git/MERGE_HEAD)
    common=$(git-merge-base $head $merge_head)
    if [ -z "$common" ]; then
        echo "Unable to find common commit between" $merge_head $head
        exit 1
    fi

    # Get the trees associated with those commits
    common_tree=$(git-cat-file commit $common | sed 's/tree //;q')
    head_tree=$(git-cat-file commit $head | sed 's/tree //;q')
    merge_tree=$(git-cat-file commit $merge_head | sed 's/tree //;q')

    if [ "$common" == "$merge_head" ]; then
        echo "Already up-to-date. Yeeah!"
        exit 0
    fi
    if [ "$common" == "$head" ]; then
        echo "Updating from $head to $merge_head."
        echo "Destroying all noncommitted data!"
        echo "Kill me within 3 seconds.."
        sleep 3
        git-read-tree -m $merge_tree && git-checkout-cache -f -a && git-update-cache --refresh
        echo $merge_head > .git/HEAD
        exit 0
    fi
    echo "Trying to merge $merge_head into $head"
    git-read-tree -m $common_tree $head_tree $merge_tree
    merge_msg="Merge of $merge_repo"
    result_tree=$(git-write-tree 2> /dev/null)
    if [ $? -ne 0 ]; then
        echo "Simple merge failed, trying Automatic merge"
        git-merge-cache git-merge-one-file-script -a
        merge_msg="Automatic merge of $merge_repo"
        result_tree=$(git-write-tree) || exit 1
    fi
    result_commit=$(echo "$merge_msg" | git-commit-tree $result_tree -p $head -p $merge_head)
    echo "Committed merge $result_commit"
    echo $result_commit > .git/HEAD
    git-checkout-cache -f -a && git-update-cache --refresh
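The script decides between "already up-to-date", a fast-forward update and a real merge purely by comparing the merge base with the two heads. The following is a minimal sketch of that decision with a current git, assuming a POSIX shell; the function name merge_kind, the directory, the file names, the identity and the branch name topic are placeholders. It compares with `=` rather than the `==` used in the script above, since `==` inside `[ ]` is not accepted by every /bin/sh.

```shell
#!/bin/sh
# Sketch: classify a pull by comparing the merge base with both heads.
set -e

# merge_kind HEAD_COMMIT OTHER_COMMIT (run inside a repository)
merge_kind() {
    common=$(git merge-base "$1" "$2") || { echo "unrelated"; return 0; }
    if [ "$common" = "$(git rev-parse "$2")" ]; then
        echo "up-to-date"        # the other side has nothing new
    elif [ "$common" = "$(git rev-parse "$1")" ]; then
        echo "fast-forward"      # we have nothing new: just move our head
    else
        echo "true-merge"        # both sides have commits of their own
    fi
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "demo"              # placeholder identity
git config user.email "demo@example.com"
echo 1 > f
git add f
git commit -q -m "base"
trunk=$(git symbolic-ref --short HEAD)

git checkout -q -b topic
echo 2 > f
git commit -q -a -m "topic work"
git checkout -q "$trunk"

k1=$(merge_kind "$trunk" topic)          # trunk is behind topic
k2=$(merge_kind topic "$trunk")          # topic already contains trunk

echo 3 > g
git add g
git commit -q -m "trunk work"
k3=$(merge_kind "$trunk" topic)          # histories have diverged

echo "$k1 / $k2 / $k3"
```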