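Getting Started

This chapter is about getting started with Git. We begin with some background on version control tools, then move on to getting Git running on your system, and finally to setting it up so you can start working with it. By the end of this chapter you should understand why Git is so popular, why you should use it, and be set up to do so.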

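About Version Control

What is version control, and why should you care? Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. The examples in this book use software source code as the files under version control, though in reality you can do this with nearly any type of file on a computer.

If you are a graphic or web designer and want to keep every version of an image or layout (which you almost certainly do), a Version Control System (VCS) is a very wise thing to use. It allows you to revert files back to a previous state, revert the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem and when, and more. Using a VCS also generally means that if you make a mistake or lose files, you can easily recover. In addition, you get all this for very little overhead.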

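Local Version Control Systems

Many people's version-control method of choice is to copy files into another directory (perhaps a time-stamped directory, if they're clever). This approach is very common because it is so simple, but it is also very error prone. It is easy to forget which directory you're in and accidentally write to the wrong file or copy over files you don't mean to.

To deal with this issue, programmers long ago developed local VCSs that had a simple database that kept all the changes to files under revision control (see Figure 1-1).

Figure 1-1. Local version control diagram.

One of the more popular VCS tools of this kind was RCS, which is still distributed with many computers today. Even the popular Mac OS X operating system includes the rcs command when you install the Developer Tools. This tool basically works by keeping patch sets (that is, the information needed to change a file from one revision to another) in a special format on disk; it can then re-create what any file looked like at any point in time by applying the patches.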

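Centralized Version Control Systems

The next major issue that people encounter is that they need to collaborate with developers on other systems. To deal with this problem, Centralized Version Control Systems (CVCSs) were developed. These systems, such as CVS, Subversion, and Perforce, have a single server that contains all the versioned files, and a number of clients that check out files from that central place. For many years, this has been the standard for version control (see Figure 1-2).

Figure 1-2. Centralized version control system.

This setup offers many advantages, especially over local VCSs. For example, everyone knows to a certain degree what everyone else on the project is doing. Administrators have fine-grained control over who can do what, and it is far easier to administer a CVCS than it is to deal with local databases on every client.

However, this setup also has some serious downsides. The most obvious is the single point of failure that the centralized server represents. If that server goes down for an hour, then during that hour nobody can collaborate at all or share versioned changes with anyone else. If the hard disk the central database is on becomes corrupted, and proper backups haven't been kept, you lose everything, including the entire history of the project, except whatever single snapshots people happen to have on their local machines. Local VCSs suffer from this same problem: whenever you have the entire history of the project in a single place, you risk losing everything.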

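Distributed Version Control Systems

This is where Distributed Version Control Systems (DVCSs) step in. In a DVCS (such as Git, Mercurial, Bazaar, or Darcs), clients don't just check out the latest snapshot of the files: they fully mirror the repository. Thus if the server that these systems were collaborating through dies, any of the client repositories can be copied back up to the server to restore it. Every checkout is really a full backup of all the data (see Figure 1-3).

Figure 1-3. Distributed version control system.

Furthermore, many of these systems deal well with having several remote repositories to work with, so you can collaborate with different groups of people in different ways simultaneously within the same project. This allows you to set up several types of workflows that aren't possible in centralized systems, such as hierarchical models.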

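A Short History of Git

As with many great things in life, Git began with a bit of creative destruction and fiery controversy. The Linux kernel is an open source software project of fairly large scope. For most of the lifetime of Linux kernel maintenance (1991 to 2002), changes to the software were passed around as patches and archived files. In 2002, the Linux kernel project began using a proprietary DVCS called BitKeeper.

In 2005, the relationship between the community that developed the Linux kernel and the commercial company that developed BitKeeper broke down, and the tool's free-of-charge status was revoked. This prompted the Linux development community (and in particular Linus Torvalds, the creator of Linux) to develop their own tool based on some of the lessons they learned while using BitKeeper. The new system was designed around a set of goals, among them speed, efficient handling of large projects, and strong support for non-linear development.

Since its birth in 2005, Git has evolved and matured to be easy to use and yet retain these initial qualities. It is incredibly fast, it is very efficient with large projects, and it has an excellent branching system for non-linear development (see Chapter 3).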

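Git Basics

So, what is Git in a nutshell? This is an important section to absorb, because if you understand what Git is and the fundamentals of how it works, then using Git effectively will probably be much easier for you. As you learn Git, try to clear your mind of the things you may know about other VCSs, such as Subversion and Perforce; doing so will help you avoid subtle confusion when using the tool. Git stores and thinks about information very differently from these other systems, even though the user interface is fairly similar. Understanding those differences will help you use the tool more accurately.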

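Snapshots, Not Differences

The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they keep as a set of files and the changes made to each file over time, as illustrated in Figure 1-4.

Figure 1-4. Other systems tend to store data as changes to a base version of each file.

Git doesn't think of or store its data this way. Instead, Git thinks of its data more like a set of snapshots of a mini filesystem. Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if a file has not changed, Git doesn't store the file again, just a link to the previous identical file it has already stored. Git thinks about its data more like Figure 1-5.

Figure 1-5. Git stores data as snapshots of the project over time.

This is an important distinction between Git and nearly all other VCSs. It makes Git reconsider almost every aspect of version control that most other systems copied from the previous generation. This makes Git more like a mini filesystem with some very powerful tools built on top of it, rather than simply a VCS. We'll explore some of the benefits you gain by thinking of your data this way when we cover Git branching in Chapter 3.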

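Nearly Every Operation Is Local

Most operations in Git only need local files and resources to operate; generally no information is needed from another computer on your network. If you're used to a CVCS where most operations carry network latency overhead, this aspect makes Git feel remarkably fast. Because you have the entire history of the project right there on your local disk, most operations seem almost instantaneous.

For example, to browse the history of the project, Git doesn't need to go out to the server to get the history and display it for you; it simply reads it directly from your local database. This means you see the project history almost instantly. If you want to see the changes introduced between the current version of a file and the file a month ago, Git can look up the file a month ago and do a local difference calculation, instead of having to either ask a remote server to do it or pull an older version of the file from the remote server to do it locally.

This also means that there is very little you can't do if you're offline or off VPN. If you get on an airplane or a train and want to do a little work, you can commit happily until you get to a network connection to upload. If you go home and can't get your VPN client working properly, you can still work. In many other systems, doing so is either impossible or painful. In Perforce, for example, you can't do much when you aren't connected to the server; and in Subversion and CVS, you can edit files, but you can't commit changes to your database (because your database is offline). This may not seem like a huge deal, but you may be surprised what a big difference it can make.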

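Git Has Integrity

Everything in Git is check-summed before it is stored and is then referred to by that checksum. This means it's impossible to change the contents of any file or directory without Git knowing about it. This functionality is built into Git at the lowest levels and is integral to its philosophy. You can't lose information in transit or get file corruption without Git being able to detect it.

The mechanism that Git uses for this checksumming is called a SHA-1 hash. This is a 40-character string composed of hexadecimal characters (0-9 and a-f) and calculated based on the contents of a file or directory structure in Git. A SHA-1 hash looks something like this: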

24b9da6552252987aa493b52f8696cd6d3b00373

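You will see these hash values all over the place in Git because it uses them so much. In fact, Git stores everything not by file name but in the Git database addressable by the hash value of its contents.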
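As a rough illustration of content addressing, the following sketch asks Git for the identifier of a small piece of content and then reproduces it by hand. It assumes a Unix-like shell with git and sha1sum available (on macOS, shasum may be needed instead), and it relies on file contents being hashed as the header "blob <size>" plus a NUL byte followed by the content. The sample text is arbitrary.

```shell
# Ask Git for the object ID of the 6-byte content "hello\n".
git_id=$(printf 'hello\n' | git hash-object --stdin)

# Recompute it by hand: SHA-1 over "blob <size>\0" followed by the content.
manual_id=$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)

echo "git:    $git_id"
echo "manual: $manual_id"
```

Both lines should show the same 40-character value, and the same content yields the same identifier no matter what the file is called.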

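Git Generally Only Adds Data

When you do actions in Git, nearly all of them only add data to the Git database. It is very difficult to get the system to do anything that is not undoable or to make it erase data in any way. As in any VCS, you can lose or mess up changes you haven't committed yet; but after you commit a snapshot into Git, it is very difficult to lose, especially if you regularly push your database to another repository.

This makes using Git a joy because we know we can experiment without the danger of severely breaking things. For a more in-depth look at how Git stores its data and how you can recover data that seems lost, see "Under the Covers" in Chapter 9.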

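The Three States

Now, pay attention. This is the main thing to remember about Git if you want the rest of your learning process to go smoothly. Git has three main states that your files can reside in: committed, modified, and staged. Committed means that the data is safely stored in your local database. Modified means that you have changed the file but have not committed it to your database yet. Staged means that you have marked a modified file in its current version to go into your next commit snapshot.

This leads us to the three main sections of a Git project: the Git directory, the working directory, and the staging area.

Figure 1-6. Working directory, staging area, and Git directory.

The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.

The working directory is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.

The staging area is a simple file, generally contained in your Git directory, that stores information about what will go into your next commit. It's sometimes referred to as the index, but it's becoming standard to refer to it as the staging area.

The basic Git workflow goes something like this:

  1. You modify files in your working directory.
  2. You stage the files, adding snapshots of them to your staging area.
  3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently in your Git directory.

If a particular version of a file is in the Git directory, it's considered committed. If it's modified and has been added to the staging area, it is staged. And if it was changed since it was checked out but has not been staged, it is modified. In Chapter 2, you'll learn more about these states and how you can either take advantage of them or skip the staged part entirely.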
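The three states can be observed directly in a throwaway repository. The sketch below is only an illustration: the file name notes.txt, the commit messages, and the demo identity are made up, and it assumes a Git recent enough to support git status --short.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity used for this demo only (hypothetical values).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "initial version"     # notes.txt is now committed

echo "second draft" > notes.txt        # modified, not yet staged
modified=$(git status --short)         # M in the second column

git add notes.txt                      # staged for the next commit
staged=$(git status --short)           # M in the first column

git commit -q -m "second version"      # committed again
committed=$(git status --short)        # empty: nothing left to report

printf '%s\n' "modified: [$modified]" "staged: [$staged]" "committed: [$committed]"
```

In the short status format, the first column describes the staging area and the second column describes the working directory, which is why the M moves from one column to the other after git add.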

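Installing Git

Let's get into using some Git. First things first: you have to install it. You can get it a number of ways; the two major ones are to install it from source or to install an existing package for your platform.

Installing from Source

If you can, it's generally useful to install Git from source, because you'll get the most recent version. Each version of Git tends to include useful UI enhancements, so getting the latest version is often the best route if you feel comfortable compiling software from source. Many Linux distributions also ship very old packages, so unless you're on a very up-to-date distribution or are using backports, installing from source may be the best bet.

To install Git, you need to have the following libraries that Git depends on: curl, zlib, openssl, expat, and libiconv. For example, if you're on a system that has yum (such as Fedora) or apt-get (such as a Debian-based system), you can use one of these commands to install all of the dependencies: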

$ yum install curl-devel expat-devel gettext-devel \
  openssl-devel zlib-devel

$ apt-get install libcurl4-gnutls-dev libexpat1-dev gettext \
  libz-dev libssl-dev

When you have all the necessary dependencies, you can go ahead and grab the latest snapshot from the Git web site:

http://git-scm.com/download

Then, compile and install:

$ tar -zxf git-1.6.0.5.tar.gz
$ cd git-1.6.0.5
$ make prefix=/usr/local all
$ sudo make prefix=/usr/local install

After this is done, you can also get Git via Git itself for updates:

$ git clone git://git.kernel.org/pub/scm/git/git.git

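Installing on Linux

If you want to install Git on Linux via a binary installer, you can generally do so through the basic package-management tool that comes with your distribution. If you're on Fedora, you can use yum: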

$ yum install git-core

Or if you're on a Debian-based distribution like Ubuntu, try apt-get:

$ apt-get install git-core

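Installing on Mac

There are two easy ways to install Git on a Mac. The easiest is to use the graphical Git installer, which you can download from the Google Code page (see Figure 1-7):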

http://code.google.com/p/git-osx-installer

Figure 1-7. Git OS X installer.

The other major way is to install Git via MacPorts. If you have MacPorts installed, install Git with the following command:

$ sudo port install git-core +svn +doc +bash_completion +gitweb

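You don't have to add all the extras, but you'll probably want to include +svn in case you ever have to use Git with Subversion repositories (see Chapter 8).

Installing on Windows

Installing Git on Windows is very easy. The msysGit project has one of the easier installation procedures. Simply download the installer from the Google Code page and run it: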

http://code.google.com/p/msysgit

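After it's installed, you have both a command-line version (including an SSH client) and the standard GUI.

First-Time Git Setup

Now that you have Git on your system, you'll want to do a few things to customize your Git environment. You should have to do these things only once; they'll stick around between upgrades. You can also change them at any time by running through the commands again.

Git comes with a tool called git config that lets you get and set configuration variables that control how Git looks and operates. These variables can be stored in three different places: the /etc/gitconfig file, which holds values for every user on the system and all their repositories (git config reads and writes this file when you pass the --system option); the ~/.gitconfig file, which is specific to your user (used when you pass the --global option); and the config file in the Git directory (that is, .git/config) of whatever repository you're currently using, which is specific to that single repository. Each level overrides values in the previous level, so values in .git/config take precedence over those in /etc/gitconfig.

On Windows systems, Git looks for the .gitconfig file in the $HOME directory (C:\Documents and Settings\$USER for most people). It also still looks for /etc/gitconfig, although it's relative to the MSys root, which is wherever you decided to install Git on your Windows system when you ran the installer.

Your Identity

The first thing you should do when you install Git is to set your user name and e-mail address. This is important because every Git commit uses this information, and it's immutably baked into the commits you pass around: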

$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com

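Again, you need to do this only once if you pass the --global option, because then Git will always use that information for anything you do on that system. If you want to override this with a different name or e-mail address for specific projects, you can run the command without the --global option when you're in that project.

Your Editor

Now that your identity is set up, you can configure the default text editor that will be used when Git needs you to type in a message. By default, Git uses your system's default editor, which is generally Vi or Vim. If you want to use a different text editor, such as Emacs, you can do the following: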

$ git config --global core.editor emacs

Your Merge Tool

Another useful option you may want to configure is the default tool to use to resolve merge conflicts. Say you want to use vimdiff:

$ git config --global merge.tool vimdiff

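Git accepts kdiff3, tkdiff, meld, xxdiff, emerge, vimdiff, gvimdiff, ecmerge, and opendiff as valid merge tools. You can also set up a custom tool; see Chapter 7 for more information about doing that.

Checking Your Settings

If you want to check your settings, you can use the git config --list command to list all the settings Git can find at that point: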

$ git config --list
user.name=Scott Chacon
user.email=schacon@gmail.com
color.status=auto
color.branch=auto
color.interactive=auto
color.diff=auto
...

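You may see keys more than once, because Git reads the same key from different files (/etc/gitconfig and ~/.gitconfig, for example). In this case, Git uses the last value for each unique key it sees.

You can also check what Git thinks a specific key's value is by typing git config followed by the key name: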

$ git config user.name
Scott Chacon
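To see how a per-project value overrides the --global one, the following sketch may help. It is deliberately isolated: it points HOME at a temporary directory for the rest of the shell session so that your real ~/.gitconfig is never touched, and the two names are made up for the demonstration.

```shell
# Use a throwaway HOME so the real ~/.gitconfig is left alone.
demo_home=$(mktemp -d)
export HOME="$demo_home"
unset XDG_CONFIG_HOME

git config --global user.name "Global Name"     # stored in $HOME/.gitconfig

repo=$(mktemp -d)
cd "$repo"
git init -q
seen_before=$(git config user.name)             # falls back to the global value

git config user.name "Project Name"             # stored in this repository's .git/config
seen_after=$(git config user.name)              # the repository value is read last and wins

global_still=$(git config --global user.name)   # the global file is unchanged

echo "before: $seen_before"
echo "after:  $seen_after"
echo "global: $global_still"
```

Run it in a scratch shell, since HOME stays redirected until that shell exits.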

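Getting Help

If you ever need help while using Git, there are three ways to get the manual page (manpage) help for any of the Git commands:

$ git help <verb>
$ git <verb> --help
$ man git-<verb>

For example, you can get the manpage help for the config command by running

$ git help config

These commands are nice because you can access them anywhere, even offline. If the manpages and this book aren't enough and you need in-person help, you can try the #git or #github channel on the Freenode IRC server (irc.freenode.net). These channels are regularly filled with hundreds of people who are all very knowledgeable about Git and are often willing to help.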

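Summary

You should now have a basic understanding of what Git is and how it's different from the centralized VCSs you may have been using. You should also have a working version of Git on your system that's set up with your personal identity. It's now time to learn some Git basics.

Git Basics

If you can read only one chapter to get going with Git, this is it. This chapter covers every basic command you need to do the vast majority of the things you'll eventually spend your time doing with Git. By the end of the chapter, you should be able to configure and initialize a repository, begin and stop tracking files, and stage and commit changes. We'll also show you how to set up Git to ignore certain files, how to undo mistakes quickly and easily, how to browse the history of your project and view changes between commits, and how to push to and pull from remote repositories.

Getting a Git Repository

You can get a Git project using two main approaches. The first takes an existing project or directory and imports it into Git. The second clones an existing Git repository from another server.

Initializing a Repository in an Existing Directory

If you're starting to track an existing project in Git, you need to go to the project's directory and type: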

$ git init

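This creates a new subdirectory named .git that contains all of your necessary repository files, a Git repository skeleton. At this point, nothing in your project is tracked yet. (See Chapter 9 for more information about exactly what files are contained in the .git directory you just created.)

If you want to start version-controlling existing files (as opposed to an empty directory), you should probably begin tracking those files and do an initial commit. You can accomplish that with a few git add commands that specify the files you want to track, followed by a commit: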

$ git add *.c
$ git add README
$ git commit -m 'initial project version'

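We'll go over what these commands do in just a moment. At this point, you have a Git repository with tracked files and an initial commit.

Cloning an Existing Repository

If you want to get a copy of an existing Git repository (for example, a project you'd like to contribute to), the command you need is git clone. If you're familiar with other VCSs such as Subversion, you'll notice that the command is clone and not checkout. This is an important distinction: Git receives a copy of nearly all data that the server has. Every version of every file for the history of the project is pulled down when you run git clone. In fact, if your server disk gets corrupted, you can use any of the clones on any client to set the server back to the state it was in when it was cloned (you may lose some server-side hooks and such, but all the versioned data would be there; see Chapter 4 for more details).

You clone a repository with git clone [url]. For example, if you want to clone the Ruby Git library called Grit, you can do so like this: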

$ git clone git://github.com/schacon/grit.git

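That creates a directory named grit, initializes a .git directory inside it, pulls down all the data for that repository, and checks out a working copy of the latest version. If you go into the new grit directory, you'll see the project files in there, ready to be worked on or used. If you want to clone the repository into a directory named something other than grit, you can specify that as the next command-line argument: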

$ git clone git://github.com/schacon/grit.git mygrit

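That command does the same thing as the previous one, but the target directory is called mygrit.

Git has a number of different transfer protocols you can use. The previous example uses the git:// protocol, but you may also see http(s):// or user@server:/path.git, which uses the SSH transfer protocol. Chapter 4 will introduce all of the available options the server can set up to access your Git repository and the pros and cons of each.

Recording Changes to the Repository

You have a bona fide Git repository and a checkout or working copy of the files for that project. You need to make some changes and commit snapshots of those changes into your repository each time the project reaches a state you want to record.

Remember that each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot; they can be unmodified, modified, or staged. Untracked files are everything else: any files in your working directory that were not in your last snapshot and are not in your staging area. When you first clone a repository, all of your files will be tracked and unmodified because you just checked them out and haven't edited anything.

As you edit files, Git sees them as modified, because you've changed them since your last commit. You stage these modified files and then commit all your staged changes, and the cycle repeats. This lifecycle is illustrated in Figure 2-1.

Figure 2-1. The lifecycle of the status of your files.

Checking the Status of Your Files

The main tool you use to determine which files are in which state is the git status command. If you run this command directly after a clone, you should see something like this: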

$ git status
# On branch master
nothing to commit (working directory clean)

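This means you have a clean working directory; in other words, none of your tracked files are modified. Git also doesn't see any untracked files, or they would be listed here. Finally, the command tells you which branch you're on. For now, that is always master, which is the default; you won't worry about it here. The next chapter will go over branches in detail.

Let's say you add a new file to your project, a simple README file. If the file didn't exist before, and you run git status, you see your untracked file like so: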

$ vim README
$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   README
nothing added to commit but untracked files present (use "git add" to track)

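You can see that your new README file is untracked, because it's under the "Untracked files" heading in your status output. Git won't start including it in your commit snapshots until you explicitly tell it to do so. It does this so you don't accidentally begin including generated binary files or other files that you did not mean to include. You do want to start including README, so let's start tracking the file.

Tracking New Files

In order to begin tracking a new file, you use the command git add. To begin tracking the README file, you can run this: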

$ git add README

If you run your status command again, you can see that your README file is now tracked and staged:

$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   README
#

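You can tell that it's staged because it's under the "Changes to be committed" heading. If you commit at this point, the version of the file at the time you ran git add is what will be in the historical snapshot. You may recall that when you ran git init earlier, you then ran git add; that was to begin tracking files in your directory. The git add command takes a path name for either a file or a directory; if it's a directory, the command adds all the files in that directory recursively.

Staging Modified Files

Let's change a file that was already tracked. If you change a previously tracked file called benchmarks.rb and then run your status command again, you get something that looks like this: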

$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   README
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#   modified:   benchmarks.rb
#

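The benchmarks.rb file appears under a section named "Changed but not updated", which means that a file that is tracked has been modified in the working directory but not yet staged. To stage it, you run the git add command (it's a multipurpose command: you use it both to begin tracking new files and to stage modified ones). Let's run git add now to stage the benchmarks.rb file, and then run git status again: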

$ git add benchmarks.rb
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   README
#   modified:   benchmarks.rb
#

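Both files are staged and will go into your next commit. At this point, suppose you remember one little change that you want to make in benchmarks.rb before you commit it. You open it again and make that change, and you're ready to commit. However, let's run git status one more time: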

$ vim benchmarks.rb 
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   README
#   modified:   benchmarks.rb
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#   modified:   benchmarks.rb
#

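What happened? Now benchmarks.rb is listed as both staged and unstaged. How is that possible? It turns out that Git stages a file exactly as it is when you run the git add command. If you commit now, the version of benchmarks.rb as it was when you last ran git add is what will go into the commit, not the version of the file as it looks in your working directory when you run git commit. If you modify a file after you run git add, you have to run git add again to stage the latest version of the file: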

$ git add benchmarks.rb
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   README
#   modified:   benchmarks.rb
#

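Ignoring Files

Often, you'll have a class of files that you don't want Git to automatically add or even show you as being untracked. These are generally automatically generated files such as log files or files produced by your build system. In such cases, you can create a file named .gitignore listing patterns to match them. Here is an example .gitignore file: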

$ cat .gitignore
*.[oa]
*~

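The first line tells Git to ignore any files ending in .o or .a, which are object and archive files that may be the product of building your code. The second line tells Git to ignore all files that end with a tilde (~), which is used by many text editors such as Emacs to mark temporary files. You may also include a log, tmp, or pid directory; automatically generated documentation; and so on. Setting up a .gitignore file before you get going is generally a good idea so you don't accidentally commit files that you really don't want in your Git repository.

The rules for the patterns you can put in the .gitignore file are as follows: blank lines and lines starting with # are ignored; standard glob patterns work; you can end a pattern with a forward slash (/) to specify a directory; and you can negate a pattern by starting it with an exclamation point (!).

Glob patterns are like simplified regular expressions that shells use. An asterisk (*) matches zero or more characters; [abc] matches any one character inside the brackets (in this case a, b, or c); a question mark (?) matches a single character; and brackets enclosing characters separated by a hyphen ([0-9]) match any character in that range (in this case 0 through 9).

Here is another example .gitignore file: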

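# a comment; lines like this are ignored
# no .a files
*.a
# but do track lib.a, even though .a files are ignored above
!lib.a
# only ignore the TODO file in the root directory, not subdir/TODO
/TODO
# ignore all files in the build/ directory
build/
# ignore doc/notes.txt, but not doc/server/arch.txt
doc/*.txt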
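One way to check how such patterns behave is the git check-ignore command, available in newer versions of Git. The sketch below writes the same patterns into a scratch repository and asks Git which of a handful of made-up sample paths it would ignore; the file names are illustrative only.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# The same patterns as in the example above, one per line.
printf '%s\n' '*.a' '!lib.a' '/TODO' 'build/' 'doc/*.txt' > .gitignore

# Sample paths (hypothetical names) to test the patterns against.
mkdir -p sub build doc/server
touch foo.a lib.a TODO sub/TODO build/out.o doc/notes.txt doc/server/arch.txt

# check-ignore prints each path that is ignored and stays silent about the rest.
ignored=$(git check-ignore foo.a lib.a TODO sub/TODO build/out.o \
          doc/notes.txt doc/server/arch.txt)
echo "$ignored"
```

With these patterns, the output should list foo.a, TODO, build/out.o, and doc/notes.txt, while lib.a, sub/TODO, and doc/server/arch.txt remain visible to Git.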

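Viewing Your Staged and Unstaged Changes

If the git status command is too vague for you (you want to know exactly what you changed, not just which files were changed), you can use the git diff command. We'll cover git diff in more detail later, but you'll probably use it most often to answer these two questions: What have you changed but not yet staged? And what have you staged that you are about to commit? Although git status answers those questions very generally, git diff shows you the exact lines added and removed, in the form of a patch.

Let's say you edit and stage the README file and then edit the benchmarks.rb file without staging it. If you run your status command, you see something like this: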

$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   README
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#   modified:   benchmarks.rb
#

To see what you've changed but not yet staged, type git diff with no other arguments:

$ git diff
diff --git a/benchmarks.rb b/benchmarks.rb
index 3cb747f..da65585 100644
--- a/benchmarks.rb
+++ b/benchmarks.rb
@@ -36,6 +36,10 @@ def main
           @commit.parents[0].parents[0].parents[0]
         end

+        run_code(x, 'commits 1') do
+          git.commits.size
+        end
+
         run_code(x, 'commits 2') do
           log = git.commits('master', 15)
           log.size

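That command compares what is in your working directory with what is in your staging area. The result tells you the changes you've made that you haven't yet staged.

If you want to see what you've staged that will go into your next commit, you can use git diff --cached. (In Git versions 1.6.1 and later, you can also use git diff --staged, which may be easier to remember.) This command compares your staged changes to your last commit: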

$ git diff --cached
diff --git a/README b/README
new file mode 100644
index 0000000..03902a1
--- /dev/null
+++ b/README
@@ -0,0 +1,5 @@
+grit
+ by Tom Preston-Werner, Chris Wanstrath
+ http://github.com/mojombo/grit
+
+Grit is a Ruby library for extracting information from a Git repository

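It's important to note that git diff by itself doesn't show all changes made since your last commit, only changes that are still unstaged. This can be confusing, because if you've staged all of your changes, git diff will give you no output.

For another example, if you stage the benchmarks.rb file and then edit it, you can use git diff to see the changes in the file that are staged and the changes that are unstaged: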

$ git add benchmarks.rb
$ echo '# test line' >> benchmarks.rb
$ git status
# On branch master
#
# Changes to be committed:
#
#   modified:   benchmarks.rb
#
# Changed but not updated:
#
#   modified:   benchmarks.rb
#

Now you can use git diff to see what is still unstaged:

$ git diff 
diff --git a/benchmarks.rb b/benchmarks.rb
index e445e28..86b2f7c 100644
--- a/benchmarks.rb
+++ b/benchmarks.rb
@@ -127,3 +127,4 @@ end
 main()

 ##pp Grit::GitRuby.cache_client.stats 
+# test line

and git diff --cached to see what you've staged so far:

$ git diff --cached
diff --git a/benchmarks.rb b/benchmarks.rb
index 3cb747f..e445e28 100644
--- a/benchmarks.rb
+++ b/benchmarks.rb
@@ -36,6 +36,10 @@ def main
          @commit.parents[0].parents[0].parents[0]
        end

+        run_code(x, 'commits 1') do
+          git.commits.size
+        end
+              
        run_code(x, 'commits 2') do
          log = git.commits('master', 15)
          log.size
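The split between the two comparisons can be reproduced in a small scratch repository. This is only a sketch: the files a.txt and b.txt and the demo identity are invented for the purpose, and --name-only is used to keep the output to file names rather than full patches.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity used for this demo only (hypothetical values).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

printf 'line 1\n' > a.txt
printf 'line 1\n' > b.txt
git add a.txt b.txt
git commit -q -m "baseline"

echo 'staged change' >> a.txt
git add a.txt                              # a.txt: change is staged
echo 'unstaged change' >> b.txt            # b.txt: change is only in the working directory

unstaged=$(git diff --name-only)           # working directory vs. staging area
staged=$(git diff --cached --name-only)    # staging area vs. last commit

echo "git diff:          $unstaged"
echo "git diff --cached: $staged"
```

Each file shows up in exactly one of the two listings, which mirrors the two sections of the git status output.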

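Committing Your Changes

Now that your staging area is set up the way you want it, you can commit your changes. Remember that anything that is still unstaged (any files you have created or modified that you haven't run git add on since you edited them) won't go into this commit. They will stay as modified files on your disk. In this case, the last time you ran git status, you saw that everything was staged, so you're ready to commit your changes. The simplest way to commit is to type git commit: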

$ git commit

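Doing so launches your editor of choice. (This is set by your shell's $EDITOR environment variable, usually vim or emacs, although you can configure it with whatever you want using the git config --global core.editor command as you saw in Chapter 1.)

The editor displays the following text (this example is a Vim screen):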

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   README
#       modified:   benchmarks.rb 
~
~
~
".git/COMMIT_EDITMSG" 10L, 283C

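You can see that the default commit message contains the latest output of the git status command commented out and one empty line on top. You can remove these comments and type your commit message, or you can leave them there to help you remember what you're committing. (For an even more explicit reminder of what you've modified, you can pass the -v option to git commit. Doing so also puts the diff of your change in the editor so you can see exactly what you did.) When you exit the editor, Git creates your commit with that commit message (with the comments and diff stripped out).

Alternatively, you can type your commit message inline with the commit command by specifying it after a -m flag, like this: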

$ git commit -m "Story 182: Fix benchmarks for speed"
[master]: created 463dc4f: "Fix benchmarks for speed"
 2 files changed, 3 insertions(+), 0 deletions(-)
 create mode 100644 README

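Now you've created your first commit! You can see that the commit has given you some output about itself: which branch you committed to (master), what SHA-1 checksum the commit has (463dc4f), how many files were changed, and statistics about lines added and removed in the commit.

Remember that the commit records the snapshot you set up in your staging area. Anything you didn't stage is still sitting there modified; you can do another commit to add it to your history. Every time you perform a commit, you're recording a snapshot of your project that you can revert to or compare to later.

Skipping the Staging Area

Although it can be very useful for crafting commits exactly how you want them, the staging area is sometimes a bit more complex than you need in your workflow. If you want to skip the staging area, Git provides a simple shortcut. Providing the -a option to the git commit command makes Git automatically stage every file that is already tracked and has been modified before doing the commit, letting you skip the git add part: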

$ git status
# On branch master
#
# Changed but not updated:
#
#   modified:   benchmarks.rb
#
$ git commit -a -m 'added new benchmarks'
[master 83e38c7] added new benchmarks
 1 files changed, 5 insertions(+), 0 deletions(-)

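Notice how you don't have to run git add on the benchmarks.rb file in this case before you commit.

Removing Files

To remove a file from Git, you have to remove it from your tracked files (more accurately, remove it from your staging area) and then commit. The git rm command does that and also removes the file from your working directory so you don't see it as an untracked file next time around.

If you simply remove the file from your working directory, it shows up under the "Changed but not updated" (that is, unstaged) area of your git status output: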

$ rm grit.gemspec
$ git status
# On branch master
#
# Changed but not updated:
#   (use "git add/rm <file>..." to update what will be committed)
#
#       deleted:    grit.gemspec
#

Then, if you run git rm, it stages the file's removal:

$ git rm grit.gemspec
rm 'grit.gemspec'
$ git status
# On branch master
#
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       deleted:    grit.gemspec
#

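The next time you commit, the file will be gone and no longer tracked. If you modified the file and added it to the staging area already, you must force the removal with the -f option. This is a safety feature to prevent accidental removal of data that hasn't yet been recorded in a snapshot and that can't be recovered from Git.

Another useful thing you may want to do is to keep the file in your working directory but remove it from your staging area. In other words, you may want to keep the file on your hard drive but not have Git track it anymore. This is particularly useful if you forgot to add something to your .gitignore file and accidentally added it, like a large log file or a bunch of compiled .a files. To do this, use the --cached option: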

$ git rm --cached readme.txt

You can pass files, directories, and file-glob patterns to the git rm command. That means you can do things such as:

$ git rm log/\*.log

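Note the backslash (\) in front of the asterisk (*). This is necessary because Git does its own filename expansion in addition to your shell's filename expansion. This command removes all files that have the .log extension in the log/ directory. Or, you can do something like this: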

$ git rm \*~

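This command removes all files that end with ~.

Moving Files

Unlike many other VCSs, Git doesn't explicitly track file movement. If you rename a file in Git, no metadata is stored in Git that tells it you renamed the file. However, Git is pretty smart about figuring that out after the fact; we'll deal with detecting file movement a bit later.

Thus it's a bit confusing that Git has a mv command. If you want to rename a file in Git, you can run something like: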

$ git mv file_from file_to

and it works fine. In fact, if you run something like this and look at the status, you'll see that Git considers it a renamed file:

$ git mv README.txt README
$ git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       renamed:    README.txt -> README
#

However, this is equivalent to running something like this:

$ mv README.txt README
$ git rm README.txt
$ git add README

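Git figures out implicitly that it's a rename, so it doesn't matter whether you rename a file that way or with the git mv command. The only real difference is that git mv is one command instead of three; it's a convenience. More important, you can use any tool you like to rename a file and address the add/rm later, before you commit.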
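The three-command form can be tried in a scratch repository to confirm that Git still reports a rename. The sketch below uses made-up file contents and a demo identity; it relies on the renamed file being non-empty and unchanged, which is what lets Git match the old and new names.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity used for this demo only (hypothetical values).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

printf 'Grit is a Ruby library\nfor working with Git repositories\n' > README.txt
git add README.txt
git commit -q -m "add readme"

# Rename with the ordinary mv, then tell Git about both halves of the change.
mv README.txt README
git rm -q README.txt
git add README

status=$(git status --short)
echo "$status"
```

The short status should show a single R entry from README.txt to README, the same result git mv would have produced.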

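Viewing the Commit History

After you have created several commits, or if you have cloned a repository with an existing commit history, you'll probably want to look back to see what has happened. The most basic and powerful tool to do this is the git log command.

These examples use a very simple project called simplegit. To get the project, run:

$ git clone git://github.com/schacon/simplegit-progit.git

When you run git log in this project, you should get output that looks something like this: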

$ git log
commit ca82a6dff817ec66f44342007202690a93763949
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Mon Mar 17 21:52:11 2008 -0700

    changed the version number

commit 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Sat Mar 15 16:40:33 2008 -0700

    removed unnecessary test code

commit a11bef06a3f659402fe7563abf99ad00de2209e6
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Sat Mar 15 10:31:28 2008 -0700

    first commit

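By default, with no arguments, git log lists the commits made in that repository in reverse chronological order. That is, the most recent commits show up first. As you can see, this command lists each commit with its SHA-1 checksum, the author's name and e-mail, the date written, and the commit message.

A huge number and variety of options to the git log command are available to show you exactly what you're looking for. Here, we'll show you some of the most-used options.

One of the more helpful options is -p, which shows the diff introduced in each commit. You can also use -2, which limits the output to only the last two entries:

$ git log -p -2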
commit ca82a6dff817ec66f44342007202690a93763949
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Mon Mar 17 21:52:11 2008 -0700

    changed the version number

diff --git a/Rakefile b/Rakefile
index a874b73..8f94139 100644
--- a/Rakefile
+++ b/Rakefile
@@ -5,7 +5,7 @@ require 'rake/gempackagetask'
 spec = Gem::Specification.new do |s|
-    s.version   =   "0.1.0"
+    s.version   =   "0.1.1"
     s.author    =   "Scott Chacon"

commit 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Sat Mar 15 16:40:33 2008 -0700

    removed unnecessary test code

diff --git a/lib/simplegit.rb b/lib/simplegit.rb
index a0a60ae..47c6340 100644
--- a/lib/simplegit.rb
+++ b/lib/simplegit.rb
@@ -18,8 +18,3 @@ class SimpleGit
     end

 end
-
-if $0 == __FILE__
-  git = SimpleGit.new
-  puts git.show
-end
\ No newline at end of file

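This option displays the same information but with a diff directly following each entry. This is very helpful for code review or to quickly browse what happened during a series of commits that a collaborator has added. You can also use a series of summarizing options with git log. For example, if you want to see some abbreviated stats for each commit, you can use the --stat option: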

$ git log --stat 
commit ca82a6dff817ec66f44342007202690a93763949
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Mon Mar 17 21:52:11 2008 -0700

    changed the version number

 Rakefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

commit 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Sat Mar 15 16:40:33 2008 -0700

    removed unnecessary test code

 lib/simplegit.rb |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

commit a11bef06a3f659402fe7563abf99ad00de2209e6
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Sat Mar 15 10:31:28 2008 -0700

    first commit

 README           |    6 ++++++
 Rakefile         |   23 +++++++++++++++++++++++
 lib/simplegit.rb |   25 +++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 0 deletions(-)

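As you can see, the --stat option prints below each commit entry a list of modified files, how many files were changed, and how many lines in those files were added and removed. It also puts a summary of the information at the end. Another really useful option is --pretty. This option changes the log output to formats other than the default. A few prebuilt options are available for you to use. The oneline option prints each commit on a single line, which is useful if you're looking at a lot of commits. In addition, the short, full, and fuller options show the output in roughly the same format but with less or more information, respectively: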

$ git log --pretty=oneline
ca82a6dff817ec66f44342007202690a93763949 changed the version number
085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 removed unnecessary test code
a11bef06a3f659402fe7563abf99ad00de2209e6 first commit

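The most interesting option is format, which allows you to specify your own log output format. This is especially useful when you're generating output for machine parsing: because you specify the format explicitly, you know it won't change with updates to Git: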

$ git log --pretty=format:"%h - %an, %ar : %s"
ca82a6d - Scott Chacon, 11 months ago : changed the version number
085bb3b - Scott Chacon, 11 months ago : removed unnecessary test code
a11bef0 - Scott Chacon, 11 months ago : first commit

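Table 2-1 lists some of the more useful options that format takes.

Option  Description of Output
%H      Commit hash
%h      Abbreviated commit hash
%T      Tree hash
%t      Abbreviated tree hash
%P      Parent hashes
%p      Abbreviated parent hashes
%an     Author name
%ae     Author e-mail
%ad     Author date (format respects the date option)
%ar     Author date, relative
%cn     Committer name
%ce     Committer e-mail
%cd     Committer date
%cr     Committer date, relative
%s      Subject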

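You may be wondering what the difference is between author and committer. The author is the person who originally wrote the work, whereas the committer is the person who last applied the work. So, if you send in a patch to a project and one of the core members applies the patch, both of you get credit: you as the author and the core member as the committer. We'll cover this distinction a bit more in Chapter 5.

The oneline and format options are particularly useful with another log option called --graph. This option adds a small ASCII graph showing your branch and merge history, which we can see in our copy of the Grit project repository: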

$ git log --pretty=format:"%h %s" --graph
* 2d3acf9 ignore errors from SIGCHLD on trap
*  5e3ee11 Merge branch 'master' of git://github.com/dustin/grit
|\  
| * 420eac9 Added a method for getting the current branch.
* | 30e367c timeout code and tests
* | 5a09431 add timeout protection to grit
* | e1193f8 support for heads with slashes in them
|/  
* d6016bc require time for xmlschema
*  11d191e Merge branch 'defunkt' into local

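Those are only some simple output-formatting options to git log; there are many more. Table 2-2 lists the options we've covered so far and some other common formatting options that may be useful, along with how they change the output of the log command.

Option           Description
-p               Show the patch introduced with each commit.
--stat           Show statistics for files modified in each commit.
--shortstat      Display only the changed/insertions/deletions line from the --stat output.
--name-only      Show the list of files modified after the commit information.
--name-status    Show the list of files affected with added/modified/deleted information as well.
--abbrev-commit  Show only the first few characters of the SHA-1 checksum instead of all 40.
--relative-date  Display the date in a relative format (for example, "2 weeks ago") instead of using the full date format.
--graph          Display an ASCII graph of the branch and merge history beside the log output.
--pretty         Show commits in an alternate format. Options include oneline, short, full, fuller, and format (where you specify your own format).

Limiting Log Output

In addition to output-formatting options, git log takes a number of useful limiting options; that is, options that let you show only a subset of commits. You've seen one such option already: the -2 option, which shows only the last two commits. In fact, you can do -<n>, where n is any integer, to show the last n commits. In reality, you're unlikely to use that often, because Git by default pipes all output through a pager so you see only one page of log output at a time.

However, the time-limiting options such as --since and --until are very useful. For example, this command gets the list of commits made in the last two weeks: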

$ git log --since=2.weeks

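This command works with lots of formats. You can specify a specific date ("2008-01-15") or a relative date such as "2 years 1 day 3 minutes ago".

You can also filter the list to commits that match some search criteria. The --author option allows you to filter on a specific author, and the --grep option lets you search for keywords in the commit messages. (Note that if you want to specify both author and grep options, you have to add --all-match or the command will match commits with either.)

The last really useful option to pass to git log as a filter is a path. If you specify a directory or file name, you can limit the log output to commits that introduced a change to those files. This is always the last option and is generally preceded by double dashes (--) to separate the paths from the options.

In Table 2-3 we list these and a few other common options for your reference.

Option             Description
-(n)               Show only the last n commits.
--since, --after   Limit the commits to those made after the specified date.
--until, --before  Limit the commits to those made before the specified date.
--author           Only show commits in which the author entry matches the specified string.
--committer        Only show commits in which the committer entry matches the specified string.

For example, if you want to see which commits touching the test files (the t/ directory) in the Git source code history were committed by Junio Hamano in the month of October 2008 and were not merges, you can run something like this: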

$ git log --pretty="%h - %s" --author=gitster --since="2008-10-01" \
   --before="2008-11-01" --no-merges -- t/
5610e3b - Fix testcase failure when extended attribute
acd3b9e - Enhance hold_lock_file_for_{update,append}()
f563754 - demonstrate breakage of detached checkout wi
d1a43f2 - reset --hard/read-tree --reset -u: remove un
51a94af - Fix "checkout --track -b newbranch" on detac
b0ad11e - pull: allow "git pull origin $something:$cur

Of the nearly 20,000 commits in the Git source code history, this command shows the 6 that match those criteria.
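To experiment with these filters without cloning a large project, you can build a tiny history with known dates and authors. The sketch below is one way to do that, under a few assumptions: the author names Alice and Bob, the dates, and the commit messages are all invented, and the commits are empty (--allow-empty) because only their metadata matters here. The dates are kept mid-month so that the --since and --before boundaries are not a concern.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Committer identity and author e-mail used for this demo only (hypothetical values).
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
export GIT_AUTHOR_EMAIL="demo@example.com"

# Three empty commits with fixed dates and two different authors.
GIT_AUTHOR_NAME="Alice" GIT_AUTHOR_DATE="2008-09-15T12:00:00" \
GIT_COMMITTER_DATE="2008-09-15T12:00:00" \
  git commit -q --allow-empty -m "september work"
GIT_AUTHOR_NAME="Alice" GIT_AUTHOR_DATE="2008-10-15T12:00:00" \
GIT_COMMITTER_DATE="2008-10-15T12:00:00" \
  git commit -q --allow-empty -m "october work"
GIT_AUTHOR_NAME="Bob" GIT_AUTHOR_DATE="2008-10-16T12:00:00" \
GIT_COMMITTER_DATE="2008-10-16T12:00:00" \
  git commit -q --allow-empty -m "october review"

# Only Alice's commits from October 2008.
result=$(git log --pretty=format:"%an - %s" --author=Alice \
         --since="2008-10-01" --before="2008-11-01")
echo "$result"
```

Of the three commits, only Alice's October commit satisfies both the author filter and the date range.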

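Using a GUI to Visualize History

If you like to use a more graphical tool to visualize your commit history, you may want to take a look at a Tcl/Tk program called gitk that is distributed with Git. Gitk is basically a visual git log tool, and it accepts nearly all the filtering options that git log does. If you type gitk on the command line in your project, you should see something like Figure 2-2.

Figure 2-2. The gitk history visualizer.

You can see the commit history in the top half of the window along with an ancestry graph. The diff viewer in the bottom half of the window shows you the changes introduced by any commit you click.

Undoing Things

At any stage, you may want to undo something. Here, we'll review a few basic tools for undoing changes that you've made. Be careful, because you can't always undo some of these undos. This is one of the few areas in Git where you may lose some work if you do it wrong.

Changing Your Last Commit

One of the common undos takes place when you commit too early and possibly forget to add some files, or you mess up your commit message. If you want to try that commit again, you can run commit with the --amend option:

$ git commit --amend

This command takes your staging area and uses it for the commit. If you've made no changes since your last commit (for instance, you run this command immediately after your previous commit), then your snapshot will look exactly the same and all you'll change is your commit message.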

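The same commit-message editor fires up, but it already contains the message of your previous commit. You can edit the message the same as always, but it overwrites your previous commit.

As an example, if you commit and then realize you forgot to stage the changes in a file you wanted to add to this commit, you can do something like this: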

$ git commit -m 'initial commit'
$ git add forgotten_file
$ git commit --amend

All three of these commands end up with a single commit: the second commit replaces the results of the first.
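You can confirm the single-commit result in a scratch repository. In the sketch below, main.c and its contents are made up, the demo identity is hypothetical, and --no-edit (available in newer versions of Git) is used so that the amended commit reuses the previous message without opening an editor.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity used for this demo only (hypothetical values).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo 'main code' > main.c
echo 'helper code' > forgotten_file
git add main.c
git commit -q -m 'initial commit'
first_id=$(git rev-parse HEAD)

git add forgotten_file
git commit -q --amend --no-edit          # replace the previous commit, keep its message
second_id=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)       # number of commits reachable from HEAD
files=$(git ls-tree --name-only HEAD)    # files recorded in that commit

echo "commits: $count"
echo "$files"
```

The history still contains one commit, but its identifier has changed because it is a new commit object that now includes forgotten_file.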

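Unstaging a Staged File

The next two sections demonstrate how to wrangle your staging area and working directory changes. The nice part is that the command you use to determine the state of those two areas also reminds you how to undo changes to them. For example, let's say you've changed two files and want to commit them as two separate changes, but you accidentally type git add . and stage them both. How can you unstage one of the two? The git status command reminds you: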

$ git add .
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       modified:   README.txt
#       modified:   benchmarks.rb
#

Right below the "Changes to be committed" text, it says to use git reset HEAD <file>... to unstage. So, let's use that advice to unstage the benchmarks.rb file:

$ git reset HEAD benchmarks.rb 
benchmarks.rb: locally modified
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       modified:   README.txt
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   benchmarks.rb
#

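The command is a bit strange, but it works. The benchmarks.rb file is modified but once again unstaged.

Unmodifying a Modified File

What if you realize that you don't want to keep your changes to the benchmarks.rb file? How can you easily revert it back to what it looked like when you last committed (or initially cloned, or however you got it into your working directory)? Luckily, git status tells you how to do that, too. In the last example output, the unstaged area looks like this: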

# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   benchmarks.rb
#

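It tells you pretty explicitly how to discard the changes you've made (at least, Git 1.6.1 and later do this; if you have an older version, we highly recommend upgrading it to get some of these nicer usability features). Let's do what it says: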

$ git checkout -- benchmarks.rb
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       modified:   README.txt
#

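You can see that the changes have been reverted. You should also realize that this is a dangerous command: any changes you made to that file are gone, because you just copied another file over it. Don't ever use this command unless you are absolutely sure that you don't want those changes. If you just need to get them out of the way, we'll go over stashing and branching in the next chapter; these are generally better ways to go.

Remember, anything that is committed in Git can almost always be recovered. Even commits that were on branches that were deleted or commits that were overwritten with an --amend commit can be recovered (see Chapter 9 for data recovery). However, anything you lose that was never committed is likely never to be seen again.

Working with Remotes

To be able to collaborate on any Git project, you need to know how to manage your remote repositories. Remote repositories are versions of your project that are hosted on the Internet or on a network somewhere. You can have several of them, each of which generally is either read-only or read/write for you. Collaborating with others involves managing these remote repositories and pushing and pulling data to and from them when you need to share work. Managing remote repositories includes knowing how to add remote repositories, remove remotes that are no longer valid, manage various remote branches and define them as being tracked or not, and more. In this section, we'll cover these remote-management skills.

Showing Your Remotes

To see which remote servers you have configured, you can run the git remote command. It lists the shortnames of each remote handle you've specified. If you've cloned your repository, you should at least see origin, which is the default name Git gives to the server you cloned from: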

$ git clone git://github.com/schacon/ticgit.git
Initialized empty Git repository in /private/tmp/ticgit/.git/
remote: Counting objects: 595, done.
remote: Compressing objects: 100% (269/269), done.
remote: Total 595 (delta 255), reused 589 (delta 253)
Receiving objects: 100% (595/595), 73.31 KiB | 1 KiB/s, done.
Resolving deltas: 100% (255/255), done.
$ cd ticgit
$ git remote 
origin

You can also specify -v, which shows you the URL that Git has stored for each shortname:

$ git remote -v
origin  git://github.com/schacon/ticgit.git

If you have more than one remote, the command lists them all. For example, my Grit repository looks something like this:

$ cd grit
$ git remote -v
bakkdoor  git://github.com/bakkdoor/grit.git
cho45     git://github.com/cho45/grit.git
defunkt   git://github.com/defunkt/grit.git
koke      git://github.com/koke/grit.git
origin    git@github.com:mojombo/grit.git

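This means we can pull contributions from any of these users pretty easily. But notice that only the origin remote is an SSH URL, so it's the only one I can push to (we'll cover why this is in Chapter 4).

Adding Remote Repositories

I've mentioned and given some demonstrations of adding remote repositories in previous sections, but here is how to do it explicitly. To add a new remote Git repository under a shortname you can reference easily, run git remote add [shortname] [url]: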

$ git remote
origin
$ git remote add pb git://github.com/paulboone/ticgit.git
$ git remote -v
origin  git://github.com/schacon/ticgit.git
pb  git://github.com/paulboone/ticgit.git

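Now you can use the string pb on the command line in lieu of the whole URL. For example, if you want to fetch all the information that Paul has but that you don't yet have in your repository, you can run git fetch pb: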

$ git fetch pb
remote: Counting objects: 58, done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 44 (delta 24), reused 1 (delta 0)
Unpacking objects: 100% (44/44), done.
From git://github.com/paulboone/ticgit
 * [new branch]      master     -> pb/master
 * [new branch]      ticgit     -> pb/ticgit

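Paul's master branch is now accessible locally as pb/master. You can merge it into one of your branches, or you can check out a local branch at that point if you want to inspect it.

Fetching and Pulling from Your Remotes

As you just saw, to get data from your remote projects, you can run: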

$ git fetch [remote-name]

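The command goes out to that remote project and pulls down all the data from that remote project that you don't have yet. After you do this, you should have references to all the branches from that remote, which you can merge in or inspect at any time. (We'll go over what branches are and how to use them in much more detail in Chapter 3.)

If you clone a repository, the command automatically adds that remote repository under the name origin. So, git fetch origin fetches any new work that has been pushed to that server since you cloned (or last fetched from) it. It's important to note that the fetch command only pulls the data to your local repository; it doesn't automatically merge it with any of your work or modify what you're currently working on. You have to merge it manually into your work when you're ready.

If you have a branch set up to track a remote branch (see the next section and Chapter 3 for more information), you can use the git pull command to automatically fetch and then merge a remote branch into your current branch. This may be an easier or more comfortable workflow for you; and by default, the git clone command automatically sets up your local master branch to track the remote master branch on the server you cloned from (assuming the remote has a master branch). Running git pull generally fetches data from the server you originally cloned from and automatically tries to merge it into the code you're currently working on.

Pushing to Your Remotes

When you have your project at a point that you want to share, you have to push it upstream. The command for this is simple: git push [remote-name] [branch-name]. If you want to push your master branch to your origin server (again, cloning generally sets up both of those names for you automatically), then you can run this to push your work back up to the server: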

$ git push origin master

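This command works only if you cloned from a server to which you have write access and if nobody has pushed in the meantime. If you and someone else clone at the same time and they push upstream and then you push upstream, your push will rightly be rejected. You'll have to pull down their work first and incorporate it into yours before you'll be allowed to push. See Chapter 3 for more detailed information on how to push to remote servers.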
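If you don't have a server to practice on, a bare repository on your own disk can stand in for one. The following sketch is an assumption-laden illustration rather than a recommended setup: the directory names server.git and alice, the file contents, and the demo identity are all invented, and the branch name is looked up instead of hard-coded because the default branch depends on how Git is configured.

```shell
base=$(mktemp -d)
cd "$base"
# Identity used for this demo only (hypothetical values).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare server.git                # stands in for the remote server
git clone -q server.git alice 2>/dev/null    # cloning an empty repository prints a warning
cd alice

echo 'hello' > README
git add README
git commit -q -m 'first commit'

branch=$(git symbolic-ref --short HEAD)      # name of the current branch
git push -q origin "$branch"                 # upload that branch to the remote named origin

remote_name=$(git remote)
local_id=$(git rev-parse HEAD)
pushed_id=$(git --git-dir="$base/server.git" rev-parse "$branch")

echo "remote: $remote_name"
echo "local:  $local_id"
echo "server: $pushed_id"
```

After the push, the branch in the bare repository points at the same commit as your local branch, and origin is the only remote listed because it was created by the clone.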

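Inspecting a Remote

If you want to see more information about a particular remote, you can use the git remote show [remote-name] command. If you run this command with a particular shortname, such as origin, you get something like this: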

$ git remote show origin
* remote origin
  URL: git://github.com/schacon/ticgit.git
  Remote branch merged with 'git pull' while on branch master
    master
  Tracked remote branches
    master
    ticgit

It lists the URL for the remote repository as well as the tracking branch information. The command helpfully tells you that if you’re on the master branch and you run git pull, it will automatically merge in the master branch on the remote after it fetches all the remote references. It also lists all the remote references it has pulled down.

That is a simple example you’re likely to encounter. When you’re using Git more heavily, however, you may see much more information from git remote show:

$ git remote show origin
* remote origin
  URL: git@github.com:defunkt/github.git
  Remote branch merged with 'git pull' while on branch issues
    issues
  Remote branch merged with 'git pull' while on branch master
    master
  New remote branches (next fetch will store in remotes/origin)
    caching
  Stale tracking branches (use 'git remote prune')
    libwalker
    walker2
  Tracked remote branches
    acl
    apiv2
    dashboard2
    issues
    master
    postgres
  Local branch pushed with 'git push'
    master:master

This command shows which branch is automatically pushed when you run git push on certain branches. It also shows you which remote branches on the server you don’t yet have, which remote branches you have that have been removed from the server, and multiple branches that are automatically merged when you run git pull.

Removing and Renaming Remotes

If you want to rename a reference, in newer versions of Git you can run git remote rename to change a remote’s shortname. For instance, if you want to rename pb to paul, you can do so with git remote rename:

$ git remote rename pb paul
$ git remote
origin
paul

It’s worth mentioning that this changes your remote branch names, too. What used to be referenced at pb/master is now at paul/master.

If you want to remove a reference for some reason — you’ve moved the server or are no longer using a particular mirror, or perhaps a contributor isn’t contributing anymore — you can use git remote rm:

$ git remote rm paul
$ git remote
origin
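To watch the remote branch names follow a rename, you can try the commands on a throwaway repository. This is a small sketch under stated assumptions: the repositories upstream and mine, the temporary directory, and the placeholder identity are hypothetical; only the remote names pb and paul come from the text above.

```shell
set -e
# Placeholder identity (an assumption) so commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work"

# A repository that plays the role of the remote.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo hello > README
git add README
git commit -q -m 'initial commit'

# A second repository that adds it as a remote named pb.
cd "$work"
git init -q mine
cd mine
git remote add pb "$work/upstream"
git fetch -q pb
echo "before rename: $(git branch -r)"    # pb/master

git remote rename pb paul
renamed=$(git branch -r)
echo "after rename:  $renamed"            # paul/master

git remote rm paul
echo "remotes left:  $(git remote)"       # none
```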

Tagging

Like most VCSs, Git has the ability to tag specific points in history as being important. Generally, people use this functionality to mark release points (v1.0, and so on). In this section, you’ll learn how to list the available tags, how to create new tags, and what the different types of tags are.

Listing Your Tags

Listing the available tags in Git is straightforward. Just type git tag:

$ git tag
v0.1
v1.3

This command lists the tags in alphabetical order; the order in which they appear has no real importance.

You can also search for tags with a particular pattern. The Git source repo, for instance, contains more than 240 tags. If you’re only interested in looking at the 1.4.2 series, you can run this:

$ git tag -l 'v1.4.2.*'
v1.4.2.1
v1.4.2.2
v1.4.2.3
v1.4.2.4

Creating Tags

Git uses two main types of tags: lightweight and annotated. A lightweight tag is very much like a branch that doesn’t change — it’s just a pointer to a specific commit. Annotated tags, however, are stored as full objects in the Git database. They’re checksummed; contain the tagger name, e-mail, and date; have a tagging message; and can be signed and verified with GNU Privacy Guard (GPG). It’s generally recommended that you create annotated tags so you can have all this information; but if you want a temporary tag or for some reason don’t want to keep the other information, lightweight tags are available too.

Annotated Tags

Creating an annotated tag in Git is simple. The easiest way is to specify -a when you run the tag command:

$ git tag -a v1.4 -m 'my version 1.4'
$ git tag
v0.1
v1.3
v1.4

The -m specifies a tagging message, which is stored with the tag. If you don’t specify a message for an annotated tag, Git launches your editor so you can type it in.

You can see the tag data along with the commit that was tagged by using the git show command:

$ git show v1.4
tag v1.4
Tagger: Scott Chacon <schacon@gee-mail.com>
Date:   Mon Feb 9 14:45:11 2009 -0800

my version 1.4
commit 15027957951b64cf874c3557a0f3547bd83b3ff6
Merge: 4a447f7... a6b4c97...
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Sun Feb 8 19:02:46 2009 -0800

    Merge branch 'experiment'

That shows the tagger information, the date the commit was tagged, and the annotation message before showing the commit information.

Signed Tags

You can also sign your tags with GPG, assuming you have a private key. All you have to do is use -s instead of -a:

$ git tag -s v1.5 -m 'my signed 1.5 tag'
You need a passphrase to unlock the secret key for
user: "Scott Chacon <schacon@gee-mail.com>"
1024-bit DSA key, ID F721C45A, created 2009-02-09

If you run git show on that tag, you can see your GPG signature attached to it:

$ git show v1.5
tag v1.5
Tagger: Scott Chacon <schacon@gee-mail.com>
Date:   Mon Feb 9 15:22:20 2009 -0800

my signed 1.5 tag
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEABECAAYFAkmQurIACgkQON3DxfchxFr5cACeIMN+ZxLKggJQf0QYiQBwgySN
Ki0An2JeAVUCAiJ7Ox6ZEtK+NvZAj82/
=WryJ
-----END PGP SIGNATURE-----
commit 15027957951b64cf874c3557a0f3547bd83b3ff6
Merge: 4a447f7... a6b4c97...
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Sun Feb 8 19:02:46 2009 -0800

    Merge branch 'experiment'

A bit later, you’ll learn how to verify signed tags.

Lightweight Tags

Another way to tag commits is with a lightweight tag. This is basically the commit checksum stored in a file — no other information is kept. To create a lightweight tag, don’t supply the -a, -s, or -m option:

$ git tag v1.4-lw
$ git tag
v0.1
v1.3
v1.4
v1.4-lw
v1.5

This time, if you run git show on the tag, you don’t see the extra tag information. The command just shows the commit:

$ git show v1.4-lw
commit 15027957951b64cf874c3557a0f3547bd83b3ff6
Merge: 4a447f7... a6b4c97...
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Sun Feb 8 19:02:46 2009 -0800

    Merge branch 'experiment'
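One way to see the difference between the two tag types is to ask Git what kind of object each tag name refers to. The sketch below assumes a scratch repository in a temporary directory and a placeholder identity; the tag names mirror the ones used above.

```shell
set -e
# Placeholder identity (an assumption) so commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo hello > README
git add README
git commit -q -m 'initial commit'

git tag -a v1.4 -m 'my version 1.4'   # annotated tag
git tag v1.4-lw                       # lightweight tag

git cat-file -t v1.4       # prints "tag": a full object in the Git database
git cat-file -t v1.4-lw    # prints "commit": just a pointer to the commit

# Both names still lead to the same commit.
git rev-parse 'v1.4^{commit}' v1.4-lw
```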

Verifying Tags

To verify a signed tag, you use git tag -v [tag-name]. This command uses GPG to verify the signature. You need the signer’s public key in your keyring for this to work properly:

$ git tag -v v1.4.2.1
object 883653babd8ee7ea23e6a5c392bb739348b1eb61
type commit
tag v1.4.2.1
tagger Junio C Hamano <junkio@cox.net> 1158138501 -0700

GIT 1.4.2.1

Minor fixes since 1.4.2, including git-mv and git-http with alternates.
gpg: Signature made Wed Sep 13 02:08:25 2006 PDT using DSA key ID F3119B9A
gpg: Good signature from "Junio C Hamano <junkio@cox.net>"
gpg:                 aka "[jpeg image of size 1513]"
Primary key fingerprint: 3565 2A26 2040 E066 C9A7  4A7D C0C6 D9A4 F311 9B9A

If you don’t have the signer’s public key, you get something like this instead:

gpg: Signature made Wed Sep 13 02:08:25 2006 PDT using DSA key ID F3119B9A
gpg: Can't check signature: public key not found
error: could not verify the tag 'v1.4.2.1'

Tagging Later

You can also tag commits after you’ve moved past them. Suppose your commit history looks like this:

$ git log --pretty=oneline
15027957951b64cf874c3557a0f3547bd83b3ff6 Merge branch 'experiment'
a6b4c97498bd301d84096da251c98a07c7723e65 beginning write support
0d52aaab4479697da7686c15f77a3d64d9165190 one more thing
6d52a271eda8725415634dd79daabbc4d9b6008e Merge branch 'experiment'
0b7434d86859cc7b8c3d5e1dddfed66ff742fcbc added a commit function
4682c3261057305bdd616e23b64b0857d832627b added a todo file
166ae0c4d3f420721acbb115cc33848dfcc2121a started write support
9fceb02d0ae598e95dc970b74767f19372d61af8 updated rakefile
964f16d36dfccde844893cac5b347e7b3d44abbc commit the todo
8a5cbc430f1a9c3d00faaeffd07798508422908a updated readme

Now, suppose you forgot to tag the project at v1.2, which was at the "updated rakefile" commit. You can add it after the fact. To tag that commit, you specify the commit checksum (or part of it) at the end of the command:

$ git tag -a v1.2 9fceb02

You can see that you’ve tagged the commit:

$ git tag 
v0.1
v1.2
v1.3
v1.4
v1.4-lw
v1.5

$ git show v1.2
tag v1.2
Tagger: Scott Chacon <schacon@gee-mail.com>
Date:   Mon Feb 9 15:32:16 2009 -0800

version 1.2
commit 9fceb02d0ae598e95dc970b74767f19372d61af8
Author: Magnus Chacon <mchacon@gee-mail.com>
Date:   Sun Apr 27 20:43:35 2008 -0700

    updated rakefile
...

Sharing Tags

By default, the git push command doesn’t transfer tags to remote servers. You will have to explicitly push tags to a shared server after you have created them. This process is just like sharing remote branches: you can run git push origin [tagname].

$ git push origin v1.5
Counting objects: 50, done.
Compressing objects: 100% (38/38), done.
Writing objects: 100% (44/44), 4.56 KiB, done.
Total 44 (delta 18), reused 8 (delta 1)
To git@github.com:schacon/simplegit.git
 * [new tag]         v1.5 -> v1.5

If you have a lot of tags that you want to push up at once, you can also use the --tags option to the git push command. This will transfer all of your tags to the remote server that are not already there.

$ git push origin --tags
Counting objects: 50, done.
Compressing objects: 100% (38/38), done.
Writing objects: 100% (44/44), 4.56 KiB, done.
Total 44 (delta 18), reused 8 (delta 1)
To git@github.com:schacon/simplegit.git
 * [new tag]         v0.1 -> v0.1
 * [new tag]         v1.2 -> v1.2
 * [new tag]         v1.4 -> v1.4
 * [new tag]         v1.4-lw -> v1.4-lw
 * [new tag]         v1.5 -> v1.5

Now, when someone else clones or pulls from your repository, they will get all your tags as well.
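The behavior described in this section can be checked with git ls-remote, which lists the references a remote currently has. This is a sketch under assumptions: a local bare repository (server.git) stands in for the shared server, and the directory names and placeholder identity are hypothetical.

```shell
set -e
# Placeholder identity (an assumption) so commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work"
git init -q --bare server.git
git init -q project
cd project
git symbolic-ref HEAD refs/heads/master
echo hello > README
git add README
git commit -q -m 'initial commit'
git remote add origin "$work/server.git"
git tag v1.4
git tag v1.5

# Pushing the branch does not send the tags.
git push -q origin master
before=$(git ls-remote --tags origin)
echo "tags after branch push: ${before:-none}"

# Push one tag by name.
git push -q origin v1.5
after_one=$(git ls-remote --tags origin)
echo "$after_one"

# Push every tag the server does not have yet.
git push -q origin --tags
git ls-remote --tags origin
```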

Tips and Tricks

Before we finish this chapter on basic Git, a few little tips and tricks may make your Git experience a bit simpler, easier, or more familiar. Many people use Git without using any of these tips, and we won’t refer to them or assume you’ve used them later in the book; but you should probably know how to do them.

Auto-Completion

If you use the Bash shell, Git comes with a nice auto-completion script you can enable. Download the Git source code, and look in the contrib/completion directory; there should be a file called git-completion.bash. Copy this file to your home directory as .git-completion.bash, and add this to your .bashrc file:

source ~/.git-completion.bash

If you want to set up Git to automatically have Bash shell completion for all users, copy this script to the /opt/local/etc/bash_completion.d directory on Mac systems or to the /etc/bash_completion.d/ directory on Linux systems. This is a directory of scripts that Bash will automatically load to provide shell completions.

If you’re using Windows with Git Bash, which is the default when installing Git on Windows with msysGit, auto-completion should be preconfigured.

Press the Tab key when you’re writing a Git command, and it should return a set of suggestions for you to pick from:

$ git co<tab><tab>
commit config

In this case, typing git co and then pressing the Tab key twice suggests commit and config. Adding m<tab> completes git commit automatically.

This also works with options, which is probably more useful. For instance, if you’re running a git log command and can’t remember one of the options, you can start typing it and press Tab to see what matches:

$ git log --s<tab>
--shortstat  --since=  --src-prefix=  --stat   --summary

That’s a pretty nice trick and may save you some time and documentation reading.

Git Aliases

Git doesn’t infer your command if you type it in partially. If you don’t want to type the entire text of each of the Git commands, you can easily set up an alias for each command using git config. Here are a couple of examples you may want to set up:

$ git config --global alias.co checkout
$ git config --global alias.br branch
$ git config --global alias.ci commit
$ git config --global alias.st status

This means that, for example, instead of typing git commit, you just need to type git ci. As you go on using Git, you’ll probably use other commands frequently as well; in this case, don’t hesitate to create new aliases.

This technique can also be very useful in creating commands that you think should exist. For example, to correct the usability problem you encountered with unstaging a file, you can add your own unstage alias to Git:

$ git config --global alias.unstage 'reset HEAD --'

This makes the following two commands equivalent:

$ git unstage fileA
$ git reset HEAD -- fileA

This seems a bit clearer. It’s also common to add a last command, like this:

$ git config --global alias.last 'log -1 HEAD'

This way, you can see the last commit easily:

$ git last
commit 66938dae3329c7aebe598c2246a8e6af90d04646
Author: Josh Goebel <dreamer3@example.com>
Date:   Tue Aug 26 19:48:51 2008 +0800

    test for current head

    Signed-off-by: Scott Chacon <schacon@example.com>

As you can tell, Git simply replaces the new command with whatever you alias it for. However, maybe you want to run an external command, rather than a Git subcommand. In that case, you start the command with a ! character. This is useful if you write your own tools that work with a Git repository. We can demonstrate by aliasing git visual to run gitk:

$ git config --global alias.visual "!gitk"
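If you would like to try aliases without touching your global configuration, you can define them per repository by leaving out --global. The sketch below assumes a scratch repository in a temporary directory, a placeholder identity, and a hypothetical file named fileA; the alias names are the ones introduced above.

```shell
set -e
# Placeholder identity (an assumption) so commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo one > fileA
git add fileA
git commit -q -m 'first commit'

# Repository-local aliases (stored in .git/config of this repository only).
git config alias.unstage 'reset HEAD --'
git config alias.last 'log -1 HEAD'

echo two > fileA
git add fileA
git unstage fileA          # Git runs: git reset HEAD -- fileA
git status --short         # fileA is modified but no longer staged

git last --format=%s       # Git runs: git log -1 HEAD --format=%s
```

Anything you type after the alias is appended to the expanded command, which is why git last accepts the extra --format option here.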

Summary

At this point, you can do all the basic local Git operations — creating or cloning a repository, making changes, staging and committing those changes, and viewing the history of all the changes the repository has been through. Next, we’ll cover Git’s killer feature: its branching model.

Git Branching

Nearly every VCS has some form of branching support. Branching means you diverge from the main line of development and continue to do work without messing with that main line. In many VCS tools, this is a somewhat expensive process, often requiring you to create a new copy of your source code directory, which can take a long time for large projects.

Some people refer to the branching model in Git as its “killer feature,” and it certainly sets Git apart in the VCS community. Why is it so special? The way Git branches is incredibly lightweight, making branching operations nearly instantaneous and switching back and forth between branches generally just as fast. Unlike many other VCSs, Git encourages a workflow that branches and merges often, even multiple times in a day. Understanding and mastering this feature gives you a powerful and unique tool and can literally change the way that you develop.

What a Branch Is

To really understand the way Git does branching, we need to take a step back and examine how Git stores its data. As you may remember from Chapter 1, Git doesn’t store data as a series of changesets or deltas, but instead as a series of snapshots.

When you commit in Git, Git stores a commit object that contains a pointer to the snapshot of the content you staged, the author and message metadata, and zero or more pointers to the commit or commits that were the direct parents of this commit: zero parents for the first commit, one parent for a normal commit, and multiple parents for a commit that results from a merge of two or more branches.

To visualize this, let’s assume that you have a directory containing three files, and you stage them all and commit. Staging the files checksums each one (the SHA-1 hash we mentioned in Chapter 1), stores that version of the file in the Git repository (Git refers to them as blobs), and adds that checksum to the staging area:

$ git add README test.rb LICENSE
$ git commit -m 'initial commit of my project'

When you create the commit by running git commit, Git checksums each subdirectory (in this case, just the root project directory) and stores those tree objects in the Git repository. Git then creates a commit object that has the metadata and a pointer to the root project tree so it can re-create that snapshot when needed.

Your Git repository now contains five objects: one blob for the contents of each of your three files, one tree that lists the contents of the directory and specifies which file names are stored as which blobs, and one commit with the pointer to that root tree and all the commit metadata. Conceptually, the data in your Git repository looks something like Figure 3-1.

Figure 3-1. Single commit repository data.
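You can count the five objects yourself. The following sketch assumes a fresh scratch repository, a placeholder identity, and three files with made-up contents (the contents must differ, because identical files would share one blob).

```shell
set -e
# Placeholder identity (an assumption) so commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'readme text' > README
echo 'puts "hi"' > test.rb
echo 'license text' > LICENSE
git add README test.rb LICENSE
git commit -q -m 'initial commit of my project'

git count-objects               # reports 5 objects: 3 blobs, 1 tree, 1 commit
git cat-file -p HEAD            # the commit: tree pointer, author, committer, message
git cat-file -p 'HEAD^{tree}'   # the tree: one line per file, naming its blob
```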

If you make some changes and commit again, the next commit stores a pointer to the commit that came immediately before it. After two more commits, your history might look something like Figure 3-2.

Figure 3-2. Git object data for multiple commits.

A branch in Git is simply a lightweight movable pointer to one of these commits. The default branch name in Git is master. As you initially make commits, you’re given a master branch that points to the last commit you made. Every time you commit, it moves forward automatically.

Figure 3-3. Branch pointing into the commit data’s history.

What happens if you create a new branch? Well, doing so creates a new pointer for you to move around. Let’s say you create a new branch called testing. You do this with the git branch command:

$ git branch testing

This creates a new pointer at the same commit you’re currently on (see Figure 3-4).

Figure 3-4. Multiple branches pointing into the commit’s data history.

How does Git know what branch you’re currently on? It keeps a special pointer called HEAD. Note that this is quite different from the concept of HEAD in other VCSs you may be used to, such as Subversion or CVS. In Git, this is a pointer to the local branch you’re currently on. In this case, you’re still on master. The git branch command only created a new branch; it didn’t switch to that branch (see Figure 3-5).

Figure 3-5. HEAD file pointing to the branch you’re on.

To switch to an existing branch, you run the git checkout command. Let’s switch to the new testing branch:

$ git checkout testing

This moves HEAD to point to the testing branch (see Figure 3-6).

Figure 3-6. HEAD points to another branch when you switch branches.

What is the significance of that? Well, let’s do another commit:

$ vim test.rb
$ git commit -a -m 'made a change'

Figure 3-7 illustrates the result.

Figure 3-7. The branch that HEAD points to moves forward with each commit.

This is interesting, because now your testing branch has moved forward, but your master branch still points to the commit you were on when you ran git checkout to switch branches. Let’s switch back to the master branch:

$ git checkout master

Figure 3-8 shows the result.

Figure 3-8. HEAD moves to another branch on a checkout.

That command did two things. It moved the HEAD pointer back to point to the master branch, and it reverted the files in your working directory back to the snapshot that master points to. This also means the changes you make from this point forward will diverge from an older version of the project. It essentially rewinds the work you’ve done in your testing branch temporarily so you can go in a different direction.

Let’s make a few changes and commit again:

$ vim test.rb
$ git commit -a -m 'made other changes'

Now your project history has diverged (see Figure 3-9). You created and switched to a branch, did some work on it, and then switched back to your main branch and did other work. Both of those changes are isolated in separate branches: you can switch back and forth between the branches and merge them together when you’re ready. And you did all that with simple branch and checkout commands.

Figure 3-9. The branch histories have diverged.
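The divergence can be inspected from the command line as well. This sketch replays the steps above in a scratch repository; the file contents and placeholder identity are assumptions, and echo stands in for editing the file in vim.

```shell
set -e
# Placeholder identity (an assumption) so commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo v1 > test.rb
git add test.rb
git commit -q -m 'initial commit'

git branch testing
git checkout -q testing
echo v2 > test.rb
git commit -q -a -m 'made a change'

git checkout -q master
echo v3 > test.rb
git commit -q -a -m 'made other changes'

git log --oneline --graph --all        # two branch tips that share one parent
git rev-list --count master..testing   # commits reachable only from testing
git rev-list --count testing..master   # commits reachable only from master
git merge-base master testing          # the commit where the histories split
```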

Because a branch in Git is in actuality a simple file that contains the 40-character SHA-1 checksum of the commit it points to, branches are cheap to create and destroy. Creating a new branch is as quick and simple as writing 41 bytes to a file (40 characters and a newline).
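You can look at these files directly. The sketch below assumes a repository that uses SHA-1 object names and stores references as plain files under .git/refs (newer Git versions offer other object formats and reference backends, and Git may later pack references into a single file); the scratch repository and placeholder identity are also assumptions.

```shell
set -e
# Placeholder identity (an assumption) so commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo hello > README
git add README
git commit -q -m 'initial commit'

git branch testing
cat .git/HEAD                      # ref: refs/heads/master
cat .git/refs/heads/testing        # the checksum of the commit testing points to
wc -c < .git/refs/heads/testing    # 41 bytes: 40 characters plus a newline

git checkout -q testing
cat .git/HEAD                      # ref: refs/heads/testing
```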

This is in sharp contrast to the way most VCS tools branch, which involves copying all of the project’s files into a second directory. This can take several seconds or even minutes, depending on the size of the project, whereas in Git the process is always instantaneous. Also, because we’re recording the parents when we commit, finding a proper merge base for merging is automatically done for us and is generally very easy to do. These features help encourage developers to create and use branches often.

Let’s see why you should do so.

Basic Branching and Merging

Let’s go through a simple example of branching and merging with a workflow that you might use in the real world. You’ll follow these steps:

  1. Do work on a web site.
  2. Create a branch for a new story you’re working on.
  3. Do some work in that branch.

At this stage, you’ll receive a call that another issue is critical and you need a hotfix. You’ll do the following:

  1. Switch back to your production branch.
  2. Create a branch to add the hotfix.
  3. After it’s tested, merge the hotfix branch, and push to production.
  4. Switch back to your original story and continue working.

Basic Branching

First, let’s say you’re working on your project and have a couple of commits already (see Figure 3-10).

Figure 3-10. A short and simple commit history.

You’ve decided that you’re going to work on issue #53 in whatever issue-tracking system your company uses. To be clear, Git isn’t tied into any particular issue-tracking system; but because issue #53 is a focused topic that you want to work on, you’ll create a new branch in which to work. To create a branch and switch to it at the same time, you can run the git checkout command with the -b switch:

$ git checkout -b iss53
Switched to a new branch "iss53"

This is shorthand for:

$ git branch iss53
$ git checkout iss53

Figure 3-11 illustrates the result.

Figure 3-11. Creating a new branch pointer.

You work on your web site and do some commits. Doing so moves the iss53 branch forward, because you have it checked out (that is, your HEAD is pointing to it; see Figure 3-12):

$ vim index.html
$ git commit -a -m 'added a new footer [issue 53]'

Figure 3-12. The iss53 branch has moved forward with your work.

Now you get the call that there is an issue with the web site, and you need to fix it immediately. With Git, you don’t have to deploy your fix along with the iss53 changes you’ve made, and you don’t have to put a lot of effort into reverting those changes before you can work on applying your fix to what is in production. All you have to do is switch back to your master branch.

However, before you do that, note that if your working directory or staging area has uncommitted changes that conflict with the branch you’re checking out, Git won’t let you switch branches. It’s best to have a clean working state when you switch branches. There are ways to get around this (namely, stashing and commit amending) that we’ll cover later. For now, you’ve committed all your changes, so you can switch back to your master branch:

$ git checkout master
Switched to branch "master"

At this point, your project working directory is exactly the way it was before you started working on issue #53, and you can concentrate on your hotfix. This is an important point to remember: Git resets your working directory to look like the snapshot of the commit that the branch you check out points to. It adds, removes, and modifies files automatically to make sure your working copy is what the branch looked like on your last commit to it.

Next, you have a hotfix to make. Let’s create a hotfix branch on which to work until it’s completed (see Figure 3-13):

$ git checkout -b 'hotfix'
Switched to a new branch "hotfix"
$ vim index.html
$ git commit -a -m 'fixed the broken email address'
[hotfix]: created 3a0874c: "fixed the broken email address"
 1 files changed, 0 insertions(+), 1 deletions(-)

Figure 3-13. hotfix branch based back at your master branch point.

You can run your tests, make sure the hotfix is what you want, and merge it back into your master branch to deploy to production. You do this with the git merge command:

$ git checkout master
$ git merge hotfix
Updating f42c576..3a0874c
Fast forward
 index.html |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

You’ll notice the phrase "Fast forward" in that merge. Because the commit pointed to by the branch you merged in was directly upstream of the commit you’re on, Git moves the pointer forward. To phrase that another way, when you try to merge one commit with a commit that can be reached by following the first commit’s history, Git simplifies things by moving the pointer forward because there is no divergent work to merge together — this is called a "fast forward".

Your change is now in the snapshot of the commit pointed to by the master branch, and you can deploy your change (see Figure 3-14).

Figure 3-14. Your master branch points to the same place as your hotfix branch after the merge.

After your super-important fix is deployed, you’re ready to switch back to the work you were doing before you were interrupted. However, first you’ll delete the hotfix branch, because you no longer need it — the master branch points at the same place. You can delete it with the -d option to git branch:

$ git branch -d hotfix
Deleted branch hotfix (3a0874c).

Now you can switch back to your work-in-progress branch on issue #53 and continue working on it (see Figure 3-15):

$ git checkout iss53
Switched to branch "iss53"
$ vim index.html
$ git commit -a -m 'finished the new footer [issue 53]'
[iss53]: created ad82d7a: "finished the new footer [issue 53]"
 1 files changed, 1 insertions(+), 0 deletions(-)

Figure 3-15. Your iss53 branch can move forward independently.

It’s worth noting here that the work you did in your hotfix branch is not contained in the files in your iss53 branch. If you need to pull it in, you can merge your master branch into your iss53 branch by running git merge master, or you can wait to integrate those changes until you decide to pull the iss53 branch back into master later.

Basic Merging

Suppose you’ve decided that your issue #53 work is complete and ready to be merged into your master branch. In order to do that, you’ll merge in your iss53 branch, much like you merged in your hotfix branch earlier. All you have to do is check out the branch you wish to merge into and then run the git merge command:

$ git checkout master
$ git merge iss53
Merge made by recursive.
 index.html |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

This looks a bit different from the hotfix merge you did earlier. In this case, your development history has diverged from some older point. Because the commit on the branch you’re on isn’t a direct ancestor of the branch you’re merging in, Git has to do some work. In this case, Git does a simple three-way merge, using the two snapshots pointed to by the branch tips and the common ancestor of the two. Figure 3-16 highlights the three snapshots that Git uses to do its merge in this case.

Figure 3-16. Git automatically identifies the best common-ancestor merge base for branch merging.

Instead of just moving the branch pointer forward, Git creates a new snapshot that results from this three-way merge and automatically creates a new commit that points to it (see Figure 3-17). This is referred to as a merge commit and is special in that it has more than one parent.

It’s worth pointing out that Git determines the best common ancestor to use for its merge base; this is different from CVS or Subversion (before version 1.5), where the developer doing the merge has to figure out the best merge base for themselves. This makes merging a heck of a lot easier in Git than in these other systems.

Figure 3-17. Git automatically creates a new commit object that contains the merged work.
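The two kinds of merge described in this section can be told apart by looking at parent pointers. The following sketch rebuilds a similar history in a scratch repository; the extra file names (footer.html, contact.html) and the placeholder identity are assumptions, chosen so that the three-way merge has no conflicts.

```shell
set -e
# Placeholder identity (an assumption) so commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo home > index.html
git add index.html
git commit -q -m 'initial commit'

git checkout -q -b iss53
echo footer > footer.html
git add footer.html
git commit -q -m 'added a new footer [issue 53]'

git checkout -q master
git checkout -q -b hotfix
echo fixed > contact.html
git add contact.html
git commit -q -m 'fixed the broken email address'

# master is a direct ancestor of hotfix, so Git only moves the pointer forward.
git checkout -q master
git merge -q hotfix
git rev-parse master hotfix      # prints the same checksum twice

# master and iss53 have diverged, so Git creates a merge commit.
git merge -q --no-edit iss53
git cat-file -p HEAD | grep '^parent '   # two parent lines
```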

Now that your work is merged in, you have no further need for the iss53 branch. You can delete it and then manually close the ticket in your ticket-tracking system:

$ git branch -d iss53

Basic Merge Conflicts

Occasionally, this process doesn’t go smoothly. If you changed the same part of the same file differently in the two branches you’re merging together, Git won’t be able to merge them cleanly. If your fix for issue #53 modified the same part of a file as the hotfix, you’ll get a merge conflict that looks something like this:

$ git merge iss53
Auto-merging index.html
CONFLICT (content): Merge conflict in index.html
Automatic merge failed; fix conflicts and then commit the result.

Git hasn’t automatically created a new merge commit. It has paused the process while you resolve the conflict. If you want to see which files are unmerged at any point after a merge conflict, you can run git status:

$ git status
index.html: needs merge
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   unmerged:   index.html
#

Anything that has merge conflicts and hasn’t been resolved is listed as unmerged. Git adds standard conflict-resolution markers to the files that have conflicts, so you can open them manually and resolve those conflicts. Your file contains a section that looks something like this:

<<<<<<< HEAD:index.html
<div id="footer">contact : email.support@github.com</div>
=======
<div id="footer">
  please contact us at support@github.com
</div>
>>>>>>> iss53:index.html

This means the version in HEAD (your master branch, because that was what you had checked out when you ran your merge command) is the top part of that block (everything above the =======), while the version in your iss53 branch looks like everything in the bottom part. In order to resolve the conflict, you have to either choose one side or the other or merge the contents yourself. For instance, you might resolve this conflict by replacing the entire block with this:

<div id="footer">
please contact us at email.support@github.com
</div>

This resolution has a little of each section, and I’ve fully removed the <<<<<<<, =======, and >>>>>>> lines. After you’ve resolved each of these sections in each conflicted file, run git add on each file; staging a file is how you mark it as resolved in Git. If you want to use a graphical tool to resolve these issues, you can run git mergetool, which fires up an appropriate visual merge tool and walks you through the conflicts:

$ git mergetool
merge tool candidates: kdiff3 tkdiff xxdiff meld gvimdiff opendiff emerge vimdiff
Merging the files: index.html

Normal merge conflict for 'index.html':
  {local}: modified
  {remote}: modified
Hit return to start merge resolution tool (opendiff):

If you want to use a merge tool other than the default (Git chose opendiff for me in this case because I ran the command on a Mac), you can see all the supported tools listed at the top after “merge tool candidates”. Type the name of the tool you’d rather use. In Chapter 7, we’ll discuss how you can change this default value for your environment.

After you exit the merge tool, Git asks you if the merge was successful. If you tell the script that it was, it stages the file to mark it as resolved for you.

You can run git status again to verify that all conflicts have been resolved:

$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   index.html
#

If you’re happy with that, and you verify that everything that had conflicts has been staged, you can type git commit to finalize the merge commit. The commit message by default looks something like this:

Merge branch 'iss53'

Conflicts:
  index.html
#
# It looks like you may be committing a MERGE.
# If this is not correct, please remove the file
# .git/MERGE_HEAD
# and try again.
#

You can modify that message with details about how you resolved the merge if you think it would be helpful to others looking at this merge in the future — why you did what you did, if it’s not obvious.
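The whole conflict cycle (merge stops, you edit, you stage, you commit) can be rehearsed in a scratch repository. This is a sketch under assumptions: the footer text, the example.com addresses, and the placeholder identity are made up, and overwriting the file with printf stands in for resolving the conflict in an editor.

```shell
set -e
# Placeholder identity (an assumption) so commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
printf '<div id="footer">contact us</div>\n' > index.html
git add index.html
git commit -q -m 'initial commit'

git checkout -q -b iss53
printf '<div id="footer">please contact us at support@example.com</div>\n' > index.html
git commit -q -a -m 'reworded footer [issue 53]'

git checkout -q master
printf '<div id="footer">contact : email.support@example.com</div>\n' > index.html
git commit -q -a -m 'fixed the email address'

# Both branches changed the same line, so the merge stops with a conflict.
git merge iss53 >/dev/null 2>&1 || echo "merge stopped with conflicts"
conflicted=$(git diff --name-only --diff-filter=U)
echo "unmerged: $conflicted"
grep -c '^<<<<<<< ' index.html    # counts the conflict blocks in the file

# Resolve (here by overwriting the file), stage it, and conclude the merge.
printf '<div id="footer">please contact us at email.support@example.com</div>\n' > index.html
git add index.html
git commit -q --no-edit
git log --oneline -1
```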

Branch Management

Now that you’ve created, merged, and deleted some branches, let’s look at some branch-management tools that will come in handy when you begin using branches all the time.

The git branch command does more than just create and delete branches. If you run it with no arguments, you get a simple listing of your current branches:

$ git branch
  iss53
* master
  testing

Notice the * character that prefixes the master branch: it indicates the branch that you currently have checked out. This means that if you commit at this point, the master branch will be moved forward with your new work. To see the last commit on each branch, you can run git branch -v:

$ git branch -v
  iss53   93b412c fix javascript issue
* master  7a98805 Merge branch 'iss53'
  testing 782fd34 add scott to the author list in the readmes

Another useful option to figure out what state your branches are in is to filter this list to branches that you have or have not yet merged into the branch you’re currently on. The useful --merged and --no-merged options have been available in Git since version 1.5.6 for this purpose. To see which branches are already merged into the branch you’re on, you can run git branch --merged:

$ git branch --merged
  iss53
* master

Because you already merged in iss53 earlier, you see it in your list. Branches on this list without the * in front of them are generally fine to delete with git branch -d; you’ve already incorporated their work into another branch, so you’re not going to lose anything.

To see all the branches that contain work you haven’t yet merged in, you can run git branch --no-merged:

$ git branch --no-merged
  testing

This shows your other branch. Because it contains work that isn’t merged in yet, trying to delete it with git branch -d will fail:

$ git branch -d testing
error: The branch 'testing' is not an ancestor of your current HEAD.
If you are sure you want to delete it, run 'git branch -D testing'.

If you really do want to delete the branch and lose that work, you can force it with -D, as the helpful message points out.
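If you want to try these options without touching a real project, the following is a minimal sketch that rebuilds a similar situation in a scratch repository. The temporary directory, the "Example Dev" identity, and the use of empty commits are assumptions made only for illustration; the branch names follow the example above.

```shell
set -e
# Throwaway identity so commits work even where Git is not configured (assumption)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .
git commit -q --allow-empty -m "initial commit"

# iss53: one commit, merged back into master
git checkout -q -b iss53
git commit -q --allow-empty -m "fix javascript issue"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'iss53'" iss53

# testing: one commit that is never merged
git checkout -q -b testing
git commit -q --allow-empty -m "add scott to the author list in the readmes"
git checkout -q master

merged=$(git branch --merged)        # lists iss53 and master
unmerged=$(git branch --no-merged)   # lists testing
echo "$merged"
echo "$unmerged"

# -d refuses to delete a branch with unmerged work; -D forces the deletion
if git branch -d testing 2>/dev/null; then safe_delete=ok; else safe_delete=refused; fi
git branch -q -D testing
```

The sketch records whether the safe delete was refused in a shell variable so you can inspect it afterward; in day-to-day use you would simply read Git’s error message.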

Branching Workflows

Now that you have the basics of branching and merging down, what can or should you do with them? In this section, we’ll cover some common workflows that this lightweight branching makes possible, so you can decide if you would like to incorporate it into your own development cycle.

Long-Running Branches

Because Git uses a simple three-way merge, merging from one branch into another multiple times over a long period is generally easy to do. This means you can have several branches that are always open and that you use for different stages of your development cycle; you can merge regularly from some of them into others.

Many Git developers have a workflow that embraces this approach, such as having only code that is entirely stable in their master branch — possibly only code that has been or will be released. They have another parallel branch named develop or next that they work from or use to test stability — it isn’t necessarily always stable, but whenever it gets to a stable state, it can be merged into master. It’s used to pull in topic branches (short-lived branches, like your earlier iss53 branch) when they’re ready, to make sure they pass all the tests and don’t introduce bugs.

In reality, we’re talking about pointers moving up the line of commits you’re making. The stable branches are farther down the line in your commit history, and the bleeding-edge branches are farther up the history (see Figure 3-18).

Figure 3-18. More stable branches are generally farther down the commit history.

It’s generally easier to think about them as work silos, where sets of commits graduate to a more stable silo when they’re fully tested (see Figure 3-19).

Figure 3-19. It may be helpful to think of your branches as silos.

You can keep doing this for several levels of stability. Some larger projects also have a proposed or pu (proposed updates) branch that has integrated branches that may not be ready to go into the next or master branch. The idea is that your branches are at various levels of stability; when they reach a more stable level, they’re merged into the branch above them. Again, having multiple long-running branches isn’t necessary, but it’s often helpful, especially when you’re dealing with very large or complex projects.
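As a rough sketch of how work graduates from one level of stability to the next, the following commands promote a topic branch into develop and then fast-forward master once develop is considered stable. The scratch directory, the identity, and the empty commits are assumptions for illustration; only the branch roles come from the description above.

```shell
set -e
# Throwaway identity and scratch repository (assumptions for this sketch)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .
git commit -q --allow-empty -m "initial release"

# develop starts where master is; topic work is based on develop
git branch develop
git checkout -q -b topic develop
git commit -q --allow-empty -m "topic: new feature"

# When the topic is ready, pull it into develop for testing
git checkout -q develop
git merge -q --no-ff -m "Merge topic into develop" topic
git branch -q -d topic

# master still points at the older, stable commit
if git merge-base --is-ancestor master develop; then master_behind=yes; else master_behind=no; fi

# Once develop is stable, master simply moves forward to it
git checkout -q master
git merge -q --ff-only develop
```

Using --ff-only on master is one possible policy, not a requirement: it makes the promotion fail loudly if master ever diverges from develop.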

Topic Branches

Topic branches, however, are useful in projects of any size. A topic branch is a short-lived branch that you create and use for a single particular feature or related work. This is something you’ve likely never done with a VCS before because it’s generally too expensive to create and merge branches. But in Git it’s common to create, work on, merge, and delete branches several times a day.

You saw this in the last section with the iss53 and hotfix branches you created. You did a few commits on them and deleted them directly after merging them into your main branch. This technique allows you to context-switch quickly and completely — because your work is separated into silos where all the changes in that branch have to do with that topic, it’s easier to see what has happened during code review and such. You can keep the changes there for minutes, days, or months, and merge them in when they’re ready, regardless of the order in which they were created or worked on.

Consider an example of doing some work (on master), branching off for an issue (iss91), working on it for a bit, branching off the second branch to try another way of handling the same thing (iss91v2), going back to your master branch and working there for a while, and then branching off there to do some work that you’re not sure is a good idea (dumbidea branch). Your commit history will look something like Figure 3-20.

Figure 3-20. Your commit history with multiple topic branches.

Now, let’s say you decide you like the second solution to your issue best (iss91v2); and you showed the dumbidea branch to your coworkers, and it turns out to be genius. You can throw away the original iss91 branch (losing commits C5 and C6) and merge in the other two. Your history then looks like Figure 3-21.

Figure 3-21. Your history after merging in dumbidea and iss91v2.

It’s important to remember when you’re doing all this that these branches are completely local. When you’re branching and merging, everything is being done only in your Git repository — no server communication is happening.
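The sequence described above can be sketched in a scratch repository as follows. The commit names and the small commit helper are hypothetical and are not meant to match the labels in the figures; each commit adds its own file so that the merges don’t conflict.

```shell
set -e
# Throwaway identity and scratch repository (assumptions for this sketch)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .

# Hypothetical helper: record one commit that adds a file named after it
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit master-1
git checkout -q -b iss91;    commit iss91-first-try
git checkout -q -b iss91v2;  commit iss91-second-try
git checkout -q iss91;       commit iss91-more-first-try
git checkout -q master;      commit master-2
git checkout -q -b dumbidea; commit dumbidea-1

# Keep the second solution and the "dumb" idea; discard the first attempt
git checkout -q master
git merge -q dumbidea
git merge -q -m "Merge branch 'iss91v2'" iss91v2
git branch -q -D iss91
```

Everything here happens in the local repository; no remote is configured at any point.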

Remote Branches

Remote branches are references to the state of branches on your remote repositories. They’re local branches that you can’t move; they’re moved automatically whenever you do any network communication. Remote branches act as bookmarks to remind you where the branches on your remote repositories were the last time you connected to them.

They take the form (remote)/(branch). For instance, if you wanted to see what the master branch on your origin remote looked like as of the last time you communicated with it, you would check the origin/master branch. If you were working on an issue with a partner and they pushed up an iss53 branch, you might have your own local iss53 branch; but the branch on the server would point to the commit at origin/iss53.

This may be a bit confusing, so let’s look at an example. Let’s say you have a Git server on your network at git.ourcompany.com. If you clone from this, Git automatically names it origin for you, pulls down all its data, creates a pointer to where its master branch is, and names it origin/master locally; and you can’t move it. Git also gives you your own master branch starting at the same place as origin’s master branch, so you have something to work from (see Figure 3-22).

Figure 3-22. A Git clone gives you your own master branch and origin/master pointing to origin’s master branch.

If you do some work on your local master branch, and, in the meantime, someone else pushes to git.ourcompany.com and updates its master branch, then your histories move forward differently. Also, as long as you stay out of contact with your origin server, your origin/master pointer doesn’t move (see Figure 3-23).

Figure 3-23. Working locally and having someone push to your remote server makes each history move forward differently.

To synchronize your work, you run a git fetch origin command. This command looks up which server origin is (in this case, it’s git.ourcompany.com), fetches any data from it that you don’t yet have, and updates your local database, moving your origin/master pointer to its new, more up-to-date position (see Figure 3-24).

Figure 3-24. The git fetch command updates your remote references.
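To watch origin/master stay put until you fetch, here is a small sketch that stands in for git.ourcompany.com with a bare repository on the local disk. The directory names (seed, server.git, you, other) and the identity are assumptions made for illustration.

```shell
set -e
# Throwaway identity and scratch directory (assumptions for this sketch)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"

# A bare repository plays the role of the server
git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed server.git

# Two clones: yours and a collaborator's
git clone -q server.git you
git clone -q server.git other

# The collaborator pushes while you work locally
git -C other commit -q --allow-empty -m "someone else's work"
git -C other push -q origin master
git -C you commit -q --allow-empty -m "your local work"

before=$(git -C you rev-parse origin/master)   # still the old position
git -C you fetch -q origin
after=$(git -C you rev-parse origin/master)    # now matches the server
echo "origin/master moved from $before to $after"
```

Note that the fetch moves only origin/master; your own master branch is left where it was, so the two histories remain diverged until you merge or rebase.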

To demonstrate having multiple remote servers and what remote branches for those remote projects look like, let’s assume you have another internal Git server that is used only for development by one of your sprint teams. This server is at git.team1.ourcompany.com. You can add it as a new remote reference to the project you’re currently working on by running the git remote add command as we covered in Chapter 2. Name this remote teamone, which will be your shortname for that whole URL (see Figure 3-25).

Figure 3-25. Adding another server as a remote.

Now, you can run git fetch teamone to fetch everything the teamone server has that you don’t have yet. Because that server has a subset of the data your origin server has right now, Git fetches no data but sets a remote branch called teamone/master to point to the commit that teamone has as its master branch (see Figure 3-26).

Figure 3-26. You get a reference to teamone’s master branch position locally.
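The following sketch imitates this setup with two bare repositories on disk, where the stand-in for teamone holds only an older part of the history. The paths and the identity are assumptions; on a real network you would pass the server’s URL to git remote add instead.

```shell
set -e
# Throwaway identity and scratch directory (assumptions for this sketch)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"

git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "C1"
git clone -q --bare seed teamone.git    # teamone only has C1
git -C seed commit -q --allow-empty -m "C2"
git clone -q --bare seed origin.git     # origin has C1 and C2

git clone -q origin.git you
git -C you remote add teamone "$PWD/teamone.git"
git -C you fetch -q teamone

remotes=$(git -C you remote | sort | tr '\n' ' ')
echo "$remotes"
git -C you rev-parse teamone/master
```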

Pushing

When you want to share a branch with the world, you need to push it up to a remote that you have write access to. Your local branches aren’t automatically synchronized to the remotes you write to — you have to explicitly push the branches you want to share. That way, you can use private branches for work you don’t want to share, and push up only the topic branches you want to collaborate on.

If you have a branch named serverfix that you want to work on with others, you can push it up the same way you pushed your first branch. Run git push (remote) (branch):

$ git push origin serverfix
Counting objects: 20, done.
Compressing objects: 100% (14/14), done.
Writing objects: 100% (15/15), 1.74 KiB, done.
Total 15 (delta 5), reused 0 (delta 0)
To git@github.com:schacon/simplegit.git
 * [new branch]      serverfix -> serverfix

This is a bit of a shortcut. Git automatically expands the serverfix branchname out to refs/heads/serverfix:refs/heads/serverfix, which means, “Take my serverfix local branch and push it to update the remote’s serverfix branch.” We’ll go over the refs/heads/ part in detail in Chapter 9, but you can generally leave it off. You can also do git push origin serverfix:serverfix, which does the same thing — it says, “Take my serverfix and make it the remote’s serverfix.” You can use this format to push a local branch into a remote branch that is named differently. If you didn’t want it to be called serverfix on the remote, you could instead run git push origin serverfix:awesomebranch to push your local serverfix branch to the awesomebranch branch on the remote project.
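The three spellings can be compared side by side in a scratch setup like the one below, where a bare repository on disk stands in for the remote. The directory names and the identity are assumptions for illustration.

```shell
set -e
# Throwaway identity and scratch directory (assumptions for this sketch)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed server.git
git clone -q server.git you
cd you

git checkout -q -b serverfix
git commit -q --allow-empty -m "serverfix work"

# Short form, then the fully expanded form (a no-op the second time)
git push -q origin serverfix
git push -q origin refs/heads/serverfix:refs/heads/serverfix

# Same local branch, different name on the remote
git push -q origin serverfix:awesomebranch

server_branches=$(git ls-remote --heads origin | cut -f2 | sort | tr '\n' ' ')
echo "$server_branches"
```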

The next time one of your collaborators fetches from the server, they will get a reference to where the server’s version of serverfix is under the remote branch origin/serverfix:

$ git fetch origin
remote: Counting objects: 20, done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 15 (delta 5), reused 0 (delta 0)
Unpacking objects: 100% (15/15), done.
From git@github.com:schacon/simplegit
 * [new branch]      serverfix    -> origin/serverfix

It’s important to note that when you do a fetch that brings down new remote branches, you don’t automatically have local, editable copies of them. In other words, in this case, you don’t have a new serverfix branch — you only have an origin/serverfix pointer that you can’t modify.

To merge this work into your current working branch, you can run git merge origin/serverfix. If you want your own serverfix branch that you can work on, you can base it off your remote branch:

$ git checkout -b serverfix origin/serverfix
Branch serverfix set up to track remote branch refs/remotes/origin/serverfix.
Switched to a new branch "serverfix"

This gives you a local branch that you can work on that starts where origin/serverfix is.

Tracking Branches

Checking out a local branch from a remote branch automatically creates what is called a tracking branch. Tracking branches are local branches that have a direct relationship to a remote branch. If you’re on a tracking branch and type git push, Git automatically knows which server and branch to push to. Also, running git pull while on one of these branches fetches all the remote references and then automatically merges in the corresponding remote branch.

When you clone a repository, Git generally creates a master branch automatically that tracks origin/master. That’s why git push and git pull work out of the box with no other arguments. However, you can set up other tracking branches if you wish: ones that don’t track branches on origin, or that don’t track the master branch. The simple case is the example you just saw, running git checkout -b [branch] [remotename]/[branch]. If you have Git version 1.6.2 or later, you can also use the --track shorthand:

$ git checkout --track origin/serverfix
Branch serverfix set up to track remote branch refs/remotes/origin/serverfix.
Switched to a new branch "serverfix"

To set up a local branch with a different name than the remote branch, you can easily use the first version with a different local branch name:

$ git checkout -b sf origin/serverfix
Branch sf set up to track remote branch refs/remotes/origin/serverfix.
Switched to a new branch "sf"

Now, your local branch sf will automatically push to and pull from origin/serverfix.
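If you’re curious where Git records this relationship, the sketch below creates the sf branch against a stand-in remote and then reads the tracking information back. The scratch repositories and the identity are assumptions; the branch names follow the example above, and the sketch relies on Git’s default behavior of setting up tracking when a branch is started from a remote branch.

```shell
set -e
# Throwaway identity and scratch directory (assumptions for this sketch)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "initial commit"
git -C seed branch serverfix
git clone -q --bare seed server.git
git clone -q server.git you
cd you

git checkout -q -b sf origin/serverfix

# The tracking relationship is stored as two settings in the repository's config
upstream=$(git rev-parse --abbrev-ref 'sf@{upstream}')
remote=$(git config branch.sf.remote)
merge=$(git config branch.sf.merge)
echo "$upstream ($remote, $merge)"
```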

Deleting Remote Branches

Suppose you’re done with a remote branch — say, you and your collaborators are finished with a feature and have merged it into your remote’s master branch (or whatever branch your stable codeline is in). You can delete a remote branch using the rather obtuse syntax git push [remotename] :[branch]. If you want to delete your serverfix branch from the server, you run the following:

$ git push origin :serverfix
To git@github.com:schacon/simplegit.git
 - [deleted]         serverfix

Boom. No more branch on your server. You may want to dog-ear this page, because you’ll need that command, and you’ll likely forget the syntax. A way to remember this command is by recalling the git push [remotename] [localbranch]:[remotebranch] syntax that we went over a bit earlier. If you leave off the [localbranch] portion, then you’re basically saying, “Take nothing on my side and make it be [remotebranch].”
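Here is a small sketch of the deletion against a stand-in remote. It also shows the --delete option, which more recent Git releases (1.7.0 onward) accept as a more readable spelling of the same operation. The oldfeature branch, the directory names, and the identity are assumptions for illustration.

```shell
set -e
# Throwaway identity and scratch directory (assumptions for this sketch)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "initial commit"
git -C seed branch serverfix
git -C seed branch oldfeature
git clone -q --bare seed server.git
git clone -q server.git you
cd you

# "Take nothing on my side and make it be serverfix"
git push -q origin :serverfix

# Equivalent, more explicit form in newer Git versions
git push -q origin --delete oldfeature

remaining=$(git ls-remote --heads origin | cut -f2 | tr '\n' ' ')
echo "$remaining"
```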

Rebasing

In Git, there are two main ways to integrate changes from one branch into another: the merge and the rebase. In this section you’ll learn what rebasing is, how to do it, why it’s a pretty amazing tool, and in what cases you won’t want to use it.

The Basic Rebase

If you go back to an earlier example from the Merge section (see Figure 3-27), you can see that you diverged your work and made commits on two different branches.

Figure 3-27. Your initial diverged commit history.

The easiest way to integrate the branches, as we’ve already covered, is the merge command. It performs a three-way merge between the two latest branch snapshots (C3 and C4) and the most recent common ancestor of the two (C2), creating a new snapshot (and commit), as shown in Figure 3-28.

Figure 3-28. Merging a branch to integrate the diverged work history.

However, there is another way: you can take the patch of the change that was introduced in C3 and reapply it on top of C4. In Git, this is called rebasing. With the rebase command, you can take all the changes that were committed on one branch and replay them on another one.

In this example, you’d run the following:

$ git checkout experiment
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: added staged command

It works by going to the common ancestor of the two branches (the one you’re on and the one you’re rebasing onto), getting the diff introduced by each commit of the branch you’re on, saving those diffs to temporary files, resetting the current branch to the same commit as the branch you are rebasing onto, and finally applying each change in turn. Figure 3-29 illustrates this process.

Figure 3-29. Rebasing the change introduced in C3 onto C4.

At this point, you can go back to the master branch and do a fast-forward merge (see Figure 3-30).

Figure 3-30. Fast-forwarding the master branch.

Now, the snapshot pointed to by C3' is exactly the same as the one that was pointed to by C5 in the merge example. There is no difference in the end product of the integration, but rebasing makes for a cleaner history. If you examine the log of a rebased branch, it looks like a linear history: it appears that all the work happened in series, even when it originally happened in parallel.

Often, you’ll do this to make sure your commits apply cleanly on a remote branch — perhaps in a project to which you’re trying to contribute but that you don’t maintain. In this case, you’d do your work in a branch and then rebase your work onto origin/master when you were ready to submit your patches to the main project. That way, the maintainer doesn’t have to do any integration work — just a fast-forward or a clean apply.

Note that the snapshot pointed to by the final commit you end up with, whether it’s the last of the rebased commits for a rebase or the final merge commit after a merge, is the same snapshot — it’s only the history that is different. Rebasing replays changes from one line of work onto another in the order they were introduced, whereas merging takes the endpoints and merges them together.
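You can check this claim yourself. The sketch below integrates the same diverged work twice, once with a merge and once with a rebase, and then compares the resulting snapshots (tree objects). The merged and rebased branch names, the commit helper, and the scratch repository are assumptions made so both results can live side by side.

```shell
set -e
# Throwaway identity and scratch repository (assumptions for this sketch)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .

# Hypothetical helper: record one commit that adds a file named after it
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit C2
git checkout -q -b experiment; commit C3
git checkout -q master;        commit C4

# Variant 1: three-way merge of experiment into a copy of master
git checkout -q -b merged master
git merge -q -m "C5: merge experiment" experiment

# Variant 2: replay experiment's work on top of master
git checkout -q -b rebased experiment
git rebase -q master

merge_tree=$(git rev-parse 'merged^{tree}')
rebase_tree=$(git rev-parse 'rebased^{tree}')
echo "merge: $merge_tree"
echo "rebase: $rebase_tree"
```

The two tree hashes come out identical, while git log on the rebased branch shows a straight line and the merged branch contains one merge commit.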

More Interesting Rebases

You can also have your rebase replay on something other than the rebase target branch. Take a history like Figure 3-31, for example. You branched a topic branch (server) to add some server-side functionality to your project, and made a commit. Then, you branched off that to make the client-side changes (client) and committed a few times. Finally, you went back to your server branch and did a few more commits.

Figure 3-31. A history with a topic branch off another topic branch.

Suppose you decide that you want to merge your client-side changes into your mainline for a release, but you want to hold off on the server-side changes until they’re tested further. You can take the changes on client that aren’t on server (C8 and C9) and replay them on your master branch by using the --onto option of git rebase:

$ git rebase --onto master server client

This basically says, “Check out the client branch, figure out the patches from the common ancestor of the client and server branches, and then replay them onto master.” It’s a bit complex; but the result, shown in Figure 3-32, is pretty cool.

Figure 3-32. Rebasing a topic branch off another topic branch.

Now you can fast-forward your master branch (see Figure 3-33):

$ git checkout master
$ git merge client

Figure 3-33. Fast-forwarding your master branch to include the client branch changes.

Let’s say you decide to pull in your server branch as well. You can rebase the server branch onto the master branch without having to check it out first by running git rebase [basebranch] [topicbranch] — which checks out the topic branch (in this case, server) for you and replays it onto the base branch (master):

$ git rebase master server

This replays your server work on top of your master work, as shown in Figure 3-34.

Figure 3-34. Rebasing your server branch on top of your master branch.

Then, you can fast-forward the base branch (master):

$ git checkout master
$ git merge server

You can remove the client and server branches because all the work is integrated and you don’t need them anymore, leaving your history for this entire process looking like Figure 3-35:

$ git branch -d client
$ git branch -d server

Figure 3-35. Final commit history.
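The whole sequence in this section can be replayed in a scratch repository with the sketch below. The commit labels loosely follow the ones mentioned in the text, but the exact numbering, the commit helper, and the identity are assumptions; each commit adds its own file so no conflicts arise.

```shell
set -e
# Throwaway identity and scratch repository (assumptions for this sketch)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .

# Hypothetical helper: record one commit that adds a file named after it
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit C1; commit C2
git checkout -q -b server; commit C3
git checkout -q -b client; commit C8; commit C9
git checkout -q server;    commit C4; commit C10
git checkout -q master;    commit C5; commit C6

# Replay only the client commits that are not on server onto master
git rebase -q --onto master server client
if [ -e C3.txt ]; then server_leaked=yes; else server_leaked=no; fi

git checkout -q master
git merge -q --ff-only client

# Later, bring the server work in as well
git rebase -q master server
git checkout -q master
git merge -q --ff-only server

git branch -q -d client
git branch -q -d server
history=$(git log --format=%s | tr '\n' ' ')
echo "$history"
```

The server_leaked variable is there only to confirm that the first rebase left the server-side commit C3 out of the client branch.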

The Perils of Rebasing

Ahh, but the bliss of rebasing isn’t without its drawbacks, which can be summed up in a single line:

Do not rebase commits that you have pushed to a public repository.

If you follow that guideline, you’ll be fine. If you don’t, people will hate you, and you’ll be scorned by friends and family.

When you rebase stuff, you’re abandoning existing commits and creating new ones that are similar but different. If you push commits somewhere and others pull them down and base work on them, and then you rewrite those commits with git rebase and push them up again, your collaborators will have to re-merge their work and things will get messy when you try to pull their work back into yours.

Let’s look at an example of how rebasing work that you’ve made public can cause problems. Suppose you clone from a central server and then do some work off that. Your commit history looks like Figure 3-36.

Figure 3-36. Clone a repository, and base some work on it.

Now, someone else does more work that includes a merge, and pushes that work to the central server. You fetch that work and merge the new remote branch into your own, making your history look something like Figure 3-37.

Figure 3-37. Fetch more commits, and merge them into your work.

Next, the person who pushed the merged work decides to go back and rebase their work instead; they do a git push --force to overwrite the history on the server. You then fetch from that server, bringing down the new commits.

Figure 3-38. Someone pushes rebased commits, abandoning commits you’ve based your work on.

At this point, you have to merge this work in again, even though you’ve already done so. Rebasing changes the SHA-1 hashes of these commits so to Git they look like new commits, when in fact you already have the C4 work in your history (see Figure 3-39).

Figure 3-39. You merge in the same work again into a new merge commit.

You have to merge that work in at some point so you can keep up with the other developer in the future. After you do that, your commit history will contain both the C4 and C4' commits, which have different SHA-1 hashes but introduce the same work and have the same commit message. If you run a git log when your history looks like this, you’ll see two commits that have the same author date and message, which will be confusing. Furthermore, if you push this history back up to the server, you’ll reintroduce all those rebased commits to the central server, which can further confuse people.

If you treat rebasing as a way to clean up and work with commits before you push them, and if you only rebase commits that have never been available publicly, then you’ll be fine. If you rebase commits that have already been pushed publicly, and people may have based work on those commits, then you may be in for some frustrating trouble.
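To see why Git treats a rebased commit as a brand-new commit, the sketch below rebases a single commit and compares its hash and its patch ID before and after. The topic branch, the commit helper, and the identity are assumptions; git patch-id is used here only as a convenient way to show that the change itself is the same.

```shell
set -e
# Throwaway identity and scratch repository (assumptions for this sketch)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .

# Hypothetical helper: record one commit that adds a file named after it
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit C1
git checkout -q -b topic; commit C4
git checkout -q master;   commit C5

old=$(git rev-parse topic)     # C4
git rebase -q master topic
new=$(git rev-parse topic)     # C4'

# Same change, different commit: the patch IDs match, the SHA-1 hashes don't
old_patch=$(git show "$old" | git patch-id | cut -d' ' -f1)
new_patch=$(git show "$new" | git patch-id | cut -d' ' -f1)
echo "C4  $old (patch $old_patch)"
echo "C4' $new (patch $new_patch)"
```

Anyone who based work on the old hash still has it in their history, which is exactly what leads to the duplicated commits described above.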

Summary

We’ve covered basic branching and merging in Git. You should feel comfortable creating and switching to new branches, switching between branches and merging local branches together. You should also be able to share your branches by pushing them to a shared server, working with others on shared branches and rebasing your branches before they are shared.

Git on the Server

At this point, you should be able to do most of the day-to-day tasks for which you’ll be using Git. However, in order to do any collaboration in Git, you’ll need to have a remote Git repository. Although you can technically push changes to and pull changes from individuals’ repositories, doing so is discouraged because you can fairly easily confuse what they’re working on if you’re not careful. Furthermore, you want your collaborators to be able to access the repository even if your computer is offline — having a more reliable common repository is often useful. Therefore, the preferred method for collaborating with someone is to set up an intermediate repository that you both have access to, and push to and pull from that. We’ll refer to this repository as a "Git server"; but you’ll notice that it generally takes a tiny amount of resources to host a Git repository, so you’ll rarely need to use an entire server for it.

Running a Git server is simple. First, you choose which protocols you want your server to communicate with. The first section of this chapter will cover the available protocols and the pros and cons of each. The next sections will explain some typical setups using those protocols and how to get your server running with them. Last, we’ll go over a few hosted options, if you don’t mind hosting your code on someone else’s server and don’t want to go through the hassle of setting up and maintaining your own server.

If you have no interest in running your own server, you can skip to the last section of the chapter to see some options for setting up a hosted account and then move on to the next chapter, where we discuss the various ins and outs of working in a distributed source control environment.

A remote repository is generally a bare repository — a Git repository that has no working directory. Because the repository is only used as a collaboration point, there is no reason to have a snapshot checked out on disk; it’s just the Git data. In the simplest terms, a bare repository is the contents of your project’s .git directory and nothing else.
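As a quick illustration, the sketch below creates an empty bare repository and checks how it differs from a normal one. The project.git name and the temporary directory are assumptions.

```shell
set -e
cd "$(mktemp -d)"

# A bare repository: the Git data lives directly in project.git
git -c init.defaultBranch=master init -q --bare project.git

is_bare=$(git -C project.git rev-parse --is-bare-repository)
echo "bare: $is_bare"

# The usual contents of a .git directory sit at the top level
ls project.git
```

By convention, bare repository directories end in .git, although Git itself doesn’t require the suffix.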

The Protocols

Git can use four major network protocols to transfer data: Local, Secure Shell (SSH), Git, and HTTP. Here we’ll discuss what they are and in what basic circumstances you would want (or not want) to use them.

It’s important to note that with the exception of the HTTP protocols, all of these require Git to be installed and working on the server.

Local Protocol

The most basic is the Local protocol, in which the remote repository is in another directory on disk. This is often used if everyone on your team has access to a shared filesystem such as an NFS mount, or in the less likely case that everyone logs in to the same computer. The latter wouldn’t be ideal, because all your code repository instances would reside on the same computer, making a catastrophic loss much more likely.

If you have a shared mounted filesystem, then you can clone, push to, and pull from a local file-based repository. To clone a repository like this or to add one as a remote to an existing project, use the path to the repository as the URL. For example, to clone a local repository, you can run something like this:

$ git clone /opt/git/project.git

Or you can do this:

$ git clone file:///opt/git/project.git

Git operates slightly differently if you explicitly specify file:// at the beginning of the URL. If you just specify the path, Git tries to use hardlinks or directly copy the files it needs. If you specify file://, Git fires up the processes that it normally uses to transfer data over a network, which is generally a much less efficient method of transferring the data. The main reason to specify the file:// prefix is if you want a clean copy of the repository with extraneous references or objects left out, generally after an import from another version-control system or something similar (see Chapter 9 for maintenance tasks). We’ll use the normal path here because doing so is almost always faster.

To add a local repository to an existing Git project, you can run something like this:

$ git remote add local_proj /opt/git/project.git

Then, you can push to and pull from that remote as though you were doing so over a network.
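The sketch below exercises both URL forms and the remote add command against a bare repository in a temporary directory, which stands in for /opt/git/project.git. The directory names and the identity are assumptions.

```shell
set -e
# Throwaway identity and scratch directory (assumptions for this sketch)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed project.git
repo="$PWD/project.git"

# Plain path: Git may hardlink or copy files directly
git clone -q "$repo" by-path

# file:// URL: Git uses its normal network transfer machinery
git clone -q "file://$repo" by-url

# Adding the same repository as an extra remote of an existing clone
git -C by-path remote add local_proj "$repo"
git -C by-path fetch -q local_proj

git -C by-path rev-parse HEAD
git -C by-url rev-parse HEAD
```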

The Pros

The pros of file-based repositories are that they’re simple and they use existing file permissions and network access. If you already have a shared filesystem to which your whole team has access, setting up a repository is very easy. You stick the bare repository copy somewhere everyone has shared access to and set the read/write permissions as you would for any other shared directory. We’ll discuss how to export a bare repository copy for this purpose in the next section, “Getting Git on a Server.”

This is also a nice option for quickly grabbing work from someone else’s working repository. If you and a co-worker are working on the same project and they want you to check something out, running a command like git pull /home/john/project is often easier than them pushing to a remote server and you pulling down.

The Cons

The cons of this method are that shared access is generally more difficult to set up and reach from multiple locations than basic network access. If you want to push from your laptop when you’re at home, you have to mount the remote disk, which can be difficult and slow compared to network-based access.

It’s also important to mention that this isn’t necessarily the fastest option if you’re using a shared mount of some kind. A local repository is fast only if you have fast access to the data. A repository on NFS is often slower than the same repository accessed over SSH on the same server, because SSH lets Git run off local disks on each system.

The SSH Protocol

Probably the most common transport protocol for Git is SSH. This is because SSH access to servers is already set up in most places — and if it isn’t, it’s easy to do. SSH is also the only network-based protocol that you can easily read from and write to. The other two network protocols (HTTP and Git) are generally read-only, so even if you have them available for the unwashed masses, you still need SSH for your own write commands. SSH is also an authenticated network protocol; and because it’s ubiquitous, it’s generally easy to set up and use.

To clone a Git repository over SSH, you can specify an ssh:// URL like this:

$ git clone ssh://user@server/project.git

Or you can use the shorter scp-like syntax and leave the protocol off; Git assumes SSH if you aren’t explicit:

$ git clone user@server:project.git

You can also not specify a user, and Git assumes the user you’re currently logged in as.

The Pros

The pros of using SSH are many. First, you basically have to use it if you want authenticated write access to your repository over a network. Second, SSH is relatively easy to set up — SSH daemons are commonplace, many network admins have experience with them, and many OS distributions are set up with them or have tools to manage them. Next, access over SSH is secure — all data transfer is encrypted and authenticated. Last, like the Git and Local protocols, SSH is efficient, making the data as compact as possible before transferring it.

The Cons

The negative aspect of SSH is that you can’t serve anonymous access to your repository over it. People must have access to your machine over SSH to access it, even in a read-only capacity, which doesn’t make SSH access conducive to open source projects. If you’re using it only within your corporate network, SSH may be the only protocol you need to deal with. If you want to allow anonymous read-only access to your projects, you’ll have to set up SSH for you to push over but something else for others to pull over.

The Git Protocol

Next is the Git protocol. This is a special daemon that comes packaged with Git; it listens on a dedicated port (9418) and provides a service similar to the SSH protocol, but with absolutely no authentication. In order for a repository to be served over the Git protocol, you must create the git-daemon-export-ok file in it (the daemon won’t serve a repository without that file), but other than that there is no security. Either the Git repository is available for everyone to clone or it isn’t. This means that there is generally no pushing over this protocol. You can enable push access; but given the lack of authentication, if you turn on push access, anyone on the internet who finds your project’s URL could push to your project. Suffice it to say that this is rare.

The Pros

The Git protocol is the fastest transfer protocol available. If you’re serving a lot of traffic for a public project or serving a very large project that doesn’t require user authentication for read access, it’s likely that you’ll want to set up a Git daemon to serve your project. It uses the same data-transfer mechanism as the SSH protocol but without the encryption and authentication overhead.

The Cons

The downside of the Git protocol is the lack of authentication. It’s generally undesirable for the Git protocol to be the only access to your project. Generally, you’ll pair it with SSH access for the few developers who have push (write) access and have everyone else use git:// for read-only access. It’s also probably the most difficult protocol to set up. It must run its own daemon, which is custom (we’ll look at setting one up in the “Gitosis” section of this chapter), and it requires xinetd configuration or the like, which isn’t always a walk in the park. It also requires firewall access to port 9418, which isn’t a standard port that corporate firewalls always allow. Behind big corporate firewalls, this obscure port is commonly blocked.

The HTTP/S Protocol

Last we have the HTTP protocol. The beauty of the HTTP or HTTPS protocol is the simplicity of setting it up. Basically, all you have to do is put the bare Git repository under your HTTP document root and set up a specific post-update hook, and you’re done (See Chapter 7 for details on Git hooks). At that point, anyone who can access the web server under which you put the repository can also clone your repository. To allow read access to your repository over HTTP, do something like this:

$ cd /var/www/htdocs/
$ git clone --bare /path/to/git_project gitproject.git
$ cd gitproject.git
$ mv hooks/post-update.sample hooks/post-update
$ chmod a+x hooks/post-update

That’s all. The post-update hook that comes with Git by default runs the appropriate command (git update-server-info) to make HTTP fetching and cloning work properly. This command is run when you push to this repository over SSH; then, other people can clone via something like

$ git clone http://example.com/gitproject.git

In this particular case, we’re using the /var/www/htdocs path that is common for Apache setups, but you can use any static web server — just put the bare repository in its path. The Git data is served as basic static files (see Chapter 9 for details about exactly how it’s served).
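To make "served as basic static files" more concrete, the sketch below runs git update-server-info by hand in a scratch bare repository and prints the file a static HTTP client reads first. All paths and the commit identity are placeholders; nothing here touches a real web root.

```shell
work=$(mktemp -d)

# A scratch project with a single (empty) commit; the identity is a placeholder.
git init --quiet "$work/git_project"
(
  cd "$work/git_project" &&
  git -c user.name='Example User' -c user.email='user@example.com' \
      commit --quiet --allow-empty -m 'initial commit'
)

git clone --quiet --bare "$work/git_project" "$work/gitproject.git"

# Generate the auxiliary files that static HTTP clients rely on.
git --git-dir="$work/gitproject.git" update-server-info

# info/refs lists each ref together with the object ID it points to.
cat "$work/gitproject.git/info/refs"
```

Because a static server cannot compute this listing on request, it has to be regenerated after every push, which is the job of the post-update hook.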

It’s possible to make Git push over HTTP as well, although that technique isn’t as widely used and requires you to set up complex WebDAV requirements. Because it’s rarely used, we won’t cover it in this book. If you’re interested in using the HTTP-push protocols, you can read about preparing a repository for this purpose at http://www.kernel.org/pub/software/scm/git/docs/howto/setup-git-server-over-http.txt. One nice thing about making Git push over HTTP is that you can use any WebDAV server, without specific Git features; so, you can use this functionality if your web-hosting provider supports WebDAV for writing updates to your web site.

The Pros

The upside of using the HTTP protocol is that it’s easy to set up. Running the handful of required commands gives you a simple way to give the world read access to your Git repository. It takes only a few minutes to do. The HTTP protocol also isn’t very resource intensive on your server. Because it generally uses a static HTTP server to serve all the data, a normal Apache server can serve thousands of files per second on average — it’s difficult to overload even a small server.

You can also serve your repositories read-only over HTTPS, which means you can encrypt the content transfer; or you can go so far as to make the clients use specific signed SSL certificates. Generally, if you’re going to these lengths, it’s easier to use SSH public keys; but it may be a better solution in your specific case to use signed SSL certificates or other HTTP-based authentication methods for read-only access over HTTPS.

Another nice thing is that HTTP is such a commonly used protocol that corporate firewalls are often set up to allow traffic through its standard ports (80 for HTTP and 443 for HTTPS).

The Cons

The downside of serving your repository over HTTP is that it’s relatively inefficient for the client. It generally takes a lot longer to clone or fetch from the repository, and you often have a lot more network overhead and transfer volume over HTTP than with any of the other network protocols. Because it’s not as intelligent about transferring only the data you need — there is no dynamic work on the part of the server in these transactions — the HTTP protocol is often referred to as a dumb protocol. For more information about the differences in efficiency between the HTTP protocol and the other protocols, see Chapter 9.

Getting Git on a Server

In order to initially set up any Git server, you have to export an existing repository into a new bare repository — a repository that doesn’t contain a working directory. This is generally straightforward to do. In order to clone your repository to create a new bare repository, you run the clone command with the --bare option. By convention, bare repository directories end in .git, like so:

$ git clone --bare my_project my_project.git
Initialized empty Git repository in /opt/projects/my_project.git/

The output for this command is a little confusing. Since clone is basically a git init then a git fetch, we see some output from the git init part, which creates an empty directory. The actual object transfer gives no output, but it does happen. You should now have a copy of the Git directory data in your my_project.git directory.

This is roughly equivalent to something like

$ cp -Rf my_project/.git my_project.git

There are a couple of minor differences in the configuration file; but for your purpose, this is close to the same thing. It takes the Git repository by itself, without a working directory, and creates a directory specifically for it alone.
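One of those configuration differences is easy to see for yourself. The following sketch, which uses scratch directories as placeholders, creates a bare clone and a plain copy of the same repository and compares their core.bare setting.

```shell
work=$(mktemp -d)
git init --quiet "$work/my_project"

# Bare clone versus a plain copy of the .git directory.
git clone --quiet --bare "$work/my_project" "$work/my_project.git" 2>/dev/null
cp -Rf "$work/my_project/.git" "$work/copied.git"

# The bare clone is marked as bare in its configuration; the copy is not.
git --git-dir="$work/my_project.git" config core.bare   # prints: true
git --git-dir="$work/copied.git" config core.bare       # prints: false
```

If you do use the cp approach, setting core.bare to true in the copied repository brings the two closer together.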

Putting the Bare Repository on a Server

Now that you have a bare copy of your repository, all you need to do is put it on a server and set up your protocols. Let’s say you’ve set up a server called git.example.com that you have SSH access to, and you want to store all your Git repositories under the /opt/git directory. You can set up your new repository by copying your bare repository over:

$ scp -r my_project.git user@git.example.com:/opt/git

At this point, other users who have SSH access to the same server and read access to the /opt/git directory can clone your repository by running

$ git clone user@git.example.com:/opt/git/my_project.git

If a user SSHs into a server and has write access to the /opt/git/my_project.git directory, they will also automatically have push access. Git will add the appropriate group write permissions to a repository if you run the git init command with the --shared option:

$ ssh user@git.example.com
$ cd /opt/git/my_project.git
$ git init --bare --shared
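To see what the --shared option records, you can try it on a scratch repository first. This is only a sketch; the temporary path is a placeholder and no server is involved.

```shell
shared=$(mktemp -d)/my_project.git

# Create a bare repository intended for group access.
git init --quiet --bare --shared "$shared"

# The option is stored in the repository configuration...
git --git-dir="$shared" config core.sharedRepository

# ...and is reflected in the directory permissions.
ls -ld "$shared"
```

Running git init again on an existing repository, as in the commands above, is safe: it does not overwrite anything already stored there.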

You see how easy it is to take a Git repository, create a bare version, and place it on a server to which you and your collaborators have SSH access. Now you’re ready to collaborate on the same project.

It’s important to note that this is literally all you need to do to run a useful Git server to which several people have access — just add SSH-able accounts on a server, and stick a bare repository somewhere that all those users have read and write access to. You’re ready to go — nothing else needed.

In the next few sections, you’ll see how to expand to more sophisticated setups. This discussion will include not having to create user accounts for each user, adding public read access to repositories, setting up web UIs, using the Gitosis tool, and more. However, keep in mind that to collaborate with a couple of people on a private project, all you need is an SSH server and a bare repository.

Small Setups

If you’re a small outfit or are just trying out Git in your organization and have only a few developers, things can be simple for you. One of the most complicated aspects of setting up a Git server is user management. If you want some repositories to be read-only to certain users and read/write to others, access and permissions can be a bit difficult to arrange.

SSH Access

If you already have a server to which all your developers have SSH access, it’s generally easiest to set up your first repository there, because you have to do almost no work (as we covered in the last section). If you want more complex access control on your repositories, you can handle it with the normal filesystem permissions of the operating system your server runs.

If you want to place your repositories on a server that doesn’t have accounts for everyone on your team whom you want to have write access, then you must set up SSH access for them. We assume that if you have a server with which to do this, you already have an SSH server installed, and that’s how you’re accessing the server.

There are a few ways you can give access to everyone on your team. The first is to set up accounts for everybody, which is straightforward but can be cumbersome. You may not want to run adduser and set temporary passwords for every user.

A second method is to create a single 'git' user on the machine, ask every user who is to have write access to send you an SSH public key, and add that key to the ~/.ssh/authorized_keys file of your new 'git' user. At that point, everyone will be able to access that machine via the 'git' user. This doesn’t affect the commit data in any way — the SSH user you connect as doesn’t affect the commits you’ve recorded.

Another way to do it is to have your SSH server authenticate from an LDAP server or some other centralized authentication source that you may already have set up. As long as each user can get shell access on the machine, any SSH authentication mechanism you can think of should work.

Generating Your SSH Public Key

That being said, many Git servers authenticate using SSH public keys. In order to provide a public key, each user in your system must generate one if they don’t already have one. This process is similar across all operating systems. First, you should check to make sure you don’t already have a key. By default, a user’s SSH keys are stored in that user’s ~/.ssh directory. You can easily check to see if you have a key already by going to that directory and listing the contents:

$ cd ~/.ssh
$ ls
authorized_keys2  id_dsa       known_hosts
config            id_dsa.pub

You’re looking for a pair of files named something and something.pub, where the something is usually id_dsa or id_rsa. The .pub file is your public key, and the other file is your private key. If you don’t have these files (or you don’t even have a .ssh directory), you can create them by running a program called ssh-keygen, which is provided with the SSH package on Linux/Mac systems and comes with the MSysGit package on Windows:

$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/schacon/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /Users/schacon/.ssh/id_rsa.
Your public key has been saved in /Users/schacon/.ssh/id_rsa.pub.
The key fingerprint is:
43:c5:5b:5f:b1:f1:50:43:ad:20:a6:92:6a:1f:9a:3a schacon@agadorlaptop.local

First it confirms where you want to save the key (.ssh/id_rsa), and then it asks twice for a passphrase, which you can leave empty if you don’t want to type a password when you use the key.
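If you need to script key creation, ssh-keygen also accepts its answers as options. The sketch below writes a key pair into a scratch directory rather than ~/.ssh; the comment string is a placeholder, and the empty passphrase is used only so the example runs unattended.

```shell
keydir=$(mktemp -d)

# -f sets the output file, -N the passphrase, -C the trailing comment.
ssh-keygen -q -t rsa -N '' -C 'demo@example.com' -f "$keydir/id_rsa"

# The private key and its .pub counterpart now sit side by side.
ls "$keydir"

# Print the fingerprint of the public key.
ssh-keygen -l -f "$keydir/id_rsa.pub"
```

For a key you intend to keep, a real passphrase is usually the better choice.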

Now, each user who does this has to send their public key to you or whoever is administering the Git server (assuming you’re using an SSH server setup that requires public keys). All they have to do is copy the contents of the .pub file and e-mail it. The public keys look something like this:

$ cat ~/.ssh/id_rsa.pub 
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAklOUpkDHrfHY17SbrmTIpNLTGK9Tjom/BWDSU
GPl+nafzlHDTYW7hdI4yZ5ew18JH4JW9jbhUFrviQzM7xlELEVf4h9lFX5QVkbPppSwg0cda3
Pbv7kOdJ/MTyBlWXFCR+HAo3FXRitBqxiX1nKhXpHAZsMciLq8V6RjsNAQwdsdMFvSlVK/7XA
t3FaoJoAsncM1Q9x5+3V0Ww68/eIFmb1zuUFljQJKprrX88XypNDvjYNby6vw/Pb0rwert/En
mZ+AW4OZPnTPI89ZPmVMLuayrD2cE86Z/il8b+gw3r3+1nKatmIkjn2so1d01QraTlMqVSsbx
NrRFi9wrf+M7Q== schacon@agadorlaptop.local

For a more in-depth tutorial on creating an SSH key on multiple operating systems, see the GitHub guide on SSH keys at http://github.com/guides/providing-your-ssh-key.

Setting Up the Server

Let’s walk through setting up SSH access on the server side. In this example, you’ll use the authorized_keys method for authenticating your users. We also assume you’re running a standard Linux distribution like Ubuntu. First, you create a 'git' user and a .ssh directory for that user.

$ sudo adduser git
$ su git
$ cd
$ mkdir .ssh

Next, you need to add some developer SSH public keys to the authorized_keys file for that user. Let’s assume you’ve received a few keys by e-mail and saved them to temporary files. Again, the public keys look something like this:

$ cat /tmp/id_rsa.john.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCB007n/ww+ouN4gSLKssMxXnBOvf9LGt4L
ojG6rs6hPB09j9R/T17/x4lhJA0F3FR1rP6kYBRsWj2aThGw6HXLm9/5zytK6Ztg3RPKK+4k
Yjh6541NYsnEAZuXz0jTTyAUfrtU3Z5E003C4oxOj6H0rfIF1kKI9MAQLMdpGW1GYEIgS9Ez
Sdfd8AcCIicTDWbqLAcU4UpkaX8KyGlLwsNuuGztobF8m72ALC/nLF6JLtPofwFBlgc+myiv
O7TCUSBdLQlgMVOFq1I2uPWQOkOWQAHukEOmfjy2jctxSDBQ220ymjaNsHT4kgtZg2AYYgPq
dAv8JggJICUvax2T9va5 gsg-keypair

You just append them to your authorized_keys file:

$ cat /tmp/id_rsa.john.pub >> ~/.ssh/authorized_keys
$ cat /tmp/id_rsa.josie.pub >> ~/.ssh/authorized_keys
$ cat /tmp/id_rsa.jessica.pub >> ~/.ssh/authorized_keys
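SSH servers are typically strict about who can write to these files, so it is worth tightening the permissions after adding keys. The sketch below rehearses the whole step in a scratch directory that stands in for the 'git' user’s home; the key contents are dummy placeholders, not real keys.

```shell
home=$(mktemp -d)          # stand-in for the 'git' user's home directory
mkdir -p "$home/.ssh"

# Placeholder key lines; real ones come from each developer's .pub file.
for name in john josie jessica; do
  printf 'ssh-rsa AAAAexamplekey %s@example.com\n' "$name" > "$home/id_rsa.$name.pub"
done

# Append every received key, one per line.
cat "$home"/id_rsa.*.pub >> "$home/.ssh/authorized_keys"

# With sshd's StrictModes enabled, loosely writable files here are refused.
chmod 700 "$home/.ssh"
chmod 600 "$home/.ssh/authorized_keys"

# Count the installed keys.
grep -c '' "$home/.ssh/authorized_keys"
```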

Now, you can set up an empty repository for them by running git init with the --bare option, which initializes the repository without a working directory:

$ cd /opt/git
$ mkdir project.git
$ cd project.git
$ git init --bare

Then, John, Josie, or Jessica can push the first version of their project into that repository by adding it as a remote and pushing up a branch. Note that someone must shell onto the machine and create a bare repository every time you want to add a project. Let’s use gitserver as the hostname of the server on which you’ve set up your 'git' user and repository. If you’re running it internally, and you set up DNS for gitserver to point to that server, then you can use the commands pretty much as is:

# on John’s computer
$ cd myproject
$ git init
$ git add .
$ git commit -m 'initial commit'
$ git remote add origin git@gitserver:/opt/git/project.git
$ git push origin master

At this point, the others can clone it down and push changes back up just as easily:

$ git clone git@gitserver:/opt/git/project.git
$ vim README
$ git commit -am 'fix for the README file'
$ git push origin master

With this method, you can quickly get a read/write Git server up and running for a handful of developers.

As an extra precaution, you can easily restrict the 'git' user to only doing Git activities with a limited shell tool called git-shell that comes with Git. If you set this as your 'git' user’s login shell, then the 'git' user can’t have normal shell access to your server. To use this, specify git-shell instead of bash or csh for your user’s login shell. To do so, you’ll likely have to edit your /etc/passwd file:

$ sudo vim /etc/passwd

At the bottom, you should find a line that looks something like this:

git:x:1000:1000::/home/git:/bin/sh

Change /bin/sh to /usr/bin/git-shell (or run which git-shell to see where it’s installed). The line should look something like this:

git:x:1000:1000::/home/git:/usr/bin/git-shell

Now, the 'git' user can only use the SSH connection to push and pull Git repositories and can’t shell onto the machine. If you try, you’ll see a login rejection like this:

$ ssh git@gitserver
fatal: What do you think I am? A shell?
Connection to gitserver closed.
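The /etc/passwd change described above only touches the last colon-separated field of the 'git' user’s entry. As a sketch of that edit, the following applies it to the sample line from the text rather than to the real file; on a live system you would more likely use a tool such as chsh or usermod than rewrite the file yourself.

```shell
# Sample entry copied from the text; a real system keeps this in /etc/passwd.
entry='git:x:1000:1000::/home/git:/bin/sh'
new_shell=/usr/bin/git-shell   # adjust to wherever git-shell is installed

# Field 7 of a passwd entry is the login shell; replace it for the git user only.
updated=$(printf '%s\n' "$entry" |
  awk -F: -v OFS=: -v shell="$new_shell" '$1 == "git" { $7 = shell } { print }')

printf '%s\n' "$updated"
```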

Public Access

What if you want anonymous read access to your project? Perhaps instead of hosting an internal private project, you want to host an open source project. Or maybe you have a bunch of automated build servers or continuous integration servers that change a lot, and you don’t want to have to generate SSH keys all the time — you just want to add simple anonymous read access.

Probably the simplest way for smaller setups is to run a static web server with its document root where your Git repositories are, and then enable that post-update hook we mentioned in the first section of this chapter. Let’s work from the previous example. Say you have your repositories in the /opt/git directory, and an Apache server is running on your machine. Again, you can use any web server for this; but as an example, we’ll demonstrate some basic Apache configurations that should give you an idea of what you might need.

First you need to enable the hook:

$ cd project.git
$ mv hooks/post-update.sample hooks/post-update
$ chmod a+x hooks/post-update

If you’re using a version of Git earlier than 1.6, the mv command isn’t necessary — Git started naming the hooks examples with the .sample postfix only recently.

What does this post-update hook do? It looks basically like this:

$ cat hooks/post-update
#!/bin/sh
exec git update-server-info

This means that when you push to the server via SSH, Git will run this command to update the files needed for HTTP fetching.
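You can watch the hook fire without a server by pushing to a scratch bare repository on the local filesystem. In this sketch the hook is written out explicitly instead of renaming the sample, and all paths and the commit identity are placeholders.

```shell
work=$(mktemp -d)
git init --quiet --bare "$work/project.git"

# Install a post-update hook equivalent to the one shown above.
mkdir -p "$work/project.git/hooks"
cat > "$work/project.git/hooks/post-update" <<'EOF'
#!/bin/sh
exec git update-server-info
EOF
chmod a+x "$work/project.git/hooks/post-update"

# Push one commit from a scratch working repository.
git init --quiet "$work/myproject"
(
  cd "$work/myproject" &&
  git -c user.name='Example User' -c user.email='user@example.com' \
      commit --quiet --allow-empty -m 'initial commit' &&
  git push --quiet "$work/project.git" HEAD
)

# The hook has generated the ref listing used for HTTP fetches.
cat "$work/project.git/info/refs"
```

The info/refs file does not exist in a freshly initialized bare repository, so its presence after the push shows that the hook ran.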

Next, you need to add a VirtualHost entry to your Apache configuration with the document root as the root directory of your Git projects. Here, we’re assuming that you have wildcard DNS set up to send *.gitserver to whatever box you’re using to run all this:

<VirtualHost *:80>
    ServerName git.gitserver
    DocumentRoot /opt/git
    <Directory /opt/git/>
        Order allow,deny
        allow from all
    </Directory>
</VirtualHost>

You’ll also need to set the Unix user group of the /opt/git directories to www-data so your web server can read-access the repositories, because the Apache instance serving them will (by default) be running as that user:

$ chgrp -R www-data /opt/git

When you restart Apache, you should be able to clone your repositories under that directory by specifying the URL for your project:

$ git clone http://git.gitserver/project.git

This way, you can set up HTTP-based read access to any of your projects for a fair number of users in a few minutes. Another simple option for public unauthenticated access is to start a Git daemon, although that requires you to daemonize the process. We’ll cover this option in the next section, if you prefer that route.

GitWeb

Now that you have basic read/write and read-only access to your project, you may want to set up a simple web-based visualizer. Git comes with a CGI script called GitWeb that is commonly used for this. You can see GitWeb in use at sites like http://git.kernel.org (see Figure 4-1).

Figure 4-1. The GitWeb web-based user interface.

If you want to check out what GitWeb would look like for your project, Git comes with a command to fire up a temporary instance if you have a lightweight server on your system like lighttpd or webrick. On Linux machines, lighttpd is often installed, so you may be able to get it to run by typing git instaweb in your project directory. If you’re running a Mac, Leopard comes preinstalled with Ruby, so webrick may be your best bet. To start instaweb with a non-lighttpd handler, you can run it with the --httpd option.

$ git instaweb --httpd=webrick
[2009-02-21 10:02:21] INFO  WEBrick 1.3.1
[2009-02-21 10:02:21] INFO  ruby 1.8.6 (2008-03-03) [universal-darwin9.0]

That starts up an HTTPD server on port 1234 and then automatically starts a web browser that opens on that page. It’s pretty easy on your part. When you’re done and want to shut down the server, you can run the same command with the --stop option:

$ git instaweb --httpd=webrick --stop

If you want to run the web interface on a server all the time for your team or for an open source project you’re hosting, you’ll need to set up the CGI script to be served by your normal web server. Some Linux distributions have a gitweb package that you may be able to install via apt or yum, so you may want to try that first. We’ll walk through installing GitWeb manually very quickly. First, you need to get the Git source code, which GitWeb comes with, and generate the custom CGI script:

$ git clone git://git.kernel.org/pub/scm/git/git.git
$ cd git/
$ make GITWEB_PROJECTROOT="/opt/git" \
        prefix=/usr gitweb/gitweb.cgi
$ sudo cp -Rf gitweb /var/www/

Notice that you have to tell the command where to find your Git repositories with the GITWEB_PROJECTROOT variable. Now, you need to make Apache use CGI for that script, for which you can add a VirtualHost:

<VirtualHost *:80>
    ServerName gitserver
    DocumentRoot /var/www/gitweb
    <Directory /var/www/gitweb>
        Options +ExecCGI +FollowSymLinks +SymLinksIfOwnerMatch
        AllowOverride All
        order allow,deny
        Allow from all
        AddHandler cgi-script cgi
        DirectoryIndex gitweb.cgi
    </Directory>
</VirtualHost>

Again, GitWeb can be served with any CGI capable web server; if you prefer to use something else, it shouldn’t be difficult to set up. At this point, you should be able to visit http://gitserver/ to view your repositories online, and you can use http://git.gitserver to clone and fetch your repositories over HTTP.

Gitosis

Keeping all users’ public keys in the authorized_keys file for access works well only for a while. When you have hundreds of users, it’s much more of a pain to manage that process. You have to shell onto the server each time, and there is no access control — everyone in the file has read and write access to every project.

At this point, you may want to turn to a widely used software project called Gitosis. Gitosis is basically a set of scripts that help you manage the authorized_keys file as well as implement some simple access controls. The really interesting part is that the UI for this tool for adding people and determining access isn’t a web interface but a special Git repository. You set up the information in that project; and when you push it, Gitosis reconfigures the server based on that, which is cool.

Installing Gitosis isn’t the simplest task ever, but it’s not too difficult. It’s easiest to use a Linux server for it — these examples use a stock Ubuntu 8.10 server.

Gitosis requires some Python tools, so first you have to install the Python setuptools package, which Ubuntu provides as python-setuptools:

$ sudo apt-get install python-setuptools

Next, you clone and install Gitosis from the project’s main site:

$ git clone git://eagain.net/gitosis.git
$ cd gitosis
$ sudo python setup.py install

That installs a couple of executables that Gitosis will use. Next, Gitosis wants to put its repositories under /home/git, which is fine. But you have already set up your repositories in /opt/git, so instead of reconfiguring everything, you create a symlink:

$ ln -s /opt/git /home/git/repositories

Gitosis is going to manage your keys for you, so you need to remove the current file, re-add the keys later, and let Gitosis control the authorized_keys file automatically. For now, move the authorized_keys file out of the way:

$ mv /home/git/.ssh/authorized_keys /home/git/.ssh/ak.bak

Next you need to turn your shell back on for the 'git' user, if you changed it to the git-shell command. People still won’t be able to log in, but Gitosis will control that for you. So, let’s change this line in your /etc/passwd file

git:x:1000:1000::/home/git:/usr/bin/git-shell

back to this:

git:x:1000:1000::/home/git:/bin/sh

Now it’s time to initialize Gitosis. You do this by running the gitosis-init command with your personal public key. If your public key isn’t on the server, you’ll have to copy it there:

$ sudo -H -u git gitosis-init < /tmp/id_dsa.pub
Initialized empty Git repository in /opt/git/gitosis-admin.git/
Reinitialized existing Git repository in /opt/git/gitosis-admin.git/

This lets the user with that key modify the main Git repository that controls the Gitosis setup. Next, you have to manually set the execute bit on the post-update script for your new control repository.

$ sudo chmod 755 /opt/git/gitosis-admin.git/hooks/post-update

You’re ready to roll. If you’re set up correctly, you can try to SSH into your server as the user for which you added the public key to initialize Gitosis. You should see something like this:

$ ssh git@gitserver
PTY allocation request failed on channel 0
fatal: unrecognized command 'gitosis-serve schacon@quaternion'
Connection to gitserver closed.

That means Gitosis recognized you but shut you out because you’re not trying to do any Git commands. So, let’s do an actual Git command — you’ll clone the Gitosis control repository:

# on your local computer
$ git clone git@gitserver:gitosis-admin.git

Now you have a directory named gitosis-admin, which has two major parts:

$ cd gitosis-admin
$ find .
./gitosis.conf
./keydir
./keydir/scott.pub

The gitosis.conf file is the control file you use to specify users, repositories, and permissions. The keydir directory is where you store the public keys of all the users who have any sort of access to your repositories — one file per user. The name of the file in keydir (in the previous example, scott.pub) will be different for you — Gitosis takes that name from the description at the end of the public key that was imported with the gitosis-init script.

If you look at the gitosis.conf file, it should only specify information about the gitosis-admin project that you just cloned:

$ cat gitosis.conf 
[gitosis]

[group gitosis-admin]
writable = gitosis-admin
members = scott

It shows you that the 'scott' user — the user with whose public key you initialized Gitosis — is the only one who has access to the gitosis-admin project.

Now, let’s add a new project for you. You’ll add a new section called mobile where you’ll list the developers on your mobile team and projects that those developers need access to. Because 'scott' is the only user in the system right now, you’ll add him as the only member, and you’ll create a new project called iphone_project to start on:

[group mobile]
writable = iphone_project
members = scott

Whenever you make changes to the gitosis-admin project, you have to commit the changes and push them back up to the server in order for them to take effect:

$ git commit -am 'add iphone_project and mobile group'
[master]: created 8962da8: "add iphone_project and mobile group"
 1 files changed, 4 insertions(+), 0 deletions(-)
$ git push
Counting objects: 5, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 272 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To git@gitserver:gitosis-admin.git
   fb27aec..8962da8  master -> master

You can make your first push to the new iphone_project project by adding your server as a remote to your local version of the project and pushing. You no longer have to manually create a bare repository for new projects on the server — Gitosis creates them automatically when it sees the first push:

$ git remote add origin git@gitserver:iphone_project.git
$ git push origin master
Initialized empty Git repository in /opt/git/iphone_project.git/
Counting objects: 3, done.
Writing objects: 100% (3/3), 230 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To git@gitserver:iphone_project.git
 * [new branch]      master -> master

Notice that you don’t need to specify the path (in fact, doing so won’t work), just a colon and then the name of the project — Gitosis finds it for you.

You want to work on this project with your friends, so you’ll have to re-add their public keys. But instead of appending them manually to the ~/.ssh/authorized_keys file on your server, you’ll add them, one key per file, into the keydir directory. How you name the keys determines how you refer to the users in the gitosis.conf file. Let’s re-add the public keys for John, Josie, and Jessica:

$ cp /tmp/id_rsa.john.pub keydir/john.pub
$ cp /tmp/id_rsa.josie.pub keydir/josie.pub
$ cp /tmp/id_rsa.jessica.pub keydir/jessica.pub

Now you can add them all to your 'mobile' team so they have read and write access to iphone_project:

[group mobile]
writable = iphone_project
members = scott john josie jessica

After you commit and push that change, all four users will be able to read from and write to that project.
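Because member names must line up with file names in keydir, a quick check before pushing can catch a forgotten key. The following is a rough sketch, not part of Gitosis: missing_keys is a hypothetical helper, and the scratch checkout deliberately omits Jessica’s key to show what it reports.

```shell
# Scratch stand-in for a gitosis-admin checkout.
admin=$(mktemp -d)
mkdir "$admin/keydir"
cat > "$admin/gitosis.conf" <<'EOF'
[group mobile]
writable = iphone_project
members = scott john josie jessica
EOF
for u in scott john josie; do : > "$admin/keydir/$u.pub"; done

# Hypothetical helper: list members that have no keydir/<name>.pub file.
# Entries starting with @ are group references and are skipped.
missing_keys() {
  sed -n 's/^[[:space:]]*members[[:space:]]*=//p' "$1/gitosis.conf" |
    tr -s ' \t' '\n\n' | grep -v '^@' | grep -v '^$' | sort -u |
    while read -r user; do
      [ -f "$1/keydir/$user.pub" ] || printf '%s\n' "$user"
    done
}

missing_keys "$admin"
```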

Gitosis has simple access controls as well. If you want John to have only read access to this project, you can do this instead:

[group mobile]
writable = iphone_project
members = scott josie jessica

[group mobile_ro]
readonly = iphone_project
members = john

Now John can clone the project and get updates, but Gitosis won’t allow him to push back up to the project. You can create as many of these groups as you want, each containing different users and projects. You can also specify another group as one of the members (using @ as prefix), to inherit all of its members automatically:

[group mobile_committers]
members = scott josie jessica

[group mobile]
writable  = iphone_project
members   = @mobile_committers

[group mobile_2]
writable  = another_iphone_project
members   = @mobile_committers john

If you have any issues, it may be useful to add loglevel=DEBUG under the [gitosis] section. If you’ve lost push access by pushing a messed-up configuration, you can manually fix the file on the server under /home/git/.gitosis.conf — the file from which Gitosis reads its info. A push to the project takes the gitosis.conf file you just pushed up and sticks it there. If you edit that file manually, it remains like that until the next successful push to the gitosis-admin project.
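Since a bad configuration can lock you out, it can help to review what a file grants before you push it. The sketch below is a simple reader for the gitosis.conf layout shown in this section, not the logic Gitosis itself uses; access_for is a hypothetical helper, and it does not expand nested @group members.

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
[gitosis]

[group mobile]
writable = iphone_project
members = scott josie jessica

[group mobile_ro]
readonly = iphone_project
members = john
EOF

# Print "<group> <access> <repos>" for every group that lists the given user.
access_for() {
  awk -v user="$1" '
    function emit() {
      if (grp != "" && index(" " members " ", " " user " ") > 0) {
        if (writable != "") print grp, "writable", writable
        if (readonly != "") print grp, "readonly", readonly
      }
    }
    /^\[group / { emit(); grp = $2; sub(/\]$/, "", grp)
                  members = ""; writable = ""; readonly = ""; next }
    /^\[/       { emit(); grp = ""; next }
    /^[ \t]*members[ \t]*=/  { sub(/^[^=]*=[ \t]*/, ""); members  = $0; next }
    /^[ \t]*writable[ \t]*=/ { sub(/^[^=]*=[ \t]*/, ""); writable = $0; next }
    /^[ \t]*readonly[ \t]*=/ { sub(/^[^=]*=[ \t]*/, ""); readonly = $0; next }
    END { emit() }
  ' "$2"
}

access_for john "$conf"
access_for scott "$conf"
```

A user who should be able to push but shows up only under a readonly line, or not at all, is a sign the file needs another look before it goes to the server.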

Gitolite

Git has started to become very popular in corporate environments, which tend to have some additional requirements in terms of access control. Gitolite was created to help with those requirements.

Gitolite allows you to specify permissions not just by repository (like Gitosis does), but also by branch or tag names within each repository. That is, you can specify that certain people (or groups of people) can only push certain "refs" (branches or tags) but not others.

Installing

Installing Gitolite is very easy, even if you don't read the extensive documentation that comes with it. You need an account on a Unix server of some kind (various Linux flavours, and Solaris 10, have been tested), with Git, Perl, and an OpenSSH-compatible SSH server installed. In the examples below, we will use the gitolite account on a host called gitserver.

Curiously, Gitolite is installed by running a script on the workstation, so your workstation must have a bash shell available. Even the bash that comes with msysgit will do, in case you're wondering.

You start by obtaining public key based access to your server, so that you can log in from your workstation to the server without getting a password prompt. The following method works on Linux; for other workstation OSs you may have to do this manually. We assume you already had a key pair generated using ssh-keygen.

$ ssh-copy-id -i ~/.ssh/id_rsa gitolite@gitserver

This will ask you for the password to the gitolite account, and then set up public key access. This is essential for the install script, so check to make sure you can run a command without getting a password prompt:

$ ssh gitolite@gitserver pwd
/home/gitolite

Next, you clone Gitolite from the project's main site and run the "easy install" script (the third argument is your name as you would like it to appear in the resulting gitolite-admin repository):

$ git clone git://github.com/sitaramc/gitolite
$ cd gitolite/src
$ ./gl-easy-install -q gitolite gitserver sitaram

And you're done! Gitolite has now been installed on the server, and you now have a brand new repository called gitolite-admin in the home directory of your workstation. You administer your gitolite setup by making changes to this repository and pushing (just like Gitosis).

[By the way, upgrading gitolite is also done the same way. Also, if you're interested, run the script without any arguments to get a usage message.]

That last command does produce a fair amount of output, which might be interesting to read. Also, the first time you run this, a new keypair is created; you will have to choose a passphrase or hit enter for none. Why a second keypair is needed, and how it is used, is explained in the "ssh troubleshooting" document that comes with Gitolite. (Hey the documentation has to be good for something!)

Customising the Install

While the default, quick install works for most people, there are some ways to customise the install if you need to. Firstly, there are two other branches that you may be interested in installing instead of "master". The "wildrepos" branch allows you to specify repositories by wildcards (regular expressions) in the configuration file, an extremely powerful feature that we will not be covering in this chapter. And if your server-side Git is older than 1.5.6 or so, you should use the "oldgits" branch.

Finally, if you omit the -q argument, you get a "verbose" mode install that prints detailed information on what the install is doing at each step. The verbose mode also allows you to change certain server-side parameters, such as the location of the actual repositories, by editing an "rc" file that the server uses. This "rc" file is liberally commented so you should be able to make any changes you need quite easily, save it, and continue.

Config File and Access Control Rules

So once the install is done, you switch to the gitolite-admin repository (placed in your HOME directory) and poke around to see what you got:

$ cd ~/gitolite-admin/
$ ls
conf/  keydir/
$ find conf keydir -type f
conf/gitolite.conf
keydir/sitaram.pub
$ cat conf/gitolite.conf
#gitolite conf
# please see conf/example.conf for details on syntax and features

repo gitolite-admin
    RW+                 = sitaram

repo testing
    RW+                 = @all

Notice that "sitaram" (the last argument in the gl-easy-install command you gave earlier) has read-write permissions on the gitolite-admin repository as well as a public key file of the same name.

The config file syntax for Gitolite is quite different from Gitosis. Again, this is liberally documented in conf/example.conf, so we'll only mention some highlights here.

You can group users or repos for convenience. The group names are just like macros; when defining them, it doesn't even matter whether they are projects or users; that distinction is only made when you use the "macro".

@oss_repos      = linux perl rakudo git gitolite
@secret_repos   = fenestra pear

@admins         = scott     # Adams, not Chacon, sorry :)
@interns        = ashok     # get the spelling right, Scott!
@engineers      = sitaram dilbert wally alice
@staff          = @admins @engineers @interns
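
If the macro comparison seems abstract, here is a small Python sketch of how such group names could be expanded. It is only an illustration, not Gitolite's own code: the GROUPS dictionary and the expand function are hypothetical names, and the sketch assumes no group refers back to itself.

```python
# Hypothetical model of the group definitions shown above.
# Repository groups and user groups live in the same table, because the
# distinction only matters at the point where a group is used.
GROUPS = {
    "@oss_repos": ["linux", "perl", "rakudo", "git", "gitolite"],
    "@admins": ["scott"],
    "@interns": ["ashok"],
    "@engineers": ["sitaram", "dilbert", "wally", "alice"],
    "@staff": ["@admins", "@engineers", "@interns"],
}


def expand(name, groups):
    """Return the plain names that `name` stands for, in definition order."""
    if not name.startswith("@"):
        return [name]  # an ordinary user or repository name
    members = []
    for item in groups[name]:
        # A group may contain other groups, so expand each item in turn.
        for member in expand(item, groups):
            if member not in members:
                members.append(member)
    return members


print(expand("@staff", GROUPS))
```

Expanding @staff yields the admins first, then the engineers, then the interns, in the order the nested groups are listed.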

You can control permissions at the "ref" level. In the following example, interns can only push the "int" branch. Engineers can push any branch whose name starts with "eng-", and tags that start with "rc" followed by a digit. And the admins can do anything (including rewind) to any ref.

repo @oss_repos
    RW  int$                = @interns
    RW  eng-                = @engineers
    RW  refs/tags/rc[0-9]   = @engineers
    RW+                     = @admins

The expression after the RW or RW+ is a regular expression (regex) that the refname (ref) being pushed is matched against. So we call it a "refex"! Of course, a refex can be far more powerful than shown here, so don't overdo it if you're not comfortable with Perl regexes.

Also, as you probably guessed, Gitolite prefixes refs/heads/ as a syntactic convenience if the refex does not begin with refs/.
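
To make the matching concrete, here is a minimal Python sketch of how a refex could be compared with a ref name. It is not Gitolite's code (Gitolite uses Perl regexes, which differ from Python's in some advanced features), and the function names are made up for illustration. The sketch assumes the match is anchored at the start of the ref name only, which is inferred from the examples above, where int needs an explicit $ to stop at "int" while eng- matches anything that starts with it.

```python
import re


def normalize_refex(refex):
    """Prefix refs/heads/ unless the refex already begins with refs/."""
    if refex.startswith("refs/"):
        return refex
    return "refs/heads/" + refex


def ref_matches(refex, refname):
    """Return True if the full ref name matches the refex.

    re.match anchors at the start of the string only (an assumption
    about Gitolite's behaviour), so a refex is effectively a prefix
    pattern unless it ends in $.
    """
    return re.match(normalize_refex(refex), refname) is not None


print(ref_matches("int$", "refs/heads/int"))                 # True
print(ref_matches("int$", "refs/heads/integration"))         # False
print(ref_matches("eng-", "refs/heads/eng-login"))           # True
print(ref_matches("refs/tags/rc[0-9]", "refs/tags/rc1"))     # True
```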

An important feature of the config file's syntax is that all the rules for a repository need not be in one place. You can keep all the common stuff together, like the rules for all oss_repos shown above, then add specific rules for specific cases later on, like so:

repo gitolite
    RW+                     = sitaram

That rule will just get added to the ruleset for the gitolite repository.

At this point you might be wondering how the access control rules are actually applied, so let's go over that briefly.

There are two levels of access control in gitolite. The first is at the repository level; if you have read (or write) access to any ref in the repository, then you have read (or write) access to the repository. This is the only access control that Gitosis had.

The second level, applicable only to "write" access, is by branch or tag within a repository. The username, the access being attempted (W or +), and the refname being updated are known. The access rules are checked in order of appearance in the config file, looking for a match for this combination (but remember that the refname is regex-matched, not merely string-matched). If a match is found, the push succeeds. A fallthrough results in access being denied.

Advanced Access Control -- "deny" rules

As you can see, permissions can be one of R, RW, or RW+. There is another permission available: -, standing for "deny". This gives you a lot more power, at the expense of some complexity, because now fallthrough is not the only way for access to be denied, and so the order of the rules now matters!

Let us say, in the situation above, we want engineers to be able to rewind any branch except master and integ. Here's how:

    RW  master integ    = @engineers
    -   master integ    = @engineers
    RW+                 = @engineers

Again, you simply follow the rules top down until you hit a match for your access mode, or a deny. Non-rewind push to master or integ is allowed by the first rule. A rewind push to those refs does not match the first rule, drops down to the second, and is therefore denied. Any push (rewind or non-rewind) to refs other than master or integ won't match the first two rules anyway, and the third rule allows it.
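
The top-down evaluation just described can be modelled in a short Python sketch. Everything in it is illustrative rather than Gitolite's actual implementation: the RULES list, the check_push function, and the treatment of an empty refex as "match every ref" are assumptions made for the example. The @engineers group is written out as a plain set of user names.

```python
import re

ENGINEERS = {"sitaram", "dilbert", "wally", "alice"}

# (permission, refexes, users), in the order the rules appear in the config.
RULES = [
    ("RW", ["master", "integ"], ENGINEERS),
    ("-", ["master", "integ"], ENGINEERS),
    ("RW+", [""], ENGINEERS),
]


def normalize(refex):
    if refex == "":
        return "refs/.*"  # assumption: a rule without a refex covers every ref
    if refex.startswith("refs/"):
        return refex
    return "refs/heads/" + refex


def check_push(rules, user, access, refname):
    """Decide a push; access is "W" (ordinary push) or "+" (rewind)."""
    for permission, refexes, users in rules:
        if user not in users:
            continue
        if not any(re.match(normalize(r), refname) for r in refexes):
            continue
        if permission == "-":
            return False  # an explicit deny stops the search
        if access in permission:
            return True  # first rule that grants this access mode wins
    return False  # fallthrough: nothing matched, so access is denied


print(check_push(RULES, "dilbert", "W", "refs/heads/master"))  # True
print(check_push(RULES, "dilbert", "+", "refs/heads/master"))  # False
print(check_push(RULES, "dilbert", "+", "refs/heads/topic"))   # True
```

Swapping the first two rules in RULES would deny even ordinary pushes to master and integ, which is why the order matters once deny rules are in play.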

If that sounds complicated, you may want to play with it to increase your understanding. Also, most of the time you don't need "deny" rules anyway, so you can choose to just avoid them if you prefer.

Other Features

We'll round off this discussion with a bunch of other features, all of which are described in great detail in the "faqs, tips, etc" document.

Gitolite logs all successful accesses. If you were somewhat relaxed about giving people rewind permissions (RW+) and some kid blew away "master", the log file is a life saver, in terms of easily and quickly finding the SHA that got hosed.

One extremely useful convenience feature in gitolite is support for git installed outside the normal $PATH (this is more common than you think; some corporate environments or even some hosting providers refuse to install things system-wide and you end up putting them in your own directories). Normally, you are forced to make the client-side git aware of this non-standard location of the git binaries in some way. With gitolite, just choose a verbose install and set $GIT_PATH in the "rc" file. No client-side changes are required after that :-)

Another convenient feature is what happens when you try and just ssh to the server. Older versions of gitolite used to complain about the SSH_ORIGINAL_COMMAND environment variable being empty (see the ssh documentation if interested). Now Gitolite comes up with something like this:

hello sitaram, the gitolite version here is v0.90-9-g91e1e9f
you have the following permissions:
  R     anu-wsd
  R     entrans
  R  W  git-notes
  R  W  gitolite
  R  W  gitolite-admin
  R     indic_web_input
  R     shreelipi_converter

For really large installations, you can delegate responsibility for groups of repositories to various people and have them manage those pieces independently. This reduces the load on the main admin, and makes them less of a bottleneck. This feature has its own documentation file in the doc/ directory.

Finally, Gitolite also has a feature called "personal branches" (or rather, "personal branch namespace") that can be very useful in a corporate environment.

A lot of code exchange in the git world happens by "please pull" requests. In a corporate environment, however, unauthenticated access is a no-no, and a developer workstation cannot do authentication, so you have to push to the central server and ask someone to pull from there.

This would normally cause the same branch name clutter as in a centralised VCS, plus setting up permissions for this becomes a chore for the admin.

Gitolite lets you define a "personal" or "scratch" namespace prefix for each developer (for example, refs/personal/<devname>/*), with full permissions for that dev only, and read access for everyone else. Just choose a verbose install and set the $PERSONAL variable in the "rc" file to refs/personal. That's all; it's pretty much fire and forget as far as the admin is concerned, even if there is constant churn in the project team composition.

Git Daemon

For public, unauthenticated read access to your projects, you’ll want to move past the HTTP protocol and start using the Git protocol. The main reason is speed. The Git protocol is far more efficient and thus faster than the HTTP protocol, so using it will save your users time.

Again, this is for unauthenticated read-only access. If you’re running this on a server outside your firewall, it should only be used for projects that are publicly visible to the world. If the server you’re running it on is inside your firewall, you might use it for projects that a large number of people or computers (continuous integration or build servers) have read-only access to, when you don’t want to have to add an SSH key for each.

In any case, the Git protocol is relatively easy to set up. Basically, you need to run this command in a daemonized manner:

git daemon --reuseaddr --base-path=/opt/git/ /opt/git/

--reuseaddr allows the server to restart without waiting for old connections to time out, the --base-path option allows people to clone projects without specifying the entire path, and the path at the end tells the Git daemon where to look for repositories to export. If you’re running a firewall, you’ll also need to punch a hole in it at port 9418 on the box you’re setting this up on.
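
To see what --base-path does to the URLs people use, the following Python sketch maps a git:// URL onto the directory the daemon would look in. The resolve_repo_path function and the gitserver host name are hypothetical, and the sketch deliberately ignores the other checks the daemon performs before serving a repository.

```python
import posixpath
from urllib.parse import urlparse


def resolve_repo_path(base_path, url):
    """Return the server-side directory a git:// URL refers to under base_path."""
    parsed = urlparse(url)
    if parsed.scheme != "git":
        raise ValueError("expected a git:// URL")
    # The path in the URL is taken relative to base_path rather than to the
    # root of the filesystem, so clients never need to know about /opt/git.
    return posixpath.join(base_path, parsed.path.lstrip("/"))


print(resolve_repo_path("/opt/git/", "git://gitserver/project.git"))
# /opt/git/project.git
```

Without a base path, the same repository would have to be requested as git://gitserver/opt/git/project.git.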

You can daemonize this process a number of ways, depending on the operating system you’re running. On an Ubuntu machine, you use an Upstart script. So, in the following file

/etc/event.d/local-git-daemon

you put this script:

start on startup
stop on shutdown
exec /usr/bin/git daemon \
    --user=git --group=git \
    --reuseaddr \
    --base-path=/opt/git/ \
    /opt/git/
respawn

For security reasons, it is strongly encouraged to have this daemon run as a user with read-only permissions to the repositories – you can easily do this by creating a new user 'git-ro' and running the daemon as them. For the sake of simplicity we’ll simply run it as the same 'git' user that Gitosis is running as.

When you restart your machine, your Git daemon will start automatically and respawn if it goes down. To get it running without having to reboot, you can run this:

initctl start local-git-daemon

On other systems, you may want to use xinetd, a script in your sysvinit system, or something else — as long as you get that command daemonized and watched somehow.

Next, you have to tell your Gitosis server which repositories to allow unauthenticated Git server-based access to. If you add a section for each repository, you can specify the ones from which you want your Git daemon to allow reading. If you want to allow Git protocol access for your iphone project, you add this to the end of the gitosis.conf file:

[repo iphone_project]
daemon = yes

When that is committed and pushed up, your running daemon should start serving requests for the project to anyone who has access to port 9418 on your server.

If you decide not to use Gitosis, but you want to set up a Git daemon, you’ll have to run this on each project you want the Git daemon to serve:

$ cd /path/to/project.git
$ touch git-daemon-export-ok

The presence of that file tells Git that it’s OK to serve this project without authentication.
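
If you have many repositories under one directory, creating that file by hand gets tedious. As a rough sketch, the following Python function does it for every directory ending in .git directly under a base path. The name mark_exportable is made up for this example, and the sketch assumes your bare repositories all end in .git and sit side by side (as under /opt/git in the examples above), which may not match your layout.

```python
from pathlib import Path


def mark_exportable(base_path):
    """Create git-daemon-export-ok in every *.git directory under base_path.

    Returns the names of the repositories that were marked, sorted by name.
    """
    marked = []
    for repo in sorted(Path(base_path).glob("*.git")):
        if repo.is_dir():
            # An empty file is enough; only its presence matters.
            (repo / "git-daemon-export-ok").touch()
            marked.append(repo.name)
    return marked
```

Calling mark_exportable("/opt/git") would mark every repository there, so use it only when all of those projects are meant to be readable without authentication.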

Gitosis can also control which projects GitWeb shows. First, you need to add something like the following to the /etc/gitweb.conf file:

$projects_list = "/home/git/gitosis/projects.list";
$projectroot = "/home/git/repositories";
$export_ok = "git-daemon-export-ok";
@git_base_url_list = ('git://gitserver');

You can control which projects GitWeb lets users browse by adding or removing a gitweb setting in the Gitosis configuration file. For instance, if you want the iphone project to show up on GitWeb, you make the repo setting look like this:

[repo iphone_project]
daemon = yes
gitweb = yes

Now, if you commit and push the project, GitWeb will automatically start showing your iphone project.

Hosted Git

If you don’t want to go through all of the work involved in setting up your own Git server, you have several options for hosting your Git projects on an external dedicated hosting site. Doing so offers a number of advantages: a hosting site is generally quick to set up and easy to start projects on, and no server maintenance or monitoring is involved. Even if you set up and run your own server internally, you may still want to use a public hosting site for your open source code — it’s generally easier for the open source community to find and help you with.

These days, you have a huge number of hosting options to choose from, each with different advantages and disadvantages. To see an up-to-date list, check out the GitHosting page on the main Git wiki:

http://git.or.cz/gitwiki/GitHosting

Because we can’t cover all of them, and because I happen to work at one of them, we’ll use this section to walk through setting up an account and creating a new project at GitHub. This will give you an idea of what is involved.

GitHub is by far the largest open source Git hosting site and it’s also one of the very few that offers both public and private hosting options so you can keep your open source and private commercial code in the same place. In fact, we used GitHub to privately collaborate on this book.

GitHub

GitHub is slightly different than most code-hosting sites in the way that it namespaces projects. Instead of being primarily based on the project, GitHub is user centric. That means when I host my grit project on GitHub, you won’t find it at github.com/grit but instead at github.com/schacon/grit. There is no canonical version of any project, which allows a project to move from one user to another seamlessly if the first author abandons the project.

GitHub is also a commercial company that charges for accounts that maintain private repositories, but anyone can quickly get a free account to host as many open source projects as they want. We’ll quickly go over how that is done.

Setting Up a User Account

The first thing you need to do is set up a free user account. If you visit the Pricing and Signup page at http://github.com/plans and click the "Sign Up" button on the Free account (see Figure 4-2), you’re taken to the signup page.

Figure 4-2. The GitHub plan page.

Here you must choose a username that isn’t yet taken in the system and enter an e-mail address that will be associated with the account and a password (see Figure 4-3).

Figure 4-3. The GitHub user signup form.

If you have it available, this is a good time to add your public SSH key as well. We covered how to generate a new key earlier, in the "Simple Setups" section. Take the contents of the public key of that pair, and paste it into the SSH Public Key text box. Clicking the "explain ssh keys" link takes you to detailed instructions on how to do so on all major operating systems. Clicking the "I agree, sign me up" button takes you to your new user dashboard (see Figure 4-4).

Figure 4-4. The GitHub user dashboard.

Next you can create a new repository.

Creating a New Repository

Start by clicking the "create a new one" link next to Your Repositories on the user dashboard. You’re taken to the Create a New Repository form (see Figure 4-5).

Figure 4-5. Creating a new repository on GitHub.

All you really have to do is provide a project name, but you can also add a description. When that is done, click the "Create Repository" button. Now you have a new repository on GitHub (see Figure 4-6).

Figure 4-6. GitHub project header information.

Since you have no code there yet, GitHub will show you instructions for how to create a brand-new project, push an existing Git project up, or import a project from a public Subversion repository (see Figure 4-7).

Figure 4-7. Instructions for a new repository.

These instructions are similar to what we’ve already gone over. To initialize a project if it isn’t already a Git project, you use

$ git init
$ git add .
$ git commit -m 'initial commit'

When you have a Git repository locally, add GitHub as a remote and push up your master branch:

$ git remote add origin git@github.com:testinguser/iphone_project.git
$ git push origin master

Now your project is hosted on GitHub, and you can give the URL to anyone you want to share your project with. In this case, it’s http://github.com/testinguser/iphone_project. You can also see from the header on each of your project’s pages that you have two Git URLs (see Figure 4-8).

Figure 4-8. Project header with a public URL and a private URL.

The Public Clone URL is a public, read-only Git URL over which anyone can clone the project. Feel free to give out that URL and post it on your web site or what have you.

The Your Clone URL is a read/write SSH-based URL that you can read or write over only if you connect with the SSH private key associated with the public key you uploaded for your user. When other users visit this project page, they won’t see that URL—only the public one.

Importing from Subversion

If you have an existing public Subversion project that you want to import into Git, GitHub can often do that for you. At the bottom of the instructions page is a link to a Subversion import. If you click it, you see a form with information about the import process and a text box where you can paste in the URL of your public Subversion project (see Figure 4-9).

Figure 4-9. Subversion importing interface.

If your project is very large, nonstandard, or private, this process probably won’t work for you. In Chapter 7, you’ll learn how to do more complicated manual project imports.

Adding Collaborators

Let’s add the rest of the team. If John, Josie, and Jessica all sign up for accounts on GitHub, and you want to give them push access to your repository, you can add them to your project as collaborators. Doing so will allow pushes from their public keys to work.

Click the "edit" button in the project header or the Admin tab at the top of the project to reach the Admin page of your GitHub project (see Figure 4-10).

Figure 4-10. GitHub administration page.

To give another user write access to your project, click the “Add another collaborator” link. A new text box appears, into which you can type a username. As you type, a helper pops up, showing you possible username matches. When you find the correct user, click the Add button to add that user as a collaborator on your project (see Figure 4-11).

Figure 4-11. Adding a collaborator to your project.

When you’re finished adding collaborators, you should see a list of them in the Repository Collaborators box (see Figure 4-12).

Figure 4-12. A list of collaborators on your project.

If you need to revoke access to individuals, you can click the "revoke" link, and their push access will be removed. For future projects, you can also copy collaborator groups by copying the permissions of an existing project.

Your Project

After you push your project up or have it imported from Subversion, you have a main project page that looks something like Figure 4-13.

Figure 4-13. A GitHub main project page.

When people visit your project, they see this page. It contains tabs to different aspects of your project. The Commits tab shows a list of commits in reverse chronological order, similar to the output of the git log command. The Network tab shows all the people who have forked your project and contributed back. The Downloads tab allows you to upload project binaries and link to tarballs and zipped versions of any tagged points in your project. The Wiki tab provides a wiki where you can write documentation or other information about your project. The Graphs tab has some contribution visualizations and statistics about your project. The main Source tab that you land on shows your project’s main directory listing and automatically renders the README file below it if you have one. This tab also shows a box with the latest commit information.

Forking Projects

If you want to contribute to an existing project to which you don’t have push access, GitHub encourages forking the project. When you land on a project page that looks interesting and you want to hack on it a bit, you can click the "fork" button in the project header to have GitHub copy that project to your user so you can push to it.

This way, projects don’t have to worry about adding users as collaborators to give them push access. People can fork a project and push to it, and the main project maintainer can pull in those changes by adding them as remotes and merging in their work.

To fork a project, visit the project page (in this case, mojombo/chronic) and click the "fork" button in the header (see Figure 4-14).

Figure 4-14. Get a writable copy of any repository by clicking the "fork" button.

After a few seconds, you’re taken to your new project page, which indicates that this project is a fork of another one (see Figure 4-15).

Figure 4-15. Your fork of a project.

GitHub Summary

That’s all we’ll cover about GitHub, but it’s important to note how quickly you can do all this. You can create an account, add a new project, and push to it in a matter of minutes. If your project is open source, you also get a huge community of developers who now have visibility into your project and may well fork it and help contribute to it. At the very least, this may be a way to get up and running with Git and try it out quickly.

Summary

You have several options to get a remote Git repository up and running so that you can collaborate with others or share your work.

Running your own server gives you a lot of control and allows you to run the server within your own firewall, but such a server generally requires a fair amount of your time to set up and maintain. If you place your data on a hosted server, it’s easy to set up and maintain; however, you have to be able to keep your code on someone else’s servers, and some organizations don’t allow that.

It should be fairly straightforward to determine which solution or combination of solutions is appropriate for you and your organization.

Distributed Git

Now that you have a remote Git repository set up as a point for all the developers to share their code, and you’re familiar with basic Git commands in a local workflow, you’ll look at how to utilize some of the distributed workflows that Git affords you.

In this chapter, you’ll see how to work with Git in a distributed environment as a contributor and an integrator. That is, you’ll learn how to contribute code successfully to a project and make it as easy on you and the project maintainer as possible, and also how to maintain a project successfully with a number of developers contributing.

Distributed Workflows

Unlike Centralized Version Control Systems (CVCSs), the distributed nature of Git allows you to be far more flexible in how developers collaborate on projects. In centralized systems, every developer is a node working more or less equally on a central hub. In Git, however, every developer is potentially both a node and a hub — that is, every developer can both contribute code to other repositories and maintain a public repository on which others can base their work and which they can contribute to. This opens a vast range of workflow possibilities for your project and/or your team, so I’ll cover a few common paradigms that take advantage of this flexibility. I’ll go over the strengths and possible weaknesses of each design; you can choose a single one to use, or you can mix and match features from each.

Centralized Workflow

In centralized systems, there is generally a single collaboration model—the centralized workflow. One central hub, or repository, can accept code, and everyone synchronizes their work to it. A number of developers are nodes — consumers of that hub — and synchronize to that one place (see Figure 5-1).

Figure 5-1. Centralized workflow.

This means that if two developers clone from the hub and both make changes, the first developer to push their changes back up can do so with no problems. The second developer must merge in the first one’s work before pushing changes up, so as not to overwrite the first developer’s changes. This is as true in Git as it is in Subversion (or any CVCS), and this model works perfectly in Git.

If you have a small team or are already comfortable with a centralized workflow in your company or team, you can easily continue using that workflow with Git. Simply set up a single repository, and give everyone on your team push access; Git won’t let users overwrite each other. If one developer clones, makes changes, and then tries to push their changes while another developer has pushed in the meantime, the server will reject that developer’s changes. They will be told that they’re trying to push non-fast-forward changes and that they won’t be able to do so until they fetch and merge. This workflow is attractive to a lot of people because it’s a paradigm that many are familiar and comfortable with.
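
The fast-forward rule behind that rejection can be modelled without Git at all. In the following Python sketch, a history is a dictionary mapping each commit to its parents, and a push is accepted only if the server's current tip is an ancestor of the pushed tip. The commit names and function names are hypothetical, and the sketch models only the rule itself, not how Git implements it.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def push_is_fast_forward(parents, server_tip, pushed_tip):
    """A push is a fast-forward when it builds on what the server already has."""
    return is_ancestor(parents, server_tip, pushed_tip)


# Two developers both start from "base" and commit independently.
HISTORY = {
    "base": [],
    "first_dev": ["base"],
    "second_dev": ["base"],
    "merge": ["second_dev", "first_dev"],  # second developer merges the first's work
}

print(push_is_fast_forward(HISTORY, "base", "first_dev"))        # True
print(push_is_fast_forward(HISTORY, "first_dev", "second_dev"))  # False
print(push_is_fast_forward(HISTORY, "first_dev", "merge"))       # True
```

The second push fails because it does not contain the first developer's commit; once the second developer has fetched and merged, the resulting merge commit does, and the push goes through.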

Integration-Manager Workflow

Because Git allows you to have multiple remote repositories, it’s possible to have a workflow where each developer has write access to their own public repository and read access to everyone else’s. This scenario often includes a canonical repository that represents the "official" project. To contribute to that project, you create your own public clone of the project and push your changes to it. Then, you can send a request to the maintainer of the main project to pull in your changes. They can add your repository as a remote, test your changes locally, merge them into their branch, and push back to their repository. The process works as follows (see Figure 5-2):

  1. The project maintainer pushes to their public repository.
  2. A contributor clones that repository and makes changes.
  3. The contributor pushes to their own public copy.
  4. The contributor sends the maintainer an e-mail asking them to pull changes.
  5. The maintainer adds the contributor’s repo as a remote and merges locally.
  6. The maintainer pushes merged changes to the main repository.

Figure 5-2. Integration-manager workflow.

This is a very common workflow with sites like GitHub, where it’s easy to fork a project and push your changes into your fork for everyone to see. One of the main advantages of this approach is that you can continue to work, and the maintainer of the main repository can pull in your changes at any time. Contributors don’t have to wait for the project to incorporate their changes — each party can work at their own pace.

Dictator and Lieutenants Workflow

This is a variant of a multiple-repository workflow. It’s generally used by huge projects with hundreds of collaborators; one famous example is the Linux kernel. Various integration managers are in charge of certain parts of the repository; they’re called lieutenants. All the lieutenants have one integration manager known as the benevolent dictator. The benevolent dictator’s repository serves as the reference repository from which all the collaborators need to pull. The process works like this (see Figure 5-3):

  1. Regular developers work on their topic branch and rebase their work on top of master. The master branch is that of the dictator.
  2. Lieutenants merge the developers’ topic branches into their master branch.
  3. The dictator merges the lieutenants’ master branches into the dictator’s master branch.
  4. The dictator pushes their master to the reference repository so the other developers can rebase on it.


Figure 5-3. Benevolent dictator workflow.

This kind of workflow isn’t common but can be useful in very big projects or in highly hierarchical environments, because it allows the project leader (the dictator) to delegate much of the work and collect large subsets of code at multiple points before integrating them.

These are some commonly used workflows that are possible with a distributed system like Git, but you can see that many variations are possible to suit your particular real-world workflow. Now that you can (I hope) determine which workflow combination may work for you, I’ll cover some more specific examples of how to accomplish the main roles that make up the different flows.

Contributing to a Project

You know what the different workflows are, and you should have a pretty good grasp of fundamental Git usage. In this section, you’ll learn about a few common patterns for contributing to a project.

The main difficulty with describing this process is that there are a huge number of variations on how it’s done. Because Git is very flexible, people can and do work together in many ways, and it’s problematic to describe how you should contribute to a project, since every project is a bit different. Some of the variables involved are active contributor size, chosen workflow, your commit access, and possibly the external contribution method.

The first variable is active contributor size. How many users are actively contributing code to this project, and how often? In many instances, you’ll have two or three developers with a few commits a day, or possibly less for somewhat dormant projects. For really large companies or projects, the number of developers could be in the thousands, with dozens or even hundreds of patches coming in each day. This is important because with more and more developers, you run into more issues with making sure your code applies cleanly or can be easily merged. Changes you submit may be rendered obsolete or severely broken by work that is merged in while you were working or while your changes were waiting to be approved or applied. How can you keep your code consistently up to date and your patches valid?

The next variable is the workflow in use for the project. Is it centralized, with each developer having equal write access to the main codeline? Does the project have a maintainer or integration manager who checks all the patches? Are all the patches peer-reviewed and approved? Are you involved in that process? Is a lieutenant system in place, and do you have to submit your work to them first?

The next issue is your commit access. The workflow required in order to contribute to a project is much different if you have write access to the project than if you don’t. If you don’t have write access, how does the project prefer to accept contributed work? Does it even have a policy? How much work are you contributing at a time? How often do you contribute?

All these questions can affect how you contribute effectively to a project and what workflows are preferred or available to you. I’ll cover aspects of each of these in a series of use cases, moving from simple to more complex; you should be able to construct the specific workflows you need in practice from these examples.

Commit Guidelines

Before you start looking at the specific use cases, here’s a quick note about commit messages. Having a good guideline for creating commits and sticking to it makes working with Git and collaborating with others a lot easier. The Git project provides a document that lays out a number of good tips for creating commits from which to submit patches — you can read it in the Git source code in the Documentation/SubmittingPatches file.

First, you don’t want to submit any whitespace errors. Git provides an easy way to check for this — before you commit, run git diff --check, which identifies possible whitespace errors and lists them for you. Here is an example, where I’ve replaced a red terminal color with Xs:

$ git diff --check
lib/simplegit.rb:5: trailing whitespace.
+    @git_dir = File.expand_path(git_dir)XX
lib/simplegit.rb:7: trailing whitespace.
+ XXXXXXXXXXX
lib/simplegit.rb:26: trailing whitespace.
+    def command(git_cmd)XXXX

If you run that command before committing, you can tell if you’re about to commit whitespace issues that may annoy other developers.
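
If you are wondering what counts as trailing whitespace here, this small Python sketch flags the same kind of lines as the example output above. It is a simplification under stated assumptions: the function name is made up, it looks at whole text rather than at a diff, and it checks only for spaces or tabs at the end of a line, whereas git diff --check works on the changes you are about to commit.

```python
def find_trailing_whitespace(text):
    """Return (line_number, line) pairs for lines ending in spaces or tabs."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        # A line made up only of spaces or tabs is flagged as well.
        if line != line.rstrip(" \t"):
            findings.append((number, line))
    return findings


sample = "def command(git_cmd)    \n  run(git_cmd)\n\t\n"
for number, line in find_trailing_whitespace(sample):
    print(f"line {number}: trailing whitespace")
```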

Next, try to make each commit a logically separate changeset. If you can, try to make your changes digestible — don’t code for a whole weekend on five different issues and then submit them all as one massive commit on Monday. Even if you don’t commit during the weekend, use the staging area on Monday to split your work into at least one commit per issue, with a useful message per commit. If some of the changes modify the same file, try to use git add --patch to partially stage files (covered in detail in Chapter 6). The project snapshot at the tip of the branch is identical whether you do one commit or five, as long as all the changes are added at some point, so try to make things easier on your fellow developers when they have to review your changes. This approach also makes it easier to pull out or revert one of the changesets if you need to later. Chapter 6 describes a number of useful Git tricks for rewriting history and interactively staging files — use these tools to help craft a clean and understandable history.

The last thing to keep in mind is the commit message. Getting in the habit of creating quality commit messages makes using and collaborating with Git a lot easier. As a general rule, your messages should start with a single line that’s no more than about 50 characters and that describes the changeset concisely, followed by a blank line, followed by a more detailed explanation. The Git project requires that the more detailed explanation include your motivation for the change and contrast its implementation with previous behavior — this is a good guideline to follow. It’s also a good idea to use the imperative present tense in these messages. In other words, use commands. Instead of "I added tests for" or "Adding tests for," use "Add tests for." Here is a template originally written by Tim Pope at tpope.net:

Short (50 chars or less) summary of changes

More detailed explanatory text, if necessary.  Wrap it to about 72
characters or so.  In some contexts, the first line is treated as the
subject of an email and the rest of the text as the body.  The blank
line separating the summary from the body is critical (unless you omit
the body entirely); tools like rebase can get confused if you run the
two together.

Further paragraphs come after blank lines.

 - Bullet points are okay, too

 - Typically a hyphen or asterisk is used for the bullet, preceded by a
   single space, with blank lines in between, but conventions vary here

If all your commit messages look like this, things will be a lot easier for you and the developers you work with. The Git project has well-formatted commit messages — I encourage you to run git log --no-merges there to see what a nicely formatted project-commit history looks like.
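One way to write a multi-line message without opening an editor is to pass it on standard input with git commit -F -. This is a sketch only; the repository, the README file, and the wording of the message are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Demo User'          # hypothetical identity
git config user.email demo@example.com

echo 'hello' > README
git add README

# Summary line, blank line, then a wrapped body, read from standard input.
git commit -q -F - <<'EOF'
Add README with a greeting

Explain why the change is needed and how it differs from the
previous behavior. Wrap the body at about 72 characters.
EOF

subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "subject: $subject (${#subject} characters)"
```

The %s and %b placeholders print the summary line and the body separately, which is a quick way to confirm that the blank line did its job.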

In the following examples, and throughout most of this book, for the sake of brevity I don’t format messages nicely like this; instead, I use the -m option to git commit. Do as I say, not as I do.

Private Small Team

The simplest setup you’re likely to encounter is a private project with one or two other developers. By private, I mean closed source — not read-accessible to the outside world. You and the other developers all have push access to the repository.

In this environment, you can follow a workflow similar to what you might do when using Subversion or another centralized system. You still get the advantages of things like offline committing and vastly simpler branching and merging, but the workflow can be very similar; the main difference is that merges happen client-side rather than on the server at commit time. Let’s see what it might look like when two developers start to work together with a shared repository. The first developer, John, clones the repository, makes a change, and commits locally. (I’m replacing the protocol messages with ... in these examples to shorten them somewhat.)

# John's Machine
$ git clone john@githost:simplegit.git
Initialized empty Git repository in /home/john/simplegit/.git/
...
$ cd simplegit/
$ vim lib/simplegit.rb 
$ git commit -am 'removed invalid default value'
[master 738ee87] removed invalid default value
 1 files changed, 1 insertions(+), 1 deletions(-)

The second developer, Jessica, does the same thing — clones the repository and commits a change:

# Jessica's Machine
$ git clone jessica@githost:simplegit.git
Initialized empty Git repository in /home/jessica/simplegit/.git/
...
$ cd simplegit/
$ vim TODO 
$ git commit -am 'add reset task'
[master fbff5bc] add reset task
 1 files changed, 1 insertions(+), 0 deletions(-)

Now, Jessica pushes her work up to the server:

# Jessica's Machine
$ git push origin master
...
To jessica@githost:simplegit.git
   1edee6b..fbff5bc  master -> master

John tries to push his change up, too:

# John's Machine
$ git push origin master
To john@githost:simplegit.git
 ! [rejected]        master -> master (non-fast forward)
error: failed to push some refs to 'john@githost:simplegit.git'

John isn’t allowed to push because Jessica has pushed in the meantime. This is especially important to understand if you’re used to Subversion, because you’ll notice that the two developers didn’t edit the same file. Although Subversion automatically does such a merge on the server if different files are edited, in Git you must merge the commits locally. John has to fetch Jessica’s changes and merge them in before he will be allowed to push:

$ git fetch origin
...
From john@githost:simplegit
 + 049d078...fbff5bc master     -> origin/master

At this point, John’s local repository looks something like Figure 5-4.

Figure 5-4. John’s initial repository.

John has a reference to the changes Jessica pushed up, but he has to merge them into his own work before he is allowed to push:

$ git merge origin/master
Merge made by recursive.
 TODO |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

The merge goes smoothly — John’s commit history now looks like Figure 5-5.

Figure 5-5. John’s repository after merging origin/master.

Now, John can test his code to make sure it still works properly, and then he can push his new merged work up to the server:

$ git push origin master
...
To john@githost:simplegit.git
   fbff5bc..72bbc59  master -> master

Finally, John’s commit history looks like Figure 5-6.

Figure 5-6. John’s history after pushing to the origin server.
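The rejected push and the fetch, merge, push sequence can be reproduced on one machine. In this sketch a bare repository on local disk stands in for the server; the directory names (seed, server.git, john, jessica) and the file contents are assumptions made for illustration.

```shell
work=$(mktemp -d)
cd "$work"

# Seed a repository and publish it as a bare "server" (hypothetical stand-in).
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed config user.name 'Seed User'
git -C seed config user.email seed@example.com
echo 'v1' > seed/lib.rb
echo 'todo' > seed/TODO
git -C seed add .
git -C seed commit -q -m 'Initial commit'
git clone -q --bare seed server.git

git clone -q server.git john
git clone -q server.git jessica
git -C john config user.name 'John Smith'
git -C john config user.email jsmith@example.com
git -C jessica config user.name 'Jessica Smith'
git -C jessica config user.email jessica@example.com

# Both developers commit locally, touching different files.
echo 'v2' > john/lib.rb
git -C john commit -q -am 'Remove invalid default value'
echo 'add reset task' >> jessica/TODO
git -C jessica commit -q -am 'Add reset task'

# Jessica pushes first, so John's push is no longer a fast-forward.
git -C jessica push -q origin master
if git -C john push -q origin master 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi
echo "John's first push: $first_push"

# John fetches, merges locally, and pushes again.
git -C john fetch -q origin
git -C john merge -q --no-edit origin/master
git -C john push -q origin master
```

Even though the two commits touch different files, the first push from John’s clone is refused until he has merged locally.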

In the meantime, Jessica has been working on a topic branch. She’s created a topic branch called issue54 and done three commits on that branch. She hasn’t fetched John’s changes yet, so her commit history looks like Figure 5-7.

Figure 5-7. Jessica’s initial commit history.

Jessica wants to sync up with John, so she fetches:

# Jessica's Machine
$ git fetch origin
...
From jessica@githost:simplegit
   fbff5bc..72bbc59  master     -> origin/master

That pulls down the work John has pushed up in the meantime. Jessica’s history now looks like Figure 5-8.

Figure 5-8. Jessica’s history after fetching John’s changes.

Jessica thinks her topic branch is ready, but she wants to know what she has to merge into her work so that she can push. She runs git log to find out which commits are on origin/master but not on issue54:

$ git log --no-merges origin/master ^issue54
commit 738ee872852dfaa9d6634e0dea7a324040193016
Author: John Smith <jsmith@example.com>
Date:   Fri May 29 16:01:27 2009 -0700

    removed invalid default value
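The ^issue54 argument excludes every commit reachable from issue54, so the command lists only what is on origin/master and missing from the topic branch. The sketch below shows that this matches the issue54..origin/master range form; a local branch named theirs is a hypothetical stand-in for origin/master, and the files are invented for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Demo User'          # hypothetical identity
git config user.email demo@example.com

echo base > file.txt
git add file.txt
git commit -q -m 'Base commit'
base_branch=$(git symbolic-ref --short HEAD)

# Stand-in for origin/master: one commit the topic branch does not have.
git checkout -q -b theirs
echo theirs > theirs.txt
git add theirs.txt
git commit -q -m 'Remove invalid default value'

# Stand-in for issue54: two commits of its own.
git checkout -q -b issue54 "$base_branch"
echo one > mine.txt
git add mine.txt
git commit -q -m 'First topic commit'
echo two >> mine.txt
git commit -q -am 'Second topic commit'

caret_form=$(git log --no-merges --format=%s theirs ^issue54)
dotdot_form=$(git log --no-merges --format=%s issue54..theirs)
echo "$caret_form"
```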

Now, Jessica can merge her topic work into her master branch, merge John’s work (origin/master) into her master branch, and then push back to the server again. First, she switches back to her master branch to integrate all this work:

$ git checkout master
Switched to branch "master"
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.

She can merge either origin/master or issue54 first — they’re both upstream, so the order doesn’t matter. The end snapshot should be identical no matter which order she chooses; only the history will be slightly different. She chooses to merge in issue54 first:

$ git merge issue54
Updating fbff5bc..4af4298
Fast forward
 README           |    1 +
 lib/simplegit.rb |    6 +++++-
 2 files changed, 6 insertions(+), 1 deletions(-)

No problems occur; as you can see, it was a simple fast-forward. Now Jessica merges in John’s work (origin/master):

$ git merge origin/master
Auto-merging lib/simplegit.rb
Merge made by recursive.
 lib/simplegit.rb |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Everything merges cleanly, and Jessica’s history looks like Figure 5-9.

Figure 5-9. Jessica’s history after merging John’s changes.

Now origin/master is reachable from Jessica’s master branch, so she should be able to successfully push (assuming John hasn’t pushed again in the meantime):

$ git push origin master
...
To jessica@githost:simplegit.git
   72bbc59..8059c15  master -> master

Each developer has committed a few times and merged each other’s work successfully; see Figure 5-10.

Figure 5-10. Jessica’s history after pushing all changes back to the server.

That is one of the simplest workflows. You work for a while, generally in a topic branch, and merge into your master branch when it’s ready to be integrated. When you want to share that work, you merge it into your own master branch, then fetch and merge origin/master if it has changed, and finally push to the master branch on the server. The general sequence is something like that shown in Figure 5-11.

Figure 5-11. General sequence of events for a simple multiple-developer Git workflow.

Private Managed Team

In this next scenario, you’ll look at contributor roles in a larger private group. You’ll learn how to work in an environment where small groups collaborate on features and then those team-based contributions are integrated by another party.

Let’s say that John and Jessica are working together on one feature, while Jessica and Josie are working on a second. In this case, the company is using a type of integration-manager workflow where the work of the individual groups is integrated only by certain engineers, and the master branch of the main repo can be updated only by those engineers. In this scenario, all work is done in team-based branches and pulled together by the integrators later.

Let’s follow Jessica’s workflow as she works on her two features, collaborating in parallel with two different developers in this environment. Assuming she already has her repository cloned, she decides to work on featureA first. She creates a new branch for the feature and does some work on it there:

# Jessica's Machine
$ git checkout -b featureA
Switched to a new branch "featureA"
$ vim lib/simplegit.rb
$ git commit -am 'add limit to log function'
[featureA 3300904] add limit to log function
 1 files changed, 1 insertions(+), 1 deletions(-)

At this point, she needs to share her work with John, so she pushes her featureA branch commits up to the server. Jessica doesn’t have push access to the master branch — only the integrators do — so she has to push to another branch in order to collaborate with John:

$ git push origin featureA
...
To jessica@githost:simplegit.git
 * [new branch]      featureA -> featureA

Jessica e-mails John to tell him that she’s pushed some work into a branch named featureA and he can look at it now. While she waits for feedback from John, Jessica decides to start working on featureB with Josie. To begin, she starts a new feature branch, basing it off the server’s master branch:

# Jessica's Machine
$ git fetch origin
$ git checkout -b featureB origin/master
Switched to a new branch "featureB"

Now, Jessica makes a couple of commits on the featureB branch:

$ vim lib/simplegit.rb
$ git commit -am 'made the ls-tree function recursive'
[featureB e5b0fdc] made the ls-tree function recursive
 1 files changed, 1 insertions(+), 1 deletions(-)
$ vim lib/simplegit.rb
$ git commit -am 'add ls-files'
[featureB 8512791] add ls-files
 1 files changed, 5 insertions(+), 0 deletions(-)

Jessica’s repository looks like Figure 5-12.

Figure 5-12. Jessica’s initial commit history.

She’s ready to push up her work, but gets an e-mail from Josie that a branch with some initial work on it was already pushed to the server as featureBee. Jessica needs to merge those changes in with her own before she can push to the server. First, she fetches Josie’s changes with git fetch:

$ git fetch origin
...
From jessica@githost:simplegit
 * [new branch]      featureBee -> origin/featureBee

Jessica can now merge this into the work she did with git merge:

$ git merge origin/featureBee
Auto-merging lib/simplegit.rb
Merge made by recursive.
 lib/simplegit.rb |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

There is a bit of a problem — she needs to push the merged work in her featureB branch to the featureBee branch on the server. She can do so by specifying the local branch followed by a colon (:) followed by the remote branch to the git push command:

$ git push origin featureB:featureBee
...
To jessica@githost:simplegit.git
   fba9af8..cd685d1  featureB -> featureBee

This is called a refspec. See Chapter 9 for a more detailed discussion of Git refspecs and different things you can do with them.
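Here is a small sketch of the local:remote form, assuming a bare repository on local disk as a stand-in for the server. The paths, the lib.rb file, and the identity are hypothetical.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git             # hypothetical stand-in for the server
git init -q jessica
cd jessica
git config user.name 'Jessica Smith'
git config user.email jessica@example.com
git remote add origin "$work/server.git"

echo 'ls-tree' > lib.rb
git add lib.rb
git commit -q -m 'Add ls-tree helper'
git checkout -q -b featureB

# Push the local featureB branch to a remote branch named featureBee.
git push -q origin featureB:featureBee

git ls-remote --heads origin
```

The server ends up with a featureBee branch and no featureB branch, because the name after the colon decides where the commits land.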

Next, John e-mails Jessica to say he’s pushed some changes to the featureA branch and ask her to verify them. She runs a git fetch to pull down those changes:

$ git fetch origin
...
From jessica@githost:simplegit
   3300904..aad881d  featureA   -> origin/featureA

Then, she can see what has been changed with git log:

$ git log origin/featureA ^featureA
commit aad881d154acdaeb2b6b18ea0e827ed8a6d671e6
Author: John Smith <jsmith@example.com>
Date:   Fri May 29 19:57:33 2009 -0700

    changed log output to 30 from 25

Finally, she merges John’s work into her own featureA branch:

$ git checkout featureA
Switched to branch "featureA"
$ git merge origin/featureA
Updating 3300904..aad881d
Fast forward
 lib/simplegit.rb |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

Jessica wants to tweak something, so she commits again and then pushes this back up to the server:

$ git commit -am 'small tweak'
[featureA ed774b3] small tweak
 1 files changed, 1 insertions(+), 1 deletions(-)
$ git push origin featureA
...
To jessica@githost:simplegit.git
   aad881d..ed774b3  featureA -> featureA

Jessica’s commit history now looks something like Figure 5-13.

Figure 5-13. Jessica’s history after committing on a feature branch.

Jessica, Josie, and John inform the integrators that the featureA and featureBee branches on the server are ready for integration into the mainline. After they integrate these branches into the mainline, a fetch will bring down the new merge commits, making the commit history look like Figure 5-14.

Figure 5-14. Jessica’s history after merging both her topic branches.

Many groups switch to Git because of this ability to have multiple teams working in parallel, merging the different lines of work late in the process. The ability of smaller subgroups of a team to collaborate via remote branches without necessarily having to involve or impede the entire team is a huge benefit of Git. The sequence for the workflow you saw here is something like Figure 5-15.

Figure 5-15. Basic sequence of this managed-team workflow.

Public Small Project

Contributing to public projects is a bit different. Because you don’t have the permissions to directly update branches on the project, you have to get the work to the maintainers some other way. This first example describes contributing via forking on Git hosts that support easy forking. The repo.or.cz and GitHub hosting sites both support this, and many project maintainers expect this style of contribution. The next section deals with projects that prefer to accept contributed patches via e-mail.

First, you’ll probably want to clone the main repository, create a topic branch for the patch or patch series you’re planning to contribute, and do your work there. The sequence looks basically like this:

$ git clone (url)
$ cd project
$ git checkout -b featureA
$ (work)
$ git commit
$ (work)
$ git commit

You may want to use rebase -i to squash your work down to a single commit, or rearrange the work in the commits to make the patch easier for the maintainer to review — see Chapter 6 for more information about interactive rebasing.

When your branch work is finished and you’re ready to contribute it back to the maintainers, go to the original project page and click the "Fork" button, creating your own writable fork of the project. You then need to add in this new repository URL as a second remote, in this case named myfork:

$ git remote add myfork (url)

You need to push your work up to it. It’s easiest to push the topic branch you’re working on up to your repository, rather than merging into your master branch and pushing that up. The reason is that if the work isn’t accepted or is cherry-picked, you don’t have to rewind your master branch. If the maintainers merge, rebase, or cherry-pick your work, you’ll eventually get it back via pulling from their repository anyhow:

$ git push myfork featureA

When your work has been pushed up to your fork, you need to notify the maintainer. This is often called a pull request, and you can either generate it via the website — GitHub has a "pull request" button that automatically messages the maintainer — or run the git request-pull command and e-mail the output to the project maintainer manually.

The request-pull command takes the base branch into which you want your topic branch pulled and the Git repository URL you want them to pull from, and outputs a summary of all the changes you’re asking to be pulled in. For instance, if Jessica wants to send John a pull request, and she’s done two commits on the topic branch she just pushed up, she can run this:

$ git request-pull origin/master myfork
The following changes since commit 1edee6b1d61823a2de3b09c160d7080b8d1b3a40:
  John Smith (1):
        added a new function

are available in the git repository at:

  git://githost/simplegit.git featureA

Jessica Smith (2):
      add limit to log function
      change log output to 30 from 25

 lib/simplegit.rb |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

The output can be sent to the maintainer: it tells them where the work was branched from, summarizes the commits, and tells where to pull this work from.

On a project for which you’re not the maintainer, it’s generally easier to have a branch like master always track origin/master and to do your work in topic branches that you can easily discard if they’re rejected. Having work themes isolated into topic branches also makes it easier for you to rebase your work if the tip of the main repository has moved in the meantime and your commits no longer apply cleanly. For example, if you want to submit a second topic of work to the project, don’t continue working on the topic branch you just pushed up — start over from the main repository’s master branch:

$ git checkout -b featureB origin/master
$ (work)
$ git commit
$ git push myfork featureB
$ (email maintainer)
$ git fetch origin

Now, each of your topics is contained within a silo (similar to a patch queue) that you can rewrite, rebase, and modify without the topics interfering with or depending on each other, as in Figure 5-16.

Figure 5-16. Initial commit history with featureB work.

Let’s say the project maintainer has pulled in a bunch of other patches and tried your first branch, but it no longer cleanly merges. In this case, you can try to rebase that branch on top of origin/master, resolve the conflicts for the maintainer, and then resubmit your changes:

$ git checkout featureA
$ git rebase origin/master
$ git push -f myfork featureA

This rewrites your history to now look like Figure 5-17.

Figure 5-17. Commit history after featureA work.

Because you rebased the branch, you have to specify the -f option to your push command in order to be able to replace the featureA branch on the server with a commit that isn’t a descendant of it. An alternative would be to push this new work to a different branch on the server (perhaps called featureAv2).
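To see why the forced push is needed, the following sketch rebuilds the situation locally. The upstream directory plays the main project and fork.git plays your writable fork; both names, the files, and the identities are assumptions made for this example.

```shell
work=$(mktemp -d)
cd "$work"

# upstream plays the main project; fork.git plays your writable fork.
git init -q upstream
git -C upstream config user.name 'Maintainer'
git -C upstream config user.email maintainer@example.com
echo 'v1' > upstream/core.txt
git -C upstream add core.txt
git -C upstream commit -q -m 'Initial commit'
main_branch=$(git -C upstream symbolic-ref --short HEAD)

git clone -q upstream mine
git clone -q --bare upstream fork.git
cd mine
git config user.name 'Demo User'          # hypothetical identity
git config user.email demo@example.com
git remote add myfork "$work/fork.git"

git checkout -q -b featureA
echo 'feature' > feature.txt
git add feature.txt
git commit -q -m 'Add feature file'
git push -q myfork featureA

# The main project moves on, so featureA is rebased onto the new tip.
echo 'v2' > "$work/upstream/core.txt"
git -C "$work/upstream" commit -q -am 'Update core'
git fetch -q origin
git rebase -q "origin/$main_branch"

# The rebased commit is not a descendant of the one already pushed.
if git push -q myfork featureA 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi
git push -q -f myfork featureA
echo "plain push: $plain_push"
```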

Let’s look at one more possible scenario: the maintainer has looked at work in your second branch and likes the concept but would like you to change an implementation detail. You’ll also take this opportunity to move the work to be based off the project’s current master branch. You start a new branch based off the current origin/master branch, squash the featureB changes there, resolve any conflicts, make the implementation change, and then push that up as a new branch:

$ git checkout -b featureBv2 origin/master
$ git merge --no-commit --squash featureB
$ (change implementation)
$ git commit
$ git push myfork featureBv2

The --squash option takes all the work on the merged branch and squashes it into one non-merge commit on top of the branch you’re on. The --no-commit option tells Git not to automatically record a commit. This allows you to introduce all the changes from another branch and then make more changes before recording the new commit.
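A minimal sketch of the squash workflow follows, assuming a throwaway repository in which the first commit stands in for origin/master. The branch contents and messages are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Demo User'          # hypothetical identity
git config user.email demo@example.com

echo base > app.txt
git add app.txt
git commit -q -m 'Base commit'
base_branch=$(git symbolic-ref --short HEAD)

# featureB carries two commits.
git checkout -q -b featureB
echo 'step one' >> app.txt
git commit -q -am 'First step'
echo 'step two' >> app.txt
git commit -q -am 'Second step'

# Start a fresh branch and bring the combined changes in without committing.
git checkout -q -b featureBv2 "$base_branch"
git merge -q --no-commit --squash featureB
echo 'requested tweak' >> app.txt         # the extra implementation change
git commit -q -am 'Add feature B with requested tweak'

parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "parents of new commit: $parents"
```

The new branch holds one ordinary commit with a single parent, containing both of the featureB steps plus the tweak.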

Now you can send the maintainer a message that you’ve made the requested changes and they can find those changes in your featureBv2 branch (see Figure 5-18).

Figure 5-18. Commit history after featureBv2 work.

Public Large Project

Many larger projects have established procedures for accepting patches — you’ll need to check the specific rules for each project, because they will differ. However, many larger public projects accept patches via a developer mailing list, so I’ll go over an example of that now.

The workflow is similar to the previous use case — you create topic branches for each patch series you work on. The difference is how you submit them to the project. Instead of forking the project and pushing to your own writable version, you generate e-mail versions of each commit series and e-mail them to the developer mailing list:

$ git checkout -b topicA
$ (work)
$ git commit
$ (work)
$ git commit

Now you have two commits that you want to send to the mailing list. You use git format-patch to generate the mbox-formatted files that you can e-mail to the list — it turns each commit into an e-mail message with the first line of the commit message as the subject and the rest of the message plus the patch that the commit introduces as the body. The nice thing about this is that applying a patch from an e-mail generated with format-patch preserves all the commit information properly, as you’ll see more of in the next section when you apply these commits:

$ git format-patch -M origin/master
0001-add-limit-to-log-function.patch
0002-changed-log-output-to-30-from-25.patch

The format-patch command prints out the names of the patch files it creates. The -M switch tells Git to look for renames. The files end up looking like this:

$ cat 0001-add-limit-to-log-function.patch 
From 330090432754092d704da8e76ca5c05c198e71a8 Mon Sep 17 00:00:00 2001
From: Jessica Smith <jessica@example.com>
Date: Sun, 6 Apr 2008 10:17:23 -0700
Subject: [PATCH 1/2] add limit to log function

Limit log functionality to the first 20

---
 lib/simplegit.rb |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/simplegit.rb b/lib/simplegit.rb
index 76f47bc..f9815f1 100644
--- a/lib/simplegit.rb
+++ b/lib/simplegit.rb
@@ -14,7 +14,7 @@ class SimpleGit
   end

   def log(treeish = 'master')
-    command("git log #{treeish}")
+    command("git log -n 20 #{treeish}")
   end

   def ls_tree(treeish = 'master')
-- 
1.6.2.rc1.20.g8c5b.dirty

You can also edit these patch files to add more information for the e-mail list that you don’t want to show up in the commit message. If you add text between the --- line and the beginning of the patch (the diff --git line), then developers can read it; but applying the patch excludes it.
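To try format-patch without a remote, you can name any commit as the starting point. In this sketch the first commit stands in for origin/master, and the -o option writes the files to a patches directory; the repository, file, and directory names are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Jessica Smith'
git config user.email jessica@example.com

echo 'log' > lib.rb
git add lib.rb
git commit -q -m 'Initial commit'
base=$(git rev-parse HEAD)                # stand-in for origin/master

git checkout -q -b topicA
echo 'limit 20' >> lib.rb
git commit -q -am 'Add limit to log function'
echo 'limit 30' >> lib.rb
git commit -q -am 'Change log output to 30 from 25'

# One mbox-formatted file per commit since the base, written to ./patches.
git format-patch -M -o patches "$base"
grep '^Subject:' patches/0001-*.patch
```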

To e-mail this to a mailing list, you can either paste the file into your e-mail program or send it via a command-line program. Pasting the text often causes formatting issues, especially with "smarter" clients that don’t preserve newlines and other whitespace appropriately. Luckily, Git provides a tool to help you send properly formatted patches via IMAP, which may be easier for you. I’ll demonstrate how to send a patch via Gmail, which happens to be the e-mail agent I use; you can read detailed instructions for a number of mail programs at the end of the aforementioned Documentation/SubmittingPatches file in the Git source code.

First, you need to set up the imap section in your ~/.gitconfig file. You can set each value separately with a series of git config commands, or you can add them manually; but in the end, your config file should look something like this:

[imap]
  folder = "[Gmail]/Drafts"
  host = imaps://imap.gmail.com
  user = user@gmail.com
  pass = p4ssw0rd
  port = 993
  sslverify = false

If your IMAP server doesn’t use SSL, the last two lines probably aren’t necessary, and the host value will be imap:// instead of imaps://. The imap section is read by git imap-send, which places patch files in the Drafts folder of the specified IMAP server (for example, cat *.patch | git imap-send). The git send-email command is a separate tool that sends the messages itself, through a local sendmail program or an SMTP server, and prompts for anything it needs to know:

$ git send-email *.patch
0001-add-limit-to-log-function.patch
0002-changed-log-output-to-30-from-25.patch
Who should the emails appear to be from? [Jessica Smith <jessica@example.com>] 
Emails will be sent from: Jessica Smith <jessica@example.com>
Who should the emails be sent to? jessica@example.com
Message-ID to be used as In-Reply-To for the first email? y

Then, Git spits out a bunch of log information looking something like this for each patch you’re sending:

(mbox) Adding cc: Jessica Smith <jessica@example.com> from line 'From: Jessica Smith <jessica@example.com>'
OK. Log says:
Sendmail: /usr/sbin/sendmail -i jessica@example.com
From: Jessica Smith <jessica@example.com>
To: jessica@example.com
Subject: [PATCH 1/2] add limit to log function
Date: Sat, 30 May 2009 13:29:15 -0700
Message-Id: <1243715356-61726-1-git-send-email-jessica@example.com>
X-Mailer: git-send-email 1.6.2.rc1.20.g8c5b.dirty
In-Reply-To: <y>
References: <y>

Result: OK

If you placed the series in your Drafts folder with git imap-send instead, you can open it there, change the To field to the mailing list you’re sending the patch to, possibly CC the maintainer or person responsible for that section, and send it off.
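The imap settings shown earlier can also be written one value at a time with git config. The sketch below writes to a scratch file through the --file option so that nothing in your real configuration changes; for your own setup you would likely use --global instead. The values are the sample ones from the config listing, and the password is left out on purpose.

```shell
# Write to a scratch file instead of ~/.gitconfig so nothing real is modified.
cfg=$(mktemp)
git config --file "$cfg" imap.folder '[Gmail]/Drafts'
git config --file "$cfg" imap.host imaps://imap.gmail.com
git config --file "$cfg" imap.user user@gmail.com
git config --file "$cfg" imap.port 993
git config --file "$cfg" imap.sslverify false

# Read the values back to confirm what was stored.
git config --file "$cfg" --list
imap_host=$(git config --file "$cfg" --get imap.host)
```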

Summary

This section has covered a number of common workflows for dealing with several very different types of Git projects you’re likely to encounter and introduced a couple of new tools to help you manage this process. Next, you’ll see how to work the other side of the coin: maintaining a Git project. You’ll learn how to be a benevolent dictator or integration manager.

Maintaining a Project

In addition to knowing how to effectively contribute to a project, you’ll likely need to know how to maintain one. This can consist of accepting and applying patches generated via format-patch and e-mailed to you, or integrating changes in remote branches for repositories you’ve added as remotes to your project. Whether you maintain a canonical repository or want to help by verifying or approving patches, you need to know how to accept work in a way that is clearest for other contributors and sustainable by you over the long run.

Working in Topic Branches

When you’re thinking of integrating new work, it’s generally a good idea to try it out in a topic branch — a temporary branch specifically made to try out that new work. This way, it’s easy to tweak a patch individually and leave it if it’s not working until you have time to come back to it. If you create a simple branch name based on the theme of the work you’re going to try, such as ruby_client or something similarly descriptive, you can easily remember it if you have to abandon it for a while and come back later. The maintainer of the Git project tends to namespace these branches as well — such as sc/ruby_client, where sc is short for the person who contributed the work. As you’ll remember, you can create the branch based off your master branch like this:

$ git branch sc/ruby_client master

Or, if you want to also switch to it immediately, you can use git checkout -b:

$ git checkout -b sc/ruby_client master

Now you’re ready to add your contributed work into this topic branch and determine if you want to merge it into your longer-term branches.

Applying Patches from E-mail

If you receive a patch over e-mail that you need to integrate into your project, you need to apply the patch in your topic branch to evaluate it. There are two ways to apply an e-mailed patch: with git apply or with git am.

Applying a Patch with apply

If you received the patch from someone who generated it with the git diff or a Unix diff command, you can apply it with the git apply command. Assuming you saved the patch at /tmp/patch-ruby-client.patch, you can apply the patch like this:

$ git apply /tmp/patch-ruby-client.patch

This modifies the files in your working directory. It’s almost identical to running a patch -p1 command to apply the patch, although it’s more paranoid and accepts fewer fuzzy matches than patch. It also handles file adds, deletes, and renames if they’re described in the git diff format, which patch won’t do. Finally, git apply is an "apply all or abort all" model where either everything is applied or nothing is, whereas patch can partially apply patchfiles, leaving your working directory in a weird state. It won’t create a commit for you: after running it, you must manually stage and commit the changes it introduced.

You can also use git apply to see if a patch applies cleanly before you try actually applying it — you can run git apply --check with the patch:

$ git apply --check 0001-seeing-if-this-helps-the-gem.patch 
error: patch failed: ticgit.gemspec:1
error: ticgit.gemspec: patch does not apply

If there is no output, then the patch should apply cleanly. This command also exits with a non-zero status if the check fails, so you can use it in scripts if you want.
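A script can branch on that exit status. This sketch, with a hypothetical gemspec.txt file and a patch produced by git diff, checks a patch, applies it, and then shows that the same patch no longer applies.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Demo User'          # hypothetical identity
git config user.email demo@example.com

echo 'version 1' > gemspec.txt
git add gemspec.txt
git commit -q -m 'Add gemspec'

# Produce a plain diff, then restore the file so the patch has something to apply to.
echo 'version 2' > gemspec.txt
patch_file=$(mktemp)
git diff > "$patch_file"
git checkout -q -- gemspec.txt

if git apply --check "$patch_file"; then first_check=ok; else first_check=failed; fi
git apply "$patch_file"

# The same patch no longer matches once it has been applied.
if git apply --check "$patch_file" 2>/dev/null; then second_check=ok; else second_check=failed; fi
echo "before applying: $first_check, after applying: $second_check"
```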

Applying a Patch with am

If the contributor is a Git user and was good enough to use the format-patch command to generate their patch, then your job is easier because the patch contains author information and a commit message for you. If you can, encourage your contributors to use format-patch instead of diff to generate patches for you. You should only have to use git apply for legacy patches and things like that.

To apply a patch generated by format-patch, you use git am. Technically, git am is built to read an mbox file, which is a simple, plain-text format for storing one or more e-mail messages in one text file. It looks something like this:

From 330090432754092d704da8e76ca5c05c198e71a8 Mon Sep 17 00:00:00 2001
From: Jessica Smith <jessica@example.com>
Date: Sun, 6 Apr 2008 10:17:23 -0700
Subject: [PATCH 1/2] add limit to log function

Limit log functionality to the first 20

This is the beginning of the output of the format-patch command that you saw in the previous section. This is also a valid mbox e-mail format. If someone has e-mailed you the patch properly using git send-email, and you download that into an mbox format, then you can point git am to that mbox file, and it will start applying all the patches it sees. If you run a mail client that can save several e-mails out in mbox format, you can save entire patch series into a file and then use git am to apply them one at a time.

However, if someone uploaded a patch file generated via format-patch to a ticketing system or something similar, you can save the file locally and then pass that file saved on your disk to git am to apply it:

$ git am 0001-limit-log-function.patch 
Applying: add limit to log function

You can see that it applied cleanly and automatically created the new commit for you. The author information is taken from the e-mail’s From and Date headers, and the message of the commit is taken from the Subject and body (before the patch) of the e-mail. For example, if this patch was applied from the mbox example I just showed, the commit generated would look something like this:

$ git log --pretty=fuller -1
commit 6c5e70b984a60b3cecd395edd5b48a7575bf58e0
Author:     Jessica Smith <jessica@example.com>
AuthorDate: Sun Apr 6 10:17:23 2008 -0700
Commit:     Scott Chacon <schacon@gmail.com>
CommitDate: Thu Apr 9 09:19:06 2009 -0700

   add limit to log function

   Limit log functionality to the first 20

The Commit information indicates the person who applied the patch and the time it was applied. The Author information is the individual who originally created the patch and when it was originally created.
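The author and committer split can be checked with a local round trip. In this sketch one repository exports a commit with format-patch and a second repository, configured with a different identity, applies it with git am. The directory names and the "Maintainer User" identity are hypothetical.

```shell
work=$(mktemp -d)
cd "$work"

# Contributor side: create a commit and export it with format-patch.
git init -q contributor
cd contributor
git config user.name 'Jessica Smith'
git config user.email jessica@example.com
echo 'log' > lib.rb
git add lib.rb
git commit -q -m 'Initial commit'
echo 'limit 20' >> lib.rb
git commit -q -am 'Add limit to log function'
git format-patch -q -1 -o "$work/outbox" HEAD
cd "$work"

# Maintainer side: same starting point, different identity.
git clone -q contributor maintainer
cd maintainer
git reset -q --hard HEAD~1                # drop the commit so the patch can be applied
git config user.name 'Maintainer User'    # hypothetical identity
git config user.email maintainer@example.com
git am -q "$work"/outbox/0001-*.patch

author=$(git log -1 --format=%an)
committer=$(git log -1 --format=%cn)
echo "author: $author, committer: $committer"
```

The author comes from the patch’s From header, while the committer is whoever ran git am.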

But it’s possible that the patch won’t apply cleanly. Perhaps your main branch has diverged too far from the branch the patch was built from, or the patch depends on another patch you haven’t applied yet. In that case, the git am process will fail and ask you what you want to do:

$ git am 0001-seeing-if-this-helps-the-gem.patch 
Applying: seeing if this helps the gem
error: patch failed: ticgit.gemspec:1
error: ticgit.gemspec: patch does not apply
Patch failed at 0001.
When you have resolved this problem run "git am --resolved".
If you would prefer to skip this patch, instead run "git am --skip".
To restore the original branch and stop patching run "git am --abort".

This command puts conflict markers in any files it has issues with, much like a conflicted merge or rebase operation. You solve this issue much the same way — edit the file to resolve the conflict, stage the new file, and then run git am --resolved to continue to the next patch:

$ (fix the file)
$ git add ticgit.gemspec 
$ git am --resolved
Applying: seeing if this helps the gem

If you want Git to try a bit more intelligently to resolve the conflict, you can pass a -3 option to it, which makes Git attempt a three-way merge. This option isn’t on by default because it doesn’t work if the commit the patch says it was based on isn’t in your repository. If you do have that commit — if the patch was based on a public commit — then the -3 option is generally much smarter about applying a conflicting patch:

$ git am -3 0001-seeing-if-this-helps-the-gem.patch 
Applying: seeing if this helps the gem
error: patch failed: ticgit.gemspec:1
error: ticgit.gemspec: patch does not apply
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...
No changes -- Patch already applied.

In this case, I was trying to apply a patch I had already applied. Without the -3 option, it looks like a conflict.

If you’re applying a number of patches from an mbox, you can also run the am command in interactive mode, which stops at each patch it finds and asks if you want to apply it:

$ git am -3 -i mbox
Commit Body is:
--------------------------
seeing if this helps the gem
--------------------------
Apply? [y]es/[n]o/[e]dit/[v]iew patch/[a]ccept all

This is nice if you have a number of patches saved, because you can view the patch first if you don’t remember what it is, or not apply the patch if you’ve already done so.

When all the patches for your topic are applied and committed into your branch, you can choose whether and how to integrate them into a longer-running branch.

Checking Out Remote Branches

If your contribution came from a Git user who set up their own repository, pushed a number of changes into it, and then sent you the URL to the repository and the name of the remote branch the changes are in, you can add them as a remote and do merges locally.

For instance, if Jessica sends you an e-mail saying that she has a great new feature in the ruby-client branch of her repository, you can test it by adding the remote and checking out that branch locally:

$ git remote add jessica git://github.com/jessica/myproject.git
$ git fetch jessica
$ git checkout -b rubyclient jessica/ruby-client

If she e-mails you again later with another branch containing another great feature, you can fetch and check out because you already have the remote set up.

This is most useful if you’re working with a person consistently. If someone only has a single patch to contribute once in a while, then accepting it over e-mail may be less time consuming than requiring everyone to run their own server and having to continually add and remove remotes to get a few patches. You’re also unlikely to want to have hundreds of remotes, each for someone who contributes only a patch or two. However, scripts and hosted services may make this easier — it depends largely on how you develop and how your contributors develop.

The other advantage of this approach is that you get the history of the commits as well. Although you may have legitimate merge issues, you know where in your history their work is based; a proper three-way merge is the default rather than having to supply a -3 and hope the patch was generated off a public commit to which you have access.

If you aren’t working with a person consistently but still want to pull from them in this way, you can provide the URL of the remote repository to the git pull command. This does a one-time pull and doesn’t save the URL as a remote reference:

$ git pull git://github.com/onetimeguy/project.git
From git://github.com/onetimeguy/project
 * branch            HEAD       -> FETCH_HEAD
Merge made by recursive.

Determining What Is Introduced

Now you have a topic branch that contains contributed work. At this point, you can determine what you’d like to do with it. This section revisits a couple of commands so you can see how you can use them to review exactly what you’ll be introducing if you merge this into your main branch.

It’s often helpful to get a review of all the commits that are in this branch but that aren’t in your master branch. You can exclude commits in the master branch by adding the --not option before the branch name. For example, if your contributor sends you two patches and you create a branch called contrib and apply those patches there, you can run this:

$ git log contrib --not master
commit 5b6235bd297351589efc4d73316f0a68d484f118
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri Oct 24 09:53:59 2008 -0700

    seeing if this helps the gem

commit 7482e0d16d04bea79d0dba8988cc78df655f16a0
Author: Scott Chacon <schacon@gmail.com>
Date:   Mon Oct 22 19:38:36 2008 -0700

    updated the gemspec to hopefully work better

To see what changes each commit introduces, remember that you can pass the -p option to git log and it will append the diff introduced to each commit.

To see a full diff of what would happen if you were to merge this topic branch with another branch, you may have to use a weird trick to get the correct results. You may think to run this:

$ git diff master

This command gives you a diff, but it may be misleading. If your master branch has moved forward since you created the topic branch from it, then you’ll get seemingly strange results. This happens because Git directly compares the snapshots of the last commit of the topic branch you’re on and the snapshot of the last commit on the master branch. For example, if you’ve added a line in a file on the master branch, a direct comparison of the snapshots will look like the topic branch is going to remove that line.

If master is a direct ancestor of your topic branch, this isn’t a problem; but if the two histories have diverged, the diff will look like you’re adding all the new stuff in your topic branch and removing everything unique to the master branch.

What you really want to see are the changes added to the topic branch — the work you’ll introduce if you merge this branch with master. You do that by having Git compare the last commit on your topic branch with the first common ancestor it has with the master branch.

Technically, you can do that by explicitly figuring out the common ancestor and then running your diff on it:

$ git merge-base contrib master
36c7dba2c95e6bbb78dfa822519ecfec6e1ca649
$ git diff 36c7db

However, that isn’t convenient, so Git provides another shorthand for doing the same thing: the triple-dot syntax. In the context of the diff command, you can put three periods after another branch to do a diff between the last commit of the branch you’re on and its common ancestor with another branch:

$ git diff master...contrib

This command shows you only the work your current topic branch has introduced since its common ancestor with master. That is a very useful syntax to remember.
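
To see the difference between the direct comparison and the triple-dot form, here is a small sketch in a scratch repository. The branch contents and file names (shared.txt, topic.txt, main.txt) are hypothetical; only the git diff and git merge-base invocations come from the text.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"

echo "shared" > shared.txt
git add shared.txt
git commit -q -m "initial commit"

# Topic branch adds one file.
git checkout -q -b contrib
echo "topic work" > topic.txt
git add topic.txt
git commit -q -m "add topic work"

# Meanwhile master moves forward with its own file.
git checkout -q master
echo "mainline work" > main.txt
git add main.txt
git commit -q -m "add mainline work"

# Direct snapshot comparison: main.txt looks like it would be deleted.
direct=$(git diff --name-status master contrib)

# Comparing against the common ancestor, explicitly and with triple-dot.
base=$(git merge-base contrib master)
explicit=$(git diff --name-status "$base" contrib)
tripledot=$(git diff --name-status master...contrib)

echo "direct:";     echo "$direct"
echo "triple-dot:"; echo "$tripledot"
```

The direct diff lists both the deletion of main.txt and the addition of topic.txt, while the two ancestor-based forms list only the file the topic branch introduced.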

Integrating Contributed Work

When all the work in your topic branch is ready to be integrated into a more mainline branch, the question is how to do it. Furthermore, what overall workflow do you want to use to maintain your project? You have a number of choices, so I’ll cover a few of them.

Merging Workflows

One simple workflow merges your work into your master branch. In this scenario, you have a master branch that contains basically stable code. When you have work in a topic branch that you’ve done or that someone has contributed and you’ve verified, you merge it into your master branch, delete the topic branch, and then continue the process. If you have a repository with work in two branches named ruby_client and php_client that looks like Figure 5-19, and you merge ruby_client first and then php_client, your history will end up looking like Figure 5-20.

Figure 5-19. History with several topic branches.

Figure 5-20. After a topic branch merge.

That is probably the simplest workflow, but it’s problematic if you’re dealing with larger repositories or projects.

If you have more developers or a larger project, you’ll probably want to use at least a two-phase merge cycle. In this scenario, you have two long-running branches, master and develop, in which you determine that master is updated only when a very stable release is cut and all new code is integrated into the develop branch. You regularly push both of these branches to the public repository. Each time you have a new topic branch to merge in (Figure 5-21), you merge it into develop (Figure 5-22); then, when you tag a release, you fast-forward master to wherever the now-stable develop branch is (Figure 5-23).

Figure 5-21. Before a topic branch merge.

Figure 5-22. After a topic branch merge.

Figure 5-23. After a topic branch release.

This way, when people clone your project’s repository, they can either check out master to build the latest stable version and keep up to date on that easily, or they can check out develop, which is the more cutting-edge stuff. You can also continue this concept, having an integrate branch where all the work is merged together. Then, when the codebase on that branch is stable and passes tests, you merge it into a develop branch; and when that has proven itself stable for a while, you fast-forward your master branch.

Large-Merging Workflows

The Git project has four long-running branches: master, next, and pu (proposed updates) for new work, and maint for maintenance backports. When new work is introduced by contributors, it’s collected into topic branches in the maintainer’s repository in a manner similar to what I’ve described (see Figure 5-24). At this point, the topics are evaluated to determine whether they’re safe and ready for consumption or whether they need more work. If they’re safe, they’re merged into next, and that branch is pushed up so everyone can try the topics integrated together.

Figure 5-24. Managing a complex series of parallel contributed topic branches.

If the topics still need work, they’re merged into pu instead. When it’s determined that they’re totally stable, the topics are re-merged into master, and the next and pu branches are then rebuilt to carry the topics that were in next but didn’t yet graduate to master. This means master almost always moves forward, next is rebased occasionally, and pu is rebased even more often (see Figure 5-25).

Figure 5-25. Merging contributed topic branches into long-term integration branches.

When a topic branch has finally been merged into master, it’s removed from the repository. The Git project also has a maint branch that is forked off from the last release to provide backported patches in case a maintenance release is required. Thus, when you clone the Git repository, you have four branches that you can check out to evaluate the project in different stages of development, depending on how cutting edge you want to be or how you want to contribute; and the maintainer has a structured workflow to help them vet new contributions.

Rebasing and Cherry Picking Workflows

Other maintainers prefer to rebase or cherry-pick contributed work on top of their master branch, rather than merging it in, to keep a mostly linear history. When you have work in a topic branch and have determined that you want to integrate it, you move to that branch and run the rebase command to rebuild the changes on top of your current master (or develop, and so on) branch. If that works well, you can fast-forward your master branch, and you’ll end up with a linear project history.

The other way to move introduced work from one branch to another is to cherry-pick it. A cherry-pick in Git is like a rebase for a single commit. It takes the patch that was introduced in a commit and tries to reapply it on the branch you’re currently on. This is useful if you have a number of commits on a topic branch and you want to integrate only one of them, or if you only have one commit on a topic branch and you’d prefer to cherry-pick it rather than run rebase. For example, suppose you have a project that looks like Figure 5-26.

Figure 5-26. Example history before a cherry pick.

If you want to pull commit e43a6 into your master branch, you can run

$ git cherry-pick e43a6fd3e94888d76779ad79fb568ed180e5fcdf
Finished one cherry-pick.
[master]: created a0a41a9: "More friendly message when locking the index fails."
 3 files changed, 17 insertions(+), 3 deletions(-)

This pulls the same change introduced in e43a6, but you get a new commit SHA-1 value, because the commit is applied at a different time and on top of a different parent. Now your history looks like Figure 5-27.

Figure 5-27. History after cherry-picking a commit on a topic branch.

Now you can remove your topic branch and drop the commits you didn’t want to pull in.
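
The cherry-pick flow can be tried end to end in a scratch repository. This is a sketch under assumed names: the topic branch, its two commits, and the files fix.txt and wip.txt are invented for illustration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "initial commit"

# Topic branch with one commit you want and one you do not.
git checkout -q -b topic
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "wanted fix"
wanted=$(git rev-parse HEAD)
echo "experiment" > wip.txt
git add wip.txt
git commit -q -m "unwanted experiment"

# master has moved on in the meantime.
git checkout -q master
echo "mainline" > main.txt
git add main.txt
git commit -q -m "mainline work"

# Reapply only the wanted commit on top of master.
git cherry-pick "$wanted" > /dev/null
picked=$(git rev-parse HEAD)

# Same change, different commit: patch IDs match, SHA-1 values do not.
wanted_patch=$(git show "$wanted" | git patch-id | cut -d' ' -f1)
picked_patch=$(git show "$picked" | git patch-id | cut -d' ' -f1)
echo "original: $wanted"
echo "picked:   $picked"

# Drop the topic branch along with the commit that was not picked.
git branch -q -D topic
```

git patch-id is used here only as a convenient way to confirm that the two commits carry the same diff; it is not part of the workflow itself.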

Tagging Your Releases

When you’ve decided to cut a release, you’ll probably want to drop a tag so you can re-create that release at any point going forward. You can create a new tag as I discussed in Chapter 2. If you decide to sign the tag as the maintainer, the tagging may look something like this:

$ git tag -s v1.5 -m 'my signed 1.5 tag'
You need a passphrase to unlock the secret key for
user: "Scott Chacon <schacon@gmail.com>"
1024-bit DSA key, ID F721C45A, created 2009-02-09

If you do sign your tags, you may have the problem of distributing the public PGP key used to sign your tags. The maintainer of the Git project has solved this issue by including their public key as a blob in the repository and then adding a tag that points directly to that content. To do this, you can figure out which key you want by running gpg --list-keys:

$ gpg --list-keys
/Users/schacon/.gnupg/pubring.gpg
---------------------------------
pub   1024D/F721C45A 2009-02-09 [expires: 2010-02-09]
uid                  Scott Chacon <schacon@gmail.com>
sub   2048g/45D02282 2009-02-09 [expires: 2010-02-09]

Then, you can directly import the key into the Git database by exporting it and piping that through git hash-object, which writes a new blob with those contents into Git and gives you back the SHA-1 of the blob:

$ gpg -a --export F721C45A | git hash-object -w --stdin
659ef797d181633c87ec71ac3f9ba29fe5775b92

Now that you have the contents of your key in Git, you can create a tag that points directly to it by specifying the new SHA-1 value that the hash-object command gave you:

$ git tag -a maintainer-pgp-pub 659ef797d181633c87ec71ac3f9ba29fe5775b92

If you run git push --tags, the maintainer-pgp-pub tag will be shared with everyone. If anyone wants to verify a tag, they can directly import your PGP key by pulling the blob directly out of the database and importing it into GPG:

$ git show maintainer-pgp-pub | gpg --import

They can use that key to verify all your signed tags. Also, if you include instructions in the tag message, running git show <tag> will let you give the end user more specific instructions about tag verification.

Generating a Build Number

Because Git doesn’t have monotonically increasing numbers like 'v123' or the equivalent to go with each commit, if you want to have a human-readable name to go with a commit, you can run git describe on that commit. Git gives you the name of the nearest tag with the number of commits on top of that tag and a partial SHA-1 value of the commit you’re describing:

$ git describe master
v1.6.2-rc1-20-g8c5b85c

This way, you can export a snapshot or build and name it something understandable to people. In fact, if you build Git from source code cloned from the Git repository, git --version gives you something that looks like this. If you’re describing a commit that you have directly tagged, it gives you the tag name.

The git describe command favors annotated tags (tags created with the -a or -s flag), so release tags should be created this way if you’re using git describe, to ensure the commit is named properly when described. You can also use this string as the target of a checkout or show command, although it relies on the abbreviated SHA-1 value at the end, so it may not be valid forever. For instance, the Linux kernel recently jumped from 8 to 10 characters to ensure SHA-1 object uniqueness, so older git describe output names were invalidated.
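
As a quick illustration of the two output shapes, the sketch below tags a commit and then adds two more on top. The tag name v1.0 and the empty commits are assumptions chosen to keep the example short.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"

git commit -q --allow-empty -m "release work"
git tag -a v1.0 -m "version 1.0"     # annotated tag, which git describe favors

on_tag=$(git describe master)        # exactly on the tag: just the tag name

git commit -q --allow-empty -m "fix 1"
git commit -q --allow-empty -m "fix 2"

after=$(git describe master)         # tag, commit count, then g + abbreviated SHA-1
echo "$on_tag"
echo "$after"
```

The second value has the form v1.0-2-g followed by an abbreviated SHA-1, which differs on every run because the commits are created fresh.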

Preparing a Release

Now you want to release a build. One of the things you’ll want to do is create an archive of the latest snapshot of your code for those poor souls who don’t use Git. The command to do this is git archive:

$ git archive master --prefix='project/' | gzip > `git describe master`.tar.gz
$ ls *.tar.gz
v1.6.2-rc1-20-g8c5b85c.tar.gz

If someone opens that tarball, they get the latest snapshot of your project under a project directory. You can also create a zip archive in much the same way, but by passing the --format=zip option to git archive:

$ git archive master --prefix='project/' --format=zip > `git describe master`.zip

You now have a nice tarball and a zip archive of your project release that you can upload to your website or e-mail to people.
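
A small sketch of the same release step, with a check of what ends up inside the tarball. The repository contents, the v1.0 tag, and the output directory are hypothetical; the options are placed before the tree name here, which is one accepted ordering.

```shell
set -eu
repo=$(mktemp -d)
out=$(mktemp -d)       # where the archive is written
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"

echo "hello" > README
git add README
git commit -q -m "initial commit"
git tag -a v1.0 -m "version 1.0"

# Name the archive after the described commit, as in the text.
name=$(git describe master)
git archive --format=tar --prefix='project/' master | gzip > "$out/$name.tar.gz"

# List the archive contents without unpacking it.
listing=$(gzip -dc "$out/$name.tar.gz" | tar -tf -)
echo "$listing"
```

Every path in the listing starts with project/, so unpacking the tarball does not scatter files into the current directory.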

The Shortlog

It’s time to e-mail your mailing list of people who want to know what’s happening in your project. A nice way of quickly getting a sort of changelog of what has been added to your project since your last release or e-mail is to use the git shortlog command. It summarizes all the commits in the range you give it; for example, the following gives you a summary of all the commits since your last release, if your last release was named v1.0.1:

$ git shortlog --no-merges master --not v1.0.1
Chris Wanstrath (8):
      Add support for annotated tags to Grit::Tag
      Add packed-refs annotated tag support.
      Add Grit::Commit#to_patch
      Update version and History.txt
      Remove stray `puts`
      Make ls_tree ignore nils

Tom Preston-Werner (4):
      fix dates in history
      dynamic version method
      Version bump to 1.0.2
      Regenerated gemspec for version 1.0.2

You get a clean summary of all the commits since v1.0.1, grouped by author, that you can e-mail to your list.
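
For a compact variant, the -s and -n options reduce the shortlog to per-author counts sorted by number of commits. The sketch below assumes two made-up authors and a lightweight v1.0.1 tag in a scratch repository.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"

git commit -q --allow-empty -m "release 1.0.1"
git tag v1.0.1

# Work that landed after the release, by two hypothetical authors.
git commit -q --allow-empty --author="Alice Example <alice@example.com>" -m "add feature"
git commit -q --allow-empty --author="Alice Example <alice@example.com>" -m "fix feature"
git commit -q --allow-empty --author="Bob Example <bob@example.com>" -m "update docs"

# Counts per author for everything on master that is not in v1.0.1.
summary=$(git shortlog -s -n --no-merges master --not v1.0.1)
echo "$summary"
```

The release commit itself is excluded by --not v1.0.1, so only the three later commits are counted.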

Summary

You should feel fairly comfortable contributing to a project in Git as well as maintaining your own project or integrating other users’ contributions. Congratulations on being an effective Git developer! In the next chapter, you’ll learn more powerful tools and tips for dealing with complex situations, which will truly make you a Git master.

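Git Tools

By now, you’ve learned most of the day-to-day commands and workflows that you need to manage or maintain a Git repository for your source code control. You’ve accomplished the basic tasks of tracking and committing files, and you’ve harnessed the power of the staging area and lightweight topic branching and merging.

Now you’ll explore a number of very powerful things that Git can do that you may not necessarily use on a day-to-day basis but that you may need at some point.

Revision Selection

Git allows you to specify specific commits or a range of commits in several ways. You don’t strictly need to know them, but they are helpful to know.

Single Revisions

You can obviously refer to a commit by the SHA-1 hash that it’s given, but there are more human-friendly ways to refer to commits as well. This section outlines the various ways you can refer to a single commit.

Short SHA

Git is smart enough to figure out what commit you meant if you provide only the first few characters, as long as your partial SHA-1 is at least four characters long and unambiguous; that is, only one object in the current repository begins with that partial SHA-1.

For example, to see a specific commit, suppose you run a git log command and identify the commit where you added certain functionality: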

$ git log
commit 734713bc047d87bf7eac9674765ae793478c50d3
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri Jan 2 18:32:33 2009 -0800

    fixed refs handling, added gc auto, updated tests

commit d921970aadf03b3cf0e71becdaab3147ba71cdef
Merge: 1c002dd... 35cfb2b...
Author: Scott Chacon <schacon@gmail.com>
Date:   Thu Dec 11 15:08:43 2008 -0800

    Merge commit 'phedders/rdocs'

commit 1c002dd4b536e7479fe34593e72e6c6c1819e53b
Author: Scott Chacon <schacon@gmail.com>
Date:   Thu Dec 11 14:58:32 2008 -0800

    added some blame and merge stuff

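In this case, suppose it is 1c002dd.... If you want to git show that commit, the following commands are equivalent (assuming the shorter versions are unambiguous):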

$ git show 1c002dd4b536e7479fe34593e72e6c6c1819e53b
$ git show 1c002dd4b536e7479f
$ git show 1c002d

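Git can figure out a short, unique abbreviation for your SHA-1 values. If you pass --abbrev-commit to the git log command, the output will use shorter values but keep them unique; it defaults to using seven characters but makes them longer if necessary to keep the SHA-1 unambiguous: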

$ git log --abbrev-commit --pretty=oneline
ca82a6d changed the version number
085bb3b removed unnecessary test code
a11bef0 first commit

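Generally, eight to ten characters are more than enough to be unique within a project. One of the largest Git projects, the Linux kernel, currently needs only 12 characters out of the possible 40 to stay unique.

A Short Note About SHA-1

A lot of people become concerned at some point that they will, by random happenstance, have two objects in their repository that hash to the same SHA-1 value. What then?

If you do happen to commit an object that hashes to the same SHA-1 value as a previous object in your repository, Git will see the previous object already in your Git database and assume it was already written. If you try to check out that object again at some point, you’ll always get the data of the first object.

However, you should be aware of how ridiculously unlikely this scenario is. The SHA-1 digest is 20 bytes or 160 bits. The number of randomly hashed objects needed to ensure a 50% probability of a single collision is about 2^80 (the formula for determining collision probability is p = (n(n-1)/2) * (1/2^160)). 2^80 is 1.2 x 10^24, or 1 million billion billion. That’s 1,200 times the number of grains of sand on the earth.

Here’s an example to give you an idea of what it would take to get a SHA-1 collision. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (1 million Git objects) and pushing it into one enormous Git repository, it would take 5 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.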
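
The arithmetic behind these figures can be written out as a short worked calculation. It uses the approximation formula quoted above (which slightly overstates the true probability for large n) and the inputs from the text; the rounding is approximate, so the time estimate lands between five and six years, the same order as the figure quoted.

```latex
\begin{align*}
p &\approx \frac{n(n-1)}{2} \cdot \frac{1}{2^{160}},
  \qquad n = 2^{80} \approx 1.2 \times 10^{24} \\
p &\approx \frac{2^{80}\,(2^{80}-1)}{2 \cdot 2^{160}}
   \approx \frac{2^{160}}{2^{161}} = \frac{1}{2} \\[6pt]
t &\approx \frac{1.2 \times 10^{24}\ \text{objects}}
               {(6.5 \times 10^{9}\ \text{people}) \times (10^{6}\ \text{objects per person per second})}
   \approx 1.9 \times 10^{8}\ \text{s}
   \approx 6\ \text{years}
\end{align*}
```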

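Branch References

The most straightforward way to specify a commit requires that it have a branch reference pointed at it. Then, you can use a branch name in any Git command that expects a commit object or SHA-1 value. For instance, if you want to show the last commit object on a branch, the following commands are equivalent, assuming that the topic1 branch points to ca82a6d: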

$ git show ca82a6dff817ec66f44342007202690a93763949
$ git show topic1

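If you want to see which specific SHA a branch points to, or if you want to see what any of these examples boils down to in terms of SHAs, you can use a Git plumbing tool called rev-parse. You can see Chapter 9 for more information about plumbing tools; basically, rev-parse exists for lower-level operations and isn’t designed to be used in day-to-day operations. However, it can be helpful sometimes when you need to see what’s really going on. Here you can run rev-parse on your branch: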

$ git rev-parse topic1
ca82a6dff817ec66f44342007202690a93763949
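
The following sketch shows rev-parse resolving a branch name, an abbreviation, and HEAD to the same object. The branch name topic1 mirrors the text; the repository itself is a throwaway created for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"

git commit -q --allow-empty -m "first commit"
git branch topic1

full=$(git rev-parse topic1)            # full object name the branch points to
short=$(git rev-parse --short topic1)   # shortest unambiguous abbreviation
from_short=$(git rev-parse "$short")    # the abbreviation expands back to the full name

echo "$short -> $full"
```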

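RefLog Shortnames

One of the things Git does in the background while you’re working away is keep a reflog: a log of where your HEAD and branch references have been for the last few months.

You can see your reflog by using git reflog: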

$ git reflog
734713b... HEAD@{0}: commit: fixed refs handling, added gc auto, updated
d921970... HEAD@{1}: merge phedders/rdocs: Merge made by recursive.
1c002dd... HEAD@{2}: commit: added some blame and merge stuff
1c36188... HEAD@{3}: rebase -i (squash): updating HEAD
95df984... HEAD@{4}: commit: # This is a combination of two commits.
1c36188... HEAD@{5}: rebase -i (squash): updating HEAD
7e05da5... HEAD@{6}: rebase -i (pick): updating HEAD

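Every time your branch tip is updated for any reason, Git stores that information for you in this temporary history. You can also specify older commits with this data. If you want to see the fifth prior value of the HEAD of your repository, you can use the @{n} reference that you see in the reflog output: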

$ git show HEAD@{5}

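You can also use this syntax to see where a branch was some specific amount of time ago. For instance, to see where your master branch was yesterday, you can type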

$ git show master@{yesterday}

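That shows you where the branch tip was yesterday. This technique only works for data that’s still in your reflog, so you can’t use it to look for commits older than a few months.

To see reflog information formatted like the git log output, you can run git log -g: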

$ git log -g master
commit 734713bc047d87bf7eac9674765ae793478c50d3
Reflog: master@{0} (Scott Chacon <schacon@gmail.com>)
Reflog message: commit: fixed refs handling, added gc auto, updated 
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri Jan 2 18:32:33 2009 -0800

    fixed refs handling, added gc auto, updated tests

commit d921970aadf03b3cf0e71becdaab3147ba71cdef
Reflog: master@{1} (Scott Chacon <schacon@gmail.com>)
Reflog message: merge phedders/rdocs: Merge made by recursive.
Author: Scott Chacon <schacon@gmail.com>
Date:   Thu Dec 11 15:08:43 2008 -0800

    Merge commit 'phedders/rdocs'

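It’s important to note that the reflog information is strictly local: it’s a log of what you’ve done in your repository. The references won’t be the same on someone else’s copy of the repository; and right after you initially clone a repository, you’ll have an empty reflog, as no activity has occurred yet in your repository. Running git show HEAD@{2.months.ago} will work only if you cloned the project at least two months ago; if you cloned it five minutes ago, you’ll get no results.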
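
Here is a small sketch of the reflog remembering a position that is no longer reachable from the branch tip. The three empty commits and the reset are invented for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"

git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
git commit -q --allow-empty -m "third"

# Move HEAD back one commit; "third" is no longer on the branch.
git reset -q --hard HEAD~1

# The reflog still records where HEAD was before the reset.
git reflog
current=$(git log -1 --pretty=format:%s HEAD)
before_reset=$(git log -1 --pretty=format:%s 'HEAD@{1}')
echo "HEAD is now: $current"
echo "HEAD@{1} was: $before_reset"
```

Because this repository is only seconds old, time-based forms such as master@{yesterday} have nothing to resolve here, which matches the caveat above about freshly cloned repositories.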

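Ancestry References

The other main way to specify a commit is via its ancestry. If you place a ^ at the end of a reference, Git resolves it to mean the parent of that commit. Suppose you look at the history of your project: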

$ git log --pretty=format:'%h %s' --graph
* 734713b fixed refs handling, added gc auto, updated tests
*   d921970 Merge commit 'phedders/rdocs'
|\  
| * 35cfb2b Some rdoc changes
* | 1c002dd added some blame and merge stuff
|/  
* 1c36188 ignore *.gem
* 9b29157 add open3_detach to gemspec file list

Then, you can see the previous commit by specifying HEAD^, which means “the parent of HEAD”:

$ git show HEAD^
commit d921970aadf03b3cf0e71becdaab3147ba71cdef
Merge: 1c002dd... 35cfb2b...
Author: Scott Chacon <schacon@gmail.com>
Date:   Thu Dec 11 15:08:43 2008 -0800

    Merge commit 'phedders/rdocs'

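You can also specify a number after the ^. For example, d921970^2 means “the second parent of d921970.” This syntax is only useful for merge commits, which have more than one parent. The first parent is the branch you were on when you merged, and the second is the commit on the branch that you merged in: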

$ git show d921970^
commit 1c002dd4b536e7479fe34593e72e6c6c1819e53b
Author: Scott Chacon <schacon@gmail.com>
Date:   Thu Dec 11 14:58:32 2008 -0800

    added some blame and merge stuff

$ git show d921970^2
commit 35cfb2b795a55793d7cc56a6cc2060b4bb732548
Author: Paul Hedderly <paul+git@mjr.org>
Date:   Wed Dec 10 22:22:03 2008 +0000

    Some rdoc changes

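The other main ancestry specification is the ~. This also refers to the first parent, so HEAD~ and HEAD^ are equivalent. The difference becomes apparent when you specify a number. HEAD~2 means “the first parent of the first parent,” or “the grandparent”; it traverses the first parents the number of times you specify. For example, in the history listed earlier, HEAD~3 would be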

$ git show HEAD~3
commit 1c3618887afb5fbcbea25b7c013f4e2114448b8d
Author: Tom Preston-Werner <tom@mojombo.com>
Date:   Fri Nov 7 13:47:59 2008 -0500

    ignore *.gem

This can also be written HEAD^^^, which again is the first parent of the first parent of the first parent:

$ git show HEAD^^^
commit 1c3618887afb5fbcbea25b7c013f4e2114448b8d
Author: Tom Preston-Werner <tom@mojombo.com>
Date:   Fri Nov 7 13:47:59 2008 -0500

    ignore *.gem

You can also combine these syntaxes: you can get the second parent of the previous reference (assuming it was a merge commit) by using HEAD~3^2, and so on.
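
The ^ and ~ forms are easy to check against a small history that contains one merge. In this sketch the branch name side and the commit subjects are hypothetical, and empty commits stand in for real work.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"

git commit -q --allow-empty -m "base"
git checkout -q -b side
git commit -q --allow-empty -m "side work"
git checkout -q master
git commit -q --allow-empty -m "main work"
git merge -q --no-ff -m "merge side" side
git commit -q --allow-empty -m "tip"

# HEAD~1 is the merge commit; ^1 and ^2 select its two parents.
merge=$(git log -1 --pretty=format:%s 'HEAD~1')
first_parent=$(git log -1 --pretty=format:%s 'HEAD~1^1')
second_parent=$(git log -1 --pretty=format:%s 'HEAD~1^2')

# ~3 and ^^^ both follow first parents three times.
tilde=$(git rev-parse 'HEAD~3')
carets=$(git rev-parse 'HEAD^^^')

echo "merge: $merge"
echo "first parent: $first_parent, second parent: $second_parent"
```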

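Commit Ranges

Now that you can specify individual commits, let’s see how to specify ranges of commits. This is particularly useful for managing your branches: if you have a lot of branches, you can use range specifications to answer questions such as, “What work is on this branch that I haven’t yet merged into my main branch?”

Double Dot

The most common range specification is the double-dot syntax. This basically asks Git to resolve a range of commits that are reachable from one commit but aren’t reachable from another. For example, say you have a commit history that looks like Figure 6-1.

Figure 6-1. Example history for range selection.

You want to see what is in your experiment branch that hasn’t yet been merged into your master branch. You can ask Git to show you a log of just those commits with master..experiment, which means “all commits reachable from experiment that aren’t reachable from master.” For the sake of brevity and clarity in these examples, I’ll use the letters of the commit objects from the diagram in place of the actual log output, in the order that they would display: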

$ git log master..experiment
D
C

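If, on the other hand, you want to see the opposite (all commits in master that aren’t in experiment), you can reverse the branch names. experiment..master shows you everything in master not reachable from experiment: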

$ git log experiment..master
F
E

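This is useful if you want to keep the experiment branch up to date and preview what you’re about to merge in. Another very frequent use of this syntax is to see what you’re about to push to a remote: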

$ git log origin/master..HEAD

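This command shows you any commits in your current branch that aren’t in the master branch on your origin remote. If you run a git push and your current branch is tracking origin/master, the commits listed by git log origin/master..HEAD are the commits that will be transferred to the server. You can also leave off one side of the syntax to have Git assume HEAD. For example, you can get the same results as in the previous example by typing git log origin/master.. because Git substitutes HEAD if one side is missing.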
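
The double-dot examples can be reproduced with a history shaped like Figure 6-1. The sketch below is an approximation of that figure: single-letter commit subjects on master (A, B, E, F) and experiment (C, D), using empty commits.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git checkout -q -b experiment
git commit -q --allow-empty -m "C"
git commit -q --allow-empty -m "D"
git checkout -q master
git commit -q --allow-empty -m "E"
git commit -q --allow-empty -m "F"

# Commits reachable from experiment but not from master.
only_experiment=$(git log --pretty=format:%s master..experiment)
# The reverse: commits reachable from master but not from experiment.
only_master=$(git log --pretty=format:%s experiment..master)
# Leaving one side empty makes Git assume HEAD (master is checked out here).
implicit_head=$(git log --pretty=format:%s experiment..)

echo "master..experiment:"; echo "$only_experiment"
echo "experiment..master:"; echo "$only_master"
```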

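Multiple Points

The double-dot syntax is useful as a shorthand; but perhaps you want to specify more than two branches to indicate your revision, such as seeing what commits are in any of several branches that aren’t in the branch you’re currently on. Git allows you to do this by using either the ^ character or --not before any reference from which you don’t want to see reachable commits. Thus these three commands are equivalent: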

$ git log refA..refB
$ git log ^refA refB
$ git log refB --not refA

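This is nice because with this syntax you can specify more than two references in your query, which you cannot do with the double-dot syntax. For instance, if you want to see all commits that are reachable from refA or refB but not from refC, you can type one of these: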

$ git log refA refB ^refC
$ git log refA refB --not refC

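This makes for a very powerful revision query system that should help you figure out what is in your branches.

Triple Dot

The last major range-selection syntax is the triple-dot syntax, which specifies all the commits that are reachable by either of two references but not by both of them. Look back at the example commit history in Figure 6-1. If you want to see what is in master or experiment but not any common references, you can run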

$ git log master...experiment
F
E
D
C

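Again, this gives you normal log output but shows you only the commit information for those four commits, appearing in the traditional commit date ordering.

A common switch to use with the log command in this case is --left-right, which shows you which side of the range each commit is in. This helps make the data more useful: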

$ git log --left-right master...experiment
< F
< E
> D
> C

With these tools, you can much more easily let Git know what commit or commits you want to inspect.
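
A sketch of the triple-dot form with --left-right, again on a history shaped roughly like Figure 6-1 (single-letter subjects, empty commits, all hypothetical). The --oneline option is added only to keep each entry on one line so the side markers are easy to count.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git checkout -q -b experiment
git commit -q --allow-empty -m "C"
git commit -q --allow-empty -m "D"
git checkout -q master
git commit -q --allow-empty -m "E"
git commit -q --allow-empty -m "F"

# Commits on either side that the two branches do not share.
marks=$(git log --left-right --oneline master...experiment)
echo "$marks"

# "<" marks the left side (master), ">" the right side (experiment).
left=$(echo "$marks" | grep -c '^<')
right=$(echo "$marks" | grep -c '^>')
total=$(git rev-list master...experiment | wc -l | tr -d ' ')
```

The shared commits A and B appear on neither side, since both branches can reach them.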

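Interactive Staging

Git comes with a couple of scripts that make some command-line tasks easier. Here, you’ll look at a few interactive commands that can help you easily craft your commits to include only certain combinations and parts of files. These tools are very helpful if you modify a bunch of files and then decide that you want those changes to be in several focused commits rather than one big messy commit. This way, you can make sure your commits are logically separate changesets and can be easily reviewed by the developers working with you. If you run git add with the -i or --interactive option, Git goes into an interactive shell mode, displaying something like this: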

$ git add -i
           staged     unstaged path
  1:    unchanged        +0/-1 TODO
  2:    unchanged        +1/-1 index.html
  3:    unchanged        +5/-1 lib/simplegit.rb

*** Commands ***
  1: status     2: update      3: revert     4: add untracked
  5: patch      6: diff        7: quit       8: help
What now>

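You can see that this command shows you a much different view of your staging area: basically the same information you get with git status but a bit more succinct and informative. It lists the changes you’ve staged on the left and unstaged changes on the right.

After this comes a Commands section. Here you can do a number of things, including staging files, unstaging files, staging parts of files, adding untracked files, and seeing diffs of what has been staged.

Staging and Unstaging Files

If you type 2 or u at the What now> prompt, the script prompts you for which files you want to stage: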

What now> 2
           staged     unstaged path
  1:    unchanged        +0/-1 TODO
  2:    unchanged        +1/-1 index.html
  3:    unchanged        +5/-1 lib/simplegit.rb
Update>>

To stage the TODO and index.html files, you can type the numbers:

Update>> 1,2
           staged     unstaged path
* 1:    unchanged        +0/-1 TODO
* 2:    unchanged        +1/-1 index.html
  3:    unchanged        +5/-1 lib/simplegit.rb
Update>>

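The * next to each file means the file is selected to be staged. If you press Enter after typing nothing at the Update>> prompt, Git takes anything selected and stages it for you: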

Update>> 
updated 2 paths

*** Commands ***
  1: status     2: update      3: revert     4: add untracked
  5: patch      6: diff        7: quit       8: help
What now> 1
           staged     unstaged path
  1:        +0/-1      nothing TODO
  2:        +1/-1      nothing index.html
  3:    unchanged        +5/-1 lib/simplegit.rb

Now you can see that the TODO and index.html files are staged and the simplegit.rb file is still unstaged. If you want to unstage the TODO file at this point, you use the 3 or r (for revert) option:

*** Commands ***
  1: status     2: update      3: revert     4: add untracked
  5: patch      6: diff        7: quit       8: help
What now> 3
           staged     unstaged path
  1:        +0/-1      nothing TODO
  2:        +1/-1      nothing index.html
  3:    unchanged        +5/-1 lib/simplegit.rb
Revert>> 1
           staged     unstaged path
* 1:        +0/-1      nothing TODO
  2:        +1/-1      nothing index.html
  3:    unchanged        +5/-1 lib/simplegit.rb
Revert>> [enter]
reverted one path

Looking at your Git status again, you can see that you’ve unstaged the TODO file:

*** Commands ***
  1: status     2: update      3: revert     4: add untracked
  5: patch      6: diff        7: quit       8: help
What now> 1
           staged     unstaged path
  1:    unchanged        +0/-1 TODO
  2:        +1/-1      nothing index.html
  3:    unchanged        +5/-1 lib/simplegit.rb

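To see the diff of what you’ve staged, you can use the 6 or d (for diff) command. It shows you a list of your staged files, and you can select the ones for which you would like to see the staged diff. This is much like specifying git diff --cached on the command line: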

*** Commands ***
  1: status     2: update      3: revert     4: add untracked
  5: patch      6: diff        7: quit       8: help
What now> 6
           staged     unstaged path
  1:        +1/-1      nothing index.html
Review diff>> 1
diff --git a/index.html b/index.html
index 4d07108..4335f49 100644
--- a/index.html
+++ b/index.html
@@ -16,7 +16,7 @@ Date Finder

 <p id="out">...</p>

-<div id="footer">contact : support@github.com</div>
+<div id="footer">contact : email.support@github.com</div>

 <script type="text/javascript">

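With these basic commands, you can use the interactive add mode to deal with your staging area a little more easily.

Staging Patches

It’s also possible for Git to stage certain parts of files and not the rest. For example, if you make two changes to your simplegit.rb file and want to stage one of them and not the other, doing so is very easy in Git. From the interactive prompt, type 5 or p (for patch). Git will ask you which files you would like to partially stage; then, for each section of the selected files, it will display hunks of the file diff and ask if you would like to stage them, one by one: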

diff --git a/lib/simplegit.rb b/lib/simplegit.rb
index dd5ecc4..57399e0 100644
--- a/lib/simplegit.rb
+++ b/lib/simplegit.rb
@@ -22,7 +22,7 @@ class SimpleGit
   end

   def log(treeish = 'master')
-    command("git log -n 25 #{treeish}")
+    command("git log -n 30 #{treeish}")
   end

   def blame(path)
Stage this hunk [y,n,a,d,/,j,J,g,e,?]?

You have a lot of options at this point. Typing ? shows a list of what you can do:

Stage this hunk [y,n,a,d,/,j,J,g,e,?]? ?
y - stage this hunk
n - do not stage this hunk
a - stage this and all the remaining hunks in the file
d - do not stage this hunk nor any of the remaining hunks in the file
g - select a hunk to go to
/ - search for a hunk matching the given regex
j - leave this hunk undecided, see next undecided hunk
J - leave this hunk undecided, see next hunk
k - leave this hunk undecided, see previous undecided hunk
K - leave this hunk undecided, see previous hunk
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help

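Generally, you’ll type y or n if you want to stage each hunk, but staging all of them in certain files or skipping a hunk decision until later can be helpful too. If you stage one part of the file and leave another part unstaged, your status output will look like this: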

What now> 1
           staged     unstaged path
  1:    unchanged        +0/-1 TODO
  2:        +1/-1      nothing index.html
  3:        +1/-1        +4/-0 lib/simplegit.rb

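The status of the simplegit.rb file is interesting. It shows you that a couple of lines are staged and a couple are unstaged. You’ve partially staged this file. At this point, you can exit the interactive adding script and run git commit to commit the partially staged files.

Finally, you don’t need to be in interactive add mode to do the partial-file staging; you can start the same script by using git add -p or git add --patch on the command line.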

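Stashing

Often, when you’ve been working on part of your project, things are in a messy state and you want to switch branches for a bit to work on something else. The problem is, you don’t want to do a commit of half-done work just so you can get back to this point later. The answer to this issue is the git stash command.

Stashing takes the dirty state of your working directory (that is, your modified tracked files and staged changes) and saves it on a stack of unfinished changes that you can reapply at any time.

Stashing Your Work

To demonstrate, you’ll go into your project and start working on a couple of files and possibly stage one of the changes. If you run git status, you can see your dirty state: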

$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#      modified:   index.html
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#      modified:   lib/simplegit.rb
#

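Now you want to switch branches, but you don’t want to commit what you’ve been working on yet; so you’ll stash the changes. To push a new stash onto your stack, run git stash: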

$ git stash
Saved working directory and index state \
  "WIP on master: 049d078 added the index file"
HEAD is now at 049d078 added the index file
(To restore them type "git stash apply")

Your working directory is now clean:

$ git status
# On branch master
nothing to commit (working directory clean)

At this point, you can easily switch branches and work elsewhere; your changes are stored on the stack. To see which stashes you've stored, use git stash list:

$ git stash list
stash@{0}: WIP on master: 049d078 added the index file
stash@{1}: WIP on master: c264051... Revert "added file_size"
stash@{2}: WIP on master: 21d80a5... added number to log

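In this case, two stashes were saved previously, so you have access to three different stashed works. You can reapply the one you just stashed with the command shown in the help output of the original stash command: git stash apply. If you want to apply one of the older stashes, you can specify it by name, like this: git stash apply stash@{2}. If you don't specify a stash, Git assumes the most recent one and tries to apply it: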

$ git stash apply
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#      modified:   index.html
#      modified:   lib/simplegit.rb
#

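You can see that Git re-modifies the files that were uncommitted when you saved the stash. In this case, you had a clean working directory when you applied the stash, and you applied it on the same branch you saved it from; but neither a clean working directory nor the same branch is required to apply a stash. You can save a stash on one branch, switch to another branch later, and reapply the changes there. You can also have modified, uncommitted files in your working directory when you apply a stash. Git reports merge conflicts if anything no longer applies cleanly.

The changes to your files were reapplied, but the file you had staged wasn't restaged. To get that, you must run git stash apply with the --index option, which tells the command to try to reapply the staged changes. If you had run that instead, you would have gotten back to your original position: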

$ git stash apply --index
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#      modified:   index.html
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#      modified:   lib/simplegit.rb
#

The apply command only tries to apply the stashed work; the stash itself stays on the stack. To remove it, run git stash drop with the name of the stash to remove:

$ git stash list
stash@{0}: WIP on master: 049d078 added the index file
stash@{1}: WIP on master: c264051... Revert "added file_size"
stash@{2}: WIP on master: 21d80a5... added number to log
$ git stash drop stash@{0}
Dropped stash@{0} (364e91f3f268f0900bc3ee613f9f733e82aaed43)

You can also run git stash pop to apply the stash and then immediately drop it from the stack.
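The stash, apply --index, and drop steps described above can be sketched as one small script. This is only an illustration under stated assumptions: it works in a throwaway repository created with mktemp, and the file contents and identity values are made up for the demo.

```shell
set -e
# Throwaway repository; every name below is hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email demo@example.com

echo one > index.html
echo one > simplegit.rb
git add .
git commit -q -m "added the index file"

echo two >> index.html
git add index.html            # one staged change
echo two >> simplegit.rb      # one unstaged change

git stash >/dev/null          # push both changes onto the stash stack
clean_status=$(git status --porcelain)       # empty: working directory is clean

git stash apply --index >/dev/null           # restore, including the staged state
staged=$(git diff --cached --name-only)      # file that is staged again
unstaged=$(git diff --name-only)             # file that is modified but unstaged

git stash drop >/dev/null     # apply leaves the stash on the stack; drop removes it
remaining=$(git stash list)
```

Replacing the apply and drop pair with git stash pop --index should give the same end state in one step.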

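Un-applying a Stash

In some scenarios you might want to apply stashed changes, do some work, and then un-apply the changes that originally came from the stash. Git does not provide a stash unapply command, but you can achieve the same effect by retrieving the patch associated with a stash and applying it in reverse: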

$ git stash show -p stash@{0} | git apply -R

Again, if you don't specify a stash, Git assumes the most recent one:

$ git stash show -p | git apply -R

You may want to create an alias that effectively adds a stash-unapply command to your Git. For example:

$ git config --global alias.stash-unapply '!git stash show -p | git apply -R'
$ git stash
$ #... work work work
$ git stash-unapply

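Creating a Branch from a Stash

If you stash some work, leave it for a while, and continue on the branch from which you stashed it, you may have trouble reapplying the work. If the apply tries to modify a file that you've since modified, you'll get a merge conflict and will have to resolve it. If you want an easier way to test the stashed changes again, you can run git stash branch, which creates a new branch for you, checks out the commit you were on when you stashed your work, reapplies your work there, and then drops the stash if it applies successfully: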

$ git stash branch testchanges
Switched to a new branch "testchanges"
# On branch testchanges
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#      modified:   index.html
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#      modified:   lib/simplegit.rb
#
Dropped refs/stash@{0} (f0dfc4d5dc332d1cee34a634182e168c4efc3359)

This is a nice shortcut for recovering stashed work and continuing it on a new branch.
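As a rough sketch of the situation described above, the following script stashes a change, commits a later edit to the same file on the original branch, and then recovers the stash with git stash branch. The repository, file contents, and the branch name testchanges are assumptions for the demo.

```shell
set -e
# Throwaway repository; names are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email demo@example.com

echo "def log; end" > simplegit.rb
git add simplegit.rb
git commit -q -m "initial version"

echo "# work in progress" >> simplegit.rb
git stash >/dev/null                      # set the unfinished work aside

echo "# later change" >> simplegit.rb     # the original branch moves on
git commit -q -am "changed the same file after stashing"

# Create a branch at the commit the stash was made from and reapply it there.
git stash branch testchanges >/dev/null 2>&1

branch=$(git rev-parse --abbrev-ref HEAD)   # current branch name
modified=$(git diff --name-only)            # the stashed edit is back
remaining=$(git stash list)                 # stash was dropped after success
```

Because the new branch starts at the commit where the stash was created, the stashed change applies without the conflict that a plain git stash apply on the moved-on branch would likely hit.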

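Rewriting History

Many times, when working with Git, you may want to revise your commit history for some reason. One of the great things about Git is that it lets you make decisions at the last possible moment. You can decide which files go into which commits right before you commit the staging area, you can decide with the stash command that you didn't mean to be working on something yet, and you can rewrite commits that have already happened so they look like they happened in a different way. This can involve changing the order of the commits, changing messages or modifying files in a commit, squashing together or splitting apart commits, or removing commits entirely, all before you share your work with others.

In this section, you'll learn how to accomplish these very useful tasks so that your commit history looks the way you want before you share it with others.

Changing the Last Commit

Changing your last commit is probably the most common rewriting of history that you'll do. You'll often want to do two basic things to your last commit: change the commit message, or change the snapshot you just recorded by adding, changing, or removing files.

If you only want to modify your last commit message, it's very simple: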

$ git commit --amend

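This drops you into your text editor, which contains your last commit message, ready for you to modify. When you save and close the editor, Git writes a new commit containing that message and makes it your new last commit.

If you've committed and then want to change the snapshot you committed by adding or changing files, possibly because you forgot to add a newly created file when you originally committed, the process works basically the same way. You stage the changes you want by editing a file and running git add on it, or by running git rm on a tracked file, and the subsequent git commit --amend takes your current staging area and makes it the snapshot for the new commit.

You need to be careful with this technique because amending changes the SHA-1 of the commit. It's like a very small rebase: don't amend your last commit if you've already pushed it.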
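The following sketch illustrates both uses of git commit --amend at once (fixing the message and adding a forgotten file) and shows that the commit's SHA-1 changes while the number of commits does not. The file names and messages are made up for the demo.

```shell
set -e
# Throwaway repository; names are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email demo@example.com

echo hello > README
git add README
git commit -q -m "inital commit"          # message with a typo
before=$(git rev-parse HEAD)

echo "forgotten" > forgotten.txt          # file that should have been included
git add forgotten.txt
git commit -q --amend -m "initial commit" # replace the last commit
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)        # still a single commit
files=$(git ls-tree --name-only HEAD | tr '\n' ' ')
```

The differing values of before and after are the reason amending an already pushed commit causes trouble for anyone who has based work on the old one.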

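Changing Multiple Commit Messages

To modify a commit that is farther back in your history, you must move to more complex tools. Git doesn't have a modify-history tool, but you can use the rebase tool to rebase a series of commits onto the HEAD they were originally based on instead of moving them to another one. With the interactive rebase tool, you can then stop after each commit you want to modify and change the message, add files, or do whatever you wish. You run rebase interactively by adding the -i option to git rebase. You must indicate how far back you want to rewrite commits by telling the command which commit to rebase onto.

For example, if you want to change the last three commit messages, or any of the commit messages in that group, you supply as an argument to git rebase -i the parent of the last commit you want to edit, which is HEAD~2^ or HEAD~3. It may be easier to remember ~3 because you're trying to edit the last three commits; but keep in mind that you're actually designating four commits ago, the parent of the last commit you want to edit: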

$ git rebase -i HEAD~3

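Remember again that this is a rebasing command: every commit in the range HEAD~3..HEAD will be rewritten, whether you change the message or not. Don't include any commit you've already pushed to a central server, because doing so will confuse other developers by providing an alternate version of the same change.

Running this command gives you a list of commits in your text editor that looks something like this: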

pick f7f3f6d changed my name a bit
pick 310154e updated README formatting and added blame
pick a5f4a0d added cat-file

# Rebase 710f0f8..a5f4a0d onto 710f0f8
#
# Commands:
#  p, pick = use commit
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#

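It's important to note that these commits are listed in the opposite order from what you normally see with the log command. If you run log, you see something like this: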

$ git log --pretty=format:"%h %s" HEAD~3..HEAD
a5f4a0d added cat-file
310154e updated README formatting and added blame
f7f3f6d changed my name a bit

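Notice the reverse order. The interactive rebase gives you a script that it's going to run. It starts at the commit you specify on the command line (HEAD~3) and replays the changes introduced in each of these commits from top to bottom. It lists the oldest at the top, rather than the newest, because that's the first one it will replay.

You need to edit the script so that it stops at the commit you want to edit. To do so, change the word pick to the word edit for each of the commits you want the script to stop after. For example, to modify only the third commit message, you change the file to look like this: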

edit f7f3f6d changed my name a bit
pick 310154e updated README formatting and added blame
pick a5f4a0d added cat-file

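When you save and exit the editor, Git rewinds the branch to the parent of the oldest listed commit, replays the script until it reaches the commit marked edit, and drops you on the command line with the following message: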

$ git rebase -i HEAD~3
Stopped at 7482e0d... updated the gemspec to hopefully work better
You can amend the commit now, with

       git commit --amend

Once you’re satisfied with your changes, run

       git rebase --continue

These instructions tell you exactly what to do. Type

$ git commit --amend

Change the commit message, and exit the editor. Then, run

$ git rebase --continue

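This command applies the other two commits automatically, and then you're done. If you change pick to edit on more lines, you can repeat these steps for each commit you marked. Each time, Git stops, lets you amend the commit, and continues when you're finished.

Reordering Commits

You can also use interactive rebases to reorder or remove commits entirely. If you want to remove the "added cat-file" commit and change the order in which the other two commits are introduced, you can change the rebase script from this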

pick f7f3f6d changed my name a bit
pick 310154e updated README formatting and added blame
pick a5f4a0d added cat-file

to this:

pick 310154e updated README formatting and added blame
pick f7f3f6d changed my name a bit

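When you save and exit the editor, Git rewinds your branch to the parent of these commits, applies 310154e and then f7f3f6d, and then stops. You have effectively changed the order of those commits and removed the "added cat-file" commit completely.

Squashing Commits

It's also possible to take a series of commits and squash them down into a single commit with the interactive rebasing tool. The script puts helpful instructions in the rebase message: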

#
# Commands:
#  p, pick = use commit
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#

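If, instead of "pick" or "edit", you specify "squash", Git applies both that change and the change directly before it and makes you merge the commit messages together. So, if you want to make a single commit from these three commits, you make the script look like this: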

pick f7f3f6d changed my name a bit
squash 310154e updated README formatting and added blame
squash a5f4a0d added cat-file

When you save and exit the editor, Git applies all three changes and then puts you back into the editor to merge the three commit messages:

# This is a combination of 3 commits.
# The first commit's message is:
changed my name a bit

# This is the 2nd commit message:

updated README formatting and added blame

# This is the 3rd commit message:

added cat-file

When you save that, you have a single commit that introduces the changes of all three previous commits.
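The squash described above can also be driven without opening an editor, which makes it easier to try out. The sketch below relies on Git's GIT_SEQUENCE_EDITOR and GIT_EDITOR environment variables; the sed expression that rewrites the todo list, the file names, and the commit messages are assumptions chosen for this demo, not part of the original walkthrough.

```shell
set -e
# Throwaway repository; names are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email demo@example.com

# One base commit plus the three commits that will be squashed.
for name in a b c d; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "add $name"
done

# The todo list shows the three commits oldest first. Keep "pick" on line 1
# and turn lines 2 and 3 into "squash"; GIT_EDITOR=true accepts the combined
# commit message unchanged.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,3s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -i HEAD~3 >/dev/null 2>&1

count=$(git rev-list --count HEAD)              # base commit + one squashed commit
tracked=$(git ls-files | wc -l | tr -d ' ')     # all four files are still present
```

In everyday use you would edit the todo list by hand as shown in the text; the scripted editor is only there so the example runs unattended.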

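Splitting a Commit

Splitting a commit undoes a commit and then partially stages and commits as many times as needed to end up with the commits you want. For example, suppose you want to split the middle commit of your three commits. Instead of "updated README formatting and added blame", you want to split it into two commits: "updated README formatting" for the first, and "added blame" for the second. You can do that in the rebase -i script by changing the instruction on the commit you want to split to "edit":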

pick f7f3f6d changed my name a bit
edit 310154e updated README formatting and added blame
pick a5f4a0d added cat-file

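Then, when the script drops you to the command line, you reset that commit, take the changes that have been reset, and create multiple commits out of them. When you save and exit the editor, Git rewinds to the parent of the first commit in your list, applies the first commit (f7f3f6d), applies the second (310154e), and drops you to the console. There, you can do a mixed reset of that commit with git reset HEAD^, which effectively undoes that commit and leaves the modified files unstaged. Now you can stage and commit files until you have several commits, and run git rebase --continue when you're done: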

$ git reset HEAD^
$ git add README
$ git commit -m 'updated README formatting'
$ git add lib/simplegit.rb
$ git commit -m 'added blame'
$ git rebase --continue

Git applies the last commit (a5f4a0d) in the script, and your history looks like this:

$ git log -4 --pretty=format:"%h %s"
1c002dd added cat-file
9b29157 added blame
35cfb2b updated README formatting
f3cc40e changed my name a bit

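Once again, this changes the SHAs of all the commits in your list, so make sure no commit shows up in that list that you've already pushed to a shared repository.

The Nuclear Option: filter-branch

There is another history-rewriting option that you can use if you need to rewrite a larger number of commits in some scriptable way, for instance, changing your e-mail address globally or removing a file from every commit. The command is filter-branch, and it can rewrite huge swaths of your history, so you probably shouldn't use it unless your project isn't yet public and other people haven't based work off the commits you're about to rewrite. However, it can be very useful. You'll learn a few of the common uses so you can get an idea of some of the things it's capable of.

Removing a File from Every Commit

This occurs fairly commonly. Someone accidentally commits a huge binary file with a thoughtless git add ., and you want to remove it everywhere. Perhaps you accidentally committed a file that contained a password, and you want to make your project open source. filter-branch is the tool you probably want to use to scrub your entire history. To remove a file named passwords.txt from your entire history, you can use the --tree-filter option to filter-branch: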

$ git filter-branch --tree-filter 'rm -f passwords.txt' HEAD
Rewrite 6b9b3cf04e7c5686a9cb838c3f36a8cb6a0fc2bd (21/21)
Ref 'refs/heads/master' was rewritten

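The --tree-filter option runs the specified command after each checkout of the project and then recommits the results. In this case, you remove a file called passwords.txt from every snapshot, whether it exists or not. If you want to remove all accidentally committed editor backup files, you can run something like git filter-branch --tree-filter 'rm -f *~' HEAD.

You'll be able to watch Git rewriting trees and commits and then move the branch pointer at the end. It's generally a good idea to do this in a testing branch and then hard-reset your master branch after you've determined the outcome is what you really want. To run filter-branch on all your branches, you can pass --all to the command.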
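A small, self-contained way to try the --tree-filter example is sketched below. The repository contents are invented for the demo. The FILTER_BRANCH_SQUELCH_WARNING variable is only relevant on newer Git releases, which print a warning and pause before running filter-branch; older versions simply ignore it.

```shell
set -e
# Throwaway repository; names are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email demo@example.com

echo "puts 'hi'" > app.rb
echo "hunter2" > passwords.txt
git add .
git commit -q -m "first commit (accidentally includes passwords.txt)"
echo "puts 'bye'" >> app.rb
git commit -q -am "second commit"

# Rewrite every commit reachable from HEAD, deleting the file in each snapshot.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --tree-filter 'rm -f passwords.txt' HEAD >/dev/null 2>&1

leftover=$(git log --oneline HEAD -- passwords.txt)   # empty if the file is gone
count=$(git rev-list --count HEAD)                    # number of commits is unchanged
```

Note that filter-branch keeps a backup of the old branch tip under refs/original/, so the removed file's data remains in the repository until that reference is deleted and the objects are pruned.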

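Making a Subdirectory the New Root

Suppose you've done an import from another source control system and have subdirectories that make no sense (trunk, tags, and so on). If you want to make the trunk subdirectory the new project root for every commit, filter-branch can help you do that, too: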

$ git filter-branch --subdirectory-filter trunk HEAD
Rewrite 856f0bf61e41a27326cdae8f09fe708d679f596f (12/12)
Ref 'refs/heads/master' was rewritten

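Now your new project root is what was in the trunk subdirectory each time. Git also automatically removes commits that did not affect the subdirectory.

Changing E-Mail Addresses Globally

Another common case is that you forgot to run git config to set your name and e-mail address before you started working, or perhaps you want to open-source a project at work and change all your work e-mail addresses to your personal address. In either case, you can change e-mail addresses in multiple commits in one batch with filter-branch. You need to be careful to change only the e-mail addresses that are yours, so you use --commit-filter: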

$ git filter-branch --commit-filter '
        if [ "$GIT_AUTHOR_EMAIL" = "schacon@localhost" ];
        then
                GIT_AUTHOR_NAME="Scott Chacon";
                GIT_AUTHOR_EMAIL="schacon@example.com";
                git commit-tree "$@";
        else
                git commit-tree "$@";
        fi' HEAD

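This goes through and rewrites every commit to have your new address. Because commits contain the SHA-1 values of their parents, this command changes every commit SHA in your history, not just those that have the matching e-mail address.

Debugging with Git

Git also provides a few tools to help you debug issues in your projects. Because Git is designed to work with nearly any type of project, these tools are fairly generic, but they can often help you hunt down a bug or culprit when things go wrong.

File Annotation

If you track down a bug in your code and want to know when it was introduced and why, file annotation is often your best tool. It shows you which commit was the last to modify each line of a file. So, if you see that a method in your code is buggy, you can annotate the file with git blame to see when each line of the method was last edited and by whom. This example uses the -L option to limit the output to lines 12 through 22: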

$ git blame -L 12,22 simplegit.rb 
^4832fe2 (Scott Chacon  2008-03-15 10:31:28 -0700 12)  def show(tree = 'master')
^4832fe2 (Scott Chacon  2008-03-15 10:31:28 -0700 13)   command("git show #{tree}")
^4832fe2 (Scott Chacon  2008-03-15 10:31:28 -0700 14)  end
^4832fe2 (Scott Chacon  2008-03-15 10:31:28 -0700 15)
9f6560e4 (Scott Chacon  2008-03-17 21:52:20 -0700 16)  def log(tree = 'master')
79eaf55d (Scott Chacon  2008-04-06 10:15:08 -0700 17)   command("git log #{tree}")
9f6560e4 (Scott Chacon  2008-03-17 21:52:20 -0700 18)  end
9f6560e4 (Scott Chacon  2008-03-17 21:52:20 -0700 19) 
42cf2861 (Magnus Chacon 2008-04-13 10:45:01 -0700 20)  def blame(path)
42cf2861 (Magnus Chacon 2008-04-13 10:45:01 -0700 21)   command("git blame #{path}")
42cf2861 (Magnus Chacon 2008-04-13 10:45:01 -0700 22)  end

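Notice that the first field is the partial SHA-1 of the commit that last modified that line. The next two fields are values extracted from that commit, the author name and the authored date, so you can easily see who modified the line and when. After that come the line number and the content of the file. Also note the ^4832fe2 commit lines, which designate lines that were in this file's original commit. That commit is when this file was first added to the project, and those lines have been unchanged since. This is a bit confusing, because you've now seen at least three different ways that Git uses the ^ to modify a commit SHA, but that is what it means here.

Another cool thing about Git is that it doesn't track file renames explicitly. It records the snapshots and then tries to figure out what was renamed implicitly, after the fact. One of the interesting features of this is that you can ask it to figure out all sorts of code movement as well. If you pass -C to git blame, Git analyzes the file you're annotating and tries to figure out where snippets of code within it originally came from if they were copied from elsewhere. Recently, I was refactoring a file named GITServerHandler.m into multiple files, one of which was GITPackUpload.m. By blaming GITPackUpload.m with the -C option, I could see where sections of the code originally came from: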

$ git blame -C -L 141,153 GITPackUpload.m 
f344f58d GITServerHandler.m (Scott 2009-01-04 141) 
f344f58d GITServerHandler.m (Scott 2009-01-04 142) - (void) gatherObjectShasFromC
f344f58d GITServerHandler.m (Scott 2009-01-04 143) {
70befddd GITServerHandler.m (Scott 2009-03-22 144)         //NSLog(@"GATHER COMMI
ad11ac80 GITPackUpload.m    (Scott 2009-03-24 145)
ad11ac80 GITPackUpload.m    (Scott 2009-03-24 146)         NSString *parentSha;
ad11ac80 GITPackUpload.m    (Scott 2009-03-24 147)         GITCommit *commit = [g
ad11ac80 GITPackUpload.m    (Scott 2009-03-24 148)
ad11ac80 GITPackUpload.m    (Scott 2009-03-24 149)         //NSLog(@"GATHER COMMI
ad11ac80 GITPackUpload.m    (Scott 2009-03-24 150)
56ef2caf GITServerHandler.m (Scott 2009-01-05 151)         if(commit) {
56ef2caf GITServerHandler.m (Scott 2009-01-05 152)                 [refDict setOb
56ef2caf GITServerHandler.m (Scott 2009-01-05 153)

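This is really useful. Normally, you get as the original commit the commit where you copied the code over, because that is the first time you touched those lines in this file. Git tells you the original commit where you wrote those lines, even if it was in another file.

Binary Search

Annotating a file helps if you know where the issue is to begin with. If you don't know what is breaking, and there have been dozens or hundreds of commits since the last state where you know the code worked, you'll likely turn to git bisect for help. The bisect command does a binary search through your commit history to help you identify as quickly as possible which commit introduced an issue.

Let's say you just pushed out a release of your code to a production environment, you're getting bug reports about something that wasn't happening in your development environment, and you can't imagine why the code is doing that. You go back to your code, and it turns out you can reproduce the issue, but you can't figure out what is going wrong. You can bisect the code to find out. First you run git bisect start to get things going, and then you use git bisect bad to tell the system that the current commit you're on is broken. Then, you must tell bisect when the last known good state was, using git bisect good [good_commit]: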

$ git bisect start
$ git bisect bad
$ git bisect good v1.0
Bisecting: 6 revisions left to test after this
[ecb6e1bc347ccecc5f9350d878ce677feb13d3b2] error handling on repo

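Git figured out that about 12 commits came between the commit you marked as the last good commit (v1.0) and the current bad version, and it checked out the middle one for you. At this point, you can run your test to see whether the issue exists at this commit. If it does, then it was introduced sometime before this middle commit; if it doesn't, then the problem was introduced sometime after the middle commit. Suppose there is no issue here. You tell Git that by typing git bisect good and continue your journey: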

$ git bisect good
Bisecting: 3 revisions left to test after this
[b047b02ea83310a70fd603dc8cd7a6cd13d15c04] secure this thing

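Now you're on another commit, halfway between the one you just tested and your bad commit. You run your test again and find that this commit is broken, so you tell Git that with git bisect bad: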

$ git bisect bad
Bisecting: 1 revisions left to test after this
[f71ce38690acf49c1f3c9bea38e09d82a5ce6014] drop exceptions table

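This commit is fine, and now Git has all the information it needs to determine where the issue was introduced. It tells you the SHA-1 of the first bad commit and shows some of the commit information and which files were modified in that commit, so you can figure out what happened that may have introduced this bug: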

$ git bisect good
b047b02ea83310a70fd603dc8cd7a6cd13d15c04 is first bad commit
commit b047b02ea83310a70fd603dc8cd7a6cd13d15c04
Author: PJ Hyett <pjhyett@example.com>
Date:   Tue Jan 27 14:48:32 2009 -0800

    secure this thing

:040000 040000 40ee3e7821b895e52c1695092db9bdc4c61d1730
f24d3c6ebcfc639b1a3814550e62d60b8e68a8e4 M  config

When you're finished, you should run git bisect reset to reset your HEAD to where you were before you started, or you'll end up in a weird state:

$ git bisect reset

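This is a powerful tool that can help you check hundreds of commits for an introduced bug in minutes. In fact, if you have a script that exits 0 if the project is good and non-0 if the project is bad, you can fully automate git bisect. First, you again tell it the scope of the bisect by providing the known bad and good commits. You can do this by listing them with the bisect start command, listing the known bad commit first and the known good commit second: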

$ git bisect start HEAD v1.0
$ git bisect run test-error.sh

Doing so automatically runs test-error.sh on each checked-out commit until Git finds the first broken commit. You can also run something like make or make tests, or whatever automated tests you have.
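To see the automated form end to end, here is a minimal sketch that builds a small history, plants a "bug" in the fifth commit, and lets git bisect run find it. The history, the file app.txt, and the inline test (a grep standing in for test-error.sh) are all assumptions made for the demo.

```shell
set -e
# Throwaway repository; names are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email demo@example.com

# Eight commits; the fifth one introduces the line "bug".
i=1
while [ "$i" -le 8 ]; do
  if [ "$i" -eq 5 ]; then
    echo "bug" >> app.txt
  else
    echo "line $i" >> app.txt
  fi
  git add app.txt
  git commit -q -m "commit $i"
  if [ "$i" -eq 1 ]; then git tag v1.0; fi
  if [ "$i" -eq 5 ]; then culprit=$(git rev-parse HEAD); fi
  i=$((i + 1))
done

# Known bad commit first (HEAD), known good commit second (v1.0).
git bisect start HEAD v1.0 >/dev/null
# The test exits 0 (good) when the bug line is absent, 1 (bad) otherwise.
git bisect run sh -c '! grep -q bug app.txt' >/dev/null
found=$(git rev-parse refs/bisect/bad)    # first bad commit identified by bisect
git bisect reset >/dev/null 2>&1          # return HEAD to where it started
```

With eight commits the search needs only about three test runs, which is the point of bisecting rather than checking each commit in turn.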

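Submodules

It often happens that while working on one project, you need to use another project from within it. Perhaps it's a library that a third party developed, or one that you're developing separately and using in multiple parent projects. A common issue arises in these scenarios: you want to be able to treat the two projects as separate yet still be able to use one from within the other.

Here's an example. Suppose you're developing a web site and creating Atom feeds. Instead of writing your own Atom-generating code, you decide to use a library. You're likely to have to either include this code from a shared library like a CPAN install or Ruby gem, or copy the source code into your own project tree. The issue with including the library is that it's difficult to customize it in any way and often more difficult to deploy it, because you need to make sure every client has that library available. The issue with vendoring the code into your own project is that any custom changes you make are difficult to merge when upstream changes become available.

Git addresses this issue using submodules. Submodules allow you to keep a Git repository as a subdirectory of another Git repository. This lets you clone another repository into your project and keep your commits separate.

Starting with Submodules

Suppose you want to add the Rack library (a Ruby web server gateway interface) to your project, possibly maintain your own changes to it, but continue to merge in upstream changes. The first thing you should do is clone the external repository into your subdirectory. You add external projects as submodules with the git submodule add command: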

$ git submodule add git://github.com/chneukirchen/rack.git rack
Initialized empty Git repository in /opt/subtest/rack/.git/
remote: Counting objects: 3181, done.
remote: Compressing objects: 100% (1534/1534), done.
remote: Total 3181 (delta 1951), reused 2623 (delta 1603)
Receiving objects: 100% (3181/3181), 675.42 KiB | 422 KiB/s, done.
Resolving deltas: 100% (1951/1951), done.

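Now you have the Rack project under a subdirectory named rack within your project. You can go into that subdirectory, make changes, add your own writable remote repository to push your changes to, fetch and merge from the original repository, and more. If you run git status right after you add the submodule, you see two things: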

$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#      new file:   .gitmodules
#      new file:   rack
#

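First you notice the .gitmodules file. This is a configuration file that stores the mapping between the project's URL and the local subdirectory you've pulled it into: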

$ cat .gitmodules 
[submodule "rack"]
      path = rack
      url = git://github.com/chneukirchen/rack.git

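If you have multiple submodules, you'll have multiple entries in this file. It's important to note that this file is version-controlled with your other files, like your .gitignore file. It's pushed and pulled with the rest of your project. This is how other people who clone this project know where to get the submodule projects from.

The other listing in the git status output is the rack entry. If you run git diff on that, you see something interesting: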

$ git diff --cached rack
diff --git a/rack b/rack
new file mode 160000
index 0000000..08d709f
--- /dev/null
+++ b/rack
@@ -0,0 +1 @@
+Subproject commit 08d709f78b8c5b0fbeb7821e37fa53e69afcf433

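Although rack is a subdirectory in your working directory, Git sees it as a submodule and doesn't track its contents when you're not in that directory. Instead, Git records it as a particular commit from that repository. When you make changes and commit in that subdirectory, the superproject notices that the HEAD there has changed and records the exact commit you're currently working from; that way, when others clone this project, they can re-create the environment exactly.

This is an important point about submodules: you record them as the exact commit they're at. You can't record a submodule at master or some other symbolic reference.

When you commit, you see something like this: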

$ git commit -m 'first commit with submodule rack'
[master 0550271] first commit with submodule rack
 2 files changed, 4 insertions(+), 0 deletions(-)
 create mode 100644 .gitmodules
 create mode 160000 rack

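Notice the 160000 mode for the rack entry. That is a special mode in Git that basically means you're recording a commit as a directory entry rather than a subdirectory or a file.

You can treat the rack directory as a separate project and then update your superproject from time to time with a pointer to the latest commit in that subproject. All the Git commands work independently in the two directories: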

$ git log -1
commit 0550271328a0038865aad6331e620cd7238601bb
Author: Scott Chacon <schacon@gmail.com>
Date:   Thu Apr 9 09:03:56 2009 -0700

    first commit with submodule rack
$ cd rack/
$ git log -1
commit 08d709f78b8c5b0fbeb7821e37fa53e69afcf433
Author: Christian Neukirchen <chneukirchen@gmail.com>
Date:   Wed Mar 25 14:49:04 2009 +0100

    Document version change

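Cloning a Project with Submodules

Here you'll clone a project with a submodule in it. When you receive such a project, you get the directories that contain submodules, but none of the files yet: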

$ git clone git://github.com/schacon/myproject.git
Initialized empty Git repository in /opt/myproject/.git/
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (6/6), done.
$ cd myproject
$ ls -l
total 8
-rw-r--r--  1 schacon  admin   3 Apr  9 09:11 README
drwxr-xr-x  2 schacon  admin  68 Apr  9 09:11 rack
$ ls rack/
$

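The rack directory is there, but empty. You must run two commands: git submodule init to initialize your local configuration file, and git submodule update to fetch all the data from that project and check out the appropriate commit listed in your superproject: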

$ git submodule init
Submodule 'rack' (git://github.com/chneukirchen/rack.git) registered for path 'rack'
$ git submodule update
Initialized empty Git repository in /opt/myproject/rack/.git/
remote: Counting objects: 3181, done.
remote: Compressing objects: 100% (1534/1534), done.
remote: Total 3181 (delta 1951), reused 2623 (delta 1603)
Receiving objects: 100% (3181/3181), 675.42 KiB | 173 KiB/s, done.
Resolving deltas: 100% (1951/1951), done.
Submodule path 'rack': checked out '08d709f78b8c5b0fbeb7821e37fa53e69afcf433'
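The add, clone, init, and update cycle above can be reproduced locally without network access. In this sketch a tiny local repository named lib stands in for the Rack project; all paths and names are assumptions. The -c protocol.file.allow=always setting is included because recent Git releases refuse local-path submodule clones by default; it is not needed for the git:// URLs used in the text.

```shell
set -e
# Throwaway repositories; names are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the external library.
git init -q lib
(cd lib && echo "module Lib; end" > lib.rb && git add lib.rb && git commit -q -m "library code")

# The superproject that embeds it as a submodule under rack/.
git init -q super
cd super
echo "readme" > README
git add README
git commit -q -m "first commit"
git -c protocol.file.allow=always submodule add "$work/lib" rack >/dev/null 2>&1
git commit -q -m "first commit with submodule rack"
mode=$(git ls-tree HEAD rack | cut -d' ' -f1)    # special gitlink mode

# A fresh clone has the rack directory, but nothing in it yet.
cd "$work"
git clone -q super copy
cd copy
before=$(ls rack | wc -l | tr -d ' ')
git submodule init >/dev/null
git -c protocol.file.allow=always submodule update >/dev/null 2>&1
recorded=$(git ls-tree HEAD rack | awk '{print $3}')   # commit pinned by the superproject
checked_out=$(cd rack && git rev-parse HEAD)           # commit actually checked out
```

After the update, the commit checked out inside rack is exactly the one the superproject recorded, which is what lets every clone re-create the same environment.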

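Now your rack subdirectory is in the exact state it was in when you committed earlier. If another developer makes changes to the rack code and commits, and you pull that reference down and merge it in, you get something a bit odd: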

$ git merge origin/master
Updating 0550271..85a3eee
Fast forward
 rack |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
[master*]$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#      modified:   rack
#

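What you merged in is only a change to the pointer for your submodule; it doesn't update the code in the submodule directory, so it looks like you have a dirty state in your working directory: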

$ git diff
diff --git a/rack b/rack
index 6c5e70b..08d709f 160000
--- a/rack
+++ b/rack
@@ -1 +1 @@
-Subproject commit 6c5e70b984a60b3cecd395edd5b48a7575bf58e0
+Subproject commit 08d709f78b8c5b0fbeb7821e37fa53e69afcf433

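This is the case because the pointer you have for the submodule doesn't match what is actually in the submodule directory. To fix this, you must run git submodule update again: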

$ git submodule update
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 1), reused 2 (delta 0)
Unpacking objects: 100% (3/3), done.
From git@github.com:schacon/rack
   08d709f..6c5e70b  master     -> origin/master
Submodule path 'rack': checked out '6c5e70b984a60b3cecd395edd5b48a7575bf58e0'

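You have to do this every time you pull down a submodule change in the main project. It's strange, but it works.

One common problem happens when a developer makes a change locally in a submodule but doesn't push it to a public server. Then, they commit a pointer to that non-public state and push up the superproject. When other developers try to run git submodule update, the submodule system can't find the commit that is referenced, because it exists only on the first developer's system. If that happens, you see an error like this: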

$ git submodule update
fatal: reference isn’t a tree: 6c5e70b984a60b3cecd395edd5b48a7575bf58e0
Unable to checkout '6c5e70b984a60b3cecd395edd5b48a7575bf58e0' in submodule path 'rack'

You have to see who last changed the submodule:

$ git log -1 rack
commit 85a3eee996800fcfa91e2119372dd4172bf76678
Author: Scott Chacon <schacon@gmail.com>
Date:   Thu Apr 9 09:19:14 2009 -0700

    added a submodule reference I will never make public. hahahahaha!

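Then, you e-mail that person and give them a piece of your mind.

Superprojects

Sometimes, developers want to get a combination of a large project's subdirectories, depending on what team they're on. This is common if you're coming from CVS or Subversion, where you've defined a module or collection of subdirectories, and you want to keep this type of workflow.

A good way to do this in Git is to make each of the subdirectories a separate Git repository and then create superproject Git repositories that contain multiple submodules. A benefit of this approach is that you can more specifically define the relationships between the projects with tags and branches in the superprojects.

Issues with Submodules

Using submodules isn't without drawbacks, however. First, you must be relatively careful when working in the submodule directory. When you run git submodule update, it checks out the specific version of the project, but not within a branch. This is called having a detached HEAD: it means the HEAD file points directly to a commit, not to a symbolic reference. The issue is that you generally don't want to work in a detached HEAD environment, because it's easy to lose changes. If you do an initial submodule update, commit in that submodule directory without creating a branch to work in, and then run git submodule update again from the superproject without committing in the meantime, Git will overwrite your changes without telling you. Technically you won't lose the work, but you won't have a branch pointing to it, so it will be somewhat difficult to retrieve.

To avoid this issue, create a branch when you work in a submodule directory with git checkout -b work or something equivalent. When you do the submodule update a second time, it will still revert your work, but at least you have a pointer to get back to.

Switching branches with submodules in them can also be tricky. If you create a new branch, add a submodule there, and then switch back to a branch without that submodule, you still have the submodule directory as an untracked directory: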

$ git checkout -b rack
Switched to a new branch "rack"
$ git submodule add git@github.com:schacon/rack.git rack
Initialized empty Git repository in /opt/myproj/rack/.git/
...
Receiving objects: 100% (3184/3184), 677.42 KiB | 34 KiB/s, done.
Resolving deltas: 100% (1952/1952), done.
$ git commit -am 'added rack submodule'
[rack cc49a69] added rack submodule
 2 files changed, 4 insertions(+), 0 deletions(-)
 create mode 100644 .gitmodules
 create mode 160000 rack
$ git checkout master
Switched to branch "master"
$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#      rack/

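You have to either move it out of the way or remove it, in which case you have to clone it again when you switch back, and you may lose local changes or branches that you didn't push.

The last main caveat concerns switching from subdirectories to submodules. If you've been tracking files in your project and you want to move them out into a submodule, you must be careful or Git will get angry at you. Assume that you have the rack files in a subdirectory of your project, and you want to switch it to a submodule. If you delete the subdirectory and then run submodule add, Git yells at you: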

$ rm -Rf rack/
$ git submodule add git@github.com:schacon/rack.git rack
'rack' already exists in the index

You have to unstage the rack directory first. Then you can add the submodule:

$ git rm -r rack
$ git submodule add git@github.com:schacon/rack.git rack
Initialized empty Git repository in /opt/testsub/rack/.git/
remote: Counting objects: 3184, done.
remote: Compressing objects: 100% (1465/1465), done.
remote: Total 3184 (delta 1952), reused 2770 (delta 1675)
Receiving objects: 100% (3184/3184), 677.42 KiB | 88 KiB/s, done.
Resolving deltas: 100% (1952/1952), done.

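Now suppose you did that in a branch. If you try to switch back to a branch where those files are still in the actual tree rather than a submodule, you get this error: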

$ git checkout master
error: Untracked working tree file 'rack/AUTHORS' would be overwritten by merge.

You have to move the rack submodule directory out of the way before you can switch to a branch that doesn't have it:

$ mv rack /tmp/
$ git checkout master
Switched to branch "master"
$ ls
README  rack

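Then, when you switch back, you get an empty rack directory. You can either run git submodule update to reclone, or you can move your /tmp/rack directory back into the empty directory.

Subtree Merging

Now that you've seen the difficulties of the submodule system, let's look at an alternate way to solve the same problem. When Git merges, it looks at what it has to merge together and then chooses an appropriate merging strategy to use. If you're merging two branches, Git uses a recursive strategy. If you're merging more than two branches, Git picks the octopus strategy. These strategies are automatically chosen for you because the recursive strategy can handle complex three-way merge situations (for example, more than one common ancestor) but it can only handle merging two branches. The octopus merge can handle multiple branches but is more cautious to avoid difficult conflicts, so it's chosen as the default strategy if you're trying to merge more than two branches.

However, there are other strategies you can choose as well. One of them is the subtree merge, and you can use it to deal with the subproject issue. Here you'll see how to do the same rack embedding as in the last section, but using subtree merges instead.

The idea of the subtree merge is that you have two projects, and one of the projects maps to a subdirectory of the other one and vice versa. When you specify a subtree merge, Git is smart enough to figure out that one is a subtree of the other and merge appropriately, which is pretty amazing.

You first add the Rack application to your project. You add the Rack project as a remote reference in your own project and then check it out into its own branch: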

$ git remote add rack_remote git@github.com:schacon/rack.git
$ git fetch rack_remote
warning: no common commits
remote: Counting objects: 3184, done.
remote: Compressing objects: 100% (1465/1465), done.
remote: Total 3184 (delta 1952), reused 2770 (delta 1675)
Receiving objects: 100% (3184/3184), 677.42 KiB | 4 KiB/s, done.
Resolving deltas: 100% (1952/1952), done.
From git@github.com:schacon/rack
 * [new branch]      build      -> rack_remote/build
 * [new branch]      master     -> rack_remote/master
 * [new branch]      rack-0.4   -> rack_remote/rack-0.4
 * [new branch]      rack-0.9   -> rack_remote/rack-0.9
$ git checkout -b rack_branch rack_remote/master
Branch rack_branch set up to track remote branch refs/remotes/rack_remote/master.
Switched to a new branch "rack_branch"

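Now you have the root of the Rack project in your rack_branch branch and your own project in the master branch. If you check out one and then the other, you can see that they have different project roots: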

$ ls
AUTHORS        KNOWN-ISSUES   Rakefile      contrib        lib
COPYING        README         bin           example        test
$ git checkout master
Switched to branch "master"
$ ls
README

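You want to pull the Rack project into your master project as a subdirectory. You can do that in Git with git read-tree. You'll learn more about read-tree and its friends in Chapter 9, but for now, know that it reads the root tree of one branch into your current staging area and working directory. You just switched back to your master branch, and you pull the rack branch into the rack subdirectory of the master branch of your main project: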

$ git read-tree --prefix=rack/ -u rack_branch

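When you commit, it looks like you have all the Rack files under that subdirectory, as though you copied them in from a tarball. What gets interesting is that you can fairly easily merge changes from one of the branches to the other. So, if the Rack project updates, you can pull in upstream changes by switching to that branch and pulling: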

$ git checkout rack_branch
$ git pull

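Then, you can merge those changes back into your master branch. You can use git merge -s subtree and it will work fine; but Git will also merge the histories together, which you probably don't want. To pull in the changes and prepopulate the commit message, use the --squash and --no-commit options together with the -s subtree strategy option: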

$ git checkout master
$ git merge --squash -s subtree --no-commit rack_branch
Squash commit -- not updating HEAD
Automatic merge went well; stopped before committing as requested

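All the changes from your Rack project are merged in and ready to be committed locally. You can also do the opposite: make changes in the rack subdirectory of your master branch, merge them into your rack_branch branch later, and submit them to the maintainers or push them upstream.

To get a diff between what you have in your rack subdirectory and the code in your rack_branch branch (to see if you need to merge them), you can't use the normal diff command. Instead, you must run git diff-tree with the branch you want to compare to: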

$ git diff-tree -p rack_branch

Or, to compare what is in your rack subdirectory with what the master branch on the server was the last time you fetched, you can run

$ git diff-tree -p rack_remote/master

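Summary

You've seen a number of advanced tools that allow you to manipulate your commits and staging area more precisely. When you notice issues, you should be able to easily figure out which commit introduced them, when, and by whom. If you want to use subprojects in your project, you've learned a few ways to accommodate those needs. At this point, you should be able to do most of the things in Git that you'll need on the command line day to day and feel comfortable doing so.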

# Customizing Git #

So far, I've covered the basics of how Git works and how to use it, and I've introduced a number of tools that Git provides to help you use it easily and efficiently. In this chapter, I'll go through some operations that you can use to make Git operate in a more customized fashion by introducing several important configuration settings and the hooks system. With these tools, it's easy to get Git to work exactly the way you, your company, or your group needs it to.

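Git Configuration

As you briefly saw in Chapter 1, you can specify Git configuration settings with the git config command. One of the first things you did was set up your name and e-mail address: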

$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com

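Now you'll learn a few of the more interesting options that you can set in this manner to customize your Git usage.

You saw some simple Git configuration details in the first chapter, but I'll go over them again quickly here. Git uses a series of configuration files to determine non-default behavior that you may want. The first place Git looks for these values is the /etc/gitconfig file, which contains values for every user on the system and all of their repositories. If you pass the --system option to git config, it reads and writes from this file specifically.

The next place Git looks is the ~/.gitconfig file, which is specific to each user. You can make Git read and write to this file by passing the --global option.

Finally, Git looks for configuration values in the config file in the Git directory (.git/config) of whatever repository you're currently using. These values are specific to that single repository. The three levels go from general to specific, and when values conflict, each level overrides the values in the previous level; for instance, values in .git/config take precedence over those in /etc/gitconfig. You can also set these values by manually editing the files, but it's generally easier to run the git config command.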
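The precedence rule can be checked with a short experiment. The sketch below points HOME at a temporary directory so that your real ~/.gitconfig is never touched; the directory layout and the two example names are assumptions for the demo.

```shell
set -e
# Use a temporary HOME so the real ~/.gitconfig is left alone.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export HOME="$work/home"
mkdir -p "$HOME"
unset XDG_CONFIG_HOME

git config --global user.name "Global Name"     # written to ~/.gitconfig

git init -q "$work/repo"
cd "$work/repo"
effective_before=$(git config user.name)        # only the global value exists

git config user.name "Repo Name"                # written to .git/config
effective_after=$(git config user.name)         # repository level wins
global_value=$(git config --global user.name)   # global file is unchanged
```

Asking for a value without --system, --global, or --local returns the effective setting after all levels are merged, which is why the second lookup reports the repository-level name.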

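Basic Client Configuration

The configuration options recognized by Git fall into two categories: client side and server side. The majority of the options are client side, configuring your personal working preferences. Although there are a great many options, I'll only cover the few that are either commonly used or can significantly affect your workflow. Many options are useful only in edge cases that I won't go over here. If you want to see a list of all the options your version of Git recognizes, you can run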

$ git config --help

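The manual page for git config lists all the available options in quite a bit of detail.

core.editor

By default, Git uses whatever you've set as your default text editor, or else falls back to the Vi editor, to create and edit your commit and tag messages. To change that default to something else, you can use the core.editor setting: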

$ git config --global core.editor emacs

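Now, no matter what is set as your default shell editor variable, Git will fire up Emacs to edit messages.

commit.template

If you set this to the path of a file on your system, Git will use the contents of that file as the default message when you commit. For instance, suppose you create a template file at $HOME/.gitmessage.txt that looks like this: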

subject line

what happened

[ticket: X]

To tell Git to use it as the default message that appears in your editor when you run git commit, set the commit.template configuration value:

$ git config --global commit.template $HOME/.gitmessage.txt
$ git commit

Then, your editor will open to something like this for your placeholder commit message when you commit:

subject line

what happened

[ticket: X]
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
# modified:   lib/test.rb
#
~
~
".git/COMMIT_EDITMSG" 14L, 297C

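If you have a commit-message policy in place, then putting a template for that policy on your system and configuring Git to use it by default can help increase the chance of that policy being followed regularly.

core.pager

The core.pager setting determines which pager is used when Git pages output such as log and diff. You can set it to more or to your favorite pager (by default, it's less), or you can turn it off by setting it to a blank string: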

$ git config --global core.pager ''

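If you run that, Git will output the entire result of every command in one go, no matter how long it is.

user.signingkey

If you're making signed annotated tags (as discussed in Chapter 2), setting your GPG signing key as a configuration setting makes things easier. Set your key ID like so: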

$ git config --global user.signingkey <gpg-key-id>

Now, you can sign tags without having to specify your key every time with the git tag command:

$ git tag -s <tag-name>

core.excludesfile

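As described in Chapter 2, you can put patterns in your project's .gitignore file to have Git not see matching files as untracked or try to stage them when you run git add. However, if you want another file outside of your project to hold those values, or to hold extra values, you can tell Git where that file is with the core.excludesfile setting. Its contents follow the same format as a .gitignore file.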
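A minimal sketch of a per-user excludes file follows. It uses a temporary HOME so that your real configuration is untouched; the file name .gitignore_global and the patterns in it are assumptions, since Git accepts any path for this setting.

```shell
set -e
# Use a temporary HOME so the real ~/.gitconfig is left alone.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export HOME="$work/home"
mkdir -p "$HOME"
unset XDG_CONFIG_HOME

# Hypothetical file name; the patterns use normal .gitignore syntax.
printf '*.log\n*~\n' > "$HOME/.gitignore_global"
git config --global core.excludesfile "$HOME/.gitignore_global"

git init -q "$work/repo"
cd "$work/repo"
echo "noise" > debug.log        # matches *.log, so it is ignored
echo "code" > app.rb            # not matched, so it shows up as untracked

untracked=$(git status --porcelain)
```

The repository itself contains no .gitignore file here, yet debug.log never appears in the status output, which is the effect the setting is meant to provide for editor or OS-specific clutter.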

help.autocorrect

This option is available only in Git 1.6.1 and later. If you mistype a command in Git 1.6, it shows you something like this:

$ git com
git: 'com' is not a git-command. See 'git --help'.

Did you mean this?
     commit

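If you set help.autocorrect to 1, Git will automatically run the command if it has only one match in this scenario.

Colors in Git

Git can color its output to your terminal, which can help you visually parse the output quickly and easily. A number of options can help you set the coloring to your preference.

color.ui

Git automatically colors most of its output if you ask it to. You can get very specific about what you want colored and how, but to turn on all the default terminal coloring, set color.ui to true: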

$ git config --global color.ui true

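When that value is set, Git colors its output if the output goes to a terminal. Other possible settings are false, which never colors the output, and always, which sets colors all the time, even if you're redirecting Git commands to a file or piping them to another command. This setting was added in Git version 1.5.5; if you have an older version, you'll have to specify all the color settings individually.

You'll rarely want color.ui = always. In most scenarios, if you want color codes in your redirected output, you can instead pass a --color flag to the Git command to force it to use color codes. The color.ui = true setting is almost always what you'll want to use.

color.*

If you want to be more specific about which commands are colored and how, or if you have an older version, Git provides command-specific coloring settings. Each of these can be set to true, false, or always: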

color.branch
color.diff
color.interactive
color.status

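In addition, each of these has subsettings you can use to set specific colors for parts of the output, if you want to override each color. For example, to set the meta information in your diff output to blue foreground, black background, and bold text, you can run

$ git config --global color.diff.meta "blue black bold"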

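You can set the color to any of the following values: normal, black, red, green, yellow, blue, magenta, cyan, or white. If you want an attribute like bold in the previous example, you can choose from bold, dim, ul, blink, and reverse.

See the git config manual page for all the subsettings you can configure.

External Merge and Diff Tools

Although Git has an internal implementation of diff, which is what you've been using, you can set up an external tool instead. You can also set up a graphical merge conflict-resolution tool instead of having to resolve conflicts manually. I'll demonstrate setting up the Perforce Visual Merge Tool (P4Merge) to do your diffs and merge resolutions, because it's a nice graphical tool and it's free.

P4Merge works on all major platforms, so you should be able to try this out. I'll use path names in the examples that work on Mac and Linux systems; for Windows, you'll have to change /usr/local/bin to an executable path in your environment.

You can download P4Merge here: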

http://www.perforce.com/perforce/downloads/component.html

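To begin, you'll set up external wrapper scripts to run your commands. I'll use the Mac path for the executable; on other systems, it will be wherever your p4merge binary is installed. Set up a merge wrapper script named extMerge that calls the binary with all the arguments provided: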

$ cat /usr/local/bin/extMerge
#!/bin/sh
/Applications/p4merge.app/Contents/MacOS/p4merge "$@"

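The diff wrapper checks to make sure seven arguments are provided and passes two of them to your merge script. By default, Git passes the following arguments to the diff program: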

path old-file old-hex old-mode new-file new-hex new-mode

Because you only want the old-file and new-file arguments, you use the wrapper script to pass the ones you need.

$ cat /usr/local/bin/extDiff 
#!/bin/sh
[ $# -eq 7 ] && /usr/local/bin/extMerge "$2" "$5"

You also need to make sure these scripts are executable:

$ sudo chmod +x /usr/local/bin/extMerge 
$ sudo chmod +x /usr/local/bin/extDiff

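Now you can set up your config file to use your custom merge resolution and diff tools. This takes a number of custom settings: merge.tool to tell Git which merge tool to use, mergetool.*.cmd to specify how to run the command, mergetool.*.trustExitCode to tell Git whether the exit code of that program indicates a successful merge resolution, and diff.external to tell Git which command to run for diffs. So, you can either run four config commands

$ git config --global merge.tool extMerge
$ git config --global mergetool.extMerge.cmd \
    'extMerge "$BASE" "$LOCAL" "$REMOTE" "$MERGED"'
$ git config --global mergetool.extMerge.trustExitCode false
$ git config --global diff.external extDiff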

or you can edit your ~/.gitconfig file to add these lines:

[merge]
  tool = extMerge
[mergetool "extMerge"]
  cmd = extMerge "$BASE" "$LOCAL" "$REMOTE" "$MERGED"
  trustExitCode = false
[diff]
  external = extDiff

After all this is set, if you run diff commands such as this:

$ git diff 32d1776b1^ 32d1776b1

Instead of getting the diff output on the command line, Git fires up P4Merge, which looks something like Figure 7-1.

Figure 7-1. P4Merge.

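If you try to merge two branches and end up with merge conflicts, you can run the command git mergetool; it starts P4Merge so you can resolve the conflicts through that GUI tool.

The nice thing about this wrapper setup is that you can change your diff and merge tools easily. For example, to change your extDiff and extMerge tools to run the KDiff3 tool instead, all you have to do is edit your extMerge file: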

$ cat /usr/local/bin/extMerge
#!/bin/sh
/Applications/kdiff3.app/Contents/MacOS/kdiff3 "$@"

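Now, Git will use the KDiff3 tool for diff viewing and merge conflict resolution.

Git comes preset to use a number of other merge-resolution tools without your having to set up the cmd configuration. You can set your merge tool to kdiff3, opendiff, tkdiff, meld, xxdiff, emerge, vimdiff, or gvimdiff. If you're not interested in using KDiff3 for diff but rather want to use it just for merge resolution, and the kdiff3 command is in your path, then you can run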

$ git config --global merge.tool kdiff3

If you run this instead of setting up the extMerge and extDiff files, Git uses KDiff3 for merge resolution and its normal built-in diff tool for diffs.

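Formatting and Whitespace

Formatting and whitespace issues are some of the more frustrating and subtle problems that developers encounter when collaborating, especially across platforms. It's very easy for patches or other shared work to introduce subtle whitespace changes, because editors silently insert them or Windows programmers add carriage returns at the end of lines they touch in cross-platform projects. Git has a few configuration options to help with these issues.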

core.autocrlf

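If you're programming on Windows, or using another system but working with people who program on Windows, you'll probably run into line-ending issues at some point. This is because Windows uses both a carriage-return character and a linefeed character to end a line, whereas Mac and Linux systems use only the linefeed character. It is a small detail, but it can seriously disrupt cross-platform work.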

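Git can handle this by auto-converting CRLF line endings into LF when you commit, and LF into CRLF when it checks out code onto your filesystem. You turn on this functionality with the core.autocrlf setting. If you're on a Windows machine, set it to true; this converts LF endings into CRLF when you check out code: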

$ git config --global core.autocrlf true

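If you're on a Linux or Mac system that uses LF line endings, you don't want Git to convert them automatically when you check out files; however, if a file with CRLF endings accidentally gets introduced, you probably want Git to fix it. You can tell Git to convert CRLF to LF on commit but not the other way around by setting core.autocrlf to input: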

$ git config --global core.autocrlf input

This setup should leave you with CRLF endings in Windows checkouts but LF endings on Mac and Linux systems and in the repository.
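
The two directions of conversion can be sketched in a few lines of Ruby. This is only a model of the idea: the method names to_lf and to_crlf are made up for illustration, and Git's real implementation also decides whether a file is text at all before touching it.

```ruby
# What happens conceptually on commit with autocrlf set to true or input:
# CRLF pairs become a single LF.
def to_lf(text)
  text.gsub("\r\n", "\n")
end

# What happens conceptually on checkout with autocrlf set to true:
# every LF becomes CRLF. Normalizing first avoids doubling an existing CR.
def to_crlf(text)
  to_lf(text).gsub("\n", "\r\n")
end
```

With the input setting only the first method applies; with true both apply, one in each direction.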

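If you're a Windows programmer working on a Windows-only project, you can turn off this functionality and record the carriage returns in the repository by setting the config value to false: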

$ git config --global core.autocrlf false

core.whitespace

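Git comes preset to detect and fix some whitespace issues. It can look for four primary whitespace issues: two are enabled by default and can be turned off, and two are disabled by default but can be activated.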

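The two that are turned on by default are trailing-space, which looks for spaces at the end of a line, and space-before-tab, which looks for spaces before tabs at the beginning of a line.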

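The two that are disabled by default are indent-with-non-tab, which looks for lines that begin with eight or more spaces instead of tabs, and cr-at-eol, which tells Git that carriage returns at the end of lines are OK.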

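You tell Git which of these you want enabled by setting core.whitespace to the values you want on or off, separated by commas. You can disable a setting either by leaving it out of the setting string or by prepending a - to the value. For example, if you want all but cr-at-eol to be set, you can do this: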

$ git config --global core.whitespace \
    trailing-space,space-before-tab,indent-with-non-tab

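Git detects these issues when you run a git diff command and colors them in the output, so you have a chance to fix them before you commit. It also uses these values to help you when you apply patches with git apply. When you're applying patches, you can ask Git to warn you if the patch has the specified whitespace issues: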

$ git apply --whitespace=warn <patch>

Or you can have Git try to fix the issue automatically before applying the patch:

$ git apply --whitespace=fix <patch>

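These options apply to rebasing as well. If you've committed whitespace issues but haven't yet pushed upstream, you can run a rebase with the --whitespace=fix option to have Git automatically fix whitespace issues as it rewrites the patches.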
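
To make the two default checks concrete, here is a small Ruby approximation. It is a sketch under assumptions: whitespace_problems is a hypothetical name, and the patterns are a simplified reading of the descriptions above rather than Git's actual rules.

```ruby
# Report which of the two default whitespace checks a single line trips.
def whitespace_problems(line)
  line = line.chomp
  problems = []
  # trailing-space: whitespace at the very end of the line
  problems << :trailing_space if line =~ /[ \t]+\z/
  # space-before-tab: a space directly followed by a tab inside the indent
  indent = line[/\A[ \t]*/]
  problems << :space_before_tab if indent.include?(" \t")
  problems
end
```

Running each added line of a patch through a method like this is roughly what the warn mode reports; the fix mode goes one step further and rewrites the offending lines.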

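Server Configuration

Not nearly as many configuration options are available for the server side of Git, but there are a few interesting ones you may want to take note of.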

receive.fsckObjects

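By default, Git doesn't check every object it receives during a push for consistency. Although Git can check that each object still matches its SHA-1 checksum and points to valid objects, it doesn't do that on every push by default. This is a relatively expensive operation and may add a lot of time to each push, depending on the size of the repository or the push. If you want Git to check object consistency on every push, you can force it to do so by setting receive.fsckObjects to true: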

$ git config --system receive.fsckObjects true

Now Git checks the integrity of your repository before each push is accepted, to make sure faulty clients aren't introducing corrupt data.

receive.denyNonFastForwards

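If you rebase commits that you've already pushed and then try to push again, or otherwise try to push a commit to a remote branch that doesn't contain the commit the remote branch currently points to, you'll be denied. This is generally good policy; but in the case of the rebase, you may decide that you know what you're doing and force-update the remote branch with a -f flag to your push command.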

To disable the ability to force-update remote branches to non-fast-forward references, set receive.denyNonFastForwards:

$ git config --system receive.denyNonFastForwards true

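The other way to do this is with server-side receive hooks, which I'll cover in a bit. That approach lets you do more complex things, such as denying non-fast-forwards to a certain subset of users.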

receive.denyDeletes

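One of the workarounds to the denyNonFastForwards policy is for the user to delete the branch and then push it back up with the new reference. In newer versions of Git (beginning with version 1.6.1), you can set receive.denyDeletes to true: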

$ git config --system receive.denyDeletes true

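This denies branch and tag deletion over a push across the board; no user can do it. To remove remote branches, you must remove the ref files from the server manually. There are also more interesting ways to do this on a per-user basis via ACLs, as you'll learn at the end of this chapter.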

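Git Attributes

Some of these settings can also be specified for a path, so that Git applies them only to a subdirectory or a subset of files. These path-specific settings are called Git attributes and are set either in a .gitattributes file in one of your directories (normally the root of your project) or in the .git/info/attributes file if you don't want the attributes file committed with your project.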

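Using attributes, you can specify separate merge strategies for individual files or directories in your project, tell Git how to diff non-text files, or have Git filter content before you check it into or out of Git. In this section, you'll learn about some of the attributes you can set on paths in your project and see a few examples of using this feature in practice.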

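Binary Files

One cool trick for which you can use Git attributes is telling Git which files are binary (in cases it otherwise may not be able to figure out) and giving Git special instructions about how to handle those files. For instance, some text files may be machine generated and not diffable, whereas some binary files can be diffed; you'll see how to tell Git which is which.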

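Identifying Binary Files

Some files look like text files but for all intents and purposes should be treated as binary data. For instance, Xcode projects on the Mac contain a file that ends in .pbxproj, which is basically a JSON-like plain-text dataset written out to disk by the IDE to record your build settings and so on. Although it's technically a text file, because it's all ASCII, you don't want to treat it as such because it's really a lightweight database: you can't merge the contents if two people changed it, and diffs generally aren't helpful. The file is meant to be consumed by a machine. In essence, you want to treat it like a binary file.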

To tell Git to treat all pbxproj files as binary data, add the following line to your .gitattributes file:

*.pbxproj -crlf -diff

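Now, Git won't try to convert or fix CRLF issues; nor will it try to compute or print a diff for changes in this file when you run git show or git diff on your project. In the 1.6 series of Git and later, you can also use a provided macro that means -crlf -diff: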

*.pbxproj binary

Diffing Binary Files

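In the 1.6 series of Git and later, you can use the Git attributes functionality to effectively diff binary files. You do this by telling Git how to convert your binary data to a text format that can be compared via the normal diff.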

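Because this is a pretty cool and not widely known feature, I'll go over a few examples. First, you'll use this technique to solve one of the most annoying problems known to humanity: version-controlling Word documents. Everyone knows that Word is the most horrific editor around; but, oddly, everyone uses it. If you want to version-control Word documents, you can stick them in a Git repository and commit every once in a while; but what good does that do? If you run git diff normally, you only see something like this: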

$ git diff 
diff --git a/chapter1.doc b/chapter1.doc
index 88839c4..4afcb7c 100644
Binary files a/chapter1.doc and b/chapter1.doc differ

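You can't directly compare two versions unless you check them out and scan them manually, right? It turns out you can do this fairly well using Git attributes. Put the following line in your .gitattributes file: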

*.doc diff=word

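This tells Git that any file matching this pattern (.doc) should use the "word" filter when you try to view a diff that contains changes. What is the "word" filter? You have to set it up. Here you'll configure Git to use the strings program to convert Word documents into readable text files, which it will then diff properly: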

$ git config diff.word.textconv strings

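Now Git knows that if it tries to do a diff between two snapshots, and any of the files end in .doc, it should run those files through the "word" filter, which is defined as the strings program. This effectively makes text-based versions of your Word files before attempting to diff them.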

Here's an example. I put Chapter 1 of this book into Git, added some text to a paragraph, and saved the document. Then I ran git diff to see what changed:

$ git diff
diff --git a/chapter1.doc b/chapter1.doc
index c1c8a0a..b93c9e4 100644
--- a/chapter1.doc
+++ b/chapter1.doc
@@ -8,7 +8,8 @@ re going to cover Version Control Systems (VCS) and Git basics
 re going to cover how to get it and set it up for the first time if you don
 t already have it on your system.
 In Chapter Two we will go over basic Git usage - how to use Git for the 80% 
-s going on, modify stuff and contribute changes. If the book spontaneously 
+s going on, modify stuff and contribute changes. If the book spontaneously 
+Let's see if this works.

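Git successfully and succinctly tells me that I added the string "Let's see if this works", which is correct. It's not perfect, since it adds a bunch of random stuff at the end, but it certainly works. If you can find or write a Word-to-plain-text converter that works well enough, that solution will likely be even more effective. However, strings is available on most Mac and Linux systems, so it may be a good first try for many binary formats.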
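
If strings is not available, a textconv program can be approximated in a few lines. The sketch below is an assumption-laden stand-in, not the real strings utility: the method name printable_strings and the minimum run length of four characters are choices made here for illustration.

```ruby
# Extract runs of printable ASCII characters from arbitrary binary data,
# similar in spirit to what the strings program prints.
def printable_strings(data, min_length = 4)
  # .b gives a binary copy so the byte-oriented pattern below always applies.
  data.b.scan(/[\x20-\x7E]{#{min_length},}/n)
end
```

A script that reads the file named in its first argument, calls this method, and prints one run per line could then be configured as diff.word.textconv in place of strings.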

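Another interesting problem you can solve this way involves diffing image files. One way to do this is to run image files through a filter that extracts their EXIF information, the metadata that is recorded with most image formats. If you download and install the exiftool program, you can use it to convert your images into text about the metadata, so at least the diff will show you a textual representation of any changes that happened: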

$ echo '*.png diff=exif' >> .gitattributes
$ git config diff.exif.textconv exiftool

If you replace an image in your project and run git diff, you see something like this:

diff --git a/image.png b/image.png
index 88839c4..4afcb7c 100644
--- a/image.png
+++ b/image.png
@@ -1,12 +1,12 @@
 ExifTool Version Number         : 7.74
-File Size                       : 70 kB
-File Modification Date/Time     : 2009:04:21 07:02:45-07:00
+File Size                       : 94 kB
+File Modification Date/Time     : 2009:04:21 07:02:43-07:00
 File Type                       : PNG
 MIME Type                       : image/png
-Image Width                     : 1058
-Image Height                    : 889
+Image Width                     : 1056
+Image Height                    : 827
 Bit Depth                       : 8
 Color Type                      : RGB with Alpha

You can easily see that the file size and image dimensions have both changed.

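Keyword Expansion

SVN- or CVS-style keyword expansion is often requested by developers used to those systems. The main problem with this in Git is that you can't modify a file with information about the commit after you've committed, because Git checksums the file first. However, you can inject text into a file when it's checked out and remove it again before it's added to a commit. Git attributes offer you two ways to do this.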

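First, you can inject the SHA-1 checksum of a blob into an $Id$ field in the file automatically. If you set this attribute on a file or set of files, then the next time you check out that branch, Git replaces that field with the SHA-1 of the blob. It's important to notice that it isn't the SHA of the commit, but of the blob itself: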

$ echo '*.txt ident' >> .gitattributes
$ echo '$Id$' > test.txt

The next time you check out this file, Git injects the SHA of the blob:

$ rm test.txt
$ git checkout -- test.txt
$ cat test.txt 
$Id: 42812b7653c7b88933f8a9d6cad0ca16714b9bb3 $

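However, that result is of limited use. If you've used keyword substitution in CVS or Subversion, you could include a datestamp; the SHA isn't all that helpful, because it's fairly random and you can't tell whether one SHA is older or newer than another.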

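It turns out that you can write your own filters for doing substitutions in files on commit and checkout. These are the "clean" and "smudge" filters. In the .gitattributes file, you can set a filter for particular paths and then set up scripts that process files just before they're checked out ("smudge", see Figure 7-2) and just before they're committed ("clean", see Figure 7-3). These filters can be set to do all sorts of fun things.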

Figure 7-2. The "smudge" filter is run on checkout.

Figure 7-3. The "clean" filter is run when files are staged.

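Here's a simple example: running all your C source code through the indent program before committing. You set it up by setting the filter attribute in your .gitattributes file to filter *.c files with the "indent" filter: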

*.c     filter=indent

Then, tell Git what the "indent" filter does on smudge and clean:

$ git config --global filter.indent.clean indent
$ git config --global filter.indent.smudge cat

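In this case, when you commit files that match *.c, Git runs them through the indent program before it commits them and then runs them through the cat program before it checks them back out onto disk. The cat program is basically a no-op: it spits out the same data that it gets in. This combination effectively filters all C source code files through indent before committing.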

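Another interesting example is $Date$ keyword expansion, RCS style. To do this properly, you need a small script that reads a file's content on standard input, figures out the last commit date for the project, and inserts the date into that content. Here is a small Ruby script that does that: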

#! /usr/bin/env ruby
data = STDIN.read
last_date = `git log --pretty=format:"%ad" -1`
puts data.gsub('$Date$', '$Date: ' + last_date.to_s + '$')

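All the script does is get the latest commit date from the git log command, stick that into any $Date$ strings it sees in stdin, and print the results; it should be simple to do in whatever language you're most comfortable with. You can name this file expand_date and put it in your path. Now, you need to set up a filter in Git (call it dater) and tell it to use your expand_date filter to smudge the files on checkout. You'll use a Perl expression to clean that up on commit: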

$ git config filter.dater.smudge expand_date
$ git config filter.dater.clean 'perl -pe "s/\\\$Date[^\\\$]*\\\$/\\\$Date\\\$/"'

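This Perl snippet strips out anything it sees inside a $Date$ string, to get back to where you started. Now that your filter is ready, you can test it by creating a file with your $Date$ keyword and then setting up a Git attribute for that file that engages the new filter: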

$ echo '# $Date$' > date_test.txt
$ echo 'date*.txt filter=dater' >> .gitattributes

If you commit those changes and check out the file again, you see the keyword properly substituted:

$ git add date_test.txt .gitattributes
$ git commit -m "Testing date expansion in Git"
$ rm date_test.txt
$ git checkout date_test.txt
$ cat date_test.txt
# $Date: Tue Apr 21 07:26:52 2009 -0700$

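You can see how powerful this technique can be for customized applications. You have to be careful, though, because the .gitattributes file is committed and passed around with the project but the driver (in this case, dater) isn't; so it won't work everywhere. When you design these filters, they should be able to fail gracefully and have the project still work properly.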
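
The smudge and clean steps are meant to be inverses of each other, which is easy to check when both are written as plain functions. The Ruby sketch below restates the two filters side by side; the method names are hypothetical, and the date is passed in as an argument instead of being read from git log so the pair can be exercised on its own.

```ruby
# Smudge: expand the bare keyword on checkout.
def smudge_date(text, last_date)
  text.gsub('$Date$') { "$Date: #{last_date}$" }
end

# Clean: collapse any expanded keyword back to the bare form on commit.
def clean_date(text)
  text.gsub(/\$Date[^$]*\$/) { '$Date$' }
end
```

Because clean_date also accepts text that was never smudged, a checkout made on a machine without the filter still commits cleanly, which is the graceful failure the paragraph above asks for.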

Exporting Your Repository

Git attribute data also allows you to do some interesting things when exporting an archive of your project.

export-ignore

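You can tell Git not to export certain files or directories when generating an archive. If there is a subdirectory or file that you don't want to include in your archive file but that you do want checked into your project, you can mark those files with the export-ignore attribute.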

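For example, say you have some test files in a test/ subdirectory, and it doesn't make sense to include them in the tarball export of your project. You can add the following line to your Git attributes file: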

test/ export-ignore

Now, when you run git archive to create a tarball of your project, that directory won't be included in the archive.

export-subst

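Another thing you can do for your archives is some simple keyword substitution. Git lets you put the string $Format:$ in any file with any of the --pretty=format formatting shortcodes, many of which you saw in Chapter 2. For instance, if you want to include a file named LAST_COMMIT in your project, and have the last commit date automatically injected into it when git archive runs, you can set up the file like this: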

$ echo 'Last commit date: $Format:%cd$' > LAST_COMMIT
$ echo "LAST_COMMIT export-subst" >> .gitattributes
$ git add LAST_COMMIT .gitattributes
$ git commit -am 'adding LAST_COMMIT file for archives'

When you run git archive, the contents of that file as seen by people who open the archive look like this:

$ cat LAST_COMMIT
Last commit date: Tue Apr 21 08:38:48 2009 -0700

Merge Strategies

You can also use Git attributes to tell Git to use different merge strategies for specific files in your project. One very useful option is to tell Git to not try to merge specific files when they have conflicts, but rather to use your side of the merge over someone else's.

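This is helpful if a branch in your project has diverged or is specialized, but you want to be able to merge changes back in from it, and you want to ignore certain files. Say you have a database settings file called database.xml that is different in two branches, and you want to merge in your other branch without messing up the database file. You can set up an attribute like this: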

database.xml merge=ours

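If you merge in the other branch, instead of having merge conflicts with the database.xml file, you see something like this (this assumes a merge driver named ours is available; if your Git does not have one configured, defining merge.ours.driver as true is a common way to provide it):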

$ git merge topic
Auto-merging database.xml
Merge made by recursive.

In this case, database.xml stays at whatever version you originally had.

Git Hooks

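Like many other version control systems, Git has a way to fire off custom scripts when certain important actions occur. There are two groups of these hooks: client side and server side. The client-side hooks are for client operations such as committing and merging. The server-side hooks are for Git server operations such as receiving pushed commits. You can use these hooks for all sorts of reasons, and you'll learn about a few of them here.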

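Installing a Hook

The hooks are all stored in the hooks subdirectory of the Git directory. In most projects, that's .git/hooks. By default, Git populates this directory with a bunch of example scripts, many of which are useful by themselves; they also document the input values of each script. All the examples are written as shell scripts, with some Perl thrown in, but any properly named executable script will work fine; you can write them in Ruby or Python or what have you. For post-1.6 versions of Git, these example hook files end with .sample, so you need to rename them. For pre-1.6 versions of Git, the example files are named properly but are not executable.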

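To enable a hook script, put a file in the hooks subdirectory of your Git directory that is named appropriately and is executable. From that point forward, Git calls it. I'll cover most of the major hook filenames here.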

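Client-Side Hooks

There are a lot of client-side hooks. This section splits them into committing-workflow hooks, e-mail-workflow hooks, and the rest of the client-side hooks.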

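Committing-Workflow Hooks

The first four hooks have to do with the committing process. The pre-commit hook is run first, before you even type in a commit message. It's used to inspect the snapshot that's about to be committed, to see if you've forgotten something, to make sure tests run, or to examine whatever you need to inspect in the code. Exiting non-zero from this hook aborts the commit, although you can bypass it with git commit --no-verify. You can do things like check for code style (run lint or something equivalent), check for trailing whitespace (the default hook does exactly that), or check for appropriate documentation on new methods.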

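The prepare-commit-msg hook is run before the commit message editor is fired up but after the default message is created. It lets you edit the default message before the commit author sees it. This hook takes a few options: the path to the file that holds the commit message so far, the type of commit, and the commit SHA-1 if this is an amended commit. This hook generally isn't useful for normal commits; rather, it's good for commits where the default message is auto-generated, such as templated commit messages, merge commits, squashed commits, and amended commits. You may use it in conjunction with a commit template to programmatically insert information.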

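The commit-msg hook takes one parameter, which again is the path to a temporary file that contains the current commit message. If this script exits non-zero, Git aborts the commit process, so you can use it to validate your project state or commit message before allowing a commit to go through. In the last section of this chapter, I demonstrate using this hook to check that your commit message conforms to a required pattern.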

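After the entire commit process is completed, the post-commit hook runs. It doesn't take any parameters, but you can easily get the last commit by running git log -1 HEAD. Generally, this script is used for notification or something similar.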

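The committing-workflow client-side scripts can be used in just about any workflow. They're often used to enforce certain policies, although it's important to note that these scripts aren't transferred during a clone. You can enforce policy on the server side to reject pushes of commits that don't conform to some policy, but it's entirely up to the developer to use these scripts on the client side. So, these are scripts to help developers; developers must set them up and maintain them, and can override or modify them at any time.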

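E-mail Workflow Hooks

You can set up three client-side hooks for an e-mail-based workflow. They're all invoked by the git am command, so if you aren't using that command in your workflow, you can safely skip to the next section. If you're taking patches over e-mail prepared by git format-patch, then some of these may be helpful to you.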

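The first hook that is run is applypatch-msg. It takes a single argument: the name of the temporary file that contains the proposed commit message. Git aborts the patch if this script exits non-zero. You can use this to make sure a commit message is properly formatted or to normalize the message by having the script edit it in place.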

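The next hook to run when applying patches via git am is pre-applypatch. It takes no arguments and is run after the patch is applied, so you can use it to inspect the snapshot before making the commit. You can run tests or otherwise inspect the working tree with this script. If something is missing or the tests don't pass, exiting non-zero aborts the git am run without committing the patch.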

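The last hook to run during a git am operation is post-applypatch. You can use it to notify a group or the author of the patch that you've pulled it in. You can't stop the patching process with this script.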

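Other Client Hooks

The pre-rebase hook runs before you rebase anything and can halt the process by exiting non-zero. You can use this hook to disallow rebasing any commits that have already been pushed. The example pre-rebase hook that Git installs does this, although it assumes that next is the name of the branch you publish. You'll likely need to change that to whatever your stable, published branch is.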

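After you run a successful git checkout, the post-checkout hook runs; you can use it to set up your working directory properly for your project environment. This may mean moving in large binary files that you don't want source controlled, auto-generating documentation, or something along those lines.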

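Finally, the post-merge hook runs after a successful merge command. You can use it to restore data in the working tree that Git can't track, such as permissions data. This hook can likewise validate the presence of files external to Git control that you may want copied in when the working tree changes.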

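Server-Side Hooks

In addition to the client-side hooks, you can use a couple of important server-side hooks as a system administrator to enforce nearly any kind of policy for your project. These scripts run before and after pushes to the server. The pre hooks can exit non-zero at any time to reject the push as well as print an error message back to the client; you can set up a push policy that's as complex as you wish.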

pre-receive and post-receive

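The first script to run when handling a push from a client is pre-receive. It takes a list of references that are being pushed from stdin; if it exits non-zero, none of them are accepted. You can use this hook to make sure none of the updated references are non-fast-forwards, or to check that the user doing the pushing has create, delete, or push access, or access to push updates to all the files they're modifying with the push.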

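The post-receive hook runs after the entire process is completed and can be used to update other services or notify users. It takes the same stdin data as the pre-receive hook. Examples include e-mailing a list, notifying a continuous integration server, or updating a ticket-tracking system; you can even parse the commit messages to see if any tickets need to be opened, modified, or closed. This script can't stop the push process, but the client doesn't disconnect until it has completed; so be careful when you try to do anything that may take a long time.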
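
The text above does not spell out the stdin format. According to the githooks documentation, each line carries the old object name, the new object name, and the reference name, separated by spaces. Assuming that layout, a hook written in Ruby could start from a parser like the following sketch; RefUpdate and parse_ref_updates are names invented here.

```ruby
# One line of pre-receive / post-receive input.
RefUpdate = Struct.new(:old_sha, :new_sha, :ref_name)

# Turn the hook's stdin text into a list of RefUpdate records.
def parse_ref_updates(input)
  input.each_line.map(&:chomp).reject(&:empty?).map do |line|
    # Limit the split to three fields so the ref name is kept whole.
    RefUpdate.new(*line.split(' ', 3))
  end
end
```

In a real hook the argument would be STDIN.read, and the script would exit non-zero as soon as any record violates the policy.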

update

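The update script is very similar to the pre-receive script, except that it's run once for each branch the pusher is trying to update. If the pusher is trying to push to multiple branches, pre-receive runs only once, whereas update runs once per branch they're pushing to. Instead of reading from stdin, this script takes three arguments: the name of the reference (branch), the SHA-1 that reference pointed to before the push, and the SHA-1 the user is trying to push. If the update script exits non-zero, only that reference is rejected; other references can still be updated.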

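An Example Git-Enforced Policy

In this section, you'll use what you've learned to establish a Git workflow that checks for a custom commit message format, enforces fast-forward-only pushes, and allows only certain users to modify certain subdirectories in a project. You'll build client scripts that help developers know whether their push will be rejected and server scripts that actually enforce the policies.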

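I used Ruby to write these, both because it's my preferred scripting language and because I feel it's the most pseudocode-looking of the scripting languages; you should be able to roughly follow the code even if you don't use Ruby. However, any language will work fine. All the sample hook scripts distributed with Git are in either Perl or Bash, so you can also see plenty of examples of hooks in those languages by looking at the samples.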

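Server-Side Hook

All the server-side work goes into the update file in your hooks directory. The update file runs once per branch being pushed and takes the reference being pushed to, the old revision where that branch was, and the new revision being pushed. You also have access to the user doing the pushing if the push is being run over SSH. If you've allowed everyone to connect with a single user (like "git") via public-key authentication, you may have to give that user a shell wrapper that determines which user is connecting based on the public key, and set an environment variable specifying that user. Here I assume the connecting user is in the $USER environment variable, so your update script begins by gathering all the information you need: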

#!/usr/bin/env ruby

$refname = ARGV[0]
$oldrev  = ARGV[1]
$newrev  = ARGV[2]
$user    = ENV['USER']

puts "Enforcing Policies... \n(#{$refname}) (#{$oldrev[0,6]}) (#{$newrev[0,6]})"

Yes, I'm using global variables. Don't judge me; it's easier to demonstrate in this manner.

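Enforcing a Specific Commit-Message Format

Your first challenge is to enforce that each commit message adheres to a particular format. Just to have a target, assume that each message has to include a string that looks like "ref: 1234", because you want each commit to link to a work item in your ticketing system. You must look at each commit being pushed up, see if that string is in the commit message, and, if the string is absent from any of the commits, exit non-zero so the push is rejected.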

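You can get a list of the SHA-1 values of all the commits that are being pushed by taking the $newrev and $oldrev values and passing them to a Git plumbing command called git rev-list. This is basically the git log command, but by default it prints out only the SHA-1 values and no other information. So, to get a list of all the commit SHAs introduced between one commit SHA and another, you can run something like this: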

$ git rev-list 538c33..d14fc7
d14fc7c847ab946ec39590d87783c69b031bdfb7
9f585da4401b0a3999e84113824d15245c13f0be
234071a1be950e2a8d078e6141f5cd20c1e61ad3
dfa04c9ef3d5197182f13fb5b9b1fb7717d2222a
17716ec0f1ff5c77eff40b7fe912f9f6cfd0e475

You can take that output, loop through each of those commit SHAs, grab the message for it, and test that message against a regular expression that looks for a pattern.

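You have to figure out how to get the commit message from each of these commits to test. To get the raw commit data, you can use another plumbing command called git cat-file. I'll go over all these plumbing commands in detail in Chapter 9; but for now, here's what that command gives you: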

$ git cat-file commit ca82a6
tree cfda3bf379e4f8dba8717dee55aab78aef7f4daf
parent 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
author Scott Chacon <schacon@gmail.com> 1205815931 -0700
committer Scott Chacon <schacon@gmail.com> 1240030591 -0700

changed the version number

A simple way to get the commit message from a commit when you have the SHA-1 value is to go to the first blank line and take everything after that. You can do so with the sed command on Unix systems:

$ git cat-file commit ca82a6 | sed '1,/^$/d'
changed the version number

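You can use that incantation to grab the commit message from each commit that is being pushed and exit if you see anything that doesn't match. To exit the script and reject the push, exit non-zero. The whole method looks like this: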

$regex = /\[ref: (\d+)\]/

# enforced custom commit message format
def check_message_format
  missed_revs = `git rev-list #{$oldrev}..#{$newrev}`.split("\n")
  missed_revs.each do |rev|
    message = `git cat-file commit #{rev} | sed '1,/^$/d'`
    if !$regex.match(message)
      puts "[POLICY] Your message is not formatted correctly"
      exit 1
    end
  end
end
check_message_format

Putting that in your update script rejects updates that contain commits whose messages don't adhere to your rule.
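
The sed step can also be done inside Ruby, which makes the check easy to try without a repository. The following is a sketch under that assumption: commit_message and valid_message? are hypothetical helpers, and MESSAGE_PATTERN simply repeats the regular expression used in the script above.

```ruby
MESSAGE_PATTERN = /\[ref: (\d+)\]/

# Everything after the first blank line of raw commit data is the message.
def commit_message(raw_commit)
  _headers, message = raw_commit.split("\n\n", 2)
  message.to_s
end

# True when the message carries a marker such as [ref: 1234].
def valid_message?(message)
  !MESSAGE_PATTERN.match(message).nil?
end
```

Note that the pattern expects the square brackets, so a message ending in [ref: 1234] passes while a bare ref: 1234 does not.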

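Enforcing a User-Based ACL System

Suppose you want to add a mechanism that uses an access control list (ACL) to specify which users are allowed to push changes to which parts of your project. Some people have full access, and others only have access to push changes to certain subdirectories or specific files. To enforce this, you write those rules to a file named acl that lives in your bare Git repository on the server. You have the update hook look at those rules, see what files are being introduced by all the commits being pushed, and determine whether the user doing the push has access to update all those files.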

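The first thing you do is write your ACL. Here you use a format very much like the CVS ACL mechanism: it uses a series of lines, where the first field is avail or unavail, the next field is a comma-delimited list of the users to which the rule applies, and the last field is the path to which the rule applies (blank meaning open access). All of these fields are delimited by a pipe (|) character.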

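In this case, you have a few administrators, some documentation writers with access to the doc directory, and one developer who only has access to the lib and tests directories, and your ACL file looks like this: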

avail|nickh,pjhyett,defunkt,tpw
avail|usinclair,cdickens,ebronte|doc
avail|schacon|lib
avail|schacon|tests

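You begin by reading this data into a structure that you can use. In this case, to keep the example simple, you only enforce the avail directives. Here is a method that gives you an associative array where the key is the user name and the value is an array of paths to which the user has write access: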

def get_acl_access_data(acl_file)
  # read in ACL data
  acl_file = File.read(acl_file).split("\n").reject { |line| line == '' }
  access = {}
  acl_file.each do |line|
    avail, users, path = line.split('|')
    next unless avail == 'avail'
    users.split(',').each do |user|
      access[user] ||= []
      access[user] << path
    end
  end
  access
end

On the ACL file you looked at earlier, this get_acl_access_data method returns a data structure that looks like this:

{"defunkt"=>[nil],
 "tpw"=>[nil],
 "nickh"=>[nil],
 "pjhyett"=>[nil],
 "schacon"=>["lib", "tests"],
 "cdickens"=>["doc"],
 "usinclair"=>["doc"],
 "ebronte"=>["doc"]}

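Now that you have the permissions sorted out, you need to determine which paths the commits being pushed have modified, so you can make sure the user who's pushing has access to all of them.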

You can pretty easily see which files have been modified in a single commit with the --name-only option to the git log command (mentioned briefly in Chapter 2):

$ git log -1 --name-only --pretty=format:'' 9f585d

README
lib/test.rb

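If you use the ACL structure returned from the get_acl_access_data method and check it against the listed files in each of the commits, you can determine whether the user has access to push all of their commits: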

# only allows certain users to modify certain subdirectories in a project
def check_directory_perms
  access = get_acl_access_data('acl')

  # see if anyone is trying to push something they can't
  new_commits = `git rev-list #{$oldrev}..#{$newrev}`.split("\n")
  new_commits.each do |rev|
    files_modified = `git log -1 --name-only --pretty=format:'' #{rev}`.split("\n")
    files_modified.each do |path|
      next if path.size == 0
      has_file_access = false
      access[$user].each do |access_path|
        # a nil access_path means the user has access to everything;
        # otherwise the modified path must start with access_path
        if !access_path || (path.index(access_path) == 0)
          has_file_access = true
        end
      end
      if !has_file_access
        puts "[POLICY] You do not have access to push to #{path}"
        exit 1
      end
    end
  end  
end

check_directory_perms

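Most of that should be easy to follow. You get a list of new commits being pushed to your server with git rev-list. Then, for each of those, you find which files are modified and make sure the user who's pushing has access to all the paths being modified. One Rubyism that may not be clear is path.index(access_path) == 0, which is true if path begins with access_path; this ensures that access_path is not just somewhere in the modified path, but that the modified path begins with an allowed path.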
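
One side effect of a plain prefix test is that a rule for lib also matches a sibling directory such as library. If that matters for your project, a stricter comparison is possible. The sketch below is one hedged variant, with path_allowed? as a hypothetical name; it is not the check used in the script above.

```ruby
# True when path is covered by at least one entry of access_paths.
# A nil entry stands for unrestricted access, as in the ACL structure above.
def path_allowed?(path, access_paths)
  access_paths.any? do |access_path|
    access_path.nil? ||
      path == access_path ||                 # rule names the file itself
      path.start_with?("#{access_path}/")    # rule names a parent directory
  end
end
```

Swapping this in would mean replacing the inner loop over access[$user] with a single call and rejecting the push when it returns false.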

Now your users can't push any commits with badly formed messages or with modified files outside of their designated paths.

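Enforcing Fast-Forward-Only Pushes

The only thing left is to enforce fast-forward-only pushes. In Git versions 1.6 or newer, you can set the receive.denyDeletes and receive.denyNonFastForwards settings. But enforcing this with a hook also works in older versions of Git, and you can modify it to apply only to certain users or to whatever else you come up with later.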

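The logic for checking this is to see if any commits are reachable from the older revision that aren't reachable from the newer one. If there are none, then it was a fast-forward push; otherwise, you deny it: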

# enforces fast-forward only pushes 
def check_fast_forward
  missed_refs = `git rev-list #{$newrev}..#{$oldrev}`
  missed_ref_count = missed_refs.split("\n").size
  if missed_ref_count > 0
    puts "[POLICY] Cannot push a non fast-forward reference"
    exit 1
  end
end

check_fast_forward
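
One case this check does not cover is a push that creates or deletes a reference. According to the githooks documentation, the update hook then receives an all-zero object name as the old or the new value, and git rev-list has nothing to walk from it. A minimal sketch of a guard, with hypothetical helper names, might look like this:

```ruby
# An all-zero object name stands for "no object" in hook arguments.
def null_sha?(sha)
  sha.match?(/\A0+\z/)
end

# True only when both sides are real commits, so that listing
# newrev..oldrev is meaningful.
def fast_forward_check_applies?(oldrev, newrev)
  !null_sha?(oldrev) && !null_sha?(newrev)
end
```

The script could then call check_fast_forward only when fast_forward_check_applies?($oldrev, $newrev) is true, and handle creation and deletion under a separate policy.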

Everything is set up. If you run chmod u+x .git/hooks/update, which is the file into which you should have put all this code, and then try to push a non-fast-forward reference, you get something like this:

$ git push -f origin master
Counting objects: 5, done.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 323 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
Enforcing Policies... 
(refs/heads/master) (8338c5) (c5b616)
[POLICY] Cannot push a non fast-forward reference
error: hooks/update exited with error code 1
error: hook declined to update refs/heads/master
To git@gitserver:project.git
 ! [remote rejected] master -> master (hook declined)
error: failed to push some refs to 'git@gitserver:project.git'

There are a couple of interesting things here. First, you see this where the hook starts running:

Enforcing Policies... 
(refs/heads/master) (8338c5) (c5b616)

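Notice that you printed that out to stdout at the very beginning of your update script. It's important to note that anything your script prints to stdout is transferred to the client.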

The next thing you'll notice is the error message:

[POLICY] Cannot push a non fast-forward reference
error: hooks/update exited with error code 1
error: hook declined to update refs/heads/master

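The first line was printed out by your script; the other two are Git telling you that the update script exited non-zero and that this is why your push was declined. Lastly, you have this: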

To git@gitserver:project.git
 ! [remote rejected] master -> master (hook declined)
error: failed to push some refs to 'git@gitserver:project.git'

You see a remote rejected message for each reference that your hook declined, and it tells you that it was declined specifically because of a hook failure.

Furthermore, if the ref marker (for example, ref: 1234) is missing from any of your commits, you see the error message you're printing out for that:

[POLICY] Your message is not formatted correctly

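Or if someone tries to edit a file they don't have access to and pushes a commit containing it, they see something similar. For instance, if a documentation author tries to push a commit modifying something in the lib directory, they see: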

[POLICY] You do not have access to push to lib/test.rb

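That's all. From now on, as long as that update script is there and executable, your repository will never be rewound and will never have a commit message without your pattern in it, and your users will be sandboxed.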

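Client-Side Hooks

The downside to this approach is the whining that will inevitably result when your users' pushes are rejected. Having their carefully crafted work rejected at the last minute can be extremely frustrating and confusing; furthermore, they have to edit their history to correct it, which isn't always for the faint of heart.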

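The answer to this dilemma is to provide some client-side hooks that warn users when they're doing something that the server is likely to reject. That way, they can correct any problems before committing and before those issues become more difficult to fix. Because hooks aren't transferred with a clone of a project, you must distribute these scripts some other way and then have your users copy them to their .git/hooks directory and make them executable. You can distribute these hooks within the project or in a separate project, but there is no way to set them up automatically.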

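To begin, you should check your commit message just before each commit is recorded, so you know the server won't reject your changes due to badly formatted commit messages. To do this, you can add the commit-msg hook. If you have it read the message from the file passed as the first argument and compare that to the pattern, you can force Git to abort the commit if there is no match: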

#!/usr/bin/env ruby
message_file = ARGV[0]
message = File.read(message_file)

$regex = /\[ref: (\d+)\]/

if !$regex.match(message)
  puts "[POLICY] Your message is not formatted correctly"
  exit 1
end

If that script is in place (in .git/hooks/commit-msg) and executable, and you commit with a message that isn't properly formatted, you see this:

$ git commit -am 'test'
[POLICY] Your message is not formatted correctly

No commit was completed in that instance. However, if your message contains the proper pattern, Git allows you to commit:

$ git commit -am 'test [ref: 132]'
[master e05c914] test [ref: 132]
 1 files changed, 1 insertions(+), 0 deletions(-)

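Next, you want to make sure you aren't modifying files that are outside your ACL scope. If your project's .git directory contains a copy of the ACL file you used previously, then the following pre-commit script enforces those constraints for you: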

#!/usr/bin/env ruby

$user    = ENV['USER']

# [ insert get_acl_access_data method from above ]

# only allows certain users to modify certain subdirectories in a project
def check_directory_perms
  access = get_acl_access_data('.git/acl')

  files_modified = `git diff-index --cached --name-only HEAD`.split("\n")
  files_modified.each do |path|
    next if path.size == 0
    has_file_access = false
    access[$user].each do |access_path|
      if !access_path || (path.index(access_path) == 0)
        has_file_access = true
      end
    end
    if !has_file_access
      puts "[POLICY] You do not have access to push to #{path}"
      exit 1
    end
  end
end

check_directory_perms

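This is roughly the same script as the server-side part, but with two important differences. First, the ACL file is in a different place, because this script runs from your working directory, not from your Git directory. You have to change the path to the ACL file from this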

access = get_acl_access_data('acl')

to this:

access = get_acl_access_data('.git/acl')

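The other important difference is the way you get a listing of the files that have been changed. The server-side method looks at the log of commits, but at this point the commit hasn't been recorded yet, so you must get your file listing from the staging area instead. Instead of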

files_modified = `git log -1 --name-only --pretty=format:'' #{rev}`

you have to use

files_modified = `git diff-index --cached --name-only HEAD`

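Those are the only two differences; otherwise, the script works the same way. One caveat is that it expects you to be running locally as the same user you push as to the remote machine. If that is different, you must set the $user variable manually.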

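The last thing you have to do is check that you're not trying to push non-fast-forward references, but that is a bit less common. To get a reference that isn't a fast-forward, you either have to rebase past a commit you've already pushed up or try pushing a different local branch up to the same remote branch.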

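Because the server tells you that you can't push a non-fast-forward anyway, and the hook prevents forced pushes, the only accidental thing you can try to catch is rebasing commits that have already been pushed.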

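Here is an example pre-rebase script that checks for that. It gets a list of all the commits you're about to rewrite and checks whether they exist in any of your remote references. If it sees one that is reachable from one of your remote references, it aborts the rebase: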

#!/usr/bin/env ruby

base_branch = ARGV[0]
if ARGV[1]
  topic_branch = ARGV[1]
else
  topic_branch = "HEAD"
end

target_shas = `git rev-list #{base_branch}..#{topic_branch}`.split("\n")
remote_refs = `git branch -r`.split("\n").map { |r| r.strip }

target_shas.each do |sha|
  remote_refs.each do |remote_ref|
    shas_pushed = `git rev-list ^#{sha}^@ refs/remotes/#{remote_ref}`
    if shas_pushed.split("\n").include?(sha)
      puts "[POLICY] Commit #{sha} has already been pushed to #{remote_ref}"
      exit 1
    end
  end
end

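This script uses a syntax that wasn't covered in the Revision Selection section of Chapter 6. You get a list of commits that have already been pushed up by running this: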

git rev-list ^#{sha}^@ refs/remotes/#{remote_ref}

The SHA^@ syntax resolves to all the parents of that commit. You're looking for any commit that is reachable from the last commit on the remote and that isn't reachable from any parent of any of the SHAs you're trying to push up, meaning it's a fast-forward.

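The main drawback to this approach is that it can be very slow and is often unnecessary: if you don't try to force the push with -f, the server warns you and does not accept the push. However, it's an interesting exercise and can in theory help you avoid a rebase that you might later have to go back and fix.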

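Summary

You've covered most of the major ways that you can customize your Git client and server to best fit your workflow and projects. You've learned about all sorts of configuration settings, file-based attributes, and event hooks, and you've built an example policy-enforcing server. You should now be able to make Git fit nearly any workflow you can dream up.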

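Git and Other Systems

The world isn't perfect. Usually, you can't immediately switch every project you come in contact with to Git. Sometimes you're stuck on a project using another version control system (VCS), and many times that system is Subversion. You'll spend the first part of this chapter learning about git svn, the bidirectional Subversion gateway tool in Git.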

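At some point, you may want to convert your existing project to Git. The second part of this chapter covers how to migrate your project into Git: first from Subversion, then from Perforce, and finally via a custom import script for a nonstandard importing case.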

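Git and Subversion

Currently, the majority of open source development projects and a large number of corporate projects use Subversion to manage their source code. It's the most popular open source VCS and has been around for nearly a decade. It's also very similar in many ways to CVS, which was the big boy of the source-control world before that.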

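One of Git's great features is a bidirectional bridge to Subversion called git svn. This tool allows you to use Git as a valid client to a Subversion server, so you can use all the local features of Git and then push to a Subversion server as if you were using Subversion locally. This means you can do local branching and merging, use the staging area, use rebasing and cherry-picking, and so on, while your collaborators continue to work in their dark and ancient ways. It's a good way to sneak Git into the corporate environment and help your fellow developers become more efficient while you lobby to get the infrastructure changed to support Git fully. The Subversion bridge is the gateway to the distributed VCS (DVCS) world.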

git svn

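The base command in Git for all the Subversion bridging commands is git svn. You preface everything with that. It takes quite a few subcommands, so you'll learn about the common ones while going through a few small workflows.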

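It's important to note that when you're using git svn, you're interacting with Subversion, which is a system that is far less sophisticated than Git. Although you can easily do local branching and merging, it's generally best to keep your history as linear as possible by rebasing your work and avoiding things like simultaneously interacting with a Git remote repository.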

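Don't rewrite your history and try to push again, and don't push to a parallel Git repository to collaborate with fellow Git developers at the same time. Subversion can have only a single linear history, and confusing it is very easy. If you're working with a team, and some are using SVN and others are using Git, make sure everyone is using the SVN server to collaborate; doing so will make your life easier.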

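Setting Up

To demonstrate this functionality, you need a typical SVN repository that you have write access to. If you want to copy these examples, you have to make a writable copy of my test repository. To do that easily, you can use a tool called svnsync, which ships with more recent versions of Subversion and mirrors one repository into another.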

To follow along, you first need to create a new local Subversion repository:

$ mkdir /tmp/test-svn
$ svnadmin create /tmp/test-svn

Then, enable all users to change revprops; the easy way is to add a pre-revprop-change script that always exits 0:

$ cat /tmp/test-svn/hooks/pre-revprop-change 
#!/bin/sh
exit 0;
$ chmod +x /tmp/test-svn/hooks/pre-revprop-change

You can now sync this project to your local machine by calling svnsync init with the destination repository followed by the source repository:

$ svnsync init file:///tmp/test-svn http://progit-example.googlecode.com/svn/

This sets up the properties needed to run the sync. You can then clone the code by running:

$ svnsync sync file:///tmp/test-svn
Committed revision 1.
Copied properties for revision 1.
Committed revision 2.
Copied properties for revision 2.
Committed revision 3.
...

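Although this operation may take only a few minutes, if you try to copy the original repository to another remote repository instead of a local one, the process takes nearly an hour, even though there are fewer than 100 commits. Subversion has to clone one revision at a time and then push it back into another repository; it's ridiculously inefficient, but it's the only easy way to do this.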

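Getting Started

Now that you have a Subversion repository to which you have write access, you can go through a typical workflow. You start with the git svn clone command, which imports an entire Subversion repository into a local Git repository. Remember that if you're importing from a real hosted Subversion repository, you should replace the file:///tmp/test-svn here with the URL of your Subversion repository: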

$ git svn clone file:///tmp/test-svn -T trunk -b branches -t tags
Initialized empty Git repository in /Users/schacon/projects/testsvnsync/svn/.git/
r1 = b4e387bc68740b5af56c2a5faf4003ae42bd135c (trunk)
      A    m4/acx_pthread.m4
      A    m4/stl_hash.m4
...
r75 = d1957f3b307922124eec6314e15bcda59e3d9610 (trunk)
Found possible branch point: file:///tmp/test-svn/trunk => \
    file:///tmp/test-svn/branches/my-calc-branch, 75
Found branch parent: (my-calc-branch) d1957f3b307922124eec6314e15bcda59e3d9610
Following parent with do_switch
Successfully followed parent
r76 = 8624824ecc0badd73f40ea2f01fce51894189b01 (my-calc-branch)
Checked out HEAD:
 file:///tmp/test-svn/branches/my-calc-branch r76

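This runs the equivalent of two commands, git svn init followed by git svn fetch, on the URL you provide. This can take a while. The test project has only about 75 commits and the codebase isn't that big, so it takes just a few minutes. However, Git has to check out each version, one at a time, and commit it individually. For a project with hundreds or thousands of commits, this can take hours or even days to finish.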

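The -T trunk -b branches -t tags part tells Git that this Subversion repository follows the basic branching and tagging conventions. If you name your trunk, branches, or tags differently, you can change these options. Because this is so common, you can replace this entire part with -s, which means standard layout and implies all those options. The following command is equivalent: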

$ git svn clone file:///tmp/test-svn -s

At this point, you should have a valid Git repository that has imported your branches and tags:

$ git branch -a
* master
  my-calc-branch
  tags/2.0.2
  tags/release-2.0.1
  tags/release-2.0.2
  tags/release-2.0.2rc1
  trunk

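It's important to note how this tool namespaces your remote references differently. When you clone a normal Git repository, you get all the branches on that remote server available locally as something like origin/[branch], namespaced by the name of the remote. However, git svn assumes that you won't have multiple remotes and saves all its references to points on the remote server with no namespacing. You can use the Git plumbing command show-ref to look at all your full reference names: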

$ git show-ref
1cbd4904d9982f386d87f88fce1c24ad7c0f0471 refs/heads/master
aee1ecc26318164f355a883f5d99cff0c852d3c4 refs/remotes/my-calc-branch
03d09b0e2aad427e34a6d50ff147128e76c0e0f5 refs/remotes/tags/2.0.2
50d02cc0adc9da4319eeba0900430ba219b9c376 refs/remotes/tags/release-2.0.1
4caaa711a50c77879a91b8b90380060f672745cb refs/remotes/tags/release-2.0.2
1c4cb508144c513ff1214c3488abe66dcb92916f refs/remotes/tags/release-2.0.2rc1
1cbd4904d9982f386d87f88fce1c24ad7c0f0471 refs/remotes/trunk

A normal Git repository looks more like this:

$ git show-ref
83e38c7a0af325a9722f2fdc56b10188806d83a1 refs/heads/master
3e15e38c198baac84223acfc6224bb8b99ff2281 refs/remotes/gitserver/master
0a30dd3b0c795b80212ae723640d4e5d48cabdff refs/remotes/origin/master
25812380387fdd55f916652be4881c6f11600d6f refs/remotes/origin/testing

Here you have two remote servers: one named gitserver with a master branch, and another named origin with two branches, master and testing.

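Notice how in the example of remote references imported from git svn, tags are added as remote branches, not as real Git tags. Your Subversion import looks like it has a remote named tags with branches under it.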

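Committing Back to Subversion

Now that you have a working repository, you can do some work on the project and push your commits back upstream, using Git effectively as an SVN client. If you edit one of the files and commit it, you have a commit that exists in Git locally but doesn't exist on the Subversion server: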

$ git commit -am 'Adding git-svn instructions to the README'
[master 97031e5] Adding git-svn instructions to the README
 1 files changed, 1 insertions(+), 1 deletions(-)

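Next, you need to push your change upstream. Notice how this changes the way you work with Subversion: you can do several commits offline and then push them all at once to the Subversion server. To push to a Subversion server, you run the git svn dcommit command: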

$ git svn dcommit
Committing to file:///tmp/test-svn/trunk ...
       M      README.txt
Committed r79
       M      README.txt
r79 = 938b1a547c2cc92033b74d32030e86468294a5c8 (trunk)
No changes between current HEAD and refs/remotes/trunk
Resetting to the latest refs/remotes/trunk

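This takes all the commits you've made on top of the Subversion server code, does a Subversion commit for each, and then rewrites your local Git commit to include a unique identifier. This is important because it means that all the SHA-1 checksums for your commits change. Partly for this reason, working with Git-based remote versions of your projects concurrently with a Subversion server isn't a good idea. If you look at the last commit, you can see the new git-svn-id that was added: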

$ git log -1
commit 938b1a547c2cc92033b74d32030e86468294a5c8
Author: schacon <schacon@4c93b258-373f-11de-be05-5f7a86268029>
Date:   Sat May 2 22:06:44 2009 +0000

    Adding git-svn instructions to the README

    git-svn-id: file:///tmp/test-svn/trunk@79 4c93b258-373f-11de-be05-5f7a86268029

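Notice that the SHA checksum that originally started with 97031e5 when you committed now begins with 938b1a5. If you want to push to both a Git server and a Subversion server, you have to push (dcommit) to the Subversion server first, because that action changes your commit data.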
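
The git-svn-id line packs three pieces of information into one string: the Subversion URL, the revision number, and the repository UUID. As an illustration only, the following Ruby sketch pulls them apart; SvnId and parse_git_svn_id are names invented here, and the pattern is inferred from the single log entry shown above rather than from a format specification.

```ruby
SvnId = Struct.new(:url, :revision, :uuid)

# Find the git-svn-id trailer in a commit message and split it into parts.
# Returns nil when the message has no such line.
def parse_git_svn_id(message)
  match = message.match(/^\s*git-svn-id: (\S+)@(\d+) (\S+)\s*$/)
  match && SvnId.new(match[1], match[2].to_i, match[3])
end
```

A script could use the revision number to map a rewritten Git commit back to the Subversion revision it came from.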

Pulling in New Changes

If you're working with other developers, then at some point one of you will push, and then the other one will try to push a change that conflicts. That change will be rejected until you merge in their work. In git svn, it looks like this:

$ git svn dcommit
Committing to file:///tmp/test-svn/trunk ...
Merge conflict during commit: Your file or directory 'README.txt' is probably \
out-of-date: resource out of date; try updating at /Users/schacon/libexec/git-\
core/git-svn line 482

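To resolve this situation, you can run git svn rebase, which pulls down any changes on the server that you don't have yet and rebases any work you have on top of what is on the server: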

$ git svn rebase
       M      README.txt
r80 = ff829ab914e8775c7c025d741beb3d523ee30bc4 (trunk)
First, rewinding head to replay your work on top of it...
Applying: first user change

Now all your work sits on top of what is on the Subversion server, so dcommit succeeds:

$ git svn dcommit
Committing to file:///tmp/test-svn/trunk ...
       M      README.txt
Committed r81
       M      README.txt
r81 = 456cbe6337abe49154db70106d1836bc1332deed (trunk)
No changes between current HEAD and refs/remotes/trunk
Resetting to the latest refs/remotes/trunk

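It's important to remember that, unlike Git, which requires you to merge upstream work you don't yet have locally before you can push, git svn makes you do that only if the changes conflict. If someone else pushes a change to one file and then you push a change to another file, your dcommit works fine: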

$ git svn dcommit
Committing to file:///tmp/test-svn/trunk ...
       M      configure.ac
Committed r84
       M      autogen.sh
r83 = 8aa54a74d452f82eee10076ab2584c1fc424853b (trunk)
       M      configure.ac
r84 = cdbac939211ccb18aa744e581e46563af5d962d0 (trunk)
W: d2f23b80f67aaaa1f6f5aaef48fce3263ac71a92 and refs/remotes/trunk differ, \
  using rebase:
:100755 100755 efa5a59965fbbb5b2b0a12890f1b351bb5493c18 \
  015e4c98c482f0fa71e4d5434338014530b37fa6 M   autogen.sh
First, rewinding head to replay your work on top of it...
Nothing to do.

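This matters because the outcome is a project state that did not exist on either of your computers when you pushed. If the changes are incompatible but do not conflict, you may get issues that are hard to diagnose. This differs from using a Git server: in Git, you can fully test the state on your client system before publishing it, whereas in SVN you can never be certain that the states immediately before and after a commit are identical.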

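You should also run this command to pull in changes from the Subversion server even if you're not ready to commit yourself. You can run git svn fetch to grab the new data, but git svn rebase does the fetch and then updates your local commits: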

$ git svn rebase
       M      generate_descriptor_proto.sh
r82 = bd16df9173e424c6f52c337ab6efa7f7643282f1 (trunk)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/trunk.

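Running git svn rebase every once in a while keeps your code up to date. You need a clean working directory when you run it, though. If you have local changes, you must either stash your work or temporarily commit it before running git svn rebase; otherwise the command stops if it sees that the rebase will result in a merge conflict.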

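Git Branching Issues

Once you're comfortable with a Git workflow, you'll likely create topic branches, do work on them, and then merge them in. If you're pushing to a Subversion server via git svn, you may want to rebase your work onto a single branch each time instead of merging branches together. The reason to prefer rebasing is that Subversion has a linear history and doesn't deal with merges the way Git does, so git svn follows only the first parent when converting the snapshots into Subversion commits.

Suppose your history looks like this: you created an experiment branch, made two commits, and then merged them back into master. When you dcommit, you see output like this: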

$ git svn dcommit
Committing to file:///tmp/test-svn/trunk ...
       M      CHANGES.txt
Committed r85
       M      CHANGES.txt
r85 = 4bfebeec434d156c36f2bcd18f4e3d97dc3269a2 (trunk)
No changes between current HEAD and refs/remotes/trunk
Resetting to the latest refs/remotes/trunk
COPYING.txt: locally modified
INSTALL.txt: locally modified
       M      COPYING.txt
       M      INSTALL.txt
Committed r86
       M      INSTALL.txt
       M      COPYING.txt
r86 = 2647f6b86ccfcaad4ec58c520e369ec81f7c283c (trunk)
No changes between current HEAD and refs/remotes/trunk
Resetting to the latest refs/remotes/trunk

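Running dcommit on a branch with merged history works fine, except that when you look at your Git project history, it hasn't rewritten either of the commits you made on the experiment branch; instead, all those changes appear in the SVN version of the single merge commit.

When someone else clones that work, all they see is the merge commit with all the work squashed into it; they don't see the commit data about where it came from or when it was committed.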

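Subversion Branching

Branching in Subversion isn't the same as branching in Git; if you can avoid using it much, that's probably best. However, you can create and commit to Subversion branches using git svn.

Creating a New SVN Branch

To create a new branch in Subversion, run git svn branch [branchname]: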

$ git svn branch opera
Copying file:///tmp/test-svn/trunk at r87 to file:///tmp/test-svn/branches/opera...
Found possible branch point: file:///tmp/test-svn/trunk => \
  file:///tmp/test-svn/branches/opera, 87
Found branch parent: (opera) 1f6bfe471083cbca06ac8d4176f7ad4de0d62e5f
Following parent with do_switch
Successfully followed parent
r89 = 9b6fe0b90c5c9adf9165f700897518dbc54a7cbf (opera)

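This does the equivalent of the svn copy trunk branches/opera command in Subversion and operates on the Subversion server. Note that it doesn't check you out into that branch; if you commit at this point, the commit goes to trunk on the server, not opera.

Switching Active Branches

Git figures out which branch your dcommits go to by looking for the tip of any of your Subversion branches in your history. You should have only one, and it should be the last commit with a git-svn-id in your current branch history.

If you want to work on more than one branch at the same time, you can set up local branches that dcommit to specific Subversion branches by starting them at the imported Subversion commit for that branch. If you want an opera branch that you can work on separately, you can run: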

$ git branch opera remotes/opera

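Now, if you want to merge your opera branch into trunk (your master branch), you can do so with a normal git merge. But you need to provide a descriptive commit message (via -m), or the merge will say "Merge branch opera" instead of something useful.

Remember that although you're using git merge for this operation, and the merge will likely be much easier than it would be in Subversion (because Git automatically detects the appropriate merge base for you), this isn't a normal Git merge commit. You have to push this data back to a Subversion server that can't handle a commit that tracks more than one parent, so after you push it up, it looks like a single commit that squashed in all the work of another branch. After you merge one branch into another, you can't easily go back and continue working on that branch as you normally can in Git. The dcommit command erases any information that says which branch was merged in, so subsequent merge-base calculations will be wrong; dcommit makes your git merge result look like you ran git merge --squash. Unfortunately, there's no good way to avoid this situation: Subversion can't store this information, so you'll always be crippled by its limitations while you're using it as your server. To avoid issues, delete the local branch (in this case, opera) after you merge it into trunk.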

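Subversion Commands

The git svn toolset provides a number of commands that ease the transition to Git by offering functionality similar to what you had in Subversion. Here are a few commands that give you what Subversion used to.

SVN Style History

If you're used to Subversion and want to see your history in SVN output style, run git svn log to view your commit history in SVN formatting: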

$ git svn log
------------------------------------------------------------------------
r87 | schacon | 2009-05-02 16:07:37 -0700 (Sat, 02 May 2009) | 2 lines

autogen change

------------------------------------------------------------------------
r86 | schacon | 2009-05-02 16:00:21 -0700 (Sat, 02 May 2009) | 2 lines

Merge branch 'experiment'

------------------------------------------------------------------------
r85 | schacon | 2009-05-02 16:00:09 -0700 (Sat, 02 May 2009) | 2 lines

updated the changelog

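You should know two important things about git svn log. First, it works offline, unlike the real svn log command, which asks the Subversion server for the data. Second, it shows only commits that have been committed up to the Subversion server. Local Git commits that you haven't dcommitted don't show up; neither do commits that people have made to the Subversion server in the meantime. It's more like the last known state of the commits on the Subversion server.

SVN Annotation

Just as git svn log simulates svn log offline, the equivalent of svn annotate is git svn blame [FILE]. The output looks like this: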

$ git svn blame README.txt 
 2   temporal Protocol Buffers - Google's data interchange format
 2   temporal Copyright 2008 Google Inc.
 2   temporal http://code.google.com/apis/protocolbuffers/
 2   temporal 
22   temporal C++ Installation - Unix
22   temporal =======================
 2   temporal 
79    schacon Committing in git-svn.
78    schacon 
 2   temporal To build and install the C++ Protocol Buffer runtime and the Protocol
 2   temporal Buffer compiler (protoc) execute the following:
 2   temporal

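Again, it doesn't show commits that you made locally in Git or that have been pushed to Subversion in the meantime.

SVN Server Information

You can also get the same sort of information that svn info gives you by running git svn info: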

$ git svn info
Path: .
URL: https://schacon-test.googlecode.com/svn/trunk
Repository Root: https://schacon-test.googlecode.com/svn
Repository UUID: 4c93b258-373f-11de-be05-5f7a86268029
Revision: 87
Node Kind: directory
Schedule: normal
Last Changed Author: schacon
Last Changed Rev: 87
Last Changed Date: 2009-05-02 16:07:37 -0700 (Sat, 02 May 2009)

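Like blame and log, this runs offline and is up to date only as of the last time you communicated with the Subversion server.

Ignoring What Subversion Ignores

If you clone a Subversion repository that has svn:ignore properties set anywhere, you'll likely want to set corresponding .gitignore files so you don't accidentally commit files that you shouldn't. git svn has two commands to help with this issue. The first is git svn create-ignore, which automatically creates corresponding .gitignore files for you so your next commit can include them.

The second is git svn show-ignore, which prints to stdout the lines you need to put in a .gitignore file, so you can redirect the output into your project's exclude file: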

$ git svn show-ignore > .git/info/exclude

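That way, you don't litter the project with .gitignore files. This is a good option if you're the only Git user on a Subversion team and your teammates don't want .gitignore files in the project.

Git-Svn Summary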

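The git svn tools are useful if you're stuck with a Subversion server for now or are otherwise in a development environment that requires one. Consider it crippled Git, however, or you'll hit issues in translation that may confuse you and your collaborators. To stay out of trouble, try to follow the guidelines this section has already argued for: keep a linear Git history by rebasing work back onto your mainline branch rather than merging it in, and don't collaborate through a separate Git server alongside the Subversion server.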

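If you follow those guidelines, working with a Subversion server can be more bearable. However, if it's possible to move to a real Git server, doing so can gain your team a lot more.

Migrating to Git

If you have an existing codebase in another VCS but have decided to start using Git, you must migrate your project one way or another. This section goes over some importers that are included with Git for common systems and then demonstrates how to develop your own custom importer.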

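Importing

You'll learn how to import data from two of the bigger professionally used SCM systems, Subversion and Perforce, both because they make up the majority of users I hear of who are currently switching, and because high-quality tools for both systems are distributed with Git.

Subversion

If you read the previous section about using git svn, you can easily use those instructions to git svn clone a repository; then stop using the Subversion server, push to a new Git server, and start using that. If you want the history, you can accomplish that as quickly as you can pull the data out of the Subversion server (which may take a while).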

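However, the import isn't perfect; and because it will take so long, you may as well do it right. The first problem is the author information. In Subversion, each person committing has a user on the system who is recorded in the commit information. The examples in the previous section show schacon in some places, such as the blame output and the git svn log. If you want to map this to better Git author data, you need a mapping from the Subversion users to the Git authors. Create a file called users.txt that has this mapping in a format like this: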

schacon = Scott Chacon <schacon@geemail.com>
selse = Someo Nelse <selse@geemail.com>

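To get a list of the author names that SVN uses, you can run this:

$ svn log --xml | grep author | sort -u | perl -pe 's/.*>(.*?)<.*/$1 = /'

That gives you the log output in XML format: you can look for the authors, create a unique list, and then strip out the XML. (Obviously this only works on a machine with grep, sort, and perl installed.) Then redirect that output into your users.txt file so you can add the equivalent Git user data next to each entry.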
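If perl is not at hand, the same skeleton can be produced in Ruby. This is only a sketch: the helper name authors_skeleton is made up for the example, and it assumes the author names appear as plain <author>…</author> elements in the XML log.

```ruby
# Hypothetical helper: turn `svn log --xml` output into users.txt skeleton lines.
# Assumes each author appears as <author>name</author> in the XML.
def authors_skeleton(svn_log_xml)
  svn_log_xml.scan(%r{<author>(.*?)</author>})
             .flatten
             .uniq
             .sort
             .map { |name| "#{name} = " }
end

sample = <<~XML
  <log>
  <logentry revision="2"><author>selse</author></logentry>
  <logentry revision="1"><author>schacon</author></logentry>
  <logentry revision="3"><author>schacon</author></logentry>
  </log>
XML

puts authors_skeleton(sample)
```

Each printed line still needs the Git name and e-mail address filled in by hand after the equals sign.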

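You can provide this file to git svn to help it map the author data more accurately. You can also tell git svn not to include the metadata that Subversion normally imports, by passing --no-metadata to the clone or init command. This makes your import command look like this:

$ git svn clone http://my-project.googlecode.com/svn/ \
      --authors-file=users.txt --no-metadata -s my_project

Now you should have a nicer Subversion import in your my_project directory. Instead of commits that look like this: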

commit 37efa680e8473b615de980fa935944215428a35a
Author: schacon <schacon@4c93b258-373f-11de-be05-5f7a86268029>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

    git-svn-id: https://my-project.googlecode.com/svn/trunk@94 4c93b258-373f-11de-
    be05-5f7a86268029

they look like this:

commit 03a8785f44c8ea5cdb0e8834b7c8e6c469be2ff2
Author: Scott Chacon <schacon@geemail.com>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

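Not only does the Author field look a lot better, but the git-svn-id is no longer there, either.

You need to do a bit of post-import cleanup. For one thing, you should clean up the odd references that git svn set up. First you'll move the tags so they're actual tags rather than strange remote branches, and then you'll move the rest of the branches so they're local.

To move the tags to be proper Git tags, run: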

$ cp -Rf .git/refs/remotes/tags/* .git/refs/tags/
$ rm -Rf .git/refs/remotes/tags

This takes the references that were remote branches starting with tags/ and makes them real (lightweight) tags.

Next, move the rest of the references under refs/remotes to be local branches:

$ cp -Rf .git/refs/remotes/* .git/refs/heads/
$ rm -Rf .git/refs/remotes

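Now all the old branches are real Git branches and all the old tags are real Git tags. The last thing to do is add your new Git server as a remote and push to it. Because --all pushes only branches, push the tags separately so that everything goes up:

$ git push origin --all
$ git push origin --tags

All your branches and tags should now be on your new Git server in a nice, clean import.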

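Perforce

The next system you'll look at importing from is Perforce. A Perforce importer is also distributed with Git, but only in the contrib section of the source code; it isn't available by default like git svn. To run it, you must get the Git source code, which you can download from git.kernel.org: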

$ git clone git://git.kernel.org/pub/scm/git/git.git
$ cd git/contrib/fast-import

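In this fast-import directory, you should find an executable Python script named git-p4. You must have Python and the p4 tool installed on your machine for this import to work. For example, you'll import the Jam project from the Perforce Public Depot. To set up your client, you must export the P4PORT environment variable to point to the Perforce depot: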

$ export P4PORT=public.perforce.com:1666

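Run the git-p4 clone command to import the Jam project from the Perforce server, supplying the depot and project path and the path into which you want to import the project: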

$ git-p4 clone //public/jam/src@all /opt/p4import
Importing from //public/jam/src@all into /opt/p4import
Reinitialized existing Git repository in /opt/p4import/.git/
Import destination: refs/remotes/p4/master
Importing revision 4409 (100%)

If you go to the /opt/p4import directory and run git log, you can see your imported work:

$ git log -2
commit 1fd4ec126171790efd2db83548b85b1bbbc07dc2
Author: Perforce staff <support@perforce.com>
Date:   Thu Aug 19 10:18:45 2004 -0800

    Drop 'rc3' moniker of jam-2.5.  Folded rc2 and rc3 RELNOTES into
    the main part of the document.  Built new tar/zip balls.

    Only 16 months later.

    [git-p4: depot-paths = "//public/jam/src/": change = 4409]

commit ca8870db541a23ed867f38847eda65bf4363371d
Author: Richard Geiger <rmg@perforce.com>
Date:   Tue Apr 22 20:51:34 2003 -0800

    Update derived jamgram.c

    [git-p4: depot-paths = "//public/jam/src/": change = 3108]

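You can see the git-p4 identifier in each commit. It's fine to keep that identifier there, in case you need to reference the Perforce change number later. However, if you'd like to remove the identifier, now is the time to do so, before you start doing work on the new repository. You can use git filter-branch to remove the identifier strings en masse: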

$ git filter-branch --msg-filter '
        sed -e "/^\[git-p4:/d"
'
Rewrite 1fd4ec126171790efd2db83548b85b1bbbc07dc2 (123/123)
Ref 'refs/heads/master' was rewritten
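For readers more at home in Ruby than sed, the message filter above amounts to dropping every line that begins with "[git-p4:". The sketch below shows that transformation on a single message; strip_git_p4 is a name invented for this example, and wiring it into filter-branch as a script that reads stdin is left as an assumption.

```ruby
# Same effect as sed -e "/^\[git-p4:/d": drop lines that start with "[git-p4:".
# strip_git_p4 is a hypothetical helper name used only in this sketch.
def strip_git_p4(message)
  message.lines.reject { |line| line.start_with?('[git-p4:') }.join
end

original = "Update derived jamgram.c\n\n" \
           "[git-p4: depot-paths = \"//public/jam/src/\": change = 3108]\n"

cleaned = strip_git_p4(original)
puts cleaned
```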

If you run git log, you can see that all the SHA-1 checksums for the commits have changed, but the git-p4 strings are no longer in the commit messages:

$ git log -2
commit 10a16d60cffca14d454a15c6164378f4082bc5b0
Author: Perforce staff <support@perforce.com>
Date:   Thu Aug 19 10:18:45 2004 -0800

    Drop 'rc3' moniker of jam-2.5.  Folded rc2 and rc3 RELNOTES into
    the main part of the document.  Built new tar/zip balls.

    Only 16 months later.

commit 2b6c6db311dd76c34c66ec1c40a49405e6b527b2
Author: Richard Geiger <rmg@perforce.com>
Date:   Tue Apr 22 20:51:34 2003 -0800

    Update derived jamgram.c

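Your import is ready to push up to your new Git server.

A Custom Importer

If your system isn't Subversion or Perforce, look for an importer online first: quality importers are available for CVS, Clear Case, Visual Source Safe, and even a directory of archives. If none of these tools works for you, you have a rarer tool, or you otherwise need a more custom importing process, you should use git fast-import. This command reads simple instructions from stdin to write specific Git data. It's much easier to create Git objects this way than to run the raw Git commands or to write the raw objects (see Chapter 9 for more information). With it, you can write an import script that reads the necessary information out of the system you're importing from and prints straightforward instructions to stdout. You can then run this program and pipe its output through git fast-import.

To quickly demonstrate, you'll write a simple importer. Suppose you work in current, you back up your project by occasionally copying the directory into a time-stamped back_YYYY_MM_DD backup directory, and you want to import this into Git. Your directory structure looks like this: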

$ ls /opt/import_from
back_2009_01_02
back_2009_01_04
back_2009_01_14
back_2009_02_03
current

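In order to import a Git directory, you need to review how Git stores its data. As you may remember, Git is fundamentally a linked list of commit objects that point to a snapshot of content. All you have to do is tell fast-import what the content snapshots are, what commit data points to them, and the order they go in. Your strategy will be to go through the snapshots one at a time and create commits with the contents of each directory, linking each commit back to the previous one.

As in the "An Example Git Enforced Policy" section of Chapter 7, we'll write this in Ruby, because it's what I generally work with and it tends to be easy to read. You could write this example in anything you're familiar with; it just needs to print the appropriate information to stdout. If you are running on Windows, take special care not to introduce carriage returns at the end of your lines: git fast-import is very particular about wanting line feeds (LF) only, not the carriage return line feeds (CRLF) that Windows uses.

To begin, you'll change into the target directory and identify every subdirectory, each of which will be a snapshot that you want to import as a commit. You'll change into each subdirectory and print the commands necessary to export it. Your basic main loop looks like this: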

last_mark = nil

# loop through the directories
Dir.chdir(ARGV[0]) do
  Dir.glob("*").each do |dir|
    next if File.file?(dir)

    # move into the target directory
    Dir.chdir(dir) do 
      last_mark = print_export(dir, last_mark)
    end
  end
end

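You run print_export inside each directory, which takes the manifest and mark of the previous snapshot and returns the manifest and mark of this one; that way, you can link them properly. "Mark" is the fast-import term for an identifier you give to a commit; as you create commits, you give each one a mark that it can use to link to it from other commits. So, the first thing to do in your print_export method is generate a mark from the directory name: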

mark = convert_dir_to_mark(dir)

You'll do this by creating an array of directories and using the index value as the mark, because a mark must be an integer. Your method looks like this:

$marks = []
def convert_dir_to_mark(dir)
  if !$marks.include?(dir)
    $marks << dir
  end
  ($marks.index(dir) + 1).to_s
end

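Now that you have an integer representation of your commit, you need a date for the commit metadata. Because the date is expressed in the name of the directory, you'll parse it out. The next line in your print_export method is:

date = convert_dir_to_date(dir)

where convert_dir_to_date is defined as: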

def convert_dir_to_date(dir)
  if dir == 'current'
    return Time.now().to_i
  else
    dir = dir.gsub('back_', '')
    (year, month, day) = dir.split('_')
    return Time.local(year, month, day).to_i
  end
end

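That returns an integer value for the date of each directory. The last piece of meta-information you need for each commit is the committer data, which you hardcode in a global variable: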

$author = 'Scott Chacon <schacon@example.com>'

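Now you're ready to begin printing out the commit data for your importer. The initial information states that you're defining a commit object and what branch it's on, followed by the mark you've generated, the committer information and commit message, and then the previous commit, if any. The code looks like this: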

# print the import information
puts 'commit refs/heads/master'
puts 'mark :' + mark
puts "committer #{$author} #{date} -0700"
export_data('imported from ' + dir)
puts 'from :' + last_mark if last_mark

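You hardcode the time zone (-0700) because doing so is easy. If you're importing from another system, you must specify the time zone as an offset. The commit message must be expressed in a special format: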

data (size)\n(contents)

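The format consists of the word data, the size of the data to be read (counted in bytes), a newline, and finally the data. Because you need to use the same format to specify the file contents later, you create a helper method, export_data:

def export_data(string)
  # fast-import counts bytes, which differs from String#size for multibyte text
  print "data #{string.bytesize}\n#{string}"
end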
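The size in a data command is a byte count, which only matters once a message or file contains non-ASCII text. The following sketch makes the difference visible; data_command is a hypothetical variant of the helper that returns the string instead of printing it, so that it is easy to inspect.

```ruby
# Hypothetical variant of export_data that returns the command text.
# fast-import expects the number of bytes, so bytesize is used rather than size.
def data_command(string)
  "data #{string.bytesize}\n#{string}"
end

ascii = data_command('imported from back_2009_01_02')
utf8  = data_command("h\u00e9llo") # the accented letter occupies two bytes in UTF-8

puts ascii
puts utf8.lines.first
```

For pure ASCII input the character count and byte count coincide, which is why the difference is easy to miss in a quick test.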

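All that's left is to specify the file contents for each snapshot. This is easy, because you have each one in a directory: you can print out the deleteall command followed by the contents of each file in the directory. Git will then record each snapshot appropriately: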

puts 'deleteall'
Dir.glob("**/*").each do |file|
  next if !File.file?(file)
  inline_data(file)
end

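Note: Because many systems think of their revisions as changes from one commit to another, fast-import can also take commands with each commit to specify which files have been added, removed, or modified and what the new contents are. You could calculate the differences between snapshots and provide only this data, but doing so is more complex; you may as well give Git all the data and let it figure it out. If this is better suited to your data, check the fast-import man page for details about how to provide your data in this manner.

The format for listing the new file contents or specifying a modified file with the new contents is as follows: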

M 644 inline path/to/file
data (size)
(file contents)

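Here, 644 is the mode (if you have executable files, you need to detect them and specify 755 instead), and inline says you'll list the contents immediately after this line. Your inline_data method looks like this: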

def inline_data(file, code = 'M', mode = '644')
  content = File.read(file)
  puts "#{code} #{mode} inline #{file}"
  export_data(content)
end

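You reuse the export_data method you defined earlier, because it's the same as the way you specified your commit message data.

The last thing you need to do is to return the current mark so it can be passed to the next iteration: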

return mark
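Putting the fragments together, a whole importer might look roughly like the sketch below. It is a restructured variant rather than the script described in the text: the method names (emit_data, dir_to_date, emit_snapshot, export_stream) are invented here, the stream is written to any IO object so it can be inspected, marks are taken from the sorted directory order, and a newline is emitted after each data block (fast-import accepts an optional LF there).

```ruby
require 'stringio'
require 'tmpdir'
require 'fileutils'

AUTHOR = 'Scott Chacon <schacon@example.com>'

# Write one fast-import "data" block; the count is a byte count.
def emit_data(out, string)
  out.print "data #{string.bytesize}\n#{string}"
end

# Derive a commit timestamp from a back_YYYY_MM_DD directory name.
def dir_to_date(dir)
  return Time.now.to_i if dir == 'current'
  year, month, day = dir.sub('back_', '').split('_').map(&:to_i)
  Time.local(year, month, day).to_i
end

# Emit one commit for the snapshot in `dir`; returns the mark it used.
def emit_snapshot(out, root, dir, mark, last_mark)
  out.puts 'commit refs/heads/master'
  out.puts "mark :#{mark}"
  out.puts "committer #{AUTHOR} #{dir_to_date(dir)} -0700"
  emit_data(out, "imported from #{dir}")
  out.puts
  out.puts "from :#{last_mark}" if last_mark
  out.puts 'deleteall'
  path = File.join(root, dir)
  Dir.glob('**/*', base: path).sort.each do |file|
    full = File.join(path, file)
    next unless File.file?(full)
    out.puts "M 644 inline #{file}"
    emit_data(out, File.binread(full))
    out.puts
  end
  mark
end

# Walk the snapshot directories in name order and write the whole stream.
def export_stream(root, out)
  last_mark = nil
  dirs = Dir.children(root).select { |d| File.directory?(File.join(root, d)) }.sort
  dirs.each_with_index do |dir, i|
    last_mark = emit_snapshot(out, root, dir, i + 1, last_mark)
  end
end

# Small demonstration against a throwaway directory.
stream = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'back_2009_01_02'))
  File.write(File.join(root, 'back_2009_01_02', 'file.rb'), "version two\n")
  out = StringIO.new
  export_stream(root, out)
  out.string
end
puts stream
```

To feed a real import, one would call export_stream with the backup directory and $stdout and pipe the result into git fast-import. Sorting by name happens to place current after the back_ directories, which is an assumption about the naming scheme rather than a general rule.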

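NOTE: If you are running on Windows, you'll need to add one extra step. As mentioned before, Windows uses CRLF for new line characters while git fast-import expects only LF. To get around this problem and make git fast-import happy, you need to tell Ruby to use LF instead of CRLF: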

$stdout.binmode

That's it. If you run this script, you'll get content that looks something like this:

$ ruby import.rb /opt/import_from 
commit refs/heads/master
mark :1
committer Scott Chacon <schacon@example.com> 1230883200 -0700
data 29
imported from back_2009_01_02deleteall
M 644 inline file.rb
data 12
version two
commit refs/heads/master
mark :2
committer Scott Chacon <schacon@example.com> 1231056000 -0700
data 29
imported from back_2009_01_04from :1
deleteall
M 644 inline file.rb
data 14
version three
M 644 inline new.rb
data 16
new version one
(...)

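To run the importer, pipe this output through git fast-import while in the Git directory you want to import into. You can create a new directory, run git init in it for a starting point, and then run your script: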

$ git init
Initialized empty Git repository in /opt/import_to/.git/
$ ruby import.rb /opt/import_from | git fast-import
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:       5000
Total objects:           18 (         1 duplicates                  )
      blobs  :            7 (         1 duplicates          0 deltas)
      trees  :            6 (         0 duplicates          1 deltas)
      commits:            5 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:           1024 (         5 unique    )
      atoms:              3
Memory total:          2255 KiB
       pools:          2098 KiB
     objects:           156 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize =   33554432
pack_report: core.packedGitLimit      =  268435456
pack_report: pack_used_ctr            =          9
pack_report: pack_mmap_calls          =          5
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =       1356 /       1356
---------------------------------------------------------------------

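As you can see, when it completes successfully, it gives you a bunch of statistics about what it accomplished. In this case, you imported 18 objects total for 5 commits into 1 branch. Now you can run git log to see your new history: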

$ git log -2
commit 10bfe7d22ce15ee25b60a824c8982157ca593d41
Author: Scott Chacon <schacon@example.com>
Date:   Sun May 3 12:57:39 2009 -0700

    imported from current

commit 7e519590de754d079dd73b44d695a42c9d2df452
Author: Scott Chacon <schacon@example.com>
Date:   Tue Feb 3 01:00:00 2009 -0700

    imported from back_2009_02_03

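There you go: a nice, clean Git repository. It's important to note that nothing is checked out; you don't have any files in your working directory at first. To get them, you must reset your branch to where master is now: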

$ ls
$ git reset --hard master
HEAD is now at 10bfe7d imported from current
$ ls
file.rb  lib

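You can do a lot more with the fast-import tool: handle different modes, binary data, multiple branches and merging, tags, progress indicators, and more. A number of examples of more complex scenarios are available in the contrib/fast-import directory of the Git source code; one of the better ones is the git-p4 script just covered.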

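Summary

You should feel comfortable using Git with Subversion or importing nearly any existing repository into a new Git one without losing data. The next chapter covers the raw internals of Git so you can craft every single byte, if need be.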

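Git Internals

You may have skipped to this chapter from a previous chapter, or you may have gotten here after reading the rest of the book; in either case, this is where you'll go over the inner workings and implementation of Git. I found that learning this information was fundamentally important to understanding how useful and powerful Git is, but others have argued to me that it can be confusing and unnecessarily complex for beginners. Thus, I've made this discussion the last chapter in the book so you can read it early or later in your learning process. I leave it up to you to decide.

Now that you're here, let's get started. First, if it isn't yet clear, Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it. You'll learn more about what this means in a bit.

In the early days of Git (mostly pre 1.5), the user interface was much more complex because it emphasized this filesystem rather than a polished VCS. In the last few years, the UI has been refined until it's as clean and easy to use as any system out there; but often, the stereotype lingers about the early Git UI that was complex and difficult to learn.

The content-addressable filesystem layer is amazingly cool, so I'll cover that first in this chapter; then you'll learn about the transport mechanisms and the repository maintenance tasks that you may eventually have to deal with.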

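Plumbing and Porcelain

This book covers how to use Git with 30 or so verbs such as checkout, branch, remote, and so on. But because Git was initially a toolkit for a VCS rather than a full user-friendly VCS, it has a bunch of verbs that do low-level work and were designed to be chained together UNIX style or called from scripts. These commands are generally referred to as "plumbing" commands, and the more user-friendly commands are called "porcelain" commands.

The book's first eight chapters deal almost exclusively with porcelain commands. In this chapter, you'll deal mostly with the lower-level plumbing commands, because they give you access to the inner workings of Git and help demonstrate how and why Git does what it does. These commands aren't meant to be used manually on the command line, but rather as building blocks for new tools and custom scripts.

When you run git init in a new or existing directory, Git creates the .git directory, which is where almost everything that Git stores and manipulates is located. If you want to back up or clone your repository, copying this single directory elsewhere gives you nearly everything you need. This entire chapter basically deals with the stuff in this directory. Here's what it looks like: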

$ ls 
HEAD
branches/
config
description
hooks/
index
info/
objects/
refs/

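You may see some other files in there, but this is a fresh git init repository; it's what you see by default. The branches directory isn't used by newer Git versions, and the description file is only used by the GitWeb program, so don't worry about those. The config file contains your project-specific configuration options, and the info directory keeps a global exclude file for ignored patterns that you don't want to track in a .gitignore file. The hooks directory contains your client- or server-side hook scripts, which are discussed in detail in Chapter 6.

This leaves four important entries: the HEAD and index files, and the objects and refs directories. These are the core parts of Git. The objects directory stores all the content for your database, the refs directory stores pointers into commit objects in that data (branches), the HEAD file points to the branch you currently have checked out, and the index file is where Git stores your staging area information. You'll now look at each of these in detail to see how Git operates.

Git Objects

Git is a content-addressable filesystem. Great. What does that mean? It means that at the core of Git is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time. To demonstrate, you can use the plumbing command hash-object, which takes some data, stores it in your .git directory, and gives you back the key the data is stored as. First, you initialize a new Git repository and verify that there is nothing in the objects directory: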

$ mkdir test
$ cd test
$ git init
Initialized empty Git repository in /tmp/test/.git/
$ find .git/objects
.git/objects
.git/objects/info
.git/objects/pack
$ find .git/objects -type f
$

Git has initialized the objects directory and created pack and info subdirectories in it, but there are no regular files. Now, store some text in your Git database:

$ echo 'test content' | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4

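The -w option tells hash-object to store the object; otherwise, the command simply tells you what the key would be. --stdin tells the command to read the content from stdin; if you don't specify this, hash-object expects the path to a file. The output from the command is a 40-character checksum hash. This is the SHA-1 hash: a checksum of the content you're storing plus a header, which you'll learn about in a bit. Now you can see how Git has stored your data: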

$ find .git/objects -type f 
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

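You can see a file in the objects directory. This is how Git stores the content initially: as a single file per piece of content, named with the SHA-1 checksum of the content and its header. The subdirectory is named with the first 2 characters of the SHA, and the filename is the remaining 38 characters.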
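To connect the key printed by hash-object with the path found under .git/objects, here is a small Ruby sketch that recomputes both without calling Git. The helper names blob_key and object_path are invented for the example, and the header layout ("blob", a space, the byte count, a null byte) is the one this chapter describes in its object storage discussion.

```ruby
require 'digest/sha1'

# Key Git assigns to a blob: SHA-1 over a small header plus the content.
def blob_key(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
end

# Loose-object path: the first two hex characters name the subdirectory,
# the remaining 38 name the file.
def object_path(key, git_dir = '.git')
  File.join(git_dir, 'objects', key[0, 2], key[2..-1])
end

key = blob_key("test content\n") # echo appends the trailing newline
puts key
puts object_path(key)
```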

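You can pull the content back out of Git with the cat-file command. This command is sort of a Swiss army knife for inspecting Git objects. Passing -p to it instructs the cat-file command to figure out the type of content and display it nicely for you: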

$ git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
test content

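Now, you can add content to Git and pull it back out again. You can also do this with content in files. For example, you can do some simple version control on a file. First, create a new file and save its contents in your database: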

$ echo 'version 1' > test.txt
$ git hash-object -w test.txt 
83baae61804e65cc73a7201a7252750c76066a30

Then, write some new content to the file, and save it again:

$ echo 'version 2' > test.txt
$ git hash-object -w test.txt 
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a

Your database contains the two new versions of the file as well as the first content you stored there:

$ find .git/objects -type f 
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

Now you can revert the file back to the first version:

$ git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30 > test.txt 
$ cat test.txt 
version 1

or the second version:

$ git cat-file -p 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a > test.txt 
$ cat test.txt 
version 2

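But remembering the SHA-1 key for each version of your file isn't practical; plus, you aren't storing the filename in your system, just the content. This object type is called a blob. You can have Git tell you the object type of any object in Git, given its SHA-1 key, with cat-file -t: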

$ git cat-file -t 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
blob

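Tree Objects

The next type you'll look at is the tree object, which solves the problem of storing the filename and also allows you to store a group of files together. Git stores content in a manner similar to a UNIX filesystem, but a bit simplified. All the content is stored as tree and blob objects, with trees corresponding to UNIX directory entries and blobs corresponding more or less to inodes or file contents. A single tree object contains one or more tree entries, each of which contains a SHA-1 pointer to a blob or subtree with its associated mode, type, and filename. For example, the most recent tree in the simplegit project may look something like this: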

$ git cat-file -p master^{tree}
100644 blob a906cb2a4a904a152e80877d4088654daad0c859      README
100644 blob 8f94139338f9404f26296befa88755fc2598c289      Rakefile
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0      lib

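The master^{tree} syntax specifies the tree object that is pointed to by the last commit on your master branch. Notice that the lib subdirectory isn't a blob but a pointer to another tree: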

$ git cat-file -p 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0
100644 blob 47c6340d6459e05787f644c2447d2595f5d3a54b      simplegit.rb

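Conceptually, the data that Git is storing is something like Figure 9-1.

Figure 9-1. Simple version of the Git data model

You can create your own tree. Git normally creates a tree by taking the state of your staging area or index and writing a tree object from it. So, to create a tree object, you first have to set up an index by staging some files. To create an index with a single entry (the first version of your test.txt file) you can use the plumbing command update-index. You use this command to artificially add the earlier version of the test.txt file to a new staging area. You must pass it the --add option because the file doesn't yet exist in your staging area (you don't even have a staging area set up yet) and --cacheinfo because the file you're adding isn't in your directory but is in your database. Then, you specify the mode, SHA-1, and filename: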

$ git update-index --add --cacheinfo 100644 \
  83baae61804e65cc73a7201a7252750c76066a30 test.txt

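In this case, you're specifying a mode of 100644, which means it's a normal file. Other options are 100755, which means it's an executable file, and 120000, which specifies a symbolic link. The mode is taken from normal UNIX modes but is much less flexible: these three modes are the only ones that are valid for files (blobs) in Git (although other modes are used for directories and submodules).

Now, you can use the write-tree command to write the staging area out to a tree object. No -w option is needed: calling write-tree automatically creates a tree object from the state of the index if that tree doesn't yet exist: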

$ git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git cat-file -p d8329fc1cc938780ffdd9f94e0d364e0ea74f579
100644 blob 83baae61804e65cc73a7201a7252750c76066a30      test.txt

You can also verify that this is a tree object:

$ git cat-file -t d8329fc1cc938780ffdd9f94e0d364e0ea74f579
tree

You'll now create a new tree with the second version of test.txt and a new file as well:

$ echo 'new file' > new.txt
$ git update-index test.txt 
$ git update-index --add new.txt

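Your staging area now has the new version of test.txt as well as the new file new.txt. Write out that tree (recording the state of the staging area or index to a tree object) and see what it looks like: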

$ git write-tree
0155eb4229851634a0f03eb265b69f5a2d56f341
$ git cat-file -p 0155eb4229851634a0f03eb265b69f5a2d56f341
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a      test.txt

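Notice that this tree has both file entries and also that the test.txt SHA is the "version 2" SHA from earlier (1f7a7a). Just for fun, you'll add the first tree as a subdirectory into this one. You can read trees into your staging area by calling read-tree. In this case, you can read an existing tree into your staging area as a subtree by using the --prefix option to read-tree: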

$ git read-tree --prefix=bak d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git write-tree
3c4e9cd789d88d8d89c1073707c3585e41b0e614
$ git cat-file -p 3c4e9cd789d88d8d89c1073707c3585e41b0e614
040000 tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579      bak
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a      test.txt

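If you created a working directory from the new tree you just wrote, you would get the two files in the top level of the working directory and a subdirectory named bak that contained the first version of the test.txt file. You can think of the data that Git contains for these structures as being like Figure 9-2.

Figure 9-2. The content structure of your current Git data

Commit Objects

You have three trees that specify the different snapshots of your project that you want to track, but the earlier problem remains: you must remember all three SHA-1 values in order to recall the snapshots. You also don't have any information about who saved the snapshots, when they were saved, or why they were saved. This is the basic information that the commit object stores for you.

To create a commit object, you call commit-tree and specify a single tree SHA-1 and which commit objects, if any, directly preceded it. Start with the first tree you wrote: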

$ echo 'first commit' | git commit-tree d8329f
fdf4fc3344e67ab068f836878b6c4951e3b15f3d

Now you can look at your new commit object with cat-file:

$ git cat-file -p fdf4fc3
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Scott Chacon <schacon@gmail.com> 1243040974 -0700
committer Scott Chacon <schacon@gmail.com> 1243040974 -0700

first commit

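The format for a commit object is simple: it specifies the top-level tree for the snapshot of the project at that point; the author/committer information, pulled from your user.name and user.email configuration settings, with the current timestamp; a blank line; and then the commit message.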
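Because the layout is just header lines, a blank line, and a message, the text printed by cat-file -p can be taken apart with a few lines of Ruby. The sketch below is illustrative only; parse_headers is a hypothetical helper, and it keeps the fields as an ordered list of pairs because a commit may carry more than one parent line.

```ruby
# Split the text of a commit object into header fields and the message.
# parse_headers is a hypothetical helper written for this sketch.
def parse_headers(text)
  head, message = text.split("\n\n", 2)
  fields = head.lines.map { |line| line.chomp.split(' ', 2) }
  { fields: fields, message: message.to_s }
end

commit = <<~COMMIT
  tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
  author Scott Chacon <schacon@gmail.com> 1243040974 -0700
  committer Scott Chacon <schacon@gmail.com> 1243040974 -0700

  first commit
COMMIT

parsed = parse_headers(commit)
puts parsed[:fields].assoc('tree').last
puts parsed[:message]
```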

Next, you'll write the other two commit objects, each referencing the commit that came directly before it:

$ echo 'second commit' | git commit-tree 0155eb -p fdf4fc3
cac0cab538b970a37ea1e769cbbde608743bc96d
$ echo 'third commit'  | git commit-tree 3c4e9c -p cac0cab
1a410efbd13591db07496601ebc7a059dd55cfe9

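Each of the three commit objects points to one of the three snapshot trees you created. Oddly enough, you have a real Git history now that you can view with the git log command, if you run it on the last commit SHA-1: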

$ git log --stat 1a410e
commit 1a410efbd13591db07496601ebc7a059dd55cfe9
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:15:24 2009 -0700

    third commit

 bak/test.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

commit cac0cab538b970a37ea1e769cbbde608743bc96d
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:14:29 2009 -0700

    second commit

 new.txt  |    1 +
 test.txt |    2 +-
 2 files changed, 2 insertions(+), 1 deletions(-)

commit fdf4fc3344e67ab068f836878b6c4951e3b15f3d
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:09:34 2009 -0700

    first commit

 test.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

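Amazing. You've just done the low-level operations to build up a Git history without using any of the front ends. This is essentially what Git does when you run the git add and git commit commands: it stores blobs for the files that have changed, updates the index, writes out trees, and writes commit objects that reference the top-level trees and the commits that came immediately before them. These three main Git objects (the blob, the tree, and the commit) are initially stored as separate files in your .git/objects directory. Here are all the objects in the example directory now, commented with what they store: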

$ find .git/objects -type f
.git/objects/01/55eb4229851634a0f03eb265b69f5a2d56f341 # tree 2
.git/objects/1a/410efbd13591db07496601ebc7a059dd55cfe9 # commit 3
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a # test.txt v2
.git/objects/3c/4e9cd789d88d8d89c1073707c3585e41b0e614 # tree 3
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30 # test.txt v1
.git/objects/ca/c0cab538b970a37ea1e769cbbde608743bc96d # commit 2
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4 # 'test content'
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579 # tree 1
.git/objects/fa/49b077972391ad58037050f2a75f74e3671e92 # new.txt
.git/objects/fd/f4fc3344e67ab068f836878b6c4951e3b15f3d # commit 1

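If you follow all the internal pointers, you get an object graph something like Figure 9-3.

Figure 9-3. All the objects in your Git directory

Object Storage

I mentioned earlier that a header is stored with the content. Let's take a minute to look at how Git stores its objects. You'll see how to store a blob object (in this case, the string "what is up, doc?") interactively in the Ruby scripting language. You can start up interactive Ruby mode with the irb command: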

$ irb
>> content = "what is up, doc?"
=> "what is up, doc?"

Git constructs a header that starts with the type of the object, in this case a blob. Then, it adds a space followed by the size of the content and finally a null byte:

>> header = "blob #{content.length}\0"
=> "blob 16\000"

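Git concatenates the header and the original content and then calculates the SHA-1 checksum of that new content. You can calculate the SHA-1 value of a string in Ruby by including the SHA1 digest library with the require command and then calling Digest::SHA1.hexdigest() with the string: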

>> store = header + content
=> "blob 16\000what is up, doc?"
>> require 'digest/sha1'
=> true
>> sha1 = Digest::SHA1.hexdigest(store)
=> "bd9dbf5aae1a3862dd1526723246b20206e5fc37"

Git compresses the new content with zlib, which you can do in Ruby with the zlib library. First, you need to require the library and then run Zlib::Deflate.deflate() on the content:

>> require 'zlib'
=> true
>> zlib_content = Zlib::Deflate.deflate(store)
=> "x\234K\312\311OR04c(\317H,Q\310,V(-\320QH\311O\266\a\000_\034\a\235"
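Going the other way is just as short: inflating the stored bytes and splitting at the first null byte recovers the type and the content. The sketch below works on an in-memory string rather than a file under .git/objects; read_object is a name made up for the example.

```ruby
require 'zlib'

# Inverse of the storage steps: inflate, then split "type size\0content".
# read_object is a hypothetical helper; it checks the recorded size as a sanity test.
def read_object(deflated)
  store = Zlib::Inflate.inflate(deflated)
  header, content = store.split("\0", 2)
  type, size = header.split(' ')
  raise 'size mismatch' unless size.to_i == content.bytesize
  [type, content]
end

stored = Zlib::Deflate.deflate("blob 16\0what is up, doc?")
type, content = read_object(stored)
puts "#{type}: #{content}"
```

Passing the bytes of a real loose object file (read in binary mode) instead of the in-memory string should behave the same way, assuming the object has not been moved into a packfile.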

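Finally, you'll write your zlib-deflated content to an object on disk. You'll determine the path of the object you want to write out (the first two characters of the SHA-1 value being the subdirectory name, and the last 38 characters being the filename within that directory). In Ruby, you can use the FileUtils.mkdir_p() function to create the subdirectory if it doesn't exist. Then, open the file with File.open() and write out the previously zlib-compressed content to the file with a write() call on the resulting file handle: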

>> path = '.git/objects/' + sha1[0,2] + '/' + sha1[2,38]
=> ".git/objects/bd/9dbf5aae1a3862dd1526723246b20206e5fc37"
>> require 'fileutils'
=> true
>> FileUtils.mkdir_p(File.dirname(path))
=> ".git/objects/bd"
>> File.open(path, 'w') { |f| f.write zlib_content }
=> 32

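That's it: you've created a valid Git blob object. All Git objects are stored the same way, just with different types; instead of the string blob, the header will begin with commit or tree. Also, although the blob content can be nearly anything, the commit and tree content are very specifically formatted.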
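As an example of that specific formatting, the sketch below serializes the one-entry tree from earlier in the chapter and hashes it. The entry layout used here (mode, space, filename, null byte, then the 20 raw bytes of the SHA-1) is not spelled out in the text; it is an assumption about Git's on-disk tree format, and tree_content and object_key are names invented for the example.

```ruby
require 'digest/sha1'

# Serialize tree entries as "<mode> <name>\0" followed by 20 raw SHA-1 bytes.
# Assumption: entries are already in the order Git expects (a single entry here).
def tree_content(entries)
  entries.map do |mode, name, sha|
    "#{mode} #{name}\0".b + [sha].pack('H*')
  end.join
end

# Same header scheme as for a blob, with the type swapped in.
def object_key(type, content)
  Digest::SHA1.hexdigest("#{type} #{content.bytesize}\0".b + content)
end

entries = [['100644', 'test.txt', '83baae61804e65cc73a7201a7252750c76066a30']]
tree_key = object_key('tree', tree_content(entries))
puts tree_key
```

If the layout assumption holds, the printed key should be the same one that write-tree reported for the first tree.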

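Git References

You can run something like git log 1a410e to look through your whole history, but you still have to remember that 1a410e is the last commit in order to walk that history to find all those objects. You need a file in which you can store the SHA-1 value under a simple name so you can use that pointer rather than the raw SHA-1 value.

In Git, these are called "references" or "refs"; you can find the files that contain the SHA-1 values in the .git/refs directory. In the current project, this directory contains no files, but it does contain a simple structure: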

$ find .git/refs
.git/refs
.git/refs/heads
.git/refs/tags
$ find .git/refs -type f
$

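To create a new reference that will help you remember where your latest commit is, you can technically do something as simple as this: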

$ echo "1a410efbd13591db07496601ebc7a059dd55cfe9" > .git/refs/heads/master

Now, you can use the head reference you just created instead of the SHA-1 value in your Git commands:

$ git log --pretty=oneline  master
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

You aren't encouraged to directly edit the reference files. Git provides a safer command to do this if you want to update a reference, called update-ref:

$ git update-ref refs/heads/master 1a410efbd13591db07496601ebc7a059dd55cfe9

That's basically what a branch in Git is: a simple pointer or reference to the head of a line of work. To create a branch back at the second commit, you can do this:

$ git update-ref refs/heads/test cac0ca

Your branch will contain only work from that commit down:

$ git log --pretty=oneline test
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

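Now, your Git database conceptually looks something like Figure 9-4.

Figure 9-4. Git directory objects with branch head references included

When you run commands like git branch (branchname), Git basically runs that update-ref command to add the SHA-1 of the last commit of the branch you're on into whatever new reference you want to create.

The HEAD

The question now is, when you run git branch (branchname), how does Git know the SHA-1 of the last commit? The answer is the HEAD file. The HEAD file is a symbolic reference to the branch you're currently on. By symbolic reference, I mean that unlike a normal reference, it doesn't generally contain a SHA-1 value but rather a pointer to another reference. If you look at the file, you'll normally see something like this: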

$ cat .git/HEAD 
ref: refs/heads/master

If you run git checkout test, Git updates the file to look like this:

$ cat .git/HEAD 
ref: refs/heads/test

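When you run git commit, it creates the commit object, specifying the parent of that commit object to be whatever SHA-1 value the reference in HEAD points to.

You can also manually edit this file, but again a safer command exists to do so: symbolic-ref. You can read the value of your HEAD via this command: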

$ git symbolic-ref HEAD
refs/heads/master

You can also set the value of HEAD:

$ git symbolic-ref HEAD refs/heads/test
$ cat .git/HEAD 
ref: refs/heads/test

You can't set a symbolic reference outside of the refs style:

$ git symbolic-ref HEAD test
fatal: Refusing to point HEAD outside of refs/
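The lookup from HEAD to a commit can be mimicked by following "ref:" lines until a file containing a SHA-1 is reached. The sketch below does that against a throwaway directory laid out like a .git directory; resolve_ref is a hypothetical helper, and it only understands loose reference files, so references that Git has packed into a single file are out of scope.

```ruby
require 'tmpdir'
require 'fileutils'

# Follow "ref: ..." indirections until a file holding a SHA-1 is reached.
# Loose reference files only; the depth limit guards against reference loops.
def resolve_ref(git_dir, name = 'HEAD', depth = 5)
  raise 'too many levels of symbolic refs' if depth.zero?
  value = File.read(File.join(git_dir, name)).strip
  return value unless value.start_with?('ref: ')
  resolve_ref(git_dir, value.sub('ref: ', ''), depth - 1)
end

# Demonstration with a temporary directory standing in for .git.
sha = Dir.mktmpdir do |git_dir|
  FileUtils.mkdir_p(File.join(git_dir, 'refs', 'heads'))
  File.write(File.join(git_dir, 'HEAD'), "ref: refs/heads/test\n")
  File.write(File.join(git_dir, 'refs', 'heads', 'test'),
             "cac0cab538b970a37ea1e769cbbde608743bc96d\n")
  resolve_ref(git_dir)
end
puts sha
```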

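Tags

You've just gone over Git's three main object types, but there is a fourth. The tag object is very much like a commit object: it contains a tagger, a date, a message, and a pointer. The main difference is that a tag object points to a commit rather than a tree. It's like a branch reference, but it never moves; it always points to the same commit but gives it a friendlier name.

As discussed in Chapter 2, there are two types of tags: annotated and lightweight. You can make a lightweight tag by running something like this: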

$ git update-ref refs/tags/v1.0 cac0cab538b970a37ea1e769cbbde608743bc96d

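That is all a lightweight tag is: a branch that never moves. An annotated tag is more complex, however. If you create an annotated tag, Git creates a tag object and then writes a reference to point to it rather than directly to the commit. You can see this by creating an annotated tag (-a specifies that it's an annotated tag):

$ git tag -a v1.1 1a410efbd13591db07496601ebc7a059dd55cfe9 -m 'test tag'

Here's the object SHA-1 value it created: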

$ cat .git/refs/tags/v1.1 
9585191f37f7b0fb9444f35a9bf50de191beadc2

Now, run the cat-file command on that SHA-1 value:

$ git cat-file -p 9585191f37f7b0fb9444f35a9bf50de191beadc2
object 1a410efbd13591db07496601ebc7a059dd55cfe9
type commit
tag v1.1
tagger Scott Chacon <schacon@gmail.com> Sat May 23 16:48:58 2009 -0700

test tag

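Notice that the object entry points to the commit SHA-1 value that you tagged. Also notice that it doesn't need to point to a commit; you can tag any Git object. In the Git source code, for example, the maintainer has added their GPG public key as a blob object and then tagged it. You can view the public key by running the following in the Git source code repository: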

$ git cat-file blob junio-gpg-pub

Git 原始程式碼裡的公開金鑰. Linux kernel 也有一個不是指向 commit 物件的 tag —— 第一個 tag 是在導入原始程式碼的時候創建的,它指向初始 tree (initial tree,譯者注)。

Remotes

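The third type of reference you will see is a remote reference. If you add a remote and push to it, Git stores the value you last pushed to that remote for each branch in the refs/remotes directory. For instance, you can add a remote called origin and push your master branch to it: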

$ git remote add origin git@github.com:schacon/simplegit-progit.git
$ git push origin master
Counting objects: 11, done.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (7/7), 716 bytes, done.
Total 7 (delta 2), reused 4 (delta 1)
To git@github.com:schacon/simplegit-progit.git
   a11bef0..ca82a6d  master -> master

Then you can see what the master branch on the origin remote was the last time you communicated with the server, by checking the refs/remotes/origin/master file:

$ cat .git/refs/remotes/origin/master 
ca82a6dff817ec66f44342007202690a93763949

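Remote references differ from branches (refs/heads references) mainly in that they cannot be checked out as branches. Git moves them around as bookmarks to the last known state of those branches on the servers.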

Packfiles

Let's go back to the objects database of your test Git repository. At this point, you have 11 objects: 4 blobs, 3 trees, 3 commits, and 1 tag:

$ find .git/objects -type f
.git/objects/01/55eb4229851634a0f03eb265b69f5a2d56f341 # tree 2
.git/objects/1a/410efbd13591db07496601ebc7a059dd55cfe9 # commit 3
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a # test.txt v2
.git/objects/3c/4e9cd789d88d8d89c1073707c3585e41b0e614 # tree 3
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30 # test.txt v1
.git/objects/95/85191f37f7b0fb9444f35a9bf50de191beadc2 # tag
.git/objects/ca/c0cab538b970a37ea1e769cbbde608743bc96d # commit 2
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4 # 'test content'
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579 # tree 1
.git/objects/fa/49b077972391ad58037050f2a75f74e3671e92 # new.txt
.git/objects/fd/f4fc3344e67ab068f836878b6c4951e3b15f3d # commit 1

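Git compresses the contents of these files with zlib, and you are not storing much, so all these files together take up only 925 bytes. You will now add some larger content to the repository to demonstrate an interesting feature of Git. Add the repo.rb file from the Grit library you worked with earlier; it is a source code file of about 12K: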

$ curl http://github.com/mojombo/grit/raw/master/lib/grit/repo.rb > repo.rb
$ git add repo.rb 
$ git commit -m 'added repo.rb'
[master 484a592] added repo.rb
 3 files changed, 459 insertions(+), 2 deletions(-)
 delete mode 100644 bak/test.txt
 create mode 100644 repo.rb
 rewrite test.txt (100%)

If you look at the resulting tree, you can see the SHA-1 value your repo.rb file got for its blob object:

$ git cat-file -p master^{tree}
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob 9bc1dc421dcd51b4ac296e3e5b6e2a99cf44391e      repo.rb
100644 blob e3f094f522629ae358806b17daf78246c27c007b      test.txt

You can then use git cat-file to see how big that object is:

$ git cat-file -s 9bc1dc421dcd51b4ac296e3e5b6e2a99cf44391e
12898

Now modify that file a little and see what happens:

$ echo '# testing' >> repo.rb 
$ git commit -am 'modified repo a bit'
[master ab1afef] modified repo a bit
 1 files changed, 1 insertions(+), 0 deletions(-)

Check the tree created by that commit, and you see something interesting:

$ git cat-file -p master^{tree}
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob 05408d195263d853f09dca71d55116663690c27c      repo.rb
100644 blob e3f094f522629ae358806b17daf78246c27c007b      test.txt

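The blob is now a different blob, which means that although you added only a single line to the end of a 400-line file, Git stored that new content as a completely new object: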

$ git cat-file -s 05408d195263d853f09dca71d55116663690c27c
12908
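Why does a one-line change produce a whole new object? A loose object is named after the SHA-1 of its full content plus a short header, so any change yields a new name and a new file. The Python sketch below illustrates this under the assumption, taken from the object storage discussion earlier in this chapter, that the header has the form "blob <size>" followed by a null byte. The function name loose_object is hypothetical. Fed the 'test content' example, it should reproduce the d670460b... name that appears in the object listing above.

```python
import hashlib
import zlib


def loose_object(content, kind="blob"):
    """Return (SHA-1 name, zlib-compressed bytes) for a loose object."""
    header = f"{kind} {len(content)}\0".encode()
    store = header + content
    # The name is the SHA-1 of header + content; the file holds it deflated.
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


sha, compressed = loose_object(b"test content\n")
print(sha)
print(f".git/objects/{sha[:2]}/{sha[2:]}")
```

Because the name depends on every byte of the content, two versions of repo.rb that differ by one line are stored as two unrelated loose objects until they are packed.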

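You have two nearly identical 12K objects on your disk. Wouldn't it be nice if Git could store one of them in full and the second only as the delta between it and the first?

It turns out that it can. The initial format in which Git saves objects on disk is called the loose object format. However, occasionally Git packs several of these objects into a single binary file called a packfile in order to save space and be more efficient. Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server. To see what happens, you can manually ask Git to pack up the objects by calling the git gc command: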

$ git gc
Counting objects: 17, done.
Delta compression using 2 threads.
Compressing objects: 100% (13/13), done.
Writing objects: 100% (17/17), done.
Total 17 (delta 1), reused 10 (delta 0)

If you look in your objects directory, you will find that most of your objects are gone and that a new pair of files has appeared:

$ find .git/objects -type f
.git/objects/71/08f7ecb345ee9d0084193f147cdad4d2998293
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/info/packs
.git/objects/pack/pack-7a16e4488ae40c7d2bc56ea2bd43e25212a66c45.idx
.git/objects/pack/pack-7a16e4488ae40c7d2bc56ea2bd43e25212a66c45.pack

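The objects that remain are the blobs that are not pointed to by any commit: in this case, the “what is up, doc?” and “test content” example blobs you created earlier. Because you never added them to any commits, they are considered dangling and are not packed up in your new packfile.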

The other files are your new packfile and an index. The packfile is a single file containing the contents of all the objects that were removed from your filesystem. The index is a file that contains offsets into that packfile so you can quickly seek to a specific object. What is cool is that although the objects on disk before you ran the gc were collectively about 12K in size, the new packfile is only 6K. You have halved your disk usage by packing your objects.

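How does Git do this? When Git packs objects, it looks for files that are named and sized similarly and stores just the deltas from one version of the file to the next. You can look into the packfile and see what Git did to save space. The git verify-pack plumbing command lets you see what was packed up: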

$ git verify-pack -v \
  .git/objects/pack/pack-7a16e4488ae40c7d2bc56ea2bd43e25212a66c45.idx
0155eb4229851634a0f03eb265b69f5a2d56f341 tree   71 76 5400
05408d195263d853f09dca71d55116663690c27c blob   12908 3478 874
09f01cea547666f58d6a8d809583841a7c6f0130 tree   106 107 5086
1a410efbd13591db07496601ebc7a059dd55cfe9 commit 225 151 322
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a blob   10 19 5381
3c4e9cd789d88d8d89c1073707c3585e41b0e614 tree   101 105 5211
484a59275031909e19aadb7c92262719cfcdf19a commit 226 153 169
83baae61804e65cc73a7201a7252750c76066a30 blob   10 19 5362
9585191f37f7b0fb9444f35a9bf50de191beadc2 tag    136 127 5476
9bc1dc421dcd51b4ac296e3e5b6e2a99cf44391e blob   7 18 5193 1 \
  05408d195263d853f09dca71d55116663690c27c
ab1afef80fac8e34258ff41fc1b867c702daa24b commit 232 157 12
cac0cab538b970a37ea1e769cbbde608743bc96d commit 226 154 473
d8329fc1cc938780ffdd9f94e0d364e0ea74f579 tree   36 46 5316
e3f094f522629ae358806b17daf78246c27c007b blob   1486 734 4352
f8f51d7d8a1760462eca26eebafde32087499533 tree   106 107 749
fa49b077972391ad58037050f2a75f74e3671e92 blob   9 18 856
fdf4fc3344e67ab068f836878b6c4951e3b15f3d commit 177 122 627
chain length = 1: 1 object
pack-7a16e4488ae40c7d2bc56ea2bd43e25212a66c45.pack: ok

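Here the 9bc1d blob, which if you remember was the first version of your repo.rb file, references the 05408 blob, which was the second version of the file. The third column in the output is the size of the object's content, so you can see that the content of 05408 takes up 12K but that of 9bc1d takes up only 7 bytes. What is also interesting is that the second version of the file is the one stored intact, whereas the original version is stored as a delta; this is because you are most likely to need fast access to the most recent version of the file.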
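Reading that listing by eye is easy here, but the same check can be scripted. The following Python sketch parses verify-pack -v style lines and reports which objects are stored as deltas. It assumes the column layout seen in this output (SHA, type, size, size in pack, offset, and, for deltas only, a depth and a base SHA on the same line); the function name parse_verify_pack is hypothetical.

```python
def parse_verify_pack(output):
    """Parse `git verify-pack -v` object lines into dictionaries."""
    entries = []
    for line in output.splitlines():
        parts = line.split()
        # Object lines start with a 40-character SHA-1; skip summary lines.
        if len(parts) < 5 or len(parts[0]) != 40:
            continue
        entries.append({
            "sha": parts[0],
            "type": parts[1],
            "size": int(parts[2]),
            "size_in_pack": int(parts[3]),
            "offset": int(parts[4]),
            # Delta entries carry two extra columns: depth and base SHA.
            "base": parts[6] if len(parts) == 7 else None,
        })
    return entries


sample = """\
05408d195263d853f09dca71d55116663690c27c blob   12908 3478 874
9bc1dc421dcd51b4ac296e3e5b6e2a99cf44391e blob   7 18 5193 1 05408d195263d853f09dca71d55116663690c27c
chain length = 1: 1 object
pack-7a16e4488ae40c7d2bc56ea2bd43e25212a66c45.pack: ok
"""

entries = parse_verify_pack(sample)
deltas = [e for e in entries if e["base"] is not None]
for d in deltas:
    print(d["sha"][:5], "is a delta against", d["base"][:5])
```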

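The really nice thing about this is that objects can be repacked at any time. Git occasionally repacks your database automatically, always trying to save more space. You can also repack manually at any time by running git gc by hand.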

The Refspec

Throughout this book you have used simple mappings from remote branches to local references, but they can be more complex. Suppose you add a remote like this:

$ git remote add origin git@github.com:schacon/simplegit-progit.git

This adds a section to your .git/config file, specifying the name of the remote (origin), the URL of the remote repository, and the refspec for fetching:

[remote "origin"]
       url = git@github.com:schacon/simplegit-progit.git
       fetch = +refs/heads/*:refs/remotes/origin/*

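The format of the refspec is an optional +, followed by <src>:<dst>, where <src> is the pattern for references on the remote side and <dst> is where those references will be written locally. The + tells Git to update the reference even if it is not a fast-forward.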
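As a rough illustration of that format, the Python sketch below splits a refspec into its three parts and maps a remote reference name to the local name it would be written to. It is a simplified model, not Git's matching logic: the names Refspec, parse_refspec, and map_ref are hypothetical, and only a trailing /* wildcard is handled.

```python
from typing import NamedTuple, Optional


class Refspec(NamedTuple):
    force: bool  # True when the refspec starts with '+'
    src: str     # pattern on the remote side
    dst: str     # where matching refs are written locally


def parse_refspec(spec: str) -> Refspec:
    force = spec.startswith("+")
    body = spec[1:] if force else spec
    src, _, dst = body.partition(":")
    return Refspec(force, src, dst)


def map_ref(spec: Refspec, remote_ref: str) -> Optional[str]:
    """Return the local ref that remote_ref maps to, or None if no match."""
    if spec.src.endswith("/*") and spec.dst.endswith("/*"):
        prefix = spec.src[:-1]  # drop '*', keep the trailing slash
        if remote_ref.startswith(prefix):
            return spec.dst[:-1] + remote_ref[len(prefix):]
        return None
    return spec.dst if remote_ref == spec.src else None


default = parse_refspec("+refs/heads/*:refs/remotes/origin/*")
print(map_ref(default, "refs/heads/master"))
```

A refspec with an empty source, such as :topic, parses to an empty src and a dst of topic in this model.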

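In the default case that is written automatically by a git remote add command, Git fetches all the references under refs/heads/ on the server and writes them to refs/remotes/origin/ locally. So, if there is a master branch on the server, you can access the log of that branch locally via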

$ git log origin/master
$ git log remotes/origin/master
$ git log refs/remotes/origin/master

They are all equivalent, because Git expands each of them to refs/remotes/origin/master.

If you want Git instead to pull down only the master branch each time, and not every other branch on the remote server, you can change the fetch line to

fetch = +refs/heads/master:refs/remotes/origin/master

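This is just the default refspec for git fetch for that remote. If you want to do something one time, you can specify the refspec on the command line, too. To pull the master branch on the remote down to origin/mymaster locally, you can run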

$ git fetch origin master:refs/remotes/origin/mymaster

You can also specify multiple refspecs. On the command line, you can pull down several branches like so:

$ git fetch origin master:refs/remotes/origin/mymaster \
   topic:refs/remotes/origin/topic
From git@github.com:schacon/simplegit
 ! [rejected]        master     -> origin/mymaster  (non fast forward)
 * [new branch]      topic      -> origin/topic

In this case, the master branch pull was rejected because it was not a fast-forward reference. You can override that by specifying the + in front of the refspec.

You can also specify multiple refspecs for fetching in your configuration file. If you want to always fetch the master and experiment branches, add two lines:

[remote "origin"]
       url = git@github.com:schacon/simplegit-progit.git
       fetch = +refs/heads/master:refs/remotes/origin/master
       fetch = +refs/heads/experiment:refs/remotes/origin/experiment

In the Git version described here, you cannot use partial globs in the pattern, so this would be invalid:

fetch = +refs/heads/qa*:refs/remotes/origin/qa*

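However, you can use namespacing to accomplish something like that. If you have a QA team that pushes a series of branches, and you want to get the master branch and any of the QA team's branches but nothing else, you can use a config section like this: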

[remote "origin"]
       url = git@github.com:schacon/simplegit-progit.git
       fetch = +refs/heads/master:refs/remotes/origin/master
       fetch = +refs/heads/qa/*:refs/remotes/origin/qa/*

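If you have a complex workflow with a QA team pushing branches, developers pushing branches, and integration teams pushing and collaborating on remote branches, you can namespace them easily this way.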

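Pushing Refspecs

It is nice that you can fetch namespaced references that way, but how does the QA team get its branches into a qa/ namespace in the first place? You accomplish that by using refspecs to push.

If the QA team wants to push its master branch to qa/master on the remote server, it can run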

$ git push origin master:refs/heads/qa/master

If they want Git to do that automatically each time they run git push origin, they can add a push value to their config file:

[remote "origin"]
       url = git@github.com:schacon/simplegit-progit.git
       fetch = +refs/heads/*:refs/remotes/origin/*
       push = refs/heads/master:refs/heads/qa/master

Again, this causes git push origin to push the local master branch to the remote qa/master branch by default.

Deleting References

You can also use the refspec to delete references from the remote server by running something like this:

$ git push origin :topic

Because the refspec is <src>:<dst>, leaving off the <src> part basically says to make the topic branch on the remote nothing, which deletes it.

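Transfer Protocols

Git can transfer data between two repositories in two major ways: over HTTP, and via the so-called smart protocols used in the file://, ssh://, and git:// transports. This section quickly covers how these two main protocols operate.

The Dumb Protocol

Git transport over HTTP is often referred to as the dumb protocol because it requires no Git-specific code on the server side during the transport process. The fetch process is a series of GET requests, where the client can assume the layout of the Git repository on the server. Let's follow the http-fetch process for the simplegit library: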

$ git clone http://github.com/schacon/simplegit-progit.git

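The first thing this command does is pull down the info/refs file. This file is written by the update-server-info command, which is why you need to enable that as a post-receive hook in order for the HTTP transport to work properly: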

=> GET info/refs
ca82a6dff817ec66f44342007202690a93763949     refs/heads/master

Now you have a list of the remote references and SHAs. Next, you look for what the HEAD reference is so you know what to check out when you are finished:

=> GET HEAD
ref: refs/heads/master

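You need to check out the master branch when you have completed the process. At this point, you are ready to start the walking process. Because your starting point is the ca82a6 commit object you saw in the info/refs file, you start by fetching that: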

=> GET objects/ca/82a6dff817ec66f44342007202690a93763949
(179 bytes of binary data)

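You get an object back: that object is in loose format on the server, and you fetched it over a static HTTP GET request. You can zlib-uncompress it, strip off the header, and look at the commit content: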

$ git cat-file -p ca82a6dff817ec66f44342007202690a93763949
tree cfda3bf379e4f8dba8717dee55aab78aef7f4daf
parent 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
author Scott Chacon <schacon@gmail.com> 1205815931 -0700
committer Scott Chacon <schacon@gmail.com> 1240030591 -0700

changed the version number

Next, you have two more objects to retrieve: cfda3b, which is the tree of content that the commit you just retrieved points to, and 085bb3, which is the parent commit:

=> GET objects/08/5bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
(179 bytes of data)

That gives you your next commit object. Grab the tree object:

=> GET objects/cf/da3bf379e4f8dba8717dee55aab78aef7f4daf
(404 - Not Found)

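Oops: it looks like that tree object is not in loose format on the server, so you get a 404 response back. There are a couple of reasons for this: the object could be in an alternate repository, or it could be in a packfile in this repository. Git checks for any listed alternates first: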

=> GET objects/info/http-alternates
(empty file)

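If this comes back with a list of alternate URLs, Git checks for loose files and packfiles there; this is a nice mechanism for projects that are forks of one another to share objects on disk. However, because no alternates are listed in this case, your object must be in a packfile. To see what packfiles are available on this server, you need to get the objects/info/packs file, which contains a listing of them (also generated by update-server-info):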

=> GET objects/info/packs
P pack-816a9b2334da9953e530f27bcac22082a9f5b835.pack

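There is only one packfile on the server, so your object is obviously in there, but you will check the index file to make sure. This is also useful if you have multiple packfiles on the server, so you can see which packfile contains the object you need: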

=> GET objects/pack/pack-816a9b2334da9953e530f27bcac22082a9f5b835.idx
(4k of binary data)

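Now that you have the packfile index, you can see whether your object is in it, because the index lists the SHAs of the objects contained in the packfile and the offsets to those objects. Your object is there, so go ahead and get the whole packfile: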

=> GET objects/pack/pack-816a9b2334da9953e530f27bcac22082a9f5b835.pack
(13k of binary data)

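You have your tree object, so you continue walking your commits. They are all also within the packfile you just downloaded, so you do not have to make any more requests to your server. Git checks out a working copy of the master branch that was pointed to by the HEAD reference you downloaded at the beginning.

The entire output of this process looks like this: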

$ git clone http://github.com/schacon/simplegit-progit.git
Initialized empty Git repository in /private/tmp/simplegit-progit/.git/
got ca82a6dff817ec66f44342007202690a93763949
walk ca82a6dff817ec66f44342007202690a93763949
got 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Getting alternates list for http://github.com/schacon/simplegit-progit.git
Getting pack list for http://github.com/schacon/simplegit-progit.git
Getting index for pack 816a9b2334da9953e530f27bcac22082a9f5b835
Getting pack 816a9b2334da9953e530f27bcac22082a9f5b835
 which contains cfda3bf379e4f8dba8717dee55aab78aef7f4daf
walk 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
walk a11bef06a3f659402fe7563abf99ad00de2209e6
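The client-side bookkeeping in this walk is simple enough to sketch. The Python below performs no network requests; it only shows, under the assumptions of this walkthrough, how a client could turn the info/refs body into starting points, derive the path for a loose object request, and pull the next SHAs to fetch out of a commit. All three function names are hypothetical.

```python
def parse_info_refs(body):
    """Map ref names to SHA-1 values from an info/refs response body."""
    refs = {}
    for line in body.splitlines():
        if not line.strip():
            continue
        sha, name = line.split(None, 1)
        refs[name] = sha
    return refs


def loose_object_path(sha):
    """Path requested for a loose object: first two hex digits, then the rest."""
    return f"objects/{sha[:2]}/{sha[2:]}"


def parse_commit_links(commit_text):
    """Return the tree and parent SHA-1s named in a commit's header lines."""
    header = commit_text.split("\n\n", 1)[0]
    return [line.split()[1] for line in header.splitlines()
            if line.startswith(("tree ", "parent "))]


refs = parse_info_refs(
    "ca82a6dff817ec66f44342007202690a93763949     refs/heads/master\n"
)
commit = """tree cfda3bf379e4f8dba8717dee55aab78aef7f4daf
parent 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
author Scott Chacon <schacon@gmail.com> 1205815931 -0700
committer Scott Chacon <schacon@gmail.com> 1240030591 -0700

changed the version number
"""

print("GET", loose_object_path(refs["refs/heads/master"]))
for sha in parse_commit_links(commit):
    print("GET", loose_object_path(sha))
```

When one of those requests returns 404, as the tree request did above, the client falls back to the alternates and packfile listings rather than giving up.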

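The Smart Protocol

The HTTP method is simple but a bit inefficient. Using smart protocols is a more common method of transferring data. These protocols have a process on the remote end that is intelligent about Git: it can read local data, figure out what the client has or needs, and generate custom data for it. There are two sets of processes for transferring data: a pair for uploading data and a pair for downloading data.

Uploading Data

To upload data to a remote process, Git uses the send-pack and receive-pack processes. The send-pack process runs on the client and connects to a receive-pack process on the remote side.

For example, say you run git push origin master in your project, and origin is defined as a URL that uses the SSH protocol. Git fires up the send-pack process, which initiates a connection over SSH to your server. It tries to run a command on the remote server via an SSH call that looks something like this: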

$ ssh -x git@github.com "git-receive-pack 'schacon/simplegit-progit.git'"
005bca82a6dff817ec66f44342007202690a93763949 refs/heads/master report-status delete-refs
003e085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 refs/heads/topic
0000

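The git-receive-pack command immediately responds with one line for each reference it currently has: in this case, the master and topic branches and their SHAs. The first line also has a list of the server's capabilities (here, report-status and delete-refs).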

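Each line starts with a 4-character hex value specifying how long the line is, counting those four characters themselves. Your first line starts with 005b, which is hexadecimal for 91, meaning that line is 91 bytes long. The next line starts with 003e, which is 62, so that line is 62 bytes long in total. The next line is 0000, meaning the server is done with its references listing.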
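This length-prefixed framing, which Git's own documentation calls pkt-line, can be sketched in Python as follows. The helper names pkt_line and read_pkt_lines are hypothetical, and the sketch assumes each line ends with a newline that is counted in the length.

```python
def pkt_line(payload):
    """Prefix payload with its total length (prefix included) as 4 hex digits."""
    return f"{len(payload) + 4:04x}".encode() + payload


FLUSH = b"0000"  # a zero length marks the end of a list


def read_pkt_lines(stream):
    """Split a byte stream into payloads, stopping at the first flush packet."""
    payloads, pos = [], 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            break
        # The length covers the 4-byte prefix, so the payload is the rest.
        payloads.append(stream[pos + 4:pos + length])
        pos += length
    return payloads


topic = b"085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 refs/heads/topic\n"
stream = pkt_line(topic) + FLUSH
print(stream[:4])
print(read_pkt_lines(stream))
```

The topic line is 58 bytes of payload plus the 4-byte prefix, which gives the 003e seen in the server's response.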

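Now that it knows the server's state, your send-pack process determines which commits it has that the server does not. For each reference that this push will update, the send-pack process tells the receive-pack process that information. For instance, if you are updating the master branch and adding an experiment branch, the send-pack response may look something like this: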

0085ca82a6dff817ec66f44342007202690a93763949  15027957951b64cf874c3557a0f3547bd83b3ff6 refs/heads/master report-status
00670000000000000000000000000000000000000000 cdfdb42577e2506715f8cfeacdbabc092bf63e8d refs/heads/experiment
0000

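The SHA-1 value of all '0's means that nothing was there before, because you are adding the experiment reference. If you were deleting a reference, you would see the opposite: all '0's on the right side.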
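The convention of using an all-zero SHA-1 to mean "nothing" can be expressed as a small classifier. This Python sketch takes one update line with the 4-character length prefix already removed; the function name classify_update is hypothetical, and anything after the reference name (such as capabilities) is simply ignored.

```python
ZERO_SHA = "0" * 40


def classify_update(line):
    """Return (action, ref) for an 'old-sha new-sha ref' update line."""
    # Drop anything after a null byte, then take the first three fields.
    old, new, ref = line.split("\0")[0].split()[:3]
    if old == ZERO_SHA:
        action = "create"   # nothing existed before
    elif new == ZERO_SHA:
        action = "delete"   # nothing should exist afterwards
    else:
        action = "update"
    return action, ref


print(classify_update(
    ZERO_SHA + " cdfdb42577e2506715f8cfeacdbabc092bf63e8d refs/heads/experiment"
))
```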

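Git sends a line for each reference you are updating with the old SHA, the new SHA, and the reference that is being updated. The first line also has the client's capabilities. Next, the client uploads a packfile of all the objects the server does not have yet. Finally, the server responds with a success (or failure) indication: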

000Aunpack ok

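Downloading Data

When you download data, the fetch-pack and upload-pack processes are involved. The client initiates a fetch-pack process that connects to an upload-pack process on the remote side to negotiate what data will be transferred down.

There are different ways to initiate the upload-pack process on the remote repository. You can run it via SSH in the same manner as the receive-pack process. You can also initiate the process via the Git daemon, which listens on a server on port 9418 by default. The fetch-pack process sends data that looks like this to the daemon after connecting: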

003fgit-upload-pack schacon/simplegit-progit.git\0host=myserver.com\0

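It starts with the 4 bytes specifying how much data is following, then the command to run followed by a null byte, and then the server's hostname followed by a final null byte. The Git daemon checks that the command can be run, that the repository exists, and that it has public permissions. If everything is fine, it fires up the upload-pack process and hands off the request to it.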

If you are doing the fetch over SSH, fetch-pack instead runs something like this:

$ ssh -x git@github.com "git-upload-pack 'schacon/simplegit-progit.git'"

In either case, after fetch-pack connects, upload-pack sends back something like this:

0088ca82a6dff817ec66f44342007202690a93763949 HEAD\0multi_ack thin-pack \
  side-band side-band-64k ofs-delta shallow no-progress include-tag
003fca82a6dff817ec66f44342007202690a93763949 refs/heads/master
003e085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 refs/heads/topic
0000

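This is very similar to what receive-pack responds with, but the capabilities are different. In addition, it sends back the HEAD reference so the client knows what to check out if this is a clone.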

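At this point, the fetch-pack process looks at what objects it has and responds with the objects that it needs by sending “want” and then the SHA it wants. It sends all the objects it already has with “have” and then the SHA. At the end of this list, it writes “done” to tell the upload-pack process to begin sending the packfile of the data it needs: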

0054want ca82a6dff817ec66f44342007202690a93763949 ofs-delta
0032have 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
0000
0009done

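That is a very basic case of the transfer protocols. In more complex cases, the client supports the multi_ack or side-band capabilities; but this example shows the basic back and forth used by the smart protocol processes.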

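Maintenance and Data Recovery

Occasionally, you may have to do some cleanup: make a repository more compact, clean up an imported repository, or recover lost work. This section covers some of these scenarios.

Maintenance

Occasionally, Git automatically runs a command called “auto gc”. Most of the time, this command does nothing. However, if there are too many loose objects (objects not in a packfile) or too many packfiles, Git launches a full-fledged git gc command. The gc stands for garbage collect, and the command does a number of things: it gathers up all the loose objects and places them in packfiles, it consolidates packfiles into one big packfile, and it removes objects that are not reachable from any commit and are a few months old.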

You can run auto gc manually as follows:

$ git gc --auto

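Again, this generally does nothing. You must have around 7,000 loose objects or more than 50 packfiles for Git to fire up a real gc command. You can modify these limits with the gc.auto and gc.autopacklimit config settings respectively.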

The other thing gc does is pack up your references into a single file. Suppose your repository contains the following branches and tags:

$ find .git/refs -type f
.git/refs/heads/experiment
.git/refs/heads/master
.git/refs/tags/v1.0
.git/refs/tags/v1.1

If you run git gc, these files disappear from the refs directory. Git moves them for the sake of efficiency into a file named .git/packed-refs that looks like this:

$ cat .git/packed-refs 
# pack-refs with: peeled 
cac0cab538b970a37ea1e769cbbde608743bc96d refs/heads/experiment
ab1afef80fac8e34258ff41fc1b867c702daa24b refs/heads/master
cac0cab538b970a37ea1e769cbbde608743bc96d refs/tags/v1.0
9585191f37f7b0fb9444f35a9bf50de191beadc2 refs/tags/v1.1
^1a410efbd13591db07496601ebc7a059dd55cfe9

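If you update a reference, Git does not edit this file but instead writes a new file to refs/heads. To get the SHA for a given reference, Git checks for that reference in the refs directory and then checks the packed-refs file as a fallback. So if you cannot find a reference in the refs directory, it is probably in your packed-refs file.

Notice the last line of the file, which begins with a ^. This means the tag directly above is an annotated tag and that line is the commit that the annotated tag points to.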
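The file format and the lookup order can be sketched together in Python. This is an illustration of the behavior described above, not Git's code: the names parse_packed_refs and lookup are hypothetical, and the loose references are modeled as a plain dictionary instead of files under refs.

```python
def parse_packed_refs(text):
    """Return (refs, peeled) dictionaries from packed-refs file content."""
    refs, peeled, last = {}, {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or the header comment
        if line.startswith("^"):
            # Peeled line: the object the annotated tag above points to.
            if last is not None:
                peeled[last] = line[1:]
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
        last = name
    return refs, peeled


def lookup(name, loose, packed):
    """Prefer a loose ref; fall back to packed-refs; None if neither has it."""
    return loose.get(name, packed.get(name))


sample = """# pack-refs with: peeled 
cac0cab538b970a37ea1e769cbbde608743bc96d refs/heads/experiment
ab1afef80fac8e34258ff41fc1b867c702daa24b refs/heads/master
cac0cab538b970a37ea1e769cbbde608743bc96d refs/tags/v1.0
9585191f37f7b0fb9444f35a9bf50de191beadc2 refs/tags/v1.1
^1a410efbd13591db07496601ebc7a059dd55cfe9
"""

packed, peeled = parse_packed_refs(sample)
print(lookup("refs/heads/master", {}, packed))
print(peeled["refs/tags/v1.1"])
```

Passing a non-empty loose dictionary models a reference that was updated after the last gc: the loose value wins over the stale packed one.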

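Data Recovery

At some point in your Git journey, you may accidentally lose a commit. Generally, this happens because you force-delete a branch that had work on it and it turns out you wanted the branch after all, or you hard-reset a branch and so abandon commits that you wanted something from. Assuming this happens, how can you get your commits back?

Here is an example that hard-resets the master branch in your test repository to an older commit and then recovers the lost commits. First, review where your repository is at this point: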

$ git log --pretty=oneline
ab1afef80fac8e34258ff41fc1b867c702daa24b modified repo a bit
484a59275031909e19aadb7c92262719cfcdf19a added repo.rb
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

Now move the master branch back to the middle commit:

$ git reset --hard 1a410efbd13591db07496601ebc7a059dd55cfe9
HEAD is now at 1a410ef third commit
$ git log --pretty=oneline
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

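You have effectively lost the top two commits: no branch can reach them any longer. You need to find the SHA of the latest commit and then add a branch that points to it. The trick is finding that SHA; it is not as if you memorized it, right?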

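Often, the quickest way is to use a tool called git reflog. As you work, Git silently records what your HEAD is every time you change it. Each time you commit or change branches, the reflog is updated. The reflog is also updated by the git update-ref command, which is another reason to use it instead of just writing the SHA value to your ref files, as covered in the “Git References” section earlier in this chapter. You can see where you have been at any time by running git reflog: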

$ git reflog
1a410ef HEAD@{0}: 1a410efbd13591db07496601ebc7a059dd55cfe9: updating HEAD
ab1afef HEAD@{1}: ab1afef80fac8e34258ff41fc1b867c702daa24b: updating HEAD

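Here you can see the two commits that you have had checked out, but there is not much information. To see the same information in a much more useful way, run git log -g, which gives you a normal log output for your reflog: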

$ git log -g
commit 1a410efbd13591db07496601ebc7a059dd55cfe9
Reflog: HEAD@{0} (Scott Chacon <schacon@gmail.com>)
Reflog message: updating HEAD
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:22:37 2009 -0700

    third commit

commit ab1afef80fac8e34258ff41fc1b867c702daa24b
Reflog: HEAD@{1} (Scott Chacon <schacon@gmail.com>)
Reflog message: updating HEAD
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:15:24 2009 -0700

     modified repo a bit

It looks like the bottom commit is the one you lost, so you can recover it by creating a new branch at that commit. For example, you can start a branch named recover-branch at that commit (ab1afef):

$ git branch recover-branch ab1afef
$ git log --pretty=oneline recover-branch
ab1afef80fac8e34258ff41fc1b867c702daa24b modified repo a bit
484a59275031909e19aadb7c92262719cfcdf19a added repo.rb
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

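Cool: now you have a branch named recover-branch that is where your master branch used to be, making the first two commits reachable again. Next, suppose your loss was for some reason not in the reflog; you can simulate that by removing recover-branch and deleting the reflog. Now the first two commits are not reachable by anything: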

$ git branch -D recover-branch
$ rm -Rf .git/logs/

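Because the reflog data is kept in the .git/logs/ directory, you effectively have no reflog. How can you recover that commit at this point? One way is to use the git fsck utility, which checks your database for integrity. If you run it with the --full option, it shows you all objects that are not pointed to by another object: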

$ git fsck --full
dangling blob d670460b4b4aece5915caf5c68d12f560a9fe3e4
dangling commit ab1afef80fac8e34258ff41fc1b867c702daa24b
dangling tree aea790b9a58f6cf6f2804eeac9f0abbe9631e4c9
dangling blob 7108f7ecb345ee9d0084193f147cdad4d2998293

In this case, you can see your missing commit after the dangling commit. You can recover it the same way, by adding a branch that points to that SHA.
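When fsck reports many dangling objects, picking out the commits by hand gets tedious. The Python sketch below filters fsck-style output for dangling commits and prints a git branch command for each one. The function name dangling_commits and the recovered-N branch names are hypothetical; the sketch only builds the command text and does not run Git.

```python
def dangling_commits(fsck_output):
    """Return the SHA-1s on 'dangling commit <sha>' lines."""
    shas = []
    for line in fsck_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[:2] == ["dangling", "commit"]:
            shas.append(parts[2])
    return shas


sample = """dangling blob d670460b4b4aece5915caf5c68d12f560a9fe3e4
dangling commit ab1afef80fac8e34258ff41fc1b867c702daa24b
dangling tree aea790b9a58f6cf6f2804eeac9f0abbe9631e4c9
dangling blob 7108f7ecb345ee9d0084193f147cdad4d2998293
"""

found = dangling_commits(sample)
for n, sha in enumerate(found, start=1):
    # Branch names here are placeholders chosen for the example.
    print(f"git branch recovered-{n} {sha}")
```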

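Removing Objects

There are a lot of great things about Git, but one feature that can cause issues is the fact that a git clone downloads the entire history of the project, including every version of every file. This is fine if the whole thing is source code, because Git is highly optimized to compress that data efficiently. However, if someone at any point in the history of your project added a single huge file, every clone for all time will be forced to download that large file, even if it was removed from the project in the very next commit. Because it is reachable from the history, it will always be there.

This can be a huge problem when you are converting Subversion or Perforce repositories into Git. Because you do not download the whole history in those systems, this type of addition carries few consequences. If you did an import from another system or otherwise find that your repository is much larger than it should be, here is how you can find and remove large objects.

Be warned: this technique is destructive to your commit history. It rewrites every commit object downstream from the earliest tree you have to modify to remove a large file reference. If you do this immediately after an import, before anyone has started to base work on the commit, you are fine; otherwise, you have to notify all contributors that they must rebase their work onto your new commits.

To demonstrate, you will add a large file into your test repository, remove it in the next commit, find it, and remove it permanently from the repository. First, add a large object to your history: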

$ curl http://kernel.org/pub/software/scm/git/git-1.6.3.1.tar.bz2 > git.tbz2
$ git add git.tbz2
$ git commit -am 'added git tarball'
[master 6df7640] added git tarball
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 git.tbz2

Oops, you did not want to add a huge tarball to your project. Better get rid of it:

$ git rm git.tbz2 
rm 'git.tbz2'
$ git commit -m 'oops - removed large tarball'
[master da3f30d] oops - removed large tarball
 1 files changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 git.tbz2

Now, gc your database and see how much space you are using:

$ git gc
Counting objects: 21, done.
Delta compression using 2 threads.
Compressing objects: 100% (16/16), done.
Writing objects: 100% (21/21), done.
Total 21 (delta 3), reused 15 (delta 1)

You can run the count-objects command to quickly see how much space you are using:

$ git count-objects -v
count: 4
size: 16
in-pack: 21
packs: 1
size-pack: 2016
prune-packable: 0
garbage: 0

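The size-pack entry is the size of your packfiles in kilobytes, so you are using 2MB. Before the last commit, you were using closer to 2K; clearly, removing the file in the later commit did not remove it from your history. Every time anyone clones this repository, they will have to clone all 2MB just to get this tiny project, because you accidentally added a big file. Let's get rid of it.

First you have to find it. In this case, you already know what file it is. But suppose you did not; how would you identify what file or files were taking up so much space? If you run git gc, all the objects are in a packfile; you can identify the big objects by running another plumbing command called git verify-pack and sorting on the third field in the output, which is file size. You can also pipe it through the tail command because you are interested only in the last few largest files: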

$ git verify-pack -v .git/objects/pack/pack-3f8c0...bb.idx | sort -k 3 -n | tail -3
e3f094f522629ae358806b17daf78246c27c007b blob   1486 734 4667
05408d195263d853f09dca71d55116663690c27c blob   12908 3478 1189
7a9eb2fba2b1811321254ac360970fc169ba2330 blob   2056716 2056872 5401

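The big object is at the bottom: 2MB. To find out what file it is, you will use the rev-list command, which you used briefly in Chapter 7. If you pass --objects to rev-list, it lists all the commit SHAs and also the blob SHAs with the file paths associated with them. You can use this to find your blob's name: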

$ git rev-list --objects --all | grep 7a9eb2fb
7a9eb2fba2b1811321254ac360970fc169ba2330 git.tbz2
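The two lookups above (sort by size, then map the SHA to a path) can be combined in one pass. The Python sketch below joins verify-pack -v style lines with rev-list --objects style lines and returns the largest objects together with their paths. The function name largest_objects is hypothetical, and the sample rev-list text is assembled from SHAs and file names that appear elsewhere in this chapter.

```python
def largest_objects(verify_pack_output, rev_list_output, top=3):
    """Return up to `top` (size, sha, path) tuples, largest first."""
    paths = {}
    for line in rev_list_output.splitlines():
        sha, _, path = line.partition(" ")
        if path:  # commit lines carry no path
            paths[sha] = path
    sized = []
    for line in verify_pack_output.splitlines():
        parts = line.split()
        # Object lines start with a 40-character SHA-1; third field is size.
        if len(parts) >= 5 and len(parts[0]) == 40:
            sized.append((int(parts[2]), parts[0], paths.get(parts[0], "?")))
    return sorted(sized, reverse=True)[:top]


verify_pack = """\
e3f094f522629ae358806b17daf78246c27c007b blob   1486 734 4667
05408d195263d853f09dca71d55116663690c27c blob   12908 3478 1189
7a9eb2fba2b1811321254ac360970fc169ba2330 blob   2056716 2056872 5401
"""
rev_list = """\
da3f30d019005479c99eb4c3406225613985a1db
7a9eb2fba2b1811321254ac360970fc169ba2330 git.tbz2
05408d195263d853f09dca71d55116663690c27c repo.rb
e3f094f522629ae358806b17daf78246c27c007b test.txt
"""

result = largest_objects(verify_pack, rev_list)
for size, sha, path in result:
    print(size, sha[:7], path)
```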

Now you need to remove this file from all trees in your past. You can easily see what commits modified this file:

$ git log --pretty=oneline -- git.tbz2
da3f30d019005479c99eb4c3406225613985a1db oops - removed large tarball
6df764092f3e7c8f5f94cbe08ee5cf42e92a0289 added git tarball

You must rewrite all the commits downstream from 6df76 to fully remove this file from your Git history. To do so, you use filter-branch, which you used in Chapter 6:

$ git filter-branch --index-filter \
   'git rm --cached --ignore-unmatch git.tbz2' -- 6df7640^..
Rewrite 6df764092f3e7c8f5f94cbe08ee5cf42e92a0289 (1/2)rm 'git.tbz2'
Rewrite da3f30d019005479c99eb4c3406225613985a1db (2/2)
Ref 'refs/heads/master' was rewritten

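The --index-filter option is similar to the --tree-filter option used in Chapter 6, except that instead of passing a command that modifies files checked out on disk, you are modifying your staging area or index each time. Rather than remove a specific file with something like rm file, you have to remove it with git rm --cached; that is, you must remove it from the index, not from disk. The reason to do it this way is speed: because Git does not have to check out each revision to disk before running your filter, the process can be much, much faster. You can accomplish the same task with --tree-filter if you want. The --ignore-unmatch option to git rm tells it not to error out if the pattern you are trying to remove is not there. Finally, you ask filter-branch to rewrite your history only from the 6df7640 commit up, because you know that is where this problem started. Otherwise, it would start from the beginning and take unnecessarily longer.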

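Your history no longer contains a reference to that file. However, your reflog and a new set of refs that Git added under .git/refs/original when you ran filter-branch still do, so you have to remove them and then repack the database. You need to get rid of anything that has a pointer to those old commits before you repack: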

$ rm -Rf .git/refs/original
$ rm -Rf .git/logs/
$ git gc
Counting objects: 19, done.
Delta compression using 2 threads.
Compressing objects: 100% (14/14), done.
Writing objects: 100% (19/19), done.
Total 19 (delta 3), reused 16 (delta 1)

Let's see how much space you saved.

$ git count-objects -v
count: 8
size: 2040
in-pack: 19
packs: 1
size-pack: 7
prune-packable: 0
garbage: 0

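The packed repository size is down to 7K, which is much better than 2MB. You can see from the size value that the big object is still among your loose objects, so it is not gone; but it will not be transferred on a push or subsequent clone, which is what matters. If you really wanted to, you could remove the object completely by running git prune with the --expire option, which takes a time argument such as now.

Summary

You should have a pretty good understanding of what Git does in the background and, to some degree, how it is implemented. This chapter has covered a number of plumbing commands: commands that are lower level and simpler than the porcelain commands you have learned about in the rest of the book. Understanding how Git works at a lower level should make it easier to understand why it is doing what it is doing and also to write your own tools and helper scripts to make your specific workflow work for you.

Git as a content-addressable filesystem is a very powerful tool that you can easily use as more than just a VCS. I hope you can use your newfound knowledge of Git internals to implement your own cool application of this technology and feel more comfortable using Git in more advanced ways.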