jffs2


License: CC 4.0 BY-SA


Chapter 1  jffs2 Data Entities on Flash and Their Kernel Descriptors

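jffs2 stores only two kinds of data entity on flash: jffs2_raw_inode and jffs2_raw_dirent.

- A jffs2_raw_dirent describes one directory entry. It is immediately followed by the name of the (hard-linked) file it refers to. A directory file is described by a number of jffs2_raw_dirent entities.
- All other files (regular files, symbolic links, socket/FIFO files and device files) are represented by one or more jffs2_raw_inode entities. Each jffs2_raw_inode is immediately followed by the associated data.

The table below lists, for each kind of file other than a directory, how many jffs2_raw_inode entities it needs and what data follows them: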

File type	Number of jffs2_raw_inode nodes	Trailing data
Regular file	>= 1	The file's data
Symbolic link	1	Name of the link target
Socket/FIFO file	1	None
Device file	1	Device number

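jffs2 separates jffs2_raw_dirent from jffs2_raw_inode, and stores a directory as a set of jffs2_raw_dirent entities rather than jffs2_raw_inode entities, in order to support hard links. JFFS version 1 had only one kind of data entity, similar to jffs2_raw_inode, and could therefore support only symbolic links.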

On flash, jffs2_raw_dirent and jffs2_raw_inode nodes both begin with the same header:

struct jffs2_unknown_node

{

/* All start like this */

jint16_t magic;

jint16_t nodetype;

jint32_t totlen; /* So we can skip over nodes we don't grok */

jint32_t hdr_crc;

} __attribute__((packed));

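The header fields are:

- nodetype gives the concrete node type: JFFS2_NODETYPE_DIRENT or JFFS2_NODETYPE_INODE.
- totlen is the total length of the whole entity, including the data that follows it.
- hdr_crc is the CRC of the other header fields.

The structure is declared packed. As a result, once an entity's header has been copied from flash into it, every field lands in the right place.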
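The following user-space sketch (not kernel code) makes the header layout concrete. It walks an in-memory flash image node by node, using only the common header.

It rests on several assumptions:

- The image is in host byte order.
- jint16_t and jint32_t are replaced with uint16_t and uint32_t.
- hdr_crc is not verified.
- 0x1985 is the value of JFFS2_MAGIC_BITMASK in the jffs2 headers.
- The 4-byte node alignment comes from the description of wasted space later in this chapter.
- The helper names pad4 and count_nodes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JFFS2_MAGIC_BITMASK 0x1985 /* magic value carried by every jffs2 node */

/* Native-endian mirror of the common node header (assumes host byte order). */
struct jffs2_unknown_node {
    uint16_t magic;
    uint16_t nodetype;
    uint32_t totlen;   /* header + payload: lets a scanner skip unknown nodes */
    uint32_t hdr_crc;  /* not verified in this sketch */
} __attribute__((packed));

/* Nodes start on 4-byte boundaries, so round totlen up before skipping. */
static uint32_t pad4(uint32_t x)
{
    return (x + 3u) & ~3u;
}

/* Count consecutive nodes in an in-memory image, stopping at the first
   position that does not hold a plausible header. */
static int count_nodes(const uint8_t *buf, size_t len)
{
    size_t ofs = 0;
    int n = 0;

    while (ofs + sizeof(struct jffs2_unknown_node) <= len) {
        struct jffs2_unknown_node hdr;

        memcpy(&hdr, buf + ofs, sizeof hdr); /* packed: bytes map 1:1 onto fields */
        if (hdr.magic != JFFS2_MAGIC_BITMASK || hdr.totlen < sizeof hdr)
            break;
        n++;
        ofs += pad4(hdr.totlen);
    }
    return n;
}
```

The scanner skips every node using totlen alone, so it can step over node types it does not understand.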

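The location and length of every data entity within the flash partition are described by the kernel structure jffs2_raw_node_ref. I call this structure the entity's "kernel descriptor":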

struct jffs2_raw_node_ref

{

struct jffs2_raw_node_ref *next_in_ino;

struct jffs2_raw_node_ref *next_phys;

uint32_t flash_offset;

uint32_t totlen;

};

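flash_offset is the physical address of the entity within the flash partition. totlen is its total length, including the trailing data.

The jffs2_raw_node_ref structures of one file are chained through next_in_ino into a circular list:

- The list head is the nodes field of the file's jffs2_inode_cache.
- The next_in_ino of the last element points back to the jffs2_inode_cache itself.

This way, every jffs2_raw_node_ref can find the file it belongs to.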
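The "last element points back to the inode cache" trick can be modelled in user space as follows. This is a simplified sketch, not the kernel's code.

It assumes the following:

- The first pointer-sized field of jffs2_inode_cache (scan_dents) is NULL once mounting has finished, as the text notes below.
- Therefore, a walk along next_in_ino knows it has reached the inode cache when that field reads NULL.

The helper name ref_to_ic is hypothetical. Reading the first field with memcpy is this sketch's own choice; it avoids accessing one struct type through a pointer to another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct jffs2_full_dirent; /* opaque in this sketch */

struct jffs2_raw_node_ref {
    struct jffs2_raw_node_ref *next_in_ino; /* next node of the file, or the inode cache */
    struct jffs2_raw_node_ref *next_phys;   /* next node in the same erase block */
    uint32_t flash_offset;
    uint32_t totlen;
};

struct jffs2_inode_cache {
    struct jffs2_full_dirent *scan_dents;   /* first member: NULL ends the chain */
    struct jffs2_inode_cache *next;
    struct jffs2_raw_node_ref *nodes;
    uint32_t ino;
    int nlink;
    int state;
};

/* Follow next_in_ino until reaching an object whose first pointer is NULL;
   by construction that object is the file's jffs2_inode_cache. */
static struct jffs2_inode_cache *ref_to_ic(struct jffs2_raw_node_ref *raw)
{
    struct jffs2_raw_node_ref *nx;

    for (;;) {
        memcpy(&nx, raw, sizeof nx); /* read the first pointer-sized field */
        if (!nx)
            return (struct jffs2_inode_cache *)raw;
        raw = nx;
    }
}
```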

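The kernel descriptors of all entities in one flash erase block are chained through next_phys into a list. The first and last elements of that list are pointed to by the first_node and last_node fields of the erase-block descriptor, struct jffs2_eraseblock.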

In addition, every file is represented in the kernel by exactly one jffs2_inode_cache structure. I call this structure the file's "kernel descriptor":

struct jffs2_inode_cache {

struct jffs2_full_dirent *scan_dents;

struct jffs2_inode_cache *next;

struct jffs2_raw_node_ref *nodes;

uint32_t ino;

int nlink;

int state;

};

The fields are as follows:

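- ino is the file's inode number, unique within the file system.
- next links the entries of a hash-table collision chain. All file kernel descriptors are kept in that hash table.
- nodes points to the list of jffs2_raw_node_ref kernel descriptors of all the file's data entities.
- scan_dents is used only for directories. During mount, a jffs2_full_dirent is created for the kernel descriptor of each of the directory's data entities, and scan_dents points to the resulting list. These jffs2_full_dirent structures are freed at the end of the mount, and scan_dents is reset to NULL (analysed later).
- nlink is the file's hard-link count.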

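Mounting a jffs2 file system walks the whole file system, scanning the entire flash partition that holds the jffs2 image. The walk creates:

- a jffs2_raw_node_ref kernel descriptor for every jffs2_raw_dirent and jffs2_raw_inode entity on flash;
- a jffs2_inode_cache kernel descriptor for every file.

See "Mounting the file system" below for details.

Opening a file builds further structures:

- For a directory, a jffs2_full_dirent for every jffs2_raw_dirent, organised as a list.
- For a regular file and similar files, a jffs2_full_dnode, a jffs2_tmp_dnode_info and a jffs2_node_frag for every jffs2_raw_inode, organised in a red-black tree (described below).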

jffs2_raw_dirent data entities and their kernel descriptors

The jffs2_raw_dirent data entity is defined as follows:

struct jffs2_raw_dirent

{

jint16_t magic;

jint16_t nodetype; /* == JFFS_NODETYPE_DIRENT */

jint32_t totlen;

jint32_t hdr_crc;

jint32_t pino;

jint32_t version;

jint32_t ino; /* == zero for unlink */

jint32_t mctime;

uint8_t nsize;

uint8_t type;

uint8_t unused[2];

jint32_t node_crc;

jint32_t name_crc;

uint8_t name[0];

} __attribute__((packed));

The key fields are explained below.

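As mentioned above, a jffs2_raw_dirent represents one directory entry and is immediately followed by the name of the corresponding file. The length of that name is given by nsize. ino is the inode number of that file, and pino is the inode number of the directory that contains the entry.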

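A version number is meaningful only within a single file.

- Every file consists of some number of jffs2_raw_dirent or jffs2_raw_inode entities.
- Modifying a region of the file writes a new entity to flash, and its version is always larger than any version used before.
- The highest version among all the file's entities is recorded in the u field of the file's inode, i.e. in the highest_version field of struct jffs2_inode_info.
- The same region of a file may be covered by several entities with different versions. All except the newest are marked obsolete, and the mark is kept in the entities' kernel descriptors.

According to the jffs2 author's paper, entities on flash that contain identical data are allowed to share a version number.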
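The "newest version wins" rule for directory entries can be illustrated with the hypothetical helper below. It scans a flat array of simplified records, not the kernel's jffs2_full_dirent list.

It assumes, following the struct comment above, that ino == 0 marks an unlink. The names struct dirent_rec and lookup_name are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of a directory entry as read from flash. */
struct dirent_rec {
    const char *name;
    uint32_t version;
    uint32_t ino;   /* 0 means "this name was unlinked" */
};

/* Return the inode number a name currently refers to, or 0 if the newest
   entry for that name is an unlink or the name never appears. */
static uint32_t lookup_name(const struct dirent_rec *recs, int n, const char *name)
{
    const struct dirent_rec *best = NULL;

    for (int i = 0; i < n; i++) {
        if (strcmp(recs[i].name, name) != 0)
            continue;
        if (!best || recs[i].version > best->version)
            best = &recs[i];   /* a higher version supersedes older entries */
    }
    return best ? best->ino : 0;
}
```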

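Opening a directory creates its VFS inode, dentry and file objects. Creating the VFS inode calls the read_inode method supplied by jffs2, which does the following:

- Creates a jffs2_full_dirent for every data entity, from that entity's jffs2_raw_node_ref kernel descriptor.
- Reads the file name that follows the jffs2_raw_dirent on flash and copies it to the end of the jffs2_full_dirent.

The resulting list of jffs2_full_dirent structures is pointed to by the directory inode's inode.u.dents (i.e. jffs2_inode_info.dents); see Figure 1.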

struct jffs2_full_dirent is defined as follows:

struct jffs2_full_dirent

{

struct jffs2_raw_node_ref *raw;

struct jffs2_full_dirent *next;

uint32_t version;

uint32_t ino; /* == zero for unlink */

unsigned int nhash;

unsigned char type;

unsigned char name[0];

};

raw points to the corresponding jffs2_raw_node_ref. The file name read from flash follows immediately after the structure.

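In summary, once the VFS inode of a directory has been created (for example, when the directory is opened), the kernel data structures relate as shown below. The figure assumes the directory has three entries; file and dentry are not drawn.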

[Figure 1: jffs2_raw_dirent and its kernel descriptors. Labels: the inode's u field, jffs2_inode_info (fragtree, metadata, dents, inocache); three jffs2_full_dirent (raw, version, ino, name), the last one followed by NULL; jffs2_inode_cache (nodes, ino, nlink, state), reached from jffs2_sb_info.inocache_list[]; three jffs2_raw_node_ref (next_in_ino, next_phys, flash_offset, totlen); FLASH: jffs2_raw_dirent entities, each followed by a file name.]

jffs2_raw_inode data entities and their kernel descriptors

The jffs2_raw_inode data entity is defined as follows:

struct jffs2_raw_inode

{

jint16_t magic; /* A constant magic number. */

jint16_t nodetype; /* == JFFS_NODETYPE_INODE */

jint32_t totlen; /* Total length of this node (inc data, etc.) */

jint32_t hdr_crc;

jint32_t ino; /* Inode number. */

jint32_t version; /* Version number. */

jint32_t mode; /* The file's type or mode. */

jint16_t uid; /* The file's owner. */

jint16_t gid; /* The file's group. */

jint32_t isize; /* Total resultant size of this inode (used for truncations) */

jint32_t atime; /* Last access time. */

jint32_t mtime; /* Last modification time. */

jint32_t ctime; /* Change time. */

jint32_t offset; /* Where to begin to write. */

jint32_t csize; /* (Compressed) data size */

jint32_t dsize; /* Size of the node's data. (after decompression) */

uint8_t compr; /* Compression algorithm used */

uint8_t usercompr; /* Compression algorithm requested by the user */

jint16_t flags; /* See JFFS2_INO_FLAG_* */

jint32_t data_crc; /* CRC for the (compressed) data. */

jint32_t node_crc; /* CRC for the raw inode (excluding data) */

// uint8_t data[dsize];

} __attribute__((packed));

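For a regular file, symbolic link or device file, the jffs2_raw_inode is followed by data, and together they form one entity on flash. In what follows, "jffs2_raw_inode" therefore means the structure plus its trailing data, unless stated otherwise.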

The main fields are explained below.

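A regular file may consist of several jffs2_raw_inode entities, each holding the data of one region of the file. Even a single region may have several entities on flash, because it was modified. All of these entities carry the same ino, the file's inode number.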

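A file is logically a contiguous linear space. The offset field gives the position of a jffs2_raw_inode's data within that space.

Note the difference between two offsets:

- offset is an offset inside the file.
- The position of the jffs2_raw_inode within the flash partition is given by the flash_offset field of its kernel descriptor, jffs2_raw_node_ref.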

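jffs2 supports data compression. If the trailing data is not compressed, compr is set to JFFS2_COMPR_NONE. dsize is the length of the data before compression (or after decompression), and csize is its length after compression.

The function analysis later in this article shows how these fields are used.

When existing content is modified or new content is written:
1. The data is compressed.
2. A suitable jffs2_raw_inode is assembled in memory.
3. Both are written to flash back to back.

When a data node is read from flash:
1. The jffs2_raw_inode is read first.
2. A buffer of csize bytes is allocated.
3. A second read fetches the (compressed) data that follows.
4. Decompression uses a buffer of dsize bytes.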

If a jffs2_raw_inode has no trailing data but represents a hole, compr is set to JFFS2_COMPR_ZERO.
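The minimal sketch below shows how compr, csize and dsize fit together when a node's payload is expanded into a buffer of dsize bytes.

Only the two modes mentioned in the text are handled. The values 0x00 and 0x01 are assumed to match JFFS2_COMPR_NONE and JFFS2_COMPR_ZERO in the jffs2 headers, and expand_payload is a hypothetical name. Real jffs2 also supports zlib and other compressors, which this sketch simply rejects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JFFS2_COMPR_NONE 0x00 /* assumed value */
#define JFFS2_COMPR_ZERO 0x01 /* assumed value */

/* Expand a node payload of csize bytes into a dsize-byte buffer.
   Returns 0 on success, -1 for unsupported or inconsistent input. */
static int expand_payload(uint8_t compr, const uint8_t *src, uint32_t csize,
                          uint8_t *dst, uint32_t dsize)
{
    switch (compr) {
    case JFFS2_COMPR_NONE:
        if (csize != dsize)
            return -1;          /* uncompressed data: both sizes must agree */
        memcpy(dst, src, dsize);
        return 0;
    case JFFS2_COMPR_ZERO:
        memset(dst, 0, dsize);  /* a hole: no payload on flash, reads as zeros */
        return 0;
    default:
        return -1;              /* zlib, rtime, ... are out of scope here */
    }
}
```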

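Besides the CRC in the common header jffs2_unknown_node, a jffs2_raw_inode carries two more CRCs: one for the structure itself and one for its trailing data. They are computed when the jffs2_raw_inode is created and verified when the entity is read back.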

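A regular file consists of several jffs2_raw_inode entities. Besides their jffs2_raw_node_ref kernel descriptors, opening the file creates two in-core structures per entity:

- a jffs2_full_dnode (built with the help of the temporary jffs2_tmp_dnode_info);
- a jffs2_node_frag.

The jffs2_node_frag structures are organised into a red-black tree.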

struct jffs2_full_dnode

{

struct jffs2_raw_node_ref *raw;

uint32_t ofs; /* Don't really need this, but optimisation */

uint32_t size;

uint32_t frags; /* Number of fragments which currently refer

to this node. When this reaches zero, the node is obsolete. */

};

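ofs and size give the logical offset and length, within the file, of the entity's trailing data. Their values come from the offset and dsize fields of the jffs2_raw_inode. raw points to the entity's kernel descriptor, jffs2_raw_node_ref.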

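The frags field needs particular explanation.

When a file is opened, a jffs2_full_dnode and a jffs2_node_frag are created for each jffs2_raw_node_ref, and the jffs2_node_frag is inserted into the file's red-black tree.

When the same region of the file is later modified:
1. The new entity is written to flash.
2. A new jffs2_raw_node_ref and a new jffs2_full_dnode are created.
3. The node field of the existing jffs2_node_frag is redirected to the new jffs2_full_dnode.

As a result, the new node's frags reference count becomes 1, and the old node's frags drops to 0.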

struct jffs2_node_frag

{

rb_node_t rb;

struct jffs2_full_dnode *node; /* NULL for holes */

uint32_t size;

uint32_t ofs; /* Don't really need this, but optimisation */

};

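The fields are:

- rb links the structure into the red-black tree.
- node points to the corresponding jffs2_full_dnode.
- size and ofs are copied from that jffs2_full_dnode. They give the offset and length, within the file, of the region the node represents.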
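The red-black tree's main job is to answer the question "which fragment covers this file offset?".

The sketch below swaps the rb-tree for a binary search over a sorted, non-overlapping array, which answers the same query. struct frag and frag_lookup are hypothetical stand-ins for jffs2_node_frag and the kernel's lookup routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct jffs2_full_dnode; /* opaque in this sketch */

struct frag {
    uint32_t ofs;                  /* start offset within the file */
    uint32_t size;                 /* length of the fragment */
    struct jffs2_full_dnode *node; /* NULL for holes */
};

/* Binary-search fragments sorted by ofs; return the one covering file
   offset pos, or NULL if no fragment covers it. */
static const struct frag *frag_lookup(const struct frag *f, size_t n, uint32_t pos)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (pos < f[mid].ofs)
            hi = mid;
        else if (pos >= f[mid].ofs + f[mid].size)
            lo = mid + 1;
        else
            return &f[mid];
    }
    return NULL;
}
```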

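In summary, opening a regular file creates kernel data structures related as shown below. The figure assumes the file consists of three data nodes. It does not draw file, dentry, or the jffs2_tmp_dnode_info structures, which are created temporarily and then freed.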

[Figure 2: jffs2_raw_inode of a regular file and its kernel descriptors. Labels: the inode's u field, jffs2_inode_info (fragtree, metadata, dents, inocache); three jffs2_node_frag (node, size, ofs); three jffs2_full_dnode (raw, ofs, size, frags = 1); jffs2_inode_cache (nodes, ino, nlink, state), reached from jffs2_sb_info.inocache_list[]; three jffs2_raw_node_ref (next_in_ino, next_phys, flash_offset, totlen); FLASH: jffs2_raw_inode entities, each followed by file data.]

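The figure above applies only to regular files.

Symbolic links and device files have a single jffs2_raw_inode entity. Its trailing data is the link target name or the device number, respectively. A single entity obviously needs no red-black tree for its jffs2_full_dnode. jffs2_do_read_inode first adds the node to the tree as for a regular file, then switches to pointing metadata at it directly.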

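Compare how ext2 handles device files.

- In ext2, a device file (device node) exists on disk only as an on-disk inode.
- The VFS inode has two fields, i_dev and i_rdev. i_dev is the device number of the device that holds the device file's on-disk inode. i_rdev is the device number of the device the file represents.
- The on-disk inode has neither field.
- i_dev is unnecessary on disk: anyone who can read the on-disk inode already knows which disk it is on.
- i_rdev is not needed as a separate field: a device file needs no data blocks, so the first element of the block-pointer array (i_block[] on disk, i_data[] in memory) stores the device number it represents.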

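jffs2 works the same way. A device file is represented by a single jffs2_raw_inode that has no device-number field. Whoever can read the entity from flash already knows the device number of the flash partition holding it. The device number the file represents is stored as the data immediately following the entity.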

Chapter 2  jffs2 Kernel Data Structures

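If jffs2 support is selected when the kernel is configured, jffs2 is registered during boot, in the context of the init kernel thread. Registration adds the variable jffs2_fs_type (of type file_system_type, defined in the jffs2 source) to the list headed by the global kernel variable file_systems; see "Registering the file system" below.

If the device given by "root=" on the kernel command line holds a jffs2 image, jffs2 is also mounted as the root file system during boot.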

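When a file system is mounted, the kernel does the following:
1. Creates the VFS super_block object for it.
2. Calls the read_super method registered by the file system. That method reads the device, fills in the super_block, and sets up basic VFS infrastructure such as the root directory's inode and dentry.

Some file systems, ext2 for instance, have an on-disk superblock (ext2_super_block). For them, this method mainly reads that superblock and copies its content into the in-memory super_block.

jffs2 has no superblock on flash. Its method instead performs the other work a mount needs, such as walking the flash to create kernel descriptors for all data entities and files. See "Mounting the file system" below.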

The u field of the superblock: struct jffs2_sb_info

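The VFS super_block structure ends with a union that is instantiated as the mounted file system's private data structure. For jffs2, this union becomes jffs2_sb_info, which describes the whole jffs2 file system on a flash partition: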

struct jffs2_sb_info {

struct mtd_info *mtd;

uint32_t highest_ino;

uint32_t checked_ino;

unsigned int flags;

struct task_struct *gc_task; /* GC task struct */

struct semaphore gc_thread_start; /* GC thread start mutex */

struct completion gc_thread_exit; /* GC thread exit completion port */

struct semaphore alloc_sem; /* Used to protect all the following fields, and also to protect against

out-of-order writing of nodes. And GC.*/

uint32_t cleanmarker_size; /* Size of an _inline_ CLEANMARKER

(i.e. zero for OOB CLEANMARKER */

uint32_t flash_size;

uint32_t used_size;

uint32_t dirty_size;

uint32_t wasted_size;

uint32_t free_size;

uint32_t erasing_size;

uint32_t bad_size;

uint32_t sector_size;

uint32_t unchecked_size;

uint32_t nr_free_blocks;

uint32_t nr_erasing_blocks;

uint32_t nr_blocks;

struct jffs2_eraseblock *blocks; /* The whole array of blocks. Used for getting blocks

* from the offset (blocks[ofs / sector_size]) */

struct jffs2_eraseblock *nextblock; /* The block we're currently filling */

struct jffs2_eraseblock *gcblock; /* The block we're currently garbage-collecting */

struct list_head clean_list; /* Blocks 100% full of clean data */

struct list_head very_dirty_list; /* Blocks with lots of dirty space */

struct list_head dirty_list; /* Blocks with some dirty space */

struct list_head erasable_list; /* Blocks which are completely dirty, and need erasing */

struct list_head erasable_pending_wbuf_list; /* Blocks which need erasing but only after

the current wbuf is flushed */

struct list_head erasing_list; /* Blocks which are currently erasing */

struct list_head erase_pending_list; /* Blocks which need erasing now */

struct list_head erase_complete_list; /* Blocks which are erased and need the clean marker

written to them */

struct list_head free_list; /* Blocks which are free and ready to be used */

struct list_head bad_list; /* Bad blocks. */

struct list_head bad_used_list; /* Bad blocks with valid data in. */

spinlock_t erase_completion_lock; /* Protect free_list and erasing_list against erase

completion handler */

wait_queue_head_t erase_wait; /* For waiting for erases to complete */

struct jffs2_inode_cache **inocache_list;

spinlock_t inocache_lock;

/* Sem to allow jffs2_garbage_collect_deletion_dirent to drop the erase_completion_lock while it's holding a pointer to an obsoleted node. I don't like this. Alternatives welcomed. */

struct semaphore erase_free_sem;

/* Write-behind buffer for NAND flash */

unsigned char *wbuf;

uint32_t wbuf_ofs;

uint32_t wbuf_len;

uint32_t wbuf_pagesize;

struct tq_struct wbuf_task; /* task for timed wbuf flush */

struct timer_list wbuf_timer; /* timer for flushing wbuf */

/* OS-private pointer for getting back to master superblock info */

void *os_priv;

};

Some fields are explained below; the others are covered along with the code.

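A file system is a data-organisation format.

- Before data is written to secondary storage, the file system's methods arrange it into that format.
- After data is read back, the file system's methods interpret it.
- The actual access to the storage is done by the device driver.

jffs2 lives on flash, so writing and reading its data entities is ultimately done by the flash driver.

The mtd field of jffs2_sb_info points to the mtd_info structure of the MTD device that holds the file system. This structure is created when the flash driver is installed and initialised, and it provides the methods for accessing the flash. When a flash chip is partitioned, each partition is exposed as its own MTD device; this is why mtd->size gives the partition size (see flash_size below).

As the later analysis of reads and writes shows, writing a jffs2_raw_inode or jffs2_raw_dirent entity and its trailing data ultimately calls the corresponding method in mtd_info.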

Inode numbers are unique within the file system on a given device. highest_ino records the highest inode number in the file system. The root directory's inode number is 1.

flags holds the flags given at mount time, such as whether the file system is mounted read-only.

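jffs2 needs a kernel thread for garbage collection (GC). This thread reclaims the space held by obsolete data nodes when flash space runs low. gc_task points to the thread's task_struct (its process control block), so no separate wait queue has to be created for it.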

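The semaphore gc_thread_start guarantees the following: after the current process (or the init kernel thread) creates the GC kernel thread, the GC thread is already running by the time the function that called kernel_thread returns. See the source analysis later.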

gc_thread_exit is a completion, defined as follows:

struct completion {

unsigned int done;

wait_queue_head_t wait;

};

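At the core of a completion is the wait queue wait, and wait.lock protects the whole structure.

jffs2_stop_garbage_collect_thread ends the GC kernel thread as follows:
1. It sends the GC thread a SIGKILL signal.
2. The current flow of execution then blocks on the gc_thread_exit.wait queue.
3. When the GC thread handles SIGKILL, it wakes the blocked flow and exits.

See the analysis later.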

The xxxx_size fields are listed again:

uint32_t flash_size;

uint32_t used_size;

uint32_t dirty_size;

uint32_t wasted_size;

uint32_t free_size;

uint32_t erasing_size;

uint32_t bad_size;

uint32_t sector_size;

uint32_t unchecked_size;

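flash_size and sector_size are copied at mount time from the corresponding fields of the mtd_info pointed to by jffs2_sb_info.mtd. They are, respectively, the size of the flash partition holding the jffs2 file system and the size of one erase block.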

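The erase-block descriptor jffs2_eraseblock has four size fields:

- used_size: space in the block taken by valid entities.
- dirty_size: space taken by obsolete entities.
- wasted_size: space that cannot be used.
- free_size: remaining free space.

Space becomes unusable for three reasons:

- Data nodes on flash must start at 4-byte-aligned addresses, which can leave gaps between nodes.
- Padding.
- The tail of an erase block may be too small to use.

The same-named fields in jffs2_sb_info are the sums of these fields over all erase blocks in the partition.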
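The small sketch below illustrates this bookkeeping. It sums per-block counters into partition totals and checks that each block's counters add up to the block size.

It assumes unchecked_size is also part of that partition of the block; jffs2_do_mount_fs initialises this field alongside the others, as shown later. struct eb_acct, block_consistent and sum_blocks are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified per-block accounting, using the field names from the text. */
struct eb_acct {
    uint32_t used_size, dirty_size, wasted_size, free_size, unchecked_size;
};

/* Assumed invariant: the counters together cover the whole erase block. */
static int block_consistent(const struct eb_acct *b, uint32_t sector_size)
{
    return b->used_size + b->dirty_size + b->wasted_size +
           b->free_size + b->unchecked_size == sector_size;
}

/* Partition-wide totals (the jffs2_sb_info fields) are plain sums. */
static void sum_blocks(const struct eb_acct *blocks, int nr, struct eb_acct *tot)
{
    *tot = (struct eb_acct){ 0 };
    for (int i = 0; i < nr; i++) {
        tot->used_size      += blocks[i].used_size;
        tot->dirty_size     += blocks[i].dirty_size;
        tot->wasted_size    += blocks[i].wasted_size;
        tot->free_size      += blocks[i].free_size;
        tot->unchecked_size += blocks[i].unchecked_size;
    }
}
```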

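struct jffs2_eraseblock is the erase-block descriptor, and all descriptors are stored in the blocks[] array.

Each descriptor is also placed on one of the xxxx_list lists, depending on the state of its erase block (whether it holds data, how much of that data is obsolete, and so on). The write path and GC use these lists to spread wear evenly across all erase blocks. According to the author's comments, the lists mean the following: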

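List	Erase blocks on the list
clean_list	Contain only valid data nodes
very_dirty_list	Most of their data nodes are obsolete
dirty_list	Contain at least one obsolete data node
erasable_list	All data nodes are obsolete and the block needs erasing, but it has not yet been scheduled onto erase_pending_list
erasable_pending_wbuf_list	Like erase_pending_list, but erasing must wait until the wbuf has been flushed
erasing_list	Currently being erased
erase_pending_list	Waiting to be erased now
erase_complete_list	Erased, but the CLEANMARKER has not yet been written
free_list	Erased, with the CLEANMARKER written
bad_list	Contain bad cells
bad_used_list	Contain bad cells but still hold data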

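Every file in jffs2 has exactly one kernel descriptor, jffs2_inode_cache. It is created at mount time, while all entities on flash are being walked. All file descriptors are organised in a hash table: the pointer array pointed to by inocache_list.

Opening a file uses this table as follows:
1. Path-name resolution comes first; it is not repeated here.
2. The directory-entry entity jffs2_raw_dirent on flash gives the file's inode number, ino.
3. The hash table maps ino to the file's jffs2_inode_cache.
4. The cache's nodes field leads to the list of jffs2_raw_node_ref kernel descriptors of all the file's data entities, and hence to the location and length of each entity on flash.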
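The hash-table lookup from ino to jffs2_inode_cache can be sketched as follows.

The sketch makes these assumptions:

- The table size of 128 stands in for INOCACHE_HASHSIZE.
- Buckets are chosen by ino modulo the table size.
- Each chain is kept sorted by ino, which allows an early stop.

ic_add and ic_get are hypothetical names, and the struct keeps only the fields the lookup needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INOCACHE_HASHSIZE 128 /* assumed size; check the kernel header you target */

struct jffs2_inode_cache {
    struct jffs2_inode_cache *next; /* collision chain */
    uint32_t ino;
    int nlink;
};

/* Insert into the bucket for ic->ino, keeping the chain sorted by ino. */
static void ic_add(struct jffs2_inode_cache **table, struct jffs2_inode_cache *ic)
{
    struct jffs2_inode_cache **p = &table[ic->ino % INOCACHE_HASHSIZE];

    while (*p && (*p)->ino < ic->ino)
        p = &(*p)->next;
    ic->next = *p;
    *p = ic;
}

/* Find the cache for an inode number; the sorted chain allows an early stop. */
static struct jffs2_inode_cache *ic_get(struct jffs2_inode_cache **table, uint32_t ino)
{
    struct jffs2_inode_cache *ic = table[ino % INOCACHE_HASHSIZE];

    while (ic && ic->ino < ino)
        ic = ic->next;
    return (ic && ic->ino == ino) ? ic : NULL;
}
```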

The u field of the inode: struct jffs2_inode_info

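A file's kernel descriptor, jffs2_inode_cache, is created at mount time. Its VFS objects (file, dentry and inode) are created when the file is opened. The inode is set up by the file-system method jffs2_read_inode, described later. The last field of the inode is a union instantiated as the file system's private structure, which for jffs2 is jffs2_inode_info: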

struct jffs2_inode_info {

struct semaphore sem;

uint32_t highest_version; /* The highest (datanode) version number used for this ino */

rb_root_t fragtree; /* List of data fragments which make up the file */

/* There may be one datanode which isn't referenced by any of the above fragments, if it contains a metadata update but no actual data - or if this is a directory inode. This also holds the _only_ dnode for symlinks/device nodes, etc. */

struct jffs2_full_dnode *metadata;

struct jffs2_full_dirent *dents; /* Directory entries */

/* Some stuff we just have to keep in-core at all times, for each inode. */

struct jffs2_inode_cache *inocache;

uint16_t flags;

uint8_t usercompr;

#if LINUX_VERSION_CODE > KERNEL_VERSION(2,5,2)

struct inode vfs_inode;

#endif

};

The fields are explained below.

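sem: the current flow of execution holds inode.i_sem throughout generic_file_write/read, to synchronise user processes at the upper level. Any semaphore needed during reads and writes must therefore be a separate one. This is why jffs2_inode_info has its own sem, which synchronises the low-level read and write paths with GC.

GC needs this synchronisation because it essentially writes copies of valid data entities into other erase blocks; it works through writes too, so it must be synchronised with other writers. See the related analysis in Chapter 6.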

highest_version: every data entity of a file, obsolete or not, has a unique version number. The highest version number in use is recorded in highest_version.

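fragtree: a regular file has several jffs2_raw_inode entities, whose jffs2_raw_node_ref kernel descriptors form the list pointed to by jffs2_inode_cache.nodes. Opening the file also creates the corresponding jffs2_full_dnode and jffs2_node_frag structures. The jffs2_node_frag structures are organised in the red-black tree pointed to by fragtree; see Figure 2.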

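metadata: symbolic links and device files have only one jffs2_raw_inode entity, so they need no red-black tree. jffs2_do_read_inode first adds their jffs2_full_dnode to the tree as for a regular file, then switches to pointing metadata at it directly.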

dents: for a directory, opening it creates a jffs2_full_dirent for each jffs2_raw_node_ref kernel descriptor and links them into the list pointed to by dents; see Figure 1.

inocache: points to the file's kernel descriptor, jffs2_inode_cache.

References between the data structures after a file is opened

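As described above, mounting creates:

- the super_block;
- an inode and a dentry for the root directory;
- a jffs2_inode_cache for every file;
- a jffs2_raw_node_ref for every data entity.

Opening a file then creates its file, dentry and inode, together with jffs2_full_dnode or jffs2_full_dirent structures for its entity descriptors. After a regular file is opened, the structures relate as shown in the figure below (the root directory's inode is not drawn):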

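(Note: the structures inside the dashed boxes are created only when the file is opened; all the others already exist once the file system is mounted.)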

[Figure 3: references between the core data structures after a file is opened. Labels: super_block (s_root, u field jffs2_sb_info with mtd, blocks, clean_list, inocache_list); jffs2_eraseblock; mtd_info (write, priv), with the labels cfi_intelext_write_buffer and struct map_info; dentry (d_cover, d_mount) in the dentry hash table; inode (i_sb) with u field jffs2_inode_info (fragtree, metadata, dents, inocache); three jffs2_node_frag; three jffs2_full_dnode (frags = 1); jffs2_inode_cache; three jffs2_raw_node_ref.]

Chapter 3  Registering the File System

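Before a file system can be used on Linux, it must be installed and registered.

- If support for it is selected at kernel configuration time, its code is linked statically into the kernel, and the init kernel thread registers it during kernel initialisation.
- Otherwise, the module's initialisation function registers it when the file-system module is loaded.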

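Once a file system is registered, its methods are available to the kernel, and the mount command can mount it on a directory of the root file system.

The root file system itself is registered and mounted during kernel initialisation. Its mount point is "/". This directory does not exist on the root file system in secondary storage; it exists only in kernel memory. It is created during initialisation by mount_root, which also mounts the root file system.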

The init_jffs2_fs function

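When jffs2 support is selected at configuration time, the compiled jffs2 code is linked statically into the kernel image. During initialisation, the init kernel thread runs init_jffs2_fs to register jffs2: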

static int __init init_jffs2_fs(void)

{

int ret;

/* version banner printk; only its tail ("...Communications AB.\n") survives in this copy */

#ifdef JFFS2_OUT_OF_KERNEL

/* sanity checks. Could we do these at compile time? */

if (sizeof(struct jffs2_sb_info) > sizeof (((struct super_block *)NULL)->u)) {

printk(KERN_ERR "JFFS2 error: struct jffs2_sb_info (%d bytes) doesn't fit in the super_block union (%d bytes)\n", sizeof(struct jffs2_sb_info), sizeof (((struct super_block *)NULL)->u));

return -EIO;

}

if (sizeof(struct jffs2_inode_info) > sizeof (((struct inode *)NULL)->u)) {

printk(KERN_ERR "JFFS2 error: struct jffs2_inode_info (%d bytes) doesn't fit in the inode union (%d bytes)\n", sizeof(struct jffs2_inode_info), sizeof (((struct inode *)NULL)->u));

return -EIO;

}

#endif

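The VFS superblock structure (super_block) and the inode structure both end with a union that holds the concrete file system's private data. When jffs2 is built outside the kernel tree (JFFS2_OUT_OF_KERNEL), this code first checks that jffs2_sb_info and jffs2_inode_info actually fit into those unions, as defined in linux/fs.h.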

ret = jffs2_zlib_init();

if (ret) {

printk(KERN_ERR "JFFS2 error: Failed to initialise zlib workspaces\n");

goto out;

}

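jffs2 supports compression and decompression. Data entities can be compressed with the zlib algorithms before being written to flash, and decompressed after being read. jffs2_zlib_init allocates the two workspaces used for this, deflate_workspace and inflate_workspace.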

ret = jffs2_create_slab_caches();

if (ret) {

printk(KERN_ERR "JFFS2 error: Failed to initialise slab caches\n");

goto out_zlib;

}

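This step uses kmem_cache_create to create slab caches for:

- jffs2_raw_dirent and jffs2_raw_inode entities;
- jffs2_raw_node_ref kernel descriptors;
- jffs2_inode_cache file descriptors;
- jffs2_full_dnode, jffs2_node_frag and other structures.

These objects are allocated and freed frequently, so slab caches suit them well. Slab colouring also gives objects in different slabs different offsets, so they map to different processor cache lines.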

ret = register_filesystem(&jffs2_fs_type);

if (ret) {

printk(KERN_ERR "JFFS2 error: Failed to register filesystem\n");

goto out_slab;

}

return 0;

out_slab:

jffs2_destroy_slab_caches();

out_zlib:

jffs2_zlib_exit();

out:

return ret;

}

Finally, register_filesystem registers jffs2 with the system.

A file system must be registered before the mount command can mount it. Every registered file system is described by a file_system_type structure:

struct file_system_type {

const char *name;

int fs_flags;

struct super_block *(*read_super) (struct super_block *, void *, int);

struct module *owner;

struct file_system_type * next;

struct list_head fs_supers;

};

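The fields are:

- next chains the file_system_type structures of all registered file systems into a list, pointed to by the global kernel variable file_systems.
- name is the file system's name. find_filesystem uses it to look the file system up in that list.
- fs_flags describes properties of the file system, such as whether it supports only one superblock or whether users may mount it with the mount command; see linux/fs.h.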

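The most important field of file_system_type is the function pointer read_super. Mounting a file system creates a VFS superblock object. Whether or not a corresponding superblock exists in secondary storage, the file system must provide this function to set up and initialise that object.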

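According to the comments, a file system's file_system_type must stay in the kernel from the moment it is registered until it is unregistered. It must not be freed, whether or not the file system is mounted.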

The jffs2 source implements all jffs2 methods, including read_super, and defines its file_system_type with the following macro:

static DECLARE_FSTYPE_DEV(jffs2_fs_type, "jffs2", jffs2_read_super);

DECLARE_FSTYPE_DEV is defined in linux/fs.h on top of DECLARE_FSTYPE, passing FS_REQUIRES_DEV as the flags:

#define DECLARE_FSTYPE(var,type,read,flags) \
struct file_system_type var = { \
	name:		type, \
	read_super:	read, \
	fs_flags:	flags, \
	owner:		THIS_MODULE, \
}

#define DECLARE_FSTYPE_DEV(var,type,read) \
	DECLARE_FSTYPE(var,type,read,FS_REQUIRES_DEV)

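The jffs2 source therefore defines a variable jffs2_fs_type of type file_system_type, whose name is "jffs2" and whose read_super method is jffs2_read_super. That function is analysed in detail in the chapter on mounting.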

The register_filesystem function

int register_filesystem(struct file_system_type * fs)

{

int res = 0;

struct file_system_type ** p;

if (!fs)

return -EINVAL;

if (fs->next)

return -EBUSY;

INIT_LIST_HEAD(&fs->fs_supers);

write_lock(&file_systems_lock);

p = find_filesystem(fs->name);

if (*p) /* already registered */

res = -EBUSY;

else /* otherwise append the new file_system_type at the end of the file_systems list */

*p = fs;

write_unlock(&file_systems_lock);

return res;

}

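As mentioned, the file_system_type structures of all registered file systems form a list pointed to by the global kernel variable file_systems. Registering a file system means adding its file_system_type to this list. The lock file_systems_lock must be taken while the list is accessed.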

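find_filesystem returns one of two addresses:

- If a file system with the given name is already registered, it returns the address of the pointer that points to that file system's file_system_type.
- Otherwise, it returns the address of the next field of the last element of the file_systems list (or the address of file_systems itself, if the list is empty).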

static struct file_system_type **find_filesystem(const char *name)

{

struct file_system_type **p;

for (p=&file_systems; *p; p=&(*p)->next)

if (strcmp((*p)->name, name) == 0)

break;

return p;

}

register_filesystem therefore appends the new file system's file_system_type to the end of the file_systems list.
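The pointer-to-pointer idiom lets a single return value mean both "found here" and "append here".

The user-space model below keeps the logic of the two functions above but drops the lock and the fs_supers list. The trimmed struct is a stand-in, not the kernel definition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-in: only the fields this model needs. */
struct file_system_type {
    const char *name;
    struct file_system_type *next;
};

static struct file_system_type *file_systems; /* list head */

/* Return the slot holding a match, or the address of the tail's NULL link. */
static struct file_system_type **find_filesystem(const char *name)
{
    struct file_system_type **p;

    for (p = &file_systems; *p; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Same logic as register_filesystem() above, minus locking and fs_supers. */
static int register_filesystem(struct file_system_type *fs)
{
    struct file_system_type **p;

    if (!fs)
        return -EINVAL;
    if (fs->next)
        return -EBUSY;   /* already linked into some list */
    p = find_filesystem(fs->name);
    if (*p)
        return -EBUSY;   /* a file system with this name is registered */
    *p = fs;             /* append at the tail */
    return 0;
}
```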

Chapter 4  Mounting the File System

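As mentioned, the jffs2 source defines the global variable jffs2_fs_type of type file_system_type and registers it on the kernel's file_systems list. When a file system is mounted, the kernel creates its VFS super_block, together with the root directory's inode, dentry and so on. The mount system call is handled by sys_mount, with the following call chain (">" means "calls"):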

sys_mount > do_mount > get_sb_bdev > read_super > jffs2_read_super

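read_super allocates a super_block via get_empty_super, then calls the read_super method registered by the file system to initialise it. The source of the functions on this call chain is analysed in 《Linux內核源代碼情景分析》 (Scenario Analysis of the Linux Kernel Source Code), Vol. 1, pp. 491-507, and is not repeated here. The read_super field of jffs2_fs_type points to jffs2_read_super.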

The jffs2_read_super function

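Besides initialising the VFS superblock object, this function builds kernel descriptors for all data entities and files on flash.

The kernel descriptors are the "map" of data entities and files, so they must be built at mount time. A file's other structures, such as its inode and its jffs2_full_dnode or jffs2_full_dirent structures, are created only when the file is opened.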

static struct super_block *jffs2_read_super(struct super_block *sb, void *data, int silent)

{

struct jffs2_sb_info *c;

int ret;

unsigned long j, k;

D1(printk(KERN_DEBUG "jffs2: read_super for device %s\n", kdevname(sb->s_dev)));

if (major(sb->s_dev) != MTD_BLOCK_MAJOR) {

if (!silent)

printk(KERN_DEBUG "jffs2: attempt to mount non-MTD device %s\n", kdevname(sb->s_dev));

return NULL;

}

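read_super has already set the superblock's device number, sb->s_dev, to that of the flash partition holding the jffs2 file system. Here the code checks that this device really is an MTD block device.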

c = JFFS2_SB_INFO(sb);

memset(c, 0, sizeof(*c));

sb->s_op = &jffs2_super_operations;

c->mtd = get_mtd_device(NULL, minor(sb->s_dev));

if (!c->mtd) {

D1(printk(KERN_DEBUG "jffs2: MTD device #%u doesn't appear to exist\n", minor(sb->s_dev)));

return NULL;

}

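The JFFS2_SB_INFO macro returns the address of the super_block's u field, i.e. the jffs2_sb_info structure. The code first zeroes the whole jffs2_sb_info, then sets s_op, the pointer to the superblock method table, to jffs2_super_operations.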

static struct super_operations jffs2_super_operations =

{

read_inode: jffs2_read_inode,

put_super: jffs2_put_super,

write_super: jffs2_write_super,

statfs: jffs2_statfs,

remount_fs: jffs2_remount_fs,

clear_inode: jffs2_clear_inode

};

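This structure provides the basic methods for the whole file system. For example, jffs2_read_inode is called whenever a file is opened and its VFS inode is created.

(Note: the Linux file-system module programming interface consists of the following:
1. Define a file_system_type variable and implement read_super.
2. Define a super_operations structure and implement its methods.
3. Register the file system.

These functions are called when the file system is mounted or unmounted and when files are opened or closed. The module must of course also implement the methods for accessing every type of file on the file system.)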

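The other key step is pointing jffs2_sb_info.mtd at the mtd_info created when the flash driver was initialised. This structure describes the physical MTD device (here, the flash partition) and provides the low-level driver routines for accessing it. As the later source analysis shows, jffs2 methods ultimately call the flash driver to write jffs2_raw_dirent or jffs2_raw_inode entities and their trailing data to flash, or to read them back.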

j = jiffies;

ret = jffs2_do_fill_super(sb, data, silent);

k = jiffies;

if (ret) {

put_mtd_device(c->mtd);

return NULL;

}

printk("JFFS2 mount took %ld jiffies\n", k-j);

return sb;

}

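The real work is left to jffs2_do_fill_super:

- initialising the VFS super_block;
- building jffs2_raw_node_ref kernel descriptors for all data entities on flash;
- creating jffs2_inode_cache kernel descriptors for all files.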

The jffs2_do_fill_super function

int jffs2_do_fill_super(struct super_block *sb, void *data, int silent)

{

struct jffs2_sb_info *c;

struct inode *root_i;

int ret;

c = JFFS2_SB_INFO(sb);

c->sector_size = c->mtd->erasesize;

c->flash_size = c->mtd->size;

if (c->flash_size < 4*c->sector_size) {

printk(KERN_ERR "jffs2: Too few erase blocks (%d)\n", c->flash_size / c->sector_size);

return -EINVAL;

}

c->cleanmarker_size = sizeof(struct jffs2_unknown_node);

if (jffs2_cleanmarker_oob(c)) { /* Cleanmarker is out-of-band, so inline size zero */

c->cleanmarker_size = 0;

}

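First, the flash-related fields of jffs2_sb_info (the erase-block size and the partition size) are set from the corresponding fields of mtd_info, which was created during flash driver initialisation. A partition with fewer than four erase blocks is rejected.

After jffs2 successfully erases an erase block, it writes a CLEANMARKER node to record that the erase completed.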

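Where the CLEANMARKER goes depends on the flash type:

- On NOR flash, the CLEANMARKER is written inside the erase block, and cleanmarker_size is the size of that node.
- On NAND flash, it is written in the OOB (out-of-band) area and takes no space in the erase block, so cleanmarker_size is set to 0.

(NAND flash can be viewed as a set of pages, each with an OOB area. The OOB area can hold ECC (error-correcting code) data, information marking bad erase blocks, or file-system information. jffs2 uses it to store the CLEANMARKER.)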

if (c->mtd->type == MTD_NANDFLASH) {

/* Initialise write buffer */

c->wbuf_pagesize = c->mtd->oobblock;

c->wbuf_ofs = 0xFFFFFFFF;

c->wbuf = kmalloc(c->wbuf_pagesize, GFP_KERNEL);

if (!c->wbuf)

return -ENOMEM;

/* Initialize process for timed wbuf flush */

INIT_TQUEUE(&c->wbuf_task,(void*) jffs2_wbuf_process, (void *)c);

/* Initialize timer for timed wbuf flush */

init_timer(&c->wbuf_timer);

c->wbuf_timer.function = jffs2_wbuf_timeout;

c->wbuf_timer.data = (unsigned long) c;

}

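NAND flash consists of pages, and a number of pages form an erase block. The smallest unit for reading and writing is a page; the smallest unit for erasing is an erase block.

The oobblock field of the mtd_info flash descriptor is the page size. Here the code therefore allocates:

- a write buffer of oobblock bytes;
- a kernel timer and a task-queue entry, which together flush (synchronise) the buffer to flash periodically.

The timer periodically hands jffs2_sb_info.wbuf_task to keventd via schedule_task. The task's callback, jffs2_wbuf_process, writes the contents of the write buffer jffs2_sb_info.wbuf back to flash. Flash writes may block, so they must run in process context; this is why they are delegated to keventd.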

(The asynchronous write mechanism of NAND flash deserves closer study. Also, embedded Linux apparently no longer has kflushd.)
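The idea behind the NAND write buffer is to accumulate small writes until a whole page can be programmed. It can be modelled in a few lines.

Everything here is hypothetical:

- PAGESIZE stands in for mtd->oobblock.
- Padding a partial page with 0xFF on flush is an assumption.
- Counting pages_written replaces the real MTD write call.

An explicit call to wbuf_flush corresponds to what the timer-driven flush does for a partially filled page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGESIZE 512 /* hypothetical NAND page size (mtd->oobblock in the text) */

/* Hypothetical write-behind buffer: bytes accumulate until a whole page is
   ready, and only then is a page-sized program operation issued. */
struct wbuf {
    uint8_t buf[PAGESIZE];
    uint32_t len;        /* bytes currently buffered */
    int pages_written;   /* counts simulated page programs */
};

static void wbuf_flush(struct wbuf *w)
{
    if (w->len == 0)
        return;
    memset(w->buf + w->len, 0xFF, PAGESIZE - w->len); /* pad like erased flash */
    w->pages_written++;  /* a real driver would program the page here */
    w->len = 0;
}

static void wbuf_write(struct wbuf *w, const uint8_t *data, uint32_t n)
{
    while (n > 0) {
        uint32_t room = PAGESIZE - w->len;
        uint32_t chunk = n < room ? n : room;

        memcpy(w->buf + w->len, data, chunk);
        w->len += chunk;
        data += chunk;
        n -= chunk;
        if (w->len == PAGESIZE)
            wbuf_flush(w); /* full page: write it out immediately */
    }
}
```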

c->inocache_list = kmalloc(INOCACHE_HASHSIZE * sizeof(struct jffs2_inode_cache *),

GFP_KERNEL);

if (!c->inocache_list) {

ret = -ENOMEM;

goto out_wbuf;

}

memset(c->inocache_list, 0, INOCACHE_HASHSIZE * sizeof(struct jffs2_inode_cache *));

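As noted, every file on flash has exactly one kernel descriptor, a jffs2_inode_cache, created at mount time. When the file is opened and its inode is created, the inocache field of the inode's u field (jffs2_inode_info) points to it; see Figure 3. The jffs2_inode_cache structures of all files are also organised in a hash table pointed to by jffs2_sb_info.inocache_list.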

if ((ret = jffs2_do_mount_fs(c)))

goto out_inohash;

ret = -EINVAL;

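This function does most of the work of mounting jffs2. It is analysed below and only summarised here:

1. Create the erase-block descriptor array jffs2_sb_info.blocks[] and initialise the related fields of jffs2_sb_info.

2. Scan the whole flash partition, creating a jffs2_raw_node_ref kernel descriptor for every data entity and a jffs2_inode_cache kernel descriptor for every file.

3. Add every file's jffs2_inode_cache to the hash table and check the validity of all data entities on flash. Only the CRCs of the jffs2_raw_dirent or jffs2_raw_inode structures themselves are checked here; the CRC check of the trailing data is deferred until the file is actually opened (see jffs2_scan_inode_node).

4. Add each erase block's descriptor to the appropriate xxxx_list in jffs2_sb_info, according to the block's contents.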

D1(printk(KERN_DEBUG "jffs2_do_fill_super(): Getting root inode\n"));

root_i = iget(sb, 1);

if (is_bad_inode(root_i)) {

D1(printk(KERN_WARNING "get root inode failed\n"));

goto out_nodes;

}

D1(printk(KERN_DEBUG "jffs2_do_fill_super(): d_alloc_root()\n"));

sb->s_root = d_alloc_root(root_i);

if (!sb->s_root)

goto out_root_i;

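Once kernel descriptors exist for all files and data entities on flash, most of the mount work is done. Next, the VFS inode and dentry objects must be created for the root directory "/".

The inline function iget creates the inode. Its second argument is the inode number, and the root directory's inode number is 1. The call path of iget is: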

iget > iget4 > get_new_inode > jffs2_super_operations.read_inode

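When a VFS inode is needed for a file, the inode hash table is first searched by the file's inode number, ino. If the inode has not been created yet, get_new_inode allocates an inode structure and initialises it with the read_inode method that the file system registered in super_operations.

- For ext2, ext2_read_inode reads the on-disk inode.
- For a jffs2 directory, a jffs2_full_dirent is created for each of its data nodes and linked into a list.
- For other jffs2 file types, a jffs2_full_dnode and a jffs2_node_frag are created for each data node, and the jffs2_node_frag structures are organised in a red-black tree.

Finally, the inode method pointers inode.i_op, i_fop and i_mapping are set according to the file type. See later.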

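The root directory "/" is a VFS concept that exists only in memory and has no physical entity on flash. For every other file, get_new_inode creates the VFS inode, dentry and so on when the file is opened; the root directory's inode, by contrast, is created at mount time.

A dentry must then be created for the root directory. The links are:

- The superblock's s_root points to the root dentry.
- The dentry's d_inode points to the root inode.
- The inode's i_sb points back to the super_block.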

#if LINUX_VERSION_CODE >= 0x20403

sb->s_maxbytes = 0xFFFFFFFF;

#endif

sb->s_blocksize = PAGE_CACHE_SIZE;

sb->s_blocksize_bits = PAGE_CACHE_SHIFT;

sb->s_magic = JFFS2_SUPER_MAGIC;

if (!(sb->s_flags & MS_RDONLY))

jffs2_start_garbage_collect_thread(c);

return 0;

out_root_i:

iput(root_i);

out_nodes:

jffs2_free_ino_caches(c);

jffs2_free_raw_node_refs(c);

kfree(c->blocks);

out_inohash:

kfree(c->inocache_list);

out_wbuf:

if (c->wbuf)

kfree(c->wbuf);

return ret;

}

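At the end of the mount, several super_block fields are also set, such as the block size s_blocksize (set to the page-cache page size) and the file system's magic number s_magic.

In addition, the GC (garbage collection) kernel thread is started, unless the file system is mounted read-only. jffs2 is log-structured: every modification writes new data nodes to flash instead of changing existing ones. Once the number of usable erase blocks falls below a threshold, the GC thread must therefore be woken to reclaim the space occupied by obsolete data nodes. The GC mechanism is covered in Chapter 8.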

On success the function returns 0. Otherwise it frees all descriptors and other acquired memory, and returns the error code saved in ret.

The jffs2_do_mount_fs function

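This function only allocates descriptors for all erase blocks in the flash partition and initialises the heads of the xxxx_list lists. It then calls jffs2_build_filesystem, which does most of the mount work.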

int jffs2_do_mount_fs(struct jffs2_sb_info *c)

{

int i;

c->free_size = c->flash_size;

c->nr_blocks = c->flash_size / c->sector_size;

c->blocks = kmalloc(sizeof(struct jffs2_eraseblock) * c->nr_blocks, GFP_KERNEL);

if (!c->blocks)

return -ENOMEM;

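Before the mount, the whole flash is considered usable, so the free size is set to the size of the whole partition, and the total number of erase blocks is computed. struct jffs2_eraseblock is the erase-block descriptor. Space for all descriptors is allocated here, and each descriptor is initialised: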

for (i=0; i<c->nr_blocks; i++) {

INIT_LIST_HEAD(&c->blocks[i].list);

c->blocks[i].offset = i * c->sector_size;

c->blocks[i].free_size = c->sector_size;

c->blocks[i].dirty_size = 0;

c->blocks[i].wasted_size = 0;

c->blocks[i].unchecked_size = 0;

c->blocks[i].used_size = 0;

c->blocks[i].first_node = NULL;

c->blocks[i].last_node = NULL;

}

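offset is the erase block's logical offset within the flash partition. free_size is initialised to the whole block size.

All other fields recording block usage start at 0. In the order they appear in the code, they are:

- dirty_size: space taken by obsolete entities.
- wasted_size: space wasted by padding and alignment.
- unchecked_size: space taken by entities whose CRC has not yet been checked.
- used_size: space taken by valid entities.

The jffs2_raw_node_ref kernel descriptors of all entities in an erase block are chained through next_phys. first_node and last_node point to the first and last elements of that chain.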
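The descriptor set-up can be reproduced outside the kernel as follows. The sketch combines the loop above with the four-erase-block minimum that jffs2_do_fill_super enforces.

struct eraseblock and init_blocks are simplified, hypothetical names. calloc stands in for kmalloc plus the explicit zeroing of the remaining fields.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Only the size/offset fields initialised in the loop above. */
struct eraseblock {
    uint32_t offset, free_size, dirty_size, wasted_size, unchecked_size, used_size;
};

/* Allocate one descriptor per erase block. Returns NULL if the partition has
   fewer than 4 blocks (the jffs2_do_fill_super check) or on allocation failure. */
static struct eraseblock *init_blocks(uint32_t flash_size, uint32_t sector_size,
                                      uint32_t *nr_blocks)
{
    if (flash_size < 4 * sector_size)
        return NULL;
    *nr_blocks = flash_size / sector_size;

    struct eraseblock *b = calloc(*nr_blocks, sizeof *b); /* other sizes start at 0 */
    if (!b)
        return NULL;
    for (uint32_t i = 0; i < *nr_blocks; i++) {
        b[i].offset = i * sector_size; /* logical offset inside the partition */
        b[i].free_size = sector_size;  /* a freshly described block is all free */
    }
    return b;
}
```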

init_MUTEX(&c->alloc_sem);

init_MUTEX(&c->erase_free_sem);

init_waitqueue_head(&c->erase_wait);

spin_lock_init(&c->erase_completion_lock);

spin_lock_init(&c->inocache_lock);

INIT_LIST_HEAD(&c->clean_list);

INIT_LIST_HEAD(&c->very_dirty_list);

INIT_LIST_HEAD(&c->dirty_list);

INIT_LIST_HEAD(&c->erasable_list);

INIT_LIST_HEAD(&c->erasing_list);

INIT_LIST_HEAD(&c->erase_pending_list);

INIT_LIST_HEAD(&c->erasable_pending_wbuf_list);

INIT_LIST_HEAD(&c->erase_complete_list);

INIT_LIST_HEAD(&c->free_list);

INIT_LIST_HEAD(&c->bad_list);

INIT_LIST_HEAD(&c->bad_used_list);

c->highest_ino = 1;

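jffs2_build_filesystem, described below, inserts each erase-block descriptor into one of several lists according to how the block is used, so the xxxx_list heads are initialised here first.

Inode numbers are unique only within the file system on a given device. The highest inode number in the jffs2 file system on the current flash partition is recorded in the highest_ino field of jffs2_sb_info. It starts at 1, the inode number of the root directory "/".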

if (jffs2_build_filesystem(c)) {

D1(printk(KERN_DEBUG "build_fs failed\n"));

jffs2_free_ino_caches(c);

jffs2_free_raw_node_refs(c);

kfree(c->blocks);

return -EIO;

}

return 0;

}

jffs2_build_filesystem then does the real work of mounting the jffs2 file system.

The jffs2_build_filesystem function

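The summary of jffs2_do_mount_fs above listed the main steps of mounting jffs2:

1. Create the erase-block descriptor array jffs2_sb_info.blocks[] and initialise the related fields of jffs2_sb_info.

2. Scan the whole flash partition, creating a jffs2_raw_node_ref kernel descriptor for every data entity and a jffs2_inode_cache kernel descriptor for every file.

3. Add every file's jffs2_inode_cache to the hash table and check the validity of all data nodes on flash.

4. Add each erase block's descriptor to the appropriate xxxx_list in jffs2_sb_info, according to the block's contents.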

Every step except the first is done by jffs2_build_filesystem, which we now analyse piece by piece:

static int jffs2_build_filesystem(struct jffs2_sb_info *c)

{

int ret;

int i;

struct jffs2_inode_cache *ic;

/* First, scan the medium and build all the inode caches with lists of physical nodes */

c->flags |= JFFS2_SB_FLAG_MOUNTING;

ret = jffs2_scan_medium(c);

c->flags &= ~JFFS2_SB_FLAG_MOUNTING;

if (ret)

return ret;

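jffs2_scan_medium walks every erase block on the flash partition and reads every node in each block. It does the following:

- It creates a kernel descriptor jffs2_raw_node_ref for each node and a kernel descriptor jffs2_inode_cache for each file, and links them together.
- For a directory, it also creates a jffs2_full_dirent for each of the directory's nodes (that is, each directory entry). These form a list pointed to by the scan_dents field of the directory's jffs2_inode_cache.
- It adds each jffs2_inode_cache to the hash table pointed to by jffs2_sb_info.inocache_list.
- Finally, it adds each erase block's descriptor jffs2_eraseblock to one of the xxxx_list lists in jffs2_sb_info, according to how the block is used.

See the analysis below.

Note that the JFFS2_SB_FLAG_MOUNTING flag is set in c->flags for the duration of jffs2_scan_medium and cleared right after it returns.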

D1(printk(KERN_DEBUG "Scanned flash completely\n"));

D1(jffs2_dump_block_lists(c));

/* Now scan the directory tree, increasing nlink according to every dirent found. */

for_each_inode(i, c, ic) {

D1(printk(KERN_DEBUG "Pass 1: ino #%u\n", ic->ino));

ret = jffs2_build_inode_pass1(c, ic);

if (ret) {

D1(printk(KERN_WARNING "Eep. jffs2_build_inode_pass1 for ino %d returned %d\n",

ic->ino, ret));

return ret;

}

cond_resched();

}

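jffs2_scan_medium has already created kernel descriptors for every node and every file on the flash. It has also created a temporary jffs2_full_dirent structure for every directory-entry node (jffs2_raw_dirent). These structures are freed at the end of jffs2_build_filesystem; their only purpose is to compute the hard-link count of every file. Together they form a complete directory tree of the whole file system in kernel memory.

The for_each_inode macro visits the kernel descriptor of every file. It is defined as follows:

#define for_each_inode(i, c, ic) \
for (i=0; i<INOCACHE_HASHSIZE; i++) \
for (ic=c->inocache_list[i]; ic; ic=ic->next)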

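jffs2_build_inode_pass1 is called for every file. If the file is a directory, the function increments the hard-link count of each subdirectory and file that the directory contains. In effect, this walks the whole directory tree and computes the hard-link count of every file.

This can take a while, so cond_resched is used to give up the CPU when a reschedule is pending. It is defined as follows:

#define cond_resched() do { if (need_resched()) schedule(); } while (0)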
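The effect of pass 1 can be sketched with heavily simplified types. Each directory's temporary dirent list names child inode numbers, and every dirent found adds one to that child's nlink.

In this sketch, an array indexed by inode number stands in for the inocache hash table. The names dirent_s, inode_s and count_links are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INO 16 /* hypothetical: inode numbers 1..15 */

struct dirent_s {               /* stands in for jffs2_full_dirent */
    uint32_t child_ino;         /* inode the entry points to */
    struct dirent_s *next;
};

struct inode_s {                /* stands in for jffs2_inode_cache */
    int present;
    int nlink;
    struct dirent_s *scan_dents; /* non-NULL only for directories */
};

/* Pass 1: for every directory, bump the nlink of each child it names. */
static void count_links(struct inode_s tab[MAX_INO])
{
    for (uint32_t ino = 1; ino < MAX_INO; ino++) {
        if (!tab[ino].present)
            continue;
        for (struct dirent_s *fd = tab[ino].scan_dents; fd; fd = fd->next)
            if (fd->child_ino < MAX_INO && tab[fd->child_ino].present)
                tab[fd->child_ino].nlink++;
    }
}
```

A file named by two directory entries (a hard link) ends up with nlink 2.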

D1(printk(KERN_DEBUG "Pass 1 complete\n"));

D1(jffs2_dump_block_lists(c));

/* Next, scan for inodes with nlink == 0 and remove them. If they were directories, then decrement the nlink of their children too, and repeat the scan. As that's going to be a fairly uncommon occurrence, it's not so evil to do it this way. Recursion bad. */

do {

D1(printk(KERN_DEBUG "Pass 2 (re)starting\n"));

ret = 0;

for_each_inode(i, c, ic) {

D1(printk(KERN_DEBUG "Pass 2: ino #%u, nlink %d, ic %p, nodes %p\n", ic->ino, ic->nlink, ic,

ic->nodes));

if (ic->nlink)

continue;

/* XXX: Can get high latency here. Move the cond_resched() from the end of the loop? */

ret = jffs2_build_remove_unlinked_inode(c, ic);

if (ret)

break;

/* -EAGAIN means the inode's nlink was zero, so we deleted it, and furthermore that it had children and their nlink has now gone to zero too. So we have to restart the scan. */

}

D1(jffs2_dump_block_lists(c));

cond_resched();

} while(ret == -EAGAIN);

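D1(printk(KERN_DEBUG "Pass 2 complete\n"));

Pass 1 walked the directory tree and computed nlink for every file. Pass 2 walks the tree again and deletes every file whose nlink is 0. If such a file is a directory, the hard-link counts of all its subdirectories and files are decremented as well. If any of them drops to 0, jffs2_build_remove_unlinked_inode returns -EAGAIN and the scan restarts.

(Why would a file have nlink 0 at all? The jffs2_build_remove_unlinked_inode function needs further study.)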
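The restart logic of pass 2 can be sketched with the same kind of simplified types, which are hypothetical. When an inode with nlink 0 is removed, each child named by its dirents loses one link. If any child drops to 0, the scan starts over; this is the role -EAGAIN plays in the real loop.

The sketch works iteratively rather than recursively, as the kernel comment recommends. One assumption applies to the test: the root is given an initial nlink of 1 so that it is never removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INO 16

struct dirent_s { uint32_t child_ino; struct dirent_s *next; };

struct inode_s {
    int present;
    int nlink;
    struct dirent_s *scan_dents;
};

/* Remove one unlinked inode; return 1 if some child's nlink dropped
 * to 0 (the situation the kernel reports with -EAGAIN). */
static int remove_unlinked(struct inode_s tab[MAX_INO], uint32_t ino)
{
    int restart = 0;
    tab[ino].present = 0;
    for (struct dirent_s *fd = tab[ino].scan_dents; fd; fd = fd->next) {
        if (fd->child_ino >= MAX_INO)
            continue;
        struct inode_s *child = &tab[fd->child_ino];
        if (child->present && --child->nlink == 0)
            restart = 1;
    }
    return restart;
}

/* Pass 2: rescan until no removal exposes a new orphan.
 * Returns the number of inodes removed. */
static int remove_all_unlinked(struct inode_s tab[MAX_INO])
{
    int removed = 0, again;
    do {
        again = 0;
        for (uint32_t ino = 1; ino < MAX_INO; ino++) {
            if (!tab[ino].present || tab[ino].nlink)
                continue;
            removed++;
            if (remove_unlinked(tab, ino)) {
                again = 1;   /* rescan from the start */
                break;
            }
        }
    } while (again);
    return removed;
}
```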

/* Finally, we can scan again and free the dirent nodes and scan_info structs */

for_each_inode(i, c, ic) {

struct jffs2_full_dirent *fd;

D1(printk(KERN_DEBUG "Pass 3: ino #%u, ic %p, nodes %p\n", ic->ino, ic, ic->nodes));

while(ic->scan_dents) {

fd = ic->scan_dents;

ic->scan_dents = fd->next;

jffs2_free_full_dirent(fd);

}

ic->scan_dents = NULL;

cond_resched();

}

D1(printk(KERN_DEBUG "Pass 3 complete\n"));

D1(jffs2_dump_block_lists(c));

/* Rotate the lists by some number to ensure wear levelling */

jffs2_rotate_lists(c);

return ret;

}

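jffs2_scan_medium built a temporary directory tree for the whole file system. Pass 3 tears that tree down:

- Every jffs2_full_dirent corresponding to a directory entry is freed.
- The scan_dents field of every directory's jffs2_inode_cache is set to NULL. For other files this field is already NULL.

This NULL marks the end of the list formed by the next_in_ino fields of the node descriptors (jffs2_raw_node_ref). The last descriptor's next_in_ino points at the jffs2_inode_cache itself, whose first field is scan_dents.

(What does jffs2_rotate_lists do, and how does it relate to the wear-levelling algorithm?)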
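A small sketch shows why a NULL scan_dents ends the next_in_ino list. In the kernel, the last descriptor's next_in_ino points at the jffs2_inode_cache. A walk that keeps following "next" stops when the pointer it reads, which for the inode cache is scan_dents, is NULL.

To stay within standard C, this sketch gives both hypothetical structs a shared first member (chain_link) instead of relying on the kernel's pointer cast. ref_to_ic and the struct names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shared first member: next_in_ino in a node descriptor,
 * scan_dents in an inode cache. */
struct chain_link { struct chain_link *next; };

struct raw_ref {                /* cut-down jffs2_raw_node_ref */
    struct chain_link link;     /* next_in_ino */
    uint32_t flash_offset;
};

struct ino_cache {              /* cut-down jffs2_inode_cache */
    struct chain_link link;     /* scan_dents: must be NULL after mount */
    struct raw_ref *nodes;      /* head of the next_in_ino list */
    uint32_t ino;
};

/* Follow next_in_ino until an element whose "next" is NULL: that is
 * the inode cache sitting at the end of the list. */
static struct ino_cache *ref_to_ic(struct raw_ref *raw)
{
    struct chain_link *l = &raw->link;
    while (l->next)
        l = l->next;
    /* l is the first member of a struct ino_cache, so this cast is valid */
    return (struct ino_cache *)l;
}
```

If scan_dents were still non-NULL, the walk would run on into the temporary dirent list instead of stopping at the inode cache. That is why pass 3 clears it.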

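The jffs2_scan_medium function

This function walks every erase block on the flash partition and:

1. reads every node in each erase block and creates the corresponding kernel descriptor jffs2_raw_node_ref;

2. creates a kernel descriptor jffs2_inode_cache for each file and links the descriptors together;

3. builds the directory tree of the whole file system. For every node of a directory (that is, every directory entry) it creates a jffs2_full_dirent. These form a list pointed to by the scan_dents field of the directory's jffs2_inode_cache. (Note: this tree is used only inside jffs2_build_filesystem. Once jffs2_build_inode_pass1 has computed the hard-link count nlink of every file, the tree is freed before jffs2_build_filesystem returns.)

4. adds the jffs2_inode_cache of every file to the hash table pointed to by jffs2_sb_info.inocache_list;

5. adds each erase block's descriptor jffs2_eraseblock to one of the xxxx_list lists in jffs2_sb_info, according to how the block is used.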

int jffs2_scan_medium(struct jffs2_sb_info *c)

{

int i, ret;

uint32_t empty_blocks = 0, bad_blocks = 0;

unsigned char *flashbuf = NULL;

uint32_t buf_size = 0;

size_t pointlen;

if (!c->blocks) {

printk(KERN_WARNING "EEEK! c->blocks is NULL!\n");

return -EINVAL;

}

if (c->mtd->point) {

ret = c->mtd->point (c->mtd, 0, c->mtd->size, &pointlen, &flashbuf);

if (!ret && pointlen < c->mtd->size) {

/* Don't muck about if it won't let us point to the whole flash */

D1(printk(KERN_DEBUG "MTD point returned len too short: 0x%x\n", pointlen));

c->mtd->unpoint(c->mtd, flashbuf);

flashbuf = NULL;

}

if (ret)

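D1(printk(KERN_DEBUG "MTD point failed %d\n", ret));

}

NOR flash supports execute-in-place (XIP). For example, the first stage of the boot loader runs directly from flash at power-up. Reading NOR flash works much like reading SDRAM, and the flash driver's read methods (read or read_ecc) essentially boil down to a memcpy. Reading flash through a memory mapping therefore saves one memory copy compared with going through the read method.

If the NOR flash driver implements the point and unpoint methods, the flash can be memory-mapped.

- The 2nd and 3rd arguments of point give the range to map.
- The length actually mapped is returned in pointlen.
- The starting virtual address is stored in the variable the last argument points to: mtdbuf in the prototype, &flashbuf here.

Here the code tries to map the whole flash, passing 0 and mtd->size as the 2nd and 3rd arguments. If the mapped length pointlen is smaller than the flash size, it tears the mapping down again with unpoint.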

if (!flashbuf) {

/* For NAND it's quicker to read a whole eraseblock at a time, apparently */

if (jffs2_cleanmarker_oob(c))

buf_size = c->sector_size;

else

buf_size = PAGE_SIZE;

D1(printk(KERN_DEBUG "Allocating readbuf of %d bytes\n", buf_size));

flashbuf = kmalloc(buf_size, GFP_KERNEL);

if (!flashbuf)

return -ENOMEM;

}

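If flashbuf is NULL, no direct memory mapping of the flash has been set up. In that case an extra kernel buffer must be allocated to read the flash contents into. The buffer size depends on the flash type:

- NAND flash: according to the author's comment, it is faster to read a whole erase block at once, so the buffer is one erase block.
- NOR flash: the buffer is one page (PAGE_SIZE).

(So if the flash driver supports direct memory mapping, no extra buffer is needed for reads. However, according to David Woodhouse's March 2005 article www.linux-mtd.infradead.org/archive/tech/mtd_info.html, the semantics of point and unpoint have yet to be defined precisely, so it is best not to rely on them for now.

Moreover, point can only be used to map NOR flash for reading, and a lock is held for as long as the mapping exists, which blocks other writes. The mapping should therefore be torn down with unpoint, releasing the lock, as soon as the read finishes. Is it too early, then, to set up the mapping here? Perhaps it should be deferred until a read is actually performed: map, read through the mapping and unmap when mapping is possible, and otherwise let jffs2_flash_read read through mtd->read.)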

for (i=0; i<c->nr_blocks; i++) {

struct jffs2_eraseblock *jeb = &c->blocks[i];

ret = jffs2_scan_eraseblock(c, jeb, buf_size?flashbuf:(flashbuf+jeb->offset), buf_size);

if (ret < 0)

return ret;

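jffs2_scan_eraseblock performs the first four of jffs2_scan_medium's tasks (see below). It returns a value describing the state of the erase block, based on the block's contents. The erase block descriptor is then put onto one of the xxxx_list lists in jffs2_sb_info according to that state. This is done in a switch statement: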

ACCT_PARANOIA_CHECK(jeb);

/* Now decide which list to put it on */

switch(ret) {

case BLK_STATE_ALLFF:

/* Empty block. Since we can't be sure it was entirely erased, we just queue it for erase

again. It will be marked as such when the erase is complete. Meanwhile we still count it as

empty for later checks. */

empty_blocks++;

list_add(&jeb->list, &c->erase_pending_list);

c->nr_erasing_blocks++;

break;

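If the erase block is all ones, it holds no information at all, not even a CLEANMARKER node. Its descriptor is added to erase_pending_list, and both the nr_erasing_blocks counter and the local empty_blocks counter are incremented. Blocks on this list are about to be erased; after a successful erase, a CLEANMARKER is written at the start of the block. (See the point in jffs2_scan_eraseblock where BLK_STATE_ALLFF is returned.)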

case BLK_STATE_CLEANMARKER: /* Only a CLEANMARKER node is valid */

if (!jeb->dirty_size) { /* It's actually free */

list_add(&jeb->list, &c->free_list);

c->nr_free_blocks++;

} else { /* Dirt */

D1(printk(KERN_DEBUG "Adding all-dirty block at 0x%08x to erase_pending_list\n",

jeb->offset));

list_add(&jeb->list, &c->erase_pending_list);

c->nr_erasing_blocks++;

}

break;

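Suppose the only valid node in the erase block is a CLEANMARKER. If the descriptor's dirty_size is also 0, the block contains no obsolete nodes, so its descriptor is added to free_list and nr_free_blocks is incremented. Otherwise the descriptor goes onto erase_pending_list.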

case BLK_STATE_CLEAN: /* Full (or almost full) of clean data. Clean list */

list_add(&jeb->list, &c->clean_list);

break;

If the erase block is full, or almost full, of valid data, it is added to clean_list.

case BLK_STATE_PARTDIRTY: /* Some data, but not full. Dirty list. */

/*Except that we want to remember the block with most free space,

and stick it in the 'nextblock' position to start writing to it. Later when we do snapshots, this

must be the most recent block, not the one with most free space. */

if (jeb->free_size > 2*sizeof(struct jffs2_raw_inode) &&

(jffs2_can_mark_obsolete(c) || jeb->free_size > c->wbuf_pagesize) &&

(!c->nextblock || c->nextblock->free_size < jeb->free_size)) {

/* Better candidate for the next writes to go to */

if (c->nextblock) {

c->nextblock->dirty_size += c->nextblock->free_size + c->nextblock->wasted_size;

c->dirty_size += c->nextblock->free_size + c->nextblock->wasted_size;

c->free_size -= c->nextblock->free_size;

c->wasted_size -= c->nextblock->wasted_size;

c->nextblock->free_size = c->nextblock->wasted_size = 0;

if (VERYDIRTY(c, c->nextblock->dirty_size)) {

list_add(&c->nextblock->list, &c->very_dirty_list);

} else {

list_add(&c->nextblock->list, &c->dirty_list);

}

}

c->nextblock = jeb;

} else {

jeb->dirty_size += jeb->free_size + jeb->wasted_size;

c->dirty_size += jeb->free_size + jeb->wasted_size;

c->free_size -= jeb->free_size;

c->wasted_size -= jeb->wasted_size;

jeb->free_size = jeb->wasted_size = 0;

if (VERYDIRTY(c, jeb->dirty_size)) {

list_add(&jeb->list, &c->very_dirty_list);

} else {

list_add(&jeb->list, &c->dirty_list);

}

}

break;

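If the erase block contains at least one obsolete node but still holds some valid data, it is added to dirty_list or very_dirty_list. The VERYDIRTY macro is defined as follows:

#define VERYDIRTY(c, size) ((size) >= ((c)->sector_size / 2))

In other words, an erase block is considered "very dirty" when obsolete nodes take up at least half of it.

The nextblock field of jffs2_sb_info points to the erase block that writes currently go to. The scan compares the current erase block with the one nextblock points to:

- If the current block has more free space, and enough free space to be worth writing to, nextblock is switched to the current block and the previous nextblock is put onto (very_)dirty_list.
- Otherwise the current block itself goes onto (very_)dirty_list.

(Why is this done? And why, before a block is put onto (very_)dirty_list, are its free_size and wasted_size added to dirty_size and then both cleared to 0?)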
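The nextblock decision can be restated as a small predicate. This is a simplified sketch with the thresholds passed in as plain parameters:

- min_free stands for 2*sizeof(struct jffs2_raw_inode);
- can_mark_obsolete and wbuf_pagesize are as in the code above.

better_nextblock and very_dirty are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VERYDIRTY: at least half of the erase block is obsolete. */
static bool very_dirty(uint32_t sector_size, uint32_t dirty_size)
{
    return dirty_size >= sector_size / 2;
}

/* Should a partly dirty block with 'cand_free' bytes free replace the
 * current nextblock ('cur_free' bytes free; has_cur false if none)? */
static bool better_nextblock(uint32_t cand_free, uint32_t min_free,
                             bool can_mark_obsolete, uint32_t wbuf_pagesize,
                             bool has_cur, uint32_t cur_free)
{
    if (cand_free <= min_free)
        return false;               /* too little room to be worth it */
    if (!can_mark_obsolete && cand_free <= wbuf_pagesize)
        return false;               /* NAND: need more than one wbuf page */
    return !has_cur || cur_free < cand_free;
}
```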

case BLK_STATE_ALLDIRTY:

/* Nothing valid - not even a clean marker. Needs erasing. */

/* For now we just put it on the erasing list. We'll start the erases later */

D1(printk(KERN_NOTICE "JFFS2: Erase block at 0x%08x is not formatted. It will be erased\n", jeb->offset));

list_add(&jeb->list, &c->erase_pending_list);

c->nr_erasing_blocks++;

break;

If everything in this erase block is obsolete and there is not even a CLEANMARKER, the block is added to erase_pending_list to wait for erasure.

case BLK_STATE_BADBLOCK:

D1(printk(KERN_NOTICE "JFFS2: Block at 0x%08x is bad\n", jeb->offset));

list_add(&jeb->list, &c->bad_list);

c->bad_size += c->sector_size;

c->free_size -= c->sector_size;

bad_blocks++;

break;

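Finally, if the erase block is bad, it is added to bad_list. bad_size is increased by one erase block, and the partition-wide free-space count free_size is decreased by the same amount.

(How is an erase block determined to be bad? In the code shown below, jffs2_scan_eraseblock returns BLK_STATE_BADBLOCK only on its CONFIG_JFFS2_FS_NAND paths.)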

default:

printk(KERN_WARNING "jffs2_scan_medium(): unknown block state\n");

BUG();

}//switch

}//for

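At this point, kernel descriptors exist for every node and every file in every erase block. Each erase block's descriptor has also been put onto the appropriate xxxx_list according to how the block is used. A little extra work remains before the function returns: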

/* Nextblock dirty is always seen as wasted, because we cannot recycle it now */

if (c->nextblock && (c->nextblock->dirty_size)) {

c->nextblock->wasted_size += c->nextblock->dirty_size;

c->wasted_size += c->nextblock->dirty_size;

c->dirty_size -= c->nextblock->dirty_size;

c->nextblock->dirty_size = 0;

}

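The nextblock field of jffs2_sb_info points to the erase block currently being written, that is, the block that receives new nodes. (Note that jffs2 writes nodes to flash sequentially.) According to the author's comment, the space taken by obsolete nodes in this block is always counted as wasted (wasted_size), because the block cannot be recycled right now. What does that mean?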

if (!jffs2_can_mark_obsolete(c) && c->nextblock && (c->nextblock->free_size & (c->wbuf_pagesize-1))) {

/* If we're going to start writing into a block which already contains data, and the end of the data isn't page-aligned, skip a little and align it. */

uint32_t skip = c->nextblock->free_size & (c->wbuf_pagesize-1);

D1(printk(KERN_DEBUG "jffs2_scan_medium(): Skipping %d bytes in nextblock to ensure page alignment\n", skip));

c->nextblock->wasted_size += skip;

c->wasted_size += skip;

c->nextblock->free_size -= skip;

c->free_size -= skip;

}

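The author's comment explains this step. If the data already in the block being written does not end page-aligned, the free space does not start at a page boundary. The first part of the free space is then skipped, so that writing resumes at a page-aligned address. The skipped bytes are added to wasted_size and subtracted from free_size, both in the erase block descriptor jffs2_eraseblock and in jffs2_sb_info.

Note also that the development board used here has NOR flash. CONFIG_JFFS2_FS_NAND is therefore not defined, jffs2_can_mark_obsolete(c) is defined as 1, and this code is skipped.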
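The skip computation relies on wbuf_pagesize being a power of two, so free_size & (wbuf_pagesize - 1) is free_size modulo the page size. It also assumes that an erase block ends on a page boundary. Under that assumption, the remainder is exactly the distance from the current write position to the next page boundary.

align_skip below is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes to skip so that the next write in the block starts page-aligned.
 * Assumes wbuf_pagesize is a power of two and the block (and hence its
 * free space) ends on a page boundary. */
static uint32_t align_skip(uint32_t free_size, uint32_t wbuf_pagesize)
{
    assert(wbuf_pagesize && (wbuf_pagesize & (wbuf_pagesize - 1)) == 0);
    return free_size & (wbuf_pagesize - 1);
}
```

For example, with a 512-byte page and 0x1234 bytes free, 0x34 bytes move from free_size to wasted_size, leaving a page-aligned 0x1200.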

if (c->nr_erasing_blocks) {

if ( !c->used_size && ((empty_blocks+bad_blocks)!= c->nr_blocks || bad_blocks == c->nr_blocks) ) {

printk(KERN_NOTICE "Cowardly refusing to erase blocks on filesystem with no valid JFFS2 nodes\n");

printk(KERN_NOTICE "empty_blocks %d, bad_blocks %d, c->nr_blocks %d\n",

empty_blocks,bad_blocks,c->nr_blocks);

return -EIO;

}

jffs2_erase_pending_trigger(c);

}

if (buf_size)

kfree(flashbuf);

else

c->mtd->unpoint(c->mtd, flashbuf);

return 0;

}

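Finally, once the whole file system has been scanned, two things remain:

- If there are erase blocks waiting to be erased, jffs2_erase_pending_trigger sets the s_dirt flag in the file system's superblock.
- The buffer used to hold erase-block contents is freed. If the flash was memory-mapped instead, the mapping is removed with unpoint.

How does the super_block.s_dirt flag trigger the asynchronous erase, and who carries it out? The erase-related call path, listed callee first, is jffs2_erase_blocks < jffs2_erase_pending_blocks < jffs2_write_super. This suggests that erasing piggybacks on Linux's mechanism for periodically writing the superblock back to storage.

The jffs2_scan_eraseblock function

This function parses one erase block:

1. It creates a kernel descriptor jffs2_raw_node_ref for every node in the erase block.

2. It creates a kernel descriptor jffs2_inode_cache for each file in the erase block and links the descriptors together.

3. For every node (directory entry) of each directory in the erase block, it creates a jffs2_full_dirent. These form a list pointed to by the scan_dents field of the directory's jffs2_inode_cache.

4. It adds the kernel descriptors of all files in the erase block to the hash table pointed to by jffs2_sb_info.inocache_list.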

static int jffs2_scan_eraseblock (struct jffs2_sb_info *c, struct jffs2_eraseblock *jeb,

unsigned char *buf, uint32_t buf_size) {

struct jffs2_unknown_node *node;

struct jffs2_unknown_node crcnode;

uint32_t ofs, prevofs;

uint32_t hdr_crc, buf_ofs, buf_len;

int err;

int noise = 0;

int wasempty = 0;

uint32_t empty_start = 0;

#ifdef CONFIG_JFFS2_FS_NAND

int cleanmarkerfound = 0;

#endif

ofs = jeb->offset;

prevofs = jeb->offset - 1;

D1(printk(KERN_DEBUG "jffs2_scan_eraseblock(): Scanning block at 0x%x\n", ofs));

#ifdef CONFIG_JFFS2_FS_NAND

if (jffs2_cleanmarker_oob(c)) {

int ret = jffs2_check_nand_cleanmarker(c, jeb);

D2(printk(KERN_NOTICE "jffs_check_nand_cleanmarker returned %d\n",ret));

/* Even if it's not found, we still scan to see if the block is empty. We use this information

to decide whether to erase it or not. */

switch (ret) {

case 0: cleanmarkerfound = 1; break;

case 1: break;

case 2: return BLK_STATE_BADBLOCK;

case 3: return BLK_STATE_ALLDIRTY; /* Block has failed to erase min. once */

default: return ret;

}

}

#endif

buf_ofs = jeb->offset;

ofs and buf_ofs are set to the erase block's offset within the flash partition. The code in between deals with NAND flash and is skipped here.

if (!buf_size) {

buf_len = c->sector_size;

} else {

buf_len = EMPTY_SCAN_SIZE; //1024

err = jffs2_fill_scan_buf(c, buf, buf_ofs, buf_len);

if (err)

return err;

}

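jffs2_fill_scan_buf reads buf_len bytes at offset buf_ofs in the flash partition into buf. It simply calls jffs2_flash_read, a macro that calls the flash driver's read method directly:

#define jffs2_flash_read(c, ofs, len, retlen, buf) ((c)->mtd->read((c)->mtd, ofs, len, retlen, buf))

This first read fetches only EMPTY_SCAN_SIZE bytes (1024, according to the comment on the assignment), which is just enough to decide whether the block is empty. It is not a full buffer.

The buffer itself is one erase block for NAND flash and one page for NOR flash. The while loop below refills it, up to buf_size bytes at a time, as it walks through the rest of the erase block.

If the flash is memory-mapped (buf_size is 0), no read is needed, and buf_len covers the whole erase block.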

/* We temporarily use 'ofs' as a pointer into the buffer/jeb */

ofs = 0;

/* Scan only 4KiB of 0xFF before declaring it's empty */

while(ofs < EMPTY_SCAN_SIZE && *(uint32_t *)(&buf[ofs]) == 0xFFFFFFFF)

ofs += 4;

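Up to this point, ofs and buf_ofs have both held the erase block's logical offset within the flash partition. From here on, ofs is temporarily used as an index into buf.

- If the first EMPTY_SCAN_SIZE bytes of buf are all 0xFF, ofs ends up equal to EMPTY_SCAN_SIZE.
- Otherwise ofs points at the first 32-bit word that is not 0xFFFFFFFF.

(The kernel comment speaks of 4 KiB, but the loop bound is EMPTY_SCAN_SIZE, which the comment on buf_len above gives as 1024 bytes.)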
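The emptiness test amounts to counting leading all-ones 32-bit words within the first EMPTY_SCAN_SIZE bytes. The sketch below does this portably:

- first_non_ff is a hypothetical name.
- memcpy replaces the original's cast to uint32_t *, which assumes an aligned buffer.
- EMPTY_SCAN_SIZE is taken as 1024, from the comment on buf_len.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EMPTY_SCAN_SIZE 1024 /* per the "//1024" comment above */

/* Offset of the first 32-bit word that is not 0xFFFFFFFF, or
 * EMPTY_SCAN_SIZE if the whole scanned prefix is erased. */
static uint32_t first_non_ff(const unsigned char *buf)
{
    uint32_t ofs = 0, w;
    while (ofs < EMPTY_SCAN_SIZE) {
        memcpy(&w, buf + ofs, sizeof w);
        if (w != 0xFFFFFFFFu)
            break;
        ofs += 4;
    }
    return ofs;
}
```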

if (ofs == EMPTY_SCAN_SIZE) {

#ifdef CONFIG_JFFS2_FS_NAND

if (jffs2_cleanmarker_oob(c)) {

/* scan oob, take care of cleanmarker */

int ret = jffs2_check_oob_empty(c, jeb, cleanmarkerfound);

D2(printk(KERN_NOTICE "jffs2_check_oob_empty returned %d\n",ret));

switch (ret) {

case 0: return cleanmarkerfound ? BLK_STATE_CLEANMARKER : BLK_STATE_ALLFF;

case 1: return BLK_STATE_ALLDIRTY;

case 2: return BLK_STATE_BADBLOCK; /* case 2/3 are paranoia checks */

case 3: return BLK_STATE_ALLDIRTY; /* Block has failed to erase min. once */

default: return ret;

}

}

#endif

D1(printk(KERN_DEBUG "Block at 0x%08x is empty (erased)\n", jeb->offset));

return BLK_STATE_ALLFF; /* OK to erase if all blocks are like this */

}

If ofs does reach EMPTY_SCAN_SIZE, the whole erase block is assumed to be all ones and BLK_STATE_ALLFF is returned. This is a shortcut: EMPTY_SCAN_SIZE is only 1 KiB, while an erase block is, for example, 256 KiB. (Back in jffs2_scan_medium, this return value causes the erase block to be added to jffs2_sb_info.erase_pending_list.)

if (ofs) {

D1(printk(KERN_DEBUG "Free space at %08x ends at %08x/n", jeb->offset,

jeb->offset + ofs));

DIRTY_SPACE(ofs);

}

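If ofs did not reach EMPTY_SCAN_SIZE, it points at the first word of buf that is not all ones. The first ofs bytes of the erase block cannot be used. The DIRTY_SPACE macro therefore moves them from free_size to dirty_size, in both jffs2_eraseblock and jffs2_sb_info:

#define DIRTY_SPACE(x) do { typeof(x) _x = (x); \
c->free_size -= _x; c->dirty_size += _x; \
jeb->free_size -= _x ; jeb->dirty_size += _x; \
}while(0)

From this point on, the code accounts for how the erase block is used by updating the xxxx_size fields of jffs2_eraseblock and jffs2_sb_info.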
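DIRTY_SPACE moves bytes out of free_size into dirty_size, both in the erase block and in the file-system-wide totals, so the two levels stay consistent. USED_SPACE, used later for valid nodes, is assumed here to be the analogous macro for used_size.

The sketch below expresses the same bookkeeping as functions. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct space { uint32_t free_size, dirty_size, used_size; };

/* Equivalent of DIRTY_SPACE(x): bytes move from free to dirty in the
 * block (jeb) and in the whole-filesystem totals (c). */
static void dirty_space(struct space *c, struct space *jeb, uint32_t x)
{
    assert(jeb->free_size >= x && c->free_size >= x);
    c->free_size -= x;   c->dirty_size += x;
    jeb->free_size -= x; jeb->dirty_size += x;
}

/* Assumed counterpart of USED_SPACE(x). */
static void used_space(struct space *c, struct space *jeb, uint32_t x)
{
    assert(jeb->free_size >= x && c->free_size >= x);
    c->free_size -= x;   c->used_size += x;
    jeb->free_size -= x; jeb->used_size += x;
}
```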

/* Now ofs is a complete physical flash offset as it always was... */

ofs += jeb->offset;

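Until now, ofs was a position inside buf. Adding jeb->offset, the erase block's logical offset in the flash partition, turns it back into a logical offset within the partition.

The loop below then walks the whole erase block, starting from the buf_len bytes already read. buf_len may be smaller than the erase block size sector_size. As shown below, whenever the end of buf holds only part of what is needed, the code reads again from the flash partition starting at ofs, the start of that node, up to buf_size bytes at a time.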

noise = 10;

while(ofs < jeb->offset + c->sector_size) {

D1(ACCT_PARANOIA_CHECK(jeb));

cond_resched();

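ofs is the logical offset in the flash partition of some byte in the current erase block. jeb->offset + c->sector_size is the address of the first byte after the current erase block. The loop therefore walks the whole erase block.

At the start of each iteration, ofs can point at many kinds of data, so a few special cases are handled first: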

if (ofs & 3) {

printk(KERN_WARNING "Eep. ofs 0x%08x not word-aligned!\n", ofs);

ofs = (ofs+3)&~3;

continue;

}

Nodes on flash must start on 4-byte boundaries. If ofs is not aligned, it is rounded up to the next aligned address and a new iteration starts.

if (ofs == prevofs) {

printk(KERN_WARNING "ofs 0x%08x has already been seen. Skipping\n", ofs);

DIRTY_SPACE(4);

ofs += 4;

continue;

}

prevofs = ofs;

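Before the loop, prevofs is set to jeb->offset - 1, the address of the byte just before the current erase block. On the first iteration ofs therefore can never equal prevofs. After that, prevofs records the offset examined by the previous iteration. If ofs has not moved since then, 4 bytes are counted as dirty and skipped, so that the loop always makes progress.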

if (jeb->offset + c->sector_size < ofs + sizeof(*node)) {

D1(printk(KERN_DEBUG "Fewer than %d bytes left to end of block. (%x+%x<%x+%x) Not reading\n", sizeof(struct jffs2_unknown_node),

jeb->offset, c->sector_size, ofs, sizeof(*node)));

DIRTY_SPACE((jeb->offset + c->sector_size)-ofs);

break;

}

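To restate: ofs is the logical offset in the flash partition of a byte in the current erase block that is not all ones. sizeof(*node) is the size of the common node header, struct jffs2_unknown_node, which is 12 bytes (two 16-bit and two 32-bit fields).

If the rest of the erase block cannot hold even a node header, the space from ofs to the end of the block (jeb->offset + c->sector_size - ofs) will never be used, so it is counted as dirty. This also means the erase block has been fully walked, so break exits the loop.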
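The sketch below restates the header layout with fixed-width types, confirms that it is 12 bytes, and shows the bounds test. The kernel declares the fields with jint16_t/jint32_t. The plain uintN_t types used here are a simplification, and header_fits is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct unknown_node {        /* layout of struct jffs2_unknown_node */
    uint16_t magic;
    uint16_t nodetype;
    uint32_t totlen;
    uint32_t hdr_crc;
} __attribute__((packed));

_Static_assert(sizeof(struct unknown_node) == 12, "node header is 12 bytes");

/* True if a whole header fits between ofs and the end of the block. */
static bool header_fits(uint32_t blk_offset, uint32_t sector_size, uint32_t ofs)
{
    return (uint64_t)ofs + sizeof(struct unknown_node)
           <= (uint64_t)blk_offset + sector_size;
}
```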

if (buf_ofs + buf_len < ofs + sizeof(*node)) {

buf_len = min_t(uint32_t, buf_size, jeb->offset + c->sector_size - ofs);

D1(printk(KERN_DEBUG "Fewer than %d bytes (node header) left to end of buf. Reading 0x%x at 0x%08x\n",sizeof(struct jffs2_unknown_node), buf_len, ofs));

err = jffs2_fill_scan_buf(c, buf, ofs, buf_len);

if (err)

return err;

buf_ofs = ofs;

}

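When the first chunk of buf_len bytes is read from the erase block, buf_ofs is the erase block's offset in the partition. The data read so far may not end with a complete node header. In that case the code reads again starting at ofs, the beginning of that header, this time min(buf_size, bytes left in the erase block) bytes. It then sets buf_ofs = ofs, so that the node header can be accessed through node: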

node = (struct jffs2_unknown_node *)&buf[ofs-buf_ofs];
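The refill rule used just above, and again for inode and dirent nodes in the switch further down, can be isolated as a small function. In this sketch, need is the number of bytes that must be buffered at ofs (a node header here). The new window is min(buf_size, bytes left in the erase block), starting at ofs. Returning true corresponds to the point where the code calls jffs2_fill_scan_buf.

window and refill_window are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct window { uint32_t buf_ofs, buf_len; };

/* If fewer than 'need' bytes of the current window remain at ofs, move
 * the window to start at ofs and return true (the caller re-reads). */
static bool refill_window(struct window *w, uint32_t ofs, uint32_t need,
                          uint32_t buf_size, uint32_t blk_end)
{
    if ((uint64_t)w->buf_ofs + w->buf_len >= (uint64_t)ofs + need)
        return false;                   /* already buffered */
    uint32_t left = blk_end - ofs;
    w->buf_len = buf_size < left ? buf_size : left;
    w->buf_ofs = ofs;
    return true;
}
```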

Suppose a jffs2_raw_inode or jffs2_raw_dirent starts at ofs. Execution then reaches the case further below in which JFFS2_MAGIC_BITMASK is found. We skip the following code for now.

if (*(uint32_t *)(&buf[ofs-buf_ofs]) == 0xffffffff) {

uint32_t inbuf_ofs = ofs - buf_ofs + 4;

uint32_t scanend;

empty_start = ofs;

ofs += 4;

/* If scanning empty space after only a cleanmarker, don't bother scanning the whole block */

if (unlikely(empty_start == jeb->offset + c->cleanmarker_size &&

jeb->offset + EMPTY_SCAN_SIZE < buf_ofs + buf_len))

scanend = jeb->offset + EMPTY_SCAN_SIZE - buf_ofs;

else

scanend = buf_len;

D1(printk(KERN_DEBUG "Found empty flash at 0x%08x\n", ofs));

while (inbuf_ofs < scanend) {

if (*(uint32_t *)(&buf[inbuf_ofs]) != 0xffffffff)

goto emptyends;

inbuf_ofs+=4;

ofs += 4;

}

/* Ran off end. */

D1(printk(KERN_DEBUG "Empty flash ends normally at 0x%08x\n", ofs));

if (buf_ofs == jeb->offset && jeb->used_size == PAD(c->cleanmarker_size) &&

!jeb->first_node->next_in_ino && !jeb->dirty_size)

return BLK_STATE_CLEANMARKER;

wasempty = 1;

continue;

} else if (wasempty) {

emptyends:

//printk(KERN_WARNING "Empty flash at 0x%08x ends at 0x%08x\n", empty_start, ofs);

DIRTY_SPACE(ofs-empty_start);

wasempty = 0;

continue;

}

if (ofs == jeb->offset && je16_to_cpu(node->magic) == KSAMTIB_CIGAM_2SFFJ) {

printk(KERN_WARNING "Magic bitmask is backwards at offset 0x%08x. Wrong endian filesystem?\n", ofs);

DIRTY_SPACE(4);

ofs += 4;

continue;

}

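This check applies only at the very start of the erase block (ofs == jeb->offset). If the magic field of the node header (its first 16 bits) reads KSAMTIB_CIGAM_2SFFJ, which is JFFS2_MAGIC_BITMASK with its two bytes swapped, the file system was written with the opposite byte order. Normally the field equals JFFS2_MAGIC_BITMASK. These values are defined as: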

/* Values we may expect to find in the 'magic' field */

#define JFFS2_OLD_MAGIC_BITMASK 0x1984

#define JFFS2_MAGIC_BITMASK 0x1985

#define KSAMTIB_CIGAM_2SFFJ 0x8519 /* For detecting wrong-endian fs */

#define JFFS2_EMPTY_BITMASK 0xffff

#define JFFS2_DIRTY_BITMASK 0x0000
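The magic checks can be summarized in one function, applied in the same order as the code. The sketch makes these assumptions:

- The 16-bit magic has already been converted to CPU order (je16_to_cpu in the kernel).
- The wrong-endian value must be JFFS2_MAGIC_BITMASK with its two bytes swapped, so the sketch uses 0x8519 for it.

The enum and function names are hypothetical. Every result other than MAGIC_OK leads the scanner to count 4 bytes as dirty and move on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JFFS2_OLD_MAGIC_BITMASK 0x1984
#define JFFS2_MAGIC_BITMASK     0x1985
#define KSAMTIB_CIGAM_2SFFJ     0x8519 /* 0x1985 byte-swapped */
#define JFFS2_DIRTY_BITMASK     0x0000

enum magic_kind { MAGIC_OK, MAGIC_WRONG_ENDIAN, MAGIC_DIRTY, MAGIC_OLD, MAGIC_UNKNOWN };

static enum magic_kind classify_magic(uint16_t magic, bool at_block_start)
{
    if (at_block_start && magic == KSAMTIB_CIGAM_2SFFJ)
        return MAGIC_WRONG_ENDIAN;
    if (magic == JFFS2_DIRTY_BITMASK)
        return MAGIC_DIRTY;
    if (magic == JFFS2_OLD_MAGIC_BITMASK)
        return MAGIC_OLD;
    if (magic != JFFS2_MAGIC_BITMASK)
        return MAGIC_UNKNOWN;
    return MAGIC_OK;              /* go on to check the header CRC */
}
```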

if (je16_to_cpu(node->magic) == JFFS2_DIRTY_BITMASK) {

D1(printk(KERN_DEBUG "Empty bitmask at 0x%08x\n", ofs));

DIRTY_SPACE(4);

ofs += 4;

continue;

}

If the magic number in the header is 0 (JFFS2_DIRTY_BITMASK), this is not a valid header, so 4 bytes are skipped.

if (je16_to_cpu(node->magic) == JFFS2_OLD_MAGIC_BITMASK) {

//printk(KERN_WARNING "Old JFFS2 bitmask found at 0x%08x\n", ofs);

//printk(KERN_WARNING "You cannot use older JFFS2 filesystems with newer kernels\n");

DIRTY_SPACE(4);

ofs += 4;

continue;

}

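An image produced by an older JFFS2 uses the old magic number JFFS2_OLD_MAGIC_BITMASK (0x1984). Such an image cannot work with this code; the commented-out messages say "You cannot use older JFFS2 filesystems with newer kernels". The data is therefore skipped. Note that only 4 bytes are counted as dirty and skipped here, not the whole node.

All other values of the magic number have now been handled, so at this point it ought to be JFFS2_MAGIC_BITMASK. If it still is not, all the code can do is skip 4 bytes and start a new iteration.

This case is quite likely to occur. For example, when a node header fails its CRC check, the code below simply advances ofs by four bytes. The next iteration then executes "node = (struct jffs2_unknown_node *)&buf[ofs-buf_ofs];" again.

What now starts at ofs is the remainder of the node whose header had the bad CRC, so execution typically ends up in the if branch that follows. This repeats, 4 bytes at a time, until the whole node has been walked. Because node lengths on flash are padded to multiples of 4 bytes, the nodes that follow are not affected.

(This is why I think the whole node should be skipped when a header with a bad CRC is found.)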

if (je16_to_cpu(node->magic) != JFFS2_MAGIC_BITMASK) {

/* OK. We're out of possibilities. Whinge and move on */

//noisy_printk(&noise, "jffs2_scan_eraseblock(): Magic bitmask 0x%04x not found at 0x%08x: 0x%04x instead\n", JFFS2_MAGIC_BITMASK, ofs, je16_to_cpu(node->magic));

DIRTY_SPACE(4);

ofs += 4;

continue;

}

/* We seem to have a node of sorts. Check the CRC */

crcnode.magic = node->magic;

crcnode.nodetype = cpu_to_je16( je16_to_cpu(node->nodetype) | JFFS2_NODE_ACCURATE);

crcnode.totlen = node->totlen;

hdr_crc = crc32(0, &crcnode, sizeof(crcnode)-4);

if (hdr_crc != je32_to_cpu(node->hdr_crc)) {

//noisy_printk(&noise, "jffs2_scan_eraseblock(): Node at 0x%08x {0x%04x, 0x%04x, 0x%08x} has invalid CRC 0x%08x (calculated 0x%08x)\n",

// ofs, je16_to_cpu(node->magic),

// je16_to_cpu(node->nodetype),

// je32_to_cpu(node->totlen),

// je32_to_cpu(node->hdr_crc),

// hdr_crc);

DIRTY_SPACE(4);

ofs += 4;

continue;

}

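The CRC of the node header is computed and compared with the CRC stored in the header. Note that the CRC is computed with the JFFS2_NODE_ACCURATE bit forced on in nodetype. A node that was later marked obsolete by clearing that bit therefore still passes the check. Normally the two values match; otherwise 4 bytes are skipped and a new iteration starts.

I think the whole node (node->totlen bytes) should be skipped instead. Bear in mind, though, that when the header CRC fails, totlen itself is unverified. The comment in jffs2_scan_inode_node below says it trusts totlen precisely because the header CRC was OK.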
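Below is a self-contained sketch of the header check. It rests on these assumptions:

- The kernel's crc32(0, buf, len) is the reflected CRC-32 (polynomial 0xEDB88320), started from the given seed with no final inversion, as in crc32_le. The bitwise crc32_le below is written for clarity, not speed.
- The header is laid out in host byte order, whereas the kernel goes through the je16/je32 wrappers.

The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JFFS2_NODE_ACCURATE 0x2000

struct unknown_node {
    uint16_t magic, nodetype;
    uint32_t totlen, hdr_crc;
} __attribute__((packed));

/* Bitwise reflected CRC-32, seeded by the caller, no final inversion. */
static uint32_t crc32_le(uint32_t crc, const void *p, size_t len)
{
    const uint8_t *b = p;
    while (len--) {
        crc ^= *b++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return crc;
}

/* CRC over magic, nodetype (ACCURATE forced on) and totlen, i.e. the
 * 8 bytes before hdr_crc, so an obsoleted node still verifies. */
static uint32_t header_crc(const struct unknown_node *n)
{
    struct unknown_node tmp = *n;
    tmp.nodetype |= JFFS2_NODE_ACCURATE;
    return crc32_le(0, &tmp, sizeof(tmp) - 4);
}

static bool header_ok(const struct unknown_node *n)
{
    return header_crc(n) == n->hdr_crc;
}
```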

if (ofs + je32_to_cpu(node->totlen) > jeb->offset + c->sector_size) {

/* Eep. Node goes over the end of the erase block. */

printk(KERN_WARNING "Node at 0x%08x with length 0x%08x would run over the end of the erase block\n", ofs, je32_to_cpu(node->totlen));

printk(KERN_WARNING "Perhaps the file system was created with the wrong erase size?\n");

DIRTY_SPACE(4);

ofs += 4;

continue;

}

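jffs2 requires that no node cross an erase-block boundary. If one appears to, 4 bytes are skipped and a new iteration starts. The warning suggests that the file system may have been created with the wrong erase size.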

if (!(je16_to_cpu(node->nodetype) & JFFS2_NODE_ACCURATE)) {

/* Wheee. This is an obsoleted node */

D2(printk(KERN_DEBUG "Node at 0x%08x is obsolete. Skipping\n", ofs));

DIRTY_SPACE(PAD(je32_to_cpu(node->totlen)));

ofs += PAD(je32_to_cpu(node->totlen));

continue;

}

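According to linux/jffs2.h, JFFS2_NODE_ACCURATE is set in all four valid node types. If the bit is clear, the node is obsolete, and the whole node is counted as dirty and skipped. Note that totlen in the header is the total length of the node, including the data that follows the header. Nodes of a jffs2 file system on flash come in the following 4 types: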

#define JFFS2_NODETYPE_DIRENT (JFFS2_FEATURE_INCOMPAT | JFFS2_NODE_ACCURATE | 1)

#define JFFS2_NODETYPE_INODE (JFFS2_FEATURE_INCOMPAT | JFFS2_NODE_ACCURATE | 2)

#define JFFS2_NODETYPE_CLEANMARKER (JFFS2_FEATURE_RWCOMPAT_DELETE | JFFS2_NODE_ACCURATE | 3)

#define JFFS2_NODETYPE_PADDING (JFFS2_FEATURE_RWCOMPAT_DELETE | JFFS2_NODE_ACCURATE | 4)
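Working out the numeric values makes the scheme visible:

- the top two bits carry the compatibility class;
- bit 13 is JFFS2_NODE_ACCURATE;
- the low bits number the type.

The feature constants below are the values I believe linux/jffs2.h defines. Treat them as assumptions and check them against your kernel headers. obsolete_type is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values from linux/jffs2.h */
#define JFFS2_COMPAT_MASK             0xc000
#define JFFS2_NODE_ACCURATE           0x2000
#define JFFS2_FEATURE_INCOMPAT        0xc000
#define JFFS2_FEATURE_ROCOMPAT        0x8000
#define JFFS2_FEATURE_RWCOMPAT_COPY   0x4000
#define JFFS2_FEATURE_RWCOMPAT_DELETE 0x0000

#define JFFS2_NODETYPE_DIRENT      (JFFS2_FEATURE_INCOMPAT | JFFS2_NODE_ACCURATE | 1)
#define JFFS2_NODETYPE_INODE       (JFFS2_FEATURE_INCOMPAT | JFFS2_NODE_ACCURATE | 2)
#define JFFS2_NODETYPE_CLEANMARKER (JFFS2_FEATURE_RWCOMPAT_DELETE | JFFS2_NODE_ACCURATE | 3)
#define JFFS2_NODETYPE_PADDING     (JFFS2_FEATURE_RWCOMPAT_DELETE | JFFS2_NODE_ACCURATE | 4)

/* Clearing ACCURATE (a 1 -> 0 change, possible on flash without an
 * erase) turns a valid node type into an obsolete one. */
static uint16_t obsolete_type(uint16_t nodetype)
{
    return (uint16_t)(nodetype & ~JFFS2_NODE_ACCURATE);
}
```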

Next, the node's type is determined from the nodetype field of its header. Note that the header starts at offset ofs in the flash partition:

switch(je16_to_cpu(node->nodetype)) {

case JFFS2_NODETYPE_INODE:

if (buf_ofs + buf_len < ofs + sizeof(struct jffs2_raw_inode)) {

buf_len = min_t(uint32_t, buf_size, jeb->offset + c->sector_size - ofs);

D1(printk(KERN_DEBUG "Fewer than %d bytes (inode node) left to end of buf. Reading 0x%x at 0x%08x\n", sizeof(struct jffs2_raw_inode), buf_len, ofs));

err = jffs2_fill_scan_buf(c, buf, ofs, buf_len);

if (err)

return err;

buf_ofs = ofs;

node = (void *)buf;

}

err = jffs2_scan_inode_node(c, jeb, (void *)node, ofs);

if (err) return err;

ofs += PAD(je32_to_cpu(node->totlen));

break; // leave the switch; the outer while loop continues with the next node

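First, if the node is a jffs2_raw_inode, jffs2_scan_inode_node creates the corresponding kernel descriptor jffs2_raw_node_ref. If this is the first node found for its file, the file's kernel descriptor jffs2_inode_cache does not exist yet. It is then created and added to the file system's hash table, and the node descriptor and the file descriptor are linked together. See the analysis below.

The region read earlier (buf_len bytes at offset buf_ofs in the flash partition) may end with only part of a node. In that case the following data must be read. The refill reads min(buf_size, jeb->offset + c->sector_size - ofs) bytes starting at ofs: a full buffer or whatever remains of the erase block, whichever is smaller.

For an inode node, only the fixed-size struct jffs2_raw_inode needs to be in the buffer at this point, because jffs2_scan_inode_node looks only at that header during the scan.

Once the jffs2_raw_inode has been scanned, ofs is advanced past the node, and break leaves the switch to start the next iteration. Nodes on flash must start on 4-byte boundaries. Before ofs is advanced, the PAD macro therefore rounds the node length up to a multiple of 4, which keeps the next node aligned.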

case JFFS2_NODETYPE_DIRENT:

if (buf_ofs + buf_len < ofs + je32_to_cpu(node->totlen)) {

buf_len = min_t(uint32_t, buf_size, jeb->offset + c->sector_size - ofs);

D1(printk(KERN_DEBUG "Fewer than %d bytes (dirent node) left to end of buf. Reading 0x%x at 0x%08x\n", je32_to_cpu(node->totlen), buf_len, ofs));

err = jffs2_fill_scan_buf(c, buf, ofs, buf_len);

if (err)

return err;

buf_ofs = ofs;

node = (void *)buf;

}

err = jffs2_scan_dirent_node(c, jeb, (void *)node, ofs);

if (err) return err;

ofs += PAD(je32_to_cpu(node->totlen));

break; // leave the switch; the outer while loop continues with the next node

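If the node is a jffs2_raw_dirent, i.e. a directory entry of a directory, it is analysed by jffs2_scan_dirent_node (see below). Here the whole node (node->totlen bytes) must be in the buffer, not just its fixed header. Otherwise the same remarks apply as above.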

case JFFS2_NODETYPE_CLEANMARKER:

D1(printk(KERN_DEBUG "CLEANMARKER node found at 0x%08x\n", ofs));

if (je32_to_cpu(node->totlen) != c->cleanmarker_size) {

printk(KERN_NOTICE "CLEANMARKER node found at 0x%08x has totlen 0x%x != normal 0x%x\n", ofs, je32_to_cpu(node->totlen), c->cleanmarker_size);

DIRTY_SPACE(PAD(sizeof(struct jffs2_unknown_node)));

ofs += PAD(sizeof(struct jffs2_unknown_node));

} else if (jeb->first_node) {

printk(KERN_NOTICE "CLEANMARKER node found at 0x%08x, not first node in block (0x%08x)\n", ofs, jeb->offset);

DIRTY_SPACE(PAD(sizeof(struct jffs2_unknown_node)));

ofs += PAD(sizeof(struct jffs2_unknown_node));

} else {

struct jffs2_raw_node_ref *marker_ref = jffs2_alloc_raw_node_ref();

if (!marker_ref) {

printk(KERN_NOTICE "Failed to allocate node ref for clean marker\n");

return -ENOMEM;

}

marker_ref->next_in_ino = NULL;

marker_ref->next_phys = NULL;

marker_ref->flash_offset = ofs | REF_NORMAL;

marker_ref->totlen = c->cleanmarker_size;

jeb->first_node = jeb->last_node = marker_ref;

USED_SPACE(PAD(c->cleanmarker_size));

ofs += PAD(c->cleanmarker_size);

}

break;

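If the node is a CLEANMARKER (a bare jffs2_unknown_node), its length is checked first. If the length is wrong, the header is counted as dirty and skipped.

A CLEANMARKER is also the first node written to an erase block after a successful erase. When one is found, the block descriptor's first_node pointer should therefore still be NULL; if it is not, the CLEANMARKER is likewise skipped.

If all is well, a kernel descriptor is allocated for the CLEANMARKER. It becomes the first and only element of the block's physical list: first_node and last_node both point to it.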

case JFFS2_NODETYPE_PADDING:

DIRTY_SPACE(PAD(je32_to_cpu(node->totlen)));

ofs += PAD(je32_to_cpu(node->totlen));

break;

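A padding node is simply skipped, and its space is counted as dirty.

Normally only the four node types above appear on flash. Any other node type is handled according to the compatibility bits in its nodetype (see the paper in which the author of jffs2 describes it):

default: // any other node type: decided by its compatibility bits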

switch (je16_to_cpu(node->nodetype) & JFFS2_COMPAT_MASK) {

case JFFS2_FEATURE_ROCOMPAT:

printk(KERN_NOTICE "Read-only compatible feature node (0x%04x) found at offset 0x%08x\n", je16_to_cpu(node->nodetype), ofs);

c->flags |= JFFS2_SB_FLAG_RO;

if (!(jffs2_is_readonly(c)))

return -EROFS;

DIRTY_SPACE(PAD(je32_to_cpu(node->totlen)));

ofs += PAD(je32_to_cpu(node->totlen));

break;

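If such a node is found, the whole jffs2 file system may only be mounted read-only. The JFFS2_SB_FLAG_RO flag is therefore set in the flags field of jffs2_sb_info, the u field of the file system's superblock.

The mount mode is also checked. If the file system is not being mounted read-only, -EROFS is returned. The error propagates back up the call chain and makes the mount fail.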

case JFFS2_FEATURE_INCOMPAT:

printk(KERN_NOTICE "Incompatible feature node (0x%04x) found at offset 0x%08x\n",

je16_to_cpu(node->nodetype), ofs);

return -EINVAL;

If a node of this type is found, the mount is refused outright with -EINVAL.

case JFFS2_FEATURE_RWCOMPAT_DELETE:

D1(printk(KERN_NOTICE "Unknown but compatible feature node (0x%04x) found at offset 0x%08x\n", je16_to_cpu(node->nodetype), ofs));

DIRTY_SPACE(PAD(je32_to_cpu(node->totlen)));

ofs += PAD(je32_to_cpu(node->totlen));

break;

case JFFS2_FEATURE_RWCOMPAT_COPY:

D1(printk(KERN_NOTICE "Unknown but compatible feature node (0x%04x) found at offset 0x%08x\n", je16_to_cpu(node->nodetype), ofs));

USED_SPACE(PAD(je32_to_cpu(node->totlen)));

ofs += PAD(je32_to_cpu(node->totlen));

break;

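Nodes of these two types are simply skipped during the scan, but their space is accounted differently:

- An RWCOMPAT_DELETE node is counted as dirty, because it may be deleted when garbage-collected.
- An RWCOMPAT_COPY node is counted as used, because it is copied when garbage-collected.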
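For node types the scanner does not recognise, the default branch looks only at the two compatibility bits. The sketch below is that decision table. The enum and function names are hypothetical, and the constants are assumed to match linux/jffs2.h.

```c
#include <assert.h>
#include <stdint.h>

#define JFFS2_COMPAT_MASK             0xc000
#define JFFS2_FEATURE_INCOMPAT        0xc000
#define JFFS2_FEATURE_ROCOMPAT        0x8000
#define JFFS2_FEATURE_RWCOMPAT_COPY   0x4000
#define JFFS2_FEATURE_RWCOMPAT_DELETE 0x0000

enum unknown_action {
    FAIL_MOUNT,      /* INCOMPAT: -EINVAL */
    READ_ONLY,       /* ROCOMPAT: set RO flag; -EROFS unless mounted ro */
    COUNT_AS_DIRTY,  /* RWCOMPAT_DELETE: skip, space is dirty */
    COUNT_AS_USED    /* RWCOMPAT_COPY: skip, space is used */
};

static enum unknown_action unknown_node_action(uint16_t nodetype)
{
    switch (nodetype & JFFS2_COMPAT_MASK) {
    case JFFS2_FEATURE_INCOMPAT:      return FAIL_MOUNT;
    case JFFS2_FEATURE_ROCOMPAT:      return READ_ONLY;
    case JFFS2_FEATURE_RWCOMPAT_COPY: return COUNT_AS_USED;
    default:                          return COUNT_AS_DIRTY; /* RWCOMPAT_DELETE == 0 */
    }
}
```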

}// end of switch on the compatibility bits

}// end of switch(nodetype)

}// end of while loop over the erase block

D1(printk(KERN_DEBUG "Block at 0x%08x: free 0x%08x, dirty 0x%08x, used 0x%08x\n", jeb->offset,

jeb->free_size, jeb->dirty_size, jeb->used_size));

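The whole erase block has now been walked. All that remains is to return a value describing the block's state, based on the statistics in its descriptor: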

/* mark_node_obsolete can add to wasted !! */

if (jeb->wasted_size) {

jeb->dirty_size += jeb->wasted_size;

c->dirty_size += jeb->wasted_size;

c->wasted_size -= jeb->wasted_size;

jeb->wasted_size = 0;

}

If the current erase block's wasted_size is not 0, it is folded into dirty_size, and the file-system-wide counters are updated to match. Why is this done?

if ((jeb->used_size + jeb->unchecked_size) == PAD(c->cleanmarker_size) &&

!jeb->first_node->next_in_ino && !jeb->dirty_size)

return BLK_STATE_CLEANMARKER;

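BLK_STATE_CLEANMARKER is returned when all three conditions hold:

1. The space taken by valid nodes plus unchecked space equals one padded CLEANMARKER, i.e. the block contains only a CLEANMARKER.
2. The CLEANMARKER's descriptor has a NULL next_in_ino pointer. This follows from the first condition, because a CLEANMARKER belongs to no file.
3. There are no obsolete nodes, i.e. dirty_size is 0. This is checked once more here.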

/* move blocks with max 4 byte dirty space to cleanlist */

else if (!ISDIRTY(c->sector_size - (jeb->used_size + jeb->unchecked_size))) {

c->dirty_size -= jeb->dirty_size;

c->wasted_size += jeb->dirty_size;

jeb->wasted_size += jeb->dirty_size;

jeb->dirty_size = 0;

return BLK_STATE_CLEAN;

}

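As the analysis of jffs2_scan_inode_node and jffs2_scan_dirent_node below shows, after scanning a valid jffs2_raw_inode they update the statistics with UNCHECKED_SPACE, and after a valid jffs2_raw_dirent with USED_SPACE. Once the whole erase block has been walked, used_size + unchecked_size in the block descriptor is therefore the total size of all valid and unchecked nodes in it.

Subtracting that total from the erase block size sector_size gives the remaining free, dirty and wasted space together. The ISDIRTY macro tests whether such an area of size bytes is large enough to count as dirty:

/* check if dirty space is more than 255 Byte */

#define ISDIRTY(size) ((size) > sizeof (struct jffs2_raw_inode) + JFFS2_MIN_DATA_LEN)

Consider c->sector_size - (jeb->used_size + jeb->unchecked_size). If it does not exceed sizeof(struct jffs2_raw_inode) + JFFS2_MIN_DATA_LEN, a threshold the kernel comment calls 255 bytes, the erase block is still considered clean. Its remaining dirty space is reclassified as wasted, and BLK_STATE_CLEAN is returned.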

else if (jeb->used_size || jeb->unchecked_size)

return BLK_STATE_PARTDIRTY;

else

return BLK_STATE_ALLDIRTY;

}

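Otherwise the erase block counts as dirty. used_size is the total length of valid nodes.

- If used_size or unchecked_size is nonzero, the block contains at least one valid (or not yet checked) node, and the "partly dirty" value BLK_STATE_PARTDIRTY is returned.
- Otherwise there is not a single valid node, and the "all dirty" value BLK_STATE_ALLDIRTY is returned.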
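The end-of-scan classification can be condensed into one function. This sketch makes explicit assumptions:

- sizeof(struct jffs2_raw_inode) is taken as 68 bytes and JFFS2_MIN_DATA_LEN as 128. That would put the ISDIRTY threshold at 196 bytes, not the 255 the kernel comment mentions.
- The cleanmarker test is simplified to a flag.

The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAW_INODE_SIZE 68u   /* assumed sizeof(struct jffs2_raw_inode) */
#define MIN_DATA_LEN   128u  /* assumed JFFS2_MIN_DATA_LEN */
#define ISDIRTY(size)  ((size) > RAW_INODE_SIZE + MIN_DATA_LEN)
#define PAD(x)         (((x) + 3) & ~3u)

enum blk_state { BLK_CLEANMARKER, BLK_CLEAN, BLK_PARTDIRTY, BLK_ALLDIRTY };

struct blk_stats {
    uint32_t used_size, unchecked_size, dirty_size;
    bool only_cleanmarker_ref;  /* stands for !first_node->next_in_ino */
};

static enum blk_state block_state(const struct blk_stats *s,
                                  uint32_t sector_size, uint32_t cleanmarker_size)
{
    uint32_t live = s->used_size + s->unchecked_size;

    if (live == PAD(cleanmarker_size) && s->only_cleanmarker_ref && !s->dirty_size)
        return BLK_CLEANMARKER;
    if (!ISDIRTY(sector_size - live))
        return BLK_CLEAN;       /* leftover dirt becomes "wasted" */
    if (s->used_size || s->unchecked_size)
        return BLK_PARTDIRTY;
    return BLK_ALLDIRTY;
}
```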

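The jffs2_scan_inode_node function

This function creates a kernel descriptor jffs2_raw_node_ref for a node. If the kernel descriptor jffs2_inode_cache of the node's file does not exist yet, the function creates it, links the two, and records the jffs2_inode_cache in the hash table.

The parameters are:

- ri: the start address of the jffs2_raw_inode node, already read into a kernel buffer;
- ofs: that node's offset in the flash partition.

As the function shows, the flash_offset field of the node's descriptor jffs2_raw_node_ref is set from ofs.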

static int jffs2_scan_inode_node(struct jffs2_sb_info *c, struct jffs2_eraseblock *jeb,

struct jffs2_raw_inode *ri, uint32_t ofs)

{

struct jffs2_raw_node_ref *raw;

struct jffs2_inode_cache *ic;

uint32_t ino = je32_to_cpu(ri->ino);

First, the inode number of the node's file is taken from the ino field of the jffs2_raw_inode.

D1(printk(KERN_DEBUG "jffs2_scan_inode_node(): Node at 0x%08x\n", ofs));

/* We do very little here now. Just check the ino# to which we should attribute

this node; we can do all the CRC checking etc. later. There's a tradeoff here --

we used to scan the flash once only, reading everything we want from it into

memory, then building all our in-core data structures and freeing the extra

information. Now we allow the first part of the mount to complete a lot quicker,

but we have to go _back_ to the flash in order to finish the CRC checking, etc.

Which means that the _full_ amount of time to get to proper write mode with GC

operational may actually be _longer_ than before. Sucks to be me. */

raw = jffs2_alloc_raw_node_ref();

if (!raw) {

printk(KERN_NOTICE "jffs2_scan_inode_node(): allocation of node reference failed\n");

return -ENOMEM;

}

Next, a jffs2_raw_node_ref is allocated from raw_node_ref_slab via kmem_cache_alloc. Recall that when the jffs2 file system was registered, a slab cache was created for each of these kernel descriptor structures.

ic = jffs2_get_ino_cache(c, ino);

if (!ic) {

/* Inocache get failed. Either we read a bogus ino# or it's just genuinely the

first node we found for this inode. Do a CRC check to protect against the former case */

uint32_t crc = crc32(0, ri, sizeof(*ri)-8);

if (crc != je32_to_cpu(ri->node_crc)) {

printk(KERN_NOTICE "jffs2_scan_inode_node(): CRC failed on node at 0x%08x: Read 0x%08x, calculated 0x%08x\n", ofs, je32_to_cpu(ri->node_crc), crc);

/* We believe totlen because the CRC on the node _header_ was OK, just the node itself failed. */

DIRTY_SPACE(PAD(je32_to_cpu(ri->totlen)));

jffs2_free_raw_node_ref(raw); /* don't leak the descriptor allocated above */

return 0;

}

ic = jffs2_scan_make_ino_cache(c, ino);

if (!ic) {

jffs2_free_raw_node_ref(raw);

return -ENOMEM;

}

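The inode number of the node's file was obtained at the start of the function. Here jffs2_get_ino_cache looks up that file's jffs2_inode_cache in the hash table pointed to by jffs2_sb_info.inocache_list.

If the lookup returns NULL, this is the first node found for the file. jffs2_scan_make_ino_cache then allocates a jffs2_inode_cache for the file and adds it to the hash table (see below).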
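The lookup-or-create pattern can be sketched with a hypothetical chained hash table keyed by ino modulo a small table size. The CRC check that guards against a bogus ino is left out here. The table size and all names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HASHSIZE 8   /* hypothetical; the kernel uses INOCACHE_HASHSIZE */

struct ino_cache {
    struct ino_cache *next;   /* collision chain */
    uint32_t ino;
    int nlink;
};

static struct ino_cache *get_ino_cache(struct ino_cache *tab[HASHSIZE], uint32_t ino)
{
    for (struct ino_cache *ic = tab[ino % HASHSIZE]; ic; ic = ic->next)
        if (ic->ino == ino)
            return ic;
    return NULL;
}

/* Look up ino; create and insert an entry if this is its first node. */
static struct ino_cache *get_or_make_ino_cache(struct ino_cache *tab[HASHSIZE], uint32_t ino)
{
    struct ino_cache *ic = get_ino_cache(tab, ino);
    if (ic)
        return ic;
    ic = calloc(1, sizeof *ic);
    if (!ic)
        return NULL;              /* the kernel code returns -ENOMEM */
    ic->ino = ino;
    ic->next = tab[ino % HASHSIZE];
    tab[ino % HASHSIZE] = ic;
    return ic;
}
```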

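A missing jffs2_inode_cache can also mean that the ino just read is garbage. So before a new jffs2_inode_cache is created, the node's CRC is verified to make sure the inode number is valid. The last two fields of struct jffs2_raw_inode are the 4-byte CRC values data_crc and node_crc. The first is the CRC of the compressed data that follows the node, and the second is the CRC of the jffs2_raw_inode structure itself. The node CRC is computed over the structure excluding these two fields (8 bytes) and then compared with the node_crc read from flash.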
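To make the "everything except the last 8 bytes" rule concrete, here is a small user-space sketch. It makes two assumptions. First, it assumes JFFS2's crc32(0, buf, len) is the reflected CRC-32 (polynomial 0xEDB88320) with no initial or final inversion. Second, it uses a hypothetical, padding-free demo_raw_node in place of the real packed jffs2_raw_inode.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32, polynomial 0xEDB88320, no pre- or post-inversion.
 * Assumed (not verified here) to match JFFS2's crc32(0, ...). */
static uint32_t jffs2_style_crc32(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return crc;
}

/* Hypothetical simplified node: all fields are 32 bits wide, so there is
 * no padding and data_crc/node_crc are exactly the last 8 bytes. */
struct demo_raw_node {
    uint32_t ino;
    uint32_t version;
    uint32_t dsize;
    uint32_t data_crc;
    uint32_t node_crc;
};

/* Same rule as jffs2_scan_inode_node: the node CRC covers everything
 * except the trailing data_crc and node_crc fields. */
static int demo_node_crc_ok(const struct demo_raw_node *n)
{
    return jffs2_style_crc32(0, n, sizeof(*n) - 8) == n->node_crc;
}
```

Changing data_crc leaves the check intact, while changing any covered field (such as ino) breaks it. This is why a good node CRC is enough to trust the ino field.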

If the CRC check fails, the DIRTY_SPACE macro increases dirty_size and decreases free_size in both jffs2_sb_info and jffs2_eraseblock, and the function returns 0 immediately.

/* Wheee. It worked */

raw->flash_offset = ofs | REF_UNCHECKED;

raw->totlen = PAD(je32_to_cpu(ri->totlen));

raw->next_phys = NULL;

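If everything is fine, the jffs2_raw_node_ref is initialized. flash_offset is set to the node's offset within the flash partition, which is exactly the ofs argument of this function. Nodes on flash are always 4-byte aligned, so the lowest two bits of flash_offset are always 0. These two bits can therefore record the state of the node. Their possible values are: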

#define REF_UNCHECKED 0 /* We haven't yet checked the CRC or built its inode */

#define REF_OBSOLETE 1 /* Obsolete, can be completely ignored */

#define REF_PRISTINE 2 /* Completely clean. GC without looking */

#define REF_NORMAL 3 /* Possibly overlapped. Read the page and write again on GC */

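PAD is defined as #define PAD(x) (((x)+3)&~3), so node lengths on flash are also rounded up to a multiple of 4 bytes. next_phys links the descriptors of all nodes in an erase block into a list. It is initialized to NULL here, and the node is appended to that list below.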
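The following sketch shows how the two low bits of flash_offset can carry the node state while PAD rounds lengths up to 4 bytes. ref_pack, ref_offset_of and ref_flags_of are hypothetical names. They imitate what the ref_offset() macro used later in this section does on a jffs2_raw_node_ref, but they work on a plain uint32_t.

```c
#include <assert.h>
#include <stdint.h>

#define REF_UNCHECKED 0 /* CRC not checked yet */
#define REF_OBSOLETE  1 /* obsolete, can be ignored */
#define REF_PRISTINE  2 /* clean, GC can copy without looking */
#define REF_NORMAL    3 /* possibly overlapped */

#define PAD(x) (((x) + 3) & ~3)

/* Node offsets are 4-byte aligned, so the two low bits are free for flags. */
static inline uint32_t ref_pack(uint32_t ofs, uint32_t flags)
{
    assert((ofs & 3) == 0 && flags <= 3);
    return ofs | flags;
}

static inline uint32_t ref_offset_of(uint32_t packed) { return packed & ~3u; }
static inline uint32_t ref_flags_of(uint32_t packed)  { return packed & 3u; }
```

Marking a node obsolete then only rewrites the low bits: `ref_offset_of(p) | REF_OBSOLETE`.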

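Note in particular that, as the author's comment explains, the validity of inode nodes on flash is not checked at mount time. The CRC check is postponed until the file is opened and its jffs2_full_dnode or jffs2_full_dirent structures are built (see the relevant part of jffs2_get_inode_nodes in the later chapter on reading the inode when a file is opened). That is why the kernel descriptor is marked REF_UNCHECKED. This speeds up mounting, but the work is only deferred, not avoided: the CRC still has to be checked when the file is opened. Since not every file will necessarily be opened later, deferring the check still pays off.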

Also note that, as the analysis of jffs2_scan_dirent_node below shows, directory entry nodes are CRC-checked immediately. Their kernel descriptors are therefore marked REF_PRISTINE (see the relevant part of jffs2_scan_dirent_node).

Next, the node's kernel descriptor jffs2_raw_node_ref is linked to the owning file's descriptor jffs2_inode_cache:

raw->next_in_ino = ic->nodes;

ic->nodes = raw;

That is, the jffs2_raw_node_ref is inserted at the head of the jffs2_inode_cache.nodes list.

if (!jeb->first_node)

jeb->first_node = raw;

if (jeb->last_node)

jeb->last_node->next_phys = raw;

jeb->last_node = raw;

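The kernel descriptors of all nodes in an erase block are linked through their next_phys fields in the order the scan finds them. The first and last elements of this list are pointed to by the first_node and last_node fields of struct jffs2_eraseblock.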
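The per-eraseblock chain is a plain singly linked list with a tail pointer. This user-space sketch uses trimmed-down, hypothetical versions of the structures and reproduces the first_node/last_node/next_phys updates from the code above:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct raw_ref {
    struct raw_ref *next_in_ino; /* per-file chain (not used here) */
    struct raw_ref *next_phys;   /* per-eraseblock chain, scan order */
    uint32_t flash_offset;
};

struct eraseblock {
    struct raw_ref *first_node;
    struct raw_ref *last_node;
};

/* Append a descriptor at the tail, as jffs2_scan_inode_node and
 * jffs2_scan_dirent_node do for every node they accept. */
static void eb_append(struct eraseblock *jeb, struct raw_ref *raw)
{
    raw->next_phys = NULL;
    if (!jeb->first_node)
        jeb->first_node = raw;
    if (jeb->last_node)
        jeb->last_node->next_phys = raw;
    jeb->last_node = raw;
}
```

Keeping last_node makes each append O(1), which matters because the scan appends every node of the block.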

D1(printk(KERN_DEBUG "Node is ino #%u, version %d. Range 0x%x-0x%x\n",

je32_to_cpu(ri->ino), je32_to_cpu(ri->version), je32_to_cpu(ri->offset),

je32_to_cpu(ri->offset)+je32_to_cpu(ri->dsize)));

pseudo_random += je32_to_cpu(ri->version);

UNCHECKED_SPACE(PAD(je32_to_cpu(ri->totlen)));

return 0;

}

Finally, the UNCHECKED_SPACE macro decreases free_size and increases unchecked_size in both jffs2_sb_info and jffs2_eraseblock:

#define UNCHECKED_SPACE(x) do { typeof(x) _x = (x); \
c->free_size -= _x; c->unchecked_size += _x; \
jeb->free_size -= _x ; jeb->unchecked_size += _x; \
}while(0)
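UNCHECKED_SPACE, USED_SPACE and DIRTY_SPACE all move bytes out of free_size into one bucket. They do this in both the filesystem-wide and the per-eraseblock counters, so the sum of the buckets never changes. The sketch below models that bookkeeping; struct space and account() are illustrative names, not kernel code:

```c
#include <stdint.h>

/* Hypothetical counters mirroring the *_size fields found both in
 * jffs2_sb_info (whole filesystem) and jffs2_eraseblock (one block). */
struct space {
    uint32_t free_size, used_size, dirty_size, unchecked_size;
};

enum space_kind { SPACE_USED, SPACE_DIRTY, SPACE_UNCHECKED };

/* Move len bytes from free_size into the chosen bucket of both counter
 * sets, like the USED_SPACE / DIRTY_SPACE / UNCHECKED_SPACE macros. */
static void account(struct space *c, struct space *jeb,
                    enum space_kind kind, uint32_t len)
{
    struct space *both[2] = { c, jeb };
    for (int i = 0; i < 2; i++) {
        struct space *s = both[i];
        s->free_size -= len;
        switch (kind) {
        case SPACE_USED:      s->used_size += len;      break;
        case SPACE_DIRTY:     s->dirty_size += len;     break;
        case SPACE_UNCHECKED: s->unchecked_size += len; break;
        }
    }
}

/* Invariant: the four buckets always add up to the original size. */
static uint32_t total(const struct space *s)
{
    return s->free_size + s->used_size + s->dirty_size + s->unchecked_size;
}
```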

When jffs2_scan_inode_node finishes, the node and file descriptors relate as shown in Figure 2 (this is part of Figure 2).

The jffs2_scan_make_ino_cache function

This function allocates a new jffs2_inode_cache for the file with inode number ino and adds it to the filesystem hash table:

static struct jffs2_inode_cache *jffs2_scan_make_ino_cache(struct jffs2_sb_info *c, uint32_t ino)

{

struct jffs2_inode_cache *ic;

ic = jffs2_get_ino_cache(c, ino);

if (ic)

return ic;

ic = jffs2_alloc_inode_cache();

if (!ic) {

printk(KERN_NOTICE "jffs2_scan_make_inode_cache(): allocation of inode cache failed\n");

return NULL;

}

memset(ic, 0, sizeof(*ic));

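If the jffs2_inode_cache for ino is already in the filesystem hash table (pointed to by jffs2_sb_info.inocache_list), its address is returned directly. Otherwise, jffs2_alloc_inode_cache allocates one from the slab cache, and it is initialized: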

ic->ino = ino;

ic->nodes = (void *)ic;

jffs2_add_ino_cache(c, ic);

if (ino == 1)

ic->nlink=1;

return ic;

}

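jffs2_inode_cache.ino is set to ino, and nodes is pointed back at the jffs2_inode_cache itself, so an empty node chain ends at its owner. jffs2_add_ino_cache then adds the structure to the filesystem hash table. Finally, for the root directory (ino 1) the link count nlink is set to 1; jffs2_read_inode later raises the root inode's i_nlink to at least 3.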
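Because an empty nodes chain points at the jffs2_inode_cache itself, any jffs2_raw_node_ref can find its file by following next_in_ino until the chain ends. The sketch below assumes, as in the kernel, that the end is recognized because the first field of jffs2_inode_cache (scan_dents) is NULL once mounting has finished. It models that shared first field with a hypothetical struct link, which keeps the walk well-defined C; all names here are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Every object on the next_in_ino chain starts with a link. */
struct link { struct link *next_in_ino; };

struct inode_cache {
    struct link end;     /* plays the role of scan_dents: NULL ends the chain */
    struct link *nodes;  /* newest raw ref, or &end when the file has none */
    uint32_t ino;
};

struct raw_ref {
    struct link in_ino;  /* next_in_ino */
    uint32_t flash_offset;
};

static void ic_init(struct inode_cache *ic, uint32_t ino)
{
    ic->end.next_in_ino = NULL;
    ic->nodes = &ic->end;   /* same address as "ic->nodes = (void *)ic;" */
    ic->ino = ino;
}

/* Prepend, as the scan code does. */
static void ic_add(struct inode_cache *ic, struct raw_ref *raw)
{
    raw->in_ino.next_in_ino = ic->nodes;
    ic->nodes = &raw->in_ino;
}

/* Walk to the link whose successor is NULL; that link is the first member
 * of the owning inode cache, so step back to the enclosing structure. */
static struct inode_cache *owner_of(struct raw_ref *raw)
{
    struct link *l = &raw->in_ino;
    while (l->next_in_ino)
        l = l->next_in_ino;
    return (struct inode_cache *)((char *)l - offsetof(struct inode_cache, end));
}
```

This is presumably also why scan_dents must be reset to NULL at the end of mounting: while it still points at a dirent list, the chain would not terminate at the inode cache.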

The jffs2_scan_dirent_node function

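At mount time, this function creates a kernel descriptor jffs2_raw_node_ref and a temporary jffs2_full_dirent for each jffs2_raw_dirent directory entry node that has been read. If the jffs2_inode_cache of the directory containing the entry does not exist yet, it is created here, and the three structures are linked together.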

static int jffs2_scan_dirent_node(struct jffs2_sb_info *c, struct jffs2_eraseblock *jeb,

struct jffs2_raw_dirent *rd, uint32_t ofs)

{

struct jffs2_raw_node_ref *raw;

struct jffs2_full_dirent *fd;

struct jffs2_inode_cache *ic;

uint32_t crc;

D1(printk(KERN_DEBUG "jffs2_scan_dirent_node(): Node at 0x%08x\n", ofs));

/* We don't get here unless the node is still valid, so we don't have to mask in the ACCURATE bit any more. */

crc = crc32(0, rd, sizeof(*rd)-8);

if (crc != je32_to_cpu(rd->node_crc)) {

printk(KERN_NOTICE "jffs2_scan_dirent_node(): Node CRC failed on node at 0x%08x: Read 0x%08x, calculated 0x%08x\n", ofs, je32_to_cpu(rd->node_crc), crc);

/* We believe totlen because the CRC on the node _header_ was OK, just the node itself failed. */

DIRTY_SPACE(PAD(je32_to_cpu(rd->totlen)));

return 0;

}

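Unlike a jffs2_raw_inode, a jffs2_raw_dirent gets an additional kernel structure, jffs2_full_dirent. The function first checks the CRC. The CRC of jffs2_raw_dirent is computed over the structure excluding its last two fields, node_crc and name_crc (8 bytes), and compared with node_crc. On a mismatch, the whole node (the jffs2_raw_dirent plus the file name that follows it) is treated as dirty. dirty_size is increased and free_size decreased in jffs2_sb_info and jffs2_eraseblock, and the function returns.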

pseudo_random += je32_to_cpu(rd->version);

fd = jffs2_alloc_full_dirent(rd->nsize+1);

if (!fd) {

return -ENOMEM;

}

memcpy(&fd->name, rd->name, rd->nsize);

fd->name[rd->nsize] = 0;

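jffs2_alloc_full_dirent allocates a jffs2_full_dirent large enough to hold the name. The file name following the jffs2_raw_dirent is nsize bytes long, and one extra byte is allocated for jffs2_full_dirent.name to hold the terminating NUL. The name is then copied into jffs2_full_dirent.name.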
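The sketch below shows the nsize + 1 allocation. It assumes a hypothetical cut-down jffs2_full_dirent whose name is a flexible array member, and it uses malloc in place of the kernel allocator:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduced jffs2_full_dirent: struct and name in one block. */
struct full_dirent {
    uint32_t ino;
    unsigned char type;
    char name[];
};

/* The name on flash is nsize bytes and not NUL-terminated, so reserve one
 * extra byte and terminate it after copying. */
static struct full_dirent *make_dirent(const unsigned char *raw_name, uint8_t nsize)
{
    struct full_dirent *fd = malloc(sizeof(*fd) + nsize + 1);
    if (!fd)
        return NULL;
    memcpy(fd->name, raw_name, nsize);
    fd->name[nsize] = '\0';
    return fd;
}
```

Only nsize bytes are copied, so whatever follows the name on flash (here the stray 'X' in the test) never reaches the string.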

crc = crc32(0, fd->name, rd->nsize);

if (crc != je32_to_cpu(rd->name_crc)) {

printk(KERN_NOTICE "jffs2_scan_dirent_node(): Name CRC failed on node at 0x%08x: Read 0x%08x, calculated 0x%08x\n",ofs, je32_to_cpu(rd->name_crc), crc);

D1(printk(KERN_NOTICE "Name for which CRC failed is (now) '%s', ino #%d\n",

fd->name, je32_to_cpu(rd->ino)));

jffs2_free_full_dirent(fd);

/* FIXME: Why do we believe totlen? */

/* We believe totlen because the CRC on the node _header_ was OK, just the name failed. */

DIRTY_SPACE(PAD(je32_to_cpu(rd->totlen)));

return 0;

}

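Next, the CRC of the file name is verified. On failure, dirty_size is increased and free_size decreased in jffs2_sb_info and jffs2_eraseblock, and the function returns. After that, a kernel descriptor jffs2_raw_node_ref is created for the jffs2_raw_dirent. If the jffs2_inode_cache of its parent directory (rd->pino) does not exist yet, it is created, and the two are linked. This works exactly as in jffs2_scan_inode_node and is not repeated here.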

raw = jffs2_alloc_raw_node_ref();

if (!raw) {

jffs2_free_full_dirent(fd);

printk(KERN_NOTICE "jffs2_scan_dirent_node(): allocation of node reference failed\n");

return -ENOMEM;

}

ic = jffs2_scan_make_ino_cache(c, je32_to_cpu(rd->pino));

if (!ic) {

jffs2_free_full_dirent(fd);

jffs2_free_raw_node_ref(raw);

return -ENOMEM;

}

raw->totlen = PAD(je32_to_cpu(rd->totlen));

raw->flash_offset = ofs | REF_PRISTINE;

raw->next_phys = NULL;

raw->next_in_ino = ic->nodes;

ic->nodes = raw;

if (!jeb->first_node)

jeb->first_node = raw;

if (jeb->last_node)

jeb->last_node->next_phys = raw;

jeb->last_node = raw;

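Because the directory entry node has already been CRC-checked, its kernel descriptor is marked REF_PRISTINE. By contrast, jffs2_scan_inode_node does not CRC-check the jffs2_raw_inode node immediately; the check is deferred until the file is opened. That is why it sets REF_UNCHECKED (see the relevant part of jffs2_scan_inode_node).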

For a directory entry node, the jffs2_full_dirent must also be linked to the jffs2_raw_node_ref and the jffs2_inode_cache:

fd->raw = raw;

fd->next = NULL;

fd->version = je32_to_cpu(rd->version);

fd->ino = je32_to_cpu(rd->ino);

fd->nhash = full_name_hash(fd->name, rd->nsize);

fd->type = rd->type;

USED_SPACE(PAD(je32_to_cpu(rd->totlen)));

jffs2_add_fd_to_list(c, fd, &ic->scan_dents);

return 0;

}

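The nhash field of jffs2_full_dirent is a value computed from the file name. jffs2_add_fd_to_list then inserts the entry into the list pointed to by jffs2_inode_cache.scan_dents, keeping the list sorted by nhash in ascending order. See below.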

Before returning, the function uses the USED_SPACE macro to increase used_size and decrease free_size in jffs2_sb_info and in the jffs2_eraseblock of the current erase block:

#define USED_SPACE(x) do { typeof(x) _x = (x); \
c->free_size -= _x; c->used_size += _x; \
jeb->free_size -= _x ; jeb->used_size += _x; \
}while(0)

After jffs2_scan_dirent_node finishes, the directory entry nodes and the kernel descriptors relate as follows:

[Figure: jffs2_sb_info.inocache_list[] holding a jffs2_inode_cache (scan_dents, nodes, ino, nlink, state); three jffs2_full_dirent structures (raw, version, ino, ..., name); three jffs2_raw_node_ref structures (next_in_ino, next_phys, flash_offset, totlen).]

Note in particular that, at mount time, this function is reached from jffs2_build_filesystem along the call path:

jffs2_build_filesystem > jffs2_scan_medium > jffs2_scan_eraseblock > jffs2_scan_dirent_node

Once control returns all the way to jffs2_build_filesystem, jffs2_build_inode_pass1 is called. At the end of jffs2_build_filesystem, the jffs2_full_dirent lists pointed to by jffs2_inode_cache.scan_dents are freed one by one (see the end of jffs2_build_filesystem).

When a directory is later opened, jffs2_do_read_inode builds a jffs2_full_dirent list for all of its entries again, this time pointed to by inode.u.dents. See below.

The full_name_hash function

The inline functions related to full_name_hash, defined in include/linux/dcache.h, are:

/* Name hashing routines. Initial hash value */

/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */

#define init_name_hash() 0

/* partial hash update function. Assume roughly 4 bits per character */

static __inline__ unsigned long

partial_name_hash(unsigned long c, unsigned long prevhash)

{

return (prevhash + (c << 4) + (c >> 4)) * 11;

}

/* Finally: cut down the number of bits to a int value (and try to avoid losing bits) */

static __inline__ unsigned long

end_name_hash(unsigned long hash)

{

return (unsigned int) hash;

}

/* Compute the hash for a name string. */

static __inline__ unsigned int

full_name_hash(const unsigned char * name, unsigned int len)

{

unsigned long hash = init_name_hash();

while (len--)

hash = partial_name_hash(*name++, hash);

return end_name_hash(hash);

}

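So the hash is computed character by character over the file name, with each step folding the next character into the running value. After the last character, the final hash value is returned.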
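The hash can run unchanged in user space. This copy of the helpers quoted above makes the accumulation easy to experiment with:

```c
/* User-space copy of the 2.4-era dcache name-hash helpers. */
static unsigned long partial_name_hash(unsigned long c, unsigned long prevhash)
{
    return (prevhash + (c << 4) + (c >> 4)) * 11;
}

static unsigned int full_name_hash(const unsigned char *name, unsigned int len)
{
    unsigned long hash = 0;           /* init_name_hash() */
    while (len--)
        hash = partial_name_hash(*name++, hash);
    return (unsigned int)hash;        /* end_name_hash() */
}
```

For "a" (character code 97) the result is (0 + 1552 + 6) * 11 = 17138. Different names can still collide, which is why jffs2_add_fd_to_list also compares the names with strcmp when two nhash values match.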

The jffs2_add_fd_to_list function

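This function inserts the jffs2_full_dirent pointed to by new into the list pointed to by *list. At this point it also becomes possible to tell whether the corresponding directory entry node is obsolete.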

void jffs2_add_fd_to_list(struct jffs2_sb_info *c, struct jffs2_full_dirent *new, struct jffs2_full_dirent **list)

{

struct jffs2_full_dirent **prev = list;

D1(printk(KERN_DEBUG "jffs2_add_fd_to_list( %p, %p (->%p))\n", new, list, *list));

while ((*prev) && (*prev)->nhash <= new->nhash) {

if ((*prev)->nhash == new->nhash && !strcmp((*prev)->name, new->name)) {

/* Duplicate. Free one */

if (new->version < (*prev)->version) {

D1(printk(KERN_DEBUG "Eep! Marking new dirent node obsolete\n"));

D1(printk(KERN_DEBUG "New dirent is \"%s\"->ino #%u. Old is \"%s\"->ino #%u\n",

new->name, new->ino, (*prev)->name, (*prev)->ino));

jffs2_mark_node_obsolete(c, new->raw);

jffs2_free_full_dirent(new);

} else {

D1(printk(KERN_DEBUG "Marking old dirent node (ino #%u) obsolete\n", (*prev)->ino));

new->next = (*prev)->next;

jffs2_mark_node_obsolete(c, ((*prev)->raw));

jffs2_free_full_dirent(*prev);

*prev = new;

}

goto out;

}

prev = &((*prev)->next);

}

new->next = *prev;

*prev = new;

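The loop advances prev while the current element's nhash is less than or equal to the new element's nhash. As soon as it reaches an element with a larger nhash, or the end of the list, the new element is inserted where prev points. When the two nhash values are equal, the names are compared as well. If the names also match, there are two jffs2_full_dirent structures for the same directory entry, and the one with the lower version is discarded (on a version tie, the new one wins). Discarding happens in two steps. First, jffs2_mark_node_obsolete marks the entry's kernel descriptor as obsolete by setting REF_OBSOLETE in its flash_offset. This function does much more: it updates the xxxx_size fields of the erase block descriptor and of jffs2_sb_info, and may even move the erase block to a different jffs2_sb_info.xxxx_list list (not studied in detail yet). Then jffs2_free_full_dirent frees the jffs2_full_dirent.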

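Note also that at mount time it is not possible to tell whether a node on flash is obsolete. That only becomes apparent when the red-black tree or the directory entry list is built. For nodes describing the same name or the same data range, only the one with the highest version is valid, and all the others are obsolete. So everything belonging to an obsolete node except its kernel descriptor should be freed, and the red-black tree or directory entry list should refer only to valid nodes.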

out:

D2(while(*list) {

printk(KERN_DEBUG "Dirent \"%s\" (hash 0x%08x, ino #%u\n",

(*list)->name, (*list)->nhash, (*list)->ino);list = &(*list)->next;}

);

}
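The list logic of jffs2_add_fd_to_list can be exercised in user space. In this sketch, struct fdent is a hypothetical reduced jffs2_full_dirent. The obsolete_count counter stands in for jffs2_mark_node_obsolete() and jffs2_free_full_dirent(), so nothing is freed.

```c
#include <stdint.h>
#include <string.h>

struct fdent {
    struct fdent *next;
    uint32_t nhash, version, ino;
    const char *name;
};

static int obsolete_count;  /* stand-in for marking flash nodes obsolete */

/* Keep the list sorted by nhash; for a duplicate name keep the entry with
 * the higher version (on a tie the new entry wins, as in the kernel). */
static void add_fd_to_list(struct fdent *new, struct fdent **list)
{
    struct fdent **prev = list;

    while (*prev && (*prev)->nhash <= new->nhash) {
        if ((*prev)->nhash == new->nhash && !strcmp((*prev)->name, new->name)) {
            obsolete_count++;
            if (new->version >= (*prev)->version) {
                new->next = (*prev)->next;  /* replace the older entry */
                *prev = new;
            }                               /* else: drop the stale new entry */
            return;
        }
        prev = &(*prev)->next;
    }
    new->next = *prev;
    *prev = new;
}
```

Using a pointer-to-pointer for prev lets the same code insert at the head, in the middle, or at the tail without special cases.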

The jffs2_build_inode_pass1 function

If the file described by the second argument ic is a directory, this function increments the link count nlink of every subdirectory and file in it:

int jffs2_build_inode_pass1(struct jffs2_sb_info *c, struct jffs2_inode_cache *ic)

{

struct jffs2_full_dirent *fd;

D1(printk(KERN_DEBUG "jffs2_build_inode building inode #%u\n", ic->ino));

if (ic->ino > c->highest_ino)

c->highest_ino = ic->ino;

This function is called once for every file, and the highest_ino field of jffs2_sb_info records the highest inode number in the filesystem.

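In jffs2_build_filesystem, jffs2_scan_medium first creates jffs2_full_dirent structures for the entries of each directory, in a list pointed to by the scan_dents field of jffs2_inode_cache. At the end of jffs2_build_filesystem, these lists are freed again for all directories. The lists are built temporarily precisely for jffs2_build_inode_pass1 to use.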

For a directory, jffs2_inode_cache.scan_dents is non-NULL, and the list is traversed to visit every entry in the directory:

/* For each child, increase nlink */

for(fd=ic->scan_dents; fd; fd = fd->next) {

struct jffs2_inode_cache *child_ic;

if (!fd->ino)

continue;

/* XXX: Can get high latency here with huge directories */

child_ic = jffs2_get_ino_cache(c, fd->ino);

if (!child_ic) {

printk(KERN_NOTICE "Eep. Child \"%s\" (ino #%u) of dir ino #%u doesn't exist!\n",

fd->name, fd->ino, ic->ino);

continue;

}

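Note that the directory's own inode number is ic->ino, while the ino field of the jffs2_full_dirent (fd->ino) is the inode number of the subdirectory or file that the entry names. Given that number, jffs2_get_ino_cache searches the hash table of all jffs2_inode_cache structures (pointed to by jffs2_sb_info.inocache_list). It returns the matching address, or NULL if there is none.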

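An entry names either a subdirectory or some other file. If the jffs2_inode_cache of the file named by the entry does not exist, a warning is printed and the loop moves on to the next entry.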

if (child_ic->nlink++ && fd->type == DT_DIR) {

printk(KERN_NOTICE "Child dir \"%s\" (ino #%u) of dir ino #%u appears to be a hard link\n",

fd->name, fd->ino, ic->ino);

if (fd->ino == 1 && ic->ino == 1) {

printk(KERN_NOTICE "This is mostly harmless, and probably caused by creating a JFFS2 image\n");

printk(KERN_NOTICE "using a buggy version of mkfs.jffs2. Use at least v1.17.\n");

}

/* What do we do about it? */

}

D1(printk(KERN_DEBUG "Increased nlink for child \"%s\" (ino #%u)\n", fd->name, fd->ino));

/* Can't free them. We might need them in pass 2 */

}//for

return 0;

}

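When jffs2_scan_medium created the jffs2_inode_cache structures for all files, their nlink fields were set to 0; nlink counts the directory entries that refer to the file's inode. The core of jffs2_build_inode_pass1 is "child_ic->nlink++", which increments the hard link count of the file named by each entry. In ext2 the hard link count is kept in inode.i_nlink; in jffs2 it is kept in jffs2_inode_cache.nlink.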

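Suppose a file (a directory or not) lives in directory A and also has a hard link in directory B. B's directory file then contains an entry for it too, so there are two jffs2_raw_dirent nodes for this file on flash, one belonging to A and one to B. When A is traversed, the file's nlink goes from 0 to 1. When B is traversed, nlink is found to be nonzero already and is incremented to 2.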
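The sketch below applies the pass-1 counting rule to a flattened, hypothetical list of (parent, child) directory entries; MAX_INO and struct dent are illustrative assumptions. It also shows why a second entry for a directory is detectable: the directory's counter is already nonzero when that entry is reached. Entries with ino 0 are skipped as in the kernel loop (JFFS2 writes such entries when a name is removed).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INO 16   /* assumption: small, dense inode numbers */

struct dent { uint32_t pino, ino; int is_dir; };

/* Each entry naming an inode bumps that inode's nlink. Returns how many
 * entries turned out to be extra links to a directory (the kernel only
 * prints a warning for these). */
static int count_links(const struct dent *d, size_t n, int nlink[MAX_INO])
{
    int dir_hard_links = 0;
    for (size_t i = 0; i < n; i++) {
        if (!d[i].ino || d[i].ino >= MAX_INO)
            continue;
        if (nlink[d[i].ino]++ && d[i].is_dir)
            dir_hard_links++;
    }
    return dir_hard_links;
}
```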

If a hard link to a directory is found, a warning is printed.

(What exactly is the risk of hard links to directories?)

Chapter 5 Building the inode when a file is opened

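Once kernel descriptors have been created for all files and nodes on flash, most of the work of mounting jffs2 is done. What remains is to create the VFS inode and dentry objects for the root directory "/". In addition, opening any file requires creating VFS inode, dentry and file objects for it.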

Inodes are created by the inline function iget. The code in jffs2_do_fill_super that creates the root directory's inode is:

D1(printk(KERN_DEBUG "jffs2_do_fill_super(): Getting root inode\n"));

root_i = iget(sb, 1);

if (is_bad_inode(root_i)) {

D1(printk(KERN_WARNING "get root inode failed\n"));

goto out_nodes;

}

Note that the second argument is the inode number of the wanted inode; the root directory's inode number is 1. iget's call path is:

iget > iget4 > get_new_inode > jffs2_super_operations.read_inode

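When a VFS inode object is needed for a file, it is first looked up by inode number ino in the inode hash table inode_hashtable. If it does not exist yet, get_new_inode allocates an inode structure and initializes it with the read_inode method registered by the filesystem. For ext2, ext2_read_inode reads the on-disk inode. For jffs2, a directory gets a jffs2_full_dirent for each of its entry nodes, organized as a list. Any other type of file gets jffs2_full_dnode and jffs2_node_frag structures for its data nodes, with the latter organized into a red-black tree. Finally, the inode's method pointers inode.i_op, i_fop and i_mapping are set according to the file type.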

The iget and iget4 functions

iget returns, or creates, the inode structure for inode number ino:

static inline struct inode *iget(struct super_block *sb, unsigned long ino)

{

return iget4(sb, ino, NULL, NULL);

}

struct inode *iget4(struct super_block *sb, unsigned long ino, find_inode_t find_actor, void *opaque)

{

struct list_head * head = inode_hashtable + hash(sb,ino);

struct inode * inode;

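Inode numbers are unique only within the device that holds the filesystem. That is why the superblock address is passed in when the inode number is hashed. hash is defined in fs/inode.c: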

static inline unsigned long hash(struct super_block *sb, unsigned long i_ino)

{

unsigned long tmp = i_ino + ( (unsigned long) sb / L1_CACHE_BYTES);

tmp = tmp + (tmp >> I_HASHBITS);

return tmp & I_HASHMASK;

}

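So the superblock address is simply used as an unsigned long. Because different superblocks live at different addresses, the same inode number from different filesystems usually lands in a different hash bucket. find_inode also compares i_sb, so across the whole system an inode is identified by the (sb, ino) pair.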
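Here is a user-space model of hash(), with the superblock address passed as a number. L1_CACHE_BYTES = 32 and I_HASHBITS = 10 are assumptions for the example; in the kernel, the hash table size is chosen at boot.

```c
#define L1_CACHE_BYTES 32UL
#define I_HASHBITS     10
#define I_HASHMASK     ((1UL << I_HASHBITS) - 1)

/* Mix the superblock address into the inode number, fold the high bits
 * down once, and keep the low I_HASHBITS bits as the bucket index. */
static unsigned long inode_hash(unsigned long sb_addr, unsigned long i_ino)
{
    unsigned long tmp = i_ino + sb_addr / L1_CACHE_BYTES;
    tmp = tmp + (tmp >> I_HASHBITS);
    return tmp & I_HASHMASK;
}
```

With these parameters, inode 1 on a superblock at 0x1000 goes to bucket 129 (4096 / 32 + 1), while inode 1 on a superblock at 0x2000 goes to bucket 257.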

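Once the hash value of ino has been computed, we have the list of colliding entries, pointed to by head. find_inode then returns the address of the inode in that list that matches ino: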

spin_lock(&inode_lock);

inode = find_inode(sb, ino, head, find_actor, opaque);

if (inode) {

__iget(inode);

spin_unlock(&inode_lock);

wait_on_inode(inode);

return inode;

}

spin_unlock(&inode_lock);

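If the inode already exists, its address is returned. __iget increments its reference count (__iget does more than that, not analyzed here), and wait_on_inode makes sure the inode is not locked. If it is locked (the I_LOCK bit is set in inode.i_state), the caller blocks until it is unlocked. The function is simple; its code is: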

static void __wait_on_inode(struct inode * inode)

{

DECLARE_WAITQUEUE(wait, current);

add_wait_queue(&inode->i_wait, &wait);

repeat:

set_current_state(TASK_UNINTERRUPTIBLE);

if (inode->i_state & I_LOCK) {

schedule();

goto repeat;

}

remove_wait_queue(&inode->i_wait, &wait);

current->state = TASK_RUNNING;

}

static inline void wait_on_inode(struct inode *inode)

{

if (inode->i_state & I_LOCK)

__wait_on_inode(inode);

}

Back in iget4: if the inode does not exist, get_new_inode allocates a new one:

/* get_new_inode() will do the right thing, re-trying the search in case it had to block at any point. */

return get_new_inode(sb, ino, head, find_actor, opaque);

}

The get_new_inode function

/*

* This is called without the inode lock held.. Be careful.

* We no longer cache the sb_flags in i_flags - see fs.h

* -- rmk@arm.uk.linux.org

*/

static struct inode * get_new_inode(struct super_block *sb, unsigned long ino, struct list_head *head,

find_inode_t find_actor, void *opaque)

{

struct inode * inode;

inode = alloc_inode();

if (inode) {

struct inode * old;

spin_lock(&inode_lock);

/* We released the lock, so.. */

old = find_inode(sb, ino, head, find_actor, opaque);

alloc_inode is a macro:

#define alloc_inode() ((struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL))

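An inode structure is first allocated from the inode_cachep slab cache. Then, before setting it up, find_inode is called again (now under inode_lock) to make sure nobody created the wanted inode in the meantime.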

if (!old) {

inodes_stat.nr_inodes++;

list_add(&inode->i_list, &inode_in_use);

list_add(&inode->i_hash, head);

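If a new inode really is needed, it is added to the kernel list inode_in_use and to the collision list pointed to by head in the inode hash table inode_hashtable. The kernel statistic inodes_stat.nr_inodes is also incremented.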

inode->i_sb = sb;

inode->i_dev = sb->s_dev;

inode->i_blkbits = sb->s_blocksize_bits;

inode->i_ino = ino;

inode->i_flags = 0;

atomic_set(&inode->i_count, 1);

inode->i_state = I_LOCK;

spin_unlock(&inode_lock);

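Next, the inode's fields are set from the arguments sb and ino. The filesystem's read_inode method is about to fill in this inode, so I_LOCK is set to make the filling-in appear atomic to other threads (they wait for it in wait_on_inode). These fields are set while the inode_lock spinlock is held.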

clean_inode(inode);

/* reiserfs specific hack right here. We don't

** want this to last, and are looking for VFS changes

** that will allow us to get rid of it.

** -- mason@suse.com

*/

if (sb->s_op->read_inode2) {

sb->s_op->read_inode2(inode, opaque) ;

} else {

sb->s_op->read_inode(inode);

}

clean_inode initializes the remaining fields of the inode:

/*

* This just initializes the inode fields

* to known values before returning the inode..

* i_sb, i_ino, i_count, i_state and the lists have

* been initialized elsewhere..

*/

static void clean_inode(struct inode *inode)

{

static struct address_space_operations empty_aops;

static struct inode_operations empty_iops;

static struct file_operations empty_fops;

memset(&inode->u, 0, sizeof(inode->u));

inode->i_sock = 0;

inode->i_op = &empty_iops;

inode->i_fop = &empty_fops;

inode->i_nlink = 1;

atomic_set(&inode->i_writecount, 0);

inode->i_size = 0;

inode->i_blocks = 0;

inode->i_generation = 0;

memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));

inode->i_pipe = NULL;

inode->i_bdev = NULL;

inode->i_cdev = NULL;

inode->i_data.a_ops = &empty_aops;

inode->i_data.host = inode;

inode->i_data.gfp_mask = GFP_HIGHUSER;

inode->i_mapping = &inode->i_data;

}

All method pointers are pointed at empty placeholder tables. The filesystem's read_inode method then sets them to the proper values; see below.

/*

* This is special! We do not need the spinlock

* when clearing I_LOCK, because we're guaranteed

* that nobody else tries to do anything about the

* state of the inode when it is locked, as we

* just created it (so there can be no old holders

* that haven't tested I_LOCK).

*/

inode->i_state &= ~I_LOCK;

wake_up(&inode->i_wait);

return inode;

}

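The inode is now set up. According to the comment, no other thread touches the state of an inode while I_LOCK is set; a thread that finds I_LOCK set blocks on the i_wait wait queue instead. The current thread holds I_LOCK while it sets up the inode, and since the inode was just created, no earlier holder can have skipped the I_LOCK test. There is therefore no race, and I_LOCK can be cleared without the spinlock. Finally, any threads blocked on i_wait waiting for I_LOCK to be cleared are woken up.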

/*

* Uhhuh, somebody else created the same inode under us. Use the old inode instead of the one we just

* allocated.

*/

__iget(old);

spin_unlock(&inode_lock);

destroy_inode(inode);

inode = old;

wait_on_inode(inode);

}

return inode;

}

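If another thread created the wanted inode while get_new_inode was running, the freshly allocated inode is released with destroy_inode. The existing inode's reference count is incremented, and its address is returned once it becomes usable (I_LOCK cleared).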

The jffs2_read_inode function

The jffs2 source defines jffs2_fs_type, a structure of type file_system_type whose read_super method is jffs2_read_super:

static DECLARE_FSTYPE_DEV(jffs2_fs_type, "jffs2", jffs2_read_super);

The jffs2 initialization function init_jffs2_fs then registers the jffs2 filesystem with register_filesystem, which adds jffs2_fs_type to the list of file_system_type structures pointed to by file_systems.

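When a jffs2 filesystem is mounted, the previously registered read_super method, jffs2_read_super, is called. At its very beginning, it points the superblock's s_op at the jffs2_super_operations method table (see "Mounting the filesystem"):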

static struct super_operations jffs2_super_operations =

{

read_inode: jffs2_read_inode,

put_super: jffs2_put_super,

write_super: jffs2_write_super,

statfs: jffs2_statfs,

remount_fs: jffs2_remount_fs,

clear_inode: jffs2_clear_inode

};

Its read_inode pointer points to jffs2_read_inode. This is why jffs2_read_inode is called when get_new_inode invokes the filesystem's read_inode method to initialize the inode structure.

void jffs2_read_inode (struct inode *inode)

{

struct jffs2_inode_info *f;

struct jffs2_sb_info *c;

struct jffs2_raw_inode latest_node;

int ret;

D1(printk(KERN_DEBUG "jffs2_read_inode(): inode->i_ino == %lu\n", inode->i_ino));

f = JFFS2_INODE_INFO(inode);

c = JFFS2_SB_INFO(inode->i_sb);

jffs2_init_inode_info(f);

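For jffs2, the u field of inode is a jffs2_inode_info and the u field of super_block is a jffs2_sb_info; get_new_inode has already set inode.i_sb to point to the super_block. The two macros return the addresses of these u fields, and then the inode's u field is initialized.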

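Every file has at least one node on flash. Each jffs2_raw_inode carries the file's common attributes, such as i_mode, i_gid, i_uid, i_size, i_atime, i_mtime and i_ctime, so reading a single node from flash is enough to obtain them. i_nlink is the exception: it comes from jffs2_inode_cache.nlink.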

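When a directory is opened, a jffs2_full_dirent is created for each entry, and these are linked into a list pointed to by the dents field of jffs2_inode_info. When a regular file is opened, a jffs2_full_dnode/jffs2_node_frag pair is created for each data node, and these are organized into a red-black tree rooted at jffs2_inode_info.fragtree. See Figures 1 and 2.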

Both jobs are done by jffs2_do_read_inode, which also returns the node it read in latest_node. See the analysis below.

ret = jffs2_do_read_inode(c, f, inode->i_ino, &latest_node);

if (ret) {

make_bad_inode(inode);

up(&f->sem);

return;

}

If jffs2_do_read_inode fails, make_bad_inode marks the inode as "bad":

/**

* make_bad_inode - mark an inode bad due to an I/O error

* @inode: Inode to mark bad

* When an inode cannot be read due to a media or remote network

* failure this function makes the inode "bad" and causes I/O operations

* on it to fail from this point on.

*/

void make_bad_inode(struct inode * inode)

{

inode->i_mode = S_IFREG;

inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;

inode->i_op = &bad_inode_ops;

inode->i_fop = &bad_file_ops;

}

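Its main effect is to point the inode method pointer i_op at the bad_inode_ops table, and is_bad_inode decides whether an inode is bad by checking whether i_op points to bad_inode_ops. For example, GC returns EIO if it finds that a file's inode is bad. If all goes well, the inode's fields can be set from the node read into latest_node. Note that nlink, the hard link count, was already computed (preliminarily) when the filesystem was mounted.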

inode->i_mode = je32_to_cpu(latest_node.mode);

inode->i_uid = je16_to_cpu(latest_node.uid);

inode->i_gid = je16_to_cpu(latest_node.gid);

inode->i_size = je32_to_cpu(latest_node.isize);

inode->i_atime = je32_to_cpu(latest_node.atime);

inode->i_mtime = je32_to_cpu(latest_node.mtime);

inode->i_ctime = je32_to_cpu(latest_node.ctime);

inode->i_nlink = f->inocache->nlink;

inode->i_blksize = PAGE_SIZE;

inode->i_blocks = (inode->i_size + 511) >> 9;

Next, the inode's method pointers i_op and i_fop are set according to the file type (i_fop is copied to file.f_op when the file is opened):

switch (inode->i_mode & S_IFMT) {

unsigned short rdev;

case S_IFLNK:

inode->i_op = &jffs2_symlink_inode_operations;

break;

case S_IFDIR:

{

struct jffs2_full_dirent *fd;

for (fd=f->dents; fd; fd = fd->next) {

if (fd->type == DT_DIR && fd->ino)

inode->i_nlink++;

}

/* and '..' */

inode->i_nlink++;

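So jffs2 follows the ext2 convention for directory link counts. Every subdirectory has a ".." entry pointing back to its parent, so each subdirectory adds 1 to the parent's link count. The directory's entry in its own parent is already included in inocache->nlink. The final increment (labelled "and '..'" in the source) brings the total to the ext2 value of 2 plus the number of subdirectories, so it effectively plays the role of the directory's "." entry.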

/* Root dir gets i_nlink 3 for some reason */

if (inode->i_ino == 1)

inode->i_nlink++;

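The root directory "/" has no directory entry in any parent directory on flash, so no jffs2_build_inode_pass1 increment ever counts it. Instead, its jffs2_inode_cache.nlink is simply initialized to 1 (by jffs2_scan_make_ino_cache or jffs2_do_read_inode). Together with the per-subdirectory increments and the ".." increment above, this extra increment gives the root an i_nlink of 3 plus the number of its subdirectories, i.e. 3 for an empty root. The kernel comment itself says only that the root gets 3 "for some reason".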

inode->i_op = &jffs2_dir_inode_operations;

inode->i_fop = &jffs2_dir_operations;

break;

}

case S_IFREG:

inode->i_op = &jffs2_file_inode_operations;

inode->i_fop = &jffs2_file_operations;

inode->i_mapping->a_ops = &jffs2_file_address_operations;

inode->i_mapping->nrpages = 0;

break;

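For a regular file, the file method table pointer i_fop is set to jffs2_file_operations, and the address-space operations pointer i_mapping->a_ops is set to jffs2_file_address_operations. As shown later, reading and writing a file always goes through functions in these two tables.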

case S_IFBLK:

case S_IFCHR:

/* Read the device numbers from the media */

D1(printk(KERN_DEBUG "Reading device numbers from flash\n"));

if (jffs2_read_dnode(c, f->metadata, (char *)&rdev, 0, sizeof(rdev)) < 0) { /* Eep */

printk(KERN_NOTICE "Read device numbers for inode %lu failed\n",

(unsigned long)inode->i_ino);

up(&f->sem);

jffs2_do_clear_inode(c, f);

make_bad_inode(inode);

return;

}

/* FALL THROUGH */

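A device file is represented by a single jffs2_raw_inode node whose data is the device number rdev. (Compare ext2, where a device file is a single on-disk inode whose i_data[0] holds the device number.) The jffs2_full_dnode of a symlink or device file ends up pointed to directly by metadata rather than staying in the fragtree red-black tree; see jffs2_do_read_inode. jffs2_read_dnode reads the device number into rdev (see below). Note that if the device number is read successfully, this case has no break, so execution falls through into the code below.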

case S_IFSOCK:

case S_IFIFO:

inode->i_op = &jffs2_file_inode_operations;

init_special_inode(inode, inode->i_mode, kdev_t_to_nr(mk_kdev(rdev>>8, rdev&0xff)));

break;

default:

printk(KERN_WARNING "jffs2_read_inode(): Bogus imode %o for ino %lu\n",

inode->i_mode, (unsigned long)inode->i_ino);

}

up(&f->sem);

D1(printk(KERN_DEBUG "jffs2_read_inode() returning\n"));

}
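The S_IFDIR branch above reduces to simple arithmetic. The sketch below starts from the link count computed at mount time; dir_nlink and struct child are illustrative names, not kernel code.

```c
#include <stddef.h>

struct child { unsigned ino; int is_dir; };

/* Link count of a directory as computed in jffs2_read_inode(). */
static unsigned dir_nlink(unsigned ino, unsigned cached_nlink,
                          const struct child *c, size_t n)
{
    unsigned nlink = cached_nlink;   /* f->inocache->nlink */
    for (size_t i = 0; i < n; i++)
        if (c[i].is_dir && c[i].ino)
            nlink++;                 /* each subdirectory's ".." */
    nlink++;                         /* "and '..'" in the source */
    if (ino == 1)
        nlink++;                     /* root gets one more */
    return nlink;
}
```

An empty root (cached nlink 1, no children) yields 3. An ordinary directory with one entry in its parent and two subdirectories yields 4, the same value ext2 would report.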

So the inodes of special files (device files, sockets and FIFOs) are further initialized by init_special_inode:

void init_special_inode(struct inode *inode, umode_t mode, int rdev)

{

inode->i_mode = mode;

if (S_ISCHR(mode)) {

inode->i_fop = &def_chr_fops;

inode->i_rdev = to_kdev_t(rdev);

inode->i_cdev = cdget(rdev);

} else if (S_ISBLK(mode)) {

inode->i_fop = &def_blk_fops;

inode->i_rdev = to_kdev_t(rdev);

} else if (S_ISFIFO(mode))

inode->i_fop = &def_fifo_fops;

else if (S_ISSOCK(mode))

inode->i_fop = &bad_sock_fops;

else

printk(KERN_DEBUG "init_special_inode: bogus imode (%o)\n", mode);

}

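Their inode mode was in fact already set from the node read by jffs2_do_read_inode. This function mainly sets the file methods and, for device files, the inode.i_rdev field. The rdev argument is converted to i_rdev by the to_kdev_t inline function (it and the related macros are defined in linux/kdev_t.h).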

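Note that in jffs2 a device file is represented by one jffs2_raw_inode on flash. The VFS inode has two device fields. i_dev is the device number of the device holding the inode, and i_rdev is the device number of the device the inode represents. jffs2_raw_inode has no field for either. The former is not needed, because anyone who can read the node from flash already knows which flash partition it lives on. The latter is stored as the data immediately following the node.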
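The next sketch shows the device-number decoding done by jffs2_read_inode. Two bytes are copied into an unsigned short in host byte order, as the quoted code does, and then split into an 8-bit major and an 8-bit minor. decode_rdev is a hypothetical helper name.

```c
#include <string.h>

/* Split the 16-bit device number stored after a device node's
 * jffs2_raw_inode, like mk_kdev(rdev >> 8, rdev & 0xff). */
static void decode_rdev(const unsigned char *data, unsigned *major, unsigned *minor)
{
    unsigned short rdev;
    memcpy(&rdev, data, sizeof(rdev));  /* like jffs2_read_dnode(..., (char *)&rdev, 0, sizeof(rdev)) */
    *major = rdev >> 8;
    *minor = rdev & 0xff;
}
```

Because the value is copied byte-for-byte into a native unsigned short, an image would only decode correctly on a machine with the same byte order as the one that wrote it. This is an observation about the quoted code, not a documented guarantee.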

The jffs2_do_read_inode function

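As described above, this function reads one node from flash to obtain the file's common inode attributes. For a directory, it also creates a jffs2_full_dirent for each entry and links them into a list pointed to by the dents field of jffs2_inode_info. For a regular file, it creates a jffs2_full_dnode/jffs2_node_frag pair for each data node and organizes them into a red-black tree rooted at jffs2_inode_info.fragtree.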

/* Scan the list of all nodes present for this ino, build map of versions, etc. */

int jffs2_do_read_inode(struct jffs2_sb_info *c, struct jffs2_inode_info *f,

uint32_t ino, struct jffs2_raw_inode *latest_node)

{

struct jffs2_tmp_dnode_info *tn_list, *tn;

struct jffs2_full_dirent *fd_list;

struct jffs2_full_dnode *fn = NULL;

uint32_t crc;

uint32_t latest_mctime, mctime_ver;

uint32_t mdata_ver = 0;

size_t retlen;

int ret;

D2(printk(KERN_DEBUG "jffs2_do_read_inode(): getting inocache\n"));

f->inocache = jffs2_get_ino_cache(c, ino);

D2(printk(KERN_DEBUG "jffs2_do_read_inode(): Got inocache at %p\n", f->inocache));

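As shown earlier, at mount time (in jffs2_scan_medium) a jffs2_inode_cache was created for every file on flash and added to the filesystem hash table. Here its address is therefore simply looked up by inode number.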

if (!f->inocache && ino == 1) {

/* Special case - no root inode on medium */

f->inocache = jffs2_alloc_inode_cache();

if (!f->inocache) {

printk(KERN_CRIT "jffs2_do_read_inode(): Cannot allocate inocache for root inode\n");

return -ENOMEM;

}

D1(printk(KERN_DEBUG "jffs2_do_read_inode(): Creating inocache for root inode\n"));

memset(f->inocache, 0, sizeof(struct jffs2_inode_cache));

f->inocache->ino = f->inocache->nlink = 1;

f->inocache->nodes = (struct jffs2_raw_node_ref *)f->inocache;

jffs2_add_ino_cache(c, f->inocache);

}

if (!f->inocache) {

printk(KERN_WARNING "jffs2_do_read_inode() on nonexistent ino %u\n", ino);

return -ENOENT;

}

D1(printk(KERN_DEBUG "jffs2_do_read_inode(): ino #%u nlink is %d\n", ino, f->inocache->nlink));

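If the file's descriptor does not exist, ENOENT is returned. The one exception is the root directory. It may have no node at all on the medium (for example, in an empty filesystem), in which case no jffs2_inode_cache was created for it at mount time. So if the descriptor is missing and ino is 1, i.e. this is the root directory, the descriptor is created here and added to the hash table.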

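At mount time, a kernel descriptor jffs2_raw_node_ref was created for every node on flash. Its flash_offset is the node's offset within the flash partition, and totlen is the node's length. The jffs2_raw_node_refs of one file are chained through next_in_ino, and the chain is pointed to by the file's jffs2_inode_cache.nodes. jffs2_get_inode_nodes uses this chain to visit all of the file's nodes, and then: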

1. creates a jffs2_full_dirent for each jffs2_raw_dirent and links them into the list fd_list;

2. creates a jffs2_tmp_dnode_info and a jffs2_full_dnode for each jffs2_raw_inode and links them into the list tn_list.

/* Grab all nodes relevant to this ino */

ret = jffs2_get_inode_nodes(c, ino, f, &tn_list, &fd_list, &f->highest_version,

&latest_mctime, &mctime_ver);

if (ret) {

printk(KERN_CRIT "jffs2_get_inode_nodes() for ino %u returned %d\n", ino, ret);

return ret;

}

f->dents = fd_list;

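For a directory, the list of its jffs2_full_dirent structures is attached to the dents field of jffs2_inode_info (see Figure 1). For other files, the jffs2_tmp_dnode_info list is walked, a jffs2_node_frag is created for each jffs2_full_dnode and inserted into the red-black tree at inode.u.fragtree, and finally the whole jffs2_tmp_dnode_info list is freed.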

while (tn_list) {

tn = tn_list;

fn = tn->fn;

if (f->metadata && tn->version > mdata_ver) {

D1(printk(KERN_DEBUG "Obsoleting old metadata at 0x%08x\n",

ref_offset(f->metadata->raw)));

jffs2_mark_node_obsolete(c, f->metadata->raw);

jffs2_free_full_dnode(f->metadata);

f->metadata = NULL;

mdata_ver = 0;

}

if (fn->size) {

jffs2_add_full_dnode_to_inode(c, f, fn);

} else {

/* Zero-sized node at end of version list. Just a metadata update */

D1(printk(KERN_DEBUG "metadata @%08x: ver %d\n", ref_offset(fn->raw), tn->version));

f->metadata = fn;

mdata_ver = tn->version;

}

tn_list = tn->next;

jffs2_free_tmp_dnode_info(tn);

}//while

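This loop walks the jffs2_tmp_dnode_info list. For the jffs2_full_dnode each element points to, it creates a jffs2_node_frag and inserts it into the red-black tree rooted at jffs2_inode_info.fragtree. This is done by jffs2_add_full_dnode_to_inode (which involves red-black tree insertion; not studied yet).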

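Regular files, symlinks and device files consist of at least one jffs2_raw_inode followed by data, so a red-black tree is built for all of them here. Presumably only sockets and FIFOs have jffs2_raw_inodes with no data after them (fn->size is 0). Their jffs2_full_dnode is therefore pointed to directly by the metadata field of jffs2_inode_info, and no red-black tree is needed for them. As the code comment says, any zero-size node (a pure metadata update) takes this path too, and a newer node makes the previous metadata node obsolete.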

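As soon as a jffs2_full_dnode has been processed, its jffs2_tmp_dnode_info is freed. This shows that the structure is only a temporary, created while a file is being opened in order to process its jffs2_full_dnode structures.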
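The loop's metadata handling can be modeled in a few lines. This sketch assumes tn_list arrives in ascending version order, as the comment at the top of jffs2_get_inode_nodes says. It replaces the real structures with a hypothetical struct tnode and counts nodes instead of building a fragment tree.

```c
#include <stddef.h>
#include <stdint.h>

struct tnode { uint32_t version; uint32_t size; };

struct result {
    const struct tnode *metadata; /* f->metadata */
    size_t frags;                 /* nodes handed to the fragment tree */
    size_t obsoleted;             /* old metadata nodes marked obsolete */
    const struct tnode *last;     /* what "fn" holds after the loop */
};

static struct result process(const struct tnode *tn, size_t n)
{
    struct result r = { NULL, 0, 0, NULL };
    uint32_t mdata_ver = 0;
    for (size_t i = 0; i < n; i++) {
        if (r.metadata && tn[i].version > mdata_ver) {
            r.obsoleted++;            /* newer node supersedes metadata */
            r.metadata = NULL;
            mdata_ver = 0;
        }
        if (tn[i].size) {
            r.frags++;                /* jffs2_add_full_dnode_to_inode() */
        } else {
            r.metadata = &tn[i];      /* zero-size: metadata update */
            mdata_ver = tn[i].version;
        }
        r.last = &tn[i];
    }
    return r;
}
```

Consider an input with a single zero-size node, the shape of a directory's data-less inode node. In that case r.last is that node, so fn would not be NULL after the loop.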

if (!fn) {

/* No data nodes for this inode. */

if (ino != 1) {

printk(KERN_WARNING "jffs2_do_read_inode(): No data nodes found for ino #%u\n", ino);

if (!fd_list) {

return -EIO;

}

printk(KERN_WARNING "jffs2_do_read_inode(): But it has children so we fake some modes for it\n");

}

latest_node->mode = cpu_to_je32(S_IFDIR|S_IRUGO|S_IWUSR|S_IXUGO);

latest_node->version = cpu_to_je32(0);

latest_node->atime = latest_node->ctime = latest_node->mtime = cpu_to_je32(0);

latest_node->isize = cpu_to_je32(0);

latest_node->gid = cpu_to_je16(0);

latest_node->uid = cpu_to_je16(0);

return 0;

}

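This code was puzzling at first. For a directory, jffs2_get_inode_nodes returns the jffs2_full_dirent list of its entries through fd_list. If tn_list were NULL, the local variable fn, which is only set to tn->fn in the while loop above, would still be NULL, and this branch would run for every directory. The likely resolution is that a directory is not described by dirents alone. When a directory is created, JFFS2 also writes a jffs2_raw_inode with no data (dsize 0) that carries the directory's mode, owner and timestamps. That node lands in tn_list, takes the zero-size "metadata" path in the loop above, and leaves fn non-NULL. Only a directory without any such node reaches this branch (normally the root, ino 1), which is why the code fakes a directory mode for it.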

ret = jffs2_flash_read(c, ref_offset(fn->raw), sizeof(*latest_node), &retlen, (void *)latest_node);

if (ret || retlen != sizeof(*latest_node)) {

printk(KERN_NOTICE "MTD read in jffs2_do_read_inode() failed: Returned %d, %ld of %d bytes read\n", ret, (long)retlen, sizeof(*latest_node));

/* FIXME: If this fails, there seems to be a memory leak. Find it. */

up(&f->sem);

jffs2_do_clear_inode(c, f);

return ret?ret:-EIO;

}

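So when a file is opened, jffs2_do_read_inode does more than build data structures for the nodes. It also reads one node and returns it to the caller, jffs2_read_inode, which uses it to set up the file's inode structure.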

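The same question arises here. For a directory, the structure for an entry node is jffs2_full_dirent, whose raw field points to the entry's kernel descriptor jffs2_raw_node_ref. One might therefore expect the second argument of jffs2_flash_read to be ref_offset(f->dents->raw) for a directory, on the assumption that fn is NULL. Yet the code works. The likely explanation is that fn is the jffs2_full_dnode of the last element of tn_list, which jffs2_get_inode_nodes returns in version order, so fn is the file's newest jffs2_raw_inode. For a directory, that is the data-less jffs2_raw_inode written when the directory was created, and it holds exactly the mode, owner and timestamps that jffs2_read_inode needs. A jffs2_raw_dirent has no such fields and could not serve.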

crc = crc32(0, latest_node, sizeof(*latest_node)-8);

if (crc != je32_to_cpu(latest_node->node_crc)) {

printk(KERN_NOTICE "CRC failed for read_inode of inode %u at physical location 0x%x\n",

ino, ref_offset(fn->raw));

up(&f->sem);

jffs2_do_clear_inode(c, f);

return -EIO;

}

After the node has been read, its CRC is checked.

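Finally, some fields of the node that was read are adjusted according to the file type (overall, I have not fully understood the rest of this code):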

switch(je32_to_cpu(latest_node->mode) & S_IFMT) {

case S_IFDIR:

if (mctime_ver > je32_to_cpu(latest_node->version)) {

/* The times in the latest_node are actually older than mctime in the latest dirent. Cheat. */

latest_node->ctime = latest_node->mtime = cpu_to_je32(latest_mctime);

}

break;

case S_IFREG:

/* If it was a regular file, truncate it to the latest node's isize */

jffs2_truncate_fraglist(c, &f->fragtree, je32_to_cpu(latest_node->isize));

break;

case S_IFLNK:

/* Hack to work around broken isize in old symlink code.

Remove this when dwmw2 comes to his senses and stops

symlinks from being an entirely gratuitous special case. */

if (!je32_to_cpu(latest_node->isize))

latest_node->isize = latest_node->dsize;

/* fall through... */

case S_IFBLK:

case S_IFCHR:

/* Certain inode types should have only one data node, and it's

kept as the metadata node */

if (f->metadata) {

printk(KERN_WARNING "Argh. Special inode #%u with mode 0%o had metadata node\n",

ino, je32_to_cpu(latest_node->mode));

up(&f->sem);

jffs2_do_clear_inode(c, f);

return -EIO;

}

if (!frag_first(&f->fragtree)) {

printk(KERN_WARNING "Argh. Special inode #%u with mode 0%o has no fragments\n",

ino, je32_to_cpu(latest_node->mode));

up(&f->sem);

jffs2_do_clear_inode(c, f);

return -EIO;

}

/* ASSERT: f->fraglist != NULL */

if (frag_next(frag_first(&f->fragtree))) {

printk(KERN_WARNING "Argh. Special inode #%u with mode 0%o had more than one node\n",

ino, je32_to_cpu(latest_node->mode));

/* FIXME: Deal with it - check crc32, check for duplicate node, check times and discard the older one */

up(&f->sem);

jffs2_do_clear_inode(c, f);

return -EIO;

}

/* OK. We're happy */

f->metadata = frag_first(&f->fragtree)->node;

jffs2_free_node_frag(frag_first(&f->fragtree));

f->fragtree = RB_ROOT;

break;

}

f->inocache->state = INO_STATE_PRESENT;

return 0;

}

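A symbolic link has exactly one jffs2_raw_inode, followed by data (the name of the link target); its jffs2_full_dnode was already added to the red-black tree through a jffs2_node_frag. For a device file, the data following its single inode node on flash is the device number, and it too has already been added to the red-black tree. Since each of these files has only one node, the red-black tree holds a single fragment, so here that node is taken out of the tree (the frag is freed and fragtree is reset to RB_ROOT) and pointed to directly by f->metadata instead.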

The jffs2_get_inode_nodes function

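When a file is opened, this function uses the file's node list (jffs2_inode_cache.nodes) to visit every node of the file, and then:

1. For each jffs2_raw_dirent it creates a jffs2_full_dirent; these are linked into the list fd_list, which is returned through the fdp parameter.

2. For each jffs2_raw_inode it creates a jffs2_tmp_dnode_info and a jffs2_full_dnode; these are linked into the list tn_list, which is returned through the tnp parameter.

In addition, when the file system was mounted and kernel descriptors were created for jffs2_raw_inode nodes, the data following those nodes was not CRC-checked (which is why their descriptors carry the REF_UNCHECKED flag). Now is the time to perform that CRC check for real.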

/* Get tmp_dnode_info and full_dirent for all non-obsolete nodes associated

with this ino, returning the former in order of version */

int jffs2_get_inode_nodes(struct jffs2_sb_info *c, ino_t ino, struct jffs2_inode_info *f,

struct jffs2_tmp_dnode_info **tnp, struct jffs2_full_dirent **fdp,

uint32_t *highest_version, uint32_t *latest_mctime, uint32_t *mctime_ver)

{

struct jffs2_raw_node_ref *ref = f->inocache->nodes;

struct jffs2_tmp_dnode_info *tn, *ret_tn = NULL;

struct jffs2_full_dirent *fd, *ret_fd = NULL;

union jffs2_node_union node;

size_t retlen;

int err;

*mctime_ver = 0;

D1(printk(KERN_DEBUG "jffs2_get_inode_nodes(): ino #%lu\n", ino));

if (!f->inocache->nodes) {

printk(KERN_WARNING "Eep. no nodes for ino #%lu\n", ino);

}

First, jffs2_inode_info.inocache->nodes gives the pointer to the list of kernel descriptors for all of this file's nodes.

spin_lock_bh(&c->erase_completion_lock);

for (ref = f->inocache->nodes; ref && ref->next_in_ino; ref = ref->next_in_ino) {

/* Work out whether it's a data node or a dirent node */

if (ref_obsolete(ref)) {

/* FIXME: On NAND flash we may need to read these */

D1(printk(KERN_DEBUG "node at 0x%08x is obsoleted. Ignoring.\n", ref_offset(ref)));

continue;

}

/* We can hold a pointer to a non-obsolete node without the spinlock,

but _obsolete_ nodes may disappear at any time, if the block they're in gets erased */

spin_unlock_bh(&c->erase_completion_lock);

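The loop walks the list of kernel descriptors for the file's nodes. While the list is being accessed, the erase_completion_lock spinlock in jffs2_sb_info must be held. Once the current node is known not to be obsolete, the spinlock can be dropped (as the comment above says, a non-obsolete node cannot disappear while we hold a pointer to it); it is reacquired at the end of the for loop body, before the next iteration starts.

As the analysis of the write path later shows, any modification to a file is made by writing a new node; the old, outdated node on flash is left untouched. The kernel creates a new kernel descriptor jffs2_raw_node_ref for the new node and marks the old node's jffs2_raw_node_ref as obsolete (by setting the REF_OBSOLETE flag in its flash_offset field).

So, while walking a file's list of node descriptors, a descriptor marked obsolete means the corresponding node on flash is no longer valid, and it is simply skipped.

(Consequently, opening a directory never creates a jffs2_full_dirent for an obsolete directory entry jffs2_raw_dirent, and opening a regular file never creates a jffs2_tmp_dnode_info or jffs2_full_dnode for an obsolete jffs2_raw_inode!)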
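The obsolete mark described above is not a separate field: jffs2 packs a 2-bit state into the low bits of flash_offset, which is why the code always reads offsets through ref_offset(). Below is a minimal sketch of that encoding, assuming the values I recall from jffs2's nodelist.h (REF_UNCHECKED 0, REF_OBSOLETE 1, REF_PRISTINE 2, REF_NORMAL 3) and 4-byte-aligned nodes. struct raw_node_ref_sketch and set_ref_state() are hypothetical stand-ins, not the kernel definitions (which are macros).

```c
#include <assert.h>
#include <stdint.h>

/* State values kept in the two low bits of flash_offset (assumed to match
 * jffs2's nodelist.h). Nodes are 4-byte aligned, so those bits are free. */
#define REF_UNCHECKED 0u
#define REF_OBSOLETE  1u
#define REF_PRISTINE  2u
#define REF_NORMAL    3u

/* Hypothetical, stripped-down stand-in for struct jffs2_raw_node_ref. */
struct raw_node_ref_sketch {
    uint32_t flash_offset; /* physical offset | 2-bit state */
    uint32_t totlen;
};

static uint32_t ref_flags(const struct raw_node_ref_sketch *ref)
{
    return ref->flash_offset & 3u;
}

static uint32_t ref_offset(const struct raw_node_ref_sketch *ref)
{
    return ref->flash_offset & ~3u;
}

static int ref_obsolete(const struct raw_node_ref_sketch *ref)
{
    return ref_flags(ref) == REF_OBSOLETE;
}

/* Hypothetical helper: replace the state bits, keep the offset intact. */
static void set_ref_state(struct raw_node_ref_sketch *ref, uint32_t state)
{
    assert(state <= 3u);
    ref->flash_offset = ref_offset(ref) | state;
}
```

For example, marking the descriptor of a node at 0x1000 obsolete turns flash_offset into 0x1001, while ref_offset() still yields 0x1000; mark_ref_normal() can rewrite the state the same way without needing a second field.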

cond_resched();

/* FIXME: point() */

err = jffs2_flash_read(c, (ref_offset(ref)), min(ref->totlen, sizeof(node)), &retlen, (void *)&node);

if (err) {

printk(KERN_WARNING "error %d reading node at 0x%08x in get_inode_nodes()\n",

err, ref_offset(ref));

goto free_out;

}

/* Check we've managed to read at least the common node header */

if (retlen < min(ref->totlen, sizeof(node.u))) {

printk(KERN_WARNING "short read in get_inode_nodes()\n");

err = -EIO;

goto free_out;

}

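jffs2_flash_read ultimately calls the flash driver's read_ecc or read method to read a segment of the given offset and length from the flash partition. Where direct memory mapping is supported, NOR flash can instead be read through a mapping, saving the cost of a memcpy: the flash driver's point function sets up the mapping, and unpoint tears it down once the read is done (the "FIXME: point()" comment shows this is not used here yet).

The last argument passed to jffs2_flash_read, node, is a union: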

union jffs2_node_union {

struct jffs2_raw_inode i;

struct jffs2_raw_dirent d;

struct jffs2_unknown_node u;

};

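Because both node types begin with the same header, the size of node is that of its largest member, jffs2_raw_inode (not counting any trailing data).

Valid nodes are read in two steps. The first read fetches the jffs2_raw_dirent or jffs2_raw_inode structure itself without the trailing data (its totlen field gives the length of the whole node); the second read fetches the trailing data. Meanwhile, the nodetype field of the header tells which kind of node this is, and the matching in-memory structures are allocated: a jffs2_full_dirent for a jffs2_raw_dirent, and a jffs2_tmp_dnode_info plus a jffs2_full_dnode for a jffs2_raw_inode.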
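The point of the union is that one buffer can hold whichever node type turns up, and the common header can be inspected before the type is known. A minimal sketch, assuming simplified struct layouts (not the real packed on-flash formats) and the nodetype values 0xE001/0xE002 that I believe JFFS2_NODETYPE_DIRENT and JFFS2_NODETYPE_INODE expand to. read_node_head() is a hypothetical helper standing in for the jffs2_flash_read() call with min(totlen, sizeof(node)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODETYPE_DIRENT 0xE001u /* assumed value of JFFS2_NODETYPE_DIRENT */
#define NODETYPE_INODE  0xE002u /* assumed value of JFFS2_NODETYPE_INODE */

/* Simplified stand-ins: every node starts with the same header. */
struct unknown_node { uint16_t magic, nodetype; uint32_t totlen, hdr_crc; };
struct raw_dirent   { struct unknown_node h; uint32_t pino, version, ino; uint8_t nsize, type; };
struct raw_inode    { struct unknown_node h; uint32_t ino, version, mode, isize, offset, csize, dsize; };

/* As large as its largest member, raw_inode. */
union node_union {
    struct raw_inode    i;
    struct raw_dirent   d;
    struct unknown_node u;
};

/* Copy min(totlen, sizeof(union)) bytes of the node at `flash` and report
 * its type from the shared header. Returns the number of bytes copied. */
static size_t read_node_head(const uint8_t *flash, uint32_t totlen,
                             union node_union *node, uint16_t *type)
{
    size_t len = totlen < sizeof(*node) ? totlen : sizeof(*node);
    memcpy(node, flash, len);
    *type = node->u.nodetype; /* valid whichever member the node really is */
    return len;
}
```

A short dirent with a 5-byte name fits entirely in the buffer, so the first read already contains its name; a long name would be cut off at sizeof(union), which is what the second read of the name handles.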

switch (je16_to_cpu(node.u.nodetype)) {

case JFFS2_NODETYPE_DIRENT:

D1(printk(KERN_DEBUG "Node at %08x (%d) is a dirent node\n", ref_offset(ref),

ref_flags(ref)));

if (ref_flags(ref) == REF_UNCHECKED) {

printk(KERN_WARNING "BUG: Dirent node at 0x%08x never got checked? How?\n",

ref_offset(ref));

BUG();

}

if (retlen < sizeof(node.d)) {

printk(KERN_WARNING "short read in get_inode_nodes()\n");

err = -EIO;

goto free_out;

}

if (je32_to_cpu(node.d.version) > *highest_version)

*highest_version = je32_to_cpu(node.d.version);

if (ref_obsolete(ref)) {

/* Obsoleted. This cannot happen, surely? dwmw2 20020308 */

printk(KERN_ERR "Dirent node at 0x%08x became obsolete while we weren't looking\n",

ref_offset(ref));

BUG();

}

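After a directory-entry node has been read, it is first sanity-checked. Its kernel descriptor must not be REF_UNCHECKED (when the file system was mounted, the descriptor for a directory-entry node was created with the REF_PRISTINE flag; see jffs2_scan_dirent_node); otherwise it is a BUG. If the amount actually read by jffs2_flash_read, retlen, is smaller than a jffs2_raw_dirent, the read failed and -EIO is returned. In addition, the entry's version number is used to update the directory's highest version number (through the highest_version parameter, which ends up in jffs2_inode_info.highest_version).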

fd = jffs2_alloc_full_dirent(node.d.nsize+1);

if (!fd) {

err = -ENOMEM;

goto free_out;

}

memset(fd,0,sizeof(struct jffs2_full_dirent) + node.d.nsize+1);

fd->raw = ref;

fd->version = je32_to_cpu(node.d.version);

fd->ino = je32_to_cpu(node.d.ino);

fd->type = node.d.type;

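Next, a jffs2_full_dirent is allocated for the jffs2_raw_dirent, together with room for the trailing file name, and initialized. The fields of jffs2_full_dirent are all copied from the corresponding fields of the directory-entry node on flash.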

/* Pick out the mctime of the latest dirent */

if(fd->version > *mctime_ver) {

*mctime_ver = fd->version;

*latest_mctime = je32_to_cpu(node.d.mctime);

}

/* memcpy as much of the name as possible from the raw dirent we've already read from the flash */

if (retlen > sizeof(struct jffs2_raw_dirent))

memcpy(&fd->name[0], &node.d.name[0], min((uint32_t)node.d.nsize,

(retlen-sizeof(struct jffs2_raw_dirent))));

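The length passed to the earlier jffs2_flash_read call was min(ref->totlen, sizeof(node)). Since node is as large as a jffs2_raw_inode, which is bigger than a jffs2_raw_dirent, that read has normally fetched part of the file name already (possibly all of it). So the part of the name that has already been read is copied here into the space pointed to by jffs2_full_dirent's name.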

/* Do we need to copy any more of the name directly from the flash?*/

if (node.d.nsize + sizeof(struct jffs2_raw_dirent) > retlen) {

/* FIXME: point() */

int already = retlen - sizeof(struct jffs2_raw_dirent);

err = jffs2_flash_read(c, (ref_offset(ref)) + retlen,

node.d.nsize - already, &retlen, &fd->name[already]);

if (!err && retlen != node.d.nsize - already)

err = -EIO;

if (err) {

printk(KERN_WARNING "Read remainder of name in jffs2_get_inode_nodes(): error %d\n", err);

jffs2_free_full_dirent(fd);

goto free_out;

}

}

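Because the first read may have fetched only part of the name, the rest may have to be read here. The first read actually returned retlen bytes, so the remaining part of the name starts at offset (ref_offset(ref) + retlen) in the flash partition. already is the length of the part that has been read, and nsize is the length of the full name, so their difference is the length of the remainder. The second read places the rest of the name directly at jffs2_full_dirent.name[already].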
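The "copy what the first read already returned, then fetch the rest" step can be simulated on a byte array. This is a minimal sketch under stated assumptions: flash_read() is a hypothetical stand-in for jffs2_flash_read() over an in-memory buffer, hdr_len plays the role of sizeof(struct jffs2_raw_dirent), and, unlike the kernel (which has already rejected retlen < sizeof(node.d)), the helper also tolerates a first read shorter than the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flash: a byte array standing in for jffs2_flash_read().
 * Returns the number of bytes actually copied (may be short). */
static size_t flash_read(const uint8_t *flash, size_t flash_len,
                         size_t ofs, size_t len, uint8_t *buf)
{
    if (ofs >= flash_len)
        return 0;
    if (len > flash_len - ofs)
        len = flash_len - ofs;
    memcpy(buf, flash + ofs, len);
    return len;
}

/* Assemble an nsize-byte name that starts hdr_len bytes into the node at
 * node_ofs, given that the first read put retlen bytes into `first`.
 * `name` needs nsize + 1 bytes. Returns 0 on success, -1 on a short read. */
static int assemble_name(const uint8_t *flash, size_t flash_len, size_t node_ofs,
                         const uint8_t *first, size_t retlen, size_t hdr_len,
                         size_t nsize, char *name)
{
    size_t already = 0;

    if (retlen > hdr_len) {                 /* part of the name came with the header */
        already = retlen - hdr_len;
        if (already > nsize)
            already = nsize;
        memcpy(name, first + hdr_len, already);
    }
    if (already < nsize) {                  /* fetch the remainder from flash */
        size_t want = nsize - already;
        if (flash_read(flash, flash_len, node_ofs + hdr_len + already, want,
                       (uint8_t *)name + already) != want)
            return -1;
    }
    name[nsize] = '\0';
    return 0;
}
```

With an 8-byte header and a 12-byte first read, "longfilename" arrives as "long" from the first read and "filename" from the second, read at node_ofs + retlen exactly as in the kernel code.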

fd->nhash = full_name_hash(fd->name, node.d.nsize);

fd->next = NULL;

/* Wheee. We now have a complete jffs2_full_dirent structure, with

the name in it and everything. Link it into the list */

D1(printk(KERN_DEBUG "Adding fd \"%s\", ino #%u\n", fd->name, fd->ino));

jffs2_add_fd_to_list(c, fd, &ret_fd);

break;

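Finally, a hash of the file name is computed and stored in the nhash field, and jffs2_add_fd_to_list uses the nhash value to place the jffs2_full_dirent into the list pointed to by ret_fd, which collects all the entries of the directory (this pointer is eventually returned through a parameter and stored in jffs2_inode_info.dents).

The break then leaves the switch, and the for loop moves on to the file's next node on flash.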
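jffs2_add_fd_to_list() itself is not quoted here. The sketch below shows the idea of a list kept sorted by nhash, assuming (from my reading of the jffs2 source, not from this text) that when two entries carry the same name, the one with the higher version wins and the other is discarded. name_hash() is a hypothetical FNV-1a stand-in for full_name_hash(), make_fd() is a hypothetical constructor, and the kernel would additionally mark the losing entry's node obsolete, which is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct jffs2_full_dirent. */
struct full_dirent_sketch {
    struct full_dirent_sketch *next;
    uint32_t version;
    uint32_t nhash;
    char name[32];
};

/* Stand-in hash (32-bit FNV-1a); the kernel uses full_name_hash() instead. */
static uint32_t name_hash(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}

static struct full_dirent_sketch *make_fd(const char *name, uint32_t version)
{
    struct full_dirent_sketch *fd = calloc(1, sizeof(*fd));
    if (!fd)
        return NULL;
    strncpy(fd->name, name, sizeof(fd->name) - 1);
    fd->version = version;
    fd->nhash = name_hash(fd->name);
    return fd;
}

/* Insert fd keeping the list ordered by nhash. An entry with the same name
 * is resolved by version: the older one is freed. */
static void add_fd_to_list(struct full_dirent_sketch *fd, struct full_dirent_sketch **list)
{
    struct full_dirent_sketch **prev = list;

    while (*prev && (*prev)->nhash <= fd->nhash) {
        if ((*prev)->nhash == fd->nhash && strcmp((*prev)->name, fd->name) == 0) {
            if (fd->version < (*prev)->version) {
                free(fd);                    /* stale entry: keep the existing one */
            } else {
                struct full_dirent_sketch *old = *prev;
                fd->next = old->next;        /* newer entry takes the old one's place */
                *prev = fd;
                free(old);
            }
            return;
        }
        prev = &(*prev)->next;
    }
    fd->next = *prev;
    *prev = fd;
}
```

Comparing nhash first keeps the common case (different names) to a cheap integer comparison; strcmp() only runs when two hashes collide.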

case JFFS2_NODETYPE_INODE:

D1(printk(KERN_DEBUG "Node at %08x (%d) is a data node\n", ref_offset(ref), ref_flags(ref)));

if (retlen < sizeof(node.i)) {

printk(KERN_WARNING "read too short for dnode\n");

err = -EIO;

goto free_out;

}

if (je32_to_cpu(node.i.version) > *highest_version)

*highest_version = je32_to_cpu(node.i.version);

D1(printk(KERN_DEBUG "version %d, highest_version now %d\n", je32_to_cpu(node.i.version),

*highest_version));

if (ref_obsolete(ref)) {

/* Obsoleted. This cannot happen, surely? dwmw2 20020308 */

printk(KERN_ERR "Inode node at 0x%08x became obsolete while we weren't looking\n",

ref_offset(ref));

BUG();

}

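If the node is a jffs2_raw_inode, it is first sanity-checked, much as the directory-entry node was above. When the file system was mounted, jffs2_raw_inode nodes were not CRC-checked; the check was deferred until the file is actually opened, and the REF_UNCHECKED flag was set in their kernel descriptors (see the relevant part of jffs2_scan_inode_node). Now that the file is being opened, the real CRC check is done, covering both the jffs2_raw_inode itself and its trailing data.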

/* If we've never checked the CRCs on this node, check them now. */

if (ref_flags(ref) == REF_UNCHECKED) {

uint32_t crc;

struct jffs2_eraseblock *jeb;

crc = crc32(0, &node, sizeof(node.i)-8);

if (crc != je32_to_cpu(node.i.node_crc)) {

printk(KERN_NOTICE "jffs2_get_inode_nodes(): CRC failed on node at 0x%08x: Read 0x%08x, calculated 0x%08x\n",

ref_offset(ref), je32_to_cpu(node.i.node_crc), crc);

jffs2_mark_node_obsolete(c, ref);

spin_lock_bh(&c->erase_completion_lock);

continue;

}

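The last two fields of jffs2_raw_inode are the CRCs of its trailing data and of the node itself, so the 8 bytes they occupy are excluded when the node's own CRC is computed. If the check fails, the node's kernel descriptor is marked obsolete, the erase_completion_lock spinlock is reacquired, and the loop continues with the next node. If the CRC of the jffs2_raw_inode itself is correct, the trailing data is checked next. (The node itself already had its CRC checked when the file system was mounted and its kernel descriptor was created, so checking it again here is redundant.)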
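A minimal sketch of the "all but the last 8 bytes" rule. It assumes jffs2's crc32() is a plain reflected CRC-32 (polynomial 0xEDB88320) started from seed 0 with no final inversion, which is how I understand its crc32_le() wrapper. inode_node_sketch is a simplified, hypothetical layout that, like jffs2_raw_inode, ends with data_crc followed by node_crc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32, polynomial 0xEDB88320, with no pre- or
 * post-inversion: the caller supplies the seed (0 in jffs2's usage, assumed). */
static uint32_t crc32_le_sketch(uint32_t crc, const void *p, size_t len)
{
    const uint8_t *b = p;
    while (len--) {
        crc ^= *b++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Hypothetical simplified node: the two CRC words are the last 8 bytes. */
struct inode_node_sketch {
    uint32_t ino, version, offset, csize, dsize;
    uint32_t data_crc;
    uint32_t node_crc;
};

/* CRC over everything except data_crc and node_crc. */
static uint32_t node_crc_of(const struct inode_node_sketch *n)
{
    return crc32_le_sketch(0, n, sizeof(*n) - 8);
}

static int node_crc_ok(const struct inode_node_sketch *n)
{
    return node_crc_of(n) == n->node_crc;
}
```

Because data_crc lies outside the covered range, it can be filled in independently of node_crc. Passing seed 0xFFFFFFFF and inverting the result reproduces the standard CRC-32 check value (0xCBF43926 for "123456789"), which is a quick way to validate the routine.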

if (node.i.compr != JFFS2_COMPR_ZERO && je32_to_cpu(node.i.csize)) {

/* FIXME: point() */

char *buf = kmalloc(je32_to_cpu(node.i.csize), GFP_KERNEL);

if (!buf)

return -ENOMEM;

err = jffs2_flash_read(c, ref_offset(ref) + sizeof(node.i), je32_to_cpu(node.i.csize),

&retlen, buf);

if (!err && retlen != je32_to_cpu(node.i.csize))

err = -EIO;

if (err) {

kfree(buf);

return err;

}

crc = crc32(0, buf, je32_to_cpu(node.i.csize));

kfree(buf);

if (crc != je32_to_cpu(node.i.data_crc)) {

printk(KERN_NOTICE "jffs2_get_inode_nodes(): Data CRC failed on node at 0x%08x: Read 0x%08x, calculated 0x%08x\n",

ref_offset(ref), je32_to_cpu(node.i.data_crc), crc);

jffs2_mark_node_obsolete(c, ref);

spin_lock_bh(&c->erase_completion_lock);

continue;

}

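If node.i.compr equals JFFS2_COMPR_ZERO, the node describes a hole. If it is not a hole and compressed data really does follow, that data must be CRC-checked (otherwise no check is needed and this if branch is skipped). csize is the length of the compressed data. The compressed data is first read from flash, and the buffer is freed as soon as its CRC has been computed. If the check fails, the node's kernel descriptor is marked obsolete, the erase_completion_lock spinlock is reacquired, and a new iteration begins.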

/* Mark the node as having been checked and fix the accounting accordingly */

jeb = &c->blocks[ref->flash_offset / c->sector_size];

jeb->used_size += ref->totlen;

jeb->unchecked_size -= ref->totlen;

c->used_size += ref->totlen;

c->unchecked_size -= ref->totlen;

mark_ref_normal(ref);

}//if (ref_flags(ref) == REF_UNCHECKED)

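Once the node has passed its CRC checks, mark_ref_normal changes its kernel descriptor's flag to REF_NORMAL, and the used_size and unchecked_size statistics are updated both in the descriptor of the eraseblock that holds the node and in the file system's jffs2_sb_info (c).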
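The accounting step simply moves a node's length from the "unchecked" bucket to the "used" bucket, both in its eraseblock and in the file-system totals. A minimal sketch with hypothetical structures eraseblock_sketch and sb_sketch (a fixed array of four blocks, purely for illustration). As in the code above, the eraseblock index is flash_offset / sector_size, which is unaffected by the state bits kept in the low bits of flash_offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced counterparts of jffs2_eraseblock and jffs2_sb_info. */
struct eraseblock_sketch {
    uint32_t used_size, unchecked_size;
};

struct sb_sketch {
    uint32_t sector_size;
    uint32_t used_size, unchecked_size;
    struct eraseblock_sketch blocks[4];
};

/* Move a node's length from "unchecked" to "used" in its eraseblock and in
 * the whole-filesystem totals, as done after a successful CRC check. */
static void account_checked(struct sb_sketch *c, uint32_t flash_offset, uint32_t totlen)
{
    struct eraseblock_sketch *jeb = &c->blocks[flash_offset / c->sector_size];

    assert(jeb->unchecked_size >= totlen && c->unchecked_size >= totlen);
    jeb->used_size += totlen;
    jeb->unchecked_size -= totlen;
    c->used_size += totlen;
    c->unchecked_size -= totlen;
}
```

The two counters always move together, so the sum used_size + unchecked_size of each block (and of the whole file system) stays the same.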

tn = jffs2_alloc_tmp_dnode_info();

if (!tn) {

D1(printk(KERN_DEBUG "alloc tn failed\n"));

err = -ENOMEM;

goto free_out;

}

tn->fn = jffs2_alloc_full_dnode();

if (!tn->fn) {

D1(printk(KERN_DEBUG "alloc fn failed\n"));

err = -ENOMEM;

jffs2_free_tmp_dnode_info(tn);

goto free_out;

}

tn->version = je32_to_cpu(node.i.version);

tn->fn->ofs = je32_to_cpu(node.i.offset);

/* There was a bug where we wrote hole nodes out with csize/dsize swapped. Deal with it */

if (node.i.compr == JFFS2_COMPR_ZERO && !je32_to_cpu(node.i.dsize) &&

je32_to_cpu(node.i.csize))

tn->fn->size = je32_to_cpu(node.i.csize);

else // normal case...

tn->fn->size = je32_to_cpu(node.i.dsize);

tn->fn->raw = ref;

D1(printk(KERN_DEBUG "dnode @%08x: ver %u, offset %04x, dsize %04x\n",

ref_offset(ref), je32_to_cpu(node.i.version),

je32_to_cpu(node.i.offset), je32_to_cpu(node.i.dsize)));

jffs2_add_tn_to_list(tn, &ret_tn);

break;

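Next, a jffs2_tmp_dnode_info and a jffs2_full_dnode are allocated for the jffs2_raw_inode and initialized, and the jffs2_tmp_dnode_info is added to the list of the file's entries pointed to by ret_tn. The break then leaves the switch and a new iteration begins.

Note that the jffs2_tmp_dnode_info structures form a list: each one's fn field points to the corresponding jffs2_full_dnode, whose raw field in turn points to the node's kernel descriptor. Building the red-black tree additionally requires jffs2_node_frag structures, which happens only after returning to the caller, jffs2_do_read_inode. (Once the red-black tree has been built, the list of jffs2_tmp_dnode_info structures is freed.)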
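The comment at the top of jffs2_get_inode_nodes says the tmp_dnode_info list is returned "in order of version". A minimal sketch of such an insertion, assuming ascending order (my reading of jffs2_add_tn_to_list, which is not shown in this text); tmp_dnode_sketch is a hypothetical stand-in carrying only the fields needed. Presumably the point is that, when the fragment tree is built from this list, a newer node is applied after, and therefore overrides, an older one covering the same range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced stand-in for struct jffs2_tmp_dnode_info. */
struct tmp_dnode_sketch {
    struct tmp_dnode_sketch *next;
    uint32_t version;
};

/* Insert tn so the list stays in ascending version order. A node whose
 * version equals an existing one is placed in front of it. */
static void add_tn_to_list(struct tmp_dnode_sketch *tn, struct tmp_dnode_sketch **list)
{
    struct tmp_dnode_sketch **prev = list;

    while (*prev && (*prev)->version < tn->version)
        prev = &(*prev)->next;
    tn->next = *prev;
    *prev = tn;
}
```

Walking with a pointer-to-pointer (prev) avoids a special case for inserting at the head of the list.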

Normally a file consists only of jffs2_raw_inode or jffs2_raw_dirent nodes. The default branch handles other, special node types. (What are these special nodes for? Who writes them, and when?)

default:

if (ref_flags(ref) == REF_UNCHECKED) {

struct jffs2_eraseblock *jeb;

printk(KERN_ERR "Eep. Unknown node type %04x at %08x was marked REF_UNCHECKED\n", je16_to_cpu(node.u.nodetype), ref_offset(ref));

/* Mark the node as having been checked and fix the accounting accordingly */

jeb = &c->blocks[ref->flash_offset / c->sector_size];

jeb->used_size += ref->totlen;

jeb->unchecked_size -= ref->totlen;

c->used_size += ref->totlen;

c->unchecked_size -= ref->totlen;

mark_ref_normal(ref);

}

(Leaving aside why special node types appear at all:) if a special node has not yet had its CRC checked, it is simply forced into the checked state. Why?

node.u.nodetype = cpu_to_je16(JFFS2_NODE_ACCURATE | je16_to_cpu(node.u.nodetype));

if (crc32(0, &node, sizeof(struct jffs2_unknown_node)-4) != je32_to_cpu(node.u.hdr_crc)) {

/* Hmmm. This should have been caught at scan time. */

printk(KERN_ERR "Node header CRC failed at %08x. But it must have been OK earlier.\n",

ref_offset(ref));

printk(KERN_ERR "Node was: { %04x, %04x, %08x, %08x }\n",

je16_to_cpu(node.u.magic), je16_to_cpu(node.u.nodetype),

je32_to_cpu(node.u.totlen), je32_to_cpu(node.u.hdr_crc));

jffs2_mark_node_obsolete(c, ref);

}

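Next, the header CRC of the special node is checked (after JFFS2_NODE_ACCURATE has been ORed into nodetype). If the check fails, the node is marked obsolete and the loop moves on; otherwise its compatibility class is examined: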

else switch(je16_to_cpu(node.u.nodetype) & JFFS2_COMPAT_MASK) {

case JFFS2_FEATURE_INCOMPAT:

printk(KERN_NOTICE "Unknown INCOMPAT nodetype %04X at %08x\n",

je16_to_cpu(node.u.nodetype), ref_offset(ref));

/* EEP */

BUG();

break;

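If a node of this type was found when the file system was mounted, the mount was refused. So such a node can never turn up when a file is opened; if it does, it must be a BUG.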

case JFFS2_FEATURE_ROCOMPAT:

printk(KERN_NOTICE "Unknown ROCOMPAT nodetype %04X at %08x\n",

je16_to_cpu(node.u.nodetype), ref_offset(ref));

if (!(c->flags & JFFS2_SB_FLAG_RO))

BUG();

break;

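If a node of this type was found when the file system was mounted, the file system was mounted read-only. So if a file turns out to contain such a node when it is opened, the code checks that the file system really is mounted read-only, and calls BUG() if it is not.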

case JFFS2_FEATURE_RWCOMPAT_COPY:

printk(KERN_NOTICE "Unknown RWCOMPAT_COPY nodetype %04X at %08x\n",

je16_to_cpu(node.u.nodetype), ref_offset(ref));

break;

For a node of this type, nothing further is done.

case JFFS2_FEATURE_RWCOMPAT_DELETE:

printk(KERN_NOTICE "Unknown RWCOMPAT_DELETE nodetype %04X at %08x\n",

je16_to_cpu(node.u.nodetype), ref_offset(ref));

jffs2_mark_node_obsolete(c, ref);

break;

For a node of this type, its kernel descriptor is simply marked obsolete.
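The default branch boils down to classifying the top two bits of nodetype. A minimal sketch, assuming the mask and feature values as I recall them from include/linux/jffs2.h (JFFS2_COMPAT_MASK 0xc000, INCOMPAT 0xc000, ROCOMPAT 0x8000, RWCOMPAT_COPY 0x4000, RWCOMPAT_DELETE 0x0000, NODE_ACCURATE 0x2000). The enum of actions is hypothetical and only summarizes what each case above does.

```c
#include <assert.h>
#include <stdint.h>

/* Values as recalled from include/linux/jffs2.h (treat as assumptions). */
#define JFFS2_COMPAT_MASK             0xc000u
#define JFFS2_NODE_ACCURATE           0x2000u
#define JFFS2_FEATURE_INCOMPAT        0xc000u
#define JFFS2_FEATURE_ROCOMPAT        0x8000u
#define JFFS2_FEATURE_RWCOMPAT_COPY   0x4000u
#define JFFS2_FEATURE_RWCOMPAT_DELETE 0x0000u

/* Hypothetical summary of the four cases in the switch above. */
enum unknown_action { ACT_BUG, ACT_REQUIRE_RO, ACT_KEEP, ACT_OBSOLETE };

static enum unknown_action classify_unknown(uint16_t nodetype)
{
    switch (nodetype & JFFS2_COMPAT_MASK) {
    case JFFS2_FEATURE_INCOMPAT:
        return ACT_BUG;         /* the mount should have been refused */
    case JFFS2_FEATURE_ROCOMPAT:
        return ACT_REQUIRE_RO;  /* only legal on a read-only mount */
    case JFFS2_FEATURE_RWCOMPAT_COPY:
        return ACT_KEEP;        /* leave the node alone */
    case JFFS2_FEATURE_RWCOMPAT_DELETE:
    default:
        return ACT_OBSOLETE;    /* mark the node obsolete */
    }
}
```

Since only the two compatibility bits are examined, a future node type can tell older code how to treat it without that code knowing the type itself.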

}

}//switch

spin_lock_bh(&c->erase_completion_lock);

}//for

spin_unlock_bh(&c->erase_completion_lock);

*tnp = ret_tn;

*fdp = ret_fd;

return 0;

free_out:

jffs2_free_tmp_dnode_info_list(ret_tn);

jffs2_free_full_dirent_list(ret_fd);

return err;

}

At the end of the function, the ret_tn and ret_fd list pointers are handed back through the output parameters.

Chapter 6: How jffs2 writes regular files

When a file is opened and its inode is created, jffs2_read_inode sets up the methods of a regular file as follows:

case S_IFREG:

inode->i_op = &jffs2_file_inode_operations;

inode->i_fop = &jffs2_file_operations;

inode->i_mapping->a_ops = &jffs2_file_address_operations;

inode->i_mapping->nrpages = 0;

break;

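File access methods are provided by the jffs2_file_operations table, and the page-cache methods that file memory mapping relies on by the jffs2_file_address_operations table. They are defined as follows: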

struct file_operations jffs2_file_operations =

{

.llseek = generic_file_llseek,

.open = generic_file_open,

.read = generic_file_read,

.write = generic_file_write,

.ioctl = jffs2_ioctl,

.mmap = generic_file_mmap,

.fsync = jffs2_fsync,

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,29)

.sendfile = generic_file_sendfile

#endif

};

struct address_space_operations jffs2_file_address_operations =

{

.readpage = jffs2_readpage,

.prepare_write = jffs2_prepare_write,

.commit_write = jffs2_commit_write

};

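To improve read/write efficiency, the kernel caches file contents in page-sized units (the page cache). One benefit is that the mmap system call can easily map a file's cached pages directly into a user process's address space, which is how memory-mapped files are implemented. In other words, file memory mapping is built on top of file caching.

The inode has a field i_mapping that points to an address_space structure: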

struct address_space {

struct list_head clean_pages; /* list of clean pages */

struct list_head dirty_pages; /* list of dirty pages */

struct list_head locked_pages; /* list of locked pages */

unsigned long nrpages; /* number of total pages */

struct address_space_operations *a_ops; /* methods */

struct inode *host; /* owner: inode, block_device */

struct vm_area_struct *i_mmap; /* list of private mappings */

struct vm_area_struct *i_mmap_shared; /* list of shared mappings */

spinlock_t i_shared_lock; /* and spinlock protecting it */

int gfp_mask; /* how to allocate the pages */

};

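clean_pages, dirty_pages and locked_pages point to the file's pages in the page cache; i_mmap and i_mmap_shared point to lists of the memory regions (vm_area_struct) of user processes that map the file; host points to the inode that owns this structure; and the address_space_operations table pointed to by a_ops is the interface between the file cache and the device driver (the functions in this table end up calling the device driver).

As shown below, when a user process accesses a file, the corresponding function in jffs2_file_operations is called, and it in turn calls functions in the address_space_operations table to exchange nodes with the flash.

The sys_write function

sys_write handles the write system call: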

asmlinkage ssize_t sys_write(unsigned int fd, const char * buf, size_t count)

{

ssize_t ret;

struct file * file;

ret = -EBADF;

file = fget(fd);

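The process descriptor (PCB) contains a files_struct whose fd[] array holds pointers to file structures, indexed by the process's open file descriptors. fget first returns the address of the file structure for descriptor fd and increments its reference count.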

if (file) {

if (file->f_mode & FMODE_WRITE) {

struct inode *inode = file->f_dentry->d_inode;

ret = locks_verify_area(FLOCK_VERIFY_WRITE, inode, file, file->f_pos, count);

if (!ret) {

ssize_t (*write)(struct file *, const char *, size_t, loff_t *);

ret = -EINVAL;

if (file->f_op && (write = file->f_op->write) != NULL)

ret = write(file, buf, count, &file->f_pos);

}

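Next, provided the file was opened for writing (FMODE_WRITE), the pointer to the file's inode is obtained through the file structure. Before writing, locks_verify_area checks that no mandatory write lock exists on the region to be written. (This is where the name "mandatory" lock comes from: every write performs the check.) If the check passes, the write method in the table pointed to by file->f_op is called.

We have not analyzed sys_open; suffice it to say that when a file object is created, file.f_op is set from the inode.i_fop pointer. So the function called here is generic_file_write from jffs2_file_operations, the table that inode.i_fop points to.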

if (ret > 0)

dnotify_parent(file->f_dentry, DN_MODIFY);

fput(file);

}

return ret;

}

When the write completes, fput decrements the reference count of the file object.

The generic_file_write function

/*
 * Write to a file through the page cache.
 *
 * We currently put everything into the page cache prior to writing it.
 * This is not a problem when writing full pages. With partial pages,
 * however, we first have to read the data into the cache, then
 * dirty the page, and finally schedule it for writing. Alternatively, we
 * could write-through just the portion of data that would go into that
 * page, but that would kill performance for applications that write data
 * line by line, and it's prone to race conditions.
 *
 * Note that this routine doesn't try to keep track of dirty pages. Each
 * file system has to do this all by itself, unfortunately.
 *
 * okir@monad.swb.de
 */

ssize_t

generic_file_write(struct file *file,const char *buf,size_t count, loff_t *ppos)

{

struct address_space *mapping = file->f_dentry->d_inode->i_mapping;

struct inode *inode = mapping->host;

unsigned long limit = current->rlim[RLIMIT_FSIZE].rlim_cur;

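Following the links between the file, dentry and inode structures, the file's inode is reached from file. The current process's resource limit on file size is fetched as well.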

loff_t pos;

struct page *page, *cached_page;

ssize_t written;

long status = 0;

int err;

unsigned bytes;

if ((ssize_t) count < 0)

return -EINVAL;

if (!access_ok(VERIFY_READ, buf, count))

return -EFAULT;

First, the arguments are checked: the amount of data to write must not be negative, and the user-space buffer holding the data must be readable.

cached_page = NULL;

down(&inode->i_sem);

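Before the write starts, the semaphore i_sem must be acquired. Depending on the amount of data, a single write may take several passes to complete, but the current process holds this semaphore throughout the whole write and releases it only just before this function returns (that is, after the write has completed), which makes the write atomic.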

pos = *ppos;

err = -EINVAL;

if (pos < 0)

goto out;

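Every file operation is performed relative to the process's context in the file, namely the file position ppos; a negative position is rejected with -EINVAL.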

err = file->f_error;

if (err) {

file->f_error = 0;

goto out;

}

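A system call that fails may be restarted automatically before control returns to user mode (that is, to the user process that issued it). For example, if a system call blocks and is then woken by a signal, it returns -ERESTARTSYS from the point where it blocked. After ret_from_sys_call, do_signal handles the pending, unblocked signals, and the way a signal is handled decides whether the interrupted system call is restarted automatically: before iret returns to the user-mode system call wrapper, the return address saved on the kernel stack can be modified so that execution resumes at "int 0x80" (on x86), which re-issues the system call from the wrapper instead of returning to the user program. (If the system call completed normally, or the signal's handling does not restart failed system calls, execution returns to the instruction after "int 0x80" in the wrapper.)

The file's f_error field records an error. If it is nonzero, there is no point in re-executing an automatically restarted system call: f_error is cleared and the function leaves through out, returning that error to the caller.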

written = 0;

As shown below, the write may take several passes. written records how much has been written so far; it is zeroed here.

/* FIXME: this is for backwards compatibility with 2.4 */

if (!S_ISBLK(inode->i_mode) && file->f_flags & O_APPEND)

pos = inode->i_size;

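If the O_APPEND flag is set (meaning data may only be appended at the end of the file), the write position is moved to the end of the file (its size). Why the first condition? (Is generic_file_write also used to write files other than regular files?)

/* Check whether we've reached the file size limit. */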

err = -EFBIG;

if (!S_ISBLK(inode->i_mode) && limit != RLIM_INFINITY) {

if (pos >= limit) {

send_sig(SIGXFSZ, current, 0);

goto out;

}

if (pos > 0xFFFFFFFFULL || count > limit - (u32)pos) {

/* send_sig(SIGXFSZ, current, 0); */

count = limit - (u32)pos;

}

}

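Next, the code checks whether the write would exceed the configured file-size limit. Unless writes are unlimited (the limit is RLIM_INFINITY): if the write position is at or beyond the limit, SIGXFSZ is sent to the current process and the function exits, returning -EFBIG; if the amount to write exceeds what may still be written, it is reduced to the amount allowed.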
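The size-limit check can be isolated into a small function. A minimal sketch, assuming a 64-bit position; LIMIT_INFINITY is a hypothetical stand-in for RLIM_INFINITY, and the kernel's additional clamp for positions above 0xFFFFFFFF (the u32 casts) is left out. A return value of -1 corresponds to the -EFBIG path, where the kernel also sends SIGXFSZ.

```c
#include <assert.h>
#include <stdint.h>

#define LIMIT_INFINITY UINT64_MAX /* hypothetical stand-in for RLIM_INFINITY */

/* Returns -1 if the write must fail (EFBIG), otherwise the byte count,
 * shortened so that pos + count does not pass the limit. */
static long long clamp_to_fsize_limit(uint64_t pos, uint64_t count, uint64_t limit)
{
    if (limit == LIMIT_INFINITY)
        return (long long)count;
    if (pos >= limit)
        return -1;
    if (count > limit - pos)
        count = limit - pos;
    return (long long)count;
}
```

For instance, with a limit of 120 bytes, a 50-byte write at position 100 becomes a 20-byte short write, while the same write at position 120 fails outright.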

The following code has to do with the "LFS rule". (What is it? I have not looked into it yet.)

/* LFS rule */

if ( pos + count > MAX_NON_LFS && !(file->f_flags&O_LARGEFILE)) {

if (pos >= MAX_NON_LFS) {

send_sig(SIGXFSZ, current, 0);

goto out;

}

if (count > MAX_NON_LFS - (u32)pos) {

/* send_sig(SIGXFSZ, current, 0); */

count = MAX_NON_LFS - (u32)pos;

}

}

/*
 * Are we about to exceed the fs block limit ?
 *
 * If we have written data it becomes a short write
 * If we have exceeded without writing data we send
 * a signal and give them an EFBIG.
 *
 * Linus frestrict idea will clean these up nicely..
 */

if (!S_ISBLK(inode->i_mode)) {

if (pos >= inode->i_sb->s_maxbytes)

{

if (count || pos > inode->i_sb->s_maxbytes) {

send_sig(SIGXFSZ, current, 0);

err = -EFBIG;

goto out;

}

/* zero-length writes at ->s_maxbytes are OK */

}

if (pos + count > inode->i_sb->s_maxbytes)

count = inode->i_sb->s_maxbytes - pos;

} else {

if (is_read_only(inode->i_rdev)) {

err = -EPERM;

goto out;

}

if (pos >= inode->i_size) {

if (count || pos > inode->i_size) {

err = -ENOSPC;

goto out;

}

}

if (pos + count > inode->i_size)

count = inode->i_size - pos;

}

err = 0;

if (count == 0)

goto out;

remove_suid(inode);

inode->i_ctime = inode->i_mtime = CURRENT_TIME;

mark_inode_dirty_sync(inode);

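As long as there is data left to write, the real write starts below. The timestamps in the VFS inode are updated, and mark_inode_dirty_sync marks the inode dirty so that it will later be written back to the on-device inode.

(I have not studied remove_suid or the mechanism behind it. According to 《Linux內核源代碼情景分析》 (Scenario Analysis), if the current process lacks the setuid privilege and the target file has the setuid/setgid attributes, it strips those attributes from the target file; see volume 1, p. 588.)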

if (file->f_flags & O_DIRECT)

goto o_direct;

What does the file's O_DIRECT flag do? Does it indicate an I/O device? I have not yet studied the related generic_file_direct_IO function.

do {

unsigned long index, offset;

long page_fault;

char *kaddr;

/* Try to find the page in the cache. If it isn't there, allocate a free page. */

offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */

index = pos >> PAGE_CACHE_SHIFT;

bytes = PAGE_CACHE_SIZE - offset;

if (bytes > count)

bytes = count;

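Logically, a file is a contiguous linear space that can be viewed as a sequence of consecutive pages. Depending on the amount of data and the starting position, the write may take several passes through a loop, and each pass writes into only one page.

At the start of each pass, the code computes the page involved, the offset of the write within that page, and the amount to write. pos is the current write position in the file: shifting it right by PAGE_CACHE_SHIFT gives the page number index within the file, and masking it with PAGE_CACHE_SIZE - 1 gives the offset within that page, offset. bytes is the amount written into this page (see the figure below); if it exceeds the amount of data remaining, it is reduced accordingly.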

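[Figure: the write range (pos, count) within the file, showing for one page its page number index, the in-page offset and the per-page length bytes]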
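The per-page split at the top of the loop can be reproduced outside the kernel. A minimal sketch, assuming 4 KiB pages (PAGE_SHIFT_SKETCH = 12 is a hypothetical stand-in for PAGE_CACHE_SHIFT) and that every pass commits all of its bytes, so pos and count advance by bytes each time; struct chunk and split_write() are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12u                        /* assumed 4 KiB pages */
#define PAGE_SIZE_SKETCH  (1u << PAGE_SHIFT_SKETCH)

struct chunk { unsigned long index, offset, bytes; };

/* Split a write of `count` bytes at file position `pos` into per-page
 * pieces the way the do/while loop does. Returns the number of chunks. */
static size_t split_write(uint64_t pos, size_t count, struct chunk *out, size_t max)
{
    size_t n = 0;

    while (count && n < max) {
        unsigned long offset = (unsigned long)(pos & (PAGE_SIZE_SKETCH - 1));
        unsigned long index  = (unsigned long)(pos >> PAGE_SHIFT_SKETCH);
        unsigned long bytes  = PAGE_SIZE_SKETCH - offset;

        if (bytes > count)
            bytes = count;
        out[n++] = (struct chunk){ index, offset, bytes };
        pos += bytes;    /* like pos += status after a complete commit */
        count -= bytes;
    }
    return n;
}
```

For example, a 10000-byte write starting at pos 4000 becomes four passes: 96 bytes at offset 4000 of page 0, the full pages 1 and 2, and 1712 bytes at the start of page 3.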

/*
 * Bring in the user page that we will copy from _first_.
 * Otherwise there's a nasty deadlock on copying from the
 * same page as we're writing to, without it being marked
 * up-to-date.
 */

{ volatile unsigned char dummy;

__get_user(dummy, buf);

__get_user(dummy, buf+bytes-1);

}

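__get_user accesses user space; the two __get_user calls here read the first and last bytes of the user-space write buffer. This guarantees that physical page frames have been allocated for the user buffer. What does the author's comment mean??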

status = -ENOMEM; /* we'll assign it later anyway */

page = __grab_cache_page(mapping, index, &cached_page);

if (!page)

break;

/* We have exclusive IO access to the page.. */

if (!PageLocked(page)) {

PAGE_BUG(page);

}

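Having computed the page to be written in this pass, __grab_cache_page returns the physical page frame for that page in the kernel's page cache. Pointers to all page frames in the page cache are kept in the hash table page_hash_table. Since a page's offset within a file is obviously not unique system-wide, the hash is also computed from the pointer to the file's address_space structure, mapping (treating the pointer value as an unsigned long). In addition, the page frame's reference count is incremented when its descriptor is returned, and decremented again at the end of the pass.

The page descriptor (struct page) obtained this way comes back locked (the PAGE_BUG check asserts this), and the page stays locked until the end of the pass.

Note that each pass operates on only one page frame, so the lock granularity there is the page frame, whereas the write as a whole consists of several passes over the entire file, so its lock granularity is the whole file. Recall that inode.i_sem was acquired before entering the loop and is released only just before generic_file_write returns, which makes the whole file write atomic; locking the page frame within a pass makes the operations on that page frame atomic for the duration of the pass.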

kaddr = kmap(page);

status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes);

if (status)

goto sync_failure;

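Having obtained the page frame, some preparation is needed before the real write. For example, if the page frame is not "Uptodate", the page must first be read from the device (because this write does not necessarily cover the whole page). For ext2, if the page frame has just been allocated and added to the page cache, buffer_head descriptors must also be created for the disk-block buffers inside it. All of this is done by the function that prepare_write points to in the file's address_space_operations table. For jffs2, since the underlying flash driver does not use disk-block buffers, the page frame only has to be read in when it is out of date and the write does not cover the whole page frame. In addition, if the page frame starts beyond the end of the file, this pass would create a hole in the file, and the subsequent write does not describe that hole, so a jffs2_raw_inode node describing it must be written to flash. See below for details. (How does ext2 handle holes?)

In addition, kmap returns the kernel virtual address of the page frame, kaddr.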

page_fault = __copy_from_user(kaddr+offset, buf, bytes);

flush_dcache_page(page);

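Then __copy_from_user copies bytes bytes from the user buffer buf into the page frame at offset offset. Note that offset and bytes were set at the start of the pass to the offset and length for the current page frame.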

status = mapping->a_ops->commit_write(file, page, offset, offset+bytes);

if (page_fault)

goto fail_write;

if (!status)

status = bytes;

if (status >= 0) {

written += status;

count -= status;

pos += status;

buf += status;

}

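After the data has been copied from user space into the page frame in the page cache, the real write can take place. It is performed by the function commit_write points to in the address_space_operations table, namely jffs2_commit_write, analyzed later. When the write is done, the amount written so far (written), the amount remaining (count), the next write position (pos) and the user-buffer pointer (buf) are all updated by the amount actually written.

On ext2 the write is asynchronous: generic_file_write only copies the data into the page frame in the page cache, and commit_write merely hands the dirty page frame to kflushd, the kernel thread that later flushes dirty page frames back to disk asynchronously. On jffs2 the write is synchronous and is performed here immediately (the flash driver provides the blocking and wake-up mechanism during the write).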

unlock:

kunmap(page);

/* Mark it unlocked again and drop the page.. */

SetPageReferenced(page);

UnlockPage(page);

page_cache_release(page);

if (status < 0)

break;

} while (count);

Before the pass ends, the page frame is unlocked and its reference count is decremented.

done:

*ppos = pos;

if (cached_page)

page_cache_release(cached_page);

/* For now, when the user asks for O_SYNC, we'll actually provide O_DSYNC. */

if (status >= 0) {

if ((file->f_flags & O_SYNC) || IS_SYNC(inode))

status = generic_osync_inode(inode, OSYNC_METADATA|OSYNC_DATA);

}

out_status:

err = written ? written : status;

out:

up(&inode->i_sem);

return err;

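Before generic_file_write returns, the file position ppos is set to the value pos has after the last pass, the total amount written (or an error code) is returned, and the semaphore inode.i_sem held over the whole file is released.

In addition, on ext2, if the O_SYNC flag is set, the file's dirty page frames in the page cache must be flushed back to disk immediately. This is done by generic_osync_inode (I have not yet studied what this function does on jffs2).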

fail_write:

status = -EFAULT;

goto unlock;

sync_failure:

/*
 * If blocksize < pagesize, prepare_write() may have instantiated a
 * few blocks outside i_size. Trim these off again.
 */

kunmap(page);

UnlockPage(page);

page_cache_release(page);

if (pos + bytes > inode->i_size)

vmtruncate(inode, inode->i_size);

goto done;

o_direct:

written = generic_file_direct_IO(WRITE, file, (char *) buf, count, pos);

if (written > 0) {

loff_t end = pos + written;

if (end > inode->i_size && !S_ISBLK(inode->i_mode)) {

inode->i_size = end;

mark_inode_dirty(inode);

}

*ppos = end;

invalidate_inode_pages2(mapping);

}

/*
 * Sync the fs metadata but not the minor inode changes and
 * of course not the data as we did direct DMA for the IO.
 */

if (written >= 0 && file->f_flags & O_SYNC)

status = generic_osync_inode(inode, OSYNC_METADATA);

goto out_status;

}

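The jffs2_prepare_write function

In one pass of generic_file_write, the data in the range [start, end) is to be written into page frame pg, and the following preparation must be done first. If pg starts beyond the end of the file, this pass will create a hole in the file, so a jffs2_raw_inode node describing the hole must be written to flash. Also, if pg's contents are not up to date and the write does not cover the whole page frame, the page frame's contents must first be read from flash. jffs2_prepare_write does both.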

int jffs2_prepare_write (struct file *filp, struct page *pg, unsigned start, unsigned end)

{

struct inode *inode = pg->mapping->host;

struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);

uint32_t pageofs = pg->index << PAGE_CACHE_SHIFT;

int ret = 0;

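When a page frame is in the page cache, the mapping field of its descriptor page points to the address_space structure of the file it belongs to, and that structure's host field points to the file's inode. The index field of the page descriptor is the page's number within the file, so pageofs is the page frame's offset within the file.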

down(&f->sem);

D1(printk(KERN_DEBUG "jffs2_prepare_write()\n"));

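inode.i_sem is held by the current thread of execution for the whole of generic_file_write/read to make file reads and writes atomic; it synchronizes user processes at the upper level. jffs2_inode_info.sem, by contrast, synchronizes the low-level read/write path with GC:

During a file write, jffs2_prepare_write may write a node representing a hole and jffs2_commit_write writes new nodes, while each run of the GC kernel thread writes a copy of a valid node into a new eraseblock; in other words, GC also works by writing nodes. Writing a new node also requires:

1. creating a new kernel descriptor jffs2_raw_node_ref and adding it to the nodes list of the file's descriptor jffs2_inode_cache;

2. creating a new jffs2_full_dnode and pointing the node field of the corresponding jffs2_node_frag in the red-black tree at it.

So the jffs2_inode_cache.nodes list and the red-black tree nodes are resources shared by the write path and GC, and a synchronization mechanism is needed to avoid race conditions; that is the purpose of jffs2_inode_info.sem. (Synchronization for lower-level access to the flash chip is handled by the flash driver. This shows that each layer of the operating system needs its own synchronization mechanism!)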

if (pageofs > inode->i_size) {

/* Make new hole frag from old EOF to new page */

struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb);

struct jffs2_raw_inode ri;

struct jffs2_full_dnode *fn;

uint32_t phys_ofs, alloc_len;

D1(printk(KERN_DEBUG "Writing new hole frag 0x%x-0x%x between current EOF and new page\n",

(unsigned int)inode->i_size, pageofs));

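If the new page starts beyond the end of the file, this write creates a hole between the old end of the file and the start of the page, so a jffs2_raw_inode node describing the hole must be written to flash.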

ret = jffs2_reserve_space(c, sizeof(ri), &phys_ofs, &alloc_len, ALLOC_NORMAL);

if (ret) {

up(&f->sem);

return ret;

}

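Before a jffs2_raw_inode node is written to flash, jffs2_reserve_space must find a suitable region on flash, whose position and length are returned through phys_ofs and alloc_len. (Allocating flash space may trigger GC when free space runs short, and the choice of eraseblock must also take the wear-levelling policy into account. I have not studied this function in detail yet.)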

memset(&ri, 0, sizeof(ri));

ri.magic = cpu_to_je16(JFFS2_MAGIC_BITMASK);

ri.nodetype = cpu_to_je16(JFFS2_NODETYPE_INODE);

ri.totlen = cpu_to_je32(sizeof(ri));

ri.hdr_crc = cpu_to_je32(crc32(0, &ri, sizeof(struct jffs2_unknown_node)-4));

ri.ino = cpu_to_je32(f->inocache->ino);

ri.version = cpu_to_je32(++f->highest_version);

ri.mode = cpu_to_je32(inode->i_mode);

ri.uid = cpu_to_je16(inode->i_uid);

ri.gid = cpu_to_je16(inode->i_gid);

ri.isize = cpu_to_je32(max((uint32_t)inode->i_size, pageofs));

ri.atime = ri.ctime = ri.mtime = cpu_to_je32(CURRENT_TIME);

ri.offset = cpu_to_je32(inode->i_size);

ri.dsize = cpu_to_je32(pageofs - inode->i_size);

ri.csize = cpu_to_je32(0);

ri.compr = JFFS2_COMPR_ZERO;

ri.node_crc = cpu_to_je32(crc32(0, &ri, sizeof(ri)-8));

ri.data_crc = cpu_to_je32(0);

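As the code shows, a hole is represented by a single jffs2_raw_inode node with no trailing data, so totlen in its header equals the size of the structure itself. offset and dsize are the hole's starting position in the file and its length, namely i_size and pageofs - i_size. Note that a node representing a hole has its compr field set to JFFS2_COMPR_ZERO. jffs2_write_dnode then writes the node to flash, creates the corresponding kernel descriptor jffs2_raw_node_ref, and links it into the lists. See below for details.

(In fact, everything from the old end of the file up to start within this page is a hole, i.e. the range [i_size, pageofs + start) of the file. But the node here describes only the first part of the hole, [i_size, pageofs), and not the remaining part [pageofs, pageofs + start). So I think the hole's length should be pageofs - inode->i_size + start.)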
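The fields of the hole node follow directly from i_size and pageofs. A minimal sketch; hole_node is a hypothetical subset of jffs2_raw_inode, and COMPR_ZERO_ASSUMED stands in for JFFS2_COMPR_ZERO (0x01 is the value I believe jffs2.h uses; treat it as an assumption). It follows the code as written, so the hole ends at pageofs, not at pageofs + start.

```c
#include <assert.h>
#include <stdint.h>

#define COMPR_ZERO_ASSUMED 0x01u /* assumed value of JFFS2_COMPR_ZERO */

/* Hypothetical subset of the jffs2_raw_inode fields set for a hole. */
struct hole_node {
    uint32_t offset, dsize, csize, isize;
    uint8_t compr;
};

/* Fill in the hole node prepare_write writes when the target page starts
 * beyond EOF. Returns 0 (and leaves *ri untouched) if no hole is needed. */
static int make_hole_node(uint32_t i_size, uint32_t pageofs, struct hole_node *ri)
{
    if (pageofs <= i_size)
        return 0;
    ri->offset = i_size;            /* hole starts at the old EOF */
    ri->dsize  = pageofs - i_size;  /* ...and runs up to the page start */
    ri->csize  = 0;                 /* no payload follows the header */
    ri->isize  = pageofs;           /* max(i_size, pageofs) */
    ri->compr  = COMPR_ZERO_ASSUMED;
    return 1;
}
```

For a 1000-byte file and a write into the page at offset 8192, the node covers [1000, 8192): offset 1000, dsize 7192, and the file size recorded in the node becomes 8192.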

fn = jffs2_write_dnode(c, f, &ri, NULL, 0, phys_ofs, NULL);

if (IS_ERR(fn)) {

ret = PTR_ERR(fn);

jffs2_complete_reservation(c);

up(&f->sem);

return ret;

}

ret = jffs2_add_full_dnode_to_inode(c, f, fn);

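In addition, the node needs a jffs2_full_dnode (returned by jffs2_write_dnode as fn) and a jffs2_node_frag, and the red-black tree must be updated. (jffs2_add_full_dnode_to_inode was already encountered earlier, when opening a file and creating its inode; it inserts nodes into the red-black tree or modifies existing ones. I have not studied it in depth yet.)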

if (f->metadata) {

jffs2_mark_node_obsolete(c, f->metadata->raw);

jffs2_free_full_dnode(f->metadata);

f->metadata = NULL;

}

Question: what do the operations above have to do with f->metadata?

if (ret) {

D1(printk(KERN_DEBUG "Eep. add_full_dnode_to_inode() failed in prepare_write, returned %d\n", ret));

jffs2_mark_node_obsolete(c, fn->raw);

jffs2_free_full_dnode(fn);

jffs2_complete_reservation(c);

up(&f->sem);

return ret;

}

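If adding to the red-black tree fails, the jffs2_full_dnode is freed. Since the node has already been written to flash, its kernel descriptor is not deleted but marked obsolete. jffs2_complete_reservation sends SIGHUP to the GC kernel thread to wake it up.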

jffs2_complete_reservation(c);

inode->i_size = pageofs;

}//if (pageofs > inode->i_size)

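If all goes well, the file size is increased by the length of the hole. Note that the hole itself is not part of the data being written, so neither the remaining amount count nor the amount written written is updated; only the file size is adjusted. (Why wake the GC kernel thread here? When is jffs2_complete_reservation supposed to be called??)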

/* Read in the page if it wasn't already present, unless it's a whole page */

if (!PageUptodate(pg) && (start || end < PAGE_CACHE_SIZE))

ret = jffs2_do_readpage_nolock(inode, pg);

D1(printk(KERN_DEBUG "end prepare_write(). pg->flags %lx\n", pg->flags));

up(&f->sem);

return ret;

}

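Finally, if the page frame's contents are not "Uptodate" and the region to be written does not cover the whole page frame, the whole page frame must first be read from the device, then the relevant range written, and finally the whole page frame written back. Otherwise the data in the page frame outside this write's range would be lost. jffs2_do_readpage_nolock is analyzed later.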
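The read-before-write test at the end of jffs2_prepare_write is a two-part condition: the page must be stale and the write must leave part of it untouched. A minimal sketch with a hypothetical PAGE_SIZE_ASSUMED of 4096 in place of PAGE_CACHE_SIZE:

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE_ASSUMED 4096u /* hypothetical stand-in for PAGE_CACHE_SIZE */

/* True when prepare_write must first fill the page from flash: the page is
 * not up to date and [start, end) does not cover all of it. */
static bool needs_read_before_write(bool uptodate, unsigned start, unsigned end)
{
    return !uptodate && (start != 0 || end < PAGE_SIZE_ASSUMED);
}
```

A full-page write never triggers the read, because every byte of the page is about to be overwritten anyway.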

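The jffs2_commit_write function

This function writes the data in the range [start, end) of page frame pg to flash. A jffs2_raw_inode node header is written first, followed by the data. At the same time, the corresponding kernel descriptor jffs2_raw_node_ref, a jffs2_full_dnode and a jffs2_node_frag are created and the red-black tree is updated: either a new tree node is inserted or an outdated one is updated (so that the red-black tree refers only to valid nodes).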

int jffs2_commit_write (struct file *filp, struct page *pg, unsigned start, unsigned end)

{

/* Actually commit the write from the page cache page we're looking at.

* For now, we write the full page out each time. It sucks, but it's simple */

struct inode *inode = pg->mapping->host;

struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);

struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb);

struct jffs2_raw_inode *ri;

int ret = 0;

uint32_t writtenlen = 0;

D1(printk(KERN_DEBUG "jffs2_commit_write(): ino #%lu, page at 0x%lx, range %d-%d, flags %lx\n",

inode->i_ino, pg->index << PAGE_CACHE_SHIFT, start, end, pg->flags));

if (!start && end == PAGE_CACHE_SIZE) {

/* We need to avoid deadlock with page_cache_read() in

jffs2_garbage_collect_pass(). So we have to mark the

page up to date, to prevent page_cache_read() from trying to re-lock it. */

SetPageUptodate(pg);

}

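If the write covers the whole page, the page's Uptodate flag is set before writing. According to the author's comment, this avoids a deadlock with the page_cache_read() called from GC (jffs2_garbage_collect_pass). Why would a deadlock occur when the whole page is being written? How would it arise? Can it also occur when only part of the page is written? Answering this needs a study of GC.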

ri = jffs2_alloc_raw_inode();

if (!ri) {

D1(printk(KERN_DEBUG "jffs2_commit_write(): Allocation of raw inode failed\n"));

return -ENOMEM;

}

/* Set the fields that the generic jffs2_write_inode_range() code can't find */

ri->ino = cpu_to_je32(inode->i_ino);

ri->mode = cpu_to_je32(inode->i_mode);

ri->uid = cpu_to_je16(inode->i_uid);

ri->gid = cpu_to_je16(inode->i_gid);

ri->isize = cpu_to_je32((uint32_t)inode->i_size); // file size

ri->atime = ri->ctime = ri->mtime = cpu_to_je32(CURRENT_TIME);

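Before writing to flash, a jffs2_raw_inode must be prepared and filled in from the file's inode. Only some fields are set here. The fields that depend on the length of the trailing data are set later, in the function that writes to flash. The reason is that several jffs2_raw_inode nodes may have to be written, depending on the amount of data, because the data following a node has a maximum length. All these nodes carry the same file information, which can be set once here. The length-related fields and the version number can only be determined in the write function.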

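Next, the jffs2_raw_inode and its data are written to flash together. Note that generic_file_write has already obtained the kernel virtual address of page pg via kmap. So the fourth argument is the kernel virtual address of the write position within the page, and the fifth is the logical offset of the write position within the file. The length is end - start, and the amount actually written is returned through writtenlen. This function also creates the node's kernel descriptors (jffs2_raw_node_ref, jffs2_full_dnode and jffs2_node_frag) and updates the red-black tree. Details follow.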

/* We rely on the fact that generic_file_write() currently kmaps the page for us. */

// Pass the start virtual address of the data and its logical offset in the file (jffs2_raw_inode.offset records the data's offset within the file)

ret = jffs2_write_inode_range(c, f, ri, page_address(pg) + start,

(pg->index << PAGE_CACHE_SHIFT) + start, end - start, &writtenlen);

if (ret) {

/* There was an error writing. */

SetPageError(pg);

}

if (writtenlen) {

if (inode->i_size < (pg->index << PAGE_CACHE_SHIFT) + start + writtenlen) {

inode->i_size = (pg->index << PAGE_CACHE_SHIFT) + start + writtenlen;

inode->i_blocks = (inode->i_size + 511) >> 9;

inode->i_ctime = inode->i_mtime = je32_to_cpu(ri->ctime);

}

}

jffs2_free_raw_inode(ri);

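(pg->index << PAGE_CACHE_SHIFT) + start is the logical offset of the write position within the file, and writtenlen bytes were written this time, so their sum is the new end of the written data. If this lies beyond the old end of file, i_size is set to it (and i_blocks and the timestamps are updated). Then the jffs2_raw_inode is freed.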
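The size bookkeeping can be sketched as two small functions; the names are hypothetical. The explicit parentheses matter: in C, + binds tighter than <<, so index << shift + start would shift by shift + start.

```c
#include <assert.h>
#include <stdint.h>

/* End of the data just written, as a file offset. */
static uint64_t write_end(uint64_t index, unsigned shift,
                          uint32_t start, uint32_t writtenlen)
{
    return (index << shift) + start + writtenlen;
}

/* Grow i_size only if the write went past the old EOF, and recompute
 * i_blocks in 512-byte units, rounded up, as jffs2_commit_write does. */
static void update_size(uint64_t *i_size, uint64_t *i_blocks, uint64_t end)
{
    if (*i_size < end) {
        *i_size = end;
        *i_blocks = (*i_size + 511) >> 9;
    }
}
```

A write that ends inside the existing file leaves both values unchanged.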

if (start+writtenlen < end) {

/* generic_file_write has written more to the page cache than we've

actually written to the medium. Mark the page !Uptodate so that it gets reread */

D1(printk(KERN_DEBUG "jffs2_commit_write(): Not all bytes written. Marking page !uptodate\n"));

SetPageError(pg);

ClearPageUptodate(pg);

}

D1(printk(KERN_DEBUG "jffs2_commit_write() returning %d\n",writtenlen?writtenlen:ret));

return writtenlen?writtenlen:ret;

}

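As shown above, the page's Uptodate flag is set at the start of the function if the write covers the whole page. If fewer bytes than requested were written, i.e. the page was not completely written, the Uptodate flag must be cleared (and the error flag set) so that the page gets re-read.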

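Also, generic_file_write writes one page per loop iteration. Why don't we see the Uptodate flag being set after a page is written? (Probably because it has already been set by then, in one of three ways. For a partial write, jffs2_prepare_write reads the page through jffs2_do_readpage_nolock, which sets Uptodate on success. For a full-page write, jffs2_commit_write sets it at its start. A page that was already Uptodate needs nothing.)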

The jffs2_write_inode_range function

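This function writes jffs2_raw_inode nodes and their data to flash. The fourth argument is the kernel virtual address of the data to write, and the fifth is the logical offset of the write position within the file. The write length is writelen (end - start at the call site), and the amount actually written is returned through retlen. The function also creates the kernel descriptors jffs2_raw_node_ref, jffs2_full_dnode and jffs2_node_frag, and updates the red-black tree.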

/* The OS-specific code fills in the metadata in the jffs2_raw_inode for us, so that

we don't have to go digging in struct inode or its equivalent. It should set:

mode, uid, gid, (starting)isize, atime, ctime, mtime */

int jffs2_write_inode_range(struct jffs2_sb_info *c, struct jffs2_inode_info *f,

struct jffs2_raw_inode *ri, unsigned char *buf,

uint32_t offset, uint32_t writelen, uint32_t *retlen)

{

int ret = 0;

uint32_t writtenlen = 0;

D1(printk(KERN_DEBUG "jffs2_write_inode_range(): Ino #%u, ofs 0x%x, len 0x%x\n",

f->inocache->ino, offset, writelen));

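Because the data carried by each node has a length limit, several jffs2_raw_inode nodes may be needed, depending on the amount of data. All of them carry the same file information, which jffs2_commit_write has already set. The length-related fields and the version number can only be determined here.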

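The loop below writes as many nodes as needed to get all the data onto flash. Before each node is written, the data may be compressed and the length-related fields of jffs2_raw_inode are set. After the write, the kernel descriptor jffs2_raw_node_ref and the corresponding jffs2_full_dnode and jffs2_node_frag are created, and the red-black tree is updated. Either a new tree node is inserted, or an existing jffs2_node_frag is changed to point to the new jffs2_full_dnode (so that the tree only refers to valid nodes). The old nodes covering the same range are marked obsolete.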

while(writelen) {

struct jffs2_full_dnode *fn;

unsigned char *comprbuf = NULL;

unsigned char comprtype = JFFS2_COMPR_NONE;

uint32_t phys_ofs, alloclen;

uint32_t datalen, cdatalen;

D2(printk(KERN_DEBUG "jffs2_commit_write() loop: 0x%x to write to 0x%x\n", writelen, offset));

ret = jffs2_reserve_space(c, sizeof(*ri) + JFFS2_MIN_DATA_LEN, &phys_ofs, &alloclen,

ALLOC_NORMAL);

if (ret) {

D1(printk(KERN_DEBUG "jffs2_reserve_space returned %d\n", ret));

break;

}

down(&f->sem);

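First, jffs2_reserve_space finds suitable space on flash, and phys_ofs and alloclen return its location and size. The second argument, sizeof(*ri) + JFFS2_MIN_DATA_LEN, is the minimum size requested: room for a node header plus a minimum amount of data. The space actually granted (alloclen) may be larger, and it is alloclen that bounds how much data this node can carry. The semaphore jffs2_inode_info.sem is held during the write; it synchronizes writes with GC, as discussed above.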

datalen = writelen;

cdatalen = min(alloclen - sizeof(*ri), writelen);

comprbuf = kmalloc(cdatalen, GFP_KERNEL);

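writelen is the amount of data still to be written, and alloclen is the space reserved for this node. So at most cdatalen = min(alloclen - sizeof(*ri), writelen) bytes of payload fit into the node. A buffer of that size is allocated to hold the compressed data, and jffs2_compress compresses the raw data. Its arguments are the raw data in buf (length datalen) and the output buffer comprbuf (length cdatalen). On return, datalen is the amount of raw data actually consumed. cdatalen is the length of the compressed output, which may be less than the size of comprbuf. (I have not yet studied the compression algorithms used by jffs2.)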

if (comprbuf) {

comprtype = jffs2_compress(buf, comprbuf, &datalen, &cdatalen);

}

if (comprtype == JFFS2_COMPR_NONE) {

/* Either compression failed, or the allocation of comprbuf failed */

if (comprbuf)

kfree(comprbuf);

comprbuf = buf;

datalen = cdatalen;

}

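If compression fails (or comprbuf could not be allocated), the node's data is stored uncompressed. comprbuf then points to the original buffer buf, and datalen is set to cdatalen, so the node carries cdatalen bytes of raw data. If compression succeeds, then by the semantics of jffs2_compress, datalen is the amount of raw data consumed and cdatalen is the compressed size. These two values determine the length-related fields of the node: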

/* Now comprbuf points to the data to be written, be it compressed or not.

comprtype holds the compression type, and comprtype == JFFS2_COMPR_NONE means

that the comprbuf doesn't need to be kfree()d. */

ri->magic = cpu_to_je16(JFFS2_MAGIC_BITMASK);

ri->nodetype = cpu_to_je16(JFFS2_NODETYPE_INODE);

ri->totlen = cpu_to_je32(sizeof(*ri) + cdatalen);

ri->hdr_crc = cpu_to_je32(crc32(0, ri, sizeof(struct jffs2_unknown_node)-4));

ri->ino = cpu_to_je32(f->inocache->ino);

ri->version = cpu_to_je32(++f->highest_version);

ri->isize = cpu_to_je32(max(je32_to_cpu(ri->isize), offset + datalen));

ri->offset = cpu_to_je32(offset);

ri->csize = cpu_to_je32(cdatalen);

ri->dsize = cpu_to_je32(datalen);

ri->compr = comprtype;

ri->node_crc = cpu_to_je32(crc32(0, ri, sizeof(*ri)-8));

ri->data_crc = cpu_to_je32(crc32(0, comprbuf, cdatalen));

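So totlen in the node header is the length of the node itself plus its trailing data. The version numbers of all nodes of a file increase monotonically; the highest version of a file is recorded in jffs2_inode_info.highest_version. isize is the larger of the previous size and offset + datalen. csize and dsize are the compressed and decompressed data lengths, and compr identifies the compression algorithm. node_crc is the CRC of the node itself, and data_crc is the CRC of the trailing data. Finally, offset is the logical offset within the file of the data carried by this node.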

fn = jffs2_write_dnode(c, f, ri, comprbuf, cdatalen, phys_ofs, NULL);

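With the node and its data prepared, they can be written to flash one after the other. jffs2_write_dnode writes the jffs2_raw_inode and the (possibly compressed) data at phys_ofs on flash. It also allocates the kernel descriptor jffs2_raw_node_ref and the corresponding jffs2_full_dnode, adds the former to the file's list and returns the address of the latter.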

if (comprtype != JFFS2_COMPR_NONE)

kfree(comprbuf);

if (IS_ERR(fn)) {

ret = PTR_ERR(fn);

up(&f->sem);

jffs2_complete_reservation(c);

break;

}

ret = jffs2_add_full_dnode_to_inode(c, f, fn);

if (f->metadata) {

jffs2_mark_node_obsolete(c, f->metadata->raw);

jffs2_free_full_dnode(f->metadata);

f->metadata = NULL;

}

if (ret) { /* Eep */

D1(printk(KERN_DEBUG "Eep. add_full_dnode_to_inode() failed in commit_write, returned %d\n", ret));

jffs2_mark_node_obsolete(c, fn->raw);

jffs2_free_full_dnode(fn);

up(&f->sem);

jffs2_complete_reservation(c);

break;

}

up(&f->sem);

jffs2_complete_reservation(c);

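Once the write to flash is complete, the buffer holding the compressed data can be freed and the jffs2_inode_info.sem semaphore released. jffs2_write_dnode has already created and registered the node's kernel descriptor. jffs2_add_full_dnode_to_inode then looks up the corresponding jffs2_node_frag in the file's red-black tree. If none is found, a new one is created and inserted. If one is found, it is changed to point to the new jffs2_full_dnode and the reference count (frags) of the old jffs2_full_dnode is decremented. Once nothing refers to the old node any more, its kernel descriptor is marked obsolete.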

if (!datalen) {

printk(KERN_WARNING "Eep. We didn't actually write any data in jffs2_write_inode_range()\n");

ret = -EIO;

break;

}

D1(printk(KERN_DEBUG "increasing writtenlen by %d\n", datalen));

writtenlen += datalen;

offset += datalen;

writelen -= datalen;

buf += datalen;

}//while

*retlen = writtenlen;

return ret;

}

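At the end of each iteration the counters are updated. After jffs2_compress returns, datalen is the amount of raw data actually consumed. It is used to increase the amount written (writtenlen), to advance the logical file offset of the remaining data (offset) and the position in the source buffer (buf), and to decrease the amount left to write (writelen).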
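The loop can be simulated in user space to see how one write is cut into several nodes. This sketch assumes no compression (so datalen == cdatalen) and a fixed amount of reserved space per node. split_write, node_desc, reserve_len and hdr are hypothetical names standing in for the loop, jffs2_raw_inode, alloclen and sizeof(*ri):

```c
#include <assert.h>
#include <stdint.h>

struct node_desc {              /* hypothetical: the per-node fields we track */
    uint32_t version, offset, dsize, isize;
};

/* Split [offset, offset + writelen) into nodes, mimicking the uncompressed
 * path of jffs2_write_inode_range. Returns the number of nodes produced
 * (at most max_nodes); *written receives the number of bytes covered. */
static int split_write(uint32_t offset, uint32_t writelen, uint32_t isize,
                       uint32_t reserve_len, uint32_t hdr,
                       uint32_t *highest_version,
                       struct node_desc *out, int max_nodes, uint32_t *written)
{
    int n = 0;
    *written = 0;
    while (writelen && n < max_nodes) {
        uint32_t room = reserve_len - hdr;      /* alloclen - sizeof(*ri) */
        uint32_t datalen = writelen < room ? writelen : room;
        struct node_desc *d = &out[n++];

        d->version = ++*highest_version;        /* strictly increasing */
        d->offset = offset;
        d->dsize = datalen;
        if (isize < offset + datalen)           /* isize never shrinks */
            isize = offset + datalen;
        d->isize = isize;

        *written += datalen;
        offset += datalen;
        writelen -= datalen;
    }
    return n;
}
```

With a 68-byte header placeholder and 1000 bytes reserved per node, a 4096-byte write becomes five nodes: four of 932 bytes and one of 368, with the version increasing by one per node.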

The jffs2_write_dnode function

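jffs2_write_dnode writes the jffs2_raw_inode pointed to by ri, followed by datalen bytes from buffer data, to flash at flash_ofs. It also allocates the kernel descriptor jffs2_raw_node_ref and the corresponding jffs2_full_dnode, adds the former to the file's list and returns the address of the latter.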

/* jffs2_write_dnode - given a raw_inode, allocate a full_dnode for it,

write it to the flash, link it into the existing inode/fragment list */

struct jffs2_full_dnode *jffs2_write_dnode(struct jffs2_sb_info *c, struct jffs2_inode_info *f,

struct jffs2_raw_inode *ri, const unsigned char *data,

uint32_t datalen, uint32_t flash_ofs, uint32_t *writelen)

{

struct jffs2_raw_node_ref *raw;

struct jffs2_full_dnode *fn;

size_t retlen;

struct iovec vecs[2];

int ret;

unsigned long cnt = 2;

D1(if(je32_to_cpu(ri->hdr_crc) != crc32(0, ri, sizeof(struct jffs2_unknown_node)-4)) {

printk(KERN_CRIT "Eep. CRC not correct in jffs2_write_dnode()\n");

BUG();});

vecs[0].iov_base = ri;

vecs[0].iov_len = sizeof(*ri);

vecs[1].iov_base = (unsigned char *)data;

vecs[1].iov_len = datalen;

The write clearly consists of two parts, the node header and then the trailing data. So two iovec variables hold their base addresses and lengths.
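What the two iovecs ask for is a gather write: the header and the payload laid out back to back at flash_ofs. Below is a minimal sketch over a RAM buffer standing in for flash. ram_writev is hypothetical; in the kernel the real work is done by jffs2_flash_writev and the MTD driver. The sketch also shows how a short write leaves retlen smaller than the requested total, which is the case handled below:

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* Copy each iovec to "flash" back to back, starting at ofs.
 * Stops with -1 if a vector would run past the end of the device;
 * *retlen reports how many bytes were actually written. */
static int ram_writev(unsigned char *flash, size_t flash_size,
                      const struct iovec *vecs, unsigned long cnt,
                      size_t ofs, size_t *retlen)
{
    *retlen = 0;
    for (unsigned long i = 0; i < cnt; i++) {
        if (ofs + vecs[i].iov_len > flash_size)
            return -1;                       /* short write */
        memcpy(flash + ofs, vecs[i].iov_base, vecs[i].iov_len);
        ofs += vecs[i].iov_len;
        *retlen += vecs[i].iov_len;
    }
    return 0;
}
```

Erased NOR flash reads as 0xFF, so the "device" is filled with 0xFF before the write. Passing cnt = 1 writes only the header, as jffs2_write_dnode does for nodes without data.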

writecheck(c, flash_ofs);

if (je32_to_cpu(ri->totlen) != sizeof(*ri) + datalen) {

printk(KERN_WARNING "jffs2_write_dnode: ri->totlen (0x%08x) != sizeof(*ri) (0x%08x) + datalen (0x%08x)\n", je32_to_cpu(ri->totlen), sizeof(*ri), datalen);

}

raw = jffs2_alloc_raw_node_ref();

if (!raw)

return ERR_PTR(-ENOMEM);

fn = jffs2_alloc_full_dnode();

if (!fn) {

jffs2_free_raw_node_ref(raw);

return ERR_PTR(-ENOMEM);

}

raw->flash_offset = flash_ofs;

raw->totlen = PAD(sizeof(*ri)+datalen);

raw->next_phys = NULL;

fn->ofs = je32_to_cpu(ri->offset);

fn->size = je32_to_cpu(ri->dsize);

fn->frags = 0;

fn->raw = raw;

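Before writing, the kernel descriptors jffs2_raw_node_ref and jffs2_full_dnode are created for the new node. The former's flash_offset and totlen are the node's offset within the flash partition and the (padded) length of the whole node. Its next_phys field is set to NULL for now, because it has not yet been linked into the eraseblock's physical node list; jffs2_add_physical_node_ref does that later. The latter describes where the trailing data lies in the file, so its ofs and size fields are the data's logical offset within the file and its length. Since no jffs2_node_frag has been created yet, its frags field is set to 0.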

/* check number of valid vecs */

if (!datalen || !data)

cnt = 1;

ret = jffs2_flash_writev(c, vecs, cnt, flash_ofs, &retlen);

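The two iovec structures already describe the base addresses and lengths of the node and its data, so jffs2_flash_writev can write them to flash. This function eventually calls the flash driver's mtd->writev method. retlen returns the amount actually written. If less than the whole node was written, or the function returned an error, this must be handled immediately: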

if (ret || (retlen != sizeof(*ri) + datalen)) {

printk(KERN_NOTICE "Write of %d bytes at 0x%08x failed. returned %d, retlen %d\n",

sizeof(*ri)+datalen, flash_ofs, ret, retlen);

/* Mark the space as dirtied */

if (retlen) {

/* Doesn't belong to any inode */

raw->next_in_ino = NULL;

/* Don't change raw->totlen to match retlen. We may have

written the node header already, and only the data will

seem corrupted, in which case the scan would skip over

any node we write before the original intended end of this node */

raw->flash_offset |= REF_OBSOLETE;

jffs2_add_physical_node_ref(c, raw);

jffs2_mark_node_obsolete(c, raw);

}

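If the driver reports that at least some bytes were written (retlen is non-zero), the descriptor is kept, but the node is treated as garbage. As the author's comment explains, raw->totlen is deliberately not reduced to retlen. The node header may already be on flash with only the data corrupted, and a later scan would then skip over everything up to the originally intended end of this node. The descriptor is flagged REF_OBSOLETE and added to the eraseblock's physical node list with jffs2_add_physical_node_ref, so that the space is counted as dirty. It is not added to the file's list, since next_in_ino is NULL. Because the descriptor is obsolete, the node will be ignored from then on. (See the related part of the discussion?)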

else {

printk(KERN_NOTICE "Not marking the space at 0x%08x as dirty because the flash driver returned retlen zero\n", raw->flash_offset);

jffs2_free_raw_node_ref(raw);

}

/* Release the full_dnode which is now useless, and return */

jffs2_free_full_dnode(fn);

if (writelen)

*writelen = retlen;

return ERR_PTR(ret?ret:-EIO);

}

/* Mark the space used */

if (datalen == PAGE_CACHE_SIZE)

raw->flash_offset |= REF_PRISTINE;

else

raw->flash_offset |= REF_NORMAL;

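On success, the descriptor's state flag is set: REF_PRISTINE if the trailing data is exactly one page, otherwise REF_NORMAL. Then jffs2_add_physical_node_ref updates the statistics of the corresponding eraseblock and of the filesystem. Finally, the descriptor is added to the head of the list pointed to by the nodes field of the file's jffs2_inode_cache: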
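The state flags live in the two low bits of flash_offset, which are otherwise always zero because nodes are 4-byte aligned. The constants below match jffs2's nodelist.h of this period as far as I know (check them against your own tree). The helpers are value-based versions of ref_flags/ref_offset/ref_obsolete with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define REF_UNCHECKED 0  /* CRC not checked yet / inode not built */
#define REF_OBSOLETE  1  /* obsolete, can be ignored */
#define REF_PRISTINE  2  /* clean: GC can copy it without looking */
#define REF_NORMAL    3  /* possibly overlapped: GC must rewrite the data */

/* The kernel macros take a jffs2_raw_node_ref pointer; these take the value. */
static uint32_t sketch_ref_flags(uint32_t flash_offset)  { return flash_offset & 3u; }
static uint32_t sketch_ref_offset(uint32_t flash_offset) { return flash_offset & ~3u; }
static int sketch_ref_obsolete(uint32_t flash_offset)
{
    return sketch_ref_flags(flash_offset) == REF_OBSOLETE;
}

/* The choice made at the end of jffs2_write_dnode. */
static uint32_t sketch_mark_written(uint32_t flash_offset, uint32_t datalen,
                                    uint32_t page_size)
{
    return flash_offset | (datalen == page_size ? REF_PRISTINE : REF_NORMAL);
}
```

Packing the state into the offset saves a field in every descriptor, and there can be very many of them.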

jffs2_add_physical_node_ref(c, raw);

/* Link into per-inode list */

raw->next_in_ino = f->inocache->nodes;

f->inocache->nodes = raw;

D1(printk(KERN_DEBUG "jffs2_write_dnode wrote node at 0x%08x with dsize 0x%x, csize 0x%x, node_crc 0x%08x, data_crc 0x%08x, totlen 0x%08x\n",

flash_ofs, je32_to_cpu(ri->dsize), je32_to_cpu(ri->csize),

je32_to_cpu(ri->node_crc), je32_to_cpu(ri->data_crc), je32_to_cpu(ri->totlen)));

if (writelen)

*writelen = retlen;

f->inocache->nodes = raw;

return fn;

}

Chapter 7 How jffs2 Reads Regular Files

As with writing, reading a regular file follows this call path:

sys_read > do_generic_file_read > jffs2_readpage > jffs2_do_readpage_unlock > jffs2_do_readpage_nolock

do_generic_file_read handles read-ahead. In a loop, it reads the file's pages one by one into the corresponding page cache pages through the inode.i_mapping->a_ops->readpage method, which for jffs2 is jffs2_readpage.

The jffs2_readpage function

int jffs2_readpage (struct file *filp, struct page *pg)

{

struct jffs2_inode_info *f = JFFS2_INODE_INFO(pg->mapping->host);

int ret;

down(&f->sem);

ret = jffs2_do_readpage_unlock(pg->mapping->host, pg);

up(&f->sem);

return ret;

}

int jffs2_do_readpage_unlock(struct inode *inode, struct page *pg)

{

int ret = jffs2_do_readpage_nolock(inode, pg);

unlock_page(pg);

return ret;

}

Before do_generic_file_read starts reading a page, it has already taken the page's lock with lock_page. So the lock must be released (unlock_page) once the read completes.

The jffs2_do_readpage_nolock function

This function reads the specified page of the file into the corresponding page of the page cache (whose page descriptor is pointed to by pg).

int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)

{

struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);

struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb);

unsigned char *pg_buf;

int ret;

D1(printk(KERN_DEBUG "jffs2_do_readpage_nolock(): ino #%lu, page at offset 0x%lx\n", inode->i_ino,

pg->index << PAGE_CACHE_SHIFT));

if (!PageLocked(pg))

PAGE_BUG(pg);

The page's lock must already have been taken upstream in the call chain, in do_generic_file_read; otherwise it is a BUG.

pg_buf = kmap(pg);

/* FIXME: Can kmap fail? */

ret = jffs2_read_inode_range(c, f, pg_buf, pg->index << PAGE_CACHE_SHIFT,

PAGE_CACHE_SIZE);

Next, kmap returns the kernel virtual address of the page, and jffs2_read_inode_range reads the contents of the whole page into it. See below.

if (ret) {

ClearPageUptodate(pg);

SetPageError(pg);

} else {

SetPageUptodate(pg); // read succeeded: set the Uptodate flag

ClearPageError(pg);

}

flush_dcache_page(pg);

kunmap(pg);

D1(printk(KERN_DEBUG "readpage finished\n"));

return 0;

}

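Finally, depending on the outcome of the read, either the Uptodate flag is set and the error flag cleared, or vice versa. kunmap then removes the page's mapping from the kernel page tables.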

The jffs2_read_inode_range function

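jffs2_read_inode_range reads the range [offset, offset + len) of a file from flash into a buffer (here, a page cache page). When the file was opened and its inode read, a jffs2_full_dnode was already created for every valid jffs2_raw_inode node and added to the red-black tree through a jffs2_node_frag. So this function only needs to reach the nodes through the red-black tree.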

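(Obsolete nodes are already skipped when the file is opened and its inode created. When a new node is written, the node pointer of the affected jffs2_node_frag is changed to the new node's jffs2_full_dnode, so the tree always refers to valid nodes. Once a node on flash becomes obsolete it is never accessed again, and eventually GC puts the eraseblock containing it on the list of blocks to erase.)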

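The third argument is the kernel virtual address of the buffer, and the fourth and fifth describe a range within one page of the file. To help understand the function's logic, consider the following fairly general situation: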

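[Figure: a page made of data block 1, a hole, and data block 3; the range to read runs from offset (inside block 1) to end (inside block 3).]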

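Suppose one page of the file consists of three parts. The first data block starts at the beginning of the page, the third ends at the end of the page, and between them lies a hole. All three are described by jffs2_raw_inode nodes. (Recall that during writes, the prepare_write method of address_space_operations writes a node to flash to describe a hole.) The range to read starts in the middle of the first data block and ends in the middle of the third. We use this case to analyze the function's logic: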

int jffs2_read_inode_range(struct jffs2_sb_info *c, struct jffs2_inode_info *f, unsigned char *buf,

uint32_t offset, uint32_t len)

{

uint32_t end = offset + len;

struct jffs2_node_frag *frag;

int ret;

D1(printk(KERN_DEBUG "jffs2_read_inode_range: ino #%u, range 0x%08x-0x%08x\n",

f->inocache->ino, offset, offset+len));

frag = jffs2_lookup_node_frag(&f->fragtree, offset);

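When the file was opened and its inode created, a jffs2_full_dnode was created for each jffs2_raw_inode node and added to the red-black tree through a jffs2_node_frag. So reading the file can locate the right nodes through the tree. First we find the node that contains offset or, failing that, the node with the smallest start ofs greater than offset. (jffs2_lookup_node_frag involves a red-black tree search that I have not analyzed in detail, so its exact behavior remains to be confirmed.)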

/* XXX FIXME: Where a single physical node actually shows up in two

frags, we read it twice. Don't do that. */

/* Now we're pointing at the first frag which overlaps our page */

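Since the range to read may span several nodes, the read may take several loop iterations, each dealing with the data of a single node. (Even if an iteration needs only part of a node's data, the whole node is read first.) After each iteration, offset is advanced and the frag of the next node is obtained from the red-black tree.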

while(offset < end) {

D2(printk(KERN_DEBUG "jffs2_read_inode_range: offset %d, end %d\n", offset, end));

if (!frag || frag->ofs > offset) {

uint32_t holesize = end - offset;

if (frag) {

D1(printk(KERN_NOTICE "Eep. Hole in ino #%u fraglist. frag->ofs = 0x%08x, offset = 0x%08x\n", f->inocache->ino, frag->ofs, offset));

holesize = min(holesize, frag->ofs - offset);

D1(jffs2_print_frag_list(f));

}

D1(printk(KERN_DEBUG "Filling non-frag hole from %d-%d\n", offset, offset+holesize));

memset(buf, 0, holesize);

buf += holesize;

offset += holesize;

continue;

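After a node has been read, frag points to the next node; in the first iteration, it points to the first node. If the file offset at which that node's data starts is greater than offset, there is a hole. It lies either between the nodes of two consecutive iterations or, in the first iteration, before the first node. The hole extends to frag->ofs, or to end if there is no frag or the frag starts beyond end, so holesize = min(end - offset, frag->ofs - offset). A hole reads as zeros, so the corresponding part of buf is simply cleared. offset and buf are then advanced by the size of the hole, and the next iteration begins.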

} else if (frag->ofs < offset && (offset & (PAGE_CACHE_SIZE-1)) != 0) {

D1(printk(KERN_NOTICE "Eep. Overlap in ino #%u fraglist. frag->ofs = 0x%08x, offset = 0x%08x\n", f->inocache->ino, frag->ofs, offset));

D1(jffs2_print_frag_list(f));

memset(buf, 0, end - offset);

return -EIO;

}

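Isn't this exactly the first situation I assumed? Why is it treated as illegal?! (A likely explanation: on the read path described here, jffs2_do_readpage_nolock always passes a page-aligned offset. So in the first iteration (offset & (PAGE_CACHE_SIZE-1)) is 0, and this branch is not taken even if the first frag starts before offset; the general else branch below handles that case. Later in the page, offset has been advanced exactly to the end of the previous frag. A frag that starts before offset would therefore mean overlapping frags, which the tree should never contain.)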

else if (!frag->node) {

uint32_t holeend = min(end, frag->ofs + frag->size);

D1(printk(KERN_DEBUG "Filling frag hole from %d-%d (frag 0x%x 0x%x)\n", offset, holeend,

frag->ofs, frag->ofs + frag->size));

memset(buf, 0, holeend - offset);

buf += holeend - offset;

offset = holeend;

frag = frag_next(frag);

continue;

}

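The node field of jffs2_node_frag points to the node's jffs2_full_dnode. When can it be NULL?? (Probably when the fragtree code inserts a placeholder frag, with no node behind it, to fill the gap between the last existing frag and a newly added one beyond it.) This case is also a hole, so the corresponding part of the read buffer starting at buf is cleared. (Recall also that jffs2_prepare_write writes a node for a hole, creates its kernel descriptors and adds it to the red-black tree.)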

else {

uint32_t readlen;

uint32_t fragofs; /* offset within the frag to start reading */

fragofs = offset - frag->ofs;

readlen = min(frag->size - fragofs, end - offset);

D1(printk(KERN_DEBUG "Reading %d-%d from node at 0x%x\n", frag->ofs+fragofs,

frag->ofs+fragofs+readlen, ref_offset(frag->node->raw)));

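fragofs is the offset of the read start relative to the start of the frag. In our example, this iteration reads the rest of the frag after fragofs, so readlen is frag->size - fragofs. In general, readlen = min(frag->size - fragofs, end - offset).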

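Why is the fourth argument written this way?? frag->ofs and jffs2_full_dnode.ofs are both offsets of data within the file, so I expected their difference to be 0, and the fourth argument to be simply fragofs, the offset within the node. Indeed, judging from jffs2_read_dnode, the fourth argument is the offset of the read start within the node. (Answer: the two offsets are equal only when the frag starts at the beginning of its node. When a later write overwrites the front part of an older node, that node's frag is trimmed, so frag->ofs > frag->node->ofs. The argument simplifies to offset - frag->node->ofs, which is exactly the read position within the node's data.)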

ret = jffs2_read_dnode(c, frag->node, buf, fragofs + frag->ofs - frag->node->ofs, readlen);

D2(printk(KERN_DEBUG "node read done\n"));

if (ret) {

D1(printk(KERN_DEBUG"jffs2_read_inode_range error %d\n",ret));

memset(buf, 0, readlen);

return ret;

}

buf += readlen;

offset += readlen;

frag = frag_next(frag);

D2(printk(KERN_DEBUG "node read was OK. Looping\n"));

continue;

}

printk(KERN_CRIT "dwmw2 is stupid. Reason #5325\n");

BUG();

}//while

return 0;

}

After the read, buf and offset are advanced, and the frag_next macro returns the frag of the next node in the red-black tree. Then the next iteration begins.
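The loop can be tried out on a toy fragment list. In this sketch a sorted array replaces the red-black tree, and node_data == NULL plays the role of frag->node == NULL. node_ofs is the file offset where a frag's node begins. The overlap check that returns -EIO is left out. struct frag and read_range are hypothetical simplifications, not the kernel's types:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct frag {
    uint32_t ofs, size;              /* range of the file this frag covers */
    uint32_t node_ofs;               /* file offset where the node's data starts */
    const unsigned char *node_data;  /* decompressed node data; NULL = hole frag */
};

static void read_range(const struct frag *frags, int nfrags,
                       unsigned char *buf, uint32_t offset, uint32_t len)
{
    uint32_t end = offset + len;
    int i = 0;

    /* First frag that ends after offset (the job of jffs2_lookup_node_frag). */
    while (i < nfrags && frags[i].ofs + frags[i].size <= offset)
        i++;

    while (offset < end) {
        const struct frag *f = (i < nfrags) ? &frags[i] : NULL;

        if (!f || f->ofs > offset) {
            /* No frag covers offset: zero-fill up to the next frag or to end. */
            uint32_t hole = end - offset;
            if (f && f->ofs - offset < hole)
                hole = f->ofs - offset;
            memset(buf, 0, hole);
            buf += hole;
            offset += hole;
        } else {
            uint32_t fragofs = offset - f->ofs;   /* offset inside the frag */
            uint32_t n = f->size - fragofs;
            if (n > end - offset)
                n = end - offset;
            if (f->node_data)                     /* position inside the node */
                memcpy(buf, f->node_data + (fragofs + f->ofs - f->node_ofs), n);
            else
                memset(buf, 0, n);                /* frag->node == NULL */
            buf += n;
            offset += n;
            i++;                                  /* frag_next() */
        }
    }
}
```

In the test data, a 3-byte node written at offset 2 overwrites the middle of an older 8-byte node at offset 0. The older node's last frag then starts at 5 while its node starts at 0. This is exactly when fragofs + frag->ofs - frag->node->ofs differs from fragofs.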

The jffs2_read_dnode function

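This function reads the range [ofs, ofs + len) of the data in the node described by fd into buffer buf. Even if only part of a node is needed, all of the node's data is read. Reading a node from flash takes two steps. First the jffs2_raw_inode is read to obtain csize and dsize. Then a buffer of suitable size is allocated if needed, and the (possibly compressed) data following the header is read and decompressed. Finally the requested range is copied into buf.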

int jffs2_read_dnode(struct jffs2_sb_info *c, struct jffs2_full_dnode *fd, unsigned char *buf, int ofs, int len)

{

struct jffs2_raw_inode *ri;

size_t readlen;

uint32_t crc;

unsigned char *decomprbuf = NULL;

unsigned char *readbuf = NULL;

int ret = 0;

ri = jffs2_alloc_raw_inode();

if (!ri)

return -ENOMEM;

ret = jffs2_flash_read(c, ref_offset(fd->raw), sizeof(*ri), &readlen, (char *)ri);

if (ret) {

jffs2_free_raw_inode(ri);

printk(KERN_WARNING "Error reading node from 0x%08x: %d\n", ref_offset(fd->raw), ret);

return ret;

}

if (readlen != sizeof(*ri)) {

jffs2_free_raw_inode(ri);

printk(KERN_WARNING "Short read from 0x%08x: wanted 0x%x bytes, got 0x%x\n",

ref_offset(fd->raw), sizeof(*ri), readlen);

return -EIO;

}

crc = crc32(0, ri, sizeof(*ri)-8);

D1(printk(KERN_DEBUG "Node read from %08x: node_crc %08x, calculated CRC %08x. dsize %x, csize %x, offset %x, buf %p\n", ref_offset(fd->raw), je32_to_cpu(ri->node_crc),

crc, je32_to_cpu(ri->dsize), je32_to_cpu(ri->csize), je32_to_cpu(ri->offset), buf));

if (crc != je32_to_cpu(ri->node_crc)) {

printk(KERN_WARNING "Node CRC %08x != calculated CRC %08x for node at %08x\n",

je32_to_cpu(ri->node_crc), crc, ref_offset(fd->raw));

ret = -EIO;

goto out_ri;

}

/* There was a bug where we wrote hole nodes out with csize/dsize swapped. Deal with it */

if (ri->compr == JFFS2_COMPR_ZERO && !je32_to_cpu(ri->dsize) && je32_to_cpu(ri->csize)) {

ri->dsize = ri->csize;

ri->csize = cpu_to_je32(0);

}

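As mentioned above, reading a node starts with reading the jffs2_raw_inode structure itself, to learn the lengths csize and dsize of the trailing data. jffs2_alloc_raw_inode allocates a jffs2_raw_inode, which jffs2_flash_read fills in. If fewer bytes than the size of jffs2_raw_inode were read, or the CRC check fails, the function returns -EIO. (Hole nodes written with csize and dsize swapped by an old bug are also fixed up here.)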

D1(if(ofs + len > je32_to_cpu(ri->dsize)) {

printk(KERN_WARNING "jffs2_read_dnode() asked for %d bytes at %d from %d-byte node\n",

len, ofs, je32_to_cpu(ri->dsize));

ret = -EINVAL;

goto out_ri;

});

if (ri->compr == JFFS2_COMPR_ZERO) {

memset(buf, 0, len);

goto out_ri;

}

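If compr equals JFFS2_COMPR_ZERO, the node is a hole, so the len bytes of the read buffer starting at buf simply need to be cleared. Apart from holes, there are four cases, depending on whether the read covers all of the node's data and whether the data is compressed: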

/* Cases:

Reading whole node and it's uncompressed - read directly to buffer provided, check CRC.

Reading whole node and it's compressed - read into comprbuf, check CRC and decompress

to buffer provided

Reading partial node and it's uncompressed - read into readbuf, check CRC, and copy

Reading partial node and it's compressed - read into readbuf, check checksum,

decompress to decomprbuf and copy */

if (ri->compr == JFFS2_COMPR_NONE && len == je32_to_cpu(ri->dsize)) {

readbuf = buf;

Case 1: if all of the data is to be read and it is not compressed, it is read directly into the receiving buffer buf. No intermediate buffer is needed.

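In every other case the data goes through an intermediate buffer, even when it is uncompressed. So an intermediate buffer of the node's stored length csize is allocated below (for uncompressed data, csize equals dsize):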

} else {

readbuf = kmalloc(je32_to_cpu(ri->csize), GFP_KERNEL);

if (!readbuf) {

ret = -ENOMEM;

goto out_ri;

}

}

if (ri->compr != JFFS2_COMPR_NONE) {

if (len < je32_to_cpu(ri->dsize)) { // partial read of compressed data: needs a separate decompression buffer

decomprbuf = kmalloc(je32_to_cpu(ri->dsize), GFP_KERNEL);

if (!decomprbuf) {

ret = -ENOMEM;

goto out_readbuf;

}

} else {

decomprbuf = buf; // reading all compressed data: decompress straight into the receiving buffer

}

} else { // uncompressed data: no decompression step, use readbuf directly

decomprbuf = readbuf;

}

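If compr is not JFFS2_COMPR_NONE, the data is compressed. If only part of the compressed data is needed, another buffer is required to hold the decompressed data, and the requested part is then copied from it into buf. If all of it is needed, no copying is necessary, so the data can be decompressed directly into buf. If the data is not compressed, it is read into the intermediate buffer readbuf (or directly into buf in case 1). For a partial read, the requested range is then copied into buf. Next, jffs2_flash_read reads all the trailing data into readbuf, and its CRC is checked.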

D2(printk(KERN_DEBUG "Read %d bytes to %p\n", je32_to_cpu(ri->csize),

readbuf));

ret = jffs2_flash_read(c, (ref_offset(fd->raw)) + sizeof(*ri), je32_to_cpu(ri->csize), &readlen, readbuf);

if (!ret && readlen != je32_to_cpu(ri->csize))

ret = -EIO;

if (ret)

goto out_decomprbuf;

crc = crc32(0, readbuf, je32_to_cpu(ri->csize));

if (crc != je32_to_cpu(ri->data_crc)) {

printk(KERN_WARNING "Data CRC %08x != calculated CRC %08x for node at %08x\n",

je32_to_cpu(ri->data_crc), crc, ref_offset(fd->raw));

ret = -EIO;

goto out_decomprbuf;

}

After the data has been read, compressed data is decompressed into decomprbuf. If only part of the data is needed, that part is then copied out:

D2(printk(KERN_DEBUG "Data CRC matches calculated CRC %08x\n", crc));

if (ri->compr != JFFS2_COMPR_NONE) {

D2(printk(KERN_DEBUG "Decompress %d bytes from %p to %d bytes at %p\n", ri->csize, readbuf,

je32_to_cpu(ri->dsize), decomprbuf));

ret = jffs2_decompress(ri->compr, readbuf, decomprbuf, je32_to_cpu(ri->csize),

je32_to_cpu(ri->dsize));

if (ret) {

printk(KERN_WARNING "Error: jffs2_decompress returned %d\n", ret);

goto out_decomprbuf;

}

}

if (len < je32_to_cpu(ri->dsize)) {

memcpy(buf, decomprbuf+ofs, len);

}

out_decomprbuf:

if(decomprbuf != buf && decomprbuf != readbuf)

kfree(decomprbuf);

out_readbuf:

if(readbuf != buf)

kfree(readbuf);

out_ri:

jffs2_free_raw_inode(ri);

return ret;

}
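Which buffers each of the four cases uses can be written as a small decision table. This sketch only classifies and does no I/O; plan_read and its types are hypothetical. It assumes, as the four-case comment implies, that the final copy out of decomprbuf applies to partial reads of both compressed and uncompressed data:

```c
#include <assert.h>
#include <stdbool.h>

/* Where a buffer comes from: the caller's buf, or one of two temporaries. */
enum buf_kind { CALLER_BUF, READ_TMP, DECOMP_TMP };

struct read_plan {
    enum buf_kind readbuf;    /* where the csize bytes read from flash land */
    enum buf_kind decomprbuf; /* where the dsize bytes of plain data end up */
    bool copy_out;            /* memcpy(buf, decomprbuf + ofs, len) needed */
};

static struct read_plan plan_read(bool compressed, bool whole_node)
{
    struct read_plan p;
    p.readbuf = (!compressed && whole_node) ? CALLER_BUF : READ_TMP;
    if (compressed)
        p.decomprbuf = whole_node ? CALLER_BUF : DECOMP_TMP;
    else
        p.decomprbuf = p.readbuf;   /* no decompression step */
    p.copy_out = !whole_node;       /* len < dsize */
    return p;
}
```

Only the partial read of compressed data needs two temporary buffers at once. That is the case where the kernel code allocates both readbuf and decomprbuf.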

Chapter 8 Garbage Collection in jffs2

When a jffs2 filesystem is mounted, the GC kernel thread is created and started at the end of jffs2_do_fill_super. The relevant code is:

if (!(sb->s_flags & MS_RDONLY))

jffs2_start_garbage_collect_thread(c);

return 0;

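If the jffs2 filesystem is not mounted read-only, new nodes will be written to flash. A characteristic of jffs2 is that writing a new node never modifies the old nodes on flash; their kernel descriptors are only marked obsolete. After the system has run for a while, the number of free eraseblocks drops below a threshold. GC is then woken up to reclaim the space of obsolete nodes.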

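To use all the eraseblocks of the flash partition as evenly as possible, the choice of eraseblock into which copies of valid nodes are written must take a wear-levelling algorithm into account.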

The jffs2_start_garbage_collect_thread function

This function creates the GC kernel thread.

/* This must only ever be called when no GC thread is currently running */

int jffs2_start_garbage_collect_thread(struct jffs2_sb_info *c)

{

pid_t pid;

int ret = 0;

if (c->gc_task)

BUG();

init_MUTEX_LOCKED(&c->gc_thread_start);

init_completion(&c->gc_thread_exit);

pid = kernel_thread(jffs2_garbage_collect_thread, c, CLONE_FS|CLONE_FILES);

if (pid < 0) {

printk(KERN_WARNING "fork failed for JFFS2 garbage collect thread: %d\n", -pid);

complete(&c->gc_thread_exit);

ret = pid;

} else {

/* Wait for it... */

D1(printk(KERN_DEBUG "JFFS2: Garbage collect thread is pid %d\n", pid));

down(&c->gc_thread_start);

}

return ret;

}

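The semaphore gc_thread_start ensures that the GC kernel thread is already running by the time control returns to the flow that created it. After kernel_thread creates the GC thread, either the new thread or the current flow may get the CPU first. The init_MUTEX_LOCKED macro initialized gc_thread_start as "unavailable", so the current flow blocks when it tries to take the semaphore. It stays blocked until the GC kernel thread runs for the first time and releases the semaphore; see below.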
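The same "do not return until the new thread has run" handshake can be reproduced in user space. This analogue swaps in POSIX semaphores and pthreads for init_MUTEX_LOCKED()/down()/up() and kernel_thread(). gc_ctx, gc_thread and start_gc_thread are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

struct gc_ctx {
    sem_t started;   /* plays the role of c->gc_thread_start */
    int running;     /* set by the thread before it posts */
};

static void *gc_thread(void *arg)
{
    struct gc_ctx *c = arg;
    c->running = 1;             /* like c->gc_task = current */
    sem_post(&c->started);      /* like up(&c->gc_thread_start) */
    return NULL;
}

static int start_gc_thread(struct gc_ctx *c, pthread_t *tid)
{
    if (sem_init(&c->started, 0, 0) != 0)   /* count 0: like init_MUTEX_LOCKED */
        return -1;
    if (pthread_create(tid, NULL, gc_thread, c) != 0) {
        sem_destroy(&c->started);
        return -1;
    }
    sem_wait(&c->started);                  /* like down(): wait for it... */
    return 0;
}
```

Whichever thread gets the CPU first, start_gc_thread cannot return before gc_thread has set running. The semaphore post/wait pair also makes that write visible to the creator.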

The first argument passed to kernel_thread is the code the new kernel thread runs, so the GC kernel thread's behavior is defined by jffs2_garbage_collect_thread:

The jffs2_garbage_collect_thread function

static int jffs2_garbage_collect_thread(void *_c)

{

struct jffs2_sb_info *c = _c;

daemonize();

c->gc_task = current;

up(&c->gc_thread_start);

sprintf(current->comm, "jffs2_gcd_mtd%d", c->mtd->index);

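Since this is a kernel thread running in the background, it first calls daemonize to release resources inherited from its parent process. For example, it closes open console file descriptors, releases any user-space pages, and is reparented to init. For a detailed analysis see 《Linux內核源代碼情景分析》 (Linux Kernel Source Code Scenario Analysis), vol. 2, p. 725.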

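jffs2_sb_info.gc_task points to the GC kernel thread's task structure (PCB), so no separate wait queue is needed. When the thread has nothing to do, it does not join a wait queue; it is simply removed from the run queue. When it needs to be woken, send_sig sends it a SIGHUP through gc_task.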

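Releasing gc_thread_start wakes the execution flow that created the GC kernel thread. The thread is then named "jffs2_gcd_mtd#", where the trailing number is the index (starting from 0) of the flash partition holding the jffs2 filesystem. On my system, for example, it is 5: the flash has 6 partitions (the U-Boot image, U-Boot parameters, the partition table, the kernel image, the work partition and the root partition).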

After set_user_nice sets the GC kernel thread's nice value to 10 (a lower scheduling priority), the thread enters its main loop:

set_user_nice(current, 10);

for (;;) {

spin_lock_irq(&current->sigmask_lock);

siginitsetinv (&current->blocked, sigmask(SIGHUP) | sigmask(SIGKILL) | sigmask(SIGSTOP) |

sigmask(SIGCONT));

recalc_sigpending();

spin_unlock_irq(&current->sigmask_lock);

Signal numbers are architecture-dependent and defined in asm/signal.h. The sigmask macro returns the bit mask for a signal:

#define sigmask(sig) (1UL << ((sig) - 1))

The bit index is sig - 1 because Linux has no signal number 0.

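The GC kernel thread's state is controlled by four signals: SIGHUP, SIGKILL, SIGSTOP and SIGCONT. So each loop iteration begins by preparing to receive them once the thread is in TASK_INTERRUPTIBLE state. The inline function siginitsetinv sets the blocked mask to the complement of these four signals, so the thread blocks every signal except these. recalc_sigpending checks whether the current process has unblocked pending signals and, if so, sets the sigpending flag in its task structure. This flag is checked by signal_pending below.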

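(Comparison: a user process's signal handlers run in user mode, so the process does not deal with the details of receiving signals and contains no code for it. A kernel thread runs in kernel mode and must receive signals itself.)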
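The masks built at the top of the loop can be checked in user space. This sketch models only the first word of the signal set. SKETCH_SIGMASK mirrors the kernel's sigmask() macro quoted above, and blocked_except (a hypothetical name) mirrors what siginitsetinv() stores in that first word:

```c
#include <assert.h>
#include <signal.h>

/* Signal n uses bit n-1, since there is no signal 0. */
#define SKETCH_SIGMASK(sig) (1UL << ((sig) - 1))

/* siginitsetinv(set, mask): block everything except the signals in mask.
 * (The kernel also sets all higher words of the set to all-ones.) */
static unsigned long blocked_except(unsigned long allowed)
{
    return ~allowed;
}
```

Passing the mask without SIGHUP gives the mask used just before jffs2_garbage_collect_pass, in which SIGHUP stays blocked.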

if (!thread_should_wake(c)) {

set_current_state (TASK_INTERRUPTIBLE);

D1(printk(KERN_DEBUG "jffs2_garbage_collect_thread sleeping...\n"));

/* Yes, there's a race here; we checked thread_should_wake() before

setting current->state to TASK_INTERRUPTIBLE. But it doesn't

matter - We don't care if we miss a wakeup, because the GC thread

is only an optimisation anyway. */

schedule();

}

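At the start of each iteration, the thread checks whether it really needs to run. If not, it calls the scheduler voluntarily. Because jffs2_sb_info.gc_task holds the address of the thread's task structure, there is no need for a separate wait queue. Setting the state to TASK_INTERRUPTIBLE is enough, and the scheduler removes the thread from the run queue.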

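Note that this way of going to sleep has a race condition. To sleep without a race, the process state must be changed before the condition is tested. Then, if an interrupt or another flow wakes the GC thread after the test but before schedule() is entered, the wakeup sets the state back to TASK_RUNNING and the sleep is cancelled. According to the author's comment, however, the race does no harm here.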

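When it has nothing to do, the GC kernel thread enters TASK_INTERRUPTIBLE. It returns to TASK_RUNNING when it receives any of SIGHUP, SIGKILL, SIGSTOP or SIGCONT. So each time it resumes, it must first process all unblocked pending signals to find out why it was woken. dequeue_signal removes the lowest-numbered unblocked pending signal: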

cond_resched();

/* Put_super will send a SIGKILL and then wait on the sem. */

while (signal_pending(current)) {

siginfo_t info;

unsigned long signr;

spin_lock_irq(&current->sigmask_lock);

signr = dequeue_signal(&current->blocked, &info);

spin_unlock_irq(&current->sigmask_lock);

switch(signr) {

case SIGSTOP:

D1(printk(KERN_DEBUG "jffs2_garbage_collect_thread(): SIGSTOP received.\n"));

set_current_state(TASK_STOPPED);

schedule();

break;

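If woken by SIGSTOP, the GC kernel thread enters TASK_STOPPED and immediately calls the scheduler. Note that when it resumes, the break statement only leaves the switch. The enclosing while loop then goes on to handle any remaining pending signals.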

case SIGKILL:

D1(printk(KERN_DEBUG "jffs2_garbage_collect_thread(): SIGKILL received.\n"));

spin_lock_bh(&c->erase_completion_lock);

c->gc_task = NULL;

spin_unlock_bh(&c->erase_completion_lock);

complete_and_exit(&c->gc_thread_exit, 0);

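If woken by SIGKILL, the GC kernel thread must exit at once. The thread is stopped, for example, when the jffs2 filesystem is unmounted or remounted read-only. In that case the current process sends it SIGKILL through jffs2_stop_garbage_collect_thread and then blocks on the gc_thread_exit.wait wait queue. The GC kernel thread, while handling SIGKILL, wakes that process through complete_and_exit and exits.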

case SIGHUP:

D1(printk(KERN_DEBUG "jffs2_garbage_collect_thread(): SIGHUP received.\n"));

break;

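If woken by SIGHUP, the GC kernel thread has work to do. Again, break only leaves the switch; the while loop ends once no unblocked signals remain pending. SIGHUP is sent to the thread by jffs2_garbage_collect_trigger.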

default:

D1(printk(KERN_DEBUG "jffs2_garbage_collect_thread(): signal %ld received\n", signr));

}

} //while

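The final default branch handles SIGCONT (and any other signal), which is merely logged. As with the other cases, the while loop keeps processing the remaining unblocked pending signals, if any, and ends only when all of them have been handled.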

(Note that although the code lets the GC kernel thread receive four signals, the existing code only uses two of them: SIGKILL and SIGHUP.)
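The point that break leaves only the switch can be checked with a small model of the inner loop. Pending signals are a bit set, dequeued lowest number first as dequeue_signal does. SIGSTOP handling is omitted, and drain_signals is a hypothetical name:

```c
#include <assert.h>
#include <signal.h>

/* Handle pending signals (bit n-1 = signal n) until none remain.
 * 'break' inside the switch only leaves the switch, so the while loop
 * continues; only SIGKILL ends the loop early (the kernel thread calls
 * complete_and_exit() there). Returns the number of signals handled. */
static int drain_signals(unsigned long *pending, int *got_hup, int *killed)
{
    int handled = 0;
    while (*pending) {
        int sig = 1;
        while (!(*pending & (1UL << (sig - 1))))
            sig++;                          /* lowest pending signal */
        *pending &= ~(1UL << (sig - 1));
        handled++;
        switch (sig) {
        case SIGKILL:
            *killed = 1;
            return handled;
        case SIGHUP:
            *got_hup = 1;
            break;                          /* leaves the switch only */
        default:                            /* e.g. SIGCONT: just logged */
            break;
        }
    }
    return handled;
}
```

With SIGHUP and SIGCONT pending, both are handled in one pass of the outer loop's body before GC starts.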

/* We don't want SIGHUP to interrupt us. STOP and KILL are OK though. */

spin_lock_irq(&current->sigmask_lock);

siginitsetinv (&current->blocked, sigmask(SIGKILL) | sigmask(SIGSTOP) | sigmask(SIGCONT));

recalc_sigpending();

spin_unlock_irq(&current->sigmask_lock);

D1(printk(KERN_DEBUG "jffs2_garbage_collect_thread(): pass\n"));

jffs2_garbage_collect_pass(c);

} //for

}

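Finally, unless the GC kernel thread resumed because it was asked to exit, it has work to do. The GC work is done by jffs2_garbage_collect_pass. According to the author's comment, SIGHUP should not interrupt the thread during GC; why is that? And how does the thread get back into TASK_INTERRUPTIBLE after GC? (Probable answers: an unblocked pending SIGHUP would make interruptible waits inside the pass, such as down_interruptible(&c->alloc_sem), fail with -EINTR. Since other paths keep sending SIGHUP through jffs2_garbage_collect_trigger, GC could otherwise be aborted repeatedly. After the pass returns, the for loop starts over and SIGHUP is unblocked again. If thread_should_wake() is false, the thread sets TASK_INTERRUPTIBLE and calls schedule().)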

The jffs2_garbage_collect_pass function

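An eraseblock contains both valid and obsolete data. The idea of GC is therefore to pick an eraseblock and, in each GC pass, handle just one valid node. The pass writes a copy of that node to another eraseblock, which makes the original obsolete. Once some pass leaves the eraseblock containing only obsolete nodes, the block can be erased.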

/* jffs2_garbage_collect_pass

Make a single attempt to progress GC. Move one node, and possibly start erasing one eraseblock. */

int jffs2_garbage_collect_pass(struct jffs2_sb_info *c)

{

struct jffs2_eraseblock *jeb;

struct jffs2_inode_info *f;

struct jffs2_raw_node_ref *raw;

struct jffs2_node_frag *frag;

struct jffs2_full_dnode *fn = NULL;

struct jffs2_full_dirent *fd;

uint32_t start = 0, end = 0, nrfrags = 0;

uint32_t inum;

struct inode *inode;

int ret = 0;

if (down_interruptible(&c->alloc_sem))

return -EINTR;

spin_lock_bh(&c->erase_completion_lock);

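GC holds the semaphore alloc_sem throughout. If unchecked_size is not 0, GC cannot start yet. Why? (The comment in the code gives the reason: the CRCs etc. of all nodes have not been checked yet. unchecked_size is the amount of space whose nodes are still unchecked. So each such pass checks the next unchecked inode by reading it with iget() and releasing it with iput(), then returns.)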

while (c->unchecked_size) {

/* We can't start doing GC yet. We haven't finished checking

the node CRCs etc. Do it now and wait for it. */

struct jffs2_inode_cache *ic;

if (c->checked_ino > c->highest_ino) {

printk(KERN_CRIT "Checked all inodes but still 0x%x bytes of unchecked space?\n",

c->unchecked_size);

D1(jffs2_dump_block_lists(c));

BUG();

}

ic = jffs2_get_ino_cache(c, c->checked_ino++);

if (!ic)

continue;

if (!ic->nlink) {

D1(printk(KERN_DEBUG "Skipping check of ino #%d with nlink zero\n", ic->ino));

continue;

}

if (ic->state != INO_STATE_UNCHECKED) {

D1(printk(KERN_DEBUG "Skipping check of ino #%d already in state %d\n", ic->ino, ic->state));

continue;

}

spin_unlock_bh(&c->erase_completion_lock);

D1(printk(KERN_DEBUG "jffs2_garbage_collect_pass() triggering inode scan of ino#%d\n", ic->ino));

{

/* XXX: This wants doing more sensibly -- split the core of jffs2_do_read_inode up */

struct inode *i = iget(OFNI_BS_2SFFJ(c), ic->ino);

if (is_bad_inode(i)) {

printk(KERN_NOTICE "Eep. read_inode() failed for ino #%u\n", ic->ino);

ret = -EIO;

}

iput(i);

}

up(&c->alloc_sem);

return ret;

}

/* First, work out which block we're garbage-collecting */

jeb = c->gcblock;

if (!jeb)

jeb = jffs2_find_gc_block(c);

if (!jeb) {

printk(KERN_NOTICE "jffs2: Couldn't find erase block to garbage collect!\n");

spin_unlock_bh(&c->erase_completion_lock);

up(&c->alloc_sem);

return -EIO;

}

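jffs2_sb_info.gcblock points to the eraseblock currently being garbage-collected. If it is NULL, jffs2_find_gc_block picks one of the eraseblocks that contain obsolete nodes. (This involves the wear-levelling algorithm and deserves further study.) (gcblock is NULL before GC first runs and after GC of an eraseblock completes; are there other cases?)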

D1(printk(KERN_DEBUG "GC from block %08x, used_size %08x, dirty_size %08x, free_size %08x\n",

jeb->offset, jeb->used_size, jeb->dirty_size, jeb->free_size));

D1(if (c->nextblock)

printk(KERN_DEBUG "Nextblock at %08x, used_size %08x, dirty_size %08x, wasted_size %08x, free_size %08x\n", c->nextblock->offset, c->nextblock->used_size, c->nextblock->dirty_size,

c->nextblock->wasted_size, c->nextblock->free_size)

);

if (!jeb->used_size) {

up(&c->alloc_sem);

goto eraseit;

}

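used_size is the space occupied by all valid data nodes in the erase block. If it is 0, the whole erase block is obsolete, so the code can jump straight to eraseit, where the block's descriptor is added to the erase_pending_list list.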

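Recall that the kernel descriptors of all data nodes in a flash erase block are linked into a list through the next_phys field. The first and last elements of this list are pointed to by the first_node and last_node fields of the erase-block descriptor jffs2_eraseblock, and gc_node points to the descriptor of the data node currently being collected.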

raw = jeb->gc_node;

while(ref_obsolete(raw)) {

D1(printk(KERN_DEBUG "Node at 0x%08x is obsolete... skipping\n", ref_offset(raw)));

jeb->gc_node = raw = raw->next_phys;

if (!raw) {

printk(KERN_WARNING "eep. End of raw list while still supposedly nodes to GC\n");

printk(KERN_WARNING "erase block at 0x%08x. free_size 0x%08x, dirty_size 0x%08x, used_size 0x%08x\n",

jeb->offset, jeb->free_size, jeb->dirty_size, jeb->used_size);

spin_unlock_bh(&c->erase_completion_lock);

up(&c->alloc_sem);

BUG();

}

}

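When a new erase block is selected for GC, gc_node points to the descriptor of the first data node in that block (see jffs2_find_gc_block). If that node is obsolete, gc_node is advanced to the first non-obsolete one. GC only needs to act on valid data nodes: it writes a copy of each valid node in the block to another erase block, which makes the original obsolete. Eventually every node in the block is obsolete, and the whole block can be erased.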
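ref_obsolete() and ref_offset() are cheap because, as far as I know, jffs2 stores a node's state in the two low bits of flash_offset. Nodes are 4-byte aligned, so those bits are otherwise always 0. The sketch below re-implements that encoding and the skip loop above on a hypothetical reduced `struct raw_ref`. Treat the REF_* values as assumptions to check against nodelist.h in your kernel version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Node states in the two low bits of flash_offset (assumed values). */
#define REF_UNCHECKED 0
#define REF_OBSOLETE  1
#define REF_PRISTINE  2
#define REF_NORMAL    3

/* Hypothetical, reduced copy of jffs2_raw_node_ref. */
struct raw_ref {
    struct raw_ref *next_phys;   /* next node in the same erase block */
    uint32_t flash_offset;       /* 4-byte-aligned offset | state bits */
    uint32_t totlen;
};

static uint32_t ref_flags(const struct raw_ref *r)  { return r->flash_offset & 3; }
static uint32_t ref_offset(const struct raw_ref *r) { return r->flash_offset & ~(uint32_t)3; }
static int ref_obsolete(const struct raw_ref *r)    { return ref_flags(r) == REF_OBSOLETE; }

/* Advances *gc_node past obsolete nodes, like the while loop above.
 * Returns the first non-obsolete node, or NULL if the list runs out
 * (the case the kernel treats as a BUG()). */
static struct raw_ref *skip_obsolete(struct raw_ref **gc_node)
{
    assert(gc_node != NULL);
    struct raw_ref *raw = *gc_node;
    while (raw && ref_obsolete(raw))
        *gc_node = raw = raw->next_phys;
    return raw;
}
```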

D1(printk(KERN_DEBUG "Going to garbage collect node at 0x%08x\n", ref_offset(raw)));

if (!raw->next_in_ino) {

/* Inode-less node. Clean marker, snapshot or something like that */

/* FIXME: If it's something that needs to be copied, including something

we don't grok that has JFFS2_NODETYPE_RWCOMPAT_COPY, we should do so */

spin_unlock_bh(&c->erase_completion_lock);

jffs2_mark_node_obsolete(c, raw);

up(&c->alloc_sem);

goto eraseit_lock;

}

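The next_in_ino field in the kernel descriptor of any file's data node links that descriptor into a circular list. So if next_in_ino is NULL, the data node does not belong to any file. According to the comment, it may be a clean marker or a snapshot. In that case it is enough to mark the descriptor obsolete.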

inum = jffs2_raw_ref_to_inum(raw);

D1(printk(KERN_DEBUG "Inode number is #%u\n", inum));

spin_unlock_bh(&c->erase_completion_lock);

D1(printk(KERN_DEBUG "jffs2_garbage_collect_pass collecting from block @0x%08x. Node @0x%08x, ino #%u\n", jeb->offset, ref_offset(raw), inum));

inode = iget(OFNI_BS_2SFFJ(c), inum);

Next, jffs2_raw_ref_to_inum returns the inode number of the file that owns the data node. iget then returns a pointer to that file's inode and increments its reference count.
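jffs2_raw_ref_to_inum() can find the owning file because of the list layout described in Chapter 1: following next_in_ino from any descriptor eventually reaches the file's jffs2_inode_cache. As I understand it, the kernel relies on jffs2_inode_cache starting with a pointer (scan_dents) at the same position as next_in_ino. That is why scan_dents must be NULL once mounting is finished. The sketch below makes the shared first field explicit with a hypothetical `struct chain` member, instead of casting between unrelated structure types. All type and function names are illustrative, and returning 0 for inode-less nodes is this sketch's own convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shared first member, playing the role of next_in_ino / scan_dents. */
struct chain {
    struct chain *next_in_ino;
};

/* Hypothetical, reduced jffs2_raw_node_ref. */
struct node_ref {
    struct chain link;          /* must stay the first member */
    uint32_t flash_offset;
};

/* Hypothetical, reduced jffs2_inode_cache. */
struct ino_cache {
    struct chain link;          /* NULL once mounting is finished */
    struct chain *nodes;        /* head of the circular next_in_ino list */
    uint32_t ino;
};

static void ino_cache_init(struct ino_cache *ic, uint32_t ino)
{
    ic->link.next_in_ino = NULL;
    ic->nodes = &ic->link;      /* empty list: points back at the cache */
    ic->ino = ino;
}

/* Pushes a node descriptor onto the head of the inode's list. */
static void add_node(struct ino_cache *ic, struct node_ref *ref)
{
    assert(ic != NULL && ref != NULL);
    ref->link.next_in_ino = ic->nodes;
    ic->nodes = &ref->link;
}

/* Follows next_in_ino until the link whose successor is NULL, which is
 * the inode cache. Inode-less nodes (clean markers etc.) return 0. */
static uint32_t ref_to_inum(struct node_ref *raw)
{
    struct chain *c = &raw->link;
    if (!c->next_in_ino)
        return 0;
    while (c->next_in_ino)
        c = c->next_in_ino;
    /* c is the first member of a struct ino_cache, so this cast is valid. */
    return ((struct ino_cache *)c)->ino;
}
```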

if (is_bad_inode(inode)) {

printk(KERN_NOTICE "Eep. read_inode() failed for ino #%u\n", inum);

/* NB. This will happen again. We need to do something appropriate here. */

up(&c->alloc_sem);

iput(inode);

return -EIO;

}

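Earlier, when the file was opened and its inode object created (in jffs2_read_inode), the inode was marked bad with make_bad_inode if reading its data nodes from flash failed. The code is as follows: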

ret = jffs2_do_read_inode(c, f, inode->i_ino, &latest_node);

if (ret) {

make_bad_inode(inode);

up(&f->sem);

return;

}

A failure to read the data nodes is caused by a media access error. So as soon as GC runs into a bad inode, it simply exits with -EIO.

f = JFFS2_INODE_INFO(inode);

down(&f->sem);

/* Now we have the lock for this inode. Check that it's still the one at the head of the list. */

The jffs2_inode_info.sem semaphore must be held during the GC operation; it synchronizes writes with GC. See above.

if (ref_obsolete(raw)) {

D1(printk(KERN_DEBUG "node to be GC'd was obsoleted in the meantime.\n"));

/* They'll call again */

goto upnout;

}

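Before actually starting GC, the code checks once more whether the data node has become obsolete. Acquiring the jffs2_inode_info.sem semaphore above may have blocked, and the node may have been obsoleted by a writer in the meantime.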

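(The code below still needs further analysis. Depending on the type of the data node, the corresponding jffs2_garbage_collect_xxxx function performs the GC. Once all valid data nodes have been moved out of an erase block, it contains no valid data nodes (the used_size field of its descriptor is 0) and can be erased. Its descriptor is added to the erase_pending_list list, the erase process is started, and gcblock is set to NULL.)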

/* OK. Looks safe. And nobody can get us now because we have the semaphore. Move the block */

if (f->metadata && f->metadata->raw == raw) {

fn = f->metadata;

ret = jffs2_garbage_collect_metadata(c, jeb, f, fn);

goto upnout;

}

/* FIXME. Read node and do lookup? */

for (frag = frag_first(&f->fragtree); frag; frag = frag_next(frag)) {

if (frag->node && frag->node->raw == raw) {

fn = frag->node;

end = frag->ofs + frag->size;

#if 1 /* Temporary debugging sanity checks, till we're ready to _trust_ the REF_PRISTINE flag stuff */

if (!nrfrags && ref_flags(fn->raw) == REF_PRISTINE) {

if (fn->frags > 1)

printk(KERN_WARNING "REF_PRISTINE node at 0x%08x had %d frags. Tell dwmw2\n", ref_offset(raw), fn->frags);

if (frag->ofs & (PAGE_CACHE_SIZE-1) && frag_prev(frag) && frag_prev(frag)->node)

printk(KERN_WARNING "REF_PRISTINE node at 0x%08x had a previous non-hole frag in the same page. Tell dwmw2\n", ref_offset(raw));

if ((frag->ofs+frag->size) & (PAGE_CACHE_SIZE-1) && frag_next(frag) &&

frag_next(frag)->node)

printk(KERN_WARNING "REF_PRISTINE node at 0x%08x (%08x-%08x) had a following non-hole frag in the same page. Tell dwmw2\n",

ref_offset(raw), frag->ofs, frag->ofs+frag->size);

}

#endif

if (!nrfrags++)

start = frag->ofs;

if (nrfrags == frag->node->frags)

break; /* We've found them all */

}

} // for

if (fn) {

/* We found a datanode. Do the GC */

if((start >> PAGE_CACHE_SHIFT) < ((end-1) >> PAGE_CACHE_SHIFT)) {

/* It crosses a page boundary. Therefore, it must be a hole. */

ret = jffs2_garbage_collect_hole(c, jeb, f, fn, start, end);

} else {

/* It could still be a hole. But we GC the page this way anyway */

ret = jffs2_garbage_collect_dnode(c, jeb, f, fn, start, end);

}

goto upnout;

}
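The data-node branch above first collects the fragments that still reference fn. From the byte range they span, it then decides whether to call jffs2_garbage_collect_hole() or jffs2_garbage_collect_dnode(). The sketch below reproduces those two steps over a plain array of fragments sorted by file offset, in place of the fragtree. `struct frag`, `struct dnode` and the 4 KiB page size (PAGE_CACHE_SHIFT = 12) are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12    /* assumption: 4 KiB pages */

/* Hypothetical, reduced jffs2_full_dnode and jffs2_node_frag. */
struct dnode { uint32_t frags; };                          /* frags still using it */
struct frag  { uint32_t ofs, size; struct dnode *node; };  /* node == NULL: hole */

/* Like the for loop above: scan frags in file-offset order, find those
 * backed by fn, and return the byte range [*start, *end) they span.
 * Returns the number of frags found. */
static uint32_t find_dnode_range(const struct frag *frags, size_t n,
                                 const struct dnode *fn,
                                 uint32_t *start, uint32_t *end)
{
    uint32_t nrfrags = 0;
    assert(fn != NULL && start != NULL && end != NULL);
    for (size_t i = 0; i < n; i++) {
        if (frags[i].node != fn)
            continue;
        *end = frags[i].ofs + frags[i].size;
        if (!nrfrags++)
            *start = frags[i].ofs;
        if (nrfrags == fn->frags)
            break;                   /* found them all */
    }
    return nrfrags;
}

/* The test that selects the "hole" path: does [start, end) touch
 * more than one page? */
static int crosses_page(uint32_t start, uint32_t end)
{
    return (start >> SKETCH_PAGE_SHIFT) < ((end - 1) >> SKETCH_PAGE_SHIFT);
}
```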

/* Wasn't a dnode. Try dirent */

for (fd = f->dents; fd; fd=fd->next) {

if (fd->raw == raw)

break;

}

if (fd && fd->ino) {

ret = jffs2_garbage_collect_dirent(c, jeb, f, fd);

} else if (fd) {

ret = jffs2_garbage_collect_deletion_dirent(c, jeb, f, fd);

} else {

printk(KERN_WARNING "Raw node at 0x%08x wasn't in node lists for ino #%u\n",

ref_offset(raw), f->inocache->ino);

if (ref_obsolete(raw)) {

printk(KERN_WARNING "But it's obsolete so we don't mind too much\n");

} else {

ret = -EIO;

}

}

upnout:

up(&f->sem);

up(&c->alloc_sem);

iput(inode);

eraseit_lock:

/* If we've finished this block, start it erasing */

spin_lock_bh(&c->erase_completion_lock);

eraseit:

if (c->gcblock && !c->gcblock->used_size) {

D1(printk(KERN_DEBUG "Block at 0x%08x completely obsoleted by GC. Moving to erase_pending_list\n", c->gcblock->offset));

/* We're GC'ing an empty block? */

list_add_tail(&c->gcblock->list, &c->erase_pending_list);

c->gcblock = NULL;

c->nr_erasing_blocks++;

jffs2_erase_pending_trigger(c);

}

spin_unlock_bh(&c->erase_completion_lock);

return ret;

}

Chapter 9 Discussion and Reflections

What is a journaling file system, and why use jffs2?

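Just as ext2 is a file system designed for disks, jffs2 is designed specifically around the characteristics of flash. Flash is a kind of EEPROM. It can be read randomly, but it must be erased before it can be written, because a flash write can only change bits from 1 to 0. The target area must therefore first be "erased" to all 1s. Moreover, erasure works only on whole erase blocks: changing even a single byte requires erasing the entire erase block that contains it.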
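The write rule just described can be simulated in a few lines: programming only clears bits, and erasing sets a whole block back to 1s. This is a sketch of NOR-style byte programming on a hypothetical 16-byte block. Real NAND flash programs whole pages and has further restrictions that the sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 16    /* tiny erase block, for illustration only */

/* A simulated erase block. */
struct sim_block { uint8_t data[SKETCH_BLOCK_SIZE]; unsigned erase_count; };

/* Erasing sets every bit of the whole block to 1. */
static void sim_erase(struct sim_block *b)
{
    memset(b->data, 0xFF, sizeof b->data);
    b->erase_count++;           /* each erase uses up one of the block's limited cycles */
}

/* Programming can only clear bits (1 -> 0). Returns 0 on success, or -1
 * if any byte would need a 0 -> 1 transition, which requires erasing
 * the whole block first. */
static int sim_program(struct sim_block *b, size_t ofs, const uint8_t *buf, size_t len)
{
    assert(ofs + len <= sizeof b->data);
    for (size_t i = 0; i < len; i++)
        if (buf[i] & ~b->data[ofs + i])
            return -1;
    for (size_t i = 0; i < len; i++)
        b->data[ofs + i] &= buf[i];
    return 0;
}
```

Rewriting 0xA5 as 0x5A fails because it would set cleared bits, while 0xA5 to 0x21 succeeds because it only clears bits. This is why jffs2 writes a new node elsewhere instead of editing data in place.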

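A flash chip consists of a number of erase blocks, and the lifetime of a single erase block (roughly 100,000 erase cycles) determines the lifetime of the whole chip. A flash file system should therefore use all erase blocks about equally often. Otherwise it wears out a subset of them and retires the whole chip prematurely. This technique is called "wear levelling".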

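In ext2, modifying a region of a file means first finding the disk block that backs that region, reading the block if necessary, modifying it, and finally writing it back asynchronously. In jffs2, modifying a region of a file never touches the existing data node on flash that holds that region. Instead, a brand-new data node is written to the flash partition, and the kernel descriptor of the old node is marked "obsolete". Later, when the file is opened and its data structures are built, all obsolete data nodes are skipped; see the jffs2_get_inode_nodes function.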

Writing to jffs2 is like keeping a diary: today's entry always goes on a new, blank page, and yesterday's entry is never edited.
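The diary-style update can be modeled as an append-only log whose write pointer only moves forward. Note that the obsolete flag lives in the in-RAM descriptor and never on flash, in line with the point made later that a node on flash does not know whether it is obsolete. `struct node_log`, `log_write()` and the whole-key replacement rule are hypothetical simplifications; real jffs2 data nodes can overlap only partially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LOG_CAP 8

/* Hypothetical in-RAM descriptor of one node appended to the log. */
struct log_ref {
    uint32_t flash_offset;  /* where the node was written */
    uint32_t key;           /* which piece of file data it holds */
    int obsolete;           /* RAM-only flag: flash is never rewritten */
};

struct node_log {
    uint32_t write_ptr;     /* next free flash offset; only moves forward */
    size_t nrefs;
    struct log_ref refs[SKETCH_LOG_CAP];
};

/* Appends a node of totlen bytes holding `key` and marks every earlier
 * descriptor for the same key obsolete. Returns the new node's offset. */
static uint32_t log_write(struct node_log *lg, uint32_t key, uint32_t totlen)
{
    assert(lg != NULL && lg->nrefs < SKETCH_LOG_CAP);
    for (size_t i = 0; i < lg->nrefs; i++)
        if (lg->refs[i].key == key)
            lg->refs[i].obsolete = 1;   /* old copy stays on flash, now ignored */

    struct log_ref *r = &lg->refs[lg->nrefs++];
    r->flash_offset = lg->write_ptr;
    r->key = key;
    r->obsolete = 0;
    lg->write_ptr += totlen;
    return r->flash_offset;
}
```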

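A diary eventually runs out of pages. When the number of clean erase blocks on flash drops below a certain threshold, GC kicks in and reclaims the space occupied by obsolete data nodes. In fact, each time the GC kernel thread runs a GC pass, it makes only one valid data node in the block being collected obsolete. The block is erased once it contains nothing but obsolete data nodes.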

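jffs2 tries to minimize erase operations. Erasing a block just to modify one unit in it is the least efficient case, while erasing a block whose units all need to change anyway is the most efficient. During GC, jffs2 picks an erase block that contains obsolete data nodes and writes a copy of each valid node in it to another erase block, which makes the originals obsolete too. Once the whole block is completely dirty, it can be erased. (Choosing both the block to collect and the "other" block involves the wear levelling algorithm.)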
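The GC behaviour described above can be simulated with the used_size/dirty_size accounting that jffs2_garbage_collect_pass() relies on: each pass moves one valid node, and the block becomes erasable once used_size reaches 0. `struct sim_eb` and `gc_one_pass()` are hypothetical. The "copy" here only appends the node's length to a destination block and does not model real flash writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 8

/* Hypothetical, simplified erase block: node lengths, per-node obsolete
 * flags, and jffs2-style space accounting. */
struct sim_eb {
    size_t nnodes;
    uint32_t len[SKETCH_MAX_NODES];
    int obsolete[SKETCH_MAX_NODES];
    uint32_t used_size, dirty_size;
    size_t gc_node;             /* index of the next node to look at */
};

/* One GC pass: skip obsolete nodes, append a copy of the first valid
 * one to dst, then mark the original obsolete. Returns 1 if src now
 * holds only obsolete nodes (ready to erase), else 0. */
static int gc_one_pass(struct sim_eb *src, struct sim_eb *dst)
{
    while (src->gc_node < src->nnodes && src->obsolete[src->gc_node])
        src->gc_node++;
    if (src->used_size == 0)
        return 1;
    assert(src->gc_node < src->nnodes);   /* used_size > 0 implies a valid node */
    assert(dst->nnodes < SKETCH_MAX_NODES);

    uint32_t n = src->len[src->gc_node];
    dst->len[dst->nnodes] = n;            /* the new copy lives in dst */
    dst->obsolete[dst->nnodes++] = 0;
    dst->used_size += n;

    src->obsolete[src->gc_node] = 1;      /* the original becomes obsolete */
    src->used_size -= n;
    src->dirty_size += n;
    return src->used_size == 0;
}
```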

Why does file system source code provide so many methods?

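A file system is, in essence, a format for organizing data. Different file systems can be installed on a device, each storing data in its own format. After the device driver reads the data stream off the device, the data can only be interpreted through the methods of the corresponding file system. Likewise, before data is written to the device, the file system's methods must first organize it into that specific format.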

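The VFS offers user processes a uniform interface to all file systems. Internally, the VFS calls the specific file system's methods through standard interfaces to process the data, and the device driver finally writes the data to the device.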

With only a device driver and no file system methods, one could only read the raw byte stream off the device, and the device could only be used as a "raw device".
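The sketch below makes the idea of file system "methods" concrete by dispatching a generic call through a table of function pointers, the same pattern that VFS operation tables use. The two toy formats (a file length stored little-endian versus big-endian) and every name in the code are invented for illustration. The same four raw bytes decode to different sizes depending on which method interprets them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical method table for a toy file system format. */
struct fs_ops {
    const char *name;
    uint32_t (*file_size)(const uint8_t *raw);   /* decode a raw on-device record */
};

static uint32_t le32_size(const uint8_t *raw)    /* format A: little-endian length */
{
    return (uint32_t)raw[0] | (uint32_t)raw[1] << 8 |
           (uint32_t)raw[2] << 16 | (uint32_t)raw[3] << 24;
}

static uint32_t be32_size(const uint8_t *raw)    /* format B: big-endian length */
{
    return (uint32_t)raw[0] << 24 | (uint32_t)raw[1] << 16 |
           (uint32_t)raw[2] << 8 | (uint32_t)raw[3];
}

static const struct fs_ops fs_a = { "fs_a", le32_size };
static const struct fs_ops fs_b = { "fs_b", be32_size };

/* The generic, VFS-like side: it knows only the interface, not the format. */
static uint32_t vfs_file_size(const struct fs_ops *ops, const uint8_t *raw)
{
    assert(ops != NULL && ops->file_size != NULL);
    return ops->file_size(raw);
}
```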

Why is a red-black tree needed?

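jffs2 has to maintain lists of the kernel descriptors (jffs2_raw_node_ref) of data nodes. If a file initially consists of N data nodes, then after many modifications the actual number of data nodes, and hence the length of the list, will be far larger than N.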

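The VFS read and write interfaces specify a region of the file by its starting position offset and its length len. One could of course traverse the whole list, skip the obsolete data nodes that fall into that region, find the valid ones, and then access those valid nodes on flash. But when N is large and the file has been modified many times, traversing the entire list on every read or write is very inefficient.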

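jffs2 therefore needs a data structure and algorithm suited to fast lookup that organizes the list elements in a second way, and it uses a red-black tree for this. Besides being linked into the list, the kernel descriptors of a file's data nodes are also organized into a red-black tree through the corresponding jffs2_full_dnode and jffs2_node_frag structures. The root of the tree is jffs2_inode_info.fragtree.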

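When a file is modified, jffs2 writes a new data node and creates the corresponding descriptor and jffs2_full_dnode structure in the kernel. When the new node's descriptor is added to the head of the jffs2_inode_cache.nodes list, the old node's descriptor is marked obsolete. In addition, the node pointer of the corresponding red-black tree node is changed to point to the new jffs2_full_dnode, and the frags reference count of the obsolete jffs2_full_dnode is decremented (here from 1 to 0). (Note that the nodes of the red-black tree always point to the structures of valid data nodes.)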
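The benefit of the tree is O(log n) lookup of the fragment that covers a given file offset, which is the role jffs2_lookup_node_frag() plays over the fragtree. The sketch below swaps in a different technique with the same complexity, binary search over an array of fragments sorted by offset, to show the lookup itself. It is not jffs2's red-black tree code, and `struct frag_rec` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frag record: [ofs, ofs + size) of the file is currently
 * backed by the data node with this version. */
struct frag_rec { uint32_t ofs, size, version; };

/* Returns the index of the frag containing file offset pos, or -1.
 * frags[] must be sorted by ofs and non-overlapping, the invariant the
 * fragtree maintains. Each step halves the search range, so lookup
 * takes O(log n) steps, like walking a balanced tree. */
static long lookup_frag(const struct frag_rec *frags, size_t n, uint32_t pos)
{
    size_t lo = 0, hi = n;                 /* search in [lo, hi) */
    assert(frags != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pos < frags[mid].ofs)
            hi = mid;
        else if (pos >= frags[mid].ofs + frags[mid].size)
            lo = mid + 1;
        else
            return (long)mid;
    }
    return -1;
}
```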

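When and how is a data node determined to be obsolete?

When a file is modified and a new data node is written, the old node is obviously obsolete. But that is only a small part of the story.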

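When the file system is mounted, the whole flash partition is scanned and a kernel descriptor is created for every data node, obsolete or not. At this stage, for regular files only the CRC of the data node itself is checked. The CRC check of the data that follows the node is deferred until the file is opened. If the CRC check of the node itself fails, its kernel descriptor is marked obsolete; see the relevant part of the jffs2_get_inode_nodes function.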
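To illustrate the header check, the sketch below computes a CRC-32 over the first 8 bytes of a 12-byte header laid out like struct jffs2_unknown_node: magic, nodetype and totlen, followed by hdr_crc. It uses the common bitwise reflected CRC-32 with polynomial 0xEDB88320 and a caller-supplied seed. The seed 0, the absence of a final inversion and the little-endian byte order are assumptions about jffs2's on-flash convention and should be checked against the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32, polynomial 0xEDB88320. The running value is
 * passed in and returned as-is: no initial or final inversion. */
static uint32_t crc32_sketch(uint32_t crc, const uint8_t *p, size_t len)
{
    assert(p != NULL || len == 0);
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Assumed layout of the packed 12-byte header:
 * bytes 0-1 magic, 2-3 nodetype, 4-7 totlen, 8-11 hdr_crc. */
#define HDR_LEN         12
#define HDR_CRC_COVERED 8   /* everything except hdr_crc itself */

static uint32_t get_le32(const uint8_t *b)
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Returns 1 if the stored hdr_crc matches the CRC of the first 8 bytes. */
static int header_crc_ok(const uint8_t hdr[HDR_LEN])
{
    return crc32_sketch(0, hdr, HDR_CRC_COVERED) == get_le32(hdr + HDR_CRC_COVERED);
}
```

With seed 0xFFFFFFFF and a final XOR with 0xFFFFFFFF, the same routine yields the standard CRC-32 (0xCBF43926 for the ASCII string "123456789"), which is a convenient self-check.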

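It must be stressed that when the kernel descriptors are built, there is no way to tell whether a data node on flash is obsolete. A data node does not know whether it is obsolete; only its kernel descriptor records that.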

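For directories, the lists of jffs2_full_dirent structures for directory entries are built late in the mount. At that point obsolete data nodes can be detected: if several entries have the same name, the one with the highest version number is valid and the others are obsolete. Only the valid node's jffs2_full_dirent is added to the list, and the obsolete structures are freed. See the jffs2_add_fd_to_list function for details.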
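The highest-version-wins rule for directory entries can be sketched as a list insertion that replaces or rejects entries with the same name. This is a simplified model, not the kernel's jffs2_add_fd_to_list(). `struct dirent_sk` and `mk_dirent()` are hypothetical, and the list is kept unordered.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced jffs2_full_dirent. */
struct dirent_sk {
    struct dirent_sk *next;
    uint32_t version;
    uint32_t ino;
    char name[32];
};

static struct dirent_sk *mk_dirent(const char *name, uint32_t version, uint32_t ino)
{
    struct dirent_sk *fd = calloc(1, sizeof *fd);
    assert(fd != NULL && strlen(name) < sizeof fd->name);
    strcpy(fd->name, name);
    fd->version = version;
    fd->ino = ino;
    return fd;
}

/* Inserts new_fd into *list. If an entry with the same name exists, the
 * higher version survives and the other one (an obsolete node on flash)
 * is freed. All entries must come from malloc/calloc. */
static void add_fd_to_list(struct dirent_sk *new_fd, struct dirent_sk **list)
{
    assert(new_fd != NULL && list != NULL);
    for (struct dirent_sk **prev = list; *prev; prev = &(*prev)->next) {
        struct dirent_sk *cur = *prev;
        if (strcmp(cur->name, new_fd->name) != 0)
            continue;
        if (new_fd->version < cur->version) {
            free(new_fd);              /* the newcomer is the stale one */
        } else {
            new_fd->next = cur->next;  /* replace cur in place */
            *prev = new_fd;
            free(cur);
        }
        return;
    }
    new_fd->next = *list;              /* new name: push at the head */
    *list = new_fd;
}
```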

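For regular files, the remaining data structures are built from the data nodes' kernel descriptors and organized into the red-black tree when the file is opened and its inode created. While the tree is being built, jffs2 can see whether several data nodes cover the same region of data. The node with the highest version number is valid and the rest are obsolete. The red-black tree node is pointed at the jffs2_full_dnode of the highest-version node, and the kernel descriptors of the other nodes are marked obsolete. This happens in the jffs2_add_full_dnode_to_inode function, whose details remain to be analyzed and verified.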
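For data nodes, the rule is applied per byte range rather than per name. The brute-force sketch below assigns each byte of a tiny file to the highest-version node covering it, then marks nodes that own no bytes as obsolete. It replaces the fragtree with a per-byte owner map purely for illustration. `struct dnode_sk` and the 64-byte file limit are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FILE_MAX 64   /* tiny file, so one owner slot per byte is cheap */

/* Hypothetical data node descriptor covering [ofs, ofs + len). */
struct dnode_sk { uint32_t ofs, len, version; int obsolete; };

/* Gives every byte to the highest-version node covering it, then marks
 * each node that owns no byte at all as obsolete. */
static void resolve_versions(struct dnode_sk *nodes, size_t n)
{
    long owner[SKETCH_FILE_MAX];
    for (size_t b = 0; b < SKETCH_FILE_MAX; b++)
        owner[b] = -1;                       /* -1: hole */

    for (size_t i = 0; i < n; i++) {
        assert(nodes[i].ofs + nodes[i].len <= SKETCH_FILE_MAX);
        for (uint32_t b = nodes[i].ofs; b < nodes[i].ofs + nodes[i].len; b++)
            if (owner[b] < 0 || nodes[owner[b]].version < nodes[i].version)
                owner[b] = (long)i;
    }
    for (size_t i = 0; i < n; i++)
        nodes[i].obsolete = 1;
    for (size_t b = 0; b < SKETCH_FILE_MAX; b++)
        if (owner[b] >= 0)
            nodes[owner[b]].obsolete = 0;
}
```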

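Afterword

Starting from several scenarios (registering and mounting jffs2, opening a file, and reading and writing a regular file), this article has analyzed the main framework of the jffs2 source code. It has also described jffs2's core data structures and how they reference one another.

Because my time was limited, several areas need further study, including jffs2's own garbage collection (GC) algorithm, the wear levelling algorithm, and red-black tree insertion and deletion. They are listed below: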

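1. Red-black tree insertion, deletion, lookup and rebalancing, e.g. the jffs2_add_full_dnode_to_inode and jffs2_lookup_node_frag functions.

2. The methods for reading and writing special files other than regular files, such as directories and symbolic links.

3. How the corresponding data structures are released when a file is closed.

4. How data is compressed and decompressed with the zlib library.

5. Functions related to the wear levelling algorithm, such as jffs2_rotate_list and jffs2_reserve_space, and how an erase block is chosen for GC.

6. The asynchronous write mechanism for NAND flash.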

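Given the limits of my ability, this document still contains many open questions (shown in red in the original) and surely many mistakes. Additions, criticism and corrections are all welcome. Thank you!

(Resources of all kinds related to jffs2, including jffs3, which is under development, are available at www.infradead.org.)

(End of article)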