mirror of https://github.com/Qortal/Brooklyn
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
453 lines
13 KiB
453 lines
13 KiB
.. SPDX-License-Identifier: GPL-2.0 |
|
|
|
Directory Entries |
|
----------------- |
|
|
|
In an ext4 filesystem, a directory is more or less a flat file that maps |
|
an arbitrary byte string (usually ASCII) to an inode number on the |
|
filesystem. There can be many directory entries across the filesystem |
|
that reference the same inode number--these are known as hard links, and |
|
that is why hard links cannot reference files on other filesystems. As |
|
such, directory entries are found by reading the data block(s) |
|
associated with a directory file for the particular directory entry that |
|
is desired. |
|
|
|
Linear (Classic) Directories |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
By default, each directory lists its entries in an “almost-linear” |
|
array. I write “almost” because it's not a linear array in the memory |
|
sense because directory entries are not split across filesystem blocks. |
|
Therefore, it is more accurate to say that a directory is a series of |
|
data blocks and that each block contains a linear array of directory |
|
entries. The end of each per-block array is signified by reaching the |
|
end of the block; the last entry in the block has a record length that |
|
takes it all the way to the end of the block. The end of the entire |
|
directory is of course signified by reaching the end of the file. Unused |
|
directory entries are signified by inode = 0. By default the filesystem |
|
uses ``struct ext4_dir_entry_2`` for directory entries unless the |
|
“filetype” feature flag is not set, in which case it uses |
|
``struct ext4_dir_entry``. |
|
|
|
The original directory entry format is ``struct ext4_dir_entry``, which |
|
is at most 263 bytes long, though on disk you'll need to reference |
|
``dirent.rec_len`` to know for sure. |
|
|
|
.. list-table:: |
|
:widths: 8 8 24 40 |
|
:header-rows: 1 |
|
|
|
* - Offset |
|
- Size |
|
- Name |
|
- Description |
|
* - 0x0 |
|
- \_\_le32 |
|
- inode |
|
- Number of the inode that this directory entry points to. |
|
* - 0x4 |
|
- \_\_le16 |
|
- rec\_len |
|
- Length of this directory entry. Must be a multiple of 4. |
|
* - 0x6 |
|
- \_\_le16 |
|
- name\_len |
|
- Length of the file name. |
|
* - 0x8 |
|
- char |
|
- name[EXT4\_NAME\_LEN] |
|
- File name. |
|
|
|
Since file names cannot be longer than 255 bytes, the new directory |
|
entry format shortens the name\_len field and uses the space for a file |
|
type flag, probably to avoid having to load every inode during directory |
|
tree traversal. This format is ``ext4_dir_entry_2``, which is at most |
|
263 bytes long, though on disk you'll need to reference |
|
``dirent.rec_len`` to know for sure. |
|
|
|
.. list-table:: |
|
:widths: 8 8 24 40 |
|
:header-rows: 1 |
|
|
|
* - Offset |
|
- Size |
|
- Name |
|
- Description |
|
* - 0x0 |
|
- \_\_le32 |
|
- inode |
|
- Number of the inode that this directory entry points to. |
|
* - 0x4 |
|
- \_\_le16 |
|
- rec\_len |
|
- Length of this directory entry. |
|
* - 0x6 |
|
- \_\_u8 |
|
- name\_len |
|
- Length of the file name. |
|
* - 0x7 |
|
- \_\_u8 |
|
- file\_type |
|
- File type code, see ftype_ table below. |
|
* - 0x8 |
|
- char |
|
- name[EXT4\_NAME\_LEN] |
|
- File name. |
|
|
|
.. _ftype: |
|
|
|
The directory file type is one of the following values: |
|
|
|
.. list-table:: |
|
:widths: 16 64 |
|
:header-rows: 1 |
|
|
|
* - Value |
|
- Description |
|
* - 0x0 |
|
- Unknown. |
|
* - 0x1 |
|
- Regular file. |
|
* - 0x2 |
|
- Directory. |
|
* - 0x3 |
|
- Character device file. |
|
* - 0x4 |
|
- Block device file. |
|
* - 0x5 |
|
- FIFO. |
|
* - 0x6 |
|
- Socket. |
|
* - 0x7 |
|
- Symbolic link. |
|
|
|
To support directories that are both encrypted and casefolded directories, we |
|
must also include hash information in the directory entry. We append |
|
``ext4_extended_dir_entry_2`` to ``ext4_dir_entry_2`` except for the entries |
|
for dot and dotdot, which are kept the same. The structure follows immediately |
|
after ``name`` and is included in the size listed by ``rec_len`` If a directory |
|
entry uses this extension, it may be up to 271 bytes. |
|
|
|
.. list-table:: |
|
:widths: 8 8 24 40 |
|
:header-rows: 1 |
|
|
|
* - Offset |
|
- Size |
|
- Name |
|
- Description |
|
* - 0x0 |
|
- \_\_le32 |
|
- hash |
|
- The hash of the directory name |
|
* - 0x4 |
|
- \_\_le32 |
|
- minor\_hash |
|
- The minor hash of the directory name |
|
|
|
|
|
In order to add checksums to these classic directory blocks, a phony |
|
``struct ext4_dir_entry`` is placed at the end of each leaf block to |
|
hold the checksum. The directory entry is 12 bytes long. The inode |
|
number and name\_len fields are set to zero to fool old software into |
|
ignoring an apparently empty directory entry, and the checksum is stored |
|
in the place where the name normally goes. The structure is |
|
``struct ext4_dir_entry_tail``: |
|
|
|
.. list-table:: |
|
:widths: 8 8 24 40 |
|
:header-rows: 1 |
|
|
|
* - Offset |
|
- Size |
|
- Name |
|
- Description |
|
* - 0x0 |
|
- \_\_le32 |
|
- det\_reserved\_zero1 |
|
- Inode number, which must be zero. |
|
* - 0x4 |
|
- \_\_le16 |
|
- det\_rec\_len |
|
- Length of this directory entry, which must be 12. |
|
* - 0x6 |
|
- \_\_u8 |
|
- det\_reserved\_zero2 |
|
- Length of the file name, which must be zero. |
|
* - 0x7 |
|
- \_\_u8 |
|
- det\_reserved\_ft |
|
- File type, which must be 0xDE. |
|
* - 0x8 |
|
- \_\_le32 |
|
- det\_checksum |
|
- Directory leaf block checksum. |
|
|
|
The leaf directory block checksum is calculated against the FS UUID, the |
|
directory's inode number, the directory's inode generation number, and |
|
the entire directory entry block up to (but not including) the fake |
|
directory entry. |
|
|
|
Hash Tree Directories |
|
~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
A linear array of directory entries isn't great for performance, so a |
|
new feature was added to ext3 to provide a faster (but peculiar) |
|
balanced tree keyed off a hash of the directory entry name. If the |
|
EXT4\_INDEX\_FL (0x1000) flag is set in the inode, this directory uses a |
|
hashed btree (htree) to organize and find directory entries. For |
|
backwards read-only compatibility with ext2, this tree is actually |
|
hidden inside the directory file, masquerading as “empty” directory data |
|
blocks! It was stated previously that the end of the linear directory |
|
entry table was signified with an entry pointing to inode 0; this is |
|
(ab)used to fool the old linear-scan algorithm into thinking that the |
|
rest of the directory block is empty so that it moves on. |
|
|
|
The root of the tree always lives in the first data block of the |
|
directory. By ext2 custom, the '.' and '..' entries must appear at the |
|
beginning of this first block, so they are put here as two |
|
``struct ext4_dir_entry_2``\ s and not stored in the tree. The rest of |
|
the root node contains metadata about the tree and finally a hash->block |
|
map to find nodes that are lower in the htree. If |
|
``dx_root.info.indirect_levels`` is non-zero then the htree has two |
|
levels; the data block pointed to by the root node's map is an interior |
|
node, which is indexed by a minor hash. Interior nodes in this tree |
|
contains a zeroed out ``struct ext4_dir_entry_2`` followed by a |
|
minor\_hash->block map to find leafe nodes. Leaf nodes contain a linear |
|
array of all ``struct ext4_dir_entry_2``; all of these entries |
|
(presumably) hash to the same value. If there is an overflow, the |
|
entries simply overflow into the next leaf node, and the |
|
least-significant bit of the hash (in the interior node map) that gets |
|
us to this next leaf node is set. |
|
|
|
To traverse the directory as a htree, the code calculates the hash of |
|
the desired file name and uses it to find the corresponding block |
|
number. If the tree is flat, the block is a linear array of directory |
|
entries that can be searched; otherwise, the minor hash of the file name |
|
is computed and used against this second block to find the corresponding |
|
third block number. That third block number will be a linear array of |
|
directory entries. |
|
|
|
To traverse the directory as a linear array (such as the old code does), |
|
the code simply reads every data block in the directory. The blocks used |
|
for the htree will appear to have no entries (aside from '.' and '..') |
|
and so only the leaf nodes will appear to have any interesting content. |
|
|
|
The root of the htree is in ``struct dx_root``, which is the full length |
|
of a data block: |
|
|
|
.. list-table:: |
|
:widths: 8 8 24 40 |
|
:header-rows: 1 |
|
|
|
* - Offset |
|
- Type |
|
- Name |
|
- Description |
|
* - 0x0 |
|
- \_\_le32 |
|
- dot.inode |
|
- inode number of this directory. |
|
* - 0x4 |
|
- \_\_le16 |
|
- dot.rec\_len |
|
- Length of this record, 12. |
|
* - 0x6 |
|
- u8 |
|
- dot.name\_len |
|
- Length of the name, 1. |
|
* - 0x7 |
|
- u8 |
|
- dot.file\_type |
|
- File type of this entry, 0x2 (directory) (if the feature flag is set). |
|
* - 0x8 |
|
- char |
|
- dot.name[4] |
|
- “.\\0\\0\\0” |
|
* - 0xC |
|
- \_\_le32 |
|
- dotdot.inode |
|
- inode number of parent directory. |
|
* - 0x10 |
|
- \_\_le16 |
|
- dotdot.rec\_len |
|
- block\_size - 12. The record length is long enough to cover all htree |
|
data. |
|
* - 0x12 |
|
- u8 |
|
- dotdot.name\_len |
|
- Length of the name, 2. |
|
* - 0x13 |
|
- u8 |
|
- dotdot.file\_type |
|
- File type of this entry, 0x2 (directory) (if the feature flag is set). |
|
* - 0x14 |
|
- char |
|
- dotdot\_name[4] |
|
- “..\\0\\0” |
|
* - 0x18 |
|
- \_\_le32 |
|
- struct dx\_root\_info.reserved\_zero |
|
- Zero. |
|
* - 0x1C |
|
- u8 |
|
- struct dx\_root\_info.hash\_version |
|
- Hash type, see dirhash_ table below. |
|
* - 0x1D |
|
- u8 |
|
- struct dx\_root\_info.info\_length |
|
- Length of the tree information, 0x8. |
|
* - 0x1E |
|
- u8 |
|
- struct dx\_root\_info.indirect\_levels |
|
- Depth of the htree. Cannot be larger than 3 if the INCOMPAT\_LARGEDIR |
|
feature is set; cannot be larger than 2 otherwise. |
|
* - 0x1F |
|
- u8 |
|
- struct dx\_root\_info.unused\_flags |
|
- |
|
* - 0x20 |
|
- \_\_le16 |
|
- limit |
|
- Maximum number of dx\_entries that can follow this header, plus 1 for |
|
the header itself. |
|
* - 0x22 |
|
- \_\_le16 |
|
- count |
|
- Actual number of dx\_entries that follow this header, plus 1 for the |
|
header itself. |
|
* - 0x24 |
|
- \_\_le32 |
|
- block |
|
- The block number (within the directory file) that goes with hash=0. |
|
* - 0x28 |
|
- struct dx\_entry |
|
- entries[0] |
|
- As many 8-byte ``struct dx_entry`` as fits in the rest of the data block. |
|
|
|
.. _dirhash: |
|
|
|
The directory hash is one of the following values: |
|
|
|
.. list-table:: |
|
:widths: 16 64 |
|
:header-rows: 1 |
|
|
|
* - Value |
|
- Description |
|
* - 0x0 |
|
- Legacy. |
|
* - 0x1 |
|
- Half MD4. |
|
* - 0x2 |
|
- Tea. |
|
* - 0x3 |
|
- Legacy, unsigned. |
|
* - 0x4 |
|
- Half MD4, unsigned. |
|
* - 0x5 |
|
- Tea, unsigned. |
|
* - 0x6 |
|
- Siphash. |
|
|
|
Interior nodes of an htree are recorded as ``struct dx_node``, which is |
|
also the full length of a data block: |
|
|
|
.. list-table:: |
|
:widths: 8 8 24 40 |
|
:header-rows: 1 |
|
|
|
* - Offset |
|
- Type |
|
- Name |
|
- Description |
|
* - 0x0 |
|
- \_\_le32 |
|
- fake.inode |
|
- Zero, to make it look like this entry is not in use. |
|
* - 0x4 |
|
- \_\_le16 |
|
- fake.rec\_len |
|
- The size of the block, in order to hide all of the dx\_node data. |
|
* - 0x6 |
|
- u8 |
|
- name\_len |
|
- Zero. There is no name for this “unused” directory entry. |
|
* - 0x7 |
|
- u8 |
|
- file\_type |
|
- Zero. There is no file type for this “unused” directory entry. |
|
* - 0x8 |
|
- \_\_le16 |
|
- limit |
|
- Maximum number of dx\_entries that can follow this header, plus 1 for |
|
the header itself. |
|
* - 0xA |
|
- \_\_le16 |
|
- count |
|
- Actual number of dx\_entries that follow this header, plus 1 for the |
|
header itself. |
|
* - 0xE |
|
- \_\_le32 |
|
- block |
|
- The block number (within the directory file) that goes with the lowest |
|
hash value of this block. This value is stored in the parent block. |
|
* - 0x12 |
|
- struct dx\_entry |
|
- entries[0] |
|
- As many 8-byte ``struct dx_entry`` as fits in the rest of the data block. |
|
|
|
The hash maps that exist in both ``struct dx_root`` and |
|
``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes |
|
long: |
|
|
|
.. list-table:: |
|
:widths: 8 8 24 40 |
|
:header-rows: 1 |
|
|
|
* - Offset |
|
- Type |
|
- Name |
|
- Description |
|
* - 0x0 |
|
- \_\_le32 |
|
- hash |
|
- Hash code. |
|
* - 0x4 |
|
- \_\_le32 |
|
- block |
|
- Block number (within the directory file, not filesystem blocks) of the |
|
next node in the htree. |
|
|
|
(If you think this is all quite clever and peculiar, so does the |
|
author.) |
|
|
|
If metadata checksums are enabled, the last 8 bytes of the directory |
|
block (precisely the length of one dx\_entry) are used to store a |
|
``struct dx_tail``, which contains the checksum. The ``limit`` and |
|
``count`` entries in the dx\_root/dx\_node structures are adjusted as |
|
necessary to fit the dx\_tail into the block. If there is no space for |
|
the dx\_tail, the user is notified to run e2fsck -D to rebuild the |
|
directory index (which will ensure that there's space for the checksum. |
|
The dx\_tail structure is 8 bytes long and looks like this: |
|
|
|
.. list-table:: |
|
:widths: 8 8 24 40 |
|
:header-rows: 1 |
|
|
|
* - Offset |
|
- Type |
|
- Name |
|
- Description |
|
* - 0x0 |
|
- u32 |
|
- dt\_reserved |
|
- Zero. |
|
* - 0x4 |
|
- \_\_le32 |
|
- dt\_checksum |
|
- Checksum of the htree directory block. |
|
|
|
The checksum is calculated against the FS UUID, the htree index header |
|
(dx\_root or dx\_node), all of the htree indices (dx\_entry) that are in |
|
use, and the tail block (dx\_tail).
|
|
|