Structure of the UNIX file system

These are reading notes on Operating Systems: Three Easy Pieces.

Modelling files and directories

We can model File and Directory in Python as follows:

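import typing as T

class File:
    inode: int    # internal (low-level) name of the file
    data: bytes

class Directory:
    inode: int
    # (human-readable name, inode number) pairs
    content: T.List[T.Tuple[str, int]]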

Both File and Directory have an internal name (the inode number).

A File's content is an array of bytes, while a Directory's content is a list of pairs (readable_file_name, the_file_inode).

Why is it a list of tuples rather than a dictionary?
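As a small sketch of how a name is resolved under this model: a simple file system can scan the directory's entries linearly until the name matches. The lookup helper and the inode numbers below are made up for illustration; real file systems may use other structures (such as B-trees) for large directories.

```python
import typing as T
from dataclasses import dataclass, field


@dataclass
class Directory:
    inode: int
    # Ordered (name, inode number) pairs, mirroring the model in these notes.
    content: T.List[T.Tuple[str, int]] = field(default_factory=list)


def lookup(directory: Directory, name: str) -> T.Optional[int]:
    """Hypothetical helper: linear scan for `name`; returns its inode number or None."""
    for entry_name, inode in directory.content:
        if entry_name == name:
            return inode
    return None


# Illustrative inode numbers, not taken from a real file system.
root = Directory(inode=2, content=[(".", 2), ("..", 2), ("foo", 10)])
print(lookup(root, "foo"))
print(lookup(root, "bar"))
```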

Create a file

In C, the open() system call returns a file descriptor: an integer, private to each process, that UNIX systems use to access files.

The file descriptor corresponds to an open file handle object:

class FileHandle:
    # reference count because the object is shared
    ref: int
    readable: bool
    writable: bool
    inode: int   # inode number of the underlying file
    offset: int  # current read/write position

In Python, the built-in open returns a file object (a handle) rather than a file descriptor; os.open is the call that returns the raw descriptor. os.fdopen is an alias of open, except that its first argument must be a file descriptor instead of a path. Both return a file object.
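A minimal sketch of the three Python calls side by side. The scratch path is only for illustration.

```python
import os
import tempfile

# Scratch file in a fresh temporary directory (illustrative path).
path = os.path.join(tempfile.mkdtemp(), "foo.txt")

# os.open is a thin wrapper over the open() system call: it returns an int.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
fd_is_int = isinstance(fd, int)

# os.fdopen wraps an existing descriptor in a buffered file object.
with os.fdopen(fd, "w") as f:
    same_fd = f.fileno() == fd
    f.write("hello\n")

# The built-in open takes a path and returns the same kind of file object.
with open(path) as f:
    text = f.read()
```

Closing the file object returned by os.fdopen also closes the underlying descriptor, so fd must not be closed a second time.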

What does unlink() do?

open, fdopen and fopen

See https://stackoverflow.com/questions/1658476/c-fopen-vs-open

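int open(const char *path, int flags, ...): system call; returns a file descriptor.
FILE *fdopen(int fd, const char *mode): given a file descriptor, returns a FILE*.
FILE *fopen(const char *path, const char *mode): C standard library function, the one textbooks usually teach. Roughly fdopen(open(path, ...), mode), i.e. open() plus buffered I/O.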

Read a file

Here the book introduces strace, a tool that traces the system calls a program makes; it is very useful.

If read() returns 0, does that mean the file is closed? No! It only means there is no more content to read (EOF); the descriptor stays open.
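A quick way to see this from Python, where os.read returns an empty bytes object at end of file. The temporary file and its content are just for the demonstration.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"abc")
os.lseek(fd, 0, os.SEEK_SET)          # rewind to the start

first = os.read(fd, 10)               # fewer bytes than requested is not an error
second = os.read(fd, 10)              # empty result: end of file
size_after_eof = os.fstat(fd).st_size # the descriptor is still valid after EOF

os.close(fd)
os.unlink(path)
```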

What about write?

If you open two files (or the same file twice), each open() gets its own file descriptor and its own FileHandle, so the offsets are independent.
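A small check of this, assuming the same file is opened twice: reading through one descriptor should not move the other descriptor's offset.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789")
os.close(fd)

# Two separate open() calls: two descriptors, two open-file entries.
fd1 = os.open(path, os.O_RDONLY)
fd2 = os.open(path, os.O_RDONLY)

os.read(fd1, 4)                          # advances only fd1's offset
offset1 = os.lseek(fd1, 0, os.SEEK_CUR)  # current offset of fd1
offset2 = os.lseek(fd2, 0, os.SEEK_CUR)  # current offset of fd2

os.close(fd1)
os.close(fd2)
os.unlink(path)
```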

Shared file table

After fork(), the parent and child share their open file table entries. If the child seeks in a file that was opened before the fork, the parent sees the offset change as well, because both descriptors refer to the same FileHandle. ref tracks how many descriptors share the handle; when all of them have been closed, the object is removed.
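A sketch of the shared offset, assuming a UNIX system where os.fork is available. The child seeks, and the parent then reads back the offset.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789")
os.lseek(fd, 0, os.SEEK_SET)

pid = os.fork()
if pid == 0:
    # Child: move the shared offset, then exit immediately
    # (os._exit skips interpreter cleanup in the child).
    os.lseek(fd, 7, os.SEEK_SET)
    os._exit(0)

os.waitpid(pid, 0)                             # wait for the child to finish
parent_offset = os.lseek(fd, 0, os.SEEK_CUR)   # offset as seen by the parent

os.close(fd)
os.unlink(path)
```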

The dup() call creates a new file descriptor that refers to the same underlying open file as an existing descriptor.
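A minimal sketch using os.dup, Python's wrapper for dup(): the two descriptors have different numbers but share one offset, and closing one leaves the other usable.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
fd2 = os.dup(fd)                       # new descriptor, same open file

os.write(fd, b"hello")
offset_via_dup = os.lseek(fd2, 0, os.SEEK_CUR)  # offset is shared with fd

os.close(fd)                           # the open file lives on through fd2
os.write(fd2, b" world")
os.close(fd2)

with open(path, "rb") as f:
    data = f.read()
os.unlink(path)
```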

[ ] Why is dup() useful when writing a UNIX shell and performing operations like output redirection?
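One way to approach the question: a shell can implement `cmd > out.txt` by making descriptor 1 refer to the file in the child before running the command. The sketch below uses os.dup2 (the variant of dup that lets you choose the target descriptor) and, to stay self-contained, writes to descriptor 1 directly instead of exec-ing a real command. The file name is illustrative.

```python
import os
import tempfile

out_path = os.path.join(tempfile.mkdtemp(), "out.txt")

pid = os.fork()
if pid == 0:
    # Child: make descriptor 1 (stdout) refer to out.txt.
    fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.dup2(fd, 1)
    os.close(fd)
    # A real shell would exec the command here; the new program would
    # inherit descriptor 1 and never know it was redirected.
    os.write(1, b"hello from child\n")
    os._exit(0)

os.waitpid(pid, 0)
with open(out_path, "rb") as f:
    redirected = f.read()
```

The parent's own stdout is untouched, because the redirection happens only in the child's descriptor table.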

fsync

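write() only buffers the data in memory; the file system writes it to disk at some later point. fsync() forces all dirty data for the file to disk before it returns. It is similar in spirit to flush, one level further down.

Note that you may need to fsync() the containing directory as well: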
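In Python the two levels are visible separately: f.flush() moves data from the user-space buffer to the kernel, and os.fsync asks the kernel to push it to the storage device. A minimal sketch with an illustrative path:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo")

with open(path, "wb") as f:
    f.write(b"important data")  # sits in Python's user-space buffer
    f.flush()                   # hand it to the kernel via write()
    os.fsync(f.fileno())        # ask the kernel to force it to the device

with open(path, "rb") as f:
    data = f.read()
```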

Interestingly, this sequence does not guarantee everything that you might expect; in some cases, you also need to fsync() the directory that contains the file foo. Adding this step ensures not only that the file itself is on disk, but that the file, if newly created, also is durably a part of the directory. Not surprisingly, this type of detail is often overlooked, leading to many application-level bugs [P+13,P+14].
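A sketch of the extra step, assuming a Linux-like system where a directory can be opened read-only and passed to fsync. The directory and file names are illustrative.

```python
import os
import tempfile

dir_path = tempfile.mkdtemp()
path = os.path.join(dir_path, "foo")

# Create the file and force its data to disk.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
os.write(fd, b"data")
os.fsync(fd)
os.close(fd)

# Also force the directory entry for "foo" to disk.
dir_fd = os.open(dir_path, os.O_RDONLY)
os.fsync(dir_fd)
os.close(dir_fd)

created = "foo" in os.listdir(dir_path)
```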

rename

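Renaming a file with rename() is atomic with respect to system crashes.

In the Emacs example, I did not fully understand at first why the editor writes the new content to foo.txt.tmp and then renames it to foo.txt. Why not open foo.txt and write the new content directly? Because open followed by write is not atomic: a crash in the middle could leave foo.txt truncated or half-written. rename() is atomic, so after a crash foo.txt holds either the old content or the new content, never a mixture.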
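The write-to-temporary-then-rename pattern can be sketched as follows. atomic_write is a hypothetical helper name, and the ".tmp" suffix just mirrors the foo.txt.tmp naming in the notes; it assumes the temporary file lives in the same directory as the target.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` with `data` via a temporary file."""
    tmp_path = path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)   # small payload; a robust version would loop on short writes
        os.fsync(fd)         # make sure the new content is on disk first
    finally:
        os.close(fd)
    os.rename(tmp_path, path)  # atomically swap the new file into place


target = os.path.join(tempfile.mkdtemp(), "foo.txt")
atomic_write(target, b"old")
atomic_write(target, b"new content")

with open(target, "rb") as f:
    content = f.read()
```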

File stats

stat() or fstat() (stat() takes a path, fstat() takes an open file descriptor) returns a lot of information about a file, including:

stat $(mktemp)

This information is kept in a structure called an inode.
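The same fields can be read from Python with os.stat (by path) and os.fstat (by descriptor). A small sketch on a temporary file:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"hello")

by_path = os.stat(path)   # like stat(): takes a path
by_fd = os.fstat(fd)      # like fstat(): takes an open descriptor

same_inode = by_path.st_ino == by_fd.st_ino  # both describe the same inode
size = by_path.st_size                       # file size in bytes
links = by_path.st_nlink                     # number of hard links

os.close(fd)
os.unlink(path)
```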

Remove file

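Removing a file is the same as unlinking it: rm calls unlink(), which removes the name from its directory and decrements the inode's link count. The file's data is freed only once the link count reaches zero and no process still has the file open.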

> strace rm foo
unlink("foo")
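A sketch of why the call is named unlink, assuming a file system that supports hard links: with two names for one inode, unlinking one name only lowers the link count, and the data stays reachable through the other name. The names foo and bar are illustrative.

```python
import os
import tempfile

d = tempfile.mkdtemp()
foo = os.path.join(d, "foo")
bar = os.path.join(d, "bar")

with open(foo, "w") as f:
    f.write("hello")

os.link(foo, bar)                      # second name for the same inode
links_before = os.stat(foo).st_nlink

os.unlink(foo)                         # remove one name only
links_after = os.stat(bar).st_nlink

with open(bar) as f:
    survived = f.read()                # data still reachable via "bar"
```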

Make directories

mkdir("foo", 0777)
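The Python equivalent is os.mkdir. The sketch below also lists the new directory's entries as (name, inode number) pairs, which matches the Directory model from the start of these notes. Paths are illustrative.

```python
import os
import tempfile

parent = tempfile.mkdtemp()
foo = os.path.join(parent, "foo")

os.mkdir(foo, 0o777)       # the process umask still applies to the mode
empty = os.listdir(foo)    # "." and ".." are not reported by listdir

# Create one file, then read the directory as (name, inode number) pairs.
with open(os.path.join(foo, "a.txt"), "w"):
    pass
entries = [(e.name, e.inode()) for e in os.scandir(foo)]
```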

Further reading

https://blog.delphij.net/posts/2021/06/fsck_msdosfs/