README.rst 7.37 KB
Newer Older
ticki's avatar
ticki committed
1
.. image:: https://rawgit.com/ticki/tfs/master/icon.svg
ticki's avatar
ticki committed
2
3
    :alt: The TFS icon.
    :align: center
ticki's avatar
ticki committed
4

ticki's avatar
ticki committed
5
6
7
8
9
================================
TFS: Next-generation file system
================================

TFS is a modular, fast, and feature rich next-gen file system, employing
10
modern techniques for high performance, high space efficiency, and high
ticki's avatar
ticki committed
11
12
13
scalability.

TFS was created out of the need for a modern file system for Redox OS, as a
14
replacement for ZFS, which proved to be slow to implement because of its
ticki's avatar
ticki committed
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
monolithic design.

TFS is inspired by the ideas behind ZFS, but at the same time it aims to be
modular and easier to implement.

TFS is not related to the file system of the same name by *terminalcloud*.

*While many components are complete, TFS itself is not ready for use.*

.. image:: https://img.shields.io/github/license/ticki/tfs.svg
    :target: https://en.wikipedia.org/wiki/MIT_License
    :alt: MIT/X11 permissive license.
.. image:: https://img.shields.io/github/stars/ticki/tfs.svg?style=social&label=Star
    :alt: GitHub Stars

ticki's avatar
ticki committed
30
31
32
Design goals
------------

ticki's avatar
ticki committed
33
34
TFS is designed with the following goals in mind:

ticki's avatar
ticki committed
35
36
37
38
39
Concurrent
    TFS contains very few locks and aims to be as suitable for multithreaded
    systems as possible. It makes use of multiple truly concurrent structures
    to manage the data, and scales linearly by the number of cores. **This is
    perhaps the most important feature of TFS.**
40
Asynchronous
41
    TFS is asynchronous: operations can happen independently; writes and
Ofek Lev's avatar
Ofek Lev committed
42
    reads from the disk need not block.
ticki's avatar
ticki committed
43
Full-disk compression
44
    TFS is the first file system to incorporate complete full-disk compression
ticki's avatar
ticki committed
45
46
    through a scheme we call RACC (random-access cluster compression). This
    means that every cluster is compressed only affecting performance slightly.
47
    It is estimated that you get 60-120% more usable space.
ticki's avatar
ticki committed
48
Revision history
49
    TFS stores a revision history of every file without imposing extra
ticki's avatar
ticki committed
50
51
52
    overhead. This means that you can revert any file into an earlier version,
    backing up the system automatically and without imposed overhead from
    copying.
53
54
55
56
Data integrity
    TFS, like ZFS, stores full checksums of the file (not just metadata), and
    on top of that, it is done in the parent block. That means that almost all
    data corruption will be detected upon read.
ticki's avatar
ticki committed
57
58
59
60
Copy-on-write semantics
    Similarly to Btrfs and ZFS, TFS uses CoW semantics, meaning that no cluster
    is ever overwritten directly, but instead it is copied and written to a new
    cluster.
ticki's avatar
ticki committed
61
62
63
64
O(1) recursive copies
    Like some other file systems, TFS can do recursive copies in constant time,
    but there is an unique addition: TFS doesn't copy even after it is mutated.
    How? It maintains segments of the file individually, such that only the
65
    updated segment needs copying.
ticki's avatar
ticki committed
66
67
Guaranteed atomicity
    The system will never enter an inconsistent state (unless there is hardware
68
    failure), meaning that unexpected power-off won't ever damage the system.
ticki's avatar
ticki committed
69
70
Improved caching
    TFS puts a lot of effort into caching the disk to speed up disk accesses.
ticki's avatar
ticki committed
71
    It uses machine learning to learn patterns and predict future uses to
ticki's avatar
ticki committed
72
73
    reduce the number of cache misses. TFS also compresses the in-memory cache,
    reducing the amount of memory needed.
ticki's avatar
ticki committed
74
Better file monitoring
75
    CoW is very suitable for high-performance, scalable file monitoring, but
ticki's avatar
ticki committed
76
77
78
79
80
81
82
83
    unfortunately only few file systems incorporate that. TFS is one of those.
All memory safe
    TFS uses only components written in Rust. As such, memory unsafety is only
    possible in code marked `unsafe`, which is checked extra carefully.
Full coverage testing
    TFS aims to be full coverage with respect to testing. This gives relatively
    strong guarantees on correctness by instantly revealing large classes of
    bugs.
84
85
SSD friendly
    TFS tries to avoid the write limitation in SSD by repositioning dead sectors.
ticki's avatar
ticki committed
86
87
88
89
Improved garbage collection
    TFS uses Bloom filters for space-efficient and fast garbage collection. TFS
    allows the FS garbage collector to run in the background without blocking
    the rest of the file system.
90

ticki's avatar
ticki committed
91
92
93
94
95
FAQ
---

Why do you use SPECK as the default cipher?
    SPECK is a relatively young cipher, yet it has been subject to a lot of
Ofek Lev's avatar
Ofek Lev committed
96
    (ineffective) cryptanalysis, so it is relatively secure. It has really
ticki's avatar
ticki committed
97
    good performance and a simple implementation. Portability is an important
98
    part of the TFS design, and truly portable AES implementations without
ticki's avatar
ticki committed
99
100
    side-channel attacks is harder than many think (particularly, there are
    issues with `SubBytes` in most portable implementations). SPECK does not
ticki's avatar
ticki committed
101
102
    have this issue, and can thus be securely implemented portably with minimal
    effort.
ticki's avatar
ticki committed
103
How similar is TFS and ZFS?
Ofek Lev's avatar
Ofek Lev committed
104
    Not that similar, actually. They share many of the basic ideas, but
ticki's avatar
ticki committed
105
106
    otherwise they are essentially unconnected. But ZFS' design has shaped TFS'
    a lot.
ticki's avatar
ticki committed
107
108
109
110
111
112
113
Is TFS Redox-only?
    No, and it was never planned to be Redox-only.
How does whole-disk compression work?
    Whole-disk compression is -- to my knowledge -- exclusive to TFS. It works
    by collecting as many "pages" (virtual data blocks) into a "cluster"
    (allocation unit). By doing this, the pages can be read by simply
    decompressing the respective cluster.
ticki's avatar
ticki committed
114
115
116
117
118
119
Why is ZMicro so slow? Will it affect the performance of TFS?
    The reason ZMicro is so slow is because it works on a bit level, giving
    excellent compression ratio on the cost of performance. This horribly slow
    performance is paid back by the reduced number of writes. In fact, more
    than 50% of the allocations with ZMicro will only write one sector, as
    opposed to 3. Secondly, no matter how fast your disk is, it will not get
120
    anywhere near the performance of ZMicro because disk operations are
121
    inherently slow, and when put in perspective, the performance of the
ticki's avatar
ticki committed
122
    compression is really unimportant.
ticki's avatar
ticki committed
123
124
Extendible hashing or B+ trees?
    Neither. TFS uses a combination of trees and hash tables: Nested hash
125
126
    tables, a form of hash trees. The idea is that instead of reallocating, a
    new subtable is created in the bucket.
ticki's avatar
ticki committed
127
128
129
130
131
132

Resources on design
-------------------

I've written a number of pieces on the design of TFS:

Mario Rodas's avatar
Mario Rodas committed
133
- `SeaHash: Explained <http://ticki.github.io/blog/seahash-explained/>`_. This
ticki's avatar
ticki committed
134
  describes the default checksum algorithm designed for TFS.
135
- `On Random-Access Compression <http://ticki.github.io/blog/on-random-access-compression/>`_.
ticki's avatar
ticki committed
136
  This post describes the algorithm used for random-access compression.
Mario Rodas's avatar
Mario Rodas committed
137
- `Ternary as a prediction residue code <http://ticki.github.io/blog/ternary-as-a-prediction-residue-code/>`_. The
ticki's avatar
ticki committed
138
139
  use of this is related to creating a good adaptive (headerless) entropy
  compressor.
Mario Rodas's avatar
Mario Rodas committed
140
- `How LZ4 works <http://ticki.github.io/blog/how-lz4-works/>`_. This describes
ticki's avatar
ticki committed
141
  how the LZ4 compression algorithm works.
142
143
144
- `Collision Resolution with Nested Hash Tables <https://ticki.github.io/blog/collision-resolution-with-nested-hash-tables/>`_.
  This describes the method of nested hash tables we use for the directory
  structure.
ticki's avatar
ticki committed
145
146
- `An Atomic Hash Table <https://ticki.github.io/blog/an-atomic-hash-table/>`_.
  This describes the concurrent, in-memory hash table/key-value store.
ticki's avatar
ticki committed
147

148
149
150
Specification
-------------

ticki's avatar
ticki committed
151
152
The full specification can be found in `specification.tex`. To render it,
install `texlive` or another distribution with XeTeX, and run
153

154
.. code:: bash
Mario Rodas's avatar
Mario Rodas committed
155

ticki's avatar
ticki committed
156
    xelatex --shell-escape specification.tex
157
158

Then open the file named `specification.pdf`.