VFS, proc and root filesystems

The VFS (Virtual File System) is a subsystem of the kernel that implements the file and filesystem-related interfaces provided to user-space programs. All filesystems rely on the VFS in order to coexist and interoperate. Figure 1 shows the three major layers of a filesystem implementation. The first layer is the filesystem interface based on the open(), read(), write() and close() calls and on file descriptors. The VFS, second layer, enables system calls to work regardless of the filesystem or underlying physical medium as Figure 2 indicates. The third layer of the architecture implements the file system type or the remote file system protocol.

Figure 1: Schematic view of a VFS [1].
Figure 1: Schematic view of a VFS [1].
Figure 2: The flow of data from user-space issuing a write() call, through the VFS’s generic system call, into the filesystem’s specific write method, and finally arriving at the physical media. [2].
Figure 2: The flow of data from user-space issuing a write() call, through the
VFS’s generic system call, into the filesystem’s specific write method, and finally arriving at the physical media. [2].
In other words, VFS provides a common file model that can represent any filesystem’s general feature and behavior. VFS layer serves two important tasks. First, it separates generic filesystem operations from their implementation by defining a clean VFS interface. For the same machine several implementations for the VFS interface may coexist and consequently allowing access to different types of file systems. Furthermore, with VFS a file can be uniquely represented throughout a network. VFS is based on a V-node, which is a file representation structure. V-node contains a numerical designator for a network-wide unique file. The kernel retains one V-node structure for each active node (directory or file). Linux however does not have V-node. Instead, a generic i-node structure is used. Although the implementations differ, the V-node is conceptually the same as a generic i-node. Both point to an i-node structure specific to the file system.

The flow of Figures 1 and 2 shows that VFS activates filesystem operations to handle requests (local or remote) according to their filesystem types. File handles are constructed form the relevant V-nodes and are passed as arguments to these procedures.

Figure 3 shows a generic overview of VFS that separates the filesystem-specific structures from the rest of the kernel by “translating” system calls to proper filesystem type functions.

Figure 3: Overview of filesystem management structure [3].
Figure 3: Overview of filesystem management structure [3].
The VFS is object-oriented. The four primary object types of the VFS are:

  • The superblock object, which represents a specific mounted filesystem.
  • The i-node object, which represents a specific file.
  • The dentry object, which represents a directory entry, which is a single component of a path.
  • The file object, which represents an open file as associated with a process.

Each of the above objects contain a pointer to a function table that lists the addresses of the actual functions that implement the defined operations (listed below) for that particular object. Hence, the VFS layer can perform an operation on one of the four above objects by calling the suitable function from the object’s function table without having knowledge of the kind of the object.

In each object an operation object is included that describes the methods that the kernel invokes against the four objects above:

  • The superblock operations object (<linux/fs.h>) contains methods such as write_inode() and sync_fs() that the kernel can invoke on a specific filesystem.
  • The i-node operations object (<linux/fs.h>) contains methods such as create() and link() that the kernel can invoke on a specific file.
  • The dentry operations object (<linux/dcache.h>) contains methods such as d_compare() and d_delete() that the kernel can invoke on a specific directory entry.
  • The file operations object (<linux/fs.h>) contains methods such as read() and write() that the kernel can invoke on an open file.

/proc filesystem (/procfs)

The /procfs (process filesystem) is an additional mechanism (registered to the VFS layer) for the kernel and kernel modules to send information to processes. It exists only in kernel memory and is typically mounted at /proc. Reading or writing files in /procfs invokes kernel functions that simulate reading or writing from a real file. /procfs contents are computed on demand according to user file I/O requests. In other words, /procfs provides a method of communication between kernel space and user space.

The /proc file implements two important functions: a directory structure and the file contents within. A UNIX filesystem is a set of file and directory i-nodes identified by their
i-node numbers. The /proc filesystem defines a unique and persistent i-node number for each directory and the associated files. Given this i-node number, /procfs identifies the required operation when a user tries to read from a file i-node or performs a lookup in a particular directory i-node. After the reading from one of these files, /procfs collects the appropriate information and formats it into textual form. Then, /procfs places this information into the requesting process’s read buffer.

The mapping from i-node number to information type splits the i-node number into two fields. In Linux the i-node number is 32 bits. The top 16 bits define the Process ID (PID) and the remaining bits define the type of information that is requested about that process. A zero PID in the i-node number means that the i-node contains global information. Separate global files exist in /proc to report information such as the kernel version, drivers currently running and performance statistics. The kernel of course can dynamically allocate new /proc i-node mappings. Each global /proc filesystem entry contains the file’s i-node number, file name and access permissions along with the special functions used to generate the file’s contents. Figures 4, 5 and 6 shows a tour into /proc filesystem.

Figure 4: Interactive tour of /proc.
Figure 4: Interactive tour of /proc.

/root filesystem

A filesystem* is a hierarchically structured tree. The root of the tree on UNIX-like operating systems is the root directory/filesystem which is identified by the slash character “/”. The sub-trees of the root directory include other subdirectories and files as shown in Figure 7.

A root filesystem is also contained (“root on root”) on the same partition on which the root directory is located (/root directory is where all the home directories for root user on a system are stored).

The root filesystem is the “parent” of the entire filesystem [6]. All other directories are “children” of this directory. The partition which the root filesystem resides on, is mounted during boot, i.e. when the system is booted up all the other filesystems are mounted to the root filesystem. If the root filesystem is corrupted that means that the system becomes unbootable (of course using a boot flash drive as special measure could be bootable).

*filesystem definition can be found in several papers, books and websites such as [7] and [8]: “A filesystem is the methods and data structures that an operating system uses to keep track of files on a disk or partition; that is, the way the files are organized on the disk.”

Figure 5: /proc CPU Information file.
Figure 5: /proc CPU Information file.
Figure 6: Information about Mozilla running process with PID 1797 in /proc.
Figure 6: Information about Mozilla running process with PID 1797 in /proc.

Figure 7: Linux file system layout from a RedHat system [5].
Figure 7: Linux file system layout from a RedHat system [5].
Important directories in Linux include:

/bin: Hold the most commonly used essential user programs.
/sbin: Hold essential maintenance or system programs (difference between the programs stored in /bin and /sbin is that the programs in /sbin are executable only by root).
/etc: Store the system wide configuration files required by many programs.
/home: The place where all the home directories for all the users on a system are stored.
/root: The directory where all the home directories for root user on a system are stored.
/dev: The special files representing hardware are kept in it.
/tmp and /var: Directories which are used to hold temporary files or files with constantly varying content.
/usr: Most programs and files directly relating to users of the system are stored.

/proc: Explained before. To sum up, it is a VFS, i.e. a special filesystem provided by the kernel as a way of providing information about the system to user programs. The main tasks of /proc filesystem is to provide information about the kernel and processes.

References

[1] Abraham Silberschatz, Peter B. Galvin, Greg Gagne, “Operating System Concepts”, 7th edition,
[2] Linux Kernel Development, 3rd edition.
[3] Linux Virtual FileSystem, Online: http://bethechip.com/
[4] Advanced Linux Programming, Online: http://www.advancedlinuxprogramming.com/
[5] General overview of the Linux file system, Online: http://www.tldp.org/LDP/intro-linux/html/sect_03_01.html
[6] Sven Vermeulen, “Linux Sea”, Chapter 5: The Linux File System
[7] Wikipedia, File System, Online: http://en.wikipedia.org/wiki/File_system
[8] Moshe Bar, “Linux File Systems”.

Advertisements

Cracking Passwords with JTR

The previous two posts covered a part of the theory behind password cracking. Now, we will show how to crack passwords of two famous leaked password hashes list: Formspring and Linkedin. If you need the lists drop me an email.

First, we started by using the single mode attack on John the Ripper (JTR). Of course we didn’t get any positive results since as we explained in the previous post single mode uses login names, GECOS and users’ home directory names as candidate passwords. In the virtual machine that we installed for the purposes of this tutorial there isn’t much information that can be used for this method:

harrys@harrys-VirtualBox:~$ john –single 
'/harrys/formspring.txt' 
Loaded 1 password hash (generic crypt(3) [?/64]) 
guesses: 0 time: 0:00:00:25 100% c/s: 105 trying: harrys1928 – hharrys1900 


harrys@harrys-VirtualBox:~$ john --single 
'/harrys/SHA1.txt' 
Loaded 1 password hash (generic crypt(3) [?/64]) 
guesses: 0 time: 0:00:00:25 100% c/s: 106 trying: harrys1928 – hharrys1900

Next, we used the dictionary mode attack. Unfortunately, we didn’t manage to crack any
passwords using this method although we used a variety of password dictionary lists. Let’s see some of the results for various dictionary lists that either are included in JTR or one can find online.

a. password.lst 

harrys@harrys-VirtualBox:~$ sudo john -- 
wordlist=/harrys/john/run/password.lst 
'/harrys/formspring.txt' 
Loaded 1 password hash (generic crypt(3) [?/64]) 
guesses: 0 time: 0:00:00:24 100% c/s: 143 trying: !@#$% - sss 

harrys@harrys-VirtualBox:~$ sudo john -- 
wordlist=/harrys/john/run/password.lst '/harrys/SHA1.txt' 
Loaded 1 password hash (generic crypt(3) [?/64]) 
guesses: 0 time: 0:00:00:35 100% c/s: 98.88 trying: !@#$% - sss 

b. common-passwords.txt 

harrys@harrys-VirtualBox:~$ sudo john -- 
wordlist='/harrys/Downloads/common-passwords.txt' 
'/harrys/formspring.txt' 
Loaded 1 password hash (generic crypt(3) [?/64]) 
guesses: 0 time: 0:00:00:06 100% c/s: 135 trying: uucp – zmodem 

harrys@harrys-VirtualBox:~$ sudo john -- 
wordlist='/harrys/Downloads/common-passwords.txt' 
'/harrys/SHA1.txt' 
Loaded 1 password hash (generic crypt(3) [?/64]) 
guesses: 0 time: 0:00:00:07 100% c/s: 116 trying: uucp - zmodem

c. Given-Names.txt 

harrys@harrys-VirtualBox:~$ sudo john -- 
wordlist='/harrys/Downloads/Given- 
Names.txt' '/harrys/formspring.txt' 
Loaded 1 password hash (generic crypt(3) [?/64]) 
guesses: 0 time: 0:00:01:27 100% c/s: 98.81 trying: Zainab – Zygmunt 

harrys@harrys-VirtualBox:~$ sudo john -- 
wordlist='/harrys/Downloads/Given- 
Names.txt' '/harrys/SHA1.txt' 
Loaded 1 password hash (generic crypt(3) [?/64]) 
guesses: 0 time: 0:00:01:03 100% c/s: 136 trying: Zainab – Zygmunt 

d. Huge-List.txt 

harrys@harrys-VirtualBox:~$ sudo john -- 
wordlist='/harrys/Downloads/Huge- 
List.txt' '/harrys/formspring.txt' 
7Loaded 1 password hash (generic crypt(3) [?/64]) 
guesses: 0 time: 0:01:43:14 100% c/s: 140 trying: gethostbyname/gethostbyaddr - Karntnerstrasse- Rotenturmstrasse 

harrys@harrys-VirtualBox:~$ sudo john -- 
wordlist='/harrys/Downloads/Huge- 
List.txt' '/harrys/SHA1.txt' 
Loaded 1 password hash (generic crypt(3) [?/64]) 
guesses: 0 time: 0:01:18:58 100% c/s: 183 trying: gethostbyname/gethostbyaddr - Karntnerstrasse- Rotenturmstrasse

We have to mention that JTR also supports hybrid attacks, i.e. attacks that check for variations of a word or a combination of dictionary words. This method like the dictionary attack didn’t help us to crack the provided password lists.

Let’s move to our approach using the brute force attack method. As we explained, it consists of systematically checking all possible keys or passwords until the correct one is found. The program goes through all the possible plaintexts, hashing each one of them and then comparing it to the input hash. JTR uses character frequency tables to try first plaintexts containing more frequently used characters. This method is useful for cracking passwords which do not appear in dictionary wordlists, but it does take a long time to run. The default incremental mode checks potential passwords up to 8 characters. Just to show how time consuming this method is, we let our computer running in this mode for 2 days but the only result we got is that the our pc got overheated.

JTR program has an option for brute force attack that can make this process quicker. If we have prior knowledge of the password format, we can make this operation faster, by using a variation in incremental mode that only checks certain formats. Specifically, if we run:

./john – format=[MODE] password_list_to_crack.txt

JTR will use the MODE that we specified.

For example, for SHA1 mode (Fig. 1) JTR achieves cracking SHA1 hashes by first guessing the password using dictionary and incremental mode, and then generates theSHA1 hash for this password. Then, it checks if the hash that was generated for the password matches a hash in our file. When JTR finds a match then we recover the password from the provided list.

Fig. 1: One iteration within SHA1 compression function [1].
Fig. 1: One iteration within SHA1 compression function [1].
Our goal at this point is to do some research and find which hash function our lists used.

Formspring list:

This list used SHA256 hash function to encrypt the data that contains. This conclusion is based on two facts. First the provided hashes in this list have the follow form:

51e9b7073797d875a8862fbba743f7febec03164db4fdf91f10987b95be1852b

That means that the output of this hash function has 64*4 = 256 bits. This narrow our options of hash functions to algorithms like HAVAL256, RIPEMD256, SHA-3-256, SHA256 and GOST. JTR supports the two last methods. We checked both of those two formats in JTR. The GOST format processing time was too long without any result. On the other hand the SHA256 algorithm format was faster and we came to the conclusion that formspring list is hashed by that function (Fig. 2). This fact is also supported by various sites.

Fig. 2. Applying SHA256 algorithm format to JTR.
Fig. 2. Applying SHA256 algorithm format to JTR.

Linkedin list:

This list used sha1 hash function to encrypt the data that contains. This conclusion is based on two facts. First the provided hashes in this list have the follow form:

37b5b1edf4f84a85d79d04d75fd8f8a1c3d2fbde

That means that the output of this hash function has 40*4 = 160 bits. This narrow our options of hash functions to algorithms like HAVAL160, RIPEMD160, SHA0, TIGER(2)-160 and SHA1. JTR supports only the last hash function and this is the function that it was used to hash the passwords. This argument is also supported by the fact that the list name is SHA1.txt and the confirmation by the LinkedIn blog after the list was leaked. The new unstable version of JTR supports a mode called raw-sha1-linkedin that is especially for this specific list that it was leaked from LinkedIn.

Before the presentation of our results, we should mention that we could use also rainbow tables with JTR although the program does not support this method per se. The idea is to create a dictionary file using JTR and another software (e.g. CowPatty) to generate rainbow tables. In this post we focus our interest onto brute force attach method which generates a hash for each password and then compare it immediately with the correct password hash.

Fig. 3. JTR results for formspring list (SHA256 format).
Fig. 3. JTR results for Formspring list (SHA256 format).
Fig. 4. JTR results for linkedin list (sha256 format).
Fig. 4. JTR results for Linkedin list (sha256 format).

All the results above were obtained through running JTR on our computer for thirteen hours (let it run more time will give more cracked passwords). The results are quite impressive. To understand what these results mean we used graphs for better representation.

Fig. 5. (a)
Fig. 5. (a) Graphical representation of the Formspring cracked list results: Cracked passwords as a function of time.
Fig. 5 (b). Graphical representation of the formspring cracked list results: Combinations per second as a function of time.
Fig. 5. (b) Graphical representation of the Formspring cracked list results: Combinations per second as a function of time.
Fig. 6. (a) Graphical representation of the linkedin cracked list results: Cracked passwords as a function of time.
Fig. 6. (a) Graphical representation of the Linkedin cracked list results: Cracked passwords as a function of time.
Fig. 6. (b) Graphical representation of the linkedin cracked list results: Combinations per second as a function of time.
Fig. 6. (b) Graphical representation of the Linkedin cracked list results: Combinations per second as a function of time.

For the Formspring list we are able to crack 1392 passwords in around nine and a half hours, whereas for the Linkedin list we cracked 1108644 passwords in around thirteen hours. For both lists the combinations per second are decreasing as the time runs and the guesses are increasing. Of course, if we were able to run the program for more time, then our results would be more impressive because we could present all the passwords from the hashed lists.

Observations

Based on the results presented in the previous paragraphs, a number of observations can be made: First, we noticed that the LinkedIn list is cracked easier than the Formspring list. That means that using the LinkedIn list we cracked faster more passwords than the Formspring list. The reason is that in the LinkedIn list passwords are stored as unsalted SHA1 hashes and thus it takes less amount of time to crack this list. Unsalted hashes are much less secure than salted hashes because the password is not merged with another combination and then hashed. So, it takes less time to recover the passwords from Linkedin list compared to the Formspring one. On the other hand, the Formspring list uses the SHA256 function to hash the password after the password is combined with a prep-ending salt. Hence, it takes more time for JTR to recover the passwords from the Formspring list. Another interesting observation is that JTR brute force attack cannot easily crack both lists after a considerable amount of time. Although JTR starts cracking password easily at the beginning, as the time is running JTR recovers less passwords than before because it checks for password with a larger length.

What is unique on those two password lists?

The cracked passwords from the Formspring list show that the formula that it was used to hash these data was SHA256(“00-99”+”password”), i.e. each password was hashed after a number with range from 00 to 99 (salt) prep-ends the password. For example the password named “harrys” was hashed using the SHA256 hash algorithm after mixed with a random number from 00 to 99, i.e. SHA256(“58”+”harrys”). This hash weak method is quite easy to be cracked since the used salt has a range of 100 possibilities.

A view at the LinkedIn list reveals that many of the hashes contain five zeros at the beginning. That is something impossible for passwords that are hashed with the SHA1 hash function. We assumed that someone replaced the first five digits of some hashes with zeros so we would not be able to crack them. We removed these lines starting with “00000” as they are obfuscated hashes. This technique can be done using the “grep” command, i.e. “grep -v ‘^00000′” which will exclude all the lines from Linkedin file containing the “00000” string. We can get the same result using the raw-sha1-linkedin format mode in JTR that works with the same way.

Intro to John The Ripper

In this post we will present a short intro regarding the software password cracking tool named “John the Ripper” (JTR). This tool is designed to be both powerful and fast, and it combines several cracking modes in one program. It is available for many flavors of Unix, Windows, DOS, BeOS, and OpenVMS and it is the most popular available password cracking tool. The tool can be customized to mix different cracking techniques and also can auto detect password hash types.

JTR can work in three different modes: wordlist mode, single mode and incremental mode. Wordlist mode takes text string samples from a file (wordlist) containing words found in a dictionary or real passwords cracked before). Then it encrypts those words in the same format as the password being examined (including both the encryption algorithm and key), and comparing the output to the encrypted string. In addition, it can perform alternations to the data from the wordlist and compare these as well. Single mode uses login names, GECOS (information about the account or its user(s) such as their real name and phone number) and users’ home directory names as candidate passwords. This mode is faster than the wordlist mode since the used information is compared with passwords (and hashes with the same salt) for the account that is was taken from. JTR incremental mode is the most powerful mode since it can try any character combination until the password is found. However, this technique is often impractical due to the possible number of combinations that increase the computational cost (overhead). This mode actually uses trigraph frequencies tables (three letters combined together), separately for each character position and for each password length.

JTR supports two of the techniques described in the previous post, dictionary and brute force attacks. Single and wordlist modes are practically dictionary attacks, since as we explained above each one of them uses a wordlist that contains strings that can be used as a password. Of course, single mode will search in a shorter wordlist in length but nonetheless it can be classified as a dictionary attack. Moreover, JTR supports brute force attacks since the incremental mode theoretically is an implementation of this attack. As in brute force attacks, the incremental mode tries every combination of characters until the password is found. The practical difference is that incremental mode uses characters with trigraph frequency.

Information of how to install and use JTR can be found here. In the next post, we will show how to crack one million passwords from online leaked lists.

Password Cracking Techniques

There are a lot of different ways to crack a password. The figure below shows some scenarios attempts that can occur. If the attacker has access to a machine and consequently to the password hashes then he can use any password cracking technique of the follow:

Figure 1: The flow of password attacking possibilities [1].
Dictionary attack

Dictionary attack uses a wordlist that contains words, phrases, common passwords and other strings that can used as a password. The password is cracked after comparing every word in the wordlist (after hashed) to the password hash. Dictionary files are created by extracting words from large bodies of text and even from real databases of passwords. These files are also containing words with appending numbers to the end of them (e.g. “harrys123”) and with their leet speak equivalent (“harrys” becomes ”h4rry5”). This method is popular because many people use common words as passwords. We have to mention that dictionary wordlists are also available for technical and foreign languages. Although, dictionary attacks have a fairly high speed, are ineffective against passwords that are not based on a dictionary word. For instance, the password “#!d^3m4&0z” is improbable to be found by a dictionary attack. Moreover, by “salting” the password, the attacker would need to precompute hashes for each password coupled with each possible “salt”. This can be a huge restraint for the attacker, especially if the salt is large enough.

Brute Force Attack

Brute force attacks can be used in cases that dictionary attacks were unable to recover a password. The reason is that brute force attacks try every combination of characters up to a given length until the password is found. For instance, given a length of 6 characters in a password, brute force attack will follow a sequence of “aaaaaa”, ”aaaaab”, ”aaaaac” and so on. It is obvious that this attack is the slowest method of password crack attacks, very computationally expensive and the least efficient in terms of cracked hashes per processor time. Nevertheless, brute force attack will always eventually find the password and can be very successful on short and simple passwords.

Rainbow table attack

Rainbow table attack is an implementation of the time-memory trade-off method developed by Philippe Oechslin. It is called like that because it consumes time to be created, but after that password recovery it is very quick.The idea behind rainbow tables attacks is lookup tables*. The previous attacks, i.e dictionary and brute force, generate the hash for each password and then compare this hashed password to the correct password hash. On the other hand, rainbow tables compute hashes for words taken from a dictionary, store all these values into a table, retrieve the hash of the password to be cracked and do a comparison between the real password hash and the password hash from the created table. The difference however between these lookup tables is that rainbow tables sacrifice hash cracking speed to make lookup tables smaller. Thus, rainbow tables are more effective compared to simple lookup tables because the same amount of space is used to store more cracked password hashes. However, these tables are also quite large. For example for passwords up to 7 characters the MD5 rainbow table is 64GB [2]. These tables can crack non-dictionary based words very fast but due to the nature of this attack not all passwords can be found. Like dictionary attacks, rainbow table attacks can be prevented by “salting” the passwords before storing them.

Figure 2: The spectrum of possibilities for password cracking attacks [3].
Figure 2: The spectrum of possibilities for password cracking attacks [1].
*lookup tables are data structures containing pre-computed hashes of the passwords in a password dictionary and also their corresponding password.

Password Hashing and Cryptographic Hash Functions

According to the National Institute of Standards and Technology (NIST) [1], a hash algorithm takes binary data (message) and produces a condensed representation, called the message digest. Particularly, a hash function accepts a variable-length input and generates a unique fixed-size output (“compression property”).

The origin of the term “hash” has non-technical meaning. Hans Peter Luhn (1896 – 1964), a computer scientist for IBM, was the first that used the term “hash” to explain that hash functions “chop” the input domain into many sub-domains that get “mixed” into the output range to improve the uniformity of the key distribution. Hash functions are used for password protecting because the main purpose for this issue is to encrypt passwords in a form that is unfeasible to decrypt and also confirm if the given password is correct.

Hash function classification: Hash functions informally have the property of easy computation, i.e. given y and an input x, y(x) is easy to compute. That property implies an unkeyed hash function, but in general hash functions refer to both unkeyed and keyed hash algorithms. Two types of hash functions are the Modification Detection Codes (MDCs) and Message Authentication Codes (MACs). MDCs are a subclass of unkeyed hash functions and their purpose is to ensure the required by applications data integrity. MDCs are classified into one-way hash functions (OWHFs) and collision resistant hash functions (CRHFs). OWHFs are the functions that given an output (hashed input) is difficult to find the input (pre-specified hash value) and CRHFs are functions that given two inputs it’s difficult to find the same hash-value output. On the other hand, MACs purpose is to assure, without additional mechanisms, the source of a message and its integrity. MACs have two functionally parameters, a secret key and a message input and they belong to the class of keyed hash functions.

Figure 1: Hash function classification [1].
Figure 1: Hash function classification [2].

Cryptographic hash functions (CHFs) can be used to implement password hashing. We have to mention that although CHFs are hash functions, the term hash function per se doesn’t mean that every hash function is cryptographic as well. A cryptographic function is usable for security purposes because it offers a high degree of protection against unintentional and intentional modification. The cost for this purpose is that they are far more computationally intensive and thus slower than typically hash functions.

A cryptographic hash function must satisfy some properties depending on the application. CHFs are one way (“preimage resistance”) procedures, i.e. h is “preimage resistant” if given a hash value y it is computationally infeasible to find x such that h(x)=y. Moreover, CHFs have the requirement of “preimage resistance”, i.e. given the input x1 and its hash value h(x1), generating an input x2 with hash value h(x2)=h(x1) is computationally infeasible. Also, CHFs are “collision resistance”, i.e h is “collision resistant” if it is computationally infeasible to find any two inputs x1 and x2 such that h(x1)=h(x2). For example, if we hash the words “harrys” and “harris” the result is totally different:

hash(“harrys”) = 47682637ac38fe39739da284829b937482398ef384

hash(“harris”) = ab1738e02893829ef9312cb14c14ca452400030402

Now that we have an overview of hash functions let’s explain why we hash passwords and the properties that password hashing must have. First of all, password hashing is somehow necessary because we want to be more “secure” from attackers that maybe get access of our database. Specifically, by password hashing we are trying to make it as painful as possible for attackers to retrieve our passwords. Password hashing will not make our passwords impervious to attacks but it will perform “damage containment” in the event of a breach. Hashes have the advantages to generate quick and also are small to store. However, these advantages are also their weakness because it is simple after all to compare hashes of basic passwords with a table of precalculated hashes and then “dehash” all the passwords. Thus, password hashing requires to be fast on software, slow on hardware and have a unique salt per password. 

The solution is to hash more than just the password. This procedure is called “salting”. Not all the passwords use “salt” but when they do it boosts the security of each password so it won’t crack. The “salt” is a random value (not duplicated between passwords) appending or prepending the password before hashing. We do this process in order two passwords don’t generate the same hash value. In order to check if the password is correct we need the “salt”, hence it is stored in addition to the hash or as part of the hash string itself so that the hashed password can be reconstructed correctly for comparison. “Salting” is mostly used to prevent precomputation attacks against the digested message. Precomputation-based attack method works only for small ranges of input values, so by “salting” our password we increase the size of it and the resulting precomputation table with possible hashes would be too large even for a small subset of passwords.

Figure 2: Examples of hashed passwords [3]
Figure 2: Examples of hashed passwords [3].