Most people who work with computers probably already know that computers, deep down, work with 1s and 0s. I had known this for years without paying much attention to it. It was simply a fact that I had accepted and, dare I say, taken for granted. That changed about 1 to 1.5 years ago.

I had to write a FAT12 driver from scratch, in C.

"Why on earth would someone decide to do such a thing?", you may ask.

I did so because this project was part of my Systems Programming graduate course at Cleveland State University. Most of my acquaintances complained left and right that they didn't see any benefit in this topic and project, pointing out that FAT12 was used for floppy disks (when adding this hyperlink, I had to think about the possibility that some readers of this post may not know what a floppy disk is), and that nobody uses a floppy disk in 2018.

I am a firm believer that my acquaintances were missing the main point of this project: thinking about data at a very low level, as close as possible to the hardware. And this is the very reason that I decided to dedicate an article to this topic. Without further ado, let's dive into the subject.

From Letters, to Words, to Sentences

Let's consider how humans communicate with each other:

  • Letters from a set, known as an alphabet, are used to form words.
  • Words are combined together to make sentences.
  • Sentences follow each other to form a conversation.

Much like humans, computers need a set of letters which are combined to form higher-level constructs. While the English alphabet has 26 letters, the computer alphabet has only 2: 1 and 0.

With 2 letters and no combining at play, we can only have 2 possible constructs, which is obviously very limiting. However, if we put together 2 letters from the computer alphabet, we can form 4 different combinations: 00, 01, 10, 11.

What happens if we consider putting together 4 letters from the computer alphabet?

0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111.

With 4 digits, each of which can only be 1 or 0, there are a total of 16 possible combinations. That is right: you can calculate this number by raising 2, which is the number of possible values for a single bit, to the power of 4, which is the number of bits being combined together: 2 ^ 4 = 16. In combinatorics, arrangements like these, where order matters and a value may repeat, are called permutations with repetition.

From here on, I am going to use the more formal term "bit" to refer to a single letter from the computer alphabet.

Byte

If 8 bits are combined together, a byte is formed. For example: 01001101.

In order to find out how many possible values we can represent by using a single byte, we can simply calculate the total number of permutations:

2 ^ 8 = 256

This means we can represent 256 different values with a single byte. How these 256 possible values are used is completely up to interpretation. For example, a byte can represent the integers in the inclusive range [-128, 127] or in the inclusive range [0, 255].

With the above numbers and knowledge in mind, imagine that you are working with a 4-byte integer variable. Would you be able to find out how many possible values can be represented by this variable?

Do you notice the problem? Written out in binary, that 4-byte variable is already 32 digits long.

When working a lot with memory addresses, values, and low-level corners of the computing world, you will end up seeing lots of 1s and 0s, and this can make life very difficult.

As an example, going back to the FAT12 driver I mentioned at the beginning of this post, bad clusters are identified by 1111 1111 0111 in the file allocation table (FAT). Frankly, that's a lot of digits to read and work with. Moreover, it is difficult to make much sense of binary numbers without doing some sort of calculation in your head.

This is exactly why hexadecimal numbers are used so frequently to represent data in computers, and why it is important to be familiar with them.

Numeral Systems and Bases

The numeral system that humans use is decimal, which literally means it uses base 10. That's why a single digit in this system can take 10 possible values: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.

As discussed earlier in this post, in computers, each digit can take 2 possible values: 0 and 1. Therefore, the numeral system that computers understand at a very low level is binary, or base 2.

Hexadecimal is a numeral system that uses base 16, which means each digit in this system can take 16 possible values.

"How can we represent 16 possible values using symbols that we understand?", you may ask.

The simplest solution is to use the 10 numeric digits that we already use in the decimal system, plus the first 6 alphabetical letters. Therefore, each digit in the hexadecimal system has a value from the following set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f}

Can you calculate how many different values a single hexadecimal digit can represent?
Similarly, can you calculate how many hexadecimal digits are needed to represent a byte?

As an example, we can represent the value for bad clusters in a FAT12 filesystem by using FF7, instead of 1111 1111 0111. Each hexadecimal digit covers all 16 combinations of 4 binary digits (2 ^ 4 = 16). Therefore, we can condense a 12-digit binary number into a 3-digit hexadecimal number.

To be explicit about the base of a value (12, for example, could be read as either decimal or hexadecimal), hexadecimal numbers usually have a prefix of "0x" (the digit 0, followed by an x). Therefore, the value for a FAT12 bad cluster is written as 0xFF7.


Conclusion

The goal of this article was to explain binary numbers and higher-level values in simple terms, and to link them to the way humans use letters and words to communicate. Therefore, topics such as converting between numeral bases are excluded.

If you write software at higher levels, chances are you may not have to deal with binary or hexadecimal values at all. However, some degree of familiarity with this concept can greatly benefit you as a software developer, engineer, or computer scientist. I am hoping that this article can serve as a light introduction to this subject.

Please feel free to share your thoughts or feedback by sending me an email: k.nejadfard@gmail.com