Understanding the Power of Bitwise Operators. No math needed

May 11, 2023

When learning a new programming language, there is usually a table in the documentation that shows the various operators that can be used with numbers. While we are all familiar with +,-,*, and /, there’s always that one section that most of us skip. These are the bitwise operators <<, >>, &, ^ and |. While at first, they might seem obscure, unhelpful, or tools for people who write in low level programming languages, they do serve a purpose. Even more surprising, some of the most useful ways that can be used do not require any math at all. Bitwise operations allow us to manipulate the binary representation of the data which turns out to be extremely useful. Let’s take a look at these strange operators and see if we can make sense of them.

Give me the Bits🍟

Binary is the only language computers understand. It consists of 1s and 0s. The context of those 1s and 0s are what give these numbers meaning when surfaced at the human level. This is similar to English words, whose meaning changes depending on the context. For example, the word “present” can mean a gift or the current time, depending on the context.

One common example where the context of the binary representation is important is with text encoding. If you've ever tried to open a file with the incorrect encoding, you may have seen the Null character (�). This character is used when the computer can't decode certain strings of bits that make up the text in the document you're reading. Switching to the correct encoding removes those characters, allowing you to see the actual text in the file. Our files are all just binary at the end of the day to a computer, but our file formats give them meaning.

A single bit is almost completely useless to perform tasks, so we usually work with them in groups. With 2 bits you can now represent 2^2 values or 4 different numbers using the combinations

With 3 you can represent 2^3 numbers or 8 different combinations

And so on and so forth. In modern programming we usually work with standardized groups of bits and so they get their own special names

Nibble - 4 bits. 2^4 or 16 values
Byte - 8 bits. 2^8 or 256 values
Word - 4 bytes on 32bit systems, 8 bytes on 64bit architectures

Most people have probably never heard of these terms, but they may have heard of 32-bit and 64-bit architectures. These terms refer to the largest sized number a computer can natively support. Whether your architecture is 32-bit or 64-bit determines the largest sized number you can work with, and those values are manipulated in 8-bit (1 byte) sections at a time.

A great way to visualize a common grouping of bits is to open up your windows calculator app, click the hamburger menu in the top left-hand corner, and switch your calculator from standard to programmer. Now you can see a number in their Hexadecimal, Decimal, Octal, and Binary forms. Above the ’>>’ button you can restrict the maximum and minimum integer size to what can be contained by a Byte, Word, Dword etc.

You can even switch to a binary mode by clicking the cascading dots button under the BIN value which will allow you to flip the bits on and off yourself. This will give you a much more intuitive understanding of the representation of 0s and 1s and how they translate to the other number formats than I could give in this short time.

view of calculator in binary mode — The integers are signed and support negative and positive ranges

For those of you not on Windows, Linux has the gnome-calculator app that offers similar functionality.

On MacOS the programmer calculator can be accessed by opening the regular calculator app and typing ⌘ then the number 3

macOS calculator app in programming mode — Typical of apple to hide this useful feature from us

Encoding numbers as binary has a unique advantage over its decimal representation. It allows us to use the binary itself as a sort of command, where sections of the binary correspond to certain functionality. By toggling the various ones and zeros on and off we can manipulate our program in an efficient and compact way with relatively small number sizes. Let’s look at an example. Say we have a byte of data that looks like this (the space is just for readability) 0010 0011 and we want to produce a number that only has the bottom 2 bits set 0000 0011. How would we do this? With a logic and, which uses the ampersand character ‘&’. But before we can create the final result, we need one more binary number. This is called the mask. The mask is a binary number that has bits turned on and off in key places so that when combined with a bitwise operator turn on and off, or mask sections of the original binary. Here’s an example.

0010 0011 (Original Binary)
    &    -> 0000 0011 (Resulting Binary)
0000 0011 (Mask)

What are the decimal representations of these 3 binary numbers? Who cares! We only care about whether certain bits in the resulting binary number are on or off. What the & is doing is at each index of the two numbers, it compares the individual bits, and produces a bit that is either zero or one. The rules for the resulting bit are as follows.

If they are both 0 the resulting bit is 0
If they are both 1 the resulting bit is 1
If one bit is 0 and the other bit is 1 then the resulting bit is 0

So really rules #1 and 2 can be combined to say “If the bit in the first binary number matches the bit in the second binary number, produce the same bit in the resulting binary number”. By leveraging these properties, we can create the resulting number 0000 0011 with the bit pattern we want.

Let’s do another one. We have a byte that looks like 1011 0011, and we want to create a byte that looks like this 1011 0000. So, we have to turn off the last two 1 bits in our binary number. Take a second to review the rules and see if you can construct the correct mask. Then look at the example below.

1011 0011
    &    -> 1011 0000
1011 0000

If you made the correct mask, give yourself a pat on the back. If you had trouble, don’t worry, this is new stuff and takes some time to internalize. I want to highlight again that we don’t need to know what any of these number’s decimal representation are, nor do we need to know what the mathematical result of these operations are. We simply are trying to create masks that have specific patterns, so we can get resulting binary numbers with the desired bit patterns.

Bitwise OR

We’ve seen how we can use the bitwise & operator to mask off bits we aren’t interested in from the original binary number, but what if we wanted to set specific bits in a binary number? We use a ‘|’ (the pipe character) to do a bitwise OR. Taking our previous example 1011 0011 if instead we wanted the result to be 1011 1111 we can set the two zeros to ones using the bitwise OR. First, we construct a mask which has 1s in the position we want to turn on. Then we execute it.

1011 0011
    |    -> 1011 1111
1011 1111

Looking back at the rules for a bitwise AND we see that they are similar to the bitwise OR except for the final rule.

If they are both 0 the resulting bit is 0
If they are both 1 the resulting bit is 1
If one bit is 0 and the other bit is 1 then the resulting bit is 1

Using this knowledge, constructing a mask that has the right properties should be easy. If you see 0 bits you want to turn on, just put 1 in the mask’s bits. If you have bits that are turned on that you want to stay turned on, just put a 1 there as well.

Invert Operator

The invert operator uses the tilde (~). It’s used to flip all the bits to their opposite state. The invert operator is special because it only needs a single number to work on, so there is no need for a mask. A little bit of programming minutiae, this type of operator is called a unary operator. Operators like + and - are examples of binary operators. This does not mean that they only work on binary numbers, but that it requires two numbers for them to work. See there’s that context again 😉

The behavior is straight forward. If a bit is 1 it will be 0 and vice versa. This is commonly used to produce a mask which is used in conjunction with the & operator. Suppose we have a number 0101 0010 and we wanted to turn off the first 1 in the binary to create the number 0001 0010. First, we construct a binary number that has a 1 in the spot we want to clear 0100 0000. Then we take that number 0100 0000 and invert it.

0100 0000 ~ -> 1011 1111

As you can see the inversion turned every 0 into a 1 and every 1 into a 0. We can then take that inverted number 1011 1111 and use it as a mask with the & operator on our original number 0101 0010

0101 0010
    &     -> 0001 0010
1011 1111

We then produce the binary number we want with the 1 cleared to 0 in the place we desired.

Exclusive OR (XOR)

Exclusive OR uses the ^ character (pronounced hat or caret). It’s used when you want to toggle certain bits on and off. For example, if we wanted to toggle the 2nd 1 in the number 1001 0111 we construct a mask which has a 1 in the position we want to toggle on or off.

1001 0111
    ^    -> 1000 0111
0001 0000

If we take the resulting 1000 0111 and exclusive or it with the previous mask 0001 0000, We get the original binary number back.

 1000 0111
    ^    -> 1001 0111
 0001 0000

What the XOR is doing in this case is comparing each bit and returning 1 if the bits are different and 0 if they are the same. In most contexts what this mean is that you just want to construct a mask that only has 1s in the spots you want to turn on or off, leaving the rest of the bits at 0.

Shifting

Finally, we come to the last two operators. Bit shift left ‘<<’ and bit shift right ’>>’. They are represented by the greater than and less than symbols or left and right guillemets « ». The bit shift left operator (<<) inserts zeroes into the least significant bits of a binary number. We haven’t talked about least significant bits, but you can think about it like this. In the decimal number 123,456 we know that the smallest number is 6 because it’s in the 1s place and the biggest number is 1 because it’s hundred thousandths place. In binary we do the same thing. If we had the nibble 1110 reading from right to left, 0 is the least significant bit and the final 1 is the most significant. If we left shift 1110 by 4 then we essentially tack on 4 zeros onto the least significant side of the binary. Our number now looks like this.

1110 << 4 -> 1110 0000

As you probably guessed, if left shifting a binary adds zeros to the least significant side of a binary, right shifting adds zeros to the most significant side. So, using the same number we get…

1110 >> 4 -> 0000

Another way to think about shift is not only that zeros are tacked on to the least significant side, or most significant side, but those zeros shift the original binary numbers left or right. That’s why they are called left and right shifts.

This article hasn’t focused on math at all, but a fun fact is that left shifting a number by 1 is the equivalent of multiplying a number by 2^1 aka doubling the number. Left shifting by 2 is the equivalent of multiplying by 2^2 or quadrupling the number, and so on and so forth. Right shifting does the opposite, using the same numbers it halves and quarters the number. If your number needs to be divided or scaled by a power of 2 this is an extremely fast operation, and if your program is compiled this is a common optimization that happens under the hood. For our purposes though, we can just think of this operator as tacking on 0s to the beginning or end of a binary number or pushing our binary left or right by the specified numbers of zeros.

How is this useful 🤔?

Besides some clever optimization tricks how can we utilize these operators in a practical context? Before we can understand that we need to go over one more concept. Hexadecimal numbers.

Hexadecimal numbers are another way of representing numbers in computing. Unlike binary numbers, which are base-2, hexadecimal numbers are base-16. This means that instead of using only two digits (0 and 1), we use sixteen: 0-9 and then the letters A, B, C, D, E, and F to represent 10, 11, 12, 13, 14, and 15. The reason hexadecimal is important in computing is that it's a more compact way of representing binary numbers. One hexadecimal digit can represent four binary digits (a nibble). This means that two hexadecimal digits can represent a byte. In many programming languages hexadecimal numbers are specified by prefixing a number with 0x to differentiate them from binary or decimal numbers. So, your hex number 1234 would be written as 0x1234 if you wanted it to be interpreted in hex.

A word of caution, don’t confuse the decimal number 1234 and the hexadecimal number 0x1234 are not the same number. One is the decimal value 1,234 the other is the decimal value 4,660.

To convert a binary number to hexadecimal, we can break it up into groups of nibbles, starting from the right-hand side. We then replace each nibble with the corresponding hexadecimal digit. For example, the binary number 1101 1010 can be broken up into 1101 and 1010. Converting each group to hexadecimal gives us DA. Therefore, 1101 1010 in binary is equivalent to 0xDA in hexadecimal. If you still have the programmer calculator handy, you don’t even have to do the calculation yourself, just type in the binary representation of a number and it will tell you the hex value.

Programmer calculator showing 1101 in hex

programmer calculator showing 1010 in hex

programmer calculator showing 1101 1010 in hex

Hexadecimal numbers are also commonly used to represent memory addresses in computing. This is because memory addresses are typically represented as 32-bit or 64-bit numbers, which can be quite long and difficult to read in binary. By representing memory addresses in hexadecimal, we can make them much more compact and easier to read. Here is DA as a 32bit binary number.

0000 0000 0000 0000 0000 0000 1101 1010
vs
0000 00DA

Bitwise Operators in Emulators

Now that we have all this information, we can look at a common use case, a Chip-8 interpreter. A Chip-8 interpreter is a good first project for someone that is just dipping their toes into emulation. The instructions are 2 bytes long or 16 bits and there are 36 different ones. Since working with 16 bits in binary is tedious, we will look at the instructions in hex. Remember each hex value is 4 bits. This is convenient because the smallest section of a Chip-8 instruction we will need to work with is 4 bits long.

Telmac 1800 running CHIP-8 game Space Intercept (Joseph Weisbecker, 1978)

Depending on the instruction the relevant components of the instruction change. Cowgod’s Chip-8 reference1 is an excellent resource for finding out what each instruction does. Let's look at a simple instruction case, the ones that fall in the 0x1000 range. We will be using my Crystal Chip-8 emulator as an example. Remember these numbers are in hex values.

      case op & 0xF000
      #nnn or addr - A 12-bit value, the lowest 12 bits of the instruction
      #n or nibble - A 4-bit value, the lowest 4 bits of the instruction
      #x - A 4-bit value, the lower 4 bits of the high byte of the instruction
      #y - A 4-bit value, the upper 4 bits of the low byte of the instruction
      #kk or byte - An 8-bit value, the lowest 8 bits of the instruction
      when 0x1000
        #1nnn
        #set the program counter to nnn which is the lower 3 bytes
        @pc = op & 0x0FFF;

Assuming that the instruction we have is 0x1ABC we first need to figure out what the most significant byte in the instruction is. This will determine how we interpret it. Using Cowgod’s documentation we see that instructions that have 1 as the most significant byte set the program counter to the last 12 bits of the instruction. What we need to do is keep the most significant bit and clear out the other 12 bits we aren’t interested in when identifying the instruction. This should be familiar to you.

We can write a case statement which allows us to create multiple branches depending on the instruction we receive. We can take our operation code 0x1ABC and do a bitwise & with the mask 0xF000 to get the most significant bit of our instruction 0x1000. This ensures that no matter what instruction comes in, if it starts with 0x1 it will always go to the correct branch no matter whether its 0x1DDD, 0x1012, etc. Now that we have the instruction going to the right place we need to get the lower 12 bits of the instruction, and set our program counter to that value. Again we can use the bitwise &. By doing a bitwise & using our op code and the mask 0x0FFF we can get the number 0x0ABC. We can then add this number to our program counter.

You might be confused as to how the hexadecimal values translate back to the binary numbers like how we did it in our previous examples, so I’ll show it for this example. 0x1ABC translates to 0001 1010 1011 1100 in binary. 0xF000 translates to 1111 0000 0000 0000. If we do the bitwise & on these two numbers, we get…

 0001 1010 1011 1100
          &         -> 0001 0000 0000 0000
 1111 0000 0000 0000

0001 0000 0000 0000. Which is just our op code with its most significant bit. As you may have guessed 0001 0000 0000 0000 is 0x1000 in binary, which is why it goes to the 0x1000 branch of our case statement. The same logic applies to getting the lower 12 bits of our operation. In this case, we care about all the numbers except for the first 4 bits so we need to clear them. This time we use 0x0FFF which translates to 0000 1111 1111 1111 in binary. We do our bitwise & on this and we get

 0001 1010 1011 1100
          &         -> 0000 1010 1011 1100
 0000 1111 1111 1111

Which is exactly what we want. The last question you might have is, why we use F in hex number instead of 1 like we do in binary. That’s because F corresponds to 1111 in binary, so we know if we want to keep the bit pattern at that index in our original hex value, just throw an F there. Doesn’t matter what the combination of 0s and 1s is, it will always be preserved in a bitwise &. Since we wanted the last 12 bits of our hex to be preserved, we put 3 Fs there. Since we needed only the bit pattern of our most significant 4 bits to be preserved to get it to go to the right case statement, we only put one F in the first spot of our hex mask.

Well finish off with another example that uses our bit shift operators.

      when 0x5000
        #5xy0
        #Skip next instruction if Vx = Vy.
        #Compares register Vx to register Vy
        #if they are equal, increments the program counter by 2.
        x = (op & 0x0F00) >> 8
        y = (op & 0x00F0) >> 4
       
        #register index 
        if @reg[x] == @reg[y]
          @pc +=2
        end

Suppose our next instruction is 0x5120. We need to construct a mask that will get our op to branch to the 0x5000 branch of our case statement. To construct the mask, we want to keep the most significant digit (the first hex digit) and set all the other digits to 0. Try to figure that out now. Next, we see that instructions that fall into the 0x5000 range give us an x and y value which we can use to access the registers at x and y. Therefore, we need to somehow get the x and y values out of the operation. Just like in our previous examples we can put an F at the index of the hex value we are interested in keeping, and a 0 anywhere else. For the x value that is a mask that looks like this 0x0F00 and for the y value it’s a mask that looks like this 0x00F0. We learned that right shifts tack on 0s on the most significant bit side of a binary number. While this is true in most cases, what happens in the case where your number is a fixed width? For Chip-8 the smallest binary instruction is 0000 0000 0000 0000 and the largest value is 1111 1111 1111 1111. They are confined to 16 bits in size. Since we can’t just tack on zeros to the most significant bit size like this 0000 1111 1111 1111 1111 as that would create an instruction that is 20bits in length or 4 bits larger than the Chip-8 supports, something has to give.

It turns out that in this case, the numbers on the least significant side get pushed off to “make room” for zeros on the most significant side in shifts with a fixed width. This means that a number like this 1111 1111 1111 1111 right shifted by 4 now looks like this…

 (1111 1111 1111 1111 >> 4) 0000 1111 1111 1111

A block of 1111 was pushed off the binary number to make room for the 0000. How many times do we want to right shift an instruction to get the x and y?
If we look at the instruction template for the 0x5000 instructions 5xy0 you can see that x is 8 bits away from the least significant side of the instruction. Since a shift moves an integer by 1 bit it will take 8 shifts to get x all the way to the least significant side of the binary. Since y is only 4 bits away from the least significant side of the instruction, it only takes 4. Hence to get our x and y values we shift by 8 and 4 respectively.

 x = (op & 0x0F00) >> 8
 y = (op & 0x00F0) >> 4

Once we have our x and y it then just becomes a matter of comparing our register at x to our register at y and seeing if they are the same. If they are we increase the program counter.

        #register index 
        if @reg[x] == @reg[y]
          @pc +=2
        end

I won’t go through all the instructions in this article as it would be too long, but if you were to look at some of the instructions for the Chip-8 like the ones in the 0x8000s…

8xy1 - OR Vx, Vy
Set Vx = Vx OR Vy.

8xy3 - XOR Vx, Vy
Set Vx = Vx XOR Vy.

You would see that we end up using all the bitwise functions that we talked about in the first part of our article in implementing it. I hope that this article has given you an answer as to why we use bitwise operators in programming, as well as a practical example of where they can be used. Emulators are not the only examples so I would encourage you to seek out more. But the next time that you fire up an emulator with your legally obtained or homebrew ROMS, remember that bitwise operators are powering your experience. If you are interested in how emulators work, I encourage you to create your own Chip-8 interpreter, as it is a valuable and worthwhile experience, and I recommend using Cowgod’s reference as it has been invaluable for me in my Chip-8 journey.

Call To Action 📣

If you made it this far thanks for reading! If you are new welcome! I like to talk about technology, niche programming languages, AI, and low-level coding. I’ve recently started a Twitter and would love for you to check it out. I also have a Mastodon if that is more your jam. If you liked the article, consider liking and subscribing. And if you haven’t why not check out another article of mine! Thank you for your valuable time.