Mr. Martin's programming school

Bits and bytes and stuff

So, what are those bits and bytes really? Everybody is talking about them but do they really know what they are talking about? And what is the difference between a 32-bit processor and a 64-bit one?

Let's go back to the old 8-bit architectures. The microprocessors of the time, e.g. the Intel 8080, had eight connections, eight electrical wires, that were dedicated to data. Why eight? Well, let's start with binary.

One connection can have either e.g. a 5 volt output or a 0 volt output. It has two states, some volts or no volts, and we can call them 1 and 0. If we have only one connection, one electrical wire, we cannot use it to say much. We can only say things like 1 or 0, on or off, true or false, or whatever meaning we want to give it. It could, for instance, drive an LED with some text printed next to it.

If you would like to represent one digit, 0 - 9, you need a few more connections, or wires, to do that. But you do not need ten of them. You only need four, if you use combinations of volts on/volts off, 1/0, on them, like this:

4 wires with a combination of 5 or 0 volts:

| | | |   Value
0 0 0 0 = 0
0 0 0 5 = 1
0 0 5 0 = 2
0 0 5 5 = 3
0 5 0 0 = 4
0 5 0 5 = 5
0 5 5 0 = 6
0 5 5 5 = 7
5 0 0 0 = 8
5 0 0 5 = 9

Or, if we use 1's and 0's to represent the wires, like this:

Wires  Value
0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = 8
1001 = 9
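
The table above can be reproduced with a few lines of code. This is only a small sketch in Python (my choice of language for these examples, nothing the lesson depends on), and the function name wire_pattern is made up for illustration:

```python
def wire_pattern(digit, wires=4):
    """Return the 1/0 pattern on the wires for a value, leftmost wire first."""
    if not 0 <= digit < 2 ** wires:
        raise ValueError("value does not fit on that many wires")
    # "b" asks for binary digits, padded with zeros to one character per wire.
    return format(digit, "0{}b".format(wires))


# Print the same table as above: pattern on the wires = digit.
for digit in range(10):
    print(wire_pattern(digit), "=", digit)
```

Changing range(10) to range(16) would also list the six combinations that a single decimal digit never uses.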

There was actually an early Intel microprocessor called the 4004, preceding the Intel 8080. It had a four-bit architecture, so it could output e.g. only one digit at a time. Four bits also give 6 more combinations that we do not need for our single digit. Let's give them some values, or names, now:

1010 = A
1011 = B
1100 = C
1101 = D
1110 = E
1111 = F

Whatever they can be used for, at least we have a name for each and every combination of four bits now. Ok, talking about letters, and still, why 8 bits? Am I leading up to something now? Of course! Using 8 bits gives us, let's see now... 256 different combinations!
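
If you want to check the 256 yourself: every extra wire doubles the number of combinations. Here is a minimal sketch of that reasoning in Python, with a made-up function name:

```python
def count_combinations(bits):
    """Count the on/off combinations of a number of wires by repeated doubling."""
    count = 1
    for _ in range(bits):
        count *= 2  # one more wire: every old combination with a 0 and with a 1
    return count


for bits in (1, 4, 8):
    print(bits, "bits give", count_combinations(bits), "combinations")
```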

That would be enough to represent the entire alphabet, in upper and lower case as well as all digits, special characters and much more.

Old computers used electro-mechanical type-writers for input and output. When a key was pressed, the corresponding character was printed the usual mechanical way, but there was also a switch under each key, and a grid of transistors that made up the logic to send a combination of 8 bits to the computer. Thus the computer was fed characters while a person was typing.

When the computer on the other hand sent out characters one by one, the type-writer printed them by means of electro-magnets pulling down the keys. There were special combinations of 1's and 0's that meant special functions, like 00001010 for advancing the paper to the next line, 00001101 for moving the printing back to the beginning of the line and 00001001 for the tabulator.
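
Those three special combinations are still around today. As a small illustration in Python (where "\n", "\r" and "\t" are that language's way of writing line feed, carriage return and tabulator; the names SPECIAL_KEYS and bit_pattern are my own), we can print their bit patterns:

```python
# Python's escape sequences for the three special characters in the text.
SPECIAL_KEYS = {
    "line feed": "\n",
    "carriage return": "\r",
    "tabulator": "\t",
}


def bit_pattern(character):
    """Return the 8-bit pattern of a single character as a string."""
    return format(ord(character), "08b")


for name, character in SPECIAL_KEYS.items():
    print(bit_pattern(character), "=", name)
```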

So, by using 8 bits, a specialized type-writer could be used to communicate with the computer. That I call a definite advantage.

Ten fingers and number bases

Have you ever stopped to think about why we use the digits 0 - 9 and not e.g. 0 - 7? I can tell you. It's because we have 10 fingers. There have been cultures using only one hand and thus only 5 symbols for counting. There were also the Mayas, who obviously used their toes as well, because they based their numbering system on 20. They had one symbol for 1, a dot, and one for 5, a bar, which they stacked in a 'box' until they reached 20. Going to 20 and above was a bit too much for one box, so they stacked another box on top of the first one, where the dots had the value 20 and the bars the value 100. That's what you get if you multiply by 20.

So, stacking dots and bars up to 19 in a box, and stacking boxes. The dots and bars in each box always had 20 times the value of those in the box below. That way they combined stacked dots and bars in stacked boxes into any number.

And so do we, with our numbers, just in a different way. While the Mayas stacked symbols, we use a positional system. In our system the rightmost digit has the value as it stands, while the digit to the left of that has 10 times its value, the next 100 times its value, etcetera. Let us do an example, e.g. the number 256:

256
 \\\__ The 6 in the rightmost column has the value as it stands, in this example = 6
  \\__ The 5 in the next column has the value ten times the digit, in this example = 50
   \__ And the 2 in the next column has the value 100 times the digit = 200

So, the total sum is 6 + 50 + 200 = 256. That looks very obvious, but only because we have gotten used to thinking that way. For a Mayan the same value would be two boxes stacked on each other. The top one holds two bars and two dots, which is 12 times 20, that is 240, and the bottom one makes up 16 by stacking 3 bars and one dot. 240 + 16 = 256.
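
The same splitting works for any base: divide by the base over and over and collect the remainders. Here is a minimal sketch in Python, with a hypothetical helper called to_digits, that splits 256 the way we do and the way the Mayas did:

```python
def to_digits(number, base):
    """Split a non-negative count into digit values, most significant first."""
    if number == 0:
        return [0]
    digits = []
    while number > 0:
        # The remainder is the value of the current rightmost position.
        number, remainder = divmod(number, base)
        digits.append(remainder)
    return digits[::-1]


print(to_digits(256, 10))  # [2, 5, 6]: our digits 2, 5 and 6
print(to_digits(256, 20))  # [12, 16]: top box 12, bottom box 16
```

The base-20 result matches the two boxes: 12 is two bars and two dots, 16 is three bars and one dot.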

But all this is only different ways to represent a count, not the thing itself. What it really is , is a symbolic representation of 256 things of some kind, e.g. dots. Here are 256 dots:

That, is what 256 really means! Not the way we are used to think about it, eh?

Mathematicians call our way of writing numbers a 10-based system, or the decimal system. Deci means ten. Likewise they call our 1's and 0's a 2-based system, or more commonly a binary system. Binary in the sense that a digit can have 2 different values, 1 or 0. In the same manner, the rightmost digit has the value as it stands, while the next one to the left has double its value, the next quadruple its value, etcetera.

We can take a few examples from the table above:

0000 = 0 + 0 + 0 + 0 = 0
0001 = 0 + 0 + 0 + 1 = 1
0010 = 0 + 0 + 2 + 0 = 2
0101 = 0 + 4 + 0 + 1 = 5
1001 = 8 + 0 + 0 + 1 = 9

Just to be correct here, we are converting from a base-2 number to a base-10 ditto. Therefore we should show that in some clear and unmistakable way. The way to do that in mathematics is to put a subscript after each number to clarify in what base it is meant to be understood, like this:

0000₂ = 0₁₀ + 0₁₀ + 0₁₀ + 0₁₀ = 0₁₀
0001₂ = 0₁₀ + 0₁₀ + 0₁₀ + 1₁₀ = 1₁₀
0010₂ = 0₁₀ + 0₁₀ + 2₁₀ + 0₁₀ = 2₁₀
0101₂ = 0₁₀ + 4₁₀ + 0₁₀ + 1₁₀ = 5₁₀
1001₂ = 8₁₀ + 0₁₀ + 0₁₀ + 1₁₀ = 9₁₀
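
The sums above can be worked out by a program too. This is just a sketch in Python, with a made-up function place_values, that gives each binary digit the value of its position (8, 4, 2 or 1 for four bits) and adds them up:

```python
def place_values(bits):
    """Return the base-10 value contributed by each binary digit, left to right."""
    width = len(bits)
    # The leftmost digit is worth 2 ** (width - 1), the rightmost 2 ** 0 = 1.
    return [int(bit) * 2 ** (width - 1 - i) for i, bit in enumerate(bits)]


for bits in ("0000", "0001", "0010", "0101", "1001"):
    parts = place_values(bits)
    print(bits, "=", " + ".join(str(part) for part in parts), "=", sum(parts))
```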

Since the computers seem to have 16 'fingers' rather than our 10, we have some extra combinations. But we have used up all our digits, 0 - 9, so why not use the alphabet to fill out the rest? We have already assigned A - F above, and that is just one way to do it. We could start drawing new symbols, but A - F has actually become a standard.

So, now we have a new mathematical number base that suits the computers perfectly: base-16. The mathematicians call this a hexadecimal system. Hexa means 6, and as I mentioned above, deci means 10.

But what about bytes? Do we have to invent a 256-based system, and where would we get all the symbols for that? Hold your horses, we do not need to do that. We can use the hexadecimal symbols just like we use our 'normal' digits. The rightmost digit has the value as it stands, the next one to the left has 16 times its value, the next one 256 times its value, etcetera.

Let's do a few examples:

00₁₆ = 0₁₀ + 0₁₀ = 0₁₀
01₁₆ = 0₁₀ + 1₁₀ = 1₁₀
0F₁₆ = 0₁₀ + 15₁₀ = 15₁₀
10₁₆ = 16₁₀ + 0₁₀ = 16₁₀
11₁₆ = 16₁₀ + 1₁₀ = 17₁₀
20₁₆ = 32₁₀ + 0₁₀ = 32₁₀
41₁₆ = 64₁₀ + 1₁₀ = 65₁₀
64₁₆ = 96₁₀ + 4₁₀ = 100₁₀
A0₁₆ = 160₁₀ + 0₁₀ = 160₁₀
F3₁₆ = 240₁₀ + 3₁₀ = 243₁₀
FF₁₆ = 240₁₀ + 15₁₀ = 255₁₀
100₁₆ = 256₁₀ + 0₁₀ + 0₁₀ = 256₁₀
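
Here is the same conversion as a small Python sketch. The names HEX_DIGITS and hex_to_decimal are my own; the position of a symbol in the string 0123456789ABCDEF is its value, and for every new digit the value so far is multiplied by 16:

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_decimal(text):
    """Convert a string of hexadecimal symbols to its base-10 value."""
    value = 0
    for symbol in text.upper():
        # Shift what we have one position to the left (times 16), then add the new digit.
        value = value * 16 + HEX_DIGITS.index(symbol)
    return value


for text in ("00", "0F", "10", "41", "64", "A0", "F3", "FF", "100"):
    print(text, "=", hex_to_decimal(text))
```

Python can also do this on its own with int(text, 16). The loop is only there to show the multiplication by 16.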

The meaning of bytes

So, we still have numbers, right? How can this be 'the entire alphabet, in upper and lower case as well as all digits, special characters and much more'? Well, for that we need some kind of agreement where we assign each and every letter, digit and special character a value. And that has already been done. There was actually already some work done in the area. Old teletypes used a 5-bit code. Based on that code the 7-bit ASCII code was developed. ASCII = American Standard Code for Information Interchange. This was only for the English alphabet. Later it was extended to use all 8 bits in order to add a lot of national characters, enabling writing in a number of languages.

The digits 0 - 9 do not have the binary values 0000 to 1001 that we used above. When we use bytes for numerical values they still do, but in text they are assigned the ASCII values 48 - 57. In hexadecimal representation that is 30 - 39. The upper-case alphabet starts at 65, or hexadecimal 41, and is followed by [, \, ], ^, _, and ` before the lower-case alphabet begins with 'a' at hexadecimal 61, ending in 'z' at hexadecimal 7A.
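
You do not have to take my word for those values. A short sketch in Python (ord is a built-in that returns the code of a character; the function name ascii_code is made up) prints the codes at the edges of each group:

```python
def ascii_code(character):
    """Return the (decimal, hexadecimal) code of one character."""
    code = ord(character)
    return code, format(code, "02X")


# First and last digit, upper-case letter and lower-case letter,
# plus the first and last of the symbols in between.
for character in "09AZ[`az":
    print(repr(character), ascii_code(character))
```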

There are a lot of ASCII tables on the Internet. Just ask Mr. Google about ASCII.

So, we have two different uses for our bytes, numerical values and character representation. Of course there are more. They can represent pixels in a picture, where their value tells what color a pixel should have, etcetera.

But the most important meaning is of course machine-code. The instructions our microprocessor can understand. The code that makes up our programs, operating system, BIOS and so on.

Different mathematical bases are often indicated by a digit in subscript, as mentioned above:

265₁₀ is a base-10 number, the type we are used to.
109₁₆ is a hexadecimal number. Its base-10 value is 265.
411₈ is an octal number. Its base-10 value is 265.
100001001₂ is a binary number. Its base-10 value is also 265.
109₁₀, or 6D₁₆, interpreted as an ASCII code is the character 'm'.
The same number can also represent a color in an old 8-bit VGA format.

So, it is all about how the ones and zeros in memory are interpreted, how they are intended to be used.
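
To see that it really is the same value in different clothes, here is a small Python sketch. The function name spellings is hypothetical; format and chr are Python built-ins:

```python
def spellings(value):
    """Write one and the same value in four different number bases."""
    return {
        "decimal": format(value, "d"),
        "hexadecimal": format(value, "X"),
        "octal": format(value, "o"),
        "binary": format(value, "b"),
    }


print(spellings(265))

# One byte, three interpretations: a number, two hexadecimal digits, a character.
byte = 109
print(byte, format(byte, "02X"), chr(byte))
```

The last line shows the byte value 109 as a decimal number, as hexadecimal 6D and as the character it stands for in ASCII.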

When typing pure text into a text editor, which you will be doing when programming, you cannot type subscripts, since a text file does not contain any formatting. Therefore there are other ways of indicating number bases, like the prefix 0x for hexadecimal numbers and no prefix for 'normal' base-10 values. Thus, rather than typing A7₁₆, which you cannot, you will be typing 0xA7.
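
As an example of such prefixes, here is a sketch in Python. Note that the 0o prefix for octal and 0b for binary are how Python writes them; other languages may differ, so take those two as an assumption. The function name parse_number is made up:

```python
def parse_number(text):
    """Read a number the way it would be typed in source code."""
    # Base 0 tells int() to look at the prefix (0x, 0o, 0b or none) to pick the base.
    return int(text, 0)


for text in ("265", "0x109", "0o411", "0b100001001", "0xA7"):
    print(text, "=", parse_number(text))
```

In the program text itself you can type the prefixed number directly, e.g. value = 0xA7.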

More bytes make better computers

The microprocessors do not only have connections for sending and receiving data. They also need to communicate with their work-memory, the RAM (Random Access Memory), where program and data are temporarily stored for the microprocessor to use. This is necessary because the microprocessor cannot use the disk drive directly. Then how does it use the memory?

Just think about it, the RAM holds a lot of bytes. We need a way to 'point' to the byte we want the microprocessor to read or write. Therefore we have another grid of transistors that can combine a number of 1's and 0's to electrically point at one row in the memory. A row can be one byte, two, four or even 8 bytes, as I will describe here, which is why I say row rather than byte. Those 1's and 0's make up an address into the memory.

Already the 8080, mentioned at the beginning of this page, had 16 connections that were intended to address the memory. So, the so-called 'memory bus' (usually known as the address bus) was 16 bits, or 2 bytes, 'wide'. That way it could address 65536 rows of memory. The engineers who designed the 8080 were well aware that one byte was not enough, since it would limit the memory size to only 256 rows.
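
The numbers 256 and 65536 come from the same doubling as before. A minimal Python sketch, with a made-up function address_range, shows how many rows a given number of address wires can point at and what the highest address looks like in hexadecimal:

```python
def address_range(address_bits):
    """Return (number of rows, highest address in hexadecimal) for a bus width."""
    rows = 2 ** address_bits
    # Addresses start at 0, so the highest one is rows - 1 (all wires at 1).
    return rows, format(rows - 1, "X")


for address_bits in (8, 16):
    rows, highest = address_range(address_bits)
    print(address_bits, "address bits:", rows, "rows, highest address 0x" + highest)
```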

Modern computers do not only have a 32 or 64 bits wide memory bus, but also a 32 or 64 bits wide data bus, thus reading and writing 4 or 8 bytes in a single instruction. They also have more machine-code instructions.

So, the difference between a 32-bit microprocessor and a 64-bit ditto is the number of bytes read and written in one instruction and the amount of RAM it can address. There is more, but we won't go into all the details. These were the most important ones.
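
To put some numbers on that, here is a rough Python sketch. It assumes, as this page does, that the data bus and the address bus are both exactly as wide as the processor's bit count, which is a simplification. The function name bus_summary is my own:

```python
def bus_summary(bits):
    """Simplified model: data bus and address bus both have the given width."""
    return {
        "bytes_per_transfer": bits // 8,  # 8 bits make one byte
        "addressable_rows": 2 ** bits,    # every address wire doubles the count
    }


for bits in (32, 64):
    print(bits, "bits:", bus_summary(bits))
```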

As John Cleese used to say, 'And now for something completely different'. If you have installed Visual Studio for the Desktop, go ahead and try the first lesson on Console programming.