Does a byte contain 8 bits, or 9?

52

10

I read in this assembly programming tutorial that 8 bits are used for data while 1 bit is for parity, which is then used for detecting parity errors (caused by hardware faults or electrical disturbance).

Is this true?

Jason

Posted 2016-12-20T16:44:03.907

Reputation: 381

5

See http://cs.stackexchange.com/a/19851/584 for a discussion of what a byte can be.

– AProgrammer – 2016-12-20T18:27:35.177

60That article is filled with nonsense and you should ignore it. – David Schwartz – 2016-12-20T20:51:20.977

12If you want to be pedantic, just call them "octets". That article is either written with a very specific processor in mind (one that must keep parity bits in ROM for some reason...) or is just wack. Microchip PICs, for example, use a 14-bit word length. The entire program memory is organized in a N x 14 bit array. – Nick T – 2016-12-20T23:49:26.653

13@NickT: they're not the same thing, though. An octet is always 8 bits, a byte may be anything. – Jörg W Mittag – 2016-12-20T23:49:50.037

Networking protocols, in particular, use the term "octet" to unambiguously refer to 8-bit quantities. – chepner – 2016-12-21T14:44:49.743

4The article may have been referencing the memory correction mechanisms used in some early IBM PCs, but stating that "byte is 8 bits data + 1 bit parity" is utter nonsense. As an example, CD-ROMs usually use error correction mechanisms that are much more greedy - a typical audio CD will use 8 bytes per 24 bytes of audio data. But the most important part is that you don't care. At all. It's exclusive to the actual memory storage mechanism - the CPU doesn't care, your code doesn't care. – Luaan – 2016-12-22T20:16:20.687

Comments are not for extended discussion; this conversation has been moved to chat.

– D.W. – 2016-12-24T16:23:16.063

The PDP-9/-15 architecture by Digital Equipment Corporation (DEC) in the 1960s and early 1970s had a 9-bit byte (and thus an 18-bit word). This allowed for an 18-bit address space, a whopping 262,144 bytes instead of the mere 65,536 allowed by more standard architectures. – Pieter Geerkens – 2016-12-25T19:32:52.240

Answers

75

A byte of data is eight bits. There may be more bits per byte of data used at the OS or even the hardware level for error checking (a parity bit, or an even more advanced error-detection scheme), but the data is eight bits, and any parity bit is usually invisible to the software. A byte has been standardized to mean 'eight bits of data'. The text isn't wrong in saying there may be more bits dedicated to storing a byte of data than the eight bits of data themselves, but those extra bits aren't typically considered part of the byte per se; the text itself points to this fact.

You can see this in the following section of the tutorial:

Doubleword: a 4-byte (32 bit) data item

4 × 8 = 32; the system might physically store it in 36 bits, but for all intents and purposes it's only 32 bits.
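As a sanity check of that software-visible arithmetic, here is a small Python sketch (Python only ever exposes the 8-bit data view, regardless of what the hardware stores alongside):

```python
import struct

# A doubleword (32-bit data item) as software sees it:
# exactly 4 bytes of 8 bits each.
value = 0xDEADBEEF
packed = struct.pack("<I", value)  # little-endian unsigned 32-bit integer

assert len(packed) == 4                 # 4 bytes * 8 bits = 32 bits of data
assert all(b <= 0xFF for b in packed)   # each byte holds at most 8 bits
assert struct.unpack("<I", packed)[0] == value
```

Whatever parity or ECC bits the memory system adds underneath, none of them are visible at this level.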

JustAnotherSoul

Posted 2016-12-20T16:44:03.907

Reputation: 1 071

5Well, if the hardware implements error detection it would probably do so with bigger chunks of memory than a byte, like with 512-byte sectors or so... in this way you can reduce the overhead of extra memory needed. Just to clarify: even with error correction the hardware still uses 8-bit per byte plus some bits for each "chunk" of data, which is probably much bigger than a single byte. – Bakuriu – 2016-12-20T19:45:36.923

1IIRC early HP-3000's from Hewlett-Packard used a five-bit parity scheme which could correct single-bit errors and detect multiple-bit errors. Frankly I thought that was overkill, but as a lowly college computer operator nobody asked me. :-) – Bob Jarvis – 2016-12-21T04:05:58.283

8

Note that there are systems with software-visible non-8-bit bytes. See the Stack Overflow question What platforms have something other than 8-bit char?

– Ruslan – 2016-12-21T19:32:34.213

2Yes, they do indeed exist. Though that particular link is talking about non-8-bit chars. As it were: byte used to simply refer to the number of bits that a given system took to store a 'char', which was as low as six bits. But IIRC it is standardized in the IEC-80000 specification that a byte is 8-bits. As you move away from mainstream systems, you do find oddities of course, and standards aren't laws. – JustAnotherSoul – 2016-12-22T16:28:24.667

3@JustAnotherSoul: And there are competing standards, that define byte as "at least 8 bit" or in other ways. It is interesting to see how decades later the definition of byte changes in the minds of people. Back in the time of much more architectural heterogeneity byte was simply the smallest addressable unit of your architecture (look at various PDPs for examples). This is also the reason that in the advent of the internet the term octet was used to describe the data on wire, as byte was not a universal word for a chunk of 8 bit data. – PlasmaHH – 2016-12-23T09:10:31.180

1@JustAnotherSoul note that char in C (which is what the link is about) is exactly the smallest addressable unit of memory. It's just called char, but the C Standard makes it synonymous to byte. – Ruslan – 2016-12-24T18:58:10.593

I just wanted to comment that your last sentence "it might actually take up 36 bits on the system" is worded confusingly. On disk, a file's size is rounded up to a multiple of the allocation unit (4096 bytes by default), not padded per byte. I figured I would mention this as it might confuse someone. For example, I currently have a 100-byte file open that takes up 4096 bytes on disk. – XaolingBao – 2016-12-25T18:54:04.150

42

Traditionally, a byte can be any size, and is just the smallest addressable unit of memory. These days, 8 bit bytes have pretty much been standardized for software. As JustAnotherSoul said, the hardware may store more bits than the 8 bits of data.

If you're working on programmable logic devices, like FPGAs, you might see that their internal memory is often addressable as 9-bit chunks, and as the HDL author, you could use that 9th bit for error checking or just to store larger amounts of data per "byte". When buying memory chips for custom hardware, you generally have the choice of 8 or 9 bit addressable units (or 16/18, 32/36, etc), and then it is up to you whether you have 9 bit "bytes" and what you do with that 9th bit if you choose to have it.

Extrarius

Posted 2016-12-20T16:44:03.907

Reputation: 641

10Generally when there's a group of data that is logically a single unit but contains more/less than 8 bits, it is called a "word." For example, some processors use a 40-bit instruction word. – Devsman – 2016-12-20T19:36:40.007

3+1. Incidentally, there have been architectures with both "bit pointers" and "byte pointers". In such architectures, a byte is technically not "the smallest addressable unit of memory" (since you can address each bit independently), though it's hard to succinctly say what it is. I guess it's an "I know it when I see it" sort of thing. :-P – ruakh – 2016-12-20T19:56:20.750

16"Octet" was the traditionally used word to mean "I'd call it a byte, but I really do mean exactly 8 bits" for various communication protocols between systems that may have different byte sizes. But these days, using byte to mean anything but 8 bits is anachronistic. – wnoise – 2016-12-21T01:12:24.910

@Devsman Not necessarily. x86 chips have 32 bit words and 8 bit bytes, for example. A byte is the smallest addressable size. The word is a bit more vaguely defined, but tends to be the size that is most convenient to work with; i.e. the expected operand length of most instructions. – Ray – 2016-12-22T01:09:35.600

This should be marked as the correct answer, it is more correct. – awiebe – 2016-12-30T09:20:42.990

29

That text is extremely poorly worded. He is almost certainly talking about ECC (error-correcting code) RAM.

ECC RAM commonly stores 8 bits' worth of information using 9 bits. The extra bit per byte is used to store error-correction codes.

[Image: ECC vs non-ECC RAM. In both cases, every byte is spread across every chip. Image courtesy of Puget Systems]

This is all completely invisible to users of the hardware. In both cases, software using this RAM sees 8 bits per byte.


As an aside: error-correcting codes in RAM typically aren't actually 1 bit per byte; they're instead 8 bits per 8 bytes. This has the same space overhead, but has some additional advantages. See SECDED for more info.

BlueRaja - Danny Pflughoeft

Posted 2016-12-20T16:44:03.907

Reputation: 677

12Parity RAM and ECC RAM are different things. Parity RAM stores one additional bit per error domain, can detect all single-bit errors and no double-bit errors, and can fix nothing. ECC stores a number of additional bits per error domain, can detect and fix all single-bit errors, can detect but not fix all double-bit errors, and can catch some larger errors. Parity RAM is rare these days, having been almost entirely replaced by ECC RAM. – Mark – 2016-12-20T21:54:13.710

1@Mark: I hinted at that in my last paragraph, there are more details in the link. Parity RAM is basically non-existent these days because a (72,64) error-correction code has the same overhead as a (9,8) parity code. – BlueRaja - Danny Pflughoeft – 2016-12-20T22:20:28.887

6While you hint at it, you also state things that make it imprecise/confusing. ECC RAM does not "store 8-bits worth of information using 9-bits". Stating that implies you can do ECC for 8 bits using 9 bits, which is not possible. For 8 bits of discrete information 1 extra bit is enough to detect, not correct, single bit errors. ECCs use larger numbers of bits, or bytes, to contain data sufficient to correct errors for groups of data, usually larger than a single byte. While this might average an extra bit per 8 bits, it can not be broken down to associating only 1 bit with each 8 bits. – Makyen – 2016-12-21T19:16:34.497

There is a 36-bit scheme (32 bit word + 4 bit ECC) which permits single bit error correction and two bit error detection. While you can arithmetically divide it down to 8 data bits + 1 ECC bit, it cannot/does not work that way. The full 4 bits of ECC are required, which covers 32 data bits. – Zenilogix – 2016-12-23T16:56:12.160

@Zenilogix and others who repeated the same thing: I understand very well how ECC works, and nothing I said was incorrect. I never claimed 8-bit ECC can be done with 9 bits, I said ECC RAM uses 9-bits-per-byte of storage. How ECC works is completely out-of-scope for this question, which is why I left the details as an aside with a link. Please stop all the pedantic comments. – BlueRaja - Danny Pflughoeft – 2016-12-23T17:12:02.323

15

Generally speaking, the short answer is that a byte is 8 bits. This oversimplifies the matter (sometimes even to the point of inaccuracy), but is the definition most people (including a large number of programmers) are familiar with, and the definition nearly everyone defaults to (regardless of how many differently-sized bytes they've had to work with).

More specifically, a byte is the smallest addressable memory unit for the given architecture, and is generally large enough to hold a single text character. On most modern architectures, a byte is defined as 8 bits; ISO/IEC 80000-13 also specifies that a byte is 8 bits, as does popular consensus (meaning that if you're talking about, say, 9-bit bytes, you're going to run into a lot of trouble unless you explicitly state that you don't mean normal bytes).

However, there are exceptions to this rule, most notably older architectures built around 36-bit words and 9-bit bytes.

So in most cases a byte will be 8 bits. If not, it's probably 9 bits, and may or may not be part of a 36-bit word.

Justin Time

Posted 2016-12-20T16:44:03.907

Reputation: 251

7

A byte is usually defined as the smallest individually addressable unit of memory space. It can be any size: there have been architectures with byte sizes anywhere between 6 and 9 bits, maybe even bigger. There are also architectures where the only addressable unit is the size of the bus; on such architectures, we can either say that they simply have no byte, or that the byte is the same size as the word (in one particular case I know of, that would be 32 bits). Either way, it is definitely not 8 bits. Likewise, there are bit-addressable architectures; on those, we could again argue that bytes simply don't exist, or that bytes are 1 bit. Either way is a sensible definition, but 8 bits is definitely wrong.

On many mainstream general purpose architectures, one byte contains 8 bit. However, that is not guaranteed. The further away you stray from the mainstream and/or from general purpose CPUs, the more likely you will encounter non-8-bit-bytes. This goes so far that some highly-portable software even makes the size configurable. E.g. older versions of GCC contained a macro called BITS_PER_BYTE (or something like that), which configured the size of a byte for a particular architecture. I believe some older versions of NetBSD could be made to run on non-8-bit-per-byte architectures.

If you really want to stress that you are talking about an exact amount of 8 bits rather than the smallest addressable amount of memory, however large that may be, you can use the term octet, which is for example used in many newer RFCs.
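To illustrate the fixed width of an octet, here is a Python sketch (Python's bytes type models octets exactly: each element is 8 bits, holding values 0 to 255):

```python
# An octet is always 8 bits: values 0..255 fit, anything larger does not.
octet = (255).to_bytes(1, "big")
assert octet == b"\xff"

try:
    (256).to_bytes(1, "big")   # this value would need a 9th bit
    fits = True
except OverflowError:
    fits = False
assert fits is False
```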

Jörg W Mittag

Posted 2016-12-20T16:44:03.907

Reputation: 1 263

2Standard C and C++ have a predefined macro CHAR_BIT (found in limits.h), I am not aware of BITS_PER_BYTE – njuffa – 2016-12-21T21:30:04.153

7

Note that the term byte is not well-defined without context. As far as computer architectures are concerned, you can assume that a byte is 8 bits, at least for modern architectures. This was largely standardised by programming languages such as C, which required bytes to have at least 8 bits but didn't provide any guarantees for larger bytes, making 8 bits per byte the only safe assumption.

There are computers with addressable units larger than 8 bits (usually 16 or 32), but those units are usually called machine words, not bytes. For example, a DSP with 32K 32-bit RAM words would be advertised as having 128 KB of RAM, not 32 KB.

Things are not so well-defined when it comes to communication standards. ASCII is still widely used, and it has 7-bit bytes (which nicely fit in 8-bit bytes on computers). UART transceivers are still produced to have configurable byte size (usually, you get to pick at least between 6, 7 and 8 bits per byte, but 5 and 9 are not unheard of).

Dmitry Grigoryev

Posted 2016-12-20T16:44:03.907

Reputation: 201

2

When I started programming in 1960, we had 48-bit words with 6-bit bytes - they were not called by that name then, they were called characters. Then I worked on the Golem computer with 75-bit words and 15-bit bytes. Later, 6-bit bytes were common, but nowadays a byte is commonly equivalent to an octet, i.e. 8 bits of data. Some hardware had additional bits for error detection and possibly for error correction, but these were not accessible by the software.

Jonathan Rosenne

Posted 2016-12-20T16:44:03.907

Reputation: 121

2

A byte is 8 bits.

In the distant past, there were different definitions of a memory word and of a byte. The suggestion that this ambiguity is widespread or is prevalent in today's life is false.

Since at least the late 1970s, a byte has been 8 bits. The mass populace of home computers and PCs have all unambiguously used a byte as an 8-bit value in their documentation, as have all of the data sheets and documentation for floppy disk drives, hard disk drives and PROM/EPROM/EEPROM/Flash EPROM/SRAM/SDRAM memory chips that I have read in that time period. (And I have personally read a great deal of them right across that time period.) Ethernet and a couple of other communications protocols stand out to me as unusual in talking about octets.

The ambiguity of the term byte is itself a rare and obscure thing. Very, very few of the population of programmers, design engineers, test engineers, salespeople, service engineers or average punters in the last 30 years or more would think it meant something other than an 8-bit value if they recognised the word at all.

When a byte is handled by hardware, such as when stored in memory chips or communicated along wire, the hardware may add redundant data to the byte. This may later assist in detecting hardware errors so that unreliable data can be recognised and discarded (e.g. parity, checksum, CRC). Or it may allow errors in the data to be corrected and the data recovered (e.g. ECC). Either way, the redundant data will be discarded when the byte has been retrieved or received for further processing. The byte remains the central 8-bit value and the redundant data remains redundant data.
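That add-check-discard life cycle of redundant data can be sketched like this (a Python illustration with hypothetical helper names, using one even-parity bit per byte for simplicity):

```python
def add_parity(data: bytes):
    """What hardware might do on transmit: attach an even-parity bit to each byte."""
    return [(b, bin(b).count("1") % 2) for b in data]

def strip_parity(frames) -> bytes:
    """On receive: check each parity bit, then discard it; only 8-bit bytes remain."""
    out = bytearray()
    for byte, parity in frames:
        if bin(byte).count("1") % 2 != parity:
            raise ValueError("parity error: unreliable data discarded")
        out.append(byte)
    return bytes(out)

frames = add_parity(b"byte")
assert strip_parity(frames) == b"byte"   # the redundant bits never reach software
```

The byte that comes out the far end is the same central 8-bit value that went in; the parity bits exist only in transit.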

TonyM

Posted 2016-12-20T16:44:03.907

Reputation: 121

1

First, the tutorial that you are referencing seems quite outdated, and seems to be aimed at older versions of x86 processors without saying so. Many of the things you read there will not be understood by others; for example, if you claim that a WORD is 2 bytes, people will either not know what you are talking about, or they will realise you were taught from very outdated x86 material and know what to expect.

A byte is whatever number of bits someone decides it should be. It could be 8 bits, or 9 bits, or 16 bits, anything. In 2016, in most cases a byte will be eight bits. To be safe you can use the term octet: an octet is always, always, eight bits.

The real confusion here comes from conflating two questions: 1. How many bits are in a byte? 2. If I wanted to transfer one byte from one place to another, or store a byte, using practical physical means, how would I do that? The second question is usually of little interest to you, unless you work at a company making modems, hard drives, or SSDs. In practice you are interested in the first question, and for the second one you just say "well, someone looks after that".

The parity bit that was mentioned is a primitive mechanism that helps detect that a byte stored in memory has changed by some accident between being written and later read. It's not very good at this: it won't notice when two bits have changed, so such a change is likely to go undetected, and it cannot recover from the problem, because there is no way to tell which of the 8 bits changed, or even whether the parity bit itself changed.

Parity bits are practically never used in that primitive form anymore. Data that is stored permanently is usually protected in more sophisticated ways, for example by adding a 32-bit or longer checksum to a block of 1024 bytes, which takes much less extra space (0.4% in this example instead of 12.5%) and is much less likely to miss an error.
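The double-bit blindness described above is easy to demonstrate. A minimal Python sketch with even parity:

```python
def parity(byte: int) -> int:
    """Even-parity bit: 1 if the byte has an odd number of 1 bits."""
    return bin(byte).count("1") % 2

byte = 0b10110010
stored_parity = parity(byte)

one_flip = byte ^ 0b00000100           # one bit changed: parity no longer matches
assert parity(one_flip) != stored_parity

two_flips = byte ^ 0b00000110          # two bits changed: parity still matches,
assert parity(two_flips) == stored_parity  # so the corruption goes undetected
```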

gnasher729

Posted 2016-12-20T16:44:03.907

Reputation: 7 784

Really outdated: the 16-byte "paragraph" hasn't been a meaningful unit of memory since the switch away from real mode and segmented addressing. – Mark – 2016-12-24T21:21:53.553

1

Despite the really excellent answers given here, I'm surprised that no one has pointed out that parity bits and error-correction bits are by definition 'metadata', and so not part of the byte itself.

A byte has 8 bits!

user34445

Posted 2016-12-20T16:44:03.907

Reputation: 116

Please see my answer of two days ago :-) – TonyM – 2016-12-26T17:35:02.863

1My apologies TonyM. :) – user34445 – 2016-12-26T17:41:58.517

0

In modern usage, a byte is 8 bits, period (although it has historically had other definitions). On the other hand, a data word is whatever the hardware in question handles as an atomic unit - could be 8 bits, 9 bits, 10 bits, 12 bits, 16 bits, 20 bits, 24 bits, 32 bits, etc. Various computer systems over the years have had all sorts of different word sizes.

To implement a memory system or a transmission protocol, it is beneficial to add error detection/correction which involves additional bits. They don't make for a 9-bit byte because, as stated above, a byte is 8 bits.

Various schemes add error detection and/or correction in various ways.

The typical use of parity is to add an extra bit to the transmission word so that the receiver can detect a single-bit error.

A scheme that provides single-bit error correction over a 32-bit data word needs several check bits per word: a Hamming-style code requires at least 6 check bits for 32 data bits (7 for the common SEC-DED variant, giving a 39-bit stored word), and ECC DIMMs use a (72,64) code. Even where this averages out to roughly one extra bit per byte, it cannot/does not work that way: the check bits protect the whole data word jointly and cannot be broken down into one bit per 8-bit byte.
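The way check bits jointly protect a whole word can be sketched at toy scale with a classic Hamming(7,4) code: 3 check bits protecting 4 data bits. This Python sketch is a hand-rolled illustration of the principle, not any real memory controller's scheme:

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (bit positions numbered 1..7)."""
    d0, d1, d2, d3 = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                          # index 0 unused; positions 1..7
    code[3], code[5], code[6], code[7] = d0, d1, d2, d3
    code[1] = code[3] ^ code[5] ^ code[7]   # check bits live at positions 1, 2, 4
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return sum(bit << (pos - 1) for pos, bit in enumerate(code[1:], start=1))

def hamming74_decode(word: int) -> int:
    """Fix at most one flipped bit, then return the 4 data bits."""
    bit = lambda pos: (word >> (pos - 1)) & 1
    syndrome = ((bit(1) ^ bit(3) ^ bit(5) ^ bit(7))
                | (bit(2) ^ bit(3) ^ bit(6) ^ bit(7)) << 1
                | (bit(4) ^ bit(5) ^ bit(6) ^ bit(7)) << 2)
    if syndrome:                            # a non-zero syndrome names the bad position
        word ^= 1 << (syndrome - 1)
    return ((word >> 2) & 1) | ((word >> 3) & 0b1110)

# Any single-bit flip anywhere in the 7-bit word is located and repaired:
codeword = hamming74_encode(0b1011)
for pos in range(7):
    assert hamming74_decode(codeword ^ (1 << pos)) == 0b1011
```

Each check bit covers an overlapping subset of the data bits, which is exactly why ECC bits belong to the word as a whole rather than one bit to each byte.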

Zenilogix

Posted 2016-12-20T16:44:03.907

Reputation: 131