133

Yes, yes, I am aware that '\n' writes a newline on UNIX, while on Windows there is the two-character sequence '\r\n'. All this is very nice in theory, but my question is: why? Why does Windows need the extra carriage return character? If UNIX can do it with '\n' alone, why does it take Windows two characters?

I am reading David Beazley's Python book and he says:

For example, on Windows, writing the character '\n' actually outputs the two- character sequence '\r\n' (and when reading the file back, '\r\n' is translated back into a single '\n' character).
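
Python makes this translation easy to observe on any platform: the `newline` argument to `open()` (used below as a stand-in for what Windows text mode does implicitly) forces the Windows convention on write, and the default text mode translates it back on read.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Force the Windows convention explicitly, regardless of platform:
# every '\n' we write is translated to '\r\n' on disk.
with open(path, 'w', newline='\r\n') as f:
    f.write('hello\nworld\n')

# The raw bytes on disk contain the two-character sequence:
with open(path, 'rb') as f:
    assert f.read() == b'hello\r\nworld\r\n'

# Default text mode ("universal newlines") translates '\r\n' back to '\n':
with open(path) as f:
    assert f.read() == 'hello\nworld\n'
```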

Why the extra effort?

I will be honest. I have known the difference for a long time but have never bothered to ask WHY. I hope that is answered today.

Thanks for your time.

sukhbir
  • 1,499

8 Answers

166

Backward compatibility.

Windows is backward compatible with MS-DOS (aggressively so, even) and MS-DOS used the CR-LF convention because MS-DOS was compatible with CP/M-80 (somewhat by accident) which used the CR-LF convention because that was how you drove a printer (because printers were originally computer controlled typewriters).

Printers have a separate command to move the paper up one line to a new line, and a separate command for returning the carriage (where the paper was mounted) back to the left margin.

That's why. And yes, it is an annoyance, but it is part of the package deal that allowed MS-DOS to win over CP/M, Windows 95 to win over all the other GUIs on top of DOS, and Windows XP to take over from Windows 98.

(Note: modern laser printers still have these commands because they, too, are backwards compatible with earlier printers; HP in particular does this well.)

For those unfamiliar with typewriters, here is a video showing how typing was done: http://www.youtube.com/watch?v=LJvGiU_UyEQ. Notice that the paper is first moved up and then the carriage is returned, even though it happens in a single movement. The ding notified the typist that the end of the line was near, so they could prepare to return the carriage.

31

As far as I'm aware this harks back to the days of typewriters.

\r is carriage return, which moves the typing position back to the left of the page (or the right, if that is your culture's writing direction).

\n is newline (line feed), which moves the paper up a line.

Doing only one of these on a typewriter would put you in the wrong place to start writing a new line of text.

When computers came about I guess some people kept the old model, but others realised that it wasn't necessary and encapsulated a full newline as one character.

Matt Ellen
  • 3,368
15

I don't know if this is common knowledge, but it should be noted that CR is still understood by modern terminal emulators:

$ printf "hey world\rsup\n"
sup world

It's handy for progress indicators, e.g.

for i in {1..100}
do
    printf "\rLoading... %d%%" $i
    sleep 0.01
done
echo
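
The same trick works from any language. Here is a hypothetical Python version of the loop above (the function name and timings are just for illustration): each write starts with '\r' so it overprints the previous one.

```python
import sys
import time

def show_progress(steps=100, delay=0.01):
    """Overprint a single progress line: '\\r' returns to column 0 each pass."""
    for i in range(1, steps + 1):
        sys.stdout.write('\rLoading... %d%%' % i)
        sys.stdout.flush()          # force the partial line out immediately
        time.sleep(delay)
    sys.stdout.write('\n')          # finish with a real newline

show_progress()
```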
8

History of the Newline Character (Wikipedia):

ASCII was developed simultaneously by the ISO and the ASA, the predecessor organization to ANSI. During the period of 1963–1968, the ISO draft standards supported the use of either CR+LF or LF alone as a newline, while the ASA drafts supported only CR+LF.

The sequence CR+LF was in common use on many early computer systems that had adopted teletype machines, typically an ASR33, as a console device, because this sequence was required to position those printers at the start of a new line. On these systems, text was often routinely composed to be compatible with these printers, since the concept of device drivers hiding such hardware details from the application was not yet well developed; applications had to talk directly to the teletype machine and follow its conventions.

The separation of the two functions concealed the fact that the print head could not return from the far right to the beginning of the next line in one-character time. That is why the sequence was always sent with the CR first. In fact, it was often necessary to send extra characters (extraneous CRs or NULs, which are ignored) to give the print head time to move to the left margin.

Even after teletypes were replaced by computer terminals with higher baud rates, many operating systems still supported automatic sending of these fill characters, for compatibility with cheaper terminals that required multiple character times to scroll the display.

MS-DOS (1981) adopted CP/M's CR+LF; CP/M's use of CR+LF made sense for using computer terminals via serial lines. This convention was inherited by Microsoft's later Windows operating system.

The Multics operating system began development in 1964 and used LF alone as its newline. Unix followed the Multics practice, and later systems followed Unix.

Craige
  • 3,781
7

Historically, line feed meant that the platen - the roller on which you type - rotated one line, causing text to appear on the next line... but in the same column.

Carriage return meant "return the bit with which you type to the beginning of the line".

Windows uses CR+LF because MS-DOS did, because CP/M did, because it made sense for serial lines.

Unix copied its \n convention because Multics did.

I suspect if you dig far enough back, you'll find a political disagreement between implementors!

(You left out the extra fun bit, where Mac convention is (or used to be) to just use CR to separate lines. And now Unicode also has its own line separator, U+2028!)

Frank Shearar
  • 16,751
7

What is it with people asking "why can Unix do \n and not Windows"? It's such a strange question.

  1. The OS has almost nothing to do with it. It's more a matter of how apps, libraries, protocols and file formats deal with things. Other than where the OS reads/writes text-based configuration or command line commands, it makes no sense to fault the OS.
  2. Most Windows apps can read both \n and \r\n just fine. They also output \r\n so that everyone's happy. A program doesn't simply "do" either \n or \r\n -- it accepts one, the other, or both, and outputs one, the other, or both.
  3. As a programmer this should really almost never bother you. Practically every language/platform has facilities to write the correct end-line and read most robustly. The only time I've had to deal with the problem was when I wrote an HTTP server -- and it was because a certain browser (hint: the next most popular browser after IE) was doing \n instead of the correct \r\n.
  4. A much more pertinent question is, why do so many modern Unix apps output only \n fully knowing that there are some protocols and programs that don't like it?
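
Point 2 is easy to check from Python: str.splitlines, for instance, accepts either convention (or a mix) in the same input.

```python
# A mixed input: Windows-style and Unix-style endings in one string.
data = 'one\r\ntwo\nthree\r\n'

# splitlines() understands both, so the line content comes out the same:
assert data.splitlines() == ['one', 'two', 'three']

# Joining back with '\n' normalises everything to the Unix convention:
assert '\n'.join(data.splitlines()) == 'one\ntwo\nthree'
```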
Rei Miyasaka
  • 4,551
5

Here is an answer from the best source, Microsoft: Why is the line terminator CR+LF?

This protocol dates back to the days of teletypewriters. CR stands for "carriage return" - the CR control character returned the print head ("carriage") to column 0 without advancing the paper. LF stands for "linefeed" - the LF control character advanced the paper one line without moving the print head. So if you wanted to return the print head to column zero (ready to print the next line) and advance the paper (so it prints on fresh paper), you need both CR and LF.

If you go to the various internet protocol documents, such as RFC 0821 (SMTP), RFC 1939 (POP), RFC 2060 (IMAP), or RFC 2616 (HTTP), you'll see that they all specify CR+LF as the line termination sequence. So the real question is not "Why do CP/M, MS-DOS, and Win32 use CR+LF as the line terminator?" but rather "Why did other people choose to differ from these standards documents and use some other line terminator?"

Unix adopted plain LF as the line termination sequence. If you look at the stty options, you'll see that the onlcr option specifies whether a LF should be changed into CR+LF. If you get this setting wrong, you get stairstep text, where

each
    line
        begins

where the previous line left off. So even unix, when left in raw mode, requires CR+LF to terminate lines. The implicit CR before LF is a unix invention, probably as an economy, since it saves one byte per line.
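
The stairstep effect is easy to simulate. Here is a toy "teletype" (entirely hypothetical code, for illustration) that honours CR and LF as separate motions, exactly as described above: send only LF and you get stairsteps; send CR+LF and the lines align.

```python
def render(stream):
    """Render a character stream the way a teletype would."""
    lines = ['']
    col = 0
    for ch in stream:
        if ch == '\n':              # LF: paper up one line, carriage stays put
            lines.append(' ' * col)
        elif ch == '\r':            # CR: carriage back to column 0
            col = 0
        else:                       # printable: strike at the current column
            row = lines[-1]
            lines[-1] = row[:col] + ch + row[col + 1:]
            col += 1
    return '\n'.join(lines)

print(render('each\nline\nbegins'))      # LF only: stairsteps
print(render('each\r\nline\r\nbegins'))  # CR+LF: aligned
```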

The unix ancestry of the C language carried this convention into the C language standard, which requires only "\n" (which encodes LF) to terminate lines, putting the burden on the runtime libraries to convert raw file data into logical lines.

The C language also introduced the term "newline" to express the concept of "generic line terminator". I'm told that the ASCII committee changed the name of character 0x0A to "newline" around 1996, so the confusion level has been raised even higher.

4

The reason the conventions hold on their various systems (\n on unix type systems, \r\n on Windows, etc) is that once you've picked a convention you CAN'T change it without breaking a bunch of people's files. And that's generally frowned upon.

Unix-type systems were developed (very early days) using various models of teletype, and at some point someone decided the equipment should carriage return when it did a line feed.

Windows came from DOS, so for Windows the question really is: why did DOS use this CR/LF sequence? I'm guessing it has something to do with CP/M, where DOS has some of its roots. Again, specific models of teletype may have played a role.

Michael Kohne
  • 10,146