Lines in an email message

According to RFC 2822, an email message is defined as follows:

   A message that is conformant with this standard is comprised of
characters with values in the range 1 through 127 and interpreted as
US-ASCII characters [ASCII].  For brevity, this document sometimes
refers to this range of characters as simply “US-ASCII characters”.

US-ASCII covers codes 0 through 127, so the set of characters allowed in an email message is not quite the full US-ASCII set: code 0 (NUL) is excluded. Do not assume that all US-ASCII characters are printable. Codes 0 through 31 and code 127 are control characters, and code 32 is the space; only codes 33 through 126 are printable. Also note that US-ASCII does not cover every value a byte can hold: values 128 through 255 are not US-ASCII codes, even though many single-byte character sets assign printable characters to them. Some parts of the message restrict the allowed characters further. For example, a header field name may use only printable US-ASCII characters (i.e., characters with values between 33 and 126, inclusive), except the colon, which separates the name from the value.
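The two character rules above can be written down as a minimal Python sketch. The function names `is_rfc2822_text` and `is_valid_field_name` are made up for illustration and are not part of any library.

```python
def is_rfc2822_text(data: bytes) -> bool:
    """True if every byte is in the range 1..127 (US-ASCII without NUL)."""
    return all(1 <= b <= 127 for b in data)


def is_valid_field_name(name: str) -> bool:
    """True if the header field name uses only printable US-ASCII (33..126)
    and contains no colon, which ends the name in a header line."""
    return len(name) > 0 and all(33 <= ord(c) <= 126 and c != ":" for c in name)


print(is_rfc2822_text(b"Hello\r\n"))        # control characters such as CR and LF are allowed
print(is_rfc2822_text(b"\x00"))             # NUL is not
print(is_valid_field_name("Subject"))
print(is_valid_field_name("Bad Name"))      # contains a space (code 32)
```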

RFC 2822 also says that a message is composed of lines, each of which MUST be no longer than 998 characters and SHOULD be no longer than 78 characters, not counting the terminating CRLF. This limit applies to both the headers and the body of the message. If a header exceeds the limit, such as a long subject, it should be split into multiple lines by inserting a CRLF before a space or tab; this is called folding. The receiving end needs to unfold such lines into a single line before further processing. Unfolding is simple: remove every CRLF that is immediately followed by a space or tab (the whitespace itself is kept). Unfolding also explains why a header must never start with a space or tab: such a line would be treated as a continuation of the previous header.
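Folding and unfolding can be illustrated with a small Python sketch. The helpers `fold` and `unfold` are hypothetical names, and the folding strategy here (greedy, breaking only before a space) is a simplifying assumption; a real mail library makes more careful choices about where to break.

```python
import re


def unfold(header: str) -> str:
    """Remove every CRLF that is immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", header)


def fold(header: str, limit: int = 78) -> str:
    """Greedily insert CRLF before spaces so that lines stay within `limit`.

    A single word longer than `limit` is left unbroken in this sketch.
    """
    words = header.split(" ")
    lines = []
    current = words[0]
    for word in words[1:]:
        if len(current) + 1 + len(word) > limit:
            lines.append(current)
            current = " " + word      # the continuation line starts with the space
        else:
            current += " " + word
    lines.append(current)
    return "\r\n".join(lines)


subject = "Subject: " + " ".join(["word"] * 40)
folded = fold(subject)
print(folded)
print(unfold(folded) == subject)
```

Because the CRLF is inserted in front of an existing space and unfolding removes only the CRLF, the round trip gives back the original header unchanged.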

The message body is subject to the same line length limit. It is unreasonable, however, to expect the person composing an email to count characters while writing. The sending email client should do this job: it splits long lines into lines shorter than the limit by adding CRLFs. Ideally, the receiving email client would remove the added CRLFs before presenting the email to the user. This raises a problem: how does the receiving client know which CRLFs to remove? It should not remove every CRLF, because some were typed by the composer and must be kept as is; it should remove only those added by the sending client to satisfy the limit. Here the wording of the standard deserves a careful reading: “Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters, excluding the CRLF.” The phrase “excluding the CRLF” only means that the two line-terminator characters are not counted toward the limit. It does not mark a CRLF as inserted by the client, and a line that is exactly 78 characters long may just as well have been ended by the composer. RFC 2822 by itself therefore gives the receiving client no reliable way to tell the two kinds of CRLF apart; that information has to be carried by the content transfer encoding.
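The two limits in the quoted sentence can be checked mechanically. The following Python sketch uses a made-up helper, `check_line_lengths`, and assumes the message is already a string with CRLF line endings; the length of each line is measured without its CRLF, as the standard requires.

```python
MUST_LIMIT = 998    # hard limit per line, excluding the CRLF
SHOULD_LIMIT = 78   # recommended limit per line, excluding the CRLF


def check_line_lengths(message: str) -> list[tuple[int, str]]:
    """Return (line number, severity) for every line that exceeds a limit."""
    problems = []
    for number, line in enumerate(message.split("\r\n"), start=1):
        if len(line) > MUST_LIMIT:
            problems.append((number, "error"))
        elif len(line) > SHOULD_LIMIT:
            problems.append((number, "warning"))
    return problems


message = "short line\r\n" + "a" * 79 + "\r\n" + "b" * 999
print(check_line_lengths(message))
```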

Things become more complicated with the quoted-printable content transfer encoding. A content transfer encoding is the encoding method applied to the message body. Quoted-printable does not allow an encoded line to end with a literal space or tab, so the email client cannot simply insert a CRLF after a space. The line length limit is also stricter: encoded lines must be no longer than 76 characters, not counting the CRLF. The way to break a long line is a soft line break: an “=” as the last character of the line (at most the 76th character), followed by a CRLF. This guarantees that no line ends with a space and that every line satisfies the length limit. The receiving email client removes every soft line break (the “=” at the end of a line together with the CRLF that follows it) and so rejoins the original line. If the composer types an “=” in the content, it must be encoded as “=3D”. The three characters of an “=XX” sequence must not be separated by a line break, so if they would cross the length limit at the end of a line, the whole sequence should be moved to the beginning of the next line.
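The soft line break rules can be sketched in Python as follows. This is a simplified illustration, not a full quoted-printable codec: `qp_encode_line` and `qp_decode_lines` are hypothetical helpers that handle one logical line (no CRLF inside), and the encoder always reserves one column for the “=”, so it is slightly more conservative than the encoding requires. Python's standard `quopri` module provides a complete implementation.

```python
import re

MAX_QP_LINE = 76  # maximum encoded line length, counting the soft-break "="


def qp_encode_line(text: bytes) -> list[bytes]:
    """Encode one logical line into quoted-printable physical lines."""
    tokens = []
    for i, byte in enumerate(text):
        is_last = i == len(text) - 1
        needs_escape = (
            byte == ord("=")
            or (byte < 32 and byte != 9)          # control characters except tab
            or byte > 126                         # DEL and non-ASCII bytes
            or (byte in (9, 32) and is_last)      # no trailing space or tab
        )
        tokens.append(b"=%02X" % byte if needs_escape else bytes([byte]))

    lines, current = [], b""
    for token in tokens:
        # Tokens are atomic, so an "=XX" triplet is never split across lines.
        if len(current) + len(token) > MAX_QP_LINE - 1:
            lines.append(current + b"=")          # soft line break
            current = b""
        current += token
    lines.append(current)
    return lines


def qp_decode_lines(lines: list[bytes]) -> bytes:
    """Remove soft line breaks, then decode every "=XX" sequence."""
    joined = b"".join(l[:-1] if l.endswith(b"=") else l for l in lines)
    return re.sub(rb"=([0-9A-F]{2})", lambda m: bytes([int(m.group(1), 16)]), joined)


original = b"1+1=2 " + b"x" * 100 + b" "
encoded = qp_encode_line(original)
for line in encoded:
    print(len(line), line)
print(qp_decode_lines(encoded) == original)
```

On the wire, each element of the returned list would be followed by a CRLF; the decoder drops the “=” of a soft break and joins the pieces back into the original line.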
