Here are some notes on the parsing semantics:
- Most non-multipart type messages are parsed as a single
message object with a string payload. These objects will return
0 for is_multipart().
- One exception is for message/delivery-status type
messages. Because the body of such messages consists of
blocks of headers, the Parser creates a non-multipart
object containing a non-multipart subobject for each header
block.
- Another exception is for message/* types (more
general than message/delivery-status). These are
typically message/rfc822 messages, represented as a
non-multipart object containing a singleton payload which is
another non-multipart Message instance.
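The first rule can be checked with a short sketch. It assumes the email package that ships with Python 3. This section describes an older release, so is_multipart() now returns False rather than 0. The message text is made up for illustration. The sketch stays with the plain single-part case because the representation of message/* payloads has changed between releases.

```python
from email import message_from_string

# A made-up single-part message; any non-multipart type behaves the same way.
RAW = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: hello\n"
    "Content-Type: text/plain\n"
    "\n"
    "Just a line of text.\n"
)

msg = message_from_string(RAW)

# Non-multipart: the whole body is kept as one string payload.
print(msg.is_multipart())        # False
print(type(msg.get_payload()))   # <class 'str'>
print(msg.get_payload())
```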
There are several useful utilities provided with the email
package.
- quote(str)
Return a new string with backslashes in str replaced by two
backslashes and double quotes replaced by backslash-double quote.
- unquote(str)
Return a new string which is an unquoted version of str.
If str begins and ends with double quotes, they are stripped
off. Likewise, if str begins and ends with angle brackets, they
are stripped off.
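Here is a quick illustration of both helpers. It assumes Python 3's email.utils module, the lower-case successor of email.Utils, which still provides quote() and unquote(). The input strings are arbitrary examples.

```python
from email.utils import quote, unquote

# quote() doubles backslashes and escapes double quotes with a backslash.
print(quote('say "hi"'))             # say \"hi\"
print(quote('C:\\temp'))             # C:\\temp

# unquote() strips one enclosing pair of double quotes or angle brackets.
print(unquote('"John Doe"'))         # John Doe
print(unquote('<jdoe@example.com>')) # jdoe@example.com
print(unquote('no quotes here'))     # no quotes here
```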
- parseaddr(address)
Parse address, which should be the value of some address-containing
field such as To: or Cc:, into its constituent
realname and email address parts. Returns a 2-tuple of that
information, unless the parse fails, in which case the 2-tuple
(None, None) is returned.
- dump_address_pair(pair)
The inverse of parseaddr(), this takes a 2-tuple of the form
(realname, email_address) and returns the string value suitable
for a To: or Cc: header. If the first element of
pair is false, then the second element is returned unmodified.
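Python 3's email.utils does not provide dump_address_pair(). The function formataddr() takes the same (realname, email_address) pair instead. A failed parseaddr() there returns ('', '') rather than (None, None). The sketch below assumes that modern module and uses made-up addresses to show the round trip.

```python
from email.utils import formataddr, parseaddr

# Split a header value into (realname, email_address).
pair = parseaddr("John Doe <jdoe@example.com>")
print(pair)                                           # ('John Doe', 'jdoe@example.com')

# formataddr() goes the other way, quoting names that contain specials.
print(formataddr(pair))                               # John Doe <jdoe@example.com>
print(formataddr(("Doe, John", "jdoe@example.com")))  # "Doe, John" <jdoe@example.com>

# A false (empty) realname yields the bare address.
print(formataddr(("", "jdoe@example.com")))           # jdoe@example.com
```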
- getaddresses(fieldvalues)
This function returns a list of 2-tuples of the form returned by
parseaddr(). fieldvalues is a sequence of header field
values as might be returned by Message.get_all(). Here's a
simple example that gets all the recipients of a message:
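from email.utils import getaddresses  # email.Utils in older releases
# msg is an existing Message object. get_all() returns None when a
# header is absent, so pass [] as the default to keep the + working.
tos = msg.get_all('to', [])
ccs = msg.get_all('cc', [])
resent_tos = msg.get_all('resent-to', [])
resent_ccs = msg.get_all('resent-cc', [])
all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs)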
- decode(s)
This function decodes a string according to the rules in RFC 2047. It
returns the decoded string as a Python unicode string.
- encode(s[, charset[, encoding]])
This function encodes a string according to the rules in RFC 2047. It
is not actually the inverse of decode(), since it doesn't
handle multiple character sets or multiple string parts needing
encoding. In fact, the input string s must already be encoded
in the charset character set (Python can't reliably guess what
character set a string might be encoded in). The default
charset is "iso-8859-1".
encoding must be either the letter "q" for
Quoted-Printable or "b" for Base64 encoding; otherwise a
ValueError is raised. Both the charset and
the encoding strings are case-insensitive, and are coerced to lower
case in the returned string.
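Python 3's email.utils does not provide decode() and encode(). The closest real counterpart for decoding is email.header.decode_header() combined with make_header(). The encoder below, encode_word(), is a hypothetical helper written only to illustrate the contract described above. It takes input that is already encoded as bytes, selects "q" or "b", lower-cases both the charset and the encoding, and raises ValueError for anything else. It does not reproduce the original function. Its Q branch conservatively escapes every byte except ASCII letters, digits and spaces, which RFC 2047 permits.

```python
import base64
from email.header import decode_header, make_header


def encode_word(data: bytes, charset: str = "iso-8859-1", encoding: str = "q") -> str:
    """Hypothetical helper: wrap bytes already in `charset` as one RFC 2047 encoded word."""
    charset = charset.lower()
    encoding = encoding.lower()
    if encoding == "b":
        text = base64.b64encode(data).decode("ascii")
    elif encoding == "q":
        out = []
        for byte in data:
            ch = chr(byte)
            if ch == " ":
                out.append("_")              # Q encoding writes spaces as underscores
            elif ch.isascii() and ch.isalnum():
                out.append(ch)               # plain ASCII letters and digits pass through
            else:
                out.append("=%02X" % byte)   # every other byte becomes =XX
        text = "".join(out)
    else:
        raise ValueError("encoding must be 'q' or 'b'")
    return "=?%s?%s?%s?=" % (charset, encoding, text)


# The caller supplies bytes that are already in the target charset.
raw = "café au lait".encode("iso-8859-1")
q_word = encode_word(raw)
b_word = encode_word(raw, "ISO-8859-1", "B")
print(q_word)   # =?iso-8859-1?q?caf=E9_au_lait?=
print(b_word)

# Decoding with the real email.header API gives back a str.
print(str(make_header(decode_header(q_word))))   # café au lait
print(str(make_header(decode_header(b_word))))   # café au lait
```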
- parsedate(date)
Attempts to parse a date according to the rules in RFC 2822.
However, some mailers don't follow that format as specified, so
parsedate() tries to guess correctly in such cases.
date is a string containing an RFC 2822 date, such as
"Mon, 20 Nov 1995 19:12:08 -0500". If it succeeds in parsing
the date, parsedate() returns a 9-tuple that can be passed
directly to time.mktime(); otherwise None is
returned. Note that fields 6, 7, and 8 of the result tuple are not
usable.
- parsedate_tz(date)
Performs the same function as parsedate(), but returns
either None or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to time.mktime(), and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time), in seconds (see footnote). If the input
string has no timezone, the last element of the tuple returned is
None. Note that fields 6, 7, and 8 of the result tuple are not
usable.
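This sketch runs both parsers on the sample date from the text and assumes Python 3's email.utils. Only the first six fields are compared, since fields 6 through 8 are not usable. The comments spell out the arithmetic behind the tenth field: -0500 means five hours behind UTC, so the offset is -(5 * 3600 + 0 * 60) = -18000 seconds. Current Python 3 documentation describes the no-timezone case as returning 0 (UTC) rather than None, so the example uses a date with an explicit offset.

```python
from email.utils import parsedate, parsedate_tz

DATE = "Mon, 20 Nov 1995 19:12:08 -0500"

t = parsedate(DATE)
print(t[:6])      # (1995, 11, 20, 19, 12, 8); fields 6-8 are not usable

tz = parsedate_tz(DATE)
print(tz[:6])     # (1995, 11, 20, 19, 12, 8)
# -0500 -> -(5 * 3600 + 0 * 60) = -18000 seconds; time.timezone uses the opposite sign
print(tz[9])      # -18000

# Unparseable input gives None instead of a tuple.
print(parsedate("not a date"))   # None
```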
- mktime_tz(tuple)
Turn a 10-tuple as returned by parsedate_tz() into a UTC
timestamp. If the timezone item in the tuple is None, assume
local time. Minor deficiency: mktime_tz() interprets the
first 8 elements of tuple as a local time and then compensates
for the timezone difference. This may yield a slight error around
changes in daylight saving time, though not enough to worry about for
common use.
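As a worked check, 19:12:08 at -0500 is 00:12:08 UTC on the following day, so the timestamp should match calendar.timegm() applied to that UTC time. This assumes Python 3's email.utils. Recent releases appear to compute the timezone case with calendar.timegm() instead of going through local time, which should avoid the daylight-saving caveat above. Treat that as a version-dependent detail.

```python
import calendar
from email.utils import mktime_tz, parsedate_tz

# -0500 is five hours behind UTC, so add five hours to get the UTC wall time.
stamp = mktime_tz(parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500"))
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
print(stamp == expected)   # True
```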
- formatdate([timeval])
Returns the time formatted as an RFC 2822 date string (the format
originally defined by RFC 822 and updated by RFC 1123). If timeval is provided, then it
should be a floating point time value as expected by
time.gmtime(); otherwise the current time is used.
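formatdate() is still available in Python 3's email.utils. There it also accepts localtime and usegmt keyword arguments that this section does not describe. This sketch formats the Unix epoch, which gives a fixed and checkable string. With the default arguments the zone is written as -0000.

```python
from email.utils import formatdate

# The Unix epoch as an RFC 2822 date string.
print(formatdate(0))   # Thu, 01 Jan 1970 00:00:00 -0000

# With no argument, the current time is used.
print(formatdate())
```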
Footnotes
- Note that the sign of the timezone offset is the opposite of the sign of the
time.timezone variable for the same timezone; the latter variable follows the
POSIX standard while this module follows RFC 2822.