End of body / message terminator in .txt archive?

Asked by Sandy Walsh

I have the .gz archives for my list, each of which contains a single .txt file. I'm trying to determine the terminator for the body of each message.

I know the blank line after the headers marks the start of the message body, but how would I know whether a From: line is part of the body or the start of the next message?

I don't see anything in the RFCs that relates to archives.

Is this even possible with the .txt archive? Or do I have to use the .mbox format?

Question information

Language: English
Status: Solved
For: Ubuntu mailman
Assignee: No assignee
Solved by: Sandy Walsh

This question was reopened

Sandy Walsh (sandy-walsh) said :
#1

nm, I should have read this more closely.

There is no end-of-message (EOM) marker, just a beginning-of-message (BOM) marker.

"From " marks the start of the message. That's all.
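As a rough sketch (assuming the archive has already been decompressed into a string; the function name and sample text are hypothetical), splitting on lines that begin with "From " shows both how the separator works and why an unescaped "From " line in a body produces a false message:

```python
def split_on_from_lines(archive_text: str) -> list[str]:
    """Split mbox-style text into messages at lines beginning with "From ".

    Naive sketch: every line starting with "From " is treated as a separator,
    so an unescaped "From " line inside a body starts a bogus extra message.
    Any text before the first separator is discarded.
    """
    messages: list[list[str]] = []
    for line in archive_text.splitlines(keepends=True):
        if line.startswith("From "):
            messages.append([line])  # start a new message at each separator
        elif messages:
            messages[-1].append(line)  # everything else joins the current message
    return ["".join(lines) for lines in messages]


sample = (
    "From alice at example.com  Mon Sep  1 10:00:00 2014\n"
    "Subject: one\n"
    "\n"
    "This is the first body line\n"
    "From here to eternity\n"
    "\n"
    "From bob at example.com  Tue Sep  2 11:00:00 2014\n"
    "Subject: two\n"
    "\n"
    "Hello\n"
)
print(len(split_on_from_lines(sample)))  # 3, not 2: the body line is a false separator
```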

Sorry for the noise.

Sandy Walsh (sandy-walsh) said :
#2
Sandy Walsh (sandy-walsh) said :
#3

Hmm, close ... but it seems a leading "From " in the body of a message is not getting escaped.

This is the first body line
From here to eternity
This is the third body line

isn't getting converted to

This is the first body line
>From here to eternity
This is the third body line

and so it gives a false beginning-of-message.

Is there some special flag that needs to get turned on for this?

Sandy Walsh (sandy-walsh) said :
#4

Likewise for

This is the first body line

From here to eternity third
This is the fourth body line

as per
http://email.about.com/cs/standards/a/mbox_format.htm

still no escape

Mark Sapiro (msapiro) said :
#5

What Mailman version is this? Is this an old version, or perhaps an old .txt(.gz) file in the older portion of an archive? In any recent version of Mailman, body lines beginning with "From " will be escaped somehow.

Many modern MUAs will quoted-printable encode the 'F' as '=46' when sending such a message, many MTAs/MDAs will escape the line by prepending '>', and if all that fails and a message arrives at Mailman with a body line beginning with "From ", Mailman will prepend '>' to the line(s). Mailman has been doing this for a long time.
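To illustrate the quoted-printable route, here is a minimal sketch using Python's standard `quopri` module. The helper name is hypothetical, and this is not how any particular MUA does it; a real message would also need a "Content-Transfer-Encoding: quoted-printable" header. Because "=46" is the QP escape for the byte 0x46 ('F'), the line no longer starts with "From " on the wire but decodes back to the original text:

```python
import quopri
import re

# Matches "From " at the start of any line of the encoded body.
_FROM_LINE = re.compile(rb"^From ", re.MULTILINE)


def qp_encode_body(body: bytes) -> bytes:
    """Quoted-printable encode a body so that no line starts with "From ".

    Sketch only: the caller must still label the message with
    "Content-Transfer-Encoding: quoted-printable".
    """
    encoded = quopri.encodestring(body)
    # Replace a leading "F" with its QP escape "=46" (0x46 == ord("F")).
    return _FROM_LINE.sub(b"=46rom ", encoded)


body = b"This is the first body line\nFrom here to eternity\n"
encoded = qp_encode_body(body)
print(encoded.decode("ascii"))
```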

See the description of the mangle_from_ flag at <https://docs.python.org/2/library/email.generator.html#module-email.generator>.
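For a quick look at what that flag does, here is a small sketch using Python 3's `email.generator.Generator`, which takes the same `mangle_from_` parameter as the Python 2 version linked above. Mailman 2.1 itself runs on Python 2, so treat this as an illustration of the flag, not Mailman's own code. With `mangle_from_=True`, the generator prepends '>' to body lines that begin with "From ", so the body line comes out as ">From here to eternity" while the "From:" header is left alone:

```python
import email
import io
from email.generator import Generator

raw = (
    "From: sandy@example.com\n"
    "Subject: test\n"
    "\n"
    "This is the first body line\n"
    "From here to eternity\n"
    "This is the third body line\n"
)
msg = email.message_from_string(raw)

out = io.StringIO()
# mangle_from_=True escapes body lines beginning with "From " as ">From ".
Generator(out, mangle_from_=True).flatten(msg)
print(out.getvalue())
```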

Mark Sapiro (msapiro) said :
#6

By the way, the periodic .txt(.gz) files may be handy for searching one period of the archive, but if you want an actual mbox containing all the list posts, use the cumulative archive mbox, which has all the headers. It is Mailman's archives/private/LISTNAME.mbox/LISTNAME.mbox file on the server, and even if you have only web access, you can get it. There may be a link to "download the full raw archive" on the archive table of contents page, but if not, you can still get it by going to a URL like <http://www.example.com/mailman/private/listname/>, logging in, and then getting <http://www.example.com/mailman/private/listname.mbox/listname.mbox>. And yes, you go to the private archive URL even if the archive is public.
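If you want to script the download, something like the following sketch may work. It is built on assumptions that you should verify against your own installation: that the private archive login form posts fields named "username" (your member address) and "password" (your member password) to the private archive URL, and that a session cookie then grants access to the mbox. The function names are hypothetical, and the private base URL is site-specific (some sites serve it under /cgi-bin/mailman/private):

```python
import http.cookiejar
import urllib.parse
import urllib.request


def cumulative_mbox_url(private_base: str, listname: str) -> str:
    """Build the cumulative mbox URL under a private archive base URL."""
    base = private_base.rstrip("/")
    return f"{base}/{listname}.mbox/{listname}.mbox"


def fetch_cumulative_mbox(private_base: str, listname: str,
                          address: str, password: str) -> bytes:
    """Log in to the private archive, then download the cumulative mbox.

    ASSUMPTION: the login form uses fields "username" and "password" and
    sets a cookie; check the HTML of your own login page first.
    """
    # Keep cookies between the login request and the download request.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    login_url = f"{private_base.rstrip('/')}/{listname}/"
    form = urllib.parse.urlencode({"username": address, "password": password})
    opener.open(login_url, data=form.encode("ascii")).read()  # POST the login
    with opener.open(cumulative_mbox_url(private_base, listname)) as resp:
        return resp.read()


print(cumulative_mbox_url("http://lists.openstack.org/cgi-bin/mailman/private",
                          "openstack-dev"))
```

Keeping the URL builder separate from the network call makes it easy to check the URL by hand before attempting a login.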

Mark Sapiro (msapiro) said :
#7

Also, I forgot to mention that Mailman has a bin/cleanarch script that will do a reasonable job of detecting, and optionally escaping, unescaped "From " lines in message bodies in an mbox file. See 'bin/cleanarch --help'.
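The general idea behind that kind of detection can be sketched as a heuristic: a genuine separator looks like "From <address> <weekday> <month> ..." and is followed by a header line, while a body line such as "From here to eternity" is not. The regexes and function below are hypothetical and are not cleanarch's actual code. Note that an obfuscated separator such as "From alice at example.com  Mon Sep ..." also fails the address check and gets escaped:

```python
import re

# Hypothetical separator pattern: "From <address> <weekday> <month> ...".
_SEPARATOR = re.compile(
    r"^From \S+\s+(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ")
# A header line: a field name of printable non-colon ASCII, then a colon.
_HEADER = re.compile(r"^[\x21-\x39\x3b-\x7e]+:")


def escape_suspect_from_lines(lines: list[str]) -> list[str]:
    """Prefix '>' to "From " lines that do not look like real separators."""
    out = []
    for i, line in enumerate(lines):
        if line.startswith("From "):
            nxt = lines[i + 1] if i + 1 < len(lines) else ""
            if not (_SEPARATOR.match(line) and _HEADER.match(nxt)):
                line = ">" + line  # probably a body line, so escape it
        out.append(line)
    return out


mbox_lines = [
    "From alice@example.com Mon Sep  1 10:00:00 2014",
    "Subject: one",
    "",
    "This is the first body line",
    "From here to eternity",
    "",
    "From bob@example.com Tue Sep  2 11:00:00 2014",
    "Subject: two",
]
for line in escape_suspect_from_lines(mbox_lines):
    print(line)
```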

Sandy Walsh (sandy-walsh) said :
#8

Thanks Mark,

I don't have admin privileges for the list, so the various mbox options are unavailable to me (or disabled in general).

We're using 2.1.14
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

The offending message can be seen
http://lists.openstack.org/pipermail/openstack-dev/2014-September/044644.html

"
From what I gathered so far Octavia is a fully ..."

I'll look into cleanarch and see about getting mbox access.

Cheers!

Mark Sapiro (msapiro) said :
#9

First of all, you don't need any special privilege to get the cumulative archive mailbox I describe in comment #6. You only need to authenticate for private archive access with your own list member address and list member password.

If you get this mbox, you'll see that the real From separators do not have obfuscated email addresses (which makes cleanarch think they are not real From separators). You will also see that the messages have all (or at least most of) the original headers, not just the abbreviated set in the .txt files.

It is not possible for me to be of further help with this question as I have no information about the modifications that OpenStack has made to its Mailman installation. Aside from the obvious web style changes, there is at least one anomaly, i.e. the file at <http://lists.openstack.org/pipermail/openstack-dev/2014-September.txt.gz> is a gzip archive containing a file named 2014-September.txt, but this is not a text file. It is another gzip archive which contains the actual text file. This is due to some modification and/or misconfiguration of Mailman. I have no idea what other things might be in play here that result in the "From " line in the message body not being escaped, but I don't think this will happen with an unmodified Mailman.
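Until the double compression is fixed on the server side, a downloaded file can be unwrapped by decompressing repeatedly while the data still starts with the gzip magic bytes (0x1f 0x8b). This is a small sketch with a hypothetical helper name and an arbitrary safety limit on the number of layers:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def gunzip_all_layers(data: bytes, max_layers: int = 5) -> bytes:
    """Decompress repeatedly while the data still looks like gzip."""
    for _ in range(max_layers):
        if not data.startswith(GZIP_MAGIC):
            break
        data = gzip.decompress(data)
    return data


text = b"From alice at example.com  Mon Sep  1 10:00:00 2014\nSubject: one\n"
double_wrapped = gzip.compress(gzip.compress(text))
print(gunzip_all_layers(double_wrapped) == text)  # True
```

Usage on a downloaded file would be along the lines of reading 2014-September.txt.gz in binary mode and passing the bytes to gunzip_all_layers; the character encoding of the resulting text is not specified here, so decode it with care.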

I think you'll need to follow up with OpenStack to find why their Mailman installation is broken in this way. If they need assistance, they can join the <email address hidden> list at <http://mail.python.org/mailman/listinfo/mailman-users> and post their request to that list. Note that this is a publicly archived list. One should not post personal information that one doesn't want to expose to the world.

Sandy Walsh (sandy-walsh) said :
#10

Thanks Mark ... you've been a great help. Will do.

Btw, I found the proper URL for the mbox:

http://lists.openstack.org/cgi-bin/mailman/private/openstack-dev.mbox/openstack-dev.mbox