[Evolution] Importing from KMail -- Lost Messages
Not Zed
notzed@ximian.com
Fri, 01 Apr 2005 09:45:15 +0800
--=-h28JxKPEKf2xDtifQ6ot
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Still, having said all that, a good IMPORTER - and not just a native
storage format - should be able to deal with this sort of nonsense.
On Thu, 2005-03-31 at 18:37 +0200, Jeffrey Stedfast wrote:
> please see http://www.jwz.org/doc/content-length.html
>
> "Stricter parsing of the ``From '' separator line doesn't help either,
> because there are many, many variations on what goes in that line (since
> it was never standardized either); and also, some mail readers include
> that line verbatim when forwarding messages (Sun's MailTool, for
> example) so a stricter parser wouldn't help that case at all, because
> message bodies tend to contain valid matches."
>
> Later on, the page also describes why you can't unmunge ">From" lines.
>
> Jeff
>
> On Thu, 2005-03-31 at 00:28 -0500, Garry Williams wrote:
> > On Thu, 2005-03-31 at 11:44 +0800, Not Zed wrote:
> > > On Wed, 2005-03-30 at 21:10 -0500, Garry Williams wrote:
> > > > On Wed, 2005-03-30 at 17:10 -0500, Rob Matlack wrote:
> >
> > [snip]
> >
> > > > but the same symptoms were produced
> > > > in my case because some of my messages had lines in them that matched
> > > > this regular expression:
> > > >
> > > > ^[Ff]rom[[:space:]]
> >
> > [snip]
> >
> > > Hmm, no, it definitely must be capitalised. I can't see how you could
> > > see it matching against non-capitalised words. It uses a memcmp to
> > > look for the "From " line.
> >
> > I just did a test and you are right. My memory is faulty. It takes a
> > capitalized ^From to trigger the break.
> >
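
To illustrate the test described above, here is a minimal sketch in
Python (not Evolution's actual C code). It assumes the check is a plain
comparison against the five bytes "From ", which is what a memcmp would
do. The name is_from_line is hypothetical, and \s stands in for the
POSIX [[:space:]] class, which Python's re module does not support.

```python
import re

# Hypothetical helper: a memcmp-style test for the literal bytes "From ".
# Byte comparison is case-sensitive, so "from " never matches.
def is_from_line(line: bytes) -> bool:
    return line[:5] == b"From "

# The looser pattern quoted earlier in the thread, for comparison only.
LOOSE = re.compile(rb"^[Ff]rom\s")

if __name__ == "__main__":
    for sample in (b"From here on it gets odd.\n", b"from here on it gets odd.\n"):
        print(sample, is_from_line(sample), LOOSE.match(sample) is not None)
```

Only the capitalised line passes is_from_line, while both lines match
the looser pattern. Note that the capitalised line is ordinary body text,
which is exactly the false positive under discussion.
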
> > (By the way, I forgot to mention that the experience I described is with
> > Evolution 2.0.1 on Mandrake 10.1.)
> >
> > [snip]
> >
> > > Ahh well, that isn't berkeley mailbox format then. That's something
> > > similar but different. Rather like sunos' mailbox format which also
> > > uses/honours the content-length header.
> > >
> > > I had no idea mutt did such a thing, it is a pity, since it is a poor
> > > convention to use.
> >
> > I also notice a Lines: header in mutt's messages. I guess it uses both
> > a belt and suspenders. :-)
> >
> > Anyway, it might help to change the import test to also check for a mail
> > address after the ^From and some amount of whitespace. Of course,
> > that opens a whole new can of worms because recognizing a syntactically
> > valid E-mail address is non-trivial -- even if it omits comments.
> >
> > The third "field" in a ^From separator is a time stamp. I've seen a few
> > different variations of its format, depending on the client that
> > created it. Still, recognizing a time stamp should be easier than an
> > E-mail address.
> >
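
A stricter separator test along those lines could be sketched like this.
It is only an illustration in Python: the asctime-style date layout is an
assumption (as noted above, real clients vary), and FROM_WITH_DATE and
looks_like_separator are hypothetical names. It also cannot help with
forwarded messages that carry a complete, valid separator line in the
body.

```python
import re

# Assumed layout: "From <sender> <weekday> <month> <day> <time> [zone] <year>".
FROM_WITH_DATE = re.compile(
    rb"^From \S+\s+"
    rb"(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    rb"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    rb"[ 0-3]?\d "               # day of month, possibly space-padded
    rb"\d{2}:\d{2}(:\d{2})? "    # hh:mm or hh:mm:ss
    rb"(\S+ )?"                  # optional time zone name or offset
    rb"\d{4}\s*$"                # four-digit year, then end of line
)

def looks_like_separator(line: bytes) -> bool:
    """True if the line has the assumed 'From sender date' shape."""
    return FROM_WITH_DATE.match(line) is not None

if __name__ == "__main__":
    print(looks_like_separator(b"From garry@example.com Thu Mar 31 00:28:00 2005\n"))
    print(looks_like_separator(b"From what I can tell, the importer is confused.\n"))
```

The sender field is matched only as a run of non-space characters, which
sidesteps validating the address at all.
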
> > A different change might be to ignore ^From when it occurs too soon,
> > whenever a Content-Length: or Lines: header is present.
> >
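
The Content-Length idea above might look roughly like the following
Python sketch. It is not how Evolution's importer works. split_mbox and
content_length are hypothetical names, and the rule of trusting the
header only when the declared length lands on end-of-file or on another
"From " line is an added assumption, since a stale Content-Length would
otherwise swallow the next message. Lines: is not handled here.

```python
from typing import List, Optional

def content_length(headers: bytes) -> Optional[int]:
    """Return the Content-Length value from a raw header block, if usable."""
    for line in headers.split(b"\n"):
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"content-length":
            value = value.strip()
            return int(value) if value.isdigit() else None
    return None

def split_mbox(data: bytes) -> List[bytes]:
    """Split raw mbox bytes into messages, each starting at its 'From ' line."""
    messages = []
    pos = 0
    while pos < len(data):
        hdr_end = data.find(b"\n\n", pos)      # blank line ends the headers
        if hdr_end == -1:
            messages.append(data[pos:])
            break
        body_start = hdr_end + 2
        end = None
        length = content_length(data[pos:hdr_end])
        if length is not None and body_start + length <= len(data):
            # Skip the declared body, then any blank lines after it.
            rest = data[body_start + length:].lstrip(b"\n")
            if rest == b"" or rest.startswith(b"From "):
                end = len(data) - len(rest)    # header looks trustworthy
        if end is None:
            # No usable header: fall back to the next line starting "From ".
            nxt = data.find(b"\nFrom ", body_start - 1)
            end = len(data) if nxt == -1 else nxt + 1
        messages.append(data[pos:end])
        pos = end
    return messages
```

With a correct Content-Length, a body line such as "From here on..." no
longer splits the message. Without the header, the sketch falls back to
the plain "From " scan and the original problem remains.
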
>
> _______________________________________________
> evolution maillist - evolution@lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/evolution
>
--=-h28JxKPEKf2xDtifQ6ot--