[Evolution] Importing from KMail -- Lost Messages

Not Zed notzed@ximian.com
Fri, 01 Apr 2005 09:45:15 +0800


--=-h28JxKPEKf2xDtifQ6ot
Content-Type: text/plain
Content-Transfer-Encoding: 7bit


Still, having said all that, a good IMPORTER - and not just a native
storage format - should be able to deal with this sort of nonsense.

On Thu, 2005-03-31 at 18:37 +0200, Jeffrey Stedfast wrote:

> please see http://www.jwz.org/doc/content-length.html
> 
> "Stricter parsing of the ``From '' separator line doesn't help either,
> because there are many, many variations on what goes in that line (since
> it was never standardized either); and also, some mail readers include
> that line verbatim when forwarding messages (Sun's MailTool, for
> example) so a stricter parser wouldn't help that case at all, because
> message bodies tend to contain valid matches."
> 
> Later on, the page also explains why you can't reliably unmunge ">From" lines.
> 
> Jeff
> 
> On Thu, 2005-03-31 at 00:28 -0500, Garry Williams wrote:
> > On Thu, 2005-03-31 at 11:44 +0800, Not Zed wrote:
> > > On Wed, 2005-03-30 at 21:10 -0500, Garry Williams wrote: 
> > > > On Wed, 2005-03-30 at 17:10 -0500, Rob Matlack wrote:
> > 
> > [snip]
> > 
> > > > but the same symptoms were produced
> > > > in my case because some of my messages had lines in them that matched
> > > > this regular expression:
> > > > 
> > > >     ^[Ff]rom[[:space:]]
> > 
> > [snip]
> > 
> > > Hmm, no, it definitely must be capitalised.  I can't see how you could
> > > see it matching against non capitalised words.  It uses a memcmp to
> > > look for the "From " line.
> > 
> > I just did a test and you are right.  My memory is faulty.  It takes a
> > capitalized ^From to trigger the break.
> > 
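For illustration, the difference between the regular expression quoted above and the check described here can be sketched in Python. This is only a sketch, not Evolution's code: is_from_line is a hypothetical name, and the byte-prefix comparison merely mimics a memcmp against the five bytes "From ".

```python
def is_from_line(line: bytes) -> bool:
    """Return True if the line starts with the exact bytes b"From ".

    Mimics memcmp(line, "From ", 5) == 0: case-sensitive, and the fifth
    byte must be a literal space (a tab does not count).
    """
    return line[:5] == b"From "
```

Under this check a body line such as "from now on" is left alone, while "From now on" at the start of a line still splits the message.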
> > (By the way, I forgot to mention that the experience I described is with
> > Evolution 2.0.1 on Mandrake 10.1.)
> > 
> > [snip]
> > 
> > > Ahh well, that isn't Berkeley mailbox format then.  That's something
> > > similar but different, rather like SunOS's mailbox format, which also
> > > uses/honours the Content-Length header.
> > > 
> > > I had no idea mutt did such a thing, it is a pity, since it is a poor
> > > convention to use.
> > 
> > I also notice a Lines: header in mutt's messages.  I guess it uses both
> > a belt and suspenders.  :-)
> > 
> > Anyway, it might help to change the import test to also check for a mail
> > address after the ^From and some amount of whitespace.  Of course,
> > that opens a whole new can of worms because recognizing a syntactically
> > valid e-mail address is non-trivial, even if it omits comments.
> > 
> > The third "field" in a ^From separator is a time stamp.  I've seen a few
> > different variations of its format, depending on the client that
> > created it.  Still, recognizing a time stamp should be easier than
> > recognizing an e-mail address.
> > 
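A rough sketch of the stricter test suggested above, in Python. Everything here is an assumption made for illustration: the name looks_like_separator is hypothetical, the sender is taken to be any run of non-space characters (no real address validation), and only an asctime-style time stamp is accepted, although other variations exist.

```python
import re

# Assumed separator shape: "From <sender> <asctime-style time stamp>", e.g.
# "From someone@example.com Thu Mar 31 00:28:00 2005"
_STRICT_FROM = re.compile(
    rb"^From \S+ +"
    rb"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    rb"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    rb"[ 0-3]\d \d{2}:\d{2}:\d{2} \d{4}\s*$"
)

def looks_like_separator(line: bytes) -> bool:
    """True if the line has the assumed 'From sender time-stamp' shape."""
    return _STRICT_FROM.match(line) is not None
```

This rejects ordinary sentences that happen to start with "From ", but as the page quoted above points out, a separator line copied verbatim into a message body would still pass.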
> > Maybe a different change would be to, in the presence of Content-Length:
> > or Lines:, ignore ^From when it occurs too soon.
> > 
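The last idea above (ignore ^From when it occurs too soon) might look roughly like the following Python sketch. It is not how Evolution or mutt works; split_mbox is a hypothetical name, the input is assumed to use LF line endings, Lines: is not handled, and the sanity check (trust Content-Length only when the body it describes is followed by another "From " line or by the end of the file) is an added assumption.

```python
import re

_CONTENT_LENGTH = re.compile(
    rb"^Content-Length:[ \t]*(\d+)[ \t]*$", re.IGNORECASE | re.MULTILINE
)

def split_mbox(data: bytes) -> list:
    """Split mbox bytes into messages, honouring a plausible Content-Length."""
    messages = []
    pos = 0
    while pos < len(data):
        hdr_end = data.find(b"\n\n", pos)      # blank line ends the headers
        if hdr_end == -1:                      # no body: rest is one message
            messages.append(data[pos:])
            break
        body_start = hdr_end + 2
        end = None
        m = _CONTENT_LENGTH.search(data, pos, hdr_end)
        if m:
            cand = body_start + int(m.group(1))
            if cand <= len(data):
                nxt = cand
                while data[nxt:nxt + 1] == b"\n":   # skip separating blank lines
                    nxt += 1
                # Trust the header only if the body ends where a new message
                # starts, or at the end of the file.
                if nxt == len(data) or data.startswith(b"From ", nxt):
                    end = nxt
        if end is None:
            # Fall back to plain "From " scanning from the first body line.
            idx = data.find(b"\nFrom ", body_start - 1)
            end = len(data) if idx == -1 else idx + 1
        messages.append(data[pos:end])
        pos = end
    return messages
```

With a correct Content-Length, a body line beginning with "From " stays inside its message; without the header the same folder splits into an extra, bogus message, which matches the symptom described in this thread.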

--=-h28JxKPEKf2xDtifQ6ot--