Thread: [hack] Unicode workaround

  1. #1
    Junior Member
    Join Date
    Feb 2009
    Posts
    6

    [hack] Unicode workaround

    Hi all. Zarafa doesn't support Unicode, and on IRC I proposed a temporary workaround that benefits only Webaccess users. Imar suggested that I post the details on the forum.

    I'd do it myself, but I don't know enough C++. A small part of iconv's functionality needs to be hacked or duplicated, or a different encoding library found. Besides that, our client has chosen Zimbra, with Zarafa's lack of Unicode support as the major show stopper, so I don't have much use for this solution myself.

    Currently, all messages are converted from whatever character set they're in, to Windows-1252. Characters that aren't in Windows-1252 are replaced by question marks and lost forever.

    My suggestion is to convert to a mixture of Windows-1252 and UTF-8. Continue to convert to Windows-1252 as much as possible, so that everything that only supports Windows-1252 keeps working, but instead of using question marks to "encode" unsupported characters, use UTF-8 for those. (An easier way to implement this is: convert everything to UTF-8 first, and then convert as much as possible back to Windows-1252, leaving the rest of the UTF-8 intact. The latter can be done with 128 regexes or the character-scanning equivalent.)
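    A rough sketch of the character-scanning variant, in Python rather than the C++ that Zarafa would need. The function name `to_mixed` is made up for illustration, and Python's built-in `cp1252` codec stands in for whatever iconv-like library the real code would use:

```python
def to_mixed(text: str) -> bytes:
    """Encode text as Windows-1252 where possible and as UTF-8 elsewhere.

    Hypothetical helper: a sketch of the proposed hack, not Zarafa code.
    """
    out = bytearray()
    for ch in text:
        try:
            # Characters that exist in Windows-1252 stay single-octet.
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Everything else is kept as UTF-8 instead of becoming "?".
            out += ch.encode("utf-8")
    return bytes(out)
```

    With this sketch, `to_mixed("café 나")` gives `b"caf\xe9 \xeb\x82\x98"`: the "é" is a single Windows-1252 octet, and the Korean character survives as three UTF-8 octets.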

    The second half of this solution is to hack the webaccess. Find valid multi-octet UTF-8 sequences and convert them to &#8364;-style numeric HTML entities. A regex with eval-code RHS can probably handle this in a one-liner.
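    The display side could look roughly like the following. It is written in Python for illustration only (the webaccess is not Python), the names `UTF8_MULTI` and `mixed_to_html` are invented here, and HTML-escaping of `&`, `<` and `>` is left out to keep the sketch short:

```python
import re

# One valid multi-octet UTF-8 sequence (no overlongs, no surrogates).
UTF8_MULTI = re.compile(rb"""
      [\xC2-\xDF][\x80-\xBF]                # 2 octets
    | \xE0[\xA0-\xBF][\x80-\xBF]            # 3 octets, no overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}     # 3 octets
    | \xED[\x80-\x9F][\x80-\xBF]            # 3 octets, no surrogates
    | \xF0[\x90-\xBF][\x80-\xBF]{2}         # 4 octets, no overlongs
    | [\xF1-\xF3][\x80-\xBF]{3}             # 4 octets
    | \xF4[\x80-\x8F][\x80-\xBF]{2}         # 4 octets, up to U+10FFFF
""", re.VERBOSE)


def mixed_to_html(data: bytes) -> str:
    """Turn UTF-8 sequences into numeric entities; treat the rest as cp1252."""
    def entity(match: re.Match) -> bytes:
        code_point = ord(match.group(0).decode("utf-8"))
        return b"&#%d;" % code_point

    # After substitution only ASCII entities and Windows-1252 octets remain.
    return UTF8_MULTI.sub(entity, data).decode("cp1252", errors="replace")
```

    For example, `mixed_to_html(b"caf\xe9 \xeb\x82\x98")` returns `"café &#45208;"`. One caveat of the approach itself: Windows-1252 text that happens to look like UTF-8, such as the two characters "Ã©", is indistinguishable from a real UTF-8 "é" and will be turned into an entity as well.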

    The main downside of this hack is that non-webaccess users will see "나는 ìœ*리를 먹ì?„ 수 있어요. 그래ë?„ 아프지 ì•Šì•„ìš”" instead of a bunch of question marks ("?? ??? ?? ? ???. ??? ??? ???"). But webaccess users will see the intended Korean "나는 유리를 먹을 수 있어요. 그래도 아프지 않아요". I think mojibake isn't worse than question marks, but the improvement for webaccess users would be huge.
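    The trade-off can be reproduced in a few lines of Python. This is only an illustration of the two outcomes using Python's `cp1252` codec; the variable names are arbitrary:

```python
korean = "나는 유리를 먹을 수 있어요."

# Current behaviour: anything outside Windows-1252 becomes "?" and is lost.
as_today = korean.encode("cp1252", errors="replace").decode("cp1252")

# With the hack: the UTF-8 octets are stored as-is, so a client that only
# knows Windows-1252 renders them as mojibake. A few octets (0x9D, for one)
# have no Windows-1252 character and show up as a replacement character.
with_hack = korean.encode("utf-8").decode("cp1252", errors="replace")

# The webaccess can still get the original text back from the stored octets.
recovered = korean.encode("utf-8").decode("utf-8")
```

    Here `as_today` is `"?? ??? ?? ? ???."`, while the stored octets behind `with_hack` still contain everything needed to recover the original text.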

  2. #2
    Senior Member
    Join Date
    Jan 2008
    Posts
    189

    Re: [hack] Unicode workaround

    It may be a silly question, but wouldn't switching to UTF-8 make more sense? Or does that break Outlook?
    - Zarafa 6.40 | Ubuntu Server 10.04 LTS (64-bit)

  3. #3
    Junior Member
    Join Date
    Jan 2009
    Posts
    11

    Re: [hack] Unicode workaround

    OK, I kind of need a quick solution for this, because I have some Russian people using their mail with my Zarafa setup and I can't migrate them without Unicode support. So what can I do to get that working until it's officially supported?

    Outlook support is not even that important for me, so if the change to UTF-8 breaks only that, I can live with it. I just need a working webaccess.

  4. #4
    Senior Member
    Join Date
    Jan 2008
    Posts
    189

    Re: [hack] Unicode workaround

    I mean ditching windows-xxx charsets in favor of UTF-8 altogether...
    - Zarafa 6.40 | Ubuntu Server 10.04 LTS (64-bit)

  5. #5
    Junior Member
    Join Date
    Jan 2009
    Posts
    11

    Re: [hack] Unicode workaround

    Yes, I know. I would also prefer a completely UTF-8 encoded system, but I guess the problem then is Outlook support.

  6. #6
    Junior Member
    Join Date
    Feb 2009
    Posts
    6

    Re: [hack] Unicode workaround

    Please don't hijack this thread to express how much you want real Unicode support. I'm sure the Zarafa developers know your concerns and wishes, but they are postponing the great switch from Windows-1252 to UTF-8 because it's a big undertaking.

    Windows-1252 is very pervasive in Zarafa. Getting rid of it is a lot of work. I'm suggesting a minimally invasive hack, to serve as a temporary solution until real Unicode and UTF-8 support is possible.

    This forum is about development, and I'd like to hear what actual C++ programmers think about my idea. Am I right if I estimate that this would only be a few hours, maybe one day, of work?

  7. #7
    Senior Member
    Join Date
    Nov 2008
    Location
    Hilden, close to Duesseldorf, NRW, Germany, Europe, Earth...
    Posts
    1,070

    Re: [hack] Unicode workaround

    Hi Juerd,

    I'm quite sure you've read about my Cyrillic problem in another part of this forum. I would really appreciate a solution to this problem, and I even wanted to work on one myself. The point is that I found the UTF-8 issue to be quite a big job without help and information from the Zarafa developers. Furthermore, anything done on this issue outside of Zarafa development could cause migration problems once official support arrives, and the Zarafa team certainly cannot support those. That is my main concern about any way of solving the problem with characters in e-mails outside the Windows-1252 code page.

    But from what I saw, as long as an e-mail is sent as UTF-8 HTML, the content is displayed correctly, at least in the web interface. Maybe in the meantime this would help to get at least the content of the message into a readable form. That is, before writing any e-mail content to the database, just convert it into UTF-8 HTML.

    Best regards

    Andreas
    Using Zarafa 7.2.5-29, Z-Push 2.3.4 with GZip and soon again SMS Support.

  8. #8
    Junior Member
    Join Date
    Feb 2009
    Posts
    6

    Re: [hack] Unicode workaround

    Moving to UTF-8 completely would indeed be a big job. It's hard to foresee the consequences of that because it affects large parts of the codebase. That's why I'm suggesting keeping Windows-1252 (for now, not permanently) and using UTF-8 only for what would otherwise be question marks.

    This requires a small change where e-mail is converted, and where e-mail is displayed, but nowhere in between.

    Your plan sounds workable too, but it involves changes at more levels. For instance, the headers need to be updated. These can be the message headers or mime part headers. There's a lot of complexity involved.

    In the end, only either converting everything to a UTF, or adding full charset support to the entire chain, will work. But it is my impression that this won't happen any time soon.

  9. #9
    Senior Member
    Join Date
    Nov 2008
    Location
    Hilden, close to Duesseldorf, NRW, Germany, Europe, Earth...
    Posts
    1,070

    Re: [hack] Unicode workaround

    Hi Juerd,

    In my opinion, the question is whether your patch idea will maintain compatibility and what it means in upgrade scenarios. If your solution doesn't cause problems during updates, or if Zarafa takes it into account during updates where necessary, well, why not use it... There is also the general question of updates: how will the patch be maintained?

    About my idea: I think the amount of work is similar, because we just need to check for the cases where the mail is not sent as UTF-8 HTML (plain text, or HTML messages sent in a different encoding that cannot be expressed in Windows-1252). In those cases we need to either turn it into HTML with UTF-8 encoding or just convert it to UTF-8. In my opinion this can be done by extending zarafa-dagent. Nothing else needs to change, because all other functions work the same; they will simply display HTML content ;-). During updates we just need to apply the same patch, and since the delivery functions will, in my opinion, not change that much, the patch could be easier to maintain.

    Best regards

    Andreas
    Using Zarafa 7.2.5-29, Z-Push 2.3.4 with GZip and soon again SMS Support.

  10. #10
    Junior Member
    Join Date
    Feb 2009
    Posts
    6

    Re: [hack] Unicode workaround

    Since I won't be writing any code, I can't say anything about maintenance or support. I would hope that the dear people at Zarafa will themselves hack it into Zarafa.

    I think our ideas are not similar. I think yours is cleaner but much more work. Converting "text/plain" to "text/html; charset=UTF-8" is more involved. It may look easy, but it can get complex as multipart MIME messages are involved and existing charset attributes are in the Content-Type header(s). (If you need to change non-UTF8 text/html parts too, then you're in for a treat, involving parsing HTML for meta tags.)
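    To give an idea of what even the simple case involves, here is a sketch using Python's standard `email` package. It is not how zarafa-dagent works; the function name `plain_parts_to_utf8` is made up. It only re-encodes non-attachment `text/plain` parts as UTF-8 and deliberately leaves `text/html` parts alone, since those would also need their meta tags rewritten:

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def plain_parts_to_utf8(raw: bytes) -> EmailMessage:
    """Re-encode every inline text/plain part of a message as UTF-8.

    Hypothetical helper for illustration; HTML parts are not touched.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():  # visits the message and all nested MIME parts
        if part.get_content_type() != "text/plain" or part.is_attachment():
            continue
        # get_content() decodes using the charset declared on this part.
        text = part.get_content()
        # set_content() replaces the body and rewrites the Content-Type and
        # Content-Transfer-Encoding headers of this part.
        part.set_content(text, subtype="plain", charset="utf-8")
    return msg
```

    Even this small version has to visit every MIME part and rewrite per-part headers, which is the complexity referred to above.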

    Also, if you fully convert messages to UTF-8, everything that's built for Windows-1252 will stop working even if the characters are supported in Windows-1252. I don't know how much code depends on this... do you? I can imagine things like search features breaking on this.
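    A tiny Python illustration of the kind of breakage meant here, assuming a search that compares raw octets (whether Zarafa's search actually works this way is not something I know):

```python
# A search term prepared by code that still assumes Windows-1252 ...
needle = "café".encode("cp1252")

# ... no longer matches once the stored message is fully UTF-8,
stored_utf8 = "Meet me at the café".encode("utf-8")
found_in_utf8 = needle in stored_utf8

# whereas it still matches a message stored as Windows-1252.
stored_cp1252 = "Meet me at the café".encode("cp1252")
found_in_cp1252 = needle in stored_cp1252
```

    Here `found_in_utf8` is `False` and `found_in_cp1252` is `True`, even though "é" exists in both encodings.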

    I'm not going to discuss this further. I'm not against your proposal, and I am not going to defend mine.
