Fixing Garbled Text

Under certain conditions, characters may appear corrupted in the browser (for example, "I � Unicode" appears instead of "I ♥ Unicode"), or text may appear to be missing.

  • If characters appear corrupted in the browser, the character set sent in the headers does not match the data. Unless the data itself is corrupted, this can only happen when the data is encoded in a charset other than the one it claims to use.
  • If text seems to be missing, it is probably not actually missing. Instead, it probably consists of one or more whitespace/non-printable characters. This can happen when a third-party legacy file's 'charset' key does not match the text's actual character set. The text is then re-encoded from what it claims to be into what it needs to be, and the resulting corrupt data consists entirely of whitespace/non-printable characters. Note: There is no 100% reliable way to verify that a specified charset is correct, which is why this page exists.
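
The first case is easy to reproduce. The following is a minimal Python sketch, not part of Cpanel::Locale; the sample string and the choice of ISO-8859-1 as the "actual" charset are assumptions made purely for illustration.

```python
# Text that is really encoded as ISO-8859-1 (Latin-1)...
original = "caf\u00e9"                  # "cafe" with an acute accent on the e
data = original.encode("iso-8859-1")    # b'caf\xe9'

# ...but is labeled, and therefore decoded, as UTF-8.
# errors="replace" substitutes U+FFFD (the replacement character)
# for each byte sequence that is invalid in UTF-8.
shown = data.decode("utf-8", errors="replace")

# ascii() prints the escape form so the result is visible on any terminal.
print(ascii(shown))  # 'caf\ufffd'
```

The lone byte 0xE9 is a complete character in ISO-8859-1 but only the start of a multi-byte sequence in UTF-8, so it is displayed as the replacement character.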

Read on to learn how to resolve both of these issues.

Important: This is not meant to be a comprehensive guide to charset conversion. It is meant as a basic guide to fixing encoding issues with files involved in Cpanel::Locale. Typically, this will apply to third-party legacy lang files, but it could apply equally to any files that were misencoded in some manner.

To resolve the problem, change the file's character set to utf-8:

  1. Open the file in an editor that allows you to save it as utf-8 (such as TextMate).*
  2. Delete the 'charset' key. (This will keep the compiler from trying to convert it to utf-8. The key is ignored and unused otherwise anyway.)
  3. Save the file as utf-8. Assuming it is not irreparably corrupt, it will be converted, and the editor will most likely add a byte-order mark (BOM).
  4. Recompile the given CDB file and check the output: it will say "File updated '...cdb'" if the file was actually updated. Otherwise, it will say "CDB file '...cdb' is already current".

The data must actually be encoded in the charset it claims to use. The easiest way to ensure this is to make it utf-8.
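
As a rough illustration of checking data against its claimed charset, here is a small Python sketch. The helper name `decodes_as` and the sample strings are hypothetical. As noted earlier, a check like this can only rule a charset out; it cannot prove that the charset is the right one.

```python
def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if data is a valid byte sequence in the given charset."""
    try:
        data.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are invalid in this charset.
        # LookupError: Python does not recognize the charset name.
        return False
    return True


utf8_bytes = "I \u2665 Unicode".encode("utf-8")
latin1_bytes = "caf\u00e9".encode("iso-8859-1")

print(decodes_as(utf8_bytes, "utf-8"))        # True
print(decodes_as(latin1_bytes, "utf-8"))      # False
print(decodes_as(utf8_bytes, "iso-8859-1"))   # True, although it is the wrong charset
```

The last call succeeds because every byte value is valid in ISO-8859-1, which is why a successful decode alone does not confirm the declared charset.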

* You may also wish to run the command man iconv to learn more about the iconv tool. However, iconv only helps if you know the charset in which the original data is actually encoded.
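
If the original charset is known, the conversion itself is short. The sketch below does in Python roughly what iconv does with its -f and -t options. The function name `reencode_to_utf8`, the file names, and the ISO-8859-1 source charset are all assumptions for illustration. It does not touch the 'charset' key or recompile anything.

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_charset: str) -> None:
    """Read src as source_charset and write the same text to dst as utf-8."""
    # Strict decoding: raises UnicodeDecodeError if the bytes are not
    # valid in source_charset, rather than silently writing bad data.
    text = src.read_bytes().decode(source_charset)
    dst.write_bytes(text.encode("utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / "legacy.txt"          # hypothetical input file
    converted = Path(tmp) / "converted.txt"    # hypothetical output file
    legacy.write_bytes("caf\u00e9".encode("iso-8859-1"))

    reencode_to_utf8(legacy, converted, "iso-8859-1")
    result = converted.read_bytes()

print(result)  # b'caf\xc3\xa9'
```

Unlike most editors, this sketch does not add a BOM to the output.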

Third-Party Theme Files

We strongly encourage you to use our locale system instead of hard-coding text when creating third-party theme files. Using our locale system prevents garbled text and allows you to translate the entire interface. If you choose to use hard-coded text, the text will appear garbled if it is not properly encoded.

If it is absolutely necessary to use hard-coded text, the file must be encoded as utf-8.

  1. Open your theme file in an editor that allows you to save it as utf-8 (such as TextMate).*
  2. Save the file as utf-8. Assuming it is not irreparably corrupt, it will be converted, and the editor will most likely add a byte-order mark (BOM).

* You may also wish to run the command man iconv to learn more about the iconv tool. However, iconv only helps if you know the charset in which the original data is actually encoded.
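
After saving, it can be useful to confirm what the editor actually wrote. The following Python sketch reports whether a file's bytes are valid utf-8 and whether they begin with a BOM. The helper name `describe_utf8` and the sample byte strings are hypothetical. In practice, the bytes would come from reading the saved theme file in binary mode.

```python
import codecs


def describe_utf8(data: bytes) -> str:
    """Classify bytes as invalid utf-8, utf-8 with a BOM, or utf-8 without one."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid utf-8"
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return "valid utf-8 with BOM"
    return "valid utf-8 without BOM"


print(describe_utf8(codecs.BOM_UTF8 + b"hello"))  # valid utf-8 with BOM
print(describe_utf8(b"hello"))                    # valid utf-8 without BOM
print(describe_utf8(b"caf\xe9"))                  # not valid utf-8
```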

Topic revision: r6 - 26 May 2011 - 19:25:39 - Main.MelanieSeibert