Table of contents
- Understanding Encoding
- Encoding scenarios
- Step by step procedure to fix the encoding issues
Different languages require different characters. Classic single-byte character encodings were limited to 256 characters per character set to reduce the amount of disk space required to store text. The original character set, ASCII, was limited to 128 characters and was suitable for English.
The Unicode effort and the UTF-8 encoding were created to support all characters for all languages as efficiently as possible.
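As a rough illustration of that trade-off, the following sketch compares how many bytes a few sample characters need in ASCII, latin1 and UTF-8. The helper name `encoded_length` and the sample characters are chosen for this example only; Python's `latin-1` codec stands in for a typical single-byte encoding.

```python
def encoded_length(text, encoding):
    """Return the byte length of text in the given encoding,
    or None if the encoding cannot represent it."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

# One English letter, one accented Latin letter, one Arabic letter, one CJK character.
for ch in ["a", "é", "ش", "日"]:
    print(
        ch,
        encoded_length(ch, "ascii"),
        encoded_length(ch, "latin-1"),
        encoded_length(ch, "utf-8"),
    )
```

ASCII text costs one byte per character in all three encodings, while characters outside a single-byte character set can only be represented in UTF-8, at two to four bytes each.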
Tiki uses UTF-8 internally for all manipulations and has done so for a very long time.
- Extracting the data stored by Tiki from other applications, such as phpMyAdmin, led to strangely encoded characters.
- Moving the database to a different server could cause unexpected errors.
- Ordering in lists could appear to be random.
This configuration had a perverse effect. When Tiki would send UTF-8 data, MySQL would think it was latin1. Because the database requested UTF-8, it would perform a conversion, leading to doubly-encoded UTF-8.
In reality, the consequences were very similar to the latin case.
Apart from languages whose characters the database encoding cannot represent, all features would work correctly in Tiki.
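The double-encoding described above can be reproduced outside the database. The sketch below is an illustration only, not code from Tiki or MySQL: the function name `double_encode` is made up for this example, and Python's `latin-1` codec approximates MySQL's latin1, which differs for a few bytes in the 0x80-0x9F range.

```python
def double_encode(text):
    """Simulate UTF-8 bytes being misread as latin1 and converted to UTF-8 again."""
    utf8_bytes = text.encode("utf-8")       # what the application sends
    misread = utf8_bytes.decode("latin-1")  # what the server believes it received
    return misread.encode("utf-8")          # what ends up stored after conversion

original = "é"
stored = double_encode(original)
print(original.encode("utf-8"))  # b'\xc3\xa9'
print(stored)                    # b'\xc3\x83\xc2\xa9'
print(stored.decode("utf-8"))    # Ã©
```

Plain ASCII text passes through unchanged, which is why the problem often goes unnoticed on sites that use mostly English content.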
Tiki5, more options were added in Tiki5.1.
This section explains how the upgrade to 5.1 will affect your installation.
Correcting the encoding issue will require manual intervention. The Tiki installer comes with tools to assist in the conversion. However, you should have reliable backups before using them. Other techniques involving exporting data, modifying the dump and re-importing it can be used as well.
To avoid data loss, you should make sure your tables use UTF-8. This can be done from the install/upgrade page in the installer.
Removing the double-encoding can be done from Enter Your Tiki. It will require the client charset to be forced to UTF-8.
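To make the idea concrete, removing one round of double-encoding amounts to reversing the mistaken latin1-to-UTF-8 conversion. The following is a minimal sketch under that assumption, not the procedure the Tiki installer runs; `undo_double_encoding` is a hypothetical helper, and Python's `latin-1` codec is again used as an approximation of MySQL's latin1.

```python
def undo_double_encoding(text):
    """Reverse one round of latin1-to-UTF-8 double-encoding.

    Returns the text unchanged when it does not look double-encoded.
    """
    try:
        # Map each character back to the single byte it was misread from,
        # then decode those bytes as the UTF-8 they originally were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(undo_double_encoding("Ã©"))   # é
print(undo_double_encoding("é"))    # é (already correct, left as-is)
```

A heuristic like this cannot distinguish double-encoded text from text that legitimately contains a sequence such as "Ã©", which is one more reason to keep reliable backups before converting anything.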
Follow the steps described here:
New 5.0 installations correctly specified the connection encoding and will lead to either the UTF-8 case or the information loss case.
Upgraded 5.0 installations will lead to different results. However, the upgrade procedure to 5.1 does not impact the changes that were made at that time. Some administrator judgment is advised if encoding issues appear. The client character set can be altered in the configuration file. Contact the developer mailing list if support is required.
If you have no control over the database encoding and still need to use non-Latin characters (e.g., Arabic), it is best not to force the connection encoding to UTF-8, or to set it to the same encoding as the database itself. This leads to the latin case, which comes with certain drawbacks but still allows multiple languages to be used on a single site.
Then, in phpMyAdmin, go to the database your Tiki will use (Server: localhost -> Database: tiki -> Operations -> Collation) and change the collation from latin1 to utf8_general_ci before running the installer. (This generally needs to be changed on typical shared hosting.)
This is the equivalent of the following MySQL statement:
ALTER DATABASE DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
- applying revision 30522, which fixes it (and will be in 5.4)
If you can't do that, you can try to:
- add $api_tiki='adodb'; to your db/local.php,
- and/or avoid forcing utf8
Tiki19 will by default set the MySQL/MariaDB tables to the utf8mb4 encoding to be fully UTF-8 compatible (including emoji support).
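The reason utf8mb4 matters is that MySQL's older utf8 character set stores at most three bytes per character, while emoji and other characters outside the Basic Multilingual Plane need four bytes in UTF-8. A minimal sketch for checking whether a piece of text falls into that category follows; `needs_utf8mb4` is a hypothetical helper written for this example.

```python
def needs_utf8mb4(text):
    """Return True if any character in text needs four bytes in UTF-8."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)

print(needs_utf8mb4("é日"))   # False
print(needs_utf8mb4("😀"))    # True
```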
Getting out of MySQL Character Set Hell