Disabling WordPress character replacements inside code tags

Update 2009-02-12: This bug appears to have been resolved with WordPress 2.7.1.

While reviewing a recent post, I noticed that WordPress was “helpfully” replacing two hyphens with an em dash, three dots with an ellipsis, straight apostrophes with curly ones, and so on. It does this even between <code> tags, from the second line onward. I appreciate this when I’m writing in a journalistic style, but I’m also a programmer and engineer, so it hinders as often as it helps.

In WordPress 2.6.2, the problem comes from a bug in ./wp-includes/formatting.php. The auto-formatter tries to skip these replacements between code tags, but the flag it sets on seeing <code> protects only the text immediately after that tag, and any following tag is treated as the end of the code section. That means a multi-line code section (which has <br> tags breaking each line) will have replacements run on every line after the first.

This bug is found between lines 30 and 43:

if (isset($curl{0}) && '<' != $curl{0} && '[' != $curl{0} && $next && !$has_pre_parent) { // If it's not a tag
   // static strings
   $curl = str_replace($static_characters, $static_replacements, $curl);
   // regular expressions
   $curl = preg_replace($dynamic_characters, $dynamic_replacements, $curl);
} elseif (strpos($curl, '<code') !== false || strpos($curl, '<kbd') !== false || strpos($curl, '<style') !== false || strpos($curl, '<script') !== false) {
   $next = false;
} elseif (strpos($curl, '<pre') !== false) {
   $has_pre_parent = true;
} elseif (strpos($curl, '</pre>') !== false) {
   $has_pre_parent = false;
} else {
   $next = true;
}
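
To watch the flags misbehave without a WordPress install, here is a minimal sketch of the loop above. It is a simplified model, not the real function: the name texturize_original_sketch is made up, the tokenizer is an assumed stand-in for the one WordPress uses, only the "--" replacement is modelled, and $curl[0] replaces $curl{0} so it runs on current PHP.

```php
<?php
// Simplified model of the loop above; NOT the real WordPress function.
// Hypothetical name; only one replacement is modelled ("--" to the em dash entity).
function texturize_original_sketch(string $text): string
{
    // Assumed tokenizer: split into tag and text tokens, keeping the tags.
    $tokens = preg_split('/(<.*>)/Us', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $next = true;
    $has_pre_parent = false;
    $output = '';

    foreach ($tokens as $curl) {
        if (isset($curl[0]) && '<' != $curl[0] && $next && !$has_pre_parent) {
            $curl = str_replace('--', '&#8212;', $curl); // the "static" replacement
        } elseif (strpos($curl, '<code') !== false) {
            $next = false; // protects only the token that follows
        } elseif (strpos($curl, '<pre') !== false) {
            $has_pre_parent = true;
        } elseif (strpos($curl, '</pre>') !== false) {
            $has_pre_parent = false;
        } else {
            $next = true; // reached by the first line of code text and by <br />
        }
        $output .= $curl;
    }
    return $output;
}

// The first line survives, the second line does not.
echo texturize_original_sketch('<code>a--b<br />c--d</code>'), "\n";
// prints: <code>a--b<br />c&#8212;d</code>
```

The first text token after <code> is skipped because $next is false, but it then falls through to the final else and sets $next back to true, so everything after the <br /> is fair game again.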

One solution is to make code tags behave like pre tags. Change lines 35-40 to the following:

} elseif (strpos($curl, '<kbd') !== false || strpos($curl, '<style') !== false || strpos($curl, '<script') !== false) {
   $next = false;
} elseif (strpos($curl, '<pre') !== false || strpos($curl, '<code') !== false) {
   $has_pre_parent = true;
} elseif (strpos($curl, '</pre>') !== false || strpos($curl, '</code>') !== false) {
   $has_pre_parent = false;

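Now code tags will be treated as pre tags: absolutely no replacements except for those necessary for HTML encoding. This does introduce a bug when code and pre tags are nested, because the first closing tag of either kind clears $has_pre_parent and replacements resume before the outer block ends. Don’t put them inside each other.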
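
The patched branches can be tried out in isolation with a small sketch. As before, this is a simplified model under stated assumptions rather than the real WordPress code: texturize_patched_sketch is a made-up name, the tokenizer is an assumed stand-in, only the "--" replacement is modelled, and the kbd/style/script branch is left out for brevity.

```php
<?php
// Simplified model of the patched loop; NOT the real WordPress function.
// Hypothetical name; only one replacement is modelled ("--" to the em dash entity).
function texturize_patched_sketch(string $text): string
{
    // Assumed tokenizer: split into tag and text tokens, keeping the tags.
    $tokens = preg_split('/(<.*>)/Us', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $next = true;
    $has_pre_parent = false;
    $output = '';

    foreach ($tokens as $curl) {
        if (isset($curl[0]) && '<' != $curl[0] && $next && !$has_pre_parent) {
            $curl = str_replace('--', '&#8212;', $curl);
        } elseif (strpos($curl, '<pre') !== false || strpos($curl, '<code') !== false) {
            $has_pre_parent = true;  // stays set until a closing tag is seen
        } elseif (strpos($curl, '</pre>') !== false || strpos($curl, '</code>') !== false) {
            $has_pre_parent = false; // either closing tag clears the flag
        } else {
            $next = true;
        }
        $output .= $curl;
    }
    return $output;
}

// Multi-line code is left alone; text after </code> is still texturized.
echo texturize_patched_sketch('<code>a--b<br />c--d</code> x--y'), "\n";
// prints: <code>a--b<br />c--d</code> x&#8212;y

// The nesting bug: </code> clears the flag while still inside <pre>.
echo texturize_patched_sketch('<pre><code>a--b</code>c--d</pre>'), "\n";
// prints: <pre><code>a--b</code>c&#8212;d</pre>
```

A depth counter instead of a boolean would handle nesting, but that goes beyond the small edit described here.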

If you want to disable replacements altogether, just comment out lines 32 and 34 (the str_replace and preg_replace calls).

Aside: writing this article so all the tags would display correctly was an incredible pain.
