
Plugin Tutorial: Escaping of Strings

Depending on the context in which they are used, strings sometimes need to be escaped (a.k.a. quoted) to prevent some characters from having a special meaning.

PHP Source Files

There are several ways to specify a literal string in PHP. They follow different rules, which are explained in detail in the PHP Language Reference; simply choose the most appropriate one. Be particularly careful, however, when writing configuration and language files: the strings in these files are usually double-quoted, so a few characters have to be escaped (most notably double quotes, backslashes and dollar signs).
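As a minimal sketch of what this means in practice, the following lines show double-quoted strings as they might appear in a language file. The plugin name myplugin and the keys are made up for illustration; only the escaping is the point.

```php
<?php
// Hypothetical language file entries; "myplugin" and all keys are assumptions.
// Double quotes inside a double-quoted string must be escaped.
$plugin_tx['myplugin']['save_hint'] = "Click \"Save\" to continue.";
// An unescaped $name would be interpolated as a PHP variable.
$plugin_tx['myplugin']['placeholder_hint'] = "Insert \$name as a placeholder.";
// An unescaped \t would be turned into a tab character.
$plugin_tx['myplugin']['path_hint'] = "Use a path such as C:\\temp.";
```

Without the backslashes, the first line would be a syntax error, the second would silently insert the value of an (undefined) variable, and the third would contain a tab.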

Magic Quotes

You can read about the details in the PHP Manual. With regard to CMSimple_XH you typically have to cater for incoming data ($_GET, $_POST, $_COOKIE), which may be “magically” quoted. The solution is simple: just process all input data which may contain quoted characters with stsl() once:

$my_parameter = stsl($_GET['my_parameter']);

stsl() recognizes the setting of magic_quotes_gpc and acts accordingly.

Furthermore, you have to cater for magic_quotes_runtime = On. This setting is rarely enabled, so it is probably sufficient to add a check that it is Off.
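Such a check could look roughly like the sketch below. The function name is hypothetical, and the sketch assumes that reading the setting via ini_get() is acceptable; ini_get() returns false for settings that do not exist, so the check also passes on PHP versions where magic quotes have been removed.

```php
<?php
// Hypothetical helper; the name is an assumption, not a CMSimple_XH function.
// Returns true if magic_quotes_runtime is off or not available at all.
function myplugin_magicQuotesRuntimeIsOff()
{
    // ini_get() yields false for unknown settings, and "" or "0" for Off.
    return !ini_get('magic_quotes_runtime');
}
```

A plugin could call this once during initialization and show an error message instead of processing data if it returns false.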

(X)HTML Output

When writing strings to the generated (X)HTML, you have to take care that characters which are special in HTML are not interpreted as markup unless they are intended as such. The primary goal is to generate valid HTML that “behaves” as intended. An additional benefit is that it will most likely render XSS attacks harmless.

If you're dealing with plain text (i.e. without markup), you'll have to escape it with htmlspecialchars() if it may contain “<”, “>” or “&”. Note that it is important to explicitly specify the $encoding parameter of htmlspecialchars(), as its default changed in PHP 5.4. Furthermore, you have to be aware that htmlspecialchars() returns an empty string if the given $string contains a byte sequence that is not valid for the given $encoding. To cater for that, you can pass ENT_IGNORE or ENT_SUBSTITUTE in the $flags parameter (unfortunately, these constants are only available since PHP 5.3 and 5.4, respectively).
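A minimal sketch of such a call, wrapped in a helper whose name is purely hypothetical, might look as follows. It assumes UTF-8 output and a PHP version that knows ENT_SUBSTITUTE (PHP 5.4 or later).

```php
<?php
// Hypothetical helper; the name is an assumption for illustration only.
// Escapes plain text for HTML output with an explicit encoding.
function myplugin_hsc($text)
{
    // ENT_SUBSTITUTE replaces invalid byte sequences with U+FFFD
    // instead of making htmlspecialchars() return an empty string.
    return htmlspecialchars($text, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');
}

echo myplugin_hsc('1 < 2 & 3 > 2'), "\n";
```

With the invalid input "a\xFFb", a plain htmlspecialchars($text, ENT_COMPAT, 'UTF-8') returns an empty string, whereas the helper keeps the valid characters and substitutes only the offending byte.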

If you want to write the value of an HTML element's attribute, you additionally have to escape the value's delimiter character (i.e. '"' or "'"). Escaping the double quote is already handled by htmlspecialchars()'s default flag (ENT_COMPAT), so it is advisable to always use the double quote as the value delimiter.
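The following sketch illustrates this for a text input. The function name is hypothetical; ENT_COMPAT is passed explicitly so that the behavior does not depend on the default of the PHP version in use.

```php
<?php
// Hypothetical helper; the name is an assumption for illustration only.
// Builds a text input whose attribute values are delimited by double quotes.
function myplugin_textInput($name, $value)
{
    // ENT_COMPAT escapes double quotes (but not single quotes),
    // which is sufficient for double-quoted attribute values.
    return '<input type="text" name="'
        . htmlspecialchars($name, ENT_COMPAT, 'UTF-8')
        . '" value="'
        . htmlspecialchars($value, ENT_COMPAT, 'UTF-8')
        . '">';
}

echo myplugin_textInput('title', 'The "best" plugin'), "\n";
```

If you delimited the value with single quotes instead, ENT_COMPAT would not be enough and ENT_QUOTES would be needed.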

If you want to write some HTML including markup, you must not pass it to htmlspecialchars(). However, you should take care that the HTML doesn't contain unintended special characters.

CMSimple_XH 1.6 is likely to introduce XH_hsc(), which should make it easier for you to escape strings for HTML output.

A note about htmlentities() and html_entity_decode(): htmlentities() is seldom necessary when working with UTF-8 encoded HTML; usually htmlspecialchars() is entirely sufficient. html_entity_decode() is sometimes necessary to unescape strings that are already HTML escaped (e.g. when searching through the content of a CMSimple page); note that htmlspecialchars_decode() is only available since PHP 5.1. However, be aware that html_entity_decode() doesn't work with multibyte encodings (such as UTF-8) before PHP 5.
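To illustrate the search use case, here is a rough sketch that strips the markup from a piece of HTML, decodes the entities and then looks for a plain-text search term. The function name and the overall approach are assumptions for illustration; this is not how CMSimple_XH's own search is implemented.

```php
<?php
// Hypothetical helper; name and approach are assumptions for illustration.
// Checks whether the text content of some HTML contains a search term.
function myplugin_htmlContains($html, $needle)
{
    // Remove the tags first, then turn entities back into characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    return strpos($text, $needle) !== false;
}

var_dump(myplugin_htmlContains('<p>Tom &amp; Jerry</p>', 'Tom & Jerry'));
```

Without the decoding step, a search for "Tom & Jerry" would not find the escaped "Tom &amp; Jerry" in the page content.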

When should strings be prepared for HTML output

If you're absolutely sure that you will never need the unescaped string, you can pass the input to htmlspecialchars() right away:

$my_var = htmlspecialchars($_GET['my_parameter'], ENT_COMPAT, 'UTF-8');

However, it's usually preferable to HTML escape as late as possible, i.e. immediately before or while you generate HTML output. This allows for a cleaner separation of the (business) model and the views.
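The separation could be sketched as follows: one function returns the raw value, and another escapes it only while building the HTML. Both function names are hypothetical, and the handling of magic quotes is left out to keep the sketch short.

```php
<?php
// Hypothetical functions; names are assumptions for illustration only.

// "Model": returns the raw, unescaped name from the request data.
function myplugin_requestedName(array $get)
{
    return isset($get['name']) ? (string) $get['name'] : 'guest';
}

// "View": escapes the value at the last moment, while generating HTML.
function myplugin_greetingView($name)
{
    return '<p>Hello, ' . htmlspecialchars($name, ENT_COMPAT, 'UTF-8') . '!</p>';
}

echo myplugin_greetingView(myplugin_requestedName(array('name' => 'Tom & <Jerry>'))), "\n";
```

Because the model keeps the raw string, the same value can still be used elsewhere, for instance in a file name or a mail subject, without having to undo the HTML escaping first.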

In some cases, for instance when you're processing code written in an HTML editor, you already get escaped HTML. Usually you can leave it as is and write it directly to the HTML output.


JavaScript

When you write JavaScript code to the HTML output, you have to take care with some special characters. If you want to prepare for real XHTML processing (which currently does not happen, even if you generate XHTML), you have to declare the script as character data:

<script type="text/javascript">/* <![CDATA[ */alert('Hello, world!');/* ]]> */</script>

Otherwise an XHTML parser will choke on all special XML characters in the script.

For normal HTML parsing it is sufficient to treat the character sequence "</" appropriately. If you don't, the HTML parser will assume that the script ends there 1) (unless it's using sophisticated error correction). You can get around this problem by escaping the slash with a backslash:

<script type="text/javascript">document.write("<p>Hello, world!<\/p>");</script>

If you want to output some data for JavaScript, you're probably best off passing an appropriately prepared PHP array to json_encode().
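A minimal sketch of this technique is shown below. The function name and the JavaScript variable MYPLUGIN are hypothetical. Note that json_encode() escapes slashes as "\/" by default, so a "</script>" inside the data cannot end the script element prematurely.

```php
<?php
// Hypothetical helper; the function and variable names are assumptions.
// Emits a script element that makes PHP data available to JavaScript.
function myplugin_configScript(array $config)
{
    // By default json_encode() writes "/" as "\/", which also
    // neutralizes any "</" sequence in the data.
    $json = json_encode($config);
    return '<script type="text/javascript">var MYPLUGIN = ' . $json . ';</script>';
}

echo myplugin_configScript(array('message' => 'Done </script>', 'count' => 3)), "\n";
```

For real XHTML processing, the generated script would additionally need the CDATA wrapping described above.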

Regular Expressions

When working with regular expressions 2), you may need to quote them as well. If you have a fixed regular expression, you can usually do the quoting manually. If you build the regular expression dynamically from arbitrary input, you'll want to use preg_quote().
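As an illustration, the sketch below counts the literal occurrences of an arbitrary search term. The function name is hypothetical. The second argument of preg_quote() names the pattern delimiter, so that it is escaped as well.

```php
<?php
// Hypothetical helper; the name is an assumption for illustration only.
// Counts how often $needle occurs literally in $haystack.
function myplugin_countOccurrences($needle, $haystack)
{
    // Passing '/' makes preg_quote() escape the delimiter too.
    $pattern = '/' . preg_quote($needle, '/') . '/u';
    return preg_match_all($pattern, $haystack, $matches);
}

echo myplugin_countOccurrences('a.b', 'a.b axb a.b'), "\n";
```

Without preg_quote(), the dot in "a.b" would match any character, so "axb" would be counted as well.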

Note that you have to be very careful with the $replacement parameter of preg_replace(). Some characters (the backslash and the dollar sign) have a special meaning there, so you have to escape them properly. preg_quote() is not meant to be used for this parameter.
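One commonly used approach, sketched here under the assumption that the replacement should be inserted completely literally, is to prefix every backslash and dollar sign with a backslash. The function name is hypothetical; alternatively, preg_replace_callback() avoids the problem altogether, because the return value of the callback is not scanned for references.

```php
<?php
// Hypothetical helper; the name is an assumption for illustration only.
// Escapes a string so that preg_replace() inserts it literally.
function myplugin_quoteReplacement($replacement)
{
    // Prefix each backslash and each dollar sign with a backslash.
    return addcslashes($replacement, '\\$');
}

$price = 'Costs $1 \0';
echo preg_replace('/x/', myplugin_quoteReplacement($price), 'axb'), "\n";
```

Without the escaping, "$1" and "\0" in the replacement would be interpreted as references to captured groups instead of being inserted as text.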


SQL

Escaping for SQL is usually not an issue with CMSimple_XH, but a short note for the sake of completeness: you're probably best off using prepared statements when you have to construct a query dynamically. Then the engine will handle the escaping for you.
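A minimal sketch with PDO might look like this. The function name, the table items and its columns are made up for illustration, and the usage example assumes that the pdo_sqlite extension is available.

```php
<?php
// Hypothetical helper; function, table and column names are assumptions.
// Looks up rows by name using a prepared statement.
function myplugin_findByName(PDO $pdo, $name)
{
    // The placeholder keeps the value out of the SQL text entirely.
    $stmt = $pdo->prepare('SELECT id, name FROM items WHERE name = ?');
    $stmt->execute(array($name));
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Usage with an in-memory SQLite database (requires pdo_sqlite).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->prepare('INSERT INTO items (name) VALUES (?)')->execute(array("O'Reilly"));
print_r(myplugin_findByName($pdo, "O'Reilly"));
```

The apostrophe in the name needs no manual quoting, because the value is never concatenated into the query string.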


1) To be more precise, the "</" is read as the beginning of a closing tag.
2) I.e. PCRE; POSIX regular expressions have been deprecated for a long time.
Except where otherwise noted, content on this wiki is licensed under the following license: GNU Free Documentation License 1.3