functions-kses.php
Note: Zenphoto does not want HTML entities encoded, so this script has been modified to prevent that encoding.
It has also been modified so that CSS properties within style attributes are not mistaken for "bad protocols" and cleared.
Some reformatting has been done, and the function documentation has been changed to follow standard phpDoc blocks.
Search for "Zenphoto" to find the changes.
kses 0.2.2 - HTML/XHTML filter that only allows some elements and attributes
Copyright (C) 2002, 2003, 2005 Ulf Harnhammar
This program is free software and open source software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA or visit http://www.gnu.org/licenses/gpl.html
*** CONTACT INFORMATION ***
E-mail: metaur at users dot sourceforge dot net
Web page: http://sourceforge.net/projects/kses
Paper mail: Ulf Harnhammar, Ymergatan 17 C, 753 25 Uppsala, SWEDEN
[kses strips evil scripts!]
Table of Contents
Functions
- kses() : type
- This function makes sure that only the allowed HTML element names, attribute names and attribute values plus only sane HTML entities will occur in $string. You have to remove any slashes from PHP's magic quotes before you call this function.
- kses_hook() : mixed
- You add any kses hooks here.
- kses_version() : mixed
- This function returns kses' version number.
- kses_split() : mixed
- This function searches for HTML tags, no matter how malformed. It also matches stray ">" characters.
- kses_split2() : string
- This function does a lot of work. It rejects some very malformed things like <:::>. It returns an empty string, if the element isn't allowed (look ma, no strip_tags()!). Otherwise it splits the tag into an element and an attribute list.
- kses_attr() : type
- This function removes all attributes, if none are allowed for this element.
- kses_hair() : type
- This function does a lot of work. It parses an attribute list into an array with attribute data, and tries to do the right thing even if it gets weird input. It will add quotes around attribute values that don't have any quotes or apostrophes around them, to make it easier to produce HTML code that will conform to W3C's HTML specification. It will also remove bad URL protocols from attribute values.
- kses_check_attr_val() : bool
- This function performs different checks for attribute values. The currently implemented checks are "maxlen", "minlen", "maxval", "minval" and "valueless" with even more checks to come soon.
- kses_bad_protocol() : type
- This function removes all non-allowed protocols from the beginning of $string. It ignores whitespace and the case of the letters, and it does understand HTML entities. It does its work in a while loop, so it won't be fooled by a string like "javascript:javascript:alert(57)".
- kses_no_null() : type
- This function removes any NULL characters in $string.
- kses_stripslashes() : type
- This function changes the character sequence \" to just ". It leaves all other slashes alone. It's really weird, but the quoting from preg_replace(//e) seems to require this.
- kses_array_lc() : mixed
- This function goes through an array, and changes the keys to all lower case.
- kses_js_entities() : type
- This function removes the HTML JavaScript entities found in early versions of Netscape 4.
- kses_html_error() : type
- This function deals with parsing errors in kses_hair(). The general plan is to remove everything to and including some whitespace, but it deals with quotes and apostrophes as well.
- kses_bad_protocol_once() : type
- This function searches for URL protocols at the beginning of $string, while handling whitespace and HTML entities.
- kses_bad_protocol_once2() : string
- This function processes URL protocols, checks to see if they're in the whitelist or not, and returns different data depending on the answer.
- kses_normalize_entities() : type
- This function normalizes HTML entities. It will convert "AT&T" to the correct "AT&amp;T", "&#00058;" to "&#58;", "&#XYZZY;" to "&amp;#XYZZY;" and so on.
- kses_normalize_entities2() : type
- This function helps kses_normalize_entities() to only accept 16 bit values and nothing more for &#number; entities.
- kses_decode_entities() : type
- This function decodes numeric HTML entities (&#65; and &#x41;). It doesn't do anything with other entities like &auml;, but we don't need them in the URL protocol whitelisting system anyway.
Functions
kses()
This function makes sure that only the allowed HTML element names, attribute names and attribute values plus only sane HTML entities will occur in $string. You have to remove any slashes from PHP's magic quotes before you call this function.
kses(type $string, type $allowed_html[, type $allowed_protocols = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'gopher', 'mailto') ]) : type
Parameters
- $string : type
- $allowed_html : type
- $allowed_protocols : type = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'gopher', 'mailto')
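A minimal calling sketch, assuming the whitelist array shape used by upstream kses (element name => attribute name => 1, or an array of checks); that shape is not stated on this page. The helper sketch_prepare_input() and the sample markup are hypothetical, and kses() is only called if the library has actually been loaded.

```php
<?php
// Assumed whitelist shape: element => (attribute => 1 | array of checks).
$allowed_html = array(
    'b'  => array(),
    'a'  => array('href' => 1, 'title' => array('maxlen' => 80)),
    'br' => array(),
);
$allowed_protocols = array('http', 'https', 'mailto');

// Hypothetical helper: undo magic-quotes escaping before filtering.
function sketch_prepare_input($raw, $magic_quotes_on)
{
    return $magic_quotes_on ? stripslashes($raw) : $raw;
}

$input = sketch_prepare_input('<a href=\"http://example.com/\">x</a>', true);

// Only run the filter when the real library is available.
if (function_exists('kses')) {
    echo kses($input, $allowed_html, $allowed_protocols);
}
```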
Return values
type
kses_hook()
You add any kses hooks here.
kses_hook(mixed $string) : mixed
Parameters
- $string : mixed
kses_version()
This function returns kses' version number.
kses_version() : mixed
kses_split()
This function searches for HTML tags, no matter how malformed. It also matches stray ">" characters.
kses_split(mixed $string, mixed $allowed_html, mixed $allowed_protocols) : mixed
Parameters
- $string : mixed
- $allowed_html : mixed
- $allowed_protocols : mixed
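The tag search can be sketched as below. The pattern is modeled on the one in upstream kses and is an assumption as far as this page goes; sketch_split() and its callback parameter are hypothetical names, and the callback stands in for whatever processes each match.

```php
<?php
// Hypothetical stand-in for the tag scanner: every "<...>" tag, every
// unterminated "<..." that runs to the end of the input, and every stray
// ">" is handed to $handle_tag, whose return value replaces the match.
function sketch_split($string, callable $handle_tag)
{
    return preg_replace_callback(
        '%(<[^>]*(>|$)|>)%',
        function ($m) use ($handle_tag) {
            return $handle_tag($m[1]);
        },
        $string
    );
}

$seen = array();
$out = sketch_split('a > b <b>bold</b> <i', function ($tag) use (&$seen) {
    $seen[] = $tag;          // record what the scanner found
    return '[' . $tag . ']'; // mark it in the output
});
```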
kses_split2()
This function does a lot of work. It rejects some very malformed things like <:::>. It returns an empty string, if the element isn't allowed (look ma, no strip_tags()!). Otherwise it splits the tag into an element and an attribute list.
kses_split2(type $matches) : string
Parameters
- $matches : type
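A rough sketch of the splitting step, assuming a regular expression like the one in upstream kses. The function name sketch_split_tag() is hypothetical, and instead of passing the parts on to kses_attr() it returns them as an array so the split is visible.

```php
<?php
// Hypothetical stand-in: returns '&gt;' for a stray ">", '' for malformed or
// disallowed tags, the closing tag for allowed closing tags, and otherwise
// the element name and raw attribute list as an array.
function sketch_split_tag($tag, array $allowed_html)
{
    if (substr($tag, 0, 1) !== '<') {
        return '&gt;';
    }
    if (!preg_match('%^<\s*(/\s*)?([a-zA-Z0-9]+)([^>]*)>?$%', $tag, $m)) {
        return ''; // very malformed, e.g. "<:::>"
    }
    $slash = trim($m[1]);
    $element = $m[2];
    $attrlist = $m[3];
    if (!isset($allowed_html[strtolower($element)])) {
        return ''; // element is not on the whitelist
    }
    if ($slash !== '') {
        return "<$slash$element>"; // closing tags carry no attributes
    }
    return array('element' => $element, 'attrlist' => $attrlist);
}
```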
Return values
string
kses_attr()
This function removes all attributes, if none are allowed for this element.
kses_attr(type $element, type $attr, type $allowed_html, type $allowed_protocols) : type
If some are allowed it calls kses_hair() to split them further, and then it builds up new HTML code from the data that kses_hair() returns. It also removes "<" and ">" characters, if there are any left. One more thing it does is to check if the tag has a closing XHTML slash, and if it does, it puts one in the returned code as well.
Parameters
- $element : type
- $attr : type
- $allowed_html : type
- $allowed_protocols : type
Return values
type
kses_hair()
This function does a lot of work. It parses an attribute list into an array with attribute data, and tries to do the right thing even if it gets weird input. It will add quotes around attribute values that don't have any quotes or apostrophes around them, to make it easier to produce HTML code that will conform to W3C's HTML specification. It will also remove bad URL protocols from attribute values.
kses_hair(type $attr, type $allowed_protocols) : type
Parameters
- $attr : type
- $allowed_protocols : type
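The attribute parsing can be illustrated with a much simplified sketch. It is not the state machine kses uses: it does no protocol filtering, its error recovery is crude, and the result keys ('name', 'value', 'whole', 'vless') are an assumption modeled on upstream kses. The function name is hypothetical.

```php
<?php
// Hypothetical, simplified attribute parser. Unquoted values get double
// quotes added in 'whole'; valueless attributes are flagged with 'vless'.
function sketch_parse_attributes($attr)
{
    $name = '([a-zA-Z_:][-a-zA-Z0-9_:.]*)';
    $parsed = array();
    $attr = ltrim($attr);
    while ($attr !== '') {
        if (preg_match('/^' . $name . '\s*=\s*"([^"]*)"/', $attr, $m)) {
            $quote = '"';
            $vless = 'n';
        } elseif (preg_match('/^' . $name . '\s*=\s*\'([^\']*)\'/', $attr, $m)) {
            $quote = "'";
            $vless = 'n';
        } elseif (preg_match('/^' . $name . '\s*=\s*([^\s"\']+)/', $attr, $m)) {
            $quote = '"'; // add the quotes the author left out
            $vless = 'n';
        } elseif (preg_match('/^' . $name . '/', $attr, $m)) {
            $quote = '';
            $vless = 'y';
        } else {
            // Unparseable: drop everything up to and including the next whitespace.
            $attr = ltrim(preg_replace('/^\S+\s*/', '', $attr));
            continue;
        }
        $value = ($vless === 'y') ? '' : $m[2];
        $whole = ($vless === 'y') ? $m[1] : $m[1] . '=' . $quote . $value . $quote;
        if (!array_key_exists($m[1], $parsed)) { // first occurrence wins
            $parsed[$m[1]] = array(
                'name' => $m[1], 'value' => $value,
                'whole' => $whole, 'vless' => $vless,
            );
        }
        $attr = ltrim(substr($attr, strlen($m[0])));
    }
    return $parsed;
}
```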
Return values
type
kses_check_attr_val()
This function performs different checks for attribute values. The currently implemented checks are "maxlen", "minlen", "maxval", "minval" and "valueless" with even more checks to come soon.
kses_check_attr_val(type $value, type $vless, type $checkname, type $checkvalue) : bool
Parameters
- $value : type
- $vless : type
- $checkname : type
- $checkvalue : type
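The five checks might look roughly like this. The sketch assumes that $vless is 'y' or 'n', and that "maxval" and "minval" only accept short unsigned integers; both assumptions follow upstream kses and are not stated on this page. The function name is hypothetical.

```php
<?php
// Hypothetical stand-in for the attribute value checks.
// Returns true when $value passes the named check.
function sketch_check_attr_val($value, $vless, $checkname, $checkvalue)
{
    // Numeric checks only accept up to six digits with optional padding.
    $is_small_int = preg_match('/^\s{0,6}[0-9]{1,6}\s{0,6}$/', $value) === 1;

    switch (strtolower($checkname)) {
        case 'maxlen':
            return strlen($value) <= $checkvalue;
        case 'minlen':
            return strlen($value) >= $checkvalue;
        case 'maxval':
            return $is_small_int && (int) trim($value) <= $checkvalue;
        case 'minval':
            return $is_small_int && (int) trim($value) >= $checkvalue;
        case 'valueless':
            // 'y' requires a valueless attribute, 'n' forbids one.
            return strtolower($checkvalue) === $vless;
        default:
            return true; // unknown checks are ignored
    }
}
```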
Return values
bool
kses_bad_protocol()
This function removes all non-allowed protocols from the beginning of $string. It ignores whitespace and the case of the letters, and it does understand HTML entities. It does its work in a while loop, so it won't be fooled by a string like "javascript:javascript:alert(57)".
kses_bad_protocol(type $string, type $allowed_protocols) : type
Parameters
- $string : type
- $allowed_protocols : type
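The repeat-until-stable idea can be sketched as follows. This is a simplified illustration, not the kses implementation: it skips HTML entity handling, and both function names are hypothetical.

```php
<?php
// Hypothetical single pass: strip one leading "scheme:" if it is not allowed.
function sketch_strip_protocol_once($string, array $allowed_protocols)
{
    if (!preg_match('/^\s*([a-z][a-z0-9+.\-]*)\s*:/i', $string, $m)) {
        return $string; // nothing that looks like a protocol
    }
    if (in_array(strtolower($m[1]), $allowed_protocols, true)) {
        return $string; // whitelisted, leave untouched
    }
    return substr($string, strlen($m[0]));
}

// Repeat the single pass until the string stops changing, so stacked
// protocols such as "javascript:javascript:" are removed completely.
function sketch_strip_bad_protocols($string, array $allowed_protocols)
{
    do {
        $before = $string;
        $string = sketch_strip_protocol_once($string, $allowed_protocols);
    } while ($string !== $before);
    return $string;
}
```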
Return values
type
kses_no_null()
This function removes any NULL characters in $string.
kses_no_null(type $string) : type
Parameters
- $string : type
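A one-line sketch of the idea, using a hypothetical function name. It only removes literal NUL bytes; whether kses also handles escaped forms is not stated here.

```php
<?php
// Hypothetical stand-in: drop every NUL byte from the string.
function sketch_no_null($string)
{
    return str_replace("\0", '', $string);
}
```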
Return values
type
kses_stripslashes()
This function changes the character sequence \" to just ". It leaves all other slashes alone. It's really weird, but the quoting from preg_replace(//e) seems to require this.
kses_stripslashes(type $string) : type
Parameters
- $string : type
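The difference from PHP's stripslashes() is that only the backslash-quote pair is touched. A minimal sketch with a hypothetical function name:

```php
<?php
// Hypothetical stand-in: turn \" into " and leave every other backslash alone.
function sketch_stripslashes($string)
{
    // '\\"' in single quotes is the two characters backslash + double quote.
    return str_replace('\\"', '"', $string);
}
```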
Return values
type
kses_array_lc()
This function goes through an array, and changes the keys to all lower case.
kses_array_lc(mixed $inarray) : mixed
Parameters
- $inarray : mixed
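A sketch of the key lowering, assuming the input is a two-level whitelist (element names mapping to attribute arrays) and that both levels are lowered; the page only says "the keys". The function name is hypothetical.

```php
<?php
// Hypothetical stand-in: lower-case element names and, where the value is
// itself an array, the attribute names inside it.
function sketch_array_lc(array $inarray)
{
    $out = array();
    foreach ($inarray as $key => $inner) {
        $out[strtolower($key)] = is_array($inner)
            ? array_change_key_case($inner, CASE_LOWER)
            : $inner;
    }
    return $out;
}
```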
kses_js_entities()
This function removes the HTML JavaScript entities found in early versions of Netscape 4.
kses_js_entities(type $string) : type
Parameters
- $string : type
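Netscape 4 evaluated constructs of the form &{expression}; as JavaScript. A sketch of stripping them, with a pattern modeled on upstream kses (an assumption) and a hypothetical function name:

```php
<?php
// Hypothetical stand-in: remove "&{...};" entities, including unterminated
// ones that run to the end of the string.
function sketch_js_entities($string)
{
    return preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);
}
```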
Return values
type
kses_html_error()
This function deals with parsing errors in kses_hair(). The general plan is to remove everything to and including some whitespace, but it deals with quotes and apostrophes as well.
kses_html_error(type $string) : type
Parameters
- $string : type
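The recovery step can be sketched with a single pattern, modeled on upstream kses (an assumption). It skips quoted runs as a unit, so whitespace inside quotes does not end the skipped region. The function name is hypothetical.

```php
<?php
// Hypothetical stand-in: discard the unparseable prefix of an attribute
// list, up to and including the first whitespace outside quotes.
function sketch_html_error($string)
{
    return preg_replace('/^("[^"]*("|$)|\'[^\']*(\'|$)|\S)*\s*/', '', $string);
}
```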
Return values
type
kses_bad_protocol_once()
This function searches for URL protocols at the beginning of $string, while handling whitespace and HTML entities.
kses_bad_protocol_once(type $string, type $allowed_protocols) : type
Parameters
- $string : type
- $allowed_protocols : type
Return values
type
kses_bad_protocol_once2()
This function processes URL protocols, checks to see if they're in the whitelist or not, and returns different data depending on the answer.
kses_bad_protocol_once2(type $matches) : string
Parameters
- $matches : type
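One way to picture the whitelist check: normalize the candidate protocol, then look it up. The sketch takes the candidate as a plain string rather than a regex match array, and the return convention ("protocol:" when allowed, an empty string otherwise) is an assumption modeled on upstream kses. The function name is hypothetical.

```php
<?php
// Hypothetical stand-in: decode numeric entities, drop whitespace and NUL
// bytes, lower-case the result, then check it against the whitelist.
function sketch_check_protocol($candidate, array $allowed_protocols)
{
    $p = preg_replace_callback('/&#([0-9]+);/', function ($m) {
        return chr((int) $m[1]);
    }, $candidate);
    $p = preg_replace_callback('/&#[Xx]([0-9A-Fa-f]+);/', function ($m) {
        return chr((int) hexdec($m[1]));
    }, $p);
    $p = preg_replace('/\s/', '', $p);
    $p = strtolower(str_replace("\0", '', $p));

    $allowed = array_map('strtolower', $allowed_protocols);
    return in_array($p, $allowed, true) ? $p . ':' : '';
}
```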
Return values
string
kses_normalize_entities()
This function normalizes HTML entities. It will convert "AT&T" to the correct "AT&amp;T", "&#00058;" to "&#58;", "&#XYZZY;" to "&amp;#XYZZY;" and so on.
kses_normalize_entities(type $string) : type
Parameters
- $string : type
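The normalization described above can be sketched as "escape every ampersand, then re-enable the entities that look valid". The patterns are modeled on upstream kses and are an assumption; the sketch follows the documented behavior, not the Zenphoto-modified code, which is not shown on this page. The function name is hypothetical, and the 16-bit limit of kses_normalize_entities2() is folded into the callback.

```php
<?php
// Hypothetical stand-in for the entity normalizer.
function sketch_normalize_entities($string)
{
    // Escape all ampersands first.
    $string = str_replace('&', '&amp;', $string);
    // Re-enable well-formed named entities such as &eacute;.
    $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]{0,19});/', '&${1};', $string);
    // Re-enable decimal entities, dropping leading zeros; values above
    // 65535 (more than 16 bits) stay escaped.
    $string = preg_replace_callback('/&amp;#0*([0-9]{1,5});/', function ($m) {
        $n = (int) $m[1];
        return $n > 65535 ? "&amp;#$n;" : "&#$n;";
    }, $string);
    // Re-enable hexadecimal entities of one or two bytes.
    $string = preg_replace('/&amp;#([Xx])0*(([0-9A-Fa-f]{2}){1,2});/', '&#${1}${2};', $string);
    return $string;
}
```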
Return values
type
kses_normalize_entities2()
This function helps kses_normalize_entities() to only accept 16 bit values and nothing more for &#number; entities.
kses_normalize_entities2(type $matches) : type
Parameters
- $matches : type
Return values
type
kses_decode_entities()
This function decodes numeric HTML entities (&#65; and &#x41;). It doesn't do anything with other entities like &auml;, but we don't need them in the URL protocol whitelisting system anyway.
kses_decode_entities(type $string) : type
Parameters
- $string : type
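A small sketch of the decoding, using preg_replace_callback() in place of the deprecated /e modifier. The function name is hypothetical, and chr() means only single-byte code points are meaningful, which is enough for matching ASCII protocol names.

```php
<?php
// Hypothetical stand-in: decode decimal and hexadecimal numeric entities,
// leaving named entities such as &auml; untouched.
function sketch_decode_numeric_entities($string)
{
    $string = preg_replace_callback('/&#([0-9]+);/', function ($m) {
        return chr((int) $m[1]);
    }, $string);
    return preg_replace_callback('/&#[Xx]([0-9A-Fa-f]+);/', function ($m) {
        return chr((int) hexdec($m[1]));
    }, $string);
}
```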