Snippet Name: Convert HTML to plain text
Description: To extract the text content of an HTML document (that is, strip out all the HTML and JavaScript), try the following code:
Also see: » BR2NL Function - Opposite of NL2BR
» Convert Seconds to Hours:Minutes:Secon...
» Convert UK Dates To mySQL Format Dates
» Convert miles to feet, feet to miles, ...
» Roman2dec and Dec2Roman
» Convert Minutes to Hours
» Binary to Text / Text to Binary
» Decimal to octal conversion
» Convert minutes to hours #2
» Convert minutes to hours #1
» Convert BBCode Tags
Comment: (none)
Language: PHP
Highlight Mode: PHP
Last Modified: March 16th, 2009
<?PHP
// $document should contain an HTML document.
// This will remove HTML tags, javascript sections
// and white space. It will also convert some
// common HTML entities to their text equivalent.
$search = ARRAY ("'<script[^>]*?>.*?</script>'si", // Strip out javascript
"'<[/!]*?[^<>]*?>'si", // Strip out HTML tags
"'([\r\n])[\s]+'", // Strip out white space
"'&(quot|#34);'i", // Replace HTML entities
"'&(amp|#38);'i",
"'&(lt|#60);'i",
"'&(gt|#62);'i",
"'&(nbsp|#160);'i",
"'&(iexcl|#161);'i",
"'&(cent|#162);'i",
"'&(pound|#163);'i",
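"'&(copy|#169);'i"); // numeric entities (&#NNN;) are decoded below
$replace = ARRAY ("",
"",
"\\1",
"\"",
"&",
"<",
">",
" ",
CHR(161),
CHR(162),
CHR(163),
CHR(169));
$text = PREG_REPLACE($search, $replace, $document);
// Decode any remaining numeric entities. The /e modifier was removed
// in PHP 7, so a callback is used instead. chr() returns a single byte,
// so the result is ISO-8859-1 text, as with the CHR() calls above.
$text = preg_replace_callback("'&#(\d+);'", function ($m) {
    return chr((int) $m[1]);
}, $text);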
?>
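As an alternative to the hand-written entity table above, here is a minimal sketch that relies on PHP's built-in strip_tags() and html_entity_decode(). The function name html_to_text() is hypothetical and not part of the original snippet, and the sketch assumes the output should be UTF-8 rather than single-byte text. It also drops style blocks and collapses all whitespace runs, which the original does not do.

```php
<?php
// Hypothetical helper (name is an assumption, not from the original snippet).
function html_to_text($html)
{
    // Drop <script> and <style> blocks together with their contents.
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Remove the remaining tags.
    $text = strip_tags($text);
    // Decode named and numeric entities, assuming UTF-8 output.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into a single space.
    $text = preg_replace('/\s+/', ' ', $text);
    return trim($text);
}

echo html_to_text('<p>Fish &amp; chips<script>alert(1)</script></p>');
// prints: Fish & chips
```

Entities are decoded only after the tags are stripped, so a decoded "<" or ">" cannot be mistaken for markup.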