The functions below convert HTML numeric character references (NCRs), such as &#234; or &#xEA;, to UTF-8:
Method 01
<?php
// Returns 1 if $string contains at least one valid multi-byte UTF-8 sequence.
function detectUTF8($string)
{
    return preg_match('%(?:
        [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        |\xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
        |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
        |\xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
        |\xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
        |[\xF1-\xF3][\x80-\xBF]{3}         # planes 4-15
        |\xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
        )+%xs', $string);
}

// Decodes entities only when the string does not already contain UTF-8 characters.
function encoding($string)
{
    if (detectUTF8($string)) {
        return $string;
    }
    return html_entity_decode($string, ENT_QUOTES, "UTF-8");
}
?>
Example:
<?php .... echo encoding("La Fen&#234;tre de Soleil"); ?>
Result:
La Fenêtre de Soleil
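One thing to keep in mind with Method 01: encoding() returns the input untouched as soon as detectUTF8() finds a single multi-byte character, so a string that mixes real UTF-8 text with references keeps its references. The sketch below shows one possible variation that always decodes. The name decode_ncr() is made up for this illustration, and the snippet assumes the source file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of the original post): always decode references,
// whether or not the string already contains UTF-8 multi-byte characters.
function decode_ncr(string $string): string
{
    // ENT_QUOTES also decodes quote entities; ENT_HTML5 selects the HTML5 entity table.
    return html_entity_decode($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_ncr("La Fen&#234;tre de Soleil"), "\n"; // decimal reference
echo decode_ncr("La Fen&#xEA;tre de Soleil"), "\n"; // hexadecimal reference
echo decode_ncr("Café &#234;tre"), "\n";            // input that already contains UTF-8
```

Note that html_entity_decode() also converts named entities such as &amp;amp;, which may or may not be what you want for a given input.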
Alternative methods:
Method 02
function ncr_utf8_2($string)
{
    $_utf8 = create_function('$data', '
        if ($data < 128) return chr($data);
        if ($data < 2048) return chr(($data >> 6) + 192) . chr(($data & 63) + 128);
        if ($data < 65536) return chr(($data >> 12) + 224) . chr((($data >> 6) & 63) + 128) . chr(($data & 63) + 128);
        if ($data < 2097152) return chr(($data >> 18) + 240) . chr((($data >> 12) & 63) + 128) . chr((($data >> 6) & 63) + 128) . chr(($data & 63) + 128);
        return "";
    ');
    $string = preg_replace('/&#x([0-9a-f]+);/ei', '$_utf8(hexdec("\\1"))', $string);
    $string = preg_replace('/&#([0-9]+);/e', '$_utf8(\\1)', $string);
    if (!isset($tbl)) {
        $tbl = array();
        foreach (get_html_translation_table(HTML_ENTITIES) as $val => $key) {
            $tbl[$key] = utf8_encode($val);
        }
    }
    return strtr($string, $tbl);
}
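Update 08/23/2018:
If you see this error:
Deprecated: preg_replace(): The /e modifier is deprecated, use preg_replace_callback instead in ...php on line 8
use the version below instead. The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the function above no longer works on current PHP versions: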
function ncr_utf8_2($string)
{
    // Encode a code point as UTF-8. A closure is used here because
    // create_function() was deprecated in PHP 7.2 and removed in PHP 8.0.
    $_utf8 = function ($data) {
        if ($data < 128) return chr($data);
        if ($data < 2048) return chr(($data >> 6) + 192) . chr(($data & 63) + 128);
        if ($data < 65536) return chr(($data >> 12) + 224) . chr((($data >> 6) & 63) + 128) . chr(($data & 63) + 128);
        if ($data < 2097152) return chr(($data >> 18) + 240) . chr((($data >> 12) & 63) + 128) . chr((($data >> 6) & 63) + 128) . chr(($data & 63) + 128);
        return "";
    };

    // Hexadecimal references, e.g. &#xEA;
    $string = preg_replace_callback('/&#x([0-9a-f]+);/i', function ($ms) use ($_utf8) {
        return $_utf8(hexdec($ms[1]));
    }, $string);

    // Decimal references, e.g. &#234; (the callback must return the replacement)
    $string = preg_replace_callback('/&#([0-9]+);/', function ($ms) use ($_utf8) {
        return $_utf8($ms[1]);
    }, $string);

    // Named entities, e.g. &eacute;
    $tbl = array();
    foreach (get_html_translation_table(HTML_ENTITIES) as $val => $key) {
        $tbl[$key] = $val;
    }
    return strtr($string, $tbl);
}
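The two passes over the string in Method 02 could also be merged into one. The following is only a sketch of that idea, not the original author's code: codepoint_to_utf8() and ncr_to_utf8() are hypothetical names, the digit limits in the pattern are an assumption to keep values inside the integer range, and surrogates and values above U+10FFFF are dropped rather than encoded.

```php
<?php
// Hypothetical helper: encode one Unicode code point as a UTF-8 byte string.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp >= 0xD800 && $cp <= 0xDFFF) {
        return ''; // surrogates are not valid in UTF-8
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp <= 0x10FFFF) {
        return chr(0xF0 | ($cp >> 18))
            . chr(0x80 | (($cp >> 12) & 0x3F))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return ''; // beyond the Unicode range
}

// Hypothetical helper: replace decimal and hexadecimal references in one pass.
function ncr_to_utf8(string $string): string
{
    $result = preg_replace_callback(
        '/&#(?:x([0-9a-f]{1,6})|([0-9]{1,7}));/i',
        function (array $m): string {
            // Group 1 holds hex digits; it is empty when the decimal branch matched.
            $cp = ($m[1] !== '') ? (int) hexdec($m[1]) : (int) $m[2];
            return codepoint_to_utf8($cp);
        },
        $string
    );
    return $result ?? $string; // fall back to the input if the regex engine fails
}

echo ncr_to_utf8("Peut &#234;tre, 5 &#x20AC;"), "\n";
```

Unlike Method 02, this sketch does not touch named entities; a strtr() pass with get_html_translation_table() could be added afterwards if those are needed.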
Method 03
function ncr_utf8_3($string)
{
    $_utf8 = create_function('$data', '
        if ($data > 127) {
            $i = 5;
            while (($i--) > 0) {
                if ($data != ($a = $data % ($p = pow(64, $i)))) {
                    $ret = chr(base_convert(str_pad(str_repeat(1, $i + 1), 8, "0"), 2, 10) + (($data - $a) / $p));
                    for ($i; $i > 0; $i--)
                        $ret .= chr(128 + ((($data % pow(64, $i)) - ($data % ($p = pow(64, $i - 1)))) / $p));
                    break;
                }
            }
        } else
            $ret = "&#$data;";
        return $ret;
    ');
    return preg_replace("/\\&\\#([0-9]{3,10})\\;/e", '$_utf8("\\1")', $string);
}
Example:
echo ncr_utf8_3("Peut &#234;tre");
Result:
Peut être
Update 08/23/2018:
As with Method 02, if you see this error:
Deprecated: preg_replace(): The /e modifier is deprecated, use preg_replace_callback instead in ...php on line 8
use the version below instead:
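function ncr_utf8_3($string)
{
    // A closure is used here because create_function() was deprecated in PHP 7.2
    // and removed in PHP 8.0.
    // Note: the byte count is derived from powers of 64, so code points in the
    // ranges 2048-4095 and 65536-262143 get a sequence that is one byte too short.
    $_utf8 = function ($data) {
        if ($data > 127) {
            $i = 5;
            while (($i--) > 0) {
                if ($data != ($a = $data % ($p = pow(64, $i)))) {
                    // Lead byte: ($i + 1) one-bits followed by the top bits of the code point
                    $ret = chr(base_convert(str_pad(str_repeat(1, $i + 1), 8, "0"), 2, 10) + (($data - $a) / $p));
                    // Continuation bytes: 6 bits each, prefixed with binary 10
                    for ($i; $i > 0; $i--) {
                        $ret .= chr(128 + ((($data % pow(64, $i)) - ($data % ($p = pow(64, $i - 1)))) / $p));
                    }
                    break;
                }
            }
        } else {
            // ASCII references are left as they are
            $ret = "&#$data;";
        }
        return $ret;
    };

    // The callback must return the replacement string
    return preg_replace_callback("/\\&\\#([0-9]{3,10})\\;/", function ($ms) use ($_utf8) {
        return $_utf8($ms[1]);
    }, $string);
}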
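Method 03 first works out how many bytes the character needs and then emits the bytes in a loop. A sketch of the same loop-based idea, but choosing the byte count from the UTF-8 range boundaries (0x80, 0x800, 0x10000) instead of powers of 64, might look like this. The function name utf8_from_codepoint_loop() is hypothetical and the code is an illustration, not a drop-in replacement.

```php
<?php
// Hypothetical helper: pick the sequence length first, then build the bytes in a loop.
function utf8_from_codepoint_loop(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp); // ASCII: a single byte
    }
    if (($cp >= 0xD800 && $cp <= 0xDFFF) || $cp > 0x10FFFF) {
        return ''; // surrogates and out-of-range values are dropped
    }

    // Number of continuation bytes, chosen by the UTF-8 range boundaries.
    if ($cp < 0x800) {
        $extra = 1;
    } elseif ($cp < 0x10000) {
        $extra = 2;
    } else {
        $extra = 3;
    }

    // Lead byte marker for 2-, 3- and 4-byte sequences.
    $markers = [1 => 0xC0, 2 => 0xE0, 3 => 0xF0];
    $out = chr($markers[$extra] | ($cp >> (6 * $extra)));

    // Each continuation byte carries 6 bits, most significant group first.
    for ($i = $extra - 1; $i >= 0; $i--) {
        $out .= chr(0x80 | (($cp >> (6 * $i)) & 0x3F));
    }
    return $out;
}

// Boundary values are the cases most worth checking by hand.
foreach ([234, 2047, 2048, 65535, 65536] as $cp) {
    echo $cp, ' => ', bin2hex(utf8_from_codepoint_loop($cp)), "\n";
}
```

Printing the bytes with bin2hex() makes it easy to compare the result against a reference such as html_entity_decode() for the same code point.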
Next: PHP: Convert Numeric Character Reference to UTF8 – Part 2