A simple one-liner approach for a case insensitive equivalent to strtr():
<?php
function stritr(string $str, array $replace_pairs): string {
return str_ireplace(array_keys($replace_pairs), array_values($replace_pairs), $str);
}
?>
(PHP 4, PHP 5, PHP 7)
strtr — 转换指定字符
$str
, string $from
, string $to
) : string$str
, array $replace_pairs
) : string
该函数返回 str
的一个副本,并将在 from
中指定的字符转换为 to
中相应的字符。
比如, $from[$n]中每次的出现都会被替换为
$to[$n],其中 $n 是两个参数都有效的位移(offset)。
如果 from
与 to
长度不相等,那么多余的字符部分将被忽略。 str
的长度将会和返回的值一样。
If given two arguments, the second should be an array in the form array('from' => 'to', ...). The return value is a string where all the occurrences of the array keys have been replaced by the corresponding values. The longest keys will be tried first. Once a substring has been replaced, its new value will not be searched again.
In this case, the keys and the values may have any length, provided that
there is no empty key; additionally, the length of the return value may
differ from that of str
.
However, this function will be the most efficient when all the keys have the
same size.
str
待转换的字符串。
from
字符串中与将要被转换的目的字符 to
相对应的源字符。
to
字符串中与将要被转换的字符 from
相对应的目的字符。
replace_pairs
参数 replace_pairs
可以用来取代 to
和 from
参数,因为它是以 array('from' => 'to', ...) 格式出现的数组。
返回转换后的字符串。
如果 replace_pairs
中包含一个空字符串("")键,那么将返回 FALSE
。
If the str
is not a scalar
then it is not typecasted into a string, instead a warning is raised and
NULL
is returned.
Example #1 strtr() 范例
<?php
$addr = strtr($addr, "???", "aao");
?>
The next example shows the behavior of strtr() when called with only two arguments. Note the preference of the replacements ("h" is not picked because there are longer matches) and how replaced text was not searched again.
Example #2 使用两个参数的 strtr() 范例
<?php
$trans = array("hello" => "hi", "hi" => "hello");
echo strtr("hi all, I said hello", $trans);
?>
以上例程会输出:
hello all, I said hi
The two modes of behavior are substantially different. With three arguments, strtr() will replace bytes; with two, it may replace longer substrings.
Example #3 strtr() behavior comparison
<?php
echo strtr("baab", "ab", "01"),"\n";
$trans = array("ab" => "01");
echo strtr("baab", $trans);
?>
以上例程会输出:
1001 ba01
A simple one-liner approach for a case insensitive equivalent to strtr():
<?php
function stritr(string $str, array $replace_pairs): string {
return str_ireplace(array_keys($replace_pairs), array_values($replace_pairs), $str);
}
?>
Since I was having a lot of trouble finding a multibyte safe strtr and the solutions I found didn't help, I came out with this function, I don't know how it works with non latin chars but it works for me using spanish/french utf8, I hope it helps someone...
<?php
if(!function_exists('mb_strtr')) {
function mb_strtr ($str, $from, $to = null) {
if(is_array($from)) {
$from = array_map('utf8_decode', $from);
$from = array_map('utf8_decode', array_flip ($from));
return utf8_encode (strtr (utf8_decode ($str), array_flip ($from)));
}
return utf8_encode (strtr (utf8_decode ($str), utf8_decode($from), utf8_decode ($to)));
}
}
?>
Here's an important real-world example use-case for strtr where str_replace will not work or will introduce obscure bugs:
<?php
$strTemplate = "My name is :name, not :name2.";
$strParams = [
':name' => 'Dave',
'Dave' => ':name2 or :password', // a wrench in the otherwise sensible input
':name2' => 'Steve',
':pass' => '7hf2348', // sensitive data that maybe shouldn't be here
];
echo strtr($strTemplate, $strParams);
// "My name is Dave, not Steve."
echo str_replace(array_keys($strParams), array_values($strParams), $strTemplate);
// "My name is Steve or 7hf2348word, not Steve or 7hf2348word2."
?>
Any time you're trying to template out a string and don't necessarily know what the replacement keys/values will be (or fully understand the implications of and control their content and order), str_replace will introduce the potential to incorrectly match your keys because it does not expand the longest keys first.
Further, str_replace will replace in previous replacements, introducing potential for unintended nested expansions. Doing so can put the wrong data into the "sub-template" or even give users a chance to provide input that exposes data (if they get to define some of the replacement strings).
Don't support recursive expansion unless you need it and know it will be safe. When you do support it, do so explicitly by repeating strtr calls until no more expansions are occurring or a sane iteration limit is reached, so that the results never implicitly depend on order of your replacement keys. Also make certain that any user input will expanded in an isolated step after any sensitive data is already expanded into the output and no longer available as input.
Note: using some character(s) around your keys to designate them also reduces the possibility of unintended mangling of output, whether maliciously triggered or otherwise. Thus the use of a colon prefix in these examples, which you can easily enforce when accepting replacement input to your templating/translation system.
<?php
/**
* Clean string,
* minimize and remove space, accent and other
*
* @param string $string
* @return string
*/
public function mb_strtoclean($string){
// Valeur a nettoyer (conversion)
$unwanted_array = array( '?'=>'S', '?'=>'s', '?'=>'Z', '?'=>'z', 'à'=>'A', 'á'=>'A', '?'=>'A', '?'=>'A', '?'=>'A', '?'=>'A', '?'=>'A', '?'=>'C', 'è'=>'E', 'é'=>'E',
'ê'=>'E', '?'=>'E', 'ì'=>'I', 'í'=>'I', '?'=>'I', '?'=>'I', '?'=>'N', 'ò'=>'O', 'ó'=>'O', '?'=>'O', '?'=>'O', '?'=>'O', '?'=>'O', 'ù'=>'U',
'ú'=>'U', '?'=>'U', 'ü'=>'U', 'Y'=>'Y', 'T'=>'B', '?'=>'Ss', 'à'=>'a', 'á'=>'a', 'a'=>'a', '?'=>'a', '?'=>'a', '?'=>'a', '?'=>'a', '?'=>'c',
'è'=>'e', 'é'=>'e', 'ê'=>'e', '?'=>'e', 'ì'=>'i', 'í'=>'i', '?'=>'i', '?'=>'i', 'e'=>'o', '?'=>'n', 'ò'=>'o', 'ó'=>'o', '?'=>'o', '?'=>'o',
'?'=>'o', '?'=>'o', 'ù'=>'u', 'ú'=>'u', '?'=>'u', 'y'=>'y', 'y'=>'y', 't'=>'b', '?'=>'y',
' ' => '', '_' => '', '-' => '', '.'=> '', ',' => '', ';' => '');
return mb_strtolower(strtr($string, $unwanted_array ));
}
The example of VOVA (http://www.php.net/manual/fr/function.strtr.php#111968) is good but the result is false:
His example dont replace the string.
<?php
function f1_strtr() {
for($i=0; $i<1000000; ++$i) {
$new_string = strtr("aboutdealers.com", array(".com" => ""));
}
return $new_string;
}
function f2_str_replace() {
for($i=0; $i<1000000; ++$i) {
$new_string = str_replace( ".com", "", "aboutdealers.com");
}
return $new_string;
}
$start = microtime(true);
$strtr = f1_strtr();
$stop = microtime(true);
$time_strtr = $stop - $start;
$start = microtime(true);
$str_replace = f2_str_replace();
$stop = microtime(true);
$time_str_replace = $stop - $start;
echo 'time strtr : ' . $time_strtr . "\tresult :" . $strtr . "\n";
echo 'time str_replace : ' . $time_str_replace . "\tresult :" . $str_replace . "\n";
echo 'time strtr > time str_replace => ' . ($time_strtr > $time_str_replace);
?>
--------------------------------------
time strtr : 3.9719619750977 result :aboutdealers
time str_replace : 2.9930369853973 result :aboutdealers
time strtr > time str_replace => 1
str_replace is faster than strtr
strstr will issue a notice when $replace_pairs contains an array, even unused, with php 5.5.0.
It was not the case with version at least up to 5.3.2, but I'm not sure the notice was added with exactly 5.5.0.
<?php
$str = 'hi all, I said hello';
$replace_pairs = array(
'all' => 'everybody',
'unused' => array('somtehing', 'something else'),
'hello' => 'hey',
);
// php 5.5.0 Notice: Array to string conversion in test.php on line 8
echo strtr($str, $replace_pairs); // hi everybody, I said hey
?>
since the result is still correct, @strstr seems a working solution.
<?php
//note this output null
echo strtr('abc', array('' => ''));
?>
Since strtr (like PHP's other string functions) treats strings as a sequence of bytes, and since UTF-8 and other multibyte encodings use - by definition - more than one byte for at least some characters, the three-string form is likely to have problems. Use the associative array form to specify the mapping.
<?php
// Assuming UTF-8
$str = '?bc ?bc'; // strtr() sees this as nine bytes (including two for each ?)
echo strtr($str, '?', 'a'); // The second argument is equivalent to the string "\xc3\x84" so "\xc3" gets replaced by "a" and the "\x84" is ignored
echo strtr($str, array('?' => 'a')); // Works much better
?>
Weird, but strtr corrupting chars, if used like below and if file is encoded in UTF-8;
<?php
$str = '?bc ?bc';
echo strtr($str, '?', 'a');
// output: a?bc a?bc
?>
And a simple solution;
<?php
function strtr_unicode($str, $a = null, $b = null) {
$translate = $a;
if (!is_array($a) && !is_array($b)) {
$a = (array) $a;
$b = (array) $b;
$translate = array_combine(
array_values($a),
array_values($b)
);
}
// again weird, but accepts an array in this case
return strtr($str, $translate);
}
$str = '?bc ?bc';
echo strtr($str, '?', 'a') ."\n";
echo strtr_unicode($str, '?', 'a') ."\n";
echo strtr_unicode($str, array('?' => 'a')) ."\n";
// outputs
// a?bc a?bc
// abc abc
// abc abc
?>
Since strtr() is twice faster than strlwr I decided to write my own lowering function which also handles UTF-8 characters.
<?php
function strlwr($string, $utf = 1)
{
$latin_letters = array('?' => 'a',
'?' => 'a',
'?' => 'i',
'?' => 's',
'?' => 's',
'?' => 't',
'?' => 't');
$utf_letters = array('?' => '?',
'?' => 'a',
'?' => '?',
'?' => '?',
'?' => '?',
'?' => '?',
'?' => '?');
$letters = array('A' => 'a',
'B' => 'b',
'C' => 'c',
'D' => 'd',
'E' => 'e',
'F' => 'f',
'G' => 'g',
'H' => 'h',
'I' => 'i',
'J' => 'j',
'K' => 'k',
'L' => 'l',
'M' => 'm',
'N' => 'n',
'O' => 'o',
'P' => 'p',
'Q' => 'q',
'R' => 'r',
'S' => 's',
'T' => 't',
'U' => 'u',
'V' => 'v',
'W' => 'w',
'X' => 'x',
'Y' => 'y',
'Z' => 'z');
return ($utf == 1) ? strtr($string, array_merge($utf_letters, $letters)) : strtr($string, array_merge($latin_letters, $letters));
}
?>
This allows you to lower every character (even UTF-8 ones) if you don't set the second parameter, or just lower the UTF-8 ones into their specific latin characters (used when making friendly-urls for example).
I used romanian characters but, of course, you can add your own local characters.
Feel free to use/modify this function as you wish. Hope it helps.
Here my solution of an classes recursive caseinsentive strtr..
<?php
class String
{
public static function stritr(&$string, $from, $to = NULL)
{
if(is_string($from))
$string = preg_replace("'$from'i", $to, $string);
else if(is_array($from))
{
foreach ($from as $key => $val)
self::stritr($string, $key, $val);
}
return $string;
}
}
// example:
$string = "Hello world. This is just a simple test";
print String::stritr($string, 'WorLd', 'foo');
// array example:
print String::stritr($string, array('WorLd' => 'foo', 'TEST' => 'bar'));
?>
Here is my array for char normalization:
<?php
$normalizeChars = array(
'á'=>'A', 'à'=>'A', '?'=>'A', '?'=>'A', '?'=>'A', '?'=>'A', '?'=>'AE', '?'=>'C',
'é'=>'E', 'è'=>'E', 'ê'=>'E', '?'=>'E', 'í'=>'I', 'ì'=>'I', '?'=>'I', '?'=>'I', 'D'=>'Eth',
'?'=>'N', 'ó'=>'O', 'ò'=>'O', '?'=>'O', '?'=>'O', '?'=>'O', '?'=>'O',
'ú'=>'U', 'ù'=>'U', '?'=>'U', 'ü'=>'U', 'Y'=>'Y',
'á'=>'a', 'à'=>'a', 'a'=>'a', '?'=>'a', '?'=>'a', '?'=>'a', '?'=>'ae', '?'=>'c',
'é'=>'e', 'è'=>'e', 'ê'=>'e', '?'=>'e', 'í'=>'i', 'ì'=>'i', '?'=>'i', '?'=>'i', 'e'=>'eth',
'?'=>'n', 'ó'=>'o', 'ò'=>'o', '?'=>'o', '?'=>'o', '?'=>'o', '?'=>'o',
'ú'=>'u', 'ù'=>'u', '?'=>'u', 'ü'=>'u', 'y'=>'y',
'?'=>'sz', 't'=>'thorn', '?'=>'y'
);
?>
Case Insensitive strtr
<?php
function stritr($string, $one, $two=null) {
if (is_string($one)) {
return strtr($string, strtoupper($one) . strtolower($one), "$two$two");
} else if (is_array($one)) {
$strReturn = $string
foreach ($one as $key => $val) {
$strReturn = preg_replace("'$key'i", $val, $strReturn);
}
return $strReturn;
}
return $string;
}
?>
I found that this approach is often faster than strtr() and won't change the same thing in your string twice (as opposed to str_replace(), which will overwrite things in the order of the array you feed it with):
<?php
function replace ($text, $replace) {
$keys = array_keys($replace);
$length = array_combine($keys, array_map('strlen', $keys));
arsort($length);
$array[] = $text;
$count = 1;
reset($length);
while ($key = key($length)) {
if (strpos($text, $key) !== false) {
for ($i = 0; $i < $count; $i += 2) {
if (($pos = strpos($array[$i], $key)) === false) continue;
array_splice($array, $i, 1, array(substr($array[$i], 0, $pos), $replace[$key], substr($array[$i], $pos + strlen($key))));
$count += 2;
}
}
next($length);
}
return implode($array);
}
?>
Hope this is useful when you need to see ASCII control characters:
<?php
$xlate = array(chr(0) => '^@/NUL/null', chr(1) => '^A/SOH/start of heading', chr(2) => '^B/STX/start of text', chr(3) => '^C/ETX/end of text', chr(4) => '^D/EOT/end of transmisssion', chr(5) => '^E/ENQ/enquiry', chr(6) => '^F/ACK/acknowledge', chr(7) => '^G/BEL/bell', chr(8) => '^H/BS/backspace', chr(9) => '^I/TAB/horizontal tab', chr(10) => '^J/LF/NL/line feed/new line', chr(11) => '^K/VT/vertical tab', chr(12) => '^L/FF/NP/form feed/new page/', chr(13) => '^M/CR/carrige return', chr(14) => '^N/SO/shift out', chr(15) => '^O/SI/shift in', chr(16) => '^P/DLE/data link escape', chr(17) => '^Q/DC1/device control 1', chr(18) => '^R/DC2/device control 2', chr(19) => '^S/DC3/device control 3', chr(20) => '^T/DC4/device control 4', chr(21) => '^U/NAK/negative acknowledge', chr(22) => '^V/SYN/synchronous idle', chr(23) => '^W/ETB/end of transmission block', chr(24) => '^X/CAN/cancel', chr(25) => '^Y/EM/end of medium', chr(26) => '^Z/SUB/substiute', chr(27) => '^[/ESC/escape', chr(28) => '^\/FS/file separator', chr(29) => '^]/GS/group separator', chr(30) => '^^/RS/record separator', chr(31) => '^_/US/unit separator', chr(32) => 'Space');
$x = 0;
$pad = strlen(strlen($str));
while(isset($str[$x]))
echo 'character ', str_pad($x+1, $pad), ' = ', strtr($str[$x], $xlate), ' (ascii ', ord($str[$x++]), ')';
?>
If you supply 3 arguments and the 2nd is an array, strtr will search the "A" from "Array" (because you're treating it as a scalar string) and replace it with the 3rd argument:
strtr('Analogy', array('x'=>'y'), '_'); //'_nalogy'
so in reality the above code has the same affect as:
strtr('Analogy', 'A' , '_');
fixed "normaliza" functions written below to include Slavic Latin characters... also, it doesn't return lowercase any more (you can easily get that by applying strtolower yourself)...
also, renamed to normalize()
<?php
function normalize ($string) {
$table = array(
'?'=>'S', '?'=>'s', '?'=>'Dj', '?'=>'dj', '?'=>'Z', '?'=>'z', '?'=>'C', '?'=>'c', '?'=>'C', '?'=>'c',
'à'=>'A', 'á'=>'A', '?'=>'A', '?'=>'A', '?'=>'A', '?'=>'A', '?'=>'A', '?'=>'C', 'è'=>'E', 'é'=>'E',
'ê'=>'E', '?'=>'E', 'ì'=>'I', 'í'=>'I', '?'=>'I', '?'=>'I', '?'=>'N', 'ò'=>'O', 'ó'=>'O', '?'=>'O',
'?'=>'O', '?'=>'O', '?'=>'O', 'ù'=>'U', 'ú'=>'U', '?'=>'U', 'ü'=>'U', 'Y'=>'Y', 'T'=>'B', '?'=>'Ss',
'à'=>'a', 'á'=>'a', 'a'=>'a', '?'=>'a', '?'=>'a', '?'=>'a', '?'=>'a', '?'=>'c', 'è'=>'e', 'é'=>'e',
'ê'=>'e', '?'=>'e', 'ì'=>'i', 'í'=>'i', '?'=>'i', '?'=>'i', 'e'=>'o', '?'=>'n', 'ò'=>'o', 'ó'=>'o',
'?'=>'o', '?'=>'o', '?'=>'o', '?'=>'o', 'ù'=>'u', 'ú'=>'u', '?'=>'u', 'y'=>'y', 'y'=>'y', 't'=>'b',
'?'=>'y', '?'=>'R', '?'=>'r',
);
return strtr($string, $table);
}
?>
This work fine to me:
<?php
function normaliza ($string){
$a = 'àá??????èéê?ìí??D?òó????ùú?üYT
?àáa?????èéê?ìí??e?òó????ùú?yyt???';
$b = 'aaaaaaaceeeeiiiidnoooooouuuuy
bsaaaaaaaceeeeiiiidnoooooouuuyybyRr';
$string = utf8_decode($string);
$string = strtr($string, utf8_decode($a), $b);
$string = strtolower($string);
return utf8_encode($string);
}
?>
OK, I debugged the function (had some errors)
Here it is:
if(!function_exists("stritr")){
function stritr($string, $one = NULL, $two = NULL){
/*
stritr - case insensitive version of strtr
Author: Alexander Peev
Posted in PHP.NET
*/
if( is_string( $one ) ){
$two = strval( $two );
$one = substr( $one, 0, min( strlen($one), strlen($two) ) );
$two = substr( $two, 0, min( strlen($one), strlen($two) ) );
$product = strtr( $string, ( strtoupper($one) . strtolower($one) ), ( $two . $two ) );
return $product;
}
else if( is_array( $one ) ){
$pos1 = 0;
$product = $string;
while( count( $one ) > 0 ){
$positions = array();
foreach( $one as $from => $to ){
if( ( $pos2 = stripos( $product, $from, $pos1 ) ) === FALSE ){
unset( $one[ $from ] );
}
else{
$positions[ $from ] = $pos2;
}
}
if( count( $one ) <= 0 )break;
$winner = min( $positions );
$key = array_search( $winner, $positions );
$product = ( substr( $product, 0, $winner ) . $one[$key] . substr( $product, ( $winner + strlen($key) ) ) );
$pos1 = ( $winner + strlen( $one[$key] ) );
}
return $product;
}
else{
return $string;
}
}/* endfunction stritr */
}/* endfunction exists stritr */
Here is the stritr I always needed... I wrote it in 15 minutes... But only after the idea struck me. Hope you find it helpful, and enjoy...
<?php
if(!function_exists("stritr")){
function stritr($string, $one = NULL, $two = NULL){
/*
stritr - case insensitive version of strtr
Author: Alexander Peev
Posted in PHP.NET
*/
if( is_string( $one ) ){
$two = strval( $two );
$one = substr( $one, 0, min( strlen($one), strlen($two) ) );
$two = substr( $two, 0, min( strlen($one), strlen($two) ) );
$product = strtr( $string, ( strtoupper($one) . strtolower($one) ), ( $two . $two ) );
return $product;
}
else if( is_array( $one ) ){
$pos1 = 0;
$product = $string;
while( count( $one ) > 0 ){
$positions = array();
foreach( $one as $from => $to ){
if( ( $pos2 = stripos( $product, $from, $pos1 ) ) === FALSE ){
unset( $one[ $from ] );
}
else{
$positions[ $from ] = $pos2;
}
}
$winner = min( $positions );
$key = array_search( $winner, $positions );
$product = ( substr( $product, 0, $winner ) . $positions[$key] . substr( $product, ( $winner + strlen($key) ) ) );
$pos1 = ( $winner + strlen( $positions[$key] ) );
}
return $product;
}
else{
return $string;
}
}/* endfunction stritr */
}/* endfunction exists stritr */
?>
Here's another transcribe function. This one converts cp1252 (aka. Windows-1252) into iso-8859-1 (aka. latin1, the default PHP charset). It only transcribes the few exotic characters, which are unique to cp1252.
function transcribe_cp1252_to_latin1($cp1252) {
return strtr(
$cp1252,
array(
"\x80" => "e", "\x81" => " ", "\x82" => "'", "\x83" => 'f',
"\x84" => '"', "\x85" => "...", "\x86" => "+", "\x87" => "#",
"\x88" => "^", "\x89" => "0/00", "\x8A" => "S", "\x8B" => "<",
"\x8C" => "OE", "\x8D" => " ", "\x8E" => "Z", "\x8F" => " ",
"\x90" => " ", "\x91" => "`", "\x92" => "'", "\x93" => '"',
"\x94" => '"', "\x95" => "*", "\x96" => "-", "\x97" => "--",
"\x98" => "~", "\x99" => "(TM)", "\x9A" => "s", "\x9B" => ">",
"\x9C" => "oe", "\x9D" => " ", "\x9E" => "z", "\x9F" => "Y"));
/**
* Replaces special characters with single quote,double quote and comma for charset iso-8859-1
*
* replaceSpecialChars()
* @param string $str
* @return string
*/
function replaceSpecialChars($str)
{
//`(96) '(130) ?(132) '(145) '(146) "(147) "(148) ′(180) // equivalent ascii values of these characters.
$str = strtr($str, "`'?''′", "'','''");
$str = strtr($str, '""', '""');
return $str;
}
Here is a function to convert middle-european windows charset (cp1250) to the charset, that php script is written in:
<?php
function cp1250_to_utf2($text){
$dict = array(chr(225) => 'á', chr(228) => '?', chr(232) => '?', chr(239) => '?',
chr(233) => 'é', chr(236) => 'ě', chr(237) => 'í', chr(229) => '?', chr(229) => '?',
chr(242) => 'ň', chr(244) => '?', chr(243) => 'ó', chr(154) => '?', chr(248) => '?',
chr(250) => 'ú', chr(249) => '?', chr(157) => '?', chr(253) => 'y', chr(158) => '?',
chr(193) => 'á', chr(196) => '?', chr(200) => '?', chr(207) => '?', chr(201) => 'é',
chr(204) => 'ě', chr(205) => 'í', chr(197) => '?', chr(188) => '?', chr(210) => '?',
chr(212) => '?', chr(211) => 'ó', chr(138) => '?', chr(216) => '?', chr(218) => 'ú',
chr(217) => '?', chr(141) => '?', chr(221) => 'Y', chr(142) => '?',
chr(150) => '-');
return strtr($text, $dict);
}
?>
After battling with strtr trying to strip out MS word formatting from things pasted into forms I ended up coming up with this..
it strips ALL non-standard ascii characters, preserving html codes and such, but gets rid of all the characters that refuse to show in firefox.
If you look at this page in firefox you will see a ton of "question mark" characters and so it is not possible to copy and paste those to remove them from strings.. (this fixes that issue nicely, though I admit it could be done a bit better)
<?
function fixoutput($str){
$good[] = 9; #tab
$good[] = 10; #nl
$good[] = 13; #cr
for($a=32;$a<127;$a++){
$good[] = $a;
}
$len = strlen($str);
for($b=0;$b < $len+1; $b++){
if(in_array(ord($str[$b]), $good)){
$newstr .= $str[$b];
}//fi
}//rof
return $newstr;
}
?>
// if you are upset with windows' ^M characters at the end of the line,
// these two lines are for you:
$trans = array("\x0D" => "");
$text = strtr($orig_text,$trans);
// note that ctrl+M (in vim known as ^M) is hexadecimally 0x0D
<?
// Windows-1250 to ASCII
// This function replace all Windows-1250 accent characters with
// thier non-accent ekvivalents. Useful for Czech and Slovak languages.
function win2ascii($str) {
$str = StrTr($str,
"\xE1\xE8\xEF\xEC\xE9\xED\xF2",
"\x61\x63\x64\x65\x65\x69\x6E");
$str = StrTr($str,
"\xF3\xF8\x9A\x9D\xF9\xFA\xFD\x9E\xF4\xBC\xBE",
"\x6F\x72\x73\x74\x75\x75\x79\x7A\x6F\x4C\x6C");
$str = StrTr($str,
"\xC1\xC8\xCF\xCC\xC9\xCD\xC2\xD3\xD8",
"\x41\x43\x44\x45\x45\x49\x4E\x4F\x52");
$str = StrTr($str,
"\x8A\x8D\xDA\xDD\x8E\xD2\xD9\xEF\xCF",
"\x53\x54\x55\x59\x5A\x4E\x55\x64\x44");
return $str;
}
?>
Here you are a simple function to rotate a variable according to an array of possible values. You can make a strict comparison (===).
<?php
function rotateValue($string, $values, $strict = TRUE)
{
if (!empty($string) AND is_array($values))
{
$valuesCount = count($values);
for ($i = 0; $i < $valuesCount; $i++)
{
if ($strict ? ($string === $values[$i]) : ($string == $values[$i]))
{
return $values[($i + 1) % $valuesCount];
}
}
}
return FALSE;
}
?>
For example:
- rotateValue("A", array("A", "B", "C")) will return "B"
- rotateValue("C", array("A", "B", "C")) will return "A"
Posting umlaute here resulted in a mess. Heres a version of the same function that works with preg_replace only:
<?php
function getRewriteString($sString) {
$string = strtolower(htmlentities($sString));
$string = preg_replace("/&(.)(uml);/", "$1e", $string);
$string = preg_replace("/&(.)(acute|cedil|circ|ring|tilde|uml);/", "$1", $string);
$string = preg_replace("/([^a-z0-9]+)/", "-", html_entity_decode($string));
$string = trim($string, "-");
return $string;
}
?>
As Daijoubu suggested use str_replace instead of this function for large arrays/subjects. I just tried it with a array of 60 elements, a string with 8KB length, and the execution time of str_replace was faster at factor 20!
Patrick
If you are going to call strtr a lot, consider using str_replace instead, as it is much faster. I cut execution time in half just by doing this.
<?
// i.e. instead of:
$s=strtr($s,$replace_array);
// use:
foreach($replace_array as $key=>$value) $s=str_replace($key,$value,$s);
?>
Replace control characters in a binary string:
<?
function cc_replace($in) {
for ($i = 0; $i <= 31; $i++) {
$from .= chr($i);
$to .= ".";
}
return strtr($in, $from, $to);
}
?>
This function is usefull for
accent insensitive regexp
searches into greek (iso8859-7) text:
(Select View -> Character Encoding -> Greek (iso8859-7)
at your browser to see the correct greek characters)
function gr_regexp($mystring){
$replacement=array(
array("?","?","?","?"),
array("?","?","?","?"),
array("?","?","?","?"),
array("?","?","?","?","?","?"),
array("?","?","?","?"),
array("?","?","?","?","?","?"),
array("?","?","?","?")
);
foreach($replacement as $group){
foreach($group as $character){
$exp="[";
foreach($group as $expcharacter){
$exp.=$expcharacter;
}
$exp.="]";
$trans[$character]=$exp;
}
}
$temp=explode(" ", $mystring);
for ($i=0;$i<sizeof($temp);$i++){
$temp[$i]=strtr($temp[$i],$trans);
$temp[$i]=addslashes($temp[$i]);
}
return implode(".*",$temp);
}
$match=gr_regexp("????????????????????? ??? ????????");
//The next query string can be sent to MySQL
through mysql_query()
$query=
"Select `column` from `table` where `column2` REGEXP
'".$match."'";
Regarding christophe's conversion, note that the \x## values should be in double quotes, not single, so that the escape will be applied.
This version of macRomanToIso (originally posted by: marcus at synchromedia dot co dot uk) offers a couple of improvements. First, it removes the extra slashes '\' that broke the original function. Second, it adds four quote characters not supported in ISO 8859-1. These are the left double quote, right double quote, left single quote and right single quote.
Be sure to remove the line breaks from the two strings going into strtr or this function will not work properly.
Be careful what text you apply this to. If you apply it to ISO 8859-1 encoded text it will likely wreak havoc. I'll save you some trouble with this bit of advice: don't bother trying to detect what charset a certain text file is using, it can't be done reliably. Instead, consider making assumptions based upon the HTTP_USER_AGENT, or prompting the user to specify the character encoding used (perhaps both).
<?php
/**
* Converts MAC OS ROMAN encoded strings to the ISO 8859-1 charset.
*
* @param string the string to convert.
* @return string the converted string.
*/
function macRomanToIso($string)
{
return strtr($string,
"\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b
\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97
\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa1\xa4\xa6\xa7
\xa8\xab\xac\xae\xaf\xb4\xbb\xbc\xbe\xbf\xc0\xc1
\xc2\xc7\xc8\xca\xcb\xcc\xd6\xd8\xdb\xe1\xe5\xe6
\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf1\xf2\xf3
\xf4\xf8\xfc\xd2\xd3\xd4\xd5",
"\xc4\xc5\xc7\xc9\xd1\xd6\xdc\xe1\xe0\xe2\xe4\xe3
\xe5\xe7\xe9\xe8\xea\xeb\xed\xec\xee\xef\xf1\xf3
\xf2\xf4\xf6\xf5\xfa\xf9\xfb\xfc\xb0\xa7\xb6\xdf\xae
\xb4\xa8\xc6\xd8\xa5\xaa\xba\xe6\xf8\xbf\xa1\xac
\xab\xbb\xa0\xc0\xc3\xf7\xff\xa4\xb7\xc2\xca\xc1
\xcb\xc8\xcd\xce\xcf\xcc\xd3\xd4\xd2\xda\xdb\xd9
\xaf\xb8\x22\x22\x27\x27");
}
?>
Here's a very useful function to translate Microsoft characters into Latin 15, so that people won't see any more square instead of characters in web pages .
function demicrosoftize($str) {
return strtr($str,
"\x82\x83\x84\x85\x86\x87\x89\x8a" .
"\x8b\x8c\x8e\x91\x92\x93\x94\x95" .
"\x96\x97\x98\x99\x9a\x9b\x9c\x9e\x9f",
"'f\".**^\xa6<\xbc\xb4''" .
"\"\"---~ \xa8>\xbd\xb8\xbe");
}
Here's a function to replace linebreaks to html <p> tags. This was initially designed to receive a typed text by a form in a "insert new notice" page and put in a database, then a "notice" page could get the text preformatted with paragraph tags instead of linebreaks that won't appear on browser. The function also removes repeated linebreaks the user may have typed in the form.
function break_to_tags(&$text) {
// find and remove repeated linebreaks
$double_break = array("\r\n\r\n" => "\r\n");
do {
$text = strtr($text, $double_break);
$position = strpos($text, "\r\n\r\n");
} while ($position !== false);
// find and replace remanescent linebreaks by <p> tags
$change = array("\r\n" => "<p>");
$text = strtr($text, $change);
}
[]'s
Fernando
Referring to note from 11 October 2000, Thorn (?, ?), Eth (?, ?), Esset (?) and Mu (?) aren't really accented letters. ?, ?, ?, ? are ligatures. Best to do the following:
function removeaccents($string){
return strtr(
strtr($string,
'????????????????????????????????????????????????????????????',
'SZszYAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy'),
array('?' => 'TH', '?' => 'th', '?' => 'DH', '?' => 'dh', '?' => 'ss',
'?' => 'OE', '?' => 'oe', '?' => 'AE', '?' => 'ae', '?' => 'u'));
}
This would be no good for sorting, as thorn and eth aren't actually found under th and dh. Also especially redundant because of Unicode! Still, I'm sure somone can find use for it - perhaps to constrict filenames...
Mark
to get the ascii equivalent of unicode characters simply use the
utf8_decode() function
#!/bin/sh
# This shell script generates a strtr() call
# to translate from a character set to another.
# Requires: gnu recode, perl, php commandline binary
#
# Usage:
# Set set1 and set2 to whatever you prefer
# (multibyte character sets are not supported)
# and run the script. The script outputs
# a strtr() php code for you to use.
#
# Example is set to generate a
# cp437..latin9 conversion code.
#
set1=cp437
set2=iso-8859-15
result="`echo '<? for($c=32;$c<256;$c++)'\
'echo chr($c);'\
|php -q|recode -f $set1..$set2`"
echo "// This php function call converts \$string in $set1 to $set2";
cat <<EOF | php -q
<?php
\$set1='`echo -n "$result"\
|perl -pe "s/([\\\\\'])/\\\\\\\\\\$1/g"`';
\$set2='`echo -n "$result"|recode -f $set2..$set1\
|perl -pe "s/([\\\\\'])/\\\\\\\\\\$1/g"`';
\$erase=array();
\$l=strlen(\$set1);
for(\$c=0;\$c<\$l;++\$c)
if(\$set1[\$c]==\$set2[\$c])\$erase[\$set1[\$c]]='';
if(count(\$erase))
{
\$set1=strtr(\$set1,\$erase);
\$set2=strtr(\$set2,\$erase);
}
if(!strlen(\$set1))echo 'IRREVERSIBLE';else
echo "strtr(\\\$string,\n '",
ereg_replace('([\\\\\\'])', '\\\\\\1', \$set2),
"',\n '",
ereg_replace('([\\\\\\'])', '\\\\\\1', \$set1),
"');";
EOF
To convert special chars to their html entities strtr you can use strtr in conjunction with get_html_translation_table(HTML_ENTITIES) :
$trans = get_html_translation_table(HTML_ENTITIES);
$html_code = strtr($html_code, $trans);
This will replace in $html_code the ? by Á , etc.
As noted in the str_rot13 docs, some servers don't provide the str_rot13() function. However, the presence of strtr makes it easy to build your own facsimile thereof:
if (!function_exists('str_rot13')) {
function str_rot13($str) {
$from = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$to = 'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM';
return strtr($str, $from, $to);
}
}
This is suitable for very light "encryption" such as hiding email addressess from spambots (then unscrambling them in a mail class, for example).
$mail_to=str_rot13("$mail_to");
As an alternative to the not-yet-existing function stritr mentioned in the first note above You can easily do this:
strtr("abc","ABCabc","xyzxyz")
or more general:
strtr("abc",
strtoupper($fromchars).strtolower($fromchars),
$tochars.$tochars);
Just a thought.