The tr function (actually it is an operator ;-) allows character-by-character translation with several enhancements.
It takes two arguments: a source character set and a target character set. Generally they should be of equal length, but if the target character set is shorter, it is extended with its last character to the necessary length.
The syntax is rather strange and belongs to the "Perl warts", as it does not fit well into the general framework of string manipulation functions.
That can be explained by the fact that the tr operator is derived from the UNIX tr utility. The UNIX sed utility uses y for this operation, and Perl supports y as a synonym for tr.
The string to be modified is not supplied as a parameter. By default it is taken from the $_ variable (another variable can be bound with the =~ operator), for example:
tr/a/z/; # change all "a" into "z"
The following expression replaces each digit with 9 so any resulting number will
consist of 9 only.
tr/0-9/9/; # change all digits into 9 (no square brackets: inside tr they would be treated as literal characters and replaced too)
This can sometimes be useful as a parsing technique or as a data scrambling technique.
|By default tr modifies the content of the variable $_.|
The function returns the number of substitutions made, not the translated string as we might expect.
$_='Test string 123456789123456789123456789';
$k=tr/2345678/9/; # $k will contain the number of substitutions made
Unlike index and substr, the tr function returns neither a position nor a string, but the number of substitutions made.
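The return value is easy to check with a few lines. This is only a minimal sketch: the variable names $text and $count are mine, and the string is the one used in the example above.

```perl
use strict;
use warnings;

my $text = 'Test string 123456789123456789123456789';

# tr returns the number of characters it matched, not the new string.
# The string itself is modified in place.
my $count = ($text =~ tr/2345678/9/);

print "count: $count\n";   # 21: seven matching digits in each of three runs
print "text:  $text\n";    # Test string 199999999199999999199999999
```

If you need the translated string as a value, you have to copy the variable first or use the r option described later.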
If you specify more than one character in the match character list, you can translate multiple characters at a time. For example:
tr/0123456789/9999999999/; # replace all digits with 9
translates all digits into the 9 character. If the replacement list of characters is shorter than the match list of characters, the last character in the replacement list is repeated as often as needed. That means that we can rewrite the statement above as:
tr/0123456789/9/; # same as above
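The padding rule can be verified directly. In the sketch below the sample strings and variable names are invented for illustration; the last statement shows that only the final replacement character is repeated.

```perl
use strict;
use warnings;

my $explicit = 'Order 4521, batch 07';
my $padded   = $explicit;

$explicit =~ tr/0123456789/9999999999/;  # replacement list written out in full
$padded   =~ tr/0123456789/9/;           # replacement list padded with its last character

print "$explicit\n";   # Order 9999, batch 99
print "$padded\n";     # Order 9999, batch 99

# Only the last character is repeated: a->x, and b, c, d all map to y.
(my $partial = 'abcdabcd') =~ tr/abcd/xy/;
print "$partial\n";    # xyyyxyyy
```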
If more than one replacement character is given for a matched character (this is a stupid idea because the arguments are sets, but it can happen if the sets are generated automatically and the corresponding check is not in place), only the first is used. The rest of the replacement list is ignored. For instance, a statement such as
tr/999/123/;
results in all "9" characters in the string being converted to a "1" character. So it is equal to
tr/9/1/;
The translation operator doesn't perform variable interpolation, for example:
$from_set="0123456789";
$to_set ="ABCDEFGHIJ";
tr/$from_set/$to_set/; # does not work
The translation operator has several useful options: you can delete matched characters, replace repeated characters with a single character, and translate only characters that don't match the character list (see the table below).
Historically the translate function is considered to be one of the pattern matching operators. That is untrue, but as you will see the syntax is derived from the (also pretty strange) match and substitute operators that we will study in Ch.5. At the same time the translation function operates on character sets, not on regular expressions. The delimiter can vary, but slashes are most commonly used (slashes are also used in Perl 5 for regular expressions). Most of the special regular expression codes are not applicable.
However, as in regular expressions, the dash is used to mean "between". This statement converts $_ to upper case.
tr/a-z/A-Z/; # again this is not the best way to do it. Use uc() instead
Please note that Perl 4 did not have the lc and uc functions. Therefore the tr function was often used to convert case. If you see this idiom in a script, that probably means that the script was initially written for Perl 4. The example above that converts all digits to 9 can be rewritten as
tr/0-9/9/; # the shortest way to replace all digits with 9
If you use the d modifier, characters from the source set that have no counterpart in the target set are deleted.
If you want to delete all characters in the source set, you need to specify an empty second set together with the d option, otherwise the function does not work as expected.
So this option is an exception to the rule that the target character set is extended to the length of the source character set: with d the target set is used exactly as written, and it may even be empty.
# cat test
$test='test ';
print "Before test 1: |$test|\n";
$test=~tr/ / /d;
print "After: test 1: |$test|\n";
$test='test ';
print "Before test 2: |$test|\n";
$test=~tr/ //d;
print "After test 2: |$test|\n";
# perl test
Before test 1: |test |
After: test 1: |test |
Before test 2: |test |
After test 2: |test|
If the new set is empty and there is no d option, then the target set is assumed to be equal to the source one and the function will not modify the source string -- it can be used for counting characters from the specified set in the string.
For example, the statements here count the number of dots in the string held in $_ (the dot is a special character in regular expressions, but not in tr) and store the result in the variable $total.
$_="220.127.116.11";
$total = tr/.//;
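The same counting trick works on a named variable bound with =~. The sketch below is mine, not part of the original example: the address 192.0.2.10 and the variable names are arbitrary.

```perl
use strict;
use warnings;

my $ip = '192.0.2.10';          # arbitrary sample address

# Empty replacement list and no d option: nothing is changed,
# tr just reports how many characters from the set were found.
my $dots   = ($ip =~ tr/.//);
my $digits = ($ip =~ tr/0-9//);

print "dots:   $dots\n";        # 3
print "digits: $digits\n";      # 7
print "ip:     $ip\n";          # 192.0.2.10 (unchanged)
```

Note that the dot needs no escaping here, because tr works with character lists and not with regular expressions.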
Another more complex example counts a set of characters
$k=tr/0-9//; # counts number of digits in the string $_
You can specify the set not only directly, but also as the complement of a set:
$k=tr/0-9//c; # will count all non-digit characters
tr/a-zA-Z//s; # bookkeeper -> bokeper (squeeze in its pure form uses an empty target set, which is assumed to be equal to the source set)
tr/a-zA-Z/ /cs; # change non-alphas to single space
@stripped = map tr/a-zA-Z/ /csr, @original; # /r with map
If you use tr to parse a string into lexical elements, then you need to squash repeated characters after transliteration. The s option does exactly that, which permits easy building of primitive lexical parsers:
$k=tr/0-9a-zA-Z_/9999999999A/s; # each run of letters and underscores is replaced by A, each run of digits by 9 (the target set is extended to the length of the source set with the letter A)
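Here is a small sketch of this "shape of the string" idea. The helper name shape and the sample input are hypothetical. Note that the letters are written as a-zA-Z, because a range such as a-Z runs backwards in the character table and Perl rejects it at compile time.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a string to its lexical "shape".
# Digits become 9, letters and underscores become A, and the s option
# squashes each run of identical replacement characters into one.
sub shape {
    my ($line) = @_;
    (my $copy = $line) =~ tr/0-9a-zA-Z_/9999999999A/s;
    return $copy;
}

print shape('total = count_1 + 42'), "\n";   # A = A9 + 9
```

Characters that are not in the source set (spaces, = and + here) pass through unchanged, so the result keeps the punctuation and can be compared against a list of expected shapes.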
|Normally, if the match list is longer than the replacement list, the last character in the replacement list is used as the replacement for the extra characters. However, when the d option is used, the matched characters are simply deleted.|
If the replacement list is empty, then no translation is done. The operator will still return the number of characters that matched, though. This is useful when you need to know how often a given letter appears in a string. This feature also can compress repeated characters using the s option.
Here is the list of all possible options:
|c||This option complements the source character set. In other words, the translation is done for every character that does not match the source character set.|
|d||This option deletes any character in the source character set that does not have a corresponding character in the target character set. (Deletes found but unreplaced characters.)|
|r||Return the modified string and leave the original string untouched (available since Perl 5.14). $HOST = $host =~ tr/a-z/A-Z/r;|
|s||This option reduces repeated sequences of the same character in the output to a single instance of that character. If the replacement list is empty, runs of repeated characters from the source set are squashed.|
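The following sketch exercises each option from the table on one sample string. The string and the variable names are invented for illustration, and the r option needs Perl 5.14 or later, hence the version line at the top.

```perl
use 5.014;          # the r option appeared in Perl 5.14
use strict;
use warnings;

my $src = 'hello,   world 2024!!';

# c: complement the source set -- count everything that is not a lowercase letter
my $non_lower = ($src =~ tr/a-z//c);

# d: delete characters that have no counterpart in the (empty) target set
(my $no_digits = $src) =~ tr/0-9//d;

# s: squeeze runs of the listed characters (here spaces and exclamation marks)
(my $squeezed = $src) =~ tr/ !//s;

# c and s together: every run of non-letters becomes a single space
(my $words = $src) =~ tr/a-zA-Z/ /cs;

# r: return the translated copy, leave $src alone
my $upper = $src =~ tr/a-z/A-Z/r;

say $non_lower;    # 11
say $no_digits;    # hello,   world !!
say $squeezed;     # hello, world 2024!
say "[$words]";    # [hello world ]
say $upper;        # HELLO,   WORLD 2024!!
say $src;          # hello,   world 2024!! (unchanged)
```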
For example ROT13 is a simple substitution cipher that is sometimes used for distributing offensive jokes and other potentially objectionable materials on Usenet.
This is a Caesar cipher with the value of the key equal to 13 (A->N, B->O etc.).
Using the tr function for decoding ROT13 is an interesting example because the target set is constructed by concatenation of disjoint character subranges, n-z followed by a-m (or N-Z followed by A-M for the upper case), as in tr/a-zA-Z/n-za-mN-ZA-M/
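A runnable sketch of ROT13 with tr might look like this. The function name rot13 and the sample text are my own choice; the character ranges are the ones described above.

```perl
use strict;
use warnings;

# ROT13: each letter is shifted by 13 positions, so the target set is the
# source alphabet split in the middle and glued together in reverse order.
sub rot13 {
    my ($text) = @_;
    $text =~ tr/a-zA-Z/n-za-mN-ZA-M/;
    return $text;
}

my $encoded = rot13('Hello, World!');
print "$encoded\n";            # Uryyb, Jbeyq!
print rot13($encoded), "\n";   # Hello, World! (applying ROT13 twice restores the text)
```

Because 13 is half of 26, the same function both encodes and decodes.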
|UNIX programmers may be familiar with using the tr utility to convert lowercase characters to uppercase characters, or vice versa. Do not do that -- Perl 5 has the lc() and uc() functions for this purpose|
For complex transliterations the tr/// syntax is bad. One of the problems is that the notation doesn't actually show which characters correspond, so you have to count characters. For example:
But in Perl there is a way to make this example more readable using different delimiters:
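As an illustration of the readability problem and of the alternative delimiters, here is a sketch with an invented mapping (letters to look-alike digits). With bracketing delimiters the two lists can be put on separate lines, so corresponding characters line up in columns.

```perl
use strict;
use warnings;

my $word = 'translations';

# Hard to read: you have to count positions to see that s maps to 5.
(my $compact = $word) =~ tr/aeiost/431057/;

# Same mapping with bracketing delimiters, one list per line.
(my $readable = $word) =~ tr{aeiost}
                            {431057};

print "$compact\n";    # 7r4n5l4710n5
print "$readable\n";   # 7r4n5l4710n5
```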
If the first string contains duplicates, then the first corresponding character is used, not the last:
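A short sketch with an invented source list shows the rule. The letter a appears twice in the first list, and only its first counterpart is used.

```perl
use strict;
use warnings;

# a is listed twice: the first pairing (a -> X) wins, a -> Y is ignored.
# b is third in the list, so it maps to Z.
(my $result = 'banana') =~ tr/aab/XYZ/;

print "$result\n";   # ZXnXnX
```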
The tr function allows character-by-character translation. The following expression replaces each a with e, each b with d, and each c with f in the variable $sentence. The expression returns the number of substitutions made.
$sentence =~ tr/abc/edf/
Most of the special RE codes do not apply in the tr function. For example, the statement here counts the number of asterisks in the $sentence variable and stores that in the $count variable.
$count = ($sentence =~ tr/*/*/);
However, the dash is still used to mean "between". This statement converts $_ to upper case.
tr/a-z/A-Z/;
Retrieving String Length Using tr
The tr function provides another way of determining the length of a character string, in conjunction with the built-in system variable $_.
The syntax for the tr function is
tr/sourcelist/replacelist/
sourcelist is the list of characters to replace, and replacelist is the list of characters to replace with. (For details, see the following listing and the explanation provided with it.)
Listing 13.10. A program that uses tr to retrieve the length of a string.
#!/usr/local/bin/perl

$string = "here is a string";
$_ = $string;
$length = tr/a-zA-Z /a-zA-Z /;
print ("the string is $length characters long\n");
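One caveat about the listing: tr counts only the characters named in the source list, so the result equals the length only when the string consists of letters and spaces. The sketch below compares it with the built-in length function; the second sample string is invented.

```perl
use strict;
use warnings;

my $string = "here is a string";
my $mixed  = "route 66, exit 12";     # contains digits and a comma

# Count letters and spaces only (empty replacement list, nothing is changed).
my $count_plain = ($string =~ tr/a-zA-Z //);
my $count_mixed = ($mixed  =~ tr/a-zA-Z //);

print "$count_plain vs ", length($string), "\n";   # 16 vs 16
print "$count_mixed vs ", length($mixed),  "\n";   # 12 vs 17
```

For a real length, length() is the straightforward choice; the tr form is better seen as a way to count a class of characters.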
Unlike in C, the assignment operator produces a valid lvalue. Modifying an assignment is equivalent to doing the assignment and then modifying the variable that was assigned to. This is useful for modifying a copy of something, like this:
($tmp = $global) =~ tr [A-Z] [a-z]; ... ... ...
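A minimal sketch of the copy-then-modify idiom follows; the value of $global is invented. It also shows what happens without the parentheses: the variable then receives the count returned by tr, and the string on the right of = is the one that gets modified.

```perl
use strict;
use warnings;

my $global = 'MiXeD Case';

# Assignment first, then tr is applied to the variable assigned to.
(my $tmp = $global) =~ tr/A-Z/a-z/;

print "$tmp\n";      # mixed case
print "$global\n";   # MiXeD Case (the original is untouched)

# Without the parentheses =~ binds tighter than =, so tr works on $copy
# itself and $n receives the number of characters replaced.
my $copy = $global;
my $n    = $copy =~ tr/A-Z/a-z/;

print "$n\n";        # 4
print "$copy\n";     # mixed case
```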
If no string is specified via the =~ or !~ operator, the $_ string is transliterated. (The string specified with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.) A character range may be specified with a hyphen, so
tr/A-J/0-9/
does the same replacement as
tr/ACEGIBDFHJ/0246813579/.
For sed devotees, y is provided as a synonym for tr. If the SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has its own pair of quotes, which may or may not be bracketing quotes, e.g., tr[A-Z][a-z] or tr(+\-*/)/ABCD/.
Options:
c Complement the SEARCHLIST.
d Delete found but unreplaced characters.
s Squash duplicate replaced characters.
If the /c modifier is specified, the SEARCHLIST character set is complemented. If the /d modifier is specified, any characters specified by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note that this is slightly more flexible than the behavior of some tr programs, which delete anything they find in the SEARCHLIST, period.) If the /s modifier is specified, sequences of characters that were transliterated to the same character are squashed down to a single instance of the character.
If the /d modifier is used, the REPLACEMENTLIST is always interpreted exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter than the SEARCHLIST, the final character is replicated till it is long enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This latter is useful for counting characters in a class or for squashing character sequences in a class.
$ARGV =~ tr/A-Z/a-z/; # canonicalize to lower case
$cnt = tr/*/*/; # count the stars in $_
$cnt = $sky =~ tr/*/*/; # count the stars in $sky
$cnt = tr/0-9//; # count the digits in $_
tr/a-zA-Z//s; # bookkeeper -> bokeper
($HOST = $host) =~ tr/a-z/A-Z/;
tr/a-zA-Z/ /cs; # change non-alphas to single space
tr [\200-\377] [\000-\177]; # delete 8th bit
If multiple transliterations are given for a character, only the first one is used:
tr/AAA/XYZ/
will transliterate any A to X.
Note that because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():
eval "tr/$oldlist/$newlist/"; die $@ if $@;
eval "tr/$oldlist/$newlist/, 1" or die $@;
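Here is a self-contained sketch of the eval approach applied to a named variable instead of $_. The sample text and list contents are invented, and the code assumes that both lists come from a trusted source and contain no slash characters, since they are pasted into Perl source before it is compiled.

```perl
use strict;
use warnings;

my $oldlist = '0123456789';
my $newlist = 'ABCDEFGHIJ';
my $text    = 'call 555-0142';

# The lists are interpolated into the string first, and only then does
# eval compile the resulting tr statement. The dollar sign of $text is
# escaped so that the variable itself, not its value, appears in the code.
# Assumption: $oldlist and $newlist are trusted and contain no "/".
eval "\$text =~ tr/$oldlist/$newlist/; 1" or die $@;

print "$text\n";   # call FFF-ABEC
```

The trailing "; 1" makes the eval return a true value even when tr replaces nothing, so die fires only on a real compilation or runtime error.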
tr - perldoc.perl.org
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusivly for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.
Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.
Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: September, 12, 2017