Last modified: 9 May 2022
Useful links
REGEXR
Regex 101
myregexp
Regex explanation
This table is a cheat sheet
containing some of the most common operators used to write Perl Compatible
Regular Expressions (PCRE). It is not an exhaustive list of all the possible
PCRE operators or rules. It also is not a complete guide to the syntax with
which you write PCRE expressions. There are many other rules (and, sorry to say,
often exceptions to those rules). For more detail on PCRE expressions, I
recommend reading the "PCRE Regular Expression Details" section of the
PCRE man page.
What this sheet does contain are the operators most commonly
used in SpamScreen rules. If you work with SpamScreen rules often, consider
printing and posting this sheet, or bookmarking it.
For instance, regular expressions (regexes) can be used in queries to find all items in which any frame, or a specific frame, or any of
a list of frames, contains text matching the regular expression that you are searching for.
For example, if you wanted to find all items containing sequences of capital letters followed by numbers, then the regular expression would be: [A-Z]+[0-9]+
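As a rough illustration of that query, here is a minimal sketch in Python. Python's re module is not PCRE, but it shares the operators used here. The sample items and the helper name find_codes are made up for this example.

```python
import re

# Hypothetical sample items; these strings are invented for illustration.
items = ["order AB123 shipped", "no codes here", "ref X9 and YZ42"]

# One or more capital letters followed by one or more digits.
pattern = re.compile(r"[A-Z]+[0-9]+")

def find_codes(text):
    """Return every run of capital letters followed by digits in text."""
    return pattern.findall(text)

# Keep only the items that contain at least one such sequence.
matching_items = [item for item in items if pattern.search(item)]
```

Here pattern.search answers "does this item contain a match anywhere?", while findall lists every match inside one item.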
Special Characters (escape them with \) :
Char Description
\ Used to escape a special character
^ Matches the beginning of the string. For instance, the regular expression "^The" would match any string that begins with "The". Note: inside brackets, "^" instead negates a character class (see below).
$ Matches the end of the string. For instance, the regular expression "b$" would match any string that ends with the letter "b".
. Matches any single character (technically, any character other than the newline character). For instance, "c.t" would match "cat", "cot", "cut", and so on.
| Alternation: matches either the expression before it or the expression after it
? This operator denotes that the preceding character appears once or not at all. For example, "a?rg" would only match "rg" or "arg".
* This operator denotes that the preceding character appears zero or more times. For example, "a*rg" would match "rg", "arg", "aarg", or even "aaaaaaaarg".
+ This operator denotes that the preceding character appears one or more times. For example, "a+rg" would match "arg", "aarg", or "aaaaaaaarg". Note that unlike the operator above, "a+rg" would NOT match "rg".
( ) You can use parentheses to perform operations on groups of characters. For instance, if you want to repeat an entire word three times you could write "(word){3}" to match "wordwordword".
[ ] Brackets denote values possible for one character. For instance, "[arxp]" means that the character in question can be either "a", "r", "x", or "p". You can also define ranges.
The expression "[a-zA-Z]" means the character can be any letter. The range [0-9] means the character can be any single digit. Bracketed values are called character classes.
You can also use the "^" symbol to negate a character class. For instance, [^0-9] means the character can be any character BESIDES a single digit.
{ } The bounds operator defines the number of instances of a preceding value. For instance, "precious{5}" would match the word "preciousssss" with five Ss. You can also define a range within the bounds operator.
The expression "precious{2,6}" would match "preciou" followed by two to six Ss.
| This operator acts as a Boolean OR operation. For instance, 1|2|3|4 would match any single digit from one to four.
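The operators in this table can be tried out with a short sketch in Python. Python's re module is used here as a stand-in for PCRE, since it behaves the same way for these basic operators. The helper names found_in and matches_whole are hypothetical.

```python
import re

def found_in(pattern, text):
    """True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

def matches_whole(pattern, text):
    """True if pattern matches the entire text, start to end."""
    return re.fullmatch(pattern, text) is not None

# Anchors
starts_with_the = found_in(r"^The", "The end")            # begins with "The"
ends_with_b = found_in(r"b$", "crab")                     # ends with "b"

# Quantifiers
optional_a = matches_whole(r"a?rg", "rg")                 # zero a's is allowed
plus_needs_one = matches_whole(r"a+rg", "rg")             # + requires at least one a

# Groups, bounds, and character classes
repeated_word = matches_whole(r"(word){3}", "wordwordword")
five_s = matches_whole(r"precious{5}", "preciousssss")    # bound applies to the last "s" only
not_a_digit = matches_whole(r"[^0-9]", "x")
one_to_four = matches_whole(r"1|2|3|4", "3")
```

Note that a bound such as {5} applies only to the single character or group directly before it, which is why "(word){3}" needs the parentheses.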
Greed: By default, the quantifiers *, ?, +, and {min,max} try to include as many characters as possible when finding a match. To restrict this behavior to as few characters as possible, add a question mark after the quantifier. For example, the pattern <.+> (without the question mark) means: "search for a string that consists of <, at least one character, and >". To prevent the pattern from matching the complete string <em>text</em>, add a question mark after the plus sign: <.+?>. The search then stops at the first '>', so the first HTML tag <em> is the match.
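The difference between the greedy and the non-greedy form can be seen in a minimal Python sketch. It uses Python's re module, which treats these two quantifier forms the same way as described above. The variable names are chosen for this example only.

```python
import re

text = "<em>text</em>"

# Greedy: .+ grabs as much as possible, so the match runs to the last '>'.
greedy = re.search(r"<.+>", text).group(0)

# Non-greedy: .+? grabs as little as possible, so the match stops at the first '>'.
lazy = re.search(r"<.+?>", text).group(0)

print(greedy)  # <em>text</em>
print(lazy)    # <em>
```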
Special notations with \ :
\d This symbol stands for any single digit. Equivalent to [0-9].
\D matches any non-digit character
\w Word. In Perl, a word is any letter (regardless of case), number, or the underscore character. A word is synonymous with the character class "[a-zA-Z0-9_]".
\W This symbol is the equivalent of a NON-word character (see above). This symbol is synonymous with the character class "[^a-zA-Z0-9_]".
\s This symbol stands for any white space character. White space characters include spaces, tabs, and line endings (linefeeds and carriage returns).
\S This symbol stands for any NON-white space character.
\t tab
\n newline
\r return (CR)
\b "word" boundary
\B not a "word" boundary
\xhh character with hex. code hh
(?i) This puts the statement following into case-insensitive mode. For instance, "((?i)caseless)" would match "CASELESS", "caseless", and even variants such as "CaSeLeSS".
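A short Python sketch of these notations follows, with two assumptions worth stating. First, Python 3 treats \d, \w, and \s as Unicode-aware by default, so the re.ASCII flag is passed to get the ASCII-only sets described above. Second, recent Python versions require a global (?i) at the very start of the pattern, so the nested PCRE form "((?i)caseless)" is written here as "(?i)caseless".

```python
import re

# re.ASCII limits \d and \w to [0-9] and [a-zA-Z0-9_] as described above.
digits = re.findall(r"\d", "a1b22", flags=re.ASCII)
non_digits = re.findall(r"\D+", "a1b22", flags=re.ASCII)
words = re.findall(r"\w+", "foo_1, 12bar8!", flags=re.ASCII)

# \s+ matches any run of spaces, tabs, and line endings.
collapsed = re.sub(r"\s+", " ", "a \t\n b")

# \b only matches at the edge of a word.
whole_word = re.search(r"\bcat\b", "a cat sat") is not None
inside_word = re.search(r"\bcat\b", "concatenate") is not None

# \x41 is the character with hex code 41, the letter "A".
hex_match = re.fullmatch(r"\x41", "A") is not None

# (?i) switches on case-insensitive matching for the whole pattern.
caseless = re.fullmatch(r"(?i)caseless", "CaSeLeSS") is not None
```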
Examples :
a* zero or more a's
a+ one or more a's
a? zero or one a's (i.e., optional a)
a{m} exactly m a's
a{m,} at least m a's
a{m,n} at least m but at most n a's
Finished\? matches “Finished?”
^http matches strings that begin with http
[^0-9] matches any character not 0-9
ing$ matches “exciting” but not “ingenious”
gr.y matches “gray”, “grey”
Red|Yellow matches “Red” or “Yellow”
colou?r matches colour and color
Ah? matches “A” or “Ah”
Ah* matches “Ahhhhh” or “A”
Ah+ matches “Ah” or “Ahhh” but not “A”
[cbf]ar matches “car”, “bar”, or “far”
[a-zA-Z] matches ascii letters a-z (uppercase and lower case)
abc abc (that exact character sequence, but anywhere in the string)
^abc abc at the beginning of the string
abc$ abc at the end of the string
a|b either of a and b
^abc|abc$ the string abc at the beginning or at the end of the string
ab{2,4}c an a followed by two, three or four b's followed by a c
ab{2,}c an a followed by at least two b's followed by a c
ab*c an a followed by any number (zero or more) of b's followed by a c
ab+c an a followed by one or more b's followed by a c
ab?c an a followed by an optional b followed by a c; that is, either abc or ac
a.c an a followed by any single character (not newline) followed by a c
a\.c a.c exactly
[abc] any one of a, b and c
[Aa]bc either of Abc and abc
[abc]+ any (nonempty) string of a's, b's and c's (such as a, abba, acbabcacaa)
[^abc]+ any (nonempty) string which does not contain any of a, b and c (such as defg)
[x-y] matches any of the characters from x to y (inclusively) in the ASCII code
[\-] matches the hyphen character -
[\n] matches the newline; other single character denotations with \ apply normally, too
\d\d any two decimal digits, such as 42; same as \d{2}
\w+ a "word": a nonempty sequence of alphanumeric characters and low lines (underscores), such as foo and 12bar8 and foo_1
100\s*mk the strings 100 and mk optionally separated by any amount of white space (spaces, tabs, newlines)
abc\b abc when followed by a word boundary (e.g. in abc! but not in abcd)
perl\B perl when not followed by a word boundary (e.g. in perlert but not in perl stuff)
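Several of the examples above can be checked mechanically. The following is a minimal sketch in Python, again assuming that Python's re module stands in for PCRE on these basic patterns. The CASES table and the check helper are made up for this illustration.

```python
import re

# (pattern, text, expected result) triples taken from the examples above.
CASES = [
    (r"Finished\?", "Finished?", True),
    (r"ing$", "exciting", True),
    (r"ing$", "ingenious", False),
    (r"colou?r", "color", True),
    (r"Ah+", "A", False),
    (r"ab{2,4}c", "abbbc", True),
    (r"ab{2,4}c", "abc", False),
    (r"100\s*mk", "100 \t mk", True),
    (r"abc\b", "abc!", True),
    (r"abc\b", "abcd", False),
    (r"perl\B", "perlert", True),
    (r"perl\B", "perl stuff", False),
]

def check(pattern, text):
    """True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# One boolean per case: did the actual result agree with the expected one?
results = [check(p, t) == expected for p, t, expected in CASES]
```

A table like this is a cheap way to confirm that a pattern accepts and rejects exactly the strings you expect before using it in a rule.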