Can I Mix Character Classes In Python Regex?
Solution 1:
You can use r"[^\W\d]", i.e. negate the union of non-word characters and digits. What remains is any word character that is not a digit: letters and the underscore.
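As a quick check of what the negated class accepts, here is a minimal sketch using only the standard re module. The constant name and the sample strings are made up for illustration.

```python
import re

# Negated class: excludes non-word characters (\W) and digits (\d),
# so only letters and the underscore can match.
LETTER_OR_UNDERSCORE = re.compile(r"[^\W\d]")

ascii_hits = LETTER_OR_UNDERSCORE.findall("a1_b2")
# For str patterns in Python 3, \w is Unicode-aware, so non-ASCII letters match too.
unicode_hits = LETTER_OR_UNDERSCORE.findall("λ5")

print(ascii_hits)    # ['a', '_', 'b']
print(unicode_hits)  # ['λ']
```

If only ASCII letters should match, passing re.ASCII to re.compile restricts \w, \W and \d to the ASCII range.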
Solution 2:
No, the standard re module cannot subtract character classes directly.
Your best bet is to use the third-party regex module, which is installed separately from PyPI and has not replaced the standard re module. It supports character classes based on Unicode properties:
\p{IsAlphabetic}
This will match any character that the Unicode specification states is an alphabetic character.
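Even better, regex does support character class subtraction; it views such classes as sets and allows you to create a difference with the -- operator:
[\w--\d]
matches everything in \w except anything that also matches \d. Set operations such as -- belong to the module's version 1 behaviour, so enable it with the (?V1) inline flag or the regex.VERSION1 flag.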
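The set difference can be tried out roughly as follows. This is a sketch that assumes the regex package may or may not be installed: the helper name word_chars_without_digits is hypothetical, and the fallback to the standard re module with the equivalent negated class is there only so the snippet runs either way.

```python
import re

try:
    import regex  # third-party package: pip install regex
except ImportError:
    regex = None


def word_chars_without_digits(text):
    """Return every word character in text that is not a digit."""
    if regex is not None:
        # (?V1) switches on version 1 behaviour, which enables set operators.
        return regex.findall(r"(?V1)[\w--\d]", text)
    # Standard-library equivalent: a negated class instead of a set difference.
    return re.findall(r"[^\W\d]", text)


print(word_chars_without_digits("x9_y0"))  # ['x', '_', 'y']
```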
Solution 3:
You can exclude classes using a negative lookahead assertion, such as r'(?!\d)[\w]' to match a word character, excluding digits. For example:
>>> import re
>>> re.search(r'(?!\d)[\w]', '12bac')
<re.Match object; span=(2, 3), match='b'>
>>> _.group(0)
'b'
To exclude more than one group, you can use the usual [...] syntax in the lookahead assertion. For example, r'(?![0-5])[\w]' would match any word character (letter, digit, or underscore) except for the digits 0-5.
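A small sketch of that exclusion, with a made-up constant name and sample string, shows which characters survive the lookahead:

```python
import re

# The lookahead (?![0-5]) rejects a position if the next character is 0-5;
# \w then consumes one word character that passed the check.
NOT_LOW_DIGIT = re.compile(r"(?![0-5])\w")

kept = NOT_LOW_DIGIT.findall("a3_79")
print(kept)  # ['a', '_', '7', '9']
```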
As with [...], the above construct matches a single character. To match multiple characters, add a repetition operator:
>>> re.search(r'((?!\d)[\w])+', '12bac15')
<re.Match object; span=(2, 5), match='bac'>
>>> _.group(0)
'bac'
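One thing to watch when moving from re.search to re.findall: a capturing group changes what findall returns. The sketch below (variable names and sample text are illustrative) uses a non-capturing group instead and compares the lookahead form with the negated class [^\W\d].

```python
import re

sample = "12bac15 x_y9"

# With a capturing group, findall returns the group's last repetition,
# not the whole match.
captured = re.findall(r"((?!\d)\w)+", "12bac15")

# A non-capturing group (?:...) makes findall return whole matches.
lookahead_runs = re.findall(r"(?:(?!\d)\w)+", sample)

# The negated class gives the same runs without a lookahead.
negated_runs = re.findall(r"[^\W\d]+", sample)

print(captured)        # ['c']
print(lookahead_runs)  # ['bac', 'x_y']
print(negated_runs)    # ['bac', 'x_y']
```

For a plain "word characters minus digits" match the negated class is the shorter option; the lookahead form is more useful when the excluded set is awkward to express as a negation.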