
Can I Mix Character Classes In Python Regex?

Special sequences (character classes) in Python RegEx are escapes like \w or \d that match a set of characters. In my case, I need to be able to match all alpha-numerical charact

Solution 1:

You can use r"[^\W\d]", i.e. negate the union of non-word characters and digits, which leaves the word characters that are not digits.
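A quick check of this pattern, as a minimal sketch using only the standard re module; the sample string is made up for illustration. One thing to keep in mind is that \w also covers the underscore, so [^\W\d] still matches "_"; adding "_" to the negated class should remove it.

```python
import re

sample = "abc123_def"  # hypothetical input, not from the question

# [^\W\d] = "not (non-word or digit)" = word characters minus digits
letters_or_underscore = re.findall(r"[^\W\d]+", sample)

# Adding "_" to the negated set also drops the underscore
letters_only = re.findall(r"[^\W\d_]+", sample)

print(letters_or_underscore)  # ['abc', '_def']
print(letters_only)           # ['abc', 'def']
```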

Solution 2:

You cannot subtract character classes with the built-in re module, no.

Your best bet is to use the third-party regex module (available on PyPI), a more feature-rich alternative to the standard re module. It supports character classes based on Unicode properties:

\p{IsAlphabetic}

This will match any character that the Unicode specification states is an alphabetic character.

Even better, regex does support character class subtraction; it views such classes as sets and allows you to create a difference with the -- operator (set operations need the module's VERSION1 behaviour, e.g. the (?V1) inline flag):

[\w--\d]

matches everything in \w except anything that also matches \d.
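As a rough sketch of how the set difference might be used: the snippet below assumes the third-party regex package is installed (pip install regex) and falls back to an equivalent negated class from the standard re module if it is not. The helper name non_digit_word_runs and the sample string are hypothetical, chosen only for this illustration.

```python
import re

try:
    import regex  # third-party package, assumed installed via "pip install regex"
except ImportError:
    regex = None


def non_digit_word_runs(text):
    """Return runs of word characters that are not digits (hypothetical helper)."""
    if regex is not None:
        # (?V1) turns on VERSION1 behaviour, which enables set operators like "--"
        return regex.findall(r"(?V1)[\w--\d]+", text)
    # Fallback with the standard library: the equivalent negated class
    return re.findall(r"[^\W\d]+", text)


print(non_digit_word_runs("abc123_def"))  # ['abc', '_def']
```

Both branches should give the same result here, since [\w--\d] and [^\W\d] describe the same set of characters.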

Solution 3:

You can exclude classes using a negative lookahead assertion, such as r'(?!\d)[\w]' to match a word character, excluding digits. For example:

>>> import re
>>> re.search(r'(?!\d)[\w]', '12bac')
<_sre.SRE_Match object at 0xb7779218>
>>> _.group(0)
'b'

To exclude more than one group, you can use the usual [...] syntax in the lookahead assertion; for example, r'(?![0-5])[\w]' would match any word character (letters, digits and the underscore) except for the digits 0-5.
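A small sketch of the [0-5] case using the standard re module; the input string is invented for illustration. For a single character, the lookahead form should behave the same as the negated class [^\W0-5], which the second call checks.

```python
import re

sample = "a1b6_9"  # hypothetical input

# The lookahead rejects 0-5 first; [\w] then consumes one remaining word character
with_lookahead = re.findall(r"(?![0-5])[\w]", sample)

# Equivalent negated class: not a non-word character and not 0-5
with_negated_class = re.findall(r"[^\W0-5]", sample)

print(with_lookahead)      # ['a', 'b', '6', '_', '9']
print(with_negated_class)  # ['a', 'b', '6', '_', '9']
```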

As with [...], the above construct matches a single character. To match multiple characters, add a repetition operator:

>>> re.search(r'((?!\d)[\w])+', '12bac15')
<_sre.SRE_Match object at 0x7f44cd2588a0>
>>> _.group(0)
'bac'
