
Replace With Multi Line Regex

Given the following text, I want to remove the whole data_augmentation_options { random_horizontal_flip { ... } } block (... stands for other text in the input below), i.e., the input is: ... bat

Solution 1:

You are only matching a single line instead of all the lines.

You can repeat the lines of the form keypoint_flip_permutation: \d+ and then match the two closing curly braces.

Note that you don't need re.S (DOTALL), as there is no dot in the pattern; \s already matches newlines.
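A quick way to check this, as a minimal sketch on a small hypothetical input (the question's real input is truncated): run the same substitution with and without re.S and compare the results.

```python
import re

# Small hypothetical input; not the question's actual text.
text = (
    "data_augmentation_options {\n"
    "  random_horizontal_flip {\n"
    "    keypoint_flip_permutation: 1\n"
    "  }\n"
    "}\n"
)
pattern = r"data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*"

# re.S only changes whether "." matches a newline. The pattern contains no ".",
# and \s already matches "\n", so the flag has no effect here.
plain = re.sub(pattern, "", text)
dotall = re.sub(pattern, "", text, flags=re.S)
print(repr(plain), plain == dotall)  # '' True
```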

data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*

Explanation

  • data_augmentation_options { Match literally
  • \s+random_horizontal_flip\s+ Match 1+ whitespace chars, random_horizontal_flip and 1+ whitespace chars (the starting line of the inner block)
  • { Match literally
  • (?: Non capture group
    • \s+keypoint_flip_permutation: \d+ Match 1+ whitespace chars, the string keypoint_flip_permutation: and a space, followed by 1+ digits
  • )+ Repeat 1+ times
  • \s*} Match optional whitespace chars and }
  • \s*} Match optional whitespace chars and }
  • \s* Match optional whitespace chars
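The breakdown above maps one-to-one onto a re.VERBOSE version of the pattern, where each piece can carry its explanation as a comment. This is a sketch, not the answer's original code: in verbose mode unescaped whitespace is ignored, so the literal spaces are written as "\ " and the braces are escaped. The sample input is hypothetical.

```python
import re

# Same pattern as above, one commented piece per explanation bullet.
VERBOSE_PATTERN = re.compile(
    r"""
    data_augmentation_options\ \{               # match literally
    \s+random_horizontal_flip\s+                # the starting line
    \{                                          # match literally
    (?:\s+keypoint_flip_permutation:\ \d+)+     # 1+ permutation lines
    \s*\}                                       # optional whitespace, then }
    \s*\}                                       # optional whitespace, then }
    \s*                                         # optional trailing whitespace
    """,
    re.VERBOSE,
)
COMPACT_PATTERN = r"data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*"

# Hypothetical input.
text = (
    "a: 1\n"
    "data_augmentation_options {\n"
    "  random_horizontal_flip {\n"
    "    keypoint_flip_permutation: 0\n"
    "    keypoint_flip_permutation: 1\n"
    "  }\n"
    "}\n"
    "b: 2\n"
)

verbose_result = VERBOSE_PATTERN.sub("", text)
compact_result = re.sub(COMPACT_PATTERN, "", text)
print(verbose_result == compact_result)  # True
```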

If you want to remove only the line break after the closing } (and keep any following blank lines and indentation), you can match \r?\n at the end instead of \s*.
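To see the difference between the two endings, here is a sketch on a hypothetical input with a blank line after the block: \s* swallows that blank line too, while \r?\n stops after the single line break.

```python
import re

# Hypothetical input with a blank line after the block to be removed.
text = (
    "batch_size: 4\n"
    "data_augmentation_options {\n"
    "  random_horizontal_flip {\n"
    "    keypoint_flip_permutation: 0\n"
    "  }\n"
    "}\n"
    "\n"
    "num_steps: 30\n"
)

BLOCK = r"data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}"

# Ending in \s* also consumes the blank line that follows the block.
greedy_end = re.sub(BLOCK + r"\s*", "", text)
# Ending in \r?\n removes only the line break right after the last }.
newline_end = re.sub(BLOCK + r"\r?\n", "", text)

print(repr(greedy_end))   # 'batch_size: 4\nnum_steps: 30\n'
print(repr(newline_end))  # 'batch_size: 4\n\nnum_steps: 30\n'
```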


For example:

import re

# s holds the input text
print(re.sub(r"data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*", "", s))
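A self-contained version of the one-liner above, as a sketch: the input string s here is hypothetical, modelled on the question, because the real input is truncated. Only the random_horizontal_flip block matches; the random_crop_image block is kept.

```python
import re

# Hypothetical input modelled on the question; the real input is truncated.
s = """train_config {
  batch_size: 4
  num_steps: 30
  data_augmentation_options {
    random_horizontal_flip {
      keypoint_flip_permutation: 1
      keypoint_flip_permutation: 0
      keypoint_flip_permutation: 12
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_aspect_ratio: 0.5
    }
  }
}
"""

pattern = r"data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*"

# Remove the random_horizontal_flip block; other blocks do not match.
result = re.sub(pattern, "", s)
print(result)
```

The trailing \s* consumes the newline and the next line's indentation, while the removed block's own leading indentation stays in place, so the following data_augmentation_options block keeps its original indentation.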

Solution 2:

A few modifications to your regex give:

data_augmentation_options {\s+random_horizontal_flip\s+{(\s+keypoint_flip_permutation:\s\d+\s)+\s+}\s+}
  • replace [\s] by just \s, which is equivalent
  • put the \s+ inside the capture group ()
  • replace \d by \d+ to match multi-digit numbers
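A sketch of this pattern on a hypothetical input. Unlike Solution 1 it stops right after the outer }, so the line break that ended the removed block survives as an empty line; appending \s* or \r?\n would remove it as well.

```python
import re

# Hypothetical input; the inner closing brace is indented.
text = (
    "num_steps: 30\n"
    "data_augmentation_options {\n"
    "  random_horizontal_flip {\n"
    "    keypoint_flip_permutation: 1\n"
    "    keypoint_flip_permutation: 10\n"
    "  }\n"
    "}\n"
    "batch_size: 4\n"
)

solution_2 = r"data_augmentation_options {\s+random_horizontal_flip\s+{(\s+keypoint_flip_permutation:\s\d+\s)+\s+}\s+}"

# \d+ matches the two-digit value 10 as well as 1.
result = re.sub(solution_2, "", text)
print(repr(result))  # 'num_steps: 30\n\nbatch_size: 4\n'
```

Because the \s at the end of the group already consumes the newline, the following \s+ needs at least one more whitespace char before the inner }. With LF line endings that means the inner } must be indented. The group is also never referenced, so (?:...) would work just as well as (...).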

Solution 3:

You can use a lookahead to stop the deletion at the next data_augmentation_options block:

>>> print(re.sub(r'^[ \t]*data_augmentation_options[\s\S]+?(?=^[ \t]*data_augmentation_options)', '\n\n', s, flags=re.M))

        batch_size: 4
        num_steps: 30
        

        data_augmentation_options {
            random_crop_image {
                min_aspect_ratio: 0.5
                max_aspect_ratio: 1.7
                random_coef: 0.25
            }
        }

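(?=^[ \t]*data_augmentation_options) is the lookahead: it asserts that the next data_augmentation_options line starts at that position without consuming it, so the lazy [\s\S]+? stops there and the following block is kept.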
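One consequence worth checking: this pattern is anchored on data_augmentation_options, not on random_horizontal_flip, so it removes whichever block comes first and can never remove the last one (no block follows it). A sketch on a hypothetical input where the random_horizontal_flip block comes last:

```python
import re

# Hypothetical input in which the random_horizontal_flip block comes last.
s = (
    "batch_size: 4\n"
    "data_augmentation_options {\n"
    "  random_crop_image {\n"
    "    random_coef: 0.25\n"
    "  }\n"
    "}\n"
    "data_augmentation_options {\n"
    "  random_horizontal_flip {\n"
    "    keypoint_flip_permutation: 0\n"
    "  }\n"
    "}\n"
)

lookahead = r"^[ \t]*data_augmentation_options[\s\S]+?(?=^[ \t]*data_augmentation_options)"
result = re.sub(lookahead, "\n\n", s, flags=re.M)

# The crop block is removed and the flip block survives.
print("random_horizontal_flip" in result)  # True
print("random_crop_image" in result)       # False
```

With three or more blocks, re.sub keeps matching left to right, so every block except the last would be replaced. The pattern from Solution 1, which names random_horizontal_flip explicitly, does not depend on block order.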

