Finding A Pattern Match And Concatenating The Rest Of Lines In Python

I have a small data set to clean. I have opened the text file in PyCharm. The data set looks like this: Code-6667+ Name of xyz company+ Address + Number+ Contact person+ Code-6668+

Solution 1:

Your question wasn't really clear, but the following snippet prints one line per company, starting with "CodeXXXX - " and followed by the other details.

with open(FILEPATH, 'r') as f:
    current_line = None
    for line in f:
        line = line.strip()
        if line.startswith('Code-'):
            # new company: print the one collected so far
            if current_line is not None:
                print(current_line)

            # create a line that starts with 'CodeXXXX - '
            current_line = line.replace('-', '').replace('+', '') + ' - '
        elif current_line is not None:
            current_line += line
            current_line += ' '

    # print the last company, which has no following 'Code-' line
    if current_line is not None:
        print(current_line)

Output of your example code:

Code6667 - Name of xyz company+ Address + Number+ Contact person+ 
Code6668 - Name of abc company,Address, number, contact person+ 
Code6669 - name of company , Address+ number , contact person + 

Solution 2:

I don't know what the + signs mean in your example. If they are part of the file, you'll want to deal with them as well. Here is a way to extract the data (with a regex) into a dictionary, with the code as key and the info as a list. Afterwards you can format it however you want.

This assumes that entries on the same line are separated by commas, but it can be adapted for any other separator. It also relies on the fact that, in your example, every code is on its own line with no info after it.

import re

res = {}

with open('in.txt', 'r') as f:
    current = None
    for line in f:
        # a line such as 'Code-6667+' starts a new entry
        if re.match(r"Code-\d+", line):
            current = line.strip()
            res[current] = []
            continue
        # any other line holds comma-separated details for the current code
        if current:
            res[current] += line.strip().split(",")

print(res)

result:

{'Code-6667+': ['Name of xyz company+', 'Address +', 'Number+', 'Contact person+'], 'Code-6668+': ['Name of abc company', 'Address', ' number', ' contact person+'], 'Code-6669+': ['name of company ', ' Address+', 'number ', ' contact person +']}
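
The adaptation to other separators mentioned above could look roughly like the following. This is only a sketch: the function name `parse_records` and its `sep_pattern` parameter are made up for illustration, and it assumes that both `,` and `+` act as field separators, which the question does not confirm.

```python
import re

def parse_records(lines, sep_pattern=r"[,+]"):
    """Group detail fields under the most recent 'Code-NNNN' line.

    sep_pattern is an assumption: here ',' and '+' both split fields.
    """
    res = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if re.match(r"Code-\d+", line):
            # drop the trailing '+' from the code itself
            current = line.rstrip("+").strip()
            res[current] = []
            continue
        if current is not None:
            fields = [f.strip() for f in re.split(sep_pattern, line)]
            # skip the empty strings left behind by trailing separators
            res[current] += [f for f in fields if f]
    return res

sample = [
    "Code-6667+",
    "Name of xyz company+ ",
    "Address +",
    "Code-6668+",
    "Name of abc company,Address, number, contact person+",
]
print(parse_records(sample))
```

Because the function takes any iterable of lines, an open file object can be passed in place of `sample`.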

Solution 3:

(Note: I'm not quite sure whether you want to keep the + sign. The following code assumes you do. Otherwise it's easy to get rid of the + with a bit of string manipulation.)
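
If the + signs are not wanted, the string manipulation hinted at above might be as small as this. It is a sketch under the assumption that the + only matters at the end of a line; `strip_plus` is a hypothetical helper name, not part of the answer's code.

```python
def strip_plus(text):
    """Remove a trailing '+' and any surrounding whitespace from one line."""
    return text.strip().rstrip("+").strip()

print(strip_plus("Code-6667+"))
print(strip_plus("Address +\n"))
print(strip_plus("Name of xyz company+ "))
```

A + in the middle of a line is left alone, so a line holding several fields would still need to be split first.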

 Input file

Here is the input file...

dat1.txt:

Code-6667+
Name of xyz company+ 
Address +
Number+ 
Contact person+
Code-6668+
Name of abc company,Address, number, contact person+
Code-6669+
name of company , Address+
number , contact person +

Code

Here is the code (comment or uncomment the lines in the print block to switch between Python 2.x and 3.x):

mycode.py:

import sys
print(sys.version)

# open input text file
f = open("dat1.txt", "r")

# initialise our final output - a phone book
phone_book = {}

# parse text file data to phone book, in a specific format
code = ''
for line in f:
    if line[:5] == 'Code-':
        # 'Code-6667+' becomes the key 'Code6667+'
        code = (line[:4] + line[5:]).strip()
        phone_book[code] = []
    elif code:
        phone_book[code].append(line.strip())
    else:
        continue

# close text file
f.close()

# print result to console (for ease of debugging). Comment this block if you want:
for key, value in phone_book.items():
    # python 3.x
    print("{0} - Company details: {1}".format(key, value))
    # python 2.x
    # print key + " - Company details: " + "".join(value)

# write phone_book to dat2.txt
f2 = open("dat2.txt", "w")
for key, value in phone_book.items():
    f2.write("{0} - Company details: {1}\n".format(key, value))
f2.close()

 Output

Here is what you will see in console (via print()) or dat2.txt (via f2.write())...

Code6667+ - Company details: ['Name of xyz company+', 'Address +', 'Number+', 'Contact person+']
Code6668+ - Company details: ['Name of abc company,Address, number, contact person+']
Code6669+ - Company details: ['name of company , Address+', 'number , contact person +']
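
If plain text is preferred over the list representation shown above, the details can be joined into one string before printing or writing. A minimal sketch, assuming a dictionary shaped like `phone_book`; `format_entry` and its `sep` parameter are hypothetical names introduced here.

```python
def format_entry(code, details, sep=" "):
    """Return 'code - detail detail ...' as a single line of text."""
    return "{0} - {1}".format(code, sep.join(details))

# a small dictionary in the same shape as phone_book
phone_book = {
    "Code6667+": ["Name of xyz company+", "Address +", "Number+", "Contact person+"],
    "Code6668+": ["Name of abc company,Address, number, contact person+"],
}

for code, details in phone_book.items():
    print(format_entry(code, details))
```

The same call could be used inside the loop that writes dat2.txt, with a newline appended to each line.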
