
Filtering Results Of A Counter Function

I'd like to ask for help finishing my Python code. I have a huge text file with 3 columns. The first has user names, for example: user_003. The second has the number of visits, fo

Solution 1:

Since you have a "huge text file", a faster approach is to use pandas, which does the counting in compiled code instead of in Python-level for loops (which are slow).

Code

import pandas as pd

df = pd.read_csv("bigfile.txt", header=None, sep=r'\s+')  # Read whitespace-separated file into a DataFrame
df.columns = ['users', 'visits', 'dates']                 # Name columns
n = 1                                                     # top n, i.e. could be 1, 2, 3, etc.

# Most frequent user
print(df['users'].value_counts()[:n])

# Most frequent visit
print(df['visits'].value_counts()[:n])

Example

File: bigfile.txt

user_123    visit_188   1330796847
user_123    visit_188   1330797173
user_123    visit_189   1330802227
user_123    visit_189   1330802277
user_123    visit_190   1330806287
user_123    visit_190   1330806353
user_123    visit_190   1330806353
user_456    visit_191   1330806354

Result for df['users'].value_counts()[:n] shows user_123 occurred 7 times:

user_123    7
Name: users, dtype: int64

Result for df['visits'].value_counts()[:n] shows visit_190 occurred 3 times:

visit_190    3
Name: visits, dtype: int64
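To try Solution 1 without creating bigfile.txt, here is a minimal sketch that reads the example rows above from an in-memory io.StringIO buffer. It assumes pandas is installed. Passing names= to read_csv is simply a shortcut for assigning df.columns afterwards, and .head(n) returns the same rows as [:n].

```python
import io

import pandas as pd

# The example rows from bigfile.txt above, inlined so the snippet runs on its own
sample = """user_123    visit_188   1330796847
user_123    visit_188   1330797173
user_123    visit_189   1330802227
user_123    visit_189   1330802277
user_123    visit_190   1330806287
user_123    visit_190   1330806353
user_123    visit_190   1330806353
user_456    visit_191   1330806354
"""

# Same parsing as Solution 1, but from an in-memory buffer instead of a file
df = pd.read_csv(io.StringIO(sample), header=None, sep=r'\s+',
                 names=['users', 'visits', 'dates'])

n = 2  # top n rows of each count
top_users = df['users'].value_counts().head(n)    # counts sorted from most to least frequent
top_visits = df['visits'].value_counts().head(n)
print(top_users)
print(top_visits)
```

With n = 2, visit_188 and visit_189 both occur twice and tie for second place. Because head(n), like [:n], keeps exactly n rows, only one of the tied visits is shown.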

Solution 2:

It is also possible without any libraries. This prints only the most frequent (user, visit) pair, together with its count.

data = """user_123    visit_188   1330796847
user_123    visit_188   1330797173
user_123    visit_188   1330797173
user_123    visit_188   1330797173
user_123    visit_189   1330802227
user_123    visit_189   1330802277
user_123    visit_190   1330806287
user_123    visit_190   1330806353
"""

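c = {}
for line in data.splitlines():
    if not line.strip():              # skip blank lines so they aren't counted as an empty key
        continue
    idx = tuple(line.split()[:2])     # (user, visit) pair
    if idx in c:
        c[idx] += 1
    else:
        c[idx] = 1
ordered = sorted(c.items(), key=lambda x: x[1], reverse=True)  # highest count first
print(ordered[0])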
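The manual dictionary counting in Solution 2 can also be written with collections.Counter. Counter is part of the standard library, so this still needs no third-party packages. A minimal sketch on the same data string, where most_common(n) replaces the sorted(...) call and returns the n highest-count (key, count) pairs:

```python
from collections import Counter

data = """user_123    visit_188   1330796847
user_123    visit_188   1330797173
user_123    visit_188   1330797173
user_123    visit_188   1330797173
user_123    visit_189   1330802227
user_123    visit_189   1330802277
user_123    visit_190   1330806287
user_123    visit_190   1330806353
"""

# Count (user, visit) pairs; the `if line.split()` guard skips blank lines
pair_counts = Counter(
    tuple(line.split()[:2]) for line in data.splitlines() if line.split()
)

top = pair_counts.most_common(1)[0]  # (key, count) of the most frequent pair
print(top)  # → (('user_123', 'visit_188'), 4)
```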

Solution 3:

You need to parse the user name and the visit ID out of each line and maintain two separate counters:

import re
from collections import Counter

visit_counter = Counter()
user_counter = Counter()
rex = re.compile(r'^(\w+)\s+(visit_\d+)')  # user name, whitespace, then a visit ID like visit_188
with open("bigfile.txt", "r") as f:
    for line in f:  # iterate lazily instead of reading the whole huge file into memory
        m = rex.search(line)
        if m:
            user = m[1]
            visit = m[2]
            user_counter[user] += 1
            visit_counter[visit] += 1
most_common_visits, most_common_visits_number = visit_counter.most_common(1)[0]
print('most common visits:', most_common_visits, 'number:', most_common_visits_number)
print('most common user:', user_counter.most_common(1)[0][0])
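Since the goal is to filter the counter results, two common filters are a minimum-count threshold and a top-n that keeps ties (most_common(n) cuts at exactly n entries). Below is a minimal sketch, assuming counts like those Solution 3's visit_counter would hold for the example bigfile.txt from Solution 1. top_with_ties and min_count are illustrative names, not part of any library.

```python
from collections import Counter


def top_with_ties(counter, n):
    """Return the n most common items plus any items tied with the n-th count."""
    if n <= 0 or not counter:
        return []
    ranked = counter.most_common()                # all items, highest count first
    cutoff = ranked[min(n, len(ranked)) - 1][1]   # count of the n-th item
    return [(k, v) for k, v in ranked if v >= cutoff]


# Counts visit_counter would hold for the example bigfile.txt (assumed input)
visit_counter = Counter({'visit_188': 2, 'visit_189': 2, 'visit_190': 3, 'visit_191': 1})

# Filter: keep only visits seen at least min_count times (illustrative threshold)
min_count = 2
frequent = {k: v for k, v in visit_counter.items() if v >= min_count}

print(frequent)
print(top_with_ties(visit_counter, 2))
```

Here most_common(2) would return visit_190 and only one of the two visits tied at a count of 2, while top_with_ties(visit_counter, 2) keeps both.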
