Filtering Results Of A Counter Function
I'd like to ask for help with finishing my Python code. I have a huge text file, filled with 3 columns: First has user names, for example: user_003 Second has number of visits, fo
Solution 1:
Since you have a "huge text file", a faster method would be to use pandas, which avoids Python-level for loops (these are slow on large inputs).
Code
import pandas as pd
df = pd.read_csv("bigfile.txt", header=None, sep=r'\s+')  # Read whitespace-separated file into a DataFrame
df.columns = ['users', 'visits', 'dates']  # Name columns
# Most frequent user
n = 1  # top n, i.e. could be 1, 2, 3, etc.
print(df['users'].value_counts()[:n])
# Most frequent visit
print(df['visits'].value_counts()[:n])
Example
File: bigfile.txt
user_123 visit_188 1330796847
user_123 visit_188 1330797173
user_123 visit_189 1330802227
user_123 visit_189 1330802277
user_123 visit_190 1330806287
user_123 visit_190 1330806353
user_123 visit_190 1330806353
user_456 visit_191 1330806354
Result for df['users'].value_counts()[:n] shows user_123 occurred 7 times:
user_123    7
Name: users, dtype: int64
Result for df['visits'].value_counts()[:n] shows visit_190 occurred 3 times:
visit_190    3
Name: visits, dtype: int64
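For readers without pandas installed, the same counts for the example file can be cross-checked with collections.Counter from the standard library. This is a minimal sketch, assuming the sample lines are inlined as a string in place of reading bigfile.txt; the variable names are illustrative only.

```python
from collections import Counter

# The example file contents, inlined so the sketch runs without bigfile.txt
sample = """user_123 visit_188 1330796847
user_123 visit_188 1330797173
user_123 visit_189 1330802227
user_123 visit_189 1330802277
user_123 visit_190 1330806287
user_123 visit_190 1330806353
user_123 visit_190 1330806353
user_456 visit_191 1330806354
"""

# Split each non-blank line into its three whitespace-separated columns
rows = [line.split() for line in sample.splitlines() if line.strip()]
user_counts = Counter(row[0] for row in rows)   # column 1: user names
visit_counts = Counter(row[1] for row in rows)  # column 2: visit IDs

n = 1  # top n, mirroring the pandas snippet
print(user_counts.most_common(n))   # [('user_123', 7)]
print(visit_counts.most_common(n))  # [('visit_190', 3)]
```

Counter.most_common(n) plays the same role here as value_counts()[:n] does in pandas.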
Solution 2:
It is also possible without any libraries. This version counts (user, visit) pairs and prints only the most frequent pair together with its count.
data = """user_123 visit_188 1330796847
user_123 visit_188 1330797173
user_123 visit_188 1330797173
user_123 visit_188 1330797173
user_123 visit_189 1330802227
user_123 visit_189 1330802277
user_123 visit_190 1330806287
user_123 visit_190 1330806353
"""
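c = {}
for line in data.splitlines():
    if not line.strip():
        continue  # skip blank lines so no empty tuple gets counted
    idx = tuple(line.split()[:2])  # (user, visit) pair
    if idx in c:
        c[idx] += 1
    else:
        c[idx] = 1
# Sort pairs by count, highest first
ordered = sorted(c.items(), key=lambda x: x[1], reverse=True)
print(ordered[0])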
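The manual dictionary bookkeeping above is exactly what collections.Counter does, and Counter is part of the standard library, so it still needs no third-party packages. A minimal sketch of the same pair count, assuming the same inlined sample data:

```python
from collections import Counter

data = """user_123 visit_188 1330796847
user_123 visit_188 1330797173
user_123 visit_188 1330797173
user_123 visit_188 1330797173
user_123 visit_189 1330802227
user_123 visit_189 1330802277
user_123 visit_190 1330806287
user_123 visit_190 1330806353
"""

# Count (user, visit) pairs; blank lines are skipped so no empty tuple is counted
pair_counts = Counter(
    tuple(line.split()[:2]) for line in data.splitlines() if line.strip()
)

# most_common(1) replaces the explicit sort; it returns a list with one (key, count) tuple
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)  # ('user_123', 'visit_188') 4
```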
Solution 3:
You need to parse out the specific user names and visit IDs and maintain two separate counters:
import re
from collections import Counter
with open("bigfile.txt", "r") as f:
    data = f.read()
visit_counter = Counter()
user_counter = Counter()
# Capture the user name (first column) and the visit ID (second column)
rex = re.compile(r'^(\w+)\s+(visit_\d+)')
for line in data.split('\n'):
    m = rex.search(line)
    if m:
        user = m[1]
        visit = m[2]
        user_counter[user] += 1
        visit_counter[visit] += 1
most_common_visits, most_common_visits_number = visit_counter.most_common(1)[0]
print('most common visits:', most_common_visits, 'number:', most_common_visits_number)
print('most common user:', user_counter.most_common(1)[0][0])
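Solution 3 reads the whole file into memory with f.read() before splitting it. For a genuinely huge file, a variant that iterates over the file object line by line keeps memory use flat. The sketch below also shows one way to report ties, since most_common(1) returns a single entry even when several keys share the top count (equal counts come back in first-seen order). The function names count_users_and_visits and leaders are hypothetical, and io.StringIO stands in for the real bigfile.txt so the example is self-contained.

```python
import io
import re
from collections import Counter

# Same pattern as Solution 3: user name, whitespace, then a visit ID
rex = re.compile(r'^(\w+)\s+(visit_\d+)')

def count_users_and_visits(lines):
    """Count users and visits from any iterable of lines (e.g. an open file)."""
    user_counter = Counter()
    visit_counter = Counter()
    for line in lines:  # one line at a time; the whole file is never held in memory
        m = rex.search(line)
        if m:
            user_counter[m[1]] += 1
            visit_counter[m[2]] += 1
    return user_counter, visit_counter

def leaders(counter):
    """Return every key tied for the highest count, sorted alphabetically."""
    if not counter:
        return []
    top = max(counter.values())
    return sorted(key for key, count in counter.items() if count == top)

# io.StringIO stands in for open("bigfile.txt") in this sketch
fake_file = io.StringIO(
    "user_123 visit_188 1330796847\n"
    "user_123 visit_188 1330797173\n"
    "user_456 visit_189 1330802227\n"
    "user_456 visit_189 1330802277\n"
)
users, visits = count_users_and_visits(fake_file)
print(leaders(users))   # ['user_123', 'user_456']
print(leaders(visits))  # ['visit_188', 'visit_189']
```

With a real file, the call would be `with open("bigfile.txt") as f: users, visits = count_users_and_visits(f)`.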