
Python3, Nested Dict Comparison (recursive?)

I'm writing a program to take a .csv file and create 'metrics' for ticket closure data. Each ticket has one or more time entries; the goal is to grab the 'delta' (ie - time differe

Solution 1:

The problem is that a nested loop like the one you implemented examines every pair of entries twice, so the same ticket gets matched more than once. Let me illustrate:

ticket_list = [111111, 111111, 666666, 777777]  # let's simplify by considering the ids only

# I'm trying to keep the same variable names
for i, key1 in enumerate(ticket_list):  # outer loop
    cnt = 1
    for h, key2 in enumerate(ticket_list):  # inner loop
        if key1 == key2 and i != h:
            print('>> match on i:', i, '- h:', h)
            cnt += 1
    print('Found', key1, cnt, 'times')

See how it reports 111111 twice:

>> match on i: 0 - h: 1
Found 111111 2 times
>> match on i: 1 - h: 0
Found 111111 2 times
Found 666666 1 times
Found 777777 1 times

That's because you match 111111 both when the outer loop is on the first position and the inner on the second (i: 0, h: 1), and again when the outer is on the second position and the inner is on the first (i: 1, h: 0).
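As a minimal sketch of counting each id exactly once, collections.Counter from the standard library makes a single pass over the list. This is only an illustration on the simplified ticket_list above, not the approach used for the deltas later on:

```python
from collections import Counter

ticket_list = [111111, 111111, 666666, 777777]

# Counter walks the list once and keeps one tally per distinct id
counts = Counter(ticket_list)

for ticket_id, cnt in counts.items():
    print('Found', ticket_id, cnt, 'times')
```

Each id is now reported a single time, with no need to compare positions.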


A proposed solution

A better solution for your problem is to group the entries of the same ticket together and then sum your deltas. groupby is ideal for your task. Here I took the liberty to rewrite some code:

I modified the constructor to accept the dictionary itself, which makes passing the parameters less messy later. I also removed the methods that added the deltas; we'll see why further down.

import csv
import itertools
from datetime import datetime


class Time_Entry(object):

    def __init__(self, entry):
        # entry is one row of the csv, as a dict keyed by column name
        self.ticket_no = entry['Ticket #']
        self.time_entry_day = entry['Time Entry Day']
        self.opened = datetime.strptime(entry['Opened'], '%Y-%m-%d %H:%M:%S.%f')
        self.closed = datetime.strptime(entry['Closed'], '%Y-%m-%d %H:%M:%S.%f')
        self.start = datetime.strptime(entry['Start'], '%Y-%m-%d %H:%M:%S.%f')
        self.end = datetime.strptime(entry['End'], '%Y-%m-%d %H:%M:%S.%f')
        # deltas for this single entry, in seconds
        self.total_open_close_delta = (self.closed - self.opened).seconds
        self.total_start_end_delta = (self.end - self.start).seconds

    def display(self):
        print('Ticket #: %7.7s Start: %-15s End: %-15s Delta: %-10s' % (self.ticket_no, self.start.time(), self.end.time(), self.total_start_end_delta))
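One detail worth checking in the deltas above: timedelta.seconds holds only the seconds part of the difference and leaves out whole days, while timedelta.total_seconds() returns the full duration. A small sketch with invented timestamps (not taken from the original csv) shows the difference; whether it matters depends on whether a ticket can stay open for more than 24 hours:

```python
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S.%f'

# invented timestamps, one day and one hour apart
opened = datetime.strptime('2017-01-01 09:00:00.000', fmt)
closed = datetime.strptime('2017-01-02 10:00:00.000', fmt)

delta = closed - opened
print(delta.seconds)               # 3600: the day part is left out
print(int(delta.total_seconds()))  # 90000: the full duration in seconds
```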

Here we load the data using a list comprehension; the final output will be a list of Time_Entry objects:

with open('metrics.csv') as ticket_list:
    time_entry_list = [Time_Entry(line) for line in csv.DictReader(ticket_list)]

print(time_entry_list)
# [<Time_Entry object at 0x101142f60>, <Time_Entry object at 0x10114d048>, <Time_Entry object at 0x1011fddd8>, ... ]

In the nested-loop version you instead kept rebuilding the Time_Entry inside the inner loop, which means that for 100 entries you end up initializing 10000 temporary objects! Creating the list once, outside the loops, lets us initialize each Time_Entry only once.
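To make the 10000 versus 100 figure concrete, here is a rough sketch that counts constructions. CountingEntry and the rows are hypothetical stand-ins, not the real Time_Entry or csv data:

```python
class CountingEntry:
    """Hypothetical stand-in for Time_Entry that counts constructions."""
    created = 0

    def __init__(self, row):
        CountingEntry.created += 1
        self.ticket_no = row['Ticket #']


rows = [{'Ticket #': str(n)} for n in range(100)]  # made-up rows

# nested-loop style: a new object for every (outer, inner) pair
for row1 in rows:
    for row2 in rows:
        CountingEntry(row2)
nested_count = CountingEntry.created

# build-once style: one object per row
CountingEntry.created = 0
entries = [CountingEntry(row) for row in rows]
list_count = CountingEntry.created

print(nested_count, list_count)  # 10000 100
```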

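Here comes the magic: we can use groupby to collect all the objects with the same ticket_no in the same list. Note that groupby only merges consecutive items with equal keys, so the list has to be sorted by ticket_no first: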

# sorted() returns a new list, so the result has to be kept
time_entry_list = sorted(time_entry_list, key=lambda x: x.ticket_no)
ticket_grps = itertools.groupby(time_entry_list, key=lambda x: x.ticket_no)

# materialize each group; the names avoid shadowing the built-in id()
tickets = [(ticket_no, list(entries)) for ticket_no, entries in ticket_grps]

The final result in tickets is a list of tuples, with the ticket id in the first position and the list of associated Time_Entry objects in the second:

print(tickets)
# [('737385', [<Time_Entry object at 0x101142f60>]),
#  ('737318', [<Time_Entry object at 0x10114d048>]),
#  ('737238', [<Time_Entry object at 0x1011fdd68>, <Time_Entry object at 0x1011fde80>]),
#  ...]
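Because groupby starts a new group every time the key changes, unsorted input splits one ticket across several groups. A minimal sketch with made-up ticket numbers (plain strings instead of Time_Entry objects) illustrates this:

```python
import itertools

ids = ['737238', '737385', '737238']  # made-up ticket numbers, unsorted

# without sorting, the two '737238' items are not adjacent
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(ids)]

# after sorting, equal keys sit next to each other and form one group
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(ids))]

print(unsorted_groups)  # [('737238', 1), ('737385', 1), ('737238', 1)]
print(sorted_groups)    # [('737238', 2), ('737385', 1)]
```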

Finally, we can iterate over all the tickets and, again with a list comprehension, build a list containing only the deltas so that we can sum them. This is why we removed the old methods that updated the deltas: each entry now simply stores its own value, and the sum is done externally.

Here is your result:

for ticket in tickets:
    print('ticket:', ticket[0])
    # extract the list of deltas and then sum them
    print('Delta open / close:', sum([entry.total_open_close_delta for entry in ticket[1]]))
    print('Delta start / end:', sum([entry.total_start_end_delta for entry in ticket[1]]))
    print('(found {} occurrences)'.format(len(ticket[1])))
    print()

Output:

ticket: 736964
Delta open / close: 17012
Delta start / end: 420
(found 1 occurrences)

ticket: 737197
Delta open / close: 18715
Delta start / end: 840
(found 1 occurrences)

ticket: 737220
Delta open / close: 7980
Delta start / end: 360
(found 1 occurrences)

ticket: 737238
Delta open / close: 34718
Delta start / end: 540
(found 2 occurrences)

ticket: 737261
Delta open / close: 9992
Delta start / end: 600
(found 1 occurrences)

ticket: 737273
Delta open / close: 9223
Delta start / end: 660
(found 1 occurrences)

ticket: 737296
Delta open / close: 6957
Delta start / end: 240
(found 1 occurrences)

ticket: 737318
Delta open / close: 8129
Delta start / end: 1860
(found 1 occurrences)

ticket: 737385
Delta open / close: 10401
Delta start / end: 2340
(found 1 occurrences)
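For anyone who wants to try the idea without the original file, here is a compact end-to-end sketch that reads an in-memory csv, sorts, groups and sums the start/end deltas. The sample rows, the ticket numbers and the helper start_end_delta are invented for illustration; only the column names and the timestamp format come from the code above, and plain dicts are used instead of Time_Entry to keep it short:

```python
import csv
import io
import itertools
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S.%f'

# invented sample standing in for metrics.csv
SAMPLE = """Ticket #,Start,End
100001,2017-01-01 09:00:00.000,2017-01-01 09:05:00.000
100002,2017-01-01 10:00:00.000,2017-01-01 10:39:00.000
100001,2017-01-01 11:00:00.000,2017-01-01 11:04:00.000
"""


def start_end_delta(row):
    """Seconds between the Start and End columns of one row (hypothetical helper)."""
    start = datetime.strptime(row['Start'], FMT)
    end = datetime.strptime(row['End'], FMT)
    return int((end - start).total_seconds())


# sort first so that groupby sees equal ticket numbers next to each other
rows = sorted(csv.DictReader(io.StringIO(SAMPLE)), key=lambda r: r['Ticket #'])

totals = {
    ticket_no: sum(start_end_delta(r) for r in group)
    for ticket_no, group in itertools.groupby(rows, key=lambda r: r['Ticket #'])
}

print(totals)  # {'100001': 540, '100002': 2340}
```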

The moral of the story: list comprehensions can be super useful, as they allow you to do a lot with a very compact syntax. The Python standard library also contains plenty of ready-to-use tools that can really come to your aid, so get familiar with it!
