
Choosing Right Data Structure To Parse A File

I have a csv file with contents in the following format:

CSE110, Mon, 1:00 PM, Fri, 1:00 PM
CSE114, Mon, 8:00 AM, Wed, 8:00 AM, Fri, 8:00 AM

which is basically course name followe

Solution 1:

Start by noting a few things about your data:

  1. You have a number of unique strings (the courses)
  2. After each course, there are a number of strings (the days and times the class meets each week)

With that, you have a series of unique keys that each have a number of values.

Sounds like a dictionary to me.

To get that data into a dictionary, start by reading the file line by line. For each line, you can either use regular expressions to select each [day], [hour]:[minutes] [AM/PM] section or plain old str.split() to break the line into fields at the commas. The course string becomes the key into the dictionary, with the rest of the line stored as a tuple or list of values. Then move on to the next line.
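A minimal sketch of that approach, assuming every line looks like the sample in the question (a course name followed by alternating day and time fields). The function name parse_schedule and the in-memory sample are illustrative and not part of the original post; the csv module is used here in place of a manual split so that the space after each comma is dropped automatically.

```python
import csv
import io

def parse_schedule(lines):
    """Map each course name to a list of (day, time) tuples."""
    schedule = {}
    # skipinitialspace=True discards the blank that follows each comma
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        course, rest = row[0], row[1:]
        # Pair up the alternating day and time fields
        schedule[course] = list(zip(rest[0::2], rest[1::2]))
    return schedule

# Sample data copied from the question; a real file object works the same way
sample = io.StringIO(
    "CSE110, Mon, 1:00 PM, Fri, 1:00 PM\n"
    "CSE114, Mon, 8:00 AM, Wed, 8:00 AM, Fri, 8:00 AM\n"
)
schedule = parse_schedule(sample)
print(schedule["CSE110"])  # [('Mon', '1:00 PM'), ('Fri', '1:00 PM')]
```

To read from disk instead, pass an open file (opened with newline='') to parse_schedule in place of the StringIO object.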


Solution 2:

{
    'CSE114': {'Mon': ['8:00 AM'], 'Wed': ['8:00 AM'], 'Fri': ['8:00 AM']},
    'CSE110': {'Mon': ['1:00 PM'], 'Fri': ['1:00 PM']}
}

Use a dictionary of this form: each course maps to a dictionary of days, and each day maps to a list of times, because a course can have multiple slots on the same day.

When you read the csv file, create an entry for the course and the day (if it doesn't already exist) and assign it a single-element list holding the time. If an entry for that course and day is already present, just append to the existing list; this means the course meets more than once on that day.
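The create-or-append step can be sketched with dict.setdefault. This assumes the lines have already been broken into (course, day, time) triples; the function name build_timetable and the second Monday slot in the sample are hypothetical, added only to show the append case.

```python
def build_timetable(rows):
    """Build {course: {day: [time, ...]}} from (course, day, time) triples."""
    timetable = {}
    for course, day, time in rows:
        # setdefault creates the inner dict or list the first time a key
        # is seen, then append adds the time to whatever list is there
        timetable.setdefault(course, {}).setdefault(day, []).append(time)
    return timetable

rows = [
    ("CSE110", "Mon", "1:00 PM"),
    ("CSE110", "Fri", "1:00 PM"),
    ("CSE110", "Mon", "3:00 PM"),  # hypothetical second slot on the same day
]
timetable = build_timetable(rows)
print(timetable["CSE110"]["Mon"])  # ['1:00 PM', '3:00 PM']
```

A collections.defaultdict would work as well; setdefault is used here so the result stays a plain dict.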

You don't need a regex to find the category of the input line. The first and second types that you have (i.e. single day and multiple days) can be told apart like this:

l = line.split(', ')
try:
    n = int(l[1])  # n = strength
except ValueError:
    # The second element in the list is not an integer, so this is a
    # course/day/time line: continue adding it to the dictionary
    pass
