
How To Convert Nested Json Structure Having Varying List (as Dictionary Values) To Dataframe

I converted a JSON into a DataFrame and ended up with a column 'Structure_value' whose values are lists of one or more dictionaries: Structure_value [{'Room'

Solution 1:

For this particular data structure, I hope the following helps.

Source Data

import pandas as pd

s_v = [[{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],[{'Room': [6], 'Length': 22}], [{'Room': [6,6], 'Length': 8}]]
df = pd.DataFrame({'Structure_value':s_v})
df

Out[1]:

    Structure_value
0   [{'Room': [6], 'Length': 7}, {'Room': [6], 'Le...
1   [{'Room': [6], 'Length': 22}]
2   [{'Room': [6, 6], 'Length': 8}]

Normalization

# Split a lone dict with several rooms into one dict per room, repeating its Length
df['tmp'] = df['Structure_value'].apply(lambda x: [{'Room': [v], 'Length': x[0]['Length']} for v in x[0]['Room']] if len(x) == 1 and isinstance(x[0]['Room'], list) else x)
pd.DataFrame(df['tmp'].values.tolist())

Out[2]:

     0                            1
0   {'Room': [6], 'Length': 7}    {'Room': [6], 'Length': 7}
1   {'Room': [6], 'Length': 22}   None
2   {'Room': [6], 'Length': 8}    {'Room': [6], 'Length': 8}

You said that this structure is suitable for your further processing.
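If flat columns are needed afterwards, the same idea (one entry per room, with Length repeated) can be taken one step further. The following is a minimal sketch, not part of the original answer: the helper name to_wide_row is hypothetical, and the column names following the Structure_value_room_N / Structure_value_length_N pattern are an assumption about the desired output.

```python
import pandas as pd

s_v = [
    [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],
    [{'Room': [6], 'Length': 22}],
    [{'Room': [6, 6], 'Length': 8}],
]

def to_wide_row(structures):
    """Hypothetical helper: turn one cell (a list of dicts) into a flat dict of columns."""
    row = {}
    n = 0
    for d in structures:
        # Accept both a list of rooms and a single room value
        rooms = d['Room'] if isinstance(d['Room'], list) else [d['Room']]
        for room in rooms:
            n += 1
            row[f'Structure_value_room_{n}'] = room
            row[f'Structure_value_length_{n}'] = d['Length']  # Length repeats per room
    return row

# Rows with fewer rooms get NaN in the columns they do not fill
wide = pd.DataFrame([to_wide_row(cell) for cell in s_v])
print(wide)
```

Because missing entries become NaN, the affected columns are stored as floats; this can be changed afterwards with a nullable integer dtype if that matters.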


Solution 2:

I could not handle your Structure_value presentation as a JSON file, and I don't know whether the rows represent several separate files. I used [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}] as file1, [{'Room': [6], 'Length': 22}] as file2, and [{'Room': [6,6], 'Length': 8}] as file3.
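To try the code in this answer, the three samples have to exist as JSON files on disk. A minimal sketch for creating them is shown here; mapping file1, file2 and file3 to house1.json, house2.json and house3.json is an assumption based on the file names used in the calls further down, and the scratch directory is only for illustration.

```python
import json
import tempfile
from pathlib import Path

# The three sample structures, keyed by an assumed file name
samples = {
    'house1.json': [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],
    'house2.json': [{'Room': [6], 'Length': 22}],
    'house3.json': [{'Room': [6, 6], 'Length': 8}],
}

out_dir = Path(tempfile.mkdtemp())  # scratch directory; any writable folder works
paths = {}
for name, structures in samples.items():
    path = out_dir / name
    with open(path, 'w') as f:
        json.dump(structures, f)
    paths[name] = path

# Read one file back to confirm the round trip
with open(paths['house3.json']) as f:
    reloaded = json.load(f)
print(reloaded)
```

The paths stored in paths can then be passed to a function that expects a file name.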

import json

import pandas as pd

# Flatten one structure dict into a list: [label, value, value, ..., label, value]
def process_structure(s):

    specs = []

    for label, quantity in s.items():

        if isinstance(quantity, list):
            specs.append(label)
            specs.extend(quantity)
        elif isinstance(quantity, int):
            specs.append(label)
            specs.append(quantity)

    return specs

# Open one JSON file and build a one-row DataFrame from it
def treat_json(file):

    with open(file, 'r') as f:
        structures = json.load(f)

    load_df = []

    for struct in structures:

        to_df = process_structure(struct)

        rooms   = []
        val_len = None
        label   = None

        # Collect every room and the length that belong to this structure
        for item in to_df:
            if item in ('Room', 'Length'):
                label = item
            elif label == 'Room':
                rooms.append(item)
            elif label == 'Length':
                val_len = item

        # One (room, length) pair per room, repeating Length
        for room in rooms:
            load_df.append(room)
            load_df.append(val_len)

    while len(load_df) < 4:  # pad the row if it is not complete
        load_df.append(None)

    df_temp = pd.DataFrame([load_df], columns=['Structure_value_room_1', 'Structure_value_length_1', 'Structure_value_room_2', 'Structure_value_length_2'])

    return df_temp

This is the output:

treat_json('house3.json')
    Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                         8

[1 rows x 4 columns]

treat_json('house2.json')
    Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                      None

[1 rows x 4 columns]

treat_json('house1.json')

    Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                         7

[1 rows x 4 columns]
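The printed frames above hide the two middle columns behind "...". A small sketch of how all four columns could be displayed: the frame below is a stand-in built by hand from the file1 sample (an assumption, not the return value of the function above), and DataFrame.to_string() is used because it renders every column regardless of the terminal width.

```python
import pandas as pd

columns = [
    'Structure_value_room_1', 'Structure_value_length_1',
    'Structure_value_room_2', 'Structure_value_length_2',
]

# Stand-in for a one-row result: two rooms, each with Room 6 and Length 7
df_temp = pd.DataFrame([[6, 7, 6, 7]], columns=columns)

text = df_temp.to_string()  # full rendering, no column truncation
print(text)
```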
