How To Convert Nested Json Structure Having Varying List (as Dictionary Values) To Dataframe
I converted a JSON file into a DataFrame and ended up with a column 'Structure_value' whose values are lists of one or more dictionaries: Structure_value [{'Room'
Solution 1:
For this particular data structure, the following should help.
Source Data
import pandas as pd

s_v = [[{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],[{'Room': [6], 'Length': 22}], [{'Room': [6,6], 'Length': 8}]]
df = pd.DataFrame({'Structure_value':s_v})
df
Out[1]:
Structure_value
0 [{'Room': [6], 'Length': 7}, {'Room': [6], 'Le...
1 [{'Room': [6], 'Length': 22}]
2 [{'Room': [6, 6], 'Length': 8}]
Normalization
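# A single dict whose 'Room' is a list becomes one dict per room value;
# rows that already hold one dict per room are left unchanged.
df['tmp'] = df['Structure_value'].apply(
    lambda x: [{'Room': [v], 'Length': x[0]['Length']} for v in x[0]['Room']]
    if len(x) == 1 and isinstance(x[0]['Room'], list) else x)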
pd.DataFrame(df['tmp'].values.tolist())
Out[2]:
0 1
0 {'Room': [6], 'Length': 7} {'Room': [6], 'Length': 7}
1 {'Room': [6], 'Length': 22} None
2 {'Room': [6], 'Length': 8} {'Room': [6], 'Length': 8}
As you mentioned, data in this structure is suitable for your further processing.
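The normalization rule above can also be written as a named function, which may be easier to read and test than the lambda. This is only a sketch: the name `expand_rooms` is hypothetical, and unlike the lambda it expands every dictionary in a row rather than only rows with exactly one dictionary.

```python
def expand_rooms(entries):
    """Return one {'Room': [value], 'Length': length} dict per room value."""
    expanded = []
    for entry in entries:
        rooms = entry['Room']
        # Assumption: a bare value is treated like a one-element list.
        if not isinstance(rooms, list):
            rooms = [rooms]
        for room in rooms:
            expanded.append({'Room': [room], 'Length': entry['Length']})
    return expanded


s_v = [[{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],
       [{'Room': [6], 'Length': 22}],
       [{'Room': [6, 6], 'Length': 8}]]

normalized = [expand_rooms(row) for row in s_v]
for row in normalized:
    print(row)
```

With pandas, the same function could be passed to df['Structure_value'].apply(expand_rooms) in place of the lambda.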
Solution 2:
I could not treat your Structure_value representation as a single JSON file, and I don't know whether the values come from several separate files. I therefore used [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}] as file1, [{'Room': [6], 'Length': 22}] as file2, and [{'Room': [6,6], 'Length': 8}] as file3.
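To reproduce this setup, the three sample files could be written out as follows. This is a sketch under assumptions: the names house1.json, house2.json and house3.json are taken from the calls shown in the outputs, their mapping to file1, file2 and file3 is assumed, and the helper `write_samples` is hypothetical.

```python
import json
import os
import tempfile

# Assumed mapping of file1..file3 to the file names used in the outputs.
SAMPLES = {
    'house1.json': [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],
    'house2.json': [{'Room': [6], 'Length': 22}],
    'house3.json': [{'Room': [6, 6], 'Length': 8}],
}


def write_samples(directory):
    """Write each sample structure to its own JSON file; return the paths."""
    paths = []
    for name, content in SAMPLES.items():
        path = os.path.join(directory, name)
        with open(path, 'w') as f:
            json.dump(content, f)
        paths.append(path)
    return paths


# A temporary directory keeps the sketch from touching the working directory.
sample_dir = tempfile.mkdtemp()
paths = write_samples(sample_dir)
print(paths)
```

Each returned path can then be passed to a function that expects a JSON file name.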
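# flatten one irregular structure into [label, value, ..., label, value]
def process_structure(s):
    specs = []
    for label, quantity in s.items():
        if isinstance(quantity, list):
            # list value: the label followed by every element
            specs.append(label)
            for elem in quantity:
                specs.append(elem)
        elif isinstance(quantity, int):
            # scalar value: the label followed by the value
            specs.append(label)
            specs.append(quantity)
    return specs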
# open and process the JSON files
import json
import pandas as pd

def treat_json(file):
    columns = ['Structure_value_room_1', 'Structure_value_length_1',
               'Structure_value_room_2', 'Structure_value_length_2']
    load_df = []
    with open(file, 'r') as f:
        structures = json.load(f)
    for struct in structures:
        to_df = process_structure(struct)
        # the value right after the 'Length' label applies to every room
        valLen = to_df[to_df.index('Length') + 1]
        # the ints right after the 'Room' label are the room values
        i = to_df.index('Room') + 1
        while i < len(to_df) and isinstance(to_df[i], int):
            load_df.append(to_df[i])
            load_df.append(valLen)  # repeat Length for each room
            i += 1
    while len(load_df) < 4:  # pad if the row is incomplete
        load_df.append(None)
    df_temp = pd.DataFrame([load_df], columns=columns)
    return df_temp
Here are the outputs:
treat_json('house3.json')
Structure_value_room_1 ... Structure_value_length_2
0 6 ... 8
[1 rows x 4 columns]
treat_json('house2.json')
Structure_value_room_1 ... Structure_value_length_2
0 6 ... None
[1 rows x 4 columns]
treat_json('house1.json')
Structure_value_room_1 ... Structure_value_length_2
0 6 ... 7
[1 rows x 4 columns]
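When the structures are already in memory, the same four-column layout can be built without reading files. The following is a minimal sketch, not the method used above: `flatten_row` and its `slots` parameter are hypothetical names, and it assumes every dictionary has both a 'Room' and a 'Length' key.

```python
def flatten_row(entries, slots=2):
    """Map a list of Room/Length dicts onto numbered room/length columns."""
    pairs = []
    for entry in entries:
        rooms = entry['Room']
        if not isinstance(rooms, list):
            rooms = [rooms]
        # Each room value is paired with the Length of its own dictionary.
        for room in rooms:
            pairs.append((room, entry['Length']))
    if len(pairs) > slots:
        raise ValueError('more room/length pairs than available slots')
    # Pad missing slots with None so every row has the same columns.
    pairs += [(None, None)] * (slots - len(pairs))
    row = {}
    for n, (room, length) in enumerate(pairs, start=1):
        row['Structure_value_room_%d' % n] = room
        row['Structure_value_length_%d' % n] = length
    return row


print(flatten_row([{'Room': [6, 6], 'Length': 8}]))
print(flatten_row([{'Room': [6], 'Length': 22}]))
```

Passing a list of such rows to pd.DataFrame would give one DataFrame row per structure, with None in the unused slots.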