
How to Use to_json and from_json to Eliminate Nested StructFields in a PySpark DataFrame?

This solution, in theory, works perfectly for what I need, which is to create a copy of a dataframe while excluding certain nested StructFields. Here is a minimally re

Solution 1:

It should work. You just need to adjust new_schema so that it describes only the column 'big', not the whole dataframe:

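from pyspark.sql.functions import from_json, to_json
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

new_schema = ArrayType(StructType([StructField("keep", StringType())]))

test_df = df.withColumn("big", from_json(to_json("big"), new_schema))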
