How to Use to_json and from_json to Eliminate Nested StructFields in a PySpark DataFrame?
This solution, in theory, works perfectly for what I need, which is to create a new copy of a DataFrame while excluding certain nested StructFields. Here is a minimally re
Solution 1:
It should work; you just need to adjust new_schema so that it describes the type of the column 'big' only, not the schema of the whole DataFrame:
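from pyspark.sql.functions import from_json, to_json
from pyspark.sql.types import ArrayType, StringType, StructField, StructType
new_schema = ArrayType(StructType([StructField("keep", StringType())]))
test_df = df.withColumn("big", from_json(to_json("big"), new_schema))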