Combine Pivoted And Aggregated Column In Pyspark Dataframe
My question is related to this one. I have a PySpark DataFrame, named df, as shown below.

date       | recipe | percent | volume
----------------------------------------
2019-01-0…
Solution 1:
Spark's source code has a special branch for pivoting with a single aggregation: in that case the output column name is just the pivot value, with no aggregate suffix appended.
val singleAgg = aggregates.size == 1
def outputName(value: Expression, aggregate: Expression): String = {
  val stringValue = value.name
  if (singleAgg) {
    stringValue            // <--- Here: no suffix for a single aggregation
  } else {
    val suffix = {...}
    stringValue + "_" + suffix
  }
}
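To see this naming behavior from the PySpark side, here is a minimal sketch. The DataFrame is a hypothetical reconstruction of the question's df; the volume values are invented for illustration.

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical reconstruction of the question's df; volume values are made up.
df = spark.createDataFrame(
    [("2019-01-01", "A", 0.025, 100),
     ("2019-01-01", "B", 0.05, 200),
     ("2019-01-02", "A", 0.11, 300),
     ("2019-01-02", "B", 0.06, 400)],
    ["date", "recipe", "percent", "volume"],
)

# Single aggregation: the pivot values alone become the column names.
pivoted = df.groupBy("date").pivot("recipe").agg(F.first("percent"))
print(pivoted.columns)  # ['date', 'A', 'B']

# Two aggregations: Spark appends each aggregate's name as a suffix.
multi = df.groupBy("date").pivot("recipe").agg(
    F.first("percent").alias("percent"),
    F.first("volume").alias("volume"),
)
print(multi.columns)  # ['date', 'A_percent', 'A_volume', 'B_percent', 'B_volume']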
I don't know the reason for this behavior, but the only remaining option is to rename the columns after the pivot. Here is a simplified renaming helper:
def rename(identity: Set[String], suffix: String)(df: DataFrame): DataFrame = {
  // Append `suffix` to every column name except those listed in `identity`.
  val fieldNames = df.schema.fields.map(field => field.name)
  val renamed = fieldNames.map { fieldName =>
    if (identity.contains(fieldName)) fieldName
    else fieldName + suffix
  }
  df.toDF(renamed: _*)
}
Usage:
rename(Set("date"), "_percent")(pivoted).show()
+----------+---------+---------+
|      date|A_percent|B_percent|
+----------+---------+---------+
|2019-01-01|    0.025|     0.05|
|2019-01-02|     0.11|     0.06|
+----------+---------+---------+
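Since the question is about PySpark, a rough Python equivalent of the Scala helper above might look like this. This is a sketch, assuming pivoted is the single-aggregation pivot built earlier.

from pyspark.sql import DataFrame

def rename(identity: set, suffix: str, df: DataFrame) -> DataFrame:
    # Keep columns listed in `identity` as-is; append `suffix` to the rest.
    renamed = [c if c in identity else c + suffix for c in df.columns]
    return df.toDF(*renamed)

rename({"date"}, "_percent", pivoted).show()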