Loading a JSON Dataset Into Spark, Then Using Filter, Map, Etc

I'm new to Apache Spark and would like to take a dataset saved in JSON (a list of dictionaries), load it into an RDD, and then apply operations like filter and map.

Solution 1:

You could do something like

import org.apache.spark.rdd.RDD
import org.json4s._
import org.json4s.native.JsonMethods._

// parseOpt returns None for lines that fail to parse, so flatMap quietly skips them
val jsonData: RDD[JValue] = sc.textFile(path).flatMap(line => parseOpt(line))

and then do your JSON processing on each JValue, for example

jsonData.foreach { json =>
  println(json \ "someKey")
  (json \ "id") match {
    case JInt(x) => ???  // handle an integer id
    case _       => ???  // handle a missing or non-integer id
  }
}
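Since the question asks about filter and map, here is a minimal sketch of chaining them on the parsed RDD. The "id" field and the threshold are hypothetical stand-ins for whatever your records actually contain:

// Extract the (hypothetical) integer "id" from each record, dropping
// records that lack one, then filter on the extracted values.
val ids: RDD[BigInt] = jsonData.flatMap { json =>
  (json \ "id") match {
    case JInt(x) => Some(x)  // JInt wraps a BigInt
    case _       => None     // no integer id: drop this record
  }
}
val bigIds = ids.filter(_ > 100)

Using flatMap with an Option folds the "does this record qualify" check and the extraction into a single pass, rather than a separate filter followed by a map.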

Solution 2:

Have you tried applying json.loads() in the mapping?

import json

lines = sc.textFile('/path/to/file')
# each element becomes the Python object (typically a dict) parsed from one line
data = lines.map(lambda line: json.loads(line))
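Once parsed, each element of data is a plain Python dictionary, so filter and map chain directly on it, for example keeping only the records that contain a particular key and then mapping out that key's value.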
