Left join with stream

Gabriel Martins Dias
Hi guys,

I am using Scala 2.11.8 and Spark 2.0.1 to receive a CSV string through an API. 
This string has 3 fields: "id_X, id_Y, id_Z".
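A minimal sketch of parsing one such line on its own, assuming every field is an integer that may be padded with spaces (as in "id_X, id_Y, id_Z"). It reuses the quote-aware split regex from the code in this post, but trims each field and returns None instead of throwing on a malformed line. The names `ParseSketch`, `Ids` and `parseIds` are hypothetical, chosen for illustration.

```scala
// Sketch: parse "id_X, id_Y, id_Z" into three Ints without throwing.
// Assumption: all three fields are integers, possibly surrounded by spaces.
object ParseSketch {
  // Matches commas that are outside double-quoted sections.
  private val QuoteAwareComma = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"

  final case class Ids(idX: Int, idY: Int, idZ: Int)

  def parseIds(line: String): Option[Ids] = {
    // -1 keeps trailing empty fields so a missing value is detected
    val p = line.split(QuoteAwareComma, -1).map(_.trim)
    if (p.length != 3) None
    else
      try Some(Ids(p(0).toInt, p(1).toInt, p(2).toInt))
      catch { case _: NumberFormatException => None }
  }

  def main(args: Array[String]): Unit = {
    println(parseIds("1, 2, 3")) // Some(Ids(1,2,3))
    println(parseIds("1, 2"))    // None (wrong field count)
    println(parseIds("1, x, 3")) // None (not an integer)
  }
}
```

Note that without the `trim`, `" 2".toInt` would throw a `NumberFormatException`, so input with spaces after the commas needs it.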

Before saving the information into a Cassandra table X, I want to perform two left joins with other tables (Y and Z) and store "id_X, id_Y, name_Y, id_Z, name_Z".

It may happen that id_Y is not registered in Y; in that case, name_Y should be stored as null in the database.
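The left-join behaviour described above can be illustrated on plain in-memory data. This is a sketch, assuming `yNames` and `zNames` are hypothetical `id -> name` maps standing in for tables Y and Z; it is not the Cassandra connector API. A missing id maps to `None`, and `orNull` turns that into the null that should be stored.

```scala
// Sketch: LEFT JOIN semantics using in-memory lookup maps.
// Assumption: yNames / zNames are hypothetical stand-ins for tables Y and Z.
object LeftJoinSketch {
  final case class InputRegister(id_X: Int, id_Y: Int, name_Y: String, id_Z: Int, name_Z: String)

  // Fills name_Y and name_Z; an id with no match yields null.
  def enrich(reg: InputRegister, yNames: Map[Int, String], zNames: Map[Int, String]): InputRegister =
    reg.copy(
      name_Y = yNames.get(reg.id_Y).orNull,
      name_Z = zNames.get(reg.id_Z).orNull
    )

  def main(args: Array[String]): Unit = {
    val y = Map(10 -> "y-ten")
    val z = Map(20 -> "z-twenty")
    println(enrich(InputRegister(1, 10, null, 20, null), y, z)) // InputRegister(1,10,y-ten,20,z-twenty)
    println(enrich(InputRegister(2, 99, null, 20, null), y, z)) // InputRegister(2,99,null,20,z-twenty)
  }
}
```

With the connector itself, the same result could presumably be reached by chaining a second `leftJoinWithCassandraTable("keyspace", "Z", SomeColumns("name_Z"), SomeColumns("id_Z"))` on the `RDD[InputRegister]` that already has `name_Y` filled, then mapping its `Option[CassandraRow]` the same way; that variant is untested here.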

I started with this code to get the values: 

import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd.CassandraLeftJoinRDD
import org.apache.spark.streaming.dstream.DStream

case class InputRegister(id_X: Int, id_Y: Int, name_Y: String, id_Z: Int, name_Z: String)

// lines: DStream[String] with the CSV strings received from the API
val myDStream: DStream[InputRegister] = lines.map { value =>
  // split on commas outside double quotes; -1 keeps trailing empty fields
  val p = value.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1).map(_.trim)
  InputRegister(
    p(0).toInt, // id_X
    p(1).toInt, // id_Y
    null,       // name_Y, filled in by the join with Y
    p(2).toInt, // id_Z
    null        // name_Z, filled in by the join with Z
  )
}

myDStream.foreachRDD { rdd =>
  // Left join on id_Y, reading only name_Y from table Y
  val yRdd: CassandraLeftJoinRDD[InputRegister, CassandraRow] =
    rdd.leftJoinWithCassandraTable("keyspace", "Y", SomeColumns("name_Y"), SomeColumns("id_Y"))
  // TODO: left join with table Z
  val rddToSave = yRdd.map { case (reg, yRow) =>
    // yRow is None when id_Y is not in Y, so store null instead of calling .get
    reg.copy(name_Y = yRow.map(_.getString("name_Y")).orNull)
  }
  rddToSave.saveToCassandra("keyspace", "X")
}

Now, I have a few questions:
  • How do I properly chain the join with Z to produce the final values?
  • Is there a better way to do these joins? Y and Z change much less often than X, so it would be fine to cache their values.
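One possible caching approach, swapped in here as an alternative to a per-record Cassandra join, is to keep Y and Z as in-memory maps on the driver and reload them only after a time-to-live. A minimal sketch, assuming `load` is a hypothetical function that reads a whole lookup table and that the table fits in driver memory; the clock is injected so the behaviour can be checked without waiting.

```scala
// Sketch: driver-side lookup cache that reloads only after a TTL has passed.
// Assumptions: `load` is hypothetical and reads a whole table (e.g. Y or Z);
// the table is small enough to hold in memory.
final class RefreshingLookup[K, V](load: () => Map[K, V], ttlMillis: Long, now: () => Long) {
  private var cached: Map[K, V] = Map.empty
  private var loadedAt: Option[Long] = None

  // Returns the cached map, calling `load` again once the copy is ttlMillis old.
  def get(): Map[K, V] = synchronized {
    val t = now()
    if (loadedAt.forall(at => t - at >= ttlMillis)) {
      cached = load()
      loadedAt = Some(t)
    }
    cached
  }
}

object RefreshingLookupDemo {
  def main(args: Array[String]): Unit = {
    var clock = 0L
    var loads = 0
    val yLookup = new RefreshingLookup[Int, String](
      () => { loads += 1; Map(10 -> s"y-ten (load $loads)") },
      ttlMillis = 60000L,
      now = () => clock
    )
    println(yLookup.get()(10)) // prints "y-ten (load 1)"
    clock = 30000L
    println(yLookup.get()(10)) // prints "y-ten (load 1)", still cached
    clock = 60000L
    println(yLookup.get()(10)) // prints "y-ten (load 2)", reloaded after the TTL
  }
}
```

Since the function passed to `foreachRDD` runs on the driver, one could call `get()` there, wrap the map with `rdd.sparkContext.broadcast(...)`, and fill `name_Y` inside `rdd.map` via `bY.value.get(reg.id_Y).orNull`. The loader might be something like `() => sc.cassandraTable[(Int, String)]("keyspace", "Y").select("id_Y", "name_Y").collect().toMap`, assuming Y is small. The trade-off is that names can be stale for up to one TTL in exchange for avoiding a Cassandra lookup per record.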

Best regards,

--
Gabriel Martins Dias

Phone: +55 11 5082-2656 | +55 11 97347-6676 | Email: [hidden email]

--
You received this message because you are subscribed to the Google Groups "scala-user" group.