lansalo.com - Scala Notes – Scala

Description: Scala


How Not To Persist Spark Dataset / DataFrame to Neo4j (And How To?) 1

January 9, 2020 June 15, 2020 ~ lansaloltd

Spark provides an API for graphs and graph-parallel computation (GraphX), and Neo4j is probably the most popular graph database, so you might expect some sort of “bridge” between them. That’s actually the case, and Neo4j will direct you to the Neo4j-Spark-Connector. It’s a Scala-based implementation that uses the officially supported Neo4j Java driver behi

The scenario I’m considering could result from processing data with GraphX and then persisting the results in Neo4j, where they could then be read by other processes (including another Spark job). Probably because this is not such an unusual use case, people have asked about it on Stack Overflow and in this issue.

Neo4j-Spark-Connector allegedly covers this case, and there is also an example in Neo4jDataFrame.mergeEdgeList. If we take a look at the code, though, mergeEdgeList basically maps over the DataFrame and, for each row, calls a method called execute: