Configuring Zeppelin to work with Spark

October 20, 2015 - Spark

Here are the difficulties I faced getting Zeppelin to work with Spark. If either of them applies to you, you'll find this post useful:

  1. Zeppelin on Spark running via Mesos, where the Zeppelin interpreter returns a JsonMappingException every time a cluster map/reduce operation is invoked
  2. Zeppelin on Spark Standalone, where the Zeppelin app always shows "Disconnected"

Problem number 1 solution:

Sorry, I haven’t found a way around this yet. Will post if I figure this out.

Problem number 2 solution:

It turns out that Zeppelin was conflicting with Spark standalone's ports. Whenever I ran Zeppelin by itself, without the Spark standalone cluster running, Zeppelin's local Spark mode worked fine. As soon as I started the Spark standalone cluster, however, Zeppelin showed a "Disconnected" message at the top right and would neither open existing notebooks nor let me create new ones.

The key insight was that Zeppelin uses TWO ports for its interface: one for the Zeppelin front-end and one for its websocket connection. With default settings, a running Spark standalone cluster claims exactly the same two ports.

Default port mappings for Spark:

  • SPARK_MASTER_WEBUI_PORT=8080
  • SPARK_WORKER_WEBUI_PORT=8081

Default port mappings for Zeppelin:

  • server (front-end) port: 8080
  • websocket port: server port + 1 = 8081
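The overlap can be checked mechanically. The sketch below assumes, as in the list above, that Zeppelin puts its websocket on the server port + 1; the names `SPARK_DEFAULTS`, `zeppelin_ports` and `find_conflicts` are hypothetical helpers for illustration, not part of Spark or Zeppelin.

```python
# Default web UI ports of a Spark standalone cluster (values from the list above).
SPARK_DEFAULTS = {
    "SPARK_MASTER_WEBUI_PORT": 8080,
    "SPARK_WORKER_WEBUI_PORT": 8081,
}


def zeppelin_ports(server_port: int = 8080) -> dict:
    """Ports a Zeppelin server binds, assuming websocket = server port + 1."""
    return {"front-end": server_port, "websocket": server_port + 1}


def find_conflicts(spark_ports: dict, zeppelin: dict) -> list:
    """Return (spark_setting, zeppelin_role, port) for every port both want."""
    conflicts = []
    for spark_name, spark_port in spark_ports.items():
        for role, z_port in zeppelin.items():
            if spark_port == z_port:
                conflicts.append((spark_name, role, spark_port))
    return conflicts


if __name__ == "__main__":
    for spark_name, role, port in find_conflicts(SPARK_DEFAULTS, zeppelin_ports()):
        print(f"{spark_name} and Zeppelin {role} both use port {port}")
```

With the defaults, both Zeppelin ports collide: the master UI with the front-end on 8080, and the worker UI with the websocket on 8081.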

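See the conflict? It isn't immediately obvious: Spark's master web UI and Zeppelin's front-end both default to 8080, and Spark's worker web UI and Zeppelin's websocket both default to 8081. The solution is to change either Spark's default web UI ports or the Zeppelin port. If you move Zeppelin, remember that its websocket moves with it to the new port + 1, so both of those ports need to be free.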
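If you choose to move Zeppelin, the requirement is that both the new server port and server port + 1 avoid Spark's ports and anything else already listening. Below is a minimal sketch of that search, again assuming the port + 1 websocket rule; `port_is_free` and `pick_zeppelin_port` are hypothetical helpers, and the socket check only reflects the moment it runs, so another process could still grab the port afterwards.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if a TCP socket can currently bind host:port on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def pick_zeppelin_port(reserved: set, start: int = 8080, limit: int = 100,
                       check_socket: bool = True) -> int:
    """Return the first server port p >= start such that neither p nor p + 1
    (the assumed websocket port) is in `reserved` or currently bound."""
    for p in range(start, start + limit):
        pair = (p, p + 1)
        if any(c in reserved for c in pair):
            continue
        if check_socket and not all(port_is_free(c) for c in pair):
            continue
        return p
    raise RuntimeError(f"no free port pair in [{start}, {start + limit})")
```

For example, `pick_zeppelin_port({8080, 8081}, check_socket=False)` skips 8080 and 8081 and returns 8082, leaving 8083 for the websocket. The chosen value would then go into Zeppelin's configuration (for example `ZEPPELIN_PORT` in `conf/zeppelin-env.sh`); alternatively, leave Zeppelin alone and set `SPARK_MASTER_WEBUI_PORT` / `SPARK_WORKER_WEBUI_PORT` in Spark's `conf/spark-env.sh`.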
