Should I use Scala or Python for Spark?

Python is generally slower but very easy to use, while Scala is faster and moderately easy to use. When Python code mainly makes calls into Spark libraries, the performance penalty is modest, but when a lot of the processing happens in Python itself, the code becomes much slower than the equivalent Scala code. Spark's API is intended for data processing and analysis and is available in multiple programming languages, including Java, Python and Scala. Choosing a language for Apache Spark is ultimately subjective: the reasons one data scientist or data analyst prefers Python or Scala may not apply to others.
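To make the distinction between "calling Spark libraries" and "processing in Python" concrete, here is a minimal PySpark sketch. It assumes PySpark is installed and runs Spark in local mode; the helper names (normalize_name, run_demo), the column names and the sample rows are hypothetical and chosen only for illustration. The first variant uses Spark's built-in column functions, while the second wraps ordinary Python logic in a UDF, which is the kind of code that tends to be slower.

```python
def normalize_name(raw):
    """Plain Python logic: trim whitespace and upper-case a name."""
    return None if raw is None else raw.strip().upper()


def run_demo():
    """Build the same column twice: with built-in functions and with a Python UDF."""
    # Imported lazily so the pure-Python helper above works without PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("udf-vs-builtin")
        .getOrCreate()
    )
    # Hypothetical sample data; real workloads would read from storage.
    df = spark.createDataFrame([("  alice ",), ("bob",), (None,)], ["name"])

    # Built-in functions: Python only describes the expression,
    # and Spark evaluates it inside the JVM.
    builtin = df.withColumn("clean", F.upper(F.trim(F.col("name"))))

    # Python UDF: values are sent to a Python worker process,
    # transformed there, and sent back to the JVM.
    normalize_udf = F.udf(normalize_name, StringType())
    with_udf = df.withColumn("clean", normalize_udf(F.col("name")))

    builtin.show()
    with_udf.show()
    spark.stop()
```

Calling run_demo() on a machine with PySpark available shows both variants side by side. On three rows the difference is not measurable; the gap described above should only become visible on large datasets, and how large it is depends on the workload.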

Refactoring code in a statically typed language like Scala is much easier and less error-prone than refactoring code in a dynamic language like Python. Data scientists often choose to learn both Scala and Python for Spark, but Python tends to be the second choice for Apache Spark because Scala support came first. If you already have experience with a statically typed language such as Java, you need not worry about picking up Scala. I'm working on a project called bebe that will hopefully provide the community with a high-performance, type-safe Scala programming interface.

However, when there is significant processing logic, performance becomes an important factor, and Scala offers better performance than Python for programming against Spark. A quick look at the salaries attached to Python and Scala skills suggests that Scala commands higher pay in the job market than Python. Scala lets you express general programming patterns concisely, keeping the number of lines of code low. It offers many advanced programming features, but you don't need any of them to write Spark code.

You can use the basic programming features of Scala with the IntelliJ IDE and get useful features like type hints and compile-time checks for free. There is growing demand for Scala developers because big data companies value developers who have mastered a productive and robust programming language for data analysis and processing in Apache Spark. Scala is a powerful programming language that offers developer-friendly features not available in Python.

Scala and Python are equally expressive in the context of Spark, so either language can deliver the desired functionality.