Are you proficient in Python but considering stepping into Scala?
Data Analyst Emma Grimaldi has shared her reflections on learning how to handle strings, lists, dictionaries and more in Scala after working in Python.
What has your experience of moving to another language been?
'I recently started playing a little with Scala, and I have to say it has been somewhat traumatic. I love learning new things, but after months of programming in Python, it is just not natural to set that aside and switch modes while solving Data Science problems. This is normal when learning a new language, whether it is a programming language or a spoken one: we tend to fill the gaps in what we don’t know with what we already know, even if it doesn’t belong to the language we are trying to write or speak!
When learning a new language, it is important to be completely surrounded by it, but first of all, at least in the beginning, it is important to establish clear parallels between the language you know and the new one. This works for me, a bilingual person who learned a second language very quickly as an adult. At the beginning, I needed connections between Italian (the language I knew) and English (the language I was learning), but as I became more and more fluent in English, I started to forget those parallels, because English was becoming natural and I no longer needed to translate in my head first.
In fact, the reason I decided to write this post is to establish parallels between Python and Scala for people who, like me, are fluent in one of the two and are starting to learn the other.
I initially wanted to focus on Pandas/Sklearn and Spark, but I realized that it doesn’t make much sense without covering the foundations first. This is why in this post we’ll look at the basics of Python and Scala: how to handle strings, lists, dictionaries and so on. In the near future, I intend to publish a second part covering how to handle dataframes and build predictive models in both languages.
1. First things first
The first difference is the naming convention used in these two languages: not following it will not throw an error or anything like that, but it is an unwritten rule that programmers follow.
When defining a new variable, function or anything else, we always pick a name that makes sense to us, and it will most likely be composed of two or more words. In that case, Python uses 'snake_case', while Scala uses 'camelCase', and the difference is immediately noticeable: in snake case, all words are lower-case and separated by '_'; in camel case, there is no separator, and every word is capitalized except for the first one.
Another striking difference is how variables are defined in the two languages. In Python we just pick a name and assign a value to it, while in Scala we need to specify whether we are defining a variable or a value, by placing 'var' or 'val' respectively before the name (this applies whether we are assigning numbers or strings).
The difference between 'var' and 'val' is simple: variables can be reassigned, while values cannot. In the example shown in the image, I instantiated a 'var' string and then changed it: all good. Then I assigned the same string to a 'val' and tried to change it again: not doable, because the compiler rejects the reassignment.
In Python there is no such distinction: if you want to change something you previously assigned, you simply reassign it. In Python’s case I would just write 'string = "my_string"'.
Initializing values and variables in Scala.
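On the Python side, the comparison fits in a few lines: a name is bound to a value with no keyword in front of it, and it can be rebound at any time. The Scala behaviour mentioned in the comment is taken from the paragraphs above and is not executed here.

```python
# Python has no 'var'/'val' keyword: a name is simply bound to a value.
string = "my_string"

# Rebinding the same name to a new value is always allowed.
# In Scala, this works for a 'var', while reassigning a 'val' is rejected by the compiler.
string = "my_new_string"

print(string)  # my_new_string
```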
Commenting is another general difference. In Python there is only one way to write a comment, whether it spans a single line or several: putting a '#' before the comment on each line:
'# this is a commented line in Python'
Scala offers a couple of ways to comment: either putting '//' at the beginning of each line, or wrapping the comment between '/*' and '*/':
'// this is a commented line in Scala'
'/* this is a multi-line comment in Scala */'
Now that the very basics are covered, let’s dive deeper.
2. Lists and arrays
Lists (in Python) and arrays (in Scala) are among the most important objects: they can contain strings and/or numbers, and we can manipulate them, iterate over them, add or remove elements and so on. They can serve basically any purpose, and I don’t think I have ever coded anything without using them, so let’s see what we can do with them, and how.
Let’s create a list containing a mix of numbers and strings.
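A minimal Python sketch of such a list follows; the element values are arbitrary choices for illustration, and the Scala line in the comment is only a counterpart, not run here.

```python
# A Python list can mix numbers and strings freely.
my_list = [1, 2, "three", 4.5, "five"]

print(my_list)       # [1, 2, 'three', 4.5, 'five']
print(len(my_list))  # 5

# Scala counterpart (not run here): val myArray = Array(1, 2, "three", 4.5, "five")
```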
Both lists and arrays are zero-indexed, which means that the first element is at index 0. So, if we want to extract the second element:
In both languages, the element at the end index is not included when slicing. So, if we want to extract the first 3 elements:
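Extracting the second element in Python could look like this (same illustrative list as before). Negative indices are a Python feature with no direct Scala array equivalent.

```python
my_list = [1, 2, "three", 4.5, "five"]

# Indexing is zero-based, so the second element sits at index 1.
second = my_list[1]
print(second)  # 2

# Python also accepts negative indices, counting from the end.
last = my_list[-1]
print(last)  # five

# Scala counterpart (not run here): myArray(1), using parentheses instead of brackets.
```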
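A short Python sketch of slicing the first three elements, showing that the end index is excluded:

```python
my_list = [1, 2, "three", 4.5, "five"]

# The end index is excluded, so [0:3] returns the elements at indices 0, 1 and 2.
first_three = my_list[0:3]
print(first_three)  # [1, 2, 'three']

# The start index can be omitted when slicing from the beginning.
assert my_list[:3] == first_three

# Scala counterpart (not run here): myArray.slice(0, 3), which also excludes index 3.
```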
2.4. Checking first, last, maximum and minimum element
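In Python, the first and last elements come from indexing, while the maximum and minimum come from the built-in max() and min(). The sketch uses a hypothetical all-numeric list, and also shows that comparing numbers with strings fails in Python 3.

```python
numbers = [3, 7, 1, 9, 4]

first = numbers[0]       # 3
last = numbers[-1]       # 4
largest = max(numbers)   # 9
smallest = min(numbers)  # 1
print(first, last, largest, smallest)  # 3 4 9 1

# Comparing numbers with strings is not supported in Python 3.
try:
    max([1, "two"])
except TypeError as err:
    print("max() failed:", err)

# Scala counterparts (not run here): numbers.head, numbers.last, numbers.max, numbers.min
```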
2.5. Sum and product
These operations, like min and max, are supported only if the lists/arrays contain exclusively numbers. Also, Scala has a built-in 'product' method to multiply all the elements of an array, whereas in Python the classic approach is a 'for' loop, which is covered further down in the post (since Python 3.8, the standard library also provides 'math.prod').
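A Python sketch of the sum and of the loop-based product, with math.prod (Python 3.8 or later) used as a cross-check:

```python
import math

numbers = [1, 2, 3, 4, 5]

# sum() is built in.
total = sum(numbers)
print(total)  # 15

# Multiplying all elements with an explicit for loop.
product = 1
for n in numbers:
    product *= n
print(product)  # 120

# Standard-library shortcut, available from Python 3.8.
assert math.prod(numbers) == product

# Scala counterparts (not run here): numbers.sum and numbers.product
```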
2.6. Adding elements
Lists and arrays preserve the order of their elements, and the most common practice is to add new elements at the end. Let’s say we want to add the string '"last words"':
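In Python, append() adds to the end of the list in place. The Scala comment notes that arrays there have a fixed size, so appending produces a new array.

```python
my_list = [1, 2, "three"]

# append() adds the element at the end, modifying the list in place.
my_list.append("last words")
print(my_list)  # [1, 2, 'three', 'last words']

# Scala counterpart (not run here): myArray :+ "last words"
# Scala arrays have a fixed size, so :+ returns a new array instead of changing the old one.
```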
If, for some reason, we want to add something at the very beginning, let’s say the number '99':
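Prepending in Python can be done in place with insert(0, ...) or by building a new list; both are sketched below.

```python
my_list = [1, 2, "three"]

# insert(0, x) places x before the current first element, in place.
my_list.insert(0, 99)
print(my_list)  # [99, 1, 2, 'three']

# Alternatively, build a new list by concatenation.
new_list = [99] + [1, 2, "three"]
assert new_list == my_list

# Scala counterpart (not run here): 99 +: myArray, which returns a new array.
```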
This is also something that we use all the time while coding; luckily, there is only a slight difference between the two languages.
4. For loop
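A minimal Python for loop over a list looks like this; the word list and the upper-casing step are illustrative choices, and the Scala line in the comment is only a counterpart.

```python
words = ["Python", "and", "Scala"]

# Iterate directly over the elements, with no index needed.
upper_words = []
for word in words:
    print(word)
    upper_words.append(word.upper())

# enumerate() yields (index, element) pairs when the index is needed too.
for i, word in enumerate(words):
    print(i, word)

# Scala counterpart (not run here): for (word <- words) println(word)
```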
5. Mapping and/or filtering
In Python, all of these can be done with list comprehensions; in Scala, we use methods such as 'map' and 'filter' instead.
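A sketch of mapping and filtering with Python list comprehensions, on a hypothetical list of integers:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Mapping: apply an expression to every element.
doubled = [n * 2 for n in numbers]
print(doubled)  # [2, 4, 6, 8, 10, 12]

# Filtering: keep only the elements that satisfy a condition.
evens = [n for n in numbers if n % 2 == 0]
print(evens)  # [2, 4, 6]

# Scala counterparts (not run here): numbers.map(_ * 2) and numbers.filter(_ % 2 == 0)
```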
5.3. Filtering and mapping
What if we want to find the even numbers and multiply only them by 3?
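In Python, a single comprehension can filter and map at once, while in Scala the two methods are chained:

```python
numbers = [1, 2, 3, 4, 5, 6]

# The condition filters first; the expression on the left then maps the survivors.
evens_times_three = [n * 3 for n in numbers if n % 2 == 0]
print(evens_times_three)  # [6, 12, 18]

# Scala counterpart (not run here): numbers.filter(_ % 2 == 0).map(_ * 3)
```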
6.1. Create dictionary/map
In Scala we can do this in two different ways.
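Python also offers two natural ways to build a dictionary, sketched below with hypothetical keys and values. The two Scala forms in the comment (arrow syntax and tuples) are common ways to build a Map; that they match the two ways meant here is an assumption.

```python
# A dictionary literal with key: value pairs.
my_dict = {"name": "Emma", "role": "Data Analyst"}

# The same dictionary built from a list of (key, value) tuples.
same_dict = dict([("name", "Emma"), ("role", "Data Analyst")])

print(my_dict == same_dict)  # True

# Common Scala counterparts (not run here):
#   Map("name" -> "Emma", "role" -> "Data Analyst")
#   Map(("name", "Emma"), ("role", "Data Analyst"))
```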
6.2. Adding to dictionary/map
Let’s add my country of origin to my dictionary/map.
To print each key-value pair of the dictionary/map, we loop over its keys and values in both languages.
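Adding a key in Python is a plain assignment. In the sketch below, "Italy" is only a hypothetical placeholder value, and the Scala comment distinguishes immutable maps (which return a new map) from mutable ones.

```python
my_dict = {"name": "Emma", "role": "Data Analyst"}

# Assigning to a new key adds it to the dictionary in place.
# "Italy" is only a placeholder value for this sketch.
my_dict["country"] = "Italy"
print(my_dict)  # {'name': 'Emma', 'role': 'Data Analyst', 'country': 'Italy'}

# Scala counterparts (not run here):
#   immutable Map: val updated = myMap + ("country" -> "Italy")
#   mutable Map:   myMap("country") = "Italy"
```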
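A Python sketch of looping over keys and values with items(); the lines are also collected in a list so the result can be checked.

```python
my_dict = {"name": "Emma", "role": "Data Analyst"}

# items() yields (key, value) pairs in insertion order (guaranteed since Python 3.7).
lines = []
for key, value in my_dict.items():
    line = f"{key}: {value}"
    print(line)
    lines.append(line)

# Scala counterpart (not run here): for ((key, value) <- myMap) println(s"$key: $value")
```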
Indentation is also important in Python: without correct indentation, the function will not work. Scala, instead, just likes its curly braces (although Scala 3 also accepts an indentation-based syntax).
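A small Python function illustrates both points: the body is defined by indentation, and the name follows snake_case. The function name is hypothetical; the Scala version in the comment uses braces and camelCase.

```python
def multiply_by_three(x):
    # The function body is the indented block under the 'def' line.
    result = x * 3
    return result


print(multiply_by_three(4))  # 12

# Scala counterpart (not run here), using curly braces and camelCase:
#   def multiplyByThree(x: Int): Int = {
#     x * 3
#   }
```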
That’s it! I hope you found this helpful as a quick reference if you are just starting to get familiar with either Python or Scala. The next step will be a similar guide exploring the differences between pandas/sklearn and Spark. I’m looking forward to it, and I hope you are too!
If you are wondering whether to use Python or Scala, I found the image below rather helpful in clarifying the immediate differences between the two.