Chapter 7: Type Erasure

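Type erasure is the process by which the compiler removes type-related information (specifically, generic type arguments) during compilation, so that it is no longer available at runtime. Angelika Langer explains this concept beautifully, and we summarize her explanation here.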

All statically typed languages have two phases: compilation and execution. The compilation phase is handled by the compiler, which translates our source code into the form the execution environment requires. In C and C++, for example, the source is compiled into machine code, while in JVM-based languages like Java and Scala, the code is compiled into bytecode.

Most programming languages have a feature called generics, or parameterized types, which helps us remove boilerplate and makes our code more reusable. Generics play an important part in a language because they help us design libraries and APIs. As per the article by Angelika, compilers handle generics using one of two techniques:

  • Code Specialization

  • Code Sharing

Code Specialization

In this approach, the compiler generates a separate, type-specific copy of the generic code for each type it is instantiated with. For example, if we use List[String] and List[Int], the compiler actually creates two different List classes, one for String and one for Int, as shown in the code below:

Note: The sample code below uses Scala syntax for learning purposes only; the Scala compiler itself follows the code sharing technique.

// Source
class List[T] {
	def contains(a: T): Boolean = { … }
}

new List[Int]
new List[String]

// Conceptually, after specialization: one copy of List per type argument
class List {
	public boolean contains(Integer a) { ... }
}

class List {
	public boolean contains(String a) { ... }
}

C++ is one of the languages that follows code specialization: when we use templates, the compiler generates separate machine code for each type the template is instantiated with.

Now, guess what the problem with that approach is?

The major problem with this approach is that we generate an unnecessarily large amount of code, which is called code bloat. Because of this duplicated code, memory consumption increases, and if we use generic libraries that are built on code specialization, every instantiation from those libraries adds to the bloat as well.
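To make the duplication concrete, here is a minimal hand-written sketch of what specialization amounts to. The names SpecializationSketch, IntList and StringList are hypothetical stand-ins for the copies a specializing compiler would generate; the Scala compiler does not produce them for an ordinary generic class.

```scala
object SpecializationSketch {
  // Hypothetical copy #1: the generic List with T fixed to Int.
  final class IntList(elems: Array[Int]) {
    def contains(a: Int): Boolean = elems.contains(a)
  }

  // Hypothetical copy #2: the same logic repeated with T fixed to String.
  final class StringList(elems: Array[String]) {
    def contains(a: String): Boolean = elems.contains(a)
  }

  val ints = new IntList(Array(1, 2, 3))
  val strings = new StringList(Array("a", "b"))

  // Each type argument ends up with its own class, and so its own code.
  val distinctClasses: Boolean = ints.getClass != strings.getClass
}
```

Every additional type argument would need another copy of the same method bodies, which is where the extra code size comes from.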

Code Sharing

In this approach, the compiler generates a single, shared piece of code for all types, adds the required conversions (casts) where they are needed, and adds a synthetic method (bridge method) where required. Compilers for JVM-based languages such as Java and Scala follow this principle.

Suppose that in Scala/Java we create a List[Int] and a List[String]. By default, the compiler replaces the type parameter T with the Object type (we can also restrict it to a more specific type, which we will discuss in chapter 9) and adds type conversions where we read elements from the list, as we can see in the code below:

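// Source
class List[T] {
	def contains(a: T): Boolean = { … }
	def head: T = { … }
}

val intHead = new List[Int].head
val stringHead = new List[String].head

// Conceptually, after erasure: one shared List, with casts added at the read sites
class List {
	public boolean contains(Object a) { … }
	public Object head() { … }
}

Integer intHead = (Integer) new List().head();
String stringHead = (String) new List().head();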

The code above is only a simplified picture of the bytecode, because the compiler adds a lot more when it compiles Scala code. The main points are that the compiler removes the type parameter from the bytecode and adds conversions (casts) where elements are read from the list.

The drawback of this approach is that the compiler needs to add conversions in various parts of the code, such as wherever an element is read, which makes the bytecode more verbose. Still, adding these conversions is not our responsibility: the compiler handles it gracefully, and our executables do not grow huge.
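The effect of code sharing can be observed at runtime. The following is a small sketch using the standard library List; the object name ErasureDemo and the value names are only for illustration. It checks that two lists with different element types share one runtime class, and that the compiler-inserted cast is what fails when a list is deliberately mistyped.

```scala
object ErasureDemo {
  val ints: List[Int] = List(1, 2, 3)
  val strings: List[String] = List("a", "b")

  // Both lists are backed by the same runtime class; the element type is erased.
  val sameRuntimeClass: Boolean = ints.getClass == strings.getClass

  // This cast succeeds, because at runtime only "is it a List?" can be checked.
  val disguised: List[String] = ints.asInstanceOf[List[String]]

  // The failure surfaces only when an element is read as a String,
  // which is where the compiler has inserted a cast to String.
  val readFails: Boolean =
    try {
      val first: String = disguised.head
      first.isEmpty // not reached
    } catch {
      case _: ClassCastException => true
    }
}
```

The mistyped list is created without any error; the ClassCastException comes from the read site, which matches the idea that the casts live where elements are taken out of the list.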

Bridge Method

A bridge method is a synthetic method that the compiler adds when it is required. I am not going to go too deep into this method, but it is an interesting role performed by the compiler. A bridge method is only added if our class or type extends another class or type that is parameterized (generic) and we override one or more of its methods, as in the code below:

import scala.collection.mutable.ListBuffer

class Stack[T] {
	private val list: ListBuffer[T] = new ListBuffer[T]

	def push(a: T): Unit = list += a

	def pop(): T = list.remove(list.size - 1)
}

class IntegerStack extends Stack[Int] {
	// Delegate to the superclass so that push and pop share the same buffer
	override def push(a: Int): Unit = super.push(a)
}

As we know, after compilation the type information is removed and each type parameter is replaced by its leftmost bound or, if it has none, by the default bound. In our case, the default bound Object comes into the picture. Let's see what the code looks like after compilation (simplified):

class Stack {
	private ListBuffer list = new ListBuffer();

	public void push(Object a) { list += a }

	public Object pop() { return list.remove(list.size() - 1); }
}

In our example, the subclass overrides the push method, but type erasure changes the signature of the superclass method to push(Object), so the subclass's push(int) no longer overrides it at the bytecode level. Here the compiler steps in and adds a synthetic method to the subclass to preserve the overriding relationship in bytecode.

class IntegerStack extends Stack {
	public void push(int a) { super.push(a); }

	// Synthetic bridge method added by the compiler
	public void push(Object a) { this.push(((Integer) a).intValue()); }
}

In the code above, the compiler adds the synthetic method push(Object), so the overriding rule is not violated. But this doesn't mean we can call the bridge method directly; if we really want to call it, we need to use reflection. For more detail, you can jump into Angelika Langer's FAQ.
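As a rough sketch of the reflection route, the code below looks up the push methods that a subclass of a generic stack declares in bytecode and invokes the Object-taking one. The wrapper object BridgeDemo and the helper pushViaBridge are hypothetical names introduced for this example; Method.isBridge, getDeclaredMethods, getMethod and invoke are standard Java reflection calls.

```scala
import scala.collection.mutable.ListBuffer

object BridgeDemo {
  class Stack[T] {
    private val list = ListBuffer.empty[T]
    def push(a: T): Unit = list += a
    def pop(): T = list.remove(list.size - 1)
  }

  class IntegerStack extends Stack[Int] {
    override def push(a: Int): Unit = super.push(a)
  }

  // The methods named push that IntegerStack itself declares in bytecode.
  private val pushMethods =
    classOf[IntegerStack].getDeclaredMethods.filter(_.getName == "push")

  // Parameter type names of the push methods flagged as bridges.
  val bridgeParamTypes: List[String] =
    pushMethods.filter(_.isBridge).map(_.getParameterTypes.head.getName).toList

  // Calls push(Object) reflectively, passing a boxed Integer.
  def pushViaBridge(stack: IntegerStack, value: Int): Unit = {
    val bridge = classOf[IntegerStack].getMethod("push", classOf[Object])
    bridge.invoke(stack, Integer.valueOf(value))
  }
}
```

A short usage example: after `val s = new BridgeDemo.IntegerStack` and `BridgeDemo.pushViaBridge(s, 42)`, calling `s.pop()` gives back the value that went in through the Object-taking method.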

Scala also offers a few other mechanisms, such as TypeTag, ClassTag, and WeakTypeTag, through which the compiler can carry type information that erasure would otherwise discard. You can explore them on Sinisa Louc's blog, where he explains them for the novice.
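As a brief illustration of ClassTag only, here is a minimal sketch of filtering a list by an element type that would normally be erased. The object ClassTagDemo and the method onlyOfType are hypothetical names for this example; scala.reflect.ClassTag and its unapply method are part of the standard library.

```scala
import scala.reflect.ClassTag

object ClassTagDemo {
  // The compiler supplies the ClassTag at the call site, so the runtime class
  // of T is available here even though T itself is erased.
  def onlyOfType[T](xs: List[Any])(implicit tag: ClassTag[T]): List[T] =
    xs.flatMap(x => tag.unapply(x))

  val strings: List[String] = onlyOfType[String](List(1, "a", 2.0, "b"))
}
```

Without the tag, the method would have no way to tell at runtime which elements are of type T.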
