July 5th, 2013
A core part of what we do at Fidesmo is implementing standards such as the GlobalPlatform Card Specification, ISO 8825-1 and MIFARE4Mobile. All of these standards are fairly low level and deal with how data is encoded before it is sent on the “wire”. Implementing a low-level standard is usually pretty straightforward; it is like having a customer specification written by an engineer. What has constantly challenged us instead is how to make these standards available to our customers in the language of their choice. We cannot write a version for every programming language out there. At the same time, our implementation must not be a source of faults. Most of our customers run big online transaction processing systems (such as online ticketing or issuance of access credentials), so fatal faults are not an option. All in all, we came up with the following requirements for the library:
- Easy to integrate into any language and platform
- Easy to integrate into existing systems
- Must never crash
- Must never leak memory
These might not seem like such tough requirements; the last two in particular are something any developer would claim their system conforms to. For example, software written in Java would not normally leak memory (unless you are actively trying to). So what is the big deal? It is the combination of these four requirements that has given us a big headache.
In times when the latest functional language is the cool kid in town, our choice of programming language might be a little surprising. It is C: not C++ or Objective-C, just plain old C. The reason? C is easily integrated into nearly any language; Java, C# and Python are examples. Actually, I do not know of any language that can interact with other languages but does not interact well with C. This is largely because many language implementations are themselves written in C. C is also a very portable language, with compilers for all major operating systems and platforms, including Android and iOS. This solves the first requirement, “Easy to integrate into any language and platform”.
To say that C is easily integrated into any language and platform is not the same as saying that the integrated code is easy to use. Therefore, for each language we support, we also provide a binding (or wrapper) that transforms the C library API into something that fits nicely into that language. This is not as easy as one might think. There are lots of concepts that we want to support, some of which need to be carefully designed. Take serialization as an example. It is not possible to serialize data that lives inside the C library. It has to be extracted into objects native to the host language before any serialization can be done. Another concept is that of a virtual machine. In languages that run on a virtual machine, the virtual machine prevents one part (or thread) of the program from accidentally crashing other parts. A segmentation fault in the C library would circumvent this protection and kill the entire virtual machine. This contradicts the requirements “Must never crash” and “Easy to integrate into existing systems”. How can this be solved?
The multi-language C library
The core of the problem with integrating a C library into another language is how to manage state: not only the state internal to the library itself, but also the state communicated over the binding. The solution is to keep the C library completely stateless. This means that each function in the library is given all the data required for its execution by the external caller. It is therefore the caller that manages the state, so any type of serialization is possible. This also implies that there is no need for memory allocation in the C library, which completely eliminates problems with memory leaks.
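To make the idea concrete, here is a minimal sketch of what a stateless encoding function could look like. The function name, status codes and signature are hypothetical and are not taken from our actual library; the sketch only assumes a single-byte tag and a BER-style length as described in ISO 8825-1. The caller supplies every input as well as the output buffer, and the function neither allocates memory nor touches any global data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes, for illustration only. */
enum { TLV_OK = 0, TLV_ERR_ARG = -1, TLV_ERR_SPACE = -2 };

/*
 * Encode one TLV (single-byte tag, BER definite length, value) into a
 * buffer owned by the caller. No globals, no malloc: everything the
 * function needs is passed in, and the only thing it writes to is
 * out / out_len.
 */
int tlv_encode(uint8_t tag, const uint8_t *value, size_t value_len,
               uint8_t *out, size_t out_cap, size_t *out_len)
{
    size_t header_len;

    if (out == NULL || out_len == NULL || (value == NULL && value_len > 0))
        return TLV_ERR_ARG;
    if (value_len > 0xFFFF)            /* sketch limit: at most two length bytes */
        return TLV_ERR_ARG;

    /* tag byte + 1, 2 or 3 length bytes */
    header_len = 1 + (value_len < 0x80 ? 1 : value_len <= 0xFF ? 2 : 3);
    if (out_cap < header_len + value_len)
        return TLV_ERR_SPACE;          /* never write past the caller's buffer */

    out[0] = tag;
    if (value_len < 0x80) {
        out[1] = (uint8_t)value_len;   /* short form */
    } else if (value_len <= 0xFF) {
        out[1] = 0x81;                 /* long form, one length byte */
        out[2] = (uint8_t)value_len;
    } else {
        out[1] = 0x82;                 /* long form, two length bytes */
        out[2] = (uint8_t)(value_len >> 8);
        out[3] = (uint8_t)(value_len & 0xFF);
    }
    if (value_len > 0)
        memcpy(out + header_len, value, value_len);

    *out_len = header_len + value_len;
    return TLV_OK;
}
```

Because the function returns an error code instead of writing past the buffer, a binding can size the output in the host language, call the function, and turn a non-zero result into a native exception.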
A nice side effect of a stateless library is that it also has very few side effects. In our library, the only side effects are on the objects allocated for output, such as the byte buffers that will contain the encoded data. Everything else is passed as constants (the const keyword) in the C API. Since we are also writing the bindings, it is trivial to ensure that no function has visible side effects, by always allocating new objects for the arguments used as output. All in all, this is starting to look a lot like the properties you would strive for in a functional programming language (even though C is a fundamentally imperative language). Actually, the functionality is so well isolated that the library does not even need to make system calls! This makes it very easy to ensure the safety of any pointers passed (the only writable ones refer to output buffers) and thus to ensure that there is no place where we would reference data not properly allocated by the high-level language.
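The following sketch illustrates the const-in, buffer-out style. The struct and function names are made up for this post; the layout assumed is a short command APDU (CLA INS P1 P2, optionally followed by Lc and data). All inputs are reached through a pointer to const, and the only memory written is the output buffer handed in by the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical input description; every field is read-only for the library. */
typedef struct {
    uint8_t cla, ins, p1, p2;
    const uint8_t *data;   /* may be NULL when data_len == 0 */
    size_t data_len;       /* short APDUs only: at most 255 bytes */
} apdu_in;

/*
 * Write CLA INS P1 P2 [Lc data] into out.
 * Returns the number of bytes written, or 0 on invalid input or
 * insufficient space. The input struct is never modified.
 */
size_t apdu_build(const apdu_in *in, uint8_t *out, size_t out_cap)
{
    size_t total;

    if (in == NULL || out == NULL)
        return 0;
    if (in->data_len > 255 || (in->data == NULL && in->data_len > 0))
        return 0;

    total = 4 + (in->data_len > 0 ? 1 + in->data_len : 0);
    if (out_cap < total)
        return 0;

    out[0] = in->cla;
    out[1] = in->ins;
    out[2] = in->p1;
    out[3] = in->p2;
    if (in->data_len > 0) {
        out[4] = (uint8_t)in->data_len;          /* Lc */
        memcpy(out + 5, in->data, in->data_len);
    }
    return total;
}
```

A binding that allocates a fresh output buffer for every call makes such a function behave like a pure function from the host language's point of view: the same input always yields the same bytes, and nothing the caller passed in is changed.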
The library we are writing has proven to be extremely well suited to unit testing. This, too, is a consequence of the statelessness of the library. So what we ended up with is an easily testable and verifiable C library that is specifically tailored for integration into other programming languages. Right now we are working on the first binding, for the Scala language, which will be used internally in our SaaS Trusted Service Manager. We will certainly need to improve the C library to make it even easier to integrate, and we will follow up on that progress in this blog. But we already feel confident that the C library will create no additional problems for us when we use it in our own SaaS.
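As an illustration of why stateless functions are easy to test, here is a small table-driven test sketch. Both the function under test and the test runner are hypothetical examples written for this post, not code from our library; the expected bytes follow the definite-length encoding rules of ISO 8825-1. Since there is no setup or teardown, each test case is just an input and the bytes we expect back.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical function under test: write the BER length octets for
 * len (at most 0xFFFF) into out. Returns the number of octets written,
 * or 0 on error.
 */
size_t ber_length_encode(size_t len, uint8_t *out, size_t out_cap)
{
    size_t n = len < 0x80 ? 1 : len <= 0xFF ? 2 : 3;

    if (out == NULL || len > 0xFFFF || out_cap < n)
        return 0;
    if (n == 1) {
        out[0] = (uint8_t)len;
    } else if (n == 2) {
        out[0] = 0x81;
        out[1] = (uint8_t)len;
    } else {
        out[0] = 0x82;
        out[1] = (uint8_t)(len >> 8);
        out[2] = (uint8_t)(len & 0xFF);
    }
    return n;
}

/* One test vector: an input length and the octets we expect. */
typedef struct {
    size_t len;
    uint8_t expect[3];
    size_t expect_n;
} length_case;

/* Returns the number of failing cases; 0 means every vector passed. */
int run_length_tests(void)
{
    static const length_case cases[] = {
        { 0x00,   {0x00},             1 },
        { 0x7F,   {0x7F},             1 },
        { 0x80,   {0x81, 0x80},       2 },
        { 0xFF,   {0x81, 0xFF},       2 },
        { 0x0100, {0x82, 0x01, 0x00}, 3 },
        { 0xFFFF, {0x82, 0xFF, 0xFF}, 3 },
    };
    int failures = 0;
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        uint8_t out[3] = {0};
        size_t n = ber_length_encode(cases[i].len, out, sizeof out);
        if (n != cases[i].expect_n || memcmp(out, cases[i].expect, n) != 0)
            failures++;
    }
    return failures;
}
```

Adding a new case is a one-line change to the table, and the same vectors can be reused to check each language binding against the C implementation.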