Could the Actuarial community benefit from an open Actuarial Intermediate Representation?

Last updated on May 29, 2024

Note: when I talk about Actuarial modeling below, I'm referring to long-term insurance, not P&C.

What is an "Intermediate Representation"?

Programming languages make different trade-offs: some are high level, some are low level. I might consider Python to be high level and Rust to be low level. However, this classification depends on your perspective.

Modern compilers are made up of many layers of "programming languages"; it's just that the ones under the hood are not designed for humans. In fact, several compilers for modern languages use the same backend to generate optimized machine code, i.e. they use the same internal "programming language" or intermediate representation.

An intermediate representation (IR) is a standard representation that is designed with the next phase of compilation in mind. When you run cargo build (for Rust) you don't see any of this; you just get a nice new optimized binary. Under the hood, however, your code can pass through several layers of IR before the final binary is created (rustc, for example, lowers Rust code through its own HIR and MIR before handing LLVM IR to LLVM).
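To make the idea concrete, here is a toy sketch (not how rustc or LLVM actually represent code). A nested expression, the kind of tree a parser builds from code written for humans, is lowered into a flat list of three-address instructions, a form that is easier for later phases to analyze and optimize. The instruction format and the names `lower` and `run` are made up for illustration.

```python
import itertools
import operator

# Toy "high-level" program: (x + y) * z as a nested tree, as a parser might produce.
SOURCE = ("mul", ("add", "x", "y"), "z")

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def lower(expr, code, counter):
    """Flatten a nested expression into three-address instructions.

    Each instruction is (dest, op, lhs, rhs); returns the operand holding the result.
    """
    if not isinstance(expr, tuple):
        return expr  # variable names and numeric constants are already operands
    op, lhs, rhs = expr
    a = lower(lhs, code, counter)
    b = lower(rhs, code, counter)
    dest = f"t{next(counter)}"  # fresh temporary name
    code.append((dest, op, a, b))
    return dest


def run(code, result, env):
    """Interpret the flat IR: a single simple loop, no recursion needed."""
    env = dict(env)

    def value(operand):
        # Strings name variables or temporaries; anything else is a literal.
        return env[operand] if isinstance(operand, str) else operand

    for dest, op, a, b in code:
        env[dest] = OPS[op](value(a), value(b))
    return value(result)


ir = []
result = lower(SOURCE, ir, itertools.count())
for instr in ir:
    print(instr)
# ('t0', 'add', 'x', 'y')
# ('t1', 'mul', 't0', 'z')
print(run(ir, result, {"x": 2, "y": 3, "z": 4}))  # 20
```

The flat list has no nesting left to untangle, which is what makes it a convenient target for the "next phase": an optimizer can scan it instruction by instruction.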

Why are open standards important?

Open standards help spur innovation. The Rust compiler is a very sophisticated piece of software, but if it could not rely on the LLVM compiler infrastructure, it would have taken much more effort to build. At some point it became obvious that it did not make sense for every programming language to rebuild how to optimize code for specific hardware targets while also supporting a wide range of targets. Programming languages "compete" on other things, like the experience for developers. Investment in LLVM, for example, benefits all the languages that use LLVM as a backend, and the entire developer community is better off as a result.

Apache Arrow set out to do something very simple: align the community on how to lay out tables of data in memory (down to the bits and memory alignment). This is a very simple concept, but it has had a major impact (for example, see here). As a result, Arrow is seeing major adoption in data systems. With Arrow established as a foundation, other projects have become possible, such as Substrait, which complements Arrow by defining a kind of IR for relational algebra (i.e. the query plans that database systems execute).

The competition in the database space is intense; new databases are popping up all the time, to the point that it's hard to keep up. However, there is a growing trend toward less vertically integrated systems. Some (very smart) people think this is the future of database development, and this is only possible due to open standards like Apache Arrow and Substrait.

What would an Actuarial IR look like?

I'm not sure, but I've thought about this a lot. Actuarial modeling is interesting as it's quite varied. Regular valuation-type runs are a lot like batch processing; on the other hand, other types of jobs involve path-dependent benefits, etc., and can't be parallelized in the same way. However, this would be the whole point of a single IR: you represent the logic once, in one IR, and then deploy and optimize it for the specific use case and computation pattern.
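A minimal sketch of that idea, assuming a hypothetical AIR in which one projection period is plain data (a list of `(op, constant)` pairs) rather than vendor-specific code. Two made-up backends run the same program: one projects each policy through every period before moving on (independent policies could be farmed out in parallel), the other advances all policies one period at a time (a time-major order that suits batch or vectorized execution). The account-value logic, 3% interest and a fixed charge of 10, is illustrative only.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

# Hypothetical AIR program for one projection period of an account value:
# credit 3% interest, then deduct a fixed charge of 10.
PROGRAM = [("mul", 1.03), ("sub", 10.0)]


def step(value, program):
    """Apply one period of the program to a single policy's state."""
    for op, const in program:
        value = OPS[op](value, const)
    return value


def run_policy_major(starts, program, periods):
    """Backend A: finish each policy before starting the next (parallel across policies)."""
    results = []
    for value in starts:
        for _ in range(periods):
            value = step(value, program)
        results.append(value)
    return results


def run_time_major(starts, program, periods):
    """Backend B: move every policy forward one period at a time (batch-friendly)."""
    values = list(starts)
    for _ in range(periods):
        values = [step(v, program) for v in values]
    return values


starts = [1000.0, 2500.0, 400.0]
a = run_policy_major(starts, PROGRAM, periods=10)
b = run_time_major(starts, PROGRAM, periods=10)
print(a == b)  # True: same logic, different execution strategy
```

Because the program is data, the backend is free to choose the traversal order (or the hardware) without the model author rewriting the logic.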

There is a lot of prior art to draw inspiration from, from Substrait, mentioned above, to TVM's Relay IR, which supports automatic differentiation (a feature that would be very useful in Actuarial models).
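To illustrate why automatic differentiation is attractive here, a minimal sketch of forward-mode AD with dual numbers computes a present value and its exact derivative with respect to the interest rate in a single pass. This uses operator overloading, a different technique from an IR-level gradient transformation like the one Relay performs; the `Dual` class and `present_value` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        # Product rule.
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = lift(other)
        # Quotient rule.
        return Dual(
            self.val / other.val,
            (self.der * other.val - self.val * other.der) / other.val ** 2,
        )


def lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def present_value(cashflows, rate):
    """PV of cash flows paid at the end of years 1..n at a flat annual rate."""
    v = Dual(1.0) / (1 + rate)  # one-year discount factor
    pv, disc = Dual(0.0), Dual(1.0)
    for cf in cashflows:
        disc = disc * v
        pv = pv + cf * disc
    return pv


# Seeding der=1.0 on the rate makes the result carry dPV/d(rate).
result = present_value([100.0, 100.0, 100.0], Dual(0.05, 1.0))
print(round(result.val, 4), round(result.der, 4))
```

In a full model, the same seed-and-propagate idea could, in principle, give sensitivities of results to assumptions without bump-and-rerun finite differences.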

"Community over Code" is the mantra of the Apache Software Foundation, and it's especially true for projects around open standards. The concept of Arrow is quite simple, the technical details are quite complicated, but its success as an open standard is dependent on the community that supports it.

If an Actuarial IR could be developed, it would have to be a community effort.

Why is now the time?

The Actuarial profession and community are being forced to evolve. Actuaries were the original data scientists. However, the rise of data science, machine learning, AI, etc. is challenging the Actuarial profession. On this front, the SOA (Society of Actuaries) recognizes the challenge and is encouraging Actuaries to learn new skills.

However, on the technology side, I don't think we are evolving the way we need to. Machine learning and AI evolved from computer science and require a strong technology foundation (both the skills of the practitioners and the tools available). We also need to free Actuaries to work in the areas that have the most impact for the businesses they serve. We need to reduce the cost of Actuarial modeling and provide access to new technologies.

I believe that the status quo of vendor-supplied, vertically integrated systems is holding us back on this front (this is my experience; if you feel your system is different, please reach out! I would love to be proved wrong 😄).

What could the future look like?

Let's assume we have defined a new IR called Actuarial IR (AIR), that it was a community effort, and that it is seeing broad adoption. What would be different from today?

Each modeling system could focus on what its vendor believes is its competitive advantage: the experience for Actuaries who build models, rather than how to optimize a custom domain-specific language for the new accelerator of the day. Or vendors could target a different part of the stack, such as program analysis to identify inconsistencies in models across an organization.

We could develop a common infrastructure for deploying and optimizing AIR for different hardware targets. This would become a shared challenge, and since insurers don't compete in this domain, they could collaborate on it to reduce the cost of insurance across the industry.

We could define how to serialize AIR so that compute plans could be sent to different machines efficiently, allowing any system that adopts the standard to run in a distributed fashion.
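As a minimal sketch, assuming a hypothetical plan format: a calculation expressed purely as data can be serialized on one machine, sent as bytes, and evaluated on another. JSON is used only because it ships with Python; a real standard would likely pick a compact binary encoding (Substrait, for example, defines its plans with Protocol Buffers). The field names, the `air_version` tag, and the `evaluate` function are made up for illustration.

```python
import json
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

# Hypothetical AIR plan: expected premium income after lapses, as pure data.
plan = {
    "air_version": "0.1",
    "inputs": ["premium", "lapse_rate"],
    "ops": [
        {"out": "persist", "op": "sub", "args": [1.0, "lapse_rate"]},
        {"out": "income", "op": "mul", "args": ["premium", "persist"]},
    ],
    "output": "income",
}

# Sender side: the plan becomes bytes that any machine can receive.
payload = json.dumps(plan).encode("utf-8")


def evaluate(payload, inputs):
    """Receiver side: decode the plan and run it; no model code is shipped."""
    plan = json.loads(payload.decode("utf-8"))
    missing = set(plan["inputs"]) - set(inputs)
    if missing:
        raise ValueError(f"missing inputs: {sorted(missing)}")
    env = dict(inputs)
    for node in plan["ops"]:
        # Strings name earlier values; numbers are literal constants.
        args = [env[a] if isinstance(a, str) else a for a in node["args"]]
        env[node["out"]] = OPS[node["op"]](*args)
    return env[plan["output"]]


print(evaluate(payload, {"premium": 1000.0, "lapse_rate": 0.1}))
```

The receiver only needs an AIR evaluator, not the sender's modeling system, which is what would let any adopting system pick up work produced by another.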

It would allow us to define new reporting standards in a single place, as a reference implementation for the whole community to use; for example, the specification for common reserving approaches could be written in AIR.

It would support the modularization of Actuarial models. In this YouTube video on Actuarial transformation, the speaker dreams of a more modular Actuarial modeling infrastructure (starting at 22:20). This would be possible with AIR.

We could build a variety of "front-ends". A small insurer, for example, could use Excel with a plugin that exports its Excel model to AIR. From there, the ecosystem of tooling around AIR could be used.

Actuarial software vendors would be pushed to innovate! Vendor lock-in would no longer be a reason to stay, as there would be pressure to support export to (and import from) AIR. Some Actuaries may not like this (if they work for Actuarial software vendors), but I believe the benefits to the community as a whole could be immense. Also, there could be a huge new opportunity for software vendors: they could spend less effort rebuilding major parts of the Actuarial modeling stack and focus instead on innovation and on incorporating new technologies, such as advances in AI.

Summing up...

- Actuarial models are becoming more and more complex
- different models require different compute patterns
- new regulations are pushing us and our models to deliver more in less time
- all these pressures leave no room for pushing forward the state of the art in Actuarial modeling/science
- we need to reduce the burden
- I believe that defining and adopting community-driven open standards could do this

A similar path has been followed in the database community, and it has led to new innovation in database development. This YouTube video (though technical in parts) is only 15 minutes long and provides a good summary of the changes in that domain. I can see so many parallels in Actuarial modeling...

Paddy Horan

Actuarial Technology Lead

Technology Actuary and leader who loves solving interesting problems.