Interfacing OCaml and Rust: picking the right tool for the job (CUFP 2017)

Track

CUFP 2017

When

Sat 9 Sep 2017 10:30 - 10:55 (GMT+01:00) at L2 - CUFP Talks 2

Abstract

At Ahrefs, we store a huge amount of data. With nearly 200 billion web pages in our index, our database contains a copy of the Web graph. Given the constant stream of updates, the size of our dataset imposes challenging constraints on the implementation of data processing algorithms. When a complete processing pass can take several days, it is critical to split the work into smaller tasks and order them by priority, so as to minimize the number of out-of-date data points.
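To make the prioritization concrete, here is a minimal Rust sketch. It assumes, hypothetically, that each task covers one chunk of the dataset and that "priority" means "stalest first": tasks sit in a min-heap keyed on the time their chunk was last processed and are drained in fixed-size batches. `Task`, its fields, and `next_batch` are illustrative names, not Ahrefs code.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical unit of work: one chunk of the dataset and when it was
/// last processed. The derived `Ord` compares `last_processed` first.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Task {
    /// Timestamp of the last processing pass (smaller = more out of date).
    pub last_processed: u64,
    /// Identifier of the chunk this task refreshes (illustrative).
    pub chunk_id: u32,
}

/// Pops up to `budget` tasks, stalest first. `Reverse` turns the standard
/// max-heap into a min-heap on `last_processed`.
pub fn next_batch(queue: &mut BinaryHeap<Reverse<Task>>, budget: usize) -> Vec<Task> {
    let mut batch = Vec::with_capacity(budget);
    while batch.len() < budget {
        match queue.pop() {
            Some(Reverse(task)) => batch.push(task),
            None => break,
        }
    }
    batch
}
```

Staleness is only one plausible key. Any score estimating how out of date a data point is would slot into the same structure.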

Given the dynamic nature of the web, keeping as much information as possible in memory offers several advantages. First, keeping state inside the program makes performance predictable and makes program states easier to reason about, because it reduces interactions with the outside world. Second, in-memory immutable priority queues have good amortized complexity and simplify error handling. Third, because every previous state remains accessible (and can be backtracked to), a task scheduler interacting with other services on the network can reset to any combination of previous states and keep its in-memory storage consistent. However, these features come at a cost. Beyond the already sizable amount of memory consumed by path copying, persistent queue implementations make heavy use of pointers, which account for an even larger share of memory at scale. In addition, the high allocation rate puts pressure on the GC and increases the proportion of unreachable objects on the heap that must be collected.
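The trade-off described above shows up even in a small persistent priority queue. The talk does not say which persistent queue Ahrefs uses; as an assumption, this Rust sketch uses a classic leftist heap with `Rc`-shared nodes. Every `insert` or `pop_min` returns a new version and leaves the old one intact, so backtracking amounts to holding on to an earlier handle. Only the nodes along the merge path are copied, and every node carries two child pointers plus a reference count, which is the pointer overhead the paragraph mentions.

```rust
use std::rc::Rc;

/// A persistent leftist min-heap. Operations return a new heap and never
/// mutate an existing one, so every earlier version remains usable.
#[derive(Clone)]
pub enum Heap {
    Empty,
    Node(Rc<NodeData>),
}

/// One heap node: the key plus two child pointers (and `Rc` adds a count).
pub struct NodeData {
    rank: usize, // length of the right spine
    key: u64,
    left: Heap,
    right: Heap,
}

impl Heap {
    pub fn empty() -> Heap {
        Heap::Empty
    }

    fn rank(&self) -> usize {
        match self {
            Heap::Empty => 0,
            Heap::Node(n) => n.rank,
        }
    }

    /// Builds a node, keeping the higher-rank child on the left.
    fn make(key: u64, a: Heap, b: Heap) -> Heap {
        let (left, right) = if a.rank() >= b.rank() { (a, b) } else { (b, a) };
        let rank = right.rank() + 1;
        Heap::Node(Rc::new(NodeData { rank, key, left, right }))
    }

    /// Merges two heaps. Only nodes on the merge path are copied; all other
    /// subtrees are shared with the input versions through `Rc`.
    pub fn merge(a: &Heap, b: &Heap) -> Heap {
        match (a, b) {
            (Heap::Empty, _) => b.clone(),
            (_, Heap::Empty) => a.clone(),
            (Heap::Node(x), Heap::Node(y)) => {
                if x.key <= y.key {
                    Heap::make(x.key, x.left.clone(), Heap::merge(&x.right, b))
                } else {
                    Heap::make(y.key, y.left.clone(), Heap::merge(a, &y.right))
                }
            }
        }
    }

    pub fn insert(&self, key: u64) -> Heap {
        Heap::merge(&Heap::make(key, Heap::Empty, Heap::Empty), self)
    }

    pub fn min(&self) -> Option<u64> {
        match self {
            Heap::Empty => None,
            Heap::Node(n) => Some(n.key),
        }
    }

    /// Returns the minimum and the remaining heap; `self` is left untouched.
    pub fn pop_min(&self) -> Option<(u64, Heap)> {
        match self {
            Heap::Empty => None,
            Heap::Node(n) => Some((n.key, Heap::merge(&n.left, &n.right))),
        }
    }
}
```

In OCaml the same structure would be an ordinary algebraic data type, with the GC handling sharing; here `Rc` plays that role.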

Moreover, OCaml makes it hard to implement mutable, space-efficient data structures: values are boxed, and optimized low-level code is complex to write. Rust, on the other hand, is designed for this use case. Values are stored unboxed by default, the programmer has fine-grained control over memory allocation, and the language comes with a powerful optimizing compiler. But there is a complexity price to pay for this power.
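As an illustration of what unboxed values buy, the following Rust sketch (all names hypothetical) stores queue entries directly in a contiguous `Vec` and maintains a mutable binary min-heap over it in place. Each `Entry` takes 8 bytes with no per-value header or pointer. By contrast, an OCaml array of two-field records typically holds one pointer per element, each pointing to a separately allocated block with its own header word.

```rust
use std::mem::size_of;

/// Hypothetical queue entry: a priority plus a 32-bit item id. Rust stores
/// it inline inside the Vec: 8 bytes, no header, no pointer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Entry {
    pub priority: u32,
    pub item: u32,
}

/// A mutable binary min-heap over a flat, contiguous Vec<Entry>.
pub struct FlatHeap {
    data: Vec<Entry>,
}

impl FlatHeap {
    pub fn new() -> Self {
        FlatHeap { data: Vec::new() }
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }

    /// Payload size in bytes: entries are laid out back to back.
    pub fn payload_bytes(&self) -> usize {
        self.data.len() * size_of::<Entry>()
    }

    pub fn push(&mut self, e: Entry) {
        self.data.push(e);
        // Sift the new entry up while its parent has a larger priority.
        let mut i = self.data.len() - 1;
        while i > 0 {
            let parent = (i - 1) / 2;
            if self.data[parent].priority <= self.data[i].priority {
                break;
            }
            self.data.swap(parent, i);
            i = parent;
        }
    }

    pub fn pop(&mut self) -> Option<Entry> {
        if self.data.is_empty() {
            return None;
        }
        // Take the root, move the last entry into its slot, then sift down.
        let top = self.data.swap_remove(0);
        let n = self.data.len();
        let mut i = 0;
        loop {
            let (l, r) = (2 * i + 1, 2 * i + 2);
            let mut smallest = i;
            if l < n && self.data[l].priority < self.data[smallest].priority {
                smallest = l;
            }
            if r < n && self.data[r].priority < self.data[smallest].priority {
                smallest = r;
            }
            if smallest == i {
                break;
            }
            self.data.swap(i, smallest);
            i = smallest;
        }
        Some(top)
    }
}
```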

Instead of dropping OCaml altogether, we can combine the two languages to get the best of both worlds, minimizing friction through code generation and separation of concerns. In this talk, I will discuss our experience implementing a fast, space-efficient priority queue in Rust and accessing it from OCaml. By wrapping the queue states behind a functional OCaml interface, we keep the advantages of easily composable immutable queues. We also try to recover some of the error-handling benefits we lost by recording state changes in a monadic interface that provides transparent commit/rollback semantics.
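The commit/rollback idea can be sketched independently of the OCaml monadic layer the talk describes. Below is a hedged Rust analogue, not the talk's design: a hypothetical `Store` logs the previous value of every write made inside a transaction, and `transaction` keeps the changes when its closure returns `Ok` and replays the undo log in reverse when it returns `Err`. The closure stands in for the sequencing a monad would provide. Nested transactions are handled by clearing the log only when the outermost transaction commits.

```rust
use std::collections::BTreeMap;

/// Hypothetical mutable state standing in for the Rust-side queue.
/// Writes made inside a transaction log the value they overwrote.
pub struct Store {
    items: BTreeMap<u32, u64>,     // item id -> priority (illustrative)
    undo: Vec<(u32, Option<u64>)>, // (id, previous value) for each logged write
    depth: usize,                  // number of currently open transactions
}

impl Store {
    pub fn new() -> Self {
        Store { items: BTreeMap::new(), undo: Vec::new(), depth: 0 }
    }

    pub fn get(&self, id: u32) -> Option<u64> {
        self.items.get(&id).copied()
    }

    /// Sets a priority; inside a transaction, the old value is logged.
    pub fn set(&mut self, id: u32, prio: u64) {
        let old = self.items.insert(id, prio);
        if self.depth > 0 {
            self.undo.push((id, old));
        }
    }

    /// Undoes logged writes, newest first, back to log position `mark`.
    fn rollback_to(&mut self, mark: usize) {
        while self.undo.len() > mark {
            let (id, old) = self.undo.pop().expect("log longer than mark");
            match old {
                Some(prio) => {
                    self.items.insert(id, prio);
                }
                None => {
                    self.items.remove(&id);
                }
            }
        }
    }

    /// Runs `f` as a transaction: on `Ok` its writes are kept, on `Err` they
    /// are rolled back. The log is cleared only when the outermost
    /// transaction commits, so an outer abort also undoes inner commits.
    pub fn transaction<T, E>(
        &mut self,
        f: impl FnOnce(&mut Store) -> Result<T, E>,
    ) -> Result<T, E> {
        let mark = self.undo.len();
        self.depth += 1;
        let result = f(self);
        self.depth -= 1;
        if result.is_ok() {
            if self.depth == 0 {
                self.undo.clear();
            }
        } else {
            self.rollback_to(mark);
        }
        result
    }
}
```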

This solution can be tedious: binding two high-level languages directly is not well supported, so for now one has to go through the C API, which breaks all abstractions and type safety and makes polymorphism hard. However, a careful separation of concerns lets each language play to its strengths while containing the complexity of the problem.
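To show what going through the C API involves, here is a minimal Rust sketch of the exported side of such a binding, with hypothetical names (`pq_create`, `pq_push`, `pq_pop`, `pq_free`). The queue is handed out as an opaque raw pointer, priorities are fixed to `u64`, and emptiness is signalled with an integer flag, so the type information and polymorphism available in both languages are lost at the boundary. On the OCaml side, each function would still need a small C stub, for example one wrapping the pointer in a custom block whose finalizer calls `pq_free`; that part is not shown.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Opaque to C: callers only ever hold a `*mut Queue`.
pub struct Queue {
    heap: BinaryHeap<Reverse<u64>>, // min-heap of priorities
}

// A real build would also mark these functions `#[no_mangle]` (spelled
// `#[unsafe(no_mangle)]` in the 2024 edition) so that C code can link
// against them by name; the attribute is omitted to keep this sketch
// edition-agnostic.

/// Allocates a queue and transfers ownership to the caller.
pub extern "C" fn pq_create() -> *mut Queue {
    Box::into_raw(Box::new(Queue { heap: BinaryHeap::new() }))
}

/// # Safety
/// `q` must come from `pq_create` and must not have been freed.
pub unsafe extern "C" fn pq_push(q: *mut Queue, prio: u64) {
    unsafe { (*q).heap.push(Reverse(prio)) }
}

/// Writes the smallest priority to `out` and returns 1, or returns 0 if
/// the queue is empty (no Option or exception crosses the boundary).
///
/// # Safety
/// `q` must be a live queue and `out` a valid, writable pointer.
pub unsafe extern "C" fn pq_pop(q: *mut Queue, out: *mut u64) -> i32 {
    unsafe {
        match (*q).heap.pop() {
            Some(Reverse(prio)) => {
                *out = prio;
                1
            }
            None => 0,
        }
    }
}

/// Releases a queue created by `pq_create`; a null pointer is ignored.
///
/// # Safety
/// `q` must not be used again after this call.
pub unsafe extern "C" fn pq_free(q: *mut Queue) {
    if !q.is_null() {
        unsafe { drop(Box::from_raw(q)) }
    }
}
```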

Session Program

Sat 9 Sep
Displayed time zone: Belfast (GMT+01:00)

10:30 - 11:20 CUFP Talks 2 (CUFP) at L2

10:30 (25m) Talk: Interfacing OCaml and Rust: picking the right tool for the job. Joris Giovannangeli, Ahrefs Research
10:55 (25m) Talk: Distributed load testing with MZBench. Renat Idrisov

Speaker

Joris Giovannangeli

Ahrefs Research
