OCaml Multicore - March 2020

Welcome to the March 2020 news update from the Multicore OCaml team! This update has been assembled with @shakthimaan and @kayceesrk, as with the February and January ones.

Our work this month focused primarily on performance improvements to the Multicore OCaml compiler and runtime, as part of a comprehensive evaluation exercise. We continue to add benchmarks to the Sandmark test suite. The eventlog tracing system and the use of hash tables for marshaling in upstream OCaml are in progress, and more PRs are being queued up for OCaml 4.11.0-dev as well.

The biggest observable change for users trying the branch is that a new GC (the "parallel minor GC") has been merged in preference to the previous one (the "concurrent minor GC"). We will have the details in longer form at a later stage, but the essential gist is that the parallel minor GC no longer requires a read barrier or changes to the C API. It may have slightly worse scalability properties at a very high number of cores, but is roughly equivalent at up to 24 cores in our evaluations. Given the vast usability improvement from not having to port existing C FFI uses, we have decided to make the parallel minor GC the default one for our first upstream runtime patches. The concurrent minor GC will follow at a later stage when we ramp up testing to 64-core+ machines. The multicore opam remote has been updated to reflect these changes, for those who wish to try it out at home.

We are now at a stage where we are porting larger applications to multicore. Thanks go to:

  • @UnixJunkie who helped us integrate the Gram Matrix benchmark in https://github.com/ocaml-bench/sandmark/issues/99
  • @jhw has done extensive work towards supporting Systhreads in https://github.com/ocaml-multicore/ocaml-multicore/pull/240. Systhreads is currently disabled in multicore, leading to some popular packages not compiling.
  • @antron has been advising us on how best to port Lwt_preemptive and the Lwt_unix modules to multicore, giving us a widely used IO stack to test more applications against.

If you have other suggestions for applications that you think might provide useful benchmarks, then please do get in touch with me or @kayceesrk.

Onto the details! The ongoing and completed tasks for Multicore OCaml are listed first, followed by the changes to the Sandmark benchmarking infrastructure and the ongoing PRs to upstream OCaml.

Multicore OCaml

Ongoing

  • ocaml-multicore/ocaml-multicore#240 Proposed implementation of threads in terms of Domain and Atomic

    A new implementation of the Threads library (for use with the new Domain and Atomic modules in Multicore OCaml) has been proposed. With it, Dune 2.4.0 builds, which in turn makes it possible to build other packages. This PR is open for review.

  • ocaml-multicore/safepoints-cmm-mach Better safe points for OCaml

    A newer implementation that inserts safe points at the Cmm level is being developed in this branch.

Completed

The following PRs have been merged into Multicore OCaml:

  • ocaml-multicore/ocaml-multicore#303 Account correctly for incremental mark budget

    The patch correctly measures the incremental mark budget value, and improves the maximum latency for the menhir.ocamly benchmark.

  • ocaml-multicore/ocaml-multicore#307 Put the phase change event in the actual phase change code

    The PR includes the major_gc/phase_change event in the appropriate context.

  • ocaml-multicore/ocaml-multicore#309 Don't take all the full pools in one go.

    The code change selects one of the global_full_pools to try sweeping it later, instead of adopting all of the full ones.

  • ocaml-multicore/ocaml-multicore#310 Statistics for the current domain are more recent than other domains

    The statistics (minor_words, promoted_words, major_words, minor_collections) for the current domain are more recent, and are used in the right context.

  • ocaml-multicore/ocaml-multicore#315 Writes in caml_blit_fields should always use caml_modify_field to record young_to_young pointers

    The PR enforces that caml_modify_field() is always used to store young_to_young pointers.

  • ocaml-multicore/ocaml-multicore#316 Fix bug with Weak.blit.

    Ephemerons are allocated as marked, but their keys or data can be unmarked. The blit operations copy weak references from one ephemeron to another without marking them. The patch marks the keys that are blitted in order to keep the unreachable keys alive for another major cycle.

  • ocaml-multicore/ocaml-multicore#317 Return early for 0 length blit

    The PR forces a CAMLreturn() call if the blit length is zero in byterun/weak.c.

  • ocaml-multicore/ocaml-multicore#320 Move num_domains_running decrement

    The caml_domain_alone() invocation needs to be used in the shared heap teardown, and hence the num_domains_running decrement has been moved so that it is the last operation, at least for the shared_heap lockfree fast paths.
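
For readers unfamiliar with the operation touched by #316 and #317, here is a minimal sketch of Weak.blit from the standard library, including a zero-length blit. It only illustrates the user-facing API, not the runtime patch itself; the names a, b, src and dst are made up for this example, and the values are kept strongly reachable so that the result does not depend on when the GC runs.

```ocaml
(* Strong references: they keep the values alive, so the weak
   pointers below stay full for the whole run. *)
let a = ref 1
let b = ref 2

let src : int ref Weak.t = Weak.create 2
let dst : int ref Weak.t = Weak.create 4

let () =
  Weak.set src 0 (Some a);
  Weak.set src 1 (Some b);
  (* Copy two weak pointers from src (offset 0) into dst (offset 1). *)
  Weak.blit src 0 dst 1 2;
  (* A zero-length blit is valid and leaves dst unchanged. *)
  Weak.blit src 0 dst 0 0

(* The blitted slots point at the very same values as the source. *)
let same_a = match Weak.get dst 1 with Some v -> v == a | None -> false
let same_b = match Weak.get dst 2 with Some v -> v == b | None -> false

(* Slot 0 was only the target of the zero-length blit: still empty. *)
let slot0_empty = not (Weak.check dst 0)

let () =
  Printf.printf "same_a=%b same_b=%b slot0_empty=%b\n"
    same_a same_b slot0_empty
```

In a real program the values would not normally be held by strong references as well; a blitted slot may then become empty after a later major collection.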

Benchmarking

New benchmarks have been added to the Sandmark performance benchmarking test suite, and work is underway to enhance its functionality.

  • ocaml-bench/sandmark#88 Add PingPong Multicore benchmark

    The PingPong benchmark, which uses producer and consumer queues, has now been included in Sandmark.

  • ocaml-bench/sandmark#98 Add the read/write Irmin benchmark

    A basic read/write file performance benchmark for Irmin has been added to Sandmark. You can vary the following input parameters: number of branches, number of keys, percentage of reads and writes, number of iterations, and the number of write operations.

  • ocaml-bench/sandmark#100 Add Gram Matrix benchmark

    A request (ocaml-bench/sandmark#99) was created to include the Gram Matrix initialization numerical benchmark. The benchmark is useful for machine learning applications and is now available in the Sandmark performance benchmark suite. The speedup (sequential_time/multi_threaded_time) versus number of cores for Multicore (Concurrent Minor Collector), Parmap and Parany is quite significant and is illustrated in the graph:

[Figure: Gram matrix speedup benchmark]

  • ocaml-bench/sandmark#103 Add depend target in Makefile

    Sandmark now includes a depend target defined in the Makefile to check that both libgmp-dev and libdw-dev packages are installed and available on Ubuntu.

  • ocaml-bench/sandmark#90 More parallel benchmarks

    An issue has been created to add more parallel benchmarks. We will use this to keep track of the requests. Please feel free to add your wish list of benchmarks!
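
As a small aside on the speedup metric used for the Gram Matrix graph above (sequential_time/multi_threaded_time), the sketch below shows the arithmetic. The timings are hypothetical and chosen only for illustration; they are not measurements from Sandmark. The efficiency helper (speedup divided by the number of cores) is an extra that does not appear in the benchmark itself.

```ocaml
(* speedup = sequential_time / multi_threaded_time *)
let speedup ~sequential ~parallel = sequential /. parallel

(* Parallel efficiency: how much of the ideal linear speedup is reached.
   This helper is an addition for illustration only. *)
let efficiency ~sequential ~parallel ~cores =
  speedup ~sequential ~parallel /. float_of_int cores

let () =
  (* Hypothetical timings in seconds, not real benchmark results. *)
  let sequential = 12.0 and parallel = 1.5 and cores = 16 in
  Printf.printf "speedup: %.1fx, efficiency: %.2f\n"
    (speedup ~sequential ~parallel)
    (efficiency ~sequential ~parallel ~cores)
```

A speedup equal to the number of cores would be ideal linear scaling; values below that indicate overheads such as synchronisation or GC work.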

OCaml

Ongoing

  • ocaml/ocaml#9082 Eventlog tracing system

    The configure script has now been updated so that it can build on Windows. Apart from this major change, a number of minor commits have been made for the build and sanity checks. This PR is currently under review.

  • ocaml/ocaml#9353 Reimplement output_value using a hash table to detect sharing.

    The ocaml/ocaml#9293 "Use addrmap hash table for marshaling" PR has been re-implemented using a hash table and bit vector, thanks to @xavierleroy. This is a prerequisite for Multicore OCaml, which uses a concurrent garbage collector.
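
To make "detect sharing" in #9353 concrete, the following minimal sketch shows the user-visible behaviour that output_value has to preserve: a value reachable twice is written once and is still shared after unmarshaling, unless the Marshal.No_sharing flag is passed. It uses only the standard Marshal module and says nothing about the internal hash table; the names roundtrip and x are made up for this example.

```ocaml
(* Marshal a pair to a string and read it back. The type annotation is
   needed because Marshal.from_string is not type-safe. *)
let roundtrip flags (v : int ref * int ref) : int ref * int ref =
  Marshal.from_string (Marshal.to_string v flags) 0

let x = ref 42

(* Default: sharing is detected, so both components are one block. *)
let shared_kept =
  let a, b = roundtrip [] (x, x) in
  a == b

(* No_sharing: x is written twice and comes back as two blocks. *)
let shared_lost =
  let a, b = roundtrip [Marshal.No_sharing] (x, x) in
  a == b

let () =
  Printf.printf "default: %b, No_sharing: %b\n" shared_kept shared_lost
(* prints "default: true, No_sharing: false" *)
```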

As always, we thank the OCaml developers and users in the community for their code reviews, support, and contribution to the project. From OCaml Labs, stay safe and healthy out there!