Thanks to the work of Nicholas Nethercote and Alex Crichton, some recent improvements reduce the size of compiled libraries and improve compile-time performance, particularly when using LTO. This post dives into some of the details of what changed and gives an estimate of the benefits.

These changes have been added incrementally over the past three months, with the latest changes landing just a few days ago on the nightly channel. The bulk of the improvements will be found in the 1.46 stable release (available on 2020-08-27). It would be great for any projects that use LTO to test it out on the nightly channel (starting from the 2020-06-13 release) and report any issues that arise.

When compiling a library, rustc saves the output in an rlib file, which is an archive file. This has historically contained the following:

  • Object code, which is the result of code generation. This is used during regular linking.
  • LLVM bitcode, which is a binary representation of LLVM’s intermediate representation. This can be used for Link Time Optimization (LTO).
  • Rust-specific metadata, which covers a wide range of data about the crate.

LTO is an optimization technique that can perform whole-program analysis. It analyzes all of the bitcode from every library at once, and performs optimizations and code generation. rustc supports several forms of LTO:

  • Fat LTO. This performs “full” LTO, which can take a long time to complete and may require a significant amount of memory.
  • Thin LTO. This LTO variant supports much better parallelism than fat LTO. It can achieve performance improvements similar to those of fat LTO (sometimes even better!), while taking much less total time by taking advantage of more CPUs.
  • Thin-local LTO. By default, rustc will split a crate into multiple “codegen units” so that they can be processed in parallel by LLVM. But this prevents some optimizations, because the code is separated into different codegen units that are handled independently. Thin-local LTO will perform thin LTO across the codegen units within a single crate, bringing back some optimizations that would otherwise be lost by the separation. This is rustc’s default behavior if opt-level is greater than 0.
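As a rough illustration of how these forms are usually selected, the sketch below maps the values of the `lto` profile setting, as described in the Cargo profile reference, onto the LTO forms listed above. The enum and the helper `lto_mode_from_profile` are hypothetical names made up for this example, and the setting is modeled as plain text for simplicity (in Cargo.toml, `true` and `false` are booleans rather than strings). This is not Cargo's actual implementation.

```rust
/// The LTO forms described above, plus an explicit "off" setting.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LtoMode {
    Fat,
    Thin,
    ThinLocal,
    Off,
}

/// Hypothetical helper: map the textual form of Cargo's `lto` profile
/// setting to an LTO mode. Returns `None` for unrecognized values.
fn lto_mode_from_profile(value: &str) -> Option<LtoMode> {
    match value {
        // `lto = true` and `lto = "fat"` both request fat LTO.
        "true" | "fat" => Some(LtoMode::Fat),
        "thin" => Some(LtoMode::Thin),
        // `lto = false` is the default and means thin-local LTO.
        "false" => Some(LtoMode::ThinLocal),
        // `lto = "off"` disables LTO entirely.
        "off" => Some(LtoMode::Off),
        _ => None,
    }
}

fn main() {
    for value in ["false", "thin", "fat", "off"] {
        println!("lto = {:?} -> {:?}", value, lto_mode_from_profile(value));
    }
}
```

Note that the sketch ignores opt-level: as mentioned above, thin-local LTO only applies when opt-level is greater than 0.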

Cargo now tells rustc what to place in the rlib files based on the project's profile LTO settings. If the project is not using LTO, then Cargo will instruct rustc to not place bitcode in the rlib files, which should reduce the amount of disk space used. This may also slightly improve build performance, since rustc no longer needs to compress and write out the bitcode.

If the project is using LTO, then Cargo will instruct rustc to not place object code in the rlib files, avoiding the expensive code generation step. This should improve the build time when building from scratch, and reduce the amount of disk space used.

Two rustc flags are now available to control how the rlib is constructed:

  • -C linker-plugin-lto causes rustc to place only bitcode in the .o files and to skip code generation. This flag was originally added to support cross-language LTO. Cargo now uses this when the rlib is only intended for use with LTO.
  • -C embed-bitcode=no causes rustc to avoid placing bitcode in the rlib altogether. Cargo uses this when LTO is not being used, which reduces some disk space usage.
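The decision described above can be sketched in a few lines. This is a simplified model under stated assumptions, not Cargo's real code: the type `RlibContents` and the functions `rlib_contents` and `rustc_flags` are hypothetical names, and the model only covers the two cases this post describes (an rlib intended only for LTO, and an rlib built without LTO). The two flags themselves are the ones listed above.

```rust
/// What ends up in the rlib besides the Rust-specific metadata.
#[derive(Debug, PartialEq)]
struct RlibContents {
    object_code: bool,
    bitcode: bool,
}

/// Hypothetical helper: pick the rlib contents from whether the rlib
/// is only intended for use with LTO.
fn rlib_contents(only_for_lto: bool) -> RlibContents {
    if only_for_lto {
        // LTO works from bitcode, so code generation can be skipped.
        RlibContents { object_code: false, bitcode: true }
    } else {
        // Regular linking only needs object code.
        RlibContents { object_code: true, bitcode: false }
    }
}

/// Hypothetical helper: the rustc flag (from the list above) that
/// produces the matching rlib contents.
fn rustc_flags(only_for_lto: bool) -> [&'static str; 2] {
    if only_for_lto {
        ["-C", "linker-plugin-lto"]
    } else {
        ["-C", "embed-bitcode=no"]
    }
}

fn main() {
    for only_for_lto in [false, true] {
        println!(
            "only_for_lto = {}: {:?}, flags = {:?}",
            only_for_lto,
            rlib_contents(only_for_lto),
            rustc_flags(only_for_lto)
        );
    }
}
```

In both cases the rlib carries less than it used to, which is where the disk-space savings come from.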

Additionally, the way bitcode is embedded in the rlib has changed. Previously, rustc would place compressed bitcode as a .bc.z file in the rlib archive. Now, the bitcode is placed as an uncompressed section within each .o object file in the rlib archive. This can sometimes be a small performance benefit because it avoids the cost of compressing the bitcode, and it can sometimes be slower because more data must be written to disk. This change helped simplify the implementation, and it also matches the behavior of clang’s -fembed-bitcode option (typically used with Apple’s iOS-based operating systems).

For more background, see Nicholas Nethercote's post at https://blog.mozilla.org/nnethercote/2020/04/24/how-to-speed-up-the-rust-compiler-in-2020/. It took several PRs across rustc and Cargo to make this happen.
