Module arch
SIMD and vendor intrinsics module.
This module is intended to be the gateway to architecture-specific intrinsic functions, typically related to SIMD (but not always!). Each architecture that Rust compiles to may contain a submodule here, which means that this is not a portable module! If you're writing a portable library take care when using these APIs!
Under this module you'll find an architecture-named module, such as
x86_64. Each #[cfg(target_arch)] that Rust can compile to may have a
module entry here, only present on that particular target. For example the
i686-pc-windows-msvc target will have an x86 module here, whereas
x86_64-pc-windows-msvc has x86_64.
Overview
This module exposes vendor-specific intrinsics that typically correspond to a single machine instruction. These intrinsics are not portable: their availability is architecture-dependent, and not all machines of that architecture might provide the intrinsic.
The arch module is intended to be a low-level implementation detail for
higher-level APIs. Using it correctly can be quite tricky as you need to
ensure at least a few guarantees are upheld:
- The correct architecture's module is used. For example the
armmodule isn't available on thex86_64-unknown-linux-gnutarget. This is typically done by ensuring that#[cfg]is used appropriately when using this module. - The CPU the program is currently running on supports the function being called. For example it is unsafe to call an AVX2 function on a CPU that doesn't actually support AVX2.
As a result of the latter of these guarantees all intrinsics in this module
are unsafe and extra care needs to be taken when calling them!
CPU Feature Detection
In order to call these APIs in a safe fashion there's a number of
mechanisms available to ensure that the correct CPU feature is available
to call an intrinsic. Let's consider, for example, the _mm256_add_epi64
intrinsics on the x86 and x86_64 architectures. This function requires
the AVX2 feature as documented by Intel so to correctly call
this function we need to (a) guarantee we only call it on x86/x86_64
and (b) ensure that the CPU feature is available
Static CPU Feature Detection
The first option available to us is to conditionally compile code via the
#[cfg] attribute. CPU features correspond to the target_feature cfg
available, and can be used like so:
#[cfg(
all(
any(target_arch = "x86", target_arch = "x86_64"),
target_feature = "avx2"
)
)]
fn foo() {
#[cfg(target_arch = "x86")]
use std::arch::x86::_mm256_add_epi64;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::_mm256_add_epi64;
unsafe {
_mm256_add_epi64(...);
}
}
Here we're using #[cfg(target_feature = "avx2")] to conditionally compile
this function into our module. This means that if the avx2 feature is
enabled statically then we'll use the _mm256_add_epi64 function at
runtime. The unsafe block here can be justified through the usage of
#[cfg] to only compile the code in situations where the safety guarantees
are upheld.
Statically enabling a feature is typically done with the -C target-feature or -C target-cpu flags to the compiler. For example if
your local CPU supports AVX2 then you can compile the above function with:
Or otherwise you can specifically enable just the AVX2 feature:
Note that when you compile a binary with a particular feature enabled it's important to ensure that you only run the binary on systems which satisfy the required feature set.
Dynamic CPU Feature Detection
Sometimes statically dispatching isn't quite what you want. Instead you might want to build a portable binary that runs across a variety of CPUs, but at runtime it selects the most optimized implementation available. This allows you to build a "least common denominator" binary which has certain sections more optimized for different CPUs.
Taking our previous example from before, we're going to compile our binary without AVX2 support, but we'd like to enable it for just one function. We can do that in a manner like:
fn foo() {
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
{
if is_x86_feature_detected!("avx2") {
return unsafe { foo_avx2() };
}
}
// fallback implementation without using AVX2
}
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn foo_avx2() {
#[cfg(target_arch = "x86")]
use std::arch::x86::_mm256_add_epi64;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::_mm256_add_epi64;
unsafe { _mm256_add_epi64(...); }
}
There's a couple of components in play here, so let's go through them in detail!
-
First up we notice the
is_x86_feature_detected!macro. Provided by the standard library, this macro will perform necessary runtime detection to determine whether the CPU the program is running on supports the specified feature. In this case the macro will expand to a boolean expression evaluating to whether the local CPU has the AVX2 feature or not.Note that this macro, like the
archmodule, is platform-specific. For example callingis_x86_feature_detected!("avx2")on ARM will be a compile time error. To ensure we don't hit this error a statement level#[cfg]is used to only compile usage of the macro onx86/x86_64. -
Next up we see our AVX2-enabled function,
foo_avx2. This function is decorated with the#[target_feature]attribute which enables a CPU feature for just this one function. Using a compiler flag like-C target-feature=+avx2will enable AVX2 for the entire program, but using an attribute will only enable it for the one function. Usage of the#[target_feature]attribute currently requires the function to also beunsafe, as we see here. This is because the function can only be correctly called on systems which have the AVX2 (like the intrinsics themselves).
And with all that we should have a working program! This program will run across all machines and it'll use the optimized AVX2 implementation on machines where support is detected.
Ergonomics
It's important to note that using the arch module is not the easiest
thing in the world, so if you're curious to try it out you may want to
brace yourself for some wordiness!
The primary purpose of this module is to enable stable crates on crates.io to build up much more ergonomic abstractions which end up using SIMD under the hood. Over time these abstractions may also move into the standard library itself, but for now this module is tasked with providing the bare minimum necessary to use vendor intrinsics on stable Rust.
Other architectures
This documentation is only for one particular architecture, you can find others at:
x86x86_64armaarch64amdgpuriscv32riscv64mipsmips64powerpcpowerpc64nvptxwasm32loongarch32loongarch64s390x
Examples
First let's take a look at not actually using any intrinsics but instead using LLVM's auto-vectorization to produce optimized vectorized code for AVX2 and also for the default platform.
unsafe
Next up let's take a look at an example of manually using intrinsics. Here we'll be using SSE4.1 features to implement hex encoding.
// translated from
// <https://github.com/Matherunner/bin2hex-sse/blob/master/base16_sse4.cpp>
unsafe
Modules
- alloc Memory allocation APIs
- any Utilities for dynamic typing or type reflection.
- arch SIMD and vendor intrinsics module.
- array Utilities for the array primitive type.
- ascii Operations on ASCII strings and characters.
-
assert_matches
Unstable module containing the unstable
assert_matchesmacro. - async_iter Composable asynchronous iteration.
-
autodiff
Unstable module containing the unstable
autodiffmacro. - borrow Utilities for working with borrowed data.
-
bstr
The
ByteStrtype and trait implementations. - cell Shareable mutable containers.
-
char
Utilities for the
charprimitive type. -
clone
The
Clonetrait for types that cannot be 'implicitly copied'. - cmp Utilities for comparing and ordering values.
- contracts Unstable module containing the unstable contracts lang items and attribute macros.
- convert Traits for conversions between types.
-
default
The
Defaulttrait for types with a default value. - error Interfaces for working with Errors.
-
f128
Constants for the
f128quadruple-precision floating point type. -
f16
Constants for the
f16half-precision floating point type. -
f32
Constants for the
f32single-precision floating point type. -
f64
Constants for the
f64double-precision floating point type. - ffi Platform-specific types, as defined by C.
- fmt Utilities for formatting and printing strings.
-
from
Unstable module containing the unstable
Fromderive macro. - future Asynchronous basic functionality.
- hash Generic hashing support.
- hint Hints to compiler that affects how code should be emitted or optimized.
- index Helper types for indexing slices.
- intrinsics Compiler intrinsics.
- io Traits, helpers, and type definitions for core I/O functionality.
- iter Composable external iteration.
- marker Primitive traits and types representing basic properties of types.
- mem Basic functions for dealing with memory.
- net Networking primitives for IP communication.
- num Numeric traits and functions for the built-in numeric types.
- ops Overloadable operators.
- option Optional values.
- os OS-specific functionality.
- panic Panic support in the standard library.
- panicking Panic support for core
-
pat
Helper module for exporting the
pattern_typemacro - pin Types that pin data to a location in memory.
- prelude The core prelude
- primitive This module reexports the primitive types to allow usage that is not possibly shadowed by other declared types.
- profiling Profiling markers for compiler instrumentation.
- ptr Manually manage memory through raw pointers.
- random Random value generation.
-
range
Experimental replacement range types
-
result
Error handling with the
Resulttype. - simd Portable SIMD module.
- slice Slice management and manipulation.
- str String manipulation.
- sync Synchronization primitives
- task Types and Traits for working with asynchronous tasks.
- time Temporal quantification.
-
ub_checks
Provides the
assert_unsafe_preconditionmacro as well as some utility functions that cover common preconditions. - unsafe_binder Operators used to turn types into unsafe binders and back.
Macros
-
assert
Asserts that a boolean expression is
trueat runtime. -
assert_eq
Asserts that two expressions are equal to each other (using
PartialEq). -
assert_ne
Asserts that two expressions are not equal to each other (using
PartialEq). - assert_unsafe_precondition Checks that the preconditions of an unsafe function are followed.
- cfg Evaluates boolean combinations of configuration flags at compile-time.
- column Expands to the column number at which it was invoked.
- compile_error Causes compilation to fail with the given error message when encountered.
- concat Concatenates literals into a static string slice.
- concat_bytes Concatenates literals into a byte slice.
-
const_format_args
Same as
format_args, but can be used in some const contexts. -
debug_assert
Asserts that a boolean expression is
trueat runtime. - debug_assert_eq Asserts that two expressions are equal to each other.
- debug_assert_ne Asserts that two expressions are not equal to each other.
- env Inspects an environment variable at compile time.
- file Expands to the file name in which it was invoked.
- format_args Constructs parameters for the other string-formatting macros.
- include Parses a file as an expression or an item according to the context.
- include_bytes Includes a file as a reference to a byte array.
- include_str Includes a UTF-8 encoded file as a string.
- line Expands to the line number on which it was invoked.
- log_syntax Prints passed tokens into the standard output.
- matches Returns whether the given expression matches the provided pattern.
- module_path Expands to a string that represents the current module path.
- option_env Optionally inspects an environment variable at compile time.
- panic Panics the current thread.
-
pattern_type
Creates a pattern type.
type Positive = std::pat::pattern_type!(i32 is 1..); - stringify Stringifies its arguments.
- todo Indicates unfinished code.
- trace_macros Enables or disables tracing functionality used for debugging other macros.
-
try
Unwraps a result or propagates its error.
[raw-identifier syntax][ris]:
r#try. [propagating-errors]: https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html#a-shortcut-for-propagating-errors-the--operator [ris]: https://doc.rust-lang.org/nightly/rust-by-example/compatibility/raw_identifiers.html - unimplemented Indicates unimplemented code by panicking with a message of "not implemented".
- unreachable Indicates unreachable code.
- write Writes formatted data into a buffer.
- writeln Writes formatted data into a buffer, with a newline appended.