mlx_rs::ops

Function quantized_matmul_device

pub fn quantized_matmul_device(
    x: impl AsRef<Array>,
    w: impl AsRef<Array>,
    scales: impl AsRef<Array>,
    biases: impl AsRef<Array>,
    transpose: impl Into<Option<bool>>,
    group_size: impl Into<Option<i32>>,
    bits: impl Into<Option<i32>>,
    stream: impl AsRef<Stream>,
) -> Result<Array>

Performs the matrix multiplication of x with the quantized matrix w. The quantization uses one floating-point scale and one bias per group of group_size elements. Each element of w occupies the number of bits given by the bits parameter, and the elements are packed into unsigned 32-bit integers.
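
The packing and per-group scale/bias described above can be illustrated with a small plain-Rust sketch that does not call mlx_rs at all. The helper names (unpack, dequantize_row, quantized_dot) are hypothetical and not part of the crate. The sketch assumes an affine scheme in which each stored integer q is mapped back to scale * q + bias, that values are packed starting from the least-significant bits of each 32-bit word, and that bits divides 32 and is smaller than 32. The group size of 4 used in the usage note is only for readability.

```rust
/// Unpack `count` quantized values, each `bits` wide, from 32-bit words.
/// Assumption: values are stored least-significant bits first, and
/// `bits` divides 32 and is less than 32 (for example 2, 4, or 8).
fn unpack(packed: &[u32], bits: u32, count: usize) -> Vec<u32> {
    let per_word = (32 / bits) as usize; // values held by one u32
    let mask = (1u32 << bits) - 1;       // low `bits` bits set
    (0..count)
        .map(|i| {
            let shift = (i % per_word) as u32 * bits;
            (packed[i / per_word] >> shift) & mask
        })
        .collect()
}

/// Dequantize one row of the quantized matrix.
/// Assumption: element i belongs to group g = i / group_size and is
/// reconstructed as scales[g] * q + biases[g].
fn dequantize_row(
    packed: &[u32],
    scales: &[f32],
    biases: &[f32],
    group_size: usize,
    bits: u32,
) -> Vec<f32> {
    let count = scales.len() * group_size;
    unpack(packed, bits, count)
        .iter()
        .enumerate()
        .map(|(i, &q)| {
            let g = i / group_size;
            scales[g] * q as f32 + biases[g]
        })
        .collect()
}

/// Dot product of `x` with one dequantized row, i.e. a single output
/// element of the quantized matrix product in this simplified model.
fn quantized_dot(
    x: &[f32],
    packed: &[u32],
    scales: &[f32],
    biases: &[f32],
    group_size: usize,
    bits: u32,
) -> f32 {
    dequantize_row(packed, scales, biases, group_size, bits)
        .iter()
        .zip(x)
        .map(|(w, xi)| w * xi)
        .sum()
}
```

For instance, with bits = 4 the single word 0x7654_3210 holds the eight values 0 through 7. Splitting them into two groups of 4 with scales [0.5, 2.0] and biases [1.0, -1.0] reconstructs the row as 1.0, 1.5, 2.0, 2.5, 7.0, 9.0, 11.0, 13.0 under the assumptions above.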