pub fn quantized_matmul(
x: impl AsRef<Array>,
w: impl AsRef<Array>,
scales: impl AsRef<Array>,
biases: impl AsRef<Array>,
transpose: impl Into<Option<bool>>,
group_size: impl Into<Option<i32>>,
bits: impl Into<Option<i32>>,
) -> Result<Array>
Perform matrix multiplication with the quantized matrix w. The quantization uses one
floating point scale and bias per group_size elements. Each element in w takes bits
bits and is packed into an unsigned 32-bit integer.
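
A minimal usage sketch is shown below. It assumes a companion quantize op (analogous to MLX's mlx.core.quantize) that returns the packed weights together with per-group scales and biases, and that Array::from_slice is available for constructing inputs; both are assumptions about the surrounding crate API, not guarantees.

use mlx_rs::{Array, ops::{quantize, quantized_matmul}};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Activations: a 2 x 64 matrix (64 columns so one quantization group spans a row).
    let x = Array::from_slice(&vec![1.0f32; 2 * 64], &[2, 64]);
    // Full-precision weights to be quantized: 32 x 64.
    let w = Array::from_slice(&vec![0.5f32; 32 * 64], &[32, 64]);

    // Assumed helper: quantize w with group_size = 64 and 4 bits per element,
    // yielding the packed weights plus one scale and bias per group.
    let (w_q, scales, biases) = quantize(&w, 64, 4)?;

    // Multiply x by the quantized weights, treating w_q as transposed,
    // with the same group_size and bits used for quantization.
    let y = quantized_matmul(&x, &w_q, &scales, &biases, true, 64, 4)?;

    // Expected result shape: [2, 32].
    println!("{:?}", y.shape());
    Ok(())
}

The group_size and bits passed to quantized_matmul must match the values used when the weights were quantized, otherwise the packed layout will be misinterpreted.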