pub fn quantize(
    w: impl AsRef<Array>,
    group_size: impl Into<Option<i32>>,
    bits: impl Into<Option<i32>>,
) -> Result<(Array, Array, Array)>
Quantize the matrix `w` using `bits` bits per element.
Note: each row of `w` is divided into groups of `group_size` consecutive elements, and the elements of a group are quantized together. The number of columns of `w` must therefore be divisible by `group_size`.
`quantize` currently supports only 2D inputs whose dimensions are multiples of 32.
For details, please see this documentation
§Params
- `w`: The input matrix.
- `group_size`: The size of the group in `w` that shares a scale and bias. (default: 64)
- `bits`: The number of bits occupied by each element of `w` in the returned quantized matrix. (default: 4)
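To make the role of `group_size` and `bits` concrete, here is a minimal standard-library sketch of group-wise quantization of a single row. It does not call this crate and is not its implementation. It assumes a common affine scheme in which each group stores `scale = (max - min) / (2^bits - 1)` and `bias = min`; the exact formula and the packed output layout used by `quantize` are described in the documentation linked above. The helper names `quantize_group`, `quantize_row`, and `dequantize_row` are hypothetical.

```rust
/// Quantize one group to `bits` bits per element (assumed affine scheme).
/// Returns the integer codes plus the scale and bias shared by the group.
/// `bits` is expected to be small (for example 2, 4, or 8).
fn quantize_group(group: &[f32], bits: u32) -> (Vec<u32>, f32, f32) {
    let levels = ((1u32 << bits) - 1) as f32; // e.g. 15 for 4 bits
    let min = group.iter().copied().fold(f32::INFINITY, f32::min);
    let max = group.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // A constant group has zero range; fall back to 1.0 to avoid dividing by zero.
    let scale = if max > min { (max - min) / levels } else { 1.0 };
    let bias = min;
    let codes = group
        .iter()
        .map(|&x| ((x - bias) / scale).round().clamp(0.0, levels) as u32)
        .collect();
    (codes, scale, bias)
}

/// Quantize a row group by group. Returns `None` when the row length is
/// not divisible by `group_size`, mirroring the divisibility requirement.
fn quantize_row(
    row: &[f32],
    group_size: usize,
    bits: u32,
) -> Option<(Vec<u32>, Vec<f32>, Vec<f32>)> {
    if group_size == 0 || row.len() % group_size != 0 {
        return None;
    }
    let mut codes = Vec::with_capacity(row.len());
    let mut scales = Vec::new();
    let mut biases = Vec::new();
    for group in row.chunks(group_size) {
        let (c, s, b) = quantize_group(group, bits);
        codes.extend(c);
        scales.push(s);
        biases.push(b);
    }
    Some((codes, scales, biases))
}

/// Approximate reconstruction: each code uses the scale and bias of its group.
fn dequantize_row(codes: &[u32], scales: &[f32], biases: &[f32], group_size: usize) -> Vec<f32> {
    codes
        .iter()
        .enumerate()
        .map(|(i, &c)| c as f32 * scales[i / group_size] + biases[i / group_size])
        .collect()
}
```

With a 128-element row, `group_size = 64`, and `bits = 4`, this sketch yields 128 codes in the range 0 to 15 and two scale/bias pairs, one per group. A smaller `group_size` tracks the local value range more closely at the cost of storing more scales and biases.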