The MAMBA model with a language modeling head on top (a linear layer whose weights are tied to the input embeddings).
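Weight tying in a language modeling head can be illustrated with a small pure-Python sketch. This is not the MAMBA implementation: the class `TiedLMHead`, its methods, and the sizes used are hypothetical names chosen for illustration, and the sequence-mixing backbone is left out entirely. It only shows the idea that one matrix serves both as the embedding table and as the output projection.

```python
import random


class TiedLMHead:
    """Toy example (hypothetical): one matrix used for embedding and for output logits."""

    def __init__(self, vocab_size, d_model, seed=0):
        rng = random.Random(seed)
        # A single weight matrix of shape (vocab_size, d_model).
        self.weight = [
            [rng.uniform(-1.0, 1.0) for _ in range(d_model)]
            for _ in range(vocab_size)
        ]

    def embed(self, token_id):
        # Input embedding: look up the row for this token.
        return self.weight[token_id]

    def logits(self, hidden):
        # Output head: dot product of the hidden state with every row
        # of the same matrix, giving one score per vocabulary entry.
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.weight]


model = TiedLMHead(vocab_size=5, d_model=3)
hidden = model.embed(2)        # stand-in for the backbone's output
scores = model.logits(hidden)  # one logit per vocabulary token
```

Because the two roles share storage, any update to `model.weight` changes both the embeddings and the logits, which is what "tied" means here.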
With these representations, there is a neat trick that we can use, namely to go with a …