xAI has released the weights and architecture of Grok-1, a Mixture-of-Experts model with 314 billion parameters. The released checkpoint is the raw base model from the Grok-1 pretraining phase, which means it has not been fine-tuned for any specific application. The model was trained from scratch by xAI using a custom training stack.
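In a Mixture-of-Experts layer, a small router picks a few experts per token, so only a fraction of the 314 billion parameters is active for any given token. The sketch below shows the general top-k routing idea in JAX; the expert count, top-k value, and tensor shapes are illustrative placeholders and are not taken from the released Grok-1 code.

```python
# Minimal top-k Mixture-of-Experts routing sketch in JAX (illustrative only;
# expert count, k, and shapes are assumptions, not Grok-1's configuration).
import jax
import jax.numpy as jnp

def moe_layer(x, gate_w, expert_ws, k=2):
    """x: (tokens, d_model); gate_w: (d_model, n_experts);
    expert_ws: (n_experts, d_model, d_model)."""
    logits = x @ gate_w                                 # router score per expert
    top_vals, top_idx = jax.lax.top_k(logits, k)        # choose k experts per token
    weights = jax.nn.softmax(top_vals, axis=-1)         # normalise their scores
    # Run every expert densely (fine for a sketch; real MoE layers dispatch
    # tokens sparsely), then combine only the top-k outputs per token.
    all_out = jnp.einsum('td,edf->tef', x, expert_ws)   # (tokens, n_experts, d_model)
    picked = jnp.take_along_axis(all_out, top_idx[..., None], axis=1)
    return jnp.sum(weights[..., None] * picked, axis=1)

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.normal(k1, (4, 16))            # 4 tokens, toy model width 16
gate_w = jax.random.normal(k2, (16, 8))       # router over 8 toy experts
expert_ws = jax.random.normal(k3, (8, 16, 16))
print(moe_layer(x, gate_w, expert_ws).shape)  # (4, 16)
```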
The release, announced on Monday, March 18, 2024, is written in JAX and uses a modern Transformer architecture with GeGLU activations, rotary position embeddings (RoPE), sandwich normalization, and other niceties.
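As a rough illustration of two of those components, the sketch below implements a GeGLU feed-forward block (a GELU-gated linear unit) and rotary position embeddings on toy tensors; the function names, dimensions, and parameter layout are assumptions for illustration and do not mirror the released Grok-1 implementation.

```python
# Illustrative JAX sketches of GeGLU and RoPE (not the Grok-1 source).
import jax
import jax.numpy as jnp

def geglu_ffn(x, w_gate, w_up, w_down):
    """GeGLU feed-forward: GELU-gated linear unit followed by a down projection."""
    return (jax.nn.gelu(x @ w_gate) * (x @ w_up)) @ w_down

def rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, head_dim)."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = 1.0 / (base ** (jnp.arange(half) / half))        # per-pair rotation rates
    angles = jnp.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    x1, x2 = x[:, :half], x[:, half:]                        # rotate dimension pairs
    return jnp.concatenate(
        [x1 * jnp.cos(angles) - x2 * jnp.sin(angles),
         x1 * jnp.sin(angles) + x2 * jnp.cos(angles)], axis=-1)

k1, k2, k3, k4, k5 = jax.random.split(jax.random.PRNGKey(0), 5)
q = jax.random.normal(k1, (8, 64))                  # 8 positions, toy head dim 64
print(rope(q).shape)                                # (8, 64)
x = jax.random.normal(k2, (8, 32))
w_gate = jax.random.normal(k3, (32, 128))
w_up = jax.random.normal(k4, (32, 128))
w_down = jax.random.normal(k5, (128, 32))
print(geglu_ffn(x, w_gate, w_up, w_down).shape)     # (8, 32)
```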