A theoretical paper that attempts to explain the success of µP (Maximal Update Parametrization) hyperparameter transfer. Its authors find that the largest eigenvalue of the training loss Hessian is independent of the width and depth of the network.
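For context, the top Hessian eigenvalue can be estimated without ever forming the Hessian. Below is a minimal PyTorch sketch (not the paper's code) that approximates it via power iteration on Hessian-vector products; `loss` and `params` are assumed to come from your own model and batch.

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Estimate the largest eigenvalue of the loss Hessian via power iteration
    on Hessian-vector products (the Hessian is never materialized)."""
    # Gradient with graph retained so we can differentiate through it again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Random unit vector in parameter space.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((x * x).sum() for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: d/dtheta (grad . v)
        gv = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        # Rayleigh quotient gives the current eigenvalue estimate.
        eig = sum((h * x).sum() for h, x in zip(hv, v)).item()
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / norm for h in hv]
    return eig
```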
Over 200 million OCR'd pages of books from the Internet Archive, available for research use.
Language models rely on separately trained tokenizers, which can produce tokens that are never seen during language model training. Many such under-trained tokens exist even in the most powerful modern language models. This paper explores the phenomenon and presents methods for identifying and dealing with these tokens.
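As an illustration of one possible detection heuristic (not necessarily the paper's exact indicators), the sketch below flags vocabulary entries whose input embeddings sit unusually close to the mean embedding, a sign that they were rarely updated during training. The model name and the number of tokens reported are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def find_suspect_tokens(model_name="gpt2", k=20):
    """Flag vocabulary entries whose input embeddings look under-trained:
    vectors that stay unusually close to the mean embedding suggest the
    token was rarely (or never) updated during training."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    emb = model.get_input_embeddings().weight.detach()   # (vocab_size, dim)
    centered = emb - emb.mean(dim=0, keepdim=True)
    norms = centered.norm(dim=1)
    # Smallest distances from the mean are the most suspicious.
    suspect_ids = norms.argsort()[:k].tolist()
    return [(i, tok.convert_ids_to_tokens(i), norms[i].item()) for i in suspect_ids]

for tid, token, score in find_suspect_tokens():
    print(f"{tid:>6}  {token!r:30}  {score:.3f}")
```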
Researchers present ProtT3, a new framework designed to enhance text-based understanding of proteins by combining Protein Language Models (PLMs) with traditional Language Models (LMs). ProtT3 pairs a PLM that processes amino acid sequences with a language model that generates high-quality textual descriptions, bridging the two with a cross-modal projector called Q-Former.
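To make the architecture concrete, here is a minimal PyTorch sketch of a Q-Former-style projector: a fixed set of learnable queries cross-attends to the protein encoder's per-residue features and emits a short sequence of soft tokens in the text LM's embedding space. Dimensions, names, and the single-block design are illustrative assumptions, not taken from the ProtT3 release.

```python
import torch
import torch.nn as nn

class CrossModalProjector(nn.Module):
    """Q-Former-style bridge: learnable queries cross-attend to the protein
    language model's per-residue features and produce a short sequence of
    soft tokens in the text LM's embedding space."""
    def __init__(self, plm_dim=1280, lm_dim=2048, num_queries=32, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, lm_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=lm_dim, kdim=plm_dim, vdim=plm_dim,
            num_heads=num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(lm_dim, 4 * lm_dim), nn.GELU(), nn.Linear(4 * lm_dim, lm_dim))
        self.norm1 = nn.LayerNorm(lm_dim)
        self.norm2 = nn.LayerNorm(lm_dim)

    def forward(self, protein_feats):                 # (batch, seq_len, plm_dim)
        batch = protein_feats.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        attn_out, _ = self.cross_attn(q, protein_feats, protein_feats)
        x = self.norm1(q + attn_out)
        x = self.norm2(x + self.ffn(x))
        return x                                      # (batch, num_queries, lm_dim)

# Example: 32 soft tokens for a batch of two 200-residue proteins.
projector = CrossModalProjector()
soft_tokens = projector(torch.randn(2, 200, 1280))
print(soft_tokens.shape)  # torch.Size([2, 32, 2048])
```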
Researchers have developed a new method, Global-Local Semantic Consistent Learning (GLSCL), to enhance text-video retrieval while significantly reducing computational costs.
The Aya project comprises 3 models of increasing size that can converse in 101 languages, many of them extremely low-resource. This project is an incredible step for the open and general-access research community.
Tree search is an extremely active area of research for inference-time computation in language models. This paper from Microsoft presents a particularly compelling approach that enables small models to improve dramatically on mathematics problems.
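As a rough illustration of the general idea (a beam-style simplification, not the paper's algorithm), the sketch below expands candidate reasoning steps proposed by a small model and keeps only the highest-scoring partial solutions at each depth; `propose_steps`, `score`, and `is_final` are hypothetical callables you would back with a generator and a verifier or reward model.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Node:
    steps: List[str]       # partial chain of reasoning steps
    score: float           # scorer's estimate of how promising the branch is

def tree_search(question: str,
                propose_steps: Callable[[str, List[str]], List[str]],
                score: Callable[[str, List[str]], float],
                is_final: Callable[[str], bool],
                beam_width: int = 4,
                max_depth: int = 8) -> List[str]:
    """Expand the most promising partial solutions level by level, keeping only
    `beam_width` branches per depth, and return the best completed chain."""
    frontier = [Node(steps=[], score=0.0)]
    best = None
    for _ in range(max_depth):
        candidates = []
        for node in frontier:
            for step in propose_steps(question, node.steps):
                child = Node(node.steps + [step], score(question, node.steps + [step]))
                if is_final(step):
                    # Completed solution: keep it if it beats the best so far.
                    if best is None or child.score > best.score:
                        best = child
                else:
                    candidates.append(child)
        if not candidates:
            break
        frontier = sorted(candidates, key=lambda n: n.score, reverse=True)[:beam_width]
    return (best or max(frontier, key=lambda n: n.score)).steps
```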
This post is a long, in-depth summary of DeepMind's AGI safety and alignment research efforts.