Anthropic's Claude 3 Opus has surpassed OpenAI's GPT-4 for the first time on Chatbot Arena. Chatbot Arena is a leaderboard run by the Large Model Systems Organization, a research organization dedicated to open models. Its site allows visitors to rate outputs from various models, enabling it to calculate the best models in aggregate. While Claude's rise is notable, GPT-4 is now over a year old.
Thursday, March 28, 2024: Anthropic has trained three new models in the Claude 3 family, the strongest of which matches GPT-4's reported benchmark results. It is also multimodal and performs well on vision tasks. Importantly, Claude's coding ability is substantially improved in this release.
Tuesday, March 5, 2024: Claude 3 successfully summarizes a video into a blog post, showcasing its long-context capabilities.
Andrej Karpathy issued a long context challenge to make a blog post from a recent video of his. Claude 3 was able to perform this task with some data pre-processing help. The resulting blog post is high quality and interesting.
Anthropic's new AI model, Claude 3, stands out for its 'warmth,' making it a robust partner in creative writing tasks. Claude 3 is described as more human-feeling and naturalistic, crossing the threshold from merely good thinking to deep thinking that is enjoyable to engage with. Although technical benchmarks don't fully capture this nuance, Claude 3 is poised to change how we interact with AI in creative processes.
Anthropic's Claude 3 Haiku is its fastest and most cost-effective AI model. It features advanced vision capabilities and excels in benchmarks. Claude 3 Haiku is designed for enterprises, with a focus on speed and affordability.
Anthropic releases a prompt library for Claude 3, offering effective user prompts for various tasks.
The release of Claude 3 has been quite popular, but the prompting style for these models is slightly different. Anthropic has collected a set of user prompts that work well for a wide variety of tasks and topics.
Anthropic identified a technique for jailbreaking long-context models. It has shared these findings with other organizations and implemented mitigations. This post outlines the technique and the defenses Anthropic put in place against it.
High-profile AI startups like Inflection AI, Stability AI, and Anthropic are facing financial pressures as they struggle with the high costs of developing generative AI models. While OpenAI, backed by Microsoft, has shown revenue growth, competitors like Anthropic and Stability AI grapple with substantial gaps between revenue and operating expenses. Microsoft's investment in AI hints at the tech industry's belief in AI's long-term profitability, despite the current challenges in monetizing these expensive technologies.
Anthropic has expanded its AI assistant, Claude, to Europe. Claude supports multiple languages. Anthropic is offering the service across its website, iOS app, and business plans for teams. The company is beginning the process of raising more money.
Mike Krieger, one of the co-founders of Instagram, is Anthropic's new chief product officer. Krieger spent the last few years working on an AI news-reading app that was recently acquired by Yahoo. His background in developing intuitive products and user experiences will be invaluable for Anthropic as it creates new ways for people to interact with its AI chatbot Claude.
Anthropic recently published a public research paper explaining why its AI chatbot chooses to generate content about certain subjects over others. Its researchers deciphered what parts of the chatbot's neural network mapped to specific concepts using a process known as 'dictionary learning'. The research showed how neurons associated with a topic fired together when the model was thinking about something associated with the topic - similar sets of neurons firing can evoke adjacent subjects. A link to the paper is available at the end of the article.
Anthropic's Responsible Scaling Policy aims to prevent catastrophic AI safety failures by identifying high-risk capabilities, testing models regularly, and implementing strict safety standards, with a focus on continuous improvement and collaboration with industry and government.
Anthropic researchers have unveiled a method to interpret the inner workings of its large language model, Claude Sonnet, by mapping out millions of features corresponding to a diverse array of concepts. This interpretability could lead to safer AI by allowing specific manipulations of these features to steer model behaviors. The study demonstrates a significant step in understanding and improving the safety mechanisms of AI language models.
Jan Leike, a former OpenAI researcher who resigned over AI safety concerns, has joined Anthropic to lead a new "superalignment" team focusing on AI safety and security. Leike's team will address scalable oversight, weak-to-strong generalization, and automated alignment research.
Anthropic is introducing a "tool use" feature for its Claude AI chatbot, enabling users to create personalized assistants that can interact with any external API. This feature can analyze data, provide product recommendations, track orders, offer technical support, and even process images for applications like interior design.
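A minimal sketch of what such a tool definition might look like, assuming the JSON-Schema style (name / description / input_schema) used by the Anthropic Messages API; the order-tracking tool and its fields are illustrative, not taken from the announcement.

```python
# Hypothetical sketch of a Claude "tool use" definition. The helper and
# the example order-tracking tool are illustrative assumptions.

def make_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Build a tool definition dict in the JSON Schema style Claude expects."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

# An assistant could pass this in the `tools` list of a Messages API call;
# Claude then emits a tool_use block whose input matches the schema.
track_order = make_tool(
    name="track_order",
    description="Look up the shipping status of a customer order.",
    properties={"order_id": {"type": "string", "description": "The order number"}},
    required=["order_id"],
)
```

When Claude decides to call the tool, the application executes the real API request itself and returns the result to the model in a follow-up message.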
Claude is more than just a middle-of-the-road, sycophantic AI that agrees with the user. Claude's personality and character have been specifically designed using a character variant of Constitutional AI. This post goes in depth on how post-training steers Claude's output toward this desired character.
Anthropic has launched Claude 3.5 Sonnet, boasting better performance than GPT-4o and Gemini on several benchmarks alongside increased speed and expanded capabilities. The update also introduces the Artifacts feature, enhancing user interaction with the AI's outputs. Claude aims to transition from a chatbot to a central tool in business environments, integrating knowledge and workflow management.
Anthropic's new features in Claude allow developers to automate prompt engineering, improving AI application development by generating, testing, and refining prompts with quick feedback.
Several major AI companies, including Anthropic, Nvidia, Apple, and Salesforce, used subtitles from 173,536 YouTube videos across 48,000 channels to train AI models, despite YouTube's rules against unauthorized data harvesting. This has sparked backlash from content creators, who argue that their work has been exploited without consent or compensation, raising concerns about AI's impact on the creative industry and the ethics of using such training data.
Anthropic, in partnership with Menlo Ventures, is launching a $100 million Anthology Fund to support AI startups. The fund will provide at least $100k per startup, access to Anthropic's AI models, and various other perks such as networking opportunities and workspace access.
Freelancer.com and iFixit CEOs criticized Anthropic for excessive and unauthorized web scraping by its ClaudeBot, which disrupted their sites. Anthropic claims to respect the robots.txt file and says it will investigate the issue.
Reddit's CEO is calling on Microsoft and other companies to pay if they want to continue scraping the site's data. The site is now blocking companies that haven't signed agreements about how its data will be used or not used. Companies like Microsoft, Anthropic, and Perplexity have refused to negotiate. OpenAI's SearchGPT will be able to show Reddit results as the two companies reached a deal earlier this year.
Anthropic has introduced prompt caching for its Claude models, allowing developers to cache frequently used context, significantly reducing costs and latency, with early users like Notion already benefiting from faster and more efficient AI-powered features.
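A hedged sketch of how a cached request might be structured: the large, reused system context is marked with a `cache_control` block of type `ephemeral`, as described in Anthropic's prompt-caching announcement. The document text and model name are placeholders, and no request is actually sent here.

```python
# Sketch only: builds a Messages API request body with a cacheable prefix.
# LONG_REFERENCE_DOC stands in for a large context reused across many calls.

LONG_REFERENCE_DOC = "...full product manual goes here..."

def build_cached_request(question: str) -> dict:
    return {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_REFERENCE_DOC,
                # Marks this prefix as cacheable so repeat calls skip
                # reprocessing it, cutting cost and latency.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }

body = build_cached_request("Where is the reset button?")
```

Because only the trailing user question changes between calls, the expensive prefix is processed once and then served from cache.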
Anthropic has added system prompts and updated dates for all models.
Anthropic has made Artifacts generally available, including on mobile.
Anthropic has published the system prompts used to guide its Claude AI models and plans to continue being transparent moving forward.
Anthropic has released a useful set of starter projects. It has partnered with former heads of AI from Brex, Uber, Facebook, and others to help write the first Quickstart, a scalable customer service agent powered by Claude.
This article contains an interview with Mike Krieger, the new chief product officer at Anthropic, in which he discusses why he decided to work in AI, what products he expects AI to enable in the future, and how he is thinking about building them. Krieger co-founded Instagram. He left Meta in 2018 and launched an AI-powered news reader, which was shut down earlier this year. Anthropic was started in 2021 by former OpenAI executives and researchers. Its main product is Claude, an industry-leading AI model and chatbot that competes with ChatGPT.
Anthropic shows how to semantically chunk documents, which dramatically improves retrieval performance while costing only about $1 per million chunks thanks to prompt caching.
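The core idea can be sketched as follows: split a document into chunks, then build one prompt per chunk that pairs the full document (which prompt caching makes cheap to resend) with the chunk to be situated in context. The paragraph-based chunking and the prompt wording below are assumptions, not Anthropic's exact recipe.

```python
# Minimal sketch of contextual chunking. The full document is repeated in
# every prompt, but caching means it is only processed once.

def chunk_by_paragraph(doc: str) -> list:
    """Naive chunker: one chunk per non-empty paragraph."""
    return [p.strip() for p in doc.split("\n\n") if p.strip()]

def contextualize_prompts(doc: str) -> list:
    """Build one context-generation prompt per chunk of the document."""
    prompts = []
    for chunk in chunk_by_paragraph(doc):
        prompts.append(
            f"<document>\n{doc}\n</document>\n\n"  # identical prefix, cacheable
            f"Here is a chunk of the document:\n<chunk>\n{chunk}\n</chunk>\n"
            "Write a short context situating this chunk within the document."
        )
    return prompts

doc = "Intro paragraph.\n\nDetails paragraph."
prompts = contextualize_prompts(doc)  # two prompts sharing one cached prefix
```

Each generated context string would then be prepended to its chunk before embedding, improving retrieval without reprocessing the whole document per chunk.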
Simon Willison explores the mechanics of HTTP streaming APIs from various large language model (LLM) providers, detailing how they function and providing practical examples of their usage. The primary focus is on the commonalities among these APIs, which use the `text/event-stream` content type to facilitate real-time data streaming. Each API sends data in blocks separated by double newlines, with each block containing a JSON line prefixed by `data:`. Notably, the EventSource API in browsers cannot be used directly with these APIs, since they typically employ POST requests instead of GET.

Willison provides a specific example using OpenAI's API, demonstrating how to send a prompt to the GPT-4o Mini model and request a streaming response. The command uses `curl` with the `--no-buffer` option to ensure that the output is displayed in real time as it is received. The response includes a series of data chunks, each representing part of the model's output, along with metadata about token usage.

He then examines the Anthropic Claude API, which also supports streaming responses. The example shows how to send a similar prompt and receive a structured response that includes event types such as `message_start`, `content_block_start`, and `content_block_delta`, indicating the progression of the response.

The Google Gemini API is discussed next, highlighting its tendency to return larger token chunks. Willison demonstrates this by prompting for a longer joke, which results in multiple parts being streamed back in succession.

In addition to the API examples, Willison shares code snippets for accessing these streaming responses using Python's HTTPX library and JavaScript's Fetch API. The Python example employs asynchronous programming to handle the streaming data, while the JavaScript example uses asynchronous iterators to process incoming events.
Overall, the exploration provides a comprehensive look at how different LLM APIs implement streaming, the structure of their responses, and practical coding examples for developers looking to integrate these capabilities into their applications.
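The shared wire format described above can be illustrated with a small parser: events separated by blank lines, each carrying a `data:` line whose payload is JSON. The sample stream below imitates OpenAI-style chunks (including the `[DONE]` end-of-stream sentinel); the exact field names vary by provider.

```python
# Sketch of parsing a text/event-stream (SSE) response offline.
# The sample stream imitates OpenAI-style delta chunks.

import json

def parse_sse(raw: str):
    """Yield the parsed JSON payload of each `data:` line in an SSE stream."""
    for block in raw.split("\n\n"):          # events are blank-line separated
        for line in block.splitlines():
            if line.startswith("data: "):
                payload = line[len("data: "):]
                if payload == "[DONE]":      # OpenAI's end-of-stream marker
                    return
                yield json.loads(payload)

stream = (
    'data: {"choices": [{"delta": {"content": "Hello"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": " world"}}]}\n\n'
    "data: [DONE]\n\n"
)
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse(stream))
# text == "Hello world"
```

A real client would read these blocks incrementally off the HTTP response body (e.g. with HTTPX's async iterators, as in Willison's examples) rather than from a pre-built string.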