• GitHub introduced a beta of code scanning autofix, which combines Copilot with CodeQL to detect and automatically remediate security vulnerabilities in JavaScript, TypeScript, Java, and Python.

  • GitHub developed “merge queue” to provide a streamlined, automated, and consistent method for deploying and merging code across its repositories. Merge queue's key features include dynamic grouping of pull requests, automatic conflict detection, and integration with GitHub Actions for testing. After a phased rollout, merge queue is now the primary way GitHub engineers ship code, leading to better deployment velocity and developer satisfaction.
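
    As an illustration of the batching idea only (GitHub has not published merge queue's internals), dynamic grouping and conflict isolation can be sketched as: test pull requests in groups, land a green group together, and bisect a red group to find the culprit. All names below are hypothetical:

    ```python
    from dataclasses import dataclass

    @dataclass
    class PullRequest:
        number: int
        breaks_build: bool = False  # hypothetical flag standing in for a real CI result

    def run_ci(batch):
        """Hypothetical stand-in for a GitHub Actions run against the merged batch."""
        return not any(pr.breaks_build for pr in batch)

    def process_queue(prs, batch_size=4):
        """Merge PRs in dynamic groups; bisect a failing group to isolate the culprit."""
        groups = [prs[i:i + batch_size] for i in range(0, len(prs), batch_size)]
        merged, rejected = [], []
        while groups:
            batch = groups.pop(0)
            if run_ci(batch):
                merged.extend(batch)                     # the whole group lands together
            elif len(batch) == 1:
                rejected.append(batch[0])                # a single failing PR leaves the queue
            else:
                mid = len(batch) // 2
                groups[:0] = [batch[:mid], batch[mid:]]  # split and retest both halves
        return merged, rejected

    merged, rejected = process_queue(
        [PullRequest(1), PullRequest(2, breaks_build=True), PullRequest(3)]
    )
    print([pr.number for pr in merged], [pr.number for pr in rejected])  # [1, 3] [2]
    ```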

  • GitHub Copilot analyzes the code in your editor to understand what you're working on, then sends the gathered context to a backend service that sanitizes the input, filtering out harmful and irrelevant content. The cleaned prompt is run through an OpenAI large language model, and the resulting suggestion is presented in your editor.
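
    A rough sketch of that gather-sanitize-complete flow (all function names here are hypothetical; the real service and model endpoints are not public):

    ```python
    import re

    def gather_context(buffer: str, cursor: int, window: int = 2000) -> str:
        """Collect code around the cursor, approximating what the editor sends."""
        return buffer[max(0, cursor - window):cursor + window]

    def sanitize(prompt: str) -> str:
        """Illustrative filter: redact obvious secrets before the prompt leaves the service."""
        return re.sub(r"(?i)(api[_-]?key|secret)\s*=\s*\S+", r"\1=<redacted>", prompt)

    def suggest(buffer: str, cursor: int, complete) -> str:
        """`complete` stands in for the call to the hosted OpenAI model."""
        return complete(sanitize(gather_context(buffer, cursor)))

    # Toy usage with a fake model that always proposes the same completion.
    code = "api_key=abc123\ndef add(a, b):\n    return"
    print(suggest(code, len(code), lambda prompt: " a + b"))
    ```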

  • To become a better developer in 2024, get better at using AI tools like GitHub Copilot, use shortcuts often, and focus on soft skills. It's best to use AI in a way that makes you more productive, for example, by automating boilerplate code and finding bugs early.

  • GitHub's comment file-upload feature is being abused to distribute malware that appears to come from trusted Microsoft repositories: files attached to a comment reportedly receive a github.com URL tied to the repository even if the comment is never posted, letting attackers host payloads under legitimate projects.

  • GitHub Copilot Extensions lets developers build and deploy to the cloud using natural language and their preferred tools and services, without leaving the IDE or GitHub.com. It helps developers stay in the flow longer, uplevel their skills, and innovate faster. Extensions are available on the GitHub Marketplace, and organizations can create private Copilot Extensions for their homegrown developer tooling.

  • Design system experts from Bumble, GitHub, and HP discuss leveraging Figma's new Code Connect feature to integrate design and code, highlighting the importance of a shared language and a seamless workflow. They emphasize continuous collaboration and the adoption of best practices to maximize the utility of design systems, ensuring consistency and reducing friction between designers and developers.

  • GitHub revamped its code push handling by moving from a single monolithic job to a decoupled architecture built on Apache Kafka. The original large job was broken into smaller, independent tasks split along dependency and ownership lines; targeted retries improved reliability, and parallel processing reduced latency.
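
    A minimal sketch of that fan-out pattern with the kafka-python client (topic name, payload, and handler are hypothetical):

    ```python
    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Publisher: the push handler emits one event instead of running one giant job.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode(),
    )
    producer.send("repo.pushes", {"repo": "octo/widgets", "ref": "refs/heads/main"})
    producer.flush()

    # Consumer: each downstream task (CI, webhooks, indexing, ...) subscribes
    # independently, so it can be retried or scaled without touching the others.
    consumer = KafkaConsumer(
        "repo.pushes",
        bootstrap_servers="localhost:9092",
        group_id="ci-trigger",  # one group_id per task gives each its own offset cursor
        value_deserializer=lambda b: json.loads(b.decode()),
    )
    for message in consumer:
        print("triggering CI for", message.value["repo"])  # task-specific work goes here
    ```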

  • Between 2018 and 2023, GitHub grew its ARR from $250 million to over $1 billion and its user base from 30 million developers to over 100 million. Its growth strategy includes an unwavering developer-centric positioning, expanding its platform with relevant and innovative solutions to its users' problems, and strong cultural values and community alignment. Despite initial criticism when Microsoft acquired the company, GitHub stuck to its own positioning and maintained a brand that developers trust.

  • GitHub is starting to decline in quality and is showing signs of becoming legacy software. A recent bug in the blame view, apparently introduced by a rewrite of the frontend in React, is cited as evidence of this decline; GitHub's priorities may have shifted.

  • Truffle Security disclosed a serious flaw in GitHub, which it calls Cross Fork Object Reference (CFOR): data from deleted and private repositories can be accessed by anyone. The issue stems from GitHub's repository network architecture, in which forks retain access to commit data even after the original repository is deleted or its visibility changes. This allows attackers to potentially access sensitive information such as API keys and private code.
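
    The practical consequence is easy to test: a commit SHA that was pushed to a now-deleted or now-private repository may still be served through a surviving repository in the same fork network. A minimal probe via the REST API (owner, repo, and SHA below are placeholders):

    ```python
    import requests

    owner, repo = "example-org", "example-repo"       # placeholder surviving repo
    sha = "0123456789abcdef0123456789abcdef01234567"  # placeholder commit id

    # A 200 response means the commit's contents are still reachable, even if
    # the repository it was pushed to has since been deleted or made private.
    resp = requests.get(f"https://api.github.com/repos/{owner}/{repo}/commits/{sha}")
    print(resp.status_code)
    ```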

  • GitHub's new Copilot Autofix is a feature that analyzes vulnerabilities in code and suggests fixes. It is enabled by default for customers on GitHub Enterprise Cloud and will be offered for free in pull requests on open source projects beginning in September. It covers dozens of classes of vulnerabilities, and developers can dismiss, edit, or comment on its suggestions in pull requests.

  • Scott Chacon, co-founder of GitHub, explains why GitHub became the dominant code hosting platform, attributing it to two factors: timing and taste. GitHub launched just as distributed open source version control tools like Git were becoming useful, solid, and widely adopted, and nobody else was around to host those projects. Its founders, all product-focused open source developers who cared about the developer experience, built a developer-centric platform with a user-friendly interface and workflow; competitors couldn't keep up because everyone else was building what they thought they could sell to advertisers or CTOs.

  • AI tools like GitHub Copilot are making programmers worse at programming. These tools can erode fundamental programming skills and create a false sense of expertise. Relying on them without a deep understanding of the code and the ability to problem-solve independently will make developers dependent on AI.

  • David Lord discusses the challenges of maintaining multiple libraries on GitHub, particularly the overwhelming volume of scheduled dependency updates. He actively manages around 20 libraries and has access to another 20, most of which are stable and see little activity. Scheduled dependency updates disrupted the calm of these projects, leading him to disable them in favor of a local update command.

    To keep a consistent development environment, Lord pins development dependencies across three ecosystems: Python requirements files via pip-compile, pre-commit hooks, and GitHub Actions in CI workflows. Each ecosystem can generate monthly update pull requests, which produces roughly 60 PR notifications at the start of each month. The flood overwhelms him and makes it hard for occasional contributors to spot relevant updates. Addressing each PR requires multiple steps that disrupt his workflow, and when tests fail he must context-switch into projects he hasn't touched for months. This busywork detracts from meaningful contributions and makes it difficult to distinguish real fixes and features from routine updates.

    While scheduled updates may suit applications with continuous deployment, Lord argues that libraries, which primarily serve as development environments, don't need constant attention to updates. He prefers to update manually, running pip-compile and pre-commit locally. Because there was no straightforward way to update GitHub Actions locally, he wrote gha-update, a tool that finds and updates action versions in workflow files. He now updates dependencies only when actively working on a project, which guarantees a stable environment when he returns after a hiatus; with this approach he updated all of his projects without the usual barrage of notifications.

    He drives these updates through a dedicated tox environment per tool, so they can be run individually or all at once, and he can use all-repos to apply the updates across every project, run tests, and create pull requests or push to the main branch on his own schedule. The result is a development process with less noise and more control.
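
    A stripped-down sketch of what such a local updater must do (this is not gha-update's actual implementation; the pinning style and API usage are illustrative):

    ```python
    import re
    from pathlib import Path

    import requests

    def latest_tag(action_repo: str) -> str:
        """Ask the GitHub API for the newest release tag of an action, e.g. actions/checkout."""
        resp = requests.get(f"https://api.github.com/repos/{action_repo}/releases/latest")
        resp.raise_for_status()
        return resp.json()["tag_name"]

    def update_workflows(workflow_dir: str = ".github/workflows") -> None:
        """Rewrite `uses: owner/action@ref` pins in workflow files to the newest release."""
        uses = re.compile(r"uses:\s*([\w.-]+/[\w.-]+)@(\S+)")
        for path in Path(workflow_dir).glob("*.yml"):
            text = path.read_text()
            for action_repo, old_ref in set(uses.findall(text)):
                new_ref = latest_tag(action_repo)
                if new_ref != old_ref:
                    text = text.replace(f"{action_repo}@{old_ref}", f"{action_repo}@{new_ref}")
            path.write_text(text)

    if __name__ == "__main__":
        update_workflows()  # run locally, on your own schedule, instead of via bot PRs
    ```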

  • A GitHub blog post by Nick Hengeveld covers the strategies and tools GitHub uses to improve availability through iterative simplification, emphasizing proactive monitoring so performance problems are found and fixed before they degrade the user experience.

    To manage growing demand, GitHub relies on Datadog for tracking metrics and performance patterns, Splunk for analyzing event context while troubleshooting, and custom monitors for catching slow database queries. The Scientist library lets the team test proposed changes against real traffic to confirm they improve performance before implementation, while Flipper gates new features behind controlled, gradual rollouts that can be monitored for issues.

    One example involves a SQL query behind the Command Palette feature that was timing out due to inefficient data retrieval. By reworking the query logic and validating the change with Scientist experiments, the team cut query timeouts by 80-90%; eliminating unnecessary queries and batching access checks produced further gains. The post also discusses removing unused code to head off performance degradation: by analyzing request data and identifying bottlenecks, the team simplified a frequently accessed endpoint, improving latency and giving users a more consistent experience.

    The key lessons: invest in observability so issues can be identified and resolved quickly, consider adjacent code for potential improvements, and make small, controlled changes whose impact can be monitored. Maintaining system performance is an ongoing effort that requires vigilance and a proactive approach to problem-solving.
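
    GitHub's Scientist library is written in Ruby, but the experiment pattern it implements is small enough to sketch: run the old and new code paths side by side, compare and time them, and always return the trusted result. A minimal, hypothetical Python version:

    ```python
    import time

    def experiment(name, control, candidate, publish=print):
        """Run both code paths; trust the control, measure and compare the candidate."""
        start = time.perf_counter()
        expected = control()
        control_ms = (time.perf_counter() - start) * 1000

        try:
            start = time.perf_counter()
            observed = candidate()
            candidate_ms = (time.perf_counter() - start) * 1000
            publish({"experiment": name, "match": observed == expected,
                     "control_ms": round(control_ms, 2),
                     "candidate_ms": round(candidate_ms, 2)})
        except Exception as exc:  # a broken candidate must never break the request
            publish({"experiment": name, "error": repr(exc)})

        return expected  # callers always get the battle-tested answer

    # Usage: compare an existing query path against its proposed replacement.
    rows = experiment("command-palette-query",
                      control=lambda: sorted([3, 1, 2]),
                      candidate=lambda: [1, 2, 3])
    ```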

  • DALDA (Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling) is a framework designed to enhance data augmentation in data-scarce settings. It uses a Large Language Model (LLM) and a Diffusion Model (DM) together to generate semantically rich images: the LLM embeds novel semantic information into text prompts while real images serve as visual prompts, keeping generated samples within the target distribution.

    Setup involves creating and activating a conda environment, installing the packages in the requirements file, and downloading specific models and datasets (Flowers102, Oxford Pets, and Caltech101), with commands provided for each step. Generating prompts with GPT-4 requires a configuration file containing an Azure endpoint and API key; once configured, prompts are generated by running a designated script. The repository also includes classifier-training scripts, with instructions for running them and a resume feature for ongoing training sessions.

    DALDA builds on code from DA-Fusion and integrates components from IP-Adapter, diffusers, and CLIP, complying with their respective licenses. The public repository has attracted stars and forks, indicating community interest, and the project sits within the broader context of synthetic data generation with diffusion models and large language models.
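
    As a rough illustration of the diffusion half of that pipeline (not DALDA's actual code), Hugging Face diffusers can condition generation on an LLM-written prompt plus a real image, with a varying guidance scale loosely standing in for DALDA's adaptive guidance scaling:

    ```python
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    real_image = Image.open("flower.jpg").convert("RGB").resize((512, 512))
    prompt = "a close-up photo of a rare alpine flower, dew on the petals"  # LLM-written in DALDA

    # DALDA adapts the guidance scale per sample; here we simply sweep it.
    for i, guidance in enumerate([5.0, 7.5, 10.0]):
        image = pipe(prompt=prompt, image=real_image, strength=0.6,
                     guidance_scale=guidance).images[0]
        image.save(f"augmented_{i}.png")
    ```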

  • The GitHub Repository Visualizer provides insight into various aspects of a GitHub repository. Users enter the repository owner and name, and the visualizer generates a detailed breakdown of the repository's contents.

    Key views include an interactive chart of the programming languages used, shown as percentages for a quick read on the project's stack; the top contributors by commit count, highlighting the collaborative nature of the project; a comparison of stargazers versus forks as a gauge of popularity and engagement; and the duration of the repository's activity, giving a sense of its longevity and ongoing development.

    Together these views make it easier to appreciate a repository's structure, contributions, and overall engagement within the GitHub community; the creator's website offers additional resources and tools.
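
    All of the data behind those views is available from GitHub's public REST API; a minimal sketch of the aggregation (unauthenticated, so subject to rate limits):

    ```python
    import requests

    def repo_overview(owner: str, repo: str) -> dict:
        base = f"https://api.github.com/repos/{owner}/{repo}"
        info = requests.get(base).json()
        languages = requests.get(f"{base}/languages").json()        # bytes per language
        contributors = requests.get(f"{base}/contributors").json()  # sorted by commits

        total = sum(languages.values()) or 1
        return {
            "languages": {k: round(100 * v / total, 1) for k, v in languages.items()},
            "top_contributors": [(c["login"], c["contributions"]) for c in contributors[:5]],
            "stargazers": info["stargazers_count"],
            "forks": info["forks_count"],
            "created_at": info["created_at"],  # basis for the activity-duration view
            "pushed_at": info["pushed_at"],
        }

    print(repo_overview("octocat", "Hello-World"))
    ```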

  • The "facad" repository, created by yellow-footed-honeyguide, is a modern, colorful directory listing tool for the command line. It enhances the traditional directory listing with emoji-based icons for files and directories, output sorted with directories first for more intuitive navigation, and support for symbolic links and executable files so different file types are easy to identify. It offers a compact grid display with customizable column widths and is Unicode-aware for compatibility with a wide range of characters and symbols.

    To build from source, users clone the repository, set up a build directory, and compile with the Meson build system and Ninja; facad can also be installed system-wide for easier access. Contributions via pull request are encouraged, with guidelines provided; the author, Sergey Veneckiy, is open to collaboration and can be contacted by email or through his GitHub profile. The project is licensed under the MIT License, allowing broad usage and modification. Overall, facad aims to be a visually appealing, functional alternative to traditional directory listing tools for users who work frequently in terminal environments.

  • GitHub announced a significant evolution of its Issues and Projects features, a major update to a core product dating back to the platform's launch in 2009. The update introduces several highly requested enhancements aimed at improving collaboration among software teams: sub-issues, issue types, and advanced search.

    Sub-issues let users nest issues under a parent issue, so teams can break complex tasks into manageable components, track progress, and see the remaining work; sub-issue status is visible within projects. Issue types provide a standardized way to classify and manage issues across all repositories in an organization, helping teams assess their bug backlog, identify high-level initiatives, and understand how work breaks down in a project. Advanced search supports complex queries using logical AND and OR operators plus parentheses for nesting, making it faster to find specific issues (an example query is sketched below).

    The issues UI has also been updated for speed and familiarity: a new filter bar with autocomplete and syntax highlighting, a quicker flow for creating multiple issues, better-organized issue forms and templates, easier sharing of issue links, and loading more events in long issues. GitHub also raised the item limit for Projects from 1,200 to 50,000, with support for slice by, swimlanes, and the GraphQL API, plus performance improvements based on user feedback. The new features are in public preview, and GitHub is inviting feedback to refine them further.

    Alongside the Issues update, GitHub Copilot, now available for both individual and business users, offers natural language code search, suggestions for resolving build failures, and summaries of discussions and pull requests, accessible on GitHub Mobile as well. GitHub Enterprise Cloud additionally gained support for the System for Cross-domain Identity Management (SCIM) specification for Enterprise Managed Users, letting administrators integrate their preferred identity systems, along with security improvements including a reduced personal access token scope and improved audit logs for SCIM events.
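
    For example, a nested query in the announced syntax, sent through the REST search endpoint (whether the API accepts the full nested syntax may depend on rollout; treat this as illustrative):

    ```python
    import requests

    # AND/OR plus parentheses, per the advanced search announcement.
    query = 'is:issue state:open (label:bug OR label:regression) AND milestone:"v2.0"'

    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": query},
        headers={"Accept": "application/vnd.github+json"},
    )
    for item in resp.json().get("items", []):
        print(item["number"], item["title"])
    ```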

  • The discussion around AI coding assistants like GitHub Copilot reveals a complex landscape of developer experiences and outcomes. While many developers say these tools boost their productivity, a study by Uplevel suggests the actual benefits may be minimal or even negative. The study analyzed roughly 800 developers over six months, comparing output before and after adopting Copilot, and found no significant improvement in key metrics such as pull request cycle time and throughput; alarmingly, Copilot use was associated with a 41% increase in bugs.

    The study also examined burnout: time worked outside standard hours decreased for both groups, but decreased more for developers not using Copilot. This suggests the tool may not relieve work pressure and could instead add a review burden, as developers spend more time scrutinizing AI-generated code. The authors had expected faster code merging and fewer defects; the contrary results prompted a reevaluation of how productivity is measured, and Uplevel acknowledges that while its metrics are valid, there may be other ways to assess developer output.

    Industry experience varies widely. Ivan Gekht, CEO of Gehtsoft USA, reports no substantial productivity gains, noting that understanding and debugging AI-generated code often takes more effort than rewriting it from scratch, and distinguishing simple coding tasks from the critical thinking and system design that software development demands. Conversely, Travis Rehl, CTO of Innovative Solutions, reports a two-to-threefold productivity increase, with projects completed in a fraction of the previous time, while cautioning that these tools should be viewed as supplements to human effort rather than replacements.

    Overall, the conversation reflects broader uncertainty about AI's role in software development. As the technology evolves, organizations are encouraged to remain vigilant and critical of AI-generated output, maintaining high standards of code quality and developer well-being.

  • The "RouterDC" repository, created by shuhao02, contains the code for "Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models." The public repository is organized into folders for datasets, evaluation scripts, training scripts, and utility functions.

    Training datasets are provided, along with instructions for building datasets from scratch: evaluate outputs from various language models using the specified evaluation harnesses, prepare datasets by merging the scores with the queries, and assign cluster IDs for the training split. Detailed training instructions live in the training-scripts folder; the model automatically evaluates its performance at predefined steps during training, and specific checkpoints can be evaluated manually with a provided script.

    For academic use, the repository provides a citation format including the title, authors, and the conference where the work will be presented. Overall, RouterDC offers both the code and the guidance needed to implement and experiment with routing queries among large language models.
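
    The core routing idea (encode the query, score it against one learned embedding per candidate LLM, dispatch to the best match) fits in a few lines of PyTorch. This is an illustrative toy, not RouterDC's actual encoder or its dual contrastive training:

    ```python
    import torch
    import torch.nn.functional as F

    class QueryRouter(torch.nn.Module):
        """Score a query embedding against one learned embedding per candidate LLM."""
        def __init__(self, dim: int, num_llms: int):
            super().__init__()
            self.llm_embeddings = torch.nn.Parameter(torch.randn(num_llms, dim))

        def forward(self, query_emb: torch.Tensor) -> torch.Tensor:
            # Cosine similarity between each query and each LLM embedding; contrastive
            # training pulls queries toward LLMs that answered them well (and away
            # from those that did not), which is the "dual contrastive" part.
            return F.cosine_similarity(query_emb.unsqueeze(1),
                                       self.llm_embeddings.unsqueeze(0), dim=-1)

    router = QueryRouter(dim=768, num_llms=4)
    scores = router(torch.randn(2, 768))  # a batch of two encoded queries
    print(scores.argmax(dim=-1))          # index of the chosen LLM for each query
    ```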

  • Eddie Aftandilian, a developer at GitHub who previously worked on Java at Google, discusses the implications of Hyrum's Law for software development, focusing on hash ordering. Hyrum's Law, articulated by engineer Hyrum Wright, states that with a sufficient number of users of an API, all observable behaviors of the system will be depended upon by somebody, regardless of what the API contract promises. Even with clear documentation warning against reliance on implementation-specific behavior, users depend on it inadvertently, which complicates seemingly straightforward large-scale migrations.

    His key example is hash table iteration order. Java's `HashMap` documentation guarantees no particular order for keys and values, but in practice the iteration order tends to stay stable over time, so users come to depend on it despite the lack of guarantees. A team upgrading Java versions may then hit numerous places where code relies on that order. Common failure patterns include order-dependent tests (JUnit historically did not specify test execution order, so a stable order invited fragile test cases) and over-specified assertions that expect elements in a particular order a method never guaranteed, letting tests pass for reasons that do not hold across implementations.

    Several remedies exist. Filing bugs against the teams whose code relies on incorrect assumptions does not fix the underlying issue; JUnit instead chose to specify test execution order to match Java's previous behavior, a pragmatic stopgap. A more robust approach is defensive randomization: eliminate the observable stability users depend on. The Java Development Kit (JDK) was modified to randomize hash iteration order, using an environment variable to set a random seed so that ordering is consistent within a single invocation but varies between invocations, a method inspired by practices in Python and Go.

    Aftandilian concludes that hash iteration order exemplifies Hyrum's Law: users will depend on stable behaviors regardless of documentation, so the most effective fix is to randomize the order and prevent incorrect assumptions from forming. While it can be frustrating to see others rely on undocumented behavior, the focus should be on building systems that minimize the opportunity for such mistakes. He also references related work: Python and Go implement similar randomization strategies, and research has addressed the broader problem of underspecified APIs.
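
    Python's version of that defense is observable in a few lines: string hashing is seeded per process (controlled by the PYTHONHASHSEED environment variable), so set iteration order for strings changes between invocations unless the seed is pinned:

    ```python
    # Run this script twice: the printed order usually differs between runs,
    # because Python randomizes its string hash seed per process. Setting
    # PYTHONHASHSEED=0 in the environment makes the order reproducible again.
    items = {"alpha", "bravo", "charlie", "delta"}
    print(list(items))
    ```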

  • The "zero" repository, maintained by nhh, is an experimental approach to modern frontend development without traditional frameworks; it has garnered 72 stars and 1 fork. The core idea is a set of types and functions that let JSX be transpiled directly into DOM nodes. Because Zero operates directly on the DOM, there are no framework updates to chase, making it a more stable and straightforward base for building web applications.

    Zero is deliberately not a full-fledged framework; it focuses on simplicity and direct interaction with the DOM. The creator argues that modern frameworks often serve developers' needs more than users', leading to unnecessary updates and complexity, and that Zero lets developers avoid these pitfalls with a more streamlined process.

    The repository includes example code showing DOM elements created with JSX syntax, dependency injection, and modern DOM APIs: defining components, managing state without reactivity, and performing asynchronous operations such as fetching data from an API. Under the hood, Zero consists of a few snippets and configurations that transpile JSX to JavaScript. The runtime code shows how elements are created and event listeners attached, and a Vite configuration file specifies how TypeScript and JSX files are handled, injecting the necessary functions into the main JavaScript file. A set of types connects JSX types with DOM types, including custom interfaces and type definitions for better type checking and autocompletion in development environments.

    Overall, Zero presents an innovative approach to frontend development, prioritizing simplicity and direct DOM interaction while keeping a developer-friendly experience through TypeScript and JSX integration.