Grab stores its data in a data lake and chooses the storage format according to how often the data changes. For high-throughput data, which is updated frequently, it uses Apache Avro with a Merge on Read (MOR) strategy: new data is appended to log files so that writes stay cheap, and the logs are compacted periodically so that reads stay manageable. For low-throughput data, which is updated infrequently, it uses Parquet with a Copy on Write (COW) strategy, in which each write creates a new version of the file.
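The trade-off between the two strategies can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions, not Grab's implementation: the class names `CopyOnWriteTable` and `MergeOnReadTable` are hypothetical, plain dictionaries stand in for Parquet base files, and a Python list stands in for the Avro log files.

```python
from dataclasses import dataclass, field


@dataclass
class CopyOnWriteTable:
    """Toy COW table: every write produces a full new file version."""
    versions: list = field(default_factory=lambda: [{}])

    def upsert(self, records: dict) -> None:
        # Copy the entire latest version, then apply the changes (costly write).
        new_version = dict(self.versions[-1])
        new_version.update(records)
        self.versions.append(new_version)

    def read(self) -> dict:
        # Reads are cheap: the latest version is already fully merged.
        return self.versions[-1]


@dataclass
class MergeOnReadTable:
    """Toy MOR table: writes append to a log, reads merge base and log."""
    base: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def upsert(self, records: dict) -> None:
        # Cheap write: append the delta without touching the base file.
        self.log.append(dict(records))

    def read(self) -> dict:
        # Costlier read: replay every logged delta on top of the base.
        merged = dict(self.base)
        for delta in self.log:
            merged.update(delta)
        return merged

    def compact(self) -> None:
        # Periodic compaction folds the log into a new base file.
        self.base = self.read()
        self.log.clear()


cow = CopyOnWriteTable()
mor = MergeOnReadTable()
batches = [
    {"ride-1": "requested"},
    {"ride-1": "completed", "ride-2": "requested"},
]
for batch in batches:
    cow.upsert(batch)
    mor.upsert(batch)
```

In this model both tables return the same data from `read()`, but the COW table has rewritten its full contents on every write, while the MOR table holds two pending log entries until `compact()` is called. That mirrors why append-heavy, frequently updated data suits MOR and rarely updated data suits COW.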
Tuesday, June 4, 2024