I recently had the pleasure of having to analyse multi-gigabyte JSON dumps for a project. JSON itself is a rather pleasant format to consume: it’s human-readable and there is a lot of tooling available for it. jq lets you express sophisticated processing steps in a single command line, and Jupyter with Python and Pandas makes interactive analysis easy, so you can quickly find what you’re looking for.
However, with multi-gigabyte files, analysis becomes quite a lot more difficult.
Interesting. I had the pleasure of comparing very large, complex JSON files when I was a NASA contractor. I ended up writing a somewhat custom JSON crawler in Node JS that would first parse the entire ~500 MB JSON files into JS objects, then use asynchronous programming to “crawl” through the objects, comparing each and every value for every property or array index and writing the differences to a text file, including the actual path to each value. You’d get something like “object.array1.velocity”, though a lot of the time the path was waaay longer than that.
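The original crawler was Node JS and is not shown here, but the crawl-and-compare idea can be sketched in a few lines of Python (the `diff` function and its output format are my own illustration, not the contractor’s actual code):

```python
def diff(a, b, path="object", out=None):
    """Recursively walk two parsed JSON values and record a
    dotted path for every difference found between them."""
    if out is None:
        out = []
    if isinstance(a, dict) and isinstance(b, dict):
        # Compare the union of keys so additions and removals show up too.
        for key in sorted(set(a) | set(b)):
            if key not in a:
                out.append(f"{path}.{key}: missing on left")
            elif key not in b:
                out.append(f"{path}.{key}: missing on right")
            else:
                diff(a[key], b[key], f"{path}.{key}", out)
    elif isinstance(a, list) and isinstance(b, list):
        # Walk to the longer length so extra trailing items are reported.
        for i in range(max(len(a), len(b))):
            if i >= len(a):
                out.append(f"{path}.{i}: missing on left")
            elif i >= len(b):
                out.append(f"{path}.{i}: missing on right")
            else:
                diff(a[i], b[i], f"{path}.{i}", out)
    elif a != b:
        # Leaf values (or mismatched types) that differ.
        out.append(f"{path}: {a!r} != {b!r}")
    return out


if __name__ == "__main__":
    old = {"array1": [{"velocity": 1.2}]}
    new = {"array1": [{"velocity": 1.3}]}
    for line in diff(old, new):
        print(line)  # e.g. object.array1.0.velocity: 1.2 != 1.3
```

Like the crawler described above, this reports a full path per difference; for truly huge files you would stream the output to a file rather than collect it in memory, and the recursion would need to be converted to an explicit stack to avoid hitting recursion limits on deeply nested documents.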