cross-posted from: https://lemmy.dbzer0.com/post/46446
This started out as a comment, but then I decided to make it a post so people can find it more easily =)
Thanks to @InternetPirate@lemmy.fmhy.ml for finding the link at https://the-eye.eu/redarcs (comment: https://lemmy.dbzer0.com/comment/129402).

There are 19,980 subreddits archived at the-eye. To download, install, and view them on Linux, do this:
Download archives:

    wget https://the-eye.eu/redarcs/files/Piracy_submissions.zst  # size: 42 MB
    wget https://the-eye.eu/redarcs/files/Piracy_comments.zst     # size: 208 MB

I also recommend downloading the index page for faster offline viewing:

    curl -A Firefox https://the-eye.eu/redarcs/ -o redarcs.html

To extract all the links from the index page:

    grep -oE "href='[^']*'" redarcs.html | cut -d\' -f2

Then just drag and drop redarcs.html into Firefox or Chrome to view it.
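
If you want to check the link extraction before running it on the real page, here is a minimal sketch on a made-up sample. The `extract_links` function name and the relative `files/...` hrefs are my own assumptions for illustration; the real index page may format its links differently. The pattern uses `[^']*` because `grep -E` has no lazy quantifier, so this is what makes each link on a shared line come out separately.

```shell
# Extract every single-quoted href value from HTML on stdin, one per line.
# [^']* stops at the first closing quote, so several links on one line
# are reported as separate matches.
extract_links() {
    grep -oE "href='[^']*'" | cut -d\' -f2
}

# Hypothetical two-link sample standing in for redarcs.html
sample="<a href='files/Piracy_submissions.zst'>submissions</a> <a href='files/Piracy_comments.zst'>comments</a>"
printf '%s\n' "$sample" | extract_links
```

With the real file this would be something like `extract_links < redarcs.html | grep -i piracy` to narrow the list down to one subreddit.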
You can also save them to a text file:

    grep -oE "href='[^']*'" redarcs.html | cut -d\' -f2 >> links.txt

Install the zstd package:
- Arch:

      pacman -S zstd

- Ubuntu:

      apt install zstd

Extract files:

    zstd -d Piracy_submissions.zst  # extracted size: 593 MB
    zstd -d Piracy_comments.zst     # extracted size: 2.4 GB

View files with head/tail/grep:

    head -10 Piracy_submissions
    tail -10 Piracy_submissions
    grep "word" Piracy_submissions

Note: the format seems to be JSON. You can use the jq tool for this:

    jq -r "." Piracy_submissions

or

    jq -r ".title" Piracy_submissions
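
Building on the jq commands above, here is a rough sketch for searching titles by keyword. It assumes each line of the file is one JSON object with a string `title` field (that is what the output looked like to me, but I have not checked every record). The `titles_matching` name and the sample records are made up for the example.

```shell
# Print titles that match a keyword (case-insensitive) from
# newline-delimited JSON on stdin. The keyword is treated as a regex by jq.
# Records without a "title" field are skipped via the `// ""` fallback.
titles_matching() {
    jq -r --arg kw "$1" 'select((.title // "") | test($kw; "i")) | .title'
}

# Hypothetical sample records standing in for Piracy_submissions
printf '%s\n' \
    '{"title":"Best torrent client?"}' \
    '{"title":"Weekly megathread"}' \
    | titles_matching "torrent"
```

On the real data you could skip the extraction step and stream straight from the archive, for example `zstd -dc Piracy_submissions.zst | titles_matching "vpn"`, which avoids writing the 593 MB extracted file to disk.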
