cross-posted from: https://lemmy.dbzer0.com/post/46446
This started as a comment, but then I decided to make it a post so people can find it more easily =)
Thanks to @InternetPirate@lemmy.fmhy.ml for finding the link at https://the-eye.eu/redarcs (comment: https://lemmy.dbzer0.com/comment/129402).
There are 19,980 subreddits archived at the-eye. To download, install, and view them on Linux, do the following:
Download archives:
wget https://the-eye.eu/redarcs/files/Piracy_submissions.zst
(size: 42 MB)
wget https://the-eye.eu/redarcs/files/Piracy_comments.zst
(size: 208 MB)
I also recommend downloading the index page for faster offline viewing:
curl -A Firefox https://the-eye.eu/redarcs/ -o redarcs.html
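If you want a different subreddit, the two archive links above suggest a simple naming pattern. Here is a minimal sketch that builds both URLs for a given name. The function name redarcs_urls is made up for this example, and the URL pattern is an assumption based only on the two Piracy links, so check the index page if a download fails.

```shell
#!/bin/sh
# Print the two archive URLs for one subreddit.
# ASSUMPTION: every subreddit follows the <name>_submissions.zst /
# <name>_comments.zst pattern seen in the Piracy links above.
# redarcs_urls is a hypothetical helper name, not part of any tool.
redarcs_urls() {
    sub="$1"
    base="https://the-eye.eu/redarcs/files"
    for kind in submissions comments; do
        printf '%s/%s_%s.zst\n' "$base" "$sub" "$kind"
    done
}

# Usage (needs network access): fetch both files, resuming partial downloads.
# redarcs_urls Piracy | wget -c -i -
```

wget reads the URL list from stdin with -i -, and -c lets you resume if a large download gets interrupted.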
To extract all the links from the index page:
grep -oE "href='[^']*'" redarcs.html | cut -d\' -f2
Then just drag and drop redarcs.html into Firefox or Chrome to view it.
You can also save the links to a text file:
grep -oE "href='[^']*'" redarcs.html | cut -d\' -f2 > links.txt
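One thing to watch with link extraction: grep -E has no non-greedy .*? operator, so when several links sit on one line a greedy pattern can swallow them into a single match. A pattern like [^']* stops at the first closing quote instead. A small sketch on a made-up line of HTML (the real index page may be laid out differently):

```shell
#!/bin/sh
# Made-up stand-in for one line of redarcs.html: two links on the same line.
sample="<a href='files/Piracy_submissions.zst'>s</a> <a href='files/Piracy_comments.zst'>c</a>"

# [^']* matches everything up to the first closing quote,
# so each href is reported as its own match by grep -o.
links="$(printf '%s\n' "$sample" | grep -oE "href='[^']*'" | cut -d\' -f2)"
printf '%s\n' "$links"
```

This prints each link on its own line, which is the shape you want in links.txt.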
Install zstd package:
- Arch:
pacman -S zstd
- Ubuntu:
apt install zstd
Extract files:
zstd -d Piracy_submissions.zst
(extracted size: 593 MB)
zstd -d Piracy_comments.zst
(extracted size: 2.4 GB)
View files with head/tail/grep:
head -10 Piracy_submissions
tail -10 Piracy_submissions
grep "word" Piracy_submissions
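Since the extracted files are big, a few extra grep flags come in handy. This is just a sketch on a tiny made-up sample file; the field names and values are assumptions, not real data from the archive.

```shell
#!/bin/sh
# Made-up sample with one JSON object per line (field names are assumptions).
tmp="$(mktemp)"
cat > "$tmp" <<'EOF'
{"title":"Best VPN for torrenting?","score":12}
{"title":"Megathread update","score":40}
{"title":"vpn keeps disconnecting","score":3}
EOF

# -i ignores case, -c prints the number of matching lines instead of the lines.
matches="$(grep -ic "vpn" "$tmp")"
echo "$matches"

# -m 1 stops after the first match, which saves time on multi-GB files.
first="$(grep -i -m 1 "vpn" "$tmp")"
echo "$first"
rm -f "$tmp"
```

On the real files you would run the same flags directly, for example grep -ic "word" Piracy_comments.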
Note: the files seem to be JSON, one object per line. You can use the jq tool for this.
jq -r "." Piracy_submissions
or
jq -r ".title" Piracy_submissions
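jq can also filter and reshape records, not just print one field. Here is a rough sketch on two made-up records; the field names (title, score, author) are assumptions about what the dumps contain, so check a real line with head -1 first.

```shell
#!/bin/sh
# Made-up sample records; the field names (title, score, author) are assumptions.
sample='{"title":"Megathread update","score":40,"author":"mod1"}
{"title":"vpn keeps disconnecting","score":3,"author":"user2"}'

if command -v jq >/dev/null 2>&1; then
    # Keep records with score >= 10 and print "score<TAB>title" for each.
    top="$(printf '%s\n' "$sample" | jq -r 'select(.score >= 10) | [.score, .title] | @tsv')"
    printf '%s\n' "$top"
else
    echo "jq is not installed" >&2
fi
```

Against the real file the same filter would be jq -r 'select(.score >= 10) | [.score, .title] | @tsv' Piracy_submissions. Single quotes around the filter keep the shell from touching anything inside it.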