tjr@innernet.link

cross-posted from: https://lemmy.dbzer0.com/post/46446

This started out as a comment, but then I decided to make it a post so people can find it more easily =)

Thanks to @InternetPirate@lemmy.fmhy.ml for finding the link https://the-eye.eu/redarcs (see the comment at https://lemmy.dbzer0.com/comment/129402).

There are 19,980 subreddits archived at the-eye. To download, install, and view them on Linux, do the following:

Download archives:

wget https://the-eye.eu/redarcs/files/Piracy_submissions.zst   # size: 42 MB

wget https://the-eye.eu/redarcs/files/Piracy_comments.zst   # size: 208 MB
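
If you want a different subreddit, the two downloads can be wrapped in a small helper. This is only a sketch: the function name redarcs_urls is made up here, and the URL pattern is assumed from the two Piracy links above, not confirmed for every subreddit on the index.

```shell
# Hypothetical helper: print the two archive URLs for one subreddit.
# Assumes the <Name>_submissions.zst / <Name>_comments.zst pattern seen in
# the Piracy links holds for other subreddits too (not confirmed).
redarcs_urls() {
    sub="$1"
    for kind in submissions comments; do
        printf 'https://the-eye.eu/redarcs/files/%s_%s.zst\n' "$sub" "$kind"
    done
}

# Print the URLs first so nothing is downloaded by accident.
redarcs_urls Piracy

# To actually download (wget -c resumes a partially downloaded file):
# redarcs_urls Piracy | xargs -n 1 wget -c
```

Printing the URLs before piping them into wget makes it easy to check the name (capitalization probably matters) before pulling a few hundred MB.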

I also recommend downloading the index page for faster offline viewing.

curl -A Firefox https://the-eye.eu/redarcs/ -o redarcs.html

To extract all the links from the index page, run:

grep -oE "href='[^']*'" redarcs.html | cut -d\' -f2

To view the page itself, just drag and drop redarcs.html into Firefox or Chrome.

You can also save the links to a text file:

grep -oE "href='[^']*'" redarcs.html | cut -d\' -f2 > links.txt
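
One thing to watch: grep -E has no lazy `.*?`, so a `[^']*` pattern is the safer way to stop at the closing quote when several links share one line. Here is a small sketch that also keeps only the .zst links. The sample line and the helper name extract_zst_links are made up for illustration, and the real index page markup may look different.

```shell
# Made-up sample line; the real index page markup may look different.
sample="<a href='/redarcs/files/Piracy_submissions.zst'>submissions</a> <a href='/redarcs/files/Piracy_comments.zst'>comments</a> <a href='/about'>about</a>"

# Hypothetical helper: read HTML on stdin, print only the .zst link targets.
extract_zst_links() {
    grep -oE "href='[^']*'" | cut -d\' -f2 | grep '\.zst$'
}

# Prints the two .zst paths, one per line, and skips /about.
printf '%s\n' "$sample" | extract_zst_links
```

On the real page the call would be something like `extract_zst_links < redarcs.html > links.txt`.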

Install the zstd package:

Arch: pacman -S zstd

Ubuntu: apt install zstd

Extract files:

zstd -d Piracy_submissions.zst   # extracted size: 593 MB

zstd -d Piracy_comments.zst   # extracted size: 2.4 GB
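
If you are short on disk space, you can also peek into an archive without extracting it, since zstd -dc writes the decompressed data to stdout. A minimal sketch; the helper name peek_zst is made up here.

```shell
# Hypothetical helper: print the first N lines (default 5) of a .zst archive
# without writing the decompressed file to disk.
peek_zst() {
    zstd -dc "$1" | head -n "${2:-5}"
}

# Example call, only run if the archive from the download step is present.
if [ -f Piracy_submissions.zst ]; then
    peek_zst Piracy_submissions.zst 3
fi
```

If zstd complains that the window size is too large, `zstd -d --long=31` is worth a try. Whether these particular archives need it is an assumption, not something the steps above confirm.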

View files with head/tail/grep:

head -n 10 Piracy_submissions   # first 10 entries

tail -n 10 Piracy_submissions   # last 10 entries

grep "word" Piracy_submissions   # entries containing "word"
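
Since the extracted files are large, a plain grep for a common word can flood the terminal. A rough sketch of two helpers that limit or count the matches instead. The names search_first and count_matches and the search word "vpn" are made up for illustration.

```shell
# Hypothetical helper: case-insensitive search that stops after N matching
# lines (default 5). Arguments: pattern, file, optional limit.
search_first() {
    grep -i -m "${3:-5}" -- "$1" "$2"
}

# Hypothetical helper: count matching lines instead of printing them.
count_matches() {
    grep -i -c -- "$1" "$2"
}

# Example calls, only run if the extracted file is present.
if [ -f Piracy_submissions ]; then
    search_first "vpn" Piracy_submissions 3
    count_matches "vpn" Piracy_submissions
fi
```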

Note: the files appear to be JSON. You can use the jq tool to read them.

jq -r "." Piracy_submissions

or

jq -r ".title" Piracy_submissions
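
jq can also filter entries before printing them. A minimal sketch, assuming the files hold one JSON object per line: "title" is the field used above, while the "score" field, the sample data, and the helper name titles_min_score are assumptions made for illustration.

```shell
# Made-up sample in the assumed one-object-per-line layout.
# "title" is the field used above; "score" is an assumption about the schema.
sample='{"title":"First post","score":12}
{"title":"Second post","score":3}'

# Hypothetical helper: read entries on stdin and print the titles of those
# whose score is at least the number given as $1.
titles_min_score() {
    jq -r --argjson min "$1" 'select(.score >= $min) | .title'
}

# Prints: First post
printf '%s\n' "$sample" | titles_min_score 10
```

On the real file the call would be something like `titles_min_score 100 < Piracy_submissions`, provided the field names match.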

Reddit Archives of Sub-Reddits on The-Eye
