• arotrios@lemmy.world
    English
    arrow-up
    17
    ·
    edit-2
    7 months ago

    Thanks for the list. Unfortunately, they list “Fediverse” which likely means they’re scraping ActivityPub. They’re also going after your Steam account, Twitch, YouTube, and porn.

    In other words, this is so much worse than the headline makes it out to be.

    Surprisingly, Reddit is NOT on the list.

    EDIT: I was wrong - thanks to Da Cap’n for the correction.

    Here’s the full list of names:

    4chan Archives

    Discord Archives

    21Buttons

    500px

    about.me

    AllMyLinks

    AllTrails

    Amazon

    Ameba

    Amino

    AnimePlanet

    Apple Music

    Artists&Clients

    Asciinema

    AudioJungle

    AudiUSA

    BabyCenter

    Baidu

    BeReal

    Bigo Live

    Bing

    Biolink

    BitChute

    BlackPlanet

    Blogger

    Bluesky

    Bodybuilding

    BookCrossing

    Breaches

    BuyMeACoffee

    Cash App

    CastingCall Club

    Chaturbate

    Chess.com

    Cigar Dojo

    CityXGuide

    CloutHub

    Cocolog

    Companies House

    Cozy.tv

    Cracked

    Creema

    Dailymotion

    Danbooru

    Dark Web

    DeepL

    DeviantArt

    Disqus

    DLive

    Dot.cards

    Douyin

    Drum

    DuckDuckGo

    Duolingo

    E621

    eBay

    Eporner

    Etsy

    Facebook

    Fansly

    FastPeopleSearch

    Fediverse (likely ActivityPub - possibly DMs between servers)

    FetLife

    Fiverr

    Flickr

    FlightAware

    Foursquare

    FriendFinder

    FurAffinity

    Gab

    Gaia Online

    GameFAQs

    Gelbooru

    GeneralMotors

    Geocaching

    GeoEstimation

    Gettr

    Giphy

    GitHub

    Glassdoor

    GoFundMe

    Goo

    Google

    Goodreads

    Gravatar

    Guancha

    GunBroker

    Habbo

    Hackaday

    Hatena

    Honda

    Hubski

    ILoveGrowingMarijuana

    ImageShack

    Imgur

    IMVU

    Indeed

    Instagram

    Instructables

    JudyRecords

    Jugem

    JustForFans

    Keybase

    Kick

    Kik

    Last.fm

    LibraryThing

    Lichess

    Likee

    Line

    LinkedIn

    Linktree

    LiveIn

    LiveJournal

    Lobsters

    Mail.ru

    Malgari

    MapMyTracks

    Marshmallow

    MarTech

    Massage Anywhere

    Medium

    MeetMe

    Mercari Jp

    MeWe

    Minds

    Minecraft

    Mix

    Mixlr

    ModDB

    Mughosts

    MyFitnessPal

    Myspace

    MySubaru

    Naijapals

    Nextdoor

    NissanUSA

    Odysee

    OFAC Sanctions List

    OkCupid

    OK.ru

    OnlyFans

    Pandia

    Pandora

    Passes

    Pastebin

    Patreon

    PayPal

    PCGamer

    Peloton

    PGP

    Pinterest

    Plurk

    Poal

    Popl

    Pornhub

    Poshmark

    Product Hunt

    ProtonMail

    PSNProfiles

    Reblogme

    Reddit

    RedGifs

    Replit

    ReverbNation

    Roblox

    Rule34.xxx

    Rumble

    Rutube

    ScoutWiki

    Seesaa

    Seneporno

    Signal

    SkipTheGames

    Skype

    SlideShare

    Snapchat

    Sogou

    SoundCloud

    SourceForge

    Spiceworks

    Spotify

    Sprashivai

    Steam (fuck off you fucking fucks)

    StellantisEU

    StellantisUSA

    Strava

    Stripchat

    Substack

    TechNet

    Telegram

    Tellows

    Tesseract OCR

    Threads

    TikTok

    Tinder

    TinEye

    ToyotaUSA

    Trakt

    Triller

    TripAdvisor

    TrueCaller

    TruthSocial

    Tumblr

    Twilio

    Twitch

    Twitter

    Untappd

    Venmo

    VidLii

    Vimeo

    Vine

    VirusTotal

    VK

    Volkswagen

    VSCO

    WatchMeMore

    Weibo

    WhatsApp

    Wire

    Wordfeud

    Xbox

    xHamster

    XING

    XVideos

    Yahoo

    Yandex

    Yappy

    YCombinator

    Yelp

    YouTube

    Zhihu

    Zillow

    ZoneH

    • dubyakay@lemmy.ca
      arrow-up
      8
      ·
      8 months ago

      Reddit is right there in your list.

      Also:

      Gaia Online

      Thanks. Brings back memories.

      • drascus@sh.itjust.worksOP
        arrow-up
        5
        ·
        edit-2
        7 months ago

        Probably just whatever the public metadata is. Metadata is super powerful, especially if you have a lot of it. If the email was ProtonMail to ProtonMail, they will get nothing. If it’s Gmail to ProtonMail, they will know that user X is talking to user Y on Gmail. They will also have the email header information, which is basically just going to be clear text. So they can still ascertain who you know, who you are talking about, and maybe a bit about what the conversation has to do with.

        EDIT: So I asked ProtonMail directly about it, and they confirmed it’s only publicly available information that they can get. For instance, they can try to verify whether a certain email address exists. However, Proton told me that they actively watch for this kind of thing and block IPs trying to do this sort of monitoring.
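
        To make the header point concrete, here is a minimal sketch using Python's standard `email` module. The addresses, subject, and message are made up for illustration, and it assumes the common setup where only the body is encrypted and the headers travel alongside it in clear text.

```python
from email import message_from_string

# Hypothetical message: the body is encrypted, the headers are not.
RAW = """\
From: alice@example.com
To: bob@example.org
Subject: Re: meeting on Friday
Date: Mon, 06 Jan 2025 10:15:00 +0000

-----BEGIN PGP MESSAGE-----
(ciphertext elided)
-----END PGP MESSAGE-----
"""

def visible_metadata(raw):
    """Return the header fields readable without decrypting the body."""
    msg = message_from_string(raw)
    wanted = ("From", "To", "Subject", "Date")
    return {name: msg[name] for name in wanted if msg[name] is not None}

meta = visible_metadata(RAW)
for name, value in meta.items():
    print(f"{name}: {value}")
```

        Even without the body, whoever holds one copy of the message can see who wrote to whom, when, and under what subject line.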

        • EveryMuffinIsNowEncrypted@lemmy.blahaj.zone
          English
          arrow-up
          3
          ·
          7 months ago

          Oof, yeah I forgot about the metadata… What you say is certainly true and is worrisome.

          Plus, most people who use email don’t use encrypted email so even if they can’t get a transcript of a conversation from my account, they can certainly get everything from the other account if they also scrape that platform.

    • davel [he/him]@lemmy.ml
      English
      arrow-up
      6
      arrow-down
      1
      ·
      8 months ago

      Surprisingly, Reddit is NOT on the list.

      If they’re slurping all these other sites, I highly doubt they’re not slurping Reddit, too, even if it’s not on the list.

      Fediverse (likely ActivityPub - possibly DMs between servers)

      They would have to hack the individual servers to get at the DMs, because they’re encrypted in transit. All the public stuff is trivial to scrape.

      • arotrios@lemmy.world
        English
        arrow-up
        2
        ·
        8 months ago

        They would have to hack the individual servers to get at the DMs, because they’re encrypted in transit. All the public stuff is trivial to scrape.

        Nope, ActivityPub DMs are not encrypted between servers. If it’s on the feed, it’s public, or at least it was as of six months ago. I found this out when I attached a WordPress site to a Mastodon instance and suddenly found I could read anyone’s DMs to users on other servers. Totally unencrypted. I actually paused development and working with ActivityPub because of it.

        This doesn’t mean that messages to users on the same server are necessarily exposed, but the potential is there if you don’t have a filter for local publishing only engaged on your Mastodon instance.

        • davel [he/him]@lemmy.ml
          English
          arrow-up
          5
          ·
          8 months ago

          ActivityPub DMs are not encrypted between servers

          They are, insofar as TLS/SSL/HTTPS encryption is used in transit. That’s what I mean by encrypted in transit.

          i could read anyone’s DMs to users on other servers

          If you’re an administrator for (WordPress) ActivityPub server A, you can see all the DMs coming to and leaving from your server, yes. And they’re not encrypted at rest, so you can read them any time. But how would you see DMs going between server B and server C, when your server isn’t involved in the transaction?

          • arotrios@lemmy.world
            English
            arrow-up
            3
            ·
            edit-2
            8 months ago

            It apparently scrapes everything on the public feed. So when I subscribed to users on Mastodon server A from WordPress, DMs from Mastodon server A going to Mastodon server B became visible.

            I had a separate account on Mastodon server A to confirm that I couldn’t see these DMs as a Mastodon user on server A, and that the WordPress scrape was grabbing messages normally not meant for public view.

            This was using the ActivityPub plugin for WordPress about six months ago.

            EDIT: I should be clear that I was as surprised as the other commenters that the DMs weren’t encrypted and that I could see them at all through third-party software. I did NOT see DMs between local users, only cross-instance ones.
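
            For context on why this can happen at all: in ActivityPub a "DM" is an ordinary object whose audience is narrowed through its `to`/`cc` fields, so keeping it private depends on every receiving implementation honoring that addressing. Below is a rough sketch of the kind of check a receiver might make. The function name, labels, and sample URLs are hypothetical, and this is not how any particular plugin is actually written.

```python
# The special "public" collection defined by ActivityPub, plus the compact
# spellings that plain-JSON implementations may also encounter.
PUBLIC_IDS = {
    "https://www.w3.org/ns/activitystreams#Public",
    "as:Public",
    "Public",
}

def _as_list(value):
    """Normalize a missing, single, or list-valued field to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)

def classify_visibility(obj, followers_url=None):
    """Guess the intended audience of an object from its addressing."""
    to = _as_list(obj.get("to"))
    cc = _as_list(obj.get("cc"))
    if PUBLIC_IDS.intersection(to):
        return "public"
    if PUBLIC_IDS.intersection(cc):
        return "unlisted"
    if followers_url is not None and followers_url in to + cc:
        return "followers-only"
    # Addressed only to specific actors: what users think of as a DM.
    return "direct"

# Hypothetical note addressed to a single actor on another server.
dm = {"type": "Note", "to": ["https://b.example/users/bob"], "content": "hi"}
print(classify_visibility(dm))
```

            A receiver that skips a check like this and displays everything it is handed would show "direct" notes next to public ones, which would be consistent with what I saw, though I can't confirm that was the cause.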

      • drascus@sh.itjust.worksOP
        arrow-up
        3
        ·
        8 months ago

        DuckDuckGo is not the problem. They are using publicly scrapable information. So, for instance, if they have fingerprinted your device and they see you go to DuckDuckGo, then see you access a site about buying guns, it becomes trivial to determine what you searched for. They would not have direct access to what you search on DuckDuckGo, and DuckDuckGo is not giving them access. They are using various methods to collect data based on habits. You can use literally any service you want and they could do the same thing.
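
        As a toy illustration of that inference (all fingerprints, timestamps, and domains below are invented, and real tracking is far messier), simply ordering observed visits by time per device is enough to suggest what a search was about:

```python
from datetime import datetime, timedelta

# Hypothetical observations: (device_fingerprint, timestamp, domain).
EVENTS = [
    ("fp-1", datetime(2025, 1, 6, 10, 0, 0), "duckduckgo.com"),
    ("fp-1", datetime(2025, 1, 6, 10, 0, 40), "gun-shop.example"),
    ("fp-2", datetime(2025, 1, 6, 10, 1, 0), "news.example"),
    ("fp-1", datetime(2025, 1, 6, 11, 30, 0), "weather.example"),
]

def visits_after_search(events, search_domain, window=timedelta(minutes=2)):
    """Map each fingerprint to the domains it visited shortly after a search."""
    last_search = {}  # fingerprint -> time of most recent search-engine visit
    inferred = {}
    for fp, ts, domain in sorted(events, key=lambda e: e[1]):
        if domain == search_domain:
            last_search[fp] = ts
        elif fp in last_search and ts - last_search[fp] <= window:
            inferred.setdefault(fp, []).append(domain)
    return inferred

result = visits_after_search(EVENTS, "duckduckgo.com")
print(result)
```

        Nothing here touches the search engine itself. The query is guessed purely from what the same device did next.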

        • RvTV95XBeo@sh.itjust.works
          arrow-up
          1
          ·
          7 months ago

          If that’s true, why bother “monitoring” a search engine? This whole list screams that somebody who knows nothing about tech put out a vague RFP, and a contractor pulled a list of “top sites” and used it to justify an egregious proposal cost.

          DOGE, if you’re looking for waste and fraud, perhaps here’s a good source.

          • drascus@sh.itjust.worksOP
            arrow-up
            1
            ·
            7 months ago

            They do it all to build up a huge web of interconnected data points. DuckDuckGo itself they might take as evidence that someone is trying to hide something. Then the government goes to a FISA court and gets permission to have other tech companies hand over all your data. It’s not any one site; it’s the picture that can be gleaned from all the data available across all the sites.

    • Euphoma@lemmy.ml
      English
      arrow-up
      1
      ·
      7 months ago

      Why tf are PGP, Tesseract OCR, and DeepL on this list? DeepL is literally just a machine translation service; users don’t post onto it. Tesseract OCR is downloadable software for OCR. PGP is encryption.