• davel [he/him]@lemmy.ml
    English
    arrow-up
    25
    ·
    8 months ago

    Paywall bypass: http://archive.today/2025.03.12-170136/https://www.404media.co/the-200-sites-an-ice-surveillance-contractor-is-monitoring/

    The list: https://archive.ph/o/Lldzh/https://docs.google.com/spreadsheets/d/1VyAaJaWCutyJyMiTXuDH4D_HHefoYxnbGL9l02kyCus/edit?usp=sharing&ref=404media.co

    It doesn’t appear to have any fediverse instances, unless you want to count Threads. It does have ProtonMail & Signal; I wonder what that actually means.

    • arotrios@lemmy.world
      English
      arrow-up
      17
      ·
      edit-2
      7 months ago

      Thanks for the list. Unfortunately, they list “Fediverse” which likely means they’re scraping ActivityPub. They’re also going after your Steam account, Twitch, YouTube, and porn.

      In other words, this is so much worse than the headline makes it out to be.

      Surprisingly, Reddit is NOT on the list.

      EDIT: I was wrong - thanks to Da Cap’n for the correction.

      Here’s the full list of names:

      4chan Archives

      Discord Archives

      21Buttons

      500px

      about.me

      AllMyLinks

      AllTrails

      Amazon

      Ameba

      Amino

      AnimePlanet

      Apple Music

      Artists&Clients

      Asciinema

      AudioJungle

      AudiUSA

      BabyCenter

      Baidu

      BeReal

      Bigo Live

      Bing

      Biolink

      BitChute

      BlackPlanet

      Blogger

      Bluesky

      Bodybuilding

      BookCrossing

      Breaches

      BuyMeACoffee

      Cash App

      CastingCall Club

      Chaturbate

      Chess.com

      Cigar Dojo

      CityXGuide

      CloutHub

      Cocolog

      Companies House

      Cozy.tv

      Cracked

      Creema

      Dailymotion

      Danbooru

      Dark Web

      DeepL

      DeviantArt

      Disqus

      DLive

      Dot.cards

      Douyin

      Drum

      DuckDuckGo

      Duolingo

      E621

      eBay

      Eporner

      Etsy

      Facebook

      Fansly

      FastPeopleSearch

      Fediverse (likely ActivityPub - possibly DMs between servers)

      FetLife

      Fiverr

      Flickr

      FlightAware

      Foursquare

      FriendFinder

      FurAffinity

      Gab

      Gaia Online

      GameFAQs

      Gelbooru

      GeneralMotors

      Geocaching

      GeoEstimation

      Gettr

      Giphy

      GitHub

      Glassdoor

      GoFundMe

      Goo

      Google

      Goodreads

      Gravatar

      Guancha

      GunBroker

      Habbo

      Hackaday

      Hatena

      Honda

      Hubski

      ILoveGrowingMarijuana

      ImageShack

      Imgur

      IMVU

      Indeed

      Instagram

      Instructables

      JudyRecords

      Jugem

      JustForFans

      Keybase

      Kick

      Kik

      Last.fm

      LibraryThing

      Lichess

      Likee

      Line

      LinkedIn

      Linktree

      LiveIn

      LiveJournal

      Lobsters

      Mail.ru

      Malgari

      MapMyTracks

      Marshmallow

      MarTech

      Massage Anywhere

      Medium

      MeetMe

      Mercari Jp

      MeWe

      Minds

      Minecraft

      Mix

      Mixlr

      ModDB

      Mughosts

      MyFitnessPal

      Myspace

      MySubaru

      Naijapals

      Nextdoor

      NissanUSA

      Odysee

      OFAC Sanctions List

      OkCupid

      OK.ru

      OnlyFans

      Pandia

      Pandora

      Passes

      Pastebin

      Patreon

      PayPal

      PCGamer

      Peloton

      PGP

      Pinterest

      Plurk

      Poal

      Popl

      Pornhub

      Poshmark

      Product Hunt

      ProtonMail

      PSNProfiles

      Reblogme

      Reddit

      RedGifs

      Replit

      ReverbNation

      Roblox

      Rule34.xxx

      Rumble

      Rutube

      ScoutWiki

      Seesaa

      Seneporno

      Signal

      SkipTheGames

      Skype

      SlideShare

      Snapchat

      Sogou

      SoundCloud

      SourceForge

      Spiceworks

      Spotify

      Sprashivai

      Steam (fuck off you fucking fucks)

      StellantisEU

      StellantisUSA

      Strava

      Stripchat

      Substack

      TechNet

      Telegram

      Tellows

      Tesseract OCR

      Threads

      TikTok

      Tinder

      TinEye

      ToyotaUSA

      Trakt

      Triller

      TripAdvisor

      TrueCaller

      TruthSocial

      Tumblr

      Twilio

      Twitch

      Twitter

      Untappd

      Venmo

      VidLii

      Vimeo

      Vine

      VirusTotal

      VK

      Volkswagen

      VSCO

      WatchMeMore

      Weibo

      WhatsApp

      Wire

      Wordfeud

      Xbox

      xHamster

      XING

      XVideos

      Yahoo

      Yandex

      Yappy

      YCombinator

      Yelp

      YouTube

      Zhihu

      Zillow

      ZoneH

      • dubyakay@lemmy.ca
        arrow-up
        8
        ·
        8 months ago

        Reddit is right there in your list.

        Also:

        Gaia Online

        Thanks. Brings back memories.

        • drascus@sh.itjust.works (OP)
          arrow-up
          5
          ·
          edit-2
          7 months ago

          Probably just whatever the public metadata is. Metadata is super powerful, especially if you have a lot of it. If the email was ProtonMail to ProtonMail, they will get nothing. If it’s Gmail to ProtonMail, they will know that user X is talking to user Y in Gmail. They will also have the email header information, which is basically just going to be clear text. So they can still ascertain who you know, who you are talking about, and maybe a bit about what the conversation has to do with.

          EDIT: So I asked ProtonMail directly about it, and they confirmed it’s only publicly available information that they can get. For instance, they can try to verify whether a certain email address exists. However, Proton told me that they actively watch for this kind of thing and block IPs trying to do this sort of monitoring.
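
To illustrate the header point, here is a minimal sketch using Python's standard `email` module. The message, addresses, and the `visible_metadata` helper are all hypothetical; the assumption is simply that the body is PGP-encrypted while the headers travel as ordinary text.

```python
from email import message_from_string

# Hypothetical message: the body is an encrypted blob, the headers are not.
RAW_MESSAGE = """\
From: user.x@gmail.example
To: user.y@protonmail.example
Subject: Meeting on Friday
Date: Wed, 12 Mar 2025 17:01:36 +0000
Content-Type: text/plain

-----BEGIN PGP MESSAGE-----
(encrypted body omitted)
-----END PGP MESSAGE-----
"""


def visible_metadata(raw: str) -> dict:
    """Return the header fields that are readable without decrypting the body."""
    msg = message_from_string(raw)
    return {name: msg[name] for name in ("From", "To", "Subject", "Date")}


metadata = visible_metadata(RAW_MESSAGE)
for name, value in metadata.items():
    print(f"{name}: {value}")
```

Whoever can read the message on the unencrypted side sees who wrote to whom, when, and about what, even though the body itself stays opaque.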

          • EveryMuffinIsNowEncrypted@lemmy.blahaj.zone
            English
            arrow-up
            3
            ·
            7 months ago

            Oof, yeah I forgot about the metadata… What you say is certainly true and is worrisome.

            Plus, most people who use email don’t use encrypted email so even if they can’t get a transcript of a conversation from my account, they can certainly get everything from the other account if they also scrape that platform.

      • davel [he/him]@lemmy.ml
        English
        arrow-up
        6
        arrow-down
        1
        ·
        8 months ago

        “Surprisingly, Reddit is NOT on the list.”

        If they’re slurping all these other sites, I highly doubt they’re not slurping Reddit, too, even if it’s not on the list.

        “Fediverse (likely ActivityPub - possibly DMs between servers)”

        They would have to hack the individual servers to get at the DMs, because they’re encrypted in transit. All the public stuff is trivial to scrape.

        • arotrios@lemmy.world
          English
          arrow-up
          2
          ·
          8 months ago

          “They would have to hack the individual servers to get at the DMs, because they’re encrypted in transit. All the public stuff is trivial to scrape.”

          Nope, ActivityPub DMs are not encrypted between servers - if it’s on the feed, it’s public - or at least it was as of six months ago. I found this out when I attached a WordPress site to a Mastodon instance and suddenly found I could read anyone’s DMs to users on other servers. Totally unencrypted. I actually paused development and working with ActivityPub because of it.

          This doesn’t mean that messages to users on the same server are necessarily exposed, but the potential is there if you don’t have a filter for local publishing only engaged on your Mastodon instance.

          • davel [he/him]@lemmy.ml
            English
            arrow-up
            5
            ·
            8 months ago

            “ActivityPub DMs are not encrypted between servers”

            They are, insofar as TLS/SSL/HTTPS encryption is used in transit. That’s what I mean by encrypted in transit.

            “i could read anyone’s DMs to users on other servers”

            If you’re an administrator for (WordPress) ActivityPub server A, you can see all the DMs coming to and leaving from your server, yes. And they’re not encrypted at rest, so you can read them any time. But how would you see DMs going between server B and server C, when your server isn’t involved in the transaction?
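
For context, a “DM” in ActivityPub is generally just an object whose addressing leaves out the special public collection; nothing about the payload itself is encrypted. A rough sketch, assuming Mastodon-style addressing through `to`/`cc` and entirely hypothetical actors and helper names, of the check a receiving server would have to make before showing anything publicly:

```python
# The ActivityStreams "public" collection; anything addressed to it is public.
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def as_list(value):
    """Normalise an addressing field that may be missing, a string, or a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def is_public(activity: dict) -> bool:
    """True only if the public collection appears in `to` or `cc`."""
    recipients = as_list(activity.get("to")) + as_list(activity.get("cc"))
    return PUBLIC in recipients


# Hypothetical direct message from a user on server A to a user on server B.
direct_note = {
    "type": "Note",
    "attributedTo": "https://a.example/users/alice",
    "to": ["https://b.example/users/bob"],
    "cc": [],
    "content": "<p>readable in plain text by both servers</p>",
}
public_note = {**direct_note, "to": [PUBLIC]}

print(is_public(direct_note))  # False
print(is_public(public_note))  # True
```

Software that receives such an object and skips a check like this would end up displaying messages that were only ever private by convention.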

            • arotrios@lemmy.world
              English
              arrow-up
              3
              ·
              edit-2
              8 months ago

              It apparently scrapes everything on the public feed. So when I subscribed to users on Mastodon server A from WordPress, DMs from Mastodon server A going to Mastodon server B became visible.

              I had a separate account on Mastodon server A to confirm that I couldn’t see these DMs as a Mastodon user on server A, and that the WordPress scrape was grabbing messages normally not meant for public view.

              This was using the ActivityPub plugin for WordPress about six months ago.

              EDIT: I should be clear that I was as surprised as the other commenters that the DMs weren’t encrypted and that I could see them at all through third-party software. I did NOT see DMs between local users - only cross-instance.

        • drascus@sh.itjust.works (OP)
          arrow-up
          3
          ·
          8 months ago

          DuckDuckGo is not the problem. They are using publicly scrapable information. So, for instance, if they have fingerprinted your device and they see you go to DuckDuckGo, then see you access a site about buying guns, it becomes trivial to determine what you searched for. They would not have direct access to what you search on DuckDuckGo, and DuckDuckGo is not giving them access. They are using various methods to collect data based on habits. You can use literally any service you want and they could do the same thing.

          • RvTV95XBeo@sh.itjust.works
            arrow-up
            1
            ·
            7 months ago

            If that’s true, why bother “monitoring” a search engine? This whole list screams that somebody who knows nothing about tech put out a vague RFP, and a contractor pulled a list of “top sites” and used it to justify an egregious proposal cost.

            DOGE, if you’re looking for waste and fraud, perhaps here’s a good source.

            • drascus@sh.itjust.works (OP)
              arrow-up
              1
              ·
              7 months ago

              They do it all to build up a huge web of interconnected data points. DuckDuckGo itself they might take as evidence that someone is trying to hide something. Then the government goes to a FISA court and gets permission to have other tech companies hand over all your data. It’s not any one site; it’s the picture that can be gleaned from all the data available across all the sites.

      • Euphoma@lemmy.ml
        English
        arrow-up
        1
        ·
        7 months ago

        Why tf are PGP, Tesseract OCR, and DeepL on this list? DeepL is literally just a machine translation service; users don’t post onto it. Tesseract OCR is downloadable software for OCR. PGP is encryption.

    • drascus@sh.itjust.works (OP)
      arrow-up
      1
      ·
      7 months ago

      Likely every product any amazon customer ever views. They could potentially even figure out which things you buy. But you can get a pretty clear picture of someone’s personality and interests if you know everything they search for.

      • plz1@lemmy.world
        English
        arrow-up
        1
        ·
        7 months ago

        How would they find that info from the outside, though? Or are you saying they are hooked into Amazon’s internal data harvesting ecosystem?

        • drascus@sh.itjust.works (OP)
          arrow-up
          1
          ·
          7 months ago

          Not really necessary. People make DNS requests, which are pretty easy to track, and if you know what URL was requested, that will be the exact product. This can all be done by man-in-the-middle and by monitoring network traffic. But even that is sort of unnecessary. They could very possibly have contracts with ISPs or other network operators; some of that is likely just secret and they don’t disclose it.
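
As a rough sketch of what a DNS observer has to work with (assuming plain, unencrypted DNS; the shop URL and the `dns_visible_part` helper are made up for illustration): the lookup carries only the hostname, so pinning down the exact product depends on learning the rest of the URL some other way.

```python
from urllib.parse import urlsplit


def dns_visible_part(url: str) -> str:
    """Return the only part of a URL that a DNS lookup needs: the hostname."""
    return urlsplit(url).hostname


# Hypothetical product URL on a hypothetical shop.
url = "https://www.shop.example/dp/EXAMPLE123?ref=hypothetical"

hostname = dns_visible_part(url)
# Everything after the hostname (path and query) is not part of the DNS query.
not_in_dns = url.split(hostname, 1)[1]

print("Seen in the DNS query:", hostname)
print("Not in the DNS query: ", not_in_dns)
```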