A ‘Shocking’ Amount of the Web Is Already AI-Translated Trash, Scientists Determine

L4sBot@lemmy.world · 2 years ago

Brkdncr@lemmy.world · 2 years ago

I was recently searching for tips on overlanding routes. So many sites are just long, strung-together SEO word salad.

1984@lemmy.today · 2 years ago

I bet you get better results with Kagi. I don’t see much crap in my results with it.

ABCDE@lemmy.world · 2 years ago

Looks interesting. I recommend Perplexity.ai for finding information (sourced), like a more accurate GPT.

1984@lemmy.today · 2 years ago

Heard about it yesterday too, will try. Thanks.

2 years ago

Someone recommended that to me a while back; the app is sleek, too.

grue@lemmy.world · 2 years ago

I’ve been saying for quite a while now that the Internet was best in the '90s and early 2000s back before it was commercialized, even despite all the “under construction” gifs and whatnot. The signal/noise ratio has only continued to drop since then.

maness300@lemmy.world · 2 years ago

Counterpoint: the Internet as it was back then still exists; it’s just relatively small compared to what the Internet has become.

You just need to find the right people and content to interact with, which is harder now because there’s so much more garbage. I’d say that kind of content has actually grown in absolute numbers.

grue@lemmy.world · 2 years ago

I get what you’re saying that '90s-style content is largely still there if you look for it, but this…

…which is harder now because there’s so much more garbage…

…has nevertheless destroyed the “Internet as it existed back then,” which was specifically an Internet where finding such content was easy.

Euphoma@lemmy.ml · 2 years ago

You can find a lot of old-school websites hosted on Neocities, though many of them are more art projects than actual websites.

layzerjeyt@lemmy.dbzer0.com · 2 years ago

But all our Tripod, Angelfire, GeoCities, etc. websites were little art projects.

layzerjeyt@lemmy.dbzer0.com · 2 years ago

https://www.bleepingcomputer.com/news/software/browse-the-web-like-its-the-90s-with-this-free-service/

lolcatnip@reddthat.com · 2 years ago

Is it harder? It was very hard to find anything on the old internet.

jaybone@lemmy.world · 2 years ago

No. With 2000s Google, I could search for a specific string in quotes (like an obscure error message from trying to boot XBMC on an old Xbox, or a kernel patch for a Hackintosh). Now it’s all SEO bullshit about how I need to watch some asshole’s 10-minute YouTube video about something tangentially related.

layzerjeyt@lemmy.dbzer0.com · 2 years ago

I search for error messages all the time on DDG, and it usually finds relevant results. It fails when errors are not sufficiently obscure, such as a common Python error occurring in many code bases, permissions errors, vaguely worded errors, etc. But there is no way for the internet to guess the context in such a situation. Spam is not a problem.

If Google is so bad, stop using it.

ThirdWorldOrder@lemm.ee · 2 years ago

Just had to find the right webring /s

layzerjeyt@lemmy.dbzer0.com · 2 years ago

And there are websites like https://wiby.me/ that exist to help people find the old-style content.

rottingleaf@lemmy.zip · edited · 2 years ago

I hope you remember the amount of spam and machine-translated text back then.

If you weren’t an English speaker, you’d basically expect most of what you found to be machine-translated, and badly at that.

Pirate localizations of games were translated just well enough that you’d sometimes get a basic idea of what was going on here and there, but in general they were probably worse than the English version, which would at least make some sense if you knew some English.

It was the people and the IT companies that were better.

grue@lemmy.world · edited · 2 years ago

Since I am an English speaker, my '90s Internet experience was very different from that. There were “link farms” (pages designed to exploit early search engine algorithms that ranked pages higher the more they were linked to) and e-mail spam, of course, but because they were unsophisticated, they were generally a lot easier to avoid getting suckered in by than the firehose of AI-written advertorials and shit we have today.
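To make the link-farm trick concrete, here’s a toy Python sketch. It assumes an engine that ranks pages purely by counting distinct inbound links; real engines, PageRank-style ones included, weighted links in more involved ways, and the page names below are made up:

```python
from collections import Counter


def rank_by_inlinks(links: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Rank pages purely by how many distinct pages link to them.

    A deliberately naive heuristic assumed for illustration; it is not
    any particular search engine's real algorithm.
    """
    # Drop duplicate (source, target) pairs and self-links.
    unique = {(src, dst) for src, dst in links if src != dst}
    counts = Counter(dst for _, dst in unique)
    # Most in-links first; ties broken by name so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical link graph: two real blogs link to a useful page...
honest = [("blog-a", "useful-page"), ("blog-b", "useful-page")]
# ...while ten throwaway "farm" pages all link to one spam page.
farm = [(f"farm-{i}", "spam-page") for i in range(10)]

for page, score in rank_by_inlinks(honest + farm):
    print(page, score)
```

Under that heuristic, ten worthless pages outrank two genuine recommendations, which is the kind of weakness link farms exploited.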

wikibot@lemmy.world · 2 years ago

Here’s the summary for the wikipedia article you mentioned in your comment:

An advertorial is an advertisement in the form of editorial content. The term “advertorial” is a blend (see portmanteau) of the words “advertisement” and “editorial.” Merriam-Webster dates the origin of the word to 1946. In printed publications, the advertisement is usually written to resemble an objective article and designed to ostensibly look like a legitimate and independent news story. In television, the advertisement is similar to a short infomercial presentation of products or services.

To opt out, PM me ‘optout’. Article | About

rottingleaf@lemmy.zip · 2 years ago

Right, but what we have today was predicted by people who saw what things were like back then (and even earlier).

jawa21@startrek.website · 2 years ago

You forgot the pop-ups, forced MIDI music, easily injected malware, difficulty verifying sources, HTML frames that frequently broke, the entire concept of needing a site map, fucking keywords, true banner ads that could force clicks with JavaScript, and RealPlayer, to name a few. I don’t miss it at all.

grue@lemmy.world · edited · 2 years ago

No, I didn’t forget anything. It was still better even despite all that.

jaybone@lemmy.world · 2 years ago

Besides, getting RealPlayer videos to play on Red Hat 6 was half the fun.

LillyPip@lemmy.ca · edited · 2 years ago

It was always bad, it’s just now bad in a slightly different way. I’ve been online since 1994 and, yeah. If anything, it’s a bit easier to avoid malware and scams these days. Even websites from reputable sources were sketch as fuck back then, with seizure-inducing popups and a minefield of JavaScript malware with no real options for VPN or blocking ads.

It’s been getting steadily better over the past 10 years or so, and the AI nonsense is threatening to send us back to the early internet Wild West.

All we need now is for Microsoft to start including 30 very sketchy ‘demos’ and mandatory adware with Windows again and the nostalgia will be complete.

The internet is light years ahead today. What we need are anti-AI filters in our browsers to keep our browsing clean of shitty AI nonsense, kind of like ad-blocking plugins.

Edit: I’d do UX, usability, and some dev work on such a plugin if anyone else wants to help with development, too.

paddirn@lemmy.world · 2 years ago

More evidence for the Dead Internet Theory.

wikibot@lemmy.world · 2 years ago

Here’s the summary for the wikipedia article you mentioned in your comment:

The dead Internet theory is an online conspiracy theory that asserts that the Internet now consists mainly of bot activity and automatically generated content that is manipulated by algorithmic curation, marginalizing organic human activity. Proponents of the theory believe these bots are created intentionally to help manipulate algorithms and boost search results in order to ultimately manipulate consumers. Furthermore, some proponents of the theory accuse government agencies of using bots to manipulate public perception, stating “The U.S. government is engaging in an artificial intelligence powered gaslighting of the entire world population”.

To opt out, PM me ‘optout’. Article | About

BananaOnionJuice@lemmy.dbzer0.com · 2 years ago

Best time for a bot to reply.

asudox@lemmy.world · edited · 2 years ago

ironically

riodoro1@lemmy.world · 2 years ago

Fucking ironic

stewsters@lemmy.world · 2 years ago

Lol, read the room bot.

Octopus1348@lemy.lol · 2 years ago

WikiBot on Lemmy!

robocall@lemmy.world · 2 years ago

Good bot

Linssiili@sopuli.xyz · edited · 2 years ago

Recently I was looking for info (in Finnish) on how to prevent car windows from fogging up. I found a really weird website all about car windows, but it kept confusing car and house windows. It instructed readers to clean car windows by “opening the window and cleaning between the panels”.

It was obviously AI-generated, but I couldn’t figure out why it existed. They weren’t selling anything; there were no ads and no links to other websites or services.

Edit: I found the site again, I cannot spot anything nefarious, but proceed with caution: https://www.lasinvaihto.fi/

theluddite@lemmy.ml · 2 years ago

It’s probably either waiting for approval to sell ads, or it was denied and they’re adding more stuff. Google has a virtual monopoly on ads, and its approval process can take 1-2 weeks. Google’s content policy basically demands that your site be full of generated trash to sell ads. I did a case study here, in which Google denied my popular and useful website for ads until I filled it with the lowest-quality generated trash imaginable. That might help clarify what’s up.

CashewNut 🏴󠁢󠁥󠁧󠁿@lemmy.world · 2 years ago

What an absolute ballbag Google is.

Linssiili@sopuli.xyz · 2 years ago

The posts are from March 2023, and there are no ads yet :/

theluddite@lemmy.ml · edited · 2 years ago

Dates could be made up, too. The blog posts that I generated for my site included made-up dates in the past. The Internet Archive says it has a snapshot from March 2023, but when I click it, it says it doesn’t, so I have no way of verifying. The theory about parking the domain as real estate and hoping to sell it also seems pretty plausible to me. Who knows what dumb shit they’re up to.

aubertlone@lemmy.world · 2 years ago

Hey man! I’ve read this article a few times, perhaps from other comments on Lemmy!

Thanks for the write-up. I’m a programmer myself.

Stuck in operations at my new job until we’re done with the data center exit/migration. Anyway, cool beans, and very interesting article. Will keep all this in mind if any of my hobby projects take off.

Lemminary@lemmy.world · 2 years ago

Instead of feeling defeated, like every other millennial that doesn’t want to work,

That is one weird, glib remark to throw in there.

theluddite@lemmy.ml · 2 years ago

My editor is an actual saint. Imagine all the shit that she has to put up with that gets cut if that made it through!

jdf038@mander.xyz · 2 years ago

Perhaps parking a site for traffic and then using the enshittified data to sell it?

It makes me sick how dumb it sounds.

crazyCat@sh.itjust.works · 2 years ago

People who care about SEO for their window-related businesses will pay the blog to link to them from there.

Linssiili@sopuli.xyz · 2 years ago

That would make sense; also, the domain is really good (lasinvaihto.fi translates to windscreenreplacement.fi). Maybe they are planning to sell the domain?

Tetractys@lemmy.world · 2 years ago

Good bot.

SomeGuy69@lemmy.world · 2 years ago

I need an AI Firefox extension that detects badly translated AI text and automatically blocks those domains.

50MYT@aussie.zone · 2 years ago

A search engine that displays only human created content, and hides AI.

lolcatnip@reddthat.com · 2 years ago

That will probably never be possible.

spacesatan@lemm.ee · 2 years ago

Automatically, no, but I’ve been waiting years for somebody to make a crowdsourced blacklist extension for search engines: a little ‘84% of voting users say this site contains low-quality algorithmically generated content’ label next to search results or something.
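A rough Python sketch of the tally behind a label like that, assuming one vote per user per domain and a made-up minimum of five votes before anything is shown. `DomainVotes` and its methods are hypothetical; a real extension would still need accounts, shared storage, and some defense against vote manipulation:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DomainVotes:
    """Hypothetical vote tally for one domain in a crowdsourced blocklist."""

    flagged: set = field(default_factory=set)  # users who say "generated junk"
    cleared: set = field(default_factory=set)  # users who say "looks human-made"

    def vote(self, user_id: str, is_generated: bool) -> None:
        # One vote per user: a new vote replaces any earlier vote by that user.
        self.flagged.discard(user_id)
        self.cleared.discard(user_id)
        (self.flagged if is_generated else self.cleared).add(user_id)

    def label(self, min_votes: int = 5) -> Optional[str]:
        total = len(self.flagged) + len(self.cleared)
        if total < min_votes:
            return None  # not enough votes yet to show anything
        pct = round(100 * len(self.flagged) / total)
        return (f"{pct}% of voting users say this site contains "
                "low-quality algorithmically generated content")


votes = DomainVotes()
for i in range(25):
    votes.vote(f"user-{i}", is_generated=(i < 21))  # 21 of 25 users flag it
print(votes.label())
```

Storing voter IDs in sets rather than bare counters is what lets a user change their mind without being counted twice.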

pinkdrunkenelephants@lemmy.world · 2 years ago

🤔 It could be if you removed anonymity from the internet, though that would open a whole different can of worms.

Pringles@lemm.ee · 2 years ago

That’s actually a pretty good idea.

crazyCat@sh.itjust.works · 2 years ago

It is and it isn’t; “AI detection” is even crappier than AI itself.

aesthelete@lemmy.world · edited · 2 years ago

For a time I thought this Fediverse thing would help or change things or something, but honestly…the Internet is just plain boring now…and it’s pretty clear what is causing that: AI / SEO trash content, social media’s rise, and commercialization of the Internet generally.

One day I was even feeling nostalgic, so I went back to where I spent hours upon hours of my youth: EFNet on IRC…there was basically nobody there, and of the few channels I saw, some were even Trump-leaning weirdo “communities”.

It’s basically finished. I can’t even find a decent place to procrastinate or hang out anymore on this POS. It’s all just a giant ad surface and e-commerce portal. The fucking owners won.

AMDIsOurLord@lemmy.ml · 2 years ago

EFNet is boomer shit. Most of IRC happens on other servers now, like Libera.Chat, or on new protocols like Matrix.

We’re still here, we’re still alive

Liz@midwest.social · 2 years ago

Yo, someone mentioned Libera.Chat while I was on HexChat. How do I get onto a Libera.Chat server?

AMDIsOurLord@lemmy.ml · 2 years ago

Same as any other IRC network: look up the address, connect and log in, then register an account with NickServ (/nickserv) and browse.

Also, if I’m not mistaken, HexChat doesn’t support the IRCv3 protocol.
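Under the hood, those steps map onto a handful of raw IRC commands. Here’s a minimal Python sketch, assuming Libera.Chat’s TLS endpoint (irc.libera.chat, port 6697) and its `REGISTER <password> <email>` NickServ syntax; the nick, password, email, and channel are placeholders, and any real client, HexChat included, does all of this for you:

```python
import socket
import ssl


def handshake_lines(nick: str) -> list:
    """Lines every client sends first: pick a nick, then identify the user."""
    # USER <username> <mode> <unused> :<realname>
    return [f"NICK {nick}", f"USER {nick} 0 * :{nick}"]


def after_welcome_lines(password: str, email: str, channel: str) -> list:
    """Sent only after the server's 001 welcome; earlier commands get rejected."""
    return [f"PRIVMSG NickServ :REGISTER {password} {email}", f"JOIN {channel}"]


def connect(host, port, nick, password, email, channel):
    """Connect over TLS, register with NickServ, join, and echo traffic until closed."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port)) as raw, \
            ctx.wrap_socket(raw, server_hostname=host) as sock:

        def send(lines):
            for line in lines:
                sock.sendall((line + "\r\n").encode())

        send(handshake_lines(nick))
        buf = b""
        while chunk := sock.recv(4096):
            buf += chunk
            # Keep any trailing partial line in the buffer for the next read.
            *complete, buf = buf.split(b"\r\n")
            for raw_msg in complete:
                msg = raw_msg.decode(errors="replace")
                print(msg)
                parts = msg.split()
                if parts and parts[0] == "PING":
                    send(["PONG" + msg[4:]])  # keep-alive reply
                elif len(parts) > 1 and parts[1] == "001":
                    send(after_welcome_lines(password, email, channel))
```

Something like `connect("irc.libera.chat", 6697, "yournick", "a-good-password", "you@example.com", "#libera")` would run it. Waiting for the `001` welcome numeric before messaging NickServ matters, because servers reject most commands from connections that haven’t finished registering.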

Just_Pizza_Crust@lemmy.world · edited · 2 years ago

The fucking owners won.

Always has been 🔫

That said, I would suggest smaller communities and private messaging. Find your niche and make it home.

Jax@sh.itjust.works · edited · 2 years ago

Yep, it might have been hijacked by consumers, but it’s still a communication network.

ABCDE@lemmy.world · 2 years ago

Thanks, scientists, couldn’t have known that without you.

ForgotAboutDre@lemmy.world · 2 years ago

There is value in verifying and quantifying opinion, even if your sure this opinion is true.

pinkdrunkenelephants@lemmy.world · 2 years ago

*you’re sure

jaybone@lemmy.world · 2 years ago

Next up: scientists detect sarcasm.

ABCDE@lemmy.world · 2 years ago

No way.

Jayu@lemm.ee · 2 years ago

The most annoying aspect of this is when you know actual information has to be out there, but it is being drowned out by dozens of sites reposting the less relevant, low-quality information… And then you go to search in another language and you see substandard machine translations of all the garbage you were just fleeing, lol.

TheRealKuni@lemmy.world · edited · 2 years ago

I was trying to find the corner radius of the iPad Pro. Not the screen, the actual device. No matter how I modified my search terms, all I could find was information about the screen’s corners (and how they aren’t a true radius, blah blah blah) or AI-generated bullshit.

Eventually I gave up and changed the way I was tackling my project. I know the info is out there, people make cases for these things.

Misconduct@lemmy.world · 2 years ago

It’s getting to the point where I have to use AI to help me sift through all the AI bullshit :(

kingthrillgore@lemmy.ml · 2 years ago

Turing tests solving Turing tests solving Turing tests

maegul (he/they)@lemmy.ml · 2 years ago

The whole webring idea needs to come back. Human curated recommendations of good resources and pages. So long as these pages remain in the control of humans and dedicated to curation and are decentralised, unlike the search engines, then they’ll be reliable.

Plug in some social and community organisation, perhaps like a wiki, and you could get even more out of it.

Euphoma@lemmy.ml · 2 years ago

There are modern webrings. Dang, the Yesterweb webring shut down; that was a really good one.

UNWILLING_PARTICIPANT@sh.itjust.works · 2 years ago

Got any other reccos? I’m brand new to the concept

BetaDoggo_@lemmy.world · edited · 2 years ago

This isn’t shocking at all. The market for obscure-language content is incredibly small, so most have no incentive to spend resources on it. I’d argue mediocre machine translation is better than nothing in many cases, but for unsupervised training it does pose a challenge.

xantoxis@lemmy.world · edited · 2 years ago

They didn’t only look at low-resource languages, they just started there because that was the problem domain. They found that 57% of ALL sentences on the Internet appeared to be machine translated, including translations into high-resource languages. The remaining 43% might also be machine generated, it just wasn’t found to be part of a multi-way parallel group.

Falcon@lemmy.world · 2 years ago

Translation is very different from generation.

As a matter of fact, even AI generation has different grades of quality.

SEO garbage is certainly not the same as an article with AI-generated components, and it is very different from a translated article.

Random_Character_A@lemmy.world · 2 years ago

Too good not to be ruined by humanity

Laurel Raven@lemmy.blahaj.zone · 2 years ago

In the beginning humanity was created. This had made many people very angry and has been widely regarded as a bad move.

Douglas Adams, probably…

Whelks_chance@lemmy.world · 2 years ago

I had a lot of fun pasting that into DALL-E the other day; it created some funny stuff.

GilgameshCatBeard@lemmy.ca · 2 years ago

AI is going to fuck up everything we’ve ever done.

stewsters@lemmy.world · edited · 2 years ago

We fucked it up on our own with SEO long before ChatGPT came along. Google has been going downhill for years as people learn to game the algorithm.

It will speed things along, sure, but the core problem is that it is profitable to dump garbage on the internet and put ads on it. Monetization is the root of this.

rottingleaf@lemmy.zip · 2 years ago

Is this really 2024? For a moment I felt like I was in 2004.

SkyNTP@lemmy.ml · 2 years ago

If only. 2004 was better.

1984@lemmy.today · 2 years ago

I agree. If I could go back to the 1980s, I would.

Unforeseen@sh.itjust.works · 2 years ago

1984, specifically

rottingleaf@lemmy.zip · 2 years ago

1983, to see WarGames in theaters?