PSA: Lemmy votes can be manipulated

koper@feddit.nl · 2 years ago

PetrichorBias@lemmy.one · edit-2 2 years ago

This was a problem on reddit too. Anyone could create accounts - heck, I had 8 accounts:

one main, one alt, one “professional” (linked publicly on my website), and five for my bots (whose accounts were optimistically created, but were never properly run). I had all 8 accounts signed in on my third-party app and I could easily manipulate votes on the posts I posted.

I feel like this is what happened when you’d see posts with hundreds / thousands of upvotes but had only 20-ish comments.

There needs to be a better way to solve this, but I’m unsure if we truly can solve this. Botnets are a problem across all social media (my undergrad thesis many years ago was detecting botnets on Reddit using Graph Neural Networks).

Fwiw, I have only one Lemmy account.

impulse@lemmy.world · 2 years ago

I see what you mean, but there’s also a large number of lurkers, who will only vote but never comment.

I don’t think it’s unfeasible to have a small number of comments on a highly upvoted post.

SGforce@lemmy.ca · 2 years ago

If it’s a meme or shitpost there isn’t anything to talk about

PetrichorBias@lemmy.one · 2 years ago

Maybe you’re right, but it just felt uncanny to see thousands of upvotes on a post with only a handful of comments. Maybe someone who is active on the bot-detection subreddits can pitch in.

RedCowboy@lemmy.world · 2 years ago

I agree completely. 3k upvotes on the front page with 12 comments just screams vote manipulation

randomname01@feddit.nl · 2 years ago

True, but there were also a number of subs (thinking of the various meirl spin-offs, for example) that naturally had limited engagement compared to other subs. It wasn’t uncommon to see a post with like 2K upvotes and five comments, all of them remarking how few comments there actually were.

simple@lemmy.world · 2 years ago

Reddit had ways to automatically catch people trying to manipulate votes though, at least the obvious ones. A friend of mine posted a reddit link for everyone to upvote on our group and got temporarily suspended for vote manipulation like an hour later. I don’t know if something like that can be implemented in the Fediverse but some people on github suggested a way for instances to share to other instances how trusted/distrusted a user or instance is.

cynar@lemmy.world · 2 years ago

An automated trust rating will be critical for Lemmy, longer term. It’s the same arms race as email has to fight. There should be a linked trust system of both instances and users. The instance ‘vouches’ for the user’s trust score. However, if other instances collectively disagree, then the trust score of the instance is also hit. Other instances can then use this information to judge how much to allow from users in that instance.
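A minimal sketch of that vouching idea, assuming scores in the range 0 to 1 and made-up `tolerance` and `penalty` values. None of these names come from Lemmy; they are only for illustration.

```python
from statistics import median


def updated_instance_trust(instance_trust, vouched_score, peer_scores,
                           tolerance=0.2, penalty=0.1):
    """Return the home instance's new trust after peers weigh in on one user.

    All scores are in [0, 1] (an assumption of this sketch). The instance keeps
    its trust when its vouched score is within `tolerance` of the peer median;
    otherwise it loses `penalty` scaled by how far off it was.
    """
    if not peer_scores:
        return instance_trust  # nobody to disagree with
    gap = abs(vouched_score - median(peer_scores))
    if gap <= tolerance:
        return instance_trust
    return max(0.0, instance_trust - penalty * gap)


def effective_user_trust(instance_trust, vouched_score):
    """How much a remote instance might allow: the vouch scaled by instance trust."""
    return instance_trust * vouched_score
```

With these defaults, an instance at 0.9 that vouches 0.95 for a user whom three peers rate around 0.15 would drop to roughly 0.82, while an instance whose vouch roughly matches its peers keeps its score.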

hawkwind@lemmy.management · 2 years ago

LLM bots have made this approach much less effective though. I can just leave my bots for a few months or a year to build reputation, automate them in a way that they are completely indistinguishable from 200 natural-looking users, making my opinion carry 200x the weight. Mostly for free. A person with money could do so much more.

cynar@lemmy.world · 2 years ago

It’s the same game as email. An arms race between spam detection, and spam detector evasion. The goal isn’t to get all the bots with it, but to clear out the low hanging fruit.

In your case, if another server noticed a large number of accounts working in lockstep, then it’s fairly obvious bot-like behaviour. If their home server also noticed the pattern and reports it (lowers the user’s trust rating) then it won’t be dinged harshly. If it reports all is fine, then it’s also assumed the instance might be involved.
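One rough way to spot accounts "working in lockstep" is to compare vote histories pairwise. The sketch below uses Jaccard overlap; the function names and the `min_votes` and `threshold` values are assumptions for illustration, not anything an instance actually runs.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two sets of voted post IDs, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lockstep_pairs(votes_by_account, min_votes=5, threshold=0.9):
    """Return sorted account pairs whose vote histories overlap suspiciously.

    votes_by_account maps an account name to the set of post IDs it upvoted.
    Accounts with fewer than `min_votes` votes are skipped to limit false positives.
    """
    active = {name: posts for name, posts in votes_by_account.items()
              if len(posts) >= min_votes}
    return [(a, b) for a, b in combinations(sorted(active), 2)
            if jaccard(active[a], active[b]) >= threshold]
```

The pairwise comparison is quadratic in the number of accounts, so this only catches the low hanging fruit on a small slice of data.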

If you control the instance, then you can make it lie, but this downgrades the instance’s score. If it’s someone else’s, then there is incentive not to become a bot farm, or at least be honest in how it reports to the rest.

This is basically what happens with email. It’s FAR from perfect, but a lot better than nothing. I believe 99+% of all emails sent are spam. Almost all get blocked. The spammers have to work to get them through.

fmstrat@lemmy.nowsci.com · 2 years ago

This will be very difficult. With Lemmy being open source (which is good), bot makers can just avoid the pitfalls they see in the system (which is bad).

70ms@lemmy.world · edit-2 2 years ago

I got suspended multiple times because my partner and daughter were also in our city’s sub, and sometimes one of them would upvote my comments without realizing it was me. It got really fucking annoying, and of course there’s no way to talk to a real person at reddit to prove we’re different people. I’d appeal every time and they’d deny it every time. How reddit could have gotten so huge without realizing that multiple people can live in the same household is beyond me. In the end they both just stopped upvoting anything in the sub because it was too risky (for me).

Derproid@sh.itjust.works · 2 years ago

That’s such a hilariously bad metric for detecting a bot network too. It wouldn’t even work to detect a real one, so all that policy ever did was annoy real users.

TheSaneWriter@lemm.ee · 2 years ago

Hearing that, I wonder if they were using an IP address based system. That would cause real problems for people using a VPN, but it wouldn’t surprise me.

TWeaK@lemm.ee · 2 years ago

RIP u/unidan

ඞmir@lemmy.ml · 2 years ago

I miss everyone being Unidan

PeleSpirit@lemmy.world · 2 years ago

I think it’s the 3rd party app or VPN thing that would have saved your friend.

esty@lemmy.ca · 2 years ago

nope, i tried manipulating votes from apollo once and got a warning

PeleSpirit@lemmy.world · 2 years ago

Were you on a VPN?

esty@lemmy.ca · 2 years ago

nope, so that’s probably it

Thorny_Thicket@sopuli.xyz · 2 years ago

I got that message too when switching accounts to vote several times. They can probably see it’s all coming from the same ip.

BrianTheeBiscuiteer@lemmy.world · 2 years ago

Yes, I feel like this is a moot point. If you want it to be “one human, one vote” then you need to use some form of government login (like id.me, which I’ve never gotten to work). Otherwise people will make alts and inflate/deflate the “real” count. I’m less concerned about “accurate points” and more concerned about stability, participation, and making this platform as inclusive as possible.

PetrichorBias@lemmy.one · edit-2 2 years ago

In my opinion, the biggest (and quite possibly most dangerous) problem is someone artificially pumping up their ideas. To all the users who sort by active / hot, this would be quite problematic.

I’d love to actually see some social media research groups actually consider how to detect and potentially eliminate this issue on Lemmy, considering Lemmy is quite new and is malleable at this point (compared to other social media). For example, if they think metric X may be a good idea to include in all metadata to increase chances of detection, then it may be possible to include this in the source code of posts / comments / activities.

I know a few professors and researchers who do research on social media and associated technologies, I’ll go talk to them when they come to their office on Monday.

BrianTheeBiscuiteer@lemmy.world · 2 years ago

This also vaguely reminds me of some advanced networking topics. In mesh networks there is the possibility of rogue nodes causing havoc and different methods exist to reduce their influence or cut them out of the process.

theolodger@feddit.uk · edit-2 2 years ago

!remindme - oh wait…

Lumidaub@feddit.de · 2 years ago

@remindme@mstdn.social 1 day

:)

Remind Me@mstdn.social · 2 years ago

@Lumidaub Ok, I will remind you on Monday Jul 10, 2023 at 9:36 AM PDT.

zuhayr@lemmy.world · 2 years ago

I have been thinking about this government id aspect too. But it’s not coming to me.

Users sign up with govt ID, obtain a unique social media key that’s used for all activities beyond the sign up. One key per person, but a person can have multiple accounts? You know, like that database primary key.

The relationship between the govt id and social media key needs to be in a zero knowledge encryption so that no one can correlate the real person with their online presence. THIS is the bummer.
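A rough sketch of just the "one key per person" part, assuming a hypothetical issuer that holds a secret and derives the key with HMAC. This is a swapped-in technique and it is not zero-knowledge: whoever holds `issuer_secret` can recompute the mapping, which is exactly the correlation problem described above.

```python
import hashlib
import hmac


def social_media_key(issuer_secret: bytes, govt_id: str) -> str:
    """Derive a stable pseudonymous key from a government ID.

    The same ID always yields the same key, so one person gets one key,
    but anyone holding issuer_secret can recompute the mapping.
    """
    return hmac.new(issuer_secret, govt_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Several accounts could then be tied to the same key without any of them exposing the ID itself to other users.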

SrElsewhere@lemmy.world · 2 years ago

These downvotes indicate that some of the assholes have now migrated.

InternetPirate@lemmy.fmhy.ml · edit-2 2 years ago

I feel like this is what happened when you’d see posts with hundreds / thousands of upvotes but had only 20-ish comments.

Nah it’s the same here in Lemmy. It’s because the algorithm only accounts for votes and not for user engagement.

AndrewZabar@beehaw.org · 2 years ago

Yeah votes are the worst metric to measure anything because of bot voters.

AndrewZabar@beehaw.org · 2 years ago

On Reddit there were literally bot armies by which thousands of votes could be instantly implemented. It will become a problem if votes have any actual effect.

It’s fine if they’re only there as an indicator, but if the votes are what determine popularity, prioritize visibility, it will become a total shitshow at some point. And it will be rapid. So yeah, better to have a defense system in place asap.

Thorny_Thicket@sopuli.xyz · 2 years ago

I always had 3 or 4 reddit accounts in use at once. One for commenting, one for porn, one for discussing drugs and one for pics that could be linked back to me (of my car for example). I also made a new commenting account like once a year so that if someone recognized me they wouldn’t be able to find every comment I’ve ever written.

On lemmy I have just two now (other is for porn) but I’m probably going to make one or two more at some point

auth@lemmy.ml · 2 years ago

I have about 20 reddit accounts… I created/ switched account every few months when I used reddit

Dandroid@dandroid.app · 2 years ago

If you and several other accounts all upvoted each other from the same IP address, you’ll get a warning from reddit. If my wife ever found any of my comments in the wild, she would upvote them. The third time she did it, we both got a warning about manipulating votes. They threatened to ban both of our accounts if we did it again.

But here, no one is going to check that.

MigratingtoLemmy@lemmy.world · 2 years ago

Congratulations on such a tough project.

And yes, as long as the API is accessible somebody will create bots. The alternative is far worse though

Puph@lemmy.dbzer0.com · 2 years ago

I had all 8 accounts signed in on my third-party app and I could easily manipulate votes on the posts I posted.

There’s no chance this works. Reddit surely does a simple IP check.

Salamander@mander.xyz · 2 years ago

I would think that they need to set a somewhat permissive threshold to avoid too many false positives due to people sharing a network. For example, a professor may share a reddit post in a class with 600 students with their laptops connected to the same WiFi. Or several people sharing an airport’s WiFi could be looking at /r/all and upvoting the top posts.

I think 8 accounts liking the same post every few days wouldn’t be enough to trigger an alarm. But maybe it is, I haven’t tried this.
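A sketch of what a permissive threshold could look like, assuming the site can see `(ip, account, post_id)` for each vote. The thresholds are invented for illustration and say nothing about what reddit actually does.

```python
from collections import defaultdict


def flag_shared_ip_voting(votes, min_accounts=5, min_posts=3):
    """Flag IPs where many accounts repeatedly pile onto the same posts.

    votes is an iterable of (ip, account, post_id) tuples. An IP is flagged only
    if at least `min_accounts` distinct accounts from it voted on the same post,
    and that happened on at least `min_posts` different posts.
    """
    accounts_per_post = defaultdict(set)
    for ip, account, post_id in votes:
        accounts_per_post[(ip, post_id)].add(account)

    crowded_posts = defaultdict(int)
    for (ip, _post_id), accounts in accounts_per_post.items():
        if len(accounts) >= min_accounts:
            crowded_posts[ip] += 1

    return {ip for ip, count in crowded_posts.items() if count >= min_posts}
```

Under these settings a lecture hall upvoting one shared post is left alone, while eight accounts hitting three posts from one address would be flagged. A busy airport network could still trip it, so a real system would need more signals than this.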

Valmond@lemmy.ml · 2 years ago

I had one main account but also a couple for using when I didn’t want to mix my “private” life up with other things. I don’t even know if it’s not allowed in the TOS?

Anyway, I stupidly made a Valmond account on several Lemmy instances before I got the hang of it, and when (if!) my server will one day function I’ll make an account there so …

I guess it might be like in the old forum days, you have a respectable account and another if you wanted to ask a stupid question etc. admin would see (if they cared) but not the ordinary users.

averyminya@beehaw.org · 2 years ago

Reddit will definitely send you PM’s for vote manipulation

FartsWithAnAccent@lemmy.world · 2 years ago

I’d just make new usernames whenever I thought of one I thought was funny. I’ve only used this one on Lemmy (so far) but eventually I’ll probably make a new one when I have one of those “Oh shit, that’d be a good username” moments.

Azzu@lemm.ee · 2 years ago

You can change your display name on Lemmy to whatever you want whenever you want.

FartsWithAnAccent@lemmy.world · 2 years ago

Oh neat! Thanks!

AndrewZabar@beehaw.org · 2 years ago

May I ask how do you format your text? My format bar has disappeared from wefwef.

PetrichorBias@lemmy.one · edit-2 2 years ago

I don’t use wefwef, I use jerboa for android.

**bold**

*italics*

> quote

`code`

# heading

- list

AndrewZabar@beehaw.org · edit-2 2 years ago

Ah ok. Yeah I thought the markdown was the same as reddit being markdown but it used to have a toolbar.

Thanks for response.

Also I’ve wondered why don’t they have an underline markdown.

TWeaK@lemm.ee · edit-2 2 years ago

Fun fact: old reddit used to use one of the header functions as an underline. I think it was 5x # that did it. However, this was an unofficial implementation of markdown, and it was discarded with new reddit. Also, being a header function you could only apply it to an entire line or paragraph, rather than individual words.

Hexorg@beehaw.org · 2 years ago

I think the best solution there is so far is to require captcha for every upvote but that’d lead to poor user experience. I guess it’s the cost benefit of user experience degrading through fake upvotes vs through requiring captcha.

magnetosphere@beehaw.org · 2 years ago

If any instance ever requires a captcha for something as trivial as an upvote, I’ll simply stop upvoting on that instance.

Hexorg@beehaw.org · 2 years ago

Yes that’s what I meant by degrading user experience

ඞmir@lemmy.ml · 2 years ago

It wouldn’t stop bots because they would just use any instance without the captcha

Catsrules@lemmy.ml · 2 years ago

I could see this being useful on a per community basis. Or something that a moderator could turn on and off.

For example on a political or news community during an election. It might be worth while to turn captcha on.

🐱TheCat@sh.itjust.works · 2 years ago

IMO the best way to solve it is to ‘lower the stakes’ - spread out between instances, avoid behaviors like buying any highly upvoted recommendation without due diligence etc. Basically, become ‘un-advertiseable’, or at least less so

Takatakatakatakatak@lemmy.dbzer0.com · 2 years ago

I don’t know how you got away with that to be honest. Reddit has fairly good protection from that behaviour. If you up vote something from the same IP with different accounts reasonably close together there’s a warning. Do it again there’s a ban.

PetrichorBias@lemmy.one · 2 years ago

I did it two or three times with 3-5 accounts (never all 8). I also used to ask my friends (N=~8) to upvote stuff too (yes, I was pathetic) and I wasn’t warned/banned. This was five-six years ago.

Andy@lemmy.world · 2 years ago

I’m curious what value you get from a bot? Were you using it to upvote your posts, or to crawl for things that you found interesting?

PetrichorBias@lemmy.one · edit-2 2 years ago

The latter. I was making bots to collect data (for the previously-mentioned thesis) and to make some form of utility bots whenever I had ideas.

I once had an idea to make a community-driven tagging bot to tag images (like hashtags). This would have been useful for graph building and just general information-lookup. Sadly, the idea never came to fruition.

Andy@lemmy.world · 2 years ago

Cool, thank you for clarifying!

vis4valentine@lemmy.ml · 2 years ago

I have like tens of accounts on reddit.

Boozilla@lemmy.world · 2 years ago

The lack of karma helps some. There’s no point in trying to rack up the most points for your account(s), which is a good thing. Why waste time on the lamest internet game when you can engage in conversation with folks on lemmy instead.

Protoknuckles@lemmy.world · 2 years ago

It can still be used to artificially pump up an idea. Or used to bury one.

danc4498@lemmy.world · 2 years ago

This is the problem. All the algorithms are based on the upvote count. Bad actors will abuse this.

Derproid@sh.itjust.works · 2 years ago

So maybe more weight should be put on comment count? Much harder to fake those.
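As a toy illustration of weighting comments more heavily (this is not Lemmy's actual ranking formula, and `comment_weight` is an arbitrary assumption):

```python
import math


def engagement_score(upvotes, downvotes, distinct_commenters, comment_weight=3.0):
    """Blend net votes with the number of distinct commenters on a log scale.

    Counting distinct commenters rather than raw comments means one account
    cannot inflate the score by replying to itself.
    """
    signal = max(upvotes - downvotes, 0) + comment_weight * distinct_commenters
    return math.log10(1 + signal)
```

With this weighting, a post with 400 upvotes and 300 distinct commenters outranks one with 1000 upvotes and 5 commenters.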

AeroSoap@lemm.ee · edit-2 2 years ago

deleted by creator

Protoknuckles@lemmy.world · 2 years ago

So, the question becomes how do we rank posts and comments in a way that is not based on either upvotes or down votes or number of comments? I could see a trust value being made for each user based on trusted users marking others as trusted combined with a personal trust score, but that puts a barrier on new users and enforces echo chambers.

What else could be tried?

TheOnlyMego@lemmy.world · 2 years ago

that puts a barrier on new users and enforces echo chambers

Only if trust starts at 0. A system where trust started high enough to not filter out posts and comments would avoid that issue.

AeroSoap@lemm.ee · edit-2 2 years ago

deleted by creator

danc4498@lemmy.world · 2 years ago

Maybe instances should be assigned a rank for how dependable they are. Length of time active, number of active users… Stuff like that and each instance keeps track of its own rankings for each instance it is federated with. Put the upvote and those stats in a magic box to calculate the actual upvote value.
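One possible shape for that "magic box", as a sketch only. The caps (`full_age_days`, `full_users`) and the simple averaging are assumptions picked for illustration.

```python
def instance_weight(age_days, active_users, full_age_days=180, full_users=500):
    """Weight in [0, 1]: the average of capped age and capped activity."""
    age_part = min(age_days / full_age_days, 1.0)
    user_part = min(active_users / full_users, 1.0)
    return (age_part + user_part) / 2


def weighted_upvotes(votes_by_instance, instance_stats):
    """Sum upvotes, scaling each instance's votes by its weight.

    votes_by_instance maps instance name -> raw upvote count.
    instance_stats maps instance name -> (age_days, active_users);
    unknown instances get weight 0.
    """
    total = 0.0
    for name, count in votes_by_instance.items():
        age_days, active_users = instance_stats.get(name, (0, 0))
        total += count * instance_weight(age_days, active_users)
    return total
```

Each instance would keep its own `instance_stats` table, so 200 upvotes from a nine-day-old instance with 50 users would count for about 15 here, while 10 upvotes from a long-running instance count in full.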

arefx@lemmy.ml · 2 years ago

That’s where all the harm comes from

hawkwind@lemmy.management · 2 years ago

Agree. Farming karma is nothing compared to making a single individual polar-opinion APPEAR as though it is others’ (or most people’s) polar-opinion. We know that others’ opinions are not our own, but they do influence our opinions. It’s pretty important that either 1) like numbers mean nothing, in which case hot/active/etc. are meaningless or 2) we work together to ensure trust in like numbers.

Steve@compuverse.uk · 2 years ago

Maybe you move public perception of a product or political goal.
To push a narrative of some kind. Astroturfing basically.

Muddybulldog@mylemmy.win · edit-2 2 years ago

Lack of karma is a fallacy. The default Lemmy UI doesn’t display it but the karma system appears to be fully built.

hawkwind@lemmy.management · 2 years ago

The data to build it is there. Ftfy

Muddybulldog@mylemmy.win · edit-2 2 years ago

Tallies are maintained in the db in real-time. No calculating needed

hawkwind@lemmy.management · 2 years ago

I just mean that the karma system ala Reddit did more than just keep track of it and display it afaik. The data is in the db but a fully done karma system it is not. I could be wrong.

bassdrop321@feddit.de · 2 years ago

Corporations could use it to push their ads to the top

cakeistheanswer@lemmy.fmhy.ml · 2 years ago

This is near inevitable if this platform takes off.

Advertisers gonna advertise.

Shartacus@lemmy.world · edit-2 2 years ago

Just rip them in the comments and boycott their brand

Derproid@sh.itjust.works · 2 years ago

This is exactly why they wouldn’t risk officially advertising here. Not enough control over the platform leads to too much risk to brand perception.

Derproid@sh.itjust.works · 2 years ago

I was actually talking to someone that works in advertising and for big companies this is unlikely. Pepsi for example pays a lot for the guarantee that their product ads won’t appear near posts they don’t want them to. Since Lemmy advertising would only be through regular posts where they have no control over this, they likely wouldn’t risk the potential detriment to brand perception.

Now this can change if the potential reach of Lemmy is big enough but that size will be different for each company.

cakeistheanswer@lemmy.fmhy.ml · 2 years ago

Probably true. it’s the agencies who are desperate and likely to be looking to chatGPT to outsource ad copy who are going to be looking to capitalize.

No community is really above being targeted, because the good campaigns done by people in the niche tend to be indistinguishable from good posts.

reallynotnick@lemmy.world · 2 years ago

Maybe I’m misunderstanding karma, but Memmy appears to show the total upvotes I’ve gotten for comments and posts, isn’t that basically karma?

influence1123@psychedelia.ink · 2 years ago

I don’t think other people can see it though. On Reddit bot accounts would rack up karma so that when they switch to posting spam it looks like they have a lot of karma and are someone who posts worthwhile things.

Rufio@lemm.ee · 2 years ago

I’m using wefwef and can see what everyone’s score is on any given comment as well as their overall score when I go to their profile

influence1123@psychedelia.ink · 2 years ago

Yeah I was wrong. I use Jerboa mostly.

riceandbeans161@discuss.tchncs.de · 2 years ago

same on Memmy for me

reallynotnick@lemmy.world · 2 years ago

I can click on you and see the same stats for you… though the numbers seem too low when I eyeball it compared to your comments, but I’m thinking maybe it’s just total points for a single lemmy server?

influence1123@psychedelia.ink · 2 years ago

I guess I was wrong. I shouldn’t have assumed. I’m using Jerboa.

Someology@lemmy.world · edit-2 2 years ago

EDIT I was wrong! Lemmy does have karma, even listed in the API, though for some reason it doesn’t show this to you itself. So, those of us just using Lemmy directly have been under the mistaken idea that it didn’t do it, and those using third party apps are seeing it: https://lemmy.world/post/1250922?scrollToComments=true

~~That’s interesting, because on the Lemmy website, there is no total upvotes number visible. It only shows the total number of posts and total number of comments. It then shows the list of posts and comments, and you can see the scores for each, but there’s no total. Memmy must be calculating this itself. This seems to be something third party app developers are adding which is not present in actual Lemmy itself, in order to try to replicate Reddit Karma somewhat.

As Lemmy works itself: On Reddit, in addition to your posts and comments having visible scores, your username also has an aggregate score, which Lemmy does not have. At least, when I go to your profile, I can see the scores for your posts and comments, but I cannot see any aggregate score for you as a user. That’s what Reddit Karma is. I don’t know what black magic formula Reddit calculates it from, as old Reddit and new Reddit show different Karma numbers for the same user, but whatever algorithm they use, it’s an overall user score that Lemmy does not have (so far, at least). ~~

Muddybulldog@mylemmy.win · 2 years ago

While the Lemmy UI doesn’t expose it, the data is available via the API. That’s how clients like Memmy are getting it.

Someology@lemmy.world · 2 years ago

Yep! I just saw this other post where I learned I was wrong. That’s what I get for just using Lemmy itself. https://lemmy.world/post/1250922?scrollToComments=true

Ciryamo@feddit.de · 2 years ago

The lack of karma also makes it worse. Usually if I saw a discussion that felt kinda off I’d check the accounts age and karma. Made it easier to sniff out bots.

really@lemmy.world · 2 years ago

The karma though is what drove Reddit adoption to an extent. Gamification helps. It helped Reddit, it helped robinhood stocks app.

Maybe fediverse needs some gamification.

Or maybe not. Facebook and YouTube seem to be doing fine just using the like/unlike button.

Wander@yiffit.net · 2 years ago

In case anyone’s wondering this is what we instance admins can see in the database. In this case it’s an obvious example, but this can be used to detect patterns of vote manipulation.

Toish@yiffit.net · 2 years ago

“Shill” is a rather on-the-nose choice for a name to iterate with haha

Evergreen5970@beehaw.org · edit-2 2 years ago

I appreciate it, good for demonstration and just tickles my funny bone for some reason. I will be delighted if this user gets to 100,000 upvotes—one for every possible iteration of shill#####.

thanks_shakey_snake@lemmy.ca · 2 years ago

Oh cool 👀 What’s the rest of that table? Is the actor_id one column in like… an upvotes table or something?

Wander@yiffit.net · 2 years ago

actor_id is just the full url of a user. It has the username at the end. That’s why I have censored it.
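A sketch of how an admin might look for that kind of pattern, assuming a list of voter `actor_id` URLs has already been pulled from the database. The function names and the example URLs in the note are made up.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def username_from_actor_id(actor_id):
    """Return the last path segment of an actor URL, i.e. the username."""
    return urlparse(actor_id).path.rstrip("/").rsplit("/", 1)[-1]


def numbered_name_clusters(actor_ids, min_cluster=3):
    """Count voters whose usernames differ only by a trailing number."""
    stems = Counter()
    for actor_id in actor_ids:
        match = re.fullmatch(r"(.*?)(\d+)", username_from_actor_id(actor_id))
        if match and match.group(1):  # skip names that are digits only
            stems[match.group(1)] += 1
    return {stem: n for stem, n in stems.items() if n >= min_cluster}
```

Fed voters such as `https://example.social/u/shill00001`, `.../shill00002` and so on, this would report the `shill` stem with its count. Anyone generating less obvious names would slip straight past it.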

popemichael@lemmy.world · 2 years ago

You can buy 700 votes anonymously on reddit for really cheap

I don’t see that it’s a big deal, really. It’s the same as it ever was.

Valmond@lemmy.ml · 2 years ago

Over a hundred dollars for 700 upvotes O_o

I wouldn’t exactly call that cheap 🤑

On the other hand, ten or twenty quick downvotes on an early answer could swing things I guess …

popemichael@lemmy.world · 2 years ago

For the companies who want a huge advantage over others, $100 is nothing in an advertising budget.

I have a small business and I do $1000 a week in advertising.

OtakuAltair@lemmy.world · edit-2 2 years ago

Yeah, 700 upvotes soon after a post is made could easily shoot it up to the top of even a popular sub for a few days (especially with the lack of mod tools rn), with others upvoting it purely because it already has a lot of upvotes.

Zana@startrek.website · 2 years ago

I don’t know anything about advertising but what are you doing that costs $1000 a week? I am legitimately curious.

OsrsNeedsF2P@lemmy.ml · edit-2 2 years ago

Advertising is incredibly expensive. I pay upwards of $1/click for one of my services targeting a specific group.

If you hate ads, use something like AdNauseam instead of uBlock Origin. You’ll cost companies hundreds of dollars a day.

Aran@livellosegreto.it · 2 years ago

deleted by creator

OsrsNeedsF2P@lemmy.ml · edit-2 2 years ago

Honestly, most of them :). If you’re reasonably wealthy (make above average wage), every ad you click will cost advertisers at least 25-50¢. The value of your clicks will go down a little depending on a few things, but anything on a website that serves its own ads instead of going through a 3rd party network (think Reddit ads) will stay in the 25-50¢ range, if not more

Aran@livellosegreto.it · 2 years ago

deleted by creator

Zana@startrek.website · 2 years ago

I do use AdNauseam, I love it!

popemichael@lemmy.world · 2 years ago

I run a digital currency investment group.

I can make 10-15k per day, so it’s not a lot in the grand scheme of things

sombrero@lemm.ee · 2 years ago

You have no idea about business expenses do you. I work in the events industry, corporations hold single evening events for their higher up employees for 10s of thousands in only technical expenses, before the venue asks for rent, or the catering etc. A single month of any basic service on the enterprise level starts from 5 grand.

fruitywelsh@lemmy.ml · 2 years ago

People are down voting you for responding to someone saying they don’t know and would like to know more with “you have no idea do you?”. Like yeah, they said so themselves.

elk_1337@lemmy.world · 2 years ago

People are downvoting because 1) the tone is unnecessary and 2) it doesn’t answer the question. Sure, huge businesses spend a lot of money. Over 95 percent of businesses have fewer than 100 employees though and depending on size and sector 1000 a week could be nothing or orders of magnitude larger than a small business’s advertising budget.

Zana@startrek.website · 2 years ago

You’re right, as I said, I don’t know. That was why I asked.

MeetInPotatoes@lemmy.ml · 2 years ago

You have no idea about business expenses do you.

Figure out punctuation first.

sombrero@lemm.ee · 2 years ago

super relevant, not everyone speaks english as a first language.

MeetInPotatoes@lemmy.ml · 2 years ago

Then those people should not try to insult others for their lack of knowledge about business while displaying a lack of proficiency in English.

14th_cylon@lemm.ee · edit-2 2 years ago

huge advantage over others, $100 is nothing in an advertising budget.

the only problem here is that 700 reddit upvotes is not “huge advantage over others”. i honestly fail to see how someone could pay $100 for that. i’d consider $10 too much.

or do you spend your $1000 budget on 7000 reddit upvotes? :D

why_rob_y@lemmy.world · 2 years ago

700 extra upvotes in the first couple hours on a medium sized hobby sub is an enormous amount and will give you great exposure to potentially tens of thousands of potential customers who won’t just ignore it like some banner ad (since they’ll think it’s real content).

AdmiralShat@programming.dev · 2 years ago

If you’re an indie dev marketing a game, it’s cheap as shit. Shoving your post into the faces of thousands would very easily get you more than that in sales.

Usernameblankface@lemmy.world · 2 years ago

To me, the draw of Lemmy is that it’s not the same as it ever was here. I don’t know the internet before ads, this place is great!

Random_user@lemmy.world · 2 years ago

Cause the problem, sell the solution. What a degenerate.

sparr@lemmy.world · 2 years ago

Web of trust is the solution. Show me vote totals that only count people I trust, 90% of people they trust, 81% of people they trust, etc. (0.9 multiplier should be configurable if possible!)
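A minimal sketch of that decaying count, assuming a simple `trusts` mapping from each user to the people they trust. The names, the `max_depth` cutoff, and the shortest-path rule are assumptions of this sketch.

```python
from collections import deque


def trust_weights(me, trusts, decay=0.9, max_depth=3):
    """Breadth-first walk of the trust graph.

    Returns {user: weight}: people I trust directly get 1.0, people they
    trust get `decay`, the next hop decay**2, and so on up to max_depth hops.
    Each user keeps the weight of the shortest path to them.
    """
    weights = {}
    queue = deque((friend, 1) for friend in trusts.get(me, ()))
    while queue:
        user, depth = queue.popleft()
        if user == me or user in weights:
            continue
        weights[user] = decay ** (depth - 1)
        if depth < max_depth:
            queue.extend((nxt, depth + 1) for nxt in trusts.get(user, ()))
    return weights


def trusted_score(votes, weights):
    """Sum +1/-1 votes, counting only voters in `weights` at their weight."""
    return sum(value * weights.get(voter, 0.0) for voter, value in votes.items())
```

Votes from anyone outside the walk count for nothing, so a freshly created bot farm has no effect on the total until someone inside the web starts trusting it.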

CanadianNomad@lemmy.world · edit-2 2 years ago

deleted by creator

OsrsNeedsF2P@lemmy.ml · 2 years ago

Fwiw, search engines need to figure out what is “reliable”. The original implementations were, well if BananaPie.com is referenced by 10% of the web, it must be super trustworthy! So people created huge networks of websites that all linked each other and a website they wanted to promote in order to gain reliability.

shagie@programming.dev · 2 years ago

Web of trust is the solution. Show me vote totals that only count people I trust, 90% of people they trust, 81% of people they trust, etc. (0.9 multiplier should be configurable if possible!)

If this was implemented on the server, that implies a significant amount of information about said web of trust to be stored by the server admins. Furthermore, it would imply that that trust web is also federated out as if you’re a first tier trust of mine, for me (or a server calculating on my behalf) to evaluate the value of the likes and dislikes it would need your web of trust too… and transitively out.

If this was implemented on the client, that means effectively revealing the origins of all the likes and dislikes on an object. Aside from the “this can be a lot of data to send over the wire whenever someone looks at an active post” it also means that you wouldn’t need to be a server admin to see that data.

Either way, this approach with ActivityPub being the underlying protocol, would entail privacy violations and opportunities for bullying that anything Threads (from other threads of concern) could do would pale in comparison to existing bad actors.

P.S. Don’t trust me because I trust JohnDoe and he works for marketing at BigCo and might be persuaded to list some of his paid clients highly.

sparr@lemmy.world · 2 years ago

It could be implemented on both the server and the client, with the client trusting the server most of the time and spot checking occasionally to keep the server honest.

The origins of upvotes and downvotes are already revealed on objects on Lemmy and most other fediverse platforms. However, this is not an absolute requirement; there are cryptographic solutions that allow verifying vote aggregation without identifying vote origins, but they are mathematically expensive.

shagie@programming.dev · 2 years ago

Given that Lemmy isn’t that popular yet, how big would that payload and computational cost be when considering the votes on the highly active threads of !fediverse@lemmy.world … 1.5k votes with 960 comments. Or the highly active https://lemmy.world/post/1033769 (3k votes, with 1081 comments) from earlier this week.

interdimensionalmeme@lemmy.ml · 2 years ago

It’s nothing. You don’t recompute everything for each page refresh. Your client pulls in the data, computes reputation totals over time and discards old raw data when your local cache is full.

Historical daily data gets packaged, compressed, and cross signed by multiple high reputation entities.

When there are doubts about a user’s history, your client drills down into those historical packages and reconstitutes their history to recalculate their reputation.

Whenever a client does that work, they publish the result and sign it with their private keys and that becomes a web of trust data point for the entire network.

Only clients and the network matter, servers are just untrustworthy temporary caches.

Opafi@feddit.de · 2 years ago

Any solution that only works because the platform is small and that doesn’t scale is a bad solution though.

interdimensionalmeme@lemmy.ml · 2 years ago

The client must compute all raw data. All individual moderation actions (vote, block, subscribe) would be public by default, with stealth optional.

Only user led moderation has a future, it all has to be transparent, public, client sided, optional and consensual

sugar_in_your_tea@sh.itjust.works · 2 years ago

That sounds a bit hyperbolic.

You can externalize the web of trust with a decentralized system, and then just link it to accounts at whatever service you’re using. You could use a browser extension, for example, that shows you whether you trust a commenter or poster.

That list wouldn’t get federated out, it could live in its own ecosystem, and update your local instance so it provides a separate list of votes for people in your web of trust. So only your admin (which could be you!) would know who you trust, and it would send two sets of vote totals to your client (or maybe three if you wanted to know how many votes it got from your instance alone).

So no, I don’t think it needs to be invasive at all.

shagie@programming.dev · 2 years ago

The single layer web of trust on the server wouldn’t be terribly difficult.

A single layer web of trust on a client would mean that the client is getting sufficient information about all the votes to be able to weight them. This means that instead of getting “+4 -1”, the client would get “shagie liked the object, JohnDoe liked the object, BadGuy liked the object, SomeoneElse liked it, and YetAnotherPerson disliked it.” That implies a lot more information being revealed to a client than many would be comfortable with.

Granted all of that is available if you federate with a system and poke in the database. It’s there. But this makes it really easy to get that information.

A transitive web of trust implies not only that you are getting those votes and considering that “shagie liked the object”, but also that the fact that you trust me, and the fact that I trust JohnDoe, are available to whatever is making that vote weighting calculation.
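To make the data requirement concrete, here is a rough Python sketch of a transitive vote weigher. The decay factor, the depth limit, and the function names are assumptions made up for illustration, not anything Lemmy implements. Note that the function needs both the per-user votes and everyone's trust list, which is exactly the information exposure described above.

```python
from collections import deque

def transitive_trust(graph, me, decay=0.5, max_depth=2):
    """Walk the trust graph breadth-first from `me`.

    graph maps a user to the users they directly trust.
    Direct trust gets weight 1.0; each further hop is multiplied by `decay`.
    (Both numbers are arbitrary, hypothetical choices.)
    """
    trust = {}
    seen = {me}
    queue = deque([(me, 1.0, 0)])
    while queue:
        user, weight, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                trust[friend] = weight
                queue.append((friend, weight * decay, depth + 1))
    return trust

def weighted_score(votes, trust):
    """votes maps a user to +1 or -1; untrusted users count as 0."""
    return sum(score * trust.get(user, 0.0) for user, score in votes.items())

# Hypothetical trust lists: I trust shagie, shagie trusts JohnDoe.
graph = {"me": ["shagie"], "shagie": ["JohnDoe"], "JohnDoe": ["PaidClient"]}
votes = {"shagie": 1, "JohnDoe": 1, "BadGuy": 1, "SomeoneElse": 1, "YetAnotherPerson": -1}

trust = transitive_trust(graph, "me")
raw = sum(votes.values())
weighted = weighted_score(votes, trust)
```

With these made-up inputs the raw total and the weighted total differ, because only shagie and JohnDoe carry any weight for this viewer.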

And while that single layer on the server isn’t too eyebrow raising, getting the transitive listing gets into the Facebook level of social graph building - but for all to see. I’m not sure that people would be comfortable with that degree of nakedness of personal information.

Consider also the data payload sizes. This post (rather mundane and not viral) has 243 comments. Some of them have over a hundred votes. How big of a payload do you want to get to send to the vote weigher (and back)?

Consider the load for… say… https://lemm.ee/post/843533

And for bad actors, all they have to do is cast a couple hundred votes on each comment (until they’re defederated and the database cleaned up by the admin) to DDOS the vote weigher.

sugar_in_your_tea@sh.itjust.works · 2 years ago

My point is you can have a mixed system. For example:

server stores list of “special interest” users (followed users, WoT, mods, etc)
server stores who voted for what (already does)
client updates the server’s list of “special interest” users with WoT data
when retrieving metadata about a post, you’d get:
- total votes
- votes from “special interest” users
- total votes from your instance

That’s not a ton of data, and the “special interest” users wouldn’t need to be synchronized to any other instance. The client would store the WoT data and update the server as needed (this way the server doesn’t need any transitive logic, the client handles it).
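A minimal Python sketch of the per-post metadata described above, assuming votes are available as (voter handle, score) pairs. The `Vote` type, the `summarize_votes` function, and the handles are hypothetical and do not reflect Lemmy's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vote:
    voter: str  # "name@instance" handle (assumed format)
    score: int  # +1 for an upvote, -1 for a downvote

def summarize_votes(votes, special_interest, home_instance):
    """Return the three totals a client would receive for one post."""
    def instance_of(voter):
        return voter.rsplit("@", 1)[-1]

    return {
        "total": sum(v.score for v in votes),
        "special_interest": sum(
            v.score for v in votes if v.voter in special_interest
        ),
        "home_instance": sum(
            v.score for v in votes if instance_of(v.voter) == home_instance
        ),
    }

votes = [
    Vote("shagie@programming.dev", 1),
    Vote("JohnDoe@example.social", 1),
    Vote("BadGuy@bots.example", 1),
    Vote("SomeoneElse@sh.itjust.works", 1),
    Vote("YetAnotherPerson@sh.itjust.works", -1),
]

# The "special interest" set stays on the user's own instance.
summary = summarize_votes(
    votes,
    special_interest={"shagie@programming.dev"},
    home_instance="sh.itjust.works",
)
```

The client only ever sees three integers per post, so the payload does not grow with the number of voters.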

Zeppo@sh.itjust.works · 2 years ago

Facebook and Twitter have always had their equivalent of upvotes be public.

SQL_InjectMe@partizle.com · 2 years ago

What if the web of trust is calculated with upvotes and downvotes? We already trust server admins to store those.

sugar_in_your_tea@sh.itjust.works · 2 years ago

I think that could work well. At the very least, I want the feature where I can see how many times I’ve upvoted/downvoted a given individual when they post.

That wouldn’t/shouldn’t give you transitive data imo, because voting for something doesn’t mean you trust them, just that the content is valuable (e.g. it could be a useful bot).

interdimensionalmeme@lemmy.ml · 2 years ago

Your client has to compute the raw data, not the server or else it will just be your server manipulating what you see and think.

nekat_emanresu@lemmy.ml · edit-2 2 years ago

Love that type of solution.

I’ve been thinking about an admin who votes on example posts to define the policy, then scoring users against it, then using the high scorers as user copies of the admin’s spirit of moderation, and then building systems that use that for automoderation.

e.g. I vote yes, no, yes. I then run a script that checks which of my users have voted on all three, and the ones with the highest matching votes by a threshold I define (say, they must match my votes 100%) get counted as “matching my spirit of moderation”. If a spirit-of-moderation user downvotes or reports, it can be auto-flagged into an admin console for me to rapidly review instead of sifting through user complaints. If things get critically spicy I can promote them to emergency mods, or automate their reports so that if a spirit user and a random user both report, the post gets auto-removed.
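The script could look roughly like the Python sketch below. It assumes the votes are available as plain dictionaries; the function names and the three triage outcomes are hypothetical labels for the steps described above.

```python
def spirit_matchers(admin_votes, user_votes):
    """Users who voted on every example post and matched the admin 100%.

    admin_votes: {post_id: True/False}
    user_votes:  {user: {post_id: True/False}}
    """
    return {
        user
        for user, votes in user_votes.items()
        if all(
            post in votes and votes[post] == verdict
            for post, verdict in admin_votes.items()
        )
    }

def triage_report(reporters, matchers):
    """Decide what to do with a post given the distinct users who reported it."""
    spirit = sum(1 for user in reporters if user in matchers)
    others = len(reporters) - spirit
    if spirit >= 1 and others >= 1:
        return "auto_remove"      # a spirit user and a random user both reported
    if spirit >= 1:
        return "flag_for_admin"   # goes to the admin console for quick review
    return "normal_queue"

# Example: the admin votes yes, no, yes on three example posts.
admin_votes = {"p1": True, "p2": False, "p3": True}
user_votes = {
    "ann": {"p1": True, "p2": False, "p3": True},   # full match
    "bob": {"p1": True, "p2": True, "p3": True},    # disagrees on p2
    "cy": {"p1": True, "p2": False},                # did not vote on p3
}
matchers = spirit_matchers(admin_votes, user_votes)
```

Three example posts is a very small sample, so in practice the example set would probably need to be larger before a 100% match means much.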

interdimensionalmeme@lemmy.ml · 2 years ago

For each vote, read user post content and vote history and age

This should happen in the client and be easily controllable by the user. The user should also be able to investigate why one particular post or comment was selected by the local content discovery algorithm, so that they can quickly find fraudulent accounts and block them.

And these public, user-led moderation actions then go on to inform the content discovery algorithms of other users, until we have consensus user-led content discovery and moderation.

And just like that we eliminate the need for shadowy humans of the moderator priesthood to play human spamfilter / human thought manipulator

rDrDr@lemmy.world · 2 years ago

This was a great feature of reddit enhancement suite.

czarrie@lemmy.world · 2 years ago

The nice thing about the federated universe is that, yes, you can bulk create user accounts on your own instance - and that server can then be defederated by other servers when it becomes obvious that it’s going to create problems.

It’s not a perfect fix and, as this post demonstrated, it is only really effective after a problem has been identified. At least for vote manipulation from other servers, a server could act: if it detects, say, that 99% of new upvotes are coming from a server created yesterday with 1 post, it could at least flag that for a human to review.
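As a rough illustration of that kind of check, here is a Python sketch. It assumes the receiving server knows which instance each new upvote came from and when it first saw that instance; the function name, the 99% share, and the two-day age limit are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

def suspicious_instances(vote_instances, first_seen, now,
                         share_threshold=0.99, max_age=timedelta(days=2)):
    """Flag instances that supply nearly all new upvotes and are very young.

    vote_instances: the originating instance of each new upvote on a post
    first_seen:     {instance: datetime this server first saw it}
    """
    counts = Counter(vote_instances)
    total = sum(counts.values())
    flagged = []
    for instance, n in counts.items():
        # An instance we have never seen before is treated as brand new.
        age = now - first_seen.get(instance, now)
        if n / total >= share_threshold and age <= max_age:
            flagged.append(instance)
    return flagged

now = datetime(2023, 7, 10, 12, 0)
first_seen = {
    "new.example": now - timedelta(days=1),     # created yesterday
    "lemmy.world": now - timedelta(days=400),
}
upvotes = ["new.example"] * 199 + ["lemmy.world"]

flagged = suspicious_instances(upvotes, first_seen, now)
```

The result is only a list for a human to review, not an automatic block.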

two_wheel2@lemm.ee · 2 years ago

It actually seems like an interesting problem to solve. Instance runners have the SQL database with all the voting records, so finding manipulative instances seems a bit like a machine learning problem to me.

Pleonasm@programming.dev · 2 years ago

There’s an XKCD for that: https://xkcd.com/810/

flux@lemmy.ml · 2 years ago

One other thing is that you can bulk create your own instances, and that’s a lot more effort to defederate. People could be creating those instances right now and just start using them after a year; at least they have incurred some costs during that…

I believe abuse management in openly federated systems (e.g. Lemmy, Mastodon, Matrix) is still an unsolved problem. I doubt good solutions will arrive before they become popular enough to attract commercial spammers.

AeroSoap@lemm.ee · edit-2 2 years ago

deleted by creator

Black_Gulaman@lemmy.dbzer0.com · 2 years ago

Then they will just distribute their bots equally to other legit servers, and by that, defederation is not a viable solution anymore.

Another problem is real human troll farms.

bdonvr@thelemmy.club · 2 years ago

If they can do that, they could’ve done it on a traditional site anyway

DigitalJacobin@lemmy.ml · 2 years ago

“Legit” instances are able to moderate/control the spam coming from their users.

7heo@lemmy.ml · edit-2 2 years ago

expired

nekat_emanresu@lemmy.ml · 2 years ago

Interesting idea.

TheGreatHerald@sh.itjust.works · edit-2 2 years ago

deleted by creator

7heo@lemmy.ml · 2 years ago

This could become a problem on posts only relevant on one server

Obviously, on the server the posts are from, you display the full vote count. There, the admins know the accounts, can vet them, etc.

kolorafa@lemmy.world · 2 years ago

This would rather be a way to detect bad actors (instances) and alert the admin, who can then kick them out of the federation. The same goes for other types of offences.

SQL_InjectMe@partizle.com · 2 years ago

Small instances are cheap, so we need a way to prevent 100 bot instances running on the same server from gaming this too

7heo@lemmy.ml · edit-2 2 years ago

expired

Skull giver@popplesburger.hilciferous.nl · 2 years ago

How would you prevent someone using wildcard domains from spamming servers the same way they can spam clients? The Fediverse has no way to distinguish between subdomains and normal domains. Anyone running an instance through classic DDNS would be affected by this.

The approach could work, but it would invalidate some major assumptions in the Fediverse itself. The algorithm would also need to make sure a few single user instances don’t get to sway entire servers.

YoBuckStopsHere@lemmy.world · 2 years ago

Reddit admins manipulated vote counts all the time.

auth@lemmy.ml · 2 years ago

Reddit also created fake users to post fake content… At least in the beginning of reddit.

misterundercoat@lemmy.world · 2 years ago

TIL “beginning of Reddit” comprises the time up to and including July 2023.

OtakuAltair@lemmy.world · 2 years ago

It marks both the beginning and end

Marxine@lemmy.ml · 2 years ago

Alpha and Omega if you may.

throwing_handles@lemmy.world · edit-2 2 years ago

deleted by creator

daw_germany@feddit.de · 2 years ago

Aaaand…?

Flashoflight@lemmy.world · 2 years ago

This is really important to call out. Also, though, the bots have gotten so good it would be hard to tell the difference. To be honest, I’m pretty sure reddit was teeming with them and it didn’t really bother me. lol

nekat_emanresu@lemmy.ml · 2 years ago

I have strong feelings about reddit being infested with bots too. And because reddit could, there’s no reason lemmy doesn’t have the same issue.

it didn’t really bother me

Bot armies could have hidden things from you that would bother you deeply, but because it’s hidden, you don’t have a chance to be bothered.

Robust Mirror@aussie.zone · 2 years ago

Ignorance is bliss?

fermuch@lemmy.ml · 2 years ago

Votes were just a number on reddit too… There was no magic behind them, and as Spez showed us multiple times: even reddit modified counts to make some posts tell something different.

And remember: reddit used to have a horde of bots just to become popular.

Everything on the internet is or can be fake!

Sean Tilley@lemmy.ml · 2 years ago

Honestly, thank you for demonstrating a clear limitation of how things currently work. Lemmy (and Kbin) probably should look into internal rate limiting on posts to avoid this.

I’m a bit naive on the subject, but perhaps there’s a way to detect “over x amount of votes from over x amount of users from this instance”? and basically invalidate them?

jochem@lemmy.ml · 2 years ago

How do you differentiate between a small instance where 10 votes would already be suspicious vs a large instance such as lemmy.world, where 10 would be normal?

I don’t think instances publish how many users they have and it’s not reliable anyway, since you can easily fudge those numbers.

Sean Tilley@lemmy.ml · 2 years ago

10 votes within a minute of each other is probably normal. 10 votes all at once, or within microseconds of each other, is statistically less likely to happen.

I won’t pretend to be an expert on the subject, but it seems like it’s mathematically possible to set some kind of threshold? If a set percent of users from an instance are all interacting microseconds from each other on one post locally, that ought to trigger a flag.
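A simple version of such a threshold could be sketched in Python as follows, assuming the server has a receive timestamp (in seconds) for each vote from one instance on one post. The window, the minimum vote count, and the 90% threshold are guesses for illustration, not tuned values.

```python
def burst_fraction(timestamps, window=0.001):
    """Fraction of consecutive vote gaps that are at most `window` seconds."""
    if len(timestamps) < 2:
        return 0.0
    ordered = sorted(timestamps)
    close = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a <= window)
    return close / (len(ordered) - 1)

def looks_like_burst(timestamps, window=0.001, min_votes=10, threshold=0.9):
    """True if there are enough votes and almost all arrive back to back."""
    return (
        len(timestamps) >= min_votes
        and burst_fraction(timestamps, window) >= threshold
    )

# Ten votes about 0.1 ms apart versus ten votes six seconds apart.
scripted = [100.0 + i * 0.0001 for i in range(10)]
organic = [100.0 + i * 6.0 for i in range(10)]

scripted_flag = looks_like_burst(scripted)
organic_flag = looks_like_burst(organic)
```

A check like this only catches the most naive scripts; it says nothing about votes that are deliberately spread out over time.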

Not all instances advertise their user counts accurately, but they’re nevertheless reflected through a NodeInfo endpoint.

CybranM@feddit.nu · 2 years ago

Surely the bot server can just set up a random delay between upvotes to circumvent that sort of detection

Andreas@feddit.dk · 2 years ago

Federated actions are never truly private, including votes. While it’s inevitable that some people will abuse the vote viewing function to harass people who downvoted them, public votes are useful to identify bot swarms manipulating discussions.

Wander@yiffit.net · 2 years ago

This. It’s only a matter of time until we can automatically detect vote manipulation. Furthermore, there’s a possibility that in future versions we can decrease the weight of votes coming from certain instances that might be suspicious.

hawkwind@lemmy.management · 2 years ago

And it’s only a matter of time until that detection can be evaded. The knife cuts both ways. Automation and the availability of internet resources make this back and forth inevitable and unending. The devs, instance admins and users that coalesce to make the “Lemmy” have to be dedicated to that. Everyone else will just kind of fade away as edge cases or die a slow death.

Skull giver@popplesburger.hilciferous.nl · 2 years ago

I’ve set the registration date on my account back 100 years just to show how easy it is to manipulate Lemmy when you run your own server. Don’t believe everything the internet tells you; that includes ChatGPT, the first page of Google’s search results, and the statistics Lemmy servers provide you with.

The problem isn’t that bad because defederating is always an option. To prevent subdomain attacks we need some kind of wildcard defederation system in Lemmy, but after that the whole voting ring issue is just a matter of detection.
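A wildcard blocklist check could be as simple as the Python sketch below. The `*.domain` pattern syntax and the `is_blocked` function are assumptions for illustration; Lemmy's actual blocklist format may differ.

```python
def is_blocked(host, blocked_patterns):
    """Check a hostname against exact entries and "*.domain" wildcards.

    A wildcard "*.spam.example" is taken to cover spam.example itself and
    every subdomain of it, but not lookalikes such as notspam.example.
    """
    host = host.lower().rstrip(".")
    for pattern in blocked_patterns:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            base = pattern[2:]
            if host == base or host.endswith("." + base):
                return True
        elif host == pattern:
            return True
    return False

blocklist = ["*.spam.example", "single.bad.example"]

checks = {
    "bot17.spam.example": is_blocked("bot17.spam.example", blocklist),
    "spam.example": is_blocked("spam.example", blocklist),
    "notspam.example": is_blocked("notspam.example", blocklist),
    "single.bad.example": is_blocked("single.bad.example", blocklist),
    "other.bad.example": is_blocked("other.bad.example", blocklist),
}
```

A wildcard over a shared dynamic DNS domain would also block every unrelated instance hosted under it, so patterns like this would need to be applied with care.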

gthutbwdy@lemmy.sdf.org · 2 years ago

I think people often forget that federation is not a new thing; it was the original design for internet communication services. Email, which predates the Internet, is also a federated network, and it is the most widely adopted of all modes of Internet communication. It also had spam issues, and there were many solutions for them.

The one I liked the most was hashcash, since it requires no trust. It’s one of the first proof-of-work systems and it was an inspiration for blockchains.
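For anyone unfamiliar with the idea, here is a simplified hashcash-style sketch in Python. It is not the real hashcash stamp format (which has date and version fields and uses SHA-1); the `resource:nonce` layout, the use of SHA-256, and the 12-bit difficulty are assumptions chosen to keep the example short.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest):
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def mint(resource, bits=12):
    """Search for a nonce so the stamp's hash starts with `bits` zero bits."""
    for nonce in count():
        stamp = f"{resource}:{nonce}"
        if leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits:
            return stamp

def verify(stamp, resource, bits=12):
    """Checking a stamp costs one hash, however long minting took."""
    digest = hashlib.sha256(stamp.encode()).digest()
    return stamp.startswith(resource + ":") and leading_zero_bits(digest) >= bits

# Hypothetical resource string tying the work to one vote by one user.
resource = "vote:post/1033769:alice@example.social"
stamp = mint(resource)
ok = verify(stamp, resource)
```

Minting takes about 2^bits hash attempts on average while verifying takes one, so each vote or signup would cost the sender some CPU time without anyone having to trust anyone else.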

mintyfrog@lemmy.ml · 2 years ago

PSA: internet votes are based on a biased sample of users of that site and bots