Spam

From StatusNet

What is spam?

A rule of thumb: an account may be spam if it shows several of these signs:

  • It generates a lot of links
  • Its links point to topics typically found in spam (promoting drugs, buying products, home loans)
  • It has no significant social connections (10 or fewer followers)
  • Most of its subscribers are also spammy accounts
  • It does not participate in conversations
  • It uses terms like "SEO" or "Web marketing"
  • It posts off-topic notices to groups or tags
  • It is a member of 25 or more groups

It can be difficult to clearly define what is or isn't "spam" in an open microblogging site. We often "know it when we see it" when there are a bunch of shady-looking links, but it's very hard to programmatically distinguish a spambot spewing unwanted links from a legitimate tool bot like bugs@status.net.
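As a rough illustration only, the rule of thumb above could be turned into a checklist score. Everything here is an assumption made for the sketch: the `Account` fields, the 80% link ratio and the threshold of three signals are hypothetical and not part of StatusNet. Only the "10 or fewer followers" and "25 or more groups" limits come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical fields for this sketch; StatusNet does not expose this structure.
    followers: int
    groups_joined: int
    notices: int
    notices_with_links: int
    replies: int
    bio: str = ""


SPAMMY_TERMS = ("seo", "web marketing")


def spam_signals(acct: Account) -> list:
    """Return the rule-of-thumb signs this account shows."""
    signals = []
    # Assumed cut-off: more than 80% of notices contain a link.
    if acct.notices and acct.notices_with_links / acct.notices > 0.8:
        signals.append("mostly links")
    if acct.followers <= 10:
        signals.append("few followers")
    if acct.notices and acct.replies == 0:
        signals.append("no conversations")
    if acct.groups_joined >= 25:
        signals.append("many groups")
    if any(term in acct.bio.lower() for term in SPAMMY_TERMS):
        signals.append("marketing terms")
    return signals


def looks_spammy(acct: Account, threshold: int = 3) -> bool:
    """Flag the account for human review when enough signs add up."""
    return len(spam_signals(acct)) >= threshold


spambot = Account(followers=2, groups_joined=40, notices=50,
                  notices_with_links=50, replies=0, bio="SEO expert")
toolbot = Account(followers=500, groups_joined=1, notices=100,
                  notices_with_links=100, replies=0,
                  bio="Posts bug tracker updates")
print(looks_spammy(spambot), spam_signals(toolbot))
```

Note that the tool bot still trips two signs ("mostly links" and "no conversations"), which is exactly the difficulty described above: a score like this can only suggest accounts for a human to look at.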

Jordan Conway (formerly StatusNet staff) once defined identi.ca's stance on spam: "...if your timeline consists solely of "how to make money" or the same links to scammy looking autopost blogs covered in ads…" we consider this spam. "Identi.ca is not a self promotion tool, it is a community site, moderated by the community. If the community has decided you should be banned 99% of the time they're right."

What to do after seeing a spammer?

  1. Use the block button. This removes a subscriber from your subscription list and flags them, too. But remember: blocking won't prevent you from seeing older posts, just future ones.
  2. Reporting is better than flagging, but is unsupported by StatusNet.
  3. Fighting spam is a community thing. If you are ready to help, join !spamreport and report spammers to the group: "@support !spamreport spammer's name #UserID - what sort of spam". Please DO NOT link to the spammer's account when reporting (don't use @): you're helping them with extra high-PR links! Note that StatusNet support people have asked repeatedly that people not use this method. Go ahead and do it if you want, but we really don't like it, and absolutely don't expect support for it.
  4. Seeking identica ModHelpers


listening to !spamreport:


Marjoleink has written a comprehensive Spam Report Package.
Bavatar has written a Python script to ease reporting; a howto and documentation for the script are available.
identicurse has an option to /nuke spammers, though it does @mention the spammer.

  • At the moment some spammers are silenced and blacklisted.
  • But most of the spammers are not deleted.
    • Support somewhere [where?] asked people to collect spammer accounts so that patterns can be found, which should make spammers easier to detect and delete.
    • "deleting users and notices from identi.ca causes big lag between master and slave DB servers. So we don't do it." @evan, 2011-12-06
    • I'm working on some code to hide them from the Web interface even if they're in the database still. @evan, 2011-12-06



  • problem: group member blocking is limited to the group administrator(s) and site administrators
  • With User flag plugin (as on identi.ca), you can mark any profile on the site as 'flagged', which puts it on a list for site moderators to check out and if needed, block/delete.
    • problem: general perception is that this isn't very effective; there's not good feedback on the process, and it's hard to tell if anybody's watching the flag list.
      • How can we make this more transparent?
        • Could there be a "counter" showing how many people have already flagged the profile in question? Then, when a certain number of users have flagged the profile (I've no idea how many would be needed: 10, 20, 100?), the account in question is automatically sandboxed until @support looks at it. This way it's immediately out of sight...
      • What does StatusNet need to do to make sure it runs smoothly?
  • Site moderators can mark particular accounts such that they won't appear in the public timeline.
    • problem: as above; moderator scaling issues...
  • Site moderators delete problem accounts.
    • problem: as above; moderator scaling issues...
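The flag counter idea above could look roughly like this. It is a minimal sketch under stated assumptions: `FlagTracker` and its methods are hypothetical names, not part of the UserFlag plugin, and the default threshold of 10 is just one of the guesses mentioned in the list. Flags are counted per distinct flagger, so one user flagging repeatedly cannot sandbox an account alone.

```python
from collections import defaultdict


class FlagTracker:
    """Hypothetical in-memory flag counter with automatic sandboxing."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        # profile -> set of users who flagged it (duplicates are ignored)
        self._flaggers = defaultdict(set)
        self.sandboxed = set()

    def flag(self, flagger, profile):
        """Record a flag and return the number of distinct flaggers."""
        self._flaggers[profile].add(flagger)
        count = len(self._flaggers[profile])
        if count >= self.threshold:
            # Out of sight until a moderator reviews it.
            self.sandboxed.add(profile)
        return count

    def review(self, profile, is_spam):
        """Moderator decision: clear the flags, release the account if legit."""
        self._flaggers.pop(profile, None)
        if not is_spam:
            self.sandboxed.discard(profile)


tracker = FlagTracker(threshold=3)
tracker.flag("alice", "cheap-pills")
tracker.flag("alice", "cheap-pills")   # same flagger, still counts once
tracker.flag("bob", "cheap-pills")
tracker.flag("carol", "cheap-pills")   # third distinct flagger: sandboxed
```

Returning the count from `flag()` also gives the feedback people are asking for: the interface could show "flagged by 3 users" next to the profile.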

Where do we want to avoid spam?


The public timeline

  • Community sites like identi.ca have a public timeline showing the latest posts from local users on the site. Generally we like that to represent the people in the community, and it should look nice! Nobody likes to see a bunch of viagra ads in their community's banner page.
    • problem 1: unwanted messages annoying everybody
    • problem 2: spammer's links are repeated in more places


Your home timeline

  • @-replies
    • Other user accounts on the same site, and folks from other sites if OStatus federation is set up, can send messages to you which will appear in your home timeline.
      • problem 1: unwanted messages in your timeline
      • problem 2: spammer's links are repeated in more places
  • Groups
    • When someone joins a group, they can post messages that will be received by everyone else in the group in their home timeline. Currently, StatusNet groups are open to join to anybody on the site (and potentially to people on other sites, if it's open-access and set up with OStatus!) You could end up with unwanted messages in your timeline through the group.
      • problem 1: unwanted notification in your email inbox
      • problem 2: unwanted messages in your timeline
      • problem 3: spammer's links are repeated in more places


Your subscriber lists

  • Any user on the site -- or remotely if OStatus is set up -- can subscribe to your public feed, and will be linked in your subscriber list page.
    • problem 1: unwanted notification in your email inbox
    • problem 2: unwanted noise in your subscriber list
    • problem 3: spammer's links are repeated in more places


Search results

  • A spammer's user profile and messages may appear in searches, clogging up results and making it harder to find legit people and notices.
    • problem 1: unwanted noise in search results
    • problem 2: spammer's links are repeated in more places


Hashtags

  • A spammer's messages may appear in hashtags, clogging up results and making it harder to find legit notes.


Groups

  • A spammer's messages are still visible in the group after the spammer has been blocked; they should not be.

How do we try to stop spammers before they start?

  • Registration captcha reduces bot-based spam account registrations
    • problem: doesn't stop everything
    • problem: is an additional annoyance for legit users (but one we're gonna have to put up with for now, sorry!)
  • Keyword/site blacklists on links, profile pages
    • problem: only works for known repeat offenders; slight changes in keywords/URLs can avoid blacklist
    • problem: blacklists can have 'collateral damage' to legit users even when the keyword looks very spammy
  • Partially closed communities
    • Smaller portions of communities might prefer to manage group membership or require 2-way relationship approval before group or @-replies can be sent.
      • problem: that doesn't help for deliberately-open communities
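The two blacklist problems listed above are easy to show in a few lines. This is an illustrative sketch, not how the StatusNet blacklist works: the word list and both function names are made up for the example.

```python
import re

# Hypothetical blacklist for the example.
BLACKLIST = {"viagra", "cialis"}


def substring_match(text):
    """Naive check: any blacklisted word anywhere in the text."""
    lowered = text.lower()
    return any(word in lowered for word in BLACKLIST)


def whole_word_match(text):
    """Stricter check: only whole words count."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return any(word in BLACKLIST for word in words)


# Collateral damage: "specialist" contains "cialis".
print(substring_match("Ask a specialist"))    # True
print(whole_word_match("Ask a specialist"))   # False

# Evasion: a one-character change slips past both checks.
print(substring_match("cheap v1agra"))        # False
print(whole_word_match("cheap v1agra"))       # False
```

Matching whole words reduces the collateral damage, but neither variant helps against slightly altered spellings.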

antispam-bot

Okay, we need an antispam-bot. Things to feed it with:


Also have a look at Spam/Reports.

Further ideas for fighting spam

  • Every time someone enters a URL in their profile, the URL should be checked against the FLOSS TypePad AntiSpam db.
  • Time limit: If an account subscribes to N accounts in M minutes, it's probably spam and should either be auto-blocked or checked out by StatusNet staff or a trusted volunteer with the power to block spammers.
  • Use Project Honeypot.
  • Spambots like to hang out together and subscribe to each other.
  • Third-party services. There are Mollom and BlogSpam plugins. We need plugins for Akismet and Defensio, and we need to make sure that those services know we'll be checking with them.
  • User reporting. "This is spam". Probably our first line of defense; data from here can help feed automated systems below. (available UserFlag plugin 0.9.0)
    • This can be connected with Throttles, so that when more than !SomeLargeNumber of people flag an account as spam, it is automagically throttled down to !SomeSmallNumber of posts/day, and so on; when an account wants to lift the throttle, they need to contact site moderators (this way there is actually less work for moderators!)
  • Captchas. These keep bots from doing things only people should do. I think the reCaptcha plugin is great for registration. Not sure how it would work for posting.
    • Captchas are bad! Please try almost everything else before resorting to this. --Forteller 21:06, 29 December 2009 (UTC)
  • Throttles. These keep the same account from posting too often. This is already in place, but I'm not sure how accurate it is.
  • IP lookups. We should try to prevent posting from known botnets or open proxies. We may want to keep our own IP block list.
  • Bayesian filters. Checking words, author, context, that kind of thing. Seems to be pretty effective.
  • Keyword filters. More direct: you can't say "viagra" on this system.
    • This is even worse than Captchas. Please no censorship --Forteller 21:06, 29 December 2009 (UTC)
  • Bad behaviour. Sniffs HTTP messages for tell-tale signs of poorly-programmed Web tools. Not sure it's going to be effective for StatusNet; poorly-programmed Web tools are our major interface.
  • Invisible field on signup. If one adds an invisible field on the registration page, one can easily stop spam-bots. Bots will most likely fill in all fields, but humans will not see the invisible field and thus not fill it out. Do not accept any registration where there's anything in the invisible field.
  • Invite-only groups. Or moderator approval to join.
  • Desktop mod app. A cross-platform standalone installed app for the moderators to query things like common intro phrases, or all accounts with the same home page URL, etc., including a mass deletion mode.
  • no group *creation* until you have ># notes
  • no group posting until you have ># notes
  • Report button. Alongside "message", "redent" and "reply", add a "report" button which copies the user name from that dent into the text input area and allows a reason to be added. When the reporting user sends it, it looks up the ID info of that spammer. If the spammer is already reported, it doesn't submit, but returns a message to the reporter: "spammer already reported, thanks". If the spammer is not already reported, it includes the ID info and reason and sends them to @support with the !sr group. You could also have a reported spammer auto-hidden from the public timeline until support deals with it, or at least auto-hidden from the reporting user's timeline, regardless of whether or not it was already reported. A button is much easier to use than a script or manually finding the info.
  • Leaderboard table. Everyone loves a leaderboard, right? Why not have a monthly table of who reports the most spammers? It'd tie in with the idea of the button, so only the first to report is counted; everyone else gets the "spammer reported, thanks" message. Each time someone reports a spammer, it's added to the leaderboard; if support rejects the report, it comes off the leaderboard. You could even get sponsors who appreciate a spam-free Identi.ca to sponsor it, with prizes like Amazon vouchers. It'd also be a way of crediting and publicly thanking those users who do go to the effort of reporting spammers.
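Two of the ideas in the list above, the "Time limit" check (N subscriptions in M minutes) and the invisible signup field, are simple enough to sketch. This is only an illustration under assumptions: the class, the function, the field name `website_confirm` and the default values for N and M are all hypothetical, and timestamps are assumed to arrive in non-decreasing order as plain seconds.

```python
from collections import defaultdict, deque


class SubscriptionRateMonitor:
    """Flags accounts that subscribe to N accounts within M seconds."""

    def __init__(self, max_subs=20, window_seconds=600):
        self.max_subs = max_subs        # N (assumed default)
        self.window = window_seconds    # M, in seconds (assumed default)
        self._events = defaultdict(deque)

    def record(self, account, timestamp):
        """Record one subscription; return True if the account needs review."""
        events = self._events[account]
        events.append(timestamp)
        # Drop subscriptions that have fallen out of the time window.
        while events and timestamp - events[0] > self.window:
            events.popleft()
        return len(events) >= self.max_subs


def honeypot_tripped(form, field="website_confirm"):
    """True if the invisible field was filled in, which only a bot should do."""
    return bool(form.get(field, "").strip())


monitor = SubscriptionRateMonitor(max_subs=3, window_seconds=60)
burst = [monitor.record("bot", t) for t in (0, 10, 20)]
slow = [monitor.record("human", t) for t in (0, 100, 200)]
print(burst, slow)

print(honeypot_tripped({"nickname": "alice", "website_confirm": ""}))
print(honeypot_tripped({"nickname": "bot", "website_confirm": "http://example.com"}))
```

A `True` from `record()` is meant as "send this to a moderator or trusted volunteer", matching the suggestion above, rather than as an automatic block.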