nofollow on the StatusNet Cloud

Evan Prodromou's picture

Today we rolled out code on the StatusNet Cloud to set the "nofollow" relationship for certain links on public sites. I wanted to take a few minutes to describe what this will do, why we did it, and what users can expect.
 
Many search engines and other automated Web software use the incoming links to a Web page from other parts of the Web as a way to "rank" the page. High-rank pages are shown on search results higher than low-rank pages. Google's PageRank is the canonical example of this kind of system.
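
As a rough illustration of link-based ranking, here is a toy version of the iterative calculation usually described for PageRank. It is a sketch under stated assumptions, not Google's actual system: the page names, the `pagerank` function, and the damping value of 0.85 are all chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy link-based ranking. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini-Web: three pages link to popular.example, one to obscure.example.
links = {
    "a.example": ["popular.example"],
    "b.example": ["popular.example"],
    "c.example": ["popular.example", "obscure.example"],
    "popular.example": [],
    "obscure.example": [],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

The page with more incoming links ends up with the higher score, which is exactly what makes links worth abusing.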
 
These algorithms date from ancient times on the Web, when HTML was hand-coded and content was carefully screened by the publisher. In the age of user-generated content, people use Web-based communications systems to share links with each other without the consent or approval of the site publisher: on wikis, in blog comments, and on social messaging systems like StatusNet. Although the vast majority of these links are well-intentioned, some people abuse these content systems to get more PageRank for their sites.
 
Search engines haven't kept up with this change in the Web; instead, they've put the onus on service providers to keep their search results accurate. nofollow is special markup that Web site operators can add to their HTML output to say that a link is user-generated and hasn't been screened or validated by the publisher. Algorithms like PageRank will skip links marked with rel="nofollow".
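
In HTML terms, nofollow is a value of the `rel` attribute on a link. The snippet below is a minimal sketch of how a site might emit it for user-generated links only. The `render_link` helper and its `user_generated` flag are hypothetical names for illustration; this is not StatusNet's actual code, which is written in PHP.

```python
from html import escape


def render_link(url, text, user_generated):
    """Build an <a> tag, marking user-generated links with rel="nofollow"."""
    rel = ' rel="nofollow"' if user_generated else ""
    return '<a href="{}"{}>{}</a>'.format(escape(url, quote=True), rel, escape(text))


# A link the publisher has not screened:
print(render_link("http://example.com/", "my site", user_generated=True))
# <a href="http://example.com/" rel="nofollow">my site</a>

# A link the publisher stands behind:
print(render_link("http://example.com/", "my site", user_generated=False))
# <a href="http://example.com/">my site</a>
```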
 
With some systems, this is relatively easy. On a blog, for example, comment links may be "nofollowed" but the blog author's links are not. On some wikis, internal links and links to trusted sites are left alone, but all other links to external sites have the nofollow attribute. On a microblogging system like StatusNet, this is considerably harder. Who is responsible for your personal inbox? For a tag page? For a personal profile? Everything in a microblogging site is complex and intertwined.
 
In the changes we released today (which will be an optional part of the upcoming 0.9.2 version of StatusNet), we've tried to make reasonable compromises that discourage abuse of the system without unnecessarily disconnecting StatusNet sites from the rest of the Web. Our guideline was that users and groups could and should share links out to other sites, but that no one should be able to elect themselves to get "Google juice" from anyone else. So, these are the kinds of links we've added the "nofollow" relationship to:

  • Subscribers. If I subscribe to your stream, that means I find you and your stream interesting. If you subscribe to my stream, however, I have no idea who you are.
  • Group members. Joining a group doesn't mean that the group has chosen you.
  • People tags. People who tag themselves "ubuntu" or "php" don't necessarily deserve the votes of thousands of other people with the same tags.
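
The rule in the list above can be sketched as a small lookup: links shown in the three listed contexts get nofollow, and everything else is left alone. The context names and the `rel_for_link` function are assumptions made up for this example, not identifiers from the StatusNet source.

```python
# Hypothetical labels for the three kinds of lists named above.
NOFOLLOW_CONTEXTS = {"subscribers", "group_members", "people_tags"}


def rel_for_link(context):
    """Return the rel value for a profile link shown in the given context.

    Links in contexts a user can add themselves to without anyone's
    consent are marked nofollow; all other links are left untouched.
    """
    return "nofollow" if context in NOFOLLOW_CONTEXTS else None


# People who subscribe to me: I did not choose them.
print(rel_for_link("subscribers"))
# People I subscribe to: I chose them, so the link stays followed.
print(rel_for_link("subscriptions"))
```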

 
We will continue fine-tuning our HTML output to strike a fair balance between Web presence and abuse discouragement. Any feedback or suggestions are very welcome.

Comments

Finetuning

What about more fine-tuning? Maybe all external links are nofollow. And if the same external link is linked more than x times by more than x different users with more than x followers, it would be a trusted source and get a follow link. Every user could understand this solution, but right now it's confusing in my eyes.

Beginning of the end

This is a knee-jerk reaction and is the beginning of the end. When you start implementing "Gestapo" methods, people move elsewhere.
So what about the person who legitimately uses your network, but also for some link juice? Are they a spammer? No.
Although I can empathize somewhat with your rationale, you have stepped into the realm of restrictive dictatorship, and it inevitably spells the beginning of the end for your site and the beginning for another site with fewer restrictions.
Disagree all you want, but it's just the nature of the beast.

You had a good run.

It's not, in fact, a knee jerk reaction

We're very interested in having users with all kinds of usage profiles, and we especially think that commercial use of StatusNet is an excellent case that we want to support.
It is a mistake to think that all links on StatusNet are now nofollow; in fact, it's a tiny minority of links. The ones that we've chosen to nofollow are those where a user can choose to extract page rank from another user without consent. For example, if you subscribe to my feed, that's not in any way an endorsement by me of you. So the linkback under subscribers should be nofollow.
Companies that want to use Identi.ca and other status.net sites for SEO can still do so. But they're going to have to earn their links by getting followers and making valuable content contributions -- not just by following high-PR users or joining high-PR groups. That's the nature of the beast.
I heartily encourage anyone who wants to set up a "dofollow" network to take our source code and install a network. It should take about 5 minutes to get going.

Re: Knee Jerk

That's cool, but you just invited the bots, just like on Facebook and Twitter. You would have been better off leaving it as is.

Bad move

you just lost users.

On the contrary

Our traffic is up significantly since we made this move.

No follow to stop spammers-maybe try this instead

I don't know much, but it seems to stand to reason that there are some unintentional spammers out there: real people who have been misled into thinking that the way to get THEIR own personal website ranked high is to have a lot of backlinks on other sites, including this one. So if you turn on nofollow for these hard-working people, all their posting is in vain, right? If you use nofollow, half the people who post won't even use your service. The ones who are submitting recycled, re-spun articles are NOT trying to get their "voice" heard; they want ranking! They are using auto-spinning software, kind of like a bot. So, instead of nofollow, have them use a CAPTCHA before posting. Or have a CAPTCHA pop up for them after 3 posts in one day. The same thing you are having me do right now to post this comment LOL!

i didn't like the nofollow it

I didn't like the nofollow. It is not necessarily true that all users are spammers.

Geia sou Evan! I think that

Geia sou Evan! I think that implementation of nofollow is not good idea. DoFollow links in user generated content in Identica, for example, are one of the most important advances of Identica over Twitter for many members.

Since outbound dofollow links can't do any damage to your website, you should let your users post dofollow links. It is also OK to reward loyal members with some link juice.

With nofollow StatusNet will not be so Free and so Open...

Being a new user doesn't

Being a new user doesn't necessarily mean that you intend to spam the site. I am not for automated messages or bots. I also don't believe that we should use automated spam-fighting programs to moderate comments.

I have had the experience of really wanting to comment on a post I came across while surfing the web, only to have my comment deleted by those programs.

I still think it is better to manually screen post comments. It can be time-consuming, but it is to ensure no spam will really get into your blog, and you will still be able to have sensible comments on your posts.

Nofollow = Nothing changes

I'm not a friend of nofollow, it may stop the linkjuice flowing, but it doesn't stop the negative experience of users from mass spam messages. It doesn't change or improve anything.

Spammers don't care for follow or nofollow, they start the bots and hope people click their spam. Twitter is all nofollow and full of spambots.

I have a blog that had follow comments; Akismet caught over 50 spams daily. I switched to nofollow half a year ago because I was told it would reduce the problem. The number of daily spams increased to about 120 a day... so much for nofollow doing any good.

So, instead of all this link attribute discussion, we should talk about how to prevent spam effectively instead of fighting windmills.

Pretty much agree, but ...

This sounds like a decent implementation of nofollow. It's not an indiscriminate disconnect of everything every status.net user contributes, but it does kind of force profiles to have their own authority to a greater degree. Blog comment spam doesn't care about nofollow, Twitter spam doesn't care about nofollow, so it would be naive and would undermine one of Status.net's strengths if they were trying to put that as a blanket rule.

From running a small SN site, I'd like something that silences a user from all public broadcast without ever really letting them know what happened. I haven't really looked into it though, so I'm not sure what is currently possible along those lines. Deleting spam every day is a frustrating job, but I guess I'm just getting used to it.

Completely disagree.

For a user to find your site and start posting does NOT mean he or she is a spammer. It could be that you have spent perhaps far too long on the internet and are unaware of JUST how clicky it is.

Perhaps it might sometimes sound strange, but content can be posted by a newbie and actually be decent.

Please Apply It Broadly, Not Narrowly

I think nofollow should be applied to all user-generated content (possibly excepting your paid customers' statusnet sites). Here's why: http://file.status.net/lnxwalt/lnxwalt-20100406T011931-hphfbsr.png

That's a screen grab from the front page of 1000.status.net. There are almost no real users, but dozens of spammers (automated?) have hours-long posting sprees. Since no site or project should be dependent upon microblog postings for search rankings, it is reasonable to prevent the misuse of cloud sites (and other hosted sites) in this way.

Search engines would still be able to index full-text content, so they'll pick up individual posts and any general "buzz" around a site or topic, but it will make it harder for abusers to profit from their misdeeds.

Thanks for the pointer

I've just asked our support staff to get all the same anti-spam tools we're using for identi.ca onto our public community sites like 1000.status.net.
 

nofollow is not a solution

I think broad implementation of nofollow links is not a solution. I mean, have a look at Twitter. Nobody knows how many real users there are, but I guess that saying half of them are bots is a very optimistic judgement. To reduce spam there has to be another solution, e.g. security improvements (at least this would definitely work for automated spamming).

Different Purposes Need Different Countermeasures

Spammers do their deeds for different purposes. Some are after a higher search ranking (SEO). Those are the ones that nofollow is effective against. If broadly and consistently applied, it will eventually stop nearly all SEO spammers on a site.

Others are trying to bring direct traffic, so their ads (or malware) can appear on more viewers' computers. For these, nofollow is not effective. Also, since these individuals' usage profile closely matches that of legitimate users who use links to advertise their sites' content, there aren't really preventive measures, so they require remedial measures. These are the users where "flag & bag" can be effective, if and only if there are also measures that prevent a bagged user from quickly establishing another account.

A lot of this second type of spam happens when someone realizes that they don't have much value to offer, so they seek shortcuts around people's defenses. Many of them are so obviously low-quality that they cannot get into search rankings at all. Or maybe they are a low-budget operation and therefore go for a manual approach.

Still others are automated and may not ever know whether nofollow is enabled or not. For such users, a CAPTCHA may deter many of them. But having a real person do the sign-up and then letting the bot do the subsequent posting isn't too hard for spammers to implement. With automation, the goal is to open a lot of accounts or spread a lot of spam in a short time period. These can be in either of the above groups.

At the core is the perception of a high return for very little effort. What spam countermeasures should do is raise the apparent effort level needed for that return without hindering regular users so much that they also leave. The Yahoo chat rooms (and IM) are instructive. As the awareness spread of an easy way to spread one's promotional message to a global group of users, the number of spambots soared. Eventually, a lot of people stopped using Y!chat.

By the way, Twitter, with its spam problems, is nonetheless quick to close accounts for "strange activities" like a rapid increase in subscriptions in a new account. Frequently, I'll be notified of a new follower, but before I can check on that user, they have closed the account. It isn't fair to compare the two (Identica/StatusNet and Twitter), because I think they have different goals.

This short course in spamming explains why a layered approach is necessary, and why "measured response" doesn't work.

Exactly!

Exactly!

New users

How about adding nofollow to all links posted by a new user, until she/he has been a member for X weeks?

Heuristics are hard

It's hard to tell who's a good person and who's a bad person. That's not really our job.
We may build more heuristics into our system in the future, but it's not on the agenda right now.

I just mean that a new user

I just mean that a new user is more likely a spammer than an old user.

Our open responsibility

Different communities have different priorities. Private StatusNet sites don't really give a hoot about allowing in new users, so they'll be as restrictive as they can be. Other more targeted communities might have open registration but very restrictive participation.
As the flagship site on the OStatus network, and as the main social network for the Free and Open Source software/open standards/open web/open content communities, I think Identi.ca has a responsibility to be as open as possible. But no opener!
We do not yet have the peripheral device that will look into the heart of a person and determine if their intentions are pure or wicked. We can and will develop heuristic systems to try and detect problem users before they become abusive, and restrict their usage of the software accordingly. These systems will depend on the user's social network, their posting patterns, their time on the site, the content of their posts, and their network location.
But heavy-handed applications of inaccurate "common sense" rules will be counterproductive. These rules will need to be tuned to make sure they're predictive of bad behavior. That's not easy; SpamAssassin uses neural-network algorithms to weight its different rules.
So: thanks for the idea. We'll definitely include length of registration in any heuristics we use in the future.
 

I think it's a better idea to

I think it's a better idea to apply nofollow just to new users ... for example, after 1 month they would get their dofollow feature :)
