Plugin:TwitterBridge


Version: self::VERSION
by: Zach Copley, Julien C
source: http://gitorious.org/statusnet/mainline/trees/master/plugins/TwitterBridge
description: The Twitter "bridge" plugin allows integration of a StatusNet instance with Twitter


Bridge model

Unlike OStatus federation, we have no way to inform the Twitter service that one of our users wants to follow one of their users, and their service provides no way for one of their users to directly follow one of our users.

As a result, we have to work with a bridge model rather than a federation model. You create separate accounts on the StatusNet and Twitter sides, and the bridge sends messages back and forth between the two sides. Other Twitter users follow your Twitter account, and other StatusNet users follow your StatusNet account.

The bridge is designed with the StatusNet account as primary:

  • your posts on StatusNet -> reposted on Twitter for your Twitter followers to see
  • posts from your Twitter followees -> reposted on your StatusNet friends timeline for you to see (if enabled)

This has some annoying limitations, such as conversations and replies sometimes being split between the services, but in many respects it works reasonably well... as long as we can actually get messages moving in real time.

Export to Twitter

Exporting basic messages over the bridge to your Twitter account has been working pretty well for quite some time.

State as of 0.9.5

In 0.9.5, some new features were added that send additional metadata over the bridge when you manipulate a notice known to have been bridged to or from Twitter:

  • repeating a notice will use Twitter's retweet API instead of posting an 'RT @blah' status update
  • faving/unfaving a notice on StatusNet will duplicate the favorite state on the Twitter side
  • deleting your own notice will delete its copy on Twitter
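
In API terms, each of those actions maps onto a single call against the Twitter REST API of the time. The sketch below is illustrative only; postToTwitter() is a hypothetical helper standing in for the plugin's actual OAuth client:

  <?php
  // Illustrative mapping of the bridged meta-actions onto Twitter REST API calls.
  // postToTwitter() is a hypothetical helper that signs and sends a POST request
  // with the user's stored OAuth credentials; it is not the plugin's real client.

  function mirrorRepeat($oauth, $twitterStatusId)
  {
      // Use the native retweet endpoint rather than posting "RT @blah ...".
      return postToTwitter($oauth, "statuses/retweet/{$twitterStatusId}.json", array());
  }

  function mirrorFave($oauth, $twitterStatusId, $faved)
  {
      $endpoint = $faved ? "favorites/create/{$twitterStatusId}.json"
                         : "favorites/destroy/{$twitterStatusId}.json";
      return postToTwitter($oauth, $endpoint, array());
  }

  function mirrorDelete($oauth, $twitterStatusId)
  {
      // Remove the bridged copy of one of our own deleted notices.
      return postToTwitter($oauth, "statuses/destroy/{$twitterStatusId}.json", array());
  }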

Execution model

After a notice is posted into the system, it's run through a number of background queue handlers for processing. One of these checks to see if the user who posted has connected their Twitter account and configured it to send messages over the bridge.

If so, the queue handler reposts the notice text as a Twitter status update using the user's stored OAuth credentials.

Fave/unfave and delete actions are currently mirrored immediately.
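
Roughly, that export step looks like the sketch below; the helper names are hypothetical stand-ins, not the plugin's actual classes:

  <?php
  // Rough sketch of the export queue handler's job; not the plugin's real code.
  // lookupTwitterLink(), getOAuthToken(), formatForTwitter() and postToTwitter()
  // are hypothetical helpers.

  function handleBridgedNotice($notice)
  {
      $flink = lookupTwitterLink($notice->profile_id);    // the user's Twitter link, if any
      if (empty($flink) || !$flink->noticeSyncEnabled()) {
          return true;                                    // bridging not enabled; ack and move on
      }

      $oauth  = getOAuthToken($flink);                    // stored OAuth token + secret
      $status = formatForTwitter($notice);                // trim/rewrite the text to fit a tweet

      // POST statuses/update as the user.
      $ok = postToTwitter($oauth, 'statuses/update.json', array('status' => $status));

      return $ok;   // false could signal a retry, depending on the queue backend's semantics
  }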

Todo

  • Work out issues with confusing @-reply settings
    • allow for explicit @foo@twitter.com addressing as well?
  • Run bridged meta events (fave/unfave, delete) from background queues to improve interactive performance of the main web actions
  • Improve failure modes in the queueing: if Twitter is down for a while, we should resend when it's back up!
  • Currently we have no way to know if an OStatus-borne message was bridged to Twitter from its origin site, so the special fave & repeat bridging won't work on them.

Import from Twitter

State as of 0.9.5

So far we've had a functional, but fairly limited, ability to import notices from Twitter into StatusNet.

0.9.5 added some ability to pull more metadata as well, such as proper marking of retweets as repeats. In the incoming case, we're able to pull the original remote notice in silently even if we hadn't seen it before, since all needed info is passed to us with the retweet.

Statuses from private/protected Twitter streams are not currently imported; nor are direct messages.
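
For incoming retweets, the useful detail is that the API payload embeds the original status in a retweeted_status field, so the importer can save the original first and then record the repeat. A rough sketch, with hypothetical helpers:

  <?php
  // Rough sketch of retweet handling on import; saveStatus() and saveRepeat()
  // are hypothetical stand-ins for the importer's actual routines.

  function importStatus($status)
  {
      if (!empty($status->retweeted_status)) {
          // The retweet payload carries the full original status, so we can
          // import it silently even if we have never seen it before.
          $original = importStatus($status->retweeted_status);
          return saveRepeat($original, $status);
      }
      return saveStatus($status);   // ordinary status: save as a gatewayed notice
  }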

Execution model

There's a standalone daemon, twitterstatusfetcher.php, which can be run in the background for each individual StatusNet site with importing enabled. This daemon periodically fires up and goes through the list of all users who have connected Twitter accounts and enabled import of their friends timeline.

For each such user:

  • Make a request to the Twitter API for the friends timeline, for changes since the last recorded status update on that timeline
    • For each status:
      • Check if the status was already imported; if not:
        • Check if we have saved a profile for the sender's Twitter account; if not:
          • Create one, saving the avatar etc.
        • Save as a local notice with the local profile, marked as having come over a remote gateway.
      • Add the imported notice to the user's inbox.
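
In code terms, one polling pass has roughly the shape below (the real logic lives in twitterstatusfetcher.php; the helper names here are hypothetical):

  <?php
  // Simplified shape of one polling pass; not the actual daemon code.
  // getImportingUsers(), fetchHomeTimeline(), importStatus() and deliverToInbox()
  // are hypothetical helpers.

  function pollOnce()
  {
      foreach (getImportingUsers() as $user) {
          // GET the friends ("home") timeline, only changes since the last
          // status recorded for this user.
          $statuses = fetchHomeTimeline($user, $user->last_imported_status_id);

          foreach ($statuses as $status) {
              $notice = importStatus($status);   // dedupes, creates the profile/avatar if needed
              deliverToInbox($notice, $user);    // make it show up on the user's friends timeline
          }

          if (!empty($statuses)) {
              $user->last_imported_status_id = $statuses[0]->id;   // timelines arrive newest-first
              $user->save();
          }
      }
  }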

For an individual site with one or a few users this isn't too awful, but polling means it may take a few minutes for messages to make it over the bridge.

For a high-scale site this is completely untenable; polling through a giant list of users on a big site takes too long and makes too many API hits to be practical. Worse yet, if you have a large number of small sites with shared infrastructure, you need to run one daemon per site.
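
To put rough numbers on it: a site with 10,000 importing users and at least one API call per user per pass means 10,000 requests per pass; even at an assumed 10 requests per second, a single pass takes over 15 minutes, before rate limits or the one-daemon-per-site overhead enter the picture.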

Todo

There are some issues to clean up in the actual importing:

  • local addressing/inbox delivery seems very ad-hoc
    • messages don't go through queues for things like XMPP and Meteor delivery, so don't get sent to people reading through those systems.
  • Need to devise a way to send direct messages to Twitter users
  • Need to make import scalable

Scaling: next steps

I'm starting to experiment with Twitter's streaming APIs, which are stabilizing nicely. The user streams API is now officially in production, and the still-beta site streams now support fetching the friends timelines of multiple users at once, which should be friendlier for bulk hosting.

provisional work branch

Provisional execution model for a single site:

  • Start a daemon using the IoMaster/IoManager async IO daemon infrastructure our queues are built on...
    • Pull the list of users with Twitter import enabled.
    • Divide the list into chunks of 100 (maximum of 1000 chunks for now without special arrangements -- that's 100k users)
    • For each chunk, open a streaming HTTP connection to the server, which will receive quasi-realtime updates for those 100 users.
    • Run async i/o loop...
      • when a message comes in over one of our sockets, check which user it belongs to and pass the message & destination user into a background queue for import.
      • if a connection drops, restart it with the same(?) target listeners -- use sensible backoff behavior, etc.
      • if a new user sets up Twitter import, open an individual stream for them...
        • periodically recombine the individual streams into new group streams until you've got another full chunk of 100
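
A very rough sketch of that model follows; the chunk size comes from the site streams limit mentioned above, the callbacks stand in for the IoMaster/IoManager loop, and openSiteStream(), enqueueImport() and reconnectWithBackoff() are hypothetical helpers:

  <?php
  // Very rough sketch of the chunked site-streams model; not working daemon code.
  // getImportingTwitterIds(), openSiteStream(), enqueueImport() and
  // reconnectWithBackoff() are hypothetical helpers.

  define('CHUNK_SIZE', 100);   // site streams: up to 100 followed users per connection

  function startStreams()
  {
      $twitterIds = getImportingTwitterIds();          // Twitter IDs of users with import enabled
      $chunks     = array_chunk($twitterIds, CHUNK_SIZE);

      $streams = array();
      foreach ($chunks as $chunk) {
          // One long-lived HTTP connection per chunk, delivering quasi-realtime
          // updates for those users' friends timelines.
          $streams[] = openSiteStream($chunk);
      }
      return $streams;
  }

  function onStreamMessage($stream, $envelope)
  {
      // Site streams wrap each item with the user it was delivered for,
      // so routing doesn't involve any guessing.
      enqueueImport($envelope->message, $envelope->for_user);
  }

  function onStreamClosed($stream)
  {
      reconnectWithBackoff($stream);   // restart with (roughly) the same target listeners
  }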

The actual status->notice import would be the same as in the polling model, but running in the general queues:

  • Check if the status was already imported; if not:
    • Check if we have saved a profile for the sender's Twitter account; if not:
      • Create one, saving the avatar etc.
    • Save as a local notice with the local profile, marked as having come over a remote gateway.
  • Add the imported notice to the user's inbox.

Note there are some potential pitfalls:

  • Streaming APIs only give you current data -- if the importer daemon is disconnected for a while it can miss things.
    • May still need to poll to fill in gaps.
  • We may get multiples -- for instance, if we have 200 users who all follow the same account on Twitter, it looks like the stream will deliver that status 200 times, individually addressed to each user's timeline.
    • May need to more aggressively check for duplicates before passing things back to the queues so we don't have weird conflicts.
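
One way to keep those multiples from turning into duplicate notices is to check for the foreign status ID before anything goes onto the queues; a sketch, with hypothetical lookup and delivery helpers:

  <?php
  // Hypothetical duplicate check before handing a streamed status to the queues.
  // lookupNoticeByForeignId() and deliverToInbox() stand in for whatever maps
  // Twitter status IDs to already-imported notices.

  function maybeEnqueueImport($status, $forUser)
  {
      $existing = lookupNoticeByForeignId($status->id);
      if (!empty($existing)) {
          // Already imported via someone else's stream: just deliver the existing
          // notice to this user's inbox instead of importing it again.
          deliverToInbox($existing, $forUser);
          return;
      }
      enqueueImport($status, $forUser);
  }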

A note on authentication for the site streams... If I understand it correctly, the stream API hit needs to authenticate with the Twitter credentials of the Twitter user who registered the application's OAuth consumer key. Then it can pull data for listed user IDs who have valid OAuth tokens for that application. I'm not 100% sure we can transition things cleanly with the existing identi.ca settings.

(Worst case scenario: register a new app and make people reauth under the new key when they turn on importing.)

(As a side note -- the ability of the app owner to pull your timelines and direct messages may be a surprise to people using OAuth for non-web applications, where the individual auth tokens are usually kept on the user's device instead of centrally with the application author. But it makes perfect sense for web apps, which would be storing your keys on your behalf anyway.)

Scaling: even nexter steps

It should be possible to extend things a bit more to have a single daemon listening on behalf of multiple sites, just as our queue daemons do.

This may require a central table to store import state, so we don't have to cycle through tens of thousands of databases checking for users that need to be followed at startup.
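
Such a table would only need a handful of columns; the layout below is purely hypothetical, not a proposed schema:

  <?php
  // Purely hypothetical field list for a shared import-state table; the real
  // layout, if this gets built, could look quite different.
  $twitterImportState = array(
      'site_tag'                => 'which StatusNet site the user belongs to',
      'user_id'                 => 'local user ID on that site',
      'twitter_id'              => 'Twitter account whose friends timeline we import',
      'last_imported_status_id' => 'high-water mark for gap-filling polls',
      'stream_chunk'            => 'which streaming connection currently covers the user',
      'modified'                => 'timestamp for housekeeping',
  );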

A few sites might have their own application consumer keys and would thus require a separate connection, but most will probably run under a common status.net key. That'll let us combine connections, so one streaming connection under one app key + owning user can listen to timelines on behalf of up to 100 users regardless of which individual site they're on.
