Unveiling Tweepcred: The Power Behind Twitter's Recommendation Engine

April 01, 2023

You've seen some people on Twitter with a certain kind of clout, their tweets garnering likes, retweets, and replies with almost magical efficiency. But have you ever wondered what powers this influence? Today, we're diving into the mysterious world of Tweepcred, the service behind the scenes that calculates a user's reputation on Twitter. You don't need to be an expert in Scalding or batch processing: we'll break it down in a friendly, approachable way that you can digest without a headache.

What is Tweepcred?

Tweepcred is a social network analysis tool that calculates the influence of Twitter users based on their interactions with other users. Think of it as the reputation points you earn on the platform, which Twitter uses to decide who should be recommended to follow and whose content should be highlighted. Tweepcred leverages Google's PageRank algorithm to rank users based on mentions, retweets, and other interactions.

Originally developed to rank web pages in Google's search results, PageRank has become a cornerstone of modern search engine technology.

PageRank was created by Google founders Larry Page and Sergey Brin while they were still students at Stanford University. The primary purpose of the algorithm was to assign a numerical score to each web page based on the number and quality of other pages that link to it. The more high-quality links a page has, the higher its PageRank score. This allowed Google to provide users with more relevant search results, revolutionizing the world of online search.

At a high level, PageRank treats web pages as nodes in a graph, with hyperlinks acting as edges connecting these nodes.

The algorithm distributes the scores across the graph iteratively, and after a certain number of iterations, the scores stop changing or change very little. When this point is reached, the algorithm is considered to have reached a stable state. This stable state signifies that the PageRank scores have balanced out, effectively ranking the nodes based on their importance or influence within the network.
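To make the iterative process concrete, here is a minimal Python sketch of classic PageRank computed by power iteration. The damping factor of 0.85 (the probability of following a link rather than jumping to a random page), the tolerance, and the function name are illustrative choices, not values taken from Tweepcred.

```python
from typing import Dict, List, Tuple


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-8, max_iter: int = 200) -> Tuple[Dict[str, float], int]:
    """Classic PageRank by power iteration.

    graph maps each node to the list of nodes it links to.
    Returns the scores and the number of iterations used.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}  # start from a uniform distribution

    for iteration in range(1, max_iter + 1):
        # Every node receives the "random jump" share.
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        # Nodes with no outgoing links spread their score evenly over all nodes.
        dangling = sum(scores[node] for node in nodes if not graph.get(node))
        for node in nodes:
            new_scores[node] += damping * dangling / n
        # Each node splits its score equally among the pages it links to.
        for source, targets in graph.items():
            for target in targets:
                new_scores[target] += damping * scores[source] / len(targets)
        # Stable state: the total change between iterations is tiny.
        delta = sum(abs(new_scores[node] - scores[node]) for node in nodes)
        scores = new_scores
        if delta < tol:
            break
    return scores, iteration


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores, iterations = pagerank(web)
print(max(scores, key=scores.get))  # C: it is linked from A, B, and D
```

Page D has no incoming links, so it keeps only the random-jump share, while C collects score from three pages and ends up ranked highest.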

This process helps identify the most important pages in the network, ensuring they're ranked higher in search results. Now, you might wonder how this web page-centric algorithm could apply to Twitter users and their influence. Well, that's where Tweepcred comes in, adapting the PageRank algorithm to analyze Twitter users and their interactions in a similar manner. So let's jump back to Tweepcred and see how it uses PageRank to measure the clout of your favorite Twitter personalities!

How Tweepcred Works: A High-Level Overview

At a high level, Tweepcred builds a graph of Twitter users (nodes) and their interactions (edges) and runs the PageRank algorithm over it. Each user receives a numerical score based on the number and quality of their interactions with other users: the more interactions users have with other high-quality users, the higher their Tweepcred score. The magic behind Tweepcred comes from a series of Scala classes that work together to calculate a user's reputation score. To give you a clear understanding of how the system works, we'll discuss the main classes and what each one does.
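As a rough illustration of the graph-building step, the Python sketch below turns a list of interaction events into a weighted, directed user graph. The event format, the `build_interaction_graph` name, and the per-interaction weights are all hypothetical; the article does not say how Tweepcred weights each type of interaction.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical weights per interaction type (not Tweepcred's real values).
INTERACTION_WEIGHTS = {"mention": 1.0, "reply": 1.0, "retweet": 2.0}


def build_interaction_graph(
        events: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, float]]:
    """Aggregate (actor, target, kind) events into weighted directed edges.

    An edge actor -> target means the actor engaged with the target, so
    PageRank will flow toward users who get mentioned or retweeted.
    """
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for actor, target, kind in events:
        weight = INTERACTION_WEIGHTS.get(kind)
        if weight is None or actor == target:
            continue  # skip unknown interaction types and self-interactions
        graph[actor][target] = graph[actor].get(target, 0.0) + weight
    return dict(graph)


events = [
    ("alice", "bob", "retweet"),
    ("alice", "bob", "mention"),
    ("carol", "bob", "mention"),
    ("bob", "alice", "reply"),
    ("bob", "bob", "mention"),  # ignored: self-mention
]
print(build_interaction_graph(events))
# {'alice': {'bob': 3.0}, 'carol': {'bob': 1.0}, 'bob': {'alice': 1.0}}
```

Keeping the edge weights lets the same graph feed either a weighted PageRank (weights matter) or an unweighted one (every edge counts equally).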

UserMass

The UserMass class plays a crucial role in calculating a Twitter user's "mass," which represents their reputation on the platform. The mass score is used in various applications to determine which users should be recommended to follow or which users should have their content highlighted.

To calculate the mass, the UserMass class combines multiple factors related to the user's profile and their activity on Twitter.

The getUserMass method of the UserMass class receives a CombinedUser object containing information about a Twitter user. It returns an optional UserMassInfo object, which holds the user's ID and calculated mass score.

When computing the mass score, the algorithm considers several factors, such as:

  • Account age: How long the account has existed on Twitter. Older, more established accounts tend to have a higher mass score.
  • The number of followers: A larger follower count often implies more significant influence and, therefore, a higher mass score.
  • The number of followings: The number of users a person follows can also impact their mass score, especially when compared to their follower count.
  • Device usage: The types of devices used to access Twitter may also contribute to the user's mass score.
  • Safety status: Whether the user's account is restricted, suspended, or verified can play a role in determining their mass score.

The algorithm calculates the mass score by combining these factors with various weightings and adjustments. For instance, some factors may add a weight to the score while others multiply it. The algorithm may also apply thresholds on the number of friends (accounts the user follows) and followers, adjusting the mass score based on the user's overall engagement on the platform.

https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/graph/batch/job/tweepcred/UserMass.scala
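The Python sketch below shows how such a score could be assembled from the factors listed above. Every constant, the `UserProfile` stand-in for CombinedUser, the boolean device signal, and the 0.0 to 1.0 output scale are hypothetical choices for illustration; they are not the values or formulas in UserMass.scala.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserProfile:
    """Hypothetical, simplified stand-in for the CombinedUser object."""
    user_id: int
    age_days: int
    num_followers: int
    num_followings: int
    has_device: bool
    is_restricted: bool = False
    is_suspended: bool = False
    is_verified: bool = False


@dataclass
class UserMassInfo:
    user_id: int
    mass: float


# Illustrative constants only.
BASE_SCORE = 0.05
DEVICE_BONUS = 0.5          # additive weight for device usage
RESTRICTED_FACTOR = 0.1     # multiplicative penalty for restricted accounts
FOLLOWINGS_THRESHOLD = 500  # only check the ratio above this many followings
MIN_FOLLOWER_RATIO = 0.6    # followers per following below which we penalize
PENALTY_DIVISOR = 5.0


def get_user_mass(user: UserProfile) -> Optional[UserMassInfo]:
    if user.user_id <= 0:
        return None  # no mass for invalid users, mirroring the optional return
    if user.is_suspended:
        return UserMassInfo(user.user_id, 0.0)
    if user.is_verified:
        return UserMassInfo(user.user_id, 1.0)

    score = BASE_SCORE + (DEVICE_BONUS if user.has_device else 0.0)
    score *= min(1.0, user.age_days / 30.0)  # age factor saturates after 30 days
    if user.is_restricted:
        score *= RESTRICTED_FACTOR
    if user.num_followings > FOLLOWINGS_THRESHOLD:
        if user.num_followers / user.num_followings < MIN_FOLLOWER_RATIO:
            score /= PENALTY_DIVISOR  # follows many accounts, followed by few
    return UserMassInfo(user.user_id, min(1.0, max(0.0, score)))


regular = UserProfile(1, age_days=365, num_followers=1000, num_followings=200, has_device=True)
spammy = UserProfile(2, age_days=365, num_followers=10, num_followings=5000, has_device=True)
# The spammy profile ends up with one fifth of the regular profile's mass.
print(get_user_mass(regular).mass, get_user_mass(spammy).mass)
```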

TweepcredBatchJob

This class represents a batch job that computes Tweepcred scores using either the weighted or the unweighted PageRank algorithm. It extends the AnalyticsIterativeBatchJob class, which builds on the Scalding framework for data processing on Hadoop. The class is responsible for configuring and running the batch job. It accepts command-line arguments such as the --weighted flag, which determines whether to use the weighted PageRank algorithm. The run method prints batch statistics after the job has finished, while the children method defines the list of child jobs to execute as part of the batch job.

https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/graph/batch/job/tweepcred/TweepcredBatchJob.scala
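To illustrate the parent/child structure, here is a small Python sketch of a batch job that reads a `--weighted` flag and runs its child jobs in order. It uses Python's standard `argparse` module; the `TweepcredBatchJobSketch` class, the child names, and the printed statistics are hypothetical and do not reproduce the Scalding API.

```python
import argparse
from typing import Callable, Dict, List


class TweepcredBatchJobSketch:
    """Hypothetical parent job: parses options, then runs child jobs in order."""

    def __init__(self, argv: List[str]) -> None:
        parser = argparse.ArgumentParser(prog="tweepcred")
        parser.add_argument("--weighted", action="store_true",
                            help="use weighted PageRank instead of unweighted")
        self.args = parser.parse_args(argv)
        self.stats: Dict[str, int] = {}

    def children(self) -> List[Callable[[], str]]:
        mode = "weighted" if self.args.weighted else "unweighted"
        # Each child stands in for a real job and returns a short status string.
        return [
            lambda: "prepared PageRank data",
            lambda: f"ran {mode} PageRank",
            lambda: "extracted Tweepcred",
        ]

    def run(self) -> List[str]:
        results = [child() for child in self.children()]
        self.stats["children_completed"] = len(results)
        print(f"batch stats: {self.stats}")  # report statistics once all children finish
        return results


print(TweepcredBatchJobSketch(["--weighted"]).run())
```

The flat list of children keeps the example compact; in the flow summarized later in the article, each stage hands off to the next one instead of being listed by the parent.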

ExtractTweepcred

This job calculates Tweepcred from a given PageRank file. It adjusts the scores based on the user's follower-to-following ratio if the post_adjust flag is set. The class reads the PageRank file and a user mass file in TSV format and combines them to produce a new PageRank file with the adjusted values. The adjusted PageRank is then used to calculate Tweepcred values written to output files.

https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/graph/batch/job/tweepcred/ExtractTweepcred.scala
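A minimal Python sketch of this join-adjust-scale flow is shown below. The TSV column layouts (user ID and PageRank; user ID, mass, follower count, following count) are assumptions, since the article does not specify them, and the `adjust` and `scale` callables are parameters so the example stays focused on the join. The two toy helpers are hypothetical placeholders for the Reputation logic.

```python
from typing import Callable, Dict, Iterator, List


def read_tsv(text: str) -> Iterator[List[str]]:
    """Yield the tab-separated fields of each non-empty line."""
    for line in text.splitlines():
        if line.strip():
            yield line.split("\t")


def extract_tweepcred(pagerank_tsv: str, mass_tsv: str,
                      adjust: Callable[[float, int, int], float],
                      scale: Callable[[float], int],
                      post_adjust: bool = False) -> Dict[int, int]:
    """Join PageRank with per-user data, optionally adjust, then scale."""
    pagerank = {int(uid): float(score) for uid, score in read_tsv(pagerank_tsv)}
    tweepcred: Dict[int, int] = {}
    for uid, _mass, followers, followings in read_tsv(mass_tsv):
        user_id = int(uid)
        if user_id not in pagerank:
            continue  # inner join: keep users present in both files
        score = pagerank[user_id]
        if post_adjust:
            score = adjust(score, int(followers), int(followings))
        tweepcred[user_id] = scale(score)
    return tweepcred


def halve_if_unbalanced(score: float, followers: int, followings: int) -> float:
    """Toy adjustment: halve the score of users following 10x more than follow them."""
    return score / 2 if followings > 10 * max(followers, 1) else score


def to_percent(score: float) -> int:
    """Toy scaling: map a 0..1 score to 0..100."""
    return round(score * 100)


pagerank_tsv = "1\t0.5\n2\t0.5\n"
mass_tsv = "1\t0.9\t1000\t100\n2\t0.2\t5\t4000\n"
print(extract_tweepcred(pagerank_tsv, mass_tsv, halve_if_unbalanced, to_percent,
                        post_adjust=True))
# {1: 50, 2: 25}
```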

PreparePageRankData

This class prepares the graph data for the PageRank calculation, generating the initial PageRank and starting the WeightedPageRank job. It reads user mass and graph data, generates the initial PageRank from the graph data, writes the number of nodes to a TSV file, and dumps the nodes to another TSV file. The class also has several options for fine-tuning the PageRank calculation.

https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/graph/batch/job/tweepcred/PreparePageRankData.scala
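The preparation step can be sketched in Python as a function that collects the graph's nodes, counts them, dumps them with their mass, and seeds an initial PageRank. The function name, the edge tuple format, the uniform starting PageRank, and returning TSV text instead of writing files are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (source user, target user, weight)


def prepare_pagerank_data(edges: List[Edge],
                          user_mass: Dict[int, float]) -> Tuple[str, str, str]:
    """Return (num_nodes_tsv, nodes_tsv, initial_pagerank_tsv) as TSV text."""
    nodes = sorted({user for src, dst, _ in edges for user in (src, dst)})
    n = len(nodes)
    if n == 0:
        raise ValueError("graph has no nodes")
    num_nodes_tsv = f"{n}\n"
    # Node dump: one line per user with its mass (0.0 when unknown).
    nodes_tsv = "".join(f"{user}\t{user_mass.get(user, 0.0)}\n" for user in nodes)
    # Assumed starting point: a uniform distribution over the graph's nodes.
    pagerank_tsv = "".join(f"{user}\t{1.0 / n}\n" for user in nodes)
    return num_nodes_tsv, nodes_tsv, pagerank_tsv


num_nodes, nodes, initial = prepare_pagerank_data([(1, 2, 1.0), (2, 3, 2.0)],
                                                  {1: 0.5, 2: 0.2})
print(num_nodes, nodes, initial, sep="")
```

Writing the node count separately lets later iterations compute the random-jump share without rescanning the whole graph.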

WeightedPageRank

WeightedPageRank is a class that performs the weighted PageRank algorithm on a given graph. Starting from a given set of PageRank values, it runs one iteration and then tests for convergence, that is, whether the scores have stopped changing or change very little.

If convergence hasn't been reached, the algorithm clones itself and starts the next PageRank job with the updated PageRank as input. If convergence has been reached, the algorithm starts the ExtractTweepcred job instead.

This class takes several options, including the working directory, the total number of nodes, the nodes file, the PageRank file, the absolute difference used to test for convergence, whether to perform weighted PageRank, the current iteration, the maximum number of iterations to run, the probability of a random jump, and whether to apply the post-adjustment.

https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/graph/batch/job/tweepcred/WeightedPageRank.scala
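The iterate-then-test loop can be sketched in Python as a job object that runs one iteration, measures the total absolute change, and either returns a copy of itself for the next iteration or signals that extraction can start. The class, its field names, the uniform random jump, and the default values are assumptions chosen to echo the options listed above; they are not the code in WeightedPageRank.scala.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple

Edge = Tuple[int, int, float]  # (source, target, positive weight)


@dataclass
class WeightedPageRankJob:
    """Hypothetical single-iteration job; field names echo the options above."""
    edges: Tuple[Edge, ...]
    pagerank: Dict[int, float]  # current scores, one entry for every node
    weighted: bool = True
    jump_prob: float = 0.15     # probability of a random jump
    abs_diff: float = 1e-6      # convergence threshold on total absolute change
    cur_iteration: int = 0
    max_iterations: int = 100

    def _edge_weight(self, w: float) -> float:
        return w if self.weighted else 1.0  # unweighted: every edge counts as 1

    def iterate(self) -> Dict[int, float]:
        n = len(self.pagerank)  # total number of nodes
        out_weight: Dict[int, float] = {}
        for src, _, w in self.edges:
            out_weight[src] = out_weight.get(src, 0.0) + self._edge_weight(w)
        # Random-jump share plus the redistributed score of dangling nodes.
        dangling = sum(s for u, s in self.pagerank.items() if u not in out_weight)
        base = self.jump_prob / n + (1 - self.jump_prob) * dangling / n
        new_pr = {u: base for u in self.pagerank}
        # Each node passes its score along its edges in proportion to edge weight.
        for src, dst, w in self.edges:
            new_pr[dst] += ((1 - self.jump_prob) * self.pagerank[src]
                            * self._edge_weight(w) / out_weight[src])
        return new_pr

    def run(self) -> Tuple[Optional["WeightedPageRankJob"], Dict[int, float]]:
        new_pr = self.iterate()
        diff = sum(abs(new_pr[u] - self.pagerank[u]) for u in new_pr)
        if diff > self.abs_diff and self.cur_iteration + 1 < self.max_iterations:
            # Not converged: "clone" this job with the updated PageRank as input.
            return replace(self, pagerank=new_pr, cur_iteration=self.cur_iteration + 1), new_pr
        return None, new_pr  # converged or out of iterations: hand off to extraction


def run_until_done(job: WeightedPageRankJob) -> Dict[int, float]:
    while True:
        next_job, pagerank = job.run()
        if next_job is None:
            return pagerank
        job = next_job


edges = ((1, 2, 3.0), (1, 3, 1.0), (2, 1, 1.0), (3, 1, 1.0))
start = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
weighted = run_until_done(WeightedPageRankJob(edges, start))
unweighted = run_until_done(WeightedPageRankJob(edges, start, weighted=False))
print(weighted[2] > weighted[3], abs(unweighted[2] - unweighted[3]) < 1e-12)  # True True
```

With weights, node 2 receives three quarters of node 1's outgoing score and outranks node 3; without weights the two are treated identically.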

Reputation

This helper class, Reputation, contains methods for calculating a user's reputation score. The scaledReputation method takes a raw PageRank score and returns a reputation score on a scale of 0 to 100. The adjustReputationsPostCalculation method reduces the PageRank of users who have few followers but follow many accounts: it calculates a division factor based on the ratio of followings to followers, divides the user's PageRank by this factor, and returns the adjusted PageRank.

https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/graph/batch/job/tweepcred/Reputation.scala
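Both methods are simple enough to sketch in a few lines of Python. The log-scale mapping, the constants, and the snake_case function names are hypothetical illustrations of the behavior described above, not the formulas in Reputation.scala.

```python
import math

# Illustrative constants; not the values used by Tweepcred.
MIN_LOG_PAGERANK = -9.0      # log10 of a PageRank that maps to 0
MAX_LOG_PAGERANK = -2.0      # log10 of a PageRank that maps to 100
FOLLOWINGS_THRESHOLD = 2500  # adjust only accounts following more than this
MAX_DIV_FACTOR = 50.0        # cap on how much a score can be divided


def scaled_reputation(raw_pagerank: float) -> int:
    """Map a raw PageRank onto 0..100 on a log scale (assumed mapping)."""
    if raw_pagerank <= 0:
        return 0
    position = ((math.log10(raw_pagerank) - MIN_LOG_PAGERANK)
                / (MAX_LOG_PAGERANK - MIN_LOG_PAGERANK))
    return round(100 * min(1.0, max(0.0, position)))


def adjust_reputation_post_calculation(pagerank: float, num_followers: int,
                                       num_followings: int) -> float:
    """Divide the PageRank of accounts that follow far more users than follow them."""
    if num_followings <= FOLLOWINGS_THRESHOLD:
        return pagerank
    ratio = num_followings / max(num_followers, 1)  # followings-to-followers
    division_factor = min(max(ratio, 1.0), MAX_DIV_FACTOR)
    return pagerank / division_factor


print(scaled_reputation(10 ** -5.5))                        # 50
print(adjust_reputation_post_calculation(0.01, 100, 3000))  # PageRank divided by 30
```

A log scale suits PageRank because raw scores span many orders of magnitude; a linear mapping would squeeze almost every user toward 0.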

Putting It All Together

Now that we've looked at the main classes that power Tweepcred, let's see how they work together to calculate a user's reputation score.

  • First, the TweepcredBatchJob starts the process by running a batch job to compute Tweepcred scores using the weighted or unweighted PageRank algorithm.
  • PreparePageRankData reads user mass and graph data, generates the initial PageRank, and starts the WeightedPageRank job.
  • WeightedPageRank performs the PageRank algorithm on the graph data, checking for convergence after each iteration. If the algorithm converges, it starts the ExtractTweepcred job.
  • ExtractTweepcred adjusts the PageRank scores based on the user's follower-to-following ratio (when the post_adjust flag is set) and calculates the final Tweepcred scores.
  • The Reputation helper class provides the math for these final steps, converting raw PageRank scores to reputation scores and adjusting the scores post-calculation.
  • UserMass calculates the "mass" of each user, which PreparePageRankData and ExtractTweepcred read as input to the reputation calculation.

Twitter's Biggest Ranking Factors

As with SEO ranking factors, understanding and focusing on these elements can help you optimize your Twitter presence and enhance your influence. Here are some of the most critical factors for ranking on Twitter, according to Tweepcred's logic:

  • Mentions and Retweets: In the Tweepcred service, interactions like mentions and retweets form the edges of the network graph. The more you are mentioned or retweeted by other influential users, the higher your PageRank score, indicating a more significant influence on the platform.
  • Quality Connections: The PageRank algorithm considers the number of interactions and the quality of users interacting with you. Engaging with influential users in your niche can help boost your ranking, as their high PageRank scores positively impact your score.
  • Consistent Activity: The UserMass class considers account age when calculating a user's mass, so established accounts start with an advantage. Maintaining a consistent presence on the platform by regularly posting engaging content and interacting with others builds the interactions that feed the PageRank graph, which contributes to a higher Tweepcred score.
  • Follower-to-Following Ratio: When post-adjustment is enabled, Tweepcred divides the PageRank of users who follow many accounts but have few followers. Keeping a healthy ratio, with more followers relative to the accounts you follow, avoids this penalty and protects your Tweepcred score.
  • Safety Status: The UserMass class also considers the user's safety status, such as whether the account is restricted, suspended, or verified. A verified account or one in good standing is more likely to have a higher mass score, which can, in turn, boost the Tweepcred ranking.

To improve your Tweepcred ranking and overall influence on Twitter, focus on fostering quality connections, engaging consistently with your audience, and maintaining a healthy follower-to-following ratio. By incorporating these factors into your Twitter strategy, you can enhance your presence on the platform and achieve long-term success.

Tweepcred is a powerful tool that helps Twitter understand the influence and reputation of its users. By combining the PageRank algorithm with a series of Scala classes, it calculates a user's reputation score based on their interactions with others on the platform. This helps Twitter recommend users to follow and highlight content from influential users.

Now that you have a solid understanding of Tweepcred and how it works, you can appreciate the magic behind the scenes on Twitter. And who knows, maybe this newfound knowledge will inspire you to create your own reputation engine for your platform!