Twitter Open Sourcing Its Algorithm, Explained
a breakdown of how twitter's recommendation algorithm works, and the implications of the company making it open source
Today, Twitter open sourced the code for its recommendation algorithm, the company announced in a blog post. The flow chart above shows the main components of the algo, and broadly describes the decision process by which Twitter shows users tweets. Find the GitHub repository here.
In its blog post, Twitter indicates that the main goal of the recommendation algorithm is essentially to optimize user engagement, and it describes the pipeline in three stages:
The recommendation pipeline is made up of three main stages that consume these features:
(1) Fetch the best Tweets from different recommendation sources in a process called candidate sourcing.
(2) Rank each Tweet using a machine learning model.
(3) Apply heuristics and filters, such as filtering out Tweets from users you’ve blocked, NSFW content, and Tweets you’ve already seen.
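To make the three stages concrete, here is a minimal Python sketch of that flow. It is an illustration only, not Twitter's code: the `Tweet` class, the function names, and the toy scores are all hypothetical, and the "model" is just a lookup table standing in for the machine learning ranker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author: str
    nsfw: bool = False


def source_candidates(pool, limit):
    # Stage 1: candidate sourcing. A real system would query several
    # recommendation sources; this stand-in just takes the first `limit`.
    return pool[:limit]


def rank(candidates, score):
    # Stage 2: order candidates by a model score, highest first.
    # `score` is a placeholder for the machine learning model.
    return sorted(candidates, key=score, reverse=True)


def apply_filters(ranked, blocked_authors, seen_ids):
    # Stage 3: drop Tweets from blocked authors, NSFW content,
    # and Tweets the user has already seen.
    return [
        t for t in ranked
        if t.author not in blocked_authors
        and not t.nsfw
        and t.tweet_id not in seen_ids
    ]


def build_timeline(pool, score, blocked_authors, seen_ids, limit):
    candidates = source_candidates(pool, limit)
    ranked = rank(candidates, score)
    return apply_filters(ranked, blocked_authors, seen_ids)


pool = [
    Tweet(1, "alice"),
    Tweet(2, "bob"),
    Tweet(3, "carol", nsfw=True),
    Tweet(4, "dave"),
    Tweet(5, "erin"),
]
toy_scores = {1: 0.2, 2: 0.9, 3: 0.8, 4: 0.5, 5: 0.7}  # made-up values

timeline = build_timeline(
    pool,
    score=lambda t: toy_scores[t.tweet_id],
    blocked_authors={"bob"},
    seen_ids={4},
    limit=5,
)
print([t.tweet_id for t in timeline])  # [5, 1]
```

In this toy run, the two highest-scoring tweets never reach the timeline: one comes from a blocked account and the other is flagged NSFW. The point is that ranking and filtering are separate steps, so a high score alone does not guarantee a tweet is shown.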
The blog post goes into detail about each of these three stages. For example, candidate sourcing, as the company refers to it, is the process by which the algorithm identifies potential tweets to surface in the recommendation timeline. Currently, the company starts with a target of 1,500 tweets, which on average is split evenly between people you follow and people you don’t:
For each request, we attempt to extract the best 1500 Tweets from a pool of hundreds of millions through these sources. We find candidates from people you follow (In-Network) and from people you don’t follow (Out-of-Network). Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though this may vary from user to user.
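As a rough illustration of what a 1,500-tweet, 50/50 candidate mix could look like, here is a small Python sketch. One caveat: the blog post reports the 50% figure as an average outcome, not as a hard quota, so treating it as a target share is an assumption made purely for this example. The function name `mix_candidates` and its parameters are hypothetical.

```python
from operator import itemgetter


def mix_candidates(in_network, out_of_network, target=1500, in_share=0.5):
    """Pick up to `target` candidates, aiming for `in_share` from In-Network.

    Each pool is a list of (tweet_id, score) pairs. If the Out-of-Network
    pool is too small to fill its share, In-Network backfills the rest.
    """
    in_sorted = sorted(in_network, key=itemgetter(1), reverse=True)
    out_sorted = sorted(out_of_network, key=itemgetter(1), reverse=True)

    in_quota = round(target * in_share)
    picked_in = in_sorted[:in_quota]
    picked_out = out_sorted[:target - len(picked_in)]

    # Backfill from In-Network if Out-of-Network ran short.
    shortfall = target - len(picked_in) - len(picked_out)
    picked_in += in_sorted[len(picked_in):len(picked_in) + shortfall]
    return picked_in, picked_out


# Made-up pools: plenty of In-Network tweets, few Out-of-Network ones.
in_pool = [(i, i / 2000) for i in range(2000)]
small_out_pool = [(10_000 + i, i / 400) for i in range(400)]
large_out_pool = [(20_000 + i, i / 2000) for i in range(2000)]

even_in, even_out = mix_candidates(in_pool, large_out_pool)
skew_in, skew_out = mix_candidates(in_pool, small_out_pool)
print(len(even_in), len(even_out))  # 750 750
print(len(skew_in), len(skew_out))  # 1100 400
```

The second call shows one way the split "may vary from user to user": when one source has fewer candidates to offer, the other ends up supplying more of the 1,500.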
The algorithm treats In-Network and Out-of-Network tweets differently. In-Network tweets are ranked with the help of Real Graph, a model that predicts the likelihood of engagement between two users. Identifying relevant Out-of-Network tweets is “trickier,” and centers on finding tweets that will be most relevant to the user. The company decides what’s most relevant by attempting to show users tweets that people they follow engaged with, and tweets from people who have liked tweets similar to the ones the user has liked. Twitter also uses “embedding spaces” to determine what’s relevant:
Embedding space approaches aim to answer a more general question about content similarity: What Tweets and Users are similar to my interests?…
There are 145k communities, which are updated every three weeks. Users and Tweets are represented in the space of communities, and can belong to multiple communities.
We can embed Tweets into these communities by looking at the current popularity of a Tweet in each community. The more that users from a community like a Tweet, the more that Tweet will be associated with that community.
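The idea in that last quote can be sketched in a few lines of Python. This is a simplified reading of the blog post's description, not Twitter's implementation, which is certainly more involved: the function names, the community labels, and the rule that a like counts once toward each of the liker's communities are all assumptions made for illustration.

```python
from collections import Counter


def tweet_embedding(liker_ids, user_communities):
    """Share of a Tweet's likes coming from each community.

    `user_communities` maps a user id to the set of communities that user
    belongs to (a user can belong to several). A like from a user counts
    once toward each of that user's communities.
    """
    counts = Counter()
    for user_id in liker_ids:
        for community in user_communities.get(user_id, ()):
            counts[community] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {community: n / total for community, n in counts.items()}


def affinity(user_id, embedding, user_communities):
    # Sum the Tweet's weight over the communities the user belongs to.
    return sum(
        embedding.get(community, 0.0)
        for community in user_communities.get(user_id, ())
    )


# Hypothetical users and communities.
user_communities = {
    "u1": {"tech"},
    "u2": {"tech", "finance"},
    "u3": {"sports"},
    "u4": {"finance"},
}

# A Tweet liked by u1 and u2 leans toward "tech", with some "finance".
embedding = tweet_embedding(["u1", "u2"], user_communities)
print(embedding)
print(affinity("u3", embedding, user_communities))  # 0 (no shared community)
print(affinity("u4", embedding, user_communities))
```

Under these assumptions, a user in the "finance" community gets a nonzero affinity for the tweet even though they follow neither liker, which is the kind of Out-of-Network match the embedding approach is meant to surface.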
“At the end of the day you should be able to trust what you see, and know that it is not manipulated, or is the least manipulated information in the world,” CEO Elon Musk said in a Twitter Spaces today upon release of the algorithm’s code.
“Our optimization is unregretted user-minutes,” he said. “We don’t want users to have a hangover when they’re done.”
Now that the algorithm is open sourced, what’s next?
“The goal is to build trust through transparency with users,” Musk said in the Twitter Space. “I don’t think you should trust any social media algorithm that is a black box and you don’t know what’s going on in there. We’re trying to be the most trusted place on the internet, where you know why things are happening on Twitter. And it [should be] the least game-able system on the internet, is our goal.”
To "open source" something means to make the source code, design, or content of a project, product, or software freely available to the public. By doing this, Twitter is now allowing anyone to view, modify, and distribute the material, typically under specific licensing conditions that ensure the open nature of the project.
When a project or software is open-sourced, it encourages collaboration, innovation, and transparency. People from around the world can contribute their ideas, skills, and expertise to improve the project, fix bugs, or create new features. This process can lead to faster development, increased reliability, and a stronger sense of community involvement.
“It’s going to be quite embarrassing [at first], and people are going to find a lot of mistakes that we are going to fix quickly,” Musk said.
What has the public found in the Twitter algo code?
Amjad Masad says the algorithm identifies four categories of Twitter users: power users, Democrats, Republicans, and Elon Musk himself.
Tanay Jaipuria found code that inserts potentially irrelevant tweets in the recommendation timeline, just because the user has Twitter Blue.
@0xCygaar discovered, potentially, five things that determine a tweet’s reach: the account’s number of blocks, mutes, abuse reports, spam reports, and unfollows.
This article is being updated throughout the day today, and some parts were drafted by GPT-4.