Co-Following on Twitter


Our recent paper on second-order “co-following” on Twitter was accepted at ACM Hypertext 2014 (short paper). I did most of this work with Ingmar Weber while I was at QCRI. The idea is that two Twitter users whose followers have similar friends are themselves similar, even if they share no followers. The intuition is that a user’s friends (the accounts that user follows) typically reflect that user’s interests, so two accounts whose followers have similar interests are likely to be similar.

As an example of our approach, consider Figure 1 below, which shows the Twitter accounts of two unrelated, not especially popular football clubs from Belgium (@Lierse) and Italy (@ACF_Fiorentina). Directed edges from the users in the middle (foll1-foll5) indicate whom these users follow. In this example, the two accounts do not share a single follower. However, many of their followers are “co-following” some common accounts, such as @FIFA or @FCBarcelona. This can be used to deduce the similarity, or rather closeness, of @Lierse and @ACF_Fiorentina.

Most existing approaches measure the similarity between two accounts by the number of followers they have in common. That would fail in the example above, since the two accounts share no followers, even though both are related to football and are close in that sense. Our approach complements existing approaches by extending to the 2nd-order network, which lets us measure the similarity of pairs of users who (i) do not share many followers or (ii) do not have many followers.

At first sight, this idea is similar to using common links (co-citations) for clustering web pages. However, typical co-citation or co-linkage approaches focus only on the “1-hop backward” links and then look at overlaps. Our analysis makes crucial use of the added “forward” links. In a sense, we are using 2nd-order co-citation, or co-following, rather than ordinary 1st-order co-citation.
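To make the idea concrete, here is a minimal Python sketch of second-order co-following on a toy graph shaped like the example above. The follow graph, the function names, and the use of cosine similarity over raw counts are all assumptions made for illustration; the paper describes the features and similarity measure we actually used.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy follow graph: user -> set of accounts that user follows ("friends").
follows = {
    "foll1": {"@Lierse", "@FIFA", "@FCBarcelona"},
    "foll2": {"@Lierse", "@FIFA"},
    "foll3": {"@ACF_Fiorentina", "@FIFA", "@FCBarcelona"},
    "foll4": {"@ACF_Fiorentina", "@FCBarcelona"},
    "foll5": {"@ACF_Fiorentina", "@FIFA"},
}


def followers_of(account, follows):
    """1st-hop backward edges: every user who follows `account`."""
    return {user for user, friends in follows.items() if account in friends}


def cofollowing_profile(account, follows):
    """2nd-hop forward edges: for each other account, count how many
    followers of `account` also follow it."""
    profile = Counter()
    for user in followers_of(account, follows):
        for friend in follows[user]:
            if friend != account:
                profile[friend] += 1
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


shared = followers_of("@Lierse", follows) & followers_of("@ACF_Fiorentina", follows)
similarity = cosine(
    cofollowing_profile("@Lierse", follows),
    cofollowing_profile("@ACF_Fiorentina", follows),
)
print(len(shared), round(similarity, 3))
```

In this toy graph the two clubs have no followers in common, so a 1st-order overlap measure would score them as unrelated, while their co-following profiles (both dominated by @FIFA and @FCBarcelona) are nearly parallel.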

This idea has a lot of cool applications in (i) language-agnostic user classification, (ii) user recommendation, (iii) cross-selling and marketing opportunities, etc.

Figure 1: Co-following example. Starting from two seemingly unrelated Twitter accounts (@Lierse and @ACF_Fiorentina), our co-following analysis uses both the 1st-hop backward edges to their followers and the 2nd-hop forward edges to other co-followed accounts.


User classification

The idea of co-following can be used for language-agnostic user classification on Twitter. We tested whether co-following can predict which of two arguably interchangeable rival brands a user will follow, such as @CocaCola vs. @Pepsi or @Puma vs. @Nike. We observed that, even after removing obvious co-following features, the prediction AUC is as high as 80%. Figure 2 below shows the results.

Figure 2: AUC for the prediction task. Average AUC-ROC across the 18 binary classification tasks (detecting preference among rival alternatives) as more and more relevant features are removed. Error bars indicate the standard error across the tasks.
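For readers unfamiliar with the metric, AUC-ROC is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The labels and scores are made up (think of 1 as “follows @CocaCola” and 0 as “follows @Pepsi”), and the function is an illustration of the metric, not the evaluation code from the paper.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs for five users.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2]
print(auc_roc(labels, scores))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which is why a value around 80% after removing the obvious features is a meaningful signal.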

Cross-selling opportunities

Does the fact that you follow @CocaCola tell us something about your music preferences? Our preliminary results indicate a signal in that direction. We looked at the top features belonging to different categories of Twitter users and found interesting results. Figure 3 below compares the top features for @GOP and @TheDemocrats in the categories Music, News and Sports. On inspection, the lifestyle correlations for the political rivalry @GOP vs. @TheDemocrats make intuitive sense: for example, @nytimes is more popular among @TheDemocrats followers (The New York Times is generally perceived to have a liberal bias).

Figure 3: Interesting observations from the top features in different categories. These confirm stereotypes, such as the country singer Kenny Chesney (@kennychesney) being more popular among @GOP followers, whereas Lady Gaga (@ladygaga) enjoys more support from @TheDemocrats followers.

Community detection

We performed multidimensional scaling (MDS) using pairwise similarity scores obtained from the co-following features and observed some interesting results. The figures below show some of the MDS plots obtained. Most of the observed structure corresponds to musical genres. For example, Lil Wayne (@liltunechi), Chris Brown (@chrisbrown) and Drake (@drake) are rappers and are mapped close together, marked in red. The same holds for Snoop Dogg (@snoopdogg) and Kanye West (@kanyewest), marked in green, both of whom are hip hop artists. However, some surprising things also emerge, such as the relative closeness of “Weird Al” Yankovic (@alyankovic), famous for musical parody, and Yoko Ono (@yokoono), both marked in orange. Though they work in very different musical genres, both arguably appeal to an older, more educated audience. Similarly, in the case of German political parties, we see parties with similar ideologies close to each other.

A 2D MDS similarity map of popular musicians.

A 2D MDS similarity map of German political parties. Similarity measures are derived from their followers’ aggregated friends.

Note that MDS is a lossy embedding: even though two points appear close in the 2-dimensional plane, they might be far apart in the original high-dimensional space. Therefore, all conclusions and observations we derived from such maps have also been validated using the high-dimensional similarity information.
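The note above can be illustrated with a small NumPy sketch that embeds a distance matrix in 2D and then measures how far the embedded distances drift from the original ones. It uses classical (Torgerson) MDS, which is an assumption chosen for brevity and not necessarily the variant behind the plots above. The helper names and the toy input are also hypothetical, and turning similarity scores into distances (for example as 1 minus the similarity) is a further assumption left outside the sketch.

```python
import numpy as np


def classical_mds(dist, dims=2):
    """Classical (Torgerson) MDS: embed a symmetric distance matrix in `dims` dimensions."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram (inner product) matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]   # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def max_distortion(dist, embedding):
    """Largest absolute gap between original and embedded pairwise distances."""
    return float(np.abs(pairwise_distances(embedding) - np.asarray(dist, dtype=float)).max())


# Toy input: four items that are all at distance 1 from each other.
# Such a configuration needs three dimensions, so a 2D map must distort it.
tetra = np.ones((4, 4)) - np.eye(4)
embedding = classical_mds(tetra, dims=2)
print(embedding.shape, max_distortion(tetra, embedding))
```

A non-zero distortion like the one reported here is the reason any cluster read off a 2D map should be checked against the original similarity scores.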

Read our paper for more details. The full version is also available on arXiv.

Posted by kiran
