Saturday, February 3, 2018

Twitter Blocking: Fast-Scan Social Algorithms

Summary Points:
  • Over 3100 people block my personal account on Twitter.
  • I created an algorithm to crawl Twitter and find people who block me.
  • Those people appear to be mainly aspiring authors and members of the anti-Trump "resistance."

When I go on Twitter, I'm often confronted with this view:

Trying to see the unavaible tweet, I click through and see this:

I am blocked by thousands of people on Twitter.  When I tell people about this publicly they often react in thinking I must be the worlds most massive troll (but I'm not).  But most of these "blockers" are accounts that I've never interacted with, they have essentially blocked me either categorically or because I'm part of a massive block list.

Blocking has become a major part of the user experience on Twitter for many reasons, and that's largely out of the scope of this blog. To understand how someone ends up blocked like me; you should understand two products:

  • Block together:  A program that gives users the ability to share block lists and otherwise categorically ban accounts.  
  • Twitter Block Chain: This chrome extension is (poorly named) used to block all users of a specific account.  For instance, it could be used to block anyone who follows @realDonaldTrump
After a day when I found a couple of random accounts blocking me.  I realized their was a Data Science angle, specifically:
  • Can I use an algorithm to scan Twitter and accounts that block me?
  • Can I optimize the algorithm with Machine Learning to predict accounts likely to block me and make my initial algorithm find blockers more quickly.
The answer to these questions ended up being "yes" and "yes."  Here I'll describe the results, first with a description of results and then a concept of the data science method.


To-date, I've used an algorithm to find about 3100 people who block me on twitter.  Releasing the full list would seem to be doxxing-however the public nature of which "verified" accounts block me seems less problematic.  Here's a listing with evidence of some "celebrity blocks":  

TL;DR version: A guy who invented the term "vice signaling," a liberal writer, another liberal writer, a standup comedian, and a former Star Trek actor.

Digging into the data around who blocks me, we can generally describe the nature of people who tend to block me by comparing how the words on their profiles compare to the words on the profiles of people who *don't* block me.  Below is a list of those words and their indication of risk.

A score of 1.0 deems that this word neither increases the probability that a user will block me, a score of 2.0 is twice as likely to block, whereas a score of 0.5 is half as likely to block.  There are two effective dimensions to the majority of people who block me:
  • People who are part of the liberal "resistance" to Donald Trump (which is a bit bizarre because I'm not a Trump supporter-though I was very anti-Bernie Sanders).
  • "Geeky" authors and writers-this is speaks to the Wil Wheaton theory of my epic blocked status (see below).
One outlier word is the term "lyme" which turned out on inspection to relate to people who believe they have "Chronic Lyme Disease," a condition covered by Evidence Based Medicine proponent Orac over on his blog.

And because I know everyone loves wordclouds (sarcasm), on a recent long social scan, I created wordclouds of people who block me.  Here's what that looks like, first looking at those who block me:

And those who do not:

Apparently, unbeknownst to me, I am loved by dog/pet lovers (potentially an artifact of sampling, rescan method-but few of the dog-lover accounts blocked me).


To begin my methodology, I started with a bit of a priori theory-specifically-what are the incidents that led to my blocking?  I'm blocked by people across the political spectrum, from conservative politicians in my own state to "resistance" members that are famous nationwide.. and an odd number of science fiction writers (I'm not a fan, don't care).  Here were my best theories of my own blocking:

  1. I was added to Wil Wheaton's block list after telling him to "Shutup Wesley" one too many times.
  2. I got into a few arguments that led to individual blocks with Bernie bros when I was pointing out some misconceptions they had about tax policy.

Wheaton's block list, as well as the block together App seemed the most likely scenario for wide-spread blocking.  


Figuring out that someone blocks you via the Twitter API is actually super easy. My initial scanner ran all inside of R, then I ported it to Python using the tweepy package.  The basic steps:

  1. Try to download a user timeline for your target user.
  2. Catch all errors. 
  3. Check to see if the returned error is code 136.

Here's what that looks like in general Python:

Note: This code is simply for a single user.  The entire application I wrote is a full iterative search-for-user, check block status, recursively search for more users, model, repeat application in Python with a Microsoft SQL Server backend that I am not publishing in whole at this time, for various reasons.


Knowing that our end goal here is to optimize an algorithm which will tell us which users to check for blocking, we first need to gather an informed set of accounts that will tell us which types of users block us.  Using the a priori theory of why users may be blocking me, I turned to two places:
  1. The followers of the account for the app "BlockTogether"-this may create an artificially high incidence rate-but hey at least I know they follow me.
  2. Followers of Wil Wheaton-because I know he blocks me, and also shares his block list, so there's a fair chance that his followers would buy into this.
I used this strategy to get an initial subset of users to figure out what types of users might block me.  So I started there, using the Twitter API to download all the Twitter ID's associated with Wheaton's followers (reduced Python below-I'm showing the basic blocks in segments here for ease of presentation). 


Why would I even need to model my blockers? A few reasons:
  1. I don't have a list of all Twitter accounts, so I need to predict who's followers might block me at high rates.
  2. These checks rely on the Twitter API, which is rate limited, so it's of great advantage to rank-order accounts in terms of likelihood to block.
  3. Understanding which people block me gives me insight to the 'why'?
As such I began creating a model against Twitter profile data to determine which accounts are likely to block me.  First, what data elements will I have available?  The basic elements available are what is pulled from a profile scan of the Twitter API: 

  • Verified Account?
  • # of follows
  • # you follow
  • date created
  • #of favorites
  • #of statuses
  • Location
  • Name
  • Description (text description on your profile)
Most of these are fairly easy to analyze, with the exception of the text description (of course, the most rich sense of actual attitudinal data which is likely to predict who you may block). I can't push a text description directly into a ML model, so I took two strategies in creating numeric data from textual data: 
  1. Compute relative frequency between blockers and non-blockers of high incidence words, and then create binary dummies for all words with a skew (measured by binomial test) <.05
  2. Use a type of Natural Language Processing (NLP) to compute correlated topic models accross the description data; create a variable for each observed topic, and assign the probabilistic association for each user description to fill in the data for each topic.
Data was clean at this point and essentially ready for modeling.  Dependent variable was whether or not I was blocked, N was all accounts "checked" to date, independent variables are as listed above.

I tested several types of models (read: ran and evaluated them using R's caret package).  Extreme Gradient Boost trees ended up winning (though there wasn't a huge gain).  AUC = .85.  

Variable importance was interesting:

As expected, textual data was most predictive with several other items also providing important information to the prediction.  Note: many of these variables have complex interactive effects inside our XGBTree, so it's not a simple matter of "more followers==more likely to block."


What is the value of modeling this data?  Because the Twitter API is rate limited I can only check a certain number of accounts per day, so it's good for me to use my rate limit wisely.  By predicting which accounts are most likely to block me, I can simply prioritize those and thus increase my find rate.  Using this prioritization at first increased my positive block rate by about 3x, from about 0.1% to 0.3%.


Far above I covered where to start with a large number of account to check, deciding on followers of Wil Wheaton and the BlockTogether follower list.  But once I've checke all of these accounts where should I go?  Here I implemented two strategies:

  • I found a relationship between an individuals propensity to block and their followers propensity to block.  A quirk of the twitter API is that while blocked from seeing a user's timeline, I can still pull all of their followers.  So, I began iteratively pulling all the followers for each block I found.
  • I also found a releationship between the probability that you block me, and the incidence at which your followers will block.  As such, I began pulling the followers for all high-probability blocking accounts(as determined by model above), regardless of whether they actually blocked me.
  • There's an ability to find users via Twitter API using words on their profile, because I already know which words (Lyme, Geek) are associated with blocking me, I simple search Twitter for those words.
This iterative method can lead to a biased sample effect over time-so it's important that you also include some random accounts in your sample OR some that are intentionally in large deviation to your current pull.


In the end a few points to take away:

  1. The culture of Twitter has led to a situation where blocking is pervasive-especially for certain people who end up on block lists.
  2. I am blocked by many people on Twitter, and by gathering data about these individuals-I can certainly create a general profile of my blockers.
  3. Using Machine Learning, the Twitter API, Natural Language Processing, and other Data Science Technology, I can pull together a list of people who block me on Twitter.

As a personal aside, I don't care a lot that I'm blocked on Twitter, it doesn't appear to effect my user experience in a material way, as I am not blocked by any accounts I would normally follow and it's not an emotional issue for me.  I do find it troubling that the pervasive use of block lists is allowing many Americans to insulate themselves from conflicting points of view, including primarily blocking people with whom they have never interacted with in the past.


  1. After reading this blog i very strong in this topics and this blog really helpful to all. Data Science online Course India
    Data Science online Training

  2. Very nice post here and thanks for it .I always like and such a super contents of these post.Excellent and very cool idea and great content of different kinds of the valuable information's.
    Selenium Training in Chennai | Selenium Training in Bangalore | Selenium Training in Pune | Selenium online Training

  3. myTectra Placement Portal is a Web based portal brings Potentials Employers and myTectra Candidates on a common platform for placement assistance

  4. I simply wanted to write down a quick word to say thanks to you for those wonderful tips and hints you are showing on this site.
    apple service center chennai | ipod service center in chennai | apple iphone service center in chennai | apple service center chennai

  5. Amazing Article ! I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.
    Correlation vs Covariance
    Simple Linear Regression
    data science interview questions
    KNN Algorithm
    Logistic Regression explained

  6. Unless you've been living under a rock, look at these guys you have probably heard of Twitter. It is a rocking hot social networking site that seems to be to be used by almost everyone today.

  7. Expanding your focused on following on Twitter requires a touch of persistence and an all around considered procedure.

  8. yet subsequent to examining their requirements it turned out to be very evident that those potential customers didn't really have the foggiest idea why they required social organizations or SMM to produce online deals. online twitter download

  9. great post ! This was actually what i was looking for and i am glad to came here!
    F5 Load balancer Training
    SAP Ariba Training
    MuleSoft Training

  10. Clients need to purchase online due to their tight timetables and requesting cutoff times so in the event that they can get to your business on Twitter, they can have your business in the palm of their hand; this right where you need them!

  11. I'm without a doubt no master, yet having a healthy degree of information in utilizing Windows Movie Maker (or comparative video altering programming) can divert that video document directly from the camera into a wonderful, YouTube-prepared video. Get More Customers With Social Media

  12. This will likely be the most un-utilized of your more extensive abilities, however in any case it can help you in your social promoting positions. cheapest smm panel

  13. You should simply begin your own YouTube channel and start collaborating with your crowd!

  14. Anyway, what does a web-based media director really do? As you can likely tell at this point, the job of a web-based media administrator is assorted. agree with

  15. Very neat blog. Really looking forward to read more. Keep writing.
    ytmp3 converter

  16. Stupendous blog huge applause to the blogger and hoping you to come up with such an extraordinary content in future. Surely, this post will inspire many aspirants who are very keen in gaining the knowledge. Expecting many more contents with lot more curiosity further.

    Data Science Certification in Bhilai

  17. Extraordinary blog went amazed with the content that they have developed in a very descriptive manner. This type of content surely ensures the participants to explore themselves. Hope you deliver the same near the future as well. Gratitude to the blogger for the efforts.

    Data Science Training

  18. I am thankful for the article post. Really looking forward to read more. Much obliged.
    post photos on instagram from pc

  19. As advertisers and publicists, we highly esteem exactness. In the days of yore of showcasing and publicizing, we fixated on rating quantities of network programs, readership for print advancements, and conveyance achievement rates for regular postal mail. download instagram photo by link

  20. I truly appreciate just perusing the entirety of your weblogs. Just needed to educate you that you have individuals like me who value your work. Unquestionably an extraordinary post. Caps off to you! The data that you have given is exceptionally analytics courses in bhopal

  21. Furthermore you'll see a comparable pattern here for the females. Check out these rates of individuals who've watched YouTube recordings in the course of their life. It's not simply teens. targonca szállítás Debrecen Europa-Road Kft.

  22. Truly quite fascinating post. I was searching for this sort of data and delighted in perusing this one. Continue to post. Much obliged for science course in bhubaneswar

  23. This is a great post.I am going to write on this topic based on your writing.
    Would you please come to my blog and give me some advice? We look forward to your feedback
    english stories