New Study Proves that Twitter Users in China are Rare Birds

Steven Millward
10:29 am on Jan 7, 2013

We recently looked at some stats compiled by a Chinese netizen that showed a mere 18,164 Twitter users active in China. That bit of number-crunching inspired Jason Q. Ng, a keen analyst of social media in the country, to set out to corroborate – or disprove – last week’s figure, and thereby try to get a clearer picture of who’s on Twitter now in mainland China despite it being blocked here way back in 2009.

Another motivator for Jason was the sky-high figure of 35 million Twitter users in the nation that was last year put forward by market research firm GlobalWebIndex. That number sounded suspect to us and was crying out for a double-blast shooting-down.

For the new study, Jason opted for a different method of getting the data from the Twitter API than was used by @ooof last week. In a post last night on Jason’s Blocked on Weibo blog (also the name of an upcoming book of his, tackling censorship on the Twitter-esque Sina Weibo), he concluded:

According to the data I extracted, there are tens of thousands of Twitter users in China, not millions, a result that confirms @ooof’s finding and refutes GWI’s conclusion.

He added:

The exact numbers I and @ooof come up with may differ, and only Twitter itself would be able to best reveal how many Chinese Twitter users there are, but our independent results are likely within an order of magnitude to the actual number, unlike GWI’s result which is about 2,000 times greater than our calculations.
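That "about 2,000 times" gap is easy to check from the figures already quoted. A quick sketch in Python, assuming we compare GWI's 35 million against @ooof's count of 18,164 (the variable names are ours, for illustration only):

```python
# Figures quoted in this article; variable names are illustrative.
gwi_claim = 35_000_000   # GlobalWebIndex's claimed Twitter users in China
ooof_count = 18_164      # active users found by @ooof

# How many times larger GWI's figure is than @ooof's count
ratio = gwi_claim / ooof_count
print(round(ratio))  # 1927
```

So GWI's figure is a little over three orders of magnitude above @ooof's count, which is consistent with the "about 2,000 times" in the quote.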

Ultimately, Jason will venture to say only that “the number of active Twitter users in China is almost definitely between 10,000 and 100,000.” But that’s with him being super cautious about including less frequent users of the service.

The raw sample of Twitterers who posted in Chinese during his sampling time amounted to 43,784 individuals. That had to be boiled down to a sample of 608 whose location could be inspected manually. From that sample, 110 people were in the Beijing timezone and apparently located in mainland China. The resultant pie chart of Chinese-language tweets around the globe at that moment looks like this:

Chinese tweets around the world

Jason’s final and very conservative figure could be extrapolated to 8,000 Chinese Twitterers during his brief study period. He explains this tiny figure:

The primary reason why my number is so much lower than @ooof’s is because his data collection period appears to have lasted for a month, and thus he captured the more casual Chinese tweeter; otherwise, my percentages largely confirm his.
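The extrapolation to roughly 8,000 can be reproduced from the sample figures quoted above. A rough sketch in Python, assuming the 608 manually inspected accounts are representative of all 43,784 Chinese-language tweeters in the sample (the variable names are ours and do not come from Jason's spreadsheets):

```python
# Sample figures quoted in this article; variable names are illustrative.
raw_sample = 43_784   # accounts that tweeted in Chinese during the sampling window
inspected = 608       # accounts whose location was checked manually
in_mainland = 110     # inspected accounts apparently located in mainland China

# Share of inspected accounts in mainland China
mainland_share = in_mainland / inspected
print(f"{mainland_share:.1%}")  # 18.1%

# Scale that share up to the full raw sample (assumes the 608 are representative)
estimated_mainland_users = raw_sample * mainland_share
print(round(estimated_mainland_users))  # 7921
```

That comes to about 7,900, which rounds to the 8,000 cited for the brief study period.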

So China-based twitterers are indeed rare birds.

For the full methodology and even some spreadsheets, head to the link below.

(Source: Blocked on Weibo)

  • Kim

    Congratulations to Jason for the good work and the passion he puts into this:


    – Apparently GWI has run its research continuously since 2009, several times a year, and their results are always stable and actually increasing over the waves. How do you explain this?

    – It is a bit shallow to question an internationally accredited market research company using the Twitter real-time search, which is faulty and extremely limited.

    – The language search methodology doesn’t tell us much about users tweeting in English. I would expect a great number of them since Twitter is much more popular outside of China. It also doesn’t tell us anything about non-active users.

    – It may be just a typo... but were the tweets collected in Jan 2012?

    – Having worked with Twitter I can guarantee that they DON’T HAVE ANY IDEA of how many users they have in China!

    The conclusion is that this research gives us a hint of how many people tweet in Chinese in one day, but not an exhaustive overview of the possible number of Chinese users. GWI’s number might seem a bit high, but at least it is based on a solid methodology. If their numbers are actually wrong, it will be clear when their next research comes out.

  • Jason

    Hi Kim, thanks for the thoughtful response.

    1) Indeed, I’m sure GWI does great work in many sectors. But there’s no use denying it: they whiffed on this one. Maybe their methodology has been faulty all along and the results flawed since the beginning, and it only got noticed now with this incredible number. I’m no survey research expert, but I’d be happy to review the methodology and data if they release it.

    As for your other very smart questions:

    2) It certainly isn’t ideal to use the real time search, and as I mention in the paper, ideally someone with Firehose access would be able to download a fuller set of tweets over a longer period. But due to the rate limiting I faced, the search tool was my best option. As I simply wanted to confirm ooof’s number and disprove the 35 million one from GWI, I didn’t do as much error checking of more Chinese users’ timelines to check if their tweets made it into my dataset, but for the one user’s timeline I did examine, 11 of the 14 tweets he posted did make it into my data. Even a tweet I made got included. You’re welcome to give me the name of a user who tweeted in Chinese on Jan 3rd and I’ll tell you how many of their tweets I have. Overall, I’m very confident I’m not missing MILLIONS of tweets, which is what would have to be true if indeed there were 35 million users of the service in the past month (as GWI claims).

    3) I don’t understand your question. The question I tried to answer was how many Twitter users there are in China. Certainly, if I had Firehose access, I could’ve downloaded every tweet possible and then performed a lookup of every user’s timezone (or if I were Twitter, simply looked up the timezone of every registered user). However, this simply isn’t feasible for me or just about any researcher (someone with Firehose is more than welcome to try this), so I narrowed down the data to those tweeting in Chinese. You’re right, this loses those who tweet in English from China, but shrug, that’s the limitation of my approach. I’m telling you that there are almost definitely 10,000-100,000 active Twitter users in China who have posted something recently in Chinese. If you want to claim that there are millions of others tweeting in English from China, then so be it. I can’t deny that with my data, but anyone could tell you that that’s a foolish theory not even worth disproving when the number of Chinese users is already so much less. … As for non-active users, yes, I acknowledge that in my post. Our numbers demonstrate that 10-18,000 Chinese users posted something recently; if in fact you want to claim there are 35 million users total including those who post and listen, then 99.5% of them would have to have never posted based on our number, which flies in the face of data from multiple sources which put the number of non-posters at much less than that (including GWI, which puts it at 66%, which sounds quite reasonable and not that far out of line with Twitter’s officially published worldwide percentage).

    4) Good catch, I appreciate it! It should be 2013 and I’ve updated it. Sorry, it’s the new year’s typo. I caught them all in my post/tweets, but didn’t catch this one. Thanks!

    5) I think we laid out a method for them to derive how many users are from China. IP address info is flawed because of VPN and circumvention stuff, user provided data is flukey because of high non-response and the weird stuff people enter. Ooof’s decision to use timezone as a proxy for country is very clever and I think the best way toward figuring out what region a user is from. It’d be trivial for them to extract out this data from all their registered users.

    6) It doesn’t seem a bit high, it’s ASTRONOMICAL compared to the actual tweets present. And the best part is that the data confirms what everyone sees with their own eyes: no one in China really uses Twitter. I don’t have an axe to grind here, I don’t have a beef with Twitter, GWI, the Chinese gov, anyone; I just pull data and look at it. I realize my study isn’t ultra-rigorous, but I think it’s very solid and I explain ways to replicate it in order to make it airtight. Even so, I think it does enough to debunk the 35 million number. Sure, based on my data, once we adjust for all the factors you mention (folks who tweet in other languages, people who only use it to read other posts, etc) there’s a small statistical chance there are 200,000 or even 500,000 users–and if the survey said that, I’d shrug and say, maybe if China were the exception to lots of other published findings. But 35 million? That’s just not possible in any way, shape, or form based on the data we see. But thanks for the response and I hope this was useful in explaining what I did.

  • Kim

    Hi Jason,

    Thanks a lot for taking the time to reply to me in such a detailed and comprehensive way. I’m sure you do not have any vested interest in GWI, Twitter or anyone else, and neither do I.

    The whole thing seems a bit strained to me. I would definitely be on your side if someone using the same methodology as GWI had come out with different results. In that case we could all celebrate that GWI lied about Chinese Twitter/Facebook users. But until then, there is not much use in trying to prove them wrong based on “everyone know they are wrong”.

    I’ve been following the discussion around the Chinese number for a long time and my idea is that these findings do not fit someone’s agenda, and that’s why everyone is trying to find some issue with them.

    I’ve seen people quoting a study from Semiocast, a company that only tracks IP addresses, and for this reason they will never have the exact number of Chinese social users since they all use VPNs. Or even people quoting the Facebook ad tool as a relevant source of information. This is total NONSENSE. Obviously Facebook doesn’t know how many users they have in China for the same reason as above.

    I think their methodology is fully explained in their presentations

    and on their Blog

    and honestly I do not see any problem with that, even if I still think these numbers are too high.

    I genuinely appreciate your work and I think it’s necessary to question every study that doesn’t seem all right to us, and too many times journalists do not do that (I’m a journalist myself).

    This said, I do not understand why “Tech in Asia” gave space to your study when, by your own admission, it is not rigorous, while GlobalWebIndex, which is an internationally accredited market research company, has been questioned that much.

    Again, thanks and congratulations. Keep up the good work!


  • Jason

    Hi Kim, again, thoughtful response. I explain exactly how I ran my test, and anyone is capable of confirming it and, if they had Firehose, performing it at an even more rigorous level than I did. Looking through the two links you provided, I don’t see anything about their methodology (maybe you have to buy the report?) outside of some mentions of waves and basic demographic breakdowns. If they were willing to release information about their sampling methodology (strata, selection, etc), how many sites and where (city, location, etc) they surveyed at, who their surveyors were, the number of respondents engaged at each site, the response rate to the survey, response rate to specific questions (for instance how did they adjust for people who skipped certain questions), the skip pattern of the survey, how they handled missing responses, the Chinese wording of the questions, and so on, then we could more rigorously assess how they got their numbers. A few folks have engaged GWI on some of these questions and from what I’ve seen, none of the above questions were answered (if they have been, do let me know!). Until they are, no one will be able to critically engage with their survey because we can’t investigate what might have gone wrong. … As for your first criticism, no one is out arguing GWI is wrong because “everyone know they are wrong”. What I reported on is what the public data from Twitter says. If there were 35 million Chinese Twitter users, the tweets would be there. Looking at what data I could, the tweets just aren’t there. … Thanks again and if you’d like to continue this conversation, feel free to tweet me. All best.

  • Jason

    Ah, I now see some info about sampling methodology in your second link: “which is self-completion surveys in Mandarin (Simplified Chinese for Mainland China) served to an online panel run by an internationally accredited panel company. Furthermore, we have run seven waves of research in China, covering a representative sample of the internet users from within this panel.” The rest of my questions still stand.

  • Kim

    Thanks again for your answer.

    I think you should definitely try to ask them these questions. They are all really good and they can seriously give us a final and complete perspective of what went wrong (if something did).

    “A few folks have engaged GWI on some of these questions and from what I’ve seen, none of the above questions were answered”
    If you are talking about these questions, those are clearly nonsense and provocative, and no serious company would ever address them.

    Yours are really clever, and if you don’t want to ask them I can try to get in touch with them and see what they say (I’m a journalist after all).

    Hope to see some great work from you again!
