‘State of the Word’: What can you do with this WordPress survey data?

Matt Mullenweg, founding developer of WordPress, has published details of his 2011 ‘State of the Word’, an annual overview of the headlines, success stories, facts and figures relating to the use of the open-source web/blog tool. Part of this narrative was based on results from the first-ever WordPress user and developer survey, which got over 18,000 responses from all over the world, as mapped below:

There is a goldmine of potential insights contained within the survey data: juicy bait that Matt dangles to visualisation designers out there…

We know there’s more good stuff hidden in there and we’re open sourcing and releasing the raw information behind it. If you’re a researcher and would like to dig into the anonymized survey data yourself, you can grab it here. (Careful, it’s a 9MB CSV.)

So, anyone got time and/or interest to give it a go? I’m not launching a contest, no prizes, but if anyone does come up with some interesting visualisation designs let me know and I’ll stick them up on the site and, of course, share them back with the WordPress crew themselves.

Incidentally, here is the full video of Matt’s address:


Graham, September 14th, 2011 at 11:35 am

Not sure if anyone can help, but I’ve been playing with the data and see that for the question about the number of active WordPress sites built (in columns F and M in the spreadsheet), the numbers 40704, 40867 and 40579 appear frequently as answers. I haven’t worked out what these mean. Examples of other answers are “21 – 50”, “51 – 200” and “200 +”, so I’m hoping the strange numbers are the result of a mis-translation somewhere and that they can be mapped back to the correct answers.

Anyone have any ideas?

Andy Kirk, September 14th, 2011 at 11:45 am

Hi Graham
I’m guessing here, but I imagine these values were entered as text strings (i.e. the user bandings) and were then automatically converted by Excel into dates, which replaced each one with its underlying date serial number. So, if you put 40704 into Excel and format it as a date, it comes up as 10th June 2011. I’m in the UK, but in the US (from where the data would have come) somebody typing in 6-10, which looks like a logical banding, would see it automatically converted to that date. I’ve done the same with the others and they come up as 11-20 and 2-5. Does that make sense?
Thanks – looking forward to seeing your analysis
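
For anyone wanting to repair these values in bulk rather than one cell at a time, here is a minimal Python sketch of the same reasoning. It assumes Excel's default 1900 date system (where the effective epoch for modern dates is 30th December 1899) and a US-style month-day reading of the original band; the function name `serial_to_band` is my own invention for illustration and is not part of the survey data or any library.

```python
from datetime import date, timedelta

# Assumption: the spreadsheet uses Excel's default 1900 date system.
# Because Excel treats 1900 as a leap year, serial numbers for any
# date after February 1900 line up with an epoch of 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_band(serial: int) -> str:
    """Turn an Excel date serial back into the 'low-high' band it
    was presumably typed as, reading the date as month-day (US style).

    Hypothetical helper for illustration only.
    """
    d = EXCEL_EPOCH + timedelta(days=serial)
    return f"{d.month}-{d.day}"


if __name__ == "__main__":
    # The three odd values Graham spotted in columns F and M.
    for serial in (40704, 40867, 40579):
        print(serial, "->", serial_to_band(serial))
```

This only recovers bands that happen to look like valid month-day pairs, which is why answers such as “21 – 50” were left alone by Excel in the first place.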

Graham, September 14th, 2011 at 11:09 pm

Thanks for the help, Andy.

Here’s my first attempt: http://www.grahamvdr.co.za/scripting/wordpress/

It focuses on only a few of the questions, and I don’t know if it really reveals anything we couldn’t have guessed at. I did it mostly as a learning exercise for myself, as I’m very new to this game! Constructive feedback would be welcome.