This is a guest/cross-post by Jan Willem Tulp, the winner of my recent contest to win a full pass to the O’Reilly Strata conference. The conference is taking place this week and Jan has kindly offered to share a short summary on each day of the conference. You can find out more about Jan’s work via his blog and follow him on Twitter @JanWillemTulp.

Day 1 at O’Reilly Strata Conference: Data Bootcamp
Today was my first day at O’Reilly Strata Conference: a full day of tutorial sessions. The session I picked was the Data Bootcamp by Joseph Adler (LinkedIn), Hilary Mason (bit.ly), Drew Conway (New York University) and Jake Hofman (Yahoo!). The purpose of this bootcamp tutorial was to turn everybody in the room into data scientists through some real hands-on experience.
The tutorial kicked off with an introduction of the speakers and a general overview of the various aspects of working with data: getting data, cleaning data, uses of data-intensive applications, and much more. Then Drew gave an interactive introduction to visualizing data using Python and R. The audience had to produce a normal distribution of random numbers in R. Although some people managed to keep up with all the examples, many others struggled because libraries were missing, or simply because everything went pretty fast, at least for R and Python newbies like myself.
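To give an idea of the random-numbers exercise, here is a minimal sketch in Python rather than R, using only the standard library. It is not the code from the session: the function names `sample_normal` and `text_histogram`, the sample size and the fixed seed are all assumptions made for illustration.

```python
import random
import statistics

def sample_normal(n, mu=0.0, sigma=1.0, seed=42):
    """Draw n values from a normal distribution (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(mu, sigma) for _ in range(n)]

def text_histogram(values, bins=10, width=40):
    """Bucket the values into equal-width bins and render a text bar per bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid a zero step if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + i * step
        lines.append(f"{left_edge:8.2f} | " + "#" * round(width * c / peak))
    return counts, lines

samples = sample_normal(10_000)
counts, lines = text_histogram(samples)
print("\n".join(lines))
print("mean:", round(statistics.mean(samples), 3))
print("stdev:", round(statistics.stdev(samples), 3))
```

The printed bars should show the familiar bell shape, with the sample mean close to 0 and the standard deviation close to 1.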
Next, Jake gave a great introduction to image processing, in particular how to cluster images based on similar features (color, in our case). We used a k-means clustering algorithm to group similar images by color, and after that we classified images as either landscapes or head shots.
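The idea behind k-means can be sketched in a few lines of plain Python. This is not Jake's code: it assumes each image has already been reduced to a single average RGB color, and the six color values below are made up for illustration (three greenish "landscapes", three skin-toned "head shots").

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Component-wise mean of a non-empty list of RGB tuples."""
    n = len(points)
    return tuple(sum(channel) / n for channel in zip(*points))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean_point(members)
    return labels, centroids

# Hypothetical average colors, one per image.
colors = [
    (34, 139, 34), (46, 120, 50), (60, 150, 70),        # greenish
    (224, 172, 105), (230, 180, 120), (210, 160, 100),  # skin tones
]
labels, centroids = kmeans(colors, k=2)
print(labels)
```

Which cluster gets label 0 or 1 depends on the random starting centroids, so only the grouping is meaningful, not the label numbers themselves.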
After the break, Hilary took over with a great presentation on working with text data. She started with some basic examples of extracting data from web pages using command-line tools like curl and wget, and using Python with the BeautifulSoup library. After that we turned to the main example: ‘hacking’ a Gmail account and trying to get some valuable information out of it. Hilary showed us how to classify email using probability and statistics, and then Drew took over to show us how to visualize this data and turn it into network diagrams.
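One common way to classify text with probabilities is a naive Bayes classifier. Whether this matches exactly what Hilary showed is an assumption on my part; the sketch below is a generic version with add-one smoothing, and the class name, labels and training sentences are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of messages
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def log_score(self, text, label):
        # log P(label) + sum of log P(word | label), smoothed.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for word in tokenize(text):
            count = self.word_counts[label][word] + 1
            score += math.log(count / (total_words + len(self.vocab)))
        return score

    def classify(self, text):
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))

# Hypothetical training messages.
model = NaiveBayes()
model.train("meeting agenda for the project review", "work")
model.train("please review the quarterly report", "work")
model.train("dinner with family this weekend", "personal")
model.train("happy birthday see you at the party", "personal")

print(model.classify("project report review"))
print(model.classify("birthday dinner party"))
```

Working with log probabilities instead of multiplying raw probabilities avoids numeric underflow on longer messages.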
Last but not least, Joseph gave a talk about Big Data. This was not an interactive session. Joseph shared some of his knowledge and experience of working with big data at LinkedIn, and explained the basics of Map/Reduce, Hadoop, and why and when to start thinking about big data solutions like Hadoop.
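The basic Map/Reduce idea can be illustrated with the classic word count, here as a small in-memory Python sketch. It does not use Hadoop and is not taken from Joseph's talk; the function names are assumptions, and a real cluster would spread the map and reduce steps over many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

result = word_count(["big data", "big ideas"])
print(result)
```

Because each map call looks at a single document and each reduce call at a single key, both steps can run in parallel, which is what makes the pattern attractive for large data sets.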
Overall it was an interesting day, not least because I met some really great people. For me the Data Bootcamp was an especially inspirational tutorial with lots of ideas to try out on my own. For some people the tempo was a little too high, especially for those who had never programmed in R or Python before. And becoming a data scientist in just one day may be an illusion anyway. At least the tutorial gave me a good head start, lots of inspiration, and great insight into how the presenters approach working with data. So for me, this was a great and successful first day, and I’m looking forward to the next two days!
The source code and slides of the Data Bootcamp are available online at: https://github.com/drewconway/strata_bootcamp
