Prices in Search and Shiny

Posted on September 13, 2015 by Anthony

[Figure: Shiny based graph]

As a personal project, early last year I put together a simple scraper. I used it to monitor a set of search result pages and document the ads shown on them. On a number of these pages, the advertisers included prices in their ads. With about a year’s worth of this data on hand, now is as good a time as any to take a closer look at it, and a good excuse to put something together in Shiny.

The Data

The data used here comes from the ads on Google’s search result pages. It was collected by a scraper built with PHP and MySQL. The scraper monitored a list of search terms, archived each result page as HTML and stored the ads in the database. Archiving the pages as HTML turned out to be a good idea: Google changed its markup a few times over the course of the year, and some data had to be extracted from the archive after the fact.

Some processing was required to create the dataset used here. The display URLs in the ads needed cleaning before they could be used as a proxy for brand, and the price information had to be extracted from the headline and the ad copy. In this exercise, a price is generally defined as anything matching “\$[0-9,]+”, that is, a dollar sign followed by a run of digits and commas.

The Graphs

[Figure: Price distribution over time]

R is a very flexible tool. As well as being robust statistical analysis software, it can generate reports with packages like knitr and interactive visualisations with Shiny. That flexibility is why I used it to put together a visualisation of prices over time for a small selection of search queries. The code can be found in the GitHub repository priceTimeseries.

The Shiny application opens on a scatterplot of price against time, with one point per observation. The second tab displays the distribution of prices shown per brand, and the third is a boxplot of the distribution of prices by brand. The fourth tab is simply a table of summary statistics for each brand.
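The price definition in The Data section is simple enough to sketch in a few lines. The following Python is not the PHP or R code behind this project; it is a minimal sketch, assuming each ad is available as a headline string and an ad-copy string. It escapes the dollar sign (an unescaped "$" is an end-of-line anchor in most regex flavours), strips thousands separators and, like the stated pattern, keeps only the whole-dollar part of a price such as $19.99. The brand_from_display_url helper is a hypothetical stand-in for the display URL cleaning step, since the actual cleaning rules are not described here.

```python
import re
from urllib.parse import urlsplit

# Price definition used in this post: a dollar sign followed by digits and commas.
# "$" is escaped so it matches a literal dollar sign rather than acting as an anchor.
PRICE_PATTERN = re.compile(r"\$[0-9,]+")


def extract_prices(headline: str, adcopy: str) -> list[int]:
    """Return every whole-dollar price found in an ad's headline and ad copy."""
    prices = []
    for text in (headline, adcopy):
        for match in PRICE_PATTERN.findall(text):
            digits = match[1:].replace(",", "")  # drop "$" and thousands separators
            if digits:  # skip degenerate matches such as "$,"
                prices.append(int(digits))
    return prices


def brand_from_display_url(display_url: str) -> str:
    """Hypothetical cleaning rule: reduce a display URL to a bare host name."""
    url = display_url.strip().lower()
    if "://" not in url:
        url = "http://" + url  # urlsplit only recognises the host after a scheme
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host
```

For example, extract_prices("Laptops from $1,299", "Save $200 today. Was $1,499.") returns [1299, 200, 1499], and brand_from_display_url("www.Example.com/Laptops") returns "example.com".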
Each tab is constrained by the criteria selected in the left panel. The app is designed to communicate how pricing strategy in paid search changes over time across the competitors in a market.

So Why?

Shiny more or less turns R into a dashboard. It makes it possible to take the kinds of analysis that can be done in R and integrate them into something you can put in front of most people in a business. The example I put together essentially displays the data like a glorified spreadsheet, though that does not mean more interesting analysis can’t be presented this way.
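The tab behaviour described in this post, where every view is constrained by the left-panel criteria and the fourth tab summarises prices per brand, amounts to a filter followed by a group-by. The sketch below shows that logic in plain Python rather than the R used in the priceTimeseries repository. The Observation record, its fields and the queries/start/end filter inputs are hypothetical stand-ins for the app's data and inputs, and the choice of statistics (count, min, median, mean, max) is an assumption.

```python
import statistics
from dataclasses import dataclass
from datetime import date


@dataclass
class Observation:
    """One price seen in one ad (hypothetical record layout)."""
    seen_on: date
    query: str
    brand: str
    price: int


def summarise_by_brand(observations, queries=None, start=None, end=None):
    """Filter observations by the selected criteria, then summarise prices per brand.

    queries, start and end are hypothetical stand-ins for the left-panel inputs;
    None means "no constraint" for that input.
    """
    selected = [
        o for o in observations
        if (queries is None or o.query in queries)
        and (start is None or o.seen_on >= start)
        and (end is None or o.seen_on <= end)
    ]

    # Group the surviving prices by brand.
    by_brand = {}
    for o in selected:
        by_brand.setdefault(o.brand, []).append(o.price)

    # One row of summary statistics per brand, in alphabetical order.
    summary = {}
    for brand, prices in sorted(by_brand.items()):
        summary[brand] = {
            "n": len(prices),
            "min": min(prices),
            "median": statistics.median(prices),
            "mean": statistics.mean(prices),
            "max": max(prices),
        }
    return summary
```

Applying the filter once and summarising only the selected subset means every view derived from it sees the same observations, which matches the way each tab here is constrained by the same criteria.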