Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining

Description: Automated Data Collection with R by Simon Munzert, Christian Rubba, Dominic Nyhuis, Peter Meißner

FORMAT: Hardcover
LANGUAGE: English
CONDITION: Brand New

Publisher Description

A hands-on guide to web scraping and text mining for both beginners and experienced users of R.

* Introduces fundamental concepts of the main architecture of the web and databases and covers HTTP, HTML, XML, JSON, SQL.
* Provides basic techniques to query web documents and data sets (XPath and regular expressions).
* An extensive set of exercises is presented to guide the reader through each technique.
* Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management.
* Case studies are featured throughout along with examples for each technique presented.
* R code and solutions to exercises featured in the book are provided on a supporting website.

Author Biography

Simon Munzert, Christian Rubba, Peter Meißner, and Dominic Nyhuis are the authors of Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, published by Wiley.
Table of Contents

Preface

1 Introduction
  1.1 Case study: World Heritage Sites in Danger
  1.2 Some remarks on web data quality
  1.3 Technologies for disseminating, extracting, and storing web data
  1.4 Structure of the book

Part One: A Primer on Web and Data Technologies

2 HTML
  2.1 Browser presentation and source code
  2.2 Syntax rules
  2.3 Tags and attributes
  2.4 Parsing
3 XML and JSON
  3.1 A short example XML document
  3.2 XML syntax rules
  3.3 When is an XML document well formed or valid?
  3.4 XML extensions and technologies
  3.5 XML and R in practice
  3.6 A short example JSON document
  3.7 JSON syntax rules
  3.8 JSON and R in practice
4 XPath
  4.1 XPath--a query language for web documents
  4.2 Identifying node sets with XPath
  4.3 Extracting node elements
5 HTTP
  5.1 HTTP fundamentals
  5.2 Advanced features of HTTP
  5.3 Protocols beyond HTTP
  5.4 HTTP in action
6 AJAX
  6.1 JavaScript
  6.2 XHR
  6.3 Exploring AJAX with Web Developer Tools
7 SQL and relational databases
  7.1 Overview and terminology
  7.2 Relational Databases
  7.3 SQL: a language to communicate with Databases
  7.4 Databases in action
8 Regular expressions and essential string functions
  8.1 Regular expressions
  8.2 String processing
  8.3 A word on character encodings

Part Two: A Practical Toolbox for Web Scraping and Text Mining

9 Scraping the Web
  9.1 Retrieval scenarios
  9.2 Extraction strategies
  9.3 Web scraping: Good practice
  9.4 Valuable sources of inspiration
10 Statistical text processing
  10.1 The running example: Classifying press releases of the British government
  10.2 Processing textual data
  10.3 Supervised learning techniques
  10.4 Unsupervised learning techniques
11 Managing data projects
  11.1 Interacting with the file system
  11.2 Processing multiple documents/links
  11.3 Organizing scraping procedures
  11.4 Executing R scripts on a regular basis

Part Three: A Bag of Case Studies

12 Collaboration networks in the US Senate
  12.1 Information on the bills
  12.2 Information on the senators
  12.3 Analyzing the network structure
  12.4 Conclusion
13 Parsing information from semistructured documents
  13.1 Downloading data from the FTP server
  13.2 Parsing semistructured text data
  13.3 Visualizing station and temperature data
14 Predicting the 2014 Academy Awards using Twitter
15 Mapping the geographic distribution of names
  15.1 Developing a data collection strategy
  15.2 Website inspection
  15.3 Data retrieval and information extraction
  15.4 Mapping names
  15.5 Automating the process
16 Gathering data on mobile phones
  16.1 Page exploration
  16.2 Scraping procedure
  16.3 Graphical analysis
  16.4 Data storage
17 Analyzing sentiments of product reviews
  17.1 Introduction
  17.2 Collecting the data
  17.3 Analyzing the data
  17.4 Conclusion

References
General index
Package index
Function index

Details

ISBN-10: 111883481X
ISBN-13: 9781118834817
Author: Peter Meißner
Publisher: John Wiley & Sons Inc
Imprint: John Wiley & Sons Inc
Format: Hardcover
Series: Wiley Series in Computational and Quantitative Social Science
Place of Publication: New York
Country of Publication: United States
DEWEY: 006.312
Subtitle: A Practical Guide to Web Scraping and Text Mining
Short Title: AUTOMATED DATA COLL W/R
Media: Book
Edition: 1st
Year: 2014
Publication Date: 2014-12-26
US Release Date: 2014-12-26
UK Release Date: 2014-12-26
AU Release Date: 2015-01-20
NZ Release Date: 2015-01-20
Language: English
Pages: 480
Audience: Professional & Vocational

We've got this

At The Nile, if you're looking for it, we've got it. With fast shipping, low prices, friendly service and well over a million items, you're bound to find what you want, at a price you'll love!

TheNile_Item_ID: 133082081

Price: 131.77 AUD

Location: Melbourne

End Time: 2024-11-22T03:05:23.000Z

Shipping Cost: 16.34 AUD