Perl & LWP

AUTHOR: Sean M. Burke


PUBLISHED: June 2002

ISBN: 0-596-00178-9

PAGES: 242

Review by John Lightsey on 01 Mar 2004.

I've been putting off this book for a long time, literally years. I started writing LWP-based spiders and web scrapers three years back, and since then I've told myself again and again that I'd get to this book eventually. The delay isn't hard to understand, really. LWP is incredibly easy to use, and there are only so many ways of scraping a website. Considering that I've made do without this book for three years, was I really missing something? I'd have to answer yes. Perl & LWP by Sean M. Burke does put a wide variety of useful information related to LWP in a single package.

What was most surprising was the limited amount of information in this book dedicated to listing LWP-related methods, members, and classes. Sure, chapters 2 through 4 cover this sort of information, but it's significantly trimmed to the raw essentials. Don't expect to find a listing of every goofy LWP-related project on CPAN. The author decided from the start to stick with the core modules and show how to use them properly and fully. It would have been very easy to crank out an LWP-related book by simply dumping the reference material available online about LWP. I've certainly purchased books in the past that have taken that route, and I have to applaud the author for deciding against it.

The real benefit of sticking to the basics sinks in when you hit chapter 5, which begins roughly a quarter of the way into the book. Chapter 5 goes into terrific detail about dealing with form input. Chapter 6 talks in detail about parsing HTML with regular expressions. Chapters 7 and 8 cover HTML parsing with the HTML::TokeParser tokenizer. Chapters 9 and 10 deal with parsing and modifying HTML with trees. Chapter 11 covers advanced topics like cookies and HTTP authentication. Chapter 12 wraps things up with a detailed look at spiders. It may sound like this collection of information has little to do with LWP, but the reality is that LWP is just a conduit for getting online content into a script so that it can be processed. Processing the data is the difficult part, the part that the LWP documentation on CPAN simply can't cover in the detail it requires.

The topics that fill this book cover nearly every aspect of LWP programming that I've found useful over the past few years. The only item I felt was missing was a discussion of dealing with JavaScript. Web pages that use JavaScript heavily can throw a monkey wrench into LWP programming, and it would have been nice to know what techniques the author uses to simplify dealing with these types of pages.

The appendices cover a list of LWP-related modules, HTTP response codes, MIME types, language encoding tags, content encoding tags, an ASCII table, and a short discussion of how object-oriented Perl modules are written. As I said before, you won't find a dump of methods, members, and classes anywhere in this book. That's what CPAN and Perl in a Nutshell are for. If you're looking for that information here, you'll be sorely disappointed.

So, the big question is: would I recommend this book to someone interested in programming LWP? You'd better believe it. This is a terrific place to get started with LWP programming for someone with a basic grasp of Perl.

Rating: 4/5