
Using WWW-Mechanize

Introduction

WWW::Mechanize is a Perl module for automating navigation of, and interaction with, websites. Because it is a subclass of LWP::UserAgent, its interface will be familiar to many Perl authors. Common uses include scraping and spidering websites and automating form submission.

Getting Started With WWW::Mechanize

Loading WWW::Mechanize works like loading and initializing any other Perl module. The use statement loads and compiles the WWW::Mechanize module. After that, create an instance by calling WWW::Mechanize->new().


use WWW::Mechanize;
my $m = WWW::Mechanize->new();
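
The constructor also accepts options. The following is a minimal sketch, assuming you want failed requests to die and want the script to identify itself with a custom user agent. The agent string TutorialBot/0.1 and the 30-second timeout are illustrative values, not recommendations.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck makes failed requests die instead of failing silently.
# agent and timeout are passed through to the LWP::UserAgent constructor.
my $m = WWW::Mechanize->new(
    autocheck => 1,
    agent     => 'TutorialBot/0.1',   # hypothetical user agent string
    timeout   => 30,                  # seconds to wait for a response
);

print "User agent: ", $m->agent, "\n";
```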

Once the object has been created, retrieve a URL with the get method. Use the content method to access the raw HTML of the retrieved page. Beyond the raw content, WWW::Mechanize provides a series of methods for interacting with the page.


my $url = 'http://www.google.com';
$m->get($url);
my $html = $m->content();
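
To check what get returned, the sketch below fetches a page and inspects the result. So that it runs without network access, it writes a small HTML file to a temporary directory and loads it through a file:// URL. The file name and page contents are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Create a throwaway HTML page (hypothetical content for this example).
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/index.html";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} '<html><head><title>Demo Page</title></head>',
            '<body><p>Hello</p></body></html>';
close $fh;

my $m = WWW::Mechanize->new(autocheck => 1);
$m->get(URI::file->new_abs($path)->as_string);

my $html = $m->content();             # raw HTML of the page
print "Fetched OK\n" if $m->success;  # true when the last request succeeded
print "Title: ", $m->title, "\n";     # contents of the <title> element
print "Length: ", length($html), " characters\n";
```

Against a real site, only the argument to get changes. The success and title calls work the same way.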

Links are found with the find_link method. There are several ways to specify the link: text, text regex, URL, URL regex, absolute URL, absolute URL regex, name, ID, class, and tag. When several criteria are passed, separated by commas, a link must match all of them. find_link returns a WWW::Mechanize::Link object, or undef if no link matches.


$m->find_link(text => 'string');
$m->find_link(text_regex => qr/regex/i);
$m->find_link(url => 'http://url/');
$m->find_link(url_regex => qr/domain/i);
$m->find_link(text => 'string', url => 'url');
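
As a concrete illustration, the sketch below runs a few find_link calls against a small local page. The page, its links, and the example.com URLs are invented for this example, and the file is loaded through a file:// URL so that no network access is needed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Write a hypothetical page containing three links.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/index.html";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><body>
<a href="about.html">About Us</a>
<a href="http://example.com/docs/">Documentation</a>
<a href="http://example.com/blog/">Blog</a>
</body></html>
HTML
close $fh;

my $m = WWW::Mechanize->new(autocheck => 1);
$m->get(URI::file->new_abs($path)->as_string);

# Exact text match; find_link returns undef when nothing matches.
my $about = $m->find_link(text => 'About Us');
print "About: ", $about->url, "\n" if $about;

# Both criteria must match for the link to be returned.
my $docs = $m->find_link(text_regex => qr/doc/i, url_regex => qr/example\.com/);
print "Docs: ", $docs->url_abs, "\n" if $docs;

# find_all_links takes the same criteria and returns every match.
my @external = $m->find_all_links(url_regex => qr/^http:/);
print scalar(@external), " external links\n";
```

The url method returns the link as written in the page, while url_abs resolves it against the page's base URL.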

Links are followed with the follow_link method, which returns an HTTP::Response object. The link to follow is specified with the same syntax as find_link.


$m->follow_link(text => 'string');
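
The following sketch shows follow_link moving from one page to another. It assumes two small local files, index.html and about.html, whose names and contents are invented for this example. They are loaded through file:// URLs so the script runs offline.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Two hypothetical pages: index.html links to about.html.
my $dir   = tempdir(CLEANUP => 1);
my %pages = (
    'index.html' => '<html><head><title>Home</title></head>'
                  . '<body><a href="about.html">About Us</a></body></html>',
    'about.html' => '<html><head><title>About</title></head>'
                  . '<body><p>About page</p></body></html>',
);
for my $name (keys %pages) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print {$fh} $pages{$name};
    close $fh;
}

my $m = WWW::Mechanize->new(autocheck => 1);
$m->get(URI::file->new_abs("$dir/index.html")->as_string);

# Follow the link by its text; the result is an HTTP::Response object.
my $response = $m->follow_link(text => 'About Us');
print "Status: ", $response->code, "\n";
print "Now at: ", $m->uri, "\n";      # URL of the page just loaded
print "Title:  ", $m->title, "\n";
```

With autocheck enabled, follow_link dies if no link matches. Without it, the call returns undef, so check the return value before using it.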

First Script

Now that the basics have been covered, it's time for an example script. This Perl script fetches the www.google.com page and finds the URL of the advanced search page.


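#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $url = 'http://www.google.com';
my $m   = WWW::Mechanize->new();    # create the object before calling get
$m->get($url);
my $link = $m->find_link(text => 'Advanced Search');
# Method calls are not interpolated inside double quotes, so pass the URL separately.
print "The Google advanced search URL is: ", $link->url(), "\n" if $link;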
