WWW::Mechanize Notes

Reading notes on perldoc WWW::Mechanize.

Overview
Constructor
PAGE-FETCHING METHODS
STATUS METHODS
CONTENT-HANDLING METHODS
LINK METHODS
IMAGE METHODS
FORM METHODS
MISCELLANEOUS METHODS
OVERRIDDEN LWP::UserAgent METHODS
DEPRECATED METHODS
INTERNAL-ONLY METHODS
WWW::MECHANIZE'S SUBVERSION REPOSITORY
OTHER DOCUMENTATION
ARTICLES ABOUT WWW::MECHANIZE
REQUESTS & BUGS

Overview

WWW::Mechanize is implemented as a subclass of LWP::UserAgent, so every LWP::UserAgent method is available.

There is also a WWW::Mechanize::FAQ.

Constructor

my $mech = WWW::Mechanize->new()
In addition to what the LWP::UserAgent constructor does, the WWW::Mechanize constructor sets up the User-Agent string and a cookie jar, as follows:
agent => "WWW-Mechanize/#.##"
cookie_jar => {}    # an empty, memory-only HTTP::Cookies object
To change the user agent:
my $mech = WWW::Mechanize->new( agent=>"wonderbot 1.01" );
To leave cookies disabled:
my $mech = WWW::Mechanize->new( cookie_jar => undef );
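
The constructor defaults described above can be checked without touching the network. The following is a minimal sketch, not taken from the perldoc. The variable names $default and $custom are only for illustration.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Default construction: Mech sets its own User-Agent string
# and an empty, memory-only cookie jar.
my $default = WWW::Mechanize->new();
print "default agent: ", $default->agent, "\n";
print "default jar:   ", ref( $default->cookie_jar ), "\n";

# Override the agent string and switch cookies off.
my $custom = WWW::Mechanize->new(
    agent      => "wonderbot 1.01",
    cookie_jar => undef,
);
print "custom agent:  ", $custom->agent, "\n";
print "custom jar:    ",
    ( defined $custom->cookie_jar ? "present" : "none" ), "\n";
```

Both agent() and cookie_jar() are inherited LWP::UserAgent accessors, which is why they work on a Mech object.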

In addition, the following parameters, which LWP::UserAgent does not have, can be set:

autocheck => [0|1]
onwarn => \&func
onerror => \&func
quiet => [0|1]
stack_depth => $value

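$mech->agent_alias( $alias ) :: sets the User-Agent to the string registered under a known alias. For example, the alias 'Windows IE 6' gives:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)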
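
A short sketch of agent_alias() in use, assuming the alias 'Windows IE 6' is the one that maps to the string shown above:

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# Replace the default "WWW-Mechanize/#.##" agent string
# with the one registered for this alias.
$mech->agent_alias('Windows IE 6');

print $mech->agent, "\n";
```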

PAGE-FETCHING METHODS

$mech->get( $url )
$mech->reload()
$mech->back()

STATUS METHODS

$mech->success()
$mech->uri()
$mech->response() / $mech->res()
$mech->status()
$mech->ct()
$mech->base()
$mech->forms()
$mech->current_form()
$mech->links()
$mech->is_html()
$mech->title()

CONTENT-HANDLING METHODS

If the content is not HTML, the following methods do nothing. (This is planned to change in the future.)
$mech->content(...)

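$mech->content( format => "text" ) :: returns the page as plain text, with the HTML markup stripped.

$mech->content( base_href => [$base_href|undef] ) :: returns the HTML with a <base href="$base_href"> tag added to the header.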
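
The two content() variants can be tried offline. This sketch makes several assumptions that are not in the perldoc: it writes a throwaway HTML file and loads it through a file:// URL, the page text and the http://example.com/ base are made up, and format => "text" needs HTML::TreeBuilder to be installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical test page, written to a temporary file.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} "<html><head><title>Demo</title></head>",
            "<body><p>Hello <b>world</b></p></body></html>\n";
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );

# Raw HTML, plain text, and HTML with a <base> tag added.
my $html      = $mech->content;
my $text      = $mech->content( format => 'text' );
my $with_base = $mech->content( base_href => 'http://example.com/' );

print "text:      $text\n";
print "with base: $with_base\n";
```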

LINK METHODS

$mech->links
$mech->follow_link(...)
$mech->find_link()

text => 'string' :: the link text matches 'string' exactly.

    $mech->find_link( text => "download" );

text_regex => qr/regex/ :: the regular expression matches the link text.

    $mech->find_link( text_regex => qr/download/i );

url => 'string', url_regex => qr/regex/ :: the link's URL equals the string, or matches the regex.

url_abs => string, url_abs_regex => regex :: the link's absolute URL equals the string, or matches the regex.

name => string, name_regex => regex :: the link's name equals the string, or matches the regex.

tag => string, tag_regex => regex :: the tag the link came from equals the string, or matches the regex.

If the n criterion is not given alongside these, the default of 1 is used and the first match is returned.

Giving several criteria at once combines them with AND.

$mech->find_link( text => "News", url_regex => qr/cnn\.com/ );

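find_link returns a single WWW::Mechanize::Link object, or undef if nothing matches. To get every match, use find_all_links, which returns a list in list context and a reference to an array of WWW::Mechanize::Link objects in scalar context.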

$mech->find_all_links( ... )
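
The link criteria above can be exercised against a local page. This is a sketch under stated assumptions: the HTML, the link targets, and the use of a temporary file loaded through a file:// URL are all made up for illustration and do not come from the perldoc.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical page with three links.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html><head><title>Links</title></head><body>
<a href="http://www.cnn.com/news">News</a>
<a href="/files/a.tar.gz">download</a>
<a href="/files/b.tar.gz">Download mirror</a>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );

# Exact text match; n defaults to 1, so the first match is returned.
my $exact = $mech->find_link( text => 'download' );

# Regex match, asking for the second matching link.
my $second = $mech->find_link( text_regex => qr/download/i, n => 2 );

# Two criteria at once: both must hold (AND).
my $news = $mech->find_link( text => 'News', url_regex => qr/cnn\.com/ );

# Every matching link, as a list of WWW::Mechanize::Link objects.
my @all = $mech->find_all_links( text_regex => qr/download/i );

print "exact:  ", $exact->url,  "\n";
print "second: ", $second->url, "\n";
print "news:   ", $news->url,   "\n";
print "count:  ", scalar(@all), "\n";
```

The url() accessor returns the link as written in the page. url_abs() would resolve it against the page's base URL instead.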

IMAGE METHODS

$mech->images
$mech->find_image()
$mech->find_all_images( ... )

FORM METHODS

$mech->forms
$mech->form_number($number)
$mech->form_name($name)
$mech->field( $name, $value, $number )
$mech->field( $name, \@values, $number )
$mech->select($name, $value)
$mech->select($name, \@values)

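$mech->set_fields( $name => $value ... ) :: sets several fields of the current form at once. If more than one field has the same name, the first one found is set. To choose which of the same-named fields to set, pass an anonymous array holding the value and the field's number:

    $mech->set_fields( $name => [ 'foo', 2 ] );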

$mech->set_visible( @criteria )
$mech->tick( $name, $value [, $set] )
$mech->untick($name, $value)
$mech->value( $name, $number )
$mech->click( $button [, $x, $y] )
$mech->click_button( ... )

x => x, y => y :: click coordinates

$mech->submit() :: submits the page without clicking any button. It used to be a synonym for the call below, but no longer is.

    $mech->click("submit")
$mech->submit_form( ... )
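
The set_fields() behaviour for same-named fields, noted earlier in this list, can be sketched offline. The form, its name "demo", and the field names foo and bar are hypothetical, and the page is loaded from a temporary file through a file:// URL. The form is filled in but not submitted.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical form with two fields that share the name "foo".
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html><head><title>Form demo</title></head><body>
<form name="demo" action="/search" method="get">
  <input type="text" name="foo" value="">
  <input type="text" name="foo" value="">
  <input type="text" name="bar" value="">
</form>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );

# Make "demo" the current form, then set two fields in one call.
$mech->form_name('demo');
$mech->set_fields(
    bar => 'baz',               # plain value: first field named "bar"
    foo => [ 'second', 2 ],     # [ value, number ]: the second "foo"
);

printf "foo #1 = '%s'\n", $mech->value( 'foo', 1 );
printf "foo #2 = '%s'\n", $mech->value( 'foo', 2 );
printf "bar    = '%s'\n", $mech->value('bar');
```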

MISCELLANEOUS METHODS

$mech->add_header( name => $value [, name => $value... ] )
$mech->delete_header( name [, name ... ] )
$mech->quiet(true/false)
$mech->stack_depth($value)
$mech->save_content( $filename )

OVERRIDDEN LWP::UserAgent METHODS

redirect_ok()
$mech->request( $request [, $arg [, $size]])
$mech->update_html( $html )

DEPRECATED METHODS

Skipped.

INTERNAL-ONLY METHODS

Skipped.

WWW::MECHANIZE'S SUBVERSION REPOSITORY

http://svn.perl.org/modules/www-mechanize

OTHER DOCUMENTATION

Spidering Hacks, by Kevin Hemenway and Tara Calishain

Spidering Hacks from O'Reilly (<http://www.oreilly.com/catalog/spiderhks/>)
is a great book for anyone wanting to know more about screen-scraping
and spidering.

There are six hacks that use Mech or a Mech derivative:

 #21 WWW::Mechanize 101
 #22 Scraping with WWW::Mechanize
 #36 Downloading Images from Webshots
 #44 Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
 #64 Super Author Searching
 #73 Scraping TV Listings

The book was also positively reviewed on Slashdot:
<http://books.slashdot.org/article.pl?sid=03/12/11/2126256>

ONLINE RESOURCES

* WWW::Mechanize Development mailing list
    Hosted at Sourceforge, this is where the contributors to Mech
    discuss things.  <http://sourceforge.net/mail/?group_id=83309>

* LWP mailing list
    The LWP mailing list is at
    <http://lists.perl.org/showlist.cgi?name=libwww>, and is more
    user-oriented and well-populated than the WWW::Mechanize Development
    list.  This is a good list for Mech users, since LWP is the basis
    for Mech.

* WWW::Mechanize::Examples
    A random array of examples submitted by users, included with the
    Mechanize distribution.



ARTICLES ABOUT WWW::MECHANIZE

* <http://www.oreilly.com/catalog/googlehks2/chapter/hack84.pdf>
    Leland Johnson's hack #84 in Google Hacks, 2nd Edition is an
    example of a production script that uses WWW::Mechanize and
    HTML::TableContentParser. It takes in keywords and returns the
    estimated price of these keywords on Google's AdWords program.

* <http://www.perl.com/pub/a/2004/06/04/recorder.html>
    Linda Julien writes about using HTTP::Recorder to create
    WWW::Mechanize scripts.

* <http://www.developer.com/lang/other/article.php/3454041>
    Jason Gilmore's article on using WWW::Mechanize for scraping sales
    information from Amazon and eBay.

* <http://www.perl.com/pub/a/2003/01/22/mechanize.html>
    Chris Ball's article about using WWW::Mechanize for scraping TV
    listings.

* <http://www.stonehenge.com/merlyn/LinuxMag/col47.html>
    Randal Schwartz's article on scraping Yahoo News for images.  It's
    already out of date: He manually walks the list of links hunting
    for matches, which wouldn't have been necessary if the
    "find_link()" method existed at press time.

* <http://www.perladvent.org/2002/16th/>
    WWW::Mechanize on the Perl Advent Calendar, by Mark Fowler.

* <http://www.linux-magazin.de/Artikel/ausgabe/2004/03/perl/perl.html>
    Michael Schilli's article on Mech and WWW::Mechanize::Shell for the
    German magazine Linux Magazin.

Other modules that use Mechanize

Here are modules that use or subclass Mechanize.  Let me know of any
others:

* Finance::Bank::LloydsTSB
* HTTP::Recorder
    Acts as a proxy for web interaction, and then generates
    WWW::Mechanize scripts.

* Win32::IE::Mechanize
    Just like Mech, but using Microsoft Internet Explorer to do the
    work.

* WWW::Bugzilla
* WWW::CheckSite
* WWW::Google::Groups
* WWW::Hotmail
* WWW::Mechanize::Cached
* WWW::Mechanize::FormFiller
* WWW::Mechanize::Shell
* WWW::Mechanize::Sleepy
* WWW::Mechanize::SpamCop
* WWW::Mechanize::Timed
* WWW::SourceForge
* WWW::Yahoo::Groups

REQUESTS & BUGS

bug-tracking system
http://rt.cpan.org/
email
bug-WWW-Mechanize@rt.cpan.org
RT queue
http://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Mechanize

Contact: webadmin.itsumi@gmail.com. This page was created with muse.el.