NAME

Mojo::DOM - Minimalistic HTML/XML DOM parser with CSS selectors

SYNOPSIS

use Mojo::DOM;

# Parse
my $dom = Mojo::DOM->new('<div><p id="a">Test</p><p id="b">123</p></div>');

# Find
say $dom->at('#b')->text;
say $dom->find('p')->text;
say $dom->find('[id]')->attr('id');

# Walk
say $dom->div->p->[0]->text;
say $dom->div->children('p')->first->{id};

# Iterate
$dom->find('p[id]')->reverse->each(sub { say $_->{id} });

# Loop
for my $e ($dom->find('p[id]')->each) {
    say $e->{id}, ':', $e->text;
}

# Modify
$dom->div->p->last->append('<p id="c">456</p>');
$dom->find(':not(p)')->strip;

# Render
say $dom;

DESCRIPTION

Mojo::DOM is a minimalistic and relaxed HTML/XML DOM parser with CSS selector support. It will even try to interpret broken XML, so you should not use it for validation.

CASE SENSITIVITY

Mojo::DOM defaults to HTML semantics, which means all tags and attributes are lowercased and selectors need to be lowercase as well.

my $dom = Mojo::DOM->new('<P ID="greeting">Hi!</P>');
say $dom->at('p')->text;
say $dom->p->{id};

If XML processing instructions are found, the parser automatically switches into XML mode and everything becomes case sensitive.

my $dom = Mojo::DOM->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
say $dom->at('P')->text;
say $dom->P->{ID};

XML detection can also be disabled with the "xml" method.

# Force XML semantics
$dom->xml(1);

# Force HTML semantics
$dom->xml(0);
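
The difference between the two modes can be sketched as follows, assuming Mojolicious is installed. The same markup is parsed once with the default HTML semantics and once with XML semantics forced on; the variable names are illustrative only.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $markup = '<P ID="greeting">Hi!</P>';

# Default HTML semantics: tag and attribute names are lowercased on parse
my $html    = Mojo::DOM->new($markup);
my $html_id = $html->at('p')->{id};

# Forced XML semantics: names keep their case, so selectors must match it
my $xml       = Mojo::DOM->new->xml(1)->parse($markup);
my $xml_id    = $xml->at('P')->{ID};
my $lower_hit = $xml->at('p');    # no lowercase "p" element in XML mode

say $html_id;
say $xml_id;
say defined $lower_hit ? 'found' : 'not found';
```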

METHODS

Mojo::DOM inherits all methods from Mojo::Base and implements the following new ones.

all_contents

my $collection = $dom->all_contents;

Return a Mojo::Collection object containing all nodes in the DOM structure as Mojo::DOM objects.

all_text

my $trimmed   = $dom->all_text;
my $untrimmed = $dom->all_text(0);

Extract all text content from the DOM structure. Smart whitespace trimming is enabled by default.

# "foo bar baz"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->all_text;

# "foo\nbarbaz\n"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->all_text(0);

ancestors

my $collection = $dom->ancestors;
my $collection = $dom->ancestors('div > p');

Find all ancestors of this node matching the CSS selector and return a Mojo::Collection object containing these elements as Mojo::DOM objects.

All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.

# List types of ancestor elements
say $dom->ancestors->type;

append

$dom = $dom->append('<p>I ♥ Mojolicious!</p>');

Append HTML/XML fragment to this node.

# "<div><h1>Test</h1><h2>123</h2></div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')
  ->append('<h2>123</h2>')->root;

# "<p>Test 123</p>"
$dom->parse('<p>Test</p>')->at('p')->contents->first->append(' 123')->root;

append_content

$dom = $dom->append_content('<p>Hi!</p>');

Append an HTML/XML fragment to this element's content.

# "<div><h1>AB</h1></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->append_content('B')->root;

at

my $result = $dom->at('html title');

Find the first element matching the CSS selector and return it as a Mojo::DOM object, or return undef if none could be found. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.

# Find first element with "svg" namespace definition
my $namespace = $dom->at('[xmlns\:svg]')->{'xmlns:svg'};
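
Because "at" returns undef when nothing matches, chaining a method call directly onto a failed lookup dies. A minimal sketch of a guarded lookup, assuming Mojolicious is installed (the markup and the "(no match)" fallback are illustrative):

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><p id="a">Test</p><p id="b">123</p></div>');

# A selector that matches returns a Mojo::DOM object
my $hit = $dom->at('#b');

# A selector that matches nothing returns undef
my $miss = $dom->at('#missing');

# Check for undef before calling methods on the result
my $hit_text  = defined $hit  ? $hit->text  : '(no match)';
my $miss_text = defined $miss ? $miss->text : '(no match)';

say $hit_text;
say $miss_text;
```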

attr

my $attrs = $dom->attr;
my $foo   = $dom->attr('foo');
$dom      = $dom->attr({foo => 'bar'});
$dom      = $dom->attr(foo => 'bar');

This element's attributes.

# List id attributes
say $dom->find('*')->attr('id')->compact;

children

my $collection = $dom->children;
my $collection = $dom->children('div > p');

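Find all children of this element matching the CSS selector and return a Mojo::Collection object containing these elements as Mojo::DOM objects, similar to "find".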

# Show type of a random child element
say $dom->children->shuffle->first->type;

content

my $str = $dom->content;
$dom    = $dom->content('<p>I ♥ Mojolicious!</p>');

Return this node's content or replace it with HTML/XML fragment (for root and tag nodes) or raw content.

# "<b>Test</b>"
$dom->parse('<div><b>Test</b></div>')->div->content;

# "<div><h1>123</h1></div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('123')->root;

# "<p><i>123</i></p>"
$dom->parse('<p>Test</p>')->at('p')->content('<i>123</i>')->root;

# "<div><h1></h1></div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('')->root;

# " Test "
$dom->parse('<!-- Test --><br>')->contents->first->content;

# "<div><!-- 123 -->456</div>"
$dom->parse('<div><!-- Test -->456</div>')->at('div')
  ->contents->first->content(' 123 ')->root;
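
The getter and setter forms of "content" can be combined as in the following sketch, which assumes Mojolicious is installed and a version of Mojo::DOM that provides "content" as documented here. The variable names are illustrative.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><h1>Test</h1></div>');

# Getter: render everything inside the <div>
my $before = $dom->at('div')->content;

# Setter: replace what is inside the <h1> with a new fragment
$dom->at('h1')->content('<i>123</i>');

# The change is visible through the original object
my $after = $dom->at('div')->content;

say $before;    # <h1>Test</h1>
say $after;     # <h1><i>123</i></h1>
```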

contents

my $collection = $dom->contents;

Return a Mojo::Collection object containing the child nodes of this element as Mojo::DOM objects.

# "<p><b>123</b></p>"
$dom->parse('<p>Test<b>123</b></p>')->at('p')->contents->first->remove;

# "<!-- Test -->"
$dom->parse('<!-- Test --><b>123</b>')->contents->first;

find

my $collection = $dom->find('html title');

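Find all elements matching the CSS selector and return a Mojo::Collection object containing these elements as Mojo::DOM objects. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.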

# Find a specific element and extract information
my $id = $dom->find('div')->[23]{id};

# Extract information from multiple elements
my @headers = $dom->find('h1, h2, h3')->pluck('text')->each;
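
As a small sketch of working with the returned collection, assuming Mojolicious is installed: "each" in list context yields the matched elements, so ordinary Perl list operators such as map can be applied to them. The markup and variable names are illustrative.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new(
  '<ul><li id="x">One</li><li id="y">Two</li><li>Three</li></ul>');

# Only <li> elements that carry an id attribute
my $items = $dom->find('li[id]');

# "each" without a callback returns the elements as a list
my @ids   = map { $_->{id} } $items->each;
my $count = $items->size;

say "$count matches: @ids";    # 2 matches: x y
```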

match

my $result = $dom->match('html title');

Match the CSS selector against this element and return it as a Mojo::DOM object or return undef if it didn't match. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.

namespace

my $namespace = $dom->namespace;

Find this element's namespace.

# Find namespace for an element with namespace prefix
my $namespace = $dom->at('svg > svg\:circle')->namespace;

# Find namespace for an element that may or may not have a namespace prefix
my $namespace = $dom->at('svg > circle')->namespace;

next

my $sibling = $dom->next;

Return Mojo::DOM object for next sibling element or undef if there are no more siblings.

# "<h2>B</h2>"
$dom->parse('<div><h1>A</h1><h2>B</h2></div>')->at('h1')->next;
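
Since the method returns undef after the last sibling, it can drive a simple loop. A minimal sketch, assuming Mojolicious is installed (markup and variable names are illustrative):

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><h1>A</h1><h2>B</h2><h3>C</h3></div>');

# Start at the first heading and follow sibling elements until undef
my @texts;
my $node = $dom->at('h1');
while (defined $node) {
  push @texts, $node->text;
  $node = $node->next;
}

say join ',', @texts;    # A,B,C
```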

next_sibling

my $sibling = $dom->next_sibling;

Return Mojo::DOM object for next sibling node or undef if there are no more siblings.

# "456"
$dom->parse('<p><b>123</b><!-- Test -->456</p>')->at('b')
  ->next_sibling->next_sibling;

node

my $type = $dom->node;

This node's type, usually cdata, comment, doctype, pi, raw, root, tag or text.

parent

my $parent = $dom->parent;

Return Mojo::DOM object for the parent of this element or undef if it has no parent.

parse

$dom = $dom->parse('<foo bar="baz">test</foo>');

Parse HTML/XML document with Mojo::DOM::HTML.

# Parse UTF-8 encoded XML
my $dom = Mojo::DOM->new->charset('UTF-8')->xml(1)->parse($xml);

prepend

$dom = $dom->prepend('<p>Hi!</p>');

Prepend HTML/XML fragment to this element.

# "<div><h1>A</h1><h2>B</h2></div>"
$dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend('<h1>A</h1>')->root;

prepend_content

$dom = $dom->prepend_content('<p>Hi!</p>');

Prepend an HTML/XML fragment to this element's content.

# "<div><h2>AB</h2></div>"
$dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend_content('A')->root;
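
The four insertion methods differ only in where the fragment lands relative to the element. The following sketch, assuming Mojolicious is installed, applies all of them to the same element; the markup is illustrative.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><h2>B</h2></div>');
my $h2  = $dom->at('h2');

$h2->prepend('<h1>before</h1>');    # new sibling in front of <h2>
$h2->append('<h3>after</h3>');      # new sibling behind <h2>
$h2->prepend_content('A');          # inside <h2>, at the start
$h2->append_content('C');           # inside <h2>, at the end

my $out = "$dom";
say $out;    # <div><h1>before</h1><h2>ABC</h2><h3>after</h3></div>
```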

previous

my $sibling = $dom->previous;

Return Mojo::DOM object for previous sibling element or undef if there are no more siblings.

# "<h1>A</h1>"
$dom->parse('<div><h1>A</h1><h2>B</h2></div>')->at('h2')->previous;

previous_sibling

my $sibling = $dom->previous_sibling;

Return Mojo::DOM object for previous sibling node or undef if there are no more siblings.

# "123"
$dom->parse('<p>123<!-- Test --><b>456</b></p>')->at('b')
  ->previous_sibling->previous_sibling;

remove

my $old = $dom->remove;

Remove this element and return it as a Mojo::DOM object.

# "<div></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->remove->root;

replace

my $old = $dom->replace('<div>test</div>');

Replace this element with an HTML/XML fragment and return the replaced element as a Mojo::DOM object.

# "<div><h2>B</h2></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->replace('<h2>B</h2>')->root;

# "<div></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->replace('')->root;

root

my $root = $dom->root;

Return Mojo::DOM object for root node.

siblings

my $collection = $dom->siblings;
my $collection = $dom->siblings('div > p');

Find all sibling elements of this node matching the CSS selector and return a Mojo::Collection object containing these elements as Mojo::DOM objects. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.

# List types of sibling elements
say $dom->siblings->type;

strip

my $parent = $dom->strip;

Remove this element while preserving its content and return "parent".

# "<div>Test</div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->strip;
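
To contrast "strip" with "remove", the sketch below (assuming Mojolicious is installed) applies each to an identical document and renders the whole document afterwards rather than relying on the return value. Variable names are illustrative.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $markup = '<div><h1>Test</h1></div>';

# "remove" drops the element together with everything inside it
my $removed = Mojo::DOM->new($markup);
$removed->at('h1')->remove;

# "strip" drops only the tag and keeps its content in place
my $stripped = Mojo::DOM->new($markup);
$stripped->at('h1')->strip;

say "$removed";     # <div></div>
say "$stripped";    # <div>Test</div>
```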

tap

$dom = $dom->tap(sub {...});

Alias for "tap" in Mojo::Base.

text

my $trimmed   = $dom->text;
my $untrimmed = $dom->text(0);

Extract text content from this element only (not including child elements). Smart whitespace trimming is enabled by default.

to_string

my $str = $dom->to_string;

Render this node and its content to HTML/XML.

# "<b>Test</b>"
$dom->parse('<div><b>Test</b></div>')->div->b->to_string;

tree

my $tree = $dom->tree;
$dom     = $dom->tree(['root', [qw(text lalala)]]);

Document Object Model. Note that this structure should only be used very carefully since it is very dynamic.

type

my $type = $dom->type;
$dom     = $dom->type('div');

This element's type.

# List types of child elements
say $dom->children->pluck('type');

val

my $collection = $dom->val;

Extract values from button, input, option, select or textarea element and return a Mojo::Collection object containing these values. In the case of select, find all option elements it contains that have a selected attribute and extract their values.

# "b"
$dom->parse('<input name="a" value="b">')->at('input')->val;

# "c"
$dom->parse('<option value="c">Test</option>')->at('option')->val;

# "d"
$dom->parse('<option>d</option>')->at('option')->val;

wrap

$dom = $dom->wrap('<div></div>');

Wrap HTML/XML fragment around this node, placing it as the last child of the first innermost element.

# "<p>123<b>Test</b></p>"
$dom->parse('<b>Test</b>')->at('b')->wrap('<p>123</p>')->root;

# "<div><p><b>Test</b></p>123</div>"
$dom->parse('<b>Test</b>')->at('b')->wrap('<div><p></p>123</div>')->root;

# "<p><b>Test</b></p><p>123</p>"
$dom->parse('<b>Test</b>')->at('b')->wrap('<p></p><p>123</p>')->root;

# "<p><b>Test</b></p>"
$dom->parse('<p>Test</p>')->at('p')->contents->first->wrap('<b>')->root;

wrap_content

$dom = $dom->wrap_content('<div></div>');

Wrap HTML/XML fragment around this node's content, placing it as the last children of the first innermost element.

# "<p><b>123Test</b></p>"
$dom->parse('<p>Test</p>')->at('p')->wrap_content('<b>123</b>')->root;

# "<p><b>Test</b></p><p>123</p>"
$dom->parse('<b>Test</b>')->wrap_content('<p></p><p>123</p>');

xml

my $bool = $dom->xml;
$dom     = $dom->xml($bool);

Disable HTML semantics in parser and activate case sensitivity, defaults to auto detection based on processing instructions.

AUTOLOAD

In addition to the "METHODS" above, many child elements are also automatically available as object methods, which return a Mojo::DOM or Mojo::Collection object, depending on number of children. For more power and consistent results you can also use "children".

# "Test"
$dom->parse('<p>Test</p>')->p->text;

# "123"
$dom->parse('<div>Test</div><div>123</div>')->div->[1]->text;

# "Test"
$dom->parse('<div>Test</div>')->div->text;
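
Because the automatic methods return either a single object or a collection depending on how many children exist, "children" may be the safer choice when the count is not known in advance. A minimal sketch, assuming Mojolicious is installed (variable names are illustrative):

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div>Test</div><div>123</div>');

# "children" always returns a Mojo::Collection, whatever the number of matches
my $divs   = $dom->children('div');
my $first  = $divs->first->text;
my $second = $divs->[1]->text;    # collections are zero-indexed

say $first;     # Test
say $second;    # 123
```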

OPERATORS

Mojo::DOM overloads the following operators.

array

my @nodes = @$dom;

Alias for "contents".

# "<!-- Test -->"
$dom->parse('<!-- Test --><b>123</b>')->[0];

bool

my $bool = !!$dom;

Always true.

hash

my %attrs = %$dom;

Alias for "attr".

# "test"
$dom->parse('<div id="test">Test</div>')->at('div')->{id};

stringify

my $str = "$dom";

Alias for "to_string".
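
The hash and stringify overloads can be seen side by side in the following sketch, which assumes Mojolicious is installed; the markup and variable names are illustrative.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div id="test">Test</div>');
my $div = $dom->at('div');

# Hash dereference reads the element's attributes
my $id = $div->{id};

# String interpolation renders the node, just like to_string
my $rendered = "$div";
my $same     = $rendered eq $div->to_string ? 1 : 0;

say $id;          # test
say $rendered;    # <div id="test">Test</div>
```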

SEE ALSO

Mojolicious, Mojolicious::Guides, http://mojolicio.us.
