Web Page HTML Layout Analysis: BeautifulSoup (Document Parsing and Extraction in Python)

Introduction

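BeautifulSoup is a Python library for extracting data from HTML or XML files. Working on top of a parser, it provides idiomatic ways to navigate, search, and modify a document.
BeautifulSoup does not parse markup by itself: it hands the text to a parser such as Python's built-in html.parser, lxml, or html5lib, and exposes the result as a tree with convenient lookup methods. Using BeautifulSoup makes data extraction, and crawler development in general, faster to write.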
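To make "navigate, search, modify" concrete, here is a minimal sketch. The HTML snippet is a made-up fragment in the style of the examples later in this article, and it uses the built-in "html.parser" (an assumption chosen so that nothing beyond beautifulsoup4 is required; the article's own examples use 'lxml').

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, for illustration only.
html = (
    '<html><body>'
    '<p class="title"><b>The Dormouse\'s story</b></p>'
    '<a href="http://example.com/lacie" id="link2">Lacie</a>'
    '</body></html>'
)

# "html.parser" ships with Python, so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

# Navigate: walk down the tree by tag name.
title_text = soup.p.b.string

# Search: look up a tag by name and attribute value.
link = soup.find("a", id="link2")
href = link["href"]

# Modify: replace the tag's text and add an attribute in place.
link.string = "Lacie (edited)"
link["class"] = "sister"

print(title_text)
print(href)
print(link.get_text())
```

The same three kinds of operation (attribute-style navigation, find()/find_all() search, and in-place modification) are what the rest of the article explores in more detail.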

Installation

pip install beautifulsoup4 
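The examples in this article pass 'lxml' as the parser name, and lxml is a separate package that the command above does not install. The sketch below shows one way to cope with that: try the preferred parser and fall back to the built-in "html.parser" when it is missing. The helper name make_soup is hypothetical, not part of bs4.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="lxml"):
    """Parse markup with the preferred parser, falling back to html.parser.

    make_soup is an illustrative helper, not a bs4 API.
    """
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # bs4 raises FeatureNotFound when the requested parser is not installed.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p>hello</p>")
print(soup.p.string)
```

Different parsers can build slightly different trees from malformed markup, so for reproducible results it is usually better to install the parser you intend to use (for example with pip install lxml) than to rely on a fallback.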

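Usage

1 Building the document tree

BeautifulSoup parses a document by building a document tree, and that tree is made up of four kinds of objects: BeautifulSoup, Tag, NavigableString, and Comment.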
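As a minimal sketch of the four object types, the snippet below parses a shortened version of the "Dormouse" sample used in this article and checks the type of each piece. It uses "html.parser" instead of 'lxml' as an assumption, so that it runs with only beautifulsoup4 installed.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Shortened sample document; the <a> tag contains only an HTML comment.
html = (
    "<html><head><title>The Dormouse's story</title></head>"
    '<body><a href="http://example.com/elsie" id="link1"><!--Elsie--></a>'
    "</body></html>"
)

soup = BeautifulSoup(html, "html.parser")

# 1. BeautifulSoup: the whole parsed document.
print(type(soup).__name__)               # BeautifulSoup

# 2. Tag: an element, with a name and a dict of attributes.
print(type(soup.head).__name__)          # Tag
print(soup.a.name, soup.a.attrs["id"])   # a link1

# 3. NavigableString: the text inside a tag.
print(type(soup.title.string).__name__)  # NavigableString

# 4. Comment: a special NavigableString for <!-- ... --> content.
print(type(soup.a.string).__name__)      # Comment
print(isinstance(soup.a.string, NavigableString))  # True
```

Because Comment is a subclass of NavigableString, printing soup.a.string shows just "Elsie" without the comment markers. When extracting visible text, check for Comment explicitly if comments should be skipped.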


from bs4 import BeautifulSoup
html = """<div class="post js_watermark quill-editor" style="background-repeat: repeat; background-image: url(&quot;&quot;); background-size: 940px 222.942px;"><h1 class="title">版面分析——网页HTML解析 BeautifulSoup</h1><div class="group-info"><a href="https://wx.zsxq.com/dweb2/index/group/51112141255244"><span>来自:</span><span class="group-name">AiGC面试宝典</span></a></div><div class="author-info"><div class="author"><img src="https://images.zsxq.com/FpFYmnHpgmz5J4DicXxscPfi3GI2?e=2064038400&amp;token=kIxbL07-8jAj8w1n4s9zv64FuZZNEATmlU_Vm6zD:hS7fTOpUpCI18IU4GweitfivQIU=" alt="用户头像"><span class="nick-name">Just do it!</span></div><span class="date" id="article-date">2024年04月27日 14:30</span></div><div class="ql-snow"><div class="content ql-editor"><p><img src="https://article-images.zsxq.com/FsOmOdM3jIkLawUT9z7sEbkMZgpV"></p><p><img src="https://article-images.zsxq.com/FnbQkQK1pNTESbYjScR42_PrYb9E"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token 
hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、Tag对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head:{soup.head} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.name:{soup.head.name} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.attrs:{soup.head.attrs} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.head):{type(soup.head)} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>()</div><div 
class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3、Navigable String对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.title.string:{soup.title.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.title.string):{type(soup.title.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#4、Comment对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.a.string:{soup.a.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.a.string):{type(soup.a.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#5、结构化输出soup对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.prettify()=&gt;{soup.prettify()}"</span>)</div></div><p><img src="https://article-images.zsxq.com/FmlPl-0tw4xgHRqKTWm5F2R15YJq"></p><div class="ql-code-block-container"><div class="ql-code-block">type(soup):<span class="ql-token hljs-tag">&lt;class 'bs4.BeautifulSoup'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head:<span class="ql-token hljs-tag">&lt;head&gt;&lt;title&gt;</span>The Dormouse's story<span class="ql-token hljs-tag">&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.name:head</div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.attrs:{}</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.head):<span class="ql-token hljs-tag">&lt;class 
'bs4.element.Tag'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.title.string:The Dormouse's story</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.title.string):<span class="ql-token hljs-tag">&lt;class 'bs4.element.NavigableString'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.a.string:Elsie</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.a.string):<span class="ql-token hljs-tag">&lt;class 'bs4.element.Comment'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.prettify()=&gt;<span class="ql-token hljs-tag">&lt;html&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;head&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;title&gt;</span></div><div class="ql-code-block">   The Dormouse's story</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/title&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;/head&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;body&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="title"&gt;</span></div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;b&gt;</span></div><div class="ql-code-block">    The Dormouse's story</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/b&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="story"&gt;</span></div><div class="ql-code-block">   Once upon a time there were three little sisters; and their names were</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;</span></div><div class="ql-code-block">    <span 
class="ql-token hljs-comment">&lt;!--Elsie--&gt;</span></div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   ,</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;</span></div><div class="ql-code-block">    Lacie</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   and</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;</span></div><div class="ql-code-block">    Tillie</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   ;</div><div class="ql-code-block">and they lived at the bottom of a well.</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="story"&gt;</span></div><div class="ql-code-block">   ...</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-tag">&lt;/html&gt;</span></div></div><p><br></p><p><img src="https://article-images.zsxq.com/FtPX-qsEEgZYHos3AnyDni1jH6rn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span 
class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、向下遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.contents)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.children))</div><div 
class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.descendants))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、向上遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.parent.name,<span class="ql-token hljs-string">'\n'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.p.parents:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3、平行遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_next:'</span>,soup.a.next_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.next_siblings:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_nexts:'</span>,i)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_previous:'</span>,soup.a.previous_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.previous_siblings:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_previouss:'</span>,i)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/FuyJDzHROhQahkpBUh4jWRuaB-mo"></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token 
hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">[&lt;b&gt;The Dormouse<span class="ql-token hljs-string">'s story&lt;/b&gt;]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[&lt;b&gt;The Dormouse'</span>s story&lt;/b&gt;]</div><div class="ql-code-block">[&lt;b&gt;The Dormouse<span class="ql-token hljs-string">'s story&lt;/b&gt;, "The Dormouse'</span>s story<span class="ql-token hljs-string">"]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><span class="ql-token hljs-string">html</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[document]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_next: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: &lt;a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/lacie<span class="ql-token hljs-string">" id="</span>link2<span class="ql-token hljs-string">"&gt;Lacie&lt;/a&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts:  and</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: &lt;a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/tillie<span class="ql-token hljs-string">" id="</span>link3<span class="ql-token hljs-string">"&gt;Tillie&lt;/a&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a 
well.</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previous: Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previouss: Once upon a time there were three little sisters; and their names were</span></div></div><p><img src="https://article-images.zsxq.com/FtqWdNWSM0b8quez92lJ9SqPTK76"></p><p><span style="background-color: rgb(240, 240, 240); color: rgb(92, 92, 92);">代码</span></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a 
well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、find_all( )</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>))  <span class="ql-token hljs-comment">#检索标签名</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>,<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link1'</span>)) <span class="ql-token hljs-comment">#检索属性值</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>,class_=<span class="ql-token hljs-string">'sister'</span>)) </div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(text=[<span class="ql-token hljs-string">'Elsie'</span>,<span class="ql-token hljs-string">'Lacie'</span>]))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、find( )</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span 
class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link2'</span>))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3 、向上检索</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.find_parent().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.title.find_parents():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block">    </div><div class="ql-code-block"><span class="ql-token hljs-comment">#4、平行检索</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.head.find_next_sibling().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.head.find_next_siblings():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.title.find_previous_sibling())</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.title.find_previous_siblings():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div></div><p><img src="https://article-images.zsxq.com/FgdDcWod8Suvbq5UuGYLvXz0UI8R"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">[&lt;a <span 
class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">[&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">[&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span 
class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">F:\AwesomeRAG\tutorial\layout_analysis\html\tutorial\BeautifulSoup4\test3.py:<span class="ql-token hljs-number">24</span>: DeprecationWarning: The <span class="ql-token hljs-string">'text'</span> argument to find()-<span class="ql-token hljs-built_in">type</span> methods <span class="ql-token hljs-keyword">is</span> deprecated. Use <span class="ql-token hljs-string">'string'</span> instead.</div><div class="ql-code-block">  <span class="ql-token hljs-built_in">print</span>(soup.find_all(text=[<span class="ql-token hljs-string">'Elsie'</span>,<span class="ql-token hljs-string">'Lacie'</span>]))</div><div class="ql-code-block">[<span class="ql-token hljs-string">'Elsie'</span>, <span class="ql-token hljs-string">'Lacie'</span>]</div><div class="ql-code-block">&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;</div><div class="ql-code-block">&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;</div><div class="ql-code-block">body</div><div class="ql-code-block">head</div><div class="ql-code-block">html</div><div class="ql-code-block">[document]</div><div class="ql-code-block">body</div><div class="ql-code-block">body</div><div class="ql-code-block"><span class="ql-token hljs-literal">None</span></div></div><p><img 
src="https://article-images.zsxq.com/FiqTmlpR_fGE6pUZ8gCcdD9z1ao_"></p><div class="ql-code-block-container"><div class="ql-code-block">HTML标题:&lt;h&gt; &lt;/h&gt;</div><div class="ql-code-block">HTML段落:&lt;p&gt; &lt;/p&gt;</div><div class="ql-code-block">HTML链接:&lt;a href=<span class="ql-token hljs-string">'httts://www.baidu.com/'</span>&gt; this <span class="ql-token hljs-keyword">is</span> a link &lt;/a&gt;</div><div class="ql-code-block">HTML图像:&lt;img src=<span class="ql-token hljs-string">'Ai-code.jpg'</span>,width=<span class="ql-token hljs-string">'104'</span>,height=<span class="ql-token hljs-string">'144'</span> /&gt;</div><div class="ql-code-block">HTML表格:&lt;table&gt; &lt;/table&gt;</div><div class="ql-code-block">HTML列表:&lt;ul&gt; &lt;/ul&gt;</div><div class="ql-code-block">HTML块:&lt;div&gt; &lt;/div&gt;</div></div><p><img src="https://article-images.zsxq.com/FkTgptMBTLt2w7nUUKs13PNKkckn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><br></div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span 
class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'标签查找:'</span>,soup.select(<span class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'属性查找:'</span>,soup.select(<span class="ql-token hljs-string">'a[id="link1"]'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'类名查找:'</span>,soup.select(<span class="ql-token hljs-string">'.sister'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'id查找:'</span>,soup.select(<span class="ql-token hljs-string">'#link1'</span>))</div><div class="ql-code-block"><span 
class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'组合查找:'</span>,soup.select(<span class="ql-token hljs-string">'p #link1'</span>))</div></div><p><img src="https://article-images.zsxq.com/FjQkiig9fOl0Bd5qiCbyH4OddW50"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">标签查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">属性查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">类名查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token 
hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block"><span class="ql-token hljs-built_in">id</span>查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">组合查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div></div><p><img src="https://article-images.zsxq.com/FqZck0in441U4EYGi6KobKlS0emA"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> requests</div><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> os</div><div 
class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getUrl</span>(<span class="ql-token hljs-params">url</span>):</div><div class="ql-code-block">    <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">        read = requests.get(url)  </div><div class="ql-code-block">        read.raise_for_status()   </div><div class="ql-code-block">        read.encoding = read.apparent_encoding  </div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span> read.text    </div><div class="ql-code-block">    <span class="ql-token hljs-keyword">except</span> requests.RequestException:</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span> <span class="ql-token hljs-string">"Connection failed!"</span></div><div class="ql-code-block"> </div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getPic</span>(<span class="ql-token hljs-params">html</span>):</div><div class="ql-code-block">    soup = BeautifulSoup(html, <span class="ql-token hljs-string">"html.parser"</span>)</div><div class="ql-code-block">    </div><div class="ql-code-block">    all_img = soup.find(<span class="ql-token hljs-string">'ul'</span>).find_all(<span class="ql-token hljs-string">'img'</span>) </div><div class="ql-code-block">    <span class="ql-token hljs-keyword">for</span> img <span class="ql-token hljs-keyword">in</span> all_img:</div><div class="ql-code-block">        src = img[<span class="ql-token hljs-string">'src'</span>]  </div><div class="ql-code-block">        img_url = src</div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(img_url)</div><div class="ql-code-block">        root = <span class="ql-token hljs-string">"F:/Pic/"</span>   </div><div class="ql-code-block">        path = root + img_url.split(<span class="ql-token hljs-string">'/'</span>)[-<span class="ql-token 
hljs-number">1</span>]  </div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(path)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">if</span> <span class="ql-token hljs-keyword">not</span> os.path.exists(root):  </div><div class="ql-code-block">                os.mkdir(root)</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">if</span> <span class="ql-token hljs-keyword">not</span> os.path.exists(path):</div><div class="ql-code-block">                read = requests.get(img_url)</div><div class="ql-code-block">                <span class="ql-token hljs-keyword">with</span> <span class="ql-token hljs-built_in">open</span>(path, <span class="ql-token hljs-string">"wb"</span>) <span class="ql-token hljs-keyword">as</span> f:</div><div class="ql-code-block">                    f.write(read.content)</div><div class="ql-code-block">                    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"File saved successfully!"</span>)</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">else</span>:</div><div class="ql-code-block">                <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"File already exists!"</span>)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">except</span> (OSError, requests.RequestException):</div><div class="ql-code-block">            <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"Failed to download file!"</span>)</div><div class="ql-code-block"> </div><div class="ql-code-block"><span class="ql-token hljs-keyword">if</span> __name__ == <span class="ql-token hljs-string">'__main__'</span>:</div><div class="ql-code-block">   html_url=getUrl(<span class="ql-token 
hljs-string">"https://findicons.com/search/nature"</span>)</div><div class="ql-code-block">   getPic(html_url)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/Fh_dDSbuteEI_0ArnWoZrCFDRuvm"></p><div class="ql-code-block-container"><div class="ql-code-block">标签查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">属性查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">类名查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> 
href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block"><span class="ql-token hljs-built_in">id</span>查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">组合查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div></div></div></div><div class="milkdown-preview" style="display: none;"><p><img src="https://article-images.zsxq.com/FsOmOdM3jIkLawUT9z7sEbkMZgpV"></p><p><img src="https://article-images.zsxq.com/FnbQkQK1pNTESbYjScR42_PrYb9E"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's 
story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、Tag对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span 
class="ql-token hljs-string">f"soup.head:{soup.head} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.name:{soup.head.name} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.attrs:{soup.head.attrs} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.head):{type(soup.head)} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>()</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3、Navigable String对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.title.string:{soup.title.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.title.string):{type(soup.title.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#4、Comment对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.a.string:{soup.a.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.a.string):{type(soup.a.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#5、结构化输出soup对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.prettify()=&gt;{soup.prettify()}"</span>)</div></div><p><img src="https://article-images.zsxq.com/FmlPl-0tw4xgHRqKTWm5F2R15YJq"></p><div 
class="ql-code-block-container"><div class="ql-code-block">type(soup):<span class="ql-token hljs-tag">&lt;class 'bs4.BeautifulSoup'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head:<span class="ql-token hljs-tag">&lt;head&gt;&lt;title&gt;</span>The Dormouse's story<span class="ql-token hljs-tag">&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.name:head</div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.attrs:{}</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.head):<span class="ql-token hljs-tag">&lt;class 'bs4.element.Tag'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.title.string:The Dormouse's story</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.title.string):<span class="ql-token hljs-tag">&lt;class 'bs4.element.NavigableString'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.a.string:Elsie</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.a.string):<span class="ql-token hljs-tag">&lt;class 'bs4.element.Comment'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.prettify()=&gt;<span class="ql-token hljs-tag">&lt;html&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;head&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;title&gt;</span></div><div class="ql-code-block">   The Dormouse's story</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/title&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;/head&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;body&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="title"&gt;</span></div><div 
class="ql-code-block">   <span class="ql-token hljs-tag">&lt;b&gt;</span></div><div class="ql-code-block">    The Dormouse's story</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/b&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="story"&gt;</span></div><div class="ql-code-block">   Once upon a time there were three little sisters; and their names were</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;</span></div><div class="ql-code-block">    <span class="ql-token hljs-comment">&lt;!--Elsie--&gt;</span></div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   ,</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;</span></div><div class="ql-code-block">    Lacie</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   and</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;</span></div><div class="ql-code-block">    Tillie</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   ;</div><div class="ql-code-block">and they lived at the bottom of a well.</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="story"&gt;</span></div><div class="ql-code-block">   ...</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token 
hljs-tag">&lt;/html&gt;</span></div></div><p><br></p><p><img src="https://article-images.zsxq.com/FtPX-qsEEgZYHos3AnyDni1jH6rn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token 
hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、向下遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.contents)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.children))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.descendants))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、向上遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.parent.name,<span class="ql-token hljs-string">'\n'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.p.parents:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3、平行遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_next:'</span>,soup.a.next_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.next_siblings:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_nexts:'</span>,i)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token 
hljs-string">'a_previous:'</span>,soup.a.previous_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.previous_siblings:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_previouss:'</span>,i)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/FuyJDzHROhQahkpBUh4jWRuaB-mo"></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">[&lt;b&gt;The Dormouse<span class="ql-token hljs-string">'s story&lt;/b&gt;]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[&lt;b&gt;The Dormouse'</span>s story&lt;/b&gt;]</div><div class="ql-code-block">[&lt;b&gt;The Dormouse<span class="ql-token hljs-string">'s story&lt;/b&gt;, "The Dormouse'</span>s story<span class="ql-token hljs-string">"]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><span class="ql-token hljs-string">html</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[document]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_next: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: &lt;a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/lacie<span class="ql-token hljs-string">" id="</span>link2<span 
class="ql-token hljs-string">"&gt;Lacie&lt;/a&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts:  and</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: &lt;a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/tillie<span class="ql-token hljs-string">" id="</span>link3<span class="ql-token hljs-string">"&gt;Tillie&lt;/a&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previous: Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previouss: Once upon a time there were three little sisters; and their names were</span></div></div><p><img src="https://article-images.zsxq.com/FtqWdNWSM0b8quez92lJ9SqPTK76"></p><p><span style="background-color: rgb(240, 240, 240); color: rgb(92, 92, 92);">代码</span></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were 
three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、find_all( )</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>))  <span class="ql-token hljs-comment">#检索标签名</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>,<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link1'</span>)) <span class="ql-token 
hljs-comment">#检索属性值</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>,class_=<span class="ql-token hljs-string">'sister'</span>)) </div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(text=[<span class="ql-token hljs-string">'Elsie'</span>,<span class="ql-token hljs-string">'Lacie'</span>]))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、find( )</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link2'</span>))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3 、向上检索</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.find_parent().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.title.find_parents():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block">    </div><div class="ql-code-block"><span class="ql-token hljs-comment">#4、平行检索</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.head.find_next_sibling().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.head.find_next_siblings():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.title.find_previous_sibling())</div><div 
class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.title.find_previous_siblings():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div></div><p><img src="https://article-images.zsxq.com/FgdDcWod8Suvbq5UuGYLvXz0UI8R"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">[&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">[&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">[&lt;a <span class="ql-token hljs-keyword">class</span>=<span 
class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">F:\AwesomeRAG\tutorial\layout_analysis\html\tutorial\BeautifulSoup4\test3.py:<span class="ql-token hljs-number">24</span>: DeprecationWarning: The <span class="ql-token hljs-string">'text'</span> argument to find()-<span class="ql-token hljs-built_in">type</span> methods <span class="ql-token hljs-keyword">is</span> deprecated. 
Use <span class="ql-token hljs-string">'string'</span> instead.</div><div class="ql-code-block">  <span class="ql-token hljs-built_in">print</span>(soup.find_all(text=[<span class="ql-token hljs-string">'Elsie'</span>,<span class="ql-token hljs-string">'Lacie'</span>]))</div><div class="ql-code-block">[<span class="ql-token hljs-string">'Elsie'</span>, <span class="ql-token hljs-string">'Lacie'</span>]</div><div class="ql-code-block">&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;</div><div class="ql-code-block">&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;</div><div class="ql-code-block">body</div><div class="ql-code-block">head</div><div class="ql-code-block">html</div><div class="ql-code-block">[document]</div><div class="ql-code-block">body</div><div class="ql-code-block">body</div><div class="ql-code-block"><span class="ql-token hljs-literal">None</span></div></div><p><img src="https://article-images.zsxq.com/FiqTmlpR_fGE6pUZ8gCcdD9z1ao_"></p><div class="ql-code-block-container"><div class="ql-code-block">HTML标题:&lt;h&gt; &lt;/h&gt;</div><div class="ql-code-block">HTML段落:&lt;p&gt; &lt;/p&gt;</div><div class="ql-code-block">HTML链接:&lt;a href=<span class="ql-token hljs-string">'httts://www.baidu.com/'</span>&gt; this <span class="ql-token hljs-keyword">is</span> a link &lt;/a&gt;</div><div class="ql-code-block">HTML图像:&lt;img src=<span class="ql-token hljs-string">'Ai-code.jpg'</span>,width=<span class="ql-token hljs-string">'104'</span>,height=<span 
class="ql-token hljs-string">'144'</span> /&gt;</div><div class="ql-code-block">HTML表格:&lt;table&gt; &lt;/table&gt;</div><div class="ql-code-block">HTML列表:&lt;ul&gt; &lt;/ul&gt;</div><div class="ql-code-block">HTML块:&lt;div&gt; &lt;/div&gt;</div></div><p><img src="https://article-images.zsxq.com/FkTgptMBTLt2w7nUUKs13PNKkckn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><br></div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token 
hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'标签查找:'</span>,soup.select(<span class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'属性查找:'</span>,soup.select(<span class="ql-token hljs-string">'a[id="link1"]'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'类名查找:'</span>,soup.select(<span class="ql-token hljs-string">'.sister'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'id查找:'</span>,soup.select(<span class="ql-token hljs-string">'#link1'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'组合查找:'</span>,soup.select(<span class="ql-token hljs-string">'p #link1'</span>))</div></div><p><img src="https://article-images.zsxq.com/FjQkiig9fOl0Bd5qiCbyH4OddW50"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">标签查找: [&lt;a 
<span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">属性查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">类名查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token 
hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block"><span class="ql-token hljs-built_in">id</span>查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">组合查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div></div><p><img src="https://article-images.zsxq.com/FqZck0in441U4EYGi6KobKlS0emA"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> requests</div><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> os</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getUrl</span>(<span class="ql-token hljs-params">url</span>):</div><div class="ql-code-block">    <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">        read = requests.get(url)  </div><div class="ql-code-block">        read.raise_for_status()   </div><div class="ql-code-block">        read.encoding = read.apparent_encoding  </div><div class="ql-code-block">        <span class="ql-token 
hljs-keyword">return</span> read.text    </div><div class="ql-code-block">    <span class="ql-token hljs-keyword">except</span>:</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span> <span class="ql-token hljs-string">"连接失败!"</span></div><div class="ql-code-block"> </div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getPic</span>(<span class="ql-token hljs-params">html</span>):</div><div class="ql-code-block">    soup = BeautifulSoup(html, <span class="ql-token hljs-string">"html.parser"</span>)</div><div class="ql-code-block">    </div><div class="ql-code-block">    all_img = soup.find(<span class="ql-token hljs-string">'ul'</span>).find_all(<span class="ql-token hljs-string">'img'</span>) </div><div class="ql-code-block">    <span class="ql-token hljs-keyword">for</span> img <span class="ql-token hljs-keyword">in</span> all_img:</div><div class="ql-code-block">        src = img[<span class="ql-token hljs-string">'src'</span>]  </div><div class="ql-code-block">        img_url = src</div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(img_url)</div><div class="ql-code-block">        root = <span class="ql-token hljs-string">"F:/Pic/"</span>   </div><div class="ql-code-block">        path = root + img_url.split(<span class="ql-token hljs-string">'/'</span>)[-<span class="ql-token hljs-number">1</span>]  </div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(path)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">if</span> <span class="ql-token hljs-keyword">not</span> os.path.exists(root):  </div><div class="ql-code-block">                os.mkdir(root)</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">if</span> <span class="ql-token 
hljs-keyword">not</span> os.path.exists(path):</div><div class="ql-code-block">                read = requests.get(img_url)</div><div class="ql-code-block">                <span class="ql-token hljs-keyword">with</span> <span class="ql-token hljs-built_in">open</span>(path, <span class="ql-token hljs-string">"wb"</span>)<span class="ql-token hljs-keyword">as</span> f:</div><div class="ql-code-block">                    f.write(read.content)</div><div class="ql-code-block">                    f.close()</div><div class="ql-code-block">                    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"文件保存成功!"</span>)</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">else</span>:</div><div class="ql-code-block">                <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"文件已存在!"</span>)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">except</span>:</div><div class="ql-code-block">            <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"文件爬取失败!"</span>)</div><div class="ql-code-block"> </div><div class="ql-code-block"><span class="ql-token hljs-keyword">if</span> __name__ == <span class="ql-token hljs-string">'__main__'</span>:</div><div class="ql-code-block">   html_url=getUrl(<span class="ql-token hljs-string">"https://findicons.com/search/nature"</span>)</div><div class="ql-code-block">   getPic(html_url)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/Fh_dDSbuteEI_0ArnWoZrCFDRuvm"></p><div class="ql-code-block-container"><div class="ql-code-block">标签查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span 
class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">属性查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">类名查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block"><span class="ql-token hljs-built_in">id</span>查找: [&lt;a <span class="ql-token 
hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">组合查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div></div><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p></div><footer><div class="horizon-line"></div><img id="logo" src="/assets_dweb/logo@1x.png"><div class="text">知识星球</div><div class="horizon-line"></div></footer><div class="qrcode-container"><img class="qrcode" id="qrcode" src=""><div class="text-desc">扫码加入星球</div><div class="text-desc">查看更多优质内容</div></div><div id="qrcode-url">https://wx.zsxq.com/mweb/views/joingroup/join_group.html?group_id=51112141255244</div><input type="hidden" name="group_allow_copy" value="false"><input type="hidden" name="group_enable_watermark" value="true"><input type="hidden" name="member_id" value="111888182154422"><input type="hidden" name="member_name" value="wws"><input type="hidden" name="member_role" value="other"></div>
"""
# Note: the HTML above does not match the output shown below. This should not get in the way of understanding; substitute your own HTML when you run the code.

# 1. BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')
print(f"type(soup):{type(soup)}\n")

# 2. Tag object
print(f"soup.head:{soup.head} \n")
print(f"soup.head.name:{soup.head.name} \n")
print(f"soup.head.attrs:{soup.head.attrs} \n")
print(f"type(soup.head):{type(soup.head)} \n")

# 3. NavigableString object
print(f"soup.title.string:{soup.title.string} \n")
print(f"type(soup.title.string):{type(soup.title.string)} \n")

# 4. Comment object
print(f"soup.a.string:{soup.a.string} \n")
print(f"type(soup.a.string):{type(soup.a.string)} \n")

# 5. Pretty-printed output of the soup object
print(f"soup.prettify()=>{soup.prettify()}")

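The four object types can be checked on a small self-contained document. The sketch below is an illustration under a few assumptions: it uses a shortened version of the "Dormouse" sample HTML, and it uses the built-in `html.parser` instead of `lxml` so that no extra package is needed.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Shortened sample document (assumption: trimmed from the Dormouse example)
html = (
    "<html><head><title>The Dormouse's story</title></head>"
    '<body><p class="title"><b>The Dormouse\'s story</b></p>'
    '<a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>'
    "</body></html>"
)

# 1. BeautifulSoup object: the whole parsed document
soup = BeautifulSoup(html, "html.parser")

# 2. Tag object: an element with a name and an attribute dict
head = soup.head

# 3. NavigableString object: the text inside a tag
title_text = soup.title.string

# 4. Comment object: a special NavigableString for <!-- ... -->
a_text = soup.a.string

print(type(soup).__name__)
print(head.name, head.attrs)
print(type(title_text).__name__, title_text)
print(type(a_text).__name__, a_text)
```

Because `Comment` is a subclass of `NavigableString`, checking `isinstance(x, Comment)` is one way to avoid treating comment text as visible page text.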

2 Traversing the document tree

BeautifulSoup converts a document into a tree because a tree structure makes it easier to traverse the content and extract data from it.

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p ...>...</p>
...
</body>
</html>
"""
# 1. BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')
print(f"type(soup):{type(soup)}\n")

# 2. Downward traversal
print(soup.p.contents)            # direct children, as a list
print(list(soup.p.children))      # direct children, as an iterator
print(list(soup.p.descendants))   # all nested nodes, at any depth

# 3. Upward traversal
print(soup.p.parent.name)         # name of the direct parent
for i in soup.p.parents:          # every ancestor up to the document root
    print(i.name)

# 4. Sideways traversal
print('a_next:', soup.a.next_sibling)
for i in soup.a.next_siblings:    # all following siblings
    print('a_nexts:', i)
print('a_previous:', soup.a.previous_sibling)
for i in soup.a.previous_siblings:  # all preceding siblings
    print('a_previous:', i)

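The traversal attributes above can be tried on a compact document where the results are easy to count by hand. This is a minimal sketch: the one-line HTML is a hypothetical stand-in for the sample document, and `html.parser` is used so that the tree contains exactly the tags written in the string.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical compact document; no whitespace between tags, so there are
# no whitespace-only text nodes to account for.
html = (
    "<html><head><title>T</title></head><body>"
    '<p class="story">Once <a id="link1">Elsie</a>, '
    '<a id="link2">Lacie</a> and <a id="link3">Tillie</a>.</p>'
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")
p = soup.p

# Downward: .contents holds direct children, .descendants walks every level
child_count = len(p.contents)             # 4 text nodes + 3 <a> tags
descendant_count = len(list(p.descendants))

# Upward: .parents yields each ancestor, ending at the BeautifulSoup object
parent_names = [tag.name for tag in p.parents]

# Sideways: siblings include text nodes, so filter for Tag when needed
first_a = soup.a
previous_text = first_a.previous_sibling
following_ids = [s["id"] for s in first_a.next_siblings if isinstance(s, Tag)]

print(child_count, descendant_count)
print(parent_names)
print(repr(previous_text), following_ids)
```

Note that `next_sibling` and `previous_sibling` often return a text node rather than a tag, which is why the sibling loop filters with `isinstance(s, Tag)`.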

4 Searching the document tree

Search methods:

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p ...>...</p>
...
</body>
</html>
"""
# 1. BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')
print(f"type(soup):{type(soup)}\n")

# 2. find(): return the first match
print(soup.find('a'))             # first <a> tag
print(soup.find(id='link2'))      # first element whose id is "link2"

# 3. find_all(): return every match as a list
print(soup.find_all('a'))                    # search by tag name
print(soup.find_all('a', id='link1'))        # filter by attribute value
print(soup.find_all('a', class_='sister'))   # "class" is a Python keyword, so use class_
print(soup.find_all(string=['Elsie', 'Lacie']))  # "string" replaces the deprecated "text" argument

# 4. Upward search
print(soup.p.find_parent().name)
for i in soup.title.find_parents():
    print(i.name)

# 5. Sideways search
print(soup.head.find_next_sibling().name)    # .name is an attribute, not a method
for i in soup.head.find_next_siblings():
    print('next sibling:', i)
print(soup.title.find_previous_sibling())
for i in soup.title.find_previous_siblings():
    print('previous sibling:', i)

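A few more filter styles for `find()` and `find_all()` are sketched below on a self-contained document. The HTML is a hypothetical shortened version of the sample, `html.parser` is used to avoid the `lxml` dependency, and the `string=` argument assumes Beautiful Soup 4.4.0 or later.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical shortened sample document
html = (
    "<html><head><title>The Dormouse's story</title></head><body>"
    '<p class="story">'
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
    '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'
    '<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>'
    "</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match, or None when nothing matches
first_link = soup.find("a")
missing = soup.find("table")

# find_all() returns a list; keyword arguments filter on attributes
sisters = soup.find_all("a", class_="sister")
first_two = soup.find_all("a", limit=2)

# string= matches text nodes; a list means "any of these"
names = soup.find_all(string=["Lacie", "Tillie"])

# A compiled regular expression is matched against the attribute value
l_links = soup.find_all("a", href=re.compile(r"example\.com/l"))

print(first_link["id"], missing)
print(len(sisters), [a["id"] for a in first_two])
print(names, [a["id"] for a in l_links])
```

Since `find()` returns `None` on a miss, chaining calls such as `soup.find('ul').find_all('img')` is worth guarding when the page layout is not known in advance.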

5 CSS selectors

Pass a selector string to the select() method of a Tag or BeautifulSoup object to find tags with a CSS selector.
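The common selector forms (tag name, attribute, class name, id, and a combined descendant selector) can be sketched as follows. The HTML is a hypothetical shortened version of the sample document, and `html.parser` is assumed in place of `lxml`; `select()` itself relies on the `soupsieve` package, which recent `beautifulsoup4` releases install as a dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical shortened sample document
html = (
    "<html><head><title>The Dormouse's story</title></head><body>"
    '<p class="story">'
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
    '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'
    '<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>'
    "</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

by_tag = soup.select("a")                # tag name
by_attr = soup.select('a[id="link1"]')   # attribute value
by_class = soup.select(".sister")        # class name
by_id = soup.select("#link1")            # id
combined = soup.select("p #link1")       # #link1 anywhere inside a <p>

# select_one() returns only the first match (or None)
first_child_link = soup.select_one("p.story > a")

print("tag:", [a["id"] for a in by_tag])
print("attribute:", [a["id"] for a in by_attr])
print("class:", [a["id"] for a in by_class])
print("id:", [a["id"] for a in by_id])
print("combined:", [a["id"] for a in combined])
print("select_one:", first_child_link["id"])
```

`select()` always returns a list, even for an id selector, so `select_one()` is the more convenient call when a single element is expected.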

6 Example: scraping images
import os

import requests
from bs4 import BeautifulSoup


def getUrl(url):
    # Fetch the page and return its HTML text, or None if the request fails
    try:
        read = requests.get(url)
        read.raise_for_status()
        read.encoding = read.apparent_encoding
        return read.text
    except requests.RequestException:
        print("Connection failed!")
        return None


def getPic(html):
    soup = BeautifulSoup(html, "html.parser")
    # Collect every <img> inside the first <ul> on the page
    all_img = soup.find('ul').find_all('img')
    root = "F:/Pic/"
    for img in all_img:
        img_url = img['src']
        print(img_url)
        # Use the last path segment of the URL as the file name
        path = root + img_url.split('/')[-1]
        print(path)
        try:
            if not os.path.exists(root):
                os.mkdir(root)
            if not os.path.exists(path):
                read = requests.get(img_url)
                with open(path, "wb") as f:
                    f.write(read.content)
                print("File saved!")
            else:
                print("File already exists!")
        except (requests.RequestException, OSError):
            print("Failed to download file!")


if __name__ == '__main__':
    html_url = getUrl("https://findicons.com/search/nature")
    if html_url is not None:
        getPic(html_url)

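The parsing half of the scraper can be exercised without any network access. The sketch below is a hypothetical helper, not part of the original script: `image_targets`, the sample HTML, the base URL, and the `Pic` folder name are all assumptions made for illustration. It also resolves relative `src` values with `urljoin`, which the script above does not do.

```python
import os
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def image_targets(html, base_url, root):
    """Return (image URL, local path) pairs for <img> tags in the first <ul>."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("ul")
    if container is None:  # page has no <ul>: nothing to download
        return []
    targets = []
    # src=True keeps only <img> tags that actually carry a src attribute
    for img in container.find_all("img", src=True):
        img_url = urljoin(base_url, img["src"])  # resolve relative URLs
        # Take the file name from the URL path, ignoring any query string
        filename = os.path.basename(urlparse(img_url).path)
        targets.append((img_url, os.path.join(root, filename)))
    return targets


# Hypothetical page fragment used only for this illustration
sample = (
    '<ul><li><img src="/icons/leaf.png"></li>'
    '<li><img src="https://example.com/img/tree.png?size=64"></li>'
    '<li><img alt="no src"></li></ul>'
)
pairs = image_targets(sample, "https://example.com/search/nature", "Pic")
for url, path in pairs:
    print(url, "->", path)
```

Separating URL extraction from downloading keeps the parsing logic testable on a fixed HTML string, while the download loop only has to iterate over the returned pairs.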

