Cross-platform HTML parsing code

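A while ago I needed to parse HTML and searched online for a Delphi HTML parser, but found none that worked well. They all failed on complex HTML; the most common failure was HTML strings embedded in JavaScript.
I have not tried the commercial ones, so I cannot say how good they are.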

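So I wrote my own. So far I have not come across any HTML that it fails to parse. I am now releasing it for everyone to use; the only ones who lose out are the few foreign vendors of paid parsers.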

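The parser is interface-based, so object lifetimes manage themselves and you do not need to free anything. I recently added lookup using CSS selector syntax, so nodes can be selected the same way a CSS selector would select them.
It uses only the SysUtils unit, which avoids Classes, a unit that adds considerable size in newer Delphi versions. This also makes it reasonably portable across platforms.
It compiles with Delphi 7 through Delphi XE4.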
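
The self-managed lifetime mentioned above comes from Delphi's reference counting of interface variables. The following is a minimal, self-contained sketch of that mechanism and is not taken from the parser's source; INode and TNode are hypothetical names used only for illustration.

```delphi
program RefCountDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical interface, standing in for something like IHtmlElement.
  INode = interface
    function GetName: string;
  end;

  // TInterfacedObject supplies _AddRef/_Release and frees the object
  // when the reference count reaches zero.
  TNode = class(TInterfacedObject, INode)
  private
    FName: string;
  public
    constructor Create(const AName: string);
    destructor Destroy; override;
    function GetName: string;
  end;

constructor TNode.Create(const AName: string);
begin
  inherited Create;
  FName := AName;
end;

destructor TNode.Destroy;
begin
  Writeln('freed: ', FName);
  inherited;
end;

function TNode.GetName: string;
begin
  Result := FName;
end;

procedure UseNode;
var
  Node: INode; // interface variable: the compiler manages the reference count
begin
  Node := TNode.Create('div');
  Writeln('using: ', Node.GetName);
  // No Free call: when Node goes out of scope the count drops to zero
  // and TNode.Destroy runs.
end;

begin
  UseNode;
  Writeln('done');
end.
```

The caller only ever holds interface references, so there is no explicit Free anywhere. Whether the parser's own classes derive from TInterfacedObject or implement the counting themselves is not stated in the post.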

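Because it is interface-based, in theory it can also be used from C++ and VB if compiled into a DLL.
The interface declarations are as follows: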

  IHtmlElement = interface
    ['{8C75239C-8CFA-499F-B115-7CEBEDFB421B}']
    function GetOwner: IHtmlElement; stdcall;
    function GetTagName: WideString; safecall;
    function GetContent: WideString; safecall;
    function GetOrignal: WideString; safecall;
    function GetChildrenCount: Integer; stdcall;
    function GetChildren(Index: Integer): IHtmlElement; stdcall;
    function GetCloseTag: IHtmlElement; stdcall;
    function GetInnerHtml(): WideString; safecall;
    function GetOuterHtml(): WideString; safecall;
    function GetInnerText(): WideString; safecall;

    function GetAttributes(Key: WideString): WideString; safecall;

    function GetSourceLineNum(): Integer; stdcall;
    function GetSourceColNum(): Integer; stdcall;

    // Whether the attribute exists
    function HasAttribute(AttributeName: WideString): Boolean; stdcall;
    // Find nodes
    { FindElements('Link','type="application/rss+xml"')
      FindElements('','type="application/rss+xml"')
    }
    function FindElements(ATagName: WideString; AAttributes: WideString;
      AOnlyInTopElement: Boolean): IHtmlElementList; stdcall;
    // Find elements using CSS selector syntax
    function SimpleCSSSelector(const selector: WideString)
      : IHtmlElementList; stdcall;
    // Enumerate attribute names
    procedure EnumAttributeNames(AParam: Pointer;
      ACallBack: TEnumAttributeNameCallBack); stdcall;

    property TagName: WideString read GetTagName;
    property ChildrenCount: Integer read GetChildrenCount;
    property Children[index: Integer]: IHtmlElement read GetChildren; default;
    property CloseTag: IHtmlElement read GetCloseTag;
    property Content: WideString read GetContent;
    property Orignal: WideString read GetOrignal;
    property Owner: IHtmlElement read GetOwner;
    // Position of the element in the source code
    property SourceLineNum: Integer read GetSourceLineNum;
    property SourceColNum: Integer read GetSourceColNum;
    //
    property InnerHtml: WideString read GetInnerHtml;
    property OuterHtml: WideString read GetOuterHtml;
    property InnerText: WideString read GetInnerText;

    property Attributes[Key: WideString]: WideString read GetAttributes;
  end;

  IHtmlElementList = interface
    ['{8E1380C6-4263-4BF6-8D10-091A86D8E7D9}']
    function GetCount: Integer; stdcall;
    function GetItems(Index: Integer): IHtmlElement; stdcall;

    property Count: Integer read GetCount;
    property Items[Index: Integer]: IHtmlElement read GetItems; default;
  end;

function ParserHTML(const Source: WideString): IHtmlElement; stdcall;
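
A short usage sketch based only on the declarations above may help. Several things here are assumptions rather than facts from the post: the unit name HtmlParser is a guess (use whichever unit declares ParserHTML), the selector 'a' assumes plain tag-name selectors are supported, and passing False for AOnlyInTopElement assumes that it means "search the whole subtree".

```delphi
program ParseDemo;

{$APPTYPE CONSOLE}

uses
  HtmlParser; // assumed unit name, not confirmed by the post

const
  Html = '<html><head>' +
    '<link type="application/rss+xml" href="/feed.xml">' +
    '</head><body>' +
    '<a href="/a.html">first</a> <a href="/b.html">second</a>' +
    '</body></html>';

var
  Root: IHtmlElement;
  Found: IHtmlElementList;
  I: Integer;
begin
  // Parse the document; the returned interface keeps the tree alive.
  Root := ParserHTML(Html);

  // CSS-selector style lookup (assumes tag-name selectors work).
  Found := Root.SimpleCSSSelector('a');
  for I := 0 to Found.Count - 1 do
    Writeln(Found[I].Attributes['href'], ' -> ', Found[I].InnerText);

  // Tag plus attribute lookup, following the comment in the declaration.
  // False is assumed to mean "do not restrict to top-level children".
  Found := Root.FindElements('link', 'type="application/rss+xml"', False);
  for I := 0 to Found.Count - 1 do
    if Found[I].HasAttribute('href') then
      Writeln('feed: ', Found[I].Attributes['href']);
end.
```

Root and Found are interface variables, so nothing is freed explicitly. The methods declared safecall raise an exception on failure, so a real program would likely wrap the calls in try..except.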

Google Code SVN source:
http://code.google.com/p/delphi-html-parser/
or
htmlparser

Addendum, updated version:
https://www.raysoftware.cn/?p=443


202 responses to "Cross-platform HTML parsing code"

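  1. mark said:

    I ran a quick test and it really is impressive. It would be great if it could extract just the link, script, image, and iframe tags directly. If it skipped all the other tags, wouldn't it be much faster? It would not need to build a tree either; it could simply return a few THtmlElementList instances. Haha, just a suggestion. I'll see whether I can modify it myself!
    Thanks a lot!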

  2. mark said:

    Attribute parsing has a problem. For example, given src http://www.baidu.com/?abc=123&db=12
    the parser splits on the = signs and produces many attributes.

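  3. mark said:

    That is, when there are no double or single quotes!

    • admin said:

      An HTML src attribute needs at least an equals sign after it. The unquoted case has been fixed; you can pull the change from the Google Code SVN.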

  4. maqiang said:

    Great work, impressive.

  5. Anonymous said:

    Could a program be written to read the countdown for the 11选5 game on NetEase Lottery?

  6. 米汤 said:

    A small suggestion: could innertext insert a space for this kind of format?

  7. sparrow said:

    Is there any example source code?

  8. DTAMADE said:

    This is a good piece of work.

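  9. Edwin said:

    Hello, this HTML parser is very good. One request: besides the line and column numbers, could you add a property that returns the character position? Thanks.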

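  10. xiucai said:

    Hello, parsing the page http://guba.eastmoney.com/news,600000,25782963,d.html#storeply produces the following error:
    'LineNum:213无法找到Tag结束点:<b' (that is, "LineNum:213 cannot find the end of the tag: <b")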

  11. jacob said:

    I found that the error is triggered by one particular passage. After removing that passage, the page parses without problems!

  12. jacob said:

    I found that the error is triggered by a passage containing ?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office". After removing that passage, the page parses without problems!

  13. 梧桐 said:

    This parser works quite well. One question: when its various methods are called, how are the resources reclaimed and released?

  14. Vladimir said:

    Hello.
    Did you not update the library anymore?
    One developer was busy refining this library, but it causes errors at some points. Perhaps you have a solution: https://github.com/ying32/htmlparser/issues/4

Comments are closed.