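Some time ago I needed to parse HTML and went looking online for a Delphi HTML parser, but none of the ones I found worked well: they all failed on complex HTML. The most common failure was HTML embedded in string literals inside JavaScript.
I have not tried the commercial ones, so I can't say how good they are.
So I wrote my own, and so far I have not run into any HTML it fails to parse. I'm now releasing it for everyone to use. Tough luck for the few foreign vendors selling commercial parsers.
It is interface-based, so object lifetime is managed automatically and you don't have to worry about freeing anything. I recently added lookup with CSS selector syntax, so nodes can be selected the way a CSS selector would select them.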
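To illustrate what "lifetime is managed automatically" means for interface references in Delphi, here is a minimal, self-contained sketch. It does not use the parser at all: INode and TNode are hypothetical stand-ins, and TInterfacedObject is used only as a generic reference-counted base class. The parser's own implementation classes are not shown in this post, so this only demonstrates the general mechanism.

```pascal
program RefCountDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical interface, standing in for something like IHtmlElement.
  INode = interface
    function GetName: string;
  end;

  // TInterfacedObject supplies _AddRef/_Release reference counting.
  TNode = class(TInterfacedObject, INode)
  public
    destructor Destroy; override;
    function GetName: string;
  end;

destructor TNode.Destroy;
begin
  Writeln('TNode destroyed');
  inherited;
end;

function TNode.GetName: string;
begin
  Result := 'div';
end;

procedure UseNode;
var
  Node: INode;
begin
  Node := TNode.Create;   // reference count becomes 1
  Writeln(Node.GetName);
end;                      // Node goes out of scope, count drops to 0, Destroy runs

begin
  UseNode;
  Writeln('after UseNode');
end.
```

The destructor message is printed before "after UseNode" even though there is no call to Free anywhere. Code that holds an IHtmlElement in an interface variable relies on the same mechanism.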
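It only uses the SysUtils unit, which avoids Classes, a major contributor to executable size in newer Delphi versions. This also makes the code fairly portable across platforms.
It supports compilers from Delphi 7 through Delphi XE4.
Because it is interface-based, in theory it can also be used from C++ and VB if compiled into a DLL.
The interface declarations are as follows: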
  IHtmlElement = interface
    ['{8C75239C-8CFA-499F-B115-7CEBEDFB421B}']
    function GetOwner: IHtmlElement; stdcall;
    function GetTagName: WideString; safecall;
    function GetContent: WideString; safecall;
    function GetOrignal: WideString; safecall;
    function GetChildrenCount: Integer; stdcall;
    function GetChildren(Index: Integer): IHtmlElement; stdcall;
    function GetCloseTag: IHtmlElement; stdcall;
    function GetInnerHtml(): WideString; safecall;
    function GetOuterHtml(): WideString; safecall;
    function GetInnerText(): WideString; safecall;
    function GetAttributes(Key: WideString): WideString; safecall;
    function GetSourceLineNum(): Integer; stdcall;
    function GetSourceColNum(): Integer; stdcall;
    // Whether the attribute exists
    function HasAttribute(AttributeName: WideString): Boolean; stdcall;
    // Find elements
    {
      FindElements('Link','type="application/rss+xml"')
      FindElements('','type="application/rss+xml"')
    }
    function FindElements(ATagName: WideString; AAttributes: WideString;
      AOnlyInTopElement: Boolean): IHtmlElementList; stdcall;
    // Find elements using CSS selector syntax
    function SimpleCSSSelector(const selector: WideString): IHtmlElementList; stdcall;
    // Enumerate attribute names
    procedure EnumAttributeNames(AParam: Pointer;
      ACallBack: TEnumAttributeNameCallBack); stdcall;

    property TagName: WideString read GetTagName;
    property ChildrenCount: Integer read GetChildrenCount;
    property Children[index: Integer]: IHtmlElement read GetChildren; default;
    property CloseTag: IHtmlElement read GetCloseTag;
    property Content: WideString read GetContent;
    property Orignal: WideString read GetOrignal;
    property Owner: IHtmlElement read GetOwner;
    // Position of the element in the source code
    property SourceLineNum: Integer read GetSourceLineNum;
    property SourceColNum: Integer read GetSourceColNum;
    //
    property InnerHtml: WideString read GetInnerHtml;
    property OuterHtml: WideString read GetOuterHtml;
    property InnerText: WideString read GetInnerText;
    property Attributes[Key: WideString]: WideString read GetAttributes;
  end;

  IHtmlElementList = interface
    ['{8E1380C6-4263-4BF6-8D10-091A86D8E7D9}']
    function GetCount: Integer; stdcall;
    function GetItems(Index: Integer): IHtmlElement; stdcall;
    property Count: Integer read GetCount;
    property Items[Index: Integer]: IHtmlElement read GetItems; default;
  end;

  function ParserHTML(const Source: WideString): IHtmlElement; stdcall;
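A short usage sketch based only on the declarations above: it parses a small document with ParserHTML, selects elements with SimpleCSSSelector, and reads each one through TagName, HasAttribute, Attributes and InnerText. The unit name HtmlParser in the uses clause and the sample HTML are assumptions for illustration, so adjust them to match the downloaded source.

```pascal
program ParseDemo;
{$APPTYPE CONSOLE}

uses
  HtmlParser; // assumed unit name; use whichever unit declares ParserHTML

var
  Root: IHtmlElement;
  Found: IHtmlElementList;
  Item: IHtmlElement;
  I: Integer;
begin
  // Sample input, made up for this example.
  Root := ParserHTML(
    '<html><body>' +
    '<a href="http://example.com/">Example</a>' +
    '<img src="logo.png">' +
    '</body></html>');

  // A comma-separated selector group, as in the comment thread below.
  Found := Root.SimpleCSSSelector('a,img');

  for I := 0 to Found.Count - 1 do
  begin
    Item := Found[I]; // Items is the default property of IHtmlElementList
    Writeln('tag:  ', Item.TagName);
    if Item.HasAttribute('href') then
      Writeln('href: ', Item.Attributes['href']);
    if Item.HasAttribute('src') then
      Writeln('src:  ', Item.Attributes['src']);
    Writeln('text: ', Item.InnerText);
  end;
  // No Free calls: Root, Found and Item are interface references.
end.
```

Every result is returned as an interface reference (IHtmlElement or IHtmlElementList), so the caller never frees anything explicitly.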
Source code (Google Code SVN):
http://code.google.com/p/delphi-html-parser/
or
htmlparser
A newer version, added later:
https://www.raysoftware.cn/?p=443
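I ran a quick test and it is really impressive. It would be nice if it could directly extract only the link, script, image and iframe tags and skip all the others. Wouldn't that be much faster? It wouldn't need to build a tree either, just return a few THtmlElementList objects. Haha, it's only a suggestion. I'll see whether I can change it myself!
Thanks a lot, great work!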
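After parsing, you can just use the CSS selector method to find link, script, image and iframe.
So I can write it directly like this: FNodes.SimpleCSSSelector('script,a,img,iframe,frame');
Is that right?
Yes.
The syntax currently supported is described here:
http://www.cnblogs.com/webblog/archive/2009/07/07/1518274.html
Pseudo-classes are not supported yet.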
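Attribute parsing has a problem, for example with src http://www.baidu.com/?abc=123&db=12
A value like this gets split at every = sign into many attributes.
That is, when there are no double or single quotes around it!
In HTML, the src attribute must at least be followed by an equals sign. The unquoted case has been fixed; you can pull the fix from the Google Code SVN.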
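For readers curious about what handling an unquoted value involves, here is a minimal sketch. It is not the fix committed to the repository: ReadAttrValue is a hypothetical helper written for this example. It assumes the scanner is already positioned just after the = sign, and that an unquoted value runs until whitespace or > (so any = inside the value is kept).

```pascal
program UnquotedAttrDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper, not taken from the library: reads one attribute value
// starting at position P in S and advances P past it.
function ReadAttrValue(const S: WideString; var P: Integer): WideString;
var
  Quote: WideChar;
  Start: Integer;
begin
  Result := '';
  if P > Length(S) then
    Exit;
  if (S[P] = '"') or (S[P] = '''') then
  begin
    // Quoted value: everything up to the matching quote character.
    Quote := S[P];
    Inc(P);
    Start := P;
    while (P <= Length(S)) and (S[P] <> Quote) do
      Inc(P);
    Result := Copy(S, Start, P - Start);
    if P <= Length(S) then
      Inc(P); // skip the closing quote
  end
  else
  begin
    // Unquoted value: stop at whitespace or '>', never at '='.
    Start := P;
    while (P <= Length(S)) and (S[P] > ' ') and (S[P] <> '>') do
      Inc(P);
    Result := Copy(S, Start, P - Start);
  end;
end;

var
  Src: WideString;
  P: Integer;
begin
  // The text that follows "src=" in a tag, using the URL from the report above.
  Src := 'http://www.baidu.com/?abc=123&db=12 alt=x>';
  P := 1;
  Writeln(ReadAttrValue(Src, P)); // prints http://www.baidu.com/?abc=123&db=12
end.
```

The point is that = is a separator only between an attribute name and its value. Once the scanner is inside a value, only a closing quote, whitespace or > ends it.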
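Awesome, impressive work.
Could a program be written to read the countdown for NetEase Lottery's 11选5 game?
A small suggestion: for InnerText, could a space be added for this kind of markup?
Is there any example source code?
This is a great tool.
Hello, this HTML parser is very good. I have one request: besides the line and column numbers, could you add a property that returns the character position? Thanks.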
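Hello, when parsing the page http://guba.eastmoney.com/news,600000,25782963,d.html#storeply the following error occurs:
'LineNum:213无法找到Tag结束点:<b' (that is, the end of the tag could not be found)
I found that the error is triggered by a passage containing ?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office". After removing that passage, the page parses without problems!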
This parser is quite handy. I'd like to ask: when its various methods are called, how are the resources reclaimed and released?
Hello.
Are you no longer updating the library?
One developer was busy refining this library, but it causes errors at some points. Perhaps you have a solution: https://github.com/ying32/htmlparser/issues/4