Chad scraper (image post, sh.itjust.works)
dramaticcat@sh.itjust.works to Lemmy Shitpost@lemmy.world · 1 year ago · 98 comments
indepndnt@lemmy.world · 1 year ago: I’m down with scraping, but “parses HTML with regex” has got me fucked up.
257m@sh.itjust.works · 1 year ago: Relevant SO post. https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454
That was a great read. Thanks!
This is the way
tetelestia@lemmy.world · 1 year ago: What’s wrong with parsing HTML with regex?
GuybrushThreepwo0d@programming.dev · 1 year ago: Go and look it up on stack overflow
indepndnt@lemmy.world · 1 year ago: In short, it’s the wrong tool for the job. In practice, if your target is very limited and consistent, it’s probably fine. But as a general statement about someone’s behavior, it really sounds like someone is wasting a lot of time and regularly getting sub-par results.
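To make the "wrong tool" point concrete, here is a minimal sketch in Python using only the standard library. Nothing in it comes from the thread: the `SAMPLE` markup, the `NAIVE_HREF` pattern, and the `LinkCollector` class are all made up for illustration. It pulls link targets out of the same snippet twice, once with a naive regex and once with `html.parser`.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: one plain link, one link inside a comment,
# and one link that uses single quotes with href as the second attribute.
SAMPLE = """
<a href="/one">first</a>
<!-- <a href="/commented-out">hidden</a> -->
<a class="nav" href='/two'>second</a>
"""

# Naive pattern (an assumption about what a quick scraper might use):
# it expects href to be the first attribute and to be double-quoted.
NAIVE_HREF = re.compile(r'<a href="([^"]*)"')


def links_with_regex(html):
    """Return every href the naive pattern matches, comments included."""
    return NAIVE_HREF.findall(html)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags as the parser sees them."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


def links_with_parser(html):
    """Return hrefs found by html.parser, which skips comment contents."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print("regex :", links_with_regex(SAMPLE))
    print("parser:", links_with_parser(SAMPLE))
```

On this sample the regex picks up the commented-out link and misses the single-quoted one, while the parser returns the two links that are actually in the page. If the target markup really is limited and consistent, the regex version may be good enough, which matches the caveat above.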