Thanks for your in-depth reply! I know Scrapy, but I don't see any real added value in it compared to using BS4 + Requests.
Selenium is one option; spawning a nodeJS subprocess running nightmareJS is another. Using pyvirtualdisplay + xvfb to "give a head" to a headless browser works too, but what's the core purpose of a headless browser for client-side event automation? Automating things without having to do them manually, and rendering the DOM. And there's nothing wrong with a few print() statements while developing / debugging! Works just fine! ;-)
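Just to make the pyvirtualdisplay + xvfb idea concrete, here is a minimal sketch (not from the tutorial itself) assuming Selenium with the Firefox driver, plus xvfb and geckodriver installed; the URL is only a placeholder:

```python
# Sketch: run a "headed" browser inside a virtual X display (xvfb),
# so client-side JavaScript executes and the rendered DOM becomes scrapable.
# Assumes: pip install pyvirtualdisplay selenium, plus xvfb + geckodriver installed.
from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=0, size=(1280, 800))  # virtual screen, nothing pops up
display.start()

driver = webdriver.Firefox()
try:
    driver.get('https://example.com')   # placeholder URL
    html = driver.page_source           # DOM after client-side rendering
    print(driver.title)                 # a few print()s are fine for debugging ;-)
finally:
    driver.quit()
    display.stop()
```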
PS1: This was just part 1 of the web crawler mini-series. More will come very soon, be sure to follow along!
PS2: Since I'm treating my total Learn Python Series as an interactive Python book in the making (publishing episodes as tutorial parts via the Steem blockchain as we go), I must consider the order of subjects already discussed very carefully. Therefore I might not use the technical mechanisms I would normally use in a real-life software development situation. For example, I haven't yet explained using a mongoDB instance and interfacing with it via pymongo, which would of course be my go-to tool when developing a real-life web crawler. Instead, because I just explained "handling files" earlier this week (2 episodes ago), I will, for now, in part 2 of the web crawler mini-series, just use plain .txt files for intermediate data storage (since I haven't discussed CSV or JSON either). A quick sketch of that idea follows below.
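This isn't the actual part 2 code, just an illustration of the approach: intermediate crawl data (e.g. discovered URLs) gets appended to and re-read from a plain .txt file, one item per line. The file name and URLs are placeholders.

```python
# Sketch only: store intermediate crawler data (e.g. discovered URLs)
# in a plain .txt file, one item per line. File name is illustrative.
def append_urls(urls, path='found_urls.txt'):
    with open(path, 'a') as f:
        for url in urls:
            f.write(url + '\n')

def read_urls(path='found_urls.txt'):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

append_urls(['https://example.com/page1', 'https://example.com/page2'])
print(read_urls())
```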
See you around!
@scipio
I understand :)
I didn't know nightmareJS; it seems to have pretty nice tools. I'm going to give it a try.
Thanks!
@zerocoolrocker