Utilizing serverless and PaaS services is challenging. I don’t want to pay for a VM and just deploy the scraper on it, because I need the solution to be scalable. Secondly, I only want to pay for actual usage and not for a VM that sits idle.

I want to scrape certain websites twice a day. This frequency might change in the future, so I don’t want to have it hard-coded. I’m scraping ecommerce sites, and the pages that need to be scraped depend on a list of IDs coming from a database. Lastly, the output of the scraper has to be stored in a database. Later on I will have to develop a UI that exposes the information to ecommerce traders.

Web scraping comes in different shapes and sizes. Some packages just perform HTTP calls and evaluate the response. Others spin up an entire (headless) browser and perform actual DOM operations. Since I want to scrape different ecommerce sites, spinning up an actual browser looked like the way to go, also because lots of ecommerce sites rely heavily on JavaScript. Some are built as an SPA, and that by definition requires a browser-based approach.
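
To make the scheduling requirement a bit more concrete, here is a minimal sketch of how the scrape frequency and the page list could be kept out of the code. Everything named here is an assumption for illustration only: the `SCRAPES_PER_DAY` setting, the URL template, and the helper functions are hypothetical, and the product IDs would in practice come from the database rather than a literal list.

```python
import os
from datetime import datetime, timedelta
from typing import Iterable, Mapping, Optional


def scrape_interval(env: Mapping[str, str] = os.environ) -> timedelta:
    """Derive the time between runs from configuration.

    SCRAPES_PER_DAY is a hypothetical setting; it defaults to 2 (twice a day).
    """
    runs_per_day = int(env.get("SCRAPES_PER_DAY", "2"))
    if runs_per_day < 1:
        raise ValueError("SCRAPES_PER_DAY must be at least 1")
    return timedelta(days=1) / runs_per_day


def is_due(last_run: Optional[datetime], now: datetime, interval: timedelta) -> bool:
    """A site is due when it was never scraped or the interval has elapsed."""
    return last_run is None or now - last_run >= interval


def build_urls(template: str, product_ids: Iterable[int]) -> list[str]:
    """Expand a per-site URL template with the IDs read from the database."""
    return [template.format(id=pid) for pid in product_ids]


if __name__ == "__main__":
    interval = scrape_interval()
    # Hypothetical site template and IDs, standing in for database rows.
    if is_due(None, datetime.now(), interval):
        for url in build_urls("https://shop.example/product/{id}", [101, 102]):
            print("would scrape", url)
```

With this shape, changing the frequency is a configuration change rather than a code change, and a timer-triggered serverless function could call `is_due` on each invocation to decide whether there is any work to do.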