Draft:Crawlee

Crawlee
Developer(s)	Apify
Initial release	13 July 2022
Written in	TypeScript, Python
Operating system	Windows, macOS, Linux
Type	Web crawler
License	Apache License 2.0

Submission declined on 27 October 2024 by Reading Beans (talk).

This submission is not adequately supported by reliable sources. Reliable sources are required so that information can be verified. If you need help with referencing, please see Referencing for beginners and Citing sources.

Crawlee is a free and open-source web-crawling and browser automation library developed by Apify. The original TypeScript version was first released in 2022, with a Python version added in 2024.

Crawlee's architecture is built around modular crawlers responsible for extracting data from websites^[1]. The library follows a declarative programming approach, in which users define crawling logic as a structured set of rules. Crawlee manages requests with queues; for each request, a specific function is executed to extract data or perform further processing^[2].
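
The queue-and-handler pattern described above can be illustrated with a minimal, standard-library-only sketch. This is not Crawlee's API: the names `crawl`, `handle_page`, `run_demo`, and the in-memory `FAKE_SITE` mapping are hypothetical and exist only to show how a request queue feeds each URL to a handler function that may in turn enqueue further URLs.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical in-memory "website": URL -> (page title, outgoing links).
# A real crawler would fetch pages over HTTP or through a browser instead.
FAKE_SITE: Dict[str, Tuple[str, List[str]]] = {
    "https://example.com/": ("Home", ["https://example.com/a", "https://example.com/b"]),
    "https://example.com/a": ("Page A", ["https://example.com/b"]),
    "https://example.com/b": ("Page B", ["https://example.com/"]),
}

Enqueue = Callable[[Iterable[str]], None]
Handler = Callable[[str, Enqueue], None]


def crawl(start_urls: Iterable[str], handler: Handler) -> None:
    """Process URLs in FIFO order, calling `handler` once per unique URL."""
    queue: deque = deque()
    seen: set = set()

    def enqueue(urls: Iterable[str]) -> None:
        # Deduplicate so that each URL is queued at most once.
        for url in urls:
            if url not in seen:
                seen.add(url)
                queue.append(url)

    enqueue(start_urls)
    while queue:
        handler(queue.popleft(), enqueue)


def run_demo() -> List[Dict[str, str]]:
    """Crawl the fake site and return one record per visited page."""
    results: List[Dict[str, str]] = []

    def handle_page(url: str, enqueue: Enqueue) -> None:
        # Extract data from the "page", then queue the links found on it.
        title, links = FAKE_SITE[url]
        results.append({"url": url, "title": title})
        enqueue(links)

    crawl(["https://example.com/"], handle_page)
    return results


if __name__ == "__main__":
    for item in run_demo():
        print(item["url"], item["title"])
```

In this sketch the handler is the only place where extraction logic lives, while queueing and deduplication stay inside `crawl`; that separation is the idea the paragraph describes, although the actual library's interfaces differ.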

Crawlee supports both headless browser sessions (via Playwright and other browser automation software) and plain HTTP request-based scraping.

It also provides various web-scraping utilities, such as a sitemap parser^[3] and an automatic HTTP proxy manager.
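
As a rough illustration of what such utilities do, the following standard-library sketch parses a sitemap and rotates through a list of proxies in round-robin order. It does not use or reproduce Crawlee's own implementations: `parse_sitemap`, `ProxyRotator`, the sample sitemap, and the proxy URLs are all hypothetical, and round-robin is only one possible rotation strategy.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Iterable, List

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap used only for demonstration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def parse_sitemap(xml_text: str) -> List[str]:
    """Return the URLs listed in the <loc> elements of a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


class ProxyRotator:
    """Hand out proxy URLs in round-robin order (an assumed, simple strategy)."""

    def __init__(self, proxy_urls: Iterable[str]) -> None:
        proxies = list(proxy_urls)
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self) -> str:
        return next(self._cycle)


if __name__ == "__main__":
    rotator = ProxyRotator(["http://proxy-1.example:8000", "http://proxy-2.example:8000"])
    for url in parse_sitemap(SAMPLE_SITEMAP):
        # Pair each discovered URL with the proxy that would be used for it.
        print(url, "via", rotator.next_proxy())
```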

Projects that use Crawlee for web crawling include GPT Crawler by Builder.io^[4] and various generative AI projects maintained by AWS Labs^[5].

History

The first stable TypeScript version was released in 2021 under the name Apify SDK^[6]. This version offered both the open-source crawling framework and the proprietary storage implementation for use on the Apify platform.

In 2022, version 3.0.0 was released^[7], renaming the library to Crawlee. This update made Crawlee independent of the Apify platform, moving most of the Apify-specific features into a separate package (also named Apify SDK).

In 2024, a beta version of Crawlee for Python was released^[8].

References

[1] Koekemoer, Jakkie. "Web Scraping with Crawlee: Step-By-Step Tutorial". Bright Data.
[2] Nechytailo, Yelyzaveta. "Crawlee Tutorial: Easy Web Scraping and Browser Automation". oxylabs.io.
[3] "Release v3.7.0 · apify/crawlee". GitHub. Retrieved 22 September 2024.
[4] "BuilderIO/gpt-crawler: Crawl a site to generate knowledge files to create your own custom GPT from a URL". GitHub. Retrieved 21 September 2024.
[5] "awslabs/generative-ai-cdk-constructs: AWS Generative AI CDK Constructs are sample implementations of AWS CDK for common generative AI patterns". GitHub. Amazon Web Services - Labs. 20 September 2024. Retrieved 21 September 2024.
[6] "Release v1.0.0 · apify/crawlee". GitHub.
[7] "Release v3.0.0 · apify/crawlee". GitHub.
[8] "Announcing Crawlee for Python: Now you can use Python to build reliable web crawlers | Crawlee · Build reliable crawlers. Fast". crawlee.dev. 5 July 2024.
