Scrapy is finally 1.0!!!

Jun 29, 2015 • 2 minutes to read • Last Updated: Oct 25, 2017

When I started using Scrapy, I was totally blown away by how easily you could write a scraper and start scraping within minutes. If, like me, you started scraping recently, you are probably familiar with Scrapy v0.24. What is great about this version is that, after more than 7 years of testing, Pablo and his team have finally shipped their first production release.

Here are some points about the new release that you should be excited about if you are an existing Scrapy developer. Some of these code snippets can be found in the official release notes.

Python dictionaries instead of scrapy.Item

You know how you needed to import scrapy.Item and subclass it for your custom item classes? Well, in v1.0 you can forget about it all! Here's a code snippet that returns your item without importing or extending a scrapy Item class. (Item classes still work; they are just no longer required.)

No more doing this:

from scrapy import Item, Field

class CustomItem(Item):
    title = Field()
    url = Field()
    date = Field()

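Instead, do this in your Scrapy Spider class:

from scrapy import Spider


class CustomSpider(Spider):
    name = 'custom'  # every spider needs a unique name

    def parse(self, response):
        # A plain dict is a valid item in Scrapy 1.0
        return {
            # extract_first() returns the first match as a string, or None
            'title': response.css('h1::text').extract_first(),
            'url': response.url,
            'date': response.css('date::text').extract_first(),
        }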
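One practical consequence is that anything downstream of the spider, such as an item pipeline, may now receive a plain dict rather than an Item. Both support key access, so a pipeline written against keys should keep working either way. Here is a minimal sketch of that idea; the class name StripWhitespacePipeline and what it does to the fields are my own illustration, not something taken from the release notes.

```python
class StripWhitespacePipeline(object):
    """Hypothetical item pipeline: trims whitespace from text fields.

    It only relies on key access, so it should work whether the spider
    returns a plain dict or a scrapy.Item.
    """

    def process_item(self, item, spider):
        for key in list(item.keys()):
            value = item[key]
            # Duck-typed check so any string-like value is handled.
            if hasattr(value, "strip"):
                item[key] = value.strip()
        # A pipeline returns the item so that later stages can see it.
        return item
```

To try it in a project, you would list the class in the ITEM_PIPELINES setting under whatever module path your project uses.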

Custom spider settings

Remember how every setting had to go in your settings.py file, which made it hard to keep track of which settings applied when you were looking at a spider? Well, now you can choose to define settings in the Scrapy Spider itself.

import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'  # every spider needs a unique name
    custom_settings = {
        "DOWNLOAD_DELAY": 5.0,
        "RETRY_ENABLED": False,
    }
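It helps to know where these per-spider values sit relative to everything else. As far as I understand it, Scrapy gives each settings source a numeric priority (default 0, command 10, project 20, spider 30, cmdline 40) and the highest one wins, so custom_settings beats settings.py but can still be overridden with -s on the command line. The snippet below is only a toy model of that rule in plain Python; resolve_setting and the sample values are made up for illustration and are not part of Scrapy's API.

```python
# Priority values mirroring scrapy.settings.SETTINGS_PRIORITIES.
PRIORITIES = {"default": 0, "command": 10, "project": 20, "spider": 30, "cmdline": 40}


def resolve_setting(name, sources):
    """Toy model (not Scrapy API): return the value of `name` from the
    highest-priority source that defines it.

    `sources` maps a source label ("project", "spider", ...) to a dict
    of settings coming from that source.
    """
    candidates = [
        (PRIORITIES[label], values[name])
        for label, values in sources.items()
        if name in values
    ]
    if not candidates:
        raise KeyError(name)
    # The source with the largest priority number wins.
    return max(candidates, key=lambda pair: pair[0])[1]


sources = {
    "project": {"DOWNLOAD_DELAY": 0.5, "RETRY_ENABLED": True},  # settings.py
    "spider": {"DOWNLOAD_DELAY": 5.0, "RETRY_ENABLED": False},  # custom_settings
}
print(resolve_setting("DOWNLOAD_DELAY", sources))  # the spider value, 5.0
```

If you added a "cmdline" entry to sources, its value would win over both of the others, which matches what happens when you pass -s DOWNLOAD_DELAY=... to scrapy crawl.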

Logging with the Python logger

Scrapy is built on top of a marvellous thing that you may never have heard of called Twisted. Scrapy wraps it so well that you will rarely need to read the logs Twisted produces. I would definitely recommend reading about Twisted in your free time. It’s beautiful how someone thought of using a single-threaded, event-driven design to handle many connections at once despite the Python GIL. Okay, back to logging: you don’t need to import log from the scrapy module anymore; use the standard Python logging module instead. A code snippet can be found here.
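For reference, here is roughly what the switch looks like. This is a minimal sketch using only the standard library; the logger name and the report_parsed helper are placeholders I made up. Inside a spider you can also use self.logger, which Scrapy 1.0 provides on every Spider.

```python
import logging

# Before 1.0 (now deprecated):
#   from scrapy import log
#   log.msg("Parsed page", level=log.INFO)

# Hypothetical logger name; any dotted name works.
logger = logging.getLogger("myproject.helpers")


def report_parsed(url, item_count):
    """Log a one-line summary for a scraped page."""
    # Passing the arguments separately lets logging format lazily.
    logger.info("Parsed %d items from %s", item_count, url)
```

Because this is the plain logging module, the usual handlers, levels and formatters apply, so you can route these messages wherever the rest of your application logs go.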

There are other changes in the changelog that you can read about, but the most essential ones are covered above. Let me know how you are planning to scale with or start using Scrapy 1.0.