-
SQLAlchemy pipeline to add items to DB
In my last post, I talked about how to run 2 spiders concurrently. This post is a brief introduction to adding Scrapy items to a database through an item pipeline.
Continue Reading... -
Running multiple scrapy spiders programmatic...
This is a continuation of my last post about how to run scrapers from a Python script. In this post I will write about how to manage 2 spiders. You can run over 30 spiders concurrently using this script.
Continue Reading... -
Running scrapy spider programmatically
I wanted to share something that I have been working on for the past few months: running scrapers with the Scrapy framework. I understand that Scrapy has existed for many years, but it is still relevant and useful for me and my team. We were hooked on it and started reading the docs daily to get it right. There are two ways of running a Scrapy spider: from the command line or from a program.
Continue Reading... -
pylint and git hook for pre-commit
A git hook is a script that runs when certain important actions occur in your git repository. There are different kinds of actions (pre-commit, post-commit, pre-update, post-update, etc.) on which you can run your code. We will concentrate on the pre-commit action.
Continue Reading... -
elasticsearch and EC2
Elasticsearch has taught me so much about text searching. It has been a love-hate relationship since I started using it. Though I still consider myself a newbie, the struggle has taught me how crucial it is to index my data the right way: mapping it correctly and using the right analyzers on the right fields. This post isn’t about how great Elasticsearch is, but about how there is little to no documentation on setting up an Elasticsearch cluster on EC2.
Continue Reading...