Disqus - Latest Comments for kmike

Re: Writing Scrapy Spiders in 2020

Mikhail Korobov — Fri, 03 Jul 2020 09:08:33 -0000

Awesome article Valdir!

It seems it was reposted with minor changes at https://towardsdatascience.... - are you in contact with Aaron?

---

A small correction: in the last example parse_full_blog_post should receive a "date" argument, not "post_date".

Re: Web Scraping with Scrapy and MongoDB - Real Python

Mikhail Korobov — Mon, 09 Nov 2015 06:05:12 -0000

Hey,

MongoDBPipeline has a small bug: it doesn't close the connection properly. Check http://doc.scrapy.org/en/la.... Not that it matters in this example, but in some cases (like https://github.com/scraping...) it could.
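A minimal sketch of a pipeline that closes its connection, assuming Scrapy's item-pipeline hooks (`from_crawler`, `open_spider`, `close_spider`, `process_item`) and pymongo's `MongoClient`; check the docs for your Scrapy version. The `MONGO_URI`/`MONGO_DATABASE` setting names, the default database and collection names, and the `client_factory` parameter are hypothetical. pymongo is imported lazily so the class can be exercised with a fake client.

```python
class MongoDBPipeline:
    """Open one MongoDB client per crawl and close it when the spider finishes."""

    collection_name = "items"  # hypothetical collection name

    def __init__(self, mongo_uri="mongodb://localhost:27017", mongo_db="scrapy_items",
                 client_factory=None):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.client_factory = client_factory  # hypothetical hook, e.g. for tests
        self.client = None
        self.db = None

    @classmethod
    def from_crawler(cls, crawler):
        # Hypothetical setting names; read from the project's settings.py.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "scrapy_items"),
        )

    def open_spider(self, spider):
        factory = self.client_factory
        if factory is None:
            import pymongo  # only needed when talking to a real server
            factory = pymongo.MongoClient
        self.client = factory(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        # The missing piece: release the client's connections when the crawl ends.
        if self.client is not None:
            self.client.close()
            self.client = None
            self.db = None

    def process_item(self, item, spider):
        # Assumes dict-like items; scraped items are stored as plain documents.
        self.db[self.collection_name].insert_one(dict(item))
        return item
```

Pairing `open_spider` with `close_spider` ties the connection's lifetime to the crawl, which matters when many spiders run in one process.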

Re: Rules of Thumb for Methods and Functions

Mikhail Korobov — Thu, 19 Jun 2014 12:42:18 -0000

The intention to minimize state changes is great.

What about classmethods? If a public function is called by other functions from the same module (or may be called by them in the future), then writing it as a @classmethod instead of a plain function has an advantage: it can be overridden in subclasses. For module-level functions the alternative is either copy-pasting or monkey-patching.
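A small sketch of the classmethod approach; `TextCleaner`, `normalize` and `tokens` are hypothetical names. Because `tokens` calls `cls.normalize`, a subclass can change one step without copying `tokens` or monkey-patching a module.

```python
class TextCleaner:
    """Hypothetical helpers grouped as classmethods instead of module-level functions."""

    @classmethod
    def normalize(cls, text):
        # Public helper that other helpers reach through `cls`, so it is overridable.
        return " ".join(text.lower().split())

    @classmethod
    def tokens(cls, text):
        return cls.normalize(text).split(" ")


class CasePreservingCleaner(TextCleaner):
    @classmethod
    def normalize(cls, text):
        # Only this step changes; the inherited tokens() picks it up automatically.
        return " ".join(text.split())


print(TextCleaner.tokens("  Hello   World "))
print(CasePreservingCleaner.tokens("  Hello   World "))
```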

Re: Cross-Python metaclasses

Mikhail Korobov — Sat, 15 Mar 2014 07:06:42 -0000

Have you seen the six.add_metaclass decorator? It does basically the same thing as yours, but also handles a few edge cases (__slots__, weakrefs).
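A simplified sketch modeled on how six.add_metaclass handles those edge cases (in real code, use six itself; under Python 3 only, `class Point(metaclass=Registry)` is enough). `Registry` and `Point` are hypothetical. The class is re-created through the metaclass, so slot descriptors and the `__dict__`/`__weakref__` descriptors of the throwaway class must be dropped first; leaving `x` and `y` in the namespace would make the new class creation fail because they conflict with `__slots__`.

```python
def add_metaclass(metaclass):
    """Class decorator that re-creates the decorated class using `metaclass`."""
    def wrapper(cls):
        namespace = dict(cls.__dict__)
        slots = namespace.get("__slots__")
        if slots is not None:
            if isinstance(slots, str):
                slots = [slots]
            # Slot descriptors from the first class would clash with __slots__.
            for name in slots:
                namespace.pop(name, None)
        # These descriptors belong to the throwaway class; the new class makes its own.
        namespace.pop("__dict__", None)
        namespace.pop("__weakref__", None)
        return metaclass(cls.__name__, cls.__bases__, namespace)
    return wrapper


class Registry(type):
    """Hypothetical metaclass that records the names of classes it creates."""
    names = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        Registry.names.append(name)
        return cls


@add_metaclass(Registry)
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y
```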

Re: Custom event for detecting fetch errors in Backbone.js

Mikhail Korobov — Wed, 13 Nov 2013 20:00:16 -0000

I think the docs are wrong about the "error" event - it is fired when a collection's fetch fails. Look for 'wrapError' calls in Backbone's source code.

Re: kmike.ru

Mikhail Korobov — Mon, 15 Apr 2013 10:17:39 -0000

Thanks! I'll add them to the list.

Re: Declaring dependencies in Python

Mikhail Korobov — Tue, 09 Apr 2013 16:45:05 -0000

In my experience install_requires was always problematic because

a) 'pip install -U package' upgrades all packages listed in install_requires, and

b) packages in install_requires could sometimes overwrite locally installed packages (when setuptools doesn't know about them - I don't remember the details, but it may be the case with plain distutils packages installed via setup.py).

Example 1: when the incompatible dateutil 2.0 was released, I broke things more than once because of packages that list dateutil in install_requires (update package => dateutil gets updated => computer explodes). Pinning the dateutil version in install_requires couldn't help, because dateutil 2.1 became compatible again and it is not possible to predict such changes.

Example 2: some packages list 'django' in install_requires, and they are hard to use with the Django development version because installing such packages overwrites the local Django copy.

I'm not sure all these problems still persist, but they have bitten me many times in the past, so I tend to avoid "install_requires" now; installing from pip requirements files is not that hard.

Re: PyCon RU 2013

Mikhail Korobov — Wed, 27 Feb 2013 04:58:20 -0000

The master class was great!

Only it wasn't Jacob Kaplan-Moss who gave the open-source talk, it was Russell Keith-Magee :)

Re: The Sorry State of Trie Implementations in Python

Mikhail Korobov — Thu, 31 Jan 2013 04:46:27 -0000

It turns out that a pointer-based Patricia trie suits your task best, because it provides fast unordered inserts and because it saves some memory when keys are long and have long unique parts. For words or telephone numbers a Patricia trie may not be ideal, because there are almost no unique parts and supporting this feature requires more memory. Supporting random inserts and updates has its own cost: for example, marisa-trie or DAWG can sometimes store the same data using 20x-100x less memory than a Patricia trie (e.g. when the data is highly duplicated).

Most of these trie packages are not implementations of a single basic "Trie" data structure; they implement different data structures with their own trade-offs and unique features. I wouldn't call this a "sorry state".
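To show why long unique key parts favor a Patricia (radix) trie, here is a minimal pure-Python, pointer-based sketch; it is illustrative only and is not how datrie, marisa-trie, hat-trie or DAWG work internally. A unique suffix is stored whole on a single edge, so "telephone" needs two nodes here instead of ten in a one-character-per-node trie; edges are split only when a later key shares part of the label.

```python
def _common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


class _Node:
    __slots__ = ("edges", "value", "has_value")

    def __init__(self):
        self.edges = {}  # first char of edge label -> (label, child node)
        self.value = None
        self.has_value = False


class PatriciaTrie:
    """Minimal pointer-based Patricia trie mapping str keys to values."""

    def __init__(self):
        self.root = _Node()

    def __setitem__(self, key, value):
        node = self.root
        while key:
            edge = node.edges.get(key[0])
            if edge is None:
                # Unordered insert: the whole unique suffix goes on one new edge.
                leaf = _Node()
                leaf.value, leaf.has_value = value, True
                node.edges[key[0]] = (key, leaf)
                return
            label, child = edge
            common = _common_prefix_len(label, key)
            if common < len(label):
                # Split the edge: label[:common] -> new middle node -> label[common:].
                middle = _Node()
                middle.edges[label[common]] = (label[common:], child)
                node.edges[key[0]] = (label[:common], middle)
                child = middle
            node, key = child, key[common:]
        node.value, node.has_value = value, True

    def get(self, key, default=None):
        node = self.root
        while key:
            edge = node.edges.get(key[0])
            if edge is None or not key.startswith(edge[0]):
                return default
            label, node = edge
            key = key[len(label):]
        return node.value if node.has_value else default

    def count_nodes(self):
        stack, total = [self.root], 0
        while stack:
            node = stack.pop()
            total += 1
            stack.extend(child for _, child in node.edges.values())
        return total
```

Each node is a separate Python object with its own dict of pointers, which is exactly the per-key overhead that static, compact structures avoid at the price of not supporting cheap inserts.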

Re: kmike.ru

Mikhail Korobov — Mon, 28 Jan 2013 09:04:50 -0000

What I was missing is that NetworkX uses numpy for heavy calculations. Added it to the list.

Re: kmike.ru

Mikhail Korobov — Mon, 28 Jan 2013 08:34:50 -0000

Unfortunately I'm not aware of pure-Python rope implementations.

Blist may be seen as an implementation of Ropes (I didn't mention this in the article).

Re: With Strings Attached

Mikhail Korobov — Thu, 03 Jan 2013 02:36:13 -0000

Hmm. Python 3.x is definitely easier to work with, but I'd say that if you're explicit about encodings and don't use __str__ and __repr__ heavily, you should be fine with unicode under Python 2.x most of the time. This article may give the impression that unicode under Python 2.x is a nightmare, but it isn't. Avoid writing non-ASCII __str__ and __repr__ under Python 2.x, convert data to unicode as soon as possible, be explicit about encodings, and the issues described in the article should disappear.

...but if you want non-ASCII __repr__ and __str__ then yes, the "right" answer is to switch to Python 3.x :)
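A sketch of the "decode early, encode late, keep __repr__ ASCII" advice, written for Python 3 (where `str` is unicode); under Python 2.x the same shape applies with `unicode` in place of `str`. The function and class names are hypothetical, and UTF-8 is an assumed default that callers can override explicitly.

```python
from typing import List


def parse_names(raw: bytes, encoding: str = "utf-8") -> List[str]:
    # Decode once, at the boundary where bytes enter the program.
    text = raw.decode(encoding)
    return [line.strip() for line in text.splitlines() if line.strip()]


def render_greetings(names: List[str], encoding: str = "utf-8") -> bytes:
    # Work with text internally; encode once, at the boundary where bytes leave.
    body = "\n".join("Hello, {}!".format(name) for name in names)
    return body.encode(encoding)


class Person:
    def __init__(self, name: str):
        self.name = name

    def __repr__(self):
        # The !a conversion applies ascii(), escaping non-ASCII characters.
        return "Person({!a})".format(self.name)
```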

Re: kmike.ru

Mikhail Korobov — Wed, 05 Sep 2012 09:32:26 -0000

Sure, that's what I meant writing "a lot of work" :)

Re: kmike.ru

Mikhail Korobov — Wed, 05 Sep 2012 07:48:19 -0000

Many of these structures don't support plain "add" and "remove"; inserting at the beginning, inserting in the middle, appending, and inserting in sorted vs. unsorted order may all have different performance characteristics (and algorithmic complexity) for different data structures.
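A rough sketch of measuring those insertion patterns yourself with the standard-library `timeit` module, using a built-in `list` and `collections.deque` as stand-ins (not the packages discussed here). The workload names and default sizes are arbitrary assumptions; absolute numbers depend on the machine, so only the relative ordering is worth looking at.

```python
import bisect
import random
import timeit
from collections import deque


def make_workloads(n):
    """Zero-argument callables, each inserting the same n shuffled ints differently."""
    values = list(range(n))
    random.Random(0).shuffle(values)

    def list_append():
        xs = []
        for v in values:
            xs.append(v)  # amortized O(1)
        return xs

    def list_insert_front():
        xs = []
        for v in values:
            xs.insert(0, v)  # O(len(xs)): every element shifts
        return xs

    def list_insert_middle():
        xs = []
        for v in values:
            xs.insert(len(xs) // 2, v)  # still O(len(xs)), about half the shifting
        return xs

    def list_insort():
        xs = []
        for v in values:
            bisect.insort(xs, v)  # O(log n) search plus O(n) shift
        return xs

    def deque_appendleft():
        xs = deque()
        for v in values:
            xs.appendleft(v)  # O(1) at either end
        return xs

    workloads = (list_append, list_insert_front, list_insert_middle,
                 list_insort, deque_appendleft)
    return {f.__name__: f for f in workloads}


def benchmark(n=5000, repeat=5, number=3):
    """Best-of-`repeat` time in seconds for each workload."""
    return {
        name: min(timeit.repeat(fn, repeat=repeat, number=number))
        for name, fn in make_workloads(n).items()
    }


if __name__ == "__main__":
    for name, seconds in sorted(benchmark().items(), key=lambda item: item[1]):
        print("{:<20} {:.4f} s".format(name, seconds))
```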

Re: kmike.ru

Mikhail Korobov — Mon, 03 Sep 2012 07:16:27 -0000

That's a lot of work :) I don't think there can be a single example application for all data structures; they are all very different. That said, datrie, marisa-trie, hat-trie and DAWG share the same benchmark suite (with minor tweaks), have a similar purpose, and all have a similar interface; benchmark results for these packages are in the READMEs of the corresponding repositories. And don't trust benchmarks, they lie; it is almost always better to measure yourself (the %timeit magic from IPython makes this very easy).

Re: kmike.ru

Mikhail Korobov — Sun, 02 Sep 2012 05:34:28 -0000

As far as I can tell, NetworkX is a pure-Python package. This is not a drawback and may be beneficial in many ways (e.g. such packages are easier to maintain and faster under pypy) but this list excludes pure-Python implementations intentionally.

Re: kmike.ru

Mikhail Korobov — Sat, 01 Sep 2012 13:22:04 -0000

Thanks for the pointer!

Re: Extending User Model in Django

Mikhail Korobov — Fri, 06 Jan 2012 14:07:38 -0000

OneToOneField relations are already available on the User model: just use my_user.userprofile in your case (or rename the model to Profile and use my_user.profile). You don't need AUTH_PROFILE_MODULE and get_profile() in most cases.
Another trick is to use AutoOneToOneField (see https://bitbucket.org/wrar/... ): this way the profile is created automatically on first access and signals are not needed.

Re: http://blip.tv/djangocon/advanced-django-form-usage-5573287

Mikhail Korobov — Wed, 21 Sep 2011 17:35:33 -0000

Just a small note: the 'request.POST or None' trick will work for most views even if all form fields are empty, because of the CSRF protection: the submitted CSRF token keeps request.POST non-empty.
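A Django-free sketch of why that works. `SignupForm` is a hypothetical stand-in that mirrors one rule of Django forms (a form is bound when its data is not None), and plain dicts stand in for `request.POST` (an empty QueryDict is falsy, like an empty dict).

```python
class SignupForm:
    """Hypothetical stand-in for a Django form with one required field."""

    required_fields = ("email",)

    def __init__(self, data=None):
        self.data = data
        # Mirrors Django's rule: the form is bound when data is not None.
        self.is_bound = data is not None

    @property
    def errors(self):
        if not self.is_bound:
            return {}  # unbound forms are never validated
        return {
            field: "This field is required."
            for field in self.required_fields
            if not self.data.get(field)
        }


def handle(post_data):
    # The 'request.POST or None' trick: an empty POST mapping gives an unbound form.
    return SignupForm(post_data or None)
```

A GET request (empty POST) yields an unbound form with no errors, while a submitted form with all fields blank still carries `csrfmiddlewaretoken`, so it is bound and reports the missing field.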

Re: kmike.ru

Mikhail Korobov — Tue, 14 Jun 2011 14:27:15 -0000

Thanks!

Re: Django Application Conventions

Mikhail Korobov — Mon, 03 Jan 2011 18:33:48 -0000

The conventions for views.py should probably change with the upcoming Django 1.3: authors of Django apps shouldn't use the `template_name` keyword argument; they should use TemplateResponse instead (http://docs.djangoproject.c...) or write class-based views (http://docs.djangoproject.c...).

Re: django-anonymizer released

Mikhail Korobov — Fri, 24 Dec 2010 13:15:35 -0000

Hi Luke,

There is also the slightly more powerful and popular https://github.com/alliterativeanimal/python-faker library for generating fake data. It supports, e.g., birthday generation with a Gaussian distribution. The Faker package by Dylan Clendenin should probably rip off some of python-faker's generators.
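A minimal sketch of Gaussian birthday generation using only the standard library; this is not python-faker's API or its actual implementation. The function name, the mean and standard deviation of age, and the clipping bounds are all hypothetical choices.

```python
import datetime
import random

DAYS_PER_YEAR = 365.25


def fake_birthday(rng, today, mean_age=35.0, sd_age=12.0, min_age=18, max_age=90):
    """Return a fake birth date whose age is drawn from a clipped Gaussian."""
    age = rng.gauss(mean_age, sd_age)  # normally distributed age in years
    age = max(min_age, min(max_age, age))  # clip implausible ages
    return today - datetime.timedelta(days=round(age * DAYS_PER_YEAR))


rng = random.Random(2010)
print(fake_birthday(rng, datetime.date(2010, 12, 24)))
```

Passing a seeded `random.Random` keeps the generated fixtures reproducible between test runs.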

Re: Fuzzy testing with assertNumQueries

Mikhail Korobov — Tue, 30 Nov 2010 14:40:36 -0000

Unfortunately, the assertNumQueries that landed can't be used as a decorator.

FuzzyInt is a very clever trick!
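One way to write such a FuzzyInt (a sketch; the original post's version may differ): an `int` subclass that compares equal to any integer in a range. When the left operand is a plain `int` and the right one is a subclass overriding `__eq__`, Python tries the subclass's reflected `__eq__` first, so `executed == FuzzyInt(3, 5)` uses the range check. Passing it to assertNumQueries assumes Django compares the counts with `==` (via assertEqual).

```python
class FuzzyInt(int):
    """An int that compares equal to any integer in [lowest, highest]."""

    def __new__(cls, lowest, highest):
        obj = super().__new__(cls, highest)
        obj.lowest = lowest
        obj.highest = highest
        return obj

    def __eq__(self, other):
        return self.lowest <= other <= self.highest

    def __ne__(self, other):
        # int.__ne__ would compare against `highest`, so override it as well.
        return not self.__eq__(other)

    def __repr__(self):
        return "[{}..{}]".format(self.lowest, self.highest)
```

Defining `__eq__` makes instances unhashable, which is fine here since they are only used for comparisons.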

Re: Django patterns part 3: efficient generic relations

Mikhail Korobov — Sat, 20 Feb 2010 09:10:00 -0000

I've implemented something like this as a reusable model manager.

http://bitbucket.org/kmike/...

But the method names are not the best :)

Re: SIGUSR2 > The Case of the Unusable Reusable

Mikhail Korobov — Thu, 23 Jul 2009 02:26:00 -0000

The original django-faves app (hosted on Google Code) didn't have single-SQL-query capability. I did a similar thing: took it, implemented what I needed and put it on Bitbucket :)