
Django Haystack Distinct Value For Field

I am building a small search engine using Django Haystack + Elasticsearch + Django REST Framework, and I'm trying to figure out how to reproduce the behavior of a Django QuerySet's distin

Solution 1:

I think the best advice I can give you is to stop using Haystack.

Haystack's Elasticsearch backend (elasticsearch_backend.py) is mostly written with Solr in mind. There are a lot of annoyances that I find in Haystack, but the biggest has to be that it packs all queries into a single query_string query. Using query_string, you can write Lucene syntax, but it also means giving up the rest of the Elasticsearch DSL. The Lucene syntax has some advantages, especially if it is what you are used to, but it is very limiting from an Elasticsearch point of view.
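To make the difference concrete, here is a minimal sketch of the two request-body styles, built as plain Python dictionaries. The field names (`vendor`, `django_ct`) and the values are made up for illustration, and the exact query Haystack generates will differ; this only contrasts a Lucene-syntax `query_string` with a structured DSL `term` query.

```python
import json


def query_string_body(text):
    """Build a body that carries the whole query as one Lucene-syntax string."""
    return {"query": {"query_string": {"query": text}}}


def term_body(field, value):
    """Build a structured DSL body: an exact-match term query on one field."""
    return {"query": {"term": {field: value}}}


if __name__ == "__main__":
    # Hypothetical field names and values, for illustration only.
    print(json.dumps(query_string_body("vendor:acme AND django_ct:shop.item"), indent=2))
    print(json.dumps(term_body("vendor", "acme"), indent=2))
```

With the first style, everything has to be expressed inside the string; with the second, each part of the query is a separate, composable dictionary.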

Furthermore, I think you are applying an RDBMS concept to a search engine. That isn't to say that you shouldn't get the results you need, but the approach is often different.

How you query and retrieve this data would likely differ without Haystack, because Haystack creates indexes in a way better suited to Solr than to Elasticsearch.

For example, in creating a new index, haystack will assign a "type" called "modelresult" to all models that will go in an index.

So, let's say you have some entities called Items and some other entities called vendoritems.

It might be appropriate to keep them both in the same index, with vendoritems stored under a type of vendoritems and items under a type of items.

When querying, you would then target the matching REST endpoint, so something like localhost:9200/index/type (query). Haystack achieves this separation through the Django content types module instead. Accordingly, there is a field called "django_ct" that Haystack attaches to any query you make when you are only looking for unique items.

To illustrate the above:

This endpoint searches across all indexes:

`localhost:9200/`

This endpoint searches across all types in an index:

`localhost:9200/yourindex/`

This endpoint searches in a type within an index:

`localhost:9200/yourindex/yourtype/`

and this endpoint searches in two specified types within an index:

`localhost:9200/yourindex/yourtype,yourothertype/`
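The four endpoint shapes above can be sketched as one small helper. This assumes an Elasticsearch version that still supports types in the URL, and it appends `_search`, the search API path that the shortened endpoints above leave out. The function name and defaults are my own, not part of any library.

```python
def search_url(host="localhost:9200", index=None, types=()):
    """Build a search URL scoped to all indexes, one index, or types within an index."""
    parts = [host]
    if index:
        parts.append(index)
        if types:
            # Several types are joined with commas in a single path segment.
            parts.append(",".join(types))
    parts.append("_search")
    return "http://" + "/".join(parts)


if __name__ == "__main__":
    print(search_url())
    print(search_url(index="yourindex"))
    print(search_url(index="yourindex", types=["yourtype"]))
    print(search_url(index="yourindex", types=["yourtype", "yourothertype"]))
```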

Back to Haystack though: you can possibly get unique values by adding a django_ct to your query, but that likely isn't what you want.

What you really want to do is a facet, and probably you want to use term facets. This could be a problem in Haystack because it A.) analyzes all text, so a terms facet returns individual tokens rather than whole field values, and B.) applies store=True to all fields (really not something you want to do in Elasticsearch, but something you often want to do in Solr).

You can also order facet results in Elasticsearch (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-facet.html#_ordering).
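As a rough sketch of what that looks like, the helpers below build a terms facet request with an explicit ordering and pull the distinct values out of a response. This uses the facets API that the linked page describes (later Elasticsearch releases replaced facets with aggregations). The field name `vendor`, the helper names, and the sample response are all made up for illustration, and the field is assumed to be not analyzed.

```python
ALLOWED_ORDERS = {"count", "term", "reverse_count", "reverse_term"}


def distinct_values_facet(field, size=100, order="term"):
    """Build a search body that returns no hits, only a terms facet on `field`."""
    if order not in ALLOWED_ORDERS:
        raise ValueError("unsupported facet order: %r" % order)
    return {
        "size": 0,  # skip the documents themselves; only the facet is needed
        "query": {"match_all": {}},
        "facets": {
            field: {"terms": {"field": field, "size": size, "order": order}}
        },
    }


def distinct_from_response(response, field):
    """Extract the distinct terms from a terms facet response."""
    return [entry["term"] for entry in response["facets"][field]["terms"]]


if __name__ == "__main__":
    body = distinct_values_facet("vendor")
    # Hand-written response in the terms facet shape, not real output.
    sample = {"facets": {"vendor": {"_type": "terms", "terms": [
        {"term": "acme", "count": 3},
        {"term": "globex", "count": 1},
    ]}}}
    print(body)
    print(distinct_from_response(sample, "vendor"))
```

Each distinct value comes back once with its document count, which is the closest search-engine equivalent of a distinct query on one column.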

I don't mean for this to be a slam on haystack. I think it does a lot of things right conceptually. It's especially good if all you need to do is index a single model (like say a blog) and just have it quickly return results.

That said, I highly recommend using elasticutils. Some of its concepts are similar to Haystack's, but it uses the search DSL rather than query_string (though you can still use query_string if you want).

Be warned though, I don't think you can order facets using elasticutils by default, but you can just pass a Python dictionary of the facets you want to the facet_raw method (something I don't think you can do in Haystack).
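A sketch of the dictionary you might pass in follows. Only the dictionary is built and checked here; the elasticutils call itself is shown in a comment, written from memory of its keyword-argument form, so verify it against the version you install. The facet name `vendor_facet` and the field `vendor` are hypothetical.

```python
def ordered_terms_facet(field, order="term", size=100):
    """Return a raw terms facet definition with an explicit ordering."""
    return {"terms": {"field": field, "order": order, "size": size}}


# Map a facet name of your choosing to its raw definition.
raw_facets = {"vendor_facet": ordered_terms_facet("vendor")}

# Assumed usage, not executed here because it needs elasticutils and a running cluster:
#   from elasticutils import S
#   counts = S().facet_raw(**raw_facets).facet_counts()

if __name__ == "__main__":
    print(raw_facets)
```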

Your last option is to create your own Haystack backend: inherit from the existing backend and add some functionality to the .facet() method to allow for ordering per the above DSL.
