Fine-tuning locality-sensitive hashing

Locality-sensitive hashing (LSH) allows fast retrieval of similar objects from an index. It is orders of magnitude faster than a simple search, at the cost of some additional computation and some false positives and false negatives. In the last post I introduced LSH for angular distance. In this one I will show how to fine-tune it to get the results you expect.

Locality-sensitive hashing for angular distance in Python

Locality-sensitive hashing (LSH) is an important group of techniques that can vastly speed up the task of finding similar sets or vectors.

Text indexing in Python - mapping text to values using finite-state transducers

In the previous posts I wrote about finite-state automata (FSA). Now we’ll cover finite-state transducers (FST), which make it possible to index text with values in libraries such as Elasticsearch.

Text indexing in Python - constructing FSA from unsorted input

In this post we’ll take a closer look at a Python implementation of an algorithm for constructing finite-state automata from an unsorted set of words.

Text indexing in Python with minimal finite-state automata

Have you ever wondered how Lucene/Elasticsearch does its job so well? This post will teach you about an essential part of the Lucene index: the minimal finite-state automaton (FSA).