Typesense – Fast Open Source Search by bodash

Typesense is a fast, typo-tolerant search engine for building delightful search experiences.

An Open Source Algolia Alternative &
An Easier-to-Use Elasticsearch Alternative

✨ Here are several live demos that show Typesense in action on large datasets:

Search a 32M songs dataset from MusicBrainz: songs-search.typesense.org
Search a 28M books dataset from OpenLibrary: books-search.typesense.org
Search a 2M recipe dataset from RecipeNLG: recipe-search.typesense.org
Search 1M Git commit messages from the Linux Kernel: linux-commits-search.typesense.org
Spellchecker with type-ahead, with 333K English words: spellcheck.typesense.org
An E-Commerce Store Browsing experience: ecommerce-store.typesense.org
GeoSearch / Browsing experience: airbnb-geosearch.typesense.org
Search / Browse xkcd comics by topic: xkcd-search.typesense.org
Semantic / Hybrid search on 300K HN comments: hn-comments-search.typesense.org

🗣️ 🎥 If you prefer watching videos:

Here’s one where we introduce Typesense and show a walk-through: https://youtu.be/F4mB0x_B1AE?t=144
Check out Typesense’s recent mention during Google I/O Developer Keynote: https://youtu.be/qBkyU1TJKDg?t=2399
Here’s one where one of our community members gives an overview of Typesense and shows you an end-to-end demo: https://www.youtube.com/watch?v=kwtHOkf7Jdg

Typo Tolerance: Handles typographical errors elegantly, out-of-the-box.
Simple and Delightful: Simple to set up, integrate with, operate and scale.
⚡ Blazing Fast: Built in C++. Meticulously architected from the ground up for low-latency (<50ms) instant searches.
Tunable Ranking: Easy to tailor your search results to perfection.
Sorting: Dynamically sort results based on a particular field at query time (helpful for features like “Sort by Price (asc)”).
Faceting & Filtering: Drill down and refine results.
Grouping & Distinct: Group similar results together to show more variety.
Federated Search: Search across multiple collections (indices) in a single HTTP request.
Geo Search: Search for and sort results around a latitude/longitude or within a bounding box.
Vector Search: Index embeddings from your machine learning models in Typesense and do a nearest-neighbor search. Can be used to build similarity search, semantic search, visual search, recommendations, etc.
Semantic / Hybrid Search: Automatically generate embeddings from within Typesense using built-in models like S-BERT, E-5, etc., or use OpenAI, PaLM API, etc., for both queries and indexed data. This allows you to send JSON data into Typesense and build an out-of-the-box semantic search + keyword search experience.
Conversational Search (Built-in RAG): Send questions to Typesense and have the response be a fully-formed sentence, based on the data you’ve indexed in Typesense. Think ChatGPT, but over your own data.
Image Search: Search through images using text descriptions of their contents, or perform similarity searches, using the CLIP model.
Voice Search: Capture queries as voice recordings and send them to Typesense, which will transcribe them (via the Whisper model) and return search results.
Scoped API Keys: Generate API keys that only allow access to certain records, for multi-tenant applications.
JOINs: Connect one or more collections via common reference fields and join them during query time. This allows you to model SQL-like relationships elegantly.
Synonyms: Define words as equivalents of each other, so searching for a word will also return results for the synonyms defined.
Curation & Merchandizing: Boost particular records to a fixed position in the search results, to feature them.
Raft-based Clustering: Set up a distributed cluster that is highly available.
Seamless Version Upgrades: As new versions of Typesense come out, upgrading is as simple as swapping out the binary and restarting Typesense.
No Runtime Dependencies: Typesense is a single binary that you can run locally or in production with a single command.
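
Several of the query-time features above (sorting, faceting, filtering, grouping) are expressed as search parameters. The following is a minimal sketch of how they might be combined in one request. The field names (name, brand, category, price) and the helper function are hypothetical and would need to match your own collection schema; only the parameter names and filter operators are Typesense's.

```python
def build_browse_query(text, brand=None, max_price=None):
    """Build search parameters that combine keyword search with
    filtering, faceting, grouping and sorting.

    Field names here are hypothetical; adjust them to your schema.
    """
    filters = []
    if brand is not None:
        filters.append(f"brand:={brand}")        # exact match on a field
    if max_price is not None:
        filters.append(f"price:<={max_price}")   # numeric comparison

    params = {
        "q": text,
        "query_by": "name",
        "facet_by": "brand,category",   # return value counts for these fields
        "sort_by": "price:asc",         # e.g. "Sort by Price (asc)"
        "group_by": "category",         # group similar results together
        "group_limit": 3,               # at most 3 hits per group
    }
    if filters:
        params["filter_by"] = " && ".join(filters)
    return params


print(build_browse_query("running shoe", brand="Acme", max_price=50))
```

The resulting dictionary can be passed to a client's search call for the relevant collection.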

Don’t see a feature on this list? Search our issue tracker to see if someone has already requested it and add a comment explaining your use case, or open a new issue if not. We prioritize our roadmap based on user feedback, so we’d love to hear from you.

Here’s Typesense’s public roadmap: https://typesense.link/roadmap.

The first column of the roadmap also explains how we prioritize features, how you can influence prioritization, and our release cadence.

A dataset containing 2.2 million recipes (recipe names and ingredients):
- Took up about 900 MB of RAM when indexed in Typesense
- Took 3.6 minutes to index all 2.2M records
- On a server with 4 vCPUs, Typesense handled 104 concurrent search queries per second, with an average search processing time of 11 ms.
A dataset containing 28 million books (book titles, authors and categories):
- Took up about 14 GB of RAM when indexed in Typesense
- Took 78 minutes to index all 28M records
- On a server with 4 vCPUs, Typesense handled 46 concurrent search queries per second, with an average search processing time of 28 ms.
With a dataset containing 3 million products (Amazon product data), Typesense handled a throughput of 250 concurrent search queries per second on an 8-vCPU, 3-node Highly Available Typesense cluster.
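
To put the recipe and book figures on a common footing, the short calculation below derives the indexing rate and the approximate RAM used per record. It uses only the numbers quoted above and assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); the results are rough, since the RAM figures are themselves approximate.

```python
def docs_per_second(num_docs, minutes):
    """Average indexing rate over the whole run."""
    return num_docs / (minutes * 60)


def bytes_per_doc(ram_bytes, num_docs):
    """Average in-memory footprint of one indexed record."""
    return ram_bytes / num_docs


# Figures taken from the benchmarks above; decimal MB/GB assumed.
recipes_rate = docs_per_second(2_200_000, 3.6)
books_rate = docs_per_second(28_000_000, 78)
recipes_mem = bytes_per_doc(900 * 10**6, 2_200_000)
books_mem = bytes_per_doc(14 * 10**9, 28_000_000)

print(f"recipes: {recipes_rate:,.0f} docs/s, {recipes_mem:.0f} bytes/doc")
print(f"books:   {books_rate:,.0f} docs/s, {books_mem:.0f} bytes/doc")
```

This works out to roughly 10,000 records per second and about 400 bytes per record for the recipes, versus roughly 6,000 records per second and 500 bytes per record for the books.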

We’d love to benchmark with larger datasets, if we can find large ones in the public domain. If you have any suggestions for structured datasets that are open, please let us know by opening an issue. We’d also be delighted if you’re able to share benchmarks from your own large datasets. Please send us a PR!

Typesense is used by a range of users across different domains and verticals.

On Typesense Cloud we serve more than 10 billion searches per month. Typesense’s Docker images have been downloaded over 12 million times.

We’ve recently started documenting who’s using it in our Showcase.
If you’d like to be included in the list, please feel free to edit SHOWCASE.md and send us a PR.

You’ll also see a list of user logos on the Typesense Cloud home page.

Option 1: You can download the binary packages that we publish for Linux (x86_64 & arm64) and Mac (x86_64).

Option 2: You can also run Typesense from our official Docker image.

Option 3: Spin up a managed cluster with Typesense Cloud.

Here’s a quick example showcasing how you can create a collection, index a document and search it on Typesense.

Let’s begin by starting the Typesense server via Docker:

docker run -p 8108:8108 -v/tmp/data:/data typesense/typesense:28.0 --data-dir /data --api-key=Hu52dwsas2AdxdE
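
Before sending any data, it can help to confirm that the server is reachable. The sketch below polls the server's health endpoint using only the Python standard library. It assumes the host, port and protocol from the Docker command above; the helper names are illustrative and not part of any Typesense client.

```python
import json
import urllib.request


def health_url(host="localhost", port=8108, protocol="http"):
    """Build the URL of the server's health endpoint."""
    return f"{protocol}://{host}:{port}/health"


def is_healthy(url, timeout=2.0):
    """Return True if the server answers with {"ok": true}."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error or a non-JSON body.
        return False
    return isinstance(body, dict) and body.get("ok") is True


if __name__ == "__main__":
    print("healthy" if is_healthy(health_url()) else "not reachable yet")
```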

We have API Clients in a couple of languages, but let’s use the Python client for this example.

Install the Python client for Typesense:

pip install typesense

We can now initialize the client and create a companies collection:

import typesense

client = typesense.Client({
  'api_key': 'Hu52dwsas2AdxdE',
  'nodes': [{
    'host': 'localhost',
    'port': '8108',
    'protocol': 'http'
  }],
  'connection_timeout_seconds': 2
})

create_response = client.collections.create({
  "name": "companies",
  "fields": [
    {"name": "company_name", "type": "string" },
    {"name": "num_employees", "type": "int32" },
    {"name": "country", "type": "string", "facet": True }
  ],
  "default_sorting_field": "num_employees"
})

Now, let’s add a document to the collection we just created:

document = {
 "id": "124",
 "company_name": "Stark Industries",
 "num_employees": 5215,
 "country": "USA"
}

client.collections['companies'].documents.create(document)

Finally, let’s search for the document we just indexed:
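
A minimal sketch of that search, assuming the client object and the companies collection from the earlier steps. The query text "stork" is a deliberate misspelling of "Stark", and the filter and sort values are illustrative choices based on the num_employees field in the schema above.

```python
def build_search_parameters():
    """Search parameters for the companies collection."""
    return {
        "q": "stork",                        # deliberate misspelling of "Stark"
        "query_by": "company_name",          # field to match the query against
        "filter_by": "num_employees:>100",   # keep companies with over 100 employees
        "sort_by": "num_employees:desc",     # largest companies first
    }


def search_companies(client, parameters):
    """Run the search; `client` is the typesense.Client created earlier."""
    return client.collections["companies"].documents.search(parameters)


# Usage, with the client from the previous steps:
#   results = search_companies(client, build_search_parameters())
#   print(results)
```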

Did you notice the typo in the query text? No big deal. Typesense handles typographic errors out-of-the-box!
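
To give a feel for what typo tolerance means, the sketch below computes the edit distance (Levenshtein distance) between a misspelled query word and an indexed word, and accepts the match when the distance is within a small budget. This illustrates the general idea only and is not Typesense's actual implementation; the example words and the max_typos threshold are assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def matches_with_typos(query, token, max_typos=2):
    """Case-insensitive match that tolerates up to max_typos edits."""
    return edit_distance(query.lower(), token.lower()) <= max_typos


print(edit_distance("stork", "stark"))       # one substitution: o -> a
print(matches_with_typos("stork", "Stark"))
```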

Step-by-step Walk-through

A step-by-step walk-through is available on our website here.

This will guide you through the process of starting up a Typesense server, indexing data in it and querying the data set.

Here’s our official API documentation, available on our website: https://typesense.org/api.

If you notice any issues with the documentation or walk-through, please let us know or send us a PR here: https://github.com/typesense/typesense-website.

While you can definitely use CURL to interact with Typesense Server directly, we offer official API clients to simplify using Typesense from your language of choice. The API Clients come built-
