WTF is a vector database?

Issue #296. June 10, 2024. 2-minute read.
Bytes

Today’s issue: Cloudflare’s friendly fire, die-hard Electron fans, and lots of big database words.

Welcome to #296.




The Main Thing

Screenshot of Voldemort and Professor Quirrell

The original vector database

WTF is a vector database?

Now that every SaaS startup has transformed itself into an AI startup, we shouldn’t be surprised that every database startup is now offering a vector database to try and cash in on the gold rush.

Supabase launched a Postgres Vector database, PlanetScale brought vector search to MySQL, and Turso just introduced vector search to SQLite last week.

But wtf is a vector database? It’s a database that stores your data as high-dimensional vectors – arrays of numbers that represent features or attributes of your data.

Each vector can have hundreds or even thousands of dimensions, and it’s generated by running your raw data (text, images, etc.) through an embedding function, usually a machine learning model. The closer two vectors are to each other (typically measured with cosine similarity or Euclidean distance), the more similar their underlying data.
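As a rough illustration of what "closer" means, here’s cosine similarity, one common way to compare two vectors: it measures the angle between them, so a score near 1 means they point in nearly the same direction. The 3-dimensional vectors below are hypothetical toy values standing in for real embeddings, which would come from an embedding model and have far more dimensions.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Ranges from -1 to 1; values closer to 1 mean the vectors point in more similar directions.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction; treat it as unrelated
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical toy "embeddings" (real ones come from an embedding model).
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const spreadsheet = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(cat, kitten).toFixed(2));      // 0.98
console.log(cosineSimilarity(cat, spreadsheet).toFixed(2)); // 0.01
```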

How is this different from other DBs? Vector search embeds your query with the same embedding function and returns the stored items whose vectors are closest to it, without needing to query your DB based on exact matches or predefined criteria. That lets it surface results that are relevant in meaning even when they don’t share any keywords with the query, which can noticeably improve the quality and relevance of what you get back.
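Under the hood, "return the most relevant items" is a nearest-neighbor search. The sketch below does it the naive way, scoring every stored vector by Euclidean distance and keeping the k closest; the items, their toy vectors, and the query vector are all hypothetical. Real vector databases generally avoid scanning every row by using approximate nearest-neighbor indexes (HNSW is a popular one), trading a little accuracy for much faster queries.

```javascript
// Euclidean (straight-line) distance: smaller means the vectors are closer.
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Brute-force k-nearest-neighbor search: score every stored item, keep the k closest.
function nearestNeighbors(queryVector, items, k) {
  return items
    .map((item) => ({ id: item.id, distance: euclideanDistance(queryVector, item.vector) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

// Hypothetical stored items with toy 3-dimensional vectors.
const items = [
  { id: 'cat-facts', vector: [0.9, 0.1, 0.0] },
  { id: 'kitten-care', vector: [0.8, 0.2, 0.1] },
  { id: 'csv-import-tips', vector: [0.0, 0.1, 0.9] },
];

// Pretend this is the embedding of a query like "tell me about cats".
const queryVector = [0.88, 0.12, 0.02];

console.log(nearestNeighbors(queryVector, items, 2).map((r) => r.id)); // cat-facts, then kitten-care
```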

Vector DBs are great for coming up with relevant recommendations for products and videos – but as you’ve probably guessed by now, they’re also perfect for AI apps that want to generate more relevant and coherent responses from LLMs or image models. So don’t expect them to go away any time soon.
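For LLM apps, this usually means retrieval-augmented generation (RAG): look up the stored chunks closest to the user’s question, then include them in the prompt so the model answers from that context. Here’s a minimal sketch of just the prompt-assembly step, assuming the vector search already happened; buildPrompt, the chunk shape, and the instruction wording are all hypothetical, and no real LLM API is called.

```javascript
// Hypothetical helper: assemble an LLM prompt from chunks a vector search returned.
// Assumes `chunks` is already sorted by similarity, most relevant first.
function buildPrompt(question, chunks, maxChunks = 3) {
  const context = chunks
    .slice(0, maxChunks)
    .map((chunk, i) => `[${i + 1}] ${chunk.text}`)
    .join('\n');
  return [
    'Answer the question using only the context below.',
    'If the context does not contain the answer, say so.',
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');
}

// Example chunks, standing in for real vector search results.
const retrieved = [
  { text: 'Turso introduced vector search to SQLite.' },
  { text: 'PlanetScale brought vector search to MySQL.' },
];

console.log(buildPrompt('Which databases added vector search?', retrieved));
```

Capping the number of chunks keeps the prompt within the model’s context window; only the most similar results make the cut.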

Bottom Line: It’ll be interesting to see if any of the database startups can find innovative ways to make vector DBs more accessible to the masses, since they often come with significantly higher compute and scaling costs.

But I guess there’s a reason why you can’t spell vector without “VC.”


Our Friends (With Benefits)

A backpack hanging on by a thread

Live look at your app's CSV importer right now

Launch CSV imports 10x faster with OneSchema

If you’re looking for more pain and misery in your life, I suggest building CSV import functionality from scratch – it takes forever, and it’ll probably break as soon as a user tries to upload a messy data file.

That’s why smart teams like Ramp and Vanta use OneSchema’s embeddable CSV importer to skip all that pain. It’s easy for you to set up and even easier for your customers to use.

But what makes it better than other CSV importers?

  1. Powerful bulk data editing features for your users like find & replace, auto-fix errors, and bulk delete/add rows

  2. By far the most performant data importer at scale (see benchmarks)

  3. Enterprise features like localization, advanced UI customization, and automations with the OneSchema API

Check it out — and see how it saves teams 6 months (!!) of engineering time on average.



Spot the Bug

Sponsored by Datadog

They created this Azure OpenAI Cheatsheet, which shows you how to use their preconfigured dashboard to monitor your API requests, token usage, and the performance of your LLM.

function differenceInMilliseconds(date1, date2) {
  const { getTime: getTime1 } = new Date(date1);
  const { getTime: getTime2 } = new Date(date2);
  return getTime1() - getTime2();
}

differenceInMilliseconds('2021-01-01', '2021-01-02');


Cool Bits

  1. Astro 4.10 just came out with experimental type-safe env variables, new enhancements to the Container API, server actions, and lots more.

  2. Sunil Pai got Remix working inside a Cloudflare Durable Object. Not to be outdone, I got an old Webpack app working on my local machine this past weekend.

  3. Alexandru Ică wrote about Morphing arbitrary paths in SVG.

  4. QA Wolf can get your web app to 80% test coverage in weeks, not years. They build and maintain your test suite in Playwright, provide unlimited parallel test runs on their infra, and send human-verified bug reports directly to you. [sponsored]

  5. Replit just open-sourced their desktop app, which is great news if you are a die-hard Electron fan.

  6. Alex Booker wrote this step-by-step breakdown on building a modern, authenticated chat app with the Next.js App Router, Ably, and Clerk.

  7. Valibot v0.31.0 is a complete rewrite of the type-safe schema library for validating structural data.

  8. The ESLint team just released a Configuration Migrator to make it easier to migrate your .eslintrc file to eslint.config.js.

  9. The TypeScript 5.5 RC just launched with Inferred Type Predicates and lots more.

  10. Erik Heemskerk wrote about how htmx is a beacon of simplicity in an age of complex JavaScript solutions. Unfortunately, those complex JavaScript solutions are also propping up our entire job market, so there’s that.



Spot the Bug: Solution


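function differenceInMilliseconds(date1, date2) {
  // Call getTime() on each Date instance so "this" refers to that Date.
  const t1 = new Date(date1).getTime();
  const t2 = new Date(date2).getTime();
  return t1 - t2;
}

differenceInMilliseconds('2021-01-01', '2021-01-02'); // -86400000 (minus one day, in ms)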

In JavaScript, methods like getTime aren’t own properties of a Date instance; they live on Date.prototype. Destructuring still finds them, because it looks up the prototype chain just like dot access does, so getTime1 really is Date.prototype.getTime. The problem is that destructuring detaches the method from its object. When you call getTime1() on its own, its "this" value is no longer a Date, so getTime throws a TypeError. On the other hand, new Date(date1).getTime() works because the method is called on the Date instance, which binds "this" to that date.
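The detail that trips people up is that destructuring does find getTime; it’s calling it without its Date that fails. You can check this in any JS console. The sketch below (variable names are just for illustration) shows the destructured function is the same one on Date.prototype, that it isn’t an own property of the instance, that calling it bare throws, and two ways to supply the right "this": .call() and .bind().

```javascript
const date = new Date('2021-01-01'); // date-only strings are parsed as UTC midnight

// Destructuring does look up the prototype chain...
const { getTime } = date;
console.log(getTime === Date.prototype.getTime); // true
console.log(Object.prototype.hasOwnProperty.call(date, 'getTime')); // false

// ...but calling the detached function loses "this", so it throws.
try {
  getTime();
} catch (err) {
  console.log(err instanceof TypeError); // true
}

// Supplying "this" explicitly fixes it.
console.log(getTime.call(date)); // 1609459200000
const boundGetTime = date.getTime.bind(date);
console.log(boundGetTime()); // 1609459200000
```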

If you’d like more info, here’s a thing we wrote on the topic a while back: A Beginner’s Guide to JavaScript’s Prototype.