Databases & Storage — AI Education

What Is a Database?

A database is an organized collection of data. Think of it as a massive spreadsheet, but with rules.

A regular spreadsheet lets you put anything anywhere. You could write someone's name in a column meant for phone numbers. A database says "no, this column is for phone numbers, and it must follow this format." It enforces structure so the data stays clean and reliable.

This is where CRUD actually happens. When we talked about Create, Read, Update, and Delete in Lesson 2, those operations are happening in a database. The database is the storage. The API (Lesson 7) is how you talk to it from the outside. The database is the kitchen; the API is the waiter.

Key insight

Every app you use has a database behind it. Instagram stores your photos and followers. Gmail stores your emails. Your company's CRM stores contacts and deals. The app is just a pretty face on top of a database.

Tables, Rows, and Columns: The Structure

If a database is a spreadsheet workbook, then:

A table is a single tab (sheet). One table for users, another for deals, another for emails.
A row is one entry. One user. One deal. One email.
A column is a field. Name, email, role, company.

Here's what a simple users table looks like:

id	name	email	role	company
1	Sarah Chen	sarah@acme.com	Sales Rep	Acme Corp
2	Marcus Johnson	marcus@globex.com	Account Exec	Globex Inc
3	Priya Patel	priya@initech.com	Sales Rep	Initech

Three rows, five columns, one table. Every database, no matter how complex, is just tables of rows and columns. A typical SaaS product might have tables for users, email templates, schedules, and activity logs, and each one looks like this.

SQL: The Language for Talking to Databases

SQL (Structured Query Language, pronounced "sequel") is the language you use to ask a database questions. It reads almost like English:

SELECT name, email FROM users WHERE role = 'Sales Rep'

That says: "Give me the name and email from the users table, but only where the role is Sales Rep." That's a Read operation. Here are the four CRUD operations in SQL:

Create: INSERT INTO users (name, email) VALUES ('Alex', 'alex@co.com')
Read: SELECT * FROM users WHERE company = 'Acme Corp'
Update: UPDATE users SET role = 'Manager' WHERE name = 'Sarah Chen'
Delete: DELETE FROM users WHERE id = 3

You don't need to memorize these. The point is that SQL is readable. If someone shows you a query, you can usually figure out what it does just by reading it left to right.

Try it — simulated SQL playground

users5 rows

ID	Name	Email	Role	Company
1	Sarah Chen	sarah@acme.com	Sales Rep	Acme Corp
2	Marcus Johnson	marcus@globex.com	Account Exec	Globex Inc
3	Priya Patel	priya@initech.com	Sales Rep	Initech
4	James Wilson	james@acme.com	Manager	Acme Corp
5	Ana Torres	ana@globex.com	Account Exec	Globex Inc

Run a query

Build a SQL Query

users6 rows

name	email	role
Sarah Chen	sarah@acme.com	Sales Rep
Marcus Johnson	marcus@globex.com	Account Exec
Priya Patel	priya@initech.com	Sales Rep
James Wilson	james@acme.com	Manager
Ana Torres	ana@globex.com	Account Exec
David Kim	david@initech.com	Sales Rep

SELECT (pick columns)

WHERE (filter)

ORDER BY (sort)

Your query

SELECT * FROM users;

Relational vs. Document: Two Ways to Organize Data

Not all databases work the same way. The two most common types:

Relational databases (like PostgreSQL) use the table structure we just discussed. Data is organized into strict rows and columns, and tables can reference each other. A deals table might have a contact_id column that points to a row in the contacts table. This is called a relation, hence "relational." It's rigid, structured, and great for data that has clear relationships.

Document databases (like MongoDB or Firestore) store data as flexible documents. Think of each record as its own JSON blob. A user document might have nested data inside it: their profile, their settings, their recent activity, all in one place. No strict columns. You can add fields to one document without adding them to every other document.

Real-world example

Many teams use both. PostgreSQL might power an email tool and a CRM integration, handling structured data with clear relationships between tables. A document database might power a dashboard where each record is a flexible document and fields can vary from entry to entry.

In-Memory vs. Persistent: Speed and Survival

There are two places data can live:

In memory (RAM) is like a whiteboard. Extremely fast to read and write, but temporary. When the power goes off, everything on the whiteboard is erased. Programs use RAM for data they need right now, this second.

On disk (persistent storage) is like a filing cabinet. Slower to access, but permanent. Turn the power off, turn it back on, the data is still there. Databases store data on disk.

Cache is the bridge between the two. It's keeping frequently-used data on the whiteboard so you don't have to walk to the filing cabinet every time. When a dashboard loads, it might cache query results in memory for a few minutes. If someone refreshes the page, it serves the cached version instantly instead of querying the database again.

The tradeoff is always the same: speed vs. durability. RAM is fast but temporary. Disk is slow but permanent. Good systems use both.

A Typical Database Architecture

Here's what a real-world database architecture might look like for a suite of internal tools:

A relational database server (like PostgreSQL on a cloud provider) with separate databases for each project:

notifications for automated email report data, templates, schedules
outreach for follow-up email drafts, CRM sync state, inbox sync state
scheduling for calendar entries and scheduling data
campaigns for marketing outreach campaigns and contact data

A document database (like Firestore or MongoDB) for:

Dashboard overlay data (notes, status flags, flexible metadata that varies per record)

An analytics layer (like Metabase, Redash, or Looker). Not technically a database, but a reporting tool that queries other databases and returns results via API. Dashboards often pull live data from an analytics layer rather than querying the database directly.