MongoDB Introduction: NoSQL Document Databases
This tutorial introduces MongoDB and NoSQL databases. You'll learn what NoSQL means, how it differs from SQL, when to choose document storage, and how to perform CRUD operations (Create, Read, Update, Delete) in MongoDB using practical examples.
Estimated time: 45 minutes
Why This Matters
Problem statement:
Not all data fits neatly into tables.
Real-world data is messy. Social media posts with varying numbers of tags, product catalogs where items have different attributes, IoT sensors generating unpredictable JSON payloads, and user profiles with flexible schemas don't map cleanly to rigid SQL tables. Adding columns for every possible field creates sparse tables full of NULLs.
NoSQL databases solve these problems. They embrace flexibility, allowing each record to have its own structure. MongoDB stores data as JSON-like documents, making it natural to work with modern web applications that already speak JSON.
Practical benefits: MongoDB skills let you build applications with evolving schemas, handle unstructured data from APIs, scale horizontally across servers, and integrate seamlessly with JavaScript/Python ecosystems. Many startups and tech companies choose MongoDB for rapid prototyping and flexible data models.
Professional context: NoSQL databases power high-traffic applications like e-commerce catalogs, content management systems, real-time analytics, and mobile app backends. Understanding when to use NoSQL vs SQL is a crucial architectural decision.
Choose the right tool for the job—SQL for structured transactions, NoSQL for flexible documents.
Core Concepts
What Is NoSQL?
NoSQL = "Not Only SQL" (not "No SQL")
NoSQL databases provide alternative data models to traditional relational databases. They sacrifice some relational features (like complex joins and strict schemas) to gain flexibility, scalability, and performance for specific use cases.
Key characteristics:
- Schema flexibility: Documents can have different structures
- Horizontal scaling: Add more servers instead of upgrading hardware
- Denormalization: Store related data together rather than splitting across tables
- High availability: Built-in replication and distribution
Common misconception: NoSQL doesn't mean "no structure"; it means "flexible structure."
SQL vs NoSQL: Key Differences
| Aspect | SQL (Relational) | NoSQL (MongoDB) |
|---|---|---|
| Data Model | Tables with rows/columns | Collections with documents (JSON) |
| Schema | Fixed, predefined | Dynamic, flexible |
| Relationships | Foreign keys, JOINs | Embedded documents or references |
| Scaling | Vertical (bigger servers) | Horizontal (more servers) |
| Transactions | Strong ACID guarantees | Atomic single-document writes; multi-document ACID transactions since 4.0 |
| Query Language | SQL | Query API (JavaScript-like) |
| Best For | Financial systems, structured data | Content management, catalogs, logs |
Example comparison:
SQL structure:
-- Two tables with foreign key
Customers: id, name, email
Orders: id, customer_id, order_date, amount
NoSQL structure:
// Single document with embedded data
{
"_id": "cust123",
"name": "John Doe",
"email": "john@example.com",
"orders": [
{ "order_date": "2025-01-15", "amount": 99.99 },
{ "order_date": "2025-02-20", "amount": 149.50 }
]
}
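To make the comparison concrete, here is a minimal plain-JavaScript sketch (no database involved) that folds two relational-style arrays into the embedded document shape shown above. The `customers` and `orders` arrays and the `embedOrders` helper are hypothetical names used only for illustration.

```javascript
// Hypothetical in-memory "tables" mirroring the SQL layout above.
const customers = [
  { id: "cust123", name: "John Doe", email: "john@example.com" },
];
const orders = [
  { id: 1, customer_id: "cust123", order_date: "2025-01-15", amount: 99.99 },
  { id: 2, customer_id: "cust123", order_date: "2025-02-20", amount: 149.5 },
];

// Build one document per customer with that customer's orders embedded.
function embedOrders(customerRows, orderRows) {
  return customerRows.map((c) => ({
    _id: c.id,
    name: c.name,
    email: c.email,
    orders: orderRows
      .filter((o) => o.customer_id === c.id)
      .map(({ order_date, amount }) => ({ order_date, amount })),
  }));
}

const customerDocs = embedOrders(customers, orders);
console.log(JSON.stringify(customerDocs[0], null, 2));
```

Reading a customer together with their orders then needs no join, at the cost of rewriting the customer document whenever an order changes.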
What Is ACID?
ACID = Atomicity, Consistency, Isolation, Durability
These properties guarantee reliable database transactions:
- Atomicity: All operations in a transaction succeed or all fail (no partial updates)
- Consistency: Database moves from one valid state to another (constraints maintained)
- Isolation: Concurrent transactions don't interfere with each other
- Durability: Committed data survives system failures
SQL databases: Strong ACID by default (every operation is a transaction)
NoSQL databases (including MongoDB):
- Traditionally prioritized performance over strict ACID
- MongoDB 4.0+ supports multi-document ACID transactions
- Single-document operations are always atomic
Trade-off: Traditional NoSQL systems favored eventual consistency for speed and scalability. Modern NoSQL databases such as MongoDB now offer both options.
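Atomicity is easier to picture with a small example. The following is a plain-JavaScript sketch, not how MongoDB or any SQL engine actually implements transactions: it applies every step to a copy and returns the copy only if all steps succeed. The `accounts` object and `transfer` function are hypothetical.

```javascript
// Hypothetical account balances; not a real database.
const accounts = { alice: 100, bob: 50 };

// Apply every step to a draft copy and hand it back only if all steps succeed.
function transfer(state, from, to, amount) {
  const draft = { ...state };
  draft[from] -= amount;
  if (draft[from] < 0) {
    // Abort: the caller keeps the original, untouched state.
    return { ok: false, state };
  }
  draft[to] += amount;
  return { ok: true, state: draft };
}

const success = transfer(accounts, "alice", "bob", 30);
const failure = transfer(accounts, "alice", "bob", 500);
console.log(success); // { ok: true, state: { alice: 70, bob: 80 } }
console.log(failure); // { ok: false, state: { alice: 100, bob: 50 } }
```

The failed transfer leaves no partial update behind, which is the "all or nothing" guarantee described above.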
What Is Document Storage?
Document databases store data as documents (JSON, BSON, XML). Each document is a self-contained record with key-value pairs.
Example document:
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"name": "Alice Johnson",
"email": "alice@company.com",
"age": 28,
"skills": ["Python", "SQL", "Docker"],
"address": {
"city": "Tirana",
"country": "Albania"
},
"hire_date": ISODate("2025-01-15T00:00:00Z"),
"active": true
}
Why documents?
- Natural fit for objects: Maps directly to JSON/Python dictionaries
- No schema migration: Add fields without ALTER TABLE
- Nested structures: Embed related data (address, arrays)
- Variable fields: Different documents can have different fields
MongoDB stores documents internally as BSON (Binary JSON) for efficiency while exposing a JSON-like interface.
NoSQL Types
NoSQL encompasses multiple database types, each optimized for different use cases:
1. Document Databases (MongoDB, CouchDB)
- Store JSON-like documents
- Best for: Content management, catalogs, user profiles
2. Key-Value Stores (Redis, DynamoDB)
- Simple key → value mapping
- Best for: Caching, session storage, real-time data
3. Column-Family Stores (Cassandra, HBase)
- Store rows whose columns are grouped into families and can vary from row to row
- Best for: Time-series data, analytics, massive scale
4. Graph Databases (Neo4j, Amazon Neptune)
- Store nodes and relationships
- Best for: Social networks, recommendation engines, fraud detection
This tutorial focuses on MongoDB: The most popular document database.
Benefits of NoSQL Databases
Flexibility: Add fields to documents without downtime or migrations. Perfect for evolving requirements.
// No problem adding new field to one document
{ "name": "Bob", "email": "bob@co.com", "department": "Sales" }
When to choose NoSQL:
- Rapid development with changing requirements
- Hierarchical or nested data structures
- Need to scale horizontally
- Working with JSON APIs
- Real-time analytics on large datasets
When to stick with SQL:
- Complex transactions (banking, accounting)
- Strong consistency requirements
- Data with many relationships requiring joins
- Existing SQL ecosystem and expertise
Step-by-Step Guide
1. MongoDB Basics and Setup
Installing MongoDB:
# macOS with Homebrew
brew tap mongodb/brew
brew install mongodb-community
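# Ubuntu/Debian (add MongoDB's official apt repository first; see the installation docs)
sudo apt-get install -y mongodb-org
# Windows: Download installer from mongodb.com
Starting MongoDB:
# macOS (Homebrew service)
brew services start mongodb-community
# Ubuntu/Debian (systemd)
sudo systemctl start mongod
# Connect with the MongoDB shell
mongosh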
Cloud alternative: Use MongoDB Atlas (free tier available) at mongodb.com/atlas
Basic MongoDB terminology:
| SQL Term | MongoDB Equivalent |
|---|---|
| Database | Database |
| Table | Collection |
| Row | Document |
| Column | Field |
| Index | Index |
| JOIN | Embedding or $lookup |
2. Creating Databases and Collections
In MongoDB, databases and collections are created automatically when you insert data.
Switch to (or create) database:
// Switch to database (creates if doesn't exist)
use company_data
// Check current database
db
// Show all databases
show dbs
// Note: Empty databases don't appear until data is added
Collections are created implicitly:
// No need to explicitly create collection
// It's created automatically on first insert
db.employees.insertOne({
name: "John Doe",
email: "john@company.com"
})
// Show all collections in current database
show collections
// Explicitly create collection (optional)
db.createCollection("customers")
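Dropping database/collection:
// Drop a single collection (removes its documents and indexes)
db.customers.drop()
// Drop the current database (irreversible)
db.dropDatabase()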
3. Inserting Documents
Insert single document:
// insertOne() adds a single document
db.employees.insertOne({
first_name: "Alice",
last_name: "Johnson",
email: "alice.j@company.com",
age: 28,
department: "Engineering",
skills: ["Python", "MongoDB", "Docker"],
hire_date: new Date("2025-01-15"),
salary: 75000,
active: true
})
// MongoDB automatically generates _id if not provided
// Returns: { acknowledged: true, insertedId: ObjectId("...") }
Insert multiple documents:
// insertMany() adds array of documents
db.employees.insertMany([
{
first_name: "Bob",
last_name: "Smith",
email: "bob.smith@company.com",
age: 35,
department: "Sales",
skills: ["Negotiation", "CRM"],
hire_date: new Date("2024-06-10"),
salary: 68000,
active: true
},
{
first_name: "Carol",
last_name: "Martinez",
email: "carol.m@company.com",
age: 42,
department: "Engineering",
skills: ["JavaScript", "React", "Node.js"],
hire_date: new Date("2023-03-22"),
salary: 92000,
active: true
},
{
first_name: "David",
last_name: "Chen",
email: "david.chen@company.com",
age: 29,
department: "Marketing",
skills: ["SEO", "Content", "Analytics"],
hire_date: new Date("2024-11-05"),
salary: 62000,
active: true
}
])
// Returns: { acknowledged: true, insertedIds: { '0': ObjectId("..."), '1': ObjectId("..."), ... } }
Nested documents:
// Documents can contain nested objects and arrays
db.employees.insertOne({
first_name: "Emma",
last_name: "Wilson",
email: "emma.w@company.com",
age: 31,
department: "HR",
address: {
street: "123 Main St",
city: "Tirana",
country: "Albania",
postal_code: "1001"
},
projects: [
{ name: "Recruitment Portal", role: "Lead", start_date: new Date("2024-01-01") },
{ name: "Training Program", role: "Coordinator", start_date: new Date("2024-06-15") }
],
salary: 70000
})
4. Querying Documents
MongoDB uses a rich query API instead of SQL.
Find all documents:
// Find all (like SELECT *)
db.employees.find()
// Pretty print (mongosh already formats output; pretty() mainly matters in the legacy mongo shell)
db.employees.find().pretty()
// Count documents
db.employees.countDocuments()
Find with criteria (WHERE equivalent):
// Single condition
db.employees.find({ department: "Engineering" })
// Multiple conditions (AND)
db.employees.find({
department: "Engineering",
salary: { $gte: 80000 }
})
// OR condition
db.employees.find({
$or: [
{ department: "Engineering" },
{ department: "Sales" }
]
})
// IN operator
db.employees.find({
department: { $in: ["Engineering", "Sales", "Marketing"] }
})
Comparison operators:
| Operator | Meaning | Example |
|---|---|---|
| $eq | Equals | { age: { $eq: 30 } } |
| $ne | Not equals | { active: { $ne: false } } |
| $gt | Greater than | { salary: { $gt: 70000 } } |
| $gte | Greater or equal | { age: { $gte: 30 } } |
| $lt | Less than | { age: { $lt: 40 } } |
| $lte | Less or equal | { salary: { $lte: 80000 } } |
| $in | In array | { dept: { $in: ["IT", "HR"] } } |
| $nin | Not in array | { status: { $nin: ["inactive"] } } |
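One way to build intuition for these operators is to treat each as a small predicate. The sketch below is a simplified in-memory matcher in plain JavaScript, assuming flat documents with scalar fields. MongoDB's real matching also handles arrays, nested paths, and BSON type ordering, which this sketch ignores. The `operators` table, `matches` function, and `staff` data are hypothetical.

```javascript
// Each operator becomes a predicate over (fieldValue, operand).
const operators = {
  $eq: (v, x) => v === x,
  $ne: (v, x) => v !== x,
  $gt: (v, x) => v > x,
  $gte: (v, x) => v >= x,
  $lt: (v, x) => v < x,
  $lte: (v, x) => v <= x,
  $in: (v, xs) => xs.includes(v),
  $nin: (v, xs) => !xs.includes(v),
};

// A filter matches when every field condition holds (implicit AND).
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    const value = doc[field];
    if (cond === null || typeof cond !== "object" || Array.isArray(cond)) {
      return value === cond; // a bare value is shorthand for $eq
    }
    return Object.entries(cond).every(([op, operand]) =>
      operators[op](value, operand)
    );
  });
}

// Hypothetical sample data.
const staff = [
  { name: "Alice", age: 28, salary: 75000, department: "Engineering" },
  { name: "Bob", age: 35, salary: 68000, department: "Sales" },
  { name: "Carol", age: 42, salary: 92000, department: "Engineering" },
];

const seniorEngineers = staff.filter((d) =>
  matches(d, { department: "Engineering", salary: { $gte: 80000 } })
);
const ageRange = staff.filter((d) => matches(d, { age: { $gt: 30, $lt: 40 } }));

console.log(seniorEngineers.map((d) => d.name)); // [ 'Carol' ]
console.log(ageRange.map((d) => d.name)); // [ 'Bob' ]
```

Several operators on the same field, as in the age range, all have to hold for a document to match.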
Projection (select specific fields):
// Show only first_name and email (1 = include, 0 = exclude)
db.employees.find(
{ department: "Engineering" },
{ first_name: 1, email: 1, _id: 0 }
)
// Exclude specific fields
db.employees.find(
{},
{ salary: 0, _id: 0 }
)
Sorting:
// Sort by salary ascending (1 = ascending, -1 = descending)
db.employees.find().sort({ salary: 1 })
// Sort by department ascending, then salary descending
db.employees.find().sort({ department: 1, salary: -1 })
Limiting and skipping:
// Get first 5 documents
db.employees.find().limit(5)
// Skip first 10, then get 5 (pagination)
db.employees.find().skip(10).limit(5)
// Combine: sort, skip, limit
db.employees.find()
.sort({ hire_date: -1 })
.skip(10)
.limit(5)
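For page-based navigation, the skip value is usually derived from a page number. Here is a small plain-JavaScript sketch of that arithmetic, using an array slice as a stand-in for a sorted query result. The `pageWindow` and `fetchPage` helpers are hypothetical, and the sketch assumes 1-based page numbers.

```javascript
// Convert a 1-based page number into skip/limit values.
function pageWindow(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// In-memory stand-in for a sorted result set of 23 documents.
const sortedIds = Array.from({ length: 23 }, (_, i) => i + 1);

// Equivalent of .skip(skip).limit(limit) on an already sorted array.
function fetchPage(items, page, pageSize) {
  const { skip, limit } = pageWindow(page, pageSize);
  return items.slice(skip, skip + limit);
}

const page3 = fetchPage(sortedIds, 3, 5);
const lastPage = fetchPage(sortedIds, 5, 5);

console.log(pageWindow(3, 5)); // { skip: 10, limit: 5 }
console.log(page3); // [ 11, 12, 13, 14, 15 ]
console.log(lastPage); // [ 21, 22, 23 ]
```

Pages are only stable when the query has an explicit sort; without one, the order in which documents come back is not guaranteed.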
Pattern matching (LIKE equivalent):
// Regex for pattern matching
// Find emails ending with @company.com
db.employees.find({
email: { $regex: "@company\\.com$" } // Escape the dot so it matches a literal "."
})
// Case-insensitive search
db.employees.find({
first_name: { $regex: "^a", $options: "i" } // Starts with 'a' or 'A'
})
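A common pitfall with patterns like these is the unescaped dot, which matches any character. MongoDB uses Perl-compatible regular expressions rather than JavaScript's engine, but simple anchors and escapes behave the same way, so the difference can be tried in plain JavaScript. The sample strings below are hypothetical. When the pattern is written as a string, as in `$regex: "..."`, the backslash itself must be doubled (`"\\."`).

```javascript
// Hypothetical sample values, chosen to expose the difference.
const emails = [
  "alice.j@company.com",
  "eve@companyXcom",
  "bob@company.com.example.org",
];

// Unescaped dot: "." matches any single character.
const loose = /@company.com$/;
// Escaped dot: "\." matches only a literal period.
const strict = /@company\.com$/;

const looseMatches = emails.filter((e) => loose.test(e));
const strictMatches = emails.filter((e) => strict.test(e));

// Case-insensitive prefix match, like { $regex: "^a", $options: "i" }.
const names = ["Alice", "adam", "Bob", "Carla"];
const startsWithA = names.filter((n) => /^a/i.test(n));

console.log(looseMatches); // [ 'alice.j@company.com', 'eve@companyXcom' ]
console.log(strictMatches); // [ 'alice.j@company.com' ]
console.log(startsWithA); // [ 'Alice', 'adam' ]
```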
Querying nested fields:
// Dot notation for nested fields
db.employees.find({
"address.city": "Tirana"
})
// Query array elements
db.employees.find({
skills: "Python" // Finds if Python is in skills array
})
// Array contains all
db.employees.find({
skills: { $all: ["Python", "MongoDB"] }
})
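The three query styles above can be approximated in plain JavaScript to show what each one checks. This is a simplified sketch, not MongoDB's matching engine: it does not traverse arrays in the middle of a dotted path, for example. The helpers `getPath`, `fieldEquals`, and `fieldHasAll` and the sample data are hypothetical.

```javascript
// Walk a dotted path such as "address.city"; undefined if any step is missing.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Equality on a field: for arrays, true when any element equals the value.
function fieldEquals(doc, path, expected) {
  const value = getPath(doc, path);
  return Array.isArray(value) ? value.includes(expected) : value === expected;
}

// $all-style check: the array field holds every listed value.
function fieldHasAll(doc, path, required) {
  const value = getPath(doc, path);
  return Array.isArray(value) && required.every((r) => value.includes(r));
}

// Hypothetical sample documents with different shapes.
const people = [
  { name: "Alice", skills: ["Python", "MongoDB", "Docker"] },
  { name: "Emma", address: { city: "Tirana", country: "Albania" } },
  { name: "Carol", skills: ["JavaScript", "React", "Node.js"] },
];

const inTirana = people.filter((p) => fieldEquals(p, "address.city", "Tirana"));
const knowsPython = people.filter((p) => fieldEquals(p, "skills", "Python"));
const pythonAndMongo = people.filter((p) =>
  fieldHasAll(p, "skills", ["Python", "MongoDB"])
);

console.log(inTirana.map((p) => p.name)); // [ 'Emma' ]
console.log(knowsPython.map((p) => p.name)); // [ 'Alice' ]
console.log(pythonAndMongo.map((p) => p.name)); // [ 'Alice' ]
```

Documents without an `address` or `skills` field simply fail the check instead of raising an error, which mirrors how a query skips them.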
Find one document:
// Returns single document (or null)
db.employees.findOne({ email: "alice.j@company.com" })
// Useful for getting by ID
db.employees.findOne({ _id: ObjectId("507f1f77bcf86cd799439011") })
5. Updating Documents
Update single document:
// updateOne() modifies first matching document
db.employees.updateOne(
{ email: "alice.j@company.com" }, // Filter
{ $set: { salary: 82000 } } // Update
)
// Returns: { acknowledged: true, matchedCount: 1, modifiedCount: 1 }
Update operators:
| Operator | Purpose | Example |
|---|---|---|
| $set | Set field value | { $set: { age: 30 } } |
| $unset | Remove field | { $unset: { temp_field: "" } } |
| $inc | Increment number | { $inc: { salary: 5000 } } |
| $mul | Multiply | { $mul: { quantity: 2 } } |
| $rename | Rename field | { $rename: { "name": "full_name" } } |
| $push | Add to array | { $push: { skills: "Docker" } } |
| $pull | Remove from array | { $pull: { skills: "Java" } } |
| $addToSet | Add if not exists | { $addToSet: { tags: "new" } } |
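To see what several of these operators do to a document, here is a simplified in-memory sketch in plain JavaScript. It covers only `$set`, `$unset`, `$inc`, `$push`, `$addToSet`, and `$pull` on top-level fields with scalar values, and it is an illustration rather than MongoDB's implementation. The `applyUpdate` function and the sample document are hypothetical.

```javascript
// Return a new document with the update applied; the input is left untouched.
function applyUpdate(doc, update) {
  const next = { ...doc };
  for (const [field, value] of Object.entries(update.$set ?? {})) {
    next[field] = value;
  }
  for (const field of Object.keys(update.$unset ?? {})) {
    delete next[field];
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    next[field] = (next[field] ?? 0) + amount; // a missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push ?? {})) {
    next[field] = [...(next[field] ?? []), item];
  }
  for (const [field, item] of Object.entries(update.$addToSet ?? {})) {
    const current = next[field] ?? [];
    next[field] = current.includes(item) ? current : [...current, item];
  }
  for (const [field, item] of Object.entries(update.$pull ?? {})) {
    if (Array.isArray(next[field])) {
      next[field] = next[field].filter((x) => x !== item);
    }
  }
  return next;
}

// Hypothetical sample document.
const bob = {
  name: "Bob",
  department: "Sales",
  salary: 68000,
  skills: ["Negotiation", "CRM"],
  tags: ["new"],
  temp_field: 1,
};

const updatedBob = applyUpdate(bob, {
  $set: { department: "Sales Management" },
  $unset: { temp_field: "" },
  $inc: { salary: 10000 },
  $push: { skills: "Leadership" },
  $addToSet: { tags: "new" }, // already present, so tags stays unchanged
});

console.log(updatedBob);
```

Fields that no operator mentions, such as `name`, are carried over unchanged.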
Update multiple documents:
// updateMany() modifies all matching documents
db.employees.updateMany(
{ department: "Engineering" },
{ $inc: { salary: 5000 } } // Give 5k raise to all engineers
)
// Update all documents
db.employees.updateMany(
{},
{ $set: { reviewed: false } }
)
Update with multiple operators:
db.employees.updateOne(
{ email: "bob.smith@company.com" },
{
$set: { department: "Sales Management" },
$inc: { salary: 10000 },
$push: { skills: "Leadership" }
}
)
Upsert (update or insert):
// If document doesn't exist, create it
db.employees.updateOne(
{ email: "new.person@company.com" },
{
$set: {
first_name: "New",
last_name: "Person",
department: "IT",
salary: 60000
}
},
{ upsert: true } // Creates if not found
)
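The upsert decision can be sketched in plain JavaScript as "update if a match exists, otherwise insert". This simplified version assumes equality-only filters and `$set`-style fields. On insert, the new document combines the filter's equality fields with the fields being set. The `upsertOne` helper and its return shape are hypothetical and do not match the driver's actual result object.

```javascript
// In-memory stand-in for a collection.
const people = [];

// Update the first document matching an equality filter, or insert a new one.
function upsertOne(collection, filter, setFields) {
  const existing = collection.find((doc) =>
    Object.entries(filter).every(([key, value]) => doc[key] === value)
  );
  if (existing) {
    Object.assign(existing, setFields);
    return { matched: true, inserted: false };
  }
  // No match: the new document carries the filter fields plus the set fields.
  collection.push({ ...filter, ...setFields });
  return { matched: false, inserted: true };
}

const first = upsertOne(
  people,
  { email: "new.person@company.com" },
  { first_name: "New", department: "IT" }
);
const second = upsertOne(
  people,
  { email: "new.person@company.com" },
  { department: "Finance" }
);

console.log(first); // { matched: false, inserted: true }
console.log(second); // { matched: true, inserted: false }
console.log(people.length); // 1
```

Running the same upsert twice leaves a single document, which makes the operation safe to retry.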
Replace entire document:
// replaceOne() replaces entire document (except _id)
db.employees.replaceOne(
{ email: "old@company.com" },
{
first_name: "Updated",
last_name: "Employee",
email: "new@company.com",
department: "Finance",
salary: 75000
}
)
// Warning: This removes all fields not in replacement document
6. Deleting Documents
Delete single document:
// deleteOne() removes first matching document
db.employees.deleteOne({
email: "person@company.com"
})
// Returns: { acknowledged: true, deletedCount: 1 }
Delete multiple documents:
// deleteMany() removes all matching documents
db.employees.deleteMany({
active: false
})
// Delete all documents in collection
db.employees.deleteMany({}) // Dangerous!
Delete with conditions:
// Delete employees hired before 2023
db.employees.deleteMany({
hire_date: { $lt: new Date("2023-01-01") }
})
// Delete by multiple criteria
db.employees.deleteMany({
department: "Temp",
salary: { $lt: 50000 }
})
7. Aggregation Pipeline
Aggregation performs complex data processing (like SQL GROUP BY, JOINs).
Basic aggregation:
// Calculate average salary by department
db.employees.aggregate([
{
$group: {
_id: "$department",
avg_salary: { $avg: "$salary" },
count: { $sum: 1 }
}
},
{
$sort: { avg_salary: -1 }
}
])
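For readers who think in application code, the computation done by this `$group` and `$sort` pair can be sketched in plain JavaScript. This shows only the arithmetic on an in-memory array, not how the server executes a pipeline. The sample data and the `averageSalaryByDepartment` function are hypothetical.

```javascript
// Hypothetical sample data.
const employees = [
  { department: "Engineering", salary: 75000 },
  { department: "Sales", salary: 68000 },
  { department: "Engineering", salary: 92000 },
  { department: "Marketing", salary: 62000 },
];

// Group by department, then compute the average and count for each group.
function averageSalaryByDepartment(docs) {
  const groups = new Map();
  for (const { department, salary } of docs) {
    const group = groups.get(department) ?? { sum: 0, count: 0 };
    group.sum += salary;
    group.count += 1;
    groups.set(department, group);
  }
  return [...groups]
    .map(([department, { sum, count }]) => ({
      _id: department,
      avg_salary: sum / count,
      count,
    }))
    .sort((a, b) => b.avg_salary - a.avg_salary); // descending, like { avg_salary: -1 }
}

const summary = averageSalaryByDepartment(employees);
console.log(summary.map((g) => `${g._id}: ${g.avg_salary}`));
// [ 'Engineering: 83500', 'Sales: 68000', 'Marketing: 62000' ]
```

Running the grouping inside the database avoids shipping every document to the application just to compute a few numbers.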
Aggregation stages:
| Stage | Purpose | SQL Equivalent |
|---|---|---|
| $match | Filter documents | WHERE |
| $group | Group by field | GROUP BY |
| $sort | Sort results | ORDER BY |
| $project | Select/reshape fields | SELECT |
| $limit | Limit results | LIMIT |
| $skip | Skip documents | OFFSET |
| $lookup | Join collections | JOIN |
| $unwind | Deconstruct arrays | - |
Complex aggregation example:
// Find top 3 highest paid employees per department
db.employees.aggregate([
// Stage 1: Filter active employees
{
$match: { active: true }
},
// Stage 2: Sort by department and salary
{
$sort: { department: 1, salary: -1 }
},
// Stage 3: Group by department and get top 3
{
$group: {
_id: "$department",
top_earners: {
$push: {
name: { $concat: ["$first_name", " ", "$last_name"] },
salary: "$salary"
}
}
}
},
// Stage 4: Limit to top 3 per department
{
$project: {
department: "$_id",
top_earners: { $slice: ["$top_earners", 3] }
}
}
])
Common aggregation operators:
// Count, sum, average, min, max
db.employees.aggregate([
{
$group: {
_id: null, // Group all documents together
total_employees: { $sum: 1 },
total_payroll: { $sum: "$salary" },
avg_salary: { $avg: "$salary" },
min_salary: { $min: "$salary" },
max_salary: { $max: "$salary" }
}
}
])
Common MongoDB Challenges
Handling Missing Fields
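Problem: Documents can have different fields. A filter on a field that a document lacks simply does not match that document (comparisons with null and negations such as $ne are the exceptions), so results can be silently incomplete.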
// Check if field exists
db.employees.find({ phone: { $exists: true } })
// Check if field doesn't exist
db.employees.find({ phone: { $exists: false } })
// Provide default with $ifNull in aggregation
db.employees.aggregate([
{
$project: {
name: "$first_name",
phone: { $ifNull: ["$phone", "No phone provided"] }
}
}
])
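The same distinctions show up in application code. In plain JavaScript, the `in` operator plays a role similar to `$exists`, and the nullish coalescing operator `??` behaves much like `$ifNull`: it substitutes a default for both missing and null values but keeps other values such as an empty string. The sample records and the phone number below are hypothetical.

```javascript
// Hypothetical sample records with inconsistent shapes.
const records = [
  { first_name: "Alice", phone: "555-0100" },
  { first_name: "Bob" }, // field missing
  { first_name: "Carol", phone: null }, // field present but null
  { first_name: "Dan", phone: "" }, // empty string is a real value
];

// Like { phone: { $exists: true } }: present, even when the value is null.
const hasPhoneField = records
  .filter((r) => "phone" in r)
  .map((r) => r.first_name);

// Like $ifNull: fall back only when the value is missing or null.
const withDefaults = records.map((r) => ({
  name: r.first_name,
  phone: r.phone ?? "No phone provided",
}));

console.log(hasPhoneField); // [ 'Alice', 'Carol', 'Dan' ]
console.log(withDefaults.map((r) => r.phone));
// [ '555-0100', 'No phone provided', 'No phone provided', '' ]
```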
Understanding _id vs Custom IDs
Problem: MongoDB auto-generates _id as ObjectId, but sometimes you want custom IDs.
// Auto-generated ObjectId
db.users.insertOne({ name: "Alice" })
// _id: ObjectId("507f1f77bcf86cd799439011")
// Custom ID
db.users.insertOne({ _id: "user_123", name: "Bob" })
// _id: "user_123"
// Query by ObjectId (must wrap in ObjectId())
db.users.findOne({ _id: ObjectId("507f1f77bcf86cd799439011") })
// Query by custom ID
db.users.findOne({ _id: "user_123" })
Avoiding Accidental Updates
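Problem: Forgetting $set. updateOne() and updateMany() reject an update document that contains no operators, while replaceOne() and the legacy mongo shell's update() overwrite the entire document.
// WRONG: no update operator, so updateOne() throws an error
// (the legacy update() would replace the document with just { salary: 80000 })
db.employees.updateOne(
{ email: "alice@co.com" },
{ salary: 80000 } // Missing $set!
)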
// CORRECT: Use $set to update specific fields
db.employees.updateOne(
{ email: "alice@co.com" },
{ $set: { salary: 80000 } }
)
Quick Reference
Essential MongoDB Commands
| Command | Purpose | Example |
|---|---|---|
| use <db> | Switch/create database | use company |
| show dbs | List databases | show dbs |
| show collections | List collections | show collections |
| db.collection.insertOne() | Insert document | db.users.insertOne({name: "Alice"}) |
| db.collection.find() | Query documents | db.users.find({age: {$gt: 25}}) |
| db.collection.updateOne() | Update document | db.users.updateOne({_id: 1}, {$set: {age: 30}}) |
| db.collection.deleteOne() | Delete document | db.users.deleteOne({_id: 1}) |
| db.collection.countDocuments() | Count documents | db.users.countDocuments() |
| db.collection.drop() | Delete collection | db.users.drop() |
Summary & Next Steps
Key accomplishments: You've learned what NoSQL means and when to use it, how MongoDB differs from SQL databases, what ACID properties ensure, how document storage works with JSON-like structures, the four types of NoSQL databases, how to perform CRUD operations in MongoDB, and how to use aggregation for complex queries.
Critical insights:
- NoSQL isn't anti-SQL: It's a complementary tool for different use cases
- Flexibility has trade-offs: You gain schema freedom but take on more responsibility for keeping data consistent in the application
- Denormalization is normal: Embed related data instead of always splitting into references
- Indexes matter: Even flexible databases need optimization
When to use MongoDB:
- Rapid prototyping with evolving requirements
- Hierarchical/nested data (product catalogs, user profiles)
- Content management systems
- Real-time analytics and logging
- Applications already using JSON
When to use SQL:
- Financial transactions requiring strong consistency
- Complex multi-table relationships and joins
- Fixed schemas with strict validation
- Existing SQL infrastructure and expertise
What's next:
With MongoDB fundamentals mastered, explore replica sets (high availability), sharding (horizontal scaling), change streams (real-time data), and integrating MongoDB with Python using PyMongo or with Node.js using the native driver.
Practice resources:
- MongoDB University - free official courses with certifications
- MongoDB Documentation - comprehensive guides
- M001: MongoDB Basics - hands-on course
External resources:
- MongoDB Compass - GUI for MongoDB
- MongoDB Atlas - free cloud database hosting
- PyMongo Documentation - Python integration
Remember: SQL and NoSQL aren't competitors; they're tools for different jobs. Choose based on your data structure, consistency needs, and scalability requirements. Master both to become a versatile data professional.