SQL: Subqueries vs Joins

Subqueries and joins are two important concepts in SQL, the standard programming language for managing and manipulating databases. In this blog, we will explore what subqueries and joins are, how they differ, and when to use each one in your SQL queries.

What are Subqueries?

A subquery is a query within a query. It is a way to nest a SELECT statement inside another SELECT, INSERT, UPDATE, or DELETE statement, or inside a CREATE VIEW statement. Subqueries can be used to perform a variety of tasks, such as:

  • Filtering rows based on a condition
  • Calculating values for use in the outer query
  • Returning a single value or multiple values as part of the outer query

What are Joins?

A join is a way to combine rows from two or more tables based on a related column between them. There are several types of joins, including INNER JOIN, OUTER JOIN, and CROSS JOIN.

An INNER JOIN returns only the rows that match the join condition.

An OUTER JOIN returns all rows from both tables, including the ones that don’t match the join condition. There are two types of outer joins: LEFT JOIN and RIGHT JOIN. A LEFT JOIN returns all rows from the left table (the first table in the query) and any matching rows from the right table. A RIGHT JOIN returns all rows from the right table (the second table in the query) and any matching rows from the left table.

A CROSS JOIN returns the Cartesian product of the two tables, which is a combination of all rows from both tables.

Practical Example:

In this scenario, we have a Hogwarts database and the following tables are listed below:

  • course_table: This table contains data regarding course names and is identified using a unique primary key
  • student_table: This table contains data regarding student names that are identified by a unique primary key
  • enrollment_table: This table is the JOIN table that establishes the relationship between students and courses. Each time a student enrolls in a course a new record is created. the course_id is the foreign key to identify a course from the course_table, and the student_id is a foreign key to identify a student from the student_table.
course: 
+─────+────────────────────────────────+
| id | course |
+─────+────────────────────────────────+
| 1 | Defense against the dark arts |
| 2 | Potions |
| 3 | History of magic |
+─────+────────────────────────────────+


student:
+─────+────────────────────────────────+
| id | student |
+─────+────────────────────────────────+
| 1 | Harry Potter |
| 2 | Hermione Granger |
| 3 | Ron Weasly |
| 4 | Voldemort |
| 5 | Draco Malfoy |
+─────+────────────────────────────────+

enrollment:
+─────+────────────+─────────────+
| id | course_id | student_id |
+─────+────────────+─────────────+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 2 |
| 4 | 2 | 3 |
| 5 | 3 | 4 |
| 6 | 1 | 5 |
| 7 | 3 | 1 |
| 8 | 1 | 4 |
+─────+────────────+─────────────+

Now, what if we wanted to find the number of students enrolled in a particular course like Defense against the dark arts? we could use 2 following options, a Subquery, or a Join

Subqueries

Listed below is an example of the Subquery approach.

SELECT * FROM enrollment
WHERE enrollment.course_id IN (SELECT * FROM courses WHERE course_id = 1);

The subquery approach has a nested select statement within the query and sends 2 passes to the database.

JOIN

Listed below is the Join approach.

SELECT * FROM course AS c 
LEFT JOIN enrollment AS e ON c.id = e.course_id
WHERE e.course_id = 1

The join approach only uses 1 pass to the database. Left joining the enrollment table with aliases as seen above.

When to Use Subqueries and Joins

Subqueries and joins can both be used to retrieve data from multiple tables, but they have different use cases. Subqueries are generally used when you want to use the results of the inner query in the WHERE or HAVING clause of the outer query. They are also useful when you want to return a single value or multiple values as part of the outer query.

Joins are generally used when you want to retrieve data from multiple tables based on a related column between them. They are also useful when you want to return all rows from both tables, including the ones that don’t match the join condition.

In general, subqueries tend to be more efficient when you only need to return a few rows, while joins are more efficient when you need to return a large number of rows. However, the performance difference between the two can vary depending on the size of the tables and the complexity of the queries.

In conclusion, subqueries and joins are two important tools for retrieving data from multiple tables in SQL. Subqueries are useful for filtering rows and returning values as part of the outer query, while joins are useful for combining rows from multiple tables based on a related column. Both can be used to achieve similar goals, but the choice of which one to use depends on the specific requirements of your query.