A Brief Introduction To SQL

Datascience George
3 min readOct 5, 2020

SQL or “structured query language” is a type of programming language that works with relational databases. A database is like a gigantic excel file that holds tabular data. In this blog post I will cover the basics of SQL including the select statement, aggregate functions, grouping, and the where clause. Here is a small table I will use as an example.

customersname    age  favorite_food  cityFred    34   apple          NY
Alex 44 broccoli NY
John 28 banana NY
John 24 walnut LA
Sarah 56 orange LA

To take data from a table you will often use the select statement. simply write ‘SELECT’ (it is common to use all caps), then follow that with the names of the columns that you want, then type ‘FROM’ and then follow that with the table that you want to query from. Here is what it would look like.

SELECT name FROM customers>>>
Fred
Alex
John
John
Sarah

Notice that we wrote “name” to refer to the column we wanted to use, and that the query returned every entry in the table, even the duplicates. To avoid duplicates write the “distinct” keyword before the column name.

Next up is aggregate functions. An aggregate function can be thought of as a statistic for a column in the table. Some aggregate functions are sum, average, count, min, and max. Average will be shortened to “avg”. These simple operations are essential and easy to use, however they only work on continuous numerical data. To use one simply write the function the write the name of the column in parenthesis behind the function name. Make sure to use this inside a select statement. Here is an example where we find the maximum age.

SELECT MAX(age) FROM customer>>>
56

Notice how only one value is returned.

Grouping works by taking a column and removing its duplicates to form a group. Individual rows can no longer be accessed at this point, only aggregate functions can be used since the data has been grouped. This is an essential tool to use when you have columns that are full of duplicates. To perform a “group by” simply make a select statement and select the name of the column you are grouping by. Then select along with it any aggregate information you need. After the select cause write “group by” and then write the column you are selecting from again. Here is an example where I find the maximum age for each city.

SELECT city, MAX(age) FROM customers
GROUP BY city
>>>
NY 44
LA 56

Notice how it prints both the city and the maximum age for each group.

Finally we have the “where” clause. This clause cannot be used with group by (look into the “having” clause). The “where” clause is a statement that checks if values match a rule, and if they do they will be returned. Some simple rules would be greater than (>), less than (<) and equal to (=). To use a “where” clause simply write “where” after your select statement then write a column name and a number (or string) you would like to build your rule with. Lets see some examples of how this works.

SELECT name FROM customers 
WHERE age > 30
>>>
Fred
Alex
Sarah
SELECT name FROM customers
WHERE favorite_food == broccoli
>>>
Alex

This is a good exclude data you don’t need.

This was the very basics of SQL. I hope it gave you an idea of how it works and fostered some interest in the language. SQL can be very complicated and do a myriad of different things. This post just scrapes the surface.

--

--