we’ll use for this SQL introduction notebook the Sakila Database and we’ll setting up SQL tools in your Python Environment.
Note There are many ways to browse through an SQL database, the main goal here is to teach you how to use SQL queries to grab information and set it as a pandas DataFrame.
SQLite Browser is lightweight and free to use. There are many other alternatives, check out a list of 10 free ones here
Download SQLite Browser here: http://sqlitebrowser.org/
You can download the fully constructed Sakila database (the .bd file) here
Or - you can download the .sql file to construct the database yourself: http://dev.mysql.com/doc/index-other.html Then use SQLite Browser to construct the database by running the .sql file
the full description of the database tables is here
Either way, make sure to save it in the same directory as your iPython notebooks, or remember the file path for later so you can tell pandas exactly where to look for it..
RQ:
Check out the database either by opening it up using your SQL Client or by checking out the MCD diagram at this link: Diagram to have an overview of what the database looks like..
# pip install mysql-connector-python
import mysql.connector as mysqlConnector
la méthode du connecteur pour nous connecter à l’instance. Cette méthode prend plusieurs arguments.connect()
conn = mysqlConnector.connect(host='Localhost',user='root',passwd='root', database='sakila')
if conn:
print("Connection Successful!")
else:
print("Connection Failed!")
cur = conn.cursor()
#cur.execute("SHOW DATABASES")
cur.execute("select * from actor limit 2")
for row in cur:
print(row)
Connection Successful!
(1, 'PENELOPE', 'GUINESS', datetime.datetime(2006, 2, 15, 4, 34, 33))
(2, 'NICK', 'WAHLBERG', datetime.datetime(2006, 2, 15, 4, 34, 33))
# import Modules
import sqlite3
import pandas as pd
To use the module, you must first create a Connection object that represents the database. If the database name already exists SQLite3 will automatically connect to it, if it does not exist, SQLite3 will automatically create it.
RQ: You can also supply the special name :memory: to create a database in RAM.
all syntax details is here
#other alternativeis to load the ipython-sql extension as follows
#%load_ext sql
#%sql sqlite://
Connect to the Sakila database(downloaded from links up)
# Connect to the Sakila database(downloaded from links up)
con = sqlite3.connect("sakila.db")
if con:
print("Connection Successful!")
else:
print("Connection Failed!")
cur = con.cursor()
#cur.execute("SHOW DATABASES")
cur.execute("select * from actor limit 2")
for row in cur:
print(row)
Connection Successful!
(1, 'PENELOPE', 'GUINESS', '2011-09-14 18:05:28')
(2, 'NICK', 'WAHLBERG', '2011-09-14 18:05:28')
# Après avoir terminé le travail, Il est important de fermer la connexion que nous avons faite à la fin en utilisant la méthode. close()
conn.close()
from sqlalchemy import create_engine
con1 = create_engine("sqlite:///sakila.db",
echo=True)
con1.connect()
print(con1)
2021-01-27 12:09:19,623 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2021-01-27 12:09:19,624 INFO sqlalchemy.engine.base.Engine ()
2021-01-27 12:09:19,626 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2021-01-27 12:09:19,627 INFO sqlalchemy.engine.base.Engine ()
Engine(sqlite:///sakila.db)
from sqlalchemy import create_engine
#engine = create_engine("mysql///?User=root&Password=root&Database=sakila&Server=myServer&Port=3306")
conn = create_engine("mysql+mysqlconnector://root:root@localhost:3306/sakila",
echo=True)
conn.connect()
print(conn)
2021-01-27 12:09:20,701 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
2021-01-27 12:09:20,702 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,725 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2021-01-27 12:09:20,727 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,738 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
2021-01-27 12:09:20,739 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,746 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
2021-01-27 12:09:20,748 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,751 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
2021-01-27 12:09:20,752 INFO sqlalchemy.engine.base.Engine {}
Engine(mysql+mysqlconnector://root:***@localhost:3306/sakila)
Now we can run a basic SQL query, pass it with pandas, and display the output as a DataFrame!
# Set SQL query as a comment
sql_query = ''' SELECT * FROM actor '''
# Use pandas to pass sql query using connection form SQLite3
df = pd.read_sql(sql_query, conn)
# Show the resulting DataFrame
df
2021-01-27 12:09:25,309 INFO sqlalchemy.engine.base.Engine DESCRIBE ` SELECT * FROM actor `
2021-01-27 12:09:25,310 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:25,769 INFO sqlalchemy.engine.base.Engine ROLLBACK
2021-01-27 12:09:25,772 INFO sqlalchemy.engine.base.OptionEngine SELECT * FROM actor
2021-01-27 12:09:25,773 INFO sqlalchemy.engine.base.OptionEngine {}
actor_id | first_name | last_name | last_update | |
---|---|---|---|---|
0 | 1 | PENELOPE | GUINESS | 2006-02-15 04:34:33 |
1 | 2 | NICK | WAHLBERG | 2006-02-15 04:34:33 |
2 | 3 | ED | CHASE | 2006-02-15 04:34:33 |
3 | 4 | JENNIFER | DAVIS | 2006-02-15 04:34:33 |
4 | 5 | JOHNNY | LOLLOBRIGIDA | 2006-02-15 04:34:33 |
... | ... | ... | ... | ... |
197 | 198 | MARY | KEITEL | 2006-02-15 04:34:33 |
198 | 199 | JULIA | FAWCETT | 2006-02-15 04:34:33 |
199 | 200 | THORA | TEMPLE | 2006-02-15 04:34:33 |
200 | 201 | Julia | Roberts | 2020-11-10 13:00:42 |
201 | 202 | Julia | Roberts | 2020-11-10 13:09:08 |
202 rows × 4 columns
# Set function as our sql_to_pandas
def sql_to_df(sql_query):
# Use pandas to pass sql query using connection form SQLite3
df = pd.read_sql(sql_query, con)
# Show the resulting DataFrame
return df
# Select multiple columns example
query = ''' SELECT first_name,last_name
FROM customer; '''
# Grab from first two columns
sql_to_df(query).head()
first_name | last_name | |
---|---|---|
0 | MARY | SMITH |
1 | PATRICIA | JOHNSON |
2 | LINDA | WILLIAMS |
3 | BARBARA | JONES |
4 | ELIZABETH | BROWN |
# Select multiple columns example
query = ''' SELECT *
FROM customer; '''
# Grab
sql_to_df(query).head()
In a table, a column may contain duplicate values; and sometimes you only want to list the distinct (unique) values. The DISTINCT keyword can be used to return only distinct (unique) values.
SELECT DISTINCT column_name FROM table_name;
# Select distinct country_ids from the city table.
query = ''' SELECT DISTINCT(country_id)
FROM city'''
sql_to_df(query).head()
country_id | |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
3 | 4 |
4 | 5 |
The WHERE clause is used to filter records, the WHERE clause is used to extract only the records that fulfill the specific parameter.
SELECT column_name FROM table_name WHERE column_name ( math operator) desired_value;
# Select all customer info from the 1st store.
query = ''' SELECT *
FROM customer
WHERE store_id = 1'''
sql_to_df(query).head()
customer_id | store_id | first_name | last_name | address_id | active | create_date | last_update | ||
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | MARY | SMITH | MARY.SMITH@sakilacustomer.org | 5 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
1 | 2 | 1 | PATRICIA | JOHNSON | PATRICIA.JOHNSON@sakilacustomer.org | 6 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
2 | 3 | 1 | LINDA | WILLIAMS | LINDA.WILLIAMS@sakilacustomer.org | 7 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
3 | 5 | 1 | ELIZABETH | BROWN | ELIZABETH.BROWN@sakilacustomer.org | 9 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
4 | 7 | 1 | MARIA | MILLER | MARIA.MILLER@sakilacustomer.org | 11 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
Note, there are a variety of logical operators you can use for a SQL request.
Operator | Description |
---|---|
% | Equal |
<> | Not equal. Note: In some versions of SQL this operator may be written != |
> | Greater than |
< | Less than |
>= | Greater than or equal |
<= | Less than or equal |
SQL requires single quotes around text values, while numeric fields are not enclosed in quotes, for example a text value for the above where statement:
# Select all customer info from Mary.
query = ''' SELECT *
FROM customer
WHERE first_name = 'MARY' '''
sql_to_df(query).head()
customer_id | store_id | first_name | last_name | address_id | active | create_date | last_update | ||
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | MARY | SMITH | MARY.SMITH@sakilacustomer.org | 5 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
The AND operator is used to filter records based on more than one condition.
The AND operator displays a record if both the first condition AND the second condition are true.
# Select all films from 2006 that are rated R.
query = ''' SELECT *
FROM film
WHERE release_year = 2006
AND rating = 'R' '''
sql_to_df(query).head()
film_id | title | description | release_year | language_id | original_language_id | rental_duration | rental_rate | length | replacement_cost | rating | special_features | last_update | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8 | AIRPORT POLLOCK | A Epic Tale of a Moose And a Girl who must Con... | 2006 | 1 | None | 6 | 4.99 | 54 | 15.99 | R | Trailers | 2011-09-14 18:05:33 |
1 | 17 | ALONE TRIP | A Fast-Paced Character Study of a Composer And... | 2006 | 1 | None | 3 | 0.99 | 82 | 14.99 | R | Trailers,Behind the Scenes | 2011-09-14 18:05:33 |
2 | 20 | AMELIE HELLFIGHTERS | A Boring Drama of a Woman And a Squirrel who m... | 2006 | 1 | None | 4 | 4.99 | 79 | 23.99 | R | Commentaries,Deleted Scenes,Behind the Scenes | 2011-09-14 18:05:33 |
3 | 21 | AMERICAN CIRCUS | A Insightful Drama of a Girl And a Astronaut w... | 2006 | 1 | None | 3 | 4.99 | 129 | 17.99 | R | Commentaries,Behind the Scenes | 2011-09-14 18:05:33 |
4 | 23 | ANACONDA CONFESSIONS | A Lacklusture Display of a Dentist And a Denti... | 2006 | 1 | None | 3 | 0.99 | 92 | 9.99 | R | Trailers,Deleted Scenes | 2011-09-14 18:05:33 |
The OR operator displays a record if either the first condition OR the second condition is true.
# Select all films from R or PG.
query = ''' SELECT *
FROM film
WHERE rating = 'PG'
OR rating = 'R' '''
sql_to_df(query).head()
film_id | title | description | release_year | language_id | original_language_id | rental_duration | rental_rate | length | replacement_cost | rating | special_features | last_update | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | ACADEMY DINOSAUR | A Epic Drama of a Feminist And a Mad Scientist... | 2006 | 1 | None | 6 | 0.99 | 86 | 20.99 | PG | Deleted Scenes,Behind the Scenes | 2011-09-14 18:05:32 |
1 | 6 | AGENT TRUMAN | A Intrepid Panorama of a Robot And a Boy who m... | 2006 | 1 | None | 3 | 2.99 | 169 | 17.99 | PG | Deleted Scenes | 2011-09-14 18:05:33 |
2 | 8 | AIRPORT POLLOCK | A Epic Tale of a Moose And a Girl who must Con... | 2006 | 1 | None | 6 | 4.99 | 54 | 15.99 | R | Trailers | 2011-09-14 18:05:33 |
3 | 12 | ALASKA PHANTOM | A Fanciful Saga of a Hunter And a Pastry Chef ... | 2006 | 1 | None | 6 | 0.99 | 136 | 22.99 | PG | Commentaries,Deleted Scenes | 2011-09-14 18:05:33 |
4 | 13 | ALI FOREVER | A Action-Packed Drama of a Dentist And a Croco... | 2006 | 1 | None | 4 | 4.99 | 150 | 21.99 | PG | Deleted Scenes,Behind the Scenes | 2011-09-14 18:05:33 |
# Count the number of customers
query = ''' SELECT COUNT(customer_id)
FROM customer; '''
# Grab
sql_to_df(query).head()
COUNT(customer_id) | |
---|---|
0 | 599 |
Go ahead and experiment with the other aggregate functions. The usual syntax is:
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name
A wildcard character can be used to substitute for any other characters in a string. In SQL, wildcard characters are used with the SQL LIKE operator. The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
There are several wildcard operators:
Wildcard | Description |
---|---|
% | A substitute for zero or more characters |
_ | A substitute for a single character |
[character_list] | Sets and ranges of characters to match |
Let’s see them in examples now!
# Select any customers whose name start with an M
query = ''' SELECT *
FROM customer
WHERE first_name LIKE 'M%' ; '''
# Grab
sql_to_df(query).head()
customer_id | store_id | first_name | last_name | address_id | active | create_date | last_update | ||
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | MARY | SMITH | MARY.SMITH@sakilacustomer.org | 5 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
1 | 7 | 1 | MARIA | MILLER | MARIA.MILLER@sakilacustomer.org | 11 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
2 | 9 | 2 | MARGARET | MOORE | MARGARET.MOORE@sakilacustomer.org | 13 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
3 | 21 | 1 | MICHELLE | CLARK | MICHELLE.CLARK@sakilacustomer.org | 25 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
4 | 30 | 1 | MELISSA | KING | MELISSA.KING@sakilacustomer.org | 34 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:29 |
# Select any customers whose last name ends with ing
query = ''' SELECT *
FROM customer
WHERE last_name LIKE '_ING' ; '''
# Grab
sql_to_df(query).head()
customer_id | store_id | first_name | last_name | address_id | active | create_date | last_update | ||
---|---|---|---|---|---|---|---|---|---|
0 | 30 | 1 | MELISSA | KING | MELISSA.KING@sakilacustomer.org | 34 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:29 |
Now we will move on to the [Character_list] wildcard.
# Select any customers whose first name begins with an A or a B
query = ''' SELECT *
FROM customer
WHERE first_name GLOB '[AB]*' ; '''
# Grab
sql_to_df(query).head()
customer_id | store_id | first_name | last_name | address_id | active | create_date | last_update | ||
---|---|---|---|---|---|---|---|---|---|
0 | 4 | 2 | BARBARA | JONES | BARBARA.JONES@sakilacustomer.org | 8 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
1 | 14 | 2 | BETTY | WHITE | BETTY.WHITE@sakilacustomer.org | 18 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:28 |
2 | 29 | 2 | ANGELA | HERNANDEZ | ANGELA.HERNANDEZ@sakilacustomer.org | 33 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:29 |
3 | 31 | 2 | BRENDA | WRIGHT | BRENDA.WRIGHT@sakilacustomer.org | 35 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:29 |
4 | 32 | 1 | AMY | LOPEZ | AMY.LOPEZ@sakilacustomer.org | 36 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:29 |
The ORDER BY keyword is used to sort the result-set by one or more columns. The ORDER BY keyword sorts the records in ascending order by default. To sort the records in a descending order, you can use the DESC keyword. The syntax is:
SELECT column_name
FROM table_name
ORDER BY column_name ASC|DESC
# Select all customers and order results by last name
query = ''' SELECT *
FROM customer
ORDER BY last_name ; '''
# Grab
sql_to_df(query).head()
customer_id | store_id | first_name | last_name | address_id | active | create_date | last_update | ||
---|---|---|---|---|---|---|---|---|---|
0 | 505 | 1 | RAFAEL | ABNEY | RAFAEL.ABNEY@sakilacustomer.org | 510 | 1 | 2006-02-14 22:04:37.000 | 2011-09-14 18:10:42 |
1 | 504 | 1 | NATHANIEL | ADAM | NATHANIEL.ADAM@sakilacustomer.org | 509 | 1 | 2006-02-14 22:04:37.000 | 2011-09-14 18:10:42 |
2 | 36 | 2 | KATHLEEN | ADAMS | KATHLEEN.ADAMS@sakilacustomer.org | 40 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:29 |
3 | 96 | 1 | DIANA | ALEXANDER | DIANA.ALEXANDER@sakilacustomer.org | 100 | 1 | 2006-02-14 22:04:36.000 | 2011-09-14 18:10:30 |
4 | 470 | 1 | GORDON | ALLARD | GORDON.ALLARD@sakilacustomer.org | 475 | 1 | 2006-02-14 22:04:37.000 | 2011-09-14 18:10:41 |
# Select all customers and order results by last name, DESCENDING
query = ''' SELECT *
FROM customer
ORDER BY last_name DESC; '''
# Grab
sql_to_df(query).head()
The GROUP BY statement is used with the aggregate functions to group the results by one or more columns. The syntax is:
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name;
Let’s see how it works.
# Count the number of customers per store
query = ''' SELECT store_id , COUNT(customer_id)
FROM customer
GROUP BY store_id; '''
# Grab
sql_to_df(query).head()
store_id | COUNT(customer_id) | |
---|---|---|
0 | 1 | 326 |
1 | 2 | 273 |
https://www.w3schools.com/python/python_mysql_getstarted.asp
https://python.doctor/page-database-data-base-donnees-query-sql-mysql-postgre-sqlite
https://realpython.com/python-sqlite-sqlalchemy/
https://pypi.org/project/SQLAlchemy/
https://www.pythoncentral.io/introductory-tutorial-python-sqlalchemy/