SQL Databases using python

we’ll use for this SQL introduction notebook the Sakila Database and we’ll setting up SQL tools in your Python Environment.

Note There are many ways to browse through an SQL database, the main goal here is to teach you how to use SQL queries to grab information and set it as a pandas DataFrame.

Step 1: Choose an SQL Client (SQLite Broswer/ MySQL Workbench..)

SQLite Browser is lightweight and free to use. There are many other alternatives, check out a list of 10 free ones here

Download SQLite Browser here: http://sqlitebrowser.org/

Step 2: Download the sakila Database

You can download the fully constructed Sakila database (the .bd file) here

Or - you can download the .sql file to construct the database yourself: http://dev.mysql.com/doc/index-other.html Then use SQLite Browser to construct the database by running the .sql file

the full description of the database tables is here

Either way, make sure to save it in the same directory as your iPython notebooks, or remember the file path for later so you can tell pandas exactly where to look for it..

RQ:

Check out the database either by opening it up using your SQL Client or by checking out the MCD diagram at this link: Diagram to have an overview of what the database looks like..

# pip install mysql-connector-python
import mysql.connector as mysqlConnector

la méthode du connecteur pour nous connecter à l’instance. Cette méthode prend plusieurs arguments.connect()

hôte – C’est le nom de l’hôte sur lequel le serveur est en cours d’exécution. Puisque nous courons sur localhost, nous utiliserons localhost comme valeur.
utilisateur – C’est le nom de l’utilisateur lors de la configuration du serveur MySQL. Dans notre cas, nous avons utilisé la racine de nom.
mot de passe – Mot de passe qui est défini lors de la configuration du serveur MySQL. Dans notre cas, nous avons utilisé la racine comme mot de passe.
bases de données – Il prend le nom de la base de données que nous souhaitons connecter.

conn = mysqlConnector.connect(host='Localhost',user='root',passwd='root', database='sakila')
if conn:
    print("Connection Successful!")
else:
    print("Connection Failed!")
cur = conn.cursor()
#cur.execute("SHOW DATABASES")
cur.execute("select * from actor limit 2")
for row in cur:
    print(row)

Connection Successful!
(1, 'PENELOPE', 'GUINESS', datetime.datetime(2006, 2, 15, 4, 34, 33))
(2, 'NICK', 'WAHLBERG', datetime.datetime(2006, 2, 15, 4, 34, 33))

**Note** SQLite3 comes with Python, which provides a lightweight disk-based database that doesn't require a seperate server process. It's useful to prototype with SQLite and then mgrate the code to a larger database system, like MySQL. Python comes with a pretty awesome module to connect to a SQL database with SQLite. The module is SQLite3,

# import Modules
import sqlite3
import pandas as pd

To use the module, you must first create a Connection object that represents the database. If the database name already exists SQLite3 will automatically connect to it, if it does not exist, SQLite3 will automatically create it.

RQ: You can also supply the special name :memory: to create a database in RAM.

all syntax details is here

#other alternativeis to load the ipython-sql extension as follows
#%load_ext sql
#%sql sqlite://

Connect to the Sakila database(downloaded from links up)

# Connect to the Sakila database(downloaded from links up)
con = sqlite3.connect("sakila.db")
if con:
    print("Connection Successful!")
else:
    print("Connection Failed!")
cur = con.cursor()
#cur.execute("SHOW DATABASES")
cur.execute("select * from actor limit 2")
for row in cur:
    print(row)

Connection Successful!
(1, 'PENELOPE', 'GUINESS', '2011-09-14 18:05:28')
(2, 'NICK', 'WAHLBERG', '2011-09-14 18:05:28')

# Après avoir terminé le travail, Il est important de fermer la connexion que nous avons faite à la fin en utilisant la méthode. close()

conn.close()

from sqlalchemy import create_engine

con1 = create_engine("sqlite:///sakila.db",
                        echo=True)

con1.connect()

print(con1)

2021-01-27 12:09:19,623 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2021-01-27 12:09:19,624 INFO sqlalchemy.engine.base.Engine ()
2021-01-27 12:09:19,626 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2021-01-27 12:09:19,627 INFO sqlalchemy.engine.base.Engine ()
Engine(sqlite:///sakila.db)

from sqlalchemy import create_engine

#engine = create_engine("mysql///?User=root&Password=root&Database=sakila&Server=myServer&Port=3306")
conn = create_engine("mysql+mysqlconnector://root:root@localhost:3306/sakila",
                       echo=True)
conn.connect()

print(conn)

2021-01-27 12:09:20,701 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
2021-01-27 12:09:20,702 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,725 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2021-01-27 12:09:20,727 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,738 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
2021-01-27 12:09:20,739 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,746 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
2021-01-27 12:09:20,748 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,751 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
2021-01-27 12:09:20,752 INFO sqlalchemy.engine.base.Engine {}
Engine(mysql+mysqlconnector://root:***@localhost:3306/sakila)

Now we can run a basic SQL query, pass it with pandas, and display the output as a DataFrame!

SQL SELECT Statement

The SELECT statement is used to select data from a database. The result is then stored in a result table, called usually the result-set. ##### Syntax for SQL SELECT SELECT column_name FROM table_name ***We could also select multiple columns:*** SELECT column_name1,column_name2
FROM table_name ***Or we could select everything in a table using*** SELECT * FROM table_name

# Set SQL query as a comment
sql_query = ''' SELECT * FROM actor '''

# Use pandas to pass sql query using connection form SQLite3
df = pd.read_sql(sql_query, conn)

# Show the resulting DataFrame
df

2021-01-27 12:09:25,309 INFO sqlalchemy.engine.base.Engine DESCRIBE ` SELECT * FROM actor `
2021-01-27 12:09:25,310 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:25,769 INFO sqlalchemy.engine.base.Engine ROLLBACK
2021-01-27 12:09:25,772 INFO sqlalchemy.engine.base.OptionEngine  SELECT * FROM actor 
2021-01-27 12:09:25,773 INFO sqlalchemy.engine.base.OptionEngine {}

	actor_id	first_name	last_name	last_update
0	1	PENELOPE	GUINESS	2006-02-15 04:34:33
1	2	NICK	WAHLBERG	2006-02-15 04:34:33
2	3	ED	CHASE	2006-02-15 04:34:33
3	4	JENNIFER	DAVIS	2006-02-15 04:34:33
4	5	JOHNNY	LOLLOBRIGIDA	2006-02-15 04:34:33
...	...	...	...	...
197	198	MARY	KEITEL	2006-02-15 04:34:33
198	199	JULIA	FAWCETT	2006-02-15 04:34:33
199	200	THORA	TEMPLE	2006-02-15 04:34:33
200	201	Julia	Roberts	2020-11-10 13:00:42
201	202	Julia	Roberts	2020-11-10 13:09:08

202 rows × 4 columns

# Set function as our sql_to_pandas

def sql_to_df(sql_query):

    # Use pandas to pass sql query using connection form SQLite3
    df = pd.read_sql(sql_query, con)

    # Show the resulting DataFrame
    return df

Selecting Multiple Columns

# Select multiple columns example
query = ''' SELECT first_name,last_name
            FROM customer; '''

# Grab from first two columns
sql_to_df(query).head()

	first_name	last_name
0	MARY	SMITH
1	PATRICIA	JOHNSON
2	LINDA	WILLIAMS
3	BARBARA	JONES
4	ELIZABETH	BROWN

Selecting Everything from table with *

# Select multiple columns example
query = ''' SELECT *
            FROM customer; '''

# Grab 
sql_to_df(query).head()

Syntax for the SQL DISTINCT Statement

In a table, a column may contain duplicate values; and sometimes you only want to list the distinct (unique) values. The DISTINCT keyword can be used to return only distinct (unique) values.

SELECT DISTINCT column_name FROM table_name;

# Select distinct country_ids from the city table.
query = ''' SELECT DISTINCT(country_id)
            FROM city'''

sql_to_df(query).head()

	country_id
0	1
1	2
2	3
3	4
4	5

Syntax for the SQL WHERE

The WHERE clause is used to filter records, the WHERE clause is used to extract only the records that fulfill the specific parameter.

SELECT column_name FROM table_name WHERE column_name ( math operator) desired_value;

# Select all customer info from the 1st store.
query = ''' SELECT *
            FROM customer
            WHERE store_id = 1'''

sql_to_df(query).head()

	customer_id	store_id	first_name	last_name	email	address_id	active	create_date	last_update
0	1	1	MARY	SMITH	MARY.SMITH@sakilacustomer.org	5	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28
1	2	1	PATRICIA	JOHNSON	PATRICIA.JOHNSON@sakilacustomer.org	6	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28
2	3	1	LINDA	WILLIAMS	LINDA.WILLIAMS@sakilacustomer.org	7	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28
3	5	1	ELIZABETH	BROWN	ELIZABETH.BROWN@sakilacustomer.org	9	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28
4	7	1	MARIA	MILLER	MARIA.MILLER@sakilacustomer.org	11	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28

Note, there are a variety of logical operators you can use for a SQL request.

Operator	Description
%	Equal
<>	Not equal. Note: In some versions of SQL this operator may be written !=
>	Greater than
<	Less than
>=	Greater than or equal
<=	Less than or equal

SQL requires single quotes around text values, while numeric fields are not enclosed in quotes, for example a text value for the above where statement:

# Select all customer info from Mary.
query = ''' SELECT *
            FROM customer
            WHERE first_name = 'MARY'  '''

sql_to_df(query).head()

	customer_id	store_id	first_name	last_name	email	address_id	active	create_date	last_update
0	1	1	MARY	SMITH	MARY.SMITH@sakilacustomer.org	5	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28

Syntax for AND

The AND operator is used to filter records based on more than one condition.

The AND operator displays a record if both the first condition AND the second condition are true.

# Select all films from 2006 that are rated R.

query = ''' SELECT *
            FROM film
            WHERE release_year = 2006
            AND rating = 'R' '''

sql_to_df(query).head()

	film_id	title	description	release_year	language_id	original_language_id	rental_duration	rental_rate	length	replacement_cost	rating	special_features	last_update
0	8	AIRPORT POLLOCK	A Epic Tale of a Moose And a Girl who must Con...	2006	1	None	6	4.99	54	15.99	R	Trailers	2011-09-14 18:05:33
1	17	ALONE TRIP	A Fast-Paced Character Study of a Composer And...	2006	1	None	3	0.99	82	14.99	R	Trailers,Behind the Scenes	2011-09-14 18:05:33
2	20	AMELIE HELLFIGHTERS	A Boring Drama of a Woman And a Squirrel who m...	2006	1	None	4	4.99	79	23.99	R	Commentaries,Deleted Scenes,Behind the Scenes	2011-09-14 18:05:33
3	21	AMERICAN CIRCUS	A Insightful Drama of a Girl And a Astronaut w...	2006	1	None	3	4.99	129	17.99	R	Commentaries,Behind the Scenes	2011-09-14 18:05:33
4	23	ANACONDA CONFESSIONS	A Lacklusture Display of a Dentist And a Denti...	2006	1	None	3	0.99	92	9.99	R	Trailers,Deleted Scenes	2011-09-14 18:05:33

Syntax for OR

The OR operator displays a record if either the first condition OR the second condition is true.

# Select all films from R or PG.

query = ''' SELECT *
            FROM film
            WHERE rating = 'PG'
            OR rating = 'R' '''

sql_to_df(query).head()

	film_id	title	description	release_year	language_id	original_language_id	rental_duration	rental_rate	length	replacement_cost	rating	special_features	last_update
0	1	ACADEMY DINOSAUR	A Epic Drama of a Feminist And a Mad Scientist...	2006	1	None	6	0.99	86	20.99	PG	Deleted Scenes,Behind the Scenes	2011-09-14 18:05:32
1	6	AGENT TRUMAN	A Intrepid Panorama of a Robot And a Boy who m...	2006	1	None	3	2.99	169	17.99	PG	Deleted Scenes	2011-09-14 18:05:33
2	8	AIRPORT POLLOCK	A Epic Tale of a Moose And a Girl who must Con...	2006	1	None	6	4.99	54	15.99	R	Trailers	2011-09-14 18:05:33
3	12	ALASKA PHANTOM	A Fanciful Saga of a Hunter And a Pastry Chef ...	2006	1	None	6	0.99	136	22.99	PG	Commentaries,Deleted Scenes	2011-09-14 18:05:33
4	13	ALI FOREVER	A Action-Packed Drama of a Dentist And a Croco...	2006	1	None	4	4.99	150	21.99	PG	Deleted Scenes,Behind the Scenes	2011-09-14 18:05:33

ORDER BY and GROUP BY statements.

Before we begin with Wildcards, ORDER BY, and GROUP BY. Let's take a look at aggregate functions. * AVG() - Returns the average value. * COUNT() - Returns the number of rows. * FIRST() - Returns the first value. * LAST() - Returns the last value. * MAX() - Returns the largest value. * MIN() - Returns the smallest value. * SUM() - Returns the sum. You can call any of these aggregate functions on a column to get the resulting values back..

# Count the number of customers
query = ''' SELECT COUNT(customer_id)
            FROM customer; '''

# Grab 
sql_to_df(query).head()

	COUNT(customer_id)
0	599

Go ahead and experiment with the other aggregate functions. The usual syntax is:

SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name

SQL Wildcards

A wildcard character can be used to substitute for any other characters in a string. In SQL, wildcard characters are used with the SQL LIKE operator. The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

There are several wildcard operators:

Wildcard	Description
%	A substitute for zero or more characters
_	A substitute for a single character
[character_list]	Sets and ranges of characters to match

Let’s see them in examples now!

The % wildcard

# Select any customers whose name start with an M
query = ''' SELECT *
            FROM customer
            WHERE first_name LIKE 'M%' ; '''

# Grab 
sql_to_df(query).head()

	customer_id	store_id	first_name	last_name	email	address_id	active	create_date	last_update
0	1	1	MARY	SMITH	MARY.SMITH@sakilacustomer.org	5	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28
1	7	1	MARIA	MILLER	MARIA.MILLER@sakilacustomer.org	11	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28
2	9	2	MARGARET	MOORE	MARGARET.MOORE@sakilacustomer.org	13	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28
3	21	1	MICHELLE	CLARK	MICHELLE.CLARK@sakilacustomer.org	25	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28
4	30	1	MELISSA	KING	MELISSA.KING@sakilacustomer.org	34	1	2006-02-14 22:04:36.000	2011-09-14 18:10:29

The _ wildcard

# Select any customers whose last name ends with ing
query = ''' SELECT *
            FROM customer
            WHERE last_name LIKE '_ING' ; '''

# Grab 
sql_to_df(query).head()

	customer_id	store_id	first_name	last_name	email	address_id	active	create_date	last_update
0	30	1	MELISSA	KING	MELISSA.KING@sakilacustomer.org	34	1	2006-02-14 22:04:36.000	2011-09-14 18:10:29

Now we will move on to the [Character_list] wildcard.

IMPORTANT NOTE!

Using [charlist] with SQLite is a little different than with other SQL formats, such as MySQL. * In MySQL you would use: WHERE value LIKE '[charlist]%' * In SQLite you use: WHERE value GLOB '[charlist]*'

The [character_list] wildcard

# Select any customers whose first name begins with an A or a B
query = ''' SELECT *
            FROM customer
            WHERE first_name GLOB '[AB]*' ; '''

# Grab 
sql_to_df(query).head()

	customer_id	store_id	first_name	last_name	email	address_id	active	create_date	last_update
0	4	2	BARBARA	JONES	BARBARA.JONES@sakilacustomer.org	8	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28
1	14	2	BETTY	WHITE	BETTY.WHITE@sakilacustomer.org	18	1	2006-02-14 22:04:36.000	2011-09-14 18:10:28
2	29	2	ANGELA	HERNANDEZ	ANGELA.HERNANDEZ@sakilacustomer.org	33	1	2006-02-14 22:04:36.000	2011-09-14 18:10:29
3	31	2	BRENDA	WRIGHT	BRENDA.WRIGHT@sakilacustomer.org	35	1	2006-02-14 22:04:36.000	2011-09-14 18:10:29
4	32	1	AMY	LOPEZ	AMY.LOPEZ@sakilacustomer.org	36	1	2006-02-14 22:04:36.000	2011-09-14 18:10:29

SQL ORDER BY

The ORDER BY keyword is used to sort the result-set by one or more columns. The ORDER BY keyword sorts the records in ascending order by default. To sort the records in a descending order, you can use the DESC keyword. The syntax is:

SELECT column_name
FROM table_name
ORDER BY column_name ASC|DESC

# Select all customers and order results by last name
query = ''' SELECT *
            FROM customer
            ORDER BY last_name ; '''

# Grab 
sql_to_df(query).head()

	customer_id	store_id	first_name	last_name	email	address_id	active	create_date	last_update
0	505	1	RAFAEL	ABNEY	RAFAEL.ABNEY@sakilacustomer.org	510	1	2006-02-14 22:04:37.000	2011-09-14 18:10:42
1	504	1	NATHANIEL	ADAM	NATHANIEL.ADAM@sakilacustomer.org	509	1	2006-02-14 22:04:37.000	2011-09-14 18:10:42
2	36	2	KATHLEEN	ADAMS	KATHLEEN.ADAMS@sakilacustomer.org	40	1	2006-02-14 22:04:36.000	2011-09-14 18:10:29
3	96	1	DIANA	ALEXANDER	DIANA.ALEXANDER@sakilacustomer.org	100	1	2006-02-14 22:04:36.000	2011-09-14 18:10:30
4	470	1	GORDON	ALLARD	GORDON.ALLARD@sakilacustomer.org	475	1	2006-02-14 22:04:37.000	2011-09-14 18:10:41

# Select all customers and order results by last name, DESCENDING
query = ''' SELECT *
            FROM customer
            ORDER BY last_name DESC; '''

# Grab 
sql_to_df(query).head()

SQL GROUP BY

The GROUP BY statement is used with the aggregate functions to group the results by one or more columns. The syntax is:

SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name;

Let’s see how it works.

# Count the number of customers per store

query = ''' SELECT store_id , COUNT(customer_id)
            FROM customer
            GROUP BY store_id; '''

# Grab 
sql_to_df(query).head()

	store_id	COUNT(customer_id)
0	1	326
1	2	273

Some Useful Ressources:

https://www.w3schools.com/python/python_mysql_getstarted.asp

https://python.doctor/page-database-data-base-donnees-query-sql-mysql-postgre-sqlite

https://realpython.com/python-sqlite-sqlalchemy/

https://pypi.org/project/SQLAlchemy/

https://www.pythoncentral.io/introductory-tutorial-python-sqlalchemy/