Nanterre p10 - Dev Data

Logo

semaine s14

semaine s15

semaine courante (s17)

planning des veilles

SQL Databases using python

we’ll use for this SQL introduction notebook the Sakila Database and we’ll setting up SQL tools in your Python Environment.

Note There are many ways to browse through an SQL database, the main goal here is to teach you how to use SQL queries to grab information and set it as a pandas DataFrame.

Step 1: Choose an SQL Client (SQLite Broswer/ MySQL Workbench..)

SQLite Browser is lightweight and free to use. There are many other alternatives, check out a list of 10 free ones here

Download SQLite Browser here: http://sqlitebrowser.org/

Step 2: Download the sakila Database

You can download the fully constructed Sakila database (the .bd file) here

Or - you can download the .sql file to construct the database yourself: http://dev.mysql.com/doc/index-other.html Then use SQLite Browser to construct the database by running the .sql file

the full description of the database tables is here

Either way, make sure to save it in the same directory as your iPython notebooks, or remember the file path for later so you can tell pandas exactly where to look for it..

RQ:

Check out the database either by opening it up using your SQL Client or by checking out the MCD diagram at this link: Diagram to have an overview of what the database looks like..

# pip install mysql-connector-python
import mysql.connector as mysqlConnector

la méthode du connecteur pour nous connecter à l’instance. Cette méthode prend plusieurs arguments.connect()

conn = mysqlConnector.connect(host='Localhost',user='root',passwd='root', database='sakila')
if conn:
    print("Connection Successful!")
else:
    print("Connection Failed!")
cur = conn.cursor()
#cur.execute("SHOW DATABASES")
cur.execute("select * from actor limit 2")
for row in cur:
    print(row)
Connection Successful!
(1, 'PENELOPE', 'GUINESS', datetime.datetime(2006, 2, 15, 4, 34, 33))
(2, 'NICK', 'WAHLBERG', datetime.datetime(2006, 2, 15, 4, 34, 33))
**Note** SQLite3 comes with Python, which provides a lightweight disk-based database that doesn't require a seperate server process. It's useful to prototype with SQLite and then mgrate the code to a larger database system, like MySQL. Python comes with a pretty awesome module to connect to a SQL database with SQLite. The module is SQLite3,
# import Modules
import sqlite3
import pandas as pd

To use the module, you must first create a Connection object that represents the database. If the database name already exists SQLite3 will automatically connect to it, if it does not exist, SQLite3 will automatically create it.

RQ: You can also supply the special name :memory: to create a database in RAM.

all syntax details is here

#other alternativeis to load the ipython-sql extension as follows
#%load_ext sql
#%sql sqlite://

Connect to the Sakila database(downloaded from links up)

# Connect to the Sakila database(downloaded from links up)
con = sqlite3.connect("sakila.db")
if con:
    print("Connection Successful!")
else:
    print("Connection Failed!")
cur = con.cursor()
#cur.execute("SHOW DATABASES")
cur.execute("select * from actor limit 2")
for row in cur:
    print(row)
Connection Successful!
(1, 'PENELOPE', 'GUINESS', '2011-09-14 18:05:28')
(2, 'NICK', 'WAHLBERG', '2011-09-14 18:05:28')
# Après avoir terminé le travail, Il est important de fermer la connexion que nous avons faite à la fin en utilisant la méthode. close()

conn.close()
from sqlalchemy import create_engine

con1 = create_engine("sqlite:///sakila.db",
                        echo=True)

con1.connect()

print(con1)
2021-01-27 12:09:19,623 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2021-01-27 12:09:19,624 INFO sqlalchemy.engine.base.Engine ()
2021-01-27 12:09:19,626 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2021-01-27 12:09:19,627 INFO sqlalchemy.engine.base.Engine ()
Engine(sqlite:///sakila.db)
from sqlalchemy import create_engine

#engine = create_engine("mysql///?User=root&Password=root&Database=sakila&Server=myServer&Port=3306")
conn = create_engine("mysql+mysqlconnector://root:root@localhost:3306/sakila",
                       echo=True)
conn.connect()

print(conn)
2021-01-27 12:09:20,701 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
2021-01-27 12:09:20,702 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,725 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2021-01-27 12:09:20,727 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,738 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
2021-01-27 12:09:20,739 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,746 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
2021-01-27 12:09:20,748 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:20,751 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
2021-01-27 12:09:20,752 INFO sqlalchemy.engine.base.Engine {}
Engine(mysql+mysqlconnector://root:***@localhost:3306/sakila)

Now we can run a basic SQL query, pass it with pandas, and display the output as a DataFrame!

SQL SELECT Statement

The SELECT statement is used to select data from a database. The result is then stored in a result table, called usually the result-set. ##### Syntax for SQL SELECT SELECT column_name FROM table_name ***We could also select multiple columns:*** SELECT column_name1,column_name2
FROM table_name ***Or we could select everything in a table using*** SELECT * FROM table_name
# Set SQL query as a comment
sql_query = ''' SELECT * FROM actor '''

# Use pandas to pass sql query using connection form SQLite3
df = pd.read_sql(sql_query, conn)

# Show the resulting DataFrame
df
2021-01-27 12:09:25,309 INFO sqlalchemy.engine.base.Engine DESCRIBE ` SELECT * FROM actor `
2021-01-27 12:09:25,310 INFO sqlalchemy.engine.base.Engine {}
2021-01-27 12:09:25,769 INFO sqlalchemy.engine.base.Engine ROLLBACK
2021-01-27 12:09:25,772 INFO sqlalchemy.engine.base.OptionEngine  SELECT * FROM actor 
2021-01-27 12:09:25,773 INFO sqlalchemy.engine.base.OptionEngine {}
actor_id first_name last_name last_update
0 1 PENELOPE GUINESS 2006-02-15 04:34:33
1 2 NICK WAHLBERG 2006-02-15 04:34:33
2 3 ED CHASE 2006-02-15 04:34:33
3 4 JENNIFER DAVIS 2006-02-15 04:34:33
4 5 JOHNNY LOLLOBRIGIDA 2006-02-15 04:34:33
... ... ... ... ...
197 198 MARY KEITEL 2006-02-15 04:34:33
198 199 JULIA FAWCETT 2006-02-15 04:34:33
199 200 THORA TEMPLE 2006-02-15 04:34:33
200 201 Julia Roberts 2020-11-10 13:00:42
201 202 Julia Roberts 2020-11-10 13:09:08

202 rows × 4 columns

# Set function as our sql_to_pandas

def sql_to_df(sql_query):

    # Use pandas to pass sql query using connection form SQLite3
    df = pd.read_sql(sql_query, con)

    # Show the resulting DataFrame
    return df

Selecting Multiple Columns

# Select multiple columns example
query = ''' SELECT first_name,last_name
            FROM customer; '''

# Grab from first two columns
sql_to_df(query).head()
first_name last_name
0 MARY SMITH
1 PATRICIA JOHNSON
2 LINDA WILLIAMS
3 BARBARA JONES
4 ELIZABETH BROWN

Selecting Everything from table with *

# Select multiple columns example
query = ''' SELECT *
            FROM customer; '''

# Grab 
sql_to_df(query).head()

Syntax for the SQL DISTINCT Statement

In a table, a column may contain duplicate values; and sometimes you only want to list the distinct (unique) values. The DISTINCT keyword can be used to return only distinct (unique) values.

SELECT DISTINCT column_name FROM table_name;

# Select distinct country_ids from the city table.
query = ''' SELECT DISTINCT(country_id)
            FROM city'''

sql_to_df(query).head()

country_id
0 1
1 2
2 3
3 4
4 5

Syntax for the SQL WHERE

The WHERE clause is used to filter records, the WHERE clause is used to extract only the records that fulfill the specific parameter.

SELECT column_name FROM table_name WHERE column_name ( math operator) desired_value;

# Select all customer info from the 1st store.
query = ''' SELECT *
            FROM customer
            WHERE store_id = 1'''

sql_to_df(query).head()
customer_id store_id first_name last_name email address_id active create_date last_update
0 1 1 MARY SMITH MARY.SMITH@sakilacustomer.org 5 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
1 2 1 PATRICIA JOHNSON PATRICIA.JOHNSON@sakilacustomer.org 6 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
2 3 1 LINDA WILLIAMS LINDA.WILLIAMS@sakilacustomer.org 7 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
3 5 1 ELIZABETH BROWN ELIZABETH.BROWN@sakilacustomer.org 9 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
4 7 1 MARIA MILLER MARIA.MILLER@sakilacustomer.org 11 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28

Note, there are a variety of logical operators you can use for a SQL request.

Operator Description
% Equal
<> Not equal. Note: In some versions of SQL this operator may be written !=
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal

SQL requires single quotes around text values, while numeric fields are not enclosed in quotes, for example a text value for the above where statement:

# Select all customer info from Mary.
query = ''' SELECT *
            FROM customer
            WHERE first_name = 'MARY'  '''

sql_to_df(query).head()
customer_id store_id first_name last_name email address_id active create_date last_update
0 1 1 MARY SMITH MARY.SMITH@sakilacustomer.org 5 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28

Syntax for AND

The AND operator is used to filter records based on more than one condition.

The AND operator displays a record if both the first condition AND the second condition are true.

# Select all films from 2006 that are rated R.

query = ''' SELECT *
            FROM film
            WHERE release_year = 2006
            AND rating = 'R' '''

sql_to_df(query).head()
film_id title description release_year language_id original_language_id rental_duration rental_rate length replacement_cost rating special_features last_update
0 8 AIRPORT POLLOCK A Epic Tale of a Moose And a Girl who must Con... 2006 1 None 6 4.99 54 15.99 R Trailers 2011-09-14 18:05:33
1 17 ALONE TRIP A Fast-Paced Character Study of a Composer And... 2006 1 None 3 0.99 82 14.99 R Trailers,Behind the Scenes 2011-09-14 18:05:33
2 20 AMELIE HELLFIGHTERS A Boring Drama of a Woman And a Squirrel who m... 2006 1 None 4 4.99 79 23.99 R Commentaries,Deleted Scenes,Behind the Scenes 2011-09-14 18:05:33
3 21 AMERICAN CIRCUS A Insightful Drama of a Girl And a Astronaut w... 2006 1 None 3 4.99 129 17.99 R Commentaries,Behind the Scenes 2011-09-14 18:05:33
4 23 ANACONDA CONFESSIONS A Lacklusture Display of a Dentist And a Denti... 2006 1 None 3 0.99 92 9.99 R Trailers,Deleted Scenes 2011-09-14 18:05:33

Syntax for OR

The OR operator displays a record if either the first condition OR the second condition is true.

# Select all films from R or PG.

query = ''' SELECT *
            FROM film
            WHERE rating = 'PG'
            OR rating = 'R' '''

sql_to_df(query).head()
film_id title description release_year language_id original_language_id rental_duration rental_rate length replacement_cost rating special_features last_update
0 1 ACADEMY DINOSAUR A Epic Drama of a Feminist And a Mad Scientist... 2006 1 None 6 0.99 86 20.99 PG Deleted Scenes,Behind the Scenes 2011-09-14 18:05:32
1 6 AGENT TRUMAN A Intrepid Panorama of a Robot And a Boy who m... 2006 1 None 3 2.99 169 17.99 PG Deleted Scenes 2011-09-14 18:05:33
2 8 AIRPORT POLLOCK A Epic Tale of a Moose And a Girl who must Con... 2006 1 None 6 4.99 54 15.99 R Trailers 2011-09-14 18:05:33
3 12 ALASKA PHANTOM A Fanciful Saga of a Hunter And a Pastry Chef ... 2006 1 None 6 0.99 136 22.99 PG Commentaries,Deleted Scenes 2011-09-14 18:05:33
4 13 ALI FOREVER A Action-Packed Drama of a Dentist And a Croco... 2006 1 None 4 4.99 150 21.99 PG Deleted Scenes,Behind the Scenes 2011-09-14 18:05:33

ORDER BY and GROUP BY statements.

Before we begin with Wildcards, ORDER BY, and GROUP BY. Let's take a look at aggregate functions. * AVG() - Returns the average value. * COUNT() - Returns the number of rows. * FIRST() - Returns the first value. * LAST() - Returns the last value. * MAX() - Returns the largest value. * MIN() - Returns the smallest value. * SUM() - Returns the sum. You can call any of these aggregate functions on a column to get the resulting values back..
# Count the number of customers
query = ''' SELECT COUNT(customer_id)
            FROM customer; '''

# Grab 
sql_to_df(query).head()
COUNT(customer_id)
0 599

Go ahead and experiment with the other aggregate functions. The usual syntax is:

SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name

SQL Wildcards

A wildcard character can be used to substitute for any other characters in a string. In SQL, wildcard characters are used with the SQL LIKE operator. The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

There are several wildcard operators:

Wildcard Description
% A substitute for zero or more characters
_ A substitute for a single character
[character_list] Sets and ranges of characters to match

Let’s see them in examples now!

The % wildcard

# Select any customers whose name start with an M
query = ''' SELECT *
            FROM customer
            WHERE first_name LIKE 'M%' ; '''

# Grab 
sql_to_df(query).head()
customer_id store_id first_name last_name email address_id active create_date last_update
0 1 1 MARY SMITH MARY.SMITH@sakilacustomer.org 5 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
1 7 1 MARIA MILLER MARIA.MILLER@sakilacustomer.org 11 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
2 9 2 MARGARET MOORE MARGARET.MOORE@sakilacustomer.org 13 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
3 21 1 MICHELLE CLARK MICHELLE.CLARK@sakilacustomer.org 25 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
4 30 1 MELISSA KING MELISSA.KING@sakilacustomer.org 34 1 2006-02-14 22:04:36.000 2011-09-14 18:10:29

The _ wildcard

# Select any customers whose last name ends with ing
query = ''' SELECT *
            FROM customer
            WHERE last_name LIKE '_ING' ; '''

# Grab 
sql_to_df(query).head()
customer_id store_id first_name last_name email address_id active create_date last_update
0 30 1 MELISSA KING MELISSA.KING@sakilacustomer.org 34 1 2006-02-14 22:04:36.000 2011-09-14 18:10:29

Now we will move on to the [Character_list] wildcard.

IMPORTANT NOTE!

Using [charlist] with SQLite is a little different than with other SQL formats, such as MySQL. * In MySQL you would use: WHERE value LIKE '[charlist]%' * In SQLite you use: WHERE value GLOB '[charlist]*'

The [character_list] wildcard

# Select any customers whose first name begins with an A or a B
query = ''' SELECT *
            FROM customer
            WHERE first_name GLOB '[AB]*' ; '''

# Grab 
sql_to_df(query).head()
customer_id store_id first_name last_name email address_id active create_date last_update
0 4 2 BARBARA JONES BARBARA.JONES@sakilacustomer.org 8 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
1 14 2 BETTY WHITE BETTY.WHITE@sakilacustomer.org 18 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
2 29 2 ANGELA HERNANDEZ ANGELA.HERNANDEZ@sakilacustomer.org 33 1 2006-02-14 22:04:36.000 2011-09-14 18:10:29
3 31 2 BRENDA WRIGHT BRENDA.WRIGHT@sakilacustomer.org 35 1 2006-02-14 22:04:36.000 2011-09-14 18:10:29
4 32 1 AMY LOPEZ AMY.LOPEZ@sakilacustomer.org 36 1 2006-02-14 22:04:36.000 2011-09-14 18:10:29

SQL ORDER BY

The ORDER BY keyword is used to sort the result-set by one or more columns. The ORDER BY keyword sorts the records in ascending order by default. To sort the records in a descending order, you can use the DESC keyword. The syntax is:

SELECT column_name
FROM table_name
ORDER BY column_name ASC|DESC

# Select all customers and order results by last name
query = ''' SELECT *
            FROM customer
            ORDER BY last_name ; '''

# Grab 
sql_to_df(query).head()
customer_id store_id first_name last_name email address_id active create_date last_update
0 505 1 RAFAEL ABNEY RAFAEL.ABNEY@sakilacustomer.org 510 1 2006-02-14 22:04:37.000 2011-09-14 18:10:42
1 504 1 NATHANIEL ADAM NATHANIEL.ADAM@sakilacustomer.org 509 1 2006-02-14 22:04:37.000 2011-09-14 18:10:42
2 36 2 KATHLEEN ADAMS KATHLEEN.ADAMS@sakilacustomer.org 40 1 2006-02-14 22:04:36.000 2011-09-14 18:10:29
3 96 1 DIANA ALEXANDER DIANA.ALEXANDER@sakilacustomer.org 100 1 2006-02-14 22:04:36.000 2011-09-14 18:10:30
4 470 1 GORDON ALLARD GORDON.ALLARD@sakilacustomer.org 475 1 2006-02-14 22:04:37.000 2011-09-14 18:10:41
# Select all customers and order results by last name, DESCENDING
query = ''' SELECT *
            FROM customer
            ORDER BY last_name DESC; '''

# Grab 
sql_to_df(query).head()

SQL GROUP BY

The GROUP BY statement is used with the aggregate functions to group the results by one or more columns. The syntax is:

SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name;

Let’s see how it works.

# Count the number of customers per store

query = ''' SELECT store_id , COUNT(customer_id)
            FROM customer
            GROUP BY store_id; '''

# Grab 
sql_to_df(query).head()
store_id COUNT(customer_id)
0 1 326
1 2 273

Some Useful Ressources:

https://www.w3schools.com/python/python_mysql_getstarted.asp

https://python.doctor/page-database-data-base-donnees-query-sql-mysql-postgre-sqlite

https://realpython.com/python-sqlite-sqlalchemy/

https://pypi.org/project/SQLAlchemy/

https://www.pythoncentral.io/introductory-tutorial-python-sqlalchemy/