SQL basic commands-优快云博客

本文链接：https://blog.youkuaiyun.com/weixin_42222933/article/details/146170512

Lesson 1

Line breaks don’t mean anything specific in SQL!!

Some of the most common data types are:

INTEGER, a positive or negative whole number
TEXT, a text string
DATE, the date formatted as YYYY-MM-DD
REAL, a decimal value

Create table

CREATE TABLE table_name (
   column_1 data_type, 
   column_2 data_type, 
   column_3 data_type
);

Insert

INSERT INTO celebs (id, name, age) 
VALUES (1, 'Justin Bieber', 29);

INSERT INTO is a clause that adds the specified row or rows.
celebs is the table the row is added to.
(id, name, age) is a parameter identifying the columns that data will be inserted into.
VALUES is a clause that indicates the data being inserted.
(1, ‘Justin Bieber’, 29) is a parameter identifying the values being inserted.
1: an integer that will be added to id column
‘Justin Bieber’: text that will be added to name column
29: an integer that will be added to age column

Select

SELECT name FROM celebs;
SELECT * FROM celebs;

Alter

ALTER TABLE celebs 
ADD COLUMN twitter_handle TEXT;

ALTER TABLE is a clause that lets you make the specified changes.
celebs is the name of the table that is being changed.
ADD COLUMN is a clause that lets you add a new column to a table:

twitter_handle is the name of the new column being added
TEXT is the data type for the new column
4. NULL is a special value in SQL that represents missing or unknown data. Here, the rows that existed before the column was added have NULL (∅) values for twitter_handle.

Update

UPDATE celebs 
SET twitter_handle = '@taylorswift13' 
WHERE id = 4;

UPDATE is a clause that edits a row in the table.
celebs is the name of the table.
SET is a clause that indicates the column to edit.

twitter_handle is the name of the column that is going to be updated
@taylorswift13 is the new value that is going to be inserted into the twitter_handle column.
4. WHERE is a clause that indicates which row(s) to update with the new column value. Here the row with a 4 in the id column is the row that will have the twitter_handle updated to @taylorswift13.

Delete

DELETE FROM celebs 
WHERE twitter_handle IS NULL;

Constraints

CREATE TABLE celebs (
   id INTEGER PRIMARY KEY, 
   name TEXT UNIQUE,
   date_of_birth TEXT NOT NULL,
   date_of_death TEXT DEFAULT 'Not Applicable'
);

PRIMARY KEY columns can be used to uniquely identify the row. Attempts to insert a row with an identical value to a row already in the table will result in a constraint violation which will not allow you to insert the new row.
UNIQUE columns have a different value for every row. This is similar to PRIMARY KEY except a table can have many different UNIQUE columns.
NOT NULL columns must have a value. Attempts to insert a row without a value for a NOT NULL column will result in a constraint violation and the new row will not be inserted.
DEFAULT columns take an additional argument that will be the assumed value for an inserted row if the new row does not specify a value for that column.

Lesson 2: SQLite

Good: In SQLite, a database is stored in a single file — a trait that distinguishes it from other database engines. This fact allows for a great deal of accessibility: copying a database is no more complicated than copying the file that stores the data, sharing a database can mean sending an email attachment.

Bad: SQLite’s signature portability unfortunately makes it a poor choice when many different users are updating the table at the same time (to maintain integrity of data, only one user can write to the file at a time). It also may require some more work to ensure the security of private data due to the same features that make SQLite accessible. Furthermore, SQLite does not offer the same exact functionality as many other database systems, limiting some advanced features other relational database systems offer. Lastly, SQLite does not validate data types. Where many other database software would reject data that does not conform to a table’s schema, SQLite allows users to store data of any type into any column.

As

SELECT name AS 'Titles'
FROM movies;

AS is a keyword in SQL that allows you to rename a column or table using an alias.

Notice:
Although it’s not always necessary, it is considered best practice to surround your aliases with single quotes.
Note that this practice is specific to SQLite, the RDBMS used in this exercise. When you work with other RDBMSs, notably PostgreSQL, no quotes or double quotes may be required in place of single quotes.
When using AS, the columns are not being renamed in the table. The aliases only appear in the result.

Distinct

DISTINCT
is used to return unique values in the output. It filters out all duplicate values in the specified column(s).
For instance,

SELECT tools 
FROM inventory;

may produce:
在这里插入图片描述

By adding DISTINCT before the column name,

SELECT DISTINCT tools 
FROM inventory;

the result would be:
在这里插入图片描述

Where

SELECT *
FROM movies
WHERE imdb_rating > 8;

The WHERE clause filters the result set to only include rows where the following condition is true.
imdb_rating > 8 is the condition. Here, only rows with a value greater than 8 in the imdb_rating column will be returned.

The > is an operator. Operators create a condition that can be evaluated as either true or false.

Comparison operators used with the WHERE clause are:

= equal to
!= not equal to

greater than
< less than
= greater than or equal to
<= less than or equal to

Like

How could we select all movies that start with ‘Se’ and end with ‘en’ and have exactly one character in the middle?

SELECT * 
FROM movies
WHERE name LIKE 'Se_en';

The _ means you can substitute any individual character here without breaking the pattern. The names Seven and Se7en both match this pattern.

Like II

The percentage sign % is another wildcard character that can be used with LIKE.
This statement below filters the result set to only include movies with names that begin with the letter ‘A’:

SELECT * 
FROM movies
WHERE name LIKE 'A%';

% is a wildcard character that matches zero or more missing characters in the pattern. For example:
A% matches all movies with names that begin with letter ‘A’
%a matches all movies that end with ‘a’

Is Null & Is not Null

To filter for all movies with an IMDb rating:

SELECT name
FROM movies 
WHERE imdb_rating IS NOT NULL;

SELECT name
FROM movies
WHERE imdb_rating IS NULL;

Between

The BETWEEN
operator is used in a WHERE clause to filter the result set within a certain range. It accepts two values that are either numbers, text or dates.
For example, this statement filters the result set to only include movies with years from 1990 up to, and including 1999.

SELECT *
FROM movies
WHERE year BETWEEN 1990 AND 1999;

When the values are text, BETWEEN filters the result set for within the alphabetical range.
In this statement, BETWEEN filters the result set to only include movies with names that begin with the letter ‘A’ up to, but not including ones that begin with ‘J’.

SELECT *
FROM movies
WHERE name BETWEEN 'A' AND 'J';

And

Sometimes we want to combine multiple conditions in a WHERE clause to make the result set more specific and useful.

One way of doing this is to use the AND operator. Here, we use the AND operator to only return 90’s romance movies.

SELECT * 
FROM movies
WHERE year BETWEEN 1990 AND 1999
   AND genre = 'romance';

Or

SELECT *
FROM movies
WHERE year > 2014
   OR genre = 'action';

Order by

We can sort the results using ORDER BY, either alphabetically or numerically. Sorting the results often makes the data more useful and easier to analyze.

For example, if we want to sort everything by the movie’s title from A through Z:

SELECT *
FROM movies
ORDER BY name;

ORDER BY is a clause that indicates you want to sort the result set by a particular column.
name is the specified column.

SELECT *
FROM movies
WHERE imdb_rating > 8
ORDER BY year DESC;

DESC is a keyword used in ORDER BY to sort the results in descending order (high to low or Z-A).
ASC is a keyword used in ORDER BY to sort the results in ascending order (low to high or A-Z).

Limit

SELECT *
FROM movies
LIMIT 10;

LIMIT is a clause that lets you specify the maximum number of rows the result set will have. This saves space on our screen and makes our queries run faster.
LIMIT always goes at the very end of the query. Also, it is not supported in all SQL databases.

Case

A CASE statement allows us to create different outputs (usually in the SELECT statement). It is SQL’s way of handling if-then logic.

Suppose we want to condense the ratings in movies to three levels:

If the rating is above 8, then it is Fantastic.
If the rating is above 6, then it is Poorly Received.
Else, Avoid at All Costs.

SELECT name,
 CASE
  WHEN imdb_rating > 8 THEN 'Fantastic'
  WHEN imdb_rating > 6 THEN 'Poorly Received'
  ELSE 'Avoid at All Costs'
 END
FROM movies;

Each WHEN tests a condition and the following THEN gives us the string if the condition is true.
The ELSE gives us the string if all the above conditions are false.
The CASE statement must end with END.

In the result, you have to scroll right because the column name is very long. To shorten it, we can rename the column to ‘Review’ using AS:

SELECT name,
 CASE
  WHEN imdb_rating > 8 THEN 'Fantastic'
  WHEN imdb_rating > 6 THEN 'Poorly Received'
  ELSE 'Avoid at All Costs'
 END AS 'Review'
FROM movies;

select name,
  case name # you can write name here instead of writing it in each line of when
    when 'kale-smoothie'    then 'smoothie'
    when 'banana-smoothie'  then 'smoothie'
    when 'orange-juice'     then 'drink'
    when 'soda'             then 'drink'
    when 'blt'              then 'sandwich'
    when 'grilled-cheese'   then 'sandwich'
    when 'tikka-masala'     then 'dinner'
    when 'chicken-parm'     then 'dinner'
     else 'other'
  end as category
from order_items
order by id
limit 100;

Lesson 3

COUNT(): count the number of rows
SUM(): the sum of the values in a column
MAX()/MIN(): the largest/smallest value
AVG(): the average of the values in a column
ROUND(): round the values in the column

Count

COUNT() is a function that takes the name of a column as an argument and counts the number of non-empty values in that column.

SELECT COUNT(*)
FROM table_name;

SELECT	state, 
    COUNT(CASE WHEN elevation < 1000 THEN 1 ELSE NULL END) as count_low_elevation_aiports 
FROM airports 
GROUP BY state;

Sum

SUM() is a function that takes the name of a column as an argument and returns the sum of all the values in that column.

SELECT SUM(downloads)
FROM fake_apps;

Max/Min

SELECT MAX(downloads)
FROM fake_apps;

Average

SELECT AVG(downloads)
FROM fake_apps;

Round

ROUND() function takes two arguments inside the parenthesis:

a column name
an integer
It rounds the values in the column to the number of decimal places specified by the integer.

SELECT name, ROUND(price, 0)
FROM fake_apps;

SELECT ROUND(AVG(price),2)
FROM fake_apps;

Here, we pass the column price and integer 0 as arguments. SQL rounds the values in the column to 0 decimal places in the output.

Group By I

SELECT year,
   AVG(imdb_rating)
FROM movies
GROUP BY year
ORDER BY year;

Group By II

SQL lets us use column reference(s) in our GROUP BY that will make our lives easier.

1 is the first column selected
2 is the second column selected
3 is the third column selected
and so on.

SELECT ROUND(imdb_rating),
   COUNT(name)
FROM movies
GROUP BY 1
ORDER BY 1;

SELECT category, 
   price,
   AVG(downloads)
FROM fake_apps
GROUP BY category, price;

Having

HAVING is very similar to WHERE. In fact, all types of WHERE clauses you learned about thus far can be used with HAVING. HAVING is used for filtering groups.

SELECT year,
   genre,
   COUNT(name)
FROM movies
GROUP BY 1, 2
HAVING COUNT(name) > 10;

When we want to limit the results of a query based on values of the individual rows, use WHERE.
When we want to limit the results of a query based on an aggregate property, use HAVING.
HAVING statement always comes after GROUP BY, but before ORDER BY and LIMIT.

Lesson 4

Join

SELECT *
FROM orders
JOIN customers
  ON orders.customer_id = customers.customer_id;

The first line selects all columns from our combined table. If we only want to select certain columns, we can specify which ones we want.
The second line specifies the first table that we want to look in, orders
The third line uses JOIN to say that we want to combine information from orders with customers.
The fourth line tells us how to combine the two tables. We want to match orders table’s customer_id column with customers table’s customer_id column.

Left Join

A left join will keep all rows from the first table, regardless of whether there is a matching row in the second table. If unmatch, it will omit the unmatched row from the 2nd table.

SELECT *
FROM table1
LEFT JOIN table2
  ON table1.c2 = table2.c2;

SELECT *
FROM newspaper
LEFT JOIN online
  ON newspaper.id = online.id
WHERE online.id IS NULL;

Primary Key vs Foreign Key

we had three tables: orders, subscriptions, and customers.
Each of these tables has a column that uniquely identifies each row of that table:

order_id for orders
subscription_id for subscriptions
customer_id for customers
These special columns are called primary keys.

Primary keys have a few requirements:

None of the values can be NULL.
Each value must be unique (i.e., you can’t have two customers with the same customer_id in the customers table).
A table can not have more than one primary key column.

When the primary key for one table appears in a different table, it is called a foreign key.
Generally, the primary key will just be called id. Foreign keys will have more descriptive names.

Cross Join

Sometimes, we just want to combine all rows of one table with all rows of another table.

For instance, if we had a table of shirts and a table of pants, we might want to know all the possible combinations to create different outfits.

SELECT shirts.shirt_color,
   pants.pants_color
FROM shirts
CROSS JOIN pants;

The first two lines select the columns shirt_color and pants_color.
The third line pulls data from the table shirts.
The fourth line performs a CROSS JOIN with pants.
Notice that cross joins don’t require an ON statement.

Union

Sometimes we just want to stack one dataset on top of the other. Well, the UNION operator allows us to do that.

SELECT *
FROM table1
UNION
SELECT *
FROM table2;

SQL has strict rules for appending data:

Tables must have the same number of columns.
The columns must have the same data types in the same order as the first table.

# Each SELECT statement within the UNION must have the same number of columns with similar data types. 
# UNION: Removes duplicate rows from the final result set. 
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

# UNION ALL: Includes all rows from the combined result sets, including duplicates.
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;

With

WITH previous_results AS (
   SELECT ...
   ...
   ...
   ...
)
SELECT *
FROM previous_results
JOIN customers
  ON _____ = _____;

The WITH statement allows us to perform a separate query (such as aggregating customer’s subscriptions)
previous_results is the alias that we will use to reference any columns from the query inside of the WITH clause
We can then go on to do whatever we want with this temporary table (such as join the temporary table with another table)
Essentially, we are putting a whole first query inside the parentheses () and giving it a name. After that, we can use this name as if it’s a table and write a new query using the first query.

Non-Correlated Subqueries

SELECT a.dep_month,
       a.dep_day_of_week,
       AVG(a.flight_count) AS average_flights
  FROM (
        SELECT dep_month,
              dep_day_of_week,
               dep_date,
               COUNT(*) AS flight_count
          FROM flights
         GROUP BY 1,2,3
       ) a
 GROUP BY 1,2
 ORDER BY 1,2;

Correlated Subqueries

SELECT id
FROM flights AS f
WHERE distance > (
 SELECT AVG(distance)
 FROM flights
 WHERE carrier = f.carrier);

The f is an alias for the table flights. It is used to give the table a shorter, more convenient name when referencing it in the query.

Explanation of the Query:

flights AS f assigns the alias f to the flights table.
The outer query selects id from the flights table (aliased as f).
The WHERE condition checks if the distance of the current row (f.distance) is greater than the average distance of all flights with the same carrier.
The subquery calculates AVG(distance) for all flights that have the same carrier as the current row (WHERE carrier = f.carrier).

SELECT carrier, id,
    (SELECT COUNT(*)
FROM flights f
WHERE f.id < flights.id
AND f.carrier=flights.carrier) + 1
 AS flight_sequence_number
FROM flights;

What does f mean?
The f is an alias for the flights table within the subquery. It is used to reference flights again inside the subquery without confusion.

Breaking Down the Query

Outer Query:
Retrieves carrier, id, and computes a column called flight_sequence_number for each row in the flights table.
Subquery (inside the SELECT statement):
Counts how many flights (COUNT(*)) have an id smaller than the current row’s id within the same carrier.
WHERE f.id < flights.id: Ensures only flights before the current flight (i.e., lower id) are counted.
AND f.carrier = flights.carrier: Ensures that the count is only within the same airline (carrier).
Adding 1:
Since COUNT(*) starts from 0 (for the first flight of a carrier), adding 1 ensures that the first flight has a flight_sequence_number of 1, the second is 2, and so on.

Intersect

# it returns only common rows returned by the two SELECT statements.
SELECT column_name(s) FROM table1

INTERSECT

SELECT column_name(s) FROM table2;

Except

# It returns distinct rows from the first statement that aren’t output by the second SELECT statement.
SELECT column_name(s) FROM table1

EXCEPT

SELECT column_name(s) FROM table2;

Stings

# connect 'city' column and 'state' column as one column named as location
SELECT city || ' ' || state as location
FROM bakeries;

# replace '_' in 'ingredients' column as space
SELECT id, REPLACE(ingredients,'_',' ') as item_ingredients
from baked_goods;

Cast

CAST(number1 AS REAL) / number2;: Returns the result as a real number by casting one of numeric inputs as a real number

Tips

Functions

strftime(format, column)
e.g. return the “hour” from column “timestamp”

strftime('%H', timestamp)

Datetime
It will return date and time in the format of YYYY-MM-DD hh:mm:ss

SELECT DATETIME(delivery_time)
FROM baked_goods;

Date/Time

# return date in YYYY-MM-DD
SELECT DATE(manufacture_time), count(*) as count_baked_goods
FROM baked_goods
GROUP BY DATE(manufacture_time);

# return time in hh:mm:ss
SELECT TIME(manufacture_time), count(*) as count_baked_goods
FROM baked_goods
GROUP BY TIME(manufacture_time);