# How to Generate Series to Avoid Gaps in Data in BigQuery

If you're [grouping by time in BigQuery](https://popsql.com/learn-sql/bigquery/how-to-group-by-time-in-bigquery/) and don't want gaps in your data, you need to generate a series of time values. In the example below, we generate a series of timestamps that increment by the hour. In these examples, the time column's data type is `TIMESTAMP`.

First we generate our series with no gaps:

```sql
select *
from UNNEST(GENERATE_TIMESTAMP_ARRAY('2015-10-01', '2015-10-03', INTERVAL 1 HOUR)) AS hour
```

```sql
| hour                    |
|-------------------------|
| 2015-10-01 00:00:00 UTC |
| 2015-10-01 01:00:00 UTC |
| 2015-10-01 02:00:00 UTC |
| 2015-10-01 03:00:00 UTC |
| 2015-10-01 04:00:00 UTC |
| 2015-10-01 05:00:00 UTC |
...
```

Note: BigQuery allows you to generate ~1 million intervals with this method. If your column is a `DATETIME`, use `GENERATE_DATETIME_ARRAY` instead. For a `DATE`, use `GENERATE_DATE_ARRAY`.

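For daily buckets, the same pattern works with `GENERATE_DATE_ARRAY`. Here's a minimal sketch that produces one row per day for the first week of October 2015. The date range is only an example:

```sql
-- One row per day; both endpoints are included
select day
from UNNEST(GENERATE_DATE_ARRAY('2015-10-01', '2015-10-07', INTERVAL 1 DAY)) AS day
order by day;
```
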
Next, use a common table expression (CTE) to create a table that contains the gapless series we created above. We'll call it `hours` 🧐. We can now `left join` any table with "gappy" time series data to our gapless `hours` table. Hours with no matching rows still show up, with a count of 0:

```sql
with hours as (
  select *
  from UNNEST(GENERATE_TIMESTAMP_ARRAY('2015-10-01', '2015-10-03', INTERVAL 1 HOUR)) AS hour
)

select
  hours.hour,
  COUNT(comments.id) as comment_count -- counts only matched rows, so empty hours show 0
from hours
left join `bigquery-public-data.hacker_news.comments` as comments
  on timestamp_trunc(comments.time_ts, hour) = hours.hour
group by 1
order by 1;
```

We're using a BigQuery public dataset on Hacker News in our example above, so you can follow along. It only takes [5 minutes to get started with BigQuery in PopSQL](https://popsql.com/learn-sql/bigquery/get-started-with-bigquery-and-popsql-in-5-minutes/).

Not only does PopSQL allow you to share queries with your teammates, but it autogenerates charts to save you even more time. Here are the results from the query above:

![bigquery graph example](//images.ctfassets.net/iv1sg9nibjwl/7qo4jaP6v52fpxgi0wphZZ/aef5c5bb54d24c535c46acd52d41d922/bigquery_graph_example.png)

Isn't exploring BigQuery exhilarating? 🎢
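You can also see the gap-filling behavior without the public dataset by using a small inline table. In this sketch, the `events` CTE and its three timestamps are made up for illustration. Nothing happens between 02:00 and 03:59, so those two hours should come back with a count of 0:

```sql
-- Hypothetical "gappy" data: no events between 02:00 and 03:59
with events as (
  select timestamp '2015-10-01 00:15:00' as created_at union all
  select timestamp '2015-10-01 01:40:00' union all
  select timestamp '2015-10-01 04:05:00'
),
hours as (
  select hour
  from UNNEST(GENERATE_TIMESTAMP_ARRAY('2015-10-01 00:00:00', '2015-10-01 04:00:00', INTERVAL 1 HOUR)) AS hour
)
select
  hours.hour,
  count(events.created_at) as event_count -- null-safe count: unmatched hours get 0
from hours
left join events
  on timestamp_trunc(events.created_at, hour) = hours.hour
group by hours.hour
order by hours.hour;
```
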

# How to Insert Into PostgreSQL
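Here's the shortest and easiest way to insert in PostgreSQL. You only have to specify the values, but they are matched to the table's columns in the order the columns were declared, so you need to know that order. If you supply fewer values than there are columns, the remaining columns get their default values (or null if they have no default).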

```sql
-- Assuming the users table has only three columns: first_name, last_name, and email, and in that order
insert into users values ('John', 'Doe', 'john@doe.com');
```

If you have many columns, but only want to specify some:

```sql
insert into users (first_name) values ('John');
```

If you want to insert into a JSON column, just wrap the valid JSON in a single quoted string:

```sql
insert into users (preferences) values ('{ "beta": true }');
```

If inserting a row would violate a unique constraint, you can use Postgres' `on conflict` clause to specify what to do when that happens. For example, imagine you have a webhook system and you want to gracefully handle duplicate webhooks:

```sql
-- If we already recorded this webhook, do nothing
insert into stripe_webhooks (event_id)
values ('evt_123')
on conflict do nothing;
```

You can also do "upserts" (update or insert) in Postgres:

```sql
-- Assuming you have a unique index on email
insert into users (email, name)
values ('john@doe.com', 'Jane Doe')
on conflict (email) do update set name = excluded.name; -- excluded.name refers to the 'Jane Doe' value
```
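You can try both clauses against a temporary table without touching real data. In this sketch, `upsert_demo` is a hypothetical table. Its primary key on `email` provides the unique constraint that `on conflict` relies on:

```sql
-- Hypothetical table; the primary key gives us a unique constraint on email
create temporary table upsert_demo (
  email text primary key,
  name  text
);

insert into upsert_demo (email, name) values ('john@doe.com', 'John Doe');

-- Same email again: the conflict turns the insert into an update
insert into upsert_demo (email, name)
values ('john@doe.com', 'Jane Doe')
on conflict (email) do update set name = excluded.name;

-- "do nothing" silently skips the duplicate instead
insert into upsert_demo (email, name)
values ('john@doe.com', 'Someone Else')
on conflict do nothing;

-- One row remains: john@doe.com | Jane Doe
select * from upsert_demo;
```
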

# How to Update in PostgreSQL
```sql
-- All rows
update users set updated_at = now();

-- Some rows
update users set updated_at = now() where id = 1;
```

# How to Delete in PostgreSQL

```sql
delete from users where id = 1;
```

# How to Trim Strings in PostgreSQL
The `trim()` function removes specified characters or spaces from a string. You can specify to trim from only the start or end of the string, or trim from both. This is best explained using examples.

You can trim "1" from start of the string:

```sql
select trim(leading '1' from '111hello111');

 ltrim   
----------
 hello111
```

You can trim "1" from the end of the string:

```sql
select trim(trailing '1' from '111hello111211');

 rtrim     
--------------
 111hello1112
```

_Note: the 1's before the 2 were not trimmed, as they are not at the end of the string._

You can specify multiple characters to trim. PostgreSQL will remove any combination of those characters in succession:

```sql
select trim(both 'abc' from 'abcbabccchellocbaabc');

 btrim
-------
 hello
```
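The characters argument is treated as a set, not as a substring. Their order doesn't matter, and trimming stops at the first character that isn't in the set. A quick check with made-up strings:

```sql
select
  trim(both 'abc' from 'abcbabccchellocbaabc') as set_abc,    -- hello
  trim(both 'cba' from 'abcbabccchellocbaabc') as set_cba,    -- hello (same set, different order)
  trim(leading 'ab' from 'abxab')              as stops_at_x; -- xab ('x' is not in the set)
```
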

PostgreSQL also supports the non-standard syntax of specifying which characters to remove as the last parameter, separated by a comma:

```sql
select trim(both 'abcbabccchellocbaabc','abc');

 btrim
-------
 hello
```

Not specifying where to trim from has the same result as "both":

```sql
select trim('1' from '111hello111');

 btrim
-------
 hello
```

Not specifying what character to trim will result in trimming spaces:

```sql
select trim('    remove spaces from both sides    ');

 btrim             
-------------------------------
 remove spaces from both sides
```
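By default, only the space character is trimmed; tabs and newlines are left alone. If you need to strip those too, list them explicitly. This sketch uses `E''` strings so that `\t`, `\r`, and `\n` become real tab, carriage return, and newline characters:

```sql
select
  trim(E'\t padded \n') = E'\t padded \n'    as tab_and_newline_survive, -- t
  trim(both E' \t\r\n' from E'\t padded \n') as whitespace_trimmed;      -- padded
```
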

`rtrim()` and `ltrim()` are special versions that only remove trailing and leading characters, respectively. These functions only accept parameters separated by commas:

```sql
select ltrim('zzzyclean this up', 'xyz');

 ltrim     
---------------
 clean this up
```

```sql
select rtrim('again not specifying what to trim removes spaces   ');

 rtrim                       
--------------------------------------------------
 again not specifying what to trim removes spaces
```

# How to Use substring() in PostgreSQL
In PostgreSQL, the function `substring()` has many uses. The simplest one extracts a number of characters from the supplied string. Let's look at its syntax and parameters:

**Syntax:**

```sql
substring(original_string [from <starting_position>] [for <number_of_characters>])
```

**Parameters:**

* `original_string` (required): the string or column name to extract from
* `from starting_position` (optional): the position of the first character to be extracted. If not specified, the extraction starts from the first character.
* `for number_of_characters` (optional): the number of characters to be extracted. If not specified, the extraction runs to the last character of the string.

Note that while the last two parameters are optional, you must use one of them. Here are some examples:

```sql
select substring('Learning SQL is essential.' from 10);
 substring 
-------------------
 SQL is essential.
```

```sql
select substring('Learning SQL is essential.' from 10 for 3);
 substring
-----------
 SQL
```

```sql
select substring('Learning SQL is essential.' for 13);
 substring 
---------------
 Learning SQL
```

`starting_position` can also be zero or negative. On its own, a negative start simply returns the whole string. When combined with `number_of_characters`, however, the positions before 1 still count toward the length, as if there were "invisible" characters to the left of the string.

```sql
select substring('Learning SQL is essential.', -4, 10);
 substring
-----------
 Learn
```
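Put differently, the function looks at positions `start` through `start + length - 1` and returns only the characters that actually exist there. For `-4, 10`, that window is positions -4 through 5, so the result is the same as asking for the first 5 characters:

```sql
select
  substring('Learning SQL is essential.', -4, 10) as negative_start, -- Learn
  substring('Learning SQL is essential.', 1, 5)   as first_five;     -- Learn
```
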

There is also a version of this function that uses commas to separate the parameters instead of using the keywords `from` and `for`. In this case only the number of characters parameter is optional.

```sql
select substring('Learning SQL is essential.', 10, 3);
 substring
-----------
 SQL

select substring('Learning SQL is essential.', 10);
 substring 
-------------------
 SQL is essential.
```

# How to Use PostgreSQL substring() with RegEx to Extract a String
You can use regular expressions in the `substring()` function to extract a string that matches a specified pattern:

**Syntax:**

```sql
substring(string from pattern) -- using POSIX regular expressions
substring(string from pattern for escape_char)  -- using SQL regular expressions
```

Here is an example that uses a POSIX regular expression to extract the first word that contains 'ss':

```sql
select substring('Learning SQL is essential.' from '\w*ss\w*');
 substring
-----------
 essential
```
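With POSIX patterns, parentheses change the result. If the pattern contains a parenthesized group, `substring()` returns only the part that matched the first group, not the whole match. For example, to grab the word that follows 'Learning ':

```sql
select substring('Learning SQL is essential.' from 'Learning (\w+)') as next_word; -- SQL
```
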

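`substring()` with SQL regular expressions takes three parameters:

* the string to search
* the pattern to match
* an escape character, given after the `for` keyword

The pattern must match the entire string, which is why it starts and ends with `%`. The escape character followed by a double quote (`#"`) marks the start and end of the part to return. In the following examples we look first for a three-letter word and then for a six-letter word, each starting with an 'S' and ending with an 'L':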

```sql
select substring('Learning SQL is essential.' from '%#"S_L#"%' for '#');
 substring
-----------
 SQL

select substring('Do you pronounce it as SQL or SEQUEL?' from '%#"S____L#"%' for '#');
 substring
-----------
 SEQUEL
```
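Because SQL regular expressions must match the whole string, dropping the surrounding `%` wildcards makes the match fail. In that case `substring()` returns null:

```sql
select
  substring('Learning SQL is essential.' from '%#"S_L#"%' for '#') as with_wildcards,    -- SQL
  substring('Learning SQL is essential.' from '#"S_L#"' for '#')   as without_wildcards; -- NULL
```
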

You may refer to the [PostgreSQL documentation](https://www.postgresql.org/docs/12/functions-matching.html) for more information on regular expression pattern matching.

# How to Replace Substrings in PostgreSQL
The `replace()` function is used to change all occurrences of a certain substring to a new string. It accepts three parameters - the main string, the substring to be replaced, and the new string to be used.

```sql
select replace('This is old, really old', 'old', 'new');
```

```sql
 replace         
-------------------------
 This is new, really new
```
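`replace()` is case-sensitive, so only exact matches of the substring are changed:

```sql
select replace('Old is old, OLD is old', 'old', 'new'); -- Old is new, OLD is new
```
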

# How to Modify Arrays in PostgreSQL
## Overwriting an Array

The most basic ways to modify an array column are to overwrite all of its values by assigning it a new array, or to change a single element.

Say you start off with these data:

```sql
 player_number |   round_scores
---------------+------------------
         10002 | {91,92,93,95,99}
         10001 | {95,92,96,97,98}
```

```sql
-- overwrite all scores for a player
update player_scores set round_scores='{92,93,94,96,98}' where player_number=10002;

-- change only the score for the second round for player 10001
update player_scores set round_scores[2]=94 where player_number=10001;
```

After performing these commands, the updated data are:

```sql
 player_number |   round_scores   
---------------+------------------
         10002 | {92,93,94,96,98}
         10001 | {95,94,96,97,98}
```

## Prepend and Append to an Array

PostgreSQL has functions that offer more ways to modify arrays. First, use `array_prepend()` and `array_append()` to add one element to the start and to the end of an array, respectively:

```sql
update player_scores set round_scores = array_prepend(0, round_scores);

update player_scores set round_scores = array_append(round_scores, 100);
```

```sql
 player_number |      round_scores      
---------------+------------------------
         10002 | {0,92,93,94,96,98,100}
         10001 | {0,95,94,96,97,98,100}
```

## Concatenate Multiple Arrays

To add an array to another array, use `array_cat()`.

```sql
select array_cat('{1, 2}', ARRAY[3, 4]) as concatenated_arrays;
```

```sql
 concatenated_arrays
---------------------
 {1,2,3,4}
```

The `||` operator can be used as a much simpler alternative to `array_prepend()`, `array_append()` and `array_cat()`:

```sql
select 1 || array[2, 3, 4] as element_prepend;
```

```sql
 element_prepend
-----------------
 {1,2,3,4}
```

```sql
select array[1, 2, 3] || 4 as element_append;
```

```sql
 element_append
----------------
 {1,2,3,4}
```

```sql
select array['a', 'b', 'c'] || array['d', 'e', 'f'] as concat_array;
```

```sql
 concat_array  
---------------
 {a,b,c,d,e,f}
```

You can even add an array to a 2-dimensional array:

```sql
select array[1, 2] || array[[4, 5],[6, 7]] as concat_2d_array;
```

```sql
   concat_2d_array   
---------------------
 {{1,2},{4,5},{6,7}}
```

## Removal from an Array

`array_remove()` removes all elements equal to the second parameter.

```sql
select array_remove(round_scores,94) as removed_94 from player_scores;
```

```sql
     removed_94      
---------------------
 {0,92,93,96,98,100}
 {0,95,96,97,98,100}
```

Note: `array_remove()` removes ALL occurrences of the matching value:

```sql
select array_remove(ARRAY[1,2,3,2,5], 2) as removed_2s;
```

```sql
 removed_2s
------------
 {1,3,5}
```
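`array_remove()` compares elements using `IS NOT DISTINCT FROM` semantics. That means it can also strip nulls, which a plain `=` comparison would never match:

```sql
select array_remove(array[1, null, 2, null], null) as nulls_removed; -- {1,2}
```
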

## Replace Elements in an Array

`array_replace()` replaces all elements equal to the second parameter with the third parameter.

```sql
select array_replace(ARRAY[1,2,3,2,5], 2, 10) as two_becomes_ten;
```

```sql
 two_becomes_ten
-----------------
 {1,10,3,10,5}
```

## Fill an Array

`array_fill()` takes up to three parameters and returns an array pre-filled with the value of the first parameter:

* The second parameter is an array giving the length of each dimension. For example, `array[5]` means one dimension of 5 elements.
* The optional third parameter is an array giving the lower bound (the starting position) of each dimension. It defaults to 1.

```sql
insert into player_scores (player_number, round_scores) values
	(10003, array_fill(95,array[5]));
```

_In other words, insert a new record into the_ `player_scores` _table for player\_number 10003. All 5 of her scores will be 95._

```sql
insert into player_scores (player_number, round_scores) values
	(10004, array_fill(90,array[5],array[3]));
```

_Similarly, insert a new record into the_ `player_scores` _table for player\_number 10004. However, his 5 scores of 90 will begin at position 3 in the array._

```sql
 player_number |      round_scores      
---------------+------------------------
         10003 | {95,95,95,95,95}
         10004 | [3:7]={90,90,90,90,90}
```

To show that the scores array for player 10004 starts at element position 3, query the first three positions:

```sql
select
  player_number,
  round_scores[1],
  round_scores[2],
  round_scores[3]
from player_scores
where player_number in (10003, 10004);
```

```sql
 player_number | round_scores | round_scores | round_scores
---------------+--------------+--------------+--------------
         10003 |           95 |           95 |           95
         10004 |              |              |           90
```
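Because the second and third parameters of `array_fill()` hold one entry per dimension, the same function can build multi-dimensional arrays. `array_lower()` and `array_upper()` report the first and last position of a dimension. This makes a shifted start like player 10004's easy to confirm without a table:

```sql
select
  array_fill(0, array[2, 3])                         as two_by_three,   -- {{0,0,0},{0,0,0}}
  array_lower(array_fill(90, array[5], array[3]), 1) as first_position, -- 3
  array_upper(array_fill(90, array[5], array[3]), 1) as last_position;  -- 7
```
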

# How to Compare Arrays in PostgreSQL
The **equality operators** (`=`, `<>`) do an exact element-by-element comparison.

```sql
select
array[1,2,3] = array[1,2,4] as compare1, -- false: the third elements differ
array[1,2,3] <> array[1,2,4] as compare2; -- true
```

```sql
 compare1 | compare2
----------+----------
 f        | t
```

The **ordering operators** (`>`, `<`, `>=`, `<=`) also compare each element in an array in order. Results are based on the first different pair of elements, not the sizes of the arrays.

```sql
select
array[1,2,5] >= array[1,2,4] as compare1,
array[1,2,5] <= array[1,2,4,5] as compare2;
```

```sql
 compare1 | compare2
----------+----------
 t        | f
```
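One nuance: array size does matter when there is no differing pair, that is, when one array is a prefix of the other. In that case the shorter array sorts first:

```sql
select
  array[1, 2] < array[1, 2, 3] as prefix_sorts_first,    -- t
  array[9] > array[1, 2, 3]    as first_difference_wins; -- t
```
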

Then there are the **containment operators** (`@>`, `<@`). They are casually called "bird operators", well, because `@>` looks like a bird. An array is said to be contained in another array if each of its unique elements is also present in the other array.

```sql
-- This reads as array['a', 'b', 'c'] contains array['a', 'b', 'b', 'a']
select array['a', 'b', 'c'] @> array['a', 'b', 'b', 'a'] as contains;
```

```sql
 contains
----------
 t
```

```sql
-- this reads as array[1, 1, 4] is contained by array[4, 3, 2, 1]
select array[1, 1, 4] <@ array[4, 3, 2, 1] as is_contained_by;
```

```sql
 is_contained_by
-----------------
 t
```

Lastly, there is the **overlap operator** (`&&`). Arrays that have elements in common are called overlapping arrays. To check if two arrays overlap, use the `&&` operator:

```sql
select
array[1, 2] && array[2, 3] as overlap1,
array[1, 2] && array[3, 4] as overlap2;
```

```sql
 overlap1 | overlap2
----------+----------
 t        | f
```

# How to Concatenate Strings in PostgreSQL
PostgreSQL offers two ways to concatenate strings. The first uses the `||` operator:

```sql
select 'Join these ' || 'strings with a number ' || 23 as result;
```

```sql
 result               
-------------------------------------
 Join these strings with a number 23
```

You can see from above that PostgreSQL took care of transforming the number to a string to attach it to the rest. Note that you also need to consciously add spaces to make the string readable. For example, in the **customer** table of the Sakila database, to join the first and last names of the customers you have to add a single space in the concatenation:

```sql
select first_name||' '||last_name as customer_name from customer limit 5;
```

```sql
 customer_name   
------------------
 MARY SMITH
 PATRICIA JOHNSON
 LINDA WILLIAMS
 BARBARA JONES
 ELIZABETH BROWN
```

One disadvantage of the `||` operator is that a null value in any of the operands makes the whole result null.

```sql
select 'Null with || ' || 'will make ' || 'everything disappear' || null as result;
```

```sql
 result
----------
```

Using `concat()` will transform the nulls into empty strings when concatenating:

```sql
select concat('Concat() handles', null, ' nulls better', null);
```

```sql
 concat            
-------------------------------
 Concat() handles nulls better
```
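The difference matters most when concatenating nullable columns. In this sketch the rows come from an inline `values` list, and the second person (with no last name) is made up. Note that `concat()` keeps the separator, so you get a trailing space:

```sql
select
  first_name || ' ' || last_name     as with_pipes,  -- null for CHER
  concat(first_name, ' ', last_name) as with_concat  -- 'CHER ' (trailing space kept)
from (values ('MARY', 'SMITH'), ('CHER', null)) as people(first_name, last_name);
```
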

`concat()` accepts multiple parameters, separated by commas, and converts non-string values to text.

# How to Convert the Case of a String in PostgreSQL
The most basic case conversion functions are `lower()` and `upper()`. Usage is pretty straightforward:

```sql
select lower('Turn this into lOweRCAse');
```

```sql
 lower           
--------------------------
 turn this into lowercase
```

```sql
select upper('capiTalize THis');
```

```sql
 upper      
-----------------
 CAPITALIZE THIS
```

Another useful case conversion function is `initcap()`, which capitalizes the first character of each word and lowers the case of everything else. This is very useful when printing proper nouns and titles and making names stored in uppercase more pleasing to read:

```sql
select
  first_name,
  last_name,
  initcap(concat(first_name, ' ', last_name)) as name
from customer
limit 5;
```

```sql
 first_name | last_name |       name       
------------+-----------+------------------
 MARY       | SMITH     | Mary Smith
 PATRICIA   | JOHNSON   | Patricia Johnson
 LINDA      | WILLIAMS  | Linda Williams
 BARBARA    | JONES     | Barbara Jones
 ELIZABETH  | BROWN     | Elizabeth Brown
```
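`initcap()` treats any non-alphanumeric character as a word boundary. It therefore capitalizes after apostrophes and hyphens, but it doesn't know naming conventions like "McDonald", so check it against your own data first. The name below is made up:

```sql
select initcap('o''neil mcdonald-smith') as name; -- O'Neil Mcdonald-Smith
```
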

# How to Create an Array in PostgreSQL
An array is a single data object that holds multiple values.

In PostgreSQL, you can create an array for any built-in or user-defined data type. However, an array can only contain one data type. This means you can have an array of strings, an array of integers, and the like, but you cannot have an array that has both integer and string types.

To create a column of an array type, the `[]` symbol is used. The following examples illustrate this:

```sql
create table contacts (
	first_name varchar,
	last_name varchar,
	phone_numbers varchar[]
);

create table player_scores (
	player_number integer,
	round_scores integer[]
);
```

PostgreSQL also allows multi-dimensional arrays by using multiple pairs of square brackets in the column definition. One requirement here, as you will see later, is that the inner dimensions must have the same array lengths. Here we create a two-dimensional array for the student scores:

```sql
create table student_scores (
	student_number integer,
	test_scores decimal[][]
);
```

To conform to the SQL standard, PostgreSQL also accepts the `ARRAY` keyword for declaring one-dimensional arrays. Shown below is an alternate way to create the contacts and player\_scores tables:

```sql
create table contacts (
	first_name varchar,
	last_name varchar,
	phone_numbers varchar array
);

create table player_scores (
	player_number integer,
	round_scores integer array[10]
);
```

Note the `array[10]` in player\_scores above. PostgreSQL allows you to specify an array size limit, whether you use the `datatype[]` or `datatype array[]` pattern. However, this is just to conform to the SQL standard. As of PostgreSQL 12.2, this is quietly ignored and is not enforced.
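You can confirm that the declared size isn't enforced with a throwaway temporary table. The `size_limit_demo` name is hypothetical:

```sql
create temporary table size_limit_demo (
  round_scores integer array[3]  -- declared size of 3
);

-- Five elements are accepted anyway
insert into size_limit_demo values (array[91, 92, 93, 95, 99]);

select array_length(round_scores, 1) from size_limit_demo; -- 5
```
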

Arrays are typically used when a field can hold multiple values for an entity, and those values are self-contained enough that they don't merit a separate table in a one-to-many relationship.

# How to Insert Data Into an Array in PostgreSQL
There are two accepted syntaxes for inserting data into an array column. The first uses `ARRAY [value1, value2, etc]`:

```sql
insert into contacts (first_name, last_name, phone_numbers)
values ('John', 'Doe', ARRAY ['999-876-5432','999-123-4567']);

insert into player_scores (player_number, round_scores)
values (10001, ARRAY [95, 92, 96, 97, 98] );

-- multi-dimension arrays must have same array lengths for the inner dimensions
insert into student_scores (student_number, test_scores)
values (20001, ARRAY [[1, 95], [2, 94], [3, 98]]);
```

The second one uses single quotes and curly braces.

```sql
insert into contacts (first_name, last_name, phone_numbers)
values ('Bob', 'Parr', '{"555-INC-RDBL"}');

insert into player_scores (player_number, round_scores)
values (10002, '{91, 92, 93, 95, 99}' );

insert into student_scores (student_number, test_scores)
values (20002, '{{1, 96}, {2, 93}, {4, 97}}');
```

Note above that Bob Parr's phone number is enclosed in double quotes. The single quotes delimit the whole array literal, so elements inside it are quoted with double quotes. Also note that even if there is only one phone number, it still has to be inserted in array form.

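Inside the curly-brace form, double quotes around an element are optional for a simple value like Bob's number. They are required when the element itself contains a comma or a curly brace. A quick comparison using literals (the names are made up):

```sql
select
  '{"555-INC-RDBL"}'::varchar[] = '{555-INC-RDBL}'::varchar[] as quotes_optional_here, -- t
  array_length('{Parr, Bob}'::varchar[], 1)                   as unquoted_comma,       -- 2
  array_length('{"Parr, Bob"}'::varchar[], 1)                 as quoted_comma;         -- 1
```
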
# How to Create a Table in PostgreSQL
Here's an example of creating a `users` table in PostgreSQL:

```sql
create table users (
  id serial primary key, -- Auto incrementing IDs
  name character varying, -- String column without specifying a length
  preferences jsonb, -- JSON columns are great for storing unstructured data
  created_at timestamp without time zone -- Always store time in UTC
);
```

This is also a chance to specify [not null constraints](https://popsql.com/learn-sql/postgresql/how-to-add-a-not-null-constraint-in-postgresql/) and [default values](https://popsql.com/learn-sql/postgresql/how-to-add-a-default-value-to-a-column-in-postgresql/):

```sql
create table users (
  id serial primary key,
  name character varying not null,
  active boolean default true
);
```

You can also create temporary tables that will stick around for the duration of your session. This is helpful to break down your analysis into smaller pieces.

```sql
-- Create a temporary table called `scratch_users` with just an `id` column
create temporary table scratch_users (id integer);

-- Or create a temporary table based on the output of a select
create temp table active_users
as
select * from users where active is true;
```

# How to Drop a Table in PostgreSQL
```sql
drop table funky_users;
```

# How to Rename a Table in PostgreSQL
```sql
alter table events rename to events_backup;
```

# How to Truncate a Table in PostgreSQL
Be very careful with this command. It will empty the contents of your PostgreSQL table. This is useful in development, but you'll rarely want to do this in production.

```sql
truncate my_table;
```

If you have a serial ID column and you'd like to [restart its sequence](https://popsql.com/learn-sql/postgresql/how-to-alter-sequence-in-postgresql/) (i.e. restart IDs from `1`):

```sql
truncate my_table restart identity;
```
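Here's a self-contained sketch showing the effect of `restart identity` on a hypothetical temporary table:

```sql
create temporary table truncate_demo (id serial primary key, note text);
insert into truncate_demo (note) values ('a'), ('b');  -- ids 1 and 2

truncate truncate_demo restart identity;

insert into truncate_demo (note) values ('c');
select id from truncate_demo; -- 1, because the sequence was reset
```
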

# How to Duplicate a Table in PostgreSQL
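Sometimes it's useful to duplicate a table. Note that `create table ... as` copies column names, types, and rows, but not indexes, constraints, or default values:

```sql
create table dupe_users as (select * from users);

-- The `with no data` here means structure only, no actual rows
create table dupe_users_empty as (select * from users) with no data;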
```

# How to Add a Column in PostgreSQL
Here's an example of adding a `created_at` timestamp column to your `users` table in PostgreSQL.

```sql
alter table users add column created_at timestamp without time zone;
```

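Adding a string (varchar) column with a not null constraint. If the table already has rows, this fails unless the column also gets a default, because the existing rows would otherwise be null: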

```sql
alter table users add column bio character varying not null;
```

Adding a boolean column with a default value:

```sql
alter table users add column active boolean default true;
```
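When you add a column with a default, the rows that already exist are filled in with that value. This is also how you can add a `not null` column to a table that isn't empty. A sketch on a hypothetical temporary table:

```sql
-- Hypothetical table that already has two rows
create temporary table add_col_demo (id integer);
insert into add_col_demo values (1), (2);

-- Existing rows are filled with the default
alter table add_col_demo add column active boolean default true;

-- A not null column can be added to a non-empty table when it has a default
alter table add_col_demo add column bio character varying not null default '';

select * from add_col_demo; -- both rows: active = t, bio = ''
```
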

For reference, check out the [full list of Postgres data types](https://www.postgresql.org/docs/current/static/datatype.html#DATATYPE-TABLE).

# PostgreSQL: Change a Column Name
Here's an example of renaming a column in PostgreSQL:

```sql
alter table users rename column registered_at to created_at;
```

# How to Add a Default Value to a Column in PostgreSQL
```sql
-- Example: Orders have a default total of 0 cents
alter table orders alter column total_cents set default 0;

-- Example: Items are available by default
alter table items alter column available set default true;
```
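Setting a default only affects rows inserted afterwards; existing rows keep their current values. A sketch with a hypothetical `orders_demo` temporary table:

```sql
-- Hypothetical table with an existing row
create temporary table orders_demo (id integer, total_cents integer);
insert into orders_demo (id) values (1);             -- total_cents is null

alter table orders_demo alter column total_cents set default 0;
insert into orders_demo (id) values (2);             -- total_cents is 0

-- Row 1 keeps its null; only row 2 got the default
select id, total_cents from orders_demo order by id;
```
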

# How to Remove a Default Value From a Column in PostgreSQL
Assuming `orders.total_cents` had a default value, this will drop the default for future inserts.

```sql
alter table orders alter column total_cents drop default;
```

# How to Add a Not Null Constraint in PostgreSQL
Not null constraints are a great way to add another layer of validation to your data. Sure, you could perform this validation in your application layer, but shit happens: somebody will forget to add the validation, somebody will remove it by accident, somebody will bypass validations in a console and insert nulls, etc. If you're validating nulls on the database layer as well, you're protected 💪

```sql
alter table users alter column email set not null;
```
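`set not null` fails if the column already contains nulls, so existing data usually needs a backfill first. A sketch on a hypothetical temporary table. The placeholder email is only an example; pick whatever backfill value makes sense for your data:

```sql
-- Hypothetical table where one row is missing an email
create temporary table not_null_demo (id integer, email text);
insert into not_null_demo values (1, 'a@example.com'), (2, null);

-- Setting not null now would fail because of row 2, so backfill first
update not_null_demo set email = 'unknown@example.com' where email is null;

alter table not_null_demo alter column email set not null;
```
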

# How to Remove a Not Null Constraint in PostgreSQL
```sql
alter table users alter column email drop not null;
```

# PostgreSQL: Add an Index
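Having the right indexes is critical to making your queries performant, especially when you have large amounts of data. Here's an example of how to create an index in PostgreSQL. The `concurrently` keyword builds the index without blocking writes to the table, but it can't be used inside a transaction block: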

```sql
create index concurrently "index_created_at_on_users"
on users using btree (created_at);
```

If you want to index multiple columns:

```sql
create index concurrently "index_user_id_and_time_on_events"
on events using btree (user_id, time);
```

Unique indexes to prevent duplicate data:

```sql
create unique index concurrently "index_stripe_event_id_on_stripe_events"
on stripe_events using btree(stripe_event_id);
```

Partial indexes to only index rows where a certain condition is met:

```sql
create index concurrently "index_active_users"
on users using btree(created_at) where active is true;
```

You can also have a _unique_ partial index. For example, imagine if each user can only have one active credit card:

```sql
-- This will prevent any user from having more than one active credit card
create unique index concurrently "index_active_credit_cards"
on credit_cards using btree(user_id) where active is true;
```
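Here's a sketch of the partial unique index in action. It uses a hypothetical temporary table and a plain `create index` without `concurrently`, which isn't needed for a throwaway table. The `do` block catches the expected `unique_violation` so the script keeps running:

```sql
-- Hypothetical table: at most one active card per user
create temporary table credit_cards_demo (user_id integer, active boolean);

create unique index credit_cards_demo_one_active
on credit_cards_demo using btree (user_id) where active is true;

-- Allowed: one active card plus any number of inactive ones
insert into credit_cards_demo values (1, true), (1, false), (1, false);

-- A second active card for user 1 violates the partial unique index
do $$
begin
  insert into credit_cards_demo values (1, true);
  raise notice 'unexpected: second active card was accepted';
exception when unique_violation then
  raise notice 'rejected: user 1 already has an active card';
end $$;
```
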

# How to Drop an Index in PostgreSQL
```sql
drop index index_created_at_on_users;
```

# How to Create a View in PostgreSQL
```sql
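create or replace view enriched_users as (
  -- select * fails if both tables share a column name (such as id);
  -- list the columns explicitly in that case
  select *
  from users
  inner join enrichments on enrichments.user_id = users.id
);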
```

# How to Drop a View in PostgreSQL
```sql
drop view enriched_users;
```

# PostgreSQL: Reset Sequence Command
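If you have a serial ID column (i.e. an auto-incrementing ID), its values start at 1 by default, but sometimes you may want them to start at a different number. These numbers come from a "sequence", a special single-row table-like object that hands out the next value.

If you have a `users.id` serial column, you'll have a `users_id_seq` sequence. Some helpful columns are `start_value`, which will usually be `1`, and `last_value`. The latter _could_ be a fast way to see how many rows are in your table, but only if you haven't altered your sequence, deleted any rows, or had any failed inserts.

The output below comes from an older PostgreSQL version. Since PostgreSQL 10, selecting from the sequence itself returns only `last_value`, `log_cnt`, and `is_called`. Settings like `start_value` live in the `pg_sequences` view.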

```sql
select * from users_id_seq;
```

```sql
 sequence_name | last_value | start_value | increment_by |      max_value      | min_value | cache_value | log_cnt | is_cycled | is_called
---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
 users_id_seq  |          1 |           1 |            1 | 9223372036854775807 |         1 |           1 |      32 | f         | t
(1 row)
```

To alter the sequence so that IDs start at a different number, you can't just do an `update`; you have to use the `alter sequence` command.

```sql
alter sequence users_id_seq restart with 1000;
```
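Here's a self-contained sketch on a temporary table (`seq_demo` is hypothetical). It also uses `pg_get_serial_sequence()`, which returns the name of the sequence behind a serial column, so you don't have to guess it:

```sql
-- Hypothetical table; serial creates a sequence named seq_demo_id_seq
create temporary table seq_demo (id serial primary key, name text);
insert into seq_demo (name) values ('first');    -- id 1

-- Look up the sequence name instead of guessing it
select pg_get_serial_sequence('seq_demo', 'id');

alter sequence seq_demo_id_seq restart with 1000;
insert into seq_demo (name) values ('second');   -- id 1000
```
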

# How to Exclude Current or Partial Weeks in PostgreSQL
Let's say you have a simple query that [groups by week](https://popsql.com/learn-sql/postgresql/how-to-group-by-time-in-postgresql/) and looks back at the last 4 weeks:

```sql
select
  date_trunc('week', created_at), -- or hour, day, month, year
  count(1)
from users
where created_at > now() - interval '4 weeks'
group by 1;
```

If you ran this query midweek, say on a Wednesday, the current week would only return data from Monday through Wednesday (~3 days). Given the incomplete week, the last data point would look artificially low:

![query with dip](//images.ctfassets.net/iv1sg9nibjwl/2ESGPfVAGarGbPzzOf0pWd/9395c5666129f966596dd0702769a852/query_with_dip.png)

To avoid this dip (and the inevitable questions from your manager), use the `date_trunc()` function in the `where` clause:

```sql
select
  date_trunc('week', created_at),
  count(1)
from users
where date_trunc('week', created_at) != date_trunc('week', now())
and created_at > now() - interval '4 weeks'
group by 1;
```

You now omit any data from the current incomplete week, there's no more dip:

![query without dip](//images.ctfassets.net/iv1sg9nibjwl/58OFuSoEfNEcKNlBo94rEZ/2c6736c87246f44af155aecc2ea4e230/query_without_dip.png)

There's one more problem. If you ran this query mid-week, the starting point of your "look back period" would be in the middle of the week 4 weeks ago. To guard against incomplete weeks in the _beginning_ of your time range, `date_trunc()` can help again:

```sql
select
  date_trunc('week', created_at),
  count(1)
from users
where date_trunc('week', created_at) != date_trunc('week', now())
and created_at >= date_trunc('week', now()) - interval '4 weeks'
group by 1;
```
Instead of looking back 4 weeks from `now()`, your query looks back 4 weeks from the _beginning of the current week_. See the difference below:

```sql
select now(); -- Result: 2020-02-05 19:38:26.423589+00
select date_trunc('week',now()); -- Result: 2020-02-03 00:00:00+00
```
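To see the exact boundaries the last query uses, you can evaluate the same expressions against the fixed timestamp from the example instead of `now()`:

```sql
select
  date_trunc('week', timestamp '2020-02-05 19:38:26')                      as current_week_start, -- 2020-02-03 00:00:00
  date_trunc('week', timestamp '2020-02-05 19:38:26') - interval '4 weeks' as look_back_start;    -- 2020-01-06 00:00:00
```

So the look-back window covers the four full weeks starting Monday, January 6, and the current, incomplete week is excluded.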

# How to Use BETWEEN Correctly in PostgreSQL
Be careful when using `BETWEEN` with timestamps. You might end up chopping off a whole day of data 😬

Imagine you were chief safety inspector at a local trampoline park (bonus points if that is your job in real life). You might write a query like this to get a report of accidents in December:

```sql
SELECT *
FROM accidents
WHERE created_at BETWEEN '2019-12-01' AND '2019-12-31'
```

Looks good, right? Nope.

This query would omit almost every mishap on December 31. Why? **Your query only looks from midnight** on Dec 1 to midnight on Dec 31, because a date compared with a timestamp means the very start of that day. Any bump, abrasion, or mid-air collision that occurred after midnight on the 31st won't be in your results. The query above is the same as:

```sql
SELECT *
FROM accidents
WHERE created_at >= '2019-12-01 00:00:00.000000'
AND created_at <= '2019-12-31 00:00:00.000000'
```

You can avoid this problem by writing the query:

```sql
SELECT *
FROM accidents
WHERE created_at >= '2019-12-01'
AND created_at < '2020-01-01'
```
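A quick way to convince yourself is to test a single made-up timestamp, noon on December 31, against both conditions:

```sql
select
  timestamp '2019-12-31 12:00' between '2019-12-01' and '2019-12-31' as between_catches_it,   -- f
  timestamp '2019-12-31 12:00' >= '2019-12-01'
    and timestamp '2019-12-31 12:00' < '2020-01-01'                  as half_open_catches_it; -- t
```
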

The lesson: save `BETWEEN` for discrete quantities like integers. And stay away from trampoline parks. We’ve seen the data and it doesn't look pretty.

# How to Query Date and Time in PostgreSQL
Get the date and time right now:

```sql
select now(); -- date and time
select current_date; -- date
select current_time; -- time
```

Find rows between two absolute timestamps:

```sql
select count(1)
from events
where time >= '2018-01-01' and time < '2018-02-01' -- all of January, including Jan 31
```

Find rows created within the last week:

```sql
select count(1)
from events
where time > now() - interval '1 week'; -- or '1 week'::interval, as you like
```

Find rows created between one and two weeks ago:

```sql
select count(1)
from events
where time between (now() - '2 weeks'::interval) and (now() - '1 week'::interval); -- BETWEEN needs the earlier bound first
```

Extracting part of a timestamp:

```sql
select date_part('minute', now()); -- or hour, day, month
```

Get the day of the week from a timestamp:

```sql
-- returns 0-6 (integer), where 0 is Sunday and 6 is Saturday
select date_part('dow', now());

-- returns a lowercase name like 'monday', blank-padded to 9 characters ('FMday' drops the padding)
select to_char(now(), 'day');
```
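
Here's a sketch with a fixed Monday so the results are predictable. `isodow` is a related field that numbers Monday as 1 and Sunday as 7. The `FM` prefix removes `to_char`'s padding.

```sql
-- 2020-02-03 was a Monday
with d as (
  select timestamp '2020-02-03 12:00:00' as ts
)
select
  date_part('dow', ts)    as dow,          -- 1 (0 = Sunday, 6 = Saturday)
  date_part('isodow', ts) as isodow,       -- 1 (1 = Monday, 7 = Sunday)
  to_char(ts, 'day')      as padded_name,  -- 'monday   ' (blank-padded to 9 characters)
  to_char(ts, 'FMday')    as day_name      -- 'monday'
from d;
```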

Converting a timestamp to a unix timestamp (integer seconds):

```sql
select date_part('epoch', now());
```

Calculate the difference between two timestamps:

```sql
-- Difference in seconds (shipments is a placeholder for your own table)
select date_part('epoch', delivered_at) - date_part('epoch', shipped_at)
from shipments; -- divide by 60 for minutes, 3600 for hours, 86400 for days

-- Alternatively, you can do this with `extract`
select extract(epoch from delivered_at) - extract(epoch from shipped_at)
from shipments;
```
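
You can also subtract the timestamps directly to get an `interval`, then extract its epoch. Here's a sketch with one made-up shipment (the `shipment` CTE is hypothetical):

```sql
-- Hypothetical shipment: shipped at midnight, delivered 36 hours later
with shipment as (
  select
    timestamp '2020-01-01 00:00:00' as shipped_at,
    timestamp '2020-01-02 12:00:00' as delivered_at
)
select
  delivered_at - shipped_at as elapsed,                          -- interval '1 day 12:00:00'
  extract(epoch from delivered_at - shipped_at) as seconds,      -- 129600 seconds
  extract(epoch from delivered_at - shipped_at) / 3600 as hours  -- 36 hours
from shipment;
```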

# How to Group by Time in PostgreSQL
When you want to group by minute, hour, day, week, etc., it's tempting to just group by your timestamp column. However, you'll then get one group per distinct timestamp value (PostgreSQL stores timestamps down to the microsecond), which is likely not what you want. Instead, you need to "truncate" your timestamp to the granularity you want, like minute, hour, day, week, etc. The PostgreSQL function you need here is `date_trunc`.

```sql
select
  date_trunc('minute', created_at), -- or hour, day, week, month, year
  count(1)
from users
group by 1
```
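
Here's a sketch with three made-up sign-ups inlined as a `users` CTE, so you can see the truncated groups. Note there's no row at all for 10:00, which is exactly the gap problem described next.

```sql
-- Hypothetical sign-ups standing in for the users table
with users (id, created_at) as (
  values
    (1, timestamp '2020-01-01 09:05:00'),
    (2, timestamp '2020-01-01 09:47:12'),
    (3, timestamp '2020-01-01 11:30:00')
)
select
  date_trunc('hour', created_at) as hour,
  count(1) as signups
from users
group by 1
order by 1;
-- 2020-01-01 09:00:00 | 2
-- 2020-01-01 11:00:00 | 1
```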

If you don't have new users every minute, you're going to have gaps in your data. To have one row per minute, even when there's no data, you'll want to use [generate\_series](https://popsql.com/learn-sql/postgresql/how-to-use-generate-series-to-avoid-gaps-in-data-in-postgresql/).

# How to Round Timestamps in PostgreSQL
Rounding or truncating timestamps is especially useful when you're [grouping by time](https://popsql.com/learn-sql/postgresql/how-to-group-by-time-in-postgresql/). The function you need here is `date_trunc`, which truncates (always rounds down) to the unit you specify:

```sql
select date_trunc('second', now()) -- or minute, hour, day, month
```
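
Since `date_trunc` only rounds down, you need an extra step to round to the *nearest* hour. One common approach is to add half an hour before truncating. The sketch below uses two made-up timestamps; it's one option, not the only one.

```sql
-- date_trunc always rounds down; adding half the unit first rounds to the nearest unit
select
  ts,
  date_trunc('hour', ts) as truncated,
  date_trunc('hour', ts + interval '30 minutes') as nearest_hour
from (values
  (timestamp '2020-01-01 10:29:00'),
  (timestamp '2020-01-01 10:31:00')
) as t(ts);
-- 10:29 -> truncated 10:00, nearest 10:00
-- 10:31 -> truncated 10:00, nearest 11:00
```

With this approach, a timestamp at exactly half past (10:30) rounds up.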

# How to Use nullif() in PostgreSQL
The `nullif()` function returns null if its first argument equals its second. Otherwise, it returns the first argument unchanged. Here's an example below:

```sql
select
  name,
  platform,
  nullif(platform,'Did not specify') as platform_mod
from users;
```

```sql
   name    |    platform     | platform_mod
-----------+-----------------+------------
 Steve     | Mac             | Mac
 Bill      | Windows         | Windows
 Linus     | Linux           | Linux
 Beth      | Did not specify |
```

Note that `nullif()` is only capable of replacing one value with null. If you need to replace multiple values, you can use a [CASE expression](https://popsql.com/learn-sql/postgresql/how-to-write-a-case-statement-in-postgresql/).

```sql
select
  name,
  platform,
  case
    when platform = 'Mac' then null
    when platform = 'Windows' then null
    when platform = 'Linux' then null
    else platform
  end as platform_mod
from users;
```

```sql
   name    |    platform     | platform_mod
-----------+-----------------+------------
 Steve     | Mac             |
 Bill      | Windows         |
 Linus     | Linux           |
 Beth      | Did not specify | Did not specify
```
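
Another common use of `nullif()` is guarding against division by zero. A zero divisor becomes null, so the division returns null instead of failing. Here's a sketch with made-up `orders` and `visits` numbers:

```sql
-- nullif(visits, 0) turns a zero divisor into null, so the division yields null instead of an error
select
  orders,
  visits,
  orders::numeric / nullif(visits, 0) as conversion_rate
from (values (5, 100), (3, 0)) as t(orders, visits);
-- 5 / 100 -> 0.05
-- 3 / 0   -> null (instead of "division by zero")
```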

# How to Use Lateral Joins in PostgreSQL
Once upon a time, my queries were a mess. I didn’t know how to use lateral joins, so I would copy-and-paste the same calculations over and over again in my queries.

_Co-workers were starting to talk._

Lateral joins allow you to reuse calculations, making your queries neat and legible. Let's learn about lateral joins by rewriting an atrocious query together.

## Data Set

We'll use a [cool sample dataset](https://popsql.com/blog/cool-sample-data) of real Kickstarter projects, if you'd like to follow along.

Relevant columns:

![data table](//images.ctfassets.net/iv1sg9nibjwl/3F0Cn2OhTutbvN29thKDRS/6462da6f00b96d6a7e2e213eac6566d8/data_table.png)

For each Kickstarter project, we want to calculate:

*   total pledged in USD
*   average pledge in USD
*   USD over or under goal
*   duration of the project in days
*   daily shortfall / surplus, _the extra USD needed daily to hit goal_

## Queries, Before and After

**Before:**

```sql
select
    (pledged / fx_rate) as pledged_usd,
    (pledged / fx_rate) / backers_count as avg_pledge_usd,
    (goal / fx_rate) - (pledged / fx_rate) as amt_from_goal,
    (deadline - launched_at) / 86400.00 as duration,
    ((goal / fx_rate) - (pledged / fx_rate)) / ((deadline - launched_at) / 86400.00) as usd_needed_daily
from kickstarter_data;
```

Without lateral joins, see how often I reuse the same calculations:

![repetitive computations](//images.ctfassets.net/iv1sg9nibjwl/5tiBwCsD5VfW8oB0woKuEO/92b5175fb600e9cf83e233286ef29022/repetitive_computations.png)

Yuck. Not only does this make the query difficult to read, it introduces risk of typos or other errors if I ever need to make an update.

**After:**

```sql
select
    pledged_usd,
    avg_pledge_usd,
    amt_from_goal,
    duration,
    (usd_from_goal / duration) as usd_needed_daily
from kickstarter_data,
    lateral (select pledged / fx_rate as pledged_usd) pu
    lateral (select pledged_usd / backers_count as avg_pledge_usd) apu
    lateral (select goal / fx_rate as goal_usd) gu
    lateral (select goal_usd - pledged_usd as usd_from_goal) ufg
    lateral (select (deadline - launched_at)/86400.00 as duration) dr;
```

With lateral joins, I can define the calculation just once. I can then reference those calculations in other parts of my query.
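
To try the lateral approach without the Kickstarter dataset, here's a sketch that inlines two made-up projects using the same column names. Like the original, it assumes `deadline` and `launched_at` are Unix timestamps in seconds. Each lateral subquery can use columns from anything to its left in the `FROM` list. That's why `apu` can divide `pledged_usd`, which `pu` defines.

```sql
-- Two made-up projects; deadline and launched_at are Unix timestamps in seconds
with kickstarter_data (pledged, goal, fx_rate, backers_count, launched_at, deadline) as (
  values
    (5000.00, 10000.00, 1.0, 50, 0, 864000),    -- 10-day campaign
    (9000.00,  6000.00, 0.5, 100, 0, 2592000)   -- 30-day campaign in a non-USD currency
)
select
    pledged_usd,
    avg_pledge_usd,
    usd_from_goal,
    duration,
    usd_from_goal / duration as usd_needed_daily  -- 500 (shortfall) and -200 (already past goal)
from kickstarter_data,
    lateral (select pledged / fx_rate as pledged_usd) pu,
    lateral (select pledged_usd / backers_count as avg_pledge_usd) apu,
    lateral (select goal / fx_rate as goal_usd) gu,
    lateral (select goal_usd - pledged_usd as usd_from_goal) ufg,
    lateral (select (deadline - launched_at) / 86400.00 as duration) dr;
```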

## What's happening?

The `lateral` keyword lets a subquery in the `FROM` clause reference columns from the items listed before it (here, `kickstarter_data` and the earlier lateral subqueries). The columns those subqueries define can then be referenced "earlier" in the query, in the `SELECT` list ("earlier" meaning "written higher in the query").

SQL queries run in a different order than you might expect. In fact, `FROM` and `JOIN` are the first clauses evaluated. Therefore it's no problem for `SELECT` to reference columns that are defined in the `FROM` clause.

![query order](//images.ctfassets.net/iv1sg9nibjwl/1Ll3rLDmU4J36jRbTlQfxg/eea51ce12cf5cb5ece843886a89e18fa/query_order.png)

Image Credit: [Julia Evans](https://twitter.com/b0rk/status/1179449535938076673)

Word of warning: stick to simple mathematical operations when writing lateral joins for calculations. PostgreSQL rejects aggregate functions like `COUNT()`, `AVG()`, or `SUM()` when they're applied to the outer row's columns inside lateral subqueries like these.

Happy querying! 🍭

# How to Calculate Percentiles in PostgreSQL
Let's say we want to look at the percentiles for query durations. We can use PostgreSQL's `percentile_cont` function to do that:

```sql
select
  percentile_cont(0.25) within group (order by duration asc) as percentile_25,
  percentile_cont(0.50) within group (order by duration asc) as percentile_50,
  percentile_cont(0.75) within group (order by duration asc) as percentile_75,
  percentile_cont(0.95) within group (order by duration asc) as percentile_95
from query_durations
```

If we want to view those percentiles by day, we can group by the day. `percentile_cont` is an ordered-set aggregate, so it works with `group by` but can't be used as a window function with `over (partition by ...)`:

```sql
select
  day,
  percentile_cont(0.25) within group (order by duration asc) as percentile_25,
  percentile_cont(0.50) within group (order by duration asc) as percentile_50,
  percentile_cont(0.75) within group (order by duration asc) as percentile_75,
  percentile_cont(0.95) within group (order by duration asc) as percentile_95
from query_durations
group by 1
order by 1 asc
```
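
With a handful of made-up durations, you can check the interpolation by hand. `percentile_cont` interpolates between neighboring values. Its sibling `percentile_disc` instead returns an actual value from the data.

```sql
-- Made-up durations (in seconds) for two days
with query_durations (day, duration) as (
  values
    (date '2020-01-01', 1), (date '2020-01-01', 2),
    (date '2020-01-01', 3), (date '2020-01-01', 4),
    (date '2020-01-02', 10), (date '2020-01-02', 20)
)
select
  day,
  percentile_cont(0.50) within group (order by duration) as percentile_50,  -- 2.5 and 15 (interpolated)
  percentile_disc(0.50) within group (order by duration) as median_disc     -- 2 and 10 (actual values)
from query_durations
group by 1
order by 1;
```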

# How to Get the First Row per Group in PostgreSQL
Let's say we have an `events` table that belongs to a `user_id`, and we want to see the first event for each user for that day. The function we need here is `row_number`. It's got a tricky syntax that I always forget. Here's an example PostgreSQL query:

```sql
select
  *,
  row_number() over (partition by user_id order by created_at asc) as row_number
from events
where day = '2018-01-01'::date
```

This gives us all the events for the day, plus their `row_number`. Ordering by `created_at asc` numbers each user's earliest event as 1. Since we only want the first event for each user, we only want rows where `row_number = 1`. To do that, we can use a [common table expression](https://popsql.com/learn-sql/postgresql/how-to-write-a-common-table-expression-in-postgresql/):

```sql
with _events as (
  select
    *,
    row_number() over (partition by user_id order by created_at asc) as row_number
  from events
  where day = '2018-01-01'::date
)

select *
from _events
where row_number = 1
```
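
PostgreSQL also has a non-standard shortcut for this pattern: `distinct on`. It keeps the first row of each group according to the `order by`, whose leading expressions must match the `distinct on` list. Here's a sketch with made-up events inlined as an `events` CTE:

```sql
-- Made-up events for two users on the same day
with events (id, user_id, day, created_at) as (
  values
    (1, 10, date '2018-01-01', timestamp '2018-01-01 08:00:00'),
    (2, 10, date '2018-01-01', timestamp '2018-01-01 09:00:00'),
    (3, 20, date '2018-01-01', timestamp '2018-01-01 07:30:00'),
    (4, 20, date '2018-01-01', timestamp '2018-01-01 12:00:00')
)
select distinct on (user_id) *
from events
where day = '2018-01-01'::date
order by user_id, created_at asc;
-- returns event 1 for user 10 and event 3 for user 20
```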

# How to Use generate_series to Avoid Gaps In Data in PostgreSQL
If you're [grouping by time](https://popsql.com/learn-sql/postgresql/how-to-group-by-time-in-postgresql/) and you don't want any gaps in your data, PostgreSQL's `generate_series` can help. The function wants three arguments: `start`, `stop`, and `interval`:

```sql
select generate_series(
  date_trunc('hour', now()) - '1 day'::interval, -- start at one day ago, rounded to the hour
  date_trunc('hour', now()), -- stop at now, rounded to the hour
  '1 hour'::interval -- one hour intervals
) as hour
```

```sql
          hour
------------------------
 2017-12-22 13:00:00-08
 2017-12-22 14:00:00-08
 2017-12-22 15:00:00-08
 2017-12-22 16:00:00-08
 2017-12-22 17:00:00-08
 ...
```

Now you can use a [common table expression](https://popsql.com/learn-sql/postgresql/how-to-write-a-common-table-expression-in-postgresql/) to create a table that has a row for each interval (i.e., each hour of the day), and then left join that with your time series data (i.e., new user sign-ups per hour).

```sql
with hours as (
  select generate_series(
    date_trunc('hour', now()) - '1 day'::interval,
    date_trunc('hour', now()),
    '1 hour'::interval
  ) as hour
)

select
  hours.hour,
  count(users.id)
from hours
left join users on date_trunc('hour', users.created_at) = hours.hour
group by 1
order by 1
```
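
Because `now()` makes the output shift with every run, this sketch pins the window to four hours on a made-up date. It also inlines three hypothetical sign-ups in place of the `users` table, so you can see the empty hours come back as zeros:

```sql
-- A fixed 4-hour window and made-up sign-ups, so the output doesn't depend on now()
with hours as (
  select generate_series(
    timestamp '2020-01-01 09:00:00',
    timestamp '2020-01-01 12:00:00',
    '1 hour'::interval
  ) as hour
),
users (id, created_at) as (
  values
    (1, timestamp '2020-01-01 09:05:00'),
    (2, timestamp '2020-01-01 09:47:00'),
    (3, timestamp '2020-01-01 11:30:00')
)
select
  hours.hour,
  count(users.id) as signups
from hours
left join users on date_trunc('hour', users.created_at) = hours.hour
group by 1
order by 1;
-- 09:00 -> 2, 10:00 -> 0, 11:00 -> 1, 12:00 -> 0
```

`count(users.id)` counts only non-null values, which is what turns the unmatched hours into 0 rather than 1.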

# How to Do Type Casting in PostgreSQL
Here are some examples of common type casts in PostgreSQL:

```sql
-- Cast text to boolean
select 'true'::boolean;

-- Cast float to integer
select 1.0::integer;

-- Cast text to float
select '3.33'::float;

-- Cast integer to float
select 3::float;
select 10/3.0; -- Returns numeric rather than float; 10/3 (two integers) returns 3

-- Cast text to integer
select '1'::integer;

-- Cast text to timestamp
select '2018-01-01 09:00:00'::timestamp;

-- Cast text to date
select '2018-01-01'::date;

-- Cast text to interval
select '1 minute'::interval;
select '1 hour'::interval;
select '1 day'::interval;
select '1 week'::interval;
select '1 month'::interval;
```
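
Division is where casting most often surprises people. Here's a quick sketch you can run as-is; `pg_typeof()` shows which type PostgreSQL picked:

```sql
select
  10 / 3              as int_division,      -- 3: integer / integer truncates
  10 / 3.0            as numeric_division,  -- about 3.33, type numeric
  10::float / 3       as float_division,    -- about 3.33, type double precision
  pg_typeof(10 / 3.0) as result_type,       -- numeric
  1.7::integer        as rounded;           -- 2: casting to integer rounds rather than truncates
```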

# How to Write a Common Table Expression in PostgreSQL
Common table expressions (CTEs) are a great way to break up complex PostgreSQL queries. Here's a simple query to illustrate how to write a CTE:

```sql
with beta_users as (
  select *
  from users
  where beta is true
)

select events.*
from events
inner join beta_users on beta_users.id = events.user_id
```
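
A query can define several CTEs, separated by commas, and each one can refer to the ones defined before it. Here's a sketch with made-up `users` and `events` rows inlined as CTEs, so it runs without any tables:

```sql
-- Made-up users and events; each CTE can build on the ones defined before it
with users (id, beta) as (
  values (1, true), (2, false)
),
events (id, user_id) as (
  values (100, 1), (101, 1), (102, 2)
),
beta_users as (
  select * from users where beta is true
)
select events.*
from events
inner join beta_users on beta_users.id = events.user_id
order by events.id;
-- returns events 100 and 101
```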

You can find a real world example of a CTE in [How to Calculate Cumulative Sum/Running Total in PostgreSQL](https://popsql.com/learn-sql/postgresql/how-to-calculate-cumulative-sum-running-total-in-postgresql).

# How to Import a CSV into PostgreSQL using Copy
Importing a CSV into PostgreSQL requires you to [create a table](https://popsql.com/learn-sql/postgresql/how-to-create-a-table-in-postgresql/) first. [Duplicating an existing table's structure](https://popsql.com/learn-sql/postgresql/how-to-duplicate-a-table-in-postgresql/) might be helpful here too.

The commands you need here are `copy` (executed server side) or `\copy` (executed client side). `copy` reads the file from the database server's own filesystem. That rarely works in a production environment like Amazon RDS, because you're not going to be uploading random CSV files to your database server. However, if you run `psql` on your local machine while connected to your remote database, you can use `\copy`, which reads the CSV file from your machine and sends its contents to the server.

In your terminal, let's open `psql`:

```shell
psql your_database_name # or postgres://username:password@amazonaws.com
```

Now it's time to use the `\copy` command:

```sql
-- Assuming you have already created an imported_users table
-- Assuming your CSV has no headers
\copy imported_users from 'imported_users.csv' csv;

-- If your CSV does have headers, they need to match the columns in your table
\copy imported_users from 'imported_users.csv' csv header;

-- If you want to only import certain columns
\copy imported_users (id, email) from 'imported_users.csv' csv header;
```

# How to Compare Two Values When One Is Null in PostgreSQL
Imagine you're comparing two PostgreSQL columns and you want to know how many rows are different. No problem, you think:

```sql
select count(1)
from items
where width != height;
```

Not so fast. If some of the widths or heights are null, they won't be counted! Surely that wasn't your intention. That's where `is distinct from` comes into play:

```sql
select count(1)
from items
where width is distinct from height;
```

Now, your count will be "null aware" and you'll get the result you want 💥
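
To see the difference, here's a small, made-up `items` set with one genuine mismatch, one match, and two rows involving nulls:

```sql
-- Made-up items: (1,2) differ, (3,3) match, and two rows involve nulls
with items (width, height) as (
  values (1, 2), (3, 3), (4, null), (null, null)
)
select
  count(1) filter (where width != height) as not_equal,                   -- 1
  count(1) filter (where width is distinct from height) as distinct_from  -- 2
from items;
```

`!=` yields null (not true) whenever either side is null, so the `(4, null)` row is silently skipped. `is distinct from` counts it as different, and it treats `(null, null)` as equal.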

# How to Use Coalesce in PostgreSQL
Say you're looking at a PostgreSQL integer column where some rows are null:

```sql
select
  day,
  tickets
from stats;
```

```sql
    day     | tickets
------------+-------
 2018-01-01 |     1
 2018-01-02 |   null
 2018-01-03 |     3
```

Instead of having that null, you might want that row to be `0`. To do that, use the `coalesce` function, which returns the first non-null argument it's passed:

```sql
select
  day,
  coalesce(tickets, 0) as tickets
from stats;
```

```sql
    day     | tickets
------------+-------
 2018-01-01 |     1
 2018-01-02 |     0
 2018-01-03 |     3
```

# How to Write a Case Statement in PostgreSQL
Case statements are useful when you're reaching for an if statement in your select clause.

```sql
select
 case
 when precipitation = 0 then 'none'
 when precipitation <= 5 then 'little'
 when precipitation > 5 then 'lots'
 else 'unknown'
 end as amount_of_rain
from weather_data;
```
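
Here's the same expression run against a few made-up readings, including a null, so you can see which branch each row takes. The `when` clauses are checked top to bottom, and the first match wins. A null matches none of them, so it falls through to `else`.

```sql
-- Made-up precipitation readings, including a missing one
with weather_data (precipitation) as (
  values (0), (3), (12), (null)
)
select
  precipitation,
  case
    when precipitation = 0 then 'none'
    when precipitation <= 5 then 'little'
    when precipitation > 5 then 'lots'
    else 'unknown'
  end as amount_of_rain
from weather_data;
-- 0 -> none, 3 -> little, 12 -> lots, null -> unknown
```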

# How to Use Filter to Have Multiple Counts in PostgreSQL
Using `filter` is useful when you want to do multiple counts on a table:

```sql
select
  count(1), -- Count all users
  count(1) filter (where gender = 'male'), -- Count male users
  count(1) filter (where beta is true), -- Count beta users
  count(1) filter (where active is true and beta is false) -- Count active non-beta users
from users
```
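
Here's a sketch with four made-up users inlined as a CTE, so you can check each count by hand. On a database without `filter`, `count(case when ... then 1 end)` gives the same numbers.

```sql
-- Made-up users standing in for the users table
with users (gender, beta, active) as (
  values
    ('male', true, true),
    ('female', false, true),
    ('female', false, false),
    ('male', false, true)
)
select
  count(1) as all_users,                                                       -- 4
  count(1) filter (where gender = 'male') as male_users,                       -- 2
  count(1) filter (where beta is true) as beta_users,                          -- 1
  count(1) filter (where active is true and beta is false) as active_non_beta  -- 2
from users;
```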

# How to Calculate Cumulative Sum-Running Total in PostgreSQL
Let's say we want to see a hockey stick graph of our cumulative user sign ups by day in PostgreSQL. First, we'll need a table with a day column and a count column:

```sql
select
  date_trunc('day', created_at) as day,
  count(1)
from users
group by 1
```

```sql
         day         | count
---------------------+-------
 2018-01-01 00:00:00 |     10
 2018-01-02 00:00:00 |     10
 2018-01-03 00:00:00 |     10
```

Next, we'll write a [PostgreSQL common table expression (CTE)](https://popsql.com/learn-sql/postgresql/how-to-write-a-common-table-expression-in-postgresql/) and use a window function to keep track of the cumulative sum/running total:

```sql
with data as (
  select
    date_trunc('day', created_at) as day,
    count(1)
  from users
  group by 1
)

select
  day,
  sum(count) over (order by day asc rows between unbounded preceding and current row)
from data
order by day
```
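
You can reproduce the running total without a `users` table by inlining the three sample rows shown above. In this sketch, only the `data` CTE differs from the query above:

```sql
-- The sample output from above, inlined in place of the users aggregation
with data (day, count) as (
  values
    (date '2018-01-01', 10),
    (date '2018-01-02', 10),
    (date '2018-01-03', 10)
)
select
  day,
  sum(count) over (order by day asc rows between unbounded preceding and current row) as running_total
from data
order by day;
-- running_total: 10, 20, 30
```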

# How to query a JSON column in PostgreSQL

## Querying JSON in PostgreSQL

PostgreSQL is a powerful relational database management system. One of its standout features is its ability to handle unstructured data by allowing you to store it in a JSON column. This means you can enjoy the benefits of a structured relational database while leveraging the flexibility of JSON for certain kinds of data. Below are some common ways to query a JSON column.

## Using PostgreSQL JSON to query a column

### Retrieving a Specific JSON Key as Text

If you have a table named `events` and you want to retrieve the value associated with the key `name` from the JSON column `params`, you can use the following query:

```sql
SELECT params->>'name' FROM events;
```

This will return the value of `params.name` as text from the `events` table.

### Filtering rows based on a specific JSON key value

If you want to find all events with a specific name, for instance, 'Click Button', you can use:

```sql
SELECT * FROM events WHERE params->>'name' = 'Click Button';
```

This will return all rows from the `events` table where the `name` key in the `params` JSON column has the value 'Click Button'.

### Accessing an element from a JSON array

If your JSON column contains arrays and you want to retrieve the first element (index 0) of the array associated with the key `ids` from the `params` column, you can use:

```sql
SELECT params->'ids'->0 FROM events;
```

This will return the first element of the `ids` array from the `params` column in the `events` table.

### Filtering rows based on a nested JSON key

Sometimes, your JSON might have nested structures. For instance, if you have a table named `users` with a JSON column `preferences` and you want to find users where the nested key `beta` is set to true, you can use:

```sql
SELECT preferences->'beta' FROM users WHERE (preferences->>'beta')::boolean IS TRUE;
```

This query extracts `preferences.beta` as text with `->>`, casts that text to boolean, and then keeps only the rows where it's true.
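
In the example above, `beta` actually sits at the top level of `preferences`. For a key that's genuinely nested, you can either chain operators or pass a path to `#>>`. Here's a sketch with made-up documents (the `notifications.email` key is an assumption for illustration):

```sql
-- Made-up users; notifications.email is a hypothetical nested key
with users (id, preferences) as (
  values
    (1, '{"beta": true, "notifications": {"email": true}}'::json),
    (2, '{"beta": false, "notifications": {"email": false}}'::json)
)
select
  id,
  preferences->'notifications'->>'email' as email_chained,  -- chain -> (JSON) then ->> (text)
  preferences #>> '{notifications,email}' as email_by_path  -- same value, addressed by a path array
from users
where (preferences #>> '{notifications,email}')::boolean is true;
-- returns only user 1, with 'true' in both columns
```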

### Querying a JSONb column in PostgreSQL

In PostgreSQL, `jsonb` is a data type used to store JSON (JavaScript Object Notation) data in a more efficient, decomposed binary format; jsonb stands for **JSON binary**. It accepts the same input as the standard `json` type, but it provides several advantages over it, especially when it comes to querying and indexing JSON data.

You can query `jsonb` columns using various JSON functions and operators provided by PostgreSQL. Some commonly used functions and operators include:

* `->`: Extracts a JSON element by key or array index.
* `->>`: Extracts a JSON element as text.
* `#>`: Extracts a JSON sub-object at a specified path.
* `#>>`: Extracts a JSON sub-object as text.
* `@>`: Checks if a JSON document contains another JSON document.
* `<@`: Checks if a JSON document is contained within another JSON document.
* `jsonb_array_elements()`: Expands a JSON array into a set of rows.
* `jsonb_each()`: Expands a JSON object into key-value pairs.

Suppose you have a table called `employees` with a `jsonb` column named `employee_data`.

```sql
-- Extract the employee's name
SELECT employee_data->>'name' AS employee_name
FROM employees;

-- Check if the employee has a skill in "Sales"
SELECT *
FROM employees
WHERE employee_data->'skills' @> '["Sales"]';

-- Find employees in the "Marketing" department
SELECT *
FROM employees
WHERE employee_data->>'department' = 'Marketing';
```
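
To try these operators without an `employees` table, here's a sketch with two made-up documents inlined. It also uses `jsonb_array_elements_text()`, the text-returning sibling of `jsonb_array_elements()`, to turn the `skills` array into one row per skill:

```sql
-- Made-up employees; the employee_data documents are assumptions for illustration
with employees (id, employee_data) as (
  values
    (1, '{"name": "Ana", "department": "Marketing", "skills": ["SEO", "Sales"]}'::jsonb),
    (2, '{"name": "Raj", "department": "Engineering", "skills": ["SQL"]}'::jsonb)
)
select
  employee_data->>'name' as employee_name,
  jsonb_array_elements_text(employee_data->'skills') as skill
from employees
where employee_data @> '{"department": "Marketing"}';
-- Ana | SEO
-- Ana | Sales
```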

`jsonb` is a powerful tool for working with JSON data in PostgreSQL, especially when you need to query and manipulate complex JSON structures.

## Fixing issues in querying JSON columns

Troubleshooting JSON column querying in PostgreSQL can involve identifying and addressing issues related to data integrity, query performance, and syntax errors. It often takes a combination of SQL knowledge, an understanding of JSON data structures, and careful query optimization. Here are some common issues to watch out for when querying JSON columns, and how to address them.

### Nested JSON structures

As shown in the tutorial, querying JSON columns is fairly straightforward. Nested JSON structures, however, take a bit more care. Use the appropriate operators and functions to navigate nested JSON objects and arrays: operators like `->`, `->>`, `#>`, and `#>>` access nested elements. The `->` operator returns a JSON value (an object, array, or scalar), while `->>` returns the value as text. By chaining these operators, you can navigate through nested JSON structures to retrieve the desired information.

### Incorrect JSON path

It sounds obvious, but specifying the wrong JSON path is a common cause of incorrect output or failed queries. Note that `->` and `->>` return NULL rather than raising an error when a key doesn't exist, so a misspelled key often fails silently. Errors such as `cannot extract elements from a scalar` appear when a function like `jsonb_array_elements()` receives a scalar value instead of an array. Double-check the JSON path you're using in your queries, especially when dealing with nested structures. Use tools like JSON viewers to visualize the JSON structure.

### Error handling

Data quality is an industry-wide problem, and poor-quality data often means keys that are missing from some rows. A missing key usually comes back as NULL, which can quietly drop rows from filters and aggregates. A malformed value can make a cast fail and abort the whole query. To handle such situations more gracefully, use the `COALESCE` function or conditional logic to provide a default value, such as `"Uncategorized"`, when a JSON key is missing. The query then returns the default for rows without the key instead of gaps or failures, and you still get the real values for rows that have it.
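
Here's a sketch of that default-value approach. It assumes an `events` table whose `params` documents sometimes lack a hypothetical `category` key; two made-up rows stand in for the table.

```sql
-- Hypothetical events: the second one is missing the "category" key
with events (id, params) as (
  values
    (1, '{"name": "Click Button", "category": "UI"}'::jsonb),
    (2, '{"name": "Page View"}'::jsonb)
)
select
  id,
  params->>'category' as raw_category,                       -- null for event 2
  coalesce(params->>'category', 'Uncategorized') as category  -- 'Uncategorized' for event 2
from events
order by id;
```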

### Database version

Some JSON functions and operators may not be available in older PostgreSQL versions.

Ensure that you're using a PostgreSQL version that supports the JSON functionality you need. Consider upgrading, if necessary.

### Performance bottlenecks

Inefficient JSON queries are a common cause of slow query performance. To find the bottlenecks, profile your queries with `EXPLAIN`. Then consider creating appropriate indexes (for example, a GIN index on a `jsonb` column to speed up containment queries with `@>`), rewriting queries, or denormalizing data where necessary.
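
As one example of an appropriate index, here's a sketch that adds a GIN index to a throwaway `jsonb` table and asks for a query plan. The table and index names are made up.

```sql
-- A throwaway table; in practice you'd index your real jsonb column
create temp table employees_demo (id int, employee_data jsonb);

-- A GIN index (default jsonb_ops) can serve containment (@>) queries
create index employees_demo_data_gin on employees_demo using gin (employee_data);

-- Inspect the plan; on a tiny table the planner may still prefer a sequential scan
explain
select * from employees_demo
where employee_data @> '{"department": "Marketing"}';
```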

## Manage unstructured data like a pro

PostgreSQL's ability to seamlessly integrate structured relational data with the flexibility of unstructured JSON data offers developers a unique advantage. By understanding how to query JSON columns effectively, one can harness the full potential of Postgres, making it easier to manage, retrieve, and analyze diverse datasets. Whether you're accessing specific JSON keys, filtering based on specific values, or diving into nested structures, PostgreSQL provides the tools to do so with precision and efficiency. As data continues to evolve and become more complex, mastering these techniques will be invaluable for any database professional.

## Going beyond JSON querying

Now that you have queried JSON columns successfully, it is time to use the results to gain further insights. Filtering this data is one of the ways to refine the results of JSON querying. Take a look at the tutorial [How to Use Filter to Have Multiple Counts in PostgreSQL](https://popsql.com/learn-sql/postgresql/how-to-use-filter-to-have-multiple-counts-in-postgresql).

## FAQs

### 1. How do I extract a specific JSON key from a JSON column?

You can use the `->` or `->>` operators to extract JSON keys. `->` returns the value as JSON, while `->>` returns the value as text.

### 2. Can I query nested JSON structures in PostgreSQL?
Yes, PostgreSQL allows you to query and extract data from nested JSON structures using nested `->` or `->>` operators.

### 3. How can I filter rows based on JSON data criteria?

Use the `WHERE` clause to filter rows based on JSON data criteria. For example, `WHERE json_column->>'key' = 'value'`.

### 4. What's the difference between the `json` and `jsonb` data types for querying JSON in PostgreSQL?

`jsonb` is a binary JSON data type optimized for querying and indexing, while `json` stores an exact copy of the input text. `jsonb` is recommended for querying JSON data.

### 5. How do I handle missing JSON keys or errors when querying JSON columns?

You can use the `COALESCE` function or conditional logic to provide default values or handle missing keys when querying JSON columns.

# How to drop a column in PostgreSQL

## Dropping or deleting a column in PostgreSQL

Dropping or deleting a column is typically performed as part of a deliberate schema management process, data cleanup, or in response to changing business requirements or compliance needs. This is a significant and irreversible operation that should be carefully considered.

When a column is no longer needed, or contains redundant information that consumes storage and affects query performance, it can be a good idea to drop it. Dropping a column simplifies the table structure and makes the schema easier to evolve. Keep in mind that PostgreSQL doesn't physically remove the column's data right away: the space is reclaimed over time as existing rows are updated, so the table won't shrink on disk immediately. If a column contains sensitive or **Personally Identifiable Information (PII)** and you need to comply with data privacy regulations like GDPR, you might need to drop the column to ensure data privacy and security.

## Instructions

Dropping a column is a fairly straightforward operation.

1. To drop a column in PostgreSQL, you can use the `ALTER TABLE` statement with the `DROP COLUMN` clause. Here's the basic syntax for dropping a column:

    ```sql
    ALTER TABLE table_name
    DROP COLUMN column_name;
    ```

2. You need to replace `table_name` with the name of the table that contains the column you want to drop, and `column_name` with the name of the column you wish to remove.

    Here's an example of how to drop a column from a table. Suppose you have a table named `employees` and you want to drop the `phone_number` column.

    ```sql
    ALTER TABLE employees DROP COLUMN phone_number;
    ```

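
Here's a sketch on a throwaway temporary table (the name `employees_demo2` and its columns are made up). It shows the plain drop and the `IF EXISTS` variant, which is handy in migration scripts that may run more than once:

```sql
-- A throwaway copy of the example table (columns are assumptions)
create temp table employees_demo2 (id int, name text, phone_number text);

alter table employees_demo2 drop column phone_number;

-- IF EXISTS turns a second attempt into a notice instead of an error
alter table employees_demo2 drop column if exists phone_number;
```

If you inspect `pg_attribute` afterwards, the column still has an entry, flagged with `attisdropped = true`. That flag is how PostgreSQL makes the drop fast without rewriting the table.
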
## Errors while dropping a table

When attempting to drop a table in PostgreSQL, several errors can occur, especially if there are dependencies or issues related to permissions, active connections, or other factors. Here are common errors you might encounter when trying to drop a table in PostgreSQL:

### Table Does Not Exist

If the table you're trying to drop does not exist in the database, you'll receive an error like "table does not exist."

 ```sql
ERROR:  table "table_name" does not exist
```
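
When a script should tolerate a missing table, `DROP TABLE IF EXISTS` reports a notice instead of this error. Here's a sketch using a throwaway temporary table (the name `old_reports_demo` is made up):

```sql
-- A throwaway table so the example has something to drop
create temp table old_reports_demo (id int);

drop table if exists old_reports_demo;  -- drops the table
drop table if exists old_reports_demo;  -- table is already gone: NOTICE, not ERROR
```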

### Dependency Exists

If other objects depend on the table, such as foreign key constraints in other tables or views, PostgreSQL won't allow you to drop the table until you've addressed these dependencies. (The table's own indexes are dropped along with it and don't block the drop.) You might see an error like:

 ```sql
ERROR:  cannot drop table "table_name" because other objects depend on it
DETAIL:  constraint "constraint_name" of table "referencing_table_name" depends on table "table_name"
```

In this case, you need to drop the dependent objects first, or you may use the `CASCADE` option with the `DROP TABLE` command to automatically remove dependent objects. However, be cautious when using `CASCADE` as it will remove all dependent objects without confirmation.

### Access Privileges

If you are not the owner of the table, you'll receive an error like:

 ```sql
ERROR:  must be owner of table table_name
 ```

PostgreSQL has no separate `DROP` privilege. Only the table's owner (or a superuser, or a member of the owning role) can drop it.

### Active Connections

If other sessions are using the table (for example, an open transaction that has read from it), `DROP TABLE` doesn't fail right away. It waits for their locks to be released, so the command can appear to hang. If you've set `lock_timeout`, the wait ends with an error instead:

 ```sql
ERROR:  canceling statement due to lock timeout
```

Ensure that there are no active transactions or connections using the table before attempting to drop it.

### Table in Use by Views or Rules

Views (and rules) that reference the table count as dependent objects too. Dropping the table fails with the same dependency error, and the DETAIL line names the view:

 ```sql
ERROR:  cannot drop table table_name because other objects depend on it
DETAIL:  view view_name depends on table table_name
```

You must first drop or modify the dependent views or rules before deleting the table, or use the `CASCADE` option.

### Table Contains Data

Rows in the table don't prevent a drop. PostgreSQL deletes the data along with the table, without asking for confirmation. If you see an error like the following, the cause is the dependency problem described above, not the data itself:

 ```sql
ERROR:  cannot drop table "table_name" because other objects depend on it
DETAIL:  constraint "constraint_name" of table "referencing_table_name" depends on table "table_name"
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
```

In this case, drop the dependent objects first, or use the `CASCADE` option to remove them along with the table.

### Recovery Mode

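If PostgreSQL is in recovery mode, for example on a hot standby replica, sessions are read-only and you won't be able to drop a table. You'll receive an error like:

 ```sql
ERROR:  cannot execute DROP TABLE in a read-only transaction
```

Run the command on the primary server instead. The same error appears if your transaction has been set to read-only for another reason.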

### Reserved Keywords

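If your table name is a reserved keyword in PostgreSQL, such as `select`, an unquoted `DROP TABLE` fails with a syntax error like the one below. Wrap the name in double quotes (for example, `DROP TABLE "select";`) to drop it. It's generally a good practice to avoid using reserved keywords as table names in the first place.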

 ```sql
ERROR:  syntax error at or near "select"
```

To successfully drop a table in PostgreSQL, make sure to address these potential errors by removing dependencies, granting necessary permissions, and ensuring there are no active connections or transactions using the table. Always exercise caution and back up your data before attempting to drop a table, as it is an irreversible operation.

## Dropping a column is permanent

It's important to emphasize that dropping a column is a potentially destructive operation. You should always perform a thorough analysis of the impact before proceeding as this operation cannot be undone. Consider the following:

* **Data Dependencies**: Check for any dependencies on the column, such as in views, indexes, triggers, or stored procedures. You may need to update or remove these objects.
* **Backup**: Ensure you have a recent backup of the data in case you need to restore it.
* **Application Impact**: Be aware of how dropping a column may affect the applications that rely on the table. You may need to modify your application code or queries.
* **Testing**: Consider testing the column drop in a development or staging environment to verify that it won't cause unexpected issues.

## Further table manipulation

Now that you know how to drop a column in PostgreSQL, let us explore how to add a column with this [next tutorial](https://popsql.com/learn-sql/postgresql/how-to-add-a-column-in-postgresql).

## FAQs

### 1. What happens to the data in the column when I delete it?

Deleting a column in PostgreSQL also removes all the data stored in that column. It is an irreversible operation, so make sure to back up your data before proceeding if you need to keep the information.

### 2. What if there are constraints or dependencies on the column I want to delete?

If there are constraints (e.g., foreign keys) or dependencies (e.g., views) that reference the column, you won't be able to delete it directly. You'll need to address these dependencies first, either by dropping or modifying the dependent objects.

### 3. Do I need specific privileges to delete a column?

Yes. PostgreSQL has no separate `ALTER` privilege on tables, so you must own the table (or be a superuser, or a member of the role that owns it) to drop one of its columns.

### 4. What is the impact of deleting a column on my application or queries?

Removing a column can impact any SQL queries, views, or applications that reference the deleted column. You will need to update your code and queries to accommodate the change, ensuring they no longer reference the deleted column.



# How to Query Arrays in PostgreSQL
You can retrieve the contents of an array by specifying it in the `select` clause like any other column:

```sql
select
 first_name,
 last_name,
 phone_numbers
from contacts;
```

```sql
 first_name | last_name |        phone_numbers
------------+-----------+-----------------------------
 John       | Doe       | {999-876-5432,999-123-4567}
 Bob        | Parr      | {555-INC-RDBL}
```

You can also retrieve a specific element of an array by putting its position inside square brackets. By default, PostgreSQL arrays start at position 1, though a different lower bound can be set, for example with the optional third argument of `array_fill()`. The example below shows each player's first-round score:

```sql
select
 player_number,
 round_scores[1]
from player_scores;
```

```sql
 player_number | round_scores
---------------+--------------
         10001 |           95
         10002 |           91
```
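
For illustration, here's a sketch of an array whose first position is 0 instead of 1. `array_fill()` takes the fill value, an array of dimension lengths, and an optional array of lower bounds:

```sql
-- Build a 3-element array whose positions run from 0 to 2
select array_fill(0, array[3], array[0]) as zero_based;

-- With a lower bound of 0, the first element lives at position 0, not 1
select (array_fill(0, array[3], array[0]))[0] as first_element;
```

Note the extra parentheses around the function call in the second query: PostgreSQL requires them before subscripting a function result.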

You can also check against the value of a specific element in the `where` clause:

```sql
select *
from player_scores
where round_scores[1] >= 95;
```

```sql
 player_number |   round_scores
---------------+------------------
         10001 | {95,92,96,97,98}
```

To compare all elements of an array to a value, you can use `ANY`/`SOME` and `ALL`. `ANY` and its synonym `SOME` return a row if at least one element satisfies the condition. `ALL` requires every element to satisfy the condition for a row to be returned. See the examples below:

_Show records where there's at least one score above 95. This is best read as "where 95 is lower than ANY of the scores":_

```sql
select *
from player_scores
where 95 < any (round_scores);
```

```sql
 player_number |   round_scores
---------------+------------------
         10001 | {95,92,96,97,98}
         10002 | {91,92,93,95,99}
```

_Only show records where 92 is lower or equal to ALL the scores:_

```sql
select *
from player_scores
where 92 <= all (round_scores);
```

```sql
 player_number |   round_scores
---------------+------------------
         10001 | {95,92,96,97,98}
```
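
`ANY` is also a convenient way to test whether an array contains a particular value. A short sketch against the `contacts` table from earlier, where the phone number is just an example value:

```sql
-- Find contacts that have 999-123-4567 among their phone numbers
select first_name, last_name
from contacts
where '999-123-4567' = any (phone_numbers);
```

With the sample rows shown at the top of this section, only John Doe matches.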

Using `unnest()` expands an array to multiple rows. The non-array columns get repeated for each row.

```sql
select
 first_name,
 last_name,
 unnest(phone_numbers)
from contacts;
```

```sql
 first_name | last_name |    unnest
------------+-----------+--------------
 John       | Doe       | 999-876-5432
 John       | Doe       | 999-123-4567
 Bob        | Parr      | 555-INC-RDBL
```
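
If you also need each element's position, `unnest()` can be combined with `with ordinality`, which adds a counter column. A minimal sketch using the same `contacts` table; the `phone` and `idx` aliases are arbitrary names chosen for this example:

```sql
-- Expand phone_numbers into rows and number each element, starting at 1
select
  c.first_name,
  c.last_name,
  p.phone,
  p.idx
from contacts c
cross join lateral unnest(c.phone_numbers) with ordinality as p(phone, idx);
```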


# How to Use string_agg() in PostgreSQL
`string_agg()` combines non-null values into one string, separated by the delimiter that you specify in the second parameter. For example, the Sakila database has a `city` table and a `country` table. If you want to show the available cities per country on one line, separated by commas:

```sql
select
  country,
  string_agg(city, ',') as cities
from country
join city using (country_id)
group by country
limit 4;
```

```sql
       country        |             cities             
----------------------+--------------------------------
 Thailand             | Songkhla,Nakhon Sawan,Pak Kret
 Faroe Islands        | Trshavn
 Bangladesh           | Jamalpur,Tangail,Dhaka
 United States        | Springfield,Springfield
```

Note that PostgreSQL does not assume a default delimiter; you must always pass one as the second parameter. Also, as with other aggregate functions, you need a `group by` clause when you select non-aggregated columns (here, `country`) alongside `string_agg()`.
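
Without a `group by`, `string_agg()` collapses every row into a single string. The delimiter can also be longer than one character. A sketch against the same Sakila `country` table, using `', '` as the delimiter:

```sql
-- One string listing every country, separated by a comma and a space
select string_agg(country, ', ') as all_countries
from country;
```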

## Removing duplicates in our output string

Notice that in our results, there are multiple cities in the United States named "Springfield". If you want to omit duplicates, add `distinct` before the first parameter:

```sql
select
  country,
  string_agg(distinct city, ',') as cities
from country
join city using (country_id)
group by country
limit 4;
```

```sql
       country        |             cities             
----------------------+--------------------------------
 Thailand             | Songkhla,Nakhon Sawan,Pak Kret
 Faroe Islands        | Trshavn
 Bangladesh           | Jamalpur,Tangail,Dhaka
 United States        | Springfield
```

## Ordering the contents within the output string

Perhaps you also want the output of the `string_agg()` function to be ordered alphabetically. You can specify the order using `order by` after the second parameter:

```sql
select
  country,
  string_agg(distinct city, ',' order by city asc) as cities
from country
join city using (country_id)
group by country
limit 4;
```

```sql
       country        |             cities             
----------------------+--------------------------------
 Thailand             | Nakhon Sawan,Pak Kret,Songkhla
 Faroe Islands        | Trshavn
 Bangladesh           | Dhaka,Jamalpur,Tangail
 United States        | Springfield
```

This can be helpful for particularly long outputs, but the `order by` clause is optional.


# How to Round Timestamps in MySQL
Rounding or truncating timestamps is especially useful when you're [grouping by time](https://popsql.com/learn-sql/mysql/how-to-group-by-time-in-mysql/). If you are truncating to the year or the date, you can use the corresponding functions:

```sql
SELECT YEAR(now());  -- year only
SELECT DATE(now());  -- date only, time part dropped
```

However, take care if you are grouping by month. `MONTH()` returns only the month number, so November 2018 and November 2017 both become `11`. If that is what you want, use `MONTH()`. If you want to distinguish between months of different years, use `DATE_FORMAT()` instead:

```sql
SELECT date_format(now(),'%Y-%m'); -- truncate to the month
SELECT date_format(now(),'%Y-%m-%d'); -- truncate to the day
SELECT date_format(now(),'%Y-%m-%d %H'); -- truncate to the hour
SELECT date_format(now(),'%Y-%m-%d %H:%i'); -- truncate to the minute
```
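
Combined with `GROUP BY`, a minimal sketch of counting rows per month might look like this. The `orders` table and its `created_at` column are hypothetical:

```sql
-- Count orders per calendar month; '%Y-%m' keeps months from different years apart
SELECT DATE_FORMAT(created_at, '%Y-%m') AS order_month,
       COUNT(*) AS order_count
FROM orders
GROUP BY order_month
ORDER BY order_month;
```

Because the `'%Y-%m'` strings are zero-padded, sorting them as text also sorts them chronologically.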

# Data Types in BigQuery

BigQuery supports all common data types found in Standard SQL. Google Cloud has [verbose documentation](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types), but here it is presented short and sweet:

## Most Common

| **Name** | **Description** | **Storage Size** | **Note** |
| --- | --- | --- | --- |
| INT64 (Integer) | A whole number that is not a fraction. | 8 bytes | Range from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| NUMERIC | A number with 38 digits of precision, 9 of which can be after the decimal point. | 16 bytes | Range from -99999999999999999999999999999.999999999 to 99999999999999999999999999999.999999999 |
| FLOAT64 (Float) | Double precision (approximate) decimal values. | 8 bytes | [Wikipedia will take this one.](https://en.wikipedia.org/wiki/Double-precision_floating-point_format) |
| BOOL (Boolean) | Represented by keywords TRUE and FALSE |  | Case insensitive so `TRUE` or `false` will work. |
| STRING | Variable-length character (Unicode) data. | | Must be quoted with either single (') or double (") quotation marks. Alternatively, can be triple-quoted with groups of three single (''') or three double (""") quotation marks. |
| BYTES | Variable-length binary data. | | Not to be used interchangeably with STRING. BYTES operates on raw bytes rather than Unicode characters. |
| ARRAY | Ordered list of zero or more elements of any non-ARRAY type. |  |  |
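
To see these types in action, here's a sketch that builds one literal of each type in a single BigQuery query. The values are arbitrary examples:

```sql
SELECT
  42 AS int64_value,                           -- integer literals are INT64
  NUMERIC '12345.678901234' AS numeric_value,  -- up to 9 digits after the decimal point
  3.14 AS float64_value,                       -- decimal literals without NUMERIC are FLOAT64
  TRUE AS bool_value,
  'hello' AS string_value,
  b'hello' AS bytes_value,                     -- the b prefix makes a BYTES literal
  [1, 2, 3] AS array_value                     -- an ARRAY<INT64>
```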

### Time Data Types

| **Name** | **Description** | **Canonical Format** | **Note** |
| --- | --- | --- | --- |
| DATE | Represents a logical calendar date, without time. | `YYYY-[M]M-[D]D` | Range from 0001-01-01 to 9999-12-31 |
| TIME | Represents a time, independent of a specific date. | `[H]H:[M]M:[S]S[.DDDDDD]` | Range from 00:00:00 to 23:59:59.999999 |
| DATETIME | Represents a year, month, day, hour, minute, second, and subsecond _without a time zone_. † | `YYYY-[M]M-[D]D[( \|T)[H]H:[M]M:[S]S[.DDDDDD]]` | Range from 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999 |
| TIMESTAMP | Represents an absolute point in time with microsecond precision, _independent of any time zone_. A time zone can be supplied when writing a value; it is used to convert the value to UTC. | `YYYY-[M]M-[D]D[( \|T)[H]H:[M]M:[S]S[.DDDDDD]][time zone]` | Range from 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999 UTC |

† `DATETIME` is seldom used, since most data needs to pin down an absolute point in time, which a value without a time zone cannot do on its own.

**How to read Canonical Format:**

*   `YYYY`: Four-digit year
*   `[M]M`: One or two digit month
*   `[D]D`: One or two digit day
*   `( |T)`: A space or a `T` separator
*   `[H]H`: One or two digit hour (valid values from 00 to 23)
*   `[M]M`: One or two digit minutes (valid values from 00 to 59)
*   `[S]S`: One or two digit seconds (valid values from 00 to 59)
*   `[.DDDDDD]`: Up to six fractional digits (microsecond precision)
*   `[time zone]`: String representing the time zone, with two canonical formats:
    *   [Time zone name](https://cloud.google.com/dataprep/docs/html/Supported-Time-Zone-Values_66194188) per the tz database
    *   Offset from Coordinated Universal Time (UTC), or the letter Z for UTC
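
As a sketch of these canonical formats, the following BigQuery literals are all valid. The dates and the `America/Los_Angeles` zone are just examples:

```sql
SELECT
  DATE '2015-10-01' AS date_value,
  TIME '13:45:30.123456' AS time_value,                                   -- microsecond precision
  DATETIME '2015-10-01T13:45:30' AS datetime_value,                       -- T separator
  TIMESTAMP '2015-10-01 13:45:30 America/Los_Angeles' AS tz_name_value,   -- tz database name
  TIMESTAMP '2015-10-01 13:45:30+00' AS utc_offset_value                  -- offset from UTC
```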

## Less Common

| **Name** | **Description** | **Note** |
| --- | --- | --- |
| STRUCT | A container of ordered fields, each with a type (required) and field name (optional). | [Learn more](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct-type) about STRUCT |
| GEOGRAPHY | A collection of points, lines, and polygons, which is represented as a point set, or a subset of the surface of the Earth. | Based on the Open Geospatial Consortium’s (OGC) [Simple Features specification (SFS)](https://www.opengeospatial.org/standards/sfs#downloads) |
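
A short sketch of constructing both types in a BigQuery query. `STRUCT` fields are named with `AS`, and `ST_GEOGPOINT()` builds a point from a longitude and a latitude (the coordinates below are just an example):

```sql
SELECT
  STRUCT(1 AS id, 'Alice' AS name) AS person,       -- STRUCT<id INT64, name STRING>
  ST_GEOGPOINT(-122.4194, 37.7749) AS location      -- longitude first, then latitude
```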


# Get Started with BigQuery and PopSQL in 5 Minutes

Grab your ⏱ and see if you can beat our 5 min record:

1. Log into your [existing Google Cloud account](https://console.cloud.google.com/home/dashboard) or create a [Google Cloud account for free](https://cloud.google.com/). Just be sure to use the Google Account you'll also use to create a PopSQL account.

2. Create a project within your Google Cloud account. Name it whatever you'd like.

  ![create test project bigquery google cloud console](//images.ctfassets.net/iv1sg9nibjwl/582GaqdASEJVe6WOC55UCX/e1d5c0a0bbc46e63691f54e9882092cf/create_test_project_bigquery_google_cloud_console.png)

3. [Download PopSQL](https://get.popsql.com/).

  > _You can technically use the Google BigQuery web UI, but PopSQL is a modern SQL editor built for teams that works on every platform, for all major databases._

4. Create a [new database connection in PopSQL for BigQuery.](https://popsql.com/docs/getting-started/connecting-to-bigquery/)

    * We recommend connecting via OAuth.
    * Use the same Google Account you used to make your Google Cloud Account.
    * Enter the `Project Name` you created in step 2 as your `BigQuery Project`.

5. Want to start querying right away? In PopSQL you can reference any of BigQuery's [surprisingly cool free, public sample datasets](https://popsql.com/learn-sql/bigquery/accessing-bigquery-public-datasets/) 😎.


# How to add index in MySQL

## Enhancing MySQL query performance with indexing

Indexes are essential tools in databases, ensuring efficient data retrieval. Indexing in MySQL plays a pivotal role in optimizing query performance, especially as your dataset grows. Without the appropriate indexes, you might experience high CPU usage on your database server, slow response times, and ultimately, dissatisfied users. Indexes are data structures that store a subset of the data in a table, organized in a way that allows the database management system to quickly locate and retrieve rows that match specific criteria.

## Getting started with indexing in MySQL

### Using the CREATE INDEX command

1. To add an index to a table in MySQL, use the `CREATE INDEX` command. The basic syntax is:

        CREATE INDEX index_name ON table_name (column_name);

2. For instance, if you want to add an index to the `email` column in the `users` table, the command would be:

        CREATE INDEX email_idx ON users (email);

### Multiple-column index
1. Sometimes, you might need to add an index that spans multiple columns, especially if those columns are frequently used together in queries. A multiple-column index often performs better than several single-column indexes. The syntax is:

        CREATE INDEX index_name ON table_name (column1, column2);

2. If `column1` and `column2` hold user IDs and organization IDs, this is how your statement would look:

        CREATE INDEX user_id_and_org_id_idx ON users (user_id, org_id);

### Unique index
1. A unique index ensures that the indexed columns do not contain duplicate values. This is particularly useful for columns like email addresses, where uniqueness is crucial. The standard syntax for creating a unique index is:

        CREATE UNIQUE INDEX index_name ON table_name (column_name);

2. To create a unique index named `users_email_uq` on the `email` column, use:

        CREATE UNIQUE INDEX users_email_uq ON users (email);

### Prefix index

MySQL does not support filtered (partial) indexes that cover only the rows matching a `WHERE` condition, as PostgreSQL does. It does, however, let you index only the leading portion of a string column, known as a prefix index, which keeps the index small when values are long. For instance, you can create an index on the first 20 characters of the `name` column that holds company names:
```sql
CREATE INDEX company_part_name_idx ON companies (name(20));
```

### Storage order in index

Starting with MySQL 8.0, you can specify the storage order of a column in an index (earlier versions parse `DESC` but ignore it). This can help queries that sort by the column in that direction, particularly when a multi-column index needs mixed directions, such as `ORDER BY a ASC, b DESC`. By default, the order is ascending. In the following example, we change the order to descending using `DESC`:
```sql
CREATE INDEX reverse_name_idx ON companies (name DESC);
```

### Functional key parts

1. MySQL versions 8.0.13 and above support functional key parts. Functional key parts allow you to create an index on a function of one or more columns rather than on the columns themselves. This feature can be particularly useful when you want to index computed values, apply functions to columns, or use expressions in your queries. Note that each expression must be wrapped in its own pair of parentheses:

        CREATE INDEX index_name ON table_name
        ((expression_function(column_name)));

2. In the following example, the `idx_full_name` index is created on the result of the `CONCAT` function applied to the `first_name` and `last_name` columns.

        CREATE INDEX idx_full_name ON employees((CONCAT(first_name, ' ', last_name)));
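
The optimizer can generally use a functional index only when a query contains the same expression. A minimal sketch, where the name value is a made-up example:

```sql
-- The WHERE expression matches the indexed expression, so idx_full_name can be used
SELECT *
FROM employees
WHERE CONCAT(first_name, ' ', last_name) = 'Jane Doe';
```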

## Identifying and resolving indexing issues in MySQL

Troubleshooting MySQL indexing issues can help improve the responsiveness and performance of your database queries. Remember to carefully plan and test any index changes in a development or staging environment before applying them to a production database. Indexing decisions should be based on your specific query patterns and use cases, and regular monitoring and maintenance are essential for maintaining optimal performance.

### Missing index

Missing indexes are indexes that have not been created on columns frequently used in query conditions (e.g., in the `WHERE` clause) or in join conditions (e.g., in `JOIN` operations). Without them, queries can become inefficient, leading to slower data retrieval and lower overall database performance. For example, without an index on `customer_id`, MySQL has to scan the entire `orders` table to answer this query:
```sql
SELECT * FROM orders WHERE customer_id = 123;
```
To identify missing indexes, you can use `EXPLAIN` or `EXPLAIN ANALYZE`. For the above example, let us create an index on the `customer_id` column.
```sql
CREATE INDEX idx_customer_id ON orders(customer_id);
```
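
To confirm that MySQL actually uses the new index, a sketch: run the same query under `EXPLAIN` again. In the output, the `key` column should show `idx_customer_id` rather than `NULL`, and the `type` column should no longer be `ALL` (a full table scan). The exact output depends on your data and MySQL version.

```sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
```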

### Redundant index

Redundant indexes in MySQL are indexes whose columns are already covered by another index, for example a single-column index on the leftmost column of an existing composite index. They provide little to no additional query performance while increasing storage requirements, slowing down data modification operations, and adding maintenance overhead. In the following example, both indexes can serve lookups on `column1`:
```sql
CREATE INDEX idx_column1 ON table1(column1);
CREATE INDEX idx_column1_column2 ON table1(column1, column2);
```
To resolve this, drop the single-column index. The composite index on `(column1, column2)` can still serve queries that filter on `column1` alone:
```sql
DROP INDEX idx_column1 ON table1;
```
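
MySQL's `sys` schema (bundled since MySQL 5.7) can list redundant indexes for you. A minimal sketch that checks the current database:

```sql
-- Each row names a redundant index, the index that makes it redundant,
-- and a ready-made DROP INDEX statement to review before running
SELECT table_name, redundant_index_name, dominant_index_name, sql_drop_index
FROM sys.schema_redundant_indexes
WHERE table_schema = DATABASE();
```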

### Composite index order

Sometimes the order of columns in a composite (multi-column) index does not match the query conditions or the columns used in a `JOIN` operation. This can lead to suboptimal query performance. Consider this query:
```sql
SELECT * FROM products WHERE category_id = 1 AND brand_id = 2;
```
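Let us create a composite index whose column order suits the query. MySQL uses a composite index from its leftmost column onward, so an index on `(category_id, brand_id)` also helps queries that filter on `category_id` alone, but generally not queries that filter only on `brand_id`. Note that a single order may not suit every query if many queries filter on different column combinations.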
```sql
CREATE INDEX idx_category_brand ON products(category_id, brand_id);
```

### Low cardinality index

A low cardinality index is an index on a column that has relatively few unique values compared to the total number of rows in the table. In other words, many rows share the same value. Indexing such a column may not improve query performance.
```sql
CREATE INDEX idx_status ON orders(status);
```
Consider whether indexing such a column is necessary or beneficial. In some cases, indexing a low cardinality column may not provide significant benefits, and it may be more efficient to focus on indexing columns with higher selectivity (i.e., columns with many distinct values) or optimizing query design in other ways. Careful consideration should be given to the specific use cases and query patterns in your database to determine whether indexing a low cardinality column is appropriate.

### Index fragmentation

Index fragmentation refers to a condition where the physical storage of index data becomes disorganized or inefficient over time due to data modifications such as `INSERT`, `UPDATE`, and `DELETE` operations. As data in a table changes, the corresponding indexes may become less efficient, leading to slower query execution.
```sql
OPTIMIZE TABLE your_table;
```
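Periodically optimizing tables rebuilds their indexes and can regain lost performance. For InnoDB tables, `OPTIMIZE TABLE` is carried out as a full table rebuild, which can take a while on large tables.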

### Large index

Large indexes consume significant storage space and can slow down data modification operations. Evaluate whether such an index is necessary and consider the trade-offs.
```sql
CREATE INDEX idx_large_column ON table1(large_column);
```

### Over-indexing

Having too many indexes can increase maintenance overhead and slow down data modifications.
```sql
CREATE INDEX idx_column1 ON table1(column1);
CREATE INDEX idx_column2 ON table1(column2);
CREATE INDEX idx_column3 ON table1(column3);
```
Review the necessity of each index and remove redundant or unused ones.
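
To find indexes that are not being used, the `sys` schema offers `schema_unused_indexes`, which is based on Performance Schema statistics collected since the server last started. A sketch for the current database; treat the results as candidates to review rather than indexes that are safe to drop, since rarely run queries may not have executed yet:

```sql
SELECT object_name AS table_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = DATABASE();
```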

### Covering index or index-only query

Covering index is a type of database index that includes all the columns required to fulfill a specific query without the need to access the actual data rows in the table. In other words, a covering index "covers" a query by including all the information needed in the index itself, allowing the database engine to satisfy the query directly from the index structure. This can significantly improve query performance.
```sql
SELECT name, email FROM customers WHERE registration_date >= '2023-01-01';
```
Create a covering index that includes all the columns required by the query.
```sql
CREATE INDEX idx_registration_date ON customers(registration_date, name, email);
```

## MySQL indexes for database optimization

Creating and managing indexes in MySQL is a crucial aspect of database optimization. By understanding and utilizing the various index types and options available, you can ensure efficient data retrieval and optimal database performance. Always remember to monitor and adjust your indexes as your data and query patterns evolve. With this tutorial, you can now create indexes in MySQL and troubleshoot common indexing issues.

## Towards next steps

Now that you know how to create indexes, learn how to remove them in the next tutorial, [How to Drop an Index in MySQL using Drop Index and Alter Table Commands](https://popsql.com/learn-sql/mysql/how-to-drop-an-index-in-mysql).

## FAQs

### What is an index in MySQL?

An index in MySQL is a database structure that provides a quick and efficient way to look up rows in a table based on the values in one or more columns. Indexes enhance query performance by reducing the need for full table scans when filtering, sorting, or joining data.

### What types of indexes are available in MySQL?

MySQL supports various types of indexes, including single-column indexes, composite (multi-column) indexes, unique indexes, full-text indexes for text-based searches, and spatial indexes for geospatial data. The choice of index type depends on your specific use case and query patterns.

### When should I use indexes in MySQL?

You should use indexes in MySQL when you frequently query a table based on specific columns in `WHERE` clauses, `JOIN` conditions, or `ORDER BY` clauses. Indexes are most beneficial for read-heavy workloads but should be carefully selected to avoid over-indexing and index maintenance overhead.

### What are some common mistakes to avoid when using indexes in MySQL?

Common mistakes when using indexes include over-indexing (creating too many indexes), not monitoring index usage, neglecting index maintenance, using low cardinality columns for indexing, and not considering the order of columns in composite indexes. Proper indexing strategy and regular monitoring are crucial to avoid these pitfalls.

### How do I check if my queries are using indexes in MySQL?

You can use the `EXPLAIN` statement before your query to view the query execution plan, which provides information about which indexes are used (or not used) in the query. For example:
```sql
EXPLAIN SELECT * FROM table_name WHERE column_name = 'value';
```
Additionally, MySQL provides tools and commands for monitoring index usage, such as the MySQL Performance Schema and the `SHOW INDEX` statement.


# How to Insert in MySQL
## Basic

The simplest way to insert a row in MySQL is to use the `INSERT INTO` command and specify values for all columns. If you have 10 columns, you have to specify 10 values, in the same order as the columns are defined in the table:

```sql
-- Assuming the users table has only three columns: first_name, last_name, and email, and in that order
INSERT INTO users VALUES ('John', 'Doe', 'john@doe.com');
```

## Specifying a Column List

It's optional, but specifying a column list before the `VALUES` keyword is highly recommended:

```sql
INSERT INTO users (first_name, last_name, email, birth_date, city, state)
VALUES ('John', 'Doe', 'john@doe.com','2000-01-01','Los Angeles','CA');
```

Having a column list has the following advantages:

*   You don't have to remember the column order as defined in the table.
*   You don't have to specify a value for all columns, just the required ones.
*   In case there are many columns, it is easier to match a value to the column it's intended for when you see it in the statement, rather than having to look at the table definition.
*   `INSERT` statements without a column list break once a column is added to or removed from the table. You need to modify them to reflect the new or deleted column in order for them to work again.

## Using the SET Keyword

Although it is not frequently used, MySQL also allows the `SET` keyword in the `INSERT` statement. This is useful when there are many columns to be inserted because it's easier to read:

```sql
INSERT INTO users SET
  first_name='Vincent',
  last_name='Aviles',
  birth_date='1973-08-14',
  profession='Artist',
  educational_attainment='College Degree',
  city='Daly City',
  current_points=543,
  date_registered='2018-11-30';
```

The limitation of the `SET` form is that it can only insert one row at a time.

## Inserting Multiple Rows

You can insert multiple rows in one `INSERT` statement by having multiple sets of values enclosed in parentheses:

```sql
INSERT INTO users (first_name, last_name)
VALUES
  ('John','Lennon'),
  ('Paul','McCartney'),
  ('George','Harrison'),
  ('Ringo','Starr');
```

You can also use `INSERT` with a `SELECT` command to copy data from an existing table. Note the `VALUES` keyword is omitted:

```sql
INSERT INTO beta_users (first_name, last_name)
SELECT first_name, last_name
FROM users
where beta = 1;
```

## Inserting JSON Values

Starting with version 5.7.8, MySQL supports JSON data types. If you want to insert into a JSON column, just wrap the valid JSON in a single quoted string:

```sql
INSERT INTO test_json
VALUES ('{"beta": true, "status": "for review", "test_count": 1}');
```

We also have a tutorial on [querying JSON columns](https://popsql.com/learn-sql/mysql/how-to-query-a-json-column-in-mysql/).

## Handling Conflicts/Duplicates

If inserting a row would violate a unique constraint, there are different ways to handle it, depending on your requirements.

Using `INSERT IGNORE` will quietly discard the row to be inserted, issuing a warning instead of an error:

```sql
INSERT IGNORE INTO products (product_id, product_name, stocks)
VALUES (1, 'VPN Product 1', 50);
```

See it in action:

```sql
mysql> create table products
    -> (product_id int not null primary key, product_name varchar(100), stocks int);
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT IGNORE INTO products (product_id, product_name, stocks)
    -> VALUES (1, 'VPN Product 1', 50);
Query OK, 1 row affected (0.01 sec)

mysql> select * from products;
+------------+---------------+--------+
| product_id | product_name  | stocks |
+------------+---------------+--------+
|          1 | VPN Product 1 |     50 |
+------------+---------------+--------+
1 row in set (0.00 sec)

mysql> INSERT IGNORE INTO products (product_id, product_name, stocks)
    -> VALUES (1, 'VPN Product 1', 40);
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> select * from products;
+------------+---------------+--------+
| product_id | product_name  | stocks |
+------------+---------------+--------+
|          1 | VPN Product 1 |     50 |
+------------+---------------+--------+
1 row in set (0.00 sec)
```

Note that the stocks value did not change.

If you want to do an "upsert" (update the row if it exists, otherwise insert it), you can use `INSERT ... ON DUPLICATE KEY UPDATE`:

```sql
INSERT INTO products (product_id, product_name, stocks)
VALUES (1, 'VPN Product 1', 45)
ON DUPLICATE KEY UPDATE stocks = VALUES(stocks);
```

In action:

```sql
mysql> INSERT INTO products (product_id, product_name, stocks)
    -> VALUES (1, 'VPN Product 1', 45)
    -> ON DUPLICATE KEY UPDATE stocks=VALUES(stocks);
Query OK, 2 rows affected (0.00 sec)

mysql> select * from products;
+------------+---------------+--------+
| product_id | product_name  | stocks |
+------------+---------------+--------+
|          1 | VPN Product 1 |     45 |
+------------+---------------+--------+
1 row in set (0.00 sec)
```
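
Starting with MySQL 8.0.20, using the `VALUES()` function in `ON DUPLICATE KEY UPDATE` is deprecated. On MySQL 8.0.19 and later, you can instead give the new row an alias and refer to its columns. A sketch of the same upsert, where `new_row` is an arbitrary alias name:

```sql
INSERT INTO products (product_id, product_name, stocks)
VALUES (1, 'VPN Product 1', 45) AS new_row
ON DUPLICATE KEY UPDATE stocks = new_row.stocks;
```

As with the earlier version, the affected-rows count is 1 when a new row is inserted and 2 when an existing row is updated, which explains the "2 rows affected" above.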

MySQL also has the `REPLACE` keyword. When it detects a duplicate, it will delete the conflicting row and insert the new one.

```sql
REPLACE INTO products (product_id, product_name, stocks)
VALUES (1, 'VPN Product 1', 45);
```

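Be careful when choosing between `REPLACE` and `INSERT ... ON DUPLICATE KEY UPDATE`: `REPLACE` deletes the entire conflicting row before inserting the new one, so columns you don't specify revert to their default values, foreign key actions on delete can fire, and a table that uses `AUTO_INCREMENT` may assign the row a new value.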

# How to Update in MySQL
To update all rows in a MySQL table, just use the `UPDATE` statement without a `WHERE` clause:

```sql
UPDATE products SET stocks=100;
```

You can also update multiple columns at a time:

```sql
UPDATE products SET stocks=100, available=true;
```

Usually you only want to update rows that match a certain condition. You do this by specifying a `WHERE` clause:

```sql
-- This will update only the row that matches product_id=1
UPDATE products SET stocks=100, available=true
WHERE product_id=1;

-- This will update all rows that match category='Electronics'
UPDATE products SET stocks=50, available=true
WHERE category='Electronics';
```
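
The new value can also be computed from the current one. For instance, a sketch that decrements the stock of one product in the same `products` table:

```sql
-- Subtract 1 from the current stock of product 1
UPDATE products SET stocks = stocks - 1
WHERE product_id = 1;
```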

# How to Delete in MySQL
To delete rows in a MySQL table, use the `DELETE FROM` statement:

```sql
DELETE FROM products WHERE product_id=1;
```

The `WHERE` clause is optional, but you'll usually want it, unless you really want to delete every row from the table.
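
Because a committed `DELETE` cannot be undone, it can help to preview the affected rows with the same `WHERE` clause first and, on a transactional storage engine such as InnoDB, run the `DELETE` inside a transaction. A minimal sketch; the `'Discontinued'` category is a hypothetical example value:

```sql
-- 1. Preview the rows the DELETE would remove
SELECT * FROM products WHERE category = 'Discontinued';

-- 2. Delete inside a transaction so the change can still be rolled back
START TRANSACTION;
DELETE FROM products WHERE category = 'Discontinued';

-- 3. Keep the change with COMMIT, or undo it with ROLLBACK instead
COMMIT;
```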

# How to Create a Table in MySQL
Here's an example of creating a `users` table in MySQL:

```sql
CREATE TABLE users (
  id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, -- Auto incrementing IDs
  name VARCHAR(100), -- String column of up to 100 characters
  preferences JSON, -- JSON columns are great for storing unstructured data and are supported starting MySQL version 5.7.8
  created_at TIMESTAMP -- Always store time in UTC
);
```

The comma-separated items inside the parentheses are called column definitions. The minimum required parts of a column definition are the column name and data type, which is all that is shown above for `name`, `preferences`, and `created_at`. The `id` column has extra attributes that make it the primary key and assign its values automatically with `AUTO_INCREMENT`.

This is also a chance to specify [not null constraints](https://popsql.com/learn-sql/mysql/how-to-add-a-not-null-constraint-in-mysql/), [default values](https://popsql.com/learn-sql/mysql/how-to-add-a-default-value-to-a-column-in-mysql/), and an optional `ENGINE` keyword:

```sql
CREATE TABLE users (
  id INTEGER AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) not null,
  active boolean default true
) ENGINE=INNODB;
```

Storage engines are MySQL components that handle the SQL operations for different table types, which lets MySQL be extended with different storage capabilities. When no engine is specified, MySQL uses InnoDB, the default since MySQL 5.5.
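
If the statement might run where the table already exists, `IF NOT EXISTS` makes it succeed with a note instead of failing, and `SHOW CREATE TABLE` displays the full definition MySQL stored, including defaults it filled in. A short sketch using the same `users` table:

```sql
CREATE TABLE IF NOT EXISTS users (
  id INTEGER AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  active BOOLEAN DEFAULT TRUE
) ENGINE=InnoDB;

-- Show the complete stored definition, including the engine and character set
SHOW CREATE TABLE users;
```

Note that `IF NOT EXISTS` does not compare an existing table's structure with the new definition; it simply skips creation.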

# How to Delete a Table in MySQL
Dropping a table in MySQL is simple:

```sql
DROP TABLE inactive_users;
```

Be careful - there's no "undo" for this!
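
To avoid an error when the table may not exist, add `IF EXISTS`. MySQL also lets you drop several tables in one statement. A sketch, where `old_sessions` is a hypothetical second table:

```sql
-- No error if inactive_users is missing (a note is issued instead)
DROP TABLE IF EXISTS inactive_users;

-- Drop more than one table at once
DROP TABLE IF EXISTS inactive_users, old_sessions;
```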


# MySQL: Rename a Table with Alter Table or Rename Table Command
MySQL offers two ways to rename tables. The first one uses the `ALTER TABLE` syntax:

```sql
ALTER TABLE old_table_name RENAME new_table_name;
```

The second way is to use `RENAME TABLE`:

```sql
RENAME TABLE old_table_name TO new_table_name;
```

`RENAME TABLE` offers more flexibility. It allows renaming multiple tables in one statement. This can be useful when replacing a table with a new pre-populated version:

```sql
RENAME TABLE products TO products_old, products_new TO products;
```

The above statement is executed left to right, so there's no conflict renaming `products_new` to `products`, since the existing table has already been renamed to `products_old`. Furthermore, the whole statement runs atomically, so other sessions never see the tables in a half-renamed state.

# How to Truncate a Table in MySQL
Be very careful with this command. It will empty the contents of your MySQL table and there is no undo. This is useful in development, but you'll rarely want to do this in production.

```sql
TRUNCATE TABLE table_name;

-- the TABLE keyword is actually optional:
TRUNCATE table_name;
```

If the table contains an `AUTO_INCREMENT` column, its counter is also reset. This differs from some other databases; in PostgreSQL, for example, `TRUNCATE` leaves identity sequences alone unless you add `RESTART IDENTITY`.

# How to Duplicate a Table in MySQL
You can duplicate or "clone" a table's contents by executing a `CREATE TABLE ... AS SELECT` statement:

```sql
CREATE TABLE new_table AS SELECT * FROM original_table;
```

Please be careful when using this to clone big tables. This can take a lot of time and server resources.

Note also that `new_table` inherits ONLY the basic column definitions, null settings and default values of `original_table`. It does not inherit indexes or `AUTO_INCREMENT` attributes.

To inherit all table definitions, use the `CREATE TABLE... LIKE` syntax:

```sql
CREATE TABLE new_table LIKE original_table;
```

This makes the structure of `new_table` exactly like that of `original_table`, but DOES NOT copy the data. To copy the data, you'll need `INSERT ... SELECT`:

```sql
INSERT INTO new_table SELECT * FROM original_table;
```

Again, be careful when doing this to big tables.

# How to Add a Column to MySQL Table
Adding a column in MySQL involves using the `ALTER TABLE` command.

Here's an example of adding a `created_at` datetime column to your `users` table:

```sql
ALTER TABLE users ADD created_at DATETIME;
```

Adding a string (varchar) column with a not null constraint:

```sql
ALTER TABLE users ADD bio VARCHAR(100) NOT NULL;
```

Adding a boolean column with a default value:

```sql
ALTER TABLE users ADD active BOOLEAN DEFAULT TRUE;
```

MySQL's reference manual covers all supported [data types](https://dev.mysql.com/doc/refman/8.0/en/data-types.html).
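MySQL also lets you choose where a new column goes with `FIRST` or `AFTER`, and add several columns in a single `ALTER TABLE`. A sketch against the `users` table; `last_login`, `tenant_id`, `city` and `country` are hypothetical column names:

```sql
-- Place the new column right after the existing name column
ALTER TABLE users ADD COLUMN last_login DATETIME AFTER name;

-- Make the new column the first one in the table
ALTER TABLE users ADD COLUMN tenant_id INT FIRST;

-- Add several columns in one statement
ALTER TABLE users
  ADD COLUMN city VARCHAR(100),
  ADD COLUMN country VARCHAR(100);
```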

# How to Remove a Column in MySQL
Dropping a column in MySQL involves using the `ALTER TABLE` command. The typical syntax is as follows:

```sql
ALTER TABLE table_name DROP COLUMN column_name;

-- The COLUMN keyword is actually optional
ALTER TABLE table_name DROP column_name;
```


# How to Change a Column Name in MySQL
## MySQL 5.6.x and 5.7.x

Renaming a column in MySQL involves using the `ALTER TABLE` command. For MySQL versions 5.6.x and 5.7.x, the typical syntax is as follows:

```sql
ALTER TABLE table_name CHANGE old_column_name new_column_name <column definition>;

ALTER TABLE products CHANGE product_name product_full_name VARCHAR(100) NOT NULL;
```

Note that you MUST restate the full column definition, otherwise undeclared attributes will go back to their defaults. For example, not stating `NOT NULL` will result in the column allowing NULLs.

To ensure that you do not miss anything, you can use the `SHOW CREATE TABLE` command to see the full column definition:

```sql
mysql> SHOW CREATE TABLE products\G
*************************** 1. row ***************************
 Table: products
Create Table: CREATE TABLE `products` (
 `product_id` bigint(20) NOT NULL,
 `product_name` varchar(100) NOT NULL,

(The rest of the output is truncated for brevity)
```

Then use that as the basis for the `ALTER TABLE` command.

## MySQL 8.0

While MySQL 8.0 accepts the above syntax, it also supports an easier way:

```sql
ALTER TABLE products RENAME COLUMN product_name TO product_full_name;
```

This is a lot easier since there's no longer a need to restate the full column definition. But if you need to change both the column name and something in the definition, you can use the `ALTER TABLE ... CHANGE` command to do it in one go.

# How to Set a Column with Default Value in MySQL with Alter Table Command
To add a default value to a column in MySQL, use the `ALTER TABLE ... ALTER ... SET DEFAULT` command:

```sql
-- Example: Products have a default stock of 0
ALTER TABLE products ALTER COLUMN stocks SET DEFAULT 0;

-- Example: Products are available by default (optional COLUMN keyword omitted)
ALTER TABLE products ALTER available SET DEFAULT true;
```


# How to Remove a Default Value from a Column in MySQL
To remove a default value from a column in MySQL, use the `ALTER TABLE ... ALTER ... DROP DEFAULT` command:

```sql
-- Example: Products no longer have a default stock value
ALTER TABLE products ALTER COLUMN stocks DROP DEFAULT;

-- Example: Products are no longer available by default (optional COLUMN keyword omitted)
ALTER TABLE products ALTER available DROP DEFAULT;
```


# How to Add a Not Null Constraint in MySQL
Not null constraints are a great way to add another layer of validation to your data. Sure, you could perform this validation in your application layer, but shit happens: somebody will forget to add the validation, somebody will remove it by accident, somebody will bypass validations in a console and insert nulls, etc. The only way to really be sure is to enforce it in your column definition. If you're validating nulls on the database layer as well, you're protected 💪

To enforce `NOT NULL` for a column in MySQL, use the `ALTER TABLE ... MODIFY` command and restate the column definition, adding the `NOT NULL` attribute.

```sql
-- Example: Require stocks to be non-null (caution: without a DEFAULT clause this also drops the default)
ALTER TABLE products MODIFY stocks INT NOT NULL;
```

Note that you MUST restate the full column definition, otherwise undeclared attributes will go back to default settings. For example, not restating the DEFAULT clause will unset the default value.

To ensure that you do not miss anything, you can use the `SHOW CREATE TABLE` command to see the full column definition:

```sql
mysql> SHOW CREATE TABLE products\G
*************************** 1. row ***************************
       Table: products
Create Table: CREATE TABLE `products` (
  `product_id` bigint(20) NOT NULL,
  `product_name` varchar(100) DEFAULT '',
  `stocks` int(11) DEFAULT '0',

(The rest of the output is truncated for brevity)
```

Use the current definition and add `NOT NULL` for the correct modification:

```sql
ALTER TABLE products MODIFY stocks INT NOT NULL DEFAULT 0;
```
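One more thing to watch: if `stocks` already contains NULLs, MySQL cannot simply apply the constraint. With strict SQL mode (the default since MySQL 5.7) the `ALTER TABLE` will typically fail; without it, the NULLs are converted to a default value with warnings. A safer sketch, assuming 0 is an acceptable stand-in for missing stock counts, is to backfill first:

```sql
-- Step 1: replace existing NULLs so the new constraint can be satisfied
UPDATE products SET stocks = 0 WHERE stocks IS NULL;

-- Step 2: add NOT NULL, restating the existing default
ALTER TABLE products MODIFY stocks INT NOT NULL DEFAULT 0;
```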

# How to Remove a Not Null Constraint in MySQL
To remove a `NOT NULL` constraint for a column in MySQL, use the `ALTER TABLE ... MODIFY` command and restate the column definition, removing the `NOT NULL` attribute.

```sql
-- Example: Allow stocks to be NULL (caution: without a DEFAULT clause this also drops the default)
ALTER TABLE products MODIFY stocks INT;
```

Note that you MUST restate the full column definition, otherwise undeclared attributes will go back to default settings. For example, not restating the DEFAULT clause will unset the default value.

To ensure that you do not miss anything, you can use the `SHOW CREATE TABLE` command to see the full column definition:

```sql
mysql> SHOW CREATE TABLE products\G
*************************** 1. row ***************************
       Table: products
Create Table: CREATE TABLE `products` (
  `product_id` bigint(20) NOT NULL,
  `product_name` varchar(100) DEFAULT '',
  `stocks` int(11) NOT NULL DEFAULT '0',

(The rest of the output is truncated for brevity)
```

Use the current definition and remove `NOT NULL` for the correct modification:

```sql
ALTER TABLE products MODIFY stocks INT DEFAULT 0;
```

# How to Drop an Index in MySQL using Drop Index and Alter Table Commands
To drop a non-primary key index, use the `DROP INDEX` command:

```sql
DROP INDEX index_name ON table_name;
```

The syntax requires the table name to be specified because MySQL allows index names to be reused on multiple tables.

Primary keys in MySQL are always named `PRIMARY` (not case sensitive). But because `PRIMARY` is a reserved keyword, backticks are required when you use it in the `DROP INDEX` command.

```sql
--Enclose PRIMARY in backticks to refer to the name, not the reserved word
DROP INDEX `PRIMARY` ON table_name;
```

Alternatively, MySQL also lets you drop indexes using the `ALTER TABLE` command:

```sql
-- Drop a non-primary key index
ALTER TABLE table_name DROP INDEX index_name;

-- Drop the primary key. Here PRIMARY KEY is the keyword form, not the index name, so no backticks are needed
ALTER TABLE table_name DROP PRIMARY KEY;
```
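To find the name of the index to drop, you can list a table's indexes first; the `Key_name` column in the output holds the names that `DROP INDEX` expects. A sketch using the same placeholder names as above:

```sql
-- List all indexes on the table; look at the Key_name column
SHOW INDEX FROM table_name;

-- Then drop the one you want by name
DROP INDEX index_name ON table_name;
```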

# How to Create a View in MySQL
Views let you encapsulate or "hide" complexity, or grant limited read access to part of the data.

To create a view, use the `CREATE VIEW` command:

```sql
CREATE OR REPLACE VIEW view_name AS <select statement>;
```

While optional, the `OR REPLACE` clause is frequently used so that the view is replaced if it already exists.

Some examples:

```sql
-- A view to show only beta users
CREATE VIEW beta_users_vw AS
SELECT * FROM users WHERE beta = 1;

-- A view to limit read access to only certain columns
CREATE VIEW users_basic_vw AS
SELECT first_name, last_name, telephone_number
FROM users;

-- A view for management so they only need to do a "SELECT * FROM top_20_customers_vw" instead of learning complex SQL
CREATE OR REPLACE VIEW top_20_customers_vw AS
SELECT c.customer_name, sum(p.price*od.quantity) order_total
FROM customers c
JOIN orders o USING (customer_id)
JOIN order_details od USING (order_id)
JOIN products p USING (product_id)
GROUP BY c.customer_name
ORDER BY order_total DESC
LIMIT 20;
```

You may notice all view names above end in `_vw`. This is an example convention some developers adopt to easily distinguish views from tables.

# How to Drop a View in MySQL
To drop a view, use the `DROP VIEW` command:

```sql
DROP VIEW view_name;
```
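Like tables, views can be dropped conditionally and several at a time. A short sketch reusing the example views from the previous section:

```sql
-- No error if a view is missing; both views are dropped in one statement
DROP VIEW IF EXISTS beta_users_vw, users_basic_vw;
```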

# How to Alter Sequence in MySQL
Auto incrementing columns in tables start at 1 by default, but sometimes you may want them to start at a different number. Other databases generate such numbers with "sequences", but MySQL implements them differently: an auto incrementing column is part of the table definition and is modified using the `ALTER TABLE` command.

First, you can check the next value using the `SHOW CREATE TABLE` command:

```sql
mysql> SHOW CREATE TABLE users\G
*************************** 1. row ***************************
 Table: users
Create Table: CREATE TABLE `users` (
 `user_id` int(11) NOT NULL AUTO_INCREMENT,
 <<other lines omitted for brevity>>
 PRIMARY KEY (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2
```

To change it:

```sql
ALTER TABLE users AUTO_INCREMENT=1000;
```
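You can also read the next value from `information_schema` instead of scanning the `SHOW CREATE TABLE` output. Two caveats: MySQL 8.0 caches this statistic (controlled by `information_schema_stats_expiry`), and you cannot set the counter at or below the largest value already in the column, since MySQL resets it to that maximum plus one instead. A sketch for the `users` table:

```sql
-- MySQL 8.0 only: bypass the cached table statistics for this session
SET SESSION information_schema_stats_expiry = 0;

-- Next AUTO_INCREMENT value for users in the current database
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'users';
```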


# How to Query Date and Time in MySQL

MySQL has the following functions to get the current date and time:

```sql
SELECT now();  -- date and time
SELECT curdate(); -- date
SELECT curtime(); -- time in 24-hour format
```

To find rows between two dates or timestamps:

```sql
SELECT *
FROM events
where event_date between '2018-01-01' and '2018-01-31';

-- Can include time by specifying in YYYY-MM-DD hh:mm:ss format:
SELECT *
FROM events
WHERE event_date BETWEEN '2018-01-01 12:00:00' AND '2018-01-01 23:30:00';
```
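One caveat with `BETWEEN`: if `event_date` is a `DATETIME` or `TIMESTAMP` column, `'2018-01-31'` means `2018-01-31 00:00:00`, so events later on January 31 are left out. A half-open range avoids this; a sketch assuming `event_date` holds both date and time:

```sql
-- All of January 2018, including events on Jan 31 after midnight
SELECT *
FROM events
WHERE event_date >= '2018-01-01'
  AND event_date <  '2018-02-01';
```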

To find rows created within the last week:

```sql
SELECT *
FROM events
WHERE event_date > date_sub(now(), interval 1 week);
```

There's also `DATE_ADD()`. For example, to find events scheduled between one week ago and 3 days from now:

```sql
SELECT *
FROM events
WHERE event_date BETWEEN date_sub(now(), interval 1 week) AND date_add(now(), interval 3 day);
```

You can extract part of a timestamp by applying the corresponding function:

```sql
SELECT year(now()); -- or month(), day(), hour(), minute(), second()
```

To get a day of week from a timestamp, use the `DAYOFWEEK()` function:

```sql
-- returns 1-7 (integer), where 1 is Sunday and 7 is Saturday
SELECT dayofweek('2018-12-12');

-- returns the string day name like Monday, Tuesday, etc
SELECT dayname(now());
```

To convert a timestamp to a unix timestamp (integer seconds):

```sql
-- The time is assumed to be midnight (00:00:00) in the session time zone
SELECT unix_timestamp('2018-12-09');

-- You can specify an exact timestamp to be converted down to the second
SELECT unix_timestamp('2018-12-09 14:53:21');

-- calling unix_timestamp without a parameter will be like calling it for current timestamp
SELECT unix_timestamp(); -- same as SELECT unix_timestamp(now());
```

To calculate the difference between two timestamps, convert them to unix timestamps then perform the subtraction:

```sql
-- show seconds between delivery and shipping timestamps
SELECT unix_timestamp(delivered_at) - unix_timestamp(shipped_at)
FROM deliveries;

-- convert computed difference to hh:mm:ss format:
SELECT sec_to_time(unix_timestamp(delivered_at) - unix_timestamp(shipped_at))
FROM deliveries;
```

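Note that MySQL also has `DATEDIFF()` and `TIMEDIFF()` functions, but they are more limited: `DATEDIFF()` returns whole days and ignores the time parts, while `TIMEDIFF()` requires both arguments to be of the same type (both `TIME` or both `DATETIME`) and returns a `TIME` value.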
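For the seconds-between-timestamps case above, MySQL's `TIMESTAMPDIFF(unit, start, end)` returns `end - start` in the unit you ask for, without the unix timestamp round trip. A sketch against the same `deliveries` table:

```sql
-- Seconds between shipping and delivery (delivered_at - shipped_at)
SELECT TIMESTAMPDIFF(SECOND, shipped_at, delivered_at) AS seconds_in_transit
FROM deliveries;

-- Whole hours (partial hours are dropped)
SELECT TIMESTAMPDIFF(HOUR, shipped_at, delivered_at) AS hours_in_transit
FROM deliveries;
```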

# How to Group by Time in MySQL

When you want to group by minute, hour, day, week, etc., it's tempting to just group by your timestamp column. However, you'll then get one group per second, which is likely not what you want. Instead, you need to "truncate" your timestamp to the granularity you want, like minute, hour, day, week, etc. The function you need here is `DATE_FORMAT`:

```sql
SELECT
   date_format(created_at,'%Y-%m-%d %H-%i'), -- leave out -%i if you want to group by hour
   count(1)
FROM users
GROUP BY 1;
```

Grouping by date is easier, as you can just use the `DATE()` function:

```sql
SELECT
   date(created_at),
   count(1)
FROM users
GROUP BY 1;
```

If you don't have new users for every minute/hour/day, you're going to have gaps in your data. To have one row per interval, even when there's no data, you'll want to [generate data](https://popsql.com/learn-sql/mysql/how-to-avoid-gaps-in-data-in-mysql/) for it.

# How to Convert UTC to Local Time Zone in MySQL

MySQL does not store time zone data when storing timestamps. `NOW()` returns the current time in the session time zone, which by default follows the host server's time zone.

To convert a timestamp from one time zone to another, you can use the `CONVERT_TZ` function:

```sql
-- using named time zones
SELECT CONVERT_TZ('2018-01-01 12:00:00','UTC','MET');

-- using offset time zones
SELECT CONVERT_TZ('2018-01-01 12:00:00','+00:00','+10:00');
```
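Because `NOW()` depends on the session time zone, it can help to check and set it explicitly. Named zones such as `'MET'` also only work if the server's time zone tables have been loaded; otherwise `CONVERT_TZ` returns `NULL`. A sketch:

```sql
-- Server-wide default and the current session's time zone
SELECT @@global.time_zone, @@session.time_zone;

-- Switch this session to UTC; NOW() then returns UTC time
SET time_zone = '+00:00';
SELECT NOW();

-- Returns NULL if the server's named time zone tables haven't been loaded
SELECT CONVERT_TZ('2018-01-01 12:00:00', 'UTC', 'MET');
```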

# How to Use Coalesce in MySQL

Say you're looking at a MySQL integer column where some rows are null:

```sql
select
  day,
  tickets
from stats;
```

```sql
    day     | tickets
------------+-------
 2018-01-01 |     1
 2018-01-02 |   null
 2018-01-03 |     3
```

Instead of having that null, you might want that row to be `0`. To do that, use the `coalesce` function, which returns the first non-null argument it's passed (for exactly two arguments, MySQL's `ifnull` does the same):

```sql
select
  day,
  coalesce(tickets, 0) as tickets
from stats;
```

```sql
    day     | tickets
------------+-------
 2018-01-01 |     1
 2018-01-02 |     0
 2018-01-03 |     3
```
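Unlike `ifnull`, which takes exactly two arguments, `coalesce` accepts any number of them and returns the first that isn't null, which is handy for fallbacks across several columns. A sketch with a hypothetical `contacts` table and hypothetical `name`, `mobile_phone` and `home_phone` columns:

```sql
-- Use the mobile number, else the home number, else a placeholder string
select
  name,
  coalesce(mobile_phone, home_phone, 'no phone on file') as phone
from contacts;
```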

# How to Calculate Percentiles in MySQL
MySQL versions before 8.0 have no analytical/window functions, but there are still ways to get things done. For example, to get a top-down percentile ranking of film lengths from the [Sakila Sample Database](https://dev.mysql.com/doc/sakila/en/):

```sql
SELECT
 f.title,
 ROUND(100.0 * (SELECT COUNT(*) FROM film AS f2 WHERE f2.length <= f.length) / totals.film_count, 1) AS percentile
FROM film f
CROSS JOIN (
 SELECT COUNT(*) AS film_count
 FROM film
) AS totals
ORDER BY percentile DESC;
```
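On MySQL 8.0 and later, the `CUME_DIST()` window function computes the same ratio directly: the fraction of rows whose `length` is less than or equal to the current row's. A sketch that should match the query above, assuming `length` contains no NULLs:

```sql
SELECT
  title,
  -- CUME_DIST() returns a fraction between 0 and 1; scale it to a percentage
  ROUND(100.0 * CUME_DIST() OVER (ORDER BY length), 1) AS percentile
FROM film
ORDER BY percentile DESC;
```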

# How to Get the First Row per Group in MySQL

Starting with version 8.0.2, MySQL offers a way to easily number rows. For example, to number the films in the [Sakila Sample Database](https://dev.mysql.com/doc/sakila/en/), grouped by rating and ordered by release year:

```sql
SELECT *,
  row_number() OVER (PARTITION BY rating ORDER BY release_year) as row_num
FROM film;
```

Now if you only want to get the first row for each rating, you can use a [common table expression](https://popsql.com/learn-sql/mysql/how-to-write-a-common-table-expression-in-mysql/):

```sql
WITH _films AS (
  SELECT *,
    row_number() OVER (PARTITION BY rating ORDER BY release_year) as row_num
  FROM film
)
SELECT *
FROM _films
WHERE row_num = 1;
```

# How to Avoid Gaps in Data in MySQL

If you're [grouping by time](https://popsql.com/learn-sql/mysql/how-to-group-by-time-in-mysql/) and you don't want gaps in your report data, you need to generate a series of time values and use it to do an outer join with your data. Prior to MySQL 8, you can do this using variables. In the following example we extracted the number of rentals per hour from the [Sakila Sample Database](https://dev.mysql.com/doc/sakila/en/):

```sql
-- The first line is to make the first value of statement below 0:00:00, not 1:00:00
-- LIMIT 720 gives 30 days worth of hourly values
SET @n:=('2005-05-25' - INTERVAL 1 HOUR);
SELECT
 hours.this_hour,
 count(rental.rental_id)
FROM
 (SELECT (SELECT @n:= @n + INTERVAL 1 HOUR) this_hour
 FROM inventory LIMIT 720) hours
LEFT JOIN rental ON (hours.this_hour=date_format(rental.rental_date,'%Y-%m-%d %H:00:00'))
GROUP BY hours.this_hour;
```

There are serious limitations with the above method. For one thing, you may ask what the `inventory` table has to do with rentals. It has nothing to do with them directly: it was chosen only because it has a few thousand rows, and we needed a table with at least 720 rows to generate our series of time values. In reality any table with enough rows can be used for this. But what if there is none available?

Fortunately, starting with MySQL 8, you can instead use a [common table expression](https://popsql.com/learn-sql/mysql/how-to-write-a-common-table-expression-in-mysql/):

```sql
WITH RECURSIVE my_hours AS
(
 SELECT 0 as inc
 UNION ALL
 SELECT 1+inc
 FROM my_hours WHERE inc < 719 -- inc runs from 0 to 719: 720 hourly values
)
SELECT
 hours.this_hour,
 count(rental.rental_id)
FROM
 (SELECT '2005-05-25' + interval inc hour as this_hour
 FROM my_hours) as hours
LEFT JOIN rental ON (hours.this_hour=date_format(rental.rental_date,'%Y-%m-%d %H:00:00'))
GROUP BY hours.this_hour;
```


# How to Do Type Casting in MySQL

By default, MySQL is not strict with type casting. For example, adding a numeric value in string quotes to another numeric value will not raise the errors that other databases and programming languages usually give:

```sql
mysql> select 1 + '1';
+---------+
| 1 + '1' |
+---------+
|       2 |
+---------+
```

However, should the need arise, you can use the `CAST()` function to force the type of a value. Target types include `BINARY`, `CHAR`, `DATE`, `DATETIME`, `TIME`, `DECIMAL`, `SIGNED` and `UNSIGNED`.

```sql
-- cast a decimal number to an unsigned integer
SELECT CAST(1.0 AS UNSIGNED);

-- cast string to date
SELECT CAST('2018-12-12' AS DATE);

-- cast string to decimal
SELECT CAST('12.345' AS DECIMAL(5,3));

-- cast string to time:
SELECT CAST('12:45' AS TIME);
```

# How to Write a Common Table Expression in MySQL

Common table expressions (CTEs) are a great way to break up complex queries. MySQL started supporting this in version 8. Here's a simple query to illustrate how to write a CTE:

```sql
with beta_users as (
  select *
  from users
  where beta is true
)
select events.*
from events
inner join beta_users on beta_users.id = events.user_id;
```

You can find more complex examples of using CTEs in [How to Avoid Gaps in Data in MySQL](https://popsql.com/learn-sql/mysql/how-to-avoid-gaps-in-data-in-mysql/) and in [Calculating Cumulative Sums in MySQL](https://popsql.com/learn-sql/mysql/how-to-calculate-cumulative-sum-running-total-in-mysql).

# How to Import a CSV in MySQL

Importing a CSV into MySQL requires you to [create a table](https://popsql.com/learn-sql/mysql/how-to-create-a-table-in-mysql) first. [Duplicating an existing table's structure](https://popsql.com/learn-sql/mysql/how-to-duplicate-a-table-in-mysql) might be helpful here too.

This tutorial focuses on the `LOAD DATA` statement, run here from the MySQL command line client. By default it assumes your source data is delimited by tabs (not commas), that fields are not enclosed in quotes (a tab inside a value must be escaped with a backslash), and that each row of data ends with the newline (`\n`) character.

First you need to connect to MySQL:

```shell
$ mysql -u username -p -h database.host.name database_name
```

The `-p` option will result in the user being prompted for a password. `-h database.host.name` is optional if the MySQL server is on the same host. Also optional is `database_name` but it is better to specify the database/schema name now rather than later.

Here are example uses of the `LOAD DATA` command:

```sql
-- source file is in the server's directory for the current database, is tab delimited, and the columns exactly match the table
LOAD DATA INFILE 'user_data.tsv' INTO TABLE users;

-- source file is local, is comma delimited and contains only some columns. Database is remote
-- the column list must come after the FIELDS/LINES clauses
LOAD DATA LOCAL INFILE '/tmp/user_data.csv'
  INTO TABLE users
  FIELDS TERMINATED BY ','
  (first_name, last_name, email);

-- source file is comma delimited, strings are enclosed by double quotes, lines are terminated by carriage return/newline pairs, has a single header row that has to be ignored
LOAD DATA INFILE 'data.txt' INTO TABLE my_table_name
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n'
  IGNORE 1 LINES;

-- Ignore a column in the source file by assigning it to a user variable and not assigning the variable to a table column
LOAD DATA INFILE 'data.txt'
  INTO TABLE my_table_name (column1, @dummy, column2, @dummy, column3);

-- Populate a column the source file does not have any data for using the SET clause
LOAD DATA INFILE 'file_no_timestamps.txt'
  INTO TABLE my_table_with_timestamps
  (column1, column2)
  SET timestamp_column = CURRENT_TIMESTAMP;
```
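If `LOAD DATA` is rejected, two server settings are the usual suspects: `secure_file_priv` restricts which directory the server may read from, and `local_infile` must be enabled for `LOAD DATA LOCAL`. A quick check, as a sketch:

```sql
-- Empty: no restriction; a path: only that directory; NULL: server-side LOAD DATA disabled
SHOW VARIABLES LIKE 'secure_file_priv';

-- Must be ON for LOAD DATA LOCAL INFILE; the mysql client may also need --local-infile=1
SHOW VARIABLES LIKE 'local_infile';
```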


# How to Compare Two Values When One is Null in MySQL

Imagine you're comparing two MySQL columns and you want to know how many rows are different. No problem, you think:

```sql
SELECT count(1)
FROM items
WHERE width != height;
```

Not so fast. If some of the widths or heights are null, they won't be counted! Surely that wasn't your intention. That's where you need a null-aware operator like `<=>`:

```sql
SELECT count(1)
FROM items
WHERE NOT (width <=> height);
```

Because `<=>` is actually a null-aware version of the equal operator, we need to negate it with `NOT`. Now, your count will be "null aware" and you'll get the result you want.


# How to Write a Case Statement in MySQL

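Case statements are useful when you're dealing with multiple `IF` statements in your select clause. They come in two forms: a "searched" `CASE` that evaluates a condition in each `WHEN`, and a "simple" `CASE` that compares one expression against a list of values: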

```sql
SELECT
 CASE
 WHEN score < 70 THEN 'failed'
 WHEN score BETWEEN 70 AND 80 THEN 'passed'
 WHEN score BETWEEN 81 AND 90 THEN 'very good'
 ELSE 'outstanding'
 END AS performance
FROM test_scores;

SELECT
 CASE grade
 WHEN 'A' THEN 'Excellent'
 WHEN 'B' THEN 'Good'
 WHEN 'C' THEN 'Needs Improvement'
 ELSE 'Failed'
 END AS grade_interpretation
FROM grades;
```

# How to Query a JSON Column in MySQL

Starting with version 5.7.8, MySQL supports JSON columns. This gives the advantage of storing and querying unstructured data. Here's how you can query a JSON column in MySQL:

```sql
-- Getting the params.name string value from events table
SELECT params->>'$.name'
FROM events;

-- Getting rows where the browser.name is Chrome
-- This also shows the difference of using -> vs ->>
-- Using -> will cause strings to be enclosed in quotes
SELECT browser->>'$.name', browser->'$.name'
FROM events
WHERE browser->>'$.name' = 'Chrome';

-- Get the first element of a JSON array (array indexes start at 0)
SELECT properties->>'$.my_array[0]'
FROM events;

-- Going deeper to get the X resolution only
SELECT properties->'$.resolution.x'
FROM events;
```
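The `->` and `->>` operators are shorthand for MySQL's JSON functions: `->` is `JSON_EXTRACT()`, and `->>` (added in MySQL 5.7.13) is `JSON_UNQUOTE(JSON_EXTRACT())`. The function forms are useful when the left-hand side isn't a plain column, since the operators only accept a column there. A sketch of the equivalents for the browser example above:

```sql
-- Same as browser->'$.name' (result keeps its JSON quotes)
SELECT JSON_EXTRACT(browser, '$.name')
FROM events;

-- Same as browser->>'$.name' (plain, unquoted string)
SELECT JSON_UNQUOTE(JSON_EXTRACT(browser, '$.name'))
FROM events;
```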

# How to Have Multiple Counts in MySQL

To do multiple counts in one query in MySQL, you can combine `COUNT()` with `IF()`. Because `COUNT()` counts every non-null value, the `IF()` must return `NULL` (not `0`) for rows that shouldn't be counted:

```sql
SELECT
  COUNT(1), -- Count all users
  COUNT(IF(gender='male', 1, NULL)), -- Count male users
  COUNT(IF(beta=true, 1, NULL)), -- Count beta users
  COUNT(IF(active=true AND beta = false, 1, NULL)) -- Count active non-beta users
FROM users;
```
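Because MySQL evaluates a comparison to 1 (true), 0 (false) or NULL, you can also `SUM()` the conditions directly, since `SUM()` ignores NULLs. A sketch producing the four counts described above; the one difference is that `SUM()` returns NULL instead of 0 when there is nothing to add up:

```sql
SELECT
  COUNT(1) AS all_users,
  SUM(gender = 'male') AS male_users,
  SUM(beta = true) AS beta_users,
  SUM(active = true AND beta = false) AS active_non_beta_users
FROM users;
```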


# How to Calculate Cumulative Sum-Running Total in MySQL

Let's say for our MySQL DVD rental database, we want to see a hockey stick graph of our cumulative rentals by day. First, we'll need a table with a day column and a count column:

```sql
SELECT
  date(rental_date) as day,
  count(rental_id) as rental_count
FROM rental
GROUP BY day;

day        | rental_count
-----------+--------------
2005-05-24 | 8
2005-05-25 | 137
2005-05-26 | 174
2005-05-27 | 166
2005-05-28 | 196
```

Then we use this to do our cumulative totals. Before MySQL version 8 you can use variables for this:

```sql
SELECT t.day,
       t.rental_count,
       @running_total:=@running_total + t.rental_count AS cumulative_sum
FROM
( SELECT
  date(rental_date) as day,
  count(rental_id) as rental_count
  FROM rental
  GROUP BY day ) t
JOIN (SELECT @running_total:=0) r
ORDER BY t.day;

day        | rental_count | cumulative_sum
-----------+--------------+----------------
2005-05-24 | 8            | 8
2005-05-25 | 137          | 145
2005-05-26 | 174          | 319
2005-05-27 | 166          | 485
2005-05-28 | 196          | 681
```

For MySQL 8 you can use a windowed `SUM()` instead, and a [MySQL common table expression (CTE)](https://popsql.com/learn-sql/mysql/how-to-write-a-common-table-expression-in-mysql/) in place of the subquery makes it more readable. The result is the same:

```sql
with data as (
  select
    date(rental_date) as day,
    count(rental_id) as rental_count
  from rental
  group by day
)

select
  day,
  rental_count,
  sum(rental_count) over (order by day) as cumulative_sum
from data
order by day;
```

# How to Insert in Redshift

## Basic

The simplest way to insert a row in Redshift is to use the `INSERT INTO` command and specify values for all columns. If you have 10 columns, you have to specify 10 values, in the same order as the columns were defined in the table:

```sql
-- Assuming the users table has only three columns: first_name, last_name, and email, and in that order
INSERT INTO users VALUES ('John', 'Doe', 'john@doe.com');
```

## Specifying a Column List

It's optional, but specifying a column list before the `VALUES` keyword is highly recommended:

```sql
INSERT INTO users (first_name, last_name, email, birth_date, city, state)
VALUES ('John', 'Doe', 'john@doe.com','2000-01-01','Los Angeles','CA');
```

Having a column list has the following advantages:

*   You don't have to remember the column order as defined in the table.
*   You don't have to specify a value for all columns, just the required ones.
*   In case there are many columns, it is easier to match a value to the column it's intended for when you see it in the statement, rather than having to look at the table definition.
*   `INSERT` statements without a column list are invalidated once a column is added to or removed from the table. You need to modify your query to reflect the new or deleted column for it to work again.

## Inserting Multiple Rows

You can insert multiple rows in one `INSERT` statement by having multiple sets of values enclosed in parentheses. It's faster to do one bulk insert rather than multiple individual inserts.

```sql
INSERT INTO users (first_name, last_name)
VALUES
  ('John','Lennon'),
  ('Paul','McCartney'),
  ('George','Harrison'),
  ('Ringo','Starr');
```

You can also use `INSERT` with a `SELECT` command to copy data from an existing table. Note that the `VALUES` keyword is omitted:

```sql
INSERT INTO beta_users (first_name, last_name)
SELECT first_name, last_name
FROM users
where beta = 1;
```

## Inserting JSON Strings

While Redshift does not support the JSON datatype, you can still store properly formatted JSON strings in a `CHAR` or `VARCHAR` column. `VARCHAR` is needed if the strings include multi-byte characters.

```sql
INSERT INTO test_json
VALUES ('{"beta": true, "status": "for review", "test_count": 1, "test_array":[1,2,3]}');
```

Note that Amazon Web Services [recommends using JSON sparingly](https://docs.aws.amazon.com/redshift/latest/dg/json-functions.html), because it does not leverage Redshift's design.
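Once stored, the string can be queried with Redshift's JSON functions. A sketch, assuming the single column of `test_json` is named `json_data` (a hypothetical name, since the table definition isn't shown):

```sql
SELECT
  json_extract_path_text(json_data, 'status') AS status,
  -- test_array comes back as JSON text; array positions start at 0
  json_extract_array_element_text(json_extract_path_text(json_data, 'test_array'), 0) AS first_test
FROM test_json;
```

For the row inserted above, this should return `for review` and `1`.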

## Handling Conflicts/Duplicates

Note that primary keys and unique constraints are not enforced in Redshift. Ensure your data is clean and duplicates have been eliminated before inserting into Redshift.
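One common way to do that is to load new rows into a staging table and insert only the ones not already present. A sketch, assuming a hypothetical `users_staging` table with the same columns as `users`, and that `email` identifies a user:

```sql
INSERT INTO users (first_name, last_name, email)
SELECT DISTINCT s.first_name, s.last_name, s.email
FROM users_staging s
LEFT JOIN users u ON u.email = s.email
WHERE u.email IS NULL;  -- keep only emails not yet in users
```

`DISTINCT` only removes rows that are identical in every selected column, so two staging rows with the same email but different names would still both be inserted.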

# How to Update in Redshift

To update all rows in a Redshift table, just use the `UPDATE` statement without a `WHERE` clause:

```sql
UPDATE products SET brand='Acme';
```

You can also update multiple columns at a time:

```sql
UPDATE products SET brand='Acme', category='Home Appliances';
```

Usually you only want to update rows that match a certain condition. You do this by specifying a `WHERE` clause:

```sql
--This will update only one row that matches product_id=1
UPDATE products SET stocks=100, available=true
WHERE product_id=1;

--This will update multiple rows that match Category='Electronics'
UPDATE products SET stocks=50, available=true
WHERE category='Electronics';
```
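To update rows based on values in another table, Redshift's `UPDATE` accepts a `FROM` clause. A sketch, assuming a hypothetical `stock_levels` table with `product_id` and `stocks` columns:

```sql
-- Copy fresh stock counts onto matching products
UPDATE products
SET stocks = s.stocks
FROM stock_levels s
WHERE products.product_id = s.product_id;
```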

# How to Delete in Redshift

To delete rows in a Redshift table, use the `DELETE FROM` statement:

```sql
DELETE FROM products WHERE product_id=1;
```

The `WHERE` clause is optional, but you'll usually want it, unless you really want to delete every row from the table.
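`DELETE` can also match rows against another table with `USING`, and when you really do want to empty a table, `TRUNCATE` is usually much faster than an unqualified `DELETE`. Be aware that `TRUNCATE` commits the transaction it runs in, so it can't be rolled back. A sketch with a hypothetical `discontinued_products` table:

```sql
-- Delete products listed in discontinued_products (hypothetical table)
DELETE FROM products
USING discontinued_products d
WHERE products.product_id = d.product_id;

-- Empty the whole table; commits immediately and cannot be rolled back
TRUNCATE products;
```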

# How to Create a Table in Redshift

Here's an example of creating a `users` table in Redshift:

```sql
CREATE TABLE users (
  id INTEGER IDENTITY(1,1) PRIMARY KEY, -- Auto incrementing IDs
  name character varying, -- String column without a length (Redshift defaults to 256 bytes)
  created_at timestamp without time zone -- Always store time in UTC
);
```

This is also a chance to specify [not null constraints and default values in Redshift](https://popsql.com/learn-sql/redshift/how-to-add-or-remove-default-values-or-null-constraints-to-a-column-in-redshift/):

```sql
create table users (
  id BIGINT primary key,
  name character varying not null,
  active boolean default true
);
```

Redshift's data types include:

*   SMALLINT (INT2)
*   INTEGER (INT, INT4)
*   BIGINT (INT8)
*   DECIMAL (NUMERIC)
*   REAL (FLOAT4)
*   DOUBLE PRECISION (FLOAT8)
*   BOOLEAN (BOOL)
*   CHAR (CHARACTER)
*   VARCHAR (CHARACTER VARYING)
*   DATE
*   TIMESTAMP
*   TIMESTAMPTZ

You can also create temporary tables that will stick around for the duration of your session. This is helpful to break down your analysis into smaller pieces.

```sql
-- Create a temporary table called `scratch_users` with just an `id` column
create temporary table scratch_users (id integer);

-- Or create a temporary table based on the output of a select
create temp table active_users
as
select * from users where active is true;
```

This concludes the basics of creating tables in Redshift. In [How to Use DISTKEY, SORTKEY and Define Column Compression Encoding in Redshift](https://popsql.com/learn-sql/redshift/how-to-use-distkey-sortkey-and-define-column-compression-encoding-in-redshift/) we will cover more advanced, Redshift-specific table creation options.

# How to Use DISTKEY, SORTKEY and Define Column Compression Encoding in Redshift

Designing tables properly is critical to successful use of any database, and is emphasized a lot more in specialized databases such as Redshift. This article covers the options to use when creating tables to ensure performance, and continues from [Redshift table creation basics](https://popsql.com/learn-sql/redshift/how-to-create-a-table-in-redshift/).

## Selecting Sort Keys

When you create a table on Redshift, you can (and should) specify one or more columns as the sort key. You can think of a sort key as a specialized type of index, since Redshift does not have the regular indexes found in other relational databases. Redshift stores data on disk in sorted order according to the sort key, which has an important effect on query performance.

You choose sort keys based on the following criteria:

* If recent data is queried most frequently, specify the timestamp column as the leading column.
* If you frequently filter by a range of values or a single value on one column, that column should be the sort key.
* Columns frequently used in joins should be used as the sort key.

Here are some examples of defining the sort key:

```sql
-- sale_date is the timestamp column
CREATE TABLE sales (
 sale_id BIGINT NOT NULL PRIMARY KEY,
 sale_date timestamp NOT NULL SORTKEY,
 ... <other columns>
);

-- use the SORTKEY table attribute keyword to create a multi-column sort key
-- In this case searches are done frequently by the location columns,
-- so state and city are part of sort key
CREATE TABLE dim_customers (
 ... <some columns>...
 state VARCHAR,
 city VARCHAR
)
SORTKEY (state, city);
```
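A `SORTKEY (...)` table attribute like the one above creates a compound sort key, which works best when queries filter on the leading column (`state` here). Redshift also accepts explicit `COMPOUND` and `INTERLEAVED` keywords; an interleaved key gives each column equal weight, which may help when queries filter on `city` alone, at the cost of extra maintenance. A minimal sketch, assuming the hypothetical tables `dim_customers_compound` and `dim_customers_interleaved`. The final query reads `PG_TABLE_DEF`, which only lists tables in schemas on your search path:

```sql
-- Compound sort key (what a plain SORTKEY (...) creates):
-- rows are sorted by state first, then by city within each state
CREATE TABLE dim_customers_compound (
 customer_id INT NOT NULL,
 state VARCHAR,
 city VARCHAR
)
COMPOUND SORTKEY (state, city);

-- Interleaved sort key: state and city get equal weight,
-- which may help queries that filter on city alone
CREATE TABLE dim_customers_interleaved (
 customer_id INT NOT NULL,
 state VARCHAR,
 city VARCHAR
)
INTERLEAVED SORTKEY (state, city);

-- Show each column's position in the sort key (0 = not part of it)
SELECT tablename, "column", sortkey
FROM pg_table_def
WHERE tablename IN ('dim_customers_compound', 'dim_customers_interleaved');
```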

## Selecting Distribution Styles

When you create a Redshift cluster, you define the number of nodes you want to use. The nodes work in parallel to speed up query execution. This also means that when you load data into a table, Redshift distributes the rows of the table to each of the node slices according to the table's distribution style.
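To see how many slices your cluster has, and therefore how many places Redshift can spread a table's rows across, you can query the `STV_SLICES` system table. A minimal sketch; depending on your privileges, some system tables may return limited information:

```sql
-- Count the slices on each compute node
SELECT node, COUNT(*) AS slices
FROM stv_slices
GROUP BY node
ORDER BY node;
```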

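There are four distribution styles:

* **AUTO Distribution**: This is the default when you don't specify a distribution style. Redshift chooses a style based on the size of the table; for example, it may start a small table as ALL and switch it to KEY or EVEN as the table grows.
* **EVEN Distribution**: This uses a simple round-robin method to distribute rows, regardless of their values. This is appropriate when a table is not used in queries with joins or when there is no clear choice of distribution method between the next two.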
* **KEY Distribution**: The values in one column are used to determine the row distribution. Redshift will attempt to place matching values on the same node slice. Use this for tables that are frequently joined together so that Redshift will collocate the rows of the tables with the same values of the joining columns on the same node slices. This makes execution of the joins much faster since the matching values of the common columns are physically stored together.
* **ALL Distribution**: Using this will cause a copy of the entire table to be stored on each node. This is normally used for small but frequently joined tables such as lookup tables.

You set a table's distribution with the `DISTSTYLE` and/or `DISTKEY` keywords:

```sql
-- Specifying a column as DISTKEY automatically sets distribution style to KEY
CREATE TABLE sales (
 sale_id BIGINT NOT NULL PRIMARY KEY,
 sale_date timestamp NOT NULL SORTKEY,
 customer_id int DISTKEY,
 amount float
);

-- Use DISTSTYLE table attribute to set it to ALL
CREATE TABLE attribute_lookup (
 attribute_id INT NOT NULL PRIMARY KEY,
 attribute_name VARCHAR
)
DISTSTYLE ALL;
```
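Because the `sales` table above uses `customer_id` as its distribution key, a customer table distributed on the same column lets Redshift join the two without shuffling rows between slices. The sketch below assumes a hypothetical `customers` table and also makes `customer_id` its sort key, since a frequently joined column is a good candidate for both. The last query reads each table's distribution style from `SVV_TABLE_INFO`, which does not list empty tables, so load some data first:

```sql
-- Hypothetical customers table distributed on the same join column as sales
CREATE TABLE customers (
 customer_id INT NOT NULL DISTKEY SORTKEY,
 customer_name VARCHAR(100)
);

-- Matching customer_id values from both tables are stored on the same slice,
-- so this join does not need to redistribute rows across nodes
SELECT c.customer_name, SUM(s.amount) AS total_amount
FROM sales s
JOIN customers c ON c.customer_id = s.customer_id
GROUP BY c.customer_name;

-- Check the distribution style and first sort key column of each table
SELECT "table", diststyle, sortkey1
FROM svv_table_info
WHERE "table" IN ('sales', 'customers');
```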

## Specifying Column Compression Encoding

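Compression, which you define per column, reduces the size of stored data. Smaller data means less disk I/O, which improves query performance.

If you do not specify an encoding for a column, Redshift assigns one:

* All columns in temporary tables are assigned RAW compression by default.
* Columns defined as sort keys are assigned RAW compression.
* BOOLEAN, REAL, and DOUBLE PRECISION columns are assigned RAW compression.
* SMALLINT, INTEGER, BIGINT, DECIMAL, DATE, TIMESTAMP, and TIMESTAMPTZ columns are assigned AZ64 compression.
* CHAR and VARCHAR columns are assigned LZO compression.

In current versions of Redshift, if you specify no encoding for any column, the table uses `ENCODE AUTO` and Redshift manages the encodings for you; the defaults above apply once you set an encoding on at least one column.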

For example, if you want to force a VARCHAR column to use RAW compression:

```sql
CREATE TABLE attribute_lookup (
 attribute_id INT NOT NULL PRIMARY KEY,
 attribute_name VARCHAR ENCODE RAW
);
```
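You can also mix encodings in one table and ask Redshift for suggestions once data is loaded. A sketch, assuming a hypothetical `events` table: `ANALYZE COMPRESSION` samples the table and reports a suggested encoding for each column without changing the table, and `PG_TABLE_DEF` shows the encodings currently in use (for tables in schemas on your search path):

```sql
-- Hypothetical table with an explicit encoding on every column
CREATE TABLE events (
 event_id BIGINT ENCODE AZ64 NOT NULL,
 event_time TIMESTAMP ENCODE RAW SORTKEY NOT NULL, -- sort key left uncompressed
 event_name VARCHAR(100) ENCODE ZSTD,
 is_test BOOLEAN ENCODE RAW
);

-- After loading data: report suggested encodings (does not modify the table)
ANALYZE COMPRESSION events;

-- List the encoding each column currently uses
SELECT "column", type, encoding
FROM pg_table_def
WHERE tablename = 'events';
```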

See the [Redshift Documentation](https://docs.aws.amazon.com/redshift/latest/dg/c_Compression_encodings.html#compression-encoding-list) for details on the different compression encodings.


# How to Drop a Table in Redshift

Dropping a table in Redshift is simple:

```sql
DROP TABLE users;
```

Be careful before running this command: it deletes the table and all of its data, and there is no "undo".
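Redshift's `DROP TABLE` also accepts a few options that are handy in scripts. A sketch; `staging_users`, `staging_orders`, and `user_events` are hypothetical table names:

```sql
-- Don't raise an error if the table doesn't exist
DROP TABLE IF EXISTS users;

-- Drop several tables in one statement
DROP TABLE IF EXISTS staging_users, staging_orders;

-- Also drop objects that depend on the table, such as views built on it
DROP TABLE IF EXISTS user_events CASCADE;
```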

# How to Rename a Table in Redshift

Redshift allows renaming a table using the `ALTER TABLE` syntax:

```sql
ALTER TABLE old_table_name RENAME TO new_table_name;
```
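A common use of renaming is swapping a freshly rebuilt table into place. A minimal sketch, assuming hypothetical `users` and `users_new` tables; running both renames in one transaction is meant to ensure other sessions never see the name `users` missing:

```sql
BEGIN;

-- Move the current table out of the way
ALTER TABLE users RENAME TO users_old;

-- Give the rebuilt table the production name
ALTER TABLE users_new RENAME TO users;

COMMIT;

-- Once you've checked the new table, clean up the old one
DROP TABLE users_old;
```

One caveat: regular views that referenced `users` may stay bound to the renamed `users_old` table (and block the final `DROP`), whereas late-binding views created `WITH NO SCHEMA BINDING` look up the table by name at query time.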

# How to Truncate a Table in Redshift

Be very careful with this command. It empties your Redshift table, and there is no undo: `TRUNCATE` commits the transaction it runs in, so it can't be rolled back. This is useful in development, but you'll rarely want to do this in production.

```sql
TRUNCATE TABLE table_name;

--the TABLE keyword is actually optional:
TRUNCATE table_name;
```
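If you might need to undo the operation, a sketch of the slower alternative: `DELETE` without a `WHERE` clause also removes every row but, unlike `TRUNCATE`, it can be rolled back. `table_name` is the same placeholder as above:

```sql
BEGIN;

-- Remove every row; this stays inside the open transaction
DELETE FROM table_name;

-- Changed your mind? Undo the delete (use COMMIT to keep it instead)
ROLLBACK;
```

After a committed `DELETE`, the rows are only marked as deleted; `VACUUM DELETE ONLY table_name;` reclaims the space, although Redshift also runs this kind of vacuum automatically in the background.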

# How to Duplicate a Table in Redshift

You can duplicate or "clone" a Redshift table's contents by executing a `CREATE TABLE ... AS SELECT` statement:

```sql
CREATE TABLE new_table AS SELECT * FROM original_table;
```

Please be careful when using this to clone big tables. This can take a lot of time and server resources.

Note also that `new_table` takes only its column names and data types from the query. It does not inherit table attributes from `original_table`, such as the primary key, constraints, column default values, or the declared distribution and sort keys.
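Since `CREATE TABLE ... AS` doesn't carry over the distribution and sort keys, you can declare them yourself in the same statement. A sketch, assuming the `sales` table from the distribution examples earlier (with `customer_id` and `sale_date` columns); `sales_copy` is a hypothetical name:

```sql
-- Clone sales and set its distribution and sort keys explicitly,
-- instead of letting Redshift choose them from the query plan
CREATE TABLE sales_copy
DISTKEY (customer_id)
SORTKEY (sale_date)
AS
SELECT * FROM sales;
```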

To copy the table definition as well, including its distribution style, sort keys, and NOT NULL settings, use the `CREATE TABLE ... LIKE` syntax:

```sql
CREATE TABLE new_table (LIKE original_table);
```

This copies the structure of `original_table` into `new_table` (though not its primary or foreign key constraints), but _does not_ copy the data. To copy the data, you'll need `INSERT ... SELECT`:

```sql
INSERT INTO new_table SELECT * FROM original_table;
```
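Putting the two steps together, a sketch of a full copy followed by a quick row-count check. `INCLUDING DEFAULTS` is optional; without it, column default expressions are not copied. `new_table` and `original_table` are the same placeholders as above:

```sql
-- Copy the structure, including column default expressions
CREATE TABLE new_table (LIKE original_table INCLUDING DEFAULTS);

-- Copy the data
INSERT INTO new_table SELECT * FROM original_table;

-- Sanity check: the two counts should match
SELECT
  (SELECT COUNT(*) FROM original_table) AS original_rows,
  (SELECT COUNT(*) FROM new_table) AS copied_rows;
```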

Again, be careful when doing this to big tables.