rel:: [[Programming Languages]]
rel:: [Against SQL](x-devonthink-item://C236F225-B503-41F6-AA74-CEB634A77D15)
# PRQL
> **P**ipelined **R**elational **Q**uery **L**anguage, pronounced "Prequel".
>
> [...] Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it transpiles to SQL.
## Principles
> PRQL is intended to be a modern, simple, declarative language for transforming data, with abstractions such as variables & functions. It's intended to replace SQL, but doesn't have ambitions as a general-purpose programming language. While it's at a pre-alpha stage, it has some immutable principles:
>
> - _Pipelined_ — PRQL is a linear pipeline of transformations — each line of the query is a transformation of the previous line's result. This makes it easy to read, and simple to write. This is also known as "[point-free style](https://en.wikipedia.org/w/index.php?title=Point-free_programming)".
> - _Simple_ — PRQL serves both sophisticated engineers and analysts without coding experience. By providing simple, clean abstractions, the language can be both powerful and easy to use.
> - _Compatible_ — PRQL transpiles to SQL, so it can be used with any database that uses SQL, and with any existing tools or programming languages that manage SQL. PRQL should allow for a gradual onramp — it should be practical to mix SQL into a PRQL query where PRQL doesn't yet have an implementation. Where possible PRQL can unify syntax across databases.
> - _Analytical_ — PRQL's focus is analytical queries; we de-emphasize other SQL features such as inserting data or transactions.
> - _Extensible_ — PRQL can be extended through its abstractions, and can evolve without breaking backward-compatibility, because its queries can specify their PRQL version.
## Reference
- [source](https://github.com/max-sixty/prql)
- [hacker news discussion](https://news.ycombinator.com/item?id=30060784#30062329)
- [similar projects](https://github.com/max-sixty/prql#similar-projects)
### Syntax Example
```SQL
SELECT TOP 20
title,
country,
AVG(salary) AS average_salary,
SUM(salary) AS sum_salary,
AVG(salary + payroll_tax) AS average_gross_salary,
SUM(salary + payroll_tax) AS sum_gross_salary,
AVG(salary + payroll_tax + benefits_cost) AS average_gross_cost,
SUM(salary + payroll_tax + benefits_cost) AS sum_gross_cost,
COUNT(*) as count
FROM employees
WHERE salary + payroll_tax + benefits_cost > 0 AND country = 'USA'
GROUP BY title, country
ORDER BY sum_gross_cost
HAVING count > 200
```
In PRQL this would be expressed:
```elm
from employees
filter country = "USA" # Each line transforms the previous result.
derive [ # This adds columns / variables.
gross_salary: salary + payroll_tax,
gross_cost: gross_salary + benefits_cost # Variables can use other variables.
]
filter gross_cost > 0
aggregate by:[title, country] [ # `by` are the columns to group by.
average salary, # These are aggregation calcs run on each group.
sum salary,
average gross_salary,
sum gross_salary,
average gross_cost,
sum_gross_cost: sum gross_cost,
ct: count *,
]
sort sum_gross_cost
filter ct > 200
take 20
```
Note `derive` variables.
### Syntax example with functions
```sql
SELECT
date,
-- Can't use a `WHERE` clause, as it would affect the row that the `LAG` function referenced.
IF(is_valid_price, price_adjusted / LAG(price_adjusted, 1) OVER
(PARTITION BY sec_id ORDER BY date) - 1 + dividend_return, NULL) AS return_total,
IF(is_valid_price, price_adjusted_usd / LAG(price_adjusted_usd, 1) OVER
(PARTITION BY sec_id ORDER BY date) - 1 + dividend_return, NULL) AS return_usd,
IF(is_valid_price, price_adjusted / LAG(price_adjusted, 1) OVER
(PARTITION BY sec_id ORDER BY date) - 1 + dividend_return, NULL)
- interest_rate / 252 AS return_excess,
IF(is_valid_price, price_adjusted_usd / LAG(price_adjusted_usd, 1) OVER
(PARTITION BY sec_id ORDER BY date) - 1 + dividend_return, NULL)
- interest_rate / 252 AS return_usd_excess
FROM prices
```
In PRQL this would be expressed
```elm
prql version:0.1 db:snowflake # Version number & database name.
func lag_day x = (
window x
by sec_id
sort date
lag 1
)
func ret x = x / (x | lag_day) - 1 + dividend_return
func excess x = (x - interest_rate) / 252
func if_valid x = is_valid_price ? x : null
from prices
derive [
return_total: prices_adj | ret | if_valid # `|` can be used rather than newlines.
return_usd: prices_usd | ret | if_valid
return_excess: return_total | excess
return_usd_excess: return_usd | excess
]
select [
date,
sec_id,
return_total,
return_usd,
return_excess,
return_usd_excess,
]
```