These notes were collected from the web and are intended for lookup and study only; not all of the content is guaranteed to be correct.
Contents
- I. Presto basic operations
- II. Time functions
  - 0. Current date / current time
  - 1. Converting to a timestamp
    - 1) String to timestamp (recommended)
    - 2) Parsing a string into a timestamp with an explicit format
    - 3) bigint to timestamp
  - 2. Converting to / extracting a date
    - 1) Date from a timestamp
    - 2) Date from a string
    - 3) Date from a bigint
  - 3. Date arithmetic: difference, add/subtract, truncate, extract
    - 1) Time difference: date_diff
    - 2) N days before or after: interval, date_add
    - 3) Truncation: date_trunc(unit, x)
    - 4) Extraction: extract, year, month, day
  - 4. Converting to int
- III. String functions
- IV. Binary functions (similar to string functions)
- V. Regular expressions
- VI. Aggregate functions
- VII. Window functions
- VIII. Array, Map, and JSON functions
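I. Presto basic operations
Logical operators: AND, OR, NOT
Comparison operators: > < >= <= = <> !=
Range operators: between ... and ...; not between ... and ...; ...
Null checks: is null; is not null
Maximum and minimum: greatest(1,2,3); least(1,2,3)
Conditional expressions:
- case when ... then ... end
- if(condition, true_value, false_value)
- nullif(value1, value2): returns null if value1 = value2, otherwise returns value1
- try(expression): returns null if evaluating the expression fails (guards against division by zero, numeric values out of range, invalid casts, and so on)
Conversion functions:
- cast(value as type)
- try_cast(value as type): returns null if the conversion fails
- typeof(expr): returns the data type of the expression
Math operators and functions:
- + - * / %
- abs(x): absolute value
- ceil(x): round up; floor(x): round down
- pow(x,p); power(x,p): x^p
- rand(); random(): random number in [0,1)
- round(x): round to the nearest integer
- round(x,d): round to d decimal places
- nan(): not a number; is_nan(x): tests whether x is NaN
Note: / behaves differently from Hive. Dividing two integers in Presto is integer division, so 10/6 = 1, while Hive returns 10/6 = 1.6666666666666667. To get the fractional result in Presto, cast one operand first: cast(10 as double)/6 = 1.6666666666666667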
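The integer-division note above can be illustrated outside Presto. The following is a minimal Python sketch, not Presto code: `presto_int_div` and `double_div` are hypothetical helper names, and the truncate-toward-zero behavior for negative operands is an assumption about Presto's integer division rather than something stated in these notes.

```python
def presto_int_div(a: int, b: int) -> int:
    """Model integer / integer: the fractional part is dropped.

    Assumption: the result truncates toward zero (so -10 / 6 gives -1).
    """
    quotient = abs(a) // abs(b)
    return quotient if (a >= 0) == (b >= 0) else -quotient


def double_div(a: int, b: int) -> float:
    """Model cast(a as double) / b: one operand is promoted to double first."""
    return float(a) / b


print(presto_int_div(10, 6))  # 1
print(double_div(10, 6))      # 1.6666666666666667
```

The second call mirrors the workaround in the note: promoting one operand to double before dividing keeps the fractional part.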
II. Time functions
0. Current date / current time
current_timestamp is equivalent to now().
presto:adm> select current_date, current_time, current_timestamp;
   _col0    |      _col1       |            _col2
------------+------------------+-----------------------------
 2019-04-28 | 13:04:22.232 PRC | 2019-04-28 13:04:22.232 PRC
1. Converting to a timestamp
1) String to timestamp (recommended)
For example, '2019-04-26' becomes 2019-04-26 00:00:00.000.
select cast('2019-04-26' as timestamp)
-- 2019-04-26 00:00:00.000
select cast('2019-04-26 01:22:23' as timestamp)
-- 2019-04-26 01:22:23.000
2) Parsing a string into a timestamp with an explicit format: date_parse(string, format)
select date_parse('2019-04-06','%Y-%m-%d')
-- 2019-04-06 00:00:00.000
select date_parse('2019-04-06 00:03:55','%Y-%m-%d %H:%i:%S')
-- 2019-04-06 00:03:55.000
Note: the string and the format must match exactly. The following calls all fail:
select date_parse('2019-04-06','%Y-%m-%d %H:%i:%S')
-- Invalid format: "2019-04-06" is too short
select date_parse('2019-04-06 00:03:55','%Y-%m-%d')
-- Invalid format: "2019-04-06 00:03:55" is malformed at " 00:03:55"
select date_parse('2019-04-06 00:03:55','%Y%m%d %H:%i:%S')
-- Invalid format: "2019-04-06 00:03:55" is malformed at "-04-06 00:03:55"
Note: to format a timestamp as a string, use format_datetime(timestamp, 'yyyy-MM-dd HH:mm:ss').
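The requirement that the string and the format match exactly is not specific to Presto. As a rough analogy only, here is a Python sketch using `datetime.strptime`; note that Python writes minutes as `%M` where the Presto format above uses `%i`, and `parse_or_none` is a hypothetical helper introduced for illustration.

```python
from datetime import datetime

# Python equivalents of the formats used above (minutes are %M here, %i in Presto).
FULL_FORMAT = "%Y-%m-%d %H:%M:%S"
DAY_FORMAT = "%Y-%m-%d"


def parse_or_none(text, fmt):
    """Return the parsed datetime, or None when text does not match fmt exactly."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


print(parse_or_none("2019-04-06 00:03:55", FULL_FORMAT))  # 2019-04-06 00:03:55
print(parse_or_none("2019-04-06", FULL_FORMAT))           # None (string too short)
print(parse_or_none("2019-04-06 00:03:55", DAY_FORMAT))   # None (trailing time part)
```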
3) bigint to timestamp
For example, an integer Unix timestamp becomes 2017-05-10 06:18:50.000.
from_unixtime(create_time)
Supplement: converting a time back to a Unix timestamp:
select to_unixtime(current_date);
-- 1556380800
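2. Converting to / extracting a date
Recommended approach: convert the value to a timestamp first, then take the date with date_format (returns a varchar) or date() (returns a date).
1) Date from a timestamp
For example, 2017-09-18 13:40:31 becomes 2017-09-18.
select date_format(current_timestamp,'%Y-%m-%d')
select date(current_timestamp)
select cast(current_timestamp as date)
-- 2019-04-28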
2) Date from a string
select date(cast('2019-04-28 10:28:00' as TIMESTAMP))
select date('2019-04-28')
select date_format(cast('2019-04-28 10:28:00' as TIMESTAMP),'%Y-%m-%d')
select to_date('2019-04-28','yyyy-mm-dd');
-- 2019-04-28
Note: date and to_date fail when the string does not match the expected format:
select date('2019-04-28 10:28:00')
-- failed: Value cannot be cast to date: 2019-04-28 10:28:00
select to_date('2019-04-28 10:28:00','yyyy-mm-dd');
-- Invalid format: "2019-04-28 10:28:00" is malformed at " 10:28:00"
3) Date from a bigint
select date(from_unixtime(1556380800))
select date_format(from_unixtime(1556380800),'%Y-%m-%d')
-- 2019-04-28
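3. Date arithmetic: difference, add/subtract, truncate, extract
1) Time difference: date_diff
date_diff(unit, timestamp1, timestamp2) → bigint
eg:
select date_diff('day',cast('2019-04-24' as TIMESTAMP),cast('2019-04-26' as TIMESTAMP))
-- 2
Note: the argument order differs from Hive.
In Presto, date_diff('day', date1, date2) returns date2 - date1 (second minus first).
In Hive and MySQL, datediff(date1, date2) returns date1 - date2 (first minus second).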
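The sign convention above is easy to get backwards, so here is a small Python sketch that models both orderings. `presto_date_diff_day` and `hive_datediff` are hypothetical names that only mimic the argument order described in the note; they are not part of either engine.

```python
from datetime import date


def presto_date_diff_day(date1: date, date2: date) -> int:
    """Model Presto's date_diff('day', date1, date2): second minus first."""
    return (date2 - date1).days


def hive_datediff(date1: date, date2: date) -> int:
    """Model Hive/MySQL datediff(date1, date2): first minus second."""
    return (date1 - date2).days


earlier, later = date(2019, 4, 24), date(2019, 4, 26)
print(presto_date_diff_day(earlier, later))  # 2
print(hive_datediff(earlier, later))         # -2
print(hive_datediff(later, earlier))         # 2
```

When porting a Hive query to Presto, swapping the two date arguments keeps the sign of the result unchanged.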
2) N days before or after: interval, date_add
select current_date, (current_date - interval '7' day), date_add('day', -7, current_date)
-- 2019-04-28 | 2019-04-21 | 2019-04-21
select current_date, (current_date + interval '7' day), date_add('day', 7, current_date)
-- 2019-04-28 | 2019-05-05 | 2019-05-05
3) Truncation: date_trunc(unit, x)
Truncate to the first day of the month:
select date_trunc('month',current_date)
-- 2019-04-01
Truncate to the first day of the year:
select date_trunc('year',current_date)
-- 2019-01-01
4) Extraction: extract, year, month, day
extract(field FROM x) → bigint (note: field is written without quotes)
year(x), month(x), day(x)
eg:
select extract(year from current_date), year(current_date), extract(month from current_date), month(current_date), extract(day from current_date), day(current_date);
-- 2019 | 2019 | 4 | 4 | 28 | 28
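4. Converting to int
Approach: convert to a timestamp first, then use to_unixtime to get the Unix time.
to_unixtime(timestamp_col)
Note: to_unixtime returns a double; wrap it as cast(to_unixtime(timestamp_col) as bigint) when an integer is required.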
III. String functions
String literals in Presto must use single quotes
Note: Hive accepts either single or double quotes around string literals, but Presto only accepts single quotes. A double-quoted value is treated as an identifier (a column name), which is why the first query below fails.
eg:
presto:adm> select d_module from adm.f_app_video_vv where dt='2019-04-27' and d_module="為你推薦-大屏" limit 10;
Query 20190428_034805_00112_ym89j failed: line 1:76: Column '為你推薦-大屏' cannot be resolved
presto:adm> select d_module from adm.f_app_video_vv where dt='2019-04-27' and d_module='為你推薦-大屏' limit 10;
   d_module
---------------
 為你推薦-大屏
 為你推薦-大屏
 為你推薦-大屏
 為你推薦-大屏
 為你推薦-大屏
 為你推薦-大屏
 為你推薦-大屏
 為你推薦-大屏
 為你推薦-大屏
 為你推薦-大屏
(10 rows)
Basic string functions: concat, length, lower, upper
Concatenate: concat(string1, ..., stringN) → varchar
Length: length(string) → bigint
Convert all letters to lowercase: lower(string) → varchar
Convert all letters to uppercase: upper(string) → varchar
eg:
select lower('ABc'), upper('ABc')
-- abc, ABC
String padding: lpad, rpad
Left pad: lpad(string, size, padstring) → varchar
- If string is shorter than size, padstring is repeated on the left until the length equals size.
- If string is longer than size, the result is truncated to the leftmost size characters of string.
eg:
select lpad('csdfasg',10,'a')
-- aaacsdfasg
select lpad('csdfasg',3,'a')
-- csd
Right pad: rpad(string, size, padstring) → varchar
Trimming whitespace: ltrim, rtrim, trim
Remove leading whitespace: ltrim(string) → varchar
Remove trailing whitespace: rtrim(string) → varchar
Remove leading and trailing whitespace: trim(string) → varchar
Replacing substrings: replace
Remove every occurrence of search from string: replace(string, search)
Replace every occurrence of search in string with replace: replace(string, search, replace)
eg:
select replace('23543','2'), replace('23543','2','8')
-- 3543, 83543
Splitting strings: split
Split a string on a delimiter:
split(string, delimiter) -> array(varchar)
eg:
select split('325f243f325f43','f');
-- [325, 243, 325, 43]
Split only up to the (limit-1)-th delimiter, so the result has at most limit elements:
split(string, delimiter, limit) -> array(varchar)
eg:
select split('325f243f325f43','f',2);
-- [325, 243f325f43]
select split('325f243f325f43','f',3);
-- [325, 243, 325f43]
Return the piece at a given position (note: index starts at 1):
split_part(string, delimiter, index)
eg:
select split_part('325f243f325f43','f', 4)
-- 43
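To make the limit and index conventions concrete, here is a Python sketch that reproduces the results above. `presto_split` and `presto_split_part` are hypothetical helpers written for this illustration; returning None for an out-of-range index is an assumption standing in for SQL null.

```python
def presto_split(string, delimiter, limit=None):
    """Model split(string, delimiter[, limit]): at most `limit` pieces are returned."""
    if limit is None:
        return string.split(delimiter)
    # Python's maxsplit counts delimiters consumed, hence limit - 1.
    return string.split(delimiter, limit - 1)


def presto_split_part(string, delimiter, index):
    """Model split_part: 1-based index; None (standing in for null) if out of range."""
    parts = string.split(delimiter)
    return parts[index - 1] if 1 <= index <= len(parts) else None


print(presto_split("325f243f325f43", "f"))          # ['325', '243', '325', '43']
print(presto_split("325f243f325f43", "f", 2))       # ['325', '243f325f43']
print(presto_split("325f243f325f43", "f", 3))       # ['325', '243', '325f43']
print(presto_split_part("325f243f325f43", "f", 4))  # 43
```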
Locating substrings: strpos, position
Return the position of the first occurrence of a substring in a string, counting from 1:
strpos(string, substring) → bigint
position(substring IN string) → bigint
Extracting substrings: substr
Return everything from position start to the end of the string (start included); substring(...) behaves the same:
substr(string, start) → varchar
eg:
select substr('325f243f325f43', 3), substr('325f243f325f43', -3)
-- 5f243f325f43, f43
Return length characters starting at position start (start included); substring(...) behaves the same:
substr(string, start, length) → varchar
eg:
select substr('325f243f325f43', 3, 3), substr('325f243f325f43', -3, 2)
-- 5f2, f4
Extension: combining substr with strpos to cut out the text between two markers:
substr(remark,strpos(remark,'title'),strpos(remark,'status')-strpos(remark,'title')-3)
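The 1-based and negative start positions can be modeled in a few lines of Python. This is a sketch under stated assumptions: `presto_strpos` and `presto_substr` are hypothetical helpers, and returning an empty string for a start of 0 or an out-of-range start is an assumption about edge cases that the notes above do not cover.

```python
def presto_strpos(string, substring):
    """Model strpos: 1-based position of the first match, 0 when not found."""
    return string.find(substring) + 1


def presto_substr(string, start, length=None):
    """Model substr: start is 1-based; a negative start counts from the end."""
    if start == 0:
        return ""  # assumption: position 0 yields an empty string
    begin = start - 1 if start > 0 else len(string) + start
    if begin < 0 or begin >= len(string):
        return ""  # assumption: out-of-range start yields an empty string
    end = len(string) if length is None else begin + max(length, 0)
    return string[begin:end]


s = "325f243f325f43"
print(presto_strpos(s, "f"))     # 4
print(presto_substr(s, 3))       # 5f243f325f43
print(presto_substr(s, -3))      # f43
print(presto_substr(s, 3, 3))    # 5f2
print(presto_substr(s, -3, 2))   # f4
```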
Other
String to UTF-8 bytes: to_utf8(string) → varbinary
Supplement:
Binary to integer (CRC-32 checksum): crc32(binary) → bigint
Binary to string: from_utf8(binary) → varchar
eg:
select to_utf8('你好'), crc32(to_utf8('你好')), from_utf8(to_utf8('你好'))
-- e4 bd a0 e5 a5 bd | 1352841281 | 你好
IV. Binary functions (similar to string functions)
length, concat, substr, lpad, rpad, and so on
md5(binary) → varbinary
crc32(binary) → bigint
eg:
presto:adm> select to_utf8('為你推薦-大屏'), crc32(to_utf8('為你推薦-大屏'));
                      _col0                      |   _col1
-------------------------------------------------+------------
 e4 b8 ba e4 bd a0 e6 8e a8 e8 8d 90 2d e5 a4 a7 | 4200009045
 e5 b1 8f                                        |
(1 row)
V. Regular expressions
Returning the parts of a string that match a pattern: regexp_extract_all, regexp_extract
Return all substrings that match pattern:
regexp_extract_all(string, pattern) -> array(varchar)
eg: SELECT regexp_extract_all('1a 2b 14m', '\d+'); -- [1, 2, 14]
Return the first substring that matches pattern:
regexp_extract(string, pattern) → varchar
eg: SELECT regexp_extract('1a 2b 14m', '\d+'); -- 1
For every match of a pattern with capturing groups, return the content of the given group:
regexp_extract_all(string, pattern, group) -> array(varchar)
eg: SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2); -- ['a', 'b', 'm']
For the first match of a pattern with capturing groups, return the content of the given group:
regexp_extract(string, pattern, group) → varchar
eg: SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2); -- 'a'
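The difference between "all matches", "first match", and "a given capturing group" can be tried out with Python's `re` module. This is only an analogy: Presto uses a Java-style regex dialect, and the sketch assumes the two dialects agree on these simple patterns.

```python
import re

text = "1a 2b 14m"

# All matches of the pattern (compare regexp_extract_all(string, pattern)).
all_numbers = re.findall(r"\d+", text)

# First match only (compare regexp_extract(string, pattern)).
first_number = re.search(r"\d+", text).group(0)

# Group 2 of every match (compare regexp_extract_all(string, pattern, 2)).
all_suffixes = [m.group(2) for m in re.finditer(r"(\d+)([a-z]+)", text)]

# Group 2 of the first match (compare regexp_extract(string, pattern, 2)).
first_suffix = re.search(r"(\d+)([a-z]+)", text).group(2)

print(all_numbers)   # ['1', '2', '14']
print(first_number)  # 1
print(all_suffixes)  # ['a', 'b', 'm']
print(first_suffix)  # a
```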
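Testing whether a string matches a pattern: regexp_like
It can be thought of as a combination of several LIKE conditions, and is reported to be more efficient than chaining them. The pattern only needs to match somewhere in the string, and spaces inside the pattern are significant.
regexp_like(string, pattern) → boolean
eg:
SELECT regexp_like('1a 2b 14m', '\d+n'), regexp_like('1a 2b 14m', '\d+m'), regexp_like('1a 2b 14m', '\d+n | \d+m')
-- false, true, true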
Replacing the parts of a string that match a pattern: regexp_replace
Remove every substring that matches pattern (replace it with nothing):
regexp_replace(string, pattern) → varchar
eg: SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m'
Replace every substring that matches pattern with replacement:
regexp_replace(string, pattern, replacement) → varchar
eg:
SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', 'new'); -- 'newnew14m'
SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 '); -- '3ca 3cb 14m'
Note: $2 refers to the content of the second capturing group.
Replace every substring that matches pattern with the result of a function:
regexp_replace(string, pattern, function) → varchar
eg: SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2])); -- 'New York'
Splitting a string by pattern: regexp_split
Split the string wherever pattern matches:
regexp_split(string, pattern) -> array(varchar)
eg:
presto:adm> SELECT regexp_split('1a 2b 14m', '\s'), regexp_split('1a 2b 14m', '[a-z]+');
     _col0     |     _col1
---------------+----------------
 [1a, 2b, 14m] | [1, 2, 14, ]
VI. Aggregate functions
Sum: sum
Maximum and minimum: max, min
Maximum: max(x) → [same as input]
The n largest values: max(x, n) → array<[same as x]>
Minimum: min(x) → [same as input]
The n smallest values: min(x, n) → array<[same as x]>
Note 1: Hive has no max(x, n) or min(x, n).
Note 2: compared with rank, max(x, n) and min(x, n) are simpler to write, but they cannot directly return the other columns of the matching rows.
eg:
select max(m_vvpv,3) from app.c_app_videodiscover_uv where dt='2019-04-27';
-- [3333, 2222, 1111]
Extended maximum and minimum: max_by, min_by
The x value associated with the largest y: max_by(x, y) → [same as x]
The x values associated with the n largest y values: max_by(x, y, n) → array<[same as x]>
The x value associated with the smallest y: min_by(x, y) → [same as x]
The x values associated with the n smallest y values: min_by(x, y, n) → array<[same as x]>
eg:
presto:adm> select max_by(d_module_type,m_vvpv) from app.c_app_videodiscover_uv where dt='2019-04-27';
 _col0
-------
 其他
presto:adm> select max_by(d_module_type,m_vvpv,3) from app.c_app_videodiscover_uv where dt='2019-04-27';
       _col0
--------------------
 [其他, 搜索, 首頁]
-- Equivalent to the following Hive query (except that max_by does not return m_vvpv)
select d_module_type,m_vvpv
from app.c_app_videodiscover_uv
where dt='2019-04-27'
order by m_vvpv desc
limit 3
-- d_module_type | m_vvpv
-- 其他          | 3333
-- 搜索          | 2222
-- 首頁          | 1111
Typical use: the most-played videos in a video table, the users with the most check-ins in a user table, and similar cases where no aggregation is needed.
Note: max_by cannot directly produce a top-N over aggregated values such as the following:
-- Hive aggregation
select d_module_type,sum(m_vvpv) m_vv
from app.c_app_videodiscover_uv
where dt='2019-04-27'
group by d_module_type
order by m_vv desc
limit 3
-- 相關推薦 | 33333
-- 首頁     | 22222
-- 搜索     | 11111
Counting: count, count_if
Count: count()
Count rows that satisfy a condition: count_if() (not available in Hive; equivalent to Hive's sum(if(condition,1,0)))
eg:
presto:adm> select count_if(d_module='為你推薦-大屏') from adm.f_app_video_vv where dt='2019-04-27';
 _col0
-------
  6666
Approximate counting: approx_distinct
approx_distinct(x) → bigint
An approximation of count(distinct x). It is faster than count(distinct x), with a standard error of about 2.3%.
eg:
select approx_distinct(d_diu) from adm.f_app_video_vv where dt='2019-04-27' and d_module='為你推薦-大屏';
select count(distinct d_diu) from adm.f_app_video_vv where dt='2019-04-27' and d_module='為你推薦-大屏';
Grouped counting: histogram
Return a map from each distinct x value to the number of times it occurs: histogram(x) -> map(K, bigint)
eg:
select histogram(client)
from app.c_app_videodiscover_uv
where dt='2019-04-27'
-- {其他=3, IOS=4, Android=4}
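VII. Window functions
Example of a window function used for ranking within groups:
row_number() over (partition by u_appname order by share_dnu desc) rank
Comparison of the ranking window functions row_number, rank, and dense_rank:
1. row_number(): numbers the rows 1, 2, 3, ..., n in order, whether or not there are ties. eg: 1 2 3 4 5 6 7
2. rank(): the rank of the row within its partition; tied rows share a rank and leave a gap after it. eg: 1 2 3 3 5 6 7
3. dense_rank(): the rank of the row within its partition; tied rows share a rank and no gap is left. eg: 1 2 3 3 4 5 6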
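The three numbering schemes can be reproduced with a short Python sketch. `ranking_columns` is a hypothetical helper, and the input values are invented for illustration; the only requirement is that the list is already sorted in the window's order and contains one tie.

```python
def ranking_columns(sorted_values):
    """Return (row_number, rank, dense_rank) for each value of an already sorted list."""
    rows = []
    rank = dense_rank = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(sorted_values, start=1):
        if value != previous:
            rank = row_number  # rank jumps to the current row position, leaving gaps
            dense_rank += 1    # dense_rank advances by exactly one, leaving no gaps
            previous = value
        rows.append((row_number, rank, dense_rank))
    return rows


# Illustrative values sorted descending, with a tie in the 3rd and 4th rows.
result = ranking_columns([70, 60, 50, 50, 40, 30, 20])
print([r[0] for r in result])  # [1, 2, 3, 4, 5, 6, 7]
print([r[1] for r in result])  # [1, 2, 3, 3, 5, 6, 7]
print([r[2] for r in result])  # [1, 2, 3, 3, 4, 5, 6]
```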
Dividing the sorted rows of each partition into n buckets numbered 1 to n: ntile(n) → bigint
When the rows do not divide evenly, the earlier buckets receive one extra row each.
eg:
select client,d_module_type,m_vvpv,ntile(3) over (order by m_vvpv desc) rank
from app.c_app_videodiscover_uv
where dt='2019-04-27'
 client  | d_module_type | m_vvpv | rank
---------+---------------+--------+------
 Android | 其他          |   7777 |    1
 Android | 搜索          |   6666 |    1
 Android | 首頁          |   5555 |    1
 Android | 相關推薦      |   4444 |    1
 IOS     | 其他          |   3333 |    2
 IOS     | 搜索          |   2222 |    2
 IOS     | 相關推薦      |   1111 |    2
 IOS     | 首頁          |    999 |    2
 其他    | 相關推薦      |     88 |    3
 其他    | 首頁          |      1 |    3
 其他    | 其他          |   NULL |    3
(11 rows)
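The 4 / 4 / 3 split in the output above follows from how the rows are distributed over the buckets. A minimal Python sketch of that distribution, assuming the usual rule that leftover rows go to the first buckets; `ntile_buckets` is a hypothetical helper, not a Presto function.

```python
def ntile_buckets(row_count, n):
    """Return the bucket number assigned to each of row_count sorted rows."""
    base, extra = divmod(row_count, n)
    buckets = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets take one additional row each.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile_buckets(11, 3))  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
```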
Relative rank within the partition, computed as (rank - 1) / (rows in partition - 1): percent_rank() → double
eg:
select client,d_module_type,m_vvpv,percent_rank() over (partition by client order by m_vvpv desc) rank
from app.c_app_videodiscover_uv
where dt='2019-04-27'
 client  | d_module_type | m_vvpv |        rank
---------+---------------+--------+--------------------
 Android | 其他          |   7777 | 0.0
 Android | 搜索          |   6666 | 0.3333333333333333
 Android | 首頁          |   5555 | 0.6666666666666666
 Android | 相關推薦      |   4444 | 1.0
 其他    | 相關推薦      |     88 | 0.0
 其他    | 首頁          |      1 | 0.5
 其他    | 其他          |   NULL | 1.0
 IOS     | 其他          |   3333 | 0.0
 IOS     | 搜索          |   2222 | 0.3333333333333333
 IOS     | 相關推薦      |   1111 | 0.6666666666666666
 IOS     | 首頁          |    999 | 1.0
(11 rows)
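The values in the rank column above are consistent with the standard definition (rank - 1) / (rows in partition - 1). The Python sketch below recomputes them for one partition at a time; `percent_rank` here is a hypothetical helper, and returning 0.0 for a single-row partition is an assumption.

```python
def percent_rank(sorted_values):
    """Compute (rank - 1) / (n - 1) for each value of one sorted partition."""
    n = len(sorted_values)
    if n <= 1:
        return [0.0] * n  # assumption: a single-row partition gets 0.0
    result = []
    rank = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(sorted_values, start=1):
        if value != previous:
            rank = position  # tied rows keep the rank of the first tied row
            previous = value
        result.append((rank - 1) / (n - 1))
    return result


print(percent_rank([7777, 6666, 5555, 4444]))  # [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]
print(percent_rank([88, 1, None]))             # [0.0, 0.5, 1.0]
```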
VIII. Array, Map, and JSON functions
Arrays:
SELECT ARRAY [1,2] -- [1, 2]
array_distinct(x) → array
array_max(x) → x
array_min(x) → x
array_sort(x) → array
Map:
map_keys(x(K, V)) -> array(K)
map_values(x(K, V)) -> array(V)
element_at(map(K, V), key) → V
Extension: turn the keys of a map into an array and return true if that array contains 'cid':
contains(map_keys(event_args),'cid') = true
Json:
Check whether a JSON value is a scalar (a number, a string, true, false, or null): is_json_scalar(u_bigger_json)
eg:
select is_json_scalar(u_bigger_json)
from edw.user_elog
where dt='2019-04-27'
limit 3
-- false
-- false
-- false
String to JSON, recommended: json_parse(u_bigger_json)
eg:
select json_parse(u_bigger_json)
from edw.user_elog
where dt='2019-04-27'
limit 3
-- {"u_rank":",0,1,2","u_recsid":",100002,100002,100002","u_rmodelid":",17,17,17",
-- {"u_abtag":"35","u_device_s":"HWMYA-L6737","u_frank":"8","u_package":"com.bokec
-- {"u_abtag":"97","u_all_startid":"1556315003775","u_buglyupdate":"1","u_device_s
String to JSON, not recommended: cast(u_bigger_json as json). The cast wraps the whole text as a single JSON string value instead of parsing it, as the escaped output shows.
eg:
select cast(u_bigger_json as json) from edw.user_elog where dt='2019-04-27' limit 10;
-- "{\"u_vpara\":\"0\",\"u__\":\"1556317886230\",\"u_callback\":\"jQuery17206597692994207338_1556317875402\"}"
Get the value of a key in the JSON:
select json_extract_scalar(json_parse(u_bigger_json),'$.u_abtag')
from edw.user_elog
where dt='2019-04-27'
limit 30
-- -------
-- 29
-- 21
-- 16
-- ~
Check whether a value exists in a JSON array (given as a JSON-formatted string):
json_array_contains(json, value) → boolean
SELECT json_array_contains('[1, 2, 3]', 2)
Check whether the JSON contains a given key:
Method 1 (does not work):
select json_array_contains('[1, 2, u_p_source, 3]', 'u_p_source')
Method 2: combine split with cardinality (which returns the length of an array):
SELECT split('[1, 2, u_p_source, 3]', 'u_p_source'), split('[1, 2, 3]', 'u_p_source'), cardinality(split('[1, 2, u_p_source, 3]', 'u_p_source')), cardinality(split('[1, 2, 3]', 'u_p_source'))
-- ["[1, 2, ",", 3]"]
-- ["[1, 2, 3]"]
-- 2
-- 1
That is: where cardinality(split(u_bigger_json,{{ para }}))>1
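The split-and-count check is a plain substring test, so it also fires when the key name appears only inside a value. The Python sketch below contrasts it with parsing the text first; both helper names are hypothetical, and the sample strings are made up for illustration.

```python
import json


def has_key_by_split(text, key):
    """Model cardinality(split(text, key)) > 1: true if key occurs anywhere in text."""
    return len(text.split(key)) > 1


def has_key_by_parsing(text, key):
    """Parse the text as JSON and test for a top-level key; False if it is not valid JSON."""
    try:
        value = json.loads(text)
    except ValueError:
        return False
    return isinstance(value, dict) and key in value


with_key = '{"u_p_source": "home", "u_num": "3"}'
only_in_value = '{"note": "u_p_source"}'

print(has_key_by_split(with_key, "u_p_source"))         # True
print(has_key_by_parsing(with_key, "u_p_source"))       # True
print(has_key_by_split(only_in_value, "u_p_source"))    # True (false positive)
print(has_key_by_parsing(only_in_value, "u_p_source"))  # False
```

Whether the false positive matters depends on the data; for log fields where the key name never occurs in values, the substring check is cheaper.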
Extension: extracting the value of a key from JSON stored as a string
select dt,
-- method 1: split the string
sum(cast(split(split(split(split(u_bigger_json,'u_num')[2],',')[1],':')[2],'"')[2] as int)) flower_send_pv,
-- method 2: parse the string as JSON, then extract the value
sum(cast(json_extract_scalar(json_parse(u_bigger_json),'$.u_num') as int)) flower_send_pv_2,
count(distinct u_diu) flower_send_uv
from edw.user_ilog
where dt = cast(current_date - interval '1' day as varchar)
and u_mod='flower'
and u_ac='new_send'
group by dt
     dt     | flower_send_pv | flower_send_pv_2 | flower_send_uv
------------+----------------+------------------+----------------
 2019-04-27 |           8888 |             8888 |           5678