做直通車任務(wù)的網(wǎng)站做個網(wǎng)頁需要多少錢?
利用二手車數(shù)據(jù)進行可視化分析
- 查看原始數(shù)據(jù)
- 去除重復數(shù)據(jù)
- 需求分析
- 1.統(tǒng)計全國總共有多少量二手車,用KPI圖進行展示
- 2.統(tǒng)計安徽總共有多少量二手車,用KPI圖進行展示
- 3.統(tǒng)計合肥總共有多少量二手車,用KPI圖進行展示
- 4.取最貴的10輛二手車信息,用列表圖展示
- 5.各品牌二手車數(shù)量,取Top10用餅圖展示
- 6.各品牌價格最貴的二手車
- 7.各品牌價格最低的二手車
- 8.各城市二手車數(shù)量,取Top10用餅圖展示
- 9.幾幾年款的二手車平均價格走勢,取最近10年的數(shù)據(jù)用折線圖展示
- 10.統(tǒng)計每座城市二手車數(shù)量,并通過地圖展示
- 利用生成的表在FineBI中繪圖
- 鏈接數(shù)據(jù)庫
- 傳入數(shù)據(jù)庫中的表
- 新建分析進行繪圖
查看原始數(shù)據(jù)
查看MySQL中爬取完成的數(shù)據(jù)發(fā)現(xiàn)有十萬多條,接下來清理一下這些數(shù)據(jù)看看有沒有重復的數(shù)據(jù)將這些數(shù)據(jù)剔除
select car_id from car_info group by car_id
可以看到不重復的數(shù)據(jù)有七萬多條,有將近三萬條的重復數(shù)據(jù),接下來就去除這些重復數(shù)據(jù)
去除重復數(shù)據(jù)
先查看一下重復的數(shù)據(jù)是什么樣的
查看一下重復車型的car_id 有好多重復10次以上 復制一些看看這些車的信息
select car_id,count(*) cnt
from car_info
group by car_id
having cnt>2
order by cnt descc6a6fa03344447c1
d6d5b6c63184f41cselect * from car_info where car_id = 'c6a6fa03344447c1'
可以看到一些一樣的車不止掛在一個地方而是掛在離一個城市周邊的地區(qū)都會掛牌,導致重復的數(shù)據(jù)很高,現(xiàn)在我們根據(jù)車輛的一些信息分類然后將數(shù)據(jù)進行清洗。
按照車型,公里數(shù),首付,總價這些車子定量不變的信息進行分類,可以看到清洗出來了8萬條數(shù)據(jù),這些數(shù)據(jù)都是不相同的,因為有些車子雖然car_id相等 但是車子的型號等信息不相等。
# 去除重復數(shù)據(jù)
select car_id,car_info,car_year,car_mile,car_price_total,car_price_pyment
from car_info
group by car_id,car_info,car_year,car_mile,car_price_total,car_price_pyment
接著將清洗好的數(shù)據(jù)保存到另一張表下,避免修改源數(shù)據(jù),
# 將清洗好的數(shù)據(jù)保存在另外一張表下 因為使用group分組 所以用max就可以很容易取到city
create table if not exists car_detail as
(select car_id,car_info,car_year,car_mile,car_price_total,car_price_pyment,max(city_name) as city,max(city_code) as city_code,max(created_at) as creat_time,max(updated_at) as update_time
from car_info
group by car_id,car_info,car_year,car_mile,car_price_total,car_price_pyment)
需求分析
畫圖需要使用python繪圖或者使用專用軟件繪圖
使用python繪圖可以使用pyecharts
這里我使用軟件fineBI繪圖
1.統(tǒng)計全國總共有多少量二手車,用KPI圖進行展示
create table if not exists ques1 as
(select count(distinct car_id) as all_car
from car_detail)
2.統(tǒng)計安徽總共有多少量二手車,用KPI圖進行展示
create table if not exists ques2 as
(select count(distinct car_id) as anhui_car
from car_detail
where city_code in ('hf','wuhu','bengbu','bozhou','hn','mas','huaibei'
,'tongling','anqing',"huangshan",'chuzhou','chizhou','fy','suzhou','la','xuancheng'))
3.統(tǒng)計合肥總共有多少量二手車,用KPI圖進行展示
create table if not exists ques3 as
(select count(distinct car_id) as hf_car
from car_detail
where city_code = 'hf')
4.取最貴的10輛二手車信息,用列表圖展示
create table if not exists ques4 as
(select city,car_info,car_id,car_mile,car_price_total
from car_detail
order by car_price_total desc
limit 10)
5.各品牌二手車數(shù)量,取Top10用餅圖展示
create table if not exists ques5 as
(select bend,count(*) as cont
from
(# SUBSTRING_INDEX(str,delim,count) 類似與split的函數(shù)
select SUBSTRING_INDEX(t1.car_info,'-',1) as bend
from car_detail as t1) as tt1
group by bend
order by cont desc
limit 10)
6.各品牌價格最貴的二手車
create table if not exists ques6 as
(select bend,ROUND(max(tt1.car_price_total),2) as max_price
from (select SUBSTRING_INDEX(t1.car_info,'-',1) as bend,t1.car_price_total
from car_detail as t1) as tt1
group by tt1.bend
order by max_price desc)
7.各品牌價格最低的二手車
create table if not exists ques7 as
(select bend,ROUND(min(tt1.car_price_total),2) as min_price
from (select SUBSTRING_INDEX(t1.car_info,'-',1) as bend,t1.car_price_total
from car_detail as t1) as tt1
group by tt1.bend
order by min_price)
8.各城市二手車數(shù)量,取Top10用餅圖展示
create table if not exists ques8 as
(select city,count(*) as cont
from car_detail
group by city
order by cont desc)
9.幾幾年款的二手車平均價格走勢,取最近10年的數(shù)據(jù)用折線圖展示
create table if not exists ques9 as
(select t1.car_first_year as car_year,round(avg(t1.car_price_total),2) as price
from
(select year(car_year) as car_first_year,car_price_total
from car_detail) as t1
group by t1.car_first_year
order by car_year desc)
10.統(tǒng)計每座城市二手車數(shù)量,并通過地圖展示
create table if not exists ques10 as
(select city,count(*) as car_number
from car_detail
group by city
order by car_number desc)
利用生成的表在FineBI中繪圖
鏈接數(shù)據(jù)庫
傳入數(shù)據(jù)庫中的表
新建分析進行繪圖
繪圖完成