Lu Chunli's work notes: who says a programmer can't have a touch of literary flair?
Create a mywork database in Hive; the tests that follow are run in this database so that the default database does not have to be used every time.
hive> create database mywork;
OK
Time taken: 0.487 seconds
hive> show databases;
OK
default
mywork
Time taken: 0.614 seconds, Fetched: 2 row(s)
hive>
hive> use mywork;
OK
Time taken: 0.064 seconds
hive> create table student(id int, name string);
OK
Time taken: 0.519 seconds
hive>
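Before going to HDFS, the location that the new database maps to can also be read straight from the metastore. A minimal sketch (this statement is not part of the original session):

-- Prints the database name, comment, HDFS location and owner; for mywork the location
-- should match the warehouse path shown in the listing that follows.
describe database mywork;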
Check how Hive stores this on HDFS:
[hadoop@dnode1 ~]$ hdfs dfs -ls -R /user/hive
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 21:37 /user/hive/warehouse
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 21:36 /user/hive/warehouse/mywork.db
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 21:36 /user/hive/warehouse/mywork.db/student
[hadoop@dnode1 ~]$
Hive supports the following data types.
Primitive types:
TINYINT     1 byte
SMALLINT    2 bytes
INT         4 bytes
BIGINT      8 bytes
BOOLEAN     true/false
FLOAT       4 bytes
DOUBLE      8 bytes
STRING      character string
BINARY      (available from Hive 0.8.0)
TIMESTAMP   (available from Hive 0.8.0)
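As a quick sanity check of these types, an ad-hoc query with explicit casts can be run from the CLI. A minimal sketch (not part of the original notes; Hive 1.2 supports SELECT without a FROM clause):

-- STRING to TIMESTAMP, INT to BOOLEAN, and a DOUBLE literal narrowed to FLOAT.
select cast('2015-12-08 21:37:00' as timestamp),
       cast(0 as boolean),
       cast(3.14 as float);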
Complex types:
arrays:  ARRAY<data_type>                                         ordered fields, all of the same type
maps:    MAP<primitive_type, data_type>                           unordered key/value pairs
structs: STRUCT<col_name : data_type [COMMENT col_comment], ...>  a group of named fields
union:   UNIONTYPE<data_type, data_type, ...>
Note: ARRAY values are fetched by index, e.g. arrays[0]; MAP values are accessed by key, e.g. map_col['key_name']; STRUCT fields are accessed with dot notation, e.g. structs.col_name.
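UNIONTYPE is the one complex type above that the employee example below does not exercise. A minimal sketch (the table union_demo is made up for illustration; the empty student table created earlier is reused only for its int/string schema):

-- A column that can hold either an int or a string in each row.
create table union_demo (u uniontype<int, string>);
-- create_union(tag, value0, value1, ...) builds a union value; the tag picks the branch,
-- so tag 0 stores student.id (the int) and tag 1 would store student.name (the string).
insert into table union_demo select create_union(0, id, name) from student;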
Table creation example:
hive> create table employee (
    >     eno int comment 'the no of employee',
    >     ename string comment 'name of employee',
    >     salary float comment 'salary of employee',
    >     subordinates array<string> comment 'employees managed by current employee',
    >     deductions map<string, float> comment 'deductions',
    >     address struct<province : string, city : string, street : string, zip : int> comment 'address'
    > ) comment 'This is table of employee info';
OK
Time taken: 0.33 seconds
hive>
In Hive, different delimiters separate the columns from one another and the elements inside complex types; each line of data corresponds to one record.
Create a file named data_default.txt under the ${HIVE_HOME}/data directory, using the default delimiters; its content is described field by field below.
Hive's default field delimiter is the ASCII control character \001, which corresponds to fields terminated by '\001' in the table definition; by default \002 separates collection items and \003 separates map keys from values. To prepare the data, open the file in vi and press Ctrl+V followed by Ctrl+A to insert \001 (displayed as ^A); likewise \002 is entered with Ctrl+V, Ctrl+B, and so on. (An alternative using printf is sketched after the field breakdown below.)
Field-by-field breakdown of the record:
1000                              employee id
zhangsan                          employee name
5000.0                            employee salary
lisi^Bwangwu                      subordinate employees
ptax^C200^Bpension^C200           salary deductions (e.g. taxes)
shandong^Bheze^Bdingtao^B274106   home address (for a struct only the values are given)
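If typing the control characters in vi is inconvenient, the same record can be generated with printf, whose format string accepts octal escapes. A minimal sketch, assuming a bash shell and ${HIVE_HOME} as the working directory:

# Three-digit octal escapes are used so that a digit following the delimiter is not
# absorbed into the escape: \001 = ^A (fields), \002 = ^B (collection items), \003 = ^C (map key/value).
printf '1000\001zhangsan\0015000.0\001lisi\002wangwu\001ptax\003200\002pension\003200\001shandong\002heze\002dingtao\002274106\n' > data/data_default.txt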
Load the data:
hive> load data local inpath 'data/data_default.txt' into table employee;
Loading data to table mywork.employee
Table mywork.employee stats: [numFiles=1, numRows=0, totalSize=83, rawDataSize=0]
OK
Time taken: 0.426 seconds
hive> select * from employee;
OK
1000    zhangsan    5000.0    ["lisi","wangwu"]    {"ptax":200.0,"pension":200.0}    {"province":"shandong","city":"heze","street":"dingtao","zip":274106}
Time taken: 0.114 seconds, Fetched: 1 row(s)
# Complex-type columns are queried as follows
hive> select eno, ename, salary, subordinates[0], deductions['ptax'], address.province from employee;
OK
1000    zhangsan    5000.0    lisi    200.0    shandong
Time taken: 0.129 seconds, Fetched: 1 row(s)
hive>
Check the resulting layout on HDFS:
[hadoop@dnode1 ~]$ hdfs dfs -ls -R /user/hive/warehouse/
drwxrw-rw-   - hadoop hadoop          0 2015-12-09 00:00 /user/hive/warehouse/mywork.db
drwxrw-rw-   - hadoop hadoop          0 2015-12-09 00:00 /user/hive/warehouse/mywork.db/employee
-rwxrw-rw-   2 hadoop hadoop         83 2015-12-09 00:00 /user/hive/warehouse/mywork.db/employee/data_default.txt
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 23:03 /user/hive/warehouse/mywork.db/student
[hadoop@dnode1 ~]$ hdfs dfs -text /user/hive/warehouse/mywork.db/employee/data_default.txt
1000zhangsan5000.0lisiwangwuptax200pension200shandonghezedingtao274106
[hadoop@dnode1 ~]$
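Because the default delimiters are non-printing characters, the -text output above looks like one run-together string. Piping the file through od makes them visible (a sketch, not part of the original session):

# od -c prints non-printing bytes as octal escapes, so the \001/\002/\003 delimiters show up explicitly.
hdfs dfs -cat /user/hive/warehouse/mywork.db/employee/data_default.txt | od -c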
Custom delimiters:
hive> create table employee_02 (
    >     eno int comment 'the no of employee',
    >     ename string comment 'name of employee',
    >     salary float comment 'salary of employee',
    >     subordinates array<string> comment 'employees managed by current employee',
    >     deductions map<string, float> comment 'deductions',
    >     address struct<province : string, city : string, street : string, zip : int> comment 'address'
    > ) comment 'This is table of employee info'
    > row format delimited fields terminated by '\t'
    > collection items terminated by ','
    > map keys terminated by ':'
    > lines terminated by '\n';
OK
Time taken: 0.228 seconds
hive> load data local inpath 'data/data_employee02.txt' into table employee_02;
Loading data to table mywork.employee_02
Table mywork.employee_02 stats: [numFiles=1, totalSize=99]
OK
Time taken: 0.371 seconds
hive> select * from employee_02;
OK
1000    'zhangsan'    5000.0    ["'lisi'","'wangwu'"]    {"'ptax'":200.0,"'pension'":200.0}    {"province":"'shandong'","city":"'heze'","street":"'dingtao'","zip":274106}
Time taken: 0.101 seconds, Fetched: 1 row(s)
hive>
The content of data/data_employee02.txt is:
[hadoop@nnode data]$ pwd
/usr/local/hive1.2.0/data
[hadoop@nnode data]$ cat data_employee02.txt
1000    'zhangsan'    5000.0    'lisi','wangwu'    'ptax':200,'pension':200    'shandong','heze','dingtao',274106
[hadoop@nnode data]$
Note: because the text file contains single quotes, after loading into the Hive table the values are displayed with the single quotes still inside the double-quoted output; the single quotes are treated as part of the key or value, which is something to watch out for.
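If the data file cannot simply be rewritten without the quotes, the stray quotes can at least be stripped at query time for the plain string columns. A minimal sketch using regexp_replace (not part of the original notes):

-- Removes the literal single quotes that were loaded as part of the value.
select eno, regexp_replace(ename, "'", '') as ename, salary from employee_02;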
View the detailed table definitions:
# Created with the default settings
hive> describe formatted employee;
OK
# col_name              data_type                                                      comment
eno                     int                                                            the no of employee
ename                   string                                                         name of employee
salary                  float                                                          salary of employee
subordinates            array<string>                                                  employees managed by current employee
deductions              map<string,float>                                              deductions
address                 struct<province:string,city:string,street:string,zip:int>     address

# Detailed Table Information
Database:               mywork
Owner:                  hadoop
CreateTime:             Tue Dec 08 23:10:07 CST 2015
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://cluster/user/hive/warehouse/mywork.db/employee
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   true
        comment                 This is table of employee info
        numFiles                1
        numRows                 0
        rawDataSize             0
        totalSize               83
        transient_lastDdlTime   1449590423

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.098 seconds, Fetched: 37 row(s)

# Created with custom delimiters
hive> describe formatted employee_02;
OK
# col_name              data_type                                                      comment
eno                     int                                                            the no of employee
ename                   string                                                         name of employee
salary                  float                                                          salary of employee
subordinates            array<string>                                                  employees managed by current employee
deductions              map<string,float>                                              deductions
address                 struct<province:string,city:string,street:string,zip:int>     address

# Detailed Table Information
Database:               mywork
Owner:                  hadoop
CreateTime:             Wed Dec 09 00:12:53 CST 2015
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://cluster/user/hive/warehouse/mywork.db/employee_02
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   true
        comment                 This is table of employee info
        numFiles                1
        totalSize               99
        transient_lastDdlTime   1449591260

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        colelction.delim        ,
        field.delim             \t
        line.delim              \n
        mapkey.delim            :
        serialization.format    \t
Time taken: 0.116 seconds, Fetched: 39 row(s)
hive>
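The same delimiter settings can also be read back in DDL form; SHOW CREATE TABLE (supported by Hive 1.2) prints the statement including the ROW FORMAT clause recorded under Storage Desc Params above. A short sketch, not part of the original session:

-- Prints the full CREATE TABLE statement, including ROW FORMAT DELIMITED ... and the LOCATION.
show create table employee_02;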
Open issue:
hive> delete from student;
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
hive>
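The failure is expected with the default configuration: DELETE and UPDATE require Hive's ACID support, which needs the DbTxnManager plus a bucketed, transactional ORC table. A sketch of the usual setup for Hive 1.2, not verified in these notes (the table student_acid is made up for illustration):

-- Session/hive-site.xml settings required before DELETE/UPDATE will run.
set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;

-- Only bucketed ORC tables marked transactional accept DELETE/UPDATE.
create table student_acid (id int, name string)
clustered by (id) into 2 buckets
stored as orc
tblproperties ('transactional'='true');

delete from student_acid where id = 1000;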
Caveat: if an SQL statement itself contains tab-formatted content, problems can arise.