Hive (10 AM to 1:00 PM) Lab 1 Notes : Hive Inner and External Tables

hive> create table samp1(line string);
-- Here we did not select any database; the default database in Hive is "default".
-- The HDFS location of the default database is /user/hive/warehouse.
-- When you create a table in the default database, a directory with the table
-- name is created under the warehouse location: in HDFS, the directory
-- /user/hive/warehouse/samp1 is created.

hive> create database mydb;
-- When a database is created, under the warehouse location, one directory with
-- the database name and the extension ".db" will be crea ..read more
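
The excerpt is cut off before the external-table half of the title. A minimal sketch of the inner-vs-external contrast, with a hypothetical location path:

hive> create table samp2(line string);
-- inner (managed) table: dropping it removes both the metadata and the
-- warehouse directory /user/hive/warehouse/samp2.

hive> create external table samp3(line string)
    > location '/user/cloudera/samp3';   -- hypothetical HDFS path
-- external table: dropping it removes only the metadata; the data in
-- /user/cloudera/samp3 survives in HDFS.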
Pig Video Lessons

Pig class links:

PigLab1 Video: https://drive.google.com/file/d/0B6ZYkhJgGD6XTzVHbzBYUFY0a1k/view?usp=sharing
PigLab1 Notes: https://drive.google.com/file/d/0B6ZYkhJgGD6XeU9tUF9aS3QxUWc/view?usp=sharing
PigLab2 Video: https://drive.google.com/file/d/0B6ZYkhJgGD6XNnhvZUN5eTJSaHM/view?usp=sharing
PigLab2 Notes: https://drive.google.com/file/d/0B6ZYkhJgGD6Xd0ZHb1hWZVhjbmc/view?usp=sharing
PigLab3 Video: https://drive.google.com/file/d/0B6ZYkhJgGD6XY3ZTWFFZZ3VMcnM/view?usp=sharing
PigLab3 Notes: https://drive.google.com/file/d/0B6ZYkhJgGD6Xb1k1aklZOXdjaUE/view?usp=sharing
PigLab4 Video [it's a ..read more
Hive Partitioned tables [case study]

[cloudera@quickstart ~]$ cat saleshistory
01/01/2011,2000
01/01/2011,3000
01/02/2011,5000
01/02/2011,4000
01/02/2011,1000
01/03/2011,2000
01/25/2011,3000
01/25/2011,5000
01/29/2011,4000
01/29/2011,1000
02/01/2011,2000
02/01/2011,3000
02/02/2011,8000
03/02/2011,9000
03/02/2011,3000
03/03/2011,5000
03/25/2011,7000
03/25/2011,2000
04/29/2011,5000
04/29/2011,3000
05/01/2011,2000
05/01/2011,3000
05/02/2011,5000
05/02/2011,4000
06/02/2011,1000
06/03/2011,2000
06/25/2011,3000
07/25/2011,5000
07/29/2011,4000
07/29/2011,1000
08/01/2011,2000
08/01/2011,3000
08/02/2011,5000
09/02/2011,4000
09/02/2011,10 ..read more
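
The Hive side of the case study is truncated. A minimal sketch of one way to partition this file by month; the table names, column names, and month derivation are assumptions, not the post's actual schema:

hive> create table raw_sales(dt string, amt int)
    > row format delimited fields terminated by ',';
hive> load data local inpath 'saleshistory' into table raw_sales;

hive> create table sales(dt string, amt int)
    > partitioned by (mon string);
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert overwrite table sales partition(mon)
    > select dt, amt, substr(dt, 1, 2) from raw_sales;
-- creates one warehouse subdirectory per month: .../sales/mon=01, mon=02, ...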
Pig : UDFs using Python

We can keep multiple functions under one program (.py):

transform.py
-------------------------
from pig_util import outputSchema

@outputSchema('name:chararray')
def firstUpper(x):
    fc = x[0].upper()
    rc = x[1:].lower()
    n = fc + rc
    return n

@outputSchema('sex:chararray')
def gender(x):
    if x == 'm':
        x = 'Male'
    else:
        x = 'Female'
    return x

@outputSchema('dname:chararray')
def dept(dno):
    dname = "Others"
    if dno == 11: ..read more
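
A minimal sketch of registering and calling these functions from Pig. The alias myudfs and the emp relation are assumptions; streaming_python matches the pig_util import above, while 'using jython' is the alternative when running under Jython:

grunt> register 'transform.py' using streaming_python as myudfs;
grunt> emp = load 'piglab/emp' using PigStorage(',')
>>         as (id:int, name:chararray, sal:int, sex:chararray, dno:int);
grunt> e = foreach emp generate myudfs.firstUpper(name),
>>                              myudfs.gender(sex), myudfs.dept(dno);
grunt> dump e;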
Python Examples 1

name = input("Enter name ")
age = input("Enter age")
print(name, " is ", age, " years old ")
-----------------------------------
# if
a = 10
b = 25
if a > b:
    print(a, " is big")
else:
    print(b, " is big ")
-----------------------------
# nested if
a = 10
b = 20
c = 17
big = 0
if a > b:
    if a > c:
        big = a
    else:
        big = c
elif b > c:
    big = b
else:
    big = c
print("Biggest is ", big)
----------------------------------
# if and loop combination:
lst = [10,20,34,23,12,34,23,45]
big = lst[0]
for v in lst:
    if v > big ..read more
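
The last snippet is cut off mid-loop. A minimal completion, assuming (as the variable names suggest) it finds the largest element:

# if and loop combination (completed sketch)
lst = [10,20,34,23,12,34,23,45]
big = lst[0]
for v in lst:
    if v > big:
        big = v   # remember the largest value seen so far
print("Biggest is ", big)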
Spark : Spark streaming and Kafka Integration

steps:
 1) start ZooKeeper server
 2) start Kafka brokers [one or more]
 3) create topic
 4) start console producer [to write messages into the topic]
 5) start console consumer [to test whether messages are streamed]
 6) create Spark streaming context, which streams from the Kafka topic
    (see the sketch after this list)
 7) perform transformations or aggregations
 8) output operation: direct the results into another Kafka topic
------------------------------------------
following code tested with, ..read more
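
The tested code itself is truncated. A minimal sketch of steps 6 and 7 using the legacy spark-streaming-kafka-0-8 Python API; the topic name, broker address, and word-count logic are assumptions, and step 8 would additionally need a Kafka producer inside foreachRDD:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaStreamSketch")
ssc = StreamingContext(sc, 10)                     # 10-second micro-batches

# stream (key, value) pairs from the hypothetical topic "events"
stream = KafkaUtils.createDirectStream(
    ssc, ["events"], {"metadata.broker.list": "localhost:9092"})

lines = stream.map(lambda kv: kv[1])               # keep only message values
counts = (lines.flatMap(lambda l: l.split())
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b))   # per-batch word counts
counts.pprint()                                    # placeholder output operation

ssc.start()
ssc.awaitTermination()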
Pig : UDFs

Pig UDFs
----------
UDF ---> user defined function.
  adv:
    i)  custom functionalities
    ii) reusability

Pig UDFs can be developed in Java, Python, Ruby, C++, JavaScript, or Perl.

step1: develop the UDF code.
step2: export it into a jar file.
   ex: /home/cloudera/Desktop/pigs.jar
step3: register the jar file into Pig.
   grunt> register Desktop/pigs.jar
step4: create t ..read more
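
A minimal sketch of what the Java UDF in step 1 could look like; the class name FirstUpper and its behavior are assumptions, only the EvalFunc API is Pig's own:

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// hypothetical UDF: capitalizes the first letter of a chararray
public class FirstUpper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null)
            return null;
        String s = (String) input.get(0);
        return s.substring(0, 1).toUpperCase() + s.substring(1).toLowerCase();
    }
}

After step 3, the function is invoked by its (package-qualified) class name, e.g. e = foreach emp generate FirstUpper(name);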
Pig : Cross Operator for Cartesian Product

Cross:
-----
  used for Cartesian product.
  each element of the left set joins with each element of the right set.

  ds1 --> (a)
          (b)
          (c)
  ds2 --> (1)
          (2)

  x = cross ds1, ds2;
    (a,1)
    (a,2)
    (b,1)
    (b,2)
    (c,1)
    (c,2)

emp = load 'piglab/emp' using PigStorage(',')
    as (id:int, name:chararray, sal:int, sex:chararray, dno:int);

task: f ..read more
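
The task statement is truncated. A common use of cross with a relation like emp, shown here as an assumed example, is pairing every row with a one-tuple aggregate:

grunt> grp = group emp all;
grunt> mx = foreach grp generate MAX(emp.sal) as msal;  -- single tuple: the max salary
grunt> c = cross emp, mx;                               -- attach msal to every row
grunt> r = foreach c generate id, name, sal,
>>             (sal == msal ? 'highest' : 'other') as tag;
grunt> dump r;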
Pig : Order [Sorting], exec, run, pig

order :-
  to sort data (tuples) in ascending or descending order.

emp = load 'piglab/emp'
    using PigStorage(',')
    as (id:int, name:chararray, sal:int, sex:chararray, dno:int);

e1 = order emp by name;
e2 = order emp by sal desc;
e3 = order emp by sal desc, sex, dno desc;
---------------------------------------
sql: select * from emp order by sal desc limit 3;

e = order emp by sal desc;
top3 = limit e 3;

limitation:
  101,aaa,30000,.....
  102,bbb,90000 ..read more
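
The excerpt never reaches the exec, run, and pig part of the title. For completeness, a brief sketch of those three ways to execute a Pig script (sort.pig is a hypothetical script name):

grunt> exec sort.pig   -- batch-style: runs the script in its own context;
                       -- aliases it defines are not visible in grunt afterwards
grunt> run sort.pig    -- interactive: runs as if typed into grunt;
                       -- its aliases remain available in the session
[cloudera@quickstart ~]$ pig sort.pig   -- runs the script directly from the shell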
Pig : Joins

[cloudera@quickstart ~]$ hadoop fs -cat spLab/e
101,aaaa,40000,m,11
102,bbbbbb,50000,f,12
103,cccc,50000,m,12
104,dd,90000,f,13
105,ee,10000,m,12
106,dkd,40000,m,12
107,sdkfj,80000,f,13
108,iiii,50000,m,11
109,jj,10000,m,14
110,kkk,20000,f,15
111,dddd,30000,m,15
[cloudera@quickstart ~]$ hadoop fs -cat spLab/d
11,marketing,hyd
12,hr,del
13,fin,del
21,admin,hyd
22,production,del
[cloudera@quickstart ~]$ cat > joins.pig
emp = load 'spLab/e' using PigStorage(',')
    as (id:int, name:chararray, sal:int,
        sex:chararray, dno:int);
dept = load 's ..read more
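
The joins.pig script is cut off at the dept load. A minimal sketch of how it plausibly continues, covering the four standard Pig join flavors; the dept schema is inferred from the file listing above:

dept = load 'spLab/d' using PigStorage(',')
    as (dno:int, dname:chararray, city:chararray);

ij = join emp by dno, dept by dno;              -- inner join
lj = join emp by dno left outer, dept by dno;   -- left outer join
rj = join emp by dno right outer, dept by dno;  -- right outer join
fj = join emp by dno full outer, dept by dno;   -- full outer join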
