Phoenix is an open source SQL skin for HBase. You use the standard JDBC APIs instead of the regular HBase client APIs to create tables, insert data, and query your HBase data.
Prerequisites –
1. Java JDK (This demo uses JDK version 1.7.0_67)
Make sure the JAVA_HOME system environment variable points to the JDK. Make sure the java executable’s directory is in the PATH environment variable, i.e., %JAVA_HOME%\bin.
2. Make sure you have installed Hbase on your machine, for that refer my post Hbase Installation in Pseudo-distributed mode.
Install Phoenix –
1. Download Phoenix-4.7.0-Hbase-1.1 and expand the installation tar.
tar -zxvf phoenix-4.7.0-Hbase-1.1-bin.tar.gz
2. Add the phoenix-[version]-server.jar to the classpath of HBase region server and master and remove any previous version. An easy way to do this is to copy it into the HBASE_INSTALL_DIR/lib directory.
3. Add the phoenix-[version]-client.jar to the Phoenix client.
4. Restart Hbase.
5. Run an example to test everything is working fine :
Open the Command Line – A terminal interface to execute SQL from the command line is now bundled with Phoenix. To start it, execute the following from the bin directory:
$ sqlline.py localhost
And if Zookeeper is running externally then , in our case zookeeper is running on master node. Therefore run,
$ sqlline.py <master-hostname>:2181
a. First, let’s create a us_population.sql file, containing a table definition:
CREATE TABLE IF NOT EXISTS us_population ( state CHAR(2) NOT NULL, city VARCHAR NOT NULL, population BIGINT CONSTRAINT my_pk PRIMARY KEY (state, city));
b. Now let’s create a us_population.csv file containing some data to put in that table:
NY,New York,8143197 CA,Los Angeles,3844829 IL,Chicago,2842518 TX,Houston,2016582 PA,Philadelphia,1463281 AZ,Phoenix,1461575 TX,San Antonio,1256509 CA,San Diego,1255540 TX,Dallas,1213825 CA,San Jose,912332
c. And finally, let’s create a us_population_queries.sql file containing a query we’d like to run on that data.
SELECT state as "State",count(city) as "City Count",sum(population) as "Population Sum" FROM us_population GROUP BY state ORDER BY sum(population) DESC;
d. Loading Data : In addition, you can use the bin/psql.py to load CSV data or execute SQL scripts. For example:
./psql.py <your_zookeeper_quorum> us_population.sql us_population.csv us_population_queries.sql
You have created a table in phoenix , inserted the data and run the query.
This is it 🙂