Syncfusion Big Data: query data from HDFS in Avro format

I have data stored in HDFS in Avro format.
How can I query it from Syncfusion Big Data Studio using Spark SQL?

Thanks!

3 Replies

AT Aravindraja Thinakaran Syncfusion Team June 23, 2017 07:20 AM UTC

Hi Ilya, 

Thanks for contacting Syncfusion support. 

Please follow the steps below to access Avro files stored in HDFS using Spark SQL from Big Data Studio.

Step 1: Download and extract the spark-avro_2.11-3.2.0.jar file, then copy the jar to the following location:
<Install Drive>:\Syncfusion\BigData\<Install Drive>\BigDataSDK\SDK\Spark\jars\ 
 
Step 2: Restart the Spark Thrift Server service from Service Manager.

Step 3: Using "/Data/Spark/Resources/Users.avro" as the input file in HDFS, create a table in Spark SQL with the following command:
CREATE TABLE Users USING com.databricks.spark.avro OPTIONS (path "/Data/Spark/Resources/Users.avro"); 

Step 4: After the table is created, use the following command to view its contents:
select * from Users; 

Thanks, 
Aravindraja T 



IB Ilya Bo June 26, 2017 03:31 PM UTC

Thank you for your answer! One more thing I would like to clarify: how do I specify an Avro schema (.avsc file) correctly?


AT Aravindraja Thinakaran Syncfusion Team June 27, 2017 01:02 PM UTC

Hi Ilya, 

You can specify a custom Avro schema (.avsc file) using the Scala API and then query the data with Spark SQL as usual. Please follow the procedure below.
 
Step 1: In the Spark Scala tab of Big Data Studio, run a script that creates the Spark table with the Avro schema specified.
 
Step 2: Query the table as usual using Spark SQL.
            
Note: 
There appears to be a limitation on specifying a custom Avro schema through the Spark SQL API, so this solution uses the Scala API to specify the schema instead.
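
For Step 1, a minimal Scala sketch might look like the following. It is an illustration under assumptions, not Syncfusion's own script: the schema path C:/schemas/Users.avsc is hypothetical, the "avroSchema" read option is assumed to be supported by the installed spark-avro version, and the input file and the name Users simply mirror the earlier example.

```scala
import java.io.File
import org.apache.avro.Schema
import org.apache.spark.sql.SparkSession

// Hypothetical location of the schema file; replace with the path to your own .avsc.
val schemaFile = new File("C:/schemas/Users.avsc")

// Parse the .avsc file into an Avro Schema object.
val avroSchema: Schema = new Schema.Parser().parse(schemaFile)

// Reuses the active session if the Spark Scala tab already provides one
// (assumption), otherwise creates a new session.
val spark = SparkSession.builder().appName("AvroWithCustomSchema").getOrCreate()

// Read the Avro data from HDFS, passing the custom schema as a JSON string.
// Assumption: the installed spark-avro version accepts the "avroSchema" option.
val users = spark.read
  .format("com.databricks.spark.avro")
  .option("avroSchema", avroSchema.toString)
  .load("/Data/Spark/Resources/Users.avro")

// Register the data under a name that SQL queries in this session can use.
users.createOrReplaceTempView("Users")

// Quick check that the view is queryable.
spark.sql("SELECT * FROM Users").show()
```

A temporary view is visible only within the Spark session that created it. Whether queries issued from the Spark SQL tab run in that same session is not confirmed here, so if the table does not show up there, a persistent table may be needed instead.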
 
Thanks, 
Aravindraja T. 

