
Syncfusion Big Data: query data from HDFS in Avro format

I have data stored in HDFS in Avro format.
How can I query it with Spark SQL from Syncfusion Big Data Studio?

Thanks!

3 Replies

AT Aravindraja Thinakaran Syncfusion Team June 23, 2017 07:20 AM UTC

Hi Ilya, 

Thanks for contacting Syncfusion support. 

Please follow the steps below to access Avro files stored in HDFS using Spark SQL from Big Data Studio.

Step 1: Download and extract the spark-avro_2.11-3.2.0.jar file, then copy the jar to the location below.
<Install Drive>:\Syncfusion\BigData\<Install Drive>\BigDataSDK\SDK\Spark\jars\ 
 
Step 2: Restart the Spark Thrift Server service from Service Manager.

Step 3: Create a table in Spark SQL with the command below, using “/Data/Spark/Resources/Users.avro” as the input file from HDFS.
CREATE TABLE Users USING com.databricks.spark.avro OPTIONS (path "/Data/Spark/Resources/Users.avro"); 

Step 4: Once the table is created, use the command below to view its contents.
select * from Users; 

Thanks, 
Aravindraja T 



IB Ilya Bo June 26, 2017 03:31 PM UTC

Thank you for your answer! One more thing I would like to clarify: how do I specify an Avro schema (.avsc file) correctly?


AT Aravindraja Thinakaran Syncfusion Team June 27, 2017 01:02 PM UTC

Hi Ilya, 

You can specify a custom Avro schema (.avsc file) using the Scala API and then access the data with Spark SQL as usual. Please follow the procedure below.

Step 1: In the Spark Scala tab of Big Data Studio, run a script that creates the Spark table with the Avro schema specified.

Step 2: Access the table as usual using Spark SQL.
            
Note: 
There appears to be a limitation on specifying a custom Avro schema through the Spark SQL API, so this solution uses the Scala API to supply the schema instead.
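
A minimal sketch of what a Step 1 script could look like is shown here. It is an illustration under stated assumptions, not necessarily the exact script used by the support team. It assumes that the Spark Scala tab exposes a `SparkSession` named `spark` (as the Spark 2.x shell does), that the spark-avro jar from the earlier reply is on the classpath, and that the schema file is readable from the local file system. The schema path `C:/schemas/Users.avsc` and the view name `UsersWithSchema` are hypothetical placeholders. The `avroSchema` read option is the one documented for the Databricks spark-avro package.

```scala
import scala.io.Source

// Hypothetical local path to the Avro schema file; replace with your own .avsc.
val schemaPath = "C:/schemas/Users.avsc"

// Read the whole .avsc file into a JSON string, closing the file afterwards.
val source = Source.fromFile(schemaPath)
val avroSchemaJson = try source.mkString finally source.close()

// Load the Avro data from HDFS, passing the custom schema via the
// "avroSchema" option of the com.databricks.spark.avro data source.
val users = spark.read
  .format("com.databricks.spark.avro")
  .option("avroSchema", avroSchemaJson)
  .load("/Data/Spark/Resources/Users.avro")

// Register the result under a hypothetical name so it can be queried with SQL.
users.createOrReplaceTempView("UsersWithSchema")

// Quick check from the same session.
spark.sql("SELECT * FROM UsersWithSchema").show()
```

A temporary view is visible only within the session that created it. If the Spark SQL tab runs in a separate session (for example, through the Thrift server), the data may need to be persisted instead, for instance with `users.write.saveAsTable(...)`; whether that is required in Big Data Studio is an assumption to verify in your installation.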
 
Thanks, 
Aravindraja T. 

