1. Query: Is there a command line interface to get data into Hadoop that can be scheduled?

Response: We have several options to achieve this:
· Using Java, we can transfer data from FTP to HDFS directly, and the Java program can easily be scheduled on a regular basis using Oozie (a sketch follows below this list).
· If files are accumulated through streams of activity (such as logging), Flume is a good choice. We have a special implementation of Flume that we can provide.
· Alternatively, if files from FTP are collected and stored on the local file system, the Hadoop command below can be used to copy data from local storage to HDFS through the command line interface.

Hadoop command line interface directory: C:\Syncfusion\BigDataSDK\<version>\SDK\Hadoop\bin

hdfs dfs -copyFromLocal <local_file_location> <target_hdfs_location>
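For the Java route mentioned in the first bullet, here is a minimal sketch that performs the same copy as the -copyFromLocal command, using the Hadoop FileSystem API. The NameNode address and both paths are placeholders; substitute your own values.

// CopyToHdfs.java - minimal sketch: copy a local file into HDFS.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000"); // placeholder NameNode address

        // Equivalent to: hdfs dfs -copyFromLocal <local_file_location> <target_hdfs_location>
        try (FileSystem fs = FileSystem.get(conf)) {
            fs.copyFromLocalFile(new Path("C:/data/input.csv"),    // placeholder local source
                                 new Path("/user/hadoop/input/")); // placeholder HDFS target
        }
    }
}

Packaged as a JAR, such a program can be run by an Oozie Java action on whatever schedule the coordinator defines (see the example under query 2).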
2. Query: Is there a way to schedule tasks on a regular basis?

Response: Yes. Hadoop tasks can be scheduled on a regular basis using Oozie, and we have provided support for Oozie in our platform. Please refer to the Apache Oozie documentation (https://oozie.apache.org/) to learn about Oozie in detail.
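As an illustration of how such a schedule is expressed, below is a minimal sketch of an Oozie coordinator definition that runs a workflow once a day. The application name, dates, and workflow path are all placeholders.

<!-- coordinator.xml - sketch: trigger the referenced workflow daily -->
<coordinator-app name="daily-ingest" frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2025-01-01T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- placeholder HDFS path to the workflow application -->
      <app-path>hdfs://namenode-host:9000/user/hadoop/apps/ingest-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>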
3. Query: In the hardware FAQ you mention 'new' hardware. Is this a requirement, or can we use our existing workstations? They are quite high spec, with about 8 TB on each with RAID.

Response: You can use your existing workstations. HDFS clusters do not benefit from using RAID for DataNode storage, because HDFS handles replication between nodes by itself. Hence, RAID is not recommended on any of the DataNodes or client machines used to form a Hadoop cluster; it can, however, be used for NameNodes. Please refer to the following UG link for forming a cluster: http://helpbdp.syncfusion.com/bigdata/cluster-manager/cluster-creation
4. Query: How would you configure Hadoop to make use of all of the drives? What we want to do is have the OS drive use SSD, and the data drives be normal large drives.

Response: By default, with a Syncfusion cluster, the DataNode will make use of all fixed-type drives on a machine. DataNodes can also be restricted to specific drives by changing the dfs.datanode.data.dir property of the hdfs-site.xml file, under the advanced settings provided in our Cluster Manager application when creating a cluster, as shown below.
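For example, an hdfs-site.xml entry along the following lines would limit DataNode storage to two data drives while leaving the OS drive untouched. The directory paths are placeholders; list only the drives you want HDFS to use.

<!-- hdfs-site.xml - sketch: restrict DataNode storage to two drives -->
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- placeholder paths on the two large data drives -->
  <value>file:///D:/hadoop/hdfs/data,file:///E:/hadoop/hdfs/data</value>
</property>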
5. Query: Would it work with an external drive bay?

Response: Yes. With the Syncfusion Cluster Manager, DataNodes can detect all fixed-type external drives, and these can be used for HDFS storage. The only requirement is that the volume be a fixed volume, not a transient (removable) one.
6. Query: Would we be able to access Hadoop from non-Windows?

Response: The Syncfusion Big Data Studio is a Windows-only tool. However, accessing a Syncfusion Hadoop distribution running on Windows through the native command line interface from non-Windows clients is supported, just as with any other cluster, and we can assist with this. Accessing Thrift services such as the Spark and Hive Thrift servers with our solution is platform independent; they can be reached using the Java Thrift API from non-Windows clients as well (see the sketch below). https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Thrift
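As a concrete illustration, here is a minimal sketch that connects to a Hive Thrift server (HiveServer2) using the standard Hive JDBC driver, which speaks Thrift underneath and runs on any OS with a JVM; it is an alternative to the raw Thrift client shown in the linked wiki page. The host, port, database, and credentials are placeholders.

// HiveThriftQuery.java - sketch: query HiveServer2 over Thrift via JDBC
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveThriftQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // Hive JDBC (Thrift-based) driver

        // placeholder host, port, database, user, and password
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://hive-host:10000/default", "hadoop", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // print each table name
            }
        }
    }
}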