Working with Hive and Parquet data

When Splunk Analytics for Hadoop initializes a search for non-HDFS input data, it uses the information contained in the FileSplitGenerator class to determine how to split data for parallel processing. The default FileSplitGenerator contains the same data split logic defined in Hadoop's FileInputFormat. This means it works for any data format that can be read by a Hadoop InputFormat implementation that uses the same split logic as FileInputFormat.

Because the default FileSplitGenerator does not work for Hive or Parquet files, Splunk Analytics for Hadoop provides the HiveSplitGenerator and ParquetSplitGenerator for those formats. Any custom Hive file with file-based split logic (such as files created with Hadoop's FileOutputFormat and its subclasses) works with the HiveSplitGenerator. Parquet files created by any tool (including Hive) work with, and only with, the ParquetSplitGenerator. If you have custom Hive file formats that do not use file-based data split logic, you can implement a custom SplitGenerator that uses your own split logic.

To configure Splunk Analytics for Hadoop to work with Hive, see Configure Hive connectivity. To configure Splunk Analytics for Hadoop to work with Parquet tables, see Configure Parquet tables.
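To make the file-based split logic concrete, here is a minimal sketch of what a custom split generator does. The `SplitGenerator` interface and `FileSplit` class below are hypothetical stand-ins (the real contract lives in the Splunk Analytics for Hadoop libraries); the splitting itself mirrors the block-sized byte ranges that Hadoop's FileInputFormat produces:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the split-generator contract; the real class
// names and method signatures come from Splunk Analytics for Hadoop.
interface SplitGenerator {
    List<FileSplit> generateSplits(String path, long fileLength, long blockSize);
}

// A file-based split: one byte range within a single file that a search
// task can process independently.
class FileSplit {
    final String path;
    final long start;
    final long length;

    FileSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }
}

// File-based split logic in the spirit of Hadoop's FileInputFormat:
// carve the file into block-sized byte ranges for parallel processing.
class BlockSplitGenerator implements SplitGenerator {
    @Override
    public List<FileSplit> generateSplits(String path, long fileLength, long blockSize) {
        List<FileSplit> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            long length = Math.min(blockSize, fileLength - offset);
            splits.add(new FileSplit(path, offset, length));
            offset += length;
        }
        return splits;
    }
}
```

A format whose records cannot be cut at arbitrary byte offsets (the "not file-based" case in the text) would instead need split boundaries derived from its own metadata, which is exactly when a custom SplitGenerator is required.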
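In practice, the split generator is selected per virtual index. A minimal sketch of an indexes.conf stanza, assuming the `vix.splunk.search.splitter` property described in the Splunk Analytics for Hadoop configuration documentation (verify the exact property names and values for your version):

```
[hive_virtual_index]
# Hypothetical provider name for illustration
vix.provider = my_hadoop_provider
# Select the Hive-aware split generator instead of the default FileSplitGenerator
vix.splunk.search.splitter = HiveSplitGenerator
```

The Configure Hive connectivity and Configure Parquet tables pages cover the remaining format-specific settings.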