Apache Parquet and Avro read/write support

Modified on Wed, 7 Jun, 2023 at 12:49 PM

Introduction


Omniscope supports reading and writing Apache Parquet and Avro files.


Parquet is a popular column-oriented file format widely used in Hadoop systems. It is designed to store large data sets efficiently and uses the file extension .parquet.

Avro is a row-oriented file format, also developed by Apache, and uses the file extension .avro.
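
The difference between a row-oriented and a column-oriented layout can be illustrated with a minimal Python sketch. It is only a conceptual illustration: the records and the to_columns helper are hypothetical, and the real Parquet and Avro formats add encoding, compression and metadata on top of this basic idea.

```python
# Three hypothetical records, used only for illustration.
records = [
    {"id": 1, "city": "London", "sales": 120},
    {"id": 2, "city": "Paris", "sales": 80},
    {"id": 3, "city": "Berlin", "sales": 200},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(records)

# Row-oriented access (the Avro style): a whole record is read together.
first_record = records[0]

# Column-oriented access (the Parquet style): all values of one field
# sit together, so an aggregate only has to touch that one column.
total_sales = sum(columns["sales"])

print(first_record)
print(total_sales)
```

A row layout suits workloads that read or write complete records one at a time, while a column layout suits analytical queries that scan a few fields across many records.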


Reading a Parquet or Avro file


Inside your Omniscope workflow, add a new File input block. Double-click on the block to open the options. Select the location of the Parquet file. If the file has the expected .parquet extension, Omniscope will automatically pick the Parquet file format. Click the Play button to execute the block and read the data.
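
If a file does not carry the expected extension, it can help to confirm its format before pointing the File input block at it. The following is a minimal sketch that runs outside Omniscope and is not part of the product. It relies on two format facts: a Parquet file begins and ends with the four bytes "PAR1", and an Avro object container file begins with the bytes "Obj" followed by the byte 0x01. The function names detect_format and sniff_file are hypothetical, and the check looks only at these markers without validating the rest of the file.

```python
import os
import sys

PARQUET_MAGIC = b"PAR1"   # first and last four bytes of a Parquet file
AVRO_MAGIC = b"Obj\x01"   # first four bytes of an Avro object container file


def detect_format(head, tail):
    """Guess the format from the first and last four bytes of a file.

    Returns "parquet", "avro", or None when neither marker matches.
    """
    if head == AVRO_MAGIC:
        return "avro"
    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    return None


def sniff_file(path):
    """Read the leading and trailing four bytes of a file and classify it."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(0, os.SEEK_END)
        size = f.tell()
        tail = b""
        if size >= 4:
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)
    return detect_format(head, tail)


if __name__ == "__main__":
    # Usage: python sniff.py file1 file2 ...
    for path in sys.argv[1:]:
        print(path, "->", sniff_file(path))
```

A result of None means neither marker was found, in which case the file is probably in another format or has been truncated.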



Writing a Parquet or Avro file


Inside your Omniscope workflow, add a new File output block. Connect the data that you want to write to your output block.



Double-click on the File output block to open the options. Select the location and name of the file you want to create. Change the Format to Apache Parquet (.parquet file). Click the Play button to write the data.



Limitations


When reading a Parquet file, Omniscope supports only the following logical types: STRING, ENUM, INTEGER, DECIMAL, DATE, TIME, TIMESTAMP and JSON. Other types, such as LIST and MAP, are not currently supported. If you need to import data containing one or more unsupported types, please get in touch with us, as it may be possible for us to develop support for them.
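
Before importing a file, the list above can be turned into a quick pre-check. The following is a minimal sketch, not an Omniscope feature: the unsupported_columns function and the example schema are hypothetical, and the sketch assumes you have already obtained each column's logical type name from a separate Parquet inspection tool. It flags anything that is not in the list above, so it reflects only what this article states.

```python
# Logical types listed as supported when reading Parquet files.
SUPPORTED_LOGICAL_TYPES = {
    "STRING", "ENUM", "INTEGER", "DECIMAL",
    "DATE", "TIME", "TIMESTAMP", "JSON",
}


def unsupported_columns(schema):
    """Return the columns whose logical type is not in the supported list.

    `schema` maps a column name to its logical type name (case-insensitive).
    """
    return {
        name: logical_type
        for name, logical_type in schema.items()
        if logical_type.upper() not in SUPPORTED_LOGICAL_TYPES
    }


# Hypothetical schema, used only for illustration.
example_schema = {
    "order_id": "INTEGER",
    "placed_at": "TIMESTAMP",
    "tags": "LIST",
    "attributes": "MAP",
}

print(unsupported_columns(example_schema))
# → {'tags': 'LIST', 'attributes': 'MAP'}
```

Columns reported by this check would need to be converted or dropped upstream before the file is read.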

