Thursday, January 1, 2015

Salesforce Analytics (aka Wave): how to upload from Avro or Parquet files

Avro and Parquet are file formats that store both the data and its schema together in a single compressed file. Keeping everything in one file simplifies maintenance and reduces the overhead of storing files and fetching them from different sources.

The Salesforce Wave (Analytics) platform currently seems to accept data in a binary format where the records come from a CSV-style file and the schema is described in a JSON metadata file.

This effectively forces the user to maintain two files, one for the records and one for the schema, and combine them to upload data to Salesforce Analytics.
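To make the two-file pairing concrete, here is a minimal sketch of what preparing an upload might look like: a CSV payload for the records, a JSON metadata document for the schema, and base64 encoding of the data before it is sent. The metadata field names below mirror the general shape of Wave's metadata format but are illustrative assumptions, not copied from the official spec.

```python
import base64
import io
import json

# Sample records that would normally live in an Avro or Parquet file.
records = [
    {"Name": "Acme", "Amount": 1200},
    {"Name": "Globex", "Amount": 3400},
]

# 1. Flatten the records into CSV text -- the data half of the upload.
buf = io.StringIO()
buf.write("Name,Amount\n")
for r in records:
    buf.write("{},{}\n".format(r["Name"], r["Amount"]))
csv_bytes = buf.getvalue().encode("utf-8")

# 2. Describe the columns in a JSON metadata document -- the schema half.
#    (Illustrative structure; consult the Wave External Data docs for
#    the exact metadata format.)
metadata = {
    "fileFormat": {"charsetName": "UTF-8", "fieldsDelimitedBy": ","},
    "objects": [
        {
            "name": "Sales",
            "fields": [
                {"name": "Name", "type": "Text"},
                {"name": "Amount", "type": "Numeric",
                 "precision": 18, "scale": 0},
            ],
        }
    ],
}

# 3. The data payload is base64 encoded for the upload request.
encoded_data = base64.b64encode(csv_bytes).decode("ascii")
metadata_json = json.dumps(metadata)
```

The point of the sketch is simply that both halves have to be produced and kept in sync by hand, which is exactly the bookkeeping an Avro or Parquet file already does for you.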

The idea is that a user who already has Avro or Parquet files should be able to load them directly, so they don't have to maintain two files or split one file into two.

With analytics increasingly being used by end users on Hadoop-based file systems, this would enable direct analysis of big data (definitely something to look forward to).

There are tools that can transform data from Avro and Parquet files to CSV, and that is probably the quickest way for Hadoop users to upload data to Salesforce Analytics right now, until Salesforce opens up its API.
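As one possible workaround along those lines, the sketch below flattens an iterable of dict records into CSV using only the standard library. Reading the Avro file itself would need a third-party reader such as fastavro (an assumption shown only in a comment); here two records are faked so the example stays self-contained.

```python
import csv
import io

def records_to_csv(records, fieldnames):
    """Write an iterable of dict records (e.g. the rows an Avro reader
    like fastavro yields) out as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return out.getvalue()

# In a real pipeline the records would come from the Avro file, e.g.:
#   from fastavro import reader          # assumed third-party library
#   with open("data.avro", "rb") as f:
#       rows = list(reader(f))
# Two hand-written records stand in for that step here.
sample = [{"id": 1, "region": "west"}, {"id": 2, "region": "east"}]
csv_text = records_to_csv(sample, ["id", "region"])
```

The resulting CSV can then be paired with a hand-written JSON metadata file and uploaded the usual way.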