The Apache Software Foundation (ASF) this week updated Apache Drill, an open source tool that enables end users to query multiple data sources using SQL, without waiting for enterprise IT teams to create schemas and set up pipelines.
End users can download Drill 1.19 to launch queries against Apache Cassandra, Elasticsearch, and Splunk platforms, in addition to querying XML files and REST application programming interfaces (APIs) without any schema required.
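As a rough illustration of what a schema-free query looks like in practice, the Python sketch below prepares a SQL statement for Drill's REST endpoint. It assumes a Drill instance running locally on the default web port (8047) and queries the `employee.json` sample file that ships on Drill's classpath; the function names are illustrative, not part of Drill.

```python
import json
import urllib.request

# Assumption: a local Drill instance listening on its default web port.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request carrying a SQL query for Drill."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL):
    """Send the query and return the result rows (requires a running Drill)."""
    with urllib.request.urlopen(build_query_request(sql, url)) as response:
        return json.load(response).get("rows", [])


# No table definition or schema is declared anywhere: the file is queried as-is.
SAMPLE_SQL = "SELECT full_name, position_title FROM cp.`employee.json` LIMIT 3"
request = build_query_request(SAMPLE_SQL)
```

Calling `run_query(SAMPLE_SQL)` against a live instance would return the matching rows as a list of dictionaries; the same pattern applies to other configured storage plugins, though their names depend on the local Drill configuration.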
Other additions include Avro support in the plugin for the Apache Kafka messaging platform; integration with Apache Airflow, the workflow management software; integrated password vaults to secure credentials; and support for Linux ARM64 systems.
Apache Drill first emerged as a SQL-based query engine designed to enable end users to interrogate data stored in NoSQL Apache Hadoop platforms. Since then, the number of data sources has steadily increased to the point that end users are employing the tool to interrogate data wherever it resides, said Charles Givre, vice president of Apache Drill and CEO of DataDistillr, a provider of SQL query tools based on Apache Drill.
That’s critical because organizations struggle to aggregate all their data within a single data warehouse, Givre added. “It’s practically impossible to get all your data in a data lake,” he said.
Just as problematic, there’s usually a significant time delay between when new data is created by an application and when that data becomes available in a data warehouse or data lake, Givre said. But Apache Drill makes it easier to launch SQL queries against the freshest set of data available, regardless of where it resides, he said.
In some cases, data science teams are setting up complex processes to analyze datasets when they could accomplish the same tasks more easily using Apache Drill to join two or more datasets without having to ever move any data, he added.
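To make the idea of joining datasets in place more concrete, here is a minimal Python sketch that composes a single Drill SQL statement spanning two sources. The storage plugin names (`elastic`, `dfs`), the index and file path, and the column names are all hypothetical; actual names depend on how Drill's storage plugins are configured.

```python
from typing import Sequence


def table_ref(plugin: str, table: str) -> str:
    """Return a Drill table reference such as dfs.`/data/file.json`."""
    if "`" in table:
        raise ValueError("backticks are not allowed in table names")
    return f"{plugin}.`{table}`"


def build_join_sql(left: str, right: str, key: str, columns: Sequence[str]) -> str:
    """Compose an inner join of two table references on a shared key column."""
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list} "
        f"FROM {left} AS l "
        f"JOIN {right} AS r ON l.`{key}` = r.`{key}`"
    )


# Hypothetical sources: an Elasticsearch index and a JSON file on disk.
orders = table_ref("elastic", "orders")
customers = table_ref("dfs", "/data/customers.json")

join_sql = build_join_sql(orders, customers, "customer_id", ["l.order_id", "r.name"])
print(join_sql)
```

The resulting statement can be submitted to Drill through any of its client interfaces; the data itself stays in the source systems, and only the query result is returned.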
IT organizations have for some time been trying to strike a balance between centrally managing data and enabling end users to interactively query data as they see fit. In many cases, end users have gotten around IT departments by setting up their own platforms and query tools. Beyond the governance issues that practice might create, the data a business unit employs to make decisions is usually out of sync with the data the rest of the business relies on.
Most enterprise IT teams don’t have the political capital required to ban business units from using a given tool, however. Instead, Givre said they should focus on striking a balance between end users’ need to easily query data as it becomes available and the need to manage terabytes of historical data that might reside in a data warehouse.
Regardless of the path organizations opt for when it comes to managing data, the number of tools and platforms for querying data is continuing to explode. The issue now is determining to what degree organizations should limit access to tools sanctioned by their IT team.
© 2021 VentureBeat. All rights reserved.