Choosing the right cloud database can seem like an easy choice, because there are so many great options on the market today. Many databases will simply work well for analytics that are fairly small and uncomplicated. Long-term platform adoption is rarely considered when you simply have some data to crunch and a hungry business team looking for short-order reports.
However, chief data officers and strategic thinkers know that occasional use of a database for analytics tends to become repeated use, and repeated use can lead to corporate adoption. That in turn leads to data silos within a larger corporation, as business units make individual choices and adopt multiple standards. One of the most important decisions a CDO can make is picking a single analytics platform that meets most of the business needs. If your organization is considering a standard-issue database for that reason, below are five important considerations to note in your evaluation.
The analytics trend today is cloud-first, but some experts believe the future of analytics is hybrid and multi-cloud, with the capability to seamlessly move workloads to wherever they can be executed most effectively. Evaluating deployment options allows businesses to choose whether they want the simplicity of cloud, the necessity of on-premises, or the flexibility of hybrid, all with the same solution.
Many database providers offer only a "cloud-first" strategy, which usually means the technology will support only cloud deployments. Businesses would lose the ability to deploy on-premises workloads with the same simplicity as cloud workloads. Cloud-first also usually means that a customer needs to load all data into a single cloud: the vendor's. This limits the customer's ability to move to a different cloud (in case developers need it) or pull workloads back on-premises (in case the regulatory landscape changes). Finally, if a new platform or technology comes along, businesses may want to migrate their data to a new solution. It's important to study the exit costs involved before committing to a solution.
More than ever, data is spread across many databases and data lakes. Cloud databases vary greatly in how well they access external data. Some solutions require data to be stored in specific formats in data warehouses and offer no support for data lakes. Others support data lakes but require multiple tools to do so. Look for a solution that can handle common formats (like ORC, Parquet, Avro and JSON) and bring those sources into daily analysis with grace and speed. Also look for solutions that can reach into other databases in your organization (data virtualization) so that no data is difficult to access.
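As a rough illustration of analyzing data where it lives, the following minimal Python sketch joins newline-delimited JSON records (standing in for a data lake file) with a table in a separate database, without first loading the JSON into that database. Everything here is an assumption made for illustration: the `customers` table, the sample records and the helper names are hypothetical, and an in-memory SQLite database stands in for whatever platform you are evaluating.

```python
import json
import sqlite3

# Hypothetical "data lake" content: newline-delimited JSON event records.
LAKE_JSON = """\
{"customer_id": 1, "event": "login"}
{"customer_id": 2, "event": "purchase"}
{"customer_id": 1, "event": "purchase"}
"""


def events_per_customer(lake_text: str, conn: sqlite3.Connection) -> dict:
    """Count lake events per customer name, resolving names from the database."""
    # Look up the id -> name mapping in the (stand-in) operational database.
    names = dict(conn.execute("SELECT id, name FROM customers"))
    counts = {}
    for line in lake_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        name = names.get(record["customer_id"], "unknown")
        counts[name] = counts.get(name, 0) + 1
    return counts


def demo() -> dict:
    """Build a tiny in-memory database and combine it with the JSON records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    return events_per_customer(LAKE_JSON, conn)
```

A platform with real data lake and data virtualization support would do this kind of join inside the database engine; a useful evaluation question is how many separate tools that takes.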
No two analytical workloads are the same. Some long-running queries can cause a smooth-running analytical database to come to a screeching halt, particularly when they involve many JOINs and aggregate functions like COUNT(DISTINCT). Therefore, it's important to evaluate whether the database supports enough options for improving query performance. It can't just be about adding nodes. Options include workload management (does it allow you to map resources like memory and CPU to queries?), separation of compute and storage, query optimization (does the database offer tools that help identify the best way to limit data reads and memory needed to answer the query?) and node scaling (can the IT team scale nodes at will and control their size and configuration?).
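To make the query optimization question concrete, here is a minimal Python sketch that asks a database how it plans to run a query, before and after adding an index. It uses SQLite's `EXPLAIN QUERY PLAN` purely as a small stand-in; the `orders` table, the index name and the helper functions are hypothetical, and analytical platforms expose their own plan and tuning tools.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the plan description text.
    return [row[-1] for row in rows]


def demo() -> tuple:
    """Compare the plan for the same query without and with an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(i % 100, float(i)) for i in range(1000)],
    )
    sql = "SELECT SUM(amount) FROM orders WHERE customer_id = 42"

    before = query_plan(conn, sql)  # no index yet: expect a full table scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql)   # index available: expect an index search
    return before, after


if __name__ == "__main__":
    plan_before, plan_after = demo()
    print("without index:", plan_before)
    print("with index:   ", plan_after)
```

The point of the comparison is that the second plan reads only the matching rows instead of the whole table, which is the kind of improvement that adding nodes alone does not provide.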
Think about all the different groups in your organization that may be leveraging data, and how deep and divergent their questions and tools may be. Ideally, you don't want to have to move data to another system just to run specialized analytics. While business users might be interested in business metrics, data scientists and analysts might be interested in hidden opportunities, trends and patterns. To serve such a broad user base and a wide range of analytical use cases, the database you choose must contain a broad range of analytics functions. Examples include time series (analyze data over set intervals of time), geospatial (analytics based on latitude, longitude and elevation), in-database machine learning (the ability to train and deploy machine learning models) and alternate frameworks (support for languages such as R and Python and interfaces like Jupyter or Zeppelin notebooks).
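As a small example of what analyzing data over set intervals of time means in practice, the Python sketch below groups timestamped readings into fixed-width buckets and averages each bucket. The function name and sample readings are hypothetical, and a database with built-in time series functions would run the equivalent operation next to the data rather than in client code.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # reference point used to align bucket boundaries


def bucket_average(readings, interval: timedelta) -> dict:
    """Average (timestamp, value) readings per fixed-width time bucket."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Integer number of whole intervals since the epoch.
        index = (ts - EPOCH) // interval
        bucket_start = EPOCH + index * interval
        buckets[bucket_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    sample = [
        (datetime(2024, 1, 1, 10, 1), 10.0),
        (datetime(2024, 1, 1, 10, 4), 20.0),
        (datetime(2024, 1, 1, 10, 7), 30.0),
    ]
    for start, avg in bucket_average(sample, timedelta(minutes=5)).items():
        print(start, avg)
```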
Cloud databases must provide functionality that helps IT teams and analysts adhere to industry requirements and government mandates and reduce the risk of penalties and data breaches. This includes access control: tax ID numbers, credit card numbers and other personally identifiable information (PII) should be accessible to and managed by only the team that owns and is responsible for them. It also includes encryption and security. Encryption of data in motion is essential to cloud operation, and even when your data sits in a public cloud, it needs to be encrypted to avoid data breaches. Support for format-preserving encryption (FPE), which doesn't require decryption for analytics to run, is also advisable for cloud databases. Finally, it includes cost control (allow users to spin down compute when not in use) and managing copies of data (limit multiple copies of the same data to reduce costs and protect data security).
Don't forget, the analytics requirements you have today are not likely to stick around forever. Business, technology and consumer expectations are evolving constantly, and this is why technology buyers must consider all use cases and possibilities (today it's cloud, tomorrow it could be back to on-prem or hybrid) and zero in on a database that is not only flexible and scalable, but also future-ready.
Steve Sarsfield is a director at Vertica and has held thought leadership roles at Cambridge Semantics, Talend, Trillium Software and IBM. Steve's writings, offering insight and opinion on data governance and analytics, include a popular data governance blog, articles on Medium.com and a book titled "The Data Governance Imperative."
Email: steve.sarsfield@vertica.com
LinkedIn: https://www.linkedin.com/in/steve-sarsfield/
Twitter: @stevesarsfield