Data, big and small
This year I was privileged to be part of the team that facilitated the 20th annual Ntegra Greenside United States Research Tour. During the week-long event we received presentations from thirty-nine start-up companies, three venture capital firms (Intel Capital, Andreessen Horowitz and Sequoia Capital), as well as Microsoft and Splunk. This was a great opportunity for me to learn more about the latest data domain developments and innovations coming out of Silicon Valley, and to get to know the eighteen delegates (all senior executives from the UK) who attended the tour. An important aspect of these events is maximising opportunities for delegates and hosts to share new ideas and first-hand experiences, and to forge new relationships, partnerships and collaborations.
Trends and developments associated with the management, manipulation and exploitation of vast amounts of corporate, social and sensor-generated data were hot topics. What was previously considered ‘hype’ is becoming increasingly normal as organisations realise the critical importance of data, big and small. The compound effect of Moore’s Law fast approaching ‘critical mass’ and continuing improvements in data processing techniques looks set to enable new paradigms of machine learning and artificial intelligence.
New companies have been driving rapid growth of systems that support non-relational, semi-structured and unstructured forms of data, as well as increasing their analytic capabilities at massive scale and speed. Established big data and analytics solutions are maturing to meet enterprise IT standards, and a number of start-ups have emerged specifically to address the gaps that make these solutions difficult to integrate with existing business systems and operating models. This trend for retrofitting seems likely to persist as the rate of development of new raw capabilities increases. Big data start-ups are contributing, building and integrating components such as security, authentication, fine-grained role-based authorisation and the business continuity capabilities that customers expect from traditional enterprise relational database management systems. These previously somewhat overlooked capabilities are now becoming key enablers and differentiators within the ecosystem of emerging big data technologies, eliminating barriers to enterprise adoption.
As well as working to provide enterprise-class systems, companies are building solutions that enable both business users and data scientists to fully realise the value of their data. There is growing demand from business users to have the same self-service access to insights that they get from traditional data warehouse environments. Companies such as Datameer are blurring the lines between traditional business intelligence and big data, enabling business users to discover insights in any data via wizard-based iterative ‘point & click’ analytics and ‘drag & drop’ visualisations, regardless of the data type, size or source.
Companies are rapidly negating the need for “swarms of experts in white lab-coats” to be continually nursing corporate big data solutions. Self-service, self-discovery and automated commentary (describing why insights and analytic results have come about) are what’s expected by today’s business users.
Business users also want to reduce the time and complexity of preparing data for analysis when dealing with a variety of data types and formats. Companies such as MarkLogic and GigaSpaces have put a lot of focus on end-user data preparation. Their customers can also dynamically and asynchronously scale the storage and compute resources in their databases up or down, relative to the larger amounts of information stored in data lakes. Storing data is cheap compared with the cost of the compute resource needed to process it, so it makes sense to use the elastic provision of resources in the cloud to ensure that compute is only paid for when it is actually being used.
The effort to ‘humanise’ IT (enabling people to intuitively interact with systems as opposed to systems asserting behaviours that enable interaction) is increasing. A number of companies demonstrated how data security can be moved into the background using biometrics and new algorithms to enable faster, non-intrusive authentication and authorisation. Companies like GoodData are providing both the tools and the expertise needed for organisations to collect, analyse and exploit data, allowing users to quickly and easily see the impact of changes on performance. This leads to a more human approach to performance improvement, based on feedback loops and proven success.
Adoption of NoSQL technologies and a preference for storing data in unstructured, schema-less form (where a schema is applied to the data as it is read out of storage, rather than when it is written) were common themes. NoSQL and ‘schema-on-read’ databases are becoming an established part of the enterprise landscape as the benefits of schema-less database concepts become more widely recognised and understood.
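The difference between applying a schema at write time and at read time can be illustrated with a minimal Python sketch. It is not drawn from any product presented on the tour: the in-memory `raw_store`, the `write` and `read` helpers and the `READ_SCHEMA` mapping are all hypothetical names invented for illustration, and a real NoSQL store would behave differently in many details.

```python
import json

# Hypothetical raw store: records are kept exactly as written, no schema enforced.
raw_store = []

def write(record):
    """Schema-less write: serialise and append whatever arrives."""
    raw_store.append(json.dumps(record))

# Hypothetical read-time schema: field name -> (converter, default value).
READ_SCHEMA = {
    "device_id": (str, None),
    "temperature": (float, None),
}

def read(schema):
    """Schema-on-read: shape each record only as it is pulled out of storage."""
    for line in raw_store:
        raw = json.loads(line)
        yield {
            field: convert(raw[field]) if field in raw else default
            for field, (convert, default) in schema.items()
        }

# Records with different shapes and types are accepted without complaint.
write({"device_id": 17, "temperature": "21.5", "firmware": "a1"})
write({"device_id": "18"})

# The schema is imposed here, at read time: types are coerced, missing
# fields take the default, and fields outside the schema are ignored.
rows = list(read(READ_SCHEMA))
```

Under these assumptions the stored data never changes; a different schema could be passed to `read` later to pull out the `firmware` field without rewriting anything already in storage.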
Growth in the massively parallel processing (MPP) data warehouse segment has been slowing recently and the “death of the data warehouse” has been predicted. However, companies such as Cazena are driving a resurgence in the popularity and use of this technology in the cloud. Their solution provides self-service orchestration of cloud infrastructure in Amazon Redshift and Microsoft Azure SQL Data Warehouse. Cazena uses these environments to provide what it calls Data Mart as a Service (DMaaS) alongside Data Lake as a Service (DLaaS) configurations on Hadoop and other schema-less databases. This enables on-demand provision of data processing and analytics platforms, with other technologies including Google BigQuery likely to be included soon, giving customers seamless access to best-of-breed workload engines and heterogeneous infrastructures, something most enterprises would simply not contemplate on premises.
Data volumes from devices connected to the ‘internet of things’ (IoT) are a further driver for petabyte-scale growth in the cloud. Established companies such as Google, Amazon Web Services and Microsoft are developing IoT services to enable data to move seamlessly to their cloud-based analytics engines. Services that ease the pain of ‘wrangling’ and conveniently storing this data are enabling companies like Arundo to develop predictive solutions that raise asset utilisation and performance in industrial companies. They do this by applying machine learning techniques to sensor and transactional data, combined with deep domain knowledge and experience, to reduce maintenance costs and avoid unexpected outages.
Using machine learning algorithms to enable systems to learn how humans work (not the other way around) helps to spot patterns and connections between activities and performance, leading the way to more innate decision support and, ultimately, autonomous decision-making capabilities. Driverless cars, drones and robotics are pushing back the boundaries of what’s possible in this domain. Developing technologies so that interaction is driven by human preferences and needs, rather than technology capabilities, will come to the forefront during the next year. IT literacy may soon cease to be a differentiator for individuals, but human literacy and understanding will become an essential component of future systems.