System integration, disparate data sources and competing modeling strategies are among the challenges members of Protocol's Braintrust highlighted for the financial sector.
Chief Data Officer at Moelis & Company
The complexity of taking a systematic approach presents a major hurdle to productionizing data within financial services and fintech products. Systems-based thinking around integration — let alone seamless integration — of people, processes, technology, data and models is highly challenging.
While the five or even seven Vs of data, particularly volume and variety, along with data quality, present challenges early in the value chain, the harder problem is integrating new AI/ML models into the existing core systems of financial services organizations. Large incumbents are dealing with outdated legacy systems, and both buy-side and sell-side financial services institutions have legacy IT and infrastructure that are rarely optimized for data-driven technologies. This is compounded by the difficulty of recruiting and retaining the highly skilled talent needed for productionization, and by the stamina new teams need to drive an internal cultural shift where there is little familiarity with the underlying IT and tech systems. Together, these factors hinder the integration processes needed for proper productionization.
Across the value chain, productionization calls for early investment in systems that enable efficient deployment, maintenance, and adoption of the target data processes. Design work needed for productionization almost always lengthens the time it takes to launch a product, and because of that it is often neglected. But a delayed launch is less frustrating and expensive than a blundered launch. It pays to take the time to design the system well.
Systems thinking and systems-based design are critical to the successful productionization of data, data science, and machine learning within our financial services and fintech ecosystem.
Worldwide Business and Market Development for Banking and Capital Markets at Amazon Web Services
The goal of productionizing data is to continuously shrink the time between data discovery and business value. Unfortunately, many financial firms still face challenges that slow their ability to generate data-led insights. Legacy architectures simply don't provide the agility and flexibility needed to prepare data effectively for analytics and, increasingly, machine learning. In particular, alternative or nontraditional data are finding their way into more financial services decision-making.
Most firms have not had the capabilities to operationalize the discovery, processing and modeling of these data sets. In speaking to customers, we discovered they needed a simpler way to do this. AWS Data Exchange is a new service that enables providers of market and alternative data to reach new audiences, while helping data subscribers more easily find data sets with transparent billing and streamlined delivery.
AWS also democratizes access to artificial intelligence and ML across the enterprise. For instance, we offer pretrained AI services that integrate with applications to address common use cases such as personalized recommendations, identity verification, and document processing. For ML developers and data scientists, Amazon SageMaker is a fully managed service that helps build, train and deploy ML models.
We have helped numerous financial institutions, including Intuit, Guardian Life, Nasdaq and the National Australia Bank, modernize data architectures in AWS to analyze data at massive scale, with low latency and at reduced cost, without compromising security. The production of insights takes only days instead of months, because they've automated away traditional obstacles. I expect to see continued investment and innovation throughout the end-to-end workflow as the industry looks for new ways to move up the data value chain.
Chief Cloud Strategy Officer at Deloitte Consulting LLP
The challenge is the ability to understand the meaning of the data that's to be productionized. This means having a single version of the data, including a single source of truth. It also means the ability to deal with the data without implying a structure, and the ability to link the data to unstructured resources, such as multimedia and images.
Moving forward we're looking for the self-identification of data, or binding the metadata to the data in a way that they are coupled in terms of meaning, use cases, access policies, security, and compliance. This means that data transferred from one resource to another are able to maintain meaning, security, and access restrictions.
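As a rough illustration of what binding metadata to data could look like, the sketch below couples a payload with its meaning, access policy and permitted jurisdictions so that a transferred copy carries the same restrictions. All names here (`Metadata`, `SelfDescribingRecord`, `transfer`, the roles and jurisdiction codes) are hypothetical and chosen for illustration; this is a minimal sketch under those assumptions, not a description of any Deloitte or vendor implementation.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet


@dataclass(frozen=True)
class Metadata:
    """Hypothetical metadata that travels with the data it describes."""
    meaning: str                    # business definition of the payload
    allowed_roles: FrozenSet[str]   # access policy: roles allowed to read
    jurisdictions: FrozenSet[str]   # compliance: where the data may reside


@dataclass(frozen=True)
class SelfDescribingRecord:
    """A payload coupled to its metadata; frozen so neither can be swapped out."""
    payload: Any
    metadata: Metadata

    def read(self, role: str) -> Any:
        # Enforce the access policy carried by the record itself.
        if role not in self.metadata.allowed_roles:
            raise PermissionError(f"role {role!r} may not read this record")
        return self.payload


def transfer(record: SelfDescribingRecord, destination: str) -> SelfDescribingRecord:
    """Copy a record to another jurisdiction, keeping its metadata intact."""
    if destination not in record.metadata.jurisdictions:
        raise ValueError(f"record may not be moved to {destination!r}")
    # The copy carries exactly the same meaning and restrictions.
    return SelfDescribingRecord(record.payload, record.metadata)
```

In this sketch the policy check lives on the record rather than on the system holding it, which is one way to read "data transferred from one resource to another are able to maintain meaning, security, and access restrictions."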
My colleague Dilip Krishna, a managing director at Deloitte & Touche LLP and the CTO of Risk & Financial Advisory, regularly highlights three big challenges he sees financial services clients facing:
- Achieving high-quality, repeatable data processes with the profusion of systems involved in generating data and the large number of systems involved in curating data and generating other metrics, such as risk and financial metrics.
- Needing to leverage unstructured or semi-structured data such as text, voice and system logs, among others, which requires embedding newer technologies and capabilities.
- Understanding how data is classified across the organization for the purposes of safeguarding it, and also ensuring that large, multinational financial services clients can help assess compliance with cyber and privacy rules in the jurisdictions they operate in.
Head of Product Development at Nasdaq Cloud Data Service
One of the biggest challenges that we hear from clients is that data remains locked in silos: undiscoverable, locked down, or held within purpose-built systems not intended to be shared beyond their original use cases. If and when the existing data is found, it is often hard to evaluate for potential monetization opportunities.
To help our clients overcome these challenges, we launched the Nasdaq Cloud Data Service. The cloud is fundamentally reshaping the storage, consumption, analysis, management and distribution of data. The convergence of big data, cloud capabilities and the rise of mobile platforms has created the opportunity to meet users where they are. This means the ability to serve all sides of the market at scale, fueling transparency across the spectrum from small fintech firms and entrepreneurs to more-traditional and larger financial players.
NCDS is accessible through a suite of highly scalable, cloud-based APIs. These APIs utilize open-source delivery standards and a software development kit to fast-track engineering efforts. This helps eliminate the need for hardware procurement, proprietary protocols, file formats and leased lines, and allows for a more effortless integration of data from disparate sources. The result is a drastic reduction in time to market for customers and the removal of the obstacle of locked, undiscoverable data sets.
SVP Product & Corp Development at Ripple
During the early days of blockchain, it was clear that this technology had real potential to transform financial services, starting with payments. However, preliminary feedback from financial institutions was that the inherent openness and transparent nature of public blockchains like bitcoin and XRP wouldn't comply with the privacy requirements of traditional payments. And so the knee-jerk reaction was to build private blockchains (blockchain not bitcoin), which are essentially databases. A private blockchain negates the fundamental reason that blockchain exists: to provide a means of value transfer in an open and transparent way, without a central counterparty.
The data is clear: To use blockchain technology for payments, you need to embrace its intrinsic qualities. Blended solutions are the key: Protect consumer data with secure and private encryption but allow the value transfer to leverage the openness of the underlying blockchain technology. While the pendulum has shifted back and forth between public and private blockchains over time, I think today we've overcome the first major challenge for this nascent industry. It's evident that public blockchains are the true innovation.
Founder at Burnmark
Collection of data from different sources remains a huge challenge for most global financial institutions. There are several use cases where more effective productionization of data is being worked on across the data value chain, especially with a focus on trust, scalability and privacy, but collating all the available sources of data still represents the single largest effort for large financial institutions as it stands.
The desire to solve this challenge is quite significant as well, especially for traditional institutions, due to the number of legacy systems without data interoperability and the increased pressure from consumers and fintechs to innovate with data while pursuing profitability. Plugging gaps between systems, collating legacy data, dealing with toxic data and bringing in new sources of data like social media and biometrics all remain challenging for most financial institutions.
Once high-quality data sources are created and high-quality data is brought in, it can support the productionization process effectively, ensuring that modeling and analytics are done on the right set of data and that new features like sentiment analysis can be brought in. We have several banks trying out machine learning on consumers' social media data, but preparing that data, with consent, remains the biggest challenge in using these technologies to provide an accurate picture of the consumer's behavior.
Chief Analytics Officer at FICO
Trust. Because without it, financial companies have nothing.
Trust is the end game; to achieve it, organizations must deeply understand the business problem; the data, technology and context used to solve it; and how to prove their careful data stewardship to customers.
First, data scientists must quantify the importance of each data element in the decisioning model, because the model's outputs can significantly impact customers and the business. Otherwise science is working in a vacuum.
Second, business executives must have a usable framework for understanding the analytic model. Transparency is required to show how the model arrives at its decisions, based on the data it uses. People don't trust what they can't understand.
Third, consumers must be able to trust that their personal data will be used as promised. Reassurances of "Don't worry — we will do no evil with your personal data" don't cut it.
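The first of these points, quantifying the importance of each data element in a decisioning model, can be illustrated with permutation importance: shuffle one input at a time and measure how much the model's accuracy drops. The sketch below is a generic, standard-library-only version of that technique; it is an assumption chosen for illustration, not FICO's method, and the feature names and toy model are hypothetical.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]
Model = Callable[[Row], int]


def accuracy(model: Model, rows: List[Row], labels: List[int]) -> float:
    """Fraction of rows where the model's decision matches the label."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(rows)


def permutation_importance(
    model: Model, rows: List[Row], labels: List[int],
    repeats: int = 20, seed: int = 0,
) -> Dict[str, float]:
    """Mean drop in accuracy when each feature's values are shuffled across rows."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    baseline = accuracy(model, rows, labels)
    importances: Dict[str, float] = {}
    for feature in rows[0]:
        drops = []
        for _ in range(repeats):
            shuffled = [row[feature] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one feature permuted.
            permuted = [{**row, feature: value} for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances[feature] = sum(drops) / repeats
    return importances


def toy_model(row: Row) -> int:
    """Hypothetical decision rule: approve (1) when income exceeds 50k."""
    return 1 if row["income_k"] > 50 else 0


rows = [
    {"income_k": 20, "account_age_years": 5},
    {"income_k": 30, "account_age_years": 1},
    {"income_k": 40, "account_age_years": 9},
    {"income_k": 60, "account_age_years": 2},
    {"income_k": 70, "account_age_years": 7},
    {"income_k": 80, "account_age_years": 3},
]
labels = [toy_model(row) for row in rows]
importances = permutation_importance(toy_model, rows, labels)
```

Here `account_age_years` scores exactly zero because the toy rule never reads it, while `income_k` scores above zero; a report like this gives business executives a concrete view of which data elements actually drive a decision.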
My data science organization relies on Responsible AI — artificial intelligence that is explainable, ethical and efficient — to build the trust necessary to productionize data in models. These principles are the foundation of trust.
Tactically, we use patent-pending blockchain technology to create immutable records of every decision made when building the model, providing transparency for management and regulators. Blockchain can also serve to codify consumer permission to use their data, in compliance with regulations such as GDPR. With responsible AI bringing transparency to the decisions models make and the data they incorporate, financial companies and fintechs can gain the trust they need to truly productionize data.
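To make the idea of a tamper-evident record of model-building decisions concrete, the sketch below chains each logged decision to the previous one with a SHA-256 hash, so that altering any earlier entry invalidates the chain. This is a generic hash chain and only an assumption about the general approach; it is not FICO's patent-pending technology, it is not a distributed blockchain, and the function names and decision fields are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, decision: Dict[str, str]) -> str:
    """Hash the previous entry's hash together with this decision."""
    body = json.dumps(decision, sort_keys=True)  # stable serialization
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_decision(log: List[dict], decision: Dict[str, str]) -> None:
    """Append a decision, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "decision": decision,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, decision),
    })


def verify(log: List[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["decision"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: List[dict] = []
append_decision(audit_log, {"step": "feature selection", "choice": "drop zip code"})
append_decision(audit_log, {"step": "algorithm", "choice": "logistic regression"})
```

A chain like this only detects tampering; making the record effectively immutable would additionally require replicating or anchoring it somewhere the model builders cannot rewrite.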
See who's who in Protocol's Braintrust. (Updated May 6, 2020)
Kevin McAllister (@k__mcallister) is a Research Editor at Protocol, leading the development of Braintrust. Prior to joining the team, he was a rankings data reporter at The Wall Street Journal, where he oversaw structured data projects for the Journal's strategy team.