The effective use of data is about as fundamental to finance companies as the use of electricity. The creation of useful intelligence about customers, markets, pricing, etc. is the focus of an ever greater pool of talent, budget, heartache, and triumph in our financial centers. Some banks are spending more than 10% of their yearly revenue on technology, yet CFOs fear that nearly half of that spend is being wasted, according to a recent Bloomberg piece. The key to improving the effectiveness of new data analytics, however, may lie in making the data itself more intelligent. More specifically, it lies in systematically creating and delivering fresh, actionable, intelligent data to the myriad analysts, bankers, customers, and everyone else on the front lines.
How Can Data Be Intelligent?
Generally speaking, creating intelligence boils down to three things:
- Massive amounts of data, structured and transformed, to provide fresh and historical context.
- Relatively straightforward math, no PhDs needed, to shed light on unseen and subtle patterns.
- Smart and inquisitive minds to shape and guide their algorithms towards intelligence.
These three ingredients, well-proportioned and combined, can generate an uninterrupted source of intelligent data that determines whether we succeed or fail.
In the simplest case, data is made more intelligent with the addition of descriptive information about itself, or metadata. If you embed richer metadata and create more structure – data quality, deduplication, etc. – then information workers have less to do, and are more likely to get better, more accurate answers as they ask questions of their intelligent data.
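As a minimal sketch of what "embedding richer metadata" can mean in practice, the snippet below wraps raw records with descriptive metadata (source, ingestion time, quality flags) and deduplicates them. The record layout and flag names are illustrative assumptions, not a reference to any particular bank's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IntelligentRecord:
    """A raw record wrapped with descriptive metadata about itself."""
    payload: dict
    source: str
    ingested_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    quality_flags: list = field(default_factory=list)

def enrich(records, source):
    """Deduplicate records and attach quality metadata so downstream
    information workers inherit clean, self-describing data."""
    seen = set()
    enriched = []
    for rec in records:
        key = tuple(sorted(rec.items(), key=lambda kv: kv[0]))
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        flags = [f"missing:{k}" for k, v in rec.items() if v is None]
        enriched.append(IntelligentRecord(rec, source, quality_flags=flags))
    return enriched

rows = [
    {"account": "A1", "amount": 120.0},
    {"account": "A1", "amount": 120.0},   # exact duplicate
    {"account": "A2", "amount": None},    # missing value
]
clean = enrich(rows, source="card_feed")
print(len(clean))                  # 2 records survive deduplication
print(clean[1].quality_flags)      # ['missing:amount']
```

With quality flags carried alongside the data, an analyst can filter out incomplete records up front instead of discovering the gaps mid-analysis.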
Data preparation and metadata management are already integral parts of the modern data engineering discipline. But the role of data engineers is evolving as more and more data preparation practices can be automated. Data engineers will increasingly have an effect on how data is used and how well it performs under different analytics approaches. Data becomes more intelligent if data engineers can embed intelligence within the dataset as to how best to query the data for optimal performance and efficiency.
Basic data analytics can provide a view of “what” is happening but offering up “why” requires multidimensional and dynamic views of the data. This is comparable to human questioning in which one query leads to another until you end up with multiple perspectives of a topic. Over time, and many topics later, query patterns emerge. If you can continuously optimize those queries, your efforts to solve big challenges will start with a pool of good and relevant questions. Good questions beget great questions that advance intelligent answers.
To best illustrate the advantages of intelligent data, it is useful to examine two major financial use cases that can greatly benefit. First, we will look at fraud prevention in payment systems – preventing payment fraud while improving the customer experience. Second, we’ll consider usage analytics of financial apps – interactions that don’t match a customer’s behavior profile could indicate fraud or churn. These two use cases are mainstays of digital transformation efforts in finance.
Fraud Prevention in Payment Systems
When a customer uses a credit card on vacation and the bank’s payment system detects something unusual (like a different country of origin), the bank can decline the card outright or ask the customer to confirm their identity. Which bank delivers the better outcome: the one that alerts you immediately and provides a solution, or the one that rejects the payment out of hand? It’s pretty obvious, and the ramifications are huge. After a card is declined, more than a third of consumers will abandon a purchase or go to a competitor instead.
For a customer, it’s about having a good time or being embarrassed and annoyed. For a bank, striking a fine balance between preventing fraud and earning customer trust is critical because erring on either end leads to loss.
- If fraud is allowed, recovering the money is virtually impossible once it has entered the immensely complex global payment network. Less than 25% of fraud losses are recovered.
- Mistakenly flagging legitimate transactions is more costly than the fraud itself. Losses due to false declines have been projected to reach $443 billion by 2021, nearly 70x more than losses from fraud itself.
Traditional rules-based methods for intercepting fraud are not adequate. Applying AI/ML at industrial levels of automation to combat fraud in real time is not only a good approach, but a necessary one.
But you need to feed AI/ML models. Fundamental to success is access to large quantities of training data. High quality and voluminous data improve the accuracy of the statistical models used. For example, to identify complex, previously unseen fraud strategies, hundreds of billions of transactions and associated data are required to tease out subtle details that trigger a fraud alert.
Feed the Features with Fresh Data Pipelines
Feature engineering is fundamental to how AI/ML works. Data scientists, in collaboration with business analysts and data engineers, apply domain knowledge to identify features (unique pieces of data or signals) in raw data. Analysis and predictions are done on features.
An AI/ML model might have tens of thousands of features that are used to determine if a payment transaction is legitimate. Features are continually being refined as new data and/or new types of fraud are discovered, so a scalable architecture that can deal with this level of continual change is required. How that data is utilized also changes constantly as data scientists query the data based on new features, new models, or more complex investigations.
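To make the idea concrete, here is a toy sketch of feature extraction for a payment transaction. The feature names (amount ratio, foreign-country flag, transaction velocity) are hypothetical examples of fraud signals, not the features of any real model.

```python
from statistics import mean

def extract_features(txn, history):
    """Turn a raw transaction plus the customer's history into
    model-ready features. All feature names are illustrative."""
    amounts = [h["amount"] for h in history] or [txn["amount"]]
    avg = mean(amounts)
    return {
        # how unusual is this amount relative to the customer's norm?
        "amount_ratio": txn["amount"] / avg if avg else 0.0,
        # binary signal: transaction outside the home country
        "foreign_country": int(txn["country"] != txn["home_country"]),
        # velocity signal, assumed precomputed by an upstream pipeline
        "txns_last_hour": txn.get("txns_last_hour", 0),
    }

history = [{"amount": 40.0}, {"amount": 55.0}, {"amount": 45.0}]
txn = {"amount": 900.0, "country": "FR", "home_country": "US"}
features = extract_features(txn, history)
print(features["foreign_country"])  # 1
```

A production model would combine thousands of such signals, and the extraction logic itself would be versioned and refined as new fraud patterns emerge.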
To provide a detailed, nuanced picture that can accurately assess individual transactions for fraud, hundreds of data pipelines are necessary. One data pipeline might provide current and historical financial transactions, while another might provide biometric data. It could be as mundane as email lists or as exotic as information acquired on the dark web. This wide variety of data sources is used to help enrich the models. To catch ever-changing payment fraud schemes, the data pipelines need to be fresh, meaning the data being used is the right combination of historical data and real-time data.
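The "right combination of historical and real-time data" can be sketched as a join between a batch-computed aggregate store and a streaming event. The store layout and field names below are assumptions for illustration.

```python
def build_model_input(customer_id, realtime_txn, historical_store):
    """Join a live transaction with historical aggregates so the
    fraud model sees both fresh and historical context."""
    hist = historical_store.get(
        customer_id, {"avg_amount": 0.0, "countries": set()}
    )
    return {
        "amount": realtime_txn["amount"],
        "avg_amount_90d": hist["avg_amount"],          # from batch pipeline
        "new_country": realtime_txn["country"] not in hist["countries"],
    }

# output of a (hypothetical) historical batch pipeline
historical = {"C42": {"avg_amount": 52.0, "countries": {"US"}}}
# output of a (hypothetical) real-time streaming pipeline
event = {"amount": 610.0, "country": "JP"}

model_input = build_model_input("C42", event, historical)
print(model_input["new_country"])  # True
```

Keeping the batch aggregates fresh is the hard part at scale; if they lag, the model scores today's transaction against last month's behavior.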
Customer Usage Analytics
More than two-thirds of data today is generated by individuals, not companies. Each person is unique and how they use digital applications (apps) tells a story about their wants, needs, and aspirations.
Application usage analytics are critical for banks to acquire a deep and subtle understanding of how their customers are interacting (or not) with banking applications. A cloud-based banking application used by tens of thousands of users is going to generate a massive amount of usage data daily. The heterogeneous nature of these scenarios places a complex set of requirements on data infrastructure. The need to ingest and transform large volumes of usage data while handling disparate query requests simultaneously is difficult, even for veteran data engineers.
Building off the data pipeline requirements described above, optimizing queries on those pipelines is critical. At the scale needed to detect fraud and enhance user experience, AI-augmented query optimization is required.
Query Self-Optimization Accelerates User Engagement Maps
It is not unusual these days for a bank to generate billions of application usage records consuming terabytes of storage daily. Often, there will be hundreds of billions of records available for querying. Combining this data with other data sources, both external and internal (for instance, CRM or census data), is critical to give additional ‘color’ to the data.
Each team within your organization will want their own views of that data. A product specialist would want to know who is using their application and details about that individual (industry, company, position, and location), all of which provide additional statistical features about the user. Other stakeholders will require other data sources that you will need to combine with the usage data.
Each of these questions leads to another. To handle that well, your data platform needs to self-optimize all those queries using predictive analytics or machine learning. Optimization requires that the datasets live in the same infrastructure to minimize query response time; techniques such as data virtualization are not effective on datasets of this size. An ideal data infrastructure learns query patterns over time and automatically centralizes data as needed. Automatic query optimization removes the need for an engineer or DBA to perform this function manually.
Intelligent Data for Digital Transformation
Understanding human behavior and anticipating fraudulent behaviors are crucial elements of a financial institution’s digital transformation. While many organizations are looking to spend their way to greater outcomes in these areas via more and better tools, those investments may yield greater results if they focus on building greater intelligence into their datasets. When data analysts and data scientists are provided with intelligent data that guides their analysis and algorithms, they can benefit from greater performance, greater efficiency, and the most bang for their computing buck.
About the author:
Li Kang, Vice President, North America, Kyligence
Li Kang is the vice president of Kyligence USA operations and also works with technology partners to build joint solutions. Li has extensive experience in the big data and analytics field. Before joining Kyligence, he worked at Oracle, Informatica, and MapR where he helped customers build database, data warehouse, and big data solutions.