husayn gokal
Geneva

essay

Fraud Detection Systems Suck — Here's a Solution Using Quantum Computing

Quantum computing can potentially revolutionize fraud detection systems — here’s how

epistemic status: originally published on medium; reposted here as the canonical home.


Originally published on Medium: fraud-detection-quantum


Fraud detection systems suck — here’s a solution using quantum computing

An image generated by Midjourney depicting a bunch of people on various digital devices in a futuristic, dystopian setting

Imagine this scenario. You are going about your day, minding your own business when suddenly, you get this notification on your phone:

A picture of a fake iPhone SMS detailing a credit card scam for 10,000 AED

If you’re anything like me, your mind would immediately start panicking, and your first instinct would be to call the bank asking about what happened. In most cases, victims of fraud are able to get their money back. However, there are many people who unfortunately weren’t so lucky.

It’s quite clear to me that existing systems for fraud detection aren’t perfect — far from it actually. It was this realization that fueled my determination to make a difference.

But wait, how did I even get to this point? Let’s rewind a bit, shall we?

I have always been someone passionate about trying to solve these kinds of problems in the world. That is why I decided to participate in the NYUAD Hackathon for Social Good in the Arab World — to try and find a solution to a pressing problem in the arab region. However, I didn’t have a problem in mind to solve 😅

And that’s when I came across this statistic: the Federal Trade Commission in the United States reported 441,822 cases of Credit Card Fraud in the year 2022.

Given that the median amount of money lost to credit card fraud that year was $800, that equates to around ~$354 million US Dollars in damages 🤯

I remembered that my dad has personally experienced credit card fraud three times in his life with successively increasing amounts. That is when I knew that I wanted to work on this problem.

I decided to pitch this to my team, and we ended up taking this up as our problem to solve for the event.

A picture of our team at the NYUAD Hackathon for Social Good in the Arab World, 2023

A picture of the main slide for our project Tanbeeh (تنبيه), which literally translates to “Alert” in English

While we didn’t win due to a lackluster presentation, nor did we end up making a working model due to limitations in time and expertise, we were still on the lookout for an opportunity to work on this project again. That opportunity came along in July when we took part in QWorld’s QIntern 2023; an online internship program for experts and enthusiasts alike who want to work on cutting-edge quantum projects alongside experienced mentors.

We ended up working on a completely different approach to the problem and published a research paper that details the entire process. This article is meant to be a summary of our work over those 6 weeks, so I highly recommend you check it out for more details and insights into the process.

Current Methods

There are various methods that are used to monitor and detect fraudulent transactions, such as rule-based detection systems. These systems work by gathering user data related to a transaction and comparing them against rules that you have set, leading to an overall “score”. If that score is past a particular threshold, the transaction will not be allowed to go through or may require manual authentication.

The problem with this method is the lack of adaptability to changing fraud patterns, as well as the limited context of the rules themselves. This can allow for new forms of fraud to go under the radar of these detection systems.

Let’s look at an example here. Let’s say that there is a financial institution that has implemented the rules below for its fraud detection system:

Source: seen.io | An image with a bunch of rules, highlighting a rule titled “Phone is disposable”

Of course, most people probably wouldn’t be using a disposable phone anyway (didn’t even know what that was until I saw this image). But if we have a look at some of the other rules, they are incredibly easy to bypass.

If I were trying to execute a malicious transaction from a residential IP address provided by a local ISP (Internet Service Provider), using a brand new browser profile with the latest version, and not using a private email relaying service, I would be in the clear — a fraud score of 0!

These rules aren’t the most robust for sure, but no matter how many rules and logical connections you end up having, there will always be a way to bypass the system. Not to mention the amount of time it would take to go through that many rules for each transaction.

This is why researchers came up with Supervised and Unsupervised Machine Learning methods for financial fraud detection as a way to get rid of adaptability and context-related limitations, thereby increasing the accuracy of the system.

As it turns out, these systems still end up having a low accuracy, being incredibly sensitive to false positives, and causing customers a lot of hassle in the process.

In short, it appears that there are three fundamental challenges with current financial fraud detection systems:

  • The reduced quantity and quality of datasets that these models are trained on.
  • The lack of contextual relations between features of the dataset.
  • The inability to detect subtle differences in high-dimensional transactional patterns.

Starting off with the first point, datasets such as credit card fraud datasets (which is what we trained our model on) have highly imbalanced cases of fraud vs. non-fraud. Moreover, due to reasons pertaining to the confidentiality of the data, they contain a very limited number of features, and/or are disguised using PCA (Principal Component Analysis) that restricts background information about the data and may affect the models’ performance.

The quantity of these datasets is also an issue, as modern methods like 3-D Secure used by companies like VISA process hundreds of features for each individual transaction that pertains to a user’s profile. Such data is not available online in public datasets for privacy reasons, leading to insufficient data to create a feasible model.

On the second point, pretty much all of the data that financial institutions use to detect fraud are categorical. This poses substantial challenges for models to be able to discern patterns and relationships within the dataset.

Imagine a jigsaw puzzle with irregularly shaped pieces instead of the traditional square or rectangular ones.

In a dataset with numerical data, each data point is like a puzzle piece that fits neatly into a larger picture, making it relatively easy for a model to discern patterns and relationships.

However, with categorical data, each piece is unique in shape, much like the diverse features in a dataset.

Imagine trying to assemble a puzzle with irregular pieces — it’s much harder to see how they fit together, right?

Similarly, models struggle with categorical data because they can’t easily “see” how different categories relate to each other or form a cohesive pattern.

To better facilitate a model’s data processing on such data, techniques such as one-hot encoding or clustering are often used to transform the data.

However, this still doesn’t enable optimum relationality between these features of data. To solve this, our team decided to encode the data into the model as a graph instead!

Now while I’m sure most of you know what a graph is, let’s try to really understand what it is; from a mathematical sense.

Simply put, a graph is used to represent relationships between a set of objects. Each object is drawn out as a circle and is called a “node”, while each relationship is drawn out as a line connecting the two objects, and is called an “edge”.

A good example of a graph is a family tree. Each person in the tree would be represented as a node, and the edges would connect these people with each other.

An example of a family tree

There are some kinds of graphs where things like the length of the edge and distance between other nodes and edges matter too. However, to keep things simple, let’s have a look at a graph that represents an important type of fraud: fraud rings.

The concept is simple. A group of hackers go online and try to find personal information that they can steal from individuals. This can include things like Phone Numbers, Physical Addresses, and even Social Security numbers.

Once this information has been compromised, they can apply for credit cards and loans under an alias while using that information.

They can also use that information to transact from someone else’s credit card or bank account — a common issue with banks in the UAE (as mentioned before, this happened to my dad 3 times 🥲).

To increase the number of bogus transactions that they can initiate while reducing the amount of personal information that they might need, hackers often resort to linking these details between the fake identities that they create.

This forms a ring-like pattern, similar to the image below.

Source: neo4j.com | An image with an example of a fraud ring (the data shown in the image is fake)

Auditors can identify such graphical patterns in a financial institution’s data, exposing such fraud rings.

However, it gets much trickier to do so with a graph that potentially has tens of thousands of nodes, where only a handful of them may be involved in such a fraud ring.

This leads to the third problem that still remains: effectively identifying patterns in high-dimensional data.

I mentioned 3-D Secure earlier, and how it is able to process hundreds of features for every single transaction that goes through it.

However, running a graph-based neural network with TDA (Topological Data Analysis) on that many features would be incredibly computationally expensive, especially with the speeds that today’s FinTech industry requires.

To address this issue, we initially wanted to use an algorithm called QSVC (Quantum Support Vector Classifier) to try and classify the data in a 3D space.

However, due to technical difficulties, we resorted to using a QGNN (Quantum Graph Neural Network) instead.

Before I go on, I do want to clarify that this is a “Quantum” Graph Neural Network and not a “Quantum Graph” Neural Network. They’re two very, very different things 👀

This is essentially a GNN (Graph Neural Network) that runs using a quantum circuit and can be run on a quantum computer. Here is the architecture of our QGNN:

Source: Our paper | The theoretical architecture for our QGNN (Quantum Graph Neural Network)

Let’s start breaking this apart piece by piece. To start off, each graph input into the model represents a single transaction. 28 features out of the total 30 in each graph are processed using PCA, while the remaining two (amount and time) aren’t.

Then, we apply TDA, as well as something called “node clustering” to discriminate graphs based on their distinct elements (in a good way) to help better identify cases of fraud.

As an example, if our dataset had two features named “IP Address” and “Location”, they may be grouped together as IP Addresses directly correspond to the location of a device.

These graphs are then encoded into our quantum circuit using a technique called Angle-Encoding, which essentially encodes our features (in this case, the nodes in our graph) into the “spin” of the qubits.

What’s interesting is that compared to other encoding techniques, this one can only support one data point at a time. However, that does mean that the quantum circuit outputted would have a “constant depth”, making it operable on the limited quantum hardware that we have today.

Next, we use a VQC (Variational Quantum Circuit), also called an Ansatz, to “train” our model. This is the meat of our model, which uses various rotational gates like RX (Rotational-X), RY (Rotational-Y), and CNOT (Controlled-NOT)to find the optimal result. Unfortunately, Quantum Optimizers don’t exist yet, so we have to use classical ones just like in regular machine learning.

Instead of using just one VQC, we actually used two of them. This is a common practice in quantum machine learning, whereby using multiple layers can significantly increase the accuracy of a model. In our case, two layers gave us the best result for a 6-qubit circuit.

Once this is all done, we pool all of our nodes together to input into a Linear Layer. The reason we’re doing this is to be able to achieve a categorical “1” or “0” as a result that our model can output.

Future Work

Getting good-quality data for such an experiment is easier said than done. Realistically, unless a financial institution wants to fund this research directly, getting the data required for a project like this is simply infeasible.

And even if they did decide to do that, the dataset would still be incredibly biased as cases of fraud just don’t happen that often in the real world.

A promising solution appears to be synthetic data — essentially using AI to generate data for another AI.

This is just one of the many things we thought of trying but just didn’t have the time to do. In case you wish to try and improve upon what we have done here, here’s a quick list of things that you can try yourself:

  • Using Neutral-Atom Quantum Computers to physically encode the graph and see if that makes a difference in the performance of our model.
  • Using multi-dimensional QSVC to classify the data using a hyperplane in 3D space, potentially improving upon existing methods such as Factorial Analysis of Mixed Data (FAMD) in a 2D space.

Our work was by no means comprehensive. We faced a lot of limitations with the data, time, and expertise required to pull this off.

Nevertheless, we hope that we can continue to improve upon our initial approach and inspire a new generation of quantum enthusiasts to leverage quantum technologies in tackling some of the world’s biggest problems.

PS: For those who don’t know me — hey! My name is Husayn Gokal, and I’m a 16-year-old quantum enthusiast. I’m a certified IBM Qiskit Advocate (only one in the UAE as of this date) and have been obsessed with the field for about 2 years now. I’m always keen to do research and learn more from my fellow peers :D

PPS: This is my first medium article! Would love some feedback in the comments down below 👇👇

More in this thread

Submit a note on this post

Notes go to a moderation queue, not a public comment thread. The author reviews them in Obsidian. Substantive notes appear on the post under a reader notes section; drive-by complaints do not.