Mining MPesa Data For Fun & For Profit

If you haven’t already, read KRA, Safaricom & You. Go on. I’ll Wait.

That post has raised quite a bit of interest, and so, a follow up.

First of all, there is nothing wrong with data mining. If you are a serious company you hire a guy like me to crunch your data and give you new, non-obvious insights. You will get insights like

  • How to target your products
  • Which to discontinue
  • Which to invest in
  • Whether an advertising campaign is working
  • Usage patterns
  • Purchase patterns
  • Customer patterns
  • etc

What I OBJECT STRONGLY to is government mining our transactional information because we might be evading taxes. And so should you!

Now, let us get back to MPesa. I would like to discuss data mining in a bit more detail.

Again, I’m using MPesa because the numbers from the other providers are of nuisance value.

An MPesa transaction has the following information

  • Date
  • Time
  • Sender phone number
  • Recipient phone number
  • Amount
  • MPesa Outlet

To register for MPesa, or indeed to get your line, you provided a raft of information about yourself. The interesting bits are

  • ID Number
  • Name
  • Date of birth
  • Gender

Let us look at that MPesa outlet. An MPesa outlet, obviously, must register itself. Therefore the following information is available

  • Outlet name
  • PIN Number
  • Physical Location
  • Owner name, ID number
  • GPS Co-ordinates *
  • Opening time *
  • Closing time *

The starred items are what I am not sure Safaricom collects, but if I were them, I would.

Now, 6 months of this data is data mining gold. I’d frankly be astonished if Safaricom did not mine this database.

There are some quick, obvious things that you can derive to improve service delivery.

Which MPesa outlets open on time

Given an outlet that claims to open at 8, if the earliest MPesa transaction on a daily basis is between 9:00 and 9:30 over a continuous period, it is likely that outlet does not open on time.
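
Here is a rough sketch of how that check might look in Python. Everything in it is my own invention for illustration: the function name, the 30 minute grace period, the 5 day minimum and the sample data. I have no idea what Safaricom's actual tables look like.

```python
from collections import defaultdict
from datetime import datetime, time
from statistics import median


def late_openers(transactions, claimed_open, grace_minutes=30, min_days=5):
    """Flag outlets whose typical first transaction is well after opening time.

    transactions: iterable of (outlet_name, datetime) pairs.
    claimed_open: {outlet_name: datetime.time} as declared by the outlet.
    Returns {outlet_name: median first-transaction time, in minutes after midnight}.
    """
    first_seen = defaultdict(dict)  # outlet -> {date: earliest minute of that day}
    for outlet, ts in transactions:
        minute = ts.hour * 60 + ts.minute
        day = ts.date()
        if day not in first_seen[outlet] or minute < first_seen[outlet][day]:
            first_seen[outlet][day] = minute

    late = {}
    for outlet, days in first_seen.items():
        # Skip outlets with no declared opening time or too few days to judge.
        if outlet not in claimed_open or len(days) < min_days:
            continue
        opening = claimed_open[outlet].hour * 60 + claimed_open[outlet].minute
        typical = median(days.values())
        if typical - opening > grace_minutes:
            late[outlet] = typical
    return late


# Made-up data: outlet A claims 8:00 but starts around 9:15; B starts at 8:05.
sample = [("A", datetime(2013, 1, d, 9, m))
          for d, m in [(1, 10), (2, 20), (3, 5), (4, 30), (5, 15)]]
sample += [("B", datetime(2013, 1, d, 8, 5)) for d in range(1, 6)]
print(late_openers(sample, {"A": time(8, 0), "B": time(8, 0)}))  # {'A': 555}, i.e. 9:15
```

I use the median rather than the mean so that one unusually early or late morning does not skew the picture. Remember this is probability, not certainty: a quiet outlet may simply have no customers before 9.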

Which MPesa outlets close on time

Same as above. Only for closing time

Which MPesa outlets should be closed

Given you have the GPS co-ordinates, you can position the MPesa outlets on a map. If you find there are four next to each other, A, B, C and D, and A, B and C do on average 30 transactions a day but D just does 5, you can probably close D.
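
A minimal sketch of that idea, assuming we already have each outlet's GPS position and its average daily transaction count. The 200 metre radius, the 25% threshold and the co-ordinates below are all assumptions I picked for the example.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(h))


def closure_candidates(outlets, radius_m=200, ratio=0.25, min_neighbours=2):
    """outlets: {name: (lat, lon, avg_daily_transactions)}.

    Returns the names of outlets doing less than `ratio` times the average
    volume of the outlets within `radius_m` of them.
    """
    candidates = []
    for name, (lat, lon, avg) in outlets.items():
        neighbours = [
            o_avg
            for other, (o_lat, o_lon, o_avg) in outlets.items()
            if other != name and haversine_m((lat, lon), (o_lat, o_lon)) <= radius_m
        ]
        # An isolated outlet may be the only one serving its area, so leave it alone.
        if len(neighbours) < min_neighbours:
            continue
        if avg < ratio * (sum(neighbours) / len(neighbours)):
            candidates.append(name)
    return sorted(candidates)


# Made-up outlets: A to D sit within about 100 m of each other, E is far away.
outlets = {
    "A": (-1.2833, 36.8167, 30),
    "B": (-1.2835, 36.8169, 30),
    "C": (-1.2831, 36.8165, 30),
    "D": (-1.2834, 36.8166, 5),
    "E": (-1.3000, 36.9000, 3),
}
print(closure_candidates(outlets))  # ['D']
```

Note that E does even fewer transactions than D but is not flagged, because it has no neighbours to pick up its customers.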

Where do you need more MPesa outlets

Example as above. If you find A, B, C and D are processing transactions continuously from opening to closing time, i.e. there are no hourly spikes, and you cross-reference this with the average number of transactions processed across outlets, it is likely they are working flat out, in which case you might need more outlets to absorb the load.

Which MPesa outlets have a demographic profile

This is more interesting. Since you have the sender’s details you can derive things like what is the modal age of customers at a particular MPesa outlet. By modal age I mean get the age of the sender, and find out how frequently that age occurs.

In other words, you can find in a particular outlet, most visitors are between 25-30 and in another most visitors are 18-23 and in another 40-50.

This is useful information for any competent marketing person. Or a practical person e.g. in the place where most visitors are 40-50 Safaricom can advise the outlets to get chairs for customers to sit on as they wait.
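
To make the modal age idea concrete, here is a small sketch. It assumes the sender's age has already been worked out from the date of birth on file; the 5 year band width and the outlet names are made up.

```python
from collections import Counter, defaultdict


def modal_age_band(visits, band_width=5):
    """visits: iterable of (outlet_name, sender_age) pairs.

    Returns {outlet_name: (low, high)}, the most frequent age band per outlet.
    """
    bands = defaultdict(Counter)  # outlet -> Counter of band lower bounds
    for outlet, age in visits:
        low = (age // band_width) * band_width
        bands[outlet][low] += 1

    result = {}
    for outlet, counts in bands.items():
        low, _ = counts.most_common(1)[0]
        result[outlet] = (low, low + band_width - 1)
    return result


# Made-up visits to two hypothetical outlets.
visits = [
    ("Westlands", 26), ("Westlands", 28), ("Westlands", 41),
    ("Campus", 19), ("Campus", 21), ("Campus", 22),
]
print(modal_age_band(visits))
```

With this sample the Westlands outlet comes out as 25 to 29 and the Campus outlet as 20 to 24.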

Which are the peak times for transactions

Self explanatory. You might find for example on average an outlet does 10 transactions an hour but at lunch time it spikes to 200. Then it drops back to 10.

You find this outlet cannot handle the spike so customers have to queue.

Dilemma. If you open a second outlet, it will likely be idle. If you do nothing – customer dissatisfaction.

Solution: something like a portable MPesa outlet (a van or something) that can go there at lunch time, absorb the load and then leave.
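
Spotting such a spike is straightforward. A sketch, assuming we have the timestamps of one outlet's transactions; the rule that a peak is any hour with more than 3 times the median hourly count is my own arbitrary choice.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def peak_hours(timestamps, factor=3):
    """Return the hours of the day whose transaction count exceeds
    `factor` times the median count over the hours that saw any activity."""
    per_hour = Counter(ts.hour for ts in timestamps)
    if not per_hour:
        return []
    baseline = median(per_hour.values())
    return sorted(hour for hour, n in per_hour.items() if n > factor * baseline)


# Made-up day: 10 transactions an hour from 9 to 5, except 200 at lunch time.
sample = []
for hour in range(9, 18):
    count = 200 if hour == 13 else 10
    sample += [datetime(2013, 1, 7, hour, i % 60) for i in range(count)]
print(peak_hours(sample))  # [13]
```

In practice you would feed in several weeks of data rather than a single day, so that a one-off rush does not send the van out for nothing.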

What is the average time it takes to complete a transaction

Self explanatory. If you remember the initial forms to fill they collected a lot more detail than they do now. Someone must have analyzed these numbers and optimized the process.

And so on. There are tons of other things that you can look out for but those examples should suffice.

Let’s move on to the transactions themselves.

Remember this information is at your disposal

  • Sender name
  • Sender ID number
  • Sender gender
  • Sender age
  • Recipient name
  • Recipient ID number
  • Recipient gender
  • Recipient age
  • Amount
  • MPesa outlet name
  • MPesa outlet location

Armed with a bit of mathematics, economics and psychology, mining this information will yield a GOLD MINE of information. Let me re-iterate: anyone with access to both this data and data mining expertise OWNS YOU.

If that alone is a gold mine, Safaricom is sitting on a gold mine next to oil and platinum deposits for the excellent reason that they also have access to your call records.

In other words, they can cross-reference your call and your MPesa records and mine that bad boy still further. Add to this the SMS database and this is paradise.

You can derive a treasure trove of information from this, over and above the examples I gave in my previous post

Over and above who you are sending the money to, there is a lot of context to be gleaned if we can guesstimate why you are sending the money.

Let us take an example of how end to end mining would work.

Let me again repeat: data mining is premised on PROBABILITY, not certainty. Some of the assumptions may be wrong. But usually, you can derive pretty good confidence levels.

0721 000000 sends 5,000 to 0722 000000 at 2.00 AM, via his phone.

Let’s get started.

First of all, let us build a profile of both sender and recipient.

0721 000000 maps to John Kamau, aged 37. He has been a customer since 2000.

0722 000000 maps to Jamie Omondi, who is not a male as first thought, but a female, aged 32, a customer since 2003.

Next, let us analyze the context.

A 2.00 AM transaction is unusual. This is unlikely to be paying for something. Let us hop over to the phone logs database. Aha. John and Jamie have in the past made calls to each other.

We can therefore infer that they know each other. Therefore that transfer was probably either some emergency or Jamie had a pressing bill that she needed to pay.
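
That cross-reference between the MPesa and call databases can be sketched in a few lines. The record layouts, the function name and the definition of an odd hour (midnight to 5 AM) are assumptions of mine, and the dates are made up.

```python
from datetime import datetime


def transfer_context(transfer, call_log, night_start=0, night_end=5):
    """transfer: (sender, recipient, datetime).
    call_log: iterable of (caller, callee, datetime).

    Counts earlier calls between the two parties in either direction and
    notes whether the transfer happened at an unusual hour.
    """
    sender, recipient, when = transfer
    prior_calls = sum(
        1
        for caller, callee, ts in call_log
        if {caller, callee} == {sender, recipient} and ts < when
    )
    return {
        "odd_hour": night_start <= when.hour < night_end,
        "prior_calls": prior_calls,
        "likely_acquainted": prior_calls > 0,
    }


# Made-up call log: two calls between John and Jamie, one unrelated call.
call_log = [
    ("0721 000000", "0722 000000", datetime(2013, 1, 5, 18, 30)),
    ("0722 000000", "0721 000000", datetime(2013, 1, 9, 20, 15)),
    ("0721 000000", "0733 000000", datetime(2013, 1, 10, 9, 0)),
]
transfer = ("0721 000000", "0722 000000", datetime(2013, 1, 12, 2, 0))
print(transfer_context(transfer, call_log))
```

For the 2.00 AM transfer above this reports an odd hour, 2 prior calls and a likely acquaintance.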

The next bit is to check if there are any subsequent transactions where Jamie is the sender.

Oho! Lipa Na MPesa till number 000000 received a payment of 4,500 from Jamie 5 minutes after she got the money from John.

Have there been any other payments from Jamie to that till number? Yes. On average, twice a month, over a 6 month period.
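
Finding money that leaves an account shortly after it arrives is the sort of thing a few lines of code can do across the whole database. A sketch, with a made-up record layout and an arbitrary 30 minute window:

```python
from datetime import datetime, timedelta


def follow_on_payments(transactions, window=timedelta(minutes=30)):
    """transactions: list of (sender, recipient, amount, datetime).

    Returns (incoming, outgoing) pairs where the recipient of `incoming`
    pays out no more than the amount received, within `window` of receiving it.
    """
    ordered = sorted(transactions, key=lambda t: t[3])
    pairs = []
    for i, incoming in enumerate(ordered):
        _, recipient, amount_in, received_at = incoming
        for outgoing in ordered[i + 1:]:
            sender, _, amount_out, sent_at = outgoing
            if sent_at - received_at > window:
                break  # the list is sorted, so everything after this is later still
            if sender == recipient and amount_out <= amount_in:
                pairs.append((incoming, outgoing))
    return pairs


# Made-up records mirroring the example: 5,000 in at 2.00 AM, 4,500 out at 2.05 AM.
sample = [
    ("0721 000000", "0722 000000", 5000, datetime(2013, 1, 12, 2, 0)),
    ("0722 000000", "TILL 000000", 4500, datetime(2013, 1, 12, 2, 5)),
    ("0733 000000", "0744 000000", 100, datetime(2013, 1, 12, 2, 10)),
]
for incoming, outgoing in follow_on_payments(sample):
    print(incoming[0], "->", incoming[1], "->", outgoing[1])
```

The nested loop is fine for a toy list. On 6 months of real data you would index the transactions by sender first.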

From the till number we can determine the business it was registered to. Turns out it is Sky Lounge, a swanky bar.

Have there been any other payments to tills belonging to bars? Yes! 6 other bars / hotels over the same 6 month period.

We can then infer that Jamie probably drinks. Given the profiles of the outlets she drinks at, she probably doesn’t drink Senator, but more likely spirits and cocktails.

So, if Safaricom were to decide to license targeted customer profile databases and KBL requests and pays for that, guess whose details would be on that database?

Or if Safaricom decides to do context sensitive advertising. Once Jamie logs in to her Gmail via her Bamba modem, Safaricom can tie her traffic to her number. And can therefore serve appropriate ads (Smirnoff, etc)

Relax, I said IF!

Going back to John.

What other transactions has John made?

John has made at least one transaction every month via Pay Bill to a hair salon. The average amount is 5,000 which means it is unlikely he is paying for himself. There is probably a lady in his life, who he accompanies to the salon.

It also turns out he has used Lipa Karo to 3 different schools. Ergo he either is a father with 3 children or he has 3 dependants he pays school fees for.

John also pays DSTV via MPesa. Premium package (7,000) without fail on the 3rd of each month.

John also pays Access Kenya (10,400) for his home internet connection, also on the 3rd of each month.

John also pays Kenya Power an average of 4,000 a month in power, which says something about where he lives – he likely does not live alone.

His bills say a lot about his financial abilities.

In fact, none of his bills is paid earlier than the 3rd.

Looking closer, on the morning of the 3rd of every month John makes a 30,000 deposit into his MPesa from his bank which he uses to pay his bills. This suggests that likely he has a regular income that clears on the 3rd.
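
Guessing a payday from deposits is about as simple as data mining gets. A sketch, in which the 10,000 minimum amount and the 3 month threshold are assumptions I chose for the example:

```python
from collections import defaultdict
from datetime import date


def likely_payday(deposits, min_amount=10_000, min_months=3):
    """deposits: iterable of (datetime.date, amount) bank-to-MPesa deposits.

    Returns the day of the month on which a deposit of at least `min_amount`
    recurs in the most distinct months, or None if no day reaches `min_months`.
    """
    months_by_day = defaultdict(set)  # day of month -> {(year, month), ...}
    for when, amount in deposits:
        if amount >= min_amount:
            months_by_day[when.day].add((when.year, when.month))

    day, months = max(
        months_by_day.items(), key=lambda kv: len(kv[1]), default=(None, set())
    )
    return day if len(months) >= min_months else None


# Made-up deposits: 30,000 on the 3rd for 6 months, plus one small top-up.
deposits = [(date(2013, m, 3), 30_000) for m in range(1, 7)]
deposits.append((date(2013, 2, 17), 2_000))
print(likely_payday(deposits))  # 3
```

A real version would have to allow for salaries that land a day early or late around weekends and public holidays.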

John also makes many payments to Steers. As frequently as 3 times a week, averaging 700. The payments are always in the evening.

This suggests that John eats a lot of take-away. Thus it is unlikely he is living with children (no one feeds kids burgers 3 days a week). This is supported by his spending at Steers: 700 is pretty much a meal for one.

There is also a payment of 3,000 at the end of every month to a number that does not appear in any of his call logs.

This same number also received the same 3,000 from 4 other different numbers, with the same pattern. No calls.

Who do you send money to but never call? Either some nefarious criminal enterprise or, much more likely, some sort of housekeeper.
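
The "paid but never called" pattern falls out of a simple join between the payment records and the call logs. A sketch with made-up numbers, a made-up record layout and an arbitrary threshold of 3 silent payers:

```python
from collections import defaultdict


def paid_but_never_called(payments, call_log, min_payers=3):
    """payments: iterable of (sender, recipient, amount).
    call_log: iterable of (caller, callee).

    Returns {recipient: sorted list of senders} for recipients who get money
    from at least `min_payers` people with whom they have never had a call.
    """
    contacts = {frozenset(pair) for pair in call_log}  # direction does not matter
    silent = defaultdict(set)
    for sender, recipient, _ in payments:
        if frozenset((sender, recipient)) not in contacts:
            silent[recipient].add(sender)
    return {r: sorted(p) for r, p in silent.items() if len(p) >= min_payers}


# Made-up data: 5 people each send 3,000 to a number none of them ever calls.
payers = ["0721 000000", "0723 000000", "0724 000000", "0725 000000", "0726 000000"]
payments = [(p, "0700 000000", 3000) for p in payers]
payments.append(("0721 000000", "0722 000000", 5000))
call_log = [("0721 000000", "0722 000000")]
print(paid_but_never_called(payments, call_log))
```

John's transfer to Jamie does not show up in the result because the two of them have called each other.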

But let me not belabour the point. A lot of insight can be derived from data mining, and this is not necessarily a bad thing.

Safaricom probably uses this number crunching to derive things like

  • New products e.g. tariffs
  • Promotions e.g. free calls from minute x
  • Pricing & price adjustments
  • Optimization of infrastructure
  • Competition containment (what is the highest we can charge for inter-network connectivity while still making money, staying clear of the regulator and blacking the eye of other networks)
  • etc

What horrifies me would be government having direct access to that information. That cannot be a good thing!

Here are some tweets I’ve exchanged with the Director Of Corporate Affairs this morning

BTW any lawyers care to chip in on the previous post?

8 thoughts on “Mining MPesa Data For Fun & For Profit”

  1. So the GoK’s only interest in this data would be taxable transactions?
    With their market dominance, this data really is a gold mine – probably of more value than the actual provision of mobile communication.

  2. This is in response to both of your posts on this issue… My rookie legal views are as follows:
    Safaricom can and should give KRA a list of the transaction numbers and charges because that enables KRA to tax them so I don’t agree with your point that such info is useless to KRA.
    But based on your explanation of what data-mining is the above doesn’t seem to fall into that category so Safaricom’s answer was not only unrelated but unhelpful:(
    It is unconstitutional and therefore illegal for them to data-mine : Article 31(d) states that one has a right to the privacy of their communication.
    This right can only be breached in the event of an on-going investigation therefore, if they suspect you KRA legally can get your records but they cannot use your records to initiate their investigation…. This is basically what the T&C excerpts you posted in your previous post allow them to do; not just give away your info. for funsies.
    However, if Safaricom were to discover you were evading tax, they can forward the proof to KRA (section 5 of the KRA act) so if KRA get Safcom to do it for them it’s totally legal:(

  3. Interesting read here….I have an amateur question that would put the article into full perspective. What are some of the negative ways in which the information mined by KRA could be used ?

  4. very well done post…..its consistent.

    u should do another article on the same…particularly for guys who own android phones (you can guess how much of their personal data is accessible to apps installed: from call logs, to gps locations, sms, internet usage, keyloggers)…all that possibly sent to remote servers….without their consent.
