
Unearthing Mineral Owners: A Data Odyssey in New Mexico


Have you ever tried to create a database of the names and addresses of mineral or working interest (WI) owners? I have. It’s not a party I want to be invited to on a Tuesday – and I’m in the business of creating data. I’ll explain why using New Mexico as an example.

I love New Mexico. Hatch green chile. Santa Fe. The margs at The Anasazi Hotel (blame my old friend _ Smith for that $20 Mexico Heck Yes), Christmas at the Shed. And the Oil Conservation Division, or the OCD, which regulates oil and gas activity in the state of New Mexico. God bless the OCD. Did you know that the OCD is the next Texas Ad Valorem dataset circa 1997 (yes, it took that long)? I think so. Texas is famous for levying ad valorem taxes on producing minerals. Levied at the county level, these taxes, named for the Latin phrase meaning “according to value,” become payable only when minerals are producing (as opposed to non-producing), and are billed and collected once per year. So if I wanted to embark on the rather pedestrian affair of compiling a list of mineral owners (again, WHY do people pay for this?), Texas offers a convenient solution. Thanks to the county appraisal district offices’ provision of XLS spreadsheets brimming with data every November, identifying mineral owners who are actively producing and paying taxes is a straightforward task.

In the Land of Enchantment, there’s ad valorem tax and regulatory diligence, and due process isn’t just a courtesy; it’s a mandated process. Before any regulatory filings (be it spacing, non-standard locations, or poolings) take effect, New Mexico requires companies to notify mineral interest, working interest, and overriding royalty interest owners. After all, altering the value of someone’s estate warrants a heads-up; you can’t just swoop in and drink their milkshake without so much as a by-your-leave. #regulatorydata.

Let’s not mistake regulatory data for the entire menagerie of mineral ownership. As the scope widens, so does the necessity for a more robust database. Cue Martin Brody, “You’re gonna need a bigger boat,” or rather, in this case, database. When it comes to the conveyance of mineral rights, deeds like warranty deeds, quitclaim deeds, or mineral deeds reign supreme. Yet, navigating this Formula One circuit isn’t without its hazards, especially when grappling with grantees whose names and addresses are formatted in ways that are completely jacked and nonsensical (the application of Levenshtein distance is only moderately helpful but that conversation is behind the paywall). The term “grantee” itself traces back to an old French word, “graunter,” denoting “to grant” or “to allow.” But the saga doesn’t end there. Royalty deeds, a separate entity altogether, specifically entitle their bearers to receive royalty payments from mineral production, sans actual ownership of the minerals. And let’s not forget about OG leases, which bestow upon lessees the right to extract minerals from the land under specified terms and conditions, complete with our favorite word, grantor in this case, making a cameo appearance.
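For the curious, here is why Levenshtein distance only gets you partway on grantee names. This is a minimal sketch, not the matching logic we actually run: the function names and the sample grantee strings are made up for illustration. Edit distance handles a stray period or a typo well, but a name recorded last-name-first looks like a different person to it.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum inserts, deletes, and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    # Hypothetical grantee strings, not taken from any real record.
    pairs = [
        ("JOHN A SMITH", "JOHN A. SMITH"),  # punctuation noise: scores high
        ("JOHN A SMITH", "SMITH, JOHN A"),  # reordered name: scores low
    ]
    for left, right in pairs:
        print(f"{left!r} vs {right!r}: {similarity(left, right):.2f}")
```

The second pair is the same person, yet the score drops sharply, which is why some kind of name parsing or token reordering usually has to happen before any string distance is useful.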


Now I’m sure you’ve found Mechanical Turk or got a crack team from the Philippines to Asheville ready to transform PDFs into pristine XLS spreadsheets, haven’t you? If not, it might be high time to broaden your horizons. The world is brimming with intellectual prowess, as evidenced in Friedman’s The World Is Flat.

Picture this: a hefty 7-page document of Egyptian cotton (one single unstructured document or image), and you want to capture 15 data elements from that document and put them in a spreadsheet containing 15 distinct columns. It sounds as straightforward as milking a bull or shoeing a rooster. You read the document, highlight the data elements you’re searching for (name, date, address, legal description, tracts, document type, legal jargon), and slot these elements under the correct column header on your spreadsheet (side note: please don’t pay $600/day for this). Then you check the spreadsheet to ensure the information is in its rightful place, hit save, and reach for the next document.

Now riddle me this: how do humans identify an entity (one person, LLC, C-Corp, trust, etc.), address, asset class (mineral interest, working interest, ORRI), contact info, and document type when there are over 20 million unstructured legal and regulatory documents? Within those documents are over 30 million names across 15 million unique tract descriptions, spread across just six different county data sources, 200 different county instrument types, six different regulatory filings, and five different data formats, all with inconsistent data quality: misspellings, missing data elements, etc.

Humans don’t (coming soon: a tee shirt with the slogan ‘humans don’t’). From the bowels of experience, if you endeavored to, one credible estimate is that it would cost you $0.40/name, or $9MM+, for manual entity extraction before any standardization, normalization, or cleaning of the data. But that’s not even the worst part. Not even close. One offshore “analyst” can only do ~200 standardizations per day (200 entries using home row keys). 24MM standardizations is roughly 121,000 man-days – just to extract the names, addresses, etc. That’s why this is not your Texas Ad Valorem tax dataset. This is the world of unstructured data. Not quite the quantum realm, but still not a party.

[Figure: Original Names & Addresses, Persons Identified, Organizations Identified]

From over 24 million names & addresses, we linked 1.2 million persons & 2.75 thousand organizations.

And we have a process to add data from any source, at any scale.

Economics of entity extraction and linking AFTER acquisition and aggregation of the source documents:

  • 24,265,130 names × $0.40/name (name, address, legal description, instrument type, dates) = $9,706,052 for manual entity extraction
  • If an analyst could do 200 standardizations/day:
    • 24,265,130 / 200 ≈ 121,325 days. It would take 400 analysts roughly 300 days to complete just 2 states, county and regulatory sources back to 2000. Not including title, just extraction.
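If you want to check my math or plug in your own rates, the arithmetic above can be sketched in a few lines of Python. The constant and function names here are mine, chosen for illustration; the inputs are just the figures quoted in the bullets.

```python
import math

# Figures quoted in the bullets above.
NAMES = 24_265_130           # names/addresses to extract
COST_PER_NAME = 0.40         # dollars per manually extracted name
PER_ANALYST_PER_DAY = 200    # standardizations one analyst completes per day
ANALYSTS = 400               # size of the hypothetical team


def extraction_cost(names: int, cost_per_name: float) -> int:
    """Total manual extraction cost, rounded to whole dollars."""
    return round(names * cost_per_name)


def analyst_days(names: int, per_day: int) -> float:
    """Working days needed if a single analyst did everything."""
    return names / per_day


def team_days(names: int, per_day: int, analysts: int) -> int:
    """Working days for the whole team, rounded up to a full day."""
    return math.ceil(analyst_days(names, per_day) / analysts)


if __name__ == "__main__":
    print(f"Cost: ${extraction_cost(NAMES, COST_PER_NAME):,}")
    print(f"Analyst-days: {analyst_days(NAMES, PER_ANALYST_PER_DAY):,.0f}")
    print(f"Days for {ANALYSTS} analysts: {team_days(NAMES, PER_ANALYST_PER_DAY, ANALYSTS)}")
```

The team figure comes out slightly over the round 300 days quoted above, since 121,325 analyst-days split 400 ways is a bit more than 303.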

So, what’s the game plan for handling 24+ million entities across courthouse records and regulatory data to generate substantial value for acquisition teams at scale? Let’s face it, the mineral and royalty interests market is tough. And here we stand, peering down the barrel of the largest transfer of wealth in history, courtesy of the aging baby boomer population. What used to be a single Net Mineral Acre is now being divvied up into minuscule fractions among multiple inheritors, yet transaction costs, title included, remain flat. How do we deploy at scale and not become reliant on marketed or networked deals? With mammoth consolidators, both publicly owned and backed by private equity, how on earth does one even begin to carve out a piece of this colossal $700 billion market at scale?

It’s a mindbend worth pondering. I believe this much: an asset management team must have a very large pool of prospect opportunities on its plate relative to its projected capital deployment. Put more bluntly, it’s been shown time and time again that when too much money chases too few good deals, the outcome isn’t favorable. You want to increase your funnel of quality deal flow and select the best candidates, not underwrite to conviction. I’m working my way to discussing just how to do that. But for now, I’m tapped out. We’ll reconvene on this topic in a future blog post.