Proper Case Format Provider for IdM

Today I am posting yet another custom format provider suitable for Identity Management projects. This provider comes with a little story; I hope you’ll enjoy it.

Cult of personality

The majority of identity management projects are centered on a person’s identity as opposed to on a computer, a printer or any other inanimate object. Therefore administrators are quite often presented with familiar sets of biographical data attributes such as first and last names.

By the nature of an identity management project, any change made to the synchronized data is replicated and propagated to many data sources; as a result, that data gains greater visibility. At the same time, biographical data is by its nature a matter of personal vanity. The combination of those factors can be very politically charged and can consequently become a serious hindrance to a seemingly purely technical project.

The code that I am about to present was conceived during one of my IdM engagements where I was facing a challenge of reading data from a legacy mainframe system and piping it out to dozens of directories around the enterprise.

"Legacy" is another way of saying old

Mainframes, with their heavy-lifting capabilities, have an important role in the modern enterprise. We call it "legacy". We have a whole industry supporting "legacy" and making it more open and available to current trendy technologies. "Legacy", almost by definition, is resistant to change, and consequently there is a whole other side of the industry that is working against the trend of opening it up; do you remember the Y2K projects? Good times! Good times! I think that was the time when the word "legacy" settled into the everyday IT vocabulary.

The problem that I personally faced while interacting with "legacy" was rather simple to describe. All of the users’ biographical information in the HR system was stored in ALL CAPS. (Visualize Homer Simpson’s – DOH!) There is nothing wrong with this if you want to represent SCREAMING in computer writing, but it was not quite acceptable for reuse throughout the whole enterprise. System admins of eDirectory and AD didn’t like the idea of overwriting their data, carefully formatted by hand. So Mr. Homer Simpson in AD was stored as SIMPSON, HOMER in the mainframe. Big deal! How hard is it to lowercase the string and capitalize the first letter of each word? Right? Wrong!


C’est l’histoire De’Monique

As I thought at the time, everything was going very well. My code was lean, easy to understand and performant. However, after closer examination I found a highly positioned person with the last name De’Monique (the name was slightly modified to reflect the essence of the problem). Why was De’Monique a problem? Let’s take a closer look at the algorithm I originally proposed. Take the string "DE’MONIQUE" and lowercase it, which results in "de’monique". Now take the first letter of the string, which is "d" in this case, capitalize it to make it "D", and append the rest of the string to it, which results in "De’monique". Doh!
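To make the failure concrete, here is a minimal sketch of that first algorithm. It is written in Python purely for illustration; it is not the provider's actual code, and the function name naive_proper_case is made up for this example.

```python
def naive_proper_case(value: str) -> str:
    """Lowercase the whole string, then capitalize only its first character."""
    lowered = value.lower()
    # Slicing keeps this safe for empty strings.
    return lowered[:1].upper() + lowered[1:]


print(naive_proper_case("DE'MONIQUE"))       # De'monique
print(naive_proper_case("VAN DER PROBLEM"))  # Van der problem
```

Everything after the first character stays lowercase, which is exactly what upset the owners of both names.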

Mrs. Big Director De’Monique was not very happy about having her name reduced to a mere mortal’s sloppy formatting by some piece of code chomping strings away somewhere in the guts of the IT machine. With a name like De’Monique she deserved special attention! I also received a protest from a Mr. Van Der Problem, Vice President, whose name became "Van der problem", which is inappropriate for a person of his caliber. As you can imagine, there are plenty of hyphenated names that also suffer from blunt lowercasing and capitalizing of the first character of the entire string. So the opportunity to enhance my code presented itself (once again). I had to write something more clever than a simple first-letter capitalizer.

So the method of splitting the string on spaces and processing each piece, as well as splitting it on apostrophes and hyphens with a recursive call to the capitalization routine, was born.
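The split-and-recurse idea can be sketched roughly as follows. This is a Python illustration under my own assumptions, not the provider's code: the name proper_case and the SEPARATORS tuple are hypothetical, and I treat both the straight and the curly apostrophe as separators.

```python
# Space, hyphen, straight apostrophe, curly apostrophe (assumed separator set).
SEPARATORS = (" ", "-", "'", "\u2019")


def proper_case(value: str, separators=SEPARATORS) -> str:
    """Split on each separator in turn and capitalize every resulting piece."""
    if not separators:
        # No separators left: lowercase the piece and capitalize its first letter.
        lowered = value.lower()
        return lowered[:1].upper() + lowered[1:]
    first, rest = separators[0], separators[1:]
    # Recurse into each piece with the remaining separators, then glue the
    # pieces back together with the separator they were split on.
    return first.join(proper_case(part, rest) for part in value.split(first))


print(proper_case("DE'MONIQUE"))       # De'Monique
print(proper_case("VAN DER PROBLEM"))  # Van Der Problem
print(proper_case("SMITH-JONES"))      # Smith-Jones
```

One side effect worth knowing about: any apostrophe is treated as a name boundary, so a value like "DON'T" would come out as "Don'T".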

Going "all-in"

Once I figured out the method to split the string into its components and analyze each component individually, I was in a better place. Mrs. De’Monique was happy; Van Der Problem wasn’t problematic any longer. My formatter was a lean, mean formatting machine ready for primetime. Life was good.

Have I mentioned that ALL VALUES IN THE HR MAINFRAME SYSTEM ARE STORED IN ALL CAPS? They were (and probably still are)! My inner geek was very angry at the person who decided that ALL CAPS was the way to save space on magnetic tape in 1963. So what did I do? In my infinite wisdom I unleashed my mean reformatting algorithm on all values of the HR data source. I bet you can already feel that there will be a problem with this decision?

Degree of Vanity

I don’t know about you, but personally I am not a big fan of traditional schooling; even thinking about years and years of going to school makes me a little anxious. I can appreciate people who did pay their dues and stayed in school for a long time. Perhaps they didn’t really know who they wanted to be when they grew up, or maybe they loved that one particular field of study so much they had to get their PhD. I can see that when you’ve paved your way to the title of "doctor" you want to display it proudly, and you will pick up that phone receiver and make that call to your local help desk to make absolutely sure that you are displayed in the global address list as Homer J. Simpson, PhD, and no other way.

So let’s take a look at this scenario through the prism of my lean and mean formatter. Remember, I attempted to parse ALL LEGACY CAPITALIZED VALUES with it. By now it could take the string apart into its components and properly capitalize all first characters where and when needed. So in the case of PhD, as expected, it will "flatten" it to "Phd"; Doh! …which will generate that dreaded help desk call… Doh!

So how can we fix this? The answer is the mysterious and all-powerful RegEx. I had to write a regular expression "formula" that looks for patterns within the string and determines whether the string might contain a predefined acronym, such as PhD, that deserves special attention. Besides PhD, which is a rather special capitalization case, there are plenty of other capitalization artifacts in the mighty English language. So say hello to Regular Expressions:


A regular expression string that matches degrees and professional designations and ensures that they are in all caps. This will match:


First and Last Caps

A regular expression string to match PhD or LegD and any variants with periods
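As an illustration only (these are not the provider's actual expressions), the two rules described above might look something like this in Python. The designation list MBA, MD and CPA is a made-up sample, and the names ALL_CAPS, FIRST_LAST_CAPS and fix_designations are hypothetical.

```python
import re

# Hypothetical sample of designations that should be written in all caps.
ALL_CAPS = re.compile(r"\b(?:mba|md|cpa)\b", re.IGNORECASE)

# "First and last caps": PhD or LegD, with an optional period before the D.
FIRST_LAST_CAPS = re.compile(r"\b(ph|leg)(\.?)d\b", re.IGNORECASE)


def fix_designations(value: str) -> str:
    """Repair degrees that plain proper-casing has flattened (Phd, Mba, ...)."""
    value = ALL_CAPS.sub(lambda m: m.group(0).upper(), value)
    # Capitalize the prefix, keep the optional period, force the final D.
    return FIRST_LAST_CAPS.sub(
        lambda m: m.group(1).capitalize() + m.group(2) + "D", value
    )


print(fix_designations("Homer J. Simpson, Phd"))   # Homer J. Simpson, PhD
print(fix_designations("Homer J. Simpson, Ph.d.")) # Homer J. Simpson, Ph.D.
print(fix_designations("Homer J. Simpson, Mba"))   # Homer J. Simpson, MBA
```

The sketch runs as a clean-up pass after the basic proper-casing, so it only has to recognize the flattened forms.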


Roman Soldiers count-off

Roman Soldiers – count off! [ai; ai-ai; ai-ai-ai; ai-vee; vee; vee-ai…] (The joke is dry and wry because it is supposed to be. It’s Monty Python, if anybody cares to know.)

As you might guess by now, Roman numerals in people’s names present another problem for us when formatting those names. We all appreciate tradition, so when your name is Homer J. Simpson IV it is pronounced "fourth", not "[ai – vee]"; hence you will probably want to see it displayed as "IV" and not "Iv".

So how hard is it to teach my lean and mean format provider to see Roman Numerals? Thankfully it is not too hard with some help from our already mentioned friend RegEx. Here is the RegEx formula that I’ve used to determine Roman Numerals in strings.

Roman Numerals:

A regular expression to match Roman numerals
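For readers who want something to experiment with, below is a sketch built on the commonly used Roman numeral pattern for values 1 to 3999. It is written in Python and is not necessarily the expression the provider uses; the names ROMAN and upper_roman_numerals are hypothetical.

```python
import re

# Thousands, hundreds, tens, ones. The leading lookahead rejects the empty
# string, which the rest of the pattern would otherwise accept.
ROMAN = re.compile(
    r"^(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$",
    re.IGNORECASE,
)


def upper_roman_numerals(value: str) -> str:
    """Upper-case every space-separated word that is a valid Roman numeral."""
    words = value.split(" ")
    return " ".join(w.upper() if ROMAN.match(w) else w for w in words)


print(upper_roman_numerals("Homer J. Simpson Iv"))  # Homer J. Simpson IV
print(upper_roman_numerals("Louis Xiv"))            # Louis XIV
```

Be aware that a rule like this cannot tell a numeral from a short word made of the same letters: "Vi" or "Mix" would be upper-cased too.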


Thankfully I was not asked to represent number zero in Roman Numerals.

The rise and fall of the ancient clan of MacHinist

In this article I have already mentioned the De’French, the Van Dutch with the Der Germans, and the Romans (more or less). Now let’s talk about two other great nations: the Irish and the Scottish. We are already OK with O’Lastname kinds of names; however, other Scottish and Irish patronymic surnames frequently have ‘Mc’ or ‘Mac’ prefixes. (Can you hear the drum-roll sound emerging?)

To illustrate, let’s take a look at "MacDonald", which is one of the best-known surnames of this type that comes to my mind. The problem with that name, as you might already see, comes in the form of double capitalization without any separation between the prefix and the root of the word. MacSimpson will not be very happy if you create the display name Homer J. Macsimpson. It looks wrong, it feels wrong, and therefore it’s just wrong! I would call the help desk right away! Doh!

I had to come back and seek help in the world of regular expressions once again. The syntax that I used this time was a little trickier.


A regular expression to match the Mc’s and Mac’s of the world, but NOT MCSE, MCSD or MCSA. This expression uses a negative lookahead to rule out those possibilities.
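Here is one way such an expression could be written, as a Python sketch rather than the provider's real pattern. The names MC_PREFIX and fix_mc_names are hypothetical; the negative lookahead is the `(?!mcs[eda]\b)` part.

```python
import re

# \b               start of a word
# (?!mcs[eda]\b)   negative lookahead: skip the words MCSE, MCSD and MCSA
# (ma?c)           the prefix "Mc" or "Mac"
# ([a-z])          the first letter of the root, which gets capitalized
# (?=[a-z])        at least one more letter must follow
MC_PREFIX = re.compile(r"\b(?!mcs[eda]\b)(ma?c)([a-z])(?=[a-z])", re.IGNORECASE)


def fix_mc_names(value: str) -> str:
    """Turn Macsimpson into MacSimpson and Mcdonald into McDonald."""
    return MC_PREFIX.sub(
        lambda m: m.group(1).capitalize() + m.group(2).upper(), value
    )


print(fix_mc_names("Homer J. Macsimpson"))  # Homer J. MacSimpson
print(fix_mc_names("Mcdonald"))             # McDonald
print(fix_mc_names("MCSE"))                 # MCSE
```

The pattern has no idea whether the word is a surname at all; it reacts to any word that starts with "Mc" or "Mac".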



The MacSimpson case was successfully resolved! However, do you remember that I was running this algorithm on ALL CAPITALIZED DATA FROM LEGACY HR? Well, it is about to bite me in the rear.


First Name: HOMER --> Homer
Last Name: SIMPSON --> Simpson
Title: MACHINIST --> MacHinist

I was looking at my data and wondering why I was seeing so many people with the same last name. All those MacHinists! It must be a whole clan that moved in to work here (I began to wonder how the tartan of the MacHinist clan looked); and why were all those last names listed in the "title" field? W-a-ait a minute… Doh!

This problem almost entirely killed the whole reformatting idea. When working with a pool of thousands and thousands of last names (or applying the same formatting rule to all strings) you will have last names from all corners of the planet. Spotting Irish and Scottish last names and distinguishing them from any other last names (or random words) that could legitimately start with Mac or Mc (like "machinist") is mission impossible.

There is no spoon

The solution for this problem is not very simple; frankly, I could not solve it with the IFormatProvider interface implementation alone. What I did was take a two-pronged approach.

Prong number one:

The proper case format provider was adjusted to have the "McOption", which could be turned on or off. That allowed me to choose how certain strings are capitalized: the "title" would not have the McOption turned on, and the "last name" would. Overall, the introduction of the McOption provided a greater degree of flexibility. With the McOption separated out, the "proper case" capitalization worked very well, with very few exceptions.
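The idea of a per-attribute switch can be sketched like this in Python. It is only an illustration: the mc_option parameter, the MC_FIELDS set and the sample record are my own stand-ins, and the real provider may expose the option quite differently.

```python
import re

MC_PREFIX = re.compile(r"\b(ma?c)([a-z])(?=[a-z])", re.IGNORECASE)


def proper_case(value: str, mc_option: bool = False) -> str:
    """Capitalize each word; apply the Mc/Mac rule only when asked to."""
    result = " ".join(word.capitalize() for word in value.split(" "))
    if mc_option:
        result = MC_PREFIX.sub(
            lambda m: m.group(1).capitalize() + m.group(2).upper(), result
        )
    return result


# Hypothetical record and field names; only last names get the Mc/Mac rule.
record = {"first_name": "HOMER", "last_name": "MACSIMPSON", "title": "MACHINIST"}
MC_FIELDS = {"last_name"}

formatted = {
    field: proper_case(value, mc_option=field in MC_FIELDS)
    for field, value in record.items()
}
print(formatted)
# {'first_name': 'Homer', 'last_name': 'MacSimpson', 'title': 'Machinist'}
```

With the switch off for titles, the MacHinist clan disappears while MacSimpson keeps his capital S.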

Prong number two:

I found a handful of surnames that had to be reverted to their manually formed capitalization. An example is the last name "Machado". With the McOption turned on, the last name Machado would be capitalized as MacHado, which is not a desirable result. My first attempt to resolve this conundrum was an expansion of the regex syntax, and then I looked into creating an exclusion list.

At this moment I remembered: "Do not try to bend the spoon. That’s impossible. Instead only try to realize the truth. There is no spoon". So once I realized that I really can’t predict all possible exceptions and formulate them in the regular expressions of an IFormatProvider, my identity management hat was on. I hate the answer "no". In fact my mantra is: the answer to all technical questions is "yes"; the real question is "how much time do you want to spend?"

By now, with the knowledge that my mean and lean ProperCase formatter is not "The One", I decided to use a pure IdM solution to solve this problem. I created a data source that contained a handful of people with exceptions in the spelling of their names. In fact, it overwrote the entire "display name" attribute. People like Machado, DeBeers, and others with irregular capitalization of their names would be manually added to that data source. The attribute flow priority for the name was set so that the "Capitalization Exceptions" management agent has the highest priority. Whenever a record is created in the "Capitalization Exceptions" MA, the user is joined on the user ID and the value of "display name" flows on top of the auto-formatted value, effectively overwriting it with its hand-crafted equivalent. That not only allowed me to counterbalance all the mis-formatting that I had spotted by then, but also addressed the issue of future unknowns and extensibility.
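Stripped of the IdM machinery, the precedence rule boils down to a lookup keyed on the user ID. The Python sketch below only illustrates that logic; it is not a management agent configuration, and the user IDs, first names and function name are invented for the example.

```python
# Hypothetical contents of the "Capitalization Exceptions" data source,
# keyed by user ID. These entries are maintained by hand.
CAPITALIZATION_EXCEPTIONS = {
    "u1001": "Maria Machado",
    "u1002": "Pieter DeBeers",
}


def resolve_display_name(user_id, auto_formatted, exceptions=CAPITALIZATION_EXCEPTIONS):
    """The hand-crafted value wins; otherwise keep the auto-formatted one."""
    return exceptions.get(user_id, auto_formatted)


print(resolve_display_name("u1001", "Maria MacHado"))  # Maria Machado
print(resolve_display_name("u2002", "Homer Simpson"))  # Homer Simpson
```

Users without an exception record are untouched, so the list only ever needs to hold the handful of names the formatter gets wrong.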

Ladies and gentlemen, if you are still reading this post, you more than deserve to download the LafiProperCaseFormatProvider and use it in any way you want. I do appreciate your attention.

CodeProject: Lost And Found Proper Case format Provider:

Happy coding!
