Oct 16 2013
Buckle up Ladies and Gents – this is going to be a long post. I have been scouring the internet for clues to the ObamaCare debacle to see if I can figure out how bad it is, and how long it will take to fix. This is basically what I do in my day job. I come into troubled, large federal programs that are heavy on distributed computers (most times some of the computers are distributed in space around our planet or in our solar system, but in the end that is just a Honking Large Wide Area Network (HL-WAN)), and I uncover where the programs are malfunctioning. This allows the programs to focus on how to get right, which I help them do as well.
In the case of ObamaCare’s Healthcare.gov and its failure to launch (or even get off the pad), I am handicapped by the fact I don’t have the technical spec, schedules, test cases/results and other information I use to do my analysis. And usually in a situation like this, there are many versions of these as the program realizes they are going to smash into the wall. First by dumping capabilities (like corporate health care plans), then by dumping tests, and then by just plain old cutting corners and pretending everything is all right! You can watch the entire thing spiral out of control through the updates made to these program artifacts. But sadly I don’t have access to all of them, just some snippets.
This tale spans many years, and inklings of what went wrong have been hinted at by the experts in recent reporting. One thing is for sure, if the price-shock cover-up was the catalyst to this meltdown, it opened a major wound on a system that was probably not very robust (i.e., stable) in the first place. Think of this panicked need to hide the premium and deductible costs from individuals as the last straw, not the sole or primary problem.
I understand the panic. If you do the math, you can figure out quickly that healthy, young people are going to be paying out the nose for those who are older and sicker – because that is the whole house of cards ObamaCare sits on. Income redistribution! Just wait until we have to cover all the illegal immigrants as well! Then ObamaCare’s roughly $10,000 annual deductibles for a family of four, along with nearly $12,000 in premiums a year, will look damn cheap!
Can anyone figure out which is less? $22,000 a year for health insurance and co-pays, or paying a fine and sticking the $22K in a CD? Even low interest is better than losing that much scratch!
So what happened?
Well, let’s step back to March 2013 and this interesting overview of the Federally Facilitated Exchanges (FFEs, which are a major element of HealthCare.gov for the 36 states that don’t have insurance exchanges), the State Based Exchanges (SBEs) and, apparently, the big bottleneck to the whole scheme: the Federal Data Services Hub (sometimes called the FDSH, or just “The Hub”).
Here is a concept cartoon for the FFE, including its relationship to the SBEs (referred to as “States”, lower right), The Hub (upper right) and the State Medicaid/CHIP agencies (lower left). Missing are the numerous Federal Agencies The Hub interfaces to – which would be shown beyond The Hub.
This is one of those deceptively simple diagrams for managers, which neglects to tell the whole story. And it is an example of why so many government programs are in trouble. You put this in front of technologically naive people and they completely underestimate what they are facing. The critical complexity is hidden from most, and only appreciated by a few. So let me attempt to expose it.
First off, the lower left “State Medicaid and CHIP” systems. There are probably 51 of these, one for each state (plus DC), each with a different DB schema and a different way to interface to the data. This is one of the checks done by Healthcare.gov when applicants sign up to see what they qualify for. Each applicant gets checked against these state databases based on where they say they live. So a minimum of 51 interfaces to these state systems.
The lower right is the SBEs, of which I can only guess there are 14 or 15 (since 36 states went with the HHS-provided exchange – DC is in one pile or the other). These have to use The Hub to access federal databases for other cross checks. So there are at least 14 more interfaces.
Also interfacing to this FFE are insurers, of which there must be tens if not hundreds for the 36 states being supported by HealthCare.gov. The SBEs already contain their local insurers, so they are hidden from this view.
Upper left are the consumers, something like 30 million uninsured individuals from 50 states, and I would wager a similar number of small and large companies. Note, back in March 2013, companies had not quite yet received their one year reprieve from ObamaCare. That reprieve showed up midsummer. So let’s just stick with the 15 million individuals everyone claims tried to sign on the first week. Remember, these people come from all 50 states and the District, so each applicant is now referred to a state SBE (which in many cases means they were able to sign up) or falls into the labyrinth that is the FFE and The Hub.
It is hard for people to grasp how much information has to flow through this ‘concept’ to make it work. Each individual applicant has to be tagged with a transaction ID (to keep track of and collate the information being pulled together by The Hub) as they fill out their identity details. This then triggers The Hub to do its thing, which is cross check your ID with numerous federal systems, and this is the crazy part:
Now each of these federal systems has multiple databases and database schemas. Worse yet, each uses a different way to identify Americans. And each probably has a different cadence in terms of requests and responses. So each person who tried to set up an account on Healthcare.gov was cross checked at the Social Security Administration (SSA), the IRS, the Department of Homeland Security (DHS), etc. What happens if you are not in one database but in another? Bet you Healthcare.gov throws its hands up and calls a human operator.
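To make the cross-check problem concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the three lookup tables, their key layouts and the `cross_check` function are my stand-ins, not anything from the real Hub. The only point is that when each source keys its records differently, a small mismatch in one of them kicks the applicant out of the automated path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    name: str
    ssn: str
    date_of_birth: str  # ISO format, e.g. "1980-01-31"


# Hypothetical stand-ins for agency databases. Each one keys its
# records differently, which is the point of the example.
SSA_RECORDS = {"123-45-6789": "Pat Doe"}              # keyed by SSN
IRS_RECORDS = {("DOE", "123-45-6789"): 52000}         # keyed by (surname, SSN)
DHS_RECORDS = {("Pat Doe", "1980-01-31"): "citizen"}  # keyed by (name, DOB)


def cross_check(applicant: Applicant) -> dict:
    """Query every (made-up) agency table and report which ones matched."""
    surname = applicant.name.split()[-1].upper()
    found = {
        "SSA": applicant.ssn in SSA_RECORDS,
        "IRS": (surname, applicant.ssn) in IRS_RECORDS,
        "DHS": (applicant.name, applicant.date_of_birth) in DHS_RECORDS,
    }
    missing = sorted(agency for agency, ok in found.items() if not ok)
    # Any mismatch is routed to a person rather than resolved automatically.
    status = "verified" if not missing else "manual_review"
    return {"status": status, "missing": missing}
```

In this toy setup, an applicant who signs up as "Patricia Doe" instead of "Pat Doe" still matches the SSN-keyed tables but misses the name-keyed one, and lands in manual review.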
Two big things to note about this chart from March 2013. These are the Hub capabilities as of that date and a lot of it was supposedly in place. Hard to tell if this was prior to or after the sticker-shock cover-up. But if you had to go through all these hoops before a price could even be computed, it is no wonder the system bogged down. My guess is this is prior, and when they had to turn this around and pretty much do all the cross checking before you could move on, the concept broke.
Moreover, if any of these agencies had delays, or if the process was sequential (which it had to be) then delays would domino. If SSA took too long, then IRS would be put on hold, etc. At the end another commenter concurs this probably happened.
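A few lines of Python show the domino effect. This is only a sketch under my own assumptions: the step names come from the agencies mentioned above, but the response times are invented, not measured.

```python
def finish_times(steps):
    """Return the cumulative completion time of each step in a strict sequence.

    `steps` is an ordered list of (name, seconds) pairs; each step can only
    start once the previous one has finished.
    """
    elapsed = 0.0
    result = {}
    for name, seconds in steps:
        elapsed += seconds
        result[name] = elapsed
    return result


# Hypothetical response times in seconds; not measured values.
normal = [("SSA", 0.5), ("IRS", 0.5), ("DHS", 0.5)]
slow_ssa = [("SSA", 5.0), ("IRS", 0.5), ("DHS", 0.5)]

print(finish_times(normal))    # {'SSA': 0.5, 'IRS': 1.0, 'DHS': 1.5}
print(finish_times(slow_ssa))  # {'SSA': 5.0, 'IRS': 5.5, 'DHS': 6.0}
```

One slow agency at the front of the chain pushes every later step back by the same amount, even though the later agencies are answering just as fast as before.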
I can see a cascade of queued up messages waiting for details from the previous step. It would work for a short time, but as more people logged in their transactions would queue up, consuming RAM and disk. Worse yet, if the networks running the messages/files started to clog, then that would add to the system backing up and freezing. The system would go belly up. And do so again and again as they restarted and tried to run again.
If these steps are mandatory and must occur in sequence, there is no amount of HW that will stop the buildup of transactions and the ultimate freezing. Core dump crashes would wipe out the progress in the queues, and you start over, adding to the backlog.
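Here is a back-of-the-envelope sketch of that buildup in Python. The arrival and service rates are numbers I picked for illustration; I have no real throughput figures for The Hub. The assumption baked in is that the service rate is capped by the slowest step in the chain, so extra hardware at the front end does not raise it.

```python
def backlog_over_time(arrivals_per_sec, served_per_sec, seconds):
    """Track queue depth second by second for a single processing stage."""
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += arrivals_per_sec                 # new transactions join the queue
        backlog -= min(backlog, served_per_sec)     # the stage drains what it can
        history.append(backlog)
    return history


# Hypothetical rates: 100 sign-ups arriving per second, 80 completed per second.
print(backlog_over_time(100, 80, 5))  # [20, 40, 60, 80, 100]
# With headroom the queue never forms.
print(backlog_over_time(80, 100, 5))  # [0, 0, 0, 0, 0]
```

Once arrivals outpace the slowest step, the queue grows without bound until memory or disk runs out. A restart resets the queue but not the arrival rate, so the same curve starts over.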
But let’s move on to The Hub. It is a bloody complicated function. It must collect a trail of documents per registrant, building up a background file deep enough to identify the types of subsidies available for the home state/region and then compute the price. One would never realize that from this concept diagram from March 2013:
“Overly simplistic” is an enormous understatement. When I see diagrams like this I see a technical project in serious denial and headed for a crack up. As I noted before, the right side is really something like 50 state Medicaid/CHIP systems and the left is the 14 or 15 State Based Exchanges. The right side also has all those Federal systems The Hub has to query for each individual. The Hub uses a mush of messages, emails and files to interact with these entities, adding to the impossible complexity. And it translates a lot of unstructured data between all these outside systems.
On the left side, a sane program would have limited the number of interface types to define, test and maintain. With SBEs being developed as part of ACA, you could afford to limit the interface complexity. And on the right side people should have begun to worry if in March, when coding supposedly STARTED, they had enough time to test, let alone develop. They show 6 entities, but the reality is roughly 15 SBEs on the left, 60+ state and federal systems on the right, and millions of transactions coming in to build a file for each of the 17 million applicants.
To see how bad The Hub’s data collection challenge is, just check out this ‘Guideline’ document on interfacing to the FFE and The Hub. Here is an example record structure – no data entered.
All this is collected from who knows how many sources. There is a lot more juicy technical information in this document I don’t have time to cover, but I will cover one more item. One of the main interfaces to The Hub is files that are ‘emailed’ between The Hub and the external insurers.
QHP [Qualified Health Plan] Issuers will connect to the Hub (for enrollment EDI transactions) via the CMS Enterprise File Transfer (EFT) system which is a batch system. Each QHP Issuer is assigned a Submitter Identifier in the EFT system which allows access to a mailbox. The QHP Issuer and the Hub use this mailbox to pick up and drop off data files.
This can cause huge problems since email can be slow and firewall rules (and other security features) can create difficulties in getting the mail delivered, with attached files intact. I have never seen such a thing work quickly. At least it appears to be only with insurers! But if you are waiting on emails to be exchanged before you can enroll in your plan, good luck! Some reporting hints that this interface to possibly hundreds of insurers is also not operating very well.
Anyway, so the general design for Healthcare.gov was to create a bottleneck called The Hub, which acted as a universal translator between scores of legacy and new systems. The universal widget and perpetual motion machine all in one. No wonder failure was the only option.
The data modeling alone must be a nightmare. Each interface deserved extensive testing – and I am talking a week of basic functional testing and then weeks of load testing. The reason you do weeks of load testing is that SW systems can often begin to consume themselves through memory leaks, expanding buffers, etc. Many of the problems on Go-Live day SHOULD have been caught if the load testing was done. You never want to discover the things we are seeing after deployment. Never. And there was no excuse for not doing the load testing.
I always recommend a day-in-the-life-test followed by a week-in-the-life-test at nominal peak loading to make sure the system is robust. Obviously this was not done or else Healthcare.gov would have made it at least a day before crashing.
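For readers who have never seen one, here is a toy version of the kind of check a long soak test automates, written in Python with the standard `tracemalloc` module. The two handlers are hypothetical, made up to stand in for request-processing code; a real test would drive the actual system at nominal peak load for hours or days rather than a couple thousand calls.

```python
import tracemalloc


def memory_growth(handler, requests, warmup=100):
    """Run `handler` once per request and return bytes still allocated afterwards.

    A warm-up pass runs first so one-time setup costs are not counted.
    """
    for request in requests[:warmup]:
        handler(request)
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for request in requests:
        handler(request)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


# Hypothetical handlers: one forgets to release per-request state.
_leaked = []


def leaky_handler(request):
    _leaked.append("x" * 1000 + str(request))  # kept forever, never freed


def clean_handler(request):
    return len("x" * 1000 + str(request))      # nothing retained after return
```

Calling `memory_growth(leaky_handler, list(range(2000)))` reports a couple of megabytes retained, while the clean handler stays near zero. A leak that small per request is invisible in a quick functional test and fatal over a week of real traffic, which is why the long-duration runs matter.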
Originally, the plan seemed somewhat rational. In this July 2011 procurement spec for The Hub, the initial schedule looks very different from the final dash:
The foregoing activities must be completed to ensure the DSH will be ready. The following reviews represent the key milestones (stage gate reviews in the ELC, dates represented as calendar year) for the DSH:
- Architecture Review: October 2011
- Project Startup Review: Q4 2011
- Project Baseline Review: Q4 2011
- Preliminary Design Review: Q1 2012
- Detailed Design Review: Q1 2012
- Final Detailed Design Review: Q2 2012
- Pre-Operational Readiness Review: Q2 2012
- Operational Readiness Review: Q3 2012
We know from the Hub To Date chart from March 2013, this schedule never happened.
Now look at this schedule from 9/3/13 I found on the Colorado SBE website:
Big difference, eh?
You will notice how little testing time there was at the end of this fiasco – just a couple of weeks. No time to fold in corrections.
The horizontal dashed blue line is the date of this report. The vertical red line is the Go-Live date of Oct 1, 2013. In the 4th row down we see the supposedly nearly complete Hub barely making it through testing before end-to-end (E2E) testing is halfway done (the yellow color indicates a risk). This is another sign of a program heading to disaster. They are running E2E and User Acceptance Testing (6th row) while the Hub is still being tested. Hard to do those other larger tests when the central bottleneck is still under test.
And why was The Hub scrambling? Some of it had to do with the fact it had not been demonstrated to be secure. A GAO report in late summer identified the fact that security requirements and implementations for The Hub were not complete, let alone tested. The core piece touching all those state and federal databases had not done the required security assessments or had the agreements in place to interface with those federal repositories of PII. So the security was slapped on at the last minute – another sign of a certain performance disaster. And I would wager Healthcare.gov is probably very open to IT security threats. When were the ~100 external interfaces to The Hub operationally tested? I doubt it was in those last few weeks of September. I am pretty confident the first real test was on October 1, and they are now discovering a sea of technical issues that will take weeks to work off.
From that same Colorado SBE report in mid September comes this telling chart:
Look at the issues they have. The 3rd one notes The Hub had just completed standalone testing! End-to-end was to begin – and this is Sept 13? They ran out of time, and that is poor technical management. And then they deployed anyway – and that is political stupidity.
“Our surveillance of the exchange landscape shows that while some states have completed basic testing with the hub, others are working through the final testing phases despite still being in the building stages of development,” Brett Graham, managing director of the Center for Exchange Excellence at Salt Lake City-based Leavitt Partners, which provides exchange expertise to a range of states, told members of the House Energy and Commerce Health subcommittee Tuesday. “Several states have expressed concern to us about using the federal data services hub and, where possible, are planning on using their own data sources for verification.”
He predicted that most exchanges “will experience a rocky enrollment period,” adding that “there will be technical issues that will impede a consumer’s ability to enroll in a seamless and timely manner.”
This person knew what was happening. Most states knew what was happening. In fact, what we see is a bad design working on an impossible schedule. In Congressional testimony on Sept 10th you can detect the caveats and CYA language coming from the Hub development team:
Our delivery milestones for Data Services Hub completion are being met on time. We expect CMS’ Data Services Hub will be ready as planned by October 1st.
At this point:
- We have completed software coding for the Data Services Hub for all its required October 1st functions.
- We are continuing performance and integration testing.
- We have connected the Data Services Hub to databases at the key federal agencies that will be used to verify information.
- We have connected the Data Services Hub to the system that will transfer data to and from the health plan issuers.
First line: note the caveat “required”. It is well known updates were planned for Dec and later, probably to fix problems discovered in September. Second line: The Hub is still being integrated and tested. It should have completed this by now and high priority ‘bugs’ should be coming in for final operational testing. Third line: The Hub has been ‘connected’ to federal agency databases. No real world testing has been done, just a connection test. Fourth line: same thing, “connected” is not “tested”.
Here are some more charts from the Washington State Exchange as far back as July 2013. This chart notes “DSHS Eligibility Services and the CMS FDSH are new systems being constructed by their respective organizations”. The SBE had deemed these to be critical schedule risks.
There are more risks listed in the presentation, but a pattern is quickly becoming clear. Not enough testing, probably due to insufficient design and planning.
In a nutshell, many agile processes — and especially extreme programming — reject the big design phase as part and parcel of rejecting the waterfall methodology. Agile processes follow more of an “organic” software development, where developers start coding the smallest increment possible and “grow” the working software up, little by little, with constant customer feedback. These agile methodologies call for “user stories” to design each small increment of the system being developed. To be fair, agile can work for some software projects, but I assert that it is the kiss of death for projects with many moving parts, multiple organizations and complex interactions.
Personally, I am a fan of design and system architecture, and I have witnessed many successful projects that resulted from good design. Furthermore, I flatly reject the notion that user stories can suffice for large IT projects like HealthCare.gov that require scalability, data integration, numerous system interfaces and other complexities.
Couldn’t agree more. The tunnel vision and blinders I see in the technical documentation are stunning. With this many complicated, unique external interfaces you can’t just wing it in scrums and sprints. With this many different forms of data you can’t skip data modeling and data architecture, so you know how to translate between the various sources, sinks and internal records. And you would not make the mistake of standalone tests over week-in-the-life tests.
The data hub would certainly be ground zero for such load issues, but not the only one. If any of the other databases it spoke to were overloaded, the sign-up process would break anyway. The conundrum may not even be in the data hub or in healthcare.gov, but in some pre-existing citizenship database that’s never had to cope with the massive crush of queries from the hub.
The fundamental fault is in the failure of coordination between the two systems, the failure to test from end to end, and—to use a term of art in software engineering—the failure to handle failure gracefully. To the extent that the data hub was the nexus for integrating many systems (including the healthcare.gov front end), “systems integrator” QSSI had the responsibility to work out graceful failure conditions with all of their partners and a comprehensible user experience in such cases—something that clearly wasn’t done. This points to bad management, lack of accountability, and a broken contractor procurement process.
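“Handling failure gracefully” is easy to say and worth spelling out. Here is a minimal Python sketch of the idea, with everything in it hypothetical: the `UpstreamUnavailable` exception, the verifier callables and the status strings are names I made up, not anything from the Hub or its contractors.

```python
class UpstreamUnavailable(Exception):
    """Raised by a (hypothetical) agency client when it times out or errors."""


def verify_with_fallback(applicant_id, verifiers):
    """Try each verifier; never let one failing source abort the whole sign-up.

    `verifiers` maps a source name to a callable taking the applicant ID.
    Sources that fail are recorded so they can be retried later.
    """
    deferred = []
    for source, verify in verifiers.items():
        try:
            verify(applicant_id)
        except UpstreamUnavailable:
            deferred.append(source)
    if deferred:
        return {
            "status": "saved_pending_verification",
            "retry": deferred,
            "message": "We saved your application and will finish checking it shortly.",
        }
    return {"status": "verified", "retry": [], "message": "Verification complete."}
```

The design choice is that an overloaded back-end database turns into a saved application and a plain-language message, not a frozen page or a lost session. Whether the real system could legally defer any of those checks is a policy question I can’t answer from the outside.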
And who are they kidding with “Agile” when there are 50 contractors coming and going over 3 years? See here for the list of contracts per company, per year, and a summary here. Apparently there was some enterprise architecture work done by Mitre, but I would bet the developers of the Hub just ignored it in their Agile process. This is not an Agile environment. Too many contractors doing small pieces and no Systems Engineer/Architect making sure it comes together.
There are probably scores of witnesses for Congress to call upon, and a paper trail a mile deep in HHS and the states. The proof is in the last 3 dismal weeks of ObamaCare as it ObamaCrashed over and over. If the WaPo is right, these terminal design and technical mismanagement issues are leaving potential customers cold. Traffic to Healthcare.gov is dropping like a rock. The ones soldiering through are those desperate for coverage – the costly ones. Those giving up on enrolling are the ones now wondering if paying the penalty and saving the money from premiums and co-pays may not be the less painful path in the end.
Either way, the issues with Obamacare are not going away anytime soon.
Update: Is the WH already working plans to delay the ObamaCare individual mandate? ROTFLMAO.
The federal agency in charge of the exchanges signed agreements this summer with several e-brokers to sell health plans in the 36 states where the feds are running the new individual marketplaces. But the online brokers, eager to tap a new market of people who’ll qualify for federal subsidies, learned shortly before the Oct. 1 launch that they wouldn’t be able to offer exchange plans right away.
The brokers say the Centers for Medicare & Medicaid Services didn’t act fast enough to let them integrate their websites with the IT systems supporting the federal insurance marketplaces. They hope to get everything linked up with the feds in the coming weeks.
Betchya those were the Dec updates that were planned! You know what people will think now about Shutdown Theater – huge waste? BTW, expect the links alone in this post to be fodder for the conservative PR machine. There’s a lot of ‘splaining to do.
Update: Ouch, this left a mark:
This isn’t some coding error, or even the Health and Human Service Department’s usual incompetence. The failures that have all but disabled ObamaCare are the result of deliberate political choices, which HHS and the White House are compounding with secrecy and stonewalling.
The health industry and low-level Administration officials warned that the exchanges were badly off schedule and not stress-tested despite three years to prepare and more than a half-billion dollars in funding. HHS Secretary Kathleen Sebelius and her planners swore they’d be ready while impugning critics and even withholding documents from the HHS inspector general for a routine performance audit this summer.
Yep, cut corners and pretend all is right in the world. Denial is deep at HHS. More at WSJ:
Our sources in the insurance industry explain that the 834s so far are often corrupted or in the wrong syntax, and therefore unusable unless processed by hand. In other cases the exchanges spit out multiple 834s enrolling and unenrolling the same user and don’t come with time stamps that would allow the insurer to identify the most recent version.
Yep, that confirms my suspicions above about the interface to the insurers. That will require a complete do-over at these interfaces, with weeks of proper testing.
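To see why the missing time stamps matter so much, here is a small Python sketch of what an insurer has to do with a pile of enroll/cancel records for the same person. The record layout (`member_id`, `action`, `sent_at`) is a simplification I invented for illustration; it is not the real 834 format.

```python
from datetime import datetime


def latest_action_per_member(transactions):
    """Return the most recent action for each member ID.

    Each transaction is a dict with hypothetical keys "member_id", "action"
    and "sent_at" (a datetime, or None when the sender left it out).
    Members with any untimestamped record are reported as ambiguous.
    """
    latest, ambiguous = {}, set()
    for tx in transactions:
        member = tx["member_id"]
        if tx["sent_at"] is None:
            ambiguous.add(member)
            continue
        if member not in latest or tx["sent_at"] > latest[member]["sent_at"]:
            latest[member] = tx
    resolved = {m: tx["action"] for m, tx in latest.items() if m not in ambiguous}
    return resolved, sorted(ambiguous)


batch = [
    {"member_id": "A", "action": "enroll", "sent_at": datetime(2013, 10, 1, 9, 0)},
    {"member_id": "A", "action": "cancel", "sent_at": datetime(2013, 10, 1, 9, 5)},
    {"member_id": "A", "action": "enroll", "sent_at": datetime(2013, 10, 1, 9, 10)},
    {"member_id": "B", "action": "enroll", "sent_at": None},
    {"member_id": "B", "action": "cancel", "sent_at": None},
]
print(latest_action_per_member(batch))  # ({'A': 'enroll'}, ['B'])
```

With time stamps, member A sorts itself out automatically. Without them, member B cannot be resolved by any program, and somebody has to pick up the phone.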
Final Update: A Reuters laugher for sure:
A Reuters review of government documents shows that the contract to build the federal Healthcare.gov online insurance website – key to President Barack Obama’s signature healthcare reform – tripled in potential total value to nearly $292 million as new money was assigned to the work beginning in April this year.
“Why this went from a ceiling of $93.7 million to $292 million is hard to fathom,” said Scott Amey, general counsel at the Project on Government Oversight, a Washington, D.C.-based watchdog group that analyzes government contracting.
“Something changed. It suggests they ran into problems and knew last spring that they couldn’t do it for $93.7 million. They just blew through the original ceiling. Where was the contract oversight?”
What happened is that CMS and HHS got beyond those simplistic PowerPoint slides and found tens of “engineering years” worth of work that had to be done in only a few months. They went for the “9 women being pregnant for a month” gambit to ‘bend the schedule curve’, which means they screwed up and threw bodies at a problem that really required a full stop and a new strategy.
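The reason throwing bodies at a late project backfires can be put in one line of arithmetic, the pairwise communication count usually attributed to Fred Brooks: n people or teams have n(n-1)/2 possible pairings that need coordinating. The Python below is just that formula. Treating every pairing as a live channel is a simplification, and the team sizes are illustrative.

```python
def communication_paths(people):
    """Number of distinct pairs among `people` participants: n * (n - 1) / 2."""
    return people * (people - 1) // 2


for size in (5, 50):
    print(size, communication_paths(size))
# 5 10
# 50 1225
```

Going from 5 parties to 50 multiplies the headcount by ten and the coordination burden by more than a hundred, and that is before anyone new has learned the design.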