Kicking the Dead Horse: Top 10 Mistakes of Sunrocket
I still dig top-10 lists, so here we go:
10. Failure Framework: Hiring telco execs into CXO positions right out of the gate. Who better to piss away money the fastest on "future proofing" while the sysarch has to explain why these "web servers" are necessary?
9. Gotcha: Claiming to be a "no-gotcha" company while maintaining strict radio silence whenever things went wrong.
8. From bad to worse: Hiring the same people responsible for a major loss in shareholder value at AOL to run Sunrocket. Of course, they covered their butts, so this is impossible to prove. I'll just mention this: nobody in their 40s "retires" from a VP position. That's jargon for not being able to find a new job before it becomes plain as day you suck at your current one.
7. We tanked like this last time: Telco execs hiring ex-telco NOC staff to run what's essentially a dot-com network hauling primarily UDP traffic. To their credit, most came around to running internet systems, but it was much harder than it needed to be. The "seasoned telecom execs" bitched about Linux not being UNIX, and then one of their stars tanked a UNIX machine (Solaris... that's UNIX, right?) by typing "hostname help" at the root prompt. The irony was lost on them.
6. Improvement aversion: Vertically scaling session border controllers originally purchased as a stop-gap measure, while politicking to take off the map a solution with 1/20th the per-subscriber cost in back-office equipment, one that scales horizontally. Not to mention it provided a gateway to VoIP over the Jabber protocol (can you say peering with Google Talk?). Can't do that - it could make us popular.
5. The Ostrich: One exec signs a deal for a SQL Server-based billing system (read: runs Windows, not UNIX) with no due diligence. Want to know why Sunrocket could not actually make money? It could not bill. The result: call detail records were present three times in the billing database, with the call merging done in-database. A perl script would have been 20 times faster. The funny part: the same exec who called a meeting to ask why the "web server line item" was needed, and what this thing called Linux was, then promptly complained that SQL Server cost money and nixed the enterprise edition - the one that would have made fail-over work.
4. The Ostrich 2.0: Deploying (I don't know if it ever went live) the next billing system on servers with no floating-point processors (Sun T1000 systems). Now fine, a billing system should count in tenths of a penny or so (i.e., not need floating point), but Oracle 10g sure does, and programmers sometimes forget that daisy-chained floating-point calculations can quickly leave you with only two digits of precision. Are the T1000s even Oracle certified?
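To make the precision point concrete, a minimal sketch (the rate and call count are invented): a binary float cannot represent $0.10 exactly, so chaining a million additions drifts, while decimal arithmetic stays exact.

```python
from decimal import Decimal

# A million calls billed at $0.10 each: 0.1 has no exact binary
# representation, so repeated float additions accumulate error.
float_total = sum(0.1 for _ in range(1_000_000))
exact_total = sum(Decimal("0.1") for _ in range(1_000_000))

print(float_total)              # close to, but not exactly, 100000.0
print(exact_total)              # exactly 100000.0
print(float_total == 100000.0)  # False
```

This is the application-level version of the problem rather than anything Oracle-internal, but the failure mode - silent drift from chained floating-point arithmetic - is the same one billing code has to guard against.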
3. The Sheep: Outsourcing development and the call center without actually agreeing on a development process in-house - and then wondering why things fail, take twice as long, or end up canceled. Repeat after me: The good thing about outsourcing development to India is you get back exactly what you specify. The bad thing about outsourcing development to India is you get back exactly what you specify. If you're a bank, fine - you've probably had 20 years to sort out your process and procedure. Doing it as a dot-com is an express ticket to dot-gone. Oh, and the project managers who were not allowed to manage projects... Wait! Innovation was outsourced to... AOL execs. Yaaay!
2. Security? That's the sysarch's problem: SR could have called itself the official VoIP provider of al-Qaeda for a while - the session border controllers were so bottlenecked that authentication was effectively off at times, and the ATA (the box that gives you dial tone) configs were unencrypted. It's little consolation that your account password was the same as your account number... hmmm. Too bad the sysarchs weren't usually allowed on the session border controllers.
1. Penny wise, pound foolish: The whole thing was rolled out seat-of-the-pants style. To the first seven-month team's credit, it all mostly worked (except for believing some vendors, like the SBC people, that their capacity assessments were actually accurate). Conveniently, Mr. "what's-this-webserver-thing" canceled the initial lab for diagnostics and load testing because it would be too expensive, and he did not want to run another 60A of power to the machine room. That was until the building engineer got all huffy showing the circuit distribution box hitting 166°F on his thermal cam, so at least the circuits came.
Now fine - I have no trouble admitting my part in this. I never produced a nuclear-sub-grade operations manual for the whole system, with step-by-step procedures for when some green light went red. But then, nuclear subs cost 60 billion each and take three years or more to build. Still, it was fun, during an outage (I wasn't there anymore, but I also like my phone working...), to point the network staff at a folder on their own network that outlined troubleshooting and recovery procedures and listed every process needed to support production on every production system - and to hear "I never even heard this existed..."
Oh, and I got fired for getting upset that someone with control over the access list at the data center decided to "omit me" on the day I was to work on-site with a vendor - and for pointing this out. Plausibly deniable: only other CXOs admitted to hearing the perp say this was not an accident. But perfect timing, since my mom had just died and I was somewhat vulnerable as a result. Who says telcos can't innovate?
Arguably, the number one folly at Sunrocket: for a communications company, we all should have been better at communicating. The irony.
I feel better now.