7 Reasons to Break Your Extract, Transform, Load (ETL) Addiction

By: John Mancini on June 15th, 2012




Modern hardware can handle hundreds of millions of records with split-second response times, especially for aggregate queries. Yet a considerable number of organizations are still clinging to 1990s ETL technology.

ETL has become an addiction for IT organizations that still consider millions of records a lot of data. But with more memory than data, these organizations can have their cake and eat it too. And today, getting to that point is dirt cheap.

This may be a little controversial, but I believe too many companies have fallen for the myth that transactional databases with millions of records can't handle the load. As an IT community, let's retire the idea that users must put up with a batch process that leaves them waiting a day (or even a month) to learn the outcome of their interactions with a system.

Today’s technologists are fortunate to have access to powerful, yet surprisingly affordable, hardware that would have been considered “supercomputers” ten years ago. This is obviously a good thing! Here are seven reasons to break the ETL addiction.

1. ETL is Just Too Hard.

Existing ETL tools make easy things hard. People lucky enough never to have experienced an ETL project assume there is a magic "copy the database" button: we have four databases, so let's just "copy" them into one big one every night.

2. Database Growth Exceeds Storage Performance.

While databases are getting faster and SSDs give a great boost to random access, data growth is rapidly outpacing performance. The trend stems from basic storage economics: a company can now purchase several gigabytes of storage for a dollar, and even laptops ship with terabyte drives. There is no reason to assume this trend will stop anytime soon.

Real-world data transfer speeds, especially over a network, are growing at a much slower pace. This means that creating a backup of the database will take longer and longer until it borders on the ridiculous. Moving a gigabyte over 100BaseT takes about 80 seconds at the theoretical maximum of 12.5 MB/s; moving a terabyte takes about 80,000 seconds, or more than 22 hours. (And that assumes the network isn't doing anything else.)


In a real-world scenario on an active network with lots of users, this could take all day. Sure, you could get better networking, but that is a temporary Band-Aid. As data volumes continue to grow, ETL technologies will eventually need more than 24 hours to copy the data.
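The arithmetic above is easy to sketch. A minimal back-of-the-envelope calculator, assuming nominal link speeds and no protocol overhead or contention (real networks run slower):

```python
# Back-of-the-envelope transfer times; figures are theoretical maximums,
# ignoring protocol overhead and other traffic on the link.
def transfer_seconds(size_bytes: int, link_mbps: float) -> float:
    """Seconds to move size_bytes over a link of link_mbps (megabits/s)."""
    bytes_per_second = link_mbps * 1_000_000 / 8
    return size_bytes / bytes_per_second

GB, TB = 10**9, 10**12

print(transfer_seconds(GB, 100))    # 1 GB over 100BaseT: 80.0 seconds
print(transfer_seconds(TB, 100))    # 1 TB over 100BaseT: 80000.0 seconds (~22 hours)
print(transfer_seconds(TB, 1000))   # 1 TB over gigabit: 8000.0 seconds (~2.2 hours)
```

Even the gigabit number leaves little room in a nightly batch window once the database keeps growing.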

3. There is No Such Thing as “Down Time” Anymore.

ETL jobs are typically scheduled around midnight, during supposedly idle downtime. But what if people access the system from California after hours? What if some of them work in offices in different time zones?

Your midnight could well be the middle of the day in Asia, and you may have suppliers there that need access to your applications and data. Should they suffer bad performance so that the executives in New York can get their dashboard in the morning?

4. Scarcity of IT Resources Makes ETL Maintenance a Nightmare.

So you hired a DBA to set up the ETL, but DBAs have an average turnover rate of 75%. Will that person still be around when the sales department adds new fields to the database for marketing automation?

Since ETL scripts are often very difficult to decipher, are you really going to invest the few days it will take someone new to figure out how the old scripts work and update them? Isn't there some "automatically add new fields" feature? The answer is no. And this assumes you have the luxury of a spare DBA. If not, it could take months to hire one, especially for a short-term project, since DBAs prefer longer assignments and the demand for them is there.

5. Real-Time is a Requirement Now.

Social media applications have trained billions of people to expect to learn what's happening as it happens. Yesterday's data is like yesterday's news. Your customers expect to be able to change their data and see the results in their dashboard immediately. Sometimes this means fixing bad data, and sometimes it is a critical operational problem that needs immediate attention.

Would you want a 911 operator to put your information into a queue and send it to responders the next day? No. The same holds true for all critical data.

6. 64-Bit Quad-Core Servers with 16 Gigabytes of Memory Now Cost Under $2,000.

If your company is like most organizations, your database now fits in the system memory of a commodity server. In contrast, DBAs cost over $100 an hour and won't tell you that $100 worth of memory may be a better investment than their time. Upgrading to such a server, and using a platform that gives users access to that now lightning-fast data, could be just what you need to keep the company happy.
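To see what "fits in memory" buys you, here is a hedged sketch using an in-memory SQLite database. The schema, row count, and query are invented for illustration; the point is that an aggregate over a million rows in RAM returns interactively:

```python
# Illustrative only: load a synthetic table into RAM and run an aggregate.
# Table name, columns, and sizes are made up for the example.
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region INTEGER, amount REAL)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    ((i % 50, float(i % 1000)) for i in range(1_000_000)),
)

start = time.perf_counter()
rows = db.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
elapsed = time.perf_counter() - start

print(len(rows), f"{elapsed:.3f}s")  # 50 groups, typically a fraction of a second
```

No nightly copy, no separate reporting database: the query runs against the live, in-memory data.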

7. Data Virtualization Enables Joining and Merging Data without Moving It.

You no longer have to copy data in order to work with different sources at the same time. Data virtualization technologies available today let users enjoy interactive dashboards that consume data from various sources (relational databases, web services, CSV files) with one simple user experience that simulates having all the data in one place.
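The idea can be sketched in a few lines: query two sources where they live and join them at read time, with no copy step. This is a toy illustration, not a data virtualization product; the table, file, and column names are invented, and a string stands in for the CSV file:

```python
# Toy sketch of joining two live sources without an ETL copy.
# All names (customers, orders, etc.) are invented for the example.
import csv
import io
import sqlite3

# Source 1: a relational database (in-memory stand-in for a real server).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2: a CSV export (a string stands in for a file or web service).
orders_csv = io.StringIO("customer_id,total\n1,250\n2,90\n1,75\n")
orders = list(csv.DictReader(orders_csv))

# "Virtual" join: combine the sources at query time, no warehouse load.
names = {row[0]: row[1] for row in db.execute("SELECT id, name FROM customers")}
report: dict[str, int] = {}
for order in orders:
    name = names[int(order["customer_id"])]
    report[name] = report.get(name, 0) + int(order["total"])

print(report)  # {'Acme': 325, 'Globex': 90}
```

A real data virtualization layer does this federation for you, with query pushdown and caching, but the principle is the same: the data stays where it is.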

 


About John Mancini

John Mancini is the President of Content Results, LLC and the Past President of AIIM. He is a well-known author, speaker, and advisor on information management, digital transformation, and intelligent automation. John is a frequent keynote speaker and the author of more than 30 eBooks on a variety of topics. He can be found on Twitter, LinkedIn, and Facebook as jmancini77.

Recent keynote topics include:

- The Stairway to Digital Transformation
- Navigating Disruptive Waters — 4 Things You Need to Know to Build Your Digital Transformation Strategy
- Getting Ahead of the Digital Transformation Curve
- Viewing Information Management Through a New Lens
- Digital Disruption: 6 Strategies to Avoid Being "Blockbustered"

Specialties: keynote speaking and writing on AI, RPA, intelligent information management, intelligent automation, and digital transformation; consensus-building with boards to create strategic focus, action, and accountability; extensive public speaking and public relations work; conversant and experienced in major technology issues and trends; expert on inbound and content marketing, particularly in an association environment and on the HubSpot platform.

John is a Phi Beta Kappa graduate of the College of William and Mary and holds an M.A. in Public Policy from the Woodrow Wilson School at Princeton University.