PDF vs. TIFF vs. JPEG vs. PNG vs. MS Office - Which File Format for My Business Application?
There are thousands of file formats available, which can make it confusing to select the best one for your business applications. Different file formats suit different business requirements, and selecting the wrong format can cause problems for an organization, its customers, and its legal team.
To help make this type of decision easier, we’ve outlined some very common file formats used in almost every organization. We’ll look at each of these in a bit more detail to help you compare them and ultimately choose the file format that will best fit your needs.
Adobe Portable Document Format (PDF)
What is PDF? Adobe introduced PDF in 1993 as a way to share rich documents, including formatting, links, and images. Over the years, the format has been updated a number of times; while it was still a proprietary format, these revisions resulted in significant issues with backwards compatibility.
In 2005, a subset of PDF called PDF/Archive (PDF/A) was standardized as ISO 19005. The purpose of PDF/A is to provide a stable format suitable for archiving, in part by prohibiting features that could change the look of the document over time, such as active code or fonts that are linked rather than embedded. The intent is to make PDF/A suitable for long-term preservation, where the goal is to reproduce a digital document faithfully in the future. The standard has been extended with additional parts since its first release.
In 2008, Adobe went further and turned the full PDF specification over to ISO, which published it as ISO 32000. Other specialized PDF standards exist as well, including PDF/E for engineering documents, PDF/X for prepress digital exchange, and PDF/UA for universal accessibility.
Considerations for Choosing PDF: PDF is widely used across a variety of applications. It supports multiple pages and a variety of content types within a single file. PDFs can also be viewed in almost all web browsers, whether through a built-in viewer or the ubiquitous PDF Reader. Many scanning and content creation applications can output directly to PDF.
Tagged Image File Format (TIFF)
What is TIFF? Tagged Image File Format, or TIFF, is a graphical format that presents the document as a digital copy of the original using raster images. The TIFF specification itself is published by Adobe rather than by ISO, although several ISO standards, such as TIFF/IT and TIFF/EP, build on it. TIFF supports many different compression methods, particularly lossless ones, meaning that all of the original data remains present in the file. This can result in significantly larger file sizes than lossy approaches produce.
Considerations for Choosing TIFF: TIFF was the most common format found in digital imaging applications because it was the first to be based on industry-wide standards. It was the default file format for many scanners and digital imaging applications for a number of years. It supports black and white (bitonal), grayscale, and color scanning and can create multi-page files as well.
TIFF is less popular today than PDF for scanning office documents, for two main reasons. First, TIFF files are not searchable without additional steps to perform character recognition on them, whereas character recognition is often built into the PDF capture process. Second, the ubiquity of the PDF Reader means that PDFs are viewable on almost any device, including mobile devices. TIFF readers and plugins are much less common.
Joint Photographic Experts Group (JPEG)
What is JPEG? JPEG, named for the Joint Photographic Experts Group that created it, is both a standard compression algorithm and the file format that uses that algorithm. JPEG works by discarding image information, particularly fine color detail, that the eye is unlikely to notice. This works best for continuous-tone images such as digital photographs; it does not work as well for black and white images, such as scanned business documents, where the sharp edges of text tend to show compression artifacts. JPEG is an ISO standard format.
JPEG is considered a “lossy” algorithm because data is actually discarded during the compression process. This is generally not an issue when a digital image is first created, but it can become a problem if the image is repeatedly edited, re-saved, or converted: each new round of lossy compression can discard more data, and the losses accumulate.
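To make the idea of lossy compression concrete, here is a toy sketch in Python. It is not the JPEG algorithm; it only imitates the step where values are rounded to a coarser scale. The sample values and step sizes are made up for illustration.

```python
def quantize(samples, step):
    """Snap each sample to the nearest multiple of `step`, discarding the remainder."""
    return [round(s / step) * step for s in samples]


def max_error(original, processed):
    """Largest absolute difference between matching samples."""
    return max(abs(a - b) for a, b in zip(original, processed))


# Made-up sample values standing in for image data (an assumption for this sketch).
original = list(range(0, 256, 3))

first_save = quantize(original, 16)        # first lossy save
same_setting = quantize(first_save, 16)    # re-save with the same step size
other_setting = quantize(first_save, 10)   # convert again with a different step size

print(first_save == original)        # False: the discarded detail cannot be recovered
print(same_setting == first_save)    # True: in this toy model, the same step changes nothing more
print(other_setting == first_save)   # False: a different step moves the values again
print(max_error(original, first_save), max_error(original, other_setting))
```

In this simplified model, the damage comes from the first save and from any later pass that uses different settings. Real JPEG encoders involve more steps, so treat this only as an intuition for why repeated conversions are best avoided.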
Considerations for Choosing JPEG: While JPEG can support some methods for displaying multiple pages in a single file, these are not very well supported in the marketplace.
JPEG has become much more common as mobile scanning and capture applications have matured – it is often the default file format for those devices and applications.
Portable Network Graphics (PNG)
What is PNG? Portable Network Graphics format, or PNG, is a more recent graphics format that supports very efficient, lossless compression.
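To see what “lossless” means in practice, the Python sketch below round-trips some bytes through DEFLATE, the compression method PNG uses, via the standard zlib module. It is not a PNG encoder, and the sample data is made up; it only shows that decompression returns exactly the bytes that went in.

```python
import zlib

# Made-up, highly repetitive data standing in for rows of a scanned page
# (mostly white with a short run of black); an assumption for this sketch.
pixels = bytes([255] * 900 + [0] * 100) * 50

compressed = zlib.compress(pixels, level=9)  # DEFLATE at maximum compression
restored = zlib.decompress(compressed)

print(len(pixels), len(compressed))  # original size vs. compressed size
print(restored == pixels)            # True: every original byte is recovered
```

How much the data shrinks depends heavily on the content: repetitive graphics compress far better than noisy photographs, which is one reason lossless files of photographs can be large.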
Considerations for Choosing PNG: PNG compresses color graphics losslessly, including 32-bit images with full transparency, and it can store deeper color when needed. This makes PNG very desirable for web graphics. Many graphics programs, including those used in digital photography, can create PNG files; as with JPEG, PNG is natively supported in almost all current web browsers. PNG is an ISO standard format.
Microsoft Office (Word, Excel, PowerPoint)
While there are other office productivity suites out there, including very good open-source offerings, none of them has achieved the market share that Microsoft Office has – in fact, in some ways, Office defines that market space.
Considerations for Choosing Microsoft Office: Office includes a number of tools; the composition of the suite changes over time, as do the capabilities of the individual tools. For this post, we will limit our review to the three most common tools and their formats: Word, Excel, and PowerPoint.
Word is commonly used for creating and collaborating on business documents such as reports and contracts. Excel is a spreadsheet that can be used for financial calculations as well as presenting information in tabular format. PowerPoint is used to summarize and present information succinctly. Each tool offers a broad spectrum of capabilities that can make office workers more productive – but these broad capabilities also result in significant complexity in terms of what can be included in a given file or document.
The legacy binary Office formats (.doc, .xls, .ppt) are proprietary. Since Office 2007, the default formats (.docx, .xlsx, .pptx) have been XML-based and standardized as Office Open XML (ISO/IEC 29500), though in practice they remain closely tied to Microsoft’s own implementation. Microsoft’s sheer domination of the market ensures some compatibility between versions and with other tools, but complex authoring can still lead to incompatibilities, both in the future and when opening significantly older files.
Selecting the Right File Format
Selecting the correct format is an important consideration for creating and capturing information – it’s the first step in the intelligent information management lifecycle. This step sets the stage for everything that follows: extracting intelligence from information, digitalizing information-intensive processes, and even automating governance and compliance.
So how do you know which file format to use? It really depends on the business needs of the department creating the information and, ultimately, the needs of the organization. If you want to make information available over the web, you should select a format that displays in a variety of browsers and isn’t too large or cumbersome. If you’re looking for an archival format, the right choice is likely a different one and will depend on the format of the original. If your customers use Office, it will be difficult to engage them using an incompatible office productivity suite or format.
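One way to apply this reasoning is to write down what each format offers and filter by what you need. The Python sketch below does that, using a deliberately simplified reading of the descriptions in this post. The trait names and the `candidate_formats` helper are hypothetical, and real selection will involve more factors than these.

```python
# Simplified traits per format, loosely based on the descriptions in this post.
# The trait labels are invented for this sketch; adjust them to your own needs.
FORMAT_TRAITS = {
    "PDF": {"multi_page", "browser_viewable", "searchable"},
    "TIFF": {"multi_page", "lossless"},
    "JPEG": {"browser_viewable", "photo_friendly"},
    "PNG": {"browser_viewable", "lossless"},
    "MS Office": {"multi_page", "editable"},
}


def candidate_formats(required):
    """Return the formats whose traits include every required trait."""
    required = set(required)
    return [name for name, traits in FORMAT_TRAITS.items() if required <= traits]


# Example: a multi-page document that customers should open in a browser.
print(candidate_formats({"multi_page", "browser_viewable"}))

# Example: a web graphic that must not lose any detail.
print(candidate_formats({"lossless", "browser_viewable"}))
```

A table like this is only a starting point for discussion; requirements such as retention periods or what your customers already use still need human judgment.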
Standard file formats are preferred where possible, especially where you need to exchange information with others or when you need to retain information for long periods of time. In general, the more complex a file format is, the harder it is to use over time; proprietary formats start out with more complexity.