Basics: raw vs. jpeg

In this post I want to explain why opting for raw data and processing it yourself is better than having your camera process that data for you, despite the convenience. I will not explain how to process that data, as that is up to you to do as you see fit, with the software of your choice. The same goes for cameras. What matters is that you get the results you want, not the brand or the company behind that brand.

Does raw data even exist?

There is a book edited by Lisa Gitelman titled ‘“Raw Data” Is an Oxymoron’. The title builds on what Geoffrey C. Bowker states in his book ‘Memory Practices in the Sciences’. He (184) argues that “[r]aw data is both an oxymoron and a bad idea” and that “data should be cooked with care.” We could, of course, reply by turning the same argument around: cooked data is an oxymoron and a bad idea, considering that to cook something requires something uncooked, i.e., raw. Otherwise, the idea of cooking makes little sense.

Gitelman, together with Virginia Jackson, expands on Bowker’s view in the introduction to that book. For them (2), all data is “always, already ‘cooked’ and never entirely ‘raw.’” They (2) go as far as to state that “[i]t is unlikely that anyone could disagree” with Bowker’s view, only to acknowledge that, oddly enough, data is more important than ever. I believe they are right, but I also think that they are wrong.

They (2) clarify the problem with raw data by stating that “starting with data often leads to an unnoticed assumption that data are transparent, that information is self-evident, the fundamental stuff of truth itself.” In other words, the problem is that raw data is deemed to be objective. They (3) list a number of verbs that we typically associate with data, such as collecting, entering, compiling, storing, processing, mining and interpreting, only to add that even when we think we are not yet interpreting the data, we already are. In short, as they (3) remind their readers, we are actually generating data when we examine something.

While I agree that, strictly speaking, everything is cooked to a certain degree, as they (3) emphasize, I think that they might be overlooking why people refer to one thing as raw and another as processed. To use the cooking example, while you can eat raw potatoes, and could thus argue that they are, in fact, cooked, having been cooked by the earth, most people would still refer to them as raw and opt to cook them.

Raw potential

To connect this discussion to photography or, to be more accurate, digital photography: it is common among photographers to differentiate between raw and processed file formats. Strictly speaking, raw images are not even images. They only become images once they are processed. They hold the potential to be images, but they are not images themselves.

To make more sense of what I just stated: if you set your camera to save files in raw format, you can still inspect them on your camera. That would seem to contradict my earlier statement, but it does not. It is simply a matter of convenience. The camera saves the file in a format that is technically not an image, but it also processes that file into an image, applying its default settings or whatever settings you have chosen or customized through the camera menus. This makes it easier for you to assess whether the file is worth keeping. The image you see on the back of your camera is not, however, the raw file itself. It is merely an image created from that data, as processed by the camera.

You can also set the camera to do the exact opposite, so that you get only an image, not the raw file. The file format is then typically set to jpeg (also known as jpg), short for Joint Photographic Experts Group. Some cameras do not even allow you to save the raw data. They operate the same way as cameras that do, but they simply discard the raw data once it has been processed into an image format, the kind of file one would typically refer to as a photo.

It is worth keeping in mind that digital cameras are computers, capable of saving files in formats that are immediately usable, e.g., jpeg, or potentially usable, e.g., raw formats. Many, but not all, photographers prefer the latter because it allows them to process the data the way they want to, on a computer rather than on a small camera, with a screen much larger and of higher resolution than the one on the back of the camera. They also prefer raw formats because these allow them to process the data in more than one way, even years or decades later.

To be clear, you can also process the immediately usable files, such as jpegs, but then you are processing something that has already been processed. In many cases that is not an issue, but you would generally rather process something that is yet to be processed. A major difference between jpegs and raw files is file size. Jpegs use lossy compression, discarding some of the data to save space, whereas raw files are typically uncompressed or only losslessly compressed, because they serve a different purpose. Think of jpegs as the final photos, processed in a certain way and saved in a format that does not take up a lot of disk space or bandwidth. If you care about image quality, you will want to use software to process the raw data for a specific purpose, instead of working from what has already been processed from that data.
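To make the point about processing something that has already been processed more concrete, here is a toy model in Python of the last step of in-camera processing. It assumes 14-bit linear raw values and uses a plain 2.2 power curve as a hypothetical stand-in for the camera's tone curve; real cameras also demosaic, white balance and compress the data, none of which is modeled here. Standard jpegs store 8 bits per color channel, so the sketch maps 16,384 possible raw values onto at most 256 jpeg levels.

```python
# A toy model of one step of in-camera processing: mapping linear
# 14-bit raw values to 8-bit, gamma-encoded jpeg values.
# Assumptions: a plain 2.2 power curve stands in for the camera's tone
# curve; demosaicing, white balance and compression are left out.

RAW_BITS = 14    # assumed raw bit depth; many cameras use 12 or 14
JPEG_BITS = 8    # standard jpegs store 8 bits per color channel
GAMMA = 2.2      # hypothetical stand-in for the camera's tone curve


def to_jpeg_level(raw_value: int) -> int:
    """Map one linear raw value to an 8-bit, gamma-encoded level."""
    max_raw = 2 ** RAW_BITS - 1
    max_jpeg = 2 ** JPEG_BITS - 1
    linear = raw_value / max_raw         # normalize to 0.0 .. 1.0
    encoded = linear ** (1 / GAMMA)      # lift shadows, compress highlights
    return round(encoded * max_jpeg)


raw_levels = 2 ** RAW_BITS               # 16384 possible raw values
jpeg_levels = {to_jpeg_level(v) for v in range(raw_levels)}

print(f"raw levels:  {raw_levels}")
print(f"jpeg levels: {len(jpeg_levels)}")  # at most 256
# Many raw values collapse into each jpeg level, so later edits to the
# jpeg cannot recover the differences between them.
```

On average, 64 raw values (16,384 ÷ 256) end up sharing a single jpeg level. Once the camera has made that choice, any later edit of the jpeg works with those few levels, not with the raw values behind them.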

If we want to retain the cooking analogy, raw imaging data, saved in raw file formats, is uncooked. In fact, it is not only raw, like a potato, but inedible, unlike a potato, which can be eaten raw, not that I would recommend that. It is nonetheless usable. It can be cooked, like a raw ingredient, but over and over again, an infinite number of times, unlike a raw ingredient, which can only be used once. This allows you to try different things with it, simply by changing the recipe.

The processed image file, i.e., the photo, is cooked. It is ready to be eaten. It can be cooked further, but that may leave it overcooked. If you want the food to be as tasty as possible, you start from the beginning, using fresh ingredients. This is not to say that using leftovers or simply reheating food does not have its merits. It might not be as tasty, but it gets the job done. It is still nutritious and takes much less time than starting from scratch.

What is raw anyway?

To be clear, this does not mean that Gitelman and Jackson are wrong, nor that Bowker is wrong, because, in a sense, they are right. I will not get stuck on the details, as it is enough to point out that digital photography involves an analog-to-digital conversion that is by no means a 1:1 conversion. We might say that raw data is virtually unprocessed, but it is not actually unprocessed. Something is lost in the process. Then again, we may counter that by acknowledging that the same is true of human vision.

A good example of the limitations shared by photography, be it analog or digital, and human vision is that dynamic range (DR) is always limited. Many digital cameras record 14 bits of data per pixel, which can encode at most 14 stops, as each stop is a doubling of light; in practice, sensor noise typically keeps the usable range below that. The human eye can easily beat that, operating at 20+ stops, let’s say 20 to 30 stops, but that is because it does not function the same way as a digital camera. While it is fair to say that human vision is much better than a digital camera, at least in this regard, it is equally unable to provide us with raw visual data. It, too, is cooked.
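The link between bit depth and stops can be sketched in a few lines of Python. A stop is a doubling of light, so the number of stops between two light levels is the base-2 logarithm of their ratio. This is only an illustration: it computes the ceiling that the bit depth alone puts on the range, from the brightest code value down to the smallest non-zero one, and it does not model sensor noise, which in practice keeps the usable dynamic range below that ceiling.

```python
import math


def stops_between(bright: float, dark: float) -> float:
    """Stops between two light levels; one stop is a doubling of light."""
    return math.log2(bright / dark)


def max_encodable_stops(bit_depth: int) -> float:
    """Ceiling set by the bit depth alone: the brightest code value
    versus the smallest non-zero one. Sensor noise is not modeled,
    so real cameras usually deliver less than this."""
    return stops_between(2 ** bit_depth - 1, 1)


for bits in (12, 14):
    print(f"{bits}-bit raw: at most {max_encodable_stops(bits):.2f} stops")
```

Each extra bit doubles the number of levels and thus adds at most one stop, which is why 14-bit raw files are often described as holding up to 14 stops.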

It is also worth noting that not all cameras and lenses are alike, as I have discussed in my previous posts. They are certainly very similar, but the differences do affect the data they are able to provide us with. This, in turn, affects what you can do with that data and what kind of images you can process from it.

To stay with the cooking analogy, no ingredient is, strictly speaking, raw. Ingredients do not simply exist in raw form, waiting for us to cook them. For example, even something as simple as a potato needs the right environment to grow, and that environment affects it accordingly. We need to consider temperature, moisture, soil quality, mineral and organic nutrients, etc. In addition, we need to consider farming practices. Once the potatoes are ripe, they become raw ingredients. It is, however, also worth noting that potatoes can be harvested sooner or later, opting either for the delicious but small new season potatoes, or letting them grow further, for the bigger but less tasty mid and late season potatoes.

The problem with raw is that it is often assumed to be the starting position, something that is not at all processed, even though there is no such thing. Even the raw needs to ripen before it is cooked. Even the unprocessed needs to be processed in order to be post-processed, which is the term typically used in photography for the processing of raw files on a computer other than the camera. It is an apt term because that data has already been processed, much like a potato is processed as it ripens underground.

Pragmatics vs. semantics

To summarize the problem identified by Gitelman and Jackson (2), the typical arrangement of raw vs. cooked assumes fixed states, where something is either raw or cooked. Raw is then heralded as objective, transparent and self-evident information, “the fundamental stuff of truth itself”, as the two (2–3) note. While I agree with them in that regard, I think that it is equally problematic to state, as they (2) do, that “data are always already ‘cooked’ and never entirely ‘raw.’”

The problem with stating that all data is cooked is that it holds on to that binary. As everything is cooked, everything is to be treated with suspicion. There is a sense of lament to it. Raw is retained, but as something unattainable.

If we consult a dictionary (OED, s.v. “raw”, adj. and n.), it will tell us that raw often describes something ‘un-something’: “unprocessed”, “unrefined” or “only partly refined”, “unbaked”, “untreated”, “untanned”, “undressed”, “unfinished”, “unfulled”, “untucked”, “undyed”, “uncut”, “unpolished”, “undistilled”, “undiluted”, “unmalted”, “undried”, “undeveloped”, “unmitigated”, “unused”, “unripe”, “unfamiliar”, “inexperienced”, “unskilled”, “untrained”, “crude”, “uncivilized”, “coarse”, “brutal”, “not yet (statistically) analysed or evaluated”, “unadjusted”, “uncorrected”, “unprotected” and “undisguised”, to list just about every sense that seems relevant here.

If we look at how the word is used, as this ‘un-something’, there’s no strict binary to be found. One might refer to sugar as raw, as noted in the dictionary (OED, s.v. “raw”, n.), but we do not need a dictionary to tell us that sugar does not exist in raw form. It needs to be extracted from something else, typically from sugar beet or sugar cane, which, in turn, need to be grown and ripened first, much like the potato.

To me, objecting to raw data on the grounds that it is all cooked is akin to stating that everything is therefore fake. It relies on a semantic distinction between what is raw and what is cooked. I believe it is much more productive to think of those words not in semantic terms, as having a fixed meaning, but in pragmatic terms, so that what is raw or cooked depends on the context.

Bottom line

Choosing between a processed and an unprocessed file format is up to you. Opting for the former, typically the jpeg format, is fine, as long as you know its limitations. Simply put, you are limiting yourself quite considerably, as you can only further process a photo that has already been processed. Opting for the latter, typically a raw format, gives you much more room to work with, simply because there is more data to work with.

Raw formats do, however, take up much more disk space than processed jpegs. This is not really an issue if you only need to store a handful of photos, but it becomes one if you plan to store more than that. You will want to invest in storage if you prefer raw formats.
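For a rough sense of how much storage that means, here is a back-of-the-envelope estimate in Python. It assumes, purely for illustration, a hypothetical 24-megapixel camera storing one 14-bit value per photosite with no compression, metadata or embedded preview. Many cameras compress raw files losslessly, so real files are often smaller, and jpeg sizes depend too much on the scene and the quality setting to be modeled this simply.

```python
# A rough storage estimate for uncompressed raw files.
# Assumptions (hypothetical example values): one 14-bit value per
# photosite, and no compression, metadata or embedded preview.


def raw_size_mb(megapixels: float, bit_depth: int = 14) -> float:
    """Approximate size of one uncompressed raw file in megabytes."""
    total_bits = megapixels * 1_000_000 * bit_depth
    return total_bits / 8 / 1_000_000


def files_per_drive(drive_gb: float, file_mb: float) -> int:
    """How many files of a given size fit on a drive (decimal units)."""
    return int(drive_gb * 1_000 // file_mb)


size = raw_size_mb(24)  # a hypothetical 24-megapixel camera
print(f"one raw file: about {size:.0f} MB")
print(f"1 TB drive:   about {files_per_drive(1_000, size)} files")
```

Under these assumptions each file comes to about 42 MB, and a 1 TB drive holds roughly 23,800 of them, which is plenty for a handful of photos but fills up quickly if you shoot a lot.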

Having the camera do the processing for you is much more convenient than doing it yourself. You can also adjust how the camera processes the data by changing its processing settings. This gives you some control over the processing. That control is, however, limited by the camera software, which typically offers only a handful of options. Moreover, you are working with a tiny, low-resolution screen, which makes adjusting the settings inconvenient.

I have my cameras set to raw by default because I want to do the processing on a desktop computer, in front of a large screen. Others are not as fussy and are happy with the processed photos, as most people would not even notice the difference. You are free to choose between raw and jpeg. You can also opt for both, choosing raw or jpeg selectively, depending on the situation.

It is worth noting that the combination of darkness and bright lights is very difficult for cameras to process. I therefore recommend opting for a raw format and processing the raw data yourself if you are interested in nightscapes. I have managed to pull off good jpegs of nightscapes after fiddling with the camera settings, especially the HDR modes, but, overall, I think the best results come from processing the data yourself.

References

Bowker, G. C. (2005). Memory Practices in the Sciences. Cambridge, MA: The MIT Press.

Gitelman, L. (Ed.) (2013). ‘Raw Data’ Is an Oxymoron. Cambridge, MA: The MIT Press.

Gitelman, L., and V. Jackson (2013). Introduction. In L. Gitelman (Ed.), ‘Raw Data’ Is an Oxymoron (pp. 1–14). Cambridge, MA: The MIT Press.

Oxford English Dictionary Online (n.d.). Oxford, United Kingdom: Oxford University Press.