In my first article, I explored in broad strokes the process of science. In this article, I’ll narrow the focus somewhat to one of the more vital aspects of the scientific method: that of experimentation, or testing the hypothesis.
An experiment is meant to test a hypothesis. To that end, we must make sure that:
1) the experiment actually answers the question;
2) random changes in the experimental environment don’t affect the outcome;
3) our own human biases don’t creep in to foul things up; and
4) we know how much “error” there is in any given test.
In more scientific terms, we’re talking about setting up controls, controlling variables, eliminating bias and determining statistical significance. All of these factors MUST be known and must be recorded when you present your findings.
Now here’s the thing: due to the popular depiction of science in the media, many people assume that science cannot be done without experimentation. This isn’t true. To test a hypothesis, you can do one of two things: experiment or observe and predict. Remember, your hypothesis can be formulated as either a question or a statement. Both require testing. A question requires an answer. A statement requires verification. It’s a subtle but important difference, and that difference will inform your method of testing.
There are some things that we cannot perform experiments on for one reason or another. Most often, it’ll be because of the time frames involved. Geological change happens over a time scale of millions of years. We can’t directly observe the earth as it was millions of years ago, and we can’t recreate plate tectonics in the lab. We therefore don’t do an experiment to answer a question. Instead, we make a statement, and then observe to see whether it holds true or not. We can do this through examination of past events to predict future ones.
Here’s a simple statement related to plate tectonics: “The continental plates are moving on a bed of magma.” We can test this hypothesis through observation. We can go to the rift valley in Africa and measure the rate at which the continent is being torn apart. We can measure the rate at which the Himalayas and the Alps are growing. We can measure the rate at which one continental plate is moving under another by going there and looking, and based on this data, we can predict what is going to happen next. We can also deduce what must have happened in the past. The test for this will be whether we find physical evidence for it in the rocks. If we do find it, the hypothesis holds up without us having done a single experiment.
Let’s take evolution as another example. We can’t reproduce evolution as a whole in the lab. We can test aspects of it, in that we can induce heritable mutations, but the time scales involved are so great that we can’t observe large-scale change directly. However, the theory of evolution through natural selection states that such changes will occur, and we should be able to see them over time. We do this, of course, through the fossil record and through genetics. We can measure genetic change over time and match it with the fossil record. Even though we don’t have all the fossils, we have enough to show that evolution does happen.
That brings me back to the four criteria for successful tests. Using my hot water example again, let’s examine each criterion.
1) Setting up controls.
You want to determine what heats the water, and to do that, you decide to collect water samples to see if you can detect traces of the heat source in the water itself. To do that, you must first know a little something about water. Reading up on it, you’ll learn that pure water is composed of hydrogen and oxygen. You’ll learn that water is a good solvent, so you have a fairly good chance of detecting traces of anything the water passed through or came into contact with. You’ll learn how much energy it takes to heat water to a specific temperature, and under which conditions that energy is transferred. You’ll also learn that the hotter water gets, the more readily most solids dissolve in it. Knowing all this, you’re almost certain to be able to determine what the water was in contact with. So you take your sample and test it for pH, temperature, dissolved oxygen, mineral content and so on. You also test a sample of purified water to determine its composition and properties. Then you test water that’s been in contact with burning coal, with burning oil and with radiation, and record your results. Finally, you compare your sample to all your controls. You may find that your sample’s mineral content doesn’t look at all like you’d expect if it had been in contact with burning coal. In fact, it looks nothing like any of the controls. The closest is the control that’s been exposed to radiation, but even that isn’t a good match.
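If you record your readings as numbers, the comparison against controls could be sketched in Python roughly like this. It is only a minimal illustration under invented assumptions: the three mineral readings, the control values, the use of plain Euclidean distance and the match threshold are all hypothetical, and a real water analysis would involve many more parameters and proper calibration.

```python
import math

# Hypothetical readings (arbitrary units) of three dissolved minerals.
# Every number here is invented for illustration, not measured data.
controls = {
    "purified water": [0.0, 0.0, 0.0],
    "coal contact":   [2.0, 8.0, 1.5],
    "oil contact":    [1.0, 3.0, 0.5],
    "radiation":      [4.0, 1.0, 3.0],
}
sample = [6.0, 1.5, 5.0]

# Hypothetical cut-off: a control closer than this counts as a "match".
MATCH_THRESHOLD = 1.0

def rank_controls(sample, controls):
    """Return (control name, Euclidean distance) pairs, closest control first."""
    scored = [(name, math.dist(sample, profile)) for name, profile in controls.items()]
    return sorted(scored, key=lambda pair: pair[1])

ranking = rank_controls(sample, controls)
for name, d in ranking:
    print(f"{name:<15} {d:.2f}")

closest_name, closest_d = ranking[0]
if closest_d > MATCH_THRESHOLD:
    # Even the best control is far off: a hint that some heat source was not considered.
    print(f"No close match; nearest is {closest_name!r}. Consider another heat source.")
```

With these made-up numbers the radiation control comes out closest, yet still well outside the threshold, which mirrors the situation described above.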
So what have you done here? By setting up comparisons, you’ve determined that there’s a gap in your hypothesis. There’s a heat source you haven’t considered. By knowing as much as you can beforehand, you’ve also ensured that you can design an experiment that will give you a proper answer. Just because you knew what a possible answer might be beforehand doesn’t mean you set up your experiment to ensure you get that specific answer. It merely means that you tested it against known outcomes to determine whether the hypothesis is true or not. In this case, you got a completely unexpected answer. In the world of a scientist, there’s nothing as exciting or as nerve-racking as an unexpected result, because it usually means one of two things: you just discovered something entirely new, or you made a big mistake somewhere.
More often than not, it turns out to be the latter, and most often the mistake was something you didn’t take into account, which brings us to the second criterion of proper scientific testing: controlling variables.
In order for your controls to be comparable to your sample, you have to prepare them under the same conditions. If you took your sample at an air temperature of 25 degrees Celsius and an air pressure of 101.3 kPa (roughly standard atmospheric pressure), then all your controls must be prepared and kept under those exact same conditions. You should use the same type of containers to prepare your controls that you used to collect your samples, use the same heating elements to prepare them, follow the same instrument cleaning regime... in short, do everything you possibly can to eliminate any potential differences between your controls and your sample, so that you test ONLY the natural difference in the mineral content of the water and nothing else. Because water dissolves things more efficiently at higher temperatures, if you don’t prepare all your controls at the same temperature, you can’t be sure that their mineral content will be comparable to your sample’s. That can be a BIG flaw in your experiment, and it can lead you to draw a completely wrong conclusion. You must account for every variable, and this can be a very time-consuming and painstaking process. This is why the scientific method is so precisely and rigorously enforced: human error can all too easily creep in. However, once you have accounted for every variable and you still get the same result, you know you’re on to something.
Bias is another factor that can mess up results, although not always. Some measurements don’t depend on human judgement at all. A spectrophotometer, for example, simply gives you a reading, which you record. So does a pH meter, or any other scientific instrument. Bias comes in when a judgement call must be made. Is the sample coloured green or yellow? Green or blue? Turquoise or green? Red or orange? Every time you have to make a judgement call, you allow bias to enter the results. You can reduce bias by increasing the number of people whose judgement you rely on. The people you call on to help you determine the colour of a sample should know as little about it as possible, so that their judgement isn’t swayed by any notion of what the colour should or shouldn’t be. This is especially relevant in the soft sciences, where researchers rely on questionnaires and interviews to gather their data. In those fields, bias is an ever-present danger.
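Hiding the samples’ identities and pooling several people’s colour calls can be sketched in a few lines of Python. This is a minimal illustration rather than a standard protocol: the function names, the anonymous codes and the colour calls below are all hypothetical, and a simple majority vote is just one way of combining judgements.

```python
import random
from collections import Counter

def blind_labels(sample_ids, seed=None):
    """Map each real sample ID to an anonymous code (S1, S2, ...) in random order,
    so the people judging the colour can't tell which sample is which."""
    rng = random.Random(seed)
    codes = [f"S{i + 1}" for i in range(len(sample_ids))]
    rng.shuffle(codes)
    return dict(zip(sample_ids, codes))

def pooled_colour(judgements):
    """Return the most common colour call and the fraction of judges who made it."""
    counts = Counter(judgements)
    colour, votes = counts.most_common(1)[0]
    return colour, votes / len(judgements)

# Hypothetical example: the judges only ever see the coded labels.
key = blind_labels(["hot spring", "tap water", "purified water"], seed=42)
print(key)  # keep this key away from the judges until all calls are in

# Hypothetical colour calls from five judges for one coded sample.
calls = ["green", "turquoise", "green", "green", "blue"]
colour, agreement = pooled_colour(calls)
print(colour, agreement)  # green 0.6
```

Low agreement is itself useful information: if only two judges out of five agree, the colour is genuinely ambiguous, and an instrument such as a spectrophotometer may be the better choice.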
That brings us to the final criterion: statistical significance. When processing data, scientists often present it in the form of graphs, charts and tables of numbers. There are always variations in results. These differences are due to small changes in conditions from one moment to the next, and those small changes can add up after a while, which is why controlled conditions are so important. But even under controlled conditions, small variations can occur. Depending on the sensitivity of your instruments and your methods, your results can vary a little or a lot, and some experiments require more sensitivity than others. This is where averages come in. Usually, you repeat every test three or more times and then take the average. If the variation between readings of the same sample is small, all well and good: that’s as it should be. Differences in readings between different samples can be larger, because conditions at the source of the samples may differ slightly. We use statistics to determine whether those differences are significant, and the key to good statistics is sample size. The more samples you test, and the more consistently the readings fall within a given margin of variation, the more reliable your data will be and the smaller the statistical error.
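Averaging replicate readings and asking whether two samples really differ can be sketched with Python’s standard-library `statistics` module. This is a minimal illustration with invented triplicate readings; the function names are hypothetical, and Welch’s t statistic is just one common way of checking whether two sets of readings differ by more than their replicate scatter would explain.

```python
import math
import statistics

def summarise(readings):
    """Return the mean, sample standard deviation and standard error of replicate readings."""
    mean = statistics.mean(readings)
    sd = statistics.stdev(readings)        # sample SD; needs at least two readings
    se = sd / math.sqrt(len(readings))     # uncertainty of the mean shrinks as 1/sqrt(n)
    return mean, sd, se

def welch_t(a, b):
    """Welch's t statistic for the difference between the means of two groups of readings."""
    mean_a, sd_a = statistics.mean(a), statistics.stdev(a)
    mean_b, sd_b = statistics.mean(b), statistics.stdev(b)
    return (mean_a - mean_b) / math.sqrt(sd_a**2 / len(a) + sd_b**2 / len(b))

# Hypothetical triplicate readings (e.g. mineral content in mg/L); not real data.
sample_a = [12.1, 12.4, 11.9]
sample_b = [13.0, 13.3, 12.8]

mean_a, sd_a, se_a = summarise(sample_a)
print(f"sample A: mean={mean_a:.2f}, sd={sd_a:.2f}, se={se_a:.2f}")

t = welch_t(sample_a, sample_b)
print(f"Welch t = {t:.2f}")
```

Here the two hypothetical samples differ by roughly four times their combined standard error (t of about -4.4), which suggests a real difference rather than replicate noise. Turning t into a p-value requires the t distribution; a library routine such as SciPy’s `scipy.stats.ttest_ind(a, b, equal_var=False)` does that for you.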
When it comes to observational testing of hypotheses, you can use different methods to back up your observations, such as using both the fossil record and genetic data to support the theory of evolution.
By carefully documenting and accounting for the various sources of error, we can make sure that the basis for the conclusions we draw is solid.