I’m starting my second season of research and my data is a mess. I’ve got spreadsheets on my laptop, notes in my field book, datasheets in my folders, specimens on my desk, and tally marks on the back of my hand. I don’t think I’ve lost anything (yet) but I’m one stiff breeze away from a total catastrophe. How do I manage all this data!
Befuddled in Baie-Comeau.
Hopefully this response finds you still in possession of all your data, and your sanity as well.
Before we begin, let me assure you that you’re not alone. In the rush to get to the field or the bench and accomplish the ‘fun’ part of research, we often neglect to stop and consider how we are going to handle all the information we plan to generate. Often the most appropriate method suggests itself, but like all aspects of your research it’s worth taking the time to think about how to process, manage and store your data.
First off, it sounds like you actually have two problems: how do you store the data you already have, and how will you manage the data you are going to produce? Let’s handle these two issues separately.
Choose a medium of data storage that you like and are willing to use consistently. Some people will prefer a field or lab notebook, others will prefer spreadsheets or databases. Regardless of the method(s) you choose, you have to commit to using it as the main repository for your data. So, Befuddled, it sounds like your first task is to assemble your multitude of data and consolidate them into one common location. Which you choose is up to you, but here are some questions to consider as you decide.
Is it appropriate for all my data? Spreadsheets are great for tabular data, but are poor repositories for field notes or digital media files. Have you explored databases?
Is it secure? Your data has value and it should be kept in more than one place. Digital data is relatively easy to back up and store, but do you have a plan? What about large data files? Consider also how you back up and store written material like protocols, field observations and data sheets. Plan for the worst: what happens if your lab burns down or your laptop is stolen?
Is it compatible? Your data ‘belongs’ to you (at least for now) but it’s likely your supervisor has a vested interest in it as well. For digital data, consider the programs you use: can your files be opened on another computer or another operating system? If you’re part of a large project, does your method of data storage mesh with that of your collaborators? For analog data, do you follow your lab’s protocols for recording measurements? Does your supervisor have a preferred method of data storage?
Is it archivable, searchable and understandable to others? You’re probably collecting data with your own project in mind, but you should consider the possibility that you may want to use it later, far down the road from now. Could you pick up your data and notes from last summer and understand them 2 years from now? 10 years from now? In a more basic sense, where will you store your data, in what form, and who will be responsible? More importantly, could someone else access your data and, without your input, discern what you did and what your measurements mean? Remember, data without context is noise. If you go to the trouble of measuring something, make sure you can maximize its value now and in the future.
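To make the "Is it secure?" question concrete, here is a minimal sketch of a backup step in Python that copies a data folder to a second location and then confirms each copy matches its original. The function names (`sha256_of`, `back_up`) and any folder paths you pass in are my own illustrative assumptions, not part of any particular lab's system, and a copy on the same disk is no substitute for one kept somewhere else.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir, backup_dir):
    """Copy every file under source_dir into backup_dir.

    Returns the relative paths of any files whose copy does not match
    the original. Assumes backup_dir is NOT inside source_dir.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    mismatches = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        target = backup_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target)  # copies contents and timestamps
        if sha256_of(source_file) != sha256_of(target):
            mismatches.append(str(relative))
    return mismatches


# Hypothetical usage: back_up("field_data", "/path/to/external_drive/field_data")
```

An empty list back from `back_up` means every file was copied and verified; anything else is a file worth copying again by hand.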
There are any number of digital tools and computer packages to help you manage data, probably too many to list here, and there is no clear-cut solution that suits every researcher or field. In short, it’s up to you to find the tools that you like and that work for you. In my opinion everyone’s ‘research toolkit’ should at the very least include a notebook and a good pen. Nothing digital has yet replaced the portability and fail-safe attributes of paper and pen for recording observations. In my own research I find it helpful to write down the details of my analyses, the location of my data files, and even paste in the outputs of statistical analyses, and copies of my figures and graphs as I produce them. When used consistently a notebook can serve as a valuable reference for your ideas and thought processes as you’re working. If you’re stuck for a way to organize your data, I’d suggest starting with pen and paper. This method may not be optimal for all your data, but at least you’ll start to consolidate it in one place.
OK Befuddled, now that we’ve got the problem of your existing data well in hand, how are we going to handle all those fresh new data points? Well, we’ve already covered storage, which will be a big component of your strategy for managing your data. What we want to do now is consider ways to prevent you from getting disorganized in the first place.
Presumably you’ve already thought up and designed your new experiments or at the very least have an idea of what data you want to capture. Perhaps you’ve even talked to a statistician about your experimental design. Your first goal, then, should be to design a way of effectively capturing all the data you want to collect. Typically this will take the form of a data sheet. Designing a data sheet feels like a trivial exercise, but consider this: your data sheet is the ultimate record of the data that will form your research. You may have a good idea of what you want to collect, but how about your field assistants? Could they take your data sheet and fill it in without you being there? More importantly, if you have many different measurements to take, a well-designed data sheet can serve as a reminder to make sure all the data you want are collected. Taking some time to figure out how to capture data goes a long way to making sure you collect everything you need.
Once your data are collected your first goal should be to transfer them as soon as possible to a safe and secure place. Now here I’m not talking about locking them in a truck, or storing them in a binder. An important part of managing your data is getting it into your data storage system as soon as possible. This practice serves three important functions: 1) it secures your data against accidental loss; 2) you can immediately review your data and check it for errors (imagine discovering in September that your pH meter went south 2 days into a 20-day experiment back in July, or that your assistant is misidentifying your specimens); and most importantly 3) you can start working with your data. This last aspect is probably the most exciting, as you can often see trends starting to develop, and if you’re lucky you may see patterns emerge that suggest new directions for research. Plus, by this point you’ve probably been sweating over your project for quite a while now; wouldn’t it be nice to see some payoff? Seeing trends starting to develop in your own data can serve as a big motivator on those rainy days in the field, or when the instruments just aren’t working. It will also serve to keep you on top of entering your new data as it comes in.
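The error check in point 2 is easy to automate once the data are entered. Here is a minimal sketch in Python, assuming your measurements sit in a CSV file with a header row; the column names (`sample_id`, `date`, `ph`), the accepted range and the sample values are all invented for illustration.

```python
import csv
import io


def find_suspect_rows(rows, column, low, high):
    """Return (line_number, raw_value, reason) for values that look wrong.

    `rows` is an iterable of dicts such as csv.DictReader produces.
    Line numbers assume the header is on line 1 of the file.
    """
    suspects = []
    for line_number, row in enumerate(rows, start=2):
        raw = (row.get(column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            suspects.append((line_number, raw, "missing or not a number"))
            continue
        if not low <= value <= high:
            suspects.append((line_number, raw, "outside expected range"))
    return suspects


# Made-up example data; in practice you would open your own CSV file.
SAMPLE = """sample_id,date,ph
A1,2009-07-01,6.8
A2,2009-07-02,7.1
A3,2009-07-03,14.9
A4,2009-07-04,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for problem in find_suspect_rows(rows, "ph", low=0.0, high=14.0):
    print(problem)
```

Run on the day the data are entered, a check like this flags the impossible reading and the blank cell while you can still go back and re-measure.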
Now that your data is being managed and transferred into your storage system, the next steps of analysis and presentation should be easy (or at least easier). Probably one of the greatest wastes of time in any project, and the act that introduces the most error, has to be ‘reshaping’ data into a form that is compatible with whatever analyses you decide to run. Having a coherent and consistent plan to manage and store your data allows you to enter data once, check it for errors and then proceed unfettered, without needing to reconsider your data’s structure. Being able to trust in the integrity of your data allows you to be confident in your results. After all that, writing the papers should be easy!
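When some reshaping can’t be avoided, doing it with a small script instead of by hand at least makes it repeatable and checkable. Here is a minimal sketch in Python that turns a ‘wide’ table (one column per measurement) into a ‘long’ one (one row per measurement); the function name and the plot and taxon names are hypothetical, and your own analysis package may well have its own tool for this.

```python
def wide_to_long(rows, id_columns, variable_name="variable", value_name="value"):
    """Turn each non-identifier column of each row into its own record."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue
            record = {key: row[key] for key in id_columns}
            record[variable_name] = column
            record[value_name] = value
            long_rows.append(record)
    return long_rows


# Made-up example: one row per plot, one column per taxon counted.
wide = [
    {"plot": "P1", "beetles": 12, "spiders": 3},
    {"plot": "P2", "beetles": 7, "spiders": 5},
]

for record in wide_to_long(wide, ["plot"], variable_name="taxon", value_name="count"):
    print(record)
```

Because the original table is never edited, you can rerun the script whenever new data arrive and get the same layout every time.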
A short aside. As more and more researchers become ‘cogs’ in the machinery of large, multi-disciplinary, long-term projects, it’s important to consider how your data fits into the overall enterprise. Sometimes staff may be assigned to keep watch over the data (these are known by many names: Bioinformatician, Database manager, Project co-ordinator, Research director). Within these large projects the way you collect data, and even your experimental design, may be dictated to you by the needs of your project. Ideally, you may even be allowed to design your own experiments that operate within the larger project. In either case it’s wise to consult these “keepers of the data” early on in the design stage of your experiment. They may have already developed some of the tools you may need in your research, and will likely have useful suggestions for how you might set up your data management system. Consulting a data management professional early (even if you aren’t part of a large project) may even save you time at the end of your research if you’re expected to deposit your data into a central framework. No one wants to spend a week after they’ve defended their degree reshaping spreadsheets just so they can get that last signature on their thesis!
So Befuddled, I hope that helps. In the perfect research project we’d pay as much attention to managing the data as we do to the design and analyses of the experiments. In reality the impetus to get to the field or the bench often overwhelms our desire to carefully plan for every aspect of the research (as you found out). Taking a little time to think about the data before collecting it saves more time than it costs.
Have any other suggestions for Befuddled? Comment on this column at https://mrbugman.wordpress.com
I’d like to thank Jason Edwards, Charlene Hahn and Brad Tomm (field co-ordinators and data managers for the EMEND project) for helpful discussions that shaped the response to Befuddled, and David Langor (who still uses data from his PhD) for suggesting this month’s topic.
This post also appears in dead-tree and PDF versions in the June 2009 edition of The Bulletin of the Entomological Society of Canada.