As scholars who work at the interface of science, technology, and the humanities, we want to make it easy for researchers to organize, store, and share their data. We know what it’s like to get by with spreadsheets: they get the job done in the short term, but long-term care and feeding can quickly become a nightmare. There are all kinds of reasons to get serious about data management and database design. Here are just a few:
Scale with confidence.
At a certain point it becomes unwieldy and inconvenient to load your entire dataset into your computer's main memory every time you want to perform an analysis. Load just the bits you need for your analysis, no matter how big your dataset becomes.
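As a minimal sketch of what "loading just the bits you need" can look like, the example below uses Python's built-in sqlite3 module. The table name `readings`, its columns, and the helper `load_site_readings` are hypothetical, chosen only for illustration; a real project would have its own schema and database engine.

```python
import sqlite3


def load_site_readings(conn, site):
    """Fetch only the rows for one site instead of reading the whole table."""
    cur = conn.execute(
        "SELECT taken_at, value FROM readings WHERE site = ? ORDER BY taken_at",
        (site,),
    )
    return cur.fetchall()


# Build a tiny in-memory database for demonstration (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, taken_at TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("ridge", "2020-01-01T00:00:00", 1.5),
        ("valley", "2020-01-01T00:00:00", 2.5),
        ("ridge", "2020-01-02T00:00:00", 3.5),
    ],
)
# An index on the filter column lets the database avoid scanning every row.
conn.execute("CREATE INDEX idx_readings_site ON readings (site)")

# Only the two "ridge" rows are pulled into memory.
ridge_rows = load_site_readings(conn, "ridge")
```

The same pattern applies at any scale: the query names the subset, and the database does the work of finding it.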
No data left behind!
Be confident that you can find the data that you need when you need it, and retrace precisely how the data was generated. Just because you will Always Remember Where the Data Came From doesn’t mean that the person who inherits your data will, too.
Spreadsheets won’t complain if your research assistant enters "YOLO" in a column meant for timestamps, or if they forget to enter some crucial identifying information. A well-designed database with proper constraints will give you peace of mind when five, ten, or fifty people are contributing to a dataset.
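To make the idea of constraints concrete, here is a small sketch using Python's built-in sqlite3 module. The `observations` table, its columns, and the `try_insert` helper are assumptions made for illustration. The timestamp check is deliberately loose: it only asks SQLite whether it can parse the value as a date and time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE observations (
        sample_id   TEXT NOT NULL,  -- identifying information is required
        observed_at TEXT NOT NULL
            -- datetime() returns NULL for text it cannot parse as a date/time
            CHECK (datetime(observed_at) IS NOT NULL),
        value       REAL
    )
    """
)


def try_insert(conn, sample_id, observed_at, value):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO observations VALUES (?, ?, ?)",
                (sample_id, observed_at, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(conn, "S-001", "2021-06-01T09:30:00", 4.2)
rejected_bad_time = try_insert(conn, "S-002", "YOLO", 4.3)
rejected_missing_id = try_insert(conn, None, "2021-06-01T09:31:00", 4.4)
```

In this sketch the first row is stored, while the "YOLO" timestamp and the missing identifier are both refused before they can reach the dataset.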
Don't dread integration.
Pairing your data management plan with a well-designed database lays the foundation for smart data integration down the road. Communicate with confidence about the structure of your data, and reap the rewards when the time comes to pool data for larger analyses.
Poorly designed data structures can make simple maintenance tasks—like updating information about a field-site or an instrument that applies to big chunks of your dataset—extremely arduous. Get clear about the concepts and relationships in your data, and store them in a way that makes maintenance easy.
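One way to picture this, sketched with Python's built-in sqlite3 module: store each field site once and have measurements refer to it, so that a correction to the site is a single-row update. The `sites` and `measurements` tables and their columns are hypothetical names used only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the site reference below

# Site details live in exactly one place.
conn.execute(
    "CREATE TABLE sites ("
    " site_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " elevation_m REAL)"
)
# Each measurement points at its site rather than copying the site details.
conn.execute(
    "CREATE TABLE measurements ("
    " measurement_id INTEGER PRIMARY KEY,"
    " site_id INTEGER NOT NULL REFERENCES sites(site_id),"
    " value REAL)"
)

conn.execute("INSERT INTO sites VALUES (1, 'North Ridge', 1200.0)")
conn.executemany(
    "INSERT INTO measurements (site_id, value) VALUES (?, ?)",
    [(1, v) for v in (0.5, 0.7, 0.9)],
)

# Correcting the site's elevation touches one row, not every measurement.
conn.execute("UPDATE sites SET elevation_m = 1215.0 WHERE site_id = 1")

rows = conn.execute(
    "SELECT m.value, s.elevation_m"
    " FROM measurements m JOIN sites s USING (site_id)"
    " ORDER BY m.measurement_id"
).fetchall()
```

Every measurement sees the corrected elevation through the join, which is the maintenance benefit the paragraph above describes.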
Make sharing easy.
Whether you need to meet repository requirements for data formatting, or want to build a high-quality data portal for your field, a well-designed database is the first step. As funding agency requirements surrounding data management and sharing become more robust, you’ll be ahead of the curve.
No, you can’t just do it later.
The longer you go without a data management plan and a well-designed database, the harder it will be to adopt those tools in the future. A common misconception is that using a formal database will constrain your work down the road. In fact, the opposite is true! Being explicit from the outset about the nature and structure of your data makes the process of expanding and extending your data more precise and ultimately less time-consuming.
The following options and service levels are intended to give you a starting point for planning. We will work with you to develop a plan that meets your specific needs and budget. The earlier we can start the conversation, the easier it will be to home in on a price point. Don’t hesitate to contact us: we’re happy to field hypotheticals!
Small-Project Data Management Consultation and Database Design: $1,000 – $3,000
If you're planning a new research project, or would like to develop a more robust solution for an existing project, a data management consultation is the place to start. We’ll talk about your data creation and analysis processes, the nature of your data, and your short- and long-term goals. We’ll then develop a simple workflow for metadata creation that meets the needs of your project, as well as a database design tailored to your goals and the nature of your data. The phrase “small project” can be a bit nebulous: we have in mind projects that revolve around a single research question and that involve one to four distinct data types (e.g. data from different instruments, or generated via distinct workflows).
You can expect to exchange several emails, meet a few times via Skype, and receive a clear and simple roadmap for managing your data. If you then decide to hire us to implement your database design, we’ll throw in three months of cloud hosting (a 25% discount on a one-year plan; see below).
Medium-Project Data Management Consultation and Database Design: $3,000 – $10,000
For projects that are a bit more complex, and involve a wider variety of data types, the consultation process will inevitably be more extensive. By “medium-project,” we have in mind projects that involve several related questions, and anywhere from five to ten data creation workflows.
You can expect to exchange several emails, meet a few times via Skype, and receive a clear and simple roadmap for managing your data. If you subsequently decide to hire us to implement your database design, we’ll throw in three months of cloud hosting.
Lab-Group or Large-Project Data Management Consultation and Database Design: $10,000+
Once we get beyond around ten data types, and into a much wider range of research questions, we should take a more measured pace. Depending on temporal and geographic constraints, we prefer to do an on-site visit and meet with as many project participants as we can. We expect to work with you over the course of several months to iteratively fine-tune your data management plan. Ideally, this would happen in conjunction with the actual deployment of your database and metadata management solutions (see below).
Database Deployment and Hosting: In the Cloud
Depending on your requirements, a cloud-based database can be a low-cost, high-security solution for your project or lab group. We provide database implementation and management services built on top of the Amazon Web Services platform. Our service includes a web-based data ingestion platform with some simple monitoring tools.
We keep costs low by pricing this service in one-year increments. The main factors determining cost are (1) the size of your database and (2) the amount of network traffic that you anticipate (i.e. how many people will access the database, how often will you upload or download data, and in what quantities). For example, a small allocation suitable for teams of one to three people with low throughput starts at around $400/year. A beefy database designed for large teams and high throughput could cost as much as $40,000/year. We’ll help you to determine what level of service makes sense given your goals and budget.
You can expect easy and secure access to your data, daily backups, and prompt responses to questions or problems.
Database Deployment and Support: Your Hardware
If you have access to server hardware in your lab or at your institution, we’ll help you to deploy and configure your database. While we can’t provide the same level of technical support as we can for cloud hosting solutions, we’ll be available to help troubleshoot common problems. More importantly, we’ll provide guidance about what kind of hardware is appropriate, and help develop a plan to keep your database secure and running smoothly. Once your database is deployed, we’ll provide three months of technical support.
Data Portal Application Development
We want to help research communities find new ways to integrate and share their data. Researchers in many fields now recognize the value of aggregating data in centralized databases where investigators can query, download, and visualize a large number of related datasets all at once. If you see an opportunity to advance your own field in this way, we want to help. A project of this kind involves both a robust and scalable database design and an agile software development model that can respond quickly to community feedback. Please contact us to discuss your ideas and requirements, and we’ll work with you to develop a workable budget and roadmap.