The crowdsourcing stakeholder view on 'What do you wish you'd known at the start?'

As promised, here are the responses to one of the two key questions we asked 'behind the scenes' workers or stakeholders on crowdsourcing, citizen science, citizen history, digital / online volunteer projects, programmes, tools or platforms with cultural heritage collections. Our thanks to everyone who contributed and helped spread the word!

Context: this survey had 32 responses, and this question had 30 responses. The first response was on February 9 and the last response was March 1, 2021. It represents a sample of convenience, dependent on the team's reach on social media, and the membership of mailing lists and newsletters we posted to. There's further information about these informal surveys at 'We want to hear from you!', and we're happy to answer questions about our project. We've tagged our survey results posts for quick reference.

What do you wish you'd known at the start?

To be a lot more specific in what I wanted images of. Also, it would have been great to have warning about the pandemic

To be honest I was fairly experienced with these things

In general I wish I knew how long these projects take to plan and execute so we could begin sooner. Specifically though I wish I had more experience working with data particularly in terms of thinking about what to do with the data once you have it (it would have been useful in terms of knowing what we wanted to ask volunteers to do)

It is very easy to set up a project on Zooniverse, but much harder to make sense of the data you get. I underestimated how much time it takes to clean and aggregate the data, and it took me a while to find an IT expert to help me.

We are only 9 months into the project, and are actually launching to the public in a month. But already through beta tests we are seeing how the design of the project could have been improved and optimized for easier aggregation and processing of responses. The majority of the institutional buy-in and focus is on getting these projects up and running with the public, but this often neglects to account for the time and resources that are necessary to process the results into a usable and worthwhile set of data.

1. That the onsite workflow would not be feasible for more than a few months due to COVID. When we pivoted to fully virtual we realized the project lifespan was going to be much shorter due to the Zooniverse volunteer base which is much more dedicated and committed. We were able to lower retirement limits and process our data sets almost 40-50% faster than expected. 2. The work that is needed to continue to run the project. Our Collections team was warned by the citizen science team that it would take a lot of work to not only respond to talkboards and questions, but also to aggregate responses and process exports — and we wish we had really taken that advice to heart when planning our yearly capacities.

Nothing.

I was nervous getting the project started as I personally didn't have much experience of using GitHub and we hadn't seen anyone else using it in this way to really base it on, so we had no idea if it would work or if the target audience would be confident using GitHub. But now I know how relatively easy it is to get results, I wouldn't hesitate to undertake further projects using a similar workflow. It had been suggested initially that we simply put up a list of the notebooks and a guide to transcribing them, asking people to email us with a text/Word file containing their finished transcription, so in hindsight I'm glad I pushed the workflow through GitHub, as we had no idea how many people would participate and it made the workflow manageable.

How quickly volunteers would go through all of the original sets of cards. I didn't realize I needed to prepare many, many years of cards in advance of starting the project.

I underestimated how engaged my students would be after I gave them the option to pick documents meaningful to them.

Read the Help section carefully – avoid wasting your own time. Expect to take a while to settle into your task.

I wish I'd known more about how to recruit participants from more ethnically diverse groups. We got there in the end, but only just (and with a combination of added resources and serendipity).

Ensure minimum agreed points for everyone who takes part in the project: academics, stakeholders, volunteers, politicians, funding partners etc.

Set boundaries on how much of your time and energy you'll invest in the project. You cannot make everyone happy, really, you can't. Realise that if people are making lots of demands on your time that you cannot always meet these demands. Invest time in testing training materials. The digital skills gap is much bigger than you think!

Make sure you know what the output is going to look like (valid XML? txt files? xsl files?) and what sort of massaging/vetting/verification it will need, and make sure that volunteers can access their contributions and contributor stats on their own.

Your standards aren't the same as your public's standards for crowdsourced content! And that's OK, just plan for how to address this. And another lesson learned: any crowdsourced project needs constant outreach attention – through emails, through social media, through your org's website. It's not a one-and-done marketing blitz.

Don't measure impact by the number of uploads or contributions. Online Cenotaph is a legacy project with a long tail, so its impact can be measured by the way it has changed people's views of themselves and their ancestors.

How much effort it takes to set up the technical environment and then to motivate people to take part.

I wish I had known at the start that crowdsourcing is extremely time-consuming and that you have to put in a relatively large amount of effort to get people to participate (at least in the case of VerbaAlpina).

That we would not only be successful in increasing the representation of women on Wikipedia, but that we would also be successful in curtailing harassment, and be a leader in the area of "Apply Technology for Women’s Empowerment and Digital Inclusion". In other words, be aware that your goal may be X, but Y and Z may happen, too.

Start small, but be curious and examine the entire contents of a grubby folder.

1. Transcription projects draw a smaller audience than classifying animals and stars. As one of the first transcription projects on Zooniverse perhaps this was a lesson that we could only learn from experience. 2. It's important to acknowledge that transcription is hard. This is related to point 1. But as with teaching it's a balance between emphasizing the challenge, and telling the volunteers that they can do it. 3. The audience will have a different relationship to, and interest in, your material than you do. Even though you must tell them the science, they are participating for their own motivations. Rather than seeing yourself as a missionary trying to convert volunteers to your interests, it's perhaps better to see yourself as trying to collect a traveling circus of misfits who nevertheless can make something incredible. 4. Contextual knowledge of the language and the words is critical in accurate transcription. Yet volunteers often focus on the literal letter-by-letter transcription. You can put up videos about paleography, but I believe only a minority will watch them. 5. Although these projects are over the internet, I believe the projects that have the most success have been able to engage heavily in person (this relates to point 3: in person you can collect your traveling circus, and they will collect more people for you, eventually).

My colleagues on the Wellcome Trust-funded Addressing Project and I are using Zooniverse to transcribe the pension records of around 30,000 nineteenth-century UK Post Office pensioners. We started developing this project last year and launched on the platform at the end of January 2021. There is not much I wish I had known at the outset: the documentation available for Zooniverse and the active community of their forums mean that it is easy to get a good grasp of what is involved in running a transcription project through their platform. I would strongly recommend it for other people interested in transcription projects.

Be a volunteer before starting your own project, so that you know and understand both perspectives.

You have to really think through the questions you are asking people. Every step is a barrier to data reconciliation and cleanup. Also this will take WAY longer than actually setting up the project and running it.

Keeping data gathering simple, with a low barrier to entry, means that volunteers will more easily get engaged in the project.

Helping people get the air quality monitors set up at home takes more time than planned

This is a hard one. Probably knowing how best to engage the rest of my organisation with the project in the hope of making it more sustainable

That institutional backing ghosts in and out and that being part of core documentation goals for the institution would have made a big difference. Long term funding proves problematic.

Your passion will grow, while your capacity or resourcing may not – so start small, be generous and kind to participants, evaluate effort to outcomes, be creative, communicate often, and grab opportunities as they come (you'll soon learn with the previous bits of advice which opportunities can catalyze momentum)
