Channel: Data Carpentry

Congratulations to Greg on his new position!


As many of you will know by now, Greg Wilson, Director of Instructor Training for Software Carpentry (and thus indirectly also for Data Carpentry), has announced that he will step down from his position at the end of January 2017 to join Shopify as their Computer Science Education Lead. We’re very excited about this new opportunity for Greg to continue to reshape how people learn computer and data science skills, and we are enthusiastic to see where his creativity and passion about democratizing digital literacy skills take him next. Greg is also the reason we are here in the first place, and so this is an opportunity to say thank you to him.

At Data Carpentry, we’re united by a deep sense of gratitude for everything Greg has done for the Carpentries. He’s the giant on whose shoulders we all stand, and it’s his tireless work, infectious idealism, and unbending commitment to fostering community, collaboration, and openness that have turned a few lessons on better research software practices into a worldwide movement that has had a real impact on how we do research every day. Along the way Greg created an ethos of collaborative and open lesson development, of teaching shared curriculum, and of learning to be a better teacher. Most important of all, he inspired a community of people who passionately work together towards shared goals. He showed those of us who felt alone in our labs working in programming or data analysis that we weren’t, and enabled us to connect with others who shared our enthusiasm for data and code, as well as our passion to share that perspective with others.

When we recently surveyed Data Carpentry and Software Carpentry instructors, the most common reason respondents gave for becoming instructors was that they had talked to Greg. Collectively and individually, Greg has inspired the more than 700 instructors we have today, as well as countless learners in the workshops he has taught. He’s made us feel like we can do more than we thought and that we have something important to contribute. For Data Carpentry, he is the one who encouraged us to scale beyond an initial few workshops, to make it a Carpentry, and he served on the Steering Committee during the crucial time of getting it off the ground.

Greg came from the code world, but what he cared about first has always been people. He is the kind of person who will advocate for everyone else he knows before he does so for himself. He embodies the desire to make the world a better place, by giving more people a chance to participate and to be heard. We’re glad that Greg will continue to be a part of the community, including occasionally teaching workshops and training instructors. Those of you fortunate to be in his classes will get to experience him first hand, but you’ll also see his passion and commitment to helping others through all the instructors he trained, and indeed through the Carpentries movement as a whole.

We’re certain there are many other stories from people who Greg inspired to be bold and create something new, and to go after something that was valuable but difficult. If you have a story of your own and want to thank Greg, please don’t hesitate to, in true SWC fashion, create an issue in the ‘conversations’ repo and leave him a message.

We all wish Greg the best in his new endeavor!


Join our new Mentorship Program!


The Carpentry community is growing quickly! Over a hundred new instructors have been trained in the past two months. These new instructors join our community because they’re excited to help fellow researchers who are struggling to analyze their data and because they share our commitment to reproducibility and open access. In spite of these shared values, however, a third of our instructors never go on to teach with us.

The Carpentry community has a strong tradition of supporting our instructors. We hold weekly live discussion sessions where those who are new to our community can benefit from the collective wisdom of more experienced instructors. However, we know that it helps to get some more support when starting out as an instructor. We as a community can mentor and provide that support to help the Carpentry community stay strong as we continue to grow.

Data Carpentry, in collaboration with the Data/Software Carpentry mentoring committee, is piloting a Mentorship Program in the new year. This program will support instructors who are new to our community by matching them with a personal Mentor: an experienced instructor who has volunteered their time to help new instructors gain the confidence, technical skills, and teaching skills they need to successfully certify as an instructor and teach their first workshop.

We invite anyone who is interested to attend an information session. Whether you’re an instructor who wants to help your fellow instructors prepare to teach their first workshop, or a new instructor who would like some guidance - please come learn more about the program and share your thoughts about how we can best support our new instructors in becoming active members of our community.

Please sign up on the meeting Etherpad: http://pad.software-carpentry.org/mentorship-info

If you’re interested in the program but can’t attend any of the listed times, please contact ebecker@datacarpentry.org.

How I Developed a Workflow for Success in Graduate School


My home state of Michigan and the surrounding Great Lakes area cradles 20% of our planet’s freshwater resource and yet growing up next to these ecosystems, I knew almost nothing about their biological inner workings. Now, as a PhD student and environmental microbial ecologist, I explore the most abundant and amazing inhabitants of the lakes in my backyard: the diverse communities of tiny microbes.

When I started my PhD in 2012, my ecological skillset was based on a low-throughput method of discovering who a few of the bacteria were in our environmental samples. I would spend weeks doing molecular biology in the lab and Sanger sequence up to 10 sequences to manually identify my bacteria through NCBI BLAST. It would take me several weeks to learn the identities of just a few of the bacteria in my samples.

Instead of learning about a single microbe, I am now interested in the millions of microbes that contribute to the complex lake bacterial community. My dissertation research questions include: How do so many species of microbes live together? Who are all of these microbes? Where within freshwater lakes do these bacteria live? How do they contribute to the food web and biological processes that happen within lake ecosystems?

Scaling up my science

A high-throughput approach is needed to answer these ecological questions. I isolate lake bacterial community DNA and RNA and sequence it with Illumina sequencing, resulting in millions of sequences. A million times more data than I had before! It would take years to do the manual work I had depended on earlier. I needed to learn how to scale up my analyses, match these sequences to a database, and uncover interesting ecological patterns.

I had to learn how to program. Already a graduate student, I had never taken a single class on programming or computer science! How did this happen? How would I learn how to code in addition to the fundamentals of ecology and evolutionary biology? I had a lot of catching up to do.

Developing my data analysis workflow

After 2 years of attempting to teach myself, with some progress, I enrolled in Dr. Pat Schloss’s semester-long Microbial Informatics class. The class included many of the topics of Software and Data Carpentry (e.g. reproducible research, git/GitHub, and R). It was in this class that I was introduced to the basics of programming and version control with git/GitHub for the first time. At the end of the course, Pat invited the class to be helpers at the first-ever Software Carpentry workshop at U of M, which he was co-organizing with Dr. Meghan Duffy. I took him up on his offer and decided to help out.

Through Pat’s class and helping out at the workshop, I was introduced and empowered to use what is now my current data analysis workflow: obtain millions of sequences, then use the Unix shell to quality filter the sequences and assign them to their matching bacteria with mothur, a program written in C++. Next, I import the large data files into R, where I apply reproducible coding practices in RMarkdown (this was so influential that I even wrote a tutorial) and do all of my analysis. While I work through my analysis, I save and keep track of all my changes with git and GitHub using five main commands: git add <file>, git commit -m "message about the file", git push, git pull, and git status. (My remote copy of my work on GitHub saved my data analysis two times when my computer died!) Once my analysis is finished, I add a README to my GitHub page and add the link to the methods section of my current paper (click here for an early example). Therefore, if people reading my paper are curious about how I did my analysis or how my figures were made, they can see the real deal, and I am accountable to my science.
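For readers who want to follow along, here is a minimal sketch of that daily Git routine. The file name and commit message are hypothetical examples, and the commands assume a repository that already has a GitHub remote set up.

```
git pull                      # start by pulling down any changes from GitHub
git status                    # check which files have changed locally
git add analysis.Rmd          # stage today's work (the file name is just an example)
git commit -m "Add alpha diversity analysis in RMarkdown"
git push                      # back the commit up to the remote copy on GitHub
```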

Expanding into other realms: remote, high performance computers & collaboration

I originally worked on my own laptop during my first dissertation project, as I only had tens of samples; however, my second dissertation project has ten times more samples! So, I now use remote, high-performance computers to do the heavy lifting of my research for me. With help from the SWC Unix lesson, I mastered working with files in the Unix shell. I submit the work to the remote computers, they run the most computationally intensive steps of my work, and when they are finished I use Globus to transfer the important files to my laptop.
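As an illustration of what submitting that work can look like, below is a hypothetical job script for the compute-heavy step. It assumes a SLURM scheduler; the resource requests and the mothur batch file name are placeholders that will differ by cluster and analysis.

```
#!/bin/bash
#SBATCH --job-name=mothur-classify   # name shown in the cluster queue
#SBATCH --cpus-per-task=8            # cores for mothur to use
#SBATCH --mem=32G                    # memory request
#SBATCH --time=24:00:00              # wall-clock limit

# Run the quality-filtering and classification steps from a mothur batch file
# (the batch file name is a placeholder for this sketch).
mothur quality_filter_and_classify.batch
```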

Building on the foundation above, I have started and am excited to evolve a new branch of my workflow: collaboration. I am beginning to build collaboration into my workflow to develop reproducible scripts that are useful for others in similar situations (or me in the future). To do so, I use git and GitHub to share and develop scripts with collaborators who have overlapping research goals. Together we can create GitHub issues to remind ourselves of what needs to be done or of dreams that we have for our work. To suggest changes and have a conversation about what we are working on, we can send pull requests to each other. And who knows - maybe what we are working on could be helpful for others!

What I wish I would have known

First, I wish that I had been told earlier in my career that learning how to program is a fundamental skillset that all biologists should have. Second, don’t learn how to program alone. I tried it, and it was less efficient than learning alongside others and learning through teaching and helping in SWC/DC workshops. Search for workshops happening around you or contact local people who are involved in SWC/DC. If there’s no one geographically close, sign up for a mentorship discussion and lean on those who are in other locations. This community is dedicated to sharing its erudition, excitement, and resources with those seeking to learn this ever-so-important skillset.

Developing a community of learners

Here at U of M, I am working with the University of Michigan’s Software Carpentry partner organization to develop a community of instructors/learners and to host workshops throughout campus. While our organization is still young, we have hosted several workshops and are starting to grow. Just last week, I attended a Python workshop, taught over two half-days by Byron Smith and Jackie Cohen, to learn how to use Python with the Jupyter Notebook. This community is an immense resource and provides an opportunity for all levels of expertise: a space to learn, grow, and depend on others while we develop programming proficiency.

Academia can learn from the SWC/DC community. The bottom-up design that SWC/DC uses empowers people at all levels to contribute, teach, and learn. Helping to host a workshop or contributing to the development of a lesson also gives people a place to develop collaborative, leadership, and teaching skills that can benefit their next career stages. I am ever thankful to this community!

Career Pathways Panel Discussions


The Carpentries are excited to announce an upcoming series of panel discussions designed to help our community members become informed about the variety of career paths available to computationally literate members of their fields. Panel discussions will be held virtually in the months of January, February and March (tentative dates below), with each session featuring 3-4 senior community members in Carpentry-related professions, including tenured faculty, communicators/consultants, research software engineers, and industry scientists.

Panelists will discuss how their career path led them to their current positions, including obstacles or challenges they may have faced and how they overcame those barriers. Audience members will have the opportunity to submit questions for panelists, and time will be reserved for free-form Q&A.

To ensure that all attendees have the opportunity to participate, attendance will be limited to 20 participants who have attended a debriefing within the last 3 months. To attend, please add your information to this form.

We are currently in the process of recruiting panelists and would love to have recommendations from the community! If you know of someone who would be a good panelist, please recommend them here by Monday, January 9.

Anyone with questions can send an email to Lauren Michael (organizer) at lauren1.michael-at-gmail-dot-com.

Tentative Dates
Tuesday, Jan 24 - 7am PST / 10am EST / 3pm UTC / 2am AEST (next day)
Wednesday, Feb 22 - 3pm PST / 6pm EST / 11pm UTC / 10am AEST (next day)
Tuesday, Mar 21 - 3pm PST / 6pm EST / 11pm UTC / 9am AEST (next day)

Soft(ware) Skills


I think, in the end, technology and data science are about communication.

As a biologist with a physics degree, I’ve always been called on by my peers to help out with the more ‘mathy’ parts of their work. As early as my masters, my friends and collaborators would send me their data, here and there, to have a look and make suggestions for how to proceed. More often than not, I’d either send it back to them asking lots of questions, or spend hours of my own time cleaning and manually manipulating the data until it was in a form that I could query and look for patterns in.

I became interested in using legacy data to understand ecological patterns as a matter of profession. During my grad work, I used data collected by others to answer questions of scale- like identifying environmental patterns inducing migratory behavior in an invasive species, and building tritrophic population models to understand seasonal dynamics. And then in my first postdoc, I joined the Long Term Ecological Research network, and that basically blew up my brain- I suddenly had access[1] to a whole world of data documenting various ecosystem metrics over long periods of time. I spent several of my years working on a 30-year insect observation database- and I was able to uncover previously unobserved patterns in how communities of similar organisms respond to invasions. Through all this, I continued to work with others, helping them with their individual data problems.

Over time, I started to see patterns emerging, not in the data itself, but in the way the different sets of data were formatted- patterns in the errors and other problematic ways spreadsheets were being used, and how these issues were hindering things down-river. It was almost like the data was collected in a completely different universe than the world in which it was to be analyzed. There was a break in communication between the scientists collecting the data, and the computational tools they intended to use to analyze the data. But even worse, when people came to me for help with their data problems, I heard some of the same things over and over again. “I’m just not good at these things.” “I’m bad at math/programming/computers.” “I don’t know where to start.” “I’m afraid to do it wrong.” They looked at everything that we teach in DC and SWC as ‘hard’ skills, and they saw themselves as more…mushy? I think it makes it hard for some students to see themselves as capable practitioners of data science because they see it as so far from their identities.

I’ll admit, I found this frustrating. And I wanted to fix it. So I started writing about my data management struggles, tips, and tricks. This was how I got involved in the data management training and reproducible research community, and through that fell in with this rag-tag group of misfits here at DC and SWC. What I love about this community is its commitment to inclusiveness- it’s about helping individuals, wherever they come from and whatever skills they start with, to become better at what they do. And now I’m in pretty deep- last year I developed a semester-long course that takes students through the DC and SWC curricula, mixes in some reproducible research philosophy, and applies the principles to real data- essentially offering a guided experience in making the post-data-collection part of the scientific workflow happen, and heck, even fun! And I’ve learned so much in the process. But moreover, I recently had an epiphany about technology in science. So often, teachers of technology and computational methods frame things as hard skills. But we learn these skills not (just) so we can put the bullet points on our CVs- we learn so we can work better together. So we can communicate our science better. So we can be better at answering the whys of our science.

With this in mind, my own workflow has changed dramatically over the last few years. I’ve adopted a sort of extreme openness and focus on reproducibility in my own work- everything I’m able to post goes directly on my GitHub as it’s being composed- meaning that if disaster struck, others would still be able to build on my work.[2] But also, when I work with students and collaborators, my push towards better, more reproducible practice has helped these collaborations become more efficient and productive. For example, in a current collaboration, a student and I work together regularly on his data- we meet via Skype, share screens, and work through his stats issues by composing R scripts in RStudio, then sharing them easily on GitHub. Whereas most students are comfortable using Skype or other video conferencing software to enable communication, RStudio and GitHub allow us to extend this- to easily exchange ideas and to document our progress. It’s actually easier and more effective than meeting in person- all of our changes are easily logged, we get to verify our work is reproducible as we go by testing it on multiple computers, and short meetings and quick questions can be easily handled without travel time or finding a meeting space factoring into our scheduling.

When we think about how we use these tools, I find that the soft skills they represent are both powerful motivators to the interested and very useful tools to convince the more hesitant to get on board. Have you considered upgrading your soft(ware) skills lately?


[1] As it turns out, I always had access. LTER data is, as a matter of policy, publicly available. I just didn’t know about it. Now you know!
[2] Incidentally, this has earned me the title of “Most expendable member” in my current research group.

Carpentries Career Pathways Panel: Raniere Silva, Geneviève Smith, Tiffany Timbers


Tuesday, January 24, 7am PST / 10am EST / 3pm UTC / 2am AEST (next day)

On Tuesday, January 24, the Carpentries will host the first of three Career Pathway Panels, where members of the Carpentry communities can hear from three individuals in careers that leverage teaching experience and Carpentry skills.

Anyone who has taught at a Carpentry workshop in the last three months is invited to join, and should register ahead of time in order to be invited to the call. Registration is limited to 20 people per session, so please only commit if you are sure you will attend. Attendees can register for any number of these sessions. Future, monthly, panel sessions will occur on different days and at different times. Each session will last one hour and will feature a different set of panelists.

For the first session, we are excited to be joined by the below panelists!

Raniere Silva

Community Officer at the Software Sustainability Institute, UK. I’m Brazilian, just completed my year living abroad, and my background is in applied mathematics. Most of the time I select Python as the tool I will use to solve my tasks, but I’m jealous of those who use RStudio. My dream is that South America hosts as many Carpentry workshops (Software Carpentry, Data Carpentry, Library Carpentry, …) as the US, UK and Australia.

Geneviève Smith

I’m the Head of Data Science at Insight, where we run training programs for quantitative PhDs who want to move into careers in data science, data engineering, health data, and AI. Prior to joining Insight I did a postdoc and earned my PhD in Ecology, Evolution & Behavior from UT Austin. My research focused on the role of competition in structuring ecological communities of species through a combination of field-based experiments and theoretical modeling. During my time in grad school I participated in multiple Software Carpentry workshops, volunteered at a few, and trained to be an instructor. Those experiences were critical in my development as a coder and helped me gain confidence while building evidence of my computational skills.

Tiffany Timbers

Tiffany Timbers received her Bachelor of Science in Biology from Carleton University in 2001, following which she completed a Doctorate in Neuroscience at the University of British Columbia in 2012, which focused on the genetic basis of learning and memory. After obtaining her doctorate, Tiffany carried out data-intensive postdoctoral research in behavioural and neural genomics at Simon Fraser University (SFU). During this time, she also gained valuable experience teaching computational skills to students and scientists through her work with Data and Software Carpentry, the SFU scientific programming study group, and teaching a course in computation in Physical Sciences at Quest University. Tiffany began her current teaching role in the University of British Columbia Master of Data Science program in the summer of 2016.

South Africa's North-West University Becomes Software and Data Carpentry’s first African Partner


In November 2014 the first large-scale Software Carpentry event was run in South Africa [1] as part of the eResearch Africa conference [2] in Cape Town. Since then 15 more Software, Data, and/or Library Carpentry events were run by the Southern African community across many disciplines and several institutions [3, 4, 5].

Feedback from the NWU Genomics Data Carpentry Workshop in September 2016

The North-West University [6] has been heavily involved in further developing the Southern African Carpentry community. In 2015 NWU led the development of a 12-month proposal [9] that kicked off in April 2016 with the first South African in-person instructor training event [10]. Since 2015 NWU has been involved in four internal Software and Data Carpentry events as well as four events run at other Southern African institutions. The university currently has five qualified instructors as well as two preparing for check-out. Instructors hail from diverse disciplines such as genomics, digital humanities, chemistry, and IT.

At the end of 2016 the NWU entered into a gold partnership with Software and Data Carpentry. The partnership marks the beginning of a new phase of capacity development around computing and data at the university; it is the culmination of months of hard work, exciting workshops, and interesting conversations with colleagues from all over the world. The NWU Chief IT Director, Boeta Pretorius, has been the main sponsor for Carpentry activities around the university and hopes that the partnership will help to develop and enhance computational research skills amongst NWU researchers and postgraduate students while developing increasing numbers of local instructors. The training events have been run as part of the NWU eResearch Initiative [11], which commenced in 2015.

We look forward to continuing our collaboration with Software and Data Carpentry and with you, our community!

[1] [https://software-carpentry.org/blog/2014/12/cape-town-swc.html]
[2] [http://eresearch.ac.za/]
[3] [https://software-carpentry.org/blog/2016/01/a-year-of-swc-in-south-africa.html]
[4] [http://www.datacarpentry.org/blog/genomics-nwu/]
[5] [https://cmacdonell.github.io/2016-08-25-CSIR/]
[6] [http://www.nwu.ac.za]

[9] [https://figshare.com/articles/A_Programme_for_the_Development_of_Computational_and_Digital_Research_Capacity_in_South_Africa_and_Africa_-_phase_1/3382168]
[10] [https://software-carpentry.org/blog/2016/04/south-africa-instructor-training.html]
[11] [http://www.nwu.ac.za/eresearch]

The SQL Ecology Lessons


I am fond of saying that ecologists should not be afraid of big data – instead, we have to deal with small, complex, and poorly connected data. Understanding how we can stay on top of things, data-wise, is becoming more and more important. And some of the practices used to collect a small amount of data do not scale well at all when the amount of data increases, even if slightly.

Knowing ecology is important. Understanding the natural history of your model, the conditions of your field site, and the big theories we use to make sense of results are obviously things that matter, and they receive a lot of emphasis during the training of ecologists. But moving from observation to insight requires making sense of the observations, and this in turn requires good data management practices.

I couldn’t have been happier to help develop the Data Carpentry SQL ecology lesson. Up to this point, it has been an important resource in the lab, and in the community at large. It served as the basis for some material in a semester-long class I am currently teaching. And the question I keep coming back to, whenever we discuss adding, tweaking, or removing material from the lesson, is: “does it lead the learner to better practices?”

I do not think this lesson teaches “best practices”. I don’t think I am even remotely qualified to discuss best practices of SQL. But emphasizing good practices is something I feel comfortable doing. So what are good practices?

Good practices are something that you could, realistically, apply tomorrow. This means covering the basics, and hinting at the very cool things that can be done with more advanced features of the language. In class, I motivate this by inviting guest speakers who have added some of this methodology to their toolkit to discuss how it has helped them do research. In a workshop, it can be a short motivational story that will be familiar to former workshop attendees.

More concretely, good practices are those that minimize the chance that something goes wrong (there has been much PDF ink spilled on the fact that Excel does, in fact, sometimes change values to other values) and give you a productivity boost. Learning a new anything can be intimidating, so good practices must be, to some extent, reassuring.

The things we focus on most in the lesson are organizing data to avoid duplication (why repeat the field site info in each row, when you can just use another table and link them with IDs?), and data retrieval. Data retrieval, in SQL, can encompass a lot of operations: merging, aggregating, conditionals, counting, calculating averages, and so forth. I like the “aha!” moment when learners realize that operations that would take a few lines of R, or all your willpower and sanity in a spreadsheet, can be done in a single SQL statement.
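To make that concrete, here is the kind of single-statement query I have in mind: a sketch against the lesson’s Portal mammals database, assuming surveys and plots tables that share a plot_id column and a weight column in surveys (the table and column names may differ in the version you are using).

```
-- Mean animal weight per plot type: a join, a filter, and an aggregation,
-- all in one statement.
SELECT plots.plot_type,
       COUNT(*) AS number_of_records,
       AVG(surveys.weight) AS mean_weight
FROM surveys
JOIN plots ON surveys.plot_id = plots.plot_id
WHERE surveys.weight IS NOT NULL
GROUP BY plots.plot_type
ORDER BY mean_weight DESC;
```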


The Community of Carpentry Enthusiasts at the University of Wisconsin


I am so thrilled that we have been asked to comment on our growing Software and Data Carpentry community at the University of Wisconsin-Madison. While I initially joined the greater Software Carpentry community as part of my ongoing role in providing local workshops (four years ago!), I must say that I’ve been happy to contribute back, well outside of my typical work week. These organizations do so much for the UW-Madison campus, as does our local community of ‘Carpentry’ enthusiasts.

In this second week of January 2017, we’re teaching our 12th Software Carpentry (SC) workshop since early 2013 and our 4th Data Carpentry (DC) workshop since summer 2015, with roughly 20 instructors and helpers, almost all of whom donate their time! As a result of prior ties with SC-extraordinaire Greg Wilson, we actually began cultivating this local group of enthusiasts when we ran more casual “boot camps” of test curricula even before Software Carpentry got its name. It was that early enthusiasm, and the successful leadership of individuals like Greg Wilson, Paul Wilson (UW-Madison), Katy Huff (previously UW-Madison), and several others, that set us off on the right foot with a small community whose leadership I inherited. While I’m not sure that same dynamic and timing could be replicated, here’s what we do today that I believe strengthens our community and could be applied anywhere:

Recruiting members

Even at our very first “Software Carpentry” workshop in April 2013, we invited attendees of the workshop (in wrap-up discussion) to join us for future workshops if they thought they would like to contribute as a helper or instructor. To this day, our thriving community of helpers and instructors is made up primarily of prior attendees who are research graduate students (mostly), post-docs, or staff. We also have 2 faculty and a few campus staff who are not prior attendees, but who have a professional interest and/or role in enabling others through computation/data practices and who heard about the workshops via adjacent efforts on our campus. Therefore, our members’ inherent interests and our methods of recruiting them mean that everyone values our ‘Carpentry’ work for enhancing the computational capabilities of others and for enhancing their own teaching skills/credentials. Aside from myself and Christina Koch, who are lead hosts/organizers of our workshops, everyone else effectively donates their time to workshop decisions and execution.

Evolving community members into effective contributors and instructors who keep coming back

The two primary practices that have worked in developing the interest and contributions of our members have been to:
1. give work back to the people, by cultivating a culture that values fresh perspectives from new members and invites all members to contribute to our goals and work, and
2. invite gradual escalation of contribution by each member, within their comfort zone.

As a prominent example of the second point, our unofficial cultural expectation within our community is that everyone serve as a workshop helper, when available, and that anyone wanting to instruct is expected to have prominently helped at one or more prior workshops. Given that we have several certified instructors, we can allow non-certified community members to try instructing while still providing a ton of support from those who are more experienced and have gone through Instructor Training. This advantage of having a local community lowers the barrier to entry as a workshop instructor. (Of course, all of our members who were available were ecstatic to attend our first on-site Instructor Training in fall 2016!!)

On the first point, our two main organizers have gradually given more work to members with each workshop we offer, by increasing their roles in logistics planning, having them mentor each other, and having them lead sub-projects that complement our workshops. Our first Data Carpentry workshop, for example, was made much easier when our members leveraged their professional networks to gather help from instructors on- and off-campus with relevant experience in DC topics. We are also finding that our members are interested in having regular meetings (monthly) outside of workshop planning, and that this helps to make helpers feel even more invested and shapes how we execute our workshops. It also allows us to extend #1, above, to the following …

Giving the community a purpose beyond local workshops

When your instructors and helpers reflect on workshops and how to make them more effective, find ways to channel that energy into benefits for your local community and for the greater Software Carpentry community. In an earlier time, when very few institutions taught and regularly contributed to the Software Carpentry curriculum, we encouraged our members’ interest in holding hackathons, and we continued to develop and teach some older curricula even after Software Carpentry made its first major change to more-novice materials in early 2014. Why fall a bit behind the SC standard? Because it kept our instructors and helpers motivated, kept our curriculum fresh, and allowed us to contribute more insights to Software Carpentry’s new (2014) materials when we did transition to them in 2015.

More recently, our own Sarah Stevens led an effort to create the installation video tutorials that now appear in the Software Carpentry workshop template, after we had the idea come up in a post-workshop discussion. Given less-than-successful attempts by ourselves and DC/SC to have open office hours (with few learners showing up), we’re also trying out an email help list to provide on-demand assistance after workshops. And we’re excited to report back our findings after we’ve had time to reflect. Bottom line: when we encourage our members to think critically about the workshops and community, the entire DC and SC communities benefit, and our members are invested in the cause.

Let us know what YOU think

We may have had a unique start to our local community years ago, but the methods above truly reflect what keeps our Carpentry community effective and productive, with members who keep coming back to continually improve our workshops and to contribute to the greater community. That said, we are far from perfect, and there are likely additional strategies that we’d love to learn about from other sites. We’re also happy to discuss the activities of our local community more, should anyone like to get in touch, and hope that this blog post helps with our goal of being more formally involved in the international Carpentry communities. Please feel free to write to us (swc-dc-help-AT-lists.wisc.edu) or to me, directly, if you’d like to get in touch (lmichael-AT-wisc.edu).

Best recipe - just add statistics and science


We’d like to talk about our experiences working together as a domain scientist and a statistician and encourage you do this too!

When we say working together, we mean that we collaborate in the best sense – we each bring our strengths and those strengths are complementary. We have some suggestions for you as you get started. First, the domain scientist should look at how statistics is organized at their institution. Does your university have consulting statisticians or research-active faculty or both? Consulting can also be done as part of statistics students’ training. Most larger universities have both consulting and research. Some scientific questions naturally flow into research on new statistics methods and concepts, and other science questions are a better fit for an analysis method that has already been published. Talk to the consulting center first if you have one.

Collaboration between scientists and statisticians is essential and it is best to start before actually gathering data, even before the experimental design is finished. Susan has met with a number of faculty and students after data was collected and had to be the bearer of bad news…the data collected could NOT answer the question at hand. This could be due to a number of reasons – small sample size, wrong experimental design, not a random sample, incorrect information collected, etc. Don’t let this happen to you!

What has our path been like?

Susan: I have been very fortunate to have had the opportunity to work with many great scientists. The problems that I’ve been able to work on have been extremely fun and exciting. Not only did I have a chance to answer scientific questions, but in doing so, I’ve had the opportunity to expand my statistics knowledge. One such example was developing a model for plant quantitative trait loci. We developed a great hierarchical Bayesian model that used a Markov Chain Monte Carlo Model Composition (MC3) approach in identifying important markers. This was the analysis that started me down the path into Bayesian statistics. Before this analysis, most of my work focused on frequentist approaches.

Ann: I first appreciated statisticians when I brought a difficult experimental result to a university statistics consulting department and the statistician both solved the problem and taught me to do bootstrapping in SAS (back then it was a new tool; biologists had not heard of it). I then convinced my advisor that this was the right analysis. Gail, I salute you! Then Susan Simmons and the other statistics faculty at UNCW continued to educate me…I especially remember learning about Bayes’ theorem from Ed and Susan, and Susan explaining known-truth simulations and validation to me (which has led to an ongoing cyberinfrastructure project with many great students from computer science and statistics). I am now working toward understanding causal calculus, tensors, and U-statistics with patient tutelage from Yishi Wang and Cuixian Chen. There is always more fun stuff to learn. I have no intention of being an expert – and I don’t have enough formal math – but I can bring lots of disparate and interesting data and methods to the table.

What should you expect as you start down this path?

As a scientist meets with a statistician, the statistician asks many questions! The statistician tries to get a good grasp of the problem and will therefore really probe the scientist about exactly the question at hand. The statistician will also question and try to assess what limitations are evident in the problem and what are the best ways to overcome these limitations. Also, due to the amount of probing, statisticians can sometimes help scientists think of questions about their research that they did not even consider.

The goals of the statistician and scientists are the same – they both want the best research conducted with appropriate answers and solutions. With that said, the statistician must understand the data, the problem and the question to develop the correct methodology to use in analyzing the data.

What to do

Scientists should provide as much background information on the topic as possible. Keep in mind that the statistician might not have a background in your area, so be prepared to provide basic information. Try not to use too much of the jargon that is specific to your area.

Be patient… this is true for both the statistician and scientist. In most cases, the two areas will have different terminologies. Keep in mind that the scientist needs to clearly relay all pertinent information about the problem, and the statistician needs to relate the correct methodology needed to analyze the data. This may take some time, but it is important for both parties to have this understanding.

Communicate! Just as we mentioned previously that the scientist needs to communicate the problem well, the statistician also needs to communicate the analysis well. For example, if the analysis requires a certain assumption (for example, normality), then it is important for the statistician to relay this information and ensure that it makes sense that this assumption holds for the data. To get the best results, the scientist and statistician need to communicate throughout the entire process.

Many biology papers use outdated or just plain wrong statistical methods and visualizations. You can do better, but you may get pushback (sad, but common). Your statistics colleague can teach you how to explain the better analysis methods that you are using, and serve as the expert to convince reviewers.

This topic from the stats side, http://simplystatistics.org/2013/10/09/the-care-and-feeding-of-your-scientist-collaborator/

Scroll down for some excellent comments at http://stats.stackexchange.com/questions/5597/statistics-collaboration

https://github.com/jtleek/datasharing and
http://simplystatistics.org/

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004961

Everyone has a professional association – check out these statistics societies’ conferences, http://ww2.amstat.org/meetings/csp/2017/conferenceinfo.cfm and http://www.amstat.org/ASA/Meetings/Joint-Statistical-Meetings.aspx?hkey=bc3bc257-950f-44f8-aed6-b37736571bfc

Ann’s opinion article about this kind of collaboration – http://journal.frontiersin.org/article/10.3389/fpls.2014.00250/full

Author Bios:

Ann’s highlights: published with statistician and computer science collaborators, funded by USDA and NSF, chair of Gordon Research Conference on Quantitative Genetics, UNCW mentor award

Susan’s highlights: published with various scientists and computer scientists; AE of Environmetrics; Council of Section representative for the Risk Analysis Section of ASA; elected member of the International Statistics Institute

Moving Forward


As of January 30th, Greg Wilson has stepped down from his role as Director of Instructor Training to start a new position as Shopify’s Computer Science Education Lead.

Instructor training will continue under the guidance of Erin Becker, Data Carpentry’s Associate Director, and Maneesha Sane, Data and Software Carpentry’s Program Coordinator. Erin has a strong background for this role from her postdoc at the University of California, Davis, studying the effectiveness of training methods for transforming instructional practices. She has been involved with the Carpentry community as an instructor trainer, a member of the Mentorship Subcommittee, and leader of the effort to form an instructor mentoring program.

Maneesha is the Carpentry Program Coordinator, and serves as an active Carpentry instructor and member of the Mentorship Subcommittee. Maneesha’s hard work behind the scenes keeps Carpentry workshops running smoothly. She will now bring her expertise to coordinating instructor training events.

Erin and Maneesha have worked actively with Greg to ensure a smooth transition. We are conducting instructor trainings as scheduled, and are planning new events with Member Organizations. We will continue our efforts to train and support instructor trainers and build the instructor training program.

If you have any questions about instructor training, including the status of your institution’s planned training event, please contact us at admin@software-carpentry.org.

Standing for Inclusivity: a Foundation for Our Teaching and Community


Our goal as Software and Data Carpentry is to build a community teaching digital research skills. We teach essential computing skills to thousands of researchers in academia and industry each year. As an international organization we rely on the volunteer efforts of hundreds of researchers and professionals from around the world. Our volunteers come from diverse backgrounds, countries of origin, and beliefs. These individuals generously donate their time with the goal of helping to speed the discovery of new knowledge and the creation of new technology.

Actions and policies that arbitrarily restrict the movement of peoples based on their beliefs, national origins, race, ethnicity, sexual orientation, gender identity, or any other intrinsic class contradict one of Software and Data Carpentry’s core values: providing inclusive and supportive environments for all researchers. These harmful policies send a message to the highly-trained individuals who participate in and teach workshops that they and those like them are not welcome in the country where they collectively volunteer the majority of their time. They also put traveling volunteers at risk of being stranded far from their homes and families with little or no warning. These restrictions negatively impact our ability to teach others, collaborate, and conduct scientific discourse, and they affect the advancement of research of all types.

We stand with those who have been harmed, both directly and indirectly, by any such actions or policies. If you are a researcher who is stranded and could use a local contact, contact us, and we will work to connect you with volunteers in our global network.

Transfer of Learning


Have you ever experienced déjà vu? It’s when you have the feeling that what you’re currently experiencing has already happened. It can be extremely awkward, right? It’s a sensation that leaves you saying to yourself, “I did this already. This happened.” So, what do you do? Do you carry out the scenario as you remember it happening, or change it up? What about your perceived past experience makes you perform differently in the current experience?

Déjà vu is a feeling of recollection–the feeling that you lived through an experience already. Studies have shown that similar spatial layouts between the new scene and the scene in your memory may contribute to the experience [1]. This idea of linking past and present experiences can be applied to teaching and learning. One theory, Transfer of Learning, describes how past experiences (learning tasks) affect performance in new situations (transfer tasks).

Transfer of learning depends on how similar the tasks (learning vs. transfer) are:

Near transfer: Transfer of knowledge between similar contexts.
Far transfer: Transfer of knowledge between dissimilar contexts.

For example, when I took my first MATLAB course in college, I relied on my previous high school experience programming in Pascal: near transfer.

When I learned to do clean and jerks in my weightlifting class, I relied on my knowledge of vectors from geometry to visualize where the barbell should go: far transfer.

Now, let’s think about how we teach Data Carpentry lessons. Ultimately, all learning is transfer–when learning new things one builds upon what was previously learnt. Instructors act as facilitators by encouraging learners to recall what they’ve already seen (déjà vu). By encouraging learners to transfer knowledge, whether from near or far, we are giving them one more tool to help them learn the skills we teach such that they are able to master working with data easily and efficiently. When we ask learners to recall concepts from their previous studies and connect those concepts to what they’re learning in our lessons, we are encouraging them to store the information in their long-term memory.

Recognizing how transfer of learning can be used to teach our lessons can also improve our ability to assess learners’ skills and confidence. As learning is an active and dynamic process, learners have the ability to improve their learning by participating in dynamic assessment (i.e. the challenges throughout our lessons). These challenges promote metacognition, or awareness and understanding of one’s thought process.

Think about your own learning. What are some examples of past experiences that have affected your learning or performing in a new situation? In particular, can you think of a time when you were able to transfer knowledge from either near or far? Did that help you learn programming? Teach programming? Share your experience below–you may just help someone.

[1] Cleary; Brown, AS; Sawyer, BD; Nomi, JS; Ajoku, AC; Ryals, AJ; et al. (2012). “Familiarity from the configuration of objects in 3-dimensional space and its relation to déjà vu: A virtual reality investigation”. Consciousness and Cognition. 21 (2): 969–975. DOI: 10.1016/j.concog.2011.12.010. PMID: 22322010.

How we’re getting things done


Adopting work cycles

The Data and Software Carpentry staff have been working together to make progress on projects that are important for our community. To help us do this, we’re trying out a new work process based on Basecamp’s six-week work cycle. You can read their blog post if you’re interested in the details of how they structure a work cycle. We’re picking a small handful of projects to focus on for each six-week cycle, with each staff member working on one or two projects. For each project, we’re setting realistic goals we know we can accomplish before the end of the cycle and holding ourselves accountable to meeting those goals. We’re spending the first two weeks of the cycle planning those goals, dividing up the work into teams, and setting timelines to make sure we stay on track. We envision this workflow having some specific advantages, including:

  • Reducing clutter and letting us focus on making progress.
  • Making it ok to say “we can’t tackle this right now, but that’s an important project; can we do it next cycle?”
  • Making it possible for busy community members to be involved without having to commit time indefinitely. (No commitments after the cycle ends!)
  • Bringing staff time and resources together with community enthusiasm.
  • Giving us a structure for regularly communicating what we’re working on with the community at large.
  • Providing passionate community members more opportunities to get involved.

We’re still working out some of the details of how working in cycles will work for us, but we’re excited to share our plan for the first round. If there’s something you’re excited about for the next round, let us know! If you’d like to join (or organize) a team for one of the next few cycles, let us know! Please post an issue on our conversations repo or email ebecker@datacarpentry.org.

Our first cycle - Cycle Prometheus (January 23rd - March 17th)

Our first cycle started at the end of January and goes through the middle of March. Here’s what we’re hoping to accomplish in our first cycle.

Planning for Data Carpentry Ecology Lessons Release

Tracy, François Michonneau, and Erin are working on Data Carpentry’s first lesson release! In addition to starting the process for releasing our Ecology lessons, we’re also working on setting up a process for future lesson releases. Based on Software Carpentry’s success with the Bug BBQ last year, we’re planning an Issue Bonanza to coordinate community effort on preparing the lessons for release. Keep your eyes peeled for announcements and ways you can contribute!

Streamlining Process for Instructor Training

Erin and Maneesha are continuing Greg’s instructor training work and are updating the instructor training program process for organizing training events and tracking trainee progress from training through checkout. We’re simplifying how we schedule instructor training events and putting together resources for instructor trainers. We’re also streamlining the process of tracking instructor trainees to make more efficient use of our staff and volunteer time. Lastly, we’re exploring our needs for new instructor trainers and planning the recruitment and training process. If you’re interested in becoming an instructor trainer, please email Erin so we can keep you in the loop about future plans.

New hire

Tracy, Jonah and Kari are working on a new hire for Software and Data Carpentry. The posting is coming Monday, February 20th, so keep your eye out for more information!

Setting an Assessment Strategy

Kari is developing a strategy for both near-term and long-term assessment of Data Carpentry workshops. She’s putting together new pre- and post-workshop surveys for learners at Data Carpentry workshops that will be piloted starting in April, as well as a long-term assessment for learners from previous workshops to be piloted by mid-March. She’s also cleaning up code and formalizing a template for regular quarterly data releases on assessment efforts. We need more Data Carpentry workshops to pilot our new surveys! Please consider organizing a workshop at your institution in April. Let us know what we can do to support you in getting a workshop set-up. Please email Maneesha.

Lesson Contribution Guidelines

Erin, Mateusz Kuzak, Aleksandra Nenadic, Raniere Silva and Kate Hertweck are working on making it easier for new instructors and other community members to contribute to lesson development. We’re reaching out to the community to understand roadblocks people may have with the development process, and then developing new documentation and resources to help reduce these barriers. We’re collecting feedback from all of the various discussion threads and GitHub issues. Please keep commenting there, and stay tuned for more opportunities to give us feedback!

Continuing Work

We’re also continuing to work on our many ongoing projects, including (but not limited to):

  • Publishing our monthly newsletter
  • Running our blogs
  • Maintaining our websites and lessons
  • Coordinating workshops and instructor training events
  • Teaching at workshops and instructor training events
  • Hosting discussion sessions and instructor teaching demos
  • Speaking publicly about Data and Software Carpentry
  • Running our Virtual Assessment Network
  • Organizing our Mentorship Program
  • Serving on the mentoring subcommittee, trainers group and bridge subcommittees

If you’re interested in helping with any of this ongoing work, or would like to make suggestions about what to tackle in our next cycle, let us know! Please post an issue on our conversations repo or email ebecker@datacarpentry.org.

Our next two cycles will be:
Cycle Deimos - March 20th through May 12th
Cycle Phobos - May 15th through June 23rd

Job Opportunity: Community Development Lead


Software Carpentry and Data Carpentry are hiring a Community Development Lead!

We are excited to announce a position as a full-time staff member to lead community development activities! Software and Data Carpentry have an active global community of researchers, volunteers, learners and member organizations. This person will cultivate and grow this community, developing communication strategies and opportunities for the community to connect with and support each other. You will become an active member of our team of staff and will work with people around the world to advance our mission of building a community teaching digital research skills to make research more productive and reproducible.

As the Community Development Lead, you will oversee Software and Data Carpentry’s community engagement efforts to develop and support the community, creating pathways for participation and increased communication. You will lead blog, newsletter and social media efforts, help develop online resources, participate in the mentorship subcommittee and help facilitate the development of regional groups. You will also have the opportunity to guide efforts to reach underserved communities and to be involved in instructor training.

For details, including a full job description and the application procedure, please see the Jobs page. This is a joint Software and Data Carpentry position and is cross-listed on both websites.


Carpentries Career Pathways Panel - Marianne Corvellec, Bernhard Konrad, Aleksandra Pawlik


Wednesday, Mar 1, 3pm PST / 6pm EST / 11pm UTC / 9am AEST (next day)

On Wednesday, March 1, the Carpentries will host the second of three Career Pathway Panels, where members of the Carpentry communities can hear from three individuals in careers that leverage teaching experience and Carpentry skills. (Note: The date of this second panel was shifted from the originally-proposed date of Feb 22 due to scheduling considerations.)

Anyone who has taught at a Carpentry workshop in the last three months is invited to join, and should register by Monday, February 27 in order to be invited to the call. Registration is limited to 20 people per session, so please only commit if you are sure you will attend. Attendees can register for any number of these sessions. Each session will last one hour and will feature a different set of panelists. The final session will occur on Tuesday, March 21 at 3pm PST (panelists TBA).

For the March 1 session, we are excited to be joined by the below panelists!

Marianne Corvellec

Marianne earned a PhD in statistical physics in 2012. She now works as a data scientist at CRIM, a semi-public research centre in Montréal, Canada. She specializes in data analysis and software development. Before joining CRIM, she worked at three different web startups. She speaks or teaches at local and international tech events on a regular basis. Her interests include data visualization, signal processing, inverse-problem approach, assessment, free software, open standards, best practices, and community management.

Bernhard Konrad

Bernhard attended a SWC workshop in 2012 during his graduate studies, and was immediately fascinated by the world of opportunities and productivity that these software tools opened up. He has since taught a dozen workshops and started working on software-related personal side projects. Bernhard then went to Insight Data Science, a data science fellowship in Silicon Valley. After interviewing with a few companies and navigating a complicated work permit process, he started his job as a Software Engineer at Google in early 2016. There, he develops internal tools for engineering productivity.

Aleksandra Pawlik

Aleksandra Pawlik is the Research Communities Manager at the New Zealand eScience Infrastructure (NeSI). Before joining NeSI in 2016, she worked for three years at the University of Manchester, UK, for the Software Sustainability Institute, where she led the Institute’s training activities. Software and Data Carpentry have always been a big part of her professional activities and have allowed Aleksandra to develop a range of skills, understand the research ecosystem, and meet a number of amazing and inspirational people.

Reproducible Data-Driven Discovery

I spent two weeks in January hanging out with some awesome scientists who are all passionate about the future of science. I was participating in two professional development events with support from Data Carpentry, and I’d like to share some of the highlights.

A Curriculum Development Hackathon for Reproducible Research using Jupyter Notebooks

On January 9–11, 2017, I attended my first hackathon at the Berkeley Institute for Data Science! The event was organized jointly by Data Carpentry and the Jupyter Notebook project. The goal of the hackathon was to develop a two-day workshop curriculum to teach reproducible research using the Jupyter Notebook. The attendees were a group of 25 scientists from the US, Canada, and the UK, with diverse backgrounds and a unique set of skills and expertise. I was one of a handful of attendees who use R Markdown more than IPython or Jupyter Notebooks; however, after seeing the notebook’s power and utility, I’m really excited about adding this to my reproducible workflow.

On the first day of the hackathon, we all sketched out the general workshop overview and learning objectives. Then, we broke out into small groups to design the specific lessons. I worked closely with Erin Becker, Elizabeth Wickes, Daniel Soto, and Mike Pacer to develop the lesson on publication and sharing. This particular lesson focuses on exporting reports for sharing, best practices for documenting your workflow, best practices for using metadata, and using DOIs and ORCiD to get credit for your scholarly work. Even though the workshop curriculum is not completely polished and ready to teach, we are all very proud of the significant progress we collectively made. You can view the workshop website here.
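
To give a flavor of the kind of exporting the publication and sharing lesson covers, here is a minimal sketch of converting a notebook to a standalone HTML report with nbconvert's Python API; the notebook name "analysis.ipynb" is only a placeholder, not part of the actual lesson materials.

    import nbformat
    from nbconvert import HTMLExporter

    # Read the notebook (the filename here is just an illustrative placeholder).
    nb = nbformat.read("analysis.ipynb", as_version=4)

    # Convert it to a self-contained HTML report that can be shared or archived.
    body, _resources = HTMLExporter().from_notebook_node(nb)

    with open("analysis.html", "w", encoding="utf-8") as out:
        out.write(body)

The same conversion can also be run from the command line with jupyter nbconvert, which is how many people fold it into an automated, reproducible workflow.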

This curriculum is still being developed and revised on an ongoing basis. Want to contribute? If you are interested in helping with the development, have a look at this list of GitHub issues to see what is happening and what needs to be done. We’d appreciate your contributions.

Data-Driven Discovery Postdoc and Early Career Researcher Symposium

On January 17-21, 2017, the Gordon and Betty Moore Foundation hosted the Data-Driven Discovery Postdoc and Early Career Researcher Symposium. Over 50 young investigators from 14 different time zones gathered at Waikoloa Beach, Hawaii, to network and discuss challenges and opportunities for research and careers in data science. The symposium followed an “un-conference” style that promoted group discussions among like-minded attendees and deemphasized traditional panels and speakers.

Each day the participants engaged in ice-breaker activities that gave us a chance to meet and get to know nearly every one of the attendees. You might think that it’s a little cheesy to introduce yourself and also name your favorite comfort food or which famous person you share a birthday with, but I was pleasantly surprised at how often those bits of information helped the participants get to know each other better. Another favorite icebreaker was the living poster session, where we spent about an hour illustrating our research or teaching and then another two hours learning more about everyone’s interests.

All participants played a major role in crafting the agenda by pitching and then attending “birds of a feather” breakout sessions. You can see the diversity of suggested topics by viewing the open and closed GitHub issues or the session notes. One day I participated in a breakout session about science communication. It was awesome to hear how everyone struggled with and/or managed the tricky balance of doing science and communicating science. To report back to the group, we listed some challenges and resources for science communication on big pieces of whiteboard paper, which you can view here. The next breakout session I attended was about science activism. It was a little unfortunate that the symposium conflicted with the presidential inauguration and women’s marches, but some of us stayed very engaged in what was happening five time zones away. The 15 or so of us in the activism group (for lack of a better word) are committed to staying in touch to share news and opportunities for promoting science awareness and literacy in our local and global communities.

Overall, the symposium was #MooreUseful and #MooreInspiring than I anticipated. One of the more useful things (in my opinion) was an around-the-room discussion of each person’s favorite new tool; take a look at this list to see the kinds of tools and methods we shared. It was so inspiring to learn what the other grad students, postdocs, and research scientists were working on and to hear about their career struggles and successes. I was able to synthesize tons of ideas for my future research and career, and my eyes have been opened to more of the challenges and opportunities that data-driven researchers are facing.

Reproducible Data-Driven Discovery

I’m not sure if anyone has already coined the phrase “Reproducible Data-Driven Discovery”, but I think it’s an awesome way to summarize these two events and the communities that made them happen. The Moore Foundation funds researchers who do science with lots and lots of data, and Data Carpentry and Project Jupyter are two of the Moore-funded organizations that are helping make sure data-driven research is freely available, open access, and reproducible. I can’t wait to see all the new awesome things that these communities create and build!

Thanks!

I especially want to thank Tracy for the opportunity to attend both of these events. I thank Hilmar Lapp, François Michonneau, Jasmine Nirody, Kellie Ottoboni, Tracy Teal, and Jamie Whitacre for organizing the Hackathon and Chris Mentzel, Carly Strasser, and Natalie Caulk for organizing the symposium. I thank everyone who participated and helped make these events awesome! I thank Laura Noren for feedback on an earlier version of this post.

Run a workshop this Spring!

Did you know that 95% of Data Carpentry learners are first-timers? As a community we’ve impacted hundreds of learners around the globe, and 93% of our learners told us that their level of data management and analysis skills was higher because of our workshops!

We’re doing very well as a community, so let’s keep up the good work. We want to encourage you to either request a workshop, or run a self-organized workshop this Spring (March-May).

If you’d like to run a self-organized Data Carpentry workshop, but haven’t taught one yet, here are a few tips to get you started:

  1. Look at the lessons and decide which content you’d like to teach.
  2. Recruit co-instructors and helpers.
  3. Find and book a venue.
  4. Set up your workshop registration page and website.

Here’s the complete checklist.

If you’re interested in running a workshop this Spring, but need more information, attend one of our new virtual “Run a Workshop!” office hours. During the “Run a Workshop!” office hours we will give a short overview of Data Carpentry workshops and you can ask any questions you have about logistics, content or anything else. Come by any of our office hours with questions for Tracy or Maneesha, and meet other instructors on the call. Dates and times are listed in this etherpad. Sign up if you plan to attend.

If you’re not a certified instructor yet, now is a great time to finish your checkout process. Just follow the steps in our checklist!

You, our instructors, are the backbone of our community. We are here to encourage and support you however we can. Let us know how we can help you get started.

A Year to Build a Software and Data Carpentry Community at the University of Florida - The Impact of a Local Instructor Training Workshop on Building Computing Capacity

This January marked the one-year anniversary of our effort to bring regular Software Carpentry and Data Carpentry workshops to the University of Florida. These workshops are aimed at helping students, staff, and faculty gain the computing skills they need to be successful in our data-driven world. The Carpentries are international organizations that provide materials, instructor certification, and organization of multi-day workshops on basic software development and data analysis tools. In January 2016, a Software Carpentry instructor training workshop held at the University of Florida Informatics Institute provided the start of our efforts. Since then, instructors trained here, as well as experienced instructors already in the UF community, have held four workshops, reaching 98 participants, including 70 students, 14 staff, and 11 faculty. The participants received training in programming languages like R and Python, version control with Git and GitHub, SQL database queries, OpenRefine, and Excel spreadsheets.


[Figure: graph of participants’ status at UF and a word cloud of the departments our participants hail from (word cloud created with https://www.jasondavies.com/wordcloud/)]

Such a robust and recurring workshop pattern is uncommon in the Carpentries community (but not unprecedented) and it is a result of the generosity and volunteerism of a combination of staff, faculty, students, and organizations at UF. Together we recognized that members of the UF community did not have enough opportunities to get hands-on experience with the software development and data analysis tools they need to be effective researchers, employees, and future job-seekers. In response, we have established a highly collaborative process for giving our fellow UF community members, whether they are students, staff, or faculty, this opportunity.

Our Year of Workshops

Though UF has a longer history with the Data and Software Carpentry communities, the start of this current program was an instructor training workshop held in January 2016 at the UF Informatics Institute (UFII). Dr. Ethan White provided funds (through a grant from the Gordon and Betty Moore Foundation) for UF to become a Software Carpentry Foundation affiliate member and to run an on-site training for instructors. Fourteen people from UF attended the 2016 workshop, 5 came from other Florida institutions, and 4 from elsewhere in the US and Canada. As a result of this workshop, 8 participants from UF became newly certified instructors for Software or Data Carpentry. Today there are a total of 10 active instructors at UF.

Several existing instructors, including Matthew Collins from the Advanced Computing and Information Systems Lab and Dr. François Michonneau from the Whitney Laboratory for Marine Bioscience, with the help of the newly trained instructors, then approached the director of the UF Informatics Institute, Dr. George Michailidis, for logistical support to run a Software Carpentry workshop in March 2016. While it was very successful, only 16 of the 31 participants who signed up attended. We did not charge a registration fee, so we believe that many people simply did not show up when another commitment arose.

For our second workshop, held in August 2016 just before the start of the semester, Alethea Geiger from the UFII worked with the UF Conference Department to set up an account and a registration page that accepted credit card payments. We were able to charge a $30 registration fee, which allowed us to pay for lunch during the workshop. This amount appears to strike a good balance: the fee encourages attendance and covers catering costs without imposing serious financial hardship on participants with limited funding. However, the Conference Department website did not let us smoothly deal with waitlists and capacity caps, and over the first weekend more than 35 people signed up for the workshop. In order to accommodate everyone, the Marston Science Library generously offered a larger room for the workshop. Everyone who registered attended this workshop.

In October 2016, we held our third workshop, using the Data Carpentry curriculum. At this workshop we had the honor of having Dr. Kari L. Jordan as a participant. Dr. Jordan was recently hired as the Data Carpentry organization’s director of assessment, and this was her first experience at a workshop. The registration process worked smoothly this time, and we were able to use the UFII conference room for the workshop and catering. Our most recent event was another Software Carpentry workshop held at the UFII in February 2017.

What it Takes

The volunteered time of this group, the coordination and support of three existing instructors, and the logistics supplied by the Informatics Institute have made it possible to host Carpentry workshops reliably. It currently takes about 8 hours for the lead instructor to arrange instructors, helpers, and announcements and to respond to attendee questions. The staff at the UFII spend another 8 hours managing registration and preparing the catering. Instructors spend between 4 and 12 hours preparing to teach, depending on whether they have taught the lesson before. Helpers who are already familiar with the content of the lessons usually don’t need further preparation, but new helpers spend 4 to 8 hours reviewing lessons and software installation instructions. Combined, each workshop takes about 40 person-hours of preparation and over 80 person-hours to host. With the exception of the UFII staff, this time is all volunteered.

How do we keep people volunteering? There are a number of factors that go into maintaining volunteers’ motivation and momentum. We didn’t plan these in advance but now that we have them in place, we recognize them as the reasons we can continue to keep our community engaged and excited about putting on workshops.

  1. Instructor density - have enough instructors to get 3-6 people at each workshop without burdening anyone’s schedule
  2. Instructor cohesion - just like we suggest learners attend workshops with a buddy, instructors who come to the instructor training from the same department or discipline immediately make their own community of practice
  3. Instructor mentorship - a core group of senior instructors to guide initial workshops (note the plural) so new instructors can focus on the teaching experience without the logistical burdens
  4. Professional staff - find staff who organize workshops as part of their job to share the overhead of coordinating logistics
  5. Institution-level support - a single research lab or department doesn’t have enough people to do this on its own, doing it for the whole institution fits the needs of everyone and shares the work
  6. Follow-through - have supporting events and communities available for people to keep learning and keep their experience with the Carpentries fresh in their minds when it comes time to look for more instructors and helpers

Community Building After the Workshops

Some of the instructors have also been involved in creating and growing communities of learners on campus outside of workshops. Dr. Michonneau started a Meetup.com group for the Gainesville community focused on R. M. Collins is an advisor to the UF Data Science and Informatics student organization, which holds about 12 evening workshops each semester focused on building data science skills for UF students. In spring 2017, Dr. Daniel Maxwell, Informatics Librarian for the Marston Science Library, re-invigorated the UF R Users mailing list and is holding weekly in-person drop-in sessions. These venues allow former workshop participants to continue learning the skills taught in the Carpentry workshops. They provide a space where participants can ask questions of and interact with their peers when they start using the tools taught in the workshops for their own research. This ongoing communal engagement is proving to be a key factor in making sure workshop participants continue to develop their abilities.

UF’s Impact on the Carpentry Community

UF has a long history and deep connections to the Carpentries. Data Carpentry was originally imagined during the 2013 COLLAB-IT meeting between the IT members of iDigBio (a large NSF-sponsored project centered at UF) and the other NSF biocenters. The attendees of this two-day workshop found that one important need shared by the biocenters was a training program for researchers, focused on the novice, to develop software skills and data literacy for analyzing their data. Some attendees were involved with Software Carpentry and decided to develop a curriculum based on Software Carpentry’s teaching principles. Dr. White, as well as iDigBio staff including Deborah Paul, Dr. Michonneau, and M. Collins, were instructors, helpers, and attendees at the prototype Data Carpentry workshop held in May 2014 at the NESCent facility at Duke University. The second official Data Carpentry workshop was put on by the iDigBio project right here at UF.

Since this first engagement with the Carpentries, many other members of the UF community have participated in Software and Data Carpentry workshops across the country. Not all have participated in this most recent effort to run workshops here on campus, and some have moved on to other institutions, but they have all contributed to UF being a valued organization in the Carpentries community.

In addition to building its own workshop infrastructure, UF is helping to advance the Carpentry programs in the US and globally. Dr. White is a founding Data Carpentry steering committee member, a member of the Software Carpentry Advisory Council, and has developed a semester-long course based on Data Carpentry materials that he has taught twice as WIS6934 through the Department of Wildlife Ecology. Through the iDigBio project and support from Dr. White, M. Collins and D. Paul have taught workshops in Nairobi, Kenya and Santa Clara, Costa Rica before the Biodiversity Information Standards conferences in 2015 and 2016. M. Collins has also served as a mentor to instructors trained during the South African instructor training and along with D. Paul has more recently become a member of the formal Carpentry mentorship program providing on-going support to new instructors across the country.

Going Forward

The success of our group has been the result of the serendipitous meeting of interested UF community members, an existing international teaching community, and informal funding and infrastructure support. We are now looking for a way to formalize UF’s commitment to building capacity in informatics skills for its staff, students, and faculty through an on-going structure.

To start this process, a consortium of labs and institutes at the University of Florida has combined resources to sponsor a joint Gold Partnership with Software and Data Carpentry going forward. The UF partners are Dr. White’s lab, the UF Biodiversity Institute (via Dr. Pamela Soltis), iDigBio (via Dr. Soltis), and the UF Informatics Institute (via Dr. Michailidis). This partnership will provide annual instructor training opportunities to grow the instructor community.

To continue the rest of the key parts of our success, we still need:

  1. A UF department or institute to adopt the goal of informatics capacity building for the UF community.
  2. An individual to be given the task of coordinating this goal across UF.
  3. Continuous funding and resources to provide for a pipeline of people capable of meeting this goal.

We believe UF has a unique opportunity to create a sustainable effort that cuts across individual departments and research labs. While existing on-the-books courses and department-specific programs are available, we have shown that there is a need for hands-on, community-led informatics skill development for everyone on campus, regardless of affiliation or discipline. By approaching this need at the university level, we can maintain the critical mass of expertise and motivation to make our staff more productive, our students more employable, and our faculty’s research more innovative.

Acknowledgements

The following people have been active members of the UF instructor community and have volunteered their time in the past year by participating as instructors or helpers during the recent workshops:

Erica Christensen (*) - Ernest Lab, WEC
Matthew Collins - Advanced Computing and Information Systems Lab, ECE
Dave Harris (*) - White Lab, WEC
Allison Jai O’Dell (*) - George A Smathers Libraries
Sergio Marconi (*) - White Lab, WEC
François Michonneau - Martindale Lab, Whitney Laboratory for Marine Bioscience
Elise Morrison (*) - Soil and Water Sciences, IFAS
Deborah Paul (*) - Institute for Digital Information, Florida State University
Kristina Riemer (*) - White Lab, WEC
Henry Senyondo (*) - White Lab, WEC
Miao Sun - Soltis Lab, FLMNH
Brian Stucky (*) - Guralnick Lab, FLMNH
Shawn Taylor (*) - White Lab, WEC

(*) Trained at the January 2016 UF instructor training workshop

The following entities have contributed material support to our workshops or the Carpentries communities:

Advanced Computing and Information Systems Lab, Electrical and Computer Engineering
Ernest Lab, Wildlife Ecology and Conservation
Soltis Lab, Florida Museum of Natural History
University of Florida Biodiversity Institute
University of Florida Informatics Institute
White Lab, Wildlife Ecology and Conservation

We would also like to acknowledge the incredible support provided by Alethea Geiger, Flora Marynak, and Deb Campbell at the UF Informatics Institute. They have managed the space, catering, registration, and financial aspects of our workshops, and their services are the main reason we can provide so many workshops.

Ecology Issue Bonanza!!!

We’re excited to announce our first lesson release! Data Carpentry is preparing to publish our Ecology workshop materials on Zenodo this Spring. Software Carpentry regularly publishes its lessons to provide stable identifiers for polished versions of the lessons. This enables referenced discussions of the lesson materials and gives contributors a verifiable product to cite on their CVs or resumes.

This lesson release will include the following repos:

Get involved!

If you’ve made a contribution to one of the Data Carpentry Ecology lessons, you’re already an author. Help make sure the final product is polished and complete by getting involved in the lesson release events.

How does the lesson release process work?

Here’s a run-down of the lesson release process and our timetable for this release.

  • Resolve existing PRs. 2/27-3/14
  • Freeze lessons to new significant changes. 3/15
  • Issue Bonanza to identify issues that need to be fixed before publication. 3/16 - 3/17
  • Staff and maintainers organize issues (e.g. add tags and remove duplicates). 3/18-4/5
  • Bug BBQ to fix issues identified during Issue Bonanza. 4/6 - 4/7
  • Publish! 4/21

Issue Bonanza

Thursday, March 16th 22:00 UTC - Friday, March 17th 22:00 UTC

Click this link to see the event in your local time.

Join the community in a hacky-day dedicated to creating issues and simple PRs for cleaning up the Ecology lessons. Issues to focus on are in the lesson release checklist. You don’t need to be an expert in the materials - we need people to help search for broken links and typos too! We strongly encourage folks to form local or distributed working groups to catalyze activity. We’ll provide Etherpads and BlueJeans rooms for you to join in the effort along with a checklist of the types of issues to look for. We’ll also send a packet of Data Carpentry swag to any group of five or more who collectively create at least twenty meaningful issues.
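
If you’d like a head start on hunting for broken links, a small script along these lines can flag URLs that no longer respond. This is only a rough sketch, and it assumes the lesson’s Markdown files live in an _episodes/ directory (adjust the glob pattern to match the repo you’re checking).

    import glob
    import re
    import requests

    # Collect every http(s) URL found in the lesson's Markdown files.
    URL_PATTERN = re.compile(r"https?://[^\s\)\"'>]+")
    urls = set()
    for path in glob.glob("_episodes/*.md"):  # assumed lesson layout
        with open(path, encoding="utf-8") as f:
            urls.update(URL_PATTERN.findall(f.read()))

    # Flag any link that errors out or answers with a 4xx/5xx status.
    # (Some servers reject HEAD requests; a GET fallback would reduce false alarms.)
    for url in sorted(urls):
        try:
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
        except requests.RequestException:
            status = None
        if status is None or status >= 400:
            print(f"Possibly broken: {url} (status: {status})")

Anything the script flags is worth a quick manual check before opening an issue, since temporary outages and HEAD-unfriendly servers can produce false positives.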

If you’re planning on joining the Issue Bonanza - add your name to the event Etherpad. Stay tuned for updates about Bonanza events and how to organize your working group!

Bug BBQ

Thursday, April 6th, 22:00 UTC - Friday, April 7th, 22:00 UTC

Click this link to see the event in your local time.

Our Issue Bonanza will help us identify issues that need fixing before publication. Once we know what those issues are, the next step is to fix them! Join with the community to submit PRs to fix existing issues and get us ready to publish. We’ll provide Etherpads and BlueJeans rooms for you to work with your community members and will be sending out swag to the groups or individuals who submit the most merged PRs. Keep an eye open for more information about the Bug BBQ!

We’re excited to work with the community to push out our first lesson release. Put these dates on your calendar, and we’ll send out reminders and updates too. These lessons belong to the community - help us keep them great!
