
Responding to your Learners


Picture for a moment one of your undergraduate university courses, or a recent meeting of your department’s research seminar. If your experience is typical, more likely than not the instructor or presenter came in with a prepared set of materials on a slide deck (or for the old-school, on projector overheads), talked at you for an hour, maybe asked if there were any questions, and then packed up and left. If this sounds familiar, you’ve probably had lots of experience with the standard lecture.

Lecturing has been the default mode of instruction for centuries, as evidenced by its presence in medieval illuminations. As a mechanism for large-scale verbal dissemination of information, lecturing has served a valuable purpose. It’s relatively easy to get adult learners to sit still for an hour and hear what you have to say. But of course, in most cases, the end goal is not to have people listen to you speak, but to have them understand and be able to use new information in their thought process. In other words, you want them to learn something.

Laurentius de Voltolina lecture, mid-1300s

We know that in many cases, simple exposure to information isn’t enough to learn it in the sense of integrating the information into a mental model and being able to use it to understand the world. The human brain is expert at tricking us into believing we understand something, for example by forcing new information into existing incorrect mental models of how the world works. This is one reason why asking learners if they have questions, or if “this all makes sense”, is ineffective. Even with the best of intentions, learners often don’t know when they’re confused.

The only way to accurately gauge learner understanding is to ask them to do something using their new information. Traditionally, this “doing something” has been restricted to high-stakes assignments like exams and was meant to serve as an official judgment of the end-result of a learner’s progress. These sorts of tasks are known as summative assessments because they serve as a final measurement of the learner’s ability, without providing an opportunity to learn from mistakes.

In contrast, formative assessments provide opportunities for learners to practice using their new knowledge in a low-stakes atmosphere, with the primary goal of providing feedback to themselves and to the instructor about their level of understanding. Formative assessment is most useful for learners when the results of that assessment are used by the instructor to address misunderstandings and adjust the pace and focus of instruction.

In university classrooms, formative assessments are often carried out using electronic audience response systems. By enabling instructors to rapidly and accurately collect information about their learners’ current knowledge state, these systems can be used to make real-time decisions about the best next step for instruction.

Such electronic systems work well for quarter or semester-long courses, but not so well for workshops for two reasons. First, we don’t want to ask our learners to spend money on a tool they’re only going to use for a couple of days. And second, learning how to use any new technology takes time, a precious commodity in our short workshops.

Instead, we carry out formative assessments using a very low-tech piece of technology - the humble sticky note. Learners are given two different colored sticky notes at the beginning of the workshop. These are used as progress flags during hands-on exercises to signal to the instructor and workshop helpers when learners are stuck (red sticky note) or have successfully completed the exercise (green sticky note). Learners can also use these at any time during the workshop to signal that they want help (red sticky note).

Sticky notes

This system lets instructors quickly gauge the room’s progress and adjust their teaching pace, while simultaneously allowing resources (helpers) to be efficiently targeted towards those learners who need assistance. By making it unnecessary for learners to raise their hands, sticky notes also provide two other benefits - they make it more likely that learners will ask for help (putting up a sticky note is less emotionally threatening than sitting there with a hand raised), and they let learners who are struggling continue to work (it’s hard to type one-handed).

Using just two little pieces of paper and a bit of adhesive, we’re able to harness the amazing power of formative assessment and ensure that we’re teaching our learners rather than just talking at them. So next time you’re in a Carpentry workshop, look around at those little colored flags on everyone’s laptop and remember that sometimes simple tools can do great things.


September Carpentries Community Call


Our next Carpentries Community Call (formerly called Lab Meeting or Town Hall meeting) will be Thursday, September 15 (September 16 Aus/NZ). These meetings will now be monthly on the third Thursday of every month. It would be great to see instructors there! These calls are a great chance to connect with other Carpentry instructors and get updates and information on important and interesting topics for the community.

Times:

  • 7am Pacific / 10am Eastern / 2pm UTC / 12am (Sept 16th) Sydney
  • 4pm Pacific / 7pm Eastern / 11pm UTC / 9am (Sept 16th) Sydney

Topics this month will include:

  • New lesson template
  • Policy committee and update on CoC
  • IRB approval and updates on assessment
  • Highlighting manuscripts from our community
  • Election on rules for Software Carpentry Steering Committee

Head over to the etherpad to let us know you’ll be attending one of the two sessions.

Self-Efficacy and the Carpentry Learner


The Little Engine that Could had it absolutely right. Thinking you can is one of the best ways to succeed at a host of activities ranging from public speaking and bungee jumping to cooking a souffle and learning R. What the Little Engine didn’t know, however, is that this concept of “thinking I can” is actually a theory known as self-efficacy, and plays a major role in how we approach our goals.

Self-efficacy is simply our belief in our ability to accomplish a task. That task can be anything you can imagine:

  • Riding a bike
  • Doing visualizations in R
  • Singing karaoke
  • Starting and populating a repository

No matter the task, awareness of your self-efficacy relates to your achievement.

I’ll be transparent. In my first Optimization class session in graduate school I looked around the room and thought to myself, “I don’t belong here.” I had no friends or colleagues in the course, and none of the other students looked like me. I didn’t even know anyone at the university who had taken (and passed) this course. My anxiety about the course content was stronger than my will to see the course through to finish. Lastly, I had never seen the material before, or anything like it for that matter. Needless to say I dropped the course immediately.

What would have happened had I been aware of how my self-efficacy for completing the Optimization course linked to my success? Let’s examine what happened in the classroom that day.

First:
I looked around the room and thought to myself, “I don’t belong here.” I had no friends or colleagues in the course, and none of the other students looked like me.

In self-efficacy research this refers to social persuasions, or simply, encouragement from another person. Having that friend or colleague next to you encouraging you may seem insignificant, but it could be a major factor in whether you achieve a task.

Second:
I didn’t even know anyone at the university who had taken (and passed) this course.

This is referred to as vicarious experiences, or “if she can do it, so can I.” How many times have you pushed yourself to complete a task solely because you saw someone else do it?


Third:
My anxiety about the course content was stronger than my will to see the course through to finish.

Physiological factors like sweating, butterflies in your stomach, and fatigue certainly affect whether or not you complete a task. Recognizing these factors and being able to overcome them gets you that much closer to completing the task.

Finally:
I had never seen the material before, or anything like it for that matter.

Mastery, or enactive attainment, is probably the most important factor that determines your self-efficacy. Completing a task time and time again (i.e., becoming a “master”) raises your self-efficacy; failure lowers it.

So what does this mean for the Carpentry learner and how we (should) structure our workshops? Here are a few thoughts and recommendations, and my hope is that you’ll comment below and give us more.

  1. Encourage learners to invite a friend to join them in the workshop. We all could use a little social persuasion every now and then.
  2. Encourage learners to network with community members, instructors, or other individuals who have already completed a Carpentry workshop. “If they did it, I think I can!”
  3. Encourage learners to use “break time” to get up, stretch, gripe about deleting their repo, whatever! Ridding themselves of anxiety will get them closer to completing the workshop headache free.
  4. Encourage learners to attend follow-up sessions and practice, practice, PRACTICE! Who doesn’t want to be a master?

The moral of the story is this: If you think you can, you should, and you will. Choo choo!

[I dare you to share this post.]

Analysis of Data Carpentry Workshop Impact


It’s funny. When I first started working for Data Carpentry, I had never heard the phrase “reproducible research”. I can tell you now that having attended a Software Carpentry workshop, a Data Carpentry workshop, and a Software/Data Carpentry instructor training, I wish I had learned the skills we teach when I first began my PhD program.

I even confessed to my colleagues that the data I left behind for up-and-coming grad students is so disorganized that I sent them an e-mail to apologize! This community has made a believer out of me, and for good reason: Our workshops work.

See for yourself. Read the report of the recent analysis of Data Carpentry’s post-workshop surveys.

You can run your own analysis, too! The data is available in the assessment repo on GitHub.

Data Carpentry workshops have made a meaningful impact on the way learners view their ability to complete computational tasks. Learners have expressed satisfaction with workshop content and appreciation for the caliber of their instructors. Learners self-reported improved levels of data management and analysis skills following Data Carpentry workshops.

As Data Carpentry continues to offer more workshops, we hope to see a continued shift in the perspective of how researchers view and use computational skills. Data Carpentry will continue to develop and teach fundamental data skills to expand the community of data literate researchers.

If you missed our assessment community call, check out the slide deck.

Comment below, and tweet us your thoughts @datacarpentry and @drkariljordan.

Belonging: Developing a Community of Practice among Data Carpentry Learners


Two schools of thought regarding learning and how it is achieved are the acquisition metaphor and the participation metaphor.

The acquisition metaphor asserts knowledge is acquired individually, applied, and transferred.

The participation metaphor asserts knowledge is formed by becoming a member of a community of practice.

The computational scientist of old personifies the acquisition metaphor. She is someone who learned and worked in isolation, spending hours upon hours coding and debugging until her problem was solved.

In the Carpentry community, we develop our workshops around the participation metaphor by incorporating hands-on experiences in our in-person lessons, and promoting community engagement post-workshop. Does this imply we are leaving those who prefer to learn on their own behind?

In the late 1990s, Anna Sfard published an article discussing the implications of these two metaphors (acquisition vs. participation). Here is a succinct mapping adapted from Sfard’s article.

Learning       Acquisition metaphor        Participation metaphor
Definition     Acquisition of something    Becoming a participant
Goal           Individual enrichment       Community building
Student        Recipient                   Apprentice
Teacher        Facilitator                 Expert participant
Knowledge      Property                    Activity
Knowing is     Having                      Belonging

In short, the acquisition metaphor views knowledge as property that one acquires, whereas the participation metaphor views knowledge as being active in a community.

I, like many others in educational research, wrestle with whether or not one has to choose one metaphor over the other. Am I #teamacquisition or #teamparticipation? I have an example. I attended my first Software Carpentry workshop about a month ago. I had never heard of many of the lessons we teach, and during the workshop I lost my way. I knew I had to learn the information to be successful at my job, so immediately following the workshop I went to the mall, sat in the food court, and went through the entire lesson by myself. By the end of the lesson I was confident!

Having done that I was now able to interact and be a part of the conversation during the second day of the workshop–I felt like I belonged. I found it easier to learn the material on my own, and still feel like part of this community.

What are your thoughts about these two metaphors as it relates to our learners, workshops, and community? Acquisition or participation? Do we have to choose to belong?

When did you realize you belonged to the Carpentry community?

Comment below, and tweet us your thoughts @datacarpentry and @drkariljordan.

Reference: Sfard, A. (1998). On two metaphors for learning and the dangers of choosing just one. Educational Researcher, 27(2), 4-13.

Open Instructor Training


After workshops and conferences, we frequently get questions from people who are interested in teaching with the Carpentries. We’re overjoyed by this interest and excited to bring more committed and enthusiastic instructors into our community. Unfortunately, until recently, we haven’t had the resources to open up our instructor training program, and have been holding training events primarily with Partnering institutions.

In response to this sustained community interest, Data and Software Carpentry re-opened applications in July for anyone interested in instructor training, regardless of affiliation. This two-day intensive training covers aspects of pedagogy critical for teaching our target audience, including creating useful formative assessments, motivating learners, dealing with cognitive load, and understanding how subject-matter expertise is developed. We also teach signature Carpentry instructional practices, including live coding and the use of sticky notes to track learner progress.

Within three weeks of calling for applicants we received 169 applications for 60 available seats. Applications came in from 22 countries spread across all continents except Antarctica. The Carpentry team reviewed applications on the basis of applicants’ previous involvement with the Carpentries, previous teaching experience and/or formal pedagogical training, and commitment to teaching workshops. In addition to these criteria, we looked for applicants from locations outside of our already established communities or with training in domains that are underrepresented among our current instructor pool, such as the social sciences and digital humanities.

We were able to make offers to applicants from 13 countries representing the full geographical breadth of those who applied. Two training sessions have now been held, a third is taking place this week, and the fourth is scheduled for the second week of December. The feedback from the first two sessions has been very positive: we have had to adapt some of our exercises to account for the fact that the trainees are participating online rather than being physically co-located, but other than a few web conferencing glitches, things have gone surprisingly smoothly.

If you were not selected for this round of instructor training, don’t lose heart: we have kept everyone’s application in the queue, and hope to revisit our offerings in the new year. If you have colleagues who are interested in teaching for the Carpentries, consider asking your institution to Partner with us! Partnering institutions receive multiple benefits, including reserved seats in instructor training events and discounted workshops.

We are very grateful to everyone who applied, and hope that you will continue to be involved in the community. We welcome contributions to our lessons, which are all developed collaboratively by our community, and encourage you to help at or host a Carpentry workshop at your institution.

Forming a Community-Developed Code of Conduct: What we learned.


Codes of Conduct are important because they define behavior expectations for a community and encourage positive relationships; because of this importance, care must be taken when they are being developed. Defining the standards to which people will be held when they act as part of a community is inherently difficult. Everyone has slightly (or sometimes radically) different ideas about what counts as acceptable behavior, and most people are surprised that their idea of “acceptable” isn’t shared by all. On the other hand, some people reject the notion that any (legal) behavior should preclude them from being part of a community. In a group like the Carpentry community, with a fluid membership and a culture of co-operative, consensus-based decision-making, the development of these standards must be a community process that can handle this complexity. How does this collective ‘you’ decide on behavioral rules for the community that reflect the diverse voices present? And how does this ‘you’ balance the need to provide clear boundaries without writing a comprehensive rule book?

The Code of Conduct has always been important to the Carpentries, but a few incidents this past spring made it evident that we needed clearer guidelines and processes for handling incidents. There was significant discussion around these issues and some community members were also concerned about the way incidents were handled, calling for a closer look at how reports of Code of Conduct violations are adjudicated and what sorts of penalties are appropriate.

In response to this community conversation (here, here, here and here), Software and Data Carpentry staff began looking into both how to improve the wording of our Code of Conduct to more clearly communicate our community standards, and how to make sure our process for handling reported violations is fair and in line with our community’s needs.

After some research, we found that simply revising the Code of Conduct itself was not enough; we also needed to develop specific guidelines for reporting and adjudicating potential Code of Conduct violations that were appropriate for all Carpentry spaces, both in-person and online. We also needed to make sure that these guidelines were developed, approved and enforced by the community they were intended to serve. The Software and Data Carpentry Steering Committees approved a community-based process for revising the Code of Conduct.

In August, Carpentry staff put together a draft of a new Code of Conduct, reporting guide and enforcement manual and sent these around to the community for input. We also put out a call for volunteers to serve on a Policy subcommittee to help finalize the language of this and related policies. After receiving a lot of community feedback, we hosted two Community Calls (previously Lab Meetings) to discuss the new language and further ask for volunteers to serve on the Policy subcommittee. The call for volunteers also went out on our blog. We asked volunteers to share with us how they’ve been involved with the Carpentries in the past and how they plan to be involved in the future, what sort of experience they might have that would help them form this new policy, and how they felt they would contribute to the diversity of the Policy group. Three community members volunteered and were accepted to serve on the subcommittee. We are happy to have Pauline Barmby, Chris Hamm, and Simon Waldman on the committee, along with staff representation from Jonah Duckles and Erin Becker. Tracy Teal and Karin Lagesen also contribute to the Policy committee by serving as liaisons to the Data and Software Carpentry Steering Committees.

The new Policy subcommittee met virtually starting in September to iterate on the draft of the new Code of Conduct language. They worked to incorporate thoughts from the community that had been shared on GitHub discussions, the Discuss list, and comments left on the initial Google doc of the drafted policy. After a series of conversations incorporating feedback and comments, the group reached consensus and presented the new language to both the Software Carpentry and Data Carpentry Steering Committees.

Throughout this process, we knew that it was important to us as a community to have an open conversation about these issues, both to provide everyone an opportunity to make their voices heard and to promote community ownership of the final policy. We hope that we’ve met this goal and helped our community ensure that Carpentry spaces are welcoming to all and that any potential violations of community standards of behavior are handled fairly and transparently. If you have any questions about the Code of Conduct, please email the Policy subcommittee at policy@carpentries.org.

We are incredibly grateful to our volunteer community for their input and to the members of the Policy subcommittee for their dedication to ensuring that our community remains welcoming to all. Please join us in extending a warm thank you to the Policy subcommittee and to our community for engaging in this important work.

Reproducible Research using Jupyter Notebooks: Curriculum Development Hackathon


Goal: Develop the content of a two-day Data Carpentry workshop teaching how to conduct research reproducibly using the Jupyter notebook
Location: Berkeley Institute for Data Science (BIDS), Berkeley, CA
Dates and times: January 9 - 11, 2017; 9 am - 5 pm each day
To apply: https://goo.gl/aPO71f; deadline December 5, 2016. Successful applicants will be notified by December 12.

Synopsis

Making science more reproducible has enormous potential to accelerate research advances, including for individual practitioners. Despite this, the tools and approaches that are already available are rarely taught. To address this, we are organizing a 3-day hands-on hackathon aimed at developing, and later teaching, a short-course curriculum on using Jupyter notebooks for reproducible research practices. The event will be held January 9 - 11, 2017, in Berkeley, CA, at the Berkeley Institute for Data Science (BIDS). We aim to assemble a diverse and interdisciplinary group of participants, and invite those interested to apply by December 5, 2016, at https://goo.gl/aPO71f.

Full details at the Call for Participation: https://git.io/vXAaa


Collaborative Lesson Development for Semester-long Courses


When did you realize you belonged to the Carpentry community?

My first encounter with the Carpentry community was early in my PhD training, way back when Software Carpentry was teaching programming lessons using brilliantly narrated videos. I knew then that a basic understanding of computer programming was an important part of a strong career in science. What I didn’t expect was to get inspired by a community of scientists and teachers that promotes openness and collaboration throughout their work.

Fast forward to the present… having just accepted my first faculty position in agroecology (read: big data scientist in agriculture), I am very glad that I got involved in the Carpentry community. The Carpentries’ active and open approach to science and teaching has become my own.

I taught laboratory courses throughout my PhD and quickly realized that the Carpentry model of active learning resonated with my preference for teaching labs. There is something about engaging students in the learning process and giving them an opportunity to explore the lesson material. Insightfully, the Carpentry community elevated this active-learning process to include their instructors, who are empowered to contribute to the collaborative development of open source lesson materials. For new instructors and veteran Carpentry community members alike, now’s an exciting time to be involved.

Data Carpentry is working to expand their collaborative workshop development to include curriculum innovation for college and university courses and just formally announced the first semester-long Data Carpentry Course. It doesn’t take much to get involved. Instructors and students are encouraged to provide feedback for the course to help improve the content and clarity of the existing curriculum. Instructors are also welcome to fork the course and use the general site structure and templates to develop their own course and expand the domains represented by the Carpentry community resources.

To get the collaborative course development rolling, instructors are encouraged to contribute exercises and lessons that can be used to customize the existing course structure to specific needs of various classes and programs. We use a ‘reverse instructional design’ to develop our course materials and the exercises are the heart of that approach. Strong exercises clearly direct the instructional materials to be presented, facilitate practice of the material, and assess learning.

The Carpentry community continues to grow and we are excited for your participation. Let us know how we can help you get involved.

The R ecology lessons


The Data Carpentry lessons are aimed at learners who have never programmed before. Learning something new, especially coding, can feel intimidating. Yet learners attending our workshops are motivated as they realize it is a skill they will eventually need to master to be able to manage and make sense of the data they are generating in their research. When learning a programming language, you eventually need to master two things: the syntax of the language (e.g., where the parentheses and the commas go), and the intricacies of the language that will make you write less code and faster code (e.g., taking advantage of the vectorized operations). The first can be learned relatively quickly, the second takes years of practice. One of the advantages of learning the basics of programming during a workshop is that we can teach learners good practices from the start, rather than having to go through the painful experiences that typically accompany self-learning experiences.

We teach good practices but not necessarily best practices. When you first learned subtraction, your teacher taught you that you can’t give away six marbles if you only have four. And when you first learned about square roots, your teacher told you that you can only calculate the square root of a positive number. Teachers don’t cover negative numbers and complex numbers until you have mastered the other skills that are needed to understand these concepts.

We are working on a tight schedule during a workshop. In two days, we need to show learners that coding isn’t scary, and that with a little knowledge and good practices, they can achieve a lot. During this time, we need to tackle problems that are realistic enough that learners can project the skills we teach them to their own datasets/problems, but not so difficult that learners are demotivated by the amount of knowledge they will need to master to be able to be productive on their own with their data.

One way we do that is by building the lesson around a dataset learners can easily relate to. The dataset for the ecology-themed lesson is from a real long-term experiment that uses variables anyone working in ecology will be familiar with: species names, measurements (lengths and weights), date of observation. The dataset is large enough (35,000+ rows) that manipulating it in a spreadsheet program would be difficult, but small enough that working on it with a programming language is almost instantaneous.

The other way is by putting a lot of thought into selecting what we cover during these two half days. We focus on how to organize the code, the data, and the files that make up a typical research project. There are now many resources to learn R online (some of which can be useful to the learners after a workshop), but one way Data Carpentry stands out is by demonstrating how good practices for data formatting and organization can facilitate data analysis. We reinforce this idea by using the same dataset throughout the workshop.

The main skills we focus on in the R lesson are how to prepare datasets for analysis and visualization. In Data Carpentry, we teach packages from the tidyverse (formerly known as the Hadleyverse), which are sophisticated and elegant additions to the R language for working with data. Most functions are verbs (e.g., filter()), and use a limited vocabulary that distills the operations needed to manipulate data. For instance, the six functions we introduce in the lesson on dplyr are enough to cover most cases of subsetting data and extracting relevant information from it. We use ggplot2 for data visualization because it allows learners to rapidly produce high-quality graphics. In addition, these packages encourage consistent, sound data formatting practices by relying on the tidy data concept (one row for each observation, one column per variable, one table per observational unit). We first introduce the tidy data concept in the spreadsheet lesson and emphasize its utility in each of the lessons.
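
To give a flavor of what this looks like in practice, here is a minimal sketch in the spirit of the lesson. It assumes a tidy data frame called surveys with species_id, weight, and year columns; the lesson’s actual examples differ in their details.

    library(dplyr)
    library(ggplot2)

    # Subset and summarize with dplyr verbs: keep only rows with a
    # recorded weight, then compute a mean weight per species per year
    mean_weights <- surveys %>%
      filter(!is.na(weight)) %>%
      group_by(species_id, year) %>%
      summarize(mean_weight = mean(weight))

    # Because the result is tidy, its columns map straight onto
    # ggplot2 aesthetics
    ggplot(mean_weights, aes(x = year, y = mean_weight, color = species_id)) +
      geom_line()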

Because of the duration of the workshop, we have to leave a lot of things out of the lesson. We want to limit the information overload. For instance, we don’t cover lists. While they are essential to programming in R, a lot of data analysis can be done without knowing about them. On the other hand, we cover factors because learners will encounter them when importing data in R, and their behavior is often misunderstood. When factors are covered during a workshop, it will be easier for learners to know how to deal with them in the future.

During workshops, “Challenges” are a core component of the learning experience. After each concept the instructor covers, the challenges give learners the opportunity to practice what they just learned. It is a form of formative assessment that brings interactivity. The learners can witness for themselves what they are now capable of doing with their newly acquired skills (for instance, a beautiful plot), and it allows instructors to assess whether the learners have assimilated the concepts taught. Learners accustomed to the passive lecture format are often not used to this level of interactivity, and they frequently praise these challenges in the workshop evaluations.

When you begin learning a new programming language, it can be frustrating not to be able to generate the desired output, or to be faced with cryptic error messages because a comma or a quotation mark is missing. To limit this frustration, we provide learners with a handout that contains a lot of the code already typed. They can fill in the blanks and add their own comments and bits of code to it. This allows learners to focus on learning concepts and how they relate to each other rather than obsessing over where the commas go.
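
For instance, a handout entry might look something like this (a made-up example; the real handouts vary from lesson to lesson):

    # Challenge: plot hindfoot length against weight,
    # coloring the points by species. Fill in the blanks:
    ggplot(surveys, aes(x = weight, y = ___)) +
      geom_point(aes(color = ___))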

I think it is also important to be realistic about the expectations one can have after a 2-day workshop. Learning how to code takes time. Because it is a new skill, learners will need to change the way they approach the analysis of their data. Being confronted with something new can feel uncomfortable, and facing the limits of one’s knowledge can be frustrating. It is important to celebrate your successes along the way. It will help you get through the frustrating times. Having people who can help you in your learning experience is also important: the person sitting next to you at a workshop might be the best person to take on this job. Finally, strive for best practices but not before you master the “good enough” practices.

The Python ecology lessons


Why We Teach It

In the Beginning

A lot of what are now the Python materials came out of a Software Carpentry Hackathon in 2014 when Software Carpentry was under the Mozilla Science Lab umbrella. At the time, Data Carpentry was a fledgling organization and really only had a small amount of materials on R and Excel. As you can see from the Etherpad from that hackathon, people were pretty excited about getting some of those materials translated to Python.

Way back in those days, the materials were a lot more information-dense, and lacked a real narrative thread running through them. In practice, that set of lessons was probably better suited to self-guided learning than to classroom teaching.

The current generation of materials strikes a better balance: there is enough content in the lessons that you absolutely can go through them on your own, or with a small group, without an instructor. But we’ve also developed a better narrative for the lessons by using an empirical ecological dataset (Ernst et al. 2009).

Why we teach what we teach

If you look at the way the lessons are structured, we start broad: just getting a look at your data, making a couple plots. Part of the goal of Data Carpentry is to get learners doing open, reproducible science. But I think we can all think of times when we just opened our data in a spreadsheet program and explored it. A lot of science happens in those little moments of exploration … and a lot of science gets lost when you can’t remember what you clicked on to get that cool plot. By starting with a broad view, we emphasize to learners that they can still do the exploration that they want to do in a reproducible way.

From there, we get more into the nitty-gritty. How do I subset my data? How can I restructure my data into useful groups? Our developers and maintainers are biologists - the functionality that we cover comes from biologists thinking hard about what is useful to know. In the first pass of the materials, I added content based on things I had done recently, and things I wanted to be able to do, but didn’t know how. We can see the biases of biologists in what is covered – and in what isn’t.

Where do we go from here?

There are a few immediate things that are on my TODO list for now. If you’re a new instructor looking to make your check-out PR, any of these things would be great.

  • One issue with the Python lessons is that they get taught less frequently than the R lessons, so we get less feedback. If you’re using the Python lessons, even in self-guided study or for onboarding undergraduates in your lab (which is how I use them on a regular basis), I’d love to hear from you. More feedback will help us improve our materials.
  • Checking over the later lessons. The early lessons (Short Introduction to Python, Starting With Data) get a lot of love from new instructors, but the later lessons that build on them don’t get nearly as much attention. Additional eyes checking them for errors would be great.
  • Improvements to the instructor’s guide. This is new, and for some of the exercises, there are multiple possible solutions.
  • Adaptation of the materials to other contexts. We occasionally get pull requests that reflect a different knowledge base from the average ecologist, and the lessons are starting to get overfull. If you do genomics, or behavior, or whatever your discipline is and want to chat about adapting the materials, do get in touch. We have a really great community and we can probably find someone to work with you.

Making use of Data Skills


Data Carpentry learners and instructors come from a variety of backgrounds and research disciplines. Many of us, myself included, don’t think of ourselves as “computer people.” We’re here because, at some point in our career, we found ourselves doing repetitive and error-prone computational tasks by hand when someone looked over our shoulder and said “Why on earth are you doing it like that!?” This person may then have leaned over our keyboard, typed for a few minutes, and finished the work that would have taken us the rest of the day. At least in my case, this happened routinely, every time a certain person happened to walk past my workstation, until it eventually became more work to try to justify my process (“But I only have to do this once,” I’d say) than to capitulate and try to learn the computer magic.

That’s not to say that learning these skills was easy or fast, or that I feel proficient in many of them even years later. But once I committed to trying (mostly to avoid those annoying conversations) I started noticing that these dull, monotonous tasks could be minimized and yes, some of them could even be fun. And of course, as it turns out, you hardly ever have to do anything only once.

Some of you may have had an efficiency advocate like I had to peer over your shoulder and pester you into doing things the “right” way. Some of you may have gotten frustrated on your own and sought out tools to make your lives easier. For others, it may have been a desire to make sure your work was reproducible, rather than simply efficient, that drew you into this realm. Regardless of what got you to first engage with computers as helpful tools rather than magic boxes, you’ve probably developed your own set of practices for making the work that you do efficient and reproducible.

We’re excited to learn about how the diverse members of our community use computational tools, particularly those in the Data Carpentry curriculum, in their daily work. The Data Carpentry blog will be running a new series titled “Data in the Field” to highlight the many ways in which our community members integrate “good enough” practices for data management and analysis into their research.

Stay tuned for posts in this series every Monday. Upcoming contributors include:
- Naupaka Zimmerman, Asst. Professor of Biology at University of San Francisco
- Damien Irving, Postdoctoral Research Fellow, CSIRO Oceans and Atmosphere
- Marian Schmidt, PhD student in Ecology and Evolutionary Biology at the University of Michigan
- Christie Bahlai, Professor of Integrative Biology at Michigan State University
- Sean Pue, Professor of Linguistics at Michigan State University

We’d love to hear from you! If you’re interested in contributing a post in this series please contact ebecker@datacarpentry.org.


Some questions to consider as you plan your post. These are simply suggestions - feel free to structure your post in a way that makes sense for you.

What is your current field of research/work? Please describe a project you’re working on in terms that a non-specialist will understand. Tell us about the types of computational tools that you need to know in order to work on this project. What do you use them for and why are they important to your research?

How did you come to learn those tools? More broadly, how did you first become exposed to these sorts of computational tools? When did you realize you needed to learn how to use these sorts of tools in order to do the work you’re interested in doing? How did you go through the process of learning what you know now? For example, did you have a very supportive advisor? Did you carve out time for self-study? Did you have a community of learners that you worked with?

What do you want to learn next? What are you currently working on learning in terms of computational tools or what do you hope to learn soon? More broadly, how do you plan on continuing to improve your skill set?

What advice do you have for new learners? What do you wish someone had told you when you first started learning how to use these sorts of computational tools in your work? Do you have any words of wisdom?

Hand-crafted relational databases for fun and science


I’m a microbial ecologist who is primarily interested in understanding the ecological causes and consequences of plant-microbe interactions. Like many ecologists these days, my research spans the gamut from field to lab to laptop. Often the work involves collecting some leaves from plants in the field or from plants in a greenhouse, and then culturing from those leaves the fungi that live asymptomatically within them (a.k.a. endophytic fungi). Sometimes studying these communities relies on culturing the organisms in Petri dishes, and sometimes on directly extracting the DNA or RNA from the samples and sequencing that. One of the hardest things for me has been simply keeping track of what samples came from where, when, and who did what to which samples. To give an example: Let’s say I’ve got a fungal culture growing in a Petri dish, and I want to know how many times it has been subcultured (that is, regrown from an existing culture), when those subculturing events happened, what kind of plant tissue the original progenitor culture came from, and where that plant tissue itself came from out in the field. Oh, and then downstream, I also want to know something about all the other such fungi that came out of that same plant and all their metadata as well. And I want to be able to get all of that programmatically.

Keeping track of this type of information is simple enough with dozens or even hundreds of samples. But with thousands, or even tens of thousands, it becomes untenable without a better system. So, as part of my postdoc, I invested some of my time in building a relational database to track all of this information for my own research projects. I had used SQL commands to query databases before for other projects, but I had never designed and implemented a database of my own. It seemed like building a small custom database would be a good solution to my problem and would help me develop a skill worth having.

The first part of designing a database is thinking long and hard about a schema (basically the blueprint for the database’s structure). While relational databases allow you to represent highly complex datasets and the connections between and within different types of data, they are not nearly as flexible as a simple spreadsheet. Once they are built, it’s not trivial (although definitely possible) to change their structure. This is why having a good long think on how you’d like to represent your data within the database is really important. I found the macOS program OmniGraffle really helpful for putting this diagram together, but any vector drawing program (or pen and paper) would work.

Here’s an example of a draft Entity Relationship Diagram I made to think about different types of data I work with in my research and their relationships:

The first key step is to think about the different kinds of things you’d like to incorporate in the database. Each table in the database should represent only one type of thing, along with its associated metadata (see for example Figure 1 here). Not all tables have to be for tracking physical objects (e.g. plant samples, fungal cultures) though. One of the conventions I found useful was to have a table to track the events or actions that occurred to objects. So for example, there was one table for fungal cultures, and another for culture events. In the events table, I logged observations, subculturing events, DNA extraction events, and so on, and then each of these is linked back to one and only one culture in the culture table. The key advantage of this setup vs just adding these additional details to columns in a spreadsheet is that it allows for an arbitrarily large number of observations for any culture.

Once you decide on the tables, you can think about what type of information goes within each table and get to the practical work of creating the primary and foreign keys that link them all together. A piece of advice: consider using UUIDs for your primary and foreign keys. They might be a little more complicated than using integers as unique identifiers, but they are less likely to cause problems in the future, and you can use grep across all your datasets because the IDs are globally unique.
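
To make this concrete, here is a minimal sketch of that cultures-and-events design in SQLite, created from R with the DBI and RSQLite packages. The table and column names are hypothetical stand-ins, not the schema from my actual database.

    library(DBI)

    con <- dbConnect(RSQLite::SQLite(), "cultures.sqlite")

    # One table for the things themselves...
    dbExecute(con, "
      CREATE TABLE cultures (
        culture_id      TEXT PRIMARY KEY,  -- a UUID stored as text
        plant_sample_id TEXT,              -- links back to a plant samples table
        date_isolated   TEXT
      )")

    # ...and one for the events that happen to them. Each event points to
    # exactly one culture, so a culture can accumulate any number of
    # observations, subculturing events, or DNA extractions.
    dbExecute(con, "
      CREATE TABLE culture_events (
        event_id   TEXT PRIMARY KEY,       -- also a UUID
        culture_id TEXT NOT NULL,
        event_type TEXT,                   -- e.g. 'observation', 'subculture'
        event_date TEXT,
        FOREIGN KEY (culture_id) REFERENCES cultures (culture_id)
      )")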

I started out by building a simple database using FileMaker Pro. I had used the software as an undergrad, and was attracted by the ability to use it to serve mobile-ready interfaces (in other words they had a free add-on iOS app). I read, and highly recommend, the book FileMaker Training Series: Advanced for an accessible overview of how to think about designing relational databases even if you aren’t using FileMaker. In the end though, I decided not to keep using it, and transitioned everything over into a SQLite database. The primary rationale was that SQLite is non-proprietary, file-based (no server needed), and could easily be transitioned into a more complex relational database system (e.g. PostgreSQL) in the future if needed. Plus SQLite plays really nicely with R and Python, which made scripting custom interfaces much more straightforward, and as a bonus is used in Data Carpentry lessons. While I currently use the commercial Navicat software to ingest data into the database as I add to the spreadsheets that make up its tables, I am slowly working on developing a custom scripted interface to automate particular types of occurrences (returning from a field sample collection expedition, for example). In R, dplyr works directly with SQLite so you can write queries in native dplyr syntax. In Python, I’ve had good success with the sqlite3 library.
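
As an example of the kind of scripted access I mean, a dplyr query against the hypothetical database above (via the dbplyr backend) could look like this:

    library(dplyr)

    # tbl() creates a lazy reference to a database table; dplyr translates
    # the verbs into SQL, and collect() pulls the result into a data frame
    events <- tbl(con, "culture_events")

    subculture_counts <- events %>%
      filter(event_type == "subculture") %>%
      count(culture_id) %>%
      collect()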

Things I wish I had known about designing a database (and about doing science):

  1. Think about your schema before you start. It will change over time as you learn about your system and as your projects evolve, but externalizing your mental understanding on paper, on a whiteboard, or on a screen can be really helpful for clarifying what type of data and metadata you will need to collect and when. The easiest way to do this is to start with the simplest case and build out from there.
  2. Think about the type of projects you want to be working on in 5-10 years. Most likely the particular technologies we’ll all be reliant on then don’t exist yet. Invest in learning tools that will be flexible enough to handle this unknown future landscape.
  3. Tidy data is great for more than just using the R tidyverse. It also enables pivot tables in Excel, relational databases of any sort, and processing with powerful and fast command-line tools. If you do nothing else, keep your data tidy and your data analysis life will be so much easier.
  4. Taking a Data Carpentry workshop, or training to become an instructor, is a great way to start building these practical skills and internalizing these mental approaches to data analysis. I didn’t see how powerful these types of computational tools could be when used in concert until I started teaching others, as a Data Carpentry instructor, to do exactly that. Teaching, for me, has been the best way to keep learning.

A Year in Review: Annual Moore Progress Report


A year ago we were fortunate enough to receive a grant from the Gordon and Betty Moore Foundation Data-Driven Discovery initiative to support Data Carpentry to develop and deliver data skills training to researchers in the life and physical sciences. See our original proposal here. Thanks to the Moore Foundation, this support has allowed us as a community to run more workshops, develop new materials, develop infrastructure and advocate for the importance of data skills training for more effective and reproducible research.

The work of the community has made this possible and we wanted to share our Moore annual report to highlight all the work that has been done in the last year!

Some highlights from the last year

Workshops

  • Since August 2015 we have run 93 workshops in ten countries, serving approximately 1,700 learners
  • We have over 600 completed surveys that show the demographics of our audience as well as workshop outcomes. Workshop attendees are primarily graduate students (31%), but also include research staff (25%), postdoctoral researchers (12%), and PIs (12%). Attendees come primarily from the life sciences (45%), with the remainder from other fields (21%) and the social and library sciences (9%).
  • Over 50% of learners were new to programming, having either never programmed or programmed less than once per year.
  • The vast majority (>90%) of those responding to our post-workshop survey say that participating in the workshop was worth their time and led to improvements in their data management and data analysis skills.
  • Gender balance in USA workshops (where demographic data is collected) was 51% female and 47% male.
  • In the USA we are primarily reaching Caucasian (63%) and Asian/Pacific Islander (18%) participants. Black or African American (3%), Hispanic (6%) and American Indian or Alaskan Native (0.2%) participants are under-represented.

Instructors

  • With Software Carpentry we have run over 30 instructor training events.
  • We have over 350 officially badged Data Carpentry instructors.
  • With Software Carpentry we were involved in forming a Mentoring Subcommittee, which has developed and run discussion sessions.

Curriculum

Strategic Planning, Sustainability, Communication and Advocacy

In year one the focus was on building the foundation to scale to run more workshops and train instructors, develop materials in new domains, maintain existing lessons and support our instructor community. We are also working to establish Data Carpentry, along with Software Carpentry, as sustainable organizations. We’ll continue these activities in year two and also work on scaling to reach and engage with more learners and instructors throughout the world!

As always, great thanks to all the volunteers who make all this work possible!

Building Genomics Data Analysis Capacity at NWU


The North-West University in South Africa boasts two next generation sequencing (NGS) platforms and additionally receives terabytes of NGS data annually from local and international service providers. Research projects with NGS components exist in the areas of Microbiology, Zoology, Botany, Nutrition, Agriculture, and more.

The three biggest challenges experienced by researchers and postgraduate students in terms of data analysis are as follows:

  • many of the students entering NGS projects have limited prior exposure to molecular techniques such as Sanger sequencing and PCR, and to genetics concepts;
  • there is limited access to bioinformatics support and training (although there is lots of access to short interventions like 1- or 2-day workshops with no sustained follow-up);
  • and they are not aware of the range of research compute infrastructures which are available to them.

In September 2015, the NWU eResearch Initiative helped to establish a Genomics Hacky Hour (GHH) Study Group to support postgraduate students and researchers using NGS technologies. The original intention of the GHH was to bring researchers together to work on their current projects. However, limited shared NGS vocabulary hampered constructive communication amongst researchers, and it was decided that specific topics would be discussed during the first few sessions, led by a study group leader.

The GHH members participated in a locally run Software Carpentry workshop in November 2015, where they were introduced to the basic concepts of reproducible research and various tools such as Shell, git, GitHub, and either Python or R. The GHH Study Group sessions provided a safe, informal post-workshop learning environment for participants to continue their learning.

In January 2016, several students and supervisors enrolled for the Coursera Genomics Data Science Massive Open Online Course (MOOC). The GHH sessions were used to discuss challenges and solutions specific to the Coursera course and the hope was that, with a better support structure, participants would be able to stay the course and complete the 7-module specialisation over the next 9 - 12 months. The learning curve was very steep for several of the modules and we realised we needed additional learning opportunities even to complete the MOOC.

In April 2016, two PhD students with NGS projects participated in a locally hosted Software/Data Carpentry instructor training workshop with the idea of hosting a Genomics Data Carpentry workshop soon after. The NWU hosted its first Genomics Data Carpentry workshop from 26 - 29 September 2016, led by the two newly qualified instructors, Bianca Peterson and Maryke Schoonen, alongside Jason Williams, Assistant Director of the DNA Learning Center, Cold Spring Harbor Laboratory.

The workshop was run on AWS instances courtesy of Data Carpentry. One of our concerns was that, unlike at other Carpentry workshops, researchers wouldn’t have access to the software environment after the workshop to continue practicing their newly acquired skills and play around with their own data.

Luckily, NWU is one of the founding members of the African Research Cloud (ARC) and we were able to get access to enough instances on this infrastructure after the Data Carpentry workshop. Tim Carr from UCT eResearch worked with Jason Williams to build a replica of the AWS Genomics Data Carpentry instance on the ARC, and shortly after the workshop our participants were able to continue their learning in a familiar environment.

In the past few weeks the GHH folks have been working through the Data Carpentry genomics lessons at their own pace to reinforce what was learned during the workshop and complete some of the exercises that weren’t covered. These exercises have strengthened individual knowledge, built trust amongst participants and made them more aware of available information, tools and resources. We are already planning additional exercises to augment what is covered in the Data Carpentry Genomics lessons.

Take-home message: genomics capacity building initiatives cannot be limited to workshop participation, but require long-term continuous learning (i.e. post-workshop participation) and support. It is important to focus efforts on ‘what works’ at the level of the individual, department and organization, whether it be running workshops, doing MOOCs or getting involved in study groups. Some words from one of our GHH folks: “You will feel stupid and want to give up a thousand times, but if you stick with it and work through the material and exercises, you will get to a level where you can analyze your own data.”


Instructor Training for Librarians


We are pleased to announce that we are partnering with csv,conf (a community conference for data makers everywhere) to run an instructor training class specifically geared for people interested in Library Carpentry. The class will take place in Portland, Oregon, on May 4-5, 2017; for details, please see the full announcement.

Discovering the data science community, becoming part of it, and expanding it


My path to becoming part of a data science community was happenstance. I started programming as a graduate student because I wanted to increase the spatial and temporal scales of my analyses. The scale increase was from one location for a few months to the entire coast of Western North America for many years. For larger-scale analyses, I needed satellite data, which I downloaded from a data repository. I learned to use shell scripts to automatically download data because going through 5 to 10 internet links to manually download hundreds of files was tedious. The files with satellite data were large and in special file formats, so I learned how to access and analyze them using R.

When I first started programming, I would spend hours (sometimes days) trying to figure out a single R command. For example, I would want to find monthly means for my data but the dates would need to be converted to a different date format before I could make the monthly mean calculation. The process for figuring out the R command for converting date formats would be:

I would type keywords such as “date convert R” into a Google search. Then the frustrating sequence would be: click search result, read text, click back arrow, click next search result, read text, click back arrow, etc. I would add more keywords and subtract keywords. I would stare at my screen and type a few of the R commands that I discovered during my Google search into my R console. In response, R would print an incomprehensible error message. I would desperately try the same R commands again because I thought through some coding miracle that they would convert the date the next time. I would update R, restart the program, and try again. Then I would go back to Google and eventually find the answer or ask for help from more advanced R coders.
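
For what it’s worth, the answer I was hunting for usually turned out to be a line or two, something like this (with made-up dates and values):

    # Convert character dates to Date objects, relabel them by year-month,
    # and average the measurements within each month
    dates  <- as.Date(c("2010-01-03", "2010-01-17", "2010-02-05"))
    values <- c(14.2, 15.1, 13.8)

    months <- format(dates, "%Y-%m")
    tapply(values, months, mean)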

Persistence is critical when learning to code. At the most frustrating moments, I thought that I was going to be a student forever because my data would never be analyzed. This feeling of frustration would be quickly followed by euphoria when I finally typed the R command that converted the dates to the correct format for my analysis. The euphoria would give way to feeling powerful in a “the sky’s the limit” way, because I could now quickly convert dates for any number of data files. Then I would try to write the next line of code, and I would be back to typing words into a Google search.

Through my graduate school experience, I recognized the power of using programming to more efficiently and effectively analyze scientific data. I was talking with faculty in the School of Oceanography at the University of Washington about a postdoctoral position when I saw the announcement for the data science postdoctoral fellowship in the eScience Institute on a mailing list. I was doing a lot of programming, but I was not sure if it counted as “data science” or “big data.” However, I was intrigued by the opportunity, and the faculty in the School of Oceanography encouraged me to apply.

Breakthrough moment

I started my postdoc in the eScience Institute, which was rapidly expanding from a small group of faculty to a much larger organization with graduate students, postdocs, data scientists, and research scientists. I transitioned from being an observer to a participant in the data science community when I started asking questions about how to publish code from my research projects. I thought publishing my code was important, but publishing code on my personal website did not seem worthwhile because my website was not a long-term archive. A data scientist told me about GitHub (a platform designed for version control and sharing code) and Zenodo (a long-term archive connected to GitHub). We also discussed documentation and software licensing. In a short period of time, I revised my code and archived a version on Zenodo as the scientific paper based on it was published. I felt my publication was more complete because the code was available.

Part of the team

I learned about the reproducibility working group when I asked questions about publishing my code, and I started attending meetings. The goal of the group is for researchers to share the entire research process, including the code that produces the results. More transparency in how results are calculated will increase confidence in scientific discoveries. We are working on initiatives to promote reproducibility at the University of Washington. Through the reproducibility working group, I learned about Software/Data Carpentry. I started as a helper at a Software Carpentry workshop and eventually became a certified instructor.

Joining the reproducibility working group and becoming an instructor for Software/Data Carpentry helped me to become an active member of the data science community. I have met other researchers who are working on really different subjects but using similar methods. I really enjoy chatting with other instructors about techniques for teaching programming. For someone looking to join a data science community, my recommendation is to get involved.

Data science ambassador

I spend most of my time doing oceanographic research. When I talk to graduate students, postdocs, and faculty in oceanography, I make a point of talking about the resources and opportunities available in the eScience Institute. If someone asks me about learning Python or R, I steer them towards a Software/Data Carpentry workshop. An oceanography graduate student was really interested in publishing code with a publication, so I walked the student through the process that I use for publishing my code. Thus far, my conversations have been informal, but I am organizing a more formal presentation that will address both reproducibility and data science educational opportunities in the environmental sciences. By reaching a wider audience, I hope that the connections between data science and environmental science communities will grow.

Data science ecosystem

Who is in your data science community? Is your community growing? How do you attract new members to your community?

Comment below, and tweet us your thoughts @datacarpentry and @eco2logy.

Climate Science and the Command Line

$
0
0

Climate science has a pretty high profile these days, particularly in my personal area of research, which involves quantifying the role humans have played in climate change (otherwise known as “climate attribution”). Behind the glamorous media coverage and international meetings, however, research in this field centers around the most unglamorous tool of all: the command line. From data management and workflow coordination to remotely accessing supercomputing facilities and backing up code, the command line is the most critical part of my toolbox. Let me explain…

Data management
Over 30 research groups from around the world submit data to the international climate modelling projects that form the basis of reports such as those produced by the Intergovernmental Panel on Climate Change (IPCC). Couple that with the multiple experiments that each of these groups runs with their models and the wide variety of variables they provide data for, and climate scientists quickly find themselves dealing with hundreds (if not thousands) of data files. Managing these files with Finder or Windows Explorer would clearly be a nightmare, but at the command line it’s a breeze. By using a strict data reference syntax for naming my files (see my blog post on the topic here), I can quickly and easily locate the data I need using the find command and/or a combination of the ls command and wildcards.
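The same kind of lookup is just as easy to script. As a rough illustration (the naming convention below is a simplified, hypothetical version of a real data reference syntax), here is a minimal sketch in Python:

```python
from pathlib import Path

# Simplified, hypothetical data reference syntax:
#   <variable>_<model>_<experiment>_<run>.nc
data_dir = Path("~/climate_data").expanduser()

# Find every precipitation ("pr") file from the "historical" experiment,
# across all models and runs - the scripted equivalent of ls with wildcards.
for path in sorted(data_dir.glob("pr_*_historical_*.nc")):
    print(path.name)
```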

Workflow coordination
Climate models break the world up into a series of grid boxes, which means the data that these models provide (e.g. temperature, humidity, atmospheric pressure, ocean salinity) are typically multi-dimensional arrays indexed by time, latitude, longitude and elevation (or depth). The Earth is a big place and climate models are using increasingly fine resolution grids, so these arrays can be very large. To keep the time and/or memory required to process these arrays at a manageable level, I usually split my workflows into a number of data processing steps, with the output from each step saved as a series of intermediate files. Keeping track of all the data processing steps can be tricky, which is where the command line comes in. By making each step executable at the command line (e.g. all my Python scripts can be executed at the command line), I can chain them all together in a shell script. If one or more of the steps takes a particularly long time to run (i.e. I’d rather not run it unless absolutely necessary), I use a Makefile to manage the process instead. Whenever I update my code or data, Make figures out which intermediate files need to be regenerated and which don’t, allowing me to avoid re-running time-consuming parts of the workflow that haven’t changed.
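To give a flavour of the shell-script version of this idea (a Makefile adds dependency tracking on top of the same pattern), here is a minimal sketch of a driver script, written in Python with invented script and file names:

```python
import subprocess

# Each data processing step is an ordinary command-line program that
# reads one intermediate file and writes the next; the script and
# file names here are invented purely for illustration.
steps = [
    ["python", "extract_region.py", "raw.nc", "region.nc"],
    ["python", "calc_anomaly.py", "region.nc", "anomaly.nc"],
    ["python", "plot_result.py", "anomaly.nc", "figure.png"],
]

for cmd in steps:
    # check=True aborts the workflow as soon as any step fails.
    subprocess.run(cmd, check=True)
```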

The other nice thing about making my data processing steps executable at the command line is that I can use any tools or languages I want. Most of the big players in weather and climate science software development (e.g. the UK Met Office) have converged on Python as their language of choice, but there are (for example) some old but still excellent command line tools out there for dealing with netCDF files (the default file format in weather and climate science). The command line is the place where all my tools can be linked together.

Remote access
Another consequence of the enormous size of international climate model datasets is that you can’t just store the data on your own computer. Instead, the data are stored and managed by a network of supercomputing facilities around the world. It would be impractical to shuffle even small subsets of the data back and forth between the supercomputing facility and my workplace, so these facilities (like the National Computational Infrastructure, in my case in Australia) encourage you to do your data analysis on their computers, which have direct access to the data. The way to access these remote computers is via the command line.

Version control
Last but not least, I use the command line when backing up my code. There are graphical user interfaces available for Git, but setting one up seems like more trouble than it’s worth. Most days I only use four commands (git add, git commit, git push, git pull), so it’s easier just to type those into the command line.

I blog and tweet about all aspects of computational best practice in the weather and climate sciences, so please subscribe / follow if that sounds relevant to your work!

Feedback on Communications


Software and Data Carpentry have a collaboration-driven ethos at their core, and communication is key to that collaboration. We’re reaffirming our commitment to open and transparent communication, because we know we can do better! We want to give community members opportunities to talk to each other, to staff, and to the Steering Committees; to get updates on efforts and activities; and to generate ideas and participate in discussions.

So, first, we want to hear from you!

What ideas do you have about communication? What do you want to hear from us? What channels do you like to use for communication? Do you like email lists or forums that include every topic, or ones on particular questions, domains or regions? What do you like about communication now? What don’t you like?

We’re going to be working on communication channels and strategies to promote and support these ideas, and continue to make the Carpentries a community that you are excited to be a part of, so please let us know what you think!

Please respond as a comment to this post, or in our “conversations” repository on GitHub (we’re considering these our suggestion boxes) if you have particular topics. Thanks for your feedback!


To be true to our ethos and effective in our mission, we need to communicate clearly about both aspirations and ongoing efforts so that we can learn from each other, identify critical issues, recover quickly from mistakes, evaluate ideas and commitments, and make strategic decisions.

As a community, we communicate in many ways and for different purposes.

  • Community members take initiative to coordinate activities.
  • Staff and committee members seek community input.
  • Staff and committee members report actions and deliver products to the community.

We know that we need effective ways for:

  • Community members to propose ideas for new work or directions for ongoing work.
  • Community members to organize work efforts around a particular issue.
  • Community members to stay up-to-date on work going on in the community, including work done by staff members, Steering Committees, subcommittees, and unofficial groups of community members.
  • Staff to jointly decide on priorities, form productive collaborations and keep up-to-date on progress of projects.

We also know that there may be other communication needs we have as an organization that we haven’t yet considered. We invite anyone who has experience in communications, in building open communities or who simply has thoughts about these issues to contribute as we work to develop a thoughtful, efficient and transparent communications strategy.

We envision this blog post and our new “Conversations” repository as a first step in developing this strategy.

To take part in the conversation about developing communication strategy - please respond to this post or to the GitHub issue.

As we work to develop a communications strategy, Carpentry staff will actively monitor this thread and follow up on issues.

Growth Mindset


Each of us is born with a unique set of interests. Along the way we develop beliefs about our interests, and those beliefs determine whether we choose to cultivate our interests as we progress through adulthood.

For example, I think I’m a pretty “okay” singer, but I’ve always wanted to be a dynamic performer and sing in front of large audiences. Therefore, I learned to read music, joined several choruses and ensembles over the years, and frequently sing karaoke. I even auditioned for the American television series The Voice!

This describes what Carol Dweck calls the growth mindset. Those with a growth mindset believe that any skill or ability can be acquired if one truly invests time, study, and effort.

You may think, on the contrary, that our unique interests and qualities are fixed. “No matter how hard I practice, I’ll never be able to dunk a basketball.” This describes Dweck’s fixed mindset: the belief that our interests are innate, and that failure to succeed in a particular area means one simply lacks the necessary abilities.

Let’s think about these two mindsets, growth and fixed, in the context of Data Carpentry’s instructor training and workshops. What kind of instructors would we arm our communities with if we trained them with a fixed mindset? An instructor with a fixed mindset would tell our learners: “Unless you’re brilliant and already have a knack for programming, you won’t be able to develop the skills you learned in this workshop to conduct meaningful (and reproducible) research.”

Aren’t you glad our instructors embody a growth mindset?

Without naming any names, I’d love to hear about an experience you had with an instructor who embodied a growth mindset. What did s/he say to get you to invest time and effort into cultivating your data management and analysis skills? Tell us your story below.
