Data-driven business requires data collaboration — both inside and outside technical teams.
But getting it right means understanding the unique challenges data teams face, as well as the role technology plays in solving them.
Let’s look at how data teams collaborate, what’s standing in their way, and how organizations can overcome the most common obstacles.
The unique needs of data teams
When it comes to collaboration, data is different — especially compared to other technical functions.
Take software engineering, for example. Engineers write code on their own, describing and documenting it along the way. They use the development environment of their choice (e.g., VS Code or PyCharm) and then, after polishing up their code, share their work with others via a separate platform (e.g., GitHub or GitLab).
Relatively simple, right? Well, that’s not how data teams work.
Their goal is knowledge. That means running quick experiments, creating prototypes, and iterating on feedback as soon as possible in order to speed up time to insight and deliver on business-critical needs. It’s called exploratory programming, and it’s much different from traditional software engineering.
Data professionals aren’t going from point A to point B to ship a product. They’re answering questions, identifying underlying problems, and generally navigating uncharted territory without a clear destination or roadmap in hand. This makes it essential to collaborate with teammates and stakeholders much earlier in the process — and more frequently.
The 3 levels of data collaboration
Data collaboration falls into one of three buckets:
Level 1: Peer-to-peer
If you’ve ever walked by a data professional’s desk and seen two people huddled around a monitor engaged in a lively discussion, you’ve witnessed peer-to-peer collaboration. It happens in real time, and usually one on one.
The value is in team members seeing what their teammates see in the moment.
Peer-to-peer collaboration is ideal for high-stakes and time-sensitive situations. Take pair programming when touching production data, for example. In a scenario where you’re handling business-critical data, two pairs of eyes are always better than one.
The same goes for launching rockets into space. In the Fast Adaptive AeroSpace Tools program at NASA Langley Research Center, pair programming is “highly encouraged.” Why? Because it breaks down communication barriers, simplifies knowledge-sharing, and results in higher-quality code. And yes, it’s more fun than programming solo.
Sometimes teammates just need to jam on code together. It’s how they divide and conquer to solve complex problems faster.
Level 2: Team
When timelines are more forgiving and multiple teammates need to be informed, peer-to-peer collaboration gives way to team collaboration. This generally happens asynchronously.
The value is not just in quality control, but also in tracking and documenting a project as it evolves.
Consider the team-wide code review. In a 2018 study at Microsoft, improving code quality and identifying defects were highlighted as the top two motivations behind code reviews. But team members also cited increasing knowledge transfer as one of the biggest benefits, ranking it as the third most important reason for code reviews. That goes double for data work, where experiments and end goals evolve rapidly.
Comments, change requests, and other responses create an audit log, keeping teammates up to speed on projects they don’t necessarily have direct access to. And as projects are shared with both technical and non-technical stakeholders, more people are empowered to join the discussion, glean insights, and stay in the loop.
To maintain visibility and distribute knowledge across the team, not to mention ensure quality code, teammates need to be able to easily give and receive feedback — as well as keep track of any changes.
Level 3: Organizational
As your team gets bigger, new collaboration pain points start to emerge. Chief among them are discoverability and accessibility. This is where organizational collaboration — how companies as a whole store, organize, access, and leverage data projects over time — comes in.
The value is in everyone being able to quickly find, access, and utilize an ever-growing library of data projects as needed.
In NewVantage Partners' 2022 Data and AI Leadership Executive Survey, barely more than a quarter of respondents — most of them data leaders at Fortune 1000 companies — said their organizations are data-driven. One of the biggest reasons is obvious: The majority (approximately 60%) said they aren't managing data as an enterprise business asset.
Data is an asset whose value compounds over time. It happens each time a team member checks to see if someone else has already answered a question before wasting time starting from scratch. Or when someone reproduces a colleague’s work and builds on it for another project. Or when a business stakeholder views an existing analysis and plays around with the numbers to self-serve insights.
Companies need to centralize and systematize data projects — as well as streamline who can access them and how — to support data collaboration over the long term and maximize its value.
Combatting collaboration roadblocks
Each level of data collaboration serves a different purpose, but they all have one thing in common: They’re punishing without the right tools.
These are the obstacles data teams face:
Silos
Because most data tools are built for individuals, teammates can’t collaborate in real time without sitting side by side (which is very hard to do when most of the team is working from home) or hopping on a video call. Both options come with inherent (and frustrating) limitations.
The result is silos that diminish productivity.
Reproducibility
To collaborate in any kind of meaningful way, data professionals must be able to access the same data sources in the same environment (not to mention have the proper security permissions).
With the traditional toolset, team members end up wasting time hunting down credentials and trying to configure their environments instead of exploring data alongside their teammates.
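To make the "same environment" problem concrete, here is a minimal Python sketch of how two teammates might check whether their setups actually match. It is an illustration under stated assumptions, not how any particular tool does it: the function names (environment_snapshot, fingerprint, diff_packages) are hypothetical, and it only compares the Python version and installed package versions, not data-source credentials or permissions.

```python
import hashlib
import platform
from importlib import metadata


def environment_snapshot():
    """Return the Python version plus a sorted list of installed packages."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            packages.add(f"{name}=={dist.version}")
    return {"python": platform.python_version(), "packages": sorted(packages)}


def fingerprint(snapshot):
    """Hash a snapshot so two teammates can compare environments at a glance."""
    text = snapshot["python"] + "\n" + "\n".join(snapshot["packages"])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_packages(mine, theirs):
    """List package pins that appear in only one of the two snapshots."""
    return sorted(set(mine["packages"]) ^ set(theirs["packages"]))


if __name__ == "__main__":
    snapshot = environment_snapshot()
    print("Python:", snapshot["python"])
    print("Packages installed:", len(snapshot["packages"]))
    print("Fingerprint:", fingerprint(snapshot))
```

If two teammates get different fingerprints, passing both snapshots to diff_packages shows which packages are missing or pinned to different versions. A shared environment removes the need for this kind of check altogether.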
Sharing
After a team member runs an SQL query, writes code, creates a chart, etc., there’s no easy way for them to share their work — especially with non-data folks. They end up gluing together lines of code with corresponding screenshots and sending them out over email or Slack. Then the inevitable follow-up questions come and the process repeats itself.
As one-off projects stay chained to individual machines, they get lost, forgotten, and overlooked.
Data notebooks are collaborative by design
The challenges data teams encounter on a daily basis stymie collaboration. Today’s data teams may have better algorithms, bigger datasets, and more computing power, but those don’t count for much if teams are unable to work together to solve problems.
Data teams need tools that are built to help them think, explore, and decide together, whether they’re coding in real time or sharing a mock-up with business stakeholders.
And here’s what that requires:
- One place to query, code, and visualize datasets, as well as provide written context for team members
- Shared integrations and environments that allow team members to jump into each other’s projects instantly without tedious, time-consuming connection and configuration procedures
- Real-time and asynchronous collaboration options where team members can share the same execution environment simultaneously or simply leave comments for each other, safe in the knowledge that built-in tracking and version control will help them spot any changes and restore older work when necessary
- Sharing options that allow team members to pass along their work as a browser-based link, invite teammates into their projects with granular permissions, or publish work as an interactive and business-friendly article, dashboard, or application
- Centralized, searchable workspaces that act as a single source of truth for all data projects and give team members an easy way to find, store, replicate, and build on their teammates’ work
It’s not magic — it’s a modern, cloud-based data notebook (okay, maybe it is a little magical). And it’s built for the way data teams actually work.
The need for data collaboration isn’t going away — it’s only growing. But the barriers to fast, effective collaboration remain. This isn’t a new phenomenon. The 2017 paper "Exploring Exploratory Programming" spelled it out in black and white:
“Although exploratory programming is prevalent across many applications today, there is currently a lack of tool support for experimentation, including a lack of support for recording and sensemaking of exploration history, and a lack of support for exploration by groups of people.”
Researchers Mary Beth Kery and Brad A. Myers highlighted how exploratory data analysis becomes "highly convoluted" when collaboration is required, concluding that "keeping a shared understanding of an exploration’s progress across a team can be difficult."
Here's what’s changed: The tools data teams need have finally arrived. What will separate the most successful companies from the rest is how they use those tools to empower their data teams to work together.
Take data collaboration to the next level with Deepnote
Get started for free to see how Deepnote supports and simplifies collaboration both inside and outside data teams.