Mapping data flows (with fish)

Mar 23, 2020

One of the most useful tools we’ve adapted for working on fisheries data issues is mapping data flows. Data mapping exercises are common for people designing data networks or IT systems, and can be very detailed and technical to guide development and hardware choices. But that’s usually not what we’re doing with data mapping.

Boxes, arrows, and line illustrations of IT system components: a map of the IBM Netcool Operations Insight v1.6.0 system! We don’t need this level of technical detail (but we do have satellites).

We’re using the mapping process to discover common ground and conflicts across the community of data contributors and users. We’re often talking at the program design stage, where you have government agency data needs, service providers that need to satisfy those requirements, and industry partners that create the data, all trying to create a working system of technology and policy.

We want that group to look at the data landscape and articulate:

What “data” are we talking about?
Who controls that data?
Who can access it?
Who pays for it?

By discussing each of these aspects (even if in describing them you realize you don’t all mean the same thing) you can surface the kinds of people and alignment issues that will tank even the most technically amazing data management systems. Especially if you’re in a highly regulated space and you don’t have infinite time and lawyers. Better to find out now that one person’s “raw data” is another person’s “processed images” and design a program that gets you what you actually need. Here’s what we do.

Who do you need in the discussion?

Like a design sprint or project kick-off, 8–10 people is a good start. You can do it with a larger group (and we have) but you might want to test it out with a few participants first to preview the sticking points. You could also give people a worksheet to fill out in advance and have them share at the meeting, or send it in if they can’t attend.

You could hold this discussion when you think everyone is on the same page so you can document “yep, this is what we think we’re talking about here.” Or you can do it when you’re not sure what everyone thinks. We’ve found real value in bringing different perspectives into the room because, per the old story of blindfolds and elephants, that’s where you discover big gaps and insights. When we’re data mapping for fisheries, these are the kinds of perspectives we’re talking about:

A research scientist, who has very specific data needs and wants to increase data collection (volume and velocity)
A lawyer, who needs to track and ensure compliance
A mid-sized vendor, who wants to be able to meet customer needs without making sixteen custom product lines
A fishing association, who would like to manage their own data system that connects to government services
A data manager, who needs to decide on APIs and handle the back-end design

What supplies do you need?

A white board with two colors of pens, and/or
Two packs of 3” x 5” stickies (great if you have two colors), or
A virtual display (Mural, Google Slides, the whiteboard in Zoom, Office 365, etc.)
Someone to take notes
Someone to run the process (these last two can be the same person but it’s usually easier to have two, especially for a virtual gathering)

How long do you need?

No more than 3 hours. Take a break after 90 minutes. If it’s easy — great! End early and start fixing. If it’s way more complex than you thought, well, now you have a starting point to keep digging. Come back and try another part the next day.

Set up the mapping space

Across the top of our workspace, we’re going to write stages of the ‘data life cycle’ or ‘data value stream.’ The internet has plenty of examples of what constitutes a data life cycle (including one in the NOAA Fisheries Data Management policy). We use:

Plan | Collect | Transmit | Analyze | Share | Store

We keep Transmit in as its own step because the oceans aren’t fully wired (yet), so getting big files from boats to data centers may involve delays of days or weeks as well as postage. Look at a few examples and pick the phases you need to focus on but resist the urge to list more than seven. Cluster where you think things are fairly well agreed on; split out steps where you think there’s confusion.

We often don’t spend a lot of time on the “Plan” phase but we keep it in because YOU HAVE TO HAVE A DATA PLAN. If you work outside Big Tech you have limited resources for data collection and management. You cannot afford to say “collect everything and we’ll figure it out later.” There are also security and privacy risks to having a big pile of data. If you don’t know what you’re collecting the data for, don’t. At least talk about how you will have a data plan and maybe who’s going to create it.

Work from left to right in columns

Below your life cycle stage, start a column on the far left with the four questions:

1. What “data” are you talking about?

Maybe you want to talk about a lot of data bucketed by source (daily satellite imagery) or maybe you want to be very specific (home addresses associated with licenses). As you move through the life cycle you may branch off data products, such as a group of water temperature sensors that becomes daily summary reports under “Analyze” and “Share.”

2. Who controls the data?

Essentially, who gets to set the rules and permissions and who’s ultimately responsible for availability and quality. This can be interpreted as “ownership” but for some groups that might trigger “data as property” discussions you want to steer clear of. This can also be the place where you talk about tricky legal issues like who’s liable if the data get hacked or how to maintain an evidentiary chain of custody for criminal violations.

3. Who can access the data?

In the detailed notes you want to capture who and how. You might be able to fit this in a sticky box if you have a small group (or a virtual display). If you need quick reference on the wall, go with icons, shapes, or colors so at a glance you can see who can get their hands on what data.

4. Who pays?

Mandatory fisheries monitoring tends to have a blend of financial responsibilities; the government picks up some costs and others are passed on to industry. Data can have value beyond just meeting compliance requirements. One reason we’re having more of these conversations is the growing interest in data access from the fishing industry, global food supply chains, and data analysts outside of fisheries. This question lets you see how the payers and the data users line up.
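If you want to hand the grid out as a worksheet before the meeting, the layout described above (life cycle stages across the top, the four questions down the left) can be generated as a blank table. This is only a rough Python sketch under our own assumptions: the helper name `blank_worksheet` and the choice of CSV output are made up for illustration, and a paper handout works just as well.

```python
import csv
import io

# Life cycle stages across the top, as listed in this post.
STAGES = ["Plan", "Collect", "Transmit", "Analyze", "Share", "Store"]

# The four questions down the left-hand column.
QUESTIONS = [
    "What data are we talking about?",
    "Who controls the data?",
    "Who can access the data?",
    "Who pays?",
]


def blank_worksheet(stages=STAGES, questions=QUESTIONS):
    """Return the empty mapping grid as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row: an empty corner cell, then one column per stage.
    writer.writerow([""] + list(stages))
    # One row per question, with an empty cell under every stage.
    for question in questions:
        writer.writerow([question] + [""] * len(stages))
    return buffer.getvalue()


if __name__ == "__main__":
    print(blank_worksheet())
```

Open the output in any spreadsheet tool and each participant gets one empty cell per question per stage. If you trim or split the stages for your own session, pass your own list as `stages`.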

Here’s an example for a generalized monitoring program using on-vessel video to track catch:

From Collect through Analyze you can see that industry would have to pay for all the costs but couldn’t access data until it’s been analyzed by the vendors and Shared. Then, there’s a big question mark about whether they could ever see the video (and whether they might have to pay extra to do so). Maybe industry doesn’t want to see video and vendors only need to secure video data for government recordkeeping. Or, maybe there’s a different suite of data products that could satisfy enforcement and the science teams. After building out this map with the group, we know where we need to start the discussion about program design and we have a common language to talk about the ingredients.
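Once the map is written down, the "who pays versus who can access" comparison can also be checked mechanically. The Python sketch below is one possible way to do that, not something from our workshops. Every party name and cell value in it is a hypothetical stand-in loosely modeled on the video example (in particular, who pays at the Share stage is our assumption), and `pays_without_access` is an invented helper.

```python
# Hypothetical entries loosely following the video-monitoring example in
# this post. Party names and cell values are illustrative assumptions,
# not data from a real program.
STAGES = ["Collect", "Transmit", "Analyze", "Share"]

data_map = {
    "Collect": {"pays": {"industry"}, "access": {"vendor"}},
    "Transmit": {"pays": {"industry"}, "access": {"vendor"}},
    "Analyze": {"pays": {"industry"}, "access": {"vendor"}},
    # Assumed for illustration: government covers sharing costs.
    "Share": {"pays": {"government"}, "access": {"government", "industry"}},
}


def pays_without_access(data_map, stages):
    """List (stage, party) pairs where a party pays but has no access."""
    gaps = []
    for stage in stages:
        cell = data_map[stage]
        # Set difference: everyone who pays minus everyone with access.
        for party in sorted(cell["pays"] - cell["access"]):
            gaps.append((stage, party))
    return gaps


if __name__ == "__main__":
    for stage, party in pays_without_access(data_map, STAGES):
        print(f"{stage}: {party} pays but cannot access the data")
```

With these made-up values the check flags industry at Collect, Transmit, and Analyze, which is the same mismatch the group would spot on the wall. The value is less in the code than in forcing each cell to name specific parties.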

Have you tried similar mapping for alignment exercises in your work on fish (or something else)? Let us know what you’ve discovered.