Open Data Resource Pack
This Open Data Resource Pack is intended to help public authorities throughout Scotland develop and implement their own plans for open data.
6. Select your data
How do you decide which data to publish first? Prioritisation of data release is necessary as it is impractical and potentially costly to release all your data at once. There is no definitive guidance on data prioritisation; an organisation can choose to select its data in many ways, depending on its goals.
This section presents a list of practical guidelines based on best practice from around the world. Annex A has a simple downloadable checklist which has been developed to help you navigate each of the steps.
How to select your datasets?
Step 1: Identify your data and create an asset register
Before you can decide which datasets to release, you must identify what data you hold. If you do not know what data your organisation has, then you may overlook valuable data.
This may seem like a daunting task as your organisation will likely hold its data in various places and across multiple platforms, for example databases, spreadsheets, folders, documents and websites. Do not let the size and scale of the task put you off. This is an important step in your open data journey and will also be useful for other work related to Freedom of Information and Re-Use of Public Sector Information. Beginning with the identification of high level datasets and adding granularity over time will make the task more manageable.
It may be useful to ask colleagues across your organisation to help with this step, as people are likely to have a good understanding of the data held within their own department or division.
Capture metadata
When you are identifying your data you should begin to capture metadata. Metadata is descriptive information about the data. It can describe elements such as the content, format, currency and limitations. More guidance on metadata can be found elsewhere in this pack.
At this stage you should attempt to capture as much metadata as possible as it will make things simpler in later stages. You should begin with what you would like to include in your asset register. The checklist in Annex A provides a short list of key metadata elements which you should begin capturing.
Asset Register
You now have enough information to create a comprehensive list of all the data you hold. Your asset register will be used to create an Open Data Publication Plan which will inform the public about the data you hold and intend to release as open data.
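As a minimal sketch of what an asset register could look like in practice, the Python below stores one record per dataset and writes the register out as CSV. The field names are an assumption based on the metadata elements mentioned above (content, format, currency and limitations); the class name, the `department` field and the example rows are hypothetical and are not taken from the Annex A checklist.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class AssetEntry:
    """One row of a hypothetical asset register (field names are assumptions)."""
    title: str        # name of the dataset
    department: str   # team or division that holds the data
    content: str      # short description of what the data covers
    data_format: str  # where or how it is held, e.g. spreadsheet, database
    currency: str     # how up to date the data is
    limitations: str  # known gaps or caveats


def register_to_csv(entries):
    """Serialise the register to CSV text: a header row, then one row per dataset."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(AssetEntry)],
        lineterminator="\n",
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Made-up example entries, for illustration only.
register = [
    AssetEntry("School roll", "Education", "Pupil numbers by school",
               "spreadsheet", "updated annually", "state schools only"),
    AssetEntry("Gritting routes", "Roads", "Priority winter routes",
               "PDF on website", "updated each winter", "no coordinates"),
]

csv_text = register_to_csv(register)
print(csv_text)
```

Keeping the register in a simple tabular form like this makes it easy to start with high level datasets and add rows or columns as more detail is captured.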
Example asset registers
Department for Transport Information Asset Register
Home Office Information Asset Register
This asset register does not need to be published and can be kept as an internal resource. However, it would be possible to combine an open data asset register with your organisation's PSI asset register. The 2015 PSI Regulations require your organisation to publish a register of both published and unpublished information assets which fall within its public task. Some of the potential open data that your organisation holds may fall outwith its public task, so the PSI asset register could be extended to cover all of the data your organisation holds.
Your register is not static, because the information you hold will change over time. Your asset register(s) should be reviewed and updated at regular intervals.
Useful Links
National Archives Asset List Guidance
National Archives Identifying Information Assets and Business Requirements
National Archives Information Asset Register Guidance
National Archives Public Sector Information Guidance
W3C Best Practice - Discover published information by site scraping
W3C Best Practice - Identifying what you already publish
Value Assessment
During the initial stages of identifying data and capturing metadata, you should make an initial assessment about the data's value and priority for release. An initial value assessment can help identify potential priority releases. Each organisation will assess the value of their data differently, depending on their priorities and quality of available data. The checklist in Annex A has a handy list of areas which should be considered in order to assess value.
Step 2: Select the open data you want to publish
When it comes to selecting data to publish, there is no right way. The important thing is to begin putting data out there. We recommend starting small and building up. Focusing on a few key datasets will help you create a maintainable publishing process. You should then add more data over time.
You will have to consider dataset prioritisation. Which datasets should you release first? If you have identified a few priority releases, should these be released together or separately? When prioritising your data you will begin shaping a plan for future releases. This plan or schedule will be helpful when compiling your Open Data Publication Plan.
There are a number of easy ways to begin prioritising your data.
Start with your goal
You should return to the goals of your open data project and identify the datasets which support the realisation of those goals.
Quick wins
Sometimes an organisation may choose to release data which is easy to publish openly. Examples include upgrading data already published online (PDFs, Excel files, Word documents or other formats) into an open format. As this data is already released to the public, converting it to an open format should be easy and non-contentious.
Small, easy releases help get the project off the ground and build momentum, but organisations should be careful not to rely on easy releases for too long as the public may lose faith in the initiative if valuable datasets remain closed.
Demand driven release
Release the data that users want. Examine informal requests (e-mails, calls) and formal requests (FOISA requests) for data. Does your organisation have a Twitter or Facebook page? Check the comments to see if there are suggestions about possible data you could release. By making the most requested datasets available in a discoverable, open format you can satisfy public demand and help reduce administrative burdens on departments, for example through fewer enquiries or requests.
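One simple way to examine requests, sketched here in Python, is to log each request against the dataset it asks for and count the totals. The request log, channel labels and dataset names below are invented for illustration; in practice they would come from your own FOISA and enquiry records.

```python
from collections import Counter

# Hypothetical request log: (channel, dataset requested).
requests = [
    ("FOISA", "Road gritting routes"),
    ("email", "Public toilet locations"),
    ("FOISA", "Road gritting routes"),
    ("social media", "Road gritting routes"),
    ("phone", "Public toilet locations"),
    ("email", "Planning applications"),
]


def rank_by_demand(request_log):
    """Return (dataset, request count) pairs, most requested first."""
    counts = Counter(dataset for _channel, dataset in request_log)
    return counts.most_common()


ranking = rank_by_demand(requests)
for dataset, count in ranking:
    print(f"{dataset}: {count} request(s)")
```

A count like this is only a starting point, since it treats every request equally regardless of channel, but it gives a quick view of which datasets are asked for most often.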
Another way is to ask the public what it wants. As they are the people who will be using the data, they will likely have a good understanding of which data would be useful. Invite the public to suggest ideas via social media, surveys or your own website. Hack events are another great way to generate interest in your open data and find out what people want or need to make their ideas a reality.
Scottish Government Dialogue App
Between 8th June and 14th July 2015 the Scottish Government held an open data discussion on the Dialogue App. The Dialogue App is crowdsourcing software designed for government. It allows the public to suggest, rate and comment on ideas in a collaborative way. The most popular and important ideas can then be easily identified and viewed.
As part of the Open Data Strategy, the Scottish Government made a commitment to engage with the public about which datasets they would like to see released from public sector organisations. The Dialogue App was chosen to hold the discussion as the format enabled everyone to participate in an open discussion.
Over the course of 5 weeks a total of 18 ideas were posted from 9 individuals. Nearly a quarter of the ideas submitted related to the release of information about public sector assets, physical (buildings, land) and non-physical (information, asset registers).
More information about the findings from the Dialogue App discussion and lessons learned can be found in Annex B.
Follow best practice
Open data is growing and there are many public sector organisations both in Scotland and worldwide that are beginning to release their data openly. Don't reinvent the wheel: copy what has worked for others and build upon their success.
Cities and departments all over the world are beginning to release their own open data catalogues. Spend some time browsing their sites, see which datasets are popular and which ones your own organisation could release.
Examples of Open Data Portals
This list is a very small snapshot of the portals available!
The G8 Open Data Charter, Open Data Barometer and the Open Data Census have all published works detailing what should be considered high value datasets and priority releases. Of course, some of the datasets may not be relevant to your organisation and you may not be ready to release them just yet, but it is a good starting point if you don't know where to begin.
The following table lists the 14 categories which the G8 considers high value, priority releases. Examples of the types of data which fall under each category are also listed.
G8 High Value, Priority Releases
| G8 Category | Example datasets |
|---|---|
| Companies | Company/business register |
| Crime and Justice | Crime statistics, safety |
| Earth observation | Meteorological/weather, agriculture, forestry, fishing, and hunting |
| Education | List of schools; performance of schools, digital skills |
| Energy and Environment | Pollution levels, energy consumption |
| Finance and contracts | Transaction spend, contracts let, call for tender, future tenders, local budget, national budget (planned and spent), international trade data |
| Geospatial | Topography, postcodes, national maps, local maps |
| Global Development | Aid, food security, extractives, land |
| Government Accountability and Democracy | Government contact points, election results (national and local), legislation and statutes, salaries (pay scales), hospitality/gifts |
| Health | Prescription data, performance data, doctor surgery locations |
| Science and Research | Genome data, research and educational activity, experiment results |
| Statistics | Data used to produce Official Statistics including the Census, sample surveys and administrative data. E.g. Datasets would include GDP, skills, unemployment |
| Social mobility and welfare | Housing, health insurance and unemployment benefits |
| Transport and Infrastructure | Public transport timetables, access points broadband penetration |
Useful reading
W3C Best Practice - Discover published information by site scraping
W3C Best Practice - Identifying what you already publish
W3C Best Practice - Understand demand for data
Sunlight Foundation Open Data Guidelines 1 - 7
Step 3: Develop an Open Data Publication Plan
Once you have decided which data you want to publish as open data you should develop a publication plan. The benefit of an Open Data Publication Plan is that the public will have a comprehensive list of the datasets you will be publishing as open data and when they will be released.
The publication plan does not replace the publication scheme you are required to have under section 23 of FOISA. It should be part of your publication scheme which should:
- signpost your publication plan in your Guide to Information
- explain briefly how your open data will be published
Contact the Scottish Information Commissioner for more information about the Freedom of Information (Scotland) Act and publication schemes.
The publication plan shows the authority's commitment to open data and demonstrates its understanding of the benefits which releasing data openly can bring. As a guide, it is recommended that any Open Data Publication Plan should:
- tell users what information is available as open data
- explain when the information will be available, if it is not already
- tell users the currency of the data, available formats and licensing conditions
- provide contact details should someone want to get in touch about the dataset
- provide details about how users can make recommendations for future releases
An aim of the Open Data Strategy is for all Scottish public authorities to have published their Open Data Publication Plans by December 2015. Annex A has a link to the template which has been designed to help you do this.
The template uses much of the information captured in the dataset asset register. The main difference between the asset register and the publication plan is that the publication plan will only identify the datasets that your organisation has released as open data, or intends to release as open data in the future.
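The relationship between the two documents can be sketched in Python as a simple filter over the asset register. This is an illustration under stated assumptions, not the Annex A template: the `open_data_status` field, its three values and the example entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegisterEntry:
    """A hypothetical asset register row with an open data status."""
    title: str
    open_data_status: str               # assumed values: "released", "planned", "not planned"
    release_date: Optional[str] = None  # actual or intended release date, if known


def build_publication_plan(register):
    """Keep only datasets already released, or intended for release, as open data."""
    return [
        entry for entry in register
        if entry.open_data_status in {"released", "planned"}
    ]


# Made-up register entries, for illustration only.
register = [
    RegisterEntry("School roll", "released", "2015-09-01"),
    RegisterEntry("Gritting routes", "planned", "2016-01-15"),
    RegisterEntry("Staff absence records", "not planned"),
]

plan = build_publication_plan(register)
for entry in plan:
    print(entry.title, entry.open_data_status, entry.release_date)
```

Deriving the plan from the register in this way means the two stay consistent: when a dataset's status changes in the register, it appears in or drops out of the plan the next time the plan is generated.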
Completion of the publication plan will likely happen in combination with the creation of a dataset. Further guidance on creating datasets can be found in section 7.