
As discussed in Section 2.2, data quality has no agreed-upon definition, and apart from being cross-disciplinary, it is also subjective [101]. Moreover, publishing data on portals does not guarantee that it is of good or high quality [35,116]. For these reasons, we do not define what makes published data good quality. Instead, we discuss the different aspects which influence the quality of the data, whether positively or negatively, and ultimately affect the (re-)use of the published data.

Ochoa and Duval [100] propose a set of metrics to identify metadata quality, based on parameters used for human reviewing. The authors of [116] build upon these metrics, adapting them for assessing the quality of the actual data, rather than the metadata. Similarly, in [71,79], the authors discuss a number of quality dimensions, as found in the majority of related literature. We here establish the following criteria which are considered by most efforts in the literature for calculating data quality.

Usability - This is the most “generic” quality criterion. By usability we mean how easily the published data can be used. It is the most generic because whether the published data is usable depends on the other quality dimensions. For example, it is directly related to the degree to which the data is accessible, open, interoperable, complete, and discoverable [77,88]. The more usable the published data is, the more potential data consumers are encouraged to re-use and exploit it.

Accuracy - By accuracy we mean the extent to which a data/metadata record correctly describes the respective information [71,87,116]. With respect to metadata, this quality dimension directly affects the discoverability of datasets, as good quality metadata enables the dataset to be easily discovered by data consumers.

4.3 Data Quality

Completeness - This quality dimension deals with the number of completed fields in a data/metadata record [100,116,136]. Thus, a record is considered complete only when the record contains all the information required to have the ideal representation of the described data. The completeness of the metadata, like accuracy, also directly affects the discoverability of datasets.

Consistency - The consistency of record fields depends on whether they follow a consistent syntactical format, without contradiction or discrepancy within the entire catalogue of metadata [71,82]. Apart from the syntactical format, a field is considered to be consistent if the respective values are selected from a fixed set of options. An example of inconsistency is if within two records the use of “U.S” and “United States” is interchangeable. Another example is the representation of dates, where the date, month and year follow an arbitrary order.

Timeliness - By this quality dimension we mean the extent to which the data or metadata is up to date. As pointed out in Section 4.1, the organisational approach affects the timeliness of the published data, which depends on whether the data is directly or indirectly provided by the data publisher.

Accessibility - As identified by the authors of [100], the accessibility quality dimension has two measures. The cognitive accessibility defines how easy it is for a data consumer to understand the published information. Several aspects of the data affect the cognitive accessibility, such as the ambiguity of the data, discussed in Section 3.5. The second measure is the psychological or logical accessibility, which can be defined as the ease with which the relevant dataset is discovered through a data catalogue or repository. This quality dimension is affected by the format in which the data is published, the search tool used, and the discoverability of the dataset [82].

Openness - The openness of a dataset directly influences the use, re-use, and re-distribution of data. Tim Berners-Lee’s Five Star Scheme for Linked Open Data (Figure 4.1) can be seen as a mix of the accessibility and usability quality dimensions. As the authors of [71] point out, data can technically be considered open if it is available as a complete set in an open, machine-readable format, at a reasonable price which is not more than the cost of reproduction.
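Some of the dimensions above, notably completeness and consistency, lend themselves to simple measurements. The following is a minimal sketch in Python and not the metrics defined in [100] or [116]. It assumes a metadata record is a plain dictionary, and the list of required fields, the set of allowed country values, and the year-month-day date convention are all hypothetical choices made for illustration.

```python
from datetime import datetime

# Hypothetical required fields; a real catalogue would define its own schema.
REQUIRED_FIELDS = ["title", "description", "publisher", "country", "modified"]

# Hypothetical fixed set of options for the "country" field.
ALLOWED_COUNTRIES = {"United States", "Czech Republic", "Germany"}


def completeness(record: dict) -> float:
    """Share of required fields that hold a non-empty value."""
    filled = sum(1 for f in REQUIRED_FIELDS if str(record.get(f) or "").strip())
    return filled / len(REQUIRED_FIELDS)


def is_ymd_date(value) -> bool:
    """True if the value parses as a year-month-day date such as 2015-03-05."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def consistency_issues(record: dict) -> list:
    """Names of fields that break the assumed conventions."""
    issues = []
    if record.get("country") not in ALLOWED_COUNTRIES:
        issues.append("country")
    if not is_ymd_date(record.get("modified")):
        issues.append("modified")
    return issues


# Example record with an empty description, a non-canonical country value,
# and a date in an arbitrary order.
record = {
    "title": "City budget 2015",
    "description": "",
    "publisher": "City Hall",
    "country": "U.S",
    "modified": "03/05/2015",
}
```

For this record, four of the five required fields are filled, and both the country and the date are flagged as inconsistent. Accuracy and timeliness are harder to measure in this way, since they require comparison against the described data itself.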

Figure 4.1: Five Star Scheme for Linked Open Data (Source: 5stardata.info).

The authors of [71], Kučera et al., identify two types of strategies for improving data quality, namely data-driven and process-driven. The first involves directly modifying the values of data, such as correcting invalid data values or normalising data. The second involves the redesign of the data creation and modification processes in order to identify and correct the cause of quality issues, such as implementing a data validation step in the data acquisition process.
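The difference between the two strategies can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than a technique taken from [71]: the alias table, the function names, and the choice of a country field are hypothetical.

```python
# Hypothetical mapping from observed variants to a canonical value.
COUNTRY_ALIASES = {
    "u.s": "United States",
    "u.s.": "United States",
    "usa": "United States",
    "united states": "United States",
}

CANONICAL_COUNTRIES = set(COUNTRY_ALIASES.values())


def normalise_country(value: str) -> str:
    """Data-driven strategy: rewrite an already stored value to its canonical form."""
    cleaned = value.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


def validate_on_acquisition(record: dict) -> dict:
    """Process-driven strategy: reject a record before it enters the catalogue."""
    country = record.get("country", "")
    if country not in CANONICAL_COUNTRIES:
        raise ValueError(f"unknown country value: {country!r}")
    return record
```

Here `normalise_country("U.S")` repairs an existing value after the fact, whereas `validate_on_acquisition` raises an error for the same value so that the problem is addressed at its source.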

Efforts in publications such as [33,71,116] take a number of quality dimensions and implement them, with the aim of assessing the quality of published data. Debattista et al. [33] evaluate and assess the datasets’ quality in such a way that consumers can then identify the ideal quality for the intended use, attaching the results of the evaluation to the actual dataset graph. In [71], Kučera et al. focus on the quality of catalogue records within initiatives in the Czech Republic. They proceed to propose some techniques and tools to improve the quality of the data catalogue records. Similarly, Reiche and Höfig [116] propose quality assessment metrics and implement them in three public government data repositories.

Chapter 5

Budget Data: A Use Case and an Assessment Model

Budgetary or financial data is collected and maintained by all governments and public administrations as part of their day-to-day administration. Whilst mostly related to transparency efforts by governments, budget data can also improve democratic participation, allow comparative analysis of governments, and boost data-driven businesses. The importance of publishing government budgetary data can therefore be summarised in four key elements as follows:

1. Transparency - Opening budget data unveils public funds’ management. This increases accountability and can augment citizens’ trust in public administration, whilst having the potential to uncover hidden transactions and thus prevent corruption. An important factor which can stimulate corruption is the fact that funding goes through the hands of public officials without further scrutiny. In European Union Member States, this is particularly evident within public procurement, which is prone to corruption owing to deficient control mechanisms [43].

2. Participation - Opaque regimes may compel citizens to engage against the government. A transparent public administration, on the contrary, can stimulate social participation in community enhancement. Open budget initiatives can not only enable meaningful civil societal scrutiny of transnational financial flows, but they can also provide platforms for stakeholders to develop benchmarks that in turn create pressure on public authorities to provide data in a timely, comparable, re-useable and well-structured manner. These platforms can also involve local citizens in the budget planning and auditing phases, by allowing them to interact with the process, providing opinions and suggestions on setting budget priorities, and providing feedback on the published transactions. A virtuous circle can be created, in which both public officials and civil society will realise the value of data and analysis tools, in a collaborative environment open to contributions and engagement.

3. Comparative Analysis - Well organised budget data enables researchers and policy makers to compare spending strategies between cities, states, and countries, and also among different administration levels. Visualisation, analytics, and exploration tools can offer different stakeholders an opportunity to scrutinise and interpret financial data related to a region of interest. They also allow comparing allocations and transactions between multiple regions, visualising detected trends and budget projections, and investigating anomalies and activities that have been flagged as suspicious. A necessary condition for comparative analysis is the compatibility and consistency of data from different data sources.

4. Business Value - Publishing budget data can stimulate the creation, delivery, and use of new services on a variety of devices that utilise new web technologies, coupled with open public data. In fact, Manyika et al. [83] estimate that open data can help unlock between 3 and 5 trillion U.S. Dollars in economic value annually. Budget data can also generate value by empowering journalists when they report on spending items, and accurate information on public funds’ usage may enable content producers to create better articles.

As a sub-set of the broader open government data, budget data is also being published in open government data initiatives, and unfortunately suffers from similar challenges and issues. A core issue is the large number of diverse data structures that make the comparison and aggregate analysis of transnational financial flows practically impossible. The tools to present, search, download and visualise this financial data are also nearly as diverse as the number of existing portals. This heterogeneity may even prevent an analysis of the quality of the data for the same funds administered by different funding authorities [145]. Moreover, most budget publishing efforts result in simple data catalogues, fragmented and dispersed, because they do not share standards and methodologies [145]. This absence of standards can lead to data misuse [161], or even to results opposed to the initial aims [48].

In this chapter we propose a structured analysis framework to analyse open government initiatives that publish budget data. Our aim is to identify problems generated by the lack of standards and help policy makers to understand the importance of various aspects of publishing budget data. We also envision the framework as a tool to design more adequate budget publishing systems. Together with other ongoing initiatives [102,152], we believe that the development of a solid standard can help governments to make their budget data more usable, and thus enable citizen participation in the democratic process.

5.1 Terminology

Although budget-specific terms are already defined in the Economics or Accountancy fields¹, some of the basic concepts used in this chapter still lack a specific definition in the context of open data. We here provide a very short description of the most relevant concepts as used in this chapter.

Budget - The description of the amount of money planned to be spent in a specified time period. Budget descriptions can refer to several levels of specificity, from general (total amount to be spent) to specific (amount by area, or category). There are different types of budget, such as proposed, planned, and certified, which is presented after the budget term.

Spending - Also known as expenditure, spending refers to the amount of money actually spent by the public administration. It can also be seen as the realisation of the budget. There also exist different types of spending, such as planned (according to the budget), authorised (payment order) and executed (money transferred from government to the recipient).

Revenue - The amount of money received by a government administration. Revenues can have several types of origins, such as taxes (revenue, commercialisation), service fees (transportation), royalties (oil and mine exploration), concessions (roads), or financial operations. Predicted revenues, used to specify the budget, may differ from the actual revenues.

Open Budget Data - Any electronic file or set of files containing structured data related to Budget, Revenue or Spending. In order to be strictly designated as “open”, data should follow the Eight Open Government Data Principles; however, open data can have varying degrees of openness.

Open Budget Initiative - This refers to any effort that aims to publish budget data. This can take shape as a portal or application which publishes open budget data and allows stakeholders to access it.
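To make the relationship between these terms concrete, the sketch below models a budget line and spending records in Python. It is only an illustration: the class names, the fields, and the `execution_rate` helper are hypothetical and do not correspond to any existing budget data standard.

```python
from dataclasses import dataclass
from enum import Enum


class SpendingType(Enum):
    """The spending types named above."""
    PLANNED = "planned"
    AUTHORISED = "authorised"
    EXECUTED = "executed"


@dataclass
class BudgetLine:
    category: str   # area or category the money is planned for
    period: str     # time period, e.g. a fiscal year
    amount: float   # amount planned to be spent


@dataclass
class Spending:
    category: str
    period: str
    amount: float
    type: SpendingType


def execution_rate(budget: BudgetLine, spending: list) -> float:
    """Share of a budget line realised by executed spending
    in the same category and period (hypothetical helper)."""
    executed = sum(
        s.amount
        for s in spending
        if s.type is SpendingType.EXECUTED
        and s.category == budget.category
        and s.period == budget.period
    )
    return executed / budget.amount


budget = BudgetLine(category="education", period="2015", amount=1000.0)
payments = [
    Spending("education", "2015", 250.0, SpendingType.EXECUTED),
    Spending("education", "2015", 500.0, SpendingType.AUTHORISED),
    Spending("health", "2015", 100.0, SpendingType.EXECUTED),
]
```

With these example values, only the first payment counts towards the education budget line, so a quarter of it has been realised. Revenue could be modelled in the same way, with separate records for predicted and actual amounts.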

¹ The http://financial-dictionary.thefreedictionary.com/ compiles several of these definitions.