
data sources and methods

Network topology

The network topology and the geography of substations and transmission lines were extracted from the geographical vector data of the online ENTSO-E Interactive Map [94] using the GridKit toolkit [95], and the result has been published at [96]. The extract was corrected in several steps:

1. 29 alternating current (AC) lines that manual comparison with the online map identified as inadvertent duplicates were removed.

2. Three converters were introduced at the ends of dangling high-voltage direct current (HVDC) lines and at the border between Poland and Lithuania.

3. 64 transformers and 12 lines were added between buses less than 1 km apart.

4. 60 AC lines carrying circuits of two different voltage levels were identified by inspecting the descriptive text tag and were split into several lines.

The electrical parameters are derived from each line's length and number of circuits by assuming the standard AC line types in Table 3.1. The DC line capacities are assigned from the table in [97]. Since the map contains no transformer information, a single transformer with a capacity of 2 GW (i.e. equivalent to four 500 MW transformers) and a reactance of 0.1 per unit is placed between buses of different voltage levels at the same location. This capacity assumption is deliberately generous to avoid introducing constraints where none exist in reality.
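As a rough illustration of how Table 3.1 translates into per-line electrical parameters, the Python sketch below scales the per-kilometre values by the line length and the number of parallel circuits. The helper name `line_parameters`, the 50 Hz frequency and the treatment of parallel circuits (series impedance divided, shunt susceptance and thermal rating multiplied) are our assumptions for illustration, not code taken from PyPSA-Eur.

```python
import math

# Per-circuit standard line types from Table 3.1:
# voltage (kV) -> (series resistance Ω/km, series reactance Ω/km,
#                  shunt capacitance nF/km, thermal current limit A)
LINE_TYPES = {
    220: (0.06, 0.301, 12.5, 1290),
    300: (0.04, 0.265, 13.2, 1935),
    380: (0.03, 0.246, 13.8, 2580),
}

def line_parameters(v_kv, length_km, circuits, f_hz=50.0):
    """Hypothetical helper: parameters of a line made of `circuits`
    identical parallel circuits of the standard type for `v_kv`."""
    r_km, x_km, c_nf_km, i_max = LINE_TYPES[v_kv]
    # Identical parallel circuits: series impedance divides ...
    r = r_km * length_km / circuits  # Ω
    x = x_km * length_km / circuits  # Ω
    # ... while shunt susceptance adds up (b = 2πfC).
    b = 2 * math.pi * f_hz * c_nf_km * 1e-9 * length_km * circuits  # S
    # Three-phase thermal limit per circuit: sqrt(3) * V * I (kV * A = kVA).
    s_nom = math.sqrt(3) * v_kv * i_max * circuits / 1e3  # MVA
    return {"r": r, "x": x, "b": b, "s_nom": s_nom}

params = line_parameters(380, 100.0, 2)
print(params["x"], round(params["s_nom"]))
```

The apparent-power column of Table 3.1 is consistent with this relation: for example, $\sqrt{3} \cdot 380\,\mathrm{kV} \cdot 2580\,\mathrm{A} \approx 1698$ MVA.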

Restricting the network to buses and transmission lines at the voltage levels 220 kV, 300 kV and 380 kV within the landmass or exclusive economic zones of the European countries, and removing 41 disconnected stub sub-networks of fewer than 10 buses, produces the transmission network in Figure 3.1. It comprises all current transmission lines plus several that are already under or close to construction (these are marked in the dataset). In total, the model contains 5586 high-voltage alternating current (HVAC) lines with a volume of 241.3 TW km (of which 11.4 TW km are still under construction), 26 HVDC lines with a volume of 3.4 TW km (of which 0.5 TW km are still under construction) and 4653 substations.

Voltage level  Wires  Series resistance  Series reactance  Shunt capacitance  Current thermal limit  Apparent power thermal limit
(kV)                  (Ω/km)             (Ω/km)            (nF/km)            (A)                    (MVA)
220            2      0.06               0.301             12.5               1290                   492
300            3      0.04               0.265             13.2               1935                   1005
380            4      0.03               0.246             13.8               2580                   1698

Table 3.1: Standard line types for overhead AC lines [98]

Figure 3.1: Transmission network model (legend: 220 kV, 300 kV, 380 kV)

Each country is partitioned into the Voronoi cells of its substations, which serve as catchment areas; each cell is assumed to be connected to its substation by lower-voltage network layers. These Voronoi cells are used to assign power plant capacities to substations and to determine the feed-in from potential renewable energy generation, as well as the share of demand drawn at each substation.
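Because a point belongs to the Voronoi cell of the substation nearest to it, assigning locations to catchment areas reduces to a nearest-neighbour search. The NumPy sketch below illustrates this under the assumption of planar (projected) coordinates; the function name is hypothetical and this is not the PyPSA-Eur implementation. For large inputs, a k-d tree such as SciPy's `cKDTree` would replace the brute-force distance matrix.

```python
import numpy as np

def assign_to_voronoi_cells(substations_xy, points_xy):
    """Return, for each point, the index of the substation whose Voronoi
    cell contains it, i.e. the nearest substation in planar coordinates."""
    s = np.asarray(substations_xy, dtype=float)  # shape (S, 2)
    p = np.asarray(points_xy, dtype=float)       # shape (P, 2)
    # Pairwise squared distances between points and substations, shape (P, S).
    d2 = ((p[:, None, :] - s[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

substations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(1.0, 1.0), (9.0, 2.0), (2.0, 8.0), (6.0, 1.0)]
print(assign_to_voronoi_cells(substations, points))  # [0 1 2 1]
```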

Conventional power plants

Official sources often report only country-wide capacity totals keyed by fuel type and year, such as the Eurostat nrg_113a database [99], the ENTSO-E net generation capacity [100] or the ENTSO-E Scenario Outlook and Adequacy Forecast (SO&AF) [101, 102], while only seven countries¹ have official power plant lists, which have been collected and standardised by the Open Power System Data (OPSD) project [103].

This gap has been gradually closing since ENTSO-E started maintaining a power plant list (ENTSO-E PPL) on its Transparency Platform [104]. Unfortunately, this list is still far from complete: for instance, even after excluding solar and wind generators, the total capacity represented for Germany amounts to only about 54.5 GW, while the SO&AF reports 111 GW, 109 GW of which are also covered in the German Bundesnetzagentur (BNetzA) Kraftwerksliste [105] when power plants that have been permanently shut down are excluded.

The powerplantmatching (PPM) tool and database [106] we present in this section achieves good coverage by (1) standardising the records of several freely available databases, (2) linking them using a deduplication and record linkage application and (3) reducing the connected claims about fuel type, technology, capacity and location to the most likely ones.

PPM incorporates several power plant databases that are either published under free licenses allowing redistribution and reuse or are at least freely accessible. In order of approximate reliability, these are OPSD [103], ENTSO-E PPL [104], DOE Energy Storage Exchange (ESE) [107], Global Energy Observatory (GEO) [85], Carbon Monitoring for Action (CARMA) from 2009 [108, 109] and the WRI Powerwatch project [110]. All of them are brought into the standardised tabular structure outlined in Table 3.2 by explicit maps between the various naming schemes and by additional heuristics that identify common fuel-type or technology keywords such as "lignite" or "CHP" in the Name column. Furthermore, the Name column is cleaned by removing frequently occurring tokens, for instance "power plant" or block numbers.
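A minimal sketch of such a Name-cleaning step is shown below. The token list, the block-number patterns and the function name are illustrative assumptions; they are not the rules actually used by PPM.

```python
import re

# Hypothetical list of frequently occurring, uninformative tokens.
FREQUENT_TOKENS = ["power plant", "power station", "kraftwerk", "centrale"]

def clean_name(name):
    """Normalise a power plant name for fuzzy matching (illustrative rules)."""
    s = name.lower()
    for token in FREQUENT_TOKENS:
        s = s.replace(token, " ")
    # Drop block/unit designations such as "block 3" or "unit b".
    s = re.sub(r"\b(block|unit)\s*[0-9a-z]?\b", " ", s)
    # Drop standalone Arabic or Roman numerals, typical block numbers.
    s = re.sub(r"\b([0-9]+|[ivx]+)\b", " ", s)
    # Collapse the remaining whitespace.
    return " ".join(s.split())

print(clean_name("Kraftwerk Neurath Block 2"))  # neurath
```

Removing such tokens before comparison keeps string metrics like the Jaro distance from being dominated by words that almost every record shares.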

Since OPSD, ENTSO-E PPL and ESE report individual power plant units for at least some power plants, in a first step we use the deduplication mode of the Java application Duke [111] to determine which units belong to the same power plant. Duke is a free software extension of the search engine library Lucene that determines the probability that a pair of records (of the same or different tables) refers to the same entity. It computes conditional probabilities $p^{i,j}_c := P(M_{i,j} \mid x^{i,j}_c)$ for the event $M_{i,j} :=$ "records $i$ and $j$ match", given the data $x^{i,j}_c$ in column $c$ of these records, from mostly character-based similarity metrics such as the Jaro distance or the Q-gram distance [112], skewed into a configurable interval. It then combines them into an overall matching probability as

$$ p_{i,j} := P\Big(M_{i,j} \,\Big|\, \bigcap_c x^{i,j}_c\Big) = \frac{\prod_c p^{i,j}_c}{\prod_c p^{i,j}_c + \prod_c \big(1 - p^{i,j}_c\big)} \tag{3.1} $$
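Eq. (3.1) can be evaluated directly from the per-column probabilities. The sketch below does so with hypothetical input values; the function name is ours and the code does not reproduce Duke's internals, such as how a similarity score is skewed into the configured interval.

```python
import math

def combine_probabilities(column_probs):
    """Overall match probability from per-column probabilities p_c,
    following Eq. (3.1): prod(p) / (prod(p) + prod(1 - p))."""
    num = math.prod(column_probs)
    den = num + math.prod(1.0 - p for p in column_probs)
    return num / den

print(round(combine_probabilities([0.9, 0.8, 0.5]), 4))  # 0.973
```

A column with $p_c = 0.5$ leaves the result unchanged, since it contributes the same factor to numerator and denominator. This is why an interval whose low end lies below 0.5 lets a dissimilar column actively count against a match, while a high end above 0.5 lets a similar column count in favour of it.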

1 BE, DE, FR, HU, IE, IT, LT as listed by the Open Power System Data project at http://open-power-system-data.org/data-sources#23_National_sources


Column            Argument
Name              Power plant name
Fueltype          {Bioenergy, Geothermal, Hard Coal, Hydro, Lignite, Nuclear, Natural Gas, Oil, Solar, Wind, Other}
Technology        {CCGT, OCGT, Steam Turbine, Combustion Engine, Run-Of-River, Pumped Storage, Reservoir}
Set               {PP, CHP}
Capacity          Generation capacity in MW
lat/lon           Latitude and Longitude
Country           {EU-27 + CH + NO (+ UK) minus Cyprus and Malta}
YearCommissioned  Commissioning year
File              Source file of the data record
projectID         Identifier of the power plant in the original source file

Table 3.2: Standardised data structure for the power plant databases.

This formula, a simplified variant of naive Bayesian classification, can be derived from Bayes' theorem under two assumptions: pairwise conditional independence of the $x^{i,j}_c$, and unbiased prior probabilities that two records match or do not, i.e. $P(M_{i,j}) = P(M_{i,j}^C) = 0.5$. The former assumption underlies all naive Bayesian classifiers and ignores, for instance, the correlation between technology and capacity (run-of-river turbines are typically small, around 20 MW, whereas nuclear power plants are typically large, with a median capacity of 2 GW). The latter assumption literally means that any two power plant entries from two different datasets have a prior probability of 50% of referring to the same power plant, whereas the real probability is less than $N/N^2 = 1/N$, as can be seen by comparing two identical datasets of length $N$: of the $N^2$ possible pairs, only $N$ match. Including such a more realistic prior in the matching process would require changing several internals of the Duke library and is beyond the scope of this work. Nevertheless, the model has already been applied successfully in practice [113, 114].
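To make the role of the prior concrete: if each $p_c$ is read as evidence under an even prior, so that its odds $p_c/(1-p_c)$ act as a likelihood ratio, then a general prior $\pi = P(M_{i,j})$ multiplies the combined odds of Eq. (3.1) by $\pi/(1-\pi)$. The sketch below, which is not part of Duke, compares the uniform prior with a prior of $1/N$ for a hypothetical $N = 3000$ and hypothetical column probabilities.

```python
import math

def posterior_with_prior(column_probs, prior):
    """Posterior match probability when the odds of each p_c are treated as
    likelihood ratios and combined with a prior P(M) = `prior`.
    For prior = 0.5 this reduces to Eq. (3.1)."""
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(p / (1.0 - p)) for p in column_probs)
    return 1.0 / (1.0 + math.exp(-log_odds))

probs = [0.99, 0.95, 0.9, 0.8]
print(round(posterior_with_prior(probs, 0.5), 4))       # 1.0
print(round(posterior_with_prior(probs, 1 / 3000), 4))  # 0.9576
```

Even fairly convincing column evidence drops noticeably in confidence under the realistic prior, which is why a fixed threshold such as 0.985 implicitly compensates for the optimistic 50% assumption.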

For the aggregation of power plant units, Duke is configured to use the metrics and intervals described in Table 3.3 and to return the probabilities $p_{i,j} > 0.985$ for likely pairs $i$ and $j$. In the authors' power plant matching tool, these pairs are used as edges in a directed graph of records, and the cliques of this graph² are aggregated into power plants. Note that the low end of the interval for the fuel-type similarity is chosen to prevent merging units with different fuel types into the same power plant.
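The clique-based aggregation can be illustrated with the Bron-Kerbosch algorithm for maximal cliques. This pure-Python sketch treats the match graph as undirected, since a match between records $i$ and $j$ is symmetric, and is an illustration rather than PPM's code; how overlapping cliques are resolved is not covered here.

```python
def maximal_cliques(edges):
    """Maximal cliques of an undirected graph given as an edge list
    (Bron-Kerbosch algorithm without pivoting)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already processed vertices
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

# Units A1-A3 all match each other, B1 and B2 match each other.
links = [("A1", "A2"), ("A2", "A3"), ("A1", "A3"), ("B1", "B2")]
print(sorted(sorted(c) for c in maximal_cliques(links)))
```

Unlike connected components, cliques do not merge a chain such as C1-C2-C3 when C1 and C3 do not match directly; it yields the two cliques {C1, C2} and {C2, C3} instead of one plant.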

2 A clique in a directed graph is a subset of the nodes such that every two distinct nodes are adjacent.

Column        Deduplication                 Record linkage
              Comparator   low    high      Comparator   low    high
Name          JaroToken    0.09   0.99      JaroToken    0.09   0.99
Fueltype      QGram        0.09   0.65      QGram        0.09   0.7
Country       QGram        0.01   0.51      QGram        0.0    0.53
Capacity      Numeric      0.49   0.51      Numeric      0.1    0.75
Geoposition   Geo          0.05   0.55      Geo          0.1    0.8

Table 3.3: Duke comparison metrics and intervals for the aggregation of power plant units (deduplication) and for linking different power plant tables (record linkage).

JaroToken breaks the full string into several tokens, evaluates the Jaro-Winkler distance metric for each and returns the compound Jaccard index [112].

These parameters have been chosen by hand for plausibility; ideally, they should instead be tuned against an ideal match on a representative subset using Duke's genetic algorithm.

For linking the six databases, PPM runs Duke in record linkage mode on every pair of databases and determines the most likely links above the threshold of 0.985. These links are joined into chains by collecting the records across all databases that match the same plant in any database. The chains are reduced by keeping only the longest chains until they are consistent, i.e. until each power plant appears in at most one chain. This could likely be improved by joining chains recursively while keeping track of the chain probability based on a variant of Eq. (3.1), at the expense of no longer being able to rely on the fast pandas routines.
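A rough sketch of the chain reduction described above, assuming each chain is represented as a mapping from database name to record identifier: chains are visited from longest to shortest, and a chain is kept only if none of its records has already been claimed by a chain kept earlier. The data layout, the function name and the tie handling are our assumptions.

```python
def reduce_chains(chains):
    """Keep the longest chains first so that every record ends up in at
    most one chain. `chains` is a list of dicts {database: record_id}."""
    used = set()
    kept = []
    # sorted() is stable, so equally long chains keep their input order.
    for chain in sorted(chains, key=len, reverse=True):
        records = set(chain.items())
        if records & used:
            continue  # conflicts with a longer chain that was already kept
        kept.append(chain)
        used |= records
    return kept

chains = [
    {"OPSD": 1, "ENTSOE": 7},
    {"OPSD": 1, "ENTSOE": 7, "GEO": 42},
    {"GEO": 43, "CARMA": 9},
]
print(reduce_chains(chains))
```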

For the remaining chains, the power plant information is aggregated by taking the most frequent Fueltype, a comma-separated list of the Technology(-ies), the mean lat/lon and the median Capacity. The median ensures that the shutdown or addition of a power plant block that a minority of databases do not yet reflect does not distort the final capacity.
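The aggregation rules can be sketched with the Python standard library, assuming each matched record is a dict with the columns of Table 3.2 (with lat/lon split into two keys). The function name and the record layout are hypothetical; ties between equally frequent fuel types are resolved here by first occurrence, which the text does not specify.

```python
from collections import Counter
from statistics import mean, median

def aggregate_chain(records):
    """Combine the matched records of one chain into a single plant entry."""
    # Most frequent Fueltype (ties: first encountered).
    fueltype = Counter(r["Fueltype"] for r in records).most_common(1)[0][0]
    technologies = sorted({r["Technology"] for r in records if r.get("Technology")})
    return {
        "Fueltype": fueltype,
        "Technology": ", ".join(technologies),
        "lat": mean(r["lat"] for r in records),
        "lon": mean(r["lon"] for r in records),
        # The median is robust to a single source that is out of date.
        "Capacity": median(r["Capacity"] for r in records),
    }

records = [
    {"Fueltype": "Hard Coal", "Technology": "Steam Turbine", "lat": 51.00, "lon": 6.60, "Capacity": 1500},
    {"Fueltype": "Hard Coal", "Technology": "Steam Turbine", "lat": 51.02, "lon": 6.62, "Capacity": 1500},
    {"Fueltype": "Lignite", "Technology": "", "lat": 51.01, "lon": 6.61, "Capacity": 1000},
]
print(aggregate_chain(records)["Capacity"])  # 1500
```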

At the time of writing, the compound dataset contains 3465 power plants with a total capacity of 705 GW. Fewer than a third of these are represented in three or more sources, but they account for about two thirds of the capacity. 2494 small power plants with an average capacity of about 93 MW appear in only two databases. There are a further 841 power plants with 21.7 GW capacity in the OPSD dataset that are unmatched by the other free datasets and compiled exclusively from official sources. After including these power plants, the mean absolute error relative to the SO&AF country-wise capacities amounts to 9% of the average capacity, and the deviation stays below 33% in every single country except Bulgaria and Lithuania. Refer to the companion paper for a more detailed comparison of the free dataset with the proprietary World Electric Power Plants dataset [115].

Hydro-electric generation

Existing hydroelectric capacities result from the same matching process as the conventional power plants, based in particular on the sources ESE and ENTSO-E. The capacities are categorised into run-of-river, reservoir and pumped storage. Reservoir and pumped storage have energy storage capacities, which are estimated by distributing the country-aggregated energy storage capacities reported by [116, 117] in proportion to power capacity. Run-of-river as well as reservoir hydro capacities receive an hourly-resolved in-flow of energy. Extensions to the current hydro capacities are not considered.
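Distributing a country's energy storage capacity in proportion to power capacity is a simple proportional split, sketched below with hypothetical plant names and numbers.

```python
def distribute_storage(country_energy_mwh, plant_power_mw):
    """Split a country's energy storage capacity among its reservoir or
    pumped-storage plants in proportion to their power capacity."""
    total_power = sum(plant_power_mw.values())
    return {name: country_energy_mwh * p / total_power
            for name, p in plant_power_mw.items()}

print(distribute_storage(12000.0, {"A": 300.0, "B": 100.0}))
# {'A': 9000.0, 'B': 3000.0}
```

A consequence of this rule is that all storage plants in one country end up with the same energy-to-power ratio (30 hours in this example).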

Renewable generation time series such as hydro-electric in-flow, wind and solar are derived from the re-analysis weather dataset CFSv2 by the US National Oceanic and Atmospheric Administration [118]. It provides wind speeds, irradiation, surface roughness, temperature and run-off on a 0.2° × 0.2° spatial raster ($x \in X$) at hourly resolution since 2011.

The simplified in-flow time series is generated as in [40, 116] by aggregating, for each country $c$, the total potential energy of the CFSv2 run-off data $R_x$ at height $h_x$ relative to sea level:

$$ G^H_c(t) = N \sum_{x \in X(c)} h_x R_x(t) \tag{3.2} $$

where $N$ is chosen so that $\int_t G^H_c(t)\,\mathrm{d}t$ matches the U.S. Energy Information Administration (EIA) annual hydroelectricity generation [119]. The in-flow is distributed to all run-of-river and reservoir capacities in proportion to their power capacity.
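On an hourly raster, the integral in the normalisation becomes a sum over time steps. The NumPy sketch below implements Eq. (3.2) for one country under that assumption; array shapes, the function name and the toy numbers are ours.

```python
import numpy as np

def hydro_inflow(heights_m, runoff, annual_generation):
    """Country in-flow time series following Eq. (3.2).
    heights_m: shape (X,), cell heights h_x above sea level.
    runoff: shape (T, X), run-off R_x(t) of the country's cells.
    annual_generation: reported generation the in-flow must sum to."""
    raw = runoff @ heights_m           # sum_x h_x R_x(t), shape (T,)
    n = annual_generation / raw.sum()  # normalisation constant N
    return n * raw

inflow = hydro_inflow(np.array([100.0, 200.0]),
                      np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
                      1200.0)
print(inflow)  # [200. 400. 600.]
```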

Wind generation

Following the methodology in [78], the wind speeds at 10 m above ground $u^{10\,\mathrm{m}}_x(t)$ are extrapolated to the turbine hub height $h$ using the surface roughness $z_{0,x}$ with the logarithmic law

$$ u^h_x(t) = u^{10\,\mathrm{m}}_x(t) \, \frac{\ln(h/z_{0,x})}{\ln(10\,\mathrm{m}/z_{0,x})} \tag{3.3} $$
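Eq. (3.3) translates directly into code; the example values for wind speed, hub height and roughness length below are hypothetical.

```python
import math

def extrapolate_wind_speed(u10, hub_height_m, z0_m):
    """Logarithmic wind profile, Eq. (3.3): scale the 10 m wind speed to
    hub height h using the surface roughness length z0 (both in metres)."""
    return u10 * math.log(hub_height_m / z0_m) / math.log(10.0 / z0_m)

# 6 m/s at 10 m over terrain with z0 = 0.1 m, extrapolated to 80 m.
print(round(extrapolate_wind_speed(6.0, 80.0, 0.1), 2))  # 8.71
```

Rougher surfaces (larger $z_0$) yield a stronger increase of wind speed with height.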

The capacity factor of each raster cell $x$ for a wind turbine with power curve $P_w(u)$ and generator capacity $P^{\max}_w$ is determined as

$$ c_{x,w} = \frac{\big\langle P_w\big(u^h_x(t)\big) \big\rangle_t}{P^{\max}_w} \tag{3.4} $$

and, together with the usable area $A_{x,w}$, the maximally installable wind generation capacity $G^{\max}_{x,w} = 0.3 \cdot 10\,\mathrm{MW/km^2} \cdot A_{x,w}$ is calculated, where $10\,\mathrm{MW/km^2}$ is the technical potential density [28] and the factor 0.3 accounts for competing land use and issues of public acceptance.
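A sketch of Eq. (3.4) and of the installable capacity formula, assuming the power curve is given as sample points and interpolated linearly. The toy power curve (3 MW, cut-in 3 m/s, rated 12 m/s, cut-out 25 m/s) is hypothetical and is not the V112 curve.

```python
import numpy as np

def capacity_factor(hub_wind_speeds, curve_u, curve_p, p_max):
    """Eq. (3.4): time-averaged output of a linearly interpolated power
    curve, divided by the generator capacity."""
    p = np.interp(hub_wind_speeds, curve_u, curve_p)
    return p.mean() / p_max

def max_installable_wind(usable_area_km2, density_mw_per_km2=10.0, share=0.3):
    """G^max_{x,w} = 0.3 * 10 MW/km^2 * A_{x,w}, in MW."""
    return share * density_mw_per_km2 * usable_area_km2

# Toy power curve in m/s and MW; output drops to zero above cut-out.
curve_u = [0.0, 3.0, 12.0, 25.0, 25.01]
curve_p = [0.0, 0.0, 3.0, 3.0, 0.0]
speeds = np.array([0.0, 7.5, 12.0, 30.0])
print(capacity_factor(speeds, curve_u, curve_p, 3.0))  # 0.375
```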

The usable area is restricted by the following constraints. Onshore wind can only be built on the land use types of the CORINE Land Cover database [120] associated with Agricultural areas and Forest and semi natural areas, and a minimum distance of 1000 m from Urban fabric and Industrial, commercial and transport units must be respected. Offshore wind can only be constructed in water depths of up to 50 m. Additionally, all nature reserves and restricted areas listed in the Natura 2000 database [121] are excluded. The wind generation potential in Germany is shown in Figure 3.2.

Each Voronoi cell $V$ of a substation covers multiple cells of the re-analysis weather grid, as described by the indicator matrix $I_{V,x} = \mathrm{area}(V \cap x)/\mathrm{area}(x)$, and we distribute the wind turbine capacity according to a normed capacity layout

$$ G^{\mathrm{p.u.}}_{V,x,w} = \frac{I_{V,x}\, c_{x,w}\, G^{\max}_{x,w}}{\sum_{x'} I_{V,x'}\, c_{x',w}\, G^{\max}_{x',w}} \tag{3.5} $$

which prefers cells $x$ with a high capacity factor $c_{x,w}$ and a high maximally installable capacity $G^{\max}_{x,w}$. The wind generation availability time series at a substation with Voronoi cell $V$ is thus

$$ \bar g_{V,w}(t) = G_{V,w} \sum_x G^{\mathrm{p.u.}}_{V,x,w} \, \frac{P_w\big(u^h_x(t)\big)}{P^{\max}_w} \tag{3.6} $$

for an installed capacity $G_{V,w}$. This capacity is expandable until $G^{\max}_{x,w}$ is reached in any grid cell, i.e. up to

$$ G^{\max}_{V,w} = \min_{\{x \,\mid\, I_{V,x} > 0\}} \frac{I_{V,x}\, G^{\max}_{x,w}}{G^{\mathrm{p.u.}}_{V,x,w}} \tag{3.7} $$
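Eqs. (3.5) to (3.7) for a single Voronoi cell can be sketched with NumPy as follows. Array shapes and function names are our assumptions; in Eq. (3.7) we additionally skip cells with zero layout weight, which receive no capacity and would otherwise cause a division by zero.

```python
import numpy as np

def wind_layout(indicator, cap_factor, g_max):
    """Eq. (3.5): normed capacity layout over the weather cells x of one
    Voronoi cell V. All inputs have shape (X,)."""
    weight = indicator * cap_factor * g_max
    return weight / weight.sum()

def availability(installed, layout, pu_generation):
    """Eq. (3.6): generation time series for installed capacity G_{V,w};
    pu_generation has shape (T, X) and holds P_w(u^h_x(t)) / P^max_w."""
    return installed * (pu_generation @ layout)

def expansion_potential(indicator, g_max, layout):
    """Eq. (3.7): largest G_{V,w} such that no cell receives more than
    I_{V,x} * G^max_{x,w}; cells without layout weight are skipped."""
    mask = (indicator > 0) & (layout > 0)
    return np.min(indicator[mask] * g_max[mask] / layout[mask])

ind = np.array([1.0, 0.5])
lay = wind_layout(ind, np.array([0.3, 0.2]), np.array([100.0, 200.0]))
print(lay, expansion_potential(ind, np.array([100.0, 200.0]), lay))
```

Substituting Eq. (3.5) into Eq. (3.7) shows that each ratio equals $\sum_{x'} I_{V,x'} c_{x',w} G^{\max}_{x',w} / c_{x,w}$, so the expansion limit is set by the weather cell with the highest capacity factor.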

The power curve of the Vestas V112 turbine with a capacity of 3 MW and a hub height of 80 m is used to generate the onshore wind time series, and the National Renewable Energy Laboratory (NREL) Reference Turbine with 5 MW at 90 m is used for the offshore wind time series. To account for spatial wind speed variations within a grid cell, the accuracy of the wind generation time series is improved by smoothing the power curves with a Gaussian kernel:

$$ P_w(u) = \eta \int_0^\infty P_0(u') \, \frac{1}{\sqrt{2\pi\sigma'^2}} \exp\!\left(-\frac{(u - u' + \Delta u)^2}{2\sigma'^2}\right) \mathrm{d}u' \tag{3.8} $$

where $\eta = 0.95$, $\Delta u = 1.27$ m/s and $\sigma' = 2.29$ m/s are the optimal parameters minimising the error between the re-analysis-based time series and a year of Danish wind feed-in [78]. A study comparing wind generation time series based on the re-analysis dataset MERRA-2 over a 20-year period with per-country wind feed-in and with generation measurements from several wind parks found non-negligible discrepancies in the optimal bias correction parameters between countries [122]. Such country-specific corrections will be incorporated in a future version of the presented model.
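Eq. (3.8) can be approximated numerically by a Riemann sum over an equally spaced grid of wind speeds $u'$. The sketch below assumes the original power curve $P_0$ is sampled on such a grid starting at 0 and extending well beyond cut-out; the function name is hypothetical and `sigma` stands for $\sigma'$.

```python
import numpy as np

def smooth_power_curve(u, u0, p0, eta=0.95, delta_u=1.27, sigma=2.29):
    """Eq. (3.8): convolve the power curve P_0 (sampled at equally spaced
    speeds u0) with a Gaussian kernel shifted by delta_u, scaled by eta.
    Returns the smoothed curve at the wind speeds u."""
    du = u0[1] - u0[0]
    # Standardised distance (u - u' + delta_u) / sigma for all (u, u') pairs.
    z = (u[:, None] - u0[None, :] + delta_u) / sigma
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi * sigma**2)
    # Riemann-sum approximation of the integral over u'.
    return eta * (kernel * p0[None, :]).sum(axis=1) * du

u0 = np.arange(0.0, 40.0, 0.01)
flat = smooth_power_curve(np.array([10.0]), u0, np.ones_like(u0))
print(round(flat[0], 3))  # 0.95
```

A constant power curve is only rescaled by $\eta$, while sharp features such as the cut-in and cut-out steps are spread out over several metres per second.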

Photovoltaic generation

Like the wind generation in the previous section, the solar availability time series and the maximally installable capacity per substation are based on the re-analysis weather dataset CFSv2; we focus here on the differences.

The photovoltaic generation $P_{x,s}(t)$ of a panel with nominal capacity $P^{\max}_s$ at time $t$ in grid cell $x$ is calculated from the re-analysis short-wave radiation. A direct, a diffuse and a ground-reflected irradiation component are derived from the clear-sky model of Reindl [123] and from geometric relations between the trajectory of the sun and the tilted panel surface [124, 125].

Figure 3.2: Wind (left) and solar (right) potential power generation after land-use restrictions for weather grid cells in Germany. The generation of all grid cells in a Voronoi cell (also shown in black) is fed into the central substation.

An effective electric model by Bofinger et al. [126] determines the active power output from the total irradiation and the ambient temperature. Implementation details can be found in the pv sub-package of the atlite package [127], which is based on the Renewable Energy Atlas developed at Aarhus University [78]. An inverter inefficiency reducing the solar generation by 21% is assumed in order to match, on average, the solar capacity factors reported by [128].

For each raster cell $x \in X$, the capacity factor $c_{x,s}$ and the maximally installable capacity $G^{\max}_{x,s} = 0.01 \cdot 145\,\mathrm{MW/km^2} \cdot A_{x,s}$ are determined as for wind, with the difference that the high technical potential of 145 MW/km² corresponds to an unrealistic full coverage of the surface with solar cells, which is offset by allowing only up to 1%. The permitted CORINE land use types are Artificial surfaces; most Agricultural areas, except for those with forests; and only a few sub-categories of Forest and semi natural areas, namely Scrub and/or herbaceous vegetation associations, Bare rocks and Sparsely vegetated areas. Figure 3.2 shows the solar generation potentials.

Equations (3.5)-(3.7) are applied analogously to generate the solar availability time series $\bar g_{V,s}(t)$ and to find the solar expansion potential $G^{\max}_{V,s}$. The reference solar panel is the Kaneka U-EA120 thin-film silicon panel. Note that photovoltaic generation models based on satellite imagery, as provided by METEOSAT, have been found to reproduce measured feed-in time series and capacity factors with higher accuracy [129, 130].

Demand

There are two classes of approaches in the literature. Most employ a top-down approach, distributing the historical demand curves of each country to the substations in that country according to some geographical key: [87] uses population, ELMOD uses a weighted convex combination of population and gross domestic product [91], and the REMIX model relies on the area of artificial land surface from land-cover data [28]. Hülk et al. [131] extend this approach in two respects: firstly, they use a rule-based partitioning of the geographical surface combining Voronoi cells with administrative boundaries; secondly, the electricity consumption for each load area is derived separately for each sector (residential, industrial, retail and agricultural) based on OpenStreetMap land use and industrial infrastructure data as well as population density. Alternatively, demand time series can be compiled in a more involved bottom-up approach by combining industrial and domestic reference profiles according to regionalized sectoral statistics and finally overlaying them with country load profiles, as recently attempted at the KIT [132]; unfortunately, detailed documentation of the results has not been published.

Circuit length    DE                          EU
(1000 km)         220 kV   300 kV   380 kV    220 kV   300 kV   380 kV
ENTSO-E           13.70    0.0      20.92     117.25   9.96     146.82
PyPSA-Eur         11.04    0.0      23.76     115.63   10.00    152.55

Table 3.4: AC line circuit lengths for the whole of Europe and, as an example, for Germany

For PyPSA-Eur, the hourly electricity demand profiles of each country from 2011 to 2016 are taken from the website of the European Network of Transmission System Operators for Electricity (ENTSO-E) [133]. The load time series is distributed to the substations in each country: 60% according to the gross domestic product (GDP) in a Voronoi cell, as a proxy for industrial demand, and 40% as residential demand according to the population in a Voronoi cell. The 60-40% split is based on a linear regression analysis of the per-country data and agrees with values used in [91]. The two statistics are mapped from the Eurostat Regional Economic Accounts database (nama_10-reg) for NUTS3 regions to the Voronoi cells in proportion to their geographic overlap.
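The demand distribution can be sketched in two steps: mapping NUTS3 statistics to Voronoi cells by area overlap, then splitting each country's load 60/40 by GDP and population shares. The overlap matrix, function names and toy numbers are hypothetical, and all cells are assumed to lie in one country.

```python
import numpy as np

def regions_to_cells(values_by_region, overlap):
    """Map NUTS3 statistics to Voronoi cells; overlap[r, v] is the fraction
    of region r's area lying in cell v (each row sums to 1)."""
    return np.asarray(values_by_region, dtype=float) @ np.asarray(overlap, dtype=float)

def distribute_load(country_load, gdp_cells, pop_cells, w_gdp=0.6):
    """Split a country's load among its substations: w_gdp by GDP share,
    the remainder by population share. Returns shape (cells, T)."""
    gdp_share = gdp_cells / gdp_cells.sum()
    pop_share = pop_cells / pop_cells.sum()
    key = w_gdp * gdp_share + (1.0 - w_gdp) * pop_share
    return np.outer(key, country_load)

overlap = [[1.0, 0.0], [0.5, 0.5]]
gdp = regions_to_cells([60.0, 40.0], overlap)  # [80, 20]
pop = regions_to_cells([1.0, 3.0], overlap)    # [2.5, 1.5]
loads = distribute_load(np.array([100.0, 200.0]), gdp, pop)
print(loads)
```

Because both shares sum to one, the distribution key also sums to one, so the per-substation loads add up to the national load in every hour.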

3.3 validation

Network total line lengths

In this subsection, the total line circuit lengths at different voltage levels in the model are compared with official statistics from ENTSO-E. The lengths of AC circuits [134] per voltage level and country are compared with aggregations of line lengths times circuits from PyPSA-Eur, such that cross-border lines are attributed equally to both adjacent countries. Table 3.4 presents the total line lengths for the whole of Europe and, as an example, for Germany. Considering the data for all countries, the lines in the PyPSA-Eur dataset deviate from the ENTSO-E circuit lengths by a mean absolute error of 14% for 220 kV, 11% for 300 kV and 9% for 380 kV lines. These deviations can be explained by the fact that the ENTSO-E map [94] from which the PyPSA-Eur network is derived is only an artistic representation and does not follow the exact contours of each transmission line. Some differences may also be due to the incorrect classification of 220 kV lines as 380 kV lines, or to the ENTSO-E map on which PyPSA-Eur is based being more up to date with regard to recent upgrades of the transmission network.


Figure 3.3: 80 clusters jointly identified by colour in the network topologies of the models PyPSA-Eur, osmTGmod and ELMOD-DE (panels: PyPSA-Eur, osmTGmod, ELMOD-DE).