(1)

Clock Sync. and Adversarial Fault Tolerance

Christoph Lenzen – MPI for Informatics

Danny Dolev – Hebrew U. of Jerusalem

also starring: Ben Wiederhake, Matthias Függer

(2)

Today’s Menu

1. Why does this course exist?

2. What is this course about?

3. Who are you and what do you want?

- discussion in small groups

- sharing your findings with everyone

4. How will we run this course?

- your questions and input on this

5. Heads-up: What comes next?

(3)

Today’s Menu

1. Why does this course exist?

2. What is this course about?

3. Who are you and what do you want?

- discussion in small groups

- sharing your findings with everyone

4. How will we run this course?

- your questions and input on this

5. Heads-up: What comes next?

(4)

Chips are Distributed Systems

- very large (>10^10 transistors) -> fault-tolerance mandatory

- highly concurrent/parallel -> synchronous operation

- very fast (>10^9 cycles/s) -> communication “slow”

(5)

Chips are Distributed Systems

- very large (>10^10 transistors) -> fault-tolerance mandatory

- very fast (>10^9 cycles/s) -> communication “slow”

- highly concurrent/parallel -> synchronous operation
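
To make the “communication is slow” point concrete, here is a rough back-of-the-envelope calculation; the 3 GHz clock and the on-chip signal speed of roughly half the speed of light are illustrative assumptions, not figures from the slides:

    # Rough estimate of how far a signal can travel within one clock cycle.
    # Assumed numbers (illustrative only): 3 GHz clock, signal speed ~ 0.5c.
    CLOCK_HZ = 3e9          # > 10^9 cycles/s, as on the slide
    SIGNAL_SPEED = 1.5e8    # m/s, assumed on-chip signal propagation speed

    cycle_time = 1 / CLOCK_HZ               # ~333 ps
    reach_per_cycle = SIGNAL_SPEED * cycle_time

    print(f"cycle time: {cycle_time * 1e12:.0f} ps")
    print(f"signal reach per cycle: {reach_per_cycle * 100:.1f} cm")
    # ~5 cm per cycle: comparable to the chip itself, so cross-chip
    # communication cannot be treated as instantaneous.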

(6)

Clocking VLSI Circuits

[Figure: timing diagram of cycles r−1, r, r+1, r+2, each alternating a “store” and a “compute” phase]

(7)

Clock Trees

Distribute clock signal from single source!

+ very simple

+ self-stabilizing: recovers from any transient faults

+ ca. 20 ps = 2*10^-11 s precision (single chip)

[Figure: clock tree fanning out from a single source to clocked elements (e.g., registers)]

(8)

Clock Trees: Scalability Issues

- clock tree is single point of failure

-> components must be extremely reliable

- tree distance / physical distance = Ω(L) (L = side length of chip)

-> max. difference of arrival times between adjacent gates grows linearly with L

-> clock frequency goes down with chip size

(9)

Clock Trees: Scalability Issues

- clock tree is single point of failure

-> components must be extremely reliable

- tree distance / physical distance = Ω(L) (L = side length of chip)

-> max. difference of arrival times between adjacent gates grows linearly with L

-> clock frequency goes down with chip size

- countermeasure: use higher voltage and wider wires

-> electro-magnetic interference causes trouble and strong currents induce large power consumption
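
As a numerical sketch of the “clock frequency goes down with chip size” claim: if the clock period must cover the combinational delay plus a skew term that grows linearly in the side length L, the maximum frequency shrinks as the chip grows. The constants below are made up purely for illustration:

    # Toy model: clock period >= logic delay + skew, with skew growing linearly in L.
    LOGIC_DELAY_PS = 200.0      # assumed combinational delay per cycle (ps)
    SKEW_PER_MM_PS = 10.0       # assumed skew growth per mm of side length (ps/mm)

    def max_frequency_ghz(side_length_mm: float) -> float:
        """Maximum clock frequency under the toy 'skew grows with L' model."""
        period_ps = LOGIC_DELAY_PS + SKEW_PER_MM_PS * side_length_mm
        return 1e3 / period_ps  # a 1000 ps period corresponds to 1 GHz

    for L in (5, 10, 20, 40):
        print(f"L = {L:2d} mm -> f_max ~ {max_frequency_ghz(L):.2f} GHz")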

(10)

GALS: Globally Async., Locally Sync.

GALS: multiple separately clocked subsystems communicate asynchronously

+ removes some clock tree scalability issues

- asynchronous communication risks metastability

-> use of synchronizers, several clock cycles latency

(11)

What happens if we do Computer Science to it?

(12)

Scalable Clocking: Gradient Clock Sync

Synchronize along data flow!

=> bound skew between communicating components

[Figure: worst-case skew bound comparison of clock tree, clock tree + optimism, and GCS]
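
To convey the “synchronize along data flow, bound skew between communicating components” intuition, here is a toy round-based simulation; it is only an illustration under made-up parameters, not the actual gradient clock synchronization algorithm treated later in the course:

    import random

    # Toy illustration (not the GCS algorithm): nodes on a path graph, each
    # with a slightly different hardware clock rate, nudge their logical
    # clocks toward the fastest neighbour so that the skew between adjacent
    # (i.e., communicating) nodes stays small.
    N, ROUNDS = 8, 1000
    rates = [1.0 + random.uniform(-0.01, 0.01) for _ in range(N)]  # +/- 1% drift
    clocks = [0.0] * N

    for _ in range(ROUNDS):
        # hardware clocks advance at their own rates
        clocks = [c + r for c, r in zip(clocks, rates)]
        # each node may additionally jump forward toward its fastest neighbour
        snapshot = clocks[:]
        for i in range(N):
            neighbours = [snapshot[j] for j in (i - 1, i + 1) if 0 <= j < N]
            clocks[i] = max(clocks[i], max(neighbours) - 0.5)  # 0.5 "ticks" of slack

    local_skew = max(abs(clocks[i] - clocks[i + 1]) for i in range(N - 1))
    print(f"max skew between adjacent nodes after {ROUNDS} rounds: {local_skew:.2f}")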

(13)

Fault-Tolerance

- redundancy enables tolerating (worst-case!) faults

- low-degree distribution networks needed

[Figure: layered clock distribution network, with an arrow indicating the direction of propagation]
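
One classical way redundancy masks worst-case faults in clock synchronization is fault-tolerant averaging: given n >= 3f+1 clock readings of which at most f are Byzantine, discarding the f smallest and f largest values and averaging the rest bounds the adversary’s influence. A minimal sketch of the idea, not an algorithm taken from the slides:

    from statistics import mean

    def fault_tolerant_average(readings: list[float], f: int) -> float:
        """Discard the f smallest and f largest readings, average the rest.

        With at most f Byzantine readings among n >= 3f + 1, every surviving
        value lies within the range spanned by the correct readings, so the
        result cannot be dragged outside that range by the faulty ones.
        """
        assert len(readings) >= 3 * f + 1, "need n >= 3f + 1 readings"
        trimmed = sorted(readings)[f:len(readings) - f]
        return mean(trimmed)

    # Example: 7 readings, up to f = 2 Byzantine (here: 0.0 and 999.0).
    print(fault_tolerant_average([10.1, 10.2, 0.0, 10.0, 999.0, 9.9, 10.3], f=2))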

(14)

Innocent “Theory” Assumption

A time difference can be turned into a discrete number.

[Figure: timeline illustrating the discretization of a time difference]

(15)

Metastability

(16)

Metastability is Rare...

...unless your system runs at GHz speeds!

[Figure: measurement equipment capturing a metastable signal]

(17)

A “CS” Approach to Metastability

AND | 0 1
----+----
 0  | 0 0
 1  | 0 1

AND | 0 1 M
----+------
 0  | 0 0 0
 1  | 0 1 M
 M  | 0 M M

- What can be computed “with” metastable inputs?

- What is the complexity of such circuits?

- Can we avoid synchronizers (and their latency)?
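
The extended truth table above is a three-valued (Kleene-style) AND: the metastable value M propagates only when it could still change the output. A small sketch of how one might model it; the names and representation are my own, not from the course:

    # Three-valued AND matching the extended truth table: inputs are 0, 1,
    # or "M" (metastable/unknown). M is masked whenever the other input
    # already forces the output (0 AND M = 0), and propagates otherwise.
    M = "M"

    def and3(a, b):
        if a == 0 or b == 0:
            return 0            # a stable 0 masks anything, even M
        if a == 1 and b == 1:
            return 1
        return M                # at least one M and no masking 0

    for a in (0, 1, M):
        print([and3(a, b) for b in (0, 1, M)])
    # prints the rows [0, 0, 0], [0, 1, M], [0, M, M] of the extended table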

(18)

This, and more...

...is to become a book!

(19)

Treats

We intend to treat you to the second ≈ 33.33% of its contents!

(20)

Today’s Menu

1. Why does this course exist?

2. What is this course about?

3. Who are you and what do you want?

- discussion in small groups

- sharing your findings with everyone

4. How will we run this course?

- your questions and input on this

5. Heads-up: What comes next?

(21)

Outlook

winter 2020/21: clocking in the past & future (from the 40’s to the 40’s)

this course: fault-tolerant clocking (Byzantine faults & self-stabilization)

winter 2021/22: handling metastability (going beyond synchronizers)

(22)

winter 2020/21: clocking in the past & future (from the 40’s to the 40’s)

this course: fault-tolerant clocking (Byzantine faults & self-stabilization)

winter 2021/22: handling metastability (going beyond synchronizers)

Outlook

(23)

Warning: Contents May Advance Quickly

lectures | content
---------+------------------------------------------------
2        | model & getting our feet wet
3-5      | limits on Byzantine fault-tolerance
6-8      | optimal skew under Byzantine faults
9-11     | low-degree clock distribution networks
12-13    | self-stabilization and recovery
14-16    | opt. skew with Byzantines & self-stabilization
17-19    | consensus
20-22    | pulse synchronization from consensus
23-24    | synchronous counting
25-27    | low-degree gradient clock distribution
28       | summary & feeling good about ourselves

(24)

Today’s Menu

1. Why does this course exist?

2. What is this course about?

3. Who are you and what do you want?

- introduce yourself

- what you are attending this course for

4. How will we run this course?

- your questions and input on this

5. Heads-up: What comes next?

(25)

Now

~15 min. in breakout room (no recording):

+ implicit soundcheck for everyone

+ introductions

+ what would you like to take away from this course

+ questions

(26)

Today’s Menu

1. Why does this course exist?

2. What is this course about?

3. Who are you and what do you want?

- discussion in small groups

- sharing your findings with everyone

4. How will we run this course?

- your questions and input on this

5. Heads-up: What comes next?

(27)

Our Expectations

[Figure: illustrated expectations, composed of images from matt.might.net/articles/phd-school-in-pictures/]

(28)

Our Expectations of You

1. For each topic (i.e., 2-3 lectures), study the reading assignment.

2. Write a short summary of the topic, including your thoughts and questions. (25% grade contribution)

3. Attend* the sessions:

+ brief intro/overview by the lecturer

+ discuss and/or exercise in breakout room

+ 25% grade contribution from participation

4. After the lecture period is over, write a report answering handcrafted questions on one of the topics. (50% grade contribution)

*Recordings! Contact us in case of privacy concerns!

(29)

Questions?

(30)

Today’s Menu

1. Why does this course exist?

2. What is this course about?

3. Who are you and what do you want?

- discussion in small groups

- sharing your findings with everyone

4. How will we run this course?

- your questions and input on this

5. Heads-up: What comes next?

(31)

Schedule for the next 7 Days

1. Read the 3-page summary of motivation and model by tomorrow.

2. Write an email to the mailing list. Any questions on the summary are highly encouraged!

3. I'll present the model and setting in depth on Monday (second opportunity for questions).

4. Study and summarize the reading assignment, handing it in before the lecture on Wednesday!

5. On Wednesday, Danny takes over for the first chapter.

(32)

See You on Monday!

Bring a Question!
