메타매스 C

hackernews | | 📦 오픈소스
#메타매스 c #소프트웨어 검증 #프로그래밍 언어 #검증 #메타매스 #반도체 #c언어 #metamath #검증된 프로그램 #증명 보조기 #하드웨어/반도체 #형식적 검증 #반도체 검증 #하드웨어 설계 #메타매스c #반도체검증 #증명보조기 #프로그래밍언어 #하드웨어검증 #소프트웨어검증 #형식검증
원문 출처: hackernews · Genesis Park에서 요약 및 분석

요약

**Metamath C(MMC)**는 기존 프로그래밍 언어와 증명 보조기의 한계를 극복하기 위해 개발된 언어로, C와 유사한 저수준 구문을 사용하면서도 수학적 증명이 가능한 **전체 함수(Total Functions)**와 **분리 논리(Separation Logic)**를 결합하여 정확성을 검증할 수 있는 것이 특징입니다. 이 언어는 러스트의 참조자 의미를 차용한 소유권 및 가변 참조 타입을 지원하고, 컴파일러가 자동으로 **MM0 형식의 검증된 증거**를 생성하며, 정수 오버플로우와 같은 문제를 실행이 아닌 컴파일 타임에 수학적 증명을 통해 방지합니다. 또한 개발자는 타입에 불변식과 소유권 조건을 포함시켜 임의의 복잡성을 가진 타입을 정의할 수 있고, `pun`이나 `entail`과 같은 명령어를 통해 메모리 상태에 대한 정확한 논리적 추론을 코드 내에서 수행할 수 있습니다.

본문

The requirements of verified programs are somewhat unique and not well addressed by either conventional programming languages such as C, Python, Rust, Haskell etc., or proof assistants like Isabelle, Coq, Lean, etc. On the one hand, for a low level language we need ways to talk about imperative procedures, pointer manipulation, while loops and the like, where every construct has a well defined lowering to machine instructions, and on the other hand we need the expressiveness to reason about the program inside an ambient logic, where infinite sets and undecidable predicates are common. These can sometimes be approximated by assertions, which have the advantage of being executable, but these can only be used for dynamic analysis, and in the context of a formal proof of correctness, executability of intermediate assertions is irrelevant and limiting (although it is nice to have when available). Metamath C (abbreviated MMC) is a language that uses total functions (provably terminating mathematical functions as one would find in HOL or a dependent type theory) for its semantics, combined with inline separation logic through erased "hypothesis variables" for reasoning about heap structures and non-functional components, on top of a C-like structure that is used to provide well defined and predictable lowering to machine code. MMC is currently written in arguments to the MMC compiler, which has an interface through MM1. As a result it inherits the same Scheme-like syntax used in MM1 tactics. (This may change in the future.) MMC has an extensible type system, and it produces MM0 proofs in the back-end. Because types are implemented as "type guards", they have an independent existence as well, and there are primitives for "casting" a variable to a type T given a proof that it has type T . A type is a function that maps values to separating propositions over machine states. That is, it is a true-or-false statement that applies to portions of the machine state (registers and memory). This is a very low level view, but it has the advantage that because it is so general, users can define types of arbitrary complexity, containing invariants and ownership semantics. Types also contain a size and an alignment, although for the x86 instantiation of MMC all types have alignment 1. Here are some basic types: () | bool | u8 | u16 | u32 | u64 | i8 | i16 | i32 | i64 | (own T) | (array T n) | (sn {a: T}) The first few represent signed and unsigned integers of various widths. The (own T) type is an owned pointer to a type. The (array T n) type is a contiguous sequence of n elements of type T . (The value n is not stored anywhere in memory, it is a parameter of the type.) The (sn a) type (singleton type) is the type of values that are equal to a: T . Types can depend on values, which is often known as dependent typing, however unlike most dependent type theories very little "computation" is done inside types, and instead a (pun) must be used to re-type a term given a proof that it has a type. (This is not like a C cast that can be used to reinterpret a type - the value must satisfy all invariants of the target type to be eligible for pun .) There are two other pointer types, inspired by Rust semantics: (& T) | (&mut T) The (& T) type of shared references is not a real type, but desugars to (& p T) where p is the "provenance" of the allocation from which T derives. The compiler inserts an assertion that all pointer provenances are alive to the function's frame condition. (The proper way to handle this is with ghost state as in Iris, but that is future work.) The upshot is that these types can be manipulated and copied like regular shared references. The (&mut T) type of mutable references is also not a real type, but desugars to (mut {x : (own T)}) , which corresponds to an inout parameter of type (own T) . That is, this is an argument that is passed into the function, and must be preserved and passed out at the end (but the output part is only a ghost parameter). The compiler ensures that the value is not dropped. (mut _) is an attribute that does this inout translation generally. With any of the pointer types (own T) , (& T) , (&mut T) , the notation (* x) can be used in assertions to refer to the value of type T that x dereferences to. This is desugared to x |-> y in the frame and y at the point of use. There are also some additional types used only in ghost variables and propositions. nat is the type of nonnegative integers, and int is the type of integers. These types are unbounded and so cannot be concretized. (As a reminder, even bignums cannot get arbitrarily large. This set includes numbers that provably cannot exist on an x86 machine.) All numeric operations like x + y operate on nat or int , even when the arguments are fixed length integers like u32 . That is, if x y: u32 then x + y: nat . (If subtraction or signed numbers are involved then the result has type int instead.) This means that in the expression language overflow is impossible. Instead, overflow occurs when assigning this expression to a variable (which as mentioned cannot have type nat unless it is a ghost variable). This is done using the cast function, which requires a proof that the computed value lies within the bounds for the target type. The compiler will sometimes be able to prove these goals automatically, and in particular some specific patterns like x + y >= = != compare them to produce a value of typebool . The compiler may fail to find a way to compile comparisons on typenat orint , but if it succeeds then the meaning is that of the abstract mathematical reading. - If x,y have typebool then{x or y}, {x and y}, (not y): bool . The := operator in MMC is the main way to introduce new variables into the context and give them values. In its simplest use it binds the result of the right hand side of the equation to the place on the left, for example {{x : u64} := {1 + 2}} -- now x: u64 := 1 + 2 Here {x : u64} is a type ascription, and indicates that the storage for x will be 64 bits large. In subsequent lines the type checker will know that x is in the context and has type u64 , as well as the fact that it has value 1 + 2 . This fact can be retained in two ways: - After the definition (at least until some control flow that could possibly reassign the value of x is encountered),x is treated as "definitionally equal" to1 + 2 , so it can be unfolded in proofs types involving one will unify with the other. - For more long range retention of the fact, it is also possible to use the sn function to construct and retain a proof thatx = 1 + 2 :{x_h := (sn {{1 + 2} : u64})} -- now x_h: sn (1 + 2: u64) {(x h) := (sn {{1 + 2} : u64})} -- now x: u64 and h: $ x = 1 + 2 $ This is just one example of expressions that return proofs. Another example is the assert function, which returns a proof of any expression: {h := (assert {{2 + 2} = 4})} -- here h: $ 2 + 2 = 4 $ This also works if the statement is false: {h := (assert {{2 + 2} = 5})} -- here h: $ 2 + 2 = 5 $ because assert terminates the program if the assertion fails to hold, so we can safely assume that on the branch where we are still executing the program, the assertion is true. The semantics of assignments here is like that in functional programming languages, i.e. they are let bindings. Variables that are shadowed are still in the context, but are inaccessible: {{x : u8} := 2} -- x: u8 := 2 {{x : u8} := 3} -- x*: u8 := 2, x: u8 := 3 The variable x* shown in the second state is used in printing to refer to a shadowed variable named x . The variable x*: u8 is in the state but hidden by default. It cannot be typed directly in the program. Instead, we can add a with clause to the assignment to rename one or the other variable: {{x : u8} := 2} ({{x : u8} := {3 + x}} with {x -> old_x}) -- old_x: u8 := 2, x: u8 := 3 + old_x {{x : u8} := 2}} -- x: u8 := 2 ({{x : u8} := {3 + x}} with {new_x (ghost old_x) to ensure that the old value of x is marked ghost, so that it can be compiled to a reassignment instead of a copy. The (own (array u8 4)) $}) := (typeof! x)} Here we have stripped the variable x of its rich type (own (array u8 4)) , producing the bare type u64 (representing that the original variable is in a 64 bit storage location) and separately the predicate x :> (own (array u8 4)) (read x has type (own (array u8 4)) ) that this is a valid pointer to an array of 4 u8 's. (There is a variation on typeof! called typeof , which applies in the special case where x is already a base type, so that its type is not changed by the operation of typeof! . That is, if x :> T is a duplicable separating proposition.) The dual operation is pun : {x := (malloc u8 4)} -- x: (own (array u8 4)) {(x h) := (typeof! x)} -- x: u64, h: $ x :> (own (array u8 4)) $ {x := (pun x h)} -- x: (own (array u8 4)) {(x h) := (typeof! x)} -- x: u64, h: $ x :> (own (array u8 4)) $ {{h2: $ x :> (own (array u8 2)) $} := (entail h -- proof of x :> (own (array u8 4)) |- x :> (own (array u8 2)) _)} {x := (pun x h2)} -- x: (own (array u8 2)) In the second example we do a proof in separation logic that x :> (own (array u8 4)) |- x :> (own (array u8 2)) , and use it to reconstitute x at a different type than its original one. It is worth reiterating that all of the above operations are no-ops in the computational semantics. pun and typeof! are identity functions and entail is a proof-only notion, so the only thing that is in the emitted binary is the call to malloc on the first line. "Casting" in C might mean any of the above notions. We separate them in MMC as follows. cast is used for downcasting or upcasting integers, preserving the logical value of the cell. If x: X then (cast x h): Y under the following conditions: X andY are numeric types andh is a proof thatx is within the bounds ofY , for exampleh: $ x Y $ . - If X is a pointer type andY isu64 , thenh is not needed. If the compiler can prove the side goal h , it can be omitted using (cast x) or just x when the context forces type Y . For truncation and generally non-value-preserving type casts, we use the as function. If x: X then {x as Y}: Y under the following conditions: - If X is a numeric type andY isuN , then it is upcast tonat and then downcast using a modulo function. The resulting formal expression ischop N x , which is defined asx % 2^N . - If X is a numeric type andY isiN , then it is upcast toint and then downcast using a modulo function. The resulting formal expression ischopZ N x , which is defined to equal(x + 2^(N-1)) % 2^N - 2^(N-1) . - If X isbool andY is a numeric type, it returns0 or1 . - If X is a numeric type andY isbool , it returnsx != 0 . For type punning and reconstituting values of complex types that have been stripped by typeof! , we use pun , which is guaranteed to be an identity function in the computer but may have an effect on the logical value of the expression. If x: X then (pun x h): Y under the following conditions: - If h: $ x :> Y $ andY has the samesizeof asX . (The evidence thatx :> X is dropped.) - It has the same effect as (trunc x) ifX andY are numeric types with the samesizeof . - If X = (& T) or(own T) andY = (& U) , orX = (own T) andY = (own U) andh provessizeof U (own T) using assertions, although if x : (own (array u8 (sizeof T))) and T represents some predicate over simple data then you may be able to assert that (* x) :> T and then pun x to type (own T) . {{x : u64} := 1} {{y : u64} := 2} {h := (assert {x x_old}) (f (variant {... : $ x_old ))) This is especially useful for exhaustiveness checking in matches, see below. The (match x {a1 => b1} ... {an => bn}) command executes bi for the first i for which x = ai . The ai should be numeric constants (pattern matching on enums and the like are future work). In a pattern, one can use {a or b} to pattern match on multiple constants, {x with p} to evaluate a predicate, and {h : x} to capture the assertion that x matches the given pattern in h (and the assertion that x doesn't match the pattern in h in all subsequent branches). For example: (match {2 + 2} {{h : 4} => -- h: 2 + 2 = 4 here "all is well"} {_ => (unreachable (entail h (proof of $ 2+2 != 4 -> F. $)))}) (match x {{h : {x with (is_groovy x)}} => "x is groovy"} {{h2 : {1 or 2}} => -- h: $ ~ is_groovy x $, h2: $ x = 1 \/ x = 2 $ "x is not groovy but it is 1 or 2"} {_ => -- h: $ ~ is_groovy x $, h2: $ ~ (x = 1 \/ x = 2) $ "what is this???"}) The special case of bounded for loops is especially helpful for verified programming because it eliminates the need to prove variance, as well as prove that the loop counter does not go out of bounds. The syntax is (for {x : T} in {a .. b} t) , where annotating a and b as in {h : a} obtains proofs that x (or A B C) is equivalent to x :> A \/ x :> B \/ x :> C . The empty union is the never type or "void" type (but not C 's void !). The void type in MMC has no elements, and one can pass it to (unreachable) just like a proof of false. MMC does not have union declarations like C , and you cannot use field access with unions. One might wonder how a union type can be used at all, but the key is to note that the individual types in a union can be dependent on some other piece of data in the program. Many functional programming languages contain a discriminated union type, but here we can build it as a union type: (enum A B) = (or (list (sn {0 : u8}) A) (list (sn {1 : u8}) B)) To use a value of a union type, one must prove that the value is one of the disjuncts, and then one can use pun to reconstitute the value as the resulting type. We have already seen the (array T n) type in several examples. Unlike C, (array T n) is not a pointer and does not decay to one; the type represents the bits of an array directly. Because array is a large type, it is usually passed around behind a pointer type. - The function (index a i h) is the equivalent ofC 'sa[i] ; it has type(own T) ifa has type(own (array T i)) and type(& T) ifa has type(& (array T i)) . The hypothesish is a proof thati is in the bounds of the array. The variant(index a i) supplies(assert {i T . The underlying proof system is a model of separation logic instantiated on the target architecture (x86 for the present). The advantage of this is that it is possible to prove theorems and build programs at any level of abstraction. (Inline assembly syntax is TBD but can certainly be supported in the framework.) Proofs can be written directly inline in the MM1 language, or they can be proven as theorem commands at the top level of the enclosing MM1 file and referenced in the MMC program. The main tool that MMC adds to the MM1 proof architecture is the entail command: (entail h1 h2 .. hn p) produces a proof of a separating proposition $ Q $ given hi : $ Pi $ and assuming p is a proof of the entailment P1 * ... * Pn |- Q . If everything is independent of the heap, then it can be a proof of P1 /\ ... /\ Pn -> Q instead. The proc command declares a new top level procedure, and func decla

Genesis Park 편집팀이 AI를 활용하여 작성한 분석입니다. 원문은 출처 링크를 통해 확인할 수 있습니다.

공유

관련 저널 읽기

전체 보기 →