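In this post we are going to see how to generate bytecode for our language. So far we have seen how to build a language to express what we want, how to validate that language, and how to build an editor for it, but we still cannot actually run the code. Time to fix that. By compiling for the JVM, our code will be able to run on all sorts of platforms. That sounds pretty great to me!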
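Series on building your own language
Previous posts:
- Building a lexer
- Building a parser
- Creating an editor with syntax highlighting
- Building an editor with autocompletion
- Mapping the parse tree to the abstract syntax tree
- Model to model transformations
- Validation
The code is available on GitHub under the tag 08_bytecode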
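Adding a print statement
Before jumping into bytecode generation, let's add a print statement to our language. It is fairly easy: we only need to change a few lines in the lexer and parser definitions.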
    // Changes to lexer
    PRINT : 'print';

    // Changes to parser
    statement : varDeclaration # varDeclarationStatement
              | assignment     # assignmentStatement
              | print          # printStatement ;

    print : PRINT LPAREN expression RPAREN ;
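The general structure of our compiler
Let's start with the entry point of our compiler. We take the code either from standard input or from a file (specified as the first argument). Once we have the code, we try to build an AST and check for lexical and syntactic errors. If there are none, we validate the AST and check for semantic errors. If there are still no errors, we go on to bytecode generation.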
    import java.io.File
    import java.io.FileInputStream
    import java.io.FileOutputStream
    import java.io.InputStream

    fun main(args: Array<String>) {
        // read the source from standard input, or from the file given as first argument
        val code : InputStream? = when (args.size) {
            0 -> System.`in`
            1 -> FileInputStream(File(args[0]))
            else -> {
                System.err.println("Pass 0 arguments or 1")
                System.exit(1)
                null
            }
        }
        // lexical and syntactic errors
        val parsingResult = SandyParserFacade.parse(code!!)
        if (!parsingResult.isCorrect()) {
            println("ERRORS:")
            parsingResult.errors.forEach { println(" * L${it.position.line}: ${it.message}") }
            return
        }
        val root = parsingResult.root!!
        println(root)
        // semantic errors
        val errors = root.validate()
        if (errors.isNotEmpty()) {
            println("ERRORS:")
            errors.forEach { println(" * L${it.position.line}: ${it.message}") }
            return
        }
        // no errors: generate the bytecode and save it into a class file
        val bytes = JvmCompiler().compile(root, "MyClass")
        val fos = FileOutputStream("MyClass.class")
        fos.write(bytes)
        fos.close()
    }
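Note that in this example we always produce a class file named MyClass. Later we will probably want a way to specify the name of the class file, but for now this is good enough.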
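Using ASM to generate bytecode
Now, let's dive into the fun part. The compile method of JvmCompiler is where we produce the bytes that we later save into a class file. How do we produce those bytes? With some help from ASM, a library for generating bytecode. We could build the byte array ourselves, but that would involve some boring tasks, such as generating the constant pool structures. ASM does that for us. We still need some understanding of how the JVM is structured, but we can get by without being experts in the nitty-gritty details.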
    import org.objectweb.asm.ClassWriter
    import org.objectweb.asm.Label
    import org.objectweb.asm.Opcodes.*

    class JvmCompiler {

        fun compile(root: SandyFile, name: String) : ByteArray {
            // this is how we tell ASM that we want to start writing a new class. We ask it to calculate some values for us
            val cw = ClassWriter(ClassWriter.COMPUTE_FRAMES or ClassWriter.COMPUTE_MAXS)
            // here we specify that the class is in the format introduced with Java 8 (so it would require a JRE >= 8 to run)
            // we also specify the name of the class, the fact it extends Object and it implements no interfaces
            cw.visit(V1_8, ACC_PUBLIC, name, null, "java/lang/Object", null)
            // our class will have just one method: the main method. We have to specify its signature
            // this string just says that it takes an array of Strings and returns nothing (void)
            val mainMethodWriter = cw.visitMethod(ACC_PUBLIC or ACC_STATIC, "main", "([Ljava/lang/String;)V", null, null)
            mainMethodWriter.visitCode()
            // labels are used by ASM to mark points in the code
            val methodStart = Label()
            val methodEnd = Label()
            // with this call we indicate to what point in the method the label methodStart corresponds
            mainMethodWriter.visitLabel(methodStart)

            // Variable declarations:
            // we find all variable declarations in our code and we assign to them an index value
            // our vars map will tell us which variable name corresponds to which index
            var nextVarIndex = 0
            val vars = HashMap<String, Var>()
            root.specificProcess(VarDeclaration::class.java) {
                val index = nextVarIndex
                val type = it.type(vars)
                // a double occupies two consecutive local variable slots, an int only one
                nextVarIndex += if (type == DecimalType) 2 else 1
                vars[it.varName] = Var(type, index)
                mainMethodWriter.visitLocalVariable(it.varName, type.jvmDescription, null, methodStart, methodEnd, index)
            }

            // time to generate bytecode for all the statements
            root.statements.forEach { s ->
                when (s) {
                    is VarDeclaration -> {
                        // we calculate the type of the variable (more details later)
                        val type = vars[s.varName]!!.type
                        // the JVM is a stack based machine: it operates on values we have put on the stack
                        // so as first thing when we meet a variable declaration we put its value on the stack
                        s.value.pushAs(mainMethodWriter, vars, type)
                        // now, depending on the type of the variable we use different operations to store the value
                        // we put on the stack into the variable. Note that we refer to the variable using its index, not its name
                        when (type) {
                            IntType -> mainMethodWriter.visitVarInsn(ISTORE, vars[s.varName]!!.index)
                            DecimalType -> mainMethodWriter.visitVarInsn(DSTORE, vars[s.varName]!!.index)
                            else -> throw UnsupportedOperationException(type.javaClass.canonicalName)
                        }
                    }
                    is Print -> {
                        // this means that we access the field "out" of "java.lang.System" which is of type "java.io.PrintStream"
                        mainMethodWriter.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;")
                        // we push the value we want to print on the stack
                        s.value.push(mainMethodWriter, vars)
                        // we call the method println of System.out to print the value. It will take its parameter from the stack
                        // note that we have to tell the JVM which variant of println to call. To do that we describe the signature of the method,
                        // depending on the type of the value we want to print. If we want to print an int we will produce the signature "(I)V",
                        // we will produce "(D)V" for a double
                        mainMethodWriter.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(${s.value.type(vars).jvmDescription})V", false)
                    }
                    is Assignment -> {
                        val type = vars[s.varName]!!.type
                        // This code is the same we have seen for variable declarations
                        s.value.pushAs(mainMethodWriter, vars, type)
                        when (type) {
                            IntType -> mainMethodWriter.visitVarInsn(ISTORE, vars[s.varName]!!.index)
                            DecimalType -> mainMethodWriter.visitVarInsn(DSTORE, vars[s.varName]!!.index)
                            else -> throw UnsupportedOperationException(type.javaClass.canonicalName)
                        }
                    }
                    else -> throw UnsupportedOperationException(s.javaClass.canonicalName)
                }
            }

            // We just say that here is the end of the method
            mainMethodWriter.visitLabel(methodEnd)
            // And we add the return instruction
            mainMethodWriter.visitInsn(RETURN)
            // visitMaxs must come before visitEnd; with COMPUTE_MAXS its arguments are ignored
            mainMethodWriter.visitMaxs(-1, -1)
            mainMethodWriter.visitEnd()
            cw.visitEnd()
            return cw.toByteArray()
        }
    }
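One JVM detail is worth keeping in mind when assigning indexes to local variables: long and double values take two consecutive slots, so a variable declared right after a double should get an index two higher, not one. The following is a minimal, self-contained sketch of that allocation rule. SlotType and allocateSlots are hypothetical names used only for illustration; they are not part of the Sandy code base.

```kotlin
// Hypothetical stand-in for the article's types, so the sketch compiles on its own.
enum class SlotType(val slots: Int) { INT(1), DECIMAL(2) }

// Assign to each declared variable the index of its first local variable slot.
fun allocateSlots(declarations: List<Pair<String, SlotType>>): Map<String, Int> {
    val indexes = LinkedHashMap<String, Int>()
    var next = 0
    for ((name, type) in declarations) {
        indexes[name] = next
        next += type.slots // a double reserves two consecutive slots
    }
    return indexes
}

fun main() {
    val slots = allocateSlots(
        listOf("a" to SlotType.INT, "b" to SlotType.DECIMAL, "c" to SlotType.INT)
    )
    println(slots) // prints {a=0, b=1, c=3}
}
```

Here b is a double stored in slots 1 and 2, so c starts at slot 3.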
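About types
OK, we have seen that our code uses types. This is needed because, depending on the type, we have to use different instructions. For example, to put a value into an integer variable we use ISTORE, while to put a value into a double variable we use DSTORE. When we call System.out.println on an integer we need to specify the signature (I)V, while when we call it to print a double we specify (D)V.
To do this we need to know the type of each expression. In our super simple language we use just int and double for now. In a real language we would want more types, but this is enough to show you the principles.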
    interface SandyType {
        // given a type we want to get the corresponding string used in the JVM
        // for example: int -> I, double -> D, Object -> Ljava/lang/Object; String[] -> [Ljava/lang/String;
        val jvmDescription: String
    }

    object IntType : SandyType {
        override val jvmDescription: String
            get() = "I"
    }

    object DecimalType : SandyType {
        override val jvmDescription: String
            get() = "D"
    }

    fun Expression.type(vars: Map<String, Var>) : SandyType {
        return when (this) {
            // an int literal has type int. Easy :)
            is IntLit -> IntType
            is DecLit -> DecimalType
            // the result of a binary expression depends on the type of the operands
            is BinaryExpression -> {
                val leftType = left.type(vars)
                val rightType = right.type(vars)
                if (leftType != IntType && leftType != DecimalType) {
                    throw UnsupportedOperationException()
                }
                if (rightType != IntType && rightType != DecimalType) {
                    throw UnsupportedOperationException()
                }
                // an operation on two integers produces integers
                if (leftType == IntType && rightType == IntType) {
                    return IntType
                // if at least a double is involved the result is a double
                } else {
                    return DecimalType
                }
            }
            // when we refer to a variable the type is the type of the variable
            is VarReference -> vars[this.varName]!!.type
            // when we cast a value to a type, the resulting value has that type :)
            is TypeConversion -> this.targetType.toSandyType()
            else -> throw UnsupportedOperationException(this.javaClass.canonicalName)
        }
    }
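To make the two rules above concrete (the result type of a binary expression, and the println descriptor derived from a type), here is a small standalone sketch. SimpleType, binaryResultType and printlnDescriptor are hypothetical names introduced for illustration, standing in for SandyType and the logic shown above, so that the snippet runs without the rest of the project.

```kotlin
// Hypothetical, simplified stand-in for SandyType and its two implementations.
enum class SimpleType(val jvmDescription: String) { INT("I"), DECIMAL("D") }

// int op int stays int; as soon as a double is involved the result is a double.
fun binaryResultType(left: SimpleType, right: SimpleType): SimpleType =
    if (left == SimpleType.INT && right == SimpleType.INT) SimpleType.INT
    else SimpleType.DECIMAL

// Descriptor of a void method taking a single argument of the given type.
fun printlnDescriptor(argument: SimpleType): String = "(${argument.jvmDescription})V"

fun main() {
    println(printlnDescriptor(binaryResultType(SimpleType.INT, SimpleType.INT)))     // prints (I)V
    println(printlnDescriptor(binaryResultType(SimpleType.INT, SimpleType.DECIMAL))) // prints (D)V
}
```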
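Expressions
As we have seen, the JVM is a stack-based machine. So every time we want to use a value we push it onto the stack and then perform some operation. Let's see how we can push values onto the stack.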
    import org.objectweb.asm.MethodVisitor
    import org.objectweb.asm.Opcodes.*

    // Convert, if needed
    fun Expression.pushAs(methodWriter: MethodVisitor, vars: Map<String, Var>, desiredType: SandyType) {
        push(methodWriter, vars)
        val myType = type(vars)
        if (myType != desiredType) {
            if (myType == IntType && desiredType == DecimalType) {
                methodWriter.visitInsn(I2D)
            } else if (myType == DecimalType && desiredType == IntType) {
                methodWriter.visitInsn(D2I)
            } else {
                throw UnsupportedOperationException("Conversion from $myType to $desiredType")
            }
        }
    }

    fun Expression.push(methodWriter: MethodVisitor, vars: Map<String, Var>) {
        when (this) {
            // We have specific operations to push integers and double values
            is IntLit -> methodWriter.visitLdcInsn(Integer.parseInt(this.value))
            is DecLit -> methodWriter.visitLdcInsn(java.lang.Double.parseDouble(this.value))
            // to push a sum we first push the two operands and then invoke an operation which
            // depends on the type of the operands (do we sum integers or doubles?)
            is SumExpression -> {
                left.pushAs(methodWriter, vars, this.type(vars))
                right.pushAs(methodWriter, vars, this.type(vars))
                when (this.type(vars)) {
                    IntType -> methodWriter.visitInsn(IADD)
                    DecimalType -> methodWriter.visitInsn(DADD)
                    else -> throw UnsupportedOperationException("Summing ${this.type(vars)}")
                }
            }
            is SubtractionExpression -> {
                left.pushAs(methodWriter, vars, this.type(vars))
                right.pushAs(methodWriter, vars, this.type(vars))
                when (this.type(vars)) {
                    IntType -> methodWriter.visitInsn(ISUB)
                    DecimalType -> methodWriter.visitInsn(DSUB)
                    else -> throw UnsupportedOperationException("Subtracting ${this.type(vars)}")
                }
            }
            is DivisionExpression -> {
                left.pushAs(methodWriter, vars, this.type(vars))
                right.pushAs(methodWriter, vars, this.type(vars))
                when (this.type(vars)) {
                    IntType -> methodWriter.visitInsn(IDIV)
                    DecimalType -> methodWriter.visitInsn(DDIV)
                    else -> throw UnsupportedOperationException("Dividing ${this.type(vars)}")
                }
            }
            is MultiplicationExpression -> {
                left.pushAs(methodWriter, vars, this.type(vars))
                right.pushAs(methodWriter, vars, this.type(vars))
                when (this.type(vars)) {
                    IntType -> methodWriter.visitInsn(IMUL)
                    DecimalType -> methodWriter.visitInsn(DMUL)
                    else -> throw UnsupportedOperationException("Multiplying ${this.type(vars)}")
                }
            }
            // to push a variable we look up its index in the symbol table and load its value
            is VarReference -> {
                val type = vars[this.varName]!!.type
                when (type) {
                    IntType -> methodWriter.visitVarInsn(ILOAD, vars[this.varName]!!.index)
                    DecimalType -> methodWriter.visitVarInsn(DLOAD, vars[this.varName]!!.index)
                    else -> throw UnsupportedOperationException(type.javaClass.canonicalName)
                }
            }
            // the pushAs operation takes care of conversions, as needed
            is TypeConversion -> {
                this.value.pushAs(methodWriter, vars, this.targetType.toSandyType())
            }
            else -> throw UnsupportedOperationException(this.javaClass.canonicalName)
        }
    }
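To get a feel for what the emitted instructions do at run time, it can help to simulate them. The sketch below is a toy interpreter for four instructions, applied to the sequence that pushAs and push would produce for the expression 1 + 2.5. The Insn classes and the execute function are hypothetical and exist only for this illustration; the class names mirror real JVM opcodes, but this is not how the JVM or ASM is implemented.

```kotlin
// A toy model of a handful of JVM instructions, for illustration only.
sealed class Insn
data class Ldc(val value: Number) : Insn()
object I2D : Insn()
object IAdd : Insn()
object DAdd : Insn()

// Execute the instructions on an operand stack and return the value left on top.
fun execute(program: List<Insn>): Number {
    val stack = ArrayDeque<Number>()
    for (insn in program) {
        when (insn) {
            is Ldc -> stack.addLast(insn.value)
            // convert the int on top of the stack to a double
            is I2D -> stack.addLast(stack.removeLast().toInt().toDouble())
            is IAdd -> {
                val right = stack.removeLast().toInt()
                val left = stack.removeLast().toInt()
                stack.addLast(left + right)
            }
            is DAdd -> {
                val right = stack.removeLast().toDouble()
                val left = stack.removeLast().toDouble()
                stack.addLast(left + right)
            }
        }
    }
    return stack.removeLast()
}

fun main() {
    // 1 + 2.5: the int operand is pushed, widened with I2D, then added as a double
    println(execute(listOf(Ldc(1), I2D, Ldc(2.5), DAdd))) // prints 3.5
    // 1 + 2: both operands are ints, so IADD is enough
    println(execute(listOf(Ldc(1), Ldc(2), IAdd)))        // prints 3
}
```

Note how the conversion happens right after the left operand is pushed, which is the order in which pushAs emits it.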
Gradle
We can also create a Gradle task to compile source files
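    // the opening line of the task is reconstructed; the task name is an assumption
    task compileSandyFile(type: JavaExec) {
        main = "me.tomassetti.sandy.compiling.JvmKt"
        args = "$sourceFile"
        classpath = sourceSets.main.runtimeClasspath
    }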
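Conclusions
We did not go into much detail, and we rushed through the code. My goal here was just to give you an overview of the general strategy for generating bytecode. Of course, if you want to build a serious language you will need to study and understand the internals of the JVM; there is no escaping that. I just hope this brief introduction was enough to show you that it is not as scary or complicated as most people think.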
Translated from: https://www.javacodegeeks.com/2016/09/generating-bytecode.html