
WebAssembly logo


Agenda for the June meeting of WebAssembly's Community Group

Research Day

Registration

  • Fill out this form by May 24.

Logistics

Agenda Items

Session start times are not guaranteed. We may start sessions before their scheduled times if previous sessions end early.

Google calendar

Wednesday, June 5

  • 9:00am: Welcome and Introduction (Ben Titzer and CG Chairs)
  • 9:15am: WebAssembly.org makeover (Tom Steiner)
  • 10:00am: Custom annotations (Andreas Rossberg)
    • Vote to advance to phase 4
  • 10:30am: Break
  • 10:45am: Branch hinting (Yuri Iozzelli)
    • Vote to advance to phase 4
  • 11:15am: Compilation hints (Emanuel Ziegler)
  • 11:45am: FP16 (Ilya Rezvov)
    • Vote to advance to phase 2
  • 12:15pm: Lunch
  • 1:15pm: Experience report: Compiling Scala.js to Wasm (Sébastien Doeraene)
  • 2:00pm: JS string builtins (Ryan Hunt)
    • Vote to advance to phase 3
  • 2:45pm: Break
  • 3:00pm: Memory control (Ben Visness and Deepti Gandluri)
    • Vote to advance to phase 2
  • 4:00pm: Exception handling (Ben Titzer and Heejin Ahn)
    • Possible vote to advance to phase 4
  • 4:45pm - 5:00pm: Slack time

Thursday, June 6

  • 9:00am: SpecTec DSL (Andreas Rossberg)
    • Vote of intent to adopt SpecTec in spec infrastructure
  • 9:45am: Typed Continuations (Frank Emrich)
  • 10:30am: Break
  • 10:45am: Bag-o-stacks (Thomas Lively)
  • 11:30am: On the growing complexity of WebAssembly (Andreas Rossberg)
  • 12:00pm: Lunch
  • 1:00pm: Large Wasm binaries from desktop application migrations (David Sankel)
  • 1:45pm: Shared-everything threads (Thomas Lively)
  • 2:30pm: Break
  • 2:45pm: Memory64 (Sam Clegg)
  • 3:30pm: WASI / component model (Luke Wagner)
  • 4:15pm - 5:00pm: Slack time

Meeting notes

Wednesday, June 5

Attendees

  • Deepti Gandluri
  • Ilya Rezvov
  • Thomas Lively
  • Derek Schuff
  • Dan Gohman
  • Kirk Shoop
  • Robert Seacord
  • Michael Ficarra
  • Brendan Dahl
  • Emanuel Ziegler
  • Ben Visness
  • Ryan Hunt
  • Alexa VanHattum
  • Chris Fallin
  • Jamey Sharp
  • Till Schneidereit
  • Adam Bratschi-Kaye
  • Thomas Steiner
  • Heejin Ahn
  • David Degazio
  • Nick Fitzgerald
  • Luke Wagner
  • Sam Clegg
  • Mattias Blume
  • Adam Klein
  • Chris Woods
  • Mats Brorsson
  • Yuri Iozzelli
  • Bailey Hayes
  • Wavid Bowman
  • Suhas Thalanki
  • Andreas Rossberg
  • Alex Crichton
  • Andrew Brown
  • Ayako Akasaka
  • Francis McCabe
  • David Thompson
  • Yury Delendik
  • Sébastien Doeraene
  • Zalim Bashorov
  • Jeff Charles
  • Lann Martin
  • Elizabeth Gilbert
  • Alex Bai
  • Calvin Prewitt
  • Daniel Macovei
  • David Sankel
  • Frank Emrich
  • Linh Nguyen
  • Linwei Shang
  • Mendy Berger
  • Nathan Egge
  • Nimish Jindal
  • Oscar Spencer
  • Ricky Vetter
  • Sam Estep
  • Sean Isom
  • Shival Lamba
  • Tom An
  • Ty Overby
  • Robin Brown
  • Saul Cabrera
  • Steven DeTar
  • Heather Miller
  • Keith Winstein
  • Paolo Severini
  • Alon Zakai
  • Sam Lindley
  • Slava Kuzmich

Welcome and Introduction (Ben Titzer and CG Chairs)

WebAssembly.org makeover (Tom Steiner)

[https://docs.google.com/presentation/d/1eQsVaPJCLpBMn2zCyNZeGZ2HH2ytVI5L6_dkI19hjug/edit#slide=id.p]

URL: https://goo.gle/webassembly_org-makeover Inlink stats from the slides: https://docs.google.com/spreadsheets/d/1Uj384n4xiHa0t3SKxB1DQQL-dQ1jQl8RDMgaq-Px8Ow/edit?usp=sharing (Please request access)

Jamey Sharp: What resources are available for someone hosting server-side code?

TS: Don’t know; we could have static hosting for free, and we have folks from Fastly here too - this is open for discussion. Should we make it neutral? Should we make it sponsored? Do we need a server component at all? This will fall to whatever working group model we choose - nothing is set in stone; we're only presenting ideas for now.

Sam Estep: Having Compiler Explorer running Wasm in the browser would be a good selling point, and we can already translate wat to wasm using Binaryen.

Wavid Bowman: People like being able to compile code and push a run button and see it running. The wat format making it human-readable is great, but people want to see it running too.

David Thompson (chat): hoot's wasm toolchain could be run in the browser without too much additional work to do wat->wasm and such

AR: A couple of things - this is great; we should do something, because the webpage is outdated. However, it's ironic that there are a lot of unmaintained links, and we're suggesting adding a lot of features, but that may also lead to the site being unmaintained. If we have limited resources, we should make sure we use them for the site's primary purpose, which is documentation - so let's keep it simple and updated.

Ben Visness: On documentation: How much documentation do people want on webassembly.org, versus having documentation from the various toolchains. The toolchains are already writing their own docs, so what role should webassembly.org play?

AR: What I mean by documentation is not just documentation in the sense of documents, but the useful info that should be easily discoverable and well presented, like proposals, feature detection, and the actual spec and the spec page, which we can surface better. One thing people have complained about before is making the spec versions more discoverable. It's not just documents but everything around documents.

SE: If we’re thinking in terms of making this less ambitious and better scoped, one suggestion is that making the features page more up to date would be impactful - automatically detecting features for example

AR: Updating the first line that says “WebAssembly 1.0 has shipped” might be good ;-)

Ben Titzer: Agree with Andreas, we shouldn’t be too ambitious here. Link to Compiler Explorer and other resources, and give just the facts.

ChrisW: Great idea to update the slide - there’s two audiences, one is us - implementers, and authors etc. and the other is new users - helping folks onboard new users, new toolchain authors, etc. There could be some features with advanced users for debugging, setting up toolchains IDEs, etc.

BT: I agree, but the reason I think we should do docs first is that it’s easier to get done. Getting up to speed with where we are on the current spec seems easier to achieve.

TS: About documentation: documentation for the web means going to MDN - for the web it might make sense to centralize this on MDN (different companies already contribute through Open Web Docs). W3C had an effort called WebPlatform.org which failed, so the question is whether it makes sense to centralize all the documentation. There could also be different personas, and what you see could depend on what kind of user you are.

Deepti: There’s a lot of documentation that is relevant to only subsets of users, like WASI or Emscripten for example, so I’m curious what people want to see, and what signals we have for that. Toolchain documentation is easier to keep up to date at the source, so we might focus on redirecting to that documentation.

Till: I agree with that and about scoping things down, because we as a group are not the best people for maintaining a complex website that has requirements about being kept up to date. Historically we haven’t done well at keeping content up to date. If we add any new features to the site, they should come with a commitment by some party, not just an individual of some company, but some organization saying they are on the hook for keeping the feature up to date.

Suhas: I like the idea from my perspective of just starting with Wasm. It would be nice to add component model docs to the website because that's such a big part of Wasm. In addition, it would be nice to surface links to WABT, wasm-tools, etc.

Lann Martin (chat): It might be helpful to make it more obvious how to contribute to the website, e.g. a prominent link to the site's own repo.

TS: We could also add iframes - iframe the tool and have the tool author agree to surfacing the information that way.

Mats Brorsson: Is there an opportunity to auto-generate content from Github and other sources? That might help us keep it up to date.

BT: I like the idea, but in practice it's harder to do.

Deepti: In terms of who should take responsibility, even if one organization isn’t on the hook, we could consider having Subgroups take responsibility for keeping documentation up to date.

TL: Thoughts about having a Subgroup make this more concrete? Perhaps we could write a charter and define milestones. What would people think?

Lann: For a Subgroup, perhaps we should take a poll to find out who would personally contribute to it?

TL: If we had such a subgroup, raise your hand if you’d be interested in attending that meeting.

DS: A full phased process seems too heavyweight for this. Websites and documentation are easier to iterate on than the spec and the changes are reversible. Let's do something lighter weight.

BT: It might be easier to have engines provide features they support in a form that could be scraped.

TS: Process question: if we were to form such a working group and come with recommendations, but the people who have not contributed to the working group fundamentally disagreed with the recommendations, this would be a problem. What would happen then?

Wavid Bowman: We could do what legislative bodies do, where the full body only has the power to accept or reject proposals formed by the lower body.

SE: My reaction is that it seems like a lot of process for something that’s easy to change.

CF: The consensus building is important because there should be a top-to-bottom approach - for example, for people using WebAssembly, for toolchains, etc. There's a larger group that we can come to with lists and consensus.

Mattias Blume: To separate concerns about content and presentation, if there’s something missing or something that shouldn’t be there, the committee should have veto power, but if it’s just visual or usability of the website, that shouldn’t hold up the process.

Deepti: I’m curious if there’s a need for more process at all. Given that the website is already out of date, I would hesitate to add more process. If we reduce the amount of process, it may be easier to keep the website up to date.

TL: Instead of doing a redesign, maybe go through it and start making it up to date.

WB: How big of a problem is it if we have incorrect or inappropriate information on the website?

TS: One of my worries about making it iterative is that if you want to make radical changes, that would require restructuring. Do we want to rewrite from scratch, or iterate? Iterating would limit the more radical changes. Sometimes it's easier to just throw away what you have and start fresh than to keep working on something completely outdated. I haven't looked into the current website structure, though.

DS: If you fix a typo, no one will complain, and if you want to add something completely new, there's nothing stopping you from getting a 15 minute CG meeting slot. That would at least give people opportunities to object. Perhaps minor fixes could just be committed, but bigger features could go through the CG?

MF: We’re conflating 2 problems: There's the uncontroversial problem of updating outdated documentation, and general improvements to the website itself. It sounds like we want to reduce scope, not increase it. Many of the suggestions made were to already-extant resources around the web. We can link to them for now. We shouldn't use the outdated content issue to motivate taking on these additional responsibilities.

TL: Any closing last thoughts?

ChrisW: Ben said he tested a large number of Web engines. When we have this Subgroup set up, perhaps we should reach out to some of the implementers; we may find more people who could commit to keeping the docs up to date. That would help with the adoption of WebAssembly overall.

BT: I found a website that lists all the engines, and I sent a PR to the website, but it took three months to land. We should have a solution where our burden is low and engine authors' burden is clear. We should reward the folks working on the engines, or other pages with more traffic so our stuff doesn’t get out of date and their stuff also gets visibility.

SE: I like the idea of having organizations committing to donating resources for this. Seems like we could accept donations when organizations make them, and do minimal maintenance otherwise.

AK: The immediate problem is that the website is very out of date. If we have an order of operations, start with surveying what's there, and update the documentation, and then come back to adding new features. We’ll then have a structure in place that is responsible for maintenance as well.

David Thompson (chat): a simple static site with good docs is all that is really needed, in my opinion. The feature matrix is particularly important to me. I often forget which version of which browser supports which proposal.

Custom annotations (Andreas Rossberg)

Andreas Rossberg presenting Custom annotations slides

DT (in chat): Do annotations have to be valid s-expressions?

AR: It has to be a valid token, and it should be valid in nested structures, but apart from that it’s pretty free form.
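
For illustration, a minimal sketch of the two annotations the proposal standardizes, (@name ...) and (@custom ...); the identifiers and section contents here are made up:

```wat
(module (@name "demo module")            ;; populates the "name" custom section
  (func $id (@name "identity function") (param $x i32) (result i32)
    (local.get $x))
  ;; an arbitrary custom section, placed after the function section
  (@custom "my-metadata" (after func) "\01\02\03"))
```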

AR presenting current status

Adam: Are there planned uses other than branch hinting and the names section?

TL: The compilation hints proposal is also going to use it. Non-standards track, we also have things like the linking section. And individual tools like Binaryen have a lot of ideas for information we could stick in custom sections, and we look forward to being able to do that.

SE: Making the binary format in sync with the text format, does that mean any arbitrary custom section with annotations will need to be mapped to a text format and vice versa?

AR: There’s the generic fallback which is the generic syntax, but that may not roundtrip properly, such as if the custom section depends on byte offsets, or depending on the tool that parses it. It’s up to the designers of custom sections to ensure that they roundtrip properly.

FM: Do you have some suggestions for making a custom section round trippable - if you’re not going to use a binary offset, what would you use?

AR: The primary example is using binary offsets of instructions. The alternative would be to have a count of instructions, which have been used in various places.

DS: In defense of custom section designers that have done this "wrong", that requires tools to be able to parse instructions.

AR: Right.

YI: Originally branch hinting was using instruction offsets, but at the time there were ideas for features to disable parts of a function based on feature detection. It’s not proposed anymore, but instruction indices wouldn't work with those ideas because the indices would depend on the features.

AR: That would create similar problems on other levels.

BT: It’s more convenient for things at the binary level to have byte offsets.

DS: Even if you are going to use binary offsets, expressing them relative to section or function starts is easier than using arbitrary binary offsets.

YI: For referencing instructions, let’s suppose we solve that, but for other things, hopefully most things don’t need to depend on the binary format that much.

DD: To clarify, for things like @name annotations, are we proposing standardizing that too?

AR: The name section is already defined in an appendix of the standard, so this will also go there. I wouldn't mind moving some of these features into separate documents; otherwise the main document gets cluttered, and there's the question of which features go in the main document versus the other docs.

TL: Those are the only two concrete annotations that are included in the proposal, the "name" annotation and the "custom" annotation.

DD: If there are annotations that don't conform to the specific annotations mentioned here, and tooling can add different annotations, that bypasses the standards process to some extent, and it would be nice not to do that. It's good to set the precedent of including this in the standard annotations.

DS: There’s an important distinction between conventions used by tools and conventions used by engines, in particular engines required to support features in perpetuity. When we get to branch hints, we have to commit to an interpretation in a way that’s more permanent than the linking section.

DD: Mostly want to draw attention to the fact that these could be used for anything, and the tools can then use them - just want to draw attention to this.

Ryan: I don’t think Firefox is going to support any kind of custom section unless it’s required by standards.

AR: There's a distinction between being standardized and being in the main document; branch hinting will be standardized, but as a separate document. Custom sections, in contrast, are custom, so they are optional and can be supported optionally.

DD: That’s clear. For names, it’s good that the standard is calling it out explicitly. On the tooling side, I think it's good to announce major non-standard custom sections that people are going to use.

AR: In terms of the actual custom sections that tools use, it would be nice to have some convention for picking their names, to avoid name clashes. We as a community could define such conventions, but that is orthogonal to the custom sections problem itself.

Dave Thompson (chat): So would the name section be populated by "name" annotations rather than the existing method of extracting names from the symbolic ids of various things (types, funcs, etc.)?

AR: How you construct the Wasm module is up to the tools, there’s no prescriptive ways on how to do it.

POLL: Custom annotations to Phase 4

| SF | F  | N | A | SA |
|----|----|---|---|----|
| 14 | 30 | 8 | 0 | 0  |

TL: That is consensus for Phase 4.

Room: applause

Branch hinting (Yuri Iozzelli)

YI presenting branch hinting slides

  • not in the core spec doc, nor in any other existing doc, but in a new "metadata" doc
  • Code Metadata logically attached to instructions
  • binary format: group by function, then use byte offsets starting from function beginning
  • text format: annotation immediately precedes the instruction it's attached to (see the sketch below)
  • why did it take so long? first proposal added new instructions, but did not reach consensus
  • so, then the question became how to use a custom section in the actual spec
  • because now the reference interpreter needs to actually understand the custom section
  • currently phase 3; phase 4 requirements status: everything done except CG consensus

Live spec draft: https://webassembly.github.io/branch-hinting/
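
A minimal sketch of the text format described above - the annotation immediately precedes the hinted instruction, with "\00" meaning unlikely and "\01" likely (the function contents are made up):

```wat
(module
  (func $checked_div (param $a i32) (param $b i32) (result i32)
    local.get $b
    i32.eqz
    (@metadata.code.branch_hint "\00")  ;; hint: divisor is rarely zero
    if
      unreachable                       ;; cold path; engine can move it out of line
    end
    local.get $a
    local.get $b
    i32.div_s))
```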

Adam Klein: What does the test suite look like?

YI: There is a directory in the spec tests for custom section tests. The reference interpreter has an option for enabling and validating custom sections. The tests check that round trips work. Also there are a few checks that binary offset refers to something, correct order, stuff like that.

David Sankel: This is for performance optimization, right? We tried adding these hints to our C++, and it turns out there's no way to actually put them in the right place to use them to improve performance; have you benchmarked to see that these actually help?

YI: You are correct that it’s difficult for programmers to decide, but there are other ways to get this hint that are more precise, such as profile-guided optimization. The main use case for the proposal is the CheerpX JIT compiler for x86. It generates Wasm modules at runtime, and there’s a lot of assumptions about how the code executes, so there are many very unlikely slow paths. We cannot emit good code for those today because we cannot separate the unlikely code from the hot path. Structured control flow works against you. With this feature, engines can decide where to put things. In V8, if the unlikely path is inside of a loop, it can move it outside of a loop and tweak register allocator weights. There’s a big effect, in our product we see 7-8% gain.

David Degazio: Why is it strictly necessary to use byte offsets instead of instruction indices?

YI: It's not strictly necessary; earlier iteration used instruction indices, which have some good properties (e.g. no need for a specialized text format). It's a bit more annoying as an implementer to keep track of instruction indices, but not a huge deal. Other reasons: back then, there was a potential proposal for partial parts of a function that could be disabled, so then there's no nice linear ordering of instructions. Also, other non-standardized features like debugging info with DWARF; those also use byte offsets. Those are different (start from code section start instead of function start).

Mattias Blume: Instead of byte offsets relative to function beginnings, why not use the section-relative offsets already used in the linker section?

YI: Good question; mostly to reduce the size a bit. These indices are all encoded with ULEB, and not all the functions will have annotations; so, the indices would be quite sparse. For DWARF, it's less of a concern for two reasons. One, section offsets are more similar to how DWARF works for native (the format is pretty much 1:1), and they use absolute offsets. Two, you're not shipping the debug info to production, but you would for branch hints.

MB: Second question is, so likely/unlikely, is it one bit? Why not go all the way and have a branch probability?

YI: That was also considered; it could make sense. The idea there is that at the very end, what you do is pretty much have a choice. Either do nothing, or separate out the unlikely code. So the final decision the engine makes is usually binary like this. It's useful to have probabilities before that (compile based on if code is in a loop, multiple branches, etc) and you can do all that. But usually it boils down to, "80%: unlikely". These are emitted at the very end of the compilation pipeline.

MB: Yeah I was thinking, first of all if you get information from PGO or LTO, you’d have execution frequencies which would give you probabilities. You could be smarter about where to focus optimization effort.

YI: Yeah, it's possible, but just saying, in practice, the way V8 and also JSC work is quite simple. Branch hinting is already there. For example, stuff like branch-on-null has a built-in unlikely branch for it to be null. We used that in our initial prototype, and it was actually faster to use that built-in hint than not. So V8 has these three states already built-in, and we've done the same thing.

Chris Fallin: These two things are orthogonal. Probability is a measurement. That could be useful for tools, but the hint is something else, a decision that has been made. Could we have both?

MB: It could make sense to have execution frequencies.

ZB (in Chat): Shouldn't compilation hints be a generalized version of this one?

YI: In principle yes, but this one came first. Compilation hints is the next item here and will use this format here that we made.

DD: Why is this specifically expecting either 0 or 1? In general a source language lets you be not necessarily boolean; for instance, br_table. Some C++ codebases expect that the thing we're switching on is a particular value. In the binary encoding, we have a u32, so there would be space to encode additional indices. Why not include that here?

YI: Yes so, it’s possible to extend this. But br_table is trickier to figure out. V8 wouldn’t have a nice way to do anything special. This was the most conservative feature. If there is a need, we could expand this. Either by adding more values or with an expected value with the same format. We tried to make this generalized format to be compatible with more things coming in the future. We had decision paralysis at the beginning, but then decided to just do this one thing that we know is useful.

DS: I was gonna mention something similar. LLVM has a number that ranges quite a bit; likely/unlikely results in very large numbers, which is why it's not useful. I'm worried this will fall into the same situation.

YI: The idea is not that you go from C++ "unlikely" and map 1:1 to one of these annotations. Toolchains decide what to do with this and when to output it. For instance Cheerp uses the probabilities in LLVM, and I don't know what the threshold is (99%?). If this came from C++ or was figured out from the CFG, the toolchain decides - maybe wrong, maybe not, maybe use PGO; it's up to toolchains.

TL: Upstream LLVM doesn't use this, right?

YI: No. I had a POC that this would work, but it's not complete. It should be a matter of just doing the work. I don’t think it’ll be a lot.

Ben T (in chat): One way to think of this is, as Chris says, it's a directive (for code layout).

DD: I'm surprised that runtimes are not able to make use of frequency info. Anyone could make use of branch likelihood/unlikelihood, anyway. JSC has floating-point frequencies for different blocks and makes use of that. It's pretty established in compilers. So if both the runtime wants/benefits from this, and LLVM benefits from this, it seems like useful info. Especially if PGO produces fractional probabilities.

YI: V8 works with three states. The idea there was that there was a gap we needed to bridge. LLVM natively would do one of two things: treat them the same or put the other far away. That decision uses probabilities, but the actual behavior would be one of two things. We want this choice to be made at this point. The idea is that we don’t want the engine to make the decision again, we just want the code layout here.

DD: This seems kind of limiting. In terms of block frequencies, LLVM can say "this block is unlikely", but a JIT compiler gets real profiling data e.g. for PGO. I don't know if it's used today, though. I think it's incorporated a couple places in higher tiers. A static directive, plus real-time profiling, mixed together, seems like the optimal solution?

YI: I suppose that could make sense.

DD: Yeah, seems like a useful capability. Some capability to incorporate; maybe not on or off, but a lower granularity.

YI: That could work. We based this off of what we saw was available. Firefox supports branch hinting in a limited way.

TL: Almost at the end of the scheduled time; let's do the poll.

| SF | F  | N  | A | SA |
|----|----|----|---|----|
| 6  | 27 | 16 | 0 | 0  |

TL: Okay, proposal is phase 4.

Room: applause

Compilation hints (Emanuel Ziegler)

EZ presenting slides. Notes on content not on slides:

  • Recap: goals
    • Sometimes collecting runtime feedback is too expensive
  • Design principles
    • Resource constraints, e.g. mobile phones might not want to inline everything all the time
    • There are some ways using static analysis but they don't guarantee inlining vs not
  • Compilation order (see the hypothetical sketch after this list)
    • E.g. if we know ahead-of-time that a function is going to be called soon
    • Hotness to know whether to skip lower tiers and go straight to the top tier
      • Kind of ill-defined, like should you do lowest tier then highest tier? Or skip all tiers?
    • You don't have to put these hints on everything; can just say "I don't know" for some
      • Don't want to blow up the module size with useless info
  • Inlining
    • In order to go straight to the top tier in V8, we need additional info
    • Saying you want to inline at a specific callsite, not everywhere
    • I don't really like that this is in the same section because it's different scopes
      • Maybe separate sections for function inlining and callsite inlining
    • Should we use floating-point instead of integers for call frequency?
      • So then maybe "do not inline" and "try your best to inline" would be infinities?
    • Standardized 0 or 127 values might not matter because the probabilities are so low, but perhaps useful anyway for knowing when to completely ignore
  • Call targets
    • Using percent, but might be nice to use floats
      • We don't need that level of detail, but just for consistency
    • There's no way of expressing a call_ref into another module
      • So we currently can't express that, can't inline that
      • That's a limitation: Wasm just doesn't know about other modules
    • You don't have to encode all the callsites, only the relevant ones.
  • Open questions
    • From discussion on branch hints proposal, seems like people want quantitative info
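
Since compilation hints plan to reuse the code metadata format established by branch hinting, a hint might look roughly like the sketch below. This is purely hypothetical: the annotation names, payloads, and encodings are exactly the open questions listed above, and nothing here appears in the proposal text.

```wat
(module
  ;; Hypothetical annotations - names and payloads invented for illustration.
  (func $init (@metadata.code.hotness "\00") (result i32)   ;; cold: runs once
    (i32.const 42))
  (func $step (@metadata.code.hotness "\7f") (param i32) (result i32) ;; hot
    (i32.add (local.get 0) (i32.const 1))))
```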

Ryan Hunt: Does the hotness have a unit associated with it?

EZ: I want to resolve this. I want it to be an actual unit. But I don't know what would be the best unit for it.

Mattias Blume: This makes things way more complicated, I realize. Compilation order, could it be conditional? If this function, then this function should be compiled.

EZ: Yeah, we don't have that. I understand the need: say you're using a function that's not commonly used, but is common in your particular application. Could be interesting, happy to discuss. I see usefulness, but it's hard to figure out the trigger.

RH: Does this call_ref include the call target?

EZ: Especially for WasmGC, we do a lot of speculative inlining.

David Sankel: Could encode an import name to refer to functions in different modules.

EZ: But many of those import names are pretty long.

TL: But an imported function has an index, so the module can just import the function to get an index for it.

RH: One thought on supporting calls to other modules: would it be possible for toolchains to say I want a callsite for something in another module, import it but then not actually use it?

EZ: Could be.

RH: Other hints that might be necessary: I was trying to understand the inlining hints which say whether or not to inline. You also have callsites as a separate thing. But if a producer generates something where you have one callsite that's called a million times, but also says "don't inline", what to do with that?

EZ: The original proposal was actually based on what V8 does, which didn't have direct call inlining at the time. But this doesn't really work that well for static analysis and toolchains that might not have that information in the first place. But I also didn't want to overload the information in the section. There might still be a benefit to having different callsites even if you don't inline, because the engine might inline on its own.

RH: Yeah, just trying to figure out the overlap. There is a profiling cost to collecting call site information, and it would be useful for certain languages to opt out of that cost. To specify "polymorphic" and go straight to that highest tier.

EZ: So, in this case we don't have a special value to say these are the top callsites you should be inlining, and ignore all the other ones. You'd have to make up numbers. We could change that: a special value that just assigns equal parts to these.

RH: If you specify that a function is extremely hot, but you didn’t specify inlining information, would V8 compile it to the top tier or assume that it still needed to collect profiling information?

EZ: So as I said, we have call target information but not inlining information.

RH: Sorry, suppose a function is specified as hot, but producers intentionally didn’t give any hints for inlining or call sites.

EZ: Up to the engine to decide. I think it's not smart to do that; we can do better than what the hints give us. At the top tier, we don't collect inlining information. So, once you are there, you are stuck there, you can never get out, because you're just not collecting that.

Ben (in Chat): Another big benefit of inlining is to be able to specialize the context, especially for big functions where many functions can be gotten rid of.

EZ: With current inlining, we don't know that ahead of time; it's just a huge function. Potentially that could cause problems down the line. If we knew that most of it actually goes away, because of the conditions under which it's being called, that could be helpful. Dunno how to express that. Maybe "this is the effective wire byte size".

Zalim: I think it would be nice to have a hint for the lazy initialization pattern, where usually you have a function called only once, then you will call another function. Currently it will be covered by the call target hint, probably it could be covered by this hint, but it would be nice to have a special one for this.

EZ: I'm not sure I fully understood the idea. You have functions called only once for initialization, then never again?

Zalim: One way is to call a function by reference, after that we replace it with a second function that does nothing. At the engine side it looks like virtual calls. Usually engines don't make many optimizations on call sites. The hint would explicitly say to the engine that the first function is called only once, after that it’s always the second one.

EZ: Yeah in such a scenario we'd like not to inline. Not sure how to encode that nicely. Maybe say that you go into the baseline code, and once the calls are finished to that function, if only once then OK that's easy to figure out. Once initialization is done, go to the top tier and ignore that info. Encoding, I'd have to think about that. But I see the use case.

EZ: That's it. Thanks for discussion, look forward to more discussion.

FP16 (Ilya Rezvov)

IR presenting slides. Notes not already in slides:

  • Presented the same talk three weeks ago, not much new.
  • Motivation
    • ARM support very good, x86 not as good
  • Game plan
    • No F16 scalar support proposed because of high implementation cost, and no known use cases or requests from partners for it
  • Hardware support and emulation
    • Hope for broad x86 support in maybe 3-5 years via AVX10
  • Implementation status
    • No performance numbers yet
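
For reference, a small sketch using instruction names from the proposal draft (not yet standardized): there is no scalar f16 type, so loads and stores fuse a conversion with f32, and arithmetic happens on v128 lanes.

```wat
(module
  (memory 1)
  ;; Load two adjacent f16 values, promoting each to f32 on the way in.
  (func $sum2 (param $addr i32) (result f32)
    (f32.add
      (f32.load_f16 (local.get $addr))
      (f32.load_f16 (i32.add (local.get $addr) (i32.const 2)))))
  ;; Scale eight f16 lanes at once; the f32 scalar is demoted by the splat.
  (func $scale (param $v v128) (param $s f32) (result v128)
    (f16x8.mul (local.get $v) (f16x8.splat (local.get $s)))))
```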

Sebastian: Should storage type be extended to include f16 for GC instructions?

IR: Good question. Whenever we have any scalar version of f16, we currently just use f32 and we need special f16 versions for arrays.

DG: For emulation when you don’t have Sapphire Rapids ISA, is the rounding the same under emulation as under native instruction? Is that compatible with the optimization path that folds those instructions?

IR: Yeah.

DG: Is that still an issue under optimization?

IR: Not implemented yet. Good point; there could be less rounding. So we might have less optimization opportunities.

David Degazio: Seems like a very aggressive amount of hardware dependence: condemning anyone without AVX512 to very subpar performance, without any control over it. There's sort of a core of this that is pretty defensible, specifically f16 to f32 conversion. Generally this is very well-supported on Intel and ARM (big asterisk). Peers at Apple would be far more supportive of this. Dramatically different performance on different hardware, very niche hardware like AVX512.

Deepti Gandluri: I think that there is a part of this - you're right about conversions. We don't try to split these instructions apart. Hardware adoption is unpredictable. ARM64 has a good subset. Supporting only a subset of instructions leaves you in a bad situation where you can't use these. Wasm standardization moves slowly. Since we can't feature detect between different versions of fp16 support, it's better to make it future proof and look at support in newer hardware that will be widely adopted in the future. That's the approach we've taken in the past as well.

DD: Is the expectation that everyone is running AVX512 in the future?

Deepti: No, but AVX10 is expected in the future and will have f16 support.

DD: I think if we're in that future, I'd feel very differently; it would be similar to a lot of other SIMD proposals. I definitely get the concern of, "all these ARM chips have support for fp16", but that's strictly a hardware dependency, and goes philosophically against the principles of Wasm being hardware-agnostic. If there was feature detection, where you could hand-tune your own conversion without the engine doing it for you (some engines don't have time to optimize), then opt into f16, that's a lot more reasonable. As it stands, though, this would have to be adopted as a monolith.

IR: agree that feature detection would be nice. On either the Wasm or JS side. Feature detection could help decide which binary to download. Agree with Deepti, we don’t want to stay on the common denominator of hardware, we want to support modern hardware. The gulf between them grows every year.

CF: On the x86 side, we’re even more aggressive with backwards compatibility and have fallbacks all the way to SSE2. Without F16C, how bad is emulation?

IR: If we don't have F16C, it's bad. You basically can only emulate two instructions (neg and abs) for floating-point. But it's a really common extension, was released in 2007, it's almost everywhere.

Sam Estep: A question about the load/store instructions. They're weird in the context of f32/f64 promote/demote combined with loads/stores. Do they round to infinities? How do they handle out-of-range values?

IR: We really don't want a new scalar type, so we have fused the load/store with promote/demote. We need to standardize that behavior.

CW: Q about conv operators. If you just have conv operators to f16, what could you do with the f16 vectors? Is that equivalent to not having f16 vectors?

IR: Yeah, without arithmetic operations, you can reorder lanes, maybe?

CW: That doesn’t seem like any code would want to use conversion operators then.

DD: Definitely not ideal, but it would still let you store f16, but manipulate them as f32 or f64, which is more or less the current situation without AVX512, but more explicit for people hand-optimizing their Wasm. On ARM it's strictly worse, which is unfortunate, but it matches the intersection of the two architectures.

CW: Seems like you’d just not want anything in this case, just make people use f32.

DD: I could abide by that. Under the assumption that we want this proposal, I think it needs to come with some sort of conditionalization or something. I get that it cuts into the value if you don’t have f16 operations, but it's clear that many Wasm users don’t have f16 on their hardware.

Deepti: One thing to add there. For devices that can emulate f16, it’s not a terrible lowering. If you see the implementation, it’s about the same as the original SIMD proposal. If you do have f16 emulation, it still seems like a reasonable fallback. The Intel architectures are just uneven and require some emulation. We haven't gated on that in the past. As long as it's not software emulation with dozens of instructions, that still seems reasonable.

Andrew (on Zoom): Just based on what Deepti just said, emulation is a good approach for fixing irregularities. When I just joined the Wasm community, it was with the SIMD proposal. What I observed there (being very naive) was that that rationale was used to cover over a multitude of things that I realized later were more than just covering irregularities. It was picking ARM semantics instead of trying to find common ground between the ISAs. I'm kind of OK with the idea that future CPUs will have AVX10, but I'd like to see the lowerings to the AVX10 instructions, and to the ARM instructions, for these Wasm operations, because I kind of feel like I got bitten the last time this happened.

Deepti: Clarifying question: What we did in all those cases where the lowering wasn't exact was to verify that the performance was still portable. Would you agree with that?

Andrew: I didn’t like that bar being set. We should look at peak performance for each chip, but x86 was hindered because the chips happened to be faster than the ARM chips we were comparing against. That doesn't feel fair.

Deepti: I understand not liking it, but it is giving you somewhat similar performance across architectures. What would your ideal criteria be? For instance, some ISAs have different semantics for what looks like the same instruction.

Andrew: That discussion is a deeper one than the one we have for f16. I think there are alternatives to what happened with Wasm-SIMD.

Deepti: Still relevant.

Andrew: Absolutely. I don't think the same bar should be used for this.

Deepti: If you have an opposition to the bar, be specific and say what you'd like it to be.

Andrew: I'm not prepared right now with slides for that. But I'd love to see the versions for both ISAs for this proposal.

CW: If the ideal case is we expose the same peak perf of intel/arm, then we need to have non-determinism to support the different semantics. Relaxed simd went that way. It’s not realistic to stick with peak performance and still have determinism.

Andrew: True. I just want to point out that, in the majority of cases where a compromise had to be made, it was x64 that paid the price, and not ARM.

Deepti: And the relaxed SIMD proposal as a follow-on addressed most cases.

Poll:

| SF | F  | N  | A | SA |
|----|----|----|---|----|
| 3  | 15 | 24 | 7 | 0  |

Andrew: A comment on my "against" vote: I would want to vote for this, but I need to see the instruction lowerings and whether x64 is going to be negatively affected.

CW: I don’t think we reached the phase 2 criteria.

DD: One final word from me. Lots of potential, but as a monolith it’s not worth doing. If we had feature detection we’d be much more supporting. There's a kernel we're in favor of, but not the whole thing.

Alex Crichton: We need to find balance between hardware performance, feature detection, non-determinism. This is a big open question.

DD: A 2x performance difference between architectures seems counter to the core of Wasm to bake into the spec.

TL: we’re not going to declare consensus. Please engage on this to help us move this forward. Good to hear this feedback now. Let’s figure out concrete next steps.

Experience report: Compiling Scala.js to Wasm (Sébastien Doeraene)

SD presenting slides.

  • Presenting a few unusual things from compiling Scala.js to wasm
  • Scala introduction: originally designed for the JVM, fusion of object-oriented programming and functional programming. More relevant for today is that it has 3 targets that are well supported including JVM, JavaScript, and Native.
  • Scala.js: Distinction between different targets and language semantics as much as the backends that we are using. As a language, it is not specific to browsers but widely used on the platform. It is designed to be portable, interoperable, and efficient. The source code and tests are the same and execute the same across platforms.
  • Scala Native is a dialect of Scala with the same set of key goals, but designed to be interoperable with C code. Why discuss Scala.js rather than Scala Native?
  • Scala is a garbage collected language. For Scala.js we rely on the gc semantics of js.
  • We chose to start from Scala.js to compile the semantics of scala.js to Wasm and WasmGC.
  • Scala.js and JS code may constantly cross the border between the two. An implementation consequence is that the representation of an Int must be a JavaScript number. This is different from the JVM, where it is an instance of java.lang.Integer.
  • Even though we have strong interop guarantees, we have limits. The ScalaClass methods are not visible unless methods are annotated with @JSExport. A class is exported for JS to instantiate with @JSExportTopLevel.
  • What we use from WebAssembly includes:
    • GC extension is used, perhaps every single op-code.
    • Exception handling including WebAssembly.JSTag.
    • The JS API
  • We don’t use linear memory or tables.
  • GC type use
    • Objects (e.g. jl.Object, scala.List) use structs
    • But numbers…can we do (struct jl.Integer)? No, it needs to be a JS number. How to put i32 into an anyref?
    • With the following:
      • JS: function boxInt(x) { return x }
      • Wasm: (func (import "scala" "boxInt") (param i32) (result anyref))
    • Boxing via the JS API! (see the sketch after this list)
      • [explanation of behavior in ToJSValue and ToWasmValue]
    • Unboxing is achieved in a similar way
  • How is that not terribly inefficient?
    • This is effectively a no-op; the same pointer is passed through the boundary. Going from i32 and back is not free, but it does not allocate in most cases.
  • [Virtual dispatch with primitive receivers]
    • Dispatch achieved with tests.
  • [Data types with universal representation as JS primitives]
    • [table from scala types to primitive representation and universal representation]
    • Note that the primitive string is used
    • This is why we have many helpers and would be better served by the JS string builtins proposal that we’ll talk about later today.
  • [VM-supported exceptions]
    • In Scala.js there is an UndefinedBehaviorError that is thrown during dev/testing. This is relaxed for production builds.
  • [Other JS interoperability features] there’s a lot of them!
  • Can we do better? Yes
    • Generate one JS helper for every JS interop call site and other enhancements covered on the slide.
    • Could WebAssembly help us do better in the future? Unlikely because we don’t want js operations in WebAssembly.
  • Exception handling
    • Need to be able to catch and throw JS exceptions. From Scala.js’s semantics view, our exceptions are JS exceptions.
    • Contributed catching with JSTag for try_table and proposed behavior for throwing JS exceptions which became spec text.
    • Users of the exception handling proposal and js stack.
  • @JSExport
    • One feature we cannot implement is @JSExport inside Scala classes - works for top level but not in classes
    • Scala objects are GC structs, GC structs are totally opaque to JS
    • Possibly solved by proposals
    • [list of requirements for implementing this feature]
    • 3 things that will give the full power of Scala.js interop in WebAssembly
  • Other proposals that could help
    • More efficient strings: Stringref / JS string builtins
    • Coroutines: JS promise integration / stack switching
    • Efficiency details: call tags
  • Performance: great with no JS interop, abysmal with JS interop
  • We want to target non-JavaScript Hosts but this would be another target. This would obviously mean dropping the JS interop. We would have to re-implement primitives.
  • Food for thought for questions and discussions:
    • So far the interop has been at the ABI level. This is OK for languages with linear memory but not so great for languages with managed memory and an object protocol.
    • If we had universal anyref-able primitive datatypes like stringref and others.
    • Closure creation to create funcrefs.
    • A protocol for accessing objects.
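
Returning to the boxing trick above, a minimal sketch of the JS side; moduleBytes and unboxInt are hypothetical names. The helpers are identity functions - the JS API's ToJSValue/ToWasmValue conversions at the boundary do the actual (un)boxing.

```js
// Wasm side (for reference):
//   (func $boxInt   (import "scala" "boxInt")   (param i32)    (result anyref))
//   (func $unboxInt (import "scala" "unboxInt") (param anyref) (result i32))
const imports = {
  scala: {
    boxInt: (x) => x,   // the i32 arrives as a JS number; returned as-is into anyref
    unboxInt: (x) => x, // the JS number flows back and is converted to i32
  },
};
const { instance } = await WebAssembly.instantiate(moduleBytes, imports);
```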

YI: Issue of calling methods from JS - can’t you in principle implement this by wrapping in a proxy when handing from wasm to JS?

SD: It’s basically the same issue we have with the i32 anyref. By the time it gets to our universal representation it already has to be in the right shape. Within Scala code, our representation would have to be these proxy JS objects, and that’s terrible right?

BT: Originally we did have a way to create closures, removed it because OCaml wanted their own closure representation. Another thing we kicked out was more complicated union types. Would it be useful to have the union of i31 and some user type?

SD: So for i31 we can do it because it is a subtype of any. But it doesn't help, because we need the 32nd bit, so we'd have to store it somewhere and that's not an option. We can use i31 sometimes as an optimization to avoid having to go through the JavaScript helper, but in the general case we need to go to JS to hold all of the possible values.

BT: What about the use case of “heap number” - choice between i31 and struct boxing a JS number?

AR: You can already express that. Union types might give the engine more static knowledge in terms of data representation, possibly giving some performance.

BT: I agree and think performance is important which is why I was asking.

AR: One note - problem to solve is constructing JS heap values in wasm. That’s the key interop problem.

SD: Yes, you could see it as an interop problem between different wasm languages, too, at the GC level.

AR: Right, and we’ve had many discussions about that and seamless interop magic was never the design goal because in our experience it never really works. It also collides with the goal of WebAssembly being a lower-level language. Any form of interop we would end up choosing would fit some language and not another. So far we have tried to keep it low-level with a couple exceptions like cast operators where we kind of went a little overboard. With all of the types here there is no hidden branching or hidden allocation and that is the line we need to not cross.

SD: We would be fine with it being predictably slow! If i32 with that last bit could be heap allocated, sure, fine. We would prefer that for interop.

AR: Isn’t it possible to express that right now in user space? In your own code you could say this is i31 and then the larger value right now.

SD: Yes, that’s what we use the JS helper for. But we can’t do it entirely on the wasm side. We still need to ask JS to make a number for us.

AR: Right but that is a separate problem. There you really need to create a JS interpretation and not really a boxed number. There is a hidden assumption on this last slide, in that these types would interop with JS the way you would expect them to.

SD: Yes, and any other wasm language that chooses to use them.

AR: My take would be that we could go a similar route with these that we are trying with string builtins. Particularly once we have something like type imports. Right now we use externref, but we could have interop modules in the JS API level that give you the primitives you need.

SD: Yes that would work.

CW: To echo Andreas, the direction I expect this to go would be to facilitate the allocation of JS objects instead of trying to make engines more clever behind the scenes.

AR: One other comment re: closures: The main reason we removed them was concerns that we want a plain function pointer type, and don’t want to conflate it with a closure type. There might be room for closures as a future feature but would be a separate type. They are a richer thing with more expansive implementation.

SD: Sure, that wouldn't fundamentally change how we use it.

AR: With all of these things, it really takes someone to champion it and push it through. Now it seems that we do have a user.

CW: To be fair, the rebuttal is often to have the engines implement these proposals as that often takes significant work.

AR: Yes that too.

TL: Thank you very much!

JS string builtins (Ryan Hunt)

RH presenting slides.

  • Summary of proposal:
    • Goal is efficient inline codegen of JS primitives in wasm code without changing core spec
    • Adds set of importable builtin functions
    • Imported via opt-in flag at compile time
    • Adapt existing JS primitives to the WebAssembly ABI - receivers, operators, etc. lowered to wasm concepts
    • Data type conversions minimized
  • Example
    • WebAssembly.compile(..., { builtins: ['js-string'] })
    • (import "wasm:js-string" "fromCodePoint" (func …))
  • Progress since the last time I gave a presentation is that I have added new tests and there were some small proposal changes but mostly miscellaneous typos.
  • String constants: there’s a long discussion in the issue.
    • This proposal relies on JS strings being externrefs but then how do you handle the issue of not having string constants.
    • By implementing globals with externref and then generate stringrefs with constants.
    • Second idea is doing this with data segments.
    • There’s a third idea for Array.fromIterable but I’m not going to get into it.
    • Thankfully we had a really nice idea here that works really well.
    • Basically, pass importedStringConstants as true when compiling with WebAssembly.compile. Then the second field (the import name) of such a string import in the binary format is itself the constant, imported as an immutable global of type externref. These are UTF-8 strings.
    • This is very efficient, easy to generate, and easy to work with. Ticks all of our boxes.
  • What’s left?
    • Decide on UTF-8 support (or deferring it - TextEncoder and TextDecoder not yet used by anyone)
    • Bikeshed names and collections of builtins
    • Write the spec
  • Phase 3 Poll but pausing here for questions.
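
To make the shape concrete, a sketch of the opt-in described above. The wasmBytes name and the "constants" namespace are hypothetical, and this assumes the option names the import namespace for constants - the notes above describe it simply as a flag, and the exact shape was still being settled at this point.

```js
const module = await WebAssembly.compile(wasmBytes, {
  builtins: ["js-string"],              // enable the wasm:js-string builtins
  importedStringConstants: "constants", // hypothetical constants namespace
});
// Corresponding imports in the module's text format:
//   (import "wasm:js-string" "concat"
//     (func $concat (param externref externref) (result externref)))
//   (import "constants" "Hello, world!" (global $hello externref))
```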

CW: What do you mean by keeping or deferring UTF8 support?

RH: I would like feedback about the pressing need to use TextEncoder / TextDecoder. My understanding is that users mostly care about JS strings (UTF-16) and don't need to convert to/from UTF-8.

CW: When you went to the final solution for globals, you made a decision about handling UTF-8. Is that the same decision here?

RH: No, if you need a constant with unpaired surrogates, you can use the other mechanisms I mentioned like importing globals. The question here is about using imported TextEncoder and TextDecoder APIs to transcode GC arrays of i8 with UTF-8 contents to strings and back.

CW: So the current interpretation of the global approach is that it’s the UTF-8 string constant in wasm but turns into a UTF-16 string object in JS?

RH: Right, the UTF-8 is just piggy-backing on the core Wasm Spec for that.

CW: Makes sense.

Michael Ficarra: String constant thing - already did this myself in my own programs, but I had an imported proxy. It… works. Is this global approach necessary because it will be more performant?

RH: I think it will be a lot more performant for the proxy use-case. I expect this to have much better performance.

YI: At the beginning you mentioned that this is a mechanism for JS APIs in general in Wasm. Is there a future plan of more things beyond strings?

RH: I would say that not JS APIs in general. That is an expanded scope that I am not planning for. The main thing is like JS booleans where you can use imported JS function to manipulate them, but the performance would be really bad. In those cases you really don’t want to go through a virtual method call. We are not going to have a builtin explicitly for like the fetch API. That would be much more expanded scope.

CW: I remember during the first moments when we were discussing compile time imports one of the concerns was how to support operations that write strings to linear memory. Where did that settle?

RH: That is a good question. I believe we deferred it given that most users are GC users so they don’t have linear memory to contend with.

BT: The extern reference in the future could be sharpened to be a type import?

RH: Yes that is the plan. There is an issue talking about the possibilities of how we could do that.

SC: Same question about linear memory. Would be great to have as an extension, and would be great to have UTF-8 for C and C++.

AK: Thanks for the presentation. This is something that works really well with WasmGC in production, in particular for Google Sheets. Constants were the last piece. There is an origin trial happening now, with positive feedback. It's great to have real users.

Chris Woods: On that topic, how would we expose it to C/C++?

RH: The typing on the js strings is externref. I believe emscripten can do that. Basically from your C++ you can do that. If you’re trying to bring strings from JS into C++ memory, then you need an extension of this proposal for copying to memory.

TL: In principle, this should be possible today. I'm not sure anyone has tried it with clang, though. Clang supports externref, so you can write functions that take and return externref. These functions should be importable, so it should just work.

Oscar Spencer: One concern about this proposal. From an import standpoint, it doesn’t feel super semantic. When I think about combining this with the component model, any individual module is going to be using its own interface. Can’t analyze it statically. “If I see this special module with its special interface, I’m going to treat it differently.” I know this will work, but feels like a hack vs. going all the way to real component imports. What if I want it to be more portable, run in places that aren’t a JS engine?

RH: Are you thinking of string constants or the entire thing?

OS: Constants. Now I need to specify some kind of fancy component that can also satisfy those imports in the same way.

RH: We have done some thinking originally about how you could polyfill the functions in non-web embeddings. This is the one case where you have the actual string constant in the externref import specifier. You're right that in non-web embeddings it's a little tricky. I'm not sure you can do reflection of the imports in non-web embeddings, but I know that's how the proxy works.

OS: Certainly could make it work, but this feels like abuse of imports.

RH: I don’t deny that, but it’s the one thing that satisfies this narrow intersection of requirements.

AK: The module looks like a regular module, and for portability you’re not going to be able to depend on a module that wants those APIs running well in other non-browser environments. In the browser you’re inlining everything. I know it feels hackier, but pragmatically the whole thing doesn’t work well outside a JS context.

OS: What I’m thinking about today is that I can define an interface for a module. You write a module that conforms to this interface. With this proposal I now get into a situation where if you just happen to use a string that I didn’t say you could use, now it doesn’t work. It’s now my interface plus these strings that you’ve asked for.

LW: Pretty creative solution, can’t think of anything better myself. One thing that makes it less of a worry outside browsers is that this is a solution for Wasm that interoperates with a JS engine. Outside a browser you don’t make that assumption - strings would be in linear memory or something. Lots of other options for other contexts. Fundamentally this is about efficient interop with JS specifically, and is the best solution for that.

BT: In some sense beauty is in the eye of the beholder. I think this is beautiful. The Wasm module describes what it wants from the outside.

CW: Possibly naive, but I have to ask, what does it look like if instead of putting literal strings in import names, we use indexes into a passed array of strings?

RH: I believe that could work, but it would increase the binary size. In the measurements Google Sheets ran, they saw the size was already getting bigger.

CW: Wouldn't the global index space stay the same, while the import name space shrinks?

RH: Comparing this solution vs. a string solution with a different index space: every byte adds up when you have 100k strings. There might also be overhead for passing the imported strings as an array, since you need to do JS property gets.

CW: You would get back to a world where you can pass strings as regular imports.

RH: That could be fine. There was also some concern around parsing being slower for JS strings than for binary strings. My guess is that it would be a workable solution but it will probably bother some more people that aren’t completely satisfied with this solution’s performance.

TL: Yeah, we were experimenting with that at one point. A full solution had a custom string section, and we would read from the custom section on module instantiation. The imports then have indexes. The custom section comes from the requirement that string literals be embeddable in the wasm module, because producers know their string literals and want to produce one artifact that contains both them and the code. This proposal solves that fine because imports are in the module. Custom sections would too, and you could post-process them to extract the strings and save the cost of parsing the custom section, but we never got that far. In general this is strictly better on the metrics we care about.

CW: I won’t push on this too hard, I just want to make sure it has been considered.

TL: If you did do that, you’d have imports "0" through "100000". Is that a lot better? Even for the rest of the ecosystem outside JS I’m not sure another solution would be better.

HA: Why can’t we do an imported table?

RH: One issue with tables is that they are mutable, we want to access these string expressions in constant expressions. Tables would be nice but don’t satisfy that.

Zalim: Why is one of the goals not changing the core spec?

RH: Personal view - over the long term, to get tighter JS integration, doing it through builtins is much easier than trying to find the right type system that perfectly intersects between JS and everyone who uses Wasm. String constants could be different, but one issue is that if you had a string section, you need a string type to talk about them. Is that extern or something else? Tough questions. I wouldn’t rule out string type and string section, but this solves that without going down that road.

TL: I think even at this early stage, the proposal has already proven to be a well-lit path. We've had users following the same path with WebGPU, with web builtins that will go faster. Technically this proposal sets a path that scales a lot better than trying to add all of this to the core spec.

RH: Hopefully when we write the spec, the spec infrastructure will make it easy to extend.

SC: On the table mutability, immutable tables would be great.

MF: I'm sorry about this bikeshed topic. How tied are you to the cute name for imports [the cute "'" namespace]? I would prefer to set a precedent for using the "wasm:" prefix for built-in modules.

RH: I agree, that would be my preference, but in the binary format the import namespace is repeated every single time. What would make it feasible is if we had a way of deduplicating names in the binary level, e.g. through a new name section. If we had that, we could do something better. Otherwise, the single character is the best we can do.

MF: Do we need to do it in that order?

RH: Practically I don’t think anyone will use this if we do it in that order.

NF: Does that code size actually matter given that you are probably compressing over the wire?

RH: Question for Google Sheets more than me. I’d guess it does. I can open an issue to revisit that before phase 4.

TL: One idea we've experimented with is a configurable name for the string constant namespace. That would remove the cuteness as well. When we configured it to just be the quote character, it was 2% slower; with the "string-constants" name, it was 12% slower. It's significantly more complicated to make it configurable in the engine, so we're keeping it as proposed.

BT: Why not the empty string?

RH: Yeah I like that.
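
For reference, the JS-API shape this discussion is circling looks roughly like the sketch below. The option names (`builtins`, `importedStringConstants`) come from the proposal as it stood at the meeting and could change, so treat this as an assumption rather than a settled API.

```ts
// Hedged sketch: enable engine-provided string builtins and declare which
// import namespace carries string constants (the quote character as
// proposed, or "" per BT's suggestion above). Cast because TS's built-in
// typings do not know about these compile options.
const module = await (WebAssembly as any).compileStreaming(
  fetch("app.wasm"),
  {
    builtins: ["js-string"],      // engine supplies wasm:js-string imports
    importedStringConstants: "'", // imports from this namespace are strings
  },
);
```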

CW: Now I'm thinking about this new global trick… it's interesting because previously we could understand it as a special case of compile-time imports; now it's compile-time imports plus import name reflection. One concrete question: for normal wasm modules the import names will be quite small, so are there implementation-defined limits on import name length that will clash with these string constants?

RH: I’m sure we have one, feels familiar. But haven’t heard from anyone experimenting with it that it’s been an issue, and it’s something we could easily increase.

BT: Just to prompt your imagination a bit: instead of a quote, use a JSON literal. The engine has amazing optimizations for JSON and would create a literal expediently.

RH: I will file an issue, you’re all invited to comment. Phase 3 poll?

TL: Poll to advance to phase 3!

SF: 17, F: 14, N: 12, A: 0, SA: 0

TL: That is consensus.

Room: applause

Memory control (Deepti Gandluri and Ben Visness)

DG & BV presenting slides; notes not already in slides:

  • BV: proposed mapping instructions assume you have a region you can map; will cover the negative case later
  • DG: "pages" might end up as "bytes" in map JS API
  • DG: don't want to expose "protect" in the JS API

CW: Can you explain the use cases for anonymous mapping?

BV: We'll get into it more later, but once you have mappings they should maybe be anonymous.

CW: For Wasm specifically, you can think of the memory as being an anonymous mapping already.

BV: This will be addressed later.

BT: The map operation takes an arraybuffer as an input? What does it represent?

DG: If you had an arraybuffer that was already mapped, you could imagine remapping it into memory.

BT: does it create an alias or detach?

DG: Whether you map or provide protection, you have to detach existing references. We have spec restrictions like never having multiple ArrayBuffers with the same backing store, at least in the current spec.

BT: The intent is that there aren't aliases where two things refer to the same physical memory?

DG: Yes, as the spec stands.
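
To make the shape under discussion concrete, here is a purely hypothetical sketch of the JS-side mapping API; nothing in it is settled (per the slides, "pages" might become "bytes", and "protect" would not be exposed to JS at all).

```ts
// Entirely hypothetical API shape, for illustration only.
declare const memory: WebAssembly.Memory & {
  map(source: ArrayBuffer, offsetPages: number, lengthPages: number): void;
};

const file = await (await fetch("/data.bin")).arrayBuffer();
// Map the buffer's contents into linear memory. Per DG above, existing
// ArrayBuffer references over the affected region would be detached, since
// the spec never allows two ArrayBuffers over one backing store.
memory.map(file, /* offsetPages */ 16, /* lengthPages */ 4);
```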

DG: [presenting "Reserved low region as mappable" slide, which includes different options as to how to expose ArrayBuffers]

CW: would it be simpler to have the mapping memory totally separate from linear memory?

DG: We have several options for what the memory layout could look like. We'll present a few, then discuss.

BV: We have to lay out the problem and describe some possible solutions first.

CW: Why not a separate memory?

DG: We are trying to scope this to a single linear memory because applications and toolchains do not want to deal with the extra annotations you would need with multiple memories.

CW: Makes sense, especially for a C-like language that just has pointers into a single memory.

BV: The toolchain support for multiple memories is not clear.

CW: But then, is it too restrictive to only allow mapping a single low region?

DG: We're only proposing the map length approach because it’s conservative. We could make it backwards compatible. You can also still perform all classic memory ops. There will be new APIs, but it's not an entirely new paradigm. Memory.discard would still work. Also potentially compatible with BYOB APIs. Downsides are non-intuitive fragmentation of memory and limited regions of mappable memory.

[slide presentation continues, Benefits/Downsides slide for "low region as mappable"]

BV: [presenting slides on "whole memory as mappable"]

CW: Question for JS interop for whole memory mappable; why is this harder? Seems like you could run into the same worries with the low region approach?

BV: With first approach the buffer starts at normal linear memory and the buffer wouldn’t be directly exposed.

CW: Isn't that gross if you want to pass an index out to JS?

BV: You would have to translate the indices.

DG: There’s no way to isolate the behavior between wasm and js

BT: For the whole region mappable, are there types of JS buffers that would work with it?

DG: Not right now. There are some proposals in progress e.g. read-only collections. If we can decide what we want in Wasm, we can influence those proposals.

NF: you mentioned defining in terms of bytes to fit with custom page sizes, but that shouldn’t be needed since everything should just work once you’ve defined the page sizes. Might want to make the two proposals mutually exclusive.

NF: At Fastly we found that these virtual memory APIs are the horizontal scaling bottleneck for multi-tenancy. Would it be possible to provide declarative mappings? We could use wizer to collect dynamic mappings, then capture them in our snapshot. Hopefully the APIs can return errors to disallow them at runtime

BV: We've been thinking about extending active data segments to work with mappable memories. If data segments aren't exactly the right tool, then there could be a set of mappings you provide up front.

JS: You mentioned memory.discard is like MADV_DONTNEED and it zeros, but linux also supports restoring a mapped file.

BV: Will address later in the presentation.

CW: Do I understand that the Chrome prototype discards at the end of memory?

DG: We're going to adopt the Firefox approach; we want some data showing that it does what we want it to do. This will be based on implementation feedback, but for now we’re trying to get the broad ideas.

CW: It seems harmless for it to be page ranges within the memory.

CW: Is the reason we want to expose mapping APIs because web APIs don't support passing in a buffer?

DG: We can’t actually do that today. You can’t have just some array buffer that you want to import as a memory.

DG: You can construct a Wasm memory, but it’s still controlled by the engine (not some external array).

NF: I recall benchmarks showing performance portability concerns around memory.discard?

BV: The performance was generally fine for most cases of memory.discard, except possibly for shared memory on Windows because it requires multiple API calls. This could be relaxed if we allowed traps on racy virtual memory operations. To avoid traps we had to memset zero first and then unlock the virtual memory. Most of the time it's equivalent to a memset to zero call.
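
As a concrete reading of the semantics BV describes, a hypothetical sketch follows; memory.discard is the instruction from the proposal, but this JS-side shape and its names are assumptions.

```ts
// Hypothetical JS-side counterpart to memory.discard. Semantically it acts
// like zero-filling the page range, while letting the OS release the
// physical pages (in the spirit of MADV_DONTNEED).
declare const memory: WebAssembly.Memory & {
  discard(offsetPages: number, lengthPages: number): void;
};

memory.discard(64, 16);
// Afterwards, reads from the discarded range return 0, as if memset to zero.
```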

YI: About the low reserved region option, I'm concerned from a toolchain point of view. Say I compile a C++ program; now I need to know how much space I need, but also how much mappable space I need. Seems like a lot of numbers to keep track of.

DG: Yes and no. It lets you do more powerful things. It's extra bookkeeping, but maybe it's worth it.

YI: The other option would allow all the memory to be used that way.

LW: It seems useful to separate the use cases: one is how to efficiently get bytes in and out of memory; separately, how do I deal with reserved/committed memory when dealing with a large amount of memory. I expect these can be handled separately. Even when you have memory-mapping APIs, you might not want to use them: Wasm is very embeddable, and for many use cases mapping will require falling back to non-determinism in the spec or expensive emulation. If we subdivide these, making a view available BYOB-style might be better for I/O. For managing large amounts of memory, there are other approaches that don't necessarily need to trap, but just hint at optimizations.

DG: Concern for something like a hints approach, if you have an interaction between Wasm and some set of APIs, that could be a source of non-determinism, which is something we'd like to avoid.

LM: Given the ongoing lack of support for multi-memory operations, should we see this as a replacement?

DG: This is orthogonal to direct ops on multi-memories

TL: Multi-memory is really most useful for composing modules. We haven't seen source language toolchains that would make use of multiple memories.

KW (chat): We use multi-memory extensively in our (research) toolchain, because we have a nonstandard "map this abstract blob into this read-only memory" imported function. But this imported function requires the ability to modify the length of the referenced memory (including shortening it) which requires pretty nonstandard behavior.

DG: There's an elegance to multi-memory, but with actual apps there are significant limitations.

SC: I think BYOB can be the answer for a lot of cases, but it doesn't work for WebGPU. In that case you want to map the GPU buffer into the main memory.

LW: I hear at some low level it's streaming to and from the GPU, so maybe that would make BYOB work.

SC: Chrome wants to map to the GPU memory from JS. It's fundamentally worse in Wasm right now.

SC: Makes me very sad that I won't be able to access that prefix from JavaScript.

DG: It could be possible, but not a regular array buffer.

RH: Could you expose things to JS by exporting functions that you call from JS (analogous to what we've done with JS interop for WasmGC objects)?

BV: I'd like to explore that; we'd have to have builtins that are general Wasm memory ops.

RH: Basically export load/store instructions.

DG: In the short term you could, but in the long term I’d like it to be more cohesive. We don’t want to fall into local solutions for Wasm. It's more intuitive for developers if there are JS APIs as well.

CW: If we go far down the route of making the mappable memory available in JS, however we do it, then it seems like we should make the whole memory mappable. Unrelated point: I'm glad you mentioned access to shared memory for memory.discard. Have you looked at how mapping would work for shared memory?

BV: We should go through more slides

DG: What's the opinion in the room? Picking between the two options.

Whole memory mappable: 10; Low reserved: 0

DG: Anything we’d expose would need to be through imported or exported functions.

BV: Many questions apply equally to both.

BT: I know from looking at this before, the choice we make here means some amount of auditing of browser code bases. Any idea how much work that is?

DG: you’re not constraining anything if you go with the whole memory approach.

BT: There needs to be some auditing in code bases, maybe some automated way. Is there a way to get a handle on how much engineering work it would be to update the browsers to handle mapped memories everywhere?

BV: For Firefox, I'd turn off the buffer property and try it; the internal representation wouldn't have any problems.

BT: I’m more thinking of random code handling the buffers, not the official APIs

DG: For the C++ API, do you mean the JS array buffer or the backing store?

BT: Like if the C++ punches a hole in the ArrayBuffer and passes the backing store around.

DG: Our security team wouldn't let us do that. It's a non-starter. We’ll still need to support Memory.buffer even if we prototype this

BT: It would be good to get some of those security concerns documented as part of the design.

YI: As we add new types of buffers, could you say the low region is the whole region, and in practice end up with the "whole region" solution?

DG: The reason we're presenting the conservative approach is because it's possible we could learn things from investigating that path in the short term.

TL: Time check: 15 min

BV: Let’s mention other design questions

[Presenting “should memory.map take an address?”]

CW: I have a strong opinion that the user should choose the address for determinism.

BV: Could still have determinism if we spec the allocation algorithm.

CW: I don't think that's something we could spec.

DG: There's also a bit of this where if your whole memory is mappable, then it might make more sense for the user to supply the address. Whereas for the low region approach, the engine may have to do some tricks anyway. It’s just a question of how much control we want the application to have.

CW: We expect it to be user level code deciding where the mapping gets placed.

BV: For reasons like that, I would always want to expose the ability to choose mappings to the users, even if we also allow engines to choose addresses.

Dan Gohman: There’s always a contiguous region of virtual address space in the host?

BV: You’d do an initial reserve of the whole region you need and do mappings within it.

BV: [presenting slide "trapping vs zero filling behavior"]

Mattias Blume: Don't you always have to deal with traps if you have protection?

BV: Memory protection boils down to that issue. You can separate the other aspects from memory protection itself, which needs traps.

CW: Can you go into more detail about the scenarios where you're trying to avoid traps?

BV: For a memory.discard prototype, we wanted to be the case that two racing threads wouldn't trap. On Windows that requires de-commit then re-commit, and another thread could observe de-committed memory. It requires shenanigans to avoid traps in that case.

CW: How does that extend to more advanced features with mapping?

BV: If you're changing mappings, then the other thread could observe the memory in an inconsistent state.

CW: I wouldn’t be against specing non-deterministic traps.

BV: [continue presenting “Trapping vs Zero filling behavior”]

CW: I'm on board with traps.

DG: 4 minutes left and we have more open questions. We can preview them and open issues on github later. [continue skimming remaining slides]

CW: I think if you do have racing calls, it's fine for it to become some mush of all the things that happened.

DG: We want a longer conversation about what we want in the core spec or in imported APIs. We're going to open issues based on the feedback. Let's do another 5 minutes of Q&A.

Dan Gohman: You mentioned reading a file. Reading a file via mmap is effectively sync, which wouldn’t be appropriate in async environments such as JS hosts.

DG: We're looking at file mapping as a second use case. Yes, we can’t provide synchronous readbacks. We would want to make the optional file descriptor available for storage APIs, but not give any guarantees yet.

BT: Regarding memory protection, there’s a gap with native. Some languages don't have support for checking null pointers without help from the hardware.

DG: We agree that we want some form of memory protection.

CW: My concern with the racing concurrent memory primitives, I'm OK with traps, or some mish-mash of before/after behavior. But if you look at the POSIX level, they don't even say that; rather it's undefined. That's scary. I'd want us to assure ourselves that we're not falling into some insecure behavior.

TL: from zoom: have you thought about interaction with wasm GC?

DG: No. For this proposal we've wanted to constrain the scope to what we presented today. Thanks everyone!

Exception handling (Ben Titzer and Heejin Ahn)

BT presenting slides

BT: LLVM doesn’t support it, but a post process with Binaryen does. Does that count as a toolchain? Kotlin also supports it, are they important enough? Zalim is on the call; he may have opinions!

TL: CheerpX is the only thing that supports branch hinting, and that just went to phase 4.

CW: Agree, you're over-thinking it. We definitely have at least one toolchain.

MF: I’m already throwing and catching, but getting externrefs. How will it be different for me?

BT: exnref is not in the same type hierarchy as externref…maybe Derek knows better

DS: Maybe you're thinking of using JSTag?

MF: I'm not using the tags, just try-catching with setting and checking a global. I want to be able to introspect from JS; can I do that with exnref?

DS: The way you would do this is use a tag whose payload is an externref. You can either create one yourself, or import JSTag. Then when you get an exception, you'll get two things: a payload, and an externref that is the thing that was thrown.

MF: And that JS tag is in this proposal?

DS: Yes.
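
A small sketch of the pattern DS describes, assuming a hypothetical `instance` whose export may throw; `WebAssembly.Tag`, `WebAssembly.JSTag`, and the `Exception` reflection methods are part of the EH JS API.

```ts
declare const instance: WebAssembly.Instance; // hypothetical

// Option 1: a tag with an externref payload that the module throws with;
// JS can then unpack the payload via getArg.
const myTag = new WebAssembly.Tag({ parameters: ["externref"] });

// Option 2: import JSTag into the module, so JS values thrown through Wasm
// round-trip and can be inspected from JS. (Cast because TS's built-in
// typings may not know about JSTag yet.)
const JSTag = (WebAssembly as any).JSTag as WebAssembly.Tag;

try {
  (instance.exports.mightThrow as () => void)();
} catch (e) {
  if (e instanceof WebAssembly.Exception && e.is(JSTag)) {
    console.log("thrown JS value:", e.getArg(JSTag, 0));
  }
}
```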

AR: The exnref is just a more flexible way to rethrow, essentially.

DG: Regarding the question of the spec and toolchain completeness, how close are we? Is the spec completely updated?

BT: Yes.

HA: It's hard to define the finish line. Finishing Binaryen implementation is going to be a long tail of several months to achieve parity with the current Phase 3 proposal, but with the Binaryen translator you can use it, with all the optimizations from the phase 3 proposal. For people that cannot (or will not) use Binaryen, we are a couple of months away from that for LLVM.

DG: An application can have parity through the Binaryen translator?

AR: Scala is also targeting it, so that’s another one we can count.

DS: JS spec is done, but I just want to add some WPT tests.

AK: Don’t those tests count?

HA: I don’t expect more than a week for that.

AK: My worry would be that the new thing doesn’t get implemented in LLVM, or is hard.

TL: From my experience in Binaryen, the new stuff is much better and more consistent with how Wasm otherwise works. So since LLVM handles loops and blocks then it will handle the new thing fine.

HA: I don’t expect the new one to be hackier than the old one. Wasm doesn’t perfectly fit LLVM IR anyway, so we had to make adjustments. Even though we don’t support value returning blocks in LLVM, we should be able to get around it. I’m not worried about feasibility.

AK: Thanks, I don't have any more concerns. It seems like the proposal clearly meets the toolchain requirement based on non-LLVM toolchains.

HA: I don’t think it’s hackier than the existing code.

AK: That’s why it’s good to get the test suite in. If we can have it in a week, let’s vote in two weeks.

TL: The process doesn’t say anything about JS tests.

DG: Since here it’s not a web-only feature I’m not sure we should gate on web tests.

BT: We could add JS API tests in the spec repo.

DS: I had trouble running the JS API tests.

BT: Does it even say the tests must pass?

(It does)

BT: Let’s defer the vote until we have the tests.

HA: I really think we can have them done in a week, but might not be merged to WPT repo. I’m happy to come back in a couple of weeks.

AK: We’ve done votes assuming something will happen…

BT: Let's not do that again.

TL: Let’s clarify the JS tests.

AR: It matters where it is and it should be in the spec repo.

HA: Is there a way to run them without WPT?

AR: You can set a path to node and it will run them.

DG: You can import spec tests to V8 to run them. This is a bit unique because we changed the spec at phase 3.

TS: Node and Deno run the relevant WPT tests. I think they should be in the spec repo.

CW: What’s the fastest place to put tests? The spec repo.

RH: I'm pretty sure we can run tests from the spec repo.

AK: I’m happy to help.

BT: All agree.

HA: What's our conclusion?

BT: Put tests in the spec repo and then vote on phase 4.

Thursday, June 6

SpecTec DSL (Andreas Rossberg)

AR Presenting slides

BT: This is beautiful. Funding can go both ways; the funding agency might say, “why doesn’t industry fund this?” Building a DSL can be a way to add new forms of factoring. Is there some limit where we stop; are we going to have a set of abstraction capabilities?

AR: The design of the DSL was entirely driven by the meta notation in the existing spec; it basically uses the meta notation the spec is already using, so there aren't more abstractions over that. There's maybe one thing: the syntax definitions are more precise, allowing them to be parameterized, whereas before it was handwavy.

BT: Experience has shown that people tend to want to make languages more complicated to add increasing abstraction. If it’s too abstract, it may become too esoteric to learn. I don’t know exactly where the line is. I’m curious where your thoughts are.

AR: I think this is shielded against by the way that the DSL is supposed to be WYSIWYG. You can only write what shows up in the rendered document; you won't have facilities beyond what the spec is using.

BT: Yes, that makes sense.

Mattias Blume: Very enthusiastic about this. For long term goals, would it make sense to get rid of the reference interpreter and generate it instead?

AR: It's a question of priorities & resources; those could be great student projects. Happy to talk to people if they want to contribute. Right now the main priority is getting it ready to adopt. With the meta interpreter in line there's a hump to get over, but it will pay off in the long term.

FM: I plan to vote for adoption, but to me there are some risks that you haven't articulated. The one that strikes me is that it increases the group's dependency on the knowledge of a small internal group of people, and that knowledge is not very widely shared.

AR: It’s not just me; many people have worked on this. This is somewhat covered in the slides; but yes, this is one of the biggest risks.

FM: One way of reducing the apparent risks is if SpecTec can be used by other specification efforts. That way there could be more people invested in keeping it going.

AR: There are other people who are interested and looking into it; Nate Foster is looking into using something like this for their tools. We think the general approach will be scalable. The component model is also another option; LW is interested in experimenting with it. Some of the stuff is generic, and some is specific. For right now we are trying not to make it overly general, to avoid solving too big a problem up front. But I hope it will broaden eventually.

SR (chat): Nate and one of his students are using SpecTec for the P4 programming language.

Deepti: A lot of other language standards do this: extending the spec editor role to a group as a maintenance model. I'm excited to see the tutorial, and am interested in ways to get other groups involved as well.

AR: Extending the group of editors is always a good idea, but might be orthogonal to the question of tool maintenance; potentially different group of people.

David S: I have some concerns. I don’t think this should be adopted. First, there is inherent complexity. You’re taking a lot of inherently complicated things, like LaTeX, Coq, the Wasm spec itself, and putting them all together. When you take complicated systems and put them together, you get something which is fragile, specialized, and high-maintenance. Second, incentives. When someone wants to add a feature to the wasm spec, they’re incentivized to do the work to add the prose and reference impl and so on. If we make the process easier, that means less work overall, but the incentives are aligned with the work that needs to be done.

The big concern with this is that the work is transferred from the people who are incentivized to push the proposal forward to a group that isn't incentivized to do it. And the expertise also isn't centralized.

AR: I agree, that’s a risk. The way proposals work, actual spec authoring does not happen until after proposals reach phase 3, when a lot of the work of the design and prototyping has already been done. I have already often been involved in helping craft the actual spec.

CW: I would add, in particular, that we expect the kind of expertise needed to extend the tool to be the same as the expertise needed to extend the spec, so an expert would have been brought in anyway.

AR: For example, I'm not sure many other people would have had the expertise to extend the spec for threads; it was myself and CW.

TL: All that complexity was there before, explicitly in the LaTeX and so on, and it wasn’t separately addressable. I’ve had a chance to play with SpecTec, and the experience is night and day, so much easier. I did run into some backend robustness problems, but still, it was so much better than what I was already doing. So I’m very much in favor of adopting this.

TS: It's also useful to look at other spec efforts, including the JS API for Wasm, which is written in a DSL that is used for a lot of other specs. JS/ECMA also has a DSL. With tools that abstract this manual effort away, we see more people actually contributing to the spec. IMO it's the right tradeoff.

Dan Gohman: How do we expect a cross-cutting concern like profiles to work in SpecTec?

AR: The way profiles are currently specified is by marking rules; they are toggles on individual rules, which act like a filter on the rules. I don't think it's a fundamental problem to solve, but we would have to look into it.

HA: I haven't had experience writing the spec, but I have experience reviewing some of it, and I agree with the need for this kind of tool. But the SpecTec source is supposed to be the single source of truth, not the generated spec, and we're not expected to tweak the result? Do we file a bug report on the tool repo and wait for it to be fixed?

AR: The tool will be in the spec repo. We’re currently working on a fork, with the SpecTec tooling in a directory. The plan is that everything will be in the same place.

HA: If someone is going to write the spec for a new proposal, in case something has to be fixed in the generated spec, they can’t touch the output but have to make sure the tool be fixed so that the intended output is generated, right?

AR: Right, one of the questions in the questions slide is that we need process and infrastructure that allow temporary fallback to not block proposals. We need to be able to deal with limitations, as they will surely occur, so we need some commitment from the authors of SpecTec to fix limitations, and a process for working around them when needed.

HA: I'm not necessarily advocating for this, but wondering if there's another possibility. The idea of a single source of truth is attractive, but have you considered using SpecTec as a helper tool, like when we use an LLM: you can tweak the generated output if needed, and in case someone wants to write the spec manually from scratch, we allow them to do that?

ConradW: Andreas was describing coarse-grained fallbacks like disabling entire backends, but we should also discuss fine-grained fallbacks for individual rules in the spec.

AR: Ultimately this should only be a temporary thing; you want to be able to regenerate the spec and interpret it. But to unblock something else, you can still do it. You can even write parts of the spec manually and not auto-generate them. But generally, writing manually is a fallback.

Chris Woods: This seems like a long journey, with a lot of work, and thank you very much for your work so far and for proposing this. There are huge benefits to verification! Especially since we have use cases for putting this into critical infrastructure.

BT: To address what David mentioned: it's not really the case that if you know LaTeX, you can just jump in and write the spec. You already need a set of people with expertise for specialized cases, but this lowers the bar of entry for people to write the spec without modifying the tool.

HA: But would it be possible to just use the tool as a helper tool?

ConradW: I wouldn't hold the single source of truth very strongly; you can always write manually if needed. I can imagine that even at the switch-over point, there may be sections of the spec written manually. There are parts of the language where the ROI on extending the tool isn't worth it. There have to be graceful fallbacks for the different sections of the spec. It can be generally true without sticking to it strictly.

AR: That's the goal in the ideal future. You don't want to merge proposals into the spec until you've extended the tool, at least for execution, i.e., to keep the interpreter running. But as intermediate steps, it makes sense to express things outside SpecTec.

CW: You could imagine a hypothetical memory mapping proposal perhaps has some primitive uninterpreted function.

AR: SpecTec has a feature for auxiliary functions, which don't need to have a definition. The interpreter knows there are some functions it has to implement on the outside; we could make the scheme more systematic.

BT: One additional risk is that SpecTec may succeed too well, and WebAssembly gets too big.

Deepti: You’ve talked about what it would take to get SpecTec to a point where we could adopt it. I was wondering if you had any timelines in mind.

AR: The idea is that for the partially green part of the document matrix, we would like to complete that within 6 months, with some caveats about the threads proposal. All the other proposals we should be able to make work in time. That would include phase 4 proposals, basically completing the rectangle of green by the end of the year.

Deepti: You had talked about a process for fallbacks. Should extending the tool be a phase 4 requirement for proposals? Should we consider the process on a proposal-by-proposal basis? Overall, I think there are a lot of process details to figure out.

AR: In terms of infrastructure there isn’t so much needed. In our fork, CI runs the tool and generates the document. We’d need to add some config to disable individual backends. I’m not too worried about that part. For the process part, the question is, who is responsible for updating SpecTec to meet the needs of prospective phase 4 proposals, and on what timeline? These are questions the CG needs to consider.

DS: Let’s go ahead and do the vote.

Poll: Adopt SpecTec (once it’s ready) as the toolchain for authoring the spec.

SF: 19 in room, 5 on chat; F: 13 in room, 4 online; N: 4 in room, 4 online; A: 1 in room; SA: 0

DS: Consensus.

Typed Continuations (Frank Emrich)

FE presenting slides

CW: Does the process for running the Asyncify benchmarks go through Wasm all the way?

FE: It would be unfair to run Asyncify without O2; we don't run optimizations with wasm-opt just yet. Even with wasm-opt we don't expect orders of magnitude of difference.

CW: It’s cool that you’re competitive even without that.

FE: Well, not in the microbenchmarks, but they inherently tend to favor asyncify. But in any case, we’re not declaring victory yet, and we’d be really happy to have more macro benchmarks.

Ask to the CG: If you have macro benchmarks, or a suite that runs in C, for stack switching benchmarking, send them to FE & the other WasmFX folks.

FE: Cancellation is waiting for exception handling.

Mattias Blume: Have you looked at the history of benchmarks in the Scheme world?

FE: Ideally we'd like to have benchmarks written in a source language, where we can then replace whatever stack switching primitives they have with our fiber library instead of rewriting the whole pipeline. We looked into TinyGo for a bit, to get a huge number of programs that manipulate stacks and just run them. It would be great to target this from somewhere other than C.

CW: The problem is, we don’t have toolchains from Scheme to Wasm?

BT: Comment in chat: TODO

FE: I know the OCaml folks are interested in using WasmFX. We ourselves looked into TinyGo. But we're interested in suggestions. Another issue is that Wasmtime hasn't implemented GC yet; implementing it will unblock some toolchains that want to use WasmFX.

ChrisW: On the embedded side, there are a whole bunch of use cases which are C based; we'd be happy to play with the fiber library.

Brendan Dahl: Do you have any intuition on why performance is not on par on the microbenchmarks?

FE: These small microbenchmarks are too simple; you can play the game of counting instructions from a resume to the first instructions of the continuation, but those all go through runtime calls, and counting instructions that way, an asyncify pipeline looks better. If you run the smallest microbenchmark directly in Cranelift, it shaves away 75% of the runtime overhead. We will be looking at Asyncify to understand the generated code better.

SC: Emscripten has a Fiber API. Are there any limitations in WasmFX compared to that?

FE: We are using it with arbitrary nesting. WasmFX supports it, and the fibers API supports it; Asyncify supports it too, though I'm not sure whether that's accidental.

SC: The Fiber APIs don’t require the nesting.

FE: You can resume another fiber while running inside a fiber.

SC: It looks similar to the Emscripten library; maybe we can use the sample programs.

BT: Do you ever need to go from a stack pointer to the fiber that contains that stack?

FE: Just the stack pointer? No.

BT: When exception handling is implemented in Wasmtime, you’re just going to walk down the stack; then to invalidate the stack you need to map from stack pointer to the fiber.

FE: In our design, exceptions propagate from child to parent: you propagate through child frames until you end up at the parent, and at each fiber boundary you have to trampoline. If the current frame I'm seeing is a trampoline, then I must be at a parent, i.e., the top of the fiber's stack. You could then calculate the containing data structure from that. I don't have a good idea of the optimal way to implement EH, though.

BT: I’m probably going to do a similar thing; when you detect the boundary frame, then you know which fiber you’re on.

Adam: What do you do about the overflow of shadow stacks in LLVM?

FE: I don't know. In the long term we want to implement stack awareness in clang/LLVM. If you're in a leaf function, you need to know the size of the stack. Right now, during normal execution, you can't even detect running out of shadow stack space.

SC: All of the stack switching approaches have this; Asyncify has similar problems.

FE: I’m not sure what asyncify does for the shadow stack.

SC: In an emscripten fibers, you explicitly tell it how much shadow stack space you need when you allocate the fiber.

Bag-o-stacks (Thomas Lively)

TL presenting slides

AR: I remember you mentioned that you want to use fat pointers to track linearity. If you deallocate before you retire, how do you check if it's still valid?

TL: Yes, if you do a switch_return and you’re using fat pointers, you can’t eagerly deallocate the stack. But perhaps the GC could use this information as a hint, or it could deallocate other associated things. Or maybe switch_retire is not that beneficial in those circumstances.

BT: You keep the counter going, and you can reclaim it so it can go back in the pool, rather than becoming something other than a stack.

TL: Yes, that’s true.

RH: Is the typing structural, or do we have a recursion group?

TL: We do have a recursion group; this fits right in with the type system of Wasm GC, with iso-recursive types just like in Wasm GC. There is mutual recursion here, so they have to be in the same rec group, but we have all the infrastructure for it now.

AR [in chat]: if we’re adding a third form of return, how does it interact with finalisation code on the retired stack, e.g., in the form of (encoded) finally clauses? Will they be executed? If yes, you need this to be an exception; if no, isn’t the feature incompatible with such finalisation patterns?

AR: When you compile finally, switch_retire bypasses it, and any finalization code would be skipped.

TL: Right, you would not be able to use switch_retire in any situation where your source language requires finalizers. You could either not use switch_retire at all, or just use it after you’ve run any finalizers.

TS: A JS call interleaving question: why is it not possible to check lazily, when actually returning, to ensure JS calls keep stack discipline and don't catch the wrong traps?

TL: The scheme I described by construction proves that you can’t violate anything. You could use runtime checks instead, but that would be expensive. The whole motivation for this proposal is to avoid the linear searching.

TS: You could have a stack pointer for the JS stack where you ensure that has been handled.

TL: I think it would involve some linear amount of work because there could be an arbitrary number of JS stacks that would need to be skipped.

HA: Is it possible to switch to another stack, not back to the main stack, and then to another stack again? As long as we have a reference to some stack we can switch to it, so how can we specify the type signature for that?

TL: Yes, so in this example with lots of stacks, it’s typeable, you just get a really big mutually recursive group, with as many types as you need.

NF: For systems that aren't using JS, could there be a nondeterminism thing that allows JS hosts to tighten the behavior to trapping, and other hosts to be more permissive?

TL: It's an open question; I don't have a concrete answer. You could imagine a host adding suspendable frames. A host like Go could maybe allow you to suspend past those frames, and Wasmtime with fibers would be totally fine suspending past frames. We could make it conservative for specific frames.

YI: If you’re lazily checking when JS frames are popping in, is that similar to checking the top of a frame, similar to returning from wasm to JS for example?

TL: That’s true. At the top of a new stack there’d be a trampoline that would do extra checks. We just have to be mindful of the costs of the various checks.

HA: How severe is the risk that we can't throw an exception from the secondary or tertiary stack? Do languages allow exceptions to be thrown from generator frames?

TL: Yes. The toolchain producer would have to arrange that the first stack would have its own trampoline that would have to catch the exception and manage that itself. The language runtime (producer) would have to have its own function with a catch-all.

HA: Can you elaborate on what the trampoline specifically does? Does that catch-all live on the main stack?

TL: That would be on the generator stack.

HA: How are you signaling back to the main stack that something unexpected happened without throwing an exception then?

TL: Yes. In the type signature, you would have to be able to signal that kind of information. You might have an integer indicating whether there was an exception, or use a nullable type; the producer would have to have some protocol.

RH: If an uncaught exception makes it all the way to the top of the stack, what happens?

TL: Because there are no unsoundness issues, the trap can just go to the main stack.

LW: Would you say then, looking at that diagram, that going from the free-floating stacks back to the main stack there is a (secret/internal) parent edge? In WasmFX it's more explicitly set; the depth is defined.

TL: Yes. There is no way to do anything with that edge except for this trapping mechanism.

LW: If there’s a free floating stack, and there’s a trap, can we go back to the main stack?

TL: Maybe. I think that might be feasible. Would that be useful to producers?

LW: Use case: I've taken some wasm and wrapped it in a JS export; from the outside it looks like JS or an ESM, but inside it's Go or whatever, so I'm stack switching everywhere. At some point I might call out to JS, and the glue code calls out to arbitrary JS code. Maybe someone implemented that transitively in terms of wasm nested down the stack. When I leave the Wasm world, I just want to be in the regular JS world. But I might want to call back into wasm, and the JS entry point was expecting a return value.

TL: That sounds like a toolchain bug, in the same sense that a trap happening would be a toolchain bug; here the toolchain would have to manage the return value.

CW: That encapsulation example is the one that I see as an expressivity limitation of this proposal, because if that arbitrary wasm wants to do its own control flow with BoS, it will start trapping unexpectedly if there's a transitive JS frame.

LW: I think we can loosen the rules without giving up overall BoS.

CSW: Don’t agree.

FM: The issue coming from TC39 is that there should be no suspended JS frames; we're not allowed to go back into suspended JS frames. With suspended frames, if you call another function, the world can change underneath you. If you're using the async notation, you should expect the world to change, but if you don't have that notation and the world changes, that's unexpected. There's not that much difference between the proposals in this regard; the real issue is having suspended JS. There's not a difference between WasmFX and BoS here; it's a limitation of the relationship between Wasm and JS.

CW: The difference in expressivity is that you can suspend a suffix of the stack that's just wasm. Conceptually it doesn't matter whether the suspended stack is on the parent or the child stack; it still violates the TC39 guarantee.

TL: If you try to use BoS with these rules to emulate WasmFX, it works, but there's a possible interleaving of JS and Wasm frames that WasmFX would allow and BoS would not.

MB: I have a meta question that applies to this talk and the previous one: for all this linearity, is there a plan to check it statically rather than with a runtime check?

TL: We haven’t discussed that yet. We could, but we haven’t yet.

FM: The pushback we've gotten in V8 from the combined JS & GC folks is that if you enable restartable or multishot continuations, you're also allowing multishot JS.

TL: Yes. We have not yet considered multi-shot or static checking for linearity.

MB: About JS stack discipline: you could implement this using a JS stack completely on the side; you don't actually need the frames. When JS calls Wasm, it switches to this Wasm world. As long as you maintain the discipline on the JS side, you should be OK.

TL: That would also mean disallowing the pattern where you call into JS, since the call would need to happen on a separate JS stack; and then when you do a suspension, it wouldn't be able to capture the JS frame, because it would be on some other stack that you have to maintain.

MB: You would switch back to the JS stack to do that call; there would have to be little trampolines to switch back and forth. As long as you switch back first and return in the right order, it would be OK.

TL: This would be a pretty big change in the semantics of JS/Wasm interop; a key feature today is that JS and Wasm run on the same stack. That was a major learning in the NaCl days.

BT: Point out to TC39 that you can't impose that requirement: you can always interpret your way around it, so you can't forbid it. The web platform can prevent it, but there's no way that a programming language itself can disallow it.

TL: Yes.

BT: The next language which is implemented on top of Wasm is going to be in the same situation.

TL: FE mentioned it: the WasmFX proposal has a barrier instruction that a language can use to say "I can't be suspended."

SL: Related, but not to do with the JS interaction: what happens in the bag-of-stacks setup if there's a trap in one of the coroutines?

TL: Yes. A trap propagates up the stack, and back to the main stack, without giving a reference to the stack that just trapped, so it can’t be switched back to. So it traps back to the point where Wasm was entered.

SL: Essentially everything is killed?

TL: If you re-enter the Wasm after a trap you can still switch to the trapped stack.

SL: Is this essentially the same behavior as WasmFX?

TL: It's very similar; it will trap all the way to the main stack. If you have multiple stacks, the top ones and bottom ones go away, but the middle ones will stay; in WasmFX, all of them go away.

AR: Interaction with exceptions: If you wanted to emulate WasmFX using this proposal, you would need the ability to somehow rethrow on a different stack.

TL: Yes, or send it explicitly. My last slide has additional potential instructions which are relevant here. switch_throw does exactly what you said: it switches to another stack and throws on it.

LW: Even though there's a symmetry in the switch, because of the parent rule (where traps propagate to) there is an asymmetry: when I'm on the main stack and I switch to a side stack, I set the parent to me; and when I'm on a side stack and I switch to another stack, I give it my parent. So what does it mean, when I'm off the main stack, to call into JS and have that take me back onto the main stack? Does it matter what physical stack I'm on? The meaningful thing is that I'm semantically back on the main stack, which means if JS calls back into wasm, it's going to set the parent instead of inheriting it. So that seems like the fundamental difference.

TL: In the past we’ve discussed a stack.link instruction which would set up the parent-child relationships.

LW: If, instead of trapping, you made it not trap, it would still work that way.

CW: The thing you're identifying is how you've been talking about how BoS would emulate typed continuations. By the way, TC would emulate BoS exactly as you're describing: you have an explicit runtime top level, and all of the stacks that appear unconnected in BoS are connected directly to that parent, and you go between them.

LW: You can think of restricted use of WasmFX, and thereby avoids various edge cases.

TS: I don’t think we should treat TC39 input as limiting how we implement the semantics they want to see: the stack discipline of JS stacks. As long as we keep that linear, whatever we do in between is unobservable. A JS/Wasm engine might implement this using stack switching under the hood, so all we have to do is preserve the discipline for JS.

On the growing complexity of WebAssembly (Andreas Rossberg)

AR presenting. slides: https://github.com/WebAssembly/meetings/blob/main/main/2024/presentations/2024-06-06-rossberg-complexity.pdf

  • Wasm 2.0 roughly tripled the number of “types”
  • Wasm 3.0 further expanded the number of types from the GC proposal and increased the complexity
  • Structural complexity often accidental, due to restrictions - for example, making i8 and i16 first-class numeric types would simplify the type hierarchy
  • SIMD introduced many instructions that were not symmetric across vector types
    • Simd has been painful to keep up to date and maintain as a spec editor
  • Store is a good example of what to follow: as the spec progresses, store is supported by more types.
  • Possible lessons:
    • Fewer instructions != simpler. Leaving out a one-off is even worse than adding a one-off.
    • Local simplicity can imply global complexity
    • Take coherence into account, not just how many opcodes. Use-case-only design can create a complexity mess
    • Resist overly cutting-edge features. CPUs have in some cases not converged on behavior. Only start a new row/column in the feature matrix if there is a plan to fill it in the foreseeable future.
    • Don’t introduce feature gaps that nobody owns. If we need a hole, at least have a plan to fill it.

Deepti: Mostly agree. For a little more context: when SIMD was started it was symmetric, but then, with guidance from the CG, holes were added since the instructions were not portable. This caused implementation complexity but also pushed the complexity to downstream users. It's not just design complexity across the spec, but across all layers. Sometimes letting the engine generate suboptimal code in some cases is actually faster than trying to do special-case emulation. So I wish that we had been able to push for different decisions there.

AR: Yes, I feel that. In a way the spec is just a paper implementation; it's expected that this kind of thing would show up elsewhere too.

Deepti: The only thing I disagree with is resisting cutting edge features. If you call wasm an abstraction over the CPU, it should be able to expose what the CPU does, and that’s not always consistent. When being over conservative we don’t expose the capabilities and we see performance fall behind current native performance and can hinder adoption.

AR: Right, certainly these are tradeoffs. It makes sense, and I consciously put “overly” cutting edge here, which is of course a value judgment on what counts as “overly”.

Deepti: I wanted to point out the trade offs.

RH: With my implementor hat on: I mostly agree, but one example is memories; they can be shared, i32, i64, etc. I like the symmetry. But when you start combining it with other things, implementation complexity goes up. Memory copy takes two memory types, memories are implemented differently, and all of a sudden you need to copy from a shared i32 memory to a non-shared i64 memory. All compilers have to support that, which leads to a lot of extra work. I don't have a clear takeaway, but even if you can create a new row/column and plan to fill it, there is a cost to that. In the copy example, I don't know if anyone will ever copy from an i32 non-shared to a shared i64; now that's fuzzing territory. I think symmetry is nice, but that's another cost.

AR: That’s a fair point. In that particular case I would say that’s a sub matrix. The overall matrix would be complete if you allow copying memories between the same type. There’s often nuance to this.

BT: Generally agree; I don't like holes in the matrix either. Past examples like multi-value and multi-memory have effectively been tech debt, not only in the spec but in the ecosystem. There are still a few engines out there that don't implement multi-value. Multi-memory is more useful than we realized because of modules and instrumentation. We took on technical debt to ship things early; now the debt has compounded and we're paying it off. Every time we do this, we're taking on some debt, and the sooner we pay it off, the better.

AR: Yes, and also what Ryan said about these complexities. On the flip side, irregularities can create complexity as well, and they multiply to create more complexity than is apparent when you look at each one locally.

David S: Agree with what's on the slide. In C++, which has been evolving over 40 years, Bjarne's early work makes some of these points. It is extremely difficult to push for coherence: people will want their special thing, and they want it sooner. It is an extremely important part of a standards body to have people like yourself who give that long-term view. When you lose touch with coherence and try to be cutting edge, you ultimately hurt future adoption because of the complexity. That's where C++ is at right now, so I fully support what you're saying here.

CW: An additional point about resisting the cutting edge: Wasm has a strong compatibility guarantee; put a feature in the language and we keep supporting it for all time. There has been some discussion around ambitious SIMD features: some features are advertised as upcoming, and we try to support them now. But we see examples of CPUs adding a feature and removing it in the next CPU line; Wasm remains stuck with them forever. That's another reason to be careful about cutting-edge features.

Derek: Also on CPU features: it has been a historical design goal to reflect what's on the CPU in what's in Wasm. On CPUs there are holes in features, and the gaps are a pain for compiler authors. We've sometimes said that we want performance portability (although that has sometimes been controversial); but both of these ideas run counter to the simplicity argument of filling the matrices.

CW: When we want to reflect what’s happening in CPUs, what we need is what’s in the intersection of CPUs.

Derek: Intersection has the same weird holes.

CW: Yes, simd has holes, that’s a better reason for holes than Intel having some feature that no one else has. Intersection is a good reason for holes. Unfortunate from a spec point of view, but less scary than cutting edge features.

AR: What I will say here is, one reason I'm interested in Wasm is the nice abstraction across messy CPU instruction sets. I don't want to deal with some messy instruction set that someone at Intel just cooked up in their head for reasons I don't understand. We don't want to make the abstraction too leaky and make wasm inherit these problems.

DD: I think this is a conflict between two similar, overlapping interpretations of Wasm. Is Wasm its own type of CPU, or an interface to the CPU you have? From a performant-interface perspective, a lot of these make sense: a compromise approach to sporadic support, with an implicit goal of getting the most out of your hardware. That's counter to the idea of Wasm as its own CPU, and incompatible with performance portability. Even if you design an ISA with the idea of performance on a wide range of hardware, the performance isn’t there unless it’s on all hardware. There is always a sacrifice. I don’t mean to put myself or Apple behind either camp, but it's a very complicated situation, and it shouldn’t be understated that past proposals like SIMD fly in the face of portability and of Wasm being a standalone CPU. SIMD has very real and seemingly arbitrary design choices based on the design of Intel and ARM CPUs. But it is an incredible value-add. Anyway, we can’t really have both.

Mattias: I basically want to continue along the same line of reasoning. We are looking at existing CPUs and architectures and trying to support those features in Wasm. As a compiler writer I must choose which architecture will be fast. I’d like to step back and go to a higher level of abstraction. I can see that this is very hard, especially with SIMD. Ideally we’d be able to convey the higher-level algorithm and let the engine figure out how to create these instructions, the same way we have loop in Wasm: it’s a higher-level feature that CPUs don’t have.

AR: Disagree: a loop is just a label for a backedge.

Mattias: It forces structured control flow.

AR: Yes, it restricts things, but in terms of semantics it’s not a particularly weird feature.

Mattias: That was maybe a bad example. I'm trying to convey the idea that trying to be both high level and low level can be bad. If we try to go too low level, we make a choice at compile time.

AR: I have often used the slogan: “as low level as possible, but no lower”. It's of course open to interpretation, but there are definitely things where we are higher level than a CPU, and a function is a clear example. Sometimes we have to go to a higher level, but usually we want to stay as low as we possibly can.

Petr: Can we do a deeper dive on this, maybe in a CG meeting?

DS: Yes.

Petr: Yes, would like to discuss tools and what solutions are.

Large Wasm binaries from desktop application migrations (David Sankel)

DVS presenting slides. (I am referring to him as DVS to be distinct from Derek.)

  • Many Adobe products have been moved to Wasm.
  • Problem: big Wasm binaries, long loading times.
  • Current solutions that split binaries are hard to use and not transparent.
  • Hypothetical (?) “virtual memory” approach: delivering chunks of code as necessary.
  • Not sure how.

YI: You can kind of do this today, if you are willing to do some far-fetched stuff. CheerpX is similar: we have an image with binaries on an HTTP server, and you download whatever you need page by page. We have a JIT compiler that compiles into Wasm code. It would be easier here: you don’t need to start with arbitrary machine code, you could have Wasm on your server. JIT-compile what you need and use it. Probably not as fast as a monolith, but I think you could prototype something today that does that.

DG: Curious how you feel about latency and the responsiveness of your app. If you need to go back and forth for whatever code you’re downloading, it seems like you’re spreading the load time across the app by hurting responsiveness.

DVS: Well, right now you download the whole thing. The idea is to run while you’re downloading. If you have a fast internet connection it doesn’t really matter. It’s really for those with slow connections; can they start using the application?

Chris: We have this scenario on embedded, wanting to pack functionality into a tiny space. There's a concept of dynamically provisioning, or loading: e.g. an HMI where you don’t want the web interface the whole time. So you switch modes, and load and unload content dynamically. Yes, there is a cost, but the functionality is used so infrequently that the cost is rarely paid. It works perfectly well for us. Doing it at module level might be a good way to split; AI could help?

DVS: Refactoring a 30-year-old code base into independent modules is not a scalable solution.

Chris: How would you handle the feature latency / colocation issue?

DVS: Collect data on which functionality is used most frequently and most frequently together, output chunks that are more coherent.

Petr: Wasm doesn’t really have an instruction stream per se; you don’t fetch and execute, you get a module and compile a module. There is per-function streaming compilation, though; maybe streaming compilation is the thing to enhance. Relatedly, think about prefetching instructions in regular CPUs: if you are a compiler engineer, or working on benchmark tuning and trying to use prefetches, you end up debugging prefetch issues, and you probably don’t want that same issue over the network. We want to be careful not to introduce something more fragile than the original approach.

Till: Highly skeptical that anything like this could work. The spec would need to change: validation needs to happen before compilation. But suppose we did that. With this model, if you don’t make coarse decisions, you change how all interactions with the application happen. Imagine the user going online/offline. You have to be prepared for any part of your application to be indefinitely suspended. How do you present that?

DVS: Not a new problem for web applications.

Till: But it can happen at function granularity. Need to be able to present UI on suspend! Seems like a fundamental UX problem.

CW: Let's drill down a little. You talked about using wasm-split already. What are the limitations with wasm-split? You mentioned threads were a problem because of dynamic loading and having a new instance in every worker. Are there other things that cause problems with this?

DVS: There are all kinds of issues; threads are one. The other big one is that wasm-split had too many remote functions defined (too many imports). Also it’s very intrusive: it needs to be done for every product.

CW: My first-line thought is: how can we make that experience better before adding more features to Wasm? Shared-everything threads, increasing the limit on the number of allowed imports. That would be much easier and faster than a big blue-sky solution. I’d be interested in streamlining the wasm-split experience for the shorter term.

TL: Without shared-everything threads, every thread needs to load everything separately, which is terrible (and some of these apps have tons of threads). On Till’s point about UI interaction: it's really nice when you can make this transparent, but when your user goes offline and execution catches up, what do you do? You need to react and show UI, and it's more complex when you can’t block the main thread. You can busy-loop, but you make the browser mad; you can use JSPI, but now your whole JS interface is async.

CW: Does pause fix this?

TL: Just a more efficient version of busy-looping; the browser will still be angry. You could move the entire application off the main thread and use sync XHR to fetch, but that also sounds hard. There's no easy solution: we need to figure out the individual problems and see how to make them better. It seems a lot harder to fix the spec, since these issues are so fundamental.

Brendan Dahl: Using a page from Adobe's playbook, we added paging support for PDFs. You can do range requests and get more data as you go; there are ways of doing this, but it's much harder with Wasm.

HA: Do we need to download the whole program every time you open the tab, or can you use a cached version?

DVS: Pretty sure caching works, but from a business perspective, new-user success is the bigger thing. Most users who have a bad first try give up; they don’t try it again.

AK: Very familiar with the problem, and sympathetic, but this approach doesn’t make sense on the web, or even in the environments you’re talking about. If you don’t have a good connection, then using virtual memory as an analogy seems crazy. If you have a tiny connection, you can start, but now you need to download unexpected things while you run, not to mention the main-thread and blocking questions. Not sure this virtual memory framing helps. The UX problems seem bigger than the technical problems. Curious if you have a response to that.

DVS: Not concerned about user experience problems there, since the current experience is so bad. If the connection is so bad they can’t catch up, they can’t get into the application while it’s downloading; there are already people we can’t get on board.

AK: Another example: when Google Maps moved to vector drawing, the vector drawing was much slower. They used a whole separate mini-application for slow connections that runs while the big app loads.

DVS: There are mitigations yes, but I’m trying to look ahead and see how to solve this problem in a sweeping way. Want to solve it across applications.

AK: I don’t think this framing is pointed in the right direction: the user is interacting with the thing while you are trying to fetch code and execute it. The user won’t be able to use it if it’s frozen or if it’s loading at the start.

Peter: When you went from the desktop version of the app to the web version, the implementation was basically the same. ?? Maybe some lesson can be drawn from that example about how to split it up.

CW: I previously commented about the short-term thing being improving wasm-split, but I want to speak about blue-sky ideas. Binary size is consistently an issue, and this is an example; GC is another. We haven’t taken a thorough look at encoding changes to make binary size smaller. The second blue-sky thing is: Wasm isn’t so far away from being able to start execution during streaming compilation. It doesn’t work right now because the data section comes after the code section, but if it were otherwise, there’s not much that would technically prevent it. You could imagine a world where all the hottest functions are at the start.

Till: Now have all the same issues - async and all that.

CW: Sure, but can’t dismiss it outright. Will be a time when that’s worth thinking about.

Alon Zakai (chat): +1 for Conrad's point. There is a natural way to consider streaming execution here, at least in theory.

DS: One challenge: of the limitations you’re facing, some are from Wasm, but many are from the web platform. Threads are a disaster because of the web, async is a disaster because of the web. Downloading in the first place is because of the web. The primitives for managing that (e.g. service workers) are part of the web. Historically a big challenge: e.g. no C++ app written for a platform like Windows is prepared for the web’s event loop. I like that you’re doing this, and it’s also worth finding other web standards people and hitting them over the head too :) to get a comprehensive solution. One question: you said you want to solve it systematically rather than per-application, but do you not consider tooling-based solutions to be that?

DVS: Dunno. The opposite of what I’m looking for is having to refactor everything into small pieces manually. Ideally we take the binary, flip a flag, and have it work.

BT: Thank you for bringing the use case to us. Don’t interpret the hostility in the room as the use case not being interesting. This feels more like a toolchain issue, and I would like to look more into that. I don’t really see a Wasm feature that would improve this, other than maybe the relaxed section order CW referred to. I like Adam’s idea of a separate mini-application… but we can’t just recommend that you rewrite Photoshop as a Photoshop-lite.

Till: There isn’t anything special about the web here, actually. One social experiment you could do: go to the Photoshop desktop team and ask them how they would feel about Photoshop for Mac starting to execute as soon as the first instructions are there, then loading on a per-function level. What would they think about the UX? I'm pretty sure they would not consider it viable. I don’t see why it would be viable for the web.

DVS: The comments suggest this is automatically going to make your user experience worse; I think that's a red herring. The loading time is already making users not want to use the product. If we can make the apparent loading time shorter, even via sneaky means, it will be more viable than the current status quo.

Till: What I think I and others are saying is that I just don’t think that’s really the case; you’re trading loading time for massive jitter and jankiness of the experience. For the first ~hours the user keeps touching new bits of the code that haven’t been downloaded.

DVS: Perhaps a misunderstanding of timescale: we’re talking 5-10 seconds of loading time. If they have to wait 10 seconds, that’s enough to discourage a significant percentage of users. They are used to very fast loading times on the web.

Deepti: Something concrete: related to what Derek was saying about restrictions, there is an effort to make LLMs work well in browsers, with a similar server-to-client interaction. I do think there are efforts this could piggyback on in other areas of the web platform. A concrete nice-to-have: what tooling do you need to do this kind of instrumentation? Refactoring is apparently impossible, but what info do you need from tooling?

DVS: Automatic refactoring fine, but manually understanding codebase is not on the table.

HA: People are discouraged because they expect the loading time is going to be the same the second time they load the page. If we can let them know subsequent loads are going to be faster, maybe that will make them happier to use the web version. It's similar to taking forever to install the application the first time.

DVS: You want a “this is slow only once” dialog?

Chris: I feel a little guilty about bringing up UX. For anyone designing UX, most UX performance is actually about perception rather than real performance. What's the difference between real and perceived load speed? My question is how do you know which bits of code will be used? There are tools and automatic refactorings for that. There is low-hanging fruit. This is a good question to answer for everyone.

AR: I just want to mirror what Ben and Deepti said: it should be realistic to have tool support for this. You can actually always split up modules; you can import/export this stuff. There are only two things you need: a profiler, and a tool that takes a module and splits it up. Then you need infrastructure through JS to link the modules on the fly once they are there.
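
As a minimal illustration of AR's point (module and function names here are hypothetical, not wasm-split's actual output): the primary module imports the rarely-used function, and the import is satisfied by a JS stub that fetches and instantiates the secondary module on first call, then forwards to it.

```wat
;; primary.wat: ships first; the cold path is behind an import.
(module
  (import "secondary" "cold" (func $cold (param i32) (result i32)))
  (func (export "hot") (param i32) (result i32)
    (call $cold (local.get 0))))

;; secondary.wat: downloaded on demand, provides the cold function.
(module
  (func (export "cold") (param i32) (result i32)
    (i32.mul (local.get 0) (local.get 0))))
```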

TL: You’re right, and this exact functionality has been implemented in the wasm-split tool in Binaryen for years. There are actually just all these other fundamental problems on the web around threads and async that make this a problem.

AR: So it is a web-specific problem after all!

Mendy: I’ve looked at similar problems at my company. There were some low hanging fruit in our case, would be happy to talk about them.

Shared-everything threads (Thomas Lively)

TL presenting slides

  • TL: have had shared memories for years on the web, lots of holes left on other things that could be shared, this proposal makes everything else shared
  • TL: expect that some things might advance faster than others, might split up the proposal as time goes on
  • TL: Everything can be marked shared (functions, tables, etc.)
  • With no annotations, you could share externref across threads and thereby access JS objects across threads; that is not currently allowed, and JS/Web folks don’t like it
  • Add shared annotations to types
  • Thread local storage
  • Today: globals are all thread local since they are in different instances
  • If we have everything in the same instance, now we (maybe) need thread local globals
  • Waitqueue: to implement condition variables and locks, needed for Wasm GC; has an i32 flag you can get, set, and wait on
  • JS API:
    • Shared memories aren’t a different type in JS from a non-shared memory today. We might want to change that, and JS engines are maybe getting shared objects as well. Would remove a level of wrapping.
    • ThreadBoundData: a shared JS object exposed to Wasm as a (shared externref), has a get method to get the underlying JS object, throws an exception if not on the thread that the JS object is associated with.
    • CW: this has implications on GC implementation, and being able to root the JS object from a different thread
    • TL: we will get to that in an upcoming slide
    • ThreadLocalFunction: allow a shared function to call a non-shared JS function
  • Pause instruction, to make spinning on locks efficient
  • New atomic accessors: atomic global.set, global.get, etc.
  • Acquire and release atomics: today only SeqCst; borrowing C++ semantics for the new orderings (see the sketch after this list)
  • Component Model builtins: no core Wasm instruction for spawning threads; adding CM builtins for creating new threads, and the Web will have its own thing
  • Prototyping status: V8 and wasm-tools prototypes underway, binaryen soon
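
A hypothetical text-format sketch of how the shared annotations, atomic global accessors, and pause might fit together; the keywords below (`shared`, `pause`, `global.atomic.*`, the `seqcst` ordering immediate) follow the proposal overview but were not final at the time:

```wat
;; Hypothetical syntax, for illustration only; not final.
(module
  ;; A shared mutable global, accessible from any thread.
  (global $flag (shared mut i32) (i32.const 0))

  (func (export "set_flag")
    ;; Proposed atomic global accessor with a seqcst ordering immediate.
    (global.atomic.set seqcst $flag (i32.const 1)))

  (func (export "wait_for_flag")
    ;; Spin until another thread sets the flag; `pause` is the proposed
    ;; hint that makes spin loops cheaper on the CPU.
    (loop $spin
      (pause)
      (br_if $spin (i32.eqz (global.atomic.get seqcst $flag))))))
```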

SYG: a lot of the V8 prototyping is behind the same flags as the shared JS object prototyping

CW: I’m a little cooler on thread-local globals as we have them now, and have heard some pushback; it’s not clear that engines will be happy to implement them, and we might need to do more prototyping at the design level. I was just curious how much of the design is being prototyped in V8 for those so far?

SYG: GC design in V8 does not yet support the cross-heap cycles that TLS would permit, but it should be possible

CW: Does that assessment of complexity include stuff around how initializers run for TLS blocks?

SYG: That is not included in the current prototyping work. Just how to collect it.

RH: What happens if I set a prototype on a shared JS object?

SYG: prototypes on shared objects are thread- or realm-local. Similar magic to primitives where we look up per-thread prototypes. Because at a deep level JS and Web prototypes are not shareable but we want to allow calling their methods, so we use thread-local prototypes. Shared objects will have fixed layouts to make it work for them.

CW: For per-realm ideas, these are already ideas that have come up in the shared structs proposal for JS, so these are not new ideas for Wasm.

SYG: that is correct

TL: open questions:

  1. Can thread-local globals be fast enough? Probably not.
    • Cross instance calls: need to find callee instance’s TLS block, details remain about what to do when new threads are created, etc; TL hopes it’s not too bad, but we will see

RH: also the interaction with stack switching

TL: that is the third open question.

  2. How strong should shared-to-unshared references be?
    • How much does the GC have to be aware of and handle these?
    • How does the application handle these edges if the GC/type system doesn’t allow them?
    • See the slides for further considerations

Mattias Blume: Concerning the wrapper that holds onto the unshared thing: Why would the non-owning thread need it?

TL: non-owning thread cannot deref it, needed for sending values around and then passing it back to the owning thread

CW: The purpose of the Weak reference semantics is if you have a JS object you need to bring into shared code and you know from your app that it will only be used on one thread, but the type system still prevents you from bringing it in, then this is a mechanism for getting to it. Even if it’s several layers of references removed you can transiently bring in the JS objects. It does make more sense for e.g. functions and imports than random data.

NF: It’s easy for now to be thinking about the JS interaction, but in Wasm semantics we shouldn’t ever create these shared->unshared edges, and we should be explicit that we’re talking about a JS issue.

TL: Yes, this issue is specific to the JS API. So far, the applications we have are using a lot of the deep JS integration. But we’ll probably have to face this problem in non-JS embeddings in the future.

  3. How should threads interact with stack switching?
    • Shared continuations allows for work stealing
    • Continuations capture function parameters; therefore cannot have non-shared parameters in a shared continuation when it suspends
    • See slides for details of what kinds of frames can have calls to what other kinds of frames

CW: we have been trying to be forward compatible with stack switching and kinds of calls, but cannot solve everything yet

TL: Should shared vs. shared-suspendable be different types? Another open question.

AR: Can you reiterate why thread creation was removed?

TL: So on the Web, we have Web Workers, and we want to keep it that way, so for now we want to keep the property that the only way to create a thread is to create a new Worker.

AK: Workers work today, but we see a future where we might want to add more lightweight thread creation on the web; none of this is necessary for taking advantage of shared Wasm GC, though. Browser people don’t think it is necessary to add a thread creation primitive to this already-big proposal. We also don’t want to spec thread creation as web worker creation and cut off future improvements with lighter-weight threading.

TL: An additional problem with thread creation in Wasm is that if Wasm creates a new thread, there’s no JS context. This might be solvable, perhaps via proxying, but our experience in tools is that proxying isn’t great.

AR: Sounds like this is all specific to JS/Web; it's unfortunate that Wasm is held hostage to that. It's very natural to want threads in the core language.

AK: This is a shared community. We thought strings might work well in the language, and people outside the web said, “we don’t want those string things in our nice pure wasm that’s not on the web.” Proposals like this are naturally going to reflect compromises between different parties. It didn’t sound like having a thread creation primitive was a strict requirement, and we can keep the door open to adding a thread creation instruction in the future.

BT: I don’t buy that argument. From the beginning, people have always been asking for a thread creation primitive. Why is there not a way to do that in the bytecode?

CW: If you had a function to import, would people still be complaining about the lack of a bytecode?

AK: Kind of the same reason we didn’t do it in the original threads/atomics proposal, it still isn’t necessary.

TS: From an outside-the-Web perspective, I would have preferred for there to be a bytecode for creating threads, and I still hope we get there. I also do see that there is no consensus on how to do this both outside the Web and on the Web, so TL’s proposal is the best compromise we can get for now. We have a way to do this in the CM, and there are nonstandard ways. It is extremely important to keep the Web and the CM aligned so that we are forward compatible with an eventual bytecode that works for both.

MB: What do architectures support? No hardware architectures have a create-thread instruction. There’s a scheduler, an OS, lots of machinery. Everyone does it by calling into a host environment. It’s not a trivial hardware-like operation.

TS: That also applies to Wasm GC, stack switching, and many other things.

CW: RH, can you talk about thread local globals?

RH: Have discussed that a lot in issues, and we don’t have a lot of time right now, so redirecting with a new question: What are your thoughts on how to proceed on the issue?

TL: Yes, we’ve had a lot of discussions and they’ve been productive, and they’ve stopped. So we are in a position to wait until we have a prototype and some performance numbers.

RH: Are you interested in implementing a different approach so that you can compare them? Especially if the first implementation is obviously slow?

TL: If the first implementation is fast, maybe that’s good enough?

LW: Numbers will help with “can TLS be fast?”; they won’t help with the shared-to-unshared edges discussion.

TL, RH: Agreed.

RH: It’s not just a big project in SpiderMonkey, but also in Gecko, to get right, and that’s concerning.

TL: We need to wait for more implementor feedback. What do they need, what can they get away with? Evaluate the tradeoff in light of that data.

RH: Toolchains are going to want the stronger semantics.

TL: We can push back and ask for feedback; “what if you can’t have the stronger semantics?”

Memory64 (Sam Clegg)

SC presenting slides

SC: The proposal has been at phase 3 for 2 years; we tried to be ready for phase 4 at this meeting, but didn’t quite make it. Very close. Mostly because we chose to add table64 for completeness and symmetry with memories.

(see slides for details)

  • Table instructions can take i64 indices now as well (see the sketch after this list)
  • Browsers have WIP implementations of table64, toolchains are all done
  • Final updates to the spec text are also mostly done. One wrinkle is rebasing to the wasm 3.0 branch in the spec, which was nontrivial because of the addition of multi-memory.
  • Hoping to vote for phase 4 at the next CG meeting
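
For reference, a small sketch of what the memory64 and table64 text format looks like per the proposal; the module below is illustrative, not from the slides:

```wat
;; memory64 + table64: memories and tables can both use i64 indices.
(module
  (memory i64 1)                    ;; 64-bit memory: addresses are i64
  (table $t i64 10 funcref)         ;; 64-bit table: indices are i64
  (func (export "get_slot") (result funcref)
    (table.get $t (i64.const 3)))   ;; table instructions take i64 indices
  (func (export "load_byte") (result i32)
    (i32.load8_u (i64.const 0))))
```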

WASI / component model (slides) (Luke Wagner)

  • Preview 2
    • Stable point for toolchains and consumers to target
    • Goal: gather usage and feedback
    • Includes:
      • Component model: language-agnostic, composable packaging format
      • WASI: standardized APIs defined on top of CM
    • Component model:
      • High-level language-agnostic types at component boundaries
      • Records, lists, strings, etc…
      • Resources and handles
      • Human-friendly IDL called “WIT”
      • Linking: both shared-everything and shared-nothing
    • WASI interfaces: clocks, random, filesystem, etc…
    • Related standard: CNCF defining wasm OCI artifacts. No changed semantics.

TL: can you give an example of what kind of semantics they might’ve wanted to add here?

LW: configuration, filesystem images, environment variables, ambient authority, things we’d like to represent in the Wasm instead of in the container metadata

JS: is splitting single file wasms into multiple layers only for components or also for core modules?

LW: If we added data imports, they would be a splittable thing. Though we can get the same effect by having data segments in separate modules.

  • Implementation status
    • Producer toolchains: rust, js, python, go, etc… more in progress
    • Consumers: Wasmtime, jco, VSCode, wasm_component_layer

SC: Does wasm_component_layer use wasm-c-api?

LW: I don’t know, I think they defined a Rust trait that needs a little bit of adaptation per-engine

LW: wasm-edge has a work-in-progress implementation and tracking issue they are burning down

Michael Yuan: We’ve been implementing CM features for a while, will soon deliver to some of the downstream use cases. One in particular is the HTTP module.

LW: SIG Embedded is exciting both in itself and as a proof point for scaling components down to tiny environments.

LW: We have found that standards open the door to upstreaming in various language toolchains, avoiding forks and proprietary SDKs.

PP: can you talk about long-term support for preview 1?

LW: I have an upcoming slide about that

LW: Preview 2 shipped just after the start of 2024 instead of just before it, as planned; expect subsequent preview releases to be pushed back accordingly to ensure we get usage and feedback. We plan on doing semver minor/patch releases. There has been lots of work on versioning, and tooling for versioning, since January.

LW: Continued support for preview 1; we don’t want to cut those users off from toolchains. Exploring a 2-to-1 adapter.

LW: Importantly, preview 1 is frozen. But what if I have a core Wasm engine and haven’t implemented components yet; how can I use the new preview 2 APIs? The plan is to add a core module target to preview 2 that uses the canonical ABI version of the component APIs.

(Throwback to snowman bindings of yesteryear; chuckles)

LW: if I can use core modules, why even have components? Components unlock linking features. Also easy to wrap one of these core modules into a component.

CW: These core-only runtimes that aren’t implementing the component model yet will implement the APIs, but using core module ABIs?

LW: Yes. This is similar to what engines have been doing in WASIp1; the ABI is different because it is mechanically generated from the WIT and the canonical ABI, but it still is just an ABI at the core wasm level.
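
As a concrete illustration of that point, here is a sketch (a reconstruction from the canonical ABI, not an official listing; the import naming for the core-module target was still being designed) of how a WASIp2 function could appear to a plain core module. The WIT signature `get-random-bytes: func(len: u64) -> list<u8>` flattens to a core function whose result is written through a caller-supplied return pointer:

```wat
;; Sketch only: names and flattening reconstructed from the canonical
;; ABI, not an official WASIp2-for-core-modules listing.
(module
  (import "wasi:random/random@0.2.0" "get-random-bytes"
    (func $get_random_bytes
      (param i64)    ;; len: how many bytes to generate
      (param i32)))  ;; return pointer: (ptr, len) pair written to memory
  (memory (export "memory") 1)
  (func (export "fill16")
    ;; ask for 16 random bytes; the (ptr, len) of the returned list<u8>
    ;; is written at offset 0 in linear memory
    (call $get_random_bytes (i64.const 16) (i32.const 0))))
```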

TS: I got the feedback recently that the new p2 APIs are nicer for engines to implement even when they don’t have full component model support because they are more regular.

TS: This is all focused on p1 engines that want a gradual adoption path. At Fermyon with Spin, we didn’t want to have both a p1 and a p2 path, so we already switched to a system where we wrap customers' core p1 modules in an adapter into components that use p2 APIs. Wasmtime’s WASIp1 implementation is now a wrapper around its WASIp2 implementation, which proves that WASIp2 covers the same functionality.

PP: How does the translation from P2 to core Wasm actually work? Do you need a lot of component builtins, or a library, or what?

LW: the two targets share 99% of everything, even guest bindings, it's just the final step in the toolchain pipeline. And the APIs can coexist side by side.

AC: the fundamental unit before we create a wasm component is a wasm module. For example rust will import core wasm APIs at the canonical ABI level and then package it up into a component. But the package already has both P1 and P2 stuff in it, and the adapter is packaged with the component.

SC: Is there a “component lower” which is the inverse of “component new”?

LW: If you have a component containing a single core module, then it’s straightforward to extract that core module.

Chris Woods: Embedded has shipped millions of devices and needs to maintain them; there are tweaks to Wasm semantics in WAMR to run in embedded environments. We are diving into the canonical ABI and how it works, making sure the P1 toolchain keeps working, and we hope to collaborate to make sure this works for embedded. Excited for this direction.

LW: Preview 3 will add async, streaming ABIs, composable concurrency, etc… (see slides for details)

TL: how can you avoid function coloring without core wasm stack switching? Is the host doing stack switching, like in JSPI?

LW: Yes, someone’s doing it, but it’s not in core Wasm, similar to JSPI. And a polyfill running in a browser would use JSPI.

LW: preview 3 is only adding new capabilities, will be compatible with preview 2. Aiming for smooth transition path.

CW: do you foresee any breaking changes after preview 3?

LW: We’ll have to at least change the version word in the magic cookie, similar to what core Wasm did pre-MVP.

CW: Are you expecting any of the 0.x releases to make breaking changes?

LW: I don’t see any need for it currently.

MatsB: can you elaborate on composable concurrency?

LW: It's hard to go into details off the cuff; it's mostly about letting different languages wait on each other. It is possible to compose e.g. async JS and Python. Trickier is synchronous C functions and epoll, but it is possible.

MatsB: Is there a thread pool in the background?

LW: We decouple parallelism from concurrency. We don’t have any thread pools or any dependency on threads in the async model.

SC: Do you expect to bring the component model to a phase vote prior to its 1.0, separately from WASI?

LW: Once we’re ready to advance phases, with a complete proposal, we’ll go through the regular phase process.

LW: The component model is all compute, makes sense in core Wasm, and is now officially included in the Wasm WG charter. WASI, however, is focused on IO and perhaps deserves a whole new W3C CG+WG pair. We chatted with our W3C liaison and received positive feedback. (see slides for details)

CW: does 1.0 for WASI and the component model need to happen at the same time?

LW: We’re currently grouping them together, but once we get to 1.0-ish timeframe, they can decouple.

CW: are custom adapters still planned?

LW: Custom adapters are some of the fanciest parts of the original plan, but it’s not turned out to be urgent for the current use cases. We still expect to do it in the future, but it’s probably pretty far away at this point.

TS: At least at the API level, adapters are an implementation detail; WASI is completely separate. Adapters would mostly be a concern of toolchains and engines, not standards or applications.

LW: De-facto standards kind of cover that; or perhaps we can talk about it as intersections of de-facto things, like Windows and POSIX, to derive the filesystem API.

FM: A new WG would be appropriate. Also think about a non-W3C venue perhaps, or an independent organization.

TS: I helped set up the Bytecode Alliance, and we very much do not want to set up a standards venue. WASI had already been announced, but our question was, are we a standards venue, and we decided no, because a standards venue is a very different thing to run, with different legal and process considerations. And, maintaining relationships with other standards bodies is expensive for participants in standards processes. Many participants here already have relationships with the W3C. So I strongly favor keeping this within the W3C.

RH: would 1.0 component model mean going to phase 4?

LW: yes

RH: So for web implementations, how do you define the JS value equivalents of CM types?

LW: The way jco transpile works is a model of how we expect a future JS API for components to work. Credit to Guy Bedford who designed and implemented much of this.

CW: the ESM integration proposal did set precedent for polyfills counting as implementations, fwiw.

LW: we have a long time before we have to answer that question

PP: WASIp1 is on millions of devices, and WASIp2 runs on various devices. In the Wasm CG we say “two browser impls” for phase 4 acceptance. But what about for WASI, where there may not be pure WASI in the browser?

LW: The split between the component model and WASI is key; WASI probably wouldn’t require web implementations, though it may make sense in some cases. The CM still would, as part of the Wasm WG.

ChrisW: Breaking out the binding mechanism makes sense. I could see in the future all Wasm APIs having the binding mechanism factored out.

LW: For example, there could be a canonical ABI variant for Wasm GC.

ChrisW: where does it make sense to stop standardizing interfaces in wasi? There will be very specialized interfaces like I2C, USB, physical features that not everything will support.

LW: To be a good WASI interface, it’s not necessary to be implemented natively. For example, in HTTP, many hosts already have HTTP implemented natively, and forcing guests to go through sockets would create more work and less flexibility. But if you’re on a host that has sockets, then it makes sense to let guests talk to sockets.

ChrisW: what about standardizing the libssl interface?

LW: As a general principle, it’s a good idea to avoid standardizing an interface for a particular implementation. Not all interfaces need to be standardized, users can use their own interfaces and mix them with WASI interfaces as desired.

Closure

DS: Thank you all for coming!