Ryan Seddon: So how does the browser actually render a website | JSConf EU 2015


Ok. So, this is quite a large subject to talk
about in 25 minutes, but I’ll do my best to cover what happens when you get the HTML and
it how it comes and pints to your page, you will understand some of the terms of what
is happening behind the screens. So thank you for having me in Berlin, it’s
been a beautiful city I’ve been exploring it for the last week, it’s been really great
to be here. As you were told I’m Ryan, I’m team leader
of the Zen desk in Melbourne Australia, where I work on lots of finance stuff, I’m really
interested in how the browser works on the front end side of things. Just a high level view of what we will cover.
We’ll look at sort of the high level view of it, it comes to be, how the HTML is passed,
how it gets through the different processes in the browser, we’ll dig into each section
and along the way we will have performance insights, so performance things are recommended,
that can be explained as what the browser is actually doing, as it’s constructing the
website. As I said this is a kind of like a 30,000
foot view, I’ve got 25 minutes to cover a really, really in depth subject so I’ll do
my best. So, the browser. It’s probably one of the
most complex applications that we use, I use it for 90% of my interaction with the computer,
like I’ve had my text editor or browser open and I use all my applications through the
browser, so it’s a pretty prevalent application and a really complex thing. So looking at, sort of like the components
that make up a browser, for the rendering stuff, the binding, a lot of operating system
stuff, so when it talks to the network, it will use certain APIs depending on the operating
system, Mac or windows, then the rendering part, that’s about actually constructing the
website from the HTML that gets sent to it, platform stuff that’s dependent on windows
or OS x, there are different things between operating systems, then JavaScript virtual
machine. I’ll only cover this bit, the rendering the parsing of the HTML, how it lays out the
page and pints to it the screen and you get the final result of the actual website you
are working on. If you look at the high level flow, we have
got sort of two processes, one will parse the HTML, one the CSS, then combined to the
render or frame tree depending on which browser you look at it, essentially a data structure
of the two bits put together, that will then layout the render tree, it will put it when
you have absolute position it will know to do all of that and then painting is the operation
of drawing the graphics and giving you the visual output. If we go to parsing, there are a few things
about parsing for HTML that are a bit different to most languages, it’s very for giving by
nature you can make a lot of mistakes and it will just work, that’s one of the great
things about web development, you can make a lot of mistakes and it will still work for
you, that means that the parsing isn’t straightforward, in most languages if you make a mistake or
throw an error, it will error out, HTML will try to recover. It can be halted, so we will
go into when it can stop parsing in certain situations, wilt do speculative parsing, that’s
kind of a big word, will go into what that means soon and re-entering, that probably
means nothing to you right now, we’ll explain what that means as we go. There was a stage when we did try to do strict
parsing, but if you made a mistake you got an error page, that was a pretty bad experience,
if you put something in production that had a mistake that is what your visitor received,
it will take over your whole page and say there is an error in the HTML, you can’t always
control the HTML rendered on your page if you have dynamic content et cetera. So if we look at valid HTML, this is validation,
you can see I’m missing the HTML tag, the head tag, I’m not even closing my tags, I’m
not even putting quotes around my attributes, so the parse has to handle this, you see what
it actually outputs in the end once it’s taken that in, will be that, so it will implicitly
insert the HTML in the head tags for you, close the tags and put quotes round the attributes
and handle the paragraph and the div, if you see that you would think the div should be
inside the paragraph, but it knows for the rules of parsing that it actually can’t do
that and it, it has to handle that and figure it out for you it’s pretty for giving in the
way it handles it. So the parsing flow, these are terms common
to, if you have learnt about parsing or not, so there is tokenisation, that takes the text
and turns it into what are called tokens, that will create the parse tree, that will
then create the DOM tree which we all interact with JavaScript, that’s the thing you look
at in the browser when you are creating or querying the DOM, that’s what you are looking
at. We’ll just, we won’t going to the script execution bit, we’ll cover that later. So parsing, as each character comes in through
the token, it will have special cases, an open tag, it will read the next bunch of characters
and then say that’s the tag name, look for a closing tag and create a token called ‘Start
tag’, then on the other end it will look for>close tag, that’s how it works out everything
is positioned, in the example where I didn’t have that, if it finds an open tag, or not a close tag it will automatically
close it, that’s simple rules, it will go through each character and figure out the
tokens. That will create a parse tree, so parse tree
is representation of your HTML, it’s pretty much one-to-one, you can see there, it’s got
all the elements in there, paragraph, div, then they will take the parse tree and make
it into the DOM tree, it’s all the DOM elements, that’s where you interact with the JavaScript
in the page. So, there are certain situation where I said
it will actually stop the parsing of the HTML, so when it hits a script tag it needs to actually
stop, fetch it for the network, if it’s a non-new line script and then it will have
to pause, it will have to execute that and then continue on. So, if you have that in your page it will
actually make it slower to render to the page. So things like network latency, et cetera.
Link and style can also affect it as well, if the script needs to query information about
a DOM element, a type or it can affect the parsing while it’s doing that. So there is speculative parsing, what it will
do, it will twiddle its thumbs while it’s pausing for the script to execute, down loading
and executing, it will create another threat, another process with the browser and it will
actually look ahead, so if you have got images or style sheets it will actually go ahead
and fetch all those before it continues on parsing the page, it’s actually kind of smart
about that. The other weird thing about HTML, if we go
back to that diagram is that it’s re-enter, it can be interrupted, when it gives the describe
tag it can bring it down, something like document, dot, you can edit the HTML on the fly it needs
to go through that on the fly, if you download JavaScript and too document dot right, you
can redo the tree, redo the DOM, that can slow it down. So I come to the first performance insight.
You may have been told, put your script that bottom, but never, never known why. So now
we know that a script tag will which stop the parser from going through all the HTML,
constructor and DOM tree, if we do it at the bottom it can go through and do it uninterrupted,
it won’t need any script tags, create all the elements, create the DOM tree, at the
end you will get all the scripts and it will be faster to render. If that’s not an option,
there are attributes, defer, that’s in IE a long time most browsers support it, defer
will actually defer the execution until it’s actually done parsing. Async will work slightly
differently it will do a separate thread and actually parse and execute the script in that
thread adjacent to the parsing one. The DOM document property, if you are waiting for
that, is deferrable… deferrable will… will stop DOM content loading from actually
firing and async won’t. So you can put it at the bottom or put either defer or async
on your scripts it’s a trade-off between them. So this parsing stop of visualised in the
form of a gif. So it’s kind of like my first website, what the parsing looks like. So you
can see it’s quite, it never actually quite falls over! {Laughter} . Trying to figure out what the hell are you
doing with that div tag, you can see at the end he just chucks the spade, there you go,
there’s your website! {Laughter} . I thought that was pretty good. CSS parsing,
I’m not going to say too much about CSS parsing, it’s pretty standard, if you have done it
in other languages, there is nothing special about it, it will create the CSS object model,
like the DOM object model, basically it’s a representation of your styles, you have
a style sheet, rules, selectors, decorations and so on. So that will then move on to our second stage
of combining those two, so we have got a DOM, CSS object model and that will go into and
create what’s called the render or the frame tree. So, accommodation, you combine the two object
models, style resolution, that means it will figure out all the styles applied to nodes,
it’s the actual representation of what will show on the screen, it’s not a one-to-one
mapping of your HTML. I’ll explain what that means in a second.
So if we revisit that diagram, you can see that parsing HTML and CSS isn’t always necessarily
can happen in parallel. JavaScript can affect those two running at the same time. So, the render tree is actually multiple trees,
there are like four of them, render objects which are kind of like the actual node itself,
it’s got a refer to the DOM node, hanging off that the render style, so that’s kind
of like the styles that apply to that DOM element. You have got things like layers, so the browser
needs to know if it’s positioned absolute, or if it has a z index on it where to place
it on the page so it puts it in the right order. Then line boxes, if you are wrapping
text it needs to know how to lay that out as well, based on font size, the constraints
of what’s inside, it gets a bit complex. So, like I said it’s not a one-to-one mapping,
things that are not in the render tree are all non-visual elements, so your head, your
script, your titles, any node that is actually hidden by display-none, or nodes that are
apparent by hidden display-none, won’t make it into the render tree it doesn’t need to
show on the page. If you look at things in the render page but
not in the DOM, it’s quite a lot of text to look at, we can look at the top, a paragraph
tag, so we’ve got a sentence and then we have got two in line tags, strong and emphasise
tag. On the left hand side, down the bottom there, you can see that’s actually what the
render tree looks like, a render block assigned to the tag, a block level element, in side there it’s got a text,
then it hits the strong tag, inside that it’s got some text emphasised and so on and so
on, hanging off that render box this line indexes, that will figure out based on the
constraints of where it’s rendering, if it needs to wrap the text it will create a line
box, a box round each line of text, ‘The quick brown fox’, was one line and ‘Jumps over the
lazy dog’, would be the second box, that’s two separate line boxes, you can see, there
are two separate line boxes, it knows how to lay it out, it will change it if you change
an element or the browser will now need to figure out what’s now in different lines. So, the DOM node, like I said, is converted
to a render object, or the render object has a reference to the DOM node, it’s a visual
output of what’s shown on the page, geometric information about it, it’s width and height,
it can layout and paint itself, so it will work out where it sits on the page in relation
ship to everything else, it’s if it’s floating it will need to have multiple element, if
it’s positioned at absolute it’s taken out of the flow and positioned based on, like,
if a parent has position-relative for example. It will hold styles and computer metrics about
itself, so it will hold, you know it’s font size, width and height. So, hopefully this will work, it’s a really
cool video of… if it loads… someone hacked the Mozilla Firefox and actually true boxes
around, as it’s laying out a page, it’s slowed down quite a lot, you can see it’s actually
figuring out dimensions, it’s laying out all its children and figuring out the position,
it starts top left of the box and works out where it needs to sit. I’ll play that again… it’s quite cool, you
can see it has to do quite a lot of work, laying out all its children and then it will
figure out where it is, it stops there and it slide down. This is not a very complex website, you can
imagine something with quite a lot of stuff going on, it needs to do quite a lot of work. I always like watching this, it’s really cool. Right. So,, so how it lays out the information,
it calculates all the visual properties, it has to combine all the style, so, that means
the browser default styles the external styles you are linking to, any inline style elements
or any inline styles, also take into account some of the old HTML element, properties,
attributes like VGA colour, have to take that into account to set a background colour if
that’s on there as well. There’s a lot of complexity round matching rules for each element
which I won’t go into because its quite a big subject, and how it shares the styles
and how it does the cascade all those rules round how CSS works. Do the style computation,
work out information about, it so colours to apply, width and height et cetera. Now
we’re into the layout process, this is we have got our tree, render tree with our visual
information like the video I showed where you could see all the elements moving round
the page, you might have heard it called reflow or relayout as well if you transfer it. So it’s a recursive process, hopeful remember
what recursion is, so it will work it’s way down the tree, then as if we remember each
render project that’s you sign too DOM node how to lay itself it will layout it’s children
first, it needs to know based on all the dimensions of the children what height and what width
it should be, layout its children, then layout itself, then flow all the way through the
tree of your render tree then figure out where everything sits It will batch layout, so it will do incremental
layout the browser will be intelligent, every time you do a change it won’t do a relay out
yes I did will combine them all and do them at a regular interval so essentially when
something changes in a DOM the render tree representation will say flag itself as dirty
so saying I need to re layout the browser will come a longer pick up all the objects
that need layout and lay them out in one go rather than individually that will traverse
the tree using recursion, find all the dirty render objects, because I need to layout all
these bits and figure out where everything sits. This will happen so it won’t block.
However this immediate layout where it needs to layout the entire document for example
if you change the font size, if you have a fall back font then I font face coming the
height of the fight might change slightly that will cause a whole layout of the page
it needs to figure out where everything sits again because it shifts down that will also
happen when you resize the browser and also when you access certain JavaScript properties
like node.offset height you want to get the height to give you the proper value. Comes to performance inside number 2. We want
to make note from the browser and we want to batch our changes. We want to act like
a browser do all your DOM changes in one go than rewrite you want to do all your reads
in one pass all your writes, and just show here a bad example. So this code is trying to create like a aspect
ratio four div, so like a 16:9 it’s reading its width dividing it by 1.7, you get height
like 16:9 resolution. You can see here I am reading the client width, and then I am writing
it to the DOM, I am reading, it writing it. What that force it to do if I reading it then
writing it that’s fine but if I read it again oh crap I need to layout the page need to
give you the true dimension for the client width because div 2 could have changed cause
of div 1 has got a different height now that could affect div 2 that needs to go and layout
the page and give you the correct value. So the best way to too that is actually to
too your reads in one go and your writes in one go. So you want to read all the values
back, and then you want to write them in one go. So this is obviously naive code, so the
real world situation is there’s a library you’re not using your framework of a DOM which
is pretty good, a pretty good article calling preventing layout thrashing which talks about
reading writing, reading writing, which I highly recommend you read. Fast DOM will give
you an API to do all your reads you called fast DOM.read when you want to read something
from the DOM, and likewise if you do write it will co-ordinate the process for you so
it will make sure it will batch all the reads then do all the writes one after the other.
Most modern JS framework will do this internally,
it will do so does ember with their newer rendering. So finally we are on to 4, paint. This is
what takes all the information from the render tree and will actually do the calls to actually
paint something. So kind of think of it like the canvas API where you are drawing boxes,
paint uses the process of taking all the information from the render tree and gives you the visual
output. There’s a few things about it, it will take
so it will layout the render tree, take all that information you have done the layout
so it knows where it sits the co-ordinates of the items, it will create layers, so layers
of things you have got a position absolute, make that work from the bottom up things display
correctly, it’s an incremental process, and actually there’s a 12 step phase, it needs
to paint like the background colour and image, the porters, the shadows et cetera build it
up so it comes out in the correct order you have got the correct rendering it’s quite
a heavy process. So if we remember one of the trees of the render trees or render layers
will try to group things into layers if you got a position absolute element all it’s children
will be in it’s own layer if its got transparency, if it’s got over flow hidden it’s I canvas
or video it will put it into it’s own layer. It’s kind of a many to 1 one layer can have
many render objects over DOM nodes it can put on the screen. Painting, so painting will also produce what’s
called a bitmap for each layer, so this used to be CPU based but a lot of browsers now
do it on the graphics card on the GPU, so that’s lot faster the GPU is obviously made
for drawing graphics, there some really good articles about a Blink and Webkit do it, it
will take that bitmap like a jpeg upload it to the GPU create what’s called a texture,
those textures will be composit he’d into at file image it’s a bunch of images they
get put together put on the screen. This comes to performance inside number 3,
so inline critical CSS this is a new thing you might have seen. Essentially it’s a
way of taking the most important bits of your site and inlining the CSS in a style tag inside
your HTML right in the head, that will speed up first paint times so you delay everything
else you delay all your other CSS all your other JavaScript, so then the browser can
come in parse all your HTML, you can get something on the screen really fast, you will see lots
of sites like filament group, a lot of people really into performance talk about this a
lot. The last bit about delta last bitmap, browsers performance where they can compare
the last paint to next paint they can do a dif when you do visual regression testing
they can do something where they go ok only this bit changed I only have to draw this
box here, it can be smart about it. So we take an example I don’t actually do
this on my website but I went into the DevTools, this is my blog, so the idea is that on the
first render I would get this, essentially the most important thing of a blog is the
blog content we care about related articles or my logo or the blog title you just want
to read the article. The ready idea is you render that straight away on second part when
everything loads asynchronously you get everything else coming in, you can see there I have got a different font
that is helvetica, comes in and you set ubuntu, you get that in under a second you could have
that on the users screen so it gives an illusion that your site is really fast but what you
are doing is delaying all the stuff and giving them the important information. I guess the
other thing is to think about is that you don’t want stuff to shift round so like the
header up the top you want to have fixed height so inline CSS so it doesn’t jump, you kind
of want to make it not jump as much so it’s not jarring to the user. So all of these tips I have talked about apply
after page load as well, this the process to get it on the page initially but things
like layout and paint all happen as you interact with the website it’s so really important
information to know about. For recap, we parse our CSS and our HTML that
creates a DOM tree, the DOM tree then gets converted into a render tree, so the CSS and
the HTML combined. It’s actually 4 trees so the layers, the line boxes, the render objects,
and the render styles. So layout is a process where it computes where the elements will
appear on the page, based on it’s relationship to other elements, taking into account all
the CSS, then painting will actually produce an image of that layer give you the visual
output you are expecting on the page. Like I said this process is really complicated
I have little scratched the surface, there are some really great articles I will share
round later I highly recommend you check it out some really good talks in here, really
good article about going more indepth on how the browser works as well. Go hug a browser engineer they have a hard
job. It’s
really interesting it’s made me a better developer to understand how it all works. Thank you.
{applause}

, ,

Post navigation

16 thoughts on “Ryan Seddon: So how does the browser actually render a website | JSConf EU 2015

  1. I learned about it before from the article http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/. Definitely worth checking out if you interested in this topic.

  2. At 13:51 he says head, script and title aren't in the render tree, because they aren't visual elements. Well, not by default, but they can be. Just set head, title {display:block} and the header will be rendered with the page title visible.

  3. @6:21 The diagram from a slide is cut off by a window of speaker. I think it's super unnecessary to show the speaker, unless he or she is physically motioning to something. I don't understand why this is being done…

  4. Wow, A LOT goes on behind the scenes! I got answers to questions I didn't even know I had. Awesome presentation!

  5. What if you had a cylinder that was solid in the center and connected to a hollow cylinder by pegs extending from the edge of the solid cylinder out to the inside of the hollow? What if the core was balanced in the center of a hollow cylinder by magnetism without touching?

  6. I know how they all work. It is not as you may think. The NCSA Mosaic Canvas was one of the first browsers that used a single Canvas with as many as 25 added frames. Netscape Navigator was that Browser. These could handle up to 50 added text, image, label, button, entry boxes, spin boxes, check boxes, option boxes and message box widgets. And all of these along with the frames depending on the complexity of how many were used by a website. These also used two methods to render a GUI(Graphical User Interface) inside the browser.

    1. The browser first request the HTML page with a HTTP/HTTPS Request.
    2. When the HTML is fetched it is Parsed and fetches the CSS, JS and Image path links then does a Multi Threaded second request and downloads them all.
    3. The HTML then is separated into two distinct methods. One it determines the div tags traversal using a for loop to append a positive and negative count to regex both beginning and end div tags into an array by div1 div2 div3 div3 div2 div1 tree. A second method fetches all the text for rendering on the GUI into their required Frames below.
    4. All HTML tags then become add_tag() methods for the Canvas Markup. These also in a for loop are stripped leaving only the render-able text pre-placed onto the screen but without config_tag() methods.
    5. The HTML at this same instance fetches all class=value, id=value these become the add_tag() names that will be matched to the config_tag() methods to render all color, background, paddings, margins etc as the screen is Painted. iN THIS STEP THE class=value and id=value become the div tag named add_tag() methods. They resemble these—-Sdiv1value Sdiv2value Sdiv3value Ediv3value Ediv2value Ediv1value. These can now become the div block add_tag() begin and end location methods pre loading the GUI Canvas and Frames.
    6. The CSS and the HTML really have no correlation within the browser with exception to only their relative class and id names. The CSS contains the Markup Rules and Attributes for the add_tag() methods and the GUI.These have to mate/match by name to the already set add_tag() methods for the config_tags() to render them all. This is the Painting.
    7. An elide() method is also used to hide both the begin add_tag() and end add_tag()(Of-Which-Separate-Each-Block) methods as markup tags in the GUI Canvas. These tags(un-related to HTML tags are used as Markers for the GUI's Frames and all widgets. Hidden or elided are not seen but mark the exact positions for the config_tags() to render them in the right order.
    8. The config_tag() methods are last. These using conditionals are filtered out to determine page GUI layouts according to Markup Rules matched to the add_tag methods to Render/Paint the GUI Screen Mosaic Canvas.
    9. The last config_tags() methods the Hyperlinks have to depend on already being placed and positioned on the GUI Canvas and Frames. These are the Hyperlink Tags and have their path suspended into the for loop using Lambda for each config_tag() method. So when a client clicks them can send the correct callback function and URL path to be fetched next.

    In summation all Browsers use highly customized versions of the original NCSA Mosaic Canvas. Microsoft purchased a copy from SpyGlass Inc. back before IE was introduced. Webview is another copy that exist besides it. It came from a customized original copy of Netscape Navigator. These have also become so customized that today the CSS use all of the HTML5 Canvas Markup rules. And use many Advertiser added junk from their respective type. Chrome being such.

    Many cannot understand how CSS is used. The CSS isn't a computer language its only a Parsed set of Markup Rules that in a for loop are matched by name to their relative add_tag() methods(which are the already set add_tag HTML Parsed section blocks) and thus rendering/painting the final Mosaic Canvas and Frames using the last defined config_tag() methods. These parsed in the loop for their Attributes painting the GUI Mosaic Canvas.

    Where does JavaScript fit in? Well contrary to common belief it is Parsed by the Browsers HTML Parser and then is Token-ized into the clients machine. It executes by using several eval() and literal_eval() methods. These evals take strings and convert them in to executable scripts in the browsers backend and also only use the config_tag() methods along with their own sets of elide() functions to present or hide data presentation. And were first designed to allow website masters to implement injected Advertising into the browsers. This language gradually became less advertising and more dynamic as it grew.

    Search the NCSA Mosaic Canvas developed at Berkley on Wikipedia. Microsoft before it got its hands on the Spyglass Mosaic was testing with Tk(Tkinter) before they found the Mosaic Canvas. They more than likely used parts of both to develop IE as well. Read more about Tk/Tkinter Widgets at effbot dot com. Also the original NCSA Mosaic Browser code can be viewed at Github. Thanks and enjoy the Knowledge !

  7. Mozilla layout/reflow video:
    https://www.youtube.com/watch?v=ZTnIxIA5KGw

    Suggested reading links:
    https://limpet.net/mbrubeck/2014/08/23/toy-layout-engine-4-style.html
    http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/
    https://www.youtube.com/watch?gl=US&v=RVnARGhhs9w
    https://www.chromium.org/developers/design-documents/gpu-accelerated-compositing-in-chrome
    https://www.youtube.com/watch?v=Lpk1dYdo62o

Leave a Reply

Your email address will not be published. Required fields are marked *