Web Development – Computer Science for Business Leaders – July 2016


DAVID MALAN: We’re back
and today we’ll end with a little bit of web development. So based on everyone’s
experience, sounds like some folks have a little
bit of experience with this. So we’ll try to fill in some gaps
and go in any number of directions that you might like. But ultimately give you a sense of
exactly the next layering on top of
infrastructure-as-a-service, platform-as-a-service, and
now software-as-a-service. So a very common thing
for engineers to use is an IDE, integrated
development environment. This is a piece of software
with which they write code. Now, technically speaking,
as we’ll see tomorrow, you don’t need anything more
than Notepad or TextEdit to actually write code. Because most every coding
language these days is text-based. So all you need generally is a
program with which to write it and then a program with which to run it. Cloud9 and in turn CS50 IDE,
which is the web-based tool we use in the class I
teach during the year, is a web-based programming
environment that gives us all of the
requisite tools that we might need for any number of languages. It also gives us a built-in web server,
it will give us a built-in database server, although we
won’t use that today, and it gives us the ability
to actually write code, all within the confines
of a web browser. The alternative to this
would be for me to write a whole bunch of
instructions for how everyone here can install their own
web server, whether it’s Apache or IIS, on their own Mac or PC. And then we’re going to run into
different issues of Windows and Mac OS and all of the litany of
headaches that might happen. With the web, now, we’re just already
up and running and ready to go. So what you’ll see before you is
a screen that looks like this. I’ve maximized mine, so you
might just see a few more icons and buttons around the way. If you don’t see this, call
me over in just a moment. And let me orient you
to what’s going on here. So in the top left-hand corner is your
so-called workspace or file browser. So as we start to create files,
just like in Windows and Mac OS, they’ll start to appear
there on the left. In the top right, you’ll
see a window where we can actually write code ultimately. And I noticed some of you
might have clicked and dragged and pulled something like this up,
notice that the workspace windows can be moved like common software. But go ahead, and in the window
above, see this little plus up here? Go ahead and click that. And you’ll see at least two
options, New File and New Terminal. Click New File and you’ll get a little
tab, like a typical editing program. And that’s where we’re going
to write ultimately some code, but more on that in just a moment. In the bottom right-hand corner,
where you ran update50 a moment ago, update50 is a command that
we, for the course I teach, wrote that sort of
automatically updates students’ workspaces to have the very
latest versions of software. But even though this thing is positioned
as CS50 IDE, at the end of the day, this is designed to be and is actually
representative of a real world development environment
complete with the ability to browse your file system
or files and directories, complete with the ability to
write code in multiple tabs, and complete with the ability
to run your own server. Indeed, this blue window here is
what’s called a terminal window. And it’s giving you command-line
access– text-based access to the underlying operating
system, which in this case is a version of Linux called Ubuntu, 14.04
being the version number. It’s a very popular distribution of a
free operating system called Linux. Moreover, in this environment, you have
what are called super user privileges. So you can write commands like
sudo, which is substitute user do, and actually run commands as
though you’re administrator. We won’t do that and try not to do
that, lest we break things right now. But what’s powerful about this is
that it gives you and in turn students more generally the
ability to do anything they want in a nice sort
of sandbox environment that’s nonetheless representative. Now you wouldn’t typically use Cloud9
in this context with our free accounts to run your web server,
because the way they achieve free accounts is that the
service, when you’re not using it, automatically turns off your container. What you have access to,
here, is a container. So you couldn’t run
a website 24/7 on it, but you could if you actually paid
for the sort of monthly plans as well. Thank you. So what are we now
going to do with this? So we’re not going to type this up here. Let me go ahead and
propose the following. Go ahead and first do this. Back on your own computer’s desktop,
whether it’s Mac OS or Windows, go ahead and open up
TextEdit if you’re on a Mac or open up Notepad.exe
if you’re on Windows. If you’re not sure how
to find those, just call me over or ask someone near you. But just the simplest
stupidest text editing programs on both operating systems. And they’re simple insofar as they
really don’t do all that much. Now just as an aside, odds are if
you’re using TextEdit on a Mac, it’s actually not as
simple as would be ideal. Odds are by default,
you are seeing a window that looks like this with
a whole font thing up top. This is bad, this is going to create
not a simple text file for us. This is going to create RTF or
rich text format, which is actually formatted text. So Mac users only, if you wouldn’t
mind going to TextEdit Preferences and then change the default format
from Rich Text to Plain Text. Otherwise you’ll be saving the
file in the wrong file format. And we won’t use this for
very long; it’s meant to just be demonstrative of the relative
simplicity with which we can start writing web pages. So what is a web page? A web page is written in a language
called HTML, HyperText Markup Language. This is not a programming language. As you’ll see tomorrow, you can’t use
HTML to make the computer do something. You can only use HTML to make the
computer show something, really. And we’ll see what distinguishes
the two by tomorrow. And HTML is this markup language
that works essentially as follows. This is perhaps the simplest
web page that you could write. And by that I mean it has only the
minimally required elements for a page. So this is a page that
apparently is just going to tell the user hello, world. But how does it do that? Well, up top, there is just this
declaration called the document type declaration that frankly you just
kind of have to copy and paste. It’s anomalous, it doesn’t look
like anything else exactly. So you just copy and paste this
to the start of your document. And this signifies to the browser that’s
going to read this file top to bottom, left to right, hey browser,
this is an HTML page written in version 5 of the language. That’s just the symbol for 5, it’s
not as intuitive as would be ideal. But thereafter, what’s nice about
HTML, is everything follows a pattern. So you’ll notice, and I’ll point
to this one as it’s closer, notice the parallelism
between open bracket, this is the angle bracket
HTML close bracket, and then notice the opposite of
it, so to speak here at the bottom. And by opposite I mean if this
is an open tag or a start tag as we’ll call it, this would be
the close tag or the end tag here. Different only insofar as
there is this forward slash in front of the same word HTML. Now, up here is a division of the page
into two sections, the head of the page and the body of the page. These are the only two
sections inside of a web page. The head is really just
the top menu bar space and the body is 99% of the
page, the so-called viewport. Big rectangular region where
actual information is presented. The head of the page, open bracket
head close bracket head, here, for now contains only one child
element, a title element. And you might have guessed, this is
how I specify what the title of my page should be in the menu bar or in
the tab that you see on the page. Meanwhile, the body
of the page is, again, the big rectangular region
with which we’re all familiar. The only words there are going
to be apparently hello, world. So notice that this is kind of
like a tree structure if you will. It’s kind of like a family tree,
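The slide being described is not reproduced in this transcript. A minimal page consistent with the description (a document type declaration, then an html element containing a head with a title and a body with the words hello, world) would likely look like the sketch below; the exact spacing and indentation are assumptions.

```html
<!DOCTYPE html>

<!-- html is the root; head and body are its two children -->
<html>
    <head>
        <!-- shown in the browser's tab or title bar -->
        <title>hello, world</title>
    </head>
    <body>
        <!-- shown in the viewport, the big rectangular region -->
        hello, world
    </body>
</html>
```

The indentation mirrors the family tree: each level of nesting is one generation further from the root, and tags close in the reverse of the order in which they were opened.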
whereby if the root or the patriarch or matriarch of the family
is this HTML element, everything else kind
of descends from that. And we can see that. Can we see that? No, can’t see that here. We can see that if I draw it. We can see that we have
an HTML element that I’ll draw as the root of this family tree. It has two children so to speak. So it has the head on the
left, because it came first. And the body on the right,
because it came second. Then the head, meanwhile,
has a title child. And then the title child has
literally the words hello comma world. The body, now, has how many children? Seems like just one. So I’ll draw it in quotes
because it’s different. Hello comma world. So I offer this just as kind of a
different way of thinking about things. There’s this whole hierarchy or
tree structure to a web page. And that’s why we have the
nice pretty indentation. And how every tag that’s opened
is correspondingly closed. But notice the order, the
first tag to be opened was HTML, ergo the last tag
to be closed should be HTML. There should be this symmetry built in. So what does this do for us? I’m going to go ahead and cut a corner
and just copy this for a moment. And I’m going to go
into TextEdit on my Mac. And I’ll slow down in just a moment. But let me just show
you the revelation here. Let me go ahead and save this. And on my Mac, I’m going to
arbitrarily call it hello.html, where dot html is just a convention
that humans use for web pages typically. And I’m going to click Save. Mac OS is going to be
minorly annoying and yell at me because it wants me to save
it as a text file because it’s text. But no, I know better, it’s HTML. And now let me go to my actual
desktop, where I see this file here. Today’s slides, a screenshot,
and now hello.html. And if I double-click on this,
I’m on the internet it would seem. But not quite, well let’s see. Let me zoom in and what did I
mean by the head of the page? Well, notice where the title
is, it’s up there in the tab. The body of the page happens
to be identically worded, but that’s the body of the page. But I’m not really on the internet yet. I have not made a website. I’ve made a web page,
but it’s not on the web. Where is it? It’s on my desktop. So if I told you all– hey everyone, go
to File colon slash slash slash users jharvard desktop, none
of you are going to be able to see that because it’s
only running on my computer. Now I could run a web server
on my Mac or on your PC, then we could via its
private IP address, share the files here internally. But then, frankly, firewalls on my
computer or yours might get in the way. So it’s just generally annoying. So ironically, going way
outside the bounds of Harvard and using the cloud,
Cloud9 in this case, or CS50 IDE, which is just
our customisation thereof, allows us to actually do
everything publicly instead. So let me go ahead and pause here. And just as a proof of concept and
to get everyone on the same page, go ahead and in Notepad or
in TextEdit on a PC or Mac, respectively, go ahead and just
copy and paste that example or whip it up yourself
or some variant thereof. Save it and double-click it and
make sure it’s working for you. If you haven’t already,
just do File, Save. And save it there, too, as
hello.html or some such file name. So even if you’ve never
used an IDE before, you should find it fairly similar
to most any editing program. And then there is one
step that we’ll need to do together to get this to work. Stop me if I’m going too fast, but
at this point in the story, if you’re following
along, you should have hello.html in a tab, saved as such. And on the top left, now,
you should see hello.html having suddenly appeared
in your file browser. So even though we are sort
of paradoxically on the web right now and using a
web-based app, the web server is not running– our web server. Cloud9’s web server is running,
but not our personal web server. So to make this work, in the little
terminal window, the blue window down below, go ahead
and run a command, it’s a slightly custom
command called apache50. Recall that Apache is the very popular
open-source software, for Linux especially, that runs
many websites out there. In fact, it’s still, I think, the
most popular web server out there. Run apache50, which is our
own customisation of it so that it’s easier to start. And do apache50 space
start space period. And what this command does is it
says, hey Apache, hey web server, start yourself using
the current directory. Dot means the current directory
as the root of the web server. And that’s important
because the hello.html file we just created is in this folder. And insofar as that’s the file we
want to serve up to the internet, we need to go ahead and start
the web server in this folder. So if you hit Enter,
you should hopefully see a successful message followed
by a URL, which you can hover over. And then click and Open. And you should see something
that looks like my screen here. So these are the
contents of the directory in which we started our web server. And so now, if you click on hello.html,
you should see an error message. So what’s– yes? No? You see your file? OK. Let me– does anyone else see forbidden? You see forbidden. All right. So let me do this. Let me go back here. Oh, I know what that– OK. That’s curious that you didn’t see
this, but let’s go ahead and do this. Go ahead and do– in
your terminal window, the blue window, chmod for
change mode plus or sorry– a plus r for all plus read. So let everyone read this
file, which is by design since we want it to be on the internet. And then hello.html. So again, we’re mixing
a graphical interface, which is the tabs and the menu options
and the things with which we’re all familiar and comfortable, with
a command-line interface, which is purely text-based, which is older
school but more powerful and versatile and still very much in vogue
for software development. Nothing should happen
when you run that command unless you see an error message,
in which case it’s probably a typo. But then if you go back to that
other window and click Reload now, you should see it,
albeit somewhat small. OK. And what we’ve done, to be clear, is
we’ve given read permission to all, to everyone in the world. And that now allows our file to be
publicly accessible on the internet. So incredibly underwhelming. So let’s now actually do
something that’s a bit more interesting in the following ways. But before I forge ahead, I know we have
a few folks who just need to catch up. But just call me over during
the next lull if you’d like. Any questions? So that we don’t lose folks. Yeah? You are. OK. Where did this get you? STUDENT: [INAUDIBLE] DAVID MALAN: OK. So click Private. And then CS50 Harvard University. And then Create Workspace,
the green button down below. And let me come back in a moment again,
this will take a moment to create. All right. So this is all very underwhelming, but
let’s do a couple of different things here. Let me go ahead and go into my
web page– the editor, here. And suppose I want to
do the following, let me grab some sample Latin
text for just a moment. I’ll get myself five paragraphs–
just three paragraphs is fine. So this is sort of nonsensical
text and I’m going to go ahead and paste it into my page like this. And you can see it’s not
wrapping because it’s quite long. Can I wrap that? I won’t worry about that for now. OK. So I have three long paragraphs. And I’m going to go
ahead and reload my page. And it looks like one bigger paragraph. What is going on? Well, it turns out HTML
is a markup language. And it’s only going to do
what you tell it to do. And what I have not told it to do is
to give me line breaks or paragraph breaks. So if I actually want these
three things to be paragraphs, it turns out in HTML there’s a
Paragraph tag or p tag to be succinct. And if I open the Paragraph
tag like this, here. And I then close it, here. And then I open it again, here. Notice that the editor is actually
trying to be a little too helpful. The moment it notices that I
have opened a tag so to speak, it tries to be helpful
by closing it for me. But of course, I need to now move
that to be actually below the text so that I keep everything
nicely hierarchical as intended. So now it’s gotten more
verbose, but HTML is kind of like, if you remember many, many
years ago, WordPerfect, before there was a way to make things
bold and italics and WYSIWYG editors, What You See Is What You Get, you
would actually have to be emphatic and say make this bold, make this
underline, make this italics. So now that I’ve done
that, if I save and reload, still pretty ugly but at
least now I have the semblance of three actual paragraphs. So let me go ahead and rewind for just
a moment, that’s all fine and good. What about something like a list
of my three favorite things? So I’m going to give myself
an ordered list with ol. So you can tell the
authors of HTML tried to be very succinct if cryptic sometimes. Then I want a list item and
I will say my favorite things are– I don’t know, like movies,
and then li TV, and then how about books, number three. So if I save that, how
do you think this is going to render, based on other things
you’ve seen on the internet already? STUDENT: [INAUDIBLE] DAVID MALAN: Yeah, numerical
list of things, let me reload. And indeed I get this automatically. Still pretty simple, but it’s sort
of adding the logical structure of numbering for me. If I didn’t like that
and I actually love reading equally with
everything else, I can change ol to ul, for unordered list, and reload, and now I get the familiar bulleted list. And there’s other ways to stylize
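The two list variants can be sketched as follows. The three items come from the talk; showing both lists side by side on one page is an assumption made for comparison, since the demonstration edited a single list in place.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>hello, world</title>
    </head>
    <body>
        <!-- ol: ordered list, which the browser numbers 1, 2, 3 -->
        <ol>
            <li>movies</li>
            <li>TV</li>
            <li>books</li>
        </ol>

        <!-- ul: unordered list, rendered with bullets instead -->
        <ul>
            <li>movies</li>
            <li>TV</li>
            <li>books</li>
        </ul>
    </body>
</html>
```

Note that the numbers are never typed by the author; the browser generates them from the structure.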
this that we’ll come back to in just a moment. This, of course, is not really what
the web is great for– static text, but the whole point of the web
and HyperText Markup Language was to have hypertext,
text that links elsewhere. So why don’t I greet the user with
my favorite website is say a href– and what am I doing here? So notice the following. Actually, let me do this. I got ahead of myself– www.harvard.edu. So all of us have been very
acclimated by websites to the idea that any time you type a URL in certain websites,
it automatically becomes clickable. And this is a feature of
modern browsers or really of modern websites using a language called
JavaScript, more on that tomorrow. But this, of course, is not a link. Like nothing is actually happening here. I need to, again, tell the
browser to be emphatic. So I want to say a for
anchor, which means link. href, which means hypertext reference or
what is the URL I want to link to. And now I want to go to harvard.edu. Close quote, close bracket,
and now Harvard’s website. So notice here, there’s some
fundamentally new syntax. The open bracket a says give me a link,
href modifies the behavior of this tag. So it turns out that whereas a
is a tag, href is an attribute. And an attribute just modifies
the behavior of that tag. And in this case, and
you would only know this from having been told by someone
or by reading the documentation. href controls the destination
of that hyperlink. And then notice, here, that
I’m still closing the tag. So what is this going to look like
visually on a web page once I reload? What am I going to see, literally? OK. Yeah, Harvard’s website. My favorite website
is Harvard’s website. I’m not going to see the URL. But it’s going to work– reload. And now we get the old school but
familiar blue underlined hyperlink. And if I scroll down–
if I hover over it, it’s really small,
especially on the screen. What do you see in the bottom
left-hand corner of either screen? The actual destination. So this is kind of a juicy moment to
mention a potential security concern. How many of you have
ever been phished before? P-H-I-S-H-E-D, which means you’ve
received an email purporting to be from usually like PayPal or eBay or
your bank or something like that, that’s actually just trying to phish
for your username and password. Most everyone– I mean it probably
ends up in your spam folder these days, because the mail providers
have gotten pretty good about this. But what feature of HTML do
these phishers take advantage of? Well, they might actually
do something like this. They could have this be– let’s
call it HTTP bad guy website phish– I’m trying to pick a domain that doesn’t
actually exist and creep everyone out. So bad guy phishing website– OK. That probably doesn’t exist,
but don’t visit it just in case. And if I reload, the text, of
course, still seems the same. And if you hover over it, of course,
you see the malicious destination. But more maliciously, as is the case in
a lot of these phishing attacks, what if I do this? And deliberately type it to be
dichotomous with the actual URL. And if I reload now,
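To make the trick concrete, here is a hedged sketch of such a link. Both addresses are illustrative assumptions: the destination uses the reserved .example domain so that it cannot resolve to a real site, and the visible text is simply a plausible-looking address chosen for the demonstration.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>hello, world</title>
    </head>
    <body>
        <!-- The visible text looks like a trustworthy URL, but the browser
             follows the href, which points somewhere else entirely. -->
        Please verify your account at
        <a href="http://badguyphishingwebsite.example/">https://www.paypal.com/</a>.
    </body>
</html>
```

Hovering over the link and reading the address in the bottom left-hand corner of the browser is the only visual cue that the text and the destination disagree.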
most of us are probably, given a link like this,
going to click it. I mean most of us are not so uptight
as to first hover over every link we’re about to visit, look at what it is
in the bottom left-hand corner, and then proceed to click. So how do these phishing attacks work? What are they trying to get you to do? This is the how? What’s the why? Go to their website. But why is that useful? So you log in. And in fact, you know
what, let me try something. Let me go to– let’s say a bank that’s
in Harvard Square, BankofAmerica.com. Here’s their website. Recall from earlier that you can go in
Chrome to View, Developer, View Source, and IE and Firefox all have the same. Here is their HTML,
it’s a little cryptic, it’s longer certainly
than the pages we ran. That’s crazy long. I’m just going to go ahead and
copy it and paste it into Cloud9, it’s all 1,390 lines of it. Save that in hello.html. And now reload the page. And notice my URL, I’m
on Cloud9’s c9users.io. So this is certainly not
Bank of America’s site. Let me go ahead and reload. Woo. I have reimplemented
Bank of America’s website. But fortunately, you know, actually I’m
stealing material from tomorrow now. You know it’s secure, because
look at that padlock icon there. But what does that mean? Notice my URL. Actually, ironically, it is secure. But my connection is secure to
c9users.io, not Bank of America. So what does this mean? No, it just means someone
who works for Bank of America knows how to make icons
that look like padlocks. I mean, it literally is as simple as that. So there’s this whole
can of worms, I’ll try to defer this till tomorrow
because it’s kind of a rabbit hole of interesting frightening topics. But it means nothing. And a phishing attack, really
is someone, who spent what? All of 30 seconds copying
someone else’s website, trying to trick you into going to it. But the takeaway here isn’t
so much that specific example, but just how HTML works. And how you can so quickly, after
like 5 minutes of the language, start to abuse it in
a way that leverages what we call social engineering. These aren’t technological
attacks really as they are human social attacks. All right, so let’s do something else. The internet, of course,
is filled with cats. So if we Google for a cat,
let’s just grab– that’s cute, a little image like this. And now, notice, this file, by nature
of being on the internet, has a URL. And so let me go ahead and if this were
my own cat’s URL on my own web server and I want to embed it, let
me go ahead and do this. Instead of my favorite
website, my favorite cat is image source equals
quote unquote that URL. And then, here, I’m going
to do alt picture of a cat. An underappreciated feature of
HTML that more websites should be sensitive to is alternative text
for folks who are without sight or rely on screen readers for
recitation of what a site is. Of course, if someone’s blind
and they can’t see this image, it would be nice, certainly, if the
computer could tell them what it is. And so simply by providing
alternative text, a picture of a cat, you can go part way toward actually
helping the user follow along with what’s on your page. And if I now hit reload, here, my
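The image markup described here can be sketched as follows. The src address is a placeholder assumption, since the transcript does not give the URL of the image that was found; the alt text is the one spoken in the talk.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>hello, world</title>
    </head>
    <body>
        <!-- src is where the browser fetches the image from; alt is the
             text a screen reader recites in place of the picture. -->
        My favorite cat is
        <img alt="picture of a cat" src="https://example.com/cat.jpg">
    </body>
</html>
```

Unlike a or p, the img element has no close tag, because there is no text content for it to wrap.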
favorite cat is this thing here. And it turns out there are going to be
mechanisms that allow us to scale this appropriately, although
we could just open it in Photoshop or some other program
and actually integrate it better into the website. But the point is, it’s these very
simple building blocks with which the entirety of the web is made. We can do other things
like if for whatever reason I want to emphasize cat, we can style
that and other text differently. And actually is there– let’s see. Let’s start to do exactly that. Let me go ahead and grab
some Latin text again, just so we have some
actual text to play with. And grab two paragraphs of this again. This is just a popular way of generating
sample text with which to play. Let me go ahead and
paste it into the site, give myself a couple of paragraphs. Let me go ahead and fix this real fast. Close the paragraph. Whoops. Close the paragraph. Open the Paragraph. Delete this. OK. So here’s where we’re at now. We’re back to just a couple
of simple paragraphs. And let’s suppose I want
to change the font of this. As best I can tell, this is like Times
New Roman in 14 points or something like that, whatever the
website’s default is. Well, turns out that
back in the day, you would actually do something like color
equals red or something like that. But the world eventually realized
that mixing your data with metadata, specifically mixing your data
with the presentation thereof, is generally bad practice. Certainly these days, because it makes
it much harder to maintain your website and in turn your data long-term. It makes it much harder to change
the aesthetics of your website over time, if you want to do
a refresh, change the colors, change the iconography. And so the world has gotten
a lot better at factoring out anything related to aesthetics
to a separate language called not HTML, but CSS, cascading style sheets. And unfortunately, the syntax
for this is a little different. But it still follows
some simple patterns. If I go into the head of my web
page and introduce a style tag, notice that I can do the following. I can specify that you know what,
I want every paragraph in my page to have the following properties. And notice the new syntax, here,
where we have curly braces like this. And I want to go ahead
and say, you know what, make the font size– oh, it’s too big. Let’s make it 11 point. And the color is just a little annoying. Let’s go ahead and make
it a nice shade of blue. So with CSS, you’ve now kind of seen the
entirety of its grammar so to speak, although there are a few other
features: you have key-value pairs. The key is a word like color or
font size, the value of those respectively is blue or 11 point. You would only know what
the valid keys and values are by reading the
documentation or taking a class or reading a book or whatnot. But if I now reload
this, the effect is going to be to make all paragraph tags’
content match these properties. So if I go to Hello World and
click Reload, it’s not very pretty but the text is indeed a little
smaller and it’s much bluer. And so I’ve achieved that effect. If instead I want to do something
different, suppose that font family, I really don’t like the
old school serif approach, here, I want it to look a little
more modern, a little fresher, I can change the family
to be sans serif. And if you notice, this
is the before, after, and now we’ve changed the font entirely. And we’ve just scratched the
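A sketch of the style tag built up over the last few steps, with the three properties mentioned in the talk (font size, color, font family). The placement of the braces and the order of the properties are assumptions; browsers do not care about either.

```html
<!DOCTYPE html>

<html>
    <head>
        <style>
            /* Applies to every p element on the page. Each line is one
               key-value pair: a property name, a colon, and a value. */
            p
            {
                color: blue;
                font-family: sans-serif;
                font-size: 11pt;
            }
        </style>
        <title>hello, world</title>
    </head>
    <body>
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        </p>
        <p>
            Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
        </p>
    </body>
</html>
```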
surface, here, of what we can do. But ultimately, the paradigm,
now, is that we have the ability to separate the aesthetics of
our page, the stylistic decisions from the content, but
there’s still a problem. So notice that we’re still
inside the same file, hello.html. And it turns out that even though yes,
there is this style tag right here– it turns out that’s not best practice. Best practice would have
us put it not up here, but instead use the tag as follows. A link tag, which, annoyingly
named, is not a hyperlink, it just links to another resource. That resource, in this case, is a file
called styles.css, which I’ll stipulate is just a file containing a
whole bunch of key-value pairs, a bunch of properties as we just saw. And then the relationship, here, is
that it’s a style sheet, which you just have to copy and paste. So I’ve removed from
the file, apparently, per this yellow highlight,
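The slide with the link tag is not reproduced in the transcript. A page along the lines described would probably look like this sketch, assuming styles.css sits in the same folder as hello.html; the contents of that file are shown in a comment and are an assumption based on the properties used earlier in the talk.

```html
<!DOCTYPE html>

<html>
    <head>
        <!-- rel says what the linked resource is; href says where it is. -->
        <link href="styles.css" rel="stylesheet">
        <title>hello, world</title>
    </head>
    <body>
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        </p>
    </body>
</html>

<!--
    styles.css would then contain only the rules, with no style tag:

    p
    {
        color: blue;
        font-family: sans-serif;
        font-size: 11pt;
    }
-->
```

Every other page on the site can carry the same one-line link tag, so a change to styles.css restyles all of them at once.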
all of the properties and moved them into a separate file. Why might it be good to do that? STUDENT: [INAUDIBLE] DAVID MALAN: Or was that just a stretch? STUDENT: [INAUDIBLE] DAVID MALAN: OK. Exactly, you can factor
out the different jobs. So one of you can focus
on the actual content that you want to display, the images,
the text, the sort of business or the products you’re trying to sell. And then someone else, who’s
perhaps more artistic, and better than you at that can
actually do the refinements. What does the text look like? Where is it placed? Where to get all the aesthetics? So that makes sense. What else might be compelling about
factoring out CSS from the style tag in the page to a
separate file like this? Yeah, Vanessa. STUDENT: [INAUDIBLE] DAVID MALAN: Exactly,
if you’re– right now, we’re assuming naively we have
just one page, hello.html. If you have 2 pages,
10 pages, 1,000 pages– how else are you going to make all
of the text blue or all of the text sans serif? You don’t want to have to
copy and paste that same block in every one of those files,
if only because, God forbid, you want to change the aesthetics
of the site tomorrow or in a year, now you have to go through and change
2 or 10 or 1,000 pages separately. Much better to factor that out,
put it in one central place. What about more technically? If you were a browser, why might
you prefer, too– or even a user, why might you, too, prefer that the CSS
be factored out into the separate file styles.CSS? Vanessa? Easier to read. A little bit. STUDENT: [INAUDIBLE] DAVID MALAN: That’s fair. So the browser definitely
won’t care, because it’s just going to read it as text, top to
bottom, left to right, no matter how messy it is. And a user, yeah, that’s nice. But I don’t really care
as the business owner about making my source code so to
speak easier for humans to read. After all, I don’t want
them copying and pasting it even easier like Bank of America. Oh, OK. Sure, then that’s fair. Factoring out into some
central, more readable place. Why might a browser or an
end user actually benefit from factoring out your
CSS into a file like this? STUDENT: [INAUDIBLE] DAVID MALAN: Runs it one time or more
specifically downloads it one time. If you have a user visiting this
page and that page and that page, and the content is, of
course, changing, after all that’s why they’re visiting different
web pages, to see different things. But the stylization is global in
the sense that all of those pages are including this same file. The upside of that is that
especially if it’s a really big file, the web browser only has to
download it once and do what with it? It’s a book. Copy. Or just to borrow the more technical
term from earlier, to actually cache it for some amount of time. Now, caching, we know
can work against you because if then the browser
remembers it longer than you intend it might actually backfire. But at least if you’re not
changing the CSS that often or if the browser only caches
it for a few minutes or hours, it can certainly help,
especially on devices like this. What’s frightening these days–
let’s go ahead and do this real fast. If I go to Chrome and go to
View Developer, but not Source, but Developer Tools. Most browsers these days have
fairly arcane features like this built in, whereby if I
click on the Network tab, I can actually see all of the HTTP
requests that Chrome is about to make. So let me go somewhere– whoops. Let’s see, let’s go somewhere
like CNN by moving this up. Come on, come on. No, come on. OK. Let me move this all the way up. Let me go to CNN.com, Enter. And notice that just visiting
CNN’s web page– dear God. OK, it’s even worse than
the last time I did this. How many HTTP requests did my browser
just make in order to visit CNN.com? 324 requests, atrociously. Each of which– oh, now it’s up to 325. Each of which represents
apparently a JPEG or PNG, which are image file formats, which is
not unreasonable for like a news site. Some of these are script files,
JavaScript, which we discussed. GIF, which is an image format. GIF, GIF, GIF, GIF. Script, Script, Script, Script. I mean– my God, this is
actually just remarkable. Wow! A lot of this frankly
is advertising, too. Wow! All right. So, OK. Why is this bad? Never mind the content from CNN,
but why is this technologically bad? Yeah. STUDENT: [INAUDIBLE] DAVID MALAN: Consumer bandwidth,
and it’s not just bandwidth because at the end of the day– well,
it’s pretty big, it’s 3.8 megabytes. But it turns out downloading one
3.8 megabyte image would probably reach me faster than 332 files,
now, that collectively represent 3.8 megabytes. So it’s not so much the
bandwidth that’s concerning, but it’s another measure of sort
of speed that users experience. And the word has come up
a couple times already. Late– Latency. OK, good. OK, latency, which is different from bandwidth. Latency,
especially if you’ve ever used like YouTube or
Netflix or Hulu or the like, is that delay from
when you visit a video and it takes like a second or
five seconds to start playing. But then it looks beautiful because you
have good bandwidth but bad latency. By contrast, if you had
good latency, the video might start streaming instantly but
very suddenly get very pixelated or hang or buffer and that’s
because you have bad bandwidth. So latency describes the
amount of time it takes for a response to start coming back. And for the browser to be
doing this, what’s happening? Well, recall from my simple
example earlier of the cat, that an HTML file can, inside of it,
reference other files or other URLs. A browser is designed, upon reading
a web page, to look inside of it, looking for all of the images,
all of the movie files, audio files, anything that’s
mentioned inside of it, it’s designed to go fetch all of those
URLs as well one at a time or a few at a time. So the result is that CNN’s web page,
index.html as it might be called, itself mentions all of
these other darn files. So we are inducing, by visiting
CNN.com, 335 separate HTTP requests, some of which might
be parallelized to be fair, but that’s 300 requests. Each of which might have
200 milliseconds of latency and actually you can see how
long each one of these takes. It’s all between 0
milliseconds and 150 or 200. And imagine doing the
same on your phone. So phones have even less bandwidth
and it’s often higher latency, so this is not necessarily a great
user experience on the phone. So how might websites
mitigate this concern? Like I feel like having my
phone download 335 files is not very good for business
for making me want to come back. They have a mobile version. So you can detect with
high probability if a user is coming from a mobile device. How is that? How would you know? Oh. STUDENT: [INAUDIBLE] DAVID MALAN: Yeah, the browser
should tell the website and indeed all browsers do. In fact, if I scroll
back up in time and go to the very first request for
www.CNN.com and I click on headers, this is fairly arcane
information that would now be found inside of those
virtual envelopes we were discussing this morning. And if I zoom in on
this, notice here, these are exactly the headers so to speak,
the text that my browser put inside of that virtual envelope. Odds are the first two lines look
familiar based on my quick example earlier, when I manually typed it in. But what is minorly
revealing about yourself? What’s the takeaway here,
in terms of privacy perhaps? Or curiosity? What else is my browser
presumptuously telling CNN? Yeah, it’s telling CNN that I own a Mac
running version Mac OS 10.11.2 no less, which is oddly precise. What browser– actually this
is historically confusing. But what browser am I apparently using? Chrome. So that’s mildly
interesting information, especially since it turns out web
development is still kind of a headache even all these years later after
its inception, since so many of the manufacturers, Google
and Microsoft and Mozilla, all can’t agree on all of
the implementation details. So one of the frustrations
in fact of web development is you might design something on
Chrome, looks amazing on your Mac, looks awful or broken or somehow different
on a PC or on Firefox or on Safari or on IE or Edge or any–
I mean it’s the biggest nightmare that libraries, more on
those tomorrow, are helping with. Because you have other people
figure out all these headaches and you build on top of their software. So that’s mildly disconcerting. I also told CNN, already,
my IP address, because that had to be on the envelope for
the response to come back. So there’s a decent amount of
information being leaked here that CNN is inferring. But they can at least use that
as a feature to realize oh, by way of a different user agent,
that’s what this header is called, they can infer if I’m on a mobile
device or an Android device or an iPhone device, which can either
be used for statistical purposes or to actually decide
what kind of data to send or to request once you
visit that first web page. All right. So with that said, what does
this allow us to do ultimately when we have tags like this? So ultimately, we’re going to be able
to do things a little more efficiently by factoring out. This would be an example of
best practices so to speak. But we’ve really only just
scratched the surface of HTML. But I think it’s the kind
of language, frankly, where if that’s the extent
of our formal instruction, here is our– here’s a tag,
here’s what an attribute is, and here’s how to find more information. What I thought I’d propose, so that we
can get everyone back on the same page, give you an opportunity to
get your hands a little dirty, is propose a few different problems. Almost all of which can be
solved with Google or Bing or your favorite search engine. Or by me whispering in your ear
or offering a little bit of tips. So I wanted to propose, I’ll turn
on some music, I’ll wander about. And let me propose that you tackle
one or two of these problems, trying to bring to bear the very basic
conceptual ingredients we provided but also Google. So literally, it is totally
acceptable to type in "web page how make text bigger" or "CSS change
color of text" or the like. Totally fine. And indeed, you’ll find that
this is how many developers early on are self-taught. Once they understand the
conceptual framework, it’s so much easier to bootstrap
yourself to understanding and applying yet new techniques. So I’ll turn on some music,
tackle one or two of these, I’ll wander around
fielding some questions. Yeah, Griff. STUDENT: When you’re creating
a separate style sheet, are you just creating a new
tab at the top of the file? DAVID MALAN: Ah, yes. Correct. I did omit that part. But yes, you would simply in the
IDE, create a new tab and new file. And essentially repeat the steps from
before but call this something.css, styles.css. And do remember that
chmod command again. This time for this
filename, where again, you’re changing the mode of the file to
give everyone read permission on styles.css for instance. You only have to run that once. Because by default for security
sake, when you create files, they are typically viewable only by you. All right, feel free
to continue tinkering. But I thought I’d try to tie everything
together for our final segment here. This is, of course, Google. And let me go ahead and see. Google is constantly changing some of
this stuff– even more from Google. Is this going to work? OK. Let me go ahead and do
the following in Google. I’m going to go ahead and
search for cats again. Come on, search for cat. Oh. Oh, here we go. Google.com and cats. Enter. And now notice what happens
when I search for cats. So the URL changes from
just www.google.com to slash search question mark num equals 20
site equals something source equals something– oh, q equals cats. And in fact, let me go ahead and just
delete all of the visual clutter here. And whittle this URL down to just
this canonical form, if you will. And hit Enter, and there’s
apparently no difference. Which is to say, it seems that we can
distill Google’s functionality down to its essence in terms
of its URL and kind of tie this morning’s conversation together
with this afternoon’s and now our HTML focus to figure
out how this exactly works. So let’s go ahead and use
this tool that developers would super often use these days. Going to Developer Tools,
down here going for instance, to the Network tab as I did before. I’m going to click Preserve Log. And I’m going to go ahead
and hit Reload because I want to see exactly what happens when
I visit or search for cats on Google. So a little more modestly
that results in 57 HTTP requests, which at least
is decently fewer than earlier. But let me look at the headers here. Rather, let me look at this top part. So this is the URL that I just requested
and the request method is something we didn’t really talk about explicitly
before, but I used this keyword GET. So GET being the operative verb
when I made that verbal request earlier for something of Google. Status code is 200 and this
is a code you don’t often see. But it turns out, when
you visit websites there is a number that you perhaps
annoyingly occasionally see. Yeah, 404, perhaps the biggest. 404 is what? Error more specifically, file not found. And indeed, there’s
this whole laundry list of HTTP status codes, and the official
list– not on Wikipedia– is here. And let me scroll down, it turns
out that you rarely hear about 200 because it means everything is OK. But, indeed, when you visit a web
page by default, if all is well, you’re getting back in the
virtual envelope from the server, as, say, Sean did for me with
the cat, a 200 message saying OK, here is the satisfaction
of your request. Less common would be 201, 202. But if I scroll down here
to the 400s, 400s are bad. Literally, bad request, if the browser
or client has malformed its request. Unauthorized would mean there’s some
kind of authentication required. 402, you don’t really
see payment required. Apparently future use has
been ongoing for some time. Forbidden means, like some of you saw,
like me, you had to chmod your files to make them world viewable. And then here’s the
famous 404 not found. So just years ago,
people decided somewhat arbitrarily that 404
shall henceforth mean the file is not found on the server. And thus was born millions of
mistakes that we all see later. Worse is– well this
one’s kind of funny, gone. Like the page you’re
looking for is gone, although you won’t typically see that. 500, internal server
error, you might see if you’re developing
a web-based business and you or your engineering
team is prone to mistakes. Sometimes this means there’s
a particularly bad problem with the code on the backend.
With load balancers these days, not uncommon is to occasionally
see 503 service unavailable, which typically means the load
balancer is responding to you but no back-end servers
are actually available, they’re overloaded or offline
or broken for some reason. And then another juicy one, that’s worth
noting now because it ties together our chat earlier, is moved permanently. This web page has moved permanently. Or found, the implication
of which is that it actually wants you to go elsewhere. So let me do this. Let me go to this little
text-based program, Telnet, and let me try going
to Google.com port 80. Let me get their home
page using version 1.1 of HTTP, which I’m pretending to speak. And the host shall be Google.com. Enter. Notice what actually comes back if I
visit not www.Google.com, which I did do earlier but just Google.com. A very, very small web page that
literally says the document has moved. And most humans don’t even see that
because their browsers understand HTTP, specifically status codes. And what’s the status code
apparently that came back here? It’s 301, moved permanently. What does that mean? Well, if you look a little lower,
it includes a location header, which we’ve not seen before even
though we’ve been experiencing it with some of today’s demos. Google wants me to go
to which URL instead? Yeah, so they apparently
want me to go to www instead. So let me try that. Let me instead go to–
sorry, let me do this again. To www, Enter. GET / HTTP/1.1,
Host: www.Google.com, Enter. And now it actually comes back. So Google wants me to
go to www.Google.com. But both work apparently, it’s
not like the site is unresponsive like Harvard’s was years
ago as I mentioned. So why does Google want me to
standardize on www.Google.com? STUDENT: [INAUDIBLE] DAVID MALAN: Yeah, force me
to go to one central place so that they don’t have
to maintain two websites. Sure, why else? Why did I tell you to go to
CS50.io as opposed to www.CS50.io? Or why do you go to bit.ly instead
of www.bit.ly for URL shortening? It’s a popular URL shortening service. Easier and more to the point there. Shorter, not to have to type it. And so indeed, it’s just this human
convention that we have long had these subdomains, so to speak, of www,
or not even subdomains, host names, www.something.com. It was all the more
of a visual cue that the user should be going there. It’s a web page, but it’s
slightly more verbose. And also for technical
reasons, turns out there’s these things called
cookies– more on those tomorrow. And cookies are actually scoped
to the domain name in question. So it turns out, especially
for bigger companies, if you want to have multiple
web applications, all of them in something.com, you can give
them each their own cookies based on what that initial host name is. So you want to force your user
to go to some host name, not just your root domain name
so that you actually have a bit more control over
what they’re actually receiving. But we started this conversation
by looking at Google, here, whose queries ended
up looking like this. So it turns out, you know what, I
bet we could re-implement Google. And we don’t even have to do it with
Cloud9, but I’ll do it over here. You could do this on
your desktop as well. Let me go ahead and whittle this down
to be called my version of Google. And then down here, let me get
rid of the fake Latin text. And then here, let me introduce
a new technique, a form tag, the action of which– we’ll
come back to that actually. Form tag that has an input
whose type shall equal text. And whose name is
going to be oh, say, q. And then another input,
whose type is submit, whose value is use my version of Google. Close quote, close tag, Save. Excuse me, it’s not complete yet. But let me reload this page. Interesting. It’s not all that sexy, either. So let me add a logo here. My version of Google, which
notice is using an H1 tag, heading one, which just
means big and bold. Reload. OK. It’s getting a little better. And frankly, Google 1999,
it’s not all that far off now from what Google was if you
want to relive the 1990s there. All right, we need a
little bit of color, but it’s not all that dissimilar to
what it is now ironically or remarkably. So let’s just keep it here, we
won’t worry about aesthetics. I want to go ahead and search for cats. Whoops! Not cats, but cats. Huh. It doesn’t work, but notice here, it
did automatically append a question mark and a q equals cats as soon as I search. But I don’t want the user to go to
my page because I have no database, I have no search results. What could I do? It turns out that I could go in here
and add an action whose value is https://www.google.com/search
and just to be clear, I’m going to use a method of Get
as opposed to something else, which we’ll come back to called Post. Get, now, if I save
this, reload my page. And type cats, notice the URL
from which I go and to which I go. I seem to have re-implemented Google. I’m sort of– I’ve implemented
the front-end so to speak, Google has implemented
a much harder backend. But notice, it even works for dogs. And notice it’s pre-populating
this field up here. So what’s actually going on? Well, if we actually look
at this HTTP request, let me go to search for cats again. But let me go ahead and open
up that same developer toolbar that I showed earlier. And preserve the logs so we can
see everything that happens. And now I’m going to click
Use my version of Google. Let’s see what actually goes through. Here, in Google’s case, notice that
I’m requesting this URL, q equals cats. Method is GET, status 200, but where
did that request come from? Whereas before it came
from Google’s own server. Well, their server was
just a web page that I happened to visit at www.Google.com,
so I filled it out and submitted it. But there’s nothing stopping me from
filling out or creating my own web form whose action, so to speak,
goes elsewhere for fulfillment. That’s not typically the
common case but it does rather tie all of these various
ingredients together. So what does this mean
now for the alternatives? There’s Get and then lastly,
there’s this thing called Post. When might you not want to use Get,
and why, based only on the example we’ve seen here. STUDENT: [INAUDIBLE] DAVID MALAN: OK. So if you just want one
result, you might not use Get. You could still use Get for that. And in fact, actually we can–
let’s see, if we can distill this. If we go to google.com, you’ve reminded
me of the I’m feeling lucky button. If you do search, click
on I’m feeling lucky. Let me inspect this so we can sniff
our own network traffic so to speak. Click I’m feeling lucky. Let’s see what’s slightly different. We went to the top hit, which
apparently is Wikipedia. And notice the one thing
that’s different here besides all of this nonsense
is site source– is it this? No. What is the I’m feeling lucky icon? Ah. Let’s see if we can
figure this out real fast. btnI, button I. OK, I think it might be this. Let me secretly make one
quick change– or not so secretly, while everyone watches. Input, type equals
hidden, value equals 1. And then the name of that
field shall equal this. So I’m guessing this could backfire. I’m guessing that the means by
which Google has implemented their I’m feeling
lucky button is so long as the web form submits a name of
button i or button 1 with a value of 1. My hunch is that’s going to take us
immediately to the first search result. So let’s try that. Let me go back to my version of this. Click Reload, to refresh the form
and search this time for giraffes. Whoops, that’s not giraffes. Giraffes, just to prove
that this is different. Enter. Yeah, this is the top hit
for giraffes on google.com. So what am I playing with here? I’m playing with HTTP parameters. And indeed, this is what
ultimately drives the web. There’s all the silly
aesthetics of HTML, of hyperlinks and images and bold
facing and blue text and sans serif and the like. But none of that is functional per
se, it’s all aesthetic, a markup. With HTTP though, the protocol that
web browsers and servers speak, we have the ability to pass input from
browser, and in turn human, to server. And the simplest of
schemes is used to do that. Literally, what a browser
does to send input is to send a value like– a name
like q with a value like cat. And if there is a second
argument or input to provide, there is literally an ampersand
and then you say and– what was it? btnI equals 1. And then another ampersand if you have a
third key-value pair or HTTP parameter, and that is how the entire
web works with web forms. If you are using Facebook Messenger
and you fill out the little box to send a message and hit
Enter, what are you doing? You are sending the
equivalent of a message like this, that probably
has not cats for q, but whatever message you typed in. And it’s probably not
called q for query, it’s probably called m
for message or whatever Facebook has decided to call it. But these very simple
basic paradigms are the entirety of what drive the
web and these basic interactions. But sometimes you might not
want to see this in the URL. So to come back to the earlier question,
when might you not want to use Get, whereby Get means that this kind
of stuff ends up in the URL? Indeed, I keep showing in the
browser, precisely the results of clicking Submit. When would you not want your query or
your human input to appear in the URL, would you say? Yeah. Typing in a password. Don’t want it to be
there, why, to be clear? STUDENT: [INAUDIBLE] DAVID MALAN: But I mean suppose it
was just you on your own computer. Let me push back harder, still bad, why? STUDENT: [INAUDIBLE] DAVID MALAN: Yeah,
very reasonable, right? Especially since these days there’s
always this notion of auto complete or a history in your browser. And when it gets stored
in the URL it’s being saved for your later
convenience presumably or for your sibling or significant
other or parents or kids or whoever, prying eyes as well. So you might not want to do that. What else besides passwords
might be sensitive? Usernames. Usernames. Yeah, a little less sensitive
but maybe not something you want to reveal for your accounts. Credit card information, I mean
anything that’s mildly personal, you probably don’t want it
appearing in the URL insofar as it ends up in the browser’s cache and
history and autocomplete and so forth. In fact, why do some people use–
well, let’s be more technical. Technically, why do people use
incognito mode or private mode browsing? Well, what does it not do? No caches. Yeah, no caching, no cookies
and more on cookies tomorrow. So by using incognito mode or private
mode in your browser, among the things it does is it doesn’t save the
autocomplete and the history. So if you did have the misfortune
of going to a website that foolishly is putting your credit card
and email address and user name and such in the URL such that it
gets saved, at least incognito mode, when you close it, throws
that information away. Now in practice, most people use it not
for that defense because this is not a common problem, but
really because they don’t want websites they’re visiting
and other such things to end up at all in their history as well. But the same would hold true for
credit card numbers or the like. So I can think of at least a
few other things like YouTube, like you upload videos to YouTube. How in the world do you
put a video in the URL? Or how do you put a picture in a URL? All right, so there are kind
of some non-obvious problems, too, with non-obvious solutions
that arise if we’re using Get and in turn the URL alone. And so that too might kind of bite us. And indeed, there’s no official
limit on the length of URLs. But there is sort of a realistic limit
of like 1,000 characters, 2,000 characters, unfortunately it
totally differs by browser. So thankfully, there exists
an alternative to Get called not Get but
Post that functionally does the same kind of thing, it
still has key-value pairs and equal signs and ampersands. But it doesn’t put it in the
URL, it instead so to speak puts it deeper into
the virtual envelope. So it’s still there, but it’s not
exposed in the browser’s address bar in this way. And it lets us upload gigabyte
video files, megabyte image files, not to mention our email
addresses and passwords and credit card numbers and the like. And to enable that you simply
do this, change it to Post. Unfortunately the server
has to support it. So if I now search for
cats, again, using Post, unfortunately Google
just doesn’t cooperate because for whatever technical or policy
reason they don’t want to support Post. And it’s probably just because it’s an
unnecessary feature on their servers. And in the case of search
results, they probably want people linking to them
for advertising reasons, for deep linking reasons. So in fact, one of the most
compelling reasons to use Get is when you do want stuff
to end up in the URL. If I highlight this URL
and paste it into an email, I want you to see the
exact same search results. And there’s nothing
more annoying– and this is common to big businesses, who
have pretty shoddy websites, where there’s no state maintained in the URL. So if you’re shopping on some website
trying to buy something and the state, the unique product
identifier is not in the URL simply because the website has been
using Post or some other mechanism, there’s no way to deep link to
that page that you’re seeing. And so the effect of this
is you copy something, you paste it into an email or instant
message, someone clicks on it, I just see the home page or I’m
not seeing what you’re seeing. And this is all too common
with big e-commerce sites, not Amazon, but sort of less
trendy ones that haven’t really given this much thought or care. Woo! That was a lot. Any questions? No? All right. So where are we going with all of this? So today was meant to start–
well, we started pretty low so to speak with binary and we
sort of built our way back up. And then we reset ourselves
looking at how the web works, an incarnation of all
of these ideas that have been– on top of which
we’ve built to get to this point. Where do we go tomorrow? So in the morning, we’ll take a look
at the general notion of privacy, security, and as it relates to society. Drawing upon a couple of
current events from a few months ago like the FBI issues with
Apple and how that played out and what the underlying questions
were, the general idea of encryption and how that works and what it’s
good for and what risks it still presents to you. A case study like something like Dropbox
or SkyDrive or OneDrive or any number of incarnations of web-based
storage for consumers. Looking then a little
bit about programming. Talking about some of the basic building
blocks, some of the most common data structures and algorithms
that people use that you might typically learn in an
introductory computer science course. And the kinds of ingredients
you bring in a design meeting when you’re trying to design some
piece of software thinking through, say on a white board, how is
it we’re going to build this or how are we going to
build this efficiently, how are we going to do our analytics
in a more efficient way than just throwing it all in a database
and just searching in an obvious way, actually engineering
non-obvious solutions. Technology stacks, which is just kind
of a generic way of describing types of software and work flows and
design patterns so to speak, with which you can build something
sort of general methodologies. And then finally, web
programming, looking not at HTML and CSS and the
aesthetics of today’s focus, but rather on some of
the technologies that underlie the most dynamic of websites. For instance, when you click
and drag on Google Maps and you suddenly see an infinite
number of squares from around the world, tiles
that represent that map– how is something like that working? When you’re using Facebook Messenger and
all of a sudden a new message pops up without the whole web page reloading or
without you needing to reload the page, how does all of that work? It’s ultimately driven
by this language called JavaScript, which has no relationship
to Java, another programming language altogether. But as part of our technology
stacks discussion tomorrow, we’ll start sort of a laundry list of
yet more words of jargon and product names and the like just so that
you’re not necessarily comfy with all of those topics, but at least have seen
them and can kind of roughly mentally categorize them all, so as to look
up that additional information more effectively. Any questions? All right. Well, why don’t we officially wrap here. I’ll turn on some music, I’ll
spend some one-on-one time, if you have any questions
or want me to rewind on any of today’s topics
or hands-on material. I believe there are some
snacks and drinks outside. There’s a reception officially til 6,
I’ll stick around for any questions. But otherwise we’ll
see you in the morning and I’ll send around an e-mail tonight
with maybe some URLs from today that you might want to look at to play. So enjoy.
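
For anyone who wants to play with the idea afterward, here is a minimal sketch of the key-value scheme described in the session: a name like q with a value like cats, joined to a second pair with an ampersand. It is written in Python, which was not used in the session, and the helper name build_search_url is made up for illustration. The search address and the q and btnI names are the ones from the demo; whether the server still honors btnI is not something this sketch checks.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(base, params):
    """Append params to base as a query string, the way a GET form submission does."""
    return base + "?" + urlencode(params)


# Two key-value pairs: the query itself and the assumed "I'm feeling lucky" flag.
url = build_search_url("https://www.google.com/search", {"q": "cats", "btnI": "1"})
print(url)

# Going the other way: pull the pairs back out of the URL.
parsed = parse_qs(urlsplit(url).query)
print(parsed)
```

Nothing here touches the network. It only builds and takes apart the text that would end up in the address bar, which is also why a URL like this can be pasted into an email and reproduce the same search.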

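The status codes and the Location header from the Telnet demo can be sketched the same way. This is only an illustration in Python: the response text below is a hand-written stand-in modeled on what the demo described, not output captured from Google, and parse_response is a hypothetical helper rather than anything a browser actually exposes.

```python
from http import HTTPStatus

# Hand-written stand-in for a server's reply; not captured from a real server.
RAW_RESPONSE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.google.com/\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>The document has moved.</html>"
)


def parse_response(raw):
    """Split a raw HTTP response into (status code, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    code = int(status_line.split(" ")[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, headers, body


code, headers, body = parse_response(RAW_RESPONSE)
print(code, HTTPStatus(code).phrase)

# On a redirect, a browser would quietly request the Location URL instead.
if code in (HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.FOUND):
    print("go to", headers["location"])
```

The point of the sketch is only that the status line and headers are plain text with a simple shape, which is why most people never see the tiny "document has moved" page: the browser reads the 301 and follows the header on their behalf.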