| computer:web [2019-02-05 05:52] – [Beispiele] skrupellos | computer:web [2019-02-09 08:58] (current) – removed skrupellos |
|---|
| ====== Modern Web ====== | |
| ===== Pre* ===== | |
| ^ Doc ^ Type ^ Example ^ Comment ^ | |
| | [[https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-DNS-Prefetch-Control#Forcing_lookup_of_specific_hostnames|1]] | DNS Record | ''%%<link rel="dns-prefetch" href="//example.com">%%'' | | | |
| | [[https://developer.mozilla.org/en-US/docs/Web/HTML/Link_types|1]] | TCP Connection | ''%%<link rel="preconnect" href="//example.com">%%'' | | | |
| | [[https://developer.mozilla.org/en-US/docs/Web/HTTP/Link_prefetching_FAQ|1]] | Resource | ''%%<link rel="prefetch" href="//example.com/resource.png">%%'' | Lower priority than ''preload'' -> \\ Only for resources on the "next" page | | |
| | [[https://developer.mozilla.org/en-US/docs/Web/HTML/Preloading_content|1]] | Resource | ''%%<link rel="preload" href="//example.com/resource.png" as="image">%%'' | Higher priority than ''prefetch'' -> \\ Also for resources on the current page | | |
| | [[https://developer.mozilla.org/en-US/docs/Web/HTML/Link_types|1]] | Site | ''%%<link rel="prerender" href="//example.com/site/">%%'' | | | |
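The hints in the table above all share the same ''<link>'' shape, so they can be generated rather than typed by hand. The following is a minimal sketch, not an established API: ''resource_hint'' and ''HINTS'' are hypothetical names, and insisting on an ''as'' type for ''preload'' is an assumption made here to mirror the table's example.

```python
from html import escape

# Hint types taken from the table above.
HINTS = ("dns-prefetch", "preconnect", "prefetch", "preload", "prerender")


def resource_hint(rel, href, as_type=None):
    """Return a <link> tag for one resource hint (hypothetical helper)."""
    if rel not in HINTS:
        raise ValueError("unknown resource hint: %s" % rel)
    # Assumption of this sketch: preload is always given an explicit "as" type.
    if rel == "preload" and as_type is None:
        raise ValueError("preload needs an 'as' type, e.g. 'image'")
    attrs = [("rel", rel), ("href", href)]
    if as_type is not None:
        attrs.append(("as", as_type))
    # Escape attribute values so a URL with quotes cannot break the markup.
    return "<link %s>" % " ".join(
        '%s="%s"' % (key, escape(value, quote=True)) for key, value in attrs
    )


print(resource_hint("preconnect", "//example.com"))
print(resource_hint("preload", "//example.com/resource.png", as_type="image"))
```

A template could call this once per third-party host (''preconnect'') and once per critical asset (''preload'').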
| |
| ===== Metadata ===== | |
| * In breadcrumbs, Google replaces the first element with the base URL of the page (not the URL of the first element) and omits the last element. | |
| * Firefox, Vivaldi and Chromium append their name to ''<title>'' with "-" in the window title -> It is best to separate the elements within the title with "-" as well. | |
| |
| |
| ==== Open Graph ==== | |
| * [[http://ogp.me/|Docs]] | |
| * Don't forget ''<html prefix="og: http://ogp.me/ns#">'' | |
| |
| ==== Twitter ==== | |
| * [[https://developer.twitter.com/en/docs/tweets/optimize-with-cards/overview/markup|Alle Tags]] | |
| * Twitter uses the Open Graph metadata when the corresponding Twitter tag is missing. | |
| * No namespace definition in ''<html>'' (unlike Open Graph) | |
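The fallback rule above (Twitter tag first, Open Graph tag otherwise) can be sketched with the standard library alone. ''MetaCollector'', ''card_value'' and the ''SAMPLE'' markup are hypothetical names and data used only for illustration. This approximates the lookup order and is not Twitter's actual implementation.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> name/property -> content pairs (first occurrence wins)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Twitter tags use name="...", Open Graph tags use property="...".
        key = attrs.get("name") or attrs.get("property")
        if key and attrs.get("content") is not None:
            self.meta.setdefault(key, attrs["content"])


def card_value(html, field):
    """Return twitter:<field> if present, else og:<field>, else None."""
    collector = MetaCollector()
    collector.feed(html)
    for key in ("twitter:" + field, "og:" + field):
        if key in collector.meta:
            return collector.meta[key]
    return None


SAMPLE = """
<html prefix="og: http://ogp.me/ns#"><head>
<meta property="og:title" content="OG title">
<meta property="og:description" content="OG description">
<meta name="twitter:description" content="Twitter description">
</head></html>
"""

print(card_value(SAMPLE, "title"))        # no twitter:title, so og:title is used
print(card_value(SAMPLE, "description"))  # twitter:description takes precedence
```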
| |
| ==== Examples ==== | |
| <code python> | |
| #!/usr/bin/env python3 | |
| |
| import requests | |
| import requests_cache | |
| from bs4 import BeautifulSoup, Comment | |
| |
| sites = [ | |
| 'https://de.wikipedia.org/wiki/Reporter_ohne_Grenzen', | |
| 'https://ebay.us/wWgf3M', | |
| 'https://stackoverflow.com/q/1740341', | |
| 'https://www.amazon.com/dp/1539836835', | |
| 'https://www.imdb.com/title/tt1323594', | |
| 'https://twitter.com/skruppy_/status/1091814026416398337', | |
| 'https://twitter.com/skruppy_', | |
| 'https://www.instagram.com/reporterohnegrenzen', | |
| 'https://www.instagram.com/p/BtV8o9fjkR8/', | |
| 'https://en.zalando.de/zalando-happy-birthday-gift-card-box-light-blue-zzgz000bs-k11.html', | |
| 'https://www.etsy.com/listing/645132564/monty-python-collage-t-shirt', | |
| 'https://www.youtube.com/user/RSFinternet', | |
| 'https://www.youtube.com/watch?v=5OOslRIRZ2k', | |
| 'http://www.spiegel.de/kultur/gesellschaft/a-1244288.html', | |
| 'https://github.com/Skrupellos', | |
| 'https://github.com/Skrupellos/sir', | |
| ] | |
| |
| requests_cache.install_cache('/tmp/requests-cache') | |
| |
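| def findMeta(soup, attrKey, attrValue): | |
|     # Collect the content of every matching <meta> tag; skip tags without a content attribute | |
|     return [x['content'] for x in soup.find_all('meta', attrs={attrKey: attrValue}) if x.has_attr('content')] | |
|
| def dump(title, values): | |
|     # Print a highlighted heading followed by the numbered values | |
|     print('\033[1;36m%s\033[0m' % title) | |
|     for i, val in enumerate(values): | |
|         print(' \033[36m%s)\033[0m %s' % (i+1, val)) | |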
| |
| for url in sites: | |
|     print('\033[1;35;47m %s \033[0m' % url) | |
|     html = requests.get(url, headers={ | |
|         'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36', | |
|         'Accept-Language': 'en-US,en;q=0.5', | |
|     }).text | |
|     soup = BeautifulSoup(html, 'html.parser') | |
|     # Remove HTML comments so they do not show up in the dumped elements | |
|     for comment in soup.find_all(string=lambda x: isinstance(x, Comment)): | |
|         comment.extract() | |
| | |
|     # Not every page has a <title>, so guard against None | |
|     titleTag = soup.find('title') | |
|     dump('<title>', [titleTag.string] if titleTag else []) | |
|     dump('meta - title', findMeta(soup, 'name', 'title')) | |
|     dump('meta - og:title', findMeta(soup, 'property', 'og:title')) | |
|     dump('meta - twitter:title', findMeta(soup, 'name', 'twitter:title')) | |
|     dump('<h1>', [x.prettify() for x in soup.find_all('h1')]) | |
| | |
|     dump('meta - description', findMeta(soup, 'name', 'description')) | |
|     dump('meta - og:description', findMeta(soup, 'property', 'og:description')) | |
|     dump('meta - twitter:description', findMeta(soup, 'name', 'twitter:description')) | |
|     print('') | |
| </code> | |
| |
| ===== Cache ===== | |
| <WRAP center round important 60%> | |
| This section is still highly inaccurate! |
| </WRAP> | |
| |
| * public proxy (shared cache) vs private browser cache | |
| * Reloading vs. validating |
| |
| ^ ''Cache-Control'' from the server ^ Applies to ^ Function ^ Type ^ |
| | ''%%must-revalidate%%'' | Private / Public | Stale -> must be validated before reuse | Revalidation / reloading | |
| | ''%%no-cache%%'' | Private / Public | Always -> validate before reuse (storing is allowed) | Cacheability | |
| | ''%%no-store%%'' | Private / Public | Never store | Other | |
| | ''%%no-transform%%'' | | Intermediaries must not modify the content (e.g. optimize it) | Other | |
| | ''%%public%%'' | Private / Public | May be stored by any cache, including shared ones | Cacheability | |
| | ''%%private%%'' | Public | Shared caches must not store the response; only the private cache may | Cacheability | |
| | ''%%proxy-revalidate%%'' | Public | Stale -> must be validated before reuse | Revalidation / reloading | |
| | ''%%max-age=<seconds>%%'' | Private / Public | Relative cache lifetime (takes precedence over ''Expires'') | Expiration | |
| | ''%%s-maxage=<seconds>%%'' | Public | Relative cache lifetime (takes precedence over ''Expires'' and ''max-age'') | Expiration | |
| | ''%%immutable%%'' | | Fresh -> reuse without revalidation (even on F5, but not on Shift+F5) / Stale -> do not reuse without validating [[http://bitsup.blogspot.com/2016/05/cache-control-immutable.html|1]] | Revalidation / reloading | |
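How a cache might read the directives in the table above can be sketched as follows. This is a simplified illustration, not a complete RFC 7234 implementation. ''parse_cache_control'' and ''freshness_lifetime'' are hypothetical names, and the sketch does not model ''Expires'', heuristic freshness, or the qualified forms of ''private'' and ''no-cache'' that carry field names.

```python
def parse_cache_control(header):
    """Split a Cache-Control value into a {directive: value-or-None} dict.

    Simplification: quoted values containing commas are not handled.
    """
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def freshness_lifetime(header, shared):
    """Return (may_store, lifetime_in_seconds_or_None) for one cache type.

    shared=True models a public proxy, shared=False a private browser cache.
    """
    d = parse_cache_control(header)
    if "no-store" in d:
        return (False, None)   # nobody may store the response
    if shared and "private" in d:
        return (False, None)   # shared caches must not store private responses
    if "no-cache" in d:
        return (True, 0)       # storable, but validate before every reuse
    if shared and d.get("s-maxage") is not None:
        return (True, int(d["s-maxage"]))  # overrides max-age for shared caches
    if d.get("max-age") is not None:
        return (True, int(d["max-age"]))
    return (True, None)        # no explicit lifetime given


header = "public, max-age=600, s-maxage=60"
print(freshness_lifetime(header, shared=True))
print(freshness_lifetime(header, shared=False))
```

Returning a lifetime of ''0'' for ''no-cache'' is a design choice of this sketch: the stored copy counts as stale immediately, so it is always validated before reuse.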
| |
| * https://tools.ietf.org/html/rfc7234 | |
| * https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching | |
| * Nice decision tree: https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching?hl=de |
| * https://www.slideshare.net/martinmartin7777/http-caching-basics-66553113 | |
| * https://www.keycdn.com/blog/http-cache-headers | |
| * https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control | |