Saturday 28 January 2017

Moving Average Polynomial Must Be Invertible

9.7 Data Analysis. This section is the subject of an in-depth treatment. Normal distribution. Sum of a bunch of random coin flips. Random sampling. Physical measurement of some unknown constant, e.g., the gravitational constant. There is an error associated with each measurement, so we get slightly different results each time. Our goal is to estimate the unknown quantity as accurately and precisely as possible. The sample mean and sample variance are defined as x̄ = (1/n) Σ x_i and s² = Σ (x_i - x̄)² / (n - 1). The sample mean estimates the unknown constant, and the sample variance measures the precision of that estimate. Under fairly general conditions, when n is large the sample mean obeys a normal distribution with mean equal to the unknown constant and variance equal to the sample variance divided by n. The approximate 95% confidence interval is x̄ ± 1.96 s / √n. The confidence interval measures the uncertainty associated with our estimate of the unknown constant: if we performed the same experiment many times, we would expect the estimated mean to lie within the given interval 95% of the time. The number 1.96 arises because the probability that a normal random variable lies between -1.96 and 1.96 happens to be 95%. If we want a 90% or 99% confidence interval, substitute 1.645 or 2.575, respectively. The confidence interval above is not exact; it is approximate because we are estimating the standard deviation. When n is small (say, less than 50), we should use the exact 95% confidence interval based on the Student t distribution with n - 1 degrees of freedom. For example, if there are n = 25 samples, then we should use 2.06 instead of 1.96. These numbers can be computed using or124.jar, which contains the OR-Objects library. The program ProbDemo.java illustrates how to use it.

Implementation. The program Average.java is a straightforward implementation of the formulas above; a small stand-alone sketch in the same spirit appears at the end of this passage. The formulas involve two passes through the data: one to compute the sample mean and one to compute the sample variance, so we store the data in an array. This may seem wasteful, since we could compute both in a single pass using the alternate textbook formula s² = (Σ x_i² - n x̄²) / (n - 1) for the sample variance. We avoid the one-pass approach because it is numerically unstable. (See Exercises XYZ and XYZ.) The instability is most pronounced when the data have a small variance but a large number of significant digits. In fact, it can cause the program to take the square root of a negative number (see Exercise XYZ). This subtlety surprises many uninitiated programmers; in fact, it even surprises some very experienced ones. Versions 1.0 through 2002 of Microsoft Excel implemented the unstable one-pass algorithm in more than a dozen of their statistical library functions, so you could encounter inaccurate results with no warning. These bugs were fixed with the release of Excel 2003. Confidence intervals. Temperature in January versus July.
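The following is a minimal Java sketch, in the spirit of Average.java but not the booksite program itself, that reads numbers from standard input, stores them in an array, and prints the sample mean, sample variance, and the approximate 95% confidence interval described above; the class name and the array-resizing strategy are choices made for this example.

    import java.util.Arrays;
    import java.util.Scanner;

    public class MeanAndConfidence {
        public static void main(String[] args) {
            Scanner in = new Scanner(System.in);
            double[] x = new double[16];
            int n = 0;
            while (in.hasNextDouble()) {            // first pass: read and store the data
                if (n == x.length) x = Arrays.copyOf(x, 2 * n);
                x[n++] = in.nextDouble();
            }

            double sum = 0.0;
            for (int i = 0; i < n; i++) sum += x[i];
            double mean = sum / n;                  // sample mean

            double ss = 0.0;                        // second pass: sum of squared deviations
            for (int i = 0; i < n; i++) ss += (x[i] - mean) * (x[i] - mean);
            double variance = ss / (n - 1);         // sample variance
            double stddev = Math.sqrt(variance);

            double halfWidth = 1.96 * stddev / Math.sqrt(n);   // approximate 95% interval
            System.out.println("mean     = " + mean);
            System.out.println("variance = " + variance);
            System.out.println("95% CI   = [" + (mean - halfWidth) + ", " + (mean + halfWidth) + "]");
        }
    }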
Survey sampling. Census surveys, temperature readings, election exit polls, quality control in manufacturing processes, auditing of financial records, epidemiology, and so forth. Typically, newspapers report the result of some poll as 47 ± 3%. What does this really mean? It is implicitly assumed that the population consists of N items, that we take a sample of size n, and that sample i has an associated real value x_i, which might represent weight or age. It could also be 0 or 1 to indicate whether some characteristic is present or absent (for example, plans to vote for Kerry). The techniques of random sampling apply, except that we need a correction for the finite size of the population. When N is large relative to n (only a small fraction of the population is sampled), the finite-population effects can be ignored.

Histogram. The program Histogram.java dynamically displays the histogram as the data accumulate.

Simple linear regression. In 1800 Giuseppe Piazzi discovered what appeared to be a new star and tracked its motion for 41 days before losing it due to bad weather. He was astonished, since it moved in the opposite direction of the other stars. Carl Friedrich Gauss used his newly invented method of least squares to predict where to find it, and he became famous after the star was located according to his prediction. As it turned out, the celestial body was an asteroid, the first ever discovered. Today the method of least squares is applied in many disciplines, from psychology to epidemiology to physics. Gauss's celebrated calculation involved predicting the location of an object using 6 variables. We first consider simple linear regression, which involves only a single predictor variable x; we model the response as y = β0 + β1 x. Given a sequence of n pairs of real numbers (x_i, y_i), we define the residual at x_i to be r_i = y_i - (β0 + β1 x_i). The goal is to estimate the values of the unobserved parameters β0 and β1 so as to make the residuals as small as possible. The method of least squares chooses the parameters to minimize the sum of the squares of the residuals. Using elementary calculus, we obtain the classic least-squares estimates β1 = Σ (x_i - x̄)(y_i - ȳ) / Σ (x_i - x̄)² and β0 = ȳ - β1 x̄. The program LinearRegression.java reads in n measurements from standard input, plots them, and computes the line that best fits the data according to the least-squares metric; a bare-bones sketch of the same computation appears at the end of this passage.

Assessing the fitted solution. To measure the quality of the fit, we can compute the coefficient of determination R², which measures the fraction of variability in the data that can be explained by the variable x. We can also estimate the standard error, that is, the standard error of the regression estimates for β0 and β1, and the approximate 95% confidence intervals for the two unknown coefficients. Running time of algorithms: take the log of both sides; the slope is the exponent, the intercept is the constant. Plot latitude against January temperature. Draw points within 2 standard deviations in black, between 2 and 3 in blue, and above 3 in green. 18 of the 19 outliers are in California or Oregon; the other is in Gunnison County, Colorado, which is at very high altitude. Perhaps we need to incorporate longitude and altitude into the model. Test of normality.
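Circling back to the least-squares formulas above, here is a minimal self-contained sketch (not the booksite's LinearRegression.java) that fits y = β0 + β1 x to a small hard-coded data set and reports R²; the toy data and class name are invented for the example.

    public class SimpleRegression {
        // Fits y = beta0 + beta1*x by least squares and reports R^2.
        public static void main(String[] args) {
            double[] x = { 1, 2, 3, 4, 5 };          // toy data for illustration
            double[] y = { 1.9, 4.1, 6.0, 8.1, 9.9 };
            int n = x.length;

            double xbar = 0, ybar = 0;
            for (int i = 0; i < n; i++) { xbar += x[i]; ybar += y[i]; }
            xbar /= n; ybar /= n;

            double sxx = 0, sxy = 0, syy = 0;
            for (int i = 0; i < n; i++) {
                sxx += (x[i] - xbar) * (x[i] - xbar);
                sxy += (x[i] - xbar) * (y[i] - ybar);
                syy += (y[i] - ybar) * (y[i] - ybar);
            }
            double beta1 = sxy / sxx;                // slope estimate
            double beta0 = ybar - beta1 * xbar;      // intercept estimate

            double sse = 0;                          // residual sum of squares
            for (int i = 0; i < n; i++) {
                double r = y[i] - (beta0 + beta1 * x[i]);
                sse += r * r;
            }
            double r2 = 1.0 - sse / syy;             // coefficient of determination
            System.out.println("y = " + beta0 + " + " + beta1 + " x, R^2 = " + r2);
        }
    }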
Multiple linear regression. Multiple linear regression generalizes simple linear regression by allowing several predictor variables instead of just one. We model the response as y = β0 + β1 x1 + ... + βp xp. Now we have a sequence of n response values y_i and a sequence of n predictor vectors (x_i1, x_i2, ..., x_ip). The goal is to estimate the parameter vector (β0, ..., βp) so as to minimize the sum of squared errors. In matrix notation, we have an overdetermined system of equations y = Xβ, and our goal is to find a vector β that minimizes ||Xβ - y||. Assuming X has full column rank, we can compute β by solving the normal equations XᵀXβ = Xᵀy. The simplest way to solve the normal equations is to explicitly compute A = XᵀX and b = Xᵀy and solve the system Aβ = b using Gaussian elimination. A numerically stable alternative is to compute the QR factorization X = QR and then solve the triangular system Rβ = Qᵀy via back substitution. This is exactly what Jama's solve method does when presented with an overdetermined system (it assumes the matrix has full column rank). The program MultipleLinearRegression.java is a bare-bones implementation of this approach; a small sketch along the same lines appears at the end of this passage. (See Exercise XYZ for an SVD-based method that works even if the system does not have full column rank.)

An example. Weather data and examples from this reference: daily maximum temperatures at n = 1070 weather stations in the United States during March 2001. Predictors: latitude (X1), longitude (X2), and elevation (X3). Model: Y = 101 - 2 X1 + 0.3 X2 - 0.003 X3. Temperature increases as longitude increases (to the west), but decreases as latitude (to the north) and altitude increase. Is the effect of latitude on temperature greater in the west or in the east? Plot a scatterplot of temperature against latitude (split at the median longitude of 93 degrees) for March. Plot the residuals against the fitted values; this should show no pattern. Assessing the model. The error variance s² is the sum of squared errors divided by the degrees of freedom (n - p - 1). The diagonal entries of the variance matrix σ²(XᵀX)⁻¹ estimate the variances of the parameter estimates.

Polynomial regression. Here a single predictor variable enters through its powers: we model the response as y = β0 + β1 x + β2 x² + ... + βp x^p. PolynomialRegression.java is a data type for performing polynomial regression.
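As a companion to the QR-based approach just described, the following sketch uses the Jama library's Matrix.solve (which applies a QR factorization to overdetermined systems); the data, class name, and column layout are invented for the example, and Jama is assumed to be on the classpath.

    import Jama.Matrix;

    public class MultipleRegressionSketch {
        public static void main(String[] args) {
            // Each row is (1, x1, x2): the leading 1 produces the intercept beta0.
            double[][] X = {
                { 1,  10,  20 },
                { 1,  20,  40 },
                { 1,  40,  15 },
                { 1,  80, 100 },
                { 1, 160,  23 },
                { 1, 200,  18 }
            };
            double[][] y = { { 243 }, { 483 }, { 508 }, { 1503 }, { 1764 }, { 2129 } };

            Matrix A = new Matrix(X);
            Matrix b = new Matrix(y);

            // Jama's solve() uses a QR factorization for overdetermined systems,
            // which is the numerically stable approach described above.
            Matrix beta = A.solve(b);
            for (int j = 0; j < beta.getRowDimension(); j++)
                System.out.println("beta" + j + " = " + beta.get(j, 0));

            // residual sum of squares
            Matrix residuals = A.times(beta).minus(b);
            System.out.println("SSE = " + residuals.normF() * residuals.normF());
        }
    }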
Discrete Fourier transform. The discovery of efficient algorithms can have a profound social and cultural impact. The discrete Fourier transform is a method for decomposing a waveform of N samples (e.g., sound) into periodic components. The brute-force solution takes time proportional to N². At the age of 27, Carl Friedrich Gauss proposed a method that requires only N log N steps, and he used it to analyze the periodic motion of the asteroid Ceres. The method was later rediscovered and popularized by Cooley and Tukey in 1965 after they described how to implement it efficiently on a digital computer. Their motivation was monitoring nuclear tests in the Soviet Union and tracking Soviet submarines. The FFT has since become a cornerstone of signal processing and is a crucial component of devices such as DVD players, cell phones, and disk drives. It also forms the basis of many popular data formats, including JPEG, MP3, and DivX; other applications include speech analysis, music synthesis, and image processing. Doctors routinely use the FFT for medical imaging, including magnetic resonance imaging (MRI), magnetic resonance spectroscopy (MRS), and computer-assisted tomography (CAT scans). Another important application is the fast solution of partial differential equations with periodic boundary conditions, notably Poisson's equation and the nonlinear Schroedinger equation; the FFT is also used to simulate fractional Brownian motion. Without a fast way to compute the DFT, none of this would be possible. Charles van Loan writes that the FFT is one of the great computational developments of the twentieth century, and that it has changed the face of science and engineering so much that it is not an exaggeration to say that life as we know it would be very different without it.

Fourier analysis is a method for approximating a function (signal) by a sum of sinusoids (complex exponentials), each at a different frequency. When using computers, we also assume that the continuous function is approximated by a finite number of points, sampled at regular intervals. Sinusoids play a crucial role in physics for describing oscillating systems, including simple harmonic motion. The human ear is a Fourier analyzer for sound: roughly speaking, human hearing works by splitting a sound wave into its sinusoidal components. Each frequency resonates at a different position on the basilar membrane, and these signals are delivered to the brain along the auditory nerve. One of the main applications of the DFT is to identify periodicities in data and their relative strengths, e.g., filtering high-frequency noise from acoustic data, isolating diurnal and annual cycles in weather, analyzing astronomical data, performing atmospheric imaging, and identifying seasonal trends in economic data.

The discrete Fourier transform (DFT) of a length-N complex vector x is defined by y_k = Σ_{j=0}^{N-1} x_j ω^{jk}, where i is the square root of -1 and ω = e^{-2πi/N} is a principal Nth root of unity. We can also interpret the DFT as the matrix-vector product y = F_N x, where F_N is the N-by-N matrix whose entry in row j and column k is ω^{jk}. For example, when N = 4 we have ω = -i and F_4 = [1 1 1 1; 1 -i -1 i; 1 -1 1 -1; 1 i -1 -i]. Note that some authors define the Fourier matrix as the conjugate of our Fourier matrix and normalize it by the factor 1/√N to make it unitary. Intuition: let x_i be the samples of a signal over a time interval from 0 to T, and let f_i be its DFT. Then f_0 / N approximates the average value of the signal over the interval, and the modulus (absolute value) and argument (angle) of the complex number f_j represent (one half of) the amplitude and the phase of the signal component having frequency j/T, for j = 1 to N/2 - 1.

Fast Fourier transform. It is easy to compute the DFT of a vector of length N either directly from the definition or via a dense matrix-vector multiplication.
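Here is a rough sketch of that direct, quadratic-time computation, written with parallel real/imaginary arrays rather than the booksite's Complex type; the class name and the impulse test input are choices made for this example.

    public class DirectDFT {
        // Computes y_k = sum_j x_j * omega^(jk) with omega = e^(-2*pi*i/N),
        // straight from the definition; takes time proportional to N^2.
        public static double[][] dft(double[] re, double[] im) {
            int n = re.length;
            double[][] y = new double[2][n];        // y[0] = real parts, y[1] = imaginary parts
            for (int k = 0; k < n; k++) {
                for (int j = 0; j < n; j++) {
                    double angle = -2 * Math.PI * j * k / n;
                    double c = Math.cos(angle), s = Math.sin(angle);
                    y[0][k] += re[j] * c - im[j] * s;
                    y[1][k] += re[j] * s + im[j] * c;
                }
            }
            return y;
        }

        public static void main(String[] args) {
            double[] re = { 1, 0, 0, 0 };           // an impulse: its DFT is all ones
            double[] im = { 0, 0, 0, 0 };
            double[][] y = dft(re, im);
            for (int k = 0; k < re.length; k++)
                System.out.println(y[0][k] + " + " + y[1][k] + "i");
        }
    }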
Both approaches take quadratic time. The fast Fourier transform (FFT) is an ingenious method that computes the DFT in time proportional to N log N. It works by exploiting the symmetry of the Fourier matrix F. The crucial idea is to use properties of the Nth roots of unity to relate the Fourier transform of a vector of size N to two Fourier transforms on vectors of size N/2, where x_even denotes the vector of size N/2 consisting of x_0, x_2, ..., x_{N-2} and x_odd denotes the vector consisting of x_1, x_3, ..., x_{N-1}. The matrix I_{N/2} is the N/2-by-N/2 identity matrix, and the matrix D_{N/2} is the diagonal matrix whose kth diagonal entry is ω^k. The radix-2 Cooley-Tukey FFT uses this recursive formula to compute the DFT in divide-and-conquer style; note that we have implicitly assumed that N is a power of 2. The program FFT.java is an implementation of this scheme (a stand-alone sketch of the recursion appears at the end of this passage); it relies on the Complex.java ADT developed in section xyz. The program InplaceFFT.java is an in-place variant: it uses only O(1) extra memory.

Inverse FFT. The inverse DFT is defined by x_j = (1/N) Σ_{k=0}^{N-1} y_k ω^{-jk}. The inverse of F_N is its complex conjugate, scaled down by a factor of N. Thus, to compute the inverse DFT of x: compute the DFT of the conjugate of x, take the conjugate of the result, and divide each value by N.

Touch-tone phones. Touch-Tone® telephones encode key presses as audio signals using a system called dual-tone multi-frequency (DTMF). Two audio frequencies are associated with each key press according to the table below. For example, when the 7 key is pressed, the phone generates signals at frequencies 770 Hz and 1209 Hz and sums them together. The frequencies must be within 1.5% of the prescribed values or the phone company ignores them. The high frequencies must be at least as loud as the low frequencies, but may not be more than 3 decibels louder.

Commercial implementations. Because the FFT is so important, there is a rich literature on efficient FFT algorithms, and many optimized library implementations are available (e.g., Matlab and the Fastest Fourier Transform in the West). Our implementation is a bare-bones version that captures the most salient ideas, but it can be improved in a number of ways. For example, commercial implementations work for any N, not just powers of 2. If the input is real (instead of complex), they exploit additional symmetry and run faster. They can also handle multidimensional FFTs. Our FFT implementation has a much larger memory footprint than necessary; with great care, it is even possible to do the FFT in place, i.e., with no extra arrays other than x. Commercial FFT implementations also use iterative algorithms instead of recursion, which can make the code more efficient but harder to understand. High-performance computing machines have specialized vector processors, which can perform vector operations faster than an equivalent sequence of scalar operations.
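Circling back to the radix-2 recursion described at the top of this passage, here is a minimal recursive sketch (not the booksite's FFT.java) that uses parallel real/imaginary arrays instead of the Complex ADT and assumes the length is a power of 2.

    public class RecursiveFFT {
        // Radix-2 Cooley-Tukey FFT on parallel real/imaginary arrays.
        // Assumes the length n is a power of 2; returns {real, imag} of the DFT.
        public static double[][] fft(double[] re, double[] im) {
            int n = re.length;
            if (n == 1) return new double[][] { { re[0] }, { im[0] } };

            double[] evenRe = new double[n / 2], evenIm = new double[n / 2];
            double[] oddRe  = new double[n / 2], oddIm  = new double[n / 2];
            for (int k = 0; k < n / 2; k++) {
                evenRe[k] = re[2 * k];     evenIm[k] = im[2 * k];
                oddRe[k]  = re[2 * k + 1]; oddIm[k]  = im[2 * k + 1];
            }
            double[][] e = fft(evenRe, evenIm);      // DFT of the even-indexed terms
            double[][] o = fft(oddRe, oddIm);        // DFT of the odd-indexed terms

            double[] yRe = new double[n], yIm = new double[n];
            for (int k = 0; k < n / 2; k++) {
                double angle = -2 * Math.PI * k / n; // omega^k with omega = e^(-2*pi*i/n)
                double c = Math.cos(angle), s = Math.sin(angle);
                double tRe = c * o[0][k] - s * o[1][k];   // twiddle factor times odd term
                double tIm = c * o[1][k] + s * o[0][k];
                yRe[k]         = e[0][k] + tRe;
                yIm[k]         = e[1][k] + tIm;
                yRe[k + n / 2] = e[0][k] - tRe;
                yIm[k + n / 2] = e[1][k] - tIm;
            }
            return new double[][] { yRe, yIm };
        }

        public static void main(String[] args) {
            double[] re = { 1, 2, 3, 4 };
            double[] im = { 0, 0, 0, 0 };
            double[][] y = fft(re, im);
            for (int k = 0; k < re.length; k++)
                System.out.println(y[0][k] + " + " + y[1][k] + "i");
        }
    }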
Although computing machines often measure performance in terms of flops (floating-point operations), with the FFT the number of mems (memory accesses) is also critical, and commercial FFT algorithms pay special attention to the costs of moving data around in memory. Parallel FFTs. Implemented in hardware.

Convolution. The convolution of two vectors is a third vector that represents an overlap between the two. It arises in many applications: weighted moving averages in statistics, shadows in optics, and echoes in acoustics. Given two periodic signals a and b of length N, the circular convolution of a and b is defined by c_i = Σ_{j=0}^{N-1} a_j b_{i-j} (with indices taken mod N), and we write c = a ∘ b. The vector b is called the impulse response, filter, template, or point-spread function. To see the importance of convolution, consider two polynomials p(x) and q(x) of degree N. The coefficients of r(x) = p(x) q(x) are obtained by convolving the coefficients of p with those of q, where p and q are padded with 0s to length 2N. To facilitate the computation, we can also pad with additional 0s to make the length a power of 2. This simulates linear convolution, since we do not want periodic boundary conditions. A key theoretical result of Fourier analysis is the convolution theorem: the DFT of the convolution of two vectors is the pointwise product of the DFTs of the two vectors. The convolution theorem is useful because the inverse DFT is easy to compute. It implies that we can compute the circular convolution (and hence polynomial multiplication) in N log N steps by taking three separate FFTs (a sketch appears at the end of this passage). This is amazing on two levels. First, it means we can multiply two real (or complex) polynomials substantially faster than by brute force. Second, the method relies on complex numbers even though multiplying two real polynomials seems to have nothing to do with imaginary numbers. Matlab provides a conv function that performs a linear convolution of two vectors, but its implementation takes quadratic time. In many applications the vectors are large, say 1 million entries, and using that library function as a black box would be unacceptable. By exploiting our understanding of algorithms and complexity, we can replace the library solution with an optimized one based on the FFT. As we have just seen, we can dramatically improve the computation of a convolution by first transforming the data from the time domain to the frequency domain. The same principle applies to related problems, including cross-correlation, autocorrelation, polynomial multiplication, and the discrete sine and cosine transforms. It also means that we have fast matrix-vector multiplication algorithms for circulant and Toeplitz matrices, which arise in numerical solutions of partial differential equations. If x is a signal, y is its spectrum; if x is an impulse response, y is the corresponding frequency response.

2D DFT. (Exercise.) Compute a 2D DFT of an N-by-N matrix by taking a DFT of each column and then a DFT of each row of the resulting values; this takes N² log N time.
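To make the convolution theorem concrete, here is a sketch that multiplies two small polynomials via three FFTs; it reuses the RecursiveFFT.fft method from the sketch above (any FFT routine, such as the booksite's FFT.java, could be substituted), and the polynomials are invented for the example.

    public class CircularConvolution {
        // Circular convolution via the convolution theorem: c = IDFT(DFT(a) .* DFT(b)).
        // Relies on the RecursiveFFT.fft sketch above; a and b must share the same
        // power-of-2 length.
        public static double[] convolve(double[] a, double[] b) {
            int n = a.length;
            double[][] fa = RecursiveFFT.fft(a, new double[n]);
            double[][] fb = RecursiveFFT.fft(b, new double[n]);

            // pointwise product in the frequency domain
            double[] re = new double[n], im = new double[n];
            for (int k = 0; k < n; k++) {
                re[k] = fa[0][k] * fb[0][k] - fa[1][k] * fb[1][k];
                im[k] = fa[0][k] * fb[1][k] + fa[1][k] * fb[0][k];
            }

            // inverse DFT: conjugate, forward FFT, take real part, divide by n
            for (int k = 0; k < n; k++) im[k] = -im[k];
            double[][] y = RecursiveFFT.fft(re, im);
            double[] c = new double[n];
            for (int k = 0; k < n; k++) c[k] = y[0][k] / n;   // imaginary parts are ~0
            return c;
        }

        public static void main(String[] args) {
            // (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2, with coefficients padded to length 4
            double[] p = { 1, 2, 0, 0 };
            double[] q = { 3, 4, 0, 0 };
            for (double coeff : convolve(p, q)) System.out.println(coeff);
        }
    }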
Q. Why is it called the Student t distribution? A. It was discovered by William Gosset, an employee of the Guinness brewing company, in 1908; Guinness would not allow him to publish under his own name, so he used the pseudonym Student.

Q. Why minimize the sum of squared errors instead of the sum of absolute errors or some other measure? A. Short answer: it is what scientists do in practice. There are also some mathematical justifications. The Gauss-Markov theorem says that if you have a linear model in which the errors have zero mean, equal variance, and are uncorrelated, then the least-squares estimates of a and b (the ones that minimize the sum of squared errors) have the smallest variance among all unbiased estimates of a and b. If, in addition, we assume that the errors are independent and normally distributed, then we can derive 95% or 99% confidence intervals.

Q. Where can I get a plotting library? A. Check out JFreeChart; here are some instructions for using it. Or check out the Scientific Graphics Toolkit for creating interactive, publication-quality graphs of scientific data.

Baseball statistics. Do an analysis of baseball statistics. Histogram. Modify Histogram.java so that you do not have to specify the range in advance. Histogram. Modify Histogram.java so that it uses 10 buckets. Pie chart. Stem-and-leaf plot. Simple linear regression. Modify the program LinearRegression.java to plot and scale the data. Once again, we take care to choose a stable algorithm instead of a slightly simpler alternative.

Creative exercises. One-pass algorithm. Write a program OnePass.java that computes the sample mean and sample variance in one pass (instead of two) using the alternate textbook formula (a small demonstration of its failure appears at the end of this passage). Verify that it is not numerically stable by plugging in n = 3, x1 = 1000000000, x2 = 1000000001, and x3 = 1000000002: the one-pass algorithm gives a variance of 0, but the true answer is 1. Also verify that it can lead to taking the square root of a negative number by plugging in an input with n = 2, x1 = 0.5000000000000002, and x2 = 0.5000000000000001. Compare with Average.java. Sample variance. Implement the stable, one-pass algorithm for computing the sample variance. Verify that the formula is correct. Normal quantile plot. To test whether a given set of data follows a normal distribution, create a normal quantile plot of the data and check whether the points lie on (or near) a straight line. To create a normal quantile plot, sort the N data points and then plot the ith data point against Φ⁻¹(i/N). 3D diamond charts. Write a program to read in a three-dimensional data set and plot a diamond chart of the data like the one below. Diamond charts have several advantages over 3D bar charts.
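The following small demonstration, a sketch of what the OnePass.java exercise asks for, contrasts the naive one-pass formula with the two-pass formula on the first set of inputs above; the class name is invented for the example.

    public class OnePassDemo {
        // Contrasts the naive one-pass variance formula (sum of squares minus
        // n*mean^2) with the two-pass formula on the inputs from the exercise.
        public static void main(String[] args) {
            double[] x = { 1000000000.0, 1000000001.0, 1000000002.0 };
            int n = x.length;

            // one pass: accumulate sum and sum of squares simultaneously
            double sum = 0.0, sumSq = 0.0;
            for (double xi : x) { sum += xi; sumSq += xi * xi; }
            double mean = sum / n;
            double onePassVar = (sumSq - n * mean * mean) / (n - 1);

            // two passes: compute the mean first, then the squared deviations
            double ss = 0.0;
            for (double xi : x) ss += (xi - mean) * (xi - mean);
            double twoPassVar = ss / (n - 1);

            System.out.println("one-pass variance = " + onePassVar);   // unstable: 0, per the exercise
            System.out.println("two-pass variance = " + twoPassVar);   // 1, the true answer
        }
    }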
Polynomial curve fitting. Suppose we have a set of N observations (x_i, y_i) and we want to model the data using a low-degree polynomial. Empirically collect n samples (x_i, y_i). In matrix notation, our least-squares problem is to minimize ||Xβ - y||, where the ith row of X is (1, x_i, x_i², ..., x_i^p). The matrix X is called the Vandermonde matrix; it has full column rank if n ≥ p and the x_i are distinct. Our problem is thus a special case of general linear regression, and the solution vector β contains the coefficients of the best-fit polynomial of degree p.

Rank-deficient linear regression. The best method is to use the SVD: it has better numerical stability properties and works even if A does not have full rank. Compute the thin SVD A = U_r Σ_r V_rᵀ, where r is the rank of A and U_r, Σ_r, and V_r consist of the first r columns of U, Σ, and V, respectively. The pseudoinverse is A† = V_r Σ_r⁻¹ U_rᵀ, and the least-squares estimate is x = A† b. The pseudoinverse nicely generalizes the matrix inverse: if A is square and invertible, then A† = A⁻¹; if A is skinny and has full rank, then A† = (AᵀA)⁻¹ Aᵀ. To compute A† b, do not form the pseudoinverse explicitly. Instead compute u = U_rᵀ b, then w = Σ_r⁻¹ u, then x = V_r w; note that Σ_r⁻¹ is easy to compute since Σ_r is diagonal. In Matlab, pinv(A) gives the pseudoinverse and svd(A, 0) gives the thin SVD for skinny (but not fat) matrices. Underdetermined systems. In data-fitting applications the system of equations is usually overdetermined and A is skinny. In control systems we instead have an underdetermined system of equations, and the goal is to find an x that solves Ax = b such that the norm of x is minimized. Again the SVD comes to the rescue: the pseudoinverse solution x = A† b is such a solution. Polynomial multiplication. Given two polynomials of degree m and n, describe how to compute their product in time O(m log n). Clustering. Evolutionary trees in biology, marketing research in business, classifying painters and musicians in the liberal arts, classifying survey responses in sociology. Reference: Guy Blelloch. Last modified on April 2, 2011. Copyright © 2000–2016 Robert Sedgewick and Kevin Wayne. All rights reserved.

In the following topics, we will review techniques that are useful for analyzing time series data, that is, sequences of measurements that follow non-random orders. Unlike the analyses of random samples of observations that are discussed in the context of most other statistics, the analysis of time series is based on the assumption that successive values in the data file represent consecutive measurements taken at equally spaced time intervals. Detailed discussions of the methods described in this section can be found in Anderson (1976), Box and Jenkins (1976), Kendall (1984), Kendall and Ord (1990), Montgomery, Johnson, and Gardiner (1990), Pankratz (1983), Shumway (1988), Vandaele (1983), Walker (1991), and Wei (1989). Time series analysis has two main goals: (a) identifying the nature of the phenomenon represented by the sequence of observations, and (b) forecasting (predicting future values of the time series variable). Both goals require that the pattern of observed time series data be identified and more or less formally described. Once the pattern is established, we can interpret it and integrate it with other data (that is, use it in our theory of the investigated phenomenon, e.g., commodity prices).
Regardless of the depth of our understanding and the validity of our interpretation (theory) of the phenomenon, we can extrapolate the identified pattern to predict future events.

Systematic pattern and random noise. As in most other analyses, in time series analysis it is assumed that the data consist of a systematic pattern (usually a set of identifiable components) and random noise (error), which usually makes the pattern difficult to identify. Most time series analysis techniques involve some form of filtering out noise in order to make the pattern more salient.

Two general aspects of time series patterns. Most time series patterns can be described in terms of two basic classes of components: trend and seasonality. The former represents a general systematic linear or (most often) nonlinear component that changes over time and does not repeat, or at least does not repeat within the time range captured by our data (e.g., a plateau followed by a period of exponential growth). The latter may have a formally similar nature (e.g., a plateau followed by a period of exponential growth), but it repeats itself at systematic intervals over time. These two general classes of time series components may coexist in real-life data. For example, sales of a company can grow rapidly over the years, yet still follow consistent seasonal patterns (e.g., as much as 25% of yearly sales are made in December, whereas only 4% are made in August). This general pattern is well illustrated in the classic Series G data set (Box and Jenkins, 1976, p. 531) representing monthly international airline passenger totals (measured in thousands) over twelve consecutive years, from 1949 to 1960 (see the example data file G.sta and the graph above). If you plot the successive observations (months) of airline passenger totals, a clear, almost linear trend emerges, indicating that the airline industry enjoyed steady growth over the years (approximately 4 times more passengers traveled in 1960 than in 1949). At the same time, the monthly figures follow an almost identical pattern each year (e.g., more people travel during the holidays than during any other time of the year). This example data file also illustrates a very common general type of pattern in time series data, in which the amplitude of the seasonal changes increases with the overall trend (i.e., the variance is correlated with the mean over the segments of the series). This pattern, called multiplicative seasonality, indicates that the relative amplitude of seasonal changes is constant over time and thus related to the trend. There are no proven quantitative techniques for identifying trend components in time series data; however, as long as the trend is monotonic (consistently increasing or decreasing), that part of the data analysis is typically not very difficult. If the time series data contain considerable error, then the first step in the process of trend identification is smoothing. Smoothing.
Smoothing always involves some form of local averaging of the data such that the non-systematic components of individual observations cancel each other out. The most common technique is moving average smoothing, which replaces each element of the series by either the simple or the weighted average of n surrounding elements, where n is the width of the smoothing window (see Box & Jenkins, 1976; Velleman & Hoaglin, 1981); a small sketch appears at the end of this passage. Medians can be used instead of means. The main advantage of median smoothing, as compared to moving average smoothing, is that its results are less biased by outliers (within the smoothing window). Thus, if there are outliers in the data (e.g., due to measurement errors), median smoothing typically produces smoother, or at least more reliable, curves than a moving average based on the same window width. The main disadvantage of median smoothing is that, in the absence of obvious outliers, it may produce more "jagged" curves than the moving average, and it does not allow for weighting. In the relatively less common cases (in time series data) when the measurement error is very large, distance-weighted least squares smoothing or negative exponentially weighted smoothing techniques can be used. All these methods filter out the noise and convert the data into a smooth curve that is relatively unbiased by outliers (see the respective sections on each of these methods for more details). Series with relatively few and systematically distributed points can be smoothed with bicubic splines.

Fitting a function. Many monotonic time series can be adequately approximated by a linear function; if there is a clear monotonic nonlinear component, the data first need to be transformed to remove the nonlinearity. Usually a logarithmic, exponential, or (less often) polynomial function can be used.

Analysis of seasonality. Seasonal dependency (seasonality) is another general component of the time series pattern. The concept was illustrated in the example of the airline passenger data above. It is formally defined as a correlational dependency of order k between each ith element of the series and the (i-k)th element (Kendall, 1976) and is measured by autocorrelation (i.e., a correlation between the two terms); k is usually called the lag. If the measurement error is not too large, seasonality can be identified visually in the series as a pattern that repeats every k elements.

Autocorrelation correlogram. Seasonal patterns of time series can be examined via correlograms. The correlogram (autocorrelogram) displays graphically and numerically the autocorrelation function (ACF), that is, the serial correlation coefficients (and their standard errors) for consecutive lags in a specified range of lags (e.g., 1 through 30). Ranges of two standard errors for each lag are usually marked in correlograms, but typically the size of the autocorrelation is of more interest than its reliability (see Elementary Concepts), because we are usually interested only in very strong (and thus highly significant) autocorrelations.
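Returning to the smoothing techniques described at the top of this passage, here is a minimal sketch of moving-average and moving-median smoothing; the boundary handling at the ends of the series and the toy data (with one artificial outlier) are choices made for this example.

    import java.util.Arrays;

    public class Smoothing {
        // Centered moving-average smoothing with an odd window width.
        public static double[] movingAverage(double[] x, int width) {
            int half = width / 2;
            double[] s = new double[x.length];
            for (int i = 0; i < x.length; i++) {
                double sum = 0.0;
                int count = 0;
                for (int j = Math.max(0, i - half); j <= Math.min(x.length - 1, i + half); j++) {
                    sum += x[j];
                    count++;
                }
                s[i] = sum / count;      // average of the surrounding window
            }
            return s;
        }

        // Median smoothing: less sensitive to outliers inside the window.
        public static double[] movingMedian(double[] x, int width) {
            int half = width / 2;
            double[] s = new double[x.length];
            for (int i = 0; i < x.length; i++) {
                int lo = Math.max(0, i - half), hi = Math.min(x.length - 1, i + half);
                double[] window = Arrays.copyOfRange(x, lo, hi + 1);
                Arrays.sort(window);
                s[i] = window[window.length / 2];
            }
            return s;
        }

        public static void main(String[] args) {
            double[] series = { 3, 4, 5, 40, 6, 7, 8 };   // the 40 is an artificial outlier
            System.out.println(Arrays.toString(movingAverage(series, 3)));
            System.out.println(Arrays.toString(movingMedian(series, 3)));
        }
    }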
Examining correlograms. While examining correlograms, keep in mind that autocorrelations for consecutive lags are formally dependent. Consider the following example: if the first element is closely related to the second, and the second to the third, then the first element must also be somewhat related to the third, and so on. This implies that the pattern of serial dependencies can change considerably after removing the first-order autocorrelation (i.e., after differencing the series with a lag of 1).

Partial autocorrelations. Another useful method to examine serial dependencies is to examine the partial autocorrelation function (PACF), an extension of autocorrelation in which the dependence on the intermediate elements (those within the lag) is removed. In other words, the partial autocorrelation is similar to autocorrelation, except that when calculating it, the (auto)correlations with all the elements within the lag are partialled out (Box & Jenkins, 1976; see also McDowall, McCleary, Meidinger, & Hay, 1980). If a lag of 1 is specified (i.e., there are no intermediate elements within the lag), then the partial autocorrelation is equivalent to the autocorrelation. In a sense, the partial autocorrelation provides a "cleaner" picture of serial dependencies for individual lags (not confounded by other serial dependencies).

Removing serial dependency. Serial dependency for a particular lag k can be removed by differencing the series, that is, converting each ith element of the series into its difference from the (i-k)th element; a small sketch of differencing and autocorrelation appears at the end of this passage. There are two major reasons for such transformations. First, we can identify the hidden nature of seasonal dependencies in the series. Remember that, as mentioned in the previous paragraph, autocorrelations for consecutive lags are interdependent; therefore, removing some of the autocorrelations will change other autocorrelations, that is, it may eliminate them or it may make some other seasonalities more apparent. The other reason for removing seasonal dependencies is to make the series stationary, which is necessary for ARIMA and other techniques.

The modeling and forecasting procedures discussed in Identifying Patterns in Time Series Data involved knowledge about the mathematical model of the process. However, in real-life research and practice, patterns of the data are unclear, individual observations involve considerable error, and we still need not only to uncover the hidden patterns in the data but also to generate forecasts. The ARIMA methodology developed by Box and Jenkins (1976) allows us to do just that; it has gained enormous popularity in many areas, and research practice confirms its power and flexibility (Hoff, 1983; Pankratz, 1983; Vandaele, 1983). However, because of its power and flexibility, ARIMA is a complex technique: it is not easy to use, it requires a great deal of experience, and although it often produces satisfactory results, those results depend on the researcher's level of expertise (Bails & Peppers, 1982). The following sections will introduce the basic ideas of this methodology. For those interested in a brief, applications-oriented (non-mathematical) introduction to ARIMA methods, we recommend McDowall, McCleary, Meidinger, and Hay (1980).
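As a small companion to the differencing and autocorrelation ideas above, here is a sketch that computes lag-k differences and sample autocorrelations (the building blocks of a correlogram); the toy series and class name are invented for the example.

    public class SerialDependence {
        // Lag-k differencing: element i becomes x[i] - x[i-k].
        public static double[] difference(double[] x, int k) {
            double[] d = new double[x.length - k];
            for (int i = k; i < x.length; i++) d[i - k] = x[i] - x[i - k];
            return d;
        }

        // Sample autocorrelation at a given lag (the building block of an ACF).
        public static double autocorrelation(double[] x, int lag) {
            int n = x.length;
            double mean = 0.0;
            for (double xi : x) mean += xi;
            mean /= n;
            double num = 0.0, den = 0.0;
            for (int i = 0; i < n; i++) den += (x[i] - mean) * (x[i] - mean);
            for (int i = lag; i < n; i++) num += (x[i] - mean) * (x[i - lag] - mean);
            return num / den;
        }

        public static void main(String[] args) {
            // a short series with an obvious period-4 pattern plus a mild trend
            double[] x = { 10, 14, 12, 8, 11, 15, 13, 9, 12, 16, 14, 10 };
            for (int lag = 1; lag <= 5; lag++)
                System.out.println("ACF(" + lag + ") = " + autocorrelation(x, lag));
            double[] d1 = difference(x, 1);   // removes much of the lag-1 dependence
            System.out.println("ACF(1) after lag-1 differencing = " + autocorrelation(d1, 1));
        }
    }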
Two Common Processes. Autoregressive process. Most time series consist of elements that are serially dependent in the sense that one can estimate a coefficient or a set of coefficients that describe consecutive elements of the series from specific, time-lagged (previous) elements. This can be summarized in the equation x_t = ξ + φ1·x_(t-1) + φ2·x_(t-2) + φ3·x_(t-3) + ... + ε_t, where ξ is a constant (intercept) and φ1, φ2, φ3 are the autoregressive model parameters. Put in words, each observation is made up of a random error component (random shock, ε) and a linear combination of prior observations.

Stationarity requirement. Note that an autoregressive process will only be stable if the parameters are within a certain range; for example, if there is only one autoregressive parameter, then it must fall within the interval -1 < φ < 1. Otherwise, past effects would accumulate and the values of successive x_t's would move towards infinity, that is, the series would not be stationary. If there is more than one autoregressive parameter, similar (general) restrictions on the parameter values can be defined (e.g., see Box & Jenkins, 1976; Montgomery, 1990).

Moving average process. Independent of the autoregressive process, each element in the series can also be affected by the past error (or random shock) that cannot be accounted for by the autoregressive component, that is: x_t = µ + ε_t - θ1·ε_(t-1) - θ2·ε_(t-2) - θ3·ε_(t-3) - ..., where µ is a constant and θ1, θ2, θ3 are the moving average model parameters. Put in words, each observation is made up of a random error component (random shock, ε) and a linear combination of prior random shocks. (A small simulation of both processes appears at the end of this passage.)

Invertibility requirement. Without going into too much detail, there is a "duality" between the moving average process and the autoregressive process (e.g., see Box & Jenkins, 1976; Montgomery, Johnson, & Gardiner, 1990); that is, the moving average equation above can be rewritten (inverted) into an autoregressive form (of infinite order). However, analogous to the stationarity condition described above, this can only be done if the moving average parameters follow certain conditions, that is, if the model is invertible. Otherwise, the series will not be stationary.

Autoregressive moving average model. The general model introduced by Box and Jenkins (1976) includes autoregressive as well as moving average parameters, and explicitly includes differencing in the formulation of the model. Specifically, the three types of parameters in the model are: the autoregressive parameters (p), the number of differencing passes (d), and the moving average parameters (q). In the notation introduced by Box and Jenkins, models are summarized as ARIMA(p, d, q); so, for example, a model described as (0, 1, 2) means that it contains 0 (zero) autoregressive (p) parameters and 2 moving average (q) parameters which were computed for the series after it was differenced once.
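The following sketch simulates a short AR(1) series and a short MA(1) series as just defined; the parameter values, random seed, and class name are invented for the example, and φ is kept inside -1 < φ < 1 so the AR series stays stationary.

    import java.util.Random;

    public class ArMaSimulation {
        // Simulates an AR(1) process x_t = xi + phi*x_(t-1) + e_t and an
        // MA(1) process x_t = mu + e_t - theta*e_(t-1).
        public static void main(String[] args) {
            Random rng = new Random(42);
            int n = 10;
            double xi = 0.0, phi = 0.7;        // AR(1) parameters (|phi| < 1)
            double mu = 0.0, theta = 0.4;      // MA(1) parameters

            System.out.println("AR(1):");
            double x = 0.0;
            for (int t = 0; t < n; t++) {
                double e = rng.nextGaussian();         // random shock
                x = xi + phi * x + e;
                System.out.println(x);
            }

            System.out.println("MA(1):");
            double prevShock = 0.0;
            for (int t = 0; t < n; t++) {
                double e = rng.nextGaussian();
                double y = mu + e - theta * prevShock; // depends on the previous shock only
                prevShock = e;
                System.out.println(y);
            }
        }
    }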
Identification. As mentioned earlier, the input series for ARIMA needs to be stationary, that is, it should have a constant mean, variance, and autocorrelation through time. Therefore, the series usually first needs to be differenced until it is stationary (this also often requires log transforming the data to stabilize the variance). The number of times the series needs to be differenced to achieve stationarity is reflected in the d parameter (see the previous paragraph). In order to determine the necessary level of differencing, one should examine the plot of the data and the autocorrelogram. Significant changes in level (strong upward or downward changes) usually require first-order non-seasonal (lag 1) differencing; strong changes of slope usually require second-order non-seasonal differencing. Seasonal patterns require respective seasonal differencing (see below). If the estimated autocorrelation coefficients decline slowly at longer lags, first-order differencing is usually needed. However, keep in mind that some time series may require little or no differencing, and that over-differenced series produce less stable coefficient estimates.

At this stage (which is usually called the Identification phase, see below) we also need to decide how many autoregressive (p) and moving average (q) parameters are necessary to yield an effective but still parsimonious model of the process (parsimonious means that it has the fewest parameters and greatest number of degrees of freedom among all models that fit the data). In practice, the numbers of the p or q parameters very rarely need to be greater than 2 (see below for more specific recommendations).

Estimation and Forecasting. At the next step (Estimation), the parameters are estimated (using function minimization procedures; see below for more information on minimization procedures, and see also Nonlinear Estimation), so that the sum of squared residuals is minimized. The estimates of the parameters are used in the last stage (Forecasting) to calculate new values of the series (beyond those included in the input data set) and confidence intervals for those predicted values. The estimation process is performed on transformed (differenced) data; before the forecasts are generated, the series needs to be integrated (integration is the inverse of differencing) so that the forecasts are expressed in values compatible with the input data. This automatic integration feature is represented by the letter I in the name of the methodology (ARIMA = Auto-Regressive Integrated Moving Average).

The constant in ARIMA models. In addition to the standard autoregressive and moving average parameters, ARIMA models may also include a constant, as described above. The interpretation of a (statistically significant) constant depends on the model that is fit. Specifically, (1) if there are no autoregressive parameters in the model, then the expected value of the constant is µ, the mean of the series; (2) if there are autoregressive parameters in the series, then the constant represents the intercept. If the series is differenced, then the constant represents the mean or intercept of the differenced series; for example, if the series is differenced once and there are no autoregressive parameters in the model, then the constant represents the mean of the differenced series, and therefore the linear trend slope of the un-differenced series.

Number of parameters to be estimated. Before the estimation can begin, we need to decide on (identify) the specific number and type of ARIMA parameters to be estimated. The major tools used in the identification phase are plots of the series, correlograms of autocorrelation (ACF), and partial autocorrelation (PACF). The decision is not straightforward, and in less typical cases it requires not only experience but also a good deal of experimentation with alternative models (as well as with the technical parameters of ARIMA).
However, a majority of empirical time series patterns can be sufficiently approximated using one of the 5 basic models that can be identified based on the shape of the autocorrelogram (ACF) and partial autocorrelogram (PACF). The following brief summary is based on practical recommendations of Pankratz (1983); for additional practical advice, see also Hoff (1983), McCleary and Hay (1980), McDowall, McCleary, Meidinger, and Hay (1980), and Vandaele (1983). Also, note that since the number of parameters (to be estimated) of each kind is almost never greater than 2, it is often practical to try alternative models on the same data. One autoregressive (p) parameter: ACF - exponential decay; PACF - spike at lag 1, no correlation for other lags. Two autoregressive (p) parameters: ACF - a sine-wave shape pattern or a set of exponential decays; PACF - spikes at lags 1 and 2, no correlation for other lags. One moving average (q) parameter: ACF - spike at lag 1, no correlation for other lags; PACF - damps out exponentially. Two moving average (q) parameters: ACF - spikes at lags 1 and 2, no correlation for other lags; PACF - a sine-wave shape pattern or a set of exponential decays. One autoregressive (p) and one moving average (q) parameter: ACF - exponential decay starting at lag 1; PACF - exponential decay starting at lag 1. (A sketch for computing the PACF appears at the end of this passage.)

Seasonal models. Multiplicative seasonal ARIMA is a generalization and extension of the method introduced in the previous paragraphs to series in which a pattern repeats seasonally over time. In addition to the non-seasonal parameters, seasonal parameters for a specified lag (established in the identification phase) need to be estimated. Analogous to the simple ARIMA parameters, these are: seasonal autoregressive (ps), seasonal differencing (ds), and seasonal moving average (qs) parameters. For example, the model (0,1,2)(0,1,1) describes a model that includes no autoregressive parameters, 2 regular moving average parameters, and 1 seasonal moving average parameter, and these parameters were computed for the series after it was differenced once with lag 1 and once seasonally differenced. The seasonal lag used for the seasonal parameters is usually determined during the identification phase and must be explicitly specified. The general recommendations concerning the selection of parameters to be estimated (based on ACF and PACF) also apply to seasonal models. The main difference is that in seasonal series, the ACF and PACF will show sizable coefficients at multiples of the seasonal lag (in addition to their overall patterns reflecting the non-seasonal components of the series).

There are several different methods for estimating the parameters. All of them should produce very similar estimates, but may be more or less efficient for any given model. In general, during the parameter estimation phase a function minimization algorithm is used (the so-called quasi-Newton method; refer to the description of the Nonlinear Estimation method) to maximize the likelihood (probability) of the observed series, given the parameter values. In practice, this requires the calculation of the (conditional) sums of squares (SS) of the residuals, given the respective parameters. Different methods have been proposed to compute the SS for the residuals: (1) the approximate maximum likelihood method according to McLeod and Sales (1983), (2) the approximate maximum likelihood method with backcasting, and (3) the exact maximum likelihood method according to Melard (1984).
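Since PACF values are less familiar to compute than plain autocorrelations, here is a sketch of the standard Durbin-Levinson recursion for obtaining them from a series' autocorrelations; the example autocorrelations correspond to an AR(1) process with φ = 0.7 and are supplied by hand, so the whole setup is illustrative only (an ACF routine such as the SerialDependence sketch above could feed it real data).

    public class PartialAutocorrelation {
        // PACF via the Durbin-Levinson recursion, starting from autocorrelations
        // rho[1..maxLag]; rho[0] is 1 by definition and is not used directly.
        public static double[] pacf(double[] rho, int maxLag) {
            double[] pacf = new double[maxLag + 1];      // index 0 unused
            double[][] phi = new double[maxLag + 1][maxLag + 1];
            phi[1][1] = rho[1];
            pacf[1] = rho[1];
            for (int n = 2; n <= maxLag; n++) {
                double num = rho[n], den = 1.0;
                for (int k = 1; k < n; k++) {
                    num -= phi[n - 1][k] * rho[n - k];
                    den -= phi[n - 1][k] * rho[k];
                }
                phi[n][n] = num / den;
                for (int k = 1; k < n; k++)
                    phi[n][k] = phi[n - 1][k] - phi[n][n] * phi[n - 1][n - k];
                pacf[n] = phi[n][n];
            }
            return pacf;
        }

        public static void main(String[] args) {
            // autocorrelations of an AR(1) process with phi = 0.7: rho(k) = 0.7^k
            double[] rho = { 1.0, 0.7, 0.49, 0.343, 0.2401 };
            double[] p = pacf(rho, 4);
            for (int lag = 1; lag <= 4; lag++)
                System.out.println("PACF(" + lag + ") = " + p[lag]);
            // for an AR(1) process the PACF spikes at lag 1 and is ~0 afterwards
        }
    }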
Comparison of methods. In general, all methods should yield very similar parameter estimates, and all are about equally efficient in most real-world time series applications. However, method 1 above (approximate maximum likelihood, no backcasts) is the fastest and should be used in particular for very long time series (e.g., with more than 30,000 observations). Melard's exact maximum likelihood method (number 3 above) may also become inefficient when used to estimate parameters for seasonal models with long seasonal lags (e.g., with yearly lags of 365 days). On the other hand, you should always use the approximate maximum likelihood method first in order to establish initial parameter estimates that are very close to the actual final values; usually only a few iterations with the exact maximum likelihood method (3 above) are then necessary to finalize the parameter estimates.

Parameter standard errors. For all parameter estimates, you will compute so-called asymptotic standard errors. These are computed from the matrix of second-order partial derivatives, which is approximated via finite differencing (see also the respective discussion in Nonlinear Estimation).

Penalty value. As mentioned above, the estimation procedure requires that the (conditional) sums of squares of the ARIMA residuals be minimized. If the model is inappropriate, it may happen during the iterative estimation process that the parameter estimates become very large and, in fact, invalid. In that case, a very large value (a so-called penalty value) is assigned to the SS. This usually entices the iteration process to move the parameters away from invalid ranges. However, in some cases even this strategy fails, and you may see on the screen (during the Estimation procedure) very large values for the SS in consecutive iterations. In that case, carefully evaluate the appropriateness of your model. If your model contains many parameters, and perhaps an intervention component (see below), you may try again with different parameter start values.

Evaluation of the Model. Parameter estimates. You will report approximate t values, computed from the parameter standard errors (see above). If a parameter is not significant, it can in most cases be dropped from the model without substantially affecting the overall fit of the model. Other quality criteria. Another straightforward and common measure of the reliability of the model is the accuracy of its forecasts generated based on partial data, so that the forecasts can be compared with known (original) observations. However, a good model should not only provide sufficiently accurate forecasts; it should also be parsimonious and produce statistically independent residuals that contain only noise and no systematic components (e.g., the correlogram of residuals should not reveal any serial dependencies). A good test of the model is (a) to plot the residuals and inspect them for any systematic trends, and (b) to examine the autocorrelogram of residuals (there should be no serial dependency between residuals). Analysis of residuals. The major concern here is that the residuals might be systematically distributed across the series (e.g., they could be negative in the first part of the series and approach zero in the second part) or that they might contain some serial dependency, which may suggest that the ARIMA model is inadequate. The analysis of ARIMA residuals constitutes an important test of the model.
The estimation procedure assumes that the residuals are not (auto-)correlated and that they are normally distributed. Limitations. The ARIMA method is appropriate only for a time series that is stationary (i.e., its mean, variance, and autocorrelation should be approximately constant through time), and it is recommended that there be at least 50 observations in the input data. It is also assumed that the values of the estimated parameters are constant throughout the series.

Interrupted Time Series ARIMA. A common research question in time series analysis is whether an outside event affected subsequent observations. For example, did the implementation of a new economic policy improve economic performance? Did a new anti-crime law affect subsequent crime rates? And so on. In general, we would like to evaluate the impact of one or more discrete events on the values in the time series. This type of interrupted time series analysis is described in detail in McDowall, McCleary, Meidinger, and Hay (1980). McDowall et al. distinguish between three major types of impacts that are possible: (1) permanent abrupt, (2) permanent gradual, and (3) abrupt temporary. See also: Identifying Patterns in Time Series Data, ARIMA, Exponential Smoothing, Seasonal Decomposition (Census I), X-11 Census Method II Seasonal Adjustment, X-11 Census Method II Result Tables, Distributed Lags Analysis, Single Spectrum (Fourier) Analysis, Cross-spectrum Analysis, Basic Notations and Principles, Fast Fourier Transformations.

Exponential smoothing has become very popular as a forecasting method for a wide variety of time series data. Historically, the method was independently developed by Brown and Holt. Brown worked for the US Navy during World War II, where his assignment was to design a tracking system for fire-control information to compute the location of submarines. Later, he applied this technique to the forecasting of demand for spare parts (an inventory control problem), and he described those ideas in his 1959 book on inventory control. Holt's research was sponsored by the Office of Naval Research; independently, he developed exponential smoothing models for constant processes, processes with linear trends, and seasonal data. Gardner (1985) proposed a "unified" classification of exponential smoothing methods. Excellent introductions can also be found in Makridakis, Wheelwright, and McGee (1983), Makridakis and Wheelwright (1989), and Montgomery, Johnson, and Gardiner (1990).

Simple Exponential Smoothing. A simple and pragmatic model for a time series would be to consider each observation as consisting of a constant (b) and an error component (epsilon, ε), that is, X_t = b + ε_t. The constant b is relatively stable in each segment of the series but may change slowly over time. If appropriate, one way to isolate the true value of b, and thus the systematic or predictable part of the series, is to compute a kind of moving average in which the current and immediately preceding ("younger") observations are assigned greater weight than the respective older observations. Simple exponential smoothing accomplishes exactly such weighting, with exponentially smaller weights assigned to older observations.
The specific formula for simple exponential smoothing is S_t = α·X_t + (1 - α)·S_(t-1). When applied recursively to each successive observation in the series, each new smoothed value (forecast) is computed as the weighted average of the current observation and the previous smoothed observation; the previous smoothed observation was computed in turn from the previous observed value and the smoothed value before the previous observation, and so on. Thus, in effect, each smoothed value is the weighted average of the previous observations, where the weights decrease exponentially depending on the value of the parameter α (alpha). If α is equal to 1 (one), then the previous observations are ignored entirely; if α is equal to 0 (zero), then the current observation is ignored entirely, and the smoothed value consists entirely of the previous smoothed value (which in turn is computed from the smoothed observation before it, and so on; thus all smoothed values will be equal to the initial smoothed value S_0). Values of α in between will produce intermediate results. Even though significant work has been done to study the theoretical properties of (simple and complex) exponential smoothing (e.g., see Gardner, 1985; Muth, 1960; see also McKenzie, 1984, 1985), the method has gained popularity mostly because of its usefulness as a forecasting tool. For example, empirical research by Makridakis et al. (1982; Makridakis, 1983) has shown simple exponential smoothing to be the best choice for one-period-ahead forecasting from among 24 other time series methods and using a variety of accuracy measures (see also Gross and Craig, 1974, for additional empirical evidence). Thus, regardless of the theoretical model for the process underlying the observed time series, simple exponential smoothing will often produce quite accurate forecasts.

Choosing the Best Value for Parameter α (alpha). Gardner (1985) discusses various theoretical and empirical arguments for selecting an appropriate smoothing parameter. Obviously, looking at the formula presented above, α should fall into the interval between 0 (zero) and 1 (although, see Brenner et al., 1968, for an ARIMA perspective implying 0 < α < 2). Gardner (1985) reports that among practitioners, an α smaller than .30 is usually recommended. However, in the study by Makridakis et al. (1982), α values above .30 frequently yielded the best forecasts. After reviewing the literature on this topic, Gardner (1985) concludes that it is best to estimate an optimum α from the data (see below), rather than to guess and set an artificially low value.

Estimating the best α value from the data. In practice, the smoothing parameter is often chosen by a grid search of the parameter space; that is, different solutions for α are tried, starting, for example, with 0.1 to 0.9 in increments of 0.1. Then α is chosen so as to produce the smallest sums of squares (or mean squares) for the residuals (i.e., observed values minus one-step-ahead forecasts; this mean squared error is also referred to as ex post mean squared error, ex post MSE for short).
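Here is a minimal sketch of that smoothing recursion together with a grid search over α; setting S_0 to the first observation, the toy series, the grid granularity, and the class name are all choices made for this example.

    public class ExponentialSmoothing {
        // One-step-ahead forecasts from simple exponential smoothing:
        // S_t = alpha*X_t + (1 - alpha)*S_(t-1), with S_0 set to the first observation.
        public static double sumSquaredErrors(double[] x, double alpha) {
            double s = x[0];                 // a common, simple choice for S_0
            double sse = 0.0;
            for (int t = 1; t < x.length; t++) {
                double error = x[t] - s;     // s is the forecast of x[t]
                sse += error * error;
                s = alpha * x[t] + (1 - alpha) * s;
            }
            return sse;
        }

        public static void main(String[] args) {
            double[] x = { 12, 13, 12, 14, 15, 14, 16, 17, 16, 18 };
            double bestAlpha = 0.1, bestSse = Double.POSITIVE_INFINITY;
            for (double alpha = 0.1; alpha <= 0.9001; alpha += 0.1) {   // grid search
                double sse = sumSquaredErrors(x, alpha);
                System.out.printf("alpha = %.1f   SSE = %.3f%n", alpha, sse);
                if (sse < bestSse) { bestSse = sse; bestAlpha = alpha; }
            }
            System.out.println("best alpha on this grid: " + bestAlpha);
        }
    }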
This visual check of the accuracy of forecasts is often the most powerful method for determining whether or not the current exponential smoothing model fits the data. In addition, besides the ex post MSE criterion (see previous paragraph), there are other statistical measures of error that can be used to determine the optimum α parameter (see Makridakis, Wheelwright, and McGee, 1983):

Mean error: The mean error (ME) value is simply computed as the average error value (average of observed minus one-step-ahead forecast). Obviously, a drawback of this measure is that positive and negative error values can cancel each other out, so this measure is not a very good indicator of overall fit.

Mean absolute error: The mean absolute error (MAE) value is computed as the average absolute error value. If this value is 0 (zero), the fit (forecast) is perfect. As compared to the mean squared error value, this measure of fit will de-emphasize outliers; that is, unique or rare large error values will affect the MAE less than the MSE value.

Sum of squared error (SSE), Mean squared error. These values are computed as the sum (or average) of the squared error values. This is the most commonly used lack-of-fit indicator in statistical fitting procedures.

Percentage error (PE). All the above measures rely on the actual error value. It may seem reasonable to rather express the lack of fit in terms of the relative deviation of the one-step-ahead forecasts from the observed values, that is, relative to the magnitude of the observed values. For example, when trying to predict monthly sales that may fluctuate widely (e.g. seasonally) from month to month, we may be satisfied if our prediction "hits the target" with about 10% accuracy. In other words, the absolute errors may be not so much of interest as are the relative errors in the forecasts. To assess the relative error, various indices have been proposed (see Makridakis, Wheelwright, and McGee, 1983). The first one, the percentage error value, is computed as:

PE_t = 100 · (X_t − F_t) / X_t

where X_t is the observed value at time t, and F_t is the forecast (smoothed value).

Mean percentage error (MPE). This value is computed as the average of the PE values.

Mean absolute percentage error (MAPE). As is the case with the mean error value (ME, see above), a mean percentage error near 0 (zero) can be produced by large positive and negative percentage errors that cancel each other out. Thus, a better measure of relative overall fit is the mean absolute percentage error. Also, this measure is usually more meaningful than the mean squared error. For example, knowing that the average forecast is off by 5% is a useful result in and of itself, whereas a mean squared error of 30.8 is not immediately interpretable.

Automatic search for the best parameter. A quasi-Newton function minimization procedure (the same as in ARIMA) is used to minimize either the mean squared error, mean absolute error, or mean absolute percentage error. In most cases, this procedure is more efficient than the grid search (particularly when more than one parameter must be determined), and the optimum α parameter can quickly be identified.
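The indices above are simple averages over the one-step-ahead forecast errors. The following Java sketch (an illustration, not code from the original text; all names are hypothetical) computes ME, MAE, MSE, MPE, and MAPE from parallel arrays of observed values and forecasts.

public class ErrorMeasures {

    static double me(double[] x, double[] f) {
        double s = 0;
        for (int t = 0; t < x.length; t++) s += x[t] - f[t];   // signed errors can cancel out
        return s / x.length;
    }

    static double mae(double[] x, double[] f) {
        double s = 0;
        for (int t = 0; t < x.length; t++) s += Math.abs(x[t] - f[t]);
        return s / x.length;
    }

    static double mse(double[] x, double[] f) {
        double s = 0;
        for (int t = 0; t < x.length; t++) s += (x[t] - f[t]) * (x[t] - f[t]);
        return s / x.length;
    }

    // Percentage error PE_t = 100 * (X_t - F_t) / X_t; the observed values must be non-zero.
    static double mpe(double[] x, double[] f) {
        double s = 0;
        for (int t = 0; t < x.length; t++) s += 100.0 * (x[t] - f[t]) / x[t];
        return s / x.length;
    }

    static double mape(double[] x, double[] f) {
        double s = 0;
        for (int t = 0; t < x.length; t++) s += Math.abs(100.0 * (x[t] - f[t]) / x[t]);
        return s / x.length;
    }
}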
The first smoothed value S_0. A final issue that we have neglected up to this point is the problem of the initial value, or how to start the smoothing process. If you look back at the formula above, it is evident that one needs an S_0 value in order to compute the smoothed value (forecast) for the first observation in the series. Depending on the choice of the parameter α (i.e. when α is close to zero), the initial value for the smoothing process can affect the quality of the forecasts for many observations. As with most other aspects of exponential smoothing, it is recommended to choose the initial value that produces the best forecasts. On the other hand, in practice, when there are many leading observations prior to a crucial actual forecast, the initial value will not affect that forecast by much, since its effect will have long faded from the smoothed series (due to the exponentially decreasing weights, the older an observation, the less it will influence the forecast).

Seasonal and Non-seasonal Models With or Without Trend. The discussion above in the context of simple exponential smoothing introduced the basic procedure for identifying a smoothing parameter, and for evaluating the goodness-of-fit of a model. In addition to simple exponential smoothing, more complex models have been developed to accommodate time series with seasonal and trend components. The general idea here is that forecasts are not only computed from consecutive previous observations (as in simple exponential smoothing), but an independent (smoothed) trend and seasonal component can be added. Gardner (1985) discusses the different models in terms of seasonality (none, additive, or multiplicative) and trend (none, linear, exponential, or damped).

Additive and multiplicative seasonality. Many time series data follow recurring seasonal patterns. For example, annual sales of toys will probably peak in the months of November and December, and perhaps during the summer (with a much smaller peak) when children are on their summer break. This pattern will likely repeat every year; however, the relative amount of increase in sales during December may slowly change from year to year. Thus, it may be useful to smooth the seasonal component independently with an extra parameter, usually denoted as δ (delta). Seasonal components can be additive or multiplicative in nature. For example, during the month of December the sales for a particular toy may increase by 1 million dollars every year. Thus, we could add to our forecasts for every December the amount of 1 million dollars (over the respective annual average) to account for this seasonal fluctuation. In this case, the seasonality is additive. Alternatively, during the month of December the sales for a particular toy may increase by 40%, that is, increase by a factor of 1.4. Thus, when the sales for the toy are generally weak, the absolute (dollar) increase in sales during December will be relatively weak (but the percentage will be constant); if the sales of the toy are strong, the absolute (dollar) increase in sales will be proportionately greater. Again, in this case the sales increase by a certain factor, and the seasonal component is thus multiplicative in nature (i.e. the multiplicative seasonal component in this case would be 1.4). In plots of the series, the distinguishing characteristic between these two types of seasonal components is that in the additive case, the series shows steady seasonal fluctuations, regardless of the overall level of the series; in the multiplicative case, the size of the seasonal fluctuations varies, depending on the overall level of the series.

The seasonal smoothing parameter δ.
In general, the one-step-ahead forecasts are computed as (for no-trend models; for linear and exponential trend models a trend component is added to the model, see below):

Forecast_t = S_t + I_(t-p) (additive seasonality), or Forecast_t = S_t · I_(t-p) (multiplicative seasonality)

In this formula, S_t stands for the (simple) exponentially smoothed value of the series at time t, and I_(t-p) stands for the smoothed seasonal factor at time t minus p (the length of the season). Thus, compared to simple exponential smoothing, the forecast is "enhanced" by adding or multiplying the simple smoothed value by the predicted seasonal component. This seasonal component is derived analogously to the S_t value from simple exponential smoothing, as:

I_t = I_(t-p) + δ·(1 − α)·e_t

Put in words, the predicted seasonal component at time t is computed as the respective seasonal component in the last seasonal cycle plus a portion of the error (e_t, the observed minus the forecast value at time t). Considering the formula above, it is clear that parameter δ can assume values between 0 and 1. If it is zero, then the seasonal component for a particular point in time is predicted to be identical to the predicted seasonal component for the respective time during the previous seasonal cycle, which in turn is predicted to be identical to that from the previous cycle, and so on. Thus, if δ is zero, a constant unchanging seasonal component is used to generate the one-step-ahead forecasts. If the parameter δ is equal to 1, then the seasonal component is modified "maximally" at every step by the respective forecast error (times (1 − α), which we will ignore for the purpose of this brief introduction). In most cases, when seasonality is present in the time series, the optimum δ parameter will fall somewhere between 0 (zero) and 1 (one).

Linear, exponential, and damped trend. To remain with the toy example above, the sales for a toy can show a linear upward trend (e.g. each year, sales increase by 1 million dollars), exponential growth (e.g. each year, sales increase by a factor of 1.3), or a damped trend (during the first year sales increase by 1 million dollars; during the second year the increase is only 80% of the previous year's, i.e. $800,000; during the next year it is again 80% of the previous increase, i.e. $800,000 · 0.8 = $640,000; etc.). Each type of trend leaves a clear "signature" that can usually be identified in the series; shown below in the brief discussion of the different models are icons that illustrate the general patterns. In general, the trend factor may change slowly over time, and, again, it may make sense to smooth the trend component with a separate parameter (denoted γ, gamma, for linear and exponential trend models, and φ, phi, for damped trend models).

The trend smoothing parameters γ (linear and exponential trend) and φ (damped trend). Analogous to the seasonal component, when a trend component is included in the exponential smoothing process, an independent trend component is computed for each time, and modified as a function of the forecast error and the respective parameter. If the γ parameter is 0 (zero), then the trend component is constant across all values of the time series (and for all forecasts). If the parameter is 1, then the trend component is modified maximally from observation to observation by the respective forecast error. Parameter values that fall in between represent mixtures of those two extremes. Parameter φ is a trend modification parameter, and affects how strongly changes in the trend will affect estimates of the trend for subsequent forecasts, that is, how quickly the trend will be damped or increased.
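Before turning to the decomposition methods below, here is a minimal Java sketch of the additive seasonal model without trend, written in the error-correction form described above (an illustration under my own assumptions: the class and method names are hypothetical, and the level and the p seasonal factors are crudely initialized from the first season, which is not prescribed by the text).

public class SeasonalSmoothing {

    // Returns the one-step-ahead forecasts F_t = level + I_{t-p}, with updates
    //   e_t = X_t - F_t,  level <- level + alpha*e_t,  I_t = I_{t-p} + delta*(1-alpha)*e_t.
    static double[] forecasts(double[] x, int p, double alpha, double delta) {
        double[] seasonal = new double[x.length + p];
        double level = 0.0;
        for (int i = 0; i < p; i++) level += x[i] / p;            // crude initial level
        for (int i = 0; i < p; i++) seasonal[i] = x[i] - level;   // crude initial seasonal factors
        // (with this initialization the first season is fitted exactly)
        double[] f = new double[x.length];
        for (int t = 0; t < x.length; t++) {
            f[t] = level + seasonal[t];                           // one-step-ahead forecast
            double e = x[t] - f[t];                               // forecast error e_t
            level = level + alpha * e;                            // update the smoothed level
            seasonal[t + p] = seasonal[t] + delta * (1 - alpha) * e; // update the seasonal factor
        }
        return f;
    }
}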
Seasonal Decomposition (Census I). Suppose you recorded the monthly passenger load on international flights for a period of 12 years (see Box & Jenkins, 1976). If you plot those data, it is apparent that (1) there appears to be a linear upwards trend in the passenger loads over the years, and (2) there is a recurring pattern or seasonality within each year (i.e. most travel occurs during the summer months, and a minor peak occurs during the December holidays). The purpose of the seasonal decomposition method is to isolate those components, that is, to de-compose the series into the trend effect, seasonal effects, and remaining variability. The "classic" technique designed to accomplish this decomposition is known as the Census I method. This technique is described and discussed in detail in Makridakis, Wheelwright, and McGee (1983), and Makridakis and Wheelwright (1989).

General model. The general idea of seasonal decomposition is straightforward. In general, a time series like the one described above can be thought of as consisting of four different components: (1) a seasonal component (denoted as S_t, where t stands for the particular point in time), (2) a trend component (T_t), (3) a cyclical component (C_t), and (4) a random, error, or irregular component (I_t). The difference between a cyclical and a seasonal component is that the latter occurs at regular (seasonal) intervals, while cyclical factors usually have a longer duration that varies from cycle to cycle. In the Census I method, the trend and cyclical components are customarily combined into a trend-cycle component (TC_t). The specific functional relationship between these components can assume different forms. However, two straightforward possibilities are that they combine in an additive or a multiplicative fashion:

X_t = TC_t + S_t + I_t (additive model), or X_t = TC_t · S_t · I_t (multiplicative model)

Here X_t stands for the observed value of the time series at time t. Given some a priori knowledge about the cyclical factors affecting the series (e.g. business cycles), the estimates for the different components can be used to compute forecasts for future observations. (However, the Exponential Smoothing method, which can also incorporate seasonality and trend components, is the preferred technique for forecasting purposes.)

Additive and multiplicative seasonality. Let us consider the difference between an additive and a multiplicative seasonal component in an example: The annual sales of toys will probably peak in the months of November and December, and perhaps during the summer (with a much smaller peak) when children are on their summer break. This seasonal pattern will likely repeat every year. Seasonal components can be additive or multiplicative in nature. For example, during the month of December the sales for a particular toy may increase by 3 million dollars every year. Thus, we could add to our forecasts for every December the amount of 3 million dollars to account for this seasonal fluctuation. In this case, the seasonality is additive. Alternatively, during the month of December the sales for a particular toy may increase by 40%, that is, increase by a factor of 1.4. Thus, when the sales for the toy are generally weak, then the absolute (dollar) increase in sales during December will be relatively weak (but the percentage will be constant); if the sales of the toy are strong, then the absolute (dollar) increase in sales will be proportionately greater. Again, in this case the sales increase by a certain factor, and the seasonal component is thus multiplicative in nature (i.e. the multiplicative seasonal component in this case would be 1.4).
In plots of the series, the distinguishing characteristic between these two types of seasonal components is that in the additive case, the series shows steady seasonal fluctuations, regardless of the overall level of the series; in the multiplicative case, the size of the seasonal fluctuations varies, depending on the overall level of the series.

Additive and multiplicative trend-cycle. We can extend the previous example to illustrate the additive and multiplicative trend-cycle components. In terms of our toy example, a fashion trend may produce a steady increase in sales (e.g. a trend towards more educational toys in general); as with the seasonal component, this trend may be additive (sales increase by 3 million dollars per year) or multiplicative (sales increase by 30%, or by a factor of 1.3, annually) in nature. In addition, cyclical components may impact sales; to reiterate, a cyclical component is different from a seasonal component in that it usually is of longer duration, and that it occurs at irregular intervals. For example, a particular toy may be particularly hot during a summer season (e.g. a particular doll which is tied to the release of a major children's movie, and is promoted with extensive advertising). Again, such a cyclical component can affect sales in an additive or a multiplicative manner.

The Seasonal Decomposition (Census I) standard formulas are shown in Makridakis, Wheelwright, and McGee (1983), and Makridakis and Wheelwright (1989).

Moving average. First a moving average is computed for the series, with the moving average window width equal to the length of one season. If the length of the season is even, then the user can choose to use either equal weights for the moving average or unequal weights, in which the first and last observations in the moving average window receive half weight.

Ratios or differences. In the moving average series, all seasonal (within-season) variability will be eliminated; thus, the differences (in additive models) or ratios (in multiplicative models) of the observed and smoothed series will isolate the seasonal component (plus irregular component). Specifically, the moving average is subtracted from the observed series (for additive models) or the observed series is divided by the moving average values (for multiplicative models).

Seasonal components. The seasonal component is then computed as the average (for additive models) or medial average (for multiplicative models) for each point in the season. (The medial average of a set of values is the mean after the smallest and largest values are excluded.) The resulting values represent the (average) seasonal component of the series.

Seasonally adjusted series. The original series can be adjusted by subtracting from it (additive models) or dividing it by (multiplicative models) the seasonal component. The resulting series is the seasonally adjusted series (i.e. the seasonal component will be removed).

Trend-cycle component. Remember that the cyclical component is different from the seasonal component in that it is usually longer than one season, and different cycles can be of different lengths. The combined trend and cyclical component can be approximated by applying to the seasonally adjusted series a 5-point (centered) weighted moving average smoothing transformation with the weights 1, 2, 3, 2, 1.

Random or irregular component. Finally, the random or irregular (error) component can be isolated by subtracting from the seasonally adjusted series (additive models) or dividing the adjusted series by (multiplicative models) the trend-cycle component.
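Taken together, these steps are easy to express in code. Below is a minimal Java sketch of the additive variant (an illustration under simple assumptions, not the Census I program itself): the class and method names are hypothetical, the season is assumed to start at the first observation, and edge values that cannot be computed are left as NaN.

import java.util.Arrays;

public class AdditiveDecomposition {

    // Centered moving average of width p (half weights at the window ends when p is even).
    static double[] centeredMA(double[] x, int p) {
        double[] ma = new double[x.length];
        Arrays.fill(ma, Double.NaN);
        int half = p / 2;
        boolean even = (p % 2 == 0);
        for (int t = half; t < x.length - half; t++) {
            double sum = 0.0;
            if (even) {
                sum += 0.5 * x[t - half] + 0.5 * x[t + half];
                for (int k = -half + 1; k <= half - 1; k++) sum += x[t + k];
            } else {
                for (int k = -half; k <= half; k++) sum += x[t + k];
            }
            ma[t] = sum / p;
        }
        return ma;
    }

    // Returns {seasonal factors (length p), seasonally adjusted series, trend-cycle, irregular}.
    static double[][] decompose(double[] x, int p) {
        double[] ma = centeredMA(x, p);
        double[] diff = new double[x.length];               // observed minus moving average
        for (int t = 0; t < x.length; t++) diff[t] = x[t] - ma[t];
        double[] seasonal = new double[p];                  // average difference per season position
        for (int j = 0; j < p; j++) {
            double sum = 0; int n = 0;
            for (int t = j; t < x.length; t += p)
                if (!Double.isNaN(diff[t])) { sum += diff[t]; n++; }
            seasonal[j] = (n > 0) ? sum / n : 0.0;
        }
        double[] adjusted = new double[x.length];           // seasonally adjusted series
        for (int t = 0; t < x.length; t++) adjusted[t] = x[t] - seasonal[t % p];
        double[] tc = new double[x.length];                 // trend-cycle: 5-point MA, weights 1,2,3,2,1
        Arrays.fill(tc, Double.NaN);
        double[] w = {1, 2, 3, 2, 1};
        for (int t = 2; t < x.length - 2; t++) {
            double s = 0;
            for (int k = -2; k <= 2; k++) s += w[k + 2] * adjusted[t + k];
            tc[t] = s / 9.0;
        }
        double[] irregular = new double[x.length];          // what is left over
        for (int t = 0; t < x.length; t++) irregular[t] = adjusted[t] - tc[t];
        return new double[][] {seasonal, adjusted, tc, irregular};
    }
}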
X-11 Census Method II Seasonal Adjustment. The general ideas of seasonal decomposition and adjustment are discussed in the context of the Census I seasonal adjustment method (Seasonal Decomposition (Census I)). The Census method II (2) is an extension and refinement of the simple adjustment method. Over the years, different versions of the Census method II evolved at the Census Bureau; the method that has become most popular and is used most widely in government and business is the so-called X-11 variant of the Census method II (see Shiskin, Young, & Musgrave, 1967). Subsequently, the term X-11 has become synonymous with this refined version of the Census method II. In addition to the documentation that can be obtained from the Census Bureau, a detailed summary of this method is also provided in Makridakis, Wheelwright, and McGee (1983) and Makridakis and Wheelwright (1989).

Suppose you recorded the monthly passenger load on international flights for a period of 12 years (see Box & Jenkins, 1976). If you plot those data, it is apparent that (1) there appears to be an upwards linear trend in the passenger loads over the years, and (2) there is a recurring pattern or seasonality within each year (i.e. most travel occurs during the summer months, and a minor peak occurs during the December holidays). The purpose of seasonal decomposition and adjustment is to isolate those components, that is, to de-compose the series into the trend effect, seasonal effects, and remaining variability. The classic technique designed to accomplish this decomposition was developed in the 1920s and is also known as the Census I method (see the Census I overview section). This technique is also described and discussed in detail in Makridakis, Wheelwright, and McGee (1983), and Makridakis and Wheelwright (1989).

General model. The general idea of seasonal decomposition is straightforward. In general, a time series like the one described above can be thought of as consisting of four different components: (1) a seasonal component (denoted as S_t, where t stands for the particular point in time), (2) a trend component (T_t), (3) a cyclical component (C_t), and (4) a random, error, or irregular component (I_t). The difference between a cyclical and a seasonal component is that the latter occurs at regular (seasonal) intervals, while cyclical factors usually have a longer duration that varies from cycle to cycle. The trend and cyclical components are customarily combined into a trend-cycle component (TC_t). The specific functional relationship between these components can assume different forms. However, two straightforward possibilities are that they combine in an additive or a multiplicative fashion:

X_t = TC_t + S_t + I_t (additive model), or X_t = TC_t · S_t · I_t (multiplicative model)

where X_t represents the observed value of the time series at time t. Given some a priori knowledge about the cyclical factors affecting the series (e.g. business cycles), the estimates for the different components can be used to compute forecasts for future observations. (However, the Exponential Smoothing method, which can also incorporate seasonality and trend components, is the preferred technique for forecasting purposes.)

Additive and multiplicative seasonality.
Consider the difference between an additive and a multiplicative seasonal component in an example: The annual sales of toys will probably peak in the months of November and December, and perhaps during the summer (with a much smaller peak) when children are on their summer break. This seasonal pattern will likely repeat every year. Seasonal components can be additive or multiplicative in nature. For example, during the month of December the sales for a particular toy may increase by 3 million dollars every year. Thus, you could add to your forecasts for every December the amount of 3 million dollars to account for this seasonal fluctuation. In this case, the seasonality is additive. Alternatively, during the month of December the sales for a particular toy may increase by 40%, that is, increase by a factor of 1.4. Thus, when the sales for the toy are generally weak, then the absolute (dollar) increase in sales during December will be relatively weak (but the percentage will be constant); if the sales of the toy are strong, then the absolute (dollar) increase in sales will be proportionately greater. Again, in this case the sales increase by a certain factor, and the seasonal component is thus multiplicative in nature (i.e. the multiplicative seasonal component in this case would be 1.4). In plots of the series, the distinguishing characteristic between these two types of seasonal components is that in the additive case, the series shows steady seasonal fluctuations, regardless of the overall level of the series; in the multiplicative case, the size of the seasonal fluctuations varies, depending on the overall level of the series.

Additive and multiplicative trend-cycle. The previous example can be extended to illustrate the additive and multiplicative trend-cycle components. In terms of the toy example, a fashion trend may produce a steady increase in sales (e.g. a trend towards more educational toys in general); as with the seasonal component, this trend may be additive (sales increase by 3 million dollars per year) or multiplicative (sales increase by 30%, or by a factor of 1.3, annually) in nature. In addition, cyclical components may impact sales. To reiterate, a cyclical component is different from a seasonal component in that it usually is of longer duration, and that it occurs at irregular intervals. For example, a particular toy may be particularly hot during a summer season (e.g. a particular doll which is tied to the release of a major children's movie, and is promoted with extensive advertising). Again, such a cyclical component can affect sales in an additive or a multiplicative manner.

The Census II Method. The basic method for seasonal decomposition and adjustment outlined in the Basic Ideas and Terms topic can be refined in several ways. In fact, unlike many other time-series modeling techniques (e.g. ARIMA) which are grounded in some theoretical model of an underlying process, the X-11 variant of the Census II method simply contains many ad hoc features and refinements that over the years have proven to provide excellent estimates for many real-world applications (see Burman, 1979; Kendall & Ord, 1990; Makridakis & Wheelwright, 1989; Wallis, 1974). Some of the major refinements are listed below.

Trading-day adjustment. Different months have different numbers of days, and different numbers of trading-days (i.e. Mondays, Tuesdays, etc.).
When analyzing, for example, monthly revenue figures for an amusement park, the fluctuation in the different numbers of Saturdays and Sundays (peak days) in the different months will surely contribute significantly to the variability in monthly revenues. The X-11 variant of the Census II method allows the user to test whether such trading-day variability exists in the series and, if so, to adjust the series accordingly.

Extreme values. Most real-world time series contain outliers, that is, extreme fluctuations due to rare events. For example, a strike may affect production in a particular month of one year. Such extreme outliers may bias the estimates of the seasonal and trend components. The X-11 procedure includes provisions to deal with extreme values through the use of "statistical control principles," that is, values that are above or below a certain range (expressed in terms of multiples of sigma, the standard deviation) can be modified or dropped before final estimates for the seasonality are computed.

Multiple refinements. The refinement for outliers, extreme values, and different numbers of trading-days can be applied more than once, in order to obtain successively improved estimates of the components. The X-11 method applies a series of successive refinements of the estimates to arrive at the final trend-cycle, seasonal, and irregular components, and the seasonally adjusted series.

Tests and summary statistics. In addition to estimating the major components of the series, various summary statistics can be computed. For example, analysis of variance tables can be prepared to test the significance of seasonal variability and trading-day variability (see above) in the series; the X-11 procedure will also compute the percentage change from month to month in the random and trend-cycle components. As the duration or span in terms of months (or quarters for quarterly X-11) increases, the change in the trend-cycle component will likely also increase, while the change in the random component should remain about the same. The width of the average span at which the changes in the random component are about equal to the changes in the trend-cycle component is called the month (quarter) for cyclical dominance, or MCD (QCD) for short. For example, if the MCD is equal to 2, then one can infer that over a 2-month span the trend-cycle will dominate the fluctuations of the irregular (random) component. These and various other results are discussed in greater detail below.

Result Tables Computed by the X-11 Method. The computations performed by the X-11 procedure are best discussed in the context of the results tables that are reported. The adjustment process is divided into seven major steps, which are customarily labeled with consecutive letters A through G.

A. Prior adjustment (monthly seasonal adjustment only). Before any seasonal adjustment is performed on the monthly time series, various prior user-defined adjustments can be incorporated. The user can specify a second series that contains prior adjustment factors; the values in that series will either be subtracted (additive model) from the original series, or the original series will be divided by these values (multiplicative model). For multiplicative models, user-specified trading-day adjustment weights can also be specified. These weights will be used to adjust the monthly observations depending on the number of respective trading-days represented by the observation.
B. Preliminary estimation of trading-day variation (monthly X-11) and weights. Next, preliminary trading-day adjustment factors (monthly X-11 only) and weights for reducing the effect of extreme observations are computed.

C. Final estimation of trading-day variation and irregular weights (monthly X-11). The adjustments and weights computed in B above are then used to derive improved trend-cycle and seasonal estimates. These improved estimates are used to compute the final trading-day factors (monthly X-11 only) and weights.

D. Final estimation of seasonal factors, trend-cycle, irregular, and seasonally adjusted series. The final trading-day factors and weights computed in C above are used to compute the final estimates of the components.

E. Modified original, seasonally adjusted, and irregular series. The original and final seasonally adjusted series, and the irregular component, are modified for extremes. The resulting modified series allow the user to examine the stability of the seasonal adjustment.

F. Month (quarter) for cyclical dominance (MCD, QCD), moving average, and summary measures. In this part of the computations, various summary measures (see below) are computed to allow the user to examine the relative importance of the different components, the average fluctuation from month to month (quarter to quarter), the average number of consecutive changes in the same direction (average number of runs), etc.

G. Charts. Finally, various charts (graphs) are computed to summarize the results. For example, the final seasonally adjusted series will be plotted, in chronological order, or by month (see below).

Specific Description of all Result Tables Computed by the X-11 Method. In each part A through G of the analysis (see Result Tables Computed by the X-11 Method), different result tables are computed. Customarily, these tables are numbered, and also identified by a letter to indicate the respective part of the analysis. For example, table B 11 shows the initial seasonally adjusted series; C 11 is the refined seasonally adjusted series; and D 11 is the final seasonally adjusted series. Shown below is a list of all available tables. Those tables identified by an asterisk (*) are not available (applicable) when analyzing quarterly series. (Also, for quarterly adjustment, some of the computations outlined below are slightly different; for example, instead of a 12-term monthly moving average, a 4-term quarterly moving average is applied to compute the seasonal factors; the initial trend-cycle estimate is computed via a centered 4-term moving average; and the final trend-cycle estimate in each part is computed by a 5-term Henderson average.)

Following the convention of the Bureau of the Census version of the X-11 method, three levels of printout detail are offered: Standard (17 to 27 tables), Long (27 to 39 tables), and Full (44 to 59 tables). In the description of each table below, the letters S, L, and F are used next to each title to indicate which tables will be displayed and/or printed at the respective setting of the output option. (For the charts, two levels of detail are available: Standard and All.) See the table name below to obtain more information about that table.

A 2. Prior Monthly Adjustment Factors (S)

Tables B 14 through B 16, B 18, and B 19: Adjustment for trading-day variation. These tables are only available when analyzing monthly series. Different months contain different numbers of days of the week (i.e. Mondays, Tuesdays, etc.).
In some series, the variation in the different numbers of trading-days may contribute significantly to monthly fluctuations (e.g. the monthly revenues of an amusement park will be greatly influenced by the number of Saturdays/Sundays in each month). The user can specify initial weights for each trading-day (see A 4), and/or these weights can be estimated from the data (the user can also choose to apply those weights conditionally, i.e. only if they explain a significant proportion of variance).

B 14. Extreme Irregular Values Excluded from Trading-day Regression (L)
B 15. Preliminary Trading-day Regression (L)
B 16. Trading-day Adjustment Factors Derived from Regression Coefficients (F)
B 17. Preliminary Weights for Irregular Component (L)
B 18. Trading-day Factors Derived from Combined Daily Weights (F)
B 19. Original Series Adjusted for Trading-day and Prior Variation (F)
C 1. Original Series Modified by Preliminary Weights and Adjusted for Trading-day and Prior Variation (L)

Tables C 14 through C 16, C 18, and C 19: Adjustment for trading-day variation. These tables are only available when analyzing monthly series, and when adjustment for trading-day variation is requested. In that case, the trading-day adjustment factors are computed from the refined adjusted series, analogous to the adjustment performed in part B (B 14 through B 16, B 18, and B 19).

C 14. Extreme Irregular Values Excluded from Trading-day Regression (S)
C 15. Final Trading-day Regression (S)
C 16. Final Trading-day Adjustment Factors Derived from Regression Coefficients (S)
C 17. Final Weights for Irregular Component (S)
C 18. Final Trading-day Factors Derived from Combined Daily Weights (S)
C 19. Original Series Adjusted for Trading-day and Prior Variation (S)
D 1. Original Series Modified by Final Weights and Adjusted for Trading-day and Prior Variation (L)

Distributed lags analysis is a specialized technique for examining the relationships between variables that involve some delay. For example, suppose that you are a manufacturer of computer software, and you want to determine the relationship between the number of inquiries that are received and the number of orders that are placed by your customers. You could record those numbers monthly for a one-year period, and then correlate the two variables. However, inquiries will obviously precede actual orders, and one can expect that the number of orders will follow the number of inquiries with some delay. Put another way, there will be a (time-) lagged correlation between the number of inquiries and the number of orders that are received.

Time-lagged correlations are particularly common in econometrics. For example, the benefits of investments in new machinery usually only become evident after some time. Higher income will change people's choice of rental apartments; however, this relationship will be lagged because it will take some time for people to terminate their current leases, find new apartments, and move. In general, the relationship between capital appropriations and capital expenditures will be lagged, because it will require some time before investment decisions are actually acted upon. In all of these cases, we have an independent or explanatory variable that affects the dependent variable with some lag. The distributed lags method allows you to investigate those lags.
Detailed discussions of distributed lags correlation can be found in most econometrics textbooks, for example, in Judge, Griffiths, Hill, Lütkepohl, and Lee (1985), Maddala (1977), and Fomby, Hill, and Johnson (1984). In the following paragraphs we will present a brief description of these methods. We will assume that you are familiar with the concept of correlation (see Basic Statistics) and the basic ideas of multiple regression (see Multiple Regression).

Suppose we have a dependent variable y and an independent or explanatory variable x, both measured repeatedly over time. In some textbooks, the dependent variable is also referred to as the endogenous variable, and the independent or explanatory variable as the exogenous variable. The simplest way to describe the relationship between the two would be in a simple linear relationship:

y_t = Σ_i β_i · x_(t-i)   (summing over the lags i = 0, 1, 2, ...)

In this equation, the value of the dependent variable at time t is expressed as a linear function of x measured at times t, t-1, t-2, etc. Thus, the dependent variable is a linear function of x, and x is lagged by 1, 2, etc. time periods. The beta weights (β_i) can be considered slope parameters in this equation. You may recognize this equation as a special case of the general linear regression equation (see the Multiple Regression overview). If the weights for the lagged time periods are statistically significant, we can conclude that the y variable is predicted (or explained) with the respective lag.

Almon Distributed Lag. A common problem that often arises when computing the weights for the multiple linear regression model shown above is that adjacent (in time) values of the x variable are highly correlated. In extreme cases, their independent contributions to the prediction of y may become so redundant that the correlation matrix of measures can no longer be inverted, and thus the beta weights cannot be computed. In less extreme cases, the computation of the beta weights and their standard errors can become very imprecise, due to round-off error. In the context of Multiple Regression this general computational problem is discussed as the multicollinearity or matrix ill-conditioning issue.

Almon (1965) proposed a procedure that will reduce the multicollinearity in this case. Specifically, suppose we express each weight in the linear regression equation as a polynomial in the lag index:

β_i = α_0 + α_1·i + ... + α_q·i^q

Almon showed that in many cases it is easier (i.e. it avoids the multicollinearity problem) to estimate the alpha values than the beta weights directly. Note that with this method, the precision of the beta weight estimates is dependent on the degree or order of the polynomial approximation.

Misspecifications. A general problem with this technique is that, of course, the lag length and correct polynomial degree are not known a priori. The effects of misspecifications of these parameters are potentially serious (in terms of biased estimation). This issue is discussed in greater detail in Frost (1975), Schmidt and Waud (1973), Schmidt and Sickles (1975), and Trivedi and Pagan (1979).
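To illustrate the idea (this is an illustrative sketch, not code from the text): if each β_i is a polynomial in the lag index i, the regression on the lagged x values can be rewritten as a regression on a handful of constructed variables Z_j, one per polynomial degree, Z_(j,t) = Σ_i i^j · x_(t-i). The hypothetical helper below only builds those variables; the final least squares fit of y on Z (and the back-transformation to the β_i) is left to any regression routine.

public class AlmonLags {

    // Rows correspond to the usable time points t = maxLag .. T-1,
    // columns to the polynomial degrees j = 0 .. degree.
    static double[][] almonVariables(double[] x, int maxLag, int degree) {
        int T = x.length;
        double[][] z = new double[T - maxLag][degree + 1];
        for (int t = maxLag; t < T; t++) {
            for (int j = 0; j <= degree; j++) {
                double sum = 0.0;
                for (int i = 0; i <= maxLag; i++) {
                    sum += Math.pow(i, j) * x[t - i];   // Z_{j,t} = sum over lags of i^j * x_{t-i}
                }
                z[t - maxLag][j] = sum;
            }
        }
        return z;
    }
}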
Single Spectrum (Fourier) Analysis. Spectrum analysis is concerned with the exploration of cyclical patterns of data. The purpose of the analysis is to decompose a complex time series with cyclical components into a few underlying sinusoidal (sine and cosine) functions of particular wavelengths. The term "spectrum" provides an appropriate metaphor for the nature of this analysis: Suppose you study a beam of white sunlight, which at first looks like a random (white noise) accumulation of light of different wavelengths. However, when put through a prism, we can separate the different wavelengths or cyclical components that make up white sunlight. In fact, via this technique we can now identify and distinguish between different sources of light. Thus, by identifying the important underlying cyclical components, we have learned something about the phenomenon of interest. In essence, performing spectrum analysis on a time series is like putting the series through a prism in order to identify the wavelengths and importance of underlying cyclical components. As a result of a successful analysis, one might uncover just a few recurring cycles of different lengths in the time series of interest, which at first looked more or less like random noise.

A much cited example for spectrum analysis is the cyclical nature of sun spot activity (e.g. see Bloomfield, 1976, or Shumway, 1988). It turns out that sun spot activity varies over 11-year cycles. Other examples of celestial phenomena, weather patterns, fluctuations in commodity prices, economic activity, etc. are also often used in the literature to demonstrate this technique. To contrast this technique with ARIMA or Exponential Smoothing: the purpose of spectrum analysis is to identify the seasonal fluctuations of different lengths, while in the former types of analysis, the length of the seasonal component is usually known (or guessed) a priori and then included in some theoretical model of moving averages or autocorrelations. The classic text on spectrum analysis is Bloomfield (1976); however, other detailed discussions can be found in Jenkins and Watts (1968), Brillinger (1975), Brigham (1974), Elliott and Rao (1982), Priestley (1981), Shumway (1988), or Wei (1989).

Cross-spectrum analysis is an extension of Single Spectrum (Fourier) Analysis to the simultaneous analysis of two series. In the following paragraphs, we will assume that you have already read the introduction to single spectrum analysis. Detailed discussions of this technique can be found in Bloomfield (1976), Jenkins and Watts (1968), Brillinger (1975), Brigham (1974), Elliott and Rao (1982), Priestley (1981), Shumway (1988), or Wei (1989).

A much cited example for spectrum analysis is the cyclical nature of sun spot activity (e.g. see Bloomfield, 1976, or Shumway, 1988). It turns out that sun spot activity varies over 11-year cycles. Other examples of celestial phenomena, weather patterns, fluctuations in commodity prices, economic activity, etc. are also often used in the literature to demonstrate this technique. The purpose of cross-spectrum analysis is to uncover the correlations between two series at different frequencies. For example, sun spot activity may be related to weather phenomena here on earth. If so, then if we were to record those phenomena (e.g. yearly average temperature) and submit the resulting series to a cross-spectrum analysis together with the sun spot data, we may find that the weather indeed correlates with the sunspot activity at the 11-year cycle. That is, we may find a periodicity in the weather data that is in sync with the sun spot cycles.
One can easily think of other areas of research where such knowledge could be very useful; for example, various economic indicators may show similar (correlated) cyclical behavior; various physiological measures likely will also display "coordinated" (i.e. correlated) cyclical behavior; and so on.

Basic Notation and Principles. A simple example: Consider two series with 16 cases each. (The original example lists the two data columns and the resulting table of spectrum statistics; that listing is not reproduced here.)

Results for Each Variable. The complete summary contains all spectrum statistics computed for each variable, as described in the Single Spectrum (Fourier) Analysis overview section. Looking at the results of the example, it is clear that both variables show strong periodicities at the frequencies .0625 and .1875.

Cross-periodogram, Cross-density, Quadrature-density, Cross-amplitude. Analogous to the results for the single variables, the complete summary will also display periodogram values for the cross-periodogram. However, the cross-spectrum consists of complex numbers that can be divided into a real and an imaginary part. These can be smoothed to obtain the cross-density and quadrature-density (quad density for short) estimates, respectively. (The reasons for smoothing, and the different common weight functions for smoothing, are discussed in the Single Spectrum (Fourier) Analysis section.) The square root of the sum of the squared cross-density and quad-density values is called the cross-amplitude. The cross-amplitude can be interpreted as a measure of covariance between the respective frequency components in the two series. Thus we can conclude from the example results that the .0625 and .1875 frequency components in the two series covary.

Squared Coherency, Gain, and Phase Shift. There are additional statistics that can be displayed in the complete summary.

Squared coherency. One can standardize the cross-amplitude values by squaring them and dividing by the product of the spectrum density estimates for each series. The result is called the squared coherency, which can be interpreted similarly to the squared correlation coefficient (see Correlations - Overview); that is, the coherency value is the squared correlation between the cyclical components in the two series at the respective frequency. However, the coherency values should not be interpreted by themselves; for example, when the spectral density estimates in both series are very small, large coherency values may result (the divisor in the computation of the coherency values will be very small), even though there are no strong cyclical components in either series at the respective frequencies.

Gain. The gain value is computed by dividing the cross-amplitude value by the spectrum density estimate for one of the two series in the analysis. Consequently, two gain values are computed, which can be interpreted as the standard least squares regression coefficients for the respective frequencies.

Phase shift. Finally, the phase shift estimates are computed as tan⁻¹ of the ratio of the quad-density estimates over the cross-density estimates. The phase shift estimates (usually denoted by the Greek letter ψ, psi) are measures of the extent to which each frequency component of one series leads the other.
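As a small illustration of how these quantities relate, the hypothetical helpers below compute cross-amplitude, squared coherency, gain, and phase shift at a single frequency from already-smoothed estimates: the cross-density c, the quadrature density q, and the two spectral densities fx and fy. This is only a sketch; in practice the density estimates come from the spectrum routine, and the sign convention of the phase estimate varies between texts.

public class CrossSpectrumStats {

    // Cross-amplitude: sqrt of the sum of squared cross-density and quad-density.
    static double crossAmplitude(double c, double q) {
        return Math.sqrt(c * c + q * q);
    }

    // Squared coherency: squared cross-amplitude divided by the product of the densities.
    static double squaredCoherency(double c, double q, double fx, double fy) {
        return (c * c + q * q) / (fx * fy);
    }

    // Gain of y on x: cross-amplitude divided by the spectral density of x
    // (the second gain value would divide by fy instead).
    static double gainOnX(double c, double q, double fx) {
        return crossAmplitude(c, q) / fx;
    }

    // Phase shift: arc tangent of the ratio of quad-density to cross-density.
    static double phaseShift(double c, double q) {
        return Math.atan2(q, c);
    }
}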
How the Example Data were Created. Now, let us return to the example data set presented above. The large spectral density estimates for both series, and the cross-amplitude values at frequencies .0625 and .1875, suggest two strong synchronized periodicities in both series at those frequencies. In fact, the two series were created as:

v1 = cos(2π·.0625·(v0−1)) + .75·sin(2π·.2·(v0−1))
v2 = cos(2π·.0625·(v0+2)) + .75·sin(2π·.2·(v0+2))

(where v0 is the case number).

Frequency and Period. The wavelength of a sine or cosine function is typically expressed in terms of the number of cycles per unit time (frequency), often denoted by the Greek letter nu, ν (some textbooks also use f). For example, the number of letters handled in a post office may show 12 cycles per year: On the first of every month a large amount of mail is sent (many bills come due on the first of the month), then the amount of mail decreases in the middle of the month, then it increases again towards the end of the month. Therefore, every month the fluctuation in the amount of mail handled by the post office will go through a full cycle. Thus, if the unit of analysis is one year, then ν would be equal to 12, as there would be 12 cycles per year. Of course, there will likely be other cycles with different frequencies. For example, there might be annual cycles (ν = 1), and perhaps weekly cycles (ν = 52 cycles per year).

The periodogram values are computed as

P_k = (sine coefficient_k² + cosine coefficient_k²) · N/2

where P_k is the periodogram value at frequency ν_k and N is the overall length of the series. The periodogram values can be interpreted in terms of variance (sums of squares) of the data at the respective frequency or period. Customarily, the periodogram values are plotted against the frequencies or periods.

The Problem of Leakage. In the example above, a sine function with a frequency of 0.2 was "inserted" into the series. However, because of the length of the series (16), none of the frequencies reported exactly "hits" on that frequency. In practice, what often happens in those cases is that the respective frequency will "leak" into adjacent frequencies. For example, one may find large periodogram values for two adjacent frequencies, when, in fact, there is only one strong underlying sine or cosine function at a frequency that falls in between those implied by the length of the series. There are three ways in which one can approach the problem of leakage: by padding the series, one may apply a finer frequency "roster" to the data; by tapering the series prior to the analysis, one may reduce leakage; or by smoothing the periodogram, one may identify the general frequency "regions" (or spectral densities) that significantly contribute to the cyclical behavior of the series. See below for descriptions of each of these approaches.

Padding the Time Series. Because the frequency values are computed in steps of 1/N (N being the number of units of time), one may simply pad the series with a constant (e.g. zeros) and thereby introduce smaller increments in the frequency values. In a sense, padding allows one to apply a finer roster to the data. In fact, if we padded the example data file described above with ten zeros, the results would not change; that is, the largest periodogram peaks would still occur at the frequency values closest to .0625 and .2. (Padding is also often desirable for computational efficiency reasons; see below.)
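Before turning to tapering and smoothing, here is a small Java sketch of the periodogram computed directly from the definition given above (an O(N²) illustration with hypothetical names, not the FFT-based routine discussed further below); the series is assumed to have had its mean removed already, and the two end frequencies are treated with the same formula for simplicity.

public class Periodogram {

    // P_k = (a_k^2 + b_k^2) * N/2, with a_k and b_k the cosine and sine coefficients
    // at frequency nu_k = k/N, for k = 0 .. N/2.
    static double[] compute(double[] x) {
        int n = x.length;
        double[] p = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double a = 0.0, b = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                a += x[t] * Math.cos(angle);
                b += x[t] * Math.sin(angle);
            }
            a *= 2.0 / n;
            b *= 2.0 / n;
            p[k] = (a * a + b * b) * n / 2.0;
        }
        return p;
    }

    public static void main(String[] args) {
        int n = 16;
        double[] x = new double[n];
        for (int t = 0; t < n; t++)                 // a single cosine at frequency 1/16 = .0625
            x[t] = Math.cos(2 * Math.PI * 0.0625 * t);
        double[] p = compute(x);
        for (int k = 0; k < p.length; k++)
            System.out.printf("nu = %.4f  P = %.4f%n", (double) k / n, p[k]);
    }
}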
The so-called process of split-cosine-bell tapering is a recommended transformation of the series prior to the spectrum analysis. It usually leads to a reduction of leakage in the periodogram. The rationale for this transformation is explained in detail in Bloomfield (1976, pp. 80-94). In essence, a proportion (p) of the data at the beginning and at the end of the series is transformed via multiplication by the weights

w_t = 0.5·(1 − cos(π·(t − 0.5)/m))

(applied symmetrically at both ends of the series), where m is chosen so that 2m/N is equal to the proportion of data to be tapered (p).

Data Windows and Spectral Density Estimates. In practice, when analyzing actual data, it is usually not of crucial importance to identify exactly the frequencies for particular underlying sine or cosine functions. Rather, because the periodogram values are subject to substantial random fluctuation, one is faced with the problem of very many "chaotic" periodogram spikes. In that case, one would like to find the frequencies with the greatest spectral densities, that is, the frequency regions, consisting of many adjacent frequencies, that contribute most to the overall periodic behavior of the series. This can be accomplished by smoothing the periodogram values via a weighted moving average transformation. Suppose the moving average window is of width m (which must be an odd number); the following are the most commonly used smoothers (note: p = (m − 1)/2).

Daniell (or equal weight) window. The Daniell window (Daniell, 1946) amounts to a simple (equal weight) moving average transformation of the periodogram values; that is, each spectral density estimate is computed as the mean of the m/2 preceding and subsequent periodogram values.

Tukey window. In the Tukey (Blackman and Tukey, 1958) or Tukey-Hanning window (named after Julius von Hann), for each frequency the weights for the weighted moving average of the periodogram values are computed as w_j = 0.5 + 0.5·cos(π·j/p) (for j = 0 to p; w_−j = w_j).

Hamming window. In the Hamming (named after R. W. Hamming) or Tukey-Hamming window (Blackman and Tukey, 1958), for each frequency the weights are computed as w_j = 0.54 + 0.46·cos(π·j/p) (for j = 0 to p; w_−j = w_j).

Parzen window. In the Parzen window (Parzen, 1961), for each frequency the weights are computed as w_j = 1 − 6·(j/p)² + 6·(j/p)³ (for j = 0 to p/2) and w_j = 2·(1 − j/p)³ (for j = p/2 + 1 to p), with w_−j = w_j.

Bartlett window. In the Bartlett window (Bartlett, 1950) the weights are computed as w_j = 1 − j/p (for j = 0 to p; w_−j = w_j).

With the exception of the Daniell window, all weight functions will assign the greatest weight to the observation being smoothed in the center of the window, and increasingly smaller weights to values that are further away from the center. In many cases, all of these data windows will produce very similar results.

Preparing the Data for Analysis. Let us now consider a few other practical points in spectrum analysis. Usually, one wants to subtract the mean from the series and detrend the series (so that it is stationary) prior to the analysis. Otherwise the periodogram and density spectrum will mostly be "overwhelmed" by a very large value for the first cosine coefficient (for frequency 0.0). In a sense, the mean is a cycle of frequency 0 (zero) per unit time; that is, it is a constant. Similarly, a trend is also of little interest when one wants to uncover the periodicities in the series. In fact, both of those potentially strong effects may mask the more interesting periodicities in the data, and thus both the mean and the (linear) trend should be removed from the series prior to the analysis. Sometimes, it is also useful to smooth the data prior to the analysis, in order to "tame" the random noise that may obscure meaningful periodic cycles in the periodogram.
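Returning to the smoothing windows above, the sketch below turns a periodogram into spectral density estimates by a weighted moving average of odd width m (the Daniell and Hamming weight sets are shown). The class and method names are hypothetical, and two details are my own assumptions rather than anything prescribed by the text: the weights are normalized to sum to one, and at the ends of the frequency range the weights falling outside the series are simply dropped and renormalized.

public class SpectralSmoother {

    // Equal weights (Daniell window) of width m.
    static double[] daniellWeights(int m) {
        double[] w = new double[m];
        java.util.Arrays.fill(w, 1.0 / m);
        return w;
    }

    // Hamming-style weights of width m (m odd and >= 3), normalized to sum to 1.
    static double[] hammingWeights(int m) {
        int p = (m - 1) / 2;
        double[] w = new double[m];
        double sum = 0.0;
        for (int j = -p; j <= p; j++) {
            w[j + p] = 0.54 + 0.46 * Math.cos(Math.PI * j / p);
            sum += w[j + p];
        }
        for (int j = 0; j < m; j++) w[j] /= sum;
        return w;
    }

    // Smooth the periodogram with the given symmetric weight set.
    static double[] smooth(double[] periodogram, double[] w) {
        int p = (w.length - 1) / 2;
        double[] density = new double[periodogram.length];
        for (int k = 0; k < periodogram.length; k++) {
            double s = 0.0, wsum = 0.0;
            for (int j = -p; j <= p; j++) {
                int idx = k + j;
                if (idx >= 0 && idx < periodogram.length) {
                    s += w[j + p] * periodogram[idx];
                    wsum += w[j + p];
                }
            }
            density[k] = s / wsum;
        }
        return density;
    }
}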
Results when no Periodicity in the Series Exists. Finally, what if there are no recurring cycles in the data, that is, if each observation is completely independent of all other observations? If the distribution of the observations follows the normal distribution, such a time series is also referred to as a white noise series (like the white noise one hears on the radio when tuned in between stations). A white noise input series will result in periodogram values that follow an exponential distribution. Thus, by testing the distribution of periodogram values against the exponential distribution, one may test whether the input series is different from a white noise series. In addition, you can also request the Kolmogorov-Smirnov one-sample d statistic (see also Nonparametrics and Distributions for more details).

Testing for white noise in certain frequency bands. Note that you can also plot the periodogram values for a particular frequency range only. Again, if the input is a white noise series with respect to those frequencies (i.e. there are no significant periodic cycles of those frequencies), then the distribution of the periodogram values should again follow an exponential distribution.

Fast Fourier Transformations. The interpretation of the results of spectrum analysis is discussed in the Basic Notation and Principles topic; however, we have not described how it is done computationally. Up until the mid-1960s, the standard way of performing the spectrum decomposition was to use explicit formulae to solve for the sine and cosine parameters. The computations involved required at least N² (complex) multiplications. Thus, even with today's high-speed computers, it would be very time consuming to analyze even small time series (e.g. 8,000 observations would result in at least 64 million multiplications). The time requirements changed drastically with the development of the so-called fast Fourier transform algorithm, or FFT for short. In the mid-1960s, J. W. Cooley and J. W. Tukey (1965) popularized this algorithm which, in retrospect, had in fact been discovered independently by various individuals. Various refinements and improvements of this algorithm can be found in Monro (1975) and Monro and Branch (1976). Readers interested in the computational details of this algorithm may refer to any of the texts cited in the overview. Suffice it to say that via the FFT algorithm, the time to perform a spectral analysis is proportional to N·log2(N) -- a huge improvement. However, a drawback of the standard FFT algorithm is that the number of cases in the series must be equal to a power of 2 (i.e. 16, 32, 64, 128, 256, ...). Usually, this necessitates padding of the series, which, as described above, will in most cases not change the characteristic peaks of the periodogram or the spectral density estimates. In cases, however, where the time units are meaningful, such padding may make the interpretation of results more cumbersome.

Computation of FFT in Time Series. The implementation of the FFT algorithm allows you to take full advantage of the savings afforded by this algorithm. On most standard computers, series with over 100,000 cases can easily be analyzed. However, there are a few things to remember when analyzing series of that size. As mentioned above, the standard (and most efficient) FFT algorithm requires that the length of the input series be equal to a power of 2. If this is not the case, additional computations have to be performed.
The implementation will use the simple explicit computational formulas as long as the input series is relatively small and the number of computations can be performed in a relatively short amount of time. For long time series, in order to still utilize the FFT algorithm, an implementation of the general approach described by Monro and Branch (1976) is used. This method requires significantly more storage space; however, series of considerable length can still be analyzed very quickly, even if the number of observations is not equal to a power of 2.

For time series of lengths not equal to a power of 2, we would like to make the following recommendations: If the input series is small to moderately sized (e.g. only a few thousand cases), then do not worry; the analysis will typically only take a few seconds anyway. In order to analyze moderately large and large series (e.g. over 100,000 cases), pad the series to a power of 2 and then taper the series during the exploratory part of your data analysis.
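As a rough illustration of that recommendation (a sketch under my own assumptions; the split-cosine-bell weight formula follows the tapering discussion above, and the class, method, and parameter names are hypothetical), the helper below mean-corrects a series, tapers a proportion p of the data at each end, and zero-pads the result to the next power of two so a standard FFT routine could be applied.

public class PrepareForFFT {

    static double[] prepare(double[] x, double taperProportion) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;

        double[] y = new double[n];
        for (int t = 0; t < n; t++) y[t] = x[t] - mean;      // remove the mean (frequency-0 component)

        int m = (int) Math.round(taperProportion * n / 2.0); // 2m/N equals the tapered proportion
        for (int t = 0; t < m; t++) {
            double w = 0.5 * (1.0 - Math.cos(Math.PI * (t + 0.5) / m));
            y[t] *= w;                                       // taper the beginning of the series ...
            y[n - 1 - t] *= w;                               // ... and, symmetrically, the end
        }

        int padded = Integer.highestOneBit(n);
        if (padded < n) padded <<= 1;                        // next power of two >= n
        double[] out = new double[padded];
        System.arraycopy(y, 0, out, 0, n);                   // the remainder is zero-padded
        return out;
    }
}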

